### In our first Methods of Demand Forecasting blog we discussed Triple Exponential Smoothing, ARIMA and Multiple Linear Regression. Below Joe breaks down Bayesian Structural Time Series (BSTS) and Prophet.

**Introduction**

Both of the algorithms discussed in today's blog are fairly sophisticated. I've tried to remove as much of the technical language as possible, while still communicating the general idea of the model. I've elected not to spend too much time discussing how the parameters are estimated. If you already have a technical background, I encourage you to read the following white papers: Bayesian Structural Time Series and Facebook's Prophet. In the interest of consistency, I have used the same notation to describe the models as was used in their respective white papers.

**Bayesian Structural Time Series (BSTS)**

BSTS falls under a broad class of modeling techniques called state space models. The big idea is that we want to estimate an unobserved latent state, \(\alpha_t\), using noisy observations, \(y_t\), of a quantity that we can measure. The equation modeling this hidden relationship is called the observation equation:

$$y_t = Z^T_t\alpha_t + \epsilon_t$$

Typically, the noise \(\epsilon_t\) is assumed to follow a Gaussian distribution, but this need not always be the case. We also assume that, even though we can't directly observe our latent state, we have some idea of how it evolves over time. We model this using what is called a transition equation:

$$\alpha_{t+1} = T_t \alpha_t + R_t \eta_t$$

where \(T_t\) is the transition matrix for the latent state at time \(t\), and \(R_t\) models the interaction of the noise terms contained in \(\eta_t\). Depending on the nature of the problem you are solving, the model matrices will contain a mixture of known values and unknown parameters. In a structural time series, these model matrices are allowed to depend on time. The unknown parameters and the latent states are typically estimated using some combination of Kalman filtering, Kalman smoothing, and sampling from posterior distributions using Markov Chain Monte Carlo methods.
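To make the two equations concrete, here is a minimal sketch that simulates data from a local linear trend model, a common structural time series in which the state \(\alpha_t\) holds a level and a slope. The matrices `Z`, `T`, and `R` play the roles of \(Z_t\), \(T_t\), and \(R_t\) above (constant over time in this sketch); all noise variances are illustrative assumptions, not values from the text.

```python
import numpy as np

# Local linear trend: state alpha_t = (level_t, slope_t).
# Observation equation: y_t = Z @ alpha_t + eps_t
# Transition equation:  alpha_{t+1} = T @ alpha_t + R @ eta_t
rng = np.random.default_rng(0)
Z = np.array([1.0, 0.0])              # we only observe the level, plus noise
T = np.array([[1.0, 1.0],             # level_{t+1} = level_t + slope_t + noise
              [0.0, 1.0]])            # slope_{t+1} = slope_t + noise
R = np.eye(2)                         # each state component gets its own noise

n = 100
alpha = np.zeros(2)                   # initial state: zero level, zero slope
ys = np.empty(n)
for t in range(n):
    ys[t] = Z @ alpha + rng.normal(0, 0.5)   # observation equation
    eta = rng.normal(0, [0.1, 0.01])         # state noise eta_t
    alpha = T @ alpha + R @ eta              # transition equation
```

Fitting the model runs this machinery in reverse: given only `ys`, infer the sequence of hidden `alpha` vectors and the noise variances.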

Basically, the model filters forward in time, using past observations to estimate current and future values of the latent state; a smoothing pass then uses those later estimates to revise the estimates of the latent state's past values. While all of this is going on, a variable selection process (spike-and-slab priors, in BSTS) is choosing which predictor variables are most relevant.
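The filter-then-smooth idea can be sketched for the simplest structural model, a local level (random-walk) state. A forward Kalman filter uses past observations to estimate the current state; a backward Rauch-Tung-Striebel pass then lets the later estimates revise the earlier ones. This is a frequentist sketch of the state estimation step only (the full BSTS instead samples with MCMC), and the variances are illustrative assumptions.

```python
import numpy as np

# Simulate a hidden random-walk level and noisy observations of it
rng = np.random.default_rng(2)
n, s_eps2, s_eta2 = 150, 1.0, 0.1
alpha = np.cumsum(rng.normal(0, np.sqrt(s_eta2), n))   # hidden level
y = alpha + rng.normal(0, np.sqrt(s_eps2), n)          # observations

# Forward pass: Kalman filter for the local-level model
a_f = np.empty(n)        # filtered state means
P_f = np.empty(n)        # filtered state variances
a, P = 0.0, 1e6          # diffuse prior on the initial state
for t in range(n):
    K = P / (P + s_eps2)             # Kalman gain
    a = a + K * (y[t] - a)           # update with the new observation
    P = P * (1 - K)
    a_f[t], P_f[t] = a, P
    P = P + s_eta2                   # predict ahead (random-walk transition)

# Backward pass: information flows from the future to revise the past
a_s = a_f.copy()
for t in range(n - 2, -1, -1):
    J = P_f[t] / (P_f[t] + s_eta2)   # smoother gain
    a_s[t] = a_f[t] + J * (a_s[t + 1] - a_f[t])
```

The smoothed estimates `a_s` track the hidden level more closely than the raw observations do, because each point borrows strength from both earlier and later data.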

**Prophet**

One of the problems with time series forecasting using ARIMA models is the requirement that the data be observed at regular intervals. Missing data requires imputation, which can introduce additional error into forecasts. Facebook's Prophet model gets around this by treating time series modeling as a piecewise curve-fitting problem. By not explicitly accounting for the temporal ordering of the data, missing data and irregular sampling intervals become irrelevant. As with most computational methods, this advantage comes at a cost: you give up the time-lag structure associated with ARIMA models for inference. Prophet adopts an additive modeling approach using

$$y(t) = g(t) + s(t) + h(t) + \epsilon_t$$

where \(g(t)\) models the growth (trend) of the data, \(s(t)\) models the periodic (seasonal) component of the data, \(h(t)\) models the impact that holidays have on the data, and \(\epsilon_t\) is the model error at time \(t\). Prophet allows for changes to occur in the trend component of the model by adopting a piecewise modeling approach. These potential changes in growth occur at "change points." These could be holidays, promotions, natural disasters, or anything else that you would expect to impact growth. How do they model seasonality without the explicit time dependence associated with ARIMA models? Good question, astute reader. They model seasonality by using Fourier series. What's a Fourier series? Well, provided that a function \(f(x)\) with period \(P\) meets certain conditions, we can write it as the following infinite sum

$$f(x) = \sum_{n=-\infty}^{\infty} a_n e^{i\frac{2\pi xn}{P}},$$

where \(a_n\) are real numbers. Prophet doesn't actually compute the infinite sum. It creates a vector of 2N regressors

$$X(t) = \left[ e^{i\frac{2\pi(-N)t}{365.25}}, \dots, e^{i\frac{2\pi(N)t}{365.25}} \right],$$

and applies a Gaussian smoothing prior to smooth out the seasonal behavior. The holiday behavior is modeled using an indicator function which is 1 if the day is a holiday, 0 otherwise. As with seasonality, a smoothing prior is applied.
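The complex exponentials above are equivalent to cosine/sine pairs, which is the form the seasonal regressors actually take in practice. Here is a minimal sketch of constructing the \(2N\) yearly regressors for a set of daily timestamps; the choice `N=10` and the two-year horizon are illustrative assumptions, and shrinking the fitted coefficients toward zero plays the role of the Gaussian smoothing prior.

```python
import numpy as np

def fourier_features(t, period=365.25, N=10):
    """Return the 2N seasonal regressors X(t) for each time in t."""
    t = np.asarray(t, dtype=float)
    cols = []
    for n in range(1, N + 1):
        # cos/sin pair for the n-th harmonic of the yearly period
        cols.append(np.cos(2 * np.pi * n * t / period))
        cols.append(np.sin(2 * np.pi * n * t / period))
    return np.column_stack(cols)

days = np.arange(730)          # two years of daily timestamps
X = fourier_features(days)
print(X.shape)                 # -> (730, 20)
```

Fitting a linear model of \(y\) on these columns recovers a smooth periodic curve, and because each row depends only on its own timestamp, gaps or irregular spacing in `days` pose no problem.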

**I saw a bunch of math words and I panicked! What should I do?**

I suspect that you identify as someone who is "not a math person." Incorrect. You're a math person who, currently, isn't good at math. Don't worry. You're not alone, and I'm here for you. However, this is the internet, a place for lists with hyperbolic titles.

**The 3 best math things to improve your data science chops:**

- Let Gilbert Strang teach you Linear Algebra - watch his MIT OpenCourseWare video lecture series.
- Let Gilbert Strang teach you Calculus - watch his MIT OpenCourseWare video lecture series.
- Read The Elements of Statistical Learning: Data Mining, Inference, and Prediction by Trevor Hastie, Robert Tibshirani and Jerome Friedman.