Stationarity Testing

date posted: 2020-10-19




Time series

Time series data are data points observed at specified times, usually at equal intervals. They are common in practice because many quantities are measured sequentially over time, for example stock prices recorded every second.

Time series analysis is often used to predict the future. For example, the past 12 months of sales data can be used to forecast sales for the next n months so that we can act accordingly.

Four components explain time series data:

  1. Trend: upward, downward, or flat. If your company's sales increase every year, they show an upward trend.
  2. Seasonality: a pattern that repeats over a fixed period, for example the difference between summer and winter. This also includes special holidays.
  3. Irregularity: external factors that affect the series unpredictably, such as Covid or natural disasters.
  4. Cyclic: rises and falls that repeat without a fixed period.
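To make the components concrete, here is a minimal sketch that builds a synthetic monthly sales series from a trend, a 12-month seasonal pattern, and random noise. All numbers are made up for illustration, and the cyclic component is left out to keep the example short.

```python
import math
import random

random.seed(0)  # fixed seed so the sketch is reproducible

n_months = 48

# Trend: sales grow by a steady amount each month (made-up numbers).
trend = [100 + 2.0 * t for t in range(n_months)]

# Seasonality: a pattern that repeats every 12 months.
seasonality = [10 * math.sin(2 * math.pi * t / 12) for t in range(n_months)]

# Irregularity: random shocks that follow no pattern.
noise = [random.gauss(0, 3) for _ in range(n_months)]

sales = [tr + s + e for tr, s, e in zip(trend, seasonality, noise)]

# Averaging over each full year cancels the seasonal term,
# so the yearly means mostly reflect the upward trend.
yearly_means = [sum(sales[i:i + 12]) / 12 for i in range(0, n_months, 12)]
print([round(m, 1) for m in yearly_means])
```

The yearly means rise from one year to the next, which is the trend showing through once the seasonal swings are averaged away.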



ARIMA

Autoregressive Integrated Moving Average,
a.k.a. the Box-Jenkins method.

  • It is a class of models that forecast a series from its own past values: lagged values and lagged forecast errors.
  • The AR model uses lagged values to forecast
  • The MA model uses lagged forecast errors to forecast
  • Combining the two, with differencing (the "Integrated" part) applied to make the series stationary, gives ARIMA
  • It has three parameters: p, q, d

It is a naive model since it assumes the time series data:

  • are "non-seasonal", meaning different seasons do not affect the values. When seasonality exists, we use SARIMA, short for Seasonal ARIMA
  • have no irregularity


Parameters
p - order of the AR term

  The number of lags of Y to be used as predictors. In other words, if you are trying to predict June's sales, how many previous (lagged) months of data are you going to use?
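As a rough illustration of what p controls, the sketch below turns a series into (predictors, target) pairs using the p previous values. The helper name `make_lagged` and the sales figures are hypothetical, and an actual AR model would go on to estimate a weight for each lag.

```python
def make_lagged(series, p):
    """Return (X, y): each row of X holds the p values preceding the target in y."""
    X, y = [], []
    for t in range(p, len(series)):
        X.append(series[t - p:t])  # the p previous observations, oldest first
        y.append(series[t])        # the value to predict
    return X, y


monthly_sales = [120, 135, 128, 150, 162, 171]  # Jan..Jun, made-up numbers
X, y = make_lagged(monthly_sales, p=3)

# With p = 3, June is predicted from March, April and May.
print(X[-1], "->", y[-1])  # [128, 150, 162] -> 171
```

A larger p gives the model more history per prediction but leaves fewer rows to fit on, since the first p observations have no complete set of lags.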



q - order of the MA term

  The number of lagged forecast errors: how many past forecast errors will you use?
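A minimal sketch of how an MA(q) forecast uses past errors is shown below. The function name `ma_forecasts`, the mean `mu`, and the weights `thetas` are assumptions chosen for illustration; in a fitted model these values are estimated from the data, and the errors before the first observation are simply assumed to be zero here.

```python
def ma_forecasts(series, mu, thetas):
    """One-step-ahead MA(q) forecasts; thetas[0] weights the most recent error."""
    q = len(thetas)
    errors = [0.0] * q  # assume zero forecast errors before the first observation
    forecasts = []
    for y in series:
        recent = reversed(errors[-q:])  # most recent error first
        forecast = mu + sum(th * e for th, e in zip(thetas, recent))
        forecasts.append(forecast)
        errors.append(y - forecast)  # record this step's forecast error
    return forecasts


# MA(1) with made-up parameters: each forecast is mu plus half the last error.
print(ma_forecasts([10, 12, 11], mu=10, thetas=[0.5]))  # [10.0, 10.0, 11.0]
```

In the example the second observation comes in 2 above its forecast, so the third forecast is nudged up by 0.5 * 2 = 1.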



d - order of differencing

  The minimum number of differencing steps needed to make the time series data stationary. Data that is already stationary has d = 0.
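Differencing can be sketched in a few lines. The helper `difference` below is a hypothetical name, and the example series is made up: a series that grows quadratically needs two rounds of differencing before it becomes constant.

```python
def difference(series, d=1):
    """Apply first-order differencing d times."""
    for _ in range(d):
        # subtract each value's predecessor from it
        series = [curr - prev for prev, curr in zip(series, series[1:])]
    return series


squares = [1, 4, 9, 16, 25]
print(difference(squares, 1))  # [3, 5, 7, 9]  still trending upward
print(difference(squares, 2))  # [2, 2, 2]     constant
```

Each round of differencing shortens the series by one value, which is one reason to use the smallest d that works.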

What is stationary?
Time series data is considered stationary if it has:

  1. constant mean
  2. constant variance
  3. autocovariance that does not depend on time

Many real-world time series increase as time progresses, so consecutive segments will not have a constant mean. The graph below shows Nvidia stock prices, an example of non-stationary data: split it into n periods and take the mean of each, and the means will not be the same.
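The segment-mean check can be tried on simulated data. The sketch below uses made-up series rather than real stock prices: pure noise stands in for stationary data, and a random walk with upward drift stands in for a trending price. The helper name `segment_means` is hypothetical.

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible


def segment_means(series, n_segments):
    """Split the series into equal consecutive segments and average each one."""
    size = len(series) // n_segments
    return [sum(series[i * size:(i + 1) * size]) / size for i in range(n_segments)]


# Stationary stand-in: independent noise around zero.
noise = [random.gauss(0, 1) for _ in range(400)]

# Non-stationary stand-in: a random walk with upward drift.
trending = []
level = 0.0
for _ in range(400):
    level += 0.5 + random.gauss(0, 1)
    trending.append(level)

print([round(m, 2) for m in segment_means(noise, 4)])
print([round(m, 2) for m in segment_means(trending, 4)])
```

The noise segments all average close to zero, while the means of the trending series drift far apart from one segment to the next.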

Stationarity is important because models such as ARIMA need the time series data to be stationary before they can forecast the future.
Often the data is non-stationary, so we difference it: subtract the previous value from the current value.

Since it is important to have stationary time series data, we need a way to test for it.
Common methods for testing whether time series data is stationary include the Augmented Dickey-Fuller (ADF) test.
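To give a feel for what such a test computes, here is a minimal sketch of the basic (non-augmented) Dickey-Fuller statistic in plain Python. It regresses the differenced series on the previous value with a constant and returns the t-statistic of the slope. This is a simplification: the full ADF test also adds lagged difference terms, and in practice a library routine such as `adfuller` from statsmodels would normally be used. The function name, the simulated data, and the approximate 5% critical value of -2.86 (large samples, constant-only case) are assumptions of this sketch.

```python
import math
import random


def dickey_fuller_stat(series):
    """t-statistic of beta in: diff(y)[t] = alpha + beta * y[t-1] + error."""
    x = series[:-1]                                    # y[t-1]
    dy = [b - a for a, b in zip(series, series[1:])]   # y[t] - y[t-1]
    n = len(x)
    x_mean = sum(x) / n
    dy_mean = sum(dy) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    beta = sum((xi - x_mean) * (di - dy_mean) for xi, di in zip(x, dy)) / sxx
    alpha = dy_mean - beta * x_mean
    rss = sum((di - alpha - beta * xi) ** 2 for xi, di in zip(x, dy))
    se_beta = math.sqrt(rss / (n - 2) / sxx)           # standard error of beta
    return beta / se_beta


random.seed(2)
noise = [random.gauss(0, 1) for _ in range(500)]       # stationary stand-in

walk = []                                              # non-stationary stand-in
level = 0.0
for _ in range(500):
    level += random.gauss(0, 1)
    walk.append(level)

CRITICAL_5PCT = -2.86  # approximate; assumption stated in the text above
for name, s in [("noise", noise), ("walk", walk)]:
    stat = dickey_fuller_stat(s)
    verdict = "looks stationary" if stat < CRITICAL_5PCT else "unit root not rejected"
    print(name, round(stat, 2), verdict)
```

A strongly negative statistic is evidence against a unit root, that is, evidence for stationarity. The noise series should fall far below the critical value, whereas a random walk usually does not.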