Haneul Kim's - Stationarity Testing

Stationarity Testing

date posted: 2020-10-19

Time series
ARIMA
References

Time series

Data points observed at specified times, usually at equal intervals, are referred to as time series data. Time series matter in real life because much of the data we collect is recorded in time-consecutive order. Ex: stock prices recorded every second.

Time series analysis is often used to predict the future. For example, we can use the past 12 months of sales data to predict the next n months of sales and act accordingly.

Four components that explain time series data:

Trend: Upward, downward, or flat. If your company's sales increase every year, they show an upward trend.
Seasonality: A pattern that repeats over a fixed period. Ex: the difference between summer and winter. Also includes special holidays.
Irregularity: External factors that affect time series data, such as Covid or natural disasters.
Cyclic: Rises and falls that repeat but, unlike seasonality, not over a fixed period.
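To make the components concrete, here is a minimal sketch that builds a synthetic monthly series by adding a trend, a 12-month seasonal pattern, and random irregular noise. The additive form, the function name `make_series`, and all the numbers are assumptions chosen for illustration; the cyclic component is left out to keep it short.

```python
import math
import random

def make_series(n_months=48, seed=0):
    """Build a synthetic monthly series as trend + seasonality + irregular noise."""
    rng = random.Random(seed)
    # Trend: a steady upward drift of 2 units per month (made-up numbers).
    trend = [100 + 2.0 * t for t in range(n_months)]
    # Seasonality: a pattern that repeats exactly every 12 months.
    seasonal = [10 * math.sin(2 * math.pi * t / 12) for t in range(n_months)]
    # Irregularity: random shocks with no pattern.
    irregular = [rng.gauss(0, 3) for _ in range(n_months)]
    series = [tr + s + e for tr, s, e in zip(trend, seasonal, irregular)]
    return trend, seasonal, irregular, series

trend, seasonal, irregular, series = make_series()
```

Plotting `series` next to `trend` and `seasonal` should show the yearly wave riding on top of the upward line.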

ARIMA

Auto Regressive Integrated Moving Average
a.k.a. the Box-Jenkins method.

It is a class of models that forecast using a series' own past values: lag values and lagged forecast errors.
The AR model uses lag values to forecast.
The MA model uses lagged forecast errors to forecast.
Combining the two models on a differenced series gives ARIMA; the "Integrated" part refers to the differencing.
It has three parameters: p, q, and d.
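As a rough sketch of what "using lag values" and "using lagged forecast errors" mean, the two helpers below compute a one-step forecast for an AR model and for an MA model. The function names and the coefficients `phi`, `theta`, `c`, and `mu` are hypothetical values picked by hand; a real model would estimate them from data.

```python
def ar_forecast(history, phi, c=0.0):
    """One-step AR(p) forecast: c + phi[0]*y[t-1] + ... + phi[p-1]*y[t-p]."""
    p = len(phi)
    lags = history[-p:][::-1]  # most recent value first
    return c + sum(coef * y for coef, y in zip(phi, lags))

def ma_forecast(errors, theta, mu=0.0):
    """One-step MA(q) forecast: mu + theta[0]*e[t-1] + ... + theta[q-1]*e[t-q]."""
    q = len(theta)
    lags = errors[-q:][::-1]  # most recent forecast error first
    return mu + sum(coef * e for coef, e in zip(theta, lags))

sales = [100.0, 104.0, 108.0]   # made-up past values
past_errors = [1.0, -2.0]       # made-up past forecast errors

ar_pred = ar_forecast(sales, phi=[0.5, 0.3], c=20.0)        # 20 + 0.5*108 + 0.3*104
ma_pred = ma_forecast(past_errors, theta=[0.4], mu=105.0)   # 105 + 0.4*(-2)
```

Here the AR forecast uses the last two values (p = 2) and the MA forecast uses only the last error (q = 1).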

It is a naive model since it assumes the time series data:

Is "non-seasonal", meaning different seasons do not affect its values. When seasonality exists, we use SARIMA, short for Seasonal ARIMA.
Has no irregularity.

Parameters

p - order of AR term

Number of lags of Y to be used as predictors. In other words, if you are trying to predict June's sales, how many previous (lagged) months of data are you going to use?
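One way to picture p is as the width of a table of lagged predictors. The sketch below builds that table for p = 3; the helper name `make_lagged` and the sales figures are made up for illustration.

```python
def make_lagged(series, p):
    """Return (X, y) where each row of X holds the p values preceding the target in y."""
    X, y = [], []
    for t in range(p, len(series)):
        X.append(series[t - p:t])  # the p previous observations, oldest first
        y.append(series[t])        # the value to predict
    return X, y

monthly_sales = [10, 12, 13, 15, 18, 21]  # Jan..Jun, made-up numbers
X, y = make_lagged(monthly_sales, p=3)
# The last row predicts June (21) from March, April, and May.
```

A larger p gives each prediction more history to work with but leaves fewer rows to fit on.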

q - order of MA term

Number of lagged forecast errors to use, i.e. how many past forecast errors will you use?

d - order of differencing

Minimum number of differencing passes needed to make the time series data stationary. Data that is already stationary has d = 0.
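Differencing means subtracting the previous value from the current one, and d counts how many times that is repeated. The sketch below uses a hypothetical helper `difference` on two toy series: a straight line flattens after one pass, while a quadratic needs two.

```python
def difference(series, d=1):
    """Apply first differencing d times: y[t] - y[t-1] at each pass."""
    out = list(series)
    for _ in range(d):
        out = [curr - prev for prev, curr in zip(out, out[1:])]
    return out

linear = [3 * t + 5 for t in range(6)]   # 5, 8, 11, 14, 17, 20
quadratic = [t * t for t in range(6)]    # 0, 1, 4, 9, 16, 25

print(difference(linear, d=1))     # [3, 3, 3, 3, 3]
print(difference(quadratic, d=1))  # [1, 3, 5, 7, 9]
print(difference(quadratic, d=2))  # [2, 2, 2, 2]
```

Each pass shortens the series by one value, which is one reason to keep d as small as possible.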

What is stationary?
Time series data is considered stationary if it has:

A constant mean
A constant variance
An autocovariance that does not depend on time, only on the lag between observations
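Written as formulas, the three conditions look roughly like this, for every time t and every lag k. The symbols μ, σ², and γ_k are notation introduced here, not something from the post itself.

```latex
\begin{aligned}
\mathbb{E}[Y_t] &= \mu \\
\operatorname{Var}(Y_t) &= \sigma^2 \\
\operatorname{Cov}(Y_t, Y_{t+k}) &= \gamma_k
\end{aligned}
```

None of the right-hand sides contain t, which is exactly what "does not depend on time" means.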

Many real-world time series increase as time progresses, so consecutive segments will not share a constant mean. The graph below shows Nvidia stock prices, an example of non-stationary data: segment it into n periods and take the mean of each, and the means won't be the same.
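The segment-and-compare idea can be tried on synthetic data. This is a minimal sketch, not the Nvidia data: it compares pure random noise with the same noise plus an upward drift, and the helper name `segment_means`, the seed, and the drift of 0.05 per step are all assumptions.

```python
import random
import statistics

def segment_means(series, n_segments):
    """Split the series into n consecutive equal segments and return each mean.

    Any leftover values at the end that do not fill a segment are ignored.
    """
    size = len(series) // n_segments
    return [statistics.mean(series[i * size:(i + 1) * size]) for i in range(n_segments)]

rng = random.Random(42)
noise = [rng.gauss(0, 1) for _ in range(400)]            # no trend, roughly stationary
trending = [0.05 * t + e for t, e in enumerate(noise)]   # same noise plus upward drift

noise_means = segment_means(noise, 4)
trend_means = segment_means(trending, 4)
print(noise_means)
print(trend_means)
```

The means of the noise segments stay close to each other, while the means of the trending segments climb from one segment to the next.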

Stationarity is important since we need our time series data to be stationary before using models to forecast the future.
Often the data is non-stationary, so we difference it: subtract the previous value from the current value.

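Since it is important to have stationary time series data, we need a way to test for it.
A common method of testing whether time series data is stationary is the Augmented Dickey-Fuller (ADF) test. Its null hypothesis is that the series has a unit root, i.e. is non-stationary, so a small p-value is evidence of stationarity.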
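To show the idea behind the test, here is a rough sketch of the plain Dickey-Fuller regression, which is the ADF test without the extra lagged-difference terms that make it "augmented". It regresses the change in the series on the previous level plus a constant and returns the t-statistic of the slope. The function name is hypothetical, and -2.86 is the usual asymptotic 5% critical value for the constant-only case. In practice a library routine such as statsmodels' `adfuller` would normally be used instead of hand-rolled code like this.

```python
import math
import random

def dickey_fuller_stat(series):
    """t-statistic of gamma in: y[t] - y[t-1] = alpha + gamma * y[t-1] + error."""
    x = series[:-1]                                   # previous level, y[t-1]
    dy = [b - a for a, b in zip(series, series[1:])]  # change, y[t] - y[t-1]
    n = len(x)
    x_bar = sum(x) / n
    dy_bar = sum(dy) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (di - dy_bar) for xi, di in zip(x, dy))
    gamma = sxy / sxx                                 # OLS slope
    alpha = dy_bar - gamma * x_bar                    # OLS intercept
    residuals = [di - alpha - gamma * xi for xi, di in zip(x, dy)]
    sigma2 = sum(r * r for r in residuals) / (n - 2)  # residual variance
    return gamma / math.sqrt(sigma2 / sxx)            # slope / standard error

CRITICAL_5PCT = -2.86  # asymptotic 5% critical value, regression with a constant

rng = random.Random(1)
white_noise = [rng.gauss(0, 1) for _ in range(500)]   # stationary by construction
random_walk = []
level = 0.0
for shock in white_noise:
    level += shock                                    # cumulative sum has a unit root
    random_walk.append(level)

for name, data in [("white noise", white_noise), ("random walk", random_walk)]:
    stat = dickey_fuller_stat(data)
    verdict = "looks stationary" if stat < CRITICAL_5PCT else "cannot reject a unit root"
    print(f"{name}: statistic={stat:.2f} -> {verdict}")
```

A statistic far below the critical value rejects the unit root, which is what the white noise series should produce. A random walk will typically fail to reject, although any single simulated walk can land on the wrong side by chance.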
