date posted: 2020-10-19
Data points observed at specified times, usually at equal intervals, are referred to as time series data. Time series matter in practice because much real-world data is recorded in time-consecutive order. For example, stock prices may be recorded every second.
Time series analysis is used to predict the future. For example, we can use the past 12 months of sales data to predict sales for the next n months, so that we can act accordingly.
Four components that explain time series data:
Autoregressive Integrated Moving Average (ARIMA)
Also known as the Box-Jenkins method.
It is a naive model since it assumes time series data are:
p: the number of lags of Y to be used as predictors. In other words, if you are trying to predict June's sales, how many previous (lagged) months of data are you going to use?
q: the number of lagged forecast errors, i.e. how many past forecast errors will you use?
d: the minimum number of differencing steps needed to make the time series data stationary. Data that is already stationary has d = 0.
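To make the idea of "lags of Y" concrete, here is a minimal sketch that pairs each observation with the previous p values that would serve as its predictors. The function name `lagged_rows` and the sales figures are made up for illustration; this only builds the predictor rows and does not fit an ARIMA model.

```python
def lagged_rows(series, p):
    """Pair each value y[t] with its p previous values [y[t-1], ..., y[t-p]]."""
    return [(series[t - p:t][::-1], series[t]) for t in range(p, len(series))]


# Hypothetical monthly sales, January through June (made-up numbers).
sales = [100, 110, 120, 130, 140, 150]

# With p = 2, June (150) is predicted from May (140) and April (130).
for predictors, target in lagged_rows(sales, p=2):
    print(predictors, "->", target)
```

The first p observations produce no row because they do not have enough history, which is why a larger p leaves fewer rows to learn from.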
What is stationary?
Time series data is considered stationary if it contains:
In many cases time series data increase as time progresses, so consecutive segments will not have a constant mean. The graph below shows Nvidia stock prices, which is an example of non-stationary data: split it into n periods and take the mean of each, and the means will not be the same.
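The segment-mean check described above can be sketched in a few lines. The prices here are made-up numbers with an upward trend, not real Nvidia data, and `segment_means` is a hypothetical helper name.

```python
from statistics import mean


def segment_means(series, n_segments):
    """Split the series into consecutive equal-sized chunks and average each.

    Any leftover values at the end that do not fill a whole chunk are ignored.
    """
    size = len(series) // n_segments
    return [mean(series[i * size:(i + 1) * size]) for i in range(n_segments)]


# Hypothetical upward-trending prices (illustrative only).
prices = [10, 12, 11, 14, 16, 15, 19, 21, 20, 24, 26, 25]

print(segment_means(prices, 3))  # [11.75, 17.75, 23.75]
```

The means climb from one segment to the next, which is the informal sign of a non-constant mean. This is a quick eyeball check, not a statistical test.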
Stationarity is important because the time series data needs to be stationary before we use models to forecast the future.
Often the data is non-stationary, so we difference it: we subtract the previous value from the current value.
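As a small illustration of differencing, the sketch below subtracts each previous value from the current one, optionally repeating the step d times. The helper name `difference` and the sample series are assumptions made for this example.

```python
def difference(series, d=1):
    """Apply first-order differencing d times: y[t] - y[t-1]."""
    for _ in range(d):
        series = [curr - prev for prev, curr in zip(series, series[1:])]
    return series


# A linear trend becomes constant after one round of differencing.
trend = [3, 5, 7, 9, 11]
print(difference(trend))  # [2, 2, 2, 2]

# A quadratic trend needs two rounds (d = 2).
quadratic = [1, 4, 9, 16, 25]
print(difference(quadratic, d=2))  # [2, 2, 2]
```

Each round of differencing shortens the series by one value, so d should be kept as small as possible.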
Since it is important to have stationary time series data, we need a way to test for it.
Common methods of testing whether time series data is stationary are:
Augmented Dickey-Fuller (ADF) Test
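To give a feel for what the test computes, here is a rough sketch of the plain Dickey-Fuller statistic, which is the special case of ADF with no lagged-difference terms. It regresses the change y[t] - y[t-1] on a constant and y[t-1], then returns the t-statistic of the slope. The function name `dickey_fuller_stat` and the simulated data are assumptions for illustration. The threshold of about -2.86 is the approximate asymptotic 5% critical value for the regression with a constant. In practice a library routine such as `adfuller` in statsmodels would normally be used instead of hand-rolled code.

```python
import math
import random
from itertools import accumulate


def dickey_fuller_stat(series):
    """t-statistic of gamma in: y[t] - y[t-1] = alpha + gamma * y[t-1] + error."""
    x = series[:-1]                                    # y[t-1]
    dy = [b - a for a, b in zip(series, series[1:])]   # y[t] - y[t-1]
    m = len(x)
    x_bar = sum(x) / m
    dy_bar = sum(dy) / m
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (di - dy_bar) for xi, di in zip(x, dy))
    gamma = sxy / sxx                                  # OLS slope
    alpha = dy_bar - gamma * x_bar                     # OLS intercept
    ssr = sum((di - alpha - gamma * xi) ** 2 for xi, di in zip(x, dy))
    std_err = math.sqrt(ssr / (m - 2) / sxx)           # standard error of gamma
    return gamma / std_err


rng = random.Random(0)  # fixed seed so the run is repeatable

# Simulated stationary series: independent noise around zero.
noise = [rng.gauss(0, 1) for _ in range(200)]

# Simulated non-stationary series: a random walk (running sum of shocks).
walk = list(accumulate(rng.gauss(0, 1) for _ in range(200)))

noise_stat = dickey_fuller_stat(noise)
walk_stat = dickey_fuller_stat(walk)

# A statistic well below about -2.86 is evidence against a unit root,
# i.e. evidence that the series is stationary.
print("noise:", round(noise_stat, 2))
print("walk: ", round(walk_stat, 2))
```

The noise series should give a strongly negative statistic, while the random walk's statistic typically stays much closer to zero. The full ADF test adds lagged differences to the regression to handle autocorrelated errors.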