Stock Prediction ML App

Never, ever argue with your trading system.

Table of Contents
  1. About The Project
  2. Models

About The Project

Time Series Forecasting App

As anyone could guess, the market is unstable and, more often than not, unpredictable. For several decades researchers have worked with time-series data to predict future values, and the most challenging and potentially lucrative application is predicting a company’s stock price. However, market movements depend on many factors, only some of which can be quantified, such as historical stock data, trading volume, and current prices. Fundamental factors such as a company’s intrinsic value, assets, quarterly performance, recent investments, and strategies also affect traders’ trust in the company and thus the price of its stock. Only a few of these fundamental factors can be incorporated effectively into a mathematical model.

While time series analysis is about understanding the dataset, forecasting is about predicting it. Time series analysis comprises methods for analyzing time series data in order to extract meaningful statistics and other characteristics of the data. Time series forecasting is the use of a model to predict future values based on previously observed values.

Models Used

Time series models are used to forecast events based on verified historical data. Common types include ARIMA, smoothing-based, and moving average models. Not all models yield the same results for the same dataset, so it is critical to determine which one works best for the individual time series.

(back to top)

Models

Simple Exponential Smoothing

Simple exponential smoothing is one of the simplest ways to forecast a time series. The basic idea of this model is to assume that the future will be more or less the same as the (recent) past. Thus, the only pattern that this model learns from the demand history is its level. The model is called exponential smoothing because the weight given to each demand observation decreases exponentially as the observation gets older.

Model

The underlying idea of an exponential smoothing model is that, at each period, the model learns a bit from the most recent demand observation and remembers a bit of the last forecast it made. That last forecast already included a part of the previous demand observation and a part of the previous forecast, and so forth, so it carries everything the model has learned so far from the demand history. The smoothing parameter (or learning rate) alpha determines how much importance is given to the most recent demand observation:

$$f_t = \alpha d_{t-1} + (1 - \alpha) f_{t-1}$$

  • $\alpha$ is a ratio (or a percentage) of how much importance the model allocates to the most recent observation compared to the importance of the demand history.
  • $\alpha d_{t-1}$ represents the previous demand observation times the learning rate. You could say that the model attaches a certain weight ($\alpha$) to the last demand occurrence.
  • $(1-\alpha) f_{t-1}$ represents how much the model remembers from its previous forecast. Note that this is where the recursion happens, as $f_{t-1}$ was itself defined from $d_{t-2}$ and $f_{t-2}$.
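The recursion described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the app's actual code: the function name `simple_exp_smoothing` is hypothetical, and initialising the first forecast to the first observation is an assumption (other initialisations are common).

```python
def simple_exp_smoothing(demand, alpha):
    """Return forecasts f[0..len(demand)]; the last entry is the next-period forecast.

    Recursion: f[t] = alpha * d[t-1] + (1 - alpha) * f[t-1]
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    if not demand:
        raise ValueError("demand history must not be empty")
    # Assumption: the first forecast is set to the first observation.
    forecasts = [demand[0]]
    for d_prev in demand:
        # Blend the latest observation with the previous forecast.
        forecasts.append(alpha * d_prev + (1 - alpha) * forecasts[-1])
    return forecasts


if __name__ == "__main__":
    print(simple_exp_smoothing([10, 12, 14], alpha=0.5))
```

A larger alpha makes the forecast react faster to recent observations; a smaller alpha gives a smoother forecast that leans more on history.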

(back to top)

Double Exponential Smoothing

This method is also known as Holt’s method, after Charles C. Holt and his paper from 1957.

Model

It’s called double exponential smoothing because it’s based on two smoothing parameters — Alpha (for level) and Beta (for trend). The algorithm solves the primary issue of simple exponential smoothing, as now the forecasts can account for the trend in historical data. Speaking of trend, it can be either additive or multiplicative:

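  • Additive trend: the level changes by a roughly constant amount each period, so the series grows linearly over time.
  • Multiplicative trend: the level changes by a roughly constant factor each period, so the series does not grow linearly and shows a curvature, even if only a slight one.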

  • l(t) is level at time t.
  • x(t) is data value at time t.
  • b(t) is trend at time t.
  • n represents the number of time steps into the future.
  • Alpha and Beta are the smoothing parameters. Alpha is weight for the level and Beta is weight for the trend.
  • ŷ(t+n) is n-step-ahead forecast, at time t.
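Using the symbols listed above, the additive-trend version of Holt's method is commonly written as a level update, a trend update, and a forecast equation. The sketch below follows that common textbook form; it is not taken from this project's code. The function name `holt_linear` and the initialisation (level from the first value, trend from the first difference) are assumptions made for illustration.

```python
def holt_linear(x, alpha, beta, n_ahead=1):
    """Additive-trend double exponential smoothing; returns forecasts for 1..n_ahead steps."""
    if len(x) < 2:
        raise ValueError("need at least two observations")
    # Assumption: start the level at x(0) and the trend at x(1) - x(0).
    level, trend = x[0], x[1] - x[0]
    for value in x[1:]:
        prev_level = level
        # l(t) = alpha * x(t) + (1 - alpha) * (l(t-1) + b(t-1))
        level = alpha * value + (1 - alpha) * (prev_level + trend)
        # b(t) = beta * (l(t) - l(t-1)) + (1 - beta) * b(t-1)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    # y_hat(t+n) = l(t) + n * b(t)
    return [level + n * trend for n in range(1, n_ahead + 1)]


if __name__ == "__main__":
    print(holt_linear([1, 2, 3, 4, 5], alpha=0.5, beta=0.5, n_ahead=2))
```

On a perfectly linear series the level and trend track the data exactly, so the forecasts simply continue the line.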

(back to top)

Triple Exponential Smoothing

Three years later (1960), Peter R. Winters and Charles C. Holt extended the original Holt’s method to account for seasonality. The algorithm was named after both of them: the Holt-Winters method.

Model

Triple exponential smoothing is used to handle time series data containing a seasonal component. Yet another parameter, Gamma, was added to account for the seasonal component. Just like the trend, the seasonality can be additive or multiplicative.

  • l(t) is level at time t.
  • x(t) is data value at time t.
  • b(t) is trend at time t.
  • c(t) is seasonality at time t.
  • n represents the number of time steps into the future.
  • Alpha (α), Beta (β) and Gamma (γ) are the smoothing parameters, each strictly between 0 and 1. Alpha is the weight for the level (data smoothing factor), Beta is the weight for the trend (trend smoothing factor), and Gamma is the weight for the seasonality (seasonal change smoothing factor).
  • ŷ(t+n) is n-step-ahead forecast, at time t.
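As a rough sketch of how the three components interact, the code below implements one common additive variant of the Holt-Winters recursions. Several details are assumptions rather than something stated in this README: the function name `holt_winters_additive`, the season length parameter `m`, the exact form of the seasonal update, and the simple initialisation from the first two seasons.

```python
def holt_winters_additive(x, m, alpha, beta, gamma, n_ahead=1):
    """Additive Holt-Winters with season length m; returns forecasts for 1..n_ahead steps."""
    if len(x) < 2 * m:
        raise ValueError("need at least two full seasons of data")
    first, second = x[:m], x[m:2 * m]
    # Assumption: level = mean of season 1, trend = per-step change between season means.
    level = sum(first) / m
    trend = (sum(second) - sum(first)) / (m * m)
    # Assumption: initial seasonal terms are deviations of season 1 from its mean.
    seasonal = [value - level for value in first]
    for t in range(m, len(x)):
        prev_level = level
        s = seasonal[t % m]  # c(t - m): last estimate for this position in the season
        # l(t) = alpha * (x(t) - c(t-m)) + (1 - alpha) * (l(t-1) + b(t-1))
        level = alpha * (x[t] - s) + (1 - alpha) * (prev_level + trend)
        # b(t) = beta * (l(t) - l(t-1)) + (1 - beta) * b(t-1)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        # c(t) = gamma * (x(t) - l(t)) + (1 - gamma) * c(t-m)
        seasonal[t % m] = gamma * (x[t] - level) + (1 - gamma) * s
    last = len(x) - 1
    # y_hat(t+n) = l(t) + n * b(t) + most recent seasonal term for that position
    return [level + n * trend + seasonal[(last + n) % m]
            for n in range(1, n_ahead + 1)]


if __name__ == "__main__":
    data = [10, 20, 30, 10, 20, 30, 10, 20, 30]
    print(holt_winters_additive(data, m=3, alpha=0.5, beta=0.5, gamma=0.5, n_ahead=3))
```

For a multiplicative seasonal component, the subtractions of the seasonal term would become divisions and the final addition a multiplication.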

(back to top)

Auto Regressive Model

In a multiple regression model, we forecast the variable of interest using a linear combination of predictors. In an autoregression model, we forecast the variable of interest using a linear combination of past values of the variable. The term autoregression indicates that it is a regression of the variable against itself.

Model

Autoregressive models are remarkably flexible at handling a wide range of different time series patterns. We normally restrict autoregressive models to stationary data, in which case some constraints on the values of the parameters are required.

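$$y(t) = c + \phi_1 y(t-1) + \phi_2 y(t-2) + \dots + \phi_p y(t-p) + \varepsilon(t)$$

  • ε(t) is white noise, c is a constant, and φ₁, …, φ_p are the model coefficients. This is like a multiple regression but with lagged values of y(t) as predictors. We refer to this as an AR(p) model, an autoregressive model of order p.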
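To make the "regression on its own lags" idea concrete, here is a small sketch that produces iterated forecasts from an AR(p) model whose constant and coefficients are assumed to be already known. The function name `ar_forecast` and its parameters are hypothetical; in practice the coefficients would first be estimated from data.

```python
def ar_forecast(history, const, phi, steps=1):
    """Iterated forecasts from y(t) = const + sum_i phi[i-1] * y(t-i) + eps(t).

    Future white-noise terms are replaced by their expected value, zero.
    """
    p = len(phi)
    if len(history) < p:
        raise ValueError("history must contain at least p observations")
    values = list(history)
    for _ in range(steps):
        # Most recent value first: y(t-1), y(t-2), ..., y(t-p)
        lags = values[-1:-p - 1:-1]
        values.append(const + sum(c * y for c, y in zip(phi, lags)))
    return values[len(history):]


if __name__ == "__main__":
    print(ar_forecast([1.0, 2.0], const=1.0, phi=[0.5, 0.25], steps=2))
```

Each forecast is fed back in as a lag for the next step, which is why multi-step AR forecasts drift toward the series mean for a stationary model.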

(back to top)

Moving Average Model

Rather than using past values of the forecast variable in a regression, a moving average model uses past forecast errors in a regression-like model.

Model

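It is a time series model that accounts for very short-run autocorrelation. In an MA(q) model, each observation is the series mean plus the current white-noise error and a weighted sum of the previous q errors, so a shock influences the series for only q periods.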
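A moving average model of order q is usually written as the series mean plus the current error and a weighted sum of the last q errors. The sketch below builds a series from a given list of errors and forecasts from the most recent ones; the names `ma_series` and `ma_forecast`, and the assumption that the weights `theta` are already known, are illustrative only.

```python
def ma_series(mu, theta, shocks):
    """Build y(t) = mu + eps(t) + sum_j theta[j-1] * eps(t-j) from a list of errors."""
    q = len(theta)
    series = []
    for t, eps in enumerate(shocks):
        # Errors before the start of the sample are treated as zero.
        past = sum(theta[j] * shocks[t - 1 - j] for j in range(q) if t - 1 - j >= 0)
        series.append(mu + eps + past)
    return series


def ma_forecast(mu, theta, recent_errors, steps=1):
    """Forecast 1..steps ahead; recent_errors[-1] is the latest observed error."""
    q = len(theta)
    forecasts = []
    for h in range(1, steps + 1):
        total = mu
        # theta_j multiplies eps(t+h-j); only j >= h refers to an observed error.
        for j in range(h, q + 1):
            idx = len(recent_errors) - 1 - (j - h)
            if idx >= 0:
                total += theta[j - 1] * recent_errors[idx]
        forecasts.append(total)
    return forecasts


if __name__ == "__main__":
    errors = [1.0, -2.0, 4.0]
    print(ma_series(10.0, [0.5], errors))
    print(ma_forecast(10.0, [0.5], errors, steps=2))
```

Beyond q steps ahead no observed error remains in the equation, so the forecast falls back to the mean.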

(back to top)

ARMA Model

An ARMA model, or Autoregressive Moving Average model, is used to describe weakly stationary stochastic time series in terms of two polynomials. The first of these polynomials is for autoregression, the second for the moving average.

Model

This model is often written as ARMA(p, q), where:

  • p is the order of the autoregressive polynomial.

  • q is the order of the moving average polynomial.
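Combining the two parts, an ARMA(p, q) prediction adds a weighted sum of the last p values to a weighted sum of the last q forecast errors. The following sketch computes in-sample one-step predictions for given coefficients. The name `arma_one_step` is hypothetical, the coefficients are assumed known rather than estimated, and values and errors before the start of the sample are treated as zero.

```python
def arma_one_step(y, const, phi, theta):
    """Return (predictions, residuals) for an ARMA(p, q) model with known coefficients."""
    p, q = len(phi), len(theta)
    preds, resid = [], []
    for t in range(len(y)):
        # Autoregressive part: weighted sum of the previous p observations.
        ar = sum(phi[i] * y[t - 1 - i] for i in range(p) if t - 1 - i >= 0)
        # Moving average part: weighted sum of the previous q forecast errors.
        ma = sum(theta[j] * resid[t - 1 - j] for j in range(q) if t - 1 - j >= 0)
        pred = const + ar + ma
        preds.append(pred)
        # The error at time t becomes an input for later predictions.
        resid.append(y[t] - pred)
    return preds, resid


if __name__ == "__main__":
    predictions, residuals = arma_one_step([1.0, 2.0, 3.0], 0.0, [0.5], [0.5])
    print(predictions, residuals)
```

Setting `theta` to an empty list reduces this to a pure AR(p) model, and setting `phi` to an empty list reduces it to a pure MA(q) model.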

(back to top)

ARIMA Model

ARIMA, short for ‘Auto Regressive Integrated Moving Average’, is a class of models that ‘explains’ a given time series based on its own past values, that is, its own lags and the lagged forecast errors, so that the resulting equation can be used to forecast future values.

Model

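  • Predicted Yt = Constant + Linear combination of lags of Y (up to p lags) + Linear combination of lagged forecast errors (up to q lags)

  • Here Yt is the series after it has been differenced d times to make it stationary; this differencing step is the ‘Integrated’ part of the name.

  • The objective, therefore, is to identify the values of p, d and q.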
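The d in ARIMA(p, d, q) is the number of times the series is differenced before the ARMA part is applied. The sketch below shows differencing and, for the d = 1 case, how forecasts of the differences are turned back into forecasts of the original series. The helper names `difference` and `integrate` are hypothetical, and the forecast differences passed to `integrate` are assumed to come from some fitted model.

```python
def difference(series, d=1):
    """Apply first differencing d times; each pass shortens the series by one value."""
    out = list(series)
    for _ in range(d):
        out = [later - earlier for earlier, later in zip(out, out[1:])]
    return out


def integrate(last_value, diff_forecasts):
    """Undo one round of differencing by accumulating forecast differences."""
    levels, current = [], last_value
    for step in diff_forecasts:
        current += step
        levels.append(current)
    return levels


if __name__ == "__main__":
    series = [1, 4, 9, 16]
    print(difference(series))        # first differences
    print(difference(series, d=2))   # second differences
    print(integrate(series[-1], [9, 11]))
```

A series with a steady trend often becomes roughly stationary after one round of differencing, which is why small values of d are typical.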

(back to top)

Linear Regression

Linear regression analysis is used to predict the value of a variable based on the value of another variable. The variable you want to predict is called the dependent variable. The variable you are using to predict the other variable's value is called the independent variable.

Model

Ordinary least squares (OLS) is a criterion for comparing candidate regression lines. According to OLS, we should choose the regression line that minimizes the sum of the squared differences between the observed and the predicted values of the dependent variable.

We can find the line that best fits the observed data according to the OLS criterion. The general form of the line is:

$$y_i = \beta_0 + \beta_1 x_i + \mu_i$$

  • Here, β₀ is the intercept, β₁ is the slope, and μᵢ is the residual term, that is, the part of yᵢ that cannot be explained by xᵢ.
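For a single independent variable, the OLS line has a well-known closed-form solution: the slope is the covariance of x and y divided by the variance of x, and the intercept makes the line pass through the means. A minimal sketch follows; the function name `ols_fit` is hypothetical.

```python
def ols_fit(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares.

    Returns (intercept, slope, residuals).
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences with at least two points")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residuals: the part of each y that the line does not explain.
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    return intercept, slope, residuals


if __name__ == "__main__":
    print(ols_fit([0, 1, 2, 3], [1, 3, 5, 7]))
```

For time series use, x can simply be the time index, which turns the fit into a linear trend line.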

(back to top)

Random Forest

Random Forest is a popular supervised machine learning algorithm. It is an ensemble learning method that constructs a multitude of decision trees at training time and outputs the mode of the individual trees' predicted classes (classification) or the mean of their predictions (regression). It can therefore be used for both classification and regression problems. It can also be used for time series forecasting, on both univariate and multivariate datasets, by manually creating lag variables and seasonal component variables.
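The manual feature creation mentioned above might look like the sketch below, which turns a univariate series into a supervised learning table of lagged values plus an optional seasonal position. The function name `make_lag_features` and its parameters are hypothetical. The resulting rows and targets could then be passed to any regressor with a fit/predict interface, for example scikit-learn's `RandomForestRegressor`, which is left out here to keep the example dependency-free.

```python
def make_lag_features(series, n_lags, season_length=None):
    """Build (rows, targets) where each row holds the n_lags values before its target.

    If season_length is given, the position within the season is appended
    as one extra feature.
    """
    if n_lags < 1 or len(series) <= n_lags:
        raise ValueError("series must be longer than n_lags, and n_lags >= 1")
    rows, targets = [], []
    for t in range(n_lags, len(series)):
        # Lag 1 first: series[t-1], series[t-2], ..., series[t-n_lags]
        row = [series[t - k] for k in range(1, n_lags + 1)]
        if season_length:
            row.append(t % season_length)
        rows.append(row)
        targets.append(series[t])
    return rows, targets


if __name__ == "__main__":
    X, y = make_lag_features([1, 2, 3, 4, 5], n_lags=2, season_length=2)
    for features, target in zip(X, y):
        print(features, "->", target)
```

When evaluating such a model, the train/test split should respect time order so that no future values leak into the training rows.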

(back to top)