Editor’s note: Nicole Königstein is a speaker for ODSC Europe 2022. Be sure to check out her talk, Dynamic and Context-Dependent Stock Price Prediction Using Attention Modules and News Sentiment, there to learn more about financial time series prediction!
The use of neural networks is relatively new in finance, as the cost of establishing a network previously outweighed the benefits. Traditional econometric modeling is different from, yet complementary to, neural network modeling. In traditional econometrics or statistics, researchers make assumptions about data distributions ahead of the analysis. In contrast, researchers who use neural networks allow the data (and therefore the networks) to decide what fits best. Allowing computers to make decisions in this context has spawned an industry known as “fintech,” which focuses on digital breakthroughs and technology-enabled improvements in the financial sector. These advances within the financial industry are driven by the fact that computers are capable of modeling extremely complicated and multi-dimensional data, and by the ability of machine learning to generalize parametric methods in financial econometrics. For instance, when the data form a time series, as in stock price prediction, neural networks can be made recurrent to incorporate memory into the model. By utilizing these modeling tools, we can relax the assumptions associated with classical time series prediction methods such as ARIMA (Box and Jenkins, 1976) and GARCH (Bollerslev, 1986). In other words, RNNs extend classical time series approaches semi-parametrically or even non-parametrically. Figure 1 provides an overview of how neural networks and deep learning generalize classical econometrics methods.
Figure 1: Overview of how machine learning generalizes econometrics. The image is adapted from Machine Learning in Finance, Dixon et al., 2020.
This recurrent structure enables parameter sharing across a much deeper computational graph. In this way, RNNs can incorporate information on prior observations into the computation they perform on the current feature vector. RNNs are non-linear time series models that extend the capabilities of conventional linear time series models such as AR(p), which are often used for financial modeling. With their recurrent formulation, they provide a powerful approach for predicting financial time series. As with their traditional multivariate autoregressive equivalents, they fit each input sequence with an autoregressive function. Furthermore, RNNs give a flexible functional form for modeling the predictor directly, rather than imposing an autocovariance structure.
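The parameter sharing described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than a production model: the function name `rnn_forecast`, the tanh activation, the zero initial state, and the toy weights are all assumptions made here for the example.

```python
import numpy as np

def rnn_forecast(x, W_h, U_h, b_h, W_y, b_y):
    """One-step-ahead forecast from a plain RNN unrolled over a sequence.

    x has shape (T, d): T time steps of d features. The same weights are
    reused at every step, which is the parameter sharing across the
    unrolled computational graph.
    """
    h = np.zeros(U_h.shape[0])            # initial hidden state (assumed zero)
    for x_t in x:                          # unroll over the T observations
        h = np.tanh(W_h @ x_t + U_h @ h + b_h)
    return W_y @ h + b_y                   # linear read-out of the final state

# Toy univariate example with one hidden unit (illustrative values only).
x = np.array([[1.0], [0.0]])
W_h = np.array([[1.0]])
U_h = np.array([[0.5]])
b_h = np.zeros(1)
W_y = np.array([[1.0]])
b_y = np.zeros(1)

y_hat = rnn_forecast(x, W_h, U_h, b_h, W_y, b_y)
print(y_hat)
```

Although the most recent input is zero, the forecast is not, because the first observation still reaches the output through the hidden state. With a linear activation and a single hidden unit, the same recursion reduces to an autoregressive model with geometrically decaying coefficients.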
Figure 2: A simple method that uses a network module A to process T distinct inputs in sequential order. Each item in the sequence is handled separately from the others. However, this does not acknowledge the data’s sequential nature, because there is no path between one input and the next. Image source: Inside Deep Learning, Edward Raff, 2022.
Figure 3: The prior hidden state and the current input are both inputs to the network A. This allows us to “unroll” the network and distribute data over many time periods. This effectively deals with the data’s sequential nature. Image source: Inside Deep Learning, Edward Raff, 2022.
Long Short-Term Memory, Gated Recurrent Units, and α-RNNs
However, plain RNNs have difficulty learning long-term dynamics. This is partly due to the vanishing and exploding gradient problem (Pascanu et al., 2012), which results from back-propagating the gradients down through the many unfolded layers of the network. Two advanced types of RNN address these issues: the Long Short-Term Memory (LSTM) and the Gated Recurrent Unit (GRU).
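A small numerical sketch makes the gradient problem concrete. It assumes a simplified linear recurrence h_t = U h_{t-1} (no activation, no inputs), so that back-propagating through the unrolled network multiplies the gradient by U once per step. The function name and the matrix values are illustrative only.

```python
import numpy as np

def backprop_gradient_norm(U, steps):
    """Spectral norm of d h_T / d h_0 for the linear recurrence h_t = U @ h_{t-1}.

    Back-propagation through `steps` unrolled layers multiplies by U once
    per layer, so the Jacobian is the matrix power of U.
    """
    jacobian = np.linalg.matrix_power(U, steps)
    return np.linalg.norm(jacobian, ord=2)

U = np.array([[0.9, 0.1],
              [0.0, 0.9]])

horizons = (1, 10, 50)
# Eigenvalues below 1 in magnitude: the gradient vanishes with depth.
shrinking = [backprop_gradient_norm(0.8 * U, t) for t in horizons]
# Eigenvalues above 1 in magnitude: the gradient explodes with depth.
growing = [backprop_gradient_norm(1.5 * U, t) for t in horizons]

for t, s, g in zip(horizons, shrinking, growing):
    print(f"steps={t:2d}  vanishing={s:.3e}  exploding={g:.3e}")
```

In a real RNN the Jacobian also includes the derivative of the activation function at each step, but the same multiplicative effect drives the behavior.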
Figure 4: Block diagram of the “cell” of the LSTM recurrent network. The conventional hidden units of ordinary recurrent networks are replaced with cells that are linked recurrently to each other. A typical artificial neuron unit is used to compute an input feature, and if the sigmoidal input gate allows it, its value can be accumulated into the state. The forget gate controls the weight of a linear self-loop in the state unit. The output gate can be used to turn off the cell’s output. The input unit can have any squashing nonlinearity, but the gating units all have a sigmoid nonlinearity. The state unit can also be used as an additional input to the gating units. The black square represents a delay of a single time step. Image credit: Deep Learning, Goodfellow et al., 2017.
Although they alleviate the vanishing and exploding gradient problem, LSTMs and GRUs do not provide an easily accessible mathematical structure from which to study their time series modeling properties, and they are most likely over-engineered for financial time series forecasting.
This is where a new class of RNNs comes into play, referred to as α-RNNs (Dixon, 2020). The α-RNNs are a generic family of exponentially smoothed RNNs that excel at modeling nonstationary time series data of the kind seen in financial applications. They characterize the time series’ nonlinear partial autocorrelation structure and directly capture dynamic influences such as seasonality and trends. The α-RNN is almost identical to a standard RNN except for the addition of a scalar smoothing parameter, α, which provides the recurrent network with extended memory, that is, autoregressive memory beyond the sequence length. To extend an α-RNN to a dynamic time series model, the α_t-RNN, a time-dependent, convex combination of the hidden state and the exponentially smoothed output is used. This combination means that the model is capable of modeling non-stationary time-series data, as in the following equation:
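A sketch of the α_t-RNN recursion is given below. The notation is assumed here and may differ in indexing from the cited paper: x_t is the input, \hat{h}_t the hidden state, \tilde{h}_t the exponentially smoothed hidden state, \hat{\alpha}_t a smoothing gate with entries in (0, 1), \sigma an activation such as tanh, and \circ the element-wise product.

```latex
\begin{aligned}
\hat{y}_{t+1} &= W_y \hat{h}_t + b_y, \\
\hat{h}_t &= \sigma\left(U_h \tilde{h}_{t-1} + W_h x_t + b_h\right), \\
\hat{\alpha}_t &= \operatorname{sigmoid}\left(U_\alpha \tilde{h}_{t-1} + W_\alpha x_t + b_\alpha\right), \\
\tilde{h}_t &= \hat{\alpha}_t \circ \hat{h}_t + \left(1 - \hat{\alpha}_t\right) \circ \tilde{h}_{t-1}.
\end{aligned}
```

Holding \hat{\alpha}_t fixed at a scalar α gives the static α-RNN, and setting α = 1 removes the smoothing so that the recursion is that of a plain RNN.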
The Relationship of α-RNNs to Long Short-Term Memory and Gated Recurrent Units
Figure 5: Comparison of different recurrent neural networks. Image source: Industrial Forecasting with Exponentially Smoothed Recurrent Neural Networks, Matthew Dixon, 2020.
The α_t-RNN model lacks the ability to completely reset its memory and become a feedforward network (FFN), because the update equation for the hidden variables always depends on the preceding smoothed hidden state. Adding a reset layer, however, “recovers” a gated recurrent unit (GRU) network, as in Equation 1:
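A sketch of the GRU written as a smoothed recursion with an added reset gate follows. The notation is assumed for illustration: \hat{h}_t is the hidden state, \tilde{h}_t the smoothed hidden state, \hat{\alpha}_t the smoothing gate, \hat{r}_t the reset gate, and \circ the element-wise product.

```latex
\begin{aligned}
\hat{r}_t &= \operatorname{sigmoid}\left(U_r \tilde{h}_{t-1} + W_r x_t + b_r\right), \\
\hat{h}_t &= \sigma\left(U_h \left(\hat{r}_t \circ \tilde{h}_{t-1}\right) + W_h x_t + b_h\right), \\
\hat{\alpha}_t &= \operatorname{sigmoid}\left(U_\alpha \tilde{h}_{t-1} + W_\alpha x_t + b_\alpha\right), \\
\tilde{h}_t &= \hat{\alpha}_t \circ \hat{h}_t + \left(1 - \hat{\alpha}_t\right) \circ \tilde{h}_{t-1}.
\end{aligned}
```

With \hat{r}_t = 1 the reset gate has no effect and the hidden state update depends on the full smoothed state, as in the model without a reset layer. With \hat{r}_t = 0 the hidden state update depends on the current input only.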
As with the GRU, the α_t-RNN provides a mechanism for propagating a smoothed hidden state. This yields a long-term memory, which can be overridden so that the network behaves as a memoryless FFN or a plain RNN. More complex models, such as the LSTM, have a separate memory cell, c_t, in addition to the hidden state, h_t, as shown in Equation 2 and Figure 4. LSTMs are more general than exponential smoothing, as they do not require memory updates to be convex combinations.
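The special cases of the smoothed hidden state discussed in this section can be checked numerically. The following is a minimal NumPy sketch, not the implementation from the cited paper: it assumes a constant scalar `alpha` and a constant scalar `reset` instead of learned gates, a tanh activation, and a zero initial state, and the name `smoothed_rnn_states` is hypothetical.

```python
import numpy as np

def smoothed_rnn_states(x, W_h, U_h, b_h, alpha=1.0, reset=1.0):
    """Run an exponentially smoothed RNN cell over a sequence x of shape (T, d).

    alpha and reset are held constant for illustration; in a GRU both
    would be computed at every step by sigmoid gates.
    """
    h_smooth = np.zeros(U_h.shape[0])      # smoothed hidden state (assumed zero start)
    for x_t in x:
        # Hidden state update driven by the (optionally reset) smoothed state.
        h_hat = np.tanh(U_h @ (reset * h_smooth) + W_h @ x_t + b_h)
        # Convex combination of the new hidden state and the old smoothed state.
        h_smooth = alpha * h_hat + (1.0 - alpha) * h_smooth
    return h_hat, h_smooth

rng = np.random.default_rng(1)
x = rng.normal(size=(6, 2))
W_h = rng.normal(scale=0.5, size=(3, 2))
U_h = rng.normal(scale=0.5, size=(3, 3))
b_h = np.zeros(3)

# alpha = 1: no smoothing, so the cell behaves as a plain RNN.
_, plain = smoothed_rnn_states(x, W_h, U_h, b_h, alpha=1.0)

# alpha = 1 and reset = 0: the past is ignored, leaving a feedforward
# map of the current input only.
_, ffn = smoothed_rnn_states(x, W_h, U_h, b_h, alpha=1.0, reset=0.0)

# 0 < alpha < 1: the smoothed state keeps a geometrically decaying
# memory of earlier hidden states.
_, smooth = smoothed_rnn_states(x, W_h, U_h, b_h, alpha=0.3)
```

Keeping `alpha` and `reset` as plain numbers makes the limiting cases easy to see; replacing them with per-step sigmoid gates gives the time-dependent behavior described in the text.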
In June, I will discuss how to use the α-RNN model to predict stock prices with more detailed examples at ODSC Europe. I hope to meet you there.
Additionally, if you want to go even deeper into machine learning in finance, I highly recommend the book Machine Learning in Finance and/or the Artificial Intelligence in Finance Bootcamp, both covering the theory, implementation, and use of various machine learning models in finance and time series prediction.
Box, G. E. P. and Jenkins, G. M., Time Series Analysis: Forecasting and Control. Holden-Day, 1976.
Bollerslev, T., “Generalized autoregressive conditional heteroskedasticity,” Journal of Econometrics, vol. 31, no. 3, pp. 307–327, 1986.
Dixon, M. F., Halperin, I., and Bilokon, P., Machine Learning in Finance: From Theory to Practice. Springer, 2020.
Raff, E., Inside Deep Learning: Math, Algorithms, Models. Manning Publications, 2021.
Pascanu, R., Mikolov, T., and Bengio, Y., “Understanding the exploding gradient problem,” CoRR, vol. abs/1211.5063, 2012. [Online]. Available: http://arxiv.org/abs/1211.5063
Goodfellow, I., Bengio, Y., and Courville, A., Deep Learning. Cambridge, Mass.: The MIT Press, 2017.
Dixon, M., “Industrial forecasting with exponentially smoothed recurrent neural networks,” arXiv preprint arXiv:2004.04717v2, 2020.
About the author/ODSC Europe 2022 speaker on time series prediction: