Types of problems
- Classification
- Regression
- Prediction/Forecasting
- Compression
Example applications
- natural language processing
- speech recognition
- traffic forecasting (see Diffusion Convolutional Recurrent Neural Network (DCRNN))
- electrical grid management
- earthquake prediction
- medicine (EEG, outcomes)
Historically popular models and deep learning architectures
- Autoregressive (AR, MA, ARMA, ARIMA, ARCH, GARCH)
- Shumway, Robert H., and David S. Stoffer. Time Series Analysis and Its Applications: With R Examples. 4th ed. New York: Springer, 2017.
- MLP
- Recurrent neural network (this tutorial)
- with an Attention layer (a minimal sketch follows this list)
- Temporal convolutional network (TCN)
Successors:
- Transformers and their variants (GPT, BERT, BART, Reformer, Longformer, ...)
- Vision transformers
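One item above mentions putting an attention layer on top of an RNN. A minimal sketch of that idea, assuming TensorFlow/Keras and made-up toy shapes (not code from this repo), might look like:

```python
import tensorflow as tf

# Toy shapes: 100 timesteps, 8 features per step (hypothetical).
inputs = tf.keras.Input(shape=(100, 8))
# Keep the full sequence of hidden states so attention can weight them.
h = tf.keras.layers.LSTM(32, return_sequences=True)(inputs)
# Dot-product self-attention over the hidden states (query = value = h).
context = tf.keras.layers.Attention()([h, h])
# Pool the attended sequence and map to a single regression output.
pooled = tf.keras.layers.GlobalAveragePooling1D()(context)
outputs = tf.keras.layers.Dense(1)(pooled)
model = tf.keras.Model(inputs, outputs)
model.summary()
```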
As a reminder, if you are doing this tutorial on ALCF ThetaGPU, be sure to pull the latest updates to this repo. See our previous tutorial's instructions for cloning it if you haven't done so already. From a terminal, run the following commands (assuming this repo is cloned with the default name in your $HOME directory):
ssh username@theta.alcf.anl.gov
cd ai-science-training-series
git pull
You can run the notebooks of this session on ALCF's JupyterHub.
- Log in to a ThetaGPU compute node via JupyterHub (be sure your browser navigates to https://jupyter.alcf.anl.gov/ and does not autocomplete to https://jupyter.alcf.anl.gov/theta/hub/login or another subdomain).
- Change the notebook's kernel to `conda/2021-09-22` (you may need to change kernel each time you open a notebook for the first time):
  - select Kernel in the menu bar
  - select Change kernel...
  - select `conda/2021-09-22` from the drop-down menu
- Open `CAE_LSTM.ipynb`
A standard time series classification tutorial is also included here if you want to try it out after the session: `keras-imdb-rnn.ipynb`
All RNN diagrams are from Christopher Olah's famous 2015 blog post, Understanding LSTM Networks.
(This is technically an Elman RNN, not a Jordan RNN.) Training is done via backpropagation through time (BPTT). Both the forward and backward passes can be slow, since the time dependencies cannot be computed in parallel (a big advantage of transformers).
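To make the sequential nature concrete, here is a minimal NumPy sketch of an Elman RNN forward pass (hypothetical toy shapes, not the exact formulation Keras uses). The loop over time cannot be parallelized because each hidden state depends on the previous one:

```python
import numpy as np

T, d_in, d_h = 20, 8, 16                 # sequence length, input dim, hidden dim (toy values)
rng = np.random.default_rng(0)
x = rng.normal(size=(T, d_in))           # one input sequence
W_xh = 0.1 * rng.normal(size=(d_in, d_h))
W_hh = 0.1 * rng.normal(size=(d_h, d_h))
b = np.zeros(d_h)

h = np.zeros(d_h)
states = []
for t in range(T):                       # inherently sequential: h_t depends on h_{t-1}
    h = np.tanh(x[t] @ W_xh + h @ W_hh + b)
    states.append(h)
```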
- Techniques for handling long or uneven sequences: https://machinelearningmastery.com/handle-long-sequences-long-short-term-memory-recurrent-neural-networks/
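As a concrete illustration of the techniques discussed in that article, here is a minimal sketch (assuming TensorFlow/Keras and made-up toy data) of padding/truncating uneven sequences and splitting one long series into shorter windows:

```python
import numpy as np
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Uneven sequences -> pad (or truncate) everything to a fixed length of 5.
seqs = [[1, 2, 3], [4, 5], [6, 7, 8, 9, 10, 11]]
x = pad_sequences(seqs, maxlen=5, padding="pre", truncating="pre")

# One very long series -> non-overlapping windows of length 50
# (the idea behind truncated backpropagation through time).
series = np.arange(1000, dtype="float32")
window = 50
windows = series[: len(series) // window * window].reshape(-1, window, 1)
print(x.shape, windows.shape)            # (3, 5) (20, 50, 1)
```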
`SimpleRNN` in TF/Keras uses a different formulation; see the source code for `SimpleRNNCell`.
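For reference, a minimal sketch (TensorFlow/Keras, toy shapes) showing that `SimpleRNN` is just `SimpleRNNCell` unrolled over time by the generic `RNN` wrapper:

```python
import tensorflow as tf

cell = tf.keras.layers.SimpleRNNCell(units=16)
layer = tf.keras.layers.RNN(cell, return_sequences=True)   # unrolls the cell over time
x = tf.random.normal((4, 20, 8))        # (batch, timesteps, features) - toy shapes
h = layer(x)
print(h.shape)                          # (4, 20, 16)
```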
Introduced by Hochreiter and Schmidhuber (1997), the LSTM greatly ameliorates the vanishing/exploding gradient problem that simple RNNs suffer from.
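In Keras, swapping the simple RNN for an LSTM is a one-line change. A minimal sketch with assumed toy shapes:

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(100, 8)),          # 100 timesteps, 8 features (toy shapes)
    tf.keras.layers.LSTM(32),                # returns only the final hidden state
    tf.keras.layers.Dense(1),                # e.g. a single regression target
])
model.summary()
```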
Introduced by Cho et al. (2014), the fully gated version (the GRU) is essentially an LSTM minus the output gate and has fewer parameters.
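The parameter savings are easy to check in Keras. A minimal sketch (toy shapes; the counts in the comments assume the GRU's default reset_after=True in TF 2.x):

```python
import tensorflow as tf

lstm = tf.keras.layers.LSTM(32)
gru = tf.keras.layers.GRU(32)
x = tf.random.normal((1, 10, 8))        # (batch, timesteps, features) - toy shapes
lstm(x); gru(x)                         # calling the layers builds their weights
print(lstm.count_params())              # 4 * ((8 + 32) * 32 + 32)     = 5248
print(gru.count_params())               # 3 * ((8 + 32) * 32 + 2 * 32) = 4032
```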
See Kates-Harbeck (2019) for more details.