Using recurrent neural networks to build a transliteration system. The goal of this assignment is threefold: (i) learn how to model sequence-to-sequence learning problems using Recurrent Neural Networks, (ii) compare different cells such as the vanilla RNN, LSTM and GRU, and (iii) understand how attention networks overcome the limitations of vanilla seq2seq models.
Entire code can be found in the following Kaggle Notebook : https://www.kaggle.com/code/adityanandakishore/cs6910-a3-ipynb
Wandb Report link : https://api.wandb.ai/links/berserank/sqi3bf5s
I have built the model using Aksharantar's English-Tamil dataset. The data has been pre-processed to account for variable-length words and unknown characters in the test set. I have implemented padding to handle the former, and the loss was calculated over the padding as well; as seen later in the attention maps, this does not affect the results.
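A minimal sketch of this kind of character-level pre-processing, assuming PyTorch; the special-token names and helper functions below are illustrative and not the exact ones used in the notebook:

```python
import torch

# Illustrative special tokens; the actual names/indices in the notebook may differ
PAD, SOS, EOS, UNK = 0, 1, 2, 3

def build_vocab(words):
    """Map every character seen in the training words to an integer id."""
    chars = sorted({ch for w in words for ch in w})
    return {ch: i + 4 for i, ch in enumerate(chars)}

def encode(word, vocab, max_len):
    """Convert a word into a fixed-length id tensor.
    Unseen characters in the test set fall back to UNK, and the tail is padded."""
    ids = [vocab.get(ch, UNK) for ch in word] + [EOS]
    ids = ids[:max_len] + [PAD] * (max_len - len(ids))
    return torch.tensor(ids, dtype=torch.long)
```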
For evaluation purposes, I have provided a model.py file containing the functions listed below (a hedged usage sketch follows the list). Please note that wandb logging is implemented only in the Kaggle Notebook, not in model.py: you will be able to see the results, but not log them. To log the data, please follow the comments in the Kaggle Notebook.
In case the code in model.py appears too crowded, I have provided comments in the Kaggle version of the code.
- trainIters : This function trains the model with the given hyper-parameters and returns the validation accuracy after training the entire model. Setting Attention = False is not necessary, as it is the default value.
- infer : This function runs the trained model over the test data and, when log = True, stores the predictions in the given file.
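A hedged usage sketch of these two helpers; the keyword arguments shown are illustrative only, since the exact signatures are defined in model.py:

```python
# Hypothetical call pattern -- the argument names are illustrative,
# not the actual signatures from model.py.
from model import trainIters, infer

val_acc = trainIters(cell_type='lstm', hidden_size=512, enc_layers=3,
                     dec_layers=1, epochs=5, teacher_forcing_ratio=0.7)
print('validation accuracy:', val_acc)

# Run the trained model on the test data and write predictions to a file
infer(log=True)
```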
To deal with a different number of layers in the encoder and decoder, I initialise all the hidden states of the decoder with the last layer of the encoder only, as sketched below.
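A minimal sketch of this bridging step in PyTorch, assuming hidden states shaped (num_layers * num_directions, batch, hidden); the variable names are illustrative. For an LSTM the same step would be applied to both the hidden and cell states, and for a bidirectional encoder the two directions of the last layer may first need to be combined.

```python
import torch

def bridge_hidden(encoder_hidden, dec_layers):
    """Initialise every decoder layer with the encoder's last-layer hidden state."""
    last_layer = encoder_hidden[-1]                        # (batch, hidden)
    return last_layer.unsqueeze(0).repeat(dec_layers, 1, 1)

# Example: 3-layer encoder, batch of 32, hidden size 512 -> 1-layer decoder
enc_hidden = torch.zeros(3, 32, 512)
dec_hidden = bridge_hidden(enc_hidden, dec_layers=1)       # (1, 32, 512)
```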
A hyper-parameter sweep was run in the Kaggle notebook over the following hyper-parameters (a hedged sketch of the corresponding sweep configuration follows the list):
- Optimiser: Nadam, Adam
- Teacher forcing ratio: 0.3, 0.5, 0.7
- Encoder Embedding: 128, 256
- Decoder Embedding: 128, 256
- Epochs: 5, 8
- Hidden Size: 128, 256, 512
- Encoder Layers: 2, 3
- Decoder Layers: 1, 2, 3
- Dropout: 0.25, 0.4
- Cell Type: GRU, RNN, LSTM
- Bidirectional: True, False (please note that "null" also indicates "Bidirectional = True", as it is the default value)
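A hedged sketch of what the corresponding sweep configuration could look like with wandb's Python API; the parameter keys, project name, and search method below are assumptions, and the exact configuration lives in the Kaggle notebook:

```python
import wandb

# Illustrative sweep config mirroring the search space listed above;
# the search method ('bayes') and key names are assumptions.
sweep_config = {
    'method': 'bayes',
    'metric': {'name': 'val_accuracy', 'goal': 'maximize'},
    'parameters': {
        'optimiser':             {'values': ['nadam', 'adam']},
        'teacher_forcing_ratio': {'values': [0.3, 0.5, 0.7]},
        'enc_embed':             {'values': [128, 256]},
        'dec_embed':             {'values': [128, 256]},
        'epochs':                {'values': [5, 8]},
        'hidden_size':           {'values': [128, 256, 512]},
        'enc_layers':            {'values': [2, 3]},
        'dec_layers':            {'values': [1, 2, 3]},
        'dropout':               {'values': [0.25, 0.4]},
        'cell_type':             {'values': ['gru', 'rnn', 'lstm']},
        'bidirectional':         {'values': [True, False]},
    },
}

sweep_id = wandb.sweep(sweep_config, project='cs6910-a3')
# wandb.agent(sweep_id, function=train)  # train() would wrap trainIters
```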
Results and comments are attached in the wandb report : https://api.wandb.ai/links/berserank/sqi3bf5s
The best model reported in the above sweep had the following hyperparameters:
- Optimiser: Nadam
- Teacher forcing ratio: 0.7
- Encoder Embedding: 128
- Decoder Embedding: 256
- Epochs: 5
- Hidden Size: 512
- Encoder Layers: 3
- Decoder Layers: 1
- Dropout: 0.4
- Cell Type: Bidirectional LSTM
- Batch Size : 32
- Val Accuracy Reported: 48.14%
The accuracy reported on the test set was 33.40%.
Results and Comments are attached in the wandb report : https://api.wandb.ai/links/berserank/sqi3bf5s
The model was later trained by adding a Bahdanau attention layer to the basic sequence-to-sequence model. Code and results are available in the Kaggle notebook and the Wandb report respectively.
For the sake of evaluation, one can set "Attention Flag = True" and train via model.py. Further code specifications are given below.
I performed the hyper-parameter search again, over a smaller space this time, as the earlier results were satisfactory. The accuracy of the best model with attention is reported in the Wandb report.
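For reference, a minimal sketch of a Bahdanau (additive) attention scoring step in PyTorch; the layer sizes and names are illustrative and not taken from the notebook:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BahdanauAttention(nn.Module):
    """Additive attention: score(s, h_j) = v^T tanh(W_s s + W_h h_j)."""
    def __init__(self, hidden_size):
        super().__init__()
        self.W_s = nn.Linear(hidden_size, hidden_size)   # projects the decoder state
        self.W_h = nn.Linear(hidden_size, hidden_size)   # projects the encoder outputs
        self.v = nn.Linear(hidden_size, 1, bias=False)

    def forward(self, dec_state, enc_outputs):
        # dec_state: (batch, hidden), enc_outputs: (batch, src_len, hidden)
        scores = self.v(torch.tanh(self.W_s(dec_state).unsqueeze(1)
                                   + self.W_h(enc_outputs)))    # (batch, src_len, 1)
        weights = F.softmax(scores.squeeze(-1), dim=1)          # attention map over source
        context = torch.bmm(weights.unsqueeze(1), enc_outputs)  # (batch, 1, hidden)
        return context.squeeze(1), weights
```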
Please use the following arguments to check the code. Long opts are used as given, but I had to choose different letters for some short opts, as the argument parsing in model.py is implemented using getopt.getopt.
I set the default hyper-parameters to the values that gave me the best validation accuracy with attention.
Name | Default Value | Description |
---|---|---|
-e , --enc_embed | 256 | Encoder Embedding Size |
-d , --dec_embed | 256 | Decoder Embedding Size |
-k , --enc_layers | 1 | Encoder Layers |
-l , --dec_layers | 1 | Decoder Layers |
-h , --hidden | 512 | Hidden Unit Dimensions |
-c , --cell_type | lstm | Choices = ['lstm', 'gru', 'rnn'] |
-g , --dropout | 0.4 | Dropout |
-a , --atention | 1 | Attention Flag, Choices = [0, 1] |
-p , --epochs | 5 | Number of Epochs |
-t , --teacher_forcing_ratio | 0.7 | Teacher forcing ratio |
-b , --bidirectional | 1 | Bidirectional Flag, Choices = [0, 1] |
Please use these options to change the hyper-parameters while running model.py. In case you would like to change any other hyper-parameter, please refer to the Kaggle Notebook.
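A hedged sketch of how these options could be parsed with getopt.getopt; the option string and short-to-long mapping below are reconstructed from the table above, and the actual parsing code is in model.py:

```python
import getopt
import sys

# Defaults taken from the table above
defaults = {'enc_embed': 256, 'dec_embed': 256, 'enc_layers': 1, 'dec_layers': 1,
            'hidden': 512, 'cell_type': 'lstm', 'dropout': 0.4, 'atention': 1,
            'epochs': 5, 'teacher_forcing_ratio': 0.7, 'bidirectional': 1}

short_opts = 'e:d:k:l:h:c:g:a:p:t:b:'
long_opts = [k + '=' for k in defaults]
short_to_long = dict(zip('edklhcgaptb', defaults))   # -e -> enc_embed, etc.

opts, _ = getopt.getopt(sys.argv[1:], short_opts, long_opts)
for opt, val in opts:
    key = opt.lstrip('-')
    key = short_to_long.get(key, key)
    defaults[key] = type(defaults[key])(val)   # cast to the default's type

print(defaults)
```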