CS6910-A3

Using recurrent neural networks to build a transliteration system. The goal of this assignment is threefold: (i) learn how to model sequence-to-sequence learning problems using Recurrent Neural Networks, (ii) compare different cells such as vanilla RNN, LSTM and GRU, and (iii) understand how attention networks overcome the limitations of vanilla seq2seq models.

The entire code can be found in the following Kaggle notebook: https://www.kaggle.com/code/adityanandakishore/cs6910-a3-ipynb

Wandb Report link : https://api.wandb.ai/links/berserank/sqi3bf5s

Dataset

I have built the model using Aksharantar's English-Tamil dataset. The data has been pre-processed to account for variable-length words and for unknown characters in the test set. I have implemented padding to handle the former, and the loss was computed over the padded positions too; as the attention maps later show, this does not affect the results.
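For illustration, here is a minimal sketch of what such preprocessing could look like; the special-token indices and the helper name are my own assumptions, not taken from the notebook or model.py.

```python
import torch

# Hypothetical special-token indices; the actual vocabulary in the notebook may differ.
PAD, SOS, EOS, UNK = 0, 1, 2, 3

def encode_word(word, char2idx, max_len):
    """Map characters to indices, falling back to UNK for characters unseen
    during training, then right-pad every sequence to a fixed length so that
    variable-length words can be batched together."""
    ids = [char2idx.get(ch, UNK) for ch in word]
    ids = ids[:max_len - 1] + [EOS]           # truncate if necessary, close with EOS
    ids += [PAD] * (max_len - len(ids))       # right-pad up to max_len
    return torch.tensor(ids, dtype=torch.long)

# Example: 'z' is not in the toy vocabulary, so it maps to UNK; the word is then padded.
char2idx = {'a': 4, 'b': 5, 'c': 6}
print(encode_word('abz', char2idx, max_len=6))   # tensor([4, 5, 3, 2, 0, 0])
```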

Question 2 - Building and Training the Seq2Seq Model

I have implemented the entire code in a Kaggle Notebook. Here is the link for the same:

https://www.kaggle.com/code/adityanandakishore/cs6910-a3-ipynb

For evaluation purposes, I have provided a model.py file that can help with the same. model.py contains the following functions (a usage sketch follows the list). Please note that I have implemented wandb only in the Kaggle Notebook and not in model.py, so you will be able to see the results but not log them. For logging the data, please follow the comments in the Kaggle Notebook.

In case the code in model.py appears too crowded, I have provided comments in the Kaggle version of the code.

  • trainIters : Trains the model with the given hyper-parameters. Setting Attention = False is not necessary, as it is the default value. After training the entire model, this function returns the validation accuracy.
  • infer : Runs the trained model over the test data and stores the predictions in the given file (when log = True).
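For example, a hypothetical invocation could look as follows; the keyword names are my own guesses based on the sweep parameters, and the exact signatures are defined in model.py.

```python
# Hypothetical usage; the actual signatures are defined in model.py and may differ.
from model import trainIters, infer

val_acc = trainIters(
    cell_type='lstm', hidden_size=512,
    enc_embed=128, dec_embed=256,
    enc_layers=3, dec_layers=1,
    dropout=0.4, teacher_forcing_ratio=0.7,
    epochs=5, bidirectional=True,
    Attention=False,          # default value; can be omitted
)
print(f'Validation accuracy: {val_acc:.2f}%')

infer(log=True)               # store test-set predictions in the given file
```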

To deal with a different number of layers in the encoder and decoder, I initialise all the hidden states of the decoder from the last layer of the encoder only.
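A minimal sketch of that initialisation, assuming PyTorch-style hidden-state tensors of shape (num_layers, batch, hidden_size); this is my own illustration, not the exact code from the notebook.

```python
import torch

def init_decoder_hidden(encoder_hidden, dec_layers):
    """Broadcast the encoder's last-layer hidden state to every decoder layer.

    encoder_hidden: (enc_layers, batch, hidden_size)
    returns:        (dec_layers, batch, hidden_size)
    """
    last_layer = encoder_hidden[-1:]              # keep the layer dim: (1, batch, hidden)
    return last_layer.repeat(dec_layers, 1, 1)    # copy it for every decoder layer

# Example: 3 encoder layers feeding a 2-layer decoder, batch of 4, hidden size 512
enc_hidden = torch.randn(3, 4, 512)
print(init_decoder_hidden(enc_hidden, dec_layers=2).shape)   # torch.Size([2, 4, 512])
```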

A hyper-parameter sweep was run in the Kaggle notebook over the following hyper-parameters (a sketch of the sweep configuration follows the list):

  • Optimiser: Nadam, Adam
  • Teacher forcing ratio: 0.3, 0.5, 0.7
  • Encoder Embedding: 128, 256
  • Decoder Embedding: 128, 256
  • Epochs: 5, 8
  • Hidden Size: 128, 256, 512
  • Encoder Layers: 2, 3
  • Decoder Layers: 1, 2, 3
  • Dropout: 0.25, 0.4
  • Cell Type: GRU, RNN, LSTM
  • Bidirectional: True, False (please note that "null" also indicates "Bidirectional = True", as it is the default value)
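The search space above could be expressed as a wandb sweep configuration roughly like the one below; the parameter key names and the search method are my own assumptions, and the actual sweep is defined in the Kaggle notebook.

```python
import wandb

# Sketch of a sweep over the values listed above; key names are assumptions.
sweep_config = {
    'method': 'bayes',   # assumption; could equally be 'grid' or 'random'
    'metric': {'name': 'val_accuracy', 'goal': 'maximize'},
    'parameters': {
        'optimiser':             {'values': ['nadam', 'adam']},
        'teacher_forcing_ratio': {'values': [0.3, 0.5, 0.7]},
        'enc_embed':             {'values': [128, 256]},
        'dec_embed':             {'values': [128, 256]},
        'epochs':                {'values': [5, 8]},
        'hidden_size':           {'values': [128, 256, 512]},
        'enc_layers':            {'values': [2, 3]},
        'dec_layers':            {'values': [1, 2, 3]},
        'dropout':               {'values': [0.25, 0.4]},
        'cell_type':             {'values': ['gru', 'rnn', 'lstm']},
        'bidirectional':         {'values': [True, False]},
    },
}

sweep_id = wandb.sweep(sweep_config, project='cs6910-a3')
# wandb.agent(sweep_id, function=train)   # train() would call trainIters with wandb.config
```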

Results and comments are attached in the wandb report : https://api.wandb.ai/links/berserank/sqi3bf5s

Question 4- Results on Test Data

The best model reported in the above sweep had the following hyperparameters:

  • Optimiser: Nadam
  • Teacher forcing ratio: 0.7
  • Encoder Embedding: 128
  • Decoder Embedding: 256
  • Epochs: 5
  • Hidden Size: 512
  • Encoder Layers: 3
  • Decoder Layers: 1
  • Dropout: 0.4
  • Cell Type: Bidirectional LSTM
  • Batch Size : 32
  • Validation accuracy reported: 48.14%

The accuracy reported on the test set was 33.40%.

Results and Comments are attached in the wandb report : https://api.wandb.ai/links/berserank/sqi3bf5s

Question 5 - Building and Training the Seq2Seq with Attention Model

The model was later trained by adding a Bahdanau attention layer to the basic sequence-to-sequence model. Code and results are available in the Kaggle notebook and the wandb report respectively.
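For context, here is a minimal sketch of a Bahdanau (additive) attention layer in PyTorch; it illustrates the idea only and is not the exact layer used in the notebook.

```python
import torch
import torch.nn as nn

class BahdanauAttention(nn.Module):
    """Additive attention: score(s, h) = v^T tanh(W_s s + W_h h)."""
    def __init__(self, hidden_size):
        super().__init__()
        self.W_s = nn.Linear(hidden_size, hidden_size)   # projects the decoder state
        self.W_h = nn.Linear(hidden_size, hidden_size)   # projects the encoder outputs
        self.v = nn.Linear(hidden_size, 1)

    def forward(self, dec_state, enc_outputs):
        # dec_state: (batch, hidden), enc_outputs: (batch, src_len, hidden)
        scores = self.v(torch.tanh(
            self.W_s(dec_state).unsqueeze(1) + self.W_h(enc_outputs)
        ))                                               # (batch, src_len, 1)
        weights = torch.softmax(scores, dim=1)           # attention over source positions
        context = (weights * enc_outputs).sum(dim=1)     # (batch, hidden)
        return context, weights.squeeze(-1)
```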

For the sake of evaluation, one can enable the attention flag (the -a / --atention argument) and train with model.py. Further code specifications are given below.

I performed a hyper-parameter search again, over a smaller space this time, as the results were satisfactory. The best model with attention reported an accuracy of

Code Specifications

Please use the following arguments to check the code. The long opts are used as given, but I had to choose different letters for some short opts, since the argument parsing is implemented with getopt.getopt (a parsing sketch follows the table).

I set the default hyper-parameters to the values that gave me the best validation accuracy with attention.

Arguments

Name                          Default   Description
-e, --enc_embed               256       Encoder embedding size
-d, --dec_embed               256       Decoder embedding size
-k, --enc_layers              1         Encoder layers
-l, --dec_layers              1         Decoder layers
-h, --hidden                  512       Hidden unit dimensions
-c, --cell_type               lstm      Choices = ['lstm', 'gru', 'rnn']
-g, --dropout                 0.4       Dropout
-a, --atention                1         Attention flag, choices = [0, 1]
-p, --epochs                  5         Number of epochs
-t, --teacher_forcing_ratio   0.7       Teacher forcing ratio
-b, --bidirectional           1         Bidirectional flag, choices = [0, 1]
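As an illustration, parsing these options with getopt.getopt could look roughly like the sketch below; it is not the exact code in model.py, only a self-contained example built from the table above.

```python
import sys
import getopt

# Defaults matching the table above.
config = {
    'enc_embed': 256, 'dec_embed': 256,
    'enc_layers': 1, 'dec_layers': 1,
    'hidden': 512, 'cell_type': 'lstm',
    'dropout': 0.4, 'atention': 1,
    'epochs': 5, 'teacher_forcing_ratio': 0.7,
    'bidirectional': 1,
}

# Every option takes a value, hence the trailing ':' / '=' markers.
opts, _ = getopt.getopt(
    sys.argv[1:],
    'e:d:k:l:h:c:g:a:p:t:b:',
    ['enc_embed=', 'dec_embed=', 'enc_layers=', 'dec_layers=', 'hidden=',
     'cell_type=', 'dropout=', 'atention=', 'epochs=',
     'teacher_forcing_ratio=', 'bidirectional='],
)

short_to_long = {'-e': 'enc_embed', '-d': 'dec_embed', '-k': 'enc_layers',
                 '-l': 'dec_layers', '-h': 'hidden', '-c': 'cell_type',
                 '-g': 'dropout', '-a': 'atention', '-p': 'epochs',
                 '-t': 'teacher_forcing_ratio', '-b': 'bidirectional'}

for opt, val in opts:
    key = short_to_long.get(opt, opt.lstrip('-'))
    # Keep strings for cell_type, cast everything else to its default's type.
    config[key] = val if key == 'cell_type' else type(config[key])(val)

print(config)   # e.g. python model.py -c gru -p 8 --teacher_forcing_ratio 0.5
```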

Please use these arguments to change the hyper-parameters while running model.py. In case you would like to change any other hyper-parameter, please refer to the Kaggle Notebook.
