A Not-So-Large-Yet Language Model.
This is a decoder-only generative transformer, written from scratch.
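"Decoder-only" means every position can attend only to itself and earlier positions, so the model can generate text one token at a time. The sketch below illustrates that idea with single-head causal self-attention. It uses no dependencies and assumes plain Python lists for the query, key and value vectors. It is not the repo's implementation, which this README doesn't show.

```python
import math


def causal_self_attention(q, k, v):
    """Single-head scaled dot-product attention with a causal mask.

    q, k, v: lists of T vectors (lists of floats), one per position.
    Position i may only attend to positions 0..i, which is what makes a
    decoder-only model autoregressive.
    """
    d = len(q[0])
    out = []
    for i in range(len(q)):
        # Scores against the current and earlier positions only (the causal mask).
        scores = [
            sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(d)
            for j in range(i + 1)
        ]
        # Numerically stable softmax over the allowed positions.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Output is the attention-weighted sum of the visible value vectors.
        out.append(
            [sum(w * v[j][c] for j, w in enumerate(weights)) for c in range(len(v[0]))]
        )
    return out
```

Future positions never affect earlier outputs. The first position can only see itself, so its output is exactly its own value vector.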
To run:
- Create a GPU-enabled environment (env/venv).
- Clone this repo.
- Install the dependencies: pip install -r requirements.txt
- See run.py for general training and sampling examples.
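The interface of run.py isn't reproduced here. As a rough illustration only, the following is a generic sketch of the autoregressive sampling loop that a generative model typically uses. `next_token_logits` is a hypothetical stand-in for the model's forward pass. Temperature scaling is also an assumption and may not match what run.py does.

```python
import math
import random


def sample(next_token_logits, prompt_ids, max_new_tokens, temperature=1.0, rng=None):
    """Autoregressive sampling: repeatedly predict, sample, and append one token.

    next_token_logits: hypothetical callable mapping a list of token ids to a
    list of logits over the vocabulary for the next token.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    if rng is None:
        rng = random.Random()
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        logits = next_token_logits(ids)
        # Temperature-scaled, numerically stable softmax.
        scaled = [x / temperature for x in logits]
        m = max(scaled)
        exps = [math.exp(s - m) for s in scaled]
        total = sum(exps)
        probs = [e / total for e in exps]
        # Draw the next token id from the categorical distribution and append it.
        next_id = rng.choices(range(len(probs)), weights=probs, k=1)[0]
        ids.append(next_id)
    return ids
```

Lower temperatures sharpen the distribution toward the most likely token. Higher ones flatten it and give more varied output.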
You can find the link to the poems dataset here.
The DataLoader class implements functions to load your own .txt file, merge all .txt files in a source directory into a single file, and split an input data file into train and val files. For tokenization, the model currently supports only a character-level ASCII mapping and OpenAI's tiktoken BPE tokenizer, which works at the sub-word level.
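The sketch below shows roughly what those data and tokenizer steps involve. The function and class names (`merge_txt_files`, `train_val_split`, `AsciiCharTokenizer`) are hypothetical and are not the DataLoader's actual API. The 90/10 default split and the full 128-code ASCII vocabulary are also assumptions.

```python
from pathlib import Path

# Hypothetical helpers illustrating what the DataLoader is described as doing;
# names, signatures and defaults are illustrative, not the repo's actual API.


def merge_txt_files(src_dir, out_path):
    """Concatenate every .txt file in src_dir (sorted by name) into out_path.

    Keep out_path outside src_dir, or a rerun would merge the old output too.
    Returns the number of files merged.
    """
    files = sorted(Path(src_dir).glob("*.txt"))
    with open(out_path, "w", encoding="utf-8") as out:
        for f in files:
            out.write(f.read_text(encoding="utf-8"))
            out.write("\n")  # separate documents with a newline
    return len(files)


def train_val_split(text, train_frac=0.9):
    """Split text at a character boundary into (train, val) pieces."""
    n = int(len(text) * train_frac)
    return text[:n], text[n:]


class AsciiCharTokenizer:
    """Character-level tokenizer: each ASCII character maps to its code (0..127)."""

    vocab_size = 128

    def encode(self, s):
        ids = [ord(ch) for ch in s]
        if any(i >= self.vocab_size for i in ids):
            raise ValueError("non-ASCII character in input")
        return ids

    def decode(self, ids):
        return "".join(chr(i) for i in ids)
```

For the BPE option, tiktoken's own API is `enc = tiktoken.get_encoding("gpt2")`, followed by `enc.encode(text)` and `enc.decode(ids)`. This README doesn't say which encoding the repo selects, so "gpt2" is only an example. BPE yields far fewer tokens per text than a character-level mapping, but it needs a much larger vocabulary.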
Additionally, if you're a stranger who found your way here, I assume you're interested in language models or NLP in general. Rather than relying on a README, I'd suggest exploring the code yourself; it should give you a better understanding of how the model works :). Cheers!