This repository contains code for data preparation, training, and evaluation of various classifiers on the GoEmotions dataset.
Use this file to see how the original 28-class problem was mapped to a 6-class problem.
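As a rough illustration, such a mapping can be expressed as a dictionary from coarse Ekman-style categories to the fine-grained GoEmotions labels. The grouping below is only a sketch; the authoritative mapping is the one in this file.

```python
# Illustrative Ekman-style grouping; the exact 28 -> 6 mapping lives in this file.
EKMAN_MAP = {
    "anger": ["anger", "annoyance", "disapproval"],
    "disgust": ["disgust"],
    "fear": ["fear", "nervousness"],
    "joy": ["admiration", "amusement", "approval", "caring", "desire",
            "excitement", "gratitude", "joy", "love", "optimism",
            "pride", "relief"],
    "sadness": ["disappointment", "embarrassment", "grief", "remorse", "sadness"],
    "surprise": ["confusion", "curiosity", "realization", "surprise"],
}

# Invert to a fine-grained -> coarse lookup for relabeling examples.
FINE_TO_COARSE = {fine: coarse
                  for coarse, fines in EKMAN_MAP.items()
                  for fine in fines}
```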
This file implements a sequence classification pipeline with HuggingFace Transformers. It also includes a hyperparameter search built on the Ray library and the HuggingFace Trainer module.
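A minimal sketch of that setup, assuming an encoder classifier such as `bert-base-uncased`, the `simplified` GoEmotions config, and a single-label reduction (model name, search space, and simplification are illustrative, not the repo's exact settings):

```python
from datasets import load_dataset
from ray import tune
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

MODEL_NAME = "bert-base-uncased"  # assumption: any encoder classifier works here

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
raw = load_dataset("go_emotions", "simplified")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True,
                     padding="max_length", max_length=128)

# GoEmotions is multi-label; this sketch keeps only the first label per example.
ds = raw.map(tokenize, batched=True)
ds = ds.map(lambda ex: {"labels": ex["labels"][0]})

def model_init():
    # Trainer re-creates the model from scratch for every Ray trial.
    return AutoModelForSequenceClassification.from_pretrained(MODEL_NAME,
                                                              num_labels=28)

def hp_space(_):
    return {
        "learning_rate": tune.loguniform(1e-5, 5e-4),
        "per_device_train_batch_size": tune.choice([16, 32]),
        "num_train_epochs": tune.choice([2, 3]),
    }

trainer = Trainer(
    model_init=model_init,
    args=TrainingArguments(output_dir="hp_search", evaluation_strategy="epoch"),
    train_dataset=ds["train"],
    eval_dataset=ds["validation"],
)
best_run = trainer.hyperparameter_search(hp_space=hp_space, backend="ray",
                                         n_trials=10)
print(best_run.hyperparameters)
```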
This file contains Lightning modules for data preparation and training with the Lightning framework. I used 4-bit quantization from bitsandbytes and the peft library from HuggingFace to train a 13-billion-parameter model on a 16GB GPU.
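The core of that recipe looks roughly like the sketch below (the LoRA rank, alpha, dropout, and target modules are illustrative values, not necessarily the ones used in this repo):

```python
import torch
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

# The base weights are loaded in 4-bit; only the LoRA adapters are trained.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-13b-hf",  # gated model; requires HF access approval
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,  # illustrative hyperparameters
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # adapters are a tiny fraction of 13B
```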
This file is used together with the previous file to run supervised finetuning under various parameter settings.
I extract the adapters from the checkpoints created by supervised_finetuning.py. This saves space because the base Llama2-13B model is frozen during supervised finetuning and therefore doesn't need to be saved.
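The idea reduces to filtering the LoRA weights out of a full checkpoint; a minimal sketch, with hypothetical paths (the actual checkpoint layout depends on the Lightning run):

```python
import torch

# Hypothetical paths; adjust to the actual Lightning checkpoint layout.
ckpt = torch.load("checkpoints/sft-step-1000.ckpt", map_location="cpu")
state_dict = ckpt.get("state_dict", ckpt)

# Keep only the LoRA adapter weights; the frozen Llama2-13B base is dropped.
adapter_state = {k: v for k, v in state_dict.items() if "lora_" in k}
torch.save(adapter_state, "adapters/sft-step-1000.pt")
```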
Here I load the adapters saved in the previous step into the Llama2-13B model and generate text for the validation split of the GoEmotions dataset. The module takes command-line arguments to select among checkpoints.
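A minimal sketch of attaching a saved adapter and generating, assuming the adapter was saved in peft's `save_pretrained` format (the adapter path is a placeholder for whatever checkpoint is chosen on the command line):

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-13b-hf", torch_dtype=torch.bfloat16, device_map="auto"
)
# Placeholder adapter directory, selected via command-line arguments in the repo.
model = PeftModel.from_pretrained(base, "adapters/sft-step-1000")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-13b-hf")

inputs = tokenizer("I can't believe this actually worked!",
                   return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```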
This is a short script that calls the OpenAI text completion API to generate labels for the validation split in a zero-shot fashion.
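The script's exact prompt and model are not shown here; the sketch below assumes the `gpt-3.5-turbo-instruct` completion model, the openai>=1.0 client, and a hypothetical 6-label set:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

LABELS = "anger, disgust, fear, joy, sadness, surprise"  # assumed label set

def classify(text: str) -> str:
    # Zero-shot: the prompt names the label set, no examples are given.
    prompt = (
        f"Classify the emotion of the following text as one of: {LABELS}.\n"
        f"Text: {text}\nEmotion:"
    )
    resp = client.completions.create(
        model="gpt-3.5-turbo-instruct",  # assumption: any completion model works
        prompt=prompt,
        max_tokens=4,
        temperature=0,
    )
    return resp.choices[0].text.strip()

print(classify("I can't believe this actually worked!"))
```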
Here I list the samples generated by the GPT-3.5-Turbo and finetuned Llama2-13B models. The model predictions are also available in pkl format.
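Loading the pkl predictions is a one-liner (the filename below is a placeholder):

```python
import pickle

# Placeholder filename; substitute the actual prediction file.
with open("predictions/llama2_13b_val.pkl", "rb") as f:
    preds = pickle.load(f)
print(type(preds), len(preds))
```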
A notebook to visualize a PyTorch model with adapter modules.
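One simple way to inspect where the adapters sit, sketched here on a small stand-in model (gpt2 with a LoRA config on `c_attn`) so it runs cheaply; the notebook targets the real model:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Small stand-in model; the same loop works on the full finetuned model.
model = get_peft_model(
    AutoModelForCausalLM.from_pretrained("gpt2"),
    LoraConfig(r=8, target_modules=["c_attn"], task_type="CAUSAL_LM"),
)
for name, module in model.named_modules():
    if "lora" in name:
        print(name, type(module).__name__)
```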
A folder with various debugging scripts and notebooks.
A file with the commands I used for supervised finetuning.