This project studies subpopulations within large language models (LLMs). The goal is to understand how model responses vary when the prompts represent different subpopulations.
To analyze the data generated by the LLMs, we use Google BigQuery, a fully managed, serverless data warehouse that enables scalable analysis over petabytes of data.
-
Set up a Google Cloud account: If you do not have one, sign up for a new account at https://cloud.google.com/.
-
Create a new project: Once you have a Google Cloud account, create a new project for your LLM research.
-
Enable the BigQuery API: Navigate to the APIs & Services dashboard and enable the BigQuery API for your project.
-
Set up authentication: Create a service account and download the JSON key file. This will be used to authenticate your requests to BigQuery.
-
Set up your environment: Place your service account JSON key file in a secure directory, then specify the path to this file in the GOOGLE_APPLICATION_CREDENTIALS variable within your .env file.
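The last step above can be checked from Python before any query is run. The following is a minimal sketch, not the project's own loader: it assumes the .env file uses plain KEY=VALUE lines, and the helper names (load_env_file, credentials_path, make_bigquery_client) are hypothetical. It uses only the standard library for parsing; the project may instead rely on a package such as python-dotenv.

```python
import os
from pathlib import Path


def load_env_file(env_path=".env"):
    """Read KEY=VALUE lines from a .env file into os.environ.

    Blank lines, comment lines starting with '#', and lines without '='
    are skipped. Variables already set in the environment are not
    overwritten. Returns the pairs parsed from the file.
    """
    loaded = {}
    for raw_line in Path(env_path).read_text().splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip("'\"")  # drop optional surrounding quotes
        os.environ.setdefault(key, value)
        loaded[key] = value
    return loaded


def credentials_path():
    """Return the key file path, or raise if it is unset or missing."""
    value = os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not value:
        raise RuntimeError("GOOGLE_APPLICATION_CREDENTIALS is not set")
    path = Path(value).expanduser()
    if not path.is_file():
        raise FileNotFoundError(f"Key file not found: {path}")
    return path


def make_bigquery_client():
    """Create a BigQuery client (assumes google-cloud-bigquery is installed)."""
    from google.cloud import bigquery  # imported lazily; optional dependency

    # Client() picks up GOOGLE_APPLICATION_CREDENTIALS from the environment.
    return bigquery.Client()
```

A typical call order would be load_env_file(), then credentials_path() to fail early with a clear message, then make_bigquery_client().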
To replicate the test environment, please follow these steps:
-
Set up a virtual environment using either venv or conda.
- For venv, run:
    python3 -m venv venv
    source venv/bin/activate
- For conda, run:
    conda create --name myenv python=3.11
    conda activate myenv
-
Install the required dependencies by running:
    pip install -r requirements.txt
Ensure you activate the virtual environment before running any code within this workspace.
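To confirm that an environment is active before running project code, a small check like the one below may help. It is a sketch under stated assumptions: the function name in_virtual_environment is hypothetical, and treating conda's "base" environment as "not activated" is a choice made here, not something the project specifies.

```python
import os
import sys


def in_virtual_environment():
    """Return True if running inside a venv or an activated conda env."""
    # In a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    in_venv = sys.prefix != sys.base_prefix
    # conda sets CONDA_DEFAULT_ENV to the name of the activated environment.
    # Assumption: the default "base" environment does not count.
    conda_env = os.environ.get("CONDA_DEFAULT_ENV")
    in_conda = conda_env not in (None, "", "base")
    return in_venv or in_conda


if __name__ == "__main__":
    if in_virtual_environment():
        print(f"Environment active: {sys.prefix}")
    else:
        print("No virtual environment detected; activate venv or conda first.")
```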