AutoRAG-template

Template for a new AutoRAG project

Installation

pip install -r requirements.txt

Running the above command will automatically install AutoRAG.

Then set the OPENAI_API_KEY environment variable, either directly or by creating a .env file that contains it.
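As a quick sanity check before running anything, the sketch below looks for OPENAI_API_KEY first in the environment and then in a .env file. It is a minimal illustration using only the standard library. The helper name find_api_key and the simple KEY=VALUE parsing are assumptions made for this example, not part of AutoRAG, and the project itself may load .env differently.

```python
import os
from pathlib import Path


def find_api_key(env_file=".env", name="OPENAI_API_KEY"):
    """Return the key from the environment, else from env_file, else None."""
    value = os.environ.get(name)
    if value:
        return value
    path = Path(env_file)
    if not path.is_file():
        return None
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        # Skip blank lines and comments; only plain KEY=VALUE lines are handled.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, raw = line.partition("=")
        if key.strip() == name:
            # Drop optional surrounding quotes; an empty value counts as missing.
            return raw.strip().strip("'\"") or None
    return None


if __name__ == "__main__":
    print("OPENAI_API_KEY found" if find_api_key() else "OPENAI_API_KEY is missing")
```

Running the script from the project root reports whether a key is visible, without printing the key itself.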

RAG Evaluation Dataset Creation Tutorial

To use AutoRAG, you first need to create a RAG evaluation dataset. Follow the steps below to create and use the dataset yourself.

Check the original documents in raw_docs. In this tutorial, we will use three PDF documents.
Run run_parse.py. It executes the parsing methods specified in config/parse.yaml so that you can compare their results.

python run_parse.py

The parsed_raw folder now contains numbered trial folders, each holding several parquet files with the parsed results. Load them with pandas to inspect them directly.
Execute run_chunk.py to chunk a parsed file using the methods listed in config/chunk.yaml. Pass the parsed file you want to chunk with --raw_path.

python run_chunk.py --raw_path ./parsed_raw/0/2.parquet

After execution, check the chunked_corpus folder for the chunked files, one per chunking method.
Now run make_qa.py. Pass the raw file that was used to create the chunks as --raw_path and the chunk file you want to use as --chunk_path. Pick one suitable chunk file for now. You can build QA datasets for the other chunk files later without generating the questions again, using the update_corpus feature described in the Update Corpus section.

python make_qa.py --raw_path ./parsed_raw/0/2.parquet --chunk_path ./chunked_corpus/0/3.parquet --qa_size 5

Check the qa_new.parquet and corpus_new.parquet files created in the data folder.
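The steps above produce parquet files inside numbered trial folders, and a short script makes it easier to see what was generated before picking files for --raw_path and --chunk_path. The sketch below is only an illustration: the helper names list_trial_outputs and preview are hypothetical, and preview assumes pandas and a parquet engine such as pyarrow are installed.

```python
from pathlib import Path


def list_trial_outputs(root):
    """Map each numbered trial folder under root to its sorted parquet file names."""
    root = Path(root)
    if not root.is_dir():
        return {}
    trials = [t for t in root.iterdir() if t.is_dir() and t.name.isdigit()]
    trials.sort(key=lambda t: int(t.name))
    return {t.name: sorted(p.name for p in t.glob("*.parquet")) for t in trials}


def preview(parquet_path, rows=3):
    """Print the shape, columns, and first rows of one parquet file."""
    import pandas as pd  # imported lazily so the listing works without pandas

    df = pd.read_parquet(parquet_path)
    print(df.shape, list(df.columns))
    print(df.head(rows))


if __name__ == "__main__":
    for folder in ("parsed_raw", "chunked_corpus"):
        for trial, files in list_trial_outputs(folder).items():
            print(folder, trial, files)
```

After listing the files, calling preview on a single path, for example one of the files under parsed_raw/0, shows its columns and first rows so you can compare parsing or chunking methods by eye.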

Running the Project

Using main.py

Copy .env.template to .env and enter your OpenAI API key in the new file.
Run main.py as shown below to start AutoRAG.

python3 main.py --config ./config/tutorial.yaml

Once the benchmark folder is created, you can check the results there.

Using CLI

Create a benchmark folder.
Set OPENAI_API_KEY as an environment variable.

export OPENAI_API_KEY=sk-xxxx

Execute the CLI command below to start AutoRAG optimization.

autorag evaluate --qa_data_path ./data/qa.parquet --corpus_data_path ./data/corpus.parquet \
  --config ./config/tutorial.yaml --project_dir ./benchmark

To run with the dataset created in the dataset tutorial, replace corpus.parquet with corpus_new.parquet and qa.parquet with qa_new.parquet.
Once the benchmark folder is created, you can check the results there.
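If you switch between the bundled dataset and the one created in the tutorial often, it can help to build the command in a script. The sketch below only assembles the argument list from the flags and file names shown above. The helper name build_evaluate_command and its parameters are hypothetical, and nothing is executed unless you pass the list to subprocess.run yourself.

```python
import shlex


def build_evaluate_command(use_tutorial_dataset=False,
                           data_dir="./data",
                           config="./config/tutorial.yaml",
                           project_dir="./benchmark"):
    """Return the `autorag evaluate` argument list for the chosen dataset."""
    # The tutorial dataset uses the *_new.parquet file names mentioned above.
    suffix = "_new" if use_tutorial_dataset else ""
    return [
        "autorag", "evaluate",
        "--qa_data_path", f"{data_dir}/qa{suffix}.parquet",
        "--corpus_data_path", f"{data_dir}/corpus{suffix}.parquet",
        "--config", config,
        "--project_dir", project_dir,
    ]


if __name__ == "__main__":
    # Print the command for review instead of running it.
    print(shlex.join(build_evaluate_command(use_tutorial_dataset=True)))
```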

Running the Dashboard

Run the command below to launch the dashboard, where you can review the results.

autorag dashboard --trial_dir ./benchmark/0

Running Streamlit

Run the Streamlit web app with the command below to use the optimized RAG pipeline directly.

autorag run_web --trial_path ./benchmark/0

Update Corpus

This feature lets you reuse an existing QA dataset with a different chunked corpus built from the same raw file, so you get a new QA file without generating the questions again.

Try it as shown below.

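from autorag.data.qa.schema import Raw, Corpus, QA

# initial_raw_df, initial_corpus_df, initial_qa_df, and new_corpus_df are
# pandas DataFrames you have already loaded, e.g. with pd.read_parquet.
raw = Raw(initial_raw_df)
corpus = Corpus(initial_corpus_df, raw)  # chunks linked to their raw source
qa = QA(initial_qa_df, corpus)           # questions linked to the original chunks

# Remap the existing questions onto the new chunks built from the same raw file.
new_qa = qa.update_corpus(Corpus(new_corpus_df, raw))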
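To make the idea concrete, here is a simplified, library-free sketch of how ground-truth passages could be remapped onto a new chunking of the same raw file. It is an illustration under stated assumptions, not AutoRAG's actual implementation: it assumes every chunk records the raw file it came from and its start and end character offsets, and the names Chunk and remap_retrieval_gt are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    doc_id: str  # hypothetical chunk identifier
    path: str    # raw file the chunk was cut from
    start: int   # first character offset in the raw text (inclusive)
    end: int     # last character offset in the raw text (inclusive)


def remap_retrieval_gt(gt_ids, old_chunks, new_chunks):
    """Return ids of new chunks that overlap any old ground-truth chunk."""
    old_by_id = {c.doc_id: c for c in old_chunks}
    remapped = []
    for gt_id in gt_ids:
        old = old_by_id[gt_id]
        for new in new_chunks:
            same_file = new.path == old.path
            overlaps = new.start <= old.end and old.start <= new.end
            if same_file and overlaps and new.doc_id not in remapped:
                remapped.append(new.doc_id)
    return remapped


if __name__ == "__main__":
    old = [Chunk("old-0", "a.pdf", 0, 99), Chunk("old-1", "a.pdf", 100, 199)]
    new = [Chunk("new-0", "a.pdf", 0, 149), Chunk("new-1", "a.pdf", 150, 299)]
    # The question text stays the same; only its ground-truth chunk ids change.
    print(remap_retrieval_gt(["old-1"], old, new))
```

The point of the sketch is that the questions never change. Only the link from each question to its supporting chunks is recomputed, which is why the raw file must be the same one used to build both corpora.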
