Add RAG Tools
Showing 40 changed files with 2,809 additions and 27 deletions.

@@ -1,5 +1,10 @@
[deps]
DataFramesMeta = "1313f7d8-7da2-5740-9ea0-a2ca25f37964"
Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4"
HTTP = "cd3eb016-35fb-5094-929b-558a96fad6f3"
JSON3 = "0f8b85d8-7281-11e9-16c2-39a750bddbf1"
LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
Literate = "98b081ad-f1c9-55d3-8b20-4c87d4299306"
LiveServer = "16fef848-5104-11e9-1b77-fb7a48bbb589"
PromptingTools = "670122d1-24a8-4d70-bfce-740807c42192"
SparseArrays = "2f01184e-e22b-5df5-ae63-d93ebab69eaf"

@@ -1,13 +1,17 @@
using PromptingTools
using Documenter
using SparseArrays, LinearAlgebra
using PromptingTools.Experimental.RAGTools
using JSON3, Serialization, DataFramesMeta
using Statistics: mean

DocMeta.setdocmeta!(PromptingTools,
    :DocTestSetup,
    :(using PromptingTools);
    recursive = true)

makedocs(;
    modules = [PromptingTools],
    modules = [PromptingTools, PromptingTools.Experimental.RAGTools],
    authors = "J S <[email protected]> and contributors",
    repo = "https://github.com/svilupp/PromptingTools.jl/blob/{commit}{path}#{line}",
    sitename = "PromptingTools.jl",

@@ -24,9 +28,14 @@ makedocs(;
            "Various examples" => "examples/readme_examples.md",
            "Using AITemplates" => "examples/working_with_aitemplates.md",
            "Local models with Ollama.ai" => "examples/working_with_ollama.md",
            "Building RAG Application" => "examples/building_RAG.md",
        ],
        "F.A.Q." => "frequently_asked_questions.md",
        "Reference" => "reference.md",
        "Reference" => [
            "PromptingTools.jl" => "reference.md",
            "Experimental Modules" => "reference_experimental.md",
            "RAGTools" => "reference_ragtools.md",
        ],
    ])

deploydocs(;

@@ -0,0 +1,12 @@
# Reference for Experimental Module

Note: This module is experimental and may change in future releases.
The intention is for the functionality to be moved to separate packages over time.

```@index
Modules = [PromptingTools.Experimental]
```

```@autodocs
Modules = [PromptingTools.Experimental]
```

@@ -0,0 +1,9 @@
# Reference for RAGTools

```@index
Modules = [PromptingTools.Experimental.RAGTools]
```

```@autodocs
Modules = [PromptingTools.Experimental.RAGTools]
```

@@ -0,0 +1,147 @@
# # Building a Simple Retrieval-Augmented Generation (RAG) System with RAGTools

# Let's build a Retrieval-Augmented Generation (RAG) chatbot tailored to navigate and interact with the DataFrames.jl documentation.
# "RAG" is probably the most common and most valuable pattern in Generative AI at the moment.

# If you're not familiar with "RAG", start with this [article](https://towardsdatascience.com/add-your-own-data-to-an-llm-using-retrieval-augmented-generation-rag-b1958bf56a5a).

## Imports
using LinearAlgebra, SparseArrays
using PromptingTools
## Note: RAGTools is still experimental and will change in the future. Ideally, it will be cleaned up and moved into a dedicated package.
using PromptingTools.Experimental.RAGTools
using JSON3, Serialization, DataFramesMeta
using Statistics: mean
const PT = PromptingTools
const RT = PromptingTools.Experimental.RAGTools

# ## RAG in Two Lines

# Let's put together a few text pages from the DataFrames.jl docs.
# Simply go to the [DataFrames.jl docs](https://dataframes.juliadata.org/stable/) and copy & paste a few pages into separate text files. Save them in the `examples/data` folder (see the example pages provided). Ideally, delete all the noise (like headers, footers, etc.) and keep only the text you want to use for the chatbot. Remember: garbage in, garbage out!

files = [
    joinpath("examples", "data", "database_style_joins.txt"),
    joinpath("examples", "data", "what_is_dataframes.txt"),
]
## Build an index of chunks, embed them, and create a lookup index of metadata/tags for each chunk
index = build_index(files; extract_metadata = false)

# Let's ask a question
## Embeds the question, finds the closest chunks in the index, and generates an answer from the closest chunks
answer = airag(index; question = "I like dplyr, what is the equivalent in Julia?")

# First RAG in two lines? Done!
#
# What does it do?
# - `build_index` will chunk the documents into smaller pieces, embed them into numbers (to be able to judge the similarity of chunks), and, optionally, create a lookup index of metadata/tags for each chunk
# - `index` is the result of this step and it holds your chunks, embeddings, and other metadata! Just show it :)
# - `airag` will
#   - embed your question
#   - find the closest chunks in the index (use parameters `top_k` and `minimum_similarity` to tweak which chunks count as "relevant")
#   - [OPTIONAL] extract any potential tags/filters from the question and apply them to filter down the candidates (use `extract_metadata=true` in `build_index`; you can also provide some filters explicitly via `tag_filter`)
#   - [OPTIONAL] re-rank the candidate chunks (define and provide your own `rerank_strategy`, eg, the Cohere ReRank API)
#   - build a context from the closest chunks (use `chunks_window_margin` to tweak whether preceding and succeeding chunks are included as well; see `?build_context` for more details)
#   - generate an answer from the closest chunks (use `return_context=true` to see under the hood and debug your application; a short sketch follows below)
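
# For instance, a minimal sketch of tweaking the retrieval step with the keywords mentioned above (a hedged example -- check `?airag` for the exact keyword names supported by your version):
msg_tweaked, ctx_tweaked = airag(index;
    question = "I like dplyr, what is the equivalent in Julia?",
    top_k = 5, return_context = true)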

# You should save the index for later to avoid re-embedding / re-extracting the document chunks!
serialize("examples/index.jls", index)
index = deserialize("examples/index.jls")

# # Evaluations
# Now we want to evaluate the quality of the system. For that, we need a set of questions and answers.
# Ideally, we would hand-craft a set of high-quality Q&A pairs. However, this is time-consuming and expensive.
# Let's generate them from the chunks in our index!

# ## Generate Q&A pairs

# We need to provide chunks and sources (file paths for future reference)
evals = build_qa_evals(RT.chunks(index),
    RT.sources(index);
    instructions = "None.",
    verbose = true);
## Info: Q&A Sets built! (cost: $0.143) -- not bad!

# > [!TIP]
# > In practice, you would review each item in this golden evaluation set (and delete any generic/poor questions); a rough sketch of such pruning follows below.
# > It will determine the future success of your app, so you need to make sure it's good!
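
# A hedged sketch of that pruning step (the dropped indices below are purely hypothetical -- keep whichever items your own review approves):
evals_reviewed = evals[setdiff(eachindex(evals), [2, 7])];
## The rest of this walkthrough keeps using the full `evals` vector.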

## Save the evals for later
JSON3.write("examples/evals.json", evals)
evals = JSON3.read("examples/evals.json", Vector{RT.QAEvalItem});

# ## Explore one Q&A pair
# Let's explore one eval item -- it's not the best but it gives you the idea!
#
evals[1]

# ## Evaluate this Q&A pair

# Let's evaluate this Q&A item with a "judge model" (often GPT-4 is used as a judge).

## Note that we used the same question but generated a different context and answer via `airag`
msg, ctx = airag(index; evals[1].question, return_context = true);

## ctx is a RAGContext object that keeps all intermediate states of the RAG pipeline for easy evaluation
judged = aiextract(:RAGJudgeAnswerFromContext;
    ctx.context,
    ctx.question,
    ctx.answer,
    return_type = RT.JudgeAllScores)
judged.content
## Dict{Symbol, Any} with 7 entries:
##   :final_rating => 4.8
##   :clarity => 5
##   :completeness => 5
##   :relevance => 5
##   :consistency => 4
##   :helpfulness => 5
##   :rationale => "The answer is highly relevant to the user's question, as it provides a comprehensive list of frameworks that are compared with DataFrames.jl. The answer is complete, covering all

# We can also run the whole evaluation in a function (a few more metrics are available):
x = run_qa_evals(evals[1], ctx;
    parameters_dict = Dict(:top_k => 3), verbose = true, model_judge = "gpt4t")

# Fortunately, we don't have to do this one by one -- let's evaluate all our Q&A pairs at once.

# ## Evaluate the whole set

# Let's run each question & answer through our eval loop asynchronously (we do it only for the first 10 to save time). See `?airag` for which parameters you can tweak, eg, `top_k`.

results = asyncmap(evals[1:10]) do qa_item
    ## Generate an answer
    msg, ctx = airag(index; qa_item.question, return_context = true,
        top_k = 3, verbose = false)
    ## Evaluate the response -- often you want the judge model to be the highest quality possible, eg, "GPT-4 Turbo" (alias "gpt4t")
    ## Note: you can log key parameters for easier analysis later
    run_qa_evals(qa_item, ctx;
        parameters_dict = Dict(:top_k => 3), verbose = false, model_judge = "gpt4t")
end
## Note that the "failed" evals can show as "nothing", so make sure to handle them.
results = filter(x -> !isnothing(x.answer_score), results);

# Note: You could also use the vectorized version `results = run_qa_evals(evals)` to evaluate all items at once.

## Let's take a simple average to calculate our score
@info "RAG Evals: $(length(results)) results, Avg. score: $(round(mean(x -> x.answer_score, results); digits = 1)), Retrieval score: $(100 * round(Int, mean(x -> x.retrieval_score, results)))%"
## [ Info: RAG Evals: 10 results, Avg. score: 4.6, Retrieval score: 100%

# Note: The retrieval score is 100% only because we have just two small documents and we evaluate only 10 items. In practice, you would have a much larger document set and a much larger eval set, which would give a more representative retrieval score.

# You can also analyze the results in a DataFrame:

df = DataFrame(results)
first(df, 5)
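
# For example, a rough sketch of an aggregate view (assuming the DataFrame columns follow the result fields used above, ie, `answer_score` and `retrieval_score`):
combine(df, :answer_score => mean => :avg_answer_score,
    :retrieval_score => mean => :avg_retrieval_score)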

# We're done for today!

# # What would we do next?
# - Review your evaluation golden data set and keep only the good items
# - Play with the chunk sizes (`max_length` in `build_index`) and see how it affects the quality (a short sketch follows after this list)
# - Explore using metadata/key filters (`extract_metadata=true` in `build_index`)
# - Add filtering for semantic similarity (embedding distance) to make sure we don't pick up irrelevant chunks in the context
# - Use multiple indices or a hybrid index (add a simple BM25 lookup from TextAnalysis.jl)
# - Data processing is the most important step -- properly parsed and split text can work wonders
# - Add re-ranking of context (see the `rerank` function; you can use the Cohere ReRank API)
# - Improve the question embedding (eg, rephrase it, generate hypothetical answers and use them to find better context)
#
# ... and much more! See some ideas in the [Anyscale RAG tutorial](https://www.anyscale.com/blog/a-comprehensive-guide-for-building-rag-based-llm-applications-part-1)
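
# As a hedged sketch of the chunking and metadata ideas above (keyword names as referenced in the list -- check `?build_index` for the exact options in your version):
index_v2 = build_index(files; max_length = 512, extract_metadata = true)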