By Fireworks.ai | March 21, 2024
- Link to blog post on Fireworks: https://fireworks.ai/blog/optimizing-rag-with-mongodb-atlas-and-fireworks
- Link to announcement on MongoDB: https://www.mongodb.com/blog/post/fireworks-ai-mongodb-fastest-ai-apps-with-best-models-powered-by-your-data
- What is RAG?
- Why RAG?
- RAG Architecture
- Optimizing RAG Architecture
- Prerequisites
- Configuring Your Environment
- Gathering Credentials
- Initializing Fireworks and MongoDB Clients
- Using Fireworks with OSS Embedding Models
- Generating Embeddings
- Creating an Index on MongoDB Collection
- Generating Personalized Recommendations with Fireworks
- What's Next?
Retrieval Augmented Generation (RAG) combines a retrieval component, which fetches relevant information from a database or vector store, with a generative component (an LLM) that synthesizes a coherent response to the user query.
- Data Efficiency: Dynamically pulls relevant data not seen during training, saving time and resources compared to fine-tuning.
- Flexibility: Enables dynamic updates to knowledge bases without regular retraining.
A RAG Architecture involves:
- A Large Language Model (LLM) for generating responses.
- A vector store (e.g., MongoDB Atlas) for retrieving relevant data based on embeddings.
- Fireworks AI for creating embeddings and handling LLM inference.
Tips for optimizing RAG:
- Cost Reduction: Use smaller embeddings for lower storage costs.
- Improved Throughput: Implement batching for efficient processing (see the batching sketch after this list).
- Function Calling: Use Fireworks' function-calling models for dynamic query handling.
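As a minimal sketch of batched embedding generation (client setup is covered in detail below; the batch size of 32 is an arbitrary example):

import openai

fw_client = openai.OpenAI(api_key="FIREWORKS_API_KEY", base_url="https://api.fireworks.ai/inference/v1")

def embed_in_batches(texts, model_api_string, batch_size=32):
    # Embed many texts with one API call per batch instead of one call per text.
    embeddings = []
    for i in range(0, len(texts), batch_size):
        batch = texts[i:i + batch_size]
        response = fw_client.embeddings.create(input=batch, model=model_api_string)
        embeddings.extend(item.embedding for item in response.data)
    return embeddings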
- MongoDB Atlas Account
- Fireworks AI Account
Install necessary packages:
!pip install -q pymongo fireworks-ai tqdm openai
Replace "FIREWORKS_API_KEY"
and "MONGODB_URI"
with your credentials.
from pymongo.mongo_client import MongoClient
import openai

# Connect to your MongoDB Atlas cluster.
uri = "MONGODB_URI"
client = MongoClient(uri)

# Fireworks exposes an OpenAI-compatible API, so the openai client can be pointed at it directly.
fw_client = openai.OpenAI(api_key="FIREWORKS_API_KEY", base_url="https://api.fireworks.ai/inference/v1")
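The examples below also need a collection handle. Assuming MongoDB's sample_mflix sample dataset (adjust the database and collection names to your own data):

# Assumes the sample_mflix sample dataset; adjust names to match your data.
db = client["sample_mflix"]
collection = db["movies"]

# Optional: verify the connection.
client.admin.command("ping")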
Generate embeddings using the Fireworks embedding API:
def generate_embeddings(input_texts, model_api_string):
    # Returns the embedding vector for the first (or only) input text.
    return fw_client.embeddings.create(input=input_texts, model=model_api_string).data[0].embedding
Process the movie data through the generate_embeddings function. First, verify the embedding size with a test input:
embedding_model_string = 'nomic-ai/nomic-embed-text-v1.5'
sample_output = generate_embeddings(["This is a test."], embedding_model_string)
print(f"Embedding size is: {len(sample_output)}")
Define an Atlas Vector Search index on the collection so embeddings can be searched efficiently:
{
"fields": [
{
"type": "vector",
"path": "embed",
"numDimensions": 768,
"similarity": "dotProduct"
}
]
}
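You can create this index from the Atlas UI, or programmatically. A minimal sketch with pymongo, assuming a recent driver version that supports vector search index creation and an index name of vector_index (the query below refers to this name):

from pymongo.operations import SearchIndexModel

index_model = SearchIndexModel(
    definition={
        "fields": [
            {
                "type": "vector",
                "path": "embed",
                "numDimensions": 768,
                "similarity": "dotProduct",
            }
        ]
    },
    name="vector_index",
    type="vectorSearch",
)
collection.create_search_index(model=index_model)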
Example query for recommendations:
query = "I like Christmas movies, any recommendations?"
query_emb = generate_embeddings([query], embedding_model_string)
results = collection.aggregate([
    {
        "$vectorSearch": {
            "index": "vector_index",  # must match the name of your Atlas Vector Search index
            "queryVector": query_emb,
            "path": "embed",
            "numCandidates": 100,     # candidates considered by the approximate search before ranking
            "limit": 10
        }
    }
])
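To turn the retrieved documents into an actual recommendation, pass them as context to a chat model hosted on Fireworks. A minimal sketch, assuming the documents carry title and plot fields and that the mixtral-8x7b-instruct model is available to your account (swap in any chat model you prefer):

# Build a context string from the retrieved movies (field names are assumptions).
context = "\n".join(f"{doc.get('title', '')}: {doc.get('plot', '')}" for doc in results)

response = fw_client.chat.completions.create(
    model="accounts/fireworks/models/mixtral-8x7b-instruct",  # assumed model id; use any available chat model
    messages=[
        {"role": "system", "content": "You are a movie recommendation assistant. Answer using only the provided context."},
        {"role": "user", "content": f"{query}\n\nContext:\n{context}"},
    ],
)
print(response.choices[0].message.content)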
Explore guides for optimizing RAG architectures, reducing embedding size, and integrating function calling for dynamic query handling.