RAG Studio

Note: This feature is in Private Preview. To try it, reach out to your Databricks contact or [email protected].

The software and other materials included in this repo ("Copyrighted Materials") are protected by US and international copyright laws and are the property of Databricks, Inc. The Copyrighted Materials are not provided under a license for public or third-party use. Accordingly, you may not access, use, copy, modify, publish, and/or distribute the Copyrighted Materials unless you have received prior written authorization or a license from Databricks to do so.

Product overview

RAG Studio: The set of upgraded Mosaic AI platform capabilities for building high-quality Retrieval Augmented Generation (RAG) applications:

  • MLflow: Support for logging, parameterizing, and tracing Chains, unified between development & production. Chains can be logged as code rather than pickled (see the logging sketch after this list).
  • Model Serving: Support for hosting Chains, e.g., token streaming, automated authentication of the Databricks services used in your chain, a feedback API, and a simplified chain deployment API.
  • RAG Cookbook: Sample code & how-to guide offering an opinionated end-to-end workflow for building RAG apps [this repo].
  • [Future release] Lakehouse Monitoring: Capabilities for monitoring your apps once they are in production.
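To make the "logged as code" point concrete, here is a minimal sketch of code-based chain logging. It assumes an MLflow version with models-from-code support and a chain defined in a file named chain.py that calls mlflow.models.set_model(...); the file name and artifact path are illustrative, not this repo's exact code.

  # Minimal sketch of code-based chain logging (illustrative, not this repo's exact code).
  # Assumes chain.py defines the chain and calls mlflow.models.set_model(chain).
  import mlflow

  with mlflow.start_run():
      logged_chain = mlflow.langchain.log_model(
          lc_model="chain.py",    # path to the chain's code - logged as code, not pickled
          artifact_path="chain",  # where the model files are stored inside the run
      )

  print(logged_chain.model_uri)  # e.g. "runs:/<run_id>/chain"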

Evaluation Suite: Built-for-purpose tools to evaluate Generative AI App quality, starting with RAG apps:

  • Evaluation Harness: evaluate(...) command that runs the evaluation (see the sketch after this list)
  • Review App: UI tool for collecting stakeholder feedback & building evaluation sets
  • Databricks LLM Judges: Databricks' proprietary AI-assisted judges for evaluating RAG quality. They can be tuned with customer-provided examples to increase agreement with human raters.
  • Metrics: A set of Databricks-defined metrics for measuring quality/cost/latency of your chain. Most metrics are defined using the output of the Databricks LLM judges.
  • Customer-defined LLM Judges: Databricks framework to quickly define custom judges that evaluate business / use-case specific aspects of quality
  • [Future release] Custom metrics: Provide a user-defined function to run and record its value as an evaluation metric.
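To give a flavor of the evaluate-then-inspect workflow, the sketch below uses MLflow's generic mlflow.evaluate API rather than the RAG Studio Evaluation Harness itself (whose exact import path is covered in the product documentation); the model URI placeholder and the column names are illustrative assumptions.

  # Illustrative only: MLflow's generic evaluate API, not the RAG Studio Evaluation Harness.
  # The model URI placeholder and the column names are assumptions.
  import mlflow
  import pandas as pd

  eval_data = pd.DataFrame(
      {
          "inputs": ["What is RAG Studio?", "How do I deploy a chain?"],
          "ground_truth": [
              "A set of Mosaic AI capabilities for building RAG applications.",
              "Log the chain with MLflow and deploy it to Model Serving.",
          ],
      }
  )

  results = mlflow.evaluate(
      model="runs:/<run_id>/chain",      # a previously logged chain (placeholder URI)
      data=eval_data,
      targets="ground_truth",
      model_type="question-answering",
  )
  print(results.metrics)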

Release notes & upcoming releases

databricks-rag-studio v0.2.0 release notes & upcoming releases

Table of contents

  1. Product documentation
  2. Known limitations
  3. Sample code

Product documentation

Our documentation provides a comprehensive overview of the above functionality.

Known limitations

  • Only tested on Databricks Runtime 15.0 and 14.3 Single User clusters. These capabilities have not been tested on the ML Runtime (MLR) or Shared clusters.
  • Only supports chains using the LangChain framework. Generic Python functionality is coming soon.
  • Chains that need custom credentials for external services (e.g., directly calling 3rd-party APIs) require these credentials to be manually configured in the Model Serving UI after calling deploy_model(...).
  • Support for custom Python library dependencies and versions (e.g., pip_requirements in mlflow.langchain.log_model(...)) has not been tested extensively.
  • Serialization-based MLflow logging has not been tested with RAG Studio.
  • Code-based MLflow logging captures all Python packages loaded in the Driver Notebook as the pip_requirements for the MLflow model - if you need to add or remove requirements, pass a custom pip_requirements array that includes "databricks-rag-studio==0.2.0" (see the sketch after this list).
  • Some parts of the product documentation are still a work in progress.
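For the pip_requirements limitation above, a minimal sketch of overriding the captured requirements when logging a chain (the extra packages and their versions are placeholders):

  # Override the auto-captured requirements; keep the databricks-rag-studio pin so the
  # serving environment can load the chain. Other packages/versions are placeholders.
  import mlflow

  mlflow.langchain.log_model(
      lc_model="chain.py",
      artifact_path="chain",
      pip_requirements=[
          "databricks-rag-studio==0.2.0",   # required
          "langchain==0.1.16",              # placeholder: pin your own versions
          "my-internal-package==1.2.3",     # placeholder for a custom dependency
      ],
  )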

Sample code

To get started, clone this repo as a Git Folder in your Databricks workspace.

Note: While stored in the Git repo as .py files, these files are actually Databricks Notebooks - if you import a file using Databricks, it will render as a Notebook in the Notebook editor.

RAG Cookbook

PDF Bot w/ single-turn conversation

This cookbook creates a simple RAG chain with PDF files stored in a UC Volume.
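A minimal sketch of this kind of chain is below. It is not the cookbook's exact code: the Vector Search endpoint, index name, text column, chat model endpoint, and prompt are illustrative assumptions, and it presumes an index has already been built from the parsed PDF chunks.

  # Minimal single-turn RAG chain sketch (illustrative, not the cookbook's exact code).
  from databricks.vector_search.client import VectorSearchClient
  from langchain_community.chat_models import ChatDatabricks
  from langchain_community.vectorstores import DatabricksVectorSearch
  from langchain_core.output_parsers import StrOutputParser
  from langchain_core.prompts import ChatPromptTemplate
  from langchain_core.runnables import RunnablePassthrough

  # Assumption: a Vector Search index built from the parsed PDF chunks already exists.
  index = VectorSearchClient().get_index(
      endpoint_name="my-vs-endpoint",
      index_name="catalog.schema.pdf_chunks_index",
  )
  # text_column is needed for direct-access / self-managed-embedding indexes.
  retriever = DatabricksVectorSearch(index, text_column="chunk_text").as_retriever(
      search_kwargs={"k": 3}
  )

  prompt = ChatPromptTemplate.from_template(
      "Answer the question using only the context below.\n\n"
      "Context:\n{context}\n\nQuestion: {question}"
  )

  def format_docs(docs):
      return "\n\n".join(doc.page_content for doc in docs)

  chain = (
      {"context": retriever | format_docs, "question": RunnablePassthrough()}
      | prompt
      | ChatDatabricks(endpoint="databricks-dbrx-instruct")  # assumption: any chat endpoint
      | StrOutputParser()
  )

  print(chain.invoke("What topics do these PDFs cover?"))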

PDF Bot w/ multi-turn conversation

This cookbook creates a RAG chain capable of multi-turn conversation over PDF files stored in a UC Volume. It is identical to the single-turn conversation cookbook, except for minor changes to the chain & configuration to support multi-turn conversations.
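To give a flavor of those minor changes, the sketch below pulls the latest question and the prior turns out of an OpenAI-style messages request; the helper names are illustrative, not the cookbook's exact code.

  # Illustrative helpers for a multi-turn chain (not the cookbook's exact code).
  def extract_user_query(messages):
      """Return the content of the most recent user message."""
      return messages[-1]["content"]

  def extract_chat_history(messages):
      """Return every message before the latest one as (role, content) pairs."""
      return [(m["role"], m["content"]) for m in messages[:-1]]

  request = {
      "messages": [
          {"role": "user", "content": "What is RAG Studio?"},
          {"role": "assistant", "content": "A set of capabilities for building RAG apps."},
          {"role": "user", "content": "How do I deploy one?"},
      ]
  }
  print(extract_user_query(request["messages"]))    # "How do I deploy one?"
  print(extract_chat_history(request["messages"]))  # the first two turns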

Advanced data pipeline for tuning parsing, chunking, embedding strategies

This cookbook helps you try different chunking & parsing strategies alongside different embedding models. It provides a RAG data processing pipeline with a set of pre-baked chunking & parsing strategies and embedding models, yet is flexible enough to let you modify the pre-built techniques or add custom ones.
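To illustrate what trying different strategies looks like in practice, here is a small config-driven chunking sketch; the keys and values are illustrative, not the pipeline's actual configuration schema.

  # Config-driven chunking sketch (keys/values are illustrative, not the pipeline's schema).
  from langchain_text_splitters import RecursiveCharacterTextSplitter

  chunking_config = {
      "chunk_size": 1024,     # try e.g. 512 / 1024 / 2048 and compare retrieval quality
      "chunk_overlap": 128,
  }

  splitter = RecursiveCharacterTextSplitter(
      chunk_size=chunking_config["chunk_size"],
      chunk_overlap=chunking_config["chunk_overlap"],
  )

  parsed_text = "...text extracted from a PDF in the UC Volume..."
  chunks = splitter.split_text(parsed_text)
  print(f"{len(chunks)} chunks")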

How to tutorials

Tutorial 1: Creating, logging & deploying chains

This tutorial walks you through how to create, log and deploy a chain. The outcome is a user-facing Web UI for chatting with the chain & providing feedback (the Review App) and a REST API for integrating the chain into your user-facing application.
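The sketch below shows the log-and-register half of that flow; the file name and Unity Catalog model name are illustrative, and the final deployment step uses the deploy_model(...) call described in the product documentation rather than an API spelled out here.

  # Log the chain as code and register it in Unity Catalog (names are illustrative).
  import mlflow

  mlflow.set_registry_uri("databricks-uc")  # register the chain in Unity Catalog

  with mlflow.start_run():
      logged = mlflow.langchain.log_model(
          lc_model="chain.py",
          artifact_path="chain",
          registered_model_name="catalog.schema.my_rag_chain",
      )

  # The registered chain is then deployed with deploy_model(...) as described in the
  # product documentation, which creates the serving endpoint, REST API, and Review App.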

Tutorial 2: Parameterizing chains

This tutorial walks you through RAG Studio's support for parameterizing chains. Parameterization allows you to quickly iterate on quality-related parameters (such as the prompt or retriever configuration) while holding the chain's code constant.
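One way to express such parameters is a config file that the chain code reads at load time; the sketch below uses MLflow's ModelConfig for this, though the tutorial's exact mechanism may differ, and the file name and keys are assumptions.

  # Read chain parameters from a config file instead of hard-coding them.
  # File name and keys are assumptions; the tutorial's exact mechanism may differ.
  from mlflow.models import ModelConfig

  config = ModelConfig(development_config="rag_chain_config.yaml")

  llm_endpoint = config.get("llm_endpoint_name")    # e.g. swap the serving endpoint
  prompt_template = config.get("prompt_template")   # e.g. iterate on the prompt
  top_k = config.get("retriever_top_k")             # e.g. tune how many chunks are retrieved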

Tutorial 3: Running evaluation on a logged RAG chain

This tutorial walks you through using Evaluation Suite to evaluate the quality of a RAG chain built with RAG Studio.

Tutorial 4: Running evaluation on a RAG chain or app built outside of RAG Studio

This tutorial walks you through using Evaluation Suite to evaluate the quality of a RAG chain built outside of RAG Studio or already deployed outside of Databricks.

Tutorial 5: Using External Models or Provisioned Throughput

This tutorial walks you through using an External Model (e.g., OpenAI) or a Provisioned Throughput model in your RAG Studio chain.
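As an illustration, the chain's LLM is just a serving endpoint name, so pointing at an External Model or Provisioned Throughput endpoint is a one-line change; the endpoint names below are assumptions, not endpoints created by this repo.

  # Endpoint names are assumptions. An External Model endpoint proxies a provider such as
  # OpenAI; a Provisioned Throughput endpoint serves a foundation model with reserved capacity.
  from langchain_community.chat_models import ChatDatabricks

  external_llm = ChatDatabricks(endpoint="my-openai-external-endpoint", max_tokens=500)
  pt_llm = ChatDatabricks(endpoint="my-provisioned-throughput-endpoint", max_tokens=500)

  print(external_llm.invoke("Say hello").content)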

Additional tutorials being worked on

These items are currently covered in the documentation, but will be covered in future hands-on tutorials. If you need one of these sooner, please contact us at [email protected].

  • Improving LLM judge agreement with human raters using few-shot examples
  • Curating an Evaluation Set using feedback from the Review App
  • Measuring use-case specific aspects of quality with customer-defined LLM judges
