Upstream changes for v0.6.0 release (#115)

NVIDIA · May 10, 2024 · e711143
1 parent 136da43
commit e711143

Showing 141 changed files with 8,834 additions and 1,886 deletions.
diff --git a/.gitignore b/.gitignore
@@ -23,3 +23,4 @@ uploaded_files/
 docs/_*
 docs/notebooks
 docs/experimental
+docs/tools
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -3,6 +3,39 @@ All notable changes to this project will be documented in this file.
 
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/), and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [0.6.0] - 2024-05-07
+
+### Added
+- Ability to switch between [API Catalog](https://build.nvidia.com/explore/discover) models to on-prem models using [NIM-LLM](https://docs.nvidia.com/ai-enterprise/nim-llm/latest/index.html).
+- New API endpoint
+  - `/health` - Provides a health check for the chain server.
+- Containerized [evaluation application](./tools/evaluation/) for RAG pipeline accuracy measurement.
+- Observability support for langchain based examples.
+- New Notebooks
+  - Added [Chat with NVIDIA financial data](./notebooks/12_Chat_wtih_nvidia_financial_reports.ipynb) notebook.
+  - Added notebook showcasing [langgraph agent handling](./notebooks/11_LangGraph_HandlingAgent_IntermediateSteps.ipynb).
+- A [simple rag example template](https://nvidia.github.io/GenerativeAIExamples/latest/simple-examples.html) showcasing how to build an example from scratch.
+
+### Changed
+- Renamed example `csv_rag` to [structured_data_rag](./RetrievalAugmentedGeneration/examples/structured_data_rag/)
+- Model Engine name update
+  - `nv-ai-foundation` and `nv-api-catalog` llm engine are renamed to `nvidia-ai-endpoints`
+  - `nv-ai-foundation` embedding engine is renamed to `nvidia-ai-endpoints`
+- Embedding model update
+  - `developer_rag` example uses [UAE-Large-V1](https://huggingface.co/WhereIsAI/UAE-Large-V1) embedding model.
+  - Using `ai-embed-qa-4` for api catalog examples instead of `nvolveqa_40k` as embedding model
+- Ingested data now persists across multiple sessions.
+- Updated langchain-nvidia-endpoints to version 0.0.11, enabling support for models like llama3.
+- File extension based validation to throw error for unsupported files.
+- The default output token length in the UI has been increased from 250 to 1024 for more comprehensive responses.
+- Stricter chain-server API validation support to enhance API security
+- Updated version of llama-index, pymilvus.
+- Updated pgvector container to `pgvector/pgvector:pg16`
+- LLM Model Updates
+  - [Multiturn Chatbot](./RetrievalAugmentedGeneration/examples/multi_turn_rag/) now uses `ai-mixtral-8x7b-instruct` model for response generation.
+  - [Structured data rag](./RetrievalAugmentedGeneration/examples/structured_data_rag/) now uses `ai-llama3-70b` for response and code generation.
+
+
 ## [0.5.0] - 2024-03-19
 
 This release adds new dedicated RAG examples showcasing state of the art usecases, switches to the latest [API catalog endpoints from NVIDIA](https://build.nvidia.com/explore/discover) and also refactors the API interface of chain-server. This release also improves the developer experience by adding github pages based documentation and streamlining the example deployment flow using dedicated compose files.

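Note on the "Model Engine name update" entry in the changelog hunk above: deployments that still set one of the old engine names need to switch to `nvidia-ai-endpoints`. A minimal sketch of that rename follows. The helper name `migrate_engine_name` and the `kind` parameter are hypothetical and do not exist in the repository; only the old and new engine names come from the changelog.

```python
# Hypothetical migration helper; not part of the repository.
# The mappings mirror the "Model Engine name update" changelog entry.
LEGACY_LLM_ENGINES = {
    "nv-ai-foundation": "nvidia-ai-endpoints",
    "nv-api-catalog": "nvidia-ai-endpoints",
}
LEGACY_EMBEDDING_ENGINES = {
    "nv-ai-foundation": "nvidia-ai-endpoints",
}


def migrate_engine_name(name: str, kind: str = "llm") -> str:
    """Return the v0.6.0 engine name for a pre-0.6.0 value.

    Names that are not in the legacy table are returned unchanged.
    """
    if kind == "llm":
        table = LEGACY_LLM_ENGINES
    elif kind == "embedding":
        table = LEGACY_EMBEDDING_ENGINES
    else:
        raise ValueError(f"unknown engine kind: {kind!r}")
    return table.get(name, name)
```

For example, `migrate_engine_name("nv-api-catalog")` yields `nvidia-ai-endpoints`, while a name that is not in the table passes through untouched. Note that `nv-api-catalog` is listed only for the llm engine, so the embedding table leaves it as-is.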
diff --git a/README.md b/README.md
@@ -32,15 +32,15 @@ If you don't have a GPU, you can inference and embed remotely with [NVIDIA API C
 
 | Model                              | Embedding        | Framework  | Description                                                                                                                                                                                               | Multi-GPU                                                                  | TRT-LLM | NVIDIA Endpoints | Triton | Vector Database    |
 | ---------------------------------- | ---------------- | ---------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------- | ------- | ---------------- | ------ | ------------------ |
-| mixtral_8x7b                       | nvolveqa_40k     | LangChain  | NVIDIA API Catalog endpoints chat bot [[code](./RetrievalAugmentedGeneration/examples/nvidia_api_catalog/), [docs](https://nvidia.github.io/GenerativeAIExamples/latest/api-catalog.html)]                | No                                                                         | No      | Yes              | Yes    | Milvus or pgvector |
-| llama-2                            | e5-large-v2      | LlamaIndex | Canonical QA Chatbot [[code](./RetrievalAugmentedGeneration/examples/developer_rag/), [docs](https://nvidia.github.io/GenerativeAIExamples/latest/local-gpu.html)]                                        | [Yes](https://nvidia.github.io/GenerativeAIExamples/latest/multi-gpu.html) | Yes     | No               | Yes    | Milvus or pgvector |
+| mixtral_8x7b                       | ai-embed-qa-4     | LangChain  | NVIDIA API Catalog endpoints chat bot [[code](./RetrievalAugmentedGeneration/examples/nvidia_api_catalog/), [docs](https://nvidia.github.io/GenerativeAIExamples/latest/api-catalog.html)]                | No                                                                         | No      | Yes              | Yes    | Milvus or pgvector |
+| llama-2                            | UAE-Large-V1      | LlamaIndex | Canonical QA Chatbot [[code](./RetrievalAugmentedGeneration/examples/developer_rag/), [docs](https://nvidia.github.io/GenerativeAIExamples/latest/local-gpu.html)]                                        | [Yes](https://nvidia.github.io/GenerativeAIExamples/latest/multi-gpu.html) | Yes     | No               | Yes    | Milvus or pgvector |
 | llama-2                            | all-MiniLM-L6-v2 | LlamaIndex | Chat bot, GeForce, Windows [[repo](https://github.com/NVIDIA/trt-llm-rag-windows/tree/release/1.0)]                                                                                                       | No                                                                         | Yes     | No               | No     | FAISS              |
-| llama-2                            | nvolveqa_40k     | LangChain  | Chat bot with query decomposition agent [[code](./RetrievalAugmentedGeneration/examples/query_decomposition_rag/), [docs](https://nvidia.github.io/GenerativeAIExamples/latest/query-decomposition.html)] | No                                                                         | No      | Yes              | Yes    | Milvus or pgvector |
-| mixtral_8x7b                       | nvolveqa_40k     | LangChain  | Minimilastic example: RAG with NVIDIA AI Foundation Models [[code](./examples/5_mins_rag_no_gpu/), [README](./examples/README.md#rag-in-5-minutes-example)]                                               | No                                                                         | No      | Yes              | Yes    | FAISS              |
-| mixtral_8x7b<br>Deplot<br>Neva-22b | nvolveqa_40k     | Custom     | Chat bot with multimodal data [[code](./RetrievalAugmentedGeneration/examples/multimodal_rag/), [docs](https://nvidia.github.io/GenerativeAIExamples/latest/multimodal-data.html)]                        | No                                                                         | No      | Yes              | No     | Milvus or pvgector |
-| llama-2                            | e5-large-v2      | LlamaIndex | Chat bot with quantized LLM model [[docs](https://nvidia.github.io/GenerativeAIExamples/latest/quantized-llm-model.html)]                                                                                 | Yes                                                                        | Yes     | No               | Yes    | Milvus or pgvector |
-| mixtral_8x7b                       | none             | PandasAI   | Chat bot with structured data [[code](./RetrievalAugmentedGeneration/examples/structured_data_rag/), [docs](https://nvidia.github.io/GenerativeAIExamples/latest/structured-data.html)]                   | No                                                                         | No      | Yes              | No     | none               |
-| llama-2                            | nvolveqa_40k     | LangChain  | Chat bot with multi-turn conversation [[code](./RetrievalAugmentedGeneration/examples/multi_turn_rag/), [docs](https://nvidia.github.io/GenerativeAIExamples/latest/multi-turn.html)]                     | No                                                                         | No      | Yes              | No     | Milvus or pgvector |
+| llama-2                            | ai-embed-qa-4     | LangChain  | Chat bot with query decomposition agent [[code](./RetrievalAugmentedGeneration/examples/query_decomposition_rag/), [docs](https://nvidia.github.io/GenerativeAIExamples/latest/query-decomposition.html)] | No                                                                         | No      | Yes              | Yes    | Milvus or pgvector |
+| mixtral_8x7b                       | ai-embed-qa-4     | LangChain  | Minimilastic example: RAG with NVIDIA AI Foundation Models [[code](./examples/5_mins_rag_no_gpu/), [README](./examples/README.md#rag-in-5-minutes-example)]                                               | No                                                                         | No      | Yes              | Yes    | FAISS              |
+| mixtral_8x7b<br>Deplot<br>Neva-22b | ai-embed-qa-4     | Custom     | Chat bot with multimodal data [[code](./RetrievalAugmentedGeneration/examples/multimodal_rag/), [docs](https://nvidia.github.io/GenerativeAIExamples/latest/multimodal-data.html)]                        | No                                                                         | No      | Yes              | No     | Milvus or pvgector |
+| llama-2                            | UAE-Large-V1      | LlamaIndex | Chat bot with quantized LLM model [[docs](https://nvidia.github.io/GenerativeAIExamples/latest/quantized-llm-model.html)]                                                                                 | Yes                                                                        | Yes     | No               | Yes    | Milvus or pgvector |
+| llama3-70b                       | none             | PandasAI   | Chat bot with structured data [[code](./RetrievalAugmentedGeneration/examples/structured_data_rag/), [docs](https://nvidia.github.io/GenerativeAIExamples/latest/structured-data.html)]                   | No                                                                         | No      | Yes              | No     | none               |
+| llama-2                            | ai-embed-qa-4     | LangChain  | Chat bot with multi-turn conversation [[code](./RetrievalAugmentedGeneration/examples/multi_turn_rag/), [docs](https://nvidia.github.io/GenerativeAIExamples/latest/multi-turn.html)]                     | No                                                                         | No      | Yes              | No     | Milvus or pgvector |
 
 ### Enterprise RAG Examples
 

diff --git a/RetrievalAugmentedGeneration/Dockerfile b/RetrievalAugmentedGeneration/Dockerfile
@@ -8,7 +8,7 @@ ENV DEBIAN_FRONTEND noninteractive
 
 # Install required ubuntu packages for setting up python 3.10
 RUN apt update && \
-    apt install -y dpkg openssl libgl1 linux-libc-dev libksba8 curl software-properties-common build-essential libssl-dev libffi-dev && \
+    apt install -y curl software-properties-common libgl1 libglib2.0-0 && \
     add-apt-repository ppa:deadsnakes/ppa && \
     apt update && apt install -y python3.10 python3.10-dev python3.10-distutils && \
     apt-get clean
@@ -18,6 +18,9 @@ RUN curl -sS https://bootstrap.pypa.io/get-pip.py | python3.10
 
 RUN rm -rf /var/lib/apt/lists/*
 
+# Uninstall build packages
+RUN apt autoremove -y curl software-properties-common
+
 # Install common dependencies for all examples
 RUN --mount=type=bind,source=RetrievalAugmentedGeneration/requirements.txt,target=/opt/requirements.txt \
     pip3 install --no-cache-dir -r /opt/requirements.txt

diff --git a/RetrievalAugmentedGeneration/common/configuration.py b/RetrievalAugmentedGeneration/common/configuration.py
@@ -57,7 +57,7 @@ class LLMConfig(ConfigWizard):
 
     server_url: str = configfield(
         "server_url",
-        default="localhost:8001",
+        default="",
         help_txt="The location of the Triton server hosting the llm model.",
     )
     model_name: str = configfield(
@@ -86,7 +86,7 @@ class TextSplitterConfig(ConfigWizard):
 
     model_name: str = configfield(
         "model_name",
-        default="intfloat/e5-large-v2",
+        default="WhereIsAI/UAE-Large-V1",
         help_txt="The name of Sentence Transformer model used for SentenceTransformer TextSplitter.",
     )
     chunk_size: int = configfield(
@@ -110,7 +110,7 @@ class EmbeddingConfig(ConfigWizard):
 
     model_name: str = configfield(
         "model_name",
-        default="intfloat/e5-large-v2",
+        default="WhereIsAI/UAE-Large-V1",
         help_txt="The name of huggingface embedding model.",
     )
     model_engine: str = configfield(
@@ -125,7 +125,7 @@ class EmbeddingConfig(ConfigWizard):
     )
     server_url: str = configfield(
         "server_url",
-        default="localhost:9080",
+        default="",
         help_txt="The url of the server hosting nemo embedding model",
     )