Upstream changes for v0.6.0 release (#115)
nv-pranjald authored May 10, 2024
1 parent 136da43 commit e711143
Showing 141 changed files with 8,834 additions and 1,886 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -23,3 +23,4 @@ uploaded_files/
 docs/_*
 docs/notebooks
 docs/experimental
+docs/tools
33 changes: 33 additions & 0 deletions CHANGELOG.md
@@ -3,6 +3,39 @@ All notable changes to this project will be documented in this file.
 
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/), and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [0.6.0] - 2024-05-07
+
+### Added
+- Ability to switch between [API Catalog](https://build.nvidia.com/explore/discover) models to on-prem models using [NIM-LLM](https://docs.nvidia.com/ai-enterprise/nim-llm/latest/index.html).
+- New API endpoint
+- `/health` - Provides a health check for the chain server.
+- Containerized [evaluation application](./tools/evaluation/) for RAG pipeline accuracy measurement.
+- Observability support for langchain based examples.
+- New Notebooks
+- Added [Chat with NVIDIA financial data](./notebooks/12_Chat_wtih_nvidia_financial_reports.ipynb) notebook.
+- Added notebook showcasing [langgraph agent handling](./notebooks/11_LangGraph_HandlingAgent_IntermediateSteps.ipynb).
+- A [simple rag example template](https://nvidia.github.io/GenerativeAIExamples/latest/simple-examples.html) showcasing how to build an example from scratch.
+
+### Changed
+- Renamed example `csv_rag` to [structured_data_rag](./RetrievalAugmentedGeneration/examples/structured_data_rag/)
+- Model Engine name update
+- `nv-ai-foundation` and `nv-api-catalog` llm engine are renamed to `nvidia-ai-endpoints`
+- `nv-ai-foundation` embedding engine is renamed to `nvidia-ai-endpoints`
+- Embedding model update
+- `developer_rag` example uses [UAE-Large-V1](https://huggingface.co/WhereIsAI/UAE-Large-V1) embedding model.
+- Using `ai-embed-qa-4` for api catalog examples instead of `nvolveqa_40k` as embedding model
+- Ingested data now persists across multiple sessions.
+- Updated langchain-nvidia-endpoints to version 0.0.11, enabling support for models like llama3.
+- File extension based validation to throw error for unsupported files.
+- The default output token length in the UI has been increased from 250 to 1024 for more comprehensive responses.
+- Stricter chain-server API validation support to enhance API security
+- Updated version of llama-index, pymilvus.
+- Updated pgvector container to `pgvector/pgvector:pg16`
+- LLM Model Updates
+- [Multiturn Chatbot](./RetrievalAugmentedGeneration/examples/multi_turn_rag/) now uses `ai-mixtral-8x7b-instruct` model for response generation.
+- [Structured data rag](./RetrievalAugmentedGeneration/examples/structured_data_rag/) now uses `ai-llama3-70b` for response and code generation.
+
+
 ## [0.5.0] - 2024-03-19
 
 This release adds new dedicated RAG examples showcasing state of the art usecases, switches to the latest [API catalog endpoints from NVIDIA](https://build.nvidia.com/explore/discover) and also refactors the API interface of chain-server. This release also improves the developer experience by adding github pages based documentation and streamlining the example deployment flow using dedicated compose files.
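The 0.6.0 changelog entry "File extension based validation to throw error for unsupported files" describes rejecting uploads by their extension before ingestion. A minimal Python sketch of that idea follows. The allow-list, `UnsupportedFileError`, and `validate_upload` are hypothetical names chosen for illustration; they are not the chain server's actual code, and the commit does not say which formats it really accepts.

```python
from pathlib import Path

# Hypothetical allow-list for illustration only; the commit does not list
# which extensions the chain server actually accepts.
SUPPORTED_EXTENSIONS = frozenset({".pdf", ".txt", ".md"})


class UnsupportedFileError(ValueError):
    """Raised when an uploaded file's extension is not in the allow-list."""


def validate_upload(filename: str, allowed: frozenset = SUPPORTED_EXTENSIONS) -> str:
    """Return the lower-cased extension, or raise for unsupported files."""
    ext = Path(filename).suffix.lower()
    if ext not in allowed:
        raise UnsupportedFileError(
            f"Unsupported file type {ext or '(none)'!r} for {filename!r}; "
            f"allowed: {', '.join(sorted(allowed))}"
        )
    return ext


if __name__ == "__main__":
    print(validate_upload("Report.PDF"))  # accepted; prints the normalized suffix
    try:
        validate_upload("payload.exe")
    except UnsupportedFileError as err:
        print(err)
```

Comparing the lower-cased suffix treats `Report.PDF` and `report.pdf` alike. `Path.suffix` only inspects the last extension, so `archive.tar.gz` would be checked as `.gz`. Subclassing `ValueError` is an assumed design choice that would let an HTTP layer map the error to a client-side (4xx) response.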
16 changes: 8 additions & 8 deletions README.md
@@ -32,15 +32,15 @@ If you don't have a GPU, you can inference and embed remotely with [NVIDIA API C
 
 | Model | Embedding | Framework | Description | Multi-GPU | TRT-LLM | NVIDIA Endpoints | Triton | Vector Database |
 | ---------------------------------- | ---------------- | ---------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------- | ------- | ---------------- | ------ | ------------------ |
-| mixtral_8x7b | nvolveqa_40k | LangChain | NVIDIA API Catalog endpoints chat bot [[code](./RetrievalAugmentedGeneration/examples/nvidia_api_catalog/), [docs](https://nvidia.github.io/GenerativeAIExamples/latest/api-catalog.html)] | No | No | Yes | Yes | Milvus or pgvector |
-| llama-2 | e5-large-v2 | LlamaIndex | Canonical QA Chatbot [[code](./RetrievalAugmentedGeneration/examples/developer_rag/), [docs](https://nvidia.github.io/GenerativeAIExamples/latest/local-gpu.html)] | [Yes](https://nvidia.github.io/GenerativeAIExamples/latest/multi-gpu.html) | Yes | No | Yes | Milvus or pgvector |
+| mixtral_8x7b | ai-embed-qa-4 | LangChain | NVIDIA API Catalog endpoints chat bot [[code](./RetrievalAugmentedGeneration/examples/nvidia_api_catalog/), [docs](https://nvidia.github.io/GenerativeAIExamples/latest/api-catalog.html)] | No | No | Yes | Yes | Milvus or pgvector |
+| llama-2 | UAE-Large-V1 | LlamaIndex | Canonical QA Chatbot [[code](./RetrievalAugmentedGeneration/examples/developer_rag/), [docs](https://nvidia.github.io/GenerativeAIExamples/latest/local-gpu.html)] | [Yes](https://nvidia.github.io/GenerativeAIExamples/latest/multi-gpu.html) | Yes | No | Yes | Milvus or pgvector |
 | llama-2 | all-MiniLM-L6-v2 | LlamaIndex | Chat bot, GeForce, Windows [[repo](https://github.com/NVIDIA/trt-llm-rag-windows/tree/release/1.0)] | No | Yes | No | No | FAISS |
-| llama-2 | nvolveqa_40k | LangChain | Chat bot with query decomposition agent [[code](./RetrievalAugmentedGeneration/examples/query_decomposition_rag/), [docs](https://nvidia.github.io/GenerativeAIExamples/latest/query-decomposition.html)] | No | No | Yes | Yes | Milvus or pgvector |
-| mixtral_8x7b | nvolveqa_40k | LangChain | Minimilastic example: RAG with NVIDIA AI Foundation Models [[code](./examples/5_mins_rag_no_gpu/), [README](./examples/README.md#rag-in-5-minutes-example)] | No | No | Yes | Yes | FAISS |
-| mixtral_8x7b<br>Deplot<br>Neva-22b | nvolveqa_40k | Custom | Chat bot with multimodal data [[code](./RetrievalAugmentedGeneration/examples/multimodal_rag/), [docs](https://nvidia.github.io/GenerativeAIExamples/latest/multimodal-data.html)] | No | No | Yes | No | Milvus or pvgector |
-| llama-2 | e5-large-v2 | LlamaIndex | Chat bot with quantized LLM model [[docs](https://nvidia.github.io/GenerativeAIExamples/latest/quantized-llm-model.html)] | Yes | Yes | No | Yes | Milvus or pgvector |
-| mixtral_8x7b | none | PandasAI | Chat bot with structured data [[code](./RetrievalAugmentedGeneration/examples/structured_data_rag/), [docs](https://nvidia.github.io/GenerativeAIExamples/latest/structured-data.html)] | No | No | Yes | No | none |
-| llama-2 | nvolveqa_40k | LangChain | Chat bot with multi-turn conversation [[code](./RetrievalAugmentedGeneration/examples/multi_turn_rag/), [docs](https://nvidia.github.io/GenerativeAIExamples/latest/multi-turn.html)] | No | No | Yes | No | Milvus or pgvector |
+| llama-2 | ai-embed-qa-4 | LangChain | Chat bot with query decomposition agent [[code](./RetrievalAugmentedGeneration/examples/query_decomposition_rag/), [docs](https://nvidia.github.io/GenerativeAIExamples/latest/query-decomposition.html)] | No | No | Yes | Yes | Milvus or pgvector |
+| mixtral_8x7b | ai-embed-qa-4 | LangChain | Minimilastic example: RAG with NVIDIA AI Foundation Models [[code](./examples/5_mins_rag_no_gpu/), [README](./examples/README.md#rag-in-5-minutes-example)] | No | No | Yes | Yes | FAISS |
+| mixtral_8x7b<br>Deplot<br>Neva-22b | ai-embed-qa-4 | Custom | Chat bot with multimodal data [[code](./RetrievalAugmentedGeneration/examples/multimodal_rag/), [docs](https://nvidia.github.io/GenerativeAIExamples/latest/multimodal-data.html)] | No | No | Yes | No | Milvus or pvgector |
+| llama-2 | UAE-Large-V1 | LlamaIndex | Chat bot with quantized LLM model [[docs](https://nvidia.github.io/GenerativeAIExamples/latest/quantized-llm-model.html)] | Yes | Yes | No | Yes | Milvus or pgvector |
+| llama3-70b | none | PandasAI | Chat bot with structured data [[code](./RetrievalAugmentedGeneration/examples/structured_data_rag/), [docs](https://nvidia.github.io/GenerativeAIExamples/latest/structured-data.html)] | No | No | Yes | No | none |
+| llama-2 | ai-embed-qa-4 | LangChain | Chat bot with multi-turn conversation [[code](./RetrievalAugmentedGeneration/examples/multi_turn_rag/), [docs](https://nvidia.github.io/GenerativeAIExamples/latest/multi-turn.html)] | No | No | Yes | No | Milvus or pgvector |
 
 ### Enterprise RAG Examples
 
5 changes: 4 additions & 1 deletion RetrievalAugmentedGeneration/Dockerfile
@@ -8,7 +8,7 @@ ENV DEBIAN_FRONTEND noninteractive
 
 # Install required ubuntu packages for setting up python 3.10
 RUN apt update && \
-apt install -y dpkg openssl libgl1 linux-libc-dev libksba8 curl software-properties-common build-essential libssl-dev libffi-dev && \
+apt install -y curl software-properties-common libgl1 libglib2.0-0 && \
 add-apt-repository ppa:deadsnakes/ppa && \
 apt update && apt install -y python3.10 python3.10-dev python3.10-distutils && \
 apt-get clean
@@ -18,6 +18,9 @@ RUN curl -sS https://bootstrap.pypa.io/get-pip.py | python3.10
 
 RUN rm -rf /var/lib/apt/lists/*
 
+# Uninstall build packages
+RUN apt autoremove -y curl software-properties-common
+
 # Install common dependencies for all examples
 RUN --mount=type=bind,source=RetrievalAugmentedGeneration/requirements.txt,target=/opt/requirements.txt \
 pip3 install --no-cache-dir -r /opt/requirements.txt
8 changes: 4 additions & 4 deletions RetrievalAugmentedGeneration/common/configuration.py
@@ -57,7 +57,7 @@ class LLMConfig(ConfigWizard):
 
     server_url: str = configfield(
         "server_url",
-        default="localhost:8001",
+        default="",
         help_txt="The location of the Triton server hosting the llm model.",
     )
     model_name: str = configfield(
@@ -86,7 +86,7 @@ class TextSplitterConfig(ConfigWizard):
 
     model_name: str = configfield(
         "model_name",
-        default="intfloat/e5-large-v2",
+        default="WhereIsAI/UAE-Large-V1",
         help_txt="The name of Sentence Transformer model used for SentenceTransformer TextSplitter.",
     )
     chunk_size: int = configfield(
@@ -110,7 +110,7 @@ class EmbeddingConfig(ConfigWizard):
 
     model_name: str = configfield(
         "model_name",
-        default="intfloat/e5-large-v2",
+        default="WhereIsAI/UAE-Large-V1",
         help_txt="The name of huggingface embedding model.",
     )
     model_engine: str = configfield(
@@ -125,7 +125,7 @@ class EmbeddingConfig(ConfigWizard):
     )
     server_url: str = configfield(
         "server_url",
-        default="localhost:9080",
+        default="",
         help_txt="The url of the server hosting nemo embedding model",
     )
 
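The `configuration.py` hunks change both `server_url` defaults (`localhost:8001` for the LLM, `localhost:9080` for the embedding server) to an empty string. One plausible reading, consistent with the changelog's note on switching between API Catalog models and on-prem NIM-LLM models, is that an empty value means "no self-hosted server configured". The sketch below illustrates that rule under this assumption only: `resolve_llm_base_url` is a hypothetical helper, not a function from this repository, and adding `http://` to a bare `host:port` value is likewise an assumption.

```python
from typing import Optional


def resolve_llm_base_url(server_url: str) -> Optional[str]:
    """Hypothetical routing rule for an LLM client.

    An empty server_url (the new default in this commit) is read as
    "no self-hosted server", so the caller would fall back to hosted
    API Catalog models. A non-empty value is normalized into a base URL
    for a self-hosted endpoint.
    """
    server_url = server_url.strip()
    if not server_url:
        return None  # caller falls back to the hosted API Catalog
    if not server_url.startswith(("http://", "https://")):
        server_url = "http://" + server_url  # assumption: bare host:port means plain HTTP
    return server_url.rstrip("/")


if __name__ == "__main__":
    for value in ["", "localhost:8001", "https://nim.internal:8000/"]:
        print(repr(value), "->", resolve_llm_base_url(value))
```

Keeping the decision in one small function makes the fallback explicit and easy to test. Whether the chain server applies exactly this rule should be confirmed against its own source.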