Fciannella added nvidia text model #265

Open · wants to merge 5 commits into base: main
1 change: 1 addition & 0 deletions community/multimodal_retrieval/.dockerignore
@@ -0,0 +1 @@
mongodbdata/
146 changes: 146 additions & 0 deletions community/multimodal_retrieval/README.md
@@ -0,0 +1,146 @@
# Introduction

This is a multimodal retrieval tool that uses long context. It lets you ingest HTML documents and ask questions about them, finding answers inside the document's images and tables.

Here is an example:

![Finding an answer inside a table](assets/table_example.png)
![Finding an answer inside a chart](assets/image_example.png)

The tool uses an OpenAI vision model or an NVIDIA vision model (Llama 3.2 90B Vision).


### Setup details

Two components need to be spun up:

- LangGraph, which runs the agent
- MongoDB and LangServe, which run the database and supporting services, along with the Gradio UI for testing

The idea is that the Gradio UI lets you ingest HTML documents, and you then query the agent served by LangGraph.



# QuickStart

In this setup we launch the LangGraph agent in dev mode on the host machine, while the rest of the stack runs in Docker containers configured through Docker Compose.
You can also launch LangGraph itself in containers with `langgraph up`; in that case you don't need the extra `.env.lg` file (see below).

## LangGraph setup on the host machine

Run these commands from the root of the repository (the directory containing the `langgraph.json` and `docker-compose.yml` files).

Create and activate a virtual environment, then install the dependencies:

```shell
python3 -m venv lg-venv
source ./lg-venv/bin/activate
pip install -r requirements.txt
```


## Create the env files

You need to create two `.env` files: one for Docker Compose and one for the LangGraph agent.

The configuration below gives you the option of using an NVIDIA text model for the purely text-based tasks.

For the LangGraph agent we keep OpenAI as the LLM, since at the moment it provides better results with tool binding.

### .env

Create a `.env` file in the root directory of this repository (the directory containing the `langgraph.json` and `docker-compose.yml` files):

```shell
# .env
MONGO_INITDB_ROOT_USERNAME=admin
MONGO_INITDB_ROOT_PASSWORD=secret
MONGO_HOST=localhost
MONGO_PORT=27017
AGENTS_PORT=2024
OPENAI_API_KEY=
LANGCHAIN_API_KEY=
LANGSMITH_API_KEY=
LANGGRAPH_CLOUD_LICENSE_KEY=
NVIDIA_API_KEY=
IMAGES_HOST=localhost
NVIDIA_VISION_MODEL=meta/llama-3.2-90b-vision-instruct
NVIDIA_TEXT_MODEL=meta/llama-3.3-70b-instruct
TEXT_MODEL_PROVIDER=nvidia
```

Normally `LANGCHAIN_API_KEY` and `LANGSMITH_API_KEY` have the same value.
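To illustrate how `TEXT_MODEL_PROVIDER` might be consumed, here is a hypothetical helper that picks the text model from these variables; the actual agent code may wire this differently, and the `gpt-4o` fallback is an assumption, not something this repository specifies.

```python
def pick_text_model(env: dict) -> str:
    """Return the model identifier to use for pure text tasks,
    based on TEXT_MODEL_PROVIDER (hypothetical sketch)."""
    provider = env.get("TEXT_MODEL_PROVIDER", "openai")
    if provider == "nvidia":
        # Fall back to the default from this README's .env example.
        return env.get("NVIDIA_TEXT_MODEL", "meta/llama-3.3-70b-instruct")
    # Assumed OpenAI default; the real agent may choose differently.
    return env.get("OPENAI_TEXT_MODEL", "gpt-4o")

print(pick_text_model({"TEXT_MODEL_PROVIDER": "nvidia"}))
```

With `TEXT_MODEL_PROVIDER=nvidia` and the values above, this selects `meta/llama-3.3-70b-instruct` for text tasks while the vision model stays separate.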

### .env.lg

This file is needed because we launch LangGraph in dev mode on the host: for the agent to reach MongoDB, the MongoDB hostname must be set to localhost.

Place it in the root of the repository (the directory containing the `langgraph.json` and `docker-compose.yml` files):

```shell
MONGO_INITDB_ROOT_USERNAME=admin
MONGO_INITDB_ROOT_PASSWORD=secret
MONGO_HOST=localhost
MONGO_PORT=27017
AGENTS_PORT=2024
OPENAI_API_KEY=
LANGCHAIN_API_KEY=
LANGSMITH_API_KEY=
LANGGRAPH_CLOUD_LICENSE_KEY=
NVIDIA_API_KEY=
IMAGES_HOST=localhost
NVIDIA_VISION_MODEL=meta/llama-3.2-90b-vision-instruct
NVIDIA_TEXT_MODEL=meta/llama-3.3-70b-instruct
TEXT_MODEL_PROVIDER=nvidia
```

# Launch the MongoDB and Gradio services

Update the `.env` file, adding your API keys.

Launch the Docker Compose services:

```shell
docker compose up --build
```
Then you can connect to `http://localhost:7860` to ingest documents.

# Launch LangGraph

```bash
langgraph dev --host 0.0.0.0
```

## Test LangGraph

```bash
curl --request POST \
--url http://localhost:2024/runs/stream \
--header 'Content-Type: application/json' \
--data '{
"assistant_id": "agent",
"input": {
"messages": [
{
"role": "user",
"content": "What is the harness?"
}
]
},
"metadata": {},
"config": {
"configurable": {
"collection_name": "test",
"document_id": "8eb8f7396e4fe72595e6577c35a7a587"
}
},
"multitask_strategy": "reject",
"stream_mode": [
"values"
]
}'

```
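The same request can be issued from Python. This sketch builds the request body shown in the curl example above; the `collection_name` and `document_id` values are the placeholders from this README, and you would POST the payload to `http://localhost:2024/runs/stream` (e.g. with `requests`).

```python
import json

def build_run_payload(question: str, collection: str, document_id: str) -> dict:
    """Build the same JSON body as the curl example above."""
    return {
        "assistant_id": "agent",
        "input": {"messages": [{"role": "user", "content": question}]},
        "metadata": {},
        "config": {
            "configurable": {
                "collection_name": collection,
                "document_id": document_id,
            }
        },
        "multitask_strategy": "reject",
        "stream_mode": ["values"],
    }

payload = build_run_payload(
    "What is the harness?", "test", "8eb8f7396e4fe72595e6577c35a7a587"
)
print(json.dumps(payload, indent=2))
# To send it: requests.post("http://localhost:2024/runs/stream", json=payload)
```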
