doc: fix heading levels (opea-project#690)
Only one H1 for the title is allowed

Signed-off-by: David B. Kinder <[email protected]>
dbkinder authored Sep 14, 2024
1 parent 3c5fc80 commit f8f8854
Showing 2 changed files with 114 additions and 114 deletions.
144 changes: 72 additions & 72 deletions comps/dataprep/vdms/README.md
@@ -6,9 +6,9 @@ For dataprep microservice, we currently provide one framework: `Langchain`.

We organized the folders in the same way, so you can use either framework for dataprep microservice with the following constructions.

# 🚀1. Start Microservice with Python (Option 1)
## 🚀1. Start Microservice with Python (Option 1)

## 1.1 Install Requirements
### 1.1 Install Requirements

Install Single-process version (for 1-10 files processing)

@@ -25,11 +25,11 @@ pip install -r requirements.txt
cd langchain_ray; pip install -r requirements_ray.txt
``` -->

## 1.2 Start VDMS Server
### 1.2 Start VDMS Server

Please refer to this [readme](../../vectorstores/vdms/README.md).
Refer to this [readme](../../vectorstores/vdms/README.md).

## 1.3 Setup Environment Variables
### 1.3 Setup Environment Variables

```bash
export http_proxy=${your_http_proxy}
@@ -40,7 +40,7 @@ export COLLECTION_NAME=${your_collection_name}
export PYTHONPATH=${path_to_comps}
```

## 1.4 Start Document Preparation Microservice for VDMS with Python Script
### 1.4 Start Document Preparation Microservice for VDMS with Python Script

Start document preparation microservice for VDMS with below command.

@@ -56,13 +56,13 @@ python prepare_doc_vdms.py
python prepare_doc_redis_on_ray.py
``` -->

# 🚀2. Start Microservice with Docker (Option 2)
## 🚀2. Start Microservice with Docker (Option 2)

## 2.1 Start VDMS Server
### 2.1 Start VDMS Server

Please refer to this [readme](../../vectorstores/vdms/README.md).
Refer to this [readme](../../vectorstores/vdms/README.md).

## 2.2 Setup Environment Variables
### 2.2 Setup Environment Variables

```bash
export http_proxy=${your_http_proxy}
@@ -76,24 +76,24 @@ export DISTANCE_STRATEGY="L2"
export PYTHONPATH=${path_to_comps}
```
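
Before starting the service, it can help to confirm that the exported variables are actually visible to the process. The following is a minimal sketch, not part of the README: the helper name `missing_vars` is hypothetical, and the variable list is only an illustrative selection of names that appear in this diff, so adjust it to the full set your deployment needs.

```python
import os

# Illustrative selection of variable names that appear in this diff (an assumption,
# not the authoritative list required by the service).
REQUIRED_VARS = ("COLLECTION_NAME", "DISTANCE_STRATEGY", "PYTHONPATH")


def missing_vars(names=REQUIRED_VARS, env=None):
    """Return the names that are unset or empty in the given environment mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


missing = missing_vars()
if missing:
    print("Unset variables:", ", ".join(missing))
else:
    print("All listed variables are set.")
```

Passing `env` explicitly keeps the check easy to exercise without touching the real environment.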

## 2.3 Build Docker Image
### 2.3 Build Docker Image

- Build docker image with langchain

Start single-process version (for 1-10 files processing)
Start single-process version (for 1-10 files processing)

```bash
cd ../../../
docker build -t opea/dataprep-vdms:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/dataprep/vdms/langchain/Dockerfile .
```
```bash
cd ../../../
docker build -t opea/dataprep-vdms:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/dataprep/vdms/langchain/Dockerfile .
```

<!-- - option 2: Start multi-process version (for >10 files processing)
```bash
cd ../../../../
docker build -t opea/dataprep-on-ray-vdms:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/dataprep/vdms/langchain_ray/Dockerfile . -->

## 2.4 Run Docker with CLI
### 2.4 Run Docker with CLI

Start single-process version (for 1-10 files processing)

@@ -113,75 +113,75 @@ docker run -d --name="dataprep-vdms-server" -p 6007:6007 --runtime=runc --ipc=ho
-e TIMEOUT_SECONDS=600 opea/dataprep-on-ray-vdms:latest
``` -->

# 🚀3. Status Microservice
## 🚀3. Status Microservice

```bash
docker container logs -f dataprep-vdms-server
```

# 🚀4. Consume Microservice
## 🚀4. Consume Microservice

Once document preparation microservice for VDMS is started, user can use below command to invoke the microservice to convert the document to embedding and save to the database.

Make sure the file path after `files=@` is correct.

- Single file upload

```bash
curl -X POST \
-H "Content-Type: multipart/form-data" \
-F "files=@./file1.txt" \
http://localhost:6007/v1/dataprep
```
```bash
curl -X POST \
-H "Content-Type: multipart/form-data" \
-F "files=@./file1.txt" \
http://localhost:6007/v1/dataprep
```

You can specify chunk_size and chunk_size by the following commands.
You can specify `chunk_size` and `chunk_overlap` by the following commands.

```bash
curl -X POST \
-H "Content-Type: multipart/form-data" \
-F "files=@./LLAMA2_page6.pdf" \
-F "chunk_size=1500" \
-F "chunk_overlap=100" \
http://localhost:6007/v1/dataprep
```
```bash
curl -X POST \
-H "Content-Type: multipart/form-data" \
-F "files=@./LLAMA2_page6.pdf" \
-F "chunk_size=1500" \
-F "chunk_overlap=100" \
http://localhost:6007/v1/dataprep
```

- Multiple file upload

```bash
curl -X POST \
-H "Content-Type: multipart/form-data" \
-F "files=@./file1.txt" \
-F "files=@./file2.txt" \
-F "files=@./file3.txt" \
http://localhost:6007/v1/dataprep
```

- Links upload (not supported for llama_index now)

```bash
curl -X POST \
-F 'link_list=["https://www.ces.tech/"]' \
http://localhost:6007/v1/dataprep
```

or

```python
import requests
import json

proxies = {"http": ""}
url = "http://localhost:6007/v1/dataprep"
urls = [
"https://towardsdatascience.com/no-gpu-no-party-fine-tune-bert-for-sentiment-analysis-with-vertex-ai-custom-jobs-d8fc410e908b?source=rss----7f60cf5620c9---4"
]
payload = {"link_list": json.dumps(urls)}

try:
resp = requests.post(url=url, data=payload, proxies=proxies)
print(resp.text)
resp.raise_for_status() # Raise an exception for unsuccessful HTTP status codes
print("Request successful!")
except requests.exceptions.RequestException as e:
print("An error occurred:", e)
```
```bash
curl -X POST \
-H "Content-Type: multipart/form-data" \
-F "files=@./file1.txt" \
-F "files=@./file2.txt" \
-F "files=@./file3.txt" \
http://localhost:6007/v1/dataprep
```

- Links upload (not supported for `llama_index` now)

```bash
curl -X POST \
-F 'link_list=["https://www.ces.tech/"]' \
http://localhost:6007/v1/dataprep
```

or

```python
import requests
import json

proxies = {"http": ""}
url = "http://localhost:6007/v1/dataprep"
urls = [
"https://towardsdatascience.com/no-gpu-no-party-fine-tune-bert-for-sentiment-analysis-with-vertex-ai-custom-jobs-d8fc410e908b?source=rss----7f60cf5620c9---4"
]
payload = {"link_list": json.dumps(urls)}

try:
resp = requests.post(url=url, data=payload, proxies=proxies)
print(resp.text)
resp.raise_for_status() # Raise an exception for unsuccessful HTTP status codes
print("Request successful!")
except requests.exceptions.RequestException as e:
print("An error occurred:", e)
```
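
The link upload above sends `link_list` as a JSON-encoded list inside a form field. As a rough sketch of that encoding using only the Python standard library, the snippet below builds the request without sending it, so the payload can be inspected first. The helper name `build_link_request` is hypothetical, and the URL-encoded form body mirrors what `requests.post(..., data=payload)` produces; whether the service treats it identically to the multipart `curl -F` form is an assumption.

```python
import json
from urllib.parse import parse_qs, urlencode
from urllib.request import Request

DATAPREP_URL = "http://localhost:6007/v1/dataprep"  # endpoint used in the README examples


def build_link_request(links, url=DATAPREP_URL):
    """Build (but do not send) a POST whose `link_list` field is a JSON-encoded list."""
    links = list(links)
    if not links:
        raise ValueError("links must contain at least one URL")
    # The list is serialized to JSON first, then form-encoded as a single field.
    body = urlencode({"link_list": json.dumps(links)}).encode("utf-8")
    return Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


req = build_link_request(["https://www.ces.tech/"])
print(req.get_method(), req.full_url)
# Decode the body again to show the JSON string carried by the form field.
print(parse_qs(req.data.decode("utf-8"))["link_list"][0])
```

To actually send it, the request object could be passed to `urllib.request.urlopen(req)` once the service is running.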
84 changes: 42 additions & 42 deletions comps/dataprep/vdms/multimodal_langchain/README.md
@@ -2,25 +2,25 @@

For dataprep microservice, we currently provide one framework: `Langchain`.

# 🚀1. Start Microservice with Python (Option 1)
## 🚀1. Start Microservice with Python (Option 1)

## 1.1 Install Requirements
### 1.1 Install Requirements

- option 1: Install Single-process version (for 1-10 files processing)

```bash
apt-get update
apt-get install -y default-jre tesseract-ocr libtesseract-dev poppler-utils
pip install -r requirements.txt
```
```bash
apt-get update
apt-get install -y default-jre tesseract-ocr libtesseract-dev poppler-utils
pip install -r requirements.txt
```

## 1.2 Start VDMS Server
### 1.2 Start VDMS Server

```bash
docker run -d --name="vdms-vector-db" -p 55555:55555 intellabs/vdms:latest
```

## 1.3 Setup Environment Variables
### 1.3 Setup Environment Variables

```bash
export http_proxy=${your_http_proxy}
@@ -33,23 +33,23 @@ export your_hf_api_token="{your_hf_token}"
export PYTHONPATH=${path_to_comps}
```

## 1.4 Start Data Preparation Microservice for VDMS with Python Script
### 1.4 Start Data Preparation Microservice for VDMS with Python Script

Start document preparation microservice for VDMS with below command.

```bash
python ingest_videos.py
```

# 🚀2. Start Microservice with Docker (Option 2)
## 🚀2. Start Microservice with Docker (Option 2)

## 2.1 Start VDMS Server
### 2.1 Start VDMS Server

```bash
docker run -d --name="vdms-vector-db" -p 55555:55555 intellabs/vdms:latest
```

## 2.1 Setup Environment Variables
### 2.1 Setup Environment Variables

```bash
export http_proxy=${your_http_proxy}
@@ -61,64 +61,64 @@ export INDEX_NAME="rag-vdms"
export your_hf_api_token="{your_hf_token}"
```

## 2.3 Build Docker Image
### 2.3 Build Docker Image

- Build docker image

```bash
cd ../../../
docker build -t opea/dataprep-vdms:latest --network host --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/dataprep/vdms/multimodal_langchain/Dockerfile .
```bash
cd ../../../
docker build -t opea/dataprep-vdms:latest --network host --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/dataprep/vdms/multimodal_langchain/Dockerfile .

```
```

## 2.4 Run Docker Compose
### 2.4 Run Docker Compose

```bash
docker compose -f comps/dataprep/vdms/multimodal_langchain/docker-compose-dataprep-vdms.yaml up -d
```

# 🚀3. Status Microservice
## 🚀3. Status Microservice

```bash
docker container logs -f dataprep-vdms-server
```

# 🚀4. Consume Microservice
## 🚀4. Consume Microservice

Once data preparation microservice for VDMS is started, user can use below command to invoke the microservice to convert the videos to embedding and save to the database.

Make sure the file path after `files=@` is correct.

- Single file upload

```bash
curl -X POST \
-H "Content-Type: multipart/form-data" \
-F "files=@./file1.mp4" \
http://localhost:6007/v1/dataprep
```
```bash
curl -X POST \
-H "Content-Type: multipart/form-data" \
-F "files=@./file1.mp4" \
http://localhost:6007/v1/dataprep
```

- Multiple file upload

```bash
curl -X POST \
-H "Content-Type: multipart/form-data" \
-F "files=@./file1.mp4" \
-F "files=@./file2.mp4" \
-F "files=@./file3.mp4" \
http://localhost:6007/v1/dataprep
```
```bash
curl -X POST \
-H "Content-Type: multipart/form-data" \
-F "files=@./file1.mp4" \
-F "files=@./file2.mp4" \
-F "files=@./file3.mp4" \
http://localhost:6007/v1/dataprep
```

- List of uploaded files

```bash
curl -X GET http://localhost:6007/v1/dataprep/get_videos
```
```bash
curl -X GET http://localhost:6007/v1/dataprep/get_videos
```

- Download uploaded files

Please use the file name from the list
Use the file name from the list

```bash
curl -X GET http://localhost:6007/v1/dataprep/get_file/${filename}
```
```bash
curl -X GET http://localhost:6007/v1/dataprep/get_file/${filename}
```
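
The list and download endpoints above differ only in their path, and the download path embeds a file name taken from the list. The sketch below shows one way to build those URLs safely in Python; the helper names `list_videos_url` and `download_url` are hypothetical, and percent-encoding the file name is an assumption about how names with spaces or special characters should be passed.

```python
from urllib.parse import quote

BASE_URL = "http://localhost:6007/v1/dataprep"  # base endpoint used in the README examples


def list_videos_url(base=BASE_URL):
    """URL that returns the list of uploaded files."""
    return f"{base}/get_videos"


def download_url(filename, base=BASE_URL):
    """URL for downloading one uploaded file, with the name percent-encoded."""
    if not filename:
        raise ValueError("filename must be a non-empty string")
    # safe="" also encodes "/" so the name stays a single path segment.
    return f"{base}/get_file/{quote(filename, safe='')}"


print(list_videos_url())
print(download_url("my video.mp4"))
```

The resulting strings can be passed to `curl -X GET` or to any HTTP client.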
