From 1ec71a5a617b540db67747bf8f9f8e527850514a Mon Sep 17 00:00:00 2001
From: alexsin368 <109180236+alexsin368@users.noreply.github.com>
Date: Mon, 18 Nov 2024 18:10:33 -0800
Subject: [PATCH 1/5] Add codegen sample guide for Gaudi deployment (#248)

* adding codegen sample guide for gaudi deployment

Signed-off-by: alexsin368

* update with TAG instead of version number

Signed-off-by: alexsin368

* remove mention of vllm

Signed-off-by: alexsin368

* adding codegen sample guide for gaudi deployment

Signed-off-by: alexsin368

* update with TAG instead of version number

Signed-off-by: alexsin368

* remove mention of vllm

Signed-off-by: alexsin368

* fix typos, add link to ITAC, attempt to fix IO flow diagram link issue

Signed-off-by: alexsin368

* modify and add index.rst files

Signed-off-by: alexsin368

* make text output to not think it's a hyperlink

Signed-off-by: alexsin368

---------

Signed-off-by: alexsin368
Co-authored-by: Ying Hu
---
 examples/CodeGen/deploy/gaudi.md  | 375 ++++++++++++++++++++++++++++++
 examples/CodeGen/deploy/index.rst |  14 ++
 examples/index.rst                |   2 +
 3 files changed, 391 insertions(+)
 create mode 100644 examples/CodeGen/deploy/gaudi.md
 create mode 100644 examples/CodeGen/deploy/index.rst

diff --git a/examples/CodeGen/deploy/gaudi.md b/examples/CodeGen/deploy/gaudi.md
new file mode 100644
index 00000000..d4dfafa9
--- /dev/null
+++ b/examples/CodeGen/deploy/gaudi.md
@@ -0,0 +1,375 @@
# Single node on-prem deployment with TGI on Gaudi AI Accelerator

This section covers the single-node on-prem deployment of the CodeGen
example using OPEA components and the TGI service. It shows how
to build an end-to-end (e2e) CodeGen solution with the CodeLlama-7b-hf model, deployed on Intel®
Tiber™ AI Cloud ([ITAC](https://www.intel.com/content/www/us/en/developer/tools/tiber/ai-cloud.html)).
To learn about OPEA in just 5 minutes and to set up the required hardware and software,
follow the instructions in the [Getting Started](https://opea-project.github.io/latest/getting-started/README.html)
section. If you do not have an ITAC instance, or your hardware is not yet supported on ITAC, you can still run this deployment on-prem.

## Overview

The CodeGen use case uses a single microservice called LLM. In this tutorial, we
will walk through how to enable it from OPEA GenAIComps and deploy it as a
single-node megaservice solution served by TGI.

The solution shows how to use the CodeLlama-7b-hf model on the Intel®
Gaudi® AI Accelerator. We will go through how to set up Docker containers to start
the microservice and megaservice. The solution then takes text input as the
prompt and generates code accordingly. It is deployed with a UI that offers two modes to
choose from:

1. Svelte-Based UI
2. React-Based UI

The React-based UI is optional; it is supported in this example if you
are interested in using it.

Below is the list of content we will be covering in this tutorial:

1. Prerequisites
2. Prepare (Building / Pulling) Docker images
3. Use case setup
4. Deploy the use case
5. Interacting with CodeGen deployment

## Prerequisites

The first step is to clone the GenAIExamples and GenAIComps repositories. GenAIComps
provides the fundamental components used to build the examples you find in
GenAIExamples and to deploy them as microservices. Also set the `TAG`
environment variable to the release version.
```bash
git clone https://github.com/opea-project/GenAIComps.git
git clone https://github.com/opea-project/GenAIExamples.git
export TAG=1.1
```

The examples use model weights from HuggingFace and langchain.

Set up your [HuggingFace](https://huggingface.co/) account and generate a
[user access token](https://huggingface.co/docs/transformers.js/en/guides/private#step-1-generating-a-user-access-token).

Set the HuggingFace token:
```
export HUGGINGFACEHUB_API_TOKEN="Your_Huggingface_API_Token"
```

Additionally, if you plan to use the default model CodeLlama-7b-hf, you will
need to [request access](https://huggingface.co/meta-llama/CodeLlama-7b-hf) from HuggingFace.

The example requires the `host_ip` environment variable so that the microservices can be
deployed on an endpoint with exposed ports. Set it as follows:
```
export host_ip=$(hostname -I | awk '{print $1}')
```

Make sure to set up the proxies if you are behind a firewall:
```
export no_proxy=${your_no_proxy},$host_ip
export http_proxy=${your_http_proxy}
export https_proxy=${your_https_proxy}
```

## Prepare (Building / Pulling) Docker images

This step covers building or pulling the relevant Docker images step by step,
with a sanity check at the end. For CodeGen, the following Docker image is
needed: LLM with TGI. Additionally, you will need to build Docker images for the
CodeGen megaservice and the UI (the React UI is optional). In total,
there are **3 required docker images** and one optional docker image.

### Build/Pull Microservice image

::::::{tab-set}

:::::{tab-item} Pull
:sync: Pull

If you decide to pull the Docker images instead of building them locally,
you can proceed to the next step, where all the necessary images will
be pulled from Docker Hub.

:::::
:::::{tab-item} Build
:sync: Build

From within the `GenAIComps` folder, check out the release tag:
```
cd GenAIComps
git checkout tags/v${TAG}
```

#### Build LLM Image

```bash
docker build --no-cache -t opea/llm-tgi:${TAG} --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/llms/text-generation/tgi/Dockerfile .
```

### Build Mega Service images

The megaservice is a pipeline that channels data through different
microservices, each performing a different task. The LLM microservice and the
flow of data are defined in the `codegen.py` file. You can also add or
remove microservices and customize the megaservice to suit your needs.

Build the megaservice image for this use case:

```bash
cd ..
cd GenAIExamples
git checkout tags/v${TAG}
cd CodeGen
```

```bash
docker build --no-cache -t opea/codegen:${TAG} --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f Dockerfile .
cd ../..
```

### Build the UI Image

You can build either of two UIs.

*Svelte UI*

```bash
cd GenAIExamples/CodeGen/ui/
docker build --no-cache -t opea/codegen-ui:${TAG} --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f ./docker/Dockerfile .
cd ../../..
```

*React UI (Optional)*
Build this image if you want a React-based frontend.

```bash
cd GenAIExamples/CodeGen/ui/
docker build --no-cache -t opea/codegen-react-ui:${TAG} --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f ./docker/Dockerfile.react .
cd ../../..
```

### Sanity Check
Check that you have the following set of Docker images by running the `docker images` command before moving on to the next step.
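If you prefer a scripted check, the one-liner below (a convenience sketch, assuming the default `opea` image prefix and the `TAG` variable exported earlier) filters the `docker images` output down to exactly the images listed next:

```bash
# Show only the CodeGen-related images tagged with the current TAG
docker images --format '{{.Repository}}:{{.Tag}}' \
  | grep -E "^opea/(llm-tgi|codegen|codegen-ui|codegen-react-ui):${TAG}$"
```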
The tags correspond to the value you set for the `TAG` environment variable.

* `opea/llm-tgi:${TAG}`
* `opea/codegen:${TAG}`
* `opea/codegen-ui:${TAG}`
* `opea/codegen-react-ui:${TAG}` (optional)

:::::
::::::

## Use Case Setup

The use case uses the following combination of GenAIComps and tools:

|Use Case Components | Tools | Model | Service Type |
|---------------- |--------------|-----------------------------|-------|
|LLM | TGI | meta-llama/CodeLlama-7b-hf | OPEA Microservice |
|UI | | NA | Gateway Service |

The tools and models mentioned in the table are configurable through either
environment variables or the `compose.yaml` file.

Set the necessary environment variables to set up the use case by running the `set_env.sh` script.
This is where the environment variable `LLM_MODEL_ID` is set; you can change it to another model
by specifying the HuggingFace model card ID.

```bash
cd GenAIExamples/CodeGen/docker_compose/
source ./set_env.sh
cd ../../..
```

## Deploy the Use Case

In this tutorial, we deploy via Docker Compose with the provided
YAML file. Docker Compose starts all the above-mentioned services as
containers.

```bash
cd GenAIExamples/CodeGen/docker_compose/intel/hpu/gaudi
docker compose up -d
```


### Checks to Ensure the Services are Running
#### Check Startup and Env Variables
Check the startup log by running `docker compose logs` to ensure there are no errors.
Warning messages print out the variables that are **NOT** set.

Here are some sample messages if the proxy environment variables are not set:

    WARN[0000] The "no_proxy" variable is not set. Defaulting to a blank string.
    WARN[0000] The "https_proxy" variable is not set. Defaulting to a blank string.
    WARN[0000] The "http_proxy" variable is not set. Defaulting to a blank string.
    WARN[0000] The "no_proxy" variable is not set. Defaulting to a blank string.
    WARN[0000] The "https_proxy" variable is not set. Defaulting to a blank string.
    WARN[0000] The "http_proxy" variable is not set. Defaulting to a blank string.
    WARN[0000] The "no_proxy" variable is not set. Defaulting to a blank string.
    WARN[0000] The "http_proxy" variable is not set. Defaulting to a blank string.
    WARN[0000] The "https_proxy" variable is not set. Defaulting to a blank string.
    WARN[0000] The "no_proxy" variable is not set. Defaulting to a blank string.
    WARN[0000] The "http_proxy" variable is not set. Defaulting to a blank string.
    WARN[0000] The "https_proxy" variable is not set. Defaulting to a blank string.

#### Check the Container Status
Check whether all the containers launched via Docker Compose have started.

The CodeGen example starts 4 docker containers. Check that all of them are
running, i.e., that the `STATUS` of each container is `Up`.
You can check this with the `docker ps -a` command; sample output is shown
after the scripted check below.
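The scripted check is a convenience sketch; it assumes the default container names used by this example's `compose.yaml`, which match the sample listing that follows:

```bash
# Print the name and status of each of the four CodeGen containers
for name in codegen-gaudi-ui-server codegen-gaudi-backend-server \
            llm-tgi-gaudi-server tgi-gaudi-server; do
  docker ps --all --filter "name=${name}" --format '{{.Names}}: {{.Status}}'
done
```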
+ +``` +CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES +bbd235074c3d opea/codegen-ui:${TAG} "docker-entrypoint.s…" About a minute ago Up About a minute 0.0.0.0:5173->5173/tcp, :::5173->5173/tcp codegen-gaudi-ui-server +8d3872ca66fa opea/codegen:${TAG} "python codegen.py" About a minute ago Up About a minute 0.0.0.0:7778->7778/tcp, :::7778->7778/tcp codegen-gaudi-backend-server +b9fc39f51cdb opea/llm-tgi:${TAG} "bash entrypoint.sh" About a minute ago Up About a minute 0.0.0.0:9000->9000/tcp, :::9000->9000/tcp llm-tgi-gaudi-server +39994e007f15 ghcr.io/huggingface/tgi-gaudi:2.0.1 "text-generation-lau…" About a minute ago Up About a minute 0.0.0.0:8028->80/tcp, :::8028->80/tcp tgi-gaudi-server +``` + +## Interacting with CodeGen for Deployment + +This section will walk you through the different ways to interact with +the microservices deployed. After a couple minutes, rerun `docker ps -a` +to ensure all the docker containers are still up and running. Then proceed +to validate each microservice and megaservice. + +### TGI Service + +```bash +curl http://${host_ip}:8028/generate \ + -X POST \ + -d '{"inputs":"Implement a high-level API for a TODO list application. The API takes as input an operation request and updates the TODO list in place. If the request is invalid, raise an exception.","parameters":{"max_new_tokens":256, "do_sample": true}}' \ + -H 'Content-Type: application/json' +``` + +Here is the output: + +``` +{"generated_text":"\n\nIO iflow diagram:\n\n!\[IO flow diagram(s)\]\(TodoList.iflow.svg\)\n\n### TDD Kata walkthrough\n\n1. Start with a user story. We will add story tests later. In this case, we'll choose a story about adding a TODO:\n ```ruby\n as a user,\n i want to add a todo,\n so that i can get a todo list.\n\n conformance:\n - a new todo is added to the list\n - if the todo text is empty, raise an exception\n ```\n\n1. Write the first test:\n ```ruby\n feature Testing the addition of a todo to the list\n\n given a todo list empty list\n when a user adds a todo\n the todo should be added to the list\n\n inputs:\n when_values: [[\"A\"]]\n\n output validations:\n - todo_list contains { text:\"A\" }\n ```\n\n1. Write the first step implementation in any programming language you like. In this case, we will choose Ruby:\n ```ruby\n def add_"} +``` + +### LLM Microservice + +```bash +curl http://${host_ip}:9000/v1/chat/completions\ + -X POST \ + -d '{"query":"Implement a high-level API for a TODO list application. The API takes as input an operation request and updates the TODO list in place. If the request is invalid, raise an exception.","max_tokens":256,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true}' \ + -H 'Content-Type: application/json' +``` + +The output is given one character at a time. It is too long to show +here but the last item will be +``` +data: [DONE] +``` + +### MegaService + +```bash +curl http://${host_ip}:7778/v1/codegen -H "Content-Type: application/json" -d '{ + "messages": "Implement a high-level API for a TODO list application. The API takes as input an operation request and updates the TODO list in place. If the request is invalid, raise an exception." + }' +``` + +The output is given one character at a time. It is too long to show +here but the last item will be +``` +data: [DONE] +``` + +## Launch UI +### Svelte UI +To access the frontend, open the following URL in your browser: http://{host_ip}:5173. By default, the UI runs on port 5173 internally. 
If you prefer to use a different host port to access the frontend, modify the port mapping in the `compose.yaml` file as shown below:
```yaml
  codegen-gaudi-ui-server:
    image: ${REGISTRY:-opea}/codegen-ui:${TAG:-latest}
    ...
    ports:
      - "5173:5173"
```

### React-Based UI (Optional)
To access the React-based frontend, modify the UI service in the `compose.yaml` file. Replace the `codegen-gaudi-ui-server` service with the `codegen-gaudi-react-ui-server` service as per the config below:
```yaml
codegen-gaudi-react-ui-server:
  image: ${REGISTRY:-opea}/codegen-react-ui:${TAG:-latest}
  container_name: codegen-gaudi-react-ui-server
  environment:
    - no_proxy=${no_proxy}
    - https_proxy=${https_proxy}
    - http_proxy=${http_proxy}
    - APP_CODE_GEN_URL=${BACKEND_SERVICE_ENDPOINT}
  depends_on:
    - codegen-gaudi-backend-server
  ports:
    - "5174:80"
  ipc: host
  restart: always
```
Once the services are up, open the following URL in your browser: http://{host_ip}:5174. By default, the UI runs on port 80 internally. If you prefer to use a different host port to access the frontend, modify the port mapping in the `compose.yaml` file as shown below:
```yaml
  codegen-gaudi-react-ui-server:
    image: ${REGISTRY:-opea}/codegen-react-ui:${TAG:-latest}
    ...
    ports:
      - "80:80"
```

## Check Docker Container Logs

You can check the log of a container by running this command:

```bash
docker logs -t <CONTAINER_NAME>
```

You can also check the overall logs with the following command, where
`compose.yaml` is the megaservice docker compose configuration file.

Assuming you are still in the `GenAIExamples/CodeGen/docker_compose/intel/hpu/gaudi` directory,
run the following command to check the logs:
```bash
docker compose -f compose.yaml logs
```

View the docker input parameters in `./CodeGen/docker_compose/intel/hpu/gaudi/compose.yaml`:

```yaml
  tgi-service:
    image: ghcr.io/huggingface/tgi-gaudi:2.0.1
    container_name: tgi-gaudi-server
    ports:
      - "8028:80"
    volumes:
      - "./data:/data"
    environment:
      no_proxy: ${no_proxy}
      http_proxy: ${http_proxy}
      https_proxy: ${https_proxy}
      HABANA_VISIBLE_DEVICES: all
      OMPI_MCA_btl_vader_single_copy_mechanism: none
      HF_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
    runtime: habana
    cap_add:
      - SYS_NICE
    ipc: host
    command: --model-id ${LLM_MODEL_ID} --max-input-length 1024 --max-total-tokens 2048
```

The `--model-id` input is `${LLM_MODEL_ID}`. Ensure the environment variable `LLM_MODEL_ID`
is set and spelled correctly. Whenever it is changed, restart the containers to pick up
the newly selected model.


## Stop the services

Once you are done with the entire pipeline and wish to stop and remove all the containers, use the command below:
```
docker compose down
```
diff --git a/examples/CodeGen/deploy/index.rst b/examples/CodeGen/deploy/index.rst
new file mode 100644
index 00000000..ac0a37d0
--- /dev/null
+++ b/examples/CodeGen/deploy/index.rst
@@ -0,0 +1,14 @@
.. _codegen-example-deployment:

CodeGen Example Deployment Options
###################################

Here are some deployment options, depending on your hardware and environment:

Single Node
***********
.. toctree::
   :maxdepth: 1

   Gaudi AI Accelerator <gaudi>
\ No newline at end of file
diff --git a/examples/index.rst b/examples/index.rst
index 6524dc1d..283693f3 100644
--- a/examples/index.rst
+++ b/examples/index.rst
@@ -12,6 +12,8 @@ GenAIExamples are designed to give developers an easy entry into generative AI,
    ChatQnA/deploy/index
    AgentQnA/AgentQnA_Guide
    AgentQnA/deploy/index
+   CodeGen/deploy/gaudi.md
+   CodeGen/deploy/index
 
 ----

From 112db286640b7d08629db6eb5efa7467beddb3db Mon Sep 17 00:00:00 2001
From: Ying Hu
Date: Tue, 19 Nov 2024 11:52:44 +0800
Subject: [PATCH 2/5] Update index.rst for AgentQnA doc refactor (#250)

* Update index.rst for AgentQnA doc refactor

Update index.rst for AgentQnA doc refactor

* Update index.rst for
* Update index.rst for AgentQnA doc refactor
* Update index.rst

fix the rst ref link

* Update index.rst
* Update index.rst
* Update AgentQnA_Guide.rst
* Update AgentQnA_Guide.rst
* Update AgentQnA_Guide.rst
* Update AgentQnA_Guide.rst
* Update AgentQnA_Guide.rst
* Update index.rst
* Update AgentQnA_Guide.rst

fix the format

* Update index.rst
* Update index.rst
* Update AgentQnA_Guide.rst
* Update AgentQnA_Guide.rst
* Update AgentQnA_Guide.rst
* Update examples/AgentQnA/AgentQnA_Guide.rst
* Update conf.py
* Update AgentQnA_Guide.rst
---
 conf.py                              |  2 ++
 examples/AgentQnA/AgentQnA_Guide.rst |  9 ++++++++-
 examples/AgentQnA/deploy/index.rst   |  7 ++-----
 examples/index.rst                   |  1 -
 4 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/conf.py b/conf.py
index 35051272..5ab3c491 100644
--- a/conf.py
+++ b/conf.py
@@ -70,6 +70,8 @@
 # files and directories to ignore when looking for source files.
 exclude_patterns = [
    'scripts/*',
+   'examples/AgentQnA/deploy/index.rst',
+   'examples/AgentQnA/deploy/xeon.md'
 ]
 try:
    import sphinx_rtd_theme
diff --git a/examples/AgentQnA/AgentQnA_Guide.rst b/examples/AgentQnA/AgentQnA_Guide.rst
index c8424071..a4889928 100644
--- a/examples/AgentQnA/AgentQnA_Guide.rst
+++ b/examples/AgentQnA/AgentQnA_Guide.rst
@@ -43,5 +43,12 @@ The worker agent uses the retrieval tool to generate answers to the queries post
 Deployment
 **********
+Here are some deployment options, depending on your hardware and environment:
-See the :ref:`agentqna-example-deployment`.
\ No newline at end of file
+Single Node
++++++++++++++++
+.. toctree::
+   :maxdepth: 1
+
+   Xeon Scalable Processor <deploy/xeon>
+   Gaudi <deploy/gaudi>
diff --git a/examples/AgentQnA/deploy/index.rst b/examples/AgentQnA/deploy/index.rst
index 629d84ae..7ecf527a 100644
--- a/examples/AgentQnA/deploy/index.rst
+++ b/examples/AgentQnA/deploy/index.rst
@@ -6,9 +6,6 @@ AgentQnA Example Deployment Options
 Here are some deployment options, depending on your hardware and environment:
 
 Single Node
-***********
-.. toctree::
-   :maxdepth: 1
-
-   Xeon Scalable Processor
\ No newline at end of file
+- **Xeon Scalable Processor**: `Xeon `_
+- **Gaudi**: `Gaudi `_
diff --git a/examples/index.rst b/examples/index.rst
index 283693f3..d9392887 100644
--- a/examples/index.rst
+++ b/examples/index.rst
@@ -11,7 +11,6 @@ GenAIExamples are designed to give developers an easy entry into generative AI,
    ChatQnA/ChatQnA_Guide
    ChatQnA/deploy/index
    AgentQnA/AgentQnA_Guide
-   AgentQnA/deploy/index
    CodeGen/deploy/gaudi.md
    CodeGen/deploy/index
 
 ----

From b65df98ac6299a697826e7a92cca4916bcee24ff Mon Sep 17 00:00:00 2001
From: Ying Hu
Date: Wed, 20 Nov 2024 09:18:33 +0800
Subject: [PATCH 3/5] Update index.rst for CodeGen Guide structure (#255)

---
 examples/index.rst | 1 -
 1 file changed, 1 deletion(-)

diff --git a/examples/index.rst b/examples/index.rst
index d9392887..5d1e9934 100644
--- a/examples/index.rst
+++ b/examples/index.rst
@@ -11,7 +11,6 @@ GenAIExamples are designed to give developers an easy entry into generative AI,
    ChatQnA/ChatQnA_Guide
    ChatQnA/deploy/index
    AgentQnA/AgentQnA_Guide
-   CodeGen/deploy/gaudi.md
    CodeGen/deploy/index
 
 ----

From 75573a61b4effdda59c53ba43daaaa3f9bd8bbce Mon Sep 17 00:00:00 2001
From: alexsin368 <109180236+alexsin368@users.noreply.github.com>
Date: Wed, 20 Nov 2024 16:07:56 -0800
Subject: [PATCH 4/5] reorganize sample guides, add CodeGen top sample guide (#256)

* reorganize sample guides, add CodeGen top sample guide

Signed-off-by: alexsin368

* include index.rst files in toctrees

Signed-off-by: alexsin368

* correct toctree

Signed-off-by: alexsin368

* fix indent

Signed-off-by: alexsin368

---------

Signed-off-by: alexsin368
---
 examples/ChatQnA/ChatQnA_Guide.rst | 14 +++++-----
 examples/CodeGen/CodeGen_Guide.rst | 45 ++++++++++++++++++++++++++++++
 examples/index.rst                 |  3 +-
 3 files changed, 53 insertions(+), 9 deletions(-)
 create mode 100644 examples/CodeGen/CodeGen_Guide.rst

diff --git a/examples/ChatQnA/ChatQnA_Guide.rst b/examples/ChatQnA/ChatQnA_Guide.rst
index d217f821..7e9b4779 100644
--- a/examples/ChatQnA/ChatQnA_Guide.rst
+++ b/examples/ChatQnA/ChatQnA_Guide.rst
@@ -204,16 +204,16 @@ The gateway serves as the interface for users to access. The gateway routes inco
 Deployment
 **********
-See the :ref:`chatqna-example-deployment` that includes both single-node and
+Here are some deployment options, including both single-node and
 orchestrated multi-node configurations, and choose the one that best fits your
-requirements. Here are quick references to the single-node deployment options:
-
-* :doc:`Xeon Scalable Processor `
-* :doc:`Gaudi AI Accelerator `
-* :doc:`Nvidia GPU `
-* :doc:`AI PC `
+requirements.
+
+.. toctree::
+   :maxdepth: 1
+
+   ChatQnA Deployment Options <deploy/index>
+
+----
 
 Troubleshooting
 ***************
diff --git a/examples/CodeGen/CodeGen_Guide.rst b/examples/CodeGen/CodeGen_Guide.rst
new file mode 100644
index 00000000..dd64bb49
--- /dev/null
+++ b/examples/CodeGen/CodeGen_Guide.rst
@@ -0,0 +1,45 @@
.. _Codegen_Guide:

CodeGen Sample Guide
#####################

.. note:: This guide is in its early development and is a work-in-progress with
   placeholder content.

Overview
********

The CodeGen example uses specialized AI models trained on datasets that
encompass repositories, documentation, programming code, and web data. With an understanding
of various programming languages, coding patterns, and software development concepts, the
CodeGen LLMs assist developers and programmers.
These LLMs can be integrated into developers'
Integrated Development Environments (IDEs), giving the models more contextual awareness
so they can suggest more refined and relevant code.

Purpose
*******
* Code Generation: Streamline coding through Code Generation, enabling non-programmers to describe tasks for code creation.
* Code Completion: Accelerate coding by suggesting contextually relevant snippets as developers type.
* Code Translation and Modernization: Translate and modernize code across multiple programming languages, aiding interoperability and updating legacy projects.
* Code Summarization: Extract key insights from codebases, improving readability and developer productivity.
* Code Refactoring: Offer suggestions for code refactoring, enhancing code performance and efficiency.
* AI-Assisted Testing: Assist in creating test cases, ensuring code robustness and accelerating development cycles.
* Error Detection and Debugging: Detect errors in code and provide detailed descriptions and potential fixes, expediting debugging processes.

How It Works
************

The CodeGen example uses an open-source code generation model with Text Generation Inference (TGI)
for serving deployment. It is presented as a Code Copilot application as shown in the diagram below.

.. figure:: /GenAIExamples/CodeGen/assets/img/codegen_architecture.png
   :alt: CodeGen Architecture Diagram

Deployment
**********
Here are some deployment options, depending on your hardware and environment:

.. toctree::
   :maxdepth: 1

   CodeGen Deployment Options <deploy/index>
diff --git a/examples/index.rst b/examples/index.rst
index 5d1e9934..9273bf8a 100644
--- a/examples/index.rst
+++ b/examples/index.rst
@@ -9,9 +9,8 @@ GenAIExamples are designed to give developers an easy entry into generative AI,
    :maxdepth: 1
 
    ChatQnA/ChatQnA_Guide
-   ChatQnA/deploy/index
    AgentQnA/AgentQnA_Guide
-   CodeGen/deploy/index
+   CodeGen/CodeGen_Guide
 
 ----

From dc92900b7fffcc336ddeddb1142e63992d43b202 Mon Sep 17 00:00:00 2001
From: ZePan110
Date: Thu, 21 Nov 2024 14:55:56 +0800
Subject: [PATCH 5/5] Rename image name XXX-hpu to XXX-gaudi (#254)

* Rename image name XXX-hpu to XXX-gaudi

Signed-off-by: ZePan110

* Rename image name XXX-hpu to XXX-gaudi

Signed-off-by: ZePan110

---------

Signed-off-by: ZePan110
---
 examples/ChatQnA/deploy/gaudi.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/examples/ChatQnA/deploy/gaudi.md b/examples/ChatQnA/deploy/gaudi.md
index 926267fe..a68d0ce4 100644
--- a/examples/ChatQnA/deploy/gaudi.md
+++ b/examples/ChatQnA/deploy/gaudi.md
@@ -416,7 +416,7 @@ CONTAINER ID   IMAGE   COMMAND
 ce4e7802a371   opea/retriever-redis:${TAG}   "python retriever_re…"   About a minute ago   Up About a minute   0.0.0.0:7000->7000/tcp, :::7000->7000/tcp   retriever-redis-server
 be6cd2d0ea38   opea/reranking-tei:${TAG}   "python reranking_te…"   About a minute ago   Up About a minute   0.0.0.0:8000->8000/tcp, :::8000->8000/tcp   reranking-tei-gaudi-server
 cc45ff032e8c   opea/tei-gaudi:${TAG}   "text-embeddings-rou…"   About a minute ago   Up About a minute   0.0.0.0:8090->80/tcp, :::8090->80/tcp   tei-embedding-gaudi-server
-4969ec3aea02   opea/llm-vllm-hpu:${TAG}   "/bin/bash -c 'expor…"   About a minute ago   Up About a minute   0.0.0.0:8007->80/tcp, :::8007->80/tcp   vllm-gaudi-server
+4969ec3aea02   opea/vllm-gaudi:${TAG}   "/bin/bash -c 'expor…"   About a minute ago   Up About a minute   0.0.0.0:8007->80/tcp, :::8007->80/tcp   vllm-gaudi-server
 0657cb66df78   redis/redis-stack:7.2.0-v9   "/entrypoint.sh"   About a minute ago   Up About a minute
0.0.0.0:6379->6379/tcp, :::6379->6379/tcp, 0.0.0.0:8001->8001/tcp, :::8001->8001/tcp redis-vector-db 684d3e9d204a ghcr.io/huggingface/text-embeddings-inference:cpu-1.2 "text-embeddings-rou…" About a minute ago Up About a minute 0.0.0.0:8808->80/tcp, :::8808->80/tcp tei-reranking-gaudi-server ``` @@ -863,7 +863,7 @@ View the docker input parameters in `./ChatQnA/docker_compose/intel/hpu/gaudi/c ```yaml vllm-service: - image: ${REGISTRY:-opea}/llm-vllm-hpu:${TAG:-latest} + image: ${REGISTRY:-opea}/vllm-gaudi:${TAG:-latest} container_name: vllm-gaudi-server ports: - "8007:80"