Visual Question Answering (VQA) is the task of answering open-ended questions based on an image. The input to models supporting this task is typically a combination of an image and a question, and the output is an answer expressed in natural language.
Noteworthy use cases for VQA include:
- Accessibility applications for visually impaired individuals.
- Education: posing questions about visual materials presented in lectures or textbooks. VQA can also be used in interactive museum exhibits or at historical sites.
- Customer service and e-commerce: VQA can enhance user experience by letting users ask questions about products.
- Image retrieval: VQA models can be used to retrieve images with specific characteristics. For example, the user can ask “Is there a dog?” to find all images containing dogs in a collection.
The general architecture of VQA is shown below:
This example shows how to deploy a LLaVA (Large Language and Vision Assistant) model on Intel Gaudi2 for the visual question answering task. The Intel Gaudi2 accelerator supports both training and inference of deep learning models, in particular LLMs. Visit Habana AI products for more details.
- Build the Docker image needed to start the service:
cd serving/
docker build . --build-arg http_proxy=${http_proxy} --build-arg https_proxy=${https_proxy} -t intel/gen-ai-examples:llava-gaudi
- Start the LLaVA service on Intel Gaudi2:
docker run -d -p 8085:8000 -v ./data:/root/.cache/huggingface/hub/ -e http_proxy=$http_proxy -e https_proxy=$https_proxy --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --ipc=host intel/gen-ai-examples:llava-gaudi
Here is an explanation of the parameters above:
- -p 8085:8000: maps port 8000 of the LLaVA service inside the container to port 8085 on the host.
- -v ./data:/root/.cache/huggingface/hub/: stores the downloaded model files in ./data on the host so they are not re-downloaded every time the container starts.
- http_proxy and https_proxy: needed only if you are behind a proxy.
- --runtime=habana ...: required for running this service on Intel Gaudi2.
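If port 8085 is already in use on your host, or you prefer an absolute path for the model cache, the docker run command can be parameterized. The following is a minimal POSIX shell sketch under stated assumptions: `start_llava` and the `DRY_RUN` switch are hypothetical helpers introduced here for illustration, not part of the example's scripts. Only the host side of `-p` changes; the service still listens on port 8000 inside the container.

```sh
#!/bin/sh
# Hypothetical helper (not part of the original example): start the LLaVA
# container with a configurable host port and model-cache directory.
# With DRY_RUN=1 it only prints the command line instead of running it.

# start_llava [HOST_PORT] [CACHE_DIR]
start_llava() {
  # Host port that forwards to port 8000 of the service inside the container.
  local host_port="${1:-8085}"
  # Absolute path of the local Hugging Face cache, so models persist across runs.
  local cache_dir="${2:-$PWD/data}"
  # Reuse the positional parameters as a safely quoted argument list.
  # Proxy variables are passed through from the host and may be empty;
  # the Gaudi2 options are copied unchanged from the docker run command above.
  set -- docker run -d \
    -p "${host_port}:8000" \
    -v "${cache_dir}:/root/.cache/huggingface/hub/" \
    -e "http_proxy=${http_proxy:-}" -e "https_proxy=${https_proxy:-}" \
    --runtime=habana -e HABANA_VISIBLE_DEVICES=all \
    -e OMPI_MCA_btl_vader_single_copy_mechanism=none \
    --cap-add=sys_nice --ipc=host \
    intel/gen-ai-examples:llava-gaudi
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}
```

For example, `DRY_RUN=1 start_llava 9000` prints the command with `-p 9000:8000`. Whatever host port you choose must then also be used in the health check URL and in the UI's `--worker-addr`.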
The LLaVA service is now exposed on port 8085. You can check whether it is up with:
curl localhost:8085/health -v
If the reply contains 200 OK, the service is up.
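When scripting the deployment, it can be convenient to wait until the health check succeeds instead of running curl by hand. Below is a minimal POSIX shell sketch, assuming curl is installed; `wait_for_health` and `is_healthy_code` are illustrative names, and the only endpoint used is the `/health` URL from the check above.

```sh
#!/bin/sh
# Minimal sketch: poll the LLaVA health endpoint until it answers HTTP 200.
# Assumes curl is installed; function names are illustrative.

# is_healthy_code CODE: succeeds only for HTTP status 200.
is_healthy_code() {
  [ "$1" = "200" ]
}

# wait_for_health URL [MAX_ATTEMPTS]: roughly one attempt per second.
# Returns 0 once the endpoint replies 200 OK, 1 if it never does.
wait_for_health() {
  local url="$1"
  local max_attempts="${2:-60}"
  local attempt=1
  local code=""
  while [ "$attempt" -le "$max_attempts" ]; do
    # -s: no progress output, -o /dev/null: discard the body,
    # -w: print only the status code ("000" if no HTTP response arrived),
    # --max-time: give up on a single request after 5 seconds.
    code="$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$url" || true)"
    if is_healthy_code "$code"; then
      echo "$url is up (attempt $attempt)"
      return 0
    fi
    attempt=$((attempt + 1))
    sleep 1
  done
  echo "$url did not return 200 after $max_attempts attempts (last status: ${code:-none})" >&2
  return 1
}
```

For example, `wait_for_health http://localhost:8085/health 120` makes up to 120 attempts before giving up, which leaves room for the first start, when the model files still have to be downloaded.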
You can start the frontend UI in one of two ways:
- English UI (default):
cd ui/
pip install -r requirements.txt
http_proxy= python app.py --host 0.0.0.0 --port 7860 --worker-addr http://localhost:8085 --share
- Chinese UI:
cd ui/
pip install -r requirements.txt
http_proxy= python app.py --host 0.0.0.0 --port 7860 --worker-addr http://localhost:8085 --lang CN --share
Here is an explanation of the parameters above:
- --host: the host address of the Gradio app.
- --port: the port of the Gradio app, 7860 by default.
- --worker-addr: the address of the LLaVA service. If you set up the service on a different machine, replace localhost with the IP address of your Gaudi2 host machine.
- --lang: set to CN to use the Chinese interface. The default UI language is English, which needs no additional parameter.
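The parameters above can be combined, for example to run the Chinese UI against a LLaVA service on a remote Gaudi2 host. The following is a minimal POSIX shell sketch, assuming it is run from the ui/ directory after installing the requirements; `start_ui` and `DRY_RUN` are hypothetical names used only for illustration, and the worker port must match the host port chosen with `-p` when starting the container.

```sh
#!/bin/sh
# Hypothetical helper (not part of the original example): launch the Gradio UI
# for either language and any LLaVA service address.
# With DRY_RUN=1 it only prints the command line.

# start_ui [en|cn] [WORKER_HOST] [WORKER_PORT]
start_ui() {
  local ui_lang="${1:-en}"
  local worker_host="${2:-localhost}"
  local worker_port="${3:-8085}"
  set -- python app.py --host 0.0.0.0 --port 7860 \
    --worker-addr "http://${worker_host}:${worker_port}"
  # English is the default UI; only the Chinese UI needs --lang CN.
  case "$ui_lang" in
    cn|CN) set -- "$@" --lang CN ;;
  esac
  set -- "$@" --share
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "http_proxy= $*"
  else
    # Empty http_proxy for this process only, matching the commands above.
    http_proxy= "$@"
  fi
}
```

For example, `start_ui cn 192.168.1.10` starts the Chinese UI and points it at a service on that (illustrative) address, port 8085.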
SCRIPT USAGE NOTICE: By downloading and using any script file included with the associated software package (such as files with .bat, .cmd, or .JS extensions, Docker files, or any other type of file that, when executed, automatically downloads and/or installs files onto your system) (the “Script File”), it is your obligation to review the Script File to understand what files (e.g., other software, AI models, AI Datasets) the Script File will download to your system (“Downloaded Files”). Furthermore, by downloading and using the Downloaded Files, even if they are installed through a silent install, you agree to any and all terms and conditions associated with such files, including but not limited to, license terms, notices, or disclaimers.