This repository contains the Helm chart for deploying the Azure AI Document Intelligence Connected Containers for use with custom extraction scenarios.
The templates in this Helm chart will deploy the following components to a Kubernetes cluster:
- nginx (alpine) - Reverse proxy for the Layout and Custom Template services.
- Azure AI Document Intelligence - Layout - Service to perform layout analysis on documents to extract text, tables, and forms.
- Azure AI Document Intelligence - Custom Template - Service to create custom extraction models for specific document types.
- Azure AI Document Intelligence - Studio - Web-based Studio UI to create and manage custom extraction models.
This Helm chart also includes a custom StorageClass used for provisioning persistent storage as Azure File Shares with the nobrl mount option enabled, to prevent file locking issues with the Azure AI Document Intelligence Studio sqlite database file.
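For orientation, a StorageClass of this kind might look roughly like the sketch below. This is not the chart's actual template: the name azurefile-nobrl and the skuName parameter are assumptions for illustration, and the provisioner shown is the Azure Files CSI driver commonly used on AKS.

```yaml
# Illustrative sketch only; the chart's own StorageClass may differ.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: azurefile-nobrl        # hypothetical name
provisioner: file.csi.azure.com # Azure Files CSI driver
parameters:
  skuName: Standard_LRS        # assumed storage SKU
mountOptions:
  - nobrl                      # do not send byte-range lock requests to the server
reclaimPolicy: Delete
volumeBindingMode: Immediate
allowVolumeExpansion: true
```

The nobrl option matters here because sqlite relies on file locks, which can fail or hang on SMB-mounted shares when byte-range locking is forwarded to the server.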
To deploy the Azure AI Document Intelligence Connected Containers, you will need:
- Azure CLI installed.
- kubectl installed.
- Helm installed.
- Azure Kube Login installed, if you are authenticating kubectl commands using Entra ID authentication with RBAC.
- An Azure Kubernetes Service (AKS) cluster. Create a new AKS cluster if you don't have one.
- An Azure AI Document Intelligence service (required for billing purposes only). Create a new DI service if you don't have one.
The following steps will guide you through deploying the Azure AI Document Intelligence Connected Containers to your Kubernetes cluster.
Note
The templates have been pre-configured with default values except for the required documentIntelligence.env.billing and documentIntelligence.env.apikey values. You can override these values using the --set option or a values.yaml file as described below. To customize the deployment further, you can modify the chart's default values.yaml file in the ai-document-intelligence directory.
kubectl create namespace di
helm install di-extraction ai-document-intelligence --namespace di \
  --set documentIntelligence.env.billing.value=your-document-intelligence-endpoint-value \
  --set documentIntelligence.env.apikey.value=your-document-intelligence-apikey-value
When using secret values, you can configure the billing endpoint and API key with the --set documentIntelligence.env.billing.valueFrom.secretKeyRef and --set documentIntelligence.env.apikey.valueFrom.secretKeyRef options, each with both the required name and key values.
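The name and key values refer to a Kubernetes Secret in the same namespace as the release. A minimal sketch of such a Secret follows, assuming the hypothetical Secret name di-secrets and the hypothetical key names billing and apikey; none of these names are defined by the chart.

```yaml
# Hypothetical Secret holding the billing endpoint and API key.
apiVersion: v1
kind: Secret
metadata:
  name: di-secrets   # assumed name; use as secretKeyRef.name
  namespace: di
type: Opaque
stringData:
  billing: your-document-intelligence-endpoint-value  # use "billing" as secretKeyRef.key
  apikey: your-document-intelligence-apikey-value     # use "apikey" as secretKeyRef.key
```

With a Secret like this applied, secretKeyRef.name would be di-secrets for both options, and secretKeyRef.key would be billing and apikey respectively.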
Alternatively, you can use a values.yaml file to configure the deployment. See more on Helm values files.
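As a rough illustration, each dotted --set path maps onto nested keys in a values file. The sketch below assumes a hypothetical file named my-values.yaml. The commented-out secretKeyRef variant also assumes a Secret named di-secrets with an apikey key already exists in the di namespace.

```yaml
# my-values.yaml (hypothetical file name)
documentIntelligence:
  env:
    billing:
      value: "your-document-intelligence-endpoint-value"
    apikey:
      value: "your-document-intelligence-apikey-value"
      # Assumed alternative when reading the key from a Secret
      # (use instead of "value", not alongside it):
      # valueFrom:
      #   secretKeyRef:
      #     name: di-secrets
      #     key: apikey
```

The file can then be passed to Helm with the -f option, for example: helm install di-extraction ai-document-intelligence --namespace di -f my-values.yaml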
By default, this Helm chart deploys only the containers required and does not expose any services to the public internet. To access the Azure AI Document Intelligence Studio and nginx proxy, you can use kubectl port-forward to forward the service ports to your local machine.
kubectl port-forward svc/di-extraction-ai-document-intelligence-nginx 5000:5000 --namespace di
kubectl port-forward svc/di-extraction-ai-document-intelligence-studio 5001:5001 --namespace di
You can then access the Studio UI at http://localhost:5001. When creating a new custom extraction project, the Form Recognizer Service Endpoint value will be set to the nginx proxy URL http://localhost:5000.
Note
In a real-world scenario, you would expose the services using an ingress controller, service mesh, or other application gateway to control access to the services and provide a secure connection. This is not covered in this Helm chart as you may already be implementing this. Please refer to the Kubernetes documentation for more information.
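As one possible starting point, an Ingress in front of the nginx proxy service could look like the sketch below. It is not part of this chart. The host di.example.com, the TLS Secret di-example-tls, and the ingress class nginx are all assumptions, and an ingress controller must already be installed in the cluster. Only the service name and port are taken from the port-forward commands above.

```yaml
# Illustrative sketch only; not shipped with the chart.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: di-extraction-nginx      # hypothetical name
  namespace: di
spec:
  ingressClassName: nginx        # assumed ingress controller class
  tls:
    - hosts:
        - di.example.com         # hypothetical host
      secretName: di-example-tls # hypothetical TLS Secret
  rules:
    - host: di.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: di-extraction-ai-document-intelligence-nginx
                port:
                  number: 5000
```

An Ingress alone does not add authentication, so access control would still need to be handled by the controller or another gateway.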
Although not included in the default values.yaml file, you can configure pod scheduling settings for each of the deployments using the Helm --set option or a values.yaml file. The values include tolerations, affinity, and nodeSelector.
Here is an example values.yaml showing how to configure the layout deployment with tolerations, affinity, and node selectors:
layout:
  tolerations:
    - key: "key"
      operator: "Equal"
      value: "value"
      effect: "NoSchedule"
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: "key"
                operator: "In"
                values:
                  - "value"
  nodeSelector:
    key: "value"