cortex.onnx is a high-efficiency C++ inference engine for edge computing. It targets the Windows platform and uses DirectML for GPU acceleration. The engine is built as a dynamic library that any server can load at runtime.
```
.
├── base              -> Engine interface
├── examples          -> Server example to integrate engine
├── onnxruntime-genai -> Upstream onnxruntime-genai
├── src               -> Engine implementation
└── third-party       -> Dependencies of the cortex.onnx project
```
This guide provides step-by-step instructions for building cortex.onnx from source on Windows systems.
First, you need to clone the cortex.onnx repository:
```bash
git clone --recurse-submodules https://github.com/janhq/cortex.onnx.git
```
If you don't have git, you can download the source code as a ZIP archive from the cortex.onnx GitHub page.
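For example, GitHub serves a ZIP of any branch at a predictable URL; the branch name (main) below is an assumption:

```bash
# Download and extract a source archive (assumes the default branch is "main").
curl -L -o cortex.onnx.zip https://github.com/janhq/cortex.onnx/archive/refs/heads/main.zip
tar -xf cortex.onnx.zip
```

Note that GitHub archives do not include submodules such as onnxruntime-genai, so the clone with --recurse-submodules above is the more reliable route.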
- On Windows

Install CMake and MSBuild, then build:
```bash
# Build dependencies
./build_cortex_onnx.bat

# Build engine
mkdir build
cd build
cmake ..
cmake --build . --config Release -j4

# Build server example (from the repository root)
mkdir -p examples/server/build
cd examples/server/build
cmake ..
cmake --build . --config Release -j4
```
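Before moving on, it can help to confirm the build artifacts exist. The paths below are the same ones the packaging commands in Step 2 copy from:

```bash
# Sanity-check the build outputs (run from the repository root).
ls build/Release/engine.dll
ls onnxruntime-genai/build/Release/*.dll
ls examples/server/build/Release/server.exe
```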
Step 1: Download a Model
Clone a model from https://huggingface.co/cortexhub and check out its dml branch, as sketched below.
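A minimal sketch of that step, assuming a repository named llama3 under cortexhub (the name and destination path are chosen to match the model_path used in Step 3; Hugging Face model repos generally need git-lfs):

```bash
# Hypothetical example: the repo name "llama3" is an assumption.
git lfs install
git clone https://huggingface.co/cortexhub/llama3 model/llama3
cd model/llama3
git checkout dml   # DirectML-ready weights live on the dml branch
```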
Step 2: Start the Server

- On Windows

```bash
cd examples/server/build/Release
mkdir -p engines\cortex.onnx\
cp ..\..\..\..\build\Release\engine.dll engines\cortex.onnx\
cp ..\..\..\..\onnxruntime-genai\build\Release\*.dll .\
server.exe
```
Step 3: Load the Model
```bash
curl http://localhost:3928/loadmodel \
  -H 'Content-Type: application/json' \
  -d '{
    "model_path": "./model/llama3",
    "model_alias": "llama3",
    "system_prompt": "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n",
    "user_prompt": "<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n",
    "ai_prompt": "<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
  }'
```
Step 4: Make an Inference
```bash
curl http://localhost:3928/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Who won the world series in 2020?"
      }
    ],
    "model": "llama3"
  }'
```
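The /v1/chat/completions path suggests an OpenAI-compatible response schema. If that assumption holds, the reply text can be pulled out with jq; this is a sketch, not a schema this README confirms:

```bash
# Assumes an OpenAI-style response body: {"choices":[{"message":{"content":...}}]}.
curl -s http://localhost:3928/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3", "messages": [{"role": "user", "content": "Hello!"}]}' \
  | jq -r '.choices[0].message.content'
```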
Table of parameters

| Parameter | Type | Description |
|---|---|---|
| model_path | String | The file path to the ONNX model. |