This plugin also supports inference via the NVIDIA Triton Inference Server.
This allows a client to request that images stored in a remote Cassandra server be run through inference on a different remote, GPU-powered server.
The plugin provides two operators to be used with Triton:

- `cassandra_interactive`: expects a batch of UUIDs as input, represented as pairs of uint64, and produces as output a batch containing the raw images stored as BLOBs in the database, possibly paired with the corresponding labels.
- `cassandra_decoupled`: splits the input UUIDs (which, in this case, can form a very long list) into mini-batches and requests the images from the database, using prefetching to increase the throughput and hide the network latencies.
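For illustration, a DALI pipeline built around the interactive operator could be sketched as follows. The plugin's parameter names in the sketch (`cassandra_ips`, `table`, `blob_col`) are assumptions, not the verified plugin signature; the library path is the one used by the Triton setup below:

```python
# Hedged sketch of a DALI pipeline using the interactive operator.
# The plugin parameters below (cassandra_ips, table, blob_col) are
# illustrative assumptions, not the verified plugin signature.
import nvidia.dali.fn as fn
import nvidia.dali.types as types
from nvidia.dali import pipeline_def, plugin_manager

# Load the plugin library (path as used in the Triton container below)
plugin_manager.load_library(
    "/opt/conda/lib/python3.8/site-packages/libcrs4cassandra.so"
)

@pipeline_def(batch_size=256, num_threads=4, device_id=0)
def read_and_decode_pipe():
    # Each sample is one UUID, passed as a pair of uint64
    uuids = fn.external_source(name="UUID", dtype=types.UINT64, ndim=1)
    blobs = fn.crs4.cassandra_interactive(
        uuids,
        cassandra_ips=["127.0.0.1"],    # assumed parameter name
        table="imagenette.data_train",  # assumed parameter name
        blob_col="data",                # assumed parameter name
    )
    images = fn.decoders.image(blobs, device="mixed")
    images = fn.crop_mirror_normalize(
        images, crop=(224, 224), dtype=types.FLOAT, output_layout="CHW"
    )
    return images
```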
The directory `models` contains the following subdirectories, with examples of pipelines using both `cassandra_interactive` and `cassandra_decoupled` (a sketch of how such a model can be serialized for Triton follows the list):
- `dali_cassandra_interactive`: retrieves the raw data from the database, decodes it into images, performs normalization and cropping, and returns the images as a tensor. It uses the `fn.crs4.cassandra_interactive` operator.
- `dali_cassandra_interactive_stress`: retrieves the raw data from the database and returns the first byte of every BLOB. It uses the `fn.crs4.cassandra_interactive` operator.
- `dali_cassandra_decoupled`: retrieves the raw data from the database, decodes it into images, performs normalization and cropping, and returns the images as a tensor. It uses the `fn.crs4.cassandra_decoupled` operator.
- `dali_cassandra_decoupled_stress`: retrieves the raw data from the database and returns the first byte of every BLOB. It uses the `fn.crs4.cassandra_decoupled` operator.
- `classification_resnet`: uses a pre-trained ResNet50 for ImageNet classification to perform inference. To download the network, simply run the `runme.py` file.
- `cass_to_inference`: an ensemble model that connects `dali_cassandra_interactive` and `classification_resnet` to load and preprocess images from the database and run inference on them.
- `cass_to_inference_decoupled`: an ensemble model that connects `dali_cassandra_decoupled` and `classification_resnet` to load and preprocess images from the database and run inference on them.
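Triton's DALI backend serves a serialized pipeline from each DALI model directory. As a hedged sketch of how such a file could be produced, assuming the usual Triton model-repository layout `models/<name>/<version>/model.dali` and reusing the illustrative pipeline defined above:

```python
# Sketch: serialize a DALI pipeline for Triton's DALI backend, assuming
# the model-repository layout models/<name>/<version>/model.dali.
# read_and_decode_pipe is the illustrative pipeline sketched earlier.
pipe = read_and_decode_pipe()
pipe.serialize(filename="models/dali_cassandra_interactive/1/model.dali")
```

Recent DALI versions also provide an autoserialization mechanism for the Triton backend, which removes the explicit serialization step.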
The most convenient way to test the cassandra-dali-plugin with Triton is to use the provided Dockerfile.triton (derived from the NVIDIA Triton Inference Server NGC image), which contains our plugin, NVIDIA Triton, NVIDIA DALI, the Cassandra C++ and Python drivers, as well as a Cassandra server. To build and run the container, use the following commands:
# Build and run cassandra-dali-triton docker container
$ docker build -t cassandra-dali-triton -f Dockerfile.triton .
$ docker run --rm -it --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 \
--gpus all --cap-add=sys_admin --name cass-dali cassandra-dali-triton
Once the container is running, you can start the database and populate it with images from the Imagenette dataset using the provided script:
./start-and-fill-db.sh # might take a few minutes
After the database is populated, we can start the Triton server with
./start-triton.sh
# i.e., tritonserver --model-repository ./models --backend-config dali,plugin_libs=/opt/conda/lib/python3.8/site-packages/libcrs4cassandra.so
Now you can leave this shell open, and it will display the logs of the Triton server.
To run the clients, start a new shell in the container with the following command:
docker exec -ti cass-dali fish
Now, within the container, run the following commands to test the inference:
python3 client-http-stress.py
python3 client-grpc-stress.py
python3 client-grpc-stream-stress.py
python3 client-http-ensemble.py
python3 client-grpc-ensemble.py
python3 client-grpc-stream-ensemble.py
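For reference, the essence of such a client is small. Here is a hedged sketch of an HTTP client: the input name `UUID` and per-sample shape `(2,)` follow the perf_analyzer calls below, while the output tensor name `DALI_OUTPUT_0` and the zero-valued UUIDs are placeholders, not values taken from the actual scripts:

```python
# Minimal sketch of a Triton HTTP client sending UUID pairs for inference.
# The output name "DALI_OUTPUT_0" and the dummy UUID values are assumptions;
# the input name "UUID" and per-sample shape (2,) match the models above.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="127.0.0.1:8000")

# Batch of 4 UUIDs, each represented as a pair of uint64
uuids = np.zeros((4, 2), dtype=np.uint64)  # replace with real UUIDs from the DB

inp = httpclient.InferInput("UUID", uuids.shape, "UINT64")
inp.set_data_from_numpy(uuids)

result = client.infer("dali_cassandra_interactive", inputs=[inp])
images = result.as_numpy("DALI_OUTPUT_0")  # assumed output tensor name
print(images.shape)
```

The streaming clients target the decoupled models over gRPC instead, matching the `--streaming` perf_analyzer runs below.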
You can also benchmark the inference performance using NVIDIA's perf_analyzer. For example:
perf_analyzer -m dali_cassandra_interactive_stress --input-data uuids.json -b 256 --concurrency-range 16 -p 10000
perf_analyzer -m dali_cassandra_interactive_stress --input-data uuids.json -b 256 --concurrency-range 16 -p 10000 -i grpc
perf_analyzer -m dali_cassandra_decoupled_stress --input-data uuids_2048.json --shape UUID:2048,2 --concurrency-range 4 -i grpc --streaming -p 10000
perf_analyzer -m cass_to_inference --input-data uuids.json -b 256 --concurrency-range 16 -i grpc
perf_analyzer -m cass_to_inference_decoupled --input-data uuids_2048.json --shape UUID:2048,2 --concurrency-range 4 -i grpc --streaming -p 10000
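The `--input-data` JSON files hold the UUIDs to query, in perf_analyzer's input-data format. A sketch of a generator for such a file, with placeholder (zero-valued) UUIDs, could look like this:

```python
# Sketch: generate a perf_analyzer input-data file with placeholder UUIDs.
# Real benchmarks need UUID pairs that actually exist in the database.
import json

entries = [{"UUID": [0, 0]} for _ in range(16)]  # one uint64 pair per entry

with open("uuids.json", "w") as f:
    json.dump({"data": entries}, f)
```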