Docker for DSTA

Introduction

This repo contains docker files and scripts for the DSTA project: pre-built docker images, the docker files used to build them, and helper scripts. The docker setup targets both x86 (ordinary desktop CPU/GPU) and ARM (aarch64, Jetson platforms) architectures. Currently, however, the images for the two platforms offer different functionality.

Major packages

NGC     Ubuntu   CUDA   Python   PyTorch    Torch-TRT   Comment
22.12   20.04    11.8   3.8.10   1.14.0a0   1.3.0a0     For pre-processing
22.08   20.04    11.7   3.8      1.13.0a0   1.1.0a0     All purpose on x86
20.11   18.04    11.1   3.6.10   1.8.0a0    None        For compatibility test on x86

Pre-built images

NOTE: All the pre-built images use root as the default user. This is not good practice, and it causes issues when giving a docker container GUI support. The user is encouraged to create a wrapper docker image based on any of the pre-built images and add an appropriate non-root user to the wrapper image. Please refer to the Adding the host user to an image section for more details.

Pre-built images can be found in our Docker Hub repository for x86 and ARM architectures. Image tags follow the convention <Docker Hub account>/ngc_<platform>_dsta:<NGC version>_<suffix>, where the placeholders are:

  • Docker Hub account: A Docker Hub account name. This part could be any valid name if the images are built locally following the Building images locally section.
  • platform: Can be x86 or arm. Use arm on Jetson devices.
  • NGC version: The version of the NGC PyTorch image without the -py3 suffix. E.g., 22.12.
  • suffix: An ordered name showing the functions of the image. Suffixes that contain pre mark images used for pre-processing. On Jetson devices, these images are not meant for testing inference or any other operation that involves the GPU.

An example image tag could be

yaoyuh/ngc_arm_dsta:22.12_pre_02_python

which is an image for a Jetson device based on NGC PyTorch 22.12 that provides the third-party python packages needed for pre-processing. Note that the suffix is only for documentation purposes. To learn what is in an image without running it, the user needs to look at the actual docker files.
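The tag convention above can be checked mechanically. The following is a minimal sketch, not part of this repo: the names DstaTag, parse_dsta_tag, and is_preprocessing_image are hypothetical, and treating pre as a whole underscore-separated token of the suffix is an assumption about how the suffixes are written.

```python
import re
from typing import NamedTuple


class DstaTag(NamedTuple):
    """Components of <account>/ngc_<platform>_dsta:<NGC version>_<suffix>."""
    account: str
    platform: str
    ngc_version: str
    suffix: str


# Hypothetical pattern derived from the convention described in this README.
_TAG_RE = re.compile(
    r"^(?P<account>[^/]+)/ngc_(?P<platform>x86|arm)_dsta:"
    r"(?P<ngc_version>\d+\.\d+)_(?P<suffix>.+)$"
)


def parse_dsta_tag(tag: str) -> DstaTag:
    """Split an image tag into its parts; raise ValueError if it does not match."""
    match = _TAG_RE.match(tag)
    if match is None:
        raise ValueError(f"not a DSTA image tag: {tag!r}")
    return DstaTag(**match.groupdict())


def is_preprocessing_image(tag: str) -> bool:
    """Assumption: 'pre' appears as its own underscore-separated token."""
    return "pre" in parse_dsta_tag(tag).suffix.split("_")
```

For example, parse_dsta_tag("yaoyuh/ngc_arm_dsta:22.12_pre_02_python") yields the account yaoyuh, the platform arm, the NGC version 22.12, and the suffix pre_02_python.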

Scripts for creating docker containers

NOTE: It is strongly recommended to follow the Adding the host user to an image section to create a wrapper docker image for working with the scripts provided here.

The scripts folder contains start_dev_container.sh, a script for starting a development docker container. GUI (X server) support is already built in.

To use these scripts, it is recommended to make a copy of a script and modify the necessary arguments for the docker run command.

To start a docker container and enter it immediately, use the following command

cd <scripts/>
./start_dev_docker.sh <docker image with tag>

The container created by these scripts is not removed automatically. The user needs to remove the old container with docker rm before running the script again with the same image tag.

Adding the host user to an image

It is recommended to create a wrapper image and add the host user to it. This has several benefits:

  • No file permission issues in and out of the container.
  • Friendly to GUI-enabled applications.
  • Provides an extra layer of security such that the user won't accidentally change or delete mounted host files.

A dedicated script, add_user_2_image.sh, is provided to add the host user to an image. The usage is

cd <scripts/>
./add_user_2_image.sh <input image tag> <new/wrapper image tag>

Building images locally

To build the images locally, do

cd <scripts/>
./build_images.sh <Docker Hub account> <NGC version>

As mentioned in the Pre-built images section, <Docker Hub account> could be any valid account name that best suits the user's needs.

A concrete example (on Jetson) could be

cd <scripts/>
./build_images.sh yaoyuh 22.12 

Several images are built progressively. If a build fails, the user can comment out parts of the script, make the necessary changes to the docker file, and re-run the script. The previously built images then serve as a warm start (base images) for the modified docker file. When the whole build procedure finishes, there will be an image with a 99_local tag suffix. This is the final image, with the host user already added. The images built by the above example command on a Jetson device are

REPOSITORY            TAG                   IMAGE ID       CREATED      SIZE
yaoyuh/ngc_arm_dsta   22.12_pre_99_local    ef1c20223b47   2 days ago   14.4GB
yaoyuh/ngc_arm_dsta   22.12_pre_02_python   650ffbd9c296   2 days ago   14.4GB
yaoyuh/ngc_arm_dsta   22.12_pre_01_base     555cde636521   2 days ago   14.3GB

Running ./build_images.sh yaoyuh 20.11 on x86 produces

REPOSITORY            TAG                             IMAGE ID       CREATED        SIZE
yaoyuh/ngc_x86_dsta   20.11_99_local                  8d07077ba47d   16 hours ago   13.6GB
yaoyuh/ngc_x86_dsta   20.11_03_cuda_torch_dependent   fe7659272a98   16 hours ago   13.6GB
yaoyuh/ngc_x86_dsta   20.11_02_python                 c0cf065066aa   16 hours ago   13.4GB
yaoyuh/ngc_x86_dsta   20.11_01_base                   8054a4e1402e   16 hours ago   13.2GB

Note that, currently, the Jetson and x86 platforms generate different sets of images with different suffixes. This may change later in favor of a more consistent set of images.

Notes for supporting PyG and CuPy on x86

The user is encouraged to look at dockerfiles/version_helper.py for the supported versions of PyG and CuPy. Since the versions of CUDA and PyTorch are determined by the NGC image, we need to find the best match for PyG and CuPy by referring to the CUDA and PyTorch versions.
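To illustrate how the NGC version fixes the CUDA and PyTorch versions, the sketch below encodes the Major packages table as a lookup. It is an illustration only and does not reproduce dockerfiles/version_helper.py: the names NgcStack, NGC_STACKS, and cuda_torch_for are hypothetical, and stripping the a0 pre-release marker from the PyTorch version is an assumption about what a matching step would need.

```python
from typing import NamedTuple, Optional, Tuple


class NgcStack(NamedTuple):
    """One row of the Major packages table in this README."""
    ubuntu: str
    cuda: str
    python: str
    pytorch: str
    torch_trt: Optional[str]


# Values copied from the Major packages table; nothing else is assumed.
NGC_STACKS = {
    "22.12": NgcStack("20.04", "11.8", "3.8.10", "1.14.0a0", "1.3.0a0"),
    "22.08": NgcStack("20.04", "11.7", "3.8", "1.13.0a0", "1.1.0a0"),
    "20.11": NgcStack("18.04", "11.1", "3.6.10", "1.8.0a0", None),
}


def cuda_torch_for(ngc_version: str) -> Tuple[str, str]:
    """Return (CUDA version, PyTorch release) for a known NGC version."""
    try:
        stack = NGC_STACKS[ngc_version]
    except KeyError:
        raise ValueError(f"unknown NGC version: {ngc_version!r}") from None
    # Assumption: drop the pre-release marker, e.g. 1.14.0a0 -> 1.14.0.
    torch_release = stack.pytorch.split("a")[0]
    return stack.cuda, torch_release
```

The returned pair is what one would then compare against the versions that PyG and CuPy publish builds for.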

On ARM (Jetson), no images are provided for supporting PyG and CuPy at the moment. They might be supported later. Currently, on Jetpack 4.6, all our inference code runs without docker.

Remove a series of images based on NGC version

NOTE: Use with caution.

When a newer NGC version is available, we can remove a series of images that are based on an old NGC version by the remove_images.sh script.

First, perform a dry run.

cd <scripts/>
./remove_images.sh <Docker Hub account> <NGC version>

The following is an example on x86.

$ cd <scripts/> 
$ ./remove_images.sh yaoyuh 20.11   
REPOSITORY            TAG                             IMAGE ID       CREATED        SIZE
yaoyuh/ngc_x86_dsta   20.11_99_local                  8d07077ba47d   16 hours ago   13.6GB
yaoyuh/ngc_x86_dsta   20.11_03_cuda_torch_dependent   fe7659272a98   16 hours ago   13.6GB
yaoyuh/ngc_x86_dsta   20.11_02_python                 c0cf065066aa   16 hours ago   13.4GB
yaoyuh/ngc_x86_dsta   20.11_01_base                   8054a4e1402e   16 hours ago   13.2GB
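Before confirming a deletion, it can help to cross-check the dry-run listing independently. The sketch below builds a docker images command that uses the reference filter with a glob on the tag. It is an assumption that this matches the set of images remove_images.sh selects; the function name image_filter_command is hypothetical.

```python
from typing import List


def image_filter_command(account: str, platform: str, ngc_version: str) -> List[str]:
    """Build a 'docker images' command listing one NGC version's image series.

    Assumption: the series is exactly the tags that start with
    '<NGC version>_' in the <account>/ngc_<platform>_dsta repository.
    """
    if platform not in ("x86", "arm"):
        raise ValueError(f"platform must be 'x86' or 'arm', got {platform!r}")
    reference = f"{account}/ngc_{platform}_dsta:{ngc_version}_*"
    return ["docker", "images", "--filter", f"reference={reference}"]


# Usage (needs docker on the host):
#   import subprocess
#   subprocess.run(image_filter_command("yaoyuh", "x86", "20.11"), check=True)
```

Returning the command as an argument list, rather than a single string, avoids shell quoting problems with the glob character.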

To confirm the deletion, add the -c option to the command line.

$ ./remove_images.sh yaoyuh 22.12 -c
Removing...
REPOSITORY           TAG                 IMAGE ID       CREATED        SIZE
...

The above command only untags the images. To actually remove them, use

# This will remove other stuff! Use with caution.
# Please read the documentation of 'docker system prune' before proceeding.
docker system prune

FAQ

  • Build from NGC images failed with the following error.
-----
 > [1/3] FROM nvcr.io/nvidia/pytorch:22.08-py3@sha256:1aa83e1a13f756f31dabf82bc5a3c4f30ba423847cb230ce8c515f3add88b262:
------
failed to copy: httpReadSeeker: failed open: failed to authorize: rpc error: code = Unknown desc = failed to fetch anonymous token: unexpected status: 401 Unauthorized

This problem appears to be caused by an incompatibility with BuildKit when a Docker Hub account is logged in on the host computer. To fix it, log out of the Docker Hub account on the host computer.

docker logout

If the problem persists, go to build_docker_image.sh and disable BuildKit by removing the command DOCKER_BUILDKIT=1.

  • x86, cannot import rospy because rospkg cannot be found.

This happens because the NGC images use conda, but ROS is installed for /usr/bin/python3. To work with ROS (in images that support ROS), the user needs to add the correct search path to sys.path manually. For example:

import sys
sys.path.append('/usr/lib/python3/dist-packages/')
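If the snippet above ends up in several modules, a small guard keeps sys.path free of duplicates and skips the path on machines where it does not exist. This is a minimal sketch; the helper name add_ros_path and the constant ROS_DIST_PACKAGES are hypothetical, and the path itself is the one given above.

```python
import os
import sys

# Path taken from the FAQ entry above.
ROS_DIST_PACKAGES = "/usr/lib/python3/dist-packages/"


def add_ros_path(path: str = ROS_DIST_PACKAGES) -> bool:
    """Append path to sys.path once; return True only if it was added."""
    if not os.path.isdir(path):
        # Nothing to add, e.g. when running outside a ROS-enabled image.
        return False
    normalized = os.path.normpath(path)
    if any(os.path.normpath(p) == normalized for p in sys.path if p):
        # Already present, possibly with or without a trailing slash.
        return False
    sys.path.append(path)
    return True
```

Appending, rather than inserting at the front, keeps the conda packages ahead of the system ones in the search order, so only modules that conda lacks (such as rospkg) are picked up from the system path.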

Who to talk to

Please create GitHub issues if you find any problems.

Point of contact:

Yaoyu Hu <[email protected]>