We provide the Docker container with everything set up to run GluonNLP. With the prebuilt docker image, there is no need to worry about the operating systems or system dependencies. You can launch a JupyterLab development environment and try out to use GluonNLP to solve your problem.
Name | Description | Target User |
---|---|---|
cpu-ci-latest or gpu-ci-latest |
Extends the CUDA image to include the basic functionalities, e.g., GluonNLP package, MXNet, PyTorch, Horovod. This is the image used in GluonNLP CI | GluonNLP Developers |
cpu-latest or gpu-latest |
It has more functionality than the CI image, including the a development platform powered by Jupyter Lab. Some useful functionalities like Tensorboard are pre-installed. | Users that are willing to solve NLP problems and also do distributed training with Horovod + GluonNLP. |
You can run the docker with the following command.
# On GPU machine
docker pull gluonai/gluon-nlp:gpu-latest
docker run --gpus all --rm -it -p 8888:8888 -p 8787:8787 -p 8786:8786 --shm-size=2g gluonai/gluon-nlp:gpu-latest
# On CPU machine
docker pull gluonai/gluon-nlp:cpu-latest
docker run --rm -it -p 8888:8888 -p 8787:8787 -p 8786:8786 --shm-size=2g gluonai/gluon-nlp:cpu-latest
Here, we open the ports 8888, 8787, 8786, which are used for connecting to JupyterLab.
Also, we set --shm-size
to 2g
. This sets the shared memory storage to 2GB. Since NCCL will
create shared memory segments, this argument is essential for the JupyterNotebook to work with NCCL.
(See also NVIDIA/nccl#290).
The folder structure of the docker image will be
/workspace/
├── gluonnlp
├── tvm
├── data
If you have a multi-GPU instance, e.g., g4dn.12xlarge, p2.8xlarge, p3.8xlarge, you can try to verify the installation of horovod + MXNet by running the question answering script
# Assume that you are currently in GluonNLP
cd gluon-nlp/scripts/question_answering
docker run --gpus all --rm -it --shm-size=2g -v `pwd`:/workspace/data gluonai/gluon-nlp:gpu-latest \
bash -c 'cd /workspace/data && bash commands/run_squad2_albert_base.sh 1 2.0'
To build a docker image from the dockerfile, you may use the following command:
# Build CPU Dockers
docker build -f ubuntu18.04-cpu.Dockerfile --target ci -t gluonai/gluon-nlp:cpu-ci-latest .
docker build -f ubuntu18.04-cpu.Dockerfile --target devel -t gluonai/gluon-nlp:cpu-latest .
# Build GPU Dockers
docker build -f ubuntu18.04-gpu.Dockerfile --target ci -t gluonai/gluon-nlp:gpu-ci-latest .
docker build -f ubuntu18.04-gpu.Dockerfile --target devel -t gluonai/gluon-nlp:gpu-latest .
In addition, to build the GPU docker, you will need to install the nvidia-docker2 and edit /etc/docker/daemon.json
like the following:
{
"runtimes": {
"nvidia": {
"path": "nvidia-container-runtime",
"runtimeArgs": []
}
},
"default-runtime": "nvidia"
}
After that, restart docker via sudo systemctl restart docker.service
.
For more details, you may refer to NVIDIA/nvidia-docker#595. We need this additional setup because the horovod+mxnet integration identifies the library and include path of MXNet by querying the MXNet runtime.
You may try to login to your dockerhub account and push the image to dockerhub.
docker push gluonai/gluon-nlp:cpu-ci-latest
docker push gluonai/gluon-nlp:cpu-latest
docker push gluonai/gluon-nlp:gpu-ci-latest
docker push gluonai/gluon-nlp:gpu-latest
Our current batch job dockers are in 747303060528.dkr.ecr.us-east-1.amazonaws.com/gluon-nlp-1. To update the docker:
- Build the docker image, or update the Dockerfile as described above
- Export the AWS account credentials as environment variables
- CD to the same folder as the Dockerfile and execute the following:
# this executes a command that logs into ECR.
$(aws ecr get-login --no-include-email --region us-east-1)
# tags the recent build as gluon-nlp-1:latest, which AWS batch pulls from.
docker tag gluonai/gluon-nlp:gpu-ci-latest 747303060528.dkr.ecr.us-east-1.amazonaws.com/gluon-nlp-1:gpu-ci-latest
docker tag gluonai/gluon-nlp:cpu-ci-latest 747303060528.dkr.ecr.us-east-1.amazonaws.com/gluon-nlp-1:cpu-ci-latest
# pushes the change
docker push 747303060528.dkr.ecr.us-east-1.amazonaws.com/gluon-nlp-1:gpu-ci-latest
docker push 747303060528.dkr.ecr.us-east-1.amazonaws.com/gluon-nlp-1:cpu-ci-latest
After updating the docker, please create a pull request to commit your changes on the docker files or some scripts.