-
Notifications
You must be signed in to change notification settings - Fork 51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a docker container (and docker-compose file) to run the model/notebooks in a containerize environment #166
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @leothomas! Just some small comments for now. Do you think we should also add a .dockerignore
file to keep the Docker image a pure virtual environment? Also, what are your thoughts on setting up some CI to push pre-built containers to a docker registry (can be done in a separate PR)?
environment.yml
Outdated
- pytorch~=2.1.0 | ||
- pyarrow~=15.0.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you resolve the merge conflict here with the main
branch? Also see if removing the pyarrow
pin works. I managed to get conda to solve for osx-arm64
recently on #164 after https://github.com/conda-forge/torchvision-feedstock/pull/89/files was merged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. You should be able to simply ditch your changes to environment.yml
. If not, then keep in mind that whenever you modify environment.yml
, you must regenerate conda-lock.yml
, so you should never be committing changes to only environment.yml
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you mean conflicts with ci/osx-arm64, rather than conflicts with main
? I didn't notice any conflicts with main
.
I merged the changes from ci/osx-arm64
and, while I'm able to install from conda-lock.yml
(as @chuckwondo suggested) I'm not able to install from environment.yml
due to cuda
being unavailable in the docker container:
> [claymodel 4/4] RUN micromamba create -y -n claymodel --file environment.yml && micromamba clean --all --yes:
135.1 error libmamba Could not solve for environment specs
135.1 The following package could not be installed
135.1 └─ pytorch ~=2.1.0 *cuda12* is not installable because it requires
135.1 └─ __cuda, which is missing on the system.
135.3 critical libmamba Could not solve for environment specs
------
failed to solve: process "/usr/local/bin/_dockerfile_shell.sh micromamba create -y -n claymodel --file environment.yml && micromamba clean --all --yes" did not complete successfully: exit code: 1
I'm thinking that the conda-lock.yml
and environment.yml
files aren't synchronized?
Let me know what the best course of action is. I can look into making a composite docker image, based off of both micromamba
and nvidia/cuda so that we can install pytorch with cuda support in the docker container
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you mean conflicts with ci/osx-arm64, rather than conflicts with
main
? I didn't notice any conflicts withmain
.
I meant with main
actually. The changes in the ci/osx-arm64
branch aren't much.
I merged the changes from
ci/osx-arm64
and, while I'm able to install fromconda-lock.yml
(as @chuckwondo suggested) I'm not able to install fromenvironment.yml
due tocuda
being unavailable in the docker container:
Yet, best to go with conda-lock.yml
as Chuck suggested. If you want to install from environment.yml
on a device without CUDA GPUs, set CONDA_OVERRIDE_CUDA=12.0
following https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-virtual.html#overriding-detected-packages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm that's odd - there were no merge conflict with main
in the case. I'm might have been missing something. Do the environment.yml and conda-lock.yml files look correct in their current state?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In addition to the individual comments/suggestions, please also add a section to README.md
about how to run the Docker container (docker compose up
) as an alternative to installing things locally, and how to access JupyterLab once the container is started. In particular, mention the URLs that appears in the logging messages:
claymodel-1 | Or copy and paste one of these URLs:
claymodel-1 | http://ffd23ea64b9b:8888/lab?token=abebb7b9476b8fff1fb8b543cc552e8c9641b5b38547c3cb
claymodel-1 | http://127.0.0.1:8888/lab?token=abebb7b9476b8fff1fb8b543cc552e8c9641b5b38547c3cb
Absolutely add a However, keep in mind that there are cases where you must be more explicit in For example, you cannot use only |
Alternatively, as I mentioned to you ages ago, I tend to "invert" my use of
|
Awesome! Thank you both! I've addressed some of the request changes:
Do y'all think it would be valuable to try to see if I can get this to run with a cuda docker image to enable the pytorch-cuda installation? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some suggested changes to reduce diff from merge conflict handling.
Dockerfile
Outdated
|
||
COPY --chown=$MAMBA_USER:$MAMBA_USER . . | ||
|
||
RUN micromamba create -y -n claymodel --file conda-lock.yml && \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I'm getting this error when running docker build . -t clay --platform linux/amd64
locally:
0.384 Transaction starting
79.51 critical libmamba Failed to create dir 'info'
79.51 error libmamba Error opening for reading "/opt/conda/pkgs/cudnn-8.9.7.29-h092f7fd_3/info/index.json": No such file or directory
79.51 error libmamba Error when extracting package: [json.exception.parse_error.101] parse error at line 1, column 1: syntax error while parsing value - unexpected end of input; expected '[', '{', or a literal
79.51 cudnn-8.9.7.29-h092f7fd_3.conda extraction failed
79.73 critical libmamba Found incorrect download: cudnn. Aborting
Not sure if it's because something changed when I re-locked the conda-lock.yml file manually when doing the merge from main at 5be9fa2#diff-63113c19c5d310b8e350b302f279e2297ba6faa9ac9c99c1c82e83e508447865
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh jeez - I'm getting a segmentation fault:
2.612 Transaction starting
1441.7 qemu: uncaught target signal 11 (Segmentation fault) - core dumped
1441.7 bash: line 1: 10 Segmentation fault micromamba create -y -n claymodel --file conda-lock.yml
------
Dockerfile:11
--------------------
10 |
11 | >>> RUN micromamba create -y -n claymodel --file conda-lock.yml && \
12 | >>> micromamba clean --all --yes
13 |
--------------------
ERROR: failed to solve: process "/usr/local/bin/_dockerfile_shell.sh micromamba create -y -n claymodel --file conda-lock.yml && micromamba clean --all --yes" did not complete successfully: exit code: 139
Wondering if it's either related to a lack of available memory or installing CUDA in docker.
I recall that when we installed the model libraries locally we had to install pytorch without CUDA - could the CUDA installation have made its way into the conda-lock file and causing issues?
… a docker container
for more information, see https://pre-commit.ci
867e3bd
to
ce425e2
Compare
Putting some life into this to see if we can get this to work for binder and friends. |
for more information, see https://pre-commit.ci
Got this to work locally. Docker-compose and Jupyter notebook working fine with the image built from the Dockerfile. 🌮 @leothomas have a look and let me know if you think the small changes make sense to you. Happy to merge this in after some review. Not sure how to use the functionality in things like binder but would be nice to use the dockerized version for those if that unbreaks the deployments! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🐳
@chuckwondo re-requested your review to unblock this. I think with v1 it's now working ok. If you have some time try it out 🐋 |
In order to make it easier to run the model/notebooks without having to manage installing dependencies across various machines/environments, I've added a
micromamba
based Dockerfile which will create the conda environment with the specified libraries.I've also added a docker compose file, in order to specify the build-time and run-time arguments for exposing the jupyter lab port and mounting the current directly as a volume, into the docker container. This will allow users to modify any of the model/notebook code locally, without having to re-build the image.
By default the docker image starts with running jupyter lab but this can be overridden both in the docker-compose or even in the command line with any other python or bash command.
The
platform=linux/amd64
build and run-time args enable the image to be built on Mac M1 while maintaining compatibility with Linux.The container can be run with:
docker-compose up
ordocker-compose run claymodel <command>
wherecommand
is a command which override the jupyter lab startupThe container can also be built directly (bypassing the need for docker-compose) with:
docker build . -t clay --platform linux/amd64
and then run with:
docker run --rm -it -v $(pwd):/model -p 8888:8888 -e ENV_NAME=claymodel --platform linux/amd64 clay:latest