
Docker and the storage of its image data long term RFC #288

Open
tatarsky opened this issue Jul 14, 2015 · 25 comments

@tatarsky
Contributor

(RFC=Request for Comment)

I am trying to get a clear understanding of how Docker should best store images long term in what it calls its "root area", which is currently set to /scratch/docker rather than the default of /var/lib/docker (a smaller area on each node).

I would also like to pin down Docker's space and usage requirements. We currently run it in a sort of "open season test mode", and have for a while.

That root area consists of more than just images: I believe it is also where much of the container state is kept, so I don't think you can simply point the entire region at a shared GPFS directory. At the least I would have to experiment with that, and I can already see there would be collisions if I did.

My brief reading points to running a local registry when a large collection of images and their local modifications is required, with the registry's backend storage on some shared media. This requires running a daemon, and there appear to be two projects with implementations.

All of this would be considered an enhancement, but I said I would look into it, so here is the start of my attempt to understand the requirements to move forward.
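For reference, a minimal sketch of what a local registry with shared backend storage might look like. The GPFS path, container name, and registry image tag below are illustrative assumptions, not an existing setup:

```shell
# Hedged sketch: run a local registry container whose image store lives on
# shared GPFS instead of each node's /scratch/docker.
# /gpfs/shared/docker-registry is a made-up path for illustration.
command -v docker >/dev/null || exit 0   # skip on hosts without docker
docker run -d --name local-registry -p 5000:5000 \
    -v /gpfs/shared/docker-registry:/tmp/registry \
    registry

# Nodes could then push and pull against the shared registry
# instead of Docker Hub:
docker tag ubuntu:latest localhost:5000/ubuntu
docker push localhost:5000/ubuntu
```

The idea would be that the registry becomes the shared image store, while each node's /scratch/docker holds only the working copies and container state.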

@tatarsky
Contributor Author

I am going to add, for reference, a few of the items I look at on these topics as I go:
http://odewahn.github.io/docker-jumpstart/
https://github.com/docker/docker-registry

@jchodera
Member

jchodera commented Aug 9, 2015

These are all good questions. I don't have any good answers yet. For now, I've been maintaining scripts to sweep the nodes of old images and containers corresponding to our Docker image (jchodera/fah-client), but it would be much easier if there were a concept of user ownership of images and containers.

@lzamparo

lzamparo commented Oct 7, 2015

@jchodera: do your scripts allow the jchodera/fah-client image to be deployed in the image cache on all nodes? If so, would you mind sharing?

Currently, my own Docker jobs often involve a first (very annoying) step of pulling an image from Docker Hub. On some nodes where I've done interactive work, the image is already in the root area and visible via docker images. This inconsistency adds another layer of complexity when scripting, and is most undesirable.

@tatarsky maybe collisions could be avoided by having users own their own Docker images, committing them under their own HAL IDs? Or do I not understand the entirety of creating a common root area?

@tatarsky
Contributor Author

tatarsky commented Oct 7, 2015

This is, as I note, why I am requesting comments. I do not have Docker set up in a way that uses their own HAL IDs, as I believe that requires sudo (as opposed to membership in the docker group). If you know otherwise, please advise.

@tatarsky
Contributor Author

tatarsky commented Oct 7, 2015

BTW, I believe what @jchodera is doing is registering his image as a variant on Docker Hub, thus preventing collisions because he has made it a unique image... unless I misunderstand.

@lzamparo

lzamparo commented Oct 7, 2015

I think as long as users commit their changes and tag the revision, there should be no collision problem. FYI, I'm also registering my images as variants on Docker Hub; this prevents collisions with other users.

There does seem to be detritus building up in /scratch/docker on different nodes. For instance, I'm working now on gpu-2-4 and see:

[zamparol@gpu-2-4 fit_hic]$ docker images
REPOSITORY                                     TAG                 IMAGE ID            CREATED             VIRTUAL SIZE
jchodera/docker-core-linux-build-environment   latest              8239a3e4f625        5 weeks ago         3.867 GB
ubuntu                                         fbcunn              212c4aafa747        7 weeks ago         9.808 GB
ubuntu                                         fbcunn_final        ab910b556361        7 weeks ago         6.293 GB
jchodera/docker-fah-client                     latest              a2f867fc1c4b        8 weeks ago         2.079 GB
<none>                                         <none>              6b06d5cf3d45        8 weeks ago         2.079 GB
ubuntu                                         latest              13b176913597        10 weeks ago        197.8 MB
kaixhin/cuda-torch                             latest              713c712e4a87        3 months ago        3.154 GB
centos                                         latest              ae0c2d0bdc10        11 months ago       224 MB

On other nodes where I've done previous docker work, I've got docker images available locally to be spun up.

@tatarsky
Contributor Author

tatarsky commented Oct 7, 2015

Yep. Part of this RFC (which has been pretty "C", or comment, light) is also "so, when should those images be deleted?" Weeks? Months? I am seldom going to make such decisions without user input and consensus, so feel free to propose a retention period.

Currently I am only looking for stray Docker containers still running due to not being scheduled with the proper exit conditions. I use docker ps, and I check carefully before deciding to docker kill.
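That manual check can be sketched as a small script. Listing containers with their start times (via docker inspect's State.StartedAt field) is my assumption about how one might flag candidates for human review, not an existing tool:

```shell
#!/bin/sh
# Hedged sketch: list running containers with their start times so a human
# can decide what is a stray before any `docker kill`. Nothing is killed here.
command -v docker >/dev/null || exit 0   # skip on hosts without docker
for c in $(docker ps -q); do
    started=$(docker inspect --format '{{.State.StartedAt}} {{.Name}}' "$c")
    echo "$c  $started"
done
```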

@lzamparo

lzamparo commented Oct 7, 2015

I just ran a test to see if I could make a change to the ubuntu:fbcunn_final image:

[zamparol@gpu-2-4 fit_hic]$ docker images
REPOSITORY                                     TAG                 IMAGE ID            CREATED             VIRTUAL SIZE
jchodera/docker-core-linux-build-environment   latest              8239a3e4f625        5 weeks ago         3.867 GB
ubuntu                                         fbcunn              212c4aafa747        7 weeks ago         9.808 GB
ubuntu                                         fbcunn_final        ab910b556361        7 weeks ago         6.293 GB
jchodera/docker-fah-client                     latest              a2f867fc1c4b        8 weeks ago         2.079 GB
<none>                                         <none>              6b06d5cf3d45        8 weeks ago         2.079 GB
ubuntu                                         latest              13b176913597        10 weeks ago        197.8 MB
kaixhin/cuda-torch                             latest              713c712e4a87        3 months ago        3.154 GB
centos                                         latest              ae0c2d0bdc10        11 months ago       224 MB
[zamparol@gpu-2-4 fit_hic]$ docker run -t -i ubuntu:fbcunn_final /bin/bash
root@9e2967938dd7:/# ls
bin  boot  dev  etc  home  initrd.img  lib  lib32  lib64  media  mnt  opt  proc  root  run  sbin  srv  sys  tmp  usr  var  vmlinuz
root@9e2967938dd7:/# cd home
root@9e2967938dd7:/home# mkdir test_lee_woo
root@9e2967938dd7:/home# cd test_lee_woo/
root@9e2967938dd7:/home/test_lee_woo# vim test.txt
root@9e2967938dd7:/home/test_lee_woo# exit
[zamparol@gpu-2-4 fit_hic]$ docker commit -m "test alteration ubuntu fbcunn_final minor alteration, changing the version tag." -a "Lee Zamparo" 9e2967938dd7 ubuntu:fbcunn_lee
b2e8fcaf0656d489f8274c1a93154895f37a430664b42c736f67bb69dacc4dd5
[zamparol@gpu-2-4 fit_hic]$ docker images
REPOSITORY                                     TAG                 IMAGE ID            CREATED             VIRTUAL SIZE
ubuntu                                         fbcunn_lee          b2e8fcaf0656        2 minutes ago       6.293 GB
jchodera/docker-core-linux-build-environment   latest              8239a3e4f625        5 weeks ago         3.867 GB
ubuntu                                         fbcunn              212c4aafa747        7 weeks ago         9.808 GB
ubuntu                                         fbcunn_final        ab910b556361        7 weeks ago         6.293 GB
jchodera/docker-fah-client                     latest              a2f867fc1c4b        8 weeks ago         2.079 GB
<none>                                         <none>              6b06d5cf3d45        8 weeks ago         2.079 GB
ubuntu                                         latest              13b176913597        10 weeks ago        197.8 MB
kaixhin/cuda-torch                             latest              713c712e4a87        3 months ago        3.154 GB
centos                                         latest              ae0c2d0bdc10        11 months ago       224 MB

So my new Docker image ubuntu:fbcunn_lee is now available in the root area, ready to be spun up. Maybe the Docker policy could be to either:

  1. Use tags to version your own images, or
  2. Explicitly rename and commit them under a Docker Hub ID

I think (2) would be safer in preventing people from clobbering each other's images.
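Option (2) might look like this in practice. The zamparol/fbcunn repository name below is illustrative, not a real Docker Hub account:

```shell
# Hedged sketch: re-tag a locally committed image under a Docker Hub ID,
# then push it, so the image name itself prevents clobbering.
# zamparol/fbcunn:lee is a made-up example repository and tag.
docker tag ubuntu:fbcunn_lee zamparol/fbcunn:lee
docker login                    # authenticate with Docker Hub credentials
docker push zamparol/fbcunn:lee
```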

@tatarsky
Contributor Author

tatarsky commented Oct 7, 2015

I'm fine with that. I am only aware of a small number of people using Docker on a regular basis; if they wish to disagree, this is the place! Enforcing it may require some additional steps, but I think this makes the most sense.

@lzamparo

lzamparo commented Oct 7, 2015

Well, how much space in /scratch are idle or discarded images using?

If it's a significant fraction, I'd vote for deleting from the cache on the order of weeks, especially if people are committing their images to Docker Hub, where they can easily be pulled again. This should be pretty simple to set up, given that you can use GitHub credentials to get a Docker Hub account, IIRC.
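Measuring that usage is straightforward with the same per-node ssh pattern used elsewhere in this thread. A sketch, assuming passwordless ssh and a nodelist file of node names:

```shell
#!/bin/sh
# Hedged sketch: report per-node disk usage of docker's root area.
# Assumes a "nodelist" file (one node name per line) and passwordless ssh.
[ -f nodelist ] || exit 0
for node in $(cat nodelist); do
    printf '%s: ' "$node"
    ssh "$node" "du -sh /scratch/docker 2>/dev/null" </dev/null
done
```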

@lzamparo

lzamparo commented Oct 7, 2015

But we should probably have Chodera lab and Raetsch lab people weigh in.

@tatarsky
Contributor Author

tatarsky commented Oct 7, 2015

Concur. I'll take a look at the space consumed; in general the /scratch drives are quite empty.

@lzamparo

lzamparo commented Oct 7, 2015

@akahles, @karalets, @jchodera : any preference?

@jchodera
Member

jchodera commented Oct 9, 2015

@jchodera: do your scripts allow the jchodera/fah-client image to be deployed in the image cache on all nodes? If so, would you mind sharing?

I used a very simple set of scripts to wipe and deploy docker images to all nodes:

#!/bin/tcsh
foreach node ( `cat nodelist` )
  echo $node
  ssh $node "docker rmi jchodera/docker-fah-client"
end

and

#!/bin/tcsh
foreach node ( `cat nodelist` )
  echo $node
  ssh $node "docker pull jchodera/docker-fah-client"
end

where nodelist is this file:

gpu-1-4
gpu-1-5
gpu-1-6
gpu-1-7
gpu-1-8
gpu-1-9
gpu-1-10
gpu-1-11
gpu-1-12
gpu-1-13
gpu-1-14
gpu-1-15
gpu-1-16
gpu-1-17
gpu-2-4
gpu-2-5
gpu-2-6
gpu-2-7
gpu-2-8
gpu-2-9
gpu-2-10
gpu-2-11
gpu-2-12
gpu-2-13
gpu-2-14
gpu-2-15
gpu-2-16
gpu-2-17
gpu-3-8
gpu-3-9

Not very pretty, but this got the job done.

@jchodera
Member

jchodera commented Oct 9, 2015

BTW, I believe what @jchodera is doing is registering his image as a variant at the docker hub. Thus preventing collision because he's made it a unique image...unless I misunderstand.

I am using Docker Hub to automatically build and host Docker images. It's great. You just point it at a GitHub repo containing your Dockerfile, as with this project:
https://hub.docker.com/r/jchodera/docker-fah-client/

@jchodera
Member

jchodera commented Oct 9, 2015

Well, how much space in /scratch are idle or discarded images using?

If it's a significant fraction, I'd vote for deleting from the cache on the order of weeks, especially if people are committing their images to docker hub, where they can be pulled easily. This should be pretty simple to set up, given that you can use github credentials to get a docker hub account IIRC.

This sounds great, though we want to make sure deleting a Docker image that is in use doesn't cause problems for currently running containers. Our management plan should exploit the fact that Docker containers are meant to be "short-running" jobs, run only through the queue, with an execution time limit of a few days before destruction.
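One way to guard against that, sketched under the assumption that comparing the image ID against docker inspect's {{.Image}} field for each running container is a sufficient in-use check (the image name is the one from this thread):

```shell
#!/bin/sh
# Hedged sketch: refuse to remove an image while any running container uses it.
command -v docker >/dev/null || exit 0   # skip on hosts without docker
IMAGE=jchodera/docker-fah-client
IMAGE_ID=$(docker images -q "$IMAGE" | head -n1)
[ -n "$IMAGE_ID" ] || exit 0             # image not present on this node
for c in $(docker ps -q); do
    # {{.Image}} is the full image ID; match the short ID as a prefix
    if docker inspect --format '{{.Image}}' "$c" | grep -q "$IMAGE_ID"; then
        echo "$IMAGE is in use by container $c; not removing"
        exit 0
    fi
done
docker rmi "$IMAGE"
```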

@tatarsky
Contributor Author

tatarsky commented Oct 9, 2015

Also note, from a disk-space point of view, the space currently consumed by /scratch/docker is noise. None of those directories exceeds 100 GB locally; one node has 93 GB. So discussion of cleaning methods is not time-critical. Most node-local drives are very lightly loaded.

@akahles

akahles commented Oct 9, 2015

@akahles, @karalets, @jchodera : any preference?

No preference from my side. If space does not get tight on /scratch, I am fine with widely spaced cleanup intervals (e.g., every two months). Ideally only unused images that are also available on Docker Hub would be removed, but that might be difficult to check.

@tatarsky
Contributor Author

So, basically, having monitored this, we could proceed with:

  1. Warn Docker users to push specific versions of their images to Docker Hub before some date
  2. Delete Docker images older than, say, four weeks, after validating that no running containers use them
  3. Possibly warn folks that an image without a HAL or Git handle in its name will be deleted more aggressively (not 100% sure how I would enforce this)
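Step 2 could be driven off the CREATED column of `docker images`. A sketch of the age filter, demonstrated here on canned rows like the gpu-2-4 listing above (on a live node you would pipe `docker images` in and feed the IDs to `docker rmi` after the in-use check):

```shell
#!/bin/sh
# Hedged sketch: print IDs of images older than four weeks, judged from the
# CREATED column of `docker images` output. Demonstrated on sample rows;
# fields: 1=repo 2=tag 3=image-id 4=age-count 5=age-unit 6="ago" 7+=size.
sample='REPOSITORY TAG IMAGE_ID CREATED SIZE
jchodera/docker-core-linux-build-environment latest 8239a3e4f625 5 weeks ago 3.867 GB
jchodera/docker-fah-client latest a2f867fc1c4b 8 weeks ago 2.079 GB
ubuntu latest 13b176913597 3 weeks ago 197.8 MB
kaixhin/cuda-torch latest 713c712e4a87 3 months ago 3.154 GB'

echo "$sample" | awk 'NR>1 && (($5=="weeks" && $4+0>4) || $5=="months" || $5=="years") {print $3}'
# Live use: docker images | awk '...' | xargs -r docker rmi
```

The 3-week-old ubuntu image is kept; the others (older than four weeks) are printed for removal.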

@lzamparo

I'm fine with this. Maybe, to help the check-in process along, we could link to instructions for how to sign up for Docker Hub and upload images (or Dockerfiles)?

@tatarsky
Contributor Author

My suggestion is a section in the wiki with said instructions, and then point people at the wiki. It's more persistent.

@tatarsky
Contributor Author

@lzamparo misc item to assist with another cleaning-script concept: can you double-confirm that this Docker container on gpu-1-11 is an orphan? My script believes it is, but I do not auto-kill yet.

docker ps
CONTAINER ID
c8a46359b580       (uid removed, but it's related to you)

@lzamparo

@tatarsky I didn't think I had any running jobs, so this job is an orphan. I attached to it and killed it myself.

@tatarsky
Contributor Author

Thanks. The script agreed, but I like to check.

@tatarsky
Contributor Author

This remains a desired feature for the Docker cleaning script, which I may get some cycles to work on shortly.
