Docker and the storage of its image data long term RFC #288
I am going to add a few of the items I look at on these topics as I go, for reference.
These are all good questions. I don't have any good answers yet. For now, I've been maintaining scripts to sweep the nodes of old images and containers corresponding to our docker image (`jchodera/docker-fah-client`).
@jchodera: do your scripts allow the ...?

Currently I find that my own docker jobs often involve a first (very annoying) step of pulling an image from Docker Hub, though on some nodes where I've done interactive work, the image is already in the root area and is visible via `docker images`.

@tatarsky: maybe collisions could be avoided by having users own their own docker images by committing them with their own HAL ids? Or do I not understand the entirety of creating a common root area?
This is, as I noted, why I am requesting comments. I do not have docker set up in a way that lets users own images under their own HAL ids, as I believe that requires sudo (compared to membership in the docker group). If you know otherwise, please advise.
BTW, I believe what @jchodera is doing is registering his image as a variant on Docker Hub, thus preventing collisions because he's made it a unique image... unless I misunderstand.
I think as long as users commit their changes and tag the revision, there should be no collision problem. FYI, I'm also registering my images as variants using Docker Hub; this prevents collisions with other users.

There does seem to be detritus building up in /scratch/docker on different nodes. For instance, I'm working now on ...

On other nodes where I've done previous docker work, I've got docker images available locally to be spun up.
Yep. Part of this RFC (which has been pretty "C", i.e. comment, light) is also "so, when should those images be deleted?" Weeks? Months? I seldom make such decisions without user input and consensus, so feel free to define a retention period. Currently I am only looking for stray docker containers still running due to not being scheduled with the proper exit conditions. I use ...
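For reference, a minimal sketch of the kind of check such a sweep might perform, assuming bash and GNU `date`; the 48-hour threshold is an arbitrary placeholder, not a policy from this thread:

```bash
#!/bin/bash
# Sketch: flag containers that have been running longer than a cutoff,
# as candidates for manual review (not automatic killing).
cutoff=$((48 * 3600))   # placeholder threshold: 48 hours
now=$(date +%s)
for id in $(docker ps -q); do
  started=$(docker inspect --format '{{.State.StartedAt}}' "$id")
  started_s=$(date -d "$started" +%s)   # GNU date parses the RFC3339 timestamp
  if (( now - started_s > cutoff )); then
    echo "long-running container: $id"
  fi
done
```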
I just ran a test to see if I could make a change to the version of `ubuntu:fbcunn_final`: ...

So my new docker image ...
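For readers following along, the commit-and-tag workflow being tested here looks roughly like the sketch below; the container ID, the "myuser" namespace, and the tag are illustrative placeholders, not the values from the actual test.

```bash
# Start from the shared base image and make changes interactively.
docker run -it ubuntu:fbcunn_final /bin/bash   # edit inside, then exit

# Find the stopped container's ID, then commit it as a new image
# under your own namespace so it cannot clobber the shared image.
docker ps -a
docker commit <container-id> myuser/fbcunn:patched

# Push the variant to Docker Hub so it survives local cache cleaning.
docker push myuser/fbcunn:patched   # requires a Docker Hub login
```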
I think (2) would be safer in preventing people from clobbering each other's images. |
I'm fine with that. I am only aware of a small number of people using docker on a regular basis. If they wish to disagree, this is the place! Enforcing that may require some additional steps, but I think this makes the most sense.
Well, how much space in /scratch are idle or discarded images using? If it's a significant fraction, I'd vote for deleting from the cache on the order of weeks, especially if people are committing their images to Docker Hub, where they can be pulled easily. This should be pretty simple to set up, given that you can use GitHub credentials to get a Docker Hub account IIRC.
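A sweep along those lines might look like the sketch below. Two assumptions to flag: it uses GNU `date`, and it keys off an image's creation time (`.Created` from `docker inspect`), which is build time rather than last-use time, so a smarter script would track pulls and runs separately. The 30-day retention is a placeholder.

```bash
#!/bin/bash
# Sketch: remove images *created* more than $DAYS days ago.
# Caveat: .Created is build time, not last-use time.
DAYS=${DAYS:-30}   # placeholder retention period
cutoff=$(date -d "$DAYS days ago" +%s)
for img in $(docker images -q | sort -u); do
  created=$(docker inspect --format '{{.Created}}' "$img")
  if (( $(date -d "$created" +%s) < cutoff )); then
    docker rmi "$img" 2>/dev/null || echo "skipping $img (in use or multiply tagged)"
  fi
done
```

Newer Docker releases fold roughly this into `docker image prune -a --filter "until=<duration>"`, though that filter is likewise based on creation time.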
But we should probably have Chodera lab and Raetsch lab people weigh in.
Concur. I'll take a look at the space consumed. In general the /scratch drives are quite empty.
I used a very simple set of scripts to wipe and deploy docker images to all nodes:

```tcsh
#!/bin/tcsh
foreach node ( `cat nodelist` )
    echo $node
    ssh $node "docker rmi jchodera/docker-fah-client"
end
```

and

```tcsh
#!/bin/tcsh
foreach node ( `cat nodelist` )
    echo $node
    ssh $node "docker pull jchodera/docker-fah-client"
end
```

where `nodelist` is a file listing the node hostnames, one per line.

Not very pretty, but this got the job done.
I am using Docker Hub to automatically build and host docker images. It's great. You just point it to a GitHub repo with your `Dockerfile`.
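For anyone setting that up, the linked repo just needs a `Dockerfile` at its root; an illustrative minimal example (the base image and package choice here are placeholders, not what any lab actually uses):

```dockerfile
# Minimal illustrative Dockerfile for a Docker Hub automated build.
FROM ubuntu:14.04
RUN apt-get update && \
    apt-get install -y --no-install-recommends python && \
    rm -rf /var/lib/apt/lists/*
CMD ["/bin/bash"]
```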
This sounds great, though we want to make sure deleting a docker image that is in use doesn't cause any problems for currently-running containers. Our management plan should exploit the fact that docker containers should be "short running" jobs that are only run through the queue, with an execution time limit of a few days before destruction.
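Note that `docker rmi` (without `-f`) already refuses to delete an image a running container is using, but a sweep script can also check explicitly first. A hedged sketch, assuming a Docker CLI recent enough to support the `ancestor` filter on `docker ps`:

```bash
# Sketch: skip removing an image that a running container still uses.
img="jchodera/docker-fah-client"   # any image name works here
if [ -n "$(docker ps -q --filter "ancestor=$img")" ]; then
  echo "$img is in use by a running container; skipping"
else
  docker rmi "$img"
fi
```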
Also note, from a disk space point of view, the current space consumed in /scratch/docker is noise: no node's local copy of those dirs exceeds 100GB, and the largest holds 93GB. So discussion of the cleaning methods is not time-critical. Most node-local drives are very lightly loaded.
So basically, having monitored this, if we proceeded with: ...
I'm fine with this. Maybe, to help the check-in process along, we could link to instructions for how to sign up for Docker Hub & upload images (or Dockerfiles)?
My suggestion is a section in the wiki with said instructions, and then point people at the wiki. It's more persistent.
@lzamparo: a misc item to assist with another cleaning-script concept. Can you double-confirm that this docker image on gpu-1-11 is an orphan? My script believes it is, but I do not auto-kill yet.
@tatarsky I didn't think I had any running jobs, so this job is an orphan. I attached to it and killed it myself.
Thanks; the script agreed, but I like to check.
This remains a desired feature for the docker cleaning script, which I may get some cycles on shortly.
(RFC = Request for Comments)
I am trying to get a clear understanding of how docker should best store images in what it calls its "root area", which is currently set to /scratch/docker rather than the default of /var/lib/docker, which would be a smaller area on a node.
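For reference, the root area is controlled by a daemon option; on Docker versions of this era that was the `-g`/`--graph` flag (later renamed `--data-root`). A minimal sketch, assuming a sysconfig-style init configuration; the file path and variable name vary by distro:

```bash
# e.g. /etc/sysconfig/docker (location and variable name vary by distro):
# point the daemon's root area at the larger node-local scratch disk.
OPTIONS="-g /scratch/docker"
```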
I also want to gather the requirements for docker's space and usage. We currently have it in a form of "open season test mode", and have for a while.
That root area consists of more than just images: it is also where I believe much of the container state is kept, so I don't think you can just toss the entire region into a shared GPFS directory. Or at least I would have to experiment with that, and I can see there are going to be collisions if I do.
My brief reading seems to point to the use of a local registry if a large collection of images and their local modifications is required, with its backend storage then on some shared media. This requires a daemon to be run, and there appear to be two projects with implementations.
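One such implementation is Docker's own registry image. A minimal sketch of running it with its backend storage on shared media and then pushing an image through it; the GPFS path is a placeholder:

```bash
# Sketch: run a private registry container, storing its data on
# shared storage (path is a placeholder for an actual GPFS mount).
docker run -d --name registry -p 5000:5000 \
  -v /gpfs/shared/docker-registry:/var/lib/registry \
  registry:2

# Tag and push an image through the local registry.
docker tag jchodera/docker-fah-client localhost:5000/jchodera/docker-fah-client
docker push localhost:5000/jchodera/docker-fah-client
```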
All of this would be considered an enhancement, but I said I would look into it, so here is the start of my attempt to understand the requirements to move forward.