Build cache
mistry can cache contents between builds. Depending on the project, this may speed up jobs significantly. The cache is also what makes incremental builds possible.
Prerequisites:
The cache is enabled on a per-job basis by passing a value to the group option when scheduling a build. The actual value can be anything, as long as it's the same between jobs that want to use the same cache.
Enabling the cache means that the data directory of the last completed job of the same group will be used as the data directory for the new job.
Given the following jobs that are executed sequentially:
- Job A {project: X, group: foo}
- Job B {project: X, group: bar}
- Job C {project: X, group: foo}
- Job D {project: X, group: bar}
we have the following behavior:
- Job A will start with an empty data directory, since it's the first job executed in project X with group foo.
- Job B will start with an empty data directory, since it's the first job executed in project X with group bar.
- Job C will start with the data directory from Job A, since it uses the same group as Job A.
- Job D will start with the data directory from Job B, since it uses the same group as Job B.
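The selection rule above can be modelled in a few lines of shell. This is only a toy sketch of the behavior described here, not mistry's actual implementation: the seed_of function and the scratch state directory are hypothetical, and the most recent job of each project/group pair is simply recorded in a file.

```shell
#!/bin/sh
# Toy model (not mistry code): remember the last job of each project/group pair.

# seed_of STATE_DIR PROJECT JOB GROUP
# Prints the job whose data directory JOB would start from ("none" when the
# group has no earlier job in this project), then records JOB as the most
# recent job of that group.
seed_of() {
    if [ -f "$1/$2.$4" ]; then
        cat "$1/$2.$4"
    else
        echo none
    fi
    printf '%s\n' "$3" > "$1/$2.$4"
}

state=$(mktemp -d)   # hypothetical scratch directory used only by this model
for spec in A:foo B:bar C:foo D:bar; do
    job=${spec%%:*}      # part before the colon
    group=${spec#*:}     # part after the colon
    echo "Job $job starts from: $(seed_of "$state" X "$job" "$group")"
done
rm -rf "$state"
```

For jobs A through D this reports none, none, A and B respectively, matching the list above.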
To illustrate this with an example, let's assume a project that just writes the string "bar" to a file named foo.txt and produces an additional file for cache purposes, bar.txt (its contents are irrelevant but let's assume it contains cache data). We schedule a job of this project passing it a group so that the cache is enabled:
$ mistry build --project projectx --group abc
After the build completes, the job's data path on the host contains the following:
$ tree /var/lib/mistry/data/projectx/ready
/var/lib/mistry/data/projectx/ready
|-- 00f46193c660ccb36d355a7a1ba104b1e66da55c24bdca26e54c9843f79c18bd
| |-- data # mounted as `/data` inside the container
| | |-- artifacts # <-- build artifacts
| | | |-- foo.txt
| | |-- cache # <-- build cache contents
| | | |-- bar.txt
| | `-- params # <-- job parameters
| |-- out.log
| `-- result.json
Now let's assume that another job of the same project is scheduled:
$ mistry build --project projectx --group abc -- --cachebust=$RANDOM
Note: the cachebust parameter is passed only for demonstration purposes, so that the job is not considered identical to the previous one (i.e. it bypasses the result cache). Its name and value are both irrelevant.
When this new job starts, its /data path will initially contain all the contents of the previous job's data directory. So from inside the container, we would see this:
$ tree /data
|-- artifacts
| |-- foo.txt # "bar"
|-- cache
| |-- bar.txt
|-- params
| |-- cachebust # e.g. 31245
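From the build script's point of view, a job that was seeded from an earlier one can be recognized by a non-empty cache directory. A minimal sketch, assuming the project's build is driven by a shell script inside the container; cache_is_warm and the CACHE_DIR override are hypothetical helpers, not something mistry provides:

```shell
#!/bin/sh
# cache_is_warm DIR: succeed when DIR exists and contains at least one entry.
cache_is_warm() {
    [ -d "$1" ] && [ -n "$(ls -A "$1" 2>/dev/null)" ]
}

# /data/cache is the cache path from the listing above; CACHE_DIR is a
# hypothetical override for trying the script outside a container.
cache_dir=${CACHE_DIR:-/data/cache}
if cache_is_warm "$cache_dir"; then
    echo "warm cache: reusing $cache_dir"
else
    echo "cold cache: starting from scratch"
fi
```

A script could use this check to decide whether to restore dependencies from the cache or fetch them again.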
Q: Do I have to take care of cleaning up the artifacts path in case there are leftovers from previous artifacts?
A: Yes.
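One way to do this is to empty the artifacts directory at the start of the build. The sketch below assumes the build is driven by a shell script inside the container and that a find supporting -mindepth and -delete (GNU or BSD) is available; clean_artifacts is a hypothetical helper name.

```shell
#!/bin/sh
# clean_artifacts DIR: remove everything inside DIR but keep DIR itself.
clean_artifacts() {
    # Nothing to clean if the directory does not exist yet.
    [ -d "$1" ] || return 0
    # -mindepth 1 spares DIR itself; -delete removes files and
    # subdirectories below it, deepest entries first.
    find "$1" -mindepth 1 -delete
}
```

Inside the container it would be called as clean_artifacts /data/artifacts before the build step runs. The cache directory is left alone, so cached data still carries over.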
Q: Why are artifacts from previous builds also retained?
A: Because they may be used to achieve incremental building (many build tools do not re-build artifacts when they are already present).
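The idea can be sketched with a small wrapper that skips a build step when its output is already present. This is an illustration only: build_if_missing and write_foo are hypothetical names, and real build tools usually also compare timestamps or checksums rather than checking for presence alone.

```shell
#!/bin/sh
# build_if_missing TARGET CMD [ARGS...]
# Runs CMD only when TARGET does not exist yet, mimicking how many build
# tools skip outputs that are already present.
build_if_missing() {
    bim_target=$1
    shift
    if [ -e "$bim_target" ]; then
        echo "skip: $bim_target already present"
        return 0
    fi
    "$@"
}

# Hypothetical build step for the example project: write "bar" to a file.
write_foo() {
    printf 'bar\n' > "$1"
}
```

Inside the container this could be invoked as build_if_missing /data/artifacts/foo.txt write_foo /data/artifacts/foo.txt. On the first job of a group the file is created; on later jobs of the same group the retained artifact is reused.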