Export cache image and app image in parallel #1167
Conversation
I quite like this and was also thinking about proposing it recently. The only risk I see is that the logs for the two internal steps will interleave, but that hardly seems worth worrying about.

To answer your question earlier: layer reuse won't be affected. The cache is an image with just a single layer, not the same layers that the run image uses. If I'm understanding your question correctly 😄

I'm +1 on this, but I want to wait for @natalieparellano @joe-kimmel-vmw to weigh in as well. Unsure if we would need/want to guard this by platform version or any other mechanism. My vote would be no: we don't guarantee logs or their order as part of any API.
Thanks for your contribution @ESWZY !

LGTM; I'm +1 on using …
Thanks for your comment! It is reasonable to add tests for the parallelized export process. But I found that most of the existing tests exercise the two steps of … separately, while the modified parallel logic sits in between them. Should I reuse the existing testing logic? I found some test cases here that feel similar, but I'm not sure whether I should create a new test case based on them, or just reuse this case and modify it:

lifecycle/acceptance/exporter_test.go, lines 294 to 318 in f8b3419

```go
// Add these lines to verify that both exports succeed
h.Run(t, exec.Command("docker", "pull", cacheImageName))
h.Run(t, exec.Command("docker", "pull", exportedImageName))
```
Since you are just using goroutines, this is not a big change in terms of logic, so it would be fine to just validate that those steps complete successfully as expected. I guess we can modify the existing tests, asserting that everything went well. Thanks 😄
@ESWZY is this ready for a re-review?
@natalieparellano @dlion Sorry for not replying for so long. I was trying to design a test case for the parallel export process, but had no idea how to do it. The original test cases already cover exporting a single app image, a single cache image, and both together; these cases already partly prove that the parallel export process works normally. Should we add some more complex, or more intensive, export tests? Or just use the original test cases without any modification? Thx.
Looks good to me! @jabrown85 would you like to have another look?
I've been thinking about this more and I think we need to check that …
Looking into the code a little bit, I think we'll want to put a mutex around …

Edit: we won't want to lock all of …
I think we can make … (line 45 in df9ae90)
Good idea! I changed to use …
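Since the diff later in this thread shows `f.tarHashes.Load(tarPath)`, the change was presumably from a plain map to a `sync.Map`. A minimal sketch of that swap, with the `Factory` struct trimmed to assumed fields and a hypothetical helper for illustration:

```go
package layers

import "sync"

// Factory is trimmed to the fields this sketch needs; the real struct has more.
type Factory struct {
	ArtifactsDir string
	// Previously something like: tarHashes map[string]string, which is not
	// safe once the app-image and cache-image exports write concurrently.
	tarHashes sync.Map // tar path -> layer SHA
}

// cachedSHA shows the map-indexing replacement: sync.Map.Load returns
// (any, bool), so the value must be type-asserted back to string.
func (f *Factory) cachedSHA(tarPath string) (string, bool) {
	if v, ok := f.tarHashes.Load(tarPath); ok {
		return v.(string), true
	}
	return "", false
}
```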
@ESWZY @natalieparellano @jabrown85 I am thinking about this: do we really need this change? Instead, I have this proposal (and it could be optional on the platform side, based on some input indicating that the platform wants/expects this behavior): if lifecycle/exporter could write some status like "APP_IMAGE_READY" (in the context of the corresponding build/execution) to CNB_PLATFORM_DIR as soon as the app image is ready [1], without waiting for the cache image export, our platform could relay that to interested services, which could then immediately deploy the app image and wouldn't need to wait for the cache image export.

[1] https://github.com/buildpacks/lifecycle/blob/main/cmd/lifecycle/exporter.go#L227

@ESWZY Would implementing this proposal help your use case as well?
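A rough sketch of what writing such a marker could look like. The file name `APP_IMAGE_READY` comes from the comment above; the helper and its signature are illustrative, not lifecycle API:

```go
package main

import (
	"os"
	"path/filepath"
)

// writeAppImageReady drops a status file into the platform directory as soon
// as the app image export finishes, so a platform watching that directory can
// start deploying without waiting for the cache image export to complete.
func writeAppImageReady(platformDir, appImageDigest string) error {
	path := filepath.Join(platformDir, "APP_IMAGE_READY")
	// Writing the digest lets the platform know exactly which image is ready.
	return os.WriteFile(path, []byte(appImageDigest+"\n"), 0o644)
}
```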
@kritkasahni-google I like the idea of messaging with the platform in an async fashion. Maybe something like … Essentially platform hooks. We could formalize a hook concept if we thought that was better. Similar to git hooks, where we execute …

What do you think would be best for your platform @kritkasahni-google? lifecycle-written files or executables?
Another option would be to look for the presence of …
@kritkasahni-google Actually, our team also needs to control the export behavior. As you can see in the PR description, there is a few seconds' improvement overall, but the app image export itself does slow down by a few seconds. If there were a trigger mechanism in the image registry, the deployment steps could be executed as soon as the app image lands; for scenarios that aren't covered by that, we need this kind of concurrent export. So I also agree that there should be a platform option to specify whether parallelism is enabled. Is such a design feasible? @natalieparellano @joe-kimmel-vmw
@ESWZY we could add that! It would require an RFC: https://github.com/buildpacks/rfcs#rfc-process. Is that something you'd be willing to contribute? We could guide you through the process if that would be helpful.
@ESWZY Makes sense that parallelism could be enabled/disabled based on input from the platform. If we could keep this behind a flag, I would be interested in trying out whether and how parallelism helps us; depending on the perf gains (or lack thereof), we could then easily disable it using that flag.
I would like to! Let me read RFC 0004 first. 🥰
@jabrown85 Let me get back to you in a bit about this; I am also discussing it with our platform folks at the moment.
@jabrown85 Our platform team has tabled this for now; it is hard to say when it will get prioritized. But I would be interested in making this change provided more users ask for it. Let me open a new issue to track this. What do you think?
Blocking on buildpacks/rfcs#291
Opened #1215 to track messaging with the platform in an async fashion.
@ESWZY thank you for pushing this forward. I added a couple of comments. Would you be willing to make the spec PR also? This could go out in Platform 0.13.
```go
if c.ParallelExport {
	if c.CacheImageRef == "" {
		cmd.DefaultLogger.Warn("parallel export has been enabled, but it has not taken effect because cache image (-cache-image) has not been specified.")
	}
}
```
Could we put this validation in the platform package? Somewhere in ResolveInputs? Our eventual aim is to move all such validations there. That would also have the nice side effect of printing the warning when cmd/lifecycle/exporter is invoked with this configuration.
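A hedged sketch of what hoisting the check into input resolution could look like; `ResolveInputs` is the real entry point named above, but the trimmed struct and helper below are assumptions for illustration:

```go
package platform

// LifecycleInputs is trimmed to the two fields this validation reads.
type LifecycleInputs struct {
	ParallelExport bool
	CacheImageRef  string
}

// warnParallelExportNoop would run during input resolution so that every
// entrypoint (creator and exporter alike) prints the same warning.
func warnParallelExportNoop(i *LifecycleInputs, warn func(msg string)) {
	if i.ParallelExport && i.CacheImageRef == "" {
		warn("parallel export has been enabled, but it has not taken effect because cache image (-cache-image) has not been specified.")
	}
}
```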
```diff
 }
-	if sha, ok := f.tarHashes[tarPath]; ok {
-		f.Logger.Debugf("Reusing tarball for layer %q with SHA: %s\n", id, sha)
+	if sha, ok := f.tarHashes.Load(tarPath); ok {
```
I've been thinking about it, and I think we could still have a race condition if this load returns !ok. We could end up processing the same tar path in parallel: exporter and cacher each reading all the bits in the layer before one of them stores the result. What about something like:
```go
const processing = "processing"

func (f *Factory) writeLayer(id, createdBy string, addEntries func(tw *archive.NormalizingTarWriter) error) (layer Layer, err error) {
	tarPath := filepath.Join(f.ArtifactsDir, escape(id)+".tar")
	var (
		tries  int
		sha    any
		loaded bool
	)
	for {
		sha, loaded = f.tarHashes.LoadOrStore(tarPath, processing)
		if loaded {
			shaString := sha.(string)
			if shaString == processing {
				// another goroutine is processing this layer, wait and try again
				time.Sleep(time.Duration(tries) * 500 * time.Millisecond)
				tries++
				continue
			}
			f.Logger.Debugf("Reusing tarball for layer %q with SHA: %s\n", id, shaString)
			return Layer{
				ID:      id,
				TarPath: tarPath,
				Digest:  shaString,
				History: v1.History{CreatedBy: createdBy},
			}, nil
		}
		break
	}
	// function continues...
```
We could debate the manner of the backoff, but this seems preferable to potentially reading all the bits twice, especially for large layers, which tend to take a lot of time anyway. @jabrown85 do you have any thoughts?
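One edge case worth noting with the sentinel approach: if `writeLayer` fails after storing `processing`, the waiting goroutines would retry forever. A sketch of a guard against that, assuming the `sync.Map`-backed `tarHashes`; the helper and its signature are hypothetical:

```go
// writeLayerGuarded assumes the caller has just won the LoadOrStore race with
// the "processing" sentinel. If writing the tarball fails, the sentinel is
// deleted so other goroutines don't sleep forever on a layer that never comes.
func (f *Factory) writeLayerGuarded(tarPath string, write func() (string, error)) (sha string, err error) {
	defer func() {
		if err != nil {
			f.tarHashes.Delete(tarPath) // let another goroutine retry this layer
		}
	}()
	if sha, err = write(); err != nil {
		return "", err
	}
	f.tarHashes.Store(tarPath, sha) // publish the real digest on success
	return sha, nil
}
```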
@jabrown85 wdyt?
I think retrying with backoff this way will be at least no slower than processing the same layer again, and possibly faster. I will implement exponential backoff and retry. @natalieparellano @jabrown85 is that okay?
On second thought, do you think "exponential" backoff could make it slower in some cases? Ideally we need parallelism to make this faster, and exponential backoff might be counter-productive in the worst case; I think a fixed delay would be better here. How about 500ms, or even 1 second? wdyt?
I think starting with what @natalieparellano suggested seems reasonable; we can always circle back and adjust any delay timings as we get feedback and real-world cases.
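For concreteness, the fixed-delay variant discussed above would only change the sleep line in the earlier sketch; 500ms is the value floated above, not a measured optimum:

```go
// Fixed delay between polls of the "processing" sentinel,
// instead of scaling the wait with the retry count.
time.Sleep(500 * time.Millisecond)
```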
```go
	RunImageRef:       runImageID,
	RunImageForExport: runImageForExport,
	WorkingImage:      appImage,
})
```
Could we move encoding.WriteTOML(e.ReportPath, &report) into this func as well? That would allow report.toml to be written before the cache has finished, which would allow platforms to use the presence of this file as a signal that the image is ready.
That makes sense @natalieparellano; working on this change. One quick question: when parallel export is enabled and the goroutine exporting the app image fails, should we cancel the goroutine exporting the cache image, or wait for it to complete? wdyt?
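On the cancel-or-wait question, one common shape for this in Go is errgroup.WithContext, where the first failure cancels the shared context and a context-aware cache export can stop early. A sketch, with the export funcs as stand-ins rather than the actual exporter wiring:

```go
package main

import (
	"context"

	"golang.org/x/sync/errgroup"
)

// exportBoth runs the two exports in parallel. If exportApp fails, ctx is
// canceled, and an exportCache that checks ctx can bail out early; Wait still
// returns only after both goroutines have finished.
func exportBoth(ctx context.Context, exportApp, exportCache func(context.Context) error) error {
	g, ctx := errgroup.WithContext(ctx)
	g.Go(func() error { return exportApp(ctx) })
	g.Go(func() error { return exportCache(ctx) })
	return g.Wait() // first non-nil error wins
}
```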
@ESWZY if you are preoccupied with something else, I could take this change forward, if that's okay with you and everyone. I am interested in trying out this change with our buildpacks as well.
@kritkasahni-google That's okay, thank you for your contribution.
Spec PR: buildpacks/spec#380
In our scenario, the app image and the cache image need to be exported at the same time, but this process is serial in the lifecycle, which means that after the app image is exported, we have to wait for the cache image to be exported as well. From our measurements, the time to export the app image is about the same as the time to export the cache image, but we don't actually need to wait for the cache image: once the app image is exported, we can proceed to the next steps (distribution and deployment).

So we tried parallelizing this step (this PR) and compared it with serial exporting. We used several projects for testing and pushed app images and cache images to the same self-hosted registry.
Java (app image 202.361 MB, cache image 157.525 MB, one shared layer of 107.648 MB):

Go (app image 114.273 MB, cache image 175.833 MB, no shared layers):
We do get an improvement here: although the export time of each individual image increases, the total time decreases. The effect should be more pronounced when exporting to different registries or with faster bandwidth.
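For readers skimming the thread, the change itself boils down to running the two export steps concurrently rather than sequentially; a minimal sketch of the shape, with stand-in functions rather than the actual lifecycle code:

```go
package main

import "sync"

// exportInParallel runs the app-image and cache-image export steps at the
// same time, so total time is roughly the max of the two instead of the sum.
func exportInParallel(exportApp, exportCache func() error) (appErr, cacheErr error) {
	var wg sync.WaitGroup
	wg.Add(2)
	go func() { defer wg.Done(); appErr = exportApp() }()
	go func() { defer wg.Done(); cacheErr = exportCache() }()
	wg.Wait()
	return appErr, cacheErr
}
```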
But my confusion is: if the app image and cache image contain the same layers, will this method no longer reuse layers when pushing to the registry? Or should we detect / specify whether to use a parallel pushing strategy? Thx.