Firecracker Snapshots Support #448
Conversation
Commits (all signed off by Plamen Petrov <[email protected]>):
- Notes: 1. Uses logging-only branch from ustiugov/firecracker-go-sdk. 2. Firecracker logs path is hard-coded.
- firecracker update
- Check that shim dir exists when loading shim; no longer try to create the shim dir when loading, as it must exist.
This is really cool. Thanks for all the work! I discussed snapshot support with @ustiugov a few months back but haven't updated you since then. I apologize for the lack of communication from our side.
Hi @kzys
For starting a microVM, the proposed API asks clients to call CreateVM and then LoadSnapshot.
However, if a client is going to call LoadSnapshot, CreateVM doesn't have to load a guest OS. Is that a correct understanding? I think it would be better to provide a way to create a brand-new VM from its snapshot, rather than "offloading" a booted VM. Could you explain the reasoning behind this design decision?
@kzys Sorry for the late response. We expect a workflow where an idle VM is snapshotted and its resources are then freed with Offload (i.e., the VM is killed) on the same physical host. When a new request arrives for this VM, the orchestrator (i.e., the containerd client) can restore the VM into a newly created Firecracker process. LoadSnapshot loads the VM state and resumes the VM from the exact point where it was previously Offloaded. Compared to StartVM, LoadSnapshot expects certain files to be present before loading the VM state (including the guest memory). These files include the disk image (the two disk drives). As opposed to StopVM, Offload preserves the disk drives, whereas the tap is recreated manually (by our custom orchestrator). To simplify the procedure, Offload does NOT remove the VM's shim folder, keeping all the files in place. I hope these explanations make sense, although we are open to feedback.
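To make the lifecycle above concrete, here is a minimal Go sketch of the intended call sequence. The snapshotClient interface and every name in it are illustrative stand-ins, not the actual firecracker-containerd fccontrol API:

```go
package main

// snapshotClient is a hypothetical stand-in for the proposed API;
// the real firecracker-containerd interface differs.
type snapshotClient interface {
	CreateVM(vmID string) error                          // boot a fresh microVM
	PauseVM(vmID string) error                           // freeze the vCPUs
	CreateSnapshot(vmID, snapPath, memPath string) error // write state + guest memory
	Offload(vmID string) error                           // kill FC process, keep disks and shim dir
	LoadSnapshot(vmID, snapPath, memPath string) error   // restore into a new FC process
	ResumeVM(vmID string) error                          // continue from the offload point
}

// lifecycle shows the snapshot/offload/restore flow on one physical host.
func lifecycle(c snapshotClient) error {
	// 1. Boot normally and serve requests until the VM goes idle.
	if err := c.CreateVM("vm-0"); err != nil {
		return err
	}
	// 2. Snapshot the idle VM, then free its resources on this host.
	if err := c.PauseVM("vm-0"); err != nil {
		return err
	}
	if err := c.CreateSnapshot("vm-0", "snap_file", "mem_file"); err != nil {
		return err
	}
	if err := c.Offload("vm-0"); err != nil {
		return err
	}
	// 3. On the next request: restore and resume from the same point.
	if err := c.LoadSnapshot("vm-0", "snap_file", "mem_file"); err != nil {
		return err
	}
	return c.ResumeVM("vm-0")
}
```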
Thanks! Now I can understand the assumption here, but I'd like to give customers more options for how they keep their microVM artifacts. Moving microVMs between multiple hosts would be beneficial for customers. For example, if a host is having a hardware failure or needs system updates such as a Linux kernel update, the customer may want to keep their microVMs somewhere (e.g., cloud storage) and run them somewhere else. While we don't want to directly integrate cloud storage clients into firecracker-containerd, we should make that possible for higher-level orchestrators such as Fargate. Would you mind if I ask you to split this pull request into a few smaller ones? Pause, Resume and CreateSnapshot are relatively straightforward. We will move some of the implementation from here to the SDK, but the proposed APIs look good to me. The rest may need more design discussion upfront.
@kzys Thank you! Indeed, the scenario that you mentioned makes a lot of sense. We took the easier path of limiting our scenario to the same physical host because we are still not clear on what firecracker-containerd can assume about the disk state of VMs (both the data and the emulated devices). This is the only missing piece, and we would really like to hear your opinion/advice on it. Sure, we can split the PR. Should we create the following PRs: (1) Pause, Resume and CreateSnapshot; (2) Offload and LoadSnapshot?
What do you think?
What do you mean by the disk state of VMs? It would be better to make fewer assumptions and let customers decide what they want to do. Regarding the PRs, the split makes sense, but I'd like to have Pause, Resume and CreateSnapshot on the SDK first. We are going to have the SDK's 0.22.0 release next week. Could you help us add the APIs to the SDK after the release?
@kzys By the disk state I mean the following: 1) the images pulled by firecracker-containerd; 2) the devmapper block devices. Yes, I think that we should be able to get Pause, Resume and CreateSnapshot into the SDK first. @plamenmpetrov is our lead developer here.
@kzys Just to clarify, version 0.22.0 of the SDK does not seem to offer support for Pause, Resume, or CreateSnapshot. In that case, would you suggest that we wait for the SDK's 0.22.0 release and then submit a PR to the SDK implementing these calls? Once this is done, we can use the SDK to implement the calls in firecracker-containerd.
@plamenmpetrov Yes! We've finally released 0.22.0, and Firecracker has released 0.23.0. It would be awesome if you could port the Pause, Resume, and CreateSnapshot changes to the SDK.
@kzys We merged PauseVM, ResumeVM, and CreateSnapshot into the SDK. What should our next steps be?
Thanks for the contribution. The next step would be using the new SDK APIs from firecracker-containerd. While CreateSnapshot is probably complicated, since we need to keep a container's root filesystem in addition to the microVM, Pause and Resume should be straightforward. Can you make a new PR that exposes Pause and Resume from firecracker-containerd?
@kzys Are there any plans to finalize the snapshot support? Maybe we could start discussing the next steps.
Any updates on the snapshots feature, now that loading snapshots is available in the Go SDK v1?
Hello, we have been working on supporting microVM snapshotting in firecracker-containerd, following its introduction in Firecracker. This PR contains new functions for the firecracker-containerd API that together comprise a complete working prototype for Firecracker snapshots. This prototype, however, contains workarounds for the calls missing from the firecracker-go-sdk. We also highlight a couple of issues that we would like your feedback on.
We are open to feedback from the community and would be glad to engage in discussions to finalize this code and contribute it upstream.
Authored by @plamenmpetrov and @ustiugov
Summary
We implement functionality for:
- PauseVM
- ResumeVM
- CreateSnapshot
- Offload
- LoadSnapshot

We refer to these collectively as microVM snapshotting requests.
The firecracker-go-sdk does not support microVM snapshotting as of now. As a result, we embedded the microVM snapshotting requests inside the runtime as raw HTTP requests. We use our own fork of the firecracker-go-sdk v0.21.0, which adds basic support for the new logging and metrics of the Firecracker version that we use (see below). Without these changes in the firecracker-go-sdk, we observe an error in the containerd logs concerning the Firecracker logging; this prevents us from seeing the Firecracker logs and makes debugging difficult.
We use a specific pinned Firecracker build in our tests.
API Extensions Description
We create an HTTP client upon creating a microVM or loading a microVM snapshot, which is used to send HTTP requests directly to the Firecracker process for the respective microVM (rather than going through the firecracker-go-sdk).
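A minimal sketch of such a client, assuming the standard Go net/http machinery and that Firecracker serves its API on a Unix domain socket (the socket path is supplied by the caller):

```go
package main

import (
	"context"
	"net"
	"net/http"
)

// newFirecrackerClient returns an HTTP client whose transport dials the
// microVM's Firecracker API socket instead of TCP. The host portion of
// request URLs is ignored by the custom dialer.
func newFirecrackerClient(socketPath string) *http.Client {
	return &http.Client{
		Transport: &http.Transport{
			DialContext: func(ctx context.Context, _, _ string) (net.Conn, error) {
				return (&net.Dialer{}).DialContext(ctx, "unix", socketPath)
			},
		},
	}
}
```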
ResumeVM, PauseVM and CreateSnapshot
ResumeVM, PauseVM and CreateSnapshot use the HTTP client to send the respective request to the Firecracker process. The status code returned by Firecracker is checked to verify that the operation succeeded.
Note that CreateSnapshot does not pause the microVM, but assumes that it is already paused. This is in line with the prerequisites for creating a microVM snapshot in Firecracker.
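As a sketch, the raw requests could look like the following, using Firecracker's documented PATCH /vm and PUT /snapshot/create endpoints with the Unix-socket client from above. The doJSON helper and the 204 success check are assumptions about the code shape rather than the PR's exact implementation, and field names may vary across Firecracker versions:

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// doJSON sends one JSON request to the Firecracker API socket and treats
// anything other than 204 No Content as a failure.
func doJSON(client *http.Client, method, path, payload string) error {
	// "localhost" is a placeholder host; the client's dialer ignores it
	// and always connects to the Unix socket.
	req, err := http.NewRequest(method, "http://localhost"+path,
		strings.NewReader(payload))
	if err != nil {
		return err
	}
	req.Header.Set("Content-Type", "application/json")
	resp, err := client.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusNoContent {
		return fmt.Errorf("%s %s failed: %s", method, path, resp.Status)
	}
	return nil
}

func pauseVM(c *http.Client) error {
	return doJSON(c, http.MethodPatch, "/vm", `{"state": "Paused"}`)
}

func resumeVM(c *http.Client) error {
	return doJSON(c, http.MethodPatch, "/vm", `{"state": "Resumed"}`)
}

// createSnapshot assumes the microVM is already paused, matching the
// prerequisite noted above.
func createSnapshot(c *http.Client, snapPath, memPath string) error {
	payload := fmt.Sprintf(
		`{"snapshot_type": "Full", "snapshot_path": %q, "mem_file_path": %q}`,
		snapPath, memPath)
	return doJSON(c, http.MethodPut, "/snapshot/create", payload)
}
```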
Offload
Offload kills the Firecracker process for the microVM with the respective ID (using SIGKILL) and deletes the Firecracker process's API socket file and vsock file, so that the microVM can later be loaded again. This functionality is implemented in the runtime.
In addition, Offload kills the shim using SIGKILL, so that its resources are freed until (and unless) the microVM is loaded again in the future. We remove the functionality where the shim directory for the microVM is deleted when the shim terminates, because in our use case we store the guest memory file and the snapshot state file in the shim directory. We also remove the shim socket file and the firecracker shim socket file, and recreate the sockets upon LoadSnapshot (see below). This functionality is implemented in the control plugin.
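A simplified sketch of the runtime-side part of Offload under these assumptions (Linux; the function name, PID parameter, and paths are illustrative, and the shim-side cleanup described above is omitted):

```go
package main

import (
	"os"
	"syscall"
)

// offloadVM kills the microVM's Firecracker process and removes its API
// socket and vsock files so that LoadSnapshot can later reuse the same
// paths. The shim directory is deliberately left in place, since it holds
// the guest memory file and the snapshot state file.
func offloadVM(firecrackerPID int, apiSockPath, vsockPath string) error {
	if err := syscall.Kill(firecrackerPID, syscall.SIGKILL); err != nil {
		return err
	}
	for _, p := range []string{apiSockPath, vsockPath} {
		if err := os.Remove(p); err != nil && !os.IsNotExist(err) {
			return err
		}
	}
	return nil
}
```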
LoadSnapshot
Before doing anything else, the shim needs to be started for the microVM. We recreate the shim socket and the fccontrol shim socket, and start the shim binary. This functionality is implemented in the control plugin.
LoadSnapshot starts a Firecracker process listening on the same API socket that the microVM was using prior to being offloaded. The HTTP client is recreated and a load-snapshot request is sent to the Firecracker process. The status code returned by Firecracker is checked to verify the success of the operation. This functionality is implemented in the runtime.
Note that LoadSnapshot assumes that a tap device with the exact same name, IP, and MAC as before the VM was offloaded already exists. Currently, we recommend removing the tap after calling Offload and re-creating it before calling LoadSnapshot, because if these two calls happen back to back (as they may in tests), a “Tap is busy” error can occur.
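The load step itself then reduces to one more request against Firecracker's PUT /snapshot/load endpoint. The sketch below reuses the doJSON helper and Unix-socket client from the earlier sketches, and assumes the shim and a fresh Firecracker process (listening on the original API socket) have already been started and the tap has been re-created; field names may differ between Firecracker versions:

```go
// loadSnapshot restores the microVM from the snapshot state file and the
// guest memory file written earlier. Assumes the doJSON helper and the
// Unix-socket HTTP client defined in the sketches above.
func loadSnapshot(c *http.Client, snapPath, memPath string) error {
	payload := fmt.Sprintf(
		`{"snapshot_path": %q, "mem_file_path": %q}`,
		snapPath, memPath)
	return doJSON(c, http.MethodPut, "/snapshot/load", payload)
}
```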
Limitations
Calling StopVM on a microVM which has been loaded from a snapshot results in an error, because we lose connection to the agent running inside the microVM.
Performance: in our experiments, re-creating the shim process (before loading the snapshot in Firecracker) takes about 30 ms; we have not yet investigated this issue. Our intuition is that shim start-up should take no more than the 5-10 ms needed to start a Firecracker process.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.