CRIU dump triggers COW on all memory in all child processes #2386
Comments
@Tianyang-Zhang Would you be able to share an example code snippet for a test program that could be used to reproduce this problem?
Sure, here is a minimal test program to reproduce the issue. The CRIU I'm using is v3.19.
Compile and run:
Dump:
Please let me know if you need anything else. @rst0git
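The reporter's test program and dump command were not preserved in this copy of the thread. Below is a minimal sketch of a comparable reproducer, based only on the setup described in the issue: a large MAP_PRIVATE | MAP_ANONYMOUS mapping that the parent writes to, followed by fork() calls whose children never touch it. The function names start_reproducer() and stop_reproducer() are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: kill and reap the first `n` children. */
void stop_reproducer(const pid_t *pids, int n)
{
    for (int i = 0; i < n; i++) {
        kill(pids[i], SIGKILL);
        waitpid(pids[i], NULL, 0);
    }
}

/*
 * Hypothetical reproducer: map `size` bytes of private anonymous memory,
 * write to every page so it is backed by physical memory, then fork `n`
 * children that never touch it. After fork() the parent and all children
 * share the same physical pages copy-on-write. Child PIDs are stored in
 * `pids`. Returns the mapping, or NULL on error.
 */
void *start_reproducer(size_t size, int n, pid_t *pids)
{
    assert(size > 0 && n >= 0);
    void *ptr = mmap(NULL, size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (ptr == MAP_FAILED)
        return NULL;
    memset(ptr, 0xab, size);              /* fault in every page */

    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid < 0) {                    /* undo the partial setup */
            stop_reproducer(pids, i);
            munmap(ptr, size);
            return NULL;
        }
        if (pid == 0) {
            for (;;)
                pause();                  /* child: idle, never writes */
        }
        pids[i] = pid;
    }
    return ptr;
}
```

A caller could map, say, 1GB with a few children, print its own PID, and then checkpoint the tree with criu dump -t <pid> --leave-running to compare memory usage before and after.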
@Tianyang-Zhang Would it be possible to confirm whether the memory utilisation remains increased after the checkpoint has been created, or whether it increases only when … What is the kernel version and CPU architecture of your system?
Have you tried changing the following line in the example above?
-void *ptr = mmap(NULL, n_GB, PROT_READ | PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
+void *ptr = mmap(NULL, n_GB, PROT_READ | PROT_WRITE, MAP_SHARED|MAP_ANONYMOUS, -1, 0);
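The suggested change turns private copy-on-write memory into memory that is genuinely shared between the parent and its children. A small sketch that makes the semantic difference visible (the helper name value_seen_by_parent() is hypothetical): with MAP_SHARED, a write made by the child is seen by the parent; with MAP_PRIVATE, the child's write lands in its own copy of the page.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: map one anonymous page with `share_flag`
 * (MAP_SHARED or MAP_PRIVATE), fork, let the child store 42 in it, and
 * return the value the parent reads afterwards (-1 on error).
 */
int value_seen_by_parent(int share_flag)
{
    assert(share_flag == MAP_SHARED || share_flag == MAP_PRIVATE);
    volatile int *p = mmap(NULL, sizeof(int), PROT_READ | PROT_WRITE,
                           share_flag | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    *p = 0;

    pid_t pid = fork();
    if (pid < 0) {
        munmap((void *)p, sizeof(int));
        return -1;
    }
    if (pid == 0) {
        *p = 42;            /* child writes to the inherited mapping */
        _exit(0);
    }
    waitpid(pid, NULL, 0);
    int seen = *p;          /* 42 if shared, still 0 if private (COW copy) */
    munmap((void *)p, sizeof(int));
    return seen;
}
```

This is why the change is a workaround rather than a fix: it is only safe if the application does not rely on each child getting a private copy of the memory.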
@rst0git Thanks for helping! The memory usage remains increased after the checkpoint (use …
I'm using the AWS …
Using …
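One way to check whether the pages are still shared after a checkpoint is to compare the per-VMA counters in /proc/<pid>/smaps: on typical kernels, pages mapped by more than one process are reported under Shared_Dirty, and pages that have been copied are reported under Private_Dirty. A minimal lookup sketch, assuming the usual smaps layout (a "start-end perms ..." header line followed by "Name: value kB" lines); smaps_field_kb() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/*
 * Hypothetical helper: find the VMA in /proc/<pid>/smaps (pid 0 = the
 * calling process) that contains `addr`, copy its permission string
 * (e.g. "rw-p") into `perms`, and return the value in kB of the per-VMA
 * counter `field`, such as "Shared_Dirty" or "Private_Dirty".
 * Returns -1 if the VMA or the field is not found.
 */
long smaps_field_kb(pid_t pid, uintptr_t addr, const char *field, char perms[5])
{
    assert(field != NULL && perms != NULL);
    char path[64], line[4096];
    if (pid == 0)
        snprintf(path, sizeof(path), "/proc/self/smaps");
    else
        snprintf(path, sizeof(path), "/proc/%d/smaps", (int)pid);

    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;

    int in_vma = 0;
    long result = -1;
    while (fgets(line, sizeof(line), f) != NULL) {
        unsigned long start, end;
        char p[5];
        /* VMA header: "start-end perms offset dev inode [path]" */
        if (sscanf(line, "%lx-%lx %4s", &start, &end, p) == 3) {
            if (in_vma)
                break;                  /* left our VMA without the field */
            in_vma = addr >= start && addr < end;
            if (in_vma)
                memcpy(perms, p, sizeof(p));
            continue;
        }
        /* Counter line: "Name:   <value> kB" */
        char name[64];
        long kb;
        if (in_vma && sscanf(line, "%63[^:]: %ld kB", name, &kb) == 2 &&
            strcmp(name, field) == 0) {
            result = kb;
            break;
        }
    }
    fclose(f);
    return result;
}
```

Calling it for the mapping's address in a child that never writes, before and after criu dump --leave-running, should show the kB moving from Shared_Dirty to Private_Dirty if copy-on-write was triggered.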
It looks like this is the cause of why …
You are right. I found the same thing while investigating this issue:
Both vmsplice and process_vm_readv use get_user_pages.
@Tianyang-Zhang I don't see a better solution than using write instead of vmsplice. Could you try out the following patch:
In the ideal case, we would detect write-protected COW pages and dump them only once.
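The patch itself is not included in this copy of the thread. As a rough illustration of the idea only (not the actual CRIU change): write() copies the bytes into the pipe through an ordinary read of the source memory, whereas vmsplice() makes the pipe reference the user pages, which goes through get_user_pages as noted above. A minimal write loop that handles short writes and EINTR; write_all() is a hypothetical name.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: copy `len` bytes at `buf` into `fd` with plain
 * write(), retrying on short writes and EINTR. The data is copied at call
 * time, so the pipe keeps no references to the source pages afterwards.
 * Returns 0 on success, -1 on error.
 */
int write_all(int fd, const void *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;              /* interrupted: retry */
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}
```

The cost is one extra copy per page compared with the zero-copy vmsplice() path.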
It seems like this change was introduced with torvalds/linux@17839856fd58
That is because your kernel version is too old; if you update to a kernel newer than 5.15, it will no longer trigger COW.
I just tried the patch, and COW is still triggered. I thought the trigger would be the vmsplice() call on the target process side. Which is the …
@Tianyang-Zhang Silly me. I patched the wrong vmsplice. Here is the right patch:
This patch was introduced in v5.8 and then rolled back in v5.9 (torvalds/linux@a308c71). I messed up my environment and thought the issue existed in new kernels (6.8+), but it actually doesn't. @Tianyang-Zhang could you verify whether you can reproduce the issue on a non-RHEL kernel?
@avagin I tried on an Ubuntu host with a 5.15 kernel, and the issue is gone. I also tried CentOS 7 with a 3.10 kernel, and there was no issue either. I haven't found an environment to try 5.x to 5.8 kernels yet. The patch somehow causes CRIU to hang when transferring pages. Anyway, here is the dump log from the patch:
Besides the … Would that be an easy fix? Thanks for helping!
I believe the reason for this is that the mappings are private (MAP_PRIVATE), and child processes created by fork() inherit copies of these mappings. Thus, CRIU saves a copy of the private mappings for each child process. In contrast, if the mappings are shared …
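Whether a private page is still shared with another process can, in principle, be read from /proc/<pid>/pagemap. According to the kernel's pagemap documentation, bit 61 is set for file pages and shared-anonymous pages, so it stays clear for private anonymous pages that are shared only because of fork(); bit 56 ("page exclusively mapped", Linux 4.2+) is the one that is clear while a page is mapped by more than one process. A minimal sketch for reading and classifying one entry; the macro names, pagemap_entry() and page_is_cow_shared() are hypothetical, not CRIU's.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Selected bits of a /proc/<pid>/pagemap entry, per the kernel's pagemap docs */
#define PM_PRESENT         (1ULL << 63)  /* page present in RAM */
#define PM_SWAPPED         (1ULL << 62)  /* page swapped out */
#define PM_FILE_OR_SHANON  (1ULL << 61)  /* file page or shared-anon page (3.5+) */
#define PM_MMAP_EXCLUSIVE  (1ULL << 56)  /* mapped by exactly one process (4.2+) */

/*
 * Read the 64-bit pagemap entry for virtual address `addr` from an open
 * pagemap descriptor (e.g. open("/proc/self/pagemap", O_RDONLY)).
 * Returns 0 and stores the entry in *entry, or -1 on error.
 */
int pagemap_entry(int fd, uintptr_t addr, uint64_t *entry)
{
    assert(entry != NULL);
    uintptr_t page_size = (uintptr_t)sysconf(_SC_PAGESIZE);
    off_t offset = (off_t)(addr / page_size) * (off_t)sizeof(uint64_t);
    ssize_t n = pread(fd, entry, sizeof(*entry), offset);
    return n == (ssize_t)sizeof(*entry) ? 0 : -1;
}

/*
 * Present, anonymous-private and not exclusive: the page is mapped by
 * several processes, e.g. after fork() (KSM-merged pages also match).
 * Assumes Linux 4.2+ so that bit 56 is reported at all.
 */
int page_is_cow_shared(uint64_t entry)
{
    return (entry & PM_PRESENT) && !(entry & PM_FILE_OR_SHANON) &&
           !(entry & PM_MMAP_EXCLUSIVE);
}
```

Dumping each shared page only once would additionally require matching the same physical page across processes via the PFN (bits 0-54), which since Linux 4.2 is zeroed for readers without CAP_SYS_ADMIN.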
Is there any plan to support the …
Hi, could you please confirm if there is a plan to support the …
I am trying to figure out how we can do that. Any ideas are welcome.
A friendly reminder that this issue had no activity for 30 days.
Hi @avagin @rst0git, thanks a lot for your previous help! May I ask whether the COW mechanism described on the CRIU wiki (https://criu.org/Copy-on-write_memory) is implemented? Are there any limitations to that method? It looks like the idea has been around for a long time.
Description
I have a process that allocates 3GB of memory and then calls fork(). The child process never touches that memory, so no copy-on-write happens, and the total memory usage is 3GB. When I checkpoint that process tree, the checkpoint size is 6GB, and the system memory usage also increases during checkpointing. If I checkpoint with the --leave-running flag, the total process memory usage doubles to 6GB after the checkpoint finishes.

I read the page https://criu.org/Copy-on-write_memory. It looks like CRIU should have supported such "forked COW memory" since v0.3.
CRIU parses /proc/pid/smaps to get the VMA type. In this case, I think CRIU gets the wrong VMA type. CRIU reads the perm field (rw-p below) to decide whether the VMA is private or shared. However, the "forked memory" is marked as private in smaps, although its pages are actually shared between the parent and the children. You can see this in the smaps output below:

In this case, CRIU treats this VMA as VMA_ANON_PRIVATE, and eventually the parasite thread calls vmsplice() to transfer the pages to the pipe. The vmsplice() syscall then somehow triggers copy-on-write and causes the memory usage to increase. You can verify this vmsplice() issue by creating an app that:

You will see that the process memory usage doubles after the data transfer is done, and the pages that were shared become private to each process.
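The steps of that app were not preserved here. A sketch of the data-transfer part, assuming it mirrors what the parasite does (vmsplice() a region into a pipe, then read it from the other end); vmsplice_roundtrip() is a hypothetical name, and the 16 KiB chunk size is an arbitrary choice that keeps each chunk well below the default 64 KiB pipe capacity so a single process can both fill and drain the pipe.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#define CHUNK (16 * 1024)  /* arbitrary; below the default 64 KiB pipe size */

/*
 * Hypothetical helper: push `len` bytes at `src` into a pipe with
 * vmsplice() and read them back into `dst`, one chunk at a time.
 * Returns the number of bytes transferred, or -1 on error.
 */
ssize_t vmsplice_roundtrip(const void *src, void *dst, size_t len)
{
    assert(src != NULL && dst != NULL);
    int fds[2];
    if (pipe(fds) < 0)
        return -1;

    size_t done = 0;
    while (done < len) {
        size_t chunk = len - done < CHUNK ? len - done : CHUNK;
        struct iovec iov = {
            .iov_base = (void *)((const char *)src + done),
            .iov_len = chunk,
        };
        /* The pipe now references the user pages instead of a copy. */
        ssize_t n = vmsplice(fds[1], &iov, 1, 0);
        if (n <= 0)
            break;
        /* Drain exactly what was spliced; read() copies the data out. */
        size_t got = 0;
        while (got < (size_t)n) {
            ssize_t r = read(fds[0], (char *)dst + done + got, (size_t)n - got);
            if (r <= 0)
                goto out;
            got += (size_t)r;
        }
        done += (size_t)n;
    }
out:
    close(fds[0]);
    close(fds[1]);
    return done == len ? (ssize_t)done : -1;
}
```

To observe the effect described in the report, one would run this in a forked child over a private mapping inherited from the parent and compare the Shared_Dirty and Private_Dirty counters in /proc/<pid>/smaps before and after. According to this thread, the copy happened on a RHEL kernel but not on an Ubuntu 5.15 kernel, so the outcome depends on the kernel.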
Then I tried to find a code path that handles this "forked memory", but I couldn't find one. It looks like all VMA_ANON_SHARED handling eventually needs to find a corresponding entry in /proc/self/map_files/, but the forked memory doesn't map to any file.

I would greatly appreciate any help. I want to know whether this is a limitation of CRIU or a bug. If a process that uses 1GB forks 100 times and none of the children touch the memory, the system usage is 1GB, but it will increase to 100GB after the checkpoint.
Steps to reproduce the issue:
--leave-running
Describe the results you received:
The system memory increases after the entire process tree is seized. Eventually, the system memory usage increased to 40GB, and the checkpoint image size is 40GB.
Describe the results you expected:
System memory should not increase. The process tree should still use only 2GB after the dump, and the checkpoint image size should be 2GB, because all 20 processes share the same physical pages and no COW occurs.
Additional information you deem important (e.g. issue happens only occasionally):
The sys_vmsplice() call in parasite.c::dump_pages().
The proc_parse.c::parse_smaps() function.
Also, I'm wondering whether bit 61 of /proc/pid/pagemap (see https://www.kernel.org/doc/Documentation/vm/pagemap.txt) can be used to determine whether a page is anonymous shared, so that this case can be handled specially. I see CRIU uses that bit to check whether a page is a file page.

CRIU logs and information:
(The dump log is too long, so I will only paste some of the VMA-related parts. Please let me know if anything else is needed.)
CRIU full dump/restore logs:
Output of `criu --version`:
Output of `criu check --all`:
Additional environment details: