-
Notifications
You must be signed in to change notification settings - Fork 596
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ztdm: inhfd: handle child hanging #2251
Draft
osctobe
wants to merge
192
commits into
checkpoint-restore:criu-dev
Choose a base branch
from
osctobe:zdtm-alarm-handling
base: criu-dev
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Draft
ztdm: inhfd: handle child hanging #2251
osctobe
wants to merge
192
commits into
checkpoint-restore:criu-dev
from
osctobe:zdtm-alarm-handling
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Change made through this commit: - Include copy of flog as a seperate tree. - Modify the makefile to add and compile flog code. Signed-off-by: prakritigoyal19 <[email protected]>
CID 302713 (checkpoint-restore#1 of 1): Missing varargs init or cleanup (VARARGS) va_end was not called for argptr. Signed-off-by: Adrian Reber <[email protected]>
Separate commit for easier criu-dev <-> master transfer. Acked-by: Mike Rapoport <[email protected]> Signed-off-by: Adrian Reber <[email protected]>
It is mapped, not maped. Same applies for mmap I guess. Found by codespell, except it wants to change it to mapped, which will make it less specific. Signed-off-by: Kir Kolyshkin <[email protected]>
Brought to you by codespell -w (using codespell v2.1.0). [v2: use "make indent" on the result] Signed-off-by: Kir Kolyshkin <[email protected]>
Fixes: checkpoint-restore#2121 Signed-off-by: Pengda Yang <[email protected]>
The TOS(type of service) field in the ip header allows you specify the priority of the socket data. Signed-off-by: Suraj Shirvankar <[email protected]>
Signed-off-by: Suraj Shirvankar <[email protected]>
The pipe_size type is unsigned int, when the fcntl call fails and return -1, it will cause a negative rollover problem. Signed-off-by: zhoujie <[email protected]>
Newer Intel CPUs (Sapphire Rapids) have a much larger xsave area than before. Looking at older CPUs I see 2440 bytes. # cpuid -1 -l 0xd -s 0 ... bytes required by XSAVE/XRSTOR area = 0x00000988 (2440) On newer CPUs (Sapphire Rapids) it grows to 11008 bytes. # cpuid -1 -l 0xd -s 0 ... bytes required by XSAVE/XRSTOR area = 0x00002b00 (11008) This increase the xsave area from one page to four pages. Without this patch the fpu03 test fails, with this patch it works again. Signed-off-by: Adrian Reber <[email protected]>
Signed-off-by: Adrian Reber <[email protected]>
Signed-off-by: Adrian Reber <[email protected]>
Using the fact that we know criu_pid and criu is a parent of restored process we can create pidfile with pid on caller pidns level. We need to move mount namespace creation to child so that criu-ns can see caller pidns proc. Signed-off-by: Pavel Tikhomirov <[email protected]>
By default, the file name 'amdgpu_plugin.txt' is used also as the name for the corresponding man page (`man amdgpu_plugin`). However, when this man page is installed system-wide it would be more appropriate to have a prefix 'criu-' (e.g., `man criu-amdgpu-plugin`). Signed-off-by: Radostin Stoyanov <[email protected]>
crun wants to set empty_ns and this interface is missing from the library. This adds it to libcriu. Signed-off-by: Adrian Reber <[email protected]>
--criu-binary argument provides a way to supply the CRIU binary location to run_criu(). Related to: checkpoint-restore#1909 Signed-off-by: Dhanuka Warusadura <[email protected]>
These changes remove and update the changes introduced in 7177938 in favor of the Python version in CI. os.waitstatus_to_exitcode() function appeared in Python 3.9 Related to: checkpoint-restore#1909 Signed-off-by: Dhanuka Warusadura <[email protected]>
These changes add test implementations for criu-ns script. Fixes: checkpoint-restore#1909 Signed-off-by: Dhanuka Warusadura <[email protected]>
These changes fix the `ImportError: No module named pathlib` error when executing criu-ns tests located at criu/test/others/criu-ns Signed-off-by: Dhanuka Warusadura <[email protected]>
CentOS 7 CI environment uses Python 2. To execute criu-ns script in CentOS 7 changing the current shebang line to python is required. This reverse the changes made in a15a63f Signed-off-by: Dhanuka Warusadura <[email protected]>
This is a patch proposed by Thomas here: https://lore.kernel.org/all/87ilczc7d9.ffs@tglx/ It removes (created id > desired id) "sanity" check and adds proper checking that ids start at zero and increment by one each time when we create/delete a posix timer. First purpose of it is to fix infinite looping in create_posix_timers on old pre 3.11 kernels. Second purpose is to allow kernel interface of creating posix timers with desired id change from iterating with predictable next id to just setting next id directly. And at the same time removing predictable next id so that criu with this patch would not get to infinite loop in create_posix_timers if this happens. Thanks a lot to Thomas! Signed-off-by: Pavel Tikhomirov <[email protected]>
This hook allows to start image streamer process from an action script. Signed-off-by: Radostin Stoyanov <[email protected]>
…tions This does cgroup namespace creation separately from joining task cgroups. This makes the code more logical, because creating cgroup namespace also involves joining cgroups but these cgroups can be different to task's cgroups as they are cgroup namespace roots (cgns_prefix), and mixing all of them together may lead to misunderstanding. Another positive thing is that we consolidate !item->parent checks in one place in restore_task_with_children. Signed-off-by: Valeriy Vdovin <[email protected]> Signed-off-by: Pavel Tikhomirov <[email protected]>
4.15-based kernels don't allow F_*SEAL for memfds created with MFD_HUGETLB. Since seals are not possible in this case, fake F_GETSEALS result as if it was queried for a non-sealing-enabled memfd. Signed-off-by: Michał Mirosław <[email protected]>
Linux 4.15 doesn't like empty string for cgroup2 mount options. Pass NULL then to satisfy the kernel check. Log the options for easier debugging. Signed-off-by: Michał Mirosław <[email protected]>
The original commit added saving THP_DISABLED flag value, but missed restoring it. There is restoring code, but used only when --lazy_pages mode is enabled. Restore the prctl flag always. While at it, rename the `has_thp_enabled` -> `!thp_disabled` for consistency. Fixes: bbbd597 (2017-06-28 "mem: add dump state of THP_DISABLED prctl") Signed-off-by: Michał Mirosław <[email protected]>
If prctl(SET_THP_DISABLE) is not used due to bad semantics, log it for easier debugging. Signed-off-by: Michał Mirosław <[email protected]>
While at it, don't carry over stale errno to the fail() message. Signed-off-by: Michał Mirosław <[email protected]>
Signed-off-by: Michał Mirosław <[email protected]>
Add a sanity check for THP_DISABLE. This discovered a broken commit in Google's kernel tree. Signed-off-by: Michał Mirosław <[email protected]>
cgroup_ifpriomap test needs net_prio cgroup, which might not be available. Make the .checkskip script check it. Signed-off-by: Michał Mirosław <[email protected]>
Newer versions of pip use an isolated virtual environment when building Python projects. However, when the source code of CRIT is copied into the isolated environment, the symlink for `../lib/py` (pycriu) becomes invalid. As a workaround, we used the `--no-build-isolation` option for `pip install`. However, this functionality has issues in some versions of PIP [1, 2]. To fix this problem, this patch adds separate packages for pycriu and crit, and each package is installed independently. [1] pypa/pip#8221 [2] pypa/pip#8165 (comment) Signed-off-by: Radostin Stoyanov <[email protected]>
Do not use $(USERCFLAGS) for anything other than what the user provide. Signed-off-by: Marcus Folkesson <[email protected]>
osctobe
force-pushed
the
zdtm-alarm-handling
branch
from
November 14, 2023 16:49
3e580c8
to
b883c49
Compare
In the compel/arch/arm/plugins/std/syscalls/syscall.def, the syscall number of bind on ARM64 should be 200 instead of 235 Signed-off-by: Sally Kang <[email protected]>
The rawhide netlink errors are fixed with a newer kernel than the default 6.2 available in Fedora 38. Signed-off-by: Adrian Reber <[email protected]>
The old test was checking if '/' is btrfs but we should check if the current directory is btrfs. Signed-off-by: Adrian Reber <[email protected]>
Signed-off-by: Adrian Reber <[email protected]>
Signed-off-by: Andrei Vagin <[email protected]>
Replace a recursive call with a loop. Reported-by: Andrei Vagin <[email protected]> Signed-off-by: Radostin Stoyanov <[email protected]>
Checkpoint/restore with version 25.0.0-beta.1 fails with the following error: $ docker start --checkpoint=c1 cr Error response from daemon: failed to create task for container: content digest fdb1054b00a8c07f08574ce52198c5501d1f552b6a5fb46105c688c70a9acb45: not found: unknown Release notes: moby/moby#46816 Signed-off-by: Radostin Stoyanov <[email protected]>
WARNINGS variable should be amended, not redefined. We still need, e.g., `-Wno-dangling-pointer` to build criu on loongarch64 with gcc13. Signed-off-by: Ivan A. Melnikov <[email protected]>
If ioctl(TIOCSLCKTRMIOS) fails with EPERM it means that a CRIU process lacks of CAP_SYS_ADMIN capability. But we can use ioctl(TIOCGLCKTRMIOS) to *read* current ->termios_locked value from the kernel and if it's the same as we already have we can skip failing ioctl(TIOCSLCKTRMIOS) safely. Adrian has recently posted [1] a very good patch to allow ioctl(TIOCSLCKTRMIOS) for processes that have CAP_CHECKPOINT_RESTORE (right now it requires CAP_SYS_ADMIN). [1] https://lore.kernel.org/all/[email protected]/ Suggested-by: Andrei Vagin <[email protected]> Signed-off-by: Alexander Mikhalitsyn <[email protected]>
Newer versions of 'tail' rely on inotify and after a restore 'tail' is unhappy with the state of inotify and just stops. This replaces 'tail' with a minimal shell based test (thanks Andrei). Signed-off-by: Adrian Reber <[email protected]>
The image has a too old version of nettle which does not work with gnutls. Just upgrade to the latest to make the error go away. Signed-off-by: Adrian Reber <[email protected]>
Signed-off-by: Adrian Reber <[email protected]>
In commit [1] was introduced a mechanism to auto-generate the files: sys-exec-tbl*.c, syscalls*.S, syscall-codes*.h, and syscall*.h. This commit also updated the gitignore rules to ignore auto-generated files. However, after commit [2], the path for these files has changed and the patterns specified in gitignore are no longer needed. [1] bbc2f13 (x86/build: generate syscalls-{64,32}.built-in.o) [2] 19fadee (compel: plugins,std -- Implement syscalls in std plugin) Reported-by: @felicitia Signed-off-by: Radostin Stoyanov <[email protected]>
Starting with the musl v1.2.4~69, _GNU_SOURCE doesn't set _LARGEFILE64_SOURCE. Fixes checkpoint-restore#2313 Signed-off-by: Andrei Vagin <[email protected]>
Signed-off-by: robert <[email protected]>
osctobe
force-pushed
the
zdtm-alarm-handling
branch
from
January 9, 2024 18:48
b883c49
to
65d9823
Compare
Let's kill the child when the test is hanging. Due to PEP 475 the SIGALRM handler needs to throw an exception to be able to interrupt wait(). To improve debuggig, close the fd in the child after reading it and detect that in the parent to show whether the child hung part way. Signed-off-by: Michał Mirosław <[email protected]>
osctobe
force-pushed
the
zdtm-alarm-handling
branch
from
January 9, 2024 18:50
65d9823
to
974f9cc
Compare
rst0git
added
no-auto-close
Don't auto-close as a stale issue
and removed
stale-pr
labels
Jan 10, 2024
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Let's kill the child when the test is hanging. Due to PEP 475 the SIGALRM handler needs to throw an exception to be able to interrupt wait(). To improve debuggig, close the fd in the child after reading it and detect that in the parent to show whether the child hung part way.