Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Restrict Checks for Open/MMAPed Files - CLI & Checksum #1148

Open
wants to merge 123 commits into
base: criu-dev
Choose a base branch
from

Conversation

AnorexicAtticusFinch
Copy link
Contributor

@AnorexicAtticusFinch AnorexicAtticusFinch commented Jul 22, 2020

This adds CLI options and the checksum file validation method in addition to the previous build-ID and file size checks as part of GSoC 2020.

CRIU uses the build-ID validation method by default whenever possible and if that cannot be done, it will try and use the checksum method with the first 1 MB of the file. If this fails as well, it will use only the file size check.

To use the checksum method first and make the build-ID method the fallback:
--file-validation="chksm" --chksm-parameter=[N]
(To calculate checksum with the first N bytes of the file, at most 10 MB)

--file-validation="chksm-full"
(To calculate checksum with the entire file, at most 10 MB)

--file-validation="chksm-period" --chksm-parameter=[N]
(To calculate checksum with every Nth byte of the file)

To explicitly use only the file size check all the time, the following command line option can be used: --file-validation="filesize"
In other words, --file-validation="buildid" will not make a difference as this is the default method

Using the time command to measure the time taken for 2 tests.
zdtm/transition/shmem:
file size: 3.782s
build-id: 4.153s
chksum (Parameter = 1024): 4.465s
chksum-full: 4.722s
chksum-period (Parameter = 1024): 4.498s

zdtm/static/maps04:
file size: 35.317s
build-id: 35.720s
chksum (Parameter = 1024): 35.919s
chksum-full: 36.679s
chksum-period (Parameter = 1024): 36.476s

Mentors: @avagin @0x7f454c46

This also adds ZDTM tests for build-ID and checksum validation methods.

Link to the previous PR:
#1131

Changes from the previous PR:
rebase on the current criu-dev
cosmetic changes
other minor changes

rst0git and others added 30 commits July 16, 2020 05:32
When saddr.ss_family is AF_INET6 we should cast &saddr to
(struct sockaddr_in6 *).

Signed-off-by: Radostin Stoyanov <[email protected]>
Signed-off-by: Andrei Vagin <[email protected]>
When CRIU calls the ip tool on restore, it passes the fd of remote
socket by replacing the STDIN before execvp. The stdin is used by the
ip tool to receive input. However, the ip tool calls ftell(stdin)
which fails with "Illegal seek" since UNIX sockets do not support file
positioning operations. To resolve this issue, read the received
content from the UNIX socket and store it into temporary file, then
replace STDIN with the fd of this tmp file.

 # python test/zdtm.py run -t zdtm/static/env00 --remote -f ns
 === Run 1/1 ================ zdtm/static/env00

 ========================= Run zdtm/static/env00 in ns ==========================
 Start test
 ./env00 --pidfile=env00.pid --outfile=env00.out --envname=ENV_00_TEST
 Adding image cache
 Adding image proxy
 Run criu dump
 Run criu restore
 =[log]=> dump/zdtm/static/env00/31/1/restore.log
 ------------------------ grep Error ------------------------
 RTNETLINK answers: File exists
 (00.229895)      1: do_open_remote_image RDONLY path=route-9.img snapshot_id=dump/zdtm/static/env00/31/1
 (00.230316)      1: 	Running ip route restore
 Failed to restore: ftell: Illegal seek
 (00.232757)      1: Error (criu/util.c:712): exited, status=255
 (00.232777)      1: Error (criu/net.c:1479): IP tool failed on route restore
 (00.232803)      1: Error (criu/net.c:2153): Can't create net_ns
 (00.255091) Error (criu/cr-restore.c:1177): 105 killed by signal 9: Killed
 (00.255307) Error (criu/mount.c:2960): mnt: Can't remove the directory /tmp/.criu.mntns.dTd7ak: No such file or directory
 (00.255339) Error (criu/cr-restore.c:2119): Restoring FAILED.
 ------------------------ ERROR OVER ------------------------
 ################# Test zdtm/static/env00 FAIL at CRIU restore ##################
 ##################################### FAIL #####################################

Fixes checkpoint-restore#311

Signed-off-by: Radostin Stoyanov <[email protected]>
Signed-off-by: Andrei Vagin <[email protected]>
Instead of erroring, we should loop until we get the desired number of
bytes written, like regular I/O loops.

Signed-off-by: Nicolas Viennot <[email protected]>
This adds the ability to stream images with criu-image-streamer

The workflow is the following:
1) criu-image-streamer is started, and starts listening on a UNIX
   socket.
2) CRIU is started. img_streamer_init() is invoked, which connects to the
   socket. During dump/restore operations, instead of using local disk to
   open an image file, img_streamer_open() is called to provide a UNIX pipe
   that is sent over the UNIX socket.
3) Once the operation is done, img_streamer_finish() is called, and the
   UNIX socket is disconnected.

criu-image-streamer can be found at:
https://github.com/checkpoint-restore/criu-image-streamer

Signed-off-by: Nicolas Viennot <[email protected]>
One can pass --stream to zdtm.py for testing criu with image streaming.
criu-image-streamer should be installed in ../criu-image-streamer
relative to the criu project directory. But any path will do providing
that criu-image-streamer can be found in the PATH env.

Added a few tests to run on travis-ci to make sure streaming works.
We run test that are likely to fail. However, it would be good to once
in a while run all tests with `--stream -a`.

Signed-off-by: Nicolas Viennot <[email protected]>
So, here's the enhanced version of the first try.

Changes are:

1. The wrapper name is criu-ns instead of crns.py
2. The CLI is absolutely the same as for criu, since the script
   re-execl-s criu binary. E.g.
	   scripts/criu-ns dump -t 1234 ...
   just works
3. Caller doesn't need to care about substituting CLI options,
   instead, the scripts analyzes the command line and
   a) replaces -t|--tree argument with virtual pid __if__ the
      target task lives in another pidns
   b) keeps the current cwd (and root) __if__ switches to another
      mntns. A limitation applies here -- cwd path should be the
      same in target ns, no "smart path mapping" is performed. So
      this script is for now only useful for mntns clones (which
      is our main goal at the moment).

Signed-off-by: Pavel Emelyanov <[email protected]>
Looks-good-to: Andrey Vagin <[email protected]>
Acked-by: Mike Rapoport <[email protected]>
Signed-off-by: Pavel Emelyanov <[email protected]>
Signed-off-by: Andrei Vagin <[email protected]>
In Py2 `range` returns a list and `xrange` creates a sequence object
that evaluates lazily. In Py3 `range` is equivalent to `xrange` in Py2.

Signed-off-by: Radostin Stoyanov <[email protected]>
Signed-off-by: Andrei Vagin <[email protected]>
The print_data() function was part of the deprecated (and removed)
'show' action, and it was moved in util.c with the following commit:

	a501b48
	The 'show' action has been deprecated since 1.6, let's finally drop it.

	The print_data() routine is kept for yet another (to be deprecated too)
	feature called 'criu exec'.

The criu exec feature was removed with:

	909590a
	Remove criu exec code

	It's now obsoleted by compel library.
	Maybe-TODO: Add compel tool exec action?

Therefore, now we can drop print_data() as well.

Signed-off-by: Radostin Stoyanov <[email protected]>
class ctypes.c_char_p
    Represents the C char * datatype when it points to a zero-
    terminated string. For a general character pointer that may
    also point to binary data, POINTER(c_char) must be used.
    The constructor accepts an integer address, or a bytes object.

https://docs.python.org/3/library/ctypes.html#ctypes.c_char_p

Signed-off-by: Radostin Stoyanov <[email protected]>
criu-3.12/criu/net.c:2043: overwrite_var: Overwriting "img" in "img =
open_image_at(-1, CR_FD_IP6TABLES, 0UL, pid)" leaks the storage that
"img" points to.

Signed-off-by: Adrian Reber <[email protected]>
…braries

     This patch only adds the support but does not enable it for building.

Signed-off-by: Guoyun Sun <[email protected]>
Oddly, one of the test had a typo which should be fatal.

Signed-off-by: Nicolas Viennot <[email protected]>
We permanently have issues like this:
./test/jenkins/criu-iter.sh: 3: source: not found

It looks like a good idea to use one shell to run our jenkins scripts.

Signed-off-by: Andrei Vagin <[email protected]>
On MIPS CPUs with VIPT caches also has aliasing issues, just like ARMv6.
To overcome this issue, page coloring 0x40000 align for shared mappings was introduced (SHMLBA) in kernel.
    https://github.com/torvalds/linux/blob/master/arch/mips/include/asm/shmparam.h

Related to this, zdtm test suites ipc.c shm.c shm-unaligned.c and shm-mp.c are passed.

Signed-off-by: Guoyun Sun <[email protected]>
k_rtsigset_t is 16Bytes in mips architecture but not 8Bytes.
so blk_sigset_extended be added in TaskCoreEntry and ThreadCoreEntry for dumping
extern 8Bytes data in parasite-syscall.c, restore extern 8Bytes data in cr-restore.c

Signed-off-by: Guoyun Sun <[email protected]>
A similar one is already printed in check_options().

Before this patch:
> $ ./criu/criu -vvvvvv --deprecated --log-file=/dev/stdout xxx
> (00.000000) Turn deprecated stuff ON
> ...
> (00.029680) DEPRECATED ON
> (00.029687) Error (criu/crtools.c:284): unknown command: xxx

Signed-off-by: Kir Kolyshkin <[email protected]>
A few tests were still running on xenial because at some point they were
hanging. This switches now all tests to bionic except one docker test
which still uses xenial to test with overlayfs.

Signed-off-by: Adrian Reber <[email protected]>
This moves the cross compilation tests to github actions, to slightly
reduce the number of Travis tests and run them in parallel on github
actions.

Signed-off-by: Adrian Reber <[email protected]>
The message "Overwriting RPC settings with values from <filename>" is
misleading, giving the impression that file is being read and consumed.
It really puzzled me, since <filename> didn't exist.

What it needs to say is "Would overwrite", i.e. if a file with such name
is present, it would be used.

Also, add actual "Parsing file ..." so it will be clear which files are
being used.

Signed-off-by: Kir Kolyshkin <[email protected]>
While working on runc checkpointing, I incorrectly closed status_fd
prematurely, and received an error from CRIU, but it was
non-descriptive.

Do print the error from open().

Signed-off-by: Kir Kolyshkin <[email protected]>
The orphan pts master option was introduced with commit [1]
to enable checkpoint/restore of containers with a pty pair
used as a console.

[1] checkpoint-restore@6afe523

Signed-off-by: Radostin Stoyanov <[email protected]>
abhishekvijeev and others added 19 commits August 25, 2020 20:45
This commit defines constants and includes necessary headers to c/r
BPF maps

Source files modified:

* magic.h - Defining BPFMAP_FILE_MAGIC and BPFMAP_DATA_MAGIC

* image-desc.h - Defining CR_FD_BPFMAP_FILE and CR_FD_BPFMAP_DATA

* image-desc.c - Create new entries for bpfmap-file and bpfmap-data
in CRIU's file descriptor set

* protobuf-desc.h - Defining PB_BPFMAP_FILE and PB_BPFMAP_DATA

* protobuf-desc.c - Including headers for BPF map protobuf images

Signed-off-by: Abhishek Vijeev <[email protected]>
Source files modified:

* Makefile.config - Checks whether libbpf is installed on the system.
If so, we add -lbpf to LIBS_FEATURES, -DCONFIG_HAS_LIBBPF to
FEATURE_DEFINES and set CONFIG_HAS_LIBBPF. This allows us to check for
the presence of libbpf before compiling or executing BPF c/r code and
ZDTM tests.

* Makefile - Set CONFIG_HAS_LIBBPF to clean all files.

Signed-off-by: Abhishek Vijeev <[email protected]>
This commit enables CRIU to:
(a) identify an anonymous inode as being a BPF map
(b) parse information about BPF maps from procfs

Source files modified:

* files.c - Checks anonymous inodes to see whether they are BPF maps.
If so, sets struct fdtype_ops *ops to a structure that knows how to
dump BPF maps

* proc_parse.c - Function parse_fdinfo_pid_s() now checks whether the
current file being processed is a BPF map. If so, it calls a newly
defined function parse_bpfmap() which knows how to parse information
about BPF maps from procfs

Signed-off-by: Abhishek Vijeev <[email protected]>
This commit enables CRIU to dump meta-data about BPF maps files by
prividing the structures and functions needed by other parts of the
code-base.

Source files added:

* bpfmap.c - defines new structures and functions:

    (a) struct fdtype_ops bpfmap_dump_ops:
            sets up the function handler to dump BPF maps

    (b) is_bpfmap_link():
            checks whether an anonymous inode is a BPF map file

    (c) dump_one_bpfmap():
            parses information for a BPF map file from procfs and
			dumps it

* include/bpfmap.h - structure and function declarations

Source files modified:

* Makefile.crtools - generates build artifacts for bpfmap.c

Signed-off-by: Abhishek Vijeev <[email protected]>
This commit enables CRIU to dump data(key-value) pairs stored in BPF
maps

Source files modified:

* bpfmap.c

    - Function dump_one_bpfmap_data() reads the map's keys and
    values into two buffers using bpf_map_lookup_batch() and then
    writes them out to a protobuf image along with the number of
    key-value pairs read

    - Function dump_one_bpfmap() now dumps the data as well before
    returning

* include/bpfmap.h - Includes headers and declares functions needed to
dump BPF map data

Signed-off-by: Abhishek Vijeev <[email protected]>
This commit enables CRIT to decode the contents of a protobuf image
that stores information related to BPF map

Signed-off-by: Abhishek Vijeev <[email protected]>
This commit enables CRIU to restore a process' BPF map file
descriptors.

Source files modified:

* bpfmap.c - Structure and function definitions needed to:
    (a) collect a BPF map's information from its protobuf image
    (b) create and open a BPF map with the same parameters as when
    it was dumped
    (c) add the newly opened BPF map to the process' file descriptor
    list

* include/bpfmap.h - Structure declarations for restoring BPF maps

* files.c - Collects a BPF map's file entry during the restoration
phase

Signed-off-by: Abhishek Vijeev <[email protected]>
This commit restores the data of BPF maps. A hash table (indexed by
the map's id) is used to store data objects for multiple BPF map
files that a process may have opened. Collisions are resolved with
chaining using a linked list.

Source files modified:

* bpfmap.c - Structure and function definitions needed to:
    (a) collect the protobuf image containing BPF map data
    (b) read the BPF map's data from the image and store it in the
    hash table
    (c) restore the map's data using bpf_map_update_batch()

* include/bpfmap.h
    - Defines the size of the hash table and maks to be used while
	indexing into it
	- Structure and function declarations that are used while restoring
	BPF map data

* cr-restore.c - Collects the protobuf image containing BPF map data
during the restoration phase

Signed-off-by: Abhishek Vijeev <[email protected]>
This commit adds ZDTM tests for c/r of processes with BPF maps as open
files

Source files added:

* zdtm/static/bpf_hash.c - Tests for c/r of the data and meta-data of
BPF map type BPF_MAP_TYPE_HASH

* zdtm/static/bpf_array.c - Tests for c/r of the data and meta-data
of BPF map type BPF_MAP_TYPE_ARRAY

Source files modified:

* zdtm/static/Makefile - Generating build artifacts for BPF tests

Signed-off-by: Abhishek Vijeev <[email protected]>
Source files modified:

* travis/vagrant.sh - Adding libbpf-devel

Signed-off-by: Abhishek Vijeev <[email protected]>
This change fixes the error:
error: 'security_context_t' is deprecated
[-Werror=deprecated-declarations]

Source files modified:

* lsm.c
* net.c

Please refer to:
SELinuxProject/selinux@9eb9c9327

Signed-off-by: Abhishek Vijeev <[email protected]>
Since commit cdd08cd ("uffd: use userns_call() to execute
ioctl(UFFDIO_API)") UFFD_API ioctl() is wrapped with userns_call() and this
allows runing lazy-pages tests on recent kernels in uns.

Restore testing of lazy-pages in uns in travis.

Signed-off-by: Mike Rapoport <[email protected]>
The docker hub container registry is not updated as fast as Fedora's
registry at registry.fedoraproject.org. Fedora's registry gets a new
image whenever there is a new version of rawhide, docker hub's rawhide
image can take a couple of weeks because the process is not automated.

Especially when Fedora branches of a new release we see lot's of errors
in CRIU's Fedora rawhide based Travis runs. Switch to Fedora's registry
to always have the newest rawhide images for our tests.

Signed-off-by: Adrian Reber <[email protected]>
file_validation_chksm_config and file_validation_chksm_parameter fields
added to cr_options structure in criu/include/cr_options.h along
with the constants:
FILE_VALIDATION_CHKSM
FILE_VALIDATION_CHKSM_FULL
FILE_VALIDATION_CHKSM_FIRST
FILE_VALIDATION_CHKSM_PERIOD
FILE_VALIDATION_CHKSM_CONFIG_DEFAULT (Equal to FILE_VALIDATION_CHKSM_FIRST)
FILE_VALIDATION_CHKSM_PARAM_DEFAULT (Equal to 1024)

Usage and description information is yet to be added.

Usage:
--file-validation="checksum" --checksum-parameter=[N]
(To calculate checksum with the first N bytes of the file, at most 10MB)

--file-validation="checksum-full"
(To calculate checksum with the entire file, at most 10MB)

--file-validation="checksum-period" --checksum-parameter=[N]
(To calculate checksum with every Nth byte of the file)

Signed-off-by: Ajay Bharadwaj <[email protected]>
This adds functions to find, store and compare with the stored checksum.
The file is mapped 10MB at a time.

Signed-off-by: Ajay Bharadwaj <[email protected]>
This adds the flag 'crfail-restore' which is used to indicate that the
test is expected to fail during restore.

Signed-off-by: Ajay Bharadwaj <[email protected]>
This adds 4 tests (2 each for 32 bit and 64 bit ELF files):
1. ELF file has only 1 PT_NOTE header
2. ELF file has 2 PT_NOTE headers and the build-ID is present in the 2nd
header
The build-ID is altered after dump and before restore, causing the restore
to fail in all 4 tests.

Signed-off-by: Ajay Bharadwaj <[email protected]>
This adds 6 tests:
1. The last byte of the file is altered after dump and before restore,
causing restore to fail (checksum-full)

2. checksum, checksum-parameter = 10485800 (First 10485800 bytes are
checksummed)

3. checksum-period, checksum-parameter = 10485800 (Every 10485800th byte
is checksummed)

4. checksum, checksum-parameter = 10485700 (First 10485700 bytes are
checksummed)

5. checksum-period, checksum-parameter = 10485700 (Every 10485700th byte
is checksummed)

6. The last byte of a file of size 10485800 bytes is altered after dump
and before restore, causing restore to fail
checksum, checksum-parameter = 10485800 (First 10485800 bytes are
checksummed)

Signed-off-by: Ajay Bharadwaj <[email protected]>
This moves the file validation helper function declarations to
files-validation.h and the function definitions to files-validation.c.
It also updates Makefile.crtools.

Signed-off-by: Ajay Bharadwaj <[email protected]>
@github-actions
Copy link

github-actions bot commented Jan 7, 2021

A friendly reminder that this PR had no activity for 30 days.

@xiongzubiao
Copy link

If I understand correctly, this change is not merged in. Anyone knows why? CRIU's wiki has it though... https://criu.org/index.php?title=Validate_files_on_restore

@Snorch
Copy link
Member

Snorch commented Aug 28, 2023

@xiongzubiao @avagin @checkpoint-restore/maintainers

The PR has unresolved conversation #1148 (comment) about "how fast is the algorithm?", it seem to only work fast for files with BUILDID, checksumming is not so fast obviously.

So it seems we need to update wiki that this feature is not available.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.