Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve support for safe symlink extraction #763

Closed
wants to merge 7 commits into from

Conversation

AndrewFasano
Copy link

@AndrewFasano AndrewFasano commented Feb 12, 2024

This PR aims to fix a few issues around symlink extraction discussed in #761. I don't think this PR is perfect, but I hope it's an improvement over the current state of things.

Now in #768 954c1cd rewrites the logic to sanitize symlinks to be relative and kept within the extraction directory. This is done using the os module instead of Pathlib as Pathlib.resolve would fail if a symlink target was missing (which doesn't prevent us from safely converting it to a relative link). With this change I no longer see false positives around MaliciousSymlinks, instead symlinks are created safely within the extraction directory. If a relative symlink originally tried accessing a directory above its own root (i.e., ./bin/sh -> ../../../../../bin/bash), we update the link so it remains within the extraction directory.

This may have just been an artifact of swapping src/dst? bbe18c6 and 76c29fe change how the .dst field of a symlink is calculated in file_utils and in _safe_tarfile - previously it was made by combining the extraction root with the symlink destination. This would lose critical information about the path of the symlink source. For example a symlink at ./sbin/shell -> ../bin/sh is safely within the extraction directory while a symlink at /shell -> ../bin/sh is trying to go up too high.

Now in #770 fc60755 fixes a bug where tarfile absolute symlinks would be improperly dropped which I observed with a system that had/var/log -> /tmp. 0e6e026 fixes a bug with relative symlinks that I observed on a system that had /var/tmp -> ../../tmp

56617a3 and 2ce66d6 are trying to fix a mix up between symlink source and destination when calling create_symlink. I fixed the CPIO extractor but it looks like things may be backwards in other extractors as well.

To test these changes, I created a CPIO archive with the following script:

#!/bin/bash

# Set up a temporary directory for our archive contents
WORKDIR=$(mktemp -d)
echo "Working directory: $WORKDIR"

# Create necessary directories and dummy file for symlink targets
mkdir -p "$WORKDIR/bin" "$WORKDIR/sbin" "$WORKDIR/usr/bin"
echo "This is BusyBox" > "$WORKDIR/bin/busybox"
echo "Dummy content" > "$WORKDIR/usr/bin/dummy"

# Descriptive names for symlink targets
# 1) Symlink in the same directory
ln -s busybox "$WORKDIR/bin/symlink_same_dir"

# 2) Symlink that points to a valid parent directory + file
ln -s ../bin/busybox "$WORKDIR/sbin/symlink_up_to_busybox"

# 3) Symlink with extra parent directories that would still be valid
ln -s ../../../bin/busybox "$WORKDIR/sbin/symlink_extra_up_to_busybox"

# 4) Absolute symlink to a file
ln -s /bin/busybox "$WORKDIR/bin/symlink_absolute"

# 5) Broken symlink (target does not exist)
ln -s non_existent_file "$WORKDIR/bin/symlink_broken"

# 6) Symlink pointing to another symlink (chained symlinks)
ln -s symlink_same_dir "$WORKDIR/bin/symlink_to_symlink"

# 7) Circular symlink (A -> B, B -> A)
ln -s symlink_circular_b "$WORKDIR/bin/symlink_circular_a"
ln -s symlink_circular_a "$WORKDIR/bin/symlink_circular_b"

# 8) Second symlink to busybox
ln -s busybox "$WORKDIR/bin/symlink_same_dir2"

# Navigate to the work directory
cd "$WORKDIR"

# Create the CPIO archive
find . | cpio -ov --format=newc > test_archive.cpio

# Move the archive to the current directory (assuming it's where the script is run)
mv test_archive.cpio "$OLDPWD"
cd "$OLDPWD"

# Clean up the working directory
rm -rf "$WORKDIR"
echo "Archive created: test_archive.cpio"

If I extract this with the head of unblob (d0f3086) and run
find ../test/test_archive.cpio_extract/ -type f,l -exec ls -al {} \; I get:

-rw-r--r-- 1 root root 0 Feb 12 17:09 ../test/test_archive.cpio_extract/test_archive.cpio
lrwxrwxrwx 1 root root 17 Feb 12 17:09 ../test/test_archive.cpio_extract/bin/symlink_broken -> non_existent_file
-rw-r--r-- 1 root root 16 Feb 12 17:09 ../test/test_archive.cpio_extract/bin/busybox
lrwxrwxrwx 1 root root 7 Feb 12 17:09 ../test/test_archive.cpio_extract/bin/symlink_to_symlink -> busybox
lrwxrwxrwx 1 root root 7 Feb 12 17:09 ../test/test_archive.cpio_extract/bin/symlink_same_dir2 -> busybox
lrwxrwxrwx 1 root root 18 Feb 12 17:09 ../test/test_archive.cpio_extract/bin/symlink_circular_a -> symlink_circular_b
lrwxrwxrwx 1 root root 7 Feb 12 17:09 ../test/test_archive.cpio_extract/bin/symlink_same_dir -> busybox
lrwxrwxrwx 1 root root 7 Feb 12 17:09 ../test/test_archive.cpio_extract/bin/symlink_absolute -> busybox
-rw-r--r-- 1 root root 14 Feb 12 17:09 ../test/test_archive.cpio_extract/usr/bin/dummy
lrwxrwxrwx 1 root root 14 Feb 12 17:09 ../test/test_archive.cpio_extract/sbin/symlink_up_to_busybox -> ../bin/busybox

After applying the changes in this PR I get the following result with two additional (and expected) files extracted: symlink_circular_b and symlink_extra_up_to_busybox. All the files are still contained within the extraction directory.

-rw-r--r-- 1 root root 0 Feb 12 17:10 ../test/test_archive.cpio_extract/test_archive.cpio
lrwxrwxrwx 1 root root 18 Feb 12 17:10 ../test/test_archive.cpio_extract/bin/symlink_circular_b -> symlink_circular_a
lrwxrwxrwx 1 root root 17 Feb 12 17:10 ../test/test_archive.cpio_extract/bin/symlink_broken -> non_existent_file
-rw-r--r-- 1 root root 16 Feb 12 17:10 ../test/test_archive.cpio_extract/bin/busybox
lrwxrwxrwx 1 root root 16 Feb 12 17:10 ../test/test_archive.cpio_extract/bin/symlink_to_symlink -> symlink_same_dir
lrwxrwxrwx 1 root root 7 Feb 12 17:10 ../test/test_archive.cpio_extract/bin/symlink_same_dir2 -> busybox
lrwxrwxrwx 1 root root 18 Feb 12 17:10 ../test/test_archive.cpio_extract/bin/symlink_circular_a -> symlink_circular_b
lrwxrwxrwx 1 root root 7 Feb 12 17:10 ../test/test_archive.cpio_extract/bin/symlink_same_dir -> busybox
lrwxrwxrwx 1 root root 7 Feb 12 17:10 ../test/test_archive.cpio_extract/bin/symlink_absolute -> busybox
-rw-r--r-- 1 root root 14 Feb 12 17:10 ../test/test_archive.cpio_extract/usr/bin/dummy
lrwxrwxrwx 1 root root 14 Feb 12 17:10 ../test/test_archive.cpio_extract/sbin/symlink_extra_up_to_busybox -> ../bin/busybox
lrwxrwxrwx 1 root root 14 Feb 12 17:10 ../test/test_archive.cpio_extract/sbin/symlink_up_to_busybox -> ../bin/busybox

@qkaiser
Copy link
Contributor

qkaiser commented Feb 13, 2024

@AndrewFasano from within the unblob directory, you can do pre-commit install followed by pre-commit run --all. This will perform multiple tasks like type checking, code linting, import sorting, etc.

pre-commit can be installed with pip.

It will modify the code and you can create fixups for existing commits depending on the part it touches.

@qkaiser qkaiser self-assigned this Feb 13, 2024
@qkaiser qkaiser added bug Something isn't working help wanted Extra attention is needed format:archive format:filesystem labels Feb 13, 2024
@AndrewFasano
Copy link
Author

Ruff seems to be quite unhappy with my use of os.path and wants me to use Pathlib instead, but there are some critical differences between how those two modules behave around symlinks. Seems like it might take some time to go through all these linter errors and test what can actually be switched to Pathlib versus what needs to stay as os.path.

@e3krisztian
Copy link
Contributor

e3krisztian commented Feb 13, 2024

I think there is some confusion on how FileSystem.create_symlink() should work.

pytest --no-cov tests/test_file_utils.py fails on symlink - and surprisingly on hardlink - tests as well.

Maybe there was some confusion in some handlers on the argument order - or names, but these tests show the intention, and they should keep working after modifications.

@e3krisztian
Copy link
Contributor

FileSystem.create_symlink(src, dst)

is supposed to be following the unix command line order and naming

cp src dst
mv src dst
ln src dst
ln -s src dst

Except these are named differently in the man page for ln, as src and dst is arguably confusing for links:

SYNOPSIS
       ln [OPTION]... [-T] TARGET LINK_NAME

So src=TARGET and dst=LINK_NAME for create_symlink, we definitely should have used the man ln names for arguments.

@AndrewFasano
Copy link
Author

Thanks for the info @e3krisztian. These names are definitely confusing and now that I cleaned up my changes, I can see that you're right - the order was fine before. Sorry about that! There might still be value in revising the change from bbe18c6 to add the root to the base of the target path before checking if it's safe, but maybe even that's unnecessary? At one point I thought these changes were required for some symlinks to properly be created, but that might've been a mistake since it seems like they're largely a no-op other than switching the arg order.

I think the other changes are still relevant though!

@e3krisztian
Copy link
Contributor

@AndrewFasano this is a valued input and contribution for unblob and these changes/problems are taken seriously!

You might also be correct with some hidden argument swapping somewhere still.

The below symlinks are intentionally marked as problematic in the current code (3) being the case of path traversal), maybe unblob is too strict by throwing these out, and we should sanitize the link targets as you propose.

# 3) Symlink with extra parent directories that would still be valid
ln -s ../../../bin/busybox "$WORKDIR/sbin/symlink_extra_up_to_busybox"

# 7) Circular symlink (A -> B, B -> A)
ln -s symlink_circular_b "$WORKDIR/bin/symlink_circular_a"
ln -s symlink_circular_a "$WORKDIR/bin/symlink_circular_b"

@e3krisztian
Copy link
Contributor

Added a review comment for 954c1cd - sanitize_symlink_target, and extracted that commit into a separate branch, and PR: #768.

@e3krisztian
Copy link
Contributor

bbe18c6 and 76c29fe change how the .dst field of a symlink is calculated in file_utils and in _safe_tarfile - previously it was made by combining the extraction root with the symlink destination. This would lose critical information about the path of the symlink source. For example a symlink at ./sbin/shell -> ../bin/sh is safely within the extraction directory while a symlink at /shell -> ../bin/sh is trying to go up too high.

Cherry-picking these commit over main and running the tests 6 tests fail. These failures are due to x / y / z becoming absolute path, if y is absolute path. Maybe they were needed due to some other changes in this PR?

I have tried to reproduce the problems locally based on the description given with the tests below applied to main branch. These tests pass, so I do not understand what is the problem supposed to be solved by these commits.

diff --git a/tests/test_file_utils.py b/tests/test_file_utils.py
index 9a20da74..1fc94138 100644
--- a/tests/test_file_utils.py
+++ b/tests/test_file_utils.py
@@ -25,7 +25,7 @@ from unblob.file_utils import (
     round_down,
     round_up,
 )
-from unblob.report import PathTraversalProblem
+from unblob.report import LinkExtractionProblem, PathTraversalProblem


 @pytest.mark.parametrize(
@@ -503,6 +503,30 @@ class TestFileSystem:
         assert os.readlink(output_path) == "target file"
         assert sandbox.problems == []

+    def test_create_symlink_target_inside_sandbox(self, sandbox: FileSystem):
+        # ./sbin/shell -> ../bin/sh
+        sandbox.mkdir(Path("bin"))
+        sandbox.write_bytes(Path("bin/sh"), b"posix shell")
+        sandbox.mkdir(Path("sbin"))
+        sandbox.create_symlink(Path("../bin/sh"), Path("sbin/shell"))
+
+        output_path = sandbox.root / "sbin/shell"
+        assert output_path.read_bytes() == b"posix shell"
+        assert output_path.exists()
+        assert os.readlink(output_path) == "../bin/sh"
+        assert sandbox.problems == []
+
+    def test_create_symlink_target_outside_sandbox(self, sandbox: FileSystem):
+        # /shell -> ../bin/sh
+        sandbox.mkdir(Path("bin"))
+        sandbox.write_bytes(Path("bin/sh"), b"posix shell")
+        sandbox.create_symlink(Path("../bin/sh"), Path("/shell"))
+
+        assert any(p for p in sandbox.problems if isinstance(p, LinkExtractionProblem))
+        output_path = sandbox.root / "shell"
+        assert not output_path.exists()
+        assert not output_path.is_symlink()
+
     def test_create_symlink_absolute_paths(self, sandbox: FileSystem):
         sandbox.write_bytes(Path("target file"), b"test content")
         sandbox.create_symlink(Path("/target file"), Path("/symlink"))

@AndrewFasano
Copy link
Author

@e3krisztian I can try updating those commits to work on main and try digging up a filesystem where I was seeing the issue that was supposed to address. At a minimum bbe18c6 would need to be updated to swap src/dst since this PR has them swapped.

@AndrewFasano
Copy link
Author

I think this PR was a bit too big and skewed by my mix up with the src/dst variables. I updated the original comment to point to the new PRs for the relevant components. The changes worth discussing are in #768 and #770.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working format:archive format:filesystem help wanted Extra attention is needed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants