[RFC] Hide /sysroot in a private mount namespace #3358
Hi @ruihe774. Thanks for your PR. I'm waiting for a ostreedev member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
Force-pushed from feebf16 to f53b275.
The test failed because only very new glibc has wrappers for |
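When a libc lacks a wrapper for a system call, a common workaround is to invoke it through syscall(2) directly. The sketch below only illustrates that pattern: the calls this PR actually needs are not named above, so gettid (which had no glibc wrapper before glibc 2.30) is used as a stand-in, and the helper name raw_gettid is hypothetical.

```c
#include <assert.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: call gettid via syscall(2) instead of relying on
 * a libc wrapper. gettid is only a stand-in for whichever call the
 * installed glibc is missing. */
static pid_t
raw_gettid (void)
{
  long tid = syscall (SYS_gettid);
  /* gettid(2) is documented as always successful. */
  assert (tid > 0);
  return (pid_t) tid;
}
```

The same shape applies to other calls, provided the kernel headers define the corresponding SYS_ number; that would need checking per call.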
Force-pushed from f53b275 to 1586d24.
Thanks so much for working on this!
One struggle we have right now is that the prepare-root code is not unit or integration tested well; I wouldn't block this on that, but adding more features there compounds that issue. I may find some time to work on that...
On a different note, in theory this code could be implemented as a service that runs After=ostree-prepare-root.service, right? I wonder if it'd be helpful for us to structure it that way. But eh, just a thought.
I'm also OK just not having this feature work on older glibc for now. (Though it's also of course worth noting that in containers/bootc#919 I used the super nice rustix bindings for this which worked fine on C9S even though glibc doesn't have the headers there...and this touches on the ongoing thought of how we might start to use more Rust in libostree in general...or maybe put some of this in bootc to start...) |
Force-pushed from 8b97cd2 to 160bf13.
I think in first step we can have individual binaries in I'll try to make some POC later if I have spare time. |
Force-pushed from 7e427cc to 175a9e7.
We do have a composefs-rs project going on but honestly...I would like to try to remove the static ostree-prepare-root in favor of UKIs basically. |
I have read some docs about socket activation, fd store, and unix socket. I find it's possible to write a service that uploads the fd obtained by We can keep current part that hides However, the problem is that I'm not familiar with systemd and C socket programming. It is difficult for me to implement the aforementioned. I wonder if you could give me some help. Thank you. |
A totally different approach: Today systemd when using volatile root sets up a special symlink that points to the root block device. On Linux, it's possible to mount a block device multiple times. So instead of having I have only spent 3 minutes and 5 seconds thinking about this and maybe there's something I'm missing but it would be dramatically simpler than anything discussed so far. EDIT: (another 20 seconds of thought) A big bonus here is that there's already the ability to have LSM policies which can apply to accessing block devices, so it'd be easy to restrict the set of things accessing the physical root (which would be a special case of general block device access, which really few things should have in general). |
And to be clear I think we should create that symlink unconditionally (should be a separate PR). Then this PR is just about whether or not to leave |
I don't think this approach can work with Btrfs subvolumes. Sometimes the system root is a Btrfs subvolume rather than the top of a block device. I know openSUSE (i.e. snapper) has such configurations. And what if the sysroot is not backed by a block device at all, for example when it is mounted over the network? I believe there are many corner cases if we assume the sysroot is always the root of a block device.
Yes, a valid point; it is also the setup for Fedora btrfs (at least with Anaconda). The general way to handle this would be to emit a Actually...we already should have a (This intersects a bit with containers/bootc#972 - for the btrfs and other cases we could attempt to re-synthesize the necessary mount information from kargs, but I think the most general fix is to honor however the root was set up in the initramfs, which would be
I don't know that there's a real use case for accessing the physical root in the network case though. Although it depends on what you mean by "network". iSCSI for example is still network, but is also a block device. NFS-as-root...I think is a bad idea. See also containers/bootc#898 which I think is better for those desiring diskless. |
I have compared several approaches to protect
|
I would call this: "Move /sysroot to /run/ostree/.private/sysroot" and list its advantages/disadvantages as basically "Makes it somewhat less likely for processes to find and traverse it, otherwise same as status quo". Re option 3:
Sure but that's really unlikely to happen accidentally, which is about half of what we're trying to improve. A historical issue we've had with the |
@@ -121,11 +123,6 @@ ostree_builtin_admin (int argc, char **argv, OstreeCommandInvocation *invocation
 }
 }

 else if (g_str_equal (argv[in], "--"))
Can you motivate this change?
We need to support command patterns like this:
$ ostree admin nsenter -- sh -c "echo hello"
In this command, if -- is omitted, -c is processed by ostree and the command fails.
So nsenter needs to handle -- itself. If -- is processed here, nsenter cannot get the arguments after --.
Deleting the logic here may break things. We need to add an OSTREE_BUILTIN_FLAG_ to specify whether a command needs to process -- by itself. I'll do that later.
Maybe deleting the logic here may break things. We need to add a OSTREE_BUILTIN_FLAG_ to specify whether a command needs to process -- by itself. I'll do that later.
Agreed
Force-pushed from cd06b32 to 01de3ea.
Force-pushed from 082605e to 594d873.
Force-pushed from 594d873 to 8da00c9.
It turned out to cause a lot of trouble, so I reverted it.
Force-pushed from 8da00c9 to 699a816.
PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
Related: #3211
This PR implements a new option sysroot.invisible in prepare-root.conf to hide /sysroot in a private mount namespace /run/ostree/.private/sysroot-ns to prevent the rest of the system from accessing it, while ostree admin commands can still operate on it inside a mount namespace.

This PR also adds a new admin command "nsenter" that runs a program in the mount namespace where /sysroot is present. This can wrap tools that currently do not support an invisible sysroot. As an example, below is a drop-in override for rpm-ostreed.service:

I know this is a big change, and there may be corner cases that I haven't considered. However, I have tested it on my local machine, and basic functionality (booting, deploying, integration with rpm-ostree) works fine. My code is ready for review.
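The nsenter idea described above can be sketched as follows. This is not the code from this PR: it is a minimal illustration that assumes the namespace path refers to a bind-mounted mount namespace file usable with setns(2), and the function name enter_mount_ns_and_exec is hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <unistd.h>

/* Hypothetical helper: join the mount namespace referenced by ns_path,
 * then replace the current process with argv. Returns -1 with errno set
 * on failure and does not return on success. */
static int
enter_mount_ns_and_exec (const char *ns_path, char *const argv[])
{
  assert (argv != NULL && argv[0] != NULL);

  int fd = open (ns_path, O_RDONLY | O_CLOEXEC);
  if (fd < 0)
    return -1;

  /* CLONE_NEWNS restricts setns to mount namespaces. */
  if (setns (fd, CLONE_NEWNS) < 0)
    {
      int saved_errno = errno;
      close (fd);
      errno = saved_errno;
      return -1;
    }
  close (fd);

  execvp (argv[0], argv);
  return -1; /* Only reached if execvp failed. */
}
```

Joining a mount namespace needs elevated privileges (CAP_SYS_ADMIN and CAP_SYS_CHROOT), so a wrapper like this would only work for root-owned services. A caller would pass /run/ostree/.private/sysroot-ns as ns_path under the assumption stated above.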