-
Notifications
You must be signed in to change notification settings - Fork 61
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merge recent dev to main #2063
Merged
Merged
Merge recent dev to main #2063
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Previously we used a mix of perror(3) + exit(3) and err(3); standardize on the latter instead. This does remove one free() in an error path, because we're decidedly leaking a lot more than just the loader name there (loader handle, vcpu, vmctx...) anyways. Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D43331
Only try sending more data on pure ACKs when there is more data available in the send buffer. In the case of a retransmitted SYN not being sent due to an internal error, the snd_una/snd_nxt accounting could be off, leading to a panic. Pulling snd_nxt up to snd_una prevents this from happening. Reported by: [email protected] Reviewed by: cc, tuexen, #transport MFC after: 1 week Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D43343
PRR state was not properly reset on subsequent ECN CE events. Clean up after local transmission failures too. Reviewed by: tuexen, cc, #transport MFC after: 3 days Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D43170
Reviewed by: markj, melifaro Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D43368
PR: 253069 MFC after: 1 week Sponsored by: Chelsio Communications
Reviewed by: brooks Sponsored by: DARPA Differential Revision: https://reviews.freebsd.org/D43371
This matches what Linux does. Reviewed by: melifaro, tuexen Differential Revision: https://reviews.freebsd.org/D43366
This is a quick plug to fix panic with Netlink which has protocol specific buffers. Note that PF_UNIX/SOCK_DGRAM, which also has its own buffers, avoids the panic due to being SOCK_DGRAM. A correct but more complicated fix that needs to be done is to merge pr_shutdown, pr_flush and dom_dispose into one protocol method that may call sorflush for generic sockets or do their own stuff for protocol which has own buffers. Reviewed by: tuexen Differential Revision: https://reviews.freebsd.org/D43367 Reported-by: [email protected]
In the case of hostbase_fd, this is infact a bug fix; we have a seek callback that the host: filesystem may use in loader, and we really don't have a good excuse to break it. bootfd-derived fds will only be used with fdlopen(3) and rtld doesn't seem to need pread / lseek at all for it today, but there's no reason to break if it finds a good reason to later. Suggested by: markj
CC/CXX/CPP/LD may all have arguments supplied in various circumstances, which break the logic here. We only need to determine which of these tools we're expecting to invoke from PATH, which just requires examination of the first word. Limit our scope to exactly that. Patch suggested by: jrtc27 Reviewed by: imp, jrtc27 Differential Revision: https://reviews.freebsd.org/D43372
The change of argument for sizeof() (from a type to an object) is to be consistent with the change done for the malloc() code just above in the preceding commit touching this file. Consider bit flags as integers and test whether they are set with an explicit comparison with 0. Use an explicit flag value (PTHREAD_SCOPE_SYSTEM) in place of a variable that has this value at point of substitution. All other changes are straightforward. Suggested by: kib Reviewed by: kib Approved by: emaste (mentor) MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D43327
Using calloc() instead of malloc() is useless here since the allocated memory is to be wholly crushed by the memcpy() call that follows. Suggested by: kib Reviewed by: emaste, kib Approved by: emaste (mentor) MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D43328
Issue data claim command after every chunk of data programmed, so we can reuse the SVC buffer for the next chunk. Tested on Terasic DE10 Pro. Sponsored by: UKRI
The standard is somewhat unclear, but on the balance, I believe that the phrase “the rest of the input line” should be interpreted to mean the rest of the input line including the terminating newline if and only if there is one. This means the current implementation is incorrect on two points: - First, it suppresses the previous line's newline in the '1' case. - Second, it unconditionally emits a newline at the end of the output for non-empty input, even if the input did not end with a newline. Resolve this by rewriting the main loop. Instead of special-casing the first line and then assuming that every line ends with a newline, we remember how each line ends and emit that either at the beginning of the next line or at the end of the file except in the one case ('+') where the standard explicitly says not to. While here, try to reduce diff to upstream a little and update their RCS tag to reflect the fact that while we've diverged significantly from them, we've incorporated all their changes. Remove the useless second RCS tag. We also update the tests to account for the change in interpretation of the '1' case and add a test case for unterminated input. MFC after: 1 week Sponsored by: Klara, Inc. Reviewed by: kevans Differential Revision: https://reviews.freebsd.org/D43326
This allows writing setup scripts that contain lines starting with "#!", e.g., a shebang when creating a shell script using cat: #!/bin/sh echo "Populate rc.local" cat >/etc/rc.local<<EOF #!/bin/sh echo booted | logger -s -t 'example' EOF Prevent accidentally running a setup script left behind by a previous invocation of bsdinstall. Reviewed by: imp, jrtc27 Differential Revision: https://reviews.freebsd.org/D43350
Existing powerpc kernels include additional sections beyond .dynamic in the PT_DYNAMIC segment. Relax the requirement for an exact size match of the section and segment for PowerPC files as a workaround. Reported by: jrtc27 Sponsored by: DARPA Differential Revision: https://reviews.freebsd.org/D43123
These loops already handled a NULL return from mbufq_dequeue when the queue was empty, so remove a redundant check of mbufq_len before dequeueing. Reviewed by: bz Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D43336
Complement to the existing mbufq_full Reviewed by: bz Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D43337
Reviewed by: bz, emaste Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D43338
…ffer memdesc_alloc_ext_mbufs constructs a chain of external (M_EXT or M_EXTPG) mbufs backed by a data buffer described by a memory descriptor. Since memory descriptors are not an actual buffer just a description of a buffer, the caller is required to supply a couple of helper routines to manage allocation of the raw mbufs and associating them with a reference to the underlying buffer. Reviewed by: markj Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D42933
In particular, don't reserve resources added by drivers via other means (e.g. acpi_bus_alloc_gas which calls bus_alloc_resource right after adding the resource). The intention of reserved resources is to ensure that a resource range that a bus driver knows is assigned to a device is reserved by the system even if no driver is attached to the device. This prevents other "wildcard" resource requests from conflicting with these resources. For ACPI, the only resources the bus driver knows about for unattached devices are the resources returned from _CRS. All of these resources are already reserved now via acpi_reserve_resources called from acpi_probe_children. As such, remove the logic from acpi_set_resource to try to reserve resources when they are set. This permits RF_SHAREABLE to work with acpi_bus_alloc_gas without requiring hacks like the current one for CPU device resources in acpi_set_resource. Reported by: gallatin (RF_SHAREABLE not working) Diagnosed by: jrtc27
…sind=1) This applies the fix to powerpc's pmap as was done in commit aa3bcaa and d0941ed for amd64 and riscv pmaps, respectively. Reported by: Jenkins Reviewed by: bojan.novkovic_fer.hr, markj Fixes: e407849 Differential Revision: https://reviews.freebsd.org/D43339
Refactoring of argument list to nl_send_one() led to derefercing wrong union member. Rename nl_send_one() to a more generic name, isolate anew nl_send_one() as the callback only for the normal writer and provide correct argument to nl_send() from nl_send_group(). Fixes: ff5ad90
Work around vendors who use the same address for multiple ReadAckRegisters in their ACPI HEST table. This allows apei to attach cleanly on Ampere Altra servers. Note the issue is not specific to Ampere, I've run into it with at least one other vendor (whose server is not yet released). Sponsored by: Netflix Reviewed by: jhb
The logic analyzer in the T6 CIM block has a different capture size than previous chips. MFC after: 1 week Sponsored by: Chelsio Communications
The global hw.em.rx_process_limit knob has been replaced by the device- specific dev.IF.N.iflib.rx_budget along with the conversion to iflib(4). While at it, remove the - besides initialization of tx_process_limit - unused {r,t}x_process_limit members.
If in the future it becomes useful to plumb flags through all of these layers, we can add a field to struct vm_cheri_revoke_cookie. No functional change intended.
- Sync the syscall names and parameters with reality. - Use style(9) for code examples. - Change the wording in a couple of places where I found the text a bit unclear. - Remove the $FreeBSD$ tag.
Most of these flags had no effect, and they weren't documented and weren't used by any code in the base system. Remove them so that it's a bit easier to decide how async caprevoke scanning will fit into the revocation state machine. Avoid shuffling the values to avoid binary compatibility problems.
To handle CHERI_REVOKE_IGNORE_START, we use the kernel's notion of the current epoch rather than userspace's. Be sure to acquire the VM map lock first though, to avoid races which make the code harder to reason about.
I suspect that this was done to handle the possibility of td != curthread, since otherwise the vmspace stays live by virtue of being used by the currently executing thread. However, in practice, td == curthread, and some parts of the revocation scan assume that it is operating on the current vmspace, so acquiring a reference is not sufficient. Thus, simply stop acquiring a reference. This gives us less work to do in cleanup paths.
No functional change intended.
As archaic as it may seem, I'm quite used to being able to find a function definition with a search for "^free".
No functional change intended.
quarantine_flush() both invokes cheri_revoke() and pushes formerly quarantined memory into the backing allocator. With an asynchronous revocation scan, we need to defer the latter operation, so split it out into a separate function. Use this opportunity to deindent the inner loop and reduce ifdefery by inverting a test. No functional change intended.
The new probes give us an easy way to compare time spent in cheri_revoke() with time spent flushing quarantined memory.
mips is gone. No functional change intended.
No functional change intended.
We should never be operating on a system map, so assert that every time kern_cheri_revoke() is called.
No functional change intended.
No functional change intended.
While the intention behind it is good, it is not currently needed and complicates control flow in the implementation of asynchronous revocation passes in kern_cheri_revoke(), which is already too complicated. Let's remove it for now. No functional change intended.
This helps keep caprevoke state machine logic internal to kern_cheri_revoke.c. No functional change intended.
Previously, cheri_revoke() was always synchronous. Because it scans the entire mapped address space of the process, it can take a very long time to complete. This work is foisted onto an unsuspecting thread which happens to call into the allocator at the wrong time, so can result in undesirable stalls. Thus it becomes desirable to support asynchronous revocation. Asynchronous revocation can be implemented in userspace, but this has some complications: - Creating threads from libc is somewhat tricky. - We may wish to support multiple processes sharing an address space, in which case it's easier to keep track of async revocation state in the kernel. - Before the vmspace scan, we have to stop the process anyway, and it doesn't make a lot of sense to offload that to a separate thread in a single-threaded process. Asynchonous revocation in the kernel has the downside of not handling the work of flushing previously quarantined arenas and updating the bitmap. This work can be substantial. However, we can easily rate-limit it, and it's still much less expensive than scanning in general. One other downside of this approach is increased lock contention, since the caprevoke scan holds the VM map lock for long periods. Though, this is otherwise already a problem in multi-threaded processes. This implementation adds a new cheri_revoke(2) flag, CHERI_REVOKE_ASYNC, which is mutually exclusive with CHERI_REVOKE_LAST_PASS. When set, and no epoch is open, it behaves like cheri_revoke() with no flags set up until it comes time to scan the vmspace. Then, a request is submitted to a kernel worker process, which calls vm_cheri_revoke_pass() on the submitter's behalf. A subsequent call to cheri_revoke(2) with CHERI_REVOKE_ASYNC set will check the status of the scan and update the epoch accordingly if the scan was successful. If it was not successful, a new request is submitted in the hopes that the failure was transient. Some new fields are added to the VM map: - vm_cheri_async_revoke_st maintains the current asynchronous revocation state. It borrows the states from cheri_revoke_state_t; comments in revoke.h explain the state machine. - vm_cheri_async_revoke_status stores the return value of vm_cheri_revoke_pass(). Its value is valid only when the async state is CHERI_REVOKE_ST_CLOSING. - vm_cheri_async_revoke_shadow stores a capability for the userspace shadow bitmap, since the worker process cannot easily derive this capability. All are synchronized by the VM map lock.
Add a new environment variable, _RUNTIME_REVOCATION_ASYNC_REVOKE, which, when set, enables the use of asynchronous revocation scans in the kernel. By default it is not set. When asynchronous revocation is configured, MRS maintains at least two application quarantine arenas, plus a list of available arenas (i.e., those whose contents have been revoked and drained) and a list of arenas currently undergoing revocation. When the active application arena triggers revocation, it is swapped with a free arena and cheri_revoke(CHERI_REVOKE_ASYNC) is called to begin revocation. Subsequent calls poll the current revocation state and conditionally drain arenas as they become eligible. malloc_revoke() is implemented synchronously - it still needs to know whether MRS is running in async mode or not in order to figure out which arenas to revoke. This scheme is simple but has some caveats which need thought: - There is currently no backpressure, i.e., an application can in principle free memory faster than the revoker can scan, and thus it's possible for the application to accumulate an unbounded amount of quarantined memory. We should consider the conditions under which we might want to fall back to a blocking cheri_revoke() system call. - Application threads are still responsible for flushing previously quarantined memory. This can take a long time. We could perhaps divide this into smaller amounts of work. - snmalloc integration isn't implemented yet, as I don't know how it works. In particular, I'm not sure when to call snmalloc_flush_message_queue(). - Once all of the applications arenas "fill up", threads will acquire the app quarantine mutex upon each call to malloc() etc., which of course would scale poorly. We can make this condition harder to reach by providing more arenas, effectively increasing the amount of memory that we're allowed to keep in quarantine. Since mrs_free() already has to acquire the mutex upon each call, this problem is not the highest priority.
Following the example of other VM subsystems, put some counters under vm.stats so that it's easy to tell at a glance whether, for example, vm_fault() has returned errors. We'll likely add more counters over time. This is pretty rudimentary but doesn't add much overhead and can be helpful when digging into sources of anomalous behaviour.
The concern which motivated this check is the possibility of a map entry upgrading its protections to include READ_CAP, and thus getting missed by the revoker. However, the revocation scan only skips entries with READ_CAP in the max protections. In practice, vm_map_protect() is still blocked during a scan because it waits for the map to be unbusied.
Akin to other revocation policy sysctls, this one sets a flag which is used by MRS to decide whether to use asynchronous revocation. In non-setuid/gid processes, the _RUNTIME_REVOCATION_SYNC_REVOKE and _RUNTIME_REVOCATION_ASYNC_REVOKE environment variables can be used to override the global policy.
It is unnecessary, and allows us to avoid having to revoke capabilities in the would-be-mapped pages.
This catches a bug that arose in an early implementation of the asynchronous revocation mode.
brooksdavis
approved these changes
Mar 20, 2024
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like a reasonable point to merge
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This stops just before the ABI change for many arguments but is post all of the FreeBSD merges. It includes all of the async revocation changes, though not the commit to enable it by default. More recent c18n changes in dev are not yet in this merge.