Rv32 submit v3 #2

Open
wants to merge 155 commits into
base: master

@AndrewD AndrewD commented Oct 8, 2021

rebased onto musl master. Identical to v1.2.2 PR.

richfelker and others added 30 commits September 3, 2020 17:30
prior to commit 685e40b, x86_64 was
correctly passing O_LARGEFILE to SYS_open; it was removed (defined to
0 in the public header, and changed to use the public definition) as
part of that change, probably out of a mistaken belief that it's not
needed.

however, on a mixed system with 32-bit and 64-bit binaries, it's
important that all files be opened with O_LARGEFILE, even if the
opening process is 64-bit, in case a descriptor is passed to a 32-bit
process. otherwise, attempts to access past 2GB in the 32-bit process
could produce EOVERFLOW.

most 64-bit archs added later got this right already, except for
mips64. x32 was also affected. these are now fixed.
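
A minimal sketch of the idea, not the actual musl change: the flags handed to the open syscall always carry the kernel's O_LARGEFILE bit, even when the public header defines O_LARGEFILE as 0. The name `open_syscall_flags` and the macro `KERNEL_O_LARGEFILE` are hypothetical, and the value shown is the one from the kernel's generic fcntl.h (used by x86_64); other archs use different values.

```c
#include <assert.h>
#include <fcntl.h>

/* Assumption: value from the Linux generic fcntl.h, as used on x86_64.
 * Archs such as arm or mips define a different bit. */
#define KERNEL_O_LARGEFILE 0100000

/* Hypothetical helper: compute the flags that should reach SYS_open. */
static int open_syscall_flags(int user_flags)
{
    int kflags = user_flags | KERNEL_O_LARGEFILE;
    /* Adding the bit must not disturb the access mode. */
    assert((kflags & O_ACCMODE) == (user_flags & O_ACCMODE));
    return kflags;
}
```

With this shape, a descriptor opened by a 64-bit process and later passed to a 32-bit process already has the large-file property set.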
the fcntl file locking command macro values in the existing generic
bits/fcntl.h were the "64" variants, requiring 64-bit archs that use
the "plain" variants to have their own bits/fcntl.h, even if they
otherwise use the common definitions for everything.

since commit 7cc79d1 exposed
__LONG_MAX to all bits headers, we can now make the generic one common
between 32- and 64-bit archs.
these were only using a custom version because they needed the
"non-64" variants of the file locking command macros.
see

  linux commit 480274787d7e3458bc5a7cfbbbe07033984ad711
  tcp: add TCP_INFO status for failed client TFO
also added clone3 on sh and m68k. on sh it's still missing (not
yet wired up), but the number is reserved, so it's safe to add.

see

  linux commit fddb5d430ad9fa91b49b1d34d0202ffe2fa0e179
  open: introduce openat2(2) syscall

  linux commit 9a2cef09c801de54feecd912303ace5c27237f12
  arch: wire up pidfd_getfd syscall

  linux commit 8649c322f75c96e7ced2fec201e123b2b073bf09
  pid: Implement pidfd_getfd syscall

  linux commit e8bb2a2a1d51511e6b3f7e08125d52ec73c11139
  m68k: Wire up clone3() syscall
add IPPROTO_ETHERNET and IPPROTO_MPTCP, see

  linux commit 2677625387056136e256c743e3285b4fe3da87bb
  seg6: fix SRv6 L2 tunnels to use IANA-assigned protocol number

  linux commit faf391c3826cd29feae02078ca2022d2f912f7cc
  tcp: Define IPPROTO_MPTCP
TCP_NLA_TIMEOUT_REHASH queries timeout-triggered rehash attempts,
tcpm_ifindex limits the scope of TCP_MD5SIG* sockopt to a device.

see

  linux commit 32efcc06d2a15fa87585614d12d6c2308cc2d3f3
  tcp: export count for rehash attempts

  linux commit 6b102db50cdde3ba2f78631ed21222edf3a5fb51
  net: Add device index to tcp_md5sig
The use of TCP_ in udp.h is not known, fortunately udp.h is not
specified by posix so there are no strict namespace rules, added in

  linux commit e27cca96cd68fa2c6814c90f9a1cfd36bb68c593
  xfrm: add espintcp (RFC 8229)
needed for storage drivers with userspace component that may
run in the IO path, see

  linux commit 8d19f1c8e1937baf74e1962aae9f90fa3aeab463
  prctl: PR_{G,S}ET_IO_FLUSHER to support controlling memory reclaim
added in

  linux commit 75551dbf112c992bc6c99a972990b3f272247e23
  random: add GRND_INSECURE to return best-effort non-cryptographic bytes
reuses a bit from CSIGNAL so it can only be used with unshare
and clone3, added in

  linux commit 769071ac9f20b6a447410c7eaa55d1a5233ef40c
  ns: Introduce Time Namespace
these were missed before, added in

  linux commit 1201937491822b61641c1878ebcd16a93aed4540
  arm64: Expose ARMv8.5 CondM capability to userspace

  linux commit ca9503fc9e9812aa6258e55d44edb03eb30fc46f
  arm64: Expose FRINT capabilities to userspace
added in

  linux commit 1a50ec0b3b2e9a83f1b1245ea37a853aac2f741c
  arm64: Implement archrandom.h for ARMv8.5-RNG

  linux commit d4209d8b717311d114b5d47ba7f8249fd44e97c2
  arm64: cpufeature: Export matrix and other features to userspace
see

  linux commit 9e2ba2c34f1922ca1e0c7d31b30ace5842c2e7d1
  fanotify: send FAN_DIR_MODIFY event flavor with dir inode and name

  linux commit 44d705b0370b1d581f46ff23e5d33e8b5ff8ec58
  fanotify: report name info for FAN_DIR_MODIFY event
it remaps anon mappings without unmapping the original. chromeos plans
to use it with userfaultfd, see:

  linux commit e346b3813067d4b17383f975f197a9aa28a3b077
  mm/mremap: add MREMAP_DONTUNMAP to mremap()
add TCP_NLA_BYTES_NOTSENT and new tcp_zerocopy_receive fields, see

  linux commit c8856c051454909e5059df4e81c77b9c366c5515
  tcp-zerocopy: Return inq along with tcp receive zerocopy.

  linux commit 33946518d493cdf10aedb4a483f1aa41948a3dab
  tcp-zerocopy: Return sk_err (if set) along with tcp receive zerocopy.

  linux commit e08ab0b377a1489760533424437c5f4be7f484a4
  tcp: add bytes not sent to SCM_TIMESTAMPING_OPT_STATS
the linux faccessat syscall lacks a flag argument that is necessary
to implement the posix api, see

  linux commit c8ffd8bcdd28296a198f237cc595148a8d4adfbe
  vfs: add faccessat2 syscall
On x86 and aarch64 GNU properties may be used to mark ELF objects.
Ethernet protocol number for media redundancy protocol, see

  linux commit 4714d13791f831d253852c8b5d657270becb8b2a
  bridge: uapi: mrp: Add mrp attributes.
commit 0a05eac implemented AT_EACCESS
for faccessat with a horrible hack, creating a child process to
switch uid/gid and perform the access probe without making potentially
irreversible changes to the caller's credentials. this was due to the
syscall lacking a flags argument.

linux 5.8 introduced a new syscall, SYS_faccessat2, fixing this
deficiency. use it if any flags are passed, and fall back to the old
strategy on ENOSYS. continue using the old syscall when there are no
flags.
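
The dispatch described above can be sketched as follows. This is an illustration of the control flow only: the function pointers are hypothetical stand-ins for the old syscall, SYS_faccessat2, and the child-process strategy, and the stubs return arbitrary tags (10, 20, 30) so the chosen path is visible.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins for the real operations; a real implementation
 * would return 0 or a negated errno value. */
typedef int (*probe_fn)(int fd, const char *path, int amode);
typedef int (*probe2_fn)(int fd, const char *path, int amode, int flags);

static int access_probe(int fd, const char *path, int amode, int flags,
                        probe_fn old_syscall, probe2_fn new_syscall,
                        probe2_fn child_fallback)
{
    int r;
    assert(old_syscall && new_syscall && child_fallback);
    if (!flags)                        /* no flags: keep using the old syscall */
        return old_syscall(fd, path, amode);
    r = new_syscall(fd, path, amode, flags);
    if (r != -ENOSYS)                  /* kernel has faccessat2: done */
        return r;
    return child_fallback(fd, path, amode, flags); /* pre-5.8 kernel */
}

/* Stubs for trying the sketch; the return values are just path markers. */
static int stub_old(int fd, const char *p, int m)
{ (void)fd; (void)p; (void)m; return 10; }
static int stub_new_ok(int fd, const char *p, int m, int f)
{ (void)fd; (void)p; (void)m; (void)f; return 20; }
static int stub_new_enosys(int fd, const char *p, int m, int f)
{ (void)fd; (void)p; (void)m; (void)f; return -ENOSYS; }
static int stub_child(int fd, const char *p, int m, int f)
{ (void)fd; (void)p; (void)m; (void)f; return 30; }
```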
taking the deprecated/dropped vfork spec strictly, doing pretty much
anything but execve in the child is wrong and undefined. however,
these are commonly needed operations to set up the child state before
exec, and historical implementations tolerated them.

for single-threaded parents, these operations already worked as
expected in the vforked child. however, due to the need for __synccall
to synchronize id/resource limit changes among all threads, calling
these functions in the vforked child of a multithreaded parent caused
a misdirected broadcast signaling of all threads in the parent. these
signals could kill the parent entirely if the synccall signal handler
had never been installed in the parent, or could be ignored if it had,
or could signal/kill one or more utterly wrong processes if the parent
already terminated (due to vfork semantics, only possible via fatal
signal) and the parent tids were recycled. in any case, the expected
number of semaphore posts would never happen, so the child would
permanently hang (with all signals blocked) waiting for them.

to mitigate this, and also make the normal usage case work as
intended, treat the condition where the caller's actual tid does not
match the tid in its thread structure as single-threaded, and bypass
the entire synccall broadcast operation.
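
A rough sketch of the detection condition, assuming Linux and a simplified thread record. `struct thread_rec` and `caller_is_vforked_child` are hypothetical names for illustration; musl's real thread structure and synccall code are not shown in this message.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical, simplified thread record holding the cached tid. */
struct thread_rec {
    int tid;
};

/* In a vforked child the thread structure is shared with the parent,
 * so the cached tid still names the parent thread while the kernel
 * reports a different tid for the caller. */
static int caller_is_vforked_child(const struct thread_rec *self)
{
    assert(self);
    return self->tid != (int)syscall(SYS_gettid);
}
```

When the check returns nonzero, the caller would be treated as single-threaded and the broadcast skipped.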
previously, if a file descriptor had aio operations pending in the
parent before fork, attempting to close it in the child would attempt
to cancel a thread belonging to the parent. this could deadlock, fail,
or crash the whole process if the cancellation signal handler was not
yet installed in the parent. in addition, further use of aio from the
child could malfunction or deadlock.

POSIX specifies that async io operations are not inherited by the
child on fork, so clear the entire aio fd map in the child, and take
the aio map lock (with signals blocked) across the fork so that the
lock is kept in a consistent state.
the dummy definition of __abort_lock in sigaction.c was performing
exactly the same role that putting the lock in its own source file
could and should have been used to achieve.

while we're moving it, give it a proper declaration.
if the multithreaded parent forked while another thread was calling
sigaction for SIGABRT or calling abort, the child could inherit a lock
state in which future calls to abort will deadlock, or in which the
disposition for SIGABRT has already been reset to SIG_DFL. this is
nonconforming since abort is AS-safe and permitted to be called
concurrently with fork or in the MT-forked child.
this makes the code slightly smaller and eliminates these functions
from relevance to possible future changes to multithreaded fork.

the barrier of a_store isn't technically needed here, but a_store is
used anyway for internal consistency of the memory model.
queue_ctors should not be called with the init_fini_lock held, since
it may longjmp out on allocation failure. this introduces a minor
TOCTOU race with p->constructed, but one already exists further down
anyway, and by design it's okay to run through the queue more than
once anyway. the only reason we bother to check p->constructed at all
is to avoid spurious failure of dlopen when the library is already
fully loaded and constructed.
commit 188759b documented the intent
to allow recursive dlopen based on tracking ctor_visitor, but used a
kernel tid rather than the pthread_t to identify the caller. as a
result, it would not behave as intended under fork by a ctor, where
the child tid would not match.
this is in preparation for implementing _Fork from POSIX-future,
factored as a separate commit to improve readability of history.
the _Fork interface is defined for future issue of POSIX as the
outcome of Austin Group issue 62, which drops the AS-safety
requirement for fork, and provides an AS-safe replacement that does
not run the registered atfork handlers.
Érico Rolim and others added 30 commits April 20, 2021 15:34
based on the pthread_setname_np implementation
the function already returns (void *)
on riscv64 this syscall is called __NR_newfstatat
this helps the name match kernel UAPI for external
programs
previously, the contents of the TZ variable were considered a
candidate for a file/path name only if they began with a colon or
contained a slash before any comma. the latter was very sloppy logic
to avoid treating any valid POSIX TZ string as a file name, but it
also triggered on values that are not valid POSIX TZ strings,
including 3-letter timezone names without any offset.

instead, only treat the TZ variable as POSIX form if it begins with a
nonzero standard time name followed by +, -, or a digit.

also, special case GMT and UTC to always be treated as POSIX form
(with implicit zero offset) so that a stray file by the same name
cannot break software that depends on setting TZ=GMT or TZ=UTC.
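
The classification rule above might look roughly like this. It is a sketch under assumptions, not the code from this commit: `tz_is_posix_form` is a hypothetical name, and the handling of quoted `<...>` names and of a leading colon is filled in from the general POSIX TZ format rather than from this message.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if s should be parsed as a POSIX TZ string, 0 if it should
 * be considered a candidate file name instead. */
static int tz_is_posix_form(const char *s)
{
    size_t i = 0;
    assert(s != NULL);
    if (*s == ':')                       /* explicit file-name form */
        return 0;
    if (!strcmp(s, "GMT") || !strcmp(s, "UTC"))
        return 1;                        /* special case: implicit zero offset */
    if (*s == '<') {                     /* quoted name, e.g. <+03> (assumed) */
        i = 1;
        while (s[i] && s[i] != '>') i++;
        if (s[i] != '>' || i == 1) return 0;
        i++;
    } else {                             /* plain alphabetic name */
        while (isalpha((unsigned char)s[i])) i++;
        if (i == 0) return 0;            /* name must be non-empty */
    }
    /* the name must be followed by +, -, or a digit */
    return s[i] == '+' || s[i] == '-' || isdigit((unsigned char)s[i]);
}
```

Under this rule a bare 3-letter name such as `EST` is no longer forced into POSIX form, while `EST5EDT` still is.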
the kernel structure has padding of the shm_segsz member up to 64
bits, as well as 2 unused longs at the end. somehow that was
overlooked when the powerpc port was added, and it has been broken
ever since; applications compiled with the wrong definition do not
correctly see the shm_segsz, shm_cpid, and shm_lpid members.

fixing the definition just by adding the missing padding would break
the ABI size of the structure as well as the position of the time64
shm_atime and shm_dtime members we added at the end. instead, just
move one of the unused padding members from the original end (before
time64) of the structure to the position of the missing padding. this
preserves size and preserves correct behavior of any compiled code
that was already working. programs affected by the wrong definition
need to be recompiled with the correct one.
due to historical reasons, the mips signal set has 128 bits rather
than 64 like on every other arch. this was special-cased correctly, at
least for 32-bit mips, at one time, but was inadvertently broken in
commit 7c44097, and seems never to
have been right on mips64/n32.

as a consequence of this bug, applications making use of high realtime
signal numbers on mips may have been able to execute application code
in contexts where doing so was unsafe.
len is unsigned and can never be smaller than 0. though unlikely, an
error in read() would have led to an out-of-bounds write to name.

Reported-by: Michael Forney <[email protected]>
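
To illustrate the class of bug: if the result of read() is stored in an unsigned variable, a `len < 0` check can never fire and a failed read turns into a huge index. The sketch below keeps the count signed. `finish_name` is a hypothetical helper, not the function touched by this commit.

```c
#include <assert.h>
#include <sys/types.h>

/* n is the return value of read(), kept as ssize_t so -1 stays visible.
 * Returns -1 on read failure, 0 after terminating the string. */
static int finish_name(char *name, size_t cap, ssize_t n)
{
    assert(name && cap > 0);
    if (n < 0)
        return -1;                      /* read() failed: do not touch name */
    if ((size_t)n >= cap)
        n = (ssize_t)(cap - 1);         /* leave room for the terminator */
    if (n > 0 && name[n - 1] == '\n')
        n--;                            /* drop a trailing newline */
    name[n] = 0;
    return 0;
}
```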
commit 6d99ad9 introduced this
regression as part of a larger change, based on an incorrect
assumption that rdhwr being part of the mips r2 ISA level meant that
the TLS register, known in the mips documentation as UserLocal, was
unconditionally present on chips providing this ISA level and would
not need trap-and-emulate. this turns out to be false.

based on research by Stanislav Kljuhhin and Abilio Marques, who
reported the problem as a performance regression on certain routers
using OpenWRT vs older uclibc-based versions, it turns out the mips
manuals document the UserLocal register as a feature that might or
might not be implemented or enabled, reflected by a cpu capability bit
in the CONFIG3 register, and that Linux checks for this and has to
explicitly enable it on models that have it.

thus, it's indeed possible that r2+ chips can lack the feature,
bringing us back to the situation where Linux only has a fast
trap-and-emulate path for the case where the destination register is
$3. so, always read the thread pointer through $3. this may incur a
gratuitous move to the desired final register on chips where it's not
needed, but it really doesn't matter.
…mcpy

both passing a null pointer to memcpy with length 0, and adding 0 to a
null pointer, are undefined. in some sense this is 'benign' UB, but
having it precludes use of tooling that strictly traps on UB. there
may be better ways to fix it, but conditioning the operations which
are intended to be no-ops in the k==0 case on k being nonzero is a
simple and safe solution.
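
A small sketch of the guard pattern described above, using a hypothetical helper `append_bytes`: both the memcpy and the pointer addition are skipped when k is zero, so null pointers never reach either operation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy k bytes from src to dst and return the advanced destination.
 * With k == 0 nothing is evaluated on dst or src, so both may be null. */
static unsigned char *append_bytes(unsigned char *dst,
                                   const unsigned char *src, size_t k)
{
    if (k) {
        assert(dst && src);
        memcpy(dst, src, k);
        dst += k;
    }
    return dst;
}
```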
When the soft-float ABI for PowerPC was added in commit
5a92dd9, with Freescale cpus using
the alternative SPE FPU as the main use case, it was noted that we
could probably support hard float on them, but that it would involve
determining some difficult ABI constraints. This commit is the
completion of that work.

The Power-Arch-32 ABI supplement defines the ABI profiles, and indeed
ATR-SPE is built on ATR-SOFT-FLOAT. But setjmp/longjmp compatibility
is problematic for the same reason it's problematic on ARM, where
optional float-related parts of the register file are "call-saved if
present". This requires testing __hwcap, which is now done.

In keeping with the existing powerpc-sf subarch definition, which did
not have fenv, the fenv macros are not defined for SPE and the SPEFSCR
control register is left in (and assumed to start in) the default mode.
we make qsort a wrapper by providing a wrapper_cmp function that uses
the extra argument as a function pointer. should be optimized to a tail
call on most architectures, as long as it's built with
-fomit-frame-pointer, so the performance impact should be minimal.

to keep the git history clean, for now qsort_r is implemented in qsort.c
and qsort is implemented in qsort_nr.c.  qsort.c also received a few
trivial cleanups, including replacing (*cmp)() calls with cmp().
qsort_nr.c contains only wrapper_cmp and qsort as a qsort_r wrapper
itself.
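
The wrapper arrangement can be sketched as below. This is not the musl source: `sketch_sort_r` is a hypothetical insertion sort standing in for qsort_r, limited to elements of at most 64 bytes. Unlike the commit, which uses the extra argument directly as a function pointer, this sketch passes the address of the function pointer so that it stays within ISO C conversion rules.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*cmp_fn)(const void *, const void *);
typedef int (*cmp_r_fn)(const void *, const void *, void *);

/* Hypothetical stand-in for qsort_r: a plain insertion sort. */
static void sketch_sort_r(void *base, size_t n, size_t width,
                          cmp_r_fn cmp, void *arg)
{
    unsigned char *b = base;
    unsigned char tmp[64];
    assert(width > 0 && width <= sizeof tmp);
    for (size_t i = 1; i < n; i++) {
        size_t j = i;
        memcpy(tmp, b + i * width, width);
        while (j > 0 && cmp(b + (j - 1) * width, tmp, arg) > 0) {
            memcpy(b + j * width, b + (j - 1) * width, width);
            j--;
        }
        memcpy(b + j * width, tmp, width);
    }
}

/* Adapts a two-argument comparator to the three-argument interface. */
static int wrapper_cmp(const void *a, const void *b, void *arg)
{
    cmp_fn cmp = *(cmp_fn *)arg;
    return cmp(a, b);
}

/* The qsort-style entry point is only a thin wrapper. */
static void sketch_sort(void *base, size_t n, size_t width, cmp_fn cmp)
{
    sketch_sort_r(base, n, width, wrapper_cmp, &cmp);
}

/* Example comparator for ints. */
static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}
```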
riscv32 and future architectures lack the _time32 variants entirely, so
don't try to use their numbers.
fix merge conflict in v2 submit
riscv32 and future architectures only provide prlimit64.
riscv32 and future architectures only provide the clock_ functions.
We need to make internal syscalls to SYS_statx when SYS_fstatat is not
available without changing the musl API.
riscv32 and future architectures lack it.
riscv32 and future architectures lack wait4.

waitpid is required by POSIX to be a cancellation point.  pclose is
specified as undefined if a cancellation occurs, so it would be
permitted for it to call a cancellable wait function; however, as a
quality of implementation matter, pclose must close the pipe fd before
it can wait (consider popen("yes","r")) and if the wait could be
interrupted the pipe FILE would be left in an intermediate state that
portable software cannot recover from, so the only useful behavior is
for pclose to NOT be a cancellation point.  We therefore support both at
a small cost in code size.

wait4 is historically not a cancellation point in musl; we retain that
since we need the non-cancellable version of __wait4 anyway.
Matches glibc behavior and fixes a case where we could fall off the
function without returning a value.
not empty because buildroot removes empty files generated by a patch...
These are mostly copied from riscv64.  _Addr and _Reg had to become int
to avoid errors in libstdc++ when size_t and std::size_t mismatch.
There is no kernel stat struct; the userspace stat matches glibc in the
sizes and offsets of all fields (including glibc's __dev_t __pad1).  The
jump buffer is 12 words larger to account for 12 saved double-precision
floats; additionally it should be 64-bit aligned to save doubles.

The syscall list was significantly revised by deleting all time32 and
pre-statx syscalls, and renaming several syscalls that have different
names depending on __BITS_PER_LONG, notably mmap2 and _llseek.

futex was added as an alias to futex_time64 since it is widely used by
software which does not pass time arguments.
These are identical to riscv64.
Identical to riscv64.
Largely copied from riscv64 but required recalculation of offsets.
Identical to riscv64 except for stack offsets in clone.