because struct stat is no longer assumed to correspond to the
structure used by the stat-family syscalls, it's not valid to make any
of these syscalls directly using a buffer of type struct stat.
commit 9493892021 moved all logic around
this change for stat-family functions into fstatat.c, making the
others wrappers for it. but a few other direct uses of the syscall
were overlooked. the ones in tmpnam/tempnam are harmless since the
syscalls are just used to test for file existence. however, the uses
in fchmodat and __map_file depend on getting accurate file properties,
and these functions may actually have been broken on one or more mips
variants due to removal of conversion hacks from syscall_arch.h.
as a low-risk fix, simply use struct kstat in place of struct stat in
the affected places.
these did not truncate excess precision in the return value. fixing
them looks like considerable work, and the current C code seems to
outperform them significantly anyway.
long double functions are left in place because they are not subject
to excess precision issues and are probably better than the C code.
for functions implemented in C, this is a requirement of C11 (F.6);
strictly speaking that text does not apply to standard library
functions, but it seems to be intended to apply to them, and C2x is
expected to make it a requirement.
failure to drop excess precision is particularly bad for inverse trig
functions, where a value with excess precision can be outside the
range of the function (entire range, or range for a particular
subdomain), breaking reasonable invariants a caller may expect.
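
as a rough illustration of what dropping excess precision means in C
(a sketch only, not code from this tree; the helper name is
hypothetical), a cast or assignment to the narrower type is what
discards the extra range and precision:

```c
/* hypothetical helper, not part of the patch */
static float to_float_exact(double wide)
{
	/* a cast, like an assignment, is required to discard excess
	 * range and precision; the volatile store is an extra guard for
	 * compilers that keep the value in a wider register anyway */
	volatile float narrow = (float)wide;
	return narrow;
}
```

a caller comparing the result against a float bound then sees a value
that is actually representable as float.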
this extends commit 5a105f19b5, removing
timer[fd]_settime and timer[fd]_gettime. the timerfd ones are likely
to have been used in software that started using them before it could
rely on libc exposing functions.
under _GNU_SOURCE for namespace cleanliness, analogous to other archs.
the original placement in sys/reg.h seems not to have been motivated;
such a header isn't even present on other implementations.
some nontrivial number of applications have historically performed
direct syscalls for these operations rather than using the public
functions. such usage is invalid now that time_t is 64-bit and these
syscalls no longer match the types they are used with, and it was
already harmful before (by suppressing use of vdso).
since syscall() has no type safety, incorrect usage of these syscalls
can't be caught at compile-time. so, without manually inspecting or
running additional tools to check sources, the risk of such errors
slipping through is high.
this patch renames the syscalls on 32-bit archs to clock_gettime32 and
gettimeofday_time32, so that applications using the original names
will fail to build without being fixed.
note that there are a number of other syscalls that may also be unsafe
to use directly after the time64 switchover, but (1) these are the
main two that seem to be in widespread use, and (2) most of the others
continue to have valid usage with a null timeval/timespec argument, as
the argument is an optional timeout or similar.
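
for applications hit by the rename, the fix is to call the public
function instead of the raw syscall. a minimal sketch, with a
hypothetical helper name:

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* hypothetical helper: goes through the public function, so the
 * struct timespec layout always matches what libc expects and the
 * vdso can be used */
static int monotonic_now(struct timespec *ts)
{
	return clock_gettime(CLOCK_MONOTONIC, ts);
}

/* the pattern the rename is meant to break at build time on 32-bit
 * archs looks like this:
 *   syscall(SYS_clock_gettime, CLOCK_MONOTONIC, &ts);
 */
```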
_POSIX_VDISABLE is only visible if unistd.h has already been included,
so conditional use of it here makes no sense. the value is always 0
anyway; it does not vary.
This patch adds an explicit cast to the int arguments passed to the
inline asm used in the RISC-V implementation of `a_cas`, to ensure
that they are properly sign extended to 64 bits. They aren't
automatically sign extended by Clang, and GCC technically also doesn't
guarantee that they will be sign extended.
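
A portable C sketch of the difference the cast makes. The function
names are hypothetical, and the real change is to the inline asm
operands, not helper functions like these. As I understand the ISA,
a 32-bit load-reserved on a 64-bit target yields a sign-extended
value, so a comparison operand that is not sign extended may never
compare equal for negative values:

```c
#include <stdint.h>

/* value as a 64-bit register would hold it after an explicit
 * sign-extending cast (hypothetical helper) */
static int64_t reg_sign_extended(int v)
{
	return (int64_t)v;
}

/* value as a 64-bit register might hold it if the upper bits were
 * left zero instead (hypothetical helper) */
static int64_t reg_zero_extended(int v)
{
	return (int64_t)(uint32_t)v;
}
```

For non-negative values the two agree; for negative ones they do not.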
For Thumb2 compatibility, replace two instances of a single
instruction "orr with a variable shift" with the two instruction
equivalent. Neither of the replacements is in a performance-critical
loop.
the bug fixed in commit b82cd6c78d was
mostly masked on arm because __hwcap was zero at the point of the call
from the dynamic linker to __set_thread_area, causing the access to
libc.auxv to be skipped and kuser_helper versions of TLS access and
atomics to be used instead of the armv6 or v7 versions. however, on
kernels with kuser_helper removed for hardening it would crash.
since __set_thread_area potentially uses __hwcap, it must be
initialized before the function is called. move the AT_HWCAP lookup
from stage 3 to stage 2b.
This enables alternative compilers, which may not define __GNUC__,
to implement alloca, which is still fairly widely used.
This is similar to how stdarg.h already works in musl; compilers must
implement __builtin_va_arg, there is no fallback definition.
this change was discussed on the mailing list thread for the linux
uapi v5.3 patches, and submitted as a v2 patch, but overlooked when I
applied the patches much later.
revert commit f291c09ec9 and apply the
v2 as submitted; the net change is just padding.
notes by Szabolcs Nagy follow:
compared to the linux uapi (and glibc), padding is used instead of the
aligned attribute for keeping the layout the same across targets. this
means the alignment of the struct may be different on some targets
(e.g. m68k where uint64_t is 2 byte aligned) but that should not affect
syscalls and this way the abi does not depend on nonstandard extensions.
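
a sketch of the padding approach, using a hypothetical struct rather
than the one touched by the patch. member offsets are fixed by the
explicit pad array, with no compiler extension involved:

```c
#include <stddef.h>
#include <stdint.h>

/* hypothetical example struct, not the real uapi definition */
struct example_info {
	uint8_t op;
	uint8_t pad[3];   /* explicit padding instead of an aligned attribute */
	uint32_t arch;
	uint64_t ip;
};

/* offsets hold even where uint32_t/uint64_t are only 2-byte aligned */
_Static_assert(offsetof(struct example_info, arch) == 4, "arch at offset 4");
_Static_assert(offsetof(struct example_info, ip) == 8, "ip at offset 8");
```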
at least gcc 9 broke execution of DT_INIT/DT_FINI for fdpic archs
(presently only sh) by recognizing that the stores to the
compound-literal function descriptor constructed to call them were
dead stores. there's no way to make a "may_alias function", so instead
launder the descriptor through an asm-statement barrier. in practice
just making the compound literal volatile seemed to have worked too,
but this should be less of a hack and more accurately convey the
semantics of what transformations are not valid.
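
a simplified sketch of laundering a pointer through an empty asm
statement, assuming a GCC-compatible compiler. the names are
hypothetical, the descriptor here is an ordinary struct rather than a
real fdpic function descriptor, and the "memory" clobber is a
conservative choice for the sketch:

```c
/* hypothetical stand-in for a function descriptor */
struct desc {
	int (*fn)(int);
	int arg;
};

/* the compiler cannot see through the asm, so it must assume the
 * returned pointer may be used to read what p points to; stores to
 * that object therefore cannot be treated as dead */
static void *launder(void *p)
{
	__asm__ volatile("" : "+r"(p) : : "memory");
	return p;
}

static int call_desc(int (*fn)(int), int arg)
{
	struct desc *d = launder(&(struct desc){ fn, arg });
	return d->fn(d->arg);
}

/* small callee used to exercise call_desc */
static int add_one(int x)
{
	return x + 1;
}
```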
commit 1c84c99913 moved the call to
__init_tp above the initialization of libc.auxv, inadvertently
breaking archs where __set_thread_area examines auxv for the sake of
determining the TLS/atomic model needed at runtime. this broke armv6
and sh2.
the syscall numbers were reserved in v5.3 but not wired up on mips, see
linux commit 0671c5b84e9e0a6d42d22da9b5d093787ac1c5f3
MIPS: Wire up clone3 syscall
mips application specific isa extensions were previously not exported
in hwcaps so userspace could not apply optimized code at runtime.
linux commit 38dffe1e4dde1d3174fdce09d67370412843ebb5
MIPS: elf_hwcap: Export userspace ASEs
allows waiting on a pidfd; in the future it might allow retrieving the
exit status by a non-parent process, see
linux commit 3695eae5fee0605f316fbaad0b9e3de791d7dfaf
pidfd: add P_PIDFD to waitid()
tcpi_rcv_ooopack for tracking connection quality:
linux commit f9af2dbbfe01def62765a58af7fbc488351893c3
tcp: Add TCP_INFO counter for packets received out-of-order
tcpi_snd_wnd peer window size for diagnosing tcp performance problems:
linux commit 8f7baad7f03543451af27f5380fc816b008aa1f2
tcp: Add snd_wnd to TCP_INFO
per thread prctl commands to relax the syscall abi such that top bits
of user pointers are ignored in the kernel. this allows the use of
those bits by hwasan or by mte to color pointers and memory on aarch64:
linux commit 63f0c60379650d82250f22e4cf4137ef3dc4f43d
arm64: Introduce prctl() options to control the tagged user addresses ABI
These were mainly introduced so android can optimize the memory usage
of unused apps.
MADV_COLD hints that the memory range is currently not needed (unlike
with MADV_FREE the content is not garbage, it needs to be swapped):
linux commit 9c276cc65a58faf98be8e56962745ec99ab87636
mm: introduce MADV_COLD
MADV_PAGEOUT hints that the memory range is not needed for a long time
so it can be reclaimed immediately independently of memory pressure
(unlike with MADV_DONTNEED the content is not garbage):
linux commit 1a4e58cce84ee88129d5d49c064bd2852b481357
mm: introduce MADV_PAGEOUT
the syscall number is reserved on all targets, but it is not wired up
on all targets, see
linux commit 8f6ccf6159aed1f04c6d179f61f6fb2691261e84
Merge tag 'clone3-v5.3' of ... brauner/linux
linux commit 8f3220a806545442f6f26195bc491520f5276e7c
arch: wire-up clone3() syscall
linux commit 7f192e3cd316ba58c88dfa26796cf77789dd9872
fork: add clone3
see
linux commit 7615d9e1780e26e0178c93c55b73309a5dc093d7
arch: wire-up pidfd_open()
linux commit 32fcb426ec001cb6d5a4a195091a8486ea77e2df
pid: add pidfd_open()
ptrace API to get details of the syscall the tracee is blocked in, see
linux commit 201766a20e30f982ccfe36bebfad9602c3ff574a
ptrace: add PTRACE_GET_SYSCALL_INFO request
the align attribute was used to keep the layout the same across targets,
e.g. on m68k uint32_t is 2-byte aligned; this helps with compat ptrace.
adding this condition makes the entire convert_ioctl_struct function
and compat_map table statically unreachable, and thereby optimized out
by dead code elimination, on archs where they are not needed.
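
the general technique can be sketched as follows. the condition and
all names here are assumptions for illustration, not the ones used in
the patch; the point is only that a compile-time-constant false
condition lets the compiler discard the whole fallback path:

```c
#include <errno.h>
#include <time.h>

static int compat_calls;  /* counts runs of the hypothetical compat path */

/* hypothetical fallback that would retry with an old request number */
static int retry_with_old_request(int req)
{
	(void)req;
	compat_calls++;
	return 0;
}

static int finish_ioctl(int req, int r)
{
	/* the first operand is a constant expression; where it is false
	 * the call below, and everything only it references, is
	 * unreachable and can be eliminated */
	if (sizeof(time_t) > sizeof(long) && r == -ENOTTY)
		return retry_with_old_request(req);
	return r;
}
```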
VIDIOC_OMAP3ISP_STAT_REQ is a device-specific command for the omap3isp
video device. the command number is in a device-private range and
therefore could theoretically be used by other devices too in the
future, but problematic clashes should not be able to arise without
intentional misuse.
This ensures that the musl definition of 'struct iphdr' does not conflict
with the Linux kernel UAPI definition of it.
Some software, e.g. net-tools, will not compile against 5.4 kernel headers
without this patch and the corresponding Linux kernel patch.
since time64 switchover has changed the size and layout of the struct
anyway, take the opportunity to fix it up so that it can be shared
between 32- and 64-bit ABIs on the same system as long as byte order
matches.
the ut_type member is explicitly padded to make up for m68k having
only 2-byte alignment; explicit padding has no effect on other archs.
ut_session is changed from long to int, with endian-matched padding.
this affects 64-bit archs as well, but brings the type into alignment
with glibc's x86_64 struct, so it should not break software, and does
not break on-disk format. the semantic type is int (pid-like) anyway.
the padding produces correct alignment for the ut_tv member on 32-bit
archs that don't naturally align it, so that ABI matches 64-bit.
this type is presently not used anywhere in the ABI between libc and
libc consumers; it's only used between pairs of consumers if a
third-party utmp library using the system utmpx.h is in use.
the elf_prstatus structure is used in core dumps, and the timeval
structures in it are longs matching the elf class, *not* the kernel
"old timeval" for the arch. this means using timeval here for x32 was
always wrong, despite kernel uapi headers and glibc also exposing it
this way, and of course it's wrong for any arch with 64-bit time_t.
rather than just changing the type on affected archs, use a tagless
struct containing long tv_sec and tv_usec members in place of the
timevals. this intentionally breaks use of them as timevals (e.g.
assignment, passing address, etc.) on 64-bit archs as well so that any
usage unsafe for 32-bit archs is caught even in software that only
gets tested on 64-bit archs. from what I could gather, there is not
any software using these members anyway. the only reason they need to
be fixed to begin with is that the only members which are commonly
used, the saved registers, follow the time members and have the wrong
offset if the time members are sized incorrectly.
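
a sketch of what tagless members look like, using a hypothetical
struct that shows only two time fields:

```c
/* hypothetical excerpt; the real structure has many more members */
struct example_prstatus_times {
	struct { long tv_sec, tv_usec; } pr_utime;
	struct { long tv_sec, tv_usec; } pr_stime;
};

/* each member is two longs regardless of the width of time_t */
_Static_assert(sizeof(((struct example_prstatus_times *)0)->pr_utime)
	== 2 * sizeof(long), "two longs per time member");
```

because the member types have no tag, they are not compatible with
struct timeval, so assigning one to a struct timeval or passing its
address where a struct timeval pointer is expected is diagnosed.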
commit ae388becb5 accidentally
introduced #define SYSCALL_NO_TLS 1 in mmap.c, which was probably a
stale change left around from unrelated syscall timing measurements.
reverse it.
this commit covers all remaining ioctls I'm aware of that use
time_t-derived types in their interfaces. it may still be incomplete,
and has undergone only minimal testing for a few commands used in
audio playback.
the SNDRV_PCM_IOCTL_SYNC_PTR command is special-cased because, rather
than the whole structure expanding, it has two substructures each
padded to 64 bytes that expand within their own 64-byte reserved zone.
as long as it's the only one of its type, it doesn't really make sense
to make a general framework for it, but the existing table framework
is still used for the substructures in the special-case. one of the
substructures, snd_pcm_mmap_status, has a snd_pcm_uframes_t member
which is not a timestamp but is expanded just like one, to match the
64-bit-arch version of the structure. this is handled just like a
timestamp at offset 8, and is the motivation for the conversions table
holding offsets of individual values to be expanded rather than
timespec/timeval type pairs.
for some of the types, the size to which they expand is dependent on
whether the arch's ABI aligns 8-byte types on 8-byte boundaries.
new_req entries in the table need to reflect this size to get the
right ioctl request number that will match what callers pass, but we
don't have access to the actual structure type definitions here and
duplicating them would be cumbersome. instead, the new_misaligned
macro introduced here constructs an artificial object whose size is
the result of expanding a misaligned timespec/timeval to 64-bit and
imposing the arch's alignment on the result, which can be passed to
the _IO{R,W,WR} macros.
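
a sketch of how such an artificial object can be built. the macro
name and exact shape here are assumptions for illustration and may
not match the one in the patch; int64_t stands in for a 64-bit time_t
and n is the size of the old structure:

```c
#include <stdint.h>

/* hypothetical macro: a leading int forces the 64-bit member to a
 * misaligned-by-4 position in the old layout, the compiler inserts
 * whatever padding the arch's alignment rules demand, and the char
 * array accounts for the rest of the old structure plus the growth */
#define example_misaligned(n) struct { int i; int64_t t; char c[(n) - 4]; }
```

sizeof applied to this type then gives a size that follows the arch's
alignment of 8-byte types without spelling out the real structure.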
record offsets of individual slots that expand from 32- to 64-bit,
rather than timespec/timeval pairs. this flexibility will be needed
for some ioctls. reduce size of types in table. adjust representation
of offsets to include a count rather than needing -1 padding so that
the table is less ugly and doesn't need large diffs if we increase max
number of slots.
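
a simplified sketch of a table entry that stores a count plus
per-slot offsets, with a one-direction expansion routine. all names
are hypothetical, every expanded slot is assumed to be aligned to 8
bytes, and the output buffer is assumed to be zeroed and large enough:

```c
#include <stdint.h>
#include <string.h>

/* hypothetical table entry */
struct slot_map {
	unsigned char old_size;   /* size of the old structure */
	unsigned char noffs;      /* number of valid entries in offs */
	unsigned char offs[8];    /* old-layout offsets of 32-bit slots, ascending */
};

/* copy old to new_buf, widening each listed 32-bit slot to 64 bits
 * with sign extension; returns the number of bytes produced */
static size_t expand_slots(const struct slot_map *m,
	const unsigned char *old, unsigned char *new_buf)
{
	size_t o = 0, n = 0;
	for (int i = 0; i < m->noffs; i++) {
		size_t len = m->offs[i] - o;
		memcpy(new_buf + n, old + o, len);   /* bytes before the slot */
		o += len;
		n += len;
		n = (n + 7) & ~(size_t)7;            /* align the widened slot */
		int32_t v32;
		memcpy(&v32, old + o, sizeof v32);
		int64_t v64 = v32;
		memcpy(new_buf + n, &v64, sizeof v64);
		o += sizeof v32;
		n += sizeof v64;
	}
	memcpy(new_buf + n, old + o, m->old_size - o);  /* trailing bytes */
	return n + (m->old_size - o);
}
```

with the count stored explicitly, adding room for more slots only
changes the array bound, not every existing entry.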
with the current set of supported ioctls, this conversion is hardly an
improvement, but it sets the stage for being able to do alsa, v4l2,
ppp, and other ioctls with timespec/timeval-derived types. without
this capability, a lot of functionality users depend on would stop
working with the time64 switchover.