the linux faccessat syscall lacks a flag argument that is necessary
to implement the posix api, see
linux commit c8ffd8bcdd28296a198f237cc595148a8d4adfbe
vfs: add faccessat2 syscall
added in
linux commit 1a50ec0b3b2e9a83f1b1245ea37a853aac2f741c
arm64: Implement archrandom.h for ARMv8.5-RNG
linux commit d4209d8b717311d114b5d47ba7f8249fd44e97c2
arm64: cpufeature: Export matrix and other features to userspace
these were missed before, added in
linux commit 1201937491822b61641c1878ebcd16a93aed4540
arm64: Expose ARMv8.5 CondM capability to userspace
linux commit ca9503fc9e9812aa6258e55d44edb03eb30fc46f
arm64: Expose FRINT capabilities to userspace
also added clone3 on sh and m68k, on sh it's still missing (not
yet wired up), but reserved so safe to add.
see
linux commit fddb5d430ad9fa91b49b1d34d0202ffe2fa0e179
open: introduce openat2(2) syscall
linux commit 9a2cef09c801de54feecd912303ace5c27237f12
arch: wire up pidfd_getfd syscall
linux commit 8649c322f75c96e7ced2fec201e123b2b073bf09
pid: Implement pidfd_getfd syscall
linux commit e8bb2a2a1d51511e6b3f7e08125d52ec73c11139
m68k: Wire up clone3() syscall
the fcntl file locking command macro values in the existing generic
bits/fcntl.h were the "64" variants, requiring 64-bit archs that use
the "plain" variants to have their own bits/fcntl.h, even if they
otherwise use the common definitions for everything.
since commit 7cc79d10af exposed
__LONG_MAX to all bits headers, we can now make the generic one common
between 32- and 64-bit archs.
prior to commit 685e40bb09, x86_64 was
correctly passing O_LARGEFILE to SYS_open; it was removed (defined to
0 in the public header, and changed to use the public definition) as
part of that change, probably out of a mistaken belief that it's not
needed.
however, on a mixed system with 32-bit and 64-bit binaries, it's
important that all files be opened with O_LARGEFILE, even if the
opening process is 64-bit, in case a descriptor is passed to a 32-bit
process. otherwise, attempts to access past 2GB in the 32-bit process
could produce EOVERFLOW.
most 64-bit archs added later got this right alread, except for
mips64. x32 was also affected. there are now fixed.
dtv_copy, canary2, and canary_at_end existed solely to match multiple
ABI and asm-accessed layouts simultaneously. now that pthread_arch.h
can be included before struct __pthread is defined, the struct layout
can depend on macros defined by pthread_arch.h.
the adjustment made is entirely a function of TLS_ABOVE_TP and
TP_OFFSET. aside from avoiding repetition of the TP_OFFSET value and
arithmetic, this change makes pthread_arch.h independent of the
definition of struct __pthread from pthread_impl.h. this in turn will
allow inclusion of pthread_arch.h to be moved to the top of
pthread_impl.h so that it can influence the definition of the
structure.
previously, arch files were very inconsistent about the type used for
the thread pointer. this change unifies the new __get_tp interface to
always use uintptr_t, which is the most correct when performing
arithmetic that may involve addresses outside the actual pointed-to
object (due to TP_OFFSET).
a number of users performing seccomp filtering have requested use of
the new individual syscall numbers for socket syscalls, rather than
the legacy multiplexed socketcall, since the latter has the arguments
all in memory where they can't participate in filter decisions.
previously, some archs used the multiplexed socketcall if it was
historically all that was available, while other archs used the
separate syscalls. the intent was that the latter set only include
archs that have "always" had separate socket syscalls, at least going
back to linux 2.6.0. however, at least powerpc, powerpc64, and sh were
wrongly included in this set, and thus socket operations completely
failed on old kernels for these archs.
with the changes made here, the separate syscalls are always
preferred, but fallback code is compiled for archs that also define
SYS_socketcall. two such archs, mips (plain o32) and microblaze,
define SYS_socketcall despite never having needed it, so it's now
undefined by their versions of syscall_arch.h to prevent inclusion of
useless fallback code.
some archs, where the separate syscalls were only added after the
addition of SYS_accept4, lack SYS_accept. because socket calls are
always made with zeros in the unused argument positions, it suffices
to just use SYS_accept4 to provide a definition of SYS_accept, and
this is done to make happy the macro machinery that concatenates the
socket call name onto __SC_ and SYS_.
signal 7 is SIGEMT on Linux mips* ABI according to the man pages and
kernel. it's not clear where the wrong name came from but it dates
back to original mips commit.
it's been reported that the vdso clock_gettime64 function on (32-bit)
arm is broken, producing erratic results that grow at a rate far
greater than one reported second per actual elapsed second. the vdso
function seems to have been added sometime between linux 5.4 and 5.6,
so if there's ever been a working version, it was only present for a
very short window.
it's not clear what the eventual upstream kernel solution will be, but
something needs to be done on the libc side so as not to be producing
binaries that seem to work on older/existing/lts kernels (which lack
the function and thus lack the bug) but will break fantastically when
moving to newer kernels.
hopefully vdso support will be added back soon, but with a new symbol
name or version from the kernel to allow continued rejection of broken
ones.
Linux defines MAP_SYNC on powerpc and powerpc64 as of commit
22fcea6f85f2 ("mm: move MAP_SYNC to asm-generic/mman-common.h"),
so we can stop undefining it on those architectures.
on all mips variants, Linux did (and maybe still does) have some
syscall return paths that wrongly return both the error flag in r7 and
a negated error code in r2. in particular this happened for at least
some causes of ENOSYS.
add an extra check to only negate the error code if it's positive to
begin with.
bug report and concept for patch by Andreas Dröscher.
commit 4221f154ff added the r7
constraint apparently out of a misunderstanding of the breakage it was
addressing, and did so because the asm was in a shared macro used by
all the __syscallN inline functions. now "+r" is used in the output
section for the forms 4-argument and up, so having it in input is
redundant, and the forms with 0-3 arguments don't need it as an input
at all.
the r2 constraint is kept because without it most gcc versions (seems
to be all prior to 9.x) fail to honor the output register binding for
r2. this seems to be a variant of gcc bug #87733.
both the r7 and r2 input constraints look useless, but the r2 one was
a quiet workaround for gcc bug 87733, which affects all modern
versions prior to 9.x, so it's kept and documented.
exactly revert commit 604f8d3d8b which
was wrong; it caused a major regression on Linux versions prior to
2.6.36. old kernels did not properly preserve r2 across syscall
restart, and instead restarted with the instruction right before
syscall, imposing a contract that the previous instruction must load
r2 from an immediate or a register (or memory) not clobbered by the
syscall.
effectivly revert commit ddc7c4f936
which was wrong; it caused a major regression on Linux versions prior
to 2.6.36. old kernels did not properly preserve r2 across syscall
restart, and instead restarted with the instruction right before
syscall, imposing a contract that the previous instruction must load
r2 from an immediate or a register (or memory) not clobbered by the
syscall.
since other changes were made since, including removal of the struct
stat conversion that was replaced by separate struct kstat, this is
not a direct revert, only a functional one.
the "0"(r2) input constraint added back seems useless/erroneous, but
without it most gcc versions (seems to be all prior to 9.x) fail to
honor the output register binding for r2. this seems to be a variant
of gcc bug #87733. further changes should be made later if a better
workaround is found, but this one has been working since 2012. it
seems this issue was encountered but misidentified then, when it
inspired commit 4221f154ff.
this extends commit 5a105f19b5, removing
timer[fd]_settime and timer[fd]_gettime. the timerfd ones are likely
to have been used in software that started using them before it could
rely on libc exposing functions.
under _GNU_SOURCE for namespace cleanliness, analogous to other archs.
the original placement in sys/reg.h seems not to have been motivated;
such a header isn't even present on other implementations.
some nontrivial number of applications have historically performed
direct syscalls for these operations rather than using the public
functions. such usage is invalid now that time_t is 64-bit and these
syscalls no longer match the types they are used with, and it was
already harmful before (by suppressing use of vdso).
since syscall() has no type safety, incorrect usage of these syscalls
can't be caught at compile-time. so, without manually inspecting or
running additional tools to check sources, the risk of such errors
slipping through is high.
this patch renames the syscalls on 32-bit archs to clock_gettime32 and
gettimeofday_time32, so that applications using the original names
will fail to build without being fixed.
note that there are a number of other syscalls that may also be unsafe
to use directly after the time64 switchover, but (1) these are the
main two that seem to be in widespread use, and (2) most of the others
continue to have valid usage with a null timeval/timespec argument, as
the argument is an optional timeout or similar.
This patch adds an explicit cast to the int arguments passed to the
inline asm used in the RISC-V's implementation of `a_cas`, to ensure
that they are properly sign extended to 64 bits. They aren't
automatically sign extended by Clang, and GCC technically also doesn't
guarantee that they will be sign extended.
the syscall numbers were reserved in v5.3 but not wired up on mips, see
linux commit 0671c5b84e9e0a6d42d22da9b5d093787ac1c5f3
MIPS: Wire up clone3 syscall
mips application specific isa extensions were previously not exported
in hwcaps so userspace could not apply optimized code at runtime.
linux commit 38dffe1e4dde1d3174fdce09d67370412843ebb5
MIPS: elf_hwcap: Export userspace ASEs
the syscall number is reserved on all targets, but it is not wired up
on all targets, see
linux commit 8f6ccf6159aed1f04c6d179f61f6fb2691261e84
Merge tag 'clone3-v5.3' of ... brauner/linux
linux commit 8f3220a806545442f6f26195bc491520f5276e7c
arch: wire-up clone3() syscall
linux commit 7f192e3cd316ba58c88dfa26796cf77789dd9872
fork: add clone3
see
linux commit 7615d9e1780e26e0178c93c55b73309a5dc093d7
arch: wire-up pidfd_open()
linux commit 32fcb426ec001cb6d5a4a195091a8486ea77e2df
pid: add pidfd_open()
commit 4d3a162d00 overlooked that the
mips64 reloc.h dependent on endian.h not only for setting the ABI ldso
name to match the byte order, but also for use of the byte swapping
macros. they are needed to override R_TYPE, R_SYM, and R_INFO, to
compensate for a mips "quirk" of always using big endian order for
symbol references in relocations.
part of that commit canot be reverted because the original code was
wrong: it's invalid to define _GNU_SOURCE or any feature test macro
in reloc.h, or anywhere except at the top of a source file. however,
thanks to commit 316730cdc7, the feature
test macro is no longer needed to access the endian-swapping macros,
so simply bringing back the #include directive suffices.
now that all 32-bit archs have 64-bit time_t (and suseconds_t), the
arch-provided _Int64 macro (long or long long, as appropriate) can be
used to define them, and arch-specific definitions are no longer
needed.
now that all 32-bit archs have 64-bit time types, the values for the
time-related ioctls can be shared. the mechanism for this is an
arch/generic version of the bits header. archs which don't use the
generic header still need to duplicate the definitions.
x32, which does not use the new time64 values of the macros, already
has its own overrides, so this commit does not affect it.
now that all 32-bit archs have 64-bit time types, the values for the
time-related socket option macros can be treated as universal for
32-bit archs. the sys/socket.h mechanism for this predates
arch/generic and is instead in the top-level header.
x32, which does not use the new time64 values of the macros, already
has its own overrides, so this commit does not affect it.
this commit preserves ABI fully for existing interface boundaries
between libc and libc consumers (applications or libraries), by
retaining existing symbol names for the legacy 32-bit interfaces and
redirecting sources compiled against the new headers to alternate
symbol names. this does not necessarily, however, preserve the
pairwise ABI of libc consumers with one another; where they use
time_t-derived types in their interfaces with one another, it may be
necessary to synchronize updates with each other.
the intent is that ABI resulting from this commit already be stable
and permanent, but it will not be officially so until a release is
made. changes to some header-defined types that do not play any role
in the ABI between libc and its consumers may still be subject to
change.
mechanically, the changes made by this commit for each 32-bit arch are
as follows:
- _REDIR_TIME64 is defined to activate the symbol redirections in
public headers
- COMPAT_SRC_DIRS is defined in arch.mak to activate build of ABI
compat shims to serve as definitions for the original symbol names
- time_t and suseconds_t definitions are changed to long long (64-bit)
- IPC_STAT definition is changed to add the IPC_TIME64 bit (0x100),
triggering conversion of semid_ds, shmid_ds, and msqid_ds split
low/high time bits into new time_t members
- structs semid_ds, shmid_ds, msqid_ds, and stat are modified to add
new 64-bit time_t/timespec members at the end, maintaining existing
layout of other members.
- socket options (SO_*) and ioctl (sockios) command macros are
redefined to use the kernel's "_NEW" values.
in addition, on archs where vdso clock_gettime is used, the
VDSO_CGT_SYM macro definition in syscall_arch.h is changed to use a
new time64 vdso function if available, and a new VDSO_CGT32_SYM macro
is added for use as fallback on kernels lacking time64.
these definitions are copied from generic bits/ioctl.h, so that x32
keeps the "_OLD" versions (which are already time64 on x32) when
32-bit archs switch to 64-bit time_t.
these definitions are merely copied from the top-level sys/socket.h,
so there is no functional change at this time. however, the top-level
definitions will change to use the time64 "_NEW" versions on 32-bit
archs when time_t is switched over to 64-bit. this commit ensures that
change will be suppressed on x32.
these structures can now be defined generically in terms of endianness
and long size. previously, the 32-bit archs all shared a common
definition from the generic bits header, and each 64-bit arch had to
repeat the 64-bit version, with endian conditionals if the arch had
variants of each endianness.
I would prefer getting rid of the preprocessor conditionals for
padding and instead using unnamed bitfield members, like commit
9b2921bea1 did for struct timespec.
however, at present sendmsg, recvmsg, and recvmmsg need access to the
padding members by name to zero them. this could perhaps be cleaned up
in the future.
being that it contains pointers and (from the kernel perspective,
which is wrong) size_t members, x32 uses the 32-bit version of the
structure, not a half-32-bit, half-64-bit layout like we had here. the
x86_64 definition was inadvertently copied when x32 was first added.
unlike errors in the opposite direction (missing padding), this error
was not easily detected breakage, because the layout of the commonly
used initial subset of members still matched. breakage could only be
observed in the presence of control messages or flags.
this is analogous to commit 40aa18d55a.
so far, there are not any actual time64 versions of the rusage
syscalls (getrusage and wait4) and might never be. however, the
existing x32 ones behave the way time64 versions would if they
existed: using 64-bit slots in place of all longs.
presently, wait4 and getrusage are broken on x32, storing the timevals
correctly but messing up everything else due to the long/kernel-long
mismatch. this would be a huge buffer overflow if not for the 16
reserved slots we left long ago, which suffice to prevent 14
double-sized longs from overflowing into unrelated memory. this commit
will make it possible to fix them.
this is to match the kernel and glibc interfaces. here, struct pt_regs
is an incomplete type, but that's harmless, and if it's completed by
inclusion of another header then members of the struct pointed to by
the regs member can be accessed directly without going through a cast
or intermediate pointer object.
the userspace ucontext API has this as an array rather than a
structure.
commit 3c59a86895 fixed the
corresponding mistake for vrregset_t, namely that the original
powerpc64 port used a mix of types from 32-bit powerpc and powerpc64
rather than matching the 64-bit types.
policy has long been that these definitions are purely a function of
whether long/pointer is 32- or 64-bit, and that they are not allowed
to vary per-arch. move the definition to the shared alltypes.h.in
fragment, using integer constant expressions in terms of sizeof to
vary the array dimensions appropriately. I'm not sure whether this is
more or less ugly than using preprocessor conditionals and two sets of
definitions here, but either way is a lot less ugly than repeating the
same thing for every arch.
LLONG_MAX is uniform for all archs we support and plenty of header and
code level logic assumes it is, so it does not make sense for limits.h
bits mechanism to pretend it's variable.
LONG_BIT can be defined in terms of LONG_MAX; there's no reason to put
it in bits.
by moving LONG_MAX definition to __LONG_MAX in alltypes.h and moving
LLONG_MAX out of bits, there are now no plain-C limits that are
defined in the bits header, so the bits header only needs to be
included in the POSIX or extended profiles. this allows the feature
test macro logic to be removed from the bits header, facilitating a
long-term goal of getting such logic out of bits.
having __LONG_MAX in alltypes.h will allow further generalization of
headers.
archs without a constant PAGESIZE no longer need bits/limits.h at all.
building on commit 97d35a552e,
__BYTE_ORDER is now available wherever alltypes.h is included. since
reloc.h is only used from src/internal/dynlink.h, it can be assumed
that __BYTE_ORDER is exposed. reloc.h is not permitted to be included
in other contexts, and generally, like most arch headers, lacks
inclusion guards that would allow such usage. the mips64 version
mistakenly included such guards; they are removed for consistency.
building on commit 97d35a552e,
__BYTE_ORDER is now available wherever alltypes.h is included.
endian.h should not be used since, in the future, it will expose
identifiers that are not in the reserved namespace for the headers
which were previously using it.
this change is motivated by the intersection of several factors.
presently, despite being a nonstandard header, endian.h is exposing
the unprefixed byte order macros and functions only if _BSD_SOURCE or
_GNU_SOURCE is defined. this is to accommodate use of endian.h from
other headers, including bits headers, which need to define structure
layout in terms of endianness. with time64 switch-over, even more
headers will need to do this.
at the same time, the resolution of Austin Group issue 162 makes
endian.h a standard header for POSIX-future, requiring that it expose
the unprefixed macros and the functions even in standards-conforming
profiles. changes to meet this new requirement would break existing
internal usage of endian.h by causing it to violate namespace where
it's used.
instead, have the arch's alltypes.h define __BYTE_ORDER, either as a
fixed constant or depending on the right arch-specific predefined
macros for determining endianness. explicit literals 1234 and 4321 are
used instead of __LITTLE_ENDIAN and __BIG_ENDIAN so that there's no
danger of getting the wrong result if a macro is undefined and
implicitly evaluates to 0 at the preprocessor level.
the powerpc (32-bit) bits/endian.h being removed had logic for varying
endianness, but our powerpc arch has never supported that and has
always been big-endian-only. this logic is not carried over to the new
__BYTE_ORDER definition in alltypes.h.
now that commit f7f1079796 removed the
legacy i386 conditional definition, va_list is in no way
arch-specific, and has no reason to be in the future. move it to the
shared part of alltypes.h.in
commit ffaaa6d230 removed the
corresponding stdarg.h support for compilers without va_list builtins,
but failed to remove the alternate type definition, leaving incorrect
va_list definitions in place with compilers that don't define __GNUC__
with a value >= 3.
commit ab3eb89a8b removed it as part of
correcting the mcontext_t definition, but there is still code using
struct sigcontext and expecting the member names present in it, most
notably libgcc_eh. almost all such usage is incorrect, but bring back
struct sigcontext at least for now so as not to introduce regressions.
in order for sys/procfs.h (provided by sys/user.h) to be useful, it
needs to match the API its consumers (gdb, etc.) expect, including the
member names established by glibc.
this partly reverts commit 29e8737f81,
which partly reverted d493206de7,
eliminating struct user_fpregs_struct which seems to have had no
precedent and using union __riscv_mc_fp_state for elf_fpregset_t. this
requires indirect inclusion of signal.h to make union
__riscv_mc_fp_state visible, but being that these are nonstandard
"junk" headers with no official restrictions on what they can pull in,
that's no big deal.
split off and expanded from patch by Khem Raj.
the top-level mcontext_t member names were namespace-violating in
standards profiles before, and nested-level member names (some of them
single-letter) were egregiously bad namespace impositions even in
non-strict profiles. moreover, they mismatched those used in the
public API first defined in glibc, breaking any code making use of
them.
unlike most archs, the public API used in glibc for riscv mcontext_t
members was designed to be namespace-safe, so we can and should expose
the members regardless of feature test macros. only the typedefs for
greg_t, gregset_t, and fpregset_t need to be protected behind FTMs.
the struct tags for mcontext_t and ucontext_t are also changed. for
mcontext_t this is necessary to make the common definition across
profiles namespace-safe. for ucontext_t, it's just a matter of
matching the tag from the glibc-defined API.
these changes are split off and expanded from a patch by Khem Raj.
analogous to commit ddc7c4f936 for
mips64 and n32, remove the hack to load the syscall number into $2 via
asm, and use a constraint to let the compiler load it instead.
now, only $4, $5, and $6 are potential input-only registers. $2 is
always input and output, and $7 is both when it's an argument,
otherwise output-only. previously, $7 was treated as an input (with a
"1" constraint matching its output position) even when it was not an
input, which was arguably undefined behavior (asm input from
indeterminate value). this is corrected.
as before, $8, $9, and $10 are conditionally input-output registers
for 5-, 6-, and 7-argument syscalls. their role in input is carrying
in the values that will be stored on the stack for arguments 5-7.
their role in output is carrying back whatever the kernel has
clobbered them with, so that the compiler cannot assume they still
contain the input values.
mips r6 (an incompatible isa from traditional mips) removes the hi and
lo registers used for mul/div results. older gcc versions accepted
them in the clobber list for asm, but their presence is incorrect and
breaks on later versions.
in the process of fixing this, the clobber list for 32-bit mips
syscalls has been deduplicated via a macro like on mips64 and n32.
The operand sepcifiers in a_cas and a_cas_p for riscv64 were incorrect:
there's a backwards branch in the routine, so despite tmp being written
at the end of the assembly fragment it cannot be allocated in one of the
input registers because the input values may be needed for another trip
around the loop.
For code that follows the guaranteed forward progress requirements, the
backwards branch is rarely taken: SiFive's hardware only fails a store
conditional on execptional cases (ie, instruction cache misses inside
the loop), and until recently a bug in QEMU allowed back-to-back
store conditionals to succeed. The bug has been fixed in the latest
QEMU release, but it turns out that the fix caused this latent bug in
musl to manifest.
AT_HWCAP2 flags, see
linux commit 671db581815faf17cbedd7fcbc48823a247d90b1
arm64: Expose DC CVADP to userspace
linux commit 06a916feca2b262ab0c1a2aeb68882f4b1108a07
arm64: Expose SVE2 features for userspace
new mount api syscalls were added, same numers on all targets, see
linux commit a07b20004793d8926f78d63eb5980559f7813404
vfs: syscall: Add open_tree(2) to reference or clone a mount
linux commit 2db154b3ea8e14b04fee23e3fdfd5e9d17fbc6ae
vfs: syscall: Add move_mount(2) to move mounts around
linux commit 24dcb3d90a1f67fe08c68a004af37df059d74005
vfs: syscall: Add fsopen() to prepare for superblock creation
linux commit ecdab150fddb42fe6a739335257949220033b782
vfs: syscall: Add fsconfig() for configuring and managing a context
linux commit 93766fbd2696c2c4453dd8e1070977e9cd4e6b6d
vfs: syscall: Add fsmount() to create a mount for a superblock
linux commit cf3cba4a429be43e5527a3f78859b1bfd9ebc5fb
vfs: syscall: Add fspick() to select a superblock for reconfiguration
linux commit 9c8ad7a2ff0bfe58f019ec0abc1fb965114dde7d
uapi, x86: Fix the syscall numbering of the mount API syscalls [ver #2]
linux commit d8076bdb56af5e5918376cd1573a6b0007fc1a89
uapi: Wire up the mount API syscalls on non-x86 arches [ver #2]
historically, a number of 32-bit archs used long rather than int for
wchar_t, for no good reason. GCC still uses the historical types, but
clang replaced them all with int, and it seems PCC uses int too.
mismatching the compiler's type for wchar_t is not an option due to
wide string literals.
note that the mismatch does not affect C++ ABI since wchar_t is its
own builtin type/keyword in C++, distinct from both int and long, not
a typedef.
i386 already worked around this by honoring __WCHAR_TYPE__ if defined
by the compiler, and only using the official legacy ABI type if not.
add the same to the other affected archs.
it might make sense at some point to switch to using int as the
default if __WCHAR_TYPE__ is not defined, if the expectations is that
new compilers will treat int as the correct choice, but it's unlikely
that the case where __WCHAR_TYPE__ is undefined will ever be used
anyway. I actually wanted to move the definition of wchar_t to the
top-level shared alltypes.h.in, using __WCHAR_TYPE__ and falling back
to int if not defined, but that can't be done without assuming all
compilers define __WCHAR_TYPE__ thanks to some pathological archs
where the ABI has wchar_t as an unsigned type.
due to historical accident/sloppiness in glibc, the powerpc,
powerpc64, and sh versions of struct user, defined by sys/user.h, used
struct pt_regs from the kernel asm/ptrace.h for their regs member.
this made it impossible to define the type in an API-compatible manner
without either including asm/ptrace.h like glibc does (contrary to our
policy of not depending on kernel headers), or clashing with
asm/ptrace.h's definition of struct pt_regs if both headers are
included (which is almost always the case in software using
sys/user.h).
for a long time I viewed this problem as having no reasonable fix. I
even explored the possibility of having the powerpc[64] and sh
versions of user.h just include the kernel header (breaking with
policy), but that looked like it might introduce new clashes with
sys/ptrace.h. and it would also bring in a lot of additional cruft
that makes no sense for sys/user.h to expose. glibc goes out of its
way to suppress some of that with #undef, possibly leading to
different problems. this is a rabbit-hole that should be explored no
further.
as it turns out, however, nothing actually uses struct user
sufficiently to care about the type of the regs member; most software
including sys/user.h does not even use struct user at all. so, the
problem can be fixed just by doing away with the insistence on strict
glibc API compatibility for the struct tag of the regs member.
rather than renaming the tag, which might lead to the new name
entering use as API, simply use an untagged structure inside struct
user with the same members/layout as struct pt_regs.
for sh, struct pt_dspregs is just removed entirely since it was not
used.
R_PPC_UADDR32 (R_PPC64_UADDR64) has the same meaning as R_PPC_ADDR32
(R_PPC64_ADDR64), except that its address need not be aligned. For
powerpc64, BFD ld(1) will automatically convert between ADDR<->UADDR
relocations when the address is/isn't at its native alignment. This
will happen if, for example, there is a pointer in a packed struct.
gold and lld do not currently generate R_PPC64_UADDR64, but pass
through misaligned R_PPC64_ADDR64 relocations from object files,
possibly relaxing them to misaligned R_PPC64_RELATIVE. In both cases
(relaxed or not) this violates the PSABI, which defines the relevant
field type as "a 64-bit field occupying 8 bytes, the alignment of
which is 8 bytes unless otherwise specified."
All three linkers violate the PSABI on 32-bit powerpc, where the only
difference is that the field is 32 bits wide, aligned to 4 bytes.
Currently musl fails to load executables linked by BFD ld containing
R_PPC64_UADDR64, with the error "unsupported relocation type 43".
This change provides compatibility with BFD ld on powerpc64, and any
static linker on either architecture that starts following the PSABI
more closely.
the contents conflicted with asm/ptrace.h. glibc does not provide
anything in user.h for riscv, so software cannot be depending on it.
simplified from patch submitted by Baruch Siach.
Rename user registers struct definitions to avoid conflict with the
asm/ptrace.h kernel header that defines the same structs. Use the
__riscv_mc prefix as glibc does.
commit f3f96f2daa added these for the
rest of the archs, but the patch it corresponded to missed riscv64
since riscv64 was not yet upstream at the time. this caused commit
dfc81828f7 to break riscv64 build, due
to a wrong assumption that SYS_statx was unconditionally defined.
otherwise, 32-bit archs that could otherwise share the generic
bits/ipc.h would need to duplicate the struct ipc_perm definition,
obscuring the fact that it's the same. sysvipc is not widely used and
these headers are not commonly included, so there is no performance
gain to be had by limiting the number of indirectly included files
here.
files with the existing time32 definition of IPC_STAT are added to all
current 32-bit archs now, so that when it's changed the change will
show up as a change rather than addition of a new file where it's less
obvious that the value is changing vs the generic one that was used
before.
without this, the SIOCGSTAMP and SIOCGSTAMPNS ioctl commands, for
obtaining timestamps, would stop working on pre-5.1 kernels after
time_t is switched to 64-bit and their values are changed to the new
time64 versions.
new code is written such that it's statically unreachable on 64-bit
archs, and on existing 32-bit archs until the macro values are changed
to activate 64-bit time_t.
without this, the SO_RCVTIMEO and SO_SNDTIMEO socket options would
stop working on pre-5.1 kernels after time_t is switched to 64-bit and
their values are changed to the new time64 versions.
new code is written such that it's statically unreachable on 64-bit
archs, and on existing 32-bit archs until the macro values are changed
to activate 64-bit time_t.
these differ from generic only in using endian-matched padding with a
short __ipc_perm_seq field in place of the int field in generic. this
is not a documented public interface anyway, and the original intent
was to use int here. some ports just inadvertently slipped in the
kernel short+padding form.
previously these differed from generic because they needed their own
definitions of IPC_64. now that it's no longer in public header,
they're identical.
the definition of the IPC_64 macro controls the interface between libc
and the kernel through syscalls; it's not a public API. the meaning is
rather obscure. long ago, Linux's sysvipc *id_ds structures used
16-bit uids/gids and wrong types for a few other fields. this was in
the libc5 era, before glibc. the IPC_64 flag (64 is a misnomer; it's
more like 32) tells the kernel to use the modern[-ish] versions of the
structures.
the definition of IPC_64 has nothing to do with whether the arch is
32- or 64-bit. rather, due to either historical accident or
intentional obnoxiousness, the kernel only accepts and masks off the
0x100 IPC_64 flag conditional on CONFIG_ARCH_WANT_IPC_PARSE_VERSION,
i.e. for archs that want to provide, or that accidentally provided,
both. for archs which don't define this option, no masking is
performed and commands with the 0x100 bit set will fail as invalid. so
ultimately, the definition is just a matter of matching an arbitrary
switch defined per-arch in the kernel.
this layout is more common already than the old generic, and should
become even more common in the future with new archs added and with
64-bit time_t on 32-bit archs.
some of these were not exact duplicates, but had gratuitously
different naming for padding, or omitted the endian checks because the
arch is fixed-endian.
this layout is slightly less common than the old generic one, but only
because x86_64 and x32 wrongly (according to comments in the kernel
headers) copied the i386 padding. for future archs, and with 64-bit
time_t on 32-bit archs, the new layout here will become the most
common, and it makes sense to treat it as the generic.
various padding fields in the generic bits/sem.h were defined in terms
of time_t as a cheap hack standing in for "kernel long", to allow x32
to use the generic version of the file. this was a really bad idea, as
it ended up getting copied into lots of arch-specific versions of the
bits file, and is a blocker to changing time_t to 64-bit on 32-bit
archs.
this commit adds an x32-specific version of the header, and changes
padding type back from time_t to long (currently the same type on all
archs but x32) in the generic header and all the others the hack got
copied into.
this layout is more common already than the old generic, and should
become even more common in the future with new archs added and with
64-bit time_t on 32-bit archs.
the duplicate arch-specific copies are not removed yet in this commit,
so as to assist git tooling in copy/rename tracking.
there are more archs sharing the generic 64-bit version of the struct,
which is uniform and much more reasonable, than sharing the current
"generic" one, and depending on how time64 sysvipc is done for 32-bit
archs, even more may be sharing the "64-bit version" in the future.
so, duplicate the current generic to all archs using it (arm, i386,
m68k, microblaze, or1k) so that the generic can be changed freely.
this is recorded as its own commit mainly as a hint to git tooling, to
assist in copy/move tracking.
the x32 syscall interfaces treat timespec's tv_nsec member as 64-bit
despite the API type being long and long being 32-bit in the ABI. this
is no problem for syscalls that store timespecs to userspace as
results, but caused uninitialized padding to be misinterpreted as the
high bits in syscalls that take timespecs as input.
since the beginning of the port, we've dealt with this situation with
hacks in syscall_arch.h, and injected between __syscall_cp_c and
__syscall_cp_asm, to special-case the syscall numbers that involve
timespecs as inputs and copy them to a form suitable to pass to the
kernel.
commit 40aa18d55a set the stage for
removal of these hacks by letting us treat the "normal" x32 syscalls
dealing with timespec as if they're x32's "time64" syscalls,
effectively making x32 ax "time64-only 32-bit arch" like riscv32 will
be when it's added. since then, all users of syscalls that x32's
syscall_arch.h had hacks for have been updated to use time64 syscalls,
so the hacks can be removed.
there are still at least a few other timespec-related syscalls broken
on x32, which were overlooked when the x32 hacks were done or added
later. these include at least recvmmsg, adjtimex/clock_adjtime, and
timerfd_settime, and they will be fixed independently later on.
x32 is odd in that it's the only ILP32 arch/ABI we have where time_t
is 64-bit rather than (32-bit) long, and this has always been
problematic in that it results in struct timespec having unused
padding space, since tv_nsec has type long, which the kernel insists
be zero- or sign-extended (due to negative tv_nsec being invalid, it
doesn't matter which) to match the x86_64 type.
up til now, we've had really ugly hacks in x32/syscall_arch.h to patch
up the timespecs passed to the kernel. but the same requirement to
zero- or sign-extend tv_nsec also applies to all the new time64
syscalls on true 32-bit archs. so let's take advantage of this to
clean things up.
this patch defines all of the time64 syscalls for x32 as aliases for
the existing syscalls by the same name. this establishes the following
invariants:
- if the time64 form is defined, it takes time arguments as 64-bit
objects, and tv_nsec inputs must be zero-/sign-extended to 64-bit.
- if the time64 form is not defined, or if the time64 form is defined
and is not equal to the "plain" form, the plain form takes time
arguments as longs.
this will avoid the need for protocols for archs to define appropriate
types for each family of syscalls, and for the reader of the code to
have to be aware of such type definitions.
in some sense it might be simpler if the plain syscall form were
undefined for x32, so that it would always take longs if defined.
however, a number of these syscalls are used in contexts with a null
time argument, or (e.g. futex) for commands that don't involve time at
all, and having to introduce time64-specific logic to all those call
points does not make sense. thus, while the "plain" forms are kept now
just because they're needed until the affected code is converted over,
they'll also almost surely be kept in the future as well.
kernel support for x32 was added long after the utimensat syscall was
already available, so having a fallback is just wasted code size.
also, for changes related to time64 support on 32-bit archs, I want to
be able to assume the old futimesat syscall always works with longs,
which is true except for x32. by ensuring that it's not used on x32,
the needed invariant is established.
now that we have a kstat structure decoupled from the public struct
stat, we can just use the broken kernel structures directly and let
the code in fstatat do the translation.
presently, all archs/ABIs have struct stat matching the kernel
stat[64] type, except mips/mipsn32/mips64 which do conversion hacks in
syscall_arch.h to work around bugs in the kernel type. this patch
completely decouples them and adds a translation step to the success
path of fstatat. at present, this is just a gratuitous copying, but it
opens up multiple possibilities for future support for 64-bit time_t
on 32-bit archs and for cleaned-up/unified ABIs.
for clarity, the mips hacks are not yet removed in this commit, so the
mips kstat structs still correspond to the output of the hacks in
their syscall_arch.h files, not the raw kernel type. a subsequent
commit will fix this.
these were overlooked during review. bits headers are not allowed to
pull in additional headers (note: that rule is currently broken in
other places but just for endian.h). string.h has no place here
anyway, and including bits/alltypes.h without defining macros to
request types from it is a nop.
the "A" constraint is simply for an address expression that's a single
register, but it's not yet supported by clang, and has no advantage
here over just using a register operand for the address. the latter is
actually preferable in the a_cas_p case because it avoids aliasing an
lvalue onto the memory.
most egregious problem was the lack of memory clobber and lack of
volatile asm; this made the atomics memory barriers but not compiler
barriers. use of "+r" rather than "=r" for a clobbered temp was also
wrong, since the initial value is indeterminate.
having "+r"(a0) is redundant with "0"(a0) in syscalls with at least 1
arg, which is arguably a constraint violation (clang treats it as
such), and an invalid input with indeterminate value in the 0-arg
case. use the "=r"(a0) form instead.
ever since inline syscalls were added for (o32) mips in commit
328810d325, the asm has nonsensically
loaded the syscall number, rather than taking $2 as an input
constraint to let the compiler load it. commit
cfc09b1ecf improved on this somewhat by
allowing a constant syscall number to propagate into an immediate, but
missed that the whole operation made no sense.
now, only $4, $5, $6, $8, and $9 are potential input-only registers.
$2 is always input and output, and $7 is both when it's an argument,
otherwise output-only. previously, $7 was treated as an input (with a
"1" constraint matching its output position) even when it was not an
input, which was arguably undefined behavior (asm input from
indeterminate value). this is corrected.
this patch is not purely non-functional changes, since before, $8 and
$9 were wrongly in the clobberlist for syscalls with fewer than 5 or 6
arguments. of course it's impossible for syscalls to have different
clobbers depending on their number of arguments. the clobberlist for
the recently-added 5- and 6-argument forms was correct, and for the 0-
to 4-argument forms was erroneously copied from the mips o32 ABI where
the additional arguments had to be passed on the stack.
in making this change, I reviewed the kernel sources, and $8 and $9
are always saved for 64-bit kernels since they're part of the syscall
argument list for n32 and n64 ABIs.
a fully thumb1 build is not supported because some asm files are
incompatible with thumb1, but apparently it works to compile the C
code as thumb1
commit 06fbefd100 caused this regression
but introducing use of the clz instruction, which is not supported in
arm mode prior to v5, and not supported in thumb prior to thumb2
(v6t2). commit 1b9406b03c fixed the
issue only for arm mode pre-v5 but left thumb1 broken.
Commit 3517d74a5e changed the token in
sys/ioctl.h from 0x01 to 1, so bits/termios.h no longer matches. Revert
the bits/termios.h change to keep the headers in sync.
This reverts commit 9eda4dc69c.
this was apparently copied from x86_64; it's not part of the kernel
API for riscv64. this change eliminates the need for a
riscv64-specific bits header and lets it use the generic one.
syscall numbers are now synced up across targets (starting from 403 the
numbers are the same on all targets other than an arch specific offset)
IPC syscalls sem*, shm*, msg* got added where they were missing (except
for semop: only semtimedop got added), the new semctl, shmctl, msgctl
imply IPC_64, see
linux commit 0d6040d4681735dfc47565de288525de405a5c99
arch: add split IPC system calls where needed
new 64bit time_t syscall variants got added on 32bit targets, see
linux commit 48166e6ea47d23984f0b481ca199250e1ce0730a
y2038: add 64-bit time_t syscalls to all 32-bit architectures
new async io syscalls got added, see
linux commit 2b188cc1bb857a9d4701ae59aa7768b5124e262e
Add io_uring IO interface
linux commit edafccee56ff31678a091ddb7219aba9b28bc3cb
io_uring: add support for pre-mapped user IO buffers
a new syscall got added that uses the fd of /proc/<pid> as a stable
handle for processes: allows sending signals without pid reuse issues,
intended to eventually replace rt_sigqueueinfo, kill, tgkill and
rt_tgsigqueueinfo, see
linux commit 3eb39f47934f9d5a3027fe00d906a45fe3a15fad
signal: add pidfd_send_signal() syscall
on some targets (arm, m68k, s390x, sh) some previously missing syscall
numbers got added as well.
Author: Alex Suykov <alex.suykov@gmail.com>
Author: Aric Belsito <lluixhi@gmail.com>
Author: Drew DeVault <sir@cmpwn.com>
Author: Michael Clark <mjc@sifive.com>
Author: Michael Forney <mforney@mforney.org>
Author: Stefan O'Rear <sorear2@gmail.com>
This port has involved the work of many people over several years. I
have tried to ensure that everyone with substantial contributions has
been credited above; if any omissions are found they will be noted
later in an update to the authors/contributors list in the COPYRIGHT
file.
The version committed here comes from the riscv/riscv-musl repo's
commit 3fe7e2c75df78eef42dcdc352a55757729f451e2, with minor changes by
me for issues found during final review:
- a_ll/a_sc atomics are removed (according to the ISA spec, lr/sc
are not safe to use in separate inline asm fragments)
- a_cas[_p] is fixed to be a memory barrier
- the call from the _start assembly into the C part of crt1/ldso is
changed to allow for the possibility that the linker does not place
them nearby each other.
- DTP_OFFSET is defined correctly so that local-dynamic TLS works
- reloc.h LDSO_ARCH logic is simplified and made explicit.
- unused, non-functional crti/n asm files are removed.
- an empty .sdata section is added to crt1 so that the
__global_pointer reference is resolvable.
- indentation style errors in some asm files are fixed.