historically linux limited the number of supplementary groups a
process could be in to 32, but this limit was raised to 65536 in linux
2.6.4. proposals to support the new limit, change NGROUPS_MAX, or make
it dynamic have been stalled due to the impact it would have on
initgroups where the groups array exists in automatic storage.
the changes here decouple initgroups from the value of NGROUPS_MAX and
allow it to fall back to allocating a buffer in the case where
getgrouplist indicates the user has more supplementary groups than
could be reported in the buffer. getgrouplist already involves
allocation, so this does not pull in any new link dependency.
likewise, getgrouplist is already using the public malloc (vs internal
libc one), so initgroups does the same. if this turns out not to be
the best choice, both can be changed together later.
the initial buffer size is left at 32, but now as the literal value,
so that any potential future change to NGROUPS_MAX will not affect
initgroups.
commit cfa0a54c08 attempted to fix
rounding on archs where long double is not 80-bit (where LDBL_MANT_DIG
is not zero mod four), but failed to address the edge case where
rounding was skipped because LDBL_MANT_DIG/4 rounded down in the
comparison against the requested precision.
the rounding logic based on hex digit count is difficult to understand
and not well-motivated, so rather than try to fix it, replace it with
an explicit calculation in terms of number of bits to be kept, without
any truncating division operations. based on patch by Peter Ammon, but
with scalbn to apply the rounding exponent since the value will not
generally fit in any integer type. scalbn is used instead of scalbnl
to avoid pulling in the latter unnecessarily, since the value is an
exact power of two whose exponent range is bounded by LDBL_MANT_DIG, a
small integer.
The principal expressions defining acosh and acos are such that
acosh(z) = ±i acos(z)
where the + is only true on the Im(z)>0 half of the complex plane
(and partly on Im(z)==0 depending on number representation).
fix the comment without expanding on the details.
POSIX requires pwrite to honor the explicit file offset where the
write should take place even if the file was opened as O_APPEND.
however, linux historically defined the pwrite syscall family as
honoring O_APPEND. this cannot be changed on the kernel side due to
stability policy, but the addition of the pwritev2 syscall with a
flags argument opened the door to fixing it, and linux commit
73fa7547c70b32cc69685f79be31135797734eb6 adds the RWF_NOAPPEND flag
that lets us request a write honoring the file offset argument.
this patch changes the pwrite function to first attempt using the
pwritev2 syscall with RWF_NOAPPEND, falling back to using the old
pwrite syscall only after checking that O_APPEND is not set for the
open file. if O_APPEND is set, the operation fails with EOPNOTSUPP,
reflecting that the kernel does not support the correct behavior. this
is an extended error case needed to avoid the wrong behavior that
happened before (writing the data at the wrong location), and is
aligned with the spirit of the POSIX requirement that "An attempt to
perform a pwrite() on a file that is incapable of seeking shall result
in an error."
since the pwritev2 syscall interprets the offset of -1 as a request to
write at the current file offset, it is mapped to a different negative
value that will produce the expected error.
pwritev, though not governed by POSIX at this time, is adjusted to
match pwrite in honoring the offset.
added in linux kernel commit 73fa7547c70b32cc69685f79be31135797734eb6.
this is added now as a prerequisite for fixing pwrite/pwritev behavior
for O_APPEND files.
the jis0208 table we use is only 84x94 in size, but the shift_jis
encoding supports a 94x94 grid. attempts to convert sequences outside
of the supported zone resulted in out-of-bounds table reads,
misinterpreting adjacent rodata as part of the character table and
thereby converting these sequences to unexpected characters.
this is not needed, but may act as a hint to the compiler, and also
serves to suppress unused function warnings if enabled (on by default
since commit 86ac0f7947).
this is how it's defined in the cp936 document referenced by the IANA
charset registry as defining GBK, and of the mappings defined there,
was the only one missing.
it is not accepted for GB18030, as GB18030 is a UTF and has its own
unique mapping for the euro symbol.
- add mount_setattr from linux v5.12
- add epoll_pwait2 from linux v5.11
- add process_madvise from linux v5.10
- add __NR_faccessat2 from linux v5.8
- add pidfd_getfd and openat2 syscall numbers from linux v5.6
- add clone3 syscall number from linux v5.3
- add process_mrelease from linux v5.15
- add futex_waitv from linux v5.16
- add set_mempolicy_home_node from linux v5.17
- add cachestat from linux v6.4
- add __NR_fchmodat2 from linux v6.6
despite riscv32 being natively time64, the IPC_TIME64 bit (0x100) is
set in IPC_STAT and derived command macros, differentiating their
values from the raw command values used to interface with the kernel.
this reflects that the kernel ipc structure types are not natively
time64, but have broken-down hi/lo fields that cannot be used in-place
and require translation, and that the userspace struct types differ
from the kernel types (relevant to things like strace).
These are mostly copied from riscv64. _Addr and _Reg had to become int
to match compiler-controlled parts of the ABI (result type of sizeof,
etc.). There is no kernel stat struct; the userspace stat matches
glibc in the sizes and offsets of all fields (including glibc's
__dev_t __pad1). The jump buffer is 12 words larger to account for 12
saved double-precision floats; additionally it should be 64-bit
aligned to save doubles.
The syscall list was significantly revised by deleting all time32 and
pre-statx syscalls, and renaming several syscalls that have different
names depending on __BITS_PER_LONG, notably mmap2 and _llseek.
futex was added as an alias to futex_time64 since it is widely used by
software which does not pass time arguments.
__res_send returns the full answer length even if it didn't fit the
buffer, but __dns_parse expects the length of the filled part of the
buffer.
This is analogous to commit 77327ed064,
which fixed the only other __dns_parse call site.
A child process created by posix_spawn reports errors to its parent via
a pipe, retrying infinitely on any write error to prevent falsely
reporting success. If the (original) parent dies before write is
attempted, there is nobody to report to, but the child will remain
stuck in the write loop forever if SIGPIPE is blocked or ignored.
Fix this by not retrying write if it fails with EPIPE.
user_regs_struct and user_fp_struct were missing from the initial
commit of the port.
the union type for elf_fpreg_t and the new value of ELF_NFPREG are
made consistent with glibc.
originally, compilers did not provide these macros and we had to
provide them ourselves. this meant we were redefining them, which was
technically invalid unless the token sequence of the original
definition matched exactly.
the original patch proposed by Jules Maselbas to fix this made the
definitions conditional on them not already being defined; however I
suggested using #undef to avoid any possibly-wrong definitions already
in place and ensure that the definitions are 1. the version adopted as
commit 8b70486807 made this change.
unfortunately, gcc is loud about not liking #undef of any __STDC_*
macro name, and while warnings are suppressed in the system include
path, there is apparently no way to suppress this warning if the
system include dir has also been provided via -I.
while normally we don't go out of our way to satisfy warnings over
style in the public headers, in this case, it seems to be a matter of
disagreement over contract of which part of "the implementation" is
entitled to define or undefine macros belonging to the implementation,
and it's quite reasonable to conclude that the compiler may reject
attempts to undefine them.
this commit reverts to the originally-submitted version of the patch
making the definitions conditional.
this code dates back to the original commit of the sh port, with no
real clue as to how the bug was introduced. it looks like it was
written to assume the return address was pushed to the stack like on
x86, rather than arriving in the pr special register.
commit 0dc4824479 worked around for lack
of flags argument in syscall for fchmodat.
linux 6.6 introduced a new syscall, SYS_fchmodat2, fixing this
deficiency. use it if any flags are passed, and fallback to the old
strategy on ENOSYS. continue using the old syscall when there are no
flags. this is the exact same strategy used when SYS_faccessat2 was used
to implement faccessat with flags.
the linux fchmodat syscall lacks a flag argument that is necessary to
implement the posix api, see
linux commit 09da082b07bbae1c11d9560c8502800039aebcea
fs: Add fchmodat2()
linux commit 78252deb023cf0879256fcfbafe37022c390762b
arch: Register fchmodat2, usually as syscall 452
see
linux commit cf264e1329fb0307e044f7675849f9f38b44c11a
cachestat: implement cachestat syscall
linux commit 946e697c69ffeeefdd84dad90eac307284df46be
cachestat: wire up cachestat for other architectures
see
linux commit c6018b4b254971863bd0ad36bb5e7d0fa0f0ddb0
mm/mempolicy: add set_mempolicy_home_node syscall
linux commit 21b084fdf2a49ca1634e8e360e9ab6f9ff0dee11
mm/mempolicy: wire up syscall set_mempolicy_home_node
see
linux commit 039c0ec9bb77446d7ada7f55f90af9299b28ca49
futex,x86: Wire up sys_futex_waitv()
linux commit ea7c45fde5aa3e761aaddb7902a31a95cb120e7b
futex,arm: Wire up sys_futex_waitv()
linux commit b3ff2881ba18b852f79f5476d7631940071f1adb
MIPS: syscalls: Wire up futex_waitv syscall
linux commit 6c122360cf2f4c5a856fcbd79b4485b7baec942a
s390: wire up sys_futex_waitv system call
linux commit a0eb2da92b715d0c97b96b09979689ea09faefe6
futex: Wireup futex_waitv syscall
see
linux commit 884a7e5964e06ed93c7771c0d7cf19c09a8946f1
mm: introduce process_mrelease system call
linux commit dce49103962840dd61423d7627748d6c558d58c5
mm: wire up syscall process_mrelease
see
linux commit 7bb7f2ac24a028b20fca466b9633847b289b156a
arch, mm: wire up memfd_secret system call where relevant
linux commit 1507f51255c9ff07d75909a84e7c0d7f3c4b2f49
mm: introduce memfd_secret system call to create "secret" memory areas
linux commit b633896314c0f78f2b4eb7b19a530d68f2a35445
tools headers UAPI: Sync s390 syscall table file that wires up the
memfd_secret syscall
this commit should make no codegen change for existing archs, but is a
prerequisite for new archs including riscv32. the wait4 emulation
backend provides both cancellable and non-cancellable variants because
waitpid is required to be a cancellation point, but all of our other
uses are not, and most of them cannot be.
based on patch by Stefan O'Rear.
commit f47a5d400b overlooked that
strtoul was responsible for setting p to a const-laundered copy of the
format string pointer f, even in the case where there was no number to
parse. by making the call conditional on isdigit, that copy was lost.
the logic here is a mess and should be cleaned up, but for now, this
seems to be the least invasive change that undoes the breakage.
commit f247462b08 incorrectly hid ppoll
in the presence of _GNU_SOURCE due to an oversight that defining
_BSD_SOURCE does not implicitly define _GNU_SOURCE. at present,
headers still have to explicitly check for each feature profile level;
this may be changed at some point in the future via features.h, but
has not been changed yet.
depending on contents of the LC_TIME locale, log messages could be
malformatted (especially if the ABMON strings contain non-alphabetic
characters) or the subsequent code could invoke undefined behavior,
via passing a timebuf[] with unspecified contents to snprintf, if
the translated ABMON string did not fit in the 16-byte timebuf.
this does not appear to be a security-relevant bug, as locale loading
functionality is intentionally not available to set*id programs -- the
MUSL_LOCPATH environment variable is ignored when libc.secure is true,
and custom locales are not loadable without it.