The instruction encoding that would be "br %r0" is not actually a
branch to r0, but instead a nop/memory-barrier. gcc 14 has been found
to choose r0 for the "r"(pc) constraint, breaking CRTJMP.
This patch adjusts the inline assembly constraints and marks "pc" as
address ("a"), which disallows usage of r0.
commit 8cca79a72c added use of SYS_pause
to exit() without accounting for newer archs omitting the syscall.
use the newly-added __sys_pause abstraction instead, which uses
SYS_ppoll when SYS_pause is missing.
newer archs lack the syscall. the pause() function accounted for this
with its own #ifdef, but that didn't allow use of the syscall directly
elsewhere, so move the logic to macros in src/internal/syscall.h where
it can be shared.
commit b817541f1c introduced statx with
a fallback using fstatat, but failed to fill in stx_rdev_major/minor
and stx_attributes[_mask]. the rdev omission has been addressed
separately. rather than explicitly zeroing the attributes and their
mask, pre-fill the entire structure with zeros. this will also cover
the padding adjacent to stx_mode, in case it's ever used in the
future.
explicit zeroing of stx_btime is removed since, with this change, it
will already be pre-zeroed. as an aside, zeroing it was not strictly
necessary, since STATX_BASIC_STATS does not include STATX_BTIME and
thus does not indicate any validity for it.
The current implementation of the statx function fails to set the
values of stx->stx_rdev_major and stx->stx_rdev_minor if the statx
syscall fails with ENOSYS and thus the statx function has to fall back
on fstatat-based emulation.
the value placed in the aux vector AT_MINSIGSTKSZ by the kernel is
purely the signal frame size, and does not include any execution space
for the signal handler. this is contrary to the POSIX definition of
MINSIGSTKSZ to be a value that can actually execute at least some
minimal signal handler, and contrary to the historical definitions of
MINSIGSTKSZ which had at least 1k of headroom.
commit 996b6154b2 added support for
querying the dynamic limit but did not enforce it in sigaltstack. the
kernel also does not seem to reliably enforce it, or at least does not
necessarily enforce the same limit exposed to userspace, so it needs
to be enforced here.
internally, printf always works with the maximal-size supported
integer and floating point formats. however, the space needed to
format a floating point number is proportional to the mantissa and
exponent ranges. on archs where long double is larger than double,
knowing that the actual value fit in double allows us to use a much
smaller buffer, roughly 1/16 the size.
as a bonus, making the working buffer a VLA whose dimension depends on
the format specifier prevents the compiler from lifting the stack
adjustment to the top of printf_core. this makes it so printf calls
without floating point arguments do not waste even the smaller amount
of stack space needed for double, making it much more practical to use
printf in tightly stack-constrained environments.
linux puts hung-up ttys in a state where ioctls produce EIO, and may
do the same for other types of devices in error or shutdown states.
such an error clearly does not mean the device is not a tty, but it
also can't reliably establish that the device is a tty, so the only
safe thing to do seems to be reporting the error. programs that don't
check errno will conclude that the device is not a tty, which is no
different from what happens now, but at least they gain the option to
differentiate between the cases.
commit c84971995b introduced the errno
collapsing behavior, but prior to that, errno was not set at all by
isatty.
this is purely a readability change, not a functional one. all of the
integer format cases use a common tail for handling precision logic
after the string representation of the number has been generated. the
code as I originally wrote it was overly clever in the aim of making a
point that the flow could be done without goto, and jumped over
intervening cases by wrapping them in if (0) { }, with the case labels
for each inside the conditional block scope.
this has been a perpetual source of complaints about the readability
and comprehensibility of the file, so I am now changing it to
explicitly jump to the tail logic with goto statements.
this is the same as cp850, but with the euro symbol replacing the
lowercase dotless i at 0xd5. it is significant because it's used by
thermal receipt printers.
the comment does not match the required or actual behavior when x<0
and y is not an integer. while it could be corrected, the role of
comments here is to tell about characteristics unique to the
implementation, not to restate the requirements of the standard, so
just removing it seems best.
while not the only error codes presently omitted, these two are
particularly likely to be encountered in the wild.
EUCLEAN is used by linux filesystem and device drivers to report
filesystem structure corruption or data corruption.
ENAVAIL is used by some linux drivers to indicate non-availability of
a resource.
both names are new inventions to correspond to how they are actually
used, as the original kernel strings ("Structure needs cleaning" and
"No XENIX semaphores available") are not remotely meaningful or
reasonable.
the file-level crt_arch.h asm fragments generally make direct
(non-PLT) calls from _start to _start_c, which is only valid when
there is a local, non-interposable definition for _start_c. generally,
the linker is expected to know that local definitions in a main
executable (as opposed to shared library) output are non-interposable,
making this work, but historically there have been linker bugs in this
area, and microblaze is reportedly still broken, flagging the
relocation for the call as a textrel.
the equivalent _dlstart_c, called from the same crt_arch.h asm
fragments, has always used hidden visibility without problem, and
semantically it should be hidden, so make it hidden. this ensures the
direct call is always valid regardless of whether the linker properly
special-cases main executable output.
if sem_post is interrupted between clearing the waiters bit from the
semaphore value and performing the futex wait operation, subsequent
calls to sem_post will not perform a wake operation unless a new
waiter has arrived.
usually, this is at most a minor nuisance, since the original wake
operation will eventually happen. however, it's possible that the wake
is delayed indefinitely if interrupted by a signal handler, or that
the address the wake needs to be performed on is no longer mapped if
the semaphore was a process-shared one that has since been unmapped
but has a waiter on a different mapping of the same semaphore. this
can happen when another thread using the same mapping "steals the
post" atomically before actually becoming a second waiter, deduces
from success that it was the last user of the semaphore mapping, then
re-posts and unmaps the semaphore mapping. this scenario was described
in a report by Markus Wichmann.
instead of checking only the waiters bit, also check the waiter count
that was sampled before the atomic post operation, and perform the
wake if it's nonzero. this will not produce any additional wakes under
non-race conditions, since the waiters bit only becomes zero when
targeting a single waiter for wake. checking both was already the
behavior prior to commit 159d1f6c02.
our pthread barrier implementation reportedly has bugs that are could
lead to malfunction or crash in timer_create. while this has not been
reviewed to confirm, there have been past reports of pthread barrier
bugs, and it seems likely that something is actually wrong.
pthread barriers are an obscure primitive, and timer_create is the
only place we are using them internally at present. even if they were
working correctly, this means we are imposing linking of otherwise
likely-dead code whenever timer_create is used.
a pair of semaphores functions identically to a 2-waiter barrier
except for destruction order properties. since the parent is
responsible for the argument structure (including semaphores)
lifetimes, the last operation on them in the timer thread must be
posting to the parent.
previously, global dtors, which are executed after all atexit handlers
have been called rather than being implemented as an atexit handler
themselves, would deadlock if they called atexit.
it was intentional to disallow adding more atexit handlers past the
last point where they would be executed, since a successful return
from atexit imposes a contract that the handler will be executed, but
this was only considered in the context of calls to atexit from other
threads, not calls from the dtors.
to fix this, release the lock after the exit handlers loop completes,
but but set a flag first so that we can make all future calls to
atexit return a failure code.
per the C and POSIX standards, calling exit "more than once",
including via return from main, produces undefined behavior. this
language predates threads, and at the time it was written, could only
have applied to recursive calls to exit via atexit handlers. C++
likewise makes calls to exit from global dtors undefined. nonetheless,
by the present specification as written, concurrent calls to exit by
multiple threads also have undefined behavior.
originally, our implementation of exit did have locking to handle
concurrent calls safely, but that was changed in commit
2e55da9118 based on it being undefined.
from a standpoint of both hardening and quality of implementation,
that change seems to have been a mistake.
this change adds back locking, but with awareness of the lock owner so
that recursive calls to exit can be trapped rather than deadlocking.
this also opens up the possibility of allowing recursive calls to
succeed, if future consensus ends up being in favor of that.
prior to this change, exit already behaved partly as if protected by a
lock as long as atexit was linked, but multiple threads calling exit
could concurrently "pop off" atexit handlers and execute them in
parallel with one another rather than serialized in the reverse order
of registration. this was a likely unnoticed but potentially very
dangerous manifestation of the undefined behavior. if on the other
hand atexit was not linked, multiple threads calling exit concurrently
could each run their own instance of global dtors, if any, likely
producing double-free situations.
now, if multiple threads call exit concurrently, all but the first
will permanently block (in SYS_pause) until the process terminates,
and all atexit handlers, global dtors, and stdio flushing/position
consistency will be handled in the thread that arrived first. this is
really the only reasonable way to define concurrent calls to exit. it
is not recommended usage, but may become so in the future if there is
consensus/standardization, as there is a push from the rust language
community (and potentially other languages interoperating with the C
runtime) to make concurrent calls to the language's exit interfaces
safe even when multiple languages are involved in a program, and this
is only possible by having the locking in the underlying C exit.
commit 895736d49b made these changes
along with fixing a real bug in LOG_MAKEPRI. based on further
information, they do not seem to be well-motivated or in line with
policy.
the result of LOG_FAC is not a meaningful facility value if we shift
it down like before, but apparently the way it is used by applications
is as an index into an array of facility names. moreover, all
historical systems which define it do so with the shift. as it is a
nonstandard interface, there is no justification for providing a macro
by the same name that is incompatible with historical practice.
the value of LOG_FACMASK likewise is 0x3f8 on all historical systems
checked. while only 5 bits are used for existing facility codes, the
convention seems to be that all 7 bits belong to the facility field
and theoretically could be used to expand to having more facilities.
that seems unlikely to happen, but there is no reason to make a
gratuitously incompatible change here.
Per RFC 5952, ties for longest sequence of zero fields must be broken
by choosing the earliest, but the implementation put the leading
sequence of zeros at a disadvantage. That's because for example when
compressing "0:0:0:10:0:0:0:10" the strspn(buf+i, ":0") call returns 6
for the first sequence and 7 for the second one – the second sequence
has the benefit of a leading colon.
Changing the condition to require beating the leading sequence by not
one but two characters resolves the issue.
while commit 53ac44ff4c fixed the temp
buffer being undersized, the use of a temp buffer to begin with was a
mistake. instead, compare the requested symbol name in-place and use
the already-null-terminated copy of the name without "64" present in
lfs64_list[] to look up the real symbol.
add two ioctls to get and set struct epoll_params to allow users to
control epoll based busy polling of network sockets.
added to uapi in commit 18e2bf0edf4dd88d9656ec92395aa47392e85b61 (Linux
kernel 6.9 and newer).
this interface does not have a lot of historical consensus on how it
handles the contents of the /etc/shells file in regard to whitespace
and comments, but the commonality between all checked is that they
ignore lines that are blank or that begin with '#', so that is the
behavior we adopt.
these are nonstandard and unnecessary for using the associated
functionality, but resulted in applications that used them
malfunctioning.
patch based on proposed fix by erny hombre.
This syscall is available since Linux 3.15 and also implemented in
glibc from version 2.28. It is commonly used in filesystem or security
contexts.
Constants RENAME_NOREPLACE, RENAME_EXCHANGE, RENAME_WHITEOUT are
guarded by _GNU_SOURCE as with glibc.
commit 1b0d48517f wrongly copied the
getdents return type of int rather than matching the ssize_t used by
posix_getdents. this was overlooked in testing on 32-bit archs but
obviously broke 64-bit archs.
without explicit alignment directives, whether they end up at the
necessary alignment depends on linker/linking conditions. initially
reported as mold issue 1255.
this interface was added as the outcome of Austin Group tracker issue
697. no error is specified for unsupported flags, which is probably an
oversight. for now, EOPNOTSUPP is used so as not to overload EINVAL.
the bits file is retained, but as a single generic version, to allow
for the unlikely future possibility of letting a new arch define
something differently.
previously, only a few archs defined it here. this change makes the
presence consistent across all archs, and reduces the amount of header
duplication (and potential for future inconsistency) between archs.
this change is purely to document that they are the same in
preparation to remove the arch-specific headers for these archs and
replace them with a generic version that matches riscv32 and can be
shared by these and all future archs.
commit f47a8cdd25 introduced an
alternate mechanism for access to runtime page size for compatibility
with early stages of dynamic linking, but because pthread_impl.h
indirectly includes libc.h, the condition #ifndef PAGE_SIZE was never
satisfied.
rather than depend on order of inclusion, use the (baseline POSIX)
macro PAGESIZE, not the (XSI) macro PAGE_SIZE, to determine whether
page size is dynamic. our internal libc.h only provides a dynamic
definition for PAGE_SIZE, not for PAGESIZE.
the %s conversion is added as the outcome of Austin Group tracker
issue 169 and its unspecified behavior is clarified as the outcome of
issue 1727.
the %F, %g, %G, %u, %V, %z, and %Z conversions are added as the
outcome of Austin Group tracker issue 879 for alignment with strftime
and the behaviors of %u, %z, and %Z are defined as the outcome of
issue 1727.
at this time, the conversions with unspecified effects on struct tm
are all left as parse-only no-ops. this may be changed at a later
time, particularly for %s, if there is reasonable cross-implementation
consensus outside the standards process on what the behavior should
be.
once the remaining value is less than 10, the modulo operation to
produce the final digit and division to prepare for next loop
iteration can be dropped. this may be a meaningful performance
distinction when formatting low-magnitude numbers in bulk, and should
never hurt.
based on patch by Viktor Reznov.
historically linux limited the number of supplementary groups a
process could be in to 32, but this limit was raised to 65536 in linux
2.6.4. proposals to support the new limit, change NGROUPS_MAX, or make
it dynamic have been stalled due to the impact it would have on
initgroups where the groups array exists in automatic storage.
the changes here decouple initgroups from the value of NGROUPS_MAX and
allow it to fall back to allocating a buffer in the case where
getgrouplist indicates the user has more supplementary groups than
could be reported in the buffer. getgrouplist already involves
allocation, so this does not pull in any new link dependency.
likewise, getgrouplist is already using the public malloc (vs internal
libc one), so initgroups does the same. if this turns out not to be
the best choice, both can be changed together later.
the initial buffer size is left at 32, but now as the literal value,
so that any potential future change to NGROUPS_MAX will not affect
initgroups.