Commit Graph

4932 Commits

Author SHA1 Message Date
Gabriel Ravier
4724793f96 fix wide printf numbered argument buffer overflow
The nl_type and nl_arg arrays defined in vfwprintf may be accessed
with an index up to and including NL_ARGMAX, but they only have
NL_ARGMAX elements, so reads and writes may go one element past the
end.
2023-04-14 11:19:33 -04:00
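The off-by-one comes from positional argument numbers such as `%1$d` being 1-based, so an array indexed directly by position needs `NL_ARGMAX + 1` elements. A minimal sketch, assuming a hypothetical `record_position` helper and an illustrative fallback value of 9 if `<limits.h>` does not expose `NL_ARGMAX`:

```c
#include <limits.h>

#ifndef NL_ARGMAX
#define NL_ARGMAX 9 /* assumed fallback for illustration only */
#endif

/* Indexed by 1-based position: slot 0 is unused and slot NL_ARGMAX must
   exist, so NL_ARGMAX + 1 elements are needed in total. */
static int nl_type[NL_ARGMAX + 1];

/* Hypothetical helper: remember the type used for argument number pos. */
int record_position(int pos, int type)
{
    if (pos < 1 || pos > NL_ARGMAX)
        return -1; /* not a valid positional argument number */
    nl_type[pos] = type;
    return 0;
}
```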
Alexey Izbyshev
c1b42c4a3a wait4: fix missing rusage on x32 due to wrong success condition
Resource usage data is filled by the kernel only when wait4 returns
a pid, i.e. a positive value.

Commit 5850546e96 introduced this bug,
possibly because of copy-pasting from getrusage.
2023-04-11 09:23:44 -04:00
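The two success conventions differ: getrusage returns 0 on success, while wait4 returns the reaped child's pid. A sketch of the correct test, using a hypothetical `reap_with_usage` wrapper around the real wait4 call with WNOHANG (which can legitimately return 0 when no child has changed state):

```c
#define _DEFAULT_SOURCE
#include <sys/types.h>
#include <sys/resource.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical wrapper. Returns 1 if the child was reaped and *ru holds its
   resource usage, 0 if it has not changed state yet, -1 on error.
   Only a positive return (a pid) means the kernel filled *ru. */
int reap_with_usage(pid_t pid, int *status, struct rusage *ru)
{
    pid_t r = wait4(pid, status, WNOHANG, ru);
    if (r > 0)
        return 1;
    return r == 0 ? 0 : -1;
}
```

A caller would typically fork a child, then poll `reap_with_usage` until it returns nonzero.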
Alexey Izbyshev
9b12982d52 semtimedop: fix timespec kernel ABI mismatch for 32-bit timeouts on x32
For time64 support, musl normally defines SYS_foo to the time32 variant
of that syscall on arches that have it, and to the time64 variant
otherwise, so that "SYS_foo == SYS_foo_time64" implies that the arch is
time64-only. However, SYS_semtimedop is an odd case: some arches define
only SYS_semtimedop_time64, yet they are not time64-only, because the
time32 variant is provided via SYS_ipc instead. For such arches,
defining SYS_semtimedop to SYS_semtimedop_time64 would break the
implication above, so commit 4bbd7baea7
doesn't do this. Commit eb2e298cdc
attempts to detect time64-only arches by checking that both
SYS_semtimedop and SYS_ipc are undefined, but this doesn't work for
x32, because it's a time64-only arch that does define SYS_semtimedop.
As a result, 32-bit timeouts trigger the fallback path that passes
a 32-bit timespec to the kernel while it expects a 64-bit one, so
the effective tv_sec is formed by interpreting 32-bit tv_sec and
tv_nsec as a single long long, and the effective tv_nsec is whatever
is located in the next 64 bits of the stack.

Fix this by expanding the time64-only check to include arches where
SYS_semtimedop is the time64 variant of the syscall.
2023-04-11 09:21:41 -04:00
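To see why the mismatch yields a nonsensical timeout, the sketch below copies a 32-bit `{tv_sec, tv_nsec}` pair into a 64-bit integer, roughly what a kernel expecting a 64-bit tv_sec reads from the first 8 bytes. `struct ts32` and `misread_tv_sec` are hypothetical names, and the resulting value depends on byte order.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the 32-bit timespec passed by the fallback path. */
struct ts32 {
    int32_t tv_sec;
    int32_t tv_nsec;
};

/* What a kernel expecting a 64-bit tv_sec would see in the first 8 bytes. */
int64_t misread_tv_sec(const struct ts32 *t)
{
    int64_t v;
    memcpy(&v, t, sizeof v); /* struct ts32 is exactly 8 bytes */
    return v;
}
```

On a little-endian arch such as x32, tv_nsec ends up in the high half, giving an enormous tv_sec.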
Alexey Izbyshev
6d322159c6 getopt: fix null pointer arithmetic ub
When an option that requires an argument is the last character of
argv[argc-1], getopt computes argv[argc] + optpos. While optpos
is always zero in this case, adding it to a null pointer is still
undefined.
2023-04-11 09:18:38 -04:00
Alexey Izbyshev
35e9831156 nftw: fix use of uninitialized struct stat
If lstat/stat fails with EACCES, st is left uninitialized, but its
st_dev/st_ino fields are then used in several places:

* for FTW_MOUNT check (in practice typically results in a false
  positive and an early return)
* for copying to the new struct history (though the struct is not used
  afterwards since we don't recurse in this case)
* for cycle detection check (could theoretically result in a false
  positive and an early return)

To avoid adding FTW_NS checks to all these places, fix this by
zero-initializing st_dev/st_ino (which can never match an existing
dentry due to zero inode being reserved in Linux), and check for FTW_NS
only when handling FTW_MOUNT since we need two valid dentries there.
2023-04-11 09:18:01 -04:00
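A sketch of the idea, assuming a hypothetical `lstat_or_zero` helper: where the fix zero-initializes st_dev/st_ino up front, this version zeroes them when lstat fails, which has the same effect on the later comparisons, because a zero inode can never match a real directory in the mount or cycle checks.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/stat.h>

/* Hypothetical helper: on failure, give the entry a dev/ino identity of 0,
   which cannot collide with an existing file on Linux. */
int lstat_or_zero(const char *path, struct stat *st)
{
    if (lstat(path, st) < 0) {
        st->st_dev = 0;
        st->st_ino = 0;
        return -1;
    }
    return 0;
}

/* Cycle-detection style comparison of two entries. */
int same_file(const struct stat *a, const struct stat *b)
{
    return a->st_dev == b->st_dev && a->st_ino == b->st_ino;
}
```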
Rich Felker
7c41047285 fix inadvertently static local var in dynlink get_lfs64
commit 246f1c8114 inadvertently
introduced the local variable p as static by declaring it together
with lfs64_list. the function is only reachable under lock, and is not
called reentrantly, so this is not a functional bug, but it is
confusing and inefficient. fix by separating the declarations.
2023-04-11 09:06:27 -04:00
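The underlying C rule is that a storage-class specifier applies to every declarator in a declaration. A small illustration with hypothetical function names (not the actual get_lfs64 code):

```c
/* "static" covers both declarators, so p keeps its value between calls. */
int shared_counter(void)
{
    static int calls, p;
    calls++;
    return ++p;
}

/* Separate declarations: only calls is static, p is automatic. */
int separate_counter(void)
{
    static int calls;
    int p = 0; /* fresh on every call */
    calls++;
    return ++p;
}
```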
Alexey Kodanev
77327ed064 dns: check length field in tcp response message
The received length field in the message may be greater than the
size of the 'answer' buffer in which the message resides. Currently,
ABUF_SIZE is 768, and a larger 'alens[i]' results in an out-of-bounds
read in __dns_parse().

To fix this, limit the length to the size of the received buffer.
2023-04-07 20:44:20 -04:00
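DNS over TCP prefixes each message with a 2-byte big-endian length. A minimal sketch of the clamping, using a hypothetical `dns_tcp_msg_len` helper and the 768-byte buffer size mentioned above:

```c
#include <stddef.h>

/* Hypothetical helper: decode the length prefix and clamp it to the buffer
   size, so later parsing cannot read past the end of the buffer. */
size_t dns_tcp_msg_len(const unsigned char prefix[2], size_t bufsize)
{
    size_t len = (size_t)prefix[0] << 8 | prefix[1];
    return len > bufsize ? bufsize : len;
}
```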
Rich Felker
1d5750b95c fix swprintf handling of nul character in output
the buffer-flush function did not account for mbtowc returning 0
rather than 1 when converting the nul character. this prevented
advancing past it, instead repeatedly converting it into the output
wide character string until the max output length was exhausted.
2023-03-22 12:56:46 -04:00
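The pitfall is that mbtowc reports 0, not 1, for the nul byte, even though that byte was consumed. A sketch of a conversion loop that handles this, with the hypothetical name `bytes_to_wide`:

```c
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical flush loop: convert inlen bytes into wide characters. */
size_t bytes_to_wide(wchar_t *out, size_t outmax, const char *in, size_t inlen)
{
    size_t n = 0;
    while (inlen > 0 && n < outmax) {
        int k = mbtowc(out + n, in, inlen);
        if (k < 0)
            return (size_t)-1; /* invalid multibyte sequence */
        if (k == 0)
            k = 1;             /* nul converted, but it still occupies a byte */
        in += k;
        inlen -= (size_t)k;
        n++;
    }
    return n;
}
```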
Rich Felker
0e5234807d in printf, use ferror macro rather than directly inspecting flags bit
this is purely aesthetic and should not affect code generation or
functionality.
2023-03-21 09:11:17 -04:00
Rich Felker
868c964300 remove wide printf dependency on ugly hack in vfprintf
commit d42269d7c8 appropriated the
stream error flag temporarily to let the printf family of functions
suppress further output attempts after encountering a write error.
since the wide printf code relies on (narrow) vfprintf to print
padding and numeric conversions, a hack was put in vfprintf not to
clear the initial error status unless the stream is narrow oriented.
this was okay, because calling vfprintf on a wide-oriented stream
(outside of internal use by the implementation) produces undefined
behavior. however, it was highly non-obvious to anyone reading the
wide printf code, where the calls to fprintf without first checking
for error status appeared erroneous.

this patch removes all direct use of fprintf from the wide printf
core, except in the numeric conversions case where it was already
checked before starting processing of the directive that the error
status is not set. the other calls, which were performing padding, are
replaced by a new pad() helper function, which performs the check and
abstracts out the mechanism of writing the padding.

direct use of the error flag is also replaced by ferror, which is
defined as a macro in stdio_impl.h, expanding directly to the flag
check with no call or locking overhead.
2023-03-21 09:11:17 -04:00
Rich Felker
3a051769c4 fix (normal, narrow) printf erroneously processing %n after output errors
unlike with wide printf variants, encoding errors are not a vector by
which this bug is reachable, and the out() helper function already
ensured that no further output could be written after an output error,
transient or otherwise. however, the %n specifier could still be
processed after an error, yielding a side effect that wrongly implied
output had succeeded.

due to buffering effects, it's still possible for %n to show output as
having "succeeded", but for it never to appear on the underlying file
due to an error at flush time. this change, however, ensures that
processing of %n does not conflict with any error which has already
been seen.
2023-03-21 09:11:17 -04:00
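The common idea behind these printf fixes, checking error status before starting each new directive so that %n cannot report output that never happened, can be modeled without a real FILE. `struct sink`, `out` and `mini_format` below are hypothetical stand-ins supporting only %s and %n, not musl's printf_core.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical bounded output sink with a sticky error flag. */
struct sink {
    char *buf;
    size_t len, cap;
    int err;
};

/* Like the out() helper: refuses to write once an error is recorded. */
static void out(struct sink *s, const char *p, size_t n)
{
    if (s->err)
        return;
    if (n > s->cap - s->len) {
        s->err = 1; /* model a write error */
        return;
    }
    memcpy(s->buf + s->len, p, n);
    s->len += n;
}

/* Supports only %s (one string argument) and %n. */
int mini_format(struct sink *s, const char *fmt, const char *arg, int *count)
{
    for (; *fmt; fmt++) {
        if (s->err)
            return -1; /* stop before any new directive, including %n */
        if (*fmt != '%') {
            out(s, fmt, 1);
            continue;
        }
        fmt++;
        if (*fmt == 's')
            out(s, arg, strlen(arg));
        else if (*fmt == 'n')
            *count = (int)s->len;
        else
            return -1; /* unsupported directive */
    }
    return s->err ? -1 : (int)s->len;
}
```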
Rich Felker
0440ed69ea fix wide printf continuation after output or encoding errors
this fixes a broader bug for which a special case was reported by
Bruno Haible, in the form of %n getting processed (and reporting the
number of wide characters which would have been written, but weren't)
after an encoding error (EILSEQ). in addition to the %n case, some but
not all of the format specifiers continued to attempt output after an
error. in particular, %c, %lc, and %s all used fputwc directly without
any check for error status.

as long as the error condition was permanent rather than transient,
these write attempts had no visible side effects, but in theory they
could be visible, for example with EAGAIN/EWOULDBLOCK or ENOSPC, if
the condition precluding output came to an end. this could produce
output with missing non-final data, rather than just truncated output,
albeit with the function still returning -1 as expected to report an
error.

to fix this, a check is added to stop processing of any new directive
(including %n) if the stream is already in error state, and direct use
of fputwc is replaced with calls to the out() helper function, which
checks for error status.

note that fprintf is also used directly without checking error status,
but due to how commit d42269d7c8
previously attempted to solve the issue of output after error, the
call to fprintf does not attempt to write anything when the
wide-oriented stream is already in error state. this is non-obvious,
and is quite a hack, so it should be changed, but I've left it alone
for now to make the bug fix commit itself as non-invasive as possible.
2023-03-21 09:10:11 -04:00
Rich Felker
d055e6a45a fix wide printf forms ignoring width for %lc format specifier
since the code path for %c was already doing it right, and the logic
is identical, condense them into a single case.
2023-03-20 13:48:50 -04:00
Rich Felker
b6811019e6 poll: fix misuse of timespec type on 32-bit archs without poll syscall
this function was overlooked during the time64 transition, probably as
a result of not having any time-related types in its application-side
interface. however, for archs that lack the traditional poll syscall
and have only ppoll, it used timespec as part of its interface with
the kernel: the millisecond timeout was converted to a timespec to
pass to SYS_ppoll. this is a type/ABI mismatch on 32-bit archs with
legacy time32 syscalls.

only one supported arch, or1k, is affected. all of the others either
have SYS_poll, or are 64-bit.

rather than using timespec, define a type locally to match what the
kernel expects. the condition (SYS_ppoll_time64 == SYS_ppoll),
comparable to conditions used elsewhere in timespec-handling code,
evaluates true for "natively time64" 32-bit archs including x32,
future riscv32, and all future 32-bit archs (via definitions in
internal syscall.h). otherwise, the arch is either 64-bit or has
syscalls that take the legacy type, and in either case "long" is
correct.

this fix is based on bug report and proposal by Alexey Izbyshev but
with a different approach to the changes to minimize the contextual
knowledge needed for a reader to understand the source file.
2023-03-03 09:52:52 -05:00
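A sketch of converting poll's millisecond timeout into a locally defined kernel timespec type. `struct kernel_ts` and `ms_to_kernel_ts` are hypothetical; the sketch fixes tv_sec as long long purely for illustration, whereas the commit chooses long long or long per arch.

```c
/* Hypothetical stand-in for the kernel's timespec layout. */
struct kernel_ts {
    long long tv_sec;
    long tv_nsec;
};

/* Assumes ms >= 0; a negative poll timeout means "wait forever" and would
   be passed to ppoll as a null pointer instead. */
struct kernel_ts ms_to_kernel_ts(int ms)
{
    struct kernel_ts ts = { ms / 1000, ms % 1000 * 1000000L };
    return ts;
}
```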
Alexey Izbyshev
8949da7ab1 select: fix 64-bit timeout truncation on pre-time64 kernels
If the (normalized) timeout passed to select exceeds INT_MAX seconds on
an arch with SYS_pselect6_time64 and the kernel is too old to support
time64 syscalls, the timeout is implicitly converted to (32-bit) long on
the fallback path, losing its upper 32 bits and potentially becoming a
small positive value, violating the intended semantics, or even
a negative value, causing the fallback syscall to fail. Fix this by
saturating the timeout at INT_MAX as done in other time64 fallback
cases.
2023-03-02 20:00:45 -05:00
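The difference between truncation and saturation can be shown with two hypothetical helpers. Truncation is modeled with an explicit cast through uint32_t; saturation clamps at INT_MAX as the commit describes.

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical model of the implicit conversion to a 32-bit long:
   only the low 32 bits survive. */
int32_t truncated_sec(long long s)
{
    return (int32_t)(uint32_t)s;
}

/* The fix: clamp to INT_MAX before using a time32 syscall. */
long long saturated_sec(long long s)
{
    return s > INT_MAX ? INT_MAX : s;
}
```

For example, a timeout of 0x100000005 seconds truncates to just 5 seconds, but saturates to INT_MAX.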
Rich Felker
3281047cfc dup3: don't set FD_CLOEXEC on failure on kernels without dup3 syscall
this is the best-effort fallback path for kernels that can't actually
support the dup3 functionality. it was setting the FD_CLOEXEC flag on the
target fd (new) even if the dup2 operation failed. normally that
shouldn't happen under correct usage, but it's possible if the source
fd is not open or intentionally invalid (e.g. -1).
2023-02-28 15:44:46 -05:00
Rich Felker
c99b7daafd fix dup3 ignoring all flags but O_CLOEXEC on archs with SYS_dup2 syscall
our dup3 code wrongly skipped directly to making the SYS_dup2 syscall
whenever the O_CLOEXEC bit of flags was not set. this is incorrect if
any new flags are ever added, as it would silently ignore them rather
than failing with an error.

archs which lack SYS_dup2 were unaffected.

adjust the logic so that SYS_dup3 is attempted whenever flags is
nonzero, and explicitly fail with EINVAL if SYS_dup3 is unavailable
and there are any unknown flags.
2023-02-28 12:21:23 -05:00
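Both dup3 fixes above can be seen in a userspace model of the dup2-based fallback. `dup3_fallback` is a hypothetical name; it rejects unknown flags with EINVAL and leaves the target fd alone when dup2 fails. Like the real fallback, it is best-effort: another thread can fork and exec between dup2 and fcntl.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical model of a dup3 fallback built on dup2 + fcntl. */
int dup3_fallback(int oldfd, int newfd, int flags)
{
    if (oldfd == newfd || (flags & ~O_CLOEXEC)) {
        errno = EINVAL; /* reject unknown flags instead of ignoring them */
        return -1;
    }
    int r = dup2(oldfd, newfd);
    if (r < 0)
        return r; /* do not touch newfd's flags on failure */
    if (flags & O_CLOEXEC)
        fcntl(newfd, F_SETFD, FD_CLOEXEC);
    return r;
}
```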
Rich Felker
fb7fb5e4bd fix pipe2 silently ignoring unknown flags on old kernels
kernels using the fallback have an inherent close-on-exec race
condition and as such support for them is only best-effort anyway.
however, ignoring potential new flags is still very bad behavior.
instead, fail with EINVAL.
2023-02-28 12:18:43 -05:00
Alexey Izbyshev
b1dfb734a4 getservbyport_r: fix wrong result if getnameinfo fails with EAI_OVERFLOW
EAI_OVERFLOW should be propagated as ERANGE to inform the caller about
the need to expand the buffer.
2023-02-28 12:01:34 -05:00
Alexey Izbyshev
595416b11d getservbyport_r: fix out-of-bounds buffer read
If the buffer passed to getservbyport_r is just enough to store two
pointers after aligning it, getnameinfo is called with buflen == 0
(which means that the service name is not needed) and trivially succeeds.
Then, strtol is called on the address just past the buffer end, and
if it doesn't happen to find the port number there, getservbyport_r
spuriously succeeds and returns the same bad address to the caller.

Fix this by ensuring that buflen is at least 1 when passed to
getnameinfo.
2023-02-28 12:00:55 -05:00
Alexey Izbyshev
1a708ece1a getifaddrs: fix UB via taking address of null pointer union dereference
getifaddrs computes &ctx->first->ifa even if ctx->first is NULL. While
this shouldn't be possible on the success path because the loopback
interface is hardcoded into the kernel, this is still possible on the
error path (for example, if __rtnetlink_enumerate couldn't create a
socket due to exceeding the fd limit).
2023-02-28 11:59:53 -05:00
Alexey Izbyshev
c499c1084e accept4: don't fall back to accept if we got unknown flags
accept4 emulation via accept ignores unknown flags, so it can spuriously
succeed instead of failing (or succeed without doing the action implied
by an unknown flag if it's added in a future kernel). Worse, unknown
flags trigger the fallback code even on modern kernels if the real
accept4 syscall returns EINVAL, because this is indistinguishable from
socketcall returning EINVAL due to lack of accept4 support.

Fix this by always failing with EINVAL if unknown flags are present and
the syscall is missing or failed with EINVAL.
2023-02-28 11:48:05 -05:00
Alexey Izbyshev
523d9b965d fix potential read past end of buffer in getnameinfo host name lookup
This is completely analogous to commit 633183b5d1.

Similar code called from __lookup_name is not affected because it checks
that the line contains the host name surrounded by blanks.
2023-02-27 10:04:34 -05:00
Alexey Izbyshev
d0b7f9768d dns: fix workaround for systems defaulting to ipv6-only sockets
When IPv6 nameservers are present, __res_msend_rc attempts to disable
IPV6_V6ONLY socket option to ensure that it can communicate with IPv4
nameservers (if they are present too) via IPv4-mapped IPv6 addresses.
However, this option can't be disabled on bound sockets, so setsockopt
always fails.
2023-02-27 10:03:56 -05:00
Alexey Izbyshev
bec42ef393 dns: handle early eof in tcp fallback
A zero returned from recvmsg is currently treated as if some data were
received, so if a DNS server closes its TCP socket before sending the
full answer, __res_msend_rc will spin until the timeout elapses because a
POLLIN event will be reported on each poll. Fix this by treating an
early EOF as an error.
2023-02-27 10:03:34 -05:00
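A sketch of a receive loop that treats a zero return as early EOF rather than as progress. `recv_full` is a hypothetical helper, not the code in __res_msend_rc, which uses recvmsg and poll.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: read exactly len bytes from a stream socket. */
ssize_t recv_full(int fd, void *buf, size_t len)
{
    size_t got = 0;
    while (got < len) {
        ssize_t r = recv(fd, (char *)buf + got, len - got, 0);
        if (r < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (r == 0)
            return -1; /* peer closed early: the answer is incomplete */
        got += (size_t)r;
    }
    return (ssize_t)got;
}
```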
Alexey Izbyshev
9b132e5567 prevent CNAME/PTR parsing from reading data past the response end
DNS parsing callbacks pass the response buffer end instead of the actual
response end to dn_expand, so a malformed DNS response can use message
compression to make dn_expand jump past the response end and attempt to
parse uninitialized parts of that buffer, which might succeed and return
garbage.
2023-02-27 10:03:06 -05:00
Alexey Izbyshev
12590c8bbd fix out-of-bounds reads in __dns_parse
There are several issues with range checks in this function:

* The question section parsing loop can read up to two out-of-bounds
  bytes before doing the range check and bailing out.

* The answer section parsing loop, in addition to the same issue as
  above, uses the wrong length in the range check that doesn't prevent
  OOB reads when computing len later.

* The len range check before calling the callback is off by 10. Also,
  p+len can overflow in a (probably theoretical) case when p is within
2^16 of UINTPTR_MAX.

Because __dns_parse is used only with stack-allocated buffers, such
small overreads can't result in a segfault. The first two also don't
affect the function result, but the last one may result in getaddrinfo
incorrectly succeeding and returning up to 10 bytes past the
response buffer as a part of the IP address, and in (canon) name
returned by getaddrinfo/getnameinfo being affected by memory past the
response buffer (because dn_expand might interpret it as a pointer).
2023-02-27 10:01:29 -05:00
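The p+len overflow concern is usually avoided by comparing the length against the remaining byte count instead of forming p + len. A minimal sketch with a hypothetical `has_room` helper:

```c
#include <stddef.h>

/* Hypothetical check that len bytes starting at p fit before end.
   No pointer past the buffer is ever formed, so nothing can wrap. */
int has_room(const unsigned char *p, const unsigned char *end, size_t len)
{
    return p <= end && len <= (size_t)(end - p);
}
```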
Rich Felker
bc695a5ac1 fix incorrect unit for CPU_SETSIZE macro
this macro is supposed to reflect the number of members (bits) in
cpu_set_t, not the storage size (bytes).
2023-02-23 10:10:44 -05:00
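Because CPU_SETSIZE counts CPUs (bits), it is the natural loop bound for CPU_ISSET indices. A sketch with a hypothetical `count_cpus_in_set` function, using the glibc/musl sched.h CPU_* macros:

```c
#define _GNU_SOURCE
#include <sched.h>

/* Count CPUs present in a set; CPU_SETSIZE is the number of bit
   positions, while sizeof(cpu_set_t) is in bytes. */
int count_cpus_in_set(const cpu_set_t *set)
{
    int n = 0;
    for (int i = 0; i < CPU_SETSIZE; i++)
        if (CPU_ISSET(i, set))
            n++;
    return n;
}
```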
A. Wilcox
7d756e1c04 dns: prefer monotonic clock for timeouts
Before this commit, DNS timeouts always used CLOCK_REALTIME, which
could produce spurious timeouts or delays if wall time changed for
whatever reason.

Now we try CLOCK_MONOTONIC and only fall back to CLOCK_REALTIME when
it is unavailable.
2023-02-12 18:03:24 -05:00
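A sketch of the preference order described above, with a hypothetical `timeout_clock_ms` helper returning milliseconds:

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Prefer CLOCK_MONOTONIC, which wall-clock changes do not affect;
   fall back to CLOCK_REALTIME only if it is unavailable. */
unsigned long long timeout_clock_ms(void)
{
    struct timespec ts;
    if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0)
        clock_gettime(CLOCK_REALTIME, &ts);
    return (unsigned long long)ts.tv_sec * 1000
         + (unsigned long long)ts.tv_nsec / 1000000;
}
```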
Gabriel Ravier
07616721f1 fix return value of wcs{,n}cmp for extreme wchar_t values
As a result of using simple subtraction to implement the return values
for wcscmp and wcsncmp, integer overflow can occur (producing
undefined behavior, and in practice, a wrong comparison result). This
does not occur for meaningful character values (21-bit range) but the
functions are specified to work on arbitrary wchar_t arrays.

This patch replaces the subtraction with a little bit of code that
orders the characters correctly, returning -1 if the character from
the first string is smaller than the one from the second, 0 if they
are equal and 1 if the character from the first string is larger than
the one from the second.
2023-02-12 17:50:59 -05:00
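A sketch of comparison without subtraction, under the hypothetical name `sketch_wcscmp`. Subtracting WCHAR_MIN from WCHAR_MAX overflows when wchar_t is signed; comparing with < and > cannot.

```c
#include <wchar.h>

/* Returns -1, 0 or 1 without computing *l - *r. */
int sketch_wcscmp(const wchar_t *l, const wchar_t *r)
{
    for (; *l == *r && *l; l++, r++)
        ;
    return *l < *r ? -1 : *l > *r;
}
```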
Szabolcs Nagy
35fdfe62a4 math: fix undefined shift in logf
A signed int shift overflowed when computing a constant mask; use a hex
literal instead. This is unlikely to cause actual issues unless the
code was compiled with ubsan or similar instrumentation specifically
to catch this. The stripped libc.so is unchanged on x86_64.
Reported by q66 on irc.
2023-02-12 17:46:50 -05:00
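An illustration of this kind of fix, assuming a 9-bit mask at bit 23 (hypothetical names, not necessarily the exact logf expression). `0x1ff << 23` shifts a set bit into the sign bit of a 32-bit int, which is undefined, while the literal `0xff800000` does not fit in int and so has type unsigned int.

```c
#include <stdint.h>

/* Same value as 0x1ff << 23, written without a signed shift. */
#define TOP9_MASK 0xff800000

uint32_t top9(uint32_t x)
{
    return x & TOP9_MASK;
}
```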
Alexey Izbyshev
7e13e5ae69 inet_pton: fix uninitialized memory use for IPv4-mapped IPv6 addresses
When a dot is encountered, the loop counter is incremented before
exiting the loop, but the corresponding ip array element is left
uninitialized, so the subsequent memmove (if "::" was seen) and the
loop copying ip to the output buffer will operate on an uninitialized
uint16_t.

The uninitialized data never directly influences the control flow and
is overwritten on successful return by the second half of the parsed
IPv4 address. But it's better to fix this to avoid unexpected
transformations by a sufficiently smart compiler and reports from
UB-detection tools.
2023-02-12 17:42:37 -05:00
Szabolcs Nagy
7e6da7ac98 hsearch: fix null pointer arithmetic UB
the htab->__tab->entries pointer may be 0, so delay using it in arithmetic.
this did not cause any known issue other than with ubsan instrumentation.
2023-02-12 17:41:23 -05:00
Colin Cross
f79b973d92 increase sendmsg internal buffer to support SCM_MAX_FD
The kernel defines a limit on the number of fds that can be passed
through an SCM_RIGHTS ancillary message as SCM_MAX_FD. The value was
255 before kernel 2.6.38 (after that it is 253), and an SCM_RIGHTS
ancillary message with 255 fds requires 1040 bytes, slightly more than
the current 1024 byte internal buffer in sendmsg. 1024 is an arbitrary
size, so increase it to match the arbitrary size limit in the
kernel. This fixes tests that are verifying they support up to
SCM_MAX_FD fds.
2023-02-12 17:38:37 -05:00
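The 1040-byte figure follows from CMSG_SPACE: on 64-bit Linux a 16-byte cmsghdr plus 255 * 4 = 1020 bytes of descriptors, padded to 8-byte alignment, gives 16 + 1024 = 1040. A sketch with a hypothetical `scm_rights_space` helper:

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/socket.h>

/* Control-buffer bytes needed for one SCM_RIGHTS message with nfds fds. */
size_t scm_rights_space(unsigned nfds)
{
    return CMSG_SPACE(nfds * sizeof(int));
}
```

On 32-bit archs the header is smaller, but 255 descriptors still exceed 1024 bytes.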
Rich Felker
0ab97350f0 mq_notify: block all (application) signals in the worker thread
until the mq notification event arrives, it is mandatory that signals
be blocked. otherwise, a signal can be received, and its handler
executed, in a thread which does not yet exist on the abstract
machine.

after the point of the event arriving, having signals blocked is not a
conformance requirement but a QoI requirement. while the application
can unblock any signals it wants unblocked in the event handler
thread, if they did not start out blocked, it could not block them
without a race window where they are momentarily unblocked, and this
would preclude controlled delivery or other forms of acceptance
(sigwait, etc.) anywhere in the application.
2023-02-12 15:05:39 -05:00
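A new thread inherits its creator's signal mask, so the usual way to start it with signals blocked is to block, create, and then restore. A sketch with hypothetical names; it uses sigfillset rather than musl's notion of "application" signals, and the C library may still keep some internal signals unblocked.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <signal.h>

/* Hypothetical helper: start fn in a thread with all signals blocked. */
int create_blocked_thread(pthread_t *t, void *(*fn)(void *), void *arg)
{
    sigset_t all, old;
    sigfillset(&all);
    pthread_sigmask(SIG_BLOCK, &all, &old);
    int r = pthread_create(t, NULL, fn, arg);
    pthread_sigmask(SIG_SETMASK, &old, NULL); /* restore creator's mask */
    return r;
}

/* Example thread body: report whether SIGUSR1 is blocked in this thread. */
void *report_sigusr1_blocked(void *arg)
{
    sigset_t cur;
    pthread_sigmask(SIG_BLOCK, NULL, &cur);
    *(int *)arg = sigismember(&cur, SIGUSR1);
    return NULL;
}
```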
Rich Felker
711673ee77 mq_notify: join worker thread before returning in error path
this avoids leaving behind transient resource consumption whose
cleanup is subject to scheduling behavior.
2023-02-12 15:05:38 -05:00
Rich Felker
8c0c9c69a1 mq_notify: rework to fix use-after-close/double-close bugs
in the error path where the mq_notify syscall fails, the initiating
thread may have closed the socket before the worker thread calls recv
on it. even in the absence of such a race, if the recv call failed,
e.g. due to seccomp policy blocking it, the worker thread could
proceed to close, producing a double-close condition.

this can all be simplified by moving the mq_notify syscall into the
new thread, so that the error case does not require pthread_cancel.
now, the initiating thread only needs to read back the error status
after waiting for the worker thread to consume its arguments.
2023-02-12 15:05:38 -05:00
Rich Felker
fde6891e59 mq_notify: use semaphore instead of barrier to sync args consumption
semaphores are a much lighter primitive, and more idiomatic with
current usage in the code base.
2023-02-11 13:00:37 -05:00
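The handoff pattern described in the mq_notify commits above can be sketched as follows. The argument block lives on the creator's stack. The worker copies its input, records a status, and posts a semaphore. The creator waits only for that post, then reads back the status, joining the worker on the error path. All names are hypothetical, and no real mq_notify call is made.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdint.h>

struct start_args {
    sem_t consumed; /* posted once the worker no longer needs this block */
    int value;      /* input for the worker */
    int err;        /* status reported back before posting */
};

static void *worker(void *p)
{
    struct start_args *a = p;
    int v = a->value;            /* copy what we need first */
    a->err = v < 0 ? EINVAL : 0; /* stand-in for the real setup call */
    sem_post(&a->consumed);      /* after this, a may go out of scope */
    return (void *)(intptr_t)(v * 2);
}

int start_worker(pthread_t *t, int value)
{
    struct start_args a;
    a.value = value;
    a.err = 0;
    if (sem_init(&a.consumed, 0, 0))
        return errno;
    int r = pthread_create(t, NULL, worker, &a);
    if (r == 0) {
        while (sem_wait(&a.consumed) && errno == EINTR)
            ;
        r = a.err;
        if (r)
            pthread_join(*t, NULL); /* do not leave the thread behind */
    }
    sem_destroy(&a.consumed);
    return r;
}
```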
Rich Felker
c3cd04fa5f fix pthread_detach inadvertently acting as cancellation point in race case
disabling cancellation around the pthread_join call seems to be the
safest and logically simplest fix. i believe it would also be possible
to just perform the unmap directly here after __tl_sync, removing the
dependency on pthread_join, but such an approach duplicately encodes a
lot more implementation assumptions.
2023-02-11 13:00:22 -05:00
Rich Felker
115149c023 powerpc-sf longjmp clobbering of val argument
the logic to check hwcap for SPE register file inadvertently clobbered
the val argument before use. switch to a different work register so
this doesn't happen.
2023-02-11 10:00:31 -05:00
Pedro Falcato
5763f003a5 riscv64: add vfork
Implement vfork() using clone(CLONE_VM | CLONE_VFORK | ...).
2023-02-09 12:33:35 -05:00
Rich Felker
269d193820 fix wrong sigaction syscall ABI on mips*, or1k, microblaze, riscv64
we wrongly defined a dummy SA_RESTORER flag on these archs, despite
the kernel interface not actually having such a feature. on archs
which lack SA_RESTORER, the kernel sigaction structure also lacks the
restorer function pointer member, which means the signal mask appears
at a different offset. the kernel was thereby interpreting the bits of
the code address as part of the signal set to be masked while handling
the signal.

this patch removes the erroneous SA_RESTORER definitions from archs
which do not have it, makes access to the member conditional on
whether SA_RESTORER is defined for the arch, and removes the
now-unused asm for the affected archs.

because there are reportedly versions of qemu-user which also use the
wrong ABI here, the old ksigaction struct size is preserved with an
unused member at the end. this is harmless and mitigates the risk of
such a bug turning into a buffer overflow onto the sigaction
function's stack.
2023-02-09 12:33:35 -05:00
Rich Felker
ea3b40a321 fix integer overflow in WIFSTOPPED macro
the result of the 0xffff mask with the exit status could have bit 15
set, in which case multiplying by 0x10001 overflows 32-bit signed int.
making the multiply unsigned avoids the overflow. it also changes the
sign extension behavior of the subsequent >> operation, but the
affected bits are all unwanted anyway and all discarded by the cast to
short.
2023-02-08 16:42:28 -05:00
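A macro of the shape the message describes, with a hypothetical name: mask to 16 bits, multiply by 0x10001 to duplicate them, shift, and truncate to short. The U suffix makes the multiply unsigned, so a status with bit 15 set cannot overflow a 32-bit signed int.

```c
#include <signal.h>

#define SKETCH_WIFSTOPPED(s) \
    ((short)((((s) & 0xffff) * 0x10001U) >> 8) > 0x7f00)

/* Function wrapper for convenience. */
int is_stopped(int status)
{
    return SKETCH_WIFSTOPPED(status);
}
```

For a stopped status `(sig << 8) | 0x7f`, the truncated value is `0x7f00 | sig`, which exceeds 0x7f00 for any nonzero sig.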
Rich Felker
f897461d4f fix debugger tracking of shared libraries on mips with PIE main program
mips has its own mechanisms for DT_DEBUG because it makes _DYNAMIC
read-only, and the original mechanism, DT_MIPS_RLD_MAP, was
PIE-incompatible. DT_MIPS_RLD_MAP_REL was added to remedy this, but we
never implemented support for it. add it now using the same idioms for
mips-specific ldso logic.
2023-01-18 10:32:14 -05:00
Rich Felker
a4b0a665b8 expose memmem under baseline POSIX feature profile
memmem has been adopted for the next issue of POSIX (outcome of
tracker item 1061). since mem* is in the reserved namespace for
string.h it's already fully conforming to expose it by default, so
just do so.
2023-01-06 06:33:19 -05:00
Rich Felker
9532ae1318 use libc-internal malloc for pthread_atfork
while no lock is held here that would make it a lock-order issue, replacement
malloc is likely to want to use pthread_atfork, possibly making the
call to malloc infinitely recursive.

even if not, there is no reason to prefer an application-provided
malloc here.
2022-12-17 16:00:19 -05:00
Markus Wichmann
7d358599d4 prevent invalid reads of nl_arg in printf_core
printf_core() runs twice, and during its first run, nl_arg is
uninitialized and must not be read. It gets initialized at the end of
the first run. Conversely, nl_type does not need to be set during the
second run, as its useful life has ended at that point, since the only
time it is read is during that exact same initialization. Therefore we
can simply alternate the assignments.

p and w do still need to get values assigned to them, since at least one
line in the same if-statement depends on that, but they can be dummy
values. arg does not need to be assigned, since in the first run, we
encounter a continue statement before using the argument.
2022-12-14 10:03:37 -05:00
Fangrui Song
c5f4b2dfea elf.h: add ELFCOMPRESS_ZSTD
2022-12-14 09:34:32 -05:00
Rich Felker
159d1f6c02 semaphores: fix missed wakes from ABA bug in waiter count logic
because the has-waiters state in the semaphore value futex word is
only representable when the value is zero (the special value -1
represents "0 with potential new waiters"), it's lost if intervening
operations make the semaphore value positive again. this creates an
ABA issue in sem_post, whereby the post uses a stale waiters count
rather than re-evaluating it, skipping the futex wake if the stale
count was zero.

the fix here is based on a proposal by Alexey Izbyshev, with minor
changes to eliminate costly new spurious wake syscalls.

the basic idea is to replace the special value -1 with a sticky
waiters bit (repurposing the sign bit) preserved under both wait and
post. any post that takes place with the waiters bit set will perform
a futex wake.

to be useful, the waiters bit needs to be removable, and to remove it
safely, we perform a broadcast wake instead of a normal single-task
wake whenever removing the bit. this lets any un-accounted-for waiters
wake and re-add the waiters bit if they still need it.

there are multiple possible choices for when to perform this
broadcast, but the optimal choice seems to be doing it whenever the
observed waiters count is less than two (semantically, this means
exactly one, but we might see a stale count of zero). in this case,
the expected number of threads to be woken is one, with exactly the
same cost as a non-broadcast wake.
2022-12-13 18:39:44 -05:00
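The value encoding can be modeled single-threaded: the sign bit is a sticky waiters flag and the low 31 bits hold the count. This hypothetical sketch shows only the bookkeeping. A real semaphore updates the word atomically and uses futex wait/wake, including the broadcast wake used to clear the bit.

```c
#include <stdbool.h>
#include <stdint.h>

#define WAITERS_BIT 0x80000000u
#define COUNT_MASK  0x7fffffffu

/* Post: increment the count, keep the waiters bit, and decide on a wake
   from the bit itself rather than a possibly stale waiter count.
   Count overflow is not handled in this model. */
uint32_t model_post(uint32_t v, bool *need_wake)
{
    *need_wake = (v & WAITERS_BIT) != 0;
    return (v & WAITERS_BIT) | ((v & COUNT_MASK) + 1);
}

/* Trywait: take one unit if available, leaving the waiters bit intact. */
bool model_trywait(uint32_t *v)
{
    if ((*v & COUNT_MASK) == 0)
        return false;
    *v -= 1;
    return true;
}
```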
Rich Felker
f47a8cdd25 ldso: fix invalid early references to extern-linkage libc.page_size
when PAGE_SIZE is not constant, internal/libc.h defines it to expand
to libc.page_size. however, kernel_mapped_dso, reachable from stage 2
of the dynamic linker bootstrap (__dls2), needs PAGE_SIZE to interpret
the relro range. at this point the libc object is both uninitialized
and invalid to access according to our model for bootstrapping, which
does not assume any external-linkage objects are accessible until
stages 2b/3. in practice it likely worked because hidden visibility
tends to behave like internal linkage, but this is not a property that
the dynamic linker was designed to rely upon.

this bug likely manifested as relro malfunction on archs with variable
page size, due to incorrect mask when aligning the relro bounds to
page boundaries.

while there are certainly more direct ways to fix the known problem
point here, a maximally future-proof way is to just bypass the libc.h
PAGE_SIZE definition in the dynamic linker and instead have dynlink.c
define its own internal-linkage object for variable page size. then,
if anything else in stage 2 ever ends up referencing PAGE_SIZE, it
will just automatically work right.
2022-11-30 19:07:34 -05:00