Commit Graph

4877 Commits

Author SHA1 Message Date
Rich Felker
63402be229 clean up dns_parse_callback
the only functional change here should be that MAXADDRS is only
checked for RRs that provide address results, so that a CNAME which
appears after an excessive number of address RRs does not get ignored.
I'm not aware of any servers that order the RRs this way, and it may
even be forbidden to do so, but I prefer having the callback logic not
be order dependent.

other than that, the motivation for this change is that the A and AAAA
cases were mostly duplicate code that could be combined as a single
code path.
2022-10-19 14:02:48 -04:00
Rich Felker
0a7b4323b0 dns response handling: don't treat too many addresses as an error
returning -1 rather than 0 from the parse function causes __dns_parse
to bail out and return an error. presently, name_from_dns does not
check the return value anyway, so this does not matter, but if it ever
started treating this as an error, lookups with large numbers of
addresses would break. this is a consequence of adding TCP support and
extending the buffer size used in name_from_dns.
2022-10-19 14:01:45 -04:00
Rich Felker
41603c7706 dns response handling: ignore presence of wrong-type RRs
reportedly there is nameserver software with question-rewriting
"functionality" which gives A answers when AAAA is queried. since we
made no effort to validate that the answer RR type actually
corresponds to the question asked, it was possible (depending on
flags, etc.) for these answers to leak through, which the caller might
not be prepared for. indeed, our implementation of gethostbyname2_r
makes an assumption that the resulting addresses are in the family
requested, and will misinterpret the results if they don't.

commit 45ca5d3fcb already noted in
fixing CVE-2017-15650 that this could happen, but did nothing to
validate that the RR type of the answer matches the question; it just
enforced the limit on number of results to preclude overflow.

presently, name_from_dns ignores the return value of __dns_parse, so
it doesn't really matter whether we return 0 (ignoring the RR) or -1
(parse-ending error) upon encountering the mismatched RR. if that ever
changes, though, ignoring irrelevant answer RRs sounds like the
semantically correct thing to do, so for now let's return 0 from the
callback when this happens.
2022-10-19 14:01:32 -04:00
Rich Felker
cf76df0e1f fix missing synchronization of pthread TSD keys with MT-fork
commit 167390f055 seems to have
overlooked the presence of a lock here, probably because it was one of
the exceptions not using LOCK() but a rwlock.

as such, it can't be added to the generic table of locks to take, so
add an explicit atfork function for the pthread keys table. the order
it is called does not particularly matter since nothing else in libc
but pthread_exit interacts with keys.
2022-10-19 14:01:32 -04:00
Rich Felker
5ff3eea91f fgets: avoid arithmetic overflow when n==INT_MIN is passed
performing n-- is not a safe operation for arbitrary signed input n.
only perform the decrement in the code path where the initial n is
greater than 1, and adjust the condition in the n<=1 code path to
compensate for it not having been decremented.
2022-10-19 14:01:32 -04:00
Rich Felker
d8f35e29d0 fix AS-safety of close when aio is in use and fd map is expanded
the aio operations that lead to calling __aio_get_queue with the
possibility to expand the fd map are not AS-safe, but if they are
interrupted by a signal handler, the signal handler may call close,
which is required to be AS-safe. due to __aio_get_queue taking the
write lock without blocking signals, such a call to close from a
signal handler could deadlock.

change __aio_get_queue to block signals if it needs to obtain a write
lock, and restore when finished.
2022-10-19 14:01:32 -04:00
Alexey Izbyshev
26c76a908b fix use of uninitialized dummy_fut in aio_suspend
aio_suspend waits on a dummy futex in the corner case when the array of
requests contains NULL pointers only. But the value of this futex was
left uninitialized, so if it happens to be non-zero, aio_suspend
degrades to spinning instead of blocking.
2022-10-19 14:01:32 -04:00
Rich Felker
aebd6a3644 fix potential deadlock between multithreaded fork and aio
as reported by Alexey Izbyshev, there is a lock order inversion
deadlock between the malloc lock and aio maplock at MT-fork time:
_Fork attempts to take the aio maplock while fork already has the
malloc lock, but a concurrent aio operation holding the maplock may
attempt to allocate memory.

move the __aio_atfork calls in the parent from _Fork to fork, and
reorder the lock before most other locks, since nothing else depends
on aio(*). this leaves us with the possibility that the child will not
be able to obtain the read lock, if _Fork is used directly and happens
concurrent with an aio operation. however, in that case, the child
context is an async signal context that cannot call any further aio
functions, so all we need is to ensure that close does not attempt to
perform any aio cancellation. this can be achieved just by nulling out
the map pointer.

(*) even if other functions call close, they will only need a read
lock, not a write lock, and read locks being recursive ensures they
can obtain it. moreover, the number of read references held is bounded
by something like twice the number of live threads, meaning that the
read lock count cannot saturate.
2022-10-19 14:01:32 -04:00
Rich Felker
d64148a874 fix potential unsynchronized access to killlock state at thread exit
as reported by Alexey Izbyshev, when the second-to-last thread exits
causing a return to single-threaded (no locks needed) state, it
creates a situation where the last remaining thread may obtain the
killlock that's already held by the exiting thread. this means it may
erroneously use the tid of the exiting thread, and may corrupt the
lock state due to double-unlock.

commit 8d81ba8c0b, which (re)introduced
the switch back to single-threaded state, documents the intent that
the first lock after switching back should provide the necessary
synchronization. this is correct, but only works if the switch back is
made after there is no further need for synchronization with locks
(other than the thread list lock, which can't be bypassed) held by the
exiting thread.

in order to hit the bug, the remaining thread must first take a
different lock, causing it to perform an actual lock one last time,
consume the need_locks==-1 state, and transition to need_locks==0.
after that, the next attempt to lock the exiting thread's killlock
will bypass locking.

fix this by reordering the unlocking of killlock at thread exit time,
along with changes to the state protected by it, to occur earlier,
before the switch to single-threaded state. there are really no
constraints on where it's done, except that it occur after there is no
longer any possibility of application code executing in the exiting
thread, so do it as early as possible.
2022-10-19 14:01:32 -04:00
Rich Felker
36b72cd6fd fix potential deadlock in dlerror buffer handling at thread exit
ever since commit 8f11e6127f introduced
the thread list lock, this has been wrong. initially, it was wrong via
calling free from the context with the thread list lock held. commit
aa5a9d15e0 deferred the unsafe free but
added a lock, which was also unsafe. in particular, it could deadlock
if code holding freebuf_queue_lock was interrupted by a signal handler
that takes the thread list lock.

commit 4d5aa20a94 observed that there
was a lock here but failed to notice that it's invalid.

there is no easy solution to this problem with locks; any attempt at
solving it while still using locks would require the lock to be an
AS-safe one (blocking signals on each access to the dlerror buffer
list to check if there's deferred free work to be done) which would be
excessively costly, and there are also lock order considerations with
respect to how the lock would be handled at fork.

instead, just use an atomic list.
2022-10-19 14:01:32 -04:00
Rich Felker
833a469167 configure: disable TBAA optimization because most compilers are buggy
unlike most projects that use -fno-strict-aliasing, we aim to have all
sources respect the C language rules for effective type that make
type-based alias analysis optimizations possible. unfortunately, it
turns out that there are deep, and likely very difficult to fix, flaws
in the TBAA performed by GCC and likely other compilers, whereby this
kind of optimization can transform code that follows the rules
strictly in ways that will make it malfunction. see for example GCC
bugs 107107 and 107115, the latter of which also affects clang.

there are not presently any known instances of breakage due to wrong
type-based aliasing optimizations in our codebase. nonetheless, since
the transformations are unsound and could introduce breakage,
configure CFLAGS to build with -fno-strict-aliasing.

some casual analysis of the effects on codegen suggest that this is
unlikely to affect performance except possibly in the regex engine. in
general, we should probably prefer making better use of the restrict
keyword over relying on types to imply non-aliasing for optimization
purposes; doing so should be able to get back any performance that was
lost and more, should it turn out to matter (unlikely).
2022-10-19 14:01:31 -04:00
Rich Felker
e6e8213244 disable MADV_FREE usage in mallocng
the entire intent of using madvise/MADV_FREE on freed slots is to
improve system performance by avoiding evicting cache of useful data,
or swapping useless data to disk, by marking any whole pages in the
freed slot as discardable by the kernel. in particular, unlike
unmapping the memory or replacing it with a PROT_NONE region, use of
MADV_FREE does not make any difference to memory accounting for commit
charge purposes, and so does not increase the memory available to
other processes in a non-overcommitted environment.

however, various measurements have shown that inordinate amounts of
time are spent performing madvise syscalls in processes which
frequently allocate and free medium sized objects in the size range
roughly between PAGESIZE and MMAP_THRESHOLD, to the point that the net
effect is almost surely significant performance degredation. so, turn
it off.

the code, which has some nontrivial logic for efficiently determining
whether there is a whole-page range to apply madvise to, is left in
place so that it can easily be re-enabled if desired, or later tuned
to only apply to certain sizes or to use additional heuristics.
2022-10-19 14:01:31 -04:00
Rich Felker
25e6fee27f remove LFS64 programming interfaces (macro-only) from _GNU_SOURCE
these badly pollute the namespace with macros whenever _GNU_SOURCE is
defined, which is always the case with g++, and especially tends to
interfere with C++ constructs.

as our implementation of these was macro-only, their removal cannot
affect any existing binaries. at the source level, portable software
should be prepared for them not to exist.

for now, they are left in place with explicit _LARGEFILE64_SOURCE.
this provides an easy temporary path for integrators/distributions to
get packages building again right away if they break while working on
a proper, upstreamable fix. the intent is that this be a very
short-term measure and that the macros be removed entirely in the next
release cycle.
2022-10-19 14:01:31 -04:00
Rich Felker
246f1c8114 remove LFS64 symbol aliases; replace with dynamic linker remapping
originally the namespace-infringing "large file support" interfaces
were included as part of glibc-ABI-compat, with the intent that they
not be used for linking, since our off_t is and always has been
unconditionally 64-bit and since we usually do not aim to support
nonstandard interfaces when there is an equivalent standard interface.

unfortunately, having the symbols present and available for linking
caused configure scripts to detect them and attempt to use them
without declarations, producing all the expected ill effects that
entails.

as a result, commit 2dd8d5e1b8 was made
to prevent this, using macros to redirect the LFS64 names to the
standard names, conditional on _GNU_SOURCE or _LARGEFILE64_SOURCE.
however, this has turned out to be a source of further problems,
especially since g++ defines _GNU_SOURCE by default. in particular,
the presence of these names as macros breaks a lot of valid code.

this commit removes all the LFS64 symbols and replaces them with a
mechanism in the dynamic linker symbol lookup failure path to retry
with the spurious "64" removed from the symbol name. in the future,
if/when the rest of glibc-ABI-compat is moved out of libc, this can be
removed.
2022-10-19 14:01:31 -04:00
Rich Felker
dec8f0a4fa dns query core: detect udp truncation at recv time
we already attempt to preclude this case by having res_send use a
sufficiently large temporary buffer even if the caller did not provide
one as large as or larger than the udp dns max of 512 bytes. however,
it's possible that the caller passed a custom-crafted query packet
using EDNS0, e.g. to get detailed DNSSEC results, with a larger udp
size allowance.

I have also seen claims that there are some broken nameservers in the
wild that do not honor the dns udp limit of 512 and send large answers
without the TC bit set, when the query was not using EDNS.

we generally don't aim to support broken nameservers, but in this case
both problems, if the latter is even real, have a common solution:
using recvmsg instead of recvfrom so we can examine the MSG_TRUNC
flag.
2022-10-19 14:01:31 -04:00
Rich Felker
8c408937da getaddrinfo dns lookup: use larger answer buffer to handle long CNAMEs
the size of 512 is not sufficient to get at least one address in the
worst case where the name is at or near max length and resolves to a
CNAME at or near max length. prior to tcp fallback, there was nothing
we could do about this case anyway, but now it's fixable.

the new limit 768 is chosen so as to admit roughly the number of
addresses with a worst-case CNAME as could fit for a worst-case name
that's not a CNAME in the old 512-byte limit. outside of this
worst-case, the number of addresses that might be obtained is
increased.

MAXADDRS (48) was originally chosen as an upper bound on the combined
number of A and AAAA records that could fit in 512-byte packets (31
and 17, respectively). it is not increased at this time.

so as to prevent a situation where the A records consume almost all of
these slots (at 768 bytes, a "best-case" name can fit almost 47 A
records), the order of parsing is swapped to process AAAA first. this
ensures roughly half of the slots are available to each address
family.
2022-10-19 14:01:12 -04:00
Rich Felker
759bf785a8 arpa/nameser.h: update RR types list
our RR type list in arpa/nameser.h was badly outdated, and missing
important types for DNSSEC and DANE use, among other things.
2022-09-22 18:44:44 -04:00
Rich Felker
51d4669fb9 dns: implement tcp fallback in __res_msend query core
tcp fallback was originally deemed unwanted and unnecessary, since we
aim to return a bounded-size result from getaddrinfo anyway and
normally plenty of address records fit in the 512-byte udp dns limit.
however, this turned out to have several problems:

- some recursive nameservers truncate by omitting all the answers,
  rather than sending as many as can fit.

- a pathological worst-case CNAME for a worst-case name can fill the
  entire 512-byte space with just the two names, leaving no room for
  any addresses.

- the res_* family of interfaces allow querying of non-address records
  such as TLSA (DANE), TXT, etc. which can be very large. for many of
  these, it's critical that the caller see the whole RRset. also,
  res_send/res_query are specified to return the complete, untruncated
  length so that the caller can retry with an appropriately-sized
  buffer. determining this is not possible without tcp.

so, it's time to add tcp fallback.

the fallback strategy implemented here uses one tcp socket per
question (1 or 2 questions), initiated via tcp fastopen when possible.
the connection is made to the nameserver that issued the truncated
answer. right now, fallback happens unconditionally when truncation is
seen. this can, and may later be, relaxed for queries made by the
getaddrinfo system, since it will only use a bounded number of results
anyway.

retry is not attempted again after failure over tcp. the logic could
easily be adapted to do that, but it's of questionable value, since
the tcp stack automatically handles retransmission and the successs
answer with TC=1 over udp strongly suggests that the nameserver has
the full answer ready to give. further retry is likely just "take
longer to fail".
2022-09-22 14:17:05 -04:00
Rich Felker
e2e9517607 res_send: use a temp buffer if caller's buffer is under 512 bytes
for extremely small buffer sizes, the DNS query core in __res_msend
may malfunction completely, being unable to get even the headers to
determine the response code. but there is also a problem for
reasonable sizes under 512 bytes: __res_msend is unable to determine
if the udp answer was truncated at the recv layer, in which case it
may be incomplete, and res_send is then unable to honor its contract
to return the length of the full, non-truncated answer.

at present, res_send does not honor that contract anyway when the full
answer would exceed 512 bytes, since there is no tcp fallback, but
this change at least makes it consistent in a context where this is
the only "full answer" to be had.
2022-09-22 12:41:23 -04:00
Rich Felker
c87d75f2aa adapt res_msend DNS query core for working with multiple sockets
this is groundwork for TCP fallback support, but does not itself
change behavior in any way.
2022-09-21 18:49:53 -04:00
Rich Felker
85050ac5a2 getaddrinfo: add EAI_NODATA error code to distinguish NODATA vs NxDomain
this was apparently omitted long ago out of a lack of understanding of
its importance and the fact that POSIX doesn't specify it. despite not
being officially standardized, however, it turns out that at least
AIX, glibc, NetBSD, OpenBSD, QNX, and Solaris document and support it.

in certain usage cases, such as implementing a DNS gateway on top of
the stub resolver interfaces, it's necessary to distinguish the case
where a name does not exit (NxDomain) from one where it exists but has
no addresses (or other records) of the requested type (NODATA). in
fact, even the legacy gethostbyname API had this distinction, which we
were previously unable to support correctly because the backend lacked
it.

apart from fixing an important functionality gap, adding this
distinction helps clarify to users how search domain fallback works
(falling back in cases corresponding to EAI_NONAME, not in ones
corresponding to EAI_NODATA), a topic that has been a source of
ongoing confusion and frustration.

as a result of this change, EAI_NONAME is no longer a valid universal
error code for getaddrinfo in the case where AI_ADDRCONFIG has
suppressed use of all address families. in order to return an accurate
result in this case, getaddrinfo is modified to still perform at least
one lookup. this will almost surely fail (with a network error, since
there is no v4 or v6 network to query DNS over) unless a result comes
from the hosts file or from ip literal parsing, but in case it does
succeed, the result is replaced by EAI_NODATA.

glibc has a related error code, EAI_ADDRFAMILY, that could be used for
the AI_ADDRCONFIG case and certain NODATA cases, but distinguishing
them properly in full generality seems to require additional DNS
queries that are otherwise not useful. on glibc, it is only used for
ip literals with mismatching family, not for DNS or hosts file results
where the name has addresses only in the opposite family. since this
seems misleading and inconsistent, and since EAI_NODATA already covers
the semantic case where the "name" exists but doesn't have any
addresses in the requested family, we do not adopt EAI_ADDRFAMILY at
this time. this could be changed at some point if desired, but the
logic for getting all the corner cases with AI_ADDRCONFIG right is
slightly nontrivial.
2022-09-20 18:09:42 -04:00
Rich Felker
dc9285ad1d fix error cases in gethostbyaddr_r
EAI_MEMORY is not possible (but would not provide errno if it were)
and EAI_FAIL does not provide errno. treat the latter as EBADMSG to
match how it's handled in gethostbyname2_r (it indicates erroneous or
failure response from the nameserver).
2022-09-19 19:12:09 -04:00
Rich Felker
f9827fc7da remove impossible error case from gethostbyname2_r
EAI_MEMORY is not possible because the resolver backend does not
allocate. if it did, it would be necessary for us to explicitly return
ENOMEM as the error, since errno is not guaranteed to reflect the
error cause except in the case of EAI_SYSTEM, so the existing code was
not correct anyway.
2022-09-19 19:09:02 -04:00
Rich Felker
f081d5336a fix return value of gethostnbyname[2]_r on result not found
these functions are horribly underspecified, inconsistent between
historical systems, and should never have been included. however, the
signatures we have match the glibc ones, and the glibc behavior is to
treat NxDomain and NODATA results as a success condition, not an
ENOENT error.
2022-09-19 19:02:40 -04:00
Rich Felker
1e7fb12f77 dns: treat names rejected by res_mkquery as nonexistent rather than error
this distinction only affects search, but allows search to continue
when concatenating one of the search domains onto the requested name
produces a result that's not valid. this can happen when the
concatenation is too long, or one of the search list entries is
itself not valid.

as a consequence of this change, having "." in the search domains list
will now be ignored/skipped rather than making the lookup abort with
no results (due to producing a concatenation ending in ".."). this
behavior could be changed later if needed.
2022-09-19 15:51:04 -04:00
Rich Felker
001c1afb0a res_mkquery: error out on consecutive final dots in name
the main loop already errors out on zero-length labels within the
name, but terminates before having a chance to check for an erroneous
final zero-length label, instead producing a malformed query packet
with a '.' byte instead of the terminating zero.

rather than poke at the look logic, simply detect this condition early
and error out without doing anything.

this also fixes behavior of getaddrinfo when "." appears in the search
domain list, which produces a name ending in ".." after concatenation,
at least in the sense of no longer emitting malformed packets on the
network. however, due to other issues, the lookup will still fail.
2022-09-19 15:38:00 -04:00
Alexey Izbyshev
3ad3fa962e fix thread leak on timer_create(SIGEV_THREAD) failure
After commit 5b74eed3b3 the timer thread
doesn't check whether timer_create() actually created the timer,
proceeding to wait for a signal that might never arrive.  We can't fix
this by simply checking for a negative timer_id after
pthread_barrier_wait() because we have no way to distinguish a timer
creation failure and a request to delete a timer with INT_MAX id if it
happens to arrive quickly (a variation of this bug existed before
5b74eed3b3, where the timer would be
leaked in this case).  So (ab)use cancel field of pthread_t instead.
2022-09-19 13:24:05 -04:00
Rich Felker
bf14ef193b re-enable vdso clock_gettime on arm (32-bit) with workaround
commit 4486c579cb disabled vdso
clock_gettime on arm due to a Linux kernel bug that was not understood
at the time, whereby the vdso function silently produced
catastrophically wrong results on some systems.

since then, the bug was tracked down to the way the arm kernel
disabled use of vdso clock_gettime on kernels where the necessary
timer was not available or was disabled. it simply patched out the
symbols, but it only did this for the legacy time32 functions, and
left the time64 function in place but non-operational. kernel commit
4405bdf3c57ec28d606bdf5325f1167505bfdcd4 (first present in 5.8)
provided the fix.

if this were a bug that impacted all users of the broken kernel
versions, we could probably ignore it and assume it had been patched
or replaced. however, it's very possible that these kernels appear in
the wild in devices running time32 userspace (glibc, musl 1.1.x, or
some other environment) where they appear to work fine, but where our
new binaries would fail catastrophically if we used the time64 vdso
function.

since the kernel has not (yet?) given us a way to probe for the
working time64 vdso function semantically, we work around the problem
by refusing to use the time64 one unless the time32 one is also
present. this will revert to not using vdso at all if the time32 one
is ever removed, but at least that's safe against wrong results and is
just a missed optimization.
2022-09-19 13:21:54 -04:00
Rich Felker
6f3ead0ae1 process DT_RELR relocations in ldso-startup/static-pie
commit d32dadd60e added DT_RELR
processing for programs and shared libraries processed by the dynamic
linker, but left them unsupported in the dynamic linker itseld and in
static pie binaries, which self-relocate via code in dlstart.c.

add the equivalent processing to this code path so that there are not
arbitrary restrictions on where the new packed relative relocation
form can be used.
2022-09-12 08:30:36 -04:00
Rich Felker
25085c85a0 fix fwprintf missing output to open_wmemstream FILEs
open_wmemstream's write method was written assuming no buffering,
since it sets the FILE up with buf_len of zero in order to avoid
issues with position/seeking. however, as a consequence of commit
bd57e2b43a, a FILE being written to by
the printf core has a temporary local buffer for the duration of the
operation if it was unbuffered to begin with. since this was
disregarded by the wide memstream's write method, output produced
through this code path, particularly numeric fields, was missing from
the output wchar buffer.

copy the equivalent logic for using the buffered data from the
byte-oriented open_memstream.
2022-09-07 19:38:32 -04:00
Rich Felker
a636fd630f dns: fail if ipv6 is disabled and resolv.conf has only v6 nameserves
if resolv.conf lists no nameservers at all, the default of 127.0.0.1
is used. however, another "no nameservers" case arises where the
system has ipv6 support disabled/configured-out and resolv.conf only
contains v6 nameservers. this caused the resolver to repeat socket
operations that will necessarily fail (sending to one or more
wrong-family addresses) while waiting for a timeout.

it would be contrary to configured intent to query 127.0.0.1 in this
case, but the current behavior is not conducive to diagnosing the
configuration problem. instead, fail immediately with EAI_SYSTEM and
errno==EAFNOSUPPORT so that the configuration error is reportable.
2022-08-26 14:57:52 -04:00
Rich Felker
996b6154b2 use kernel-provided AT_MINSIGSTKSZ for sysconf(_SC_[MIN]SIGSTKSZ)
use the legacy constant values if the kernel does not provide
AT_MINSIGSTKSZ (__getauxval will return 0 in this case) and as a
safety check if something is wrong and the provided value is less than
the legacy constant.

sysconf(_SC_SIGSTKSZ) returns SIGSTKSZ adjusted for the difference
between the legacy constant MINSIGSTKSZ and the runtime value, so that
the working space the application has on top of the minimum remains
invariant under changes to the minimum.
2022-08-26 11:34:46 -04:00
Rich Felker
25340a9337 add sysconf keys/values for signal stack size
as a result of ISA extensions exploding register file sizes on some
archs, using a constant for minimum signal stack size no longer seems
viably future-proof. add sysconf keys allowing the kernel to provide a
machine-dependent minimum applications can query to ensure they
allocate sufficient space for stacks. the key names and indices align
with the same functionality in glibc.

see commit d5a5045382 for previous
action on this subject.

ultimately, the macros MINSIGSTKSZ and SIGSTKSZ probably need to be
deprecated, but that is standards-amendment work outside the scope of
a single implementation.
2022-08-26 10:20:46 -04:00
Rich Felker
d8fddb9641 fix fallback when ipv6 is disabled but resolv.conf has v6 nameserves
apparently this code path was never tested, as it's not usual to have
v6 nameservers listed on a system without v6 networking support. but
it was always intended to work.

when reverting to binding a v4 address, also revert the family in the
sockaddr structure and the socklen for it. otherwise bind will just
fail due to mismatched family/sockaddr size.

fix dns resolver fallback when v6 nameservers are listed by
2022-08-24 20:48:47 -04:00
Kristina Martsenko
d4f987e4ac epoll_create: fail with EINVAL if size is non-positive
This is a part of the interface contract defined in the Linux man
page (official for a Linux-specific interface) and asserted by test
cases in the Linux Test Project (LTP).
2022-08-24 20:35:47 -04:00
Rich Felker
2e5fff43dd use alt signal stack when present for implementation-internal signals
a request for this behavior has been open for a long time. the
motivation is that application code, particularly under some language
runtimes designed around very-low-footprint coroutine type constructs,
may be operating with extremely small stack sizes unsuitable for
receiving signals, using a separate signal stack for any signals it
might handle.

progress on this was blocked at one point trying to determine whether
the implementation is actually entitled to clobber the alt stack, but
the phrasing "available to the implementation" in the POSIX spec for
sigaltstack seems to make it clear that the application cannot rely on
the contents of this memory to be preserved in the absence of signal
delivery (on the abstract machine, excluding implementation-internal
signals) and that we can therefore use it for delivery of signals that
"don't exist" on the abstract machine.

no change is made for SIGTIMER since it is always blocked when used,
and accepted via sigwaitinfo rather than execution of the signal
handler.
2022-08-20 12:24:49 -04:00
Érico Nogueira
379b18218d ldso: make exit condition clearer in fixup_rpath
breaking out of the switch-case when l==-1 means the conditional below
will necessarily be true (-1 >= buf_size, a size_t variable) and the
function will return 0. it is, however, somewhat unclear that that's
what's happening. simply returning there is simpler
2022-08-17 19:49:54 -04:00
Rich Felker
37e18b7bf3 freopen: reset stream orientation (byte/wide) and encoding rule
this is a requirement of the C language (orientation) and POSIX
(encoding rule) that was somehow overlooked.

we rely on the fact that the buffer pointers have been reset by
fflush, so that any future stdio operations on the stream will go
through the same code paths they would on a newly-opened file without
an orientation set, thereby setting the orientation as they should.
2022-08-17 18:34:07 -04:00
Rich Felker
bf99258564 ldso: process RELR only for non-FDPIC archs
the way RELR is applied is not a meaningful operation for FDPIC (there
is no single "base" address). it seems unlikely RELR would ever be
added for FDPIC, but if it ever is, the behavior and possibly data
format will need to be different, so guard against calling the
non-FDPIC code.
2022-08-02 17:29:01 -04:00
Fangrui Song
d32dadd60e ldso: support DT_RELR relative relocation format
this resolves DT_RELR relocations in non-ldso, dynamic-linked objects.
2022-08-02 17:27:45 -04:00
Alex Xu (Hello71)
2404d9d643 use syscall_arg_t and __scc macro for arguments to __alt_socketcall
otherwise, pointer arguments are sign-extended on x32, resulting in
EFAULT.
2022-08-02 14:12:18 -04:00
Michael Pratt
46d1c7801b fix strings.h feature test macro usage due to missing features.h 2022-08-01 13:57:11 -04:00
Eugene Yudin
baaf257f05 fix ESRCH error handling for clock_getcpuclockid
the syscall used to probe availability of the clock fails with EINVAL
when the requested pid does not exist, but clock_getcpuclockid is
specified to use ESRCH for this purpose.
2022-08-01 13:53:22 -04:00
Szabolcs Nagy
4f48da008d aarch64: add vfork
The generic vfork implementation uses clone(SIGCHLD) which has fork
semantics.

Implement vfork as clone(SIGCHLD|CLONE_VM|CLONE_VFORK, 0) instead which
has vfork semantics. (stack == 0 means sp is unchanged in the child.)

Some users rely on vfork semantics when memory overcommit is disabled
or when the vfork child runs code that synchronizes with the parent
process (non-conforming).
2022-08-01 13:37:39 -04:00
Rich Felker
7d568410b4 fix mishandling of errno in getaddrinfo AI_ADDRCONFIG logic
this code attempts to use the value of errno from failure of socket or
connect to infer availability of the requested address family (v4 or
v6). however, in the case where connect failed, there is an
intervening call to close between connect and the use of errno. close
is not required to preserve errno on success, and in fact the
__aio_close code, which is called whenever aio is linked and thus
always called in dynamic-linked programs, unconditionally clobbers
errno. as a result, getaddrinfo fails with EAI_SYSTEM and errno=ENOENT
rather than correctly determining that the address family was
unavailable.

this fix is based on report/patch by Jussi Nieminen, but simplified
slightly to avoid breaking the case where socket, not connect, failed.
2022-08-01 12:54:23 -04:00
Rich Felker
d16d7b1099 early stage ldso: remove symbolic references via error handling function
while the error handling function should not be reached in stage 2
(assuming ldso itself was linked correctly), this was not statically
determinate from the compiler's perspective, and in theory a compiler
performing LTO could lift the TLS references (errno and other things)
out of the printf-family functions called in a stage where TLS is not
yet initialized.

instead, perform the call via a static-storage, internal-linkage
function pointer which will be set to a no-op function until the stage
where the real error handling function should be reachable.

inspired by commit 63c67053a3.
2022-07-19 19:00:53 -04:00
Alex Xu (Hello71)
63c67053a3 in early stage ldso before __dls2b, call mprotect with __syscall
if LTO is enabled, gcc hoists the call to ___errno_location outside the
loop even though the access to errno is gated behind head != &ldso
because ___errno_location is marked __attribute__((const)). this causes
the program to crash because TLS is not yet initialized when called from
__dls2. this is also possible if LTO is not enabled; even though gcc 11
doesn't do it, it is still wrong to use errno here.

since the start and end are already aligned, we can simply call
__syscall instead of using global errno.

Fixes: e13a2b8953 ("implement PT_GNU_RELRO support")
2022-07-02 16:12:04 -04:00
Rich Felker
a23a3da29b avoid limited space of random temp file names if clock resolution is low
this is not an issue that was actually hit, but I noticed it during
previous changes to __randname: if the resolution of tv_nsec is too
low, the space of temp file names obtainable by a thread could
plausibly be exhausted. mixing in tv_sec avoids this.
2022-06-23 11:53:28 -04:00
Rich Felker
4100279825 remove random filename obfuscation that leaks ASLR information
the __randname function is used by various temp file creation
interfaces as a backend to produce a name to attempt using. it does
not have to produce results that are safe against guessing, and only
aims to avoid unintentional collisions.

mixing the address of an object on the stack in a reversible manner
leaked ASLR information, potentially allowing an attacker who can
observe the temp files created and their creation timestamps to narrow
down the possible ASLR state of the process that created them. there
is no actual value in mixing these addresses in; it was just
obfuscation. so don't do it.

instead, mix the tid, just to avoid collisions if multiple
processes/threads stampede to create temp files at the same moment.
even without this measure, they should not collide unless the clock
source is very low resolution, but it's a cheap improvement.

if/when we have a guaranteed-available userspace csprng, it could be
used here instead. even though there is no need for cryptographic
entropy here, it would avoid having to reason about clock resolution
and such to determine whether the behavior is nice.
2022-06-03 18:54:41 -04:00
Rich Felker
6c858d6fd4 ensure distinct query id for parallel A and AAAA queries in resolver
assuming a reasonable realtime clock, res_mkquery is highly unlikely
to generate the same query id twice in a row, but it's possible with a
very low-resolution system clock or under extreme delay of forward
progress. when it happens, res_msend fails to wait for both answers,
and instead stops listening after getting two answers to the same
query (A or AAAA).

to avoid this, increment one byte of the second query's id if it
matches the first query's. don't bother checking if the second byte is
also equal, since it doesn't matter; we just need to ensure that at
least one byte is distinct.
2022-06-03 11:03:00 -04:00