Commit Graph

4968 Commits

Author SHA1 Message Date
Alexey Izbyshev
8949da7ab1 select: fix 64-bit timeout truncation on pre-time64 kernels
If the (normalized) timeout passed to select exceeds INT_MAX seconds on
an arch with SYS_pselect6_time64 and the kernel is too old to support
time64 syscalls, the timeout is implicitly converted to (32-bit) long on
the fallback path, losing its upper 32 bits and potentially becoming a
small positive value, violating the intended semantics, or even
a negative value, causing the fallback syscall failure. Fix this by
saturating the timeout at INT_MAX as done in other time64 fallback
cases.
2023-03-02 20:00:45 -05:00
Rich Felker
3281047cfc dup3: don't set FD_CLOEXEC on failure on kernels without dup3 syscall
this is the best-effort fallback path for kernels that can't actually
support the dup3 functionality. it was setting FD_CLOEXEC flag on the
target fd (new) even if the dup2 operation failed. normally that
shouldn't happen under correct usage, but it's possible if the source
fd is not open or intentionally invalid (e.g. -1).
2023-02-28 15:44:46 -05:00
Rich Felker
c99b7daafd fix dup3 ignoring all flags but O_CLOEXEC on archs with SYS_dup2 syscall
our dup3 code wrongly skipped directly to making the SYS_dup2 syscall
whenever the O_CLOEXEC bit of flags was not set. this is incorrect if
any new flags are ever added, as it would silently ignore them rather
than failing with an error.

archs which lack SYS_dup2 were unaffected.

adjust the logic so that SYS_dup3 is attempted whenever flags is
nonzero, and explicitly fail with EINVAL if SYS_dup3 is unavailable
and there are any unknown flags.
2023-02-28 12:21:23 -05:00
Rich Felker
fb7fb5e4bd fix pipe2 silently ignoring unknown flags on old kernels
kernels using the fallback have an inherent close-on-exec race
condition and as such support for them is only best-effort anyway.
however, ignoring potential new flags is still very bad behavior.
instead, fail with EINVAL.
2023-02-28 12:18:43 -05:00
Alexey Izbyshev
b1dfb734a4 getservbyport_r: fix wrong result if getnameinfo fails with EAI_OVERFLOW
EAI_OVERFLOW should be propagated as ERANGE to inform the caller about
the need to expand the buffer.
2023-02-28 12:01:34 -05:00
Alexey Izbyshev
595416b11d getservbyport_r: fix out-of-bounds buffer read
If the buffer passed to getservbyport_r is just enough to store two
pointers after aligning it, getnameinfo is called with buflen == 0
(which means that service name is not needed) and trivially succeeds.
Then, strtol is called on the address just past the buffer end, and
if it doesn't happen to find the port number there, getservbyport_r
spuriously succeeds and returns the same bad address to the caller.

Fix this by ensuring that buflen is at least 1 when passed to
getnameinfo.
2023-02-28 12:00:55 -05:00
Alexey Izbyshev
1a708ece1a getifaddrs: fix UB via taking address of null pointer union dereference
getifaddrs computes &ctx->first->ifa even if ctx->first is NULL. While
this shouldn't be possible on the success path because the loopback
interface is hardcoded into the kernel, this is still possible on the
error path (for example, if __rtnetlink_enumerate couldn't create a
socket due to exceeding the fd limit).
2023-02-28 11:59:53 -05:00
Alexey Izbyshev
c499c1084e accept4: don't fall back to accept if we got unknown flags
accept4 emulation via accept ignores unknown flags, so it can spuriously
succeed instead of failing (or succeed without doing the action implied
by an unknown flag if it's added in a future kernel). Worse, unknown
flags trigger the fallback code even on modern kernels if the real
accept4 syscall returns EINVAL, because this is indistinguishable from
socketcall returning EINVAL due to lack of accept4 support.

Fix this by always failing with EINVAL if unknown flags are present and
the syscall is missing or failed with EINVAL.
2023-02-28 11:48:05 -05:00
Alexey Izbyshev
523d9b965d fix potential read past end of buffer in getnameinfo host name lookup
This is completely analoguous to commit 633183b5d1.

Similar code called from __lookup_name is not affected because it checks
that the line contains the host name surrounded by blanks.
2023-02-27 10:04:34 -05:00
Alexey Izbyshev
d0b7f9768d dns: fix workaround for systems defaulting to ipv6-only sockets
When IPv6 nameservers are present, __res_msend_rc attempts to disable
IPV6_V6ONLY socket option to ensure that it can communicate with IPv4
nameservers (if they are present too) via IPv4-mapped IPv6 addresses.
However, this option can't be disabled on bound sockets, so setsockopt
always fails.
2023-02-27 10:03:56 -05:00
Alexey Izbyshev
bec42ef393 dns: handle early eof in tcp fallback
A zero returned from recvmsg is currently treated as if some data were
received, so if a DNS server closes its TCP socket before sending the
full answer, __res_msend_rc will spin until the timeout elapses because
POLLIN event will be reported on each poll. Fix this by treating an
early EOF as an error.
2023-02-27 10:03:34 -05:00
Alexey Izbyshev
9b132e5567 prevent CNAME/PTR parsing from reading data past the response end
DNS parsing callbacks pass the response buffer end instead of the actual
response end to dn_expand, so a malformed DNS response can use message
compression to make dn_expand jump past the response end and attempt to
parse uninitialized parts of that buffer, which might succeed and return
garbage.
2023-02-27 10:03:06 -05:00
Alexey Izbyshev
12590c8bbd fix out-of-bounds reads in __dns_parse
There are several issues with range checks in this function:

* The question section parsing loop can read up to two out-of-bounds
  bytes before doing the range check and bailing out.

* The answer section parsing loop, in addition to the same issue as
  above, uses the wrong length in the range check that doesn't prevent
  OOB reads when computing len later.

* The len range check before calling the callback is off by 10. Also,
  p+len can overflow in a (probably theoretical) case when p is within
  2^16 from UINTPTR_MAX.

Because __dns_parse is used only with stack-allocated buffers, such
small overreads can't result in a segfault. The first two also don't
affect the function result, but the last one may result in getaddrinfo
incorrectly succeeding and returning up to 10 bytes past the
response buffer as a part of the IP address, and in (canon) name
returned by getaddrinfo/getnameinfo being affected by memory past the
response buffer (because dn_expand might interpret it as a pointer).
2023-02-27 10:01:29 -05:00
Rich Felker
bc695a5ac1 fix incorrect unit for CPU_SETSIZE macro
this macro is supposed to reflect the number of members (bits) in
cpu_set_t, not the storage size (bytes).
2023-02-23 10:10:44 -05:00
A. Wilcox
7d756e1c04 dns: prefer monotonic clock for timeouts
Before this commit, DNS timeouts always used CLOCK_REALTIME, which
could produce spurious timeouts or delays if wall time changed for
whatever reason.

Now we try CLOCK_MONOTONIC and only fall back to CLOCK_REALTIME when
it is unavailable.
2023-02-12 18:03:24 -05:00
Gabriel Ravier
07616721f1 fix return value of wcs{,n}cmp for extreme wchar_t values
As a result of using simple subtraction to implement the return values
for wcscmp and wcsncmp, integer overflow can occur (producing
undefined behavior, and in practice, a wrong comparison result). This
does not occur for meaningful character values (21-bit range) but the
functions are specified to work on arbitrary wchar_t arrays.

This patch replaces the subtraction with a little bit of code that
orders the characters correctly, returning -1 if the character from
the first string is smaller than the one from the second, 0 if they
are equal and 1 if the character from the first string is larger than
the one from the second.
2023-02-12 17:50:59 -05:00
Szabolcs Nagy
35fdfe62a4 math: fix undefined shift in logf
A signed int shift overflowed when computing a constant mask, use hex
literal instead.  This is unlikely to cause actual issues unless the
code was compiled with ubsan or similar instrumentation specifically
to catch this. The stripped libc.so is unchanged on x86_64.
Reported by q66 on irc.
2023-02-12 17:46:50 -05:00
Alexey Izbyshev
7e13e5ae69 inet_pton: fix uninitialized memory use for IPv4-mapped IPv6 addresses
When a dot is encountered, the loop counter is incremented before
exiting the loop, but the corresponding ip array element is left
uninitialized, so the subsequent memmove (if "::" was seen) and the
loop copying ip to the output buffer will operate on an uninitialized
uint16_t.

The uninitialized data never directly influences the control flow and
is overwritten on successful return by the second half of the parsed
IPv4 address. But it's better to fix this to avoid unexpected
transformations by a sufficiently smart compiler and reports from
UB-detection tools.
2023-02-12 17:42:37 -05:00
Szabolcs Nagy
7e6da7ac98 hsearch: fix null pointer arithmetic UB
htab->__tab->entries pointer may be 0 so delay using it in arithmetics.
this did not cause any known issue other than with ubsan instrumentation.
2023-02-12 17:41:23 -05:00
Colin Cross
f79b973d92 increase sendmsg internal buffer to support SCM_MAX_FD
The kernel defines a limit on the number of fds that can be passed
through an SCM_RIGHTS ancillary message as SCM_MAX_FD. The value was
255 before kernel 2.6.38 (after that it is 253), and an SCM_RIGHTS
ancillary message with 255 fds requires 1040 bytes, slightly more than
the current 1024 byte internal buffer in sendmsg. 1024 is an arbitrary
size, so increase it to match the the arbitrary size limit in the
kernel. This fixes tests that are verifying they support up to
SCM_MAX_FD fds.
2023-02-12 17:38:37 -05:00
Rich Felker
0ab97350f0 mq_notify: block all (application) signals in the worker thread
until the mq notification event arrives, it is mandatory that signals
be blocked. otherwise, a signal can be received, and its handler
executed, in a thread which does not yet exist on the abstract
machine.

after the point of the event arriving, having signals blocked is not a
conformance requirement but a QoI requirement. while the application
can unblock any signals it wants unblocked in the event handler
thread, if they did not start out blocked, it could not block them
without a race window where they are momentarily unblocked, and this
would preclude controlled delivery or other forms of acceptance
(sigwait, etc.) anywhere in the application.
2023-02-12 15:05:39 -05:00
Rich Felker
711673ee77 mq_notify: join worker thread before returning in error path
this avoids leaving behind transient resource consumption whose
cleanup is subject to scheduling behavior.
2023-02-12 15:05:38 -05:00
Rich Felker
8c0c9c69a1 mq_notify: rework to fix use-after-close/double-close bugs
in the error path where the mq_notify syscall fails, the initiating
thread may have closed the socket before the worker thread calls recv
on it. even in the absence of such a race, if the recv call failed,
e.g. due to seccomp policy blocking it, the worker thread could
proceed to close, producing a double-close condition.

this can all be simplified by moving the mq_notify syscall into the
new thread, so that the error case does not require pthread_cancel.
now, the initiating thread only needs to read back the error status
after waiting for the worker thread to consume its arguments.
2023-02-12 15:05:38 -05:00
Rich Felker
fde6891e59 mq_notify: use semaphore instead of barrier to sync args consumption
semaphores are a much lighter primitive, and more idiomatic with
current usage in the code base.
2023-02-11 13:00:37 -05:00
Rich Felker
c3cd04fa5f fix pthread_detach inadvertently acting as cancellation point in race case
disabling cancellation around the pthread_join call seems to be the
safest and logically simplest fix. i believe it would also be possible
to just perform the unmap directly here after __tl_sync, removing the
dependency on pthread_join, but such an approach duplicately encodes a
lot more implementation assumptions.
2023-02-11 13:00:22 -05:00
Rich Felker
115149c023 powerpc-sf longjmp clobbering of val argument
the logic to check hwcap for SPE register file inadvertently clobbered
the val argument before use. switch to a different work register so
this doesn't happen.
2023-02-11 10:00:31 -05:00
Pedro Falcato
5763f003a5 riscv64: add vfork
Implement vfork() using clone(CLONE_VM | CLONE_VFORK | ...).
2023-02-09 12:33:35 -05:00
Rich Felker
269d193820 fix wrong sigaction syscall ABI on mips*, or1k, microblaze, riscv64
we wrongly defined a dummy SA_RESTORER flag on these archs, despite
the kernel interface not actually having such a feature. on archs
which lack SA_RESTORER, the kernel sigaction structure also lacks the
restorer function pointer member, which means the signal mask appears
at a different offset. the kernel was thereby interpreting the bits of
the code address as part of the signal set to be masked while handling
the signal.

this patch removes the erroneous SA_RESTORER definitions from archs
which do not have it, makes access to the member conditional on
whether SA_RESTORER is defined for the arch, and removes the
now-unused asm for the affected archs.

because there are reportedly versions of qemu-user which also use the
wrong ABI here, the old ksigaction struct size is preserved with an
unused member at the end. this is harmless and mitigates the risk of
such a bug turning into a buffer overflow onto the sigaction
function's stack.
2023-02-09 12:33:35 -05:00
Rich Felker
ea3b40a321 fix integer overflow in WIFSTOPPED macro
the result of the 0xffff mask with the exit status could have bit 15
set, in which case multiplying by 0x10001 overflows 32-bit signed int.
making the multiply unsigned avoids the overflow. it also changes the
sign extension behavior of the subsequent >> operation, but the
affected bits are all unwanted anyway and all discarded by the cast to
short.
2023-02-08 16:42:28 -05:00
Rich Felker
f897461d4f fix debugger tracking of shared libraries on mips with PIE main program
mips has its own mechanisms for DT_DEBUG because it makes _DYNAMIC
read-only, and the original mechanism, DT_MIPS_RLD_MAP, was
PIE-incompatible. DT_MIPS_RLD_MAP_REL was added to remedy this, but we
never implemented support for it. add it now using the same idioms for
mips-specific ldso logic.
2023-01-18 10:32:14 -05:00
Rich Felker
a4b0a665b8 expose memmem under baseline POSIX feature profile
memmem has been adopted for the next issue of POSIX (outcome of
tracker item 1061). since mem* is in the reserved namespace for
string.h it's already fully conforming to expose it by default, so
just do so.
2023-01-06 06:33:19 -05:00
Rich Felker
9532ae1318 use libc-internal malloc for pthread_atfork
while no lock is held here making it a lock-order issue, replacement
malloc is likely to want to use pthread_atfork, possibly making the
call to malloc infinitely recursive.

even if not, there is no reason to prefer an application-provided
malloc here.
2022-12-17 16:00:19 -05:00
Markus Wichmann
7d358599d4 prevent invalid reads of nl_arg in printf_core
printf_core() runs twice, and during its first run, nl_arg is
uninitialized and must not be read. It gets initialized at the end of
the first run. Conversely, nl_type does not need to be set during the
second run, as its useful life has ended at that point, since the only
time it is read is during that exact same initialization. Therefore we
can simply alternate the assignments.

p and w do still need to get values assigned to them, since at least one
line in the same if-statement depends on that, but they can be dummy
values. arg does not need to be assigned, since in the first run, we
encounter a continue statement before using the argument.
2022-12-14 10:03:37 -05:00
Fangrui Song
c5f4b2dfea elf.h: add ELFCOMPRESS_ZSTD 2022-12-14 09:34:32 -05:00
Rich Felker
159d1f6c02 semaphores: fix missed wakes from ABA bug in waiter count logic
because the has-waiters state in the semaphore value futex word is
only representable when the value is zero (the special value -1
represents "0 with potential new waiters"), it's lost if intervening
operations make the semaphore value positive again. this creates an
ABA issue in sem_post, whereby the post uses a stale waiters count
rather than re-evaluating it, skipping the futex wake if the stale
count was zero.

the fix here is based on a proposal by Alexey Izbyshev, with minor
changes to eliminate costly new spurious wake syscalls.

the basic idea is to replace the special value -1 with a sticky
waiters bit (repurposing the sign bit) preserved under both wait and
post. any post that takes place with the waiters bit set will perform
a futex wake.

to be useful, the waiters bit needs to be removable, and to remove it
safely, we perform a broadcast wake instead of a normal single-task
wake whenever removing the bit. this lets any un-accounted-for waiters
wake and re-add the waiters bit if they still need it.

there are multiple possible choices for when to perform this
broadcast, but the optimal choice seems to be doing it whenever the
observed waiters count is less than two (semantically, this means
exactly one, but we might see a stale count of zero). in this case,
the expected number of threads to be woken is one, with exactly the
same cost as a non-broadcast wake.
2022-12-13 18:39:44 -05:00
Rich Felker
f47a8cdd25 ldso: fix invalid early references to extern-linkage libc.page_size
when PAGE_SIZE is not constant, internal/libc.h defines it to expand
to libc.page_size. however, kernel_mapped_dso, reachable from stage 2
of the dynamic linker bootstrap (__dls2), needs PAGE_SIZE to interpret
the relro range. at this point the libc object is both uninitialized
and invalid to access according to our model for bootstrapping, which
does not assume any external-linkage objects are accessible until
stages 2b/3. in practice it likely worked because hidden visibility
tends to behave like internal linkage, but this is not a property that
the dynamic linker was designed to rely upon.

this bug likely manifested as relro malfunction on archs with variable
page size, due to incorrect mask when aligning the relro bounds to
page boundaries.

while there are certainly more direct ways to fix the known problem
point here, a maximally future-proof way is to just bypass the libc.h
PAGE_SIZE definition in the dynamic linker and instead have dynlink.c
define its own internal-linkage object for variable page size. then,
if anything else in stage 2 ever ends up referencing PAGE_SIZE, it
will just automatically work right.
2022-11-30 19:07:34 -05:00
Alexey Izbyshev
377218cb96 pthread_atfork: fix return value on malloc failure
POSIX requires pthread_atfork to report errors via its return value,
not via errno. The only specified error is ENOMEM.
2022-11-12 12:22:38 -05:00
Rich Felker
29e4319178 fix double-processing of DT_RELR relocations in ldso relocating itself
this is analogous to skip_relative logic in do_relocs -- because
relative relocations for the dynamic linker itself were already
performed at entry (stage 1), they must not be applied again.
2022-11-10 09:02:02 -05:00
Rich Felker
b50eb8c36c fix strverscmp comparison of digit sequence with non-digits
the rule that longest digit sequence not beginning with a zero is
greater only applies when both sequences being compared are
non-degenerate. this is spelled out explicitly in the man page, which
may be deemed authoritative for this nonstandard function: "If one or
both of these is empty, then return what strcmp(3) would have
returned..."

we were wrongly treating any sequence of digits not beginning with a
zero as greater than a non-digit in the other string.
2022-11-07 22:33:24 -05:00
Rich Felker
ad5dcd398b fix async thread cancellation stack alignment
if async cancellation is enabled and acted upon, the stack pointer is
not necessarily pointing to a __syscall_cp_asm stack frame. the
contents of the stack being wrong don't really matter, but if the
stack pointer is not suitably aligned, the procedure call ABI is
violated when calling back into C code via __cancel, and pthread_exit,
cancellation cleanup handlers, TSD destructors, etc. may malfunction
or crash.

for the async cancel case, just call __cancel directly like we did
prior to commit 102f6a01e2. restore the
signal mask prior to doing this since the cancellation handler runs
with all signals blocked.
2022-11-05 18:59:53 -04:00
Rich Felker
8f9259450a fix return value of gethostby{name[2],addr} with no result but no error
commit f081d5336a fixed
gethostbyname[2]_r to treat negative results as a non-error, leaving
gethostbyname[2] wrongly returning a pointer to the unfilled result
buffer rather than a null pointer. since, as documented with commit
fe82bb9b92, the caller of
gethostby{name[2],addr}_r can always rely on the result pointer being
set, use that consistently rather than trying to duplicate logic about
whether we have a result or not in gethostby{name[2],addr}.
2022-10-20 19:48:32 -04:00
Rich Felker
63402be229 clean up dns_parse_callback
the only functional change here should be that MAXADDRS is only
checked for RRs that provide address results, so that a CNAME which
appears after an excessive number of address RRs does not get ignored.
I'm not aware of any servers that order the RRs this way, and it may
even be forbidden to do so, but I prefer having the callback logic not
be order dependent.

other than that, the motivation for this change is that the A and AAAA
cases were mostly duplicate code that could be combined as a single
code path.
2022-10-19 14:02:48 -04:00
Rich Felker
0a7b4323b0 dns response handling: don't treat too many addresses as an error
returning -1 rather than 0 from the parse function causes __dns_parse
to bail out and return an error. presently, name_from_dns does not
check the return value anyway, so this does not matter, but if it ever
started treating this as an error, lookups with large numbers of
addresses would break. this is a consequence of adding TCP support and
extending the buffer size used in name_from_dns.
2022-10-19 14:01:45 -04:00
Rich Felker
41603c7706 dns response handling: ignore presence of wrong-type RRs
reportedly there is nameserver software with question-rewriting
"functionality" which gives A answers when AAAA is queried. since we
made no effort to validate that the answer RR type actually
corresponds to the question asked, it was possible (depending on
flags, etc.) for these answers to leak through, which the caller might
not be prepared for. indeed, our implementation of gethostbyname2_r
makes an assumption that the resulting addresses are in the family
requested, and will misinterpret the results if they don't.

commit 45ca5d3fcb already noted in
fixing CVE-2017-15650 that this could happen, but did nothing to
validate that the RR type of the answer matches the question; it just
enforced the limit on number of results to preclude overflow.

presently, name_from_dns ignores the return value of __dns_parse, so
it doesn't really matter whether we return 0 (ignoring the RR) or -1
(parse-ending error) upon encountering the mismatched RR. if that ever
changes, though, ignoring irrelevant answer RRs sounds like the
semantically correct thing to do, so for now let's return 0 from the
callback when this happens.
2022-10-19 14:01:32 -04:00
Rich Felker
cf76df0e1f fix missing synchronization of pthread TSD keys with MT-fork
commit 167390f055 seems to have
overlooked the presence of a lock here, probably because it was one of
the exceptions not using LOCK() but a rwlock.

as such, it can't be added to the generic table of locks to take, so
add an explicit atfork function for the pthread keys table. the order
it is called does not particularly matter since nothing else in libc
but pthread_exit interacts with keys.
2022-10-19 14:01:32 -04:00
Rich Felker
5ff3eea91f fgets: avoid arithmetic overflow when n==INT_MIN is passed
performing n-- is not a safe operation for arbitrary signed input n.
only perform the decrement in the code path where the initial n is
greater than 1, and adjust the condition in the n<=1 code path to
compensate for it not having been decremented.
2022-10-19 14:01:32 -04:00
Rich Felker
d8f35e29d0 fix AS-safety of close when aio is in use and fd map is expanded
the aio operations that lead to calling __aio_get_queue with the
possibility to expand the fd map are not AS-safe, but if they are
interrupted by a signal handler, the signal handler may call close,
which is required to be AS-safe. due to __aio_get_queue taking the
write lock without blocking signals, such a call to close from a
signal handler could deadlock.

change __aio_get_queue to block signals if it needs to obtain a write
lock, and restore when finished.
2022-10-19 14:01:32 -04:00
Alexey Izbyshev
26c76a908b fix use of uninitialized dummy_fut in aio_suspend
aio_suspend waits on a dummy futex in the corner case when the array of
requests contains NULL pointers only. But the value of this futex was
left uninitialized, so if it happens to be non-zero, aio_suspend
degrades to spinning instead of blocking.
2022-10-19 14:01:32 -04:00
Rich Felker
aebd6a3644 fix potential deadlock between multithreaded fork and aio
as reported by Alexey Izbyshev, there is a lock order inversion
deadlock between the malloc lock and aio maplock at MT-fork time:
_Fork attempts to take the aio maplock while fork already has the
malloc lock, but a concurrent aio operation holding the maplock may
attempt to allocate memory.

move the __aio_atfork calls in the parent from _Fork to fork, and
reorder the lock before most other locks, since nothing else depends
on aio(*). this leaves us with the possibility that the child will not
be able to obtain the read lock, if _Fork is used directly and happens
concurrent with an aio operation. however, in that case, the child
context is an async signal context that cannot call any further aio
functions, so all we need is to ensure that close does not attempt to
perform any aio cancellation. this can be achieved just by nulling out
the map pointer.

(*) even if other functions call close, they will only need a read
lock, not a write lock, and read locks being recursive ensures they
can obtain it. moreover, the number of read references held is bounded
by something like twice the number of live threads, meaning that the
read lock count cannot saturate.
2022-10-19 14:01:32 -04:00
Rich Felker
d64148a874 fix potential unsynchronized access to killlock state at thread exit
as reported by Alexey Izbyshev, when the second-to-last thread exits
causing a return to single-threaded (no locks needed) state, it
creates a situation where the last remaining thread may obtain the
killlock that's already held by the exiting thread. this means it may
erroneously use the tid of the exiting thread, and may corrupt the
lock state due to double-unlock.

commit 8d81ba8c0b, which (re)introduced
the switch back to single-threaded state, documents the intent that
the first lock after switching back should provide the necessary
synchronization. this is correct, but only works if the switch back is
made after there is no further need for synchronization with locks
(other than the thread list lock, which can't be bypassed) held by the
exiting thread.

in order to hit the bug, the remaining thread must first take a
different lock, causing it to perform an actual lock one last time,
consume the need_locks==-1 state, and transition to need_locks==0.
after that, the next attempt to lock the exiting thread's killlock
will bypass locking.

fix this by reordering the unlocking of killlock at thread exit time,
along with changes to the state protected by it, to occur earlier,
before the switch to single-threaded state. there are really no
constraints on where it's done, except that it occur after there is no
longer any possibility of application code executing in the exiting
thread, so do it as early as possible.
2022-10-19 14:01:32 -04:00