Commit Graph

65 Commits

Author SHA1 Message Date
Rich Felker
10d0268ccf switch to using trap number 31 for syscalls on sh
nominally the low bits of the trap number on sh are the number of
syscall arguments, but they have never been used by the kernel, and
some code making syscalls does not even know the number of arguments
and needs to pass an arbitrary high number anyway.

sh3/sh4 traditionally used the trap range 16-31 for syscalls, but part
of this range overlapped with hardware exceptions/interrupts on sh2
hardware, so an incompatible range 32-47 was chosen for sh2.

using trap number 31 everywhere, since it's in the existing sh3/sh4
range and does not conflict with sh2 hardware, is a proposed
unification of the kernel syscall convention that will allow binaries
to be shared between sh2 and sh3/sh4. if this is not accepted into the
kernel, we can refit the sh2 target with runtime selection mechanisms
for the trap number, but doing so would be invasive and would entail
non-trivial overhead.
2015-06-16 15:25:02 +00:00
Rich Felker
ec634aad91 add sh asm for vfork 2015-06-11 05:01:04 +00:00
Rich Felker
19a1fe670a remove remnants of support for running in no-thread-pointer mode
since 1.1.0, musl has nominally required a thread pointer to be setup.
most of the remaining code that was checking for its availability was
doing so for the sake of being usable by the dynamic linker. as of
commit 71f099cb7d, this is no longer
necessary; the thread pointer is now valid before any libc code
(outside of dynamic linker bootstrap functions) runs.

this commit essentially concludes "phase 3" of the "transition path
for removing lazy init of thread pointer" project that began during
the 1.1.0 release cycle.
2015-04-13 19:24:51 -04:00
Rich Felker
4e98cce1c5 optimize out setting up robust list with kernel when not needed
as a result of commit 12e1e32468, kernel
processing of the robust list is only needed for process-shared
mutexes. previously the first attempt to lock any owner-tracked mutex
resulted in robust list initialization and a set_robust_list syscall.
this is no longer necessary, and since the kernel's record of the
robust list must now be cleared at thread exit time for detached
threads, optimizing it out is more worthwhile than before too.
2015-04-10 00:54:48 -04:00
Rich Felker
14a0117117 make execvp continue PATH search on EACCES rather than issuing an errror
the specification for execvp itself is unclear as to whether
encountering a file that cannot be executed due to EACCES during the
PATH search is a mandatory error condition; however, XBD 8.3's
specification of the PATH environment variable clarifies that the
search continues until a file with "appropriate execution permissions"
is found.

since it seems undesirable/erroneous to report ENOENT rather than
EACCES when an early path element has a non-executable file and all
later path elements lack any file by the requested name, the new code
stores a flag indicating that EACCES was seen and sets errno back to
EACCES in this case.
2015-02-03 00:31:35 -05:00
Rich Felker
8f7bc690f0 use direct syscall rather than write function in posix_spawn child
the write function is a cancellation point and accesses thread-local
state belonging to the calling thread in the parent process. since
cancellation is blocked for the duration of posix_spawn, this is
probably safe, but it's fragile and unnecessary. making the syscall
directly is just as easy and clearly safe.
2014-12-05 21:19:39 -05:00
Rich Felker
1c12c24364 don't fail posix_spawn on failed close
the resolution of austin group issue #370 removes the requirement that
posix_spawn fail when the close file action is performed on an
already-closed fd. since there are no other meaningful errors for
close, just ignoring the return value completely is the simplest fix.
2014-12-05 21:15:41 -05:00
Rich Felker
83dc6eb087 eliminate use of cached pid from thread structure
the main motivation for this change is to remove the assumption that
the tid of the main thread is also the pid of the process. (the value
returned by the set_tid_address syscall was used to fill both fields
despite it semantically being the tid.) this is historically and
presently true on linux and unlikely to change, but it conceivably
could be false on other systems that otherwise reproduce the linux
syscall api/abi.

only a few parts of the code were actually still using the cached pid.
in a couple places (aio and synccall) it was a minor optimization to
avoid a syscall. caching could be reintroduced, but lazily as part of
the public getpid function rather than at program startup, if it's
deemed important for performance later. in other places (cancellation
and pthread_kill) the pid was completely unnecessary; the tkill
syscall can be used instead of tgkill. this is actually a rather
subtle issue, since tgkill is supposedly a solution to race conditions
that can affect use of tkill. however, as documented in the commit
message for commit 7779dbd266, tgkill
does not actually solve this race; it just limits it to happening
within one process rather than between processes. we use a lock that
avoids the race in pthread_kill, and the use in the cancellation
signal handler is self-targeted and thus not subject to tid reuse
races, so both are safe regardless of which syscall (tgkill or tkill)
is used.
2014-07-05 23:29:55 -04:00
Rich Felker
0b3d33d4d2 fix ungrammatical comment in posix_spawn code 2014-07-01 18:32:52 -04:00
Rich Felker
5cf9e8f860 additional fixes for linux kernel apis with old syscalls removed 2014-05-30 01:51:23 -04:00
Rich Felker
dd5f50da6f support linux kernel apis (new archs) with old syscalls removed
such archs are expected to omit definitions of the SYS_* macros for
syscalls their kernels lack from arch/$ARCH/bits/syscall.h. the
preprocessor is then able to select the an appropriate implementation
for affected functions. two basic strategies are used on a
case-by-case basis:

where the old syscalls correspond to deprecated library-level
functions, the deprecated functions have been converted to wrappers
for the modern function, and the modern function has fallback code
(omitted at the preprocessor level on new archs) to make use of the
old syscalls if the new syscall fails with ENOSYS. this also improves
functionality on older kernels and eliminates the incentive to program
with deprecated library-level functions for the sake of compatibility
with older kernels.

in other situations where the old syscalls correspond to library-level
functions which are not deprecated but merely lack some new features,
such as the *at functions, the old syscalls are still used on archs
which support them. this may change at some point in the future if or
when fallback code is added to the new functions to make them usable
(possibly with reduced functionality) on old kernels.
2014-05-29 21:01:32 -04:00
Rich Felker
594c827a22 support kernels with no SYS_open syscall, only SYS_openat
open is handled specially because it is used from so many places, in
so many variants (2 or 3 arguments, setting errno or not, and
cancellable or not). trying to do it as a function would not only
increase bloat, but would also risk subtle breakage.

this is the first step towards supporting "new" archs where linux
lacks "old" syscalls.
2014-05-24 22:54:05 -04:00
M Farkas-Dyck
164c5c7a32 expose public execvpe interface 2014-04-20 00:26:55 -04:00
Rich Felker
dab441aea2 always initialize thread pointer at program start
this is the first step in an overhaul aimed at greatly simplifying and
optimizing everything dealing with thread-local state.

previously, the thread pointer was initialized lazily on first access,
or at program startup if stack protector was in use, or at certain
random places where inconsistent state could be reached if it were not
initialized early. while believed to be fully correct, the logic was
fragile and non-obvious.

in the first phase of the thread pointer overhaul, support is retained
(and in some cases improved) for systems/situation where loading the
thread pointer fails, e.g. old kernels.

some notes on specific changes:

- the confusing use of libc.main_thread as an indicator that the
  thread pointer is initialized is eliminated in favor of an explicit
  has_thread_pointer predicate.

- sigaction no longer needs to ensure that the thread pointer is
  initialized before installing a signal handler (this was needed to
  prevent a situation where the signal handler caused the thread
  pointer to be initialized and the subsequent sigreturn cleared it
  again) but it still needs to ensure that implementation-internal
  thread-related signals are not blocked.

- pthread tsd initialization for the main thread is deferred in a new
  manner to minimize bloat in the static-linked __init_tp code.

- pthread_setcancelstate no longer needs special handling for the
  situation before the thread pointer is initialized. it simply fails
  on systems that cannot support a thread pointer, which are
  non-conforming anyway.

- pthread_cleanup_push/pop now check for missing thread pointer and
  nop themselves out in this case, so stdio no longer needs to avoid
  the cancellable path when the thread pointer is not available.

a number of cases remain where certain interfaces may crash if the
system does not support a thread pointer. at this point, these should
be limited to pthread interfaces, and the number of such cases should
be fewer than before.
2014-03-24 16:57:11 -04:00
rofl0r
664cd34192 x32 port (diff against vanilla x86_64) 2014-02-23 11:09:16 +01:00
rofl0r
323272db17 import vanilla x86_64 code as x32 2014-02-23 11:07:18 +01:00
Rich Felker
8011614da0 make posix_spawn accept null pid pointer arguments
this is a requirement in the specification that was overlooked.
2014-02-12 01:03:07 -05:00
Szabolcs Nagy
571744447c include cleanups: remove unused headers and add feature test macros 2013-12-12 05:09:18 +00:00
Szabolcs Nagy
c3a43b35cc add missing va_end in execl* for correcness and static code analyzers 2013-10-07 13:24:00 +00:00
Rich Felker
2b2aff37ac fix new environment always being null with execle
the va_arg call for the argv[]-terminating null pointer was missing,
so this pointer was being wrongly used as the environment pointer.

issue reported by Timo Teräs. proposed patch slightly modified to
simplify the resulting code.
2013-10-03 10:16:01 -04:00
Rich Felker
3c5c5e6f92 optimize posix_spawn to avoid spurious sigaction syscalls
the trick here is that sigaction can track for us which signals have
ever had a signal handler set for them, and only those signals need to
be considered for reset. this tracking mask may have false positives,
since it is impossible to remove bits from it without race conditions.
false negatives are not possible since the mask is updated with atomic
operations prior to making the sigaction syscall.

implementation-internal signals are set to SIG_IGN rather than SIG_DFL
so that a signal raised in the parent (e.g. calling pthread_cancel on
the thread executing pthread_spawn) does not have any chance make it
to the child, where it would cause spurious termination by signal.

this change reduces the minimum/typical number of syscalls in the
child from around 70 to 4 (including execve). this should greatly
improve the performance of posix_spawn and other interfaces which use
it (popen and system).

to facilitate these changes, sigismember is also changed to return 0
rather than -1 for invalid signals, and to return the actual status of
implementation-internal signals. POSIX allows but does not require an
error on invalid signal numbers, and in fact returning an error tends
to confuse applications which wrongly assume the return value of
sigismember is boolean.
2013-08-09 21:03:47 -04:00
Rich Felker
65d7aa4dfd fix missing errno from exec failure in posix_spawn
failures prior to the exec attempt were reported correctly, but on
exec failure, the return value contained junk.
2013-08-09 20:04:05 -04:00
Rich Felker
d4d6d6f322 block signals during fork
there are several reasons for this. some of them are related to race
conditions that arise since fork is required to be async-signal-safe:
if fork or pthread_create is called from a signal handler after the
fork syscall has returned but before the subsequent userspace code has
finished, inconsistent state could result. also, there seem to be
kernel and/or strace bugs related to arrival of signals during fork,
at least on some versions, and simply blocking signals eliminates the
possibility of such bugs.
2013-08-08 23:17:05 -04:00
Rich Felker
c8c0844f7f debloat code that depends on /proc/self/fd/%d with shared function
I intend to add more Linux workarounds that depend on using these
pathnames, and some of them will be in "syscall" functions that, from
an anti-bloat standpoint, should not depend on the whole snprintf
framework.
2013-08-02 12:59:45 -04:00
Rich Felker
b06dc66639 make posix_spawn (and functions that use it) use CLONE_VFORK flag
this is both a minor scheduling optimization and a workaround for a
difficult-to-fix bug in qemu app-level emulation.

from the scheduling standpoint, it makes no sense to schedule the
parent thread again until the child has exec'd or exited, since the
parent will immediately block again waiting for it.

on the qemu side, as regular application code running on an underlying
libc, qemu cannot make arbitrary clone syscalls itself without
confusing the underlying implementation. instead, it breaks them down
into either fork-like or pthread_create-like cases. it was treating
the code in posix_spawn as pthread_create-like, due to CLONE_VM, which
caused horribly wrong behavior: CLONE_FILES broke the synchronization
mechanism, CLONE_SIGHAND broke the parent's signals, and CLONE_THREAD
caused the child's exec to end the parent -- if it hadn't already
crashed. however, qemu special-cases CLONE_VFORK and emulates that
with fork, even when CLONE_VM is also specified. this also gives
incorrect semantics for code that really needs the memory sharing, but
posix_spawn does not make use of the vm sharing except to avoid
momentary double commit charge.

programs using posix_spawn (including via popen) should now work
correctly under qemu app-level emulation.
2013-07-17 13:54:41 -04:00
Rich Felker
a0473a0c82 remove explicit locking to prevent __synccall setuid during posix_spawn
for the duration of the vm-sharing clone used by posix_spawn, all
signals are blocked in the parent process, including
implementation-internal signals. since __synccall cannot do anything
until successfully signaling all threads, the fact that signals are
blocked automatically yields the necessary safety.

aside from debloating and general simplification, part of the
motivation for removing the explicit lock is to simplify the
synchronization logic of __synccall in hopes that it can be made
async-signal-safe, which is needed to make setuid and setgid, which
depend on __synccall, conform to the standard. whether this will be
possible remains to be seen.
2013-04-26 15:09:49 -04:00
Rich Felker
7914ce9204 remove cruft from pre-posix_spawn version of the system function 2013-03-24 22:40:54 -04:00
Rich Felker
23ccb80fcb consistently use the internal name __environ for environ
patch by Jens Gustedt.
previously, the intended policy was to use __environ in code that must
conform to the ISO C namespace requirements, and environ elsewhere.
this policy was not followed in practice anyway, making things
confusing. on top of that, Jens reported that certain combinations of
link-time optimization options were breaking with the inconsistent
references; this seems to be a compiler or linker bug, but having it
go away is a nice side effect of the changes made here.
2013-02-17 14:24:39 -05:00
Rich Felker
a8799356d5 base system() on posix_spawn
this avoids duplicating the fragile logic for executing an external
program without fork.
2013-02-03 18:21:04 -05:00
Rich Felker
4862864fc1 fix unsigned comparison bug in posix_spawn
read should never return anything but 0 or sizeof ec here, but if it
does, we want to treat any other return as "success". then the caller
will get back the pid and is responsible for waiting on it when it
immediately exits.
2013-02-03 17:09:47 -05:00
Rich Felker
fb6b159d9e overhaul posix_spawn to use CLONE_VM instead of vfork
the proposed change was described in detail in detail previously on
the mailing list. in short, vfork is unsafe because:

1. the compiler could make optimizations that cause the child to
clobber the parent's local vars.

2. strace is buggy and allows the vforking parent to run before the
child execs when run under strace.

the new design uses a close-on-exec pipe instead of vfork semantics to
synchronize the parent and child so that the parent does not return
before the child has finished using its arguments (and now, also its
stack). this also allows reporting exec failures to the caller instead
of giving the caller a child that mysteriously exits with status 127
on exec error.

basic testing has been performed on both the success and failure code
paths. further testing should be done.
2013-02-03 16:42:40 -05:00
Rich Felker
96fbcf7d80 fix up minor misplacement of restrict keyword in spawnattr sched stubs 2013-02-01 15:57:28 -05:00
Rich Felker
1e21e78bf7 add support for thread scheduling (POSIX TPS option)
linux's sched_* syscalls actually implement the TPS (thread
scheduling) functionality, not the PS (process scheduling)
functionality which the sched_* functions are supposed to have.
omitting support for the PS option (and having the sched_* interfaces
fail with ENOSYS rather than omitting them, since some broken software
assumes they exist) seems to be the only conforming way to do this on
linux.
2012-11-11 15:38:04 -05:00
Rich Felker
efd4d87aa4 clean up sloppy nested inclusion from pthread_impl.h
this mirrors the stdio_impl.h cleanup. one header which is not
strictly needed, errno.h, is left in pthread_impl.h, because since
pthread functions return their error codes rather than using errno,
nearly every single pthread function needs the errno constants.

in a few places, rather than bringing in string.h to use memset, the
memset was replaced by direct assignment. this seems to generate much
better code anyway, and makes many functions which were previously
non-leaf functions into leaf functions (possibly eliminating a great
deal of bloat on some platforms where non-leaf functions require ugly
prologue and/or epilogue).
2012-11-08 17:04:20 -05:00
Rich Felker
76f28cfce5 system is a cancellation point
ideally, system would also be cancellable while running the external
command, but I cannot find any way to make that work without either
leaking zombie processes or introducing behavior that is far outside
what the standard specifies. glibc handles cancellation by killing the
child process with SIGKILL, but this could be unsafe in that it could
leave the data being manipulated by the command in an inconsistent
state.
2012-10-28 21:17:45 -04:00
Rich Felker
599f973603 fix usage of locks with vfork
__release_ptc() is only valid in the parent; if it's performed in the
child, the lock will be unlocked early then double-unlocked later,
corrupting the lock state.
2012-10-19 15:02:37 -04:00
Rich Felker
97c8bdd88a fix parent-memory-clobber in posix_spawn (environ) 2012-10-18 16:41:27 -04:00
Rich Felker
44eb4d8b9b overhaul system() and popen() to use vfork; fix various related bugs
since we target systems without overcommit, special care should be
taken that system() and popen(), like posix_spawn(), do not fail in
processes whose commit charges are too high to allow ordinary forking.

this in turn requires special precautions to ensure that the parent
process's signal handlers do not end up running in the shared-memory
child, where they could corrupt the state of the parent process.

popen has also been updated to use pipe2, so it does not have a
fd-leak race in multi-threaded programs. since pipe2 is missing on
older kernels, (non-atomic) emulation has been added.

some silly bugs in the old code should be gone too.
2012-10-18 15:58:23 -04:00
Rich Felker
d5304147b9 block uid/gid changes during posix_spawn
usage of vfork creates a situation where a process of lower privilege
may momentarily have write access to the memory of a process of higher
privilege.

consider the case of a multi-threaded suid program which is calling
posix_spawn in one thread while another thread drops the elevated
privileges then runs untrusted (relative to the elevated privilege)
code as the original invoking user. this untrusted code can then
potentially modify the data the child process will use before calling
exec, for example changing the pathname or arguments that will be
passed to exec.

note that if vfork is implemented as fork, the lock will not be held
until the child execs, but since memory is not shared it does not
matter.
2012-10-15 11:42:46 -04:00
Rich Felker
d62f4e9888 use vfork if possible in posix_spawn
vfork is implemented as the fork syscall (with no atfork handlers run)
on archs where it is not available, so this change does not introduce
any change in behavior or regression for such archs.
2012-09-14 15:32:51 -04:00
Rich Felker
400c5e5c83 use restrict everywhere it's required by c99 and/or posix 2008
to deal with the fact that the public headers may be used with pre-c99
compilers, __restrict is used in place of restrict, and defined
appropriately for any supported compiler. we also avoid the form
[restrict] since older versions of gcc rejected it due to a bug in the
original c99 standard, and instead use the form *restrict.
2012-09-06 22:44:55 -04:00
Rich Felker
4cf667c9c9 x86_64 vfork implementation
untested; should work.
2012-02-06 18:23:11 -05:00
Rich Felker
2bb75db611 support vfork on i386 2011-10-14 23:56:31 -04:00
Rich Felker
768f39a535 make available a namespace-safe vfork, if supported
this may be useful to posix_spawn..?
2011-10-14 23:34:12 -04:00
Rich Felker
4b87736998 fix various bugs in path and error handling in execvp/fexecve 2011-09-29 00:48:04 -04:00
Rich Felker
13cd969552 fix various errors in function signatures/prototypes found by nsz 2011-09-13 21:09:35 -04:00
Rich Felker
0f1ef81462 add missing posix_spawnattr_init/destroy functions 2011-09-13 14:45:59 -04:00
Rich Felker
98acf04fc0 use weak aliases rather than function pointers to simplify some code 2011-08-06 20:09:51 -04:00
Rich Felker
4ec07e1f60 ensure in fork that child gets its own new robust mutex list 2011-07-16 23:17:17 -04:00
Rich Felker
f48832ee15 fix backwards posix_spawn file action order 2011-05-29 12:58:02 -04:00