add two ioctls to get and set struct epoll_params to allow users to
control epoll based busy polling of network sockets.
added to uapi in commit 18e2bf0edf4dd88d9656ec92395aa47392e85b61 (Linux
kernel 6.9 and newer).
previously, only a few archs defined it here. this change makes the
presence consistent across all archs, and reduces the amount of header
duplication (and potential for future inconsistency) between archs.
added in linux kernel commit 73fa7547c70b32cc69685f79be31135797734eb6.
this is added now as a prerequisite for fixing pwrite/pwritev behavior
for O_APPEND files.
This is the only missing part in struct statvfs. The LSB calls
[f]statfs() deprecated, and its weird types are definitely
off-putting. However, its use is required to get f_type.
Instead, allocate one of the six spares to f_type, copied directly
from struct statfs. This then becomes a small extension to the
standard interface on Linux, instead of two different interfaces, one
of which is quite odd due to being an ABI type, and there no longer is
any reason to use statfs().
The underlying kernel type is a mess, but all architectures agree on u32
(or more) for the ABI, and all filesystem magicks are 32-bit integers.
Since commit 6567db65f4 (prior to
1.0.0), the spare slots have been zero-filled, so on all versions that
may be reasonably be encountered in the wild, applications can rely on
a nonzero f_type as indication that the new field has been filled in.
the result of the 0xffff mask with the exit status could have bit 15
set, in which case multiplying by 0x10001 overflows 32-bit signed int.
making the multiply unsigned avoids the overflow. it also changes the
sign extension behavior of the subsequent >> operation, but the
affected bits are all unwanted anyway and all discarded by the cast to
short.
these badly pollute the namespace with macros whenever _GNU_SOURCE is
defined, which is always the case with g++, and especially tends to
interfere with C++ constructs.
as our implementation of these was macro-only, their removal cannot
affect any existing binaries. at the source level, portable software
should be prepared for them not to exist.
for now, they are left in place with explicit _LARGEFILE64_SOURCE.
this provides an easy temporary path for integrators/distributions to
get packages building again right away if they break while working on
a proper, upstreamable fix. the intent is that this be a very
short-term measure and that the macros be removed entirely in the next
release cycle.
see
linux commit 90f093fa8ea48e5d991332cee160b761423d55c1
rseq, ptrace: Add PTRACE_GET_RSEQ_CONFIGURATION request
the struct type got __ prefix to follow existing practice.
see
linux commit 1446e1df9eb183fdf81c3f0715402f1d7595d4cb
kernel: Implement selective syscall userspace redirection
linux commit 36a6c843fd0d8e02506681577e96dabd203dd8e8
entry: Use different define for selector variable in SUD
redirect syscalls to a userspace handler via SIGSYS, except for a specific
range of code. can be toggled via a memory write to a selector variable.
mainly for wine.
these are for the aarch64 MTE (memory tagging extension), see
linux commit 1c101da8b971a36695319dce7a24711dc567a0dd
arm64: mte: Allow user control of the tag check mode via prctl()
linux commit af5ce95282dc99d08a27a407a02c763dde1c5558
arm64: mte: Allow user control of the generated random tags via prctl()
path resolution does not follow symlinks on nosymfollow mounts (but
readlink still does), see
linux commit dab741e0e02bd3c4f5e2e97be74b39df2523fc6e
Add a "nosymfollow" mount option.
can cause rseq restart on another cpu to synchronize with global
memory access from rseq critical sections, see
linux commit 2a36ab717e8fe678d98f81c14a0b124712719840
rseq/membarrier: Add MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ
Update fanotify.h, see
linux commit 929943b38daf817f2e6d303ea04401651fc3bc05
fanotify: add support for FAN_REPORT_NAME
linux commit 83b7a59896dd24015a34b7f00027f0ff3747972f
fanotify: add basic support for FAN_REPORT_DIR_FID
linux commit 08b95c338e0c5a96e47f4ca314ea1e7580ecb5d7
fanotify: remove event FAN_DIR_MODIFY
FAN_DIR_MODIFY that was new in v5.7 is now removed from linux uapi,
but kept in musl, so we don't break api, linux cannot reuse the
value anyway.
it remaps anon mappings without unmapping the original. chromeos plans
to use it with userfaultfd, see:
linux commit e346b3813067d4b17383f975f197a9aa28a3b077
mm/mremap: add MREMAP_DONTUNMAP to mremap()
see
linux commit 9e2ba2c34f1922ca1e0c7d31b30ace5842c2e7d1
fanotify: send FAN_DIR_MODIFY event flavor with dir inode and name
linux commit 44d705b0370b1d581f46ff23e5d33e8b5ff8ec58
fanotify: report name info for FAN_DIR_MODIFY event
needed for storage drivers with userspace component that may
run in the IO path, see
linux commit 8d19f1c8e1937baf74e1962aae9f90fa3aeab463
prctl: PR_{G,S}ET_IO_FLUSHER to support controlling memory reclaim
putting the (simple) definition in alltypes.h seems like the best
solution here. making sys/ioctl.h implicitly include termios.h is
probably excess namespace pollution.
these have been adopted for future issue of POSIX as the outcome of
Austin Group issue 1151, and are simply functions performing the roles
of the historical ioctls. since struct winsize is being standardized
along with them, its definition is moved to the appropriate header.
there is some chance this will break source files that expect struct
winsize to be defined by sys/ioctl.h without including termios.h. if
this happens, further changes will be needed to have sys/ioctl.h
expose it too.
_POSIX_VDISABLE is only visible if unistd.h has already been included,
so conditional use of it here makes no sense. the value is always 0
anyway; it does not vary.
this change was discussed on the mailing list thread for the linux
uapi v5.3 patches, and submitted as a v2 patch, but overlooked when I
applied the patches much later.
revert commit f291c09ec9 and apply the
v2 as submitted; the net change is just padding.
notes by Szabolcs Nagy follow:
compared to the linux uapi (and glibc) a padding is used instead of
aligned attribute for keeping the layout the same across targets, this
means the alignment of the struct may be different on some targets
(e.g. m68k where uint64_t is 2 byte aligned) but that should not affect
syscalls and this way the abi does not depend on nonstandard extensions.
allows waiting on a pidfd, in the future it might allow retrieving the
exit status by a non-parent process, see
linux commit 3695eae5fee0605f316fbaad0b9e3de791d7dfaf
pidfd: add P_PIDFD to waitid()
per thread prctl commands to relax the syscall abi such that top bits
of user pointers are ignored in the kernel. this allows the use of
those bits by hwasan or by mte to color pointers and memory on aarch64:
linux commit 63f0c60379650d82250f22e4cf4137ef3dc4f43d
arm64: Introduce prctl() options to control the tagged user addresses ABI
These were mainly introduced so android can optimize the memory usage
of unused apps.
MADV_COLD hints that the memory range is currently not needed (unlike
with MADV_FREE the content is not garbage, it needs to be swapped):
linux commit 9c276cc65a58faf98be8e56962745ec99ab87636
mm: introduce MADV_COLD
MADV_PAGEOUT hints that the memory range is not needed for a long time
so it can be reclaimed immediately independently of memory pressure
(unlike with MADV_DONTNEED the content is not garbage):
linux commit 1a4e58cce84ee88129d5d49c064bd2852b481357
mm: introduce MADV_PAGEOUT
ptrace API to get details of the syscall the tracee is blocked in, see
linux commit 201766a20e30f982ccfe36bebfad9602c3ff574a
ptrace: add PTRACE_GET_SYSCALL_INFO request
the align attribute was used to keep the layout the same across targets
e.g. on m68k uint32_t is 2 byte aligned, this helps with compat ptrace.
the elf_prstatus structure is used in core dumps, and the timeval
structures in it are longs matching the elf class, *not* the kernel
"old timeval" for the arch. this means using timeval here for x32 was
always wrong, despite kernel uapi headers and glibc also exposing it
this way, and of course it's wrong for any arch with 64-bit time_t.
rather than just changing the type on affected archs, use a tagless
struct containing long tv_sec and tv_usec members in place of the
timevals. this intentionally breaks use of them as timevals (e.g.
assignment, passing address, etc.) on 64-bit archs as well so that any
usage unsafe for 32-bit archs is caught even in software that only
gets tested on 64-bit archs. from what I could gather, there is not
any software using these members anyway. the only reason they need to
be fixed to begin with is that the only members which are commonly
used, the saved registers, follow the time members and have the wrong
offset if the time members are sized incorrectly.
commit b60fdf133c broke the
SIOCGSTAMP[NS] ioctl fallbacks introduced in commit
2e554617e5, as well as use of these
ioctls, by creating a situation where bits/ioctl.h could be included
without __LONG_MAX being visible.
now that all 32-bit archs have 64-bit time types, the values for the
time-related socket option macros can be treated as universal for
32-bit archs. the sys/socket.h mechanism for this predates
arch/generic and is instead in the top-level header.
x32, which does not use the new time64 values of the macros, already
has its own overrides, so this commit does not affect it.
these structures can now be defined generically in terms of endianness
and long size. previously, the 32-bit archs all shared a common
definition from the generic bits header, and each 64-bit arch had to
repeat the 64-bit version, with endian conditionals if the arch had
variants of each endianness.
I would prefer getting rid of the preprocessor conditionals for
padding and instead using unnamed bitfield members, like commit
9b2921bea1 did for struct timespec.
however, at present sendmsg, recvmsg, and recvmmsg need access to the
padding members by name to zero them. this could perhaps be cleaned up
in the future.
SO_RCVTIMEO and SO_SNDTIMEO already were, but only in aggregate with
SO_DEBUG and all of the other low/traditional options that varied per
arch. SO_TIMESTAMP* are newly overridable. the two groups have to be
done separately since mips64 and powerpc64 will override the former
but not the latter.
at some point this should be cleaned up to use bits headers more
idiomatically.
a _REDIR_TIME64 macro is introduced, which the arch's alltypes.h is
expected to define, to control redirection of symbol names for
interfaces that involve time_t and derived types. this ensures that
object files will only be linked to libc interfaces matching the ABI
whose headers they were compiled against.
along with time32 compat shims, which will be introduced separately,
the redirection also makes it possible for a single libc (static or
shared) to be used with object files produced with either the old
(32-bit time_t) headers or the new ones after 64-bit time_t switchover
takes place. mixing of such object files (or shared libraries) in the
same program will also be possible, but must be done with care; ABI
between libc and a consumer of the libc interfaces is guaranteed to
match by the the symbol name redirection, but pairwise ABI between
consumers of libc that define interfaces between each other in terms
of time_t is not guaranteed to match.
this change adds a dependency on an additional "GNU C" feature to the
public headers for existing 32-bit archs, which is generally
undesirable; however, the feature is one which glibc has depended on
for a long time, and thus which any viable alternative compiler is
going to need to provide. 64-bit archs are not affected, nor will
future 32-bit archs be, regardless of whether they are "new" on the
kernel side (e.g. riscv32) or just newly-added (e.g. a new sparc or
xtensa port). the same applies to newly-added ABIs for existing
machine-level archs.
building on commit 97d35a552e,
__BYTE_ORDER is now available wherever alltypes.h is included.
endian.h should not be used since, in the future, it will expose
identifiers that are not in the reserved namespace for the headers
which were previously using it.
otherwise, 32-bit archs that could otherwise share the generic
bits/ipc.h would need to duplicate the struct ipc_perm definition,
obscuring the fact that it's the same. sysvipc is not widely used and
these headers are not commonly included, so there is no performance
gain to be had by limiting the number of indirectly included files
here.
files with the existing time32 definition of IPC_STAT are added to all
current 32-bit archs now, so that when it's changed the change will
show up as a change rather than addition of a new file where it's less
obvious that the value is changing vs the generic one that was used
before.
to make use of {sem,shm,msg}ctl IPC_STAT functionality to provide
64-bit time_t on 32-bit archs, IPC_STAT and related macros must be
defined with bit 8 (0x100) set. allow archs to define IPC_STAT in
bits/ipc.h, and define the other macros in terms of it so that they
all get the same value of the time64 bit.
SO_BINDTOIFINDEX behaves similar to SO_BINDTODEVICE, but takes a
network interface index as argument, rather than the network
interface name. see
linux commit f5dd3d0c9638a9d9a02b5964c4ad636f06cf7e2c
net: introduce SO_BINDTOIFINDEX sockopt
allows specifying that the speculative store bypass disable bit should
be cleared on exec. see
linux commit 71368af9027f18fe5d1c6f372cfdff7e4bde8b48
x86/speculation: Add PR_SPEC_DISABLE_NOEXEC
includes changes from linux v5.1
linux commit 235328d1fa4251c6dcb32351219bb553a58838d2
fanotify: add support for create/attrib/move/delete events
linux commit 5e469c830fdb5a1ebaa69b375b87f583326fd296
fanotify: copy event fid info to user
linux commit e9e0c8903009477b630e37a8b6364b26a00720da
fanotify: encode file identifier for FAN_REPORT_FID
as well as earlier changes that were missed.
sys/statfs.h is included for fsid_t.