PROC_MACHINE_ARCH32(P) to override the value for sysctl hw.machine_arch
(native and netbsd32 commpat resp.).
Use these for arm and mips instead of the (not working, noisy, in case
of arm) sysctl override and #ifdef __mips__ in architecture neutral
code.
extend the uint64_t's when building it, so we're leaking 48 bits of kernel
stack to userland.
Having said that, it appears that I unintentionally fixed most of this
issue in locore.S::rev1.127 - by building the frame with interrupts
disabled, we are implicitly guaranteeing that the structure doesn't get
overwritten by the kernel. Which means, we are leaking to userland data
that comes from userland anyway.
(still other places with this issue, but I'll fix them differently)
disable it - which defaults to disabled. The following command is now
required to use linux binaries:
sysctl -w emul.linux.enabled=1
After a discussion on tech-kern@. All the other ideas to reduce the attack
surface have drawbacks, and this sysctl seems to be the best option.
mitigation against similar bugs.
The operations on segment registers can generate a page fault if there is
an issue when touching the in-memory gdt. Theoretically, it is never
supposed to happen, since the gdt is mapped correctly. However, in the
kernel we allow the gdt to be resized, and to do that, we allocate the
maximum amount of va needed by it, but only kenter a few pages until we
need more. Moreover, to avoid reloading the gdt each time we grow it, the
'size' field of gdtr is set to the maximum value. All of this means that
if a mov or iretq is done with a segment register whose index hits a page
that has not been kentered, a page fault is sent.
Such a page fault, if received in kernel mode, does not trigger a swapgs
on amd64; in other words, the kernel would be re-entered with the userland
tls.
And there just happens to be a place in compat_linux32 where the index of
%cs is controlled by userland, making it easy to trigger the page fault
and get kernel privileges.
The mitigation simply consists in abandoning the gdt_grow mechanism and
allocating/kentering the maximum size right away, in such a way that no
page fault can be triggered because of segment registers.
This is a legacy interface from 4.4BSD, and it was
introduced to overcome shortcomings of ptrace(2) at that time, which are
no longer relevant (performance). Today /proc/#/ctl offers a narrow
subset of ptrace(2) commands and is not applicable for modern
applications use beyond simplistic tracing scenarios.
This removal will simplify kernel internals. Users will still be able to
use all the other /proc files.
This change won't affect other procfs files neither Linux compat
features within mount_procfs(8). /proc/#/ctl isn't available on Linux.
Remove:
- /proc/#/ctl from mount_procfs(8)
- P_FSTRACE note from the documentation of ps(1)
- /proc/#/ctl and filesystem tracing documentation from mount_procfs(8)
- KAUTH_REQ_PROCESS_PROCFS_CTL documentation from kauth(9)
- source code file miscfs/procfs/procfs_ctl.c
- PFSctl and procfs_doctl() from sys/miscfs/procfs/procfs.h
- KAUTH_REQ_PROCESS_PROCFS_CTL from sys/sys/kauth.h
- PSL_FSTRACE (0x00010000) from sys/sys/proc.h
- P_FSTRACE (0x00010000) from sys/sys/sysctl.h
Reduce code complexity after removal of this functionality.
Update TODO.ptrace accordingly: remove two entries about /proc tracing.
Do not keep legacy notes as comments in the headers about removed
PSL_FSTRACE / P_FSTRACE, as this interface had little number of users
(close or equal to zero).
Proposed on tech-kern@.
All filesystem tracing utility users are encouraged to switch to ptrace(2).
Sponsored by <The NetBSD Foundation>