On the executable pages that have the GP (Guarded Page) bit, the semantic
of the "br" and "blr" instructions is changed: the CPU expects the first
instruction of the jump/call target to be "bti", and faults if it isn't.
We add the GP bit on the kernel .text pages (and incidentally the .rodata
pages, but we don't care). The compiler adds a "bti c" instruction at the
beginning of each C function. We modify the ENTRY() macros to manually add
"bti c" in the asm functions.
cpuswitch.S needs a specific change: with "br x27" the CPU expects "bti j",
which is bad because the functions begin with "bti c"; switch to "br x16",
for the CPU to accept "bti c".
BTI helps defend against JOP/COP. Tested on Qemu.
and pool_prime() (and their pool_cache_* counterparts):
- the pool_set*wat() APIs are supposed to specify thresholds for the count of
free items in the pool before pool pages are automatically allocated or freed
during pool_get() / pool_put(), whereas pool_sethardlimit() and pool_prime()
are supposed to specify minimum and maximum numbers of total items
in the pool (both free and allocated). these were somewhat conflated
in the existing code, so separate them as they were intended.
- change pool_prime() to take an absolute number of items to preallocate
rather than an increment over whatever was done before, and wait for
any memory allocations to succeed. since pool_prime() can no longer fail
after this, change its return value to void and adjust all callers.
- pool_setlowat() is documented as not immediately attempting to allocate
any memory, but it was changed some time ago to immediately try to allocate
up to the lowat level, so just fix the manpage to describe the current
behaviour.
- add a pool_cache_prime() to complete the API set.
simplify and fix error handling in xbd_handler(), expect backend
to not use the grant once request is finished, and avoid leaking
bounce buffer when the request using it happens to end with error
in xbd_diskstart() only do the RING_PUSH_REQUESTS_AND_CHECK_NOTIFY()
when actually the request was pushed successfully
to set BC_INVAL for them, and also need to explicitly remove them
from the BQ_LOCKED queue
fixes DIAGNOSTIC panic when force unmounting unresponsive disk device
PR kern/51178 by Michael van Elst
to be accessing system ACL's on behalf of the kernel. This code appears
to be copied from FreeBSD, but there it works because in FreeBSD NOCRED
is 0, ours is -1. I guess nobody has used system extended attributes on
NetBSD yet :-)
unmount on shutdown with slow I/O device
wapbl_discard() needs to hold both wl_mtx and bufcache_lock while
manipulating wl_entries - the rw lock is not enough, because
wapbl_biodone() only takes wl_mtx while removing the finished entry
from list
wapbl_biodone() must take bufcache_lock before reading we->we_wapbl,
so it's blocked until wapbl_discard() finishes, and takes !wl path
appropriately
this is supposed to fix panic on shutdown:
[ 67549.6304123] forcefully unmounting / (/dev/wd0a)...
...
[ 67549.7272030] panic: mutex_vector_enter,510: uninitialized lock (lock=0xffffa722a4f4f5b0, from=ffffffff80a884fa)
...
[ 67549.7272030] wapbl_biodone() at netbsd:wapbl_biodone+0x4d
[ 67549.7272030] biointr() at netbsd:biointr+0x7d
[ 67549.7272030] softint_dispatch() at netbsd:softint_dispatch+0x12c
[ 67549.7272030] Xsoftintr() at netbsd:Xsoftintr+0x4f
We use the "pac-ret" option, to sign the return instruction pointer on
function entry, and authenticate it on function exit. This acts as a
mitigation against ROP.
The authentication uses a per-lwp (secret) I-A key stored in the 128bit
APIAKey register and part of the lwp context. During lwp creation, the
kernel generates a random key, and during context switches, it installs
the key of the target lwp on the CPU.
Userland cannot read the APIAKey register directly. However, it can sign
its pointers with it, because the register is architecturally shared
between userland and the kernel. Although part of the CPU design, it is
a bit of an undesired behavior, because it allows to forge valid kernel
pointers from userland. To avoid that, we don't share the key with
userland, and rather switch it in EL0<->EL1 transitions. This means that
when userland executes, a different key is loaded in APIAKey than the one
the kernel uses. For now the userland key is a fixed 128bit zero value.
The DDB stack unwinder is changed to strip the authentication code from
the pointers in lr.
Two problems are known:
* Currently the idlelwps' keys are not really secret. This is because
the RNG is not yet available when we spawn these lwps. Not overly
important, but would be nice to fix with UEFI RNG.
* The key switching in EL0<->EL1 transitions is not the most optimized
code on the planet. Instead of checking aarch64_pac_enabled, it would
be better to hot-patch the code at boot time, but there currently is
no hot-patch support on aarch64.
Tested on Qemu.
We are returning an ACPI_INTEGER (= uint64_t), so it doesn't make
sense to handle more than 64 bits.
Apparently there are some ACPIs out there that ask for unreasonably
large widths here. Just reject those requests, rather than writing
past the caller's stack buffer.
Previously we attempted to fix this by copying byte by byte as large
as the caller asked, in order to avoid the undefined behaviour of
shifting past the size of ACPI_INTEGER, but that just turned a shift
(which might have been harmless on real machines) into a stack buffer
overflow (!).
ok msaitoh
on the OpenBSD single-port XR21V1410 uxrcom driver, but adds support
for multi-port chipsets and uses the common umodem framework instead of
being a standalone driver.
Thanks to skrll@ for much USB clue and mrg@ for financing the
development of this driver.