Move the handling of PG_PAGEOUT from uvm_aio_aiodone_pages() to
uvm_page_unbusy() so that callers of uvm_page_unbusy() no longer need to
handle this flag separately. Split out the pages part of uvm_aio_aiodone()
into uvm_aio_aiodone_pages() in rump just like in the real kernel.
In ZFS functions that can fail to copy data between the ARC and VM pages,
use uvm_aio_aiodone_pages() rather than uvm_page_unbusy() so that we can
handle these "I/O" errors. Fixes PR 55702.
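A minimal userland sketch of the idea, assuming a toy page structure: the unbusy routine itself finishes a pending pageout, so callers do not check the flag. The toy_ names, the flag values and the counter are hypothetical stand-ins, not the UVM code.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins; flag values are illustrative, not the kernel's. */
#define PG_BUSY    0x01u
#define PG_PAGEOUT 0x02u

struct toy_page {
	unsigned flags;
};

static int toy_paging_count;	/* pages with a pageout in flight */

/* Central unbusy: also completes a pending pageout so callers need not. */
static void
toy_page_unbusy(struct toy_page **pgs, size_t npgs)
{
	for (size_t i = 0; i < npgs; i++) {
		struct toy_page *pg = pgs[i];

		assert(pg->flags & PG_BUSY);
		if (pg->flags & PG_PAGEOUT) {
			pg->flags &= ~PG_PAGEOUT;
			toy_paging_count--;
		}
		pg->flags &= ~PG_BUSY;
	}
}
```

With this shape, a caller that only knows about busy pages can pass a mixed array and still leave the pageout accounting consistent.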
operation kauth_authorize_system().
KAUTH_SYSTEM_MOUNT / KAUTH_REQ_SYSTEM_MOUNT_NEW wants the vnode to be
covered and the mount flags, not the mount structure.
Fix for PR kern/55602: zpool panic on mounting zfs filesystem
temporary zbookmark_phys_t using kmem_alloc() rather than stack;
this usually recurses several times, and this saves 2x
sizeof(zbookmark_phys_t) == 64 bytes per recursion
part of fix for PR kern/55402 by Frank Kardel
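The stack-saving pattern can be sketched in userland as follows. This is an illustration under assumptions: toy_bookmark is a stand-in with four 64-bit fields (32 bytes, which is what the 64-byte figure above implies for two of them), toy_walk is a hypothetical recursive function, and malloc()/free() stand in for kmem_alloc()/kmem_free().

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-in bookmark: four 64-bit fields, 32 bytes in total. */
struct toy_bookmark {
	uint64_t objset, object, level, blkid;
};

/*
 * Recursive walk that keeps its scratch bookmark on the heap, so each
 * stack frame holds only a pointer instead of the whole structure.
 * Returns the number of levels visited, or -1 on allocation failure.
 */
static int
toy_walk(const struct toy_bookmark *parent, int depth)
{
	struct toy_bookmark *tmp;
	int below = 0;

	assert(depth >= 0);
	tmp = malloc(sizeof(*tmp));
	if (tmp == NULL)
		return -1;
	*tmp = *parent;
	tmp->level = (uint64_t)depth;
	if (depth > 0)
		below = toy_walk(tmp, depth - 1);
	free(tmp);
	return below < 0 ? -1 : below + 1;
}
```

The trade-off is an allocation per level in exchange for a smaller frame, which matters when the recursion runs on a small kernel stack.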
on-stack - this function is called recursively, and the 120 bytes per call
add up; also remove unused variable
part of fix for PR kern/55402 by Frank Kardel
this should generally slightly improve performance on MP systems and,
for xbd(4) storage specifically, avoids slow unaligned I/O buffer handling
this change requires an updated kernel, to allow up to SPA_MAXBLOCKSHIFT item
size for pools
fixes PR kern/55397 by Frank Kardel
cached value will do, or if the very latest total must be fetched. It can
be called thousands of times a second and fetching the totals impacts not
only the calling LWP but other CPUs doing unrelated activity in the VM
system.
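One way to picture the cached-versus-latest choice is the sketch below. It is an assumption-laden toy: per-CPU counters are modelled as a plain array, and toy_add(), toy_total() and the refresh counter are hypothetical names, not the interface this entry refers to.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NCPU 4

static long toy_percpu[TOY_NCPU];	/* per-CPU partial counts */
static long toy_cached_total;		/* last total that was summed */
static int toy_refreshes;		/* how often the slow path ran */

/* Adjust one CPU's counter; cheap and local. */
static void
toy_add(int cpu, long delta)
{
	assert(cpu >= 0 && cpu < TOY_NCPU);
	toy_percpu[cpu] += delta;
}

/*
 * Return the total. With cached == true the last summed value is
 * returned without touching the per-CPU counters; otherwise every
 * counter is read and the cache is refreshed.
 */
static long
toy_total(bool cached)
{
	long sum = 0;

	if (cached)
		return toy_cached_total;
	for (int i = 0; i < TOY_NCPU; i++)
		sum += toy_percpu[i];
	toy_cached_total = sum;
	toy_refreshes++;
	return sum;
}
```

A hot caller that tolerates a slightly stale figure passes true and never walks the other CPUs' counters.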
This logic correctly uses strncpy(3) to fully initialize a fixed-width field, and also ensures
NUL-termination on the next line as other users of the field expect.
Add -Wno-error=stringop-truncation to prevent a build failure when building with MKSANITIZER=yes.
The error was reported when build.sh was run with the MKSANITIZER=yes flag.
Reviewed by: kamil@
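The strncpy(3) idiom described above looks roughly like this. The structure, field width and function name are hypothetical; only the two-step pattern (full-width strncpy, then explicit terminator) is taken from the message.

```c
#include <assert.h>
#include <string.h>

#define TOY_NAMELEN 8	/* hypothetical field width */

struct toy_rec {
	char name[TOY_NAMELEN];
};

static void
toy_set_name(struct toy_rec *r, const char *src)
{
	assert(r != NULL && src != NULL);
	/* strncpy() zero-pads short input, so every byte of the field is set. */
	strncpy(r->name, src, sizeof(r->name));
	/* Long input fills the field without a terminator; add one explicitly. */
	r->name[sizeof(r->name) - 1] = '\0';
}
```

Because the whole field is written, no stale bytes leak out if the record is later copied as a fixed-size blob, and string users still find a terminator.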
Operation zfs_znode.c::zfs_zget_cleaner() depends on this
zil_commit() as a barrier to guarantee the znode cannot
get freed before its log entries are resolved.
Currently x86_patch_window_open is a big problem, because it is a perfect
function to inject/modify executable code with ROP.
- Remove x86_patch_window_open(), along with its x86_patch_window_close()
counterpart.
- Introduce a read-only link-set of hotpatch descriptor structures,
which reference a maximum of two read-only hotpatch sources.
- Modify x86_hotpatch() to open a window and call the new
x86_hotpatch_apply() function in a hard-coded manner.
- Modify x86_hotpatch() to take a name and a selector, and have
x86_hotpatch_apply() resolve the descriptor from the name and the
source from the selector, before hotpatching.
- Move the error handling into a separate x86_hotpatch_cleanup() function,
that gets called after we have closed the window.
The resulting implementation is a bit complex and non-obvious. But it
gains the following properties: the code executed in the hotpatch window
is strictly hard-coded (no callback and no possibility to execute your own
code in the window) and the pointers this code accesses are strictly
read-only (no possibility to forge pointers to hotpatch an area that was
not designated as hotpatchable at compile-time, and no possibility to
choose what bytes to write other than the maximum of two read-only
templates that were designated as valid for the given destination at
compile-time).
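The descriptor scheme can be illustrated with a small userland model. Everything here is a simplified assumption: names are plain integers, the destination lives in the descriptor, and the toy_ identifiers are hypothetical; it shows only the "resolve by name and selector from a read-only table" idea, not the x86 implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_HP_EXAMPLE 1	/* hypothetical hotpatch name */

/* Stands in for a patchable region of code. */
static unsigned char toy_text[3] = { 0xAA, 0xBB, 0xCC };

/* Read-only templates: the only bytes that may ever be written. */
static const unsigned char toy_src_a[3] = { 0x90, 0x90, 0x90 };
static const unsigned char toy_src_b[3] = { 0x0F, 0x01, 0xCA };

struct toy_desc {
	int name;
	unsigned char *dest;
	size_t len;
	size_t nsrc;			/* at most two sources */
	const unsigned char *srcs[2];
};

/* Read-only table fixed at compile time. */
static const struct toy_desc toy_descs[] = {
	{ TOY_HP_EXAMPLE, toy_text, sizeof(toy_text), 2,
	  { toy_src_a, toy_src_b } },
};

/* Patch by name and selector; the caller supplies no pointers or bytes. */
static int
toy_hotpatch(int name, size_t sel)
{
	for (size_t i = 0; i < sizeof(toy_descs) / sizeof(toy_descs[0]); i++) {
		const struct toy_desc *d = &toy_descs[i];

		if (d->name != name)
			continue;
		assert(d->nsrc <= 2);
		if (sel >= d->nsrc)
			return -1;
		memcpy(d->dest, d->srcs[sel], d->len);
		return 0;
	}
	return -1;
}
```

The caller can only pick among combinations that were laid down at compile time; an unknown name or an out-of-range selector leaves the target untouched.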
With current CPUs this slightly improves a situation that is already
pretty bad by definition on x86. Assuming CET however, this change closes
a big hole and is kinda great.
The only ~problem there is, is that dtrace-fbt tries to hotpatch random
places with random bytes, and there is just no way to make it safe.
However, dtrace is only in a module that is rarely used and never compiled
into the kernel, so it's not a big problem; add a shitty & vulnerable
independent hotpatch window in it, and leave big XXXs. It looks like fbt
is going to collapse soon anyway.
- Don't use a static buffer for the result.
- kauth_cred_getgroups refuses to return more than the actual number
of groups, so passing NGROUPS_MAX generally doesn't work.
To avoid patching zfs, just expose struct kauth_cred::cr_groups
directly, with __KAUTH_PRIVATE. Unclear why the official API only
exposes it via memcpy or copyout anyway.
This makes unprivileged zfs operations work, by anyone with access to
/dev/zfs (which is conventionally mode 666, and which should maybe be
our default; zfs has its own ACL system, zfs allow).
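The difference between the copy-out accessor and direct access can be sketched with a toy credential. The types and functions below are hypothetical models of the behaviour described above (a getter that rejects requests larger than the actual group count), not the kauth(9) code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef unsigned int toy_gid_t;
#define TOY_NGROUPS_MAX 16

struct toy_cred {
	size_t ngroups;
	toy_gid_t groups[TOY_NGROUPS_MAX];
};

/* Copy-out accessor that rejects requests for more than the actual count. */
static int
toy_getgroups(const struct toy_cred *c, toy_gid_t *buf, size_t len)
{
	assert(c->ngroups <= TOY_NGROUPS_MAX);
	if (len > c->ngroups)
		return -1;
	memcpy(buf, c->groups, len * sizeof(*buf));
	return 0;
}

/* Direct accessor: a pointer into the credential plus its length, no copy. */
static const toy_gid_t *
toy_groups(const struct toy_cred *c, size_t *np)
{
	*np = c->ngroups;
	return c->groups;
}
```

In this model, a caller that passes the maximum buffer size to the copy-out accessor fails, while one that asks for the pointer and length needs neither a static buffer nor a second call.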