Move the handling of PG_PAGEOUT from uvm_aio_aiodone_pages() to
uvm_page_unbusy() so that all callers of uvm_page_unbusy() don't need to
handle this flag separately. Split out the pages part of uvm_aio_aiodone()
into uvm_aio_aiodone_pages() in rump just like in the real kernel.
In ZFS functions that can fail to copy data between the ARC and VM pages,
use uvm_aio_aiodone_pages() rather than uvm_page_unbusy() so that we can
handle these "I/O" errors. Fixes PR 55702.
operation kauth_authorize_system().
KAUTH_SYSTEM_MOUNT / KAUTH_REQ_SYSTEM_MOUNT_NEW wants the to be
covered vnode and the mount flags, not the mount structure.
Fix for PR kern/55602: zpool panic on mounting zfs filesystem
temporary zbookmark_phys_t using kmem_alloc() rather than stack;
this recuses several times usually, and this saves 2x
sizeof(zbookmark_phys_t) == 64 bytes per recursion
part of fix for PR kern/55402 by Frank Kardel
on-stack - this function is called recursively, and the 120 bytes per call
add up; also remove unused variable
part of fix for PR kern/55402 by Frank Kardel
this should generally slightly improve performance on MP systems, and
specifically for xbd(4) storage avoids slow unaligned I/O buffer handling
this change requires updated kernel, to allow up to SPA_MAXBLOCKSHIFT item
size for pools
fixes PR kern/55397 by Frank Kardel
cached value will do, or if the very latest total must be fetched. It can
be called thousands of times a second and fetching the totals impacts not
only the calling LWP but other CPUs doing unrelated activity in the VM
system.
This logic correctly uses strncpy(3) to fully initialize a fixed-width field, and also ensures
NUL-termination on the next line as other users of the field expect.
Add -Werror=stringop-truncation to prevent build failure, when run with MKSANITIZER=yes.
Error was reported when build.sh was run with MKSANITIZER=yes flag.
Reviewed by: kamil@
Operation zfs_znode.c::zfs_zget_cleaner() depends on this
zil_commit() as a barrier to guarantee the znode cannot
get freed before its log entries are resolved.
Currently x86_patch_window_open is a big problem, because it is a perfect
function to inject/modify executable code with ROP.
- Remove x86_patch_window_open(), along with its x86_patch_window_close()
counterpart.
- Introduce a read-only link-set of hotpatch descriptor structures,
which reference a maximum of two read-only hotpatch sources.
- Modify x86_hotpatch() to open a window and call the new
x86_hotpatch_apply() function in a hard-coded manner.
- Modify x86_hotpatch() to take a name and a selector, and have
x86_hotpatch_apply() resolve the descriptor from the name and the
source from the selector, before hotpatching.
- Move the error handling in a separate x86_hotpatch_cleanup() function,
that gets called after we closed the window.
The resulting implementation is a bit complex and non-obvious. But it
gains the following properties: the code executed in the hotpatch window
is strictly hard-coded (no callback and no possibility to execute your own
code in the window) and the pointers this code accesses are strictly
read-only (no possibility to forge pointers to hotpatch an area that was
not designated as hotpatchable at compile-time, and no possibility to
choose what bytes to write other than the maximum of two read-only
templates that were designated as valid for the given destination at
compile-time).
With current CPUs this slightly improves a situation that is already
pretty bad by definition on x86. Assuming CET however, this change closes
a big hole and is kinda great.
The only ~problem there is, is that dtrace-fbt tries to hotpatch random
places with random bytes, and there is just no way to make it safe.
However dtrace is only in a module, that is rarely used and never compiled
into the kernel, so it's not a big problem; add a shitty & vulnerable
independent hotpatch window in it, and leave big XXXs. It looks like fbt
is going to collapse soon anyway.
- Don't use a static buffer for the result.
- kauth_cred_getgroups refuses to return more than the actual number
of groups, so passing NGROUPS_MAX generally doesn't work.
To avoid patching zfs, just expose struct kauth_cred::cr_groups
directly, with __KAUTH_PRIVATE. Unclear why the official API only
exposes it via memcpy or copyout anyway.
This makes unprivileged zfs operations work, by anyone with access to
/dev/zfs (which is conventionally mode 777, and which we should maybe
set it to by default; zfs has its own ACL system, zfs allow).
It looks like this is a false positive, since the section of code triggering the error
external/cddl/osnet/dist/lib/libdtrace/common/dt_proc.c:400:42:
is only accessed after "err" is initialized.
Error was reported when build.sh was run with MKLIBCSANITIZER=yes flag.
Reviewed by: kamil@
1. Issue zil_commit only if we're actually updating something --
there's no need to commit if we're unlinking the file or if
there's no atime update being applied.
2. Issue zil_commit only if the zfs has sync=always set -- for
sync=standard there's no need for us to commit anything here since
no application asked for an explicit sync.
Speeds up untarring base.tgz on top of itself by a factor of about
2x, and speeds up rm by a factor of about 10x, on my system with an
SSD SLOG over SATA. Histogram of unlink, rmdir, and rename timing
shows dramatic reduction in latency for most samples.
(To be fair, this was not an improvement over zfs; issuing the
unnecessary zil_commit was a self-inflicted performance wound.)