memory used by the kernel at run time, and just like kASan and kCSan, it
is an excellent feature. It has already detected 38 uninitialized variables
in the kernel during my testing, which I have since discreetly fixed.
We use two shadows:
- "shad", to track uninitialized memory with a bit granularity (1:1).
Each bit set to 1 in the shad corresponds to one uninitialized bit of
real kernel memory.
- "orig", to track the origin of the memory with a 4-byte granularity
(1:1). Each uint32_t cell in the orig indicates the origin of the
associated uint32_t of real kernel memory.
The memory consumption of these shadows is consequent, so at least 4GB of
RAM is recommended to run kMSan.
The compiler inserts calls to specific __msan_* functions on each memory
access, to manage both the shad and the orig and detect uninitialized
memory accesses that change the execution flow (like an "if" on an
uninitialized variable).
We mark as uninit several types of memory buffers (stack, pools, kmem,
malloc, uvm_km), and check each buffer passed to copyout, copyoutstr,
bwrite, if_transmit_lock and DMA operations, to detect uninitialized memory
that leaves the system. This allows us to detect kernel info leaks in a way
that is more efficient and also more user-friendly than KLEAK.
Contrary to kASan, kMSan requires comprehensive coverage, ie we cannot
tolerate having one non-instrumented function, because this could cause
false positives. kMSan cannot instrument ASM functions, so I converted
most of them to __asm__ inlines, which kMSan is able to instrument. Those
that remain receive special treatment.
Contrary to kASan again, kMSan uses a TLS, so we must context-switch this
TLS during interrupts. We use different contexts depending on the interrupt
level.
The orig tracks precisely the origin of a buffer. We use a special encoding
for the orig values, and pack together in each uint32_t cell of the orig:
- a code designating the type of memory (Stack, Pool, etc), and
- a compressed pointer, which points either (1) to a string containing
the name of the variable associated with the cell, or (2) to an area
in the kernel .text section which we resolve to a symbol name + offset.
This encoding allows us not to consume extra memory for associating
information with each cell, and produces a precise output, that can tell
for example the name of an uninitialized variable on the stack, the
function in which it was pushed on the stack, and the function where we
accessed this uninitialized variable.
kMSan is available with LLVM, but not with GCC.
The code is organized in a way that is similar to kASan and kCSan, so it
means that other architectures than amd64 can be supported.
to detect race conditions at runtime. It is a variation of TSan that is
easy to implement and more suited to kernel internals, albeit theoretically
less precise than TSan's happens-before.
We do basically two things:
- On every KCSAN_NACCESSES (=2000) memory accesses, we create a cell
describing the access, and delay the calling CPU (10ms).
- On all memory accesses, we verify if the memory we're reading/writing
is referenced in a cell already.
The combination of the two means that, if for example cpu0 does a read that
is selected and cpu1 does a write at the same address, kCSan will fire,
because cpu1's write collides with cpu0's read cell.
The coverage of the instrumentation is the same as that of kASan. Also, the
code is organized in a way similar to kASan, so it is easy to add support
for more architectures than amd64. kCSan is compatible with KCOV.
Reviewed by Kamil.
In conjunction with mount_9p, it enables a NetBSD system running as a VM guest
to mount an exported filesystem by the host via virtio-9p. It exports a 9p
end-point of virtio-9p via a character device file for mount_9p.
Reviewed by yamaguchi@
The new member is caled f_mntfromlabel and it is the dkw_wname
of the corresponding wedge. This is now used by df -W to display
the mountpoint name as NAME=
from userland via /dev/vhci. Using this, it becomes possible to test and
fuzz the USB stack and all the USB drivers without having the associated
hardware.
The vHCI device has four ports independently addressable.
For each xfer on each port, we create two packets: a setup packet (which
indicates mostly the type of request) and a data packet (which contains
the raw data). These packets are processed by read and write operations
on /dev/vhci: userland poll-reads it to fetch usb_device_request_t
structures, and dispatches the requests depending on bRequest and
bmRequestType.
A few ioctls are available:
VHCI_IOC_GET_INFO - Get the current status
VHCI_IOC_SET_PORT - Choose a vHCI port
VHCI_IOC_USB_ATTACH - Attach a USB device on the current port
VHCI_IOC_USB_DETACH - Detach the USB device on the current port
vHCI has already allowed me to automatically find several bugs in the USB
stack and its drivers.
Benefits:
- larger seeds -- a 128-bit key alone is not enough for `128-bit security'
- better resistance to timing side channels than AES
- a better-understood security story (https://eprint.iacr.org/2018/349)
- no loss in compliance with US government standards that nobody ever
got fired for choosing, at least in the US-dominated western world
- no dirty endianness tricks
- self-tests
Drawbacks:
- performance hit: throughput is reduced to about 1/3 in naive measurements
=> possible to mitigate by using hardware SHA-256 instructions
=> all you really need is 32 bytes to seed a userland PRNG anyway
=> if we just used ChaCha this would go away...
XXX pullup-7
XXX pullup-8
XXX pullup-9
(like sensor readout) are locked, so that a userland program may interfere with
envsys operation.
To use this you need a program like ipmitool built with OpenIPMI support.
It is yet another psref leak detector that enables to tell where a leak occurs
while a simpler version that is already committed just tells an occurrence of a
leak.
Investigating of psref leaks is hard because once a leak occurs a percpu list of
psref that tracks references can be corrupted. A reference to a tracking object
is memorized in the list via an intermediate object (struct psref) that is
normally allocated on a stack of a thread. Thus, the intermediate object can be
overwritten on a leak resulting in corruption of the list.
The tracker makes a shadow entry to an intermediate object and stores some hints
into it (currently it's a caller address of psref_acquire). We can detect a
leak by checking the entries on certain points where any references should be
released such as the return point of syscalls and the end of each softint
handler.
The feature is expensive and enabled only if the kernel is built with
PSREF_DEBUG.
Proposed on tech-kern
freed buffer cannot be reallocated. This greatly helps detecting
use-after-frees, because they are not short-lived anymore.
We maintain a per-pool fifo of 128 buffers. On each pool_put, we do a real
free of the oldest buffer, and insert the new buffer. Before insertion, we
mark the buffer as invalid with KASAN. On each pool_cache_put, we destruct
the object, so it lands in pool_put, and the quarantine is handled there.
POOL_QUARANTINE can be used in conjunction with KASAN to detect more
use-after-free bugs.
The variables were removed from sys/conf/param.c and moved into the
SYSV IPC code, but config options were never propagated via any opt_*
file.
This should fix an issue reported on netbsd-users list from Dima Veselov.
Note that this does not address other parameters included in that report,
including CHILD_MAX and NOFILE; this commit only affects items related to
the SYSV IPC code. Also note that this does not affect non-built-in
sysv_ipc modules, for which you need to update the Makefile to use any
non-standard config values - just like any other non-built-in modules
which have config params.
XXX Pull-up to -8 and -8-0
XXX Note that there are a couple of panic() calls in msginit() which
XXX really should be changed to simple printf() and then result in
XXX msginit failure. Unfortunately msginit() currently doesn't return
XXX a value so we cannot indicate failure to the caller. I will fix
XXX this is a future commit.
Register cmajor 252 for fbt and 253 for sdt.
Previously the major number was picked randomly and it causes conflicts
with preallocated values for different devices.
The KCOV driver implements collection of code coverage inside the kernel.
It can be enabled on a per process basis from userland, allowing the kernel
program counter to be collected during syscalls triggered by the same
process.
The device is oriented towards kernel fuzzers, in particular syzkaller.
Currently the only supported coverage type is -fsanitize-coverage=trace-pc.
The KCOV driver was initially developed in Linux. A driver based on the
same concept was then implemented in FreeBSD and OpenBSD.
Documentation is borrowed from OpenBSD and ATF tests from FreeBSD.
This patch has been prepared by Siddharth Muralee, improved by <maxv>
and polished by myself before importing into the mainline tree.
All ATF tests pass.
XXX: consider using copts.mk for various warning/copt flags passed
in kernel builds currently set via 'makeoptions' in files.* files.
this is suboptimal, as those all get embedded into the kernel with
config_file.h.