to the pool layer. It is taken out of POOL_QUARANTINE.
Advertise POOL_NOCACHE for kMSan rather than POOL_QUARANTINE. With kMSan
we are only interested in the no-caching effect, not the quarantine. This
reduces memory pressure on kMSan kernels.
/netbsd/modules respectively instead of /netbsd and
/stand/<arch>/<version>/modules. This is only supported for x86,
and is turned off by default. To try it, add KERNEL_DIR=yes in your
/mk.conf and install a system from that build.
KLEAK was a nice feature and served its purpose; it allowed us to detect
dozens of info leaks on the kernel->userland boundary, and thanks to it we
tackled a good part of the infoleak problem 1.5 years ago.
Nowadays however, we have kMSan, which can detect uninitialized memory in
the kernel. kMSan supersedes KLEAK: it can detect what KLEAK was able to
detect, but in addition, (1) it operates in all of the kernel and not just
the kernel->userland boundary, (2) it requires no user interaction, and (3)
it is deterministic and not statistical.
That makes kMSan the feature of choice to detect info leaks nowadays;
people interested in detecting info leaks should boot a kMSan kernel and
just wait for the magic to happen.
KLEAK was a good ride, and a fun project, but now is time for it to go.
Discussed with several people, including Thomas Barabosch.
Discussed on tech-kern:
https://mail-index.NetBSD.org/tech-kern/2020/01/13/msg025938.html
This was never (intentionally) enabled by default, and the design has
some shortcomings. You can get mostly the same results with ktrace,
as in usr.bin/make/filemon/filemon_ktrace.c which is now used instead
of filemon for make's meta mode.
If applications require higher performance than ktrace, or nesting
that ktrace doesn't support, we might consider adding something back
into the vfs system calls themselves, without hijacking the syscall
table. (Might want a more reliable output format too, e.g. one that
can handle newlines in file names.)
- Partially sort the list of per-vnode namecache entries by using a TAILQ.
Put the real name to the head, and put dot and dotdot to the tail so that
cache_lookup_reverse() doesn't have to consider them.
memory used by the kernel at run time, and just like kASan and kCSan, it
is an excellent feature. It has already detected 38 uninitialized variables
in the kernel during my testing, which I have since discreetly fixed.
We use two shadows:
- "shad", to track uninitialized memory with a bit granularity (1:1).
Each bit set to 1 in the shad corresponds to one uninitialized bit of
real kernel memory.
- "orig", to track the origin of the memory with a 4-byte granularity
(1:1). Each uint32_t cell in the orig indicates the origin of the
associated uint32_t of real kernel memory.
The memory consumption of these shadows is consequent, so at least 4GB of
RAM is recommended to run kMSan.
The compiler inserts calls to specific __msan_* functions on each memory
access, to manage both the shad and the orig and detect uninitialized
memory accesses that change the execution flow (like an "if" on an
uninitialized variable).
We mark as uninit several types of memory buffers (stack, pools, kmem,
malloc, uvm_km), and check each buffer passed to copyout, copyoutstr,
bwrite, if_transmit_lock and DMA operations, to detect uninitialized memory
that leaves the system. This allows us to detect kernel info leaks in a way
that is more efficient and also more user-friendly than KLEAK.
Contrary to kASan, kMSan requires comprehensive coverage, ie we cannot
tolerate having one non-instrumented function, because this could cause
false positives. kMSan cannot instrument ASM functions, so I converted
most of them to __asm__ inlines, which kMSan is able to instrument. Those
that remain receive special treatment.
Contrary to kASan again, kMSan uses a TLS, so we must context-switch this
TLS during interrupts. We use different contexts depending on the interrupt
level.
The orig tracks precisely the origin of a buffer. We use a special encoding
for the orig values, and pack together in each uint32_t cell of the orig:
- a code designating the type of memory (Stack, Pool, etc), and
- a compressed pointer, which points either (1) to a string containing
the name of the variable associated with the cell, or (2) to an area
in the kernel .text section which we resolve to a symbol name + offset.
This encoding allows us not to consume extra memory for associating
information with each cell, and produces a precise output, that can tell
for example the name of an uninitialized variable on the stack, the
function in which it was pushed on the stack, and the function where we
accessed this uninitialized variable.
kMSan is available with LLVM, but not with GCC.
The code is organized in a way that is similar to kASan and kCSan, so it
means that other architectures than amd64 can be supported.
to detect race conditions at runtime. It is a variation of TSan that is
easy to implement and more suited to kernel internals, albeit theoretically
less precise than TSan's happens-before.
We do basically two things:
- On every KCSAN_NACCESSES (=2000) memory accesses, we create a cell
describing the access, and delay the calling CPU (10ms).
- On all memory accesses, we verify if the memory we're reading/writing
is referenced in a cell already.
The combination of the two means that, if for example cpu0 does a read that
is selected and cpu1 does a write at the same address, kCSan will fire,
because cpu1's write collides with cpu0's read cell.
The coverage of the instrumentation is the same as that of kASan. Also, the
code is organized in a way similar to kASan, so it is easy to add support
for more architectures than amd64. kCSan is compatible with KCOV.
Reviewed by Kamil.
In conjunction with mount_9p, it enables a NetBSD system running as a VM guest
to mount an exported filesystem by the host via virtio-9p. It exports a 9p
end-point of virtio-9p via a character device file for mount_9p.
Reviewed by yamaguchi@
The new member is caled f_mntfromlabel and it is the dkw_wname
of the corresponding wedge. This is now used by df -W to display
the mountpoint name as NAME=
from userland via /dev/vhci. Using this, it becomes possible to test and
fuzz the USB stack and all the USB drivers without having the associated
hardware.
The vHCI device has four ports independently addressable.
For each xfer on each port, we create two packets: a setup packet (which
indicates mostly the type of request) and a data packet (which contains
the raw data). These packets are processed by read and write operations
on /dev/vhci: userland poll-reads it to fetch usb_device_request_t
structures, and dispatches the requests depending on bRequest and
bmRequestType.
A few ioctls are available:
VHCI_IOC_GET_INFO - Get the current status
VHCI_IOC_SET_PORT - Choose a vHCI port
VHCI_IOC_USB_ATTACH - Attach a USB device on the current port
VHCI_IOC_USB_DETACH - Detach the USB device on the current port
vHCI has already allowed me to automatically find several bugs in the USB
stack and its drivers.
Benefits:
- larger seeds -- a 128-bit key alone is not enough for `128-bit security'
- better resistance to timing side channels than AES
- a better-understood security story (https://eprint.iacr.org/2018/349)
- no loss in compliance with US government standards that nobody ever
got fired for choosing, at least in the US-dominated western world
- no dirty endianness tricks
- self-tests
Drawbacks:
- performance hit: throughput is reduced to about 1/3 in naive measurements
=> possible to mitigate by using hardware SHA-256 instructions
=> all you really need is 32 bytes to seed a userland PRNG anyway
=> if we just used ChaCha this would go away...
XXX pullup-7
XXX pullup-8
XXX pullup-9
(like sensor readout) are locked, so that a userland program may interfere with
envsys operation.
To use this you need a program like ipmitool built with OpenIPMI support.
It is yet another psref leak detector that enables to tell where a leak occurs
while a simpler version that is already committed just tells an occurrence of a
leak.
Investigating of psref leaks is hard because once a leak occurs a percpu list of
psref that tracks references can be corrupted. A reference to a tracking object
is memorized in the list via an intermediate object (struct psref) that is
normally allocated on a stack of a thread. Thus, the intermediate object can be
overwritten on a leak resulting in corruption of the list.
The tracker makes a shadow entry to an intermediate object and stores some hints
into it (currently it's a caller address of psref_acquire). We can detect a
leak by checking the entries on certain points where any references should be
released such as the return point of syscalls and the end of each softint
handler.
The feature is expensive and enabled only if the kernel is built with
PSREF_DEBUG.
Proposed on tech-kern