Primary goals:
1. Use cryptography primitives designed and vetted by cryptographers.
2. Be honest about entropy estimation.
3. Propagate full entropy as soon as possible.
4. Simplify the APIs.
5. Reduce overhead of rnd_add_data and cprng_strong.
6. Reduce side channels of HWRNG data and human input sources.
7. Improve visibility of operation with sysctl and event counters.
Caveat: rngtest is no longer used generically for RND_TYPE_RNG
rndsources. Hardware RNG devices should have hardware-specific
health tests. For example, checking for two repeated 256-bit outputs
works to detect AMD's 2019 RDRAND bug. Not all hardware RNGs are
necessarily designed to produce exactly uniform output.
ENTROPY POOL
- A Keccak sponge, with test vectors, replaces the old LFSR/SHA-1
kludge as the cryptographic primitive.
- `Entropy depletion' is available for testing purposes with a sysctl
knob kern.entropy.depletion; otherwise it is disabled, and once the
system reaches full entropy it is assumed to stay there as far as
modern cryptography is concerned.
- No `entropy estimation' based on sample values. Such `entropy
estimation' is a contradiction in terms, dishonest to users, and a
potential source of side channels. It is the responsibility of the
driver author to study the entropy of the process that generates
the samples.
- Per-CPU gathering pools avoid contention on a global queue.
- Entropy is occasionally consolidated into the global pool -- as soon as
it's ready, if we've never reached full entropy, and with a rate
limit afterward. Operators can force consolidation now by running
sysctl -w kern.entropy.consolidate=1.
- rndsink(9) API has been replaced by an epoch counter which changes
whenever entropy is consolidated into the global pool.
. Usage: Cache entropy_epoch() when you seed. If entropy_epoch()
  has changed when you're about to use whatever you seeded, reseed.
  (A sketch follows at the end of this list.)
. Epoch is never zero, so initialize cache to 0 if you want to reseed
on first use.
. Epoch is -1 iff we have never reached full entropy -- in other
words, the old rnd_initial_entropy is (entropy_epoch() != -1) --
but it is better if you check for changes rather than for -1, so
that if the system estimated its own entropy incorrectly, entropy
consolidation has the opportunity to prevent future compromise.
- Sysctls and event counters provide operator visibility into what's
happening:
. kern.entropy.needed - bits of entropy short of full entropy
. kern.entropy.pending - bits known to be pending in per-CPU pools,
can be consolidated with sysctl -w kern.entropy.consolidate=1
. kern.entropy.epoch - number of times consolidation has happened,
never 0, and -1 iff we have never reached full entropy
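The epoch pattern above, as a hedged C sketch -- entropy_epoch() is the
interface described above, but the consumer structure and its reseed
routine are invented for illustration:

    #include <sys/entropy.h>        /* entropy_epoch() */

    struct mything {
            unsigned mt_epoch;      /* 0 forces a reseed on first use,
                                       since the epoch is never 0 */
            uint8_t mt_key[32];
    };

    static void
    mything_reseed_if_stale(struct mything *m)
    {
            unsigned epoch = entropy_epoch();

            if (m->mt_epoch != epoch) {
                    mything_reseed(m);  /* invented: draw fresh key material */
                    m->mt_epoch = epoch;
            }
    }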
CPRNG_STRONG
- A cprng_strong instance is now a collection of per-CPU NIST
  Hash_DRBGs. There are only two in the system: user_cprng for
  /dev/urandom and sysctl kern.?random, and kern_cprng for kernel
  users which may need to operate in interrupt context up to IPL_VM;
  a usage sketch follows this list.
(Calling cprng_strong in interrupt context does not strike me as a
particularly good idea, so I added an event counter to see whether
anything actually does.)
- Event counters provide operator visibility into when reseeding
happens.
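A hedged usage sketch of drawing from kern_cprng (the flags argument is
passed as 0 here, on the assumption that no special behavior is wanted):

    #include <sys/cprng.h>

    uint8_t key[32];

    /* Fill key from the kernel's per-CPU Hash_DRBGs. */
    cprng_strong(kern_cprng, key, sizeof(key), 0);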
INTEL RDRAND/RDSEED, VIA C3 RNG (CPU_RNG)
- Unwired for now; will be rewired in a subsequent commit.
KLEAK was a nice feature and served its purpose; it allowed us to detect
dozens of info leaks on the kernel->userland boundary, and thanks to it we
tackled a good part of the infoleak problem 1.5 years ago.
Nowadays, however, we have kMSan, which can detect uninitialized memory in
the kernel. kMSan supersedes KLEAK: it can detect what KLEAK was able to
detect, but in addition, (1) it operates in all of the kernel and not just
the kernel->userland boundary, (2) it requires no user interaction, and (3)
it is deterministic and not statistical.
That makes kMSan the feature of choice to detect info leaks nowadays;
people interested in detecting info leaks should boot a kMSan kernel and
just wait for the magic to happen.
KLEAK was a good ride, and a fun project, but now it is time for it to go.
Discussed with several people, including Thomas Barabosch.
Keep the internal ptrace function names intact, as this code is shared
with core_elf32.c, which already references ptrace(2)-specific symbols.
No functional change intended.
Remove all #ifdef COREDUMP conditional compilation. Now, the
coredump module is completely separated from the emulation modules, and
they can all be independently loaded and unloaded.
Welcome to 9.99.18 !
kMSan detects uninitialized memory used by the kernel at run time, and
just like kASan and kCSan, it is an excellent feature. It has already
detected 38 uninitialized variables in the kernel during my testing,
which I have since discreetly fixed.
We use two shadows:
- "shad", to track uninitialized memory with a bit granularity (1:1).
Each bit set to 1 in the shad corresponds to one uninitialized bit of
real kernel memory.
- "orig", to track the origin of the memory with a 4-byte granularity
(1:1). Each uint32_t cell in the orig indicates the origin of the
associated uint32_t of real kernel memory.
The memory consumption of these shadows is significant, so at least 4GB of
RAM is recommended to run kMSan.
The compiler inserts calls to specific __msan_* functions on each memory
access, to manage both the shad and the orig and detect uninitialized
memory accesses that change the execution flow (like an "if" on an
uninitialized variable).
We mark several types of memory buffers as uninitialized (stack, pools,
kmem, malloc, uvm_km), and check each buffer passed to copyout, copyoutstr,
bwrite, if_transmit_lock and DMA operations, to detect uninitialized memory
that leaves the system. This allows us to detect kernel info leaks in a way
that is more efficient and also more user-friendly than KLEAK.
Unlike kASan, kMSan requires comprehensive coverage, i.e., we cannot
tolerate having even one non-instrumented function, because this could
cause false positives. kMSan cannot instrument ASM functions, so I converted
most of them to __asm__ inlines, which kMSan is able to instrument. Those
that remain receive special treatment.
Again unlike kASan, kMSan uses a TLS, so we must context-switch this
TLS during interrupts. We use different contexts depending on the interrupt
level.
The orig tracks precisely the origin of a buffer. We use a special encoding
for the orig values, and pack together in each uint32_t cell of the orig:
- a code designating the type of memory (Stack, Pool, etc), and
- a compressed pointer, which points either (1) to a string containing
the name of the variable associated with the cell, or (2) to an area
in the kernel .text section which we resolve to a symbol name + offset.
This encoding avoids consuming extra memory to associate information
with each cell, and produces precise output that can tell, for example,
the name of an uninitialized variable on the stack, the
function in which it was pushed on the stack, and the function where we
accessed this uninitialized variable.
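A hypothetical illustration of such an encoding -- the type codes, base
address, and bit split here are invented, not kMSan's actual layout:

    #define MSAN_PTR_BASE   0xffffffff80000000UL  /* assumed base address */
    #define MSAN_TYPE_STACK 0x1U
    #define MSAN_TYPE_POOL  0x2U

    /* Pack a 2-bit type code with a 30-bit compressed pointer. */
    static inline uint32_t
    msan_orig_encode(uint32_t type, uintptr_t addr)
    {
        return (type << 30) |
            ((uint32_t)(addr - MSAN_PTR_BASE) & 0x3fffffffU);
    }

    static inline uintptr_t
    msan_orig_decode_ptr(uint32_t orig)
    {
        return MSAN_PTR_BASE + (orig & 0x3fffffffU);
    }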
kMSan is available with LLVM, but not with GCC.
The code is organized in a way that is similar to kASan and kCSan, so
architectures other than amd64 can be supported.
kCSan detects race conditions at runtime. It is a variation of TSan
that is easy to implement and more suited to kernel internals, albeit
theoretically less precise than TSan's happens-before.
We do basically two things:
- On every KCSAN_NACCESSES (=2000) memory accesses, we create a cell
describing the access, and delay the calling CPU (10ms).
- On all memory accesses, we check whether the memory we're
  reading/writing is already referenced by a cell.
The combination of the two means that, if for example cpu0 does a read that
is selected and cpu1 does a write at the same address, kCSan will fire,
because cpu1's write collides with cpu0's read cell.
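A hedged sketch of that scheme -- the cell functions and the counter are
invented, and the real implementation keeps per-CPU state:

    #define KCSAN_NACCESSES 2000

    static void
    kcsan_access(uintptr_t addr, size_t size, bool write)
    {
        static unsigned counter;    /* per-CPU in reality */
        struct kcsan_cell *cell;

        if (++counter % KCSAN_NACCESSES == 0) {
            /* Selected access: publish a cell, then wait for a collision. */
            cell = kcsan_cell_create(addr, size, write);
            DELAY(10000);           /* 10ms */
            kcsan_cell_destroy(cell);
        } else if ((cell = kcsan_cell_lookup(addr, size, write)) != NULL) {
            kcsan_report(cell, addr, size, write);  /* racing access */
        }
    }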
The coverage of the instrumentation is the same as that of kASan. Also, the
code is organized in a way similar to kASan, so it is easy to add support
for more architectures than amd64. kCSan is compatible with KCOV.
Reviewed by Kamil.
The KCOV driver implements collection of code coverage inside the kernel.
It can be enabled on a per-process basis from userland, allowing the kernel
program counter to be collected during syscalls triggered by the same
process.
The device is oriented towards kernel fuzzers, in particular syzkaller.
Currently the only supported coverage type is -fsanitize-coverage=trace-pc.
The KCOV driver was initially developed in Linux. A driver based on the
same concept was then implemented in FreeBSD and OpenBSD.
Documentation is borrowed from OpenBSD and ATF tests from FreeBSD.
This patch has been prepared by Siddharth Muralee, improved by <maxv>
and polished by myself before importing into the mainline tree.
All ATF tests pass.
Pass -DACPI_MISALIGNMENT_NOT_SUPPORTED when kUBSan is enabled. This option
is dedicated to alignment-sensitive CPUs in acpica. It was originally
designed for Itanium CPUs, but nowadays it's wanted for aarch64 as well.
Define it in acpica code under kUBSan in order to pacify Undefined Behavior
reports on all ports (in particular x86). The number of reports is now
halved with this patch applied. The remaining alignment alarms in acpica
will be addressed in the future.
Patch contributed by <Akul Pillai>
So far, only basic panic() and null-deref nodes are added.
with options DEBUG, one can now use:
# sysctl -w kern.crashme_enable=1
# sysctl -w kern.crashme.panic=1
# sysctl -w kern.crashme.null_deref=1
to trigger a crash. crashme_enable must be set to 1 before any
of the nodes will be writeable.
supports dynamic addition/removal of crashme nodes.
(obsoletes kern.panic_now, which will be removed later.)
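A hedged sketch of adding a node dynamically -- the field and function
names follow my reading of the interface; treat them as assumptions:

    #include <sys/crashme.h>

    static int
    mydrv_crash(int flags)
    {
        panic("crashme: mydrv");
    }

    static crashme_node mydrv_node = {
        .cn_name = "mydrv",
        .cn_longname = "panic from mydrv",
        .cn_fn = mydrv_crash,
    };

    /* at attach time ... */
    crashme_add(&mydrv_node);
    /* ... and at detach time */
    crashme_remove(&mydrv_node);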
threadpool(9) provides shared pools of kernel threads running at
specific priorities, with support for unbound pools and per-CPU pools.
Written by riastradh@, and based on the May 2014 draft, with a few changes
by me:
- Working on the assumption that relatively few priorities will actually
  be used, reduce the memory footprint by using linked lists rather than
  2 large (and mostly empty) tables. The performance impact is essentially
nil, since these lists are consulted only when pools are created (and
destroyed, for DIAGNOSTIC checks), and the lists will have at most 225
entries.
- Make threadpool job object, which the caller must allocate storage for,
really opaque.
- Use typedefs for the threadpool types, to reduce the verbosity of the
API somewhat.
- Fix a bunch of pool / worker thread / job object lifecycle bugs.
Also include an ATF unit test, written by me, that exercises the basics
of the API: it loads a kernel module exposing several sysctls, which the
test script uses to create and destroy threadpools, schedule a basic
job, and verify that it ran.
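A hedged usage sketch -- the job body and names are invented; note that
the job function must mark itself done under the job lock:

    #include <sys/threadpool.h>

    static kmutex_t mylock;
    static struct threadpool_job myjob;

    static void
    myjob_fn(struct threadpool_job *job)
    {
        do_the_work();              /* invented workload */
        mutex_enter(&mylock);
        threadpool_job_done(job);
        mutex_exit(&mylock);
    }

    void
    example(void)
    {
        struct threadpool *pool;

        mutex_init(&mylock, MUTEX_DEFAULT, IPL_NONE);
        threadpool_job_init(&myjob, myjob_fn, &mylock, "myjob");
        if (threadpool_get(&pool, PRI_NONE) != 0)
            return;
        threadpool_schedule_job(pool, &myjob);
        /* ... once the job has completed: */
        threadpool_job_destroy(&myjob);
        threadpool_put(pool, PRI_NONE);
    }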
And thus NetBSD 8.99.29 has arrived.
thmap(9) is a concurrent map that combines elements of hashing and a
radix trie. It supports lock-free lookups and concurrent
inserts/deletes. It is designed to be optimal as a general-purpose
*concurrent* associative array.
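A hedged usage sketch per the upstream API (the key and value here are
invented):

    #include "thmap.h"

    void
    example(void)
    {
        thmap_t *map = thmap_create(0, NULL, 0);
        int val = 42, *p;

        thmap_put(map, "key", 3, &val); /* returns existing value, if any */
        p = thmap_get(map, "key", 3);   /* lock-free lookup */
        (void)p;
        thmap_del(map, "key", 3);
        thmap_destroy(map);
    }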
Upstream: https://github.com/rmind/thmap
Discussed on tech-kern@
KLEAK works by tainting memory sources with marker values, letting the data
travel through the kernel, and scanning the kernel<->user frontier for
these marker values. Combined with compiler instrumentation and rotation
of the markers, it is able to yield relevant results with little effort.
We taint the pools and the stack, and scan copyout/copyoutstr. KLEAK is
supported on amd64 only for now, but it is not complicated to add more
architectures (just a matter of having the address of .text, and a stack
unwinder).
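A hedged illustration of the frontier scan -- the names are invented,
and the real scanner also rotates the marker and records stack traces:

    #include <sys/systm.h>          /* memcmp */

    static uint32_t kleak_marker;   /* rotated periodically */

    static void
    kleak_scan(const void *buf, size_t len)
    {
        const uint8_t *p = buf;
        size_t i;

        for (i = 0; i + sizeof(kleak_marker) <= len; i++) {
            if (memcmp(&p[i], &kleak_marker, sizeof(kleak_marker)) == 0)
                kleak_note_leak(buf, len, i);   /* invented reporter */
        }
    }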
A userland tool is provided that allows executing a command in rounds
and monitoring the leaks generated all the while.
KLEAK has already directly detected 12 kernel info leaks, and prompted
changes that in total fixed 25+ leaks.
Based on an idea developed jointly with Thomas Barabosch (of Fraunhofer
FKIE).
The MD parts of KASAN move into machine/asan.h, which contains the MD
functions. We use an include rather than a plain C file, because we want
GCC to optimize/inline some functions into one single block.
The amd64 MD parts of KASAN are moved accordingly.
The naming convention we use is:
kasan_*
a generic kasan object, declared in subr_asan.c
kasan_md_*
an MD kasan object, declared in machine/asan.h, and used
in subr_asan.c
__md_*
an MD object, declared in machine/asan.h, and not used
outside
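For illustration, a sketch of how the names divide -- the shadow
arithmetic is invented; only the placement of the names matters:

    /* machine/asan.h */
    static inline int8_t *
    kasan_md_addr_to_shad(const void *addr)
    {
        return __md_shadow_base + (((vaddr_t)addr - __md_shadow_off) >> 3);
    }

    /* subr_asan.c */
    static void
    kasan_shadow_check(const void *addr, size_t size)
    {
        int8_t *shad = kasan_md_addr_to_shad(addr);

        if (*shad != 0)
            kasan_report(addr, size);   /* invented reporter */
    }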
Overall this makes it easier to add KASAN support on more architectures.
Discussed with several people.
KMEM_REDZONE is not very efficient and cannot detect read overflows. KASAN
can, and will be used instead.
KMEM_POISON is enabled along with KMEM_GUARD, but it is redundant, since
the latter can detect read UAFs contrary to the former. In fact maybe
KMEM_GUARD should be retired too, because there are many cases where it
doesn't apply.
Simplifies the code.
This change:
* Removes "options PERFCTRS", the associated includes, and the associated
ifdefs. In doing so, it removes several XXXSMPs in the MI code, which is
good.
* Removes the PMC code of ARM XSCALE.
* Removes all the pmc.h files. They were all empty, except for ARM XSCALE.
* Reorders the x86 PMC code not to rely on the legacy pmc.h file. The
definitions are put in sysarch.h.
* Removes the kern/sys_pmc.c file, and along with it, the sys_pmc_control
and sys_pmc_get_info syscalls. They are marked as OBSOL in kern,
netbsd32 and rump.
* Removes the pmc_evid_t and pmc_ctr_t types.
* Removes all the associated man pages. The sets are marked as obsolete.
Split the ptrace(2) code into three parts:
1 - ptrace(2) syscall for native emulation
2 - common ptrace(2) syscall code (shared with compat_netbsd32)
3 - support routines that are shared with PROCFS and/or KTRACE
* Add module glue for #1 and #2. Both modules will be built-in to the
kernel if "options PTRACE" is included in the config file (this is
the default, defined in sys/conf/std).
* Mark the ptrace(2) syscall as modular in syscalls.master (generated
files will be committed shortly).
* Conditionalize all remaining portions of PTRACE code on a new kernel
option PTRACE_HOOKS.
XXX Instead of PROCFS depending on 'options PTRACE', we should probably
just add a procfs attribute to the sys/kern/sys_process.c file's
entry in files.kern, and add PROCFS to the "#if defineds" for
process_domem(). It's really confusing to have two different ways
of requiring this file.
zlib is now built as a separate module, rather than included in kernels
with KDTRACE_HOOKS defined. Update
the dtrace_fbt module to depend on the zlib module.
Bump kernel version to avoid module mismatch.
Welcome to 7.99.38 !
Discussed on tech-kern:
https://mail-index.netbsd.org/tech-kern/2016/01/24/msg020069.html
API is still experimental and likely to change. (Obvious changes:
either remove extra arguments everywhere, or shrink psref_target to a
single bit, at the expense of possibly valuable diagnostic checks.)
Should do some real testing before we use this in anger!
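A hedged usage sketch against the draft API -- the object layout and
lookup are invented, and the signatures may change as noted above:

    #include <sys/psref.h>
    #include <sys/pserialize.h>

    struct psref_class *myclass;    /* psref_class_create("my", IPL_SOFTNET) */

    struct myobj {
        struct psref_target mo_target;
        /* ... */
    };

    void
    example(struct myobj *obj)
    {
        struct psref psref;
        int s;

        s = pserialize_read_enter();
        /* ... look up obj in a pserialize-protected structure ... */
        psref_acquire(&psref, &obj->mo_target, myclass);
        pserialize_read_exit(s);

        /* obj stays alive; we may sleep, but must stay on this CPU. */

        psref_release(&psref, &obj->mo_target, myclass);
    }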
For monolithic kernels, both modules will be compiled as "built-ins",
while modular environments will be able to load the SYSVSEM, SYSVSHM,
and SYSVMSG code independent from the rest of compat.
This is a necessary precursor step to making the "STD" SYSV* code
into a separate module.
Tested in both monolithic and modular environments with no errors
seen.
- move the syncer into kern/vfs_subr.c.
- change the syncer to process the mountlist and VFS_SYNC as appropriate.
- use an API for mount points similar to the API for vnodes:
- vfs_syncer_add_to_worklist(struct mount *mp) to add
- vfs_syncer_remove_from_worklist(struct mount *mp) to remove a mount.
No objections on tech-kern@
Merge common disk driver functionality in ld.c with dksubr.c.
Adjust the two previous users of dk_intf (cgd and xbd) to
the changes.
This file was missing from the commit.