Commit Graph

47 Commits

Author SHA1 Message Date
riastradh
5084c1b50f Rewrite entropy subsystem.
Primary goals:

1. Use cryptography primitives designed and vetted by cryptographers.
2. Be honest about entropy estimation.
3. Propagate full entropy as soon as possible.
4. Simplify the APIs.
5. Reduce overhead of rnd_add_data and cprng_strong.
6. Reduce side channels of HWRNG data and human input sources.
7. Improve visibility of operation with sysctl and event counters.

Caveat: rngtest is no longer used generically for RND_TYPE_RNG
rndsources.  Hardware RNG devices should have hardware-specific
health tests.  For example, checking for two repeated 256-bit outputs
works to detect AMD's 2019 RDRAND bug.  Not all hardware RNGs are
necessarily designed to produce exactly uniform output.

ENTROPY POOL

- A Keccak sponge, with test vectors, replaces the old LFSR/SHA-1
  kludge as the cryptographic primitive.

- `Entropy depletion' is available for testing purposes with a sysctl
  knob kern.entropy.depletion; otherwise it is disabled, and once the
  system reaches full entropy it is assumed to stay there as far as
  modern cryptography is concerned.

- No `entropy estimation' based on sample values.  Such `entropy
  estimation' is a contradiction in terms, dishonest to users, and a
  potential source of side channels.  It is the responsibility of the
  driver author to study the entropy of the process that generates
  the samples.

- Per-CPU gathering pools avoid contention on a global queue.

- Entropy is occasionally consolidated into global pool -- as soon as
  it's ready, if we've never reached full entropy, and with a rate
  limit afterward.  Operators can force consolidation now by running
  sysctl -w kern.entropy.consolidate=1.

- rndsink(9) API has been replaced by an epoch counter which changes
  whenever entropy is consolidated into the global pool.
  . Usage: Cache entropy_epoch() when you seed.  If entropy_epoch()
    has changed when you're about to use whatever you seeded, reseed.
  . Epoch is never zero, so initialize cache to 0 if you want to reseed
    on first use.
  . Epoch is -1 iff we have never reached full entropy -- in other
    words, the old rnd_initial_entropy is (entropy_epoch() != -1) --
    but it is better if you check for changes rather than for -1, so
    that if the system estimated its own entropy incorrectly, entropy
    consolidation has the opportunity to prevent future compromise.

- Sysctls and event counters provide operator visibility into what's
  happening:
  . kern.entropy.needed - bits of entropy short of full entropy
  . kern.entropy.pending - bits known to be pending in per-CPU pools,
    can be consolidated with sysctl -w kern.entropy.consolidate=1
  . kern.entropy.epoch - number of times consolidation has happened,
    never 0, and -1 iff we have never reached full entropy

CPRNG_STRONG

- A cprng_strong instance is now a collection of per-CPU NIST
  Hash_DRBGs.  There are only two in the system: user_cprng for
  /dev/urandom and sysctl kern.?random, and kern_cprng for kernel
  users which may need to operate in interrupt context up to IPL_VM.

  (Calling cprng_strong in interrupt context does not strike me as a
  particularly good idea, so I added an event counter to see whether
  anything actually does.)

- Event counters provide operator visibility into when reseeding
  happens.

INTEL RDRAND/RDSEED, VIA C3 RNG (CPU_RNG)

- Unwired for now; will be rewired in a subsequent commit.
2020-04-30 03:28:18 +00:00
thorpej
276ef22378 Add a NetBSD native futex implementation, mostly written by riastradh@.
Map the COMPAT_LINUX futex calls to the native ones.
2020-04-26 18:53:31 +00:00
rin
a34427deb2 At the moment, we need kern/uipc_mbufdebug.c only if DDB is enabled. 2020-04-22 09:18:42 +00:00
riastradh
54e08fd152 Include kern_crashme.c in non-DEBUG kernels.
This is useful for simulating crashes in production to test failover.
2020-03-02 16:00:54 +00:00
maxv
081da2e4c3 Retire KLEAK.
KLEAK was a nice feature and served its purpose; it allowed us to detect
dozens of info leaks on the kernel->userland boundary, and thanks to it we
tackled a good part of the infoleak problem 1.5 years ago.

Nowadays however, we have kMSan, which can detect uninitialized memory in
the kernel. kMSan supersedes KLEAK: it can detect what KLEAK was able to
detect, but in addition, (1) it operates in all of the kernel and not just
the kernel->userland boundary, (2) it requires no user interaction, and (3)
it is deterministic and not statistical.

That makes kMSan the feature of choice to detect info leaks nowadays;
people interested in detecting info leaks should boot a kMSan kernel and
just wait for the magic to happen.

KLEAK was a good ride, and a fun project, but now is time for it to go.

Discussed with several people, including Thomas Barabosch.
2020-02-08 07:07:06 +00:00
kamil
71b1583f64 Rename sys_ptrace_lwpstatus.c to sys_process_lwpstatus.c
Keep the names of functions internally as ptrace intact as this code
is shared with core_elf32.c that already reaches ptrace(2) specifc symbols.

No functional change intended.
2020-01-04 03:46:19 +00:00
kamil
3097490d1b Put ptrace_read_lwpstatus() and process_read_lwpstatus() to a new file
Fixes "no PTRACE" kernel build, in particular zaurus kernel=INSTALL_C700.
2019-12-26 08:52:38 +00:00
ad
dd632e5898 Split subr_cpu.c out of kern_cpu.c, to contain routines shared with rump. 2019-12-20 21:20:09 +00:00
pgoyette
f01c2b4e29 Eliminate per-hook duplication of common code as suggested by
(and with major contributions from) riastradh@

Welcome to 9.99.23
2019-12-12 22:55:20 +00:00
pgoyette
1d577fe379 Move all non-emulation-specific coredump code into the coredump module,
and remove all #ifdef COREDUMP conditional compilation.  Now, the
coredump module is completely separated from the emulation modules, and
they can all be independently loaded and unloaded.

Welcome to 9.99.18 !
2019-11-20 19:37:51 +00:00
maxv
10c5b02320 Add support for Kernel Memory Sanitizer (kMSan). It detects uninitialized
memory used by the kernel at run time, and just like kASan and kCSan, it
is an excellent feature. It has already detected 38 uninitialized variables
in the kernel during my testing, which I have since discreetly fixed.

We use two shadows:
 - "shad", to track uninitialized memory with a bit granularity (1:1).
   Each bit set to 1 in the shad corresponds to one uninitialized bit of
   real kernel memory.
 - "orig", to track the origin of the memory with a 4-byte granularity
   (1:1). Each uint32_t cell in the orig indicates the origin of the
   associated uint32_t of real kernel memory.

The memory consumption of these shadows is consequent, so at least 4GB of
RAM is recommended to run kMSan.

The compiler inserts calls to specific __msan_* functions on each memory
access, to manage both the shad and the orig and detect uninitialized
memory accesses that change the execution flow (like an "if" on an
uninitialized variable).

We mark as uninit several types of memory buffers (stack, pools, kmem,
malloc, uvm_km), and check each buffer passed to copyout, copyoutstr,
bwrite, if_transmit_lock and DMA operations, to detect uninitialized memory
that leaves the system. This allows us to detect kernel info leaks in a way
that is more efficient and also more user-friendly than KLEAK.

Contrary to kASan, kMSan requires comprehensive coverage, ie we cannot
tolerate having one non-instrumented function, because this could cause
false positives. kMSan cannot instrument ASM functions, so I converted
most of them to __asm__ inlines, which kMSan is able to instrument. Those
that remain receive special treatment.

Contrary to kASan again, kMSan uses a TLS, so we must context-switch this
TLS during interrupts. We use different contexts depending on the interrupt
level.

The orig tracks precisely the origin of a buffer. We use a special encoding
for the orig values, and pack together in each uint32_t cell of the orig:
 - a code designating the type of memory (Stack, Pool, etc), and
 - a compressed pointer, which points either (1) to a string containing
   the name of the variable associated with the cell, or (2) to an area
   in the kernel .text section which we resolve to a symbol name + offset.

This encoding allows us not to consume extra memory for associating
information with each cell, and produces a precise output, that can tell
for example the name of an uninitialized variable on the stack, the
function in which it was pushed on the stack, and the function where we
accessed this uninitialized variable.

kMSan is available with LLVM, but not with GCC.

The code is organized in a way that is similar to kASan and kCSan, so it
means that other architectures than amd64 can be supported.
2019-11-14 16:23:52 +00:00
maxv
b7edd3d132 Add Kernel Concurrency Sanitizer (kCSan) support. This sanitizer allows us
to detect race conditions at runtime. It is a variation of TSan that is
easy to implement and more suited to kernel internals, albeit theoretically
less precise than TSan's happens-before.

We do basically two things:

 - On every KCSAN_NACCESSES (=2000) memory accesses, we create a cell
   describing the access, and delay the calling CPU (10ms).

 - On all memory accesses, we verify if the memory we're reading/writing
   is referenced in a cell already.

The combination of the two means that, if for example cpu0 does a read that
is selected and cpu1 does a write at the same address, kCSan will fire,
because cpu1's write collides with cpu0's read cell.

The coverage of the instrumentation is the same as that of kASan. Also, the
code is organized in a way similar to kASan, so it is easy to add support
for more architectures than amd64. kCSan is compatible with KCOV.

Reviewed by Kamil.
2019-11-05 20:19:17 +00:00
maxv
3808726a03 Unlink KMEM_GUARD leftovers. 2019-08-15 12:24:08 +00:00
christos
9a23e406ae move setdisklabel(9) into a separate file. 2019-04-04 20:19:07 +00:00
kamil
0fe7e51662 Add KCOV - kernel code coverage tracing device
The KCOV driver implements collection of code coverage inside the kernel.
It can be enabled on a per process basis from userland, allowing the kernel
program counter to be collected during syscalls triggered by the same
process.

The device is oriented towards kernel fuzzers, in particular syzkaller.

Currently the only supported coverage type is -fsanitize-coverage=trace-pc.

The KCOV driver was initially developed in Linux. A driver based on the
same concept was then implemented in FreeBSD and OpenBSD.

Documentation is borrowed from OpenBSD and ATF tests from FreeBSD.

This patch has been prepared by Siddharth Muralee, improved by <maxv>
and polished by myself before importing into the mainline tree.

All ATF tests pass.
2019-02-23 03:10:05 +00:00
kamil
e807f4b65a Silent UB alignment issues in acpica under kUBSan
Pass -DACPI_MISALIGNMENT_NOT_SUPPORTED under kUBSan enabled. This option
is dedicated for alignment sensitive CPUs in acpica. It was originally
designed for Itanium CPUs, but nowadays it's wanted for aarch64 as well.

Define it in acpica code under kUBSan in order to pacify Undefined Behavior
reports on all ports (in particular x86). The number of reports is now
halved with this patch applied. The remaining alignment alarms in acpica
will be addressed in future.

Patch contributed by <Akul Pillai>
2019-02-13 18:04:35 +00:00
pgoyette
d91f98a871 Merge the [pgoyette-compat] branch 2019-01-27 02:08:33 +00:00
mrg
f40d255452 crashme: a framework to test kernel faults.
so far, only a basic panic() and null deref nodes are added.
with options DEBUG, one can now use:

   # sysctl -w kern.crashme_enable=1

   # sysctl -w kern.crashme.panic=1
   # sysctl -w kern.crashme.null_deref=1

to trigger a crash.  crashme_enable must be set to 1 before any
of the nodes will be writeable.

supports dynamic additional/removal of crashme nodes.


(obsoletes kern.panic_now, which will be removed later.)
2019-01-09 04:01:20 +00:00
thorpej
2834fa0ab4 Add threadpool(9), an abstraction that provides shared pools of kernel
threads running at specific priorities, with support for unbound pools
and per-cpu pools.

Written by riastradh@, and based on the May 2014 draft, with a few changes
by me:
- Working on the assumption that a relative few priorities will actually
  be used, reduce the memory footprint by using linked lists, rather than
  2 large (and mostly empty) tables.  The performance impact is essentially
  nil, since these lists are consulted only when pools are created (and
  destroyed, for DIAGNOSTIC checks), and the lists will have at most 225
  entries.
- Make threadpool job object, which the caller must allocate storage for,
  really opaque.
- Use typedefs for the threadpool types, to reduce the verbosity of the
  API somewhat.
- Fix a bunch of pool / worker thread / job object lifecycle bugs.

Also include an ATF unit test, written by me, that exercises the basics
of the API by loading a kernel module that exposes several sysctls that
allow the ATF test script to create and destroy threadpools, schedule a
basic job, and verify that it ran.

And thus NetBSD 8.99.29 has arrived.
2018-12-24 16:58:53 +00:00
rmind
6577bb50ec Import thmap -- a concurrent trie-hash map, combining the elements of
hashing and radix trie.  It supports lock-free lookups and concurrent
inserts/deletes.  It is designed to be optimal as a general purpose
*concurrent* associative array.

Upstream: https://github.com/rmind/thmap
Discussed on tech-kern@
2018-12-16 14:06:56 +00:00
christos
171ab4cc98 Expose addresses depending on the KASLR setting (from mrg@). Restores the
status quo of exposing kernel addresses if there is no KASLR.
2018-12-03 00:11:02 +00:00
maxv
e5fadd7f81 Introduce KLEAK, a new feature that can detect kernel information leaks.
It works by tainting memory sources with marker values, letting the data
travel through the kernel, and scanning the kernel<->user frontier for
these marker values. Combined with compiler instrumentation and rotation
of the markers, it is able to yield relevant results with little effort.

We taint the pools and the stack, and scan copyout/copyoutstr. KLEAK is
supported on amd64 only for now, but it is not complicated to add more
architectures (just a matter of having the address of .text, and a stack
unwinder).

A userland tool is provided, that allows to execute a command in rounds
and monitor the leaks generated all the while.

KLEAK already detected directly 12 kernel info leaks, and prompted changes
that in total fixed 25+ leaks.

Based on an idea developed jointly with Thomas Barabosch (of Fraunhofer
FKIE).
2018-12-02 21:00:13 +00:00
maxv
decaffde75 Merge uipc_mbuf2.c into uipc_mbuf.c. Reorder the latter a little to gather
similar functions. No functional change.
2018-11-15 09:38:57 +00:00
maxv
790d0b797c Move the MI parts of KASAN into kern/subr_asan.c. This file includes
machine/asan.h, which contains the MD functions. We use an include rather
than a plain C file, because we want GCC to optimize/inline some functions
into one single block.

The amd64 MD parts of KASAN are moved accordingly.

The naming convention we use is:

	kasan_*
		a generic kasan object, declared in subr_asan.c
	kasan_md_*
		an MD kasan object, declared in machine/asan.h, and used
		in subr_asan.c
	__md_*
		an MD object, declared in machine/asan.h, and not used
		outside

Overall this makes it easier to add KASAN support on more architectures.

Discussed with several people.
2018-10-31 06:26:25 +00:00
mrg
ca332959c3 retire kern_xxx.c. long live kern_xxx.c.
split it into kern_reboot.c and kern_scdebug.c.  while here,
add my copyright to kern_scdebug.c as it was largely rewritten
for kernhist support.
2018-09-14 01:55:19 +00:00
maxv
312ff3500a Retire KMEM_REDZONE and KMEM_POISON.
KMEM_REDZONE is not very efficient and cannot detect read overflows. KASAN
can, and will be used instead.

KMEM_POISON is enabled along with KMEM_GUARD, but it is redundant, since
the latter can detect read UAFs contrary to the former. In fact maybe
KMEM_GUARD should be retired too, because there are many cases where it
doesn't apply.

Simplifies the code.
2018-08-20 11:35:28 +00:00
kamil
e1971882d1 Register kUBSan in the GENERIC amd64 kernel config
Tested with GCC.
2018-08-03 04:35:20 +00:00
msaitoh
b0dc241449 - Fix compile error for kernel configuration file which has no any Ethernet
device driver.
- Add missing default label.
- Fix NetBSD RCS Id.
2018-07-18 07:06:40 +00:00
msaitoh
f146225c9b Add /d(dump) and /v(verbose) modifiers to DDB's "show mbuf" command. Mainly
written by Hiroki SUENAGA. Currently, /v supports Ethernet, PPP, PPPoE, ARP,
IPv4, ICMP, IPv6, ICMPv6, TCP and UDP.
2018-07-17 05:52:07 +00:00
maxv
62c8988166 Remove the kernel PMC code. Sent yesterday on tech-kern@.
This change:

 * Removes "options PERFCTRS", the associated includes, and the associated
   ifdefs. In doing so, it removes several XXXSMPs in the MI code, which is
   good.

 * Removes the PMC code of ARM XSCALE.

 * Removes all the pmc.h files. They were all empty, except for ARM XSCALE.

 * Reorders the x86 PMC code not to rely on the legacy pmc.h file. The
   definitions are put in sysarch.h.

 * Removes the kern/sys_pmc.c file, and along with it, the sys_pmc_control
   and sys_pmc_get_info syscalls. They are marked as OBSOL in kern,
   netbsd32 and rump.

 * Removes the pmc_evid_t and pmc_ctr_t types.

 * Removes all the associated man pages. The sets are marked as obsolete.
2018-07-12 10:46:40 +00:00
christos
f3e9eebed2 defflag {SETUID,FD}SCRIPTS 2018-06-30 00:37:37 +00:00
maxv
e9069ab139 compat_util.c must be compiled by default in the kernel. It is needed by
generic non-compat code, so it must not depend on anything (libcompat or
whatever option we choose to associate it to).
2017-12-16 10:15:12 +00:00
pgoyette
a372bceac2 Introduce new localcount(9) reference-count primitives. 2017-05-19 00:01:33 +00:00
christos
cd306a0c3c use opt_kmem.h for the KMEM_ variables. 2017-04-12 20:05:54 +00:00
pgoyette
a60b99094c * Split sys/kern/sys_process.c into three parts:
1 - ptrace(2) syscall for native emulation
        2 - common ptrace(2) syscall code (shared with compat_netbsd32)
        3 - support routines that are shared with PROCFS and/or KTRACE

* Add module glue for #1 and #2.  Both modules will be built-in to the
  kernel if "options PTRACE" is included in the config file (this is
  the default, defined in sys/conf/std).

* Mark the ptrace(2) syscall as modular in syscalls.master (generated
  files will be committed shortly).

* Conditionalize all remaining portions of PTRACE code on a new kernel
  option PTRACE_HOOKS.

XXX Instead of PROCFS depending on 'options PTRACE', we should probably
    just add a procfs attribute to the sys/kern/sys_process.c file's
    entry in files.kern, and add PROCFS to the "#if defineds" for
    process_domem().  It's really confusing to have two different ways
    of requiring this file.
2016-11-02 00:11:59 +00:00
pgoyette
06402e0a42 Move kern_ctf.c into the dtrace_fbt module (the only place it is used)
rather than including in kernels with KDTRACE_HOOKS defined.  Update
the dtrace_fbt module to depend on the zlib module.

Bump kernel version to avoid module mismatch.

Welcome to 7.99.38 !
2016-09-16 03:10:45 +00:00
riastradh
c03dceb184 Add passive references, intermediate between pserialize and refcount.
Discussed on tech-kern:

https://mail-index.netbsd.org/tech-kern/2016/01/24/msg020069.html

API is still experimental and likely to change.  (Obvious changes:
either remove extra arguments everywhere, or shrink psref_target to a
single bit, at the expense of possibly valuable diagnostic checks.)
Should do some real testing before we use this in anger!
2016-04-09 06:21:16 +00:00
pgoyette
65fae32d98 Merge the compat_sysv module into the sysv_ipc module - it should
never have been a separate module in the first place (my bad).

Adjust dependencies as appropriate.
2015-12-03 02:51:00 +00:00
uebayasi
185d99e39a Build conf/param.c normally. 2015-09-03 01:09:38 +00:00
uebayasi
520a795665 Move dev/ definitions out of files.kern. 2015-08-21 02:18:18 +00:00
knakahara
a604df282c Add kernel code to support intrctl(8). 2015-08-17 06:16:02 +00:00
pgoyette
ff00f0b594 Split the SYSV* compat code out into a separate compat_sysv module.
For monolithic kernels, both modules will be compiled as "built-ins",
while modular environments will be able to load the SYSVSEM, SYSVSHM,
and SYSVMSG code independant from the rest of compat.

This is a necessary precursor step to making the "STD" SYSV* code
into a separate module.

Tested in both monolithic and modular environments with no errors
seen.
2015-05-10 07:41:15 +00:00
hannken
e10a32f7f7 Remove miscfs/syncfs and
- move the syncer into kern/vfs_subr.c.

- change the syncer to process the mountlist and VFS_SYNC as appropriate.

- use an API for mount points similiar to the API for vnodes:
  - vfs_syncer_add_to_worklist(struct mount *mp) to add
  - vfs_syncer_remove_from_worklist(struct mount *mp) to remove a mount.

No objections on tech-kern@
2015-05-06 15:57:07 +00:00
mlelstv
1cd9710165 Merge dk_intf and dkdriver interfaces.
Merge common disk driver functionality in ld.c with dksubr.c.
Adjust the two previous users of dk_intf (cgd and xbd) to
the changes.

This file was missing from the commit.
2015-05-02 12:57:19 +00:00
christos
811682d624 syscallnames are needed by dtrace 2015-03-07 16:35:37 +00:00
uebayasi
c2aa342e46 Mark some stray files as kern for now. 2014-10-12 04:38:28 +00:00
uebayasi
27424e5fa5 Move kern definitions. 2014-10-12 04:30:42 +00:00