- include <bsd.init.mk> now that we can do it, because we need Makefile.rump
to be included first, so that NOLINT gets defined, so that we don't end up
building lint modules just for this.
This may reduce performance by not taking advantage of 64x64->128
multiplications on some platforms, but let's worry about that later
and fix the build on the other platforms instead.
Adiantum is a wide-block cipher, built out of AES, XChaCha12,
Poly1305, and NH, defined in
Paul Crowley and Eric Biggers, `Adiantum: length-preserving
encryption for entry-level processors', IACR Transactions on
Symmetric Cryptology 2018(4), pp. 39--61.
Adiantum provides better security than a narrow-block cipher with CBC
or XTS, because every bit of each sector affects every other bit,
whereas with CBC each block of plaintext only affects the following
blocks of ciphertext in the disk sector, and with XTS each block of
plaintext only affects its own block of ciphertext and nothing else.
Adiantum generally provides much better performance than
constant-time AES-CBC or AES-XTS software do without hardware
support, and performance comparable to or better than the
variable-time (i.e., leaky) AES-CBC and AES-XTS software we had
before. (Note: Adiantum also uses AES as a subroutine, but only once
per disk sector. It takes only a small fraction of the time spent by
Adiantum, so there's relatively little performance impact to using
constant-time AES software over using variable-time AES software for
it.)
Adiantum naturally scales to essentially arbitrary disk sector sizes;
sizes >=1024-bytes take the most advantage of Adiantum's design for
performance, so 4096-byte sectors would be a natural choice if we
taught cgd to change the disk sector size. (However, it's a
different cipher for each disk sector size, so it _must_ be a cgd
parameter.)
The paper presents a similar construction HPolyC. The salient
difference is that HPolyC uses Poly1305 directly, whereas Adiantum
uses Poly1395(NH(...)). NH is annoying because it requires a
1072-byte key, which means the test vectors are ginormous, and
changing keys is costly; HPolyC avoids these shortcomings by using
Poly1305 directly, but HPolyC is measurably slower, costing about
1.5x what Adiantum costs on 4096-byte sectors.
For the purposes of cgd, we will reuse each key for many messages,
and there will be very few keys in total (one per cgd volume) so --
except for the annoying verbosity of test vectors -- the tradeoff
weighs in the favour of Adiantum, especially if we teach cgd to do
>>512-byte sectors.
For now, everything that Adiantum needs beyond what's already in the
kernel is gathered into a single file, including NH, Poly1305, and
XChaCha12. We can split those out -- and reuse them, and provide MD
tuned implementations, and so on -- as needed; this is just a first
pass to get Adiantum implemented for experimentation.
1. Rip out old variable-time reference implementation.
2. Replace it by BearSSL's constant-time 32-bit logic.
=> Obtained from commit dda1f8a0c46e15b4a235163470ff700b2f13dcc5.
=> We could conditionally adopt the 64-bit logic too, which would
likely give a modest performance boost on 64-bit platforms
without AES-NI, but that's a bit more trouble.
3. Select the AES implementation at boot-time; allow an MD override.
=> Use self-tests to verify basic correctness at boot.
=> The implementation selection policy is rather rudimentary at
the moment but it is isolated to one place so it's easy to
change later on.
This (a) plugs a host of timing attacks on, e.g., cgd, and (b) paves
the way to take advantage of CPU support for AES -- both things we
should've done a decade ago. Downside: Computing AES takes 2-3x the
CPU time. But that's what hardware support will be coming for.
Rudimentary measurement of performance impact done by:
mount -t tmpfs tmpfs /tmp
dd if=/dev/zero of=/tmp/disk bs=1m count=512
vnconfig -cv vnd0 /tmp/disk
cgdconfig -s cgd0 /dev/vnd0 aes-cbc 256 < /dev/zero
dd if=/dev/rcgd0d of=/dev/null bs=64k
dd if=/dev/zero of=/dev/rcgd0d bs=64k
The AES-CBC encryption performance impact is closer to 3x because it
is inherently sequential; the AES-CBC decryption impact is closer to
2x because the bitsliced AES logic can process two blocks at once.
Discussed on tech-kern:
https://mail-index.NetBSD.org/tech-kern/2020/06/18/msg026505.html
Benefits:
- larger seeds -- a 128-bit key alone is not enough for `128-bit security'
- better resistance to timing side channels than AES
- a better-understood security story (https://eprint.iacr.org/2018/349)
- no loss in compliance with US government standards that nobody ever
got fired for choosing, at least in the US-dominated western world
- no dirty endianness tricks
- self-tests
Drawbacks:
- performance hit: throughput is reduced to about 1/3 in naive measurements
=> possible to mitigate by using hardware SHA-256 instructions
=> all you really need is 32 bytes to seed a userland PRNG anyway
=> if we just used ChaCha this would go away...
XXX pullup-7
XXX pullup-8
XXX pullup-9
It is yet another psref leak detector that enables to tell where a leak occurs
while a simpler version that is already committed just tells an occurrence of a
leak.
Investigating of psref leaks is hard because once a leak occurs a percpu list of
psref that tracks references can be corrupted. A reference to a tracking object
is memorized in the list via an intermediate object (struct psref) that is
normally allocated on a stack of a thread. Thus, the intermediate object can be
overwritten on a leak resulting in corruption of the list.
The tracker makes a shadow entry to an intermediate object and stores some hints
into it (currently it's a caller address of psref_acquire). We can detect a
leak by checking the entries on certain points where any references should be
released such as the return point of syscalls and the end of each softint
handler.
The feature is expensive and enabled only if the kernel is built with
PSREF_DEBUG.
Proposed on tech-kern
It detects leaks by counting up the number of held psref by an LWP and checking
its zeroness at the end of syscalls and softint handlers. For the counter, a
unused field of struct lwp is reused.
The detector runs only if DIAGNOSTIC is turned on.
XXX arm_icache_sync_range() and mips_icache_sync_range() call
(void)rumpcomp_sync_icache((void *)va, (uint64_t)sz);
but linking fails if I do the same on aarch64 (I suspect it also
fails on 32bit arm and mips).
As a workaround, I call __builtin___clear_cache().
entry-point is either sys_nomodule or sys_nosys. Update the
makesyscalls.sh script to create a const array of bits to allow
syscall_disestablish() to properly restore the original entry-point.
Update all the initializers of struct emul to initialize the pointer
to the bit array struct emul.
XXX Regen of all files created by makesyscalls.sh will come soon,
XXX followed by a kernel version bump (since struct emul is being
XXX modified).
This commit should address PR kern/45781 and also removes the need
for the work-around for that PR in file
sys/arch/usermode/modules/syscallemu/syscallemu.c
this changes the upstream vendor from OpenSolaris to FreeBSD,
and this version is based on FreeBSD svn r315983.
in addition to the 10 years of improvements from upstream,
this version also has these NetBSD-specific enhancements:
- dtrace FBT probes can now be placed in kernel modules.
- ZFS now supports mmap().
sprinkling them around the faction directories. Avoids having
to add a CPPFLAGS (or several) to pretty much every component
Makefile.
Leave compat headers around in the old locations.
The commit changes some autogenerated files, but I'll fix the
generators shortly and regen.