Commit Graph

135 Commits

riastradh a220774a13 Provide hand-written AES NEON assembly for arm32.
gcc does a lousy job at compiling 128-bit NEON intrinsics on arm32;
hand-writing it made it about 12x faster, by avoiding a zillion loads
and stores to spill everything and the kitchen sink onto the stack.
(But gcc does fine on aarch64, presumably because it has twice as
many registers and doesn't have to deal with q2=d4/d5 overlapping.)
2020-06-29 23:57:56 +00:00
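
A minimal illustration of the aliasing in question, assuming GNU C
arm_neon.h intrinsics: on arm32, each 128-bit q register is
architecturally a pair of 64-bit d registers (q2 is d4/d5), so the
compiler's allocator has to juggle both views at once.

#include <arm_neon.h>

/* On arm32 NEON, qN aliases the pair {d(2N), d(2N+1)}; this combine
 * is a no-op at the register level if allocation cooperates. */
uint8x16_t
make_q(uint8x8_t d_even, uint8x8_t d_odd)
{

	return vcombine_u8(d_even, d_odd);
}
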
riastradh 0a776e17e0 New permutation-based AES implementation using ARM NEON.
Also derived from Mike Hamburg's public-domain vpaes code.
2020-06-29 23:56:30 +00:00
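
A sketch of the nibble-split table-lookup move at the heart of vpaes,
in aarch64 NEON intrinsics (arm32 would use vtbl on d registers
instead); the table contents are omitted here -- see Hamburg's vpaes
for the actual constants.

#include <arm_neon.h>

/* Constant-time S-box step: split each byte into nibbles and combine
 * two 16-entry table lookups. */
static uint8x16_t
vpaes_lookup(uint8x16_t x, uint8x16_t lo_tab, uint8x16_t hi_tab)
{
	uint8x16_t lo = vandq_u8(x, vdupq_n_u8(0x0f));
	uint8x16_t hi = vshrq_n_u8(x, 4);

	return veorq_u8(vqtbl1q_u8(lo_tab, lo), vqtbl1q_u8(hi_tab, hi));
}
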
riastradh 9f4370e773 Move aarch64/fpu.h to arm/fpu.h. 2020-06-29 23:53:12 +00:00
riastradh c057901613 New permutation-based AES implementation using SSSE3.
This covers a lot of CPUs -- particularly the lower-end CPUs of the
past decade that lack AES-NI.

Derived from Mike Hamburg's public domain vpaes software; see
<https://crypto.stanford.edu/vpaes/> for details.
2020-06-29 23:51:35 +00:00
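
The same nibble-split trick expressed with SSSE3's PSHUFB, as a
sketch (real table constants omitted; see the vpaes page above):

#include <tmmintrin.h>

/* PSHUFB zeroes lanes whose index byte has the high bit set, so both
 * nibble vectors are masked to 0x0f before the lookups. */
static __m128i
vpaes_lookup(__m128i x, __m128i lo_tab, __m128i hi_tab)
{
	__m128i mask = _mm_set1_epi8(0x0f);
	__m128i lo = _mm_and_si128(x, mask);
	__m128i hi = _mm_and_si128(_mm_srli_epi16(x, 4), mask);

	return _mm_xor_si128(_mm_shuffle_epi8(lo_tab, lo),
	    _mm_shuffle_epi8(hi_tab, hi));
}
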
riastradh 4809cab8b6 Split SSE2 logic into separate units.
Ensure that there are no paths into files compiled with -msse -msse2
at all except via fpu_kern_enter.

I didn't run into a practical problem with this, but let's not leave
a ticking time bomb for subsequent toolchain changes in case the mere
declaration of local __m128i variables causes trouble.
2020-06-29 23:50:05 +00:00
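
A sketch of the containment rule, with hypothetical function names
(only the fpu_kern_enter/fpu_kern_leave pair is the real kernel
interface): the SSE2 body lives in a unit compiled with -msse -msse2,
and the sole entry point is a wrapper in an ordinary unit.

#include <stdint.h>

void fpu_kern_enter(void);
void fpu_kern_leave(void);

/* Defined in the separately compiled -msse -msse2 unit. */
void aes_sse2_enc_impl(const uint32_t *rk, const uint8_t in[16],
    uint8_t out[16], unsigned nrounds);

void
aes_sse2_enc(const uint32_t *rk, const uint8_t in[16], uint8_t out[16],
    unsigned nrounds)
{

	fpu_kern_enter();
	aes_sse2_enc_impl(rk, in, out, nrounds);
	fpu_kern_leave();
}
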
riastradh 336b5650c6 New SSE2-based bitsliced AES implementation.
This should work on essentially all x86 CPUs of the last two decades,
and may improve throughput over the portable C aes_ct implementation
from BearSSL by

(a) reducing the number of vector operations in sequence, and
(b) batching four rather than two blocks in parallel.

Derived from BearSSL's aes_ct64 implementation adjusted so that where
aes_ct64 uses 64-bit q[0],...,q[7], aes_sse2 uses (q[0], q[4]), ...,
(q[3], q[7]), each tuple representing a pair of 64-bit quantities
stacked in a single 128-bit register.  This translation was done very
naively, and mostly reduces the cost of ShiftRows and data movement
without doing anything to address the S-box or (Inv)MixColumns, which
spread all 64-bit quantities across separate registers and ignore the
upper halves.

Unfortunately, SSE2 -- which is all that is guaranteed on all amd64
CPUs -- doesn't have PSHUFB, which would help out a lot more.  For
example, vpaes relies on that.  Perhaps there are enough CPUs out
there with PSHUFB but not AES-NI to make it worthwhile to import or
adapt vpaes too.

Note: This includes local definitions of various Intel compiler
intrinsics for gcc and clang in terms of their __builtin_* &c.,
because the necessary header files are not available during the
kernel build.  This is a kludge -- expedient for now, but we should
fix it properly.
2020-06-29 23:47:54 +00:00
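
A sketch of the register packing described above, with illustrative
names: two of aes_ct64's 64-bit bitslice words stacked in one SSE2
register (the unpack intrinsic is amd64-only).

#include <emmintrin.h>
#include <stdint.h>

static __m128i
pack_pair(uint64_t qi, uint64_t qi4)	/* (q[i], q[i+4]) in one register */
{

	return _mm_set_epi64x((long long)qi4, (long long)qi);
}

static uint64_t
unpack_lo(__m128i r)			/* recover q[i] */
{

	return (uint64_t)_mm_cvtsi128_si64(r);
}
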
riastradh 04a6492d1e New cgd cipher adiantum.
Adiantum is a wide-block cipher, built out of AES, XChaCha12,
Poly1305, and NH, defined in

   Paul Crowley and Eric Biggers, `Adiantum: length-preserving
   encryption for entry-level processors', IACR Transactions on
   Symmetric Cryptology 2018(4), pp. 39--61.

Adiantum provides better security than a narrow-block cipher with CBC
or XTS, because every bit of each sector affects every other bit,
whereas with CBC each block of plaintext only affects the following
blocks of ciphertext in the disk sector, and with XTS each block of
plaintext only affects its own block of ciphertext and nothing else.

Adiantum generally provides much better performance than
constant-time AES-CBC or AES-XTS software do without hardware
support, and performance comparable to or better than the
variable-time (i.e., leaky) AES-CBC and AES-XTS software we had
before.  (Note: Adiantum also uses AES as a subroutine, but only once
per disk sector.  It takes only a small fraction of the time spent by
Adiantum, so there's relatively little performance impact to using
constant-time AES software over using variable-time AES software for
it.)

Adiantum naturally scales to essentially arbitrary disk sector sizes;
sizes >=1024 bytes take the most advantage of Adiantum's design for
performance, so 4096-byte sectors would be a natural choice if we
taught cgd to change the disk sector size.  (However, it's a
different cipher for each disk sector size, so it _must_ be a cgd
parameter.)

The paper presents a similar construction, HPolyC.  The salient
difference is that HPolyC uses Poly1305 directly, whereas Adiantum
uses Poly1305(NH(...)).  NH is annoying because it requires a
1072-byte key, which means the test vectors are ginormous, and
changing keys is costly; HPolyC avoids these shortcomings by using
Poly1305 directly, but HPolyC is measurably slower, costing about
1.5x what Adiantum costs on 4096-byte sectors.

For the purposes of cgd, we will reuse each key for many messages,
and there will be very few keys in total (one per cgd volume) so --
except for the annoying verbosity of test vectors -- the tradeoff
weighs in favour of Adiantum, especially if we teach cgd to do
>>512-byte sectors.

For now, everything that Adiantum needs beyond what's already in the
kernel is gathered into a single file, including NH, Poly1305, and
XChaCha12.  We can split those out -- and reuse them, and provide
MD-tuned implementations, and so on -- as needed; this is just a first
pass to get Adiantum implemented for experimentation.
2020-06-29 23:44:01 +00:00
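
For orientation, a sketch of Adiantum's hash-encrypt-hash shape from
the paper; the helper prototypes here are hypothetical stand-ins, not
the kernel's API, and the 128-bit add/subtract are mod 2^128.

#include <stddef.h>
#include <stdint.h>
#include <string.h>

extern void nh_poly1305(uint8_t h[16], const uint8_t *tweak,
    size_t tweaklen, const uint8_t *msg, size_t msglen);
extern void aes256_enc(uint8_t out[16], const uint8_t in[16]);
extern void xchacha12_xor(uint8_t *out, const uint8_t *in, size_t len,
    const uint8_t nonce[16]);
extern void add128(uint8_t x[16], const uint8_t y[16]);	/* x += y */
extern void sub128(uint8_t x[16], const uint8_t y[16]);	/* x -= y */

void
adiantum_enc_sketch(uint8_t *c, const uint8_t *p, size_t len,
    const uint8_t *tweak, size_t tweaklen)
{
	size_t bulk = len - 16;		/* split P into P_L and P_R */
	uint8_t m[16], h[16];

	memcpy(m, p + bulk, 16);			/* P_R */
	nh_poly1305(h, tweak, tweaklen, p, bulk);
	add128(m, h);			/* P_M = P_R + H(T, P_L) */
	aes256_enc(m, m);		/* C_M: the one AES call/sector */
	xchacha12_xor(c, p, bulk, m);	/* C_L = P_L ^ XChaCha12 stream */
	nh_poly1305(h, tweak, tweaklen, c, bulk);
	memcpy(c + bulk, m, 16);
	sub128(c + bulk, h);		/* C_R = C_M - H(T, C_L) */
}
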
riastradh 1f8a993cb5 VIA AES: Batch AES-XTS computation into eight blocks at a time.
Experimental -- performance improvement is not clearly worth the
complexity.
2020-06-29 23:41:35 +00:00
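
The per-block step such a batched loop repeats eight times is the XTS
tweak update -- multiplication by x in GF(2^128), little-endian
convention per IEEE P1619.  A portable sketch:

#include <stdint.h>

static void
xts_mulx(uint8_t t[16])
{
	unsigned carry = 0, c, i;

	for (i = 0; i < 16; i++) {
		c = t[i] >> 7;
		t[i] = (uint8_t)((t[i] << 1) | carry);
		carry = c;
	}
	if (carry)
		t[0] ^= 0x87;	/* reduce by the XTS polynomial */
}
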
riastradh 7ff94d7a4a Add AES implementation with VIA ACE. 2020-06-29 23:39:30 +00:00
riastradh 776602aed4 Provide the standard AES key schedule.
Different AES implementations prefer different variations on it, but
some of them -- notably VIA -- require the standard key schedule to
be available and don't provide hardware support for computing it
themselves.  So adapt BearSSL's logic to generate the standard key
schedule (and decryption keys, with InvMixColumns), rather than the
bitsliced key schedule that BearSSL uses natively.
2020-06-29 23:36:59 +00:00
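
The standard (FIPS-197) schedule in question has this shape, sketched
for AES-128; aes_sbox is assumed to be the usual 256-byte S-box table,
and the decryption keys additionally get InvMixColumns applied.

#include <stdint.h>

extern const uint8_t aes_sbox[256];

static uint32_t
subword(uint32_t w)
{

	return (uint32_t)aes_sbox[w >> 24] << 24 |
	    (uint32_t)aes_sbox[(w >> 16) & 0xff] << 16 |
	    (uint32_t)aes_sbox[(w >> 8) & 0xff] << 8 |
	    (uint32_t)aes_sbox[w & 0xff];
}

static void
aes128_expand(uint32_t rk[44], const uint32_t key[4])
{
	static const uint8_t rcon[10] = {
		0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80, 0x1b, 0x36
	};
	unsigned i;
	uint32_t t;

	for (i = 0; i < 4; i++)
		rk[i] = key[i];
	for (i = 4; i < 44; i++) {
		t = rk[i - 1];
		if (i % 4 == 0)		/* RotWord, SubWord, Rcon */
			t = subword((t << 8) | (t >> 24)) ^
			    ((uint32_t)rcon[i/4 - 1] << 24);
		rk[i] = rk[i - 4] ^ t;
	}
}
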
riastradh d4b0170e34 Implement AES in kernel using ARMv8.0-AES on aarch64. 2020-06-29 23:31:41 +00:00
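
A sketch of one middle round with the ARMv8.0-AES instructions (built
with the crypto extension enabled): AESE folds AddRoundKey, SubBytes,
and ShiftRows; AESMC is MixColumns.

#include <arm_neon.h>

static uint8x16_t
aes_armv8_round(uint8x16_t block, uint8x16_t roundkey)
{

	return vaesmcq_u8(vaeseq_u8(block, roundkey));
}
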
riastradh 99325bb896 Add x86 AES-NI support.
Limited to amd64 for now.  In principle, AES-NI should work in 32-bit
mode, and there may even be some 32-bit-only CPUs that support
AES-NI, but that requires work to adapt the assembly.
2020-06-29 23:29:39 +00:00
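
The AES-NI equivalent of the same middle round, as a sketch (built
with -maes); AESENC performs ShiftRows, SubBytes, MixColumns, and
AddRoundKey in one instruction.

#include <wmmintrin.h>

static __m128i
aesni_round(__m128i block, __m128i roundkey)
{

	return _mm_aesenc_si128(block, roundkey);
}
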
riastradh 5dcdae413b Rework AES in kernel to finally address CVE-2005-1797.
1. Rip out old variable-time reference implementation.
2. Replace it by BearSSL's constant-time 32-bit logic.
   => Obtained from commit dda1f8a0c46e15b4a235163470ff700b2f13dcc5.
   => We could conditionally adopt the 64-bit logic too, which would
      likely give a modest performance boost on 64-bit platforms
      without AES-NI, but that's a bit more trouble.
3. Select the AES implementation at boot-time; allow an MD override.
   => Use self-tests to verify basic correctness at boot.
   => The implementation selection policy is rather rudimentary at
      the moment but it is isolated to one place so it's easy to
      change later on.

This (a) plugs a host of timing attacks on, e.g., cgd, and (b) paves
the way to take advantage of CPU support for AES -- both things we
should've done a decade ago.  Downside: Computing AES takes 2-3x the
CPU time.  But that's what hardware support is coming for.

Rudimentary measurement of performance impact done by:

mount -t tmpfs tmpfs /tmp
dd if=/dev/zero of=/tmp/disk bs=1m count=512
vnconfig -cv vnd0 /tmp/disk
cgdconfig -s cgd0 /dev/vnd0 aes-cbc 256 < /dev/zero
dd if=/dev/rcgd0d of=/dev/null bs=64k
dd if=/dev/zero of=/dev/rcgd0d bs=64k

The AES-CBC encryption performance impact is closer to 3x because it
is inherently sequential; the AES-CBC decryption impact is closer to
2x because the bitsliced AES logic can process two blocks at once.

Discussed on tech-kern:

https://mail-index.NetBSD.org/tech-kern/2020/06/18/msg026505.html
2020-06-29 23:27:52 +00:00
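
A hypothetical sketch of the boot-time selection policy in item 3 --
the names are illustrative, not the kernel's actual interface: probe
each implementation in preference order and keep the first that passes
its self-test.

#include <stddef.h>

struct aes_impl {
	const char *ai_name;
	int (*ai_probe)(void);	/* 0 iff the CPU supports this impl */
};

extern const struct aes_impl aes_ni_impl, aes_via_impl, aes_bear_impl;
extern int aes_selftest(const struct aes_impl *);

static const struct aes_impl *aes_impl;

void
aes_select(void)
{
	static const struct aes_impl *const impls[] = {
		&aes_ni_impl,		/* hardware, fastest */
		&aes_via_impl,
		&aes_bear_impl,		/* constant-time portable fallback */
	};
	size_t i;

	for (i = 0; i < sizeof(impls)/sizeof(impls[0]); i++) {
		if (impls[i]->ai_probe() == 0 &&
		    aes_selftest(impls[i]) == 0) {
			aes_impl = impls[i];
			return;
		}
	}
}
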
riastradh 2c43339709 Count cprng_fast reseed events. 2020-04-30 03:29:45 +00:00
riastradh 66a63640d3 Adapt cprng_fast to use entropy_epoch(), not rnd_initial_entropy.
This way it has an opportunity to be reseeded after boot.
2020-04-30 03:29:35 +00:00
rin b203ba4088 Make crypto/rijndael optional again, as cprng_strong no
longer depends on it.  The dependency is explicitly declared in
files.foo if a component requires it.
2020-04-22 09:15:39 +00:00
riastradh 7ba101b07e Nuke crypto/arc4. Has not been used since 2003. Will not be missed. 2019-12-05 03:22:02 +00:00
riastradh 67c16d2af5 Use an explicit run-time assertion where compile-time doesn't work. 2019-09-19 18:29:55 +00:00
riastradh 1557be4823 Use CTASSERT where possible, run-time assertion where not.
Should fix negative-length variable-length array found by kamil.
2019-09-19 14:34:59 +00:00
riastradh 8e07b51739 Switch from NIST CTR_DRBG with AES to NIST Hash_DRBG with SHA-256.
Benefits:

- larger seeds -- a 128-bit key alone is not enough for `128-bit security'
- better resistance to timing side channels than AES
- a better-understood security story (https://eprint.iacr.org/2018/349)
- no loss in compliance with US government standards that nobody ever
  got fired for choosing, at least in the US-dominated western world
- no dirty endianness tricks
- self-tests

Drawbacks:

- performance hit: throughput is reduced to about 1/3 in naive measurements
  => possible to mitigate by using hardware SHA-256 instructions
  => all you really need is 32 bytes to seed a userland PRNG anyway
  => if we just used ChaCha this would go away...

XXX pullup-7
XXX pullup-8
XXX pullup-9
2019-09-02 20:09:29 +00:00
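
The output side of Hash_DRBG is simple enough to sketch -- hashgen per
SP 800-90A hashes an incrementing copy of the 440-bit state V, with
SHA256() below as a stand-in prototype.

#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SEEDLEN	55	/* 440 bits for Hash_DRBG with SHA-256 */

extern void SHA256(const uint8_t *, size_t, uint8_t out[32]);

static void
hashgen(uint8_t *out, size_t outlen, const uint8_t V[SEEDLEN])
{
	uint8_t data[SEEDLEN], h[32];
	size_t n;
	int i;

	memcpy(data, V, SEEDLEN);
	while (outlen > 0) {
		SHA256(data, SEEDLEN, h);
		n = outlen < 32 ? outlen : 32;
		memcpy(out, h, n);
		out += n;
		outlen -= n;
		/* data = (data + 1) mod 2^440, big-endian */
		for (i = SEEDLEN - 1; i >= 0 && ++data[i] == 0; i--)
			continue;
	}
}
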
mrg da2b3afaf6 add fallthru comments. i considered patching makefiles to ignore
these problems, but this code is dead upstream and likely will be
removed here rather than ever updated.
2019-02-04 08:23:53 +00:00
christos 87fd18f8e5 s/static inline/static __inline/g for consistency. 2018-04-19 21:50:06 +00:00
alnsn 6c4ed8c121 Add XTS mode. 2016-12-11 00:28:44 +00:00
riastradh 5c5f06b858 More rnd.h user cleanup. 2015-04-13 22:43:41 +00:00
riastradh 556fc62b15 cprng_strong(kern_cprng, ...) never blocks, pass 0 for flags.
FASYNC was wrong anyway!  It's FNONBLOCK.
2015-04-13 15:51:00 +00:00
justin 1624076525 Fix inconsistent use of inline in prototype and definition 2014-08-11 22:36:49 +00:00
riastradh 47d7f02ac0 Tweak cprng_fast_buf to use 32-bit unaligned writes if possible. 2014-08-11 13:22:16 +00:00
riastradh 0c0361fcdd Move initial entropy bookkeeping out of the fast path. 2014-08-11 13:12:53 +00:00
riastradh 7e518a1255 Use percpu_foreach instead of manual iteration. 2014-08-11 13:06:31 +00:00
riastradh 215d8661dd Access to struct cprng_fast must be consistently at IPL_VM. 2014-08-11 13:01:58 +00:00
riastradh bf5402594e No need for cprng_fast_seed to be inline. 2014-08-11 03:50:29 +00:00
riastradh a7da2729e7 Include <sys/rnd.h>, don't copypasta declare rnd_initial_entropy. 2014-08-11 03:47:49 +00:00
riastradh 8756d4c167 Sort #includes. 2014-08-11 03:46:54 +00:00
justin cd946d79cd define function consistently as inline 2014-08-10 22:35:32 +00:00
tls ea6af427bd Merge tls-earlyentropy branch into HEAD. 2014-08-10 16:44:32 +00:00
christos d22061e092 fix sprintf. 2014-03-25 16:28:15 +00:00
pgoyette f45c6e8a3c Create modules for software crypto components. 2014-01-01 15:18:57 +00:00
tls 6e1dd068e9 Separate /dev/random pseudodevice implementation from kernel entropy pool
implementation.  Rewrite pseudodevice code to use cprng_strong(9).

The new pseudodevice is cloning, so each caller gets bits from a stream
generated with its own key.  Users of /dev/urandom get their generators
keyed on a "best effort" basis -- the kernel will rekey generators
whenever the entropy pool hits the high water mark -- while users of
/dev/random get their generators rekeyed every time key-length bits
are output.

The underlying cprng_strong API can use AES-256 or AES-128, but we use
AES-128 because of concerns about related-key attacks on AES-256.  This
improves performance (and reduces entropy pool depletion) significantly
for users of /dev/urandom but does cause users of /dev/random to rekey
twice as often.

Also fixes various bugs (including some missing locking and a reseed-counter
overflow in the CTR_DRBG code) found while testing this.

For long reads, this generator is approximately 20 times as fast as the
old generator (dd with bs=64K yields 53MB/sec on 2GHz Core2 instead of
2.5MB/sec) and also uses a separate mutex per instance so concurrency
is greatly improved.  For reads of typical key sizes for modern
cryptosystems (16-32 bytes) performance is about the same as the old
code: a little better for 32 bytes, a little worse for 16 bytes.
2011-12-17 20:05:38 +00:00
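
A sketch of the rekeying policy described above, with hypothetical
names: /dev/random streams rekey after key-length bits of output,
/dev/urandom streams rekey opportunistically when the pool fills.

#include <stddef.h>

struct rng_stream {
	size_t bits_out;	/* since last rekey */
	size_t keybits;		/* 128 for AES-128 */
	int hard;		/* 1 for /dev/random, 0 for /dev/urandom */
};

extern void stream_rekey(struct rng_stream *);
extern int entropy_pool_full(void);

static void
maybe_rekey(struct rng_stream *s)
{

	if ((s->hard && s->bits_out >= s->keybits) ||
	    (!s->hard && entropy_pool_full())) {
		stream_rekey(s);
		s->bits_out = 0;
	}
}
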
macallan 19166a6288 NIST_CTR_DRBG.V is accessed as (unsigned long *) so we need to make sure
it's aligned accordingly or we go boom on sparc64
2011-11-21 23:48:52 +00:00
tls 3afd44cf08 First step of random number subsystem rework described in
<20111022023242.BA26F14A158@mail.netbsd.org>.  This change includes
the following:

	An initial cleanup and minor reorganization of the entropy pool
	code in sys/dev/rnd.c and sys/dev/rndpool.c.  Several bugs are
	fixed.  Some effort is made to accumulate entropy more quickly at
	boot time.

	A generic interface, "rndsink", is added, for stream generators to
	request that they be re-keyed with good quality entropy from the pool
	as soon as it is available.

	The arc4random()/arc4randbytes() implementation in libkern is
	adjusted to use the rndsink interface for rekeying, which helps
	address the problem of low-quality keys at boot time.

	An implementation of the FIPS 140-2 statistical tests for random
	number generator quality is provided (libkern/rngtest.c).  This
	is based on Greg Rose's implementation from Qualcomm.

	A new random stream generator, nist_ctr_drbg, is provided.  It is
	based on an implementation of the NIST SP800-90 CTR_DRBG by
	Henric Jungheim.  This generator uses AES in a modified counter
	mode to generate a backtracking-resistant random stream.

	An abstraction layer, "cprng", is provided for in-kernel consumers
	of randomness.  The arc4random/arc4randbytes API is deprecated for
	in-kernel use.  It is replaced by "cprng_strong".  The current
	cprng_fast implementation wraps the existing arc4random
	implementation.  The current cprng_strong implementation wraps the
	new CTR_DRBG implementation.  Both interfaces are rekeyed from
	the entropy pool automatically at intervals justifiable from best
	current cryptographic practice.

	In some quick tests, cprng_fast() is about the same speed as
	the old arc4randbytes(), and cprng_strong() is about 20% faster
	than rnd_extract_data().  Performance is expected to improve.

	The AES code in src/crypto/rijndael is no longer an optional
	kernel component, as it is required by cprng_strong, which is
	not an optional kernel component.

	The entropy pool output is subjected to the rngtest tests at
	startup time; if it fails, the system will reboot.  There is
	approximately a 3/10000 chance of a false positive from these
	tests.  Entropy pool _input_ from hardware random numbers is
	subjected to the rngtest tests at attach time, as well as the
	FIPS continuous-output test, to detect bad or stuck hardware
	RNGs; if any are detected, they are detached, but the system
	continues to run.

	A problem with rndctl(8) is fixed -- datastructures with
	pointers in arrays are no longer passed to userspace (this
	was not a security problem, but rather a major issue for
	compat32).  A new kernel will require a new rndctl.

	The sysctl kern.arandom() and kern.urandom() nodes are hooked
	up to the new generators, but the /dev/*random pseudodevices
	are not, yet.

	Manual pages for the new kernel interfaces are forthcoming.
2011-11-19 22:51:18 +00:00
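
For orientation, the counter-mode generate-then-update step that gives
CTR_DRBG its backtracking resistance, sketched with stand-in helpers
(per SP 800-90A, simplified, no additional input): after producing
output, Key and V are overwritten with fresh keystream, so a later
state compromise cannot run the generator backwards.

#include <stddef.h>
#include <stdint.h>
#include <string.h>

extern void aes128_enc(const uint8_t key[16], const uint8_t in[16],
    uint8_t out[16]);
extern void incr128(uint8_t v[16]);	/* V = (V + 1) mod 2^128 */

struct ctr_drbg {
	uint8_t key[16], v[16];
};

static void
ctr_drbg_generate(struct ctr_drbg *d, uint8_t *out, size_t outlen)
{
	uint8_t b[16], nk[16], nv[16];
	size_t n;

	while (outlen > 0) {
		incr128(d->v);
		aes128_enc(d->key, d->v, b);
		n = outlen < 16 ? outlen : 16;
		memcpy(out, b, n);
		out += n;
		outlen -= n;
	}
	/* Update: replace (Key, V) with the next two keystream blocks. */
	incr128(d->v);
	aes128_enc(d->key, d->v, nk);
	incr128(d->v);
	aes128_enc(d->key, d->v, nv);
	memcpy(d->key, nk, 16);
	memcpy(d->v, nv, 16);
}
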
jmmv 9b52d4003a Revert my previous change. christos@ submitted a different fix pretty much
at the same time.  Did an update amd64 release build to ensure my change was
really not needed.
2011-05-14 16:46:55 +00:00
jmmv d899efcf6e Declare for-loop control variable outside of the for statement to prevent
a warning and therefore fix the build.
2011-05-14 16:27:49 +00:00
christos 018b374686 - don't assume aligned buffers.
- little KNF
2011-05-14 01:59:19 +00:00
drochner 9d083d2f9c add "camellia" crypto code, copied from FreeBSD 2011-05-05 17:38:35 +00:00
pooka 4d79e8c53d Apply const where necessary (XXX: where is bf_locl.org?) 2009-06-30 13:14:40 +00:00
dsl 02cdf4d2c8 Remove all the __P() from sys (excluding sys/dist)
Diff checked with grep and MK1 eyeball.
i386 and amd64 GENERIC and sys still build.
2009-03-14 14:45:51 +00:00
lukem 6ec6d598ac use __KERNEL_RCSID() 2007-12-11 23:31:07 +00:00
lukem 06d6cbc0d9 use __KERNEL_RCSID() 2007-12-11 23:13:57 +00:00
cbiere f5b684cf56 Added missing const-qualifiers. 2007-01-22 01:38:33 +00:00
cbiere 6d8f729825 Added const-qualifiers. 2007-01-21 23:00:08 +00:00