NetBSD

Commit Graph

Author	SHA1	Message	Date
rin	78c3759dfd	sys/crypto: aarch64: Catch up with builtin rename for GCC12. Kernel self tests successfully pass for aarch64{,eb}. Same binary generated by GCC10 and GCC12 for: --- #include <sys/types.h> #include "arm_neon.h" uint32x4_t my_vshrq_n_u32(uint32x4_t v, uint8_t bits) { return vshrq_n_u32(v, bits); } uint8x16_t my_vshrq_n_u8(uint8x16_t v, uint8_t bits) { return vshrq_n_u8(v, bits); } ---	2023-08-07 01:14:19 +00:00
rin	d754abaff4	sys/crypto: Introduce arch/{arm,x86} to share common MD headers. Dedup between aes and chacha. No binary changes.	2023-08-07 01:07:35 +00:00
rin	8ee3d6ae37	sys/crypto/{aes,chacha}/arch/arm/arm_neon.h: Sync (whitespace fix). No binary changes.	2023-08-07 00:58:35 +00:00
riastradh	1242a10b26	cprng_fast(9): Drop and retake percpu reference across cprng_strong. cprng_strong may sleep on an adaptive lock (via entropy_extract), which invalidates percpu(9) references. Discovered by stumbling upon this panic in a test run: panic: kernel diagnostic assertion "(cprng == percpu_getref(cprng_fast_percpu)) && (percpu_putref(cprng_fast_percpu), true)" failed: file "/home/riastradh/netbsd/current/src/sys/rump/librump/rumpkern/../../../crypto/cprng_fast/cprng_fast.c", line 117 XXX pullup-10	2023-08-05 11:39:18 +00:00
jmcneill	3f729ba586	Make aes and chacha prints debug only.	2022-11-05 17:36:33 +00:00
riastradh	2bea1ff88c	cprng_fast(9): Assert not in pserialize read section. This may sleep to take the global entropy lock in case it needs to be reseeded. If that happens we can't be in a pserialize read section.	2022-09-01 18:32:25 +00:00
riastradh	57b54f5ca4	arm/aes_neon: Fix formatting of self-test failure message. Discovered by code inspection. Remarkably, a combination of errors made this fail to be a stack buffer overrun. Verified by booting with ARMv8.0-AES disabled and with the self-test artificially made to fail.	2022-06-26 17:52:54 +00:00
riastradh	caa8fd7fab	cprng(9): cprng_fast is no longer used from interrupt context. Rip out logic to defer reseeding to softint.	2022-06-01 15:44:37 +00:00
andvar	f84252b461	fix various typos in comments and log messages.	2022-04-16 18:15:20 +00:00
msaitoh	68c4a8e200	s/folllowing/following/	2021-12-05 04:48:35 +00:00
jmcneill	129a3690a1	Upgrade self-test passed messages from verbose to debug.	2021-10-17 14:45:45 +00:00
andvar	279d5541d3	fix typos in comments.	2021-10-15 22:32:28 +00:00
gutteridge	7735a0bb91	Fix typos in comments and add missing KERNEL_RCSID	2021-09-04 00:33:09 +00:00
christos	674dc63d44	use an enum instead of constant variables so that they work in CTASSERT.	2021-04-14 21:29:57 +00:00
rin	d50adbc140	Fix build with clang for earmv7hf; loadroundkey() is used only for __aarch64__.	2020-11-21 08:09:21 +00:00
jmcneill	4a48ef14f2	Fix detection of NEON features. ID_AA64PFR0_EL1_ADV_SIMD_NONE means SIMD is not available, and any other value means it is.	2020-10-10 08:24:10 +00:00
riastradh	ea2d112d7c	aes neon: Gather mc_forward/backward so we can load 256 bits at once.	2020-09-10 11:31:03 +00:00
riastradh	3e1dd6a02d	aes neon: Hoist dsbd/dsbe address calculation out of loop.	2020-09-10 11:30:28 +00:00
riastradh	db39c37e7d	aes neon: Tweak register usage. - Call r12 by its usual name, ip. - No need for r7 or r11=fp at the moment.	2020-09-10 11:30:08 +00:00
riastradh	b5c99049d3	aes neon: Write vtbl with {qN} rather than {d(2N)-d(2N+1)}. Cosmetic; no functional change.	2020-09-10 11:29:43 +00:00
riastradh	8bfafdf5aa	aes neon: Issue 256-bit loads rather than pairs of 128-bit loads. Not sure why I didn't realize you could do this before! Saves some temporary registers that can now be allocated to shave off a few cycles.	2020-09-10 11:29:02 +00:00
riastradh	c71abd7388	aesarmv8: Reallocate registers to shave off unnecessary MOV.	2020-09-08 23:58:09 +00:00
riastradh	f70af73535	aesarmv8: Issue two 4-register ld/st, not four 2-register ld/st.	2020-09-08 23:57:43 +00:00
riastradh	ab19f80d4d	aesarmv8: Adapt aes_armv8_64.S to big-endian. Patch mainly from (and tested by) jakllsch@ with minor tweaks by me.	2020-09-08 23:57:13 +00:00
riastradh	0fc796c545	aes(9): Fix edge case in bitsliced SSE2 AES-CBC decryption. Make sure self-tests exercise this edge case. Discovered by confusion over code inspection of jak's adaptation of aes_armv8_64.S for big-endian.	2020-09-08 22:48:24 +00:00
jakllsch	3eade4a405	Acknowledge clang warning for NEON cipher code on aarch64eb. We've already made the nonportable vector initializations portable; the code works on aarch64eb.	2020-09-08 17:35:27 +00:00
jakllsch	b762c4de07	use correct condition	2020-09-08 17:17:32 +00:00
jakllsch	9cb9f9bc98	Fix vgetq_lane_u32 for aarch64eb with GCC. Fixes NEON AES on aarch64eb.	2020-09-07 18:06:13 +00:00
jakllsch	ee45e31caf	Use a working macro to detect big endian aarch64. Fixes aarch64eb NEON ChaCha.	2020-09-07 18:05:17 +00:00
maxv	60236c8c49	x86: fix several CPUID flags. - Rename: CPUID_PN -> CPUID_PSN, CPUID_CFLUSH -> CPUID_CLFSH, CPUID_SBF -> CPUID_PBE, CPUID_LZCNT -> CPUID_ABM, CPUID_P1GB -> CPUID_PAGE1GB, CPUID2_PCLMUL -> CPUID2_PCLMULQDQ, CPUID2_CID -> CPUID2_CNXTID, CPUID2_xTPR -> CPUID2_XTPR, CPUID2_AES -> CPUID2_AESNI. To match the x86 specification and the other OSes. - Remove: CPUID_B10, CPUID_B20, CPUID_IA64. They do not exist.	2020-09-05 07:45:44 +00:00
christos	1d0978b88c	Instead of returning 0 when sysctl kern.expose_address=0, return a random hashed value of the data. This allows sockstat to work without exposing kernel addresses or being setgid kmem.	2020-08-26 22:56:55 +00:00
riastradh	3a2006068f	Adjust sp, not fp, to allocate a 32-byte temporary. Costs another couple MOV instructions, but we can't skimp on this -- there's no red zone below sp for interrupts on arm, so we can't touch anything there. So just use fp to save sp and then adjust sp itself, rather than using fp as a temporary register to point just below sp. Should fix PR port-arm/55598 -- previously the ChaCha self-test failed 33/10000 trials triggered by sysctl during running system; with the patch it has failed 0/10000 trials. (Presumably it happened more often at boot time, leading to 5/26 failures in the test bed, because we just enabled interrupts and some devices are starting to deliver interrupts.)	2020-08-23 16:39:06 +00:00
riastradh	faaca7c6d4	Import small BLAKE2s implementation.	2020-08-20 21:21:05 +00:00
riastradh	a5b568d2b4	[ozaki-r] libsodium glue	2020-08-20 21:20:16 +00:00
riastradh	613921b5b8	Fix AES NEON code for big-endian softfp ARM. ...which is how the kernel runs. Switch to using __SOFTFP__ for consistency with how it gets exposed to C, although I'm not sure how to get it defined automagically in the toolchain for .S files so that's set manually in files.aesneon for now.	2020-08-16 18:02:03 +00:00
rin	0d644b585e	Add hack to compile aes_ccm_tag() with -O0 for m68k for GCC8. GCC 8 miscompiles aes_ccm_tag() for m68k with optimization level -O[12], which results in failure in aes_ccm_selftest(): | aes_ccm_selftest: tag 0: 8 bytes @ 0x4d3e38 | 03 80 5f 08 22 6f cb fe | .._."o.. | aes_ccm_selftest: verify 0 failed | ... | WARNING: module error: built-in module aes_ccm failed its MODULE_CMD_INIT, error 5 This is observed for amiga (A1200, 68060), mac68k (Quadra 840AV, 68040), and luna68k (nono, 68030 emulator). However, it is not for sun3 (TME, 68020 emulator) and sun2 (TME, 68010 emulator). At the moment, it is unclear whether this is due to differences b/w 68010-20 vs 68030-60, or something wrong with TME.	2020-08-10 06:27:29 +00:00
riastradh	062ecd5ff2	Fix some clang neon intrinsics. Compile-tested only, with -Wno-nonportable-vector-initializers. Need to address -- and test -- this stuff properly but this is progress.	2020-08-09 02:49:38 +00:00
riastradh	6e727d4c03	Use vshlq_n_s32 rather than vsliq_n_s32 with zero destination. Not sure why I reached for vsliq_n_s32 at first -- probably so I wouldn't have to deal with a new intrinsic in arm_neon.h!	2020-08-09 02:48:38 +00:00
riastradh	da4b946081	Nix outdated comment. I implemented this parallelism a couple weeks ago.	2020-08-09 02:00:57 +00:00
riastradh	43f5649092	Fix mistake in big-endian arm clang. Swapped the two halves (only gcc does that, I think) and wrote j,i backwards, oops. (I don't have a big-endian arm clang build handy to test; hoping this works.)	2020-08-09 01:59:04 +00:00
riastradh	18ff0ad8d5	Fix ARM NEON implementations of AES and ChaCha on big-endian ARM. New macros such as VQ_N_U32(a,b,c,d) for NEON vector initializers. Needed because GCC and Clang disagree on the ordering of lanes, depending on whether it's 64-bit big-endian, 32-bit big-endian, or little-endian -- and, bizarrely, both of them disagree with the architectural numbering of lanes. Experimented with using static const uint8_t x8[16] = {...}; uint8x16_t x = vld1q_u8(x8); which doesn't require knowing anything about the ordering of lanes, but this generates considerably worse code and apparently confuses GCC into not recognizing the constant value of x8. Fix some clang mistakes while here too.	2020-08-08 14:47:01 +00:00
riastradh	143bed0ba5	Issue three more swaps to save eight stores. Reduces code size and yields a small (~2%) cgd throughput boost. Remove duplicate comment while here.	2020-07-29 14:23:59 +00:00
riastradh	8748ca0e56	Rewrite cprng_fast in terms of new ChaCha API.	2020-07-28 20:15:07 +00:00
riastradh	1dd279420f	Draft 2x vectorized neon vpaes for aarch64. Gives a modest speed boost on rk3399 (Cortex-A53/A72), around 20% in cgd tests, for parallelizable operations like CBC decryption; same improvement should probably carry over to rpi4 CPU which lacks ARMv8.0-AES.	2020-07-28 20:11:09 +00:00
riastradh	7a8eb9a111	Implement 4-way vectorization of ChaCha for armv7 NEON. cgd performance is not as good as I was hoping (~4% improvement over chacha_ref.c) but it should improve substantially more if we let the cgd worker thread keep fpu state so we don't have to pay the cost of isb and zero-the-fpu on every 512-byte cgd block.	2020-07-28 20:08:48 +00:00
riastradh	783ffb04d5	Fix big-endian build with appropriate casts around vrev32q_u8.	2020-07-28 20:05:33 +00:00
riastradh	48a3032d8a	Fix typo in comment.	2020-07-28 15:42:41 +00:00
riastradh	9c455bb20f	Initialize authctr in both branches. I guess I didn't test the unaligned case, weird.	2020-07-28 14:01:35 +00:00
riastradh	3cca5606cd	Note that VSRI seems to hurt here.	2020-07-27 20:58:56 +00:00
riastradh	d4cf8df3e4	Take advantage of REV32 and TBL for 16-bit and 8-bit rotations. However, disable use of (V)TBL on armv7/aarch32 for now, because for some reason GCC spills things to the stack despite having plenty of free registers, which hurts performance more than it helps at least on ARM Cortex-A8.	2020-07-27 20:58:06 +00:00
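
Commit d4cf8df3e4 above mentions using REV32 and TBL for 16-bit and 8-bit rotations. The idea behind that can be illustrated in portable scalar C: rotating a 32-bit word by 16 is the same as swapping its 16-bit halves (which is what REV32 on 16-bit elements does within each 32-bit lane), and rotating by 8 is a fixed byte permutation (which a byte table lookup such as TBL can express). This is only a sketch of the identity, not the NetBSD code; the function names below are hypothetical and no NEON intrinsics are used.

```c
#include <stdint.h>

/* Reference: rotate a 32-bit word left by n bits, for 0 < n < 32. */
static uint32_t
rotl32(uint32_t x, unsigned n)
{
	return (x << n) | (x >> (32 - n));
}

/* Rotation by 16 is a swap of the two 16-bit halves. */
static uint32_t
rotl32_16_by_swap(uint32_t x)
{
	uint32_t lo = x & 0xffff;
	uint32_t hi = x >> 16;

	return (lo << 16) | hi;
}

/*
 * Rotation left by 8 is a fixed byte permutation: result byte i
 * (byte 0 = least significant) is taken from source byte idx[i].
 */
static uint32_t
rotl32_8_by_permute(uint32_t x)
{
	static const unsigned idx[4] = { 3, 0, 1, 2 };
	uint32_t r = 0;
	unsigned i;

	for (i = 0; i < 4; i++) {
		uint32_t byte = (x >> (8 * idx[i])) & 0xff;

		r |= byte << (8 * i);
	}
	return r;
}
```

Both helpers agree with the shift-and-or reference `rotl32` for rotation amounts of 16 and 8 respectively. Whether the permutation form is actually faster depends on the compiler and CPU, as the commit message itself notes for armv7.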


234 Commits