indeed quicker, it nonetheless often failed to fill all of the
requested memory with the specified value when a non-aligned start
address was used.
- Remove memory barriers from the atomic ops. I don't understand why
they are there. Is it an architectural requirement, a workaround for a
CPU bug, or just over-caution? They're not needed for correctness.
- Have unlikely conditional branches go forwards to help the static branch
predictor.
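As an illustration of the idea (not the committed change itself): most
static predictors assume forward conditional branches are not taken and
backward ones are taken, so marking the rare case with NetBSD's
__predict_false() from <sys/cdefs.h> lets the compiler lay it out as a
forward branch off the hot path. The function below is a made-up example.

    #include <sys/cdefs.h>
    #include <errno.h>
    #include <stddef.h>

    /* Made-up example; the point is only the branch layout. */
    static int
    copy_nonempty(char *dst, const char *src, size_t len)
    {
            /* Rare case: forward branch, statically predicted not-taken. */
            if (__predict_false(len == 0))
                    return EINVAL;
            /* Common case: straight-line code; the loop's backward branch
             * is predicted taken. */
            while (len--)
                    *dst++ = *src++;
            return 0;
    }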
==> Provide a much more complete set of setters and getters for different
value types in the prop_array_util(3) and prop_dictionary_util(3)
functions.
==> Overhaul the prop_data(3), prop_number(3), and prop_string(3) APIs
to be easier to use and less awkwardly named. Deprecate the old
awkward names, and produce link-time warnings when they are referenced.
==> Deprecate mutable prop_data(3) and prop_string(3) objects. The old
APIs that support them still exist, but will now produce link-time
warnings when used.
==> When the new prop_string(3) API is used, strings are internally
de-duplicated as a memory footprint optimization.
==> Provide a rich set of bounds-checked getter functions and a
corresponding set of convenience setters in the prop_number(3) API
(see the sketch after this list).
==> Add a new prop_bool_value(3) function that is equivalent to
prop_bool_true(3), but aligned with the new "value" routines in
prop_data(3), prop_string(3), and prop_number(3).
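A rough sketch of how the expanded setters and getters fit together.
The keys are invented, and the exact function names should be checked
against prop_dictionary_util(3); treat the ones used here as
assumptions rather than a definitive reference.

    #include <prop/proplib.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Sketch only: "device-name" and "mtu" are made-up keys, and the
     * getter/setter names are assumed from prop_dictionary_util(3). */
    static void
    example(void)
    {
            prop_dictionary_t dict = prop_dictionary_create();
            const char *name;
            uint32_t mtu;

            prop_dictionary_set_string(dict, "device-name", "wm0");
            prop_dictionary_set_uint32(dict, "mtu", 1500);

            if (prop_dictionary_get_string(dict, "device-name", &name) &&
                prop_dictionary_get_uint32(dict, "mtu", &mtu))
                    printf("%s mtu %u\n", name, mtu);

            prop_object_release(dict);
    }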
This is intended for 68060:
- GCC does not emit __muldi3() for 68020-40, which have the 32 * 32 --> 64 mulul instruction
- mulsl (and moveml), used in this code, are not implemented on the 68010
Compared with the compiler_rt version, this one saves:
- 12% of processing time
- 12 bytes of stack
- 50 bytes of code size
It is also slightly faster, lighter on memory, and smaller than the
libgcc version.
Examination with evcnt(9) shows that __muldi3() is invoked by the
kernel more than 1000 times per second, which should justify
introducing an assembler version of this function.
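For reference, the decomposition such a routine implements can be
sketched in portable C (this is just the generic algorithm, not the
committed m68k assembly): a 64 x 64 multiply modulo 2^64 is built from
the 32 * 32 --> 64 partial products that mulul computes in a single
instruction on 68020-40.

    #include <stdint.h>

    /* 64 x 64 -> 64 (mod 2^64) from 32 x 32 -> 64 partial products. */
    static uint64_t
    muldi3_sketch(uint64_t a, uint64_t b)
    {
            uint32_t alo = (uint32_t)a, ahi = (uint32_t)(a >> 32);
            uint32_t blo = (uint32_t)b, bhi = (uint32_t)(b >> 32);

            /* ahi * bhi only affects bits >= 64, so it is dropped. */
            return (uint64_t)alo * blo +
                (((uint64_t)alo * bhi + (uint64_t)ahi * blo) << 32);
    }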
Primary goals:
1. Use cryptography primitives designed and vetted by cryptographers.
2. Be honest about entropy estimation.
3. Propagate full entropy as soon as possible.
4. Simplify the APIs.
5. Reduce overhead of rnd_add_data and cprng_strong.
6. Reduce side channels of HWRNG data and human input sources.
7. Improve visibility of operation with sysctl and event counters.
Caveat: rngtest is no longer used generically for RND_TYPE_RNG
rndsources. Hardware RNG devices should have hardware-specific
health tests. For example, checking for two repeated 256-bit outputs
works to detect AMD's 2019 RDRAND bug. Not all hardware RNGs are
necessarily designed to produce exactly uniform output.
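A sketch of the kind of repeated-output check meant here; the function
name and buffering are illustrative, not an existing driver interface.

    #include <stdbool.h>
    #include <stdint.h>
    #include <string.h>

    #define RNG_OUTPUT_BYTES 32     /* one 256-bit output */

    /* Illustrative only: reject an output that exactly repeats the
     * previous one, as a source with AMD's 2019 RDRAND bug would. */
    static bool
    rng_output_ok(const uint8_t buf[RNG_OUTPUT_BYTES])
    {
            static uint8_t prev[RNG_OUTPUT_BYTES];
            static bool have_prev;
            bool ok = !have_prev ||
                memcmp(buf, prev, RNG_OUTPUT_BYTES) != 0;

            memcpy(prev, buf, RNG_OUTPUT_BYTES);
            have_prev = true;
            return ok;
    }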
ENTROPY POOL
- A Keccak sponge, with test vectors, replaces the old LFSR/SHA-1
kludge as the cryptographic primitive.
- `Entropy depletion' is available for testing purposes with a sysctl
knob kern.entropy.depletion; otherwise it is disabled, and once the
system reaches full entropy it is assumed to stay there as far as
modern cryptography is concerned.
- No `entropy estimation' based on sample values. Such `entropy
estimation' is a contradiction in terms, dishonest to users, and a
potential source of side channels. It is the responsibility of the
driver author to study the entropy of the process that generates
the samples.
- Per-CPU gathering pools avoid contention on a global queue.
- Entropy is occasionally consolidated into the global pool -- as soon as
it's ready, if we've never reached full entropy, and with a rate
limit afterward. Operators can force immediate consolidation by running
sysctl -w kern.entropy.consolidate=1.
- rndsink(9) API has been replaced by an epoch counter which changes
whenever entropy is consolidated into the global pool.
. Usage: Cache entropy_epoch() when you seed. If entropy_epoch()
has changed when you're about to use whatever you seeded, reseed
(see the sketch at the end of this list).
. Epoch is never zero, so initialize cache to 0 if you want to reseed
on first use.
. Epoch is -1 iff we have never reached full entropy -- in other
words, the old rnd_initial_entropy is (entropy_epoch() != -1) --
but it is better if you check for changes rather than for -1, so
that if the system estimated its own entropy incorrectly, entropy
consolidation has the opportunity to prevent future compromise.
- Sysctls and event counters provide operator visibility into what's
happening:
. kern.entropy.needed - bits of entropy short of full entropy
. kern.entropy.pending - bits known to be pending in per-CPU pools,
can be consolidated with sysctl -w kern.entropy.consolidate=1
. kern.entropy.epoch - number of times consolidation has happened,
never 0, and -1 iff we have never reached full entropy
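The epoch usage pattern described above, as a sketch. entropy_epoch()
is the interface named in these notes; the header name, the generator,
and the reseed function here are placeholders.

    #include <sys/entropy.h>        /* assumed home of entropy_epoch() */

    static unsigned int my_seed_epoch;  /* 0: never seeded, so first use reseeds */

    static void
    my_reseed(void)         /* placeholder: reseed whatever you seeded */
    {
            /* ... draw fresh seed material here ... */
            my_seed_epoch = entropy_epoch();
    }

    static void
    my_use(void)
    {
            if (entropy_epoch() != my_seed_epoch)
                    my_reseed();
            /* ... use the seeded state ... */
    }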
CPRNG_STRONG
- A cprng_strong instance is now a collection of per-CPU NIST
Hash_DRBGs. There are only two in the system: user_cprng for
/dev/urandom and sysctl kern.?random, and kern_cprng for kernel
users which may need to operate in interrupt context up to IPL_VM.
(Calling cprng_strong in interrupt context does not strike me as a
particularly good idea, so I added an event counter to see whether
anything actually does.)
- Event counters provide operator visibility into when reseeding
happens.
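A sketch of drawing from the kernel instance described above, via the
existing cprng(9) call; a flags argument of 0 is assumed to be the
right choice outside interrupt context.

    #include <sys/types.h>
    #include <sys/cprng.h>

    /* Sketch: pull a random 64-bit value from the kernel-wide kern_cprng. */
    static uint64_t
    random_u64(void)
    {
            uint64_t x;

            cprng_strong(kern_cprng, &x, sizeof(x), 0);
            return x;
    }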
INTEL RDRAND/RDSEED, VIA C3 RNG (CPU_RNG)
- Unwired for now; will be rewired in a subsequent commit.