Not a premature optimization after all -- this is necessary because
entropy_request can run in softint context, where the cv_wait_sig in
rnd_lock_sources is forbidden. Need to do this another way.
This was a premature optimization that turned out to be bogus. It's
not harmful to request more than we need from drivers, so let's not
go out of our way to avoid that.
This avoids a race with a concurrent ualea_get updating sc_needed,
which could lead to hang when requesting more entropy.
ualea(4) now survives
sysctl -w kern.entropy.depletion=1
cat </dev/random >/dev/null &
cat </dev/random >/dev/null &
without hanging for longer (even if yanked and reinserted in the
middle, although the detach path is not relevant to the bug this
change fixes).
The consolidation xcall can preempt entropy_enter, between when it
unlocks the per-CPU state and when it calls entropy_account_cpu, with
the effect of setting ec->ec_pending=0.
Previously this was impossible because we called entropy_account_cpu
with the per-CPU state still locked, but that doesn't work now that
the global entropy lock is an adaptive lock which might sleep which
is forbidden while the per-CPU state is locked.
This was previously called with the per-CPU state locked, which
worked fine as long as the global entropy lock was a spin lock so
acquiring it would never sleep. Now it's an adaptive lock, so it's
not safe to take with the per-CPU state lock -- but we still need to
prevent reentrant access to the per-CPU entropy pool by interrupt
handlers while we're extracting from it. So now the logic for
entering a sample is:
- lock per-CPU state
- entpool_enter
- unlock per-CPU state
- if anything pending on this CPU and it's time to consolidate:
- lock global entropy state
- lock per-CPU state
- transfer
- unlock per-CPU state
- unlock global entropy state
- Avoid going into a loop in case the transfer fails repeatedly --
just give up immediately if it fails.
- Assert result size is reasonable; no need to assume usbdi(9) is
malicious. If it can return ux_actlen > ux_length, that's a bug in
usbdi(9) that we should fix.
- Set sc_needed before aborting the pipe to prevent the xfer callback
from rescheduling itself.
- Make sure all paths out of the xfer callback clear sc_inflight.
While here, use device_printf instead of aprint_* after attach.
Now my system survives repeated insertion and yanking of ualea(4)
during:
sysctl -w kern.entropy.depletion=1
cat </dev/random >/dev/null
There may be a callback in flight from an xfer that has already been
taken off the queue by the time usbd_ar_pipe gets to it. We must
guarantee that even that callback has completed before returning
control to the caller.
Changing the global entropy lock from IPL_VM to IPL_SOFTSERIAL meant
it went from being a spin lock, which blocks preemption, to being an
adaptive lock, which might sleep -- and allow other threads to run
concurrently with the softint, even if those threads have softints
blocked with splsoftserial.
This manifested as KASSERT(!ec->ec_locked) triggering in
entropy_consolidate_xc -- presumably entropy_softintr slept on the
global entropy lock while holding the per-CPU state locked with
ec->ec_locked, and then entropy_consolidate_xc ran.
Instead, to protect access to the per-CPU state without taking a
global lock, defer entropy_account_cpu until after ec->ec_locked is
cleared. This way, we never sleep while holding ec->ec_locked, nor
do we incur any contention on shared memory when entering entropy
unless we're about to distribute it. To verify this, sprinkle in
assertions that curlwp->l_ncsw hasn't changed by the time we release
ec->ec_locked.
Syzbot found a way to see ud_cdesc=NULL but ud_ifaces!=NULL:
https://syzkaller.appspot.com/bug?id=e6d4449a128e73a9a88100a5cc833e5cae9fecae
Maybe it's a race with two threads somehow doing usbd_free_device at
the same time when only one should, but let's rule this case out
early on to make it easier to prove it has to be a race.
If MAKEVERBOSE > 2, each shell command from a make target is echoed.
This resulted in two additional words ending up in the variable
_POSTINSTALL. Noticed by Brad Harder.
Before:
$ make -v _POSTINSTALL MAKEVERBOSE=3
echo .../usr.sbin/postinstall .../usr.sbin/postinstall/postinstall ...
After:
$ make -v _POSTINSTALL MAKEVERBOSE=3
.../usr.sbin/postinstall/postinstall ...
Now that the rnd(9) API guarantees serial callbacks, we can simplify
everything a bit more.
(Some drivers like hifn(4) and sun8icrypto(4) still use locks to
coordinate with other parts of the driver to submit requests to and
process responses from the device.)
This simplifies the rndsource API -- no need to lock, unless you're
also coordinating with other driver logic like concurrent
opencrypto(4) requests that share device requests.
It looks like the original motivation for deferring to
config_interrupts was to wait until softint_establish worked. But
this no longer needs to use softints to deliver the entropy, so
that's moot.
Doing this synchronously gives us a better chance for more entropy
earlier.
These no longer ever run from hard interrupt context or with a spin
lock held, so there is no longer any need to have them at IPL_VM to
block hard interrupts. Instead, lower them to IPL_SOFTSERIAL.
This only ever reads from a single device register, so no need to
serialize access.
XXX This should really have a hardware-specific health test, but I
can't find any documentation on the underlying physical entropy
source.
We only ever read a single register at a time; no exclusive access or
serialization needed.
XXX This driver should have some kind of hardware-specific health
test -- is there documentation anywhere for what this RNG actually
is?
This incorrectly rejected the configuration as invalid if any
descriptor is not large enough to be interface descriptors.
Instead, it should reject the configuration only if any descriptor is
not large enough to be a _descriptor_, or if any interface-type
descriptor is not large enough to be an interface descriptor, but
skip over descriptors of other types even if they're smaller than
interface descriptors.
Candidate fix for PR kern/56762.
Just need to wait until softint_establish and high-priority xcalls
will work, no later than that. Doing this earlier gives us slightly
more of a chance to ensure cprng_fast and ssp get entropy from
hardware RNG devices that rely on interrupts.
If the self-test fails, disable everything else at boot -- don't just
leave it to the operator to notice and do something.
This way we get entropy earlier at boot, before threads start and
before the first things in the kernel that draw from it (cprng fast
init, ssp init).
Previously this was attached as RND_TYPE_UNKNOWN, at a time when the
kernel assumed _any_ RNG-type rndsource produced independent uniform
random bits and subjected it to automatic tests that would fail with
high probability for many other distributions. But sun8icrypto(4) is
very nonuniform (probably yields consecutive samples of a ring
oscillator, which are very much not independent).
Now the kernel no longer makes this assumption, so it is valid to
label this as what it is -- a hardware RNG. We should ideally still
have better information from the vendor about what's going on under
the hood before enabling nonzero entropy for it. But at least we can
label its type accurately.
No need to block interrupts while we're going through all the data
structures -- only need to block interrupts for the handoff from
interrupt handler to lower-priority logic.
The syscall only guarantees up to 256 bytes in a single go -- if
interrupted, it might return short, but if the caller requested at
least 256 bytes it will definitely return 256 bytes.
This is no longer ever taken in hard interrupt context, so there's no
longer any need to block interrupts while doing crypto operations on
the global entropy pool.
Otherwise, there is a window during which interrupts are running, but
the softint is not, so if many interrupts queue (low-entropy) samples
early at boot, they might get dropped on the floor. This could
happen, for instance, with a PCI RNG like ubsec(4) or hifn(4) which
requests entropy and processes it in its own hard interrupt handler.
This way, we never take the global entropy lock from interrupt
handlers (no interrupts while cold), so the global entropy lock need
not block interrupts.
There's an annoying ordering issue here: softint_establish doesn't
work until after CPUs have been detected, which happens inside
configure(), which is also what enables interrupts. So we have no
opportunity to softint_establish the entropy softint _before_
interrupts are enabled.
To work around this, we have to put a conditional into the interrupt
path, and go out of our way to process any queued samples after
establishing the softint. If we just made softint_establish work
early, like percpu_create does now, this problem would go away and we
could delete a bit of logic here.
Candidate fix for PR kern/56730.
This way we get a full lockdebug dump when LOCKDEBUG is enabled,
instead of just the panic message (which includes the lock address
you could pass to `show lock' in ddb, but let's get the dump by
default even if you don't enter ddb).
Also in the KASSERT print the mutex.