abd userland, as proposed on tech-security, with explicit_bzero using
a volatile function pointer as suggested by Alan Barrett.
Both do what the name says. For userland, both are prefixed by "__"
to keep them out of the user namespace.
Change some memset/memcmp uses to the new functions where it makes
sense -- these are just some examples, more to come.
arc4random() hacks in rump with stubs that call the host arc4random() to
get numbers that are hopefully actually random (arc4random() keyed with
stack junk is not). This should fix some of the currently failing anita
tests -- we should no longer generate duplicate "random" MAC addresses in
the test environment.
<20111022023242.BA26F14A158@mail.netbsd.org>. This change includes
the following:
An initial cleanup and minor reorganization of the entropy pool
code in sys/dev/rnd.c and sys/dev/rndpool.c. Several bugs are
fixed. Some effort is made to accumulate entropy more quickly at
boot time.
A generic interface, "rndsink", is added, for stream generators to
request that they be re-keyed with good quality entropy from the pool
as soon as it is available.
The arc4random()/arc4randbytes() implementation in libkern is
adjusted to use the rndsink interface for rekeying, which helps
address the problem of low-quality keys at boot time.
An implementation of the FIPS 140-2 statistical tests for random
number generator quality is provided (libkern/rngtest.c). This
is based on Greg Rose's implementation from Qualcomm.
A new random stream generator, nist_ctr_drbg, is provided. It is
based on an implementation of the NIST SP800-90 CTR_DRBG by
Henric Jungheim. This generator users AES in a modified counter
mode to generate a backtracking-resistant random stream.
An abstraction layer, "cprng", is provided for in-kernel consumers
of randomness. The arc4random/arc4randbytes API is deprecated for
in-kernel use. It is replaced by "cprng_strong". The current
cprng_fast implementation wraps the existing arc4random
implementation. The current cprng_strong implementation wraps the
new CTR_DRBG implementation. Both interfaces are rekeyed from
the entropy pool automatically at intervals justifiable from best
current cryptographic practice.
In some quick tests, cprng_fast() is about the same speed as
the old arc4randbytes(), and cprng_strong() is about 20% faster
than rnd_extract_data(). Performance is expected to improve.
The AES code in src/crypto/rijndael is no longer an optional
kernel component, as it is required by cprng_strong, which is
not an optional kernel component.
The entropy pool output is subjected to the rngtest tests at
startup time; if it fails, the system will reboot. There is
approximately a 3/10000 chance of a false positive from these
tests. Entropy pool _input_ from hardware random numbers is
subjected to the rngtest tests at attach time, as well as the
FIPS continuous-output test, to detect bad or stuck hardware
RNGs; if any are detected, they are detached, but the system
continues to run.
A problem with rndctl(8) is fixed -- datastructures with
pointers in arrays are no longer passed to userspace (this
was not a security problem, but rather a major issue for
compat32). A new kernel will require a new rndctl.
The sysctl kern.arandom() and kern.urandom() nodes are hooked
up to the new generators, but the /dev/*random pseudodevices
are not, yet.
Manual pages for the new kernel interfaces are forthcoming.
and can not disappear -- no need to hold crypto_mtx to check the
driver list
(the whole check is questionable)
-crp->crp_cv (the condition variable) is used by userland cryptodev
exclusively -- move its initialization there, no need to waste
cycles of in-kernel callers
-add a comment which members of "struct cryptop" are used
by opencrypto(9) and which by crypto(4)
(this should be split, no need to waste memory for in-kernel callers)
This is still somewhat experimental. Tested between 2 similar boxes
so far. There is much potential for performance improvement. For now,
I've changed the gmac code to accept any data alignment, as the "char *"
pointer suggests. As the code is practically used, 32-bit alignment
can be assumed, at the cost of data copies. I don't know whether
bytewise access or copies are worse performance-wise. For efficient
implementations using SSE2 instructions on x86, even stricter
alignment requirements might arise.
For this to fit, an API change in cryptosoft was adopted from OpenBSD
(addition of a "Setkey" method to hashes) which was done for GCM/GMAC
support there, so it might be useful in the future anyway.
tested against KAME IPSEC
AFAICT, FAST_IPSEC now supports as much as KAME.
implementation thing) from the abstract xform descriptor to
the cryptosoft implementation part -- for sanity, and now clients
of opencrypto don't depend on headers of cipher implementations anymore
instead of arc4random(). AES-CTR is sensitive against IV recurrence
(with the same key / nonce), and a random number doesn't give that
guarantee.
This needs a little API change in cryptosoft -- I've suggested it to
Open/FreeBSD, might change it depending on feedback.
Thanks to Steven Bellovin for hints.
anywhere afaics
(The confusion comes probably from use of arc4random() at various places,
but this lives in libkern and doesn't share code with the former.)
-g/c non-implementation of arc4 encryption in swcrypto(4)
-remove special casing of ARC4 in crypto(4) -- the point is that it
doesn't use an IV, and this fact is made explicit by the new "ivsize"
property of xforms
into "blocksize" and "IV size"
-add an "reinit" function pointer which, if set, means that the xform
does its IV handling itself and doesn't want the default CBC handling
by the framework (poor name, but left that way to avoid unecessary
differences)
This syncs with Open/FreeBSD, purpose is to allow non-CBC transforms.
Refer to ivsize instead of blocksize where appropriate.
(At this point, blocksize and ivsize are identical.)
that only 96 random bits were used for IV generation,
this caused eg that the last 4 bytes of the IV in ESP/AES-CBC
were constant, leaking kernel memory
affects FAST_IPSEC only
the incoming and outgoing request queues (which can be dealt with
by hardware accelerators) and an adaptive lock for "all the rest"
(mostly driver configuration, but also some unrelated stuff in
cryptodev.c which should be revisited)
The latter one seems to be uneeded at many places, but for now I've
done simple replacements only, except minor fixes (where
softint_schedule() was called without the lock held)
crypto_{new.free}session() to be called with the "crypto_mtx"
spinlock held.
This doesn't change much for now because these functions acquire
the said mutex first on entry now, but at least it keeps the nasty
locks local to the opencrypto core.
work the same way, grow output buffer exponentially and kill
reallocation of metadata
-minor cleanup, make definitions private which are implementation
details of deflate.gzip
-RFC2104 says that the block size of the hash algorithm must be used
for key/ipad/opad calculations. While formerly all ciphers used a block
length of 64, SHA384 and SHA512 use 128 bytes. So we can't use the
HMAC_BLOCK_LEN constant anymore. Add a new field to "struct auth_hash"
for the per-cipher blocksize.
-Due to this, there can't be a single "CRYPTO_SHA2_HMAC" external name
anymore. Replace this by 3 for the 3 different keysizes.
This was done by Open/FreeBSD before.
-Also fix the number of authenticator bits used tor ESP and AH to
conform to RFC4868, and remove uses of AH_HMAC_HASHLEN which did
assume a fixed authenticator size of 12 bytes.
FAST_IPSEC will not interoperate with KAME IPSEC anymore if sha2 is used,
because the latter doesn't implement these standards. It should
interoperate with at least modern Free/OpenBSD now.
(I've only tested with NetBSD-current/FAST_IPSEC on both ends.)
decompression:
-seperate the IPCOMP specific rule that compression must not grow the
data from general compression semantics: Introduce a special name
CRYPTO_DEFLATE_COMP_NOGROW/comp_algo_deflate_nogrow to describe
the IPCOMP semantics and use it there. (being here, fix the check
so that equal size is considered failure as well as required by
RFC2393)
Customers of CRYPTO_DEFLATE_COMP/comp_algo_deflate now always get
deflated data back, even if they are not smaller than the original.
-allow to pass a "size hint" to the DEFLATE decompression function
which is used for the initial buffer allocation. Due to the changes
done there, additional allocations and extra copies are avoided if the
initial allocation is sufficient. Set the size hint to MCLBYTES (=2k)
in IPCOMP which should be good for many use cases.
This case can be triggered from userland cryptodev if the buffer
for decompressed data is too small.
(It would look cleaner if the lengths would be passed explicitely
everywhere, but that would thwart the abstraction done by COPYDATA/COPYBACK
which allows to treat mbufs and iovs the same way.)
-use exponentially growing buffer sizes instead of just linear extension
-drop the dynamic allocation of buffer metadata introduced in rev.1.8 --
if the initial array is not sufficient something is wrong
-apply some (arbitrary, heuristic) limit so that compressed data
which extract into insane amounts of constant data don't kill the system
This addresses PR kern/36864 by Wolfgang Stukenbrock. Some tuning
might be useful, but hopefully this is an improvement already.
a successful decompression in rare cases. A necessary but not sufficient
condition seems to be that the decompressed data end exactly at the end
of an allocated output buffer. (I can reproduce this reliably with
a userland program built against kernel zlib. Userland libz is much
newer and not affected.)
Since kernel zlib is based on an old version and heavily modified, I don't
dare to touch it. So catch this case in the wrapper.
Being here, reorder deflate/inflate error handling and add comments
to make understandable what is tested and why.
the DEFLATE complssion/decompression result is within a single
buffer already
-simplify bookkeeping of allocated buffers (and don't waste the
last member of the metadata array)
from Wolfgang Stukenbrock per PR kern/36865 (with some cleanup
of error handling by me)
The Gzip compression case can be improved too, but for now I've applied
the buffer bookkeeping changes.
tested with IP4 IPCOMP
the data chunk is the final one, which makes that zlib issues the
proper termination marker
(KAME IPSEC does this, but doesn't check eagerly in the receive
path, so the missing termination didn't cause problems so far)
closes my PR kern/44539
being here, replace the Z_PARTIAL_FLUSH flag which is marked
deprecated by zlib by Z_SYNC_FLUSH in the decompression path
(tested with IPv4 IPCOMP on i386)
can be shared by multiple threads -- pass them on the stack instead.
Add some "const" to document this. (One _could_ use the session struct
for temporary stuff with proper locking, but it seems unnecessary here.)
Also remove the unused SW_crc member in the session struct.
From Wolfgang Stukenbrock per PR kern/44472.
Improve CRYPTO_DEBUG printing a bit:
print pointers with %p
print unsigned with %u rather than %d
use CRYPTO_SESID2LID instead of just casting to uint32_t
read/write/accept, then the expectation is that the blocked thread will
exit and the close complete.
Since only one fd is affected, but many fd can refer to the same file,
the close code can only request the fs code unblock with ERESTART.
Fixed for pipes and sockets, ERESTART will only be generated after such
a close - so there should be no change for other programs.
Also rename fo_abort() to fo_restart() (this used to be fo_drain()).
Fixes PR/26567