sys/stdarg.h and expect compiler to provide proper builtins, defaulting
to the GCC interface. lint still has a special fallback.
Reduce abuse of _BSD_VA_LIST_ by defining __va_list by default and
derive va_list as required by standards.
one without -- interestingly this didn't break the connection but just
caused a useless encapsulation
(this code needs to be rearranged to get it clean)
for an ESP tunnel, and add some fixes which make v4-in-v6 work
(v6 as inner protocol isn't ready, even v6-in-v6 can never have worked)
being here, fix a statistics counter and kill an unused variable
This is still somewhat experimental. Tested between 2 similar boxes
so far. There is much potential for performance improvement. For now,
I've changed the gmac code to accept any data alignment, as the "char *"
pointer suggests. As the code is practically used, 32-bit alignment
can be assumed, at the cost of data copies. I don't know whether
bytewise access or copies are worse performance-wise. For efficient
implementations using SSE2 instructions on x86, even stricter
alignment requirements might arise.
For this to fit, an API change in cryptosoft was adopted from OpenBSD
(addition of a "Setkey" method to hashes) which was done for GCM/GMAC
support there, so it might be useful in the future anyway.
tested against KAME IPSEC
AFAICT, FAST_IPSEC now supports as much as KAME.
into "blocksize" and "IV size"
-add an "reinit" function pointer which, if set, means that the xform
does its IV handling itself and doesn't want the default CBC handling
by the framework (poor name, but left that way to avoid unecessary
differences)
This syncs with Open/FreeBSD, purpose is to allow non-CBC transforms.
Refer to ivsize instead of blocksize where appropriate.
(At this point, blocksize and ivsize are identical.)
SADB_ACQUIRE -- this doesn't change much because racoon ignores
the proposal from the kernel anyway and applies its own configuration,
but having MD5 and SHA1 in the list but SHA2 not looks strange
as done in KAME and FAST_IPSEC after NetBSD imported the code
(The default differs: KAME uses the oldest valid SA while FAST_IPSEC
in NetBSD uses the newest one. I'm not changing this -- there is a lack
of specification and behavior can be changed with the "oldsa" sysctl.)
For incoming packets it shouldn't matter but I made it look similar
just to avoid unnecessary differences.
crypto_{new.free}session() to be called with the "crypto_mtx"
spinlock held.
This doesn't change much for now because these functions acquire
the said mutex first on entry now, but at least it keeps the nasty
locks local to the opencrypto core.
-RFC2104 says that the block size of the hash algorithm must be used
for key/ipad/opad calculations. While formerly all ciphers used a block
length of 64, SHA384 and SHA512 use 128 bytes. So we can't use the
HMAC_BLOCK_LEN constant anymore. Add a new field to "struct auth_hash"
for the per-cipher blocksize.
-Due to this, there can't be a single "CRYPTO_SHA2_HMAC" external name
anymore. Replace this by 3 for the 3 different keysizes.
This was done by Open/FreeBSD before.
-Also fix the number of authenticator bits used tor ESP and AH to
conform to RFC4868, and remove uses of AH_HMAC_HASHLEN which did
assume a fixed authenticator size of 12 bytes.
FAST_IPSEC will not interoperate with KAME IPSEC anymore if sha2 is used,
because the latter doesn't implement these standards. It should
interoperate with at least modern Free/OpenBSD now.
(I've only tested with NetBSD-current/FAST_IPSEC on both ends.)
decompression:
-seperate the IPCOMP specific rule that compression must not grow the
data from general compression semantics: Introduce a special name
CRYPTO_DEFLATE_COMP_NOGROW/comp_algo_deflate_nogrow to describe
the IPCOMP semantics and use it there. (being here, fix the check
so that equal size is considered failure as well as required by
RFC2393)
Customers of CRYPTO_DEFLATE_COMP/comp_algo_deflate now always get
deflated data back, even if they are not smaller than the original.
-allow to pass a "size hint" to the DEFLATE decompression function
which is used for the initial buffer allocation. Due to the changes
done there, additional allocations and extra copies are avoided if the
initial allocation is sufficient. Set the size hint to MCLBYTES (=2k)
in IPCOMP which should be good for many use cases.
into account that the extension header type is not in the extension
header itself but in the previous one -- this makes a difference
because (a) the length field is different for AH than for all others
and (b) the offset of the "next type" field isn't the same in primary
and extension headers.
(I didn't manage to trigger the bug in my tests, no extension headers
besides AH made it to that point. Didn't try hard enough -- the fix
is still valid.)
they are initialized -- during lifetime, no changes are expected
plus some constification of input to comparision functions etc
mostly required by the former
the refcount in the (global) policies gets decremented
(This apparently was missed when the policy cache code was copied
over from KAME IPSEC.)
From Wolfgang Stukenbrock per PR kern/44410, just fixed differently
to avoid unecessary differences to KAME.
Before, setting the IP_RAWOUTPUT flag did imply that the ip_id
(the fragmentation thing) was used as-is.
Now, a new ID is diced unless the new IP_NOIPNEWID flag is set.
The ip_id is part of the data which are used to calculate the hash
for AH, so set the IP_NOIPNEWID flag to make sure the IP header
is not modified behind AH's back. Otherwise, the recipient will detect
a checksum mismatch and discard the packet.
everywhere splsoftnet() was used before, to fix MP concurrency problems
-pull KERNEL_LOCK where ip(6)_output() is called, as this is what
the network stack (unfortunately) expects, in particular to avoid
races for packets in the interface send queues
From Wolfgang Stukenbrock per PR kern/44418, with the application
of KERNEL_LOCK to what I think are the essential points, tested
on a dual-core i386.
1) RFC2367 says in 2.3.3 Address Extension: "All non-address
information in the sockaddrs, such as sin_zero for AF_INET sockaddrs,
and sin6_flowinfo for AF_INET6 sockaddrs, MUST be zeroed out."
the IPSEC_NAT_T code was expecting the port information it needs
to be conveyed in the sockaddr instead of exclusively by
SADB_X_EXT_NAT_T_SPORT and SADB_X_EXT_NAT_T_DPORT,
and was not zeroing out the port information in the non-nat-traversal
case.
Since it was expecting the port information to reside in the sockaddr
it could get away with (re)setting the ports after starting to use them.
-> Set the natt ports before setting the SA mature.
2) RFC3947 has two Original Address fields, initiator and responder,
so we need SADB_X_EXT_NAT_T_OAI and SADB_X_EXT_NAT_T_OAR and not just
SADB_X_EXT_NAT_T_OA
The change has been created using vanhu's patch for FreeBSD as reference.
Note that establishing actual nat-t sessions has not yet been tested.
Likely fixes the following:
PR bin/41757
PR net/42592
PR net/42606
"initializing IPsec..."" done" is of somewhat limited value.
(I normally wouldn't care; but on my box the (root) uhub(4)s attach
between the first and last portion of the line.)
on the amount of physical memory and limited by NMBCLUSTERS if present.
Architectures without direct mapping also limit it based on the kmem_map
size, which is used as backing store. On i386 and ARM, the maximum KVA
used for mbuf clusters is limited to 64MB by default.
The old default limits and limits based on GATEWAY have been removed.
key_registered_sb_max is hard-wired to a value derived from 2048
clusters.
- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.
With much feedback from matt@ and plunky@.
(actually splnet) and condvars instead of tsleep/wakeup. Fix a few
miscellaneous problems and add some debugging printfs while there.
Restore set of CRYPTO_F_DONE in crypto_done() which was lost at some
point after this code came from FreeBSD -- it made it impossible to wait
properly for a condition.
Add flags analogous to the "crp" flags to the key operation's krp struct.
Add a new flag, CRYPTO_F_ONRETQ which tells us a request finished before
the kthread had a chance to dequeue it and call its callback -- this was
letting requests stick on the queues before even though done and copied
out.
Callers of crypto_newsession() or crypto_freesession() must now take the
mutex. Change netipsec to do so. Dispatch takes the mutex itself as
needed.
This was tested fairly extensively with the cryptosoft backend and lightly
with a new hardware driver. It has not been tested with FAST_IPSEC; I am
unable to ascertain whether FAST_IPSEC currently works at all in our tree.
pjd@FreeBSD.ORG, ad@NetBSD.ORG, and darran@snark.us pointed me in the
right direction several times in the course of this. Remaining bugs
are mine alone.
Only record an IPSEC_OUT_DONE tag when we have finished the processing
In ip{,6}_output, check this tag to know if we have already processed this
packet.
Remove some dead code (IPSEC_PENDING_TDB is not used in NetBSD)
Fix pr/36870
in IPsec processing that is dependent on protocol and/or port can be bypassed.
Bug report, analysis and initial fix from Karl Knutsson.
Final patch and ok from degroote@
the seq used by the request. It will improve consistency with the answer of SADB_GET
request and helps some applications which relies both on seq and pid.
Reported by Karl Knutsson by pr/36119.
socket. If we don't make an exact match, we may use a cached rule which
has lower priority than a rule that would otherwise have matched the
packet.
Code submitted by Karl Knutsson in PR/36051
parentheses in return statements.
Cosmetic: don't open-code TAILQ_FOREACH().
Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.
Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.
Constify:
1 Introduce const accessor for route->ro_dst, rtcache_getdst().
2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.
3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.
4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.
rtcache_init and rtcache_init_noclone lookup ro_dst and store
the result in ro_rt, taking care of the reference counting and
calling the domain specific route cache.
rtcache_free checks if a route was cashed and frees the reference.
rtcache_copy copies ro_dst of the given struct route, checking that
enough space is available and incrementing the reference count of the
cached rtentry if necessary.
rtcache_check validates that the cached route is still up. If it isn't,
it tries to look it up again. Afterwards ro_rt is either a valid again
or NULL.
rtcache_copy is used internally.
Adjust to callers of rtalloc/rtflush in the tree to check the sanity of
ro_dst first (if necessary). If it doesn't fit the expectations, free
the cache, otherwise check if the cached route is still valid. After
that combination, a single check for ro_rt == NULL is enough to decide
whether a new lookup needs to be done with a different ro_dst.
Make the route checking in gre stricter by repeating the loop check
after revalidation.
Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly
changed here to first validate the route and check RTF_GATEWAY
afterwards. This is sementically equivalent though.
etherip doesn't need sc_route_expire similiar to the gif changes from
dyoung@ earlier.
Based on the earlier patch from dyoung@, reviewed and discussed with
him.
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).
Stale IPv6 and ISO route caches will be treated by separate patches.
Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.
Here are the details:
Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.
Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.
Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.
In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.
In gif(4), discard the workaround for stale caches that involves
expiring them every so often.
Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).
Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.
Take care when moving a 'struct route' to rtflush() the source and
to rtcache() the destination.
In domain initializers, use .dom_xxx tags.
KNF here and there.
missing conversion spotted by Geoff Wing
XXX This code need to be checked whether UTC time
is really the right abstraction. I suspect uptime
would be the correct time scale for measuring life times.
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html
read-only. This can actually happen if the packet was received by the
xennet interface, see PR kern/33162. Change it to m_copyback_cow.
AH and IPCOMP probably need similar fixes.
Requested by Jeff Rizzo, tested on Xen with -current by him.
net.inet.ipsec.test_replay - When set to 1, IPsec will send packets with
the same sequence number. This allows to verify if the other side
has proper replay attacks detection.
net.inet.ipsec.test_integrity - When set 1, IPsec will send packets with
corrupted HMAC. This allows to verify if the other side properly
detects modified packets.
(a message will be printed indicating when these sysctls changed)
By Pawel Jakub Dawidek <pjd@FreeBSD.org>.
Discussed with Christos Zoulas and Jonathan Stone.