the seq used by the request. It will improve consistency with the answer of SADB_GET
request and helps some applications which relies both on seq and pid.
Reported by Karl Knutsson by pr/36119.
socket. If we don't make an exact match, we may use a cached rule which
has lower priority than a rule that would otherwise have matched the
packet.
Code submitted by Karl Knutsson in PR/36051
parentheses in return statements.
Cosmetic: don't open-code TAILQ_FOREACH().
Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.
Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.
Constify:
1 Introduce const accessor for route->ro_dst, rtcache_getdst().
2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.
3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.
4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.
rtcache_init and rtcache_init_noclone lookup ro_dst and store
the result in ro_rt, taking care of the reference counting and
calling the domain specific route cache.
rtcache_free checks if a route was cashed and frees the reference.
rtcache_copy copies ro_dst of the given struct route, checking that
enough space is available and incrementing the reference count of the
cached rtentry if necessary.
rtcache_check validates that the cached route is still up. If it isn't,
it tries to look it up again. Afterwards ro_rt is either a valid again
or NULL.
rtcache_copy is used internally.
Adjust to callers of rtalloc/rtflush in the tree to check the sanity of
ro_dst first (if necessary). If it doesn't fit the expectations, free
the cache, otherwise check if the cached route is still valid. After
that combination, a single check for ro_rt == NULL is enough to decide
whether a new lookup needs to be done with a different ro_dst.
Make the route checking in gre stricter by repeating the loop check
after revalidation.
Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly
changed here to first validate the route and check RTF_GATEWAY
afterwards. This is sementically equivalent though.
etherip doesn't need sc_route_expire similiar to the gif changes from
dyoung@ earlier.
Based on the earlier patch from dyoung@, reviewed and discussed with
him.
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).
Stale IPv6 and ISO route caches will be treated by separate patches.
Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.
Here are the details:
Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.
Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.
Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.
In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.
In gif(4), discard the workaround for stale caches that involves
expiring them every so often.
Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).
Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.
Take care when moving a 'struct route' to rtflush() the source and
to rtcache() the destination.
In domain initializers, use .dom_xxx tags.
KNF here and there.
missing conversion spotted by Geoff Wing
XXX This code need to be checked whether UTC time
is really the right abstraction. I suspect uptime
would be the correct time scale for measuring life times.
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html
read-only. This can actually happen if the packet was received by the
xennet interface, see PR kern/33162. Change it to m_copyback_cow.
AH and IPCOMP probably need similar fixes.
Requested by Jeff Rizzo, tested on Xen with -current by him.
net.inet.ipsec.test_replay - When set to 1, IPsec will send packets with
the same sequence number. This allows to verify if the other side
has proper replay attacks detection.
net.inet.ipsec.test_integrity - When set 1, IPsec will send packets with
corrupted HMAC. This allows to verify if the other side properly
detects modified packets.
(a message will be printed indicating when these sysctls changed)
By Pawel Jakub Dawidek <pjd@FreeBSD.org>.
Discussed with Christos Zoulas and Jonathan Stone.
The __UNCONST macro is now used only where necessary and the RW macros
are gone. Most of the changes here are consumers of the
sysctl_createv(9) interface that now takes a pair of const pointers
which used not to be.
chgsbsize() as mentioned in the PR can be called from an interrupt context
via tcp_close(). Avoid calling uid_find() in chgsbsize().
- Instead of storing so_uid in struct socketvar, store *so_uidinfo
- Add a simple lock to struct uidinfo.
NAT-T changes. Matches changes to reference non-nonexistent structs in
sys/netkey.
I have no clue if this is correct, but it matches the style in
sys/netkey, and (unlike the previous two revisions) it actually compiles...
Rework usr.bin/netstat/fast_ipsec.c to find the stats nodes under the
new names (Kame uses the name stats so we use different ones), as well
as setting slen appropriately between calls to sysctlbyname(), and
providing forward compatibility when actually retrieving stats via
sysctlbyname().
And correct a spelling error.
to ``registered'' sockets -- be treated ``specially'', as suggested
by RFC-2367.
The "special" treatment sys/netipsec now gives such messages is that
we use sbappendaddrchain() to deliver the (single) kernel-generated
message to each registered PF_KEY socket, with an sbprio argument of
SB_PRIO_BESTEFFORT, thus by-passing
For now, we check for registered messages, set a local `sbprio'
argument, and call sbappendaddrchain() (as opposed to sbappendaddr())
if and only if sbprio is non-NULL. As noted, we can rework
key_sendup_mbuf(), and all its callers, to pass the sbprio argument;
pending consensus (and hopeful KAME buy-back).
because the sysctl() code wasn't setting the requestor-pid field in dump
responses, the reworked unicast dump wasn't setting the requestor pid, either.
More exaclty, the pid field was set to 0.
No problem for setkey(8), but racoon reportedly ignores SADB dump-responses
with any pid (including 0) which doesn't match its own pid. A private bug
report says the 0-valued pid field broke racoon code which attempts to recover
from death of a prior racoon process, by dumping the SADB at startup.
Fix by revising sys/netipsec, so that both the new unicast PF_KEY dump
responses and the sysctl code set the requestor pid field in all
response mesages to DUMP requests.
Introduce new socket-layer function sbappendaddrchain() to
sys/kern/uipc_socket2.c: like sbappendaddr(), only takes a chain of
records and appends the entire chain in one pass. sbappendaddrchain()
also takes an `sbprio' argument, which indicates the caller requires
special `reliable' handling of the socket-buffer. `sbprio' is
described in sys/sys/socketvar.h, although (for now) the different
levels are not yet implemented.
Rework sys/netipsec/key.c PF_KEY DUMP responses to build a chain of
mbuf records, one record per dump response. Unicast the entire chain
to the requestor, with all-or-none semantics.
Changed files;
sys/socketvar.h kern/uipc_socket2.c netipsec/key.c
Reviewed by:
Jason Thorpe, Thor Lancelot Simon, post to tech-kern.
Todo: request pullup to 2.0 branch. Post-2.0, rework sysctl() API for
dumps to use new record-chain constructors. Actually implement
the distinct service levels in sbappendaddrchain() so we can use them
to make PF_KEY ACQUIRE messages more reliable.
KAME sys/netkey/key.c rev 1.119 ke_sp_unlink()/key_sp_dead() logic.
I have been running a similar version for about 10 days now, and it
fixes the PCB-cache refcount problems for me.
Checked in as a candidate for pullup to the 2.0 branch.
key_prefered_oldsa, defaulted to 1 (on): preferring old SAs, based on
the ill-concieved Jenkins I-D, is broken by design. For now, just
turn it off, as the simplest way to fix this in the 2.0 branch.
Next step is to rip it out entirely: it was always a bad idea.
August 2003:
On NetBSD, when we get to ah_massage_headers(), ip->ip_len is in
network byte order and includes all bytes in the input packet.
Therefore we don't need to byte-swap it or to add `skip' back in,
before verifying the receive-side hash.
With this change, AH transport mode works against FreeBSD 4.9 fast-ipsec
(which also works against Win2k, &c., &c.).
FAST_IPSEC headers (with declarations of stats structures) in
userspace code. I haven't checked for strict POSIX conformance, but
Sam Leffler's FreeBS `ipsecstats' tool will now compile, if you
manually make and populate usr/include/sys/netipsec.
Committed as-is for Andrew Brown to check more of the sys/netipsec sysctls.
#1. Fix an off-by-one error in sysctl_net_key_dumpsa(), which was
passing sysctl argument name[1] to a helper. According to Andrew
Brown's revised dynamic sysctl schmea, it must instead pass name[0].
2. There is a naming glitch in using sysctl() for setkey(8): setkey
queries the same sysctl MIB numbers to dump IPsec database state,
irrepesctive of the underlying IPsec is KAME or FAST_IPSEC.
For this to work as expected, sys/netipsec must export net.key.dumpsa
and net.key.dumpsp via the identical MIB numbers used by sys/netkey.
``Make it so''. For now, renumber the sys/netipsec/key.c nodes;
post-2.0 we can use sysctl aliases.
3. For as-yet-unexplained reasons, the PF_KEY_V2 nodes are never
shown (or queried?) by sysctl(8). For 2.0, I am following an earlier
suggestion from Andrew Brown, and renumbering allthe FAST_IPSEC sysctl
nodes to appear under net.key at MIB number { CTL_NET, PF_KEY }. Since
the renumbering may change, the renumbering is done via a level of
indirection in the C preprocessor.
The nett result is that setkey(8) can find the nodes it needs for
setkey -D and setkey -PD: and that sysctl(8) finds all the FAST_IPSEC
sysctl nodes relatedy to IPsec keying, under net.key. Andrew Brown
has reviewed this patch and tentatively approved the changes, though
we may rework some of the changes in -current in the near future.
SPDs, and to warn about and reject any such attempts.
Addresses a security concern, that the (eas-yet incomplete, experimental)
FAST_IPSEC+INET6 does not honour IPv6 SPDs. The security risk is that
Naive users may not realize this, and their data may get leaked in
cleartext, rather than IPsec'ed, if they use IPv6.
Security issue raised by: Thor Lancelot Simon
reviewed and OKed by: Thor Lancelot Simon
2.0 Pullup request after: 24 hours for further public comment.
(MD5 signatures for TCP, as used with BGP). Credit for original
FreeBSD code goes to Bruce M. Simpson, with FreeBSD sponsorship
credited to sentex.net. Shortening of the setsockopt() name
attributed to Vincent Jardin.
This commit is a minimal, working version of the FreeBSD code, as
MFC'ed to FreeBSD-4. It has received minimal testing with a ttcp
modified to set the TCP-MD5 option; BMS's additions to tcpdump-current
(tcpdump -M) confirm that the MD5 signatures are correct. Committed
as-is for further testing between a NetBSD BGP speaker (e.g., quagga)
and industry-standard BGP speakers (e.g., Cisco, Juniper).
NOTE: This version has two potential flaws. First, I do see any code
that verifies recieved TCP-MD5 signatures. Second, the TCP-MD5
options are internally padded and assumed to be 32-bit aligned. A more
space-efficient scheme is to pack all TCP options densely (and
possibly unaligned) into the TCP header ; then do one final padding to
a 4-byte boundary. Pre-existing comments note that accounting for
TCP-option space when we add SACK is yet to be done. For now, I'm
punting on that; we can solve it properly, in a way that will handle
SACK blocks, as a separate exercise.
In case a pullup to NetBSD-2 is requested, this adds sys/netipsec/xform_tcp.c
,and modifies:
sys/net/pfkeyv2.h,v 1.15
sys/netinet/files.netinet,v 1.5
sys/netinet/ip.h,v 1.25
sys/netinet/tcp.h,v 1.15
sys/netinet/tcp_input.c,v 1.200
sys/netinet/tcp_output.c,v 1.109
sys/netinet/tcp_subr.c,v 1.165
sys/netinet/tcp_usrreq.c,v 1.89
sys/netinet/tcp_var.h,v 1.109
sys/netipsec/files.netipsec,v 1.3
sys/netipsec/ipsec.c,v 1.11
sys/netipsec/ipsec.h,v 1.7
sys/netipsec/key.c,v 1.11
share/man/man4/tcp.4,v 1.16
lib/libipsec/pfkey.c,v 1.20
lib/libipsec/pfkey_dump.c,v 1.17
lib/libipsec/policy_token.l,v 1.8
sbin/setkey/parse.y,v 1.14
sbin/setkey/setkey.8,v 1.27
sbin/setkey/token.l,v 1.15
Note that the preceding two revisions to tcp.4 will be
required to cleanly apply this diff.
Add 'XXX FIXME' comments to ah4_ctlinput(), esp4_ctlinput()
ipcode-paths merely cast away local variables ip, ah/esp, sav; the
fast-ipsec IPv4 code appears to work even so.
In espv6_ctlinput(), call the fast-ipsec KEY_ALLOCSA()/KEY_FREESA()
macros, not the KAME-native key_allocsa()/key_freesa() functions.
Cast sa6_src/sa6_dst to void; the fast-ipsec API does not (yet) pass
both src and dst addrs to KEY_d-ALLOCSA/KEY_FREESA.
Make sure 'off' is set to 0 on the branch where it was formerly
used-before-set.
Will now compile with ``options INET6'' (as in
sys/arch/i386/conf/GENERIC.FAST_IPSEC), but is not yet
expected to acutally work with IPv6.
prototypes for the IPv6 ECN ingress/egress functions in sys/netinet/ip_ecn.h,
inside an #ifdef INET6 wrapper. So, wrap sys/netipsec ocurrences of
#include <netinet6/ip6_ecn.h>
in #ifdef __FreeBSD__/#endif, until both camps can agree on this
teensy little piece of namespace. Affects:
ipsec_output.c xform_ah.c xform_esp.c xform_ipip.c
and the surrounding #ifndef notyet/#else/#endif which had the removed lines
in the #else branch. The inpcb_hdr versions have been in use for
some time now.
used to short-circuit IPsec processing in other places.
This is enabled only for NetBSD at the moment; in order for it to function
correctly, ipsec_pcbconn() must be called as appropriate.
va_{start,end} audit:
Make sure that each va_start has one and only one matching va_end,
especially in error cases.
If the va_list is used multiple times, do multiple va_starts/va_ends.
If a function gets va_list as argument, don't let it use va_end (since
it's the callers responsibility).
Improved by comments from enami and christos -- thanks!
Heimdal/krb4/KAME changes already fed back, rest to follow.
Inspired by, but not not based on, OpenBSD.
Remove #ifdef FAST_IPSEC/#endif around the inclusion of local
(sys/netipsec) header files; they are always appropriate for
this file (sys/netipsec/ipsec_netbsd.c). At least on NetBSD.
If INET6 is defined, include appropriate header files
(local netipsec/ipsec6.h, netinet6/ip6protosw.h, and icmp6.h
from its standards-compliant location in netinet/).
Will now at least compile and link when ``options INET6' is configured.
(struct in6pcb* versus struct inpcb*) in ipsec_getpolicybysock().
Add new macros (in lieu of an abstract data type) for a ``generic''
PCB_T (points to a struct inpcb* or struct in6pcb*) to ipsec_osdep.h.
Use those new macros in ipsec_getpolicybysock() and elsewhere.
As posted to tech-net for comment/feedback, late 2003.
is NULL, otherwise ipsec4_process_packet() may try to m_freem() a
bad pointer.
In ipsec4_process_packet(), don't try to m_freem() 'm' twice; ipip_output()
already did it.