parentheses in return statements.
Cosmetic: don't open-code TAILQ_FOREACH().
Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.
Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.
Constify:
1 Introduce const accessor for route->ro_dst, rtcache_getdst().
2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.
3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.
4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.
explicitly return `EINVAL'. Before, it was done but later in
ip6_setpktopt() when checking for 'len < ...')
CID-3316: check for 'm != NULL' before using it
ok christos@
rtcache_init and rtcache_init_noclone lookup ro_dst and store
the result in ro_rt, taking care of the reference counting and
calling the domain specific route cache.
rtcache_free checks if a route was cashed and frees the reference.
rtcache_copy copies ro_dst of the given struct route, checking that
enough space is available and incrementing the reference count of the
cached rtentry if necessary.
rtcache_check validates that the cached route is still up. If it isn't,
it tries to look it up again. Afterwards ro_rt is either a valid again
or NULL.
rtcache_copy is used internally.
Adjust to callers of rtalloc/rtflush in the tree to check the sanity of
ro_dst first (if necessary). If it doesn't fit the expectations, free
the cache, otherwise check if the cached route is still valid. After
that combination, a single check for ro_rt == NULL is enough to decide
whether a new lookup needs to be done with a different ro_dst.
Make the route checking in gre stricter by repeating the loop check
after revalidation.
Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly
changed here to first validate the route and check RTF_GATEWAY
afterwards. This is sementically equivalent though.
etherip doesn't need sc_route_expire similiar to the gif changes from
dyoung@ earlier.
Based on the earlier patch from dyoung@, reviewed and discussed with
him.
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).
Stale IPv6 and ISO route caches will be treated by separate patches.
Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.
Here are the details:
Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.
Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.
Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.
In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.
In gif(4), discard the workaround for stale caches that involves
expiring them every so often.
Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).
Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.
Take care when moving a 'struct route' to rtflush() the source and
to rtcache() the destination.
In domain initializers, use .dom_xxx tags.
KNF here and there.
in6_control() with splnet()/splx(). I was being a bit paranoid
here. Following a cursory analysis of the code, this still looked
necessary. We don't spend a lot of time in these calls, so it
should not be too harmful to suspend network interrupts.
In in6_unlink_ifa(), call in6_delmulti() just once on each multicast
address (in6_multi). Previously, in6_unlink_ifa() called in6_delmulti()
on each in6_multi until in6_delmulti() removed the in6_multi from
the list and freed its memory. That's not justified: the multicast
list holds *one* reference. All other references belong to other
entities. We must wait to free the memory until the other entities
release their references, to protect against dereferencing a freed
in6_multi.
XXX I need to revisit in6_delmulti(), in6_unlink_ifa(), and friends,
XXX to pry apart the conditions where an in6_multi is removed from
XXX its list and where it is freed. Following my change, above,
XXX we still risk dereferencing a freed in6_multi.
Prevent in6_update_ifa() and in6_addremloop() from creating dangling
pointers to interfaces in the routing table. Previously, my NetBSD
tunnel concentrator, which adds and deletes a lot of P2P interfaces
with the same local address, crashed in 8 hours or less when it
dereferenced a dangling pointer to a deleted ifnet. Now, its uptime
is greater than 3 days.
Annotate a memory leak.
When copying one multicast address list to another, IFAREF before IFAFREE
to protect against using an ifaddr after (accidentally) freeing it.
LIST_REMOVE() a multicast address from its old list before
LIST_INSERT_HEAD() on its new list.
Do not count on in6_delmulti() removing its multicast-record argument
from the multicast address list that the record belongs to, because
clearly that is not what it (always) does.
Notable changes:
* Fixes PR 34268.
* Separates the code from gif(4) (which is more cleaner).
* Allows the usage of STP (Spanning Tree Protocol).
* Removed EtherIP implementation from gif(4)/tap(4).
Some input from Christos.
Use the usual idiom for iterating over a list where we might
_REMOVE() entries,
for (x = TAILQ_FIRST(...); x != NULL; x = nx) {
nx = TAILQ_NEXT(x, ...);
...
}
Also, add ioctls SIOCGIFADDRPREF/SIOCSIFADDRPREF to get/set preference
numbers for addresses. Make ifconfig(8) set/display preference
numbers.
To activate source-address selection policies in your kernel, add
'options IPSELSRC' to your kernel configuration.
Miscellaneous changes in support of source-address selection:
1 Factor out some common code, producing rt_replace_ifa().
2 Abbreviate a for-loop with TAILQ_FOREACH().
3 Add the predicates on IPv4 addresses IN_LINKLOCAL() and
IN_PRIVATE(), that are true for link-local unicast
(169.254/16) and RFC1918 private addresses, respectively.
Add the predicate IN_ANY_LOCAL() that is true for link-local
unicast and multicast.
4 Add IPv4-specific interface attach/detach routines,
in_domifattach and in_domifdetach, which build #ifdef
IPSELSRC.
See in_getifa(9) for a more thorough description of source-address
selection policy.
XXX: We still install rmd160.h and sha2.h in /usr/include/crypto, unlike
the other hash functions which get installed in /usr/include for compatibility.
with spl used to protect other allocations and frees, or datastructure
element insertion and removal, in adjacent code.
It is almost unquestionably the case that some of the spl()/splx() calls
added here are superfluous, but it really seems wrong to see:
s=splfoo();
/* frob data structure */
splx(s);
pool_put(x);
and if we think we need to protect the first operation, then it is hard
to see why we should not think we need to protect the next. "Better
safe than sorry".
It is also almost unquestionably the case that I missed some pool
gets/puts from interrupt context with my strategy for finding these
calls; use of PR_NOWAIT is a strong hint that a pool may be used from
interrupt context but many callers in the kernel pass a "can wait/can't
wait" flag down such that my searches might not have found them. One
notable area that needs to be looked at is pf.
See also:
http://mail-index.netbsd.org/tech-kern/2006/07/19/0003.htmlhttp://mail-index.netbsd.org/tech-kern/2006/07/19/0009.html
IPv6 interface address (e.g., sin6_addr fe80::200:24ff:fec3:4bac
sin6_scope_id 1), set a multicast interface with
setsockopt(,IPPROTO_IPV6,IPV6_MULTICAST_IF,), and sendto(2) multicast
destinations with "wildcard" scope ID, 0, without error EHOSTUNREACH.
Prior to this patch, sendto(2) would exit with EHOSTUNREACH, even
though the scope ID was unambiguously specified both by bind(2)
and setsockopt(2). This was a bug because it broke old applications.
Thanks JINMEI Tatuya for the patch!
the mbuf which supposed to get sent out:
- Complain in ip_output() if any of the IPv6 related flags are set.
- Complain in ip6_output() if any of the IPv4 related flags are set.
- Complain in both functions if the flags indicate that both a TCP and
UCP checksum should be calculated by the hardware.
recycle a mbuf which contained a hardware provided checksum. This
fixes "traceroute6" to a machine which is using a wm(4) interface
that has UDP or TCP checksum offload enabled.
revision 1.1.1.2.2.5:
do not call pfctlinput2(PRC_MSGSIZE) on fragmentation to avoid
notification storm
From Keiichi SHIMA:
"In the current NetBSD code, the PRC_MSGSIZE message will be generated
for every fragmented packets when a node is trying to send a big
packet. That was the intermediate behavior while RFC3542 was under
discussion."
By (obviously) the KAME project.
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html
case:
<driver>_ioctl(ifp, SIOCSIFADDR, struct ifreq *)
where it should be calling:
<driver>_ioctl(ifp, SIOCSIFADDR, struct ifaddr *)
and "Bad Things Happen (TM)"
Returning an error is good enough because none of the drivers handle INET6.
The problem here is that handling SIOCSIFADDR is a kludge. The ioctl gets
passed a struct ifreq * from userland, but then in the control routines
SIOCSIFADDR is handled "specially", and we call:
ifp->if_ioctl(ifp, SIOCSIFADDR, struct ifaddr *)
directly with the ifaddr we computed for that interface. It would be nice
if we called the ioctl routine if the original struct ifreq, and computed
the ifaddr, or passed it directly. This way all the ioctls would be treated
the same way, and we would not have the problem of pointer overloading.
* RFC 3542 isn't binary compatible with RFC 2292.
* RFC 2292 support is on by default but can be disabled.
* update ping6, telnet and traceroute6 to the new API.
From the KAME project (www.kame.net).
Reviewed by core.
value. Previously the router should treat the recieved router
advertisement as having a 0 router lifetime. The RFC now says that the
router should treat the "Reserved" field the same way as if it was the
medium (default) preference.
From the KAME project via SUZUKI Shinsuke.
RFC4191
- supports host-side router-preference
RFC3542
- if DAD fails on a interface, disables IPv6 operation on the
interface
- don't advertise MLD report before DAD finishes
Others
- fixes integer overflow for valid and preferred lifetimes
- improves timer granularity for MLD, using callout-timer.
- reflects rtadvd's IPv6 host variable information into kernel
(router only)
- adds a sysctl option to enable/disable pMTUd for multicast
packets
- performs NUD on PPP/GRE interface by default
- Redirect works regardless of ip6_accept_rtadv
- removes RFC1885-related code
From the KAME project via SUZUKI Shinsuke.
Reviewed by core.
logic, and then call nd6_llinfo_settimer. Instead, call
nd6_llinfo_settimer immediately.
This should cause no functional change. I've been running this
patch for months.
In ipcomp6_input(), check 'md' not 'm' after a call to m_pulldown(): 'm'
may be a stale pointer at this point, and we're interested in whether or
not m_pulldown() failed.
Noticed by: Coverity Prevent analysis tool
- most of the kernel code will not care about the actual encoding of
scope zone IDs and won't touch "s6_addr16[1]" directly.
- similarly, most of the kernel code will not care about link-local
scoped addresses as a special case.
- scope boundary check will be stricter. For example, the current
*BSD code allows a packet with src=::1 and dst=(some global IPv6
address) to be sent outside of the node, if the application do:
s = socket(AF_INET6);
bind(s, "::1");
sendto(s, some_global_IPv6_addr);
This is clearly wrong, since ::1 is only meaningful within a single
node, but the current implementation of the *BSD kernel cannot
reject this attempt.
- and, while there, don't try to remove the ff02::/32 interface route
entry in in6_ifdetach() as it's already gone.
This also includes some level of support for the standard source
address selection algorithm defined in RFC3484, which will be
completed on in the future.
From the KAME project via JINMEI Tatuya.
Approved by core@.
pass it to in_pcbbind() so that can allocate a low numbered port
if setsockopt() has been used to set IP_PORTRANGE to IP_PORTRANGE_LOW.
While there, fail in_pcbconnect() if the in_pcbbind() fails - rather
than sending the request out from a port of zero.
This has been largely broken since the socket option was added in 1998.
m0. But m0 may be freed later, so trying to use sip6 at the end of this
function is wrong. My guess is that we want to reference the data area
of m (the mbuf about to be send) instead at this point.
Fix a panic on Xen (where a data area of a mbuf may be unmapped when the
mbuf is freed), and probably potential data/pool corruption in other cases.
over IPsec tunnels.
I have changed the default to 2 [copy]. I've verified that this works with
all my IPSEC setups, and this change has also been discussed in tech-net.
store a struct ifnet *, and define it for udp/tcp/rawip for INET and
INET6. When deleting a struct ifnet, invoke PRU_PURGEIF on all
protocols marked with PR_PURGEIF. Closes PR kern/29580 (mine).
"const struct mbuf *" to "struct mbuf *". Without this change the
actual implementation cannot even use m_copydata() on the mbuf chain
which is broken.
net.local.stream.pcblist
net.local.dgram.pcblist
net.inet.tcp.pcblist
net.inet.udp.pcblist
net.inet.raw.pcblist
net.inet6.tcp6.pcblist
net.inet6.udp6.pcblist
net.inet6.raw6.pcblist
which allow retrieval of the pcbs in use for those protocols. The
struct involved is 32/64 bit clean and incorporates parts of struct
inpcb, struct unpcb, a bit of struct tcpcb, and two socket addresses.
(in my case it as a switch set to "monitor" mode):
If we see an NS request for the address we are just probing for, for
three times the number of DAD packets we are supposed to send (the
"ip6.dad_count" sysctl variable), assume that these are our own packets
and let DAD succeed.
The code for this was mostly there, commented out. Just needed some fixes.
The "three times" is heuristic of course.
Being here, reset the "dad_ns_tcount" variable on a successful send;
otherwise we get strange interdependencies with user-settable variables
(ever tried to set ip6.dad_count to something >15?).
sign for the existence of routing header.
- fragment to 1280 on IPv6-over-IPv6 encapsulation, as ICMPv6 too big may not
give you enough information to update pmtu cache.
from iij seil team, via kame.
The sys/netipsec policy-cache (added by Jason Thorpe as a rewrite of
the KAME per-PCB policy cache) assumes that policy-cacheable PCBs
always has a non-NULL inph_sp in the common PCB header. So we must
do all the per-PCB policy cache calls when either (KAME) IPSEC, or
FAST_IPSEC is defined. ``Make it so''.
We can now support non-IPsec'ed IPv6 traffic, when both
``options FAST_IPSEC'' and ``options INET6'' are configured.
to pool_init. Untouched pools are ones that either in arch-specific
code, or aren't initialiased during initial system startup.
Convert struct session, ucred and lockf to pools.
IPsec processing in other places. The hint has 3 values: MAYBE, YES,
and NO. Hints are initialized to MAYBE, and MAYBE is always used for
unconnected sockets (since the spidx may change for every packet
that is output). For connected sockets, NONE and BYPASS policies cause
the hint to be set to NO, and all other policies to YES.
Also shuffle the PCB cache data structure, turning 3 arrays into a
single array of a struct.
to check if interface exists, as (1) if_index has different meaning
(2) ifindex2ifnet could become NULL when interface gets destroyed,
since when we have introduced dynamically-created interfaces. from kame
- seed2 is necessary, but use it as "seed2 + x" not "seed2 ^ x".
- skipping number is not needed, so disable it for 16bit generator (makes
the repetition period to 30000)
Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.
Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.
All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.
PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.
due to demonstrated low-period repeated IDs from the randomized IP_id
code. Consensus is that the low-period repetition (much less than
2^15) is not suitable for general-purpose use.
Allocators of new IPv4 IDs should now call the function ip_newid().
Randomized IP_ids is now a config-time option, "options RANDOM_IP_ID".
ip_newid() can use ip_random-id()_IP_ID if and only if configured
with RANDOM_IP_ID. A sysctl knob should be provided.
This API may be reworked in the near future to support linear ip_id
counters per (src,dst) IP-address pair.
the purpose of this code appears to be on crack -- it's talking about
end-to-end authentication, but the purpose of an AH tunnel is NOT end-to-end
authentication; it's authentication of the tunnel endpoints.
NB: This does not fix the fact that IPsec leaks "packet tags."
- use hash for SPI-based SAD entry lookup (should be faster, i hope)
- cleanup keydb.c and key.c. key.c is responsible for refcounting secasvar,
keydb.c is responsible for alloc/free.
the old one. Rename the functions/structures from cast_* to cast128_*.
Adapt the KAME IPsec to use the new CAST-128 code, which has a simpler
API and smaller footprint.
replace the set_socket() method of passing an extra struct socket*
argument to ip6_output() with a new explicit struct in6pcb* argument.
(The underlying socket can be obtained via in6pcb->inp6_socket.)
In preparation for fast-ipsec. Reviewed by itojun.
argument to ip6_output() with a new explicit struct in6pcb* argument.
(The underlying socket can be obtained via in6pcb->inp6_socket.)
In preparation for fast-ipsec. Reviewed by itojun.
configured with ``options FAST_IPSEC''. Kernels with KAME IPsec or
with no IPsec should work as before.
All calls to ip_output() now always pass an additional compulsory
argument: the inpcb associated with the packet being sent,
or 0 if no inpcb is available.
Fast-ipsec tested with ICMP or UDP over ESP. TCP doesn't work, yet.
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.
Bump the kernel rev up to 1.6V
decide whether to create an empty llinfo stricter so that a user
can manually change the link-layer address of an existing neighbor
cache.
Pointed out by: KIU Shueng Chuan
from kame
- routing header declaration with RFC3542
(note: sizeof(ip6_rthdr0) has changed!)
also, sync up with RFC2460 routing header definition (no "strict" source
routing mode any more)
part of advanced API update (RFC2292 -> 3542).