Commit Graph

131 Commits

Author SHA1 Message Date
dyoung 72f0a6dfb0 Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principle benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing.  Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously.  Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs.  I have
  introduced routines for allocating, copying, and duplicating,
  and freeing sockaddrs:

        struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
        struct sockaddr *sockaddr_copy(struct sockaddr *dst,
                                       const struct sockaddr *src);
        struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
        void sockaddr_free(struct sockaddr *sa);

  sockaddr_alloc() returns either a sockaddr from the pool belonging
  to the specified family, or NULL if the pool is exhausted.  The
  returned sockaddr has the right size for that family; sa_family
  and sa_len fields are initialized to the family and sockaddr
  length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
  sockaddr_in).  sockaddr_free() puts the given sockaddr back into
  its family's pool.

  sockaddr_dup() and sockaddr_copy() work analogously to strdup()
  and strcpy(), respectively.  sockaddr_copy() KASSERTs that the
  family of the destination and source sockaddrs are alike.

  The 'flags' argumet for sockaddr_alloc() and sockaddr_dup() is
  passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
  family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
  etc.  They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more.  All protocol families
  use struct route.  I have changed the route cache, 'struct route',
  so that it does not contain storage space for a sockaddr.  Instead,
  struct route points to a sockaddr coming from the pool the sockaddr
  belongs to.  I added a new method to struct route, rtcache_setdst(),
  for setting the cache destination:

        int rtcache_setdst(struct route *, const struct sockaddr *);

  rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
  available to create the sockaddr storage.

  It is now possible for rtcache_getdst() to return NULL if, say,
  rtcache_setdst() failed.  I check the return value for NULL
  everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
  caches, dom_rtcache.  rtflushall(sa_family_t af) looks up the
  domain indicated by 'af', walks the domain's list of route caches
  and invalidates each one.
2007-05-02 20:40:22 +00:00
christos 53524e44ef Kill caddr_t; there will be some MI fallout, but it will be fixed shortly. 2007-03-04 05:59:00 +00:00
dyoung 5493f188c7 KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
   in6_src.c, avoid casts by changing several route_in6 pointers
   to struct route pointers.  Remove unnecessary casts to caddr_t
   elsewhere.

Pave the way for eliminating address family-specific route caches:
   soon, struct route will not embed a sockaddr, but it will hold
   a reference to an external sockaddr, instead.  We will set the
   destination sockaddr using rtcache_setdst().  (I created a stub
   for it, but it isn't used anywhere, yet.)  rtcache_free() will
   free the sockaddr.  I have extracted from rtcache_free() a helper
   subroutine, rtcache_clear().  rtcache_clear() will "forget" a
   cached route, but it will not forget the destination by releasing
   the sockaddr.  I use rtcache_clear() instead of rtcache_free()
   in rtcache_update(), because rtcache_update() is not supposed
   to forget the destination.

Constify:

   1 Introduce const accessor for route->ro_dst, rtcache_getdst().

   2 Constify the 'dst' argument to ifnet->if_output().  This
     led me to constify a lot of code called by output routines.

   3 Constify the sockaddr argument to protosw->pr_ctlinput.  This
     led me to constify a lot of code called by ctlinput routines.

   4 Introduce const macros for converting from a generic sockaddr
     to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
     satocsin, et cetera.
2007-02-17 22:34:07 +00:00
degroote e2211411a4 Commit my SoC work
Add ipv6 support for fast_ipsec
Note that currently, packet with extensions headers are not correctly
supported
Change the ipcomp logic
2007-02-10 09:43:05 +00:00
dyoung f2a11fe343 bzero -> memset 2007-01-29 06:02:26 +00:00
dyoung 2148d49b3a Cosmetic: indent using ASCII horizontal tab, insert space following
comma, wrap line.
2007-01-15 21:49:56 +00:00
degroote ed7ae80021 Fix an infinite loop ( and local dos ) in the case where the ip6_hdr and
the icmp6_hdr are not in the same mbuf.
Fix pr/34994 and probably pr/35333
Ok @rpaulo
2007-01-15 19:11:48 +00:00
joerg eb04733c4e Introduce new helper functions to abstract the route caching.
rtcache_init and rtcache_init_noclone lookup ro_dst and store
the result in ro_rt, taking care of the reference counting and
calling the domain specific route cache.
rtcache_free checks if a route was cashed and frees the reference.
rtcache_copy copies ro_dst of the given struct route, checking that
enough space is available and incrementing the reference count of the
cached rtentry if necessary.
rtcache_check validates that the cached route is still up. If it isn't,
it tries to look it up again. Afterwards ro_rt is either a valid again
or NULL.
rtcache_copy is used internally.

Adjust to callers of rtalloc/rtflush in the tree to check the sanity of
ro_dst first (if necessary). If it doesn't fit the expectations, free
the cache, otherwise check if the cached route is still valid. After
that combination, a single check for ro_rt == NULL is enough to decide
whether a new lookup needs to be done with a different ro_dst.
Make the route checking in gre stricter by repeating the loop check
after revalidation.
Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly
changed here to first validate the route and check RTF_GATEWAY
afterwards. This is sementically equivalent though.
etherip doesn't need sc_route_expire similiar to the gif changes from
dyoung@ earlier.

Based on the earlier patch from dyoung@, reviewed and discussed with
him.
2006-12-15 21:18:52 +00:00
dyoung c308b1c661 Here are various changes designed to protect against bad IPv4
routing caused by stale route caches (struct route).  Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).

Stale IPv6 and ISO route caches will be treated by separate patches.

Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.

Here are the details:

Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.

Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.

Chain together all IPv4 route caches where ro_rt != NULL.  Provide
in_rtcache() for adding a route to the chain.  Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches.  In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain.  In in_rtflushall(), walk the chain and remove every route
cache.

In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.

In gif(4), discard the workaround for stale caches that involves
expiring them every so often.

Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).

Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.

Take care when moving a 'struct route' to rtflush() the source and
to rtcache() the destination.

In domain initializers, use .dom_xxx tags.

KNF here and there.
2006-12-09 05:33:04 +00:00
christos 168cd830d2 __unused removal on arguments; approved by core. 2006-11-16 01:32:37 +00:00
christos 4d595fd7b1 - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
2006-10-12 01:30:41 +00:00
dyoung 71522fb484 Simplify and repair icmp6_input() to stop the kernel from panicking
in m_copydata() when an ICMP6_ECHO_REQUEST is received, as reported
by Tatoku Ogaito on current-users@.
2006-09-05 16:11:26 +00:00
dyoung 19ce2e4680 Vastly simplify the code that copies an ICMP6 packet to two data
paths: ICMP6 reply path, and socket path.
2006-09-01 02:44:46 +00:00
christos 86fad06b76 declare the type of code. 2006-08-30 15:25:08 +00:00
tron 8fe4e4040d Clear mbuf checksum flags before passing it to ip6_output(). We might
recycle a mbuf which contained a hardware provided checksum. This
fixes "traceroute6" to a machine which is using a wm(4) interface
that has UDP or TCP checksum offload enabled.
2006-07-11 22:13:56 +00:00
kardel de4337ab21 merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
  time.tv_sec -> time_second
- struct timeval mono_time is gone
  mono_time.tv_sec -> time_uptime
- access to time via
	{get,}{micro,nano,bin}time()
	get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
  Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
  NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html
2006-06-07 22:33:33 +00:00
christos f1a8105e4c Coverity CID 740: Change constant comparisons to MCLBYTES to KASSERT and remove
extraneous tests.
2006-04-15 00:24:12 +00:00
rpaulo 8c2379fd97 NDP-related improvements:
RFC4191
	- supports host-side router-preference

	RFC3542
	- if DAD fails on a interface, disables IPv6 operation on the
          interface
	- don't advertise MLD report before DAD finishes

	Others
	- fixes integer overflow for valid and preferred lifetimes
	- improves timer granularity for MLD, using callout-timer.
	- reflects rtadvd's IPv6 host variable information into kernel
	  (router only)
	- adds a sysctl option to enable/disable pMTUd for multicast
          packets
	- performs NUD on PPP/GRE interface by default
	- Redirect works regardless of ip6_accept_rtadv
	- removes RFC1885-related code

From the KAME project via SUZUKI Shinsuke.
Reviewed by core.
2006-03-05 23:47:08 +00:00
rpaulo eb35daf5b2 Fix typos in comments.
From: the KAME project via SUZUKI Shinsuke.
2006-03-03 14:07:06 +00:00
rpaulo 78678b130a Better support of IPv6 scoped addresses.
- most of the kernel code will not care about the actual encoding of
  scope zone IDs and won't touch "s6_addr16[1]" directly.
- similarly, most of the kernel code will not care about link-local
  scoped addresses as a special case.
- scope boundary check will be stricter.  For example, the current
  *BSD code allows a packet with src=::1 and dst=(some global IPv6
  address) to be sent outside of the node, if the application do:
    s = socket(AF_INET6);
    bind(s, "::1");
    sendto(s, some_global_IPv6_addr);
  This is clearly wrong, since ::1 is only meaningful within a single
  node, but the current implementation of the *BSD kernel cannot
  reject this attempt.
- and, while there, don't try to remove the ff02::/32 interface route
  entry in in6_ifdetach() as it's already gone.

This also includes some level of support for the standard source
address selection algorithm defined in RFC3484, which will be
completed on in the future.

From the KAME project via JINMEI Tatuya.
Approved by core@.
2006-01-21 00:15:35 +00:00
christos 95e1ffb156 merge ktrace-lwp. 2005-12-11 12:16:03 +00:00
bouyer b3b0d23068 In icmp6_redirect_output(), sip6 is initialised to point to the data area of
m0. But m0 may be freed later, so trying to use sip6 at the end of this
function is wrong. My guess is that we want to reference the data area
of m (the mbuf about to be send) instead at this point.
Fix a panic on Xen (where a data area of a mbuf may be unmapped when the
mbuf is freed), and probably potential data/pool corruption in other cases.
2005-10-19 20:42:54 +00:00
yamt 2e85eff671 - introduce M_MOVE_PKTHDR and use it where appropriate.
intended to be mostly API compatible with openbsd/freebsd.
- remove a glue #define in netipsec/ipsec_osdep.h.
2005-08-18 00:30:58 +00:00
christos 2ab31527e2 - avoid shadowed variables
- sprinkle const.
2005-05-29 21:43:51 +00:00
itojun 57fd095fdf shouldn't check code field on "packet too big" icmp6 message. 2005-01-17 10:16:07 +00:00
atatat 4de3747b89 Sysctl descriptions under net subtree (net.key not done) 2004-05-25 04:33:59 +00:00
itojun e050c8a03d do not touch m->m_pkthdr.rcvif after m becomes invalid. Patrick Latifi 2004-03-26 03:35:02 +00:00
atatat 19af35fd0d Tango on sysctl_createv() and flags. The flags have all been renamed,
and sysctl_createv() now uses more arguments.
2004-03-24 15:34:46 +00:00
lha 2b1cb68e2f Fix ICMPV6CTL_ND6_[DP]RLIST, they broke with new sysctl.
Makes ndp -r/ndp -p work again, patch from atatat
2003-12-17 18:49:38 +00:00
atatat 13f8d2ce5f Dynamic sysctl.
Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al.  Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded.  Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment.  I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.
2003-12-04 19:38:21 +00:00
simonb a2facef339 Remove some assigned-to but otherwise unused variables. 2003-10-30 01:43:08 +00:00
itojun 495906ca8e revamp inpcb/in6pcb so that they are more aligned with each other.
in6pcb lookup now uses hash(9).
2003-09-04 09:16:57 +00:00
itojun 9569786c95 deref member in in6p directly, don't rely on existence of macro 2003-08-25 00:11:52 +00:00
itojun 11ede1ed88 remove ipsec_set/getsocket. now we explicitly pass socket * to ip{,6}_output. 2003-08-22 22:00:36 +00:00
itojun 82eb4ce914 change the additional arg to be passed to ip{,6}_output to struct socket *.
this fixes KAME policy lookup which was broken by the previous commit.
2003-08-22 21:53:01 +00:00
jonathan 902669955f Replace the set_socket() method of passing an extra struct socket*
argument to ip6_output() with a new explicit struct in6pcb* argument.
(The underlying socket can be obtained via in6pcb->inp6_socket.)

In preparation for fast-ipsec.  Reviewed by itojun.
2003-08-22 20:20:09 +00:00
agc aad01611e7 Move UCB-licensed code from 4-clause to 3-clause licence.
Patches provided by Joel Baker in PR 22364, verified by myself.
2003-08-07 16:26:28 +00:00
itojun 256877974a m_cat may free mbuf on 2nd arg, so m_pkthdr manipulation has to happen
before m_cat call.  from Julian Coleman via kame.
2003-08-06 14:47:32 +00:00
itojun 6d4a3c4191 remove unneeded checks of accept_rtadv. from kame 2003-06-24 07:54:47 +00:00
itojun 194f048bd9 use time.tv_sec directly 2003-06-24 07:39:24 +00:00
itojun 7a5741651c - sync up MLD declaration with RFC3542 (s/MLD6/MLD/)
- routing header declaration with RFC3542
  (note: sizeof(ip6_rthdr0) has changed!)
  also, sync up with RFC2460 routing header definition (no "strict" source
  routing mode any more)

part of advanced API update (RFC2292 -> 3542).
2003-06-06 08:13:43 +00:00
itojun 5c0f142820 remove assumption on redirect header option processing. from kame 2003-06-03 05:20:06 +00:00
itojun 346e0198f0 always use PULLDOWN_TEST codepath. 2003-05-14 06:47:33 +00:00
itojun a81c2be8be avoid mbuf leak in redirect header option attachment. more complete
fix to come.  from kame
2003-03-31 23:55:46 +00:00
provos 0f09ed48a5 remove trailing \n in panic(). approved perry. 2002-09-27 15:35:29 +00:00
simonb 4e3613273b Remove breaks after returns, unreachable returns and returns after
returns(!).
2002-09-23 05:51:10 +00:00
itojun 9401012487 KNF - return is not a function. sync w/kame. 2002-09-11 02:46:42 +00:00
itojun a919a4c628 no need to check NULL mbuf, as we touch it already.
From: tedu <grendel@zeitbombe.org>
2002-07-30 23:27:15 +00:00
itojun 51bd9285d5 correct ping6 -w result wth hostname with [A-Z]. PR 17540. sync w/kame 2002-07-10 05:05:01 +00:00
thorpej 10c252ba47 Changes to allow the IPv4 and IPv6 layers to align headers themseves,
as necessary:
* Implement a new mbuf utility routine, m_copyup(), is is like
  m_pullup(), except that it always prepends and copies, rather
  than only doing so if the desired length is larger than m->m_len.
  m_copyup() also allows an offset into the destination mbuf, which
  allows space for packet headers, in the forwarding case.
* Add *_HDR_ALIGNED_P() macros for IP, IPv6, ICMP, and IGMP.  These
  macros expand to 1 if __NO_STRICT_ALIGNMENT is defined, so that
  architectures which do not have strict alignment constraints don't
  pay for the test or visit the new align-if-needed path.
* Use the new macros to check if a header needs to be aligned, or to
  assert that it already is, as appropriate.

Note: This code is still somewhat experimental.  However, the new
code path won't be visited if individual device drivers continue
to guarantee that packets are delivered to layer 3 already properly
aligned (which are rules that are already in use).
2002-06-30 22:40:32 +00:00