Commit Graph

121 Commits

Author SHA1 Message Date
dholland 0fcc047588 Remove stray #undef, probably someone's debugging leftover from long ago. 2012-08-24 06:03:18 +00:00
christos 84f52095ad rename rfc6056 -> portalgo, requested by yamt 2012-06-25 15:28:38 +00:00
drochner 364a06bb29 remove KAME IPSEC, replaced by FAST_IPSEC 2012-03-22 20:34:37 +00:00
christos 42c420856f - fix offsetof usage, and redundant defines
- kill pointer casts to 0
2011-12-31 20:41:58 +00:00
drochner 23e5beaef1 rename the IPSEC in-kernel CPP variable and config(8) option to
KAME_IPSEC, and make IPSEC define it so that existing kernel
config files work as before
Now the default can be easily be changed to FAST_IPSEC just by
setting the IPSEC alias to FAST_IPSEC.
2011-12-19 11:59:56 +00:00
christos 5ec72efbaa Add inet6 part of the rfc6056 code contributed by Vlad Balan as part of
Google SoC-2011
2011-09-24 17:22:14 +00:00
plunky 7f3d4048d7 NULL does not need a cast 2011-08-31 18:31:02 +00:00
dyoung c1922724a7 Invalidate the vestigital PCB at the top of in6_pcblookup_connect() to
fix the bug where incoming TCPv6 connections were reset.
2011-05-04 01:45:48 +00:00
dyoung c2e43be1c5 Reduces the resources demanded by TCP sessions in TIME_WAIT-state using
methods called Vestigial Time-Wait (VTW) and Maximum Segment Lifetime
Truncation (MSLT).

MSLT and VTW were contributed by Coyote Point Systems, Inc.

Even after a TCP session enters the TIME_WAIT state, its corresponding
socket and protocol control blocks (PCBs) stick around until the TCP
Maximum Segment Lifetime (MSL) expires.  On a host whose workload
necessarily creates and closes down many TCP sockets, the sockets & PCBs
for TCP sessions in TIME_WAIT state amount to many megabytes of dead
weight in RAM.

Maximum Segment Lifetimes Truncation (MSLT) assigns each TCP session to
a class based on the nearness of the peer.  Corresponding to each class
is an MSL, and a session uses the MSL of its class.  The classes are
loopback (local host equals remote host), local (local host and remote
host are on the same link/subnet), and remote (local host and remote
host communicate via one or more gateways).  Classes corresponding to
nearer peers have lower MSLs by default: 2 seconds for loopback, 10
seconds for local, 60 seconds for remote.  Loopback and local sessions
expire more quickly when MSLT is used.

Vestigial Time-Wait (VTW) replaces a TIME_WAIT session's PCB/socket
dead weight with a compact representation of the session, called a
"vestigial PCB".  VTW data structures are designed to be very fast and
memory-efficient: for fast insertion and lookup of vestigial PCBs,
the PCBs are stored in a hash table that is designed to minimize the
number of cacheline visits per lookup/insertion.  The memory both
for vestigial PCBs and for elements of the PCB hashtable come from
fixed-size pools, and linked data structures exploit this to conserve
memory by representing references with a narrow index/offset from the
start of a pool instead of a pointer.  When space for new vestigial PCBs
runs out, VTW makes room by discarding old vestigial PCBs, oldest first.
VTW cooperates with MSLT.

It may help to think of VTW as a "FIN cache" by analogy to the SYN
cache.

A 2.8-GHz Pentium 4 running a test workload that creates TIME_WAIT
sessions as fast as it can is approximately 17% idle when VTW is active
versus 0% idle when VTW is inactive.  It has 103 megabytes more free RAM
when VTW is active (approximately 64k vestigial PCBs are created) than
when it is inactive.
2011-05-03 18:28:44 +00:00
joerg 0253fb8bf4 Remove stray { 2010-08-20 16:38:16 +00:00
joerg 0e26070ea9 Consider a mapped IPv4 address of 0.0.0.0 as unspecified. This allows
using mapped IPv4 address with connect without preceding bind.
2010-08-20 15:01:11 +00:00
pooka 7aa04865ab POOL_INIT -> pool_init 2009-05-26 00:17:56 +00:00
elad b15203315e Implicit EPERM -> explicit EACCES.
Requested by ad@ and yamt@.
2009-05-12 22:22:46 +00:00
elad 996746c20d Replace wrong __UNCONST() use with a local variable.
Similar to issues pointed out by bouyer@ and forgotten by me when I did
the last commit.

Should fix issues reported on current-users@ in:

    http://mail-index.netbsd.org/current-users/2009/05/02/msg009273.html
2009-05-02 18:58:03 +00:00
elad ddcbe0e1dd - Make in6_pcbbind_{addr,port}() static
- Properly authorize port binding in in_pcbsetport() and in6_pcbsetport()

- Pass struct sockaddr_in6 to in6_pcbsetport() instead of just the address,
  so that we have a more complete context

- Adjust udp6_output() to craft a sockaddr_in6 as it calls in6_pcbsetport()

- Fix an issue in in_pcbbind() where we used the "dom_sa_any" pointer and
  not a copy of it, pointed out by bouyer@, thanks!

Mailing list reference:

	http://mail-index.netbsd.org/tech-net/2009/04/29/msg001259.html
2009-04-30 18:18:34 +00:00
elad 3a272cca86 Only check if the port is used if it was specified.
Should fix problem reported in

    http://mail-index.netbsd.org/current-users/2009/04/22/msg009130.html
2009-04-22 18:35:01 +00:00
elad b7a329340e Replace KAUTH_GENERIC_ISSUSER with a better alternative. 2009-04-20 19:57:18 +00:00
elad e75a3b5e33 Extract in6_pcbbind()'s guts into two new routines: in6_pcbbind_addr() and
in6_pcbbind_port(), used for binding to an address and a port respectively.

While here, fix a possible "leak" of an in6pcb when binding to an address
succeeded but binding to an auto-assigned port failed.

Proposed and received no objections on tech-net@:

	http://mail-index.netbsd.org/tech-net/2009/04/15/msg001223.html
2009-04-20 18:14:30 +00:00
tsutsui d779b85d3e Remove extra whitespace added by a stupid tool.
XXX: more in src/sys/arch
2009-04-18 14:58:02 +00:00
elad d91dbb36b0 Don't set sin->sin_port and sin6->sin6_port to 0 before calling
ifa_ifwithaddr(), as we no longer do a byte compare on the entire struct.

Reviewed by and okay from dyoung@.
2009-04-14 21:25:20 +00:00
cegger e2cb85904d bcopy -> memcpy 2009-03-18 17:06:41 +00:00
cegger c363a9cb62 bzero -> memset 2009-03-18 16:00:08 +00:00
matt 34cedfb2bf Make the sysctl routines take out softnet_lock before dealing with
any data structures.

Change inet6ctlerrmap and zeroin6_addr to const.
2008-08-20 18:35:20 +00:00
matt b89c8b7b61 Free the socket only after disposing of the PCB. 2008-08-04 06:47:52 +00:00
ad 15e29e981b Merge the socket locking patch:
- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.
2008-04-24 11:38:36 +00:00
dyoung 4957795396 Use ip6_clearpktopts() to destroy the IPv6 PCB's in6p_outputopts,
so that there's no chance of either leaking memory, or leaving
dangling pointers to a route cache.
2008-03-20 20:32:00 +00:00
dyoung ff82b311dd No code ever sets struct ip6_pktopts member ip6po_m, so get rid of
it.
2008-03-19 08:10:18 +00:00
dyoung 19dd9ed4a7 Use rtcache_validate() instead of rtcache_getrt(). Shorten staircase
in in6_losing().
2008-01-14 04:16:45 +00:00
dyoung 1386ee4adf Good-bye, rtcache_check(). Call both rtcache_validate() and
rtcache_update(,1) instead of rtcache_check().
2008-01-12 02:58:58 +00:00
dyoung 45485bd0b7 Save some rtcache_getrt() calls. 2008-01-10 08:06:11 +00:00
dyoung 72fa642a86 Poison struct route->ro_rt uses in the kernel by changing the name
to _ro_rt.  Use rtcache_getrt() to access a route cache's struct
rtentry *.

Introduce struct ifnet->if_dl that always points at the interface
identifier/link-layer address.  Make code that treated the first
ifaddr on struct ifnet->if_addrlist as the interface address use
if_dl, instead.

Remove stale debugging code from net/route.c.  Move the rtflush()
code into rtcache_clear() and delete rtflush().  Delete rtalloc(),
because nothing uses it any more.

Make ND6_HINT an inline, lowercase subroutine, nd6_hint.

I've done my best to convert IP Filter, the ISO stack, and the
AppleTalk stack to rtcache_getrt().  They compile, but I have not
tested them.  I have given the changes to PF, GRE, IPv4 and IPv6
stacks a lot of exercise.
2007-12-20 19:53:29 +00:00
drochner e3e9b75351 Fix in6_pcbrtentry() for the case of IPv6-mapped IPv4 addresses:
don't assume that the cached route is a sockaddr_in6, and do the
right comparisions so that no out-of-bounds memory is accessed.

btw, the use of "#ifdef INET" throughout the source doesn't look clean
to me: There are 2 cases -- whether AF_INET is usable by userland
programs, and whether IPv4 is supported as on-wire protocol.
2007-11-21 21:18:25 +00:00
dyoung 5121052595 Use sockaddr_in6_init(). 2007-11-10 00:14:31 +00:00
dyoung 08e6f22226 Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

        Introduce rt_walktree() for walking the routing table and
        applying a function to each rtentry.  Replace most
        rn_walktree() calls with it.

        Use rt_getkey()/rt_setkey() to get/set a route's destination.
        Keep a pointer to the sockaddr key in the rtentry, so that
        rtentry users do not have to grovel in the radix_node for
        the key.

        Add a RTM_GET method to rtrequest.  Use that instead of
        radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

        Constify.  KNF.  Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
        et cetera.  Use NULL instead of 0 for null pointers.  Use
        __arraycount().  Reduce gratuitous parenthesization.

        Stop using variadic arguments for rip6_output(), it is
        unnecessary.

        Remove the unnecessary rtentry member rt_genmask and the
        code to maintain it, since nothing actually used it.

        Make rt_maskedcopy() easier to read by using meaningful variable
        names.

        Extract a subroutine intern_netmask() for looking up a netmask in
        the masks table.

        Start converting backslash-ridden IPv6 macros in
        sys/netinet6/in6_var.h into inline subroutines that one
        can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.
2007-07-19 20:48:52 +00:00
christos 72cfe7327b Ansify + add a few comments, from Karl Sjödahl 2007-05-23 17:14:59 +00:00
dyoung 72f0a6dfb0 Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principle benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing.  Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously.  Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs.  I have
  introduced routines for allocating, copying, and duplicating,
  and freeing sockaddrs:

        struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
        struct sockaddr *sockaddr_copy(struct sockaddr *dst,
                                       const struct sockaddr *src);
        struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
        void sockaddr_free(struct sockaddr *sa);

  sockaddr_alloc() returns either a sockaddr from the pool belonging
  to the specified family, or NULL if the pool is exhausted.  The
  returned sockaddr has the right size for that family; sa_family
  and sa_len fields are initialized to the family and sockaddr
  length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
  sockaddr_in).  sockaddr_free() puts the given sockaddr back into
  its family's pool.

  sockaddr_dup() and sockaddr_copy() work analogously to strdup()
  and strcpy(), respectively.  sockaddr_copy() KASSERTs that the
  family of the destination and source sockaddrs are alike.

  The 'flags' argumet for sockaddr_alloc() and sockaddr_dup() is
  passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
  family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
  etc.  They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more.  All protocol families
  use struct route.  I have changed the route cache, 'struct route',
  so that it does not contain storage space for a sockaddr.  Instead,
  struct route points to a sockaddr coming from the pool the sockaddr
  belongs to.  I added a new method to struct route, rtcache_setdst(),
  for setting the cache destination:

        int rtcache_setdst(struct route *, const struct sockaddr *);

  rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
  available to create the sockaddr storage.

  It is now possible for rtcache_getdst() to return NULL if, say,
  rtcache_setdst() failed.  I check the return value for NULL
  everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
  caches, dom_rtcache.  rtflushall(sa_family_t af) looks up the
  domain indicated by 'af', walks the domain's list of route caches
  and invalidates each one.
2007-05-02 20:40:22 +00:00
ad 59d979c5f1 Pass an ipl argument to pool_init/POOL_INIT to be used when initializing
the pool's lock.
2007-03-12 18:18:22 +00:00
christos 53524e44ef Kill caddr_t; there will be some MI fallout, but it will be fixed shortly. 2007-03-04 05:59:00 +00:00
dyoung 5493f188c7 KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
   in6_src.c, avoid casts by changing several route_in6 pointers
   to struct route pointers.  Remove unnecessary casts to caddr_t
   elsewhere.

Pave the way for eliminating address family-specific route caches:
   soon, struct route will not embed a sockaddr, but it will hold
   a reference to an external sockaddr, instead.  We will set the
   destination sockaddr using rtcache_setdst().  (I created a stub
   for it, but it isn't used anywhere, yet.)  rtcache_free() will
   free the sockaddr.  I have extracted from rtcache_free() a helper
   subroutine, rtcache_clear().  rtcache_clear() will "forget" a
   cached route, but it will not forget the destination by releasing
   the sockaddr.  I use rtcache_clear() instead of rtcache_free()
   in rtcache_update(), because rtcache_update() is not supposed
   to forget the destination.

Constify:

   1 Introduce const accessor for route->ro_dst, rtcache_getdst().

   2 Constify the 'dst' argument to ifnet->if_output().  This
     led me to constify a lot of code called by output routines.

   3 Constify the sockaddr argument to protosw->pr_ctlinput.  This
     led me to constify a lot of code called by ctlinput routines.

   4 Introduce const macros for converting from a generic sockaddr
     to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
     satocsin, et cetera.
2007-02-17 22:34:07 +00:00
dyoung befcb437f9 Change a couple of bzeros to memsets. 2007-01-26 19:01:26 +00:00
elad b2eb9a5389 Consistent usage of KAUTH_GENERIC_ISSUSER. 2007-01-04 19:07:03 +00:00
joerg eb04733c4e Introduce new helper functions to abstract the route caching.
rtcache_init and rtcache_init_noclone lookup ro_dst and store
the result in ro_rt, taking care of the reference counting and
calling the domain specific route cache.
rtcache_free checks if a route was cashed and frees the reference.
rtcache_copy copies ro_dst of the given struct route, checking that
enough space is available and incrementing the reference count of the
cached rtentry if necessary.
rtcache_check validates that the cached route is still up. If it isn't,
it tries to look it up again. Afterwards ro_rt is either a valid again
or NULL.
rtcache_copy is used internally.

Adjust to callers of rtalloc/rtflush in the tree to check the sanity of
ro_dst first (if necessary). If it doesn't fit the expectations, free
the cache, otherwise check if the cached route is still valid. After
that combination, a single check for ro_rt == NULL is enough to decide
whether a new lookup needs to be done with a different ro_dst.
Make the route checking in gre stricter by repeating the loop check
after revalidation.
Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly
changed here to first validate the route and check RTF_GATEWAY
afterwards. This is sementically equivalent though.
etherip doesn't need sc_route_expire similiar to the gif changes from
dyoung@ earlier.

Based on the earlier patch from dyoung@, reviewed and discussed with
him.
2006-12-15 21:18:52 +00:00
dyoung c308b1c661 Here are various changes designed to protect against bad IPv4
routing caused by stale route caches (struct route).  Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).

Stale IPv6 and ISO route caches will be treated by separate patches.

Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.

Here are the details:

Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.

Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.

Chain together all IPv4 route caches where ro_rt != NULL.  Provide
in_rtcache() for adding a route to the chain.  Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches.  In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain.  In in_rtflushall(), walk the chain and remove every route
cache.

In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.

In gif(4), discard the workaround for stale caches that involves
expiring them every so often.

Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).

Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.

Take care when moving a 'struct route' to rtflush() the source and
to rtcache() the destination.

In domain initializers, use .dom_xxx tags.

KNF here and there.
2006-12-09 05:33:04 +00:00
joerg 22f3b113a0 Remove now superflous {. 2006-12-08 17:20:05 +00:00
joerg c882b2cbc1 When a dynamic route is deleted in in_losing and in6_losing, rtrequest
is called, but the current reference via the PCB is not removed. This
is effectively a leaked reference. Call rtfree unconditional.
2006-12-08 16:06:22 +00:00
dyoung 3b46d8b708 Use the queue(3) macros instead of open-coding them. Shorten
staircases.  Remove unnecessary casts.  Where appropriate, s/8/NBBY/.
De-__P().  KNF.

No functional changes intended.
2006-12-02 18:59:17 +00:00
christos 168cd830d2 __unused removal on arguments; approved by core. 2006-11-16 01:32:37 +00:00
christos 4d595fd7b1 - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
2006-10-12 01:30:41 +00:00
tls 8cc016b4bc Protect calls to pool_put/pool_get that may occur in interrupt context
with spl used to protect other allocations and frees, or datastructure
element insertion and removal, in adjacent code.

It is almost unquestionably the case that some of the spl()/splx() calls
added here are superfluous, but it really seems wrong to see:

	s=splfoo();
	/* frob data structure */
	splx(s);
	pool_put(x);

and if we think we need to protect the first operation, then it is hard
to see why we should not think we need to protect the next.  "Better
safe than sorry".

It is also almost unquestionably the case that I missed some pool
gets/puts from interrupt context with my strategy for finding these
calls; use of PR_NOWAIT is a strong hint that a pool may be used from
interrupt context but many callers in the kernel pass a "can wait/can't
wait" flag down such that my searches might not have found them.  One
notable area that needs to be looked at is pf.

See also:

http://mail-index.netbsd.org/tech-kern/2006/07/19/0003.html
http://mail-index.netbsd.org/tech-kern/2006/07/19/0009.html
2006-10-05 17:35:19 +00:00
ad f474dceb13 Use the LWP cached credentials where sane. 2006-07-23 22:06:03 +00:00