Commit Graph

1226 Commits

Author SHA1 Message Date
dyoung
8f7c4dceea Don't refer to extern tcbtable here, it is unused. 2011-06-01 22:59:44 +00:00
spz
5f1fd2312c RA flood mitigation via a limit on accepted routes:
- introduce a limit for the routes accepted via IPv6 Router Advertisement:
  a common 2 interface client will have 6, the default limit is 100 and
  can be adjusted via sysctl
- report the current number of routes installed via RA via sysctl
- count discarded route additions. Note that one RA message is two routes.
  This is at present only across all interfaces even though per-interface
  would be more useful, since the per-interface structure complies with RFC2466
- bump kernel version due to the previous change
- adjust netstat to use the new value (with netstat -p icmp6)
2011-05-24 18:07:11 +00:00
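
The limit described above can be pictured as a counter with a ceiling plus a drop counter. The following is only a sketch under that assumption: the structure and function names are hypothetical and are not taken from the kernel sources; only the default of 100 comes from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bookkeeping; the real kernel variables are not shown in this log. */
struct ra_route_limiter {
    unsigned int max_routes;  /* tunable ceiling; the commit's default is 100 */
    unsigned int installed;   /* routes currently installed via RA */
    uint64_t     dropped;     /* route additions discarded because of the ceiling */
};

/* Returns true if one more RA-learned route may be installed. */
static bool
ra_route_admit(struct ra_route_limiter *l)
{
    assert(l != NULL);
    if (l->installed >= l->max_routes) {
        l->dropped++;         /* count the discarded addition */
        return false;
    }
    l->installed++;
    return true;
}

/* Called when an RA-learned route is removed again. */
static void
ra_route_release(struct ra_route_limiter *l)
{
    assert(l != NULL);
    if (l->installed > 0)
        l->installed--;
}
```

Since one RA message is two routes, a caller in this sketch would call ra_route_admit() once per route rather than once per message.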
dholland
ebbcc1e872 Add missing $NetBSD$ header. 2011-05-17 04:39:57 +00:00
dyoung
c1922724a7 Invalidate the vestigial PCB at the top of in6_pcblookup_connect() to
fix the bug where incoming TCPv6 connections were reset.
2011-05-04 01:45:48 +00:00
dyoung
c2e43be1c5 Reduces the resources demanded by TCP sessions in TIME_WAIT-state using
methods called Vestigial Time-Wait (VTW) and Maximum Segment Lifetime
Truncation (MSLT).

MSLT and VTW were contributed by Coyote Point Systems, Inc.

Even after a TCP session enters the TIME_WAIT state, its corresponding
socket and protocol control blocks (PCBs) stick around until the TCP
Maximum Segment Lifetime (MSL) expires.  On a host whose workload
necessarily creates and closes down many TCP sockets, the sockets & PCBs
for TCP sessions in TIME_WAIT state amount to many megabytes of dead
weight in RAM.

Maximum Segment Lifetime Truncation (MSLT) assigns each TCP session to
a class based on the nearness of the peer.  Corresponding to each class
is an MSL, and a session uses the MSL of its class.  The classes are
loopback (local host equals remote host), local (local host and remote
host are on the same link/subnet), and remote (local host and remote
host communicate via one or more gateways).  Classes corresponding to
nearer peers have lower MSLs by default: 2 seconds for loopback, 10
seconds for local, 60 seconds for remote.  Loopback and local sessions
expire more quickly when MSLT is used.
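
As an illustration only, the classification can be sketched for IPv4 addresses in host byte order. The names below are hypothetical, and the same-subnet test against a single netmask is an assumption of this sketch, not a description of how the kernel decides nearness. The MSL values are the defaults quoted above.

```c
#include <assert.h>
#include <stdint.h>

enum mslt_class { MSLT_LOOPBACK, MSLT_LOCAL, MSLT_REMOTE, MSLT_NCLASSES };

/* Default MSLs in seconds, as quoted in the commit message. */
static const int mslt_default_msl[] = {
    [MSLT_LOOPBACK] = 2,
    [MSLT_LOCAL]    = 10,
    [MSLT_REMOTE]   = 60,
};
static_assert(sizeof(mslt_default_msl) / sizeof(mslt_default_msl[0]) ==
    MSLT_NCLASSES, "one MSL per class");

/* Assumption: "same link/subnet" is approximated by a netmask comparison. */
static enum mslt_class
mslt_classify(uint32_t laddr, uint32_t faddr, uint32_t netmask)
{
    if (laddr == faddr)
        return MSLT_LOOPBACK;   /* local host equals remote host */
    if ((laddr & netmask) == (faddr & netmask))
        return MSLT_LOCAL;      /* same subnet */
    return MSLT_REMOTE;         /* reached via one or more gateways */
}

/* A session uses the MSL of its class. */
static int
mslt_msl_seconds(uint32_t laddr, uint32_t faddr, uint32_t netmask)
{
    return mslt_default_msl[mslt_classify(laddr, faddr, netmask)];
}
```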

Vestigial Time-Wait (VTW) replaces a TIME_WAIT session's PCB/socket
dead weight with a compact representation of the session, called a
"vestigial PCB".  VTW data structures are designed to be very fast and
memory-efficient: for fast insertion and lookup of vestigial PCBs,
the PCBs are stored in a hash table that is designed to minimize the
number of cacheline visits per lookup/insertion.  The memory both
for vestigial PCBs and for elements of the PCB hashtable comes from
fixed-size pools, and linked data structures exploit this to conserve
memory by representing references with a narrow index/offset from the
start of a pool instead of a pointer.  When space for new vestigial PCBs
runs out, VTW makes room by discarding old vestigial PCBs, oldest first.
VTW cooperates with MSLT.
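
The index-instead-of-pointer idea and the oldest-first discard can be sketched with a toy table. This is not the VTW code: every name here is hypothetical, the hash is deliberately trivial, and no attempt is made to reproduce the cacheline-aware layout. Slots are handed out in ring order, so the slot under the cursor always holds the oldest entry once the pool is full.

```c
#include <assert.h>
#include <stdint.h>

#define VP_POOL_SIZE 4            /* tiny, so eviction is easy to observe */
#define VP_NBUCKETS  4
#define VP_NIL       UINT16_MAX   /* plays the role of a null pointer */
static_assert(VP_POOL_SIZE < VP_NIL, "pool indices must fit in 16 bits");

struct vp_entry {
    uint32_t faddr, laddr;        /* foreign and local address */
    uint16_t fport, lport;        /* foreign and local port */
    uint16_t next;                /* pool index of the next chain entry */
    uint8_t  in_use;
};

struct vp_table {
    struct vp_entry pool[VP_POOL_SIZE];
    uint16_t bucket[VP_NBUCKETS]; /* chain heads, stored as pool indices */
    uint16_t cursor;              /* next slot to (re)use, in ring order */
};

static unsigned
vp_hash(uint32_t faddr, uint32_t laddr, uint16_t fport, uint16_t lport)
{
    return (faddr ^ laddr ^ fport ^ lport) % VP_NBUCKETS;
}

static void
vp_init(struct vp_table *t)
{
    for (unsigned i = 0; i < VP_NBUCKETS; i++)
        t->bucket[i] = VP_NIL;
    for (unsigned i = 0; i < VP_POOL_SIZE; i++)
        t->pool[i].in_use = 0;
    t->cursor = 0;
}

/* Remove pool[idx] from its hash chain. */
static void
vp_unlink(struct vp_table *t, uint16_t idx)
{
    struct vp_entry *e = &t->pool[idx];
    uint16_t *link = &t->bucket[vp_hash(e->faddr, e->laddr, e->fport, e->lport)];

    while (*link != VP_NIL && *link != idx)
        link = &t->pool[*link].next;
    if (*link == idx)
        *link = e->next;
    e->in_use = 0;
}

/* Insert a session, discarding the oldest entry if the pool is full. */
static uint16_t
vp_insert(struct vp_table *t, uint32_t faddr, uint32_t laddr,
    uint16_t fport, uint16_t lport)
{
    uint16_t idx = t->cursor;
    struct vp_entry *e = &t->pool[idx];
    unsigned h = vp_hash(faddr, laddr, fport, lport);

    if (e->in_use)
        vp_unlink(t, idx);        /* oldest entry makes room */
    e->faddr = faddr;
    e->laddr = laddr;
    e->fport = fport;
    e->lport = lport;
    e->next = t->bucket[h];       /* push onto the front of the chain */
    e->in_use = 1;
    t->bucket[h] = idx;
    t->cursor = (uint16_t)((idx + 1) % VP_POOL_SIZE);
    return idx;
}

/* Returns the pool index of a matching entry, or VP_NIL. */
static uint16_t
vp_lookup(const struct vp_table *t, uint32_t faddr, uint32_t laddr,
    uint16_t fport, uint16_t lport)
{
    uint16_t i = t->bucket[vp_hash(faddr, laddr, fport, lport)];

    for (; i != VP_NIL; i = t->pool[i].next) {
        const struct vp_entry *e = &t->pool[i];
        if (e->faddr == faddr && e->laddr == laddr &&
            e->fport == fport && e->lport == lport)
            return i;
    }
    return VP_NIL;
}
```

A 16-bit index costs two bytes per reference where a pointer on an LP64 machine costs eight, which is the kind of saving the fixed-size pools make possible.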

It may help to think of VTW as a "FIN cache" by analogy to the SYN
cache.

A 2.8-GHz Pentium 4 running a test workload that creates TIME_WAIT
sessions as fast as it can is approximately 17% idle when VTW is active
versus 0% idle when VTW is inactive.  It has 103 megabytes more free RAM
when VTW is active (approximately 64k vestigial PCBs are created) than
when it is inactive.
2011-05-03 18:28:44 +00:00
dyoung
ac162b774b *_drain() routines may be called with locks held, so instead of doing
any work in *_drain(), set a drain-needed flag.  Do the work in the
fasttimo handler.

Contributed by Coyote Point Systems, Inc.
2011-05-03 17:44:30 +00:00
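
The deferral pattern in the commit above can be sketched in userland C. The names are hypothetical and a C11 atomic flag stands in for whatever synchronization the kernel actually uses: the drain routine only records the request, and the periodic handler does the work.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_bool sk_drain_needed;   /* hypothetical drain-needed flag */
static int sk_cached_items;           /* stand-in for reclaimable state */

/* May be called with locks held: record the request and return at once. */
static void
sk_drain(void)
{
    atomic_store(&sk_drain_needed, true);
}

/* Periodic timer handler: a context where doing the work is safe. */
static void
sk_fasttimo(void)
{
    assert(sk_cached_items >= 0);
    if (atomic_exchange(&sk_drain_needed, false))
        sk_cached_items = 0;          /* the actual draining happens here */
}
```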
yamt
0cc7ac519a undefer csum in looutput.
looutput is used by various code (ether_output, mcast) to loopback packets.
2011-04-25 22:20:59 +00:00
yamt
61e76cd651 ip6_undefer_csum:
- don't forget ntohs
- KNF
2011-04-25 22:07:57 +00:00
yamt
e86be17a4f fix assertions 2011-04-25 22:04:32 +00:00
dholland
423044e331 Prune dead assignment, from Henning Petersen in PR 44890. 2011-04-21 06:58:31 +00:00
spz
5e98b9a2eb mitigation for CVE-2011-1547
this should really be solved by counting nested headers (like in the
inet6 case) instead
2011-04-01 08:25:02 +00:00
dyoung
060522dec8 Hide the radix-trie implementation of the forwarding table so that we
will have an easier time replacing it with something different, even if
it is a second radix-trie implementation.

sys/net/route.c and sys/net/rtsock.c no longer operate directly on
radix_nodes or radix_node_heads.

Hopefully this will reduce the temptation to implement multipath or
source-based routing using grotty hacks to the grotty old radix-trie
code, too. :-)
2011-03-31 19:40:51 +00:00
dyoung
2158ec89af Delete unnecessary casts to void *. No functional change intended. Same
assembly generated before and after this change.
2011-02-06 19:12:55 +00:00
mlelstv
f724a1d32d When deleting a fragment header use the simple copy operation only if it fits
completely into the mbuf.
2011-01-22 18:26:36 +00:00
matt
ebb2d31714 Add routines to calculate a checksum if the driver concludes that the
h/w can't do it.
2010-12-11 22:37:46 +00:00
oki
e7b1f54727 Fixed mbuf leak possibility. 2010-10-14 03:34:42 +00:00
drochner
bd39bacef7 avoid NULL dereference in error case 2010-09-12 16:04:57 +00:00
jakllsch
c77ac47598 Make the EtherIP in IPv6 input path work.
XXX: Figure out if we really need a separate protosw for IPv6.
2010-08-24 00:07:00 +00:00
joerg
0253fb8bf4 Remove stray { 2010-08-20 16:38:16 +00:00
joerg
0e26070ea9 Consider a mapped IPv4 address of 0.0.0.0 as unspecified. This allows
using mapped IPv4 address with connect without preceding bind.
2010-08-20 15:01:11 +00:00
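
A userland sketch of the check described above, using the POSIX macros from <netinet/in.h>. The function name is hypothetical and this is not the kernel change itself: it treats both :: and ::ffff:0.0.0.0 as unspecified.

```c
#include <assert.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <stddef.h>

/* True for :: and for the IPv4-mapped form of 0.0.0.0 (::ffff:0.0.0.0). */
static bool
sk_in6_is_unspecified(const struct in6_addr *a)
{
    assert(a != NULL);
    if (IN6_IS_ADDR_UNSPECIFIED(a))
        return true;
    /* Mapped prefix followed by four zero bytes of IPv4 address. */
    return IN6_IS_ADDR_V4MAPPED(a) &&
        a->s6_addr[12] == 0 && a->s6_addr[13] == 0 &&
        a->s6_addr[14] == 0 && a->s6_addr[15] == 0;
}
```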
jym
f3fb0a5620 Fix some code paths where pointers are dereferenced after checking that
they are NULL (oops?)

XXX pull-ups for NetBSD-4 and NetBSD-5.
2010-08-14 18:28:59 +00:00
jakllsch
8688cfe5da Make MRT6DEBUG compile on LP64 by using ptrdiff_t printf() format specifier. 2010-07-27 13:59:40 +00:00
dyoung
a055a1e00a Under some circumstances, udp6_output() would call ip6_clearpktopts()
with an uninitialized struct ip6_pktopts on the stack, opt.
ip6_clearpktopts(&opt, ...) could dereference dangling pointers,
leading to memory corruption or a crash.  Now, udp6_output() calls
ip6_clearpktopts(&opt, ...) only if opt was initialized. Thanks to
Clement LECIGNE for reporting this bug.

Fix a potential memory leak: it is udp6_output()'s responsibility
to free its mbuf arguments on error.  In the unlikely event that
sa6_embedscope() failed, udp6_output() would not free its mbuf
arguments.

I will ask for this to be pulled up to -4, -5, and -5-0.
2010-07-15 23:46:55 +00:00
dyoung
c0d48690b2 To help find the cause of kernel complaints such as "/netbsd:
nd6_storelladdr: sdl_alen == 0, dst=... if=wm1", add printfs for some
"impossible" conditions, and make the nd6_storelladdr() printf more
informative by printing the value of sdl_alen.
2010-07-15 19:15:30 +00:00
dyoung
597c5734cd Remove unnecessary casts from struct route * to struct route *. 2010-07-08 01:22:28 +00:00
dyoung
44c56ddcb9 Sprinkle const to prevent rip6_output() from re-assigning all but one of
its arguments.
2010-07-08 01:13:01 +00:00
dyoung
ff06028902 Sprinkle 'const' to prevent udp6_output() from reassigning all but one
of its arguments.
2010-07-08 00:12:35 +00:00
dyoung
cb401b7946 When choosing IPv6 source addresses, respect the ifaddr preference
level such as one might set with 'ifconfig xx0 inet6 <address>
preference <pref>'.  I've been running this for many months without
any problems.
2010-04-22 20:05:15 +00:00
oki
304292d4c5 ip6_sprintf: compress the zeros of representation of the IPv6 address.
see RFC4291 section 2.2 item 2.
2010-04-07 22:59:15 +00:00
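
The zero compression from RFC4291 can be observed from userland with the POSIX inet_pton() and inet_ntop() functions, which are used here in place of the kernel's ip6_sprintf(). The helper name is hypothetical; it parses an address, formats it again, and compares the result with an expected string.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/* True if `text` parses as an IPv6 address and prints back as `expected`. */
static bool
sk_formats_as(const char *text, const char *expected)
{
    struct in6_addr a;
    char buf[INET6_ADDRSTRLEN];

    assert(text != NULL && expected != NULL);
    if (inet_pton(AF_INET6, text, &a) != 1)
        return false;                     /* not a valid IPv6 address */
    if (inet_ntop(AF_INET6, &a, buf, sizeof(buf)) == NULL)
        return false;
    return strcmp(buf, expected) == 0;
}
```

For example, sk_formats_as("2001:db8:0:0:0:0:0:1", "2001:db8::1") is expected to hold, because the run of zero groups is replaced by "::".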
joerg
58e867556f Push the bpf_ops usage back into bpf.h. Push the common ifp->if_bpf
check into the inline functions as well as the fourth argument for
bpf_attach.
2010-04-05 07:19:28 +00:00
joerg
3d7916e198 Explicitly include opt_gateway.h when depending on GATEWAY. 2010-02-04 21:48:11 +00:00
pooka
b014350f7f Redefine bpf linkage through an always present op vector, i.e.
#if NBPFILTER is no longer required in the client.  This change
doesn't yet add support for loading bpf as a module, since drivers
can register before bpf is attached.  However, callers of bpf can
now be modularized.

Dynamically loadable bpf could probably be done fairly easily with
coordination from the stub driver and the real driver by registering
attachments in the stub before the real driver is loaded and doing
a handoff.  ... and I'm not going to ponder the depths of unload
here.

Tested with i386/MONOLITHIC, modified MONOLITHIC without bpf and rump.
2010-01-19 22:08:16 +00:00
elad
2faaca9342 Collapse identical switch cases. 2009-12-30 23:23:58 +00:00
joerg
1a57a79dcb Clear cksum flags before any further processing like ip_forward does.
Many drivers set the UDP/TCP v4 flags even for v6 traffic and if the
packet is encapsulated with gif, the IPv6 header would get corrupted by
ip_output. Patch suggested by bad@
2009-11-11 22:19:22 +00:00
dyoung
2421a1af93 Fix net.inet6.ip6.accept_rtadv and 'ndp -i <interface> accept_rtadv':
Add a flag ND6_IFF_OVERRIDE_RTADV that tells the kernel to override
ip6_accept_rtadv (net.inet6.ip6.accept_rtadv) on an interface.

Add a routine nd6_accepts_rtadv(ndi) that evaluates both the flags
on the interface represented by ndi and ip6_accept_rtadv, and
returns 'true' if the given interface should accept Router
Advertisements, and 'false' if not.

Now, ND6_IFF_ACCEPT_RTADV works as it was historically documented:
if it is set, then accept router advertisements iff ip6_accept_rtadv
!= 0.  Otherwise, do not accept router advertisements.

If ND6_IFF_OVERRIDE_RTADV is set, then the flag ND6_IFF_ACCEPT_RTADV
overrides ip6_accept_rtadv: if ND6_IFF_ACCEPT_RTADV is set, accept;
otherwise reject.  Ignore ip6_accept_rtadv.

If neither ND6_IFF_ACCEPT_RTADV nor ND6_IFF_OVERRIDE_RTADV is set,
reject Router Advertisements.
2009-11-06 20:41:22 +00:00
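
The decision table in the commit above can be written out as a small function. This is a sketch, not the kernel's nd6_accepts_rtadv(): the flag values are placeholders chosen for illustration, and the function takes the flags and the sysctl value as plain arguments instead of a struct nd_ifinfo.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit values; the real ND6_IFF_* values are not given in this log. */
#define SK_ACCEPT_RTADV   0x1u
#define SK_OVERRIDE_RTADV 0x2u
static_assert((SK_ACCEPT_RTADV & SK_OVERRIDE_RTADV) == 0,
    "flags must be distinct bits");

/* Should an interface with these flags accept Router Advertisements? */
static bool
sk_accepts_rtadv(uint32_t flags, int ip6_accept_rtadv)
{
    bool accept = (flags & SK_ACCEPT_RTADV) != 0;

    if (flags & SK_OVERRIDE_RTADV)
        return accept;                    /* ip6_accept_rtadv is ignored */
    return accept && ip6_accept_rtadv != 0;
}
```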
christos
981156292d fix the sun2 case for real. 2009-10-18 22:57:05 +00:00
christos
e1d5a1ca51 unbreak sun2. 2009-10-12 22:32:23 +00:00
christos
c0bc5ed834 backout the changes that establish a workqueue to synchronize the addresses
for agr and gre because they cause a race condition by calling ioctl() during
interface initialization. To make this work correctly we would need to
synchronize all interface init routines.
2009-09-19 13:11:02 +00:00
pooka
11281f01a0 Replace a large number of link set based sysctl node creations with
calls from subsystem constructors.  Benefits both future kernel
modules and rump.

no change to sysctl nodes on i386/MONOLITHIC & build tested i386/ALL
2009-09-16 15:23:04 +00:00
dyoung
c5d5f7697a Make ifconfig(8) set and display preference numbers for IPv6
addresses.  Make the kernel support SIOC[SG]IFADDRPREF for IPv6
interface addresses.

In in6ifa_ifpforlinklocal(), consult preference numbers before
making an otherwise arbitrary choice of in6_ifaddr.  Otherwise,
preference numbers are *not* consulted by the kernel, but that will
be rather easy for somebody with a little bit of free time to fix.

Please note that setting the preference number for a link-local
IPv6 address does not work right, yet, but that ought to be fixed
soon.

In support of the changes above,

1 Add a method to struct domain for "externalizing" a sockaddr, and
  provide an implementation for IPv6.  Expect more work in this area: it
  may be more proper to say that the IPv6 implementation "internalizes"
  a sockaddr.  Add sockaddr_externalize().

2 Add a subroutine, sofamily(), that returns a struct socket's address
  family or AF_UNSPEC.

3 Make a lot of IPv4-specific code generic, and move it from
  sys/netinet/ to sys/net/ for re-use by IPv6 parts of the kernel and
  ifconfig(8).
2009-09-11 22:06:29 +00:00
dyoung
21904877ab Nothing uses sockaddr_in6_cmp() right now, and the generic
sockaddr_cmp() is probably as fast or faster than calling
sockaddr_in6_cmp() through a function pointer, so let's stop
compiling it.
2009-09-11 20:10:06 +00:00
yamt
7dc10fea3d nd6_ifattach: fix a missing parens bug in rev.1.132. 2009-08-31 12:37:59 +00:00
tsutsui
1b5375c235 Fix error on kernels with options IPSEC without options IPSEC_ESP.
Found on building evbppc/conf/PMPPC.
2009-08-21 16:52:43 +00:00
seanb
edb4329e21 - Newer gcc was throwing a 'dereferencing type-punned pointer will
break strict-aliasing rules' warning against IN6_IS_ADDR_* macros
  at -O2 -Wall.
2009-08-19 18:52:48 +00:00
cegger
549d6a10af buildfix: if_indexlim is of type size_t 2009-08-13 09:04:03 +00:00
dyoung
94981d88f4 Postpone to a workqueue adding link-local and loopback IPv6 addresses
to an interface.  This keeps the kernel from entering ifp->if_ioctl
recursively, which can deadlock if if_ioctl takes locks.  This will
fix deadlocks & LOCKDEBUG errors in agr(4) (kern/39940) and in
gre(4).
2009-08-13 00:34:04 +00:00
cegger
302b7dbb45 Check if ndi is valid before use.
ok tonnerre@
2009-08-06 12:17:11 +00:00
dyoung
bb61b3608a Use malloc(...|M_ZERO) instead of malloc(...) followed by memset(,0,). 2009-08-04 22:04:23 +00:00
dyoung
59b8f11a8b Fix typo in comment, s/SIOCSIFADDR/SIOCINITIFADDR/. 2009-07-30 17:28:36 +00:00
tonnerre
5d2cc68d22 Instead of using the net.inet6.ip6.accept_rtadv sysctl for all devices,
make net.inet6.ip6.accept_rtadv the default for individual per-device
settings so people can use the ndp(8) utility to set per-device whether
or not to accept router advertisements.

rtadvd changes to follow.

(Debated on tech-net@ before but almost two weeks passed by without any
comment on the patch.)
2009-07-25 23:12:09 +00:00