Commit Graph

146 Commits

Author SHA1 Message Date
jonathan 28b5f5dfab (fast-ipsec): Add hooks to pass IPv4 IPsec traffic into fast-ipsec, if
configured with ``options FAST_IPSEC''.  Kernels with KAME IPsec or
with no IPsec should work as before.

All calls to ip_output() now always pass an additional compulsory
argument: the inpcb associated with the packet being sent,
or 0 if no inpcb is available.

Fast-ipsec tested with ICMP or UDP over ESP. TCP doesn't work, yet.
2003-08-15 03:42:00 +00:00
agc aad01611e7 Move UCB-licensed code from 4-clause to 3-clause licence.
Patches provided by Joel Baker in PR 22364, verified by myself.
2003-08-07 16:26:28 +00:00
he 80ccb5520c As a temporary workaround, apply the fix from PR#20390, thereby
cooperating with the callout code in working around the race
condition caused by the TCP code's use of the callout facility.

Instead of unconditionally releasing memory in tcp_close() and
SYN_CACHE_PUT(), check whether any of the related callout handlers
are about to be invoked (but have not yet done callout_ack()), and
if so, just mark the associated data structure (tcpcb or syn cache
entry) as "dead", and test for this (and release storage) in the
callout handler functions.
2003-07-20 16:35:07 +00:00
ragge 9e2d68cb61 Make it possible to set TCP_INIT_WIN and TCP_INIT_WIN_LOCAL in the config
file as options.
2003-07-03 08:28:16 +00:00
fvdl d5aece61d6 Back out the lwp/ktrace changes. They contained a lot of colateral damage,
and need to be examined and discussed more.
2003-06-29 22:28:00 +00:00
ragge 679db94879 Add code to remember where in the send queue of mbufs the last packet was
sent from. This change avoid a linear search through all mbufs when using
large TCP windows, and therefore permit high-speed connections on long
distances.

Tested on a 1 Gigabit connection between Luleå and San Francisco, a distance
of about 15000km.  With TCP windows of just over 20 Mbytes it could keep up
with 950Mbit/s.

After discussions with Matt Thomas and Jason Thorpe.
2003-06-29 18:58:26 +00:00
martin d505b18964 Make sure to include opt_foo.h if a defflag option FOO is used. 2003-06-23 11:00:59 +00:00
thorpej cdf1b0026c Allow TCP connections to hosts on a local network to use a larger
slow start initial window.  Default this larger initial window to
4 packets, allowing it to be adjusted with net.inet.tcp.init_win_local.
2003-03-01 04:40:27 +00:00
matt 65e5548a17 Add MBUFTRACE kernel option.
Do a little mbuf rework while here.  Change all uses of MGET*(*, M_WAIT, *)
to m_get*(M_WAIT, *).  These are not performance critical and making them
call m_get saves considerable space.  Add m_clget analogue of MCLGET and
make corresponding change for M_WAIT uses.
Modify netinet, gem, fxp, tulip, nfs to support MBUFTRACE.
Begin to change netstat to use sysctl.
2003-02-26 06:31:08 +00:00
scw 5521093d4b Quell an uninitialised variable warning. 2002-11-24 10:52:47 +00:00
lukem 3b5f6123fa fix typo in previous: s/tip/top/ 2002-10-22 07:22:19 +00:00
simonb 8b9702b758 Micro-optimisation: don't check if the high bit is set and then mask it
off - just mask it off anyways.  Saves a branch 50% of the time.
2002-10-22 02:53:59 +00:00
itojun 167b0b8ebd minor KNF 2002-09-25 11:19:23 +00:00
itojun c00fa8dfd9 avoid swapping endian of ip_len and ip_off on mbuf, to meet with M_LEADINGSPACE
optimization made last year.  should solve PR 17867 and 10195.

IP_HDRINCL behavior of raw ip socket is kept unchanged.  we may want to
provide IP_HDRINCL variant that does not swap endian.
2002-08-14 00:23:27 +00:00
itojun 390ee363bd check AF_INET6 socketes when IPv4 "too big" messages arrive.
PR 17448
2002-07-01 20:51:25 +00:00
itojun f192b66b94 whitespace 2002-06-09 16:33:36 +00:00
itojun 5c1df51d53 attach nd_ifinfo structure into if_afdata.
split IPv6 link MTU (advertised by RA) from real link MTU.
sync with kame
2002-05-29 07:53:39 +00:00
itojun b9f810de55 use arc4random() on tcp iss generation 2002-05-28 10:17:27 +00:00
itojun 3e7ae517e0 path MTU discovery blackhole detection.
PR 12790 (sorry for not committing it for a long time)
2002-05-26 16:05:43 +00:00
matt c03e11f081 Eliminate commons. 2002-05-12 20:33:50 +00:00
matt e5555e5c26 Change struct ipqe to use TAILQ's instead of LIST's (primarily for TCP's
benefit currently).  Rework tcp_reass code to optimize the 4 most likely causes
of out-of-order packets: first OoO pkt, next OoO pkt in seq, OoO pkt is part
of new chuck of OoO packets, and the OoO pkt fills the first hole.  Add evcnts
to instrument tcp_reass (enabled by the options TCP_REASS_COUNTERS).  This is
part 1/2 of tcp_reass changes.
2002-05-07 02:59:38 +00:00
thorpej 9054daca3e * Instrument tcp_build_datapkt().
* Remove the code that allocates a cluster if the packet would
  fit in one; it totally defeats doing references to M_EXT mbufs
  in the socket buffer.  This drastically reduces the number of
  data copies in the tcp_output() path for applications which use
  large writes.  Kudos to Matt Thomas for pointing me in the right
  direction.
2002-04-27 01:47:58 +00:00
itojun 38f3d28842 have tcp6_drain 2002-03-15 09:25:41 +00:00
thorpej a180cee23b Pool deals fairly well with physical memory shortage, but it doesn't
deal with shortages of the VM maps where the backing pages are mapped
(usually kmem_map).  Try to deal with this:

* Group all information about the backend allocator for a pool in a
  separate structure.  The pool references this structure, rather than
  the individual fields.
* Change the pool_init() API accordingly, and adjust all callers.
* Link all pools using the same backend allocator on a list.
* The backend allocator is responsible for waiting for physical memory
  to become available, but will still fail if it cannot callocate KVA
  space for the pages.  If this happens, carefully drain all pools using
  the same backend allocator, so that some KVA space can be freed.
* Change pool_reclaim() to indicate if it actually succeeded in freeing
  some pages, and use that information to make draining easier and more
  efficient.
* Get rid of PR_URGENT.  There was only one use of it, and it could be
  dealt with by the caller.

From art@openbsd.org.
2002-03-08 20:48:27 +00:00
lukem ea1cd7eb08 add RCSIDs 2001-11-13 00:32:34 +00:00
matt da5a70805c Convert netinet to not use the internal <sys/queue.h> field names
but instead the access macros.  Use the FOREACH macros where appropriate.
2001-11-04 20:55:25 +00:00
matt 47577dca93 Change a few variable/tables to const since they are read-only. 2001-11-04 13:42:27 +00:00
thorpej 050e9de009 Use callouts for SYN cache timers, rather than traversing time queues
in tcp_slowtimo().
2001-09-11 21:03:20 +00:00
thorpej 6d0e813f6c Use callouts for TCP timers, rather than traversing the list of
all open TCP connections in tcp_slowtimo() (which is called 2x
per second).  It's fairly rare for TCP timers to actually fire,
so saving this list traversal is good, especially if you want
to scale to thousands of open connections.
2001-09-10 22:14:26 +00:00
thorpej 413e5cb878 Initialize TCP timer variables in a new function, tcp_timer_init(). 2001-09-10 20:36:43 +00:00
thorpej 3d9c42775e Add explicit initialization of TCP timer state. A noop right now. 2001-09-10 20:19:54 +00:00
thorpej 783db90019 Use a callout for the delayed ACK timer, and delete tcp_fasttimo().
Expose the delayed ACK timer as net.inet.tcp.delack_ticks.
2001-09-10 04:24:24 +00:00
itojun ddf920093e wrap IPv6 code by #ifdef INET6 2001-07-23 15:20:41 +00:00
itojun 489df53efe use in6_maxmtu, not in_maxmtu, for IPv6 mss computation 2001-07-23 15:17:58 +00:00
wiz 0a600be867 receive, not recieve 2001-06-12 15:17:10 +00:00
thorpej ad9d3794b0 Implement support for IP/TCP/UDP checksum offloading provided by
network interfaces.  This works by pre-computing the pseudo-header
checksum and caching it, delaying the actual checksum to ip_output()
if the hardware cannot perform the sum for us.  In-bound checksums
can either be fully-checked by hardware, or summed up for final
verification by software.  This method was modeled after how this
is done in FreeBSD, although the code is significantly different in
most places.

We don't delay checksums for IPv6/TCP, but we do take advantage of the
cached pseudo-header checksum.

Note: hardware-assisted checksumming defaults to "off".  It is
enabled with ifconfig(8).  See the manual page for details.

Implement hardware-assisted checksumming on the DP83820 Gigabit Ethernet,
3c90xB/3c90xC 10/100 Ethernet, and Alteon Tigon/Tigon2 Gigabit Ethernet.
2001-06-02 16:17:09 +00:00
itojun a7596d1912 call icmp6_mtudisc_update(foo, 0) even if ICMPv6 messages are very short.
let icmp6 layer decide whether we take PMTUD routes or not.
2001-05-24 07:22:27 +00:00
chs 5947ce8284 make this compile without rnd. 2001-03-21 03:35:11 +00:00
thorpej 7a3c8f81a5 Two changes, designed to make us even more resilient against TCP
ISS attacks (which we already fend off quite well).

1. First-cut implementation of RFC1948, Steve Bellovin's cryptographic
   hash method of generating TCP ISS values.  Note, this code is experimental
   and disabled by default (experimental enough that I don't export the
   variable via sysctl yet, either).  There are a couple of issues I'd
   like to discuss with Steve, so this code should only be used by people
   who really know what they're doing.

2. Per a recent thread on Bugtraq, it's possible to determine a system's
   uptime by snooping the RFC1323 TCP timestamp options sent by a host; in
   4.4BSD, timestamps are created by incrementing the tcp_now variable
   at 2 Hz; there's even a company out there that uses this to determine
   web server uptime.  According to Newsham's paper "The Problem With
   Random Increments", while NetBSD's TCP ISS generation method is much
   better than the "random increment" method used by FreeBSD and OpenBSD,
   it is still theoretically possible to mount an attack against NetBSD's
   method if the attacker knows how many times the tcp_iss_seq variable
   has been incremented.  By not leaking uptime information, we can make
   that much harder to determine.  So, we avoid the leak by giving each
   TCP connection a timebase of 0.
2001-03-20 20:07:51 +00:00
itojun bc5a6e2482 pull latest kame pcbnotify code. synchronizes ICMPv6 path mtu discovery
behavior with other protocols (i.e. validation, use of hiwat/lowat).
2001-02-11 06:49:49 +00:00
itojun 617b3fab7e - record IPsec packet history into m_aux structure.
- let ipfilter look at wire-format packet only (not the decapsulated ones),
  so that VPN setting can work with NAT/ipfilter settings.
sync with kame.

TODO: use header history for stricter inbound validation
2001-01-24 09:04:15 +00:00
itojun b2aef8afe2 fix call to in6_pcbnotify. s/EMSGSIZE/PRC_MSGSIZE/. 2000-12-21 00:45:17 +00:00
itojun 5eae50d991 update icmp6 too big validation. the change is necessary since pmtud is
mandatory for IPv6 (so we can't just validate by using connected pcb - we need
to allow traffic from unconnected pcb to do pmtud).
- if the traffic is validated by xx_ctlinput, allow up to "hiwat" pmtud
  route entries.
- if the traffic was not validated by xx_ctlinput, allow up to "lowat" pmtud
  route entries (there's upper limit, so bad guys cannot blow up our routing
  table).
sync with kame

XXX need to think again about default hiwat/lowat value.
XXX victim selection to help starvation case
2000-12-09 01:29:45 +00:00
itojun be2983be9d cleanup tcp_drop 2000-10-29 06:33:59 +00:00
itojun 7813d4bf6e process IPv4 tcp RST packet right. reported by thorpej. 2000-10-29 06:30:51 +00:00
itojun 9183e2dc4e remove #ifdef TCP6. it is not likely for us to bring in sys/netinet6/tcp6*.c
(separate TCP/IPv6 stack) into netbsd-current.
2000-10-19 20:22:59 +00:00
itojun 9288750911 memcpy -> bcopy, for sync with kame tree 2000-10-19 00:40:44 +00:00
itojun 23a03329ef verify ICMPv6 too big messages based on TCP pcbs, and/or IPsec SA.
TODO: udp6, and sendto consideration.  as pmtud is mandatory for IPv6,
it is rather important for us to support those cases.
TODO: more testing
TODO: kame sync
2000-10-18 21:14:12 +00:00
thorpej ea9b5a9106 Restructure the Path MTU Discovery code somewhat to avoid
entering rtentry's for hosts we're not actually communicating
with.

Do this by invoking the ctlinput for the protocol, which is
responsible for validating the ICMP message:
	* TCP -- Lookup the connection based on the address/port
	  pairs in the ICMP message.
	* AH/ESP -- Lookup the SA based on the SPI in the ICMP message.

If validation succeeds, ctlinput is responsible for calling
icmp_mtudisc().  icmp_mtudisc() then invokes callbacks registered
by protocols (such as TCP) which want to take some sort of special
action when a path's MTU changes.  For TCP, this is where we now
refresh cached routes and re-enter slow-start.

As a side-effect, this fixes the problem where TCP would not be
notified when a path's MTU changed if AH/ESP were being used.

XXX Note, this is only a fix for the IPv4 case.  For the IPv6
XXX case, we need to wait for the KAME folks.

Reviewed by sommerfeld@netbsd.org and itojun@netbsd.org.
2000-10-18 17:09:14 +00:00
itojun 06700c02aa move tcp syn cache parameters from in_proto.c to tcp_subr.c.
it makes more sense and helps INET6-only (INET-less) build.
2000-10-18 07:21:10 +00:00