Commit Graph

115 Commits

Author SHA1 Message Date
thorpej 7994b6f95e Don't perform checksums on loopback interfaces. They can be reenabled with
the net.inet.*.do_loopback_cksum sysctl.

Approved by: groo
2004-12-15 04:25:19 +00:00
jonathan c8c7a6dbab With FAST_IPSEC, include <netipsec/key.h>, as Itojun's recent changes
now require KEY_FREESAV() to be in scope.
2004-05-20 22:59:02 +00:00
itojun 4ebcfcf29a fix MD5 signature support to actually validate inbound signature, and
drop packet if fails.
2004-05-18 14:44:14 +00:00
chs bd3ff85ff7 work around an LP64 problem where we report an excessively large window
due to incorrect mixing of types.
2004-05-08 14:41:47 +00:00
itojun e0395ac8f0 make TCP MD5 signature work with KAME IPSEC (#define IPSEC).
support IPv6 if KAME IPSEC (RFC is not explicit about how we make data stream
for checksum with IPv6, but i'm pretty sure using normal pseudo-header is the
right thing).

XXX
current TCP MD5 signature code has giant flaw:
it does not validate signature on input (can't believe it! what is the point?)
2004-04-26 03:54:28 +00:00
jonathan 887b782b0b Initial commit of a port of the FreeBSD implementation of RFC 2385
(MD5 signatures for TCP, as used with BGP).  Credit for original
FreeBSD code goes to Bruce M. Simpson, with FreeBSD sponsorship
credited to sentex.net.  Shortening of the setsockopt() name
attributed to Vincent Jardin.

This commit is a minimal, working version of the FreeBSD code, as
MFC'ed to FreeBSD-4. It has received minimal testing with a ttcp
modified to set the TCP-MD5 option; BMS's additions to tcpdump-current
(tcpdump -M) confirm that the MD5 signatures are correct.  Committed
as-is for further testing between a NetBSD BGP speaker (e.g., quagga)
and industry-standard BGP speakers (e.g., Cisco, Juniper).


NOTE: This version has two potential flaws. First, I do see any code
that verifies recieved TCP-MD5 signatures.  Second, the TCP-MD5
options are internally padded and assumed to be 32-bit aligned. A more
space-efficient scheme is to pack all TCP options densely (and
possibly unaligned) into the TCP header ; then do one final padding to
a 4-byte boundary.  Pre-existing comments note that accounting for
TCP-option space when we add SACK is yet to be done. For now, I'm
punting on that; we can solve it properly, in a way that will handle
SACK blocks, as a separate exercise.

In case a pullup to NetBSD-2 is requested, this adds sys/netipsec/xform_tcp.c
,and modifies:

sys/net/pfkeyv2.h,v 1.15
sys/netinet/files.netinet,v 1.5
sys/netinet/ip.h,v 1.25
sys/netinet/tcp.h,v 1.15
sys/netinet/tcp_input.c,v 1.200
sys/netinet/tcp_output.c,v 1.109
sys/netinet/tcp_subr.c,v 1.165
sys/netinet/tcp_usrreq.c,v 1.89
sys/netinet/tcp_var.h,v 1.109
sys/netipsec/files.netipsec,v 1.3
sys/netipsec/ipsec.c,v 1.11
sys/netipsec/ipsec.h,v 1.7
sys/netipsec/key.c,v 1.11
share/man/man4/tcp.4,v 1.16
lib/libipsec/pfkey.c,v 1.20
lib/libipsec/pfkey_dump.c,v 1.17
lib/libipsec/policy_token.l,v 1.8
sbin/setkey/parse.y,v 1.14
sbin/setkey/setkey.8,v 1.27
sbin/setkey/token.l,v 1.15

Note that the preceding two revisions to tcp.4 will be
required to cleanly apply this diff.
2004-04-25 22:25:03 +00:00
christos dc9378460c Make sure we disarm the persist timer before we arm the rexmit
timer, otherwise there is a tiny window where both timers are
active, and this is not correct according to the comments in the
code. I believe that this is the cause of the to_ticks <= 0 assertion
failure in callout_schedule() that I've been getting.
2004-03-30 19:58:14 +00:00
thorpej 8387ab32c5 Use IPSEC_PCB_SKIP_IPSEC() to short-circuit calls to ipsec{4,6}_hdrsiz_tcp(). 2004-03-03 05:59:38 +00:00
itojun d334411bcd deal with IPv6 path MTU < 1280 (RFC2460 section 5 last paragraph).
check if there really is room for TCP data.
2004-02-04 05:36:03 +00:00
ragge 4a9b211e76 Remove the FAST_MBSEARCH ifdef, send packet prediction is now default. 2003-11-12 10:48:04 +00:00
ragge da20a11a23 Fix the bug in the tcp transmit prediction code.
During testing the prediction counters show a hit-rate on about 85% for
packets sent on a local LAN, and better than 99% for intercontinental
high-speed bulk traffic (!).
2003-10-24 10:25:40 +00:00
enami 935b3c7ad5 Make this file compile again when TCP_OUTPUT_COUNTERS defined. 2003-10-24 03:12:53 +00:00
thorpej e8a98ee63e Oops, FAST_MBSEARCH counters were swapped; fix it. Pointed out by yamt@. 2003-10-23 17:02:23 +00:00
thorpej 861856caa0 Add event counters that measure FAST_MBSEARCH. 2003-10-21 21:17:20 +00:00
itojun 11ede1ed88 remove ipsec_set/getsocket. now we explicitly pass socket * to ip{,6}_output. 2003-08-22 22:00:36 +00:00
itojun 82eb4ce914 change the additional arg to be passed to ip{,6}_output to struct socket *.
this fixes KAME policy lookup which was broken by the previous commit.
2003-08-22 21:53:01 +00:00
jonathan 902669955f Replace the set_socket() method of passing an extra struct socket*
argument to ip6_output() with a new explicit struct in6pcb* argument.
(The underlying socket can be obtained via in6pcb->inp6_socket.)

In preparation for fast-ipsec.  Reviewed by itojun.
2003-08-22 20:20:09 +00:00
jonathan 28b5f5dfab (fast-ipsec): Add hooks to pass IPv4 IPsec traffic into fast-ipsec, if
configured with ``options FAST_IPSEC''.  Kernels with KAME IPsec or
with no IPsec should work as before.

All calls to ip_output() now always pass an additional compulsory
argument: the inpcb associated with the packet being sent,
or 0 if no inpcb is available.

Fast-ipsec tested with ICMP or UDP over ESP. TCP doesn't work, yet.
2003-08-15 03:42:00 +00:00
agc aad01611e7 Move UCB-licensed code from 4-clause to 3-clause licence.
Patches provided by Joel Baker in PR 22364, verified by myself.
2003-08-07 16:26:28 +00:00
ragge cb6b5a36c4 Make the fast-search stuff an option. There are still reports on
problem with it.
2003-07-02 21:43:49 +00:00
ragge c6308a0598 Fix previous bug. Thanks to Enami for spotting the (obvious) error, and
to other people with much help with bug reports etc.
While fixing, change some of the code I added last time to make it
cleaner and simpler.
2003-07-02 19:33:20 +00:00
ragge c04e1a5756 Disable the code I checked in yesterday; reports that samba (!) are crashing
machines with it. Will do some more tests.
2003-06-30 14:51:06 +00:00
fvdl d5aece61d6 Back out the lwp/ktrace changes. They contained a lot of colateral damage,
and need to be examined and discussed more.
2003-06-29 22:28:00 +00:00
ragge 679db94879 Add code to remember where in the send queue of mbufs the last packet was
sent from. This change avoid a linear search through all mbufs when using
large TCP windows, and therefore permit high-speed connections on long
distances.

Tested on a 1 Gigabit connection between Luleå and San Francisco, a distance
of about 15000km.  With TCP windows of just over 20 Mbytes it could keep up
with 950Mbit/s.

After discussions with Matt Thomas and Jason Thorpe.
2003-06-29 18:58:26 +00:00
itojun 6ca34aa391 no need for ip_v recovery in output path too
(tcp_template includes ip_v setting)
2003-05-17 17:16:20 +00:00
thorpej cdf1b0026c Allow TCP connections to hosts on a local network to use a larger
slow start initial window.  Default this larger initial window to
4 packets, allowing it to be adjusted with net.inet.tcp.init_win_local.
2003-03-01 04:40:27 +00:00
matt 65e5548a17 Add MBUFTRACE kernel option.
Do a little mbuf rework while here.  Change all uses of MGET*(*, M_WAIT, *)
to m_get*(M_WAIT, *).  These are not performance critical and making them
call m_get saves considerable space.  Add m_clget analogue of MCLGET and
make corresponding change for M_WAIT uses.
Modify netinet, gem, fxp, tulip, nfs to support MBUFTRACE.
Begin to change netstat to use sysctl.
2003-02-26 06:31:08 +00:00
scw 5b169b8d2e Fix a genuine uninitialised variable warning. 2002-11-24 10:51:56 +00:00
itojun 61eed162b2 cleanup ipsec.h dependency. commented by perry, sync w/kame 2002-11-02 19:03:44 +00:00
mycroft 129af72834 In the txsegsize bounding code, it is not necessary to adjust for the options
length.
2002-09-13 18:26:55 +00:00
thorpej c23fa5a752 Never send more than half a socket buffer of data. This insures that
we can always keep 2 packets on the wire, no matter what SO_SNDBUF is,
and therefore ACKs will never be delayed unless we run out of data to
transmit.  The problem is quite easy to tickle when the MTU of the
outgoing interface is larger than the socket buffer size (e.g. loopback).

Fix from Charles Hannum.
2002-08-20 16:29:42 +00:00
itojun c00fa8dfd9 avoid swapping endian of ip_len and ip_off on mbuf, to meet with M_LEADINGSPACE
optimization made last year.  should solve PR 17867 and 10195.

IP_HDRINCL behavior of raw ip socket is kept unchanged.  we may want to
provide IP_HDRINCL variant that does not swap endian.
2002-08-14 00:23:27 +00:00
thorpej 8038dd2cbe Disable TCP Congestion Window Monitoring by default; there are
performance problems in the face of tinygrams.
2002-06-13 16:31:05 +00:00
itojun f192b66b94 whitespace 2002-06-09 16:33:36 +00:00
itojun 5c1df51d53 attach nd_ifinfo structure into if_afdata.
split IPv6 link MTU (advertised by RA) from real link MTU.
sync with kame
2002-05-29 07:53:39 +00:00
itojun 3e7ae517e0 path MTU discovery blackhole detection.
PR 12790 (sorry for not committing it for a long time)
2002-05-26 16:05:43 +00:00
thorpej 9054daca3e * Instrument tcp_build_datapkt().
* Remove the code that allocates a cluster if the packet would
  fit in one; it totally defeats doing references to M_EXT mbufs
  in the socket buffer.  This drastically reduces the number of
  data copies in the tcp_output() path for applications which use
  large writes.  Kudos to Matt Thomas for pointing me in the right
  direction.
2002-04-27 01:47:58 +00:00
thorpej 1caa35aa0f In tcp_segsize(), move a label so that option length is considered
when using the default TCP MSS as well.  From Matt Thomas.
2002-03-01 22:54:09 +00:00
itojun a709c83618 place NRL copyright notice itself, not a reference to it. 2002-01-24 02:12:29 +00:00
jmcneill 078a8c0cc3 Fix TCP segment size computation. From Rick Byersm, PR kern/14799. 2001-12-03 01:45:43 +00:00
lukem ea1cd7eb08 add RCSIDs 2001-11-13 00:32:34 +00:00
thorpej 6d0e813f6c Use callouts for TCP timers, rather than traversing the list of
all open TCP connections in tcp_slowtimo() (which is called 2x
per second).  It's fairly rare for TCP timers to actually fire,
so saving this list traversal is good, especially if you want
to scale to thousands of open connections.
2001-09-10 22:14:26 +00:00
thorpej 7446fd2bc8 Change the way receive idle time and round trip time are measured.
Instead of incrementing t_idle and t_rtt in tcp_slowtimo(), we now
take a timstamp (via tcp_now) and use subtraction to compute the
delta when we actually need it (using unsigned arithmetic so that
tcp_now wrapping is handled correctly).

Based on similar changes in FreeBSD.
2001-09-10 15:23:09 +00:00
thorpej 7a89a34393 Enable Congestion Window Monitoring by default. 2001-09-10 04:43:35 +00:00
thorpej 783db90019 Use a callout for the delayed ACK timer, and delete tcp_fasttimo().
Expose the delayed ACK timer as net.inet.tcp.delack_ticks.
2001-09-10 04:24:24 +00:00
thorpej 35df06a642 Carve off the code that builds a TCP data packet into its own
function, and inline it, except when profiling... so we can
profile it.
2001-07-31 02:25:22 +00:00
thorpej 938720eea4 Count the number of times we "self-quench" (ip_output() returns
ENOBUFS), and don't inline tcp_segsize() if profiling.
2001-07-31 00:57:45 +00:00
thorpej 52654926a4 Slight cosmetic change. 2001-07-26 21:47:04 +00:00
abs 03aaf3d8b4 Rename TCPDEBUG to TCP_DEBUG, defopt TCP_DEBUG and TCP_NDEBUG, and
make all usage of tcp_trace dependent on TCP_DEBUG - resulting in
a 31K saving on an INET enabled i386 kernel.
2001-07-08 16:18:56 +00:00
thorpej ad9d3794b0 Implement support for IP/TCP/UDP checksum offloading provided by
network interfaces.  This works by pre-computing the pseudo-header
checksum and caching it, delaying the actual checksum to ip_output()
if the hardware cannot perform the sum for us.  In-bound checksums
can either be fully-checked by hardware, or summed up for final
verification by software.  This method was modeled after how this
is done in FreeBSD, although the code is significantly different in
most places.

We don't delay checksums for IPv6/TCP, but we do take advantage of the
cached pseudo-header checksum.

Note: hardware-assisted checksumming defaults to "off".  It is
enabled with ifconfig(8).  See the manual page for details.

Implement hardware-assisted checksumming on the DP83820 Gigabit Ethernet,
3c90xB/3c90xC 10/100 Ethernet, and Alteon Tigon/Tigon2 Gigabit Ethernet.
2001-06-02 16:17:09 +00:00