support IPv6 if KAME IPSEC (RFC is not explicit about how we make data stream
for checksum with IPv6, but i'm pretty sure using normal pseudo-header is the
right thing).
XXX
current TCP MD5 signature code has giant flaw:
it does not validate signature on input (can't believe it! what is the point?)
(MD5 signatures for TCP, as used with BGP). Credit for original
FreeBSD code goes to Bruce M. Simpson, with FreeBSD sponsorship
credited to sentex.net. Shortening of the setsockopt() name
attributed to Vincent Jardin.
This commit is a minimal, working version of the FreeBSD code, as
MFC'ed to FreeBSD-4. It has received minimal testing with a ttcp
modified to set the TCP-MD5 option; BMS's additions to tcpdump-current
(tcpdump -M) confirm that the MD5 signatures are correct. Committed
as-is for further testing between a NetBSD BGP speaker (e.g., quagga)
and industry-standard BGP speakers (e.g., Cisco, Juniper).
NOTE: This version has two potential flaws. First, I do see any code
that verifies recieved TCP-MD5 signatures. Second, the TCP-MD5
options are internally padded and assumed to be 32-bit aligned. A more
space-efficient scheme is to pack all TCP options densely (and
possibly unaligned) into the TCP header ; then do one final padding to
a 4-byte boundary. Pre-existing comments note that accounting for
TCP-option space when we add SACK is yet to be done. For now, I'm
punting on that; we can solve it properly, in a way that will handle
SACK blocks, as a separate exercise.
In case a pullup to NetBSD-2 is requested, this adds sys/netipsec/xform_tcp.c
,and modifies:
sys/net/pfkeyv2.h,v 1.15
sys/netinet/files.netinet,v 1.5
sys/netinet/ip.h,v 1.25
sys/netinet/tcp.h,v 1.15
sys/netinet/tcp_input.c,v 1.200
sys/netinet/tcp_output.c,v 1.109
sys/netinet/tcp_subr.c,v 1.165
sys/netinet/tcp_usrreq.c,v 1.89
sys/netinet/tcp_var.h,v 1.109
sys/netipsec/files.netipsec,v 1.3
sys/netipsec/ipsec.c,v 1.11
sys/netipsec/ipsec.h,v 1.7
sys/netipsec/key.c,v 1.11
share/man/man4/tcp.4,v 1.16
lib/libipsec/pfkey.c,v 1.20
lib/libipsec/pfkey_dump.c,v 1.17
lib/libipsec/policy_token.l,v 1.8
sbin/setkey/parse.y,v 1.14
sbin/setkey/setkey.8,v 1.27
sbin/setkey/token.l,v 1.15
Note that the preceding two revisions to tcp.4 will be
required to cleanly apply this diff.
timer, otherwise there is a tiny window where both timers are
active, and this is not correct according to the comments in the
code. I believe that this is the cause of the to_ticks <= 0 assertion
failure in callout_schedule() that I've been getting.
During testing the prediction counters show a hit-rate on about 85% for
packets sent on a local LAN, and better than 99% for intercontinental
high-speed bulk traffic (!).
argument to ip6_output() with a new explicit struct in6pcb* argument.
(The underlying socket can be obtained via in6pcb->inp6_socket.)
In preparation for fast-ipsec. Reviewed by itojun.
configured with ``options FAST_IPSEC''. Kernels with KAME IPsec or
with no IPsec should work as before.
All calls to ip_output() now always pass an additional compulsory
argument: the inpcb associated with the packet being sent,
or 0 if no inpcb is available.
Fast-ipsec tested with ICMP or UDP over ESP. TCP doesn't work, yet.
sent from. This change avoid a linear search through all mbufs when using
large TCP windows, and therefore permit high-speed connections on long
distances.
Tested on a 1 Gigabit connection between Luleå and San Francisco, a distance
of about 15000km. With TCP windows of just over 20 Mbytes it could keep up
with 950Mbit/s.
After discussions with Matt Thomas and Jason Thorpe.
Do a little mbuf rework while here. Change all uses of MGET*(*, M_WAIT, *)
to m_get*(M_WAIT, *). These are not performance critical and making them
call m_get saves considerable space. Add m_clget analogue of MCLGET and
make corresponding change for M_WAIT uses.
Modify netinet, gem, fxp, tulip, nfs to support MBUFTRACE.
Begin to change netstat to use sysctl.
we can always keep 2 packets on the wire, no matter what SO_SNDBUF is,
and therefore ACKs will never be delayed unless we run out of data to
transmit. The problem is quite easy to tickle when the MTU of the
outgoing interface is larger than the socket buffer size (e.g. loopback).
Fix from Charles Hannum.
optimization made last year. should solve PR 17867 and 10195.
IP_HDRINCL behavior of raw ip socket is kept unchanged. we may want to
provide IP_HDRINCL variant that does not swap endian.
* Remove the code that allocates a cluster if the packet would
fit in one; it totally defeats doing references to M_EXT mbufs
in the socket buffer. This drastically reduces the number of
data copies in the tcp_output() path for applications which use
large writes. Kudos to Matt Thomas for pointing me in the right
direction.
all open TCP connections in tcp_slowtimo() (which is called 2x
per second). It's fairly rare for TCP timers to actually fire,
so saving this list traversal is good, especially if you want
to scale to thousands of open connections.
Instead of incrementing t_idle and t_rtt in tcp_slowtimo(), we now
take a timstamp (via tcp_now) and use subtraction to compute the
delta when we actually need it (using unsigned arithmetic so that
tcp_now wrapping is handled correctly).
Based on similar changes in FreeBSD.
network interfaces. This works by pre-computing the pseudo-header
checksum and caching it, delaying the actual checksum to ip_output()
if the hardware cannot perform the sum for us. In-bound checksums
can either be fully-checked by hardware, or summed up for final
verification by software. This method was modeled after how this
is done in FreeBSD, although the code is significantly different in
most places.
We don't delay checksums for IPv6/TCP, but we do take advantage of the
cached pseudo-header checksum.
Note: hardware-assisted checksumming defaults to "off". It is
enabled with ifconfig(8). See the manual page for details.
Implement hardware-assisted checksumming on the DP83820 Gigabit Ethernet,
3c90xB/3c90xC 10/100 Ethernet, and Alteon Tigon/Tigon2 Gigabit Ethernet.