Commit Graph

227 Commits

Author SHA1 Message Date
manu cddc307094 Fix build problem after recent NAT-T changes 2005-04-26 05:37:45 +00:00
yamt 4b935040d0 tcp_input: update a comment to match with the code. 2005-04-03 05:02:46 +00:00
yamt 8b0967ff45 protect tcpipqent with splvm. 2005-03-29 20:10:16 +00:00
yamt df05ca7085 simplify data receiver side sack processing.
- introduce t_segqlen, the number of segments in segq/timeq.
  the name is from freebsd.
- rather than maintaining a copy of sack blocks (rcv_sack_block[]),
  build it directly from the segment list when needed.
2005-03-16 00:39:56 +00:00
mycroft c9f058f65e Copyright maintenance. 2005-03-02 10:20:18 +00:00
jonathan 4ae1f36dc9 Commit TCP SACK patches from Kentaro A. Karahone's patch at:
http://www.sigusr1.org/~kurahone/tcp-sack-netbsd-02152005.diff.gz

Fixes in that patch for pre-existing TCP pcb initializations were already
committed to NetBSD-current, so are not included in this commit.

The SACK patch has been observed to correctly negotiate and respond,
to SACKs in wide-area traffic.

There are two indepenently-observed, as-yet-unresolved anomalies:
First, seeing unexplained delays between in fast retransmission
(potentially explainable by an 0.2sec RTT between adjacent
ethernet/wifi NICs); and second, peculiar and unepxlained TCP
retransmits observed over an ath0 card.

After discussion with several interested developers, I'm committing
this now, as-is, for more eyes to use and look over.  Current hypothesis
is that the anomalies above may in fact be due to link/level (hardware,
driver, HAL, firmware) abberations in the test setup, affecting  both
Kentaro's  wired-Ethernet NIC and in my two (different) WiFi NICs.
2005-02-28 16:20:59 +00:00
perry f07677dd81 nuke trailing whitespace 2005-02-26 22:45:09 +00:00
perry 870f206724 ANSIfy function declarations 2005-02-03 23:39:32 +00:00
perry 3494482345 de-__P -- will ANSIfy .c files later. 2005-02-02 21:41:55 +00:00
mycroft 47759e6333 Several changes based on comparison with NS:
1) dupseg_fix_=true from NS: do not count a segment with completely duplicate
data as a duplicate ack.  This can occur due to duplicate packets in the
network, or due to fast retransmit from the other side.

2) dupack_reset_=false from NS: do not reset the duplicate ack counter or exit
fast recovery if we happen to get data or a window update along with a
duplicate ack.

3) In the "very old ack" case that itojun added, send an ACK before dropping
the segment, to try to update the other side's send sequence number.

4) Check the ssthresh crossover point with >= rather than >.  Otherwise we
start to do "exponential" growth immediately following recovery, where we
should be doing "linear".  This is what NS does.
2005-01-28 00:18:22 +00:00
mycroft 746d109a3c There is no reason to adjust ts_recent_age for ts_timebase; it's strictly an
internal variable.
2005-01-27 17:14:04 +00:00
mycroft 470f2d0705 Do the other TCP_PAWS_IDLE check unsigned as well. It doesn't do us any harm,
and it could detect even older time stamps.  (Really, to be 100% correct, there
should be a timer that clears these out -- but it probably doesn't matter in
the real world.)
2005-01-27 17:10:07 +00:00
mycroft 42655f2a87 Also check whether an echoed RTT is very large -- this *could* cause the
smoothing function to overflow.  I use TCP_PAWS_IDLE (24 days) for this.
2005-01-27 16:56:06 +00:00
mycroft 7215a0b3f1 Introduce a new state variable, t_partialacks. It has 3 states:
* t_partialacks<0 means we are not in fast recovery.
* t_partialacks==0 means we are in fast recovery, but we have not received
  any partial acks yet.
* t_partialacks>0 means we are in fast recovery, and we have received
  partial acks.

This is used to implement 2 changes in RFC 3782:
* We keep the notion that we are in fast recovery separate from t_dupacks, so
  it is not reset due to out-of-order acks.  (This affects both the Reno and
  NewReno cases.)
* We only reset the retransmit timer on the first partial ack -- preventing us
  from possibly taking one RTO per segment once fast recovery is initiated.

As before, it is hard to measure any difference between Reno and NewReno in the
real-world cases that I've tested.
2005-01-27 03:39:36 +00:00
mycroft 5283ca74ad Fix two problems in our TCP stack:
1) If an echoed RFC 1323 time stamp appears to be later than the current time,
   ignore it and fall back to old-style RTT calculation.  This prevents ending
   up with a negative RTT and panicking later.

2) Fix NewReno.  This involves a few changes:

   a) Implement the send_high variable in RFC 2582.  Our implementation is
      subtly different; it is one *past* the last sequence number transmitted
      rather than being equal to it.  This simplifies some logic and makes
      the code smaller.  Additional logic was required to prevent sequence
      number wraparound problems; this is not mentioned in RFC 2582.

   b) Make sure we reset t_dupacks on new acks, but *not* on a partial ack.
      All of the new ack code is pushed out into tcp_newreno().  (Later this
      will probably be a pluggable function.)  Thus t_dupacks keeps track of
      whether we're in fast recovery all the time, with Reno or NewReno, which
      keeps some logic simpler.

   c) We do not need to update snd_recover when we're not in fast recovery.
      See tech-net for an explanation of this.

   d) In the gratuitous fast retransmit prevention case, do not send a packet.
      RFC 2582 specifically says that we should "do nothing".

   e) Do not inflate the congestion window on a partial ack.  (This is done by
      testing t_dupacks to see whether we're still in fast recovery.)

This brings the performance of NewReno back up to the same as Reno in a few
random test cases (e.g. transferring peer-to-peer over my wireless network).
I have not concocted a good test case for the behavior specific to NewReno.
2005-01-26 21:49:27 +00:00
yamt ffebedd625 factor out receive side tcp/udp checksum handling code so that they
can be used by eg. packet filters.

reviewed by Christos Zoulas on tech-net@.
(slightly tweaked since then to make tcp and udp similar.)
2004-12-21 05:51:31 +00:00
yamt 6e353db6e4 tcp_input: add missing loopback checksum omission code for ipv6. 2004-12-18 07:30:17 +00:00
thorpej 7994b6f95e Don't perform checksums on loopback interfaces. They can be reenabled with
the net.inet.*.do_loopback_cksum sysctl.

Approved by: groo
2004-12-15 04:25:19 +00:00
yamt 0ea22c32fa fix ipqent pool corruption problems. make tcp reass code use
its own pool of ipqent rather than sharing it with ip reass code.
PR/24782.
2004-09-15 09:21:22 +00:00
itojun 2aef0b1784 correct TCP-MD5 support. Jeff Rizzo 2004-06-26 03:29:15 +00:00
jonathan 349ad018c7 Remove now-unused variable. 2004-05-23 00:37:27 +00:00
itojun 4ebcfcf29a fix MD5 signature support to actually validate inbound signature, and
drop packet if fails.
2004-05-18 14:44:14 +00:00
jonathan 85b3ba5bf1 Redo net.inet.* sysctl subtree for fast-ipsec from scratch.
Attach FAST-IPSEC statistics with 64-bit counters to new sysctl MIB.
Rework netstat to show FAST_IPSEC statistics, via sysctl,  for
netstat -p ipsec.

New kernel files:
	sys/netipsec/Makefile		(new file; install *_var.h includes)
	sys/netipsec/ipsec_var.h	(new 64-bit mib counter struct)

Changed kernel files:
	sys/Makefile			(recurse into sys/netipsec/)
	sys/netinet/in.h		(fake IP_PROTO name for fast_ipsec
					sysctl subtree.)
	sys/netipsec/ipsec.h		(minimal userspace inclusion)
	sys/netipsec/ipsec_osdep.h	(minimal userspace inclusion)
	sys/netipsec/ipsec_netbsd.c	(redo sysctl subtree from scratch)
	sys/netipsec/key*.c		(fix broken net.key subtree)

	sys/netipsec/ah_var.h		(increase all counters to 64 bits)
	sys/netipsec/esp_var.h		(increase all counters to 64 bits)
	sys/netipsec/ipip_var.h		(increase all counters to 64 bits)
	sys/netipsec/ipcomp_var.h	(increase all counters to 64 bits)

	sys/netipsec/ipsec.c		(add #include netipsec/ipsec_var.h)
	sys/netipsec/ipsec_mbuf.c	(add #include netipsec/ipsec_var.h)
	sys/netipsec/ipsec_output.c	(add #include netipsec/ipsec_var.h)

	sys/netinet/raw_ip.c		(add #include netipsec/ipsec_var.h)
	sys/netinet/tcp_input.c		(add #include netipsec/ipsec_var.h)
	sys/netinet/udp_usrreq.c	(add #include netipsec/ipsec_var.h)

Changes to usr.bin/netstat to print the new fast-ipsec sysctl tree
for "netstat -s -p ipsec":

New file:
	usr.bin/netstat/fast_ipsec.c	(print fast-ipsec counters)

Changed files:
	usr.bin/netstat/Makefile	(add fast_ipsec.c)
	usr.bin/netstat/netstat.h	(declarations for fast_ipsec.c)
	usr.bin/netstat/main.c		(call KAME-vs-fast-ipsec dispatcher)
2004-05-07 00:55:14 +00:00
matt 5a0de7507d When a packet is received that overlaps the left side of the window,
check for RST *before* trimming data and adjust its sequence number.
2004-04-27 14:46:07 +00:00
itojun e0395ac8f0 make TCP MD5 signature work with KAME IPSEC (#define IPSEC).
support IPv6 if KAME IPSEC (RFC is not explicit about how we make data stream
for checksum with IPv6, but i'm pretty sure using normal pseudo-header is the
right thing).

XXX
current TCP MD5 signature code has giant flaw:
it does not validate signature on input (can't believe it! what is the point?)
2004-04-26 03:54:28 +00:00
matt 5413745100 Remove #else clause of __STDC__ 2004-04-26 01:31:56 +00:00
jonathan 887b782b0b Initial commit of a port of the FreeBSD implementation of RFC 2385
(MD5 signatures for TCP, as used with BGP).  Credit for original
FreeBSD code goes to Bruce M. Simpson, with FreeBSD sponsorship
credited to sentex.net.  Shortening of the setsockopt() name
attributed to Vincent Jardin.

This commit is a minimal, working version of the FreeBSD code, as
MFC'ed to FreeBSD-4. It has received minimal testing with a ttcp
modified to set the TCP-MD5 option; BMS's additions to tcpdump-current
(tcpdump -M) confirm that the MD5 signatures are correct.  Committed
as-is for further testing between a NetBSD BGP speaker (e.g., quagga)
and industry-standard BGP speakers (e.g., Cisco, Juniper).


NOTE: This version has two potential flaws. First, I do see any code
that verifies recieved TCP-MD5 signatures.  Second, the TCP-MD5
options are internally padded and assumed to be 32-bit aligned. A more
space-efficient scheme is to pack all TCP options densely (and
possibly unaligned) into the TCP header ; then do one final padding to
a 4-byte boundary.  Pre-existing comments note that accounting for
TCP-option space when we add SACK is yet to be done. For now, I'm
punting on that; we can solve it properly, in a way that will handle
SACK blocks, as a separate exercise.

In case a pullup to NetBSD-2 is requested, this adds sys/netipsec/xform_tcp.c
,and modifies:

sys/net/pfkeyv2.h,v 1.15
sys/netinet/files.netinet,v 1.5
sys/netinet/ip.h,v 1.25
sys/netinet/tcp.h,v 1.15
sys/netinet/tcp_input.c,v 1.200
sys/netinet/tcp_output.c,v 1.109
sys/netinet/tcp_subr.c,v 1.165
sys/netinet/tcp_usrreq.c,v 1.89
sys/netinet/tcp_var.h,v 1.109
sys/netipsec/files.netipsec,v 1.3
sys/netipsec/ipsec.c,v 1.11
sys/netipsec/ipsec.h,v 1.7
sys/netipsec/key.c,v 1.11
share/man/man4/tcp.4,v 1.16
lib/libipsec/pfkey.c,v 1.20
lib/libipsec/pfkey_dump.c,v 1.17
lib/libipsec/policy_token.l,v 1.8
sbin/setkey/parse.y,v 1.14
sbin/setkey/setkey.8,v 1.27
sbin/setkey/token.l,v 1.15

Note that the preceding two revisions to tcp.4 will be
required to cleanly apply this diff.
2004-04-25 22:25:03 +00:00
simonb b5d0e6bf06 Initialise (most) pools from a link set instead of explicit calls
to pool_init.  Untouched pools are ones that either in arch-specific
code, or aren't initialiased during initial system startup.

 Convert struct session, ucred and lockf to pools.
2004-04-25 16:42:40 +00:00
itojun 22bdfd729d fix how we send RST against ACK. markus@openbsd 2004-04-25 03:29:11 +00:00
itojun 8a0aba4304 indent for little bit better readability 2004-04-25 00:08:54 +00:00
itojun 3b87628cfb fix comment; we no longer move ip+tcp into the same mbuf 2004-04-24 23:59:13 +00:00
ragge febf637b17 Avoid performance problem in tcp_reass() when appending mbufs to a chain
by keeping a pointer to the last mbuf in the chain.
2004-04-22 15:05:33 +00:00
itojun 6a16706746 follow draft-ietf-tcpm-tcpsecure-00.txt 3.2 (B):
if SYN is coming and RCV.NXT == SEG.SEQ, then ACK with value - 1.
2004-04-20 19:49:15 +00:00
itojun f2e796b13f - respond to RST by ACK, as suggested in NISCC recommendation
- rate-limit ACKs against RSTs and SYNs
2004-04-20 16:52:12 +00:00
matt 35b9f3ec72 If a segment is received with RST set and the segment is completely to the
left of the receive window, ignore it.  Add some additional comments to
the code that deals with received segemnts that are completely to the right
of the receive window.  If an invalid SYN is received, force an ACK and
drop it; if the other side really sent the SYN; it'll respond with a reset.
2004-04-17 23:35:37 +00:00
ragge 0a7fe37708 Add back one line which was accidentially removed (by me) a while ago.
Spotted by Markus Friedl (markus at openbsd.org).
2004-04-14 18:07:52 +00:00
atatat 83b193a052 Make these compile without INET. tcp_input probably needs a lot more
work...
2004-03-29 04:59:02 +00:00
drochner 6a4fbf616c fix tcp/udp checksum test in the M_CSUM_NO_PSEUDOHDR case
(this can never have worked)
now I can use a "bge" gigabit interface with hw checksumming
ttcp-t: 2147483648 bytes in 18.31 real seconds = 114527.11 KB/sec +++
woow!
2004-03-10 18:50:45 +00:00
itojun 8ef33296ff KNF 2004-02-26 02:34:59 +00:00
itojun 5377ace199 some corrections from markus@openbsd;
- callout_ack() was called with wrong argument
- no need for xor with timestamp as we are using arc4random()
- minor typo/cleanup
2004-01-02 12:01:39 +00:00
jonathan b6e73d53fb Footwork for fast-ipsec and IPv6: when compiling sys/netinet/tcp_input.c
for both FAST_IPSEC and INET6, include <netipsec/ipsec6.h>.
2003-11-19 20:47:00 +00:00
ragge da20a11a23 Fix the bug in the tcp transmit prediction code.
During testing the prediction counters show a hit-rate on about 85% for
packets sent on a local LAN, and better than 99% for intercontinental
high-speed bulk traffic (!).
2003-10-24 10:25:40 +00:00
mycroft 5a8b331f54 Remove all the code to maintain ia_inpcbs. This information was only used to
close sockets on address changes, which was deemed to be a bad idea and was
summarily removed, so there is no point in wasting effort on maintaining it
any more.
2003-10-23 20:55:08 +00:00
itojun 644a4857fb cut-and-paste error. Valeriy E. Ushakov 2003-09-10 01:46:27 +00:00
itojun 99bc41d6fd if IPsec inbound policy mismatches, respond to SYN with RST (instead of
just dropping it), allow client to react quickly.
2003-09-10 00:58:29 +00:00
itojun 175c9afa3f clarify flowlabel handling 2003-09-06 03:12:51 +00:00
itojun 495906ca8e revamp inpcb/in6pcb so that they are more aligned with each other.
in6pcb lookup now uses hash(9).
2003-09-04 09:16:57 +00:00
itojun a3bad645a4 make sure so is properly initialized 2003-08-22 22:49:34 +00:00
itojun 11ede1ed88 remove ipsec_set/getsocket. now we explicitly pass socket * to ip{,6}_output. 2003-08-22 22:00:36 +00:00
itojun 82eb4ce914 change the additional arg to be passed to ip{,6}_output to struct socket *.
this fixes KAME policy lookup which was broken by the previous commit.
2003-08-22 21:53:01 +00:00