Commit Graph

168 Commits

Author SHA1 Message Date
jonathan
4ae1f36dc9 Commit TCP SACK patches from Kentaro A. Karahone's patch at:
http://www.sigusr1.org/~kurahone/tcp-sack-netbsd-02152005.diff.gz

Fixes in that patch for pre-existing TCP pcb initializations were already
committed to NetBSD-current, so are not included in this commit.

The SACK patch has been observed to correctly negotiate and respond,
to SACKs in wide-area traffic.

There are two indepenently-observed, as-yet-unresolved anomalies:
First, seeing unexplained delays between in fast retransmission
(potentially explainable by an 0.2sec RTT between adjacent
ethernet/wifi NICs); and second, peculiar and unepxlained TCP
retransmits observed over an ath0 card.

After discussion with several interested developers, I'm committing
this now, as-is, for more eyes to use and look over.  Current hypothesis
is that the anomalies above may in fact be due to link/level (hardware,
driver, HAL, firmware) abberations in the test setup, affecting  both
Kentaro's  wired-Ethernet NIC and in my two (different) WiFi NICs.
2005-02-28 16:20:59 +00:00
perry
f07677dd81 nuke trailing whitespace 2005-02-26 22:45:09 +00:00
perry
870f206724 ANSIfy function declarations 2005-02-03 23:39:32 +00:00
thorpej
7994b6f95e Don't perform checksums on loopback interfaces. They can be reenabled with
the net.inet.*.do_loopback_cksum sysctl.

Approved by: groo
2004-12-15 04:25:19 +00:00
jonathan
c8c7a6dbab With FAST_IPSEC, include <netipsec/key.h>, as Itojun's recent changes
now require KEY_FREESAV() to be in scope.
2004-05-20 22:59:02 +00:00
itojun
4ebcfcf29a fix MD5 signature support to actually validate inbound signature, and
drop packet if fails.
2004-05-18 14:44:14 +00:00
chs
bd3ff85ff7 work around an LP64 problem where we report an excessively large window
due to incorrect mixing of types.
2004-05-08 14:41:47 +00:00
itojun
e0395ac8f0 make TCP MD5 signature work with KAME IPSEC (#define IPSEC).
support IPv6 if KAME IPSEC (RFC is not explicit about how we make data stream
for checksum with IPv6, but i'm pretty sure using normal pseudo-header is the
right thing).

XXX
current TCP MD5 signature code has giant flaw:
it does not validate signature on input (can't believe it! what is the point?)
2004-04-26 03:54:28 +00:00
jonathan
887b782b0b Initial commit of a port of the FreeBSD implementation of RFC 2385
(MD5 signatures for TCP, as used with BGP).  Credit for original
FreeBSD code goes to Bruce M. Simpson, with FreeBSD sponsorship
credited to sentex.net.  Shortening of the setsockopt() name
attributed to Vincent Jardin.

This commit is a minimal, working version of the FreeBSD code, as
MFC'ed to FreeBSD-4. It has received minimal testing with a ttcp
modified to set the TCP-MD5 option; BMS's additions to tcpdump-current
(tcpdump -M) confirm that the MD5 signatures are correct.  Committed
as-is for further testing between a NetBSD BGP speaker (e.g., quagga)
and industry-standard BGP speakers (e.g., Cisco, Juniper).


NOTE: This version has two potential flaws. First, I do see any code
that verifies recieved TCP-MD5 signatures.  Second, the TCP-MD5
options are internally padded and assumed to be 32-bit aligned. A more
space-efficient scheme is to pack all TCP options densely (and
possibly unaligned) into the TCP header ; then do one final padding to
a 4-byte boundary.  Pre-existing comments note that accounting for
TCP-option space when we add SACK is yet to be done. For now, I'm
punting on that; we can solve it properly, in a way that will handle
SACK blocks, as a separate exercise.

In case a pullup to NetBSD-2 is requested, this adds sys/netipsec/xform_tcp.c
,and modifies:

sys/net/pfkeyv2.h,v 1.15
sys/netinet/files.netinet,v 1.5
sys/netinet/ip.h,v 1.25
sys/netinet/tcp.h,v 1.15
sys/netinet/tcp_input.c,v 1.200
sys/netinet/tcp_output.c,v 1.109
sys/netinet/tcp_subr.c,v 1.165
sys/netinet/tcp_usrreq.c,v 1.89
sys/netinet/tcp_var.h,v 1.109
sys/netipsec/files.netipsec,v 1.3
sys/netipsec/ipsec.c,v 1.11
sys/netipsec/ipsec.h,v 1.7
sys/netipsec/key.c,v 1.11
share/man/man4/tcp.4,v 1.16
lib/libipsec/pfkey.c,v 1.20
lib/libipsec/pfkey_dump.c,v 1.17
lib/libipsec/policy_token.l,v 1.8
sbin/setkey/parse.y,v 1.14
sbin/setkey/setkey.8,v 1.27
sbin/setkey/token.l,v 1.15

Note that the preceding two revisions to tcp.4 will be
required to cleanly apply this diff.
2004-04-25 22:25:03 +00:00
christos
dc9378460c Make sure we disarm the persist timer before we arm the rexmit
timer, otherwise there is a tiny window where both timers are
active, and this is not correct according to the comments in the
code. I believe that this is the cause of the to_ticks <= 0 assertion
failure in callout_schedule() that I've been getting.
2004-03-30 19:58:14 +00:00
thorpej
8387ab32c5 Use IPSEC_PCB_SKIP_IPSEC() to short-circuit calls to ipsec{4,6}_hdrsiz_tcp(). 2004-03-03 05:59:38 +00:00
itojun
d334411bcd deal with IPv6 path MTU < 1280 (RFC2460 section 5 last paragraph).
check if there really is room for TCP data.
2004-02-04 05:36:03 +00:00
ragge
4a9b211e76 Remove the FAST_MBSEARCH ifdef, send packet prediction is now default. 2003-11-12 10:48:04 +00:00
ragge
da20a11a23 Fix the bug in the tcp transmit prediction code.
During testing the prediction counters show a hit-rate on about 85% for
packets sent on a local LAN, and better than 99% for intercontinental
high-speed bulk traffic (!).
2003-10-24 10:25:40 +00:00
enami
935b3c7ad5 Make this file compile again when TCP_OUTPUT_COUNTERS defined. 2003-10-24 03:12:53 +00:00
thorpej
e8a98ee63e Oops, FAST_MBSEARCH counters were swapped; fix it. Pointed out by yamt@. 2003-10-23 17:02:23 +00:00
thorpej
861856caa0 Add event counters that measure FAST_MBSEARCH. 2003-10-21 21:17:20 +00:00
itojun
11ede1ed88 remove ipsec_set/getsocket. now we explicitly pass socket * to ip{,6}_output. 2003-08-22 22:00:36 +00:00
itojun
82eb4ce914 change the additional arg to be passed to ip{,6}_output to struct socket *.
this fixes KAME policy lookup which was broken by the previous commit.
2003-08-22 21:53:01 +00:00
jonathan
902669955f Replace the set_socket() method of passing an extra struct socket*
argument to ip6_output() with a new explicit struct in6pcb* argument.
(The underlying socket can be obtained via in6pcb->inp6_socket.)

In preparation for fast-ipsec.  Reviewed by itojun.
2003-08-22 20:20:09 +00:00
jonathan
28b5f5dfab (fast-ipsec): Add hooks to pass IPv4 IPsec traffic into fast-ipsec, if
configured with ``options FAST_IPSEC''.  Kernels with KAME IPsec or
with no IPsec should work as before.

All calls to ip_output() now always pass an additional compulsory
argument: the inpcb associated with the packet being sent,
or 0 if no inpcb is available.

Fast-ipsec tested with ICMP or UDP over ESP. TCP doesn't work, yet.
2003-08-15 03:42:00 +00:00
agc
aad01611e7 Move UCB-licensed code from 4-clause to 3-clause licence.
Patches provided by Joel Baker in PR 22364, verified by myself.
2003-08-07 16:26:28 +00:00
ragge
cb6b5a36c4 Make the fast-search stuff an option. There are still reports on
problem with it.
2003-07-02 21:43:49 +00:00
ragge
c6308a0598 Fix previous bug. Thanks to Enami for spotting the (obvious) error, and
to other people with much help with bug reports etc.
While fixing, change some of the code I added last time to make it
cleaner and simpler.
2003-07-02 19:33:20 +00:00
ragge
c04e1a5756 Disable the code I checked in yesterday; reports that samba (!) are crashing
machines with it. Will do some more tests.
2003-06-30 14:51:06 +00:00
fvdl
d5aece61d6 Back out the lwp/ktrace changes. They contained a lot of colateral damage,
and need to be examined and discussed more.
2003-06-29 22:28:00 +00:00
ragge
679db94879 Add code to remember where in the send queue of mbufs the last packet was
sent from. This change avoid a linear search through all mbufs when using
large TCP windows, and therefore permit high-speed connections on long
distances.

Tested on a 1 Gigabit connection between Luleå and San Francisco, a distance
of about 15000km.  With TCP windows of just over 20 Mbytes it could keep up
with 950Mbit/s.

After discussions with Matt Thomas and Jason Thorpe.
2003-06-29 18:58:26 +00:00
itojun
6ca34aa391 no need for ip_v recovery in output path too
(tcp_template includes ip_v setting)
2003-05-17 17:16:20 +00:00
thorpej
cdf1b0026c Allow TCP connections to hosts on a local network to use a larger
slow start initial window.  Default this larger initial window to
4 packets, allowing it to be adjusted with net.inet.tcp.init_win_local.
2003-03-01 04:40:27 +00:00
matt
65e5548a17 Add MBUFTRACE kernel option.
Do a little mbuf rework while here.  Change all uses of MGET*(*, M_WAIT, *)
to m_get*(M_WAIT, *).  These are not performance critical and making them
call m_get saves considerable space.  Add m_clget analogue of MCLGET and
make corresponding change for M_WAIT uses.
Modify netinet, gem, fxp, tulip, nfs to support MBUFTRACE.
Begin to change netstat to use sysctl.
2003-02-26 06:31:08 +00:00
scw
5b169b8d2e Fix a genuine uninitialised variable warning. 2002-11-24 10:51:56 +00:00
itojun
61eed162b2 cleanup ipsec.h dependency. commented by perry, sync w/kame 2002-11-02 19:03:44 +00:00
mycroft
129af72834 In the txsegsize bounding code, it is not necessary to adjust for the options
length.
2002-09-13 18:26:55 +00:00
thorpej
c23fa5a752 Never send more than half a socket buffer of data. This insures that
we can always keep 2 packets on the wire, no matter what SO_SNDBUF is,
and therefore ACKs will never be delayed unless we run out of data to
transmit.  The problem is quite easy to tickle when the MTU of the
outgoing interface is larger than the socket buffer size (e.g. loopback).

Fix from Charles Hannum.
2002-08-20 16:29:42 +00:00
itojun
c00fa8dfd9 avoid swapping endian of ip_len and ip_off on mbuf, to meet with M_LEADINGSPACE
optimization made last year.  should solve PR 17867 and 10195.

IP_HDRINCL behavior of raw ip socket is kept unchanged.  we may want to
provide IP_HDRINCL variant that does not swap endian.
2002-08-14 00:23:27 +00:00
thorpej
8038dd2cbe Disable TCP Congestion Window Monitoring by default; there are
performance problems in the face of tinygrams.
2002-06-13 16:31:05 +00:00
itojun
f192b66b94 whitespace 2002-06-09 16:33:36 +00:00
itojun
5c1df51d53 attach nd_ifinfo structure into if_afdata.
split IPv6 link MTU (advertised by RA) from real link MTU.
sync with kame
2002-05-29 07:53:39 +00:00
itojun
3e7ae517e0 path MTU discovery blackhole detection.
PR 12790 (sorry for not committing it for a long time)
2002-05-26 16:05:43 +00:00
thorpej
9054daca3e * Instrument tcp_build_datapkt().
* Remove the code that allocates a cluster if the packet would
  fit in one; it totally defeats doing references to M_EXT mbufs
  in the socket buffer.  This drastically reduces the number of
  data copies in the tcp_output() path for applications which use
  large writes.  Kudos to Matt Thomas for pointing me in the right
  direction.
2002-04-27 01:47:58 +00:00
thorpej
1caa35aa0f In tcp_segsize(), move a label so that option length is considered
when using the default TCP MSS as well.  From Matt Thomas.
2002-03-01 22:54:09 +00:00
itojun
a709c83618 place NRL copyright notice itself, not a reference to it. 2002-01-24 02:12:29 +00:00
jmcneill
078a8c0cc3 Fix TCP segment size computation. From Rick Byersm, PR kern/14799. 2001-12-03 01:45:43 +00:00
lukem
ea1cd7eb08 add RCSIDs 2001-11-13 00:32:34 +00:00
thorpej
6d0e813f6c Use callouts for TCP timers, rather than traversing the list of
all open TCP connections in tcp_slowtimo() (which is called 2x
per second).  It's fairly rare for TCP timers to actually fire,
so saving this list traversal is good, especially if you want
to scale to thousands of open connections.
2001-09-10 22:14:26 +00:00
thorpej
7446fd2bc8 Change the way receive idle time and round trip time are measured.
Instead of incrementing t_idle and t_rtt in tcp_slowtimo(), we now
take a timstamp (via tcp_now) and use subtraction to compute the
delta when we actually need it (using unsigned arithmetic so that
tcp_now wrapping is handled correctly).

Based on similar changes in FreeBSD.
2001-09-10 15:23:09 +00:00
thorpej
7a89a34393 Enable Congestion Window Monitoring by default. 2001-09-10 04:43:35 +00:00
thorpej
783db90019 Use a callout for the delayed ACK timer, and delete tcp_fasttimo().
Expose the delayed ACK timer as net.inet.tcp.delack_ticks.
2001-09-10 04:24:24 +00:00
thorpej
35df06a642 Carve off the code that builds a TCP data packet into its own
function, and inline it, except when profiling... so we can
profile it.
2001-07-31 02:25:22 +00:00
thorpej
938720eea4 Count the number of times we "self-quench" (ip_output() returns
ENOBUFS), and don't inline tcp_segsize() if profiling.
2001-07-31 00:57:45 +00:00
thorpej
52654926a4 Slight cosmetic change. 2001-07-26 21:47:04 +00:00
abs
03aaf3d8b4 Rename TCPDEBUG to TCP_DEBUG, defopt TCP_DEBUG and TCP_NDEBUG, and
make all usage of tcp_trace dependent on TCP_DEBUG - resulting in
a 31K saving on an INET enabled i386 kernel.
2001-07-08 16:18:56 +00:00
thorpej
ad9d3794b0 Implement support for IP/TCP/UDP checksum offloading provided by
network interfaces.  This works by pre-computing the pseudo-header
checksum and caching it, delaying the actual checksum to ip_output()
if the hardware cannot perform the sum for us.  In-bound checksums
can either be fully-checked by hardware, or summed up for final
verification by software.  This method was modeled after how this
is done in FreeBSD, although the code is significantly different in
most places.

We don't delay checksums for IPv6/TCP, but we do take advantage of the
cached pseudo-header checksum.

Note: hardware-assisted checksumming defaults to "off".  It is
enabled with ifconfig(8).  See the manual page for details.

Implement hardware-assisted checksumming on the DP83820 Gigabit Ethernet,
3c90xB/3c90xC 10/100 Ethernet, and Alteon Tigon/Tigon2 Gigabit Ethernet.
2001-06-02 16:17:09 +00:00
itojun
6e45c58f53 check ip_mtudisc only for TCP over IPv4.
PMTUD is mandatory for TCP over IPv6 (if packets > 1280).
2001-04-03 06:14:31 +00:00
thorpej
7a3c8f81a5 Two changes, designed to make us even more resilient against TCP
ISS attacks (which we already fend off quite well).

1. First-cut implementation of RFC1948, Steve Bellovin's cryptographic
   hash method of generating TCP ISS values.  Note, this code is experimental
   and disabled by default (experimental enough that I don't export the
   variable via sysctl yet, either).  There are a couple of issues I'd
   like to discuss with Steve, so this code should only be used by people
   who really know what they're doing.

2. Per a recent thread on Bugtraq, it's possible to determine a system's
   uptime by snooping the RFC1323 TCP timestamp options sent by a host; in
   4.4BSD, timestamps are created by incrementing the tcp_now variable
   at 2 Hz; there's even a company out there that uses this to determine
   web server uptime.  According to Newsham's paper "The Problem With
   Random Increments", while NetBSD's TCP ISS generation method is much
   better than the "random increment" method used by FreeBSD and OpenBSD,
   it is still theoretically possible to mount an attack against NetBSD's
   method if the attacker knows how many times the tcp_iss_seq variable
   has been incremented.  By not leaking uptime information, we can make
   that much harder to determine.  So, we avoid the leak by giving each
   TCP connection a timebase of 0.
2001-03-20 20:07:51 +00:00
itojun
617b3fab7e - record IPsec packet history into m_aux structure.
- let ipfilter look at wire-format packet only (not the decapsulated ones),
  so that VPN setting can work with NAT/ipfilter settings.
sync with kame.

TODO: use header history for stricter inbound validation
2001-01-24 09:04:15 +00:00
itojun
ef8a34f5c3 fix IPv4 TTL selection with AF_INET6 API. sync with kame. From: jdc 2000-11-06 00:50:12 +00:00
itojun
9183e2dc4e remove #ifdef TCP6. it is not likely for us to bring in sys/netinet6/tcp6*.c
(separate TCP/IPv6 stack) into netbsd-current.
2000-10-19 20:22:59 +00:00
itojun
a7e15e4935 be more friendly with INET-less build.
XXX we need to do more to do a working INET-less build
2000-10-17 03:06:42 +00:00
thorpej
d839a91f5f Add an IP_MTUDISC flag to the flags that can be passed to
ip_output().  This flag, if set, causes ip_output() to set
DF in the IP header if the MTU in the route is not locked.

This allows a bunch of redundant code, which I was never
really all that happy about adding in the first place, to
be eliminated.

Inspired by a similar change made by provos@openbsd.org when
he integrated NetBSD's Path MTU Discovery code into OpenBSD.
2000-10-17 02:57:01 +00:00
itojun
7abf4641c6 forgot to call tcp6_quench(). sync with kame. 2000-07-28 02:39:45 +00:00
itojun
23f6a4f4e8 remove old mbuf assumption (ip header and tcp header are on the same mbuf).
this is for m_pulldown use. (sync with kame)
2000-06-30 16:44:33 +00:00
augustss
8529438fe6 Remove register declarations. 2000-03-30 12:51:13 +00:00
itojun
04ac848d6f introduce m->m_pkthdr.aux to hold random data which needs to be passed
between protocol handlers.

ipsec socket pointers, ipsec decryption/auth information, tunnel
decapsulation information are in my mind - there can be several other usage.
at this moment, we use this for ipsec socket pointer passing.  this will
avoid reuse of m->m_pkthdr.rcvif in ipsec code.

due to the change, MHLEN will be decreased by sizeof(void *) - for example,
for i386, MHLEN was 100 bytes, but is now 96 bytes.
we may want to increase MSIZE from 128 to 256 for some of our architectures.

take caution if you use it for keeping some data item for long period
of time - use extra caution on M_PREPEND() or m_adj(), as they may result
in loss of m->m_pkthdr.aux pointer (and mbuf leak).

this will bump kernel version.

(as discussed in tech-net, tested in kame tree)
2000-03-01 12:49:27 +00:00
itojun
4f53db2499 optimize mbuf allocation for ip/tcp/tcpopt part. 2000-02-09 00:50:40 +00:00
itojun
ea861f0183 sync IPv6 part with latest KAME tree. IPsec part is left unmodified
due to massive changes in KAME side.
- IPv6 output goes through nd6_output
- faith can capture IPv4 packets as well - you can run IPv4-to-IPv6 translator
  using heavily modified DNS servers
- per-interface statistics (required for IPv6 MIB)
- interface autoconfig is revisited
- udp input handling has a big change for mapped address support.
- introduce in4_cksum() for non-overwriting checksumming
- introduce m_pulldown()
- neighbor discovery cleanups/improvements
- netinet/in.h strictly conforms to RFC2553 (no extra defs visible to userland)
- IFA_STATS is fixed a bit (not tested)
- and more more more.

TODO:
- cleanup os-independency #ifdef
- avoid rcvif dual use (for IPsec) to help ifdetach

(sorry for jumbo commit, I can't separate this any more...)
1999-12-13 15:17:17 +00:00
itojun
9474edfcd8 cleanup and correct TCP MSS consideration with IPsec headers.
MSS advertisement must always be:
	max(if mtu) - ip hdr siz - tcp hdr siz
We violated this in the previous code so it was fixed.

tcp_mss_to_advertise() now takes af (af on wire) as its argument,
to compute right ip hdr siz.

tcp_segsize() will take care of IPsec header size.
One thing I'm not really sure is how to handle IPsec header size in
*rxsegsizep (inbound segment size estimation).
The current code subtracts possible *outbound* IPsec size from *rxsegsizep,
hoping that the peer is using the same IPsec policy as me.
It may not be applicable, could TCP gulu please comment...
1999-09-23 02:21:30 +00:00
thorpej
f9a7668b3f defopt IPSEC and IPSEC_ESP (both into opt_ipsec.h). 1999-07-09 22:57:15 +00:00
fvdl
e3fa5cc725 Fix for -Wunitialized warnings broke compiles without INET6, refix. 1999-07-02 21:02:05 +00:00
itojun
4b961b81e3 avoid "variable not initialized" warnings on some of the platforms. 1999-07-02 12:45:32 +00:00
itojun
118d2b1d4f IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
  data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
  package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
  file to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
1999-07-01 08:12:45 +00:00
thorpej
a43786143f Fix a problem pointed out by Charles Hannum; DF wasn't being set in
SYN,ACK packets during Path MTU Discovery.  Fix tcp_respond() to do the
appropriate route lookup and set DF as appropriate.

Also, fixup similar code in tcp_output() to relookup the route if it
is down.
1999-01-20 03:39:54 +00:00
thorpej
93454aafc6 Delay sending if SS_MORETOCOME is set in so_state. This avoids the case
where the user issued a write with a length greater than MLEN but less
than MINCLSIZE, thus causing two mbufs to be used.  The loop in sosend()
would then call PRU_SEND twice, causing TCP to transmit 2 packets when
it could have transmitted one.

Suggested by Justin Walker <justin@apple.com> on the freebsd-net
mailing list.
1998-12-16 00:33:14 +00:00
matt
8e8f38e0f2 Add a sysctl for newreno (default to off). 1998-10-06 00:20:44 +00:00
matt
25054b5cf7 Adapt the NEWRENO changes from the UCSB diffs of BSDI 3.0's TCP
to NetBSD.  Ignore the SACK & FACK stuff for now.
1998-10-04 21:33:52 +00:00
mycroft
cca4e566a9 Implement a better fix for the `gratuitous FIN' problem, as
mentioned on tcp-impl but with a bit more commentary.
1998-07-21 10:46:00 +00:00
thorpej
fa20f24cd9 Add a comment wrt. a current issue w/ CWM. 1998-07-17 23:00:02 +00:00
thorpej
830879a809 Comment where the Restart Window is computed, and in the non-CWM case,
make sure it never _increases_ cwnd.
1998-07-17 22:52:01 +00:00
sommerfe
065cac9798 Delete bogus (void) cast of m_freem (which is already a void function..) 1998-07-07 00:04:59 +00:00
thorpej
5596fe2614 Nuke TUBA per my note to tech-net; there's no reason to keep it around. 1998-05-11 19:57:23 +00:00
thorpej
1ffa60ac01 Use macros from tcp_timer.h to manipulate TCP timers, so that their
implementation can be changed easily.
1998-05-06 01:21:20 +00:00
thorpej
e1934b4c36 Correct a comment related to Congestion Window Monitoring. 1998-05-02 01:00:24 +00:00
thorpej
ce40806e29 In the CWM code, don't use the Floyd initial window computation as
the burst size allowed, but rather a fixed number of packets, as
described in the Internet Draft.  Default allowed burst is 4 packets,
per the Draft.

Make the use of CWM and the allowed burst size tunable via sysctl.
1998-04-30 18:27:20 +00:00
kml
1579dcec47 Add support for deletion of routes added by path MTU discovery;
uses new generic route timeout code.  Add sysctl for timeout period.
1998-04-29 03:44:11 +00:00
kml
fcf0227962 Fix to ensure that the correct MSS is advertised for loopback
TCP connections by using the MTU of the interface.  Also added
a knob, mss_ifmtu, to force all connections to use the MTU of
the interface to calculate the advertised MSS.
1998-04-13 21:18:19 +00:00
thorpej
f9463514bf Implement Congestion Window Monitoring as described in the TCPIMPL
meeting of IETF #41 by Amy Hughes <ahughes@isi.edu>, and in an upcoming
internet draft from Hughes, Touch, and Heidemann.

CWM eliminates line-rate bursts after idle periods by counting pending
(unacknowledged) packets and limiting the congestion window to the
initial congestion window plus the pending packet count.  This has the
effect of allowing us to use the window as long as we continue to transmit,
but as soon as we stop transmitting, we go back to a slow-start (also known
as `use it or lose it').

This is not enabled by default.  You can enable this behavior by patching
the "tcp_cwm" global (set it to non-zero) or by building a kernel with the
TCP_CWM option.
1998-04-01 22:15:52 +00:00
thorpej
2da6c91259 Fix a potential-congestion case in the larger initial congestion window
code, as clarified in the TCPIMPL WG meeting at IETF #41: If the SYN
(active open) or SYN,ACK (passive open) was retransmitted, the initial
congestion window for the first slow start of that connection must be
one segment.
1998-03-31 22:49:09 +00:00
kml
96954c2a53 Ensure that we take the IP option length into account when we calculate
the effective maximum send size for TCP.  ip_optlen() and tcp_optlen()
should probably be inlined for efficiency.
1998-03-24 03:10:02 +00:00
kml
123232e156 Fix a retransmission bug introduced by the Brakmo and Peterson
RTO estimation changes.  Under some circumstances it would return a value
of 0, while the old Van Jacobson RTO code would return a minimum of 3.
This would result in 12 retransmissions, each 1 second apart.
This takes care of those instances, and ensures that t_rttmin is
used everywhere as a lower bound.
1998-03-19 22:29:33 +00:00
kml
ffb211fb9d Ensure that the TCP segment size reflects the size of TCP options
in the packet.  This fixes a bug that was resulting in extra packets
in retransmissions (the second packet would be 12 bytes long,
reflecting the RFC1323 timestamp option size).
1998-03-17 23:50:30 +00:00
thorpej
5837cc6b07 Update copyright (sigh, should have done this long ago). 1998-02-19 02:36:42 +00:00
thorpej
e5e283e02d Finishing merging 4.4BSD-Lite2 netinet. At this point, the only changes
left were SCCS IDs and Copyright dates.
1998-01-05 10:31:44 +00:00
thorpej
673fb149c6 Implement a queue for delayed ACK processing. This queue is used in
tcp_fasttimo() in lieu of scanning all open TCP connections.
1997-12-31 03:31:23 +00:00
thorpej
82ce1f6a97 From 4.4BSD-Lite2:
- If we fail to allocate mbufs for the outgoing segment, free the header
  and abort.

From Stevens:
- Ensure the persist timer is running if the send window reaches zero.
  Part of the fix for kern/2335 (pete@daemon.net).
1997-12-17 05:59:32 +00:00
thorpej
c02a72fcd0 Implement an infrastructure to allow larger initial congestion windows.
The sysctl'able variable "tcp_init_win", when set to 0, selects an
auto-tuning algorithm for selecting the initial window, based on transmit
segment size, per discussion in the IETF tcpimpl working group.

Default initial window is still 1 segment, but will soon become 2 segments,
per discussion in tcpimpl.
1997-12-11 22:47:24 +00:00
thorpej
8346cea65d Count delayed ACKs after they have been sucessfully transmitted. 1997-12-11 06:37:48 +00:00
thorpej
e2a99027d2 Add missing (implied) int to a variable declaration. 1997-11-20 19:12:41 +00:00
kml
86275dc497 TCP MSS fixes to provide cleaner slow-start and recovery. 1997-11-08 02:35:22 +00:00
kml
6b86b260cb change sysctl net.inet.icmp.mtudisc to net.inet.ip.mtudisc 1997-10-18 21:18:28 +00:00
kml
323c04642b Path MTU Discovery support. This is turned off by default.
Use sysctl -w net.inet.icmp.mtudisc=1 to turn on.
Still to come:  path removal after some period, black hole detection
1997-10-17 22:12:14 +00:00