Commit Graph

93 Commits

Author SHA1 Message Date
thorpej 10c252ba47 Changes to allow the IPv4 and IPv6 layers to align headers themseves,
as necessary:
* Implement a new mbuf utility routine, m_copyup(), is is like
  m_pullup(), except that it always prepends and copies, rather
  than only doing so if the desired length is larger than m->m_len.
  m_copyup() also allows an offset into the destination mbuf, which
  allows space for packet headers, in the forwarding case.
* Add *_HDR_ALIGNED_P() macros for IP, IPv6, ICMP, and IGMP.  These
  macros expand to 1 if __NO_STRICT_ALIGNMENT is defined, so that
  architectures which do not have strict alignment constraints don't
  pay for the test or visit the new align-if-needed path.
* Use the new macros to check if a header needs to be aligned, or to
  assert that it already is, as appropriate.

Note: This code is still somewhat experimental.  However, the new
code path won't be visited if individual device drivers continue
to guarantee that packets are delivered to layer 3 already properly
aligned (which are rules that are already in use).
2002-06-30 22:40:32 +00:00
itojun f192b66b94 whitespace 2002-06-09 16:33:36 +00:00
itojun 3e7ae517e0 path MTU discovery blackhole detection.
PR 12790 (sorry for not committing it for a long time)
2002-05-26 16:05:43 +00:00
matt c03e11f081 Eliminate commons. 2002-05-12 20:33:50 +00:00
itojun 38f3d28842 have tcp6_drain 2002-03-15 09:25:41 +00:00
itojun a709c83618 place NRL copyright notice itself, not a reference to it. 2002-01-24 02:12:29 +00:00
thorpej 050e9de009 Use callouts for SYN cache timers, rather than traversing time queues
in tcp_slowtimo().
2001-09-11 21:03:20 +00:00
thorpej 6d0e813f6c Use callouts for TCP timers, rather than traversing the list of
all open TCP connections in tcp_slowtimo() (which is called 2x
per second).  It's fairly rare for TCP timers to actually fire,
so saving this list traversal is good, especially if you want
to scale to thousands of open connections.
2001-09-10 22:14:26 +00:00
thorpej 45e02f5ee8 Split tcp_timers() into multiple functions, one for each timer,
and call it directly from tcp_slowtimo() (via a table) rather
than going through tcp_userreq().

This will allow us to call TCP timers directly from callouts,
in a future revision.
2001-09-10 20:15:14 +00:00
thorpej 7446fd2bc8 Change the way receive idle time and round trip time are measured.
Instead of incrementing t_idle and t_rtt in tcp_slowtimo(), we now
take a timstamp (via tcp_now) and use subtraction to compute the
delta when we actually need it (using unsigned arithmetic so that
tcp_now wrapping is handled correctly).

Based on similar changes in FreeBSD.
2001-09-10 15:23:09 +00:00
thorpej 783db90019 Use a callout for the delayed ACK timer, and delete tcp_fasttimo().
Expose the delayed ACK timer as net.inet.tcp.delack_ticks.
2001-09-10 04:24:24 +00:00
thorpej 938720eea4 Count the number of times we "self-quench" (ip_output() returns
ENOBUFS), and don't inline tcp_segsize() if profiling.
2001-07-31 00:57:45 +00:00
mrg 67afbd6270 use _KERNEL_OPT 2001-05-30 11:57:16 +00:00
matt 524a19371f Make t_flags a u_int instead of u_short. It's followed by a mbuf pointer
so there's padding around it already.  And it increases the amount of bits
available for TF_* flags.
2001-05-26 22:02:57 +00:00
thorpej bf2dcec4f5 Remove the use of splimp() from the NetBSD kernel. splnet()
and only splnet() is allowed for the protection of data structures
used by network devices.
2001-04-13 23:29:55 +00:00
thorpej 7a3c8f81a5 Two changes, designed to make us even more resilient against TCP
ISS attacks (which we already fend off quite well).

1. First-cut implementation of RFC1948, Steve Bellovin's cryptographic
   hash method of generating TCP ISS values.  Note, this code is experimental
   and disabled by default (experimental enough that I don't export the
   variable via sysctl yet, either).  There are a couple of issues I'd
   like to discuss with Steve, so this code should only be used by people
   who really know what they're doing.

2. Per a recent thread on Bugtraq, it's possible to determine a system's
   uptime by snooping the RFC1323 TCP timestamp options sent by a host; in
   4.4BSD, timestamps are created by incrementing the tcp_now variable
   at 2 Hz; there's even a company out there that uses this to determine
   web server uptime.  According to Newsham's paper "The Problem With
   Random Increments", while NetBSD's TCP ISS generation method is much
   better than the "random increment" method used by FreeBSD and OpenBSD,
   it is still theoretically possible to mount an attack against NetBSD's
   method if the attacker knows how many times the tcp_iss_seq variable
   has been incremented.  By not leaking uptime information, we can make
   that much harder to determine.  So, we avoid the leak by giving each
   TCP connection a timebase of 0.
2001-03-20 20:07:51 +00:00
itojun 9183e2dc4e remove #ifdef TCP6. it is not likely for us to bring in sys/netinet6/tcp6*.c
(separate TCP/IPv6 stack) into netbsd-current.
2000-10-19 20:22:59 +00:00
thorpej ea9b5a9106 Restructure the Path MTU Discovery code somewhat to avoid
entering rtentry's for hosts we're not actually communicating
with.

Do this by invoking the ctlinput for the protocol, which is
responsible for validating the ICMP message:
	* TCP -- Lookup the connection based on the address/port
	  pairs in the ICMP message.
	* AH/ESP -- Lookup the SA based on the SPI in the ICMP message.

If validation succeeds, ctlinput is responsible for calling
icmp_mtudisc().  icmp_mtudisc() then invokes callbacks registered
by protocols (such as TCP) which want to take some sort of special
action when a path's MTU changes.  For TCP, this is where we now
refresh cached routes and re-enter slow-start.

As a side-effect, this fixes the problem where TCP would not be
notified when a path's MTU changed if AH/ESP were being used.

XXX Note, this is only a fix for the IPv4 case.  For the IPv6
XXX case, we need to wait for the KAME folks.

Reviewed by sommerfeld@netbsd.org and itojun@netbsd.org.
2000-10-18 17:09:14 +00:00
itojun 32e6a89b31 net.inet.tcp.rstratelimit is deprecated. make it invalid and return
ENOPROTOOPT.
2000-08-15 22:13:02 +00:00
itojun 63de4c2cb9 nuke the following sysctl variables. "ppsratelimit" should work better.
need to recompile sbin/sysctl after updating /usr/include.
	net.inet.tcp.rstratelimit
	net.inet.icmp.errratelimit
	net.inet6.icmp6.errratelimit
2000-07-28 04:06:52 +00:00
itojun dd9f2f7f1d implement net.inet.tcp.rstppslimit to limit TCP RSTs by packet-per-second
basis.  default: 100pps

set default value for net.inet.tcp.rstratelimit to 0 (disabled),
NOTE: it does not work right for smaller-than-1/hz interval.  maybe we should
nuke it, or make it impossible to set smaller-than-1/hz value.
2000-07-27 11:34:06 +00:00
thorpej b178e1f58c Add support for rate-limiting RSTs sent in response to no socket for
an incoming packet.  Default minimum interval is 10ms.  The interval
is changeable via the "net.inet.tcp.rstratelimit" sysctl variable.
2000-02-15 19:54:11 +00:00
itojun ea861f0183 sync IPv6 part with latest KAME tree. IPsec part is left unmodified
due to massive changes in KAME side.
- IPv6 output goes through nd6_output
- faith can capture IPv4 packets as well - you can run IPv4-to-IPv6 translator
  using heavily modified DNS servers
- per-interface statistics (required for IPv6 MIB)
- interface autoconfig is revisited
- udp input handling has a big change for mapped address support.
- introduce in4_cksum() for non-overwriting checksumming
- introduce m_pulldown()
- neighbor discovery cleanups/improvements
- netinet/in.h strictly conforms to RFC2553 (no extra defs visible to userland)
- IFA_STATS is fixed a bit (not tested)
- and more more more.

TODO:
- cleanup os-independency #ifdef
- avoid rcvif dual use (for IPsec) to help ifdetach

(sorry for jumbo commit, I can't separate this any more...)
1999-12-13 15:17:17 +00:00
itojun 313f5eb9cd do not drop from IP header to tcp option until sbappend(), to reduce
requirement to mbuf chain.
part of KAME sync, committed separately for its (possible) impact.
1999-12-08 16:22:20 +00:00
bouyer f86517a031 Update protocoles and interfaces stats counters to 64bit.
RTM_IFINFO is now 0xf, 0xe is RTM_OIFINFO which returns the old (if_msghdr14)
struct with 32bit counters (binary compat, conditioned on COMPAT_14).
Same for sysctl: node 3 is renamed NET_RT_OIFLIST, NET_RT_IFLIST is now node 4.
Change rt_msg1() to add an mbuf to the mbuf chain instead of just panic()
when the message is larger than MHLEN.
1999-11-19 10:41:41 +00:00
itojun 9474edfcd8 cleanup and correct TCP MSS consideration with IPsec headers.
MSS advertisement must always be:
	max(if mtu) - ip hdr siz - tcp hdr siz
We violated this in the previous code so it was fixed.

tcp_mss_to_advertise() now takes af (af on wire) as its argument,
to compute right ip hdr siz.

tcp_segsize() will take care of IPsec header size.
One thing I'm not really sure is how to handle IPsec header size in
*rxsegsizep (inbound segment size estimation).
The current code subtracts possible *outbound* IPsec size from *rxsegsizep,
hoping that the peer is using the same IPsec policy as me.
It may not be applicable, could TCP gulu please comment...
1999-09-23 02:21:30 +00:00
itojun 809ab7f1ff When listening socket goes away, remove assockated syn cache entires.
Stale syn cache entries are useless because none of them will be used
if there is no listening socket, as tcp_input looks up listening socket by
in_pcblookup*() before looking into syn cache.

This fixes race condition due to dangling socket pointer from syn cache
entries to listening socket (this was introduced when ipsec is merged in).

This should preserve currently implemented behavior (but not 4.4BSD
behavior prior to syn cache).

Tested in KAME repository before commit, but we'd better run some
regression tests.
1999-08-25 15:23:12 +00:00
itojun 98fab25334 fix sototcpcb(). this sometimes caused panic on OOB data reception.
the macro may need to be expanded into dedicated function, rather than a macro,
to capture unsupported values.
1999-08-12 16:04:52 +00:00
itojun 70ada0957e sync with recent KAME.
- loosen ipsec restriction on packet diredtion.
- revise icmp6 redirect handling on IsRouter bit.
- tcp/udp notification processing (link-local address case)
- cosmetic fixes (better code share across *BSD).
1999-07-31 18:41:15 +00:00
itojun 7fee35f579 - implement IPv6 pmtud, which is necessary for TCP6.
- fix memory leak on SO_DEBUG over TCP.
1999-07-22 12:56:56 +00:00
itojun 685747d56c Use proper ip protocol # field and tcp hdr on sending RST against SYN,
when ip header and tcp header are not adjacent to each other
(i.e. when ip6 options are attached).

To test this, try
	telnet @::1@::1 port
toward a port without responding server.  Prior to the fix, the kernel will
generate broken RST packet.
1999-07-14 22:37:13 +00:00
thorpej 267920eb1a defopt INET6, and put it in opt_inet.h (most places already include this
file, which is why the file list is so short).
1999-07-09 23:41:16 +00:00
itojun 118d2b1d4f IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
  data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
  package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
  file to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
1999-07-01 08:12:45 +00:00
ad ccc7e59e1f Add new sysctl (net.inet.tcp.log_refused) that when set, causes refused TCP
connections to be logged.
1999-05-23 20:33:50 +00:00
thorpej 2cd33a0ce1 Implement retransmit logic for the SYN cache engine. Fixes a rare condition
where one side can think a connection exists, where the other side thinks
the connection was never established.

The original problem was first reported by Ty Sarna in PR #5909.  The
original fix I made to the code didn't cover all cases.  The problem this
fix addresses was reported by Christoph Badura via private e-mail.

Many thanks to Bill Sommerfeld for helping me to test this code, and
for finding a subtle bug.
1999-04-29 03:54:22 +00:00
thorpej a58f271406 Oops, forgot to update copyright notice in previous. 1999-01-24 01:21:18 +00:00
thorpej 86e2c3fbc6 * Completely rewrite syn_cache_respond().
- Don't use tcp_respond(), instead create the tcp/ip header from scratch,
and send it ourself.
- Reuse the mbuf that carried the SYN, or allocate one if that is not
available.
- Cache the route we look up to do the Path MTU Discovery check, and
transfer the reference to that route to the inpcb when the connection
completes.
* Macro'ize a small, but often repeated code fragment.
1999-01-24 01:19:28 +00:00
thorpej 4f177aec90 Add a lock around the TCPCB's sequence queue, to prevent tcp_drain()
from corrupting the queue if called from a device's interrupt context.

Similar in nature to the problem reported in PR #5684.
1998-12-18 21:38:02 +00:00
matt 8e8f38e0f2 Add a sysctl for newreno (default to off). 1998-10-06 00:20:44 +00:00
matt 25054b5cf7 Adapt the NEWRENO changes from the UCSB diffs of BSDI 3.0's TCP
to NetBSD.  Ignore the SACK & FACK stuff for now.
1998-10-04 21:33:52 +00:00
mouse b95116821c Create tcp.keepidle, tcp.keepintvl, tcp.keepcnt, tcp.slowhz sysctls. 1998-09-10 10:46:03 +00:00
thorpej 4dbfe05f1f Use an algorithm similar to that in tcp_notify() to determine if
syn_cache_unreach() should remove the entry, or just continue on.

Algorithm is to only remove the entry if we've had more than one unreach
error and have retransmitted 3 or more times.  This prevents the following
scenario, as noted in PR #5909 (PR from Ty Sarna, scenario from
Charles Hannum):

	* Host A sends a SYN.
	* Host A retransmits the SYN.
	* Host B gets the first SYN and sends a SYN-ACK.
	* Host B gets the second SYN and sends a SYN-ACK.
	* One of the SYN-ACK bounces with an
	  ICMP unreachable, causing the `SYN cache' entry to be
	  removed with no notification.
	* Host A receives the other SYN-ACK, sends an ACK, and goes to
	  ESTABLISHED state.

Should fix PR #5909.
1998-09-09 01:32:27 +00:00
mycroft cca4e566a9 Implement a better fix for the `gratuitous FIN' problem, as
mentioned on tcp-impl but with a bit more commentary.
1998-07-21 10:46:00 +00:00
thorpej 5596fe2614 Nuke TUBA per my note to tech-net; there's no reason to keep it around. 1998-05-11 19:57:23 +00:00
thorpej ce3d776874 Rework the syn cache code somewhat:
- Don't use home-grown queue manipulation.  Use <sys/queue.h> instead.  The
  data structures are a little larger, but we are otherwise wasting the
  memory chunk anyway (we're already a 64-byte malloc bucket).
- Fix a bug in the cache-is-full case: if the oldest element removed from
  the first non-empty bucket was the only element in the bucket, the
  bucket wouldn't be removed from the bucket cache, causing queue corruption
  later.
- Optimize the syn cache timers by using PRT timers rather than home-grown
  decrement-and-propagate timers.

This code is now a fair bit smaller, and significantly easier to read
and understand.
1998-05-07 01:37:27 +00:00
thorpej 34e34c985a Use the monotonically increasing slow timer timestamp provided by
the protocol dispatch layer for TCP timers.  This saves having to
modify a potentially large number of timer values (which were shorts,
and expanded to ... a lot of code on the Alpha).
1998-05-06 01:24:38 +00:00
thorpej b71e4ddf4c Reintroduce the immediate ACK-on-PUSH behavior removed in revision 1.47,
but make the decision to do this dependent on the sysctl variable
net.inet.tcp.ack_on_push, which is disabled by default.
1998-05-02 04:21:58 +00:00
thorpej be12c489b4 Garbage-collect. 1998-05-01 18:31:12 +00:00
thorpej ce40806e29 In the CWM code, don't use the Floyd initial window computation as
the burst size allowed, but rather a fixed number of packets, as
described in the Internet Draft.  Default allowed burst is 4 packets,
per the Draft.

Make the use of CWM and the allowed burst size tunable via sysctl.
1998-04-30 18:27:20 +00:00
thorpej e81920fa23 Make tcp_compat_42 a sysctl option. 1998-04-30 17:55:27 +00:00