Commit Graph

131 Commits

Author SHA1 Message Date
thorpej
d679590033 Split the pre-computed ifnet checksum flags into Tx and Rx directions.
Add capabilities bits that indicate an interface can only perform
in-bound TCPv4 or UDPv4 checksums.  There is at least one Gig-E chip
for which this is true (Level One LXT-1001), and this is also the
case for the Intel i82559 10/100 Ethernet chips.
2001-09-17 17:26:59 +00:00
thorpej
050e9de009 Use callouts for SYN cache timers, rather than traversing time queues
in tcp_slowtimo().
2001-09-11 21:03:20 +00:00
thorpej
6d0e813f6c Use callouts for TCP timers, rather than traversing the list of
all open TCP connections in tcp_slowtimo() (which is called 2x
per second).  It's fairly rare for TCP timers to actually fire,
so saving this list traversal is good, especially if you want
to scale to thousands of open connections.
2001-09-10 22:14:26 +00:00
thorpej
7446fd2bc8 Change the way receive idle time and round trip time are measured.
Instead of incrementing t_idle and t_rtt in tcp_slowtimo(), we now
take a timstamp (via tcp_now) and use subtraction to compute the
delta when we actually need it (using unsigned arithmetic so that
tcp_now wrapping is handled correctly).

Based on similar changes in FreeBSD.
2001-09-10 15:23:09 +00:00
abs
03aaf3d8b4 Rename TCPDEBUG to TCP_DEBUG, defopt TCP_DEBUG and TCP_NDEBUG, and
make all usage of tcp_trace dependent on TCP_DEBUG - resulting in
a 31K saving on an INET enabled i386 kernel.
2001-07-08 16:18:56 +00:00
wiz
3f9984fc90 existent', not existant' 2001-06-19 13:42:07 +00:00
thorpej
ad9d3794b0 Implement support for IP/TCP/UDP checksum offloading provided by
network interfaces.  This works by pre-computing the pseudo-header
checksum and caching it, delaying the actual checksum to ip_output()
if the hardware cannot perform the sum for us.  In-bound checksums
can either be fully-checked by hardware, or summed up for final
verification by software.  This method was modeled after how this
is done in FreeBSD, although the code is significantly different in
most places.

We don't delay checksums for IPv6/TCP, but we do take advantage of the
cached pseudo-header checksum.

Note: hardware-assisted checksumming defaults to "off".  It is
enabled with ifconfig(8).  See the manual page for details.

Implement hardware-assisted checksumming on the DP83820 Gigabit Ethernet,
3c90xB/3c90xC 10/100 Ethernet, and Alteon Tigon/Tigon2 Gigabit Ethernet.
2001-06-02 16:17:09 +00:00
itojun
1bec764d78 correct faith prefix determination. use sys/netinet/if_faith.c:faithprefix()
to determine.  sync with kame.
(without this change, non-faith socket may mistakenly accept for-faith traffic)
2001-05-08 10:15:13 +00:00
thorpej
7a3c8f81a5 Two changes, designed to make us even more resilient against TCP
ISS attacks (which we already fend off quite well).

1. First-cut implementation of RFC1948, Steve Bellovin's cryptographic
   hash method of generating TCP ISS values.  Note, this code is experimental
   and disabled by default (experimental enough that I don't export the
   variable via sysctl yet, either).  There are a couple of issues I'd
   like to discuss with Steve, so this code should only be used by people
   who really know what they're doing.

2. Per a recent thread on Bugtraq, it's possible to determine a system's
   uptime by snooping the RFC1323 TCP timestamp options sent by a host; in
   4.4BSD, timestamps are created by incrementing the tcp_now variable
   at 2 Hz; there's even a company out there that uses this to determine
   web server uptime.  According to Newsham's paper "The Problem With
   Random Increments", while NetBSD's TCP ISS generation method is much
   better than the "random increment" method used by FreeBSD and OpenBSD,
   it is still theoretically possible to mount an attack against NetBSD's
   method if the attacker knows how many times the tcp_iss_seq variable
   has been incremented.  By not leaking uptime information, we can make
   that much harder to determine.  So, we avoid the leak by giving each
   TCP connection a timebase of 0.
2001-03-20 20:07:51 +00:00
itojun
617b3fab7e - record IPsec packet history into m_aux structure.
- let ipfilter look at wire-format packet only (not the decapsulated ones),
  so that VPN setting can work with NAT/ipfilter settings.
sync with kame.

TODO: use header history for stricter inbound validation
2001-01-24 09:04:15 +00:00
itojun
9302f377a6 remove NRL code leftover. sync with kame 2000-12-10 23:39:36 +00:00
itojun
9183e2dc4e remove #ifdef TCP6. it is not likely for us to bring in sys/netinet6/tcp6*.c
(separate TCP/IPv6 stack) into netbsd-current.
2000-10-19 20:22:59 +00:00
itojun
a7e15e4935 be more friendly with INET-less build.
XXX we need to do more to do a working INET-less build
2000-10-17 03:06:42 +00:00
thorpej
d839a91f5f Add an IP_MTUDISC flag to the flags that can be passed to
ip_output().  This flag, if set, causes ip_output() to set
DF in the IP header if the MTU in the route is not locked.

This allows a bunch of redundant code, which I was never
really all that happy about adding in the first place, to
be eliminated.

Inspired by a similar change made by provos@openbsd.org when
he integrated NetBSD's Path MTU Discovery code into OpenBSD.
2000-10-17 02:57:01 +00:00
itojun
63de4c2cb9 nuke the following sysctl variables. "ppsratelimit" should work better.
need to recompile sbin/sysctl after updating /usr/include.
	net.inet.tcp.rstratelimit
	net.inet.icmp.errratelimit
	net.inet6.icmp6.errratelimit
2000-07-28 04:06:52 +00:00
itojun
dd9f2f7f1d implement net.inet.tcp.rstppslimit to limit TCP RSTs by packet-per-second
basis.  default: 100pps

set default value for net.inet.tcp.rstratelimit to 0 (disabled),
NOTE: it does not work right for smaller-than-1/hz interval.  maybe we should
nuke it, or make it impossible to set smaller-than-1/hz value.
2000-07-27 11:34:06 +00:00
itojun
a18c2d780f be proactive about unspecified IPv6 source address. pcb layer uses
unspecified address (::) to mean "unbounded" or "unconnected",
and can be confused by packets from outside.

use of :: as source is not documented well in IPv6 specification.

not sure if it presents a real threat.  the worst case scenario is a DoS
against TCP listening socket:
- outsider transmit TCP SYN with :: as IPv6 source
- receiving side creates TCP control block with:
	local address = my addres
	remote address = ::     (meaning "unconnected")
	state = SYN_RCVD
  note that SYN ACK will not be sent due to ip6_output() filter.
  this stays until it timeouts.
- the TCP control block prevents listening TCP control block from
  being contacted (DoS).

udp6/raw6 socket may have similar problem, but as they are connectionless,
it may too much to filter it out.
2000-07-27 06:18:13 +00:00
itojun
ca777cb72c add an DIAGNOSTIC case for MCLBYTES assumption 2000-07-23 05:00:01 +00:00
itojun
8a661b9beb be more cautious about tcp option length field. drop bogus ones earlier.
not sure if there is a real threat or not, but it seems that there's
possibility for overrun/underrun (like non-NOP option with optlen > cnt).
2000-07-09 12:49:08 +00:00
itojun
0a1e211454 - do not use bitfield for router renumbering header.
- add protection mechanism against ND cache corruption due to bad NUD hints.
- more stats
- icmp6 pps limitation.  TOOD: should implement ppsratecheck(9).
2000-07-06 12:36:18 +00:00
thorpej
6a900bc9ff Fix some zero-vs-NULL confusion. 2000-07-05 21:45:14 +00:00
itojun
8ff902fca1 repair kernel faithd(8) support. there were two mistakes:
(1) tcp6_input dropped packets for translation
(2) in6_pcblookup_connect was too strict
2000-07-02 08:04:10 +00:00
itojun
23f6a4f4e8 remove old mbuf assumption (ip header and tcp header are on the same mbuf).
this is for m_pulldown use. (sync with kame)
2000-06-30 16:44:33 +00:00
matt
650107086a remove superfluous test (snd_una is always > iss since th_ack must > iss
(first test at start of case) and th_ack is assigned to snd_una).
2000-05-05 15:05:29 +00:00
matt
5a6e4c896c From PR #3733: Only disarm timer if SYN contained the ACK bit since if
it didn't it would be a crossing/simultaneous SYN and doesn't mean the
remote TCP received our SYN.
2000-05-05 14:51:46 +00:00
augustss
8529438fe6 Remove register declarations. 2000-03-30 12:51:13 +00:00
itojun
04ac848d6f introduce m->m_pkthdr.aux to hold random data which needs to be passed
between protocol handlers.

ipsec socket pointers, ipsec decryption/auth information, tunnel
decapsulation information are in my mind - there can be several other usage.
at this moment, we use this for ipsec socket pointer passing.  this will
avoid reuse of m->m_pkthdr.rcvif in ipsec code.

due to the change, MHLEN will be decreased by sizeof(void *) - for example,
for i386, MHLEN was 100 bytes, but is now 96 bytes.
we may want to increase MSIZE from 128 to 256 for some of our architectures.

take caution if you use it for keeping some data item for long period
of time - use extra caution on M_PREPEND() or m_adj(), as they may result
in loss of m->m_pkthdr.aux pointer (and mbuf leak).

this will bump kernel version.

(as discussed in tech-net, tested in kame tree)
2000-03-01 12:49:27 +00:00
thorpej
b178e1f58c Add support for rate-limiting RSTs sent in response to no socket for
an incoming packet.  Default minimum interval is 10ms.  The interval
is changeable via the "net.inet.tcp.rstratelimit" sysctl variable.
2000-02-15 19:54:11 +00:00
thorpej
312cb38ccb In the tcp_input() path:
- Filter out multicast destinations explicitly for every incoming packet,
  not just SYNs.  Previously, non-SYN multicast destination would be
  filtered out as a side effect of PCB lookup.  Remove now redundant
  similar checks in the dropwithreset case and in syn_cache_add().
- Defer the TCP checksum until we know that we want to process the
  packet (i.e. have a non-CLOSED connection or a listen socket).
2000-02-12 17:19:34 +00:00
itojun
1a2a1e2b1f bring in latest KAME ipsec tree.
- interop issues in ipcomp is fixed
- padding type (after ESP) is configurable
- key database memory management (need more fixes)
- policy specification is revisited

XXX m->m_pkthdr.rcvif is still overloaded - hope to fix it soon
2000-01-31 14:18:52 +00:00
itojun
dc0f1c0435 drop IPv6 packets with v4 mapped address on src/dst. they are illegal
and may be used to fool IPv6 implementations (by using ::ffff:127.0.0.1 as
source you may be able to pretend the packet is from local node)
1999-12-22 04:03:01 +00:00
itojun
abddb5f851 do not overwrite traffic class field when we write IPv6 version field. 1999-12-15 06:28:43 +00:00
itojun
ea861f0183 sync IPv6 part with latest KAME tree. IPsec part is left unmodified
due to massive changes in KAME side.
- IPv6 output goes through nd6_output
- faith can capture IPv4 packets as well - you can run IPv4-to-IPv6 translator
  using heavily modified DNS servers
- per-interface statistics (required for IPv6 MIB)
- interface autoconfig is revisited
- udp input handling has a big change for mapped address support.
- introduce in4_cksum() for non-overwriting checksumming
- introduce m_pulldown()
- neighbor discovery cleanups/improvements
- netinet/in.h strictly conforms to RFC2553 (no extra defs visible to userland)
- IFA_STATS is fixed a bit (not tested)
- and more more more.

TODO:
- cleanup os-independency #ifdef
- avoid rcvif dual use (for IPsec) to help ifdetach

(sorry for jumbo commit, I can't separate this any more...)
1999-12-13 15:17:17 +00:00
itojun
4d757da195 implement upper-layer reachability confirmation for IPv6 ND (RFC2461 7.3.1).
fix code to reject "tcp to IPv6 anycast".

sync with recent KAME.
1999-12-11 09:55:14 +00:00
itojun
313f5eb9cd do not drop from IP header to tcp option until sbappend(), to reduce
requirement to mbuf chain.
part of KAME sync, committed separately for its (possible) impact.
1999-12-08 16:22:20 +00:00
itojun
9474edfcd8 cleanup and correct TCP MSS consideration with IPsec headers.
MSS advertisement must always be:
	max(if mtu) - ip hdr siz - tcp hdr siz
We violated this in the previous code so it was fixed.

tcp_mss_to_advertise() now takes af (af on wire) as its argument,
to compute right ip hdr siz.

tcp_segsize() will take care of IPsec header size.
One thing I'm not really sure is how to handle IPsec header size in
*rxsegsizep (inbound segment size estimation).
The current code subtracts possible *outbound* IPsec size from *rxsegsizep,
hoping that the peer is using the same IPsec policy as me.
It may not be applicable, could TCP gulu please comment...
1999-09-23 02:21:30 +00:00
simonb
fd8040a031 s/acknowledgment/acknowledgement/ 1999-09-10 03:24:14 +00:00
thorpej
1e921673e3 Fix a problem discovered by the snd_recover update fix. A bit of the
New Reno fast recovery code was being executed even when New Reno was
disabled, resulting in an unfortunate interaction with the traditional
fast recovery code, the end resulting being that the very condition
that would trigger the traditional fast recovery mechanism caused fast
recovery to be disabled!

Problem reported by Ted Lemon, and some analytical help from Charles Hannum.
1999-08-26 00:04:30 +00:00
itojun
809ab7f1ff When listening socket goes away, remove assockated syn cache entires.
Stale syn cache entries are useless because none of them will be used
if there is no listening socket, as tcp_input looks up listening socket by
in_pcblookup*() before looking into syn cache.

This fixes race condition due to dangling socket pointer from syn cache
entries to listening socket (this was introduced when ipsec is merged in).

This should preserve currently implemented behavior (but not 4.4BSD
behavior prior to syn cache).

Tested in KAME repository before commit, but we'd better run some
regression tests.
1999-08-25 15:23:12 +00:00
christos
d6f8878423 PR/8254: Wolfgang Rupprecht: Incorrect logging of tcp connections; Fix src/dst
confusion.
1999-08-23 14:14:30 +00:00
thorpej
af1e02ad91 Fix a few bugs in the TCP New Reno code:
- Make sure that snd_recover is always at least snd_una.  If we don't do
  this, there can be confusion when sequence numbers wrap around on a
  large loss-free data transfer.
- When doing a New Reno retransmit, snd_una hasn't been updated yet,
  and the socket's send buffer has not yet dropped off ACK'd data, so
  don't muddle with snd_una, so that tcp_output() gets the correct data
  offset.
- When doing a New Reno retransmit, make sure the congestion window is
  open one segment beyond the ACK'd data, so that we can actually perform
  the retransmit.

Partially derived from, although more complete than, similar changes in
OpenBSD, which in turn originated from Tom Henderson <tomh@cs.berkeley.edu>.
1999-08-11 17:37:59 +00:00
thorpej
e48f29e82b Make sure the echoed RFC 1323 timestamp is valid before using it to
compute the round trip time.  From Mark Allman <mallman@lerc.nasa.gov>.
1999-08-11 03:02:18 +00:00
itojun
7fee35f579 - implement IPv6 pmtud, which is necessary for TCP6.
- fix memory leak on SO_DEBUG over TCP.
1999-07-22 12:56:56 +00:00
itojun
b479094c45 no need to include faith.h on non-IPv6 build, so wrap by #ifdef.
(dunno if it's better to always include it or not)
1999-07-17 12:53:05 +00:00
itojun
c74f79d16f fix faith interface support. need testing.
(i understand this is a dirty hack, of course)
1999-07-17 07:07:08 +00:00
itojun
685747d56c Use proper ip protocol # field and tcp hdr on sending RST against SYN,
when ip header and tcp header are not adjacent to each other
(i.e. when ip6 options are attached).

To test this, try
	telnet @::1@::1 port
toward a port without responding server.  Prior to the fix, the kernel will
generate broken RST packet.
1999-07-14 22:37:13 +00:00
thorpej
f9a7668b3f defopt IPSEC and IPSEC_ESP (both into opt_ipsec.h). 1999-07-09 22:57:15 +00:00
itojun
4b961b81e3 avoid "variable not initialized" warnings on some of the platforms. 1999-07-02 12:45:32 +00:00
itojun
118d2b1d4f IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
  data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
  package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
  file to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
1999-07-01 08:12:45 +00:00
ad
ccc7e59e1f Add new sysctl (net.inet.tcp.log_refused) that when set, causes refused TCP
connections to be logged.
1999-05-23 20:33:50 +00:00