NetBSD

Commit Graph

Author	SHA1	Message	Date
mycroft	7215a0b3f1	Introduce a new state variable, t_partialacks. It has 3 states: * t_partialacks<0 means we are not in fast recovery. * t_partialacks==0 means we are in fast recovery, but we have not received any partial acks yet. * t_partialacks>0 means we are in fast recovery, and we have received partial acks. This is used to implement 2 changes in RFC 3782: * We keep the notion that we are in fast recovery separate from t_dupacks, so it is not reset due to out-of-order acks. (This affects both the Reno and NewReno cases.) * We only reset the retransmit timer on the first partial ack -- preventing us from possibly taking one RTO per segment once fast recovery is initiated. As before, it is hard to measure any difference between Reno and NewReno in the real-world cases that I've tested.	2005-01-27 03:39:36 +00:00
mycroft	5283ca74ad	Fix two problems in our TCP stack: 1) If an echoed RFC 1323 time stamp appears to be later than the current time, ignore it and fall back to old-style RTT calculation. This prevents ending up with a negative RTT and panicking later. 2) Fix NewReno. This involves a few changes: a) Implement the send_high variable in RFC 2582. Our implementation is subtly different; it is one past the last sequence number transmitted rather than being equal to it. This simplifies some logic and makes the code smaller. Additional logic was required to prevent sequence number wraparound problems; this is not mentioned in RFC 2582. b) Make sure we reset t_dupacks on new acks, but not on a partial ack. All of the new ack code is pushed out into tcp_newreno(). (Later this will probably be a pluggable function.) Thus t_dupacks keeps track of whether we're in fast recovery all the time, with Reno or NewReno, which keeps some logic simpler. c) We do not need to update snd_recover when we're not in fast recovery. See tech-net for an explanation of this. d) In the gratuitous fast retransmit prevention case, do not send a packet. RFC 2582 specifically says that we should "do nothing". e) Do not inflate the congestion window on a partial ack. (This is done by testing t_dupacks to see whether we're still in fast recovery.) This brings the performance of NewReno back up to the same as Reno in a few random test cases (e.g. transferring peer-to-peer over my wireless network). I have not concocted a good test case for the behavior specific to NewReno.	2005-01-26 21:49:27 +00:00
yamt	ffebedd625	factor out receive side tcp/udp checksum handling code so that they can be used by eg. packet filters. reviewed by Christos Zoulas on tech-net@. (slightly tweaked since then to make tcp and udp similar.)	2004-12-21 05:51:31 +00:00
thorpej	7994b6f95e	Don't perform checksums on loopback interfaces. They can be reenabled with the net.inet.*.do_loopback_cksum sysctl. Approved by: groo	2004-12-15 04:25:19 +00:00
yamt	0ea22c32fa	fix ipqent pool corruption problems. make tcp reass code use its own pool of ipqent rather than sharing it with ip reass code. PR/24782.	2004-09-15 09:21:22 +00:00
itojun	4ebcfcf29a	fix MD5 signature support to actually validate inbound signature, and drop packet if fails.	2004-05-18 14:44:14 +00:00
itojun	e0395ac8f0	make TCP MD5 signature work with KAME IPSEC (#define IPSEC). support IPv6 if KAME IPSEC (RFC is not explicit about how we make data stream for checksum with IPv6, but i'm pretty sure using normal pseudo-header is the right thing). XXX current TCP MD5 signature code has giant flaw: it does not validate signature on input (can't believe it! what is the point?)	2004-04-26 03:54:28 +00:00
jonathan	887b782b0b	Initial commit of a port of the FreeBSD implementation of RFC 2385 (MD5 signatures for TCP, as used with BGP). Credit for original FreeBSD code goes to Bruce M. Simpson, with FreeBSD sponsorship credited to sentex.net. Shortening of the setsockopt() name attributed to Vincent Jardin. This commit is a minimal, working version of the FreeBSD code, as MFC'ed to FreeBSD-4. It has received minimal testing with a ttcp modified to set the TCP-MD5 option; BMS's additions to tcpdump-current (tcpdump -M) confirm that the MD5 signatures are correct. Committed as-is for further testing between a NetBSD BGP speaker (e.g., quagga) and industry-standard BGP speakers (e.g., Cisco, Juniper). NOTE: This version has two potential flaws. First, I do not see any code that verifies received TCP-MD5 signatures. Second, the TCP-MD5 options are internally padded and assumed to be 32-bit aligned. A more space-efficient scheme is to pack all TCP options densely (and possibly unaligned) into the TCP header, then do one final padding to a 4-byte boundary. Pre-existing comments note that accounting for TCP-option space when we add SACK is yet to be done. For now, I'm punting on that; we can solve it properly, in a way that will handle SACK blocks, as a separate exercise. 
In case a pullup to NetBSD-2 is requested, this adds sys/netipsec/xform_tcp.c, and modifies: sys/net/pfkeyv2.h,v 1.15 sys/netinet/files.netinet,v 1.5 sys/netinet/ip.h,v 1.25 sys/netinet/tcp.h,v 1.15 sys/netinet/tcp_input.c,v 1.200 sys/netinet/tcp_output.c,v 1.109 sys/netinet/tcp_subr.c,v 1.165 sys/netinet/tcp_usrreq.c,v 1.89 sys/netinet/tcp_var.h,v 1.109 sys/netipsec/files.netipsec,v 1.3 sys/netipsec/ipsec.c,v 1.11 sys/netipsec/ipsec.h,v 1.7 sys/netipsec/key.c,v 1.11 share/man/man4/tcp.4,v 1.16 lib/libipsec/pfkey.c,v 1.20 lib/libipsec/pfkey_dump.c,v 1.17 lib/libipsec/policy_token.l,v 1.8 sbin/setkey/parse.y,v 1.14 sbin/setkey/setkey.8,v 1.27 sbin/setkey/token.l,v 1.15 Note that the preceding two revisions to tcp.4 will be required to cleanly apply this diff.	2004-04-25 22:25:03 +00:00
itojun	0f06e31eb6	no space between function name and paren: foo (blah) -> foo(blah)	2004-04-21 17:49:46 +00:00
itojun	f2e796b13f	- respond to RST by ACK, as suggested in NISCC recommendation - rate-limit ACKs against RSTs and SYNs	2004-04-20 16:52:12 +00:00
matt	db6a0b431a	De __P()	2004-04-18 21:00:35 +00:00
thorpej	31923baa46	Rather than zeroing a tcpcb structure and filling in all the fields individually, create a tcpcb template pre-initialized (and pre-zero'd) with the static and mostly-static tcpcb parameters. The template is now copied into the new tcpcb, which zeros and initializes most of the tcpcb in one pass. The template is kept up-to-date as TCP sysctl variables are changed. Combined with the previous sb_max change, TCP socket creation is now 25% faster.	2003-10-22 02:45:57 +00:00
itojun	495906ca8e	revamp inpcb/in6pcb so that they are more aligned with each other. in6pcb lookup now uses hash(9).	2003-09-04 09:16:57 +00:00
agc	aad01611e7	Move UCB-licensed code from 4-clause to 3-clause licence. Patches provided by Joel Baker in PR 22364, verified by myself.	2003-08-07 16:26:28 +00:00
he	80ccb5520c	As a temporary workaround, apply the fix from PR#20390, thereby cooperating with the callout code in working around the race condition caused by the TCP code's use of the callout facility. Instead of unconditionally releasing memory in tcp_close() and SYN_CACHE_PUT(), check whether any of the related callout handlers are about to be invoked (but have not yet done callout_ack()), and if so, just mark the associated data structure (tcpcb or syn cache entry) as "dead", and test for this (and release storage) in the callout handler functions.	2003-07-20 16:35:07 +00:00
fvdl	d5aece61d6	Back out the lwp/ktrace changes. They contained a lot of collateral damage, and need to be examined and discussed more.	2003-06-29 22:28:00 +00:00
ragge	679db94879	Add code to remember where in the send queue of mbufs the last packet was sent from. This change avoids a linear search through all mbufs when using large TCP windows, and therefore permits high-speed connections over long distances. Tested on a 1 Gigabit connection between Luleå and San Francisco, a distance of about 15000km. With TCP windows of just over 20 Mbytes it could keep up with 950Mbit/s. After discussions with Matt Thomas and Jason Thorpe.	2003-06-29 18:58:26 +00:00
darrenr	960df3c8d1	Pass lwp pointers throughout the kernel, as required, so that the lwpid can be inserted into ktrace records. The general change has been to replace "struct proc *" with "struct lwp *" in various function prototypes, pass the lwp through and use l_proc to get the process pointer when needed. Bump the kernel rev up to 1.6V	2003-06-28 14:20:43 +00:00
christos	8924cfdcba	abuse the mib instead of abusing the new pointer. Idea from simon burge. It allows the tcp_sysctl_ident to run by non-super-users. No backwards compatibility provided.	2003-06-26 17:32:22 +00:00
martin	d505b18964	Make sure to include opt_foo.h if a defflag option FOO is used.	2003-06-23 11:00:59 +00:00
christos	9b6eb382c2	PR/2352: Tor Egge: Add sysctl to get uid of connected socket.	2003-04-19 20:58:35 +00:00
thorpej	cdf1b0026c	Allow TCP connections to hosts on a local network to use a larger slow start initial window. Default this larger initial window to 4 packets, allowing it to be adjusted with net.inet.tcp.init_win_local.	2003-03-01 04:40:27 +00:00
matt	65e5548a17	Add MBUFTRACE kernel option. Do a little mbuf rework while here. Change all uses of MGET(*, M_WAIT, *) to m_get(M_WAIT, *). These are not performance critical and making them call m_get saves considerable space. Add m_clget analogue of MCLGET and make corresponding change for M_WAIT uses. Modify netinet, gem, fxp, tulip, nfs to support MBUFTRACE. Begin to change netstat to use sysctl.	2003-02-26 06:31:08 +00:00
perry	6858187df6	/CONTCOND/ while (0)'ed macros	2002-11-02 07:20:42 +00:00
thorpej	10c252ba47	Changes to allow the IPv4 and IPv6 layers to align headers themselves, as necessary: * Implement a new mbuf utility routine, m_copyup(), which is like m_pullup(), except that it always prepends and copies, rather than only doing so if the desired length is larger than m->m_len. m_copyup() also allows an offset into the destination mbuf, which allows space for packet headers, in the forwarding case. * Add _HDR_ALIGNED_P() macros for IP, IPv6, ICMP, and IGMP. These macros expand to 1 if __NO_STRICT_ALIGNMENT is defined, so that architectures which do not have strict alignment constraints don't pay for the test or visit the new align-if-needed path. Use the new macros to check if a header needs to be aligned, or to assert that it already is, as appropriate. Note: This code is still somewhat experimental. However, the new code path won't be visited if individual device drivers continue to guarantee that packets are delivered to layer 3 already properly aligned (which are rules that are already in use).	2002-06-30 22:40:32 +00:00
itojun	f192b66b94	whitespace	2002-06-09 16:33:36 +00:00
itojun	3e7ae517e0	path MTU discovery blackhole detection. PR 12790 (sorry for not committing it for a long time)	2002-05-26 16:05:43 +00:00
matt	c03e11f081	Eliminate commons.	2002-05-12 20:33:50 +00:00
itojun	38f3d28842	have tcp6_drain	2002-03-15 09:25:41 +00:00
itojun	a709c83618	place NRL copyright notice itself, not a reference to it.	2002-01-24 02:12:29 +00:00
thorpej	050e9de009	Use callouts for SYN cache timers, rather than traversing time queues in tcp_slowtimo().	2001-09-11 21:03:20 +00:00
thorpej	6d0e813f6c	Use callouts for TCP timers, rather than traversing the list of all open TCP connections in tcp_slowtimo() (which is called 2x per second). It's fairly rare for TCP timers to actually fire, so saving this list traversal is good, especially if you want to scale to thousands of open connections.	2001-09-10 22:14:26 +00:00
thorpej	45e02f5ee8	Split tcp_timers() into multiple functions, one for each timer, and call them directly from tcp_slowtimo() (via a table) rather than going through tcp_usrreq(). This will allow us to call TCP timers directly from callouts, in a future revision.	2001-09-10 20:15:14 +00:00
thorpej	7446fd2bc8	Change the way receive idle time and round trip time are measured. Instead of incrementing t_idle and t_rtt in tcp_slowtimo(), we now take a timestamp (via tcp_now) and use subtraction to compute the delta when we actually need it (using unsigned arithmetic so that tcp_now wrapping is handled correctly). Based on similar changes in FreeBSD.	2001-09-10 15:23:09 +00:00
thorpej	783db90019	Use a callout for the delayed ACK timer, and delete tcp_fasttimo(). Expose the delayed ACK timer as net.inet.tcp.delack_ticks.	2001-09-10 04:24:24 +00:00
thorpej	938720eea4	Count the number of times we "self-quench" (ip_output() returns ENOBUFS), and don't inline tcp_segsize() if profiling.	2001-07-31 00:57:45 +00:00
mrg	67afbd6270	use _KERNEL_OPT	2001-05-30 11:57:16 +00:00
matt	524a19371f	Make t_flags a u_int instead of u_short. It's followed by a mbuf pointer so there's padding around it already. And it increases the amount of bits available for TF_* flags.	2001-05-26 22:02:57 +00:00
thorpej	bf2dcec4f5	Remove the use of splimp() from the NetBSD kernel. splnet() and only splnet() is allowed for the protection of data structures used by network devices.	2001-04-13 23:29:55 +00:00
thorpej	7a3c8f81a5	Two changes, designed to make us even more resilient against TCP ISS attacks (which we already fend off quite well). 1. First-cut implementation of RFC1948, Steve Bellovin's cryptographic hash method of generating TCP ISS values. Note, this code is experimental and disabled by default (experimental enough that I don't export the variable via sysctl yet, either). There are a couple of issues I'd like to discuss with Steve, so this code should only be used by people who really know what they're doing. 2. Per a recent thread on Bugtraq, it's possible to determine a system's uptime by snooping the RFC1323 TCP timestamp options sent by a host; in 4.4BSD, timestamps are created by incrementing the tcp_now variable at 2 Hz; there's even a company out there that uses this to determine web server uptime. According to Newsham's paper "The Problem With Random Increments", while NetBSD's TCP ISS generation method is much better than the "random increment" method used by FreeBSD and OpenBSD, it is still theoretically possible to mount an attack against NetBSD's method if the attacker knows how many times the tcp_iss_seq variable has been incremented. By not leaking uptime information, we can make that much harder to determine. So, we avoid the leak by giving each TCP connection a timebase of 0.	2001-03-20 20:07:51 +00:00
itojun	9183e2dc4e	remove #ifdef TCP6. it is not likely for us to bring in sys/netinet6/tcp6*.c (separate TCP/IPv6 stack) into netbsd-current.	2000-10-19 20:22:59 +00:00
thorpej	ea9b5a9106	Restructure the Path MTU Discovery code somewhat to avoid entering rtentry's for hosts we're not actually communicating with. Do this by invoking the ctlinput for the protocol, which is responsible for validating the ICMP message: * TCP -- Lookup the connection based on the address/port pairs in the ICMP message. * AH/ESP -- Lookup the SA based on the SPI in the ICMP message. If validation succeeds, ctlinput is responsible for calling icmp_mtudisc(). icmp_mtudisc() then invokes callbacks registered by protocols (such as TCP) which want to take some sort of special action when a path's MTU changes. For TCP, this is where we now refresh cached routes and re-enter slow-start. As a side-effect, this fixes the problem where TCP would not be notified when a path's MTU changed if AH/ESP were being used. XXX Note, this is only a fix for the IPv4 case. For the IPv6 XXX case, we need to wait for the KAME folks. Reviewed by sommerfeld@netbsd.org and itojun@netbsd.org.	2000-10-18 17:09:14 +00:00
itojun	32e6a89b31	net.inet.tcp.rstratelimit is deprecated. make it invalid and return ENOPROTOOPT.	2000-08-15 22:13:02 +00:00
itojun	63de4c2cb9	nuke the following sysctl variables. "ppsratelimit" should work better. need to recompile sbin/sysctl after updating /usr/include. net.inet.tcp.rstratelimit net.inet.icmp.errratelimit net.inet6.icmp6.errratelimit	2000-07-28 04:06:52 +00:00
itojun	dd9f2f7f1d	implement net.inet.tcp.rstppslimit to limit TCP RSTs by packet-per-second basis. default: 100pps set default value for net.inet.tcp.rstratelimit to 0 (disabled), NOTE: it does not work right for smaller-than-1/hz interval. maybe we should nuke it, or make it impossible to set smaller-than-1/hz value.	2000-07-27 11:34:06 +00:00
thorpej	b178e1f58c	Add support for rate-limiting RSTs sent in response to no socket for an incoming packet. Default minimum interval is 10ms. The interval is changeable via the "net.inet.tcp.rstratelimit" sysctl variable.	2000-02-15 19:54:11 +00:00
itojun	ea861f0183	sync IPv6 part with latest KAME tree. IPsec part is left unmodified due to massive changes in KAME side. - IPv6 output goes through nd6_output - faith can capture IPv4 packets as well - you can run IPv4-to-IPv6 translator using heavily modified DNS servers - per-interface statistics (required for IPv6 MIB) - interface autoconfig is revisited - udp input handling has a big change for mapped address support. - introduce in4_cksum() for non-overwriting checksumming - introduce m_pulldown() - neighbor discovery cleanups/improvements - netinet/in.h strictly conforms to RFC2553 (no extra defs visible to userland) - IFA_STATS is fixed a bit (not tested) - and more more more. TODO: - cleanup os-independency #ifdef - avoid rcvif dual use (for IPsec) to help ifdetach (sorry for jumbo commit, I can't separate this any more...)	1999-12-13 15:17:17 +00:00
itojun	313f5eb9cd	do not drop from IP header to tcp option until sbappend(), to reduce requirement to mbuf chain. part of KAME sync, committed separately for its (possible) impact.	1999-12-08 16:22:20 +00:00
bouyer	f86517a031	Update protocols and interfaces stats counters to 64bit. RTM_IFINFO is now 0xf, 0xe is RTM_OIFINFO which returns the old (if_msghdr14) struct with 32bit counters (binary compat, conditioned on COMPAT_14). Same for sysctl: node 3 is renamed NET_RT_OIFLIST, NET_RT_IFLIST is now node 4. Change rt_msg1() to add an mbuf to the mbuf chain instead of just panic() when the message is larger than MHLEN.	1999-11-19 10:41:41 +00:00
itojun	9474edfcd8	cleanup and correct TCP MSS consideration with IPsec headers. MSS advertisement must always be: max(if mtu) - ip hdr siz - tcp hdr siz We violated this in the previous code so it was fixed. tcp_mss_to_advertise() now takes af (af on wire) as its argument, to compute right ip hdr siz. tcp_segsize() will take care of IPsec header size. One thing I'm not really sure about is how to handle IPsec header size in rxsegsizep (inbound segment size estimation). The current code subtracts possible outbound IPsec size from *rxsegsizep, hoping that the peer is using the same IPsec policy as me. It may not be applicable, could a TCP guru please comment...	1999-09-23 02:21:30 +00:00
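
The MSS rule quoted in commit 9474edfcd8, max(if mtu) - ip hdr siz - tcp hdr siz, can be illustrated with a small standalone sketch. This is not the kernel's tcp_mss_to_advertise(); the function names, the `is_ipv6` flag standing in for the "af on wire" argument, and the option-free header sizes below are all assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical option-free header sizes, not the kernel's own constants. */
enum {
    EX_IPV4_HDR_LEN = 20,
    EX_IPV6_HDR_LEN = 40,
    EX_TCP_HDR_LEN  = 20
};

/* Largest MTU among the given interfaces; 0 if the list is empty. */
static uint32_t
ex_max_if_mtu(const uint32_t *mtus, size_t n)
{
    uint32_t best = 0;

    for (size_t i = 0; i < n; i++) {
        if (mtus[i] > best)
            best = mtus[i];
    }
    return best;
}

/*
 * MSS to advertise = max(if mtu) - ip hdr size - tcp hdr size.
 * The IP header size depends on the address family on the wire,
 * modeled here by the hypothetical is_ipv6 flag. Returns 0 when the
 * headers would not fit in the MTU.
 */
static uint32_t
ex_mss_to_advertise(const uint32_t *mtus, size_t n, int is_ipv6)
{
    uint32_t hdrs = (uint32_t)(is_ipv6 ? EX_IPV6_HDR_LEN : EX_IPV4_HDR_LEN)
        + (uint32_t)EX_TCP_HDR_LEN;
    uint32_t mtu = ex_max_if_mtu(mtus, n);

    return mtu > hdrs ? mtu - hdrs : 0;
}
```

Note that IPsec overhead is deliberately left out here: according to the commit message, that is accounted for separately in tcp_segsize(), not in the advertised MSS.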


117 Commits
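
Commit 7446fd2bc8 in the list above describes measuring idle time and RTT by storing a timestamp taken from tcp_now and subtracting it later, with unsigned arithmetic so that counter wraparound is handled. A minimal sketch of that idea follows; the type and function names are hypothetical, and a 32-bit counter width is assumed rather than taken from the kernel source.

```c
#include <stdint.h>

/* Hypothetical 32-bit tick counter type standing in for tcp_now. */
typedef uint32_t ex_ticks_t;

/*
 * Ticks elapsed between an earlier timestamp and the current counter.
 * Unsigned subtraction is performed modulo 2^32, so the result is still
 * correct if the counter wrapped once between the two samples.
 */
static ex_ticks_t
ex_ticks_since(ex_ticks_t now, ex_ticks_t stamp)
{
    return (ex_ticks_t)(now - stamp);
}
```

For example, a stamp taken two ticks before the counter wraps and a current value of 5 yields a delta of 7, the same as if no wrap had occurred. The result is only meaningful while the true elapsed time is shorter than one full counter period.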