NetBSD

Commit Graph

Author	SHA1	Message	Date
rmind	4175f8693b	TCP socket buffers automatic sizing - ported from FreeBSD. http://mail-index.netbsd.org/tech-net/2007/02/04/0006.html ! Disabled by default, marked as experimental. Testers are very needed. ! Someone should thoroughly test this, and improve if possible. Discussed on <tech-net>: http://mail-index.netbsd.org/tech-net/2007/07/12/0002.html Thanks Greg Troxel for comments. OK by the long silence on <tech-net>.	2007-08-02 02:42:40 +00:00
ad	88ab7da936	Merge some of the less invasive changes from the vmlocking branch: - kthread, callout, devsw API changes - select()/poll() improvements - miscellaneous MT safety improvements	2007-07-09 20:51:58 +00:00
christos	0a36551606	tcpdrop kernel bits (from anon ymous)	2007-06-25 23:35:12 +00:00
christos	eeff189533	- per socket keepalive settings - settable connection establishment timeout	2007-06-20 15:29:17 +00:00
dyoung	72f0a6dfb0	Eliminate address family-specific route caches (struct route, struct route_in6, struct route_iso), replacing all caches with a struct route. The principal benefit of this change is that all of the protocol families can benefit from route cache-invalidation, which is necessary for correct routing. Route-cache invalidation fixes an ancient PR, kern/3508, at long last; it fixes various other PRs, also. Discussions with and ideas from Joerg Sonnenberger influenced this work tremendously. Of course, all design oversights and bugs are mine. DETAILS 1 I added to each address family a pool of sockaddrs. I have introduced routines for allocating, copying, duplicating, and freeing sockaddrs: struct sockaddr *sockaddr_alloc(sa_family_t af, int flags); struct sockaddr *sockaddr_copy(struct sockaddr *dst, const struct sockaddr *src); struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags); void sockaddr_free(struct sockaddr *sa); sockaddr_alloc() returns either a sockaddr from the pool belonging to the specified family, or NULL if the pool is exhausted. The returned sockaddr has the right size for that family; sa_family and sa_len fields are initialized to the family and sockaddr length---e.g., sa_family = AF_INET and sa_len = sizeof(struct sockaddr_in). sockaddr_free() puts the given sockaddr back into its family's pool. sockaddr_dup() and sockaddr_copy() work analogously to strdup() and strcpy(), respectively. sockaddr_copy() KASSERTs that the family of the destination and source sockaddrs are alike. The 'flags' argument for sockaddr_alloc() and sockaddr_dup() is passed directly to pool_get(9). 2 I added routines for initializing sockaddrs in each address family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(), etc. They are fairly self-explanatory. 3 structs route_in6 and route_iso are no more. All protocol families use struct route. I have changed the route cache, 'struct route', so that it does not contain storage space for a sockaddr. Instead, struct route points to a sockaddr coming from the pool the sockaddr belongs to. I added a new method to struct route, rtcache_setdst(), for setting the cache destination: int rtcache_setdst(struct route *, const struct sockaddr *); rtcache_setdst() returns 0 on success, or ENOMEM if no memory is available to create the sockaddr storage. It is now possible for rtcache_getdst() to return NULL if, say, rtcache_setdst() failed. I check the return value for NULL everywhere in the kernel. 4 Each routing domain (struct domain) has a list of live route caches, dom_rtcache. rtflushall(sa_family_t af) looks up the domain indicated by 'af', walks the domain's list of route caches and invalidates each one.	2007-05-02 20:40:22 +00:00
christos	53524e44ef	Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.	2007-03-04 05:59:00 +00:00
dyoung	5493f188c7	KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous parentheses in return statements. Cosmetic: don't open-code TAILQ_FOREACH(). Cosmetic: change types of variables to avoid oodles of casts: in in6_src.c, avoid casts by changing several route_in6 pointers to struct route pointers. Remove unnecessary casts to caddr_t elsewhere. Pave the way for eliminating address family-specific route caches: soon, struct route will not embed a sockaddr, but it will hold a reference to an external sockaddr, instead. We will set the destination sockaddr using rtcache_setdst(). (I created a stub for it, but it isn't used anywhere, yet.) rtcache_free() will free the sockaddr. I have extracted from rtcache_free() a helper subroutine, rtcache_clear(). rtcache_clear() will "forget" a cached route, but it will not forget the destination by releasing the sockaddr. I use rtcache_clear() instead of rtcache_free() in rtcache_update(), because rtcache_update() is not supposed to forget the destination. Constify: 1 Introduce const accessor for route->ro_dst, rtcache_getdst(). 2 Constify the 'dst' argument to ifnet->if_output(). This led me to constify a lot of code called by output routines. 3 Constify the sockaddr argument to protosw->pr_ctlinput. This led me to constify a lot of code called by ctlinput routines. 4 Introduce const macros for converting from a generic sockaddr to family-specific sockaddrs, e.g., sockaddr_in: satocsin6, satocsin, et cetera.	2007-02-17 22:34:07 +00:00
yamt	8836e5995d	add some more tcp mowners.	2006-12-06 09:10:45 +00:00
yamt	f5830ee995	- make tcp_reass static. - constify.	2006-12-06 09:08:27 +00:00
yamt	c31e22237d	- constify. - make tcp_dooptions and tcpipqent_pool static.	2006-10-21 10:08:54 +00:00
yamt	81463c93c7	implement RFC3465 appropriate byte counting. from Kentaro A. Kurahone, with minor adjustments by me. the ack prediction part of the original patch was omitted because it's a separate change. reviewed by Rui Paulo.	2006-10-19 11:40:51 +00:00
rpaulo	21df8206df	Export the tcp_do_rfc1948 variable to userland via sysctl. The code to generate an ISS via an MD5 hash has been present in the NetBSD kernel since 2001, but it wasn't even exported to userland at that time. It was agreed on tech-net with the original author <thorpej> that we should let the user decide if he wants to enable it or not. Not enabled by default.	2006-10-16 18:13:56 +00:00
rpaulo	f3330397f0	Modular (I tried ;-) TCP congestion control API. Whenever certain conditions happen in the TCP stack, this interface calls the specified callback to handle the situation according to the currently selected congestion control algorithm. A new sysctl node was created: net.inet.tcp.congctl.{available,selected} with obvious meanings. The old net.inet.tcp.newreno MIB was removed. The API is discussed in tcp_congctl(9). In the near future, it will be possible to select a congestion control algorithm on a per-socket basis. Discussed on tech-net and reviewed by <yamt>.	2006-10-09 16:27:07 +00:00
rpaulo	2fb2ae3251	Import of TCP ECN algorithm for congestion control. Both available for IPv4 and IPv6. Basic implementation test results are available at http://netbsd-soc.sourceforge.net/projects/ecn/testresults.html. Work sponsored by the Google Summer of Code project 2006. Special thanks to Kentaro Kurahone, Allen Briggs and Matt Thomas for their help, comments and support during the project.	2006-09-05 00:29:35 +00:00
rpaulo	25ec6d007f	revert stuff that shouldn't have gone in.	2006-07-22 17:45:03 +00:00
rpaulo	f5f6aa2ed3	TCP RFC is 793, not 783.	2006-07-22 17:39:48 +00:00
perry	fbae48b901	Change "inline" back to "__inline" in .h files -- C99 is still too new, and some apps compile things in C89 mode. C89 keywords stay. As per core@.	2006-02-16 20:17:12 +00:00
perry	0f0296d88a	Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.	2005-12-24 20:45:08 +00:00
christos	95e1ffb156	merge ktrace-lwp.	2005-12-11 12:16:03 +00:00
elad	9702e98730	Multiple inclusion protection, as suggested by christos@ on tech-kern@ a few days ago.	2005-12-10 23:31:41 +00:00
rpaulo	37cbe61e67	Implement tcp.inet{,6}.tcp{,6}.(debug|debx) when TCP_DEBUG is set. They can be used to ``transliterate protocol trace'' like trpt(8) does.	2005-09-06 02:41:14 +00:00
yamt	f02551ec2d	move {tcp,udp}_do_loopback_cksum back to tcp/udp so that they can be referenced by ipv6.	2005-08-10 13:06:49 +00:00
elad	6439f2618f	Add sysctls for IP, ICMP, TCP, and UDP statistics.	2005-08-05 09:21:25 +00:00
christos	89940190d0	Implement PMTU checks from: http://www.gont.com.ar/drafts/icmp-attacks-against-tcp.html 1. Don't act on ICMP-need-frag immediately if adhoc checks on the advertised MTU fail. The MTU update is delayed until a TCP retransmit happens. 2. Ignore ICMP Source Quench messages meant for TCP connections. From OpenBSD.	2005-07-19 17:00:02 +00:00
christos	ea2d4204b6	- add const - remove bogus casts - avoid nested variables	2005-05-29 21:41:23 +00:00
kurahone	f7707899c1	Added sysctl tunable limits for the number of maximum SACK holes per connection and per system. Idea taken from FreeBSD.	2005-04-05 01:07:17 +00:00
yamt	8b0967ff45	protect tcpipqent with splvm.	2005-03-29 20:10:16 +00:00
yamt	df05ca7085	simplify data receiver side sack processing. - introduce t_segqlen, the number of segments in segq/timeq. the name is from freebsd. - rather than maintaining a copy of sack blocks (rcv_sack_block[]), build it directly from the segment list when needed.	2005-03-16 00:39:56 +00:00
yamt	0446b7c3e3	- use full sized segments unless we actually have SACKs to send. - avoid TSO duplicate D-SACK. - send SACKs regardless of TF_ACKNOW. - don't clear rcv_sack_num when transmitting. discussed on tech-net@.	2005-03-16 00:38:27 +00:00
atatat	76a9013c25	gc the tcp_sysctl() prototype since it's completely vestigial	2005-03-09 04:51:56 +00:00
mycroft	c9f058f65e	Copyright maintenance.	2005-03-02 10:20:18 +00:00
jonathan	4ae1f36dc9	Commit TCP SACK patches from Kentaro A. Kurahone's patch at: http://www.sigusr1.org/~kurahone/tcp-sack-netbsd-02152005.diff.gz Fixes in that patch for pre-existing TCP pcb initializations were already committed to NetBSD-current, so are not included in this commit. The SACK patch has been observed to correctly negotiate and respond to SACKs in wide-area traffic. There are two independently-observed, as-yet-unresolved anomalies: First, seeing unexplained delays in fast retransmission (potentially explainable by an 0.2sec RTT between adjacent ethernet/wifi NICs); and second, peculiar and unexplained TCP retransmits observed over an ath0 card. After discussion with several interested developers, I'm committing this now, as-is, for more eyes to use and look over. Current hypothesis is that the anomalies above may in fact be due to link/level (hardware, driver, HAL, firmware) aberrations in the test setup, affecting both Kentaro's wired-Ethernet NIC and my two (different) WiFi NICs.	2005-02-28 16:20:59 +00:00
pk	237a0c2d85	Update tcp_trace() prototype to match implementation.	2005-02-06 20:13:09 +00:00
mycroft	7215a0b3f1	Introduce a new state variable, t_partialacks. It has 3 states: * t_partialacks<0 means we are not in fast recovery. * t_partialacks==0 means we are in fast recovery, but we have not received any partial acks yet. * t_partialacks>0 means we are in fast recovery, and we have received partial acks. This is used to implement 2 changes in RFC 3782: * We keep the notion that we are in fast recovery separate from t_dupacks, so it is not reset due to out-of-order acks. (This affects both the Reno and NewReno cases.) * We only reset the retransmit timer on the first partial ack -- preventing us from possibly taking one RTO per segment once fast recovery is initiated. As before, it is hard to measure any difference between Reno and NewReno in the real-world cases that I've tested.	2005-01-27 03:39:36 +00:00
mycroft	5283ca74ad	Fix two problems in our TCP stack: 1) If an echoed RFC 1323 time stamp appears to be later than the current time, ignore it and fall back to old-style RTT calculation. This prevents ending up with a negative RTT and panicking later. 2) Fix NewReno. This involves a few changes: a) Implement the send_high variable in RFC 2582. Our implementation is subtly different; it is one past the last sequence number transmitted rather than being equal to it. This simplifies some logic and makes the code smaller. Additional logic was required to prevent sequence number wraparound problems; this is not mentioned in RFC 2582. b) Make sure we reset t_dupacks on new acks, but not on a partial ack. All of the new ack code is pushed out into tcp_newreno(). (Later this will probably be a pluggable function.) Thus t_dupacks keeps track of whether we're in fast recovery all the time, with Reno or NewReno, which keeps some logic simpler. c) We do not need to update snd_recover when we're not in fast recovery. See tech-net for an explanation of this. d) In the gratuitous fast retransmit prevention case, do not send a packet. RFC 2582 specifically says that we should "do nothing". e) Do not inflate the congestion window on a partial ack. (This is done by testing t_dupacks to see whether we're still in fast recovery.) This brings the performance of NewReno back up to the same as Reno in a few random test cases (e.g. transferring peer-to-peer over my wireless network). I have not concocted a good test case for the behavior specific to NewReno.	2005-01-26 21:49:27 +00:00
yamt	ffebedd625	factor out receive side tcp/udp checksum handling code so that they can be used by eg. packet filters. reviewed by Christos Zoulas on tech-net@. (slightly tweaked since then to make tcp and udp similar.)	2004-12-21 05:51:31 +00:00
thorpej	7994b6f95e	Don't perform checksums on loopback interfaces. They can be reenabled with the net.inet.*.do_loopback_cksum sysctl. Approved by: groo	2004-12-15 04:25:19 +00:00
yamt	0ea22c32fa	fix ipqent pool corruption problems. make tcp reass code use its own pool of ipqent rather than sharing it with ip reass code. PR/24782.	2004-09-15 09:21:22 +00:00
itojun	4ebcfcf29a	fix MD5 signature support to actually validate inbound signature, and drop packet if fails.	2004-05-18 14:44:14 +00:00
itojun	e0395ac8f0	make TCP MD5 signature work with KAME IPSEC (#define IPSEC). support IPv6 if KAME IPSEC (RFC is not explicit about how we make data stream for checksum with IPv6, but i'm pretty sure using normal pseudo-header is the right thing). XXX current TCP MD5 signature code has giant flaw: it does not validate signature on input (can't believe it! what is the point?)	2004-04-26 03:54:28 +00:00
jonathan	887b782b0b	Initial commit of a port of the FreeBSD implementation of RFC 2385 (MD5 signatures for TCP, as used with BGP). Credit for original FreeBSD code goes to Bruce M. Simpson, with FreeBSD sponsorship credited to sentex.net. Shortening of the setsockopt() name attributed to Vincent Jardin. This commit is a minimal, working version of the FreeBSD code, as MFC'ed to FreeBSD-4. It has received minimal testing with a ttcp modified to set the TCP-MD5 option; BMS's additions to tcpdump-current (tcpdump -M) confirm that the MD5 signatures are correct. Committed as-is for further testing between a NetBSD BGP speaker (e.g., quagga) and industry-standard BGP speakers (e.g., Cisco, Juniper). NOTE: This version has two potential flaws. First, I do not see any code that verifies received TCP-MD5 signatures. Second, the TCP-MD5 options are internally padded and assumed to be 32-bit aligned. A more space-efficient scheme is to pack all TCP options densely (and possibly unaligned) into the TCP header; then do one final padding to a 4-byte boundary. Pre-existing comments note that accounting for TCP-option space when we add SACK is yet to be done. For now, I'm punting on that; we can solve it properly, in a way that will handle SACK blocks, as a separate exercise. In case a pullup to NetBSD-2 is requested, this adds sys/netipsec/xform_tcp.c, and modifies: sys/net/pfkeyv2.h,v 1.15 sys/netinet/files.netinet,v 1.5 sys/netinet/ip.h,v 1.25 sys/netinet/tcp.h,v 1.15 sys/netinet/tcp_input.c,v 1.200 sys/netinet/tcp_output.c,v 1.109 sys/netinet/tcp_subr.c,v 1.165 sys/netinet/tcp_usrreq.c,v 1.89 sys/netinet/tcp_var.h,v 1.109 sys/netipsec/files.netipsec,v 1.3 sys/netipsec/ipsec.c,v 1.11 sys/netipsec/ipsec.h,v 1.7 sys/netipsec/key.c,v 1.11 share/man/man4/tcp.4,v 1.16 lib/libipsec/pfkey.c,v 1.20 lib/libipsec/pfkey_dump.c,v 1.17 lib/libipsec/policy_token.l,v 1.8 sbin/setkey/parse.y,v 1.14 sbin/setkey/setkey.8,v 1.27 sbin/setkey/token.l,v 1.15 Note that the preceding two revisions to tcp.4 will be required to cleanly apply this diff.	2004-04-25 22:25:03 +00:00
itojun	0f06e31eb6	no space between function name and paren: foo (blah) -> foo(blah)	2004-04-21 17:49:46 +00:00
itojun	f2e796b13f	- respond to RST by ACK, as suggested in NISCC recommendation - rate-limit ACKs against RSTs and SYNs	2004-04-20 16:52:12 +00:00
matt	db6a0b431a	De __P()	2004-04-18 21:00:35 +00:00
thorpej	31923baa46	Rather than zeroing a tcpcb structure and filling in all the fields individually, create a tcpcb template pre-initialized (and pre-zero'd) with the static and mostly-static tcpcb parameters. The template is now copied into the new tcpcb, which zeros and initializes most of the tcpcb in one pass. The template is kept up-to-date as TCP sysctl variables are changed. Combined with the previous sb_max change, TCP socket creation is now 25% faster.	2003-10-22 02:45:57 +00:00
itojun	495906ca8e	revamp inpcb/in6pcb so that they are more aligned with each other. in6pcb lookup now uses hash(9).	2003-09-04 09:16:57 +00:00
agc	aad01611e7	Move UCB-licensed code from 4-clause to 3-clause licence. Patches provided by Joel Baker in PR 22364, verified by myself.	2003-08-07 16:26:28 +00:00
he	80ccb5520c	As a temporary workaround, apply the fix from PR#20390, thereby cooperating with the callout code in working around the race condition caused by the TCP code's use of the callout facility. Instead of unconditionally releasing memory in tcp_close() and SYN_CACHE_PUT(), check whether any of the related callout handlers are about to be invoked (but have not yet done callout_ack()), and if so, just mark the associated data structure (tcpcb or syn cache entry) as "dead", and test for this (and release storage) in the callout handler functions.	2003-07-20 16:35:07 +00:00
fvdl	d5aece61d6	Back out the lwp/ktrace changes. They contained a lot of collateral damage, and need to be examined and discussed more.	2003-06-29 22:28:00 +00:00
ragge	679db94879	Add code to remember where in the send queue of mbufs the last packet was sent from. This change avoids a linear search through all mbufs when using large TCP windows, and therefore permits high-speed connections over long distances. Tested on a 1 Gigabit connection between Luleå and San Francisco, a distance of about 15000km. With TCP windows of just over 20 Mbytes it could keep up with 950Mbit/s. After discussions with Matt Thomas and Jason Thorpe.	2003-06-29 18:58:26 +00:00
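
Illustration for commit 7215a0b3f1 (t_partialacks): the message above describes a three-state variable and the rule that the retransmit timer is reset only on the first partial ack. The following is a minimal sketch of that rule as described in the message, not the NetBSD code. The enum, the function names fr_classify() and fr_on_partial_ack(), and the idea of returning a flag to the caller are all hypothetical, chosen only for illustration.

```c
#include <stdbool.h>

/* Hypothetical labels for the three states named in the commit message. */
enum fr_state {
	FR_NOT_IN_RECOVERY,	/* t_partialacks < 0 */
	FR_NO_PARTIAL_ACK_YET,	/* t_partialacks == 0 */
	FR_PARTIAL_ACKS_SEEN	/* t_partialacks > 0 */
};

/* Map the raw counter onto the three states from the message. */
enum fr_state
fr_classify(int t_partialacks)
{
	if (t_partialacks < 0)
		return FR_NOT_IN_RECOVERY;
	if (t_partialacks == 0)
		return FR_NO_PARTIAL_ACK_YET;
	return FR_PARTIAL_ACKS_SEEN;
}

/*
 * Record one partial ack. Returns true only for the first partial ack
 * seen during fast recovery, which is the one case where the message
 * says the retransmit timer is reset. Outside fast recovery the counter
 * is left untouched (an assumption of this sketch).
 */
bool
fr_on_partial_ack(int *t_partialacks)
{
	bool first;

	if (*t_partialacks < 0)
		return false;
	first = (*t_partialacks == 0);
	(*t_partialacks)++;
	return first;
}
```

A caller in this sketch would reset its timer only when fr_on_partial_ack() returns true, so later partial acks in the same recovery episode do not each restart the timer.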

150 Commits