methods called Vestigial Time-Wait (VTW) and Maximum Segment Lifetime
Truncation (MSLT).
MSLT and VTW were contributed by Coyote Point Systems, Inc.
Even after a TCP session enters the TIME_WAIT state, its corresponding
socket and protocol control blocks (PCBs) stick around until the TCP
Maximum Segment Lifetime (MSL) expires. On a host whose workload
necessarily creates and closes down many TCP sockets, the sockets & PCBs
for TCP sessions in TIME_WAIT state amount to many megabytes of dead
weight in RAM.
Maximum Segment Lifetime Truncation (MSLT) assigns each TCP session to
a class based on the nearness of the peer. Corresponding to each class
is an MSL, and a session uses the MSL of its class. The classes are
loopback (local host equals remote host), local (local host and remote
host are on the same link/subnet), and remote (local host and remote
host communicate via one or more gateways). Classes corresponding to
nearer peers have lower MSLs by default: 2 seconds for loopback, 10
seconds for local, 60 seconds for remote. Loopback and local sessions
expire more quickly when MSLT is used.
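
As a hedged illustration of the classification, consider the following
C sketch; the enum, table, and function names are illustrative rather
than the kernel's own, and only the default MSLs quoted above come from
the change itself:

    /* Per-class MSL defaults, as quoted above (seconds). */
    enum msl_class { MSL_LOOPBACK, MSL_LOCAL, MSL_REMOTE };

    static const unsigned int msl_seconds[] = {
        [MSL_LOOPBACK] = 2,     /* local host == remote host */
        [MSL_LOCAL]    = 10,    /* peer on the same link/subnet */
        [MSL_REMOTE]   = 60,    /* peer reached via one or more gateways */
    };

    /*
     * Classify a session by the nearness of its peer.  The two flags
     * stand in for the real tests (loopback address? directly attached
     * subnet?).
     */
    static enum msl_class
    msl_classify(int peer_is_loopback, int peer_is_onlink)
    {
        if (peer_is_loopback)
            return MSL_LOOPBACK;
        if (peer_is_onlink)
            return MSL_LOCAL;
        return MSL_REMOTE;
    }

    /* TIME_WAIT conventionally lasts 2 * MSL, so nearer peers expire sooner. */
    static unsigned int
    time_wait_seconds(enum msl_class c)
    {
        return 2 * msl_seconds[c];
    }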
Vestigial Time-Wait (VTW) replaces a TIME_WAIT session's PCB/socket
dead weight with a compact representation of the session, called a
"vestigial PCB". VTW data structures are designed to be very fast and
memory-efficient: for fast insertion and lookup of vestigial PCBs,
the PCBs are stored in a hash table that is designed to minimize the
number of cacheline visits per lookup/insertion. The memory both
for vestigial PCBs and for elements of the PCB hash table comes from
fixed-size pools, and linked data structures exploit this to conserve
memory by representing references with a narrow index/offset from the
start of a pool instead of a pointer. When space for new vestigial PCBs
runs out, VTW makes room by discarding old vestigial PCBs, oldest first.
VTW cooperates with MSLT.
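
The following hedged sketch shows only the index-instead-of-pointer
idea; the field layout is illustrative and does not reproduce the
actual vestigial-PCB structure:

    #include <stdint.h>

    #define VTW_NULL_IDX 0xffffffffU

    /* A compact stand-in for a TIME_WAIT session (illustrative fields only). */
    struct vtw_sketch {
        uint32_t laddr, faddr;  /* local/foreign IPv4 addresses */
        uint16_t lport, fport;  /* local/foreign ports */
        uint32_t expire;        /* when the entry may be discarded */
        uint32_t next_idx;      /* hash-chain link: a pool index, not a pointer */
    };

    /* Because all records live in one fixed-size pool, a 32-bit index is
     * enough to name a neighbour, halving the link size on LP64 machines. */
    static inline struct vtw_sketch *
    vtw_at(struct vtw_sketch *pool, uint32_t idx)
    {
        return idx == VTW_NULL_IDX ? NULL : &pool[idx];
    }

    static inline uint32_t
    vtw_index(const struct vtw_sketch *pool, const struct vtw_sketch *v)
    {
        return v == NULL ? VTW_NULL_IDX : (uint32_t)(v - pool);
    }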
It may help to think of VTW as a "FIN cache" by analogy to the SYN
cache.
A 2.8-GHz Pentium 4 running a test workload that creates TIME_WAIT
sessions as fast as it can is approximately 17% idle when VTW is active
versus 0% idle when VTW is inactive. It has 103 megabytes more free RAM
when VTW is active (approximately 64k vestigial PCBs are created) than
when it is inactive.
instead of adding/subtracting our own IPv4 header.
There are many benefits: gre(4) needn't grok the outer encapsulation
header any longer, so this simplifies the gre(4) code. The IP
stack needn't grok GRE, so it is simplified, too. gre(4) will
benefit from optimizations in the socket code. Eventually, gre(4)
will gain an IPv6 encapsulation with very few new lines of code.
There is a small performance loss. A 133-MHz, 486-class AMD Elan
sinks/sources a TCP stream over GRE with about 93% of the throughput
of the old code. TCP throughput on a 266-MHz, 586-class AMD Geode
is about 96% of the old code's. A 175-MHz ADM5120 (MIPS) only sinks
a TCP stream over GRE at about 90% of the old code's throughput; I am
still investigating that.
I produced stripped-down versions of sosend() and soreceive() for
gre(4) to use. They are guaranteed not to block, so they can be
called from a software interrupt and from a socket upcall,
respectively.
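
A hedged sketch (kernel context) of the property that makes such a
routine callable from a software interrupt: it never sleeps waiting for
buffer space. gre_sosend_sketch() is a hypothetical name, the hand-off
to the protocol is elided, and sbspace()/m_freem() are the standard
socket-buffer and mbuf routines:

    #include <sys/param.h>
    #include <sys/errno.h>
    #include <sys/mbuf.h>
    #include <sys/socketvar.h>

    static int
    gre_sosend_sketch(struct socket *so, struct mbuf *m)
    {
        /* A full sosend() would sbwait() here; a softint must not, so drop. */
        if (sbspace(&so->so_snd) < m->m_pkthdr.len) {
            m_freem(m);
            return ENOBUFS;
        }
        /* ... append m to so_snd and kick the protocol's output routine ... */
        return 0;
    }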
A kernel thread is no longer necessary for socket transmit/receive,
but I haven't gotten around to removing it yet.
Thanks to Matt Thomas for suggesting the use of stripped-down socket
code and software interrupts, and to Andrew Doran for advice and
answers concerning software interrupts, threads, and performance.
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).
Stale IPv6 and ISO route caches will be addressed by separate patches.
Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.
Here are the details:
Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.
Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.
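
A hedged sketch (kernel context) of how the generic helpers might
dispatch through the new hooks; the hook signatures are my assumption,
and the real implementation may differ:

    #include <sys/param.h>
    #include <sys/domain.h>
    #include <net/route.h>

    void
    rtflush(struct route *ro)
    {
        struct domain *dom;

        if (ro->ro_rt == NULL)
            return;
        /* pffinddomain() is the existing lookup of a domain by family. */
        dom = pffinddomain(ro->ro_dst.sa_family);
        if (dom != NULL && dom->dom_rtflush != NULL)
            (*dom->dom_rtflush)(ro);        /* e.g. in_rtflush() for IPv4 */
    }

    void
    rtflushall(int family)
    {
        struct domain *dom;

        dom = pffinddomain(family);
        if (dom != NULL && dom->dom_rtflushall != NULL)
            (*dom->dom_rtflushall)(family); /* e.g. in_rtflushall() */
    }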
Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.
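
A hedged sketch of the chain itself; the list head, link field, and
names below are illustrative stand-ins, and reference counting on the
dropped rtentry is elided:

    #include <sys/queue.h>
    #include <stddef.h>

    struct rtentry;                             /* opaque here */

    struct route_sketch {                       /* stand-in for struct route */
        struct rtentry *ro_rt;                  /* cached route; may go NULL */
        LIST_ENTRY(route_sketch) ro_link;       /* membership in the chain */
    };

    static LIST_HEAD(, route_sketch) in_rtcache_head =
        LIST_HEAD_INITIALIZER(in_rtcache_head);

    static void
    in_rtcache_sketch(struct route_sketch *ro)  /* note a new cached route */
    {
        LIST_INSERT_HEAD(&in_rtcache_head, ro, ro_link);
    }

    static void
    in_rtflush_sketch(struct route_sketch *ro)  /* invalidate one cache */
    {
        LIST_REMOVE(ro, ro_link);
        ro->ro_rt = NULL;   /* users must tolerate this becoming NULL */
    }

    static void
    in_rtflushall_sketch(void)                  /* invalidate every cache */
    {
        struct route_sketch *ro;

        while ((ro = LIST_FIRST(&in_rtcache_head)) != NULL)
            in_rtflush_sketch(ro);
    }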
In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.
In gif(4), discard the workaround for stale caches that involves
expiring them every so often.
Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).
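
Concretely (kernel context; the wrapper function names exist only for
this illustration, and a NULL guard is added for safety):

    #include <net/route.h>

    static void
    invalidate_old_way(struct route *ro)
    {
        /* Before: every caller open-coded the invalidation. */
        if (ro->ro_rt != NULL) {
            RTFREE(ro->ro_rt);
            ro->ro_rt = NULL;
        }
    }

    static void
    invalidate_new_way(struct route *ro)
    {
        /* After: one helper frees the route, clears ro_rt, and unchains
         * the cache from its domain. */
        rtflush(ro);
    }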
Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to become NULL.
Take care, when moving a 'struct route', to rtflush() the source and
to rtcache() the destination.
In domain initializers, use .dom_xxx tags.
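
For reference, the '.dom_xxx tags' are C99 designated initializers; a
minimal illustration with a reduced stand-in struct (not the real
struct domain) follows:

    struct domain_sketch {          /* reduced stand-in for struct domain */
        int         dom_family;
        const char *dom_name;
        void      (*dom_rtflushall)(int);
    };

    static void
    in_rtflushall_stub(int family)  /* placeholder for the real routine */
    {
        (void)family;
    }

    /* Naming each field means that reordering or extending the struct
     * cannot silently misassign the initializers. */
    static const struct domain_sketch inetdomain_sketch = {
        .dom_family     = 2,        /* PF_INET */
        .dom_name       = "internet",
        .dom_rtflushall = in_rtflushall_stub,
    };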
KNF here and there.
Also, add ioctls SIOCGIFADDRPREF/SIOCSIFADDRPREF to get/set preference
numbers for addresses. Make ifconfig(8) set/display preference
numbers.
To activate source-address selection policies in your kernel, add
'options IPSELSRC' to your kernel configuration.
Miscellaneous changes in support of source-address selection:
1 Factor out some common code, producing rt_replace_ifa().
2 Abbreviate a for-loop with TAILQ_FOREACH().
3 Add the predicates on IPv4 addresses IN_LINKLOCAL() and
IN_PRIVATE(), which are true for link-local unicast
(169.254/16) and RFC 1918 private addresses, respectively.
Add the predicate IN_ANY_LOCAL(), which is true for link-local
unicast and multicast (a sketch follows this list).
4 Add IPv4-specific interface attach/detach routines,
in_domifattach and in_domifdetach, which are compiled only
under #ifdef IPSELSRC.
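
Here is a hedged sketch of the predicates from item 3; these are
illustrative re-creations (addresses taken in host byte order), not
necessarily the exact macros committed:

    #include <stdint.h>

    /* 169.254/16: IPv4 link-local unicast. */
    #define IN_LINKLOCAL_SKETCH(i) \
        (((uint32_t)(i) & 0xffff0000U) == 0xa9fe0000U)

    /* RFC 1918 private space: 10/8, 172.16/12, 192.168/16. */
    #define IN_PRIVATE_SKETCH(i)                            \
        ((((uint32_t)(i) & 0xff000000U) == 0x0a000000U) ||  \
         (((uint32_t)(i) & 0xfff00000U) == 0xac100000U) ||  \
         (((uint32_t)(i) & 0xffff0000U) == 0xc0a80000U))

    /* 224.0.0.0/24: link-local multicast. */
    #define IN_LLMCAST_SKETCH(i) \
        (((uint32_t)(i) & 0xffffff00U) == 0xe0000000U)

    /* Link-local unicast or link-local multicast. */
    #define IN_ANY_LOCAL_SKETCH(i) \
        (IN_LINKLOCAL_SKETCH(i) || IN_LLMCAST_SKETCH(i))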
See in_getifa(9) for a more thorough description of source-address
selection policy.
headers and LKM.
Add MKPF; if set to no, don't build and install the pf(4) programs,
headers, LKM and spamd.
Both options default to yes, so nothing changed in the default build.
Reviewed by lukem.
(Sorry for the big commit; I can't separate this into several pieces...)
Please check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.
- sys/kern: do not assume a single mbuf; accept chained mbufs when
passing data from userland to the kernel (or the other way round).
A sketch follows this list.
- "midway" ATM card: ATM PVC pseudo-device support, like that done in
the ALTQ package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual-stack TCP support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF documents assume
these files to be there, so we patch them up.
- sys/netinet: IPsec additions are scattered here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen
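
For the sys/kern item above, a hedged sketch (kernel context) of the
kind of change involved: code that assumed a single mbuf's m_len must
now walk the m_next chain. m_length_sketch() is an illustrative name;
m_len and m_next are the standard mbuf fields:

    #include <sys/param.h>
    #include <sys/mbuf.h>

    static int
    m_length_sketch(const struct mbuf *m0)
    {
        const struct mbuf *m;
        int len = 0;

        /* Sum the lengths of every mbuf in the chain, not just the first. */
        for (m = m0; m != NULL; m = m->m_next)
            len += m->m_len;
        return len;
    }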
To my understanding, no code here is subject to export control, so it
should be safe.
as with user-land programs, include files are installed by each directory
in the tree that has includes to install. (This allows more flexibility
as to what gets installed, makes 'partial installs' easier, and gives us
more options as to which machines' includes get installed at any given
time.) The old SYS_INCLUDES={symlinks,copies} behaviours are _both_
still supported, though at least one bug in the 'symlinks' case is
fixed by this change. Include files can't be built before installation,
so directories that have includes as targets (e.g. dev/pci) have to move
those targets into a different Makefile.