NetBSD/sys/rump
dyoung c2e43be1c5 Reduces the resources demanded by TCP sessions in TIME_WAIT-state using
methods called Vestigial Time-Wait (VTW) and Maximum Segment Lifetime
Truncation (MSLT).

MSLT and VTW were contributed by Coyote Point Systems, Inc.

Even after a TCP session enters the TIME_WAIT state, its corresponding
socket and protocol control blocks (PCBs) stick around until the TCP
Maximum Segment Lifetime (MSL) expires.  On a host whose workload
necessarily creates and closes down many TCP sockets, the sockets & PCBs
for TCP sessions in TIME_WAIT state amount to many megabytes of dead
weight in RAM.

Maximum Segment Lifetime Truncation (MSLT) assigns each TCP session to
a class based on the nearness of the peer.  Corresponding to each class
is an MSL, and a session uses the MSL of its class.  The classes are
loopback (local host equals remote host), local (local host and remote
host are on the same link/subnet), and remote (local host and remote
host communicate via one or more gateways).  Classes corresponding to
nearer peers have lower MSLs by default: 2 seconds for loopback, 10
seconds for local, 60 seconds for remote.  Loopback and local sessions
expire more quickly when MSLT is used.
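
The class selection described above can be sketched roughly as follows.
This is a minimal illustration, not the code from the commit: the names
mslt_class, mslt_classify and mslt_msl are hypothetical, addresses are
plain IPv4 values in host byte order, and "same link/subnet" is reduced
to a single netmask comparison.  Only the default MSL values (2, 10 and
60 seconds) come from the text.

```c
#include <stdint.h>

/* Hypothetical names; only the default MSLs are taken from the text. */
enum mslt_class { MSLT_LOOPBACK, MSLT_LOCAL, MSLT_REMOTE };

/* Default MSL, in seconds, for each class. */
static const unsigned mslt_default_msl[] = {
	[MSLT_LOOPBACK] = 2,
	[MSLT_LOCAL]    = 10,
	[MSLT_REMOTE]   = 60,
};

/*
 * Classify a session by the nearness of its peer.  Assumption: IPv4
 * addresses and netmask in host byte order, one interface only.
 */
static enum mslt_class
mslt_classify(uint32_t laddr, uint32_t faddr, uint32_t netmask)
{
	if (laddr == faddr)
		return MSLT_LOOPBACK;	/* local host equals remote host */
	if (((laddr ^ faddr) & netmask) == 0)
		return MSLT_LOCAL;	/* same subnet, no gateway needed */
	return MSLT_REMOTE;		/* reached via one or more gateways */
}

/* A session uses the MSL of its class. */
static unsigned
mslt_msl(uint32_t laddr, uint32_t faddr, uint32_t netmask)
{
	return mslt_default_msl[mslt_classify(laddr, faddr, netmask)];
}
```

For example, under these assumptions a session between 192.168.0.1 and
192.168.0.2 with a /24 netmask falls into the local class and gets the
10-second MSL.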

Vestigial Time-Wait (VTW) replaces a TIME_WAIT session's PCB/socket
dead weight with a compact representation of the session, called a
"vestigial PCB".  VTW data structures are designed to be very fast and
memory-efficient: for fast insertion and lookup of vestigial PCBs,
the PCBs are stored in a hash table that is designed to minimize the
number of cacheline visits per lookup/insertion.  The memory both
for vestigial PCBs and for elements of the PCB hash table comes from
fixed-size pools, and linked data structures exploit this to conserve
memory by representing references with a narrow index/offset from the
start of a pool instead of a pointer.  When space for new vestigial PCBs
runs out, VTW makes room by discarding old vestigial PCBs, oldest first.
VTW cooperates with MSLT.
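
The pool-and-index idea can be illustrated with a small sketch.  It is
not the VTW implementation: all names (struct vpcb, struct vtw, vtw_*)
and the tiny sizes are hypothetical, the hash is a placeholder, and no
attempt is made to minimize cacheline visits.  It only shows entries
living in a fixed-size pool, chain links stored as 16-bit indices
instead of pointers, and the oldest entry being discarded when the pool
is full.

```c
#include <stdint.h>
#include <string.h>

#define VTW_POOL_SIZE	4		/* tiny, for illustration only */
#define VTW_NBUCKETS	8
#define VTW_NIL		UINT16_MAX	/* "no entry" index */

struct vpcb {				/* compact stand-in for a PCB */
	uint32_t faddr, laddr;
	uint16_t fport, lport;
	uint16_t next;			/* pool index of next in chain */
	uint8_t  used;
};

struct vtw {
	struct vpcb pool[VTW_POOL_SIZE];
	uint16_t bucket[VTW_NBUCKETS];	/* chain heads, as pool indices */
	uint16_t cursor;		/* next slot to use == oldest */
};

static unsigned
vtw_hash(uint32_t fa, uint16_t fp, uint32_t la, uint16_t lp)
{
	return (fa ^ la ^ fp ^ ((uint32_t)lp << 16)) % VTW_NBUCKETS;
}

static void
vtw_init(struct vtw *v)
{
	memset(v, 0, sizeof(*v));
	for (unsigned i = 0; i < VTW_NBUCKETS; i++)
		v->bucket[i] = VTW_NIL;
}

/* Remove pool entry idx from its hash chain. */
static void
vtw_unlink(struct vtw *v, uint16_t idx)
{
	struct vpcb *e = &v->pool[idx];
	uint16_t *link =
	    &v->bucket[vtw_hash(e->faddr, e->fport, e->laddr, e->lport)];

	while (*link != VTW_NIL && *link != idx)
		link = &v->pool[*link].next;
	if (*link == idx)
		*link = e->next;
	e->used = 0;
}

/* Insert a session; slots are reused in ring order, oldest first. */
static uint16_t
vtw_insert(struct vtw *v, uint32_t fa, uint16_t fp, uint32_t la, uint16_t lp)
{
	uint16_t idx = v->cursor;
	struct vpcb *e = &v->pool[idx];
	unsigned h = vtw_hash(fa, fp, la, lp);

	if (e->used)
		vtw_unlink(v, idx);	/* pool full: discard the oldest */
	e->faddr = fa; e->fport = fp;
	e->laddr = la; e->lport = lp;
	e->next = v->bucket[h];
	e->used = 1;
	v->bucket[h] = idx;
	v->cursor = (uint16_t)((idx + 1) % VTW_POOL_SIZE);
	return idx;
}

/* Return the pool index of a matching entry, or VTW_NIL. */
static uint16_t
vtw_lookup(const struct vtw *v, uint32_t fa, uint16_t fp, uint32_t la,
    uint16_t lp)
{
	uint16_t i;

	for (i = v->bucket[vtw_hash(fa, fp, la, lp)]; i != VTW_NIL;
	    i = v->pool[i].next) {
		const struct vpcb *e = &v->pool[i];
		if (e->faddr == fa && e->fport == fp &&
		    e->laddr == la && e->lport == lp)
			return i;
	}
	return VTW_NIL;
}
```

With a 16-bit index, each link costs two bytes rather than the four or
eight bytes of a pointer, which is the kind of saving the paragraph
above refers to.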

It may help to think of VTW as a "FIN cache" by analogy to the SYN
cache.

A 2.8-GHz Pentium 4 running a test workload that creates TIME_WAIT
sessions as fast as it can is approximately 17% idle when VTW is active
versus 0% idle when VTW is inactive.  It has 103 megabytes more free RAM
when VTW is active (approximately 64k vestigial PCBs are created) than
when it is inactive.
2011-05-03 18:28:44 +00:00
dev actually add libpud and revert damage to libputter. 2011-03-31 08:36:25 +00:00
fs When panicing, at least tell the _real_ reason. 2011-03-10 22:11:05 +00:00
include Regen for ISSYMLINK removal. 2011-04-18 00:43:56 +00:00
kern Fix spelling of MKZFS 2011-03-05 03:15:25 +00:00
librump More lim_free() fallout 2011-05-01 02:52:42 +00:00
net Reduces the resources demanded by TCP sessions in TIME_WAIT-state using 2011-05-03 18:28:44 +00:00
ldscript.rump Introduce RUMP_COMPONENT. It behaves mostly like a simplified 2010-03-01 13:12:19 +00:00
Makefile Add infrastructure for kern compnents. This is meant for those 2010-06-10 21:56:42 +00:00
Makefile.rump Define COMPAT_50 to be 1 just like config(8) would be opt_compat_netbsd.h 2011-02-01 01:15:51 +00:00
README.dirs update slightly 2010-05-11 11:58:14 +00:00
TODO update todo from my private collection (which is now empty) 2011-02-01 15:26:46 +00:00

	$NetBSD: README.dirs,v 1.11 2010/05/11 11:58:14 pooka Exp $


The following is a quick rundown of the current directory structure.
First, components in the kernel namespace, i.e. compiled with -D_KERNEL:

sys/rump/librump - kernel runtime emulation
  /rumpkern	- kernel core, e.g. syscall, interrupt and lock support

  /rumpcrypto	- kernel cryptographic routines
  /rumpdev	- device support, e.g. autoconf subsystem
  /rumpnet	- networking support and sockets layer
  /rumpvfs	- file system support

sys/rump/include
  /machine - used for architectures where the rump ABI is not yet the
	     same as the kernel module ABI.  will eventually disappear
	     completely
  /rump    - rump headers installed to userspace

sys/rump/dev - device components, e.g. audio, raidframe, usb drivers

sys/rump/fs - file system components
  /lib/lib${fs}  - kernel file system code

sys/rump/net - networking components
  /lib/libnet	  - subroutines from sys/net, e.g. route and if_ethersubr
  /lib/libnetinet - TCP/IP
  /lib/libvirtif  - a virtual interface which uses host tap(4) to shovel
		    packets.  This is used by netinet and if_ethersubr.
  /lib/libsockin  - implements PF_INET using host kernel sockets.  This is
		    mutually exclusive with net, netinet and virtif.



The rest are out-of-kernel components (i.e. no -D_KERNEL)
related to rump.

hypercall interface:
src/lib/librumpuser
  The "rumpuser" set of interfaces is used by rump to communicate
  with the host.

Users:
src/lib
  /libp2k  - puffs-to-vfs adaptation layer, userspace namespace
  /libukfs - user kernel file system, a library to access file system
	     images (or devices) directly in userspace without going
	     through a system call and puffs.  It provides a slightly
	     higher-level interface than pure rump syscalls.

src/usr.sbin/puffs
  rump_$fs - userspace file system daemons using the kernel fs code

src/share/examples/rump
  Various examples detailing use of rump in different scenarios.
  These are provided source-only.