methods called Vestigial Time-Wait (VTW) and Maximum Segment Lifetime
Truncation (MSLT).
MSLT and VTW were contributed by Coyote Point Systems, Inc.
Even after a TCP session enters the TIME_WAIT state, its corresponding
socket and protocol control blocks (PCBs) stick around until the TCP
Maximum Segment Lifetime (MSL) expires. On a host whose workload
necessarily creates and closes down many TCP sockets, the sockets & PCBs
for TCP sessions in TIME_WAIT state amount to many megabytes of dead
weight in RAM.
Maximum Segment Lifetimes Truncation (MSLT) assigns each TCP session to
a class based on the nearness of the peer. Corresponding to each class
is an MSL, and a session uses the MSL of its class. The classes are
loopback (local host equals remote host), local (local host and remote
host are on the same link/subnet), and remote (local host and remote
host communicate via one or more gateways). Classes corresponding to
nearer peers have lower MSLs by default: 2 seconds for loopback, 10
seconds for local, 60 seconds for remote. Loopback and local sessions
expire more quickly when MSLT is used.
Vestigial Time-Wait (VTW) replaces a TIME_WAIT session's PCB/socket
dead weight with a compact representation of the session, called a
"vestigial PCB". VTW data structures are designed to be very fast and
memory-efficient: for fast insertion and lookup of vestigial PCBs,
the PCBs are stored in a hash table that is designed to minimize the
number of cacheline visits per lookup/insertion. The memory both
for vestigial PCBs and for elements of the PCB hashtable come from
fixed-size pools, and linked data structures exploit this to conserve
memory by representing references with a narrow index/offset from the
start of a pool instead of a pointer. When space for new vestigial PCBs
runs out, VTW makes room by discarding old vestigial PCBs, oldest first.
VTW cooperates with MSLT.
It may help to think of VTW as a "FIN cache" by analogy to the SYN
cache.
A 2.8-GHz Pentium 4 running a test workload that creates TIME_WAIT
sessions as fast as it can is approximately 17% idle when VTW is active
versus 0% idle when VTW is inactive. It has 103 megabytes more free RAM
when VTW is active (approximately 64k vestigial PCBs are created) than
when it is inactive.
Pfsync interface exposes change in the pf(4) over a pseudo-interface, and can
be used to synchronise different pf.
This work was part of my 2009 GSoC
No objection on tech-net@
This commit mostly adds code written by Claudio Jeker for OpenBSD to
support sysctl in the interface printing parts (-i, -I, -w). The port has
been ported to NetBSD with tiny adjustments -- of course all bugs etc.
are mine.
Also add and document a -X flag to force sysctl usage. The documentation
notes this flag may be removed at any time and its presence should not be
relied on.
Some misc. comments/#ifdef changes/code snippet moves as well.
Please note that no functionality should change as the routing and
interface printing code is still not fully supported.
Mailing list reference:
http://mail-index.netbsd.org/tech-userlevel/2009/09/09/msg002604.html
Heavily based on similar code from Claudio Jeker (at OpenBSD).
While here, fix inet/inet6 sysctl stuff commited previously to
actually work, and some other nits to make netstat more sysctl
friendly.
One step closer to losing setgid kmem on this one...
net.bpf.stats and net.bpf.peers sysctls respectively. netstat(1) now
has an additional syntax:
netstat [-s] [-B] [-I Interface]
Only the super user can see a list of BPF peers with the following command:
# netstat -B
Active BPF peers
PID Int Recv Drop Capt Flags Bufsize Comm
4941 lo0 0 0 0 I--S- 262144 tcpdump
252 ex0 19668 0 5 I-RS- 32768 dhclient
And every user can see the BPF statistics with:
$ netstat -s -B
bpf:
19669 total packets received
5 total packets captured
0 total packets dropped
This idea came from FreeBSD (Christian S.J. Peron) but, currently, they
doen't have a userland utility in the base system to read the sysctls.
Reviewed by: christos@
Do a little mbuf rework while here. Change all uses of MGET*(*, M_WAIT, *)
to m_get*(M_WAIT, *). These are not performance critical and making them
call m_get saves considerable space. Add m_clget analogue of MCLGET and
make corresponding change for M_WAIT uses.
Modify netinet, gem, fxp, tulip, nfs to support MBUFTRACE.
Begin to change netstat to use sysctl.
fix PR7059 (partial): mbuf cluster counts were based on counters which
are no longer maintained.
(full fix will involve renaming the now-unused fields in mclstat in mbuf.h)
Add code to netstat to use libkvm to for kernel variables "mclbytes"
and "msize', and if found, use those for netstat -m rather than
compiled-in defaults.
of long-standing bugs:
- Actually deal with the fact that the kernel ifnet list is
a TAILQ; it just happened to work before.
- Use kvm_openfiles() instead of kvm_open(). The code passed
arguments to kvm_open() as if it were kvm_openfiles(), but
apparently went unnoticed since the prototypes are the same.
Amusing bit: there were XXX's in the code which seemed to
apologize for a verbose libkvm, when it happened to be a
bug in netstat!