Reduces the resources demanded by TCP sessions in TIME_WAIT-state using
methods called Vestigial Time-Wait (VTW) and Maximum Segment Lifetime
Truncation (MSLT).
MSLT and VTW were contributed by Coyote Point Systems, Inc.
Even after a TCP session enters the TIME_WAIT state, its corresponding
socket and protocol control blocks (PCBs) stick around until the TCP
Maximum Segment Lifetime (MSL) expires. On a host whose workload
necessarily creates and closes down many TCP sockets, the sockets & PCBs
for TCP sessions in TIME_WAIT state amount to many megabytes of dead
weight in RAM.
Maximum Segment Lifetimes Truncation (MSLT) assigns each TCP session to
a class based on the nearness of the peer. Corresponding to each class
is an MSL, and a session uses the MSL of its class. The classes are
loopback (local host equals remote host), local (local host and remote
host are on the same link/subnet), and remote (local host and remote
host communicate via one or more gateways). Classes corresponding to
nearer peers have lower MSLs by default: 2 seconds for loopback, 10
seconds for local, 60 seconds for remote. Loopback and local sessions
expire more quickly when MSLT is used.
Vestigial Time-Wait (VTW) replaces a TIME_WAIT session's PCB/socket
dead weight with a compact representation of the session, called a
"vestigial PCB". VTW data structures are designed to be very fast and
memory-efficient: for fast insertion and lookup of vestigial PCBs,
the PCBs are stored in a hash table that is designed to minimize the
number of cacheline visits per lookup/insertion. The memory both
for vestigial PCBs and for elements of the PCB hashtable come from
fixed-size pools, and linked data structures exploit this to conserve
memory by representing references with a narrow index/offset from the
start of a pool instead of a pointer. When space for new vestigial PCBs
runs out, VTW makes room by discarding old vestigial PCBs, oldest first.
VTW cooperates with MSLT.
It may help to think of VTW as a "FIN cache" by analogy to the SYN
cache.
A 2.8-GHz Pentium 4 running a test workload that creates TIME_WAIT
sessions as fast as it can is approximately 17% idle when VTW is active
versus 0% idle when VTW is inactive. It has 103 megabytes more free RAM
when VTW is active (approximately 64k vestigial PCBs are created) than
when it is inactive.
2011-05-03 22:28:44 +04:00
|
|
|
# $NetBSD: Makefile,v 1.20 2011/05/03 18:28:45 dyoung Exp $
|
1998-06-13 03:22:30 +04:00
|
|
|
|
|
|
|
INCSDIR= /usr/include/netinet
|
|
|
|
|
2004-10-01 19:24:45 +04:00
|
|
|
INCS= icmp6.h icmp_var.h if_atm.h if_ether.h if_inarp.h igmp.h \
|
Here are various changes designed to protect against bad IPv4
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).
Stale IPv6 and ISO route caches will be treated by separate patches.
Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.
Here are the details:
Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.
Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.
Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.
In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.
In gif(4), discard the workaround for stale caches that involves
expiring them every so often.
Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).
Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.
Take care when moving a 'struct route' to rtflush() the source and
to rtcache() the destination.
In domain initializers, use .dom_xxx tags.
KNF here and there.
2006-12-09 08:33:04 +03:00
|
|
|
igmp_var.h in.h in_gif.h in_pcb.h in_pcb_hdr.h \
|
|
|
|
in_selsrc.h in_systm.h \
|
Work in progress: use a raw socket for GRE in IP encapsulation
instead of adding/subtracting our own IPv4 header.
There are many benefits: gre(4) needn't grok the outer encapsulation
header any longer, so this simplifies the gre(4) code. The IP
stack needn't grok GRE, so it is simplified, too. gre(4) will
benefit from optimizations in the socket code. Eventually, gre(4)
will gain an IPv6 encapsulation with very few new lines of code.
There is a small performance loss. A 133 MHz, 486-class AMD Elan
sinks/sources a TCP stream over GRE with about 93% the throughput
of the old code. TCP throughput on a 266 MHz, 586-class AMD Geode
is about 96% the throughput of the old code. A 175-MHz ADM5120
(MIPS) only sinks a TCP stream over GRE at about 90% of the old
code; I am still investigating that.
I produced stripped-down versions of sosend() and soreceive() for
gre(4) to use. They are guaranteed not to block, so they can be
called from a software interrupt and from a socket upcall,
respectively.
A kernel thread is no longer necessary for socket transmit/receive,
but I didn't get around to removing it, yet.
Thanks to Matt Thomas for suggesting the use of stripped-down socket
code and software interrupts, and to Andrew Doran for advice and
answers concerning software interrupts, threads, and performance.
2007-10-05 07:28:12 +04:00
|
|
|
in_var.h ip.h ip_carp.h ip6.h ip_ecn.h ip_encap.h \
|
2005-07-09 18:15:11 +04:00
|
|
|
ip_icmp.h ip_mroute.h ip_var.h pim.h pim_var.h \
|
2004-10-01 19:24:45 +04:00
|
|
|
tcp.h tcp_debug.h tcp_fsm.h tcp_seq.h tcp_timer.h tcp_var.h \
|
Reduces the resources demanded by TCP sessions in TIME_WAIT-state using
methods called Vestigial Time-Wait (VTW) and Maximum Segment Lifetime
Truncation (MSLT).
MSLT and VTW were contributed by Coyote Point Systems, Inc.
Even after a TCP session enters the TIME_WAIT state, its corresponding
socket and protocol control blocks (PCBs) stick around until the TCP
Maximum Segment Lifetime (MSL) expires. On a host whose workload
necessarily creates and closes down many TCP sockets, the sockets & PCBs
for TCP sessions in TIME_WAIT state amount to many megabytes of dead
weight in RAM.
Maximum Segment Lifetimes Truncation (MSLT) assigns each TCP session to
a class based on the nearness of the peer. Corresponding to each class
is an MSL, and a session uses the MSL of its class. The classes are
loopback (local host equals remote host), local (local host and remote
host are on the same link/subnet), and remote (local host and remote
host communicate via one or more gateways). Classes corresponding to
nearer peers have lower MSLs by default: 2 seconds for loopback, 10
seconds for local, 60 seconds for remote. Loopback and local sessions
expire more quickly when MSLT is used.
Vestigial Time-Wait (VTW) replaces a TIME_WAIT session's PCB/socket
dead weight with a compact representation of the session, called a
"vestigial PCB". VTW data structures are designed to be very fast and
memory-efficient: for fast insertion and lookup of vestigial PCBs,
the PCBs are stored in a hash table that is designed to minimize the
number of cacheline visits per lookup/insertion. The memory both
for vestigial PCBs and for elements of the PCB hashtable come from
fixed-size pools, and linked data structures exploit this to conserve
memory by representing references with a narrow index/offset from the
start of a pool instead of a pointer. When space for new vestigial PCBs
runs out, VTW makes room by discarding old vestigial PCBs, oldest first.
VTW cooperates with MSLT.
It may help to think of VTW as a "FIN cache" by analogy to the SYN
cache.
A 2.8-GHz Pentium 4 running a test workload that creates TIME_WAIT
sessions as fast as it can is approximately 17% idle when VTW is active
versus 0% idle when VTW is inactive. It has 103 megabytes more free RAM
when VTW is active (approximately 64k vestigial PCBs are created) than
when it is inactive.
2011-05-03 22:28:44 +04:00
|
|
|
tcpip.h udp.h udp_var.h \
|
|
|
|
tcp_vtw.h
|
1998-06-13 03:22:30 +04:00
|
|
|
|
2004-10-05 08:55:48 +04:00
|
|
|
# ipfilter headers
|
|
|
|
# XXX shouldn't be here
|
2005-02-22 17:39:58 +03:00
|
|
|
.include <bsd.own.mk>
|
|
|
|
|
|
|
|
INCS+= ip_compat.h # always needed by kdump(1)
|
|
|
|
|
|
|
|
.if (${MKIPFILTER} != "no")
|
|
|
|
INCS+= ip_auth.h ip_fil.h ip_frag.h ip_htable.h ip_nat.h \
|
2005-07-09 18:15:11 +04:00
|
|
|
ip_lookup.h ip_pool.h ip_proxy.h ip_scan.h ip_state.h ip_sync.h \
|
|
|
|
ipl.h
|
2005-02-22 17:39:58 +03:00
|
|
|
.endif
|
2004-10-01 19:24:45 +04:00
|
|
|
|
1998-06-13 03:22:30 +04:00
|
|
|
.include <bsd.kinc.mk>
|
2004-10-01 19:24:45 +04:00
|
|
|
|
|
|
|
.PATH: ${NETBSDSRCDIR}/sys/dist/ipf/netinet
|