Use kmem_zalloc() instead of kmem_alloc() + bzero().
During initialization, try to get all of the memory we need for the
vestigial time-wait structures before we set any of the structures up,
and if any single allocation fails, release all of the memory.
This should help low-memory hosts. A much better fix postpones
allocating any memory until vtw is enabled through the sysctl.
vtw_control() is called. In this way, vtw_tick() will be re-scheduled
repeatedly while vtw is in use.
Pay tcp_vtw_was_enabled no attention in vtw_earlyinit(), since it's
always going to be 0 during initialization.
ts_rtt is 1 plus the RTT, so that 0 can mean invalid measurement.
However, the code failed to subtract the 1 back out before use. With
this change, TCP from Massachusetts to France now typically has 1s RTO
values, rather than 1.5s.
This bug was found and fixed by Bev Schwartz of BBN. This material is
based upon work supported by the Defense Advanced Research Projects
Agency and Space and Naval Warfare Systems Center, Pacific, under
Contract No. N66001-09-C-2073. Approved for Public Release,
Distribution Unlimited
- introduce a limit for the routes accepted via IPv6 Router Advertisement:
a common 2 interface client will have 6, the default limit is 100 and
can be adjusted via sysctl
- report the current number of routes installed via RA via sysctl
- count discarded route additions. Note that one RA message is two routes.
This is at present only across all interfaces even though per-interface
would be more useful, since the per-interface structure complies to RFC2466
- bump kernel version due to the previous change
- adjust netstat to use the new value (with netstat -p icmp6)
methods called Vestigial Time-Wait (VTW) and Maximum Segment Lifetime
Truncation (MSLT).
MSLT and VTW were contributed by Coyote Point Systems, Inc.
Even after a TCP session enters the TIME_WAIT state, its corresponding
socket and protocol control blocks (PCBs) stick around until the TCP
Maximum Segment Lifetime (MSL) expires. On a host whose workload
necessarily creates and closes down many TCP sockets, the sockets & PCBs
for TCP sessions in TIME_WAIT state amount to many megabytes of dead
weight in RAM.
Maximum Segment Lifetimes Truncation (MSLT) assigns each TCP session to
a class based on the nearness of the peer. Corresponding to each class
is an MSL, and a session uses the MSL of its class. The classes are
loopback (local host equals remote host), local (local host and remote
host are on the same link/subnet), and remote (local host and remote
host communicate via one or more gateways). Classes corresponding to
nearer peers have lower MSLs by default: 2 seconds for loopback, 10
seconds for local, 60 seconds for remote. Loopback and local sessions
expire more quickly when MSLT is used.
Vestigial Time-Wait (VTW) replaces a TIME_WAIT session's PCB/socket
dead weight with a compact representation of the session, called a
"vestigial PCB". VTW data structures are designed to be very fast and
memory-efficient: for fast insertion and lookup of vestigial PCBs,
the PCBs are stored in a hash table that is designed to minimize the
number of cacheline visits per lookup/insertion. The memory both
for vestigial PCBs and for elements of the PCB hashtable come from
fixed-size pools, and linked data structures exploit this to conserve
memory by representing references with a narrow index/offset from the
start of a pool instead of a pointer. When space for new vestigial PCBs
runs out, VTW makes room by discarding old vestigial PCBs, oldest first.
VTW cooperates with MSLT.
It may help to think of VTW as a "FIN cache" by analogy to the SYN
cache.
A 2.8-GHz Pentium 4 running a test workload that creates TIME_WAIT
sessions as fast as it can is approximately 17% idle when VTW is active
versus 0% idle when VTW is inactive. It has 103 megabytes more free RAM
when VTW is active (approximately 64k vestigial PCBs are created) than
when it is inactive.
- don't forget ntohs.
- don't add hdrlen twice for l4 header offset.
- use M_CSUM_DATA_IPv4_IPHL instead of extracting it from ip header.
- simplify code.
- KNF.
Long ago, the storage representations of srtt and rttvar were changed
from the 4.4BSD scheme, and the comments are out of sync with the
code. This commit rewrites most of the comments that explain the RTO
calculations, and points out some issues in the code.
Joint work with Bev Schwartz of BBN (original analysis and comments),
but I have rewritten and extended them, so errors are mine.
This material is based upon work supported by the Defense Advanced
Research Projects Agency and Space and Naval Warfare Systems Center,
Pacific, under Contract No. N66001-09-C-2073. Approved for Public
Release, Distribution Unlimited
Initialize ipintrq.ifq_maxlen using IFQ_MAXLEN directly instead of using
the global ipqmaxlen. Get rid of the global ipqmaxlen.
Now it works again to override the maximum IP queue length with, for
example, sysctl -w net.inet.ip.ifq.maxlen=5.
mlelstv pointed out that we sometimes may use checksums on loopback
interfaces. Make the test consistent with the code path selecting
the checksum operation before invoking fragmentation.
will have an easier time replacing it with something different, even if
it is a second radix-trie implementation.
sys/net/route.c and sys/net/rtsock.c no longer operate directly on
radix_nodes or radix_node_heads.
Hopefully this will reduce the temptation to implement multipath or
source-based routing using grotty hacks to the grotty old radix-trie
code, too. :-)