Long ago, the storage representations of srtt and rttvar were changed
from the 4.4BSD scheme, and the comments are out of sync with the
code. This commit rewrites most of the comments that explain the RTO
calculations, and points out some issues in the code.
Joint work with Bev Schwartz of BBN (original analysis and comments),
but I have rewritten and extended them, so errors are mine.
This material is based upon work supported by the Defense Advanced
Research Projects Agency and Space and Naval Warfare Systems Center,
Pacific, under Contract No. N66001-09-C-2073. Approved for Public
Release, Distribution Unlimited
Initialize ipintrq.ifq_maxlen using IFQ_MAXLEN directly instead of using
the global ipqmaxlen. Get rid of the global ipqmaxlen.
Now it works again to override the maximum IP queue length with, for
example, sysctl -w net.inet.ip.ifq.maxlen=5.
mlelstv pointed out that we sometimes may use checksums on loopback
interfaces. Make the test consistent with the code path selecting
the checksum operation before invoking fragmentation.
will have an easier time replacing it with something different, even if
it is a second radix-trie implementation.
sys/net/route.c and sys/net/rtsock.c no longer operate directly on
radix_nodes or radix_node_heads.
Hopefully this will reduce the temptation to implement multipath or
source-based routing using grotty hacks to the grotty old radix-trie
code, too. :-)
This is possible now since softints have a thread context. It's
also not a very frequent code path. Addresses ABI issue with delay
(kern/40505).
I'm not entire sure what this delay is meant to accomplish, though.
the callout if needed so frees are not delayed too much.
syn_cache_timer(): we can't call syn_cache_put() here any more,
so move code deleted from syn_cache_put() here.
Avoid KASSERT() in kern_timeout.c because pool_put() is called from
ipintr context, as reported in
http://mail-index.netbsd.org/tech-kern/2010/03/19/msg007762.html
Thanks to Andrew Doran and Mindaugas Rasiukevicius for help and review.
they are) avoid repeated kernel_lock/unlock by using an intrq on the stack.
About 5%-10% better from run to run, on my *very* simpleminded test. Can't
possibly be worse.
the driver output path (that is, ifp->if_output()). In the case of
entry through the socket code, we are fine, because pru_usrreq takes
KERNEL_LOCK. However, there are a few other ways to cause output
which require protection:
1) direct calls to tcp_output() in tcp_input()
2) fast-forwarding code (ip_flow) -- protected elsewise
against itself by the softnet lock.
3) *Possibly* the ARP code. I have currently persuaded
myself that it is safe because of how it's called.
4) Possibly the ICMP code.
This change addresses #1 and #2.
hardware interrupts for the length of time it took for all dequeued
packets to flow up the stack (on multiprocessors only). Initial testing
shows performance impact is minimal -- since this temporary fix actually
means taking/releasing the kernel lock per-packet, that seems
acceptable.
Holding the kernel lock across the ip_input() call duplicated the
exclusion intended to be provided by the socket locks/softnet lock
(same lock, for INET/INET6 sockets) and could mask serious bugs. Several
hours' testing didn't turn any up but I'd be surprised if some don't now
appear.
Damon Permezel noticed the problem. Temporary fix suggested by matt@.