Do a little mbuf rework while here. Change all uses of MGET*(*, M_WAIT, *)
to m_get*(M_WAIT, *). These are not performance critical and making them
call m_get saves considerable space. Add m_clget analogue of MCLGET and
make corresponding change for M_WAIT uses.
Modify netinet, gem, fxp, tulip, nfs to support MBUFTRACE.
Begin to change netstat to use sysctl.
optimization made last year. should solve PR 17867 and 10195.
IP_HDRINCL behavior of raw ip socket is kept unchanged. we may want to
provide IP_HDRINCL variant that does not swap endian.
benefit currently). Rework tcp_reass code to optimize the 4 most likely causes
of out-of-order packets: first OoO pkt, next OoO pkt in seq, OoO pkt is part
of new chuck of OoO packets, and the OoO pkt fills the first hole. Add evcnts
to instrument tcp_reass (enabled by the options TCP_REASS_COUNTERS). This is
part 1/2 of tcp_reass changes.
* Remove the code that allocates a cluster if the packet would
fit in one; it totally defeats doing references to M_EXT mbufs
in the socket buffer. This drastically reduces the number of
data copies in the tcp_output() path for applications which use
large writes. Kudos to Matt Thomas for pointing me in the right
direction.
deal with shortages of the VM maps where the backing pages are mapped
(usually kmem_map). Try to deal with this:
* Group all information about the backend allocator for a pool in a
separate structure. The pool references this structure, rather than
the individual fields.
* Change the pool_init() API accordingly, and adjust all callers.
* Link all pools using the same backend allocator on a list.
* The backend allocator is responsible for waiting for physical memory
to become available, but will still fail if it cannot callocate KVA
space for the pages. If this happens, carefully drain all pools using
the same backend allocator, so that some KVA space can be freed.
* Change pool_reclaim() to indicate if it actually succeeded in freeing
some pages, and use that information to make draining easier and more
efficient.
* Get rid of PR_URGENT. There was only one use of it, and it could be
dealt with by the caller.
From art@openbsd.org.
all open TCP connections in tcp_slowtimo() (which is called 2x
per second). It's fairly rare for TCP timers to actually fire,
so saving this list traversal is good, especially if you want
to scale to thousands of open connections.
network interfaces. This works by pre-computing the pseudo-header
checksum and caching it, delaying the actual checksum to ip_output()
if the hardware cannot perform the sum for us. In-bound checksums
can either be fully-checked by hardware, or summed up for final
verification by software. This method was modeled after how this
is done in FreeBSD, although the code is significantly different in
most places.
We don't delay checksums for IPv6/TCP, but we do take advantage of the
cached pseudo-header checksum.
Note: hardware-assisted checksumming defaults to "off". It is
enabled with ifconfig(8). See the manual page for details.
Implement hardware-assisted checksumming on the DP83820 Gigabit Ethernet,
3c90xB/3c90xC 10/100 Ethernet, and Alteon Tigon/Tigon2 Gigabit Ethernet.
ISS attacks (which we already fend off quite well).
1. First-cut implementation of RFC1948, Steve Bellovin's cryptographic
hash method of generating TCP ISS values. Note, this code is experimental
and disabled by default (experimental enough that I don't export the
variable via sysctl yet, either). There are a couple of issues I'd
like to discuss with Steve, so this code should only be used by people
who really know what they're doing.
2. Per a recent thread on Bugtraq, it's possible to determine a system's
uptime by snooping the RFC1323 TCP timestamp options sent by a host; in
4.4BSD, timestamps are created by incrementing the tcp_now variable
at 2 Hz; there's even a company out there that uses this to determine
web server uptime. According to Newsham's paper "The Problem With
Random Increments", while NetBSD's TCP ISS generation method is much
better than the "random increment" method used by FreeBSD and OpenBSD,
it is still theoretically possible to mount an attack against NetBSD's
method if the attacker knows how many times the tcp_iss_seq variable
has been incremented. By not leaking uptime information, we can make
that much harder to determine. So, we avoid the leak by giving each
TCP connection a timebase of 0.
- let ipfilter look at wire-format packet only (not the decapsulated ones),
so that VPN setting can work with NAT/ipfilter settings.
sync with kame.
TODO: use header history for stricter inbound validation
mandatory for IPv6 (so we can't just validate by using connected pcb - we need
to allow traffic from unconnected pcb to do pmtud).
- if the traffic is validated by xx_ctlinput, allow up to "hiwat" pmtud
route entries.
- if the traffic was not validated by xx_ctlinput, allow up to "lowat" pmtud
route entries (there's upper limit, so bad guys cannot blow up our routing
table).
sync with kame
XXX need to think again about default hiwat/lowat value.
XXX victim selection to help starvation case
TODO: udp6, and sendto consideration. as pmtud is mandatory for IPv6,
it is rather important for us to support those cases.
TODO: more testing
TODO: kame sync
entering rtentry's for hosts we're not actually communicating
with.
Do this by invoking the ctlinput for the protocol, which is
responsible for validating the ICMP message:
* TCP -- Lookup the connection based on the address/port
pairs in the ICMP message.
* AH/ESP -- Lookup the SA based on the SPI in the ICMP message.
If validation succeeds, ctlinput is responsible for calling
icmp_mtudisc(). icmp_mtudisc() then invokes callbacks registered
by protocols (such as TCP) which want to take some sort of special
action when a path's MTU changes. For TCP, this is where we now
refresh cached routes and re-enter slow-start.
As a side-effect, this fixes the problem where TCP would not be
notified when a path's MTU changed if AH/ESP were being used.
XXX Note, this is only a fix for the IPv4 case. For the IPv6
XXX case, we need to wait for the KAME folks.
Reviewed by sommerfeld@netbsd.org and itojun@netbsd.org.
ip_output(). This flag, if set, causes ip_output() to set
DF in the IP header if the MTU in the route is not locked.
This allows a bunch of redundant code, which I was never
really all that happy about adding in the first place, to
be eliminated.
Inspired by a similar change made by provos@openbsd.org when
he integrated NetBSD's Path MTU Discovery code into OpenBSD.
this avoids too aggressive memory usage on heavy load web server, for example.
From: Kevin Lahey <kml@dotrocket.com>
release and reallocate t_template, if t_template->m_len changes.
(this happens if we connect to IPv4 mapped destination and then IPv6
destination, on a single AF_INET6 socket)
KAME 1.26 -> 1.28