mbuf chains which are recycled (e.g., ICMP reflection, loopback
interface). A consensus was reached that such recycled packets should
behave (more-or-less) the same way if a new chain had been allocated
and the contents copied to that chain.
Some packet tags may in future be marked as "persistent" (e.g., for
mandatory access controls) and should persist across such deletion.
NetBSD as yet hos no persistent tags, so m_tag_delete_nonpersistent()
just deletes all tags. This should not be relied upon.
in_ifaddrhead. Recent changes in struct names caused a namespace
collision in fast-ipsec, which are most cleanly fixed by using
"in_ifaddrhead" as the listhead name.
sysctl. Add a protocol-independent sysctl handler to show the per-protocol
"struct ifq' statistics. Add IP(v4) specific call to the handler.
Other protocols can show their per-protocol input statistics by
allocating a sysclt node and calling sysctl_ifq() with their own struct ifq *.
As posted to tech-kern plus improvements/cleanup suggested by Andrew Brown.
initialized. Update the txp(4) to compensate.
- Statically initialize the TCP timer callout handles in the tcpcb
template. We still use callout_setfunc(), but that call is now much
less expensive. Add a comment that the compiler is likely to unroll
the loop (so don't sweat that it's there).
During testing the prediction counters show a hit-rate on about 85% for
packets sent on a local LAN, and better than 99% for intercontinental
high-speed bulk traffic (!).
close sockets on address changes, which was deemed to be a bad idea and was
summarily removed, so there is no point in wasting effort on maintaining it
any more.
individually, create a tcpcb template pre-initialized (and pre-zero'd)
with the static and mostly-static tcpcb parameters. The template is
now copied into the new tcpcb, which zeros and initializes most of the
tcpcb in one pass. The template is kept up-to-date as TCP sysctl
variables are changed.
Combined with the previous sb_max change, TCP socket creation is now
25% faster.
throughput significantly in a wide variety of test cases, including
local gigabit ethernet with both jumbo and standard frames,
transcontinental (U.S.) connections with e2e bandwidths ranging from
10Mbit/sec to 155Mbit/sec, and on a variety of test connections
between the NetBSD Project public servers and machines in Australia.
The impact of this change is less dramatic for high-delay connections
when Path MTU is in use but still measurable.
For optimal performance on local gigabit networks, a higher socket
buffer size (at least 64K) will still yield a substantial improvement
in performance, but 32K gets us most of the way there in my test
cases, with only a cost of _doubling_ memory use per socket rather
than _quadrupling_ it.
N.B. Windows NT, at least since Win2k SP2, uses a default socket buffer
size (or their analogue thereof) of 64K, which is a useful data
point.
the purpose of this code appears to be on crack -- it's talking about
end-to-end authentication, but the purpose of an AH tunnel is NOT end-to-end
authentication; it's authentication of the tunnel endpoints.
NB: This does not fix the fact that IPsec leaks "packet tags."
argument. So check so is non-NULL before doing the pointer-chasing
dance to find the PCB. (Unless and until we rework fast-ipsec and
KAME, to pass a struct in_pcbhdr * instead of the struct socket *).
argument to ip6_output() with a new explicit struct in6pcb* argument.
(The underlying socket can be obtained via in6pcb->inp6_socket.)
In preparation for fast-ipsec. Reviewed by itojun.
reported by Shiva Shenoy
while we're here, fix another problem when the same interface address is
assigned to !IFF_MULTICAST and IFF_MULTICAST interface. if ip_multicast_if()
returns the first one, join/leave will fail, which is not an desired effect.
configured with ``options FAST_IPSEC''. Kernels with KAME IPsec or
with no IPsec should work as before.
All calls to ip_output() now always pass an additional compulsory
argument: the inpcb associated with the packet being sent,
or 0 if no inpcb is available.
Fast-ipsec tested with ICMP or UDP over ESP. TCP doesn't work, yet.
cooperating with the callout code in working around the race
condition caused by the TCP code's use of the callout facility.
Instead of unconditionally releasing memory in tcp_close() and
SYN_CACHE_PUT(), check whether any of the related callout handlers
are about to be invoked (but have not yet done callout_ack()), and
if so, just mark the associated data structure (tcpcb or syn cache
entry) as "dead", and test for this (and release storage) in the
callout handler functions.
sent from. This change avoid a linear search through all mbufs when using
large TCP windows, and therefore permit high-speed connections on long
distances.
Tested on a 1 Gigabit connection between Luleå and San Francisco, a distance
of about 15000km. With TCP windows of just over 20 Mbytes it could keep up
with 950Mbit/s.
After discussions with Matt Thomas and Jason Thorpe.
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.
Bump the kernel rev up to 1.6V
manner as the ifaddr hash table. By doing this, the mkludge code can go
away. At the same time, keep track of what pcbs are using what ifaddr and
when an address is deleted from an interface, notify/abort all sockets
that have that address as a source. Switch IGMP and multicasts to use pools
for allocation. Fix a number of potential problems in the igmp code where
allocation failures could cause a trap/panic.
- routing header declaration with RFC3542
(note: sizeof(ip6_rthdr0) has changed!)
also, sync up with RFC2460 routing header definition (no "strict" source
routing mode any more)
part of advanced API update (RFC2292 -> 3542).
by the application, all NetBSD interfaces are made visible, even
if some other feature-test macro (like _POSIX_C_SOURCE) is defined.
<sys/featuretest.h> defined _NETBSD_SOURCE if none of _ANSI_SOURCE,
_POSIX_C_SOURCE and _XOPEN_SOURCE is defined, so as to preserve
existing behaviour.
This has two major advantages:
+ Programs that require non-POSIX facilities but define _POSIX_C_SOURCE
can trivially be overruled by putting -D_NETBSD_SOURCE in their CFLAGS.
+ It makes most of the #ifs simpler, in that they're all now ORs of the
various macros, rather than having checks for (!defined(_ANSI_SOURCE) ||
!defined(_POSIX_C_SOURCE) || !defined(_XOPEN_SOURCE)) all over the place.
I've tried not to change the semantics of the headers in any case where
_NETBSD_SOURCE wasn't defined, but there were some places where the
current semantics were clearly mad, and retaining them was harder than
correcting them. In particular, I've mostly normalised things so that
_ANSI_SOURCE gets you the smallest set of stuff, then _POSIX_C_SOURCE,
_XOPEN_SOURCE and _NETBSD_SOURCE in that order.
Tested by building for vax, encouraged by thorpej, and uncontested in
tech-userlevel for a week.
If there are any "old programs which incorrectly set this" left,
they will now fail with EAFNOSUPPORT.
This make in_pcbbind() consistent with in_pcbconnect() and the other
protocol families.
As per my PR [kern/4441], which has the comment:
Steven's "TCP/IP Illustrated, Volume 2", page 730, notes that
in_pcbbind() has the check which determines if sin_family == AF_INET
commented out, but the same check in in_pcbconnect() is still active.
Do a little mbuf rework while here. Change all uses of MGET*(*, M_WAIT, *)
to m_get*(M_WAIT, *). These are not performance critical and making them
call m_get saves considerable space. Add m_clget analogue of MCLGET and
make corresponding change for M_WAIT uses.
Modify netinet, gem, fxp, tulip, nfs to support MBUFTRACE.
Begin to change netstat to use sysctl.
class, so in a C++ environment rename the ip_opts member to Ip_opts as
observed in several other implementations; from Jon Olsson in
PR toolchain/19880.
kqueue provides a stateful and efficient event notification framework
currently supported events include socket, file, directory, fifo,
pipe, tty and device changes, and monitoring of processes and signals
kqueue is supported by all writable filesystems in NetBSD tree
(with exception of Coda) and all device drivers supporting poll(2)
based on work done by Jonathan Lemon for FreeBSD
initial NetBSD port done by Luke Mewburn and Jason Thorpe
Remove the set-but-not-used "proto" variable.
Guard the "ostate" variable in #ifdef TCP_DEBUG.
Remove the set-but-not-used "parentinpcb" variable in syn_cache_get().
netinet/files.ipfilter, etinet/files.netinet, netinet6/files.netinet6,
and netinet6/files.netipsec.
XXX There are still a few stragglers in conf/files, which are entangled
with other network protocols.
hunting for an MSS option to clamp. The previous code assumed that at least
one more byte of options (such as a TCPOPT_EOL) would follow the MSS
option; now, we allow the MSS option to end on the last byte of the
TCP header.
Packets have been observed "in the wild" with a TCP header length of
'6' (24 bytes.. 20 bytes fixed header, 4 bytes options) with a 4-byte
MSS option exactly filling the 4 bytes of options payload and no
following TCPOPT_EOL.
RFC793 is quite explicit that the EOL byte:
" .. need only be used if the end of the options would not
otherwise coincide with the end of the TCP header."
"In rare cases when there is no room for ip options ip_insertoptions()
can fail and corrupt a header length. Initialize len and check what
ip_insertoptions() returns."
This merge changes the device switch tables from static array to
dynamically generated by config(8).
- All device switches is defined as a constant structure in device drivers.
- The new grammer ``device-major'' is introduced to ``files''.
device-major <prefix> char <num> [block <num>] [<rules>]
- All device major numbers must be listed up in port dependent majors.<arch>
by using this grammer.
- Added the new naming convention.
The name of the device switch must be <prefix>_[bc]devsw for auto-generation
of device switch tables.
- The backward compatibility of loading block/character device
switch by LKM framework is broken. This is necessary to convert
from block/character device major to device name in runtime and vice versa.
- The restriction to assign device major by LKM is completely removed.
We don't need to reserve LKM entries for dynamic loading of device switch.
- In compile time, device major numbers list is packed into the kernel and
the LKM framework will refer it to assign device major number dynamically.
TCPCB .. the fields need to be converted back to net-order, because
the packet is checksummed after the TCPCB lookup happens.
From YAMAMOTO Takashi <yamt@mwd.biglobe.ne.jp>.
we can always keep 2 packets on the wire, no matter what SO_SNDBUF is,
and therefore ACKs will never be delayed unless we run out of data to
transmit. The problem is quite easy to tickle when the MTU of the
outgoing interface is larger than the socket buffer size (e.g. loopback).
Fix from Charles Hannum.
optimization made last year. should solve PR 17867 and 10195.
IP_HDRINCL behavior of raw ip socket is kept unchanged. we may want to
provide IP_HDRINCL variant that does not swap endian.
value of the TCP_NODELAY socket option from the listener to the
newly connected connection. Agrees with how Linux & FreeBSD behave,
and goes more with the spirit of accept(2) creating a socket with
the same properties as the listener.
Analysis by Kevin Lahey. Closes PR 17616 by myself.