pass it to in_pcbbind() so that can allocate a low numbered port
if setsockopt() has been used to set IP_PORTRANGE to IP_PORTRANGE_LOW.
While there, fail in_pcbconnect() if the in_pcbbind() fails - rather
than sending the request out from a port of zero.
This has been largely broken since the socket option was added in 1998.
The __UNCONST macro is now used only where necessary and the RW macros
are gone. Most of the changes here are consumers of the
sysctl_createv(9) interface that now takes a pair of const pointers
which used not to be.
chgsbsize() as mentioned in the PR can be called from an interrupt context
via tcp_close(). Avoid calling uid_find() in chgsbsize().
- Instead of storing so_uid in struct socketvar, store *so_uidinfo
- Add a simple lock to struct uidinfo.
net.local.stream.pcblist
net.local.dgram.pcblist
net.inet.tcp.pcblist
net.inet.udp.pcblist
net.inet.raw.pcblist
net.inet6.tcp6.pcblist
net.inet6.udp6.pcblist
net.inet6.raw6.pcblist
which allow retrieval of the pcbs in use for those protocols. The
struct involved is 32/64 bit clean and incorporates parts of struct
inpcb, struct unpcb, a bit of struct tcpcb, and two socket addresses.
(MD5 signatures for TCP, as used with BGP). Credit for original
FreeBSD code goes to Bruce M. Simpson, with FreeBSD sponsorship
credited to sentex.net. Shortening of the setsockopt() name
attributed to Vincent Jardin.
This commit is a minimal, working version of the FreeBSD code, as
MFC'ed to FreeBSD-4. It has received minimal testing with a ttcp
modified to set the TCP-MD5 option; BMS's additions to tcpdump-current
(tcpdump -M) confirm that the MD5 signatures are correct. Committed
as-is for further testing between a NetBSD BGP speaker (e.g., quagga)
and industry-standard BGP speakers (e.g., Cisco, Juniper).
NOTE: This version has two potential flaws. First, I do see any code
that verifies recieved TCP-MD5 signatures. Second, the TCP-MD5
options are internally padded and assumed to be 32-bit aligned. A more
space-efficient scheme is to pack all TCP options densely (and
possibly unaligned) into the TCP header ; then do one final padding to
a 4-byte boundary. Pre-existing comments note that accounting for
TCP-option space when we add SACK is yet to be done. For now, I'm
punting on that; we can solve it properly, in a way that will handle
SACK blocks, as a separate exercise.
In case a pullup to NetBSD-2 is requested, this adds sys/netipsec/xform_tcp.c
,and modifies:
sys/net/pfkeyv2.h,v 1.15
sys/netinet/files.netinet,v 1.5
sys/netinet/ip.h,v 1.25
sys/netinet/tcp.h,v 1.15
sys/netinet/tcp_input.c,v 1.200
sys/netinet/tcp_output.c,v 1.109
sys/netinet/tcp_subr.c,v 1.165
sys/netinet/tcp_usrreq.c,v 1.89
sys/netinet/tcp_var.h,v 1.109
sys/netipsec/files.netipsec,v 1.3
sys/netipsec/ipsec.c,v 1.11
sys/netipsec/ipsec.h,v 1.7
sys/netipsec/key.c,v 1.11
share/man/man4/tcp.4,v 1.16
lib/libipsec/pfkey.c,v 1.20
lib/libipsec/pfkey_dump.c,v 1.17
lib/libipsec/policy_token.l,v 1.8
sbin/setkey/parse.y,v 1.14
sbin/setkey/setkey.8,v 1.27
sbin/setkey/token.l,v 1.15
Note that the preceding two revisions to tcp.4 will be
required to cleanly apply this diff.
Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.
Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.
All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.
PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.
individually, create a tcpcb template pre-initialized (and pre-zero'd)
with the static and mostly-static tcpcb parameters. The template is
now copied into the new tcpcb, which zeros and initializes most of the
tcpcb in one pass. The template is kept up-to-date as TCP sysctl
variables are changed.
Combined with the previous sb_max change, TCP socket creation is now
25% faster.
throughput significantly in a wide variety of test cases, including
local gigabit ethernet with both jumbo and standard frames,
transcontinental (U.S.) connections with e2e bandwidths ranging from
10Mbit/sec to 155Mbit/sec, and on a variety of test connections
between the NetBSD Project public servers and machines in Australia.
The impact of this change is less dramatic for high-delay connections
when Path MTU is in use but still measurable.
For optimal performance on local gigabit networks, a higher socket
buffer size (at least 64K) will still yield a substantial improvement
in performance, but 32K gets us most of the way there in my test
cases, with only a cost of _doubling_ memory use per socket rather
than _quadrupling_ it.
N.B. Windows NT, at least since Win2k SP2, uses a default socket buffer
size (or their analogue thereof) of 64K, which is a useful data
point.
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.
Bump the kernel rev up to 1.6V
Do a little mbuf rework while here. Change all uses of MGET*(*, M_WAIT, *)
to m_get*(M_WAIT, *). These are not performance critical and making them
call m_get saves considerable space. Add m_clget analogue of MCLGET and
make corresponding change for M_WAIT uses.
Modify netinet, gem, fxp, tulip, nfs to support MBUFTRACE.
Begin to change netstat to use sysctl.
* Keep pointers to the first and last mbufs of the last record in the
socket buffer.
* Use the sb_lastrecord pointer in the sbappend*() family of functions
to avoid traversing the packet chain to find the last record.
* Add a new sbappend_stream() function for stream protocols which
guarantee that there will never be more than one record in the
socket buffer. This function uses the sb_mbtail pointer to perform
the data insertion. Make TCP use sbappend_stream().
On a profiling run, this makes sbappend of a TCP transmission using
a 1M socket buffer go from 50% of the time to .02% of the time.
Thanks to Bill Sommerfeld and YAMAMOTO Takashi for their debugging
assistance!
This avoids kernel crashes when we don't handle nonsensial values
like 0 gracefully. Better check here once beforehand than having to
check for non meaningful values in time critical paths (like tcp_output).
Fixes PR 15709.
as config(8) will warn for value-less defparam options
- minor whitespace/formatting cleanup
- consolidate opt_tcp_recvspace.h and opt_tcp_sendspace.h into opt_tcp_space.h