from Kentaro A. Kurahone, with minor adjustments by me.
the ack prediction part of the original patch was omitted because
it's a separate change. reviewed by Rui Paulo.
The code to generate an ISS via an MD5 hash has been present in the
NetBSD kernel since 2001, but it wasn't even exported to userland at
that time. It was agreed on tech-net with the original author <thorpej>
that we should let the user decide if he wants to enable it or not.
Not enabled by default.
credentials on sockets, at least not anytime soon, this is a way to check
if we can "look" at a socket. Later on when (and if) we do have socket
credentials, the interface usage remains the same because we pass the
socket.
This also fixes sysctl for inet/inet6 pcblist.
happen in the TCP stack, this interface calls the specified callback to
handle the situation according to the currently selected congestion
control algorithm.
A new sysctl node was created: net.inet.tcp.congctl.{available,selected}
with obvious meanings.
The old net.inet.tcp.newreno MIB was removed.
The API is discussed in tcp_congctl(9).
In the near future, it will be possible to selected a congestion control
algorithm on a per-socket basis.
Discussed on tech-net and reviewed by <yamt>.
with spl used to protect other allocations and frees, or datastructure
element insertion and removal, in adjacent code.
It is almost unquestionably the case that some of the spl()/splx() calls
added here are superfluous, but it really seems wrong to see:
s=splfoo();
/* frob data structure */
splx(s);
pool_put(x);
and if we think we need to protect the first operation, then it is hard
to see why we should not think we need to protect the next. "Better
safe than sorry".
It is also almost unquestionably the case that I missed some pool
gets/puts from interrupt context with my strategy for finding these
calls; use of PR_NOWAIT is a strong hint that a pool may be used from
interrupt context but many callers in the kernel pass a "can wait/can't
wait" flag down such that my searches might not have found them. One
notable area that needs to be looked at is pf.
See also:
http://mail-index.netbsd.org/tech-kern/2006/07/19/0003.htmlhttp://mail-index.netbsd.org/tech-kern/2006/07/19/0009.html
- Add a few scopes to the kernel: system, network, and machdep.
- Add a few more actions/sub-actions (requests), and start using them as
opposed to the KAUTH_GENERIC_ISSUSER place-holders.
- Introduce a basic set of listeners that implement our "traditional"
security model, called "bsd44". This is the default (and only) model we
have at the moment.
- Update all relevant documentation.
- Add some code and docs to help folks who want to actually use this stuff:
* There's a sample overlay model, sitting on-top of "bsd44", for
fast experimenting with tweaking just a subset of an existing model.
This is pretty cool because it's *really* straightforward to do stuff
you had to use ugly hacks for until now...
* And of course, documentation describing how to do the above for quick
reference, including code samples.
All of these changes were tested for regressions using a Python-based
testsuite that will be (I hope) available soon via pkgsrc. Information
about the tests, and how to write new ones, can be found on:
http://kauth.linbsd.org/kauthwiki
NOTE FOR DEVELOPERS: *PLEASE* don't add any code that does any of the
following:
- Uses a KAUTH_GENERIC_ISSUSER kauth(9) request,
- Checks 'securelevel' directly,
- Checks a uid/gid directly.
(or if you feel you have to, contact me first)
This is still work in progress; It's far from being done, but now it'll
be a lot easier.
Relevant mailing list threads:
http://mail-index.netbsd.org/tech-security/2006/01/25/0011.htmlhttp://mail-index.netbsd.org/tech-security/2006/03/24/0001.htmlhttp://mail-index.netbsd.org/tech-security/2006/04/18/0000.htmlhttp://mail-index.netbsd.org/tech-security/2006/05/15/0000.htmlhttp://mail-index.netbsd.org/tech-security/2006/08/01/0000.htmlhttp://mail-index.netbsd.org/tech-security/2006/08/25/0000.html
Many thanks to YAMAMOTO Takashi, Matt Thomas, and Christos Zoulas for help
stablizing kauth(9).
Full credit for the regression tests, making sure these changes didn't break
anything, goes to Matt Fleming and Jaime Fournier.
Happy birthday Randi! :)
Both available for IPv4 and IPv6.
Basic implementation test results are available at
http://netbsd-soc.sourceforge.net/projects/ecn/testresults.html.
Work sponsored by the Google Summer of Code project 2006.
Special thanks to Kentaro Kurahone, Allen Briggs and Matt Thomas for their
help, comments and support during the project.
pass it to in_pcbbind() so that can allocate a low numbered port
if setsockopt() has been used to set IP_PORTRANGE to IP_PORTRANGE_LOW.
While there, fail in_pcbconnect() if the in_pcbbind() fails - rather
than sending the request out from a port of zero.
This has been largely broken since the socket option was added in 1998.
The __UNCONST macro is now used only where necessary and the RW macros
are gone. Most of the changes here are consumers of the
sysctl_createv(9) interface that now takes a pair of const pointers
which used not to be.
chgsbsize() as mentioned in the PR can be called from an interrupt context
via tcp_close(). Avoid calling uid_find() in chgsbsize().
- Instead of storing so_uid in struct socketvar, store *so_uidinfo
- Add a simple lock to struct uidinfo.
net.local.stream.pcblist
net.local.dgram.pcblist
net.inet.tcp.pcblist
net.inet.udp.pcblist
net.inet.raw.pcblist
net.inet6.tcp6.pcblist
net.inet6.udp6.pcblist
net.inet6.raw6.pcblist
which allow retrieval of the pcbs in use for those protocols. The
struct involved is 32/64 bit clean and incorporates parts of struct
inpcb, struct unpcb, a bit of struct tcpcb, and two socket addresses.
(MD5 signatures for TCP, as used with BGP). Credit for original
FreeBSD code goes to Bruce M. Simpson, with FreeBSD sponsorship
credited to sentex.net. Shortening of the setsockopt() name
attributed to Vincent Jardin.
This commit is a minimal, working version of the FreeBSD code, as
MFC'ed to FreeBSD-4. It has received minimal testing with a ttcp
modified to set the TCP-MD5 option; BMS's additions to tcpdump-current
(tcpdump -M) confirm that the MD5 signatures are correct. Committed
as-is for further testing between a NetBSD BGP speaker (e.g., quagga)
and industry-standard BGP speakers (e.g., Cisco, Juniper).
NOTE: This version has two potential flaws. First, I do see any code
that verifies recieved TCP-MD5 signatures. Second, the TCP-MD5
options are internally padded and assumed to be 32-bit aligned. A more
space-efficient scheme is to pack all TCP options densely (and
possibly unaligned) into the TCP header ; then do one final padding to
a 4-byte boundary. Pre-existing comments note that accounting for
TCP-option space when we add SACK is yet to be done. For now, I'm
punting on that; we can solve it properly, in a way that will handle
SACK blocks, as a separate exercise.
In case a pullup to NetBSD-2 is requested, this adds sys/netipsec/xform_tcp.c
,and modifies:
sys/net/pfkeyv2.h,v 1.15
sys/netinet/files.netinet,v 1.5
sys/netinet/ip.h,v 1.25
sys/netinet/tcp.h,v 1.15
sys/netinet/tcp_input.c,v 1.200
sys/netinet/tcp_output.c,v 1.109
sys/netinet/tcp_subr.c,v 1.165
sys/netinet/tcp_usrreq.c,v 1.89
sys/netinet/tcp_var.h,v 1.109
sys/netipsec/files.netipsec,v 1.3
sys/netipsec/ipsec.c,v 1.11
sys/netipsec/ipsec.h,v 1.7
sys/netipsec/key.c,v 1.11
share/man/man4/tcp.4,v 1.16
lib/libipsec/pfkey.c,v 1.20
lib/libipsec/pfkey_dump.c,v 1.17
lib/libipsec/policy_token.l,v 1.8
sbin/setkey/parse.y,v 1.14
sbin/setkey/setkey.8,v 1.27
sbin/setkey/token.l,v 1.15
Note that the preceding two revisions to tcp.4 will be
required to cleanly apply this diff.
Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.
Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.
All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.
PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.
individually, create a tcpcb template pre-initialized (and pre-zero'd)
with the static and mostly-static tcpcb parameters. The template is
now copied into the new tcpcb, which zeros and initializes most of the
tcpcb in one pass. The template is kept up-to-date as TCP sysctl
variables are changed.
Combined with the previous sb_max change, TCP socket creation is now
25% faster.
throughput significantly in a wide variety of test cases, including
local gigabit ethernet with both jumbo and standard frames,
transcontinental (U.S.) connections with e2e bandwidths ranging from
10Mbit/sec to 155Mbit/sec, and on a variety of test connections
between the NetBSD Project public servers and machines in Australia.
The impact of this change is less dramatic for high-delay connections
when Path MTU is in use but still measurable.
For optimal performance on local gigabit networks, a higher socket
buffer size (at least 64K) will still yield a substantial improvement
in performance, but 32K gets us most of the way there in my test
cases, with only a cost of _doubling_ memory use per socket rather
than _quadrupling_ it.
N.B. Windows NT, at least since Win2k SP2, uses a default socket buffer
size (or their analogue thereof) of 64K, which is a useful data
point.