Commit Graph

92 Commits

Author SHA1 Message Date
atatat 4de3747b89 Sysctl descriptions under net subtree (net.key not done) 2004-05-25 04:33:59 +00:00
itojun 4ebcfcf29a fix MD5 signature support to actually validate inbound signature, and
drop packet if fails.
2004-05-18 14:44:14 +00:00
jonathan 887b782b0b Initial commit of a port of the FreeBSD implementation of RFC 2385
(MD5 signatures for TCP, as used with BGP).  Credit for original
FreeBSD code goes to Bruce M. Simpson, with FreeBSD sponsorship
credited to sentex.net.  Shortening of the setsockopt() name
attributed to Vincent Jardin.

This commit is a minimal, working version of the FreeBSD code, as
MFC'ed to FreeBSD-4. It has received minimal testing with a ttcp
modified to set the TCP-MD5 option; BMS's additions to tcpdump-current
(tcpdump -M) confirm that the MD5 signatures are correct.  Committed
as-is for further testing between a NetBSD BGP speaker (e.g., quagga)
and industry-standard BGP speakers (e.g., Cisco, Juniper).


NOTE: This version has two potential flaws. First, I do see any code
that verifies recieved TCP-MD5 signatures.  Second, the TCP-MD5
options are internally padded and assumed to be 32-bit aligned. A more
space-efficient scheme is to pack all TCP options densely (and
possibly unaligned) into the TCP header ; then do one final padding to
a 4-byte boundary.  Pre-existing comments note that accounting for
TCP-option space when we add SACK is yet to be done. For now, I'm
punting on that; we can solve it properly, in a way that will handle
SACK blocks, as a separate exercise.

In case a pullup to NetBSD-2 is requested, this adds sys/netipsec/xform_tcp.c
,and modifies:

sys/net/pfkeyv2.h,v 1.15
sys/netinet/files.netinet,v 1.5
sys/netinet/ip.h,v 1.25
sys/netinet/tcp.h,v 1.15
sys/netinet/tcp_input.c,v 1.200
sys/netinet/tcp_output.c,v 1.109
sys/netinet/tcp_subr.c,v 1.165
sys/netinet/tcp_usrreq.c,v 1.89
sys/netinet/tcp_var.h,v 1.109
sys/netipsec/files.netipsec,v 1.3
sys/netipsec/ipsec.c,v 1.11
sys/netipsec/ipsec.h,v 1.7
sys/netipsec/key.c,v 1.11
share/man/man4/tcp.4,v 1.16
lib/libipsec/pfkey.c,v 1.20
lib/libipsec/pfkey_dump.c,v 1.17
lib/libipsec/policy_token.l,v 1.8
sbin/setkey/parse.y,v 1.14
sbin/setkey/setkey.8,v 1.27
sbin/setkey/token.l,v 1.15

Note that the preceding two revisions to tcp.4 will be
required to cleanly apply this diff.
2004-04-25 22:25:03 +00:00
matt 30e63c6236 export tcpstates for _KERNEL and remove tcp_usrreq.c's incorrect
declartion.
2004-04-20 22:54:31 +00:00
atatat 83b193a052 Make these compile without INET. tcp_input probably needs a lot more
work...
2004-03-29 04:59:02 +00:00
atatat 19af35fd0d Tango on sysctl_createv() and flags. The flags have all been renamed,
and sysctl_createv() now uses more arguments.
2004-03-24 15:34:46 +00:00
atatat 13f8d2ce5f Dynamic sysctl.
Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al.  Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded.  Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment.  I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.
2003-12-04 19:38:21 +00:00
thorpej 31923baa46 Rather than zeroing a tcpcb structure and filling in all the fields
individually, create a tcpcb template pre-initialized (and pre-zero'd)
with the static and mostly-static tcpcb parameters.  The template is
now copied into the new tcpcb, which zeros and initializes most of the
tcpcb in one pass.  The template is kept up-to-date as TCP sysctl
variables are changed.

Combined with the previous sb_max change, TCP socket creation is now
25% faster.
2003-10-22 02:45:57 +00:00
tls b911732f2a Increase default socket-buffer sizes from 16K to 32K. This increases
throughput significantly in a wide variety of test cases, including
local gigabit ethernet with both jumbo and standard frames,
transcontinental (U.S.) connections with e2e bandwidths ranging from
10Mbit/sec to 155Mbit/sec, and on a variety of test connections
between the NetBSD Project public servers and machines in Australia.

The impact of this change is less dramatic for high-delay connections
when Path MTU is in use but still measurable.

For optimal performance on local gigabit networks, a higher socket
buffer size (at least 64K) will still yield a substantial improvement
in performance, but 32K gets us most of the way there in my test
cases, with only a cost of _doubling_ memory use per socket rather
than _quadrupling_ it.

N.B. Windows NT, at least since Win2k SP2, uses a default socket buffer
     size (or their analogue thereof) of 64K, which is a useful data
     point.
2003-09-29 21:39:35 +00:00
itojun 495906ca8e revamp inpcb/in6pcb so that they are more aligned with each other.
in6pcb lookup now uses hash(9).
2003-09-04 09:16:57 +00:00
agc aad01611e7 Move UCB-licensed code from 4-clause to 3-clause licence.
Patches provided by Joel Baker in PR 22364, verified by myself.
2003-08-07 16:26:28 +00:00
fvdl d5aece61d6 Back out the lwp/ktrace changes. They contained a lot of colateral damage,
and need to be examined and discussed more.
2003-06-29 22:28:00 +00:00
simonb 130b423e90 Fix a nit in a comment. 2003-06-29 12:00:47 +00:00
darrenr 960df3c8d1 Pass lwp pointers throughtout the kernel, as required, so that the lwpid can
be inserted into ktrace records.  The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
2003-06-28 14:20:43 +00:00
christos 8924cfdcba abuse the mib instead of abusing the new pointer. Idea from simon burge.
It allows the tcp_sysctl_ident to run by non-super-users. No backwards
compatibility provided.
2003-06-26 17:32:22 +00:00
martin d505b18964 Make sure to include opt_foo.h if a defflag option FOO is used. 2003-06-23 11:00:59 +00:00
christos 9b6eb382c2 PR/2352: Tor Egge: Add sysctl to get uid of connected socket. 2003-04-19 20:58:35 +00:00
matt 65e5548a17 Add MBUFTRACE kernel option.
Do a little mbuf rework while here.  Change all uses of MGET*(*, M_WAIT, *)
to m_get*(M_WAIT, *).  These are not performance critical and making them
call m_get saves considerable space.  Add m_clget analogue of MCLGET and
make corresponding change for M_WAIT uses.
Modify netinet, gem, fxp, tulip, nfs to support MBUFTRACE.
Begin to change netstat to use sysctl.
2003-02-26 06:31:08 +00:00
simonb d79a5f79da Guard use of "ostate" with #ifdef TCP_DEBUG in tcp_usrreq().
Don't put semicolons at the end of "#define token value".
2002-10-22 03:14:16 +00:00
thorpej 668640a43d Rename sbappend_stream() to sbappendstream(), per suggestion from
Jonathan Stone.
2002-07-03 21:36:57 +00:00
thorpej 0585ce1489 Make insertion of data into socket buffers O(C):
* Keep pointers to the first and last mbufs of the last record in the
  socket buffer.
* Use the sb_lastrecord pointer in the sbappend*() family of functions
  to avoid traversing the packet chain to find the last record.
* Add a new sbappend_stream() function for stream protocols which
  guarantee that there will never be more than one record in the
  socket buffer.  This function uses the sb_mbtail pointer to perform
  the data insertion.  Make TCP use sbappend_stream().

On a profiling run, this makes sbappend of a TCP transmission using
a 1M socket buffer go from 50% of the time to .02% of the time.

Thanks to Bill Sommerfeld and YAMAMOTO Takashi for their debugging
assistance!
2002-07-03 19:06:47 +00:00
itojun f192b66b94 whitespace 2002-06-09 16:33:36 +00:00
martin 0039b1300a KNFify my last change. 2002-03-11 10:06:12 +00:00
martin 75c5a16cfc Enforce a lower bound of 32 for tcp_mssdflt.
This avoids kernel crashes when we don't handle nonsensial values
like 0 gracefully. Better check here once beforehand than having to
check for non meaningful values in time critical paths (like tcp_output).

Fixes PR 15709.
2002-02-28 20:26:17 +00:00
lukem 0fa231134c - replace "defopt" with "defparam" for options which must take a value,
as config(8) will warn for value-less defparam options
- minor whitespace/formatting cleanup
- consolidate opt_tcp_recvspace.h and opt_tcp_sendspace.h into opt_tcp_space.h
2001-11-20 14:34:18 +00:00
lukem ea1cd7eb08 add RCSIDs 2001-11-13 00:32:34 +00:00
simonb 5f717f7c33 Don't need to include <uvm/uvm_extern.h> just to include <sys/sysctl.h>
anymore.
2001-10-29 07:02:30 +00:00
thorpej 45e02f5ee8 Split tcp_timers() into multiple functions, one for each timer,
and call it directly from tcp_slowtimo() (via a table) rather
than going through tcp_userreq().

This will allow us to call TCP timers directly from callouts,
in a future revision.
2001-09-10 20:15:14 +00:00
itojun fd5e7077a3 allocate ipsec policy buffer attached to pcb in in*_pcballoc, before
giving anyone accesses to pcb (do not reveal an inconsistent ones).
sync with kame
2001-07-25 23:28:02 +00:00
abs 03aaf3d8b4 Rename TCPDEBUG to TCP_DEBUG, defopt TCP_DEBUG and TCP_NDEBUG, and
make all usage of tcp_trace dependent on TCP_DEBUG - resulting in
a 31K saving on an INET enabled i386 kernel.
2001-07-08 16:18:56 +00:00
itojun 193167b1eb call in{,6}_pcbpurgeif0() before in{,6}_purgeif(). 2001-07-03 08:06:19 +00:00
thorpej 7a3c8f81a5 Two changes, designed to make us even more resilient against TCP
ISS attacks (which we already fend off quite well).

1. First-cut implementation of RFC1948, Steve Bellovin's cryptographic
   hash method of generating TCP ISS values.  Note, this code is experimental
   and disabled by default (experimental enough that I don't export the
   variable via sysctl yet, either).  There are a couple of issues I'd
   like to discuss with Steve, so this code should only be used by people
   who really know what they're doing.

2. Per a recent thread on Bugtraq, it's possible to determine a system's
   uptime by snooping the RFC1323 TCP timestamp options sent by a host; in
   4.4BSD, timestamps are created by incrementing the tcp_now variable
   at 2 Hz; there's even a company out there that uses this to determine
   web server uptime.  According to Newsham's paper "The Problem With
   Random Increments", while NetBSD's TCP ISS generation method is much
   better than the "random increment" method used by FreeBSD and OpenBSD,
   it is still theoretically possible to mount an attack against NetBSD's
   method if the attacker knows how many times the tcp_iss_seq variable
   has been incremented.  By not leaking uptime information, we can make
   that much harder to determine.  So, we avoid the leak by giving each
   TCP connection a timebase of 0.
2001-03-20 20:07:51 +00:00
itojun 52e23efa5f make sure we call tcp_output() only if we have template. 2001-02-11 06:39:35 +00:00
jdolecek 34c8ae80da constify 2001-01-18 20:28:15 +00:00
itojun 25296369e5 make sure t_family has the correct protocol family, after connect(2)
and/or bind(2).  sync with kame
2000-12-11 00:07:48 +00:00
itojun 1eb3565937 allow INET6-less build.
From: smd@ebone.net (Sean Doran)
2000-10-17 21:16:57 +00:00
itojun a7e15e4935 be more friendly with INET-less build.
XXX we need to do more to do a working INET-less build
2000-10-17 03:06:42 +00:00
enami d127401d7f Cosmetic changes to previous commit; indent break statement sanely. 2000-10-06 10:21:06 +00:00
enami 358aa75755 Just call matching purgeif/pcbpurgeif routine for the protocol family.
Without this, if a v6 address is placed before a v4 address in if_addrlist,
a PRU_PURGEIF request for v6 tcp protocol purges also v4 addresses and,
as a result, if_detach fails to request PRU_PURGEIF for v4 protocols
other than tcp.
2000-10-06 09:24:40 +00:00
itojun 63de4c2cb9 nuke the following sysctl variables. "ppsratelimit" should work better.
need to recompile sbin/sysctl after updating /usr/include.
	net.inet.tcp.rstratelimit
	net.inet.icmp.errratelimit
	net.inet6.icmp6.errratelimit
2000-07-28 04:06:52 +00:00
mrg cf594a3f4d <vm/vm.h> -> <uvm/uvm_extern.h> 2000-06-28 03:01:16 +00:00
itojun 8987054176 pass struct proc * down to udp6_output and in6_pcbbind. 2000-06-05 06:38:22 +00:00
itojun 5de72de121 disallow negative numbers for ratelimit interval (tcp, icmp, icmp6). 2000-05-22 12:08:43 +00:00
augustss 8529438fe6 Remove register declarations. 2000-03-30 12:51:13 +00:00
thorpej b178e1f58c Add support for rate-limiting RSTs sent in response to no socket for
an incoming packet.  Default minimum interval is 10ms.  The interval
is changeable via the "net.inet.tcp.rstratelimit" sysctl variable.
2000-02-15 19:54:11 +00:00
itojun f91ee608a9 avoid calling in6_control(SIOCDIFADDR_IN6) from interrupt context.
it is not supposed to work.
logging fix: add "\n" to some of log() in in6_prefix.c.

improve in6_ifdetach().  now almost all structure depend on ifnet
will be cleared up.
possible loose ends:
- cached route_in6 in static varaiables needs to be cleared as well
- there are ifaddr manipulation without reference counting,
  which should be fixed
we still see panics after card removal, though...  not sure what is left.

(sync with kame)
2000-02-04 14:34:22 +00:00
thorpej c1185c1020 PRU_PURGEADDR -> PRU_PURGEIF, per a discussion w/ itojun. In the IPv4
and IPv6 code, also use this to traverse PCB tables, looking for cached
routes referencing the dying ifnet, forcing them to be refreshed.
2000-02-02 23:28:08 +00:00
thorpej d844a3ac41 First-draft if_detach() implementation, originally from Bill Studnemund,
although this version has been changed somewhat:
- reference counting on ifaddrs isn't as complete as Bill's original
  work was.  This is hard to get right, and we should attack one
  protocol at a time.
- This doesn't do reference counting or dynamic allocation of ifnets yet.
- This version introduces a new PRU -- PRU_PURGEADDR, which is used to
  purge an ifaddr from a protocol.  The old method Bill used didn't work
  on all protocols, and it only worked on some because it was Very Lucky.

This mostly works ... i.e. works for my USB Ethernet, except for a dangling
ifaddr reference left by the IPv6 code; have not yet tracked this down.
2000-02-01 22:52:04 +00:00
itojun 1a2a1e2b1f bring in latest KAME ipsec tree.
- interop issues in ipcomp is fixed
- padding type (after ESP) is configurable
- key database memory management (need more fixes)
- policy specification is revisited

XXX m->m_pkthdr.rcvif is still overloaded - hope to fix it soon
2000-01-31 14:18:52 +00:00
itojun ea861f0183 sync IPv6 part with latest KAME tree. IPsec part is left unmodified
due to massive changes in KAME side.
- IPv6 output goes through nd6_output
- faith can capture IPv4 packets as well - you can run IPv4-to-IPv6 translator
  using heavily modified DNS servers
- per-interface statistics (required for IPv6 MIB)
- interface autoconfig is revisited
- udp input handling has a big change for mapped address support.
- introduce in4_cksum() for non-overwriting checksumming
- introduce m_pulldown()
- neighbor discovery cleanups/improvements
- netinet/in.h strictly conforms to RFC2553 (no extra defs visible to userland)
- IFA_STATS is fixed a bit (not tested)
- and more more more.

TODO:
- cleanup os-independency #ifdef
- avoid rcvif dual use (for IPsec) to help ifdetach

(sorry for jumbo commit, I can't separate this any more...)
1999-12-13 15:17:17 +00:00