Commit Graph

319 Commits

Author SHA1 Message Date
drochner
23e5beaef1 rename the IPSEC in-kernel CPP variable and config(8) option to
KAME_IPSEC, and make IPSEC define it so that existing kernel
config files work as before
Now the default can be easily be changed to FAST_IPSEC just by
setting the IPSEC alias to FAST_IPSEC.
2011-12-19 11:59:56 +00:00
tls
3afd44cf08 First step of random number subsystem rework described in
<20111022023242.BA26F14A158@mail.netbsd.org>.  This change includes
the following:

	An initial cleanup and minor reorganization of the entropy pool
	code in sys/dev/rnd.c and sys/dev/rndpool.c.  Several bugs are
	fixed.  Some effort is made to accumulate entropy more quickly at
	boot time.

	A generic interface, "rndsink", is added, for stream generators to
	request that they be re-keyed with good quality entropy from the pool
	as soon as it is available.

	The arc4random()/arc4randbytes() implementation in libkern is
	adjusted to use the rndsink interface for rekeying, which helps
	address the problem of low-quality keys at boot time.

	An implementation of the FIPS 140-2 statistical tests for random
	number generator quality is provided (libkern/rngtest.c).  This
	is based on Greg Rose's implementation from Qualcomm.

	A new random stream generator, nist_ctr_drbg, is provided.  It is
	based on an implementation of the NIST SP800-90 CTR_DRBG by
	Henric Jungheim.  This generator users AES in a modified counter
	mode to generate a backtracking-resistant random stream.

	An abstraction layer, "cprng", is provided for in-kernel consumers
	of randomness.  The arc4random/arc4randbytes API is deprecated for
	in-kernel use.  It is replaced by "cprng_strong".  The current
	cprng_fast implementation wraps the existing arc4random
	implementation.  The current cprng_strong implementation wraps the
	new CTR_DRBG implementation.  Both interfaces are rekeyed from
	the entropy pool automatically at intervals justifiable from best
	current cryptographic practice.

	In some quick tests, cprng_fast() is about the same speed as
	the old arc4randbytes(), and cprng_strong() is about 20% faster
	than rnd_extract_data().  Performance is expected to improve.

	The AES code in src/crypto/rijndael is no longer an optional
	kernel component, as it is required by cprng_strong, which is
	not an optional kernel component.

	The entropy pool output is subjected to the rngtest tests at
	startup time; if it fails, the system will reboot.  There is
	approximately a 3/10000 chance of a false positive from these
	tests.  Entropy pool _input_ from hardware random numbers is
	subjected to the rngtest tests at attach time, as well as the
	FIPS continuous-output test, to detect bad or stuck hardware
	RNGs; if any are detected, they are detached, but the system
	continues to run.

	A problem with rndctl(8) is fixed -- datastructures with
	pointers in arrays are no longer passed to userspace (this
	was not a security problem, but rather a major issue for
	compat32).  A new kernel will require a new rndctl.

	The sysctl kern.arandom() and kern.urandom() nodes are hooked
	up to the new generators, but the /dev/*random pseudodevices
	are not, yet.

	Manual pages for the new kernel interfaces are forthcoming.
2011-11-19 22:51:18 +00:00
yamt
4fa9fc4940 fix a double unlock bug introduced by tcp_input.c rev.1.312. 2011-10-31 13:01:42 +00:00
plunky
7f3d4048d7 NULL does not need a cast 2011-08-31 18:31:02 +00:00
joerg
3eb244d801 Retire varargs.h support. Move machine/stdarg.h logic into MI
sys/stdarg.h and expect compiler to provide proper builtins, defaulting
to the GCC interface. lint still has a special fallback.
Reduce abuse of _BSD_VA_LIST_ by defining __va_list by default and
derive va_list as required by standards.
2011-07-17 20:54:30 +00:00
gdt
c238210804 Remove erroneous additional tick in RTO estimation. The variable
ts_rtt is 1 plus the RTT, so that 0 can mean invalid measurement.
However, the code failed to subtract the 1 back out before use.  With
this change, TCP from Massachusetts to France now typically has 1s RTO
values, rather than 1.5s.

This bug was found and fixed by Bev Schwartz of BBN.  This material is
based upon work supported by the Defense Advanced Research Projects
Agency and Space and Naval Warfare Systems Center, Pacific, under
Contract No. N66001-09-C-2073.  Approved for Public Release,
Distribution Unlimited
2011-05-25 23:20:57 +00:00
dholland
5d71a1f21c typo in comment 2011-05-17 05:40:24 +00:00
dyoung
c2e43be1c5 Reduces the resources demanded by TCP sessions in TIME_WAIT-state using
methods called Vestigial Time-Wait (VTW) and Maximum Segment Lifetime
Truncation (MSLT).

MSLT and VTW were contributed by Coyote Point Systems, Inc.

Even after a TCP session enters the TIME_WAIT state, its corresponding
socket and protocol control blocks (PCBs) stick around until the TCP
Maximum Segment Lifetime (MSL) expires.  On a host whose workload
necessarily creates and closes down many TCP sockets, the sockets & PCBs
for TCP sessions in TIME_WAIT state amount to many megabytes of dead
weight in RAM.

Maximum Segment Lifetimes Truncation (MSLT) assigns each TCP session to
a class based on the nearness of the peer.  Corresponding to each class
is an MSL, and a session uses the MSL of its class.  The classes are
loopback (local host equals remote host), local (local host and remote
host are on the same link/subnet), and remote (local host and remote
host communicate via one or more gateways).  Classes corresponding to
nearer peers have lower MSLs by default: 2 seconds for loopback, 10
seconds for local, 60 seconds for remote.  Loopback and local sessions
expire more quickly when MSLT is used.

Vestigial Time-Wait (VTW) replaces a TIME_WAIT session's PCB/socket
dead weight with a compact representation of the session, called a
"vestigial PCB".  VTW data structures are designed to be very fast and
memory-efficient: for fast insertion and lookup of vestigial PCBs,
the PCBs are stored in a hash table that is designed to minimize the
number of cacheline visits per lookup/insertion.  The memory both
for vestigial PCBs and for elements of the PCB hashtable come from
fixed-size pools, and linked data structures exploit this to conserve
memory by representing references with a narrow index/offset from the
start of a pool instead of a pointer.  When space for new vestigial PCBs
runs out, VTW makes room by discarding old vestigial PCBs, oldest first.
VTW cooperates with MSLT.

It may help to think of VTW as a "FIN cache" by analogy to the SYN
cache.

A 2.8-GHz Pentium 4 running a test workload that creates TIME_WAIT
sessions as fast as it can is approximately 17% idle when VTW is active
versus 0% idle when VTW is inactive.  It has 103 megabytes more free RAM
when VTW is active (approximately 64k vestigial PCBs are created) than
when it is inactive.
2011-05-03 18:28:44 +00:00
yamt
3e17d0f5a4 tcp_input: simplify redundant assignment. no functional changes. 2011-04-25 22:12:43 +00:00
wiz
d8926a5a43 Fix typos. 2011-04-20 14:08:07 +00:00
gdt
f641bea548 Rewrite comments about TCP RTO calculations.
Long ago, the storage representations of srtt and rttvar were changed
from the 4.4BSD scheme, and the comments are out of sync with the
code.  This commit rewrites most of the comments that explain the RTO
calculations, and points out some issues in the code.

Joint work with Bev Schwartz of BBN (original analysis and comments),
but I have rewritten and extended them, so errors are mine.

This material is based upon work supported by the Defense Advanced
Research Projects Agency and Space and Naval Warfare Systems Center,
Pacific, under Contract No. N66001-09-C-2073.  Approved for Public
Release, Distribution Unlimited
2011-04-20 13:35:51 +00:00
yamt
0b881f8c57 comments 2011-04-14 15:48:48 +00:00
yamt
b1563ea6d9 fix a typo in rev.1.283, which broke tcp dupack and duppack statistics. 2011-03-09 00:44:23 +00:00
plunky
d334ec0fc0 fix potential mbuf overflow, from Alexander Danilov on tech-net 2010-12-02 19:07:27 +00:00
bouyer
adad9c5471 Make sure SYN_CACHE_TIMER_ARM() has been run before calling syn_cache_put()
as it will reschedule the timer.  Fixes PR kern/43318.
2010-05-26 17:38:29 +00:00
bouyer
c638cbeac1 syn_cache_put(): defer all pool_put() to the callout. Reschedule
the callout if needed so frees are not delayed too much.
syn_cache_timer(): we can't call syn_cache_put() here any more,
so move code deleted from syn_cache_put() here.

Avoid KASSERT() in kern_timeout.c because pool_put() is called from
ipintr context, as reported in
http://mail-index.netbsd.org/tech-kern/2010/03/19/msg007762.html
Thanks to Andrew Doran and Mindaugas Rasiukevicius for help and review.
2010-04-21 20:40:16 +00:00
rmind
b278cb5138 tcp_input: set ECE flag even if CWR flag is active.
Submitted by Richard Scheffenegger in PR/43150.
2010-04-16 03:13:03 +00:00
tls
4e0229021b Oops. Fix LOCKDEBUG panic -- and spurious calls to tcp_output()! -- in
previous.  Be careful with that {}, Eugene.
2010-04-01 14:31:51 +00:00
tls
994b02bdbe After discussion with ad@: it appears that KERNEL_LOCK also protects
the driver output path (that is, ifp->if_output()).  In the case of
entry through the socket code, we are fine, because pru_usrreq takes
KERNEL_LOCK.  However, there are a few other ways to cause output
which require protection:

	1) direct calls to tcp_output() in tcp_input()
	2) fast-forwarding code (ip_flow) -- protected elsewise
	   against itself by the softnet lock.
	3) *Possibly* the ARP code.  I have currently persuaded
	   myself that it is safe because of how it's called.
	4) Possibly the ICMP code.

This change addresses #1 and #2.
2010-04-01 00:24:41 +00:00
pooka
54b3dc4108 tcp sockbuf autoscaling was initially added turned off because it
was experimental.  People (including myself) have been running with
it turned on for eons now, so flip the default to enabled.
2010-01-26 18:09:07 +00:00
darran
ddd44491c6 Make tcp msl (max segment life) tunable via sysctl net.inet.tcp.msl.
Okayed by tls@.
2009-09-09 22:41:28 +00:00
minskim
2708c3c1b9 Check the minimum ttl only when pcb is available. 2009-07-18 23:09:53 +00:00
minskim
d0a9c36e4a Add the IP_MINTTL socket option.
The IP_MINTTL option may be used on SOCK_STREAM sockets to discard
packets with a TTL lower than the option value.  This can be used to
implement the Generalized TTL Security Mechanism (GTSM) according to
RFC 3682.

OK'ed by christos@.
2009-07-17 22:02:54 +00:00
christos
8d20d2e953 Follow exactly the recommendation of draft-ietf-tcpm-tcpsecure-11.txt:
Don't check gainst the last ack received, but the expected sequence number.
This makes RST handling independent of delayed ACK. From Joanne M Mikkelson.
2009-06-20 17:29:31 +00:00
cegger
c363a9cb62 bzero -> memset 2009-03-18 16:00:08 +00:00
cegger
35fb64746b bcmp -> memcmp 2009-03-18 15:14:29 +00:00
cegger
dc56dbbd97 ansify function definitions 2009-03-15 21:23:31 +00:00
pooka
c7a407f862 stinkset purge: POOL_INIT -> pool_init
also, make the syncache pool static in scope
2009-01-29 20:38:22 +00:00
tls
c5ddeafa76 Unlock reassembly queue before calling sorwakeup(), not after. In unusual
cases with in-kernel consumers which might send data on the same socket,
we can deadlock on the reassembly queue otherwise (observed while testing
accept filters).
2008-08-04 04:08:47 +00:00
matt
34ac358652 Reacquire softnet_lock after calling soabort which returns with the socket
unlocked.
2008-07-28 18:41:07 +00:00
ad
c4e6bfaf85 tcp_input: add a couple of assertions. 2008-07-04 18:22:21 +00:00
ad
4c75eca868 syn_cache_get: remove new endpoint's socket from head's queue if aborting
the connection. Should fix KASSERT(so->so_head == NULL).
2008-07-03 15:35:28 +00:00
martin
ce099b4099 Remove clause 3 and 4 from TNF licenses 2008-04-28 20:22:51 +00:00
ad
15e29e981b Merge the socket locking patch:
- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.
2008-04-24 11:38:36 +00:00
thorpej
caf49ea572 Make IPSEC and FAST_IPSEC stats per-cpu. Use <net/net_stats.h> and
netstat_sysctl().
2008-04-23 06:09:04 +00:00
thorpej
7ff8d08aae Make IP, TCP, UDP, and ICMP statistics per-CPU. The stats are collated
when the user requests them via sysctl.
2008-04-12 05:58:22 +00:00
thorpej
f5c68c0b9f Change TCP stats from a structure to an array of uint64_t's.
Note: This is ABI-compatible with the old tcpstat structure; old netstat
binaries will continue to work properly.
2008-04-08 01:03:58 +00:00
rmind
c6186face4 Welcome to 4.99.55:
- Add a lot of missing selinit() and seldestroy() calls.

- Merge selwakeup() and selnotify() calls into a single selnotify().

- Add an additional 'events' argument to selnotify() call.  It will
  indicate which event (POLL_IN, POLL_OUT, etc) happen.  If unknown,
  zero may be used.

Note: please pass appropriate value of 'events' where possible.
Proposed on: <tech-kern>
2008-03-01 14:16:49 +00:00
matt
a4a1e5ce55 Convert stragglers to ansi definitions from old-style definitons.
Remember that func() is not ansi, func(void) is.
2008-02-27 19:41:51 +00:00
yamt
c3985cffec make TCP_SETUP_ACK, ICMP_CHECK, TCP_FIELDS_TO_HOST, and TCP_FIELDS_TO_NET
static functions.
2008-02-20 11:44:07 +00:00
yamt
f35baba8dd - start tcp timestamp from 1 instead of 0.
- add a comment to explain why:
+        * We start with 1, because 0 doesn't work with linux, which
+        * considers timestamp 0 in a SYN packet as a bug and disables
+        * timestamps.
2008-02-05 09:38:47 +00:00
yamt
d5bac2f6b1 redo tcp_input.c rev.1.230 correctly.
revision 1.230
    date: 2005/06/30 02:58:28;  author: christos;  state: Exp;  lines: +20 -4
    Normalize our PAWS code with Free and Open, as mentioned in tech-security.

reviewed by christos@ and matt@.
2008-02-04 23:56:14 +00:00
yamt
a944f4302a revert tcp_output.c 1.253 because it has an ill effect when sending
small (not full-sized) segments.
http://mail-index.NetBSD.org/tech-net/2008/01/27/0009.html
2008-01-29 12:34:47 +00:00
dyoung
2d4e7e5856 Use rtcache_validate() instead of rtcache_getrt(). Shorten staircase
in in_losing().
2008-01-14 04:19:09 +00:00
martin
7080c9db1e A few missing ifdefs to make non-INET6 kernels build again. 2007-12-20 20:24:49 +00:00
dyoung
72fa642a86 Poison struct route->ro_rt uses in the kernel by changing the name
to _ro_rt.  Use rtcache_getrt() to access a route cache's struct
rtentry *.

Introduce struct ifnet->if_dl that always points at the interface
identifier/link-layer address.  Make code that treated the first
ifaddr on struct ifnet->if_addrlist as the interface address use
if_dl, instead.

Remove stale debugging code from net/route.c.  Move the rtflush()
code into rtcache_clear() and delete rtflush().  Delete rtalloc(),
because nothing uses it any more.

Make ND6_HINT an inline, lowercase subroutine, nd6_hint.

I've done my best to convert IP Filter, the ISO stack, and the
AppleTalk stack to rtcache_getrt().  They compile, but I have not
tested them.  I have given the changes to PF, GRE, IPv4 and IPv6
stacks a lot of exercise.
2007-12-20 19:53:29 +00:00
elad
7beaf4911f Really fix low port allocation, by always passing a valid lwp to
in_pcbbind().

Okay dyoung@.

Note that the network code is another candidate for major cleanup... also
note that this issue is likely to be present in netinet6 code, too.
2007-12-16 14:12:34 +00:00
dyoung
94b72f0f97 Change macros SYN_CACHE_PUT() and SYN_CACHE_RM() into inline
subroutines syn_cache_put() and syn_cache_rm().
2007-11-09 23:55:58 +00:00
rmind
d63e75f696 Pick the smallest possible TCP window scaling factor that will still allow
us to scale up to sb_max.  This might fix the problems with some firewalls.

Taken from FreeBSD (silby).
OK by <dyoung>.
2007-11-04 11:04:26 +00:00
yamt
e74ee454c1 our tcp timestamps are in PR_SLOWHZ, not HZ. 2007-08-02 13:06:30 +00:00