Commit Graph

170 Commits

Author SHA1 Message Date
christos
526a1efeb9 Limit the tcp initial window setting to 10, leaving it by default to 4
and simplifying the code in process. Per draft-ietf-initcwnd-08.txt.
2013-04-10 00:16:03 +00:00
tls
7b0b7dedd9 Entropy-pool implementation move and cleanup.
1) Move core entropy-pool code and source/sink/sample management code
   to sys/kern from sys/dev.

2) Remove use of NRND as test for presence of entropy-pool code throughout
   source tree.

3) Remove use of RND_ENABLED in device drivers as microoptimization to
   avoid expensive operations on disabled entropy sources; make the
   rnd_add calls do this directly so all callers benefit.

4) Fix bug in recent rnd_add_data()/rnd_add_uint32() changes that might
   have lead to slight entropy overestimation for some sources.

5) Add new source types for environmental sensors, power sensors, VM
   system events, and skew between clocks, with a sample implementation
   for each.

ok releng to go in before the branch due to the difficulty of later
pullup (widespread #ifdef removal and moved files).  Tested with release
builds on amd64 and evbarm and live testing on amd64.
2012-02-02 19:42:57 +00:00
yamt
bf52753ac3 tcp_reass_unlock: assertion 2011-10-31 12:52:19 +00:00
gdt
2377e629f8 Add comment urging a separation of TCP_RTT_SHIFT into separate defines
describing the EWMA calculation and the storage representation.
(No code change.)
2011-05-25 23:17:44 +00:00
dyoung
c2e43be1c5 Reduces the resources demanded by TCP sessions in TIME_WAIT-state using
methods called Vestigial Time-Wait (VTW) and Maximum Segment Lifetime
Truncation (MSLT).

MSLT and VTW were contributed by Coyote Point Systems, Inc.

Even after a TCP session enters the TIME_WAIT state, its corresponding
socket and protocol control blocks (PCBs) stick around until the TCP
Maximum Segment Lifetime (MSL) expires.  On a host whose workload
necessarily creates and closes down many TCP sockets, the sockets & PCBs
for TCP sessions in TIME_WAIT state amount to many megabytes of dead
weight in RAM.

Maximum Segment Lifetimes Truncation (MSLT) assigns each TCP session to
a class based on the nearness of the peer.  Corresponding to each class
is an MSL, and a session uses the MSL of its class.  The classes are
loopback (local host equals remote host), local (local host and remote
host are on the same link/subnet), and remote (local host and remote
host communicate via one or more gateways).  Classes corresponding to
nearer peers have lower MSLs by default: 2 seconds for loopback, 10
seconds for local, 60 seconds for remote.  Loopback and local sessions
expire more quickly when MSLT is used.

Vestigial Time-Wait (VTW) replaces a TIME_WAIT session's PCB/socket
dead weight with a compact representation of the session, called a
"vestigial PCB".  VTW data structures are designed to be very fast and
memory-efficient: for fast insertion and lookup of vestigial PCBs,
the PCBs are stored in a hash table that is designed to minimize the
number of cacheline visits per lookup/insertion.  The memory both
for vestigial PCBs and for elements of the PCB hashtable come from
fixed-size pools, and linked data structures exploit this to conserve
memory by representing references with a narrow index/offset from the
start of a pool instead of a pointer.  When space for new vestigial PCBs
runs out, VTW makes room by discarding old vestigial PCBs, oldest first.
VTW cooperates with MSLT.

It may help to think of VTW as a "FIN cache" by analogy to the SYN
cache.

A 2.8-GHz Pentium 4 running a test workload that creates TIME_WAIT
sessions as fast as it can is approximately 17% idle when VTW is active
versus 0% idle when VTW is inactive.  It has 103 megabytes more free RAM
when VTW is active (approximately 64k vestigial PCBs are created) than
when it is inactive.
2011-05-03 18:28:44 +00:00
dyoung
ac162b774b *_drain() routines may be called with locks held, so instead of doing
any work in *_drain(), set a drain-needed flag.  Do the work in the
fasttimo handler.

Contributed by Coyote Point Systems, Inc.
2011-05-03 17:44:30 +00:00
gdt
f641bea548 Rewrite comments about TCP RTO calculations.
Long ago, the storage representations of srtt and rttvar were changed
from the 4.4BSD scheme, and the comments are out of sync with the
code.  This commit rewrites most of the comments that explain the RTO
calculations, and points out some issues in the code.

Joint work with Bev Schwartz of BBN (original analysis and comments),
but I have rewritten and extended them, so errors are mine.

This material is based upon work supported by the Defense Advanced
Research Projects Agency and Space and Naval Warfare Systems Center,
Pacific, under Contract No. N66001-09-C-2073.  Approved for Public
Release, Distribution Unlimited
2011-04-20 13:35:51 +00:00
yamt
37494bba21 comments 2011-04-14 15:55:46 +00:00
pooka
11281f01a0 Replace a large number of link set based sysctl node creations with
calls from subsystem constructors.  Benefits both future kernel
modules and rump.

no change to sysctl nodes on i386/MONOLITHIC & build tested i386/ALL
2009-09-16 15:23:04 +00:00
darran
ddd44491c6 Make tcp msl (max segment life) tunable via sysctl net.inet.tcp.msl.
Okayed by tls@.
2009-09-09 22:41:28 +00:00
pooka
9d2101a249 POOL_INIT -> pool_init 2009-05-27 17:41:03 +00:00
pooka
c7a407f862 stinkset purge: POOL_INIT -> pool_init
also, make the syncache pool static in scope
2009-01-29 20:38:22 +00:00
plunky
fd7356a917 Convert socket options code to use a sockopt structure
instead of laying everything into an mbuf.

approved by core
2008-08-06 15:01:23 +00:00
martin
ce099b4099 Remove clause 3 and 4 from TNF licenses 2008-04-28 20:22:51 +00:00
ad
15e29e981b Merge the socket locking patch:
- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.
2008-04-24 11:38:36 +00:00
thorpej
7ff8d08aae Make IP, TCP, UDP, and ICMP statistics per-CPU. The stats are collated
when the user requests them via sysctl.
2008-04-12 05:58:22 +00:00
thorpej
f5c68c0b9f Change TCP stats from a structure to an array of uint64_t's.
Note: This is ABI-compatible with the old tcpstat structure; old netstat
binaries will continue to work properly.
2008-04-08 01:03:58 +00:00
matt
a34217b8de Rework tcp congctl selection code so that the congctl entries can be const.
Don't access tcp_congctl stuff outside of tcp_congctl.c, use routines to
update t_congctl.  This code is slightly now more complicated.
2008-02-29 07:39:17 +00:00
matt
a4a1e5ce55 Convert stragglers to ansi definitions from old-style definitons.
Remember that func() is not ansi, func(void) is.
2008-02-27 19:41:51 +00:00
perry
b6a2ef7569 Convert many of the uses of __attribute__ to equivalent
__packed, __unused and __dead macros from cdefs.h
2007-12-25 18:33:32 +00:00
rmind
4175f8693b TCP socket buffers automatic sizing - ported from FreeBSD.
http://mail-index.netbsd.org/tech-net/2007/02/04/0006.html

! Disabled by default, marked as experimental. Testers are very needed.
! Someone should thoroughly test this, and improve if possible.

Discussed on <tech-net>:
http://mail-index.netbsd.org/tech-net/2007/07/12/0002.html
Thanks Greg Troxel for comments.

OK by the long silence on <tech-net>.
2007-08-02 02:42:40 +00:00
ad
88ab7da936 Merge some of the less invasive changes from the vmlocking branch:
- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements
2007-07-09 20:51:58 +00:00
christos
0a36551606 tcpdrop kernel bits (from anon ymous) 2007-06-25 23:35:12 +00:00
christos
eeff189533 - per socket keepalive settings
- settable connection establishment timeout
2007-06-20 15:29:17 +00:00
dyoung
72f0a6dfb0 Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principle benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing.  Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously.  Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs.  I have
  introduced routines for allocating, copying, and duplicating,
  and freeing sockaddrs:

        struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
        struct sockaddr *sockaddr_copy(struct sockaddr *dst,
                                       const struct sockaddr *src);
        struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
        void sockaddr_free(struct sockaddr *sa);

  sockaddr_alloc() returns either a sockaddr from the pool belonging
  to the specified family, or NULL if the pool is exhausted.  The
  returned sockaddr has the right size for that family; sa_family
  and sa_len fields are initialized to the family and sockaddr
  length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
  sockaddr_in).  sockaddr_free() puts the given sockaddr back into
  its family's pool.

  sockaddr_dup() and sockaddr_copy() work analogously to strdup()
  and strcpy(), respectively.  sockaddr_copy() KASSERTs that the
  family of the destination and source sockaddrs are alike.

  The 'flags' argumet for sockaddr_alloc() and sockaddr_dup() is
  passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
  family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
  etc.  They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more.  All protocol families
  use struct route.  I have changed the route cache, 'struct route',
  so that it does not contain storage space for a sockaddr.  Instead,
  struct route points to a sockaddr coming from the pool the sockaddr
  belongs to.  I added a new method to struct route, rtcache_setdst(),
  for setting the cache destination:

        int rtcache_setdst(struct route *, const struct sockaddr *);

  rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
  available to create the sockaddr storage.

  It is now possible for rtcache_getdst() to return NULL if, say,
  rtcache_setdst() failed.  I check the return value for NULL
  everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
  caches, dom_rtcache.  rtflushall(sa_family_t af) looks up the
  domain indicated by 'af', walks the domain's list of route caches
  and invalidates each one.
2007-05-02 20:40:22 +00:00
christos
53524e44ef Kill caddr_t; there will be some MI fallout, but it will be fixed shortly. 2007-03-04 05:59:00 +00:00
dyoung
5493f188c7 KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
   in6_src.c, avoid casts by changing several route_in6 pointers
   to struct route pointers.  Remove unnecessary casts to caddr_t
   elsewhere.

Pave the way for eliminating address family-specific route caches:
   soon, struct route will not embed a sockaddr, but it will hold
   a reference to an external sockaddr, instead.  We will set the
   destination sockaddr using rtcache_setdst().  (I created a stub
   for it, but it isn't used anywhere, yet.)  rtcache_free() will
   free the sockaddr.  I have extracted from rtcache_free() a helper
   subroutine, rtcache_clear().  rtcache_clear() will "forget" a
   cached route, but it will not forget the destination by releasing
   the sockaddr.  I use rtcache_clear() instead of rtcache_free()
   in rtcache_update(), because rtcache_update() is not supposed
   to forget the destination.

Constify:

   1 Introduce const accessor for route->ro_dst, rtcache_getdst().

   2 Constify the 'dst' argument to ifnet->if_output().  This
     led me to constify a lot of code called by output routines.

   3 Constify the sockaddr argument to protosw->pr_ctlinput.  This
     led me to constify a lot of code called by ctlinput routines.

   4 Introduce const macros for converting from a generic sockaddr
     to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
     satocsin, et cetera.
2007-02-17 22:34:07 +00:00
yamt
8836e5995d add some more tcp mowners. 2006-12-06 09:10:45 +00:00
yamt
f5830ee995 - make tcp_reass static.
- constify.
2006-12-06 09:08:27 +00:00
yamt
c31e22237d - constify.
- make tcp_dooptions and tcpipqent_pool static.
2006-10-21 10:08:54 +00:00
yamt
81463c93c7 implement RFC3465 appropriate byte counting.
from Kentaro A. Kurahone, with minor adjustments by me.
the ack prediction part of the original patch was omitted because
it's a separate change.  reviewed by Rui Paulo.
2006-10-19 11:40:51 +00:00
rpaulo
21df8206df Export the tcp_do_rfc1948 variable to userland via sysctl.
The code to generate an ISS via an MD5 hash has been present in the
NetBSD kernel since 2001, but it wasn't even exported to userland at
that time. It was agreed on tech-net with the original author <thorpej>
that we should let the user decide if he wants to enable it or not.
Not enabled by default.
2006-10-16 18:13:56 +00:00
rpaulo
f3330397f0 Modular (I tried ;-) TCP congestion control API. Whenever certain conditions
happen in the TCP stack, this interface calls the specified callback to
handle the situation according to the currently selected congestion
control algorithm.
A new sysctl node was created: net.inet.tcp.congctl.{available,selected}
with obvious meanings.
The old net.inet.tcp.newreno MIB was removed.
The API is discussed in tcp_congctl(9).

In the near future, it will be possible to selected a congestion control
algorithm on a per-socket basis.

Discussed on tech-net and reviewed by <yamt>.
2006-10-09 16:27:07 +00:00
rpaulo
2fb2ae3251 Import of TCP ECN algorithm for congestion control.
Both available for IPv4 and IPv6.
Basic implementation test results are available at
http://netbsd-soc.sourceforge.net/projects/ecn/testresults.html.

Work sponsored by the Google Summer of Code project 2006.
Special thanks to Kentaro Kurahone, Allen Briggs and Matt Thomas for their
help, comments and support during the project.
2006-09-05 00:29:35 +00:00
rpaulo
25ec6d007f revert stuff that shouldn't have gone in. 2006-07-22 17:45:03 +00:00
rpaulo
f5f6aa2ed3 TCP RFC is 793, not 783. 2006-07-22 17:39:48 +00:00
perry
fbae48b901 Change "inline" back to "__inline" in .h files -- C99 is still too
new, and some apps compile things in C89 mode. C89 keywords stay.

As per core@.
2006-02-16 20:17:12 +00:00
perry
0f0296d88a Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete. 2005-12-24 20:45:08 +00:00
christos
95e1ffb156 merge ktrace-lwp. 2005-12-11 12:16:03 +00:00
elad
9702e98730 Multiple inclusion protection, as suggested by christos@ on tech-kern@
few days ago.
2005-12-10 23:31:41 +00:00
rpaulo
37cbe61e67 Implement tcp.inet{,6}.tcp{,6}.(debug|debx) when TCP_DEBUG is set. They
can be used to ``transliterate protocol trace'' like trpt(8) does.
2005-09-06 02:41:14 +00:00
yamt
f02551ec2d move {tcp,udp}_do_loopback_cksum back to tcp/udp
so that they can be referenced by ipv6.
2005-08-10 13:06:49 +00:00
elad
6439f2618f Add sysctls for IP, ICMP, TCP, and UDP statistics. 2005-08-05 09:21:25 +00:00
christos
89940190d0 Implement PMTU checks from:
http://www.gont.com.ar/drafts/icmp-attacks-against-tcp.html

1. Don't act on ICMP-need-frag immediately if adhoc checks on the
advertised MTU fail. The MTU update is delayed until a TCP retransmit
happens.
2. Ignore ICMP Source Quench messages meant for TCP connections.

From OpenBSD.
2005-07-19 17:00:02 +00:00
christos
ea2d4204b6 - add const
- remove bogus casts
- avoid nested variables
2005-05-29 21:41:23 +00:00
kurahone
f7707899c1 Added sysctl tunable limits for the number of maximum SACK holes
per connection and per system.

Idea taken from FreeBSD.
2005-04-05 01:07:17 +00:00
yamt
8b0967ff45 protect tcpipqent with splvm. 2005-03-29 20:10:16 +00:00
yamt
df05ca7085 simplify data receiver side sack processing.
- introduce t_segqlen, the number of segments in segq/timeq.
  the name is from freebsd.
- rather than maintaining a copy of sack blocks (rcv_sack_block[]),
  build it directly from the segment list when needed.
2005-03-16 00:39:56 +00:00
yamt
0446b7c3e3 - use full sized segments unless we actually have SACKs to send.
- avoid TSO duplicate D-SACK.
- send SACKs regardless of TF_ACKNOW.
- don't clear rcv_sack_num when transmitting.

discussed on tech-net@.
2005-03-16 00:38:27 +00:00
atatat
76a9013c25 gc the tcp_sysctl() prototype since it's completely vestigial 2005-03-09 04:51:56 +00:00