Commit Graph

350 Commits

Author SHA1 Message Date
pooka
54b3dc4108 tcp sockbuf autoscaling was initially added turned off because it
was experimental.  People (including myself) have been running with
it turned on for eons now, so flip the default to enabled.
2010-01-26 18:09:07 +00:00
darran
ddd44491c6 Make tcp msl (max segment life) tunable via sysctl net.inet.tcp.msl.
Okayed by tls@.
2009-09-09 22:41:28 +00:00
minskim
2708c3c1b9 Check the minimum ttl only when pcb is available. 2009-07-18 23:09:53 +00:00
minskim
d0a9c36e4a Add the IP_MINTTL socket option.
The IP_MINTTL option may be used on SOCK_STREAM sockets to discard
packets with a TTL lower than the option value.  This can be used to
implement the Generalized TTL Security Mechanism (GTSM) according to
RFC 3682.

OK'ed by christos@.
2009-07-17 22:02:54 +00:00
christos
8d20d2e953 Follow exactly the recommendation of draft-ietf-tcpm-tcpsecure-11.txt:
Don't check gainst the last ack received, but the expected sequence number.
This makes RST handling independent of delayed ACK. From Joanne M Mikkelson.
2009-06-20 17:29:31 +00:00
cegger
c363a9cb62 bzero -> memset 2009-03-18 16:00:08 +00:00
cegger
35fb64746b bcmp -> memcmp 2009-03-18 15:14:29 +00:00
cegger
dc56dbbd97 ansify function definitions 2009-03-15 21:23:31 +00:00
pooka
c7a407f862 stinkset purge: POOL_INIT -> pool_init
also, make the syncache pool static in scope
2009-01-29 20:38:22 +00:00
tls
c5ddeafa76 Unlock reassembly queue before calling sorwakeup(), not after. In unusual
cases with in-kernel consumers which might send data on the same socket,
we can deadlock on the reassembly queue otherwise (observed while testing
accept filters).
2008-08-04 04:08:47 +00:00
matt
34ac358652 Reacquire softnet_lock after calling soabort which returns with the socket
unlocked.
2008-07-28 18:41:07 +00:00
ad
c4e6bfaf85 tcp_input: add a couple of assertions. 2008-07-04 18:22:21 +00:00
ad
4c75eca868 syn_cache_get: remove new endpoint's socket from head's queue if aborting
the connection. Should fix KASSERT(so->so_head == NULL).
2008-07-03 15:35:28 +00:00
martin
ce099b4099 Remove clause 3 and 4 from TNF licenses 2008-04-28 20:22:51 +00:00
ad
15e29e981b Merge the socket locking patch:
- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.
2008-04-24 11:38:36 +00:00
thorpej
caf49ea572 Make IPSEC and FAST_IPSEC stats per-cpu. Use <net/net_stats.h> and
netstat_sysctl().
2008-04-23 06:09:04 +00:00
thorpej
7ff8d08aae Make IP, TCP, UDP, and ICMP statistics per-CPU. The stats are collated
when the user requests them via sysctl.
2008-04-12 05:58:22 +00:00
thorpej
f5c68c0b9f Change TCP stats from a structure to an array of uint64_t's.
Note: This is ABI-compatible with the old tcpstat structure; old netstat
binaries will continue to work properly.
2008-04-08 01:03:58 +00:00
rmind
c6186face4 Welcome to 4.99.55:
- Add a lot of missing selinit() and seldestroy() calls.

- Merge selwakeup() and selnotify() calls into a single selnotify().

- Add an additional 'events' argument to selnotify() call.  It will
  indicate which event (POLL_IN, POLL_OUT, etc) happen.  If unknown,
  zero may be used.

Note: please pass appropriate value of 'events' where possible.
Proposed on: <tech-kern>
2008-03-01 14:16:49 +00:00
matt
a4a1e5ce55 Convert stragglers to ansi definitions from old-style definitons.
Remember that func() is not ansi, func(void) is.
2008-02-27 19:41:51 +00:00
yamt
c3985cffec make TCP_SETUP_ACK, ICMP_CHECK, TCP_FIELDS_TO_HOST, and TCP_FIELDS_TO_NET
static functions.
2008-02-20 11:44:07 +00:00
yamt
f35baba8dd - start tcp timestamp from 1 instead of 0.
- add a comment to explain why:
+        * We start with 1, because 0 doesn't work with linux, which
+        * considers timestamp 0 in a SYN packet as a bug and disables
+        * timestamps.
2008-02-05 09:38:47 +00:00
yamt
d5bac2f6b1 redo tcp_input.c rev.1.230 correctly.
revision 1.230
    date: 2005/06/30 02:58:28;  author: christos;  state: Exp;  lines: +20 -4
    Normalize our PAWS code with Free and Open, as mentioned in tech-security.

reviewed by christos@ and matt@.
2008-02-04 23:56:14 +00:00
yamt
a944f4302a revert tcp_output.c 1.253 because it has an ill effect when sending
small (not full-sized) segments.
http://mail-index.NetBSD.org/tech-net/2008/01/27/0009.html
2008-01-29 12:34:47 +00:00
dyoung
2d4e7e5856 Use rtcache_validate() instead of rtcache_getrt(). Shorten staircase
in in_losing().
2008-01-14 04:19:09 +00:00
martin
7080c9db1e A few missing ifdefs to make non-INET6 kernels build again. 2007-12-20 20:24:49 +00:00
dyoung
72fa642a86 Poison struct route->ro_rt uses in the kernel by changing the name
to _ro_rt.  Use rtcache_getrt() to access a route cache's struct
rtentry *.

Introduce struct ifnet->if_dl that always points at the interface
identifier/link-layer address.  Make code that treated the first
ifaddr on struct ifnet->if_addrlist as the interface address use
if_dl, instead.

Remove stale debugging code from net/route.c.  Move the rtflush()
code into rtcache_clear() and delete rtflush().  Delete rtalloc(),
because nothing uses it any more.

Make ND6_HINT an inline, lowercase subroutine, nd6_hint.

I've done my best to convert IP Filter, the ISO stack, and the
AppleTalk stack to rtcache_getrt().  They compile, but I have not
tested them.  I have given the changes to PF, GRE, IPv4 and IPv6
stacks a lot of exercise.
2007-12-20 19:53:29 +00:00
elad
7beaf4911f Really fix low port allocation, by always passing a valid lwp to
in_pcbbind().

Okay dyoung@.

Note that the network code is another candidate for major cleanup... also
note that this issue is likely to be present in netinet6 code, too.
2007-12-16 14:12:34 +00:00
dyoung
94b72f0f97 Change macros SYN_CACHE_PUT() and SYN_CACHE_RM() into inline
subroutines syn_cache_put() and syn_cache_rm().
2007-11-09 23:55:58 +00:00
rmind
d63e75f696 Pick the smallest possible TCP window scaling factor that will still allow
us to scale up to sb_max.  This might fix the problems with some firewalls.

Taken from FreeBSD (silby).
OK by <dyoung>.
2007-11-04 11:04:26 +00:00
yamt
e74ee454c1 our tcp timestamps are in PR_SLOWHZ, not HZ. 2007-08-02 13:06:30 +00:00
rmind
4175f8693b TCP socket buffers automatic sizing - ported from FreeBSD.
http://mail-index.netbsd.org/tech-net/2007/02/04/0006.html

! Disabled by default, marked as experimental. Testers are very needed.
! Someone should thoroughly test this, and improve if possible.

Discussed on <tech-net>:
http://mail-index.netbsd.org/tech-net/2007/07/12/0002.html
Thanks Greg Troxel for comments.

OK by the long silence on <tech-net>.
2007-08-02 02:42:40 +00:00
ad
88ab7da936 Merge some of the less invasive changes from the vmlocking branch:
- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements
2007-07-09 20:51:58 +00:00
christos
eeff189533 - per socket keepalive settings
- settable connection establishment timeout
2007-06-20 15:29:17 +00:00
riz
711b142f07 Fix compilation in the TCP_SIGNATURE case:
- don't use void * for pointer arithmetic
	- don't try to modify const parameters

A kernel with 'options TCP_SIGNATURE' works as well as it ever did, now.
(ie, clunky, but passable)
2007-05-18 21:48:43 +00:00
riz
89c9ca415d Revert a small part of revision 1.254 - remove const qualifier from
the struct tcphdr * argument of tcp_dooptions().  RFC2385 support
(options TCP_SIGNATURE) needs to modify the header during options
processing, and this revision broke it.

OK yamt@.
2007-05-18 21:31:16 +00:00
dyoung
72f0a6dfb0 Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principle benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing.  Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously.  Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs.  I have
  introduced routines for allocating, copying, and duplicating,
  and freeing sockaddrs:

        struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
        struct sockaddr *sockaddr_copy(struct sockaddr *dst,
                                       const struct sockaddr *src);
        struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
        void sockaddr_free(struct sockaddr *sa);

  sockaddr_alloc() returns either a sockaddr from the pool belonging
  to the specified family, or NULL if the pool is exhausted.  The
  returned sockaddr has the right size for that family; sa_family
  and sa_len fields are initialized to the family and sockaddr
  length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
  sockaddr_in).  sockaddr_free() puts the given sockaddr back into
  its family's pool.

  sockaddr_dup() and sockaddr_copy() work analogously to strdup()
  and strcpy(), respectively.  sockaddr_copy() KASSERTs that the
  family of the destination and source sockaddrs are alike.

  The 'flags' argumet for sockaddr_alloc() and sockaddr_dup() is
  passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
  family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
  etc.  They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more.  All protocol families
  use struct route.  I have changed the route cache, 'struct route',
  so that it does not contain storage space for a sockaddr.  Instead,
  struct route points to a sockaddr coming from the pool the sockaddr
  belongs to.  I added a new method to struct route, rtcache_setdst(),
  for setting the cache destination:

        int rtcache_setdst(struct route *, const struct sockaddr *);

  rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
  available to create the sockaddr storage.

  It is now possible for rtcache_getdst() to return NULL if, say,
  rtcache_setdst() failed.  I check the return value for NULL
  everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
  caches, dom_rtcache.  rtflushall(sa_family_t af) looks up the
  domain indicated by 'af', walks the domain's list of route caches
  and invalidates each one.
2007-05-02 20:40:22 +00:00
ad
59d979c5f1 Pass an ipl argument to pool_init/POOL_INIT to be used when initializing
the pool's lock.
2007-03-12 18:18:22 +00:00
christos
53524e44ef Kill caddr_t; there will be some MI fallout, but it will be fixed shortly. 2007-03-04 05:59:00 +00:00
thorpej
7cc07e11dc TRUE -> true, FALSE -> false 2007-02-22 06:16:03 +00:00
degroote
e2211411a4 Commit my SoC work
Add ipv6 support for fast_ipsec
Note that currently, packet with extensions headers are not correctly
supported
Change the ipcomp logic
2007-02-10 09:43:05 +00:00
joerg
eb04733c4e Introduce new helper functions to abstract the route caching.
rtcache_init and rtcache_init_noclone lookup ro_dst and store
the result in ro_rt, taking care of the reference counting and
calling the domain specific route cache.
rtcache_free checks if a route was cashed and frees the reference.
rtcache_copy copies ro_dst of the given struct route, checking that
enough space is available and incrementing the reference count of the
cached rtentry if necessary.
rtcache_check validates that the cached route is still up. If it isn't,
it tries to look it up again. Afterwards ro_rt is either a valid again
or NULL.
rtcache_copy is used internally.

Adjust to callers of rtalloc/rtflush in the tree to check the sanity of
ro_dst first (if necessary). If it doesn't fit the expectations, free
the cache, otherwise check if the cached route is still valid. After
that combination, a single check for ro_rt == NULL is enough to decide
whether a new lookup needs to be done with a different ro_dst.
Make the route checking in gre stricter by repeating the loop check
after revalidation.
Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly
changed here to first validate the route and check RTF_GATEWAY
afterwards. This is sementically equivalent though.
etherip doesn't need sc_route_expire similiar to the gif changes from
dyoung@ earlier.

Based on the earlier patch from dyoung@, reviewed and discussed with
him.
2006-12-15 21:18:52 +00:00
dyoung
c308b1c661 Here are various changes designed to protect against bad IPv4
routing caused by stale route caches (struct route).  Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).

Stale IPv6 and ISO route caches will be treated by separate patches.

Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.

Here are the details:

Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.

Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.

Chain together all IPv4 route caches where ro_rt != NULL.  Provide
in_rtcache() for adding a route to the chain.  Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches.  In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain.  In in_rtflushall(), walk the chain and remove every route
cache.

In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.

In gif(4), discard the workaround for stale caches that involves
expiring them every so often.

Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).

Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.

Take care when moving a 'struct route' to rtflush() the source and
to rtcache() the destination.

In domain initializers, use .dom_xxx tags.

KNF here and there.
2006-12-09 05:33:04 +00:00
yamt
8836e5995d add some more tcp mowners. 2006-12-06 09:10:45 +00:00
yamt
f5830ee995 - make tcp_reass static.
- constify.
2006-12-06 09:08:27 +00:00
christos
168cd830d2 __unused removal on arguments; approved by core. 2006-11-16 01:32:37 +00:00
yamt
c31e22237d - constify.
- make tcp_dooptions and tcpipqent_pool static.
2006-10-21 10:08:54 +00:00
yamt
e1c6fffb40 tcp_input: if we have SACK, don't enter fastrecovery on three dupacks.
otherwise, we can enter fastrecovery due to DSACKs, which we treat
as dupacks here.  PR/34748.  reviewed by Rui Paulo.
2006-10-17 09:31:17 +00:00
rpaulo
1c1f230e81 Move comments to proper places. 2006-10-15 17:53:30 +00:00
rpaulo
a70594d346 Add a new tcp_congctl(9) structure member for congestion experienced callback.
Needed by HSTCP.
2006-10-15 17:45:06 +00:00
rpaulo
c1fc16d084 PR 34776: don't accept TCP connections to broadcast addresses.
Move the multicast/broadcast check above (before creating a syn_cache entry)
By Yasuoka Yasuoka.
2006-10-12 11:46:30 +00:00
christos
4d595fd7b1 - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
2006-10-12 01:30:41 +00:00
rpaulo
a6762e54d7 Revert previous. The check is now done in tcp_congctl. 2006-10-10 11:13:02 +00:00
yamt
f5209007e9 tcp_input: don't call congctl->newack when doing fast retransmit. 2006-10-10 09:19:40 +00:00
rpaulo
f3330397f0 Modular (I tried ;-) TCP congestion control API. Whenever certain conditions
happen in the TCP stack, this interface calls the specified callback to
handle the situation according to the currently selected congestion
control algorithm.
A new sysctl node was created: net.inet.tcp.congctl.{available,selected}
with obvious meanings.
The old net.inet.tcp.newreno MIB was removed.
The API is discussed in tcp_congctl(9).

In the near future, it will be possible to selected a congestion control
algorithm on a per-socket basis.

Discussed on tech-net and reviewed by <yamt>.
2006-10-09 16:27:07 +00:00
tls
8cc016b4bc Protect calls to pool_put/pool_get that may occur in interrupt context
with spl used to protect other allocations and frees, or datastructure
element insertion and removal, in adjacent code.

It is almost unquestionably the case that some of the spl()/splx() calls
added here are superfluous, but it really seems wrong to see:

	s=splfoo();
	/* frob data structure */
	splx(s);
	pool_put(x);

and if we think we need to protect the first operation, then it is hard
to see why we should not think we need to protect the next.  "Better
safe than sorry".

It is also almost unquestionably the case that I missed some pool
gets/puts from interrupt context with my strategy for finding these
calls; use of PR_NOWAIT is a strong hint that a pool may be used from
interrupt context but many callers in the kernel pass a "can wait/can't
wait" flag down such that my searches might not have found them.  One
notable area that needs to be looked at is pf.

See also:

http://mail-index.netbsd.org/tech-kern/2006/07/19/0003.html
http://mail-index.netbsd.org/tech-kern/2006/07/19/0009.html
2006-10-05 17:35:19 +00:00
rpaulo
2fb2ae3251 Import of TCP ECN algorithm for congestion control.
Both available for IPv4 and IPv6.
Basic implementation test results are available at
http://netbsd-soc.sourceforge.net/projects/ecn/testresults.html.

Work sponsored by the Google Summer of Code project 2006.
Special thanks to Kentaro Kurahone, Allen Briggs and Matt Thomas for their
help, comments and support during the project.
2006-09-05 00:29:35 +00:00
kardel
de4337ab21 merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
  time.tv_sec -> time_second
- struct timeval mono_time is gone
  mono_time.tv_sec -> time_uptime
- access to time via
	{get,}{micro,nano,bin}time()
	get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
  Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
  NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html
2006-06-07 22:33:33 +00:00
bouyer
01307555ec Revert rev 1.241: calling m_makewritable() in tcp_input causes problems when
it has to change the mbuf chain. I experience hard hang on a Xen2 domU after
TCP connections have been closed, and a crash has been reported which may be
caused by this too.
2006-05-27 13:35:20 +00:00
bouyer
bc93583ffe If we're going to byteswap fields in the TCP header, make sure the mbuf
area is writable first.
2006-05-25 21:49:19 +00:00
christos
4fd8acf0f3 Coverity CID 1152: Add KASSERT before deref. 2006-04-15 02:32:22 +00:00
rpaulo
ae6865ba83 PR 13952: Noritoshi Demizu: correct the TCP window information update check. 2006-02-18 17:34:49 +00:00
riz
854279801b If TCP_SIGNATURE is defined, include netinet6/scope6_var.h for the
prototype of in6_clearscope().  Kernels with options TCP_SIGNATURE now
compile again after the IPv6 scoped address changes.
2006-02-02 05:52:23 +00:00
dsl
c24781af04 Pass the current process structure to in_pcbconnect() so that it can
pass it to in_pcbbind() so that can allocate a low numbered port
if setsockopt() has been used to set IP_PORTRANGE to IP_PORTRANGE_LOW.
While there, fail in_pcbconnect() if the in_pcbbind() fails - rather
than sending the request out from a port of zero.
This has been largely broken since the socket option was added in 1998.
2005-11-15 18:39:46 +00:00
christos
622690226a If called from syn_cache_add, we need to initialize t_state before calling
tcp_dooptions. Pointed out by yamt.
2005-08-12 14:41:00 +00:00
hubertf
a72fe4e4bf Clarify comment that "the protocol specification dated September, 1981"
is really RFC 793.
2005-08-12 04:19:22 +00:00
christos
5910d08b05 Don't process TCP options in SYN packets after the connection has
been established. (FreeBSD-SA-05:15.tcp)
2005-08-11 22:25:18 +00:00
yamt
f02551ec2d move {tcp,udp}_do_loopback_cksum back to tcp/udp
so that they can be referenced by ipv6.
2005-08-10 13:06:49 +00:00
yamt
8220c378e6 device independent part of ipv6 rx checksum offloading. 2005-08-10 13:05:16 +00:00
christos
89940190d0 Implement PMTU checks from:
http://www.gont.com.ar/drafts/icmp-attacks-against-tcp.html

1. Don't act on ICMP-need-frag immediately if adhoc checks on the
advertised MTU fail. The MTU update is delayed until a TCP retransmit
happens.
2. Ignore ICMP Source Quench messages meant for TCP connections.

From OpenBSD.
2005-07-19 17:00:02 +00:00
christos
a85b0c68e0 Normalize our PAWS code with Free and Open, as mentioned in tech-security. 2005-06-30 02:58:28 +00:00
yamt
0e70c535bf tcp_input: don't overload opti.ts_ecr. 2005-06-06 12:10:09 +00:00
christos
ea2d4204b6 - add const
- remove bogus casts
- avoid nested variables
2005-05-29 21:41:23 +00:00
manu
cddc307094 Fix build problem after recent NAT-T changes 2005-04-26 05:37:45 +00:00
yamt
4b935040d0 tcp_input: update a comment to match with the code. 2005-04-03 05:02:46 +00:00
yamt
8b0967ff45 protect tcpipqent with splvm. 2005-03-29 20:10:16 +00:00
yamt
df05ca7085 simplify data receiver side sack processing.
- introduce t_segqlen, the number of segments in segq/timeq.
  the name is from freebsd.
- rather than maintaining a copy of sack blocks (rcv_sack_block[]),
  build it directly from the segment list when needed.
2005-03-16 00:39:56 +00:00
mycroft
c9f058f65e Copyright maintenance. 2005-03-02 10:20:18 +00:00
jonathan
4ae1f36dc9 Commit TCP SACK patches from Kentaro A. Karahone's patch at:
http://www.sigusr1.org/~kurahone/tcp-sack-netbsd-02152005.diff.gz

Fixes in that patch for pre-existing TCP pcb initializations were already
committed to NetBSD-current, so are not included in this commit.

The SACK patch has been observed to correctly negotiate and respond,
to SACKs in wide-area traffic.

There are two indepenently-observed, as-yet-unresolved anomalies:
First, seeing unexplained delays between in fast retransmission
(potentially explainable by an 0.2sec RTT between adjacent
ethernet/wifi NICs); and second, peculiar and unepxlained TCP
retransmits observed over an ath0 card.

After discussion with several interested developers, I'm committing
this now, as-is, for more eyes to use and look over.  Current hypothesis
is that the anomalies above may in fact be due to link/level (hardware,
driver, HAL, firmware) abberations in the test setup, affecting  both
Kentaro's  wired-Ethernet NIC and in my two (different) WiFi NICs.
2005-02-28 16:20:59 +00:00
perry
f07677dd81 nuke trailing whitespace 2005-02-26 22:45:09 +00:00
perry
870f206724 ANSIfy function declarations 2005-02-03 23:39:32 +00:00
perry
3494482345 de-__P -- will ANSIfy .c files later. 2005-02-02 21:41:55 +00:00
mycroft
47759e6333 Several changes based on comparison with NS:
1) dupseg_fix_=true from NS: do not count a segment with completely duplicate
data as a duplicate ack.  This can occur due to duplicate packets in the
network, or due to fast retransmit from the other side.

2) dupack_reset_=false from NS: do not reset the duplicate ack counter or exit
fast recovery if we happen to get data or a window update along with a
duplicate ack.

3) In the "very old ack" case that itojun added, send an ACK before dropping
the segment, to try to update the other side's send sequence number.

4) Check the ssthresh crossover point with >= rather than >.  Otherwise we
start to do "exponential" growth immediately following recovery, where we
should be doing "linear".  This is what NS does.
2005-01-28 00:18:22 +00:00
mycroft
746d109a3c There is no reason to adjust ts_recent_age for ts_timebase; it's strictly an
internal variable.
2005-01-27 17:14:04 +00:00
mycroft
470f2d0705 Do the other TCP_PAWS_IDLE check unsigned as well. It doesn't do us any harm,
and it could detect even older time stamps.  (Really, to be 100% correct, there
should be a timer that clears these out -- but it probably doesn't matter in
the real world.)
2005-01-27 17:10:07 +00:00
mycroft
42655f2a87 Also check whether an echoed RTT is very large -- this *could* cause the
smoothing function to overflow.  I use TCP_PAWS_IDLE (24 days) for this.
2005-01-27 16:56:06 +00:00
mycroft
7215a0b3f1 Introduce a new state variable, t_partialacks. It has 3 states:
* t_partialacks<0 means we are not in fast recovery.
* t_partialacks==0 means we are in fast recovery, but we have not received
  any partial acks yet.
* t_partialacks>0 means we are in fast recovery, and we have received
  partial acks.

This is used to implement 2 changes in RFC 3782:
* We keep the notion that we are in fast recovery separate from t_dupacks, so
  it is not reset due to out-of-order acks.  (This affects both the Reno and
  NewReno cases.)
* We only reset the retransmit timer on the first partial ack -- preventing us
  from possibly taking one RTO per segment once fast recovery is initiated.

As before, it is hard to measure any difference between Reno and NewReno in the
real-world cases that I've tested.
2005-01-27 03:39:36 +00:00
mycroft
5283ca74ad Fix two problems in our TCP stack:
1) If an echoed RFC 1323 time stamp appears to be later than the current time,
   ignore it and fall back to old-style RTT calculation.  This prevents ending
   up with a negative RTT and panicking later.

2) Fix NewReno.  This involves a few changes:

   a) Implement the send_high variable in RFC 2582.  Our implementation is
      subtly different; it is one *past* the last sequence number transmitted
      rather than being equal to it.  This simplifies some logic and makes
      the code smaller.  Additional logic was required to prevent sequence
      number wraparound problems; this is not mentioned in RFC 2582.

   b) Make sure we reset t_dupacks on new acks, but *not* on a partial ack.
      All of the new ack code is pushed out into tcp_newreno().  (Later this
      will probably be a pluggable function.)  Thus t_dupacks keeps track of
      whether we're in fast recovery all the time, with Reno or NewReno, which
      keeps some logic simpler.

   c) We do not need to update snd_recover when we're not in fast recovery.
      See tech-net for an explanation of this.

   d) In the gratuitous fast retransmit prevention case, do not send a packet.
      RFC 2582 specifically says that we should "do nothing".

   e) Do not inflate the congestion window on a partial ack.  (This is done by
      testing t_dupacks to see whether we're still in fast recovery.)

This brings the performance of NewReno back up to the same as Reno in a few
random test cases (e.g. transferring peer-to-peer over my wireless network).
I have not concocted a good test case for the behavior specific to NewReno.
2005-01-26 21:49:27 +00:00
yamt
ffebedd625 factor out receive side tcp/udp checksum handling code so that they
can be used by eg. packet filters.

reviewed by Christos Zoulas on tech-net@.
(slightly tweaked since then to make tcp and udp similar.)
2004-12-21 05:51:31 +00:00
yamt
6e353db6e4 tcp_input: add missing loopback checksum omission code for ipv6. 2004-12-18 07:30:17 +00:00
thorpej
7994b6f95e Don't perform checksums on loopback interfaces. They can be reenabled with
the net.inet.*.do_loopback_cksum sysctl.

Approved by: groo
2004-12-15 04:25:19 +00:00
yamt
0ea22c32fa fix ipqent pool corruption problems. make tcp reass code use
its own pool of ipqent rather than sharing it with ip reass code.
PR/24782.
2004-09-15 09:21:22 +00:00
itojun
2aef0b1784 correct TCP-MD5 support. Jeff Rizzo 2004-06-26 03:29:15 +00:00
jonathan
349ad018c7 Remove now-unused variable. 2004-05-23 00:37:27 +00:00
itojun
4ebcfcf29a fix MD5 signature support to actually validate inbound signature, and
drop packet if fails.
2004-05-18 14:44:14 +00:00
jonathan
85b3ba5bf1 Redo net.inet.* sysctl subtree for fast-ipsec from scratch.
Attach FAST-IPSEC statistics with 64-bit counters to new sysctl MIB.
Rework netstat to show FAST_IPSEC statistics, via sysctl,  for
netstat -p ipsec.

New kernel files:
	sys/netipsec/Makefile		(new file; install *_var.h includes)
	sys/netipsec/ipsec_var.h	(new 64-bit mib counter struct)

Changed kernel files:
	sys/Makefile			(recurse into sys/netipsec/)
	sys/netinet/in.h		(fake IP_PROTO name for fast_ipsec
					sysctl subtree.)
	sys/netipsec/ipsec.h		(minimal userspace inclusion)
	sys/netipsec/ipsec_osdep.h	(minimal userspace inclusion)
	sys/netipsec/ipsec_netbsd.c	(redo sysctl subtree from scratch)
	sys/netipsec/key*.c		(fix broken net.key subtree)

	sys/netipsec/ah_var.h		(increase all counters to 64 bits)
	sys/netipsec/esp_var.h		(increase all counters to 64 bits)
	sys/netipsec/ipip_var.h		(increase all counters to 64 bits)
	sys/netipsec/ipcomp_var.h	(increase all counters to 64 bits)

	sys/netipsec/ipsec.c		(add #include netipsec/ipsec_var.h)
	sys/netipsec/ipsec_mbuf.c	(add #include netipsec/ipsec_var.h)
	sys/netipsec/ipsec_output.c	(add #include netipsec/ipsec_var.h)

	sys/netinet/raw_ip.c		(add #include netipsec/ipsec_var.h)
	sys/netinet/tcp_input.c		(add #include netipsec/ipsec_var.h)
	sys/netinet/udp_usrreq.c	(add #include netipsec/ipsec_var.h)

Changes to usr.bin/netstat to print the new fast-ipsec sysctl tree
for "netstat -s -p ipsec":

New file:
	usr.bin/netstat/fast_ipsec.c	(print fast-ipsec counters)

Changed files:
	usr.bin/netstat/Makefile	(add fast_ipsec.c)
	usr.bin/netstat/netstat.h	(declarations for fast_ipsec.c)
	usr.bin/netstat/main.c		(call KAME-vs-fast-ipsec dispatcher)
2004-05-07 00:55:14 +00:00
matt
5a0de7507d When a packet is received that overlaps the left side of the window,
check for RST *before* trimming data and adjust its sequence number.
2004-04-27 14:46:07 +00:00
itojun
e0395ac8f0 make TCP MD5 signature work with KAME IPSEC (#define IPSEC).
support IPv6 if KAME IPSEC (RFC is not explicit about how we make data stream
for checksum with IPv6, but i'm pretty sure using normal pseudo-header is the
right thing).

XXX
current TCP MD5 signature code has giant flaw:
it does not validate signature on input (can't believe it! what is the point?)
2004-04-26 03:54:28 +00:00
matt
5413745100 Remove #else clause of __STDC__ 2004-04-26 01:31:56 +00:00
jonathan
887b782b0b Initial commit of a port of the FreeBSD implementation of RFC 2385
(MD5 signatures for TCP, as used with BGP).  Credit for original
FreeBSD code goes to Bruce M. Simpson, with FreeBSD sponsorship
credited to sentex.net.  Shortening of the setsockopt() name
attributed to Vincent Jardin.

This commit is a minimal, working version of the FreeBSD code, as
MFC'ed to FreeBSD-4. It has received minimal testing with a ttcp
modified to set the TCP-MD5 option; BMS's additions to tcpdump-current
(tcpdump -M) confirm that the MD5 signatures are correct.  Committed
as-is for further testing between a NetBSD BGP speaker (e.g., quagga)
and industry-standard BGP speakers (e.g., Cisco, Juniper).


NOTE: This version has two potential flaws. First, I do see any code
that verifies recieved TCP-MD5 signatures.  Second, the TCP-MD5
options are internally padded and assumed to be 32-bit aligned. A more
space-efficient scheme is to pack all TCP options densely (and
possibly unaligned) into the TCP header ; then do one final padding to
a 4-byte boundary.  Pre-existing comments note that accounting for
TCP-option space when we add SACK is yet to be done. For now, I'm
punting on that; we can solve it properly, in a way that will handle
SACK blocks, as a separate exercise.

In case a pullup to NetBSD-2 is requested, this adds sys/netipsec/xform_tcp.c
,and modifies:

sys/net/pfkeyv2.h,v 1.15
sys/netinet/files.netinet,v 1.5
sys/netinet/ip.h,v 1.25
sys/netinet/tcp.h,v 1.15
sys/netinet/tcp_input.c,v 1.200
sys/netinet/tcp_output.c,v 1.109
sys/netinet/tcp_subr.c,v 1.165
sys/netinet/tcp_usrreq.c,v 1.89
sys/netinet/tcp_var.h,v 1.109
sys/netipsec/files.netipsec,v 1.3
sys/netipsec/ipsec.c,v 1.11
sys/netipsec/ipsec.h,v 1.7
sys/netipsec/key.c,v 1.11
share/man/man4/tcp.4,v 1.16
lib/libipsec/pfkey.c,v 1.20
lib/libipsec/pfkey_dump.c,v 1.17
lib/libipsec/policy_token.l,v 1.8
sbin/setkey/parse.y,v 1.14
sbin/setkey/setkey.8,v 1.27
sbin/setkey/token.l,v 1.15

Note that the preceding two revisions to tcp.4 will be
required to cleanly apply this diff.
2004-04-25 22:25:03 +00:00