Commit Graph

2021 Commits

Author SHA1 Message Date
dholland
5d71a1f21c typo in comment 2011-05-17 05:40:24 +00:00
drochner
4f6bdd19b5 use getmicrouptime(9) rather than microtime(9) for TIME_WAIT duration
calculation, because this doesn't get confused by system time changes,
and uses less CPU cycles
reviewed by dyoung
2011-05-11 15:08:59 +00:00
spz
18f5539bfc update (unused) ND option identifiers and corresponding comments 2011-05-08 18:42:53 +00:00
drochner
060227a80a remove an empty function 2011-05-06 12:52:43 +00:00
dyoung
6866464399 Remove #ifdef INET6 throughout. 2011-05-03 23:57:41 +00:00
dyoung
c2e43be1c5 Reduces the resources demanded by TCP sessions in TIME_WAIT-state using
methods called Vestigial Time-Wait (VTW) and Maximum Segment Lifetime
Truncation (MSLT).

MSLT and VTW were contributed by Coyote Point Systems, Inc.

Even after a TCP session enters the TIME_WAIT state, its corresponding
socket and protocol control blocks (PCBs) stick around until the TCP
Maximum Segment Lifetime (MSL) expires.  On a host whose workload
necessarily creates and closes down many TCP sockets, the sockets & PCBs
for TCP sessions in TIME_WAIT state amount to many megabytes of dead
weight in RAM.

Maximum Segment Lifetimes Truncation (MSLT) assigns each TCP session to
a class based on the nearness of the peer.  Corresponding to each class
is an MSL, and a session uses the MSL of its class.  The classes are
loopback (local host equals remote host), local (local host and remote
host are on the same link/subnet), and remote (local host and remote
host communicate via one or more gateways).  Classes corresponding to
nearer peers have lower MSLs by default: 2 seconds for loopback, 10
seconds for local, 60 seconds for remote.  Loopback and local sessions
expire more quickly when MSLT is used.

Vestigial Time-Wait (VTW) replaces a TIME_WAIT session's PCB/socket
dead weight with a compact representation of the session, called a
"vestigial PCB".  VTW data structures are designed to be very fast and
memory-efficient: for fast insertion and lookup of vestigial PCBs,
the PCBs are stored in a hash table that is designed to minimize the
number of cacheline visits per lookup/insertion.  The memory both
for vestigial PCBs and for elements of the PCB hashtable come from
fixed-size pools, and linked data structures exploit this to conserve
memory by representing references with a narrow index/offset from the
start of a pool instead of a pointer.  When space for new vestigial PCBs
runs out, VTW makes room by discarding old vestigial PCBs, oldest first.
VTW cooperates with MSLT.

It may help to think of VTW as a "FIN cache" by analogy to the SYN
cache.

A 2.8-GHz Pentium 4 running a test workload that creates TIME_WAIT
sessions as fast as it can is approximately 17% idle when VTW is active
versus 0% idle when VTW is inactive.  It has 103 megabytes more free RAM
when VTW is active (approximately 64k vestigial PCBs are created) than
when it is inactive.
2011-05-03 18:28:44 +00:00
dyoung
ac162b774b *_drain() routines may be called with locks held, so instead of doing
any work in *_drain(), set a drain-needed flag.  Do the work in the
fasttimo handler.

Contributed by Coyote Point Systems, Inc.
2011-05-03 17:44:30 +00:00
dyoung
8e054749e4 arp_drain() may be called with locks held, so instead of doing any work
in arp_drain(), set a drain-needed flag.  Do the work in the fasttimo
handler.

Contributed by Coyote Point Systems, Inc.
2011-05-03 16:00:29 +00:00
yamt
0cc7ac519a undefer csum in looutput.
looutput is used by various code (ether_output, mcast) to loopback packets.
2011-04-25 22:20:59 +00:00
yamt
3e17d0f5a4 tcp_input: simplify redundant assignment. no functional changes. 2011-04-25 22:12:43 +00:00
yamt
45430a8699 ip_undefer_csum:
- don't forget ntohs.
- don't add hdrlen twice for l4 header offset.
- use M_CSUM_DATA_IPv4_IPHL instead of extracting it from ip header.
- simplify code.
- KNF.
2011-04-25 22:11:31 +00:00
yamt
e86be17a4f fix assertions 2011-04-25 22:04:32 +00:00
wiz
d8926a5a43 Fix typos. 2011-04-20 14:08:07 +00:00
gdt
f641bea548 Rewrite comments about TCP RTO calculations.
Long ago, the storage representations of srtt and rttvar were changed
from the 4.4BSD scheme, and the comments are out of sync with the
code.  This commit rewrites most of the comments that explain the RTO
calculations, and points out some issues in the code.

Joint work with Bev Schwartz of BBN (original analysis and comments),
but I have rewritten and extended them, so errors are mine.

This material is based upon work supported by the Defense Advanced
Research Projects Agency and Space and Naval Warfare Systems Center,
Pacific, under Contract No. N66001-09-C-2073.  Approved for Public
Release, Distribution Unlimited
2011-04-20 13:35:51 +00:00
dyoung
b34b1e2f1f In ipintr(), don't overwrite ipintrq.ifq_maxlen with IFQ_MAXLEN.
Initialize ipintrq.ifq_maxlen using IFQ_MAXLEN directly instead of using
the global ipqmaxlen.  Get rid of the global ipqmaxlen.

Now it works again to override the maximum IP queue length with, for
example, sysctl -w net.inet.ip.ifq.maxlen=5.
2011-04-14 20:32:04 +00:00
yamt
e3f6054711 simplify a compile-time assertion 2011-04-14 16:08:53 +00:00
yamt
41529ab272 - comments
- g/c stale extern
2011-04-14 15:57:02 +00:00
yamt
37494bba21 comments 2011-04-14 15:55:46 +00:00
yamt
c9cf49ace7 - comments
- whitespace
2011-04-14 15:54:31 +00:00
yamt
3695fc890a after ip_input.c rev.1.285 and 1.286, restore kernel_lock for if_output. 2011-04-14 15:53:36 +00:00
yamt
0b881f8c57 comments 2011-04-14 15:48:48 +00:00
martin
c655df0d1c PR kern/43664:
mlelstv pointed out that we sometimes may use checksums on loopback
interfaces. Make the test consistent with the code path selecting
the checksum operation before invoking fragmentation.
2011-04-09 21:00:53 +00:00
martin
8a8f4ef60a We do not do checksums on loopback interfaces, not even if fragmenting.
Fixes PR kern/43664.
2011-04-09 20:34:36 +00:00
yamt
18a0ef4a04 simplify code a little. no functional changes. 2011-04-08 11:15:11 +00:00
dyoung
060522dec8 Hide the radix-trie implementation of the forwarding table so that we
will have an easier time replacing it with something different, even if
it is a second radix-trie implementation.

sys/net/route.c and sys/net/rtsock.c no longer operate directly on
radix_nodes or radix_node_heads.

Hopefully this will reduce the temptation to implement multipath or
source-based routing using grotty hacks to the grotty old radix-trie
code, too. :-)
2011-03-31 19:40:51 +00:00
matt
271ad088e1 Clean up setting ECN bit in TOS. Fixes PR 44742 2011-03-21 20:39:32 +00:00
yamt
b1563ea6d9 fix a typo in rev.1.283, which broke tcp dupack and duppack statistics. 2011-03-09 00:44:23 +00:00
chuck
e3e22c95ba udpate license clauses on my code to match the new-style BSD licenses.
based on diff that rmind@ sent me.

no functional change with this commit.
2011-02-01 19:40:24 +00:00
matt
4d5d6d9aa5 Add a new AF/PF_ROUTE which is 64-bit clean which makes the routing socket
interface (and its associated sysctls) act identically for both 32 and 64 bit
programs.  The old unclean one remains for backward compatibility.
2011-02-01 01:39:19 +00:00
matt
2c1217a227 Back out rev that shouldn't have been committed. 2010-12-13 14:18:50 +00:00
matt
ebb2d31714 Add routines to calculate a checkesum if the driver concludes that the
h/w can't do it.
2010-12-11 22:37:46 +00:00
plunky
d334ec0fc0 fix potential mbuf overflow, from Alexander Danilov on tech-net 2010-12-02 19:07:27 +00:00
rmind
c40af51a1a ip_randomid: make mechanism MP-safe and more modular.
OK matt@
2010-11-05 01:35:57 +00:00
rmind
aa7dc4aa25 ip_reass_packet: finish abstraction; some clean-up.
Discussed some time ago with matt@.
2010-11-05 00:21:51 +00:00
matt
e787a03c11 Replace the copyright with a new TNF copyright since nothing of the old
ip_id.c remains.  Remove old comments which have no relevance anymore.
2010-11-04 22:00:51 +00:00
yamt
a2939d499b make ipfr_lock IPL_VM as ip_reass_drain is called in interrupts via
the drain hook for mbuf pools.
2010-10-07 03:15:49 +00:00
enami
daf969e420 Don't free memory still in use. Fixes nfs root problem reported
by Christoph Egger on source-changes-d.
2010-10-06 07:39:37 +00:00
rmind
ff74682fb4 Re-structure IPv4 reassembly code to make it more MP-friendly and simplify
some code fragments while here.  Also, use pool_cache(9) and mutex(9).

IPv4 reassembly mechanism is MP-safe now.
2010-10-03 19:44:47 +00:00
bad
6b557ece78 Defopt the rest of the Ipfilter options and tunables.
Per discussion with darrenr@ a year ago.
2010-10-02 20:07:39 +00:00
rmind
574e8cee41 Use own IPv4 reassembly queue entry structure and leave struct ipqent only
for TCP.  Now both struct ipfr_qent, struct ipfr_queue and hashed fragment
queue are abstracted and no longer public.
2010-08-25 00:05:14 +00:00
pooka
abddd18860 ahem, min -> max in previous 2010-08-11 11:06:42 +00:00
pooka
57ec5229b9 Use kpause() instead of DELAY() and sleep a minimum of 1 tick.
This is possible now since softints have a thread context.  It's
also not a very frequent code path.  Addresses ABI issue with delay
(kern/40505).

I'm not entire sure what this delay is meant to accomplish, though.
2010-08-11 09:36:44 +00:00
pooka
4fa346679c Include opt_inet since this checks INET/INET6 2010-08-10 21:46:12 +00:00
rmind
7b5ee09e0b Revert previous change of making struct ipqent invisible to userland. 2010-07-19 19:16:45 +00:00
rmind
2f196e2fd9 Abstract IP reassembly into single generic routine - ip_reass_packet().
Make struct ipq private and struct ipqent not visible to userland.
Push ip_len adjustment into reassembly layer.

OK matt@
2010-07-19 14:09:44 +00:00
rmind
bcc65ff09f Split-off IPv4 re-assembly mechanism into a separate module. Abstract
into ip_reass_init(), ip_reass_lookup(), etc (note: abstraction is not
yet complete).  No functional changes to the actual mechanism.

OK matt@
2010-07-13 22:16:10 +00:00
rmind
419f3b11a1 ip_input: move lookup for fragment queue a little bit further. OK matt@. 2010-07-09 18:42:46 +00:00
kefren
8f87b4e7b8 manually adjust m_data and m_len so it can later be prepended with a
struct ip in case that a cluster is used. icmp len panic is not valid for
cluster case.

Fixes PR/43548
2010-07-02 07:02:00 +00:00
kefren
826653c190 Add MPLS support, proposed on tech-net@ a couple of days ago
Welcome to 5.99.33
2010-06-26 14:24:27 +00:00
bouyer
adad9c5471 Make sure SYN_CACHE_TIMER_ARM() has been run before calling syn_cache_put()
as it will reschedule the timer.  Fixes PR kern/43318.
2010-05-26 17:38:29 +00:00