Commit Graph

49 Commits

Author SHA1 Message Date
christos
bf93cf8726 Add inet4 part of the rfc6056 code contributed by Vlad Balan as part of
Google SoC-2011
2011-09-24 17:18:17 +00:00
dyoung
c2e43be1c5 Reduces the resources demanded by TCP sessions in TIME_WAIT-state using
methods called Vestigial Time-Wait (VTW) and Maximum Segment Lifetime
Truncation (MSLT).

MSLT and VTW were contributed by Coyote Point Systems, Inc.

Even after a TCP session enters the TIME_WAIT state, its corresponding
socket and protocol control blocks (PCBs) stick around until the TCP
Maximum Segment Lifetime (MSL) expires.  On a host whose workload
necessarily creates and closes down many TCP sockets, the sockets & PCBs
for TCP sessions in TIME_WAIT state amount to many megabytes of dead
weight in RAM.

Maximum Segment Lifetimes Truncation (MSLT) assigns each TCP session to
a class based on the nearness of the peer.  Corresponding to each class
is an MSL, and a session uses the MSL of its class.  The classes are
loopback (local host equals remote host), local (local host and remote
host are on the same link/subnet), and remote (local host and remote
host communicate via one or more gateways).  Classes corresponding to
nearer peers have lower MSLs by default: 2 seconds for loopback, 10
seconds for local, 60 seconds for remote.  Loopback and local sessions
expire more quickly when MSLT is used.

Vestigial Time-Wait (VTW) replaces a TIME_WAIT session's PCB/socket
dead weight with a compact representation of the session, called a
"vestigial PCB".  VTW data structures are designed to be very fast and
memory-efficient: for fast insertion and lookup of vestigial PCBs,
the PCBs are stored in a hash table that is designed to minimize the
number of cacheline visits per lookup/insertion.  The memory both
for vestigial PCBs and for elements of the PCB hashtable come from
fixed-size pools, and linked data structures exploit this to conserve
memory by representing references with a narrow index/offset from the
start of a pool instead of a pointer.  When space for new vestigial PCBs
runs out, VTW makes room by discarding old vestigial PCBs, oldest first.
VTW cooperates with MSLT.

It may help to think of VTW as a "FIN cache" by analogy to the SYN
cache.

A 2.8-GHz Pentium 4 running a test workload that creates TIME_WAIT
sessions as fast as it can is approximately 17% idle when VTW is active
versus 0% idle when VTW is inactive.  It has 103 megabytes more free RAM
when VTW is active (approximately 64k vestigial PCBs are created) than
when it is inactive.
2011-05-03 18:28:44 +00:00
minskim
d0a9c36e4a Add the IP_MINTTL socket option.
The IP_MINTTL option may be used on SOCK_STREAM sockets to discard
packets with a TTL lower than the option value.  This can be used to
implement the Generalized TTL Security Mechanism (GTSM) according to
RFC 3682.

OK'ed by christos@.
2009-07-17 22:02:54 +00:00
minskim
ca28940e0e Add the IP_RECVTTL option support.
If the IP_RECVTTL option is enabled on a SOCK_DGRAM socket, the
recvmsg(2) call will return the TTL of the received datagram.  The
msg_control field in the msghdr structure points to a buffer that
contains a cmsghdr structure followed by the TTL value.

Modeled after FreeBSD implementation.
2009-07-16 04:09:51 +00:00
elad
ce55394a89 Oops. Remove kauth.h inclusion.
Pointed out by gdt@, thanks.
2007-12-16 18:39:57 +00:00
elad
7beaf4911f Really fix low port allocation, by always passing a valid lwp to
in_pcbbind().

Okay dyoung@.

Note that the network code is another candidate for major cleanup... also
note that this issue is likely to be present in netinet6 code, too.
2007-12-16 14:12:34 +00:00
dyoung
4c9b6756a5 1) Introduce a new socket option, (SOL_SOCKET, SO_NOHEADER), that
tells a socket that it should both add a protocol header to tx'd
   datagrams and remove the header from rx'd datagrams:

        int onoff = 1, s = socket(...);
        setsockopt(s, SOL_SOCKET, SO_NOHEADER, &onoff);

2) Add an implementation of (SOL_SOCKET, SO_NOHEADER) for raw IPv4
   sockets.

3) Reorganize the protocols' pr_ctloutput implementations a bit.
   Consistently return ENOPROTOOPT when an option is unsupported,
   and EINVAL if a supported option's arguments are incorrect.
   Reorganize the flow of code so that it's more clear how/when
   options are passed down the stack until they are handled.

   Shorten some pr_ctloutput staircases for readability.

4) Extract common mbuf code into subroutines, add new sockaddr
   methods, and introduce a new subroutine, fsocreate(), for reuse
   later; use it first in sys_socket():

struct mbuf *m_getsombuf(struct socket *so)

        Create an mbuf and make its owner the socket `so'.

struct mbuf *m_intopt(struct socket *so, int val)

        Create an mbuf, make its owner the socket `so', put the
        int `val' into it, and set its length to sizeof(int).


int fsocreate(..., int *fd)

        Create a socket, a la socreate(9), put the socket into the
        given LWP's descriptor table, return the descriptor at `fd'
        on success.

void *sockaddr_addr(struct sockaddr *sa, socklen_t *slenp)
const void *sockaddr_const_addr(const struct sockaddr *sa, socklen_t *slenp)

        Extract a pointer to the address part of a sockaddr.  Write
        the length of the address  part at `slenp', if `slenp' is
        not NULL.

socklen_t sockaddr_getlen(const struct sockaddr *sa)

        Return the length of a sockaddr.  This just evaluates to
        sa->sa_len.  I only add this for consistency with code that
        appears in a portable userland library that I am going to
        import.

const struct sockaddr *sockaddr_any(const struct sockaddr *sa)

        Return the "don't care" sockaddr in the same family as
        `sa'.  This is the address a client should sobind(9) if it
        does not care the source address and, if applicable, the
        port et cetera that it uses.

const void *sockaddr_anyaddr(const struct sockaddr *sa, socklen_t *slenp)

        Return the "don't care" sockaddr in the same family as
        `sa'.  This is the address a client should sobind(9) if it
        does not care the source address and, if applicable, the
        port et cetera that it uses.
2007-09-19 04:33:42 +00:00
ad
f474dceb13 Use the LWP cached credentials where sane. 2006-07-23 22:06:03 +00:00
elad
9702e98730 Multiple inclusion protection, as suggested by christos@ on tech-kern@
few days ago.
2005-12-10 23:31:41 +00:00
dsl
c24781af04 Pass the current process structure to in_pcbconnect() so that it can
pass it to in_pcbbind() so that can allocate a low numbered port
if setsockopt() has been used to set IP_PORTRANGE to IP_PORTRANGE_LOW.
While there, fail in_pcbconnect() if the in_pcbbind() fails - rather
than sending the request out from a port of zero.
This has been largely broken since the socket option was added in 1998.
2005-11-15 18:39:46 +00:00
manu
5c217c1a67 Add support for IPsec Network Address Translator traversal (NAT-T), as
described by RFC 3947 and 3948.
2005-02-12 12:31:07 +00:00
itojun
0f06e31eb6 no space between function name and paren: foo (blah) -> foo(blah) 2004-04-21 17:49:46 +00:00
matt
db6a0b431a De __P() 2004-04-18 21:00:35 +00:00
mycroft
5a8b331f54 Remove all the code to maintain ia_inpcbs. This information was only used to
close sockets on address changes, which was deemed to be a bad idea and was
summarily removed, so there is no point in wasting effort on maintaining it
any more.
2003-10-23 20:55:08 +00:00
itojun
495906ca8e revamp inpcb/in6pcb so that they are more aligned with each other.
in6pcb lookup now uses hash(9).
2003-09-04 09:16:57 +00:00
agc
aad01611e7 Move UCB-licensed code from 4-clause to 3-clause licence.
Patches provided by Joel Baker in PR 22364, verified by myself.
2003-08-07 16:26:28 +00:00
matt
27e1742142 Change the way multicasts are kept. They now use a hash table in the same
manner as the ifaddr hash table.  By doing this, the mkludge code can go
away.  At the same time, keep track of what pcbs are using what ifaddr and
when an address is deleted from an interface, notify/abort all sockets
that have that address as a source.  Switch IGMP and multicasts to use pools
for allocation.  Fix a number of potential problems in the igmp code where
allocation failures could cause a trap/panic.
2003-06-15 02:49:32 +00:00
itojun
61eed162b2 cleanup ipsec.h dependency. commented by perry, sync w/kame 2002-11-02 19:03:44 +00:00
itojun
f192b66b94 whitespace 2002-06-09 16:33:36 +00:00
itojun
1ff38f4d03 on interface removal, remove multicast groups joined from pcb, before
removing interface addresses.  without the change, we may deref
NULL pointer in in_pcbpurgeif().  from jinmei@kame, sync with kame
2001-07-02 15:25:34 +00:00
thorpej
c1185c1020 PRU_PURGEADDR -> PRU_PURGEIF, per a discussion w/ itojun. In the IPv4
and IPv6 code, also use this to traverse PCB tables, looking for cached
routes referencing the dying ifnet, forcing them to be refreshed.
2000-02-02 23:28:08 +00:00
itojun
1a2a1e2b1f bring in latest KAME ipsec tree.
- interop issues in ipcomp is fixed
- padding type (after ESP) is configurable
- key database memory management (need more fixes)
- policy specification is revisited

XXX m->m_pkthdr.rcvif is still overloaded - hope to fix it soon
2000-01-31 14:18:52 +00:00
itojun
118d2b1d4f IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
  data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
  package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
  file to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
1999-07-01 08:12:45 +00:00
lukem
a1ea50ee45 * in_pcblookup_port(): deprecate INPLOOKUP_WILDCARD and flags in favour
of a lookup_wildcard arg; simplifies the logic a bit.
* when assigning ephemeral ports in in_pcbbind(), always call
  in_pcblookup_port() with lookup_wildcard=1, so that ephemeral port
  allocation on sockets with SO_REUSEADDR set won't potentially bind to a
  port in use by something else (principle of least surprise).
1998-10-05 14:33:14 +00:00
matt
f070ddb8ed Move the ppcb pointer towards the front of the structure so that it and the
pcb chain pointers can possibly be in the same cache line.
1998-05-18 17:10:37 +00:00
perry
f73530ba55 add/cleanup multiple inclusion protection. 1998-02-10 01:26:19 +00:00
lukem
c80b4400e5 add the following, derived from FreeBSD:
* IP_PORTRANGE socket option, which controls how the ephemeral ports
  are allocated. it takes the following settings:
	IP_PORTRANGE_DEFAULT	use anonportmin (49152) -> anonportmax (65535)
	IP_PORTRANGE_HIGH	as IP_PORTRANGE_DEFAULT (retained for FreeBSD
				compat reasons, where these are separate)
	IP_PORTRANGE_LOW	use 600 -> 1023. only works if uid==0.
* in_pcb flag INP_ANONPORT. set if port was allocated ephmerally
1998-01-07 22:51:22 +00:00
matt
8c42ff649b Add support for returning maximum supported MTU when ip_output fails with
EMSGSIZE.
1997-10-14 00:52:39 +00:00
thorpej
de572198ad Implement in_pcbrtentry() - return the route associated with a PCB. If
one does not exist, attempt to allocate one.  This is mostly pulled from
tcp_input.c.
1997-09-22 21:39:40 +00:00
thorpej
efa8881dbe Pull SYN_cache_branch down into the main line. 1997-07-23 21:26:40 +00:00
thorpej
9df1988ac8 Implement the IP_RECVIF socket option: supply a datagram packet's incoming
interface using a sockaddr_dl in a control mbuf.

Implement SO_TIMESTAMP for IP datagrams.

Move packet information option processing into a generic function
so that they work with multicast UDP and raw IP as well as unicast UDP.

Contributed by Bill Fenner <fenner@parc.xerox.com>.
1997-01-11 05:21:07 +00:00
mycroft
d6121891ef Overlay inp_faddr and inp_laddr into the header prototype. 1996-09-17 17:10:20 +00:00
mycroft
9bfa240a98 Hash unconnected PCBs. 1996-09-15 18:11:06 +00:00
mycroft
62a6cce9ca Add in_nullhost() and in_hosteq() macros, to hide some protocol
details.  Also, fix a bug in TCP wrt SYN+URG packets.
1996-09-09 14:51:07 +00:00
mycroft
49d52c9b1c Pass a proc pointer down to the usrreq and pcbbind functions for PRU_ATTACH, PRU_BIND and
PRU_CONTROL.  The usrreq interface really needs to be split up, but this will have to wait.
Remove SS_PRIV completely.
1996-05-22 13:54:55 +00:00
christos
14d9cd33af netinet prototypes 1996-02-13 23:40:59 +00:00
mycroft
67e78477db Build a hash table of PCBs. Hash function needs tweaking. 1996-01-31 03:49:23 +00:00
cgd
f90cf78fba convert pcb lists to CIRCLEQs, so that the end can be looked at more
easily, and so that the original (insque/remque) logic can be effectively
mimiced.  (This fixes a bug in the previous set of list changes.)
also (since terminator is no longer null) reinstate uninitted list checks,
but mark them XXX.
1995-06-18 20:01:08 +00:00
mycroft
cd7edee1ca in_pcbnotify*() don't return anything. 1995-06-12 06:49:55 +00:00
mycroft
6897f39ae9 Various cleanup, including:
* Convert several data structures to use queue.h.
* Split in_pcbnotify() into two parts; one for notifying a specific PCB, and
one for notifying all PCBs for a particular foreign address.
1995-06-12 00:46:47 +00:00
cgd
b5b72d26ea be a bit more careful and explicit with types. (basically a large no-op.) 1995-04-13 06:25:36 +00:00
jtc
7c04233887 KERNEL -> _KERNEL 1995-03-26 20:23:52 +00:00
cgd
cf92afd66e New RCS ID's, take two. they're more aesthecially pleasant, and use 'NetBSD' 1994-06-29 06:29:24 +00:00
mycroft
07b4f2ab54 Update to 4.4-Lite networking code, with a few local changes. 1994-05-13 06:02:48 +00:00
mycroft
4fe12e6e88 Fix some inconsistent spacing; spaces at the end of lines, etc. 1994-01-08 21:21:28 +00:00
hpeyerl
9589a22f73 More multicast stuff.
>From Chris Maeda, cmaeda@cs.washington.edu
These patches are derived from the IP Multicast patches for BSDI.
1993-12-08 23:46:31 +00:00
cgd
45a57e79ea more rcsid additions and file header cleanups 1993-05-20 03:49:51 +00:00
mycroft
235bd1db44 Add consistent multiple-inclusion protection. 1993-04-19 03:45:34 +00:00
cgd
61f282557f initial import of 386bsd-0.1 sources 1993-03-21 09:45:37 +00:00