.\" $NetBSD: tcp.4,v 1.31 2015/02/14 13:02:38 wiz Exp $
.\" $FreeBSD: tcp.4,v 1.11.2.16 2004/02/16 22:21:47 bms Exp $
.\"
.\" Copyright (c) 1983, 1991, 1993
.\" The Regents of the University of California. All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\" notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\" notice, this list of conditions and the following disclaimer in the
.\" documentation and/or other materials provided with the distribution.
.\" 3. Neither the name of the University nor the names of its contributors
.\" may be used to endorse or promote products derived from this software
.\" without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" @(#)tcp.4 8.1 (Berkeley) 6/5/93
.\"
.Dd February 14, 2015
.Dt TCP 4
.Os
.Sh NAME
.Nm tcp
.Nd Internet Transmission Control Protocol
.Sh SYNOPSIS
.In sys/socket.h
.In netinet/in.h
.Ft int
.Fn socket AF_INET SOCK_STREAM 0
.Ft int
.Fn socket AF_INET6 SOCK_STREAM 0
.Sh DESCRIPTION
The
.Tn TCP
protocol provides reliable, flow-controlled, two-way transmission of data.
It is a byte-stream protocol used to support the
.Dv SOCK_STREAM
abstraction.
.Tn TCP
uses the standard Internet address format and, in addition, provides
a per-host collection of
.Dq port addresses .
Thus, each address is composed of an Internet address specifying
the host and network, with a specific
.Tn TCP
port on the host identifying the peer entity.
.Pp
Sockets using
.Tn TCP
are either
.Dq active
or
.Dq passive .
Active sockets initiate connections to passive
sockets.
By default
.Tn TCP
sockets are created active; to create a passive socket the
.Xr listen 2
system call must be used
after binding the socket with the
.Xr bind 2
system call.
Only passive sockets may use the
.Xr accept 2
call to accept incoming connections.
Only active sockets may use the
.Xr connect 2
call to initiate connections.
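.Pp
As a minimal sketch of the sequence described above, the following
creates a socket, binds it, and makes it passive with
.Xr listen 2 .
The helper name
.Fn open_passive
and the use of the loopback address are assumptions made for
illustration only.
.Bd -literal -offset indent
```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <assert.h>
#include <string.h>
#include <unistd.h>

/*
 * open_passive() is a hypothetical helper, not part of any library.
 * It returns a listening descriptor bound to the loopback address,
 * or -1 on failure.  A port of 0 lets the system choose one.
 */
int
open_passive(in_port_t port, int backlog)
{
	struct sockaddr_in sin;
	int s;

	assert(backlog > 0);	/* this sketch expects a real queue */

	/* A freshly created TCP socket is active. */
	s = socket(AF_INET, SOCK_STREAM, 0);
	if (s == -1)
		return -1;

	memset(&sin, 0, sizeof(sin));
	sin.sin_family = AF_INET;
	sin.sin_port = htons(port);
	sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

	/* Binding and then listening makes the socket passive. */
	if (bind(s, (struct sockaddr *)&sin, sizeof(sin)) == -1 ||
	    listen(s, backlog) == -1) {
		close(s);
		return -1;
	}
	return s;
}
```
.Ed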
.Pp
Passive sockets may
.Dq underspecify
their location to match incoming connection requests from multiple networks.
This technique, termed
.Dq wildcard addressing ,
allows a single
server to provide service to clients on multiple networks.
To create a socket which listens on all networks, the Internet
address
.Dv INADDR_ANY
must be bound.
The
.Tn TCP
port may still be specified at this time; if the port is not
specified the system will assign one.
Once a connection has been established the socket's address is
fixed by the peer entity's location.
The address assigned to the socket is the address associated with the
network interface through which packets are being transmitted and received.
Normally this address corresponds to the peer entity's network.
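.Pp
Wildcard addressing with a system-assigned port might look like the
sketch below.
The helper name
.Fn listen_any
and the backlog of 5 are assumptions; the port is read back with
.Xr getsockname 2 .
.Bd -literal -offset indent
```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <assert.h>
#include <string.h>
#include <unistd.h>

/*
 * listen_any() is a hypothetical helper.  It binds the wildcard
 * address with an unspecified port, starts listening, and stores
 * the port chosen by the system in *portp (host byte order).
 * Returns the descriptor, or -1 on failure.
 */
int
listen_any(in_port_t *portp)
{
	struct sockaddr_in sin;
	socklen_t len = sizeof(sin);
	int s;

	assert(portp != NULL);

	s = socket(AF_INET, SOCK_STREAM, 0);
	if (s == -1)
		return -1;

	memset(&sin, 0, sizeof(sin));
	sin.sin_family = AF_INET;
	sin.sin_addr.s_addr = htonl(INADDR_ANY);	/* all networks */
	sin.sin_port = htons(0);			/* system assigns */

	if (bind(s, (struct sockaddr *)&sin, sizeof(sin)) == -1 ||
	    listen(s, 5) == -1 ||
	    getsockname(s, (struct sockaddr *)&sin, &len) == -1) {
		close(s);
		return -1;
	}
	*portp = ntohs(sin.sin_port);
	return s;
}
```
.Ed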
.Pp
.Tn TCP
supports a number of socket options which can be set with
.Xr setsockopt 2
and tested with
.Xr getsockopt 2 :
.Bl -tag -width TCP_KEEPINTVL
.It Dv TCP_NODELAY
Under most circumstances,
.Tn TCP
sends data when it is presented;
when outstanding data has not yet been acknowledged, it gathers
small amounts of output to be sent in a single packet once
an acknowledgment is received.
For a small number of clients, such as window systems
that send a stream of mouse events which receive no replies,
this packetization may cause significant delays.
Therefore,
.Tn TCP
provides a boolean option,
.Dv TCP_NODELAY
(from
.In netinet/tcp.h )
to defeat this algorithm.
.It Dv TCP_MAXSEG
By default, the sending and receiving TCP
negotiate between themselves to determine the maximum segment size
to be used for each connection.
The
.Dv TCP_MAXSEG
option allows the user to determine the result of this negotiation,
and to reduce it if desired.
.It Dv TCP_MD5SIG
This option enables the use of MD5 digests (also known as TCP-MD5)
on writes to the specified socket.
In the current release, only outgoing traffic is digested;
digests on incoming traffic are not verified.
The current default behavior for the system is to respond to a system
advertising this option with TCP-MD5; this may change.
.Pp
One common use for this option is to enable
.Nx Ns -based
routers to interwork with Cisco equipment at peering points.
Support for this feature conforms to RFC 2385.
Only IPv4
.Pq Dv AF_INET
sessions are supported.
.Pp
In order for this option to function correctly, it is necessary for the
administrator to add a tcp-md5 key entry to the system's security
associations database (SADB) using the
.Xr setkey 8
utility.
This entry must have an SPI of 0x1000 and can therefore only be specified
on a per-host basis at this time.
.Pp
If an SADB entry cannot be found for the destination, the outgoing traffic
will have an invalid digest option prepended, and the following error message
will be visible on the system console:
.Em "tcp_signature_compute: SADB lookup failed for %d.%d.%d.%d" .
.It Dv TCP_KEEPIDLE
.\" XXX: We always do it.
.\" When the
.\" .Dv SO_KEEPALIVE
.\" option is enabled,
TCP probes a connection that
has been idle for some amount of time.
The default value for this idle period is 4 hours.
The
.Dv TCP_KEEPIDLE
option can be used to affect this value for a given socket, and specifies
the number of seconds a connection must be idle before keepalive probes
are sent.
This option takes an
.Vt "unsigned int"
value, which must be greater than 0.
.\" range of 1 to N (where N is
.\" the
.\" .Xr sysctl 8
.\" variable
.\" .Dv net.inet.tcp.keepidle ).
.\" divided by
.\" .Dv PR_SLOWHZ
.\" which is defined in the
.\" .In sys/protosw.h
.\" header file).
.It Dv TCP_KEEPINTVL
When the
.Dv SO_KEEPALIVE
option is enabled, TCP probes a connection that
has been idle for some amount of time.
If the remote system does not
respond to a keepalive probe, TCP retransmits the probe after some
amount of time.
The default value for this retransmit interval is 150 seconds.
The
.Dv TCP_KEEPINTVL
option can be used to affect this value for
a given socket, and specifies the number of seconds to wait before
retransmitting a keepalive probe.
This option takes an
.Vt "unsigned int"
value, which must be greater than 0.
.\" range of 1 to N (where N is the
.\" .Xr sysctl 8
.\" variable
.\" .Dv net.inet.tcp.keepintvl ).
.It Dv TCP_KEEPCNT
When the
.Dv SO_KEEPALIVE
option is enabled, TCP probes a connection that
has been idle for some amount of time.
If the remote system does not
respond to a keepalive probe, TCP retransmits the probe a certain
number of times before a connection is considered to be broken.
The default value for this keepalive probe retransmit limit is 8.
The
.Dv TCP_KEEPCNT
option can be used to affect this value for a given socket,
and specifies the maximum number of keepalive probes to be sent.
This option takes an
.Vt "unsigned int"
value, which must be greater than 0.
.\" range of 0 to N (where N is the
.\" .Xr sysctl 8
.\" variable
.\" .Dv net.inet.tcp.keepcnt ).
.It Dv TCP_KEEPINIT
If a TCP connection cannot be established within some amount of time,
TCP will time out the connect attempt.
The default value for this initial connection establishment timeout
is 150 seconds.
The
.Dv TCP_KEEPINIT
option can be used to affect this initial timeout period for a given
socket, and specifies the number of seconds to wait before the connect
attempt is timed out.
For passive connections, the
.Dv TCP_KEEPINIT
option value is inherited from the listening socket.
This option takes an
.Vt "unsigned int"
value, which must be greater than 0.
.It Dv TCP_INFO
Information about a socket's underlying TCP session may be retrieved
by passing the read-only option
.Dv TCP_INFO
to
.Xr getsockopt 2 .
It accepts a single argument: a pointer to an instance of
.Vt "struct tcp_info" .
.Pp
This API is subject to change; consult the source to determine
which fields are currently filled out by this option.
.Nx
specific additions include
send window size,
receive window size,
and
bandwidth-controlled window space.
.\" range of 0 to N (where N is the
.\" .Xr sysctl 8
.\" variable
.\" .Dv net.inet.tcp.keepinit ).
.El
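.Pp
The options above might be set on a socket as in the following sketch.
The helper name
.Fn tune_tcp
is hypothetical, and the sketch assumes that
.Dv IPPROTO_TCP
is a valid option level and that
.Dv SO_KEEPALIVE
should be enabled alongside the keepalive timers.
.Bd -literal -offset indent
```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <assert.h>

/*
 * tune_tcp() is a hypothetical helper.  It disables output
 * coalescing and sets the per-socket keepalive parameters; idle
 * and intvl are in seconds, cnt is a number of probes.
 * Returns 0 on success, -1 with errno set on the first failure.
 */
int
tune_tcp(int s, unsigned int idle, unsigned int intvl, unsigned int cnt)
{
	int on = 1;

	/* The keepalive options require values greater than 0. */
	assert(idle > 0 && intvl > 0 && cnt > 0);

	if (setsockopt(s, IPPROTO_TCP, TCP_NODELAY,
	    &on, sizeof(on)) == -1)
		return -1;
	if (setsockopt(s, SOL_SOCKET, SO_KEEPALIVE,
	    &on, sizeof(on)) == -1)
		return -1;
	if (setsockopt(s, IPPROTO_TCP, TCP_KEEPIDLE,
	    &idle, sizeof(idle)) == -1)
		return -1;
	if (setsockopt(s, IPPROTO_TCP, TCP_KEEPINTVL,
	    &intvl, sizeof(intvl)) == -1)
		return -1;
	if (setsockopt(s, IPPROTO_TCP, TCP_KEEPCNT,
	    &cnt, sizeof(cnt)) == -1)
		return -1;
	return 0;
}
```
.Ed
A call such as
.Fn tune_tcp s 60 10 4
would probe after 60 idle seconds, retransmit every 10 seconds, and
give up after 4 unanswered probes.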
.Pp
The option level for the
.Xr setsockopt 2
call is the protocol number for
.Tn TCP ,
available from
.Xr getprotobyname 3 .
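.Pp
A sketch of looking up the option level and using it to read
.Dv TCP_MAXSEG
follows.
The helper names
.Fn tcp_level
and
.Fn get_maxseg
are hypothetical, and the fallback to
.Dv IPPROTO_TCP
when the protocols database has no entry is an assumption of this
example.
.Bd -literal -offset indent
```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <assert.h>
#include <netdb.h>
#include <stddef.h>

/*
 * tcp_level() is a hypothetical helper returning the option level
 * for TCP.  Falling back to IPPROTO_TCP when the lookup fails is
 * an assumption of this sketch.
 */
int
tcp_level(void)
{
	struct protoent *pe = getprotobyname("tcp");

	return pe != NULL ? pe->p_proto : IPPROTO_TCP;
}

/*
 * get_maxseg() is a hypothetical helper.  It stores the maximum
 * segment size of socket s in *mssp and returns the result of
 * getsockopt(): 0 on success, -1 with errno set on failure.
 */
int
get_maxseg(int s, int *mssp)
{
	socklen_t len = sizeof(*mssp);

	assert(mssp != NULL);
	return getsockopt(s, tcp_level(), TCP_MAXSEG, mssp, &len);
}
```
.Ed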
.Pp
In the historical
.Bx
.Tn TCP
implementation, if the
.Dv TCP_NODELAY
option was set on a passive socket, the sockets returned by
.Xr accept 2
erroneously did not have the
.Dv TCP_NODELAY
option set; the behavior was corrected to inherit
.Dv TCP_NODELAY
in
.Nx 1.6 .
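.Pp
Whether a given system inherits the option can be checked with a
sketch like the one below, which connects to its own listener over
the loopback interface.
The helper name
.Fn accepted_nodelay
is hypothetical.
.Bd -literal -offset indent
```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <arpa/inet.h>
#include <assert.h>
#include <string.h>
#include <unistd.h>

/*
 * accepted_nodelay() is a hypothetical helper.  It sets TCP_NODELAY
 * on a loopback listener, connects to it, and reports whether the
 * accepted socket has the option set: 1 if so, 0 if not, -1 on error.
 */
int
accepted_nodelay(void)
{
	struct sockaddr_in sin;
	socklen_t len = sizeof(sin);
	int on = 1, val = 0, result = -1;
	int ls, cs = -1, as = -1;

	ls = socket(AF_INET, SOCK_STREAM, 0);
	if (ls == -1)
		return -1;

	memset(&sin, 0, sizeof(sin));
	sin.sin_family = AF_INET;
	sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

	/* Set the option on the passive socket, then find its port. */
	if (setsockopt(ls, IPPROTO_TCP, TCP_NODELAY,
	    &on, sizeof(on)) == -1 ||
	    bind(ls, (struct sockaddr *)&sin, sizeof(sin)) == -1 ||
	    listen(ls, 1) == -1 ||
	    getsockname(ls, (struct sockaddr *)&sin, &len) == -1)
		goto out;
	assert(sin.sin_port != 0);	/* the system assigned a port */

	/* The kernel completes the handshake before accept(). */
	cs = socket(AF_INET, SOCK_STREAM, 0);
	if (cs == -1 ||
	    connect(cs, (struct sockaddr *)&sin, sizeof(sin)) == -1)
		goto out;

	as = accept(ls, NULL, NULL);
	if (as == -1)
		goto out;

	len = sizeof(val);
	if (getsockopt(as, IPPROTO_TCP, TCP_NODELAY, &val, &len) == 0)
		result = (val != 0);
out:
	if (as != -1)
		close(as);
	if (cs != -1)
		close(cs);
	close(ls);
	return result;
}
```
.Ed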
.Pp
Options at the
.Tn IP
network level may be used with
.Tn TCP ;
see
.Xr ip 4
or
.Xr ip6 4 .
Incoming connection requests that are source-routed are noted,
and the reverse source route is used in responding.
.Pp
There are many adjustable parameters that control various aspects
of the
.Nx
TCP behavior; these parameters are documented in
.Xr sysctl 7 ,
and they include:
.Bl -bullet -compact
.It
RFC 1323 extensions for high performance
.It
Send/receive buffer sizes
.It
Default maximum segment size (MSS)
.It
SYN cache parameters
.It
Hughes/Touch/Heidemann Congestion Window Monitoring algorithm
.It
Keepalive parameters
.It
newReno algorithm for congestion control
.It
Logging of connection refusals
.It
RST packet rate limits
.It
SACK (Selective Acknowledgment)
.It
ECN (Explicit Congestion Notification)
.It
Congestion window increase methods; the traditional packet counting or
RFC 3465 Appropriate Byte Counting
.It
RFC 3390: Increased initial window size
.El
.Sh DIAGNOSTICS
A socket operation may fail with one of the following errors returned:
.Bl -tag -width [EADDRNOTAVAIL]
.It Bq Er EISCONN
when trying to establish a connection on a socket which
already has one;
.It Bq Er ENOBUFS
when the system runs out of memory for
an internal data structure;
.It Bq Er ETIMEDOUT
when a connection was dropped
due to excessive retransmissions;
.It Bq Er ECONNRESET
when the remote peer
forces the connection to be closed;
.It Bq Er ECONNREFUSED
when the remote
peer actively refuses connection establishment (usually because
no process is listening to the port);
.It Bq Er EADDRINUSE
when an attempt
is made to create a socket with a port which has already been
allocated;
.It Bq Er EADDRNOTAVAIL
when an attempt is made to create a
socket with a network address for which no network interface
exists.
.El
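.Pp
As an illustration of one of these errors, the sketch below provokes
.Er EADDRINUSE
by binding two sockets to the same loopback address and port.
The helper name
.Fn second_bind_errno
is hypothetical, and the sketch assumes that neither socket sets
.Dv SO_REUSEADDR
or
.Dv SO_REUSEPORT .
.Bd -literal -offset indent
```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>

/*
 * second_bind_errno() is a hypothetical helper.  It binds one socket
 * to a system-assigned loopback port, then binds a second socket to
 * the same address and port.  Returns the errno of the second bind(),
 * 0 if it unexpectedly succeeded, or -1 if the setup itself failed.
 */
int
second_bind_errno(void)
{
	struct sockaddr_in sin;
	socklen_t len = sizeof(sin);
	int a, b = -1, result = -1;

	a = socket(AF_INET, SOCK_STREAM, 0);
	if (a == -1)
		return -1;

	memset(&sin, 0, sizeof(sin));
	sin.sin_family = AF_INET;
	sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

	/* The first bind allocates a port; read it back into sin. */
	if (bind(a, (struct sockaddr *)&sin, sizeof(sin)) == -1 ||
	    getsockname(a, (struct sockaddr *)&sin, &len) == -1)
		goto out;
	assert(sin.sin_port != 0);

	b = socket(AF_INET, SOCK_STREAM, 0);
	if (b == -1)
		goto out;

	/* The port is already allocated, so this bind should fail. */
	if (bind(b, (struct sockaddr *)&sin, sizeof(sin)) == -1)
		result = errno;
	else
		result = 0;
out:
	if (b != -1)
		close(b);
	close(a);
	return result;
}
```
.Ed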
.Sh SEE ALSO
.Xr getsockopt 2 ,
.Xr socket 2 ,
.Xr inet 4 ,
.Xr inet6 4 ,
.Xr intro 4 ,
.Xr ip 4 ,
.Xr ip6 4 ,
.Xr sysctl 7
.Rs
.%R RFC
.%N 793
.%D September 1981
.%T "Transmission Control Protocol"
.Re
.Rs
.%R RFC
.%N 1122
.%D October 1989
.%T "Requirements for Internet Hosts -- Communication Layers"
.Re
.Sh HISTORY
The
.Nm
protocol stack appeared in
.Bx 4.2 .