390 lines
11 KiB
Groff
.\" $NetBSD: tcp.4,v 1.31 2015/02/14 13:02:38 wiz Exp $
.\" $FreeBSD: tcp.4,v 1.11.2.16 2004/02/16 22:21:47 bms Exp $
.\"
.\" Copyright (c) 1983, 1991, 1993
.\" The Regents of the University of California. All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\" notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\" notice, this list of conditions and the following disclaimer in the
.\" documentation and/or other materials provided with the distribution.
.\" 3. Neither the name of the University nor the names of its contributors
.\" may be used to endorse or promote products derived from this software
.\" without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" @(#)tcp.4 8.1 (Berkeley) 6/5/93
.\"
.Dd February 14, 2015
.Dt TCP 4
.Os
.Sh NAME
.Nm tcp
.Nd Internet Transmission Control Protocol
.Sh SYNOPSIS
.In sys/socket.h
.In netinet/in.h
.Ft int
.Fn socket AF_INET SOCK_STREAM 0
.Ft int
.Fn socket AF_INET6 SOCK_STREAM 0
.Sh DESCRIPTION
The
.Tn TCP
protocol provides reliable, flow-controlled, two-way transmission of data.
It is a byte-stream protocol used to support the
.Dv SOCK_STREAM
abstraction.
.Tn TCP
uses the standard Internet address format and, in addition, provides
a per-host collection of
.Dq port addresses .
Thus, each address is composed of an Internet address specifying
the host and network, with a specific
.Tn TCP
port on the host identifying the peer entity.
.Pp
Sockets using
.Tn TCP
are either
.Dq active
or
.Dq passive .
Active sockets initiate connections to passive
sockets.
By default
.Tn TCP
sockets are created active; to create a passive socket the
.Xr listen 2
system call must be used
after binding the socket with the
.Xr bind 2
system call.
Only passive sockets may use the
.Xr accept 2
call to accept incoming connections.
Only active sockets may use the
.Xr connect 2
call to initiate connections.
.Pp
Passive sockets may
.Dq underspecify
their location to match incoming connection requests from multiple networks.
This technique, termed
.Dq wildcard addressing ,
allows a single
server to provide service to clients on multiple networks.
To create a socket which listens on all networks, the Internet
address
.Dv INADDR_ANY
must be bound.
The
.Tn TCP
port may still be specified at this time; if the port is not
specified the system will assign one.
Once a connection has been established the socket's address is
fixed by the peer entity's location.
The address assigned to the socket is the address associated with the
network interface through which packets are being transmitted and received.
Normally this address corresponds to the peer entity's network.
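.Pp
As an illustrative sketch rather than a definitive implementation,
the following C function creates a passive socket bound to
.Dv INADDR_ANY
and reports the port the system assigned when the caller asks for port 0.
The function name
.Fn wildcard_listener
and the backlog of 5 are assumptions made for this example only.
.Bd -literal -offset indent
```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <string.h>
#include <unistd.h>

/*
 * Create a passive TCP socket bound to the wildcard address.
 * On entry *port is the requested port in host byte order
 * (0 lets the system choose); on success *port holds the port
 * actually bound.  Returns the descriptor, or -1 on error.
 */
int
wildcard_listener(unsigned short *port)
{
    struct sockaddr_in sin;
    socklen_t len = sizeof(sin);
    int fd;

    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd == -1)
        return -1;

    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
    sin.sin_addr.s_addr = htonl(INADDR_ANY);
    sin.sin_port = htons(*port);

    /* bind(2) first, then listen(2) makes the socket passive. */
    if (bind(fd, (struct sockaddr *)&sin, sizeof(sin)) == -1 ||
        listen(fd, 5) == -1 ||
        getsockname(fd, (struct sockaddr *)&sin, &len) == -1) {
        close(fd);
        return -1;
    }

    *port = ntohs(sin.sin_port);
    return fd;
}
```
.Ed
The returned descriptor can then be passed to
.Xr accept 2 .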
.Pp
.Tn TCP
supports a number of socket options which can be set with
.Xr setsockopt 2
and tested with
.Xr getsockopt 2 :
.Bl -tag -width TCP_KEEPINTVL
.It Dv TCP_NODELAY
Under most circumstances,
.Tn TCP
sends data when it is presented;
when outstanding data has not yet been acknowledged, it gathers
small amounts of output to be sent in a single packet once
an acknowledgment is received.
For a small number of clients, such as window systems
that send a stream of mouse events which receive no replies,
this packetization may cause significant delays.
Therefore,
.Tn TCP
provides a boolean option,
.Dv TCP_NODELAY
(from
.In netinet/tcp.h ) ,
to defeat this algorithm.
.It Dv TCP_MAXSEG
By default, a sender- and receiver-TCP
will negotiate between themselves to determine the maximum segment size
to be used for each connection.
The
.Dv TCP_MAXSEG
option allows the user to determine the result of this negotiation,
and to reduce it if desired.
.It Dv TCP_MD5SIG
This option enables the use of MD5 digests (also known as TCP-MD5)
on writes to the specified socket.
In the current release, only outgoing traffic is digested;
digests on incoming traffic are not verified.
The current default behavior for the system is to respond to a system
advertising this option with TCP-MD5; this may change.
.Pp
One common use for this option is to enable
.Nx Ns -based
routers to interwork with Cisco equipment at peering points.
Support for this feature conforms to RFC 2385.
Only IPv4 (AF_INET) sessions are supported.
.Pp
In order for this option to function correctly, it is necessary for the
administrator to add a tcp-md5 key entry to the system's security
associations database (SADB) using the
.Xr setkey 8
utility.
This entry must have an SPI of 0x1000 and can therefore only be specified
on a per-host basis at this time.
.Pp
If an SADB entry cannot be found for the destination, the outgoing traffic
will have an invalid digest option prepended, and the following error message
will be visible on the system console:
.Em "tcp_signature_compute: SADB lookup failed for %d.%d.%d.%d" .
.It Dv TCP_KEEPIDLE
.\" XXX: We always do it.
.\" When the
.\" .Dv SO_KEEPALIVE
.\" option is enabled,
TCP probes a connection that
has been idle for some amount of time.
The default value for this idle period is 4 hours.
The
.Dv TCP_KEEPIDLE
option can be used to affect this value for a given socket, and specifies
the number of seconds of idle time between keepalive probes.
This option takes an
.Vt "unsigned int"
value, which must be greater than 0.
.\" range of 1 to N (where N is
.\" the
.\" .Xr sysctl 8
.\" variable
.\" .Dv net.inet.tcp.keepidle ).
.\" divided by
.\" .Dv PR_SLOWHZ
.\" which is defined in the
.\" .In sys/protosw.h
.\" header file).
.It Dv TCP_KEEPINTVL
When the
.Dv SO_KEEPALIVE
option is enabled, TCP probes a connection that
has been idle for some amount of time.
If the remote system does not
respond to a keepalive probe, TCP retransmits the probe after some
amount of time.
The default value for this retransmit interval is 150 seconds.
The
.Dv TCP_KEEPINTVL
option can be used to affect this value for
a given socket, and specifies the number of seconds to wait before
retransmitting a keepalive probe.
This option takes an
.Vt "unsigned int"
value, which must be greater than 0.
.\" range of 1 to N (where N is the
.\" .Xr sysctl 8
.\" variable
.\" .Dv net.inet.tcp.keepintvl ).
.It Dv TCP_KEEPCNT
When the
.Dv SO_KEEPALIVE
option is enabled, TCP probes a connection that
has been idle for some amount of time.
If the remote system does not
respond to a keepalive probe, TCP retransmits the probe a certain
number of times before a connection is considered to be broken.
The default value for this keepalive probe retransmit limit is 8.
The
.Dv TCP_KEEPCNT
option can be used to affect this value for a given socket,
and specifies the maximum number of keepalive probes to be sent.
This option takes an
.Vt "unsigned int"
value, which must be greater than 0.
.\" range of 0 to N (where N is the
.\" .Xr sysctl 8
.\" variable
.\" .Dv net.inet.tcp.keepcnt ).
.It Dv TCP_KEEPINIT
If a TCP connection cannot be established within some amount of time,
TCP will time out the connect attempt.
The default value for this initial connection establishment timeout
is 150 seconds.
The
.Dv TCP_KEEPINIT
option can be used to affect this initial timeout period for a given
socket, and specifies the number of seconds to wait before the connect
attempt is timed out.
For passive connections, the
.Dv TCP_KEEPINIT
option value is inherited from the listening socket.
This option takes an
.Vt "unsigned int"
value, which must be greater than 0.
.It Dv TCP_INFO
Information about a socket's underlying TCP session may be retrieved
by passing the read-only option
.Dv TCP_INFO
to
.Xr getsockopt 2 .
It accepts a single argument: a pointer to an instance of
.Vt "struct tcp_info" .
.Pp
This API is subject to change; consult the source to determine
which fields are currently filled out by this option.
.Nx Ns -specific
additions include
send window size,
receive window size,
and
bandwidth-controlled window space.
.\" range of 0 to N (where N is the
.\" .Xr sysctl 8
.\" variable
.\" .Dv net.inet.tcp.keepinit ).
.El
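.Pp
The keepalive options above might be combined as in the following sketch.
It is a minimal example, not a definitive implementation: the helper name
.Fn set_keepalive
is hypothetical, and the code assumes that the constant
.Dv IPPROTO_TCP
from
.In netinet/in.h
is an acceptable option level on the target system.
.Bd -literal -offset indent
```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

/*
 * Enable keepalive probing on fd and set the per-socket timers.
 * idle and intvl are in seconds, cnt is a number of probes;
 * all three must be greater than 0.
 * Returns 0 on success, -1 on error with errno set by setsockopt(2).
 */
int
set_keepalive(int fd, unsigned int idle, unsigned int intvl,
    unsigned int cnt)
{
    int on = 1;

    /* SO_KEEPALIVE lives at the socket level, not the TCP level. */
    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) == -1)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle,
        sizeof(idle)) == -1)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl,
        sizeof(intvl)) == -1)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &cnt,
        sizeof(cnt)) == -1)
        return -1;
    return 0;
}
```
.Ed
For instance,
.Fn set_keepalive fd 600 30 4
would request a first probe after ten minutes of idle time,
a retransmission every 30 seconds, and at most four probes.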
.Pp
The option level for the
.Xr setsockopt 2
call is the protocol number for
.Tn TCP ,
available from
.Xr getprotobyname 3 .
.Pp
In the historical
.Bx
.Tn TCP
implementation, if the
.Dv TCP_NODELAY
option was set on a passive socket, the sockets returned by
.Xr accept 2
erroneously did not have the
.Dv TCP_NODELAY
option set; the behavior was corrected to inherit
.Dv TCP_NODELAY
in
.Nx 1.6 .
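.Pp
Whether a given system inherits
.Dv TCP_NODELAY
can be checked with a small loopback experiment such as the sketch below.
The function name
.Fn accepted_inherits_nodelay
is hypothetical, and the code assumes that the loopback interface is
configured and that
.Dv IPPROTO_TCP
is an acceptable option level.
.Bd -literal -offset indent
```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <arpa/inet.h>
#include <string.h>
#include <unistd.h>

/*
 * Set TCP_NODELAY on a listening socket, connect to it over
 * loopback, and inspect the socket returned by accept(2).
 * Returns 1 if the accepted socket has TCP_NODELAY set,
 * 0 if it does not, and -1 if any step fails.
 */
int
accepted_inherits_nodelay(void)
{
    struct sockaddr_in sin;
    socklen_t len = sizeof(sin);
    int on = 1, val = 0, result = -1;
    socklen_t vlen = sizeof(val);
    int lfd, cfd, afd = -1;

    lfd = socket(AF_INET, SOCK_STREAM, 0);
    cfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd == -1 || cfd == -1)
        goto out;
    if (setsockopt(lfd, IPPROTO_TCP, TCP_NODELAY, &on,
        sizeof(on)) == -1)
        goto out;

    /* Bind to a system-chosen loopback port and find out which. */
    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
    sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (bind(lfd, (struct sockaddr *)&sin, sizeof(sin)) == -1 ||
        listen(lfd, 1) == -1 ||
        getsockname(lfd, (struct sockaddr *)&sin, &len) == -1)
        goto out;

    /* The handshake completes in the kernel before accept(2). */
    if (connect(cfd, (struct sockaddr *)&sin, sizeof(sin)) == -1)
        goto out;
    afd = accept(lfd, NULL, NULL);
    if (afd == -1)
        goto out;
    if (getsockopt(afd, IPPROTO_TCP, TCP_NODELAY, &val, &vlen) == -1)
        goto out;
    result = (val != 0);
out:
    if (afd != -1)
        close(afd);
    if (cfd != -1)
        close(cfd);
    if (lfd != -1)
        close(lfd);
    return result;
}
```
.Ed
On a system with the corrected behavior the function is expected to
return 1.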
.Pp
Options at the
.Tn IP
network level may be used with
.Tn TCP ;
see
.Xr ip 4
or
.Xr ip6 4 .
Incoming connection requests that are source-routed are noted,
and the reverse source route is used in responding.
.Pp
There are many adjustable parameters that control various aspects
of the
.Nx
TCP behavior; these parameters are documented in
.Xr sysctl 7 ,
and they include:
.Bl -bullet -compact
.It
RFC 1323 extensions for high performance
.It
Send/receive buffer sizes
.It
Default maximum segment size (MSS)
.It
SYN cache parameters
.It
Hughes/Touch/Heidemann Congestion Window Monitoring algorithm
.It
Keepalive parameters
.It
NewReno algorithm for congestion control
.It
Logging of connection refusals
.It
RST packet rate limits
.It
SACK (Selective Acknowledgment)
.It
ECN (Explicit Congestion Notification)
.It
Congestion window increase methods; the traditional packet counting or
RFC 3465 Appropriate Byte Counting
.It
RFC 3390: Increased initial window size
.El
.Sh DIAGNOSTICS
A socket operation may fail with one of the following errors returned:
.Bl -tag -width [EADDRNOTAVAIL]
.It Bq Er EISCONN
when trying to establish a connection on a socket which
already has one;
.It Bq Er ENOBUFS
when the system runs out of memory for
an internal data structure;
.It Bq Er ETIMEDOUT
when a connection was dropped
due to excessive retransmissions;
.It Bq Er ECONNRESET
when the remote peer
forces the connection to be closed;
.It Bq Er ECONNREFUSED
when the remote
peer actively refuses connection establishment (usually because
no process is listening on the port);
.It Bq Er EADDRINUSE
when an attempt
is made to create a socket with a port which has already been
allocated;
.It Bq Er EADDRNOTAVAIL
when an attempt is made to create a
socket with a network address for which no network interface
exists.
.El
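.Pp
As an illustration of how one of these errors surfaces to a program,
the sketch below binds two sockets to the same loopback address and port
and returns the
.Va errno
value from the second
.Xr bind 2 ,
which is expected to be
.Er EADDRINUSE .
The function name
.Fn second_bind_errno
is hypothetical, and the example assumes that neither socket has
address-reuse options set.
.Bd -literal -offset indent
```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>

/*
 * Bind socket a to a system-chosen loopback port, then try to
 * bind socket b to the same address and port.
 * Returns the errno from the second bind(2), 0 if that bind
 * unexpectedly succeeds, or -1 if the setup itself fails.
 */
int
second_bind_errno(void)
{
    struct sockaddr_in sin;
    socklen_t len = sizeof(sin);
    int a, b, err = -1;

    a = socket(AF_INET, SOCK_STREAM, 0);
    b = socket(AF_INET, SOCK_STREAM, 0);
    if (a == -1 || b == -1)
        goto out;

    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
    sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (bind(a, (struct sockaddr *)&sin, sizeof(sin)) == -1 ||
        listen(a, 1) == -1 ||
        getsockname(a, (struct sockaddr *)&sin, &len) == -1)
        goto out;

    /* sin now names the port held by a; reuse it for b. */
    if (bind(b, (struct sockaddr *)&sin, sizeof(sin)) == -1)
        err = errno;    /* save before close(2) can change it */
    else
        err = 0;
out:
    if (a != -1)
        close(a);
    if (b != -1)
        close(b);
    return err;
}
```
.Ed
The same pattern, saving
.Va errno
immediately after the failing call, applies to the other errors listed
above.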
.Sh SEE ALSO
.Xr getsockopt 2 ,
.Xr socket 2 ,
.Xr inet 4 ,
.Xr inet6 4 ,
.Xr intro 4 ,
.Xr ip 4 ,
.Xr ip6 4 ,
.Xr sysctl 7
.Rs
.%R RFC
.%N 793
.%D September 1981
.%T "Transmission Control Protocol"
.Re
.Rs
.%R RFC
.%N 1122
.%D October 1989
.%T "Requirements for Internet Hosts -- Communication Layers"
.Re
.Sh HISTORY
The
.Nm
protocol stack appeared in
.Bx 4.2 .