Network Working Group                    Internet Engineering Task Force
Request for Comments: 1122                             R. Braden, Editor
                                                            October 1989
|
        Requirements for Internet Hosts -- Communication Layers
|
Status of This Memo

   This RFC is an official specification for the Internet community. It
   incorporates by reference, amends, corrects, and supplements the
   primary protocol standards documents relating to hosts. Distribution
   of this document is unlimited.
|
Summary

   This is one RFC of a pair that defines and discusses the requirements
   for Internet host software. This RFC covers the communications
   protocol layers: link layer, IP layer, and transport layer; its
   companion RFC-1123 covers the application and support protocols.
|
Table of Contents

1. INTRODUCTION
   1.1 The Internet Architecture
      1.1.1 Internet Hosts
      1.1.2 Architectural Assumptions
      1.1.3 Internet Protocol Suite
      1.1.4 Embedded Gateway Code
   1.2 General Considerations
      1.2.1 Continuing Internet Evolution
      1.2.2 Robustness Principle
      1.2.3 Error Logging
      1.2.4 Configuration
   1.3 Reading this Document
      1.3.1 Organization
      1.3.2 Requirements
      1.3.3 Terminology
   1.4 Acknowledgments

2. LINK LAYER
   2.1 INTRODUCTION
|
Internet Engineering Task Force [Page 1]
|
RFC1122 INTRODUCTION October 1989
|
   2.2 PROTOCOL WALK-THROUGH
   2.3 SPECIFIC ISSUES
      2.3.1 Trailer Protocol Negotiation
      2.3.2 Address Resolution Protocol -- ARP
         2.3.2.1 ARP Cache Validation
         2.3.2.2 ARP Packet Queue
      2.3.3 Ethernet and IEEE 802 Encapsulation
   2.4 LINK/INTERNET LAYER INTERFACE
   2.5 LINK LAYER REQUIREMENTS SUMMARY

3. INTERNET LAYER PROTOCOLS
   3.1 INTRODUCTION
   3.2 PROTOCOL WALK-THROUGH
      3.2.1 Internet Protocol -- IP
         3.2.1.1 Version Number
         3.2.1.2 Checksum
         3.2.1.3 Addressing
         3.2.1.4 Fragmentation and Reassembly
         3.2.1.5 Identification
         3.2.1.6 Type-of-Service
         3.2.1.7 Time-to-Live
         3.2.1.8 Options
      3.2.2 Internet Control Message Protocol -- ICMP
         3.2.2.1 Destination Unreachable
         3.2.2.2 Redirect
         3.2.2.3 Source Quench
         3.2.2.4 Time Exceeded
         3.2.2.5 Parameter Problem
         3.2.2.6 Echo Request/Reply
         3.2.2.7 Information Request/Reply
         3.2.2.8 Timestamp and Timestamp Reply
         3.2.2.9 Address Mask Request/Reply
      3.2.3 Internet Group Management Protocol IGMP
   3.3 SPECIFIC ISSUES
      3.3.1 Routing Outbound Datagrams
         3.3.1.1 Local/Remote Decision
         3.3.1.2 Gateway Selection
         3.3.1.3 Route Cache
         3.3.1.4 Dead Gateway Detection
         3.3.1.5 New Gateway Selection
         3.3.1.6 Initialization
      3.3.2 Reassembly
      3.3.3 Fragmentation
      3.3.4 Local Multihoming
         3.3.4.1 Introduction
         3.3.4.2 Multihoming Requirements
         3.3.4.3 Choosing a Source Address
      3.3.5 Source Route Forwarding
      3.3.6 Broadcasts
      3.3.7 IP Multicasting
      3.3.8 Error Reporting
   3.4 INTERNET/TRANSPORT LAYER INTERFACE
   3.5 INTERNET LAYER REQUIREMENTS SUMMARY

4. TRANSPORT PROTOCOLS
   4.1 USER DATAGRAM PROTOCOL -- UDP
      4.1.1 INTRODUCTION
      4.1.2 PROTOCOL WALK-THROUGH
      4.1.3 SPECIFIC ISSUES
         4.1.3.1 Ports
         4.1.3.2 IP Options
         4.1.3.3 ICMP Messages
         4.1.3.4 UDP Checksums
         4.1.3.5 UDP Multihoming
         4.1.3.6 Invalid Addresses
      4.1.4 UDP/APPLICATION LAYER INTERFACE
      4.1.5 UDP REQUIREMENTS SUMMARY
   4.2 TRANSMISSION CONTROL PROTOCOL -- TCP
      4.2.1 INTRODUCTION
      4.2.2 PROTOCOL WALK-THROUGH
         4.2.2.1 Well-Known Ports
         4.2.2.2 Use of Push
         4.2.2.3 Window Size
         4.2.2.4 Urgent Pointer
         4.2.2.5 TCP Options
         4.2.2.6 Maximum Segment Size Option
         4.2.2.7 TCP Checksum
         4.2.2.8 TCP Connection State Diagram
         4.2.2.9 Initial Sequence Number Selection
         4.2.2.10 Simultaneous Open Attempts
         4.2.2.11 Recovery from Old Duplicate SYN
         4.2.2.12 RST Segment
         4.2.2.13 Closing a Connection
         4.2.2.14 Data Communication
         4.2.2.15 Retransmission Timeout
         4.2.2.16 Managing the Window
         4.2.2.17 Probing Zero Windows
         4.2.2.18 Passive OPEN Calls
         4.2.2.19 Time to Live
         4.2.2.20 Event Processing
         4.2.2.21 Acknowledging Queued Segments
      4.2.3 SPECIFIC ISSUES
         4.2.3.1 Retransmission Timeout Calculation
         4.2.3.2 When to Send an ACK Segment
         4.2.3.3 When to Send a Window Update
         4.2.3.4 When to Send Data
         4.2.3.5 TCP Connection Failures
         4.2.3.6 TCP Keep-Alives
         4.2.3.7 TCP Multihoming
         4.2.3.8 IP Options
         4.2.3.9 ICMP Messages
         4.2.3.10 Remote Address Validation
         4.2.3.11 TCP Traffic Patterns
         4.2.3.12 Efficiency
      4.2.4 TCP/APPLICATION LAYER INTERFACE
         4.2.4.1 Asynchronous Reports
         4.2.4.2 Type-of-Service
         4.2.4.3 Flush Call
         4.2.4.4 Multihoming
      4.2.5 TCP REQUIREMENT SUMMARY

5. REFERENCES
|
1. INTRODUCTION

   This document is one of a pair that defines and discusses the
   requirements for host system implementations of the Internet protocol
   suite. This RFC covers the communication protocol layers: link
   layer, IP layer, and transport layer. Its companion RFC,
   "Requirements for Internet Hosts -- Application and Support"
   [INTRO:1], covers the application layer protocols. This document
   should also be read in conjunction with "Requirements for Internet
   Gateways" [INTRO:2].

   These documents are intended to provide guidance for vendors,
   implementors, and users of Internet communication software. They
   represent the consensus of a large body of technical experience and
   wisdom, contributed by the members of the Internet research and
   vendor communities.

   This RFC enumerates standard protocols that a host connected to the
   Internet must use, and it incorporates by reference the RFCs and
   other documents describing the current specifications for these
   protocols. It corrects errors in the referenced documents and adds
   further discussion and guidance for an implementor.

   For each protocol, this document also contains an explicit set of
   requirements, recommendations, and options. The reader must
   understand that the list of requirements in this document is
   incomplete by itself; the complete set of requirements for an
   Internet host is primarily defined in the standard protocol
   specification documents, with the corrections, amendments, and
   supplements contained in this RFC.

   A good-faith implementation of the protocols that was produced after
   careful reading of the RFCs and with some interaction with the
   Internet technical community, and that followed good communications
   software engineering practices, should differ from the requirements
   of this document in only minor ways. Thus, in many cases, the
   "requirements" in this RFC are already stated or implied in the
   standard protocol documents, so that their inclusion here is, in a
   sense, redundant. However, they were included because some past
   implementations have made the wrong choice, causing problems of
   interoperability, performance, and/or robustness.

   This document includes discussion and explanation of many of the
   requirements and recommendations. A simple list of requirements
   would be dangerous, because:

   o    Some required features are more important than others, and some
        features are optional.

   o    There may be valid reasons why particular vendor products that
        are designed for restricted contexts might choose to use
        different specifications.

   However, the specifications of this document must be followed to meet
   the general goal of arbitrary host interoperation across the
   diversity and complexity of the Internet system. Although most
   current implementations fail to meet these requirements in various
   ways, some minor and some major, this specification is the ideal
   towards which we need to move.

   These requirements are based on the current level of Internet
   architecture. This document will be updated as required to provide
   additional clarifications or to include additional information in
   those areas in which specifications are still evolving.

   This introductory section begins with a brief overview of the
   Internet architecture as it relates to hosts, and then gives some
   general advice to host software vendors. Finally, there is some
   guidance on reading the rest of the document and some terminology.
|
   1.1 The Internet Architecture

      General background and discussion on the Internet architecture and
      supporting protocol suite can be found in the DDN Protocol
      Handbook [INTRO:3]; for further background, see, for example,
      [INTRO:9], [INTRO:10], and [INTRO:11]. Reference [INTRO:5]
      describes the procedure for obtaining Internet protocol documents,
      while [INTRO:6] contains a list of the numbers assigned within
      Internet protocols.

      1.1.1 Internet Hosts

         A host computer, or simply "host," is the ultimate consumer of
         communication services. A host generally executes application
         programs on behalf of user(s), employing network and/or
         Internet communication services in support of this function.
         An Internet host corresponds to the concept of an "End-System"
         used in the OSI protocol suite [INTRO:13].

         An Internet communication system consists of interconnected
         packet networks supporting communication among host computers
         using the Internet protocols. The networks are interconnected
         using packet-switching computers called "gateways" or "IP
         routers" by the Internet community, and "Intermediate Systems"
         by the OSI world [INTRO:13]. The RFC "Requirements for
         Internet Gateways" [INTRO:2] contains the official
         specifications for Internet gateways. That RFC, together with
         the present document and its companion [INTRO:1], defines the
         rules for the current realization of the Internet architecture.

         Internet hosts span a wide range of size, speed, and function.
         They range in size from small microprocessors through
         workstations to mainframes and supercomputers. In function,
         they range from single-purpose hosts (such as terminal servers)
         to full-service hosts that support a variety of online network
         services, typically including remote login, file transfer, and
         electronic mail.

         A host is generally said to be multihomed if it has more than
         one interface to the same or to different networks. See
         Section 1.3.3 on "Terminology".
|
      1.1.2 Architectural Assumptions

         The current Internet architecture is based on a set of
         assumptions about the communication system. The assumptions
         most relevant to hosts are as follows:

         (a)  The Internet is a network of networks.

              Each host is directly connected to some particular
              network(s); its connection to the Internet is only
              conceptual. Two hosts on the same network communicate
              with each other using the same set of protocols that they
              would use to communicate with hosts on distant networks.

         (b)  Gateways don't keep connection state information.

              To improve robustness of the communication system,
              gateways are designed to be stateless, forwarding each IP
              datagram independently of other datagrams. As a result,
              redundant paths can be exploited to provide robust service
              in spite of failures of intervening gateways and networks.

              All state information required for end-to-end flow control
              and reliability is implemented in the hosts, in the
              transport layer or in application programs. All
              connection control information is thus co-located with the
              end points of the communication, so it will be lost only
              if an end point fails.

         (c)  Routing complexity should be in the gateways.

              Routing is a complex and difficult problem, and ought to
              be performed by the gateways, not the hosts. An important
              objective is to insulate host software from changes caused
              by the inevitable evolution of the Internet routing
              architecture.

         (d)  The system must tolerate wide network variation.

              A basic objective of the Internet design is to tolerate a
              wide range of network characteristics -- e.g., bandwidth,
              delay, packet loss, packet reordering, and maximum packet
              size. Another objective is robustness against failure of
              individual networks, gateways, and hosts, using whatever
              bandwidth is still available. Finally, the goal is full
              "open system interconnection": an Internet host must be
              able to interoperate robustly and effectively with any
              other Internet host, across diverse Internet paths.

              Sometimes host implementors have designed for less
              ambitious goals. For example, the LAN environment is
              typically much more benign than the Internet as a whole;
              LANs have low packet loss and delay and do not reorder
              packets. Some vendors have fielded host implementations
              that are adequate for a simple LAN environment, but work
              badly for general interoperation. The vendor justifies
              such a product as being economical within the restricted
              LAN market. However, isolated LANs seldom stay isolated
              for long; they are soon gatewayed to each other, to
              organization-wide internets, and eventually to the global
              Internet system. In the end, neither the customer nor the
              vendor is served by incomplete or substandard Internet
              host software.

              The requirements spelled out in this document are designed
              for a full-function Internet host, capable of full
              interoperation over an arbitrary Internet path.
|
      1.1.3 Internet Protocol Suite

         To communicate using the Internet system, a host must implement
         the layered set of protocols comprising the Internet protocol
         suite. A host typically must implement at least one protocol
         from each layer.

         The protocol layers used in the Internet architecture are as
         follows [INTRO:4]:

         o    Application Layer

              The application layer is the top layer of the Internet
              protocol suite. The Internet suite does not further
              subdivide the application layer, although some of the
              Internet application layer protocols do contain some
              internal sub-layering. The application layer of the
              Internet suite essentially combines the functions of the
              top two layers -- Presentation and Application -- of the
              OSI reference model.

              We distinguish two categories of application layer
              protocols: user protocols that provide service directly
              to users, and support protocols that provide common system
              functions. Requirements for user and support protocols
              will be found in the companion RFC [INTRO:1].

              The most common Internet user protocols are:

              o    Telnet (remote login)
              o    FTP (file transfer)
              o    SMTP (electronic mail delivery)

              There are a number of other standardized user protocols
              [INTRO:4] and many private user protocols.

              Support protocols, used for host name mapping, booting,
              and management, include SNMP, BOOTP, RARP, and the Domain
              Name System (DNS) protocols.

         o    Transport Layer

              The transport layer provides end-to-end communication
              services for applications. There are two primary
              transport layer protocols at present:

              o    Transmission Control Protocol (TCP)
              o    User Datagram Protocol (UDP)

              TCP is a reliable connection-oriented transport service
              that provides end-to-end reliability, resequencing, and
              flow control. UDP is a connectionless ("datagram")
              transport service.

              Other transport protocols have been developed by the
              research community, and the set of official Internet
              transport protocols may be expanded in the future.

              Transport layer protocols are discussed in Chapter 4.

         o    Internet Layer

              All Internet transport protocols use the Internet Protocol
              (IP) to carry data from source host to destination host.
              IP is a connectionless or datagram internetwork service,
              providing no end-to-end delivery guarantees. Thus, IP
              datagrams may arrive at the destination host damaged,
              duplicated, out of order, or not at all. The layers above
              IP are responsible for reliable delivery service when it
              is required. The IP protocol includes provision for
              addressing, type-of-service specification, fragmentation
              and reassembly, and security information.

              The datagram or connectionless nature of the IP protocol
              is a fundamental and characteristic feature of the
              Internet architecture. Internet IP was the model for the
              OSI Connectionless Network Protocol [INTRO:12].

              ICMP is a control protocol that is considered to be an
              integral part of IP, although it is architecturally
              layered upon IP, i.e., it uses IP to carry its data
              end-to-end just as a transport protocol like TCP or UDP
              does. ICMP provides error reporting, congestion
              reporting, and first-hop gateway redirection.

              IGMP is an Internet layer protocol used for establishing
              dynamic host groups for IP multicasting.

              The Internet layer protocols IP, ICMP, and IGMP are
              discussed in Chapter 3.

         o    Link Layer

              To communicate on its directly-connected network, a host
              must implement the communication protocol used to
              interface to that network. We call this a link layer or
              media-access layer protocol.

              There is a wide variety of link layer protocols,
              corresponding to the many different types of networks.
              See Chapter 2.
|
      1.1.4 Embedded Gateway Code

         Some Internet host software includes embedded gateway
         functionality, so that these hosts can forward packets as a
         gateway would, while still performing the application layer
         functions of a host.

         Such dual-purpose systems must follow the Gateway Requirements
         RFC [INTRO:2] with respect to their gateway functions, and
         must follow the present document with respect to their host
         functions. In all overlapping cases, the two specifications
         should be in agreement.

         There are varying opinions in the Internet community about
         embedded gateway functionality. The main arguments are as
         follows:

         o    Pro: in a local network environment where networking is
              informal, or in isolated internets, it may be convenient
              and economical to use existing host systems as gateways.

              There is also an architectural argument for embedded
              gateway functionality: multihoming is much more common
              than originally foreseen, and multihoming forces a host to
              make routing decisions as if it were a gateway. If the
              multihomed host contains an embedded gateway, it will
              have full routing knowledge and as a result will be able
              to make better routing decisions.

         o    Con: gateway algorithms and protocols are still changing,
              and they will continue to change as the Internet system
              grows larger. Attempting to include a general gateway
              function within the host IP layer will force host system
              maintainers to track these (more frequent) changes. Also,
              a larger pool of gateway implementations will make
              coordinating the changes more difficult. Finally, the
              complexity of a gateway IP layer is somewhat greater than
              that of a host, making the implementation and operation
              tasks more complex.

              In addition, the style of operation of some hosts is not
              appropriate for providing stable and robust gateway
              service.

         There is considerable merit in both of these viewpoints. One
         conclusion can be drawn: a host administrator must have
         conscious control over whether or not a given host acts as a
         gateway. See Section 3.1 for the detailed requirements.
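         One way to picture the administrator-controlled switch described
         above is as a configuration flag that stays off unless someone
         explicitly enables it. This is a minimal sketch under stated
         assumptions, not the rules of Section 3.1. The names HostConfig,
         ip_forwarding, and dispatch are hypothetical. Only unicast
         destinations are considered, and none of the detailed
         requirements of Section 3.1 are modeled.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum
from ipaddress import IPv4Address


class Action(Enum):
    DELIVER = "deliver"  # hand up to the local transport layer
    FORWARD = "forward"  # act as a gateway and send it onward
    DISCARD = "discard"  # host mode: not addressed to us, do not forward


@dataclass
class HostConfig:
    # Hypothetical administrator-controlled switch; it defaults to False
    # so that a host never starts acting as a gateway by accident.
    ip_forwarding: bool = False
    local_addresses: set[IPv4Address] = field(default_factory=set)


def dispatch(cfg: HostConfig, dst: IPv4Address) -> Action:
    """Decide what to do with an arriving unicast datagram (simplified)."""
    if dst in cfg.local_addresses:
        return Action.DELIVER
    if cfg.ip_forwarding:
        return Action.FORWARD
    return Action.DISCARD


if __name__ == "__main__":
    cfg = HostConfig(local_addresses={IPv4Address("10.0.0.5")})
    print(dispatch(cfg, IPv4Address("10.0.0.9")))  # Action.DISCARD
```

         Under the default configuration, a datagram for a non-local
         address is discarded rather than forwarded. Forwarding happens
         only after the administrator sets ip_forwarding explicitly.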
|
   1.2 General Considerations

      There are two important lessons that vendors of Internet host
      software have learned and which a new vendor should consider
      seriously.

      1.2.1 Continuing Internet Evolution

         The enormous growth of the Internet has revealed problems of
         management and scaling in a large datagram-based packet
         communication system. These problems are being addressed, and
         as a result there will be continuing evolution of the
         specifications described in this document. These changes will
         be carefully planned and controlled, since there is extensive
         participation in this planning by the vendors and by the
         organizations responsible for operations of the networks.

         Development, evolution, and revision are characteristic of
         computer network protocols today, and this situation will
         persist for some years. A vendor who develops computer
         communication software for the Internet protocol suite (or any
         other protocol suite!) and then fails to maintain and update
         that software for changing specifications is going to leave a
         trail of unhappy customers. The Internet is a large
         communication network, and the users are in constant contact
         through it. Experience has shown that knowledge of
         deficiencies in vendor software propagates quickly through the
         Internet technical community.

      1.2.2 Robustness Principle

         At every layer of the protocols, there is a general rule whose
         application can lead to enormous benefits in robustness and
         interoperability [IP:1]:

                       "Be liberal in what you accept, and
                        conservative in what you send"

         Software should be written to deal with every conceivable
         error, no matter how unlikely; sooner or later a packet will
         come in with that particular combination of errors and
         attributes, and unless the software is prepared, chaos can
         ensue. In general, it is best to assume that the network is
         filled with malevolent entities that will send in packets
         designed to have the worst possible effect. This assumption
         will lead to suitable protective design, although the most
         serious problems in the Internet have been caused by
         unenvisaged mechanisms triggered by low-probability events;
         mere human malice would never have taken so devious a course!

         Adaptability to change must be designed into all levels of
         Internet host software. As a simple example, consider a
         protocol specification that contains an enumeration of values
         for a particular header field -- e.g., a type field, a port
         number, or an error code; this enumeration must be assumed to
         be incomplete. Thus, if a protocol specification defines four
         possible error codes, the software must not break when a fifth
         code shows up. An undefined code might be logged (see below),
         but it must not cause a failure.
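         Here is a minimal sketch of the adaptability advice. It assumes a
         hypothetical header field for which the specification defines
         four error codes. Known values decode to named members. Any
         other value is counted and passed through as a plain integer
         instead of raising an exception. ErrorCode, decode_code, and
         unknown_counts are illustrative names and are not part of any
         real protocol.

```python
from __future__ import annotations

import logging
from collections import Counter
from enum import IntEnum

log = logging.getLogger("proto")

# Running tally of unrecognized codes, suitable for later reporting.
unknown_counts: Counter[int] = Counter()


class ErrorCode(IntEnum):
    # The four values a hypothetical specification defines today.
    TRUNCATED = 0
    BAD_LENGTH = 1
    BAD_CHECKSUM = 2
    UNSUPPORTED = 3


def decode_code(raw: int) -> ErrorCode | int:
    """Map a raw header value to a known code, tolerating new values."""
    try:
        return ErrorCode(raw)
    except ValueError:
        # A "fifth code" has shown up: count and log it, but do not fail.
        unknown_counts[raw] += 1
        log.info("unrecognized error code %d; handling generically", raw)
        return raw
```

         A caller can test isinstance(result, ErrorCode) and fall back to
         generic handling for raw integers. That way, a new code degrades
         gracefully instead of breaking the receiver.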
|
The second part of the principle is almost as important:
|
|||
|
software on other hosts may contain deficiencies that make it
|
|||
|
unwise to exploit legal but obscure protocol features. It is
|
|||
|
unwise to stray far from the obvious and simple, lest untoward
|
|||
|
effects result elsewhere. A corollary of this is "watch out
|
|||
|
for misbehaving hosts"; host software should be prepared, not
|
|||
|
just to survive other misbehaving hosts, but also to cooperate
|
|||
|
to limit the amount of disruption such hosts can cause to the
|
|||
|
shared communication facility.

1.2.3 Error Logging

The Internet includes a great variety of host and gateway
systems, each implementing many protocols and protocol layers,
and some of these contain bugs and mis-features in their
Internet protocol software. As a result of complexity,
diversity, and distribution of function, the diagnosis of
Internet problems is often very difficult.

Problem diagnosis will be aided if host implementations include
a carefully designed facility for logging erroneous or
"strange" protocol events. It is important to include as much
diagnostic information as possible when an error is logged. In
particular, it is often useful to record the header(s) of a
packet that caused an error. However, care must be taken to
ensure that error logging does not consume prohibitive amounts
of resources or otherwise interfere with the operation of the
host.

There is a tendency for abnormal but harmless protocol events
to overflow error logging files; this can be avoided by using a
"circular" log, or by enabling logging only while diagnosing a
known failure. It may be useful to filter and count duplicate
successive messages. One strategy that seems to work well is:
(1) always count abnormalities and make such counts accessible
through the management protocol (see [INTRO:1]); and (2) allow
the logging of a great variety of events to be selectively
enabled. For example, it might be useful to be able to "log
everything" or to "log everything for host X".
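The counting-plus-selective-logging strategy might be sketched as follows. This is a hypothetical design, not a real logging API. The class name, the event strings, and the capacity are illustrative assumptions. Every event is counted, but a log line is written only when logging is enabled globally or for the sending host. The buffer is circular, and duplicate successive messages are collapsed into a repeat count.

```python
from collections import Counter, deque


class ProtocolEventLog:
    """Sketch of the strategy above (hypothetical design, not a real API).

    * every abnormality is counted, whether or not it is logged;
    * logging is enabled selectively: everything, or per host;
    * duplicate successive messages are filtered and counted;
    * the log is circular, so it cannot grow without bound.
    """

    def __init__(self, capacity: int = 256):
        self.entries = deque(maxlen=capacity)  # oldest entries fall off
        self.counts = Counter()                # readable by a management agent
        self.log_everything = False
        self.enabled_hosts = set()
        self._last = None                      # last logged (event, host, detail)
        self._repeats = 0

    def record(self, event: str, host: str, detail: str = "") -> None:
        self.counts[event] += 1
        if not (self.log_everything or host in self.enabled_hosts):
            return
        key = (event, host, detail)
        if key == self._last:
            self._repeats += 1                 # suppress the duplicate line
            return
        if self._repeats:
            self.entries.append(f"last message repeated {self._repeats} times")
        self._last, self._repeats = key, 0
        self.entries.append(f"{event} from {host}: {detail}")
```

A management agent would expose `counts`. Adding a host to `enabled_hosts` corresponds to "log everything for host X", and setting `log_everything` corresponds to "log everything".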

Note that different managements may have differing policies
about the amount of error logging that they want normally
enabled in a host. Some will say, "if it doesn't hurt me, I
don't want to know about it", while others will want to take a
more watchful and aggressive attitude about detecting and
removing protocol abnormalities.

1.2.4 Configuration

It would be ideal if a host implementation of the Internet
protocol suite could be entirely self-configuring. This would
allow the whole suite to be implemented in ROM or cast into
silicon, it would simplify diskless workstations, and it would
be an immense boon to harried LAN administrators as well as
system vendors. We have not reached this ideal; in fact, we
are not even close.

At many points in this document, you will find a requirement
that a parameter be a configurable option. There are several
different reasons behind such requirements. In a few cases,
there is current uncertainty or disagreement about the best
value, and it may be necessary to update the recommended value
in the future. In other cases, the value really depends on
external factors -- e.g., the size of the host and the
distribution of its communication load, or the speeds and
topology of nearby networks -- and self-tuning algorithms are
unavailable and may be insufficient. In some cases,
configurability is needed because of administrative
requirements.

Finally, some configuration options are required to communicate
with obsolete or incorrect implementations of the protocols,
distributed without sources, that unfortunately persist in many
parts of the Internet. To make correct systems coexist with
these faulty systems, administrators often have to
"mis-configure" the correct systems. This problem will correct
itself gradually as the faulty systems are retired, but it
cannot be ignored by vendors.

When we say that a parameter must be configurable, we do not
intend to require that its value be explicitly read from a
configuration file at every boot time. We recommend that
implementors set up a default for each parameter, so a
configuration file is only necessary to override those defaults
that are inappropriate in a particular installation. Thus, the
configurability requirement is an assurance that it will be
POSSIBLE to override the default when necessary, even in a
binary-only or ROM-based product.
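One way to meet this requirement is sketched below under stated assumptions. The parameter names and the JSON override format are hypothetical, and the 60-second timeout is only illustrative. The trailer and encapsulation defaults follow Sections 2.3.1 and 2.3.3. The built-in defaults apply unless an optional override is supplied, so a configuration file is never mandatory.

```python
import json
from typing import Any, Dict, Optional

# Hypothetical built-in defaults. The trailer and encapsulation values
# follow Sections 2.3.1 and 2.3.3; the timeout value is illustrative.
DEFAULTS: Dict[str, Any] = {
    "arp_cache_timeout_s": 60,
    "send_trailers": False,
    "ethernet_encapsulation": "RFC-894",
}


def load_config(override_text: Optional[str] = None) -> Dict[str, Any]:
    """Return the defaults, overridden by any values in override_text.

    override_text stands in for an optional configuration file (JSON is
    an assumption); with no file, the built-in defaults are used as-is.
    """
    config = dict(DEFAULTS)
    if override_text:
        overrides = json.loads(override_text)
        unknown = set(overrides) - set(DEFAULTS)
        if unknown:
            raise ValueError(f"unknown parameters: {sorted(unknown)}")
        config.update(overrides)
    return config
```

Rejecting unknown parameter names is a design choice of this sketch. It catches misspelled overrides that would otherwise silently leave a default in place.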

This document requires a particular value for such defaults in
some cases. The choice of default is a sensitive issue when
the configuration item controls the accommodation to existing
faulty systems. If the Internet is to converge successfully to
complete interoperability, the default values built into
implementations must implement the official protocol, not
"mis-configurations" to accommodate faulty implementations.
Although marketing considerations have led some vendors to
choose mis-configuration defaults, we urge vendors to choose
defaults that will conform to the standard.

Finally, we note that a vendor needs to provide adequate
documentation on all configuration parameters, their limits and
effects.

1.3 Reading this Document

1.3.1 Organization

Protocol layering, which is generally used as an organizing
principle in implementing network software, has also been used
to organize this document. In describing the rules, we assume
that an implementation does strictly mirror the layering of the
protocols. Thus, the following three major sections specify
the requirements for the link layer, the internet layer, and
the transport layer, respectively. A companion RFC [INTRO:1]
covers application level software. This layerist organization
was chosen for simplicity and clarity.

However, strict layering is an imperfect model, both for the
protocol suite and for recommended implementation approaches.
Protocols in different layers interact in complex and sometimes
subtle ways, and particular functions often involve multiple
layers. There are many design choices in an implementation,
many of which involve creative "breaking" of strict layering.
Every implementor is urged to read references [INTRO:7] and
[INTRO:8].

This document describes the conceptual service interface
between layers using a functional ("procedure call") notation,
like that used in the TCP specification [TCP:1]. A host
implementation must support the logical information flow
implied by these calls, but need not literally implement the
calls themselves. For example, many implementations reflect
the coupling between the transport layer and the IP layer by
giving them shared access to common data structures. These
data structures, rather than explicit procedure calls, are then
the agency for passing much of the information that is
required.

In general, each major section of this document is organized
into the following subsections:

(1) Introduction

(2) Protocol Walk-Through -- considers the protocol
    specification documents section-by-section, correcting
    errors, stating requirements that may be ambiguous or
    ill-defined, and providing further clarification or
    explanation.

(3) Specific Issues -- discusses protocol design and
    implementation issues that were not included in the
    walk-through.

(4) Interfaces -- discusses the service interface to the next
    higher layer.

(5) Summary -- contains a summary of the requirements of the
    section.

Under many of the individual topics in this document, there is
parenthetical material labeled "DISCUSSION" or
"IMPLEMENTATION". This material is intended to give
clarification and explanation of the preceding requirements
text. It also includes some suggestions on possible future
directions or developments. The implementation material
contains suggested approaches that an implementor may want to
consider.

The summary sections are intended to be guides and indexes to
the text, but are necessarily cryptic and incomplete. The
summaries should never be used or referenced separately from
the complete RFC.

1.3.2 Requirements

In this document, the words that are used to define the
significance of each particular requirement are capitalized.
These words are:

* "MUST"

    This word or the adjective "REQUIRED" means that the item
    is an absolute requirement of the specification.

* "SHOULD"

    This word or the adjective "RECOMMENDED" means that there
    may exist valid reasons in particular circumstances to
    ignore this item, but the full implications should be
    understood and the case carefully weighed before choosing
    a different course.

* "MAY"

    This word or the adjective "OPTIONAL" means that this item
    is truly optional. One vendor may choose to include the
    item because a particular marketplace requires it or
    because it enhances the product, for example; another
    vendor may omit the same item.

An implementation is not compliant if it fails to satisfy one
or more of the MUST requirements for the protocols it
implements. An implementation that satisfies all the MUST and
all the SHOULD requirements for its protocols is said to be
"unconditionally compliant"; one that satisfies all the MUST
requirements but not all the SHOULD requirements for its
protocols is said to be "conditionally compliant".
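The compliance definitions reduce to a small decision rule, sketched below. Each requirement is recorded as True when it is satisfied. This sketch assumes that MUST NOT and SHOULD NOT items count as MUST and SHOULD items, satisfied when the forbidden behavior is absent; the text above does not state this mapping. The names are hypothetical.

```python
from enum import Enum
from typing import Sequence


class Compliance(Enum):
    NOT_COMPLIANT = "not compliant"
    CONDITIONAL = "conditionally compliant"
    UNCONDITIONAL = "unconditionally compliant"


def classify(must_met: Sequence[bool], should_met: Sequence[bool]) -> Compliance:
    """Apply the definitions above to per-requirement pass/fail results.

    MAY items are deliberately not inputs: omitting them never affects
    compliance.
    """
    if not all(must_met):
        return Compliance.NOT_COMPLIANT
    if all(should_met):
        return Compliance.UNCONDITIONAL
    return Compliance.CONDITIONAL
```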

1.3.3 Terminology

This document uses the following technical terms:

Segment
    A segment is the unit of end-to-end transmission in the
    TCP protocol. A segment consists of a TCP header followed
    by application data. A segment is transmitted by
    encapsulation inside an IP datagram.

Message
    In this description of the lower-layer protocols, a
    message is the unit of transmission in a transport layer
    protocol. In particular, a TCP segment is a message. A
    message consists of a transport protocol header followed
    by application protocol data. To be transmitted
    end-to-end through the Internet, a message must be
    encapsulated inside a datagram.

IP Datagram
    An IP datagram is the unit of end-to-end transmission in
    the IP protocol. An IP datagram consists of an IP header
    followed by transport layer data, i.e., of an IP header
    followed by a message.

    In the description of the internet layer (Section 3), the
    unqualified term "datagram" should be understood to refer
    to an IP datagram.

Packet
    A packet is the unit of data passed across the interface
    between the internet layer and the link layer. It
    includes an IP header and data. A packet may be a
    complete IP datagram or a fragment of an IP datagram.

Frame
    A frame is the unit of transmission in a link layer
    protocol, and consists of a link-layer header followed by
    a packet.

Connected Network
    A network to which a host is interfaced is often known as
    the "local network" or the "subnetwork" relative to that
    host. However, these terms can cause confusion, and
    therefore we use the term "connected network" in this
    document.

Multihomed
    A host is said to be multihomed if it has multiple IP
    addresses. For a discussion of multihoming, see Section
    3.3.4 below.

Physical network interface
    This is a physical interface to a connected network and
    has a (possibly unique) link-layer address. Multiple
    physical network interfaces on a single host may share the
    same link-layer address, but the address must be unique
    for different hosts on the same physical network.

Logical [network] interface
    We define a logical [network] interface to be a logical
    path, distinguished by a unique IP address, to a connected
    network. See Section 3.3.4.

Specific-destination address
    This is the effective destination address of a datagram,
    even if it is broadcast or multicast; see Section 3.2.1.3.

Path
    At a given moment, all the IP datagrams from a particular
    source host to a particular destination host will
    typically traverse the same sequence of gateways. We use
    the term "path" for this sequence. Note that a path is
    uni-directional; it is not unusual to have different paths
    in the two directions between a given host pair.

MTU
    The maximum transmission unit, i.e., the size of the
    largest packet that can be transmitted.

The terms frame, packet, datagram, message, and segment are
illustrated by the following schematic diagrams:

A. Transmission on connected network:
     _______________________________________________
    | LL hdr | IP hdr |         (data)              |
    |________|________|_____________________________|

    <---------- Frame ------------------------------>
             <----------Packet --------------------->

B. Before IP fragmentation or after IP reassembly:
     ______________________________________
    | IP hdr | transport| Application Data |
    |________|____hdr___|__________________|

    <-------- Datagram -------------------->
             <-------- Message ------------>

    or, for TCP:
     ______________________________________
    | IP hdr |  TCP hdr | Application Data |
    |________|__________|__________________|

    <-------- Datagram -------------------->
             <-------- Segment ------------>
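The nesting in these diagrams can be reproduced by concatenating headers. This sketch assumes zero-filled placeholder headers of the minimum sizes: 20-octet TCP and IP headers and a 14-octet Ethernet header using RFC-894 framing. Only the lengths and the Ether-Type value are meaningful, and the function name is hypothetical.

```python
import struct
from typing import Dict

ETHERTYPE_IP = 0x0800  # Ether-Type for IP in RFC-894 framing


def encapsulate(app_data: bytes) -> Dict[str, bytes]:
    """Nest placeholder headers to show what each term covers.

    Header contents are zero-filled placeholders; only the minimum
    header sizes (TCP 20, IP 20, Ethernet 14 octets) are meaningful.
    """
    segment = bytes(20) + app_data          # TCP hdr + data: segment (message)
    datagram = bytes(20) + segment          # IP hdr + message: datagram
    packet = datagram                       # unfragmented, so packet == datagram
    ll_hdr = bytes(6) + bytes(6) + struct.pack("!H", ETHERTYPE_IP)  # dst, src, type
    frame = ll_hdr + packet                 # LL hdr + packet: frame
    return {"segment": segment, "datagram": datagram,
            "packet": packet, "frame": frame}
```

If the datagram were fragmented, each packet would carry only part of `datagram`, behind its own IP header. That is why the terminology keeps "packet" and "datagram" distinct.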

1.4 Acknowledgments

This document incorporates contributions and comments from a large
group of Internet protocol experts, including representatives of
university and research labs, vendors, and government agencies.
It was assembled primarily by the Host Requirements Working Group
of the Internet Engineering Task Force (IETF).

The Editor would especially like to acknowledge the tireless
dedication of the following people, who attended many long
meetings and generated 3 million bytes of electronic mail over the
past 18 months in pursuit of this document: Philip Almquist, Dave
Borman (Cray Research), Noel Chiappa, Dave Crocker (DEC), Steve
Deering (Stanford), Mike Karels (Berkeley), Phil Karn (Bellcore),
John Lekashman (NASA), Charles Lynn (BBN), Keith McCloghrie (TWG),
Paul Mockapetris (ISI), Thomas Narten (Purdue), Craig Partridge
(BBN), Drew Perkins (CMU), and James Van Bokkelen (FTP Software).

In addition, the following people made major contributions to the
effort: Bill Barns (Mitre), Steve Bellovin (AT&T), Mike Brescia
(BBN), Ed Cain (DCA), Annette DeSchon (ISI), Martin Gross (DCA),
Phill Gross (NRI), Charles Hedrick (Rutgers), Van Jacobson (LBL),
John Klensin (MIT), Mark Lottor (SRI), Milo Medin (NASA), Bill
Melohn (Sun Microsystems), Greg Minshall (Kinetics), Jeff Mogul
(DEC), John Mullen (CMC), Jon Postel (ISI), John Romkey (Epilogue
Technology), and Mike StJohns (DCA). The following also made
significant contributions to particular areas: Eric Allman
(Berkeley), Rob Austein (MIT), Art Berggreen (ACC), Keith Bostic
(Berkeley), Vint Cerf (NRI), Wayne Hathaway (NASA), Matt Korn
(IBM), Erik Naggum (Naggum Software, Norway), Robert Ullmann
(Prime Computer), David Waitzman (BBN), Frank Wancho (USA), Arun
Welch (Ohio State), Bill Westfield (Cisco), and Rayan Zachariassen
(Toronto).

We are grateful to all, including any contributors who may have
been inadvertently omitted from this list.

2. LINK LAYER

2.1 INTRODUCTION

All Internet systems, both hosts and gateways, have the same
requirements for link layer protocols. These requirements are
given in Chapter 3 of "Requirements for Internet Gateways"
[INTRO:2], augmented with the material in this section.

2.2 PROTOCOL WALK-THROUGH

None.

2.3 SPECIFIC ISSUES

2.3.1 Trailer Protocol Negotiation

The trailer protocol [LINK:1] for link-layer encapsulation MAY
be used, but only when it has been verified that both systems
(host or gateway) involved in the link-layer communication
implement trailers. If the system does not dynamically
negotiate use of the trailer protocol on a per-destination
basis, the default configuration MUST disable the protocol.

DISCUSSION:
The trailer protocol is a link-layer encapsulation
technique that rearranges the data contents of packets
sent on the physical network. In some cases, trailers
improve the throughput of higher layer protocols by
reducing the amount of data copying within the operating
system. Higher layer protocols are unaware of trailer
use, but both the sending and receiving host MUST
understand the protocol if it is used.

Improper use of trailers can result in very confusing
symptoms. Only packets with specific size attributes are
encapsulated using trailers, and typically only a small
fraction of the packets being exchanged have these
attributes. Thus, if a system using trailers exchanges
packets with a system that does not, some packets
disappear into a black hole while others are delivered
successfully.

IMPLEMENTATION:
On an Ethernet, packets encapsulated with trailers use a
distinct Ethernet type [LINK:1], and trailer negotiation
is performed at the time that ARP is used to discover the
link-layer address of a destination system.
Specifically, the ARP exchange is completed in the usual
manner using the normal IP protocol type, but a host that
wants to speak trailers will send an additional "trailer
ARP reply" packet, i.e., an ARP reply that specifies the
trailer encapsulation protocol type but otherwise has the
format of a normal ARP reply. If a host configured to use
trailers receives a trailer ARP reply message from a
remote machine, it can add that machine to the list of
machines that understand trailers, e.g., by marking the
corresponding entry in the ARP cache.

Hosts wishing to receive trailer encapsulations send
trailer ARP replies whenever they complete exchanges of
normal ARP messages for IP. Thus, a host that received an
ARP request for its IP protocol address would send a
trailer ARP reply in addition to the normal IP ARP reply;
a host that sent the IP ARP request would send a trailer
ARP reply when it received the corresponding IP ARP reply.
In this way, either the requesting or responding host in
an IP ARP exchange may request that it receive trailer
encapsulations.

This scheme, using extra trailer ARP reply packets rather
than sending an ARP request for the trailer protocol type,
was designed to avoid a continuous exchange of ARP packets
with a misbehaving host that, contrary to any
specification or common sense, responded to an ARP reply
for trailers with another ARP reply for IP. This problem
is avoided by sending a trailer ARP reply in response to
an IP ARP reply only when the IP ARP reply answers an
outstanding request; this is true when the hardware
address for the host is still unknown when the IP ARP
reply is received. A trailer ARP reply may always be sent
along with an IP ARP reply responding to an IP ARP
request.
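The negotiation rules above can be sketched as a small state machine. This is a hypothetical illustration: the class, its method names, and the `sent` list that stands in for real ARP transmissions are all assumptions. It encodes three points from the text. Trailers default to off. A trailer ARP reply answers an IP ARP reply only when that reply satisfies an outstanding request. A peer is sent trailers only after it has sent a trailer ARP reply.

```python
from typing import List, Set, Tuple


class TrailerNegotiator:
    """Hypothetical sketch of the trailer ARP reply scheme described above.

    `sent` records (kind, ip) tuples instead of transmitting real ARP
    packets; kinds are "ip-request", "ip-reply" and "trailer-reply".
    """

    def __init__(self, want_trailers: bool = False):
        self.want_trailers = want_trailers      # default MUST be disabled
        self.outstanding: Set[str] = set()      # IP ARP requests awaiting replies
        self.trailer_peers: Set[str] = set()    # peers known to understand trailers
        self.sent: List[Tuple[str, str]] = []

    def send_ip_arp_request(self, ip: str) -> None:
        self.outstanding.add(ip)
        self.sent.append(("ip-request", ip))

    def on_ip_arp_request(self, ip: str) -> None:
        self.sent.append(("ip-reply", ip))
        if self.want_trailers:                  # may always accompany this reply
            self.sent.append(("trailer-reply", ip))

    def on_ip_arp_reply(self, ip: str) -> None:
        if ip not in self.outstanding:          # unsolicited: do not answer it,
            return                              # which prevents an endless ARP loop
        self.outstanding.discard(ip)
        if self.want_trailers:
            self.sent.append(("trailer-reply", ip))

    def on_trailer_arp_reply(self, ip: str) -> None:
        if self.want_trailers:                  # e.g., mark the ARP cache entry
            self.trailer_peers.add(ip)

    def use_trailers_to(self, ip: str) -> bool:
        return self.want_trailers and ip in self.trailer_peers
```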

2.3.2 Address Resolution Protocol -- ARP

2.3.2.1 ARP Cache Validation

An implementation of the Address Resolution Protocol (ARP)
[LINK:2] MUST provide a mechanism to flush out-of-date cache
entries. If this mechanism involves a timeout, it SHOULD be
possible to configure the timeout value.

A mechanism to prevent ARP flooding (repeatedly sending an
ARP Request for the same IP address, at a high rate) MUST be
included. The recommended maximum rate is 1 per second per
destination.
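A per-destination rate limit of this kind might look like the following sketch. The class and method names are hypothetical. The clock is injectable only so the behavior can be tested deterministically.

```python
import time
from typing import Callable, Dict


class ArpRequestLimiter:
    """Allow at most one ARP Request per destination per interval.

    The 1-second default is the recommended maximum rate above; the
    injectable clock is only there to make the sketch testable.
    """

    def __init__(self, interval_s: float = 1.0,
                 clock: Callable[[], float] = time.monotonic):
        self.interval_s = interval_s
        self.clock = clock
        self.last_sent: Dict[str, float] = {}

    def may_send(self, ip: str) -> bool:
        """Return True (and note the time) if a Request to ip may go now."""
        now = self.clock()
        last = self.last_sent.get(ip)
        if last is not None and now - last < self.interval_s:
            return False
        self.last_sent[ip] = now
        return True
```

A real implementation would also prune `last_sent`, for example when the corresponding ARP cache entry is created or flushed.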

DISCUSSION:
The ARP specification [LINK:2] suggests but does not
require a timeout mechanism to invalidate cache entries
when hosts change their Ethernet addresses. The
prevalence of proxy ARP (see Section 2.4 of [INTRO:2])
has significantly increased the likelihood that cache
entries in hosts will become invalid, and therefore
some ARP-cache invalidation mechanism is now required
for hosts. Even in the absence of proxy ARP, a
long-period cache timeout is useful in order to
automatically correct any bad ARP data that might have
been cached.

IMPLEMENTATION:
Four mechanisms have been used, sometimes in
combination, to flush out-of-date cache entries.

(1) Timeout -- Periodically time out cache entries,
    even if they are in use. Note that this timeout
    should be restarted when the cache entry is
    "refreshed" (by observing the source fields,
    regardless of target address, of an ARP broadcast
    from the system in question). For proxy ARP
    situations, the timeout needs to be on the order
    of a minute.

(2) Unicast Poll -- Actively poll the remote host by
    periodically sending a point-to-point ARP Request
    to it, and delete the entry if no ARP Reply is
    received from N successive polls. Again, the
    timeout should be on the order of a minute, and
    typically N is 2.

(3) Link-Layer Advice -- If the link-layer driver
    detects a delivery problem, flush the
    corresponding ARP cache entry.

(4) Higher-layer Advice -- Provide a call from the
    Internet layer to the link layer to indicate a
    delivery problem. The effect of this call would
    be to invalidate the corresponding cache entry.
    This call would be analogous to the
    "ADVISE_DELIVPROB()" call from the transport layer
    to the Internet layer (see Section 3.4), and in
    fact the ADVISE_DELIVPROB routine might in turn
    call the link-layer advice routine to invalidate
    the ARP cache entry.

Approaches (1) and (2) involve ARP cache timeouts on
the order of a minute or less. In the absence of proxy
ARP, a timeout this short could create noticeable
overhead traffic on a very large Ethernet. Therefore,
it may be necessary to configure a host to lengthen the
ARP cache timeout.
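The following sketch combines the four mechanisms in one hypothetical cache. The names, the explicit `now` arguments, and the 60-second default are assumptions, the last chosen to match "on the order of a minute". Raising `timeout_s` corresponds to the lengthened timeout suggested above for large Ethernets without proxy ARP.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class _Entry:
    hw_addr: str
    refreshed_at: float
    failed_polls: int = 0


class ArpCache:
    """Hypothetical cache combining mechanisms (1)-(4) above.

    Times are passed in explicitly (seconds) to keep the sketch testable.
    """

    def __init__(self, timeout_s: float = 60.0, max_failed_polls: int = 2):
        self.timeout_s = timeout_s                # configurable, per 2.3.2.1
        self.max_failed_polls = max_failed_polls  # "typically N is 2"
        self.entries: Dict[str, _Entry] = {}

    def refresh(self, ip: str, hw_addr: str, now: float) -> None:
        """ARP broadcast (any target) or poll reply seen from ip: restart timeout."""
        self.entries[ip] = _Entry(hw_addr, now)

    def lookup(self, ip: str, now: float) -> Optional[str]:
        entry = self.entries.get(ip)
        if entry is None:
            return None
        if now - entry.refreshed_at >= self.timeout_s:
            del self.entries[ip]                  # (1) time out, even if in use
            return None
        return entry.hw_addr

    def poll_unanswered(self, ip: str) -> None:
        """(2) A unicast ARP poll got no reply; drop after N successive misses."""
        entry = self.entries.get(ip)
        if entry is not None:
            entry.failed_polls += 1
            if entry.failed_polls >= self.max_failed_polls:
                del self.entries[ip]

    def advise_delivery_problem(self, ip: str) -> None:
        """(3) link-layer or (4) higher-layer advice: flush the entry."""
        self.entries.pop(ip, None)
```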

2.3.2.2 ARP Packet Queue

The link layer SHOULD save (rather than discard) at least
one (the latest) packet of each set of packets destined to
the same unresolved IP address, and transmit the saved
packet when the address has been resolved.
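Here is a minimal sketch of the recommended behavior. It keeps exactly one packet, the latest, per unresolved address. The class and method names are hypothetical.

```python
from typing import Dict, Optional


class PendingArpQueue:
    """Keep the latest packet for each unresolved IP address.

    This saves exactly one packet per destination, the minimum the text
    above asks for; a real driver might keep a few more.
    """

    def __init__(self) -> None:
        self._pending: Dict[str, bytes] = {}

    def hold(self, ip: str, packet: bytes) -> None:
        self._pending[ip] = packet              # a newer packet replaces an older one

    def resolved(self, ip: str) -> Optional[bytes]:
        """Address now known: return the saved packet to transmit, if any."""
        return self._pending.pop(ip, None)
```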

DISCUSSION:
Failure to follow this recommendation causes the first
packet of every exchange to be lost. Although higher-layer
protocols can generally cope with packet loss by
retransmission, packet loss does impact performance.
For example, loss of a TCP open request causes the
initial round-trip time estimate to be inflated.
UDP-based applications such as the Domain Name System are
more seriously affected.

2.3.3 Ethernet and IEEE 802 Encapsulation

The IP encapsulation for Ethernets is described in RFC-894
[LINK:3], while RFC-1042 [LINK:4] describes the IP
encapsulation for IEEE 802 networks. RFC-1042 elaborates and
replaces the discussion in Section 3.4 of [INTRO:2].

Every Internet host connected to a 10Mbps Ethernet cable:

o MUST be able to send and receive packets using RFC-894
  encapsulation;

o SHOULD be able to receive RFC-1042 packets, intermixed
  with RFC-894 packets; and

o MAY be able to send packets using RFC-1042 encapsulation.

An Internet host that implements sending both the RFC-894 and
the RFC-1042 encapsulations MUST provide a configuration switch
to select which is sent, and this switch MUST default to
RFC-894.

Note that the standard IP encapsulation in RFC-1042 does not
use the protocol id value (K1=6) that IEEE reserved for IP;
instead, it uses a value (K1=170) that implies an extension
(the "SNAP") which can be used to hold the Ether-Type field.
An Internet system MUST NOT send 802 packets using K1=6.
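The two transmit formats differ only in the link-layer header. This sketch assumes 48-bit addresses given as raw bytes. The function name and the string values of the configuration switch are hypothetical. The RFC-1042 branch uses DSAP = SSAP = 170 followed by a SNAP header that carries the Ether-Type, and K1=6 is never generated.

```python
import struct

ETHERTYPE_IP = 0x0800
SNAP_SAP = 170        # K1=170 (0xAA): LLC SAP announcing a SNAP header
UI_CONTROL = 3        # 802.2 unnumbered-information control field
# K1=6, the SAP IEEE reserved for IP, is deliberately absent: MUST NOT send it.


def link_header(dst: bytes, src: bytes, payload_len: int,
                encapsulation: str = "RFC-894") -> bytes:
    """Build the link-layer header for an IP packet of payload_len octets.

    The default argument mirrors the required RFC-894 default for the
    configuration switch.
    """
    if len(dst) != 6 or len(src) != 6:
        raise ValueError("48-bit addresses expected")
    if encapsulation == "RFC-894":
        # Ethernet: destination, source, Ether-Type.
        return dst + src + struct.pack("!H", ETHERTYPE_IP)
    if encapsulation == "RFC-1042":
        # 802.2 LLC (DSAP, SSAP, control) + SNAP (org code 0, Ether-Type).
        llc_snap = struct.pack("!BBB3sH", SNAP_SAP, SNAP_SAP, UI_CONTROL,
                               b"\x00\x00\x00", ETHERTYPE_IP)
        length = len(llc_snap) + payload_len   # 802.3 Length field
        return dst + src + struct.pack("!H", length) + llc_snap
    raise ValueError(f"unknown encapsulation {encapsulation!r}")
```

The 8 octets of LLC/SNAP header account for the difference between the 1500-octet Ethernet MTU and the 1492-octet 802.3 MTU stated in this section.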

Address translation from Internet addresses to link-layer
addresses on Ethernet and IEEE 802 networks MUST be managed by
the Address Resolution Protocol (ARP).

The MTU for an Ethernet is 1500 and for 802.3 is 1492.

DISCUSSION:
The IEEE 802.3 specification provides for operation over a
10Mbps Ethernet cable, in which case Ethernet and IEEE
802.3 frames can be physically intermixed. A receiver can
distinguish Ethernet and 802.3 frames by the value of the
802.3 Length field; this two-octet field coincides in the
header with the Ether-Type field of an Ethernet frame. In
particular, the 802.3 Length field must be less than or
equal to 1500, while all valid Ether-Type values are
greater than 1500.
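On receive, the Length/Ether-Type rule can be applied directly. The sketch below uses a hypothetical function name. It follows the threshold stated above and recognizes the RFC-1042 LLC/SNAP header inside 802.3 frames. Any other frame is reported as other 802.3 traffic.

```python
import struct
from typing import Optional, Tuple


def classify_frame(frame: bytes) -> Tuple[str, Optional[int]]:
    """Return (framing, ether_type) for a received frame.

    Uses the rule in the DISCUSSION above: the two octets after the
    addresses are an 802.3 Length if <= 1500, else an Ether-Type.
    """
    (type_or_len,) = struct.unpack("!H", frame[12:14])
    if type_or_len > 1500:
        return ("RFC-894", type_or_len)
    # 802.3 frame: look for the LLC/SNAP header used by RFC-1042.
    if frame[14:17] == b"\xaa\xaa\x03" and frame[17:20] == b"\x00\x00\x00":
        (ether_type,) = struct.unpack("!H", frame[20:22])
        return ("RFC-1042", ether_type)
    return ("802.3-other", None)
```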

Another compatibility problem arises with link-layer
broadcasts. A broadcast sent with one framing will not be
seen by hosts that can receive only the other framing.

The provisions of this section were designed to provide
direct interoperation between 894-capable and 1042-capable
systems on the same cable, to the maximum extent possible.
It is intended to support the present situation where
894-only systems predominate, while providing an easy
transition to a possible future in which 1042-capable
systems become common.

Note that 894-only systems cannot interoperate directly
with 1042-only systems. If the two system types are set
up as two different logical networks on the same cable,
they can communicate only through an IP gateway.
Furthermore, it is not useful or even possible for a
dual-format host to discover automatically which format to
send, because of the problem of link-layer broadcasts.

2.4 LINK/INTERNET LAYER INTERFACE

The packet receive interface between the IP layer and the link
layer MUST include a flag to indicate whether the incoming packet
was addressed to a link-layer broadcast address.

DISCUSSION:
Although the IP layer does not generally know link layer
addresses (since every different network medium typically has
a different address format), the broadcast address on a
broadcast-capable medium is an important special case. See
Section 3.2.2, especially the DISCUSSION concerning broadcast
storms.

The packet send interface between the IP and link layers MUST
include the 5-bit TOS field (see Section 3.2.1.6).

The link layer MUST NOT report a Destination Unreachable error to
IP solely because there is no ARP cache entry for a destination.
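The three interface rules of this section might be expressed as follows. This is a hypothetical sketch, and the type and function names do not come from any real stack. The receive record carries the broadcast flag, and the send record carries the 5-bit TOS value. When there is no ARP cache entry, the packet is held for resolution (compare Section 2.3.2.2) rather than reported as Destination Unreachable.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class ReceivedPacket:
    """Link layer -> IP: the broadcast flag MUST accompany each packet."""
    data: bytes
    link_broadcast: bool


@dataclass
class SendRequest:
    """IP -> link layer: the 5-bit TOS field MUST be passed down."""
    data: bytes
    next_hop: str
    tos: int

    def __post_init__(self) -> None:
        if not 0 <= self.tos < 32:
            raise ValueError("TOS is a 5-bit field")


def link_send(req: SendRequest, arp_cache: Dict[str, str],
              pending: Dict[str, SendRequest],
              transmit: Callable[[str, bytes, int], None]) -> str:
    """Send if the next hop is resolved; otherwise queue, never 'unreachable'."""
    hw_addr = arp_cache.get(req.next_hop)
    if hw_addr is None:
        pending[req.next_hop] = req   # hold latest packet while ARP resolves it
        return "queued"
    transmit(hw_addr, req.data, req.tos)
    return "sent"
```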

2.5 LINK LAYER REQUIREMENTS SUMMARY

                                                  |       | | | |S| |
                                                  |       | | | |H| |F
                                                  |       | | | |O|M|o
                                                  |       | |S| |U|U|o
                                                  |       | |H| |L|S|t
                                                  |       |M|O| |D|T|n
                                                  |       |U|U|M| | |o
                                                  |       |S|L|A|N|N|t
                                                  |       |T|D|Y|O|O|t
FEATURE                                           |SECTION| | | |T|T|e
--------------------------------------------------|-------|-|-|-|-|-|--
                                                  |       | | | | | |
Trailer encapsulation                             |2.3.1  | | |x| | |
Send Trailers by default without negotiation      |2.3.1  | | | | |x|
ARP                                               |2.3.2  | | | | | |
  Flush out-of-date ARP cache entries             |2.3.2.1|x| | | | |
  Prevent ARP floods                              |2.3.2.1|x| | | | |
  Cache timeout configurable                      |2.3.2.1| |x| | | |
  Save at least one (latest) unresolved pkt       |2.3.2.2| |x| | | |
Ethernet and IEEE 802 Encapsulation               |2.3.3  | | | | | |
  Host able to:                                   |2.3.3  | | | | | |
    Send & receive RFC-894 encapsulation          |2.3.3  |x| | | | |
    Receive RFC-1042 encapsulation                |2.3.3  | |x| | | |
    Send RFC-1042 encapsulation                   |2.3.3  | | |x| | |
      Then config. sw. to select, RFC-894 dflt    |2.3.3  |x| | | | |
  Send K1=6 encapsulation                         |2.3.3  | | | | |x|
  Use ARP on Ethernet and IEEE 802 nets           |2.3.3  |x| | | | |
Link layer report b'casts to IP layer             |2.4    |x| | | | |
IP layer pass TOS to link layer                   |2.4    |x| | | | |
No ARP cache entry treated as Dest. Unreach.      |2.4    | | | | |x|
Internet Engineering Task Force [Page 26]
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
RFC1122 INTERNET LAYER October 1989
|
|||
|
|
|||
|
|
|||
|
3. INTERNET LAYER PROTOCOLS
|
|||
|
|
|||
|
3.1 INTRODUCTION
|
|||
|
|
|||
|
The Robustness Principle: "Be liberal in what you accept, and
|
|||
|
conservative in what you send" is particularly important in the
|
|||
|
Internet layer, where one misbehaving host can deny Internet
|
|||
|
service to many other hosts.
|
|||
|
|
|||
|
The protocol standards used in the Internet layer are:
|
|||
|
|
|||
|
o RFC-791 [IP:1] defines the IP protocol and gives an
|
|||
|
introduction to the architecture of the Internet.
|
|||
|
|
|||
|
o RFC-792 [IP:2] defines ICMP, which provides routing,
|
|||
|
diagnostic and error functionality for IP. Although ICMP
|
|||
|
messages are encapsulated within IP datagrams, ICMP
|
|||
|
processing is considered to be (and is typically implemented
|
|||
|
as) part of the IP layer. See Section 3.2.2.
|
|||
|
|
|||
|
o RFC-950 [IP:3] defines the mandatory subnet extension to the
|
|||
|
addressing architecture.
|
|||
|
|
|||
|
o RFC-1112 [IP:4] defines the Internet Group Management
|
|||
|
Protocol IGMP, as part of a recommended extension to hosts
|
|||
|
and to the host-gateway interface to support Internet-wide
|
|||
|
multicasting at the IP level. See Section 3.2.3.
|
|||
|
|
|||
|
The target of an IP multicast may be an arbitrary group of
|
|||
|
Internet hosts. IP multicasting is designed as a natural
|
|||
|
extension of the link-layer multicasting facilities of some
|
|||
|
networks, and it provides a standard means for local access
|
|||
|
to such link-layer multicasting facilities.
|
|||
|
|
|||
|
Other important references are listed in Section 5 of this
|
|||
|
document.
|
|||
|
|
|||
|
The Internet layer of host software MUST implement both IP and
|
|||
|
ICMP. See Section 3.3.7 for the requirements on support of IGMP.
|
|||
|
|
|||
|
The host IP layer has two basic functions: (1) choose the "next
|
|||
|
hop" gateway or host for outgoing IP datagrams and (2) reassemble
|
|||
|
incoming IP datagrams. The IP layer may also (3) implement
|
|||
|
intentional fragmentation of outgoing datagrams. Finally, the IP
|
|||
|
layer must (4) provide diagnostic and error functionality. We
|
|||
|
expect that IP layer functions may increase somewhat in the
|
|||
|
future, as further Internet control and management facilities are
|
|||
|
developed.
|
|||
|
|
|||
|
|
|||
|
|
|||
|
Internet Engineering Task Force [Page 27]
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
RFC1122 INTERNET LAYER October 1989
|
|||
|
|
|||
|
|
|||
|
For normal datagrams, the processing is straightforward. For
|
|||
|
incoming datagrams, the IP layer:
|
|||
|
|
|||
|
(1) verifies that the datagram is correctly formatted;
|
|||
|
|
|||
|
(2) verifies that it is destined to the local host;
|
|||
|
|
|||
|
(3) processes options;
|
|||
|
|
|||
|
(4) reassembles the datagram if necessary; and
|
|||
|
|
|||
|
(5) passes the encapsulated message to the appropriate
|
|||
|
transport-layer protocol module.
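
      The five incoming steps can be sketched in Python as follows.  This
      is a minimal illustration under stated assumptions, not a reference
      implementation: addresses are raw 4-octet values, only unicast
      destinations are recognized, options are passed up unparsed,
      fragments are rejected rather than reassembled, and the header
      checksum is not verified.  The function name and the handlers
      mapping (protocol number to callable) are hypothetical.

```python
import struct


def process_incoming(datagram: bytes, local_addrs: set, handlers: dict):
    """Apply incoming steps (1)-(5) to one IPv4 datagram.

    Returns the transport handler's result, or None if the datagram is
    discarded (or is a fragment, which this sketch does not reassemble).
    """
    # (1) Correct format: version 4, sane header and total lengths.
    if len(datagram) < 20:
        return None
    version, ihl = datagram[0] >> 4, datagram[0] & 0x0F
    total_length = struct.unpack("!H", datagram[2:4])[0]
    header_len = ihl * 4
    if version != 4 or ihl < 5 or not header_len <= total_length <= len(datagram):
        return None
    # (2) Destined to the local host (unicast only in this sketch).
    if datagram[16:20] not in local_addrs:
        return None
    # (3) Options: handed up unparsed.
    options = datagram[20:header_len]
    # (4) Reassembly: only unfragmented datagrams are accepted here.
    flags_and_offset = struct.unpack("!H", datagram[6:8])[0]
    if flags_and_offset & 0x2000 or flags_and_offset & 0x1FFF:
        return None  # MF set or non-zero offset: would go to a reassembly queue
    # (5) Demultiplex on the Protocol field.
    handler = handlers.get(datagram[9])
    if handler is None:
        return None
    return handler(datagram[header_len:total_length], options)
```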

      For outgoing datagrams, the IP layer:

      (1)  sets any fields not set by the transport layer;

      (2)  selects the correct first hop on the connected network (a
           process called "routing");

      (3)  fragments the datagram if necessary and if intentional
           fragmentation is implemented (see Section 3.3.3); and

      (4)  passes the packet(s) to the appropriate link-layer driver.
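
      Step (2) in its simplest form can be sketched with Python's standard
      ipaddress module, assuming a single default gateway and no route
      cache or Redirect processing (Section 3.3.1 gives the actual
      first-hop rules).  The function and parameter names are
      hypothetical.

```python
import ipaddress


def select_first_hop(dst: str, interfaces: list, default_gateway: str) -> str:
    """Return the address the link layer should deliver the datagram to.

    `interfaces` is a list of (local_address, prefix_length) pairs, e.g.
    [("10.0.0.5", 24)].  A destination on a directly connected (sub)net is
    its own first hop; anything else goes to the default gateway.
    """
    dst_addr = ipaddress.IPv4Address(dst)
    for local, prefix in interfaces:
        subnet = ipaddress.IPv4Network(f"{local}/{prefix}", strict=False)
        if dst_addr in subnet:
            return dst  # deliver directly on the connected network
    return default_gateway
```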

      A host is said to be multihomed if it has multiple IP addresses.
      Multihoming introduces considerable confusion and complexity into
      the protocol suite, and it is an area in which the Internet
      architecture falls seriously short of solving all problems.  There
      are two distinct problem areas in multihoming:

      (1)  Local multihoming -- the host itself is multihomed; or

      (2)  Remote multihoming -- the local host needs to communicate
           with a remote multihomed host.

      At present, remote multihoming MUST be handled at the application
      layer, as discussed in the companion RFC [INTRO:1].  A host MAY
      support local multihoming, which is discussed in this document,
      and in particular in Section 3.3.4.

      Any host that forwards datagrams generated by another host is
      acting as a gateway and MUST also meet the specifications laid out
      in the gateway requirements RFC [INTRO:2].  An Internet host that
      includes embedded gateway code MUST have a configuration switch to
      disable the gateway function, and this switch MUST default to the
      non-gateway mode.  In this mode, a datagram arriving through one
      interface will not be forwarded to another host or gateway (unless
      it is source-routed), regardless of whether the host is
      single-homed or multihomed.  The host software MUST NOT
      automatically move into gateway mode if the host has more than one
      interface, as the operator of the machine may neither want to
      provide that service nor be competent to do so.
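
      A minimal sketch of the forwarding decision, assuming a hypothetical
      HostConfig record.  The interface count is deliberately not an input,
      which is one way to guarantee that adding an interface never enables
      gateway mode.  Whether a source-routed datagram is actually forwarded
      is further governed by Section 3.3.5, which this sketch does not
      model.

```python
from dataclasses import dataclass


@dataclass
class HostConfig:
    # The switch for embedded gateway code; it MUST default to off.
    gateway_enabled: bool = False


def may_forward(config: HostConfig, source_routed: bool) -> bool:
    """May a datagram that arrived for another host be forwarded?

    The number of interfaces is not consulted: more than one interface
    MUST NOT put the host into gateway mode by itself.
    """
    if config.gateway_enabled:
        return True
    return source_routed  # non-gateway mode: only source-routed datagrams
```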

      In the following, the action specified in certain cases is to
      "silently discard" a received datagram.  This means that the
      datagram will be discarded without further processing and that the
      host will not send any ICMP error message (see Section 3.2.2) as a
      result.  However, for diagnosis of problems a host SHOULD provide
      the capability of logging the error (see Section 1.2.3), including
      the contents of the silently-discarded datagram, and SHOULD record
      the event in a statistics counter.

      DISCUSSION:
           Silent discard of erroneous datagrams is generally intended
           to prevent "broadcast storms".
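
      One way to combine silent discard with the recommended logging and
      statistics counter is sketched here with Python's standard logging
      module and collections.Counter.  The logger name, the reason
      strings, and the function name are assumptions made for the
      illustration.

```python
import logging
from collections import Counter

log = logging.getLogger("ip.discard")
discard_counters = Counter()


def silently_discard(datagram: bytes, reason: str) -> None:
    """Drop a datagram without any ICMP error, but keep diagnostics.

    The reason is always counted; the datagram contents are logged at
    DEBUG level, so logging can be enabled when diagnosing a problem.
    """
    discard_counters[reason] += 1
    log.debug("silently discarded (%s): %s", reason, datagram.hex())
    # Deliberately no call into ICMP error generation here.
```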

   3.2  PROTOCOL WALK-THROUGH

      3.2.1 Internet Protocol -- IP

         3.2.1.1  Version Number: RFC-791 Section 3.1

            A datagram whose version number is not 4 MUST be silently
            discarded.

         3.2.1.2  Checksum: RFC-791 Section 3.1

            A host MUST verify the IP header checksum on every received
            datagram and silently discard every datagram that has a bad
            checksum.
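
            A sketch of both checks, assuming the datagram is available as
            a Python bytes object.  The checksum is the usual 16-bit
            one's-complement Internet checksum over the header; recomputing
            it over a header that already carries a correct checksum yields
            zero.  The function names are hypothetical.

```python
def internet_checksum(data: bytes) -> int:
    """One's-complement of the one's-complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd lengths (never needed for IP headers)
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def header_acceptable(datagram: bytes) -> bool:
    """Version check (3.2.1.1) and checksum check (3.2.1.2).

    False means the datagram must be silently discarded.
    """
    if len(datagram) < 20 or datagram[0] >> 4 != 4:
        return False
    header_len = (datagram[0] & 0x0F) * 4
    if header_len < 20 or header_len > len(datagram):
        return False
    return internet_checksum(datagram[:header_len]) == 0
```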

         3.2.1.3  Addressing: RFC-791 Section 3.2

            There are now five classes of IP addresses: Class A through
            Class E.  Class D addresses are used for IP multicasting
            [IP:4], while Class E addresses are reserved for
            experimental use.

            A multicast (Class D) address is a 28-bit logical address
            that stands for a group of hosts, and may be either
            permanent or transient.  Permanent multicast addresses are
            allocated by the Internet Assigned Numbers Authority
            [INTRO:6], while transient addresses may be allocated
            dynamically to transient groups.  Group membership is
            determined dynamically using IGMP [IP:4].
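
            The class of an address follows from the leading bits of its
            first octet (0, 10, 110, 1110 and 1111 for Classes A through
            E), and a Class D address carries its 28-bit group address in
            the remaining bits.  A minimal Python sketch, assuming
            well-formed dotted-decimal input; the function names are
            hypothetical.

```python
import ipaddress


def address_class(addr: str) -> str:
    """Classful category of a dotted-decimal IPv4 address."""
    first_octet = int(addr.split(".")[0])
    if first_octet < 128:
        return "A"  # 0xxxxxxx
    if first_octet < 192:
        return "B"  # 10xxxxxx
    if first_octet < 224:
        return "C"  # 110xxxxx
    if first_octet < 240:
        return "D"  # 1110xxxx: multicast
    return "E"      # 1111xxxx: experimental


def multicast_group_bits(addr: str) -> int:
    """The 28-bit logical group address carried by a Class D address."""
    value = int(ipaddress.IPv4Address(addr))
    if value >> 28 != 0b1110:
        raise ValueError(f"{addr} is not a Class D address")
    return value & 0x0FFFFFFF
```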

            We now summarize the important special cases for Class A, B,
            and C IP addresses, using the following notation for an IP
            address:

                { <Network-number>, <Host-number> }

            or

                { <Network-number>, <Subnet-number>, <Host-number> }

            and the notation "-1" for a field that contains all 1 bits.
            This notation is not intended to imply that the 1-bits in an
            address mask need be contiguous.

            (a)  { 0, 0 }

                 This host on this network.  MUST NOT be sent, except as
                 a source address as part of an initialization procedure
                 by which the host learns its own IP address.

                 See also Section 3.3.6 for a non-standard use of {0,0}.

            (b)  { 0, <Host-number> }

                 Specified host on this network.  It MUST NOT be sent,
                 except as a source address as part of an initialization
                 procedure by which the host learns its full IP address.

            (c)  { -1, -1 }

                 Limited broadcast.  It MUST NOT be used as a source
                 address.

                 A datagram with this destination address will be
                 received by every host on the connected physical
                 network but will not be forwarded outside that network.

            (d)  { <Network-number>, -1 }

                 Directed broadcast to the specified network.  It MUST
                 NOT be used as a source address.

            (e)  { <Network-number>, <Subnet-number>, -1 }

                 Directed broadcast to the specified subnet.  It MUST
                 NOT be used as a source address.

            (f)  { <Network-number>, -1, -1 }

                 Directed broadcast to all subnets of the specified
                 subnetted network.  It MUST NOT be used as a source
                 address.

            (g)  { 127, <any> }

                 Internal host loopback address.  Addresses of this form
                 MUST NOT appear outside a host.

            The <Network-number> is administratively assigned so that
            its value will be unique in the entire world.

            IP addresses are not permitted to have the value 0 or -1 for
            any of the <Host-number>, <Network-number>, or
            <Subnet-number> fields (except in the special cases listed
            above).  This implies that each of these fields will be at
            least two bits long.

            For further discussion of broadcast addresses, see Section
            3.3.6.

            A host MUST support the subnet extensions to IP [IP:3].  As
            a result, there will be an address mask of the form:
            {-1, -1, 0} associated with each of the host's local IP
            addresses; see Sections 3.2.2.9 and 3.3.1.1.

            When a host sends any datagram, the IP source address MUST
            be one of its own IP addresses (but not a broadcast or
            multicast address).

            A host MUST silently discard an incoming datagram that is
            not destined for the host.  An incoming datagram is destined
            for the host if the datagram's destination address field is:

            (1)  (one of) the host's IP address(es); or

            (2)  an IP broadcast address valid for the connected
                 network; or

            (3)  the address for a multicast group of which the host is
                 a member on the incoming physical interface.

            For most purposes, a datagram addressed to a broadcast or
            multicast destination is processed as if it had been
            addressed to one of the host's IP addresses; we use the term
            "specific-destination address" for the equivalent local IP
            address of the host.  The specific-destination address is
            defined to be the destination address in the IP header
            unless the header contains a broadcast or multicast address,
            in which case the specific-destination is an IP address
            assigned to the physical interface on which the datagram
            arrived.
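
            The destination test and the specific-destination rule can be
            sketched together with Python's ipaddress module.  This assumes
            a hypothetical Interface record that already knows its address,
            its subnet, its classful network, and the multicast groups
            joined on it; how the host learns those is out of scope, and
            the older broadcast forms discussed in Section 3.3.6 are not
            recognized.

```python
import ipaddress
from dataclasses import dataclass, field

IPv4 = ipaddress.IPv4Address


@dataclass
class Interface:
    address: ipaddress.IPv4Address
    subnet: ipaddress.IPv4Network        # address & mask, e.g. 10.1.2.0/24
    classful_net: ipaddress.IPv4Network  # e.g. 10.0.0.0/8 for a Class A address
    groups: set = field(default_factory=set)  # multicast groups joined here


def broadcast_forms(ifc: Interface) -> set:
    """Limited (c), directed-to-net (d) and directed-to-subnet (e)
    broadcasts; the all-subnets form (f) has the same bits as (d)."""
    return {
        IPv4("255.255.255.255"),
        ifc.classful_net.broadcast_address,
        ifc.subnet.broadcast_address,
    }


def specific_destination(dst, arrival: Interface, all_ifcs: list):
    """Return the specific-destination address, or None when the datagram
    is not destined for this host and must be silently discarded."""
    if any(dst == ifc.address for ifc in all_ifcs):
        return dst  # (1) one of the host's own addresses
    if dst in broadcast_forms(arrival) or dst in arrival.groups:
        return arrival.address  # (2)/(3): address of the arrival interface
    return None
```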

            A host MUST silently discard an incoming datagram containing
            an IP source address that is invalid by the rules of this
            section.  This validation could be done either in the IP
            layer or by each protocol in the transport layer.

            DISCUSSION:
                 A mis-addressed datagram might be caused by a link-layer
                 broadcast of a unicast datagram or by a gateway or host
                 that is confused or mis-configured.

                 An architectural goal for Internet hosts was to allow
                 IP addresses to be featureless 32-bit numbers, avoiding
                 algorithms that required a knowledge of the IP address
                 format.  Otherwise, any future change in the format or
                 interpretation of IP addresses will require host
                 software changes.  However, validation of broadcast and
                 multicast addresses violates this goal; a few other
                 violations are described elsewhere in this document.

                 Implementers should be aware that applications
                 depending upon the all-subnets directed broadcast
                 address (f) may be unusable on some networks.
                 All-subnets broadcast is not widely implemented in
                 vendor gateways at present, and even when it is
                 implemented, a particular network administration may
                 disable it in the gateway configuration.
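
            A partial sketch of incoming source-address validation,
            assuming the host can list the directed-broadcast addresses it
            recognizes (for example, those of its connected subnets);
            broadcasts for networks whose masks it does not know cannot be
            detected this way.  {0,0} and {0,<Host-number>} are accepted
            because they are legitimate sources during initialization.  The
            function name is hypothetical.

```python
import ipaddress


def source_is_valid(src: str, known_broadcasts: set) -> bool:
    """Return False for an incoming source address ruled out above.

    `known_broadcasts` holds dotted-decimal directed-broadcast addresses
    this host can recognize.
    """
    addr = ipaddress.IPv4Address(src)
    if addr == ipaddress.IPv4Address("255.255.255.255"):
        return False  # (c) limited broadcast
    if str(addr) in known_broadcasts:
        return False  # (d)-(f) directed broadcasts that we can recognize
    if addr.is_loopback:
        return False  # (g) 127.x must never appear outside a host
    if addr.is_multicast:
        return False  # a multicast address is never a valid source
    # {0, 0} and {0, <Host-number>} are allowed as sources during
    # initialization, so they are deliberately accepted here.
    return True
```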

         3.2.1.4  Fragmentation and Reassembly: RFC-791 Section 3.2

            The Internet model requires that every host support
            reassembly.  See Sections 3.3.2 and 3.3.3 for the
            requirements on fragmentation and reassembly.

         3.2.1.5  Identification: RFC-791 Section 3.2

            When sending an identical copy of an earlier datagram, a
            host MAY optionally retain the same Identification field in
            the copy.

            DISCUSSION:
                 Some Internet protocol experts have maintained that
                 when a host sends an identical copy of an earlier
                 datagram, the new copy should contain the same
                 Identification value as the original.  There are two
                 suggested advantages: (1) if the datagrams are
                 fragmented and some of the fragments are lost, the
                 receiver may be able to reconstruct a complete datagram
                 from fragments of the original and the copies; (2) a
                 congested gateway might use the IP Identification field
                 (and Fragment Offset) to discard duplicate datagrams
                 from the queue.

                 However, the observed patterns of datagram loss in the
                 Internet do not favor the probability of retransmitted
                 fragments filling reassembly gaps, while other
                 mechanisms (e.g., TCP repacketizing upon
                 retransmission) tend to prevent retransmission of an
                 identical datagram [IP:9].  Therefore, we believe that
                 retransmitting the same Identification field is not
                 useful.  Also, a connectionless transport protocol like
                 UDP would require the cooperation of the application
                 programs to retain the same Identification value in
                 identical datagrams.

         3.2.1.6  Type-of-Service: RFC-791 Section 3.2

            The "Type-of-Service" byte in the IP header is divided into
            two sections: the Precedence field (high-order 3 bits), and
            a field that is customarily called "Type-of-Service" or
            "TOS" (low-order 5 bits).  In this document, all references
            to "TOS" or the "TOS field" refer to the low-order 5 bits
            only.
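
            Splitting and rebuilding the Type-of-Service byte is plain bit
            arithmetic on the second octet of the IP header.  A Python
            sketch with hypothetical function names:

```python
def split_tos_byte(tos_byte: int) -> tuple:
    """Split the header's Type-of-Service byte into (precedence, tos)."""
    precedence = (tos_byte >> 5) & 0x07  # high-order 3 bits
    tos = tos_byte & 0x1F                # low-order 5 bits: "TOS" here
    return precedence, tos


def make_tos_byte(precedence: int, tos: int) -> int:
    """Inverse of split_tos_byte, rejecting out-of-range fields."""
    if not (0 <= precedence <= 7 and 0 <= tos <= 0x1F):
        raise ValueError("precedence is 3 bits, TOS is 5 bits")
    return (precedence << 5) | tos
```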

            The Precedence field is intended for Department of Defense
            applications of the Internet protocols.  The use of non-zero
            values in this field is outside the scope of this document
            and the IP standard specification.  Vendors should consult
            the Defense Communications Agency (DCA) for guidance on the
            IP Precedence field and its implications for other protocol
            layers.  However, vendors should note that the use of
            precedence will most likely require that its value be passed
            between protocol layers in just the same way as the TOS
            field is passed.

            The IP layer MUST provide a means for the transport layer to
            set the TOS field of every datagram that is sent; the
            default is all zero bits.  The IP layer SHOULD pass received
            TOS values up to the transport layer.

            The particular link-layer mappings of TOS contained in
            RFC-795 SHOULD NOT be implemented.

            DISCUSSION:
                 While the TOS field has been little used in the past,
                 it is expected to play an increasing role in the near
                 future.  The TOS field is expected to be used to
                 control two aspects of gateway operations: routing and
                 queueing algorithms.  See Section 2 of [INTRO:1] for
                 the requirements on application programs to specify TOS
                 values.

                 The TOS field may also be mapped into link-layer
                 service selectors.  This has been applied to provide
                 effective sharing of serial lines by different classes
                 of TCP traffic, for example.  However, the mappings
                 suggested in RFC-795 for networks that were included in
                 the Internet as of 1981 are now obsolete.

         3.2.1.7  Time-to-Live: RFC-791 Section 3.2

            A host MUST NOT send a datagram with a Time-to-Live (TTL)
            value of zero.

            A host MUST NOT discard a datagram just because it was
            received with TTL less than 2.

            The IP layer MUST provide a means for the transport layer to
            set the TTL field of every datagram that is sent.  When a
            fixed TTL value is used, it MUST be configurable.  The
            current suggested value will be published in the "Assigned
            Numbers" RFC.

            DISCUSSION:
                 The TTL field has two functions: limit the lifetime of
                 TCP segments (see RFC-793 [TCP:1], p. 28), and
                 terminate Internet routing loops.  Although TTL is a
                 time in seconds, it also has some attributes of a
                 hop-count, since each gateway is required to reduce the
                 TTL field by at least one.

                 The intent is that TTL expiration will cause a datagram
                 to be discarded by a gateway but not by the destination
                 host; however, hosts that act as gateways by forwarding
                 datagrams must follow the gateway rules for TTL.

                 A higher-layer protocol may want to set the TTL in
                 order to implement an "expanding scope" search for some
                 Internet resource.  This is used by some diagnostic
                 tools, and is expected to be useful for locating the
                 "nearest" server of a given class using IP
                 multicasting, for example.  A particular transport
                 protocol may also want to specify its own TTL bound on
                 maximum datagram lifetime.

                 A fixed value must be at least big enough for the
                 Internet "diameter," i.e., the longest possible path.
                 A reasonable value is about twice the diameter, to
                 allow for continued Internet growth.
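
            A sketch of the TTL rules together with an expanding-scope
            search.  The configured default is a parameter rather than a
            specific number, since the suggested value comes from the
            "Assigned Numbers" RFC, and probe is a hypothetical callback
            that sends one query with the given TTL and reports whether any
            answer arrived.  Both function names are hypothetical.

```python
from typing import Callable, Optional


def outgoing_ttl(configured_default: int, requested: Optional[int] = None) -> int:
    """TTL for an outgoing datagram: the transport's value if it set one,
    otherwise the configured default.  Zero is never sent."""
    ttl = configured_default if requested is None else requested
    if not 1 <= ttl <= 255:
        raise ValueError("TTL must be between 1 and 255")
    return ttl


def expanding_scope_search(probe: Callable[[int], bool], max_ttl: int) -> Optional[int]:
    """Probe with TTL 1, 2, ..., max_ttl and return the first TTL that
    elicited an answer, or None if nothing answered."""
    for ttl in range(1, max_ttl + 1):
        if probe(ttl):
            return ttl
    return None

# On receipt, a destination host applies no TTL test at all: a datagram
# that arrives with TTL 0 or 1 is still delivered.
```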

         3.2.1.8  Options: RFC-791 Section 3.2

            There MUST be a means for the transport layer to specify IP
            options to be included in transmitted IP datagrams (see
            Section 3.4).

            All IP options (except NOP or END-OF-LIST) received in
            datagrams MUST be passed to the transport layer (or to ICMP
            processing when the datagram is an ICMP message).  The IP
            and transport layers MUST each interpret those IP options
            that they understand and silently ignore the others.

            Later sections of this document discuss specific IP option
            support required by each of ICMP, TCP, and UDP.

            DISCUSSION:
                 Passing all received IP options to the transport layer
                 is a deliberate "violation of strict layering" that is
                 designed to ease the introduction of new
                 transport-relevant IP options in the future.  Each layer
                 must pick out any options that are relevant to its own
                 processing and ignore the rest.  For this purpose,
                 every IP option except NOP and END-OF-LIST will include
                 a specification of its own length.

                 This document does not define the order in which a
                 receiver must process multiple options in the same IP
                 header.  Hosts sending multiple options must be aware
                 that this introduces an ambiguity in the meaning of
                 certain options when combined with a source-route
                 option.

            IMPLEMENTATION:
                 The IP layer must not crash as the result of an option
                 length that is outside the possible range.  For
                 example, erroneous option lengths have been observed to
                 put some IP implementations into infinite loops.
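
            A defensive option walker in Python, sketched under the
            assumption that the options area has already been sliced out of
            the header.  Every length octet is checked before use and each
            iteration advances by at least one octet, so a malformed length
            produces an error instead of a loop.  NOP and END-OF-LIST are
            consumed rather than returned; the function name is
            hypothetical.

```python
def parse_ip_options(options: bytes) -> list:
    """Split an IPv4 options area into (type, body) pairs.

    Raises ValueError on a malformed length instead of looping or reading
    past the end of the options area.
    """
    END_OF_LIST, NOP = 0, 1
    parsed = []
    i = 0
    while i < len(options):
        opt_type = options[i]
        if opt_type == END_OF_LIST:
            break
        if opt_type == NOP:
            i += 1
            continue
        if i + 1 >= len(options):
            raise ValueError("option type without a length octet")
        length = options[i + 1]
        # The length counts the type and length octets, so it is at least 2;
        # a length of 0 or 1 is what traps a naive parser in a loop.
        if length < 2 or i + length > len(options):
            raise ValueError(f"bad length {length} for option {opt_type}")
        parsed.append((opt_type, options[i + 2:i + length]))
        i += length  # always advances by at least 2
    return parsed
```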

            Here are the requirements for specific IP options:

            (a)  Security Option

                 Some environments require the Security option in every
                 datagram; such a requirement is outside the scope of
                 this document and the IP standard specification.  Note,
                 however, that the security options described in RFC-791
                 and RFC-1038 are obsolete.  For DoD applications,
                 vendors should consult [IP:8] for guidance.

            (b)  Stream Identifier Option

                 This option is obsolete; it SHOULD NOT be sent, and it
                 MUST be silently ignored if received.

            (c)  Source Route Options

                 A host MUST support originating a source route and MUST
                 be able to act as the final destination of a source
                 route.

                 If a host receives a datagram containing a completed
                 source route (i.e., the pointer points beyond the last
                 field), the datagram has reached its final destination;
                 the option as received (the recorded route) MUST be
                 passed up to the transport layer (or to ICMP message
                 processing).  This recorded route will be reversed and
                 used to form a return source route for reply datagrams
                 (see discussion of IP Options in Section 4).  When a
                 return source route is built, it MUST be correctly
                 formed even if the recorded route included the source
                 host (see case (B) in the discussion below).

                 An IP header containing more than one Source Route
                 option MUST NOT be sent; the effect on routing of
                 multiple Source Route options is
                 implementation-specific.

                 Section 3.3.5 presents the rules for a host acting as
                 an intermediate hop in a source route, i.e., forwarding
                 a source-routed datagram.

                 DISCUSSION:
                      If a source-routed datagram is fragmented, each
                      fragment will contain a copy of the source route.
                      Since the processing of IP options (including a
                      source route) must precede reassembly, the
                      original datagram will not be reassembled until
                      the final destination is reached.

                      Suppose a source routed datagram is to be routed
                      from host S to host D via gateways G1, G2, ... Gn.
                      There was an ambiguity in the specification over
                      whether the source route option in a datagram sent
                      out by S should be (A) or (B):

                          (A):  {>>G2, G3, ... Gn, D}     <--- CORRECT

                          (B):  {S, >>G2, G3, ... Gn, D}  <---- WRONG

                      (where >> represents the pointer).  If (A) is
                      sent, the datagram received at D will contain the
                      option: {G1, G2, ... Gn >>}, with S and D as the
                      IP source and destination addresses.  If (B) were
                      sent, the datagram received at D would again
                      contain S and D as the same IP source and
                      destination addresses, but the option would be:
                      {S, G1, ...Gn >>}; i.e., the originating host
                      would be the first hop in the route.
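
                 Building the return route from the recorded route can be
                 sketched as follows.  Addresses are plain strings, the
                 pointer and option encoding are not modeled, and
                 stripping a leading entry equal to the IP source address
                 is one way (an assumption, not the only one) to cope with
                 a sender that used form (B).  The function name is
                 hypothetical.

```python
def return_source_route(ip_source: str, recorded: list) -> tuple:
    """Build (first_hop, option_addresses) for a reply to a datagram that
    arrived with a completed source route.

    `recorded` is the address list of the option as received, e.g.
    ["G1", "G2", "G3"] for form (A) or ["S", "G1", "G2", "G3"] for (B).
    """
    route = list(recorded)
    if route and route[0] == ip_source:
        route = route[1:]  # form (B): drop the originator's own entry
    if not route:
        return ip_source, []  # no intermediate hops remain
    hops = route[::-1] + [ip_source]  # Gn, ..., G1, S
    # The reply goes to Gn; the option carries the rest, in form (A),
    # with the pointer at its first entry.
    return hops[0], hops[1:]
```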

            (d)  Record Route Option

                 Implementation of originating and processing the Record
                 Route option is OPTIONAL.

            (e)  Timestamp Option

                 Implementation of originating and processing the
                 Timestamp option is OPTIONAL.  If it is implemented,
                 the following rules apply:

                 o    The originating host MUST record a timestamp in a
                      Timestamp option whose Internet address fields are
                      not pre-specified or whose first pre-specified
                      address is the host's interface address.

                 o    The destination host MUST (if possible) add the
                      current timestamp to a Timestamp option before
                      passing the option to the transport layer or to
                      ICMP for processing.

                 o    A timestamp value MUST follow the rules given in
                      Section 3.2.2.8 for the ICMP Timestamp message.

      3.2.2 Internet Control Message Protocol -- ICMP

         ICMP messages are grouped into two classes.

         *    ICMP error messages:

              Destination Unreachable   (see Section 3.2.2.1)
              Redirect                  (see Section 3.2.2.2)
              Source Quench             (see Section 3.2.2.3)
              Time Exceeded             (see Section 3.2.2.4)
              Parameter Problem         (see Section 3.2.2.5)

         *    ICMP query messages:

              Echo                      (see Section 3.2.2.6)
              Information               (see Section 3.2.2.7)
              Timestamp                 (see Section 3.2.2.8)
              Address Mask              (see Section 3.2.2.9)

         If an ICMP message of unknown type is received, it MUST be
         silently discarded.
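
         The two classes and the unknown-type rule can be expressed as a
         lookup table.  The type numbers are those assigned by RFC-792
         (and RFC-950 for Address Mask); the table and function names are
         hypothetical.

```python
ICMP_ERRORS = {
    3: "Destination Unreachable",
    4: "Source Quench",
    5: "Redirect",
    11: "Time Exceeded",
    12: "Parameter Problem",
}
ICMP_QUERIES = {
    0: "Echo Reply", 8: "Echo",
    13: "Timestamp", 14: "Timestamp Reply",
    15: "Information Request", 16: "Information Reply",
    17: "Address Mask Request", 18: "Address Mask Reply",
}


def classify_icmp(icmp_type: int):
    """Return 'error', 'query', or None (unknown type: silently discard)."""
    if icmp_type in ICMP_ERRORS:
        return "error"
    if icmp_type in ICMP_QUERIES:
        return "query"
    return None
```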

         Every ICMP error message includes the Internet header and at
         least the first 8 data octets of the datagram that triggered
         the error; more than 8 octets MAY be sent; this header and data
         MUST be unchanged from the received datagram.

         In those cases where the Internet layer is required to pass an
         ICMP error message to the transport layer, the IP protocol
         number MUST be extracted from the original header and used to
         select the appropriate transport protocol entity to handle the
         error.
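
         Both rules can be sketched with byte slicing, assuming IPv4
         datagrams held in Python bytes objects.  In the ICMP error
         messages listed in this section, the quoted header begins after
         the 8-octet ICMP header, which is where its Protocol field is
         read.  The function names, the extra_octets knob and the
         transports mapping are hypothetical.

```python
def icmp_error_body(offending: bytes, extra_octets: int = 0) -> bytes:
    """Quote the offending datagram's IP header and first 8 data octets,
    unchanged; `extra_octets` quotes more."""
    header_len = (offending[0] & 0x0F) * 4
    return offending[:header_len + 8 + extra_octets]


def dispatch_icmp_error(icmp_message: bytes, transports: dict):
    """Hand a received ICMP error to the transport named in the quoted header.

    The quoted IP header starts after the 8-octet ICMP header; its Protocol
    field (offset 9) selects the handler.
    """
    icmp_type, code, quoted = icmp_message[0], icmp_message[1], icmp_message[8:]
    if len(quoted) < 20:
        return None  # too short to contain the original IP header
    handler = transports.get(quoted[9])
    return handler(icmp_type, code, quoted) if handler else None
```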

         An ICMP error message SHOULD be sent with normal (i.e., zero)
         TOS bits.

         An ICMP error message MUST NOT be sent as the result of
         receiving:

         *    an ICMP error message, or

         *    a datagram destined to an IP broadcast or IP multicast
              address, or

         *    a datagram sent as a link-layer broadcast, or

         *    a non-initial fragment, or

         *    a datagram whose source address does not define a single
              host -- e.g., a zero address, a loopback address, a
              broadcast address, a multicast address, or a Class E
              address.

         NOTE: THESE RESTRICTIONS TAKE PRECEDENCE OVER ANY REQUIREMENT
         ELSEWHERE IN THIS DOCUMENT FOR SENDING ICMP ERROR MESSAGES.

         DISCUSSION:
              These rules will prevent the "broadcast storms" that have
              resulted from hosts returning ICMP error messages in
              response to broadcast datagrams.  For example, a broadcast
              UDP segment to a non-existent port could trigger a flood
              of ICMP Destination Unreachable datagrams from all
              machines that do not have a client for that destination
              port.  On a large Ethernet, the resulting collisions can
              render the network useless for a second or more.

              Every datagram that is broadcast on the connected network
              should have a valid IP broadcast address as its IP
              destination (see Section 3.3.6).  However, some hosts
              violate this rule.  To be certain to detect broadcast
              datagrams, therefore, hosts are required to check for a
              link-layer broadcast as well as an IP-layer broadcast
              address.

         IMPLEMENTATION:
              This requires that the link layer inform the IP layer when
              a link-layer broadcast datagram has been received; see
              Section 2.4.
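
         The restrictions above, including the link-layer broadcast flag
         from Section 2.4, can be sketched as one predicate consulted
         before any ICMP error is generated.  It assumes the caller
         supplies the IP broadcast addresses valid on the connected
         networks, and it treats only the all-zeros address as a "zero
         address"; the function and parameter names are hypothetical.

```python
import ipaddress

# Destination Unreachable, Source Quench, Redirect, Time Exceeded,
# Parameter Problem
ICMP_ERROR_TYPES = {3, 4, 5, 11, 12}
LIMITED_BROADCAST = ipaddress.IPv4Address("255.255.255.255")


def may_send_icmp_error(datagram: bytes, link_broadcast: bool,
                        ip_broadcasts: set) -> bool:
    """False if the datagram that triggered an error falls into any of the
    MUST NOT cases, so that no ICMP error may be sent for it."""
    header_len = (datagram[0] & 0x0F) * 4
    src = ipaddress.IPv4Address(datagram[12:16])
    dst = ipaddress.IPv4Address(datagram[16:20])
    frag_offset = int.from_bytes(datagram[6:8], "big") & 0x1FFF
    if (datagram[9] == 1 and len(datagram) > header_len
            and datagram[header_len] in ICMP_ERROR_TYPES):
        return False  # an ICMP error message
    if dst.is_multicast or dst == LIMITED_BROADCAST or str(dst) in ip_broadcasts:
        return False  # IP broadcast or multicast destination
    if link_broadcast:
        return False  # arrived as a link-layer broadcast
    if frag_offset != 0:
        return False  # non-initial fragment
    if (int(src) == 0 or src.is_loopback or str(src) in ip_broadcasts
            or int(src) >> 28 >= 0b1110):
        return False  # zero, loopback, broadcast, Class D or Class E source
    return True
```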

         3.2.2.1  Destination Unreachable: RFC-792

            The following additional codes are hereby defined:

                 6 = destination network unknown

                 7 = destination host unknown

                 8 = source host isolated

                 9 = communication with destination network
                     administratively prohibited

                10 = communication with destination host
                     administratively prohibited

                11 = network unreachable for type of service

                12 = host unreachable for type of service

            A host SHOULD generate Destination Unreachable messages with
            code:

            2    (Protocol Unreachable), when the designated transport
                 protocol is not supported; or

            3    (Port Unreachable), when the designated transport
                 protocol (e.g., UDP) is unable to demultiplex the
                 datagram but has no protocol mechanism to inform the
                 sender.
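
            A sketch of choosing between the two codes, assuming a
            hypothetical table that maps each supported protocol number to
            its bound ports and to whether the protocol has its own
            rejection mechanism (as TCP does with RST).  The restrictions
            on sending ICMP error messages still apply and are not modeled
            here; all names are hypothetical.

```python
PROTOCOL_UNREACHABLE = 2
PORT_UNREACHABLE = 3


def unreachable_code(protocol: int, port: int, transports: dict):
    """Pick the Destination Unreachable code to generate, or None.

    `transports` maps protocol number -> (set of bound ports,
    has_own_reject), where has_own_reject marks protocols like TCP.
    """
    if protocol not in transports:
        return PROTOCOL_UNREACHABLE
    bound_ports, has_own_reject = transports[protocol]
    if port not in bound_ports and not has_own_reject:
        return PORT_UNREACHABLE
    return None
```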
|
|||
|
|
|||
|
A Destination Unreachable message that is received MUST be
|
|||
|
reported to the transport layer. The transport layer SHOULD
|
|||
|
use the information appropriately; for example, see Sections
|
|||
|
4.1.3.3, 4.2.3.9, and 4.2.4 below. A transport protocol
|
|||
|
that has its own mechanism for notifying the sender that a
|
|||
|
port is unreachable (e.g., TCP, which sends RST segments)
|
|||
|
MUST nevertheless accept an ICMP Port Unreachable for the
|
|||
|
same purpose.
|
|||
|
|
|||
|
A Destination Unreachable message that is received with code
|
|||
|
0 (Net), 1 (Host), or 5 (Bad Source Route) may result from a
|
|||
|
routing transient and MUST therefore be interpreted as only
|
|||
|
a hint, not proof, that the specified destination is
|
|||
|
unreachable [IP:11]. For example, it MUST NOT be used as
|
|||
|
proof of a dead gateway (see Section 3.3.1).

3.2.2.2 Redirect: RFC-792

A host SHOULD NOT send an ICMP Redirect message; Redirects
are to be sent only by gateways.

A host receiving a Redirect message MUST update its routing
information accordingly. Every host MUST be prepared to
accept both Host and Network Redirects and to process them
as described in Section 3.3.1.2 below.

A Redirect message SHOULD be silently discarded if the new
gateway address it specifies is not on the same connected
(sub-) net through which the Redirect arrived [INTRO:2,
Appendix A], or if the source of the Redirect is not the
current first-hop gateway for the specified destination (see
Section 3.3.1).
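
The two discard conditions can be expressed with Python's standard `ipaddress` module. This is an illustrative sketch only; `accept_redirect` is a hypothetical name, and it assumes the arrival interface is known in "address/prefix" form, which stands in for "the connected (sub-) net through which the Redirect arrived".

```python
import ipaddress


def accept_redirect(new_gateway: str,
                    arrival_interface: str,
                    redirect_source: str,
                    current_first_hop: str) -> bool:
    """Return False if the Redirect should be silently discarded.

    arrival_interface: local interface as "address/prefix", e.g. "10.1.2.5/24"
    (an assumed representation of the connected (sub-)net).
    """
    connected = ipaddress.ip_interface(arrival_interface).network
    # Rule 1: the new gateway must be on the (sub-)net the Redirect arrived on.
    if ipaddress.ip_address(new_gateway) not in connected:
        return False
    # Rule 2: the Redirect must come from the current first-hop gateway
    # for the destination in question.
    if ipaddress.ip_address(redirect_source) != ipaddress.ip_address(current_first_hop):
        return False
    return True
```

For example, with an interface of `10.1.2.5/24` and current first hop `10.1.2.1`, a Redirect from `10.1.2.1` naming `10.1.2.9` is accepted, while one naming `10.1.3.9` is discarded.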

3.2.2.3 Source Quench: RFC-792

A host MAY send a Source Quench message if it is
approaching, or has reached, the point at which it is forced
to discard incoming datagrams due to a shortage of
reassembly buffers or other resources. See Section 2.2.3 of
[INTRO:2] for suggestions on when to send Source Quench.

If a Source Quench message is received, the IP layer MUST
report it to the transport layer (or ICMP processing). In
general, the transport or application layer SHOULD implement
a mechanism to respond to Source Quench for any protocol
that can send a sequence of datagrams to the same
destination and which can reasonably be expected to maintain
enough state information to make this feasible. See Section
4 for the handling of Source Quench by TCP and UDP.

DISCUSSION:
A Source Quench may be generated by the target host or
by some gateway in the path of a datagram. The host
receiving a Source Quench should throttle itself back
for a period of time, then gradually increase the
transmission rate again. The mechanism to respond to
Source Quench may be in the transport layer (for
connection-oriented protocols like TCP) or in the
application layer (for protocols that are built on top
of UDP).

A mechanism has been proposed [IP:14] to make the IP
layer respond directly to Source Quench by controlling
the rate at which datagrams are sent; however, this
proposal is currently experimental and not
recommended.
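
As one way an application built on UDP might "throttle itself back for a period of time, then gradually increase the transmission rate again", here is a minimal Python sketch. The class `QuenchThrottle`, halving the rate on each quench, and linear recovery after a hold period are all assumptions chosen for illustration; the text above does not prescribe any particular policy.

```python
class QuenchThrottle:
    """Hypothetical send-rate controller reacting to ICMP Source Quench.

    Halving on each quench and recovering linearly after a hold period
    are illustrative assumptions, not requirements of this document.
    """

    def __init__(self, max_rate: float, min_rate: float,
                 hold_time: float, step: float) -> None:
        self.max_rate = max_rate    # datagrams/second when unthrottled
        self.min_rate = min_rate    # never throttle below this
        self.hold_time = hold_time  # seconds to stay throttled
        self.step = step            # rate increase per second of recovery
        self.rate = max_rate
        self.hold_until = 0.0
        self.last = 0.0

    def update(self, now: float) -> float:
        """Advance the clock to `now` and return the allowed rate."""
        # Recovery only accrues after the hold period has ended.
        start = max(self.last, self.hold_until)
        if now > start:
            self.rate = min(self.max_rate, self.rate + self.step * (now - start))
        self.last = max(self.last, now)
        return self.rate

    def on_source_quench(self, now: float) -> None:
        """Throttle back: halve the rate and restart the hold period."""
        self.update(now)
        self.rate = max(self.min_rate, self.rate / 2)
        self.hold_until = now + self.hold_time
```

With `max_rate=100`, `hold_time=2` and `step=10`, a quench at t=0 drops the rate to 50, which stays there until t=2 and then climbs back by 10 datagrams/second each second until it reaches 100 again.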

3.2.2.4 Time Exceeded: RFC-792

An incoming Time Exceeded message MUST be passed to the
transport layer.

DISCUSSION:
A gateway will send a Time Exceeded Code 0 (In Transit)
message when it discards a datagram due to an expired
TTL field. This indicates either a gateway routing
loop or too small an initial TTL value.

A host may receive a Time Exceeded Code 1 (Reassembly
Timeout) message from a destination host that has timed
out and discarded an incomplete datagram; see Section
3.3.2 below. In the future, receipt of this message
might be part of some "MTU discovery" procedure, to
discover the maximum datagram size that can be sent on
the path without fragmentation.

3.2.2.5 Parameter Problem: RFC-792

A host SHOULD generate Parameter Problem messages. An
incoming Parameter Problem message MUST be passed to the
transport layer, and it MAY be reported to the user.

DISCUSSION:
The ICMP Parameter Problem message is sent to the
source host for any problem not specifically covered by
another ICMP message. Receipt of a Parameter Problem
message generally indicates some local or remote
implementation error.

A new variant on the Parameter Problem message is hereby
defined:
Code 1 = required option is missing.

DISCUSSION:
This variant is currently in use in the military
community for a missing security option.

3.2.2.6 Echo Request/Reply: RFC-792

Every host MUST implement an ICMP Echo server function that
receives Echo Requests and sends corresponding Echo Replies.
A host SHOULD also implement an application-layer interface
for sending an Echo Request and receiving an Echo Reply, for
diagnostic purposes.

An ICMP Echo Request destined to an IP broadcast or IP
multicast address MAY be silently discarded.

DISCUSSION:
This neutral provision results from a passionate debate
between those who feel that ICMP Echo to a broadcast
address provides a valuable diagnostic capability and
those who feel that misuse of this feature can too
easily create packet storms.

The IP source address in an ICMP Echo Reply MUST be the same
as the specific-destination address (defined in Section
3.2.1.3) of the corresponding ICMP Echo Request message.

Data received in an ICMP Echo Request MUST be entirely
included in the resulting Echo Reply. However, if sending
the Echo Reply requires intentional fragmentation that is
not implemented, the datagram MUST be truncated to the maximum
transmission size (see Section 3.3.3) and sent.

Echo Reply messages MUST be passed to the ICMP user
interface, unless the corresponding Echo Request originated
in the IP layer.

If a Record Route and/or Time Stamp option is received in an
ICMP Echo Request, this option (these options) SHOULD be
updated to include the current host and included in the IP
header of the Echo Reply message, without "truncation".
Thus, the recorded route will be for the entire round trip.

If a Source Route option is received in an ICMP Echo
Request, the return route MUST be reversed and used as a
Source Route option for the Echo Reply message.
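
The rules for building an Echo Reply can be pulled together in one function. This is a simplified Python sketch under stated assumptions: `EchoMessage` and `make_echo_reply` are hypothetical names, IP options are modeled as plain address lists rather than encoded option bytes, and the size arithmetic assumes a 20-byte IP header (no options) plus the 8-byte ICMP Echo header.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class EchoMessage:
    """Simplified view of an ICMP Echo datagram (hypothetical fields)."""
    src: str
    dst: str
    ident: int
    seq: int
    data: bytes
    record_route: Optional[List[str]] = None  # addresses in a Record Route option
    source_route: Optional[List[str]] = None  # route carried in a Source Route option


def make_echo_reply(req: EchoMessage, specific_dst: str,
                    max_transmission_size: Optional[int] = None,
                    ip_header_len: int = 20) -> EchoMessage:
    """Build the Echo Reply for `req`.

    specific_dst: this host's specific-destination address for the request.
    max_transmission_size: pass it only when the reply would otherwise need
    fragmentation that is not implemented.
    """
    data = req.data  # all request data is echoed ...
    if max_transmission_size is not None:
        # ... unless it must be truncated to fit (8 = ICMP Echo header).
        room = max_transmission_size - ip_header_len - 8
        data = data[:max(room, 0)]
    record_route = None
    if req.record_route is not None:
        # Add this host so the recorded route covers the round trip.
        record_route = req.record_route + [specific_dst]
    source_route = None
    if req.source_route is not None:
        source_route = list(reversed(req.source_route))  # reversed return route
    return EchoMessage(src=specific_dst, dst=req.src, ident=req.ident,
                       seq=req.seq, data=data, record_route=record_route,
                       source_route=source_route)
```

Note that the reply's source is the specific-destination address even when the request was sent to a broadcast address, and the request object itself is left unmodified.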

3.2.2.7 Information Request/Reply: RFC-792

A host SHOULD NOT implement these messages.

DISCUSSION:
The Information Request/Reply pair was intended to
support self-configuring systems such as diskless
workstations, to allow them to discover their IP
network numbers at boot time. However, the RARP and
BOOTP protocols provide better mechanisms for a host to
discover its own IP address.

3.2.2.8 Timestamp and Timestamp Reply: RFC-792

A host MAY implement Timestamp and Timestamp Reply. If they
are implemented, the following rules MUST be followed.

o The ICMP Timestamp server function returns a Timestamp
Reply to every Timestamp message that is received. If
this function is implemented, it SHOULD be designed for
minimum variability in delay (e.g., implemented in the
kernel to avoid delay in scheduling a user process).

The following cases for Timestamp are to be handled
according to the corresponding rules for ICMP Echo:

o An ICMP Timestamp Request message to an IP broadcast or
IP multicast address MAY be silently discarded.

o The IP source address in an ICMP Timestamp Reply MUST
be the same as the specific-destination address of the
corresponding Timestamp Request message.

o If a Source Route option is received in an ICMP Timestamp
Request, the return route MUST be reversed and used as
a Source Route option for the Timestamp Reply message.

o If a Record Route and/or Timestamp option is received
in a Timestamp Request, this (these) option(s) SHOULD
be updated to include the current host and included in
the IP header of the Timestamp Reply message.

o Incoming Timestamp Reply messages MUST be passed up to
the ICMP user interface.

The preferred form for a timestamp value (the "standard
value") is in units of milliseconds since midnight Universal
Time. However, it may be difficult to provide this value
with millisecond resolution. For example, many systems use
clocks that update only at line frequency, 50 or 60 times
per second. Therefore, some latitude is allowed in a
"standard value":

(a) A "standard value" MUST be updated at least 15 times
per second (i.e., at most the six low-order bits of the
value may be undefined).

(b) The accuracy of a "standard value" MUST approximate
that of operator-set CPU clocks, i.e., correct within a
few minutes.
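
Computing a "standard value" from the system clock is simple arithmetic. The Python sketch below is illustrative: `standard_timestamp` and `nonstandard_timestamp` are hypothetical names, it assumes the host clock is POSIX time (where every day is exactly 86400 seconds), and the high-order-bit convention for non-standard values is the one RFC-792 defines.

```python
import time
from typing import Optional

MS_PER_DAY = 24 * 60 * 60 * 1000  # 86,400,000 fits easily in 32 bits

# RFC-792: a time that is not milliseconds since midnight UT may be sent
# only with the high-order bit set, to flag it as non-standard.
NONSTANDARD_BIT = 0x80000000


def standard_timestamp(unix_seconds: Optional[float] = None) -> int:
    """Milliseconds since midnight Universal Time (the "standard value")."""
    if unix_seconds is None:
        unix_seconds = time.time()  # POSIX time ignores leap seconds
    return int(unix_seconds * 1000) % MS_PER_DAY


def nonstandard_timestamp(value: int) -> int:
    """Mark an arbitrary 31-bit time value as non-standard."""
    return (value & 0x7FFFFFFF) | NONSTANDARD_BIT
```

Rule (a) tolerates 2^6 = 64 ms of granularity, i.e., about 15.6 updates per second; a 60 Hz clock advances in steps of about 16.7 ms and so comfortably satisfies it.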

3.2.2.9 Address Mask Request/Reply: RFC-950

A host MUST support the first, and MAY implement all three,
of the following methods for determining the address mask(s)
corresponding to its IP address(es):

(1) static configuration information;

(2) obtaining the address mask(s) dynamically as a
side-effect of the system initialization process (see
[INTRO:1]); and

(3) sending ICMP Address Mask Request(s) and receiving ICMP
Address Mask Reply(s).

The choice of method to be used in a particular host MUST be
configurable.

When method (3), the use of Address Mask messages, is
enabled, then:

(a) When it initializes, the host MUST broadcast an Address
Mask Request message on the connected network
corresponding to the IP address. It MUST retransmit
this message a small number of times if it does not
receive an immediate Address Mask Reply.

(b) Until it has received an Address Mask Reply, the host
SHOULD assume a mask appropriate for the address class
of the IP address, i.e., assume that the connected
network is not subnetted.

(c) The first Address Mask Reply message received MUST be
used to set the address mask corresponding to the
particular local IP address. This is true even if the
first Address Mask Reply message is "unsolicited", in
which case it will have been broadcast and may arrive
after the host has ceased to retransmit Address Mask
Requests. Once the mask has been set by an Address
Mask Reply, later Address Mask Reply messages MUST be
(silently) ignored.

Conversely, if Address Mask messages are disabled, then no
ICMP Address Mask Requests will be sent, and any ICMP
Address Mask Replies received for that local IP address MUST
be (silently) ignored.
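
Rules (b) and (c), together with the disabled case, amount to a small per-address state machine. This Python sketch is illustrative only: `AddressMaskState` and its members are hypothetical, `use_icmp` stands in for the configuration switch that enables method (3), and request retransmission and the reasonableness check are deliberately left out.

```python
import ipaddress
from typing import Optional


class AddressMaskState:
    """Mask state for one local IP address (hypothetical names)."""

    def __init__(self, local_addr: str, use_icmp: bool) -> None:
        self.local = ipaddress.IPv4Address(local_addr)
        self.use_icmp = use_icmp         # is method (3) enabled?
        self.mask: Optional[int] = None  # set by the first Address Mask Reply

    def classful_mask(self) -> int:
        """Rule (b): mask for the address class, i.e., no subnetting."""
        first_octet = int(self.local) >> 24
        if first_octet < 128:
            return 0xFF000000  # class A
        if first_octet < 192:
            return 0xFFFF0000  # class B
        if first_octet < 224:
            return 0xFFFFFF00  # class C
        raise ValueError("class D/E address has no classful mask")

    def current_mask(self) -> int:
        return self.mask if self.mask is not None else self.classful_mask()

    def on_address_mask_reply(self, mask: int) -> bool:
        """Rule (c): the first Reply wins, solicited or not.

        Returns False when the Reply is (silently) ignored.
        """
        if not self.use_icmp or self.mask is not None:
            return False
        self.mask = mask
        return True
```

For `10.1.2.3` the host starts with the class A mask `0xFF000000`; a first Reply carrying `0xFFFFFF00` replaces it, and any later Reply is ignored.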

A host SHOULD make some reasonableness check on any address
mask it installs; see IMPLEMENTATION section below.

A system MUST NOT send an Address Mask Reply unless it is an
authoritative agent for address masks. An authoritative
agent may be a host or a gateway, but it MUST be explicitly
configured as an address mask agent. Receiving an address
mask via an Address Mask Reply does not give the receiver
authority and MUST NOT be used as the basis for issuing
Address Mask Replies.

With a statically configured address mask, there SHOULD be
an additional configuration flag that determines whether the
host is to act as an authoritative agent for this mask,
i.e., whether it will answer Address Mask Request messages
using this mask.

If it is configured as an agent, the host MUST broadcast an
Address Mask Reply for the mask on the appropriate interface
when it initializes.

See "System Initialization" in [INTRO:1] for more
information about the use of Address Mask Request/Reply
messages.

DISCUSSION:
Hosts that casually send Address Mask Replies with
invalid address masks have often been a serious
nuisance. To prevent this, Address Mask Replies ought
to be sent only by authoritative agents that have been
selected by explicit administrative action.

When an authoritative agent receives an Address Mask
Request message, it will send a unicast Address Mask
Reply to the source IP address. If the network part of
this address is zero (see (a) and (b) in Section 3.2.1.3),
the Reply will be broadcast.

If it gets no reply to its Address Mask Request messages,
a host will assume there is no agent and use an
unsubnetted mask, but the agent may be only temporarily
unreachable. An agent will broadcast an unsolicited
Address Mask Reply whenever it initializes, in order to
update the masks of all hosts that have initialized in
the meantime.

IMPLEMENTATION:
The following reasonableness check on an address mask
is suggested: the mask is not all 1 bits, and it is
either zero or else the 8 highest-order bits are on.
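
The suggested check translates directly into a few integer comparisons. A minimal Python sketch, where `mask_is_reasonable` is a hypothetical name and the mask is assumed to be given as an unsigned 32-bit integer:

```python
def mask_is_reasonable(mask: int) -> bool:
    """Suggested sanity check for an address mask before installing it.

    mask: the 32-bit address mask as an unsigned integer.
    """
    if not 0 <= mask <= 0xFFFFFFFF:
        return False  # not a 32-bit value at all
    if mask == 0xFFFFFFFF:
        return False  # all 1 bits
    # Either zero, or the 8 highest-order bits are all on.
    return mask == 0 or (mask >> 24) == 0xFF
```

So 255.255.255.0 (`0xFFFFFF00`) and 255.255.0.0 pass, while 255.255.255.255 and 240.0.0.0 (`0xF0000000`) are rejected. The check as stated does not test that the 1 bits are contiguous.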

3.2.3 Internet Group Management Protocol IGMP

IGMP [IP:4] is a protocol used between hosts and gateways on a
single network to establish hosts' membership in particular
multicast groups. The gateways use this information, in
conjunction with a multicast routing protocol, to support IP
multicasting across the Internet.

At this time, implementation of IGMP is OPTIONAL; see Section
3.3.7 for more information. Without IGMP, a host can still
participate in multicasting local to its connected networks.

3.3 SPECIFIC ISSUES

3.3.1 Routing Outbound Datagrams

The IP layer chooses the correct next hop for each datagram it
sends. If the destination is on a connected network, the
datagram is sent directly to the destination host; otherwise,
it has to be routed to a gateway on a connected network.

3.3.1.1 Local/Remote Decision

To decide if the destination is on a connected network, the
following algorithm MUST be used [see IP:3]:

(a) The address mask (particular to a local IP address for
a multihomed host) is a 32-bit mask that selects the
network number and subnet number fields of the
corresponding IP address.

(b) If the IP destination address bits extracted by the
address mask match the IP source address bits extracted
by the same mask, then the destination is on the
corresponding connected network, and the datagram is to
be transmitted directly to the destination host.

(c) If not, then the destination is accessible only through
a gateway. Selection of a gateway is described below
(3.3.1.2).

A special-case destination address is handled as follows:

* For a limited broadcast or a multicast address, simply
pass the datagram to the link layer for the appropriate
interface.

* For a (network or subnet) directed broadcast, the
datagram can use the standard routing algorithms.

The host IP layer MUST operate correctly in a minimal
network environment, and in particular, when there are no
gateways. For example, if the IP layer of a host insists on
finding at least one gateway to initialize, the host will be
unable to operate on a single isolated broadcast net.
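
The local/remote decision and its special cases fit in a few lines. This Python sketch is illustrative: `local_or_remote` and its string results are hypothetical, and it assumes the caller already knows which local source address (and therefore which mask) applies.

```python
import ipaddress

LIMITED_BROADCAST = ipaddress.IPv4Address("255.255.255.255")
MULTICAST = ipaddress.IPv4Network("224.0.0.0/4")


def local_or_remote(src: str, dst: str, mask: int) -> str:
    """Classify an outbound datagram.

    src: the local IP address whose address mask is `mask` (32-bit int).
    Returns "link" (hand directly to the link layer), "direct" (destination
    is on the connected network), or "gateway" (must be sent via a gateway).
    """
    d = ipaddress.IPv4Address(dst)
    s = ipaddress.IPv4Address(src)
    if d == LIMITED_BROADCAST or d in MULTICAST:
        return "link"
    # Directed broadcasts need no special case: normal routing applies.
    if (int(d) & mask) == (int(s) & mask):
        return "direct"
    return "gateway"
```

With source `10.1.2.3` and mask `0xFFFFFF00`, destination `10.1.2.200` is "direct", `10.1.3.1` needs a "gateway", and `224.0.0.1` goes straight to the "link" layer. Nothing here requires a gateway to exist, which keeps the host usable on an isolated network.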

3.3.1.2 Gateway Selection

To efficiently route a series of datagrams to the same
destination, the source host MUST keep a "route cache" of
mappings to next-hop gateways. A host uses the following
basic algorithm on this cache to route a datagram; this
algorithm is designed to put the primary routing burden on
the gateways [IP:11].

(a) If the route cache contains no information for a
particular destination, the host chooses a "default"
gateway and sends the datagram to it. It also builds a
corresponding route cache entry.

(b) If that gateway is not the best next hop to the
destination, the gateway will forward the datagram to
the best next-hop gateway and return an ICMP Redirect
message to the source host.

(c) When it receives a Redirect, the host updates the
next-hop gateway in the appropriate route cache entry,
so later datagrams to the same destination will go
directly to the best gateway.

Since the subnet mask appropriate to the destination address
is generally not known, a Network Redirect message SHOULD be
treated identically to a Host Redirect message; i.e., the
cache entry for the destination host (only) would be updated
(or created, if an entry for that host did not exist) for
the new gateway.

DISCUSSION:
This recommendation is to protect against gateways that
erroneously send Network Redirects for a subnetted
network, in violation of the gateway requirements
[INTRO:2].

When there is no route cache entry for the destination host
address (and the destination is not on the connected
network), the IP layer MUST pick a gateway from its list of
"default" gateways. The IP layer MUST support multiple
default gateways.

As an extra feature, a host IP layer MAY implement a table
of "static routes". Each such static route MAY include a
flag specifying whether it may be overridden by ICMP
Redirects.

DISCUSSION:
A host generally needs to know at least one default
gateway to get started. This information can be
obtained from a configuration file or else from the
host startup sequence, e.g., the BOOTP protocol (see
[INTRO:1]).

It has been suggested that a host can augment its list
of default gateways by recording any new gateways it
learns about. For example, it can record every gateway
to which it is ever redirected. Such a feature, while
possibly useful in some circumstances, may cause
problems in other cases (e.g., gateways are not all
equal), and it is not recommended.

A static route is typically a particular preset mapping
from destination host or network into a particular
next-hop gateway; it might also depend on the
Type-of-Service (see next section). Static routes would be
set up by system administrators to override the normal
automatic routing mechanism, to handle exceptional
situations. However, any static routing information is
a potential source of failure as configurations change
or equipment fails.
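
Steps (a) to (c), the multiple-default-gateway requirement, and the treatment of Network Redirects can be sketched as a host-keyed cache. In this Python sketch `HostRouteCache` is a hypothetical name, picking the first default gateway is an assumed policy (the text leaves the choice open), and the caller is assumed to have already decided the destination is remote and to have validated each Redirect as described in Section 3.2.2.2.

```python
from typing import Dict, List, Optional


class HostRouteCache:
    """Route cache keyed on destination host address (hypothetical names)."""

    def __init__(self, default_gateways: List[str]) -> None:
        self.defaults = list(default_gateways)  # multiple defaults supported
        self.entries: Dict[str, str] = {}       # destination host -> next hop

    def next_hop(self, dst: str) -> Optional[str]:
        gw = self.entries.get(dst)
        if gw is None:
            if not self.defaults:
                return None  # no gateway known; only local delivery works
            gw = self.defaults[0]   # assumed policy: first default gateway
            self.entries[dst] = gw  # step (a): build the cache entry
        return gw

    def on_redirect(self, dst: str, new_gateway: str,
                    network_redirect: bool = False) -> None:
        # Step (c). network_redirect is deliberately not used: a Network
        # Redirect updates (or creates) only this destination host's entry,
        # exactly like a Host Redirect.
        self.entries[dst] = new_gateway
```

Keying only on host addresses keeps the Redirect handling trivial; a cache that also holds network entries would need the destination's mask, which the host generally does not know.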

3.3.1.3 Route Cache

Each route cache entry needs to include the following
fields:

(1) Local IP address (for a multihomed host)

(2) Destination IP address

(3) Type(s)-of-Service

(4) Next-hop gateway IP address

Field (2) MAY be the full IP address of the destination
host, or only the destination network number. Field (3),
the TOS, SHOULD be included.

See Section 3.3.4.2 for a discussion of the implications of
multihoming for the lookup procedure in this cache.

DISCUSSION:
Including the Type-of-Service field in the route cache
and considering it in the host route algorithm will
provide the necessary mechanism for the future when
Type-of-Service routing is commonly used in the
Internet. See Section 3.2.1.6.

Each route cache entry defines the endpoints of an
Internet path. Although the connecting path may change
dynamically in an arbitrary way, the transmission
characteristics of the path tend to remain
approximately constant over a time period longer than a
single typical host-host transport connection.
Therefore, a route cache entry is a natural place to
cache data on the properties of the path. Examples of
such properties might be the maximum unfragmented
datagram size (see Section 3.3.3), or the average
round-trip delay measured by a transport protocol.
This data will generally be both gathered and used by a
higher layer protocol, e.g., by TCP, or by an
application using UDP. Experiments are currently in
progress on caching path properties in this manner.

There is no consensus on whether the route cache should
be keyed on destination host addresses alone, or allow
both host and network addresses. Those who favor the
use of only host addresses argue that:

(1) As required in Section 3.3.1.2, Redirect messages
will generally result in entries keyed on
destination host addresses; the simplest and most
general scheme would be to use host addresses
always.

(2) The IP layer may not always know the address mask
for a network address in a complex subnetted
environment.

(3) The use of only host addresses allows the
destination address to be used as a pure 32-bit
number, which may allow the Internet architecture
to be more easily extended in the future without
any change to the hosts.

The opposing view is that allowing a mixture of
destination hosts and networks in the route cache:

(1) Saves memory space.

(2) Leads to a simpler data structure, easily
combining the cache with the tables of default and
static routes (see below).

(3) Provides a more useful place to cache path
properties, as discussed earlier.

IMPLEMENTATION:
The cache needs to be large enough to include entries
for the maximum number of destination hosts that may be
in use at one time.

A route cache entry may also include control
information used to choose an entry for replacement.
This might take the form of a "recently used" bit, a
use count, or a last-used timestamp, for example. It
is recommended that it include the time of last
modification of the entry, for diagnostic purposes.

An implementation may wish to reduce the overhead of
scanning the route cache for every datagram to be
transmitted. This may be accomplished with a hash
table to speed the lookup, or by giving a
connection-oriented transport protocol a "hint" or temporary
handle on the appropriate cache entry, to be passed to
the IP layer with each subsequent datagram.

Although we have described the route cache, the lists
of default gateways, and a table of static routes as
conceptually distinct, in practice they may be combined
into a single "routing table" data structure.
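
The entry fields and the implementation notes (hashed lookup, replacement control, time of last modification) can be combined in a bounded cache. This Python sketch makes its own choices, labeled here as assumptions: `RouteCacheEntry` and `LRURouteCache` are hypothetical names, entries are keyed on (local address, destination, TOS), and least-recently-used is just one of the replacement policies the text mentions.

```python
import time
from collections import OrderedDict
from dataclasses import dataclass
from typing import Optional, Tuple

Key = Tuple[str, str, int]  # (local IP address, destination, TOS)


@dataclass
class RouteCacheEntry:
    local_addr: str       # (1) for a multihomed host
    destination: str      # (2) host address or network number
    tos: int              # (3) Type-of-Service
    next_hop: str         # (4) next-hop gateway
    last_modified: float  # time of last modification, for diagnostics


class LRURouteCache:
    """Bounded route cache: dict hashing speeds lookup, LRU picks victims."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._entries: "OrderedDict[Key, RouteCacheEntry]" = OrderedDict()

    def lookup(self, local_addr: str, dst: str, tos: int) -> Optional[RouteCacheEntry]:
        key = (local_addr, dst, tos)
        entry = self._entries.get(key)
        if entry is not None:
            self._entries.move_to_end(key)  # mark as recently used
        return entry

    def store(self, local_addr: str, dst: str, tos: int, next_hop: str,
              now: Optional[float] = None) -> RouteCacheEntry:
        key = (local_addr, dst, tos)
        stamp = time.time() if now is None else now
        entry = RouteCacheEntry(local_addr, dst, tos, next_hop, stamp)
        self._entries[key] = entry
        self._entries.move_to_end(key)
        while len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict least recently used
        return entry
```

The capacity should be sized for the maximum number of destinations in concurrent use, as the IMPLEMENTATION note says; a transport connection could also keep the returned `RouteCacheEntry` as its "hint" and skip the lookup entirely.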
|
|||
|
|
|||
|
3.3.1.4 Dead Gateway Detection
|
|||
|
|
|||
|
The IP layer MUST be able to detect the failure of a "next-
|
|||
|
hop" gateway that is listed in its route cache and to choose
|
|||
|
an alternate gateway (see Section 3.3.1.5).
|
|||
|
|
|||
|
Dead gateway detection is covered in some detail in RFC-816
|
|||
|
[IP:11]. Experience to date has not produced a complete
|
|||
|
|
|||
|
|
|||
|
|
|||
|
Internet Engineering Task Force [Page 51]
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
RFC1122 INTERNET LAYER October 1989
|
|||
|
|
|||
|
|
|||
|
algorithm which is totally satisfactory, though it has
|
|||
|
identified several forbidden paths and promising techniques.
|
|||
|
|
|||
|
* A particular gateway SHOULD NOT be used indefinitely in
|
|||
|
the absence of positive indications that it is
|
|||
|
functioning.
|
|||
|
|
|||
|
* Active probes such as "pinging" (i.e., using an ICMP
|
|||
|
Echo Request/Reply exchange) are expensive and scale
|
|||
|
poorly. In particular, hosts MUST NOT actively check
|
|||
|
the status of a first-hop gateway by simply pinging the
|
|||
|
gateway continuously.
|
|||
|
|
|||
|
* Even when it is the only effective way to verify a
|
|||
|
gateway's status, pinging MUST be used only when
|
|||
|
traffic is being sent to the gateway and when there is
|
|||
|
no other positive indication to suggest that the
|
|||
|
gateway is functioning.
|
|||
|
|
|||
|
* To avoid pinging, the layers above and/or below the
|
|||
|
Internet layer SHOULD be able to give "advice" on the
|
|||
|
status of route cache entries when either positive
|
|||
|
(gateway OK) or negative (gateway dead) information is
|
|||
|
available.
|
|||
|
|
|||
|
|
|||
|
DISCUSSION:
|
|||
|
If an implementation does not include an adequate
|
|||
|
mechanism for detecting a dead gateway and re-routing,
|
|||
|
a gateway failure may cause datagrams to apparently
|
|||
|
vanish into a "black hole". This failure can be
|
|||
|
extremely confusing for users and difficult for network
|
|||
|
personnel to debug.
|
|||
|
|
|||
|
The dead-gateway detection mechanism must not cause
|
|||
|
unacceptable load on the host, on connected networks,
|
|||
|
or on first-hop gateway(s). The exact constraints on
|
|||
|
the timeliness of dead gateway detection and on
|
|||
|
acceptable load may vary somewhat depending on the
|
|||
|
nature of the host's mission, but a host generally
|
|||
|
needs to detect a failed first-hop gateway quickly
|
|||
|
enough that transport-layer connections will not break
|
|||
|
before an alternate gateway can be selected.
|
|||
|
|
|||
|
Passing advice from other layers of the protocol stack
|
|||
|
complicates the interfaces between the layers, but it
|
|||
|
is the preferred approach to dead gateway detection.
|
|||
|
Advice can come from almost any part of the IP/TCP
|
|||
|
|
|||
|
|
|||
|
|
|||
|
Internet Engineering Task Force [Page 52]
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
RFC1122 INTERNET LAYER October 1989
|
|||
|
|
|||
|
|
|||
|
architecture, but it is expected to come primarily from
|
|||
|
the transport and link layers. Here are some possible
|
|||
|
sources for gateway advice:
|
|||
|
|
|||
|
o TCP or any connection-oriented transport protocol
|
|||
|
should be able to give negative advice, e.g.,
|
|||
|
triggered by excessive retransmissions.
|
|||
|
|
|||
|
o TCP may give positive advice when (new) data is
|
|||
|
acknowledged. Even though the route may be
|
|||
|
asymmetric, an ACK for new data proves that the
|
|||
|
acknowleged data must have been transmitted
|
|||
|
successfully.
|
|||
|
|
|||
|
o An ICMP Redirect message from a particular gateway
|
|||
|
should be used as positive advice about that
|
|||
|
gateway.
|
|||
|
|
|||
|
o Link-layer information that reliably detects and
|
|||
|
reports host failures (e.g., ARPANET Destination
|
|||
|
Dead messages) should be used as negative advice.
|
|||
|
|
|||
|
o Failure to ARP or to re-validate ARP mappings may
|
|||
|
be used as negative advice for the corresponding
|
|||
|
IP address.
|
|||
|
|
|||
|
o Packets arriving from a particular link-layer
|
|||
|
address are evidence that the system at this
|
|||
|
address is alive. However, turning this
|
|||
|
information into advice about gateways requires
|
|||
|
mapping the link-layer address into an IP address,
|
|||
|
and then checking that IP address against the
|
|||
|
gateways pointed to by the route cache. This is
|
|||
|
probably prohibitively inefficient.
|
|||
|
|
|||
|
Note that positive advice that is given for every
|
|||
|
datagram received may cause unacceptable overhead in
|
|||
|
the implementation.
|
|||
|
|
|||
|
While advice might be passed using required arguments
|
|||
|
in all interfaces to the IP layer, some transport and
|
|||
|
application layer protocols cannot deduce the correct
|
|||
|
advice. These interfaces must therefore allow a
|
|||
|
neutral value for advice, since either always-positive
|
|||
|
or always-negative advice leads to incorrect behavior.
|
|||
|
|
|||
|
There is another technique for dead gateway detection
|
|||
|
that has been commonly used but is not recommended.
|
|||
|
|
|||
|
|
|||
|
|
|||
|
Internet Engineering Task Force [Page 53]
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
RFC1122 INTERNET LAYER October 1989
|
|||
|
|
|||
|
|
|||
|
This technique depends upon the host passively
receiving ("wiretapping") the Interior Gateway Protocol
(IGP) datagrams that the gateways are broadcasting to
each other. This approach has the drawback that a host
needs to recognize all the interior gateway protocols
that gateways may use (see [INTRO:2]). In addition, it
only works on a broadcast network.

At present, pinging (i.e., using ICMP Echo messages) is
the mechanism for gateway probing when absolutely
required. A successful ping guarantees that the
addressed interface and its associated machine are up,
but it does not guarantee that the machine is a gateway
as opposed to a host. The normal inference is that if
a Redirect or other evidence indicates that a machine
was a gateway, successful pings will indicate that the
machine is still up and hence still a gateway.
However, since a host silently discards packets that a
gateway would forward or redirect, this assumption
could sometimes fail. To avoid this problem, a new
ICMP message under development will ask "are you a
gateway?"

IMPLEMENTATION:
The following specific algorithm has been suggested:

o Associate a "reroute timer" with each gateway
pointed to by the route cache. Initialize the
timer to a value Tr, which must be small enough to
allow detection of a dead gateway before transport
connections time out.

o Positive advice would reset the reroute timer to
Tr. Negative advice would reduce or zero the
reroute timer.

o Whenever the IP layer used a particular gateway to
route a datagram, it would check the corresponding
reroute timer. If the timer had expired (reached
zero), the IP layer would send a ping to the
gateway, followed immediately by the datagram.

o The ping (ICMP Echo) would be sent again if
necessary, up to N times. If no ping reply was
received in N tries, the gateway would be assumed
to have failed, and a new first-hop gateway would
be chosen for all cache entries pointing to the
failed gateway.
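
A minimal Python sketch of the suggested reroute-timer algorithm follows. It is illustrative only: the names RerouteTimers and reroute, the injectable clock, and the synchronous send_ping callback are assumptions of this sketch, not part of the specification. The algorithm above sends the datagram immediately after the ping and retries asynchronously; the sketch waits for each reply so the logic stays visible.

```python
import time
from typing import Callable, Dict


class RerouteTimers:
    """Per-gateway "reroute timer" bookkeeping (hypothetical sketch).

    Assumptions not taken from the RFC: times are seconds from `clock`,
    and `send_ping(gw)` sends one ICMP Echo and returns True if an Echo
    Reply came back (a synchronous simplification).
    """

    def __init__(self, tr: float, n_pings: int,
                 send_ping: Callable[[str], bool],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.tr = tr                        # reroute timer value Tr
        self.n_pings = n_pings              # N: pings before giving up
        self.send_ping = send_ping
        self.clock = clock
        self.expiry: Dict[str, float] = {}  # gateway -> time timer hits 0

    def add_gateway(self, gw: str) -> None:
        self.expiry[gw] = self.clock() + self.tr     # initialize to Tr

    def positive_advice(self, gw: str) -> None:
        self.expiry[gw] = self.clock() + self.tr     # reset to Tr

    def negative_advice(self, gw: str) -> None:
        self.expiry[gw] = self.clock()               # zero the timer

    def gateway_alive(self, gw: str) -> bool:
        """Check made each time a datagram is routed via `gw`."""
        if self.clock() < self.expiry[gw]:
            return True                              # timer still running
        for _ in range(self.n_pings):                # up to N ICMP Echoes
            if self.send_ping(gw):
                self.positive_advice(gw)             # reply counts as positive advice
                return True
        return False                                 # assume gw has failed


def reroute(cache: Dict[str, str], failed: str, new_gw: str) -> None:
    """Repoint every route cache entry that used the failed gateway."""
    for dest in [d for d, gw in cache.items() if gw == failed]:
        cache[dest] = new_gw
```

Because the timers are kept per gateway rather than per cache entry, a dead gateway is detected once, and every route cache entry that pointed to it can then be repointed in a single pass.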

Note that the size of Tr is inversely related to the
amount of advice available. Tr should be large enough
to ensure that:

* Any pinging will be at a low level (e.g., <10%) of
all packets sent to a gateway from the host, AND

* pinging is infrequent (e.g., every 3 minutes)

Since the recommended algorithm is concerned with the
gateways pointed to by route cache entries, rather than
the cache entries themselves, a two-level data
structure (perhaps coordinated with ARP or similar
caches) may be desirable for implementing a route
cache.

3.3.1.5 New Gateway Selection

If the failed gateway is not the current default, the IP
layer can immediately switch to a default gateway. If it is
the current default that failed, the IP layer MUST select a
different default gateway (assuming more than one default is
known) for the failed route and for establishing new routes.

DISCUSSION:
When a gateway does fail, the other gateways on the
connected network will learn of the failure through
some inter-gateway routing protocol. However, this
will not happen instantaneously, since gateway routing
protocols typically have a settling time of 30-60
seconds. If the host switches to an alternative
gateway before the gateways have agreed on the failure,
the new target gateway will probably forward the
datagram to the failed gateway and send a Redirect back
to the host pointing to the failed gateway (!). The
result is likely to be a rapid oscillation in the
contents of the host's route cache during the gateway
settling period. It has been proposed that the
dead-gateway logic should include some hysteresis mechanism
to prevent such oscillations. However, experience has
not shown any harm from such oscillations, since
service cannot be restored to the host until the
gateways' routing information does settle down.

IMPLEMENTATION:
One implementation technique for choosing a new default
gateway is to simply round-robin among the default
gateways in the host's list. Another is to rank the
gateways in priority order, and when the current
default gateway is not the highest priority one, to
"ping" the higher-priority gateways slowly to detect
when they return to service. This pinging can be at a
very low rate, e.g., 0.005 per second.
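
The two selection strategies above might be sketched as follows. The DefaultGateway record, the convention that a larger preference value is better, and the function names are illustrative assumptions. The 200-second probe interval is simply the suggested rate of 0.005 pings per second expressed as a period.

```python
from dataclasses import dataclass
from typing import List, Optional

# 0.005 probes per second means one probe every 200 seconds.
PROBE_INTERVAL_S = 200.0


@dataclass
class DefaultGateway:
    address: str
    preference: int      # assumed convention: larger value = preferred
    up: bool = True


def next_round_robin(gws: List[DefaultGateway], failed: int) -> Optional[int]:
    """Mark gws[failed] down; return the index of the next live gateway."""
    gws[failed].up = False
    for step in range(1, len(gws)):
        i = (failed + step) % len(gws)
        if gws[i].up:
            return i
    return None          # no other default gateway is known to be up


def best_by_priority(gws: List[DefaultGateway]) -> Optional[DefaultGateway]:
    """Highest-preference default gateway currently believed to be up."""
    live = [g for g in gws if g.up]
    return max(live, key=lambda g: g.preference) if live else None


def probe_candidates(gws: List[DefaultGateway],
                     current: DefaultGateway) -> List[DefaultGateway]:
    """Down gateways that outrank `current`; ping each every PROBE_INTERVAL_S."""
    return [g for g in gws if not g.up and g.preference > current.preference]
```

When a probed gateway answers, marking it up again makes best_by_priority return it, which moves the host back to its preferred default.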

3.3.1.6 Initialization

The following information MUST be configurable:

(1) IP address(es).

(2) Address mask(s).

(3) A list of default gateways, with a preference level.

A manual method of entering this configuration data MUST be
provided. In addition, a variety of methods can be used to
determine this information dynamically; see the section on
"Host Initialization" in [INTRO:1].

DISCUSSION:
Some host implementations use "wiretapping" of gateway
protocols on a broadcast network to learn what gateways
exist. A standard method for default gateway discovery
is under development.

3.3.2 Reassembly

The IP layer MUST implement reassembly of IP datagrams.

We designate the largest datagram size that can be reassembled
by EMTU_R ("Effective MTU to receive"); this is sometimes
called the "reassembly buffer size". EMTU_R MUST be greater
than or equal to 576, SHOULD be either configurable or
indefinite, and SHOULD be greater than or equal to the MTU of
the connected network(s).

DISCUSSION:
A fixed EMTU_R limit should not be built into the code
because some application layer protocols require EMTU_R
values larger than 576.

IMPLEMENTATION:
An implementation may use a contiguous reassembly buffer
for each datagram, or it may use a more complex data
structure that places no definite limit on the reassembled
datagram size; in the latter case, EMTU_R is said to be
"indefinite".

Logically, reassembly is performed by simply copying each
fragment into the packet buffer at the proper offset.
Note that fragments may overlap if successive
retransmissions use different packetizing but the same
reassembly Id.

The tricky part of reassembly is the bookkeeping to
determine when all bytes of the datagram have been
reassembled. We recommend Clark's algorithm [IP:10] that
requires no additional data space for the bookkeeping.
However, note that, contrary to [IP:10], the first
fragment header needs to be saved for inclusion in a
possible ICMP Time Exceeded (Reassembly Timeout) message.
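
A sketch of hole-descriptor bookkeeping in the style of Clark's algorithm [IP:10] follows, including the saved first-fragment header noted above. Unlike [IP:10], this Python version keeps the hole list in ordinary memory rather than inside the unfilled parts of the buffer, and it measures offsets in bytes rather than 8-octet units. The Reassembly class and its method are hypothetical.

```python
import math
from typing import List, Optional, Tuple

INFINITY = math.inf


class Reassembly:
    """Hole-descriptor bookkeeping in the style of Clark's algorithm.

    Simplifications (assumptions of this sketch): offsets are in bytes,
    and holes live in a Python list instead of the unused buffer space.
    """

    def __init__(self) -> None:
        self.holes: List[Tuple[float, float]] = [(0, INFINITY)]
        self.buf = bytearray()
        self.first_header: Optional[bytes] = None  # kept for ICMP Time Exceeded

    def add_fragment(self, offset: int, data: bytes, more_fragments: bool,
                     header: bytes = b"") -> bool:
        """Copy one fragment into place; return True once reassembly is done."""
        if not data:
            return not self.holes                  # ignore empty fragments
        first, last = offset, offset + len(data) - 1
        if offset == 0:
            self.first_header = header
        if len(self.buf) <= last:
            self.buf.extend(bytes(last + 1 - len(self.buf)))
        self.buf[first:last + 1] = data            # overlaps simply overwrite
        remaining = []
        for h_first, h_last in self.holes:
            if first > h_last or last < h_first:
                remaining.append((h_first, h_last))    # hole untouched
                continue
            if first > h_first:                        # gap before fragment
                remaining.append((h_first, first - 1))
            if last < h_last and more_fragments:       # gap after fragment
                remaining.append((last + 1, h_last))
        self.holes = remaining
        return not self.holes
```

Overlapping fragments simply overwrite bytes already copied, which is harmless when retransmissions carry the same data under the same reassembly Id.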

There MUST be a mechanism by which the transport layer can
learn MMS_R, the maximum message size that can be received and
reassembled in an IP datagram (see GET_MAXSIZES calls in
Section 3.4). If EMTU_R is not indefinite, then the value of
MMS_R is given by:

MMS_R = EMTU_R - 20

since 20 is the minimum size of an IP header.
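
The MMS_R relation can be expressed directly. In this sketch, None is an assumed stand-in for an "indefinite" EMTU_R, in which case no fixed MMS_R exists; the function name is hypothetical.

```python
from typing import Optional

MIN_IP_HEADER = 20   # octets: an IP header with no options


def mms_r(emtu_r: Optional[int]) -> Optional[int]:
    """MMS_R for a given EMTU_R; None stands for an "indefinite" EMTU_R."""
    if emtu_r is None:
        return None                       # no fixed reassembly limit
    if emtu_r < 576:
        raise ValueError("EMTU_R MUST be greater than or equal to 576")
    return emtu_r - MIN_IP_HEADER
```

For the minimum permitted EMTU_R of 576, MMS_R is 556.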

There MUST be a reassembly timeout. The reassembly timeout
value SHOULD be a fixed value, not set from the remaining TTL.
It is recommended that the value lie between 60 seconds and 120
seconds. If this timeout expires, the partially-reassembled
datagram MUST be discarded and an ICMP Time Exceeded message
sent to the source host (if fragment zero has been received).
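
A sketch of the timeout rule follows. It assumes the timer starts when the first fragment of a datagram arrives and uses 60 seconds as one hypothetical value from the recommended range. ICMP Time Exceeded is type 11, and code 1 denotes fragment reassembly time exceeded. PartialDatagram and on_reassembly_timer are illustrative names.

```python
from dataclasses import dataclass
from typing import Tuple

ICMP_TIME_EXCEEDED = 11          # ICMP type
CODE_REASSEMBLY_TIMEOUT = 1      # "fragment reassembly time exceeded"
REASSEMBLY_TIMEOUT_S = 60.0      # hypothetical choice in the 60-120 s range


@dataclass
class PartialDatagram:
    started_at: float            # arrival time of the first fragment seen
    have_fragment_zero: bool     # has the offset-0 fragment arrived?


def on_reassembly_timer(pd: PartialDatagram, now: float,
                        timeout: float = REASSEMBLY_TIMEOUT_S) -> Tuple[bool, bool]:
    """Return (discard, send_time_exceeded) for one partial datagram.

    The timeout is a fixed value; the remaining TTL is deliberately ignored.
    """
    if now - pd.started_at < timeout:
        return (False, False)
    return (True, pd.have_fragment_zero)
```

Without fragment zero there is no header to quote in the ICMP message, so the partial datagram is discarded silently.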

DISCUSSION:
The IP specification says that the reassembly timeout
should be the remaining TTL from the IP header, but this
does not work well because gateways generally treat TTL as
a simple hop count rather than an elapsed time. If the
reassembly timeout is too small, datagrams will be
discarded unnecessarily, and communication may fail. The
timeout needs to be at least as large as the typical
maximum delay across the Internet. A realistic minimum
reassembly timeout would be 60 seconds.

It has been suggested that a cache might be kept of
round-trip times measured by transport protocols for
various destinations, and that these values might be used
to dynamically determine a reasonable reassembly timeout
value. Further investigation of this approach is
required.

If the reassembly timeout is set too high, buffer
resources in the receiving host will be tied up too long,
and the MSL (Maximum Segment Lifetime) [TCP:1] will be
larger than necessary. The MSL controls the maximum rate
at which fragmented datagrams can be sent using distinct
values of the 16-bit Ident field; a larger MSL lowers the
maximum rate. The TCP specification [TCP:1] arbitrarily
assumes a value of 2 minutes for MSL. This sets an upper
limit on a reasonable reassembly timeout value.
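
The Ident constraint can be made concrete with a worked example. Assume each Ident value may be used only once per MSL for a given source, destination, and protocol. With the 2-minute MSL, 65536 / 120 s is roughly 546 fragmented datagrams per second. The helper name below is hypothetical.

```python
IDENT_VALUES = 2 ** 16          # distinct values of the 16-bit Ident field


def max_fragmented_rate(msl_s: float) -> float:
    """Upper bound on fragmented datagrams per second for one
    {source, destination, protocol} combination, if no Ident value may
    be reused while an earlier datagram could still be alive (MSL)."""
    return IDENT_VALUES / msl_s
```

Halving the MSL doubles the permissible rate, which is why an unnecessarily long reassembly timeout is costly.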

3.3.3 Fragmentation

Optionally, the IP layer MAY implement a mechanism to fragment
outgoing datagrams intentionally.

We designate by EMTU_S ("Effective MTU for sending") the
maximum IP datagram size that may be sent, for a particular
combination of IP source and destination addresses and perhaps
TOS.

A host MUST implement a mechanism to allow the transport layer
to learn MMS_S, the maximum transport-layer message size that
may be sent for a given {source, destination, TOS} triplet (see
GET_MAXSIZES call in Section 3.4). If no local fragmentation
is performed, the value of MMS_S will be:

MMS_S = EMTU_S - <IP header size>

and EMTU_S must be less than or equal to the MTU of the network
interface corresponding to the source address of the datagram.
Note that <IP header size> in this equation will be 20, unless
the IP reserves space to insert IP options for its own purposes
in addition to any options inserted by the transport layer.
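
The MMS_S formula in code, as a sketch. The parameter reserved_options is an assumed name for the option space the IP layer keeps for itself. IP options occupy at most 40 octets, because the header length field limits the header to 60 octets.

```python
MIN_IP_HEADER = 20      # octets
MAX_IP_OPTIONS = 40     # the IP header can be at most 60 octets


def mms_s(emtu_s: int, interface_mtu: int, reserved_options: int = 0) -> int:
    """MMS_S when no local fragmentation is done.

    reserved_options: octets of IP options the IP layer itself reserves
    (beyond any options supplied by the transport layer).
    """
    if emtu_s > interface_mtu:
        raise ValueError("EMTU_S must not exceed the interface MTU")
    if reserved_options % 4 or not 0 <= reserved_options <= MAX_IP_OPTIONS:
        raise ValueError("IP options occupy 0-40 octets in 4-octet words")
    return emtu_s - (MIN_IP_HEADER + reserved_options)
```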

A host that does not implement local fragmentation MUST ensure
that the transport layer (for TCP) or the application layer
(for UDP) obtains MMS_S from the IP layer and does not send a
datagram exceeding MMS_S in size.

It is generally desirable to avoid local fragmentation and to
choose EMTU_S low enough to avoid fragmentation in any gateway
along the path. In the absence of actual knowledge of the
minimum MTU along the path, the IP layer SHOULD use
EMTU_S <= 576 whenever the destination address is not on a
connected network, and otherwise use the connected network's
MTU.

The MTU of each physical interface MUST be configurable.

A host IP layer implementation MAY have a configuration flag
"All-Subnets-MTU", indicating that the MTU of the connected
network is to be used for destinations on different subnets
within the same network, but not for other networks. Thus,
this flag causes the network class mask, rather than the subnet
address mask, to be used to choose an EMTU_S. For a multihomed
host, an "All-Subnets-MTU" flag is needed for each network
interface.
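
One way to apply the EMTU_S rule together with the All-Subnets-MTU flag is sketched below with Python's ipaddress module. The Interface record and its field names are hypothetical. The classful network mask is derived from the leading address bits (Class A, B, or C).

```python
import ipaddress
from dataclasses import dataclass
from typing import List

REMOTE_EMTU_S = 576   # used when nothing is known about the path


@dataclass
class Interface:
    address: ipaddress.IPv4Interface   # host address plus subnet mask
    mtu: int                           # configured MTU of this interface
    all_subnets_mtu: bool = False      # per-interface "All-Subnets-MTU" flag


def classful_network(ip: ipaddress.IPv4Address) -> ipaddress.IPv4Network:
    """Network selected by the Class A/B/C network mask (not the subnet mask)."""
    top = int(ip) >> 24
    if top < 128:
        prefix = 8
    elif top < 192:
        prefix = 16
    elif top < 224:
        prefix = 24
    else:
        raise ValueError("not a Class A, B or C address")
    return ipaddress.IPv4Network(f"{ip}/{prefix}", strict=False)


def choose_emtu_s(dest: str, ifaces: List[Interface]) -> int:
    d = ipaddress.IPv4Address(dest)
    for ifc in ifaces:                  # destination on a connected (sub)net
        if d in ifc.address.network:
            return ifc.mtu
    for ifc in ifaces:                  # other subnets of a connected network
        if ifc.all_subnets_mtu and d in classful_network(ifc.address.ip):
            return ifc.mtu
    return REMOTE_EMTU_S
```

With the flag set, a destination on another subnet of the same Class B network gets the interface MTU; without it, the conservative 576 applies.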

DISCUSSION:
Picking the correct datagram size to use when sending data
is a complex topic [IP:9].

(a) In general, no host is required to accept an IP
datagram larger than 576 bytes (including header and
data), so a host must not send a larger datagram
without explicit knowledge or prior arrangement with
the destination host. Thus, MMS_S is only an upper
bound on the datagram size that a transport protocol
may send; even when MMS_S exceeds 556, the transport
layer must limit its messages to 556 bytes in the
absence of other knowledge about the destination
host.
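
Point (a) in code: without explicit knowledge of the destination, messages are capped at 556 octets (576 minus a 20-octet IP header). The peer_mms_r argument is a hypothetical stand-in for whatever explicit knowledge a transport protocol has obtained about the peer.

```python
from typing import Optional

DEFAULT_MMS = 576 - 20   # 556: fits any host's minimum EMTU_R


def transport_message_limit(mms_s: int, peer_mms_r: Optional[int] = None) -> int:
    """Largest transport message to send to one destination.

    peer_mms_r is what the peer has explicitly said it can receive and
    reassemble (e.g., learned through a transport-level option); None
    means nothing is known about the destination.
    """
    if peer_mms_r is None:
        return min(mms_s, DEFAULT_MMS)
    return min(mms_s, peer_mms_r)
```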

(b) Some transport protocols (e.g., TCP) provide a way to
explicitly inform the sender about the largest
datagram the other end can receive and reassemble
[IP:7]. There is no corresponding mechanism in the
IP layer.

A transport protocol that assumes an EMTU_R larger
than 576 (see Section 3.3.2) can send a datagram of
this larger size to another host that implements the
same protocol.

(c) Hosts should ideally limit their EMTU_S for a given
destination to the minimum MTU of all the networks
along the path, to avoid any fragmentation. IP
fragmentation, while formally correct, can create a
serious transport protocol performance problem,
because loss of a single fragment means all the
fragments in the segment must be retransmitted
[IP:9].

Since nearly all networks in the Internet currently
support an MTU of 576 or greater, we strongly recommend
the use of 576 for datagrams sent to non-local networks.

It has been suggested that a host could determine the MTU
over a given path by sending a zero-offset datagram
fragment and waiting for the receiver to time out the
reassembly (which cannot complete!) and return an ICMP
Time Exceeded message. This message would include the
largest remaining fragment header in its body. More
direct mechanisms are being experimented with, but have
not yet been adopted (see e.g., RFC-1063).

3.3.4 Local Multihoming

3.3.4.1 Introduction

A multihomed host has multiple IP addresses, which we may
think of as "logical interfaces". These logical interfaces
may be associated with one or more physical interfaces, and
these physical interfaces may be connected to the same or
different networks.

Here are some important cases of multihoming:

(a) Multiple Logical Networks

The Internet architects envisioned that each physical
network would have a single unique IP network (or
subnet) number. However, LAN administrators have
sometimes found it useful to violate this assumption,
operating a LAN with multiple logical networks per
physical connected network.

If a host connected to such a physical network is
configured to handle traffic for each of N different
logical networks, then the host will have N logical
interfaces. These could share a single physical
interface, or might use N physical interfaces to the
same network.

(b) Multiple Logical Hosts

When a host has multiple IP addresses that all have the
same <Network-number> part (and the same <Subnet-number>
part, if any), the logical interfaces are known
as "logical hosts". These logical interfaces might
share a single physical interface or might use separate
physical interfaces to the same physical network.

(c) Simple Multihoming

In this case, each logical interface is mapped into a
separate physical interface and each physical interface
is connected to a different physical network. The term
"multihoming" was originally applied only to this case,
but it is now applied more generally.

A host with embedded gateway functionality will
typically fall into the simple multihoming case. Note,
however, that a host may be simply multihomed without
containing an embedded gateway, i.e., without
forwarding datagrams from one connected network to
another.

This case presents the most difficult routing problems.
The choice of interface (i.e., the choice of first-hop
network) may significantly affect performance or even
reachability of remote parts of the Internet.

Finally, we note another possibility that is NOT
multihoming: one logical interface may be bound to multiple
physical interfaces, in order to increase the reliability or
throughput between directly connected machines by providing
alternative physical paths between them. For instance, two
systems might be connected by multiple point-to-point links.
We call this "link-layer multiplexing". With link-layer
multiplexing, the protocols above the link layer are unaware
that multiple physical interfaces are present; the link-layer
device driver is responsible for multiplexing and
routing packets across the physical interfaces.

In the Internet protocol architecture, a transport protocol
instance ("entity") has no address of its own, but instead
uses a single Internet Protocol (IP) address. This has
implications for the IP, transport, and application layers,
and for the interfaces between them. In particular, the
application software may have to be aware of the multiple IP
addresses of a multihomed host; in other cases, the choice
can be made within the network software.

3.3.4.2 Multihoming Requirements

The following general rules apply to the selection of an IP
source address for sending a datagram from a multihomed
host.

(1) If the datagram is sent in response to a received
datagram, the source address for the response SHOULD be
the specific-destination address of the request. See
Sections 4.1.3.5 and 4.2.3.7 and the "General Issues"
section of [INTRO:1] for more specific requirements on
higher layers.

Otherwise, a source address must be selected.

(2) An application MUST be able to explicitly specify the
source address for initiating a connection or a
request.

(3) In the absence of such a specification, the networking
software MUST choose a source address. Rules for this
choice are described below.

There are two key requirement issues related to multihoming:

(A) A host MAY silently discard an incoming datagram whose
destination address does not correspond to the physical
interface through which it is received.

(B) A host MAY restrict itself to sending (non-source-routed)
IP datagrams only through the physical
interface that corresponds to the IP source address of
the datagrams.

DISCUSSION:
Internet host implementors have used two different
conceptual models for multihoming, briefly summarized
in the following discussion. This document takes no
stand on which model is preferred; each seems to have a
place. This ambivalence is reflected in the issues (A)
and (B) being optional.

o Strong ES Model

The Strong ES (End System, i.e., host) model
emphasizes the host/gateway (ES/IS) distinction,
and would therefore substitute MUST for MAY in
issues (A) and (B) above. It tends to model a
multihomed host as a set of logical hosts within
the same physical host.

With respect to (A), proponents of the Strong ES
model note that automatic Internet routing
mechanisms could not route a datagram to a
physical interface that did not correspond to the
destination address.

Under the Strong ES model, the route computation
for an outgoing datagram is the mapping:

route(src IP addr, dest IP addr, TOS)
-> gateway

Here the source address is included as a parameter
in order to select a gateway that is directly
reachable on the corresponding physical interface.
Note that this model logically requires that in
general there be at least one default gateway, and
preferably multiple defaults, for each IP source
address.

o Weak ES Model

This view de-emphasizes the ES/IS distinction, and
would therefore substitute MUST NOT for MAY in
issues (A) and (B). This model may be the more
natural one for hosts that wiretap gateway routing
protocols, and is necessary for hosts that have
embedded gateway functionality.

The Weak ES Model may cause the Redirect mechanism
to fail. If a datagram is sent out a physical
interface that does not correspond to the
destination address, the first-hop gateway will
not realize when it needs to send a Redirect. On
the other hand, if the host has embedded gateway
functionality, then it has routing information
without listening to Redirects.

In the Weak ES model, the route computation for an
outgoing datagram is the mapping:

route(dest IP addr, TOS) -> gateway, interface
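
The two models can be contrasted in a short sketch. The Iface record, the per-interface gateway lists, and the function names are illustrative assumptions. TOS is accepted but ignored, and broadcast or multicast destinations are not handled.

```python
import ipaddress
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Iface:
    name: str
    addr: ipaddress.IPv4Interface                      # address + subnet mask
    gateways: List[str] = field(default_factory=list)  # defaults on this link


def accept_strong(dst: str, arrival: Iface) -> bool:
    """Issue (A) as MUST: only the arrival interface's own address."""
    return ipaddress.IPv4Address(dst) == arrival.addr.ip


def accept_weak(dst: str, ifaces: List[Iface]) -> bool:
    """Issue (A) as MUST NOT: any of the host's addresses is accepted."""
    return any(ipaddress.IPv4Address(dst) == i.addr.ip for i in ifaces)


def route_strong(src: str, dst: str, tos: int,
                 ifaces: List[Iface]) -> Optional[str]:
    """route(src, dest, TOS) -> gateway; the source address fixes the link."""
    ifc = next(i for i in ifaces if i.addr.ip == ipaddress.IPv4Address(src))
    if ipaddress.IPv4Address(dst) in ifc.addr.network:
        return dst                                     # deliver directly
    return ifc.gateways[0] if ifc.gateways else None


def route_weak(dst: str, tos: int,
               ifaces: List[Iface]) -> Optional[Tuple[str, str]]:
    """route(dest, TOS) -> (gateway, interface); any link may be used."""
    d = ipaddress.IPv4Address(dst)
    for ifc in ifaces:
        if d in ifc.addr.network:
            return dst, ifc.name                       # deliver directly
    for ifc in ifaces:
        if ifc.gateways:
            return ifc.gateways[0], ifc.name
    return None
```

In route_strong the source address pins the interface, so each source address needs its own default gateways, as noted above; route_weak may pick any interface.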

3.3.4.3 Choosing a Source Address

DISCUSSION:
When it sends an initial connection request (e.g., a
TCP "SYN" segment) or a datagram service request (e.g.,
a UDP-based query), the transport layer on a multihomed
host needs to know which source address to use. If the
application does not specify it, the transport layer
must ask the IP layer to perform the conceptual
mapping:

GET_SRCADDR(remote IP addr, TOS)
-> local IP address

Here TOS is the Type-of-Service value (see Section
3.2.1.6), and the result is the desired source address.
The following rules are suggested for implementing this
mapping:

(a) If the remote Internet address lies on one of the
(sub-) nets to which the host is directly
connected, a corresponding source address may be
chosen, unless the corresponding interface is
known to be down.

(b) The route cache may be consulted, to see if there
is an active route to the specified destination
network through any network interface; if so, a
local IP address corresponding to that interface
may be chosen.

(c) The table of static routes, if any (see Section
3.3.1.2) may be similarly consulted.

(d) The default gateways may be consulted. If these
gateways are assigned to different interfaces, the
interface corresponding to the gateway with the
highest preference may be chosen.

In the future, there may be a defined way for a
multihomed host to ask the gateways on all connected
networks for advice about the best network to use for a
given destination.

IMPLEMENTATION:
It will be noted that this process is essentially the
same as datagram routing (see Section 3.3.1), and
therefore hosts may be able to combine the
implementation of the two functions.
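
A sketch of GET_SRCADDR that applies rules (a) through (d) in order, reusing routing-style lookups as the IMPLEMENTATION note suggests. The table layouts are hypothetical: routing tables map a destination network to an interface name, and default gateways are given as (preference, interface name) pairs. TOS is ignored.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional, Tuple

Net = ipaddress.IPv4Network


@dataclass
class Iface:
    name: str
    addr: ipaddress.IPv4Interface   # local address + subnet mask
    up: bool = True


def get_srcaddr(remote: str, tos: int, ifaces: List[Iface],
                route_cache: List[Tuple[Net, str]],
                static_routes: List[Tuple[Net, str]],
                default_gws: List[Tuple[int, str]]
                ) -> Optional[ipaddress.IPv4Address]:
    """GET_SRCADDR(remote IP addr, TOS) -> local IP address, rules (a)-(d).

    Routing tables map a destination network to an interface name;
    default_gws holds (preference, interface name) pairs. TOS is unused.
    """
    dst = ipaddress.IPv4Address(remote)
    live = {i.name: i for i in ifaces if i.up}
    for ifc in live.values():                        # (a) connected (sub)net
        if dst in ifc.addr.network:
            return ifc.addr.ip
    for table in (route_cache, static_routes):       # (b), then (c)
        for net, name in table:
            if dst in net and name in live:
                return live[name].addr.ip
    usable = [(pref, name) for pref, name in default_gws if name in live]
    if usable:                                       # (d) best default
        return live[max(usable)[1]].addr.ip
    return None
```

Interfaces known to be down are skipped at every step, so a destination on a downed subnet falls through to the default-gateway rule.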

3.3.5 Source Route Forwarding

Subject to restrictions given below, a host MAY be able to act
as an intermediate hop in a source route, forwarding a source-routed
datagram to the next specified hop.

However, in performing this gateway-like function, the host
MUST obey all the relevant rules for a gateway forwarding
source-routed datagrams [INTRO:2]. This includes the following
specific provisions, which override the corresponding host
provisions given earlier in this document:

(A) TTL (ref. Section 3.2.1.7)

The TTL field MUST be decremented and the datagram perhaps
discarded as specified for a gateway in [INTRO:2].

(B) ICMP Destination Unreachable (ref. Section 3.2.2.1)

A host MUST be able to generate Destination Unreachable
messages with the following codes:

4 (Fragmentation Required but DF Set) when a source-routed
datagram cannot be fragmented to fit into the
target network;

5 (Source Route Failed) when a source-routed datagram
cannot be forwarded, e.g., because of a routing
problem or because the next hop of a strict source
route is not on a connected network.

(C) IP Source Address (ref. Section 3.2.1.3)

A source-routed datagram being forwarded MAY (and normally
will) have a source address that is not one of the IP
addresses of the forwarding host.

(D) Record Route Option (ref. Section 3.2.1.8d)

A host that is forwarding a source-routed datagram
containing a Record Route option MUST update that option,
if it has room.

(E) Timestamp Option (ref. Section 3.2.1.8e)

A host that is forwarding a source-routed datagram
containing a Timestamp Option MUST add the current
timestamp to that option, according to the rules for this
option.

To define the rules restricting host forwarding of source-routed
datagrams, we use the term "local source-routing" if the
next hop will be through the same physical interface through
which the datagram arrived; otherwise, it is "non-local
source-routing".

o A host is permitted to perform local source-routing
without restriction.

o A host that supports non-local source-routing MUST have a
configurable switch to disable forwarding, and this switch
MUST default to disabled.

o The host MUST satisfy all gateway requirements for
configurable policy filters [INTRO:2] restricting non-local
forwarding.

If a host receives a datagram with an incomplete source route
but does not forward it for some reason, the host SHOULD return
an ICMP Destination Unreachable (code 5, Source Route Failed)
message, unless the datagram was itself an ICMP error message.
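
The forwarding restrictions and the code 5 rule can be sketched as one decision function. Interface names, parameter names, and defaults are illustrative. TTL handling and the Record Route and Timestamp updates are deliberately left out.

```python
from typing import Optional, Tuple

ICMP_DEST_UNREACHABLE = 3
CODE_SOURCE_ROUTE_FAILED = 5


def source_route_decision(arrival_if: str, next_hop_if: Optional[str],
                          nonlocal_enabled: bool = False,
                          policy_allows: bool = True,
                          is_icmp_error: bool = False
                          ) -> Tuple[str, Optional[int]]:
    """Decide whether to forward a datagram with an incomplete source route.

    next_hop_if is the interface toward the next hop, or None when there
    is no usable route (e.g., a strict next hop not on a connected net).
    Returns ("forward", None) or ("drop", ICMP Destination Unreachable
    code to send, or None when no ICMP message may be sent).
    """
    if next_hop_if is None:
        allowed = False
    elif next_hop_if == arrival_if:
        allowed = True                       # local source-routing
    else:                                    # non-local: switch + policy
        allowed = nonlocal_enabled and policy_allows
    if allowed:
        return ("forward", None)
    return ("drop", None if is_icmp_error else CODE_SOURCE_ROUTE_FAILED)
```

The nonlocal_enabled parameter defaults to False, mirroring the requirement that the non-local forwarding switch default to disabled.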

3.3.6 Broadcasts

Section 3.2.1.3 defined the four standard IP broadcast address
forms:

Limited Broadcast: {-1, -1}

Directed Broadcast: {<Network-number>,-1}

Subnet Directed Broadcast:
{<Network-number>,<Subnet-number>,-1}

All-Subnets Directed Broadcast: {<Network-number>,-1,-1}

A host MUST recognize any of these forms in the destination
address of an incoming datagram.

There is a class of hosts* that use non-standard broadcast
address forms, substituting 0 for -1. All hosts SHOULD
recognize and accept any of these non-standard broadcast
addresses as the destination address of an incoming datagram.
A host MAY optionally have a configuration option to choose the
0 or the -1 form of broadcast address, for each physical
interface, but this option SHOULD default to the standard (-1)
form.

_________________________
*4.2BSD Unix and its derivatives, but not 4.3BSD.

When a host sends a datagram to a link-layer broadcast address,
the IP destination address MUST be a legal IP broadcast or IP
multicast address.

A host SHOULD silently discard a datagram that is received via
a link-layer broadcast (see Section 2.4) but does not specify
an IP multicast or broadcast destination address.
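
The following sketch recognizes the broadcast forms on receipt, including the non-standard 0 variants, and applies the link-layer broadcast discard rule. It assumes Class A/B/C addressing for the network mask and takes the subnet mask from an ipaddress.IPv4Interface. {<Network-number>,-1} and {<Network-number>,-1,-1} have the same bit pattern, so one check covers both. The sketch also applies the 0 substitution to the Limited Broadcast form literally. All function names are hypothetical.

```python
import ipaddress


def _classful_prefix(ip: ipaddress.IPv4Address) -> int:
    top = int(ip) >> 24
    if top < 128:
        return 8
    if top < 192:
        return 16
    if top < 224:
        return 24
    raise ValueError("not a Class A, B or C address")


def _fill_host(addr: int, prefix: int, ones: bool) -> int:
    """Keep the top `prefix` bits of addr; set the rest to all 1s or all 0s."""
    host_bits = 32 - prefix
    net = (addr >> host_bits) << host_bits
    return net | ((1 << host_bits) - 1) if ones else net


def is_broadcast(dst: str, iface: ipaddress.IPv4Interface,
                 accept_zero_forms: bool = True) -> bool:
    d = int(ipaddress.IPv4Address(dst))
    a = int(iface.ip)
    candidates = set()
    for ones in ((True, False) if accept_zero_forms else (True,)):
        candidates.add(0xFFFFFFFF if ones else 0)          # Limited
        # Directed and All-Subnets Directed share one bit pattern.
        candidates.add(_fill_host(a, _classful_prefix(iface.ip), ones))
        # Subnet Directed, using the configured subnet mask.
        candidates.add(_fill_host(a, iface.network.prefixlen, ones))
    return d in candidates


def should_discard(dst: str, iface: ipaddress.IPv4Interface,
                   via_link_broadcast: bool) -> bool:
    """Link-layer broadcast carrying a non-broadcast, non-multicast IP dst."""
    if not via_link_broadcast:
        return False
    return not (ipaddress.IPv4Address(dst).is_multicast
                or is_broadcast(dst, iface))
```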

Hosts SHOULD use the Limited Broadcast address to broadcast to
a connected network.

DISCUSSION:
Using the Limited Broadcast address instead of a Directed
Broadcast address may improve system robustness. Problems
are often caused by machines that do not understand the
plethora of broadcast addresses (see Section 3.2.1.3), or
that may have different ideas about which broadcast
addresses are in use. The prime example of the latter is
machines that do not understand subnetting but are
attached to a subnetted net. Sending a Subnet Broadcast
for the connected network will confuse those machines,
which will see it as a message to some other host.

There has been discussion on whether a datagram addressed
to the Limited Broadcast address ought to be sent from all
the interfaces of a multihomed host. This specification
takes no stand on the issue.

3.3.7 IP Multicasting

A host SHOULD support local IP multicasting on all connected
networks for which a mapping from Class D IP addresses to
link-layer addresses has been specified (see below). Support
for local IP multicasting includes sending multicast datagrams,
joining multicast groups and receiving multicast datagrams, and
leaving multicast groups. This implies support for all of
[IP:4] except the IGMP protocol itself, which is OPTIONAL.
DISCUSSION:
|
|||
|
IGMP provides gateways that are capable of multicast
|
|||
|
routing with the information required to support IP
|
|||
|
multicasting across multiple networks. At this time,
|
|||
|
multicast-routing gateways are in the experimental stage
|
|||
|
and are not widely available. For hosts that are not
|
|||
|
connected to networks with multicast-routing gateways or
|
|||
|
that do not need to receive multicast datagrams
|
|||
|
originating on other networks, IGMP serves no purpose and
|
|||
|
is therefore optional for now. However, the rest of
|
|||
|
[IP:4] is currently recommended for the purpose of
|
|||
|
providing IP-layer access to local network multicast
|
|||
|
addressing, as a preferable alternative to local broadcast
|
|||
|
addressing. It is expected that IGMP will become
|
|||
|
recommended at some future date, when multicast-routing
|
|||
|
gateways have become more widely available.
|
|||
|
|
|||
|
If IGMP is not implemented, a host SHOULD still join the "all-
|
|||
|
hosts" group (224.0.0.1) when the IP layer is initialized and
|
|||
|
remain a member for as long as the IP layer is active.
|
|||
|
|
|||
|
DISCUSSION:
|
|||
|
Joining the "all-hosts" group will support strictly local
|
|||
|
uses of multicasting, e.g., a gateway discovery protocol,
|
|||
|
even if IGMP is not implemented.
|
|||
|
|
|||
|
The mapping of IP Class D addresses to local addresses is
|
|||
|
currently specified for the following types of networks:
|
|||
|
|
|||
|
o Ethernet/IEEE 802.3, as defined in [IP:4].
|
|||
|
|
|||
|

o Any network that supports broadcast but not multicast
  addressing: all IP Class D addresses map to the local
  broadcast address.

o Any type of point-to-point link (e.g., SLIP or HDLC
  links): no mapping required. All IP multicast datagrams
  are sent as-is, inside the local framing.

Mappings for other types of networks will be specified in the
future.
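For Ethernet, the mapping in [IP:4] places the low-order 23 bits of the Class D address into the low-order 23 bits of the Ethernet multicast address 01-00-5E-00-00-00. The C sketch below shows that mapping alongside a Class D test. It is illustrative only: the function names are hypothetical, it assumes addresses are 32-bit values in host byte order, and it covers only the Ethernet case.

```c
#include <stdbool.h>
#include <stdint.h>

/* Class D (multicast) addresses have the high-order bits 1110,
 * i.e. 224.0.0.0 through 239.255.255.255.
 * Addresses are 32-bit values in host byte order (an assumption). */
bool is_class_d(uint32_t addr)
{
    return (addr & 0xF0000000u) == 0xE0000000u;
}

/* Ethernet mapping from [IP:4]: copy the low-order 23 bits of the
 * group address into the Ethernet multicast address 01-00-5E-00-00-00.
 * The 5 group bits just below the 1110 prefix are dropped, so 32
 * groups share each Ethernet address and the receiver must still
 * filter on the IP destination address. */
void class_d_to_ether(uint32_t group, uint8_t mac[6])
{
    mac[0] = 0x01;
    mac[1] = 0x00;
    mac[2] = 0x5E;
    mac[3] = (uint8_t)((group >> 16) & 0x7F);  /* high bit of this octet is 0 */
    mac[4] = (uint8_t)((group >> 8) & 0xFF);
    mac[5] = (uint8_t)(group & 0xFF);
}
```

For example, the all-hosts group 224.0.0.1 maps to 01-00-5E-00-00-01.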

A host SHOULD provide a way for higher-layer protocols or
applications to determine which of the host's connected
network(s) support IP multicast addressing.

3.3.8 Error Reporting

Wherever practical, hosts MUST return ICMP error datagrams on
detection of an error, except in those cases where returning an
ICMP error message is specifically prohibited.

DISCUSSION:
A common phenomenon in datagram networks is the "black
hole disease": datagrams are sent out, but nothing comes
back. Without any error datagrams, it is difficult for
the user to figure out what the problem is.
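The cases in which returning an ICMP error is prohibited are specified in Section 3.2.2 and listed under "Send ICMP error message for:" in the summary table of Section 3.5. The C sketch below implements that check. It assumes a hypothetical `rx_info` record that the IP layer fills in for each received datagram. The tested error types (3, 4, 5, 11, 12) are the standard ICMP error message types. The source-address test is deliberately rough: it rejects zero, loopback, and Class D/E sources, but detecting a directed-broadcast source would also require the subnet mask.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical summary of a received datagram, filled in by the IP layer. */
struct rx_info {
    uint8_t  protocol;          /* IP protocol number; 1 = ICMP */
    uint8_t  icmp_type;         /* meaningful only when protocol == 1 */
    uint16_t frag_offset;       /* fragment offset field (8-octet units) */
    bool     ip_bcast_or_mcast; /* IP destination was broadcast/multicast */
    bool     link_bcast;        /* arrived as a link-layer broadcast */
    uint32_t src;               /* IP source address, host byte order */
};

/* ICMP error types: Destination Unreachable, Source Quench, Redirect,
 * Time Exceeded, Parameter Problem. */
static bool is_icmp_error(uint8_t type)
{
    return type == 3 || type == 4 || type == 5 || type == 11 || type == 12;
}

/* Rough test for "source defines a single host": rejects zero,
 * loopback (127.x.x.x), and Class D/E addresses (which include
 * 255.255.255.255).  Directed-broadcast sources are not detected. */
static bool src_is_single_host(uint32_t src)
{
    uint8_t first = (uint8_t)(src >> 24);
    return src != 0 && first != 127 && first < 224;
}

bool may_send_icmp_error(const struct rx_info *rx)
{
    if (rx->protocol == 1 && is_icmp_error(rx->icmp_type))
        return false;               /* never answer an ICMP error */
    if (rx->ip_bcast_or_mcast || rx->link_bcast)
        return false;
    if (rx->frag_offset != 0)
        return false;               /* non-initial fragment */
    return src_is_single_host(rx->src);
}
```

Note that an ICMP query such as Echo Request (type 8) may still elicit an error; only ICMP error messages are exempt.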

3.4 INTERNET/TRANSPORT LAYER INTERFACE

The interface between the IP layer and the transport layer MUST
provide full access to all the mechanisms of the IP layer,
including options, Type-of-Service, and Time-to-Live. The
transport layer MUST either have mechanisms to set these interface
parameters, or provide a path to pass them through from an
application, or both.

DISCUSSION:
Applications are urged to make use of these mechanisms where
applicable, even when the mechanisms are not currently
effective in the Internet (e.g., TOS). This will allow these
mechanisms to be immediately useful when they do become
effective, without a large amount of retrofitting of host
software.

We now describe a conceptual interface between the transport layer
and the IP layer, as a set of procedure calls. This is an
extension of the information in Section 3.3 of RFC-791 [IP:1].


* Send Datagram

SEND(src, dst, prot, TOS, TTL, BufPTR, len, Id, DF, opt
     => result)

where the parameters are defined in RFC-791. Passing an Id
parameter is optional; see Section 3.2.1.5.


* Receive Datagram

RECV(BufPTR, prot
     => result, src, dst, SpecDest, TOS, len, opt)

All the parameters are defined in RFC-791, except for:

SpecDest = specific-destination address of datagram
           (defined in Section 3.2.1.3)

The result parameter dst contains the datagram's destination
address. Since this may be a broadcast or multicast address,
the SpecDest parameter (not shown in RFC-791) MUST be passed.
The parameter opt contains all the IP options received in the
datagram; these MUST also be passed to the transport layer.
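Section 3.2.1.3 defines the specific-destination address as the destination address in the IP header, unless that address is a broadcast or multicast address. In that case it is an IP address assigned to the physical interface on which the datagram arrived. The C sketch below shows how an IP layer might compute SpecDest before calling up to the transport layer. The `iface` record and all function names are hypothetical, and addresses are assumed to be 32-bit values in host byte order. The broadcast test covers only limited broadcast and subnet-directed broadcast for the receiving interface, in both the -1 and 0 forms, not every format a host must recognize.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-interface record: one address and its subnet mask. */
struct iface {
    uint32_t addr;
    uint32_t mask;
};

static bool is_multicast(uint32_t a)
{
    return (a & 0xF0000000u) == 0xE0000000u;     /* Class D */
}

/* Limited broadcast, plus subnet-directed broadcast on the subnet of
 * the receiving interface; both the -1 and the 0 forms are accepted. */
static bool is_broadcast(uint32_t a, const struct iface *in)
{
    uint32_t host_bits = ~in->mask;

    if (a == 0xFFFFFFFFu || a == 0)
        return true;
    if (host_bits == 0 || (a & in->mask) != (in->addr & in->mask))
        return false;
    return (a & host_bits) == host_bits || (a & host_bits) == 0;
}

/* SpecDest: the header destination, or the receiving interface's own
 * address when the destination was broadcast or multicast. */
uint32_t spec_dest(uint32_t dst, const struct iface *arrived_on)
{
    if (is_multicast(dst) || is_broadcast(dst, arrived_on))
        return arrived_on->addr;
    return dst;
}
```

A datagram sent to 10.1.2.255 and received on an interface numbered 10.1.2.3/24 would therefore be delivered with SpecDest 10.1.2.3.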

* Select Source Address

GET_SRCADDR(remote, TOS) -> local

remote = remote IP address
TOS = Type-of-Service
local = local IP address

See Section 3.3.4.3.


* Find Maximum Datagram Sizes

GET_MAXSIZES(local, remote, TOS) -> MMS_R, MMS_S

MMS_R = maximum receive transport-message size.
MMS_S = maximum send transport-message size.
(local, remote, TOS defined above)

See Sections 3.3.2 and 3.3.3.
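Sections 3.3.2 and 3.3.3 define these sizes in terms of the effective MTUs. MMS_R = EMTU_R - 20, and, when no local fragmentation is performed, MMS_S = EMTU_S - <IP header size>. The C sketch below shows that arithmetic. It assumes the caller already knows EMTU_R, EMTU_S, and the length of any IP options to be sent, and that these values are well above the header sizes. The function names are illustrative. The final helper subtracts the 8-octet UDP header to show how a UDP application could turn MMS_S into a payload limit.

```c
#include <stddef.h>

#define IP_MIN_HDR_LEN  20  /* IP header without options */
#define UDP_HDR_LEN      8

/* MMS_R = EMTU_R - 20 (Section 3.3.2). */
size_t mms_r(size_t emtu_r)
{
    return emtu_r - IP_MIN_HDR_LEN;
}

/* MMS_S = EMTU_S - <IP header size> when no local fragmentation is
 * performed (Section 3.3.3); the header grows by the length of any
 * IP options the transport layer asks to send. */
size_t mms_s(size_t emtu_s, size_t ip_options_len)
{
    return emtu_s - (IP_MIN_HDR_LEN + ip_options_len);
}

/* Largest UDP payload that fits within MMS_S. */
size_t udp_max_payload(size_t emtu_s, size_t ip_options_len)
{
    return mms_s(emtu_s, ip_options_len) - UDP_HDR_LEN;
}
```

For example, with the 1500-octet Ethernet MTU as EMTU_S and no IP options, a UDP application could send at most 1472 octets of data per datagram without local fragmentation.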

* Advice on Delivery Success

ADVISE_DELIVPROB(sense, local, remote, TOS)

Here the parameter sense is a 1-bit flag indicating whether
positive or negative advice is being given; see the
discussion in Section 3.3.1.4. The other parameters were
defined earlier.


* Send ICMP Message

SEND_ICMP(src, dst, TOS, TTL, BufPTR, len, Id, DF, opt)
     -> result

(Parameters defined in RFC-791).

Passing an Id parameter is optional; see Section 3.2.1.5.
The transport layer MUST be able to send certain ICMP
messages: Port Unreachable or any of the query-type
messages. This function could be considered to be a special
case of the SEND() call, of course; we describe it separately
for clarity.


* Receive ICMP Message

RECV_ICMP(BufPTR) -> result, src, dst, len, opt

(Parameters defined in RFC-791).

The IP layer MUST pass certain ICMP messages up to the
appropriate transport-layer routine. This function could be
considered to be a special case of the RECV() call, of
course; we describe it separately for clarity.

For an ICMP error message, the data that is passed up MUST
include the original Internet header plus all the octets of
the original message that are included in the ICMP message.
This data will be used by the transport layer to locate the
connection state information, if any.
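TCP and UDP headers both begin with the source and destination port numbers. The quoted Internet header plus the first four octets of the original datagram are therefore enough for the transport layer to locate its connection state. The C sketch below extracts that demultiplexing key. It assumes `buf` points at the original Internet header as passed up by RECV_ICMP(). The `demux_key` record and the function names are hypothetical. The key describes the original datagram, which this host sent, so its source fields identify the local end.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical key a transport layer could look up in its connection table. */
struct demux_key {
    uint8_t  protocol;        /* 6 = TCP, 17 = UDP */
    uint32_t src, dst;        /* addresses of the ORIGINAL datagram */
    uint16_t sport, dport;    /* ports of the ORIGINAL datagram */
};

static uint32_t get32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static uint16_t get16(const uint8_t *p)
{
    return (uint16_t)(((unsigned)p[0] << 8) | p[1]);
}

/* buf/len: the original Internet header plus the quoted octets that
 * followed it.  Returns false if too little was quoted to find the ports. */
bool icmp_error_key(const uint8_t *buf, size_t len, struct demux_key *k)
{
    if (len < 20)
        return false;
    size_t ihl = (size_t)(buf[0] & 0x0F) * 4;  /* header length incl. options */
    if (ihl < 20 || len < ihl + 4)
        return false;
    k->protocol = buf[9];
    k->src = get32(buf + 12);
    k->dst = get32(buf + 16);
    k->sport = get16(buf + ihl);               /* TCP and UDP both start */
    k->dport = get16(buf + ihl + 2);           /* with the two port fields */
    return true;
}
```

Using the IHL field, rather than assuming a 20-octet header, keeps the port offsets correct when the original datagram carried IP options.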

In particular, the following ICMP messages are to be passed
up:

o Destination Unreachable

o Source Quench

o Echo Reply (to ICMP user interface, unless the Echo
  Request originated in the IP layer)

o Timestamp Reply (to ICMP user interface)

o Time Exceeded


DISCUSSION:
In the future, there may be additions to this interface to
pass path data (see Section 3.3.1.3) between the IP and
transport layers.

3.5 INTERNET LAYER REQUIREMENTS SUMMARY


                                                 |        | | | |S| |
                                                 |        | | | |H| |F
                                                 |        | | | |O|M|o
                                                 |        | |S| |U|U|o
                                                 |        | |H| |L|S|t
                                                 |        |M|O| |D|T|n
                                                 |        |U|U|M| | |o
                                                 |        |S|L|A|N|N|t
                                                 |        |T|D|Y|O|O|t
FEATURE                                          |SECTION | | | |T|T|e
-------------------------------------------------|--------|-|-|-|-|-|--
                                                 |        | | | | | |
Implement IP and ICMP                            |3.1     |x| | | | |
Handle remote multihoming in application layer   |3.1     |x| | | | |
Support local multihoming                        |3.1     | | |x| | |
Meet gateway specs if forward datagrams          |3.1     |x| | | | |
Configuration switch for embedded gateway        |3.1     |x| | | | |1
Config switch default to non-gateway             |3.1     |x| | | | |1
Auto-config based on number of interfaces        |3.1     | | | | |x|1
Able to log discarded datagrams                  |3.1     | |x| | | |
Record in counter                                |3.1     | |x| | | |
                                                 |        | | | | | |
Silently discard Version != 4                    |3.2.1.1 |x| | | | |
Verify IP checksum, silently discard bad dgram   |3.2.1.2 |x| | | | |
Addressing:                                      |        | | | | | |
Subnet addressing (RFC-950)                      |3.2.1.3 |x| | | | |
Src address must be host's own IP address        |3.2.1.3 |x| | | | |
Silently discard datagram with bad dest addr     |3.2.1.3 |x| | | | |
Silently discard datagram with bad src addr      |3.2.1.3 |x| | | | |
Support reassembly                               |3.2.1.4 |x| | | | |
Retain same Id field in identical datagram       |3.2.1.5 | | |x| | |
                                                 |        | | | | | |
TOS:                                             |        | | | | | |
Allow transport layer to set TOS                 |3.2.1.6 |x| | | | |
Pass received TOS up to transport layer          |3.2.1.6 | |x| | | |
Use RFC-795 link-layer mappings for TOS          |3.2.1.6 | | | |x| |
TTL:                                             |        | | | | | |
Send packet with TTL of 0                        |3.2.1.7 | | | | |x|
Discard received packets with TTL < 2            |3.2.1.7 | | | | |x|
Allow transport layer to set TTL                 |3.2.1.7 |x| | | | |
Fixed TTL is configurable                        |3.2.1.7 |x| | | | |
                                                 |        | | | | | |
IP Options:                                      |        | | | | | |
Allow transport layer to send IP options         |3.2.1.8 |x| | | | |
Pass all IP options rcvd to higher layer         |3.2.1.8 |x| | | | |
IP layer silently ignore unknown options         |3.2.1.8 |x| | | | |
Security option                                  |3.2.1.8a| | |x| | |
Send Stream Identifier option                    |3.2.1.8b| | | |x| |
Silently ignore Stream Identifier option         |3.2.1.8b|x| | | | |
Record Route option                              |3.2.1.8d| | |x| | |
Timestamp option                                 |3.2.1.8e| | |x| | |
Source Route Option:                             |        | | | | | |
Originate & terminate Source Route options       |3.2.1.8c|x| | | | |
Datagram with completed SR passed up to TL       |3.2.1.8c|x| | | | |
Build correct (non-redundant) return route       |3.2.1.8c|x| | | | |
Send multiple SR options in one header           |3.2.1.8c| | | | |x|
                                                 |        | | | | | |
ICMP:                                            |        | | | | | |
Silently discard ICMP msg with unknown type      |3.2.2   |x| | | | |
Include more than 8 octets of orig datagram      |3.2.2   | | |x| | |
Included octets same as received                 |3.2.2   |x| | | | |
Demux ICMP Error to transport protocol           |3.2.2   |x| | | | |
Send ICMP error message with TOS=0               |3.2.2   | |x| | | |
Send ICMP error message for:                     |        | | | | | |
- ICMP error msg                                 |3.2.2   | | | | |x|
- IP b'cast or IP m'cast                         |3.2.2   | | | | |x|
- Link-layer b'cast                              |3.2.2   | | | | |x|
- Non-initial fragment                           |3.2.2   | | | | |x|
- Datagram with non-unique src address           |3.2.2   | | | | |x|
Return ICMP error msgs (when not prohibited)     |3.3.8   |x| | | | |
                                                 |        | | | | | |
Dest Unreachable:                                |        | | | | | |
Generate Dest Unreachable (code 2/3)             |3.2.2.1 | |x| | | |
Pass ICMP Dest Unreachable to higher layer       |3.2.2.1 |x| | | | |
Higher layer act on Dest Unreach                 |3.2.2.1 | |x| | | |
Interpret Dest Unreach as only hint              |3.2.2.1 |x| | | | |
Redirect:                                        |        | | | | | |
Host send Redirect                               |3.2.2.2 | | | |x| |
Update route cache when recv Redirect            |3.2.2.2 |x| | | | |
Handle both Host and Net Redirects               |3.2.2.2 |x| | | | |
Discard illegal Redirect                         |3.2.2.2 | |x| | | |
Source Quench:                                   |        | | | | | |
Send Source Quench if buffering exceeded         |3.2.2.3 | | |x| | |
Pass Source Quench to higher layer               |3.2.2.3 |x| | | | |
Higher layer act on Source Quench                |3.2.2.3 | |x| | | |
Time Exceeded: pass to higher layer              |3.2.2.4 |x| | | | |
Parameter Problem:                               |        | | | | | |
Send Parameter Problem messages                  |3.2.2.5 | |x| | | |
Pass Parameter Problem to higher layer           |3.2.2.5 |x| | | | |
Report Parameter Problem to user                 |3.2.2.5 | | |x| | |
                                                 |        | | | | | |
ICMP Echo Request or Reply:                      |        | | | | | |
Echo server and Echo client                      |3.2.2.6 |x| | | | |
Echo client                                      |3.2.2.6 | |x| | | |
Discard Echo Request to broadcast address        |3.2.2.6 | | |x| | |
Discard Echo Request to multicast address        |3.2.2.6 | | |x| | |
Use specific-dest addr as Echo Reply src         |3.2.2.6 |x| | | | |
Send same data in Echo Reply                     |3.2.2.6 |x| | | | |
Pass Echo Reply to higher layer                  |3.2.2.6 |x| | | | |
Reflect Record Route, Time Stamp options         |3.2.2.6 | |x| | | |
Reverse and reflect Source Route option          |3.2.2.6 |x| | | | |
                                                 |        | | | | | |
ICMP Information Request or Reply:               |3.2.2.7 | | | |x| |
ICMP Timestamp and Timestamp Reply:              |3.2.2.8 | | |x| | |
Minimize delay variability                       |3.2.2.8 | |x| | | |1
Silently discard b'cast Timestamp                |3.2.2.8 | | |x| | |1
Silently discard m'cast Timestamp                |3.2.2.8 | | |x| | |1
Use specific-dest addr as TS Reply src           |3.2.2.8 |x| | | | |1
Reflect Record Route, Time Stamp options         |3.2.2.6 | |x| | | |1
Reverse and reflect Source Route option          |3.2.2.8 |x| | | | |1
Pass Timestamp Reply to higher layer             |3.2.2.8 |x| | | | |1
Obey rules for "standard value"                  |3.2.2.8 |x| | | | |1
                                                 |        | | | | | |
ICMP Address Mask Request and Reply:             |        | | | | | |
Addr Mask source configurable                    |3.2.2.9 |x| | | | |
Support static configuration of addr mask        |3.2.2.9 |x| | | | |
Get addr mask dynamically during booting         |3.2.2.9 | | |x| | |
Get addr via ICMP Addr Mask Request/Reply        |3.2.2.9 | | |x| | |
Retransmit Addr Mask Req if no Reply             |3.2.2.9 |x| | | | |3
Assume default mask if no Reply                  |3.2.2.9 | |x| | | |3
Update address mask from first Reply only        |3.2.2.9 |x| | | | |3
Reasonableness check on Addr Mask                |3.2.2.9 | |x| | | |
Send unauthorized Addr Mask Reply msgs           |3.2.2.9 | | | | |x|
Explicitly configured to be agent                |3.2.2.9 |x| | | | |
Static config=> Addr-Mask-Authoritative flag     |3.2.2.9 | |x| | | |
Broadcast Addr Mask Reply when init.             |3.2.2.9 |x| | | | |3
                                                 |        | | | | | |
ROUTING OUTBOUND DATAGRAMS:                      |        | | | | | |
Use address mask in local/remote decision        |3.3.1.1 |x| | | | |
Operate with no gateways on conn network         |3.3.1.1 |x| | | | |
Maintain "route cache" of next-hop gateways      |3.3.1.2 |x| | | | |
Treat Host and Net Redirect the same             |3.3.1.2 | |x| | | |
If no cache entry, use default gateway           |3.3.1.2 |x| | | | |
Support multiple default gateways                |3.3.1.2 |x| | | | |
Provide table of static routes                   |3.3.1.2 | | |x| | |
Flag: route overridable by Redirects             |3.3.1.2 | | |x| | |
Key route cache on host, not net address         |3.3.1.3 | | |x| | |
Include TOS in route cache                       |3.3.1.3 | |x| | | |
                                                 |        | | | | | |
Able to detect failure of next-hop gateway       |3.3.1.4 |x| | | | |
Assume route is good forever                     |3.3.1.4 | | | |x| |
Ping gateways continuously                       |3.3.1.4 | | | | |x|
Ping only when traffic being sent                |3.3.1.4 |x| | | | |
Ping only when no positive indication            |3.3.1.4 |x| | | | |
Higher and lower layers give advice              |3.3.1.4 | |x| | | |
Switch from failed default g'way to another      |3.3.1.5 |x| | | | |
Manual method of entering config info            |3.3.1.6 |x| | | | |
                                                 |        | | | | | |
REASSEMBLY and FRAGMENTATION:                    |        | | | | | |
Able to reassemble incoming datagrams            |3.3.2   |x| | | | |
At least 576 byte datagrams                      |3.3.2   |x| | | | |
EMTU_R configurable or indefinite                |3.3.2   | |x| | | |
Transport layer able to learn MMS_R              |3.3.2   |x| | | | |
Send ICMP Time Exceeded on reassembly timeout    |3.3.2   |x| | | | |
Fixed reassembly timeout value                   |3.3.2   | |x| | | |
                                                 |        | | | | | |
Pass MMS_S to higher layers                      |3.3.3   |x| | | | |
Local fragmentation of outgoing packets          |3.3.3   | | |x| | |
Else don't send bigger than MMS_S                |3.3.3   |x| | | | |
Send max 576 to off-net destination              |3.3.3   | |x| | | |
All-Subnets-MTU configuration flag               |3.3.3   | | |x| | |
                                                 |        | | | | | |
MULTIHOMING:                                     |        | | | | | |
Reply with same addr as spec-dest addr           |3.3.4.2 | |x| | | |
Allow application to choose local IP addr        |3.3.4.2 |x| | | | |
Silently discard d'gram in "wrong" interface     |3.3.4.2 | | |x| | |
Only send d'gram through "right" interface       |3.3.4.2 | | |x| | |4
                                                 |        | | | | | |
SOURCE-ROUTE FORWARDING:                         |        | | | | | |
Forward datagram with Source Route option        |3.3.5   | | |x| | |1
Obey corresponding gateway rules                 |3.3.5   |x| | | | |1
Update TTL by gateway rules                      |3.3.5   |x| | | | |1
Able to generate ICMP err code 4, 5              |3.3.5   |x| | | | |1
IP src addr not local host                       |3.3.5   | | |x| | |1
Update Timestamp, Record Route options           |3.3.5   |x| | | | |1
Configurable switch for non-local SRing          |3.3.5   |x| | | | |1
Defaults to OFF                                  |3.3.5   |x| | | | |1
Satisfy gwy access rules for non-local SRing     |3.3.5   |x| | | | |1
If not forward, send Dest Unreach (cd 5)         |3.3.5   | |x| | | |2
                                                 |        | | | | | |
BROADCAST:                                       |        | | | | | |
Broadcast addr as IP source addr                 |3.2.1.3 | | | | |x|
Receive 0 or -1 broadcast formats OK             |3.3.6   | |x| | | |
Config'ble option to send 0 or -1 b'cast         |3.3.6   | | |x| | |
Default to -1 broadcast                          |3.3.6   | |x| | | |
Recognize all broadcast address formats          |3.3.6   |x| | | | |
Use IP b'cast/m'cast addr in link-layer b'cast   |3.3.6   |x| | | | |
Silently discard link-layer-only b'cast dg's     |3.3.6   | |x| | | |
Use Limited Broadcast addr for connected net     |3.3.6   | |x| | | |
                                                 |        | | | | | |
MULTICAST:                                       |        | | | | | |
Support local IP multicasting (RFC-1112)         |3.3.7   | |x| | | |
Support IGMP (RFC-1112)                          |3.3.7   | | |x| | |
Join all-hosts group at startup                  |3.3.7   | |x| | | |
Higher layers learn i'face m'cast capability     |3.3.7   | |x| | | |
                                                 |        | | | | | |
INTERFACE:                                       |        | | | | | |
Allow transport layer to use all IP mechanisms   |3.4     |x| | | | |
Pass interface ident up to transport layer       |3.4     |x| | | | |
Pass all IP options up to transport layer        |3.4     |x| | | | |
Transport layer can send certain ICMP messages   |3.4     |x| | | | |
Pass spec'd ICMP messages up to transp. layer    |3.4     |x| | | | |
Include IP hdr+8 octets or more from orig.       |3.4     |x| | | | |
Able to leap tall buildings at a single bound    |3.5     | |x| | | |

Footnotes:

(1) Only if feature is implemented.

(2) This requirement is overruled if datagram is an ICMP error message.

(3) Only if feature is implemented and is configured "on".

(4) Unless has embedded gateway functionality or is source routed.

4. TRANSPORT PROTOCOLS

4.1 USER DATAGRAM PROTOCOL -- UDP

4.1.1 INTRODUCTION

The User Datagram Protocol UDP [UDP:1] offers only a minimal
transport service -- non-guaranteed datagram delivery -- and
gives applications direct access to the datagram service of the
IP layer. UDP is used by applications that do not require the
level of service of TCP or that wish to use communications
services (e.g., multicast or broadcast delivery) not available
from TCP.

UDP is almost a null protocol; the only services it provides
over IP are checksumming of data and multiplexing by port
number. Therefore, an application program running over UDP
must deal directly with end-to-end communication problems that
a connection-oriented protocol would have handled -- e.g.,
retransmission for reliable delivery, packetization and
reassembly, flow control, congestion avoidance, etc., when
these are required. The fairly complex coupling between IP and
TCP will be mirrored in the coupling between UDP and many
applications using UDP.

4.1.2 PROTOCOL WALK-THROUGH

There are no known errors in the specification of UDP.

4.1.3 SPECIFIC ISSUES

4.1.3.1 Ports

UDP well-known ports follow the same rules as TCP well-known
ports; see Section 4.2.2.1 below.

If a datagram arrives addressed to a UDP port for which
there is no pending LISTEN call, UDP SHOULD send an ICMP
Port Unreachable message.

4.1.3.2 IP Options

UDP MUST pass any IP option that it receives from the IP
layer transparently to the application layer.

An application MUST be able to specify IP options to be sent
in its UDP datagrams, and UDP MUST pass these options to the
IP layer.

DISCUSSION:
At present, the only options that need be passed
through UDP are Source Route, Record Route, and Time
Stamp. However, new options may be defined in the
future, and UDP need not and should not make any
assumptions about the format or content of options it
passes to or from the application; an exception to this
might be an IP-layer security option.

An application based on UDP will need to obtain a
source route from a request datagram and supply a
reversed route for sending the corresponding reply.

4.1.3.3 ICMP Messages

UDP MUST pass to the application layer all ICMP error
messages that it receives from the IP layer. Conceptually
at least, this may be accomplished with an upcall to the
ERROR_REPORT routine (see Section 4.2.4.1).

DISCUSSION:
Note that ICMP error messages resulting from sending a
UDP datagram are received asynchronously. A UDP-based
application that wants to receive ICMP error messages
is responsible for maintaining the state necessary to
demultiplex these messages when they arrive; for
example, the application may keep a pending receive
operation for this purpose. The application is also
responsible for avoiding confusion caused by a delayed ICMP
error message resulting from an earlier use of the same
port(s).

4.1.3.4 UDP Checksums

A host MUST implement the facility to generate and validate
UDP checksums. An application MAY optionally be able to
control whether a UDP checksum will be generated, but it
MUST default to checksumming on.

If a UDP datagram is received with a checksum that is non-zero
and invalid, UDP MUST silently discard the datagram.
An application MAY optionally be able to control whether UDP
datagrams without checksums should be discarded or passed to
the application.

DISCUSSION:
Some applications that normally run only across local
area networks have chosen to turn off UDP checksums for
efficiency. As a result, numerous cases of undetected
errors have been reported. The advisability of ever
turning off UDP checksumming is very controversial.

IMPLEMENTATION:
There is a common implementation error in UDP
checksums. Unlike the TCP checksum, the UDP checksum
is optional; the value zero is transmitted in the
checksum field of a UDP header to indicate the absence
of a checksum. If the transmitter really calculates a
UDP checksum of zero, it must transmit the checksum as
all 1's (65535). No special action is required at the
receiver, since zero and 65535 are equivalent in 1's
complement arithmetic.
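The rules in Section 4.1.3.4 and the IMPLEMENTATION note can be made concrete with the C sketch below. It computes the checksum over the RFC-768 pseudo-header (source address, destination address, a zero octet, protocol 17, UDP length), followed by the UDP header and data. The function names and the tri-state return value are illustrative choices, not a defined interface. On transmission, a computed zero is sent as all 1's. On reception, a zero checksum field means "no checksum"; whether to accept such datagrams is left to the caller, as Section 4.1.3.4 allows.

```c
#include <stddef.h>
#include <stdint.h>

/* Add octets to a running sum as big-endian 16-bit words,
 * padding an odd trailing octet with zero. */
static uint32_t sum16(uint32_t acc, const uint8_t *p, size_t n)
{
    for (size_t i = 0; i + 1 < n; i += 2)
        acc += ((uint32_t)p[i] << 8) | p[i + 1];
    if (n & 1)
        acc += (uint32_t)p[n - 1] << 8;
    return acc;
}

/* Fold carries back in (end-around carry) to get the 1's complement sum. */
static uint16_t fold(uint32_t acc)
{
    while (acc >> 16)
        acc = (acc & 0xFFFFu) + (acc >> 16);
    return (uint16_t)acc;
}

/* 1's complement sum of the pseudo-header plus the UDP header and data.
 * src and dst are host-byte-order IPv4 addresses; len is the UDP length. */
static uint16_t udp_sum(uint32_t src, uint32_t dst, const uint8_t *udp, size_t len)
{
    const uint8_t ph[12] = {
        (uint8_t)(src >> 24), (uint8_t)(src >> 16), (uint8_t)(src >> 8), (uint8_t)src,
        (uint8_t)(dst >> 24), (uint8_t)(dst >> 16), (uint8_t)(dst >> 8), (uint8_t)dst,
        0, 17, (uint8_t)(len >> 8), (uint8_t)len
    };
    return fold(sum16(sum16(0, ph, sizeof ph), udp, len));
}

/* Sender: zero the checksum field (octets 6-7), complement the sum,
 * and transmit a computed zero as all 1's (65535). */
uint16_t udp_checksum_for_send(uint32_t src, uint32_t dst, uint8_t *udp, size_t len)
{
    udp[6] = 0;
    udp[7] = 0;
    uint16_t c = (uint16_t)~udp_sum(src, dst, udp, len);
    return c == 0 ? 0xFFFFu : c;
}

/* Receiver: returns 0 if no checksum was sent, 1 if valid, and -1 if
 * invalid (to be silently discarded).  A valid datagram sums to all 1's. */
int udp_checksum_check(uint32_t src, uint32_t dst, const uint8_t *udp, size_t len)
{
    if (udp[6] == 0 && udp[7] == 0)
        return 0;
    return udp_sum(src, dst, udp, len) == 0xFFFFu ? 1 : -1;
}
```

A sender would store the returned value in octets 6-7 of the UDP header, high-order octet first. The substitution of 65535 for zero needs no receiver support because both values represent zero in 1's complement arithmetic.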

4.1.3.5 UDP Multihoming

When a UDP datagram is received, its specific-destination
address MUST be passed up to the application layer.

An application program MUST be able to specify the IP source
address to be used for sending a UDP datagram or to leave it
unspecified (in which case the networking software will
choose an appropriate source address). There SHOULD be a
way to communicate the chosen source address up to the
application layer (e.g., so that the application can later
receive a reply datagram only from the corresponding
interface).

DISCUSSION:
A request/response application that uses UDP should use
a source address for the response that is the same as
the specific destination address of the request. See
the "General Issues" section of [INTRO:1].

4.1.3.6 Invalid Addresses

A UDP datagram received with an invalid IP source address
(e.g., a broadcast or multicast address) must be discarded
by UDP or by the IP layer (see Section 3.2.1.3).

When a host sends a UDP datagram, the source address MUST be
(one of) the IP address(es) of the host.

4.1.4 UDP/APPLICATION LAYER INTERFACE

The application interface to UDP MUST provide the full services
of the IP/transport interface described in Section 3.4 of this
document. Thus, an application using UDP needs the functions
of the GET_SRCADDR(), GET_MAXSIZES(), ADVISE_DELIVPROB(), and
RECV_ICMP() calls described in Section 3.4. For example,
GET_MAXSIZES() can be used to learn the effective maximum UDP
datagram size for a particular {interface,remote
host,TOS} triplet.

An application-layer program MUST be able to set the TTL and
TOS values as well as IP options for sending a UDP datagram,
and these values must be passed transparently to the IP layer.
UDP MAY pass the received TOS up to the application layer.

4.1.5 UDP REQUIREMENTS SUMMARY


                                                 |        | | | |S| |
                                                 |        | | | |H| |F
                                                 |        | | | |O|M|o
                                                 |        | |S| |U|U|o
                                                 |        | |H| |L|S|t
                                                 |        |M|O| |D|T|n
                                                 |        |U|U|M| | |o
                                                 |        |S|L|A|N|N|t
                                                 |        |T|D|Y|O|O|t
FEATURE                                          |SECTION | | | |T|T|e
-------------------------------------------------|--------|-|-|-|-|-|--
                                                 |        | | | | | |
UDP                                              |        | | | | | |
-------------------------------------------------|--------|-|-|-|-|-|--
                                                 |        | | | | | |
UDP send Port Unreachable                        |4.1.3.1 | |x| | | |
                                                 |        | | | | | |
IP Options in UDP                                |        | | | | | |
- Pass rcv'd IP options to applic layer          |4.1.3.2 |x| | | | |
- Applic layer can specify IP options in Send    |4.1.3.2 |x| | | | |
- UDP passes IP options down to IP layer         |4.1.3.2 |x| | | | |
                                                 |        | | | | | |
Pass ICMP msgs up to applic layer                |4.1.3.3 |x| | | | |
                                                 |        | | | | | |
UDP checksums:                                   |        | | | | | |
- Able to generate/check checksum                |4.1.3.4 |x| | | | |
- Silently discard bad checksum                  |4.1.3.4 |x| | | | |
- Sender Option to not generate checksum         |4.1.3.4 | | |x| | |
- Default is to checksum                         |4.1.3.4 |x| | | | |
- Receiver Option to require checksum            |4.1.3.4 | | |x| | |
                                                 |        | | | | | |
UDP Multihoming                                  |        | | | | | |
- Pass spec-dest addr to application             |4.1.3.5 |x| | | | |
- Applic layer can specify Local IP addr         |4.1.3.5 |x| | | | |
- Applic layer specify wild Local IP addr        |4.1.3.5 |x| | | | |
- Applic layer notified of Local IP addr used    |4.1.3.5 | |x| | | |
                                                 |        | | | | | |
Bad IP src addr silently discarded by UDP/IP     |4.1.3.6 |x| | | | |
Only send valid IP source address                |4.1.3.6 |x| | | | |
UDP Application Interface Services               |        | | | | | |
Full IP interface of 3.4 for application         |4.1.4   |x| | | | |
- Able to spec TTL, TOS, IP opts when send dg    |4.1.4   |x| | | | |
- Pass received TOS up to applic layer           |4.1.4   | | |x| | |

4.2 TRANSMISSION CONTROL PROTOCOL -- TCP

4.2.1 INTRODUCTION

The Transmission Control Protocol TCP [TCP:1] is the primary
virtual-circuit transport protocol for the Internet suite. TCP
provides reliable, in-sequence delivery of a full-duplex stream
of octets (8-bit bytes). TCP is used by those applications
needing reliable, connection-oriented transport service, e.g.,
mail (SMTP), file transfer (FTP), and virtual terminal service
(Telnet); requirements for these application-layer protocols
are described in [INTRO:1].

4.2.2 PROTOCOL WALK-THROUGH

4.2.2.1 Well-Known Ports: RFC-793 Section 2.7

DISCUSSION:
TCP reserves port numbers in the range 0-255 for
"well-known" ports, used to access services that are
standardized across the Internet. The remainder of the
port space can be freely allocated to application
processes. Current well-known port definitions are
listed in the RFC entitled "Assigned Numbers"
[INTRO:6]. A prerequisite for defining a new well-known
port is an RFC documenting the proposed service
in enough detail to allow new implementations.

Some systems extend this notion by adding a third
subdivision of the TCP port space: reserved ports,
which are generally used for operating-system-specific
services. For example, reserved ports might fall
between 256 and some system-dependent upper limit.
Some systems further choose to protect well-known and
reserved ports by permitting only privileged users to
open TCP connections with those port values. This is
perfectly reasonable as long as the host does not
assume that all hosts protect their low-numbered ports
in this manner.
4.2.2.2 Use of Push: RFC-793 Section 2.8

When an application issues a series of SEND calls without
setting the PUSH flag, the TCP MAY aggregate the data
internally without sending it. Similarly, when a series of
segments is received without the PSH bit, a TCP MAY queue
the data internally without passing it to the receiving
application.

The PSH bit is not a record marker and is independent of
segment boundaries. The transmitter SHOULD collapse
successive PSH bits when it packetizes data, to send the
largest possible segment.

A TCP MAY implement PUSH flags on SEND calls. If PUSH flags
are not implemented, then the sending TCP: (1) must not
buffer data indefinitely, and (2) MUST set the PSH bit in
the last buffered segment (i.e., when there is no more
queued data to be sent).
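
A minimal sketch of the no-PUSH-flag case above. It assumes a hypothetical packetize() helper that splits the queued octets into segments of at most mss octets. The PSH bit is set only on the segment that empties the send queue, so successive pushes collapse into one and no queued data is held back. The names and the caller-supplied output array are illustrative only. Whether and when each planned segment actually goes out (window, Nagle algorithm) is outside this sketch.

```c
#include <stdbool.h>
#include <stddef.h>

/* One planned outgoing segment: payload length and PSH bit. */
struct seg_plan {
    size_t len;
    bool psh;
};

/*
 * Split `queued` octets into segments of at most `mss` octets.
 * PSH is set only on the segment that leaves the queue empty.
 * Returns the number of entries written to `out` (at most max_out);
 * anything that does not fit stays queued, without PSH.
 */
size_t packetize(size_t queued, size_t mss, struct seg_plan *out, size_t max_out) {
    size_t n = 0;
    if (mss == 0)
        return 0;
    while (queued > 0 && n < max_out) {
        size_t len = queued < mss ? queued : mss;
        queued -= len;
        out[n].len = len;
        out[n].psh = (queued == 0); /* no more queued data: push it */
        n++;
    }
    return n;
}
```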

The discussion in RFC-793 on pages 48, 50, and 74
erroneously implies that a received PSH flag must be passed
to the application layer. Passing a received PSH flag to
the application layer is now OPTIONAL.

An application program is logically required to set the PUSH
flag in a SEND call whenever it needs to force delivery of
the data to avoid a communication deadlock. However, a TCP
SHOULD send a maximum-sized segment whenever possible, to
improve performance (see Section 4.2.3.4).

DISCUSSION:
When the PUSH flag is not implemented on SEND calls,
i.e., when the application/TCP interface uses a pure
streaming model, responsibility for aggregating any
tiny data fragments to form reasonably sized segments
is partially borne by the application layer.

Generally, an interactive application protocol must set
the PUSH flag at least in the last SEND call in each
command or response sequence. A bulk transfer protocol
like FTP should set the PUSH flag on the last segment
of a file or when necessary to prevent buffer deadlock.

At the receiver, the PSH bit forces buffered data to be
delivered to the application (even if less than a full
buffer has been received). Conversely, the lack of a
PSH bit can be used to avoid unnecessary wakeup calls
to the application process; this can be an important
performance optimization for large timesharing hosts.
Passing the PSH bit to the receiving application allows
an analogous optimization within the application.

4.2.2.3 Window Size: RFC-793 Section 3.1

The window size MUST be treated as an unsigned number, or
else large window sizes will appear like negative windows
and TCP will not work. It is RECOMMENDED that
implementations reserve 32-bit fields for the send and
receive window sizes in the connection record and do all
window computations with 32 bits.

DISCUSSION:
It is known that the window field in the TCP header is
too small for high-speed, long-delay paths.
Experimental TCP options have been defined to extend
the window size; see for example [TCP:11]. In
anticipation of the adoption of such an extension, TCP
implementors should treat windows as 32 bits.
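
To illustrate the unsigned, 32-bit window rule, here is a small sketch. It assumes the raw 16-bit Window field has already been converted from network byte order. Reading that field through a signed 16-bit type would make any window of 32768 or more look negative. Widening it into an unsigned 32-bit connection-record field preserves it. The window_shift parameter is a hypothetical stand-in for a window-extension option of the kind the DISCUSSION anticipates. With no extension it is 0.

```c
#include <stdint.h>

/* Connection-record window fields kept at 32 bits, as recommended. */
struct tcb_window {
    uint32_t snd_wnd;
    uint32_t rcv_wnd;
};

/* Treat the header field as unsigned and widen it to 32 bits.
   window_shift is hypothetical (0 = no extension) and must be <= 16
   so the result still fits in 32 bits. A signed 16-bit view would
   read 0xFFFF as -1 instead of 65535. */
uint32_t header_window_to_32(uint16_t raw_field, unsigned window_shift) {
    return (uint32_t)raw_field << window_shift;
}

/* Right edge of the send window, computed in 32-bit sequence space. */
uint32_t send_right_edge(uint32_t snd_una, uint32_t snd_wnd) {
    return snd_una + snd_wnd; /* wraps modulo 2^32 like sequence numbers */
}
```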

4.2.2.4 Urgent Pointer: RFC-793 Section 3.1

The second sentence of the Urgent Pointer description in
RFC-793 Section 3.1 is in error: the urgent pointer points
to the sequence number of the LAST octet (not LAST+1) in a
sequence of urgent data. The description on page 56 (last
sentence) is correct.

A TCP MUST support a sequence of urgent data of any length.

A TCP MUST inform the application layer asynchronously
whenever it receives an Urgent pointer and there was
previously no pending urgent data, or whenever the Urgent
pointer advances in the data stream. There MUST be a way
for the application to learn how much urgent data remains to
be read from the connection, or at least to determine
whether or not more urgent data remains to be read.
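
A minimal receive-side sketch of these rules. It assumes the corrected reading above, so that SEG.SEQ + SEG.UP names the last urgent octet. The struct and function names are hypothetical. urg_on_segment() returns true exactly when the application must be notified: urgent data is new, or the pointer advanced. urg_remaining() reports how many urgent octets are still unread. Sequence comparisons use modulo-2^32 arithmetic.

```c
#include <stdbool.h>
#include <stdint.h>

static bool seq_lt(uint32_t a, uint32_t b)  { return (int32_t)(a - b) < 0; }
static bool seq_leq(uint32_t a, uint32_t b) { return (int32_t)(a - b) <= 0; }

struct urg_state {
    bool pending;     /* urgent data not yet consumed by the application */
    uint32_t rcv_up;  /* sequence number of the LAST urgent octet */
};

/* Called for each acceptable segment with URG set.
   Returns true if the application must be told asynchronously. */
bool urg_on_segment(struct urg_state *u, uint32_t seg_seq, uint16_t seg_up) {
    uint32_t last = seg_seq + seg_up;   /* LAST urgent octet, not LAST+1 */
    if (!u->pending || seq_lt(u->rcv_up, last)) {
        u->pending = true;
        u->rcv_up = last;
        return true;                    /* new urgent data, or pointer advanced */
    }
    return false;
}

/* Urgent octets still unread, given the sequence number of the next
   octet the application will read. Clears `pending` once none remain. */
uint32_t urg_remaining(struct urg_state *u, uint32_t next_read) {
    if (!u->pending)
        return 0;
    if (seq_leq(next_read, u->rcv_up))
        return u->rcv_up - next_read + 1;
    u->pending = false;
    return 0;
}
```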

DISCUSSION:
Although the Urgent mechanism may be used for any
application, it is normally used to send "interrupt"-type
commands to a Telnet program (see "Using Telnet Synch
Sequence" section in [INTRO:1]).

The asynchronous or "out-of-band" notification will
allow the application to go into "urgent mode", reading
data from the TCP connection. This allows control
commands to be sent to an application whose normal
input buffers are full of unprocessed data.

IMPLEMENTATION:
The generic ERROR-REPORT() upcall described in Section
4.2.4.1 is a possible mechanism for informing the
application of the arrival of urgent data.

4.2.2.5 TCP Options: RFC-793 Section 3.1

A TCP MUST be able to receive a TCP option in any segment.
A TCP MUST ignore without error any TCP option it does not
implement, assuming that the option has a length field (all
TCP options defined in the future will have length fields).
TCP MUST be prepared to handle an illegal option length
(e.g., zero) without crashing; a suggested procedure is to
reset the connection and log the reason.
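
One way to meet these three requirements, sketched under stated assumptions. Option kinds 0 (End of Option List), 1 (No-Operation) and 2 (Maximum Segment Size, length 4) follow their RFC-793 definitions. Every other kind is skipped using its length field. A length below 2, or one that runs past the end of the option area, is reported as an error so that the caller (not shown) can reset the connection and log the reason. parse_tcp_options() and its result codes are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

enum opt_result { OPT_OK = 0, OPT_BAD_LENGTH = -1 };

/* Parse the option octets that follow the fixed 20-octet TCP header.
   If an MSS option is present, *mss is set; otherwise it is unchanged. */
enum opt_result parse_tcp_options(const uint8_t *opt, size_t optlen, uint16_t *mss) {
    size_t i = 0;
    while (i < optlen) {
        uint8_t kind = opt[i];
        if (kind == 0)                    /* End of Option List */
            break;
        if (kind == 1) {                  /* No-Operation: one octet */
            i++;
            continue;
        }
        if (i + 1 >= optlen)              /* no room for the length octet */
            return OPT_BAD_LENGTH;
        uint8_t len = opt[i + 1];
        if (len < 2 || len > optlen - i)  /* e.g. zero would loop forever */
            return OPT_BAD_LENGTH;
        if (kind == 2 && len == 4)        /* Maximum Segment Size */
            *mss = (uint16_t)((opt[i + 2] << 8) | opt[i + 3]);
        /* Any other kind (or an MSS of unexpected length) is skipped
           silently using its length field. */
        i += len;
    }
    return OPT_OK;
}
```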

4.2.2.6 Maximum Segment Size Option: RFC-793 Section 3.1

TCP MUST implement both sending and receiving the Maximum
Segment Size option [TCP:4].

TCP SHOULD send an MSS (Maximum Segment Size) option in
every SYN segment when its receive MSS differs from the
default 536, and MAY send it always.

If an MSS option is not received at connection setup, TCP
MUST assume a default send MSS of 536 (576-40) [TCP:4].

The maximum size of a segment that TCP really sends, the
"effective send MSS," MUST be the smaller of the send MSS
(which reflects the available reassembly buffer size at the
remote host) and the largest size permitted by the IP layer:

    Eff.snd.MSS = min(SendMSS+20, MMS_S) - TCPhdrsize - IPoptionsize

where:

*  SendMSS is the MSS value received from the remote host,
   or the default 536 if no MSS option is received.

*  MMS_S is the maximum size for a transport-layer message
   that TCP may send.

*  TCPhdrsize is the size of the TCP header; this is
   normally 20, but may be larger if TCP options are to be
   sent.

*  IPoptionsize is the size of any IP options that TCP
   will pass to the IP layer with the current message.

The MSS value to be sent in an MSS option must be less than
or equal to:

    MMS_R - 20

where MMS_R is the maximum size for a transport-layer
message that can be received (and reassembled). TCP obtains
MMS_R and MMS_S from the IP layer; see the generic call
GET_MAXSIZES in Section 3.4.
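
The two formulas above translate directly into code. The function names are hypothetical. The worked values in the test assume MMS_S = MMS_R = 1480, that is, a 1500-octet link MTU less a 20-octet IP header with no IP options. Those figures are an illustrative assumption, not values given here.

```c
#include <stdint.h>

#define DEFAULT_SEND_MSS 536u  /* assumed when no MSS option is received */

/* Eff.snd.MSS = min(SendMSS+20, MMS_S) - TCPhdrsize - IPoptionsize */
uint32_t eff_snd_mss(uint32_t send_mss, uint32_t mms_s,
                     uint32_t tcp_hdr_size, uint32_t ip_option_size) {
    uint32_t limit = send_mss + 20;          /* SendMSS + 20 */
    if (mms_s < limit)
        limit = mms_s;                       /* min(SendMSS+20, MMS_S) */
    if (tcp_hdr_size + ip_option_size >= limit)
        return 0;                            /* guard against underflow */
    return limit - tcp_hdr_size - ip_option_size;
}

/* Largest value this TCP may place in its own MSS option: MMS_R - 20. */
uint32_t mss_to_advertise(uint32_t mms_r) {
    return mms_r > 20 ? mms_r - 20 : 0;
}
```

For example, with 12 octets of TCP options the header grows to 32 octets and a peer's MSS of 1460 yields an effective send MSS of 1448.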

DISCUSSION:
The choice of TCP segment size has a strong effect on
performance. Larger segments increase throughput by
amortizing header size and per-datagram processing
overhead over more data bytes; however, if the packet
is so large that it causes IP fragmentation, efficiency
drops sharply if any fragments are lost [IP:9].

Some TCP implementations send an MSS option only if the
destination host is on a non-connected network.
However, in general the TCP layer may not have the
appropriate information to make this decision, so it is
preferable to leave to the IP layer the task of
determining a suitable MTU for the Internet path. We
therefore recommend that TCP always send the option (if
not 536) and that the IP layer determine MMS_R as
specified in Sections 3.3.3 and 3.4. A proposed IP-layer
mechanism to measure the MTU would then modify the IP
layer without changing TCP.

4.2.2.7 TCP Checksum: RFC-793 Section 3.1

Unlike the UDP checksum (see Section 4.1.3.4), the TCP
checksum is never optional. The sender MUST generate it and
the receiver MUST check it.
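
For concreteness, here is a sketch of generating and checking the checksum. It assumes the RFC-793 definition: the 16-bit one's complement of the one's complement sum over a 12-octet pseudo-header (source address, destination address, a zero octet, protocol 6, TCP length) plus the TCP header and data. The checksum field (octets 16-17 of the header) is taken as zero while generating. Addresses are passed as host-order integers, and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Add 16-bit big-endian words; an odd final octet is zero-padded. */
static uint32_t sum_words(uint32_t sum, const uint8_t *p, size_t len) {
    for (; len > 1; p += 2, len -= 2)
        sum += (uint32_t)p[0] << 8 | p[1];
    if (len == 1)
        sum += (uint32_t)p[0] << 8;
    return sum;
}

/* One's complement sum of pseudo-header + segment, folded to 16 bits. */
static uint16_t tcp_sum(uint32_t src, uint32_t dst, const uint8_t *seg, size_t len) {
    const uint8_t ph[12] = {
        (uint8_t)(src >> 24), (uint8_t)(src >> 16), (uint8_t)(src >> 8), (uint8_t)src,
        (uint8_t)(dst >> 24), (uint8_t)(dst >> 16), (uint8_t)(dst >> 8), (uint8_t)dst,
        0, 6, (uint8_t)(len >> 8), (uint8_t)len  /* zero, protocol 6, TCP length */
    };
    uint32_t sum = sum_words(sum_words(0, ph, sizeof ph), seg, len);
    while (sum >> 16)
        sum = (sum & 0xFFFF) + (sum >> 16);      /* end-around carry */
    return (uint16_t)sum;
}

/* Sender: zero the field, then store the complement of the sum. */
void tcp_checksum_fill(uint8_t *seg, size_t len, uint32_t src, uint32_t dst) {
    seg[16] = seg[17] = 0;
    uint16_t c = (uint16_t)~tcp_sum(src, dst, seg, len);
    seg[16] = (uint8_t)(c >> 8);
    seg[17] = (uint8_t)c;
}

/* Receiver: a correct segment sums to 0xFFFF over everything. */
bool tcp_checksum_ok(const uint8_t *seg, size_t len, uint32_t src, uint32_t dst) {
    return tcp_sum(src, dst, seg, len) == 0xFFFF;
}
```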

4.2.2.8 TCP Connection State Diagram: RFC-793 Section 3.2, page 23

There are several problems with this diagram:

(a) The arrow from SYN-SENT to SYN-RCVD should be labeled
    with "snd SYN,ACK", to agree with the text on page 68
    and with Figure 8.

(b) There could be an arrow from SYN-RCVD state to LISTEN
    state, conditioned on receiving a RST after a passive
    open (see the text on page 70).

(c) It is possible to go directly from FIN-WAIT-1 to the
    TIME-WAIT state (see page 75 of the spec).

4.2.2.9 Initial Sequence Number Selection: RFC-793 Section 3.3, page 27

A TCP MUST use the specified clock-driven selection of
initial sequence numbers.
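
A sketch of the clock-driven selection that RFC-793 specifies. There, the initial sequence number is read from a 32-bit counter whose low-order bit advances roughly every 4 microseconds, so the counter wraps after 2^32 ticks. The microsecond time source is passed in as a parameter, because how a host obtains it is outside the scope of this sketch. isn_from_clock() is a hypothetical name.

```c
#include <stdint.h>

/* RFC-793 ISN generator: a 32-bit clock ticking about once per
   4 microseconds. `usec_now` is any monotonically increasing
   microsecond counter supplied by the host. */
uint32_t isn_from_clock(uint64_t usec_now) {
    return (uint32_t)(usec_now / 4u);  /* truncation = wrap modulo 2^32 */
}
```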

4.2.2.10 Simultaneous Open Attempts: RFC-793 Section 3.4, page 32

There is an error in Figure 8: the packet on line 7 should
be identical to the packet on line 5.

A TCP MUST support simultaneous open attempts.

DISCUSSION:
It sometimes surprises implementors that if two
applications attempt to simultaneously connect to each
other, only one connection is generated instead of two.
This was an intentional design decision; don't try to
"fix" it.

4.2.2.11 Recovery from Old Duplicate SYN: RFC-793 Section 3.4, page 33

Note that a TCP implementation MUST keep track of whether a
connection has reached SYN-RCVD state as the result of a
passive OPEN or an active OPEN.

4.2.2.12 RST Segment: RFC-793 Section 3.4

A TCP SHOULD allow a received RST segment to include data.

DISCUSSION:
It has been suggested that a RST segment could contain
ASCII text that encoded and explained the cause of the
RST. No standard has yet been established for such
data.

4.2.2.13 Closing a Connection: RFC-793 Section 3.5

A TCP connection may terminate in two ways: (1) the normal
TCP close sequence using a FIN handshake, and (2) an "abort"
in which one or more RST segments are sent and the
connection state is immediately discarded. If a TCP
connection is closed by the remote site, the local
application MUST be informed whether it closed normally or
was aborted.

The normal TCP close sequence delivers buffered data
reliably in both directions. Since the two directions of a
TCP connection are closed independently, it is possible for
a connection to be "half closed," i.e., closed in only one
direction, and a host is permitted to continue sending data
in the open direction on a half-closed connection.

A host MAY implement a "half-duplex" TCP close sequence, so
that an application that has called CLOSE cannot continue to
read data from the connection. If such a host issues a
CLOSE call while received data is still pending in TCP, or
if new data is received after CLOSE is called, its TCP
SHOULD send a RST to show that data was lost.

When a connection is closed actively, it MUST linger in
TIME-WAIT state for a time 2xMSL (Maximum Segment Lifetime).
However, it MAY accept a new SYN from the remote TCP to
reopen the connection directly from TIME-WAIT state, if it:

(1) assigns its initial sequence number for the new
    connection to be larger than the largest sequence
    number it used on the previous connection incarnation,
    and

(2) returns to TIME-WAIT state if the SYN turns out to be
    an old duplicate.
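
Condition (1) can be checked with modulo-2^32 sequence arithmetic. The sketch below shows one possible policy, which this section does not prescribe. It uses the clock-driven ISN when that already lies beyond the largest sequence number used by the old incarnation. Otherwise it starts just past that number. prev_max_seq is a hypothetical name for that value, assumed to be kept in the TIME-WAIT record. Condition (2), falling back to TIME-WAIT on an old duplicate SYN, belongs to the state machine and is not shown.

```c
#include <stdbool.h>
#include <stdint.h>

/* a > b in modulo-2^32 sequence space (valid for distances < 2^31). */
static bool seq_gt(uint32_t a, uint32_t b) { return (int32_t)(a - b) > 0; }

/* Choose an ISN for a connection reopened from TIME-WAIT so that it
   exceeds every sequence number used by the previous incarnation. */
uint32_t timewait_reopen_isn(uint32_t clock_isn, uint32_t prev_max_seq) {
    if (seq_gt(clock_isn, prev_max_seq))
        return clock_isn;        /* the clock has already moved past */
    return prev_max_seq + 1;     /* smallest value satisfying (1) */
}
```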

DISCUSSION:
TCP's full-duplex data-preserving close is a feature
that is not included in the analogous ISO transport
protocol TP4.

Some systems have not implemented half-closed
connections, presumably because they do not fit into
the I/O model of their particular operating system. On
these systems, once an application has called CLOSE, it
can no longer read input data from the connection; this
is referred to as a "half-duplex" TCP close sequence.

The graceful close algorithm of TCP requires that the
connection state remain defined on (at least) one end
of the connection, for a timeout period of 2xMSL, i.e.,
4 minutes. During this period, the (remote socket,
local socket) pair that defines the connection is busy
and cannot be reused. To shorten the time that a given
port pair is tied up, some TCPs allow a new SYN to be
accepted in TIME-WAIT state.

4.2.2.14 Data Communication: RFC-793 Section 3.7, page 40

Since RFC-793 was written, there has been extensive work on
TCP algorithms to achieve efficient data communication.
Later sections of the present document describe required and
recommended TCP algorithms to determine when to send data
(Section 4.2.3.4), when to send an acknowledgment (Section
4.2.3.2), and when to update the window (Section 4.2.3.3).

DISCUSSION:
One important performance issue is "Silly Window
Syndrome" or "SWS" [TCP:5], a stable pattern of small
incremental window movements resulting in extremely
poor TCP performance. Algorithms to avoid SWS are
described below for both the sending side (Section
4.2.3.4) and the receiving side (Section 4.2.3.3).

In brief, SWS is caused by the receiver advancing the
right window edge whenever it has any new buffer space
available to receive data and by the sender using any
incremental window, no matter how small, to send more
data [TCP:5]. The result can be a stable pattern of
sending tiny data segments, even though both sender and
receiver have a large total buffer space for the
connection. SWS can only occur during the transmission
of a large amount of data; if the connection goes
quiescent, the problem will disappear. It is caused by
a typical, straightforward implementation of window
management, but the sender and receiver algorithms
given below will avoid it.

Another important TCP performance issue is that some
applications, especially remote login to character-at-a-time
hosts, tend to send streams of one-octet data
segments. To avoid deadlocks, every TCP SEND call from
such applications must be "pushed", either explicitly
by the application or else implicitly by TCP. The
result may be a stream of TCP segments that contain one
data octet each, which makes very inefficient use of
the Internet and contributes to Internet congestion.
The Nagle Algorithm described in Section 4.2.3.4
provides a simple and effective solution to this
problem. It does have the effect of clumping
characters over Telnet connections; this may initially
surprise users accustomed to single-character echo, but
user acceptance has not been a problem.

Note that the Nagle algorithm and the send SWS
avoidance algorithm play complementary roles in
improving performance. The Nagle algorithm discourages
sending tiny segments when the data to be sent
increases in small increments, while the SWS avoidance
algorithm discourages small segments resulting from the
right window edge advancing in small increments.

A careless implementation can send two or more
acknowledgment segments per data segment received. For
example, suppose the receiver acknowledges every data
segment immediately. When the application program
subsequently consumes the data and increases the
available receive buffer space again, the receiver may
send a second acknowledgment segment to update the
window at the sender. The extreme case occurs with
single-character segments on TCP connections using the
Telnet protocol for remote login service. Some
implementations have been observed in which each
incoming 1-character segment generates three return
segments: (1) the acknowledgment, (2) a one-byte
increase in the window, and (3) the echoed character,
respectively.

4.2.2.15 Retransmission Timeout: RFC-793 Section 3.7, page 41

The algorithm suggested in RFC-793 for calculating the
retransmission timeout is now known to be inadequate; see
Section 4.2.3.1 below.

Recent work by Jacobson [TCP:7] on Internet congestion and
TCP retransmission stability has produced a transmission
algorithm combining "slow start" with "congestion
avoidance". A TCP MUST implement this algorithm.

If a retransmitted packet is identical to the original
packet (which implies not only that the data boundaries have
not changed, but also that the window and acknowledgment
fields of the header have not changed), then the same IP
Identification field MAY be used (see Section 3.2.1.5).

IMPLEMENTATION:
Some TCP implementors have chosen to "packetize" the
data stream, i.e., to pick segment boundaries when
segments are originally sent and to queue these
segments in a "retransmission queue" until they are
acknowledged. Another design (which may be simpler) is
to defer packetizing until each time data is
transmitted or retransmitted, so there will be no
segment retransmission queue.

In an implementation with a segment retransmission
queue, TCP performance may be enhanced by repacketizing
the segments awaiting acknowledgment when the first
retransmission timeout occurs. That is, the
outstanding segments that fitted would be combined into
one maximum-sized segment, with a new IP Identification
value. The TCP would then retain this combined segment
in the retransmit queue until it was acknowledged.
However, if the first two segments in the
retransmission queue totalled more than one maximum-sized
segment, the TCP would retransmit only the first
segment using the original IP Identification field.

4.2.2.16 Managing the Window: RFC-793 Section 3.7, page 41

A TCP receiver SHOULD NOT shrink the window, i.e., move the
right window edge to the left. However, a sending TCP MUST
be robust against window shrinking, which may cause the
"useable window" (see Section 4.2.3.4) to become negative.

If this happens, the sender SHOULD NOT send new data, but
SHOULD retransmit normally the old unacknowledged data
between SND.UNA and SND.UNA+SND.WND. The sender MAY also
retransmit old data beyond SND.UNA+SND.WND, but SHOULD NOT
time out the connection if data beyond the right window edge
is not acknowledged. If the window shrinks to zero, the TCP
MUST probe it in the standard way (see Section 4.2.2.17).
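
A sketch of the sender-side arithmetic, assuming the usual RFC-793 send variables. The usable window is SND.UNA + SND.WND - SND.NXT. It is evaluated as a signed 64-bit quantity, so a shrunken window shows up as a negative value instead of wrapping to a huge unsigned one. The helper names may_send_new_data() and in_retransmit_range() are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* U = SND.UNA + SND.WND - SND.NXT, allowed to go negative. */
int64_t usable_window(uint32_t snd_una, uint32_t snd_wnd, uint32_t snd_nxt) {
    uint32_t in_flight = snd_nxt - snd_una;  /* modulo 2^32 */
    return (int64_t)snd_wnd - (int64_t)in_flight;
}

/* New data may be sent only while the usable window is positive. */
bool may_send_new_data(uint32_t snd_una, uint32_t snd_wnd, uint32_t snd_nxt) {
    return usable_window(snd_una, snd_wnd, snd_nxt) > 0;
}

/* True if `seq` lies in [SND.UNA, SND.UNA+SND.WND), the range that
   SHOULD be retransmitted normally. Unacknowledged data beyond it must
   not cause the connection to time out. */
bool in_retransmit_range(uint32_t seq, uint32_t snd_una, uint32_t snd_wnd) {
    return (uint32_t)(seq - snd_una) < snd_wnd;
}
```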

DISCUSSION:
Many TCP implementations become confused if the window
shrinks from the right after data has been sent into a
larger window. Note that TCP has a heuristic to select
the latest window update despite possible datagram
reordering; as a result, it may ignore a window update
with a smaller window than previously offered if
neither the sequence number nor the acknowledgment
number is increased.

4.2.2.17 Probing Zero Windows: RFC-793 Section 3.7, page 42

Probing of zero (offered) windows MUST be supported.

A TCP MAY keep its offered receive window closed
indefinitely. As long as the receiving TCP continues to
send acknowledgments in response to the probe segments, the
sending TCP MUST allow the connection to stay open.

DISCUSSION:
It is extremely important to remember that ACK
(acknowledgment) segments that contain no data are not
reliably transmitted by TCP. If zero window probing is
not supported, a connection may hang forever when an
ACK segment that re-opens the window is lost.

The delay in opening a zero window generally occurs
when the receiving application stops taking data from
its TCP. For example, consider a printer daemon
application, stopped because the printer ran out of
paper.

The transmitting host SHOULD send the first zero-window
probe when a zero window has existed for the retransmission
timeout period (see Section 4.2.2.15), and SHOULD increase
exponentially the interval between successive probes.
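
A sketch of the probe timing, with times in milliseconds. The first probe is due one retransmission timeout after the window closed, each later interval doubles, and the doubling stops at a cap. The maximum is left unspecified here, so max_ms is a caller-chosen parameter. The struct and function names are hypothetical. Nothing in the schedule ever gives up on the connection: as long as probes are acknowledged, probing simply continues at the capped interval.

```c
#include <stdint.h>

struct zwp_timer {
    uint32_t interval_ms;  /* wait before the next probe */
    uint32_t max_ms;       /* backoff cap; its value is not specified */
};

/* The window just went to zero: first probe after one RTO. */
void zwp_start(struct zwp_timer *t, uint32_t rto_ms, uint32_t max_ms) {
    t->max_ms = max_ms;
    t->interval_ms = rto_ms < max_ms ? rto_ms : max_ms;
}

/* A probe was acknowledged but the window is still zero: back off
   exponentially, never exceeding the cap (and never overflowing). */
uint32_t zwp_next_interval(struct zwp_timer *t) {
    if (t->interval_ms > t->max_ms / 2)
        t->interval_ms = t->max_ms;
    else
        t->interval_ms *= 2;
    return t->interval_ms;
}
```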

DISCUSSION:
This procedure minimizes delay if the zero-window
condition is due to a lost ACK segment containing a
window-opening update. Exponential backoff is
recommended, possibly with some maximum interval not
specified here. This procedure is similar to that of
the retransmission algorithm, and it may be possible to
combine the two procedures in the implementation.

4.2.2.18 Passive OPEN Calls: RFC-793 Section 3.8

Every passive OPEN call either creates a new connection
record in LISTEN state, or it returns an error; it MUST NOT
affect any previously created connection record.

A TCP that supports multiple concurrent users MUST provide
an OPEN call that will functionally allow an application to
LISTEN on a port while a connection block with the same
local port is in SYN-SENT or SYN-RECEIVED state.

DISCUSSION:
Some applications (e.g., SMTP servers) may need to
handle multiple connection attempts at about the same
time. The probability of a connection attempt failing
is reduced by giving the application some means of
listening for a new connection at the same time that an
earlier connection attempt is going through the
three-way handshake.

IMPLEMENTATION:
Acceptable implementations of concurrent opens may
permit multiple passive OPEN calls, or they may allow
"cloning" of LISTEN-state connections from a single
passive OPEN call.

4.2.2.19 Time to Live: RFC-793 Section 3.9, page 52

RFC-793 specified that TCP was to request the IP layer to
send TCP segments with TTL = 60. This is obsolete; the TTL
value used to send TCP segments MUST be configurable. See
Section 3.2.1.7 for discussion.

4.2.2.20 Event Processing: RFC-793 Section 3.9

While it is not strictly required, a TCP SHOULD be capable
of queueing out-of-order TCP segments. Change the "may" in
the last sentence of the first paragraph on page 70 to
"should".

DISCUSSION:
Some small-host implementations have omitted segment
queueing because of limited buffer space. This
omission may be expected to adversely affect TCP
throughput, since loss of a single segment causes all
later segments to appear to be "out of sequence".

In general, the processing of received segments MUST be
implemented to aggregate ACK segments whenever possible.
For example, if the TCP is processing a series of queued
segments, it MUST process them all before sending any ACK
segments.

Here are some detailed error corrections and notes on the
Event Processing section of RFC-793.

(a) CLOSE Call, CLOSE-WAIT state, p. 61: enter LAST-ACK
    state, not CLOSING.

(b) LISTEN state, check for SYN (pp. 65, 66): With a SYN
    bit, if the security/compartment or the precedence is
    wrong for the segment, a reset is sent. The wrong form
    of reset is shown in the text; it should be:

        <SEQ=0><ACK=SEG.SEQ+SEG.LEN><CTL=RST,ACK>

(c) SYN-SENT state, Check for SYN, p. 68: When the
    connection enters ESTABLISHED state, the following
    variables must be set:

        SND.WND <- SEG.WND
        SND.WL1 <- SEG.SEQ
        SND.WL2 <- SEG.ACK

(d) Check security and precedence, p. 71: The first heading
    "ESTABLISHED STATE" should really be a list of all
    states other than SYN-RECEIVED: ESTABLISHED, FIN-WAIT-1,
    FIN-WAIT-2, CLOSE-WAIT, CLOSING, LAST-ACK, and
    TIME-WAIT.

(e) Check SYN bit, p. 71: "In SYN-RECEIVED state and if
    the connection was initiated with a passive OPEN, then
    return this connection to the LISTEN state and return.
    Otherwise...".

(f) Check ACK field, SYN-RECEIVED state, p. 72: When the
    connection enters ESTABLISHED state, the variables
    listed in (c) must be set.

(g) Check ACK field, ESTABLISHED state, p. 72: The ACK is a
    duplicate if SEG.ACK =< SND.UNA (the = was omitted).
    Similarly, the window should be updated if:

        SND.UNA =< SEG.ACK =< SND.NXT.

(h) USER TIMEOUT, p. 77:

    It would be better to notify the application of the
    timeout rather than letting TCP force the connection
    closed. However, see also Section 4.2.3.5.
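
Corrections (c), (f) and (g) can be combined into a small sketch of the ACK-field checks, assuming modulo-2^32 comparisons. The window-update test also applies the SND.WL1/SND.WL2 rule from RFC-793 page 72. That rule is the heuristic that keeps an older, reordered segment from overwriting a newer window. The struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

static bool seq_leq(uint32_t a, uint32_t b) { return (int32_t)(a - b) <= 0; }
static bool seq_lt(uint32_t a, uint32_t b)  { return (int32_t)(a - b) < 0; }

struct snd_vars {
    uint32_t una, nxt, wnd, wl1, wl2;  /* SND.UNA, SND.NXT, SND.WND, ... */
};

/* (c)/(f): values set when the connection enters ESTABLISHED. */
void snd_enter_established(struct snd_vars *s, uint32_t seg_seq,
                           uint32_t seg_ack, uint32_t seg_wnd) {
    s->wnd = seg_wnd;
    s->wl1 = seg_seq;
    s->wl2 = seg_ack;
}

/* (g): the ACK is a duplicate if SEG.ACK =< SND.UNA. */
bool ack_is_duplicate(const struct snd_vars *s, uint32_t seg_ack) {
    return seq_leq(seg_ack, s->una);
}

/* (g) plus the RFC-793 WL1/WL2 test; true if the window was updated. */
bool maybe_update_window(struct snd_vars *s, uint32_t seg_seq,
                         uint32_t seg_ack, uint32_t seg_wnd) {
    if (!(seq_leq(s->una, seg_ack) && seq_leq(seg_ack, s->nxt)))
        return false;                  /* outside SND.UNA =< SEG.ACK =< SND.NXT */
    if (seq_lt(s->wl1, seg_seq) ||
        (s->wl1 == seg_seq && seq_leq(s->wl2, seg_ack))) {
        s->wnd = seg_wnd;
        s->wl1 = seg_seq;
        s->wl2 = seg_ack;
        return true;
    }
    return false;                      /* older segment: keep current window */
}
```

Note that SEG.ACK = SND.UNA counts both as a duplicate and as eligible to update the window, which is what the corrected inequalities in (g) imply.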

4.2.2.21 Acknowledging Queued Segments: RFC-793 Section 3.9

A TCP MAY send an ACK segment acknowledging RCV.NXT when a
valid segment arrives that is in the window but not at the
left window edge.

DISCUSSION:
RFC-793 (see page 74) was ambiguous about whether or
not an ACK segment should be sent when an out-of-order
segment was received, i.e., when SEG.SEQ was unequal to
RCV.NXT.

One reason for ACKing out-of-order segments might be to
support an experimental algorithm known as "fast
retransmit". With this algorithm, the sender uses the
"redundant" ACKs to deduce that a segment has been
lost before the retransmission timer has expired. It
counts the number of times an ACK has been received
with the same value of SEG.ACK and with the same right
window edge. If more than a threshold number of such
ACKs is received, then the segment containing the
octets starting at SEG.ACK is assumed to have been lost
and is retransmitted, without awaiting a timeout. The
threshold is chosen to compensate for the maximum
likely segment reordering in the Internet. There is
not yet enough experience with the fast retransmit
algorithm to determine how useful it is.
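
A sketch of the duplicate-ACK counting just described. The threshold is a parameter because the text gives no value, and the struct and function names are hypothetical. The first ACK carrying a given SEG.ACK and right window edge (SEG.ACK + SEG.WND) starts the count. Each identical ACK after it counts as redundant. The function returns true once, when the redundant count first exceeds the threshold. That return value means the segment starting at SEG.ACK should be retransmitted without waiting for the timer. For brevity, the sketch does not check whether any data is actually outstanding.

```c
#include <stdbool.h>
#include <stdint.h>

struct dupack_state {
    bool have_last;       /* false until the first ACK is seen */
    uint32_t last_ack;    /* SEG.ACK being counted */
    uint32_t last_right;  /* right window edge: SEG.ACK + SEG.WND */
    unsigned dup_count;   /* redundant ACKs seen with those values */
    unsigned threshold;
};

void dupack_init(struct dupack_state *d, unsigned threshold) {
    d->have_last = false;
    d->last_ack = d->last_right = 0;
    d->dup_count = 0;
    d->threshold = threshold;
}

bool dupack_on_ack(struct dupack_state *d, uint32_t seg_ack, uint32_t seg_wnd) {
    uint32_t right = seg_ack + seg_wnd;
    if (d->have_last && seg_ack == d->last_ack && right == d->last_right) {
        d->dup_count++;
        return d->dup_count == d->threshold + 1;  /* "more than" threshold */
    }
    d->have_last = true;        /* new SEG.ACK or window edge: restart */
    d->last_ack = seg_ack;
    d->last_right = right;
    d->dup_count = 0;
    return false;
}
```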
|
|||
|
|
|||
|
4.2.3 SPECIFIC ISSUES
|
|||
|
|
|||
|
4.2.3.1 Retransmission Timeout Calculation
|
|||
|
|
|||
|
A host TCP MUST implement Karn's algorithm and Jacobson's
|
|||
|
algorithm for computing the retransmission timeout ("RTO").
|
|||
|
|
|||
|
o Jacobson's algorithm for computing the smoothed round-
|
|||
|
trip ("RTT") time incorporates a simple measure of the
|
|||
|
variance [TCP:7].
|
|||
|
|
|||
|
o Karn's algorithm for selecting RTT measurements ensures
|
|||
|
that ambiguous round-trip times will not corrupt the
|
|||
|
calculation of the smoothed round-trip time [TCP:6].
|
|||
|
|
|||
|
This implementation also MUST include "exponential backoff"
|
|||
|
for successive RTO values for the same segment.
|
|||
|
Retransmission of SYN segments SHOULD use the same algorithm
|
|||
|
as data segments.
|
|||
|
|
|||
|
DISCUSSION:
There were two known problems with the RTO calculations
specified in RFC-793. First, the accurate measurement
of RTTs is difficult when there are retransmissions.
Second, the algorithm to compute the smoothed round-
trip time is inadequate [TCP:7], because it incorrectly

Internet Engineering Task Force [Page 95]

RFC1122 TRANSPORT LAYER -- TCP October 1989

assumed that the variance in RTT values would be small
and constant. These problems were solved by Karn's and
Jacobson's algorithms, respectively.

The performance increase resulting from the use of
these improvements varies from noticeable to dramatic.
Jacobson's algorithm for incorporating the measured RTT
variance is especially important on a low-speed link,
where the natural variation of packet sizes causes a
large variation in RTT. One vendor found link
utilization on a 9.6 kb/s line went from 10% to 90% as a
result of implementing Jacobson's variance algorithm in
TCP.

The following values SHOULD be used to initialize the
estimation parameters for a new connection:

(a) RTT = 0 seconds.

(b) RTO = 3 seconds. (The smoothed variance is to be
initialized to the value that will result in this RTO.)

The upper and lower bounds on the RTO recommended in
RFC-793 are known to be inadequate on large internets.
The lower bound SHOULD be measured in fractions of a
second (to accommodate high-speed LANs) and the upper
bound should be 2*MSL, i.e., 240 seconds.

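The initialization and bounds above can be combined with a Jacobson-style estimator, as in the following sketch. This document does not fix the gains (1/8 for the smoothed RTT, 1/4 for the mean deviation) or the factor of 4 applied to the deviation. They are assumptions taken from later common practice (for example RFC 6298). The 0.2-second lower bound is likewise only an example of "fractions of a second".

```python
class RtoEstimator:
    """Jacobson-style smoothed RTT / mean-deviation estimator (sketch)."""

    ALPHA = 1 / 8      # gain for the smoothed RTT (assumed)
    BETA = 1 / 4       # gain for the mean deviation (assumed)
    K = 4              # deviation multiplier (assumed)
    MIN_RTO = 0.2      # example "fraction of a second" lower bound (assumed)
    MAX_RTO = 240.0    # upper bound 2*MSL, as recommended above

    def __init__(self) -> None:
        self.srtt = 0.0               # (a) RTT = 0 seconds
        self.rttvar = 3.0 / self.K    # chosen so that (b) RTO = 3 seconds
        self.rto = self._bounded(self.srtt + self.K * self.rttvar)

    def _bounded(self, rto: float) -> float:
        return max(self.MIN_RTO, min(rto, self.MAX_RTO))

    def sample(self, rtt: float) -> float:
        """Fold in one unambiguous RTT measurement; return the new RTO."""
        err = rtt - self.srtt                          # uses the old SRTT
        self.srtt += self.ALPHA * err
        self.rttvar += self.BETA * (abs(err) - self.rttvar)
        self.rto = self._bounded(self.srtt + self.K * self.rttvar)
        return self.rto
```

With these constants the initial variance is 0.75 seconds, which yields the 3-second starting RTO called for in (b). Later specifications instead seed the smoothed RTT from the first measurement rather than starting from zero. That is a design alternative, not what this section recommends.
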
DISCUSSION:
Experience has shown that these initialization values
are reasonable, and that in any case the Karn and
Jacobson algorithms make TCP behavior reasonably
insensitive to the initial parameter choices.

4.2.3.2 When to Send an ACK Segment

A host that is receiving a stream of TCP data segments can
increase efficiency in both the Internet and the hosts by
sending fewer than one ACK (acknowledgment) segment per data
segment received; this is known as a "delayed ACK" [TCP:5].

A TCP SHOULD implement a delayed ACK, but an ACK should not
be excessively delayed; in particular, the delay MUST be
less than 0.5 seconds, and in a stream of full-sized
segments there SHOULD be an ACK for at least every second
segment.

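Below is a minimal sketch of one delayed-ACK policy that satisfies both limits. An ACK goes out at once on every second full-sized segment; otherwise a timer bounds the delay. The 0.2-second timer value and all names are assumptions. The requirement itself only caps the delay below 0.5 seconds.

```python
from typing import Optional


class DelayedAck:
    """Delayed-ACK policy sketch: delay < 0.5 s, ACK every 2nd full segment."""

    MAX_DELAY = 0.2  # assumed value; the requirement is only "< 0.5 s"

    def __init__(self, eff_mss: int):
        self.eff_mss = eff_mss
        self.unacked_full = 0                 # full-sized segments since last ACK
        self.deadline: Optional[float] = None # when a pending ACK must be sent

    def on_segment(self, length: int, now: float) -> bool:
        """Return True if an ACK should be sent immediately."""
        if length >= self.eff_mss:
            self.unacked_full += 1
            if self.unacked_full >= 2:
                self._reset()
                return True
        if self.deadline is None:
            self.deadline = now + self.MAX_DELAY
        return False

    def on_tick(self, now: float) -> bool:
        """Return True if the delayed-ACK timer has expired."""
        if self.deadline is not None and now >= self.deadline:
            self._reset()
            return True
        return False

    def on_ack_sent(self) -> None:
        """Call when the ACK rides on outgoing data or a window update."""
        self._reset()

    def _reset(self) -> None:
        self.unacked_full = 0
        self.deadline = None
```

The on_ack_sent() hook is where the savings described in the discussion come from: if the application replies before the timer fires, the ACK, window update, and response share one segment.
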
DISCUSSION:
A delayed ACK gives the application an opportunity to
update the window and perhaps to send an immediate
response. In particular, in the case of character-mode
remote login, a delayed ACK can reduce the number of
segments sent by the server by a factor of 3 (ACK,
window update, and echo character all combined in one
segment).

In addition, on some large multi-user hosts, a delayed
ACK can substantially reduce protocol processing
overhead by reducing the total number of packets to be
processed [TCP:5]. However, excessive delays on ACK's
can disturb the round-trip timing and packet "clocking"
algorithms [TCP:7].

4.2.3.3 When to Send a Window Update

|
A TCP MUST include a SWS avoidance algorithm in the receiver
|
|||
|
[TCP:5].
|
|||
|
|
|||
|
IMPLEMENTATION:
|
|||
|
The receiver's SWS avoidance algorithm determines when
|
|||
|
the right window edge may be advanced; this is
|
|||
|
customarily known as "updating the window". This
|
|||
|
algorithm combines with the delayed ACK algorithm (see
|
|||
|
Section 4.2.3.2) to determine when an ACK segment
|
|||
|
containing the current window will really be sent to
|
|||
|
the receiver. We use the notation of RFC-793; see
|
|||
|
Figures 4 and 5 in that document.
|
|||
|
|
|||
|
The solution to receiver SWS is to avoid advancing the
|
|||
|
right window edge RCV.NXT+RCV.WND in small increments,
|
|||
|
even if data is received from the network in small
|
|||
|
segments.
|
|||
|
|
|||
|
Suppose the total receive buffer space is RCV.BUFF. At
|
|||
|
any given moment, RCV.USER octets of this total may be
|
|||
|
tied up with data that has been received and
|
|||
|
acknowledged but which the user process has not yet
|
|||
|
consumed. When the connection is quiescent, RCV.WND =
|
|||
|
RCV.BUFF and RCV.USER = 0.
|
|||
|
|
|||
|
Keeping the right window edge fixed as data arrives and
|
|||
|
is acknowledged requires that the receiver offer less
|
|||
|
than its full buffer space, i.e., the receiver must
|
|||
|
specify a RCV.WND that keeps RCV.NXT+RCV.WND constant
|
|||
|
as RCV.NXT increases. Thus, the total buffer space
|
|||
|
RCV.BUFF is generally divided into three parts:
        |<------- RCV.BUFF ---------------->|
             1             2            3
    ----|---------|------------------|------|----
               RCV.NXT               ^
                                  (Fixed)

1 - RCV.USER = data received but not yet consumed;
2 - RCV.WND = space advertised to sender;
3 - Reduction = space available but not yet advertised.

The suggested SWS avoidance algorithm for the receiver
is to keep RCV.NXT+RCV.WND fixed until the reduction
satisfies:

    RCV.BUFF - RCV.USER - RCV.WND  >=
                  min( Fr * RCV.BUFF, Eff.snd.MSS )

where Fr is a fraction whose recommended value is 1/2,
and Eff.snd.MSS is the effective send MSS for the
connection (see Section 4.2.2.6). When the inequality
is satisfied, RCV.WND is set to RCV.BUFF-RCV.USER.

Note that the general effect of this algorithm is to
advance RCV.WND in increments of Eff.snd.MSS (for
realistic receive buffers: Eff.snd.MSS < RCV.BUFF/2).
Note also that the receiver must use its own
Eff.snd.MSS, assuming it is the same as the sender's.

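The receiver-side test translates directly into a small function. The sketch uses hypothetical names for the RFC-793 variables and ignores sequence-number wraparound. It assumes the caller passes the RCV.WND that currently keeps RCV.NXT+RCV.WND fixed.

```python
def receiver_window(rcv_buff: int, rcv_user: int, rcv_wnd: int,
                    eff_snd_mss: int, fr: float = 0.5) -> int:
    """Return the RCV.WND to advertise under receiver SWS avoidance.

    rcv_wnd is the window that keeps RCV.NXT+RCV.WND fixed, i.e. the
    previously advertised window less the octets received since then.
    """
    reduction = rcv_buff - rcv_user - rcv_wnd   # space available, unadvertised
    if reduction >= min(fr * rcv_buff, eff_snd_mss):
        return rcv_buff - rcv_user              # advance the right window edge
    return rcv_wnd                              # keep the right edge fixed
```

Take RCV.BUFF = 4096 and Eff.snd.MSS = 536, with the user consuming everything. After 500 octets have arrived, the receiver keeps offering 3596. It reopens to the full 4096 once 600 octets have arrived.
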
4.2.3.4 When to Send Data

A TCP MUST include a SWS avoidance algorithm in the sender.

A TCP SHOULD implement the Nagle Algorithm [TCP:9] to
coalesce short segments. However, there MUST be a way for
an application to disable the Nagle algorithm on an
individual connection. In all cases, sending data is also
subject to the limitation imposed by the Slow Start
algorithm (Section 4.2.2.15).

DISCUSSION:
The Nagle algorithm is generally as follows:

If there is unacknowledged data (i.e., SND.NXT >
SND.UNA), then the sending TCP buffers all user
data (regardless of the PSH bit), until the
outstanding data has been acknowledged or until
the TCP can send a full-sized segment (Eff.snd.MSS
bytes; see Section 4.2.2.6).

Some applications (e.g., real-time display window
updates) require that the Nagle algorithm be turned
off, so small data segments can be streamed out at the
maximum rate.

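With a BSD-style sockets API, the per-connection switch required above is commonly exposed as the TCP_NODELAY socket option. The Python sketch below uses only the standard socket module. The helper name is hypothetical.

```python
import socket


def open_nodelay_socket() -> socket.socket:
    """Create a TCP socket with the Nagle algorithm disabled."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # TCP_NODELAY = 1 turns the Nagle algorithm off for this socket only;
    # other connections on the host keep their default behavior.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock
```

Because the option is per socket, an application such as a real-time display client can disable coalescing on its own connection without affecting bulk-transfer connections on the same host.
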
IMPLEMENTATION:
The sender's SWS avoidance algorithm is more difficult
than the receiver's, because the sender does not know
(directly) the receiver's total buffer space RCV.BUFF.
An approach which has been found to work well is for
the sender to calculate Max(SND.WND), the maximum send
window it has seen so far on the connection, and to use
this value as an estimate of RCV.BUFF. Unfortunately,
this can only be an estimate; the receiver may at any
time reduce the size of RCV.BUFF. To avoid a resulting
deadlock, it is necessary to have a timeout to force
transmission of data, overriding the SWS avoidance
algorithm. In practice, this timeout should seldom
occur.

The "useable window" [TCP:5] is:

    U = SND.UNA + SND.WND - SND.NXT

i.e., the offered window less the amount of data sent
but not acknowledged. If D is the amount of data
queued in the sending TCP but not yet sent, then the
following set of rules is recommended.

Send data:

(1) if a maximum-sized segment can be sent, i.e., if:

        min(D,U) >= Eff.snd.MSS;

(2) or if the data is pushed and all queued data can
be sent now, i.e., if:

        [SND.NXT = SND.UNA and] PUSHED and D <= U

(the bracketed condition is imposed by the Nagle
algorithm);

(3) or if at least a fraction Fs of the maximum window
can be sent, i.e., if:

        [SND.NXT = SND.UNA and] min(D,U) >= Fs * Max(SND.WND);

(4) or if data is PUSHed and the override timeout
occurs.

Here Fs is a fraction whose recommended value is 1/2.
The override timeout should be in the range 0.1 - 1.0
seconds. It may be convenient to combine this timer
with the timer used to probe zero windows (Section
4.2.2.17).

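Rules (1) through (4) can be expressed as a single predicate. In this sketch the argument names are hypothetical and sequence-number wraparound is ignored. The caller decides whether the override timer has fired. Returning False when nothing is queued or the usable window is zero is an added assumption, since zero-window probing is a separate mechanism.

```python
def usable_window(snd_una: int, snd_wnd: int, snd_nxt: int) -> int:
    """U = SND.UNA + SND.WND - SND.NXT (sequence wraparound ignored)."""
    return snd_una + snd_wnd - snd_nxt


def should_send(d: int, u: int, eff_snd_mss: int, max_snd_wnd: int,
                idle: bool, pushed: bool, override_timeout: bool,
                nagle: bool = True, fs: float = 0.5) -> bool:
    """Sender SWS avoidance, rules (1)-(4); idle means SND.NXT == SND.UNA."""
    if d <= 0 or u <= 0:
        return False                  # nothing queued, or no usable window
    # The bracketed [SND.NXT = SND.UNA] condition applies only with Nagle on.
    nagle_ok = idle or not nagle
    if min(d, u) >= eff_snd_mss:                      # (1) full-sized segment
        return True
    if nagle_ok and pushed and d <= u:                # (2) pushed, all fits
        return True
    if nagle_ok and min(d, u) >= fs * max_snd_wnd:    # (3) Fs of max window
        return True
    return pushed and override_timeout               # (4) override timeout
```

Passing nagle=False models an application that has disabled the Nagle algorithm: the bracketed conditions drop out, but the SWS thresholds in rules (1) and (3) still apply.
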
Finally, note that the SWS avoidance algorithm just
specified is to be used instead of the sender-side
algorithm contained in [TCP:5].

4.2.3.5 TCP Connection Failures

Excessive retransmission of the same segment by TCP
indicates some failure of the remote host or the Internet
path. This failure may be of short or long duration. The
following procedure MUST be used to handle excessive
retransmissions of data segments [IP:11]:

(a) There are two thresholds R1 and R2 measuring the amount
of retransmission that has occurred for the same
segment. R1 and R2 might be measured in time units or
as a count of retransmissions.

(b) When the number of transmissions of the same segment
reaches or exceeds threshold R1, pass negative advice
(see Section 3.3.1.4) to the IP layer, to trigger
dead-gateway diagnosis.

(c) When the number of transmissions of the same segment
reaches a threshold R2 greater than R1, close the
connection.

(d) An application MUST be able to set the value for R2 for
a particular connection. For example, an interactive
application might set R2 to "infinity," giving the user
control over when to disconnect.

(e) TCP SHOULD inform the application of the delivery
problem (unless such information has been disabled by
the application; see Section 4.2.4.1), when R1 is
reached and before R2. This will allow a remote login
(User Telnet) application program to inform the user,
for example.

The value of R1 SHOULD correspond to at least 3
retransmissions, at the current RTO. The value of R2 SHOULD
correspond to at least 100 seconds.

An attempt to open a TCP connection could fail with
excessive retransmissions of the SYN segment or by receipt
of a RST segment or an ICMP Port Unreachable. SYN
retransmissions MUST be handled in the general way just
described for data retransmissions, including notification
of the application layer.

However, the values of R1 and R2 may be different for SYN
and data segments. In particular, R2 for a SYN segment MUST
be set large enough to provide retransmission of the segment
for at least 3 minutes. The application can close the
connection (i.e., give up on the open attempt) sooner, of
course.

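The R1/R2 procedure can be sketched as a small per-segment monitor. Here R1 is counted in retransmissions and R2 in elapsed seconds, which is one of the combinations permitted by (a). The defaults are the SHOULD values above. The two callbacks are hypothetical stand-ins for negative advice to IP and for the application notification.

```python
from typing import Callable, Optional


class RetransmitFailureMonitor:
    """R1/R2 bookkeeping for one segment (sketch; names hypothetical)."""

    def __init__(self, r1_retransmits: int = 3, r2_seconds: float = 100.0,
                 advise_ip: Optional[Callable[[], None]] = None,
                 notify_app: Optional[Callable[[], None]] = None):
        self.r1 = r1_retransmits      # at least 3 retransmissions
        self.r2 = r2_seconds          # at least 100 s; may be float("inf")
        self.advise_ip = advise_ip or (lambda: None)
        self.notify_app = notify_app or (lambda: None)
        self.first_sent: Optional[float] = None
        self.retransmits = 0

    def on_transmit(self, now: float) -> str:
        """Record a (re)transmission; return 'ok', 'r1', or 'close'."""
        if self.first_sent is None:
            self.first_sent = now
            return "ok"
        self.retransmits += 1
        if now - self.first_sent >= self.r2:
            return "close"            # R2 reached: close the connection
        if self.retransmits >= self.r1:
            self.advise_ip()          # negative advice: dead-gateway diagnosis
            self.notify_app()         # report the delivery problem before R2
            return "r1"
        return "ok"
```

For a SYN, the same class would be constructed with r2_seconds=180.0 or more. An interactive application could pass float("inf") and leave the decision to disconnect to the user.
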
DISCUSSION:
Some Internet paths have significant setup times, and
the number of such paths is likely to increase in the
future.

4.2.3.6 TCP Keep-Alives

Implementors MAY include "keep-alives" in their TCP
implementations, although this practice is not universally
accepted. If keep-alives are included, the application MUST
be able to turn them on or off for each TCP connection, and
they MUST default to off.

Keep-alive packets MUST only be sent when no data or
acknowledgement packets have been received for the
connection within an interval. This interval MUST be
configurable and MUST default to no less than two hours.

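On BSD-style sockets, the per-connection keep-alive switch is the SO_KEEPALIVE option, which is off by default. Some systems (Linux, for example) also allow the idle interval to be set per socket with TCP_KEEPIDLE. The sketch below uses Python's socket module. Because TCP_KEEPIDLE is platform-specific, the code checks that it exists before using it. The helper name is hypothetical.

```python
import socket


def enable_keepalive(sock: socket.socket, idle_seconds: int = 7200) -> None:
    """Turn keep-alives on for one connection (they default to off)."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Idle interval before the first probe; 7200 s matches the
    # "no less than two hours" default. Not every platform has this option.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_seconds)
```
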
It is extremely important to remember that ACK segments that
contain no data are not reliably transmitted by TCP.
Consequently, if a keep-alive mechanism is implemented it
MUST NOT interpret failure to respond to any specific probe
as a dead connection.

An implementation SHOULD send a keep-alive segment with no
data; however, it MAY be configurable to send a keep-alive
segment containing one garbage octet, for compatibility with
erroneous TCP implementations.

DISCUSSION:
A "keep-alive" mechanism periodically probes the other
end of a connection when the connection is otherwise
idle, even when there is no data to be sent. The TCP
specification does not include a keep-alive mechanism
because it could: (1) cause perfectly good connections
to break during transient Internet failures; (2)
consume unnecessary bandwidth ("if no one is using the
connection, who cares if it is still good?"); and (3)
cost money for an Internet path that charges for
packets.

Some TCP implementations, however, have included a
keep-alive mechanism. To confirm that an idle
connection is still active, these implementations send
a probe segment designed to elicit a response from the
peer TCP. Such a segment generally contains SEG.SEQ =
SND.NXT-1 and may or may not contain one garbage octet
of data. Note that on a quiet connection the sender's
SND.NXT equals the receiver's RCV.NXT, so that this
SEG.SEQ will be outside the window. Therefore, the
probe causes the receiver to return an acknowledgment
segment, confirming that the connection is still live.
If the peer has dropped the connection due to a network
partition or a crash, it will respond with a RST instead
of an acknowledgment segment.

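Constructing the probe described above mostly comes down to sequence arithmetic modulo 2**32. In this sketch the function name is hypothetical. Using a zero byte as the optional garbage octet is an arbitrary choice.

```python
from typing import Tuple

SEQ_MOD = 2 ** 32  # TCP sequence numbers wrap modulo 2**32


def keepalive_probe(snd_nxt: int, garbage_octet: bool = False) -> Tuple[int, bytes]:
    """Return (SEG.SEQ, payload) for a keep-alive probe (sketch).

    SEG.SEQ = SND.NXT - 1 lies just left of the peer's window, so a live
    peer answers with an ACK, and a peer that lost the connection with RST.
    """
    seq = (snd_nxt - 1) % SEQ_MOD
    payload = b"\x00" if garbage_octet else b""  # optional compatibility octet
    return seq, payload
```
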
Unfortunately, some misbehaved TCP implementations fail
to respond to a segment with SEG.SEQ = SND.NXT-1 unless
the segment contains data. Alternatively, an
implementation could determine whether a peer responded
correctly to keep-alive packets with no garbage data
octet.

A TCP keep-alive mechanism should only be invoked in
server applications that might otherwise hang
indefinitely and consume resources unnecessarily if a
client crashes or aborts a connection during a network
failure.

4.2.3.7 TCP Multihoming

If an application on a multihomed host does not specify the
local IP address when actively opening a TCP connection,
then the TCP MUST ask the IP layer to select a local IP
address before sending the (first) SYN. See the function
GET_SRCADDR() in Section 3.4.

At all other times, a previous segment has either been sent
or received on this connection, and TCP MUST use the same
local address that was used in those previous segments.

4.2.3.8 IP Options

When received options are passed up to TCP from the IP
layer, TCP MUST ignore options that it does not understand.

A TCP MAY support the Time Stamp and Record Route options.

An application MUST be able to specify a source route when
it actively opens a TCP connection, and this MUST take
precedence over a source route received in a datagram.

When a TCP connection is OPENed passively and a packet
arrives with a completed IP Source Route option (containing
a return route), TCP MUST save the return route and use it
for all segments sent on this connection. If a different
source route arrives in a later segment, the later
definition SHOULD override the earlier one.

4.2.3.9 ICMP Messages

TCP MUST act on an ICMP error message passed up from the IP
layer, directing it to the connection that created the
error. The necessary demultiplexing information can be
found in the IP header contained within the ICMP message.

o Source Quench

TCP MUST react to a Source Quench by slowing
transmission on the connection. The RECOMMENDED
procedure is for a Source Quench to trigger a "slow
start," as if a retransmission timeout had occurred.

o Destination Unreachable -- codes 0, 1, 5

Since these Unreachable messages indicate soft error
conditions, TCP MUST NOT abort the connection, and it
SHOULD make the information available to the
application.

DISCUSSION:
TCP could report the soft error condition directly
to the application layer with an upcall to the
ERROR_REPORT routine, or it could merely note the
message and report it to the application only when
and if the TCP connection times out.

o Destination Unreachable -- codes 2-4

These are hard error conditions, so TCP SHOULD abort
the connection.

o Time Exceeded -- codes 0, 1

This should be handled the same way as Destination
Unreachable codes 0, 1, 5 (see above).

o Parameter Problem

This should be handled the same way as Destination
Unreachable codes 0, 1, 5 (see above).

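The per-message rules above amount to a lookup on the ICMP type and code. The sketch below assumes the message has already been demultiplexed to a connection. The ICMP type numbers (3, 4, 11, 12) are those of the ICMP specification; the enum and function names are hypothetical. Note that ABORT reflects a SHOULD, not a MUST.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    SLOW_START = "slow start"           # Source Quench
    SOFT_ERROR = "report, keep open"    # MUST NOT abort the connection
    ABORT = "abort connection"          # hard error (SHOULD abort)


# ICMP types: 3 Destination Unreachable, 4 Source Quench,
# 11 Time Exceeded, 12 Parameter Problem.
def classify_icmp(icmp_type: int, code: int) -> Optional[Action]:
    """Map an ICMP error for a TCP connection to the action listed above."""
    if icmp_type == 4:
        return Action.SLOW_START
    if icmp_type == 3:
        if code in (0, 1, 5):
            return Action.SOFT_ERROR
        if code in (2, 3, 4):
            return Action.ABORT
    if icmp_type == 11 and code in (0, 1):
        return Action.SOFT_ERROR
    if icmp_type == 12:
        return Action.SOFT_ERROR
    return None  # not covered by this section
```
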
4.2.3.10 Remote Address Validation

A TCP implementation MUST reject as an error a local OPEN
call for an invalid remote IP address (e.g., a broadcast or
multicast address).

An incoming SYN with an invalid source address must be
ignored either by TCP or by the IP layer (see Section
3.2.1.3).

A TCP implementation MUST silently discard an incoming SYN
segment that is addressed to a broadcast or multicast
address.

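A validity check for the remote address of an active OPEN can be sketched with Python's ipaddress module. The function is hypothetical and covers only the examples named above. It rejects multicast addresses and the limited broadcast address. When the caller supplies an attached network, it also rejects that network's directed broadcast.

```python
import ipaddress
from typing import Optional

LIMITED_BROADCAST = ipaddress.IPv4Address("255.255.255.255")


def valid_remote_address(addr: str,
                         local_net: Optional[ipaddress.IPv4Network] = None) -> bool:
    """Return False for remote addresses an active OPEN must reject (sketch).

    local_net is a hypothetical parameter naming an attached subnet, used
    to recognize its directed-broadcast address.
    """
    ip = ipaddress.IPv4Address(addr)
    if ip.is_multicast or ip == LIMITED_BROADCAST:
        return False
    if local_net is not None and ip == local_net.broadcast_address:
        return False
    return True
```

The same predicate, applied to the destination address of an incoming SYN, supports the silent-discard rule.
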
4.2.3.11 TCP Traffic Patterns

IMPLEMENTATION:
The TCP protocol specification [TCP:1] gives the
implementor much freedom in designing the algorithms
that control the message flow over the connection --
packetizing, managing the window, sending
acknowledgments, etc. These design decisions are
difficult because a TCP must adapt to a wide range of
traffic patterns. Experience has shown that a TCP
implementor needs to verify the design on two extreme
traffic patterns:

o Single-character Segments

Even if the sender is using the Nagle Algorithm,
when a TCP connection carries remote login traffic
across a low-delay LAN the receiver will generally
get a stream of single-character segments. If
remote terminal echo mode is in effect, the
receiver's system will generally echo each
character as it is received.

o Bulk Transfer

When TCP is used for bulk transfer, the data
stream should be made up (almost) entirely of
segments of the size of the effective MSS.
Although TCP uses a sequence number space with
byte (octet) granularity, in bulk-transfer mode
its operation should be as if TCP used a sequence
space that counted only segments.

Experience has furthermore shown that a single TCP can
effectively and efficiently handle these two extremes.

The most important tool for verifying a new TCP
implementation is a packet trace program. There is a
large volume of experience showing the importance of
tracing a variety of traffic patterns with other TCP
implementations and studying the results carefully.

4.2.3.12 Efficiency

IMPLEMENTATION:
Extensive experience has led to the following
suggestions for efficient implementation of TCP:

(a) Don't Copy Data

In bulk data transfer, the primary CPU-intensive
tasks are copying data from one place to another
and checksumming the data. It is vital to
minimize the number of copies of TCP data. Since
the ultimate speed limitation may be fetching data
across the memory bus, it may be useful to combine
the copy with checksumming, doing both with a
single memory fetch.

(b) Hand-Craft the Checksum Routine

A good TCP checksumming routine is typically two
to five times faster than a simple and direct
implementation of the definition. Great care and
clever coding are often required and advisable to
make the checksumming code "blazing fast". See
[TCP:10].

(c) Code for the Common Case

TCP protocol processing can be complicated, but
for most segments there are only a few simple
decisions to be made. Per-segment processing will
be greatly speeded up by coding the main line to
minimize the number of decisions in the most
common case.

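Suggestions (a) and (b) can be illustrated together: the routine below copies a buffer and computes the 16-bit one's-complement Internet checksum in a single pass. It is only a sketch of the structure. In Python it will not be fast; a real implementation would be hand-tuned C or assembly, as (b) advises. The function name is hypothetical.

```python
def copy_and_checksum(src: bytes, dst: bytearray) -> int:
    """Copy src into dst and return the Internet checksum of src.

    One loop both moves the data and accumulates the 16-bit
    one's-complement sum, so each octet is fetched only once.
    dst must be at least len(src) octets long.
    """
    total = 0
    n = len(src)
    for i in range(0, n - 1, 2):
        hi, lo = src[i], src[i + 1]
        dst[i], dst[i + 1] = hi, lo     # copy ...
        total += (hi << 8) | lo         # ... and sum in the same pass
    if n % 2:                           # odd length: pad with a zero octet
        dst[n - 1] = src[n - 1]
        total += src[n - 1] << 8
    while total >> 16:                  # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF
```

For the eight octets 00 01 f2 03 f4 f5 f6 f7, the routine returns 0x220d. Appending those two octets to the data makes the whole block checksum to zero, which is how a receiver verifies it.
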
4.2.4 TCP/APPLICATION LAYER INTERFACE

4.2.4.1 Asynchronous Reports

There MUST be a mechanism for reporting soft TCP error
conditions to the application. Generically, we assume this
takes the form of an application-supplied ERROR_REPORT
routine that may be upcalled [INTRO:7] asynchronously from
the transport layer:

    ERROR_REPORT(local connection name, reason, subreason)

The precise encoding of the reason and subreason parameters
is not specified here. However, the conditions that are
reported asynchronously to the application MUST include:

* ICMP error message arrived (see 4.2.3.9)

* Excessive retransmissions (see 4.2.3.5)

* Urgent pointer advance (see 4.2.2.4).

However, an application program that does not want to
receive such ERROR_REPORT calls SHOULD be able to
effectively disable these calls.

DISCUSSION:
These error reports generally reflect soft errors that
can be ignored without harm by many applications. It
has been suggested that these error report calls should
default to "disabled," but this is not required.

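Below is a minimal sketch of the upcall interface, assuming one callback per connection. The encoding of reason and subreason is deliberately left open above, so the strings used here are hypothetical, as are the class and method names. Setting the callback to None stands in for the application disabling reports.

```python
from typing import Callable, Optional

# Hypothetical shape of ERROR_REPORT(local connection name, reason, subreason).
ErrorReport = Callable[[str, str, str], None]


class Connection:
    """Minimal stand-in showing an optional, disableable error upcall."""

    def __init__(self, name: str, error_report: Optional[ErrorReport] = None):
        self.name = name
        self.error_report = error_report    # None means reports are disabled

    def disable_error_reports(self) -> None:
        self.error_report = None

    def _report(self, reason: str, subreason: str = "") -> None:
        if self.error_report is not None:
            self.error_report(self.name, reason, subreason)

    # The three conditions that MUST be reportable:
    def on_icmp_error(self, icmp_type: int, code: int) -> None:
        self._report("icmp", f"type={icmp_type} code={code}")

    def on_excessive_retransmissions(self) -> None:
        self._report("retransmissions", "R1 reached")

    def on_urgent_pointer_advance(self) -> None:
        self._report("urgent", "pointer advanced")
```
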
4.2.4.2 Type-of-Service

The application layer MUST be able to specify the
Type-of-Service (TOS) for segments that are sent on a
connection. It is not required, but the application SHOULD
be able to change the TOS during the connection lifetime.
TCP SHOULD pass the current TOS value without change to the
IP layer, when it sends segments on the connection.

The TOS will be specified independently in each direction on
the connection, so that the receiver application will
specify the TOS used for ACK segments.

TCP MAY pass the most recently received TOS up to the
application.

DISCUSSION:
Some applications (e.g., SMTP) change the nature of
their communication during the lifetime of a
connection, and therefore would like to change the TOS
specification.

Note also that the OPEN call specified in RFC-793
includes a parameter ("options") in which the caller
can specify IP options such as source route, record
route, or timestamp.

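Where the BSD sockets API is available, per-connection TOS is usually set with the IP_TOS option, and it can be changed again while the connection is open. The sketch uses Python's socket module. The value 0x10 is the TOS "low delay" bit. The helper name is hypothetical.

```python
import socket

IPTOS_LOWDELAY = 0x10  # TOS "low delay" bit


def set_tos(sock: socket.socket, tos: int) -> None:
    """Set the TOS used for segments this socket sends; may be called again."""
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, tos)
```

An application such as SMTP could call set_tos() once for the interactive command phase and again before a bulk data phase.
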
4.2.4.3 Flush Call

Some TCP implementations have included a FLUSH call, which
will empty the TCP send queue of any data for which the user
has issued SEND calls but which is still to the right of the
current send window. That is, it flushes as much queued
send data as possible without losing sequence number
synchronization. This is useful for implementing the "abort
output" function of Telnet.

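A FLUSH call might be implemented roughly as below. This sketch assumes a byte queue whose first octet has sequence number SND.UNA, and it ignores wraparound. It reads "to the right of the current send window" as octets at or beyond SND.UNA+SND.WND. Octets already sent (below SND.NXT) are always kept so that sequence numbers stay synchronized. The function name is hypothetical.

```python
def flush_send_queue(queue: bytearray, snd_una: int, snd_nxt: int,
                     snd_wnd: int) -> int:
    """Drop queued send data lying to the right of the send window (sketch).

    queue[0] holds the octet with sequence number SND.UNA. Returns the
    number of octets discarded.
    """
    # Never discard octets already transmitted (up to SND.NXT), even if the
    # window has since shrunk below them.
    keep = max(snd_nxt, snd_una + snd_wnd) - snd_una
    dropped = max(0, len(queue) - keep)
    del queue[keep:]
    return dropped
```
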
4.2.4.4 Multihoming

The user interface outlined in sections 2.7 and 3.8 of
RFC-793 needs to be extended for multihoming. The OPEN call
MUST have an optional parameter:

    OPEN( ... [local IP address,] ... )

to allow the specification of the local IP address.

DISCUSSION:
Some TCP-based applications need to specify the local
IP address to be used to open a particular connection;
FTP is an example.

IMPLEMENTATION:
A passive OPEN call with a specified "local IP address"
parameter will await an incoming connection request to
that address. If the parameter is unspecified, a
passive OPEN will await an incoming connection request
to any local IP address, and then bind the local IP
address of the connection to the particular address
that is used.

For an active OPEN call, a specified "local IP address"
parameter will be used for opening the connection. If
the parameter is unspecified, the networking software
will choose an appropriate local IP address (see
Section 3.3.4.2) for the connection.

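In the BSD sockets API, the optional local-address parameter of an active OPEN corresponds to calling bind() before connect(). Leaving the socket unbound lets the stack choose the source address. A sketch using Python's socket module (the helper name is hypothetical):

```python
import socket
from typing import Optional, Tuple


def open_from(remote: Tuple[str, int], local_ip: Optional[str] = None) -> socket.socket:
    """Active OPEN with an optional local IP address."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    if local_ip is not None:
        sock.bind((local_ip, 0))  # caller's address; port 0 = any free port
    sock.connect(remote)          # if unbound, the stack picks the source
    return sock
```

For a passive OPEN, binding the listening socket to a specific address or to 0.0.0.0 gives the two behaviors described above.
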
4.2.5 TCP REQUIREMENT SUMMARY
|
|||
|
|
|||
|
| | | | |S| |
|
|||
|
| | | | |H| |F
|
|||
|
| | | | |O|M|o
|
|||
|
| | |S| |U|U|o
|
|||
|
| | |H| |L|S|t
|
|||
|
| |M|O| |D|T|n
|
|||
|
| |U|U|M| | |o
|
|||
|
| |S|L|A|N|N|t
|
|||
|
| |T|D|Y|O|O|t
|
|||
|
FEATURE |SECTION | | | |T|T|e
|
|||
|
-------------------------------------------------|--------|-|-|-|-|-|--
|
|||
|
| | | | | | |
|
|||
|
Push flag | | | | | | |
|
|||
|
Aggregate or queue un-pushed data |4.2.2.2 | | |x| | |
|
|||
|
Sender collapse successive PSH flags |4.2.2.2 | |x| | | |
|
|||
|
SEND call can specify PUSH |4.2.2.2 | | |x| | |
|
|||
|
|
|||
|
|
|||
|
|
|||
|
Internet Engineering Task Force [Page 108]
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
RFC1122 TRANSPORT LAYER -- TCP October 1989
|
|||
|
|
|||
|
|
|||
|
If cannot: sender buffer indefinitely |4.2.2.2 | | | | |x|
|
|||
|
If cannot: PSH last segment |4.2.2.2 |x| | | | |
|
|||
|
Notify receiving ALP of PSH |4.2.2.2 | | |x| | |1
|
|||
|
Send max size segment when possible |4.2.2.2 | |x| | | |
|
|||
|
| | | | | | |
|
|||
|
Window | | | | | | |
|
|||
|
Treat as unsigned number |4.2.2.3 |x| | | | |
|
|||
|
Handle as 32-bit number |4.2.2.3 | |x| | | |
|
|||
|
Shrink window from right |4.2.2.16| | | |x| |
|
|||
|
Robust against shrinking window |4.2.2.16|x| | | | |
|
|||
|
Receiver's window closed indefinitely |4.2.2.17| | |x| | |
|
|||
|
Sender probe zero window |4.2.2.17|x| | | | |
|
|||
|
First probe after RTO |4.2.2.17| |x| | | |
|
|||
|
Exponential backoff |4.2.2.17| |x| | | |
|
|||
|
Allow window stay zero indefinitely |4.2.2.17|x| | | | |
|
|||
|
Sender timeout OK conn with zero wind |4.2.2.17| | | | |x|
|
|||
|
| | | | | | |
|
|||
|
Urgent Data | | | | | | |
|
|||
|
Pointer points to last octet |4.2.2.4 |x| | | | |
|
|||
|
Arbitrary length urgent data sequence |4.2.2.4 |x| | | | |
|
|||
|
Inform ALP asynchronously of urgent data |4.2.2.4 |x| | | | |1
|
|||
|
ALP can learn if/how much urgent data Q'd |4.2.2.4 |x| | | | |1
|
|||
|
| | | | | | |
|
|||
|
TCP Options | | | | | | |
|
|||
|
Receive TCP option in any segment |4.2.2.5 |x| | | | |
|
|||
|
Ignore unsupported options |4.2.2.5 |x| | | | |
|
|||
|
Cope with illegal option length |4.2.2.5 |x| | | | |
|
|||
|
Implement sending & receiving MSS option |4.2.2.6 |x| | | | |
|
|||
|
Send MSS option unless 536 |4.2.2.6 | |x| | | |
|
|||
|
Send MSS option always |4.2.2.6 | | |x| | |
|
|||
|
Send-MSS default is 536 |4.2.2.6 |x| | | | |
|
|||
|
Calculate effective send seg size |4.2.2.6 |x| | | | |
|
|||
|
| | | | | | |
|
|||
|
TCP Checksums | | | | | | |
|
|||
|
Sender compute checksum |4.2.2.7 |x| | | | |
|
|||
|
Receiver check checksum |4.2.2.7 |x| | | | |
|
|||
|
| | | | | | |
|
|||
|
Use clock-driven ISN selection |4.2.2.9 |x| | | | |
|
|||
|
| | | | | | |
|
|||
|
Opening Connections | | | | | | |
|
|||
|
Support simultaneous open attempts |4.2.2.10|x| | | | |
|
|||
|
SYN-RCVD remembers last state |4.2.2.11|x| | | | |
|
|||
|
Passive Open call interfere with others |4.2.2.18| | | | |x|
|
|||
|
Function: simultan. LISTENs for same port |4.2.2.18|x| | | | |
|
|||
|
Ask IP for src address for SYN if necc. |4.2.3.7 |x| | | | |
|
|||
|
Otherwise, use local addr of conn. |4.2.3.7 |x| | | | |
|
|||
|
OPEN to broadcast/multicast IP Address |4.2.3.14| | | | |x|
|
|||
|
Silently discard seg to bcast/mcast addr |4.2.3.14|x| | | | |
|
|||
|
|
|||
|
|
|||
|
|
|||
|
Internet Engineering Task Force [Page 109]
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
RFC1122 TRANSPORT LAYER -- TCP October 1989
|
|||
|
|
|||
|
|
|||
|
                                                 |        | | | | | |
Closing Connections                              |        | | | | | |
RST can contain data                             |4.2.2.12| |x| | | |
Inform application of aborted conn               |4.2.2.13|x| | | | |
Half-duplex close connections                    |4.2.2.13| | |x| | |
Send RST to indicate data lost                   |4.2.2.13| |x| | | |
In TIME-WAIT state for 2xMSL seconds             |4.2.2.13|x| | | | |
Accept SYN from TIME-WAIT state                  |4.2.2.13| | |x| | |
                                                 |        | | | | | |
Retransmissions                                  |        | | | | | |
Jacobson Slow Start algorithm                    |4.2.2.15|x| | | | |
Jacobson Congestion-Avoidance algorithm          |4.2.2.15|x| | | | |
Retransmit with same IP ident                    |4.2.2.15| | |x| | |
Karn's algorithm                                 |4.2.3.1 |x| | | | |
Jacobson's RTO estimation alg.                   |4.2.3.1 |x| | | | |
Exponential backoff                              |4.2.3.1 |x| | | | |
SYN RTO calc same as data                        |4.2.3.1 | |x| | | |
Recommended initial values and bounds            |4.2.3.1 | |x| | | |
                                                 |        | | | | | |
Generating ACKs:                                 |        | | | | | |
Queue out-of-order segments                      |4.2.2.20| |x| | | |
Process all Q'd before send ACK                  |4.2.2.20|x| | | | |
Send ACK for out-of-order segment                |4.2.2.21| | |x| | |
Delayed ACKs                                     |4.2.3.2 | |x| | | |
  Delay < 0.5 seconds                            |4.2.3.2 |x| | | | |
  Every 2nd full-sized segment ACK'd             |4.2.3.2 |x| | | | |
Receiver SWS-Avoidance Algorithm                 |4.2.3.3 |x| | | | |
                                                 |        | | | | | |
Sending data                                     |        | | | | | |
Configurable TTL                                 |4.2.2.19|x| | | | |
Sender SWS-Avoidance Algorithm                   |4.2.3.4 |x| | | | |
Nagle algorithm                                  |4.2.3.4 | |x| | | |
Application can disable Nagle algorithm          |4.2.3.4 |x| | | | |
                                                 |        | | | | | |
Connection Failures:                             |        | | | | | |
Negative advice to IP on R1 retxs                |4.2.3.5 |x| | | | |
Close connection on R2 retxs                     |4.2.3.5 |x| | | | |
ALP can set R2                                   |4.2.3.5 |x| | | | |1
Inform ALP of R1<=retxs<R2                       |4.2.3.5 | |x| | | |1
Recommended values for R1, R2                    |4.2.3.5 | |x| | | |
Same mechanism for SYNs                          |4.2.3.5 |x| | | | |
R2 at least 3 minutes for SYN                    |4.2.3.5 |x| | | | |
                                                 |        | | | | | |
Send Keep-alive Packets:                         |4.2.3.6 | | |x| | |
  - Application can request                      |4.2.3.6 |x| | | | |
  - Default is "off"                             |4.2.3.6 |x| | | | |
  - Only send if idle for interval               |4.2.3.6 |x| | | | |
  - Interval configurable                        |4.2.3.6 |x| | | | |
  - Default at least 2 hrs.                      |4.2.3.6 |x| | | | |
  - Tolerant of lost ACKs                        |4.2.3.6 |x| | | | |
                                                 |        | | | | | |
IP Options                                       |        | | | | | |
Ignore options TCP doesn't understand            |4.2.3.8 |x| | | | |
Time Stamp support                               |4.2.3.8 | | |x| | |
Record Route support                             |4.2.3.8 | | |x| | |
Source Route:                                    |        | | | | | |
  ALP can specify                                |4.2.3.8 |x| | | | |1
    Overrides src rt in datagram                 |4.2.3.8 |x| | | | |
  Build return route from src rt                 |4.2.3.8 |x| | | | |
    Later src route overrides                    |4.2.3.8 | |x| | | |
                                                 |        | | | | | |
Receiving ICMP Messages from IP                  |4.2.3.9 |x| | | | |
  Dest. Unreach (0,1,5) => inform ALP            |4.2.3.9 | |x| | | |
  Dest. Unreach (0,1,5) => abort conn            |4.2.3.9 | | | | |x|
  Dest. Unreach (2-4) => abort conn              |4.2.3.9 | |x| | | |
  Source Quench => slow start                    |4.2.3.9 | |x| | | |
  Time Exceeded => tell ALP, don't abort         |4.2.3.9 | |x| | | |
  Param Problem => tell ALP, don't abort         |4.2.3.9 | |x| | | |
                                                 |        | | | | | |
Address Validation                               |        | | | | | |
Reject OPEN call to invalid IP address           |4.2.3.10|x| | | | |
Reject SYN from invalid IP address               |4.2.3.10|x| | | | |
Silently discard SYN to bcast/mcast addr         |4.2.3.10|x| | | | |
                                                 |        | | | | | |
TCP/ALP Interface Services                       |        | | | | | |
Error Report mechanism                           |4.2.4.1 |x| | | | |
ALP can disable Error Report Routine             |4.2.4.1 | |x| | | |
ALP can specify TOS for sending                  |4.2.4.2 |x| | | | |
  Passed unchanged to IP                         |4.2.4.2 | |x| | | |
ALP can change TOS during connection             |4.2.4.2 | |x| | | |
Pass received TOS up to ALP                      |4.2.4.2 | | |x| | |
FLUSH call                                       |4.2.4.3 | | |x| | |
Optional local IP addr parm. in OPEN             |4.2.4.4 |x| | | | |
-------------------------------------------------|--------|-|-|-|-|-|--
-------------------------------------------------|--------|-|-|-|-|-|--

FOOTNOTES:

(1) "ALP" means Application-Layer program.
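The retransmission-timer rows of the table (Jacobson's RTO estimation, Karn's algorithm, exponential backoff, and the recommended initial values and bounds, all under Section 4.2.3.1) interact in ways that are easiest to see in code. The Python sketch below is illustrative only. The gains ALPHA = 1/8 and BETA = 1/4 and the multiplier K = 4 are taken from the later RFC 6298 statement of Jacobson's estimator, since the table only names the algorithm. The 3-second initial RTO and the 240-second (2*MSL) upper bound follow the Section 4.2.3.1 recommendations. MIN_RTO is a hypothetical "fraction of a second" lower bound, and the class and method names are invented for this example.

```python
from dataclasses import dataclass
from typing import Optional

# Gains from the RFC 6298 statement of Jacobson's estimator (an assumption;
# the requirements table only names the algorithm).
ALPHA = 1 / 8   # weight of a new RTT sample in the smoothed RTT
BETA = 1 / 4    # weight of a new deviation sample in RTTVAR
K = 4           # multiplier applied to RTTVAR

INITIAL_RTO = 3.0   # seconds: RTO before any valid RTT sample
MIN_RTO = 0.2       # seconds: hypothetical "fraction of a second" lower bound
MAX_RTO = 240.0     # seconds: upper bound of 2*MSL


@dataclass
class RtoEstimator:
    """Hypothetical per-connection retransmission-timeout state."""
    srtt: Optional[float] = None   # smoothed RTT; None until first valid sample
    rttvar: float = 0.0            # smoothed mean deviation of the RTT
    rto: float = INITIAL_RTO       # current retransmission timeout

    @staticmethod
    def _clamp(value: float) -> float:
        return max(MIN_RTO, min(MAX_RTO, value))

    def on_ack(self, rtt_sample: float, was_retransmitted: bool) -> None:
        """Fold in the RTT measured for a newly acknowledged segment."""
        if was_retransmitted:
            # Karn's algorithm: the ACK may belong to either transmission, so
            # take no sample and keep the backed-off RTO for the next segment.
            return
        if self.srtt is None:
            # First valid measurement initializes both estimators.
            self.srtt = rtt_sample
            self.rttvar = rtt_sample / 2
        else:
            # Update RTTVAR first, using the previous SRTT.
            self.rttvar = (1 - BETA) * self.rttvar + BETA * abs(self.srtt - rtt_sample)
            self.srtt = (1 - ALPHA) * self.srtt + ALPHA * rtt_sample
        self.rto = self._clamp(self.srtt + K * self.rttvar)

    def on_timeout(self) -> None:
        """Exponential backoff: double the RTO after each retransmission timeout."""
        self.rto = self._clamp(2 * self.rto)


if __name__ == "__main__":
    est = RtoEstimator()
    est.on_ack(1.0, was_retransmitted=False)   # first sample: 1.0 + 4 * 0.5
    print(est.rto)                             # 3.0
    est.on_timeout()                           # exponential backoff
    print(est.rto)                             # 6.0
    est.on_ack(5.0, was_retransmitted=True)    # ambiguous sample ignored (Karn)
    print(est.rto)                             # 6.0
    est.on_ack(1.0, was_retransmitted=False)   # RTTVAR 0.375, SRTT 1.0
    print(est.rto)                             # 2.5
```

The backed-off RTO survives the ambiguous ACK. It is replaced only when a segment that was never retransmitted is acknowledged, which is the behavior Karn's algorithm calls for. The clock-granularity term in RFC 6298 is omitted to keep the sketch short.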

5. REFERENCES

INTRODUCTORY REFERENCES

[INTRO:1] "Requirements for Internet Hosts -- Application and Support,"
     IETF Host Requirements Working Group, R. Braden, Ed., RFC-1123,
     October 1989.

[INTRO:2] "Requirements for Internet Gateways," R. Braden and J.
     Postel, RFC-1009, June 1987.

[INTRO:3] "DDN Protocol Handbook," NIC-50004, NIC-50005, NIC-50006
     (three volumes), SRI International, December 1985.

[INTRO:4] "Official Internet Protocols," J. Reynolds and J. Postel,
     RFC-1011, May 1987.

     This document is republished periodically with new RFC numbers;
     the latest version must be used.

[INTRO:5] "Protocol Document Order Information," O. Jacobsen and J.
     Postel, RFC-980, March 1986.

[INTRO:6] "Assigned Numbers," J. Reynolds and J. Postel, RFC-1010,
     May 1987.

     This document is republished periodically with new RFC numbers;
     the latest version must be used.

[INTRO:7] "Modularity and Efficiency in Protocol Implementation," D.
     Clark, RFC-817, July 1982.

[INTRO:8] "The Structuring of Systems Using Upcalls," D. Clark, 10th
     ACM SOSP, Orcas Island, Washington, December 1985.

SECONDARY REFERENCES

[INTRO:9] "A Protocol for Packet Network Intercommunication," V. Cerf
     and R. Kahn, IEEE Transactions on Communications, May 1974.

[INTRO:10] "The ARPA Internet Protocol," J. Postel, C. Sunshine, and
     D. Cohen, Computer Networks, Vol. 5, No. 4, July 1981.

[INTRO:11] "The DARPA Internet Protocol Suite," B. Leiner, J. Postel,
     R. Cole and D. Mills, Proceedings INFOCOM 85, IEEE, Washington DC,
     March 1985. Also in: IEEE Communications Magazine, March 1985.
     Also available as ISI-RS-85-153.

[INTRO:12] "Final Text of DIS8473, Protocol for Providing the
     Connectionless Mode Network Service," ANSI, published as RFC-994,
     March 1986.

[INTRO:13] "End System to Intermediate System Routing Exchange
     Protocol," ANSI X3S3.3, published as RFC-995, April 1986.

LINK LAYER REFERENCES

[LINK:1] "Trailer Encapsulations," S. Leffler and M. Karels, RFC-893,
     April 1984.

[LINK:2] "An Ethernet Address Resolution Protocol," D. Plummer,
     RFC-826, November 1982.

[LINK:3] "A Standard for the Transmission of IP Datagrams over
     Ethernet Networks," C. Hornig, RFC-894, April 1984.

[LINK:4] "A Standard for the Transmission of IP Datagrams over IEEE
     802 Networks," J. Postel and J. Reynolds, RFC-1042, February 1988.

     This RFC contains a great deal of information of importance to
     Internet implementers planning to use IEEE 802 networks.

IP LAYER REFERENCES

[IP:1] "Internet Protocol (IP)," J. Postel, RFC-791, September 1981.

[IP:2] "Internet Control Message Protocol (ICMP)," J. Postel, RFC-792,
     September 1981.

[IP:3] "Internet Standard Subnetting Procedure," J. Mogul and J.
     Postel, RFC-950, August 1985.

[IP:4] "Host Extensions for IP Multicasting," S. Deering, RFC-1112,
     August 1989.

[IP:5] "Military Standard Internet Protocol," MIL-STD-1777, Department
     of Defense, August 1983.

     This specification, as amended by RFC-963, is intended to describe
     the Internet Protocol but has some serious omissions (e.g., the
     mandatory subnet extension [IP:3] and the optional multicasting
     extension [IP:4]). It is also out of date. If there is a
     conflict, RFC-791, RFC-792, and RFC-950 must be taken as
     authoritative, while the present document is authoritative over
     all.

[IP:6] "Some Problems with the Specification of the Military Standard
     Internet Protocol," D. Sidhu, RFC-963, November 1985.

[IP:7] "The TCP Maximum Segment Size and Related Topics," J. Postel,
     RFC-879, November 1983.

     Discusses and clarifies the relationship between the TCP Maximum
     Segment Size option and the IP datagram size.

[IP:8] "Internet Protocol Security Options," B. Schofield, RFC-1108,
     October 1989.

[IP:9] "Fragmentation Considered Harmful," C. Kent and J. Mogul, ACM
     SIGCOMM-87, August 1987. Published as ACM Computer Communication
     Review, Vol. 17, No. 5.

     This useful paper discusses the problems created by Internet
     fragmentation and presents alternative solutions.

[IP:10] "IP Datagram Reassembly Algorithms," D. Clark, RFC-815, July
     1982.

     This and the following paper should be read by every implementor.

[IP:11] "Fault Isolation and Recovery," D. Clark, RFC-816, July 1982.

SECONDARY IP REFERENCES

[IP:12] "Broadcasting Internet Datagrams in the Presence of Subnets,"
     J. Mogul, RFC-922, October 1984.

[IP:13] "Name, Addresses, Ports, and Routes," D. Clark, RFC-814, July
     1982.

[IP:14] "Something a Host Could Do with Source Quench: The Source
     Quench Introduced Delay (SQUID)," W. Prue and J. Postel, RFC-1016,
     July 1987.

     This RFC first described directed broadcast addresses. However,
     the bulk of the RFC is concerned with gateways, not hosts.

UDP REFERENCES

[UDP:1] "User Datagram Protocol," J. Postel, RFC-768, August 1980.

TCP REFERENCES

[TCP:1] "Transmission Control Protocol," J. Postel, RFC-793, September
     1981.

[TCP:2] "Transmission Control Protocol," MIL-STD-1778, US Department of
     Defense, August 1984.

     This specification, as amended by RFC-964, is intended to describe
     the same protocol as RFC-793 [TCP:1]. If there is a conflict,
     RFC-793 takes precedence, and the present document is
     authoritative over both.

[TCP:3] "Some Problems with the Specification of the Military Standard
     Transmission Control Protocol," D. Sidhu and T. Blumer, RFC-964,
     November 1985.

[TCP:4] "The TCP Maximum Segment Size and Related Topics," J. Postel,
     RFC-879, November 1983.

[TCP:5] "Window and Acknowledgment Strategy in TCP," D. Clark, RFC-813,
     July 1982.

[TCP:6] "Round Trip Time Estimation," P. Karn and C. Partridge, ACM
     SIGCOMM-87, August 1987.

[TCP:7] "Congestion Avoidance and Control," V. Jacobson, ACM
     SIGCOMM-88, August 1988.

SECONDARY TCP REFERENCES

[TCP:8] "Modularity and Efficiency in Protocol Implementation," D.
     Clark, RFC-817, July 1982.

[TCP:9] "Congestion Control in IP/TCP Internetworks," J. Nagle, RFC-896,
     January 1984.

[TCP:10] "Computing the Internet Checksum," R. Braden, D. Borman, and
     C. Partridge, RFC-1071, September 1988.

[TCP:11] "TCP Extensions for Long-Delay Paths," V. Jacobson and R.
     Braden, RFC-1072, October 1988.

Security Considerations

There are many security issues in the communication layers of host
software, but a full discussion is beyond the scope of this RFC.

The Internet architecture generally provides little protection
against spoofing of IP source addresses, so any security mechanism
that is based upon verifying the IP source address of a datagram
should be treated with suspicion. However, in restricted
environments some source-address checking may be possible. For
example, there might be a secure LAN whose gateway to the rest of the
Internet discards any incoming datagram with a source address that
spoofs the LAN address. In this case, a host on the LAN could use
the source address to test for local vs. remote source. This problem
is complicated by source routing, and some have suggested that
source-routed datagram forwarding by hosts (see Section 3.3.5) should
be outlawed for security reasons.
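The local-versus-remote test described above can be sketched in Python with the standard `ipaddress` module. This is a minimal illustration that rests on two assumptions. First, the LAN prefix `192.0.2.0/24` (a documentation range) is hypothetical. Second, the caller is assumed to already know whether the datagram carried an IP source-route option. The answer can only be trusted if the LAN's gateway really does discard inbound datagrams that spoof the LAN prefix.

```python
import ipaddress

# Hypothetical LAN prefix; a real host would use its configured network.
LOCAL_NET = ipaddress.ip_network("192.0.2.0/24")


def is_local_source(src: str, source_routed: bool,
                    local_net: ipaddress.IPv4Network = LOCAL_NET) -> bool:
    """Classify a datagram's source address as local (True) or remote (False).

    Trusting the result assumes the LAN gateway drops incoming datagrams
    whose source address lies inside local_net. A source-routed datagram
    is conservatively treated as remote, since source routing undermines
    the test.
    """
    if source_routed:
        return False
    return ipaddress.ip_address(src) in local_net


if __name__ == "__main__":
    print(is_local_source("192.0.2.17", source_routed=False))    # True
    print(is_local_source("198.51.100.5", source_routed=False))  # False
    print(is_local_source("192.0.2.17", source_routed=True))     # False
```

Treating every source-routed datagram as remote is one conservative reading of the remark that source routing complicates the problem. It is not a rule stated in this RFC.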

Security-related issues are mentioned in sections concerning the IP
Security option (Section 3.2.1.8), the ICMP Parameter Problem message
(Section 3.2.2.5), IP options in UDP datagrams (Section 4.1.3.2), and
reserved TCP ports (Section 4.2.2.1).

Author's Address

Robert Braden
USC/Information Sciences Institute
4676 Admiralty Way
Marina del Rey, CA 90292-6695

Phone: (213) 822 1511

EMail: Braden@ISI.EDU