NetBSD/share/doc/iso/ucb/trans_serv.nr
1998-01-09 06:34:28 +00:00

700 lines
19 KiB
Plaintext

.\" $NetBSD: trans_serv.nr,v 1.2 1998/01/09 06:34:34 perry Exp $
.\"
.NC "Transport Service Interface"
.sh 1 "General"
.pp
It is assumed that the reader is acquainted with the
set of system calls and library routines that
compose the
Berkeley
Unix interprocess communication service (IPC).
To every extent possible
the ARGO transport service is provided by the same IPC mechanisms
that support
the transport-level services
included in the
AOS distribution.
In some instances, the interface
provided by AOS does not support
the services required by ISO 8073,
so system calls were added to support these services.
It is felt that this is superior to modifying
existing system calls, in order to avoid the
recoding of existing Unix utilities.
.pp
What follows is a description of the system calls
that are used to provide the transport service.
According to Unix custom,
the return value of a system call is 0
if the call succeeds and -1 if
the call fails for any reason.
In the latter case,
the global variable
\fIerrno\fR contains more information
about the error that caused the failure.
In the descriptions of all the system calls for which
this custom is followed,
the return value is named
\fIstatus\fR.
.sh 1 "Connection establishment"
.pp
Establishing a TP connection is similar to
establishing a connection using any other
transport protocol supported by Unix.
The same system calls are used, and the passive open
is required.
Some of the parameters to the system calls differ.
.pp
The following call creates a communication endpoint called a
\fIsocket\fR.
It returns
a positive integer called a
\fIsocket descriptor\fR,
which
will be a parameter in all communication primitives.
.(b
\fC
.TS
tab(+);
l s s s.
s = socket( af, type, protocol )
.T&
l l l.
+int+s,af,type,protocol;
.TE
\fR
.)b
.pp
The \fIaf\fR parameter describes the format of addresses
used in this communication.
Existing formats include AF_INET (DoD Internet addresses),
AF_PUP (Xerox PUP-I Internet addresses), and AF_UNIX
(addresses are Unix file names, for intra-machine IPC only).
TP runs in either the Internet domain or the ISO domain (AF_ISO).
When using the Internet domain, the network layer is the DoD Internet IP
with Internet-style addresses. The ISO domain uses the ISO
network service and ISO-style addresses\**.
.(f
\**ISO/DP 8348/DAD2 Addendum to the Network
Service Definition Covering Network Layer Addressing.
.)f
Regardless of the address family used, an address takes the
general form,
.(b
\fC
.TS
tab(+);
l s s s.
struct sockaddr {
.T&
l l l l.
+u_char+sa_len;+/* length of sockaddr */
+u_char+sa_family;+/* address family */
+char+sa_data[14];+/* space for an address */
}+
.TE
\fR
.)b
.lp
.i
A sockaddr is no longer required to be precisely 16 bytes long.
The allocation of 14 bytes for sa_data is intended for backwards
compatibility.
.r
.sp 1
When viewed as an Internet address, it takes the form
.(b
\fC
.TS
tab(+);
l s s s.
struct sockaddr_in {
.T&
l l l l.
+u_char+sin_len;+/* address length */
+u_char+sin_family;+/* address family */
+u_short+sin_port;+/* internet port */
+struct in_addr+sin_addr;+/* network addr A.B.C.D */
+char+sin_zero[8];+/* unused */
}
.TE
\fR
.)b
.sp 1
When viewed as an ISO address, as supplied by the
university of wisconsin, it takes the form
.(b
\fC
.TS
tab(+);
l s s s.
struct sockaddr_iso {
.T&
l l l l.
+u_char+siso_len;+/* address length */
+u_char+siso_family;+/* address family */
+u_short+siso_tsuffix;+/* transport suffix */
+struct iso_addr+siso_addr;+/* ISO NSAP addr */
+char+siso_zero[2];+/* unused */
}
.TE
\fR
.)b
The address described by a \fIsockaddr_iso\fR structure
is a TSAP-address (transport service access point address).
It is made of an NSAP-address (network service access point address)
and a TSAP selector (also called a transport suffix or
transport selector, hereafter called a TSEL).
The structure \fIsockaddr_iso\fR contains a 2-byte TSEL.
This is for compatibility with Internet addressing.
ARGO supports
TSELs of length 1-64 bytes.
TSELs of any length other than 2
are called \*(lqextended TSELs\*(rq.
They are described in detail in the section \fB\*(lqExtended TSELs\*(rq\fR.
If extended TSELs are not requested, 2-byte TSELs are used by default.
.pp
Refer to Chapter Five for more information about ISO NSAP-addresses.
.pp
.i
It is our intent at Berkeley to revamp the sockaddr_iso
to use a more natural and uniform model, for ISO addresses.
We cannot guarantee this modification to be complete by the
time we are ready to have something for NIST to test.
We hope to remove this notion of extended TSEL's as soon as
possible, certainly by formal beta testing of 4.4.
.r
Since sockaddr can be 108 bytes long without breaking anything
in the current Berkeley kernel, we should be able to eliminate
extended TSEL's entirely by
providing a sockaddr_iso along the lines of:
.(b
\fC
.TS
tab(+);
l s s s.
struct sockaddr_iso {
.T&
l l l l.
+u_char+siso_len;+/* address length */
+u_char+siso_family;+/* address family */
+u_char+siso_slen;+/* session suffix length */
+u_char+siso_tlen;+/* transport suffix length */
+u_char+siso_nlen;+/* nsap length */
+char+siso_data[22];+/* minimum nsap + tsel */
}
.TE
\fR
.)b
.pp
The \fItype\fR parameter in the \fIsocket()\fR call
distinguishes
datagram protocols, stream protocols, sequenced
packet protocols, reliable datagram protocols, and
"raw" protocols (in other words, the absence of a transport protocol).
Unix provides manifest named constants for each of these types.
TP supports the sequenced packet protocol abstraction, to which
the manifest constant SOCK_SEQPACKET applies.
.pp
The \fIprotocol\fR
parameter is an integer that identifies the protocol to be used.
Unix provides a database of protocol names and their associated
protocol numbers.
Unix also provides user-level tools
for searching the database.
The tools take the form of library routines.
A protocol number for TP has been chosen
by the Internet NIC to allow TP to run in the Internet domain, and this
has been added to the Unix network protocol database.
The standard Internet database tools that serve TCP users
can
also serve user of TP
in the Internet domain, if the TP protocol number is added to the
proper Internet database file,
\fC/etc/protocols\fR.
This change must be made for TP to run in either the Internet or
in the ISO domain.
The ARGO package contains a set of tools and a database
for use with TP in the ISO domain.
This set of tools is described in the manual pages
\fIisodir(5)\fR and
\fIisodir(3)\fR.
.pp
When a socket is created, it is not given an address.
Since a socket cannot be reached by a remote entity unless it has an address,
the user must request that a socket be given an address by
using the \fIbind()\fR system call:
.(b
\fC
.TS
tab(+);
l s s s.
status = bind( s, addr, addrlen )
.T&
l l l.
+int+s;
+struct sockaddr+*addr;
+int+addrlen;
.TE
\fR
.)b
.pp
The address is expected to be in the format specified by the
\fIaf\fR parameter to the \fIsocket()\fR
call that yielded the socket descriptor \fIs\fR.
If the user
passes an address parameter with a zero-valued transport suffix,
the transport layer
assigns an unused 2-byte transport selector.
This is a 4.3 Unix convention; it is not part of any ISO standard.
.pp
The \fIconnect()\fR system call effects an active open.
It is used to establish a connection with an entity that is
passively waiting for connection requests, and whose
transport address is known.
.(b
\fC
.TS
tab(+);
l s s s.
status = connect( s, addr, addrlen )
.T&
l l l.
+int+s;
+struct sockaddr+*addr;
+int+addrlen;
.TE
\fR
.)b
.pp
The first parameter is a socket descriptor.
The \fIaddr\fR parameter is a transport address in the format
specified by the \fIaf\fR parameter to the \fIsocket()\fR
call that yielded the socket descriptor \fIs\fR.
.pp
A passive open is accomplished with two system calls,
\fIlisten()\fR followed by \fIaccept()\fR.
.(b
\fC
.TS
tab(+);
l s s s.
status = listen( s, queuelen )
.T&
l l l.
+int+s;
+int+queuelen;
.TE
\fR
.)b
.pp
The \fIqueuelen\fR argument specifies the maximum
number of pending connection
requests that will be queued for acceptance by this user.
Connections are then accepted by the
system call \fIaccept()\fR.
There is no way to refuse connections.
The functional equivalent of connection
refusal is accomplished by accepting a connection and immediately
disconnecting.
.(b
\fC
.TS
tab(+);
l s s s.
new_s = accept( s, addr, addrlen )
.T&
l l l.
+int+new_s, s;
+struct sockaddr+*addr;
+int+addrlen;
.TE
\fR
.)b
.pp
The \fIaccept()\fR call completes the connection
establishment. If a connection request from a prospective peer
is pending on the socket described by \fIs\fR, it is removed and
a new socket is created for use with this connection.
A socket descriptor for the new socket is returned by the
system call.
If no connection requests are pending, this call blocks.
If the \fIaccept()\fR call fails, -1 is returned.
The transport address of the entity requesting the connection
is returned in the \fIaddr\fR parameter, and the length
of the address is returned in the \fIaddrlen\fR parameter.
The address associated with the new socket is inherited
from the socket on which the \fIlisten()\fR and \fIaccept()\fR were performed.
.pp
It is possible for the \fIaccept()\fR call to be interrupted
by an asynchronous event such as the arrival of expedited
data.
When system calls are interrupted, Unix returns the value -1
to the caller and puts the constant
EINTR in the global variable \fIerrno\fR.
This can create problems with the system call \fIaccept()\fR.
In the case of incoming expedited data, the interruption does
not indicate a problem, but the data may have arrived before
the caller has received the new socket descriptor, which is the
socket descriptor on which the expedited data are to be received.
In order to prevent this problem from occurring, the caller must
prevent the issuance of asynchronous indications until the
\fIaccept()\fR
call has returned.
Asynchronous indications are discussed below, in
the section titled
"Indications from the transport layer to the transport user".
.pp
It is possible to discover the
address bound to a
socket with the
\fIgetsockname()\fR system call.
.(b
\fC
.TS
tab(+);
l s s s.
status = getsockname( s, addr, addrlen )
.T&
l l l.
+int+s;
+struct sockaddr+*addr;
+int+addrlen;
.TE
\fR
.)b
.pp
If the socket has a peer, that is, it is connected,
the system call
\fIgetpeername()\fR
is used to discover the peer's address.
.(b
\fC
.TS
tab(+);
l s s s.
status = getpeername( s, addr, addrlen )
.T&
l l l.
+int+s;
+struct sockaddr+*addr;
+int+addrlen;
.TE
\fR
.)b
.lp
The names returned by
\fIgetsockname()\fR and \fIgetpeername()\fR
do not contain extended TSELs.
Extended TSELs can be retrieved with
the \fIgetsockopt()\fR and
\fIsetsockopt()\fR system calls, described below.
.pp
Unix supports several protocol-independent options
and protocol-specific options
associated with sockets.
These options can be inspected and changed by using
the \fIgetsockopt()\fR and
\fIsetsockopt()\fR system calls.
.(b
\fC
.TS
tab(+);
l s s s.
status = getsockopt( s, level, option, value, valuelen )
.T&
l l l.
+int+s, level, option;
+char+*value;
+int+*valuelen;
.TE
\fR
.)b
.(b
\fC
.TS
tab(+);
l s s s.
status = setsockopt( s, level, option, value, valuelen )
.T&
l l l.
+int+s, level, option;
+char+*value;
+int+valuelen;
.TE
\fR
.)b
.pp
The \fIlevel\fR argument may indicate
either
that this option applies to sockets or that it applies to
a specific protocol.
The constants SOL_SOCKET, SOL_TRANSPORT, and SOL_NETWORK
are possible values for the \fIlevel\fR argument.
The \fIoption\fR argument is an integer that identifies
the option chosen.
.\" LIST THE OPTIONS HERE
The options available to TP users provide the
user with the ability to control various TP protocol options
including but not limited to
TP class, TPDU size negotiated, TPDU format used,
acknowledgment and retransmission strategies.
For a detail list of the options, see the manual page \fItp(4p)\fR.
.sh 1 "Extended TSELs"
.pp
ARGO supports TSELs
of length 1 byte - 64 bytes for sockets bound to addresses in the
AF_ISO address family.
The ARGO user program uses the
\fIgetsockopt()\fR
and
\fIsetsockopt()\fR
system calls to
discover and assign extended TSELs.
.pp
To create a socket with an extended TSEL,
the process
.ip \(bu 5
opens a socket with \fCsocket(AF_ISO, SOCK_SEQPACKET, ISOPROTO_TP)\fR
.ip \(bu 5
binds an NSAP-address to the socket with \fIbind()\fR.
The address bound may contain a 2-byte selector (\fIiso_tsuffix\fR).
.ip \(bu 5
uses \fIsetsockopt()\fR with the command TPOPT_MY_TSEL,
to assign a TSEL to the socket.
.ip \(bu 5
calls \fIlisten(), connect()\fR, or any other appopriate system calls
to use the socket as desired.
.lp
To connect to a transport entity that is bound to a TSAP-address with
an extended TSEL, the
process
.ip \(bu 5
opens a socket with \fCsocket(AF_ISO, SOCK_SEQPACKET, ISOPROTO_TP)\fR
.ip \(bu 5
uses \fIsetsockopt()\fR, with the command TPOPT_PEER_TSEL,
to assign a PEER TSEL to the socket.
This TSEL is used by the transport entity
for all subsequent connect requests made on this socket,
unless the peer TSEL is changed by another call to
\fIsetsockopt()\fR employing the command TPOPT_PEER_TSEL.
.lp
To discover the TSEL of the peer of a connected socket,
the process
.ip \(bu 5
uses \fIgetsockopt()\fR with the command TPOPT_PEER_TSEL.
.lp
To discover the TSEL of socket's own address,
the process
.ip \(bu 5
uses \fIgetsockopt()\fR with the command TPOPT_MY_TSEL.
.sh 1 "Data transfer"
.pp
Earlier BSD-based systems have provided system calls for data transfer
having bugs and semantics that are problematic for TP.
These should be correct as presented in the test system.
The problem was in the manner in which the kernel
handled interrupted system calls.
The send and receive primitives
may be interrupted by signals.
A signal is the mechanism used to indicate
the presence of expedited data or out-of-band data.
If the send primitive as interrupted before completion,
the user could not determine how many octets of data were sent.
All forms of the existing interface
(\fIsend()\fR,
\fIrecv()\fR,
\fIsendmsg()\fR,
\fIrecvmsg()\fR,
\fIsendto()\fR,
\fIrecvfrom()\fR,
\fIwrite()\fR,
\fIread\fR,
\fIwritev()\fR,
and \fIreadv()\fR system calls)
return an octet count
when the system call completes, and to return a short count
if the system call is interrupted.
.pp
The system calls sendmsg and recvmsg
have been revised to make them more convenient for receipt of
out of band data.
.(b
\fC
.TS
tab(+);
l s s s.
cc = sendmsg( s, msg, flags )
.T&
l l l.
+int+s;
+istruct msghdr+msg;
+unsigned int+flags;
.TE
\fR
.)b
.(b
\fC
.TS
tab(+);
l s s s.
cc = recvmsg( s, msg, flags )
.T&
l l l.
+int+s;
+istruct msghdr+msg;
+unsigned int+flags;
.TE
\fR
.)b
.pp
The reader should now consult the manual page for recvmsg
for an explanation of the elements of the msghdr structure,
and how the calls may be used to glean user-connection-request data.
.pp
The \fIflags\fR parameter serves several purposes.
The TP specification requires that TSDUs be unlimited in size.
System calls cannot pass unlimited amounts of data between the user
and the kernel, so
there cannot be a one-to-one correspondence between TSDUs and
system calls.
The \fIflags\fR
parameter is used to mark the end-of-TSDU on sending data.
When receiving,
TP sets this bit
in the flags element of the msghdr structure
when the end of a TSDU is consumed.
This way one TSDU can span several system calls.
It is possible for the peer to send an empty TPDU with the end-of-TSDU
flag set, in which case the transport user
may receive zero octets with the end-of-TSDU flag set.
.pp
The \fIflags\fR parameter also serves to distinguish data transfer primitives
from expedited data transfer primitives.
The flag bit MSG_OOB is provided for "out of band data" in the
DoD Internet protocols. It is also used to provide the expedited data service
of the ISO protocols.
The transport layer will deliver one expedited datum (there will be a
one-to-one correspondence between expedited TSDUs and XPD TPDUs)
at a time.
The user must receive the datum before the transport
layer will accept more expedited data.
Each expedited datum my contain up to 16 octets.
.pp
.sh 1 "Disconnection"
.pp
The \fIclose\fR system call will disconnect any association
between two TP entities.
.(b
\fC
.TS
tab(+);
l s s s.
status = close( s )
.T&
l l l.
+int+s;
.TE
\fR
.)b
.pp
The argument \fIs\fR is a socket descriptor.
If a Unix user process terminates, Unix will close all files and
sockets associated with the process, which means all transport
connections associated with the process will be disconnected.
.sh 1 "Indications from the transport layer to the transport user"
.pp
The above set of system calls allows you to establish
a connection, transfer data, and disconnect.
The
presence or reception of expedited data is indicated
by TP setting the MSG_OOB bit in the flags element of the msg structure.
A disconnection initiated by the peer or by one of the
cooperating TP entities can be signalled by a control message,
although we have not yet implemented this.
.pp
The Unix signal mechanism may be used to provide these
service elements, as well.
When an expedited data TSDU arrives, the TP may interrupt
the user with a SIGURG signal ("urgent condition present on socket").
The user must have previously registered a procedure to handle
the signal by using the \fIsigvec()\fR system call or the
\fIsignal()\fR library routine provided for that purpose.
The signal handler takes the form
.(b
\fC
.TS
tab(+);
l s s s.
int sighandler( signal_number)
.T&
l l l.
+int+signal_number;
.TE
\fR
.)b
.pp
The \fIsignal_number\fR argument will be the well-known constant SIGURG.
There are two reasons for
the transport layer to issue
a SIGURG:
expedited data
are present or
disconnection was initiated by a transport entity or by the peer.
Should the user have more than one transport connection open,
another system call is used to determine to which socket(s)
the urgent condition applies.
This is the \fIselect()\fR system call, described below.
.pp
When the SIGURG indicates a disconnection, there may be
user data from the peer present.
TP saves the disconnect data for the user to receive via the
\fIgetsockopt()\fR system call, or through the
.IR recvmsg ()
ancillary data mechanism.
.\"
.\"If the user does not receive the disconnect data before the
.\"reference timer expires, the data will be discarded and the
.\"socket will be closed.
.pp
Transport service users may use more than one transport
connection at a time.
The \fIselect()\fR system call facilitates this.
.(b
\fC
.TS
tab(+);
l s s s.
#include <sys/types.h>
+
nfound = select( num_to_scan, recvmask, sendmask,
+exceptmask, timeout )
.T&
l l l.
+int+nfound, num_to_scan;
+fd_set+*recvmask, *sendmask, *exceptmask;
+time+timeout;
.TE
\fR
.)b
.pp
This system call takes as parameters a set of masks
that specify a subset of the socket descriptors that are in
use by the user program.
\fISelect()\fR inspects the sockets to see if they have data
to be received, can service a send without blocking, or
have an exceptional condition pending, respectively.
The masks will be set upon return to indicate the socket descriptors
for which the respective conditions exist.
The \fInum_to_scan\fR argument limits the number of sockets that are
inspected.
The call will return within the amount of time given in the
\fItimeout\fR parameter, or, if the parameter is zero, \fIselect()\fR
will block indefinitely.
.\" FIGURE
.so ../wisc/figs/TS_primitives.nr
.pp
.CF
summarizes the mapping of the transport service primitives
to Unix facilities.