.\" Copyright (c) 1983, 1986, 1993
.\"	The Regents of the University of California.  All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\" 3. All advertising materials mentioning features or use of this software
.\"    must display the following acknowledgement:
.\"	This product includes software developed by the University of
.\"	California, Berkeley and its contributors.
.\" 4. Neither the name of the University nor the names of its contributors
.\"    may be used to endorse or promote products derived from this software
.\"    without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\"	@(#)6.t	8.1 (Berkeley) 6/8/93
.\"
.nr H2 1
.\".ds RH "Internal layering
.br
.ne 2i
.NH
\s+2Internal layering\s0
.PP
The internal structure of the network system is divided into
three layers, which correspond to the services provided by the socket
abstraction, those provided by the communication protocols,
and those provided by the hardware interfaces.  The communication
protocols are normally layered into two or more individual
cooperating layers, though the system views them collectively
as a single layer providing the services required by
the appropriate socket abstraction.
.PP
The following sections describe the properties of each layer
in the system and the interfaces to which each must conform.
.NH 2
Socket layer
.PP
The socket layer deals with the interprocess communication
facilities provided by the system.  A socket is a bidirectional
endpoint of communication which is ``typed'' by the semantics
of communication it supports.  The system calls described in
the \fIBerkeley Software Architecture Manual\fP [Joy86]
are used to manipulate sockets.
.PP
A socket consists of the following data structure:
.DS
._f
struct socket {
	short	so_type;		/* generic type */
	short	so_options;		/* from socket call */
	short	so_linger;		/* time to linger while closing */
	short	so_state;		/* internal state flags */
	caddr_t	so_pcb;			/* protocol control block */
	struct	protosw *so_proto;	/* protocol handle */
	struct	socket *so_head;	/* back pointer to accept socket */
	struct	socket *so_q0;		/* queue of partial connections */
	short	so_q0len;		/* partials on so_q0 */
	struct	socket *so_q;		/* queue of incoming connections */
	short	so_qlen;		/* number of connections on so_q */
	short	so_qlimit;		/* max number queued connections */
	struct	sockbuf so_rcv;		/* receive queue */
	struct	sockbuf so_snd;		/* send queue */
	short	so_timeo;		/* connection timeout */
	u_short	so_error;		/* error affecting connection */
	u_short	so_oobmark;		/* chars to oob mark */
	short	so_pgrp;		/* pgrp for signals */
};
.DE
.PP
Each socket contains two data queues, \fIso_rcv\fP and \fIso_snd\fP,
and a pointer to routines which provide supporting services.
The type of the socket,
\fIso_type\fP, is defined at socket creation time and used in selecting
those services which are appropriate to support it.  The supporting
protocol is selected at socket creation time and recorded in
the socket data structure for later use.  Protocols are defined
by a table of procedures, the \fIprotosw\fP structure, which will
be described in detail later.  A pointer to a protocol-specific
data structure,
the ``protocol control block,'' is also present in the socket structure.
Protocols control this data structure, which normally includes a
back pointer to the parent socket structure to allow easy
lookup when returning information to a user
(for example, placing an error number in the \fIso_error\fP
field).  The other entries in the socket structure are used in
queuing connection requests, validating user requests, storing
socket characteristics (e.g.
options supplied at the time a socket is created), and maintaining
a socket's state.
.PP
Processes frequently ``rendezvous at a socket.''  For example,
when a process wishes to extract data from a socket's receive queue
and the queue is empty, or lacks sufficient data to satisfy the request,
the process blocks, supplying the address of the receive queue as
a ``wait channel'' to be used in notification.  When data arrives
for the process and is placed in the socket's queue, the blocked
process is identified by the fact that it is waiting ``on the queue.''
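.PP
The wait-channel rendezvous can be modelled in a few lines of user-level C.
This is a sketch only.  The process table, \fIsketch_sleep\fP and
\fIsketch_wakeup\fP are hypothetical stand-ins for the kernel's
\fIsleep\fP and \fIwakeup\fP routines, which actually suspend and
reschedule processes.  What the model shows is that a wait channel is
simply an address that is compared for equality.
.DS
```c
#include <assert.h>
#include <stddef.h>

#define	NPROC_SKETCH	8	/* hypothetical process-table size */

/* Hypothetical, much-simplified process-table entry. */
struct sketch_proc {
	int	p_inuse;		/* slot holds a process */
	const void *p_wchan;		/* wait channel; NULL when runnable */
};

static struct sketch_proc proctab[NPROC_SKETCH];

/* Block process pid on the wait channel chan (any kernel address). */
static void
sketch_sleep(int pid, const void *chan)
{
	assert(pid >= 0 && pid < NPROC_SKETCH && chan != NULL);
	proctab[pid].p_inuse = 1;
	proctab[pid].p_wchan = chan;
}

/* Make every process waiting on chan runnable; return how many woke. */
static int
sketch_wakeup(const void *chan)
{
	int i, n = 0;

	for (i = 0; i < NPROC_SKETCH; i++)
		if (proctab[i].p_inuse && proctab[i].p_wchan == chan) {
			proctab[i].p_wchan = NULL;
			n++;
		}
	return (n);
}
```
.DE
.PP
A reader blocked on a receive queue is recorded with the queue's
address as its channel.  After a protocol appends data to that queue,
a wakeup on the same address makes the reader runnable again.
Processes waiting on other channels are not affected.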
.NH 3
Socket state
.PP
A socket's state, kept in \fIso_state\fP, is a combination of the following flags:
.DS
.ta \w'#define 'u +\w'SS_ISDISCONNECTING    'u +\w'0x000     'u
#define	SS_NOFDREF	0x001	/* no file table ref any more */
#define	SS_ISCONNECTED	0x002	/* socket connected to a peer */
#define	SS_ISCONNECTING	0x004	/* in process of connecting to peer */
#define	SS_ISDISCONNECTING	0x008	/* in process of disconnecting */
#define	SS_CANTSENDMORE	0x010	/* can't send more data to peer */
#define	SS_CANTRCVMORE	0x020	/* can't receive more data from peer */
#define	SS_RCVATMARK	0x040	/* at mark on input */

#define	SS_PRIV	0x080	/* privileged */
#define	SS_NBIO	0x100	/* non-blocking ops */
#define	SS_ASYNC	0x200	/* async i/o notify */
.DE
.PP
The state of a socket is manipulated both by the protocols
and the user (through system calls).
When a socket is created, the state is defined based on the type of socket.
It may change as control actions, such as connection establishment,
are performed.
It may also change according to the type of
input/output the user wishes to perform, as indicated by options
set with \fIfcntl\fP.  ``Non-blocking'' I/O implies that
a process should never be blocked to await resources.  Instead, any
call which would block either returns prematurely
with the error EWOULDBLOCK or is only partially
fulfilled, e.g. a request for more data than is present.
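.PP
The effect of non-blocking mode on a receive request can be sketched as
a small decision function.  The sketch assumes only the SS_ flag values
listed above and the character count \fIsb_cc\fP of the receive queue.
The name \fIrcv_decision\fP and its return codes are hypothetical, and a
real receive routine must also deal with pending errors, out-of-band data
and record boundaries.
.DS
```c
#include <assert.h>
#include <errno.h>

/* State bits, with the values listed above. */
#define	SS_CANTRCVMORE	0x020
#define	SS_NBIO		0x100

#define	RCV_OK		0	/* return the data present (possibly none) */
#define	RCV_BLOCK	(-1)	/* caller would sleep on the receive queue */

/*
 * Hypothetical helper: decide what a receive request should do,
 * given the socket state and the number of characters queued.
 * Returns RCV_OK, RCV_BLOCK, or EWOULDBLOCK.
 */
static int
rcv_decision(short so_state, unsigned short sb_cc)
{
	if (sb_cc > 0)
		return (RCV_OK);	/* partial fulfilment is allowed */
	if (so_state & SS_CANTRCVMORE)
		return (RCV_OK);	/* end of file: zero bytes */
	if (so_state & SS_NBIO)
		return (EWOULDBLOCK);	/* never block a non-blocking socket */
	return (RCV_BLOCK);
}
```
.DE
.PP
The model blocks only when the queue is completely empty.  This matches a
low-water mark of zero, which is the value the current implementation
presumes.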
.PP
If a process requested ``asynchronous'' notification of events
related to the socket, the SIGIO signal is posted to the process
when such events occur.
An event is a change in the socket's state;
examples include space
becoming available in the send queue, new data arriving in the
receive queue, and connection establishment or disestablishment.
.PP
A socket may be marked ``privileged'' if it was created by the
super-user.  Only privileged sockets may
bind addresses in privileged portions of an address space
or use ``raw'' sockets to access lower levels of the network.
.NH 3
Socket data queues
.PP
A socket's data queue contains a pointer to the data stored in
the queue and other entries related to the management of
the data.  The following structure defines a data queue:
.DS
._f
struct sockbuf {
	u_short	sb_cc;		/* actual chars in buffer */
	u_short	sb_hiwat;	/* max actual char count */
	u_short	sb_mbcnt;	/* chars of mbufs used */
	u_short	sb_mbmax;	/* max chars of mbufs to use */
	u_short	sb_lowat;	/* low water mark */
	short	sb_timeo;	/* timeout */
	struct	mbuf *sb_mb;	/* the mbuf chain */
	struct	proc *sb_sel;	/* process selecting read/write */
	short	sb_flags;	/* flags, see below */
};
.DE
.PP
Data is stored in a queue as a chain of mbufs.
The actual count of data characters as well as high and low water marks are
used by the protocols in controlling the flow of data.
The amount of buffer space (characters of mbufs and associated data pages)
is also recorded along with the limit on buffer allocation.
The socket routines cooperate in implementing the flow control
policy by blocking a process when it requests to send data and
the high water mark has been reached, or when it requests to
receive data and less than the low water mark is present
(assuming non-blocking I/O has not been specified).*
.FS
* The low-water mark is always presumed to be 0
in the current implementation.
.FE
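.PP
The flow-control test can be sketched as follows.  The sketch assumes that
the free space in a queue is the smaller of two remaining allowances:
characters (\fIsb_hiwat\fP less \fIsb_cc\fP) and mbuf storage
(\fIsb_mbmax\fP less \fIsb_mbcnt\fP).  The structure is trimmed to those
four fields, and \fIsb_space\fP and \fIsnd_must_wait\fP are hypothetical
names.
.DS
```c
#include <assert.h>

/* Only the sockbuf fields involved in flow control. */
struct sockbuf_sketch {
	unsigned short	sb_cc;		/* actual chars in buffer */
	unsigned short	sb_hiwat;	/* max actual char count */
	unsigned short	sb_mbcnt;	/* chars of mbufs used */
	unsigned short	sb_mbmax;	/* max chars of mbufs to use */
};

/*
 * Space still available: the smaller of the remaining character
 * allowance and the remaining mbuf-storage allowance.  The result
 * may be negative if a protocol has overfilled the queue.
 */
static int
sb_space(const struct sockbuf_sketch *sb)
{
	int chars = (int)sb->sb_hiwat - (int)sb->sb_cc;
	int mbufs = (int)sb->sb_mbmax - (int)sb->sb_mbcnt;

	return (chars < mbufs ? chars : mbufs);
}

/* A blocking sender must wait while there is no room at all. */
static int
snd_must_wait(const struct sockbuf_sketch *sb)
{
	assert(sb->sb_hiwat > 0);
	return (sb_space(sb) <= 0);
}
```
.DE
.PP
The second limit matters when a queue holds many small packets.  Each
packet occupies a whole mbuf, so the storage allowance can run out well
before the character count reaches the high water mark.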
.PP
When a socket is created, the supporting protocol ``reserves'' space
for the send and receive queues of the socket.
The limit on buffer allocation is set somewhat higher than the limit
on data characters
to account for the granularity of buffer allocation.
The actual storage associated with a
socket queue may fluctuate during a socket's lifetime, but it is assumed
that this reservation will always allow a protocol to acquire enough memory
to satisfy the high water marks.
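.PP
A reservation routine along these lines might look like the sketch below.
The factor of two between the storage limit and the character limit, and
the cap SB_MAX_SKETCH, are assumptions chosen for illustration.  The text
above says only that the storage limit is ``somewhat higher.''  The names
\fIsb_reserve\fP and \fIstruct sbres_sketch\fP are hypothetical.
.DS
```c
#include <assert.h>

#define	SB_MAX_SKETCH	32767	/* hypothetical cap on any one queue */

struct sbres_sketch {
	unsigned short	sb_hiwat;	/* limit on data characters */
	unsigned short	sb_mbmax;	/* limit on mbuf storage */
};

/*
 * Reserve cc characters for a queue.  The storage limit is set
 * higher than the character limit (here, twice it, capped) so that
 * partly filled mbufs do not stop the queue short of its high
 * water mark.  Returns 1 on success, 0 if cc is too large.
 */
static int
sb_reserve(struct sbres_sketch *sb, unsigned long cc)
{
	unsigned long mb;

	assert(sb != 0);
	if (cc > SB_MAX_SKETCH)
		return (0);
	mb = cc * 2;
	if (mb > SB_MAX_SKETCH)
		mb = SB_MAX_SKETCH;
	sb->sb_hiwat = (unsigned short)cc;
	sb->sb_mbmax = (unsigned short)mb;
	return (1);
}
```
.DE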
.PP
The timeout and select values are manipulated by the socket routines
in implementing various portions of the interprocess communications
facilities and will not be described here.
.PP
Data queued at a socket is stored in one of two styles.
Stream-oriented sockets queue data with no addresses, headers
or record boundaries.
The data are in mbufs linked through the \fIm_next\fP field.
Buffers containing access rights may be present within the chain
if the underlying protocol supports passage of access rights.
Record-oriented sockets, including datagram sockets,
queue data as a list of packets; the sections of packets are distinguished
by the types of the mbufs containing them.
The mbufs which comprise a record are linked through the \fIm_next\fP field;
records are linked from the \fIm_act\fP field of the first mbuf
of one packet to the first mbuf of the next.
Each packet begins with an mbuf containing the ``from'' address
if the protocol provides it,
then any buffers containing access rights, and finally any buffers
containing data.
A record that contains no data needs no data buffers
unless it also has neither an address nor access rights;
in that case a data buffer is still required so that
the record is represented by at least one mbuf.
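.PP
Walking a record-oriented queue follows directly from the two link
fields.  In this sketch the mbuf is reduced to its links, a type and a
length.  The MT_*_S constants are hypothetical stand-ins for the mbuf
types that mark address, access-rights and data mbufs.
.DS
```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type codes for the sections of a record. */
enum { MT_DATA_S = 1, MT_SONAME_S, MT_RIGHTS_S };

/* Trimmed-down mbuf: only the links, type and length matter here. */
struct mbuf_sketch {
	struct mbuf_sketch *m_next;	/* next mbuf in this record */
	struct mbuf_sketch *m_act;	/* first mbuf of next record */
	int	m_type;
	int	m_len;
};

/* Count the records on a queue by following m_act links. */
static int
count_records(const struct mbuf_sketch *sb_mb)
{
	int n = 0;

	for (; sb_mb != NULL; sb_mb = sb_mb->m_act)
		n++;
	return (n);
}

/* Sum the data bytes in one record, skipping address and rights mbufs. */
static int
record_data_len(const struct mbuf_sketch *rec)
{
	int len = 0;

	for (; rec != NULL; rec = rec->m_next)
		if (rec->m_type == MT_DATA_S)
			len += rec->m_len;
	return (len);
}
```
.DE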
.PP
A socket queue has a number of flags used in synchronizing access
to the data and in acquiring resources:
.DS
._d
#define	SB_LOCK	0x01	/* lock on data queue (so_rcv only) */
#define	SB_WANT	0x02	/* someone is waiting to lock */
#define	SB_WAIT	0x04	/* someone is waiting for data/space */
#define	SB_SEL	0x08	/* buffer is selected */
#define	SB_COLL	0x10	/* collision selecting */
.DE
The last two flags are manipulated by the system in implementing
the select mechanism.
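.PP
SB_LOCK and SB_WANT together form a simple sleep lock on the queue.  The
sketch below is a non-sleeping model of that protocol, and its names are
hypothetical.  A caller that finds the lock held records its interest in
SB_WANT; a kernel caller would then sleep and retry.  On release, the
holder learns whether anyone must be woken.
.DS
```c
#include <assert.h>

#define	SB_LOCK	0x01	/* lock on data queue */
#define	SB_WANT	0x02	/* someone is waiting to lock */

/*
 * Try to take the queue lock.  Returns 1 if acquired.  Otherwise
 * sets SB_WANT so the holder knows to wake waiters, and returns 0.
 */
static int
sb_trylock(short *sb_flags)
{
	if (*sb_flags & SB_LOCK) {
		*sb_flags |= SB_WANT;
		return (0);
	}
	*sb_flags |= SB_LOCK;
	return (1);
}

/* Release the lock; returns 1 if waiters must be woken. */
static int
sb_unlock(short *sb_flags)
{
	int wake;

	assert(*sb_flags & SB_LOCK);
	wake = (*sb_flags & SB_WANT) != 0;
	*sb_flags = (short)(*sb_flags & ~(SB_LOCK | SB_WANT));
	return (wake);
}
```
.DE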
.NH 3
Socket connection queuing
.PP
In dealing with connection-oriented sockets (e.g. SOCK_STREAM)
the two ends are considered distinct.  One end is termed
\fIactive\fP, and generates connection requests.  The other
end is called \fIpassive\fP and accepts connection requests.
.PP
From the passive side, a socket is marked with
SO_ACCEPTCONN when a \fIlisten\fP call is made,
creating two queues of sockets: \fIso_q0\fP for connections
in progress and \fIso_q\fP for connections already made and
awaiting user acceptance.
As a protocol is preparing incoming connections, it creates
a socket structure queued on \fIso_q0\fP by calling the routine
\fIsonewconn\fP().  When the connection
is established, the socket structure is then transferred
to \fIso_q\fP, making it available for an \fIaccept\fP.
.PP
If an SO_ACCEPTCONN socket is closed with sockets on either
\fIso_q0\fP or \fIso_q\fP, these sockets are dropped,
with notification to the peers as appropriate.
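.PP
The bookkeeping for the two queues can be sketched with counters alone.
The structure is reduced to the length and limit fields from
\fIstruct socket\fP, and the function names are hypothetical.  The
admission rule is an assumed policy, not one stated in the text: a new
partial connection is refused when the two queues already hold more than
one and a half times \fIso_qlimit\fP.
.DS
```c
#include <assert.h>

/* Listen-queue fields of a socket marked SO_ACCEPTCONN. */
struct listen_sketch {
	short	so_q0len;	/* partials on so_q0 */
	short	so_qlen;	/* connections on so_q */
	short	so_qlimit;	/* max queued connections */
};

/* A new incoming connection: returns 1 if queued on so_q0. */
static int
q0_add(struct listen_sketch *h)
{
	/* Assumed policy: refuse when already beyond 3/2 of the limit. */
	if (h->so_qlen + h->so_q0len > 3 * h->so_qlimit / 2)
		return (0);
	h->so_q0len++;
	return (1);
}

/* Connection established: move one socket from so_q0 to so_q. */
static void
q0_to_q(struct listen_sketch *h)
{
	assert(h->so_q0len > 0);
	h->so_q0len--;
	h->so_qlen++;
}

/* accept: take one completed connection; returns 0 if none is ready. */
static int
q_accept(struct listen_sketch *h)
{
	if (h->so_qlen == 0)
		return (0);
	h->so_qlen--;
	return (1);
}
```
.DE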
.NH 2
Protocol layer(s)
.PP
Each socket is created in a communications domain,
which usually implies both an addressing structure (address family)
and a set of protocols which implement various socket types within the domain
(protocol family).
Each domain is defined by the following structure:
.DS
.ta .5i +\w'struct  'u +\w'(*dom_externalize)();   'u
struct	domain {
	int	dom_family;		/* PF_xxx */
	char	*dom_name;
	int	(*dom_init)();		/* initialize domain data structures */
	int	(*dom_externalize)();	/* externalize access rights */
	int	(*dom_dispose)();	/* dispose of internalized rights */
	struct	protosw *dom_protosw, *dom_protoswNPROTOSW;
	struct	domain *dom_next;
};
.DE
.PP
At boot time, each domain configured into the kernel
is added to a linked list of domains.
The initialization procedure of each domain is then called.
After that time, the domain structure is used to locate protocols
within the protocol family.
The domain structure may also contain procedure references
for externalization of access rights at the receiving socket
and the disposal of access rights that are not received.
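.PP
The boot-time initialization and the later lookup by family can be
sketched as two list walks.  The trimmed \fIstruct domain_sketch\fP, the
array of configured domains passed to \fIdomain_init_all\fP, and both
function names are hypothetical.
.DS
```c
#include <assert.h>
#include <stddef.h>

/* Trimmed domain structure (compare struct domain above). */
struct domain_sketch {
	int	dom_family;		/* PF_xxx */
	const char *dom_name;
	void	(*dom_init)(void);	/* may be NULL */
	struct domain_sketch *dom_next;
};

static struct domain_sketch *domains;	/* head of the domain list */

/* Link each configured domain onto the list, then initialize each. */
static void
domain_init_all(struct domain_sketch **conf, int n)
{
	struct domain_sketch *dp;
	int i;

	for (i = 0; i < n; i++) {
		conf[i]->dom_next = domains;
		domains = conf[i];
	}
	for (dp = domains; dp != NULL; dp = dp->dom_next)
		if (dp->dom_init != NULL)
			(*dp->dom_init)();
}

/* Find a domain by protocol family, as when a socket is created. */
static struct domain_sketch *
domain_find(int family)
{
	struct domain_sketch *dp;

	for (dp = domains; dp != NULL; dp = dp->dom_next)
		if (dp->dom_family == family)
			return (dp);
	return (NULL);
}
```
.DE
.PP
In this sketch, new domains are pushed onto the front of the list, so
lookups visit them in reverse configuration order.  For a search by
family the order does not matter.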
.PP
Protocols are described by a set of entry points and certain
socket-visible characteristics, some of which are used in
deciding which socket type(s) they may support.
.PP
An entry in the ``protocol switch'' table exists for each
protocol module configured into the system.  It has the following form:
.DS
.ta .5i +\w'struct  'u +\w'domain *pr_domain;    'u
struct protosw {
	short	pr_type;		/* socket type used for */
	struct	domain *pr_domain;	/* domain protocol a member of */
	short	pr_protocol;		/* protocol number */
	short	pr_flags;		/* socket visible attributes */
/* protocol-protocol hooks */
	int	(*pr_input)();		/* input to protocol (from below) */
	int	(*pr_output)();		/* output to protocol (from above) */
	int	(*pr_ctlinput)();	/* control input (from below) */
	int	(*pr_ctloutput)();	/* control output (from above) */
/* user-protocol hook */
	int	(*pr_usrreq)();		/* user request */
/* utility hooks */
	int	(*pr_init)();		/* initialization routine */
	int	(*pr_fasttimo)();	/* fast timeout (200ms) */
	int	(*pr_slowtimo)();	/* slow timeout (500ms) */
	int	(*pr_drain)();		/* flush any excess space possible */
};
.DE
.PP
A protocol is called through the \fIpr_init\fP entry before any other.
Thereafter it is called every 200 milliseconds through the
\fIpr_fasttimo\fP entry and
every 500 milliseconds through the \fIpr_slowtimo\fP entry for timer-based actions.
The system calls the \fIpr_drain\fP entry when it is low on space;
the protocol should then throw away any non-critical data.
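.PP
The timer schedule can be simulated by stepping a clock and calling each
protocol's entries whenever their periods divide the current time.  The
100 millisecond step, the structure, the function names and the example
entries are assumptions for illustration.  Only the 200 and 500
millisecond periods come from the text.
.DS
```c
#include <assert.h>
#include <stddef.h>

#define	PR_FAST_MS	200	/* fast timeout period from the text */
#define	PR_SLOW_MS	500	/* slow timeout period from the text */

struct protosw_timers {
	void	(*pr_fasttimo)(void);
	void	(*pr_slowtimo)(void);
};

/*
 * Simulate elapsed_ms of clock time in 100 ms ticks (an assumed
 * granularity), calling each protocol's timer entries when due.
 * Null entries are skipped.
 */
static void
run_timers(const struct protosw_timers *pr, int npr, int elapsed_ms)
{
	int t, i;

	for (t = 100; t <= elapsed_ms; t += 100)
		for (i = 0; i < npr; i++) {
			if (t % PR_FAST_MS == 0 && pr[i].pr_fasttimo != NULL)
				(*pr[i].pr_fasttimo)();
			if (t % PR_SLOW_MS == 0 && pr[i].pr_slowtimo != NULL)
				(*pr[i].pr_slowtimo)();
		}
}

/* Example entries for one hypothetical protocol: count invocations. */
static int nfast, nslow;
static void demo_fasttimo(void) { nfast++; }
static void demo_slowtimo(void) { nslow++; }
```
.DE
.PP
Over one second of simulated time, each protocol receives five fast
timeouts and two slow ones.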
.PP
Protocols pass data between themselves as chains of mbufs using
the \fIpr_input\fP and \fIpr_output\fP routines.  \fIPr_input\fP
passes data up (towards
the user) and \fIpr_output\fP passes it down (towards the network); control
information passes up and down on \fIpr_ctlinput\fP and \fIpr_ctloutput\fP.
The protocol is responsible for the space occupied by any of the
arguments to these entries and must either pass it onward or dispose of it.
(On output, the lowest level reached must free buffers storing the arguments;
on input, the highest level is responsible for freeing buffers.)
.PP
The \fIpr_usrreq\fP routine interfaces protocols to the socket
code and is described below.
.PP
The \fIpr_flags\fP field is constructed from the following values:
.DS
.ta \w'#define 'u +\w'PR_CONNREQUIRED   'u +8n
#define	PR_ATOMIC	0x01		/* exchange atomic messages only */
#define	PR_ADDR	0x02		/* addresses given with messages */
#define	PR_CONNREQUIRED	0x04		/* connection required by protocol */
#define	PR_WANTRCVD	0x08		/* want PRU_RCVD calls */
#define	PR_RIGHTS	0x10		/* passes capabilities */
.DE
Protocols which are connection-based specify the PR_CONNREQUIRED
flag so that the socket routines will never attempt to send data
before a connection has been established.  If the PR_WANTRCVD flag
is set, the socket routines will notify the protocol when the user
has removed data from the socket's receive queue.  This allows
the protocol to implement acknowledgement on user receipt, and
also update windowing information based on the amount of space
available in the receive queue.  The PR_ADDR flag indicates that any
data placed in the socket's receive queue will be preceded by the
address of the sender.  The PR_ATOMIC flag specifies that each \fIuser\fP
request to send data must be performed in a single \fIprotocol\fP send
request; it is the protocol's responsibility to maintain record
boundaries on data to be sent.  The PR_RIGHTS flag indicates that the
protocol supports the passing of capabilities;  this is currently
used only by the protocols in the UNIX protocol family.
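.PP
A socket layer might consult these flags before handing data to
\fIpr_usrreq\fP, roughly as in the sketch below.  The flag values are
those shown above, and the error numbers are the standard ones.  The
function \fIsend_check\fP, its parameters, and the choice of EDESTADDRREQ
for an unconnected send with no address are assumptions.
.DS
```c
#include <assert.h>
#include <errno.h>

#define	PR_ATOMIC	0x01	/* exchange atomic messages only */
#define	PR_CONNREQUIRED	0x04	/* connection required by protocol */
#define	SS_ISCONNECTED	0x002	/* socket connected to a peer */

/*
 * Hypothetical pre-send checks driven by the protocol's flags.
 * Returns 0 if the send may proceed, otherwise an error number.
 */
static int
send_check(int pr_flags, int so_state, int have_addr, int hiwat, int len)
{
	if (!(so_state & SS_ISCONNECTED)) {
		if (pr_flags & PR_CONNREQUIRED)
			return (ENOTCONN);	/* never send before connecting */
		if (!have_addr)
			return (EDESTADDRREQ);	/* nowhere to send it */
	}
	if ((pr_flags & PR_ATOMIC) && len > hiwat)
		return (EMSGSIZE);	/* cannot be sent as one request */
	return (0);
}
```
.DE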
.PP
When a socket is created, the socket routines scan the protocol
table for the domain
looking for an appropriate protocol to support the type of
socket being created.  The \fIpr_type\fP field contains one of the
possible socket types (e.g. SOCK_STREAM), while the \fIpr_domain\fP
is a back pointer to the domain structure.
The \fIpr_protocol\fP field contains the protocol number of the
protocol, normally a well-known value.
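.PP
The scan of a domain's switch table can be sketched as a linear search
between the two table pointers kept in \fIstruct domain\fP.  Treating
protocol 0 as ``any protocol of this type'' is an assumption of the
sketch, as are the names \fIfind_proto\fP and
\fIstruct protosw_lookup\fP.
.DS
```c
#include <assert.h>
#include <stddef.h>

/* Only the protosw fields used when choosing a protocol. */
struct protosw_lookup {
	short	pr_type;	/* socket type used for */
	short	pr_protocol;	/* protocol number */
};

/*
 * Search the table from pr up to (not including) end.  With
 * protocol 0, the first entry of the requested type is chosen;
 * otherwise the type and the protocol number must both match.
 */
static const struct protosw_lookup *
find_proto(const struct protosw_lookup *pr, const struct protosw_lookup *end,
    int type, int protocol)
{
	for (; pr < end; pr++)
		if (pr->pr_type == type &&
		    (protocol == 0 || pr->pr_protocol == protocol))
			return (pr);
	return (NULL);
}
```
.DE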
.NH 2
Network-interface layer
.PP
Each network-interface configured into a system defines a
path through which packets may be sent and received.
Normally a hardware device is associated with this interface,
though there is no requirement for this (for example, all
systems have a software ``loopback'' interface used for
debugging and performance analysis).
In addition to manipulating the hardware device, an interface
module is responsible
for encapsulation and decapsulation of any link-layer header
information required to deliver a message to its destination.
The selection of which interface to use in delivering packets
is a routing decision carried out at a
higher level than the network-interface layer.
An interface may have addresses in one or more address families.
The address is set at boot time using an \fIioctl\fP on a socket
in the appropriate domain; this operation is implemented by the protocol
family, after verifying the operation through the device \fIioctl\fP entry.
.PP
An interface is defined by the following structure:
.DS
.ta .5i +\w'struct   'u +\w'ifaddr *if_addrlist;   'u
struct ifnet {
	char	*if_name;		/* name, e.g. ``en'' or ``lo'' */
	short	if_unit;		/* sub-unit for lower level driver */
	short	if_mtu;			/* maximum transmission unit */
	short	if_flags;		/* up/down, broadcast, etc. */
	short	if_timer;		/* time 'til if_watchdog called */
	struct	ifaddr *if_addrlist;	/* list of addresses of interface */
	struct	ifqueue if_snd;		/* output queue */
	int	(*if_init)();		/* init routine */
	int	(*if_output)();		/* output routine */
	int	(*if_ioctl)();		/* ioctl routine */
	int	(*if_reset)();		/* bus reset routine */
	int	(*if_watchdog)();	/* timer routine */
	int	if_ipackets;		/* packets received on interface */
	int	if_ierrors;		/* input errors on interface */
	int	if_opackets;		/* packets sent on interface */
	int	if_oerrors;		/* output errors on interface */
	int	if_collisions;		/* collisions on csma interfaces */
	struct	ifnet *if_next;
};
.DE
Each interface address has the following form:
.DS
.ta \w'#define 'u +\w'struct   'u +\w'struct   'u +\w'sockaddr ifa_addr;   'u-\w'struct   'u
struct ifaddr {
	struct	sockaddr ifa_addr;	/* address of interface */
	union {
		struct	sockaddr ifu_broadaddr;
		struct	sockaddr ifu_dstaddr;
	} ifa_ifu;
	struct	ifnet *ifa_ifp;		/* back-pointer to interface */
	struct	ifaddr *ifa_next;	/* next address for interface */
};
.ta \w'#define 'u +\w'ifa_broadaddr   'u +\w'ifa_ifu.ifu_broadaddr	   'u
#define	ifa_broadaddr	ifa_ifu.ifu_broadaddr	/* broadcast address */
#define	ifa_dstaddr	ifa_ifu.ifu_dstaddr	/* other end of p-to-p link */
.DE
The protocol generally maintains this structure as part of a larger
structure containing additional information concerning the address.
.PP
Each interface has a send queue and routines used for
initialization, \fIif_init\fP, and output, \fIif_output\fP.
If the interface resides on a system bus, the routine \fIif_reset\fP
will be called after a bus reset has been performed.
An interface may also
specify a timer routine, \fIif_watchdog\fP;
if \fIif_timer\fP is non-zero, it is decremented once per second
until it reaches zero, at which time the watchdog routine is called.
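.PP
The watchdog convention can be sketched as a once-per-second routine that
scans the interfaces.  The reduced \fIstruct ifnet_wd\fP, the function
\fIif_slowtick\fP and the example watchdog are hypothetical.  The sketch
also assumes that the watchdog receives the interface's unit number.
.DS
```c
#include <assert.h>
#include <stddef.h>

/* Only the ifnet fields involved in the watchdog. */
struct ifnet_wd {
	short	if_timer;		/* seconds until watchdog; 0 = off */
	void	(*if_watchdog)(int);	/* timer routine */
	short	if_unit;		/* sub-unit for lower level driver */
};

/* Called once per second: decrement armed timers and fire at zero. */
static void
if_slowtick(struct ifnet_wd *ifp, int nif)
{
	int i;

	for (i = 0; i < nif; i++, ifp++)
		if (ifp->if_timer > 0 && --ifp->if_timer == 0 &&
		    ifp->if_watchdog != NULL)
			(*ifp->if_watchdog)(ifp->if_unit);
}

/* Hypothetical watchdog for illustration: record which unit fired. */
static int last_fired = -1;
static void demo_watchdog(int unit) { last_fired = unit; }
```
.DE
.PP
A zero \fIif_timer\fP disarms the watchdog.  A driver might therefore arm
it when it starts an output transfer and clear it when the transfer
completes, so that the watchdog runs only if the device fails to respond.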
.PP
The state of an interface and certain characteristics are stored in
the \fIif_flags\fP field.  The following values are possible:
.DS
._d
#define	IFF_UP	0x1	/* interface is up */
#define	IFF_BROADCAST	0x2	/* broadcast is possible */
#define	IFF_DEBUG	0x4	/* turn on debugging */
#define	IFF_LOOPBACK	0x8	/* is a loopback net */
#define	IFF_POINTOPOINT	0x10	/* interface is point-to-point link */
#define	IFF_NOTRAILERS	0x20	/* avoid use of trailers */
#define	IFF_RUNNING	0x40	/* resources allocated */
#define	IFF_NOARP	0x80	/* no address resolution protocol */
.DE
If the interface is connected to a network which supports transmission
of \fIbroadcast\fP packets, the IFF_BROADCAST flag will be set and
the \fIifa_broadaddr\fP field will contain the address to be used in
sending or accepting a broadcast packet.  If the interface is associated
with a point-to-point hardware link (for example, a DEC DMR-11), the
IFF_POINTOPOINT flag will be set and \fIifa_dstaddr\fP will contain the
address of the host on the other side of the connection.  These addresses
and the local address of the interface, \fIifa_addr\fP, are used in
filtering incoming packets.  The interface sets IFF_RUNNING after
it has allocated system resources and posted an initial read on the
device it manages.  This state bit is used to avoid multiple allocation
requests when an interface's address is changed.  The IFF_NOTRAILERS
flag indicates the interface should refrain from using a \fItrailer\fP
encapsulation on outgoing packets, or (where per-host negotiation
of trailers is possible) that trailer encapsulations should not be requested;
\fItrailer\fP protocols are described
in section 14.  The IFF_NOARP flag indicates the interface should not
use an ``address resolution protocol'' in mapping internetwork addresses
to local network addresses.
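.PP
The sketch below shows how the flags and addresses might combine when
filtering input.  For brevity, addresses are plain 32-bit integers rather
than \fIsockaddr\fP structures, and only the broadcast case is shown.  The
rule that an interface that is down accepts nothing is an assumption, and
the names are hypothetical.
.DS
```c
#include <assert.h>
#include <stdint.h>

#define	IFF_UP		0x1	/* interface is up */
#define	IFF_BROADCAST	0x2	/* broadcast is possible */

/* Simplified interface address: integers stand in for sockaddrs. */
struct ifaddr_sketch {
	uint32_t ifa_addr;		/* local address */
	uint32_t ifa_broadaddr;		/* valid only if IFF_BROADCAST */
};

/* Should a packet addressed to dst be accepted on this interface? */
static int
if_accepts(short if_flags, const struct ifaddr_sketch *ia, uint32_t dst)
{
	if (!(if_flags & IFF_UP))
		return (0);
	if (dst == ia->ifa_addr)
		return (1);
	if ((if_flags & IFF_BROADCAST) && dst == ia->ifa_broadaddr)
		return (1);
	return (0);
}
```
.DE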
.PP
Various statistics are also stored in the interface structure.  These
may be viewed with the \fInetstat\fP(1) program.
.PP
The interface address and flags may be set with the SIOCSIFADDR and
SIOCSIFFLAGS \fIioctl\fP\^s.  SIOCSIFADDR is used initially to define each
interface's address; SIOCSIFFLAGS can be used to mark
an interface down and perform site-specific configuration.
The destination address of a point-to-point link is set with SIOCSIFDSTADDR.
Corresponding operations exist to read each value.
Protocol families may also support operations to set and read the broadcast
address.
In addition, the SIOCGIFCONF \fIioctl\fP retrieves a list of interface
names and addresses for all interfaces and protocols on the host.
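.PP
From user level, the flag operations look roughly like this on current BSD
and Linux systems.  Both still provide SIOCGIFFLAGS, SIOCSIFFLAGS and
\fIstruct ifreq\fP with \fIifr_name\fP and \fIifr_flags\fP members.
Header requirements vary between systems, setting flags requires
super-user privilege, and the function names are hypothetical.
.DS
```c
/* glibc: expose struct ifreq even under strict ISO C modes. */
#define	_DEFAULT_SOURCE	1
#include <assert.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <net/if.h>

/* Read an interface's flags; returns them, or -1 on any error. */
static int
get_if_flags(const char *name)
{
	struct ifreq ifr;
	int s, flags;

	if ((s = socket(AF_INET, SOCK_DGRAM, 0)) < 0)
		return (-1);
	memset(&ifr, 0, sizeof(ifr));
	strncpy(ifr.ifr_name, name, IFNAMSIZ - 1);
	flags = ioctl(s, SIOCGIFFLAGS, &ifr) < 0 ? -1 : (ifr.ifr_flags & 0xffff);
	close(s);
	return (flags);
}

/* Mark an interface up: read the flags, set IFF_UP, write them back. */
static int
set_if_up(const char *name)
{
	struct ifreq ifr;
	int s, error;

	if ((s = socket(AF_INET, SOCK_DGRAM, 0)) < 0)
		return (-1);
	memset(&ifr, 0, sizeof(ifr));
	strncpy(ifr.ifr_name, name, IFNAMSIZ - 1);
	error = ioctl(s, SIOCGIFFLAGS, &ifr);
	if (error == 0) {
		ifr.ifr_flags |= IFF_UP;
		error = ioctl(s, SIOCSIFFLAGS, &ifr);	/* needs privilege */
	}
	close(s);
	return (error < 0 ? -1 : 0);
}
```
.DE
.PP
For an unprivileged caller, the SIOCSIFFLAGS step fails and
\fIset_if_up\fP returns \-1.  Reading the flags normally needs no privilege.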
.NH 3
UNIBUS interfaces
.PP
All hardware-related interfaces currently reside on the UNIBUS.
Consequently a common set of utility routines for dealing
with the UNIBUS has been developed.  Each UNIBUS interface
utilizes a structure of the following form:
.DS
.ta \w'#define 'u +\w'ifw_xtofree 'u +\w'pte ifu_wmap[IF_MAXNUBAMR];    'u
struct	ifubinfo {
	short	iff_uban;			/* uba number */
	short	iff_hlen;			/* local net header length */
	struct	uba_regs *iff_uba;		/* uba regs, in vm */
	short	iff_flags;			/* used during uballoc's */
};
.DE
Additional structures are associated with each receive and transmit buffer,
normally one each per interface; for read,
.DS
.ta \w'#define 'u +\w'ifw_xtofree 'u +\w'pte ifu_wmap[IF_MAXNUBAMR];    'u
struct	ifrw {
	caddr_t	ifrw_addr;			/* virt addr of header */
	short	ifrw_bdp;			/* unibus bdp */
	short	ifrw_flags;			/* type, etc. */
#define	IFRW_W	0x01				/* is a transmit buffer */
	int	ifrw_info;			/* value from ubaalloc */
	int	ifrw_proto;			/* map register prototype */
	struct	pte *ifrw_mr;			/* base of map registers */
};
.DE
and for write,
.DS
.ta \w'#define 'u +\w'ifw_xtofree 'u +\w'pte ifu_wmap[IF_MAXNUBAMR];    'u
struct	ifxmt {
	struct	ifrw ifrw;
	caddr_t	ifw_base;			/* virt addr of buffer */
	struct	pte ifw_wmap[IF_MAXNUBAMR];	/* base pages for output */
	struct	mbuf *ifw_xtofree;		/* pages being dma'd out */
	short	ifw_xswapd;			/* mask of clusters swapped */
	short	ifw_nmr;			/* number of entries in wmap */
};
.ta \w'#define 'u +\w'ifw_xtofree 'u +\w'pte ifu_wmap[IF_MAXNUBAMR];    'u
#define	ifw_addr	ifrw.ifrw_addr
#define	ifw_bdp	ifrw.ifrw_bdp
#define	ifw_flags	ifrw.ifrw_flags
#define	ifw_info	ifrw.ifrw_info
#define	ifw_proto	ifrw.ifrw_proto
#define	ifw_mr	ifrw.ifrw_mr
.DE
One of each of these structures is conveniently packaged for interfaces
with single buffers for each direction, as follows:
.DS
.ta \w'#define 'u +\w'ifw_xtofree 'u +\w'pte ifu_wmap[IF_MAXNUBAMR];    'u
struct	ifuba {
	struct	ifubinfo ifu_info;
	struct	ifrw ifu_r;
	struct	ifxmt ifu_xmt;
};
.ta \w'#define 'u +\w'ifw_xtofree 'u
#define	ifu_uban	ifu_info.iff_uban
#define	ifu_hlen	ifu_info.iff_hlen
#define	ifu_uba		ifu_info.iff_uba
#define	ifu_flags	ifu_info.iff_flags
#define	ifu_w		ifu_xmt.ifrw
#define	ifu_xtofree	ifu_xmt.ifw_xtofree
.DE
.PP
The \fIifubinfo\fP structure contains the general information needed
to characterize the I/O-mapped buffers for the device.
In addition, there is a structure describing each buffer, including
UNIBUS resources held by the interface.
Sufficient memory pages and bus map registers are allocated to each buffer
upon initialization according to the maximum packet size and header length.
The kernel virtual address of the buffer is held in \fIifrw_addr\fP,
and the map registers begin
at \fIifrw_mr\fP.  UNIBUS map register \fIifrw_mr\fP\^[\-1]
maps the local network header
ending on a page boundary.  UNIBUS data paths are
reserved for read and for
write, given by \fIifrw_bdp\fP.  The prototype of the map
registers for read and for write is saved in \fIifrw_proto\fP.
.PP
When write transfers are not at least half-full pages on page boundaries,
the data are just copied into the pages mapped on the UNIBUS
and the transfer is started.
If a write transfer is at least half a page long and on a page
boundary, UNIBUS page table entries are swapped to reference
the pages, and then the initial pages are
remapped from \fIifw_wmap\fP when the transfer completes.
The mbufs containing the mapped pages are placed on the \fIifw_xtofree\fP
queue to be freed after transmission.
.PP
When read transfers give at least half a page of data to be input, page
frames are allocated from a network page list and traded
with the pages already containing the data, mapping the allocated
pages to replace the input pages for the next UNIBUS data input.
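.PP
The choice between copying and remapping comes down to a single test on
alignment and length.  In the sketch, NBPG_SKETCH assumes the VAX's
512-byte page, and \fIchoose_xfer\fP is a hypothetical name.  The rule
itself is the half-page, page-boundary condition described above.
.DS
```c
#include <assert.h>
#include <stdint.h>

#define	NBPG_SKETCH	512	/* assumed page size (VAX) */

enum xfer { XFER_COPY, XFER_REMAP };

/*
 * Remap (trade page-table entries) when the buffer starts on a
 * page boundary and is at least half a page long; otherwise copy.
 */
static enum xfer
choose_xfer(uintptr_t addr, unsigned len)
{
	if (addr % NBPG_SKETCH == 0 && len >= NBPG_SKETCH / 2)
		return (XFER_REMAP);
	return (XFER_COPY);
}
```
.DE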
.PP
The following utility routines are available for use in
writing network interface drivers; all use the
structures described above.
.LP
if_ubaminit(ifubinfo, uban, hlen, nmr, ifr, nr, ifx, nx);
.br
if_ubainit(ifuba, uban, hlen, nmr);
.IP
\fIif_ubaminit\fP allocates resources on UNIBUS adapter \fIuban\fP,
storing the information in the \fIifubinfo\fP, \fIifrw\fP and \fIifxmt\fP
structures referenced.
The \fIifr\fP and \fIifx\fP parameters are pointers to arrays
of \fIifrw\fP and \fIifxmt\fP structures whose dimensions
are \fInr\fP and \fInx\fP, respectively.
\fIif_ubainit\fP is a simpler, backwards-compatible interface used
for hardware with single buffers of each type.
They are called only at boot time or after a UNIBUS reset.
One data path (buffered or unbuffered,
depending on the \fIifu_flags\fP field) is allocated for each buffer.
The \fInmr\fP parameter indicates
the number of UNIBUS mapping registers required to map a maximal
sized packet onto the UNIBUS, while \fIhlen\fP specifies the size
of a local network header, if any, which should be mapped separately
from the data (see the description of trailer protocols in chapter 14).
Sufficient UNIBUS mapping registers and pages of memory are allocated
to initialize the input data path for an initial read.  For the output
data path, mapping registers and pages of memory are also allocated
and mapped onto the UNIBUS.  The pages associated with the output
data path are held in reserve in the event a write requires copying
non-page-aligned data (see \fIif_wubaput\fP below).
If \fIif_ubainit\fP is called with memory pages already allocated,
they will be used instead of allocating new ones (this normally
occurs after a UNIBUS reset).
A 1 is returned when allocation and initialization are successful,
0 otherwise.
.LP
m = if_ubaget(ifubinfo, ifr, totlen, off0, ifp);
.br
m = if_rubaget(ifuba, totlen, off0, ifp);
.IP
\fIif_ubaget\fP and \fIif_rubaget\fP pull input data
out of an interface receive buffer and into an mbuf chain.
The first form passes pointers to the \fIifubinfo\fP structure
for the interface and the \fIifrw\fP structure for the receive buffer;
the second may be used for single-buffered devices.
\fItotlen\fP specifies the length of data to be obtained, not counting the
local network header.  If \fIoff0\fP is non-zero, it indicates
a byte offset to a trailing local network header which should be
copied into a separate mbuf and prepended to the front of the resultant mbuf
chain.  When the data amount to at least half a page,
the previously mapped data pages are remapped
into the mbufs and swapped with fresh pages, thus avoiding
any copy.
The receiving interface is recorded as \fIifp\fP, a pointer to an \fIifnet\fP
structure, for the use of the receiving network protocol.
A 0 return value indicates a failure to allocate resources.
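.PP
The reordering requested by \fIoff0\fP can be modelled on a plain byte
array.  The bytes from \fIoff0\fP to the end form the trailing header,
and they are moved in front of the data.  This is a simplification:
\fIif_ubaget\fP builds an mbuf chain and may remap pages instead of
copying.  The name \fIuntrail\fP is hypothetical.
.DS
```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * buf holds totlen bytes in which the header was sent after the
 * data, starting at byte offset off0.  Write header-first order to
 * out (totlen bytes).  off0 == 0 means no trailer: copy unchanged.
 */
static void
untrail(const unsigned char *buf, size_t totlen, size_t off0,
    unsigned char *out)
{
	assert(off0 <= totlen);
	if (off0 == 0) {
		memcpy(out, buf, totlen);
		return;
	}
	memcpy(out, buf + off0, totlen - off0);		/* header first */
	memcpy(out + (totlen - off0), buf, off0);	/* then the data */
}
```
.DE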
.LP
if_ubaput(ifubinfo, ifx, m);
.br
if_wubaput(ifuba, m);
.IP
\fIif_ubaput\fP and \fIif_wubaput\fP map a chain of mbufs
onto a network interface in preparation for output.
The first form is used by devices with multiple transmit buffers.
The chain includes any local network
header, which is copied so that it resides in the mapped and
aligned I/O space.
Data that are page-aligned in memory and would also fall on a page
boundary in the output buffer
are mapped to the UNIBUS in place of the normal buffer page,
and the corresponding mbuf is placed on a queue to be freed after transmission.
Any other mbufs which contained non-page-sized
data portions are copied to the I/O space and then freed.
Pages mapped from a previous output operation (no longer needed)
are unmapped.