in the FREENIX track.
.\" $NetBSD: 1.me,v 1.1 1998/07/15 00:34:54 thorpej Exp $
.\"
.\" Copyright (c) 1998 Jason R. Thorpe.
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\" notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\" notice, this list of conditions and the following disclaimer in the
.\" documentation and/or other materials provided with the distribution.
.\" 3. All advertising materials mentioning features or use of this software
.\" must display the following acknowledgements:
.\" This product includes software developed for the NetBSD Project
.\" by Jason R. Thorpe.
.\" 4. The name of the author may not be used to endorse or promote products
.\" derived from this software without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
.\" IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
.\" OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
.\" IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT,
.\" INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
.\" BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
.\" LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED
.\" AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
.\" OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.sh 1 "Introduction"
.pp
NetBSD is a portable, modern UNIX-like operating system which currently
runs on eighteen platforms covering nine processor architectures. Some
of these platforms, including the Alpha and i386\**, share the PCI bus
.(f
\**The term "i386" is used here to refer to all of the 386-class and higher
processors, including the i486, Pentium, Pentium Pro, and Pentium II.
.)f
as a common architectural feature.
In order to share device drivers for PCI devices between different
platforms, abstractions that hide the details of bus access must be
invented. The details that must be hidden can be broken down into
two classes: CPU access to devices on the bus (\fIbus_space\fR)
and device access to host memory (\fIbus_dma\fR). Here we will discuss
the latter; \fIbus_space\fR is a complicated topic in and of itself, and
is beyond the scope of this paper.
.pp
Within the scope of DMA, there are two broad classes of details
that must be hidden from the core device driver.
The first class, host details, deals with issues such as
the physical mapping of system memory (and the DMA mechanisms employed
as a result of such mapping) and cache semantics. The second
class, bus details, deals with issues related to features or
limitations specific to the bus to which a device is attached, such
as DMA bursting and address line limitations.
.sh 2 "Host platform details"
.pp
The example platforms listed above use at least three different
mechanisms to perform DMA. The first is used by the i386 platform.
This mechanism can be described as "what you see is what you get":
the address that the device uses to perform the DMA transfer is the same
address that the host CPU uses to access the memory location in question.
.so figure1.pic
.pp
The second mechanism,
employed by the Alpha, is very similar to the first; the address the
device uses to perform the DMA transfer is the address the host CPU uses
to access the memory location in question, offset by
some base address at which host memory is direct-mapped on the device bus
for the purpose of DMA.
.so figure2.pic
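.pp
The difference between the first two mechanisms can be sketched in a few
lines of C. This is a minimal illustration only; the type and function
names (host_paddr_t, dma_addr_t, dma_addr_identity, dma_addr_direct) are
hypothetical and are not part of any NetBSD interface.

```c
#include <stdint.h>

/* Hypothetical types, for illustration only. */
typedef uint64_t host_paddr_t;	/* address the host CPU uses */
typedef uint32_t dma_addr_t;	/* address the device drives on the bus */

/* "What you see is what you get": DMA address equals host address. */
static dma_addr_t
dma_addr_identity(host_paddr_t pa)
{
	return (dma_addr_t)pa;
}

/*
 * Direct-mapped with an offset: host memory appears on the device bus
 * starting at window_base, so the DMA address is the host address
 * plus that base.
 */
static dma_addr_t
dma_addr_direct(host_paddr_t pa, dma_addr_t window_base)
{
	return (dma_addr_t)(window_base + pa);
}
```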
.pp
The third mechanism, scatter-gather-mapped DMA, employs an MMU that
translates DMA addresses to host memory physical addresses. This
mechanism is also used by the Alpha, because Alpha platforms implement a
physical address space that is sometimes significantly larger than the 32-bit
address space supported by most currently available PCI devices.
.so figure3.pic
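.pp
The translation performed by such an MMU can be modeled as a table lookup.
The sketch below is an assumption-laden illustration, not the layout of any
real chipset: the structure sg_window, the function sg_translate, and the
8 KB page size are all chosen here for the example.

```c
#include <stddef.h>
#include <stdint.h>

#define SG_PAGE_SHIFT	13	/* assumed 8 KB pages */
#define SG_PAGE_MASK	((UINT64_C(1) << SG_PAGE_SHIFT) - 1)

/* Hypothetical scatter-gather window: one map entry per DMA page. */
struct sg_window {
	uint32_t	base;		/* first DMA address in the window */
	size_t		nentries;	/* number of pages mapped */
	const uint64_t	*map;		/* map[i]: host address of page i */
};

/*
 * Translate a DMA address to a host physical address.
 * Returns 0 on success, -1 if the address is outside the window.
 */
static int
sg_translate(const struct sg_window *w, uint32_t dma_addr, uint64_t *pa)
{
	uint64_t off, idx;

	if (dma_addr < w->base)
		return -1;
	off = (uint64_t)dma_addr - w->base;
	idx = off >> SG_PAGE_SHIFT;
	if (idx >= w->nentries)
		return -1;
	/* Page frame from the map, byte offset from the DMA address. */
	*pa = w->map[idx] | (off & SG_PAGE_MASK);
	return 0;
}
```

Note that a 32-bit DMA address can resolve to a host address above 4 GB,
which is the property that makes this mechanism useful on the Alpha.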
.pp
The second and third DMA mechanisms above are combined on the Alpha through
the use of \fIDMA windows\fR. The ASIC that implements the PCI bus
on a particular platform has at least two of these DMA windows. Each
window may be configured for direct-mapped or scatter-gather-mapped
DMA. Windows are chosen based on the type of DMA transfer being performed,
the bus type, and the physical address range of the host memory being
accessed.
.pp
These concepts apply to platforms other than those listed above
and busses other than PCI. Similar issues exist with the TURBOchannel bus
used on DECstations and early Alpha systems, and with the Q-bus used on
some DEC MIPS and VAX-based servers.
.pp
The semantics of the host system's cache are also important to devices
that perform DMA. Some systems are capable of cache-coherent
DMA. On such systems, the cache is often write-through (i.e. stores are
written both to the cache and to host memory), or the cache has special
snooping logic that detects an access to a memory location for which there
is a dirty cache line and automatically flushes that line.
Other systems are not capable of cache-coherent DMA. On these systems,
software must explicitly flush any data caches before memory-to-device
DMA transfers, as well as invalidate soon-to-be-stale cache lines before
device-to-memory DMA.
.sh 2 "Bus details"
.pp
In addition to hiding the platform-specific DMA details for a single bus,
it is desirable to share as much device driver code as possible for
a device which may attach to multiple busses. A good example is the
BusLogic family of SCSI adapters. This family of devices comes in ISA,
EISA, VESA local bus, and PCI flavors. While there are some bus-specific
details, such as probing and interrupt initialization, the vast majority
of the code that drives this family of devices is identical for each flavor.
.pp
The BusLogic SCSI adapters are examples of what are termed
\fIbus masters\fR. That is to say, the device itself performs all bus
handshaking and host memory access during a DMA transfer. No third party
is involved in the transfer. Such devices, when performing a DMA transfer,
present the DMA address on the bus address lines, execute the bus's fetch
or store operation, increment the address, and so forth until the transfer
is complete. Because the device is using the bus address lines, the range
of host physical addresses the device can access is limited by the number
of such lines. On the PCI bus, which has at least 32 address lines, the
device may be able to access the entire physical address space of a 32-bit
architecture, such as the i386. ISA, however, has only 24 address lines.
This means that an ISA device can directly access only the first 16 MB
(2^24 bytes) of physical address space.
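.pp
The arithmetic behind the address-line limit is simple enough to state as
code. The helper names below (bus_addr_limit, bus_can_reach) are
hypothetical and exist only to illustrate the reasoning above; they assume
fewer than 64 address lines.

```c
#include <stdint.h>

/* One past the highest address reachable with nlines address lines. */
static uint64_t
bus_addr_limit(unsigned nlines)
{
	return UINT64_C(1) << nlines;	/* assumes nlines < 64 */
}

/* Nonzero if the whole range [pa, pa + len) is reachable by the device. */
static int
bus_can_reach(unsigned nlines, uint64_t pa, uint64_t len)
{
	uint64_t limit = bus_addr_limit(nlines);

	/* Written to avoid overflow in pa + len. */
	return len <= limit && pa <= limit - len;
}
```

With 24 lines the limit is 16 MB, so a buffer that ends exactly at 16 MB is
reachable by an ISA bus master, while one that extends a single byte further
is not.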
.pp
A common solution to the limited-address-lines problem is a technique
known as \fIDMA bouncing\fR. This technique involves a second memory
area, located in the physical address range accessible by the device,
known as a \fIbounce buffer\fR. In a memory-to-device transfer, the
data is copied by the CPU to the bounce buffer, and the DMA operation is
then started. Conversely, in a device-to-memory transfer, the DMA operation is
started, and the CPU then copies the data from the bounce buffer once the
DMA operation has completed.
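.pp
The ordering of the copies in DMA bouncing can be sketched as a pair of
hooks around the transfer. This is a simplified model under stated
assumptions: the names (struct bounce, bounce_pre, bounce_post, enum
dma_dir) are hypothetical, and allocation of the bounce buffer within the
device-accessible range is left to the caller.

```c
#include <stddef.h>
#include <string.h>

enum dma_dir { DMA_TO_DEVICE, DMA_FROM_DEVICE };

/* Hypothetical bookkeeping for one bounced transfer. */
struct bounce {
	void	*buf;		/* caller's buffer, possibly out of reach */
	void	*bounce;	/* buffer inside the device-accessible range */
	size_t	len;
};

/* Call before starting the DMA operation. */
static void
bounce_pre(struct bounce *b, enum dma_dir dir)
{
	/* Memory-to-device: stage the data where the device can read it. */
	if (dir == DMA_TO_DEVICE)
		memcpy(b->bounce, b->buf, b->len);
}

/* Call after the DMA operation has completed. */
static void
bounce_post(struct bounce *b, enum dma_dir dir)
{
	/* Device-to-memory: copy what the device wrote back to the caller. */
	if (dir == DMA_FROM_DEVICE)
		memcpy(b->buf, b->bounce, b->len);
}
```

Each bounced transfer costs one extra CPU copy of the full buffer, plus the
memory reserved for the bounce buffer itself.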
.pp
While simple to implement, DMA bouncing is not the most elegant way to
solve the limited-address-lines problem. On the Alpha, for example,
scatter-gather-mapped DMA may be used to translate the out-of-range
memory physical addresses to in-range DMA addresses that the device
may use. This solution tends to offer better performance because it
eliminates data copies, and it is less expensive in terms of memory usage.
.pp
Returning to the BusLogic SCSI example, it is undesirable to place
intimate knowledge of direct-mapping, scatter-gather-mapping,
and DMA bouncing in the core device driver. Clearly, an abstraction that
hides these details and presents a consistent interface, regardless of
the DMA mechanism being used, is needed.