common routine into the individual load routines, since each load
routine needs to meddle with the "internals" of this operation.
Add a `prefetch threshold' member to the bus_dma_tag_t, so that
eventually we can determine whether or not to allocate a spill
page on a per-mapping basis.
annoyance on systems that prefetch the next page during memory -> device
DMA if the DMA comes within a certain distance of the end of the current
page. Such a prefetch could cause a machine check, since the PTE following
the last page of the mapping would not have its valid bit set.
(I'm not going to complain about this slight kludge too much, since prefetch
makes DMA much faster...)
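
As a rough illustration of the per-mapping decision described above, the
check might look something like the sketch below. This is not the code from
this change: the struct, its member name, the function name, and the 8192-byte
page size are all hypothetical stand-ins, and the threshold is assumed to be
a byte count where 0 means the bus never prefetches.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed page size for the sketch; must be a power of two. */
#define SKETCH_PAGE_SIZE 8192u

/* Hypothetical stand-in for the DMA tag; only the new member is modeled. */
struct sketch_dma_tag {
	size_t prefetch_threshold;	/* bytes; 0 = bus never prefetches */
};

/*
 * Return true if a transfer of `len' bytes starting at `addr' ends so
 * close to the end of its last page that the device may prefetch the
 * following page, in which case a spill page would be wanted.
 */
static bool
sketch_needs_spill_page(const struct sketch_dma_tag *t, uintptr_t addr,
    size_t len)
{
	uintptr_t last;
	size_t remaining;

	if (t->prefetch_threshold == 0 || len == 0)
		return false;

	/* Address of the last byte actually transferred. */
	last = addr + len - 1;

	/* Bytes left in that page after the last transferred byte. */
	remaining = SKETCH_PAGE_SIZE - 1 - (last & (SKETCH_PAGE_SIZE - 1));

	return remaining < t->prefetch_threshold;
}
```

With a threshold of 256, a transfer that ends exactly on a page boundary
would ask for a spill page, while one that ends in the middle of a page
would not.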