retrying an xfer, to avoid reading/writing data from/to wrong offset,
and eventually beyond the end of data buffer
fixes data corruption under QEMU observed by Paul Ripke for emulated
IDE drives
driver ata_bio hooks read parts of the xfer after ata_exec_xfer()
call in order to determine return value, change so that the hook
doesn't return any value - callers do not care already,
as all I/O requests are asynchronous
this problem was uncovered by recent change for wd(4) to not hold
wd mutex during ata_bio call, the interrupt for the xfer might
thus actually fire immediately
adjust also ata_exec_command driver hooks similarily - remove all
completion and waiting logic from drivers, upper layer ata code
using AT_WAIT/AT_POLL changed to call ata_wait_cmd() itself
PR kern/55169 by Nick Hudson
struct disk_sectoralign {
/* First aligned sector number. */
uint32_t dsa_firstaligned;
/* Number of sectors per aligned unit. */
uint32_t dsa_alignment;
};
- Teach wd(4) to get it from ATA.
- Teach cgd(4) to pass it through from the underlying disk.
- Teach dk(4) to pass it through with adjustments.
- Teach zpool (zfs) to take advantage of it.
=> XXX zpool doesn't seem to understand when the vdev's starting
sector is misaligned.
Missing:
- ccd(4) and raidframe(4) support -- these should support _using_
DIOCGSECTORALIGN to decide where to start putting ccd or raid
stripes on disk, and these should perhaps _implement_
DIOCGSECTORALIGN by reporting the stripe/interleave factor.
- sd(4) support -- I don't know any obvious way to get it from SCSI,
but if any SCSI wizards know better than I, please feel free to
teach sd(4) about it!
- any ld(4) attachments -- might be worth teaching the ld drivers for
nvme and various raid controllers to get the aligned sector size
There's some duplicate logic here for now. I'm doing it this way,
rather than gathering the logic into a new disklabel_sectoralign
function or something, so that this change is limited to adding a new
ioctl, without any new kernel symbols, in order to make it easy to
pull up to netbsd-9 without worrying about the module ABI.
If a disk is backed by a physical medium other than itself, such as
cgd(4), then it passes DK_DUMP_RECURSIVE to disable the recursion
detection for dk_dump.
If, however, a device represents a physical medium on its own, such
as wd(4), then it passes 0 instead.
With this, I can now dump to dk on cgd on dk on wd.
there has been no write. This avoids a (long) timeout on the flush cache
command triggered by atactl sleep, when the device is open only by the atactl
command itself.
If a drive has no partition open and goes to sleep, the WDF_LOADED
flag is clear, and the next open will issue wd_get_params() command.
But to wake up the drive a reset is required, and wd_get_params() doens't
issue a reset on timeout, so there's no way to wake up the disk.
Add a retry after reset to wd_get_params().
Tested by Hauke Fath; fixes PR kern/49457
makes the regular buffer queue ineffective, it also allowed to queue
an unlimited number of requests.
Fix this by limiting the number of requests queued to the driver to
the possible number of concurrent NCQ transactions.
- ata_xfer's are dynamicall allocated as needed using a pool, no longer
limited to number of possible openings supported by controller; dump
and recovery paths use dedicated pre-allocated storage
- moved callouts and condvars from ata_xfer to queue or channel, so that
ata_xfer does not need special initialization
- slot allocation now done when xfer is being activated, uncoupled
from memory allocation; active slots are no longer tracked by controller
code
- channel and drive reset is done always via the atabus thread, and
now executes with channel locked the whole time
- NCQ recovery moved to shared function, and run via the thread also
- added some workarounds for buggy error recovery AHCI emulation in QEMU
and Parallels
designed to primarily fix kern/52614, but might also help with kern/47041
and kern/53183
channel, which happened when one disk had pending I/O while the other
disk executed the final disk flush - need to restart bufq processing
once xfer is freed in this case
it could happen e.g. on boot when system executes fsck on different
partitions on the two drives in parallell and hence open and closes
the disk devices repeatedly
add KASSERT() for empty bufq on wd_lastclose(), and fix similar issue
also on suspend/standby path
this was introduced by the NCQ merge and not dksubr - before the merge
each drive had their own xfer, so they could not block each other
fixes PR kern/52783 by Onno van der Linden; many thanks for extensive
help with tracking this down
conversion had, to avoid possible race
on my system doesn't really change behaviour, besides the test runs
being slightly faster (3x parallell pkgsrc archive extraction, up
to 5% difference), thought that can just be noise
done as part of investigation for PR kern/53183 by Sevan Janiyan
polled, so that the logic for AT_POLL matches how e.g. ata_dmaerr() is
called; this was the original intent of the change in 1.428.2.25,
to make the error handling safe wrt. polled xfers
this is stopgap fix for ATA channel wedge after DMA error, as reported
by Martin Husemann in PR kern/52606, and PR kern/52605
problem happened due to ata_reset_channel() being called once in ata_dmaerr()
with flags == 0, which freezed channel and set flag to reset via thread,
then ata_reset_channel() was called via wdc_drive_reset() with AT_POLL, which
just executed the reset and cleared the flag, without clearing the extra
freeze; that logic will be refactored in separate commit
ATA subsystem was changed to support several outstanding commands, and use
NCQ xfers if supported by both the controller and the disk, including NCQ
error recovery. Set NCQ high priority for BPRIO_TIMECRITICAL xfers
if supported. Added FUA support.
Done some work towards MP-safe, all ATA code tsleep()/wakeup() replaced
by condvars, and switched most code from spl* to mutexes (separate
wd(4) and ata channel lock).
Introduced new option WD_CHAOS_MONKEY to facilitate testing of error
handling, fixed several uncovered issues. Also fixed several problems
with kernel dump to wd(4) disk.
Tested with ahcisata(4), mvsata(4), siisata(4), piixide(4) on amd64,
with and without port multiplier, both disk and ATAPI devices; other
drivers and archs mechanically adjusted and compile-tested. NCQ is
supported for ahcisata(4) and siisata(4) for any controller, for
mvsata(4) only Gen IIe ones for now. Also enabled ATAPI support in
mvsata(4).
Thanks to Matt Thomas for initial ATA infrastructure patch, and
Jonathan A.Kollasch for siisata(4) NCQ changes and general testing.
Also fixes PR kern/43169 (wd(4)); and PR kern/11811, PR kern/47041,
PR kern/51979 (kernel dump)
by the number of concurrent I/O requests. Also introduce a new disk_wait()
function to measure requests waiting in a bufq.
iostat -y now reports data about waiting and active requests.
So far only drivers using dksubr and dk, ccd, wd and xbd collect data about
waiting requests.
detaching devices at shutdown time with RB_POWERDOWN.
When detaching wd(4), put the drive in standby before detach
for DETACH_POWEROFF.
Fix PR kern/51252
altq Drop Type
disklabel Disk Type
file Descriptor Type
(not to mention constants that contain the string DTYPE).
Let's make them two, by changing the disklabel one to be DisK TYPE since the
other disklabel constants seem to do that. Not many userland programs use
these constants (and the ones that they do are mostly in ifdefs). They will
be fixed shortly.