- Don't copyout kernel virtual addresses (of SLIST entries) that
userland won't use anyway.
=> The structure still has space for this pointer; it's just always
null when userland gets it now.
- Don't copyout under a lock.
- Stop and return error if copyout fails (unless we've already copied
some out).
- Don't kmem_free under a lock.
XXX Unclear whether anyone actually uses WD_SOFTBADSECT or why --
it's always been disabled by default. Maybe we should just remove
it?
Fix kernel freeze for wdc(4) variants with ATAC_CAP_NOIRQ:
(1) Change ata_xfer_ops:c_poll from void to int function. When it returns
ATAPOLL_AGAIN, let ata_xfer_start() iterate itself again.
(2) Let wdc_ata_bio_poll() return ATAPOLL_AGAIN until ATA_ITSDONE is
achieved.
A similar change has been made for mvsata(4) (see mvsata_bio_poll()),
and no functional changes for other devices.
This is how the drivers worked before jdolecek-ncq branch was merged.
Note that this changes are less likely to cause infinite recursion:
(1) wdc_ata_bio_intr() called from wdc_ata_bio_poll() asserts ATA_ITSDONE
in its error handling paths via wdc_ata_bio_done().
(2) Return value from c_start (= wdc_ata_bio_start()) is checked in
ata_xfer_start().
Therefore, errors encountered in ata_xfer_ops:c_poll and c_start routines
terminate the recursion for wdc(4). The situation is similar for mvsata(4).
Still, there is a possibility where ata_xfer_start() takes long time to
finish a normal operation. This can result in a delayed response for lower
priority interrupts. But, I've never observed such a situation, even when
heavy thrashing takes place for swap partition in wd(4).
"Go ahead" by jdolecek@.
Simplify and make extensible the config_search() / config_found() /
config_attach() interfaces: rather than having different variants for
which arguments you want pass along, just have a single call that
takes a variadic list of tag-value arguments.
Adjust all call sites:
- Simplify wherever possible; don't pass along arguments that aren't
actually needed.
- Don't be explicit about what interface attribute is attaching if
the device only has one. (More simplification.)
- Add a config_probe() function to be used in indirect configuiration
situations, making is visibly easier to see when indirect config is
in play, and allowing for future change in semantics. (As of now,
this is just a wrapper around config_match(), but that is an
implementation detail.)
Remove unnecessary or redundant interface attributes where they're not
needed.
There are currently 5 "cfargs" defined:
- CFARG_SUBMATCH (submatch function for direct config)
- CFARG_SEARCH (search function for indirect config)
- CFARG_IATTR (interface attribte)
- CFARG_LOCATORS (locators array)
- CFARG_DEVHANDLE (devhandle_t - wraps OFW, ACPI, etc. handles)
...and a sentinel value CFARG_EOL.
Add some extra sanity checking to ensure that interface attributes
aren't ambiguous.
Use CFARG_DEVHANDLE in MI FDT, OFW, and ACPI code, and macppc and shark
ports to associate those device handles with device_t instance. This
will trickle trough to more places over time (need back-end for pre-OFW
Sun OBP; any others?).
any more - while it has been instrumental to inadvertedly discover
driver bugs in PIO mode under QEMU recently, generally the switch
more hurts than helps, so now only warn when DMA errors happen
code kept under ATA_DOWNGRADE_MODE ifdef, disabled by default
retrying an xfer, to avoid reading/writing data from/to wrong offset,
and eventually beyond the end of data buffer
fixes data corruption under QEMU observed by Paul Ripke for emulated
IDE drives
to avoid race between the timeout and I/O submission; the I/O
submission can sleep with xfer while waiting for the controller to
be ready once it gets to thread context, and timeout might cause
the xfer to be freed, leading to crashes due to use-after-free
this fixes another type of crashes with slow devices under QEMU reported
by Paul Ripke - thanks a lot with extensive debugging help
and successful, particularly after PIO write is finished
fixes crashes in case the setup is so slow that timeout is triggered
e.g. while still waiting in wdc_wait_for_unbusy() or shortly after, without
drive actually having chance to complete the I/O, as seen in some
configuration under QEMU by Paul Ripke
threadpool-job-per-channel for the in-thread-context work that needs
to be done (which is rare).
On one of my test systems, this results in the total number of LWPs
after multi-user boot dropping from 116 to 78.
driver ata_bio hooks read parts of the xfer after ata_exec_xfer()
call in order to determine return value, change so that the hook
doesn't return any value - callers do not care already,
as all I/O requests are asynchronous
this problem was uncovered by recent change for wd(4) to not hold
wd mutex during ata_bio call, the interrupt for the xfer might
thus actually fire immediately
adjust also ata_exec_command driver hooks similarily - remove all
completion and waiting logic from drivers, upper layer ata code
using AT_WAIT/AT_POLL changed to call ata_wait_cmd() itself
PR kern/55169 by Nick Hudson
thread sleeps in wdcwait() - check current lwp rather than relying
on global ATACH_TH_RUN channel flag
should fix the hang part of the problem reported in
http://mail-index.netbsd.org/netbsd-users/2020/03/12/msg024249.html
thanks to Paul Ripke for providing extensive debugging info
struct disk_sectoralign {
/* First aligned sector number. */
uint32_t dsa_firstaligned;
/* Number of sectors per aligned unit. */
uint32_t dsa_alignment;
};
- Teach wd(4) to get it from ATA.
- Teach cgd(4) to pass it through from the underlying disk.
- Teach dk(4) to pass it through with adjustments.
- Teach zpool (zfs) to take advantage of it.
=> XXX zpool doesn't seem to understand when the vdev's starting
sector is misaligned.
Missing:
- ccd(4) and raidframe(4) support -- these should support _using_
DIOCGSECTORALIGN to decide where to start putting ccd or raid
stripes on disk, and these should perhaps _implement_
DIOCGSECTORALIGN by reporting the stripe/interleave factor.
- sd(4) support -- I don't know any obvious way to get it from SCSI,
but if any SCSI wizards know better than I, please feel free to
teach sd(4) about it!
- any ld(4) attachments -- might be worth teaching the ld drivers for
nvme and various raid controllers to get the aligned sector size
There's some duplicate logic here for now. I'm doing it this way,
rather than gathering the logic into a new disklabel_sectoralign
function or something, so that this change is limited to adding a new
ioctl, without any new kernel symbols, in order to make it easy to
pull up to netbsd-9 without worrying about the module ABI.
If a disk is backed by a physical medium other than itself, such as
cgd(4), then it passes DK_DUMP_RECURSIVE to disable the recursion
detection for dk_dump.
If, however, a device represents a physical medium on its own, such
as wd(4), then it passes 0 instead.
With this, I can now dump to dk on cgd on dk on wd.