463 Commits

Author SHA1 Message Date
jdolecek
c39076d0d1 account for already transferred data (partially done I/O) when
retrying an xfer, to avoid reading/writing data from/to the wrong offset,
and eventually beyond the end of the data buffer

fixes data corruption under QEMU observed by Paul Ripke for emulated
IDE drives
2020-05-24 22:12:29 +00:00
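The retry fix in c39076d0d1 comes down to resuming from the partially completed position instead of from the start of the request. A minimal sketch, assuming a hypothetical simplified `struct xfer_sketch` (not the real `struct ata_xfer`, whose fields differ) that records how many bytes were already transferred before the error:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical, simplified transfer descriptor for illustration only;
 * the real struct ata_xfer has different fields.
 */
struct xfer_sketch {
	uint64_t blkno;		/* first sector of the original request */
	size_t	 bcount;	/* total bytes requested */
	size_t	 done;		/* bytes already transferred before the error */
	size_t	 secsize;	/* bytes per sector */
};

/* Sector at which a retry must resume: skip the sectors already done. */
static uint64_t
retry_blkno(const struct xfer_sketch *x)
{
	assert(x->secsize > 0 && x->done % x->secsize == 0);
	return x->blkno + x->done / x->secsize;
}

/* Byte offset into the data buffer at which the retry resumes. */
static size_t
retry_bufoff(const struct xfer_sketch *x)
{
	return x->done;
}

/* Bytes left to transfer; offset + count never passes the buffer end. */
static size_t
retry_bcount(const struct xfer_sketch *x)
{
	assert(x->done <= x->bcount);
	return x->bcount - x->done;
}
```

For a 8192-byte request at sector 100 that failed after 4096 bytes, the retry resumes at sector 108 and buffer offset 4096 with 4096 bytes left. If a retry instead paired the original sector and byte count with an already advanced buffer position, it would touch the wrong offset and could run past the end of the buffer, which matches the failure mode the commit message describes.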
jdolecek
c61cfedcc1 fix use-after-free for ata xfer on bio submission found by KASAN
driver ata_bio hooks read parts of the xfer after the ata_exec_xfer()
call in order to determine the return value; change it so that the hook
doesn't return any value - callers already don't care,
as all I/O requests are asynchronous

this problem was uncovered by the recent change for wd(4) to not hold
the wd mutex during the ata_bio call; the interrupt for the xfer might
thus actually fire immediately

adjust the ata_exec_command driver hooks similarly - remove all
completion and waiting logic from drivers; upper-layer ata code
using AT_WAIT/AT_POLL was changed to call ata_wait_cmd() itself

PR kern/55169 by Nick Hudson
2020-04-13 10:49:34 +00:00
maxv
babb6cb124 constify 2020-04-13 08:05:02 +00:00
jdolecek
22ba269296 drop wd lock in wdstart1() before calling the ata_bio hook; when called
from ata thread context, the hook may still need to sleep in wdcwait()
for wdc attachments
2020-04-07 13:22:05 +00:00
riastradh
ffcf681ee3 New ioctl DIOCGSECTORALIGN returns sector alignment parameters.
struct disk_sectoralign {
	/* First aligned sector number.  */
	uint32_t dsa_firstaligned;

	/* Number of sectors per aligned unit.  */
	uint32_t dsa_alignment;
};

- Teach wd(4) to get it from ATA.
- Teach cgd(4) to pass it through from the underlying disk.
- Teach dk(4) to pass it through with adjustments.
- Teach zpool (zfs) to take advantage of it.
  => XXX zpool doesn't seem to understand when the vdev's starting
     sector is misaligned.

Missing:

- ccd(4) and raidframe(4) support -- these should support _using_
  DIOCGSECTORALIGN to decide where to start putting ccd or raid
  stripes on disk, and these should perhaps _implement_
  DIOCGSECTORALIGN by reporting the stripe/interleave factor.

- sd(4) support -- I don't know any obvious way to get it from SCSI,
  but if any SCSI wizards know better than I, please feel free to
  teach sd(4) about it!

- any ld(4) attachments -- might be worth teaching the ld drivers for
  nvme and various raid controllers to get the aligned sector size.

There's some duplicate logic here for now.  I'm doing it this way,
rather than gathering the logic into a new disklabel_sectoralign
function or something, so that this change is limited to adding a new
ioctl, without any new kernel symbols, in order to make it easy to
pull up to netbsd-9 without worrying about the module ABI.
2020-03-02 16:01:56 +00:00
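The DIOCGSECTORALIGN commit says dk(4) passes the alignment through "with adjustments". One plausible form of that adjustment, sketched here under the assumption that a wedge starts at parent sector `offset`: child sector w maps to parent sector offset + w, which is aligned when offset + w is congruent to dsa_firstaligned modulo dsa_alignment. The struct mirrors the layout quoted in the commit message; `sectoralign_for_child()` is a hypothetical helper, not the dk(4) code.

```c
#include <assert.h>
#include <stdint.h>

/* Same layout as the struct quoted in the commit message. */
struct disk_sectoralign {
	uint32_t dsa_firstaligned;	/* first aligned sector number */
	uint32_t dsa_alignment;		/* number of sectors per aligned unit */
};

/*
 * Hypothetical helper: alignment as seen by a child device (e.g. a
 * wedge) that starts at parent sector `offset`.  The unit size is
 * unchanged; only the first aligned sector shifts.
 */
static struct disk_sectoralign
sectoralign_for_child(struct disk_sectoralign parent, uint64_t offset)
{
	struct disk_sectoralign child = parent;
	uint32_t a = parent.dsa_alignment;

	assert(a > 0);
	uint64_t first = parent.dsa_firstaligned % a;
	uint64_t off = offset % a;
	/* Smallest w >= 0 with (offset + w) % a == first. */
	child.dsa_firstaligned = (uint32_t)((first + a - off) % a);
	return child;
}
```

For example, with a parent reporting {0, 8} (4 KiB physical sectors behind 512-byte logical ones), a wedge starting at sector 63 gets dsa_firstaligned = 1, since its sector 1 is parent sector 64; a wedge starting at sector 2048 gets 0.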
riastradh
fcabfbde55 Add a flag to dk_dump for virtual disk devices.
If a disk is backed by a physical medium other than itself, such as
cgd(4), then it passes DK_DUMP_RECURSIVE to disable the recursion
detection for dk_dump.

If, however, a device represents a physical medium on its own, such
as wd(4), then it passes 0 instead.

With this, I can now dump to dk on cgd on dk on wd.
2020-03-01 03:21:54 +00:00
simonb
f3ef58ddb7 Tidy quirk table and remove outdated quirk from the quirk format string. 2020-01-18 11:24:40 +00:00
simonb
2fb6b7b160 Revert kern/54790 and kern/54855 NCQ fix that penalised all Samsung
EVO 860 drives.

ok jdolecek@
2020-01-18 11:22:49 +00:00
jdolecek
0631449bd5 enable the BAD_NCQ quirk for all 860 EVO drives
XXX work-in-progress, it's not clear whether this is a driver or controller
XXX problem
2020-01-14 21:08:06 +00:00
jdolecek
eef4b266f0 disable NCQ by default for "Samsung SSD 860 EVO 1TB" and
"Samsung SSD 860 EVO 500GB" - these drives have known broken NCQ support,
particularly when used with AMD SB710/750 chipsets; the problem also occurs
under Linux and Windows

https://eu.community.samsung.com/t5/Cameras-IT-Everything-Else/860-EVO-250GB-causing-freezes-on-AMD-system/td-p/575813
https://bugzilla.kernel.org/show_bug.cgi?id=201693

It seems there is not even a Samsung firmware update to fix this.

Disable NCQ regardless of the controller; it's likely the same problem
exists with other controllers too.

This should fix PR kern/54790 and PR kern/54855
2020-01-13 21:20:17 +00:00
msaitoh
a0403cde04 s/transfered/transferred/ 2019-12-27 09:41:48 +00:00
mlelstv
c8f70c2785 Take channel lock for calling reset_drive.
Should fix PR 54217.
2019-06-06 20:55:43 +00:00
mlelstv
23097e1644 Count only the initial start of a transfer, not the retries.
Should fix kern/54166.

Thanks to macallan@ for spotting the issue.
2019-06-06 20:41:04 +00:00
mlelstv
b13dbe890e Also schedule timeouts when all openings are in use. 2019-05-24 06:01:05 +00:00
bouyer
f3b5c195dd Really implement WDF_DIRTY. patch(1) did something silly here ... 2019-04-07 13:00:00 +00:00
bouyer
6b97ceb30a drop AT_RST_NOCMD, it's a cut'n'paste side effect 2019-04-05 21:31:44 +00:00
bouyer
4911039d01 Implement a DIRTY flag (copied from sd(4)) to avoid flushing the cache if
there has been no write. This avoids a (long) timeout on the flush cache
command triggered by atactl sleep, when the device is open only by the atactl
command itself.
If a drive has no partition open and goes to sleep, the WDF_LOADED
flag is clear, and the next open will issue a wd_get_params() command.
But a reset is required to wake up the drive, and wd_get_params() doesn't
issue a reset on timeout, so there's no way to wake up the disk.
Add a retry after reset to wd_get_params().

Tested by Hauke Fath; fixes PR kern/49457
2019-04-05 18:23:45 +00:00
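The DIRTY-flag idea from 4911039d01 is simple bookkeeping: set a flag on every write, and on close or sleep issue the cache flush only if the flag is set. A minimal sketch; `SKETCH_DIRTY`, `struct disk_sketch` and both functions are hypothetical stand-ins for the driver's WDF_DIRTY handling, and the flush itself is simulated by a counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the driver's WDF_DIRTY softc flag. */
#define SKETCH_DIRTY	0x01

struct disk_sketch {
	unsigned flags;
	int	 flushes;	/* counts simulated FLUSH CACHE commands */
};

/* Any write marks the cache as possibly holding unflushed data. */
static void
sketch_write(struct disk_sketch *d)
{
	d->flags |= SKETCH_DIRTY;
}

/* Flush only if something was written since the last flush. */
static bool
sketch_flush_if_dirty(struct disk_sketch *d)
{
	if ((d->flags & SKETCH_DIRTY) == 0)
		return false;	/* nothing written: skip the (possibly slow) flush */
	d->flushes++;		/* stands in for issuing FLUSH CACHE */
	d->flags &= ~SKETCH_DIRTY;
	return true;
}
```

With this scheme, a device opened only by `atactl sleep` never sets the flag, so the flush (and its long timeout on a sleeping drive) is skipped entirely.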
mlelstv
6d29cda925 The NCQ support added a private request queue to the wd driver. This
made the regular buffer queue ineffective, and it also allowed an
unlimited number of requests to be queued.

Fix this by limiting the number of requests queued to the driver to
the possible number of concurrent NCQ transactions.
2019-03-19 16:56:29 +00:00
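The limit in 6d29cda925 can be pictured as a gate between the buffer queue and the driver: a request is taken from bufq only while fewer than the NCQ openings are in flight, so everything else stays in bufq where its sorting policy still applies. A sketch under that reading; `struct queue_gate` and its functions are hypothetical names, not the wd(4) code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-disk state, not the real wd softc. */
struct queue_gate {
	int openings;	/* max concurrent NCQ commands (NCQ allows up to 32) */
	int inflight;	/* requests currently handed to the driver */
};

/* Take a request from the buffer queue only if a command slot is free. */
static bool
gate_try_start(struct queue_gate *g)
{
	if (g->inflight >= g->openings)
		return false;	/* leave it in bufq so bufq ordering still applies */
	g->inflight++;
	return true;
}

/* Called on completion; frees a slot for the next bufq request. */
static void
gate_done(struct queue_gate *g)
{
	assert(g->inflight > 0);
	g->inflight--;
}
```

With two openings, a third request is refused until one of the first two completes.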
mlelstv
5363a63f5b Set disk model name as disk type. The information can be queried through
drvctl(4).
2019-03-19 06:51:05 +00:00
mlelstv
42ff1ff694 Move standby on detach after wedges deletion in case wedges trigger
I/O on the parent disk. Add debug messages.
2019-03-19 06:47:12 +00:00
jdolecek
af070baeb7 move the comment (and expand) about NCQ TRIM to wd_trim() 2018-10-24 19:46:44 +00:00
jdolecek
3fa9d7b0da Merge jdolecek-ncqfixes branch
- ata_xfer's are dynamically allocated as needed using a pool, no longer
  limited to number of possible openings supported by controller; dump
  and recovery paths use dedicated pre-allocated storage
- moved callouts and condvars from ata_xfer to queue or channel, so that
  ata_xfer does not need special initialization
- slot allocation now done when xfer is being activated, uncoupled
  from memory allocation; active slots are no longer tracked by controller
  code
- channel and drive reset is done always via the atabus thread, and
  now executes with channel locked the whole time
- NCQ recovery moved to shared function, and run via the thread also
- added some workarounds for buggy error recovery in the AHCI emulation of
  QEMU and Parallels

designed to primarily fix kern/52614, but might also help with kern/47041
and kern/53183
2018-10-22 20:13:47 +00:00
jdolecek
be7eb61c6b fix race in wd_lastclose() on systems with two ide disks on same
channel, which happened when one disk had pending I/O while the other
disk executed the final disk flush - need to restart bufq processing
once xfer is freed in this case

it could happen e.g. on boot when the system executes fsck on different
partitions on the two drives in parallel and hence opens and closes
the disk devices repeatedly

add KASSERT() for empty bufq on wd_lastclose(), and fix similar issue
also on suspend/standby path

this was introduced by the NCQ merge and not by dksubr - before the merge
each drive had its own xfer, so they could not block each other

fixes PR kern/52783 by Onno van der Linden; many thanks for extensive
help with tracking this down
2018-08-10 22:43:22 +00:00
jdolecek
6d0fcbdb88 add wddebug() which dumps some status for attached disks; intended for
debugging of PR kern/52783
2018-08-06 20:07:05 +00:00
jdolecek
02268abf5d take mutex around check for pending flush, as the code before dksubr
conversion had, to avoid possible race

on my system this doesn't really change behaviour, besides the test runs
being slightly faster (3x parallel pkgsrc archive extraction, up
to 5% difference), though that could just be noise

done as part of investigation for PR kern/53183 by Sevan Janiyan
2018-06-03 18:38:35 +00:00
mlelstv
451e80f07d Fix block address calculation for bad sectors. 2018-01-07 11:37:30 +00:00
pgoyette
d54ad228bf Fix build for WD_SOFTBADSECT option. PR kern/52814
XXX No clue if this option actually works.  This fix just makes it
XXX compile without error.
2017-12-13 10:24:31 +00:00
mlelstv
0fa2b3a17d Make wddone poll all drives of a channel again. 2017-11-07 04:09:08 +00:00
mlelstv
c2bc1c4bc6 Add WDF_OPEN flag to really disallow opening of a disk that has been invalidated.
Restore wdbiorestart function to actually retry the failed I/O request instead
of just restarting the queue.

Fix compilation without ATADEBUG.
2017-11-03 13:01:26 +00:00
mlelstv
cd89d24ed3 refactor wd and ataraid drivers to use common disk subroutines. 2017-11-01 19:34:45 +00:00
jdolecek
e5c1f84bb6 more detailed debug info; also sync DEBUG_* values in wd.c with ata.c 2017-10-19 20:45:07 +00:00
jdolecek
59fc8a81da do not use the NCQ priority by default; it seems to negatively affect
performance at least with some drives, so this needs to be better understood first
2017-10-14 13:20:32 +00:00
jdolecek
fd181cba14 only call drive reset with AT_POLL when the command itself was
polled, so that the logic for AT_POLL matches how e.g. ata_dmaerr() is
called; this was the original intent of the change in 1.428.2.25,
to make the error handling safe wrt. polled xfers

this is a stopgap fix for the ATA channel wedge after a DMA error, as reported
by Martin Husemann in PR kern/52606 and PR kern/52605

the problem happened due to ata_reset_channel() being called once in ata_dmaerr()
with flags == 0, which froze the channel and set the flag to reset via the thread;
then ata_reset_channel() was called via wdc_drive_reset() with AT_POLL, which
just executed the reset and cleared the flag, without clearing the extra
freeze; that logic will be refactored in a separate commit
2017-10-14 13:15:14 +00:00
jdolecek
26cf68556f Merge support for SATA NCQ (Native Command Queueing) from jdolecek-ncq branch
ATA subsystem was changed to support several outstanding commands, and use
NCQ xfers if supported by both the controller and the disk, including NCQ
error recovery. Set NCQ high priority for BPRIO_TIMECRITICAL xfers
if supported. Added FUA support.

Did some work towards MP-safety: all tsleep()/wakeup() use in ATA code was
replaced by condvars, and most code was switched from spl* to mutexes
(separate wd(4) and ata channel lock).

Introduced the new option WD_CHAOS_MONKEY to facilitate testing of error
handling, and fixed several issues it uncovered. Also fixed several problems
with kernel dump to wd(4) disk.

Tested with ahcisata(4), mvsata(4), siisata(4), piixide(4) on amd64,
with and without port multiplier, both disk and ATAPI devices; other
drivers and archs mechanically adjusted and compile-tested. NCQ is
supported for ahcisata(4) and siisata(4) for any controller, for
mvsata(4) only Gen IIe ones for now. Also enabled ATAPI support in
mvsata(4).

Thanks to Matt Thomas for the initial ATA infrastructure patch, and
Jonathan A. Kollasch for siisata(4) NCQ changes and general testing.

Also fixes PR kern/43169 (wd(4)); and PR kern/11811, PR kern/47041,
PR kern/51979 (kernel dump)
2017-10-07 16:05:31 +00:00
jdolecek
52d7de9781 remove the workaround for the Seagate 'mod15write' bug, now the driver only prints
an error on boot; unfortunately the code actually doesn't work, and there is
little point in trying to fix it
2017-04-24 09:42:52 +00:00
mlelstv
ba576b71a7 Enhance disk metrics by calculating a weighted sum that is incremented
by the number of concurrent I/O requests. Also introduce a new disk_wait()
function to measure requests waiting in a bufq.
iostat -y now reports data about waiting and active requests.

So far only drivers using dksubr and dk, ccd, wd and xbd collect data about
waiting requests.
2017-03-05 23:07:12 +00:00
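One common way to read "a weighted sum that is incremented by the number of concurrent I/O requests" is as a time integral of queue depth: every interval between start/finish events is weighted by how many requests were active during it. A sketch under that assumption; `struct iostat_sketch` and its functions are hypothetical and do not reproduce the kernel's actual disk statistics fields or update rules.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical accounting state; names are illustrative only. */
struct iostat_sketch {
	uint64_t last;		/* timestamp of the previous start/finish event */
	unsigned busy;		/* requests currently active */
	uint64_t busy_time;	/* time with at least one request active */
	uint64_t busy_sum;	/* time weighted by the number of active requests */
};

/* Charge the interval since the last event before the count changes. */
static void
stats_advance(struct iostat_sketch *s, uint64_t now)
{
	assert(now >= s->last);
	uint64_t delta = now - s->last;

	if (s->busy > 0)
		s->busy_time += delta;
	s->busy_sum += delta * s->busy;
	s->last = now;
}

static void
io_start(struct iostat_sketch *s, uint64_t now)
{
	stats_advance(s, now);
	s->busy++;
}

static void
io_done(struct iostat_sketch *s, uint64_t now)
{
	stats_advance(s, now);
	assert(s->busy > 0);
	s->busy--;
}
```

For requests starting at t=0 and t=10 and finishing at t=30 and t=40, busy_time is 40 and busy_sum is 60, so the average number of active requests while busy is 1.5. A second instance of the same bookkeeping, driven when requests enter and leave the bufq, would give the corresponding figure for waiting requests.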
pgoyette
e38abff020 Avoid calling bufq_free() from critical section. 2016-11-20 02:34:27 +00:00
christos
5a86cfc800 CID 1364758: Integer handling issues, avoid sign extension to 64 bits. 2016-08-05 06:54:22 +00:00
jakllsch
e30127d7a8 Space before tab and trailing whitespace fixes. 2016-07-22 12:55:34 +00:00
jakllsch
91be397f74 Add ATA8-ACS Long Logical Sector Feature Set support to wd(4). 2016-07-22 04:08:10 +00:00
jakllsch
4821310465 Call wd_params_to_properties() after softc is sufficiently
initialized.
2016-07-21 19:05:03 +00:00
jakllsch
580ae47a86 Remove unused 'params' argument of local function wd_params_to_properties() 2016-07-21 18:54:13 +00:00
bouyer
01a30830e3 Add a new config_detach() flag, DETACH_POWEROFF, which is set when
detaching devices at shutdown time with RB_POWERDOWN.
When detaching wd(4), put the drive in standby before detach
for DETACH_POWEROFF.
Fix PR kern/51252
2016-06-19 09:35:06 +00:00
mlelstv
6f00c789e1 Use C99-style initializers for struct dkdriver. 2015-04-26 15:15:19 +00:00
riastradh
233f556c2e Convert sys/dev to use <sys/rndsource.h>. 2015-04-13 16:33:23 +00:00
christos
c182898b0d We have three sets of DTYPE_ constants in the kernel:
	altq		Drop Type
	disklabel	Disk Type
	file		Descriptor Type
(not to mention constants that contain the string DTYPE).
Let's make them two, by changing the disklabel one to be DisK TYPE since the
other disklabel constants seem to do that. Not many userland programs use
these constants (and the ones they use are mostly in ifdefs). They will
be fixed shortly.
2015-01-02 19:42:05 +00:00
christos
c60db2e923 make more drivers use disk_ioctl, and add a dev parameter to it so that
we can merge the "easy" disklabel ioctls into it. Ultimately all this will
go to dk_ioctl once all the drivers have been converted.
2014-12-31 19:52:04 +00:00
christos
3be6bb2414 Centralize wedge ioctls in disk_ioctl. 2014-12-31 17:06:48 +00:00
mlelstv
64c07f5206 support DIOCMWEDGES ioctl. 2014-11-04 07:51:54 +00:00
mlelstv
aba91938d6 The partition size is always computed in native blocks. The code also assumes
that native blocks are always DEV_BSIZE (a few lines earlier), which makes
the current calculation a no-op.
2014-10-11 14:05:11 +00:00