qemu/block
Eric Blake 3482b9bc41 block: Pass unaligned discard requests to drivers
Discard is advisory, so rounding the requests to alignment
boundaries is never semantically wrong from the data that
the guest sees.  But at least the Dell Equallogic iSCSI SANs
has an interesting property that its advertised discard
alignment is 15M, yet documents that discarding a sequence
of 1M slices will eventually result in the 15M page being
marked as discarded, and it is possible to observe which
pages have been discarded.

Between commits 9f1963b and b8d0a980, we converted the block
layer to a byte-based interface that ultimately ignores any
unaligned head or tail based on the driver's advertised
discard granularity, which means that qemu 2.7 refuses to
pass any discard request smaller than 15M down to the Dell
Equallogic hardware.  This is a slight regression in behavior
compared to earlier qemu, where a guest executing discards
in power-of-2 chunks used to be able to get every page
discarded, but is now left with various pages still allocated
because the guest requests did not align with the hardware's
15M pages.

Since the SCSI specification says nothing about a minimum
discard granularity, and only documents the preferred
alignment, it is best if the block layer gives the driver
every bit of information about discard requests, rather than
rounding it to alignment boundaries early.

Rework the block layer discard algorithm to mirror the write
zero algorithm: always peel off any unaligned head or tail
and manage that in isolation, then do the bulk of the request
on an aligned boundary.  The fallback when the driver returns
-ENOTSUP for an unaligned request is to silently ignore that
portion of the discard request; but for devices that can pass
the partial request all the way down to hardware, this can
result in the hardware coalescing requests and discarding
aligned pages after all.

Reported by: Peter Lieven <pl@kamp.de>
CC: qemu-stable@nongnu.org
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-11-22 15:59:23 +01:00
..
accounting.c block: Clean up includes 2016-01-20 13:36:23 +01:00
archipelago.c block: use aio_bh_schedule_oneshot 2016-10-07 13:34:07 +02:00
backup.c blockjob: refactor backup_start as backup_job_create 2016-11-14 22:47:34 -05:00
blkdebug.c block: use aio_bh_schedule_oneshot 2016-10-07 13:34:07 +02:00
blkreplay.c replay: allow replay stopping and restarting 2016-09-27 11:57:30 +02:00
blkverify.c block: use aio_bh_schedule_oneshot 2016-10-07 13:34:07 +02:00
block-backend.c block-backend: Always notify on blk_eject 2016-11-14 11:15:54 -05:00
bochs.c block: Convert bdrv_co_preadv/pwritev to BdrvChild 2016-07-05 16:46:27 +02:00
cloop.c block: Convert bdrv_pread(v) to BdrvChild 2016-07-05 16:46:27 +02:00
commit.c blockjob: add block_job_start 2016-11-14 22:47:34 -05:00
crypto.c crypto: make PBKDF iterations configurable for LUKS format 2016-09-19 16:30:45 +01:00
curl.c block/curl: Do not wait for data beyond EOF 2016-11-14 22:47:34 -05:00
dirty-bitmap.c block: More operations for meta dirty bitmap 2016-10-24 17:56:07 +02:00
dmg-bz2.c dmg: Move libbz2 code to dmg-bz2.so 2016-10-07 14:14:06 +02:00
dmg.c dmg: Move libbz2 code to dmg-bz2.so 2016-10-07 14:14:06 +02:00
dmg.h dmg: Move libbz2 code to dmg-bz2.so 2016-10-07 14:14:06 +02:00
gluster.c gluster: Fix use after free in glfs_clear_preopened() 2016-11-21 17:04:43 -05:00
io.c block: Pass unaligned discard requests to drivers 2016-11-22 15:59:23 +01:00
iscsi.c block: Return -ENOTSUP rather than assert on unaligned discards 2016-11-22 15:59:22 +01:00
linux-aio.c linux-aio: fix re-entrant completion processing 2016-09-28 17:11:23 +01:00
Makefile.objs dmg: Move libbz2 code to dmg-bz2.so 2016-10-07 14:14:06 +02:00
mirror.c mirror: do not flush every time the disks are synced 2016-11-14 22:49:26 -05:00
nbd-client.c nbd: Implement NBD_CMD_WRITE_ZEROES on client 2016-11-02 09:28:56 +01:00
nbd-client.h nbd: Implement NBD_CMD_WRITE_ZEROES on client 2016-11-02 09:28:56 +01:00
nbd.c block/nbd: Fix the leaked visitor 2016-11-11 15:54:55 +01:00
nfs.c nfs: Fix memory leak in nfs_file_create() 2016-11-11 15:54:55 +01:00
null.c block: use aio_bh_schedule_oneshot 2016-10-07 13:34:07 +02:00
parallels.c block/parallels: check new image size 2016-08-05 09:59:06 +01:00
qapi.c qapi: rename QmpOutputVisitor to QObjectOutputVisitor 2016-10-25 16:25:54 +02:00
qcow2-cache.c block: Convert bdrv_pwrite(v/_sync) to BdrvChild 2016-07-05 16:46:27 +02:00
qcow2-cluster.c qcow2: Support BDRV_REQ_MAY_UNMAP 2016-10-24 17:54:03 +02:00
qcow2-refcount.c block: Convert bdrv_discard() to byte-based 2016-07-20 14:11:55 +01:00
qcow2-snapshot.c block: Convert bdrv_pwrite(v/_sync) to BdrvChild 2016-07-05 16:46:27 +02:00
qcow2.c block: Return -ENOTSUP rather than assert on unaligned discards 2016-11-22 15:59:22 +01:00
qcow2.h qcow2: Remove stale FIXME comment 2016-11-11 15:54:55 +01:00
qcow.c crypto: extend mode as a parameter in qcrypto_cipher_supports() 2016-10-19 10:09:24 +01:00
qed-check.c qed: Use DIV_ROUND_UP 2016-06-07 18:19:24 +03:00
qed-cluster.c block: Clean up includes 2016-01-20 13:36:23 +01:00
qed-gencb.c block: Clean up includes 2016-01-20 13:36:23 +01:00
qed-l2-cache.c block: Clean up includes 2016-01-20 13:36:23 +01:00
qed-table.c block: introduce BDRV_POLL_WHILE 2016-10-28 21:50:18 +08:00
qed.c qed: Implement .bdrv_drain 2016-10-28 21:50:18 +08:00
qed.h block: use aio_bh_schedule_oneshot 2016-10-07 13:34:07 +02:00
quorum.c quorum: do not allocate multiple iovecs for FIFO strategy 2016-10-24 17:56:06 +02:00
raw_bsd.c raw_bsd: don't check size alignment when only offset is set 2016-11-11 15:54:55 +01:00
raw-posix.c raw-posix: Rename 'raw_s' to 'rs' 2016-11-11 15:56:22 +01:00
raw-win32.c block: improve error handling in raw_open 2016-10-24 17:54:03 +02:00
rbd.c rbd: make the code more readable 2016-11-01 07:55:57 -04:00
replication.c blockjob: refactor backup_start as backup_job_create 2016-11-14 22:47:34 -05:00
sheepdog.c block: Return -ENOTSUP rather than assert on unaligned discards 2016-11-22 15:59:22 +01:00
snapshot.c error: Remove NULL checks on error_propagate() calls 2016-06-20 16:38:13 +02:00
ssh.c block/ssh: Code cleanup for unused parameter 2016-11-11 15:54:55 +01:00
stream.c blockjob: add block_job_start 2016-11-14 22:47:34 -05:00
throttle-groups.c throttle: Correct access to wrong BlockBackendPublic structures 2016-10-24 17:54:03 +02:00
trace-events blockjob: add block_job_start 2016-11-14 22:47:34 -05:00
vdi.c vdi: Use QEMU UUID API 2016-09-23 11:42:52 +08:00
vhdx-endian.c vhdx: Use QEMU UUID API 2016-09-23 11:42:52 +08:00
vhdx-log.c block: Convert bdrv_pwrite(v/_sync) to BdrvChild 2016-07-05 16:46:27 +02:00
vhdx.c vhdx: Use QEMU UUID API 2016-09-23 11:42:52 +08:00
vhdx.h block: vhdx - update PAYLOAD_BLOCK_UNMAPPED value to match 1.00 spec 2014-12-12 15:42:22 +00:00
vmdk.c vmdk: add vmdk_co_pwritev_compressed 2016-09-05 19:06:48 +02:00
vpc.c vpc: Use QEMU UUID API 2016-09-23 11:42:52 +08:00
vvfat.c block: Add "read-only" to the options QDict 2016-09-23 13:36:10 +02:00
win32-aio.c linux-aio: share one LinuxAioState within an AioContext 2016-07-18 15:09:31 +01:00
write-threshold.c block: use bdrv_add_before_write_notifier 2016-10-07 13:34:07 +02:00