In order to correctly check whether a given cluster is read as zero, we
don't only need to check whether bdrv_get_block_status_above() sets
BDRV_BLOCK_ZERO, but also if all sectors for the whole cluster have the
same status.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Denis V. Lunev <den@openvz.org>
We should check for (res & BDRV_BLOCK_ZERO) only. The situation when we
will have !(res & BDRV_BLOCK_DATA) and will not have BDRV_BLOCK_ZERO is
not possible for images with bdi.unallocated_blocks_are_zero == true.
For those images where it's false, however, it can happen and we must
not consider the data zeroed then or we would corrupt the image.
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Instead of propagating any change of a BDS's AioContext only to its file
and backing children and letting driver-specific code do the rest, just
propagate it to all and drop the thus superfluous implementations of
bdrv_{at,de}tach_aio_context() in Quorum, blkverify and VMDK.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This patch removes the remaining users of bs->blk, which will allow us
to have multiple BBs on top of a single BDS. In the meantime, all checks
that are currently in place to prevent the user from creating such
setups can be switched to bdrv_has_blk() instead of accessing BDS.blk.
Future patches can allow them and e.g. enable users to mirror to a block
device that already has a BlockBackend on it.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
query-named-block-nodes should not return information that is related
to the attached BlockBackend rather than the node itself, so throttling
information needs to be removed from it.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
We need to introduce a separate BdrvNextIterator struct that can keep
more state than just the current BDS in order to avoid using the bs->blk
pointer.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
In many cases we just want to know whether a BDS has at least one BB
attached, without needing to know the exact BB that is attached. In
contrast to bs->blk, this is still a valid question when more than one
BB can be attached, so just answer it by checking the parents list.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Since virtio-blk implements request merging itself these days, the only
remaining users are test cases for the function. That doesn't make the
function exactly useful any more.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Block jobs don't actually make use of the iostatus for their BDSes, but
they manage a separate block job iostatus. Still, they require that it
is enabled for the source BDS and they enable it automatically for the
target and set the error handling mode - which ends up never being used
by the job.
This patch removes all of the BDS iostatus handling from the block job,
which removes another few bs->blk accesses.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
When block job errors were introduced, we assigned the iostatus of the
target BDS "just in case". The field has never been accessible for the
user because the target isn't listed in query-block.
Before we can allow the user to have a second BlockBackend on the
target, we need to clean this up. If anything, we would want to set the
iostatus for the internal BB of the job (which we can always do later),
but certainly not for a separate BB which the job doesn't even use.
As a nice side effect, this gets us rid of another bs->blk use.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
In order to get rid of bs->blk for bdrv_get_device_name() and
bdrv_get_device_or_node_name(), ask all parents for their name and
simply pick the first one.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
We want to get rid of BlockDriverState.blk in order to allow multiple
BlockBackends per BDS. Converting the device callbacks in block.c (which
assume a single BlockBackend) to per-child callbacks gets us rid of the
first few instances.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Checking whether there are throttled requests requires going to the
associated BlockBackend, which we want to avoid.
All users of bdrv_requests_pending() in block/io.c already call
bdrv_parent_drained_begin() first, which restarts all throttled
requests, so no throttled requests can be left here and this is removal
of dead code.
The remaining users (assertions during graph manipulation in block.c)
don't care about requests that are still queued in the BlockBackend and
haven't been issued for a BlockDriverState yet.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
This moves the throttling related part of the BDS life cycle management
to BlockBackend. The throttling group reference is now kept even when no
medium is inserted.
With this commit, throttling isn't disabled and then re-enabled any more
during graph reconfiguration. This fixes the temporary breakage of I/O
throttling when used with live snapshots or block jobs that manipulate
the graph.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
So far, bdrv_parent_drained_begin/end() was called for the duration of
the actual bdrv_drain() at the beginning of a drained section, but we
really should keep parents quiesced until the end of the drained
section.
This does not actually change behaviour at this point because the only
user of the .drained_begin/end BdrvChildRole callback is I/O throttling,
which already doesn't send any new requests after flushing its queue in
.drained_begin. The patch merely removes a trap for future users.
Reported-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
This removes the last part of I/O throttling from block/io.c and moves
it to the BlockBackend.
Instead of having knowledge about throttling inside io.c, we can call a
BdrvChild callback .drained_begin/end, which happens to drain the
throttled requests for BlockBackend parents.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
BlockBackends use it to get a back pointer from BdrvChild to
BlockBackend in any BdrvChildRole callbacks.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
This patch changes where the throttling state is stored (used to be the
BlockDriverState, now it is the BlockBackend), but it doesn't actually
make it a BB level feature yet. For example, throttling is still
disabled when the BDS is detached from the BB.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
As a first step towards moving I/O throttling to the BlockBackend level,
this patch changes all pointers in struct ThrottleGroup from referencing
a BlockDriverState to referencing a BlockBackend.
This change is valid because we made sure that throttling can only be
enabled on BDSes which have a BB attached.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Some features, like I/O throttling, are implemented outside
block-backend.c, but still want to keep information in BlockBackend,
e.g. list entries that allow keeping a list of BlockBackends.
In order to avoid exposing the whole struct layout in the public header
file, this patch introduces an embedded public struct where such
information can be added and a pair of functions to convert between
BlockBackend and BlockBackendPublic.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
It was already true in principle that a throttled BDS always has a BB
attached, except that the order of operations while attaching or
detaching a BDS to/from a BB wasn't careful enough.
This commit breaks graph manipulations while I/O throttling is enabled.
It would have been possible to keep things working with some temporary
hacks, but quite cumbersome, so it's not worth the hassle. We'll fix
things again in a minute.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Move it to the actual users. There are still a few includes of
qemu/bswap.h in headers; removing them is left for future work.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQIcBAABAgAGBQJXNIcBAAoJEH8JsnLIjy/WiUQP/Rzfo8pe7TWA2InxdcDOPsx4
2/tHHJdVkffnNX5rdBvc0mOUZNJxej0NtJu2e63BB+ydYju//xw8gruKU7TR+Nd3
nPNSsqk80prK3RNgWu7qymBvIkHDDcDQhlp48HKq+dxrfConXtHmoXapGsqc0S47
xu03oC6WzSIyLf7TLytcjUmEprQSaCGOwsb/XaHAWL750fFAGcdy/K5PWBpUv6DN
T0jZ3u4UneE1jeabRmqAwjgDJXC9l6riH9fP/ZtYhgNlNj84zlMXajUHSULhGknP
cTGjwwg9tOvhcjTdhdRmWlvG1m0T77ZX3icfZLhcTdb/Uz68NXVqs8P25IGV9McD
DPrb3T/M8JUoqLXJxIpxUm2Levof5v0dUF1PHmN5bT7pshcqv/1J7v8Fdtf9l9mp
zI0+FK1TZ102C0H2F7AWYZSlo2EfNUSd02QQx6MbfDokDIlIxY+EgP1/Es5XlkqC
wc7HrJvq+uix2zXw9bn9Vg9p/nDuxlRx+ppRRarNNRonaqTrx/1qAaas4bsqc9Gz
H6gxw7BHybm0TZFdHqAdIonpesecYw6yWUXT/mQehbfphsmQmu/d2HvF2C9uUm4X
O0JduBlKOTm2hMcg5qL6Gko8WaQIctdCJH/1Onts92cZnm8Vr/9zcmMgwGoCd7sE
+t6Yg0jqpTUJwhZhIuCw
=NbjJ
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging
Block layer patches
# gpg: Signature made Thu 12 May 2016 14:37:05 BST using RSA key ID C88F2FD6
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>"
* remotes/kevin/tags/for-upstream: (69 commits)
qemu-iotests: iotests: fail hard if not run via "check"
block: enable testing of LUKS driver with block I/O tests
block: add support for encryption secrets in block I/O tests
block: add support for --image-opts in block I/O tests
qemu-io: Add 'write -z -u' to test MAY_UNMAP flag
qemu-io: Add 'write -f' to test FUA flag
qemu-io: Allow unaligned access by default
qemu-io: Use bool for command line flags
qemu-io: Make 'open' subcommand more like command line
qemu-io: Add missing option documentation
qmp: add monitor command to add/remove a child
quorum: implement bdrv_add_child() and bdrv_del_child()
Add new block driver interface to add/delete a BDS's child
qemu-img: check block status of backing file when converting.
iotests: fix the redirection order in 083
block: Inactivate all children
block: Drop superfluous invalidating bs->file from drivers
block: Invalidate all children
nbd: Simplify client FUA handling
block: Honor BDRV_REQ_FUA during write_zeroes
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Now they are invalidated by the block layer, so it's not necessary to
do this in block drivers' implementations of .bdrv_invalidate_cache.
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Now that the block layer honors per-bds FUA support, we don't
have to duplicate the fallback flush at the NBD layer. The
static function nbd_co_writev_flags() is no longer needed, and
the driver can just directly use nbd_client_co_writev().
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The block layer has a couple of cases where it can lose
Force Unit Access semantics when writing a large block of
zeroes, such that the request returns before the zeroes
have been guaranteed to land on underlying media.
SCSI does not support FUA during WRITESAME(10/16); FUA is only
supported if it falls back to WRITE(10/16). But where the
underlying device is new enough to not need a fallback, it
means that any upper layer request with FUA semantics was
silently ignoring BDRV_REQ_FUA.
Conversely, NBD has situations where it can support FUA but not
ZERO_WRITE; when that happens, the generic block layer fallback
to bdrv_driver_pwritev() (or the older bdrv_co_writev() in qemu
2.6) was losing the FUA flag.
The problem of losing flags unrelated to ZERO_WRITE has been
latent in bdrv_co_do_write_zeroes() since commit aa7bfbff, but
back then, it did not matter because there was no FUA flag. It
became observable when commit 93f5e6d8 paved the way for flags
that can impact correctness, when we should have been using
bdrv_co_writev_flags() with modified flags. Compare to commit
9eeb6dd, which got flag manipulation right in
bdrv_co_do_zero_pwritev().
Symptoms: I tested with qemu-io with default writethrough cache
(which is supposed to use FUA semantics on every write), and
targetted an NBD client connected to a server that intentionally
did not advertise NBD_FLAG_SEND_FUA. When doing 'write 0 512',
the NBD client sent two operations (NBD_CMD_WRITE then
NBD_CMD_FLUSH) to get the fallback FUA semantics; but when doing
'write -z 0 512', the NBD client sent only NBD_CMD_WRITE.
The fix is do to a cleanup bdrv_co_flush() at the end of the
operation if any step in the middle relied on a BDS that does
not natively support FUA for that step (note that we don't
need to flush after every operation, if the operation is broken
into chunks based on bounce-buffer sizing). Each BDS gains a
new flag .supported_zero_flags, which parallels the use of
.supported_write_flags but only when accessing a zero write
operation (the flags MUST be different, because of SCSI having
different semantics based on WRITE vs. WRITESAME; and also
because BDRV_REQ_MAY_UNMAP only makes sense on zero writes).
Also fix some documentation to describe -ENOTSUP semantics,
particularly since iscsi depends on those semantics.
Down the road, we may want to add a driver where its
.bdrv_co_pwritev() honors all three of BDRV_REQ_FUA,
BDRV_REQ_ZERO_WRITE, and BDRV_REQ_MAY_UNMAP, and advertise
this via bs->supported_write_flags for blocks opened by that
driver; such a driver should NOT supply .bdrv_co_write_zeroes
nor .supported_zero_flags. But none of the drivers touched
in this patch want to do that (the act of writing zeroes is
different enough from normal writes to deserve a second
callback).
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Pre-patch, .supported_write_flags lives at the driver level, which
means we are blindly declaring that all block devices using a
given driver will either equally support FUA, or that we need a
fallback at the block layer. But there are drivers where FUA
support is a per-block decision: the NBD block driver is dependent
on the remote server advertising NBD_FLAG_SEND_FUA (and has
fallback code to duplicate the flush that the block layer would do
if NBD had not set .supported_write_flags); and the iscsi block
driver is dependent on the mode sense bits advertised by the
underlying device (and is currently silently ignoring FUA requests
if the underlying device does not support FUA).
The fix is to make supported flags as a per-BDS option, set during
.bdrv_open(). This patch moves the variable and fixes NBD and iscsi
to set it only conditionally; later patches will then further
simplify the NBD driver to quit duplicating work done at the block
layer, as well as tackle the fact that SCSI does not support FUA
semantics on WRITESAME(10/16) but only on WRITE(10/16).
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
There is a possibility that qcow2_co_write_zeroes() will be called
with the partial block. This could be synthetically triggered with
qemu-io -c "write -z 32k 4k"
and can happen in the real life in qemu-nbd. The latter happens under
the following conditions:
(1) qemu-nbd is started with --detect-zeroes=on and is connected to the
kernel NBD client
(2) third party program opens kernel NBD device with O_DIRECT
(3) third party program performs write operation with memory buffer
not aligned to the page
In this case qcow2_co_write_zeroes() is unable to perform the operation
and mark entire cluster as zeroed and returns ENOTSUP. Thus the caller
switches to non-optimized version and writes real zeroes to the disk.
The patch creates a shortcut. If the block is read as zeroes, f.e. if
it is unallocated, the request is extended to cover full block.
User-visible situation with this block is not changed. Before the patch
the block is filled in the image with real zeroes. After that patch the
block is marked as zeroed in metadata. Thus any subsequent changes in
backing store chain are not affected.
Kevin, thank you for a cool suggestion.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Roman Kagan <rkagan@virtuozzo.com>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Now that there are no remaining clients, we can drop the
sector-based blk_read(), blk_write(), blk_aio_readv(), and
blk_aio_writev(). Sadly, there are still remaining
sector-based interfaces, such as blk_*discard(), or
blk_write_compressed(); those will have to wait for another
day.
Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
blk_aio_readv() and blk_aio_writev() are annoying in that they
can't access sub-sector granularity, and cannot pass flags.
Also, they require the caller to pass redundant information
about the size of the I/O (qiov->size in bytes must match
nb_sectors in sectors).
Add new blk_aio_preadv() and blk_aio_pwritev() functions to fix
the flaws. The next few patches will upgrade callers, then
finally delete the old interfaces.
Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Sector-based blk_write() should die; convert the one-off
variant blk_write_zeroes() to use an offset/count interface
instead. Likewise for blk_co_write_zeroes() and
blk_aio_write_zeroes().
Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Sector-based blk_read() should die; convert the one-off
variant blk_read_unthrottled().
Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
We have several block drivers that understand BDRV_REQ_FUA,
and emulate it in the block layer for the rest by a full flush.
But without a way to actually request BDRV_REQ_FUA during a
pass-through blk_pwrite(), FUA-aware block drivers like NBD are
forced to repeat the emulation logic of a full flush regardless
of whether the backend they are writing to could do it more
efficiently.
This patch just wires up a flags argument; followup patches
will actually make use of it in the NBD driver and in qemu-io.
Signed-off-by: Eric Blake <eblake@redhat.com>
Acked-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Vmdk images have metadata to indicate the vmware virtual
hardware version image was created/tested to run with.
Allow users to specify that version via new 'hwversion'
option.
[ kwolf: Adjust qemu-iotests common.filter ]
Signed-off-by: Janne Karhunen <Janne.Karhunen@gmail.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Files with conditional debug statements should ensure that the printf is
always compiled. This prevents bitrot of the format string of the debug
statement. And switch debug output to stderr.
Signed-off-by: Zhou Jie <zhoujie2011@cn.fujitsu.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
There are no block drivers left that implement the old .bdrv_read/write
interface, so it can be removed now. This gets us rid of the
corresponding emulation functions, too.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
This doesn't really convert any of the actual vvfat logic to use
vectored I/O (and it's doubtful whether that would make sense), but
instead just adapts the wrappers to the modern interface.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
This is a byte granularity version of vmdk_find_index_in_cluster().
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>