This is the final patch for converting the common I/O path to take
a BdrvChild parameter instead of BlockDriverState.
The completion of this conversion means that all users that perform I/O
on an image need to actually hold a reference (in the form of BdrvChild,
possible as part of a BlockBackend) to that image. This also protects
against inconsistent use of BlockBackend vs. BlockDriverState functions
because direct use of a BlockDriverState isn't possible any more and
blk->root is private for block-backends.c.
In addition, we can now distinguish different users in the I/O path,
and the future op blockers work is going to add assertions based on
permissions stored in BdrvChild.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Using int for values that are only used as booleans is confusing.
While at it, rearrange a couple of members so that all the bools
are contiguous.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This brings it in line with .bdrv_save_vmstate().
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
We already have a byte-based bdrv_pwritev(), but the read counterpart
was still missing. This commit adds it.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
In a first step to convert the common I/O path to work on bytes rather
than sectors, this converts the copy-on-read logic that is used by
bdrv_aligned_preadv().
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Add a new BDRV_REQ_MASK constant, and use it to make sure that
caller flags are always valid.
Tested with 'make check' and with qemu-iotests on both '-raw'
and '-qcow2'; the only failure turned up was fixed in the
previous commit.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Rename to bdrv_pwrite_zeroes() to let the compiler ensure we
cater to the updated semantics. Do the same for bdrv_co_write_zeroes().
Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
I was thrown by the fact that the public type BlockRequest had
an anonymous union, but no obvious discriminator. Turns out
that the only client of the second branch of the union was code
internal to io.c, now that commit 91c6e4b killed public
multiwrite, so move it into io.c and improve the comments.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1463699150-19445-1-git-send-email-eblake@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
This changes the backup block job to use the job's BlockBackend for
performing its I/O. job->bs isn't used by the backup code any more
afterwards.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
This changes the streaming block job to use the job's BlockBackend for
performing the COR reads. job->bs isn't used by the streaming code any
more afterwards.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
There are no callers to bdrv_open() or bdrv_open_inherit() left that
pass a pointer to a non-NULL BDS pointer as the first argument of these
functions, so we can finally drop that parameter and just make them
return the new BDS.
Generally, the following pattern is applied:
bs = NULL;
ret = bdrv_open(&bs, ..., &local_err);
if (ret < 0) {
error_propagate(errp, local_err);
...
}
by
bs = bdrv_open(..., errp);
if (!bs) {
ret = -EINVAL;
...
}
Of course, there are only a few instances where the pattern is really
pure.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
It is unused now, so we may just as well drop it.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The bdrv_next() users all leaked the BdrvNextIterator after completing
the iteration. Simply changing bdrv_next() to free the iterator before
returning NULL at the end of list doesn't work because some callers exit
the loop before looking at all BDSes.
This patch moves the BdrvNextIterator from the heap to the stack of
the caller and switches to a bdrv_first()/bdrv_next() interface for
initialising the iterator.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
We need to introduce a separate BdrvNextIterator struct that can keep
more state than just the current BDS in order to avoid using the bs->blk
pointer.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Since virtio-blk implements request merging itself these days, the only
remaining users are test cases for the function. That doesn't make the
function exactly useful any more.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
In some cases, we want to take a quorum child offline, and take
another child online.
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-id: 1462865799-19402-2-git-send-email-xiecl.fnst@cn.fujitsu.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
Extract the handling of io_plug "depth" from linux-aio.c and let the
main bdrv_drain loop do nothing but wait on I/O.
Like the two newly introduced functions, bdrv_io_plug and bdrv_io_unplug
now operate on all children. The visit order is now symmetrical between
plug and unplug, making it possible for formats to implement plug/unplug.
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Using the nested aio_poll() in coroutine is a bad idea. This patch
replaces the aio_poll loop in bdrv_drain with a BH, if called in
coroutine.
For example, the bdrv_drain() in mirror.c can hang when a guest issued
request is pending on it in qemu_co_mutex_lock().
Mirror coroutine in this case has just finished a request, and the block
job is about to complete. It calls bdrv_drain() which waits for the
other coroutine to complete. The other coroutine is a scsi-disk request.
The deadlock happens when the latter is in turn pending on the former to
yield/terminate, in qemu_co_mutex_lock(). The state flow is as below
(assuming a qcow2 image):
mirror coroutine scsi-disk coroutine
-------------------------------------------------------------
do last write
qcow2:qemu_co_mutex_lock()
...
scsi disk read
tracked request begin
qcow2:qemu_co_mutex_lock.enter
qcow2:qemu_co_mutex_unlock()
bdrv_drain
while (has tracked request)
aio_poll()
In the scsi-disk coroutine, the qemu_co_mutex_lock() will never return
because the mirror coroutine is blocked in the aio_poll(blocking=true).
With this patch, the added qemu_coroutine_yield() allows the scsi-disk
coroutine to make progress as expected:
mirror coroutine scsi-disk coroutine
-------------------------------------------------------------
do last write
qcow2:qemu_co_mutex_lock()
...
scsi disk read
tracked request begin
qcow2:qemu_co_mutex_lock.enter
qcow2:qemu_co_mutex_unlock()
bdrv_drain.enter
> schedule BH
> qemu_coroutine_yield()
> qcow2:qemu_co_mutex_lock.return
> ...
tracked request end
...
(resumed from BH callback)
bdrv_drain.return
...
Reported-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-id: 1459855253-5378-2-git-send-email-famz@redhat.com
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The only remaining users were block jobs (mirror and backup) which
unconditionally enabled WCE on the BlockBackend of the target image. As
these block jobs don't go through BlockBackend for their I/O requests,
they aren't affected by this setting anyway but always get a writeback
mode, so that call can be removed.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
The previous patches have successively made blk->enable_write_cache the
true source for the information whether a writethrough mode must be
implemented. The corresponding BDRV_O_CACHE_WB is only useless baggage
we're carrying around, so now's the time to remove it.
At the same time, we remove the 'cache.writeback' option parsing on the
BDS level as the only effect was setting the BDRV_O_CACHE_WB flag.
This change requires test cases that explicitly enabled the option to
drop it. Other than that and the change of the error message when
writethrough is enabled on the BDS level (from "Can't set writethrough
mode" to "doesn't support the option"), there should be no change in
behaviour.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Whether a write cache is used or not is a decision that concerns the
user (e.g. the guest device) rather than the backend. It was already
logically part of the BB level as bdrv_move_feature_fields() always kept
it on top of the BDS tree; with this patch, the core of it (the actual
flag and the additional flushes) is also implemented there.
Direct callers of bdrv_open() must pass BDRV_O_CACHE_WB now if bs
doesn't have a BlockBackend attached.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
It's like bdrv_parse_cache_flags(), except that writethrough mode isn't
included in the flags, but returned as a separate bool.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
For a couple of releases we have been warning
Encrypted images are deprecated
Support for them will be removed in a future release.
You can use 'qemu-img convert' to convert your image to an unencrypted one.
This warning was issued by system emulators, qemu-img, qemu-nbd
and qemu-io. Such a broad warning was issued because the original
intention was to rip out all the code for dealing with encryption
inside the QEMU block layer APIs.
The new block encryption framework used for the LUKS driver does
not rely on the unloved block layer API for encryption keys,
instead using the QOM 'secret' object type. It is thus no longer
appropriate to warn about encryption unconditionally.
When the qcow/qcow2 drivers are converted to use the new encryption
framework too, it will be practical to keep AES-CBC support present
for use in qemu-img, qemu-io & qemu-nbd to allow for interoperability
with older QEMU versions and liberation of data from existing encrypted
qcow2 files.
This change moves the warning out of the generic block code and
into the qcow/qcow2 drivers. Further, the warning is set to only
appear when running the system emulators, since qemu-img, qemu-io,
qemu-nbd are expected to support qcow2 encryption long term now that
the maint burden has been eliminated.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
When opening an image it is useful to know whether the caller
intends to perform I/O on the image or not. In the case of
encrypted images this will allow the block driver to avoid
having to prompt for decryption keys when we merely want to
query header metadata about the image. eg qemu-img info
This flag is enforced at the top level only, since even if
we don't want todo I/O on the 'qcow2' file payload, the
underlying 'file' driver will still need todo I/O to read
the qcow2 header, for example.
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The call in hmp_drive_del() is dead code because blk_remove_bs() is
called a few lines above. The only other remaining user is
bdrv_delete(), which only abuses bdrv_make_anon() to remove it from the
named nodes list. This path inlines the list entry removal into
bdrv_delete() and removes bdrv_make_anon().
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
qemu-common.h should only be included by .c files. Its file comment
explains why: "No header file should depend on qemu-common.h, as this
would easily lead to circular header dependencies."
qemu/iov.h includes qemu-common.h for QEMUIOVector stuff. Move all
that to qemu/iov.h and drop the ill-advised include. Include
qemu/iov.h where the QEMUIOVector stuff is now missing.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add a function for iterating over all monitor-owned BlockDriverStates so
the generic block layer can do so.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Move bdrv_commit_all() and bdrv_flush_all() to the BlockBackend level.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The only code change is making bdrv_dirty_bitmap_truncate public. It is
used in block.c.
Also two long lines (bdrv_get_dirty) are wrapped.
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 1457412306-18940-5-git-send-email-famz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
Following patches to refactor and move block dirty bitmap code could use
this.
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 1457412306-18940-4-git-send-email-famz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
Since commit 91a097e, we end up with a somewhat weird cache mode
configuration with snapshot=on: The commit broke the cache mode
inheritance for the snapshot overlay so that it is opened as
writethrough instead of unsafe now. The following bdrv_append() call to
put it on top of the tree swaps the WCE flag with the snapshot's backing
file (i.e. the originally given file), so what we eventually get is
cache=writeback on the temporary overlay and
cache=writethrough,cache.no-flush=on on the real image file.
This patch changes things so that the temporary overlay gets
cache=unsafe again like it used to, and the real images get whatever the
user specified. This means that cache.direct is now respected even with
snapshot=on, and in the case of committing changes, the final flush is
no longer ignored except explicitly requested by the user.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
This fixes a regression introduced with commit 3f09bfbc7. Multiple
bugs arise in conjunction with live snapshots and mirroring operations
(which include active layer commit).
After a live snapshot occurs, the active layer and the base layer both
have a non-NULL tqe_prev field in the device_list, although the base
node's tqe_prev field points to a NULL entry. This non-NULL tqe_prev
field occurs after the bdrv_append() in the external snapshot calls
change_parent_backing_link().
In change_parent_backing_link(), when the previous active layer is
removed from device_list, the device_list.tqe_prev pointer is not
set to NULL.
The operating scheme in the block layer is to indicate that a BDS belongs
in the bdrv_states device_list iff the device_list.tqe_prev pointer
is non-NULL.
This patch does two things:
1.) Introduces a new block layer helper bdrv_device_remove() to remove a
BDS from the device_list, and
2.) uses that new API, which also fixes the regression once used in
change_parent_backing_link().
Signed-off-by: Jeff Cody <jcody@redhat.com>
Message-id: 0cd51e11c0666c04ddb7c05293fe94afeb551e89.1454376655.git.jcody@redhat.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
The added parameter can be used to return the BDS pointer which the
valid offset is referring to. Its value should be ignored unless
BDRV_BLOCK_OFFSET_VALID in ret is set.
Until block drivers fill in the right value, let's clear it explicitly
right before calling .bdrv_get_block_status.
The "bs->file" condition in bdrv_co_get_block_status is kept now to keep iotest
case 102 passing, and will be fixed once all drivers return the right file
pointer.
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-id: 1453780743-16806-2-git-send-email-famz@redhat.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>