Commit Graph

1324 Commits

Author SHA1 Message Date
Max Reitz
3cf746b3f1 block: Deep-clear inherits_from
BDS.inherits_from does not always point to an immediate parent node.
When launching a block job with a filter node, for example, the node
directly below the filter will not point to the filter, but keep its old
pointee (above the filter).

If that pointee goes away while the job is still running, the node's
inherits_from will not be updated and thus point to garbage.  To fix
this, bdrv_unref_child() has to check not only the parent node's
immediate children for nodes whose inherits_from needs to be cleared,
but its whole subtree.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190703172813.6868-7-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-07-15 15:48:40 +02:00
Max Reitz
e5182c1c57 block: Add BDS.never_freeze
The commit and the mirror block job must be able to drop their filter
node at any point.  However, this will not be possible if any of the
BdrvChild links to them is frozen.  Therefore, we need to prevent them
from ever becoming frozen.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-id: 20190703172813.6868-2-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-07-15 15:48:40 +02:00
Max Reitz
1046779e64 block: Ignore loosening perm restrictions failures
We generally assume that loosening permission restrictions can never
fail.  We have seen in the past that this assumption is wrong.  This has
led to crashes because we generally pass &error_abort when loosening
permissions.

However, a failure in such a case should actually be handled in quite
the opposite way: It is very much not fatal, so qemu may report it, but
still consider the operation successful.  The only realistic problem is
that qemu may then retain permissions and thus locks on images it
actually does not require.  But again, that is not fatal.

To implement this behavior, we make all functions that change
permissions and that pass &error_abort to the initiating function
(bdrv_check_perm() or bdrv_child_check_perm()) evaluate the
@loosen_restrictions value introduced in the previous patch.  If it is
true and an error did occur, we abort the permission update, discard the
error, and instead report success to the caller.

bdrv_child_try_set_perm() itself does not pass &error_abort, but it is
the only public function to change permissions.  As such, callers may
pass &error_abort to it, expecting dropping permission restrictions to
never fail.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-06-18 16:41:10 +02:00
Max Reitz
9eab154415 block: Add *tighten_restrictions to *check*_perm()
This patch makes three functions report whether the necessary permission
change tightens restrictions or not.  These functions are:
- bdrv_check_perm()
- bdrv_check_update_perm()
- bdrv_child_check_perm()

Callers can use this result to decide whether a failure is fatal or not
(see the next patch).

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-06-18 16:41:10 +02:00
Max Reitz
87ace5f8b6 block: Fix order in bdrv_replace_child()
We have to start by applying the permission restrictions to new_bs
before we can loosen them on old_bs.  See the comment for the
explanation.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-06-18 16:41:10 +02:00
Max Reitz
c1087f1206 block: Add bdrv_child_refresh_perms()
If a block node uses bdrv_child_try_set_perm() to change the permission
it takes on its child, the result may be very short-lived.  If anything
makes the block layer recalculate the permissions internally, it will
invoke the node driver's .bdrv_child_perm() implementation.  The
permission/shared permissions masks that returns will then override the
values previously passed to bdrv_child_try_set_perm().

If drivers want a child edge to have specific values for the
permissions/shared permissions mask, it must return them in
.bdrv_child_perm().  Consequentially, there is no need for them to pass
the same values to bdrv_child_try_set_perm() then: It is better to have
a function that invokes .bdrv_child_perm() and calls
bdrv_child_try_set_perm() with the result.  This patch adds such a
function under the name of bdrv_child_refresh_perms().

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-06-18 16:41:10 +02:00
Vladimir Sementsov-Ogievskiy
b23c580c94 block: drop bs->job
Drop remaining users of bs->job:
1. assertions actually duplicated by assert(!bs->refcnt)
2. trace-point seems not enough reason to change stream_start to return
   BlockJob pointer
3. Restricting creation of two jobs based on same bs is bad idea, as
   3.1 Some jobs creates filters to be their main node, so, this check
   don't actually prevent creating second job on same real node (which
   will create another filter node) (but I hope it is restricted by
   other mechanisms)
   3.2 Even without bs->job we have two systems of permissions:
   op-blockers and BLK_PERM
   3.3 We may want to run several jobs on one node one day

And finally, drop bs->job pointer itself. Hurrah!

Suggested-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-06-18 16:41:10 +02:00
Kevin Wolf
42a65f02f9 block: Remove bdrv_set_aio_context()
All callers of bdrv_set_aio_context() are eliminated now, they have
moved to bdrv_try_set_aio_context() and related safe functions. Remove
bdrv_set_aio_context().

With this, we can now know that the .set_aio_ctx callback must be
present in bdrv_set_aio_context_ignore() because
bdrv_can_set_aio_context() would have returned false previously, so
instead of checking the condition, we can assert it.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-06-04 16:55:58 +02:00
Kevin Wolf
d0ee0204f4 block: Remove wrong bdrv_set_aio_context() calls
The mirror and commit block jobs use bdrv_set_aio_context() to move
their filter node into the right AioContext before hooking it up in the
graph. Similarly, bdrv_open_backing_file() explicitly moves the backing
file node into the right AioContext first.

This isn't necessary any more, they get automatically moved into the
right context now when attaching them.

However, in the case of bdrv_open_backing_file() with a node reference,
it's actually not only unnecessary, but even wrong: The unchecked
bdrv_set_aio_context() changes the AioContext of the child node even if
other parents require it to retain the old context. So this is not only
a simplification, but a bug fix, too.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1684342
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-06-04 15:22:22 +02:00
Kevin Wolf
ad943dcb22 block: Move node without parents to main AioContext
A node should only be in a non-default AioContext if a user is attached
to it that requires this. When the last parent of a node is gone, it can
move back to the main AioContext.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-06-04 15:22:22 +02:00
Kevin Wolf
132ada80c4 block: Adjust AioContexts when attaching nodes
So far, we only made sure that updating the AioContext of a node
affected the whole subtree. However, if a node is newly attached to a
new parent, we also need to make sure that both the subtree of the node
and the parent are in the same AioContext. This tries to move the new
child node to the parent AioContext and returns an error if this isn't
possible.

BlockBackends now actually apply their AioContext to their root node.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-06-04 15:22:22 +02:00
Kevin Wolf
d861ab3acf block: Add BlockBackend.ctx
This adds a new parameter to blk_new() which requires its callers to
declare from which AioContext this BlockBackend is going to be used (or
the locks of which AioContext need to be taken anyway).

The given context is only stored and kept up to date when changing
AioContexts. Actually applying the stored AioContext to the root node
is saved for another commit.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-06-04 15:22:22 +02:00
Kevin Wolf
f871abd60f block: Drain source node in bdrv_replace_node()
Instead of just asserting that no requests are in flight in
bdrv_replace_node(), which is a requirement that most callers ignore, we
can just drain the source node right there. This fixes at least starting
a commit job while I/O is active on the backing chain, but probably
other callers, too.

Having requests in flight on the target node isn't a problem because the
target just gets new parents, but the call path of running requests
isn't modified. So we can just drop this assertion without a replacement.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1711643
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2019-06-04 15:20:41 +02:00
Alberto Garcia
b441dc71c0 block: Make bdrv_root_attach_child() unref child_bs on failure
A consequence of the previous patch is that bdrv_attach_child()
transfers the reference to child_bs from the caller to parent_bs,
which will drop it on bdrv_close() or when someone calls
bdrv_unref_child().

But this only happens when bdrv_attach_child() succeeds. If it fails
then the caller is responsible for dropping the reference to child_bs.

This patch makes bdrv_attach_child() take the reference also when
there is an error, freeing the caller for having to do it.

A similar situation happens with bdrv_root_attach_child(), so the
changes on this patch affect both functions.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-id: 20dfb3d9ccec559cdd1a9690146abad5d204a186.1557754872.git.berto@igalia.com
[mreitz: Removed now superfluous BdrvChild * variable in
         bdrv_open_child()]
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-05-28 20:30:55 +02:00
Alberto Garcia
dd4118c792 block: Use bdrv_unref_child() for all children in bdrv_close()
bdrv_unref_child() does the following things:

  - Updates the child->bs->inherits_from pointer.
  - Calls bdrv_detach_child() to remove the BdrvChild from bs->children.
  - Calls bdrv_unref() to unref the child BlockDriverState.

When bdrv_unref_child() was introduced in commit 33a604075c it was not
used in bdrv_close() because the drivers that had additional children
(like quorum or blkverify) had already called bdrv_unref() on their
children during their own close functions.

This was changed later (in 0bd6e91a7e for quorum, in 3e586be0b2 for
blkverify) so there's no reason not to use bdrv_unref_child() in
bdrv_close() anymore.

After this there's also no need to remove bs->backing and bs->file
separately from the rest of the children, so bdrv_close() can be
simplified.

Now bdrv_close() unrefs all children (before this patch it was only
bs->file and bs->backing). As a result, none of the callers of
brvd_attach_child() should remove their reference to child_bs (because
this function effectively steals that reference). This patch updates a
couple of tests that were doing their own bdrv_unref().

Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-id: 6d1d5feaa53aa1ab127adb73d605dc4503e3abd5.1557754872.git.berto@igalia.com
[mreitz: s/where/were/]
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-05-28 20:30:55 +02:00
Max Reitz
481e0eeef4 block: Improve "Block node is read-only" message
This message does not make any sense when it appears as the response to
making an R/W node read-only.  We should detect that case and emit a
different message, then.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-05-20 17:08:57 +02:00
Kevin Wolf
53a7d04185 block: Propagate AioContext change to parents
All block nodes and users in any connected component of the block graph
must be in the same AioContext, so changing the AioContext of one node
must not only change all of its children, but all of its parents (and
in turn their children etc.) as well.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-05-20 17:08:56 +02:00
Kevin Wolf
0d83708a1d block: Move recursion to bdrv_set_aio_context()
Instead of having two recursions, in bdrv_attach_aio_context() and in
bdrv_detach_aio_context(), just having one recursion is enough. Said
functions are only about a single node now.

It is important that the recursion doesn't happen between detaching and
attaching a context to the current node because the nested call will
drain the node, and draining with a NULL context would segfault.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-05-20 17:08:56 +02:00
Kevin Wolf
a3a683c33d block: Make bdrv_attach/detach_aio_context() static
Since commit b97511c7bc, there is no reason for block drivers any more
to call these functions (see the function comment in block_int.h). They
are now just internal helper functions for bdrv_set_aio_context()
and can be made static.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-05-20 17:08:56 +02:00
Kevin Wolf
5d2318499f block: Add bdrv_try_set_aio_context()
Eventually, we want to make sure that all parents and all children of a
node are in the same AioContext as the node itself. This means that
changing the AioContext may fail because one of the other involved
parties (e.g. a guest device that was configured with an iothread)
cannot allow switching to a different AioContext.

Introduce a set of functions that allow to first check whether all
involved nodes can switch to a new context and only then do the actual
switch. The check recursively covers children and parents.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-05-20 17:08:56 +02:00
Anton Kuchin
30c321f97f block: remove bs from lists before closing
Close involves flush that can be performed asynchronously and bs
must be protected from being referenced before it is deleted.

Signed-off-by: Anton Kuchin <antonkuchin@yandex-team.ru>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-05-10 16:45:40 +02:00
Nikita Alekseev
66a5bdf309 block: Add coroutine_fn to bdrv_check_co_entry
bdrv_check_co_entry calls bdrv_co_check, which is a coroutine function.
Thus, it also needs to be marked as a coroutine.

Signed-off-by: Nikita Alekseev <n.alekseev2104@gmail.com>
Message-id: 20190401093051.16488-1-n.alekseev2104@gmail.com
Message-Id: <20190401093051.16488-1-n.alekseev2104@gmail.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2019-05-10 10:53:21 +01:00
Alberto Garcia
0b3ca76e52 block: Assert that drv->bdrv_child_perm is set in bdrv_child_perm()
There is no need to check for this because all block drivers that have
children implement bdrv_child_perm and all callers already ensure that
bs->drv is set.

Furthermore, if this check would fail then the callers would end up
with uninitialized values for nperm and nshared.

This patch replaces the check with an assertion.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-id: 20190404112953.4058-1-berto@igalia.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-05-07 17:14:21 +02:00
Kevin Wolf
1bffe1ae7a block: Fix AioContext switch for bs->drv == NULL
Even for block nodes with bs->drv == NULL, we can't just ignore a
bdrv_set_aio_context() call. Leaving the node in its old context can
mean that it's still in an iothread context in bdrv_close_all() during
shutdown, resulting in an attempted unlock of the AioContext lock which
we don't hold.

This is an example stack trace of a related crash:

 #0  0x00007ffff59da57f in raise () at /lib64/libc.so.6
 #1  0x00007ffff59c4895 in abort () at /lib64/libc.so.6
 #2  0x0000555555b97b1e in error_exit (err=<optimized out>, msg=msg@entry=0x555555d386d0 <__func__.19059> "qemu_mutex_unlock_impl") at util/qemu-thread-posix.c:36
 #3  0x0000555555b97f7f in qemu_mutex_unlock_impl (mutex=mutex@entry=0x5555568002f0, file=file@entry=0x555555d378df "util/async.c", line=line@entry=507) at util/qemu-thread-posix.c:97
 #4  0x0000555555b92f55 in aio_context_release (ctx=ctx@entry=0x555556800290) at util/async.c:507
 #5  0x0000555555b05cf8 in bdrv_prwv_co (child=child@entry=0x7fffc80012f0, offset=offset@entry=131072, qiov=qiov@entry=0x7fffffffd4f0, is_write=is_write@entry=true, flags=flags@entry=0)
         at block/io.c:833
 #6  0x0000555555b060a9 in bdrv_pwritev (qiov=0x7fffffffd4f0, offset=131072, child=0x7fffc80012f0) at block/io.c:990
 #7  0x0000555555b060a9 in bdrv_pwrite (child=0x7fffc80012f0, offset=131072, buf=<optimized out>, bytes=<optimized out>) at block/io.c:990
 #8  0x0000555555ae172b in qcow2_cache_entry_flush (bs=bs@entry=0x555556810680, c=c@entry=0x5555568cc740, i=i@entry=0) at block/qcow2-cache.c:51
 #9  0x0000555555ae18dd in qcow2_cache_write (bs=bs@entry=0x555556810680, c=0x5555568cc740) at block/qcow2-cache.c:248
 #10 0x0000555555ae15de in qcow2_cache_flush (bs=0x555556810680, c=<optimized out>) at block/qcow2-cache.c:259
 #11 0x0000555555ae16b1 in qcow2_cache_flush_dependency (c=0x5555568a1700, c=0x5555568a1700, bs=0x555556810680) at block/qcow2-cache.c:194
 #12 0x0000555555ae16b1 in qcow2_cache_entry_flush (bs=bs@entry=0x555556810680, c=c@entry=0x5555568a1700, i=i@entry=0) at block/qcow2-cache.c:194
 #13 0x0000555555ae18dd in qcow2_cache_write (bs=bs@entry=0x555556810680, c=0x5555568a1700) at block/qcow2-cache.c:248
 #14 0x0000555555ae15de in qcow2_cache_flush (bs=bs@entry=0x555556810680, c=<optimized out>) at block/qcow2-cache.c:259
 #15 0x0000555555ad242c in qcow2_inactivate (bs=bs@entry=0x555556810680) at block/qcow2.c:2124
 #16 0x0000555555ad2590 in qcow2_close (bs=0x555556810680) at block/qcow2.c:2153
 #17 0x0000555555ab0c62 in bdrv_close (bs=0x555556810680) at block.c:3358
 #18 0x0000555555ab0c62 in bdrv_delete (bs=0x555556810680) at block.c:3542
 #19 0x0000555555ab0c62 in bdrv_unref (bs=0x555556810680) at block.c:4598
 #20 0x0000555555af4d72 in blk_remove_bs (blk=blk@entry=0x5555568103d0) at block/block-backend.c:785
 #21 0x0000555555af4dbb in blk_remove_all_bs () at block/block-backend.c:483
 #22 0x0000555555aae02f in bdrv_close_all () at block.c:3412
 #23 0x00005555557f9796 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4776

The reproducer I used is a qcow2 image on gluster volume, where the
virtual disk size (4 GB) is larger than the gluster volume size (64M),
so we can easily trigger an ENOSPC. This backend is assigned to a
virtio-blk device using an iothread, and then from the guest a
'dd if=/dev/zero of=/dev/vda bs=1G count=1' causes the VM to stop
because of an I/O error. qemu_gluster_co_flush_to_disk() sets
bs->drv = NULL on error, so when virtio-blk stops the dataplane, the
block nodes stay in the iothread AioContext. A 'quit' monitor command
issued from this paused state crashes the process.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1631227
Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
2019-04-30 15:29:00 +02:00
Kevin Wolf
3f48686fac block: Forward 'discard' to temporary overlay
When bdrv_temp_snapshot_options() is called for snapshot=on, the
'discard' option in the options QDict hasn't been parsed and merged into
the flags yet. So copy the dict entry to make sure that the temporary
overlay enables discard when it was requested for the drive.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
2019-04-08 16:48:46 +02:00
Alberto Garcia
0f0998f621 block: continue until base is found in bdrv_freeze_backing_chain() et al
All three functions that handle the BdrvChild.frozen attribute walk
the backing chain from 'bs' to 'base' and stop either when 'base' is
found or at the end of the chain if 'base' is NULL.

However if 'base' is not found then the functions return without
errors as if it was NULL.

This is wrong: if the caller passed an incorrect parameter that means
that there is a bug in the code.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-04-02 12:04:44 +02:00
Kevin Wolf
2345bde647 block: Silence Coverity in bdrv_drop_intermediate()
Coverity doesn't like that the return value of bdrv_check_update_perm()
stays unused only in this place (CID 1399710).

Even if checking local_err should be equivalent to checking ret < 0,
let's switch to using the return value to be more consistent (and in
case of a bug somewhere down the call chain, forgetting to assign errp
is more likely than returning 0 for an error case).

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
2019-03-19 15:49:29 +01:00
Alberto Garcia
5019aece2a block: Remove the AioContext parameter from bdrv_reopen_multiple()
This parameter has been unused since 1a63a90750

Signed-off-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-03-12 20:30:14 +01:00
Alberto Garcia
faf116b438 block: Add bdrv_reset_options_allowed()
bdrv_reopen_prepare() receives a BDRVReopenState with (among other
things) a new set of options to be applied to that BlockDriverState.

If an option is missing then it means that we want to reset it to its
default value rather than keep the previous one. This way the state
of the block device after being reopened is comparable to that of a
device added with "blockdev-add" using the same set of options.

Not all options from all drivers can be changed this way, however.
If the user attempts to reset an immutable option to its default value
using this method then we must forbid it.

This new function takes a BlockDriverState and a new set of options
and checks if there's any option that was previously set but is
missing from the new set of options.

If the option is present in both sets we don't need to check that they
have the same value. The loop at the end of bdrv_reopen_prepare()
already takes care of that.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-03-12 20:30:14 +01:00
Alberto Garcia
cb828c31de block: Allow changing the backing file on reopen
This patch allows the user to change the backing file of an image that
is being reopened. Here's what it does:

 - In bdrv_reopen_prepare(): check that the value of 'backing' points
   to an existing node or is null. If it points to an existing node it
   also needs to make sure that replacing the backing file will not
   create a cycle in the node graph (i.e. you cannot reach the parent
   from the new backing file).

 - In bdrv_reopen_commit(): perform the actual node replacement by
   calling bdrv_set_backing_hd().

There may be temporary implicit nodes between a BDS and its backing
file (e.g. a commit filter node). In these cases bdrv_reopen_prepare()
looks for the real (non-implicit) backing file and requires that the
'backing' option points to it. Replacing or detaching a backing file
is forbidden if there are implicit nodes in the middle.

Although x-blockdev-reopen is meant to be used like blockdev-add,
there's an important thing that must be taken into account: the only
way to set a new backing file is by using a reference to an existing
node (previously added with e.g. blockdev-add).  If 'backing' contains
a dictionary with a new set of options ({"driver": "qcow2", "file": {
... }}) then it is interpreted that the _existing_ backing file must
be reopened with those options.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-03-12 20:30:14 +01:00
Alberto Garcia
bacd9b87c4 block: Allow omitting the 'backing' option in certain cases
Of all options of type BlockdevRef used to specify children in
BlockdevOptions, 'backing' is the only one that is optional.

For "x-blockdev-reopen" we want that if an option is omitted then it
must be reset to its default value. The default value of 'backing'
means that QEMU opens the backing file specified in the image
metadata, but this is not something that we want to support for the
reopen operation.

Because of this the 'backing' option has to be specified during
reopen, pointing to the existing backing file if we want to keep it,
or pointing to a different one (or NULL) if we want to replace it (to
be implemented in a subsequent patch).

In order to simplify things a bit and not to require that the user
passes the 'backing' option to every single block device even when
it's clearly not necessary, this patch allows omitting this option if
the block device being reopened doesn't have a backing file attached
_and_ no default backing file is specified in the image metadata.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-03-12 20:30:14 +01:00
Alberto Garcia
8546632e61 block: Handle child references in bdrv_reopen_queue()
Children in QMP are specified with BlockdevRef / BlockdevRefOrNull,
which can contain a set of child options, a child reference, or
NULL. In optional attributes like "backing" it can also be missing.

Only the first case (set of child options) is being handled properly
by bdrv_reopen_queue(). This patch deals with all the others.

Here's how these cases should be handled when bdrv_reopen_queue() is
deciding what to do with each child of a BlockDriverState:

   1) Set of child options: if the child was implicitly created (i.e
      inherits_from points to the parent) then the options are removed
      from the parent's options QDict and are passed to the child with
      a recursive bdrv_reopen_queue() call. This case was already
      working fine.

   2) Child reference: there's two possibilites here.

      2a) Reference to the current child: if the child was implicitly
          created then it is put in the reopen queue, keeping its
          current set of options (since this was a child reference
          there was no way to specify a different set of options).
          If the child is not implicit then it keeps its current set
          of options but it is not reopened (and therefore does not
          inherit any new option from the parent).

      2b) Reference to a different BDS: the current child is not put
          in the reopen queue at all. Passing a reference to a
          different BDS can be used to replace a child, although at
          the moment no driver implements this, so it results in an
          error. In any case, the current child is not going to be
          reopened (and might in fact disappear if it's replaced)

   3) NULL: This is similar to (2b). Although no driver allows this
      yet it can be used to detach the current child so it should not
      be put in the reopen queue.

   4) Missing option: at the moment "backing" is the only case where
      this can happen. With "blockdev-add", leaving "backing" out
      means that the default backing file is opened. We don't want to
      open a new image during reopen, so we require that "backing" is
      always present. We'll relax this requirement a bit in the next
      patch. If keep_old_opts is true and "backing" is missing then
      this behaves like 2a (the current child is reopened).

Signed-off-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-03-12 20:30:14 +01:00
Alberto Garcia
077e8e2018 block: Add 'keep_old_opts' parameter to bdrv_reopen_queue()
The bdrv_reopen_queue() function is used to create a queue with
the BDSs that are going to be reopened and their new options. Once
the queue is ready bdrv_reopen_multiple() is called to perform the
operation.

The original options from each one of the BDSs are kept, with the new
options passed to bdrv_reopen_queue() applied on top of them.

For "x-blockdev-reopen" we want a function that behaves much like
"blockdev-add". We want to ignore the previous set of options so that
only the ones actually specified by the user are applied, with the
rest having their default values.

One of the things that we need is a way to tell bdrv_reopen_queue()
whether we want to keep the old set of options or not, and that's what
this patch does. All current callers are setting this new parameter to
true and x-blockdev-reopen will set it to false.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-03-12 20:30:14 +01:00
Alberto Garcia
2cad1ebe70 block: Allow freezing BdrvChild links
Our permission system is useful to define what operations are allowed
on a certain block node and includes things like BLK_PERM_WRITE or
BLK_PERM_RESIZE among others.

One of the permissions is BLK_PERM_GRAPH_MOD which allows "changing
the node that this BdrvChild points to". The exact meaning of this has
never been very clear, but it can be understood as "change any of the
links connected to the node". This can be used to prevent changing a
backing link, but it's too coarse.

This patch adds a new 'frozen' attribute to BdrvChild, which forbids
detaching the link from the node it points to, and new API to freeze
and unfreeze a backing chain.

After this change a few functions can fail, so they need additional
checks.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-03-12 20:30:14 +01:00
Kevin Wolf
3085513778 file-posix: Fix bdrv_open_flags() for snapshot=on
Using a different read-only setting for bs->open_flags than for the
flags to the driver's open function is just inconsistent and a bad idea.
After this patch, the temporary snapshot keeps being opened read-only if
read-only=on,snapshot=on is passed.

If we wanted to change this behaviour to make only the orginal image
file read-only, but the temporary overlay read-write (as the comment in
the removed code suggests), that change would have to be made in
bdrv_temp_snapshot_options() (where the comment suggests otherwise).

Addressing this inconsistency before introducing dynamic auto-read-only
is important because otherwise we would immediately try to reopen the
temporary overlay even though the file is already unlinked.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-03-12 20:30:14 +01:00
Kevin Wolf
69b736e765 block: Make permission changes in reopen less wrong
The way that reopen interacts with permission changes has one big
problem: Both operations are recursive, and the permissions are changes
for each node in the reopen queue.

For a simple graph that consists just of parent and child,
.bdrv_check_perm will be called twice for the child, once recursively
when adjusting the permissions of parent, and once again when the child
itself is reopened.

Even worse, the first .bdrv_check_perm call happens before
.bdrv_reopen_prepare was called for the child and the second one is
called afterwards.

Making sure that .bdrv_check_perm (and the other permission callbacks)
are called only once is hard. We can cope with multiple calls right now,
but as soon as file-posix gets a dynamic auto-read-only that may need to
open a new file descriptor, we get the additional requirement that all
of them are after the .bdrv_reopen_prepare call.

So reorder things in bdrv_reopen_multiple() to first call
.bdrv_reopen_prepare for all involved nodes and only then adjust
permissions.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-03-12 20:30:14 +01:00
Kevin Wolf
a4615ab31c block: Avoid useless local_err
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2019-03-12 20:30:14 +01:00
Andrey Shinkevich
9ac404c523 block: iterate_format with account of whitelisting
bdrv_iterate_format (which is currently only used for printing out the
formats supported by the block layer) doesn't take format whitelisting
into account.

This creates a problem for tests: they enumerate supported formats to
decide which tests to enable, but then discover that QEMU doesn't let
them actually use some of those formats.

To avoid that, exclude formats that are not whitelisted from
enumeration, if whitelisting is in use.  Since we have separate
whitelists for r/w and r/o, take this a parameter to
bdrv_iterate_format, and print two lists of supported formats (r/w and
r/o) in main qemu.

Signed-off-by: Roman Kagan <rkagan@virtuozzo.com>
Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-03-08 12:26:45 +01:00
Max Reitz
62a01a27f7 block: BDS options may lack the "driver" option
When BDSs are created by qemu itself (e.g. as filters in block jobs),
they may not have a "driver" option in their options QDict.  When
generating a json:{} filename, however, it must always be present.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-id: 20190201192935.18394-31-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-02-25 15:11:27 +01:00
Max Reitz
fb695c74aa block: Do not copy exact_filename from format file
If a format BDS's file BDS is in turn a format BDS, we cannot simply use
the same filename, because when opening a BDS tree based on a filename
alone, qemu will create only one format node on top of one protocol node
(disregarding a potential backing file).

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-id: 20190201192935.18394-26-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-02-25 15:11:27 +01:00
Max Reitz
998b3a1e5a block: Purify .bdrv_refresh_filename()
Currently, BlockDriver.bdrv_refresh_filename() is supposed to both
refresh the filename (BDS.exact_filename) and set BDS.full_open_options.
Now that we have generic code in the central bdrv_refresh_filename() for
creating BDS.full_open_options, we can drop the latter part from all
BlockDriver.bdrv_refresh_filename() implementations.

This also means that we can drop all of the existing default code for
this from the global bdrv_refresh_filename() itself.

Furthermore, we now have to call BlockDriver.bdrv_refresh_filename()
after having set BDS.full_open_options, because the block driver's
implementation should now be allowed to depend on BDS.full_open_options
being set correctly.

Finally, with this patch we can drop the @options parameter from
BlockDriver.bdrv_refresh_filename(); also, add a comment on this
function's purpose in block/block_int.h while touching its interface.

This completely obsoletes blklogwrite's implementation of
.bdrv_refresh_filename().

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190201192935.18394-25-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-02-25 15:11:27 +01:00
Max Reitz
97e2f021f8 block: Generically refresh runtime options
Instead of having every block driver which implements
bdrv_refresh_filename() copy all of the strong runtime options over to
bs->full_open_options, implement this process generically in
bdrv_refresh_filename().

This patch only adds this new generic implementation, it does not remove
the old functionality. This is done in a follow-up patch.

With this patch, some superfluous information (that should never have
been there) may be removed from some JSON filenames, as can be seen in
the change to iotests 110's and 228's reference outputs.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190201192935.18394-24-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-02-25 15:11:27 +01:00
Max Reitz
8df686165b block: Use bdrv_dirname() for relative filenames
bdrv_get_full_backing_filename_from_filename() breaks down when it comes
to JSON filenames. Using bdrv_dirname() as the basis is better because
since we have BDS, we can descend through the BDS tree to the protocol
layer, which gives us a greater probability of finding a non-JSON name;
also, bdrv_dirname() is more correct as it allows block drivers to
override the generation of that directory name in a protocol-specific
way.

We still need to keep bdrv_get_full_backing_filename_from_filename(),
though, because it has valid callers which need it during image creation
when no BDS is available yet.

This makes a test case in qemu-iotest 110, which was supposed to fail,
work. That is actually good, but we need to change the reference output
(and the comment in 110) accordingly.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-id: 20190201192935.18394-20-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-02-25 15:11:27 +01:00
Max Reitz
1e89d0f9be block: Add bdrv_dirname()
This function may be implemented by block drivers to derive a directory
name from a BDS. Concatenating this g_free()-able string with a relative
filename must result in a valid (not necessarily existing) filename, so
this is a function that should generally be not implemented by format
drivers, because this is protocol-specific.

If a BDS's driver does not implement this function, bdrv_dirname() will
fall through to the BDS's file if it exists. If it does not, the
exact_filename field will be used to generate a directory name.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-id: 20190201192935.18394-15-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-02-25 15:11:26 +01:00
Max Reitz
2d9158ce79 block: Fix bdrv_find_backing_image()
bdrv_find_backing_image() should use bdrv_get_full_backing_filename() or
bdrv_make_absolute_filename() instead of trying to do what those
functions do by itself.

path_combine_deprecated() can now be dropped, so let's do that.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-id: 20190201192935.18394-14-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-02-25 15:11:26 +01:00
Max Reitz
9f4793d8f2 block: Add bdrv_make_absolute_filename()
This is a general function for making a filename that is relative to a
certain BDS absolute.

It calls bdrv_get_full_backing_filename_from_filename() for now, but
that will be changed in a follow-up patch.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-id: 20190201192935.18394-13-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-02-25 15:11:26 +01:00
Max Reitz
6b6833c1b4 block: bdrv_get_full_backing_filename's ret. val.
Make bdrv_get_full_backing_filename() return an allocated string instead
of placing the result in a caller-provided buffer.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-id: 20190201192935.18394-12-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-02-25 15:11:26 +01:00
Max Reitz
645ae7d88e block: bdrv_get_full_backing_filename_from_...'s ret. val.
Make bdrv_get_full_backing_filename_from_filename() return an allocated
string instead of placing the result in a caller-provided buffer.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190201192935.18394-11-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-02-25 15:11:26 +01:00
Max Reitz
009b03aaa2 block: Make path_combine() return the path
Besides being safe for arbitrary path lengths, after some follow-up
patches all callers will want a freshly allocated buffer anyway.

In the meantime, path_combine_deprecated() is added which has the same
interface as path_combine() had before this patch. All callers to that
function will be converted in follow-up patches.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-id: 20190201192935.18394-10-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-02-25 15:11:26 +01:00
Max Reitz
909936234c block: Respect backing bs in bdrv_refresh_filename
Basically, bdrv_refresh_filename() should respect all children of a
BlockDriverState. However, generally those children are driver-specific,
so this function cannot handle the general case. On the other hand,
there are only few drivers which use other children than @file and
@backing (that being vmdk, quorum, and blkverify).

Most block drivers only use @file and/or @backing (if they use any
children at all). Both can be implemented directly in
bdrv_refresh_filename.

The user overriding the file's filename is already handled, however, the
user overriding the backing file is not. If this is done, opening the
BDS with the plain filename of its file will not be correct, so we may
not set bs->exact_filename in that case.

iotest 051 contains test cases for overriding the backing file, and so
its output changes with this patch applied.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-id: 20190201192935.18394-6-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-02-25 15:11:25 +01:00
Max Reitz
998c201923 block: Add BDS.auto_backing_file
If the backing file is overridden, this most probably does change the
guest-visible data of a BDS.  Therefore, we will need to consider this
in bdrv_refresh_filename().

To see whether it has been overridden, we might want to compare
bs->backing_file and bs->backing->bs->filename.  However,
bs->backing_file is changed by bdrv_set_backing_hd() (which is just used
to change the backing child at runtime, without modifying the image
header), so bs->backing_file most of the time simply contains a copy of
bs->backing->bs->filename anyway, so it is useless for such a
comparison.

This patch adds an auto_backing_file BDS field which contains the
backing file path as indicated by the image header, which is not changed
by bdrv_set_backing_hd().

Because of bdrv_refresh_filename() magic, however, a BDS's filename may
differ from what has been specified during bdrv_open().  Then, the
comparison between bs->auto_backing_file and bs->backing->bs->filename
may fail even though bs->backing was opened from bs->auto_backing_file.
To mitigate this, we can copy the real BDS's filename (after the whole
bdrv_open() and bdrv_refresh_filename() process) into
bs->auto_backing_file, if we know the former has been opened based on
the latter.  This is only possible if no options modifying the backing
file's behavior have been specified, though.  To simplify things, this
patch only copies the filename from the backing file if no options have
been specified for it at all.

Furthermore, there are cases where an overlay is created by qemu which
already contains a BDS's filename (e.g. in blockdev-snapshot-sync).  We
do not need to worry about updating the overlay's bs->auto_backing_file
there, because we actually wrote a post-bdrv_refresh_filename() filename
into the image header.

So all in all, there will be false negatives where (as of a future
patch) bdrv_refresh_filename() will assume that the backing file differs
from what was specified in the image header, even though it really does
not.  However, these cases should be limited to where (1) the user
actually did override something in the backing chain (e.g. by specifying
options for the backing file), or (2) the user executed a QMP command to
change some node's backing file (e.g. change-backing-file or
block-commit with @backing-file given) where the given filename does not
happen to coincide with qemu's idea of the backing BDS's filename.

Then again, (1) really is limited to -drive.  With -blockdev or
blockdev-add, you have to adhere to the schema, so a user cannot give
partial "unimportant" options (e.g. by just setting backing.node-name
and leaving the rest to the image header).  Therefore, trying to fix
this would mean trying to fix something for -drive only.

To improve on (2), we would need a full infrastructure to "canonicalize"
an arbitrary filename (+ options), so it can be compared against
another.  That seems a bit over the top, considering that filenames
nowadays are there mostly for the user's entertainment.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-id: 20190201192935.18394-5-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-02-25 15:11:25 +01:00
Max Reitz
bb808d5f5c block: Skip implicit nodes for filename info
bdrv_refresh_filename() should simply skip all implicit nodes.  They are
supposed to be invisible to the user, so they should not appear in
filename information.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-id: 20190201192935.18394-4-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-02-25 15:11:25 +01:00
Max Reitz
e24518e303 block: Use children list in bdrv_refresh_filename
bdrv_refresh_filename() should invoke itself recursively on all
children, not just on file.

With that change, we can remove the manual invocations in blkverify,
quorum, commit, mirror, and blklogwrites.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-id: 20190201192935.18394-3-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-02-25 15:11:25 +01:00
Max Reitz
f30c66ba6e block: Use bdrv_refresh_filename() to pull
Before this patch, bdrv_refresh_filename() is used in a pushing manner:
Whenever the BDS graph is modified, the parents of the modified edges
are supposed to be updated (recursively upwards).  However, that is
nonviable, considering that we want child changes not to concern
parents.

Also, in the long run we want a pull model anyway: Here, we would have a
bdrv_filename() function which returns a BDS's filename, freshly
constructed.

This patch is an intermediate step.  It adds bdrv_refresh_filename()
calls before every place a BDS.filename value is used.  The only
exceptions are protocol drivers that use their own filename, which
clearly would not profit from refreshing that filename before.

Also, bdrv_get_encrypted_filename() is removed along the way (as a user
of BDS.filename), since it is completely unused.

In turn, all of the calls to bdrv_refresh_filename() before this patch
are removed, because we no longer have to call this function on graph
changes.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190201192935.18394-2-mreitz@redhat.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-02-25 15:11:25 +01:00
Vladimir Sementsov-Ogievskiy
f962e96150 block: fix bdrv_check_perm for non-tree subgraph
bdrv_check_perm in it's recursion checks each node in context of new
permissions for one parent, because of nature of DFS. It works well,
while children subgraph of top-most updated node is a tree, i.e. it
doesn't have any kind of loops. But if we have a loop (not oriented,
of course), i.e. we have two different ways from top-node to some
child-node, then bdrv_check_perm will do wrong thing:

  top
  | \
  |  |
  v  v
  A  B
  |  |
  v  v
  node

It will once check new permissions of node in context of new A
permissions and old B permissions and once visa-versa. It's a wrong way
and may lead to corruption of permission system. We may start with
no-permissions and all-shared for both A->node and B->node relations
and finish up with non shared write permission for both ways.

The following commit will add a test, which shows this bug.

To fix this situation, let's really set BdrvChild permissions during
bdrv_check_perm procedure. And we are happy here, as check-perm is
already written in transaction manner, so we just need to restore
backed-up permissions in _abort.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-02-25 15:03:19 +01:00
Vladimir Sementsov-Ogievskiy
2f30b7c377 block: improve should_update_child
As it already said in the comment, we don't want to create loops in
parent->child relations. So, when we try to append @to to @c, we should
check that @c is not in @to children subtree, and we should check it
recursively, not only the first level. The patch provides BFS-based
search, to check the relations.

This is needed for further fleecing-hook filter usage: we need to
append it to source, when the hook is already a parent of target, and
source may be in a backing chain of target (fleecing-scheme). So, on
appending, the hook should not became a child (direct or through
children subtree) of the target.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-02-25 15:03:19 +01:00
Kevin Wolf
d70d595429 block: Use normal drain for bdrv_set_aio_context()
Now that bdrv_set_aio_context() works inside drained sections, it can
also use the real drain function instead of open coding something
similar.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-02-25 15:03:19 +01:00
Kevin Wolf
e64f25f30b block: Fix AioContext switch for drained node
When a drained node changes its AioContext, we need to move its
aio_disable_external() to the new context, too.

Without this fix, drain_end will try to reenable the new context, which
has never been disabled, so an assertion failure is triggered.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2019-02-25 15:03:19 +01:00
Kevin Wolf
6c75d761d0 block: Don't poll in bdrv_set_aio_context()
The explicit aio_poll() call in bdrv_set_aio_context() was added in
commit c2b6428d38 as a workaround for bdrv_drain() failing to achieve
to actually quiesce everything (specifically the NBD client code to
switch AioContext).

Now that the NBD client has been fixed to complete this operation during
bdrv_drain(), we don't need the workaround any more.

It was wrong anyway: aio_poll() must always be run in the home thread of
the AioContext.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2019-02-25 15:03:19 +01:00
Denis Plotnikov
57830a499f block: don't set the same context
Adds a fast path on aio context setting preventing
unnecessary context setting routine.
Also, it prevents issues with cyclic walk of child
bds-es appeared because of registering aio walking
notifiers:

Call stack:

0  __GI_raise
1  __GI_abort
2  __assert_fail_base
3  __GI___assert_fail
4  bdrv_detach_aio_context (bs=0x55f54d65c000)      <<<
5  bdrv_detach_aio_context (bs=0x55f54fc8a800)
6  bdrv_set_aio_context (bs=0x55f54fc8a800, ...)
7  block_job_attached_aio_context
8  bdrv_attach_aio_context (bs=0x55f54d65c000, ...) <<<
9  bdrv_set_aio_context (bs=0x55f54d65c000)
10 blk_set_aio_context
11 virtio_blk_data_plane_stop
12 virtio_bus_stop_ioeventfd
13 virtio_vmstate_change
14 vm_state_notify (running=0, state=RUN_STATE_SHUTDOWN)
15 do_vm_stop (state=RUN_STATE_SHUTDOWN, send_stop=true)
16 vm_stop (state=RUN_STATE_SHUTDOWN)
17 main_loop_should_exit
18 main_loop
19 main

This can happen because of "new" context attachment to VM disk bds.
When attaching a new context the corresponding aio context handler is
called for each of aio_notifiers registered on the VM disk bds context.
Among those handlers, there is the block_job_attached_aio_context handler
which sets a new aio context for the block job bds. When doing so,
the old context is detached from all the block job bds children and one of
them is the VM disk bds, serving as backing store for the blockjob bds,
although the VM disk bds is actually the initializer of that process.
Since the VM disk bds is protected with walking_aio_notifiers flag
from double processing in recursive calls, the assert fires.

Signed-off-by: Denis Plotnikov <dplotnikov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-02-25 15:03:19 +01:00
Andrey Shinkevich
1bf6e9ca92 bdrv_query_image_info Error parameter added
Inform a user in case qcow2_get_specific_info fails to obtain
QCOW2 image specific information. This patch is preliminary to
the one "qcow2: Add list of bitmaps to ImageInfoSpecificQCow2".

Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <1549638368-530182-2-git-send-email-andrey.shinkevich@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2019-02-11 14:35:43 -06:00
Kevin Wolf
78fc3b3a26 block: Fix invalidate_cache error path for parent activation
bdrv_co_invalidate_cache() clears the BDRV_O_INACTIVE flag before
actually activating a node so that the correct permissions etc. are
taken. In case of errors, the flag must be restored so that the next
call to bdrv_co_invalidate_cache() retries activation.

Restoring the flag was missing in the error path for a failed
parent->role->activate() call. The consequence is that this attempt to
activate all images correctly fails because we still set errp, however
on the next attempt BDRV_O_INACTIVE is already clear, so we return
success without actually retrying the failed action.

An example where this is observable in practice is migration to a QEMU
instance that has a raw format block node attached to a guest device
with share-rw=off (the default) while another process holds
BLK_PERM_WRITE for the same image. In this case, all activation steps
before parent->role->activate() succeed because raw can tolerate other
writers to the image. Only the parent callback (in particular
blk_root_activate()) tries to implement the share-rw=on property and
requests exclusive write permissions. This fails when the migration
completes and correctly displays an error. However, a manual 'cont' will
incorrectly resume the VM without calling blk_root_activate() again.

This case is described in more detail in the following bug report:
https://bugzilla.redhat.com/show_bug.cgi?id=1531888

Fix this by correctly restoring the BDRV_O_INACTIVE flag in the error
path.

Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Tested-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2019-02-01 13:46:45 +01:00
Kevin Wolf
8be25de643 block: Apply auto-read-only for ro-whitelist drivers
If QEMU was configured with a driver in --block-drv-ro-whitelist, trying
to use that driver read-write resulted in an error message even if
auto-read-only=on was set.

Consider auto-read-only=on for the whitelist checking and use it to
automatically degrade to read-only for block drivers on the read-only
whitelist.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2019-02-01 13:46:45 +01:00
Kevin Wolf
4720cbeea1 block: Fix hangs in synchronous APIs with iothreads
In the block layer, synchronous APIs are often implemented by creating a
coroutine that calls the asynchronous coroutine-based implementation and
then waiting for completion with BDRV_POLL_WHILE().

For this to work with iothreads (more specifically, when the synchronous
API is called in a thread that is not the home thread of the block
device, so that the coroutine will run in a different thread), we must
make sure to call aio_wait_kick() at the end of the operation. Many
places are missing this, so that BDRV_POLL_WHILE() keeps hanging even if
the condition has long become false.

Note that bdrv_dec_in_flight() involves an aio_wait_kick() call. This
corresponds to the BDRV_POLL_WHILE() in the drain functions, but it is
generally not enough for most other operations because they haven't set
the return value in the coroutine entry stub yet. To avoid race
conditions there, we need to kick after setting the return value.

The race window is small enough that the problem doesn't usually surface
in the common path. However, it does surface and causes easily
reproducible hangs if the operation can return early before even calling
bdrv_inc/dec_in_flight, which many of them do (trivial error or no-op
success paths).

The bug in bdrv_truncate(), bdrv_check() and bdrv_invalidate_cache() is
slightly different: These functions even neglected to schedule the
coroutine in the home thread of the node. This avoids the hang, but is
obviously wrong, too. Fix those to schedule the coroutine in the right
AioContext in addition to adding aio_wait_kick() calls.

Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2019-02-01 13:46:44 +01:00
Vladimir Sementsov-Ogievskiy
5d3b4e9946 qapi: add x-debug-query-block-graph
Add a new command, returning block nodes (and their users) graph.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20181221170909.25584-2-vsementsov@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-01-31 00:38:19 +01:00
Alberto Garcia
e6d79c41c9 block: Assert that flags are up-to-date in bdrv_reopen_prepare()
Towards the end of bdrv_reopen_queue_child(), before starting to
process the children, the update_flags_from_options() function is
called in order to have BDRVReopenState.flags in sync with the options
from the QDict.

This is necessary because during the reopen process flags must be
updated for all nodes in the queue so bdrv_is_writable_after_reopen()
and the permission checks work correctly.

Because of that, calling update_flags_from_options() again in
bdrv_reopen_prepare() doesn't really change the flags (they are
already up-to-date). But we need to call it in order to remove those
options from QemuOpts and that way indicate that they have been
processed.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-12-14 11:55:02 +01:00
Alberto Garcia
8eb4b07b6f block: Remove assertions from update_flags_from_options()
This function takes four options (cache.direct, cache.no-flush,
read-only and auto-read-only) from a QemuOpts object and updates the
flags accordingly.

If any of those options is not set (because it was missing from the
original QDict or because it had an invalid value) then the function
aborts with a failed assertion:

   $ qemu-io -c 'reopen -o read-only=foo' hd.qcow2
   block.c:1126: update_flags_from_options: Assertion `qemu_opt_find(opts, BDRV_OPT_CACHE_DIRECT)' failed.
   Aborted

This assertion is unnecessary, and it forces any caller of
bdrv_reopen() to pass all the aforementioned four options. This may
have made sense in order to remove ambiguity when bdrv_reopen() was
taking both flags and options, but that's not the case anymore.

It's also unnecessary if we want to validate the option values,
because bdrv_reopen_prepare() already takes care of that, as we can
see if we remove the assertions:

   $ qemu-io -c 'reopen -o read-only=foo' hd.qcow2
   Parameter 'read-only' expects 'on' or 'off'

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-12-14 11:55:02 +01:00
Alberto Garcia
9aa09ddd1e block: Stop passing flags to bdrv_reopen_queue_child()
Now that all callers are passing the new options using the QDict we no
longer need the 'flags' parameter.

This patch makes the following changes:

   1) The update_options_from_flags() call is no longer necessary
      so it can be removed.

   2) The update_flags_from_options() call is now used in all cases,
      and is moved down a few lines so it happens after the options
      QDict contains the final set of values.

   3) The flags parameter is removed. Now the flags are initialized
      using the current value (for the top-level node) or the parent
      flags (after inherit_options()). In both cases the initial
      values are updated to reflect the new options in the QDict. This
      happens in bdrv_reopen_queue_child() (as explained above) and in
      bdrv_reopen_prepare().

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-12-14 11:55:02 +01:00
Alberto Garcia
2e891722c5 block: Remove flags parameter from bdrv_reopen_queue()
Now that all callers are passing all flag changes as QDict options,
the flags parameter is no longer necessary, so we can get rid of it.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-12-14 11:55:02 +01:00
Alberto Garcia
295cf237c2 block: Drop bdrv_reopen()
No one is using this function anymore, so we can safely remove it.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-12-14 11:55:02 +01:00
Alberto Garcia
e94d3dba6a block: Use bdrv_reopen_set_read_only() in bdrv_backing_update_filename()
This patch replaces the bdrv_reopen() calls that set and remove the
BDRV_O_RDWR flag with the new bdrv_reopen_set_read_only() function.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-12-14 11:55:01 +01:00
Alberto Garcia
6e1000a863 block: Add bdrv_reopen_set_read_only()
Most callers of bdrv_reopen() only use it to switch a BlockDriverState
between read-only and read-write, so this patch adds a new function
that does just that.

We also want to get rid of the flags parameter in the bdrv_reopen()
API, so this function sets the "read-only" option and passes the
original flags (which will then be updated in bdrv_reopen_prepare()).

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-12-14 11:55:01 +01:00
Kevin Wolf
9e37271f50 block: Don't inactivate children before parents
bdrv_child_cb_inactivate() asserts that parents are already inactive
when children get inactivated. This precondition is necessary because
parents could still issue requests in their inactivation code.

When block nodes are created individually with -blockdev, all of them
are monitor owned and will be returned by bdrv_next() in an undefined
order (in practice, in the order of their creation, which is usually
children before parents), which obviously fails the assertion:

qemu: block.c:899: bdrv_child_cb_inactivate: Assertion `bs->open_flags & BDRV_O_INACTIVE' failed.

This patch fixes the ordering by skipping nodes with still active
parents in bdrv_inactivate_recurse() because we know that they will be
covered by recursion when the last active parent becomes inactive.

With the correct parents-before-children ordering, we also got rid of
the reason why commit aad0b7a0bf introduced two passes, so we can go
back to a single-pass recursion. This is necessary so we can rely on the
BDRV_O_INACTIVE flag to skip nodes with active parents (the flag used
to be set only in pass 2, so we would always skip non-root nodes in
pass 1 because all parents would still be considered active; setting the
flag in pass 1 would mean, that we never skip anything in pass 2 because
all parents are already considered inactive).

Because of the change to single pass, this patch is best reviewed with
whitespace changes ignored.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2018-11-27 12:59:00 +01:00
Alberto Garcia
6bd858b311 block: Update BlockDriverState.inherits_from on bdrv_drop_intermediate()
The previous patch fixed the inherits_from pointer after block-stream,
and this one does the same for block-commit.

When block-commit finishes and the 'top' node is not the topmost one
from the backing chain then all nodes above 'base' up to and including
'top' are removed from the chain.

The bdrv_drop_intermediate() call converts a chain like this one:

    base <- intermediate <- top <- active

into this one:

    base <- active

In a simple scenario each backing file from the first chain has the
inherits_from attribute pointing to its parent. This means that
reopening 'active' will recursively reopen all its children, whose
options can be changed in the process.

However after the 'block-commit' call base.inherits_from is NULL and
the chain is broken, so 'base' does not inherit from 'active' and will
not be reopened automatically:

   $ qemu-img create -f qcow2 hd0.qcow2 1M
   $ qemu-img create -f qcow2 -b hd0.qcow2 hd1.qcow2
   $ qemu-img create -f qcow2 -b hd1.qcow2 hd2.qcow2
   $ $QEMU -drive if=none,file=hd2.qcow2

   { 'execute': 'block-commit',
     'arguments': {
       'device': 'none0',
       'top': 'hd1.qcow2' } }

   { 'execute': 'human-monitor-command',
     'arguments': {
        'command-line':
          'qemu-io none0 "reopen -o backing.l2-cache-size=2M"' } }

   { "return": "Cannot change the option 'backing.l2-cache-size'\r\n"}

This patch updates base.inherits_from in this scenario, and adds a
test case.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-11-22 19:37:31 +01:00
Alberto Garcia
0065c455f9 block: Update BlockDriverState.inherits_from on bdrv_set_backing_hd()
When a BlockDriverState's child is opened (be it a backing file, the
protocol layer, or any other) inherits_from is set to point to the
parent node. Children opened separately and then attached to a parent
don't have this pointer set.

bdrv_reopen_queue_child() uses this to determine whether a node's
children must also be reopened inheriting the options from the parent
or not. If inherits_from points to the parent then the child is
reopened and its options can be changed, like in this example:

   $ qemu-img create -f qcow2 hd0.qcow2 1M
   $ qemu-img create -f qcow2 hd1.qcow2 1M
   $ $QEMU -drive if=none,node-name=hd0,file=hd0.qcow2,\
                  backing.driver=qcow2,backing.file.filename=hd1.qcow2
   (qemu) qemu-io hd0 "reopen -o backing.l2-cache-size=2M"

If the child does not inherit from the parent then it does not get
reopened and its options cannot be changed:

   $ $QEMU -drive if=none,node-name=hd1,file=hd1.qcow2
           -drive if=none,node-name=hd0,file=hd0.qcow2,backing=hd1
   (qemu) qemu-io hd0 "reopen -o backing.l2-cache-size=2M"
   Cannot change the option 'backing.l2-cache-size'

If a disk image has a chain of backing files then all of them are also
connected through their inherits_from pointers (i.e. it's possible to
walk the chain in reverse order from base to top).

However this is broken if the intermediate nodes are removed using
e.g. block-stream because the inherits_from pointer from the base node
becomes NULL:

   $ qemu-img create -f qcow2 hd0.qcow2 1M
   $ qemu-img create -f qcow2 -b hd0.qcow2 hd1.qcow2
   $ qemu-img create -f qcow2 -b hd1.qcow2 hd2.qcow2
   $ $QEMU -drive if=none,file=hd2.qcow2
   (qemu) qemu-io none0 "reopen -o backing.l2-cache-size=2M"
   (qemu) block_stream none0 0 hd0.qcow2
   (qemu) qemu-io none0 "reopen -o backing.l2-cache-size=2M"
   Cannot change the option 'backing.l2-cache-size'

This patch updates the inherits_from pointer if the intermediate nodes
of a backing chain are removed using bdrv_set_backing_hd(), and adds a
test case for this scenario.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-11-22 19:37:31 +01:00
Alberto Garcia
2a3d4331fa block: Fix update of BDRV_O_AUTO_RDONLY in update_flags_from_options()
Commit e35bdc123a added the auto-read-only option and the
code to update its corresponding flag in update_flags_from_options(),
but forgot to clear the flag if auto-read-only is false.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reported-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-11-22 16:43:52 +01:00
Max Reitz
9ad08c4456 block: Always abort reopen after prepare succeeded
bdrv_reopen_multiple() does not invoke bdrv_reopen_abort() for the
element of the reopen queue for which bdrv_reopen_prepare() failed,
because it assumes that the prepare function will have rolled back all
changes already.

However, bdrv_reopen_prepare() does not do this in every case: It may
notice an error after BlockDriver.bdrv_reopen_prepare() succeeded, and
it will not invoke BlockDriver.bdrv_reopen_abort() then; and neither
will bdrv_reopen_multiple(), as explained above.

This is wrong because we must always call .bdrv_reopen_commit() or
.bdrv_reopen_abort() after .bdrv_reopen_prepare() has succeeded.
Otherwise, the block driver has no chance to undo what it has done in
its implementation of .bdrv_reopen_prepare().

To fix this, bdrv_reopen_prepare() has to call .bdrv_reopen_abort() if
it wants to return an error after .bdrv_reopen_prepare() has succeeded.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-11-19 14:31:48 +01:00
Kevin Wolf
eaa2410f1e block: Require auto-read-only for existing fallbacks
Some block drivers have traditionally changed their node to read-only
mode without asking the user. This behaviour has been marked deprecated
since 2.11, expecting users to provide an explicit read-only=on option.

Now that we have auto-read-only=on, enable these drivers to make use of
the option.

This is the only use of bdrv_set_read_only(), so we can make it a bit
more specific and turn it into a bdrv_apply_auto_read_only() that is
more convenient for drivers to use.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2018-11-05 15:09:55 +01:00
Kevin Wolf
e35bdc123a block: Add auto-read-only option
If a management application builds the block graph node by node, the
protocol layer doesn't inherit its read-only option from the format
layer any more, so it must be set explicitly.

Backing files should work on read-only storage, but at the same time, a
block job like commit should be able to reopen them read-write if they
are on read-write storage. However, without option inheritance, reopen
only changes the read-only option for the root node (typically the
format layer), but not the protocol layer, so reopening fails (the
format layer wants to get write permissions, but the protocol layer is
still read-only).

A simple workaround for the problem in the management tool would be to
open the protocol layer always read-write and to make only the format
layer read-only for backing files. However, sometimes the file is
actually stored on read-only storage and we don't know whether the image
can be opened read-write (for example, for NBD it depends on the server
we're trying to connect to). This adds an option that makes QEMU try to
open the image read-write, but allows it to degrade to a read-only mode
without returning an error.

The documentation for this option is consciously phrased in a way that
allows QEMU to switch to a better model eventually: Instead of trying
when the image is first opened, making the read-only flag dynamic and
changing it automatically whenever the first BLK_PERM_WRITE user is
attached or the last one is detached would be much more useful
behaviour.

Unfortunately, this more useful behaviour is also a lot harder to
implement, and libvirt needs a solution now before it can switch to
-blockdev, so let's start with this easier approach for now.

Instead of adding a new auto-read-only option, turning the existing
read-only into an enum (with a bool alternate for compatibility) was
considered, but it complicated the implementation to the point that it
didn't seem to be worth it.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2018-11-05 15:09:55 +01:00
Kevin Wolf
eeae6a596b block: Update flags in bdrv_set_read_only()
To fully change the read-only state of a node, we must not only change
bs->read_only, but also update bs->open_flags.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
2018-11-05 15:09:55 +01:00
Alberto Garcia
415bbca86d block: replace "discard" literal with BDRV_OPT_DISCARD macro
Signed-off-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-11-05 15:09:54 +01:00
Vladimir Sementsov-Ogievskiy
9c98f145df dirty-bitmaps: clean-up bitmaps loading and migration logic
This patch aims to bring the following behavior:

1. We don't load bitmaps, when started in inactive mode. It's the case
of incoming migration. In this case we wait for bitmaps migration
through migration channel (if 'dirty-bitmaps' capability is enabled) or
for invalidation (to load bitmaps from the image).

2. We don't remove persistent bitmaps on inactivation. Instead, we only
remove bitmaps after storing. This is the only way to restore bitmaps,
if we decided to resume source after [failed] migration with
'dirty-bitmaps' capability enabled (which means, that bitmaps were not
stored).

3. We load bitmaps on open and any invalidation, it's ok for all cases:
  - normal open
  - migration target invalidation with dirty-bitmaps capability
    (bitmaps are migrating through migration channel, the are not
     stored, so they should have IN_USE flag set and will be skipped
     when loading. However, it would fail if bitmaps are read-only[1])
  - migration target invalidation without dirty-bitmaps capability
    (normal load of the bitmaps, if migrated with shared storage)
  - source invalidation with dirty-bitmaps capability
    (skip because IN_USE)
  - source invalidation without dirty-bitmaps capability
    (bitmaps were dropped, reload them)

[1]: to accurately handle this, migration of read-only bitmaps is
     explicitly forbidden in this patch.

New mechanism for not storing bitmaps when migrate with dirty-bitmaps
capability is introduced: migration filed in BdrvDirtyBitmap.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: John Snow <jsnow@redhat.com>
2018-10-29 16:23:17 -04:00
Markus Armbruster
da7e92cac9 block: Clean up bdrv_img_create()'s error reporting
bdrv_img_create() takes an Error ** argument and uses it in the
conventional way, except for one place: when qemu_opts_do_parse()
fails, it first reports its error to stderr or the HMP monitor with
error_report_err(), then error_setg()'s a generic error.

When the caller reports that second error similarly, this produces two
consecutive error messages on stderr or the HMP monitor.

When the caller does something else with it, such as send it via QMP,
the first error still goes to stderr or the HMP monitor.  Fortunately,
no such caller exists.

Simply use the first error as is.  Update expected output of
qemu-iotest 049 accordingly.

Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Max Reitz <mreitz@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20181017082702.5581-37-armbru@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
2018-10-19 14:51:34 +02:00
Markus Armbruster
4b5766488f error: Fix use of error_prepend() with &error_fatal, &error_abort
From include/qapi/error.h:

  * Pass an existing error to the caller with the message modified:
  *     error_propagate(errp, err);
  *     error_prepend(errp, "Could not frobnicate '%s': ", name);

Fei Li pointed out that doing error_propagate() first doesn't work
well when @errp is &error_fatal or &error_abort: the error_prepend()
is never reached.

Since I doubt fixing the documentation will stop people from getting
it wrong, introduce error_propagate_prepend(), in the hope that it
lures people away from using its constituents in the wrong order.
Update the instructions in error.h accordingly.

Convert existing error_prepend() next to error_propagate to
error_propagate_prepend().  If any of these get reached with
&error_fatal or &error_abort, the error messages improve.  I didn't
check whether that's the case anywhere.

Cc: Fei Li <fli@suse.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20181017082702.5581-2-armbru@redhat.com>
2018-10-19 14:51:34 +02:00
Alberto Garcia
543770bd2e block: Allow changing 'detect-zeroes' on reopen
'detect-zeroes' is one of the basic BlockdevOptions available for all
drivers, but it's not handled by bdrv_reopen_prepare(), so any attempt
to change it results in an error:

   (qemu) qemu-io virtio0 "reopen -o detect-zeroes=on"
   Cannot change the option 'detect-zeroes'

Since there's no reason why we shouldn't allow changing it and the
implementation is simple let's just do it.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-10-01 12:51:12 +02:00
Alberto Garcia
593b307197 block: Allow changing 'discard' on reopen
'discard' is one of the basic BlockdevOptions available for all
drivers, but it's not handled by bdrv_reopen_prepare() so any attempt
to change it results in an error:

   (qemu) qemu-io virtio0 "reopen -o discard=on"
   Cannot change the option 'discard'

Since there's no reason why we shouldn't allow changing it and the
implementation is simple let's just do it.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-10-01 12:51:12 +02:00
Alberto Garcia
57f9db9a94 block: Forbid trying to change unsupported options during reopen
The bdrv_reopen_prepare() function checks all options passed to each
BlockDriverState (in the reopen_state->options QDict) and makes all
necessary preparations to apply the option changes requested by the
user.

Options are removed from the QDict as they are processed, so at the
end of bdrv_reopen_prepare() only the options that can't be changed
are left. Then a loop goes over all remaining options and verifies
that the old and new values are identical, returning an error if
they're not.

The problem is that at the moment there are options that are removed
from the QDict although they can't be changed. The consequence of this
is any modification to any of those options is silently ignored:

   (qemu) qemu-io virtio0 "reopen -o discard=on"

This happens when all options from bdrv_runtime_opts are removed
from the QDict but then only a few of them are processed. Since
it's especially important that "node-name" and "driver" are not
changed, the code puts them back into the QDict so they are checked
at the end of the function. Instead of putting only those two options
back into the QDict, this patch puts all unprocessed options using
qemu_opts_to_qdict().

update_flags_from_options() also needs to be modified to prevent
BDRV_OPT_CACHE_NO_FLUSH, BDRV_OPT_CACHE_DIRECT and BDRV_OPT_READ_ONLY
from going back to the QDict.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-10-01 12:51:12 +02:00
Alberto Garcia
db905283b8 block: Allow child references on reopen
In the previous patches we removed all child references from
bs->{options,explicit_options} because keeping them is useless and
wrong.

Because of this, any attempt to reopen a BlockDriverState using a
child reference as one of its options would result in a failure,
because bdrv_reopen_prepare() would detect that there's a new option
(the child reference) that wasn't present in bs->options.

But passing child references on reopen can be useful. It's a way to
specify a BDS's child without having to pass recursively all of the
child's options, and if the reference points to a different BDS then
this can allow us to replace the child.

However, replacing the child is something that needs to be implemented
case by case and only when it makes sense. For now, this patch allows
passing a child reference as long as it points to the current child of
the BlockDriverState.

It's also important to remember that, as a consequence of the
previous patches, this child reference will be removed from
bs->{options,explicit_options} after the reopening has been completed.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-10-01 12:51:12 +02:00
Alberto Garcia
a600aaddc3 block: Don't look for child references in append_open_options()
In the previous patch we removed child references from bs->options, so
there's no need to look for them here anymore.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-10-01 12:51:12 +02:00
Alberto Garcia
50196d7a7c block: Remove child references from bs->{options,explicit_options}
Block drivers allow opening their children using a reference to an
existing BlockDriverState. These references remain stored in the
'options' and 'explicit_options' QDicts, but we don't need to keep
them once everything is open.

What is more important, these values can become wrong if the children
change:

    $ qemu-img create -f qcow2 hd0.qcow2 10M
    $ qemu-img create -f qcow2 hd1.qcow2 10M
    $ qemu-img create -f qcow2 hd2.qcow2 10M
    $ $QEMU -drive if=none,file=hd0.qcow2,node-name=hd0 \
            -drive if=none,file=hd1.qcow2,node-name=hd1,backing=hd0 \
            -drive file=hd2.qcow2,node-name=hd2,backing=hd1

After this hd2 has hd1 as its backing file. Now let's remove it using
block_stream:

    (qemu) block_stream hd2 0 hd0.qcow2

Now hd0 is the backing file of hd2, but hd2's options QDicts still
contain backing=hd1.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-10-01 12:51:12 +02:00
Kevin Wolf
cfe29d8294 block: Use a single global AioWait
When draining a block node, we recurse to its parent and for subtree
drains also to its children. A single AIO_WAIT_WHILE() is then used to
wait for bdrv_drain_poll() to become true, which depends on all of the
nodes we recursed to. However, if the respective child or parent becomes
quiescent and calls bdrv_wakeup(), only the AioWait of the child/parent
is checked, while AIO_WAIT_WHILE() depends on the AioWait of the
original node.

Fix this by using a single AioWait for all callers of AIO_WAIT_WHILE().

This may mean that the draining thread gets a few more unnecessary
wakeups because an unrelated operation got completed, but we already
wake it up when something _could_ have changed rather than only if it
has certainly changed.

Apart from that, drain is a slow path anyway. In theory it would be
possible to use wakeups more selectively and still correctly, but the
gains are likely not worth the additional complexity. In fact, this
patch is a nice simplification for some places in the code.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2018-09-25 15:50:15 +02:00
Alberto Garcia
8961be33e8 block: Fix use after free error in bdrv_open_inherit()
When a block device is opened with BDRV_O_SNAPSHOT and the
bdrv_append_temp_snapshot() call fails then the error code path tries
to unref the already destroyed 'options' QDict.

This can be reproduced easily by setting TMPDIR to a location where
the QEMU process can't write:

   $ TMPDIR=/nonexistent $QEMU -drive driver=null-co,snapshot=on

Signed-off-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-09-25 15:50:15 +02:00
Markus Armbruster
dd98e84819 qjson: Have qobject_from_json() & friends reject empty and blank
The last case where qobject_from_json() & friends return null without
setting an error is empty or blank input.  Callers:

* block.c's parse_json_protocol() reports "Could not parse the JSON
  options".  It's marked as a work-around, because it also covered
  actual bugs, but they got fixed in the previous few commits.

* qobject_input_visitor_new_str() reports "JSON parse error".  Also
  marked as work-around.  The recent fixes have made this unreachable,
  because it currently gets called only for input starting with '{'.

* check-qjson.c's empty_input() and blank_input() demonstrate the
  behavior.

* The other callers are not affected since they only pass input with
  exactly one JSON value or, in the case of negative tests, one error.

Fail with "Expecting a JSON value" instead of returning null, and
simplify callers.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-48-armbru@redhat.com>
2018-08-24 20:26:37 +02:00
Alberto Garcia
261dbcb18f block: Simplify append_open_options()
This function returns a BDS's driver-specific options, excluding also
those from its children. Since we have just removed all children
options from bs->options there's no need to do this last step.

We allow references to children, though ("backing": "node0"), so those
we still have to remove.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-08-15 12:50:39 +02:00
Alberto Garcia
4c8350fe17 block: Update bs->options if bdrv_reopen() succeeds
If bdrv_reopen() succeeds then bs->explicit_options is updated with
the new values, but bs->options never changes.

Here's an example:

   { "execute": "blockdev-add",
     "arguments": {
       "driver": "qcow2",
       "node-name": "hd0",
       "overlap-check": "all",
       "file": {
         "driver": "file",
         "filename": "hd0.qcow2"
       }
     }
   }

After this, both bs->options and bs->explicit_options contain
"overlap-check": "all".

Now let's change that using qemu-io's reopen command:

   (qemu) qemu-io hd0 "reopen -o overlap-check=none"

After this, bs->explicit_options contains the new value but
bs->options still keeps the old one.

This patch updates bs->options after a BDS has been successfully
reopened.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-08-15 12:50:39 +02:00
Alberto Garcia
1bab38e7bd block: Simplify bdrv_reopen_abort()
If a bdrv_reopen_multiple() call fails, then the explicit_options
QDict has to be deleted for every entry in the reopen queue. This must
happen regardless of whether that entry's bdrv_reopen_prepare() call
succeeded or not.

This patch simplifies the cleanup code a bit.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-08-15 12:50:39 +02:00
Alberto Garcia
2f624b80ba block: Remove children options from bs->{options,explicit_options}
When bdrv_open_inherit() opens a BlockDriverState the options QDict
can contain options for some of its children, passed in the form of
child-name.option=value

So while each child is opened with that subset of options, those same
options remain stored in the parent BDS, leaving (at least) two copies
of each one of them ("child-name.option=value" in the parent and
"option=value" in the child).

Having the children options stored in the parent is unnecessary and it
can easily lead to an inconsistent state:

  $ qemu-img create -f qcow2 hd0.qcow2 10M
  $ qemu-img create -f qcow2 -b hd0.qcow2 hd1.qcow2
  $ qemu-img create -f qcow2 -b hd1.qcow2 hd2.qcow2

  $ $QEMU -drive file=hd2.qcow2,node-name=hd2,backing.node-name=hd1

This opens a chain of images hd0 <- hd1 <- hd2. Now let's remove hd1
using block_stream:

  (qemu) block_stream hd2 0 hd0.qcow2

After this hd2 contains backing.node-name=hd1, which is no longer
correct because hd1 doesn't exist anymore.

This patch removes all children options from the parent dictionaries
at the end of bdrv_open_inherit() and bdrv_reopen_queue_child().

Signed-off-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-08-15 12:50:39 +02:00
Vladimir Sementsov-Ogievskiy
3c005293c2 block: make .bdrv_close optional
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-08-15 12:50:39 +02:00
Stefan Weil
50d6a8a352 block: Fix typos in comments (found by codespell)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-07-23 16:50:43 +02:00
Kevin Wolf
4be6a6d118 block: Poll after drain on attaching a node
Commit dcf94a23b1 ('block: Don't poll in parent drain callbacks')
removed polling in bdrv_child_cb_drained_begin() on the grounds that the
original bdrv_drain() already will poll and BdrvChildRole.drained_begin
calls must not cause graph changes (and therefore must not call
aio_poll() or the recursion through the graph will break.

This reasoning is correct for calls through bdrv_do_drained_begin().
However, BdrvChildRole.drained_begin is also called when a node that is
already in a drained section (i.e. bdrv_do_drained_begin() has already
returned and therefore can't poll any more) is attached to a new parent.
In this case, we must explicitly poll to have all requests completed
before the drained new child can be attached to the parent.

In bdrv_replace_child_noperm(), we know that we're not inside the
recursion of bdrv_do_drained_begin() because graph changes are not
allowed there, and bdrv_replace_child_noperm() is a graph change. The
call of BdrvChildRole.drained_begin() must therefore be followed by a
BDRV_POLL_WHILE() that waits for the completion of requests.

Reported-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-07-10 10:36:15 +02:00