Commit Graph

1181 Commits

Author SHA1 Message Date
Eric Blake
79ba8c986a block: Set default request_alignment during bdrv_refresh_limits()
We want to eventually stick request_alignment alongside other
BlockLimits, but first, we must ensure it is populated at the
same time as all other limits, rather than being a special case
that is set only when a block is first opened.

Now that all drivers have been updated to supply an override
of request_alignment during their .bdrv_refresh_limits(), as
needed, the block layer itself can defer setting the default
alignment until part of the overall bdrv_refresh_limits().

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-07-05 16:46:25 +02:00
Peter Maydell
7fa124b273 Error reporting patches for 2016-06-20
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIbBAABAgAGBQJXaAQPAAoJEDhwtADrkYZTbPMP+OmXlf1IwcUONhaHqD0SXAdC
 Q5NNHheNHbIdzrsOmgj2GAxCeGG08REPrdYLb8N9lI3Fev8ggQ0v3d9qiStL6Nvz
 2B7DeTxXba9Qmc2gY86RR4f7T6VK+8k1Kj9f3YcAoXHucdjDLRmtUozwES3Kk8mn
 oeBqd07yuTAO93xQhF6BDv3mfagPZV+kp15Se04ufWPpT9eG3Cp9Pl4/Yad/wJUr
 2B4pIvYnhISVlZIOvpuzdydBaCb/U0YefjWBDkawLwRBtM/kpNKoDdqr2UA0yChQ
 Xulgpt+UQLZ4aOYGLkzqBT7v3kDeEnYUFDdE03xZBLbHddOU05NEhLgzRVUEAdUS
 EAKWvWu692USDh9a2r0AYLqZW6J0+Fyi4fZ34gSuP69oi74CIvjJHDAEwsSdt6bm
 WHCchvfIk9kinzGiFA3hkAbVWAbI0EpB0J7k3mgCVVVelBpF9b9U4M2ydx9ZPufX
 t4ULXxQO9Qyxr7r0uiznugQAjIRwLMDOPPd1F5Vi+LX0wF3Pcrtc/F+3ehoLBjBD
 6zZA5lZVlUEISdzzytfjeX3ch5vRFwlujrTd2anxRuduuBV1/ztvE1v0Nsiii6Sw
 15BaHQJvBnMnAIO5LfhnAaVpIBk0jpQpwVyRcNUpeZ1VU2qlqVS4zxYZ0FDQgyer
 IZa1qYJ0Sz+oMkq9RXU=
 =TptL
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/armbru/tags/pull-error-2016-06-20' into staging

Error reporting patches for 2016-06-20

# gpg: Signature made Mon 20 Jun 2016 15:56:15 BST
# gpg:                using RSA key 0x3870B400EB918653
# gpg: Good signature from "Markus Armbruster <armbru@redhat.com>"
# gpg:                 aka "Markus Armbruster <armbru@pond.sub.org>"
# Primary key fingerprint: 354B C8B3 D7EB 2A6B 6867  4E5F 3870 B400 EB91 8653

* remotes/armbru/tags/pull-error-2016-06-20:
  log: Fix qemu_set_log_filename() error handling
  log: Fix qemu_set_dfilter_ranges() error reporting
  log: Plug memory leak on multiple -dfilter
  coccinelle: Remove unnecessary variables for function return value
  error: Remove unnecessary local_err variables
  error: Remove NULL checks on error_propagate() calls
  vl: Error messages need to go to stderr, fix some

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-06-20 16:19:18 +01:00
Eduardo Habkost
621ff94d50 error: Remove NULL checks on error_propagate() calls
error_propagate() already ignores local_err==NULL, so there's no
need to check it before calling.

Coccinelle patch used to perform the changes added to
scripts/coccinelle/error_propagate_null.cocci.

Reviewed-by: Eric Blake <eblake@redhat.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <1465855078-19435-2-git-send-email-ehabkost@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2016-06-20 16:38:13 +02:00
Stefan Hajnoczi
e8a095dadb block: use safe iteration over AioContext notifiers
It's possible that an AioContext notifier user was close to finishing
when .detach_aio_context() or .attached_aio_context() is called.  In
that case they may call bdrv_remove_aio_context_notifier() during the
callback.

Use safe iteration to avoid crashing when the notifier list is modified
during iteration.  We must not only handle the case where the current
aio notifier is removed during a callback but also the one where any
other aio notifier is removed.

The next patch adds an AioContext notifier for block jobs and they
really could be terminating just as .detach_aio_context() is invoked.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 1466096189-6477-6-git-send-email-stefanha@redhat.com
2016-06-20 14:25:41 +01:00
Max Reitz
274fccee2b block/mirror: Fix target backing BDS
Currently, we are trying to move the backing BDS from the source to the
target in bdrv_replace_in_backing_chain() which is called from
mirror_exit(). However, mirror_complete() already tries to open the
target's backing chain with a call to bdrv_open_backing_file().

First, we should only set the target's backing BDS once. Second, the
mirroring block job has a better idea of what to set it to than the
generic code in bdrv_replace_in_backing_chain() (in fact, the latter's
conditions on when to move the backing BDS from source to target are not
really correct).

Therefore, remove that code from bdrv_replace_in_backing_chain() and
leave it to mirror_complete().

Depending on what kind of mirroring is performed, we furthermore want to
use different strategies to open the target's backing chain:

- If blockdev-mirror is used, we can assume the user made sure that the
  target already has the correct backing chain. In particular, we should
  not try to open a backing file if the target does not have any yet.

- If drive-mirror with mode=absolute-paths is used, we can and should
  reuse the already existing chain of nodes that the source BDS is in.
  In case of sync=full, no backing BDS is required; with sync=top, we
  just link the source's backing BDS to the target, and with sync=none,
  we use the source BDS as the target's backing BDS.
  We should not try to open these backing files anew because this would
  lead to two BDSs existing per physical file in the backing chain, and
  we would like to avoid such concurrent access.

- If drive-mirror with mode=existing is used, we have to use the
  information provided in the physical image file which means opening
  the target's backing chain completely anew, just as it has been done
  already.
  If the target's backing chain shares images with the source, this may
  lead to multiple BDSs per physical image file. But since we cannot
  reliably ascertain this case, there is nothing we can do about it.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20160610185750.30956-3-mreitz@redhat.com
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-06-16 15:20:37 +02:00
Max Reitz
9bd910e2cb block: Allow replacement of a BDS by its overlay
change_parent_backing_link() asserts that the BDS to be replaced is not
used as a backing file. However, we may want to replace a BDS by its
overlay in which case that very link should not be redirected.

For instance, when doing a sync=none drive-mirror operation, we may have
the following BDS/BB forest before block job completion:

  target

  base <- source <- BlockBackend

During job completion, we want to establish the source BDS as the
target's backing node:

          target
            |
            v
  base <- source <- BlockBackend

This makes the target a valid replacement for the source:

          target <- BlockBackend
            |
            v
  base <- source

Without this modification to change_parent_backing_link() we have to
inject the target into the graph before the source is its backing node,
thus temporarily creating a wrong graph:

  target <- BlockBackend

  base <- source

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20160610185750.30956-2-mreitz@redhat.com
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-06-16 15:20:37 +02:00
Kevin Wolf
418690447a block: Fix snapshot=on with aio=native
snapshot=on creates a temporary overlay that is always opened with
cache=unsafe (the cache mode specified by the user is only for the
actual image file and its children). This means that we must not inherit
the BDRV_O_NATIVE_AIO flag for the temporary overlay because trying to
use Linux AIO with cache=unsafe results in an error.

Reproducer without this patch:

$ x86_64-softmmu/qemu-system-x86_64 -drive file=/tmp/test.qcow2,cache=none,aio=native,snapshot=on
qemu-system-x86_64: -drive file=/tmp/test.qcow2,cache=none,aio=native,snapshot=on: aio=native was
specified, but it requires cache.direct=on, which was not specified.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-06-16 15:19:56 +02:00
Kevin Wolf
c9d20029f4 block: Remove bs->zero_beyond_eof
It is always true for open images now.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-06-16 15:19:56 +02:00
Kevin Wolf
23b0d9fb1d block: Don't enforce 512 byte minimum alignment
If block drivers say that they can do an alignment < 512 bytes, let's
just suppose they mean it. raw-posix used to be an offender with respect
to this, but it can actually deal with byte-aligned requests now.

The default is still 512 bytes for any drivers that only implement
sector-based interfaces, but it is 1 now for drivers that implement
.bdrv_co_preadv.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-06-16 15:19:55 +02:00
Peter Lieven
107d433cbb block: assert that bs->request_alignment is a power of 2
at least bdrv_co_preadv/pwritev expect this.

Signed-off-by: Peter Lieven <pl@kamp.de>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-06-08 10:21:09 +02:00
Kevin Wolf
a1a2af0756 block: Cancel jobs first in bdrv_close_all()
So far, bdrv_close_all() first removed all root BlockDriverStates of
BlockBackends and monitor owned BDSes, and then assumed that the
remaining BDSes must be related to jobs and cancelled these jobs.

This order doesn't work that well any more when block jobs use
BlockBackends internally because then they will lose their BDS before
being cancelled.

This patch changes bdrv_close_all() to first cancel all jobs and then
remove all root BDSes from the remaining BBs.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-05-25 19:04:21 +02:00
Kevin Wolf
20018e12cf block: Propagate .drained_begin/end callbacks
When draining intermediate nodes (i.e. nodes that aren't the root node
for at least one of their parents; with node references, the user can
always configure the graph to create this situation), we need to
propagate the .drained_begin/end callbacks all the way up to the root
for the drain to be effective.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
2016-05-25 19:04:11 +02:00
Kevin Wolf
36fe13317b block: Fix reconfiguring graph with drained nodes
When changing the BlockDriverState that a BdrvChild points to while the
node is currently drained, we must call the .drained_end() parent
callback. Conversely, when this means attaching a new node that is
already drained, we need to call .drained_begin().

bdrv_root_attach_child() takes now an opaque parameter, which is needed
because the callbacks must also be called if we're attaching a new child
to the BlockBackend when the root node is already drained, and they need
a way to identify the BlockBackend. Previously, child->opaque was set
too late and the callbacks would still see it as NULL.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
2016-05-25 19:04:10 +02:00
Kevin Wolf
e9740bc6d4 block: Introduce bdrv_replace_child()
This adds a common function that is called when attaching a new child to
a parent, removing a child from a parent and when reconfiguring the
graph so that an existing child points to a different node now.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
2016-05-25 19:04:10 +02:00
Max Reitz
6b574e09b3 block: Drop bdrv_parent_cb_...() from bdrv_close()
bdrv_close() now asserts that the BDS's refcount is 0, therefore it
cannot have any parents and the bdrv_parent_cb_change_media() call is a
no-op.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-05-25 19:04:10 +02:00
Max Reitz
30f55fb81f block: Assert !bs->refcnt in bdrv_close()
The only caller of bdrv_close() left is bdrv_delete(). We may as well
assert that, in a way (there are some things in bdrv_close() that make
more sense under that assumption, such as the call to
bdrv_release_all_dirty_bitmaps() which in turn assumes that no frozen
bitmaps are attached to the BDS).

In addition, being called only in bdrv_delete() means that we can drop
bdrv_close()'s forward declaration at the top of block.c.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-05-25 19:04:10 +02:00
Max Reitz
5b3639371c block: Make bdrv_open() return a BDS
There are no callers to bdrv_open() or bdrv_open_inherit() left that
pass a pointer to a non-NULL BDS pointer as the first argument of these
functions, so we can finally drop that parameter and just make them
return the new BDS.

Generally, the following pattern is applied:

    bs = NULL;
    ret = bdrv_open(&bs, ..., &local_err);
    if (ret < 0) {
        error_propagate(errp, local_err);
        ...
    }

by

    bs = bdrv_open(..., errp);
    if (!bs) {
        ret = -EINVAL;
        ...
    }

Of course, there are only a few instances where the pattern is really
pure.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-05-25 19:04:10 +02:00
Max Reitz
9bddf75979 block: Drop bdrv_new_root()
It is unused now, so we may just as well drop it.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-05-25 19:04:10 +02:00
Max Reitz
668361898e block: Let bdrv_open_inherit() return the snapshot
If bdrv_open_inherit() creates a snapshot BDS and *pbs is NULL, that
snapshot BDS should be returned instead of the BDS under it.

This has worked so far because (nearly) all users of BDRV_O_SNAPSHOT use
blk_new_open() to create the BDS tree. bdrv_append() (which is called by
bdrv_append_temp_snapshot()) redirects pointers from parents (i.e. the
BB in this case) to the newly appended child (i.e. the overlay),
therefore, while bdrv_open_inherit() did not return the root BDS, the BB
still pointed to it.

The only instance where BDRV_O_SNAPSHOT is used but blk_new_open() is
not is in blockdev_init() if no BDS tree is created, and instead
blk_new() is used and the flags are stored in the BB root state.
However, qmp_blockdev_change_medium() filters the BDRV_O_SNAPSHOT flag
before invoking bdrv_open(), so it will not have any effect.

In any case, it would be nicer if bdrv_open_inherit() could just always
return the root of the BDS tree that has been created.

To this end, bdrv_append_temp_snapshot() now returns the snapshot BDS
instead of just appending it on top of the snapshotted BDS. Also, it
calls bdrv_ref() before bdrv_append() (which bdrv_open_inherit() has to
undo if not returning the overlay).

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-05-25 19:04:10 +02:00
Max Reitz
506f8709ce block: Drop useless bdrv_new() call
bdrv_append_temp_snapshot() uses bdrv_new() to create an empty BDS
before invoking bdrv_open() on that BDS. This is probably a relict from
when it used to do some modifications on that empty BDS, but now that is
unnecessary, so we can just set bs_snapshot to NULL and let bdrv_open()
do the rest.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-05-25 19:04:10 +02:00
Kevin Wolf
88be7b4be4 block: Fix bdrv_next() memory leak
The bdrv_next() users all leaked the BdrvNextIterator after completing
the iteration. Simply changing bdrv_next() to free the iterator before
returning NULL at the end of list doesn't work because some callers exit
the loop before looking at all BDSes.

This patch moves the BdrvNextIterator from the heap to the stack of
the caller and switches to a bdrv_first()/bdrv_next() interface for
initialising the iterator.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
2016-05-25 19:04:10 +02:00
Max Reitz
b97511c7bc block: Propagate AioContext change to all children
Instead of propagating any change of a BDS's AioContext only to its file
and backing children and letting driver-specific code do the rest, just
propagate it to all and drop the thus superfluous implementations of
bdrv_{at,de}tach_aio_context() in Quorum, blkverify and VMDK.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-05-19 16:45:31 +02:00
Kevin Wolf
1f0c461b82 block: Remove BlockDriverState.blk
This patch removes the remaining users of bs->blk, which will allow us
to have multiple BBs on top of a single BDS. In the meantime, all checks
that are currently in place to prevent the user from creating such
setups can be switched to bdrv_has_blk() instead of accessing BDS.blk.

Future patches can allow them and e.g. enable users to mirror to a block
device that already has a BlockBackend on it.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-05-19 16:45:31 +02:00
Kevin Wolf
7c8eece45b block: Avoid bs->blk in bdrv_next()
We need to introduce a separate BdrvNextIterator struct that can keep
more state than just the current BDS in order to avoid using the bs->blk
pointer.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-05-19 16:45:31 +02:00
Kevin Wolf
4c265bf9f4 block: User BdrvChild callback for device name
In order to get rid of bs->blk for bdrv_get_device_name() and
bdrv_get_device_or_node_name(), ask all parents for their name and
simply pick the first one.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-05-19 16:45:31 +02:00
Kevin Wolf
5c8cab4808 block: Use BdrvChild callbacks for change_media/resize
We want to get rid of BlockDriverState.blk in order to allow multiple
BlockBackends per BDS. Converting the device callbacks in block.c (which
assume a single BlockBackend) to per-child callbacks gets us rid of the
first few instances.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-05-19 16:45:31 +02:00
Kevin Wolf
b26ded9a7d Revert "block: Forbid I/O throttling on nodes with multiple parents for 2.6"
This reverts commit 76b223200e.

Now that I/O throttling is fully done on the BlockBackend level, there
is no reason any more to block I/O throttling for nodes with multiple
parents as the parents don't influence each other any more.

Conflicts:
	block.c

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-05-19 16:45:31 +02:00
Kevin Wolf
08e83aabe4 block: Remove bdrv_move_feature_fields()
bdrv_move_feature_fields() and swap_feature_fields() are empty now, they
can be removed.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-05-19 16:45:31 +02:00
Kevin Wolf
7ca7f0f6db block: Decouple throttling from BlockDriverState
This moves the throttling related part of the BDS life cycle management
to BlockBackend. The throttling group reference is now kept even when no
medium is inserted.

With this commit, throttling isn't disabled and then re-enabled any more
during graph reconfiguration. This fixes the temporary breakage of I/O
throttling when used with live snapshots or block jobs that manipulate
the graph.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-05-19 16:45:30 +02:00
Kevin Wolf
97148076e8 block: Move I/O throttling configuration functions to BlockBackend
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-05-19 16:45:30 +02:00
Kevin Wolf
27ccdd5259 block: Move throttling fields from BDS to BB
This patch changes where the throttling state is stored (used to be the
BlockDriverState, now it is the BlockBackend), but it doesn't actually
make it a BB level feature yet. For example, throttling is still
disabled when the BDS is detached from the BB.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-05-19 16:45:30 +02:00
Kevin Wolf
a5614993d7 block: Make sure throttled BDSes always have a BB
It was already true in principle that a throttled BDS always has a BB
attached, except that the order of operations while attaching or
detaching a BDS to/from a BB wasn't careful enough.

This commit breaks graph manipulations while I/O throttling is enabled.
It would have been possible to keep things working with some temporary
hacks, but quite cumbersome, so it's not worth the hassle. We'll fix
things again in a minute.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-05-19 16:45:29 +02:00
Wen Congyang
98292c61bc quorum: implement bdrv_add_child() and bdrv_del_child()
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Message-id: 1462865799-19402-3-git-send-email-xiecl.fnst@cn.fujitsu.com
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-05-12 15:33:23 +02:00
Wen Congyang
e06018ad28 Add new block driver interface to add/delete a BDS's child
In some cases, we want to take a quorum child offline, and take
another child online.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-id: 1462865799-19402-2-git-send-email-xiecl.fnst@cn.fujitsu.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-05-12 15:33:23 +02:00
Fam Zheng
aad0b7a0bf block: Inactivate all children
Currently we only inactivate the top BDS. Actually bdrv_inactivate
should be the opposite of bdrv_invalidate_cache.

Recurse into the whole subtree instead.

Because a node may have multiple parents, and because once
BDRV_O_INACTIVE is set for a node, further writes are not allowed, we
cannot interleave flag settings and .bdrv_inactivate calls (that may
submit write to other nodes in a graph) within a single pass. Therefore
two passes are used here.

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-05-12 15:22:09 +02:00
Fam Zheng
0d1c5c9160 block: Invalidate all children
Currently we only recurse to bs->file, which will miss the children in quorum
and VMDK.

Recurse into the whole subtree to avoid that.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-05-12 15:22:09 +02:00
Kevin Wolf
e3ddef25e9 block: Remove BlockDriver.bdrv_read/write
There are no block drivers left that implement the old .bdrv_read/write
interface, so it can be removed now. This gets us rid of the
corresponding emulation functions, too.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
2016-05-12 15:22:08 +02:00
Paolo Bonzini
ce0f141259 block: introduce bdrv_no_throttling_begin/end
Extract the handling of throttling from bdrv_flush_io_queue.  These
new functions will soon become BdrvChildRole callbacks, as they can
be generalized to "beginning of drain" and "end of drain".

Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-05-12 15:22:07 +02:00
Kevin Wolf
76b223200e block: Forbid I/O throttling on nodes with multiple parents for 2.6
As the patches to move I/O throttling to BlockBackend didn't make it in
time for the 2.6 release, but the release adds new ways of configuring
VMs whose behaviour would change once the move is done, we need to
outlaw such configurations temporarily.

The problem exists whenever a BDS has more users than just its BB, for
example it is used as a backing file for another node. (This wasn't
possible in 2.5 yet as we introduced node references to specify a
backing file only recently.) In these cases, the throttling would
apply to these other users now, but after moving throttling to the
BlockBackend the other users wouldn't be throttled any more.

This patch prevents making new references to a throttled node as well as
using monitor commands to throttle a node with multiple parents.

Compared to 2.5 this changes behaviour in some corner cases where
references were allowed before, like bs->file or Quorum children. It
seems reasonable to assume that users didn't use I/O throttling on such
low level nodes. With the upcoming move of throttling into BlockBackend,
such configurations won't be possible anyway.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2016-04-05 09:22:28 +02:00
Kevin Wolf
09cf9db1bc block: Remove bdrv_(set_)enable_write_cache()
The only remaining users were block jobs (mirror and backup) which
unconditionally enabled WCE on the BlockBackend of the target image. As
these block jobs don't go through BlockBackend for their I/O requests,
they aren't affected by this setting anyway but always get a writeback
mode, so that call can be removed.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-03-30 12:16:03 +02:00
Kevin Wolf
61de4c6808 block: Remove BDRV_O_CACHE_WB
The previous patches have successively made blk->enable_write_cache the
true source for the information whether a writethrough mode must be
implemented. The corresponding BDRV_O_CACHE_WB is only useless baggage
we're carrying around, so now's the time to remove it.

At the same time, we remove the 'cache.writeback' option parsing on the
BDS level as the only effect was setting the BDRV_O_CACHE_WB flag.

This change requires test cases that explicitly enabled the option to
drop it. Other than that and the change of the error message when
writethrough is enabled on the BDS level (from "Can't set writethrough
mode" to "doesn't support the option"), there should be no change in
behaviour.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-03-30 12:16:03 +02:00
Kevin Wolf
53e8ae0100 block: Remove bdrv_parse_cache_flags()
All users are converted to bdrv_parse_cache_mode() now.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-03-30 12:16:03 +02:00
Kevin Wolf
19dbecdcee qemu-io: Use bdrv_parse_cache_mode() in reopen_f()
We must forbid changing the WCE flag in bdrv_reopen() in the same patch,
as otherwise the behaviour would change so that the flag takes
precedence over the explicitly specified option.

The correct value of the WCE flag depends on the BlockBackend user (e.g.
guest device) and isn't a decision that the QMP client makes, so this
change is what we want.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-03-30 12:16:03 +02:00
Kevin Wolf
c83f9fba2a block/qapi: Use blk_enable_write_cache()
Now that WCE is handled on the BlockBackend level, the flag is
meaningless for BDSes. As the schema requires us to fill the field,
we return an enabled write cache for them.

Note that this means that querying the BlockBackend name may return
writethrough as the cache information, whereas querying the node-name of
the root of that same BlockBackend will return writeback.

This may appear odd at first, but it actually makes sense because it
correctly repesents the layer that implements the WCE handling. This
becomes more apparent when you consider nodes that are the root node of
multiple BlockBackends, where each BB can have its own WCE setting.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-03-30 12:16:02 +02:00
Kevin Wolf
bfd18d1e0b block: Move enable_write_cache to BB level
Whether a write cache is used or not is a decision that concerns the
user (e.g. the guest device) rather than the backend. It was already
logically part of the BB level as bdrv_move_feature_fields() always kept
it on top of the BDS tree; with this patch, the core of it (the actual
flag and the additional flushes) is also implemented there.

Direct callers of bdrv_open() must pass BDRV_O_CACHE_WB now if bs
doesn't have a BlockBackend attached.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-03-30 12:16:02 +02:00
Kevin Wolf
baf5602ed9 block: Add bdrv_parse_cache_mode()
It's like bdrv_parse_cache_flags(), except that writethrough mode isn't
included in the flags, but returned as a separate bool.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-03-30 12:16:00 +02:00
Daniel P. Berrange
e6ff69bf5e block: move encryption deprecation warning into qcow code
For a couple of releases we have been warning

  Encrypted images are deprecated
  Support for them will be removed in a future release.
  You can use 'qemu-img convert' to convert your image to an unencrypted one.

This warning was issued by system emulators, qemu-img, qemu-nbd
and qemu-io. Such a broad warning was issued because the original
intention was to rip out all the code for dealing with encryption
inside the QEMU block layer APIs.

The new block encryption framework used for the LUKS driver does
not rely on the unloved block layer API for encryption keys,
instead using the QOM 'secret' object type. It is thus no longer
appropriate to warn about encryption unconditionally.

When the qcow/qcow2 drivers are converted to use the new encryption
framework too, it will be practical to keep AES-CBC support present
for use in qemu-img, qemu-io & qemu-nbd to allow for interoperability
with older QEMU versions and liberation of data from existing encrypted
qcow2 files.

This change moves the warning out of the generic block code and
into the qcow/qcow2 drivers. Further, the warning is set to only
appear when running the system emulators, since qemu-img, qemu-io,
qemu-nbd are expected to support qcow2 encryption long term now that
the maint burden has been eliminated.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-30 12:12:15 +02:00
Daniel P. Berrange
abb06c5ac1 block: add flag to indicate that no I/O will be performed
When opening an image it is useful to know whether the caller
intends to perform I/O on the image or not. In the case of
encrypted images this will allow the block driver to avoid
having to prompt for decryption keys when we merely want to
query header metadata about the image. eg qemu-img info

This flag is enforced at the top level only, since even if
we don't want todo I/O on the 'qcow2' file payload, the
underlying 'file' driver will still need todo I/O to read
the qcow2 header, for example.

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-30 11:59:32 +02:00
Kevin Wolf
73ac451f34 block: Reject writethrough mode except at the root
Writethrough mode is going to become a BlockBackend feature rather than
a BDS one, so forbid it in places where we won't be able to support it
when the code finally matches the envisioned design.

We only allowed setting the cache mode of non-root nodes after the 2.5
release, so we're still free to make this change.

The target of block jobs is now always opened in a writeback mode
because it doesn't have a BlockBackend attached. This makes more sense
anyway because block jobs know when to flush. If the graph is modified
on job completion, the original cache mode moves to the new root, so
for the guest device writethough always stays enabled if it was
configured this way.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2016-03-30 11:59:32 +02:00
Kevin Wolf
b8816a4386 block: Make backing files always writeback
First of all, we're generally not writing to backing files, but when we
do, it's in the context of block jobs which know very well when to flush
the image.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2016-03-30 11:59:32 +02:00
Kevin Wolf
7a827aaec8 block: Remove dirty bitmaps from bdrv_move_feature_fields()
This patch changes dirty bitmaps from following a BlockBackend in graph
changes to sticking with the node they were created at. For the full
discussion, read the following mailing list thread:

  [Qemu-block] block: Dirty bitmaps and COR in bdrv_move_feature_fields()
  https://lists.nongnu.org/archive/html/qemu-block/2016-02/msg00745.html

In summary, the justification for this change is:

* When moving the dirty bitmap to the top of the tree was introduced in
  bdrv_append() in commit a9fc4408, it didn't actually have any effect
  because there could never be a bitmap in use when bdrv_append() was
  called (op blockers would prevent this). This is still true today for
  all internal uses of dirty bitmaps.

* Support for user-defined dirty bitmaps was introduced in 2.4, but we
  discouraged users from using it because we didn't consider it ready
  yet.

  Moreover, in 2.5, the bdrv_swap() removal introduced a bug that left
  dangling pointers if a dirty bitmap was present (the anchors of the
  dirty bitmap were swapped, but the back link in the first element
  wasn't updated), so it didn't even work correctly.

* block-dirty-bitmap-add takes an arbitrary node name, even if no
  BlockBackend is attached. This suggests that it is a node level
  operation and not a BlockBackend one. Consequently, there is no reason
  for dirty bitmaps to stay with a BlockBackend that was attached to the
  node they were created for.

* It was suggested that block-dirty-bitmap-add could track the node if a
  node name was specified, and track the BlockBackend if the device name
  was specified. This would however be inconsistent with other QMP
  commands. Commands that accept both device and node names currently
  interpret the device name just as an alias for the current root node
  of that BlockBackend.

* Dirty bitmaps have a name that is only unique amongst the bitmaps in a
  specific node. Moving bitmaps could lead to name clashes. Automatic
  renaming would involve too much magic.

* Persistent bitmaps are stored in a specific node. Moving them around
  automatically might be at least surprising, but it would probably also
  become a real problem because that would have to happen atomically
  without the management tool knowing of the operation.

At the end of the day it seems to be very clear that it was a mistake to
include dirty bitmaps in bdrv_move_feature_fields(). The functionality
of moving bitmaps and/or attaching them to a BlockBackend instead will
probably be needed, but it should be done with a new explicit QMP
command or option.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2016-03-30 11:59:32 +02:00
Kevin Wolf
4c8449832c block: Remove copy-on-read from bdrv_move_feature_fields()
Ever since we first introduced bdrv_append() in commit 8802d1fd ('qapi:
Introduce blockdev-group-snapshot-sync command'), the copy-on-read flag
was moved to the new top layer when taking a snapshot. The only problem
is that it doesn't make a whole lot of sense.

The use case for manually enabled CoR is to avoid reading data twice
from a slow remote image, so we want to save it to a local overlay, say
an ISO image accessed via HTTP to a local qcow2 overlay. When taking a
snapshot, we end up with a backing chain like this:

    http <- local.qcow2 <- snap_overlay.qcow2

There is no point in doing CoR from local.qcow2 into snap_overlay.qcow2,
we just want to keep copying data from the remote source into
local.qcow2.

The other use case of CoR is in the context of streaming, which isn't
very interesting for bdrv_move_feature_fields() because op blockers
prevent this combination.

This patch makes the copy-on-read flag stay on the image for which it
was originally set and prevents it from being propagated to the new
overlay. It is no longer intended to move CoR to the BlockBackend level.
In order for this to make sense, we also need to keep the respective
image read-write.

As a side effect of these changes, creating a live snapshot image (as
opposed to using an existing externally created one) on top of a COR
block device works now. It used to fail because it tried to open its
backing file both read-only and with COR.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2016-03-30 11:59:32 +02:00
Kevin Wolf
63eaaae08c block: Remove bdrv_make_anon()
The call in hmp_drive_del() is dead code because blk_remove_bs() is
called a few lines above. The only other remaining user is
bdrv_delete(), which only abuses bdrv_make_anon() to remove it from the
named nodes list. This path inlines the list entry removal into
bdrv_delete() and removes bdrv_make_anon().

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-03-30 11:59:32 +02:00
Veronia Bahaa
f348b6d1a5 util: move declarations out of qemu-common.h
Move declarations out of qemu-common.h for functions declared in
utils/ files: e.g. include/qemu/path.h for utils/path.c.
Move inline functions out of qemu-common.h and into new files (e.g.
include/qemu/bcd.h)

Signed-off-by: Veronia Bahaa <veroniabahaa@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22 22:20:17 +01:00
Kevin Wolf
f21d96d04b block: Use BdrvChild in BlockBackend
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-17 15:47:57 +01:00
Max Reitz
9aaf28c61d block: Remove bdrv_states list
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-17 15:47:57 +01:00
Max Reitz
79720af640 block: Use bdrv_next() instead of bdrv_states
There is no point in manually iterating through the bdrv_states list
when there is bdrv_next().

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-17 15:47:57 +01:00
Max Reitz
2626058034 block: Rewrite bdrv_next()
Instead of using the bdrv_states list, iterate over all the
BlockDriverStates attached to BlockBackends, and over all the
monitor-owned BDSs afterwards (except for those attached to a BB).

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-17 15:47:56 +01:00
Max Reitz
fe1a9cbc33 block: Move some bdrv_*_all() functions to BB
Move bdrv_commit_all() and bdrv_flush_all() to the BlockBackend level.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-17 15:47:56 +01:00
Max Reitz
d0e46a5577 block: Drop BB name from bad option error
The information which BB is concerned does not seem useful enough to
justify its existence in most other place (which may be related to qemu
printing the -drive parameter in question anyway, and for blockdev-add
the attribution is naturally unambiguous). Furthermore, as of a future
patch, bdrv_get_device_name(bs) will always return the empty string
before bdrv_open_inherit() returns.

Therefore, just dropping that information seems to be the best course of
action.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-17 15:47:56 +01:00
Fam Zheng
ebab225910 block: Move block dirty bitmap code to separate files
The only code change is making bdrv_dirty_bitmap_truncate public. It is
used in block.c.

Also two long lines (bdrv_get_dirty) are wrapped.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 1457412306-18940-5-git-send-email-famz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-03-14 17:35:05 +01:00
Kevin Wolf
73176bee99 block: Fix snapshot=on cache modes
Since commit 91a097e, we end up with a somewhat weird cache mode
configuration with snapshot=on: The commit broke the cache mode
inheritance for the snapshot overlay so that it is opened as
writethrough instead of unsafe now. The following bdrv_append() call to
put it on top of the tree swaps the WCE flag with the snapshot's backing
file (i.e. the originally given file), so what we eventually get is
cache=writeback on the temporary overlay and
cache=writethrough,cache.no-flush=on on the real image file.

This patch changes things so that the temporary overlay gets
cache=unsafe again like it used to, and the real images get whatever the
user specified. This means that cache.direct is now respected even with
snapshot=on, and in the case of committing changes, the final flush is
no longer ignored except explicitly requested by the user.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-03-14 16:46:43 +01:00
Kevin Wolf
12d5ee3a7e block: Fix -incoming with snapshot=on
The BDRV_O_INACTIVE flag should only be set for images explicitly opened
by the user. snapshot=on needs to create a new qcow2 image and write
some metadata to it. This is not a problem because it can't come from
the source, so there's no reason to mark it as BDRV_O_INACTIVE, even
though it is opened while waiting for the migration to complete.

This fixes an assertion failure when -incoming and snapshot=on are
combined.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-02-22 09:49:46 +01:00
Peter Maydell
d38ea87ac5 all: Clean up includes
Clean up includes so that osdep.h is included first and headers
which it implies are not included manually.

This commit was created with scripts/clean-includes.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1454089805-5470-16-git-send-email-peter.maydell@linaro.org
2016-02-04 17:41:30 +00:00
Jeff Cody
f8aa905a4f block: set device_list.tqe_prev to NULL on BDS removal
This fixes a regression introduced with commit 3f09bfbc7.  Multiple
bugs arise in conjunction with live snapshots and mirroring operations
(which include active layer commit).

After a live snapshot occurs, the active layer and the base layer both
have a non-NULL tqe_prev field in the device_list, although the base
node's tqe_prev field points to a NULL entry.  This non-NULL tqe_prev
field occurs after the bdrv_append() in the external snapshot calls
change_parent_backing_link().

In change_parent_backing_link(), when the previous active layer is
removed from device_list, the device_list.tqe_prev pointer is not
set to NULL.

The operating scheme in the block layer is to indicate that a BDS belongs
in the bdrv_states device_list iff the device_list.tqe_prev pointer
is non-NULL.

This patch does two things:

1.) Introduces a new block layer helper bdrv_device_remove() to remove a
    BDS from the device_list, and
2.) uses that new API, which also fixes the regression once used in
    change_parent_backing_link().

Signed-off-by: Jeff Cody <jcody@redhat.com>
Message-id: 0cd51e11c0666c04ddb7c05293fe94afeb551e89.1454376655.git.jcody@redhat.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-02-02 18:04:47 +01:00
Max Reitz
ca9bd24cf1 block: Rewrite bdrv_close_all()
This patch rewrites bdrv_close_all(): Until now, all root BDSs have been
force-closed. This is bad because it can lead to cached data not being
flushed to disk.

Instead, try to make all reference holders relinquish their reference
voluntarily:

1. All BlockBackend users are handled by making all BBs simply eject
   their BDS tree. Since a BDS can never be on top of a BB, this will
   not cause any of the issues as seen with the force-closing of BDSs.
   The references will be relinquished and any further access to the BB
   will fail gracefully.
2. All BDSs which are owned by the monitor itself (because they do not
   have a BB) are relinquished next.
3. Besides BBs and the monitor, block jobs and other BDSs are the only
   things left that can hold a reference to BDSs. After every remaining
   block job has been canceled, there should not be any BDSs left (and
   the loop added here will always terminate (as long as NDEBUG is not
   defined), because either all_bdrv_states will be empty or there will
   not be any block job left to cancel, failing the assertion).

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-02-02 17:50:46 +01:00
Max Reitz
2c1d04e002 block: Add list of all BlockDriverStates
We need this list so that bdrv_close_all() can keep track of which BDSs
are still open after having removed the BDSs from all of the BBs and
having released all monitor BDS references.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-02-02 17:50:46 +01:00
Max Reitz
64dff52019 block: Make bdrv_close() static
There are no users of bdrv_close() left, except for one of bdrv_open()'s
failure paths, bdrv_close_all() and bdrv_delete(), and that is good.
Make bdrv_close() static so nobody makes the mistake of directly using
bdrv_close() again.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-02-02 17:50:46 +01:00
Max Reitz
033cb5659a block: Remove BDS close notifier
It is unused now, so we can remove it.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-02-02 17:50:46 +01:00
Max Reitz
c5acdc9ab4 block: Release named dirty bitmaps in bdrv_close()
bdrv_delete() is not very happy about deleting BlockDriverStates with
dirty bitmaps still attached to them. In the past, we got around that
very easily by relying on bdrv_close_all() bypassing bdrv_delete(), and
bdrv_close() simply ignoring that condition. We should fix that by
releasing all named dirty bitmaps in bdrv_close() (there should not be
any unnamed bitmaps left) and moving the assertion from bdrv_delete()
there.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-02-02 17:50:46 +01:00
Kevin Wolf
76b1c7fe1c block: Inactivate BDS when migration completes
So far, live migration with shared storage meant that the image is in a
not-really-ready don't-touch-me state on the destination while the
source is still actively using it, but after completing the migration,
the image was fully opened on both sides. This is bad.

This patch adds a block driver callback to inactivate images on the
source before completing the migration. Inactivation means that it goes
to a state as if it was just live migrated to the qemu instance on the
source (i.e. BDRV_O_INACTIVE is set). You're then supposed to continue
either on the source or on the destination, which takes ownership of the
image.

A typical migration looks like this now with respect to disk images:

1. Destination qemu is started, the image is opened with
   BDRV_O_INACTIVE. The image is fully opened on the source.

2. Migration is about to complete. The source flushes the image and
   inactivates it. Now both sides have the image opened with
   BDRV_O_INACTIVE and are expecting the other side to still modify it.

3. One side (the destination on success) continues and calls
   bdrv_invalidate_all() in order to take ownership of the image again.
   This removes BDRV_O_INACTIVE on the resuming side; the flag remains
   set on the other side.

This ensures that the same image isn't written to by both instances
(unless both are resumed, but then you get what you deserve). This is
important because .bdrv_close for non-BDRV_O_INACTIVE images could write
to the image file, which is definitely forbidden while another host is
using the image.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
2016-01-20 13:36:23 +01:00
Kevin Wolf
04c01a5c8f block: Rename BDRV_O_INCOMING to BDRV_O_INACTIVE
Instead of covering only the state of images on the migration
destination before the migration is completed, the flag will also cover
the state of images on the migration source after completion. This
common state implies that the image is technically still open, but no
writes will happen and any cached contents will be reloaded from disk if
and when the image leaves this state.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2016-01-20 13:36:23 +01:00
Kevin Wolf
23c88b2472 block: Fix error path in bdrv_invalidate_cache()
We can only clear BDRV_O_INCOMING if the caches were actually
invalidated.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2016-01-20 13:36:23 +01:00
Kevin Wolf
82dc8b4110 block: Fix .bdrv_open flags
bdrv_common_open() modified bs->open_flags after inferring the set of
options to pass to the driver's .bdrv_open callback. This means that the
cache options were correctly set in bs->open_flags (and therefore
correctly displayed in 'info block'), but the image would actually be
opened with the default cache mode instead.

This patch removes the flags parameter to bdrv_common_open() (except for
BDRV_O_NO_BACKING it's the same as bs->open_flags anyway, and having two
names for the same thing is confusing), and moves the assignment of
open_flags down to immediately before calling into the block drivers. In
all other places, bs->open_flags is now used consistently.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-01-19 17:43:55 +01:00
Markus Armbruster
e43bfd9c87 error: Use error_prepend() where it makes obvious sense
Done with this Coccinelle semantic patch

    @@
    expression FMT, E1, E2;
    expression list ARGS;
    @@
    -    error_setg(E1, FMT, ARGS, error_get_pretty(E2));
    +    error_propagate(E1, E2);/*###*/
    +    error_prepend(E1, FMT/*@@@*/, ARGS);

followed by manual cleanup, first because I can't figure out how to
make Coccinelle transform strings, and second to get rid of now
superfluous error_propagate().

We now use or propagate the original error whole instead of just its
message obtained with error_get_pretty().  This avoids suppressing its
hint (see commit 50b7b00), but I can't see how the errors touched in
this commit could come with hints.  It also improves the message
printed with &error_abort when we screw up (see commit 1e9b65b).

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2016-01-13 15:16:17 +01:00
Markus Armbruster
cd5c2dac2e block: Clean up "Could not create temporary overlay" error message
bdrv_create() sets an error and returns -errno on failure.  When the
latter is interesting, the error is created with error_setg_errno().

bdrv_append_temp_snapshot() uses the error's message to create a new
one with error_setg_errno().  This adds a strerror() that is either
uninteresting or duplicate.  Use error_setg() instead.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <1450452927-8346-7-git-send-email-armbru@redhat.com>
2016-01-13 15:16:16 +01:00
Paolo Bonzini
fc27291daf block: use drained section in bdrv_close
bdrv_close is used when ejecting a medium.  Use a drained section to ensure
that all I/O goes to either the old medium or the bitbucket.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1450867706-19860-2-git-send-email-pbonzini@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-01-07 21:30:17 +01:00
Max Reitz
8b13976d3f block: Add opaque value to the amend CB
Add an opaque value which is to be passed to the bdrv_amend_options()
status callback.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-12-18 14:34:43 +01:00
Kevin Wolf
91a097e747 block: Move cache options into options QDict
This adds the cache mode options to the QDict, so that they can be
specified for child nodes (e.g. backing.cache.direct=off).

The cache modes are not removed from the flags at this point; instead,
options and flags are kept in sync. If the user specifies both flags and
options, the options take precedence.

Child node inherit cache modes as options now, they don't use flags any
more.

Note that this forbids specifying the cache mode for empty drives. It
didn't make sense anyway to specify it there, because it didn't have any
effect. blockdev_init() considers the cache options now bdrv_open()
options and therefore doesn't create an empty drive any more but calls
into bdrv_open(). This in turn will fail with no driver and filename
specified.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2015-12-18 14:34:43 +01:00
Kevin Wolf
ccf9dc07b5 block: reopen: Extract QemuOpts for generic block layer options
This patch adds a QemuOpts for generic block layer options to
bdrv_reopen_prepare(). The only two options that currently exist
(node-name and driver) cannot be changed, so the only thing we do is
putting them right back into the QDict so that we check at the end that
they are indeed unchanged.

We will add new options soon that can actually be changed.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2015-12-18 14:34:43 +01:00
Kevin Wolf
145f598e4a block: Introduce bs->explicit_options
bs->options doesn't only contain options that the user explicitly
requested, but also option that were derived from flags, the filename or
inherited from the parent node.

For reopen, it is important to know the difference because reopening the
parent can change inherited values in child nodes, but it shouldn't
change any options that were explicitly specified for the child.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2015-12-18 14:34:43 +01:00
Kevin Wolf
de3b53f007 block: Split out parse_json_protocol()
The next patch distinguishes options that were explicitly set and
options that were derived. bdrv_fill_option() added options of both
types: Options given by json: syntax should be counted as explicit, but
the rest is derived.

In preparation for the distinction, move json: parse to a separate
function.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2015-12-18 14:34:43 +01:00
Kevin Wolf
8e2160e2c7 block: Add infrastructure for option inheritance
Options are not actually inherited from the parent node yet, but this
commit lays the grounds for doing so.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2015-12-18 14:34:42 +01:00
Kevin Wolf
2851810223 block: reopen: Document option precedence and refactor accordingly
The interesting part of reopening an image is from which sources the
effective options should be taken, i.e. which options take precedence
over which other options. This patch documents the precedence that will
be implemented in the following patches.

It also refactors bdrv_reopen_queue(), so that the top-level reopened
node is handled the same way as children are. Option/flag inheritance
from the parent becomes just one item in the list and is done at the
beginning of the function, similar to how the other items are/will be
handled.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2015-12-18 14:34:42 +01:00
Kevin Wolf
4c9dfe5d8a block: Allow specifying child options in reopen
If the child was defined in the same context (-drive argument or
blockdev-add QMP command) as its parent, a reopen of the parent should
work the same and allow changing options of the child.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
2015-12-18 14:34:42 +01:00
Kevin Wolf
62392ebb09 block: Keep "driver" in bs->options
Instead of passing a separate drv argument to bdrv_open_common(), just
make sure that a "driver" option is set in the QDict. This also means
that a "driver" entry is consistently present in bs->options now.

This is another step towards keeping all options in the QDict (which is
the represenation of the blockdev-add QMP command).

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2015-12-18 14:34:42 +01:00
Kevin Wolf
4cdd01d32e block: Pass driver-specific options to .bdrv_refresh_filename()
In order to decide whether a blkdebug: filename can be produced or a
json: one is necessary, blkdebug checked whether bs->options had more
options than just "config", "x-image" or "image" (the latter including
nested options). That doesn't work well when generic block layer options
are present.

This patch passes an option QDict to the driver that contains only
driver-specific options, i.e. the options for the general block layer as
well as child nodes are already filtered out. Works much better this
way.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
2015-12-18 14:34:42 +01:00
Kevin Wolf
260fecf13b block: Exclude nested options only for children in append_open_options()
Some drivers have nested options (e.g. blkdebug rule arrays), which
don't belong to a child node and shouldn't be removed. Don't remove all
options with "." in their name, but check for the complete prefixes of
actually existing child nodes.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2015-12-18 14:34:42 +01:00
Kevin Wolf
9e700c1ac6 block: Consider all block layer options in append_open_options
The code already special-cased "node-name", which is currently the only
option passed in the QDict that isn't driver-specific. Generalise the
code to take all general block layer options into consideration.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
2015-12-18 14:34:42 +01:00
Kevin Wolf
d9b7b05703 block: Allow references for backing files
For bs->file, using references to existing BDSes has been possible for a
while already. This patch enables the same for bs->backing.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2015-12-18 14:34:42 +01:00
Kevin Wolf
cddff5bae1 block: Fix reopen with semantically overlapping options
This fixes bdrv_reopen() calls like the following one:

    qemu-io -c 'open -o overlap-check.template=all /tmp/test.qcow2' \
    -c 'reopen -o overlap-check=none'

The approach taken so far would result in an options QDict that has both
"overlap-check.template=all" and "overlap-check=none", which obviously
conflicts. In this case, the old option should be overridden by the
newly specified option.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
2015-12-18 14:34:42 +01:00
Eric Blake
a31939e6c8 blkdebug: Merge hand-rolled and qapi BlkdebugEvent enum
No need to keep two separate enums, where editing one is likely
to forget the other.  Now that we can specify a qapi enum prefix,
we don't even have to change the bulk of the uses.

get_event_by_name() could perhaps be replaced by qapi_enum_parse(),
but I left that for another day.

CC: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1447836791-369-20-git-send-email-eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2015-12-17 08:21:27 +01:00
Fam Zheng
df9a681dc9 qed: Implement .bdrv_drain
The "need_check_timer" is used to clear the "NEED_CHECK" flag in the
image header after a grace period once metadata update has finished. In
compliance to the bdrv_drain semantics we should make sure it remains
deleted once .bdrv_drain is called.

We cannot reuse qed_need_check_timer_cb because here it doesn't satisfy
the assertion.  Do the "plug" and "flush" calls manually.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-id: 1447064214-29930-10-git-send-email-famz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-12 16:22:43 +01:00
Peter Maydell
c459343b85 error: More error_setg() usage
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJWQ4F7AAoJEDhwtADrkYZTfLwP/RQrScyqDdJamcXGY1vgtoKW
 2SgkGsx9zS8EwFpROugi5d4YeQItU469PI9KAK8Xlt+qsRqOmAlWAGwBp9s78bs5
 P9BYNISdhpF2YcRWBz01rk8PqW80P30dwpCNJQtG/xMv2fGoqFm7/OlFLuv1rfXi
 G/Yo5qIVPshOjmwXBY1CHckhMylhZrkuLz+7DDfhK9dQbgcmHc8C8VruiWhLJmdx
 lP+pkGVY0U7w6vwH0+FQKzMJnsrCfdweK1MXrp6j/mSB0u8YigyQ91eaFu68XZYr
 MaPAJYAPvrBwK4AzW/hzYNeFkJmAHTAb8BCz5MfjVDjnWOR97+IF1RWd+OFUSdnC
 r0m40N1e6L9AybQROoM23dEVmAH+gwObbR+np718tn5/HyDthisfjgGPJG2F7Sik
 GXUWbv5fu0371e1GIXYPXsrkyZ8+psLVnSFqp+I77RbVlLh4qaSBd75cQA21s/Md
 vmWL+byIE9GU2PHjoVxV49j24ULYjahSmewxwT8n2sMLOWKmqHVYAFJc06HP7Udc
 pVqGewMv4eYZgNuUbclEWwdXof6qJ35uksM4C5Ps4oLQood8MSINBDzQkIed4Ylx
 rrCgB9J5+tlQbxTZkvhpxrpXEoEbfT+X57cv9oBTnEyMZ4hknzVSx2+pri8LWWhe
 hWyZ9yJDZobRfxJy9bD0
 =FMeO
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/armbru/tags/pull-error-2015-11-11' into staging

error: More error_setg() usage

# gpg: Signature made Wed 11 Nov 2015 17:57:15 GMT using RSA key ID EB918653
# gpg: Good signature from "Markus Armbruster <armbru@redhat.com>"
# gpg:                 aka "Markus Armbruster <armbru@pond.sub.org>"

* remotes/armbru/tags/pull-error-2015-11-11:
  error: More error_setg() usage

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-11-12 10:09:14 +00:00
Eric Blake
455b0fde8c error: More error_setg() usage
A few uses of error_set(ERROR_CLASS_GENERIC_ERROR) were missed in
c6bd8c706, or have snuck in since.  Nuke them.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1447224690-9743-19-git-send-email-eblake@redhat.com>
Acked-by: Andreas Färber <afaerber@suse.de>
[Indentation tidied up, commit message tweaked]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2015-11-11 18:56:26 +01:00
Alberto Garcia
a0d64a61db throttle: Use bs->throttle_state instead of bs->io_limits_enabled
There are two ways to check for I/O limits in a BlockDriverState:

- bs->throttle_state: if this pointer is not NULL, it means that this
  BDS is member of a throttling group, its ThrottleTimers structure
  has been initialized and its I/O limits are ready to be applied.

- bs->io_limits_enabled: if true it means that the throttle_state
  pointer is valid _and_ the limits are currently enabled.

The latter is used in several places to check whether a BDS has I/O
limits configured, but what it really checks is whether requests
are being throttled or not. For example, io_limits_enabled can be
temporarily set to false in cases like bdrv_read_unthrottled() without
otherwise touching the throtting configuration of that BDS.

This patch replaces bs->io_limits_enabled with bs->throttle_state in
all cases where what we really want to check is the existence of I/O
limits, not whether they are currently enabled or not.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-11 16:25:47 +01:00
Alberto Garcia
3e8c2e5705 block: support passing 'backing': '' to 'blockdev-add'
Passing an empty string allows opening an image but not its backing
file. This was already described in the API documentation, only the
implementation was missing.

This is useful for creating snapshots using images opened with
blockdev-add, since they are not supposed to have a backing image
before the operation.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-11 16:25:47 +01:00
Max Reitz
c69a4dd899 block: Make bdrv_states public
When inserting a BDS tree into a BB, we will need to add the root BDS to
this list. Since we will want to do that in the blockdev-insert-medium
implementation in blockdev.c, we will need access to it there.

This patch is not exactly elegant, but bdrv_states will be removed in
the future anyway because we no longer need it since we have BBs.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-11 16:22:46 +01:00
Alberto Garcia
9f4ed6fbd2 block: Don't call blk_bs() twice in bdrv_lookup_bs()
Signed-off-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-11 16:22:46 +01:00
Max Reitz
5433c24f0f block: Prepare for NULL BDS
blk_bs() will not necessarily return a non-NULL value any more (unless
blk_is_available() is true or it can be assumed to otherwise, e.g.
because it is called immediately after a successful blk_new_with_bs() or
blk_new_open()).

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-10-23 18:18:23 +02:00
Max Reitz
373340b26c block: Move I/O status and error actions into BB
These options are only relevant for the user of a whole BDS tree (like a
guest device or a block job) and should thus be moved into the
BlockBackend.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-10-23 18:18:23 +02:00
Max Reitz
7f0e9da6f1 block: Move BlockAcctStats into BlockBackend
As the comment above bdrv_get_stats() says, BlockAcctStats is something
which belongs to the device instead of each BlockDriverState. This patch
therefore moves it into the BlockBackend.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-10-23 18:18:23 +02:00
Max Reitz
68e9ec017b block: Move guest_block_size into BlockBackend
guest_block_size is a guest device property so it should be moved into
the interface between block layer and guest devices, which is the
BlockBackend.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-10-23 18:18:23 +02:00
Max Reitz
b4d02820d9 block: Invoke change media CB before NULLing drv
In order to handle host device passthrough, some guest device models
may call blk_is_inserted() to check whether the medium is inserted on
the host, when checking the guest tray status.

This tray status is inquired by blk_dev_change_media_cb(); because
bdrv_is_inserted() (invoked by blk_is_inserted()) always returns false
for BDS with drv set to NULL, blk_dev_change_media_cb() should therefore
be called before drv is set to NULL.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-10-23 18:18:23 +02:00
Max Reitz
28d7a78996 block: Make bdrv_is_inserted() recursive
If bdrv_is_inserted() is called on the top level BDS, it should make
sure all nodes in the BDS tree are actually inserted.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-10-23 18:18:22 +02:00
Max Reitz
e031f75048 block: Make bdrv_is_inserted() return a bool
Make bdrv_is_inserted(), blk_is_inserted(), and the callback
BlockDriver.bdrv_is_inserted() return a bool.

Suggested-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-10-23 18:18:22 +02:00
Max Reitz
d44f928a54 block: Set BDRV_O_INCOMING in bdrv_fill_options()
This flag should not be set for the root BDS only, but for any BDS that
is being created while incoming migration is pending, so setting it is
moved from blockdev_init() to bdrv_fill_options().

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-10-23 18:18:22 +02:00
Daniel P. Berrange
10817bf09d coroutine: move into libqemuutil.a library
The coroutine files are currently referenced by the block-obj-y
variable. The coroutine functionality though is already used by
more than just the block code. eg migration code uses coroutine
yield. In the future the I/O channel code will also use the
coroutine yield functionality. Since the coroutine code is nicely
self-contained it can be easily built as part of the libqemuutil.a
library, making it widely available.

The headers are also moved into include/qemu, instead of the
include/block directory, since they are now part of the util
codebase, and the impl was never in the block/ directory
either.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2015-10-20 14:59:04 +01:00
Jeff Cody
15489c769b block: auto-generated node-names
If a node-name is not specified, automatically generate the node-name.

Generated node-names will use the "block" sub-system identifier.

Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-10-16 15:34:30 +02:00
Kevin Wolf
779020cbdc block: Allow bdrv_unref_child(bs, NULL)
bdrv_unref() can be called with a NULL argument and doesn't do anything
then. Make bdrv_unref_child() consistent with it.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
2015-10-16 15:34:30 +02:00
Kevin Wolf
8e419aefa0 block: Remove bdrv_swap()
bdrv_swap() is unused now. Remove it and all functions that have
no other users than bdrv_swap(). In particular, this removes the
.bdrv_rebind callbacks from block drivers.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-10-16 15:34:30 +02:00
Kevin Wolf
3f09bfbc7b block: Add and use bdrv_replace_in_backing_chain()
This cleans up the mess we left behind in the mirror code after the
previous patch. Instead of using bdrv_swap(), just change pointers.

The interface change of the mirror job that callers must consider is
that after job completion, their local BDS pointers still point to the
same node now. qemu-img must change its code accordingly (which makes it
easier to understand); the other callers stays unchanged because after
completion they don't do anything with the BDS, but just with the job,
and the job is still owned by the source BDS.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-10-16 15:34:30 +02:00
Kevin Wolf
dd62f1ca43 block: Implement bdrv_append() without bdrv_swap()
Remember all parent nodes and just change the pointers there instead of
swapping the contents of the BlockDriverState.

Handling of snapshot=on must be moved further down in bdrv_open()
because *pbs (which is the bs pointer in the BlockBackend) must already
be set before bdrv_append() is called. Otherwise bdrv_append() changes
the BB's pointer to the temporary snapshot, but bdrv_open() overwrites
it with the read-only original image.

We also need to be careful to update callers as the interface changes
(becomes less insane): Previously, the meaning of the two parameters was
inverted when bdrv_append() returns. Now any BDS pointers keep pointing
to the same node.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-10-16 15:34:30 +02:00
Kevin Wolf
d42a8a935b block: Introduce parents list
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-10-16 15:34:29 +02:00
Kevin Wolf
063dd40e11 block: Split bdrv_move_feature_fields()
After bdrv_swap(), some fields must be moved back to their original BDS
to compensate for the effects that a swap of the contents of the objects
has while keeping the old addresses. Other fields must be moved back
because they should logically be moved and must stay on top

When replacing bdrv_swap() with operations changing the pointers in the
parents, we only need the latter and must avoid swapping the former.
Split the function accordingly.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-10-16 15:34:29 +02:00
Kevin Wolf
5db15a5769 block: Manage backing file references in bdrv_set_backing_hd()
This simplifies the code somewhat, especially when dropping whole
backing file subchains.

The exception is the mirroring code that does adventurous things with
bdrv_swap() and in order to keep it working, I had to duplicate most of
bdrv_set_backing_hd() locally. We'll get rid again of this ugliness
shortly.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-10-16 15:34:29 +02:00
Kevin Wolf
760e006384 block: Convert bs->backing_hd to BdrvChild
This is the final step in converting all of the BlockDriverState
pointers that block drivers use to BdrvChild.

After this patch, bs->children contains the full list of child nodes
that are referenced by a given BDS, and these children are only
referenced through BdrvChild, so that updating the pointer in there is
enough for changing edges in the graph.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-10-16 15:34:29 +02:00
Kevin Wolf
b26e90f56a block: Remove bdrv_open_image()
It is unused now.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-10-16 15:34:29 +02:00
Kevin Wolf
9a4f4c3156 block: Convert bs->file to BdrvChild
This patch removes the temporary duplication between bs->file and
bs->file_child by converting everything to BdrvChild.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-10-16 15:34:29 +02:00
Kevin Wolf
1fdd693308 block: Introduce BDS.file_child
Store the BdrvChild for bs->file. At this point, bs->file_child->bs just
duplicates the bs->file pointer. Later, it will completely replace it.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-10-16 15:34:29 +02:00
Alberto Garcia
99b7e77567 block: disable I/O limits at the beginning of bdrv_close()
Disabling I/O limits from a BDS also drains all pending throttled
requests, so it should be done at the beginning of bdrv_close() with
the rest of the bdrv_drain() calls before the BlockDriver is closed.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-10-02 13:48:29 +02:00
Kevin Wolf
4d2cb09251 block: Allow specifying driver-specific options to reopen
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2015-09-14 16:51:36 +02:00
Max Reitz
cf25ff850f block: Drop bdrv_find_whitelisted_format()
It is unused by now, so we can drop it.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-09-14 16:51:36 +02:00
Max Reitz
053e1578c9 block: Drop drv parameter from bdrv_fill_options()
Now that this parameter is effectively unused, we can drop it and change
the function accordingly.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-09-14 16:51:36 +02:00
Max Reitz
ce34377124 block: Drop drv parameter from bdrv_open_inherit()
Now that this parameter is effectively unused, we can drop it and just
pass NULL to bdrv_fill_options().

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-09-14 16:51:36 +02:00
Max Reitz
6ebf9aa2ef block: Drop drv parameter from bdrv_open()
Now that this parameter is effectively unused, we can drop it and just
pass NULL on to bdrv_open_inherit().

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-09-14 16:51:36 +02:00
Max Reitz
e6641719fe block: Always pass NULL as drv for bdrv_open()
Change all callers of bdrv_open() to pass the driver name in the options
QDict instead of passing its BlockDriver pointer.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-09-14 16:51:36 +02:00
Kővágó, Zoltán
fe646693ac opts: produce valid command line in qemu_opts_print
This will let us print options in a format that the user would actually
write it on the command line (foo=bar,baz=asd,etc=def), without
prepending a spurious comma at the beginning of the list, or quoting
values unnecessarily.  This patch provides the following changes:
* write and id=, if the option has an id
* do not print separator before the first element
* do not quote string arguments
* properly escape commas (,) for QEMU

Signed-off-by: Kővágó, Zoltán <DirtY.iCE.hu@gmail.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2015-09-11 10:21:38 +03:00
Wen Congyang
e12f378409 block: more check for replaced node
We use mirror+replace to fix quorum's broken child. bs/s->common.bs
is quorum, and to_replace is the broken child. The new child is target_bs.
Without this patch, the replace node can be any node, and it can be
top BDS with BB, or another quorum's child. We just check if the broken
child is part of the quorum BDS in this patch.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Message-id: 55A86486.1000404@cn.fujitsu.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-09-02 14:56:39 +01:00
Kevin Wolf
80a1e13091 block: Fix backing file child when modifying graph
This patch moves bdrv_attach_child() from the individual places that add
a backing file to a BDS to bdrv_set_backing_hd(), which is called by all
of them. It also adds bdrv_detach_child() there.

For normal operation (starting with one backing file chain and not
changing it until the topmost image is closed) and live snapshots, this
constitutes no change in behaviour.

For all other cases, this is a fix for the bug that the old backing file
was still referenced as a child, and the new one wasn't referenced.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2015-07-14 17:15:23 +02:00
Kevin Wolf
9a7dedbc43 block: Reorder cleanups in bdrv_close()
Block drivers may still want to access their child nodes in their
.bdrv_close handler. If they unref and/or detach a child by themselves,
this should not result in a double free.

There is additional code for backing files, which are just a special
case of child nodes. The same applies for them.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2015-07-14 17:15:23 +02:00
Kevin Wolf
33a604075c block: Introduce bdrv_unref_child()
This is the counterpart for bdrv_open_child(). It decreases the
reference count of the child BDS and removes it from the list of
children of the given parent BDS.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2015-07-14 17:15:23 +02:00
Kevin Wolf
b4b059f628 block: Introduce bdrv_open_child()
It is the same as bdrv_open_image(), except that it doesn't only return
success or failure, but the newly created BdrvChild object for the new
child node.

As the BdrvChild object already contains a BlockDriverState pointer (and
this is supposed to become the only pointer so that bdrv_append() and
friends can just change a single pointer in BdrvChild), the pbs
parameter is removed for bdrv_open_child().

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2015-07-14 17:15:18 +02:00
Kevin Wolf
df58179267 block: Move bdrv_attach_child() calls up the call chain
Let the callers of bdrv_open_inherit() call bdrv_attach_child(). It
needs to be called in all cases where bdrv_open_inherit() succeeds (i.e.
returns 0) and a child_role is given.

bdrv_attach_child() is moved upwards to avoid a forward declaration.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2015-07-14 15:55:19 +02:00
Fam Zheng
53ec73e264 block: Use bdrv_drain to replace uncessary bdrv_drain_all
There callers work on a single BlockDriverState subtree, where using
bdrv_drain() is more accurate.

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-07-07 14:27:14 +01:00
Fam Zheng
c2e0dbbfd7 block: Initialize local_err in bdrv_append_temp_snapshot
Cc: qemu-stable@nongnu.org
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-id: 1436156684-16526-1-git-send-email-famz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-07-07 14:27:14 +01:00
Fam Zheng
6e82e4bce1 block: Remove bdrv_reset_dirty
Using this function would always be wrong because a dirty bitmap must
have a specific owner that consumes the dirty bits and calls
bdrv_reset_dirty_bitmap().

Remove the unused function to avoid future misuse.

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-07-02 10:06:23 +01:00
Dimitris Aragiorgis
b192af8acc block: Use bdrv_is_sg() everywhere
Instead of checking bs->sg use bdrv_is_sg() consistently throughout
the code.

Signed-off-by: Dimitris Aragiorgis <dimara@arrikto.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1435056300-14924-2-git-send-email-dimara@arrikto.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-06-23 15:08:52 +01:00
Wen Congyang
c6a8c3283f util/hbitmap: Add an API to reset all set bits in hbitmap
The function bdrv_clear_dirty_bitmap() is updated to use
faster hbitmap_reset_all() call.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 555E868A.60506@cn.fujitsu.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-06-23 15:06:16 +01:00
Markus Armbruster
cc7a8ea740 Include qapi/qmp/qerror.h exactly where needed
In particular, don't include it into headers.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Luiz Capitulino <lcapitulino@redhat.com>
2015-06-22 18:20:41 +02:00
Markus Armbruster
d49b683644 qerror: Move #include out of qerror.h
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Luiz Capitulino <lcapitulino@redhat.com>
2015-06-22 18:20:40 +02:00
Markus Armbruster
c6bd8c706a qerror: Clean up QERR_ macros to expand into a single string
These macros expand into error class enumeration constant, comma,
string.  Unclean.  Has been that way since commit 13f59ae.

The error class is always ERROR_CLASS_GENERIC_ERROR since the previous
commit.

Clean up as follows:

* Prepend every use of a QERR_ macro by ERROR_CLASS_GENERIC_ERROR, and
  delete it from the QERR_ macro.  No change after preprocessing.

* Rewrite error_set(ERROR_CLASS_GENERIC_ERROR, ...) into
  error_setg(...).  Again, no change after preprocessing.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Luiz Capitulino <lcapitulino@redhat.com>
2015-06-22 18:20:40 +02:00
Peter Maydell
f3e3b083d4 Block layer core and image format patches
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABAgAGBQJVevYFAAoJEH8JsnLIjy/W3jEP/0hiQ3rCRZ/he8s5maTdT+TR
 YSeHkB5rKpz0Uopn1DMn1QrIbUVzX7dyb+uf9zQ0/xRQIzf6k8uxqU/NWrdoF3NK
 qx91dGWedwnG+TEBIMbcR7nMrw4dP6kH7uPz/VWMXDHVLz0HIcD95qhKgs0mSY6J
 dWqex6ACjXM68zJU5IioagU9evV80WZE1S8z7zfixxtTBx5hCaTVbwalkaCxcrXw
 PbZle55rjI8B10+OzgBw0fq10nias+NTndU9CwNBboxmEtAjq8/mQ663vcWlmiFo
 9a/hkda27Z5ut/0Tqk1v4uLHauylp++rrAabPBAuCFMKes6cdkddP15Q/r52aJ29
 5meodQtbet1rGrM+Aq4vuSuWId71PGypEI/3URDdNfYFNISoeLLsk4lcQUu7VrDD
 sRX3Jt8SI3nkIgOnhPyi7NDPmafxFt8yRt5vM8MyR5ynF8NS/2hiAc3wqnbXGjUj
 a5GqDCefb1yM0R5HvksuFFt3OnXlKJQ3J+ksXNUJf9DSAZPauqWD696pcTeg8wyy
 3PIGkczgUuKTVfFWd3THZxJLAo7ZuqvXBHHV8o1SeMBDxwh4FhTd8Kjvm3rUNFfl
 VDox4qwZ1AcxLrxgqazKU7sD9iWBDHURRcpOoUsBys7oxQZnhcmMp1fRlEkTOyrD
 HNiSNByqBrtkfeVzHlSe
 =QgYk
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging

Block layer core and image format patches

# gpg: Signature made Fri Jun 12 16:08:53 2015 BST using RSA key ID C88F2FD6
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>"

* remotes/kevin/tags/for-upstream: (25 commits)
  block: Fix reopen flag inheritance
  block: Add BlockDriverState.inherits_from
  block: Add list of children to BlockDriverState
  queue.h: Add QLIST_FIX_HEAD_PTR()
  block: Drain requests before swapping nodes in bdrv_swap()
  block: Move flag inheritance to bdrv_open_inherit()
  block: Use QemuOpts in bdrv_open_common()
  block: Use macro for cache option names
  vmdk: Use bdrv_open_image()
  quorum: Use bdrv_open_image()
  check-qdict: Test cases for new functions
  qdict: Add qdict_{set,copy}_default()
  qdict: Add qdict_array_entries()
  iotests: Add tests for overriding BDRV_O_PROTOCOL
  block: driver should override flags in bdrv_open()
  block: Change bitmap truncate conditional to assertion
  block: record new size in bdrv_dirty_bitmap_truncate
  raw-posix: Fix .bdrv_co_get_block_status() for unaligned image size
  vmdk: Use vmdk_find_index_in_cluster everywhere
  vmdk: Fix index_in_cluster calculation in vmdk_co_get_block_status
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-06-15 10:43:06 +01:00
Kevin Wolf
67251a3113 block: Fix reopen flag inheritance
When reopening an image, the block layer already takes care to reopen
bs->file as well with recalculated inherited flags. The same must happen
for any other child (most notably missing before this patch: backing
files).

If bs->file (or any other child) didn't originally inherit from bs, e.g.
because it was created separately and then only referenced, it must not
inherit flags on reopen either, so check the inherited_from field before
propagation the reopen down.

VMDK already reopened its extents manually; this code can now be
dropped.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2015-06-12 17:04:59 +02:00
Kevin Wolf
bddcec3745 block: Add BlockDriverState.inherits_from
Currently, the block layer assumes that any block node can have only one
parent, and if it has a parent, that it inherits some options/flags from
this parent.

This is not true any more: With references used in block device
creation, a single node can be used by multiple parents, or it can be
created separately and not inherit flags from any parent.

To handle reopens correctly, a node must know from which parent it
inherited options. This patch adds the information to BlockDriverState.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2015-06-12 17:04:59 +02:00
Kevin Wolf
6e93e7c41f block: Add list of children to BlockDriverState
This allows iterating over all children of a given BDS, not only
including bs->file and bs->backing_hd, but also driver-specific
ones like VMDK extents or Quorum children.

For bdrv_swap(), the list of children of the swapped BDS stays at that
BDS (because that's where the pointers stay as well). The list head
moves and pointers to it must be fixed up therefore.

The list of children in the parent of the swapped BDS is not affected by
the swap. The contents of the BDS objects is swapped, so the existing
pointer in the parent automatically points to the newly swapped in BDS.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2015-06-12 17:04:59 +02:00
Kevin Wolf
6ee4ce1ee7 block: Drain requests before swapping nodes in bdrv_swap()
bdrv_swap() requires that there are no requests in flight on either of
the two devices. The request coroutine would work on the wrong
BlockDriverState object (with bs->opaque even being interpreted as a
different type potentially) and all sorts of bad things would result
from this.

The currently existing callers mostly ensure that there is no I/O
pending on nodes that are swapped. In detail, this is:

1. Live snapshots. This goes through qmp_transaction(), which calls
   bdrv_drain_all() before doing anything. The command is executed
   synchronously, so no new I/O can be issued concurrently.

2. snapshot=on in bdrv_open(). We're in the middle of opening the image
   (both the original image and its temporary overlay), so there can't
   be any I/O in flight yet.

3. Mirroring. bdrv_drain() is already used on the source device so that
   the mirror doesn't miss anything. However, the main loop runs between
   that and the bdrv_swap() (which is actually a bug, being addressed in
   another series), so there is a small window in which new I/O might be
   issued that would be in flight during bdrv_swap().

It is safer to just drain the request queue of both devices in
bdrv_swap() instead of relying on callers to do the right thing.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2015-06-12 17:04:59 +02:00
Kevin Wolf
f3930ed0bb block: Move flag inheritance to bdrv_open_inherit()
Instead of letting every caller of bdrv_open() determine the right flags
for its child node manually and pass them to the function, pass the
parent node and the role of the newly opened child (like backing file,
protocol layer, etc.).

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2015-06-12 17:04:59 +02:00
Kevin Wolf
18edf289a8 block: Use QemuOpts in bdrv_open_common()
Instead of manually parsing options and then deleting them from the
options QDict, just use QemuOpts like most other places that deal with
block device options.

More options will be added there and then QemuOpts is a lot more
manageable than open-coding everything.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2015-06-12 16:59:06 +02:00
Max Reitz
53a2951312 block: driver should override flags in bdrv_open()
The BDRV_O_PROTOCOL flag should have an impact only if no driver is
specified explicitly. Therefore, if bdrv_open() is called with an
explicit block driver argument (either through the options QDict or
through the drv parameter) and that block driver is a protocol block
driver, BDRV_O_PROTOCOL should be set; if it is a format block driver,
BDRV_O_PROTOCOL should be unset.

While there was code to unset the flag in case a format block driver
has been selected, it only followed the bdrv_fill_options() function
call whereas the flag in fact needs to be adjusted before it is used
there.

With that change, BDRV_O_PROTOCOL will always be set if the BDS should
be a protocol driver; if the driver has been specified explicitly, the
new code will set it; and bdrv_fill_options() will only "probe" a
protocol driver if BDRV_O_PROTOCOL is set. The probing after
bdrv_fill_options() cannot select a protocol driver.

Thus, bdrv_open_image() to open BDS.file is never called if a protocol
BDS is about to be created. With that change in turn it is impossible to
call bdrv_open_common() with a protocol drv and file != NULL, which
allows us to remove the bdrv_swap() call.

This change breaks a test case in qemu-iotest 051:
"-drive file=t.qcow2,file.driver=qcow2" now works because the explicitly
specified "qcow2" overrides the BDRV_O_PROTOCOL which is automatically
set for the "file" BDS (and the filename is just passed down).
Therefore, this patch removes that test case.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-06-12 15:54:07 +02:00
John Snow
06207b0ff5 block: Change bitmap truncate conditional to assertion
This is an artifact of an older version that had both all-bitmap and
single-bitmap truncate functions, and some info got lost in the shuffle.

Bitmaps can only be frozen during a backup operation, and a backup
operation should prevent a resize operation, so just assert that this
cannot happen.

Suggested-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-06-12 15:54:01 +02:00
John Snow
5270b6a0d0 block: record new size in bdrv_dirty_bitmap_truncate
ce1ffea8 neglected to update the BdrvDirtyBitmap structure
itself for internal consistency. It's currently not an issue,
but for migration and persistence series this will cause headaches.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-06-12 15:54:01 +02:00
Alberto Garcia
db6283385c throttle: acquire the ThrottleGroup lock in bdrv_swap()
bdrv_swap() touches the fields of a BlockDriverState that are
protected by the ThrottleGroup lock. Although those fields end up in
their original place, they are temporarily swapped in the process,
so there's a chance that an operation on a member of the same group
happening on a different thread can try to use them.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: d92dc40d7c4f1fc5cda5cbbf4ffb7a4670b79d17.1433779731.git.berto@igalia.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-06-12 14:00:00 +01:00
Alberto Garcia
76f4afb40f throttle: Add throttle group support
The throttle group support use a cooperative round robin scheduling
algorithm.

The principles of the algorithm are simple:
- Each BDS of the group is used as a token in a circular way.
- The active BDS computes if a wait must be done and arms the right
  timer.
- If a wait must be done the token timer will be armed so the token
  will become the next active BDS.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: f0082a86f3ac01c46170f7eafe2101a92e8fde39.1433779731.git.berto@igalia.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-06-12 14:00:00 +01:00
Benoît Canet
0e5b0a2d54 throttle: Extract timers from ThrottleState into a separate structure
Group throttling will share ThrottleState between multiple bs.
As a consequence the ThrottleState will be accessed by multiple aio
context.

Timers are tied to their aio context so they must go out of the
ThrottleState structure.

This commit paves the way for each bs of a common ThrottleState to
have its own timer.

Signed-off-by: Benoit Canet <benoit.canet@nodalink.com>
Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 6cf9ea96d8b32ae2f8769cead38f68a6a0c8c909.1433779731.git.berto@igalia.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-06-12 14:00:00 +01:00
John Snow
9abe3bdc45 qapi: add dirty bitmap status
Bitmaps can be in a handful of different states with potentially
more to come as we tool around with migration and persistence patches.

Management applications may need to know why certain bitmaps are
unavailable for various commands, e.g. busy in another operation,
busy being migrated, etc.

Right now, all we offer is BlockDirtyInfo's boolean member 'frozen'.
Instead of adding more booleans, replace it by an enumeration member
'status' with values 'active' and 'frozen'.  Then add new value
'disabled'.

Incompatible change.  Fine because the changed part hasn't been
released so far.

Suggested-by: Eric Blake <eblake@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[Commit message tweaked]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2015-05-29 12:53:12 +02:00
Fam Zheng
4a9c9ea0d3 block: Detect multiplication overflow in bdrv_getlength
Bogus image may have a large total_sectors that will overflow the
multiplication. For cleanness, fix the return code so the error message
will be meaningful.

Reported-by: Richard W.M. Jones <rjones@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-05-22 17:08:01 +02:00
Denis V. Lunev
459b4e6612 block: align bounce buffers to page
The following sequence
    int fd = open(argv[1], O_RDWR | O_CREAT | O_DIRECT, 0644);
    for (i = 0; i < 100000; i++)
            write(fd, buf, 4096);
performs 5% better if buf is aligned to 4096 bytes.

The difference is quite reliable.

On the other hand we do not want at the moment to enforce bounce
buffering if guest request is aligned to 512 bytes.

The patch changes default bounce buffer optimal alignment to
MAX(page size, 4k). 4k is chosen as maximal known sector size on real
HDD.

The justification of the performance improve is quite interesting.
From the kernel point of view each request to the disk was split
by two. This could be seen by blktrace like this:
  9,0   11  1     0.000000000 11151  Q  WS 312737792 + 1023 [qemu-img]
  9,0   11  2     0.000007938 11151  Q  WS 312738815 + 8 [qemu-img]
  9,0   11  3     0.000030735 11151  Q  WS 312738823 + 1016 [qemu-img]
  9,0   11  4     0.000032482 11151  Q  WS 312739839 + 8 [qemu-img]
  9,0   11  5     0.000041379 11151  Q  WS 312739847 + 1016 [qemu-img]
  9,0   11  6     0.000042818 11151  Q  WS 312740863 + 8 [qemu-img]
  9,0   11  7     0.000051236 11151  Q  WS 312740871 + 1017 [qemu-img]
  9,0    5  1     0.169071519 11151  Q  WS 312741888 + 1023 [qemu-img]
After the patch the pattern becomes normal:
  9,0    6  1     0.000000000 12422  Q  WS 314834944 + 1024 [qemu-img]
  9,0    6  2     0.000038527 12422  Q  WS 314835968 + 1024 [qemu-img]
  9,0    6  3     0.000072849 12422  Q  WS 314836992 + 1024 [qemu-img]
  9,0    6  4     0.000106276 12422  Q  WS 314838016 + 1024 [qemu-img]
and the amount of requests sent to disk (could be calculated counting
number of lines in the output of blktrace) is reduced about 2 times.

Both qemu-img and qemu-io are affected while qemu-kvm is not. The guest
does his job well and real requests comes properly aligned (to page).

Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-id: 1431441056-26198-3-git-send-email-den@openvz.org
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-05-22 09:37:33 +01:00
Denis V. Lunev
4196d2f030 block: minimal bounce buffer alignment
The patch introduces new concept: minimal memory alignment for bounce
buffers. Original so called "optimal" value is actually minimal required
value for aligment. It should be used for validation that the IOVec
is properly aligned and bounce buffer is not required.

Though, from the performance point of view, it would be better if
bounce buffer or IOVec allocated by QEMU will be aligned stricter.

The patch does not change any alignment value yet.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-id: 1431441056-26198-2-git-send-email-den@openvz.org
CC: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-05-22 09:37:33 +01:00
Stefan Hajnoczi
61007b316c block: move I/O request processing to block/io.c
The block.c file has grown to over 6000 lines.  It is time to split this
file so there are fewer conflicts and the code is easier to maintain.

Extract I/O request processing code:
 * Read
 * Write
 * Zero writes and making the image empty
 * Flush
 * Discard
 * ioctl
 * Tracked requests and queuing
 * Throttling and copy-on-read
 * Block status and allocated functions
 * Refreshing block limits
 * Reading/writing vmstate
 * qemu_blockalign() and friends

The patch simply moves code from block.c into block/io.c.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-04-28 15:36:17 +02:00
Stefan Hajnoczi
0eb7217e49 block: extract bdrv_setup_io_funcs()
Move the code to install coroutine and aio emulation function pointers
in a BlockDriver to its own function.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-04-28 15:36:17 +02:00
Stefan Hajnoczi
e0c47b6cb1 block: add bdrv_set_dirty()/bdrv_reset_dirty() to block_int.h
The dirty bitmap functions are called from the block I/O processing
code.  Make them visible to block_int.h users so they can be used
outside block.c.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-04-28 15:36:17 +02:00
Stefan Hajnoczi
4f5472cb2d block: replace bdrv_states iteration with bdrv_next()
The bdrv_states list is a static variable in block.c.

bdrv_drain_all() and bdrv_flush_all() use this variable to iterate over
all drives.

The next patch will move bdrv_drain_all() and bdrv_flush_all() out of
block.c so it's necessary to switch to the public bdrv_next() interface.

Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-04-28 15:36:17 +02:00
John Snow
ce1ffea8cd block: Resize bitmaps on bdrv_truncate
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1429314609-29776-16-git-send-email-jsnow@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-04-28 15:36:10 +02:00
John Snow
20dca81075 block: Ensure consistent bitmap function prototypes
We often don't need the BlockDriverState for functions
that operate on bitmaps. Remove it.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1429314609-29776-15-git-send-email-jsnow@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-04-28 15:36:10 +02:00
John Snow
aa0c7ca506 block: add BdrvDirtyBitmap documentation
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1429314609-29776-14-git-send-email-jsnow@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-04-28 15:36:10 +02:00
John Snow
a113534ffb qmp: Add dirty bitmap status field in query-block
Add the "frozen" status booleans, to inform clients
when a bitmap is occupied doing a task.

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 1429314609-29776-13-git-send-email-jsnow@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-04-28 15:36:10 +02:00
John Snow
e74e6b78e6 qmp: add block-dirty-bitmap-clear
Add bdrv_clear_dirty_bitmap and a matching QMP command,
qmp_block_dirty_bitmap_clear that enables a user to reset
the bitmap attached to a drive.

This allows us to reset a bitmap in the event of a full
drive backup.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 1429314609-29776-12-git-send-email-jsnow@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-04-28 15:36:10 +02:00
John Snow
d58d845397 qmp: Add support of "dirty-bitmap" sync mode for drive-backup
For "dirty-bitmap" sync mode, the block job will iterate through the
given dirty bitmap to decide if a sector needs backup (backup all the
dirty clusters and skip clean ones), just as allocation conditions of
"top" sync mode.

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1429314609-29776-11-git-send-email-jsnow@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-04-28 15:36:10 +02:00
John Snow
9bd2b08f27 block: Add bitmap successors
A bitmap successor is an anonymous BdrvDirtyBitmap that is intended to
be created just prior to a sensitive operation (e.g. Incremental Backup)
that can either succeed or fail, but during the course of which we still
want a bitmap tracking writes.

On creating a successor, we "freeze" the parent bitmap which prevents
its deletion, enabling, anonymization, or creating a bitmap with the
same name.

On success, the parent bitmap can "abdicate" responsibility to the
successor, which will inherit its name. The successor will have been
tracking writes during the course of the backup operation. The parent
will be safely deleted.

On failure, we can "reclaim" the successor from the parent, unifying
them such that the resulting bitmap describes all writes occurring since
the last successful backup, for instance. Reclamation will thaw the
parent, but not explicitly re-enable it.

BdrvDirtyBitmap operations that target a single bitmap are protected
by assertions that the bitmap is not frozen and/or disabled.

BdrvDirtyBitmap operations that target a group of bitmaps, such as
bdrv_{set,reset}_dirty will ignore frozen/disabled drives with a
conditional instead.

Internal functions that enable/disable dirty bitmaps have assertions
added to them to prevent modifying frozen bitmaps.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 1429314609-29776-10-git-send-email-jsnow@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-04-28 15:36:10 +02:00
John Snow
b8e6fb752e block: Add bitmap disabled status
Add a status indicating the enabled/disabled state of the bitmap.
A bitmap is by default enabled, but you can lock the bitmap into
a read-only state by setting disabled = true.

A previous version of this patch added a QMP interface for changing
the state of the bitmap, but it has since been removed for now until
a use case emerges where this state must be revealed to the user.

The disabled state WILL be used internally for bitmap migration and
bitmap persistence.

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1429314609-29776-9-git-send-email-jsnow@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-04-28 15:36:10 +02:00
John Snow
592fdd02ae block: Introduce bdrv_dirty_bitmap_granularity()
This returns the granularity (in bytes) of dirty bitmap,
which matches the QMP interface and the existing query
interface.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1429314609-29776-6-git-send-email-jsnow@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-04-28 15:36:10 +02:00
John Snow
341ebc2f81 qmp: Add block-dirty-bitmap-add and block-dirty-bitmap-remove
The new command pair is added to manage a user created dirty bitmap. The
dirty bitmap's name is mandatory and must be unique for the same device,
but different devices can have bitmaps with the same names.

The granularity is an optional field. If it is not specified, we will
choose a default granularity based on the cluster size if available,
clamped to between 4K and 64K to mirror how the 'mirror' code was
already choosing granularity. If we do not have cluster size info
available, we choose 64K. This code has been factored out into a helper
shared with block/mirror.

This patch also introduces the 'block_dirty_bitmap_lookup' helper,
which takes a device name and a dirty bitmap name and validates the
lookup, returning NULL and setting errp if there is a problem with
either field. This helper will be re-used in future patches in this
series.

The types added to block-core.json will be re-used in future patches
in this series, see:
'qapi: Add transaction support to block-dirty-bitmap-{add, enable, disable}'

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 1429314609-29776-5-git-send-email-jsnow@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-04-28 15:36:10 +02:00
John Snow
5fba6c0e50 qmp: Ensure consistent granularity type
We treat this field with a variety of different types everywhere
in the code. Now it's just uint32_t.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1429314609-29776-4-git-send-email-jsnow@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-04-28 15:36:10 +02:00
Fam Zheng
0db6e54a8a qapi: Add optional field "name" to block dirty bitmap
This field will be set for user created dirty bitmap. Also pass in an
error pointer to bdrv_create_dirty_bitmap, so when a name is already
taken on this BDS, it can report an error message. This is not global
check, two BDSes can have dirty bitmap with a common name.

Implemented bdrv_find_dirty_bitmap to find a dirty bitmap by name, will
be used later when other QMP commands want to reference dirty bitmap by
name.

Add bdrv_dirty_bitmap_make_anon. This unsets the name of dirty bitmap.

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1429314609-29776-3-git-send-email-jsnow@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-04-28 15:36:10 +02:00
Alberto Garcia
d5a8ee60a0 qmp: fill in the image field in BlockDeviceInfo
The image field in BlockDeviceInfo is supposed to contain an ImageInfo
object. However that is being filled in by bdrv_query_info(), not by
bdrv_block_device_info(), which is where BlockDeviceInfo is actually
created.

Anyone calling bdrv_block_device_info() directly will get a null image
field. As a consequence of this, the HMP command 'info block -n -v'
crashes QEMU.

This patch moves the code that fills in that field from
bdrv_query_info() to bdrv_block_device_info().

Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-id: 1429271563-3765-1-git-send-email-berto@igalia.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-04-28 15:36:09 +02:00
Alberto Garcia
81e5f78a9f block: use bdrv_get_device_or_node_name() in error messages
There are several error messages that identify a BlockDriverState by
its device name. However those errors can be produced in nodes that
don't have a device name associated.

In those cases we should use bdrv_get_device_or_node_name() to fall
back to the node name and produce a more meaningful message. The
messages are also updated to use the more generic term 'node' instead
of 'device'.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 9823a1f0514fdb0692e92868661c38a9e00a12d6.1428485266.git.berto@igalia.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-04-28 15:36:09 +02:00
Alberto Garcia
9b2aa84f87 block: add bdrv_get_device_or_node_name()
This function gets the device name associated with a BlockDriverState,
or its node name if the device name is empty.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 4fa30aa8d61d9052ce266fd5429a59a14e941255.1428485266.git.berto@igalia.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-04-28 15:36:09 +02:00
Paolo Bonzini
0b5a24454f block: avoid unnecessary bottom halves
bdrv_aio_* APIs can use coroutines to achieve asynchronicity.  However,
the coroutine may terminate without having yielded back to the caller
(for example because of something that invokes a nested event loop,
or because the coroutine is doing nothing at all).  In this case,
the bdrv_aio_* API must delay the completion to the next iteration
of the main loop, because bdrv_aio_* will never invoke the callback
before returning.

This can be done with a bottom half, and indeed bdrv_aio_* is always
using one for simplicity.  It is possible to gain some performance
(~3%) by avoiding this in the common case.  A new field in the
BlockAIOCBCoroutine struct is set to true until the first time the
corotine has yielded to its creator, and completion goes through a
new function bdrv_co_complete.  If the flag is false, bdrv_co_complete
invokes the callback immediately.  If it is true, the caller will
notice that the coroutine has completed and schedule the bottom
half itself.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1427524638-28157-1-git-send-email-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-04-28 15:36:09 +02:00
Fam Zheng
69da3b0b47 block: Pause block jobs in bdrv_drain_all
This is necessary to suppress more IO requests from being generated from
block job coroutines.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-id: 1428069921-2957-3-git-send-email-famz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-04-28 15:36:09 +02:00
Fam Zheng
de50a20a4c block: Switch to host monotonic clock for IO throttling
Currently, throttle timers won't make any progress when VCPU is not
running, which would stall the request queue in utils, qtest, vm
suspending, and live migration, without special handling.

Block jobs are confusingly inconsistent between with and without
throttling: if user sets a bps limit, stops the vm, then start a block
job, the block job will not make any progress; in contrary, if user
unsets the bps limit, or if it's not set, the block job will run
normally.

After this patch, with the host clock, even if the VCPUs are stopped,
the throttle queues will be processed.

This patch also enables potential to add throttle to bdrv_drain_all.
Currently all requests are drained immediately. In other words whenever
it is called, IO throttling goes ineffective (examples: system reset,
migration and many block job operations.). This is a loophole that guest
could exploit. If we use the host clock, we can later just trust the
nested poll. This could be done on top.

Note that for qemu-iotests case 093, which uses qtest, we still keep vm
clock so the script can control the clock stepping in order to be
deterministic.

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-id: 1427268446-6426-1-git-send-email-famz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-04-28 15:36:08 +02:00
Stefan Hajnoczi
786a4ea82e Convert (ffs(val) - 1) to ctz32(val)
This commit was generated mechanically by coccinelle from the following
semantic patch:

@@
expression val;
@@
- (ffs(val) - 1)
+ ctz32(val)

The call sites have been audited to ensure the ffs(0) - 1 == -1 case
never occurs (due to input validation, asserts, etc).  Therefore we
don't need to worry about the fact that ctz32(0) == 32.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1427124571-28598-5-git-send-email-stefanha@redhat.com
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-04-28 15:36:08 +02:00
Fam Zheng
fc3959e466 block: Fix unaligned zero write
If the zero write is not aligned, bdrv_co_do_pwritev will segfault
because of accessing to the NULL qiov passed in by bdrv_co_write_zeroes.
Fix this by allocating a local qiov in bdrv_co_do_pwritev if the request
is not aligned. (In this case the padding iovs are necessary anyway, so
it doesn't hurt.)

Also add a check at the end of bdrv_co_do_pwritev to clear the zero flag
if padding is involved.

Signed-off-by: Fam Zheng <famz@redhat.com>
Message-id: 1427160230-4489-2-git-send-email-famz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-03-27 10:01:12 +00:00
Fam Zheng
d51a2427f6 block: Drop bdrv_find
All callers are converted, so drop it.

Signed-off-by: Fam Zheng <famz@redhat.com>
Message-id: 1425296209-1476-5-git-send-email-famz@redhat.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2015-03-16 12:10:30 -04:00
Markus Armbruster
a1f688f415 block: Deprecate QCOW/QCOW2 encryption
We've steered users away from QCOW/QCOW2 encryption for a while,
because it's a flawed design (commit 136cd19 Describe flaws in
qcow/qcow2 encryption in the docs).

In addition to flawed crypto, we have comically bad usability, and
plain old bugs.  Let me show you.

= Example images =

I'm going to use a raw image as backing file, and two QCOW2 images,
one encrypted, and one not:

    $ qemu-img create -f raw backing.img 4m
    Formatting 'backing.img', fmt=raw size=4194304
    $ qemu-img create -f qcow2 -o encryption,backing_file=backing.img,backing_fmt=raw geheim.qcow2 4m
    Formatting 'geheim.qcow2', fmt=qcow2 size=4194304 backing_file='backing.img' backing_fmt='raw' encryption=on cluster_size=65536 lazy_refcounts=off
    $ qemu-img create -f qcow2 -o backing_file=backing.img,backing_fmt=raw normal.qcow2 4m
    Formatting 'normal.qcow2', fmt=qcow2 size=4194304 backing_file='backing.img' backing_fmt='raw' encryption=off cluster_size=65536 lazy_refcounts=off

= Usability issues =

== Confusing startup ==

When no image is encrypted, and you don't give -S, QEMU starts the
guest immediately:

    $ qemu-system-x86_64 -nodefaults -display none -monitor stdio normal.qcow2
    QEMU 2.2.50 monitor - type 'help' for more information
    (qemu) info status
    VM status: running

But as soon as there's an encrypted image in play, the guest is *not*
started, with no notification whatsoever:

    $ qemu-system-x86_64 -nodefaults -display none -monitor stdio geheim.qcow2
    QEMU 2.2.50 monitor - type 'help' for more information
    (qemu) info status
    VM status: paused (prelaunch)

If the user figured out that he needs to type "cont" to enter his
keys, the confusion enters the next level: "cont" asks for at most
*one* key.  If more are needed, it then silently does nothing.  The
user has to type "cont" once per encrypted image:

    $ qemu-system-x86_64 -nodefaults -display none -monitor stdio -drive if=none,file=geheim.qcow2 -drive if=none,file=geheim.qcow2
    QEMU 2.2.50 monitor - type 'help' for more information
    (qemu) info status
    VM status: paused (prelaunch)
    (qemu) c
    none0 (geheim.qcow2) is encrypted.
    Password: ******
    (qemu) info status
    VM status: paused (prelaunch)
    (qemu) c
    none1 (geheim.qcow2) is encrypted.
    Password: ******
    (qemu) info status
    VM status: running

== Incorrect passwords not caught ==

All existing encryption schemes give you the GIGO treatment: garbage
password in, garbage data out.  Guests usually refuse to mount
garbage, but other usage is prone to data loss.

== Need to stop the guest to add an encrypted image ==

    $ qemu-system-x86_64 -nodefaults -display none -monitor stdio
    QEMU 2.2.50 monitor - type 'help' for more information
    (qemu) info status
    VM status: running
    (qemu) drive_add "" if=none,file=geheim.qcow2
    Guest must be stopped for opening of encrypted image
    (qemu) stop
    (qemu) drive_add "" if=none,file=geheim.qcow2
    OK

Commit c3adb58 added this restriction.  Before, we could expose images
lacking an encryption key to guests, with potentially catastrophic
results.  See also "Use without key is not always caught".

= Bugs =

== Use without key is not always caught ==

Encrypted images can be in an intermediate state "opened, but no key".
The weird startup behavior and the need to stop the guest are there to
ensure the guest isn't exposed to that state.  But other things still
are!

* drive_backup

    $ qemu-system-x86_64 -nodefaults -display none -monitor stdio geheim.qcow2
    QEMU 2.2.50 monitor - type 'help' for more information
    (qemu) drive_backup -f ide0-hd0 out.img raw
    Formatting 'out.img', fmt=raw size=4194304

  I guess this writes encrypted data to raw image out.img.  Good luck
  with figuring out how to decrypt that again.

* commit

    $ qemu-system-x86_64 -nodefaults -display none -monitor stdio geheim.qcow2
    QEMU 2.2.50 monitor - type 'help' for more information
    (qemu) commit ide0-hd0

  I guess this writes encrypted data into the unencrypted raw backing
  image, effectively destroying it.

== QMP device_add of usb-storage fails when it shouldn't ==

When the image is encrypted, device_add creates the device, defers
actually attaching it to when the key becomes available, then fails.
This is wrong.  device_add must either create the device and succeed,
or do nothing and fail.

    $ qemu-system-x86_64 -nodefaults -display none -usb -qmp stdio -drive if=none,id=foo,file=geheim.qcow2
    {"QMP": {"version": {"qemu": {"micro": 50, "minor": 2, "major": 2}, "package": ""}, "capabilities": []}}
    { "execute": "qmp_capabilities" }
    {"return": {}}
    { "execute": "device_add", "arguments": { "driver": "usb-storage", "id": "bar", "drive": "foo" } }
    {"error": {"class": "DeviceEncrypted", "desc": "'foo' (geheim.qcow2) is encrypted"}}
    {"execute":"device_del","arguments": { "id": "bar" } }
    {"timestamp": {"seconds": 1426003440, "microseconds": 237181}, "event": "DEVICE_DELETED", "data": {"path": "/machine/peripheral/bar/bar.0/legacy[0]"}}
    {"timestamp": {"seconds": 1426003440, "microseconds": 238231}, "event": "DEVICE_DELETED", "data": {"device": "bar", "path": "/machine/peripheral/bar"}}
    {"return": {}}

This stuff is worse than useless, it's a trap for users.

If people become sufficiently interested in encrypted images to
contribute a cryptographically sane implementation for QCOW2 (or
whatever other format), then rewriting the necessary support around it
from scratch will likely be easier and yield better results than
fixing up the existing mess.

Let's deprecate the mess now, drop it after a grace period, and move
on.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-03-16 17:07:25 +01:00
Ekaterina Tumanova
892b7de832 block: add bdrv functions for geometry and blocksize
Add driver functions for geometry and blocksize detection

Signed-off-by: Ekaterina Tumanova <tumanova@linux.vnet.ibm.com>
Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1424087278-49393-2-git-send-email-tumanova@linux.vnet.ibm.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-03-10 14:02:21 +01:00
Markus Armbruster
dc523cd348 qemu-img: Suppress unhelpful extra errors in convert, amend
img_convert() and img_amend() use qemu_opts_do_parse(), which reports
errors with qerror_report_err().  Its error messages aren't helpful
here, the caller reports one that actually makes sense.  Reproducer:

    $ qemu-img convert -o backing_format=raw in.img out.img
    qemu-img: Invalid parameter 'backing_format'
    qemu-img: Invalid options for file format 'raw'

To fix, propagate errors through qemu_opts_do_parse().  This lifts the
error reporting into callers.  Drop it from img_convert() and
img_amend(), keep it in qemu_chr_parse_compat(), bdrv_img_create().

Since I'm touching qemu_opts_do_parse() anyway, write a function
comment for it.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2015-02-26 14:51:21 +01:00
Markus Armbruster
f43e47dbf6 QemuOpts: Drop qemu_opt_set(), rename qemu_opt_set_err(), fix use
qemu_opt_set() is a wrapper around qemu_opt_set() that reports the
error with qerror_report_err().

Most of its users assume the function can't fail.  Make them use
qemu_opt_set_err() with &error_abort, so that should the assumption
ever break, it'll break noisily.

Just two users remain, in util/qemu-config.c.  Switch them to
qemu_opt_set_err() as well, then rename qemu_opt_set_err() to
qemu_opt_set().

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2015-02-26 14:49:31 +01:00
Markus Armbruster
6be4194b92 block: Suppress unhelpful extra errors in bdrv_img_create()
bdrv_img_create() uses qemu_opt_set(), which reports errors with
qerror_report_err().  Its error messages aren't helpful here, the
caller reports one that actually makes sense.  I don't know how to
trigger the error conditions, though.

Switch to qemu_opt_set_err() to get rid of the unwanted messages.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2015-02-26 14:48:31 +01:00
Markus Armbruster
39101f2511 QemuOpts: Convert qemu_opt_set_number() to Error, fix its use
Return the Error object instead of reporting it with
qerror_report_err().

Change callers that assume the function can't fail to pass
&error_abort, so that should the assumption ever break, it'll break
noisily.

Turns out all callers outside its unit test assume that.  We could
drop the Error ** argument, but that would make the interface less
regular, so don't.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2015-02-26 14:47:32 +01:00
Max Reitz
b9c649470b block: Keep bdrv_check*_request()'s return value
Do not throw away the value returned by bdrv_check_request() and
bdrv_check_byte_request().

Fix up some coding style issues in the proximity of the affected hunks.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-id: 1423162705-32065-17-git-send-email-mreitz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-02-16 15:07:19 +00:00
Max Reitz
c0191e763b block: Remove "growable" from BDS
Now that request clamping is done in the BlockBackend, the "growable"
field can be removed from the BlockDriverState. All BDSs are now treated
as being "growable" (that is, they are allowed to grow; they are not
necessarily actually able to).

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 1423162705-32065-16-git-send-email-mreitz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-02-16 15:07:19 +00:00
Max Reitz
b65a5e12a4 block: Add Error parameter to bdrv_find_protocol()
The argument given to bdrv_find_protocol() is just a file name, which
makes it difficult for the caller to reconstruct what protocol
bdrv_find_protocol() was hoping to find. This patch adds an Error
parameter to that function to solve this issue.

Suggested-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 1423162705-32065-4-git-send-email-mreitz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-02-16 15:07:18 +00:00
Markus Armbruster
b1ca639184 block: Eliminate silly QERR_ macros used for encryption keys
The QERR_ macros are leftovers from the days of "rich" error objects.
They're used with error_set() and qerror_report(), and expand into the
first *two* arguments.  This trickiness has become pointless.  Clean
up QERR_DEVICE_ENCRYPTED and QERR_DEVICE_NOT_ENCRYPTED.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 1422524221-8566-5-git-send-email-armbru@redhat.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2015-02-06 11:46:32 -05:00
Markus Armbruster
4d2855a348 block: New bdrv_add_key(), convert monitor to use it
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 1422524221-8566-4-git-send-email-armbru@redhat.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2015-02-06 11:46:32 -05:00
Peter Lieven
75af1f34cd block: introduce BDRV_REQUEST_MAX_SECTORS
we check and adjust request sizes at several places with
sometimes inconsistent checks or default values:
 INT_MAX
 INT_MAX >> BDRV_SECTOR_BITS
 UINT_MAX >> BDRV_SECTOR_BITS
 SIZE_MAX >> BDRV_SECTOR_BITS

This patches introdocues a macro for the maximal allowed sectors
per request and uses it at several places.

Signed-off-by: Peter Lieven <pl@kamp.de>
Reviewed-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-02-06 17:24:22 +01:00
Peter Lieven
f4564d53c6 block: add accounting for merged requests
Signed-off-by: Peter Lieven <pl@kamp.de>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-02-06 17:24:21 +01:00
Peter Lieven
98764152ad block: change default for discard and write zeroes to INT_MAX
do not trim requests if the driver does not supply a limit
through BlockLimits. For write zeroes we still keep a limit
for the unsupported path to avoid allocating a big bounce buffer.

Suggested-by: Kevin Wolf <kwolf@redhat.com>
Suggested-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-02-06 17:24:21 +01:00
Jeff Cody
a1a11d10ab block: remove unused variable in bdrv_commit
As Stefan pointed out, the variable 'filename' in bdrv_commit is unused,
despite being maintained in previous patches.

With this patch, get rid of the variable for good.

Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-01-23 18:17:06 +01:00
Fam Zheng
bb00021de0 block: Split BLOCK_OP_TYPE_COMMIT to BLOCK_OP_TYPE_COMMIT_{SOURCE, TARGET}
Like BLOCK_OP_TYPE_BACKUP_SOURCE and BLOCK_OP_TYPE_BACKUP_TARGET,
block-commit involves two asymmetric devices.

This change is not user-visible (yet), because commit only works with
device names.

But once we enable backing reference in blockdev-add, or specifying
node-name in block-commit command, we don't want the user to start two
commit jobs on the same backing chain, which will corrupt things because
of the final bdrv_swap.

Before we have per category blockers, splitting this type is still
better.

[Resolved virtio-blk dataplane conflict by replacing
BLOCK_OP_TYPE_COMMIT with both BLOCK_OP_TYPE_COMMIT_{SOURCE, TARGET}.
They are safe since the block job runs in the same AioContext as the
dataplane IOThread.
--Stefan]

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-01-13 13:43:29 +00:00
Peter Lieven
095e4fa4b5 block: limited request size in write zeroes unsupported path
If bs->bl.max_write_zeroes is large and we end up in the unsupported
path we might allocate a lot of memory for the iovector and/or even
generate an oversized requests.

Fix this by limiting the request by the minimum of the reported
maximum transfer size or 16MB (32768 sectors).

Reported-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Peter Lieven <pl@kamp.de>
Reviewed-by: Denis V. Lunev <den@openvz.org>
Message-id: 1420457389-16332-1-git-send-email-pl@kamp.de
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-01-13 13:43:29 +00:00
Vladimir Sementsov-Ogievskiy
c4237dfa63 block: fix spoiling all dirty bitmaps by mirror and migration
Mirror and migration use dirty bitmaps for their purposes, and since
commit [block: per caller dirty bitmap] they use their own bitmaps, not
the global one. But they use old functions bdrv_set_dirty and
bdrv_reset_dirty, which change all dirty bitmaps.

Named dirty bitmaps series by Fam and Snow are affected: mirroring and
migration will spoil all (not related to this mirroring or migration)
named dirty bitmaps.

This patch fixes this by adding bdrv_set_dirty_bitmap and
bdrv_reset_dirty_bitmap, which change concrete bitmap. Also, to prevent
such mistakes in future, old functions bdrv_(set,reset)_dirty are made
static, for internal block usage.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
CC: John Snow <jsnow@redhat.com>
CC: Fam Zheng <famz@redhat.com>
CC: Denis V. Lunev <den@openvz.org>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 1417081246-3593-1-git-send-email-vsementsov@parallels.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2015-01-13 11:47:56 +00:00
Max Reitz
291680186f block: Relative backing file for image creation
Relative backing filenames are always relative to the backed image's
directory; the same applies to image creation. Therefore, if the backing
file has to be opened for determining its size (in case the size has not
been explicitly specified) its filename should be interpreted relative
to the new image's base directory and not relative to qemu's working
directory.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-01-13 11:47:56 +00:00
Max Reitz
9f07429e88 block: JSON filenames and relative backing files
When using a relative backing file name, qemu needs to know the
directory of the top image file. For JSON filenames, such a directory
cannot be easily determined (e.g. how do you determine the directory of
a qcow2 BDS directly on top of a quorum BDS?). Therefore, do not allow
relative filenames for the backing file of BDSs only having a JSON
filename.

Furthermore, BDS::exact_filename should be used whenever possible. If
BDS::filename is not equal to BDS::exact_filename, the former will
always be a JSON object.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-01-13 11:47:56 +00:00
Max Reitz
0a82855a1a block: Get full backing filename from string
Introduce bdrv_get_full_backing_filename_from_filename(), a function
which takes the name of the backed file and a potentially relative
backing filename to produce the full (absolute) backing filename.

Use this function from bdrv_get_full_backing_filename().

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-01-13 11:47:56 +00:00
Paolo Bonzini
e012b78cf5 block: do not allocate an iovec per read of a growable/zero_after_eof BDS
Most reads do not go past the end of the file, and they can use the
input QEMUIOVector instead of creating one.  This removes the
qemu_iovec_* functions from the profile.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-01-13 11:47:56 +00:00
Fam Zheng
43c5d8f800 block: Don't add trailing space in "Formating..." message
Change the message printing code to output a separator for each option
string before it instead of after, then we don't one more extra ' ' in
the end.

To update qemu-iotests output files, most of the times one would just
copy the *.out.bad to *.out. With this change we will not have the
space disliked by checkpatch.pl.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Message-id: 1418110684-19528-3-git-send-email-famz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-12-12 16:52:33 +00:00
Max Reitz
5c98415b2a vmdk: Fix error for JSON descriptor file names
If vmdk blindly tries to use path_combine() using bs->file->filename as
the base file name, this will result in a bad error message for JSON
file names when calling bdrv_open(). It is better to only try
bs->file->exact_filename; if that is empty, bs->file->filename will be
useless for path_combine() and an error should be emitted (containing
bs->file->filename because desc_file_path (which is
bs->file->exact_filename) is empty).

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 1417615043-26174-2-git-send-email-mreitz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-12-12 13:14:10 +00:00
Max Reitz
c614972408 block: Check create_opts before image creation
If a driver supports image creation, it needs to set the .create_opts
field. We can use that to make sure .create_opts for both drivers
involved is not NULL in bdrv_img_create(), which is important so that
the create_opts pointer in that function is not NULL after the
qemu_opts_append() calls and when going into qemu_opts_create().

Cc: qemu-stable@nongnu.org
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-12-10 10:31:20 +01:00
Max Reitz
ef8104378c block: Omit bdrv_find_format for essential drivers
We can always assume raw, file and qcow2 being available; so do not use
bdrv_find_format() to locate their BlockDriver objects but statically
reference the respective objects.

Cc: qemu-stable@nongnu.org
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-12-10 10:31:19 +01:00
Kevin Wolf
c5f6e493bb block: Don't probe for unknown backing file format
If a qcow2 image specifies a backing file format that doesn't correspond
to any format driver that qemu knows, we shouldn't fall back to probing,
but simply error out.

Not looking up the backing file driver in bdrv_open_backing_file(), but
just filling in the "driver" option if it isn't there moves us closer to
the goal of having everything in QDict options and gets us the error
handling of bdrv_open(), which correctly refuses unknown drivers.

Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 1416935562-7760-4-git-send-email-kwolf@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-12-10 10:31:13 +01:00
Kevin Wolf
38f3ef574b raw: Prohibit dangerous writes for probed images
If the user neglects to specify the image format, QEMU probes the
image to guess it automatically, for convenience.

Relying on format probing is insecure for raw images (CVE-2008-2004).
If the guest writes a suitable header to the device, the next probe
will recognize a format chosen by the guest.  A malicious guest can
abuse this to gain access to host files, e.g. by crafting a QCOW2
header with backing file /etc/shadow.

Commit 1e72d3b (April 2008) provided -drive parameter format to let
users disable probing.  Commit f965509 (March 2009) extended QCOW2 to
optionally store the backing file format, to let users disable backing
file probing.  QED has had a flag to suppress probing since the
beginning (2010), set whenever a raw backing file is assigned.

All of these additions that allow to avoid format probing have to be
specified explicitly. The default still allows the attack.

In order to fix this, commit 79368c8 (July 2010) put probed raw images
in a restricted mode, in which they wouldn't be able to overwrite the
first few bytes of the image so that they would identify as a different
image. If a write to the first sector would write one of the signatures
of another driver, qemu would instead zero out the first four bytes.
This patch was later reverted in commit 8b33d9e (September 2010) because
it didn't get the handling of unaligned qiov members right.

Today's block layer that is based on coroutines and has qiov utility
functions makes it much easier to get this functionality right, so this
patch implements it.

The other differences of this patch to the old one are that it doesn't
silently write something different than the guest requested by zeroing
out some bytes (it fails the request instead) and that it doesn't
maintain a list of signatures in the raw driver (it calls the usual
probe function instead).

Note that this change doesn't introduce new breakage for false positive
cases where the guest legitimately writes data into the first sector
that matches the signatures of an image format (e.g. for nested virt):
These cases were broken before, only the failure mode changes from
corruption after the next restart (when the wrong format is probed) to
failing the problematic write request.

Also note that like in the original patch, the restrictions only apply
if the image format has been guessed by probing. Explicitly specifying a
format allows guests to write anything they like.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1416497234-29880-8-git-send-email-kwolf@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-12-10 10:31:13 +01:00
Kevin Wolf
7cddd3728e block: Read only one sector for format probing
The only image format driver that even potentially accesses anything
after 512 bytes in its bdrv_probe() implementation is VMDK, which reads
a plain-text descriptor file. In practice, the field it's looking for
seems to come first and will be well within the first 512 bytes, too.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 1416497234-29880-7-git-send-email-kwolf@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-12-10 10:31:12 +01:00
Markus Armbruster
c6684249fd block: Factor bdrv_probe_all() out of find_image_format()
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 1416497234-29880-6-git-send-email-kwolf@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-12-10 10:31:12 +01:00
Fam Zheng
20a9e77dfa block: Add bdrv_get_node_name
This returns the node name of a BDS. Remove the TODO comment and expect
the callers to be explicit.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-12-10 10:25:29 +01:00
Fam Zheng
04df765ab4 block: Add bdrv_next_node
Similar to bdrv_next, this traverses through graph_bdrv_states. Will be
useful to enumerate all the named nodes.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-12-10 10:25:29 +01:00
Fam Zheng
f3a9cfddae block: Fix max nb_sectors in bdrv_make_zero
In bdrv_rw_co we report -EINVAL for nb_sectors > INT_MAX /
BDRV_SECTOR_SIZE, so a caller shouldn't exceed it.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Message-id: 1415603264-21497-1-git-send-email-famz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-11-14 09:20:35 +00:00
Peter Maydell
776346cd63 trivial patches for 2014-11-11
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQEcBAABAgAGBQJUYh9vAAoJEL7lnXSkw9fbgPQH/065L5+SpaJR1Nte9Lz3N2s1
 a6tGSI22yu85tKvYCdYjeoVHSkSTyR57FdTfUd2xc2QPj+J4sWXpA81KILBGTJUp
 NMpmLpWg4LOh8Ek4ViRgmFFdryzIFa4dT4gc1AcSAIAQ6jsgK1dM7m5kfncC3TN0
 TUs248vJ2i/DaE0k8TOeJqxJTqInoFttlJEqG7RD+V5JznokE4zpFNXHDGx9BptE
 W2J38GJ/TKRPe9UrHMKZI1r6+ZBdXyE/CaqsNNKLJdqrHgSQuAyK/PS6dQbM4BLg
 M1qdP7Tp0wOlvv9qoEZMOEiUsi54XPqLgaLMbW74Yp5X459fqmLW2imy49pHXt8=
 =klsW
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/mjt/tags/pull-trivial-patches-2014-11-11' into staging

trivial patches for 2014-11-11

# gpg: Signature made Tue 11 Nov 2014 14:38:39 GMT using RSA key ID A4C3D7DB
# gpg: Good signature from "Michael Tokarev <mjt@tls.msk.ru>"
# gpg:                 aka "Michael Tokarev <mjt@corpit.ru>"
# gpg:                 aka "Michael Tokarev <mjt@debian.org>"

* remotes/mjt/tags/pull-trivial-patches-2014-11-11:
  block: Fix comment for bdrv_co_get_block_status
  sysbus: Correct SYSTEM_BUS(obj) defines
  target-i386: cpu: keeping function parameters alignment on new line
  xen-hvm: Remove redundant variable 'xstate'
  coroutine-sigaltstack: Change jmp_buf to sigjmp_buf
  pc-bios: petalogix-s3adsp1800.dtb: Use 'xlnx, xps-ethernetlite-2.00.a' instead of 'xlnx, xps-ethernetlite-2.00.b'
  gdbstub: Add a missing case of signal number translation in gdbstub
  numa: make 'info numa' take into account hotplugged memory
  slirp/smbd: modify/set several parameters in generated smbd.conf
  qemu-doc.texi: fix typos in x509 examples
  icc_bus: fix typo ICC_BRIGDE -> ICC_BRIDGE

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-11-11 14:50:10 +00:00
Fam Zheng
705be728c0 block: Fix comment for bdrv_co_get_block_status
It returns more information than binary, fix the comment.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2014-11-11 17:36:19 +03:00
Max Reitz
e56934bece block: Propagate error in bdrv_img_create()
If the specified backing file could not be opened, do not generate a new
error message which contains the message which has been generated by
bdrv_open(), but just propagate the latter.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-11-06 12:45:47 +01:00
Stefan Hajnoczi
5a7e7a0bad block: let mirror blockjob run in BDS AioContext
The mirror block job must run in the BlockDriverState AioContext so that
it works with dataplane.

Acquire the AioContext in blockdev.c so starting the block job is safe.

Note that to_replace is treated separately from other BlockDriverStates
in that it does not need to be in the same AioContext.  Explicitly
acquire/release to_replace's AioContext when accessing it.

The completion code in block/mirror.c must perform BDS graph
manipulation and bdrv_reopen() from the main loop.  Use
block_job_defer_to_main_loop() to achieve that.

The bdrv_drain_all() call is not allowed outside the main loop since it
could lead to lock ordering problems.  Use bdrv_drain(bs) instead
because we have acquired the AioContext so nothing else can sneak in
I/O.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 1413889440-32577-10-git-send-email-stefanha@redhat.com
2014-11-03 11:41:49 +00:00
Stefan Hajnoczi
5b98db0ad3 block: add bdrv_drain()
Now that op blockers are in use, we can ensure that no other sources are
generating I/O on a BlockDriverState.  Therefore it is possible to drain
requests for a single BDS.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 1413889440-32577-7-git-send-email-stefanha@redhat.com
2014-11-03 11:41:49 +00:00
Max Reitz
7748543420 block: Add status callback to bdrv_amend_options()
Depending on the changed options and the image format,
bdrv_amend_options() may take a significant amount of time. In these
cases, a way to be informed about the operation's status is desirable.

Since the operation is rather complex and may fundamentally change the
image, implementing it as AIO or a coroutine does not seem feasible. On
the other hand, implementing it as a block job would be significantly
more difficult than a simple callback and would not add benefits other
than progress report to the amending operation, because it should not
actually be run as a block job at all.

A callback may not be very pretty, but it's very easy to implement and
perfectly fits its purpose here.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-id: 1414404776-4919-2-git-send-email-mreitz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-11-03 11:41:48 +00:00
Peter Maydell
573742a543 block.c: Fix type of IoOperationType variable in send_qmp_error_event()
The local variable 'ac' in send_qmp_error_event() is declared with the
wrong type, which causes clang to complain when it is initialized
and again when it is used:

block.c:3655:20: warning: implicit conversion from enumeration type 'enum IoOperationType' to different enumeration type 'BlockErrorAction' (aka 'enum BlockErrorAction') [-Wenum-conversion]
    ac = is_read ? IO_OPERATION_TYPE_READ : IO_OPERATION_TYPE_WRITE;
       ~           ^~~~~~~~~~~~~~~~~~~~~~
block.c:3655:45: warning: implicit conversion from enumeration type 'enum IoOperationType' to different enumeration type 'BlockErrorAction' (aka 'enum BlockErrorAction') [-Wenum-conversion]
    ac = is_read ? IO_OPERATION_TYPE_READ : IO_OPERATION_TYPE_WRITE;
       ~                                    ^~~~~~~~~~~~~~~~~~~~~~~
block.c:3656:62: warning: implicit conversion from enumeration type 'BlockErrorAction' (aka 'enum BlockErrorAction') to different enumeration type 'IoOperationType' (aka 'enum IoOperationType') [-Wenum-conversion]
    qapi_event_send_block_io_error(bdrv_get_device_name(bs), ac, action,
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                           ^~

Correct the type to IoOperationType, and rename the variable
to 'optype' to match its correct type.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Luiz Capitulino <lcapitulino@redhat.com>
Message-id: 1412969583-21045-1-git-send-email-peter.maydell@linaro.org
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-11-03 09:48:41 +00:00
Peter Lieven
6c5a42ac34 block: avoid creating oversized writes in multiwrite_merge
Signed-off-by: Peter Lieven <pl@kamp.de>
Reviewed-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-11-03 09:48:41 +00:00
Peter Lieven
2647fab57d BlockLimits: introduce max_transfer_length
Signed-off-by: Peter Lieven <pl@kamp.de>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-11-03 09:48:41 +00:00
Max Reitz
59c9a95fd2 block: Respect underlying file's EOF
When falling through to the underlying file in
bdrv_co_get_block_status(), if it returns that the query offset is
beyond the file end (by setting *pnum to 0), return the range to be
zero and do not let the number of sectors for which information could be
obtained be overwritten.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-10-23 15:34:02 +02:00
Max Reitz
9ebd844805 block: Add qemu_{,try_}blockalign0()
These functions call their non-0-counterparts and then fill the
allocated buffer with 0 (if the allocation has been successful).

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-10-23 15:34:01 +02:00
Markus Armbruster
a7f53e26a6 block: Lift device model API into BlockBackend
Move device model attachment / detachment and the BlockDevOps device
model callbacks and their wrappers from BlockDriverState to
BlockBackend.

Wrapper calls in block.c change from

    bdrv_dev_FOO_cb(bs, ...)

to

    if (bs->blk) {
        bdrv_dev_FOO_cb(bs->blk, ...);
    }

No change, because both bdrv_dev_change_media_cb() and
bdrv_dev_resize_cb() do nothing when no device model is attached, and
a device model can be attached only when bs->blk.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-10-20 14:03:50 +02:00
Markus Armbruster
097310b53e block: Rename BlockDriverCompletionFunc to BlockCompletionFunc
I'll use it with block backends shortly, and the name is going to fit
badly there.  It's a block layer thing anyway, not just a block driver
thing.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-10-20 13:41:27 +02:00
Markus Armbruster
7c84b1b831 block: Rename BlockDriverAIOCB* to BlockAIOCB*
I'll use BlockDriverAIOCB with block backends shortly, and the name is
going to fit badly there.  It's a block layer thing anyway, not just a
block driver thing.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-10-20 13:41:27 +02:00
Markus Armbruster
7f06d47eff block: Merge BlockBackend and BlockDriverState name spaces
BlockBackend's name space is separate only to keep the initial patches
simple.  Time to merge the two.

Retain bdrv_find() and bdrv_get_device_name() for now, to keep this
series manageable.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-10-20 13:41:26 +02:00
Markus Armbruster
bfb197e0d9 block: Eliminate BlockDriverState member device_name[]
device_name[] can become non-empty only in bdrv_new_root() and
bdrv_move_feature_fields().  The latter is used only to undo damage
done by bdrv_swap().  The former is called only by blk_new_with_bs().
Therefore, when a BlockDriverState's device_name[] is non-empty, then
it's been created with a BlockBackend, and vice versa.  Furthermore,
blk_new_with_bs() keeps the two names equal.

Therefore, device_name[] is redundant.  Eliminate it.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-10-20 13:41:26 +02:00
Markus Armbruster
fea68bb6e9 block: Eliminate bdrv_iterate(), use bdrv_next()
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-10-20 13:41:26 +02:00
Markus Armbruster
18e46a033d block: Connect BlockBackend and DriveInfo
Make the BlockBackend own the DriveInfo.  Change blockdev_init() to
return the BlockBackend instead of the DriveInfo.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-10-20 13:41:26 +02:00
Markus Armbruster
7e7d56d9e0 block: Connect BlockBackend to BlockDriverState
Convenience function blk_new_with_bs() creates a BlockBackend with its
BlockDriverState.  Callers have to unref both.  The commit after next
will relieve them of the need to unref the BlockDriverState.

Complication: due to the silly way drive_del works, we need a way to
hide a BlockBackend, just like bdrv_make_anon().  To emphasize its
"special" status, give the function a suitably off-putting name:
blk_hide_on_behalf_of_do_drive_del().  Unfortunately, hiding turns the
BlockBackend's name into the empty string.  Can't avoid that without
breaking the blk->bs->device_name equals blk->name invariant.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-10-20 13:41:26 +02:00
Markus Armbruster
e4e9986b1c block: Split bdrv_new_root() off bdrv_new()
Creating an anonymous BDS can't fail.  Make that obvious.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-10-20 13:41:26 +02:00
Alexey Kardashevskiy
7ea2d269cb block/migration: Disable cache invalidate for incoming migration
When migrated using libvirt with "--copy-storage-all", at the end of
migration there is race between NBD mirroring task trying to do flush
and migration completion, both end up invalidating cache. Since qcow2
driver does not handle this situation very well, random crashes happen.

This disables the BDRV_O_INCOMING flag for the block device being migrated
once the cache has been invalidated.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>

--

fixed parens by hand
Signed-off-by: Juan Quintela <quintela@redhat.com>
2014-10-14 09:35:21 +02:00
Markus Armbruster
f5bebbbb28 util: Emancipate id_wellformed() from QemuOpts
IDs have long spread beyond QemuOpts: not everything with an ID
necessarily goes through QemuOpts.  Commit 9aebf3b is about such a
case: block layer names are meant to be well-formed IDs, but some of
them don't go through QemuOpts, and thus weren't checked.  The commit
fixed that the straightforward way: rename the internal QemuOpts
helper id_wellformed() to qemu_opts_id_wellformed() and give it
external linkage.

Instead of using it directly in block.c, the commit adds wrapper
bdrv_is_valid_name(), probably to hide the connection to QemuOpts.

Go one logical step further: emancipate IDs from QemuOpts.  Rename the
function back to id_wellformed(), and put it in another file.  While
there, clean up its value to bool.  Peel off the bdrv_is_valid_name()
wrapper.

[Replaced stray return 0 with return false to match bool returns used
elsewhere in id_wellformed().
--Stefan]

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-10-03 10:30:33 +01:00
Kevin Wolf
9aebf3b892 block: Validate node-name
The device_name of a BlockDriverState is currently checked because it is
always used as a QemuOpts ID and qemu_opts_create() checks whether such
IDs are wellformed.

node-name is supposed to share the same namespace, but it isn't checked
currently. This patch adds explicit checks both for device_name and
node-name so that the same rules will still apply even if QemuOpts won't
be used any more at some point.

qemu-img used to use names with spaces in them, which isn't allowed any
more. Replace them with underscores.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-09-25 15:24:32 +02:00
Markus Armbruster
d224469d87 block: Improve message for device name clashing with node name
Suggested-by: Benoit Canet <benoit.canet@nodalink.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-09-25 15:24:14 +02:00
Markus Armbruster
3ae59580a0 block: Keep DriveInfo alive until BlockDriverState dies
If the BDS's refcnt > 0, drive_del() destroys the DriveInfo, but not
the BDS.  This can happen in three places:

* Device model destruction during unplug: blockdev_auto_del()

* Xen IDE unplug: pci_piix3_xen_ide_unplug()

* drive_del command when no device model is attached: do_drive_del()

The other callers of drive_del are on error paths where refcnt == 1.

If the user somehow manages to plug in a device model using a BDS that
has gone through drive_del(), the legacy configuration passed in
DriveInfo doesn't reach the device model, and automatic deletion on
unplug doesn't work.  Worse, some device models such as scsi-disk
crash when DriveInfo doesn't exist.

This is theoretical; I didn't research an actual reproducer. The problem
was introduced when we replaced DriveInfo reference counting by BDS
reference counting in commit a94a3fa..fa510eb.

Fix by keeping DriveInfo alive until its BDS dies.

This affects qemu_drive_opts: now you can't reuse the same ID for new
drive options until the BDS dies.  Before, you could, but since the
code always attempts to create a BDS with the same ID next, the
enclosing operation "create a new drive" failed anyway.  Different
error path, same result.

Unfortunately, the fix involves use of blockdev.c stuff from block.c,
which is a layering violation.  Fortunately, my forthcoming
BlockBackend work will get rid of it again.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-09-25 15:24:14 +02:00
Fam Zheng
8007429a99 block: Rename qemu_aio_release -> qemu_aio_unref
Suggested-by: Benoît Canet <benoit.canet@irqsave.net>
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-09-22 11:39:17 +01:00
Fam Zheng
ca5fd113b8 block: Drop AIOCBInfo.cancel
Now that all the implementations are converted to asynchronous version
and we can emulate synchronous cancellation with it. Let's drop the
unused member.

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-09-22 11:39:16 +01:00
Fam Zheng
f600ac1902 block: Drop bdrv_em_aiocb_info.cancel
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-09-22 11:39:00 +01:00
Fam Zheng
3acabd685e block: Drop bdrv_em_co_aiocb_info.cancel
Also drop the now unused ->done pointer.

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-09-22 11:38:59 +01:00
Fam Zheng
02c50efe08 block: Add bdrv_aio_cancel_async
This is the async version of bdrv_aio_cancel, which doesn't block the
caller. It guarantees that the cb is called either before returning or
some time later.

bdrv_aio_cancel can base on bdrv_aio_cancel_async, later we can convert
all .io_cancel implementations to .io_cancel_async, and the aio_poll is
the common logic. In the end, .io_cancel can be dropped.

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-09-22 11:38:58 +01:00
Fam Zheng
f197fe2b2c block: Add refcnt in BlockDriverAIOCB
This will be useful in synchronous cancel emulation with
bdrv_aio_cancel_async.

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-09-22 11:38:57 +01:00
Luiz Capitulino
624ff5736e block: extend BLOCK_IO_ERROR with reason string
BLOCK_IO_ERROR events are logged by libvirt, which helps with
post mortem analysis of guests. However, one information that
we miss today is a human readable string describing the cause
of the I/O error.

This commit adds that string it to BLOCK_IO_ERROR. Note that
this string is a debugging aid for humans, meaning that it
should not parsed by applications.

Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-09-11 17:14:13 +02:00
Benoît Canet
5366d0c8bc block: Make the block accounting functions operate on BlockAcctStats
This is the next step for decoupling block accounting functions from
BlockDriverState.
In a future commit the BlockAcctStats structure will be moved from
BlockDriverState to the device models structures.

Note that bdrv_get_stats was introduced so device models can retrieve the
BlockAcctStats structure of a BlockDriverState without being aware of it's
layout.
This function should go away when BlockAcctStats will be embedded in the device
models structures.

CC: Kevin Wolf <kwolf@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Keith Busch <keith.busch@intel.com>
CC: Anthony Liguori <aliguori@amazon.com>
CC: "Michael S. Tsirkin" <mst@redhat.com>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Eric Blake <eblake@redhat.com>
CC: Peter Maydell <peter.maydell@linaro.org>
CC: Michael Tokarev <mjt@tls.msk.ru>
CC: John Snow <jsnow@redhat.com>
CC: Markus Armbruster <armbru@redhat.com>
CC: Alexander Graf <agraf@suse.de>
CC: Max Reitz <mreitz@redhat.com>

Signed-off-by: Benoît Canet <benoit.canet@nodalink.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-09-10 10:41:29 +02:00