We have a helper wrapper for checking for the BDS read_only flag,
add a helper wrapper to set the read_only flag as well.
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 9b18972d05f5fa2ac16c014f0af98d680553048d.1491597120.git.jcody@redhat.com
In case of block migration, there may be writes to BlockBackends that do
not have the write permission taken. Before this issue is fixed (which
is not going to happen in 2.9), we therefore cannot assert that this is
the case.
Suggested-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Tested-by: Kevin Wolf <kwolf@redhat.com>
Message-id: 20170411145050.31290-1-mreitz@redhat.com
Tested-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The fact that the bs->aio_context is changing can confuse the dataplane
iothread, because of the now fine granularity aio context lock.
bdrv_drain should rather be a bdrv_drained_begin/end pair, but since
bs->aio_context is changing, we can just use aio_disable_external and
bdrv_parent_drained_begin.
Reported-by: Ed Swierk <eswierk@skyportsystems.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
-blockdev and blockdev_add convert their arguments via QObject to
BlockdevOptions for qmp_blockdev_add(), which converts them back to
QObject, then to a flattened QDict. The QDict's members are typed
according to the QAPI schema.
-drive converts its argument via QemuOpts to a (flat) QDict. This
QDict's members are all QString.
Thus, the QType of a flat QDict member depends on whether it comes
from -drive or -blockdev/blockdev_add, except when the QAPI type maps
to QString, which is the case for 'str' and enumeration types.
The block layer core extracts generic configuration from the flat
QDict, and the block driver extracts driver-specific configuration.
Both commonly do so by converting (parts of) the flat QDict to
QemuOpts, which turns all values into strings. Not exactly elegant,
but correct.
However, A few places access the flat QDict directly:
* Most of them access members that are always QString. Correct.
* bdrv_open_inherit() accesses a boolean, carefully. Correct.
* nfs_config() uses a QObject input visitor. Correct only because the
visited type contains nothing but QStrings.
* nbd_config() and ssh_config() use a QObject input visitor, and the
visited types contain non-QStrings: InetSocketAddress members
@numeric, @to, @ipv4, @ipv6. -drive works as long as you don't try
to use them (they're all optional). @to is ignored anyway.
Reproducer:
-drive driver=ssh,server.host=h,server.port=22,server.ipv4,path=p
-drive driver=nbd,server.type=inet,server.data.host=h,server.data.port=22,server.data.ipv4
both fail with "Invalid parameter type for 'data.ipv4', expected: boolean"
Add suitable comments to all these places. Mark the buggy ones FIXME.
"Fortunately", -drive's driver-specific options are entirely
undocumented.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-id: 1490895797-29094-5-git-send-email-armbru@redhat.com
[mreitz: Fixed two typos]
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
While it is true that bdrv_set_aio_context only works on a single
BlockDriverState subtree (see commit message for 53ec73e, "block: Use
bdrv_drain to replace uncessary bdrv_drain_all", 2015-07-07), it works
at the AioContext level rather than the BlockDriverState level.
Therefore, it is also necessary to trigger pending bottom halves too,
even if no requests are pending.
For NBD this ensures that the aio_co_schedule of a previous call to
nbd_attach_aio_context is completed before detaching from the old
AioContext; it fixes qemu-iotest 094. Another similar bug happens
when the VM is stopped and the virtio-blk dataplane irqfd is torn down.
In this case it's possible that guest I/O gets stuck if notify_guest_bh
was scheduled but doesn't run.
Calling aio_poll from another AioContext is safe if non-blocking; races
such as the one mentioned in the commit message for c9d1a56 ("block:
only call aio_poll on the current thread's AioContext", 2016-10-28)
are a concern for blocking calls.
I considered other options, including:
- moving the bs->wakeup mechanism to AioContext, and letting the caller
check. This might work for virtio which has a clear place to wakeup
(notify_place_bh) and check the condition (virtio_blk_data_plane_stop).
For aio_co_schedule I couldn't find a clear place to check the condition.
- adding a dummy oneshot bottom half and waiting for it to trigger.
This has the complication that bottom half list is LIFO for historical
reasons. There were performance issues caused by bottom half ordering
in the past, so I decided against it for 2.9.
Fixes: 9972354856
Reported-by: Max Reitz <mreitz@redhat.com>
Reported-by: Halil Pasic <pasic@linux.vnet.ibm.com>
Tested-by: Halil Pasic <pasic@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 20170314111157.14464-2-pbonzini@redhat.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
bdrv_child_set_perm alone is not very usable because the caller must
call bdrv_child_check_perm first. This is already encapsulated
conveniently in bdrv_child_try_set_perm, so remove the other prototypes
from the header and fix the one wrong caller, block/mirror.c.
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
In bdrv_open_inherit(), the filename is refreshed after opening the
backing file, but we neglected to do the same when the backing file
changes later.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
All callers pass false now, so the parameter can go away again.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <1488317230-26248-14-git-send-email-armbru@redhat.com>
The next few commits will put the errors to use where appropriate.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <1488317230-26248-13-git-send-email-armbru@redhat.com>
When adding an Error parameter, bdrv_replace_in_backing_chain() would
become nothing more than a wrapper around change_parent_backing_link().
So make the latter public, renamed as bdrv_replace_node(), and remove
bdrv_replace_in_backing_chain().
Most of the callers just remove a node from the graph that they just
inserted, so they can use &error_abort, but completion of a mirror job
with 'replaces' set can actually fail.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Instead of just trying to change parents by parent over to reference @to
instead of @from, and abort()ing whenever the permissions don't allow
this, do proper permission checking beforehand and pass any error to the
callers.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
change_parent_backing_link() will need to update multiple BdrvChild
objects at once. Checking permissions reference by reference doesn't
work because permissions need to be consistent only with all parents
moved to the new child.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Aborting on error in bdrv_append() isn't correct. This patch fixes it
and lets the callers handle failures.
Test case 085 needs a reference output update. This is caused by the
reversed order of bdrv_set_backing_hd() and change_parent_backing_link()
in bdrv_append(): When the backing file of the new node is set, the
parent nodes are still pointing to the old top, so the backing blocker
is now initialised with the node name rather than the BlockBackend name.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Acked-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Not all callers of bdrv_set_backing_hd() know for sure that attaching
the backing file will be allowed by the permission system. Return the
error from the function rather than aborting.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Acked-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
This adds an assertion that ensures that the necessary resize permission
has been granted before bdrv_truncate() is called.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Acked-by: Fam Zheng <famz@redhat.com>
Now that the backing file child role implements .attach/.detach
callbacks, nothing prevents us from modifying the graph even if that
involves changing backing file links.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Acked-by: Fam Zheng <famz@redhat.com>
Backing files are somewhat special compared to other kinds of children
because they are attached and detached using bdrv_set_backing_hd()
rather than the normal set of functions, which does a few more things
like setting backing blockers, toggling the BDRV_O_NO_BACKING flag,
setting parent_bs->backing_file, etc.
These special features are a reason why change_parent_backing_link()
can't handle backing files yet. With abstracting the additional features
into .attach/.detach callbacks, we get a step closer to a function that
can actually deal with this.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Acked-by: Fam Zheng <famz@redhat.com>
bdrv_append() cares about isolation of the node that it modifies, but
not about activity in some subtree below it. Instead of using the
recursive bdrv_requests_pending(), directly check bs->in_flight, which
considers only the node in question.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Acked-by: Fam Zheng <famz@redhat.com>
When the parents' child links are updated in bdrv_append() or
bdrv_replace_in_backing_chain(), this should affect all child links of
BlockBackends or other nodes, but not on child links held for other
purposes (like for setting permissions). This patch allows to control
the behaviour per BdrvChildRole.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Acked-by: Fam Zheng <famz@redhat.com>
Instead of just telling that there was some conflict, we can be specific
and tell which permissions were in conflict and which way the conflict
is.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Acked-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
For meaningful error messages in the permission system, we need to get
some human-readable description of the parent of a BdrvChild.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Acked-by: Fam Zheng <famz@redhat.com>
Now that blk_insert_bs() requests the BlockBackend permissions for the
node it attaches to, it can fail. Instead of aborting, pass the errors
to the callers.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Acked-by: Fam Zheng <famz@redhat.com>
We want every user to be specific about the permissions it needs, so
we'll pass the initial permissions as parameters to blk_new(). A user
only needs to call blk_set_perm() if it wants to change the permissions
after the fact.
The permissions are stored in the BlockBackend and applied whenever a
BlockDriverState should be attached in blk_insert_bs().
This does not include actually choosing the right set of permissions
everywhere yet. Instead, the usual FIXME comment is added to each place
and will be addressed in individual patches.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Acked-by: Fam Zheng <famz@redhat.com>
Now that all block drivers with children tell us what permissions they
need from each of their children, bdrv_attach_child() can use this
information and make the right requirements while trying to attach new
children.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Acked-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
All block drivers that can have child nodes implement .bdrv_child_perm()
now. Make this officially a requirement by asserting that only drivers
without children can omit .bdrv_child_perm().
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Acked-by: Fam Zheng <famz@redhat.com>
vvfat is the last remaining driver that can have children, but doesn't
implement .bdrv_child_perm() yet. The default handlers aren't suitable
here, so let's implement a very simple driver-specific one that protects
the internal child from being used by other users as good as our
permissions permit.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Acked-by: Fam Zheng <famz@redhat.com>
Almost all format drivers have the same characteristics as far as
permissions are concerned: They have one or more children for storing
their own data and, more importantly, metadata (can be written to and
grow even without external write requests, must be protected against
other writers and present consistent data) and optionally a backing file
(this is just data, so like for a filter, it only depends on what the
parent nodes need).
This provides a default implementation that can be shared by most of
our format drivers.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Acked-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Most filters need permissions related to read and write for their
children, but only if the node has a parent that wants to use the same
operation on the filter. The same is true for resize.
This adds a default implementation that simply forwards all necessary
permissions to all children of the node and leaves the other permissions
unchanged.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Acked-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
In many cases, the required permissions of one node on its children
depend on what its parents require from it. For example, the raw format
or most filter drivers only need to request consistent reads if that's
something that one of their parents wants.
In order to achieve this, this patch introduces two new BlockDriver
callbacks. The first one lets drivers first check (recursively) whether
the requested permissions can be set; the second one actually sets the
new permission bitmask.
Also add helper functions that drivers can use in their implementation
of the callbacks to update their permissions on a specific child.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Acked-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
When attaching a node as a child to a new parent, the required and
shared permissions for this parent are checked against all other parents
of the node now, and an error is returned if there is a conflict.
This allows error returns to a function that previously always
succeeded, and the same is true for quite a few callers and their
callers. Converting all of them within the same patch would be too much,
so for now everyone tells that they don't need any permissions and allow
everyone else to do anything. This way we can use &error_abort initially
and convert caller by caller to pass actual permission requirements and
implement error handling.
All these places are marked with FIXME comments and it will be the job
of the next patches to clean them up again.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Acked-by: Fam Zheng <famz@redhat.com>
It will have to return an error soon, so prepare the callers for it.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Acked-by: Fam Zheng <famz@redhat.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQIcBAABAgAGBQJYsHaaAAoJEH8JsnLIjy/WvC4P/iw/WC6XySNzPMOzICrJr9fc
IHz447+0MqoIBWqGRFLEU8z4t6k8HGt2YnWKFuS1N+2gU5fSgOdp20nAA+pQvxRb
RTyQL7BNJPvFNzrEQPTpdvBt79veYsoNe5dNfSq01z/PdgxMQR4PkS96Jm8w4Jdu
20ZPfULws8+Wq1Snsxl4PdnFpY2n97OOGQRCjl50h5ypol5+fXDDzAvp/GPf6Q8B
09OTWmQ4UIr4OZqT83T1kDdRZRRMIxiRP5qTTKdh0BlJEQdBjrJ9fTClY4bMcIgl
VP/3kzkmdoqSW+4D+0L88q7vyvM3Kwc6n2PFDaRf6Lgy9ueYoPKuW9AF5vzcxhED
AI0SN9boM+4z1P75kalBaHg/BiRL7UDwpHgdanPVcENoP8IywsKzOIvdjOpqINmX
uLU3QfK+qK6x7gflhVXiX2sFzTLzZQlWu5KPjW6QfuMzUuRAeksfMDXh2GXNFVG1
igGnORyqB2nOGNAsudtMrOvjHQRSbwGsnvWcwvaX0swrxOb3oAwc3X+E4nxE/K5/
wJQLrCM9DfD3CKKlKiu+1wQXx6zqaGYIsNxLcrSylJ3TPsGuxfkx3b9GVCCQ0Y+e
YSdwWSAIS9LxR4qISJ5ehlnYDA0sRtdco0xGW6ACNE5IrgREYUcH4EJOXbxZ4NHs
BK/DVdYhFtNZ60zHeEn4
=OYgF
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging
Block layer patches
# gpg: Signature made Fri 24 Feb 2017 18:08:26 GMT
# gpg: using RSA key 0x7F09B272C88F2FD6
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>"
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6
* remotes/kevin/tags/for-upstream:
tests: Use opened block node for block job tests
vvfat: Use opened node as backing file
block: Add bdrv_new_open_driver()
block: Factor out bdrv_open_driver()
block: Use BlockBackend for image probing
block: Factor out bdrv_open_child_bs()
block: Attach bs->file only during .bdrv_open()
block: Pass BdrvChild to bdrv_truncate()
mirror: Resize active commit base in mirror_run()
qcow2: Use BB for resizing in qcow2_amend_options()
blockdev: Use BlockBackend to resize in qmp_block_resize()
iotests: Fix another race in 030
qemu-img: Improve documentation for PREALLOC_MODE_FALLOC
qemu-img: Truncate before full preallocation
qemu-img: Add tests for raw image preallocation
qemu-img: Do not truncate before preallocation
qemu-iotests: redirect nbd server stdout to /dev/null
qemu-iotests: add ability to exclude certain protocols from tests
qemu-iotests: Test 137 only supports 'file' protocol
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
This function allows to create more or less normal BlockDriverStates
even for BlockDrivers that aren't globally registered (e.g. helper
filters for block jobs).
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
This is a function that doesn't do any option parsing, but just does
some basic BlockDriverState setup and calls the .bdrv_open() function of
the block driver.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
This fixes the use of a parent-less BdrvChild in bdrv_open_inherit() by
converting it into a BlockBackend. Which is exactly what it should be,
image probing is an external, standalone user of a node. The requests
can't be considered to originate from the format driver node because
that one isn't even opened yet.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
This is the part of bdrv_open_child() that opens a BDS with option
inheritance, but doesn't attach it as a child to the parent yet.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
The way that attaching bs->file worked was a bit unusual in that it was
the only child that would be attached to a node which is not opened yet.
Because of this, the block layer couldn't know yet which permissions the
driver would eventually need.
This patch moves the point where bs->file is attached to the beginning
of the individual .bdrv_open() implementations, so drivers already know
what they are going to do with the child. This is also more consistent
with how driver-specific children work.
For a moment, bdrv_open() gets its own BdrvChild to perform image
probing, but instead of directly assigning this BdrvChild to the BDS, it
becomes a temporary one and the node name is passed as an option to the
drivers, so that they can simply use bdrv_open_child() to create another
reference for their own use.
This duplicated child for (the not opened yet) bs is not the final
state, a follow-up patch will change the image probing code to use a
BlockBackend, which is completely independent of bs.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
qobject_to_qdict(obj) returns NULL when obj isn't a QDict. Check
that instead of qobject_type(obj) == QTYPE_QDICT.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <1487363905-9480-8-git-send-email-armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Current implementation invalidates firstly parent bds and then its
children. This leads to the following bug:
after incoming migration, in bdrv_invalidate_cache_all:
1. invalidate parent bds - reopen it with BDRV_O_INACTIVE cleared
2. child is not yet invalidated
3. parent check that its BDRV_O_INACTIVE is cleared
4. parent writes to child
5. assert in bdrv_co_pwritev, as BDRV_O_INACTIVE is set for child
This patch fixes it by just changing invalidate sequence: invalidate
children first.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20170131112308.54189-1-vsementsov@virtuozzo.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
In bdrv_find_backing_image(), if we are searching an image for a backing
file that contains a protocol, we currently only compare unmodified
paths.
However, some management software will change the backing filename to be
a relative filename in a path. QEMU is able to handle this fine,
because internally it will use path_combine to put together the full
protocol URI.
However, this can lead to an inability to match an image during a QAPI
command that needs to use bdrv_find_backing_image() to find the image,
when it is searched by the full URI.
When searching for a protocol filename, if the straight comparison
fails, this patch will also compare against the full backing filename to
see if that is a match.
Signed-off-by: Jeff Cody <jcody@redhat.com>
Message-id: c2d025adca8a2b665189e6f4cf080f44126d0b6b.1485392617.git.jcody@redhat.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Introduce rules in the top level Makefile that are able to generate
trace.[ch] files in every subdirectory which has a trace-events file.
The top level directory is handled specially, so instead of creating
trace.h, it creates trace-root.h. This allows sub-directories to
include the top level trace-root.h file, without ambiguity wrt to
the trace.g file in the current sub-dir.
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 20170125161417.31949-7-berrange@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
options must be non-NULL here, because a NULL value is replaced with
qdict_new earlier in the function. Reported by Coverity.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Some block drivers may not be loaded yet, but qemu supports them
nonetheless. bdrv_iterate_format() should report them, too.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20161012204907.25941-3-mreitz@redhat.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
bdrv_iterate_format() did not actually sort the formats by name but by
"pointer interpreted as string". That is probably not what we intended
to do, so fix it (by changing qsort_strcmp() so it matches the example
from qsort()'s manual page).
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20161012204907.25941-2-mreitz@redhat.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
This makes sure that the image we are streaming into is open in
read-write mode during the operation.
Operation blockers are also set in all intermediate nodes, since they
will be removed from the chain afterwards.
Finally, this also unblocks the stream operation in backing files.
Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
When a BlockDriverState is about to be reopened it can trigger certain
operations that need to write to disk. During this process a different
block job can be woken up. If that block job completes and also needs
to call bdrv_reopen() it can happen that it needs to do it on the same
BlockDriverState that is still in the process of being reopened.
This can have fatal consequences, like in this example:
1) Block job A starts and sleeps after a while.
2) Block job B starts and tries to reopen node1 (a qcow2 file).
3) Reopening node1 means flushing and replacing its qcow2 cache.
4) While the qcow2 cache is being flushed, job A wakes up.
5) Job A completes and reopens node1, replacing its cache.
6) Job B resumes, but the cache that was being flushed no longer
exists.
This patch splits the bdrv_drain_all() call to keep all block jobs
paused during bdrv_reopen_multiple(), so that step 4 can never happen
and the operation is safe.
Note that this scenario can only happen if both bdrv_reopen() calls
are made by block jobs on the same backing chain. Otherwise there's no
chance that the same BlockDriverState appears in both reopen queues.
Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
aio_poll is not thread safe; for example bdrv_drain can hang if
the last in-flight I/O operation is completed in the I/O thread after
the main thread has checked bs->in_flight.
The bug remains latent as long as all of it is called within
aio_context_acquire/aio_context_release, but this will change soon.
To fix this, if bdrv_drain is called from outside the I/O thread,
signal the main AioContext through a dummy bottom half. The event
loop then only runs in the I/O thread.
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <1477565348-5458-18-git-send-email-pbonzini@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
After the next patch bdrv_drain_all will have to be called without holding any
AioContext. Prepare to do this by adding an AioContext argument to
bdrv_reopen_multiple.
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <1477565348-5458-15-git-send-email-pbonzini@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
The event currently only contains the BlockBackend name. However, with
anonymous BlockBackends, this is always the empty string. Add the qdev
ID (or if none was given, the QOM path) so that the user can still see
which device caused the event.
Event generation has to be moved from bdrv_eject() to the BlockBackend
because the BDS doesn't know the attached device, but that's easy
because blk_eject() is the only user of it.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Recently we moved a few options from QemuOptsLists in blockdev.c to
bdrv_runtime_opts in block.c in order to make them accissble using
blockdev-add. However, this has the side effect that these options are
missing from query-command-line-options now, and libvirt consequently
disables the corresponding feature.
This problem was reported as a regression for the 'discard' option,
introduced in commit 818584a4. However, it is more general than that.
Fix it by adding bdrv_runtime_opts to the list of QemuOptsLists that are
returned in query-command-line-options. For the future, libvirt is
advised to use QMP schema introspection for block device options.
Reported-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Tested-by: Michal Privoznik <mprivozn@redhat.com>
Tested-by: Gerd Hoffmann <kraxel@redhat.com>
This enables its use for nested child nodes. The compatibility
between the 'discard' and 'detect-zeroes' setting is checked in
bdrv_open_common() now as the former setting isn't available before
calling bdrv_open() any more.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Amongst others, this means that you can now use the 'detect-zeroes'
option for non-top-level nodes in blockdev-add, like the QAPI schema
promises.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
bdrv_reopen_queue_child() assumes that a BlockDriverState is never
added twice to BlockReopenQueue.
That's however not the case: commit_start() adds 'base' (and its
children) to a new reopen queue, and then 'overlay_bs' (and its
children, which include 'base') to the same queue. The effect of this
is that the first set of options is ignored and overriden by the
second.
We fixed this by swapping the order in which both BDSs were added to
the queue in 3db2bd5508. This patch
checks if a BDS is already in the reopen queue and keeps its options.
Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This adds the "read-only" option to the QDict. One important effect of
this change is that when a child inherits options from its parent, the
existing "read-only" mode can be preserved if it was explicitly set
previously.
This addresses scenarios like this:
[E] <- [D] <- [C] <- [B] <- [A]
In this case, if we reopen [D] with read-only=off, and later reopen
[B], then [D] will not inherit read-only=on from its parent during the
bdrv_reopen_queue_child() stage.
The BDRV_O_RDWR flag is not removed yet, but its keep in sync with the
value of the "read-only" option.
Signed-off-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
We're only doing this immediately before opening the image, but
bs->open_flags is used earlier in the function. At the moment this is
not causing problems because none of the checked flags are modified by
update_flags_from_options(), but this will change when we introduce
the "read-only" option.
This patch calls update_flags_from_options() at the beginning of the
function, immediately after creating the QemuOpts.
Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
If an image is opened with snapshot=on, its flags are modified by
bdrv_backing_options() and then bs->open_flags is updated accordingly.
This last step is unnecessary if we calculate the new flags before
setting bs->open_flags.
Soon we'll introduce the "read-only" option, and then we'll need to
be able to modify its value in the QDict when snapshot=on. This is
more cumbersome if bs->options is already set. This patch simplifies
that. Other than that, there are no semantic changes. Although it
might seem that bs->options can have a different value now because
it is stored after calling bdrv_backing_options(), this call doesn't
actually modify them in this scenario.
The code that sets BDRV_O_ALLOW_RDWR is also moved for the same
reason.
Signed-off-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This is unnecessary and has been unused since 5433c24f0f.
Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Extend the current module interface to allow for block drivers to be
loaded dynamically on request. The only block drivers that can be
converted into modules are the drivers that don't perform any init
operation except for registering themselves.
In addition, only the protocol drivers are being modularized, as they
are the only ones which see significant performance benefits. The format
drivers do not generally link to external libraries, so modularizing
them is of no benefit from a performance perspective.
All the necessary module information is located in a new structure found
in module_block.h
This spoils the purpose of 5505e8b76f (block/dmg: make it modular).
Before this patch, if module build is enabled, block-dmg.so is linked to
libbz2, whereas the main binary is not. In downstream, theoretically, it
means only the qemu-block-extra package depends on libbz2, while the
main QEMU package needn't to. With this patch, we (temporarily) change
the case so that the main QEMU depends on libbz2 again.
Signed-off-by: Marc Marí <markmb@redhat.com>
Signed-off-by: Colin Lord <clord@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1471008424-16465-4-git-send-email-clord@redhat.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
[mreitz: Do a signed comparison against the length of
block_driver_modules[], so it will not cause a compile error when
empty]
Signed-off-by: Max Reitz <mreitz@redhat.com>
The builtin NBD server uses its own BlockBackend now instead of reusing
the monitor/guest device one.
This means that it has its own writethrough setting now. The builtin
NBD server always uses writeback caching now regardless of whether the
guest device has WCE enabled. qemu-nbd respects the cache mode given on
the command line.
We still need to keep a reference to the monitor BB because we put an
eject notifier on it, but we don't use it for any I/O.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
All .bdrv_co_write_zeroes callbacks nowadays work perfectly even
with backing store attached. If future new callbacks would be unable to do
that - they have a chance to block this in bdrv_get_info().
Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 1468503209-19498-6-git-send-email-den@openvz.org
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
CC: Jeff Cody <jcody@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
Some guests (win2008 server for example) do a lot of unnecessary
flushing when underlying media has not changed. This adds additional
overhead on host when calling fsync/fdatasync.
This change introduces a write generation scheme in BlockDriverState.
Current write generation is checked against last flushed generation to
avoid unnessesary flushes.
The problem with excessive flushing was found by a performance test
which does parallel directory tree creation (from 2 processes).
Results improved from 0.424 loops/sec to 0.432 loops/sec.
Each loop creates 10^3 directories with 10 files in each.
This affected some blkdebug testcases that were expecting error logs from
failure-injected flushes which are now skipped entirely
(tests 026 071 089).
This also affects the performance of block jobs and thus BLOCK_JOB_READY
events for driver-mirror and active block-commit commands now arrives
faster, before QMP send successfully returns to caller (tests 141 144).
Signed-off-by: Evgeny Yakovlev <eyakovlev@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1468870792-7411-5-git-send-email-den@openvz.org
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Fam Zheng <famz@redhat.com>
CC: John Snow <jsnow@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
In practice the entry argument is always known at creation time, and
it is confusing that sometimes qemu_coroutine_enter is used with a
non-NULL argument to re-enter a coroutine (this happens in
block/sheepdog.c and tests/test-coroutine.c). So pass the opaque value
at creation time, for consistency with e.g. aio_bh_new.
Mostly done with the following semantic patch:
@ entry1 @
expression entry, arg, co;
@@
- co = qemu_coroutine_create(entry);
+ co = qemu_coroutine_create(entry, arg);
...
- qemu_coroutine_enter(co, arg);
+ qemu_coroutine_enter(co);
@ entry2 @
expression entry, arg;
identifier co;
@@
- Coroutine *co = qemu_coroutine_create(entry);
+ Coroutine *co = qemu_coroutine_create(entry, arg);
...
- qemu_coroutine_enter(co, arg);
+ qemu_coroutine_enter(co);
@ entry3 @
expression entry, arg;
@@
- qemu_coroutine_enter(qemu_coroutine_create(entry), arg);
+ qemu_coroutine_enter(qemu_coroutine_create(entry, arg));
@ reentry @
expression co;
@@
- qemu_coroutine_enter(co, NULL);
+ qemu_coroutine_enter(co);
except for the aforementioned few places where the semantic patch
stumbled (as expected) and for test_co_queue, which would otherwise
produce an uninitialized variable warning.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
No code changes, just moved from one file to another.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Using int for values that are only used as booleans is confusing.
While at it, rearrange a couple of members so that all the bools
are contiguous.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
It makes more sense to have ALL block size limit constraints
in the same struct. Improve the documentation while at it.
Simplify a couple of conditionals, now that we have audited and
documented that request_alignment is always non-zero.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
We want to eventually stick request_alignment alongside other
BlockLimits, but first, we must ensure it is populated at the
same time as all other limits, rather than being a special case
that is set only when a block is first opened.
Now that all drivers have been updated to supply an override
of request_alignment during their .bdrv_refresh_limits(), as
needed, the block layer itself can defer setting the default
alignment until part of the overall bdrv_refresh_limits().
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
error_propagate() already ignores local_err==NULL, so there's no
need to check it before calling.
Coccinelle patch used to perform the changes added to
scripts/coccinelle/error_propagate_null.cocci.
Reviewed-by: Eric Blake <eblake@redhat.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <1465855078-19435-2-git-send-email-ehabkost@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
It's possible that an AioContext notifier user was close to finishing
when .detach_aio_context() or .attached_aio_context() is called. In
that case they may call bdrv_remove_aio_context_notifier() during the
callback.
Use safe iteration to avoid crashing when the notifier list is modified
during iteration. We must not only handle the case where the current
aio notifier is removed during a callback but also the one where any
other aio notifier is removed.
The next patch adds an AioContext notifier for block jobs and they
really could be terminating just as .detach_aio_context() is invoked.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 1466096189-6477-6-git-send-email-stefanha@redhat.com
Currently, we are trying to move the backing BDS from the source to the
target in bdrv_replace_in_backing_chain() which is called from
mirror_exit(). However, mirror_complete() already tries to open the
target's backing chain with a call to bdrv_open_backing_file().
First, we should only set the target's backing BDS once. Second, the
mirroring block job has a better idea of what to set it to than the
generic code in bdrv_replace_in_backing_chain() (in fact, the latter's
conditions on when to move the backing BDS from source to target are not
really correct).
Therefore, remove that code from bdrv_replace_in_backing_chain() and
leave it to mirror_complete().
Depending on what kind of mirroring is performed, we furthermore want to
use different strategies to open the target's backing chain:
- If blockdev-mirror is used, we can assume the user made sure that the
target already has the correct backing chain. In particular, we should
not try to open a backing file if the target does not have any yet.
- If drive-mirror with mode=absolute-paths is used, we can and should
reuse the already existing chain of nodes that the source BDS is in.
In case of sync=full, no backing BDS is required; with sync=top, we
just link the source's backing BDS to the target, and with sync=none,
we use the source BDS as the target's backing BDS.
We should not try to open these backing files anew because this would
lead to two BDSs existing per physical file in the backing chain, and
we would like to avoid such concurrent access.
- If drive-mirror with mode=existing is used, we have to use the
information provided in the physical image file which means opening
the target's backing chain completely anew, just as it has been done
already.
If the target's backing chain shares images with the source, this may
lead to multiple BDSs per physical image file. But since we cannot
reliably ascertain this case, there is nothing we can do about it.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20160610185750.30956-3-mreitz@redhat.com
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
change_parent_backing_link() asserts that the BDS to be replaced is not
used as a backing file. However, we may want to replace a BDS by its
overlay in which case that very link should not be redirected.
For instance, when doing a sync=none drive-mirror operation, we may have
the following BDS/BB forest before block job completion:
target
base <- source <- BlockBackend
During job completion, we want to establish the source BDS as the
target's backing node:
target
|
v
base <- source <- BlockBackend
This makes the target a valid replacement for the source:
target <- BlockBackend
|
v
base <- source
Without this modification to change_parent_backing_link() we have to
inject the target into the graph before the source is its backing node,
thus temporarily creating a wrong graph:
target <- BlockBackend
base <- source
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20160610185750.30956-2-mreitz@redhat.com
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
snapshot=on creates a temporary overlay that is always opened with
cache=unsafe (the cache mode specified by the user is only for the
actual image file and its children). This means that we must not inherit
the BDRV_O_NATIVE_AIO flag for the temporary overlay because trying to
use Linux AIO with cache=unsafe results in an error.
Reproducer without this patch:
$ x86_64-softmmu/qemu-system-x86_64 -drive file=/tmp/test.qcow2,cache=none,aio=native,snapshot=on
qemu-system-x86_64: -drive file=/tmp/test.qcow2,cache=none,aio=native,snapshot=on: aio=native was
specified, but it requires cache.direct=on, which was not specified.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
It is always true for open images now.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
If block drivers say that they can do an alignment < 512 bytes, let's
just suppose they mean it. raw-posix used to be an offender with respect
to this, but it can actually deal with byte-aligned requests now.
The default is still 512 bytes for any drivers that only implement
sector-based interfaces, but it is 1 now for drivers that implement
.bdrv_co_preadv.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
at least bdrv_co_preadv/pwritev expect this.
Signed-off-by: Peter Lieven <pl@kamp.de>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
So far, bdrv_close_all() first removed all root BlockDriverStates of
BlockBackends and monitor owned BDSes, and then assumed that the
remaining BDSes must be related to jobs and cancelled these jobs.
This order doesn't work that well any more when block jobs use
BlockBackends internally because then they will lose their BDS before
being cancelled.
This patch changes bdrv_close_all() to first cancel all jobs and then
remove all root BDSes from the remaining BBs.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
When draining intermediate nodes (i.e. nodes that aren't the root node
for at least one of their parents; with node references, the user can
always configure the graph to create this situation), we need to
propagate the .drained_begin/end callbacks all the way up to the root
for the drain to be effective.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
When changing the BlockDriverState that a BdrvChild points to while the
node is currently drained, we must call the .drained_end() parent
callback. Conversely, when this means attaching a new node that is
already drained, we need to call .drained_begin().
bdrv_root_attach_child() takes now an opaque parameter, which is needed
because the callbacks must also be called if we're attaching a new child
to the BlockBackend when the root node is already drained, and they need
a way to identify the BlockBackend. Previously, child->opaque was set
too late and the callbacks would still see it as NULL.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
This adds a common function that is called when attaching a new child to
a parent, removing a child from a parent and when reconfiguring the
graph so that an existing child points to a different node now.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
bdrv_close() now asserts that the BDS's refcount is 0, therefore it
cannot have any parents and the bdrv_parent_cb_change_media() call is a
no-op.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The only caller of bdrv_close() left is bdrv_delete(). We may as well
assert that, in a way (there are some things in bdrv_close() that make
more sense under that assumption, such as the call to
bdrv_release_all_dirty_bitmaps() which in turn assumes that no frozen
bitmaps are attached to the BDS).
In addition, being called only in bdrv_delete() means that we can drop
bdrv_close()'s forward declaration at the top of block.c.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
There are no callers to bdrv_open() or bdrv_open_inherit() left that
pass a pointer to a non-NULL BDS pointer as the first argument of these
functions, so we can finally drop that parameter and just make them
return the new BDS.
Generally, the following pattern is applied:
bs = NULL;
ret = bdrv_open(&bs, ..., &local_err);
if (ret < 0) {
error_propagate(errp, local_err);
...
}
by
bs = bdrv_open(..., errp);
if (!bs) {
ret = -EINVAL;
...
}
Of course, there are only a few instances where the pattern is really
pure.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
It is unused now, so we may just as well drop it.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
If bdrv_open_inherit() creates a snapshot BDS and *pbs is NULL, that
snapshot BDS should be returned instead of the BDS under it.
This has worked so far because (nearly) all users of BDRV_O_SNAPSHOT use
blk_new_open() to create the BDS tree. bdrv_append() (which is called by
bdrv_append_temp_snapshot()) redirects pointers from parents (i.e. the
BB in this case) to the newly appended child (i.e. the overlay),
therefore, while bdrv_open_inherit() did not return the root BDS, the BB
still pointed to it.
The only instance where BDRV_O_SNAPSHOT is used but blk_new_open() is
not is in blockdev_init() if no BDS tree is created, and instead
blk_new() is used and the flags are stored in the BB root state.
However, qmp_blockdev_change_medium() filters the BDRV_O_SNAPSHOT flag
before invoking bdrv_open(), so it will not have any effect.
In any case, it would be nicer if bdrv_open_inherit() could just always
return the root of the BDS tree that has been created.
To this end, bdrv_append_temp_snapshot() now returns the snapshot BDS
instead of just appending it on top of the snapshotted BDS. Also, it
calls bdrv_ref() before bdrv_append() (which bdrv_open_inherit() has to
undo if not returning the overlay).
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
bdrv_append_temp_snapshot() uses bdrv_new() to create an empty BDS
before invoking bdrv_open() on that BDS. This is probably a relict from
when it used to do some modifications on that empty BDS, but now that is
unnecessary, so we can just set bs_snapshot to NULL and let bdrv_open()
do the rest.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The bdrv_next() users all leaked the BdrvNextIterator after completing
the iteration. Simply changing bdrv_next() to free the iterator before
returning NULL at the end of list doesn't work because some callers exit
the loop before looking at all BDSes.
This patch moves the BdrvNextIterator from the heap to the stack of
the caller and switches to a bdrv_first()/bdrv_next() interface for
initialising the iterator.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Instead of propagating any change of a BDS's AioContext only to its file
and backing children and letting driver-specific code do the rest, just
propagate it to all and drop the thus superfluous implementations of
bdrv_{at,de}tach_aio_context() in Quorum, blkverify and VMDK.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This patch removes the remaining users of bs->blk, which will allow us
to have multiple BBs on top of a single BDS. In the meantime, all checks
that are currently in place to prevent the user from creating such
setups can be switched to bdrv_has_blk() instead of accessing BDS.blk.
Future patches can allow them and e.g. enable users to mirror to a block
device that already has a BlockBackend on it.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
We need to introduce a separate BdrvNextIterator struct that can keep
more state than just the current BDS in order to avoid using the bs->blk
pointer.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
In order to get rid of bs->blk for bdrv_get_device_name() and
bdrv_get_device_or_node_name(), ask all parents for their name and
simply pick the first one.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
We want to get rid of BlockDriverState.blk in order to allow multiple
BlockBackends per BDS. Converting the device callbacks in block.c (which
assume a single BlockBackend) to per-child callbacks gets us rid of the
first few instances.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
This reverts commit 76b223200e.
Now that I/O throttling is fully done on the BlockBackend level, there
is no reason any more to block I/O throttling for nodes with multiple
parents as the parents don't influence each other any more.
Conflicts:
block.c
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
bdrv_move_feature_fields() and swap_feature_fields() are empty now, they
can be removed.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
This moves the throttling related part of the BDS life cycle management
to BlockBackend. The throttling group reference is now kept even when no
medium is inserted.
With this commit, throttling isn't disabled and then re-enabled any more
during graph reconfiguration. This fixes the temporary breakage of I/O
throttling when used with live snapshots or block jobs that manipulate
the graph.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
This patch changes where the throttling state is stored (used to be the
BlockDriverState, now it is the BlockBackend), but it doesn't actually
make it a BB level feature yet. For example, throttling is still
disabled when the BDS is detached from the BB.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
It was already true in principle that a throttled BDS always has a BB
attached, except that the order of operations while attaching or
detaching a BDS to/from a BB wasn't careful enough.
This commit breaks graph manipulations while I/O throttling is enabled.
It would have been possible to keep things working with some temporary
hacks, but quite cumbersome, so it's not worth the hassle. We'll fix
things again in a minute.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
In some cases, we want to take a quorum child offline, and take
another child online.
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-id: 1462865799-19402-2-git-send-email-xiecl.fnst@cn.fujitsu.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
Currently we only inactivate the top BDS. Actually bdrv_inactivate
should be the opposite of bdrv_invalidate_cache.
Recurse into the whole subtree instead.
Because a node may have multiple parents, and because once
BDRV_O_INACTIVE is set for a node, further writes are not allowed, we
cannot interleave flag settings and .bdrv_inactivate calls (that may
submit write to other nodes in a graph) within a single pass. Therefore
two passes are used here.
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Currently we only recurse to bs->file, which will miss the children in quorum
and VMDK.
Recurse into the whole subtree to avoid that.
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
There are no block drivers left that implement the old .bdrv_read/write
interface, so it can be removed now. This gets us rid of the
corresponding emulation functions, too.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Extract the handling of throttling from bdrv_flush_io_queue. These
new functions will soon become BdrvChildRole callbacks, as they can
be generalized to "beginning of drain" and "end of drain".
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
As the patches to move I/O throttling to BlockBackend didn't make it in
time for the 2.6 release, but the release adds new ways of configuring
VMs whose behaviour would change once the move is done, we need to
outlaw such configurations temporarily.
The problem exists whenever a BDS has more users than just its BB, for
example it is used as a backing file for another node. (This wasn't
possible in 2.5 yet as we introduced node references to specify a
backing file only recently.) In these cases, the throttling would
apply to these other users now, but after moving throttling to the
BlockBackend the other users wouldn't be throttled any more.
This patch prevents making new references to a throttled node as well as
using monitor commands to throttle a node with multiple parents.
Compared to 2.5 this changes behaviour in some corner cases where
references were allowed before, like bs->file or Quorum children. It
seems reasonable to assume that users didn't use I/O throttling on such
low level nodes. With the upcoming move of throttling into BlockBackend,
such configurations won't be possible anyway.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
The only remaining users were block jobs (mirror and backup) which
unconditionally enabled WCE on the BlockBackend of the target image. As
these block jobs don't go through BlockBackend for their I/O requests,
they aren't affected by this setting anyway but always get a writeback
mode, so that call can be removed.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
The previous patches have successively made blk->enable_write_cache the
true source for the information whether a writethrough mode must be
implemented. The corresponding BDRV_O_CACHE_WB is only useless baggage
we're carrying around, so now's the time to remove it.
At the same time, we remove the 'cache.writeback' option parsing on the
BDS level as the only effect was setting the BDRV_O_CACHE_WB flag.
This change requires test cases that explicitly enabled the option to
drop it. Other than that and the change of the error message when
writethrough is enabled on the BDS level (from "Can't set writethrough
mode" to "doesn't support the option"), there should be no change in
behaviour.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
We must forbid changing the WCE flag in bdrv_reopen() in the same patch,
as otherwise the behaviour would change so that the flag takes
precedence over the explicitly specified option.
The correct value of the WCE flag depends on the BlockBackend user (e.g.
guest device) and isn't a decision that the QMP client makes, so this
change is what we want.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Now that WCE is handled on the BlockBackend level, the flag is
meaningless for BDSes. As the schema requires us to fill the field,
we return an enabled write cache for them.
Note that this means that querying the BlockBackend name may return
writethrough as the cache information, whereas querying the node-name of
the root of that same BlockBackend will return writeback.
This may appear odd at first, but it actually makes sense because it
correctly repesents the layer that implements the WCE handling. This
becomes more apparent when you consider nodes that are the root node of
multiple BlockBackends, where each BB can have its own WCE setting.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Whether a write cache is used or not is a decision that concerns the
user (e.g. the guest device) rather than the backend. It was already
logically part of the BB level as bdrv_move_feature_fields() always kept
it on top of the BDS tree; with this patch, the core of it (the actual
flag and the additional flushes) is also implemented there.
Direct callers of bdrv_open() must pass BDRV_O_CACHE_WB now if bs
doesn't have a BlockBackend attached.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
It's like bdrv_parse_cache_flags(), except that writethrough mode isn't
included in the flags, but returned as a separate bool.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
For a couple of releases we have been warning
Encrypted images are deprecated
Support for them will be removed in a future release.
You can use 'qemu-img convert' to convert your image to an unencrypted one.
This warning was issued by system emulators, qemu-img, qemu-nbd
and qemu-io. Such a broad warning was issued because the original
intention was to rip out all the code for dealing with encryption
inside the QEMU block layer APIs.
The new block encryption framework used for the LUKS driver does
not rely on the unloved block layer API for encryption keys,
instead using the QOM 'secret' object type. It is thus no longer
appropriate to warn about encryption unconditionally.
When the qcow/qcow2 drivers are converted to use the new encryption
framework too, it will be practical to keep AES-CBC support present
for use in qemu-img, qemu-io & qemu-nbd to allow for interoperability
with older QEMU versions and liberation of data from existing encrypted
qcow2 files.
This change moves the warning out of the generic block code and
into the qcow/qcow2 drivers. Further, the warning is set to only
appear when running the system emulators, since qemu-img, qemu-io,
qemu-nbd are expected to support qcow2 encryption long term now that
the maint burden has been eliminated.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
When opening an image it is useful to know whether the caller
intends to perform I/O on the image or not. In the case of
encrypted images this will allow the block driver to avoid
having to prompt for decryption keys when we merely want to
query header metadata about the image. eg qemu-img info
This flag is enforced at the top level only, since even if
we don't want todo I/O on the 'qcow2' file payload, the
underlying 'file' driver will still need todo I/O to read
the qcow2 header, for example.
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Writethrough mode is going to become a BlockBackend feature rather than
a BDS one, so forbid it in places where we won't be able to support it
when the code finally matches the envisioned design.
We only allowed setting the cache mode of non-root nodes after the 2.5
release, so we're still free to make this change.
The target of block jobs is now always opened in a writeback mode
because it doesn't have a BlockBackend attached. This makes more sense
anyway because block jobs know when to flush. If the graph is modified
on job completion, the original cache mode moves to the new root, so
for the guest device writethough always stays enabled if it was
configured this way.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
First of all, we're generally not writing to backing files, but when we
do, it's in the context of block jobs which know very well when to flush
the image.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
This patch changes dirty bitmaps from following a BlockBackend in graph
changes to sticking with the node they were created at. For the full
discussion, read the following mailing list thread:
[Qemu-block] block: Dirty bitmaps and COR in bdrv_move_feature_fields()
https://lists.nongnu.org/archive/html/qemu-block/2016-02/msg00745.html
In summary, the justification for this change is:
* When moving the dirty bitmap to the top of the tree was introduced in
bdrv_append() in commit a9fc4408, it didn't actually have any effect
because there could never be a bitmap in use when bdrv_append() was
called (op blockers would prevent this). This is still true today for
all internal uses of dirty bitmaps.
* Support for user-defined dirty bitmaps was introduced in 2.4, but we
discouraged users from using it because we didn't consider it ready
yet.
Moreover, in 2.5, the bdrv_swap() removal introduced a bug that left
dangling pointers if a dirty bitmap was present (the anchors of the
dirty bitmap were swapped, but the back link in the first element
wasn't updated), so it didn't even work correctly.
* block-dirty-bitmap-add takes an arbitrary node name, even if no
BlockBackend is attached. This suggests that it is a node level
operation and not a BlockBackend one. Consequently, there is no reason
for dirty bitmaps to stay with a BlockBackend that was attached to the
node they were created for.
* It was suggested that block-dirty-bitmap-add could track the node if a
node name was specified, and track the BlockBackend if the device name
was specified. This would however be inconsistent with other QMP
commands. Commands that accept both device and node names currently
interpret the device name just as an alias for the current root node
of that BlockBackend.
* Dirty bitmaps have a name that is only unique amongst the bitmaps in a
specific node. Moving bitmaps could lead to name clashes. Automatic
renaming would involve too much magic.
* Persistent bitmaps are stored in a specific node. Moving them around
automatically might be at least surprising, but it would probably also
become a real problem because that would have to happen atomically
without the management tool knowing of the operation.
At the end of the day it seems to be very clear that it was a mistake to
include dirty bitmaps in bdrv_move_feature_fields(). The functionality
of moving bitmaps and/or attaching them to a BlockBackend instead will
probably be needed, but it should be done with a new explicit QMP
command or option.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Ever since we first introduced bdrv_append() in commit 8802d1fd ('qapi:
Introduce blockdev-group-snapshot-sync command'), the copy-on-read flag
was moved to the new top layer when taking a snapshot. The only problem
is that it doesn't make a whole lot of sense.
The use case for manually enabled CoR is to avoid reading data twice
from a slow remote image, so we want to save it to a local overlay, say
an ISO image accessed via HTTP to a local qcow2 overlay. When taking a
snapshot, we end up with a backing chain like this:
http <- local.qcow2 <- snap_overlay.qcow2
There is no point in doing CoR from local.qcow2 into snap_overlay.qcow2,
we just want to keep copying data from the remote source into
local.qcow2.
The other use case of CoR is in the context of streaming, which isn't
very interesting for bdrv_move_feature_fields() because op blockers
prevent this combination.
This patch makes the copy-on-read flag stay on the image for which it
was originally set and prevents it from being propagated to the new
overlay. It is no longer intended to move CoR to the BlockBackend level.
In order for this to make sense, we also need to keep the respective
image read-write.
As a side effect of these changes, creating a live snapshot image (as
opposed to using an existing externally created one) on top of a COR
block device works now. It used to fail because it tried to open its
backing file both read-only and with COR.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
The call in hmp_drive_del() is dead code because blk_remove_bs() is
called a few lines above. The only other remaining user is
bdrv_delete(), which only abuses bdrv_make_anon() to remove it from the
named nodes list. This path inlines the list entry removal into
bdrv_delete() and removes bdrv_make_anon().
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Move declarations out of qemu-common.h for functions declared in
utils/ files: e.g. include/qemu/path.h for utils/path.c.
Move inline functions out of qemu-common.h and into new files (e.g.
include/qemu/bcd.h)
Signed-off-by: Veronia Bahaa <veroniabahaa@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
There is no point in manually iterating through the bdrv_states list
when there is bdrv_next().
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Instead of using the bdrv_states list, iterate over all the
BlockDriverStates attached to BlockBackends, and over all the
monitor-owned BDSs afterwards (except for those attached to a BB).
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Move bdrv_commit_all() and bdrv_flush_all() to the BlockBackend level.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The information which BB is concerned does not seem useful enough to
justify its existence in most other place (which may be related to qemu
printing the -drive parameter in question anyway, and for blockdev-add
the attribution is naturally unambiguous). Furthermore, as of a future
patch, bdrv_get_device_name(bs) will always return the empty string
before bdrv_open_inherit() returns.
Therefore, just dropping that information seems to be the best course of
action.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The only code change is making bdrv_dirty_bitmap_truncate public. It is
used in block.c.
Also two long lines (bdrv_get_dirty) are wrapped.
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 1457412306-18940-5-git-send-email-famz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
Since commit 91a097e, we end up with a somewhat weird cache mode
configuration with snapshot=on: The commit broke the cache mode
inheritance for the snapshot overlay so that it is opened as
writethrough instead of unsafe now. The following bdrv_append() call to
put it on top of the tree swaps the WCE flag with the snapshot's backing
file (i.e. the originally given file), so what we eventually get is
cache=writeback on the temporary overlay and
cache=writethrough,cache.no-flush=on on the real image file.
This patch changes things so that the temporary overlay gets
cache=unsafe again like it used to, and the real images get whatever the
user specified. This means that cache.direct is now respected even with
snapshot=on, and in the case of committing changes, the final flush is
no longer ignored except explicitly requested by the user.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
The BDRV_O_INACTIVE flag should only be set for images explicitly opened
by the user. snapshot=on needs to create a new qcow2 image and write
some metadata to it. This is not a problem because it can't come from
the source, so there's no reason to mark it as BDRV_O_INACTIVE, even
though it is opened while waiting for the migration to complete.
This fixes an assertion failure when -incoming and snapshot=on are
combined.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Clean up includes so that osdep.h is included first and headers
which it implies are not included manually.
This commit was created with scripts/clean-includes.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1454089805-5470-16-git-send-email-peter.maydell@linaro.org
This fixes a regression introduced with commit 3f09bfbc7. Multiple
bugs arise in conjunction with live snapshots and mirroring operations
(which include active layer commit).
After a live snapshot occurs, the active layer and the base layer both
have a non-NULL tqe_prev field in the device_list, although the base
node's tqe_prev field points to a NULL entry. This non-NULL tqe_prev
field occurs after the bdrv_append() in the external snapshot calls
change_parent_backing_link().
In change_parent_backing_link(), when the previous active layer is
removed from device_list, the device_list.tqe_prev pointer is not
set to NULL.
The operating scheme in the block layer is to indicate that a BDS belongs
in the bdrv_states device_list iff the device_list.tqe_prev pointer
is non-NULL.
This patch does two things:
1.) Introduces a new block layer helper bdrv_device_remove() to remove a
BDS from the device_list, and
2.) uses that new API, which also fixes the regression once used in
change_parent_backing_link().
Signed-off-by: Jeff Cody <jcody@redhat.com>
Message-id: 0cd51e11c0666c04ddb7c05293fe94afeb551e89.1454376655.git.jcody@redhat.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
This patch rewrites bdrv_close_all(): Until now, all root BDSs have been
force-closed. This is bad because it can lead to cached data not being
flushed to disk.
Instead, try to make all reference holders relinquish their reference
voluntarily:
1. All BlockBackend users are handled by making all BBs simply eject
their BDS tree. Since a BDS can never be on top of a BB, this will
not cause any of the issues as seen with the force-closing of BDSs.
The references will be relinquished and any further access to the BB
will fail gracefully.
2. All BDSs which are owned by the monitor itself (because they do not
have a BB) are relinquished next.
3. Besides BBs and the monitor, block jobs and other BDSs are the only
things left that can hold a reference to BDSs. After every remaining
block job has been canceled, there should not be any BDSs left (and
the loop added here will always terminate (as long as NDEBUG is not
defined), because either all_bdrv_states will be empty or there will
not be any block job left to cancel, failing the assertion).
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
We need this list so that bdrv_close_all() can keep track of which BDSs
are still open after having removed the BDSs from all of the BBs and
having released all monitor BDS references.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
There are no users of bdrv_close() left, except for one of bdrv_open()'s
failure paths, bdrv_close_all() and bdrv_delete(), and that is good.
Make bdrv_close() static so nobody makes the mistake of directly using
bdrv_close() again.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
It is unused now, so we can remove it.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
bdrv_delete() is not very happy about deleting BlockDriverStates with
dirty bitmaps still attached to them. In the past, we got around that
very easily by relying on bdrv_close_all() bypassing bdrv_delete(), and
bdrv_close() simply ignoring that condition. We should fix that by
releasing all named dirty bitmaps in bdrv_close() (there should not be
any unnamed bitmaps left) and moving the assertion from bdrv_delete()
there.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
So far, live migration with shared storage meant that the image is in a
not-really-ready don't-touch-me state on the destination while the
source is still actively using it, but after completing the migration,
the image was fully opened on both sides. This is bad.
This patch adds a block driver callback to inactivate images on the
source before completing the migration. Inactivation means that it goes
to a state as if it was just live migrated to the qemu instance on the
source (i.e. BDRV_O_INACTIVE is set). You're then supposed to continue
either on the source or on the destination, which takes ownership of the
image.
A typical migration looks like this now with respect to disk images:
1. Destination qemu is started, the image is opened with
BDRV_O_INACTIVE. The image is fully opened on the source.
2. Migration is about to complete. The source flushes the image and
inactivates it. Now both sides have the image opened with
BDRV_O_INACTIVE and are expecting the other side to still modify it.
3. One side (the destination on success) continues and calls
bdrv_invalidate_all() in order to take ownership of the image again.
This removes BDRV_O_INACTIVE on the resuming side; the flag remains
set on the other side.
This ensures that the same image isn't written to by both instances
(unless both are resumed, but then you get what you deserve). This is
important because .bdrv_close for non-BDRV_O_INACTIVE images could write
to the image file, which is definitely forbidden while another host is
using the image.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Instead of covering only the state of images on the migration
destination before the migration is completed, the flag will also cover
the state of images on the migration source after completion. This
common state implies that the image is technically still open, but no
writes will happen and any cached contents will be reloaded from disk if
and when the image leaves this state.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
We can only clear BDRV_O_INCOMING if the caches were actually
invalidated.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
bdrv_common_open() modified bs->open_flags after inferring the set of
options to pass to the driver's .bdrv_open callback. This means that the
cache options were correctly set in bs->open_flags (and therefore
correctly displayed in 'info block'), but the image would actually be
opened with the default cache mode instead.
This patch removes the flags parameter to bdrv_common_open() (except for
BDRV_O_NO_BACKING it's the same as bs->open_flags anyway, and having two
names for the same thing is confusing), and moves the assignment of
open_flags down to immediately before calling into the block drivers. In
all other places, bs->open_flags is now used consistently.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>