mirrors/qemu - qemu - SynapseOS git

Author	SHA1	Message	Date
Kevin Wolf	6a1b9ee152	block: Default .bdrv_child_perm() for filter drivers Most filters need permissions related to read and write for their children, but only if the node has a parent that wants to use the same operation on the filter. The same is true for resize. This adds a default implementation that simply forwards all necessary permissions to all children of the node and leaves the other permissions unchanged. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Acked-by: Fam Zheng <famz@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2017-02-28 20:40:36 +01:00
Kevin Wolf	33a610c398	block: Involve block drivers in permission granting In many cases, the required permissions of one node on its children depend on what its parents require from it. For example, the raw format or most filter drivers only need to request consistent reads if that's something that one of their parents wants. In order to achieve this, this patch introduces two new BlockDriver callbacks. The first one lets drivers first check (recursively) whether the requested permissions can be set; the second one actually sets the new permission bitmask. Also add helper functions that drivers can use in their implementation of the callbacks to update their permissions on a specific child. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Acked-by: Fam Zheng <famz@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2017-02-28 20:40:36 +01:00
Kevin Wolf	d5e6f437c5	block: Let callers request permissions when attaching a child node When attaching a node as a child to a new parent, the required and shared permissions for this parent are checked against all other parents of the node now, and an error is returned if there is a conflict. This allows error returns to a function that previously always succeeded, and the same is true for quite a few callers and their callers. Converting all of them within the same patch would be too much, so for now everyone tells that they don't need any permissions and allow everyone else to do anything. This way we can use &error_abort initially and convert caller by caller to pass actual permission requirements and implement error handling. All these places are marked with FIXME comments and it will be the job of the next patches to clean them up again. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Acked-by: Fam Zheng <famz@redhat.com>	2017-02-28 20:40:36 +01:00
Kevin Wolf	8b2ff5291f	block: Add Error argument to bdrv_attach_child() It will have to return an error soon, so prepare the callers for it. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Acked-by: Fam Zheng <famz@redhat.com>	2017-02-28 20:40:35 +01:00
Peter Maydell	6b4e463ff3	Block layer patches -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQIcBAABAgAGBQJYsHaaAAoJEH8JsnLIjy/WvC4P/iw/WC6XySNzPMOzICrJr9fc IHz447+0MqoIBWqGRFLEU8z4t6k8HGt2YnWKFuS1N+2gU5fSgOdp20nAA+pQvxRb RTyQL7BNJPvFNzrEQPTpdvBt79veYsoNe5dNfSq01z/PdgxMQR4PkS96Jm8w4Jdu 20ZPfULws8+Wq1Snsxl4PdnFpY2n97OOGQRCjl50h5ypol5+fXDDzAvp/GPf6Q8B 09OTWmQ4UIr4OZqT83T1kDdRZRRMIxiRP5qTTKdh0BlJEQdBjrJ9fTClY4bMcIgl VP/3kzkmdoqSW+4D+0L88q7vyvM3Kwc6n2PFDaRf6Lgy9ueYoPKuW9AF5vzcxhED AI0SN9boM+4z1P75kalBaHg/BiRL7UDwpHgdanPVcENoP8IywsKzOIvdjOpqINmX uLU3QfK+qK6x7gflhVXiX2sFzTLzZQlWu5KPjW6QfuMzUuRAeksfMDXh2GXNFVG1 igGnORyqB2nOGNAsudtMrOvjHQRSbwGsnvWcwvaX0swrxOb3oAwc3X+E4nxE/K5/ wJQLrCM9DfD3CKKlKiu+1wQXx6zqaGYIsNxLcrSylJ3TPsGuxfkx3b9GVCCQ0Y+e YSdwWSAIS9LxR4qISJ5ehlnYDA0sRtdco0xGW6ACNE5IrgREYUcH4EJOXbxZ4NHs BK/DVdYhFtNZ60zHeEn4 =OYgF -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging Block layer patches # gpg: Signature made Fri 24 Feb 2017 18:08:26 GMT # gpg: using RSA key 0x7F09B272C88F2FD6 # gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" # Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6 * remotes/kevin/tags/for-upstream: tests: Use opened block node for block job tests vvfat: Use opened node as backing file block: Add bdrv_new_open_driver() block: Factor out bdrv_open_driver() block: Use BlockBackend for image probing block: Factor out bdrv_open_child_bs() block: Attach bs->file only during .bdrv_open() block: Pass BdrvChild to bdrv_truncate() mirror: Resize active commit base in mirror_run() qcow2: Use BB for resizing in qcow2_amend_options() blockdev: Use BlockBackend to resize in qmp_block_resize() iotests: Fix another race in 030 qemu-img: Improve documentation for PREALLOC_MODE_FALLOC qemu-img: Truncate before full preallocation qemu-img: Add tests for raw image preallocation qemu-img: Do not truncate before preallocation qemu-iotests: redirect nbd server stdout to /dev/null qemu-iotests: add ability to exclude certain protocols from tests qemu-iotests: Test 137 only supports 'file' protocol Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2017-02-26 12:26:37 +00:00
Kevin Wolf	680c7f9606	block: Add bdrv_new_open_driver() This function allows to create more or less normal BlockDriverStates even for BlockDrivers that aren't globally registered (e.g. helper filters for block jobs). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2017-02-24 16:09:23 +01:00
Kevin Wolf	01a5650179	block: Factor out bdrv_open_driver() This is a function that doesn't do any option parsing, but just does some basic BlockDriverState setup and calls the .bdrv_open() function of the block driver. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2017-02-24 16:09:23 +01:00
Kevin Wolf	5696c6e350	block: Use BlockBackend for image probing This fixes the use of a parent-less BdrvChild in bdrv_open_inherit() by converting it into a BlockBackend. Which is exactly what it should be, image probing is an external, standalone user of a node. The requests can't be considered to originate from the format driver node because that one isn't even opened yet. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2017-02-24 16:09:23 +01:00
Kevin Wolf	2d6b86af14	block: Factor out bdrv_open_child_bs() This is the part of bdrv_open_child() that opens a BDS with option inheritance, but doesn't attach it as a child to the parent yet. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2017-02-24 16:09:23 +01:00
Kevin Wolf	4e4bf5c42c	block: Attach bs->file only during .bdrv_open() The way that attaching bs->file worked was a bit unusual in that it was the only child that would be attached to a node which is not opened yet. Because of this, the block layer couldn't know yet which permissions the driver would eventually need. This patch moves the point where bs->file is attached to the beginning of the individual .bdrv_open() implementations, so drivers already know what they are going to do with the child. This is also more consistent with how driver-specific children work. For a moment, bdrv_open() gets its own BdrvChild to perform image probing, but instead of directly assigning this BdrvChild to the BDS, it becomes a temporary one and the node name is passed as an option to the drivers, so that they can simply use bdrv_open_child() to create another reference for their own use. This duplicated child for (the not opened yet) bs is not the final state, a follow-up patch will change the image probing code to use a BlockBackend, which is completely independent of bs. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2017-02-24 16:09:23 +01:00
Kevin Wolf	52cdbc5869	block: Pass BdrvChild to bdrv_truncate() Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2017-02-24 16:09:23 +01:00
Markus Armbruster	ca6b6e1e68	Don't check qobject_type() before qobject_to_qdict() qobject_to_qdict(obj) returns NULL when obj isn't a QDict. Check that instead of qobject_type(obj) == QTYPE_QDICT. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <1487363905-9480-8-git-send-email-armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2017-02-22 19:52:01 +01:00
Vladimir Sementsov-Ogievskiy	16e977d506	block: bdrv_invalidate_cache: invalidate children first Current implementation invalidates firstly parent bds and then its children. This leads to the following bug: after incoming migration, in bdrv_invalidate_cache_all: 1. invalidate parent bds - reopen it with BDRV_O_INACTIVE cleared 2. child is not yet invalidated 3. parent check that its BDRV_O_INACTIVE is cleared 4. parent writes to child 5. assert in bdrv_co_pwritev, as BDRV_O_INACTIVE is set for child This patch fixes it by just changing invalidate sequence: invalidate children first. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20170131112308.54189-1-vsementsov@virtuozzo.com Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-02-12 00:47:42 +01:00
Jeff Cody	418661e032	block: check full backing filename when searching protocol filenames In bdrv_find_backing_image(), if we are searching an image for a backing file that contains a protocol, we currently only compare unmodified paths. However, some management software will change the backing filename to be a relative filename in a path. QEMU is able to handle this fine, because internally it will use path_combine to put together the full protocol URI. However, this can lead to an inability to match an image during a QAPI command that needs to use bdrv_find_backing_image() to find the image, when it is searched by the full URI. When searching for a protocol filename, if the straight comparison fails, this patch will also compare against the full backing filename to see if that is a match. Signed-off-by: Jeff Cody <jcody@redhat.com> Message-id: c2d025adca8a2b665189e6f4cf080f44126d0b6b.1485392617.git.jcody@redhat.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-02-12 00:47:42 +01:00
Daniel P. Berrange	0ab8ed18a6	trace: switch to modular code generation for sub-directories Introduce rules in the top level Makefile that are able to generate trace.[ch] files in every subdirectory which has a trace-events file. The top level directory is handled specially, so instead of creating trace.h, it creates trace-root.h. This allows sub-directories to include the top level trace-root.h file, without ambiguity wrt to the trace.g file in the current sub-dir. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Message-id: 20170125161417.31949-7-berrange@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-01-31 17:11:18 +00:00
Paolo Bonzini	7ad2757fef	block: remove dead check options must be non-NULL here, because a NULL value is replaced with qdict_new earlier in the function. Reported by Coverity. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2017-01-24 23:26:53 +03:00
Max Reitz	eb0df69f50	block: Emit modules in bdrv_iterate_format() Some block drivers may not be loaded yet, but qemu supports them nonetheless. bdrv_iterate_format() should report them, too. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20161012204907.25941-3-mreitz@redhat.com Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2016-11-11 15:56:22 +01:00
Max Reitz	ceff5bd79c	block: Fix bdrv_iterate_format() sorting bdrv_iterate_format() did not actually sort the formats by name but by "pointer interpreted as string". That is probably not what we intended to do, so fix it (by changing qsort_strcmp() so it matches the example from qsort()'s manual page). Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20161012204907.25941-2-mreitz@redhat.com Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2016-11-11 15:56:22 +01:00
Alberto Garcia	61b49e48b3	block: Support streaming to an intermediate layer This makes sure that the image we are streaming into is open in read-write mode during the operation. Operation blockers are also set in all intermediate nodes, since they will be removed from the chain afterwards. Finally, this also unblocks the stream operation in backing files. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-10-31 16:52:38 +01:00
Alberto Garcia	40840e419b	block: Pause all jobs during bdrv_reopen_multiple() When a BlockDriverState is about to be reopened it can trigger certain operations that need to write to disk. During this process a different block job can be woken up. If that block job completes and also needs to call bdrv_reopen() it can happen that it needs to do it on the same BlockDriverState that is still in the process of being reopened. This can have fatal consequences, like in this example: 1) Block job A starts and sleeps after a while. 2) Block job B starts and tries to reopen node1 (a qcow2 file). 3) Reopening node1 means flushing and replacing its qcow2 cache. 4) While the qcow2 cache is being flushed, job A wakes up. 5) Job A completes and reopens node1, replacing its cache. 6) Job B resumes, but the cache that was being flushed no longer exists. This patch splits the bdrv_drain_all() call to keep all block jobs paused during bdrv_reopen_multiple(), so that step 4 can never happen and the operation is safe. Note that this scenario can only happen if both bdrv_reopen() calls are made by block jobs on the same backing chain. Otherwise there's no chance that the same BlockDriverState appears in both reopen queues. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-10-31 16:52:38 +01:00
Paolo Bonzini	c9d1a56174	block: only call aio_poll on the current thread's AioContext aio_poll is not thread safe; for example bdrv_drain can hang if the last in-flight I/O operation is completed in the I/O thread after the main thread has checked bs->in_flight. The bug remains latent as long as all of it is called within aio_context_acquire/aio_context_release, but this will change soon. To fix this, if bdrv_drain is called from outside the I/O thread, signal the main AioContext through a dummy bottom half. The event loop then only runs in the I/O thread. Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <1477565348-5458-18-git-send-email-pbonzini@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>	2016-10-28 21:50:18 +08:00
Paolo Bonzini	720150f318	block: prepare bdrv_reopen_multiple to release AioContext After the next patch bdrv_drain_all will have to be called without holding any AioContext. Prepare to do this by adding an AioContext argument to bdrv_reopen_multiple. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <1477565348-5458-15-git-send-email-pbonzini@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>	2016-10-28 21:50:18 +08:00
Kevin Wolf	2d76e724cf	block: Add qdev ID to DEVICE_TRAY_MOVED The event currently only contains the BlockBackend name. However, with anonymous BlockBackends, this is always the empty string. Add the qdev ID (or if none was given, the QOM path) so that the user can still see which device caused the event. Event generation has to be moved from bdrv_eject() to the BlockBackend because the BDS doesn't know the attached device, but that's easy because blk_eject() is the only user of it. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2016-10-07 13:34:22 +02:00
Kevin Wolf	c5f3014b82	block: Add bdrv_runtime_opts to query-command-line-options Recently we moved a few options from QemuOptsLists in blockdev.c to bdrv_runtime_opts in block.c in order to make them accissble using blockdev-add. However, this has the side effect that these options are missing from query-command-line-options now, and libvirt consequently disables the corresponding feature. This problem was reported as a regression for the 'discard' option, introduced in commit `818584a4`. However, it is more general than that. Fix it by adding bdrv_runtime_opts to the list of QemuOptsLists that are returned in query-command-line-options. For the future, libvirt is advised to use QMP schema introspection for block device options. Reported-by: Michal Privoznik <mprivozn@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Tested-by: Michal Privoznik <mprivozn@redhat.com> Tested-by: Gerd Hoffmann <kraxel@redhat.com>	2016-10-07 13:34:07 +02:00
Kevin Wolf	818584a43a	block: Move 'discard' option to bdrv_open_common() This enables its use for nested child nodes. The compatibility between the 'discard' and 'detect-zeroes' setting is checked in bdrv_open_common() now as the former setting isn't available before calling bdrv_open() any more. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2016-09-29 14:13:39 +02:00
Kevin Wolf	692e01a27c	block: Parse 'detect-zeroes' in bdrv_open_common() Amongst others, this means that you can now use the 'detect-zeroes' option for non-top-level nodes in blockdev-add, like the QAPI schema promises. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2016-09-29 14:13:39 +02:00
Alberto Garcia	5b7ba05fe7	block: Don't queue the same BDS twice in bdrv_reopen_queue_child() bdrv_reopen_queue_child() assumes that a BlockDriverState is never added twice to BlockReopenQueue. That's however not the case: commit_start() adds 'base' (and its children) to a new reopen queue, and then 'overlay_bs' (and its children, which include 'base') to the same queue. The effect of this is that the first set of options is ignored and overriden by the second. We fixed this by swapping the order in which both BDSs were added to the queue in `3db2bd5508`. This patch checks if a BDS is already in the reopen queue and keeps its options. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-09-23 13:36:10 +02:00
Alberto Garcia	f87a0e29a9	block: Add "read-only" to the options QDict This adds the "read-only" option to the QDict. One important effect of this change is that when a child inherits options from its parent, the existing "read-only" mode can be preserved if it was explicitly set previously. This addresses scenarios like this: [E] <- [D] <- [C] <- [B] <- [A] In this case, if we reopen [D] with read-only=off, and later reopen [B], then [D] will not inherit read-only=on from its parent during the bdrv_reopen_queue_child() stage. The BDRV_O_RDWR flag is not removed yet, but its keep in sync with the value of the "read-only" option. Signed-off-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-09-23 13:36:10 +02:00
Alberto Garcia	9b7e869167	block: Update bs->open_flags earlier in bdrv_open_common() We're only doing this immediately before opening the image, but bs->open_flags is used earlier in the function. At the moment this is not causing problems because none of the checked flags are modified by update_flags_from_options(), but this will change when we introduce the "read-only" option. This patch calls update_flags_from_options() at the beginning of the function, immediately after creating the QemuOpts. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-09-23 13:36:10 +02:00
Alberto Garcia	14499ea541	block: Set BDRV_O_ALLOW_RDWR and snapshot_options before storing the flags If an image is opened with snapshot=on, its flags are modified by bdrv_backing_options() and then bs->open_flags is updated accordingly. This last step is unnecessary if we calculate the new flags before setting bs->open_flags. Soon we'll introduce the "read-only" option, and then we'll need to be able to modify its value in the QDict when snapshot=on. This is more cumbersome if bs->options is already set. This patch simplifies that. Other than that, there are no semantic changes. Although it might seem that bs->options can have a different value now because it is stored after calling bdrv_backing_options(), this call doesn't actually modify them in this scenario. The code that sets BDRV_O_ALLOW_RDWR is also moved for the same reason. Signed-off-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-09-23 13:36:10 +02:00
Alberto Garcia	38b5e4c3dc	block: Remove bdrv_is_snapshot This is unnecessary and has been unused since `5433c24f0f`. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-09-23 13:36:10 +02:00
Marc Mari	88d88798b7	blockdev: Add dynamic module loading for block drivers Extend the current module interface to allow for block drivers to be loaded dynamically on request. The only block drivers that can be converted into modules are the drivers that don't perform any init operation except for registering themselves. In addition, only the protocol drivers are being modularized, as they are the only ones which see significant performance benefits. The format drivers do not generally link to external libraries, so modularizing them is of no benefit from a performance perspective. All the necessary module information is located in a new structure found in module_block.h This spoils the purpose of `5505e8b76f` (block/dmg: make it modular). Before this patch, if module build is enabled, block-dmg.so is linked to libbz2, whereas the main binary is not. In downstream, theoretically, it means only the qemu-block-extra package depends on libbz2, while the main QEMU package needn't to. With this patch, we (temporarily) change the case so that the main QEMU depends on libbz2 again. Signed-off-by: Marc Marí <markmb@redhat.com> Signed-off-by: Colin Lord <clord@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 1471008424-16465-4-git-send-email-clord@redhat.com Reviewed-by: Max Reitz <mreitz@redhat.com> [mreitz: Do a signed comparison against the length of block_driver_modules[], so it will not cause a compile error when empty] Signed-off-by: Max Reitz <mreitz@redhat.com>	2016-09-20 22:12:03 +02:00
Wen Congyang	e9d6456e95	block: unblock backup operations in backing file Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com> Signed-off-by: Wang WeiWei <wangww.fnst@cn.fujitsu.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Kashyap Chamarthy <kchamart@redhat.com> Message-id: 1469602913-20979-2-git-send-email-xiecl.fnst@cn.fujitsu.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-09-13 11:00:56 +01:00
Kevin Wolf	cd7fca952c	nbd-server: Use a separate BlockBackend The builtin NBD server uses its own BlockBackend now instead of reusing the monitor/guest device one. This means that it has its own writethrough setting now. The builtin NBD server always uses writeback caching now regardless of whether the guest device has WCE enabled. qemu-nbd respects the cache mode given on the command line. We still need to keep a reference to the monitor BB because we put an eject notifier on it, but we don't use it for any I/O. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2016-09-05 19:06:47 +02:00
Denis V. Lunev	2f0342efdb	block: remove extra condition in bdrv_can_write_zeroes_with_unmap All .bdrv_co_write_zeroes callbacks nowadays work perfectly even with backing store attached. If future new callbacks would be unable to do that - they have a chance to block this in bdrv_get_info(). Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 1468503209-19498-6-git-send-email-den@openvz.org CC: Stefan Hajnoczi <stefanha@redhat.com> CC: Kevin Wolf <kwolf@redhat.com> CC: Max Reitz <mreitz@redhat.com> CC: Jeff Cody <jcody@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com>	2016-07-19 16:54:46 -04:00
Evgeny Yakovlev	3ff2f67a7c	block: ignore flush requests when storage is clean Some guests (win2008 server for example) do a lot of unnecessary flushing when underlying media has not changed. This adds additional overhead on host when calling fsync/fdatasync. This change introduces a write generation scheme in BlockDriverState. Current write generation is checked against last flushed generation to avoid unnessesary flushes. The problem with excessive flushing was found by a performance test which does parallel directory tree creation (from 2 processes). Results improved from 0.424 loops/sec to 0.432 loops/sec. Each loop creates 10^3 directories with 10 files in each. This affected some blkdebug testcases that were expecting error logs from failure-injected flushes which are now skipped entirely (tests 026 071 089). This also affects the performance of block jobs and thus BLOCK_JOB_READY events for driver-mirror and active block-commit commands now arrives faster, before QMP send successfully returns to caller (tests 141 144). Signed-off-by: Evgeny Yakovlev <eyakovlev@virtuozzo.com> Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 1468870792-7411-5-git-send-email-den@openvz.org CC: Kevin Wolf <kwolf@redhat.com> CC: Max Reitz <mreitz@redhat.com> CC: Stefan Hajnoczi <stefanha@redhat.com> CC: Fam Zheng <famz@redhat.com> CC: John Snow <jsnow@redhat.com> Signed-off-by: John Snow <jsnow@redhat.com>	2016-07-18 18:19:01 -04:00
Paolo Bonzini	0b8b8753e4	coroutine: move entry argument to qemu_coroutine_create In practice the entry argument is always known at creation time, and it is confusing that sometimes qemu_coroutine_enter is used with a non-NULL argument to re-enter a coroutine (this happens in block/sheepdog.c and tests/test-coroutine.c). So pass the opaque value at creation time, for consistency with e.g. aio_bh_new. Mostly done with the following semantic patch: @ entry1 @ expression entry, arg, co; @@ - co = qemu_coroutine_create(entry); + co = qemu_coroutine_create(entry, arg); ... - qemu_coroutine_enter(co, arg); + qemu_coroutine_enter(co); @ entry2 @ expression entry, arg; identifier co; @@ - Coroutine co = qemu_coroutine_create(entry); + Coroutine co = qemu_coroutine_create(entry, arg); ... - qemu_coroutine_enter(co, arg); + qemu_coroutine_enter(co); @ entry3 @ expression entry, arg; @@ - qemu_coroutine_enter(qemu_coroutine_create(entry), arg); + qemu_coroutine_enter(qemu_coroutine_create(entry, arg)); @ reentry @ expression co; @@ - qemu_coroutine_enter(co, NULL); + qemu_coroutine_enter(co); except for the aforementioned few places where the semantic patch stumbled (as expected) and for test_co_queue, which would otherwise produce an uninitialized variable warning. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-07-13 13:26:02 +02:00
Kevin Wolf	cf2ab8fc34	block: Convert bdrv_pread(v) to BdrvChild Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-07-05 16:46:27 +02:00
Kevin Wolf	83fd6dd3e7	block: Move bdrv_commit() to block/commit.c No code changes, just moved from one file to another. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-07-05 16:46:27 +02:00
Eric Blake	5411541270	block: Use bool as appropriate for BDS members Using int for values that are only used as booleans is confusing. While at it, rearrange a couple of members so that all the bools are contiguous. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-07-05 16:46:26 +02:00
Eric Blake	a5b8dd2ce8	block: Move request_alignment into BlockLimit It makes more sense to have ALL block size limit constraints in the same struct. Improve the documentation while at it. Simplify a couple of conditionals, now that we have audited and documented that request_alignment is always non-zero. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-07-05 16:46:26 +02:00
Eric Blake	79ba8c986a	block: Set default request_alignment during bdrv_refresh_limits() We want to eventually stick request_alignment alongside other BlockLimits, but first, we must ensure it is populated at the same time as all other limits, rather than being a special case that is set only when a block is first opened. Now that all drivers have been updated to supply an override of request_alignment during their .bdrv_refresh_limits(), as needed, the block layer itself can defer setting the default alignment until part of the overall bdrv_refresh_limits(). Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-07-05 16:46:25 +02:00
Peter Maydell	7fa124b273	Error reporting patches for 2016-06-20 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIbBAABAgAGBQJXaAQPAAoJEDhwtADrkYZTbPMP+OmXlf1IwcUONhaHqD0SXAdC Q5NNHheNHbIdzrsOmgj2GAxCeGG08REPrdYLb8N9lI3Fev8ggQ0v3d9qiStL6Nvz 2B7DeTxXba9Qmc2gY86RR4f7T6VK+8k1Kj9f3YcAoXHucdjDLRmtUozwES3Kk8mn oeBqd07yuTAO93xQhF6BDv3mfagPZV+kp15Se04ufWPpT9eG3Cp9Pl4/Yad/wJUr 2B4pIvYnhISVlZIOvpuzdydBaCb/U0YefjWBDkawLwRBtM/kpNKoDdqr2UA0yChQ Xulgpt+UQLZ4aOYGLkzqBT7v3kDeEnYUFDdE03xZBLbHddOU05NEhLgzRVUEAdUS EAKWvWu692USDh9a2r0AYLqZW6J0+Fyi4fZ34gSuP69oi74CIvjJHDAEwsSdt6bm WHCchvfIk9kinzGiFA3hkAbVWAbI0EpB0J7k3mgCVVVelBpF9b9U4M2ydx9ZPufX t4ULXxQO9Qyxr7r0uiznugQAjIRwLMDOPPd1F5Vi+LX0wF3Pcrtc/F+3ehoLBjBD 6zZA5lZVlUEISdzzytfjeX3ch5vRFwlujrTd2anxRuduuBV1/ztvE1v0Nsiii6Sw 15BaHQJvBnMnAIO5LfhnAaVpIBk0jpQpwVyRcNUpeZ1VU2qlqVS4zxYZ0FDQgyer IZa1qYJ0Sz+oMkq9RXU= =TptL -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/armbru/tags/pull-error-2016-06-20' into staging Error reporting patches for 2016-06-20 # gpg: Signature made Mon 20 Jun 2016 15:56:15 BST # gpg: using RSA key 0x3870B400EB918653 # gpg: Good signature from "Markus Armbruster <armbru@redhat.com>" # gpg: aka "Markus Armbruster <armbru@pond.sub.org>" # Primary key fingerprint: 354B C8B3 D7EB 2A6B 6867 4E5F 3870 B400 EB91 8653 * remotes/armbru/tags/pull-error-2016-06-20: log: Fix qemu_set_log_filename() error handling log: Fix qemu_set_dfilter_ranges() error reporting log: Plug memory leak on multiple -dfilter coccinelle: Remove unnecessary variables for function return value error: Remove unnecessary local_err variables error: Remove NULL checks on error_propagate() calls vl: Error messages need to go to stderr, fix some Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2016-06-20 16:19:18 +01:00
Eduardo Habkost	621ff94d50	error: Remove NULL checks on error_propagate() calls error_propagate() already ignores local_err==NULL, so there's no need to check it before calling. Coccinelle patch used to perform the changes added to scripts/coccinelle/error_propagate_null.cocci. Reviewed-by: Eric Blake <eblake@redhat.com> Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Message-Id: <1465855078-19435-2-git-send-email-ehabkost@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>	2016-06-20 16:38:13 +02:00
Stefan Hajnoczi	e8a095dadb	block: use safe iteration over AioContext notifiers It's possible that an AioContext notifier user was close to finishing when .detach_aio_context() or .attached_aio_context() is called. In that case they may call bdrv_remove_aio_context_notifier() during the callback. Use safe iteration to avoid crashing when the notifier list is modified during iteration. We must not only handle the case where the current aio notifier is removed during a callback but also the one where any other aio notifier is removed. The next patch adds an AioContext notifier for block jobs and they really could be terminating just as .detach_aio_context() is invoked. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 1466096189-6477-6-git-send-email-stefanha@redhat.com	2016-06-20 14:25:41 +01:00
Max Reitz	274fccee2b	block/mirror: Fix target backing BDS Currently, we are trying to move the backing BDS from the source to the target in bdrv_replace_in_backing_chain() which is called from mirror_exit(). However, mirror_complete() already tries to open the target's backing chain with a call to bdrv_open_backing_file(). First, we should only set the target's backing BDS once. Second, the mirroring block job has a better idea of what to set it to than the generic code in bdrv_replace_in_backing_chain() (in fact, the latter's conditions on when to move the backing BDS from source to target are not really correct). Therefore, remove that code from bdrv_replace_in_backing_chain() and leave it to mirror_complete(). Depending on what kind of mirroring is performed, we furthermore want to use different strategies to open the target's backing chain: - If blockdev-mirror is used, we can assume the user made sure that the target already has the correct backing chain. In particular, we should not try to open a backing file if the target does not have any yet. - If drive-mirror with mode=absolute-paths is used, we can and should reuse the already existing chain of nodes that the source BDS is in. In case of sync=full, no backing BDS is required; with sync=top, we just link the source's backing BDS to the target, and with sync=none, we use the source BDS as the target's backing BDS. We should not try to open these backing files anew because this would lead to two BDSs existing per physical file in the backing chain, and we would like to avoid such concurrent access. - If drive-mirror with mode=existing is used, we have to use the information provided in the physical image file which means opening the target's backing chain completely anew, just as it has been done already. If the target's backing chain shares images with the source, this may lead to multiple BDSs per physical image file. But since we cannot reliably ascertain this case, there is nothing we can do about it. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20160610185750.30956-3-mreitz@redhat.com Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2016-06-16 15:20:37 +02:00
Max Reitz	9bd910e2cb	block: Allow replacement of a BDS by its overlay change_parent_backing_link() asserts that the BDS to be replaced is not used as a backing file. However, we may want to replace a BDS by its overlay in which case that very link should not be redirected. For instance, when doing a sync=none drive-mirror operation, we may have the following BDS/BB forest before block job completion: target base <- source <- BlockBackend During job completion, we want to establish the source BDS as the target's backing node: target \| v base <- source <- BlockBackend This makes the target a valid replacement for the source: target <- BlockBackend \| v base <- source Without this modification to change_parent_backing_link() we have to inject the target into the graph before the source is its backing node, thus temporarily creating a wrong graph: target <- BlockBackend base <- source Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20160610185750.30956-2-mreitz@redhat.com Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2016-06-16 15:20:37 +02:00
Kevin Wolf	418690447a	block: Fix snapshot=on with aio=native snapshot=on creates a temporary overlay that is always opened with cache=unsafe (the cache mode specified by the user is only for the actual image file and its children). This means that we must not inherit the BDRV_O_NATIVE_AIO flag for the temporary overlay because trying to use Linux AIO with cache=unsafe results in an error. Reproducer without this patch: $ x86_64-softmmu/qemu-system-x86_64 -drive file=/tmp/test.qcow2,cache=none,aio=native,snapshot=on qemu-system-x86_64: -drive file=/tmp/test.qcow2,cache=none,aio=native,snapshot=on: aio=native was specified, but it requires cache.direct=on, which was not specified. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-06-16 15:19:56 +02:00
Kevin Wolf	c9d20029f4	block: Remove bs->zero_beyond_eof It is always true for open images now. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-06-16 15:19:56 +02:00
Kevin Wolf	23b0d9fb1d	block: Don't enforce 512 byte minimum alignment If block drivers say that they can do an alignment < 512 bytes, let's just suppose they mean it. raw-posix used to be an offender with respect to this, but it can actually deal with byte-aligned requests now. The default is still 512 bytes for any drivers that only implement sector-based interfaces, but it is 1 now for drivers that implement .bdrv_co_preadv. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-06-16 15:19:55 +02:00
Peter Lieven	107d433cbb	block: assert that bs->request_alignment is a power of 2 at least bdrv_co_preadv/pwritev expect this. Signed-off-by: Peter Lieven <pl@kamp.de> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-06-08 10:21:09 +02:00
Kevin Wolf	a1a2af0756	block: Cancel jobs first in bdrv_close_all() So far, bdrv_close_all() first removed all root BlockDriverStates of BlockBackends and monitor owned BDSes, and then assumed that the remaining BDSes must be related to jobs and cancelled these jobs. This order doesn't work that well any more when block jobs use BlockBackends internally because then they will lose their BDS before being cancelled. This patch changes bdrv_close_all() to first cancel all jobs and then remove all root BDSes from the remaining BBs. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2016-05-25 19:04:21 +02:00
Kevin Wolf	20018e12cf	block: Propagate .drained_begin/end callbacks When draining intermediate nodes (i.e. nodes that aren't the root node for at least one of their parents; with node references, the user can always configure the graph to create this situation), we need to propagate the .drained_begin/end callbacks all the way up to the root for the drain to be effective. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com>	2016-05-25 19:04:11 +02:00
Kevin Wolf	36fe13317b	block: Fix reconfiguring graph with drained nodes When changing the BlockDriverState that a BdrvChild points to while the node is currently drained, we must call the .drained_end() parent callback. Conversely, when this means attaching a new node that is already drained, we need to call .drained_begin(). bdrv_root_attach_child() takes now an opaque parameter, which is needed because the callbacks must also be called if we're attaching a new child to the BlockBackend when the root node is already drained, and they need a way to identify the BlockBackend. Previously, child->opaque was set too late and the callbacks would still see it as NULL. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com>	2016-05-25 19:04:10 +02:00
Kevin Wolf	e9740bc6d4	block: Introduce bdrv_replace_child() This adds a common function that is called when attaching a new child to a parent, removing a child from a parent and when reconfiguring the graph so that an existing child points to a different node now. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com>	2016-05-25 19:04:10 +02:00
Max Reitz	6b574e09b3	block: Drop bdrv_parent_cb_...() from bdrv_close() bdrv_close() now asserts that the BDS's refcount is 0, therefore it cannot have any parents and the bdrv_parent_cb_change_media() call is a no-op. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-05-25 19:04:10 +02:00
Max Reitz	30f55fb81f	block: Assert !bs->refcnt in bdrv_close() The only caller of bdrv_close() left is bdrv_delete(). We may as well assert that, in a way (there are some things in bdrv_close() that make more sense under that assumption, such as the call to bdrv_release_all_dirty_bitmaps() which in turn assumes that no frozen bitmaps are attached to the BDS). In addition, being called only in bdrv_delete() means that we can drop bdrv_close()'s forward declaration at the top of block.c. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-05-25 19:04:10 +02:00
Max Reitz	5b3639371c	block: Make bdrv_open() return a BDS There are no callers to bdrv_open() or bdrv_open_inherit() left that pass a pointer to a non-NULL BDS pointer as the first argument of these functions, so we can finally drop that parameter and just make them return the new BDS. Generally, the following pattern is applied: bs = NULL; ret = bdrv_open(&bs, ..., &local_err); if (ret < 0) { error_propagate(errp, local_err); ... } by bs = bdrv_open(..., errp); if (!bs) { ret = -EINVAL; ... } Of course, there are only a few instances where the pattern is really pure. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-05-25 19:04:10 +02:00
Max Reitz	9bddf75979	block: Drop bdrv_new_root() It is unused now, so we may just as well drop it. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-05-25 19:04:10 +02:00
Max Reitz	668361898e	block: Let bdrv_open_inherit() return the snapshot If bdrv_open_inherit() creates a snapshot BDS and *pbs is NULL, that snapshot BDS should be returned instead of the BDS under it. This has worked so far because (nearly) all users of BDRV_O_SNAPSHOT use blk_new_open() to create the BDS tree. bdrv_append() (which is called by bdrv_append_temp_snapshot()) redirects pointers from parents (i.e. the BB in this case) to the newly appended child (i.e. the overlay), therefore, while bdrv_open_inherit() did not return the root BDS, the BB still pointed to it. The only instance where BDRV_O_SNAPSHOT is used but blk_new_open() is not is in blockdev_init() if no BDS tree is created, and instead blk_new() is used and the flags are stored in the BB root state. However, qmp_blockdev_change_medium() filters the BDRV_O_SNAPSHOT flag before invoking bdrv_open(), so it will not have any effect. In any case, it would be nicer if bdrv_open_inherit() could just always return the root of the BDS tree that has been created. To this end, bdrv_append_temp_snapshot() now returns the snapshot BDS instead of just appending it on top of the snapshotted BDS. Also, it calls bdrv_ref() before bdrv_append() (which bdrv_open_inherit() has to undo if not returning the overlay). Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-05-25 19:04:10 +02:00
Max Reitz	506f8709ce	block: Drop useless bdrv_new() call bdrv_append_temp_snapshot() uses bdrv_new() to create an empty BDS before invoking bdrv_open() on that BDS. This is probably a relict from when it used to do some modifications on that empty BDS, but now that is unnecessary, so we can just set bs_snapshot to NULL and let bdrv_open() do the rest. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-05-25 19:04:10 +02:00
Kevin Wolf	88be7b4be4	block: Fix bdrv_next() memory leak The bdrv_next() users all leaked the BdrvNextIterator after completing the iteration. Simply changing bdrv_next() to free the iterator before returning NULL at the end of list doesn't work because some callers exit the loop before looking at all BDSes. This patch moves the BdrvNextIterator from the heap to the stack of the caller and switches to a bdrv_first()/bdrv_next() interface for initialising the iterator. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com>	2016-05-25 19:04:10 +02:00
Max Reitz	b97511c7bc	block: Propagate AioContext change to all children Instead of propagating any change of a BDS's AioContext only to its file and backing children and letting driver-specific code do the rest, just propagate it to all and drop the thus superfluous implementations of bdrv_{at,de}tach_aio_context() in Quorum, blkverify and VMDK. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-05-19 16:45:31 +02:00
Kevin Wolf	1f0c461b82	block: Remove BlockDriverState.blk This patch removes the remaining users of bs->blk, which will allow us to have multiple BBs on top of a single BDS. In the meantime, all checks that are currently in place to prevent the user from creating such setups can be switched to bdrv_has_blk() instead of accessing BDS.blk. Future patches can allow them and e.g. enable users to mirror to a block device that already has a BlockBackend on it. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2016-05-19 16:45:31 +02:00
Kevin Wolf	7c8eece45b	block: Avoid bs->blk in bdrv_next() We need to introduce a separate BdrvNextIterator struct that can keep more state than just the current BDS in order to avoid using the bs->blk pointer. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2016-05-19 16:45:31 +02:00
Kevin Wolf	4c265bf9f4	block: User BdrvChild callback for device name In order to get rid of bs->blk for bdrv_get_device_name() and bdrv_get_device_or_node_name(), ask all parents for their name and simply pick the first one. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2016-05-19 16:45:31 +02:00
Kevin Wolf	5c8cab4808	block: Use BdrvChild callbacks for change_media/resize We want to get rid of BlockDriverState.blk in order to allow multiple BlockBackends per BDS. Converting the device callbacks in block.c (which assume a single BlockBackend) to per-child callbacks gets us rid of the first few instances. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2016-05-19 16:45:31 +02:00
Kevin Wolf	b26ded9a7d	Revert "block: Forbid I/O throttling on nodes with multiple parents for 2.6" This reverts commit `76b223200e`. Now that I/O throttling is fully done on the BlockBackend level, there is no reason any more to block I/O throttling for nodes with multiple parents as the parents don't influence each other any more. Conflicts: block.c Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-05-19 16:45:31 +02:00
Kevin Wolf	08e83aabe4	block: Remove bdrv_move_feature_fields() bdrv_move_feature_fields() and swap_feature_fields() are empty now, they can be removed. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-05-19 16:45:31 +02:00
Kevin Wolf	7ca7f0f6db	block: Decouple throttling from BlockDriverState This moves the throttling related part of the BDS life cycle management to BlockBackend. The throttling group reference is now kept even when no medium is inserted. With this commit, throttling isn't disabled and then re-enabled any more during graph reconfiguration. This fixes the temporary breakage of I/O throttling when used with live snapshots or block jobs that manipulate the graph. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-05-19 16:45:30 +02:00
Kevin Wolf	97148076e8	block: Move I/O throttling configuration functions to BlockBackend Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-05-19 16:45:30 +02:00
Kevin Wolf	27ccdd5259	block: Move throttling fields from BDS to BB This patch changes where the throttling state is stored (used to be the BlockDriverState, now it is the BlockBackend), but it doesn't actually make it a BB level feature yet. For example, throttling is still disabled when the BDS is detached from the BB. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-05-19 16:45:30 +02:00
Kevin Wolf	a5614993d7	block: Make sure throttled BDSes always have a BB It was already true in principle that a throttled BDS always has a BB attached, except that the order of operations while attaching or detaching a BDS to/from a BB wasn't careful enough. This commit breaks graph manipulations while I/O throttling is enabled. It would have been possible to keep things working with some temporary hacks, but quite cumbersome, so it's not worth the hassle. We'll fix things again in a minute. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-05-19 16:45:29 +02:00
Wen Congyang	98292c61bc	quorum: implement bdrv_add_child() and bdrv_del_child() Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com> Signed-off-by: Gonglei <arei.gonglei@huawei.com> Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com> Message-id: 1462865799-19402-3-git-send-email-xiecl.fnst@cn.fujitsu.com Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2016-05-12 15:33:23 +02:00
Wen Congyang	e06018ad28	Add new block driver interface to add/delete a BDS's child In some cases, we want to take a quorum child offline, and take another child online. Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com> Signed-off-by: Gonglei <arei.gonglei@huawei.com> Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 1462865799-19402-2-git-send-email-xiecl.fnst@cn.fujitsu.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2016-05-12 15:33:23 +02:00
Fam Zheng	aad0b7a0bf	block: Inactivate all children Currently we only inactivate the top BDS. Actually bdrv_inactivate should be the opposite of bdrv_invalidate_cache. Recurse into the whole subtree instead. Because a node may have multiple parents, and because once BDRV_O_INACTIVE is set for a node, further writes are not allowed, we cannot interleave flag settings and .bdrv_inactivate calls (that may submit write to other nodes in a graph) within a single pass. Therefore two passes are used here. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-05-12 15:22:09 +02:00
Fam Zheng	0d1c5c9160	block: Invalidate all children Currently we only recurse to bs->file, which will miss the children in quorum and VMDK. Recurse into the whole subtree to avoid that. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-05-12 15:22:09 +02:00
Kevin Wolf	e3ddef25e9	block: Remove BlockDriver.bdrv_read/write There are no block drivers left that implement the old .bdrv_read/write interface, so it can be removed now. This gets us rid of the corresponding emulation functions, too. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com>	2016-05-12 15:22:08 +02:00
Paolo Bonzini	ce0f141259	block: introduce bdrv_no_throttling_begin/end Extract the handling of throttling from bdrv_flush_io_queue. These new functions will soon become BdrvChildRole callbacks, as they can be generalized to "beginning of drain" and "end of drain". Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-05-12 15:22:07 +02:00
Kevin Wolf	76b223200e	block: Forbid I/O throttling on nodes with multiple parents for 2.6 As the patches to move I/O throttling to BlockBackend didn't make it in time for the 2.6 release, but the release adds new ways of configuring VMs whose behaviour would change once the move is done, we need to outlaw such configurations temporarily. The problem exists whenever a BDS has more users than just its BB, for example it is used as a backing file for another node. (This wasn't possible in 2.5 yet as we introduced node references to specify a backing file only recently.) In these cases, the throttling would apply to these other users now, but after moving throttling to the BlockBackend the other users wouldn't be throttled any more. This patch prevents making new references to a throttled node as well as using monitor commands to throttle a node with multiple parents. Compared to 2.5 this changes behaviour in some corner cases where references were allowed before, like bs->file or Quorum children. It seems reasonable to assume that users didn't use I/O throttling on such low level nodes. With the upcoming move of throttling into BlockBackend, such configurations won't be possible anyway. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2016-04-05 09:22:28 +02:00
Kevin Wolf	09cf9db1bc	block: Remove bdrv_(set_)enable_write_cache() The only remaining users were block jobs (mirror and backup) which unconditionally enabled WCE on the BlockBackend of the target image. As these block jobs don't go through BlockBackend for their I/O requests, they aren't affected by this setting anyway but always get a writeback mode, so that call can be removed. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2016-03-30 12:16:03 +02:00
Kevin Wolf	61de4c6808	block: Remove BDRV_O_CACHE_WB The previous patches have successively made blk->enable_write_cache the true source for the information whether a writethrough mode must be implemented. The corresponding BDRV_O_CACHE_WB is only useless baggage we're carrying around, so now's the time to remove it. At the same time, we remove the 'cache.writeback' option parsing on the BDS level as the only effect was setting the BDRV_O_CACHE_WB flag. This change requires test cases that explicitly enabled the option to drop it. Other than that and the change of the error message when writethrough is enabled on the BDS level (from "Can't set writethrough mode" to "doesn't support the option"), there should be no change in behaviour. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2016-03-30 12:16:03 +02:00
Kevin Wolf	53e8ae0100	block: Remove bdrv_parse_cache_flags() All users are converted to bdrv_parse_cache_mode() now. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2016-03-30 12:16:03 +02:00
Kevin Wolf	19dbecdcee	qemu-io: Use bdrv_parse_cache_mode() in reopen_f() We must forbid changing the WCE flag in bdrv_reopen() in the same patch, as otherwise the behaviour would change so that the flag takes precedence over the explicitly specified option. The correct value of the WCE flag depends on the BlockBackend user (e.g. guest device) and isn't a decision that the QMP client makes, so this change is what we want. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2016-03-30 12:16:03 +02:00
Kevin Wolf	c83f9fba2a	block/qapi: Use blk_enable_write_cache() Now that WCE is handled on the BlockBackend level, the flag is meaningless for BDSes. As the schema requires us to fill the field, we return an enabled write cache for them. Note that this means that querying the BlockBackend name may return writethrough as the cache information, whereas querying the node-name of the root of that same BlockBackend will return writeback. This may appear odd at first, but it actually makes sense because it correctly repesents the layer that implements the WCE handling. This becomes more apparent when you consider nodes that are the root node of multiple BlockBackends, where each BB can have its own WCE setting. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2016-03-30 12:16:02 +02:00
Kevin Wolf	bfd18d1e0b	block: Move enable_write_cache to BB level Whether a write cache is used or not is a decision that concerns the user (e.g. the guest device) rather than the backend. It was already logically part of the BB level as bdrv_move_feature_fields() always kept it on top of the BDS tree; with this patch, the core of it (the actual flag and the additional flushes) is also implemented there. Direct callers of bdrv_open() must pass BDRV_O_CACHE_WB now if bs doesn't have a BlockBackend attached. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2016-03-30 12:16:02 +02:00
Kevin Wolf	baf5602ed9	block: Add bdrv_parse_cache_mode() It's like bdrv_parse_cache_flags(), except that writethrough mode isn't included in the flags, but returned as a separate bool. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2016-03-30 12:16:00 +02:00
Daniel P. Berrange	e6ff69bf5e	block: move encryption deprecation warning into qcow code For a couple of releases we have been warning Encrypted images are deprecated Support for them will be removed in a future release. You can use 'qemu-img convert' to convert your image to an unencrypted one. This warning was issued by system emulators, qemu-img, qemu-nbd and qemu-io. Such a broad warning was issued because the original intention was to rip out all the code for dealing with encryption inside the QEMU block layer APIs. The new block encryption framework used for the LUKS driver does not rely on the unloved block layer API for encryption keys, instead using the QOM 'secret' object type. It is thus no longer appropriate to warn about encryption unconditionally. When the qcow/qcow2 drivers are converted to use the new encryption framework too, it will be practical to keep AES-CBC support present for use in qemu-img, qemu-io & qemu-nbd to allow for interoperability with older QEMU versions and liberation of data from existing encrypted qcow2 files. This change moves the warning out of the generic block code and into the qcow/qcow2 drivers. Further, the warning is set to only appear when running the system emulators, since qemu-img, qemu-io, qemu-nbd are expected to support qcow2 encryption long term now that the maint burden has been eliminated. Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-03-30 12:12:15 +02:00
Daniel P. Berrange	abb06c5ac1	block: add flag to indicate that no I/O will be performed When opening an image it is useful to know whether the caller intends to perform I/O on the image or not. In the case of encrypted images this will allow the block driver to avoid having to prompt for decryption keys when we merely want to query header metadata about the image. eg qemu-img info This flag is enforced at the top level only, since even if we don't want todo I/O on the 'qcow2' file payload, the underlying 'file' driver will still need todo I/O to read the qcow2 header, for example. Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-03-30 11:59:32 +02:00
Kevin Wolf	73ac451f34	block: Reject writethrough mode except at the root Writethrough mode is going to become a BlockBackend feature rather than a BDS one, so forbid it in places where we won't be able to support it when the code finally matches the envisioned design. We only allowed setting the cache mode of non-root nodes after the 2.5 release, so we're still free to make this change. The target of block jobs is now always opened in a writeback mode because it doesn't have a BlockBackend attached. This makes more sense anyway because block jobs know when to flush. If the graph is modified on job completion, the original cache mode moves to the new root, so for the guest device writethough always stays enabled if it was configured this way. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2016-03-30 11:59:32 +02:00
Kevin Wolf	b8816a4386	block: Make backing files always writeback First of all, we're generally not writing to backing files, but when we do, it's in the context of block jobs which know very well when to flush the image. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2016-03-30 11:59:32 +02:00
Kevin Wolf	7a827aaec8	block: Remove dirty bitmaps from bdrv_move_feature_fields() This patch changes dirty bitmaps from following a BlockBackend in graph changes to sticking with the node they were created at. For the full discussion, read the following mailing list thread: [Qemu-block] block: Dirty bitmaps and COR in bdrv_move_feature_fields() https://lists.nongnu.org/archive/html/qemu-block/2016-02/msg00745.html In summary, the justification for this change is: * When moving the dirty bitmap to the top of the tree was introduced in bdrv_append() in commit `a9fc4408`, it didn't actually have any effect because there could never be a bitmap in use when bdrv_append() was called (op blockers would prevent this). This is still true today for all internal uses of dirty bitmaps. * Support for user-defined dirty bitmaps was introduced in 2.4, but we discouraged users from using it because we didn't consider it ready yet. Moreover, in 2.5, the bdrv_swap() removal introduced a bug that left dangling pointers if a dirty bitmap was present (the anchors of the dirty bitmap were swapped, but the back link in the first element wasn't updated), so it didn't even work correctly. * block-dirty-bitmap-add takes an arbitrary node name, even if no BlockBackend is attached. This suggests that it is a node level operation and not a BlockBackend one. Consequently, there is no reason for dirty bitmaps to stay with a BlockBackend that was attached to the node they were created for. * It was suggested that block-dirty-bitmap-add could track the node if a node name was specified, and track the BlockBackend if the device name was specified. This would however be inconsistent with other QMP commands. Commands that accept both device and node names currently interpret the device name just as an alias for the current root node of that BlockBackend. * Dirty bitmaps have a name that is only unique amongst the bitmaps in a specific node. Moving bitmaps could lead to name clashes. Automatic renaming would involve too much magic. * Persistent bitmaps are stored in a specific node. Moving them around automatically might be at least surprising, but it would probably also become a real problem because that would have to happen atomically without the management tool knowing of the operation. At the end of the day it seems to be very clear that it was a mistake to include dirty bitmaps in bdrv_move_feature_fields(). The functionality of moving bitmaps and/or attaching them to a BlockBackend instead will probably be needed, but it should be done with a new explicit QMP command or option. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2016-03-30 11:59:32 +02:00
Kevin Wolf	4c8449832c	block: Remove copy-on-read from bdrv_move_feature_fields() Ever since we first introduced bdrv_append() in commit `8802d1fd` ('qapi: Introduce blockdev-group-snapshot-sync command'), the copy-on-read flag was moved to the new top layer when taking a snapshot. The only problem is that it doesn't make a whole lot of sense. The use case for manually enabled CoR is to avoid reading data twice from a slow remote image, so we want to save it to a local overlay, say an ISO image accessed via HTTP to a local qcow2 overlay. When taking a snapshot, we end up with a backing chain like this: http <- local.qcow2 <- snap_overlay.qcow2 There is no point in doing CoR from local.qcow2 into snap_overlay.qcow2, we just want to keep copying data from the remote source into local.qcow2. The other use case of CoR is in the context of streaming, which isn't very interesting for bdrv_move_feature_fields() because op blockers prevent this combination. This patch makes the copy-on-read flag stay on the image for which it was originally set and prevents it from being propagated to the new overlay. It is no longer intended to move CoR to the BlockBackend level. In order for this to make sense, we also need to keep the respective image read-write. As a side effect of these changes, creating a live snapshot image (as opposed to using an existing externally created one) on top of a COR block device works now. It used to fail because it tried to open its backing file both read-only and with COR. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2016-03-30 11:59:32 +02:00
Kevin Wolf	63eaaae08c	block: Remove bdrv_make_anon() The call in hmp_drive_del() is dead code because blk_remove_bs() is called a few lines above. The only other remaining user is bdrv_delete(), which only abuses bdrv_make_anon() to remove it from the named nodes list. This path inlines the list entry removal into bdrv_delete() and removes bdrv_make_anon(). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2016-03-30 11:59:32 +02:00
Veronia Bahaa	f348b6d1a5	util: move declarations out of qemu-common.h Move declarations out of qemu-common.h for functions declared in utils/ files: e.g. include/qemu/path.h for utils/path.c. Move inline functions out of qemu-common.h and into new files (e.g. include/qemu/bcd.h) Signed-off-by: Veronia Bahaa <veroniabahaa@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-03-22 22:20:17 +01:00
Kevin Wolf	f21d96d04b	block: Use BdrvChild in BlockBackend Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-03-17 15:47:57 +01:00
Max Reitz	9aaf28c61d	block: Remove bdrv_states list Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-03-17 15:47:57 +01:00
Max Reitz	79720af640	block: Use bdrv_next() instead of bdrv_states There is no point in manually iterating through the bdrv_states list when there is bdrv_next(). Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-03-17 15:47:57 +01:00
Max Reitz	2626058034	block: Rewrite bdrv_next() Instead of using the bdrv_states list, iterate over all the BlockDriverStates attached to BlockBackends, and over all the monitor-owned BDSs afterwards (except for those attached to a BB). Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-03-17 15:47:56 +01:00
Max Reitz	fe1a9cbc33	block: Move some bdrv_*_all() functions to BB Move bdrv_commit_all() and bdrv_flush_all() to the BlockBackend level. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-03-17 15:47:56 +01:00
Max Reitz	d0e46a5577	block: Drop BB name from bad option error The information which BB is concerned does not seem useful enough to justify its existence in most other place (which may be related to qemu printing the -drive parameter in question anyway, and for blockdev-add the attribution is naturally unambiguous). Furthermore, as of a future patch, bdrv_get_device_name(bs) will always return the empty string before bdrv_open_inherit() returns. Therefore, just dropping that information seems to be the best course of action. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-03-17 15:47:56 +01:00
Fam Zheng	ebab225910	block: Move block dirty bitmap code to separate files The only code change is making bdrv_dirty_bitmap_truncate public. It is used in block.c. Also two long lines (bdrv_get_dirty) are wrapped. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 1457412306-18940-5-git-send-email-famz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2016-03-14 17:35:05 +01:00
Kevin Wolf	73176bee99	block: Fix snapshot=on cache modes Since commit `91a097e`, we end up with a somewhat weird cache mode configuration with snapshot=on: The commit broke the cache mode inheritance for the snapshot overlay so that it is opened as writethrough instead of unsafe now. The following bdrv_append() call to put it on top of the tree swaps the WCE flag with the snapshot's backing file (i.e. the originally given file), so what we eventually get is cache=writeback on the temporary overlay and cache=writethrough,cache.no-flush=on on the real image file. This patch changes things so that the temporary overlay gets cache=unsafe again like it used to, and the real images get whatever the user specified. This means that cache.direct is now respected even with snapshot=on, and in the case of committing changes, the final flush is no longer ignored except explicitly requested by the user. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2016-03-14 16:46:43 +01:00
Kevin Wolf	12d5ee3a7e	block: Fix -incoming with snapshot=on The BDRV_O_INACTIVE flag should only be set for images explicitly opened by the user. snapshot=on needs to create a new qcow2 image and write some metadata to it. This is not a problem because it can't come from the source, so there's no reason to mark it as BDRV_O_INACTIVE, even though it is opened while waiting for the migration to complete. This fixes an assertion failure when -incoming and snapshot=on are combined. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-02-22 09:49:46 +01:00
Peter Maydell	d38ea87ac5	all: Clean up includes Clean up includes so that osdep.h is included first and headers which it implies are not included manually. This commit was created with scripts/clean-includes. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Message-id: 1454089805-5470-16-git-send-email-peter.maydell@linaro.org	2016-02-04 17:41:30 +00:00
Jeff Cody	f8aa905a4f	block: set device_list.tqe_prev to NULL on BDS removal This fixes a regression introduced with commit `3f09bfbc7`. Multiple bugs arise in conjunction with live snapshots and mirroring operations (which include active layer commit). After a live snapshot occurs, the active layer and the base layer both have a non-NULL tqe_prev field in the device_list, although the base node's tqe_prev field points to a NULL entry. This non-NULL tqe_prev field occurs after the bdrv_append() in the external snapshot calls change_parent_backing_link(). In change_parent_backing_link(), when the previous active layer is removed from device_list, the device_list.tqe_prev pointer is not set to NULL. The operating scheme in the block layer is to indicate that a BDS belongs in the bdrv_states device_list iff the device_list.tqe_prev pointer is non-NULL. This patch does two things: 1.) Introduces a new block layer helper bdrv_device_remove() to remove a BDS from the device_list, and 2.) uses that new API, which also fixes the regression once used in change_parent_backing_link(). Signed-off-by: Jeff Cody <jcody@redhat.com> Message-id: 0cd51e11c0666c04ddb7c05293fe94afeb551e89.1454376655.git.jcody@redhat.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2016-02-02 18:04:47 +01:00
Max Reitz	ca9bd24cf1	block: Rewrite bdrv_close_all() This patch rewrites bdrv_close_all(): Until now, all root BDSs have been force-closed. This is bad because it can lead to cached data not being flushed to disk. Instead, try to make all reference holders relinquish their reference voluntarily: 1. All BlockBackend users are handled by making all BBs simply eject their BDS tree. Since a BDS can never be on top of a BB, this will not cause any of the issues as seen with the force-closing of BDSs. The references will be relinquished and any further access to the BB will fail gracefully. 2. All BDSs which are owned by the monitor itself (because they do not have a BB) are relinquished next. 3. Besides BBs and the monitor, block jobs and other BDSs are the only things left that can hold a reference to BDSs. After every remaining block job has been canceled, there should not be any BDSs left (and the loop added here will always terminate (as long as NDEBUG is not defined), because either all_bdrv_states will be empty or there will not be any block job left to cancel, failing the assertion). Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-02-02 17:50:46 +01:00
Max Reitz	2c1d04e002	block: Add list of all BlockDriverStates We need this list so that bdrv_close_all() can keep track of which BDSs are still open after having removed the BDSs from all of the BBs and having released all monitor BDS references. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-02-02 17:50:46 +01:00
Max Reitz	64dff52019	block: Make bdrv_close() static There are no users of bdrv_close() left, except for one of bdrv_open()'s failure paths, bdrv_close_all() and bdrv_delete(), and that is good. Make bdrv_close() static so nobody makes the mistake of directly using bdrv_close() again. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-02-02 17:50:46 +01:00
Max Reitz	033cb5659a	block: Remove BDS close notifier It is unused now, so we can remove it. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-02-02 17:50:46 +01:00
Max Reitz	c5acdc9ab4	block: Release named dirty bitmaps in bdrv_close() bdrv_delete() is not very happy about deleting BlockDriverStates with dirty bitmaps still attached to them. In the past, we got around that very easily by relying on bdrv_close_all() bypassing bdrv_delete(), and bdrv_close() simply ignoring that condition. We should fix that by releasing all named dirty bitmaps in bdrv_close() (there should not be any unnamed bitmaps left) and moving the assertion from bdrv_delete() there. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-02-02 17:50:46 +01:00
Kevin Wolf	76b1c7fe1c	block: Inactivate BDS when migration completes So far, live migration with shared storage meant that the image is in a not-really-ready don't-touch-me state on the destination while the source is still actively using it, but after completing the migration, the image was fully opened on both sides. This is bad. This patch adds a block driver callback to inactivate images on the source before completing the migration. Inactivation means that it goes to a state as if it was just live migrated to the qemu instance on the source (i.e. BDRV_O_INACTIVE is set). You're then supposed to continue either on the source or on the destination, which takes ownership of the image. A typical migration looks like this now with respect to disk images: 1. Destination qemu is started, the image is opened with BDRV_O_INACTIVE. The image is fully opened on the source. 2. Migration is about to complete. The source flushes the image and inactivates it. Now both sides have the image opened with BDRV_O_INACTIVE and are expecting the other side to still modify it. 3. One side (the destination on success) continues and calls bdrv_invalidate_all() in order to take ownership of the image again. This removes BDRV_O_INACTIVE on the resuming side; the flag remains set on the other side. This ensures that the same image isn't written to by both instances (unless both are resumed, but then you get what you deserve). This is important because .bdrv_close for non-BDRV_O_INACTIVE images could write to the image file, which is definitely forbidden while another host is using the image. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com>	2016-01-20 13:36:23 +01:00
Kevin Wolf	04c01a5c8f	block: Rename BDRV_O_INCOMING to BDRV_O_INACTIVE Instead of covering only the state of images on the migration destination before the migration is completed, the flag will also cover the state of images on the migration source after completion. This common state implies that the image is technically still open, but no writes will happen and any cached contents will be reloaded from disk if and when the image leaves this state. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2016-01-20 13:36:23 +01:00
Kevin Wolf	23c88b2472	block: Fix error path in bdrv_invalidate_cache() We can only clear BDRV_O_INCOMING if the caches were actually invalidated. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2016-01-20 13:36:23 +01:00
Kevin Wolf	82dc8b4110	block: Fix .bdrv_open flags bdrv_common_open() modified bs->open_flags after inferring the set of options to pass to the driver's .bdrv_open callback. This means that the cache options were correctly set in bs->open_flags (and therefore correctly displayed in 'info block'), but the image would actually be opened with the default cache mode instead. This patch removes the flags parameter to bdrv_common_open() (except for BDRV_O_NO_BACKING it's the same as bs->open_flags anyway, and having two names for the same thing is confusing), and moves the assignment of open_flags down to immediately before calling into the block drivers. In all other places, bs->open_flags is now used consistently. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Tested-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-01-19 17:43:55 +01:00
Markus Armbruster	e43bfd9c87	error: Use error_prepend() where it makes obvious sense Done with this Coccinelle semantic patch @@ expression FMT, E1, E2; expression list ARGS; @@ - error_setg(E1, FMT, ARGS, error_get_pretty(E2)); + error_propagate(E1, E2);/###/ + error_prepend(E1, FMT/@@@/, ARGS); followed by manual cleanup, first because I can't figure out how to make Coccinelle transform strings, and second to get rid of now superfluous error_propagate(). We now use or propagate the original error whole instead of just its message obtained with error_get_pretty(). This avoids suppressing its hint (see commit `50b7b00`), but I can't see how the errors touched in this commit could come with hints. It also improves the message printed with &error_abort when we screw up (see commit `1e9b65b`). Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2016-01-13 15:16:17 +01:00
Markus Armbruster	cd5c2dac2e	block: Clean up "Could not create temporary overlay" error message bdrv_create() sets an error and returns -errno on failure. When the latter is interesting, the error is created with error_setg_errno(). bdrv_append_temp_snapshot() uses the error's message to create a new one with error_setg_errno(). This adds a strerror() that is either uninteresting or duplicate. Use error_setg() instead. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <1450452927-8346-7-git-send-email-armbru@redhat.com>	2016-01-13 15:16:16 +01:00
Paolo Bonzini	fc27291daf	block: use drained section in bdrv_close bdrv_close is used when ejecting a medium. Use a drained section to ensure that all I/O goes to either the old medium or the bitbucket. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 1450867706-19860-2-git-send-email-pbonzini@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2016-01-07 21:30:17 +01:00
Max Reitz	8b13976d3f	block: Add opaque value to the amend CB Add an opaque value which is to be passed to the bdrv_amend_options() status callback. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-12-18 14:34:43 +01:00
Kevin Wolf	91a097e747	block: Move cache options into options QDict This adds the cache mode options to the QDict, so that they can be specified for child nodes (e.g. backing.cache.direct=off). The cache modes are not removed from the flags at this point; instead, options and flags are kept in sync. If the user specifies both flags and options, the options take precedence. Child node inherit cache modes as options now, they don't use flags any more. Note that this forbids specifying the cache mode for empty drives. It didn't make sense anyway to specify it there, because it didn't have any effect. blockdev_init() considers the cache options now bdrv_open() options and therefore doesn't create an empty drive any more but calls into bdrv_open(). This in turn will fail with no driver and filename specified. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2015-12-18 14:34:43 +01:00
Kevin Wolf	ccf9dc07b5	block: reopen: Extract QemuOpts for generic block layer options This patch adds a QemuOpts for generic block layer options to bdrv_reopen_prepare(). The only two options that currently exist (node-name and driver) cannot be changed, so the only thing we do is putting them right back into the QDict so that we check at the end that they are indeed unchanged. We will add new options soon that can actually be changed. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2015-12-18 14:34:43 +01:00
Kevin Wolf	145f598e4a	block: Introduce bs->explicit_options bs->options doesn't only contain options that the user explicitly requested, but also option that were derived from flags, the filename or inherited from the parent node. For reopen, it is important to know the difference because reopening the parent can change inherited values in child nodes, but it shouldn't change any options that were explicitly specified for the child. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2015-12-18 14:34:43 +01:00
Kevin Wolf	de3b53f007	block: Split out parse_json_protocol() The next patch distinguishes options that were explicitly set and options that were derived. bdrv_fill_option() added options of both types: Options given by json: syntax should be counted as explicit, but the rest is derived. In preparation for the distinction, move json: parse to a separate function. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2015-12-18 14:34:43 +01:00
Kevin Wolf	8e2160e2c7	block: Add infrastructure for option inheritance Options are not actually inherited from the parent node yet, but this commit lays the grounds for doing so. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2015-12-18 14:34:42 +01:00
Kevin Wolf	2851810223	block: reopen: Document option precedence and refactor accordingly The interesting part of reopening an image is from which sources the effective options should be taken, i.e. which options take precedence over which other options. This patch documents the precedence that will be implemented in the following patches. It also refactors bdrv_reopen_queue(), so that the top-level reopened node is handled the same way as children are. Option/flag inheritance from the parent becomes just one item in the list and is done at the beginning of the function, similar to how the other items are/will be handled. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2015-12-18 14:34:42 +01:00
Kevin Wolf	4c9dfe5d8a	block: Allow specifying child options in reopen If the child was defined in the same context (-drive argument or blockdev-add QMP command) as its parent, a reopen of the parent should work the same and allow changing options of the child. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com>	2015-12-18 14:34:42 +01:00
Kevin Wolf	62392ebb09	block: Keep "driver" in bs->options Instead of passing a separate drv argument to bdrv_open_common(), just make sure that a "driver" option is set in the QDict. This also means that a "driver" entry is consistently present in bs->options now. This is another step towards keeping all options in the QDict (which is the represenation of the blockdev-add QMP command). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2015-12-18 14:34:42 +01:00
Kevin Wolf	4cdd01d32e	block: Pass driver-specific options to .bdrv_refresh_filename() In order to decide whether a blkdebug: filename can be produced or a json: one is necessary, blkdebug checked whether bs->options had more options than just "config", "x-image" or "image" (the latter including nested options). That doesn't work well when generic block layer options are present. This patch passes an option QDict to the driver that contains only driver-specific options, i.e. the options for the general block layer as well as child nodes are already filtered out. Works much better this way. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com>	2015-12-18 14:34:42 +01:00
Kevin Wolf	260fecf13b	block: Exclude nested options only for children in append_open_options() Some drivers have nested options (e.g. blkdebug rule arrays), which don't belong to a child node and shouldn't be removed. Don't remove all options with "." in their name, but check for the complete prefixes of actually existing child nodes. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2015-12-18 14:34:42 +01:00
Kevin Wolf	9e700c1ac6	block: Consider all block layer options in append_open_options The code already special-cased "node-name", which is currently the only option passed in the QDict that isn't driver-specific. Generalise the code to take all general block layer options into consideration. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com>	2015-12-18 14:34:42 +01:00
Kevin Wolf	d9b7b05703	block: Allow references for backing files For bs->file, using references to existing BDSes has been possible for a while already. This patch enables the same for bs->backing. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2015-12-18 14:34:42 +01:00
Kevin Wolf	cddff5bae1	block: Fix reopen with semantically overlapping options This fixes bdrv_reopen() calls like the following one: qemu-io -c 'open -o overlap-check.template=all /tmp/test.qcow2' \ -c 'reopen -o overlap-check=none' The approach taken so far would result in an options QDict that has both "overlap-check.template=all" and "overlap-check=none", which obviously conflicts. In this case, the old option should be overridden by the newly specified option. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com>	2015-12-18 14:34:42 +01:00
Eric Blake	a31939e6c8	blkdebug: Merge hand-rolled and qapi BlkdebugEvent enum No need to keep two separate enums, where editing one is likely to forget the other. Now that we can specify a qapi enum prefix, we don't even have to change the bulk of the uses. get_event_by_name() could perhaps be replaced by qapi_enum_parse(), but I left that for another day. CC: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <1447836791-369-20-git-send-email-eblake@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>	2015-12-17 08:21:27 +01:00
Fam Zheng	df9a681dc9	qed: Implement .bdrv_drain The "need_check_timer" is used to clear the "NEED_CHECK" flag in the image header after a grace period once metadata update has finished. In compliance to the bdrv_drain semantics we should make sure it remains deleted once .bdrv_drain is called. We cannot reuse qed_need_check_timer_cb because here it doesn't satisfy the assertion. Do the "plug" and "flush" calls manually. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-id: 1447064214-29930-10-git-send-email-famz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-11-12 16:22:43 +01:00
Peter Maydell	c459343b85	error: More error_setg() usage -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJWQ4F7AAoJEDhwtADrkYZTfLwP/RQrScyqDdJamcXGY1vgtoKW 2SgkGsx9zS8EwFpROugi5d4YeQItU469PI9KAK8Xlt+qsRqOmAlWAGwBp9s78bs5 P9BYNISdhpF2YcRWBz01rk8PqW80P30dwpCNJQtG/xMv2fGoqFm7/OlFLuv1rfXi G/Yo5qIVPshOjmwXBY1CHckhMylhZrkuLz+7DDfhK9dQbgcmHc8C8VruiWhLJmdx lP+pkGVY0U7w6vwH0+FQKzMJnsrCfdweK1MXrp6j/mSB0u8YigyQ91eaFu68XZYr MaPAJYAPvrBwK4AzW/hzYNeFkJmAHTAb8BCz5MfjVDjnWOR97+IF1RWd+OFUSdnC r0m40N1e6L9AybQROoM23dEVmAH+gwObbR+np718tn5/HyDthisfjgGPJG2F7Sik GXUWbv5fu0371e1GIXYPXsrkyZ8+psLVnSFqp+I77RbVlLh4qaSBd75cQA21s/Md vmWL+byIE9GU2PHjoVxV49j24ULYjahSmewxwT8n2sMLOWKmqHVYAFJc06HP7Udc pVqGewMv4eYZgNuUbclEWwdXof6qJ35uksM4C5Ps4oLQood8MSINBDzQkIed4Ylx rrCgB9J5+tlQbxTZkvhpxrpXEoEbfT+X57cv9oBTnEyMZ4hknzVSx2+pri8LWWhe hWyZ9yJDZobRfxJy9bD0 =FMeO -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/armbru/tags/pull-error-2015-11-11' into staging error: More error_setg() usage # gpg: Signature made Wed 11 Nov 2015 17:57:15 GMT using RSA key ID EB918653 # gpg: Good signature from "Markus Armbruster <armbru@redhat.com>" # gpg: aka "Markus Armbruster <armbru@pond.sub.org>" * remotes/armbru/tags/pull-error-2015-11-11: error: More error_setg() usage Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2015-11-12 10:09:14 +00:00
Eric Blake	455b0fde8c	error: More error_setg() usage A few uses of error_set(ERROR_CLASS_GENERIC_ERROR) were missed in `c6bd8c706`, or have snuck in since. Nuke them. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <1447224690-9743-19-git-send-email-eblake@redhat.com> Acked-by: Andreas Färber <afaerber@suse.de> [Indentation tidied up, commit message tweaked] Signed-off-by: Markus Armbruster <armbru@redhat.com>	2015-11-11 18:56:26 +01:00
Alberto Garcia	a0d64a61db	throttle: Use bs->throttle_state instead of bs->io_limits_enabled There are two ways to check for I/O limits in a BlockDriverState: - bs->throttle_state: if this pointer is not NULL, it means that this BDS is member of a throttling group, its ThrottleTimers structure has been initialized and its I/O limits are ready to be applied. - bs->io_limits_enabled: if true it means that the throttle_state pointer is valid _and_ the limits are currently enabled. The latter is used in several places to check whether a BDS has I/O limits configured, but what it really checks is whether requests are being throttled or not. For example, io_limits_enabled can be temporarily set to false in cases like bdrv_read_unthrottled() without otherwise touching the throtting configuration of that BDS. This patch replaces bs->io_limits_enabled with bs->throttle_state in all cases where what we really want to check is the existence of I/O limits, not whether they are currently enabled or not. Signed-off-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-11-11 16:25:47 +01:00
Alberto Garcia	3e8c2e5705	block: support passing 'backing': '' to 'blockdev-add' Passing an empty string allows opening an image but not its backing file. This was already described in the API documentation, only the implementation was missing. This is useful for creating snapshots using images opened with blockdev-add, since they are not supposed to have a backing image before the operation. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-11-11 16:25:47 +01:00
Max Reitz	c69a4dd899	block: Make bdrv_states public When inserting a BDS tree into a BB, we will need to add the root BDS to this list. Since we will want to do that in the blockdev-insert-medium implementation in blockdev.c, we will need access to it there. This patch is not exactly elegant, but bdrv_states will be removed in the future anyway because we no longer need it since we have BBs. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-11-11 16:22:46 +01:00
Alberto Garcia	9f4ed6fbd2	block: Don't call blk_bs() twice in bdrv_lookup_bs() Signed-off-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-11-11 16:22:46 +01:00
Max Reitz	5433c24f0f	block: Prepare for NULL BDS blk_bs() will not necessarily return a non-NULL value any more (unless blk_is_available() is true or it can be assumed to otherwise, e.g. because it is called immediately after a successful blk_new_with_bs() or blk_new_open()). Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-10-23 18:18:23 +02:00
Max Reitz	373340b26c	block: Move I/O status and error actions into BB These options are only relevant for the user of a whole BDS tree (like a guest device or a block job) and should thus be moved into the BlockBackend. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-10-23 18:18:23 +02:00
Max Reitz	7f0e9da6f1	block: Move BlockAcctStats into BlockBackend As the comment above bdrv_get_stats() says, BlockAcctStats is something which belongs to the device instead of each BlockDriverState. This patch therefore moves it into the BlockBackend. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-10-23 18:18:23 +02:00
Max Reitz	68e9ec017b	block: Move guest_block_size into BlockBackend guest_block_size is a guest device property so it should be moved into the interface between block layer and guest devices, which is the BlockBackend. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-10-23 18:18:23 +02:00
Max Reitz	b4d02820d9	block: Invoke change media CB before NULLing drv In order to handle host device passthrough, some guest device models may call blk_is_inserted() to check whether the medium is inserted on the host, when checking the guest tray status. This tray status is inquired by blk_dev_change_media_cb(); because bdrv_is_inserted() (invoked by blk_is_inserted()) always returns false for BDS with drv set to NULL, blk_dev_change_media_cb() should therefore be called before drv is set to NULL. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-10-23 18:18:23 +02:00
Max Reitz	28d7a78996	block: Make bdrv_is_inserted() recursive If bdrv_is_inserted() is called on the top level BDS, it should make sure all nodes in the BDS tree are actually inserted. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-10-23 18:18:22 +02:00
Max Reitz	e031f75048	block: Make bdrv_is_inserted() return a bool Make bdrv_is_inserted(), blk_is_inserted(), and the callback BlockDriver.bdrv_is_inserted() return a bool. Suggested-by: Eric Blake <eblake@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-10-23 18:18:22 +02:00
Max Reitz	d44f928a54	block: Set BDRV_O_INCOMING in bdrv_fill_options() This flag should not be set for the root BDS only, but for any BDS that is being created while incoming migration is pending, so setting it is moved from blockdev_init() to bdrv_fill_options(). Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-10-23 18:18:22 +02:00
Daniel P. Berrange	10817bf09d	coroutine: move into libqemuutil.a library The coroutine files are currently referenced by the block-obj-y variable. The coroutine functionality though is already used by more than just the block code. eg migration code uses coroutine yield. In the future the I/O channel code will also use the coroutine yield functionality. Since the coroutine code is nicely self-contained it can be easily built as part of the libqemuutil.a library, making it widely available. The headers are also moved into include/qemu, instead of the include/block directory, since they are now part of the util codebase, and the impl was never in the block/ directory either. Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2015-10-20 14:59:04 +01:00
Jeff Cody	15489c769b	block: auto-generated node-names If a node-name is not specified, automatically generate the node-name. Generated node-names will use the "block" sub-system identifier. Signed-off-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-10-16 15:34:30 +02:00
Kevin Wolf	779020cbdc	block: Allow bdrv_unref_child(bs, NULL) bdrv_unref() can be called with a NULL argument and doesn't do anything then. Make bdrv_unref_child() consistent with it. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com>	2015-10-16 15:34:30 +02:00
Kevin Wolf	8e419aefa0	block: Remove bdrv_swap() bdrv_swap() is unused now. Remove it and all functions that have no other users than bdrv_swap(). In particular, this removes the .bdrv_rebind callbacks from block drivers. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-10-16 15:34:30 +02:00
Kevin Wolf	3f09bfbc7b	block: Add and use bdrv_replace_in_backing_chain() This cleans up the mess we left behind in the mirror code after the previous patch. Instead of using bdrv_swap(), just change pointers. The interface change of the mirror job that callers must consider is that after job completion, their local BDS pointers still point to the same node now. qemu-img must change its code accordingly (which makes it easier to understand); the other callers stays unchanged because after completion they don't do anything with the BDS, but just with the job, and the job is still owned by the source BDS. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-10-16 15:34:30 +02:00
Kevin Wolf	dd62f1ca43	block: Implement bdrv_append() without bdrv_swap() Remember all parent nodes and just change the pointers there instead of swapping the contents of the BlockDriverState. Handling of snapshot=on must be moved further down in bdrv_open() because *pbs (which is the bs pointer in the BlockBackend) must already be set before bdrv_append() is called. Otherwise bdrv_append() changes the BB's pointer to the temporary snapshot, but bdrv_open() overwrites it with the read-only original image. We also need to be careful to update callers as the interface changes (becomes less insane): Previously, the meaning of the two parameters was inverted when bdrv_append() returns. Now any BDS pointers keep pointing to the same node. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-10-16 15:34:30 +02:00
Kevin Wolf	d42a8a935b	block: Introduce parents list Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-10-16 15:34:29 +02:00
Kevin Wolf	063dd40e11	block: Split bdrv_move_feature_fields() After bdrv_swap(), some fields must be moved back to their original BDS to compensate for the effects that a swap of the contents of the objects has while keeping the old addresses. Other fields must be moved back because they should logically be moved and must stay on top When replacing bdrv_swap() with operations changing the pointers in the parents, we only need the latter and must avoid swapping the former. Split the function accordingly. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-10-16 15:34:29 +02:00
Kevin Wolf	5db15a5769	block: Manage backing file references in bdrv_set_backing_hd() This simplifies the code somewhat, especially when dropping whole backing file subchains. The exception is the mirroring code that does adventurous things with bdrv_swap() and in order to keep it working, I had to duplicate most of bdrv_set_backing_hd() locally. We'll get rid again of this ugliness shortly. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-10-16 15:34:29 +02:00
Kevin Wolf	760e006384	block: Convert bs->backing_hd to BdrvChild This is the final step in converting all of the BlockDriverState pointers that block drivers use to BdrvChild. After this patch, bs->children contains the full list of child nodes that are referenced by a given BDS, and these children are only referenced through BdrvChild, so that updating the pointer in there is enough for changing edges in the graph. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-10-16 15:34:29 +02:00
Kevin Wolf	b26e90f56a	block: Remove bdrv_open_image() It is unused now. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-10-16 15:34:29 +02:00
Kevin Wolf	9a4f4c3156	block: Convert bs->file to BdrvChild This patch removes the temporary duplication between bs->file and bs->file_child by converting everything to BdrvChild. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-10-16 15:34:29 +02:00
Kevin Wolf	1fdd693308	block: Introduce BDS.file_child Store the BdrvChild for bs->file. At this point, bs->file_child->bs just duplicates the bs->file pointer. Later, it will completely replace it. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-10-16 15:34:29 +02:00
Alberto Garcia	99b7e77567	block: disable I/O limits at the beginning of bdrv_close() Disabling I/O limits from a BDS also drains all pending throttled requests, so it should be done at the beginning of bdrv_close() with the rest of the bdrv_drain() calls before the BlockDriver is closed. Signed-off-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-10-02 13:48:29 +02:00
Kevin Wolf	4d2cb09251	block: Allow specifying driver-specific options to reopen Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2015-09-14 16:51:36 +02:00
Max Reitz	cf25ff850f	block: Drop bdrv_find_whitelisted_format() It is unused by now, so we can drop it. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-09-14 16:51:36 +02:00
Max Reitz	053e1578c9	block: Drop drv parameter from bdrv_fill_options() Now that this parameter is effectively unused, we can drop it and change the function accordingly. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-09-14 16:51:36 +02:00
Max Reitz	ce34377124	block: Drop drv parameter from bdrv_open_inherit() Now that this parameter is effectively unused, we can drop it and just pass NULL to bdrv_fill_options(). Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-09-14 16:51:36 +02:00
Max Reitz	6ebf9aa2ef	block: Drop drv parameter from bdrv_open() Now that this parameter is effectively unused, we can drop it and just pass NULL on to bdrv_open_inherit(). Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-09-14 16:51:36 +02:00
Max Reitz	e6641719fe	block: Always pass NULL as drv for bdrv_open() Change all callers of bdrv_open() to pass the driver name in the options QDict instead of passing its BlockDriver pointer. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-09-14 16:51:36 +02:00
Kővágó, Zoltán	fe646693ac	opts: produce valid command line in qemu_opts_print This will let us print options in a format that the user would actually write it on the command line (foo=bar,baz=asd,etc=def), without prepending a spurious comma at the beginning of the list, or quoting values unnecessarily. This patch provides the following changes: * write and id=, if the option has an id * do not print separator before the first element * do not quote string arguments * properly escape commas (,) for QEMU Signed-off-by: Kővágó, Zoltán <DirtY.iCE.hu@gmail.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2015-09-11 10:21:38 +03:00
Wen Congyang	e12f378409	block: more check for replaced node We use mirror+replace to fix quorum's broken child. bs/s->common.bs is quorum, and to_replace is the broken child. The new child is target_bs. Without this patch, the replace node can be any node, and it can be top BDS with BB, or another quorum's child. We just check if the broken child is part of the quorum BDS in this patch. Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Message-id: 55A86486.1000404@cn.fujitsu.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-09-02 14:56:39 +01:00
Kevin Wolf	80a1e13091	block: Fix backing file child when modifying graph This patch moves bdrv_attach_child() from the individual places that add a backing file to a BDS to bdrv_set_backing_hd(), which is called by all of them. It also adds bdrv_detach_child() there. For normal operation (starting with one backing file chain and not changing it until the topmost image is closed) and live snapshots, this constitutes no change in behaviour. For all other cases, this is a fix for the bug that the old backing file was still referenced as a child, and the new one wasn't referenced. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2015-07-14 17:15:23 +02:00
Kevin Wolf	9a7dedbc43	block: Reorder cleanups in bdrv_close() Block drivers may still want to access their child nodes in their .bdrv_close handler. If they unref and/or detach a child by themselves, this should not result in a double free. There is additional code for backing files, which are just a special case of child nodes. The same applies for them. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2015-07-14 17:15:23 +02:00
Kevin Wolf	33a604075c	block: Introduce bdrv_unref_child() This is the counterpart for bdrv_open_child(). It decreases the reference count of the child BDS and removes it from the list of children of the given parent BDS. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2015-07-14 17:15:23 +02:00
Kevin Wolf	b4b059f628	block: Introduce bdrv_open_child() It is the same as bdrv_open_image(), except that it doesn't only return success or failure, but the newly created BdrvChild object for the new child node. As the BdrvChild object already contains a BlockDriverState pointer (and this is supposed to become the only pointer so that bdrv_append() and friends can just change a single pointer in BdrvChild), the pbs parameter is removed for bdrv_open_child(). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2015-07-14 17:15:18 +02:00
Kevin Wolf	df58179267	block: Move bdrv_attach_child() calls up the call chain Let the callers of bdrv_open_inherit() call bdrv_attach_child(). It needs to be called in all cases where bdrv_open_inherit() succeeds (i.e. returns 0) and a child_role is given. bdrv_attach_child() is moved upwards to avoid a forward declaration. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2015-07-14 15:55:19 +02:00
Fam Zheng	53ec73e264	block: Use bdrv_drain to replace uncessary bdrv_drain_all There callers work on a single BlockDriverState subtree, where using bdrv_drain() is more accurate. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-07-07 14:27:14 +01:00
Fam Zheng	c2e0dbbfd7	block: Initialize local_err in bdrv_append_temp_snapshot Cc: qemu-stable@nongnu.org Signed-off-by: Fam Zheng <famz@redhat.com> Message-id: 1436156684-16526-1-git-send-email-famz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-07-07 14:27:14 +01:00
Fam Zheng	6e82e4bce1	block: Remove bdrv_reset_dirty Using this function would always be wrong because a dirty bitmap must have a specific owner that consumes the dirty bits and calls bdrv_reset_dirty_bitmap(). Remove the unused function to avoid future misuse. Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-07-02 10:06:23 +01:00
Dimitris Aragiorgis	b192af8acc	block: Use bdrv_is_sg() everywhere Instead of checking bs->sg use bdrv_is_sg() consistently throughout the code. Signed-off-by: Dimitris Aragiorgis <dimara@arrikto.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 1435056300-14924-2-git-send-email-dimara@arrikto.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-06-23 15:08:52 +01:00
Wen Congyang	c6a8c3283f	util/hbitmap: Add an API to reset all set bits in hbitmap The function bdrv_clear_dirty_bitmap() is updated to use faster hbitmap_reset_all() call. Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com> Signed-off-by: Gonglei <arei.gonglei@huawei.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 555E868A.60506@cn.fujitsu.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-06-23 15:06:16 +01:00
Markus Armbruster	cc7a8ea740	Include qapi/qmp/qerror.h exactly where needed In particular, don't include it into headers. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Luiz Capitulino <lcapitulino@redhat.com>	2015-06-22 18:20:41 +02:00
Markus Armbruster	d49b683644	qerror: Move #include out of qerror.h Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Luiz Capitulino <lcapitulino@redhat.com>	2015-06-22 18:20:40 +02:00
Markus Armbruster	c6bd8c706a	qerror: Clean up QERR_ macros to expand into a single string These macros expand into error class enumeration constant, comma, string. Unclean. Has been that way since commit `13f59ae`. The error class is always ERROR_CLASS_GENERIC_ERROR since the previous commit. Clean up as follows: * Prepend every use of a QERR_ macro by ERROR_CLASS_GENERIC_ERROR, and delete it from the QERR_ macro. No change after preprocessing. * Rewrite error_set(ERROR_CLASS_GENERIC_ERROR, ...) into error_setg(...). Again, no change after preprocessing. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Luiz Capitulino <lcapitulino@redhat.com>	2015-06-22 18:20:40 +02:00
Peter Maydell	f3e3b083d4	Block layer core and image format patches -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQIcBAABAgAGBQJVevYFAAoJEH8JsnLIjy/W3jEP/0hiQ3rCRZ/he8s5maTdT+TR YSeHkB5rKpz0Uopn1DMn1QrIbUVzX7dyb+uf9zQ0/xRQIzf6k8uxqU/NWrdoF3NK qx91dGWedwnG+TEBIMbcR7nMrw4dP6kH7uPz/VWMXDHVLz0HIcD95qhKgs0mSY6J dWqex6ACjXM68zJU5IioagU9evV80WZE1S8z7zfixxtTBx5hCaTVbwalkaCxcrXw PbZle55rjI8B10+OzgBw0fq10nias+NTndU9CwNBboxmEtAjq8/mQ663vcWlmiFo 9a/hkda27Z5ut/0Tqk1v4uLHauylp++rrAabPBAuCFMKes6cdkddP15Q/r52aJ29 5meodQtbet1rGrM+Aq4vuSuWId71PGypEI/3URDdNfYFNISoeLLsk4lcQUu7VrDD sRX3Jt8SI3nkIgOnhPyi7NDPmafxFt8yRt5vM8MyR5ynF8NS/2hiAc3wqnbXGjUj a5GqDCefb1yM0R5HvksuFFt3OnXlKJQ3J+ksXNUJf9DSAZPauqWD696pcTeg8wyy 3PIGkczgUuKTVfFWd3THZxJLAo7ZuqvXBHHV8o1SeMBDxwh4FhTd8Kjvm3rUNFfl VDox4qwZ1AcxLrxgqazKU7sD9iWBDHURRcpOoUsBys7oxQZnhcmMp1fRlEkTOyrD HNiSNByqBrtkfeVzHlSe =QgYk -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging Block layer core and image format patches # gpg: Signature made Fri Jun 12 16:08:53 2015 BST using RSA key ID C88F2FD6 # gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" * remotes/kevin/tags/for-upstream: (25 commits) block: Fix reopen flag inheritance block: Add BlockDriverState.inherits_from block: Add list of children to BlockDriverState queue.h: Add QLIST_FIX_HEAD_PTR() block: Drain requests before swapping nodes in bdrv_swap() block: Move flag inheritance to bdrv_open_inherit() block: Use QemuOpts in bdrv_open_common() block: Use macro for cache option names vmdk: Use bdrv_open_image() quorum: Use bdrv_open_image() check-qdict: Test cases for new functions qdict: Add qdict_{set,copy}_default() qdict: Add qdict_array_entries() iotests: Add tests for overriding BDRV_O_PROTOCOL block: driver should override flags in bdrv_open() block: Change bitmap truncate conditional to assertion block: record new size in bdrv_dirty_bitmap_truncate raw-posix: Fix .bdrv_co_get_block_status() for unaligned image size vmdk: Use vmdk_find_index_in_cluster everywhere vmdk: Fix index_in_cluster calculation in vmdk_co_get_block_status ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2015-06-15 10:43:06 +01:00
Kevin Wolf	67251a3113	block: Fix reopen flag inheritance When reopening an image, the block layer already takes care to reopen bs->file as well with recalculated inherited flags. The same must happen for any other child (most notably missing before this patch: backing files). If bs->file (or any other child) didn't originally inherit from bs, e.g. because it was created separately and then only referenced, it must not inherit flags on reopen either, so check the inherited_from field before propagation the reopen down. VMDK already reopened its extents manually; this code can now be dropped. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2015-06-12 17:04:59 +02:00
Kevin Wolf	bddcec3745	block: Add BlockDriverState.inherits_from Currently, the block layer assumes that any block node can have only one parent, and if it has a parent, that it inherits some options/flags from this parent. This is not true any more: With references used in block device creation, a single node can be used by multiple parents, or it can be created separately and not inherit flags from any parent. To handle reopens correctly, a node must know from which parent it inherited options. This patch adds the information to BlockDriverState. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2015-06-12 17:04:59 +02:00
Kevin Wolf	6e93e7c41f	block: Add list of children to BlockDriverState This allows iterating over all children of a given BDS, not only including bs->file and bs->backing_hd, but also driver-specific ones like VMDK extents or Quorum children. For bdrv_swap(), the list of children of the swapped BDS stays at that BDS (because that's where the pointers stay as well). The list head moves and pointers to it must be fixed up therefore. The list of children in the parent of the swapped BDS is not affected by the swap. The contents of the BDS objects is swapped, so the existing pointer in the parent automatically points to the newly swapped in BDS. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2015-06-12 17:04:59 +02:00
Kevin Wolf	6ee4ce1ee7	block: Drain requests before swapping nodes in bdrv_swap() bdrv_swap() requires that there are no requests in flight on either of the two devices. The request coroutine would work on the wrong BlockDriverState object (with bs->opaque even being interpreted as a different type potentially) and all sorts of bad things would result from this. The currently existing callers mostly ensure that there is no I/O pending on nodes that are swapped. In detail, this is: 1. Live snapshots. This goes through qmp_transaction(), which calls bdrv_drain_all() before doing anything. The command is executed synchronously, so no new I/O can be issued concurrently. 2. snapshot=on in bdrv_open(). We're in the middle of opening the image (both the original image and its temporary overlay), so there can't be any I/O in flight yet. 3. Mirroring. bdrv_drain() is already used on the source device so that the mirror doesn't miss anything. However, the main loop runs between that and the bdrv_swap() (which is actually a bug, being addressed in another series), so there is a small window in which new I/O might be issued that would be in flight during bdrv_swap(). It is safer to just drain the request queue of both devices in bdrv_swap() instead of relying on callers to do the right thing. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2015-06-12 17:04:59 +02:00
Kevin Wolf	f3930ed0bb	block: Move flag inheritance to bdrv_open_inherit() Instead of letting every caller of bdrv_open() determine the right flags for its child node manually and pass them to the function, pass the parent node and the role of the newly opened child (like backing file, protocol layer, etc.). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2015-06-12 17:04:59 +02:00
Kevin Wolf	18edf289a8	block: Use QemuOpts in bdrv_open_common() Instead of manually parsing options and then deleting them from the options QDict, just use QemuOpts like most other places that deal with block device options. More options will be added there and then QemuOpts is a lot more manageable than open-coding everything. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2015-06-12 16:59:06 +02:00
Max Reitz	53a2951312	block: driver should override flags in bdrv_open() The BDRV_O_PROTOCOL flag should have an impact only if no driver is specified explicitly. Therefore, if bdrv_open() is called with an explicit block driver argument (either through the options QDict or through the drv parameter) and that block driver is a protocol block driver, BDRV_O_PROTOCOL should be set; if it is a format block driver, BDRV_O_PROTOCOL should be unset. While there was code to unset the flag in case a format block driver has been selected, it only followed the bdrv_fill_options() function call whereas the flag in fact needs to be adjusted before it is used there. With that change, BDRV_O_PROTOCOL will always be set if the BDS should be a protocol driver; if the driver has been specified explicitly, the new code will set it; and bdrv_fill_options() will only "probe" a protocol driver if BDRV_O_PROTOCOL is set. The probing after bdrv_fill_options() cannot select a protocol driver. Thus, bdrv_open_image() to open BDS.file is never called if a protocol BDS is about to be created. With that change in turn it is impossible to call bdrv_open_common() with a protocol drv and file != NULL, which allows us to remove the bdrv_swap() call. This change breaks a test case in qemu-iotest 051: "-drive file=t.qcow2,file.driver=qcow2" now works because the explicitly specified "qcow2" overrides the BDRV_O_PROTOCOL which is automatically set for the "file" BDS (and the filename is just passed down). Therefore, this patch removes that test case. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-06-12 15:54:07 +02:00
John Snow	06207b0ff5	block: Change bitmap truncate conditional to assertion This is an artifact of an older version that had both all-bitmap and single-bitmap truncate functions, and some info got lost in the shuffle. Bitmaps can only be frozen during a backup operation, and a backup operation should prevent a resize operation, so just assert that this cannot happen. Suggested-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-06-12 15:54:01 +02:00
John Snow	5270b6a0d0	block: record new size in bdrv_dirty_bitmap_truncate `ce1ffea8` neglected to update the BdrvDirtyBitmap structure itself for internal consistency. It's currently not an issue, but for migration and persistence series this will cause headaches. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-06-12 15:54:01 +02:00
Alberto Garcia	db6283385c	throttle: acquire the ThrottleGroup lock in bdrv_swap() bdrv_swap() touches the fields of a BlockDriverState that are protected by the ThrottleGroup lock. Although those fields end up in their original place, they are temporarily swapped in the process, so there's a chance that an operation on a member of the same group happening on a different thread can try to use them. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: d92dc40d7c4f1fc5cda5cbbf4ffb7a4670b79d17.1433779731.git.berto@igalia.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-06-12 14:00:00 +01:00
Alberto Garcia	76f4afb40f	throttle: Add throttle group support The throttle group support use a cooperative round robin scheduling algorithm. The principles of the algorithm are simple: - Each BDS of the group is used as a token in a circular way. - The active BDS computes if a wait must be done and arms the right timer. - If a wait must be done the token timer will be armed so the token will become the next active BDS. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: f0082a86f3ac01c46170f7eafe2101a92e8fde39.1433779731.git.berto@igalia.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-06-12 14:00:00 +01:00
Benoît Canet	0e5b0a2d54	throttle: Extract timers from ThrottleState into a separate structure Group throttling will share ThrottleState between multiple bs. As a consequence the ThrottleState will be accessed by multiple aio context. Timers are tied to their aio context so they must go out of the ThrottleState structure. This commit paves the way for each bs of a common ThrottleState to have its own timer. Signed-off-by: Benoit Canet <benoit.canet@nodalink.com> Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 6cf9ea96d8b32ae2f8769cead38f68a6a0c8c909.1433779731.git.berto@igalia.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-06-12 14:00:00 +01:00
John Snow	9abe3bdc45	qapi: add dirty bitmap status Bitmaps can be in a handful of different states with potentially more to come as we tool around with migration and persistence patches. Management applications may need to know why certain bitmaps are unavailable for various commands, e.g. busy in another operation, busy being migrated, etc. Right now, all we offer is BlockDirtyInfo's boolean member 'frozen'. Instead of adding more booleans, replace it by an enumeration member 'status' with values 'active' and 'frozen'. Then add new value 'disabled'. Incompatible change. Fine because the changed part hasn't been released so far. Suggested-by: Eric Blake <eblake@redhat.com> Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> [Commit message tweaked] Signed-off-by: Markus Armbruster <armbru@redhat.com>	2015-05-29 12:53:12 +02:00
Fam Zheng	4a9c9ea0d3	block: Detect multiplication overflow in bdrv_getlength Bogus image may have a large total_sectors that will overflow the multiplication. For cleanness, fix the return code so the error message will be meaningful. Reported-by: Richard W.M. Jones <rjones@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-05-22 17:08:01 +02:00
Denis V. Lunev	459b4e6612	block: align bounce buffers to page The following sequence int fd = open(argv[1], O_RDWR \| O_CREAT \| O_DIRECT, 0644); for (i = 0; i < 100000; i++) write(fd, buf, 4096); performs 5% better if buf is aligned to 4096 bytes. The difference is quite reliable. On the other hand we do not want at the moment to enforce bounce buffering if guest request is aligned to 512 bytes. The patch changes default bounce buffer optimal alignment to MAX(page size, 4k). 4k is chosen as maximal known sector size on real HDD. The justification of the performance improve is quite interesting. From the kernel point of view each request to the disk was split by two. This could be seen by blktrace like this: 9,0 11 1 0.000000000 11151 Q WS 312737792 + 1023 [qemu-img] 9,0 11 2 0.000007938 11151 Q WS 312738815 + 8 [qemu-img] 9,0 11 3 0.000030735 11151 Q WS 312738823 + 1016 [qemu-img] 9,0 11 4 0.000032482 11151 Q WS 312739839 + 8 [qemu-img] 9,0 11 5 0.000041379 11151 Q WS 312739847 + 1016 [qemu-img] 9,0 11 6 0.000042818 11151 Q WS 312740863 + 8 [qemu-img] 9,0 11 7 0.000051236 11151 Q WS 312740871 + 1017 [qemu-img] 9,0 5 1 0.169071519 11151 Q WS 312741888 + 1023 [qemu-img] After the patch the pattern becomes normal: 9,0 6 1 0.000000000 12422 Q WS 314834944 + 1024 [qemu-img] 9,0 6 2 0.000038527 12422 Q WS 314835968 + 1024 [qemu-img] 9,0 6 3 0.000072849 12422 Q WS 314836992 + 1024 [qemu-img] 9,0 6 4 0.000106276 12422 Q WS 314838016 + 1024 [qemu-img] and the amount of requests sent to disk (could be calculated counting number of lines in the output of blktrace) is reduced about 2 times. Both qemu-img and qemu-io are affected while qemu-kvm is not. The guest does his job well and real requests comes properly aligned (to page). Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-id: 1431441056-26198-3-git-send-email-den@openvz.org CC: Paolo Bonzini <pbonzini@redhat.com> CC: Kevin Wolf <kwolf@redhat.com> CC: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-05-22 09:37:33 +01:00
Denis V. Lunev	4196d2f030	block: minimal bounce buffer alignment The patch introduces new concept: minimal memory alignment for bounce buffers. Original so called "optimal" value is actually minimal required value for aligment. It should be used for validation that the IOVec is properly aligned and bounce buffer is not required. Though, from the performance point of view, it would be better if bounce buffer or IOVec allocated by QEMU will be aligned stricter. The patch does not change any alignment value yet. Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-id: 1431441056-26198-2-git-send-email-den@openvz.org CC: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> CC: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-05-22 09:37:33 +01:00

... 2 3 4 5 6 ...

1172 Commits