mirrors/qemu - qemu - SynapseOS git

Author	SHA1	Message	Date
Max Reitz	8f3a73bc57	block: Add blk_dev_has_tray() Pull out the check whether a block device has a tray from blk_dev_is_tray_open() into its own function so both attributes (whether there is a tray vs. whether that tray is open) can be queried independently. Cc: qemu-stable <qemu-stable@nongnu.org> Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 1454096953-31773-2-git-send-email-mreitz@redhat.com	2016-02-02 17:46:56 +01:00
Kevin Wolf	76b1c7fe1c	block: Inactivate BDS when migration completes So far, live migration with shared storage meant that the image is in a not-really-ready don't-touch-me state on the destination while the source is still actively using it, but after completing the migration, the image was fully opened on both sides. This is bad. This patch adds a block driver callback to inactivate images on the source before completing the migration. Inactivation means that it goes to a state as if it was just live migrated to the qemu instance on the source (i.e. BDRV_O_INACTIVE is set). You're then supposed to continue either on the source or on the destination, which takes ownership of the image. A typical migration looks like this now with respect to disk images: 1. Destination qemu is started, the image is opened with BDRV_O_INACTIVE. The image is fully opened on the source. 2. Migration is about to complete. The source flushes the image and inactivates it. Now both sides have the image opened with BDRV_O_INACTIVE and are expecting the other side to still modify it. 3. One side (the destination on success) continues and calls bdrv_invalidate_all() in order to take ownership of the image again. This removes BDRV_O_INACTIVE on the resuming side; the flag remains set on the other side. This ensures that the same image isn't written to by both instances (unless both are resumed, but then you get what you deserve). This is important because .bdrv_close for non-BDRV_O_INACTIVE images could write to the image file, which is definitely forbidden while another host is using the image. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com>	2016-01-20 13:36:23 +01:00
Stefan Hajnoczi	bd44feb754	block: add BlockLimits.max_iov field The maximum number of struct iovec elements depends on the BlockDriverState. The raw-posix and iSCSI protocols have a maximum of IOV_MAX but others could have different values. Cc: Peter Lieven <pl@kamp.de> Suggested-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-12-22 16:01:07 +08:00
Max Reitz	8b13976d3f	block: Add opaque value to the amend CB Add an opaque value which is to be passed to the bdrv_amend_options() status callback. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-12-18 14:34:43 +01:00
Kevin Wolf	145f598e4a	block: Introduce bs->explicit_options bs->options doesn't only contain options that the user explicitly requested, but also option that were derived from flags, the filename or inherited from the parent node. For reopen, it is important to know the difference because reopening the parent can change inherited values in child nodes, but it shouldn't change any options that were explicitly specified for the child. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2015-12-18 14:34:43 +01:00
Kevin Wolf	8e2160e2c7	block: Add infrastructure for option inheritance Options are not actually inherited from the parent node yet, but this commit lays the grounds for doing so. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2015-12-18 14:34:42 +01:00
Kevin Wolf	4cdd01d32e	block: Pass driver-specific options to .bdrv_refresh_filename() In order to decide whether a blkdebug: filename can be produced or a json: one is necessary, blkdebug checked whether bs->options had more options than just "config", "x-image" or "image" (the latter including nested options). That doesn't work well when generic block layer options are present. This patch passes an option QDict to the driver that contains only driver-specific options, i.e. the options for the general block layer as well as child nodes are already filtered out. Works much better this way. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com>	2015-12-18 14:34:42 +01:00
Kevin Wolf	260fecf13b	block: Exclude nested options only for children in append_open_options() Some drivers have nested options (e.g. blkdebug rule arrays), which don't belong to a child node and shouldn't be removed. Don't remove all options with "." in their name, but check for the complete prefixes of actually existing child nodes. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2015-12-18 14:34:42 +01:00
Kevin Wolf	5365f44dfa	qcow2: Add .bdrv_join_options callback qcow2 accepts a few driver-specific options that overlap semantically (e.g. "overlap-check" is an alias of "overlap-check.template", and any missing cache size option is derived from the given ones). When bdrv_reopen() merges the set of updated options with left out options that should be kept at their old value, we need to consider this and filter out any duplicates (which would generally cause errors because new and old value would contradict each other). This patch adds a .bdrv_join_options callback to BlockDriver and implements it for qcow2. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com>	2015-12-18 14:34:42 +01:00
Eric Blake	a31939e6c8	blkdebug: Merge hand-rolled and qapi BlkdebugEvent enum No need to keep two separate enums, where editing one is likely to forget the other. Now that we can specify a qapi enum prefix, we don't even have to change the bulk of the uses. get_event_by_name() could perhaps be replaced by qapi_enum_parse(), but I left that for another day. CC: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <1447836791-369-20-git-send-email-eblake@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>	2015-12-17 08:21:27 +01:00
John Snow	78f51fde88	block: Add BlockJobTxn support to backup_run Allow a BlockJobTxn to be passed into backup_run, which will allow the job to join a transactional group if present. Propagate this new parameter outward into new QMP helper functions in blockdev.c to allow transaction commands to pass forward their BlockJobTxn object in a forthcoming patch. [split up from a patch originally by Stefan and Fam. --js] Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: John Snow <jsnow@redhat.com> Signed-off-by: John Snow <jsnow@redhat.com> Message-id: 1446765200-3054-12-git-send-email-jsnow@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-11-12 16:22:44 +01:00
Fam Zheng	df9a681dc9	qed: Implement .bdrv_drain The "need_check_timer" is used to clear the "NEED_CHECK" flag in the image header after a grace period once metadata update has finished. In compliance to the bdrv_drain semantics we should make sure it remains deleted once .bdrv_drain is called. We cannot reuse qed_need_check_timer_cb because here it doesn't satisfy the assertion. Do the "plug" and "flush" calls manually. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-id: 1447064214-29930-10-git-send-email-famz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-11-12 16:22:43 +01:00
Fam Zheng	67da1dc5ce	block: Introduce BlockDriver.bdrv_drain callback Drivers can have internal request sources that generate IO, like the need_check_timer in QED. Since we want quiesced periods that contain nested event loops in block layer, we need to have a way to disable such event sources. Block drivers must implement the "bdrv_drain" callback if it has any internal sources that can generate I/O activity, like a timer or a worker thread (even in a library) that can schedule QEMUBH in an asynchronous callback. Update the comments of bdrv_drain and bdrv_drained_begin accordingly. Like bdrv_requests_pending(), we should consider all the children of bs. Before, the while loop just works, as bdrv_requests_pending() already tracks its children; now we mustn't miss the callback, so recurse down explicitly. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 1447064214-29930-9-git-send-email-famz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-11-12 16:22:43 +01:00
Fam Zheng	83c98d7b92	block: Drop BlockDriver.bdrv_ioctl Now the callback is not used any more, drop the field along with all implementations in block drivers, which are iscsi and raw. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-id: 1447064214-29930-8-git-send-email-famz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-11-12 16:22:43 +01:00
Fam Zheng	ebde595ce6	block: Add more types for tracked request We'll track more request types besides read and write, change the boolean field to an enum. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-id: 1447064214-29930-2-git-send-email-famz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-11-12 16:22:08 +01:00
Alberto Garcia	a0d64a61db	throttle: Use bs->throttle_state instead of bs->io_limits_enabled There are two ways to check for I/O limits in a BlockDriverState: - bs->throttle_state: if this pointer is not NULL, it means that this BDS is member of a throttling group, its ThrottleTimers structure has been initialized and its I/O limits are ready to be applied. - bs->io_limits_enabled: if true it means that the throttle_state pointer is valid _and_ the limits are currently enabled. The latter is used in several places to check whether a BDS has I/O limits configured, but what it really checks is whether requests are being throttled or not. For example, io_limits_enabled can be temporarily set to false in cases like bdrv_read_unthrottled() without otherwise touching the throtting configuration of that BDS. This patch replaces bs->io_limits_enabled with bs->throttle_state in all cases where what we really want to check is the existence of I/O limits, not whether they are currently enabled or not. Signed-off-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-11-11 16:25:47 +01:00
Max Reitz	c69a4dd899	block: Make bdrv_states public When inserting a BDS tree into a BB, we will need to add the root BDS to this list. Since we will want to do that in the blockdev-insert-medium implementation in blockdev.c, we will need access to it there. This patch is not exactly elegant, but bdrv_states will be removed in the future anyway because we no longer need it since we have BBs. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-11-11 16:22:46 +01:00
Fam Zheng	51288d7917	block: Introduce "drained begin/end" API The semantics is that after bdrv_drained_begin(bs), bs will not get new external requests until the matching bdrv_drained_end(bs). Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-10-23 18:18:24 +02:00
Max Reitz	281d22d86c	block: Add BlockBackendRootState This structure will store some of the state of the root BDS if the BDS tree is removed, so that state can be restored once a new BDS tree is inserted. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-10-23 18:18:23 +02:00
Max Reitz	373340b26c	block: Move I/O status and error actions into BB These options are only relevant for the user of a whole BDS tree (like a guest device or a block job) and should thus be moved into the BlockBackend. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-10-23 18:18:23 +02:00
Max Reitz	7f0e9da6f1	block: Move BlockAcctStats into BlockBackend As the comment above bdrv_get_stats() says, BlockAcctStats is something which belongs to the device instead of each BlockDriverState. This patch therefore moves it into the BlockBackend. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-10-23 18:18:23 +02:00
Max Reitz	53d8f9d8fb	block: Remove wr_highest_sector from BlockAcctStats BlockAcctStats contains statistics about the data transferred from and to the device; wr_highest_sector does not fit in with the rest. Furthermore, those statistics are supposed to be specific for a certain device and not necessarily for a BDS (see the comment above bdrv_get_stats()); on the other hand, wr_highest_sector may be a rather important information to know for each BDS. When BlockAcctStats is finally removed from the BDS, we will want to keep wr_highest_sector in the BDS. Finally, wr_highest_sector is renamed to wr_highest_offset and given the appropriate meaning. Externally, it is represented as an offset so there is no point in doing something different internally. Its definition is changed to match that in qapi/block-core.json which is "the offset after the greatest byte written to". Doing so should not cause any harm since if external programs tried to calculate the volume usage by (wr_highest_offset + 512) / volume_size, after this patch they will just assume the volume to be full slightly earlier than before. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-10-23 18:18:23 +02:00
Max Reitz	68e9ec017b	block: Move guest_block_size into BlockBackend guest_block_size is a guest device property so it should be moved into the interface between block layer and guest devices, which is the BlockBackend. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-10-23 18:18:23 +02:00
Max Reitz	e031f75048	block: Make bdrv_is_inserted() return a bool Make bdrv_is_inserted(), blk_is_inserted(), and the callback BlockDriver.bdrv_is_inserted() return a bool. Suggested-by: Eric Blake <eblake@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-10-23 18:18:22 +02:00
Daniel P. Berrange	10817bf09d	coroutine: move into libqemuutil.a library The coroutine files are currently referenced by the block-obj-y variable. The coroutine functionality though is already used by more than just the block code. eg migration code uses coroutine yield. In the future the I/O channel code will also use the coroutine yield functionality. Since the coroutine code is nicely self-contained it can be easily built as part of the libqemuutil.a library, making it widely available. The headers are also moved into include/qemu, instead of the include/block directory, since they are now part of the util codebase, and the impl was never in the block/ directory either. Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2015-10-20 14:59:04 +01:00
Kevin Wolf	8e419aefa0	block: Remove bdrv_swap() bdrv_swap() is unused now. Remove it and all functions that have no other users than bdrv_swap(). In particular, this removes the .bdrv_rebind callbacks from block drivers. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-10-16 15:34:30 +02:00
Kevin Wolf	d42a8a935b	block: Introduce parents list Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-10-16 15:34:29 +02:00
Kevin Wolf	a2d6190048	block-backend: Add blk_set_bs() It allows changing the BlockDriverState that a BlockBackend points to. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-10-16 15:34:29 +02:00
Kevin Wolf	439db28cf9	block/io: Make bdrv_requests_pending() public Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-10-16 15:34:29 +02:00
Kevin Wolf	760e006384	block: Convert bs->backing_hd to BdrvChild This is the final step in converting all of the BlockDriverState pointers that block drivers use to BdrvChild. After this patch, bs->children contains the full list of child nodes that are referenced by a given BDS, and these children are only referenced through BdrvChild, so that updating the pointer in there is enough for changing edges in the graph. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-10-16 15:34:29 +02:00
Kevin Wolf	9a4f4c3156	block: Convert bs->file to BdrvChild This patch removes the temporary duplication between bs->file and bs->file_child by converting everything to BdrvChild. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-10-16 15:34:29 +02:00
Kevin Wolf	1fdd693308	block: Introduce BDS.file_child Store the BdrvChild for bs->file. At this point, bs->file_child->bs just duplicates the bs->file pointer. Later, it will completely replace it. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-10-16 15:34:29 +02:00
Kevin Wolf	80a1e13091	block: Fix backing file child when modifying graph This patch moves bdrv_attach_child() from the individual places that add a backing file to a BDS to bdrv_set_backing_hd(), which is called by all of them. It also adds bdrv_detach_child() there. For normal operation (starting with one backing file chain and not changing it until the topmost image is closed) and live snapshots, this constitutes no change in behaviour. For all other cases, this is a fix for the bug that the old backing file was still referenced as a child, and the new one wasn't referenced. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2015-07-14 17:15:23 +02:00
Kevin Wolf	b4b059f628	block: Introduce bdrv_open_child() It is the same as bdrv_open_image(), except that it doesn't only return success or failure, but the newly created BdrvChild object for the new child node. As the BdrvChild object already contains a BlockDriverState pointer (and this is supposed to become the only pointer so that bdrv_append() and friends can just change a single pointer in BdrvChild), the pbs parameter is removed for bdrv_open_child(). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2015-07-14 17:15:18 +02:00
Fam Zheng	6e82e4bce1	block: Remove bdrv_reset_dirty Using this function would always be wrong because a dirty bitmap must have a specific owner that consumes the dirty bits and calls bdrv_reset_dirty_bitmap(). Remove the unused function to avoid future misuse. Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-07-02 10:06:23 +01:00
Fam Zheng	0fc9f8ea28	qmp: Add optional bool "unmap" to drive-mirror If specified as "true", it allows discarding on target sectors where source is not allocated. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-07-02 10:06:23 +01:00
John Snow	4b80ab2b7d	qapi: Rename 'dirty-bitmap' mode to 'incremental' If we wish to make differential backups a feature that's easy to access, it might be pertinent to rename the "dirty-bitmap" mode to "incremental" to make it clear what /type/ of backup the dirty-bitmap is helping us perform. This is an API breaking change, but 2.4 has not yet gone live, so we have this flexibility. Signed-off-by: John Snow <jsnow@redhat.com> Message-id: 1433463642-21840-2-git-send-email-jsnow@redhat.com Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-07-02 09:20:18 +01:00
Markus Armbruster	a0b1a66ea3	Include monitor/monitor.h exactly where needed In particular, don't include it into headers. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Luiz Capitulino <lcapitulino@redhat.com>	2015-06-22 18:20:41 +02:00
Markus Armbruster	cc7a8ea740	Include qapi/qmp/qerror.h exactly where needed In particular, don't include it into headers. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Luiz Capitulino <lcapitulino@redhat.com>	2015-06-22 18:20:41 +02:00
Peter Maydell	f3e3b083d4	Block layer core and image format patches -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQIcBAABAgAGBQJVevYFAAoJEH8JsnLIjy/W3jEP/0hiQ3rCRZ/he8s5maTdT+TR YSeHkB5rKpz0Uopn1DMn1QrIbUVzX7dyb+uf9zQ0/xRQIzf6k8uxqU/NWrdoF3NK qx91dGWedwnG+TEBIMbcR7nMrw4dP6kH7uPz/VWMXDHVLz0HIcD95qhKgs0mSY6J dWqex6ACjXM68zJU5IioagU9evV80WZE1S8z7zfixxtTBx5hCaTVbwalkaCxcrXw PbZle55rjI8B10+OzgBw0fq10nias+NTndU9CwNBboxmEtAjq8/mQ663vcWlmiFo 9a/hkda27Z5ut/0Tqk1v4uLHauylp++rrAabPBAuCFMKes6cdkddP15Q/r52aJ29 5meodQtbet1rGrM+Aq4vuSuWId71PGypEI/3URDdNfYFNISoeLLsk4lcQUu7VrDD sRX3Jt8SI3nkIgOnhPyi7NDPmafxFt8yRt5vM8MyR5ynF8NS/2hiAc3wqnbXGjUj a5GqDCefb1yM0R5HvksuFFt3OnXlKJQ3J+ksXNUJf9DSAZPauqWD696pcTeg8wyy 3PIGkczgUuKTVfFWd3THZxJLAo7ZuqvXBHHV8o1SeMBDxwh4FhTd8Kjvm3rUNFfl VDox4qwZ1AcxLrxgqazKU7sD9iWBDHURRcpOoUsBys7oxQZnhcmMp1fRlEkTOyrD HNiSNByqBrtkfeVzHlSe =QgYk -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging Block layer core and image format patches # gpg: Signature made Fri Jun 12 16:08:53 2015 BST using RSA key ID C88F2FD6 # gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" * remotes/kevin/tags/for-upstream: (25 commits) block: Fix reopen flag inheritance block: Add BlockDriverState.inherits_from block: Add list of children to BlockDriverState queue.h: Add QLIST_FIX_HEAD_PTR() block: Drain requests before swapping nodes in bdrv_swap() block: Move flag inheritance to bdrv_open_inherit() block: Use QemuOpts in bdrv_open_common() block: Use macro for cache option names vmdk: Use bdrv_open_image() quorum: Use bdrv_open_image() check-qdict: Test cases for new functions qdict: Add qdict_{set,copy}_default() qdict: Add qdict_array_entries() iotests: Add tests for overriding BDRV_O_PROTOCOL block: driver should override flags in bdrv_open() block: Change bitmap truncate conditional to assertion block: record new size in bdrv_dirty_bitmap_truncate raw-posix: Fix .bdrv_co_get_block_status() for unaligned image size vmdk: Use vmdk_find_index_in_cluster everywhere vmdk: Fix index_in_cluster calculation in vmdk_co_get_block_status ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2015-06-15 10:43:06 +01:00
Kevin Wolf	bddcec3745	block: Add BlockDriverState.inherits_from Currently, the block layer assumes that any block node can have only one parent, and if it has a parent, that it inherits some options/flags from this parent. This is not true any more: With references used in block device creation, a single node can be used by multiple parents, or it can be created separately and not inherit flags from any parent. To handle reopens correctly, a node must know from which parent it inherited options. This patch adds the information to BlockDriverState. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2015-06-12 17:04:59 +02:00
Kevin Wolf	6e93e7c41f	block: Add list of children to BlockDriverState This allows iterating over all children of a given BDS, not only including bs->file and bs->backing_hd, but also driver-specific ones like VMDK extents or Quorum children. For bdrv_swap(), the list of children of the swapped BDS stays at that BDS (because that's where the pointers stay as well). The list head moves and pointers to it must be fixed up therefore. The list of children in the parent of the swapped BDS is not affected by the swap. The contents of the BDS objects is swapped, so the existing pointer in the parent automatically points to the newly swapped in BDS. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2015-06-12 17:04:59 +02:00
Kevin Wolf	f3930ed0bb	block: Move flag inheritance to bdrv_open_inherit() Instead of letting every caller of bdrv_open() determine the right flags for its child node manually and pass them to the function, pass the parent node and the role of the newly opened child (like backing file, protocol layer, etc.). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2015-06-12 17:04:59 +02:00
Alberto Garcia	76f4afb40f	throttle: Add throttle group support The throttle group support use a cooperative round robin scheduling algorithm. The principles of the algorithm are simple: - Each BDS of the group is used as a token in a circular way. - The active BDS computes if a wait must be done and arms the right timer. - If a wait must be done the token timer will be armed so the token will become the next active BDS. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: f0082a86f3ac01c46170f7eafe2101a92e8fde39.1433779731.git.berto@igalia.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-06-12 14:00:00 +01:00
Alberto Garcia	2ff1f2e3a3	throttle: Add throttle group infrastructure Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 2fdb4de17210b733a13eb472c33cd08b45f8fd21.1433779731.git.berto@igalia.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-06-12 14:00:00 +01:00
Benoît Canet	0e5b0a2d54	throttle: Extract timers from ThrottleState into a separate structure Group throttling will share ThrottleState between multiple bs. As a consequence the ThrottleState will be accessed by multiple aio context. Timers are tied to their aio context so they must go out of the ThrottleState structure. This commit paves the way for each bs of a common ThrottleState to have its own timer. Signed-off-by: Benoit Canet <benoit.canet@nodalink.com> Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 6cf9ea96d8b32ae2f8769cead38f68a6a0c8c909.1433779731.git.berto@igalia.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-06-12 14:00:00 +01:00
Denis V. Lunev	4196d2f030	block: minimal bounce buffer alignment The patch introduces new concept: minimal memory alignment for bounce buffers. Original so called "optimal" value is actually minimal required value for aligment. It should be used for validation that the IOVec is properly aligned and bounce buffer is not required. Though, from the performance point of view, it would be better if bounce buffer or IOVec allocated by QEMU will be aligned stricter. The patch does not change any alignment value yet. Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-id: 1431441056-26198-2-git-send-email-den@openvz.org CC: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> CC: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-05-22 09:37:33 +01:00
Stefan Hajnoczi	0eb7217e49	block: extract bdrv_setup_io_funcs() Move the code to install coroutine and aio emulation function pointers in a BlockDriver to its own function. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-04-28 15:36:17 +02:00
Stefan Hajnoczi	e0c47b6cb1	block: add bdrv_set_dirty()/bdrv_reset_dirty() to block_int.h The dirty bitmap functions are called from the block I/O processing code. Make them visible to block_int.h users so they can be used outside block.c. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-04-28 15:36:17 +02:00
John Snow	d58d845397	qmp: Add support of "dirty-bitmap" sync mode for drive-backup For "dirty-bitmap" sync mode, the block job will iterate through the given dirty bitmap to decide if a sector needs backup (backup all the dirty clusters and skip clean ones), just as allocation conditions of "top" sync mode. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 1429314609-29776-11-git-send-email-jsnow@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-04-28 15:36:10 +02:00
John Snow	5fba6c0e50	qmp: Ensure consistent granularity type We treat this field with a variety of different types everywhere in the code. Now it's just uint32_t. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 1429314609-29776-4-git-send-email-jsnow@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-04-28 15:36:10 +02:00
Ekaterina Tumanova	892b7de832	block: add bdrv functions for geometry and blocksize Add driver functions for geometry and blocksize detection Signed-off-by: Ekaterina Tumanova <tumanova@linux.vnet.ibm.com> Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 1424087278-49393-2-git-send-email-tumanova@linux.vnet.ibm.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-03-10 14:02:21 +01:00
Max Reitz	06d05fa738	qcow2: Allow creation with refcount order != 4 Add a creation option to qcow2 for setting the refcount order of images to be created, and respect that option's value. This breaks some test outputs, fix them. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-03-10 14:02:21 +01:00
Teruaki Ishizaki	876eb1b0cc	sheepdog: selectable object size support Previously, qemu block driver of sheepdog used hard-coded VDI object size. This patch enables users to handle VDI object size. When you start qemu, you don't need to specify additional command option. But when you create the VDI which doesn't have default object size with qemu-img command, you specify object_size option. If you want to create a VDI of 8MB object size, you need to specify following command option. # qemu-img create -o object_size=8M sheepdog:test1 100M In addition, when you don't specify qemu-img command option, a default value of sheepdog cluster is used for creating VDI. # qemu-img create sheepdog:test2 100M Signed-off-by: Teruaki Ishizaki <ishizaki.teruaki@lab.ntt.co.jp> Acked-by: Hitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-03-09 11:11:59 +01:00
Max Reitz	c0191e763b	block: Remove "growable" from BDS Now that request clamping is done in the BlockBackend, the "growable" field can be removed from the BlockDriverState. All BDSs are now treated as being "growable" (that is, they are allowed to grow; they are not necessarily actually able to). Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 1423162705-32065-16-git-send-email-mreitz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-02-16 15:07:19 +00:00
Francesco Romani	e2462113b2	block: add event when disk usage exceeds threshold Managing applications, like oVirt (http://www.ovirt.org), make extensive use of thin-provisioned disk images. To let the guest run smoothly and be not unnecessarily paused, oVirt sets a disk usage threshold (so called 'high water mark') based on the occupation of the device, and automatically extends the image once the threshold is reached or exceeded. In order to detect the crossing of the threshold, oVirt has no choice but aggressively polling the QEMU monitor using the query-blockstats command. This lead to unnecessary system load, and is made even worse under scale: deployments with hundreds of VMs are no longer rare. To fix this, this patch adds: * A new monitor command `block-set-write-threshold', to set a mark for a given block device. * A new event `BLOCK_WRITE_THRESHOLD', to report if a block device usage exceeds the threshold. * A new `write_threshold' field into the `BlockDeviceInfo' structure, to report the configured threshold. This will allow the managing application to use smarter and more efficient monitoring, greatly reducing the need of polling. [Updated qemu-iotests 067 output to add the new 'write_threshold' property. --Stefan] [Changed g_assert_false() to !g_assert() to fix the build on older glib versions. --Kevin] Signed-off-by: Francesco Romani <fromani@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 1421068273-692-1-git-send-email-fromani@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-02-06 17:24:21 +01:00
Jeff Cody	9a29e18f7d	block: update string sizes for filename,backing_file,exact_filename The string field entries 'filename', 'backing_file', and 'exact_filename' in the BlockDriverState struct are defined as 1024 bytes. However, many places that use these values accept a maximum of PATH_MAX bytes, so we have a mixture of 1024 byte and PATH_MAX byte allocations. This patch makes the BlockDriverStruct field string sizes match usage. This patch also does a few fixes related to the size that needs to happen now: * the block qapi driver is updated to use PATH_MAX bytes * the qcow and qcow2 drivers have an additional safety check * the block vvfat driver is updated to use PATH_MAX bytes for the size of backing_file, for systems where PATH_MAX is < 1024 bytes. * qemu-img uses PATH_MAX rather than 1024. These instances were not changed to be dynamically allocated, however, as the extra temporary 3K in stack usage for qemu-img does not seem worrisome. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-01-23 18:17:06 +01:00
Max Reitz	5f535a941e	block: Make essential BlockDriver objects public There are some block drivers which are essential to QEMU and may not be removed: These are raw, file and qcow2 (as the default non-raw format). Make their BlockDriver objects public so they can be directly referenced throughout the block layer without needing to call bdrv_find_format() and having to deal with an error at runtime, while the real problem occurred during linking (where raw, file or qcow2 were not linked into qemu). Cc: qemu-stable@nongnu.org Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-12-10 10:31:19 +01:00
Kevin Wolf	38f3ef574b	raw: Prohibit dangerous writes for probed images If the user neglects to specify the image format, QEMU probes the image to guess it automatically, for convenience. Relying on format probing is insecure for raw images (CVE-2008-2004). If the guest writes a suitable header to the device, the next probe will recognize a format chosen by the guest. A malicious guest can abuse this to gain access to host files, e.g. by crafting a QCOW2 header with backing file /etc/shadow. Commit `1e72d3b` (April 2008) provided -drive parameter format to let users disable probing. Commit `f965509` (March 2009) extended QCOW2 to optionally store the backing file format, to let users disable backing file probing. QED has had a flag to suppress probing since the beginning (2010), set whenever a raw backing file is assigned. All of these additions that allow to avoid format probing have to be specified explicitly. The default still allows the attack. In order to fix this, commit `79368c8` (July 2010) put probed raw images in a restricted mode, in which they wouldn't be able to overwrite the first few bytes of the image so that they would identify as a different image. If a write to the first sector would write one of the signatures of another driver, qemu would instead zero out the first four bytes. This patch was later reverted in commit `8b33d9e` (September 2010) because it didn't get the handling of unaligned qiov members right. Today's block layer that is based on coroutines and has qiov utility functions makes it much easier to get this functionality right, so this patch implements it. The other differences of this patch to the old one are that it doesn't silently write something different than the guest requested by zeroing out some bytes (it fails the request instead) and that it doesn't maintain a list of signatures in the raw driver (it calls the usual probe function instead). Note that this change doesn't introduce new breakage for false positive cases where the guest legitimately writes data into the first sector that matches the signatures of an image format (e.g. for nested virt): These cases were broken before, only the failure mode changes from corruption after the next restart (when the wrong format is probed) to failing the problematic write request. Also note that like in the original patch, the restrictions only apply if the image format has been guessed by probing. Explicitly specifying a format allows guests to write anything they like. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 1416497234-29880-8-git-send-email-kwolf@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-12-10 10:31:13 +01:00
Kevin Wolf	7cddd3728e	block: Read only one sector for format probing The only image format driver that even potentially accesses anything after 512 bytes in its bdrv_probe() implementation is VMDK, which reads a plain-text descriptor file. In practice, the field it's looking for seems to come first and will be well within the first 512 bytes, too. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 1416497234-29880-7-git-send-email-kwolf@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-12-10 10:31:12 +01:00
Max Reitz	7748543420	block: Add status callback to bdrv_amend_options() Depending on the changed options and the image format, bdrv_amend_options() may take a significant amount of time. In these cases, a way to be informed about the operation's status is desirable. Since the operation is rather complex and may fundamentally change the image, implementing it as AIO or a coroutine does not seem feasible. On the other hand, implementing it as a block job would be significantly more difficult than a simple callback and would not add benefits other than progress report to the amending operation, because it should not actually be run as a block job at all. A callback may not be very pretty, but it's very easy to implement and perfectly fits its purpose here. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Benoît Canet <benoit.canet@nodalink.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-id: 1414404776-4919-2-git-send-email-mreitz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-11-03 11:41:48 +00:00
Peter Lieven	2647fab57d	BlockLimits: introduce max_transfer_length Signed-off-by: Peter Lieven <pl@kamp.de> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-11-03 09:48:41 +00:00
Markus Armbruster	a7f53e26a6	block: Lift device model API into BlockBackend Move device model attachment / detachment and the BlockDevOps device model callbacks and their wrappers from BlockDriverState to BlockBackend. Wrapper calls in block.c change from bdrv_dev_FOO_cb(bs, ...) to if (bs->blk) { bdrv_dev_FOO_cb(bs->blk, ...); } No change, because both bdrv_dev_change_media_cb() and bdrv_dev_resize_cb() do nothing when no device model is attached, and a device model can be attached only when bs->blk. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-10-20 14:03:50 +02:00
Markus Armbruster	097310b53e	block: Rename BlockDriverCompletionFunc to BlockCompletionFunc I'll use it with block backends shortly, and the name is going to fit badly there. It's a block layer thing anyway, not just a block driver thing. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-10-20 13:41:27 +02:00
Markus Armbruster	7c84b1b831	block: Rename BlockDriverAIOCB* to BlockAIOCB* I'll use BlockDriverAIOCB with block backends shortly, and the name is going to fit badly there. It's a block layer thing anyway, not just a block driver thing. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-10-20 13:41:27 +02:00
Markus Armbruster	bfb197e0d9	block: Eliminate BlockDriverState member device_name[] device_name[] can become non-empty only in bdrv_new_root() and bdrv_move_feature_fields(). The latter is used only to undo damage done by bdrv_swap(). The former is called only by blk_new_with_bs(). Therefore, when a BlockDriverState's device_name[] is non-empty, then it's been created with a BlockBackend, and vice versa. Furthermore, blk_new_with_bs() keeps the two names equal. Therefore, device_name[] is redundant. Eliminate it. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-10-20 13:41:26 +02:00
Markus Armbruster	7e7d56d9e0	block: Connect BlockBackend to BlockDriverState Convenience function blk_new_with_bs() creates a BlockBackend with its BlockDriverState. Callers have to unref both. The commit after next will relieve them of the need to unref the BlockDriverState. Complication: due to the silly way drive_del works, we need a way to hide a BlockBackend, just like bdrv_make_anon(). To emphasize its "special" status, give the function a suitably off-putting name: blk_hide_on_behalf_of_do_drive_del(). Unfortunately, hiding turns the BlockBackend's name into the empty string. Can't avoid that without breaking the blk->bs->device_name equals blk->name invariant. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-10-20 13:41:26 +02:00
Benoît Canet	5e5a94b605	block: Extract the block accounting code The plan is to add new accounting metrics (latency, invalid requests, failed requests, queue depth) and block.c is overpopulated so it will be better to work in a separate module. Moreover the long term plan is to have statistics in each of the BDS of the graph for metrology purpose; this means that the device model statistics must move from the topmost BDS to the device model. So we need to decouple the statistic code from BlockDriverState. This is another argument for the extraction of the code in a separate module. CC: Kevin Wolf <kwolf@redhat.com> CC: Stefan Hajnoczi <stefanha@redhat.com> CC: Max Reitz <mreitz@redhat.com> CC: Eric Blake <eblake@redhat.com> CC: Benoit Canet <benoit@irqsave.net> CC: Fam Zheng <famz@redhat.com> CC: Peter Crosthwaite <peter.crosthwaite@xilinx.com> CC: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Benoît Canet <benoit.canet@nodalink.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-09-10 10:41:29 +02:00
Benoît Canet	0ddd0ad96a	block: Extract the BlockAcctStats structure Extract the block accounting statistics into a structure so the block device models can hold them in the future. CC: Kevin Wolf <kwolf@redhat.com> CC: Stefan Hajnoczi <stefanha@redhat.com> CC: Max Reitz <mreitz@redhat.com> CC: Eric Blake <eblake@redhat.com> Signed-off-by: Benoît Canet <benoit.canet@nodalink.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-09-10 10:41:29 +02:00
Max Reitz	33384421b3	block: Add AIO context notifiers If a long-running operation on a BDS wants to always remain in the same AIO context, it somehow needs to keep track of the BDS changing its context. This adds a function for registering callbacks on a BDS which are called whenever the BDS is attached or detached from an AIO context. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-08-29 10:48:45 +01:00
Max Reitz	91af701412	block: Add bdrv_refresh_filename() Some block devices may not have a filename in their BDS; and for some, there may not even be a normal filename at all. To work around this, add a function which tries to construct a valid filename for the BDS.filename field. If a filename exists or a block driver is able to reconstruct a valid filename (which is placed in BDS.exact_filename), this can directly be used. If no filename can be constructed, we can still construct an options QDict which is then converted to a JSON object and prefixed with the "json:" pseudo protocol prefix. The QDict is placed in BDS.full_open_options. For most block drivers, this process can be done automatically; those that need special handling may define a .bdrv_refresh_filename() method to fill BDS.exact_filename and BDS.full_open_options themselves. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-08-20 14:31:56 +02:00
Kevin Wolf	3baca89139	block: Add Error argument to bdrv_refresh_limits() Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-07-18 13:18:43 +01:00
Ming Lei	448ad91db4	block: block: introduce APIs for submitting IO as a batch This patch introduces three APIs so that following patches can support queuing I/O requests and submitting them as a batch for improving I/O performance. Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Ming Lei <ming.lei@canonical.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-07-07 11:05:17 +02:00
Jeff Cody	54e2690090	block: extend block-commit to accept a string for the backing file On some image chains, QEMU may not always be able to resolve the filenames properly, when updating the backing file of an image after a block commit. For instance, certain relative pathnames may fail, or drives may have been specified originally by file descriptor (e.g. /dev/fd/???), or a relative protocol pathname may have been used. In these instances, QEMU may lack the information to be able to make the correct choice, but the user or management layer most likely does have that knowledge. With this extension to the block-commit api, the user is able to change the backing file of the overlay image as part of the block-commit operation. This allows the change to be 'safe', in the sense that if the attempt to write the overlay image metadata fails, then the block-commit operation returns failure, without disrupting the guest. If the commit top is the active layer, then specifying the backing file string will be treated as an error (there is no overlay image to modify in that case). If a backing file string is not specified in the command, the backing file string to use is determined in the same manner as it was previously. Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-07-01 10:47:01 +02:00
Chunyan Liu	4ab1559085	qemu-img create: add 'nocow' option Add 'nocow' option so that users could have a chance to set NOCOW flag to newly created files. It's useful on btrfs file system to enhance performance. Btrfs has low performance when hosting VM images, even more when the guest in those VM are also using btrfs as file system. One way to mitigate this bad performance is to turn off COW attributes on VM files. Generally, there are two ways to turn off NOCOW on btrfs: a) by mounting fs with nodatacow, then all newly created files will be NOCOW. b) per file. Add the NOCOW file attribute. It could only be done to empty or new files. This patch tries the second way, according to the option, it could add NOCOW per file. For most block drivers, since the create file step is in raw-posix.c, so we can do setting NOCOW flag ioctl in raw-posix.c only. But there are some exceptions, like block/vpc.c and block/vdi.c, they are creating file by calling qemu_open directly. For them, do the same setting NOCOW flag ioctl work in them separately. [Fixed up 082.out due to the new 'nocow' creation option --Stefan] Signed-off-by: Chunyan Liu <cyliu@suse.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-07-01 10:15:12 +02:00
Benoît Canet	09158f00e0	block: Add replaces argument to drive-mirror drive-mirror will bdrv_swap the new BDS named node-name with the one pointed by replaces when the mirroring is finished. Signed-off-by: Benoit Canet <benoit@irqsave.net> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-06-27 20:00:00 +02:00
Kevin Wolf	8ee79e707a	block: Catch backing files assigned to non-COW drivers Since we parse backing.* options to add a backing file from the command line when the driver didn't assign one, it has been possible to have a backing file for e.g. raw images (it just was never accessed). This is obvious nonsense and should be rejected. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2014-06-26 13:51:01 +02:00
Wenchao Xia	5a2d2cbd88	qapi event: convert BLOCK_IO_ERROR and BLOCK_JOB_ERROR Signed-off-by: Wenchao Xia <wenchaoqemu@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>	2014-06-23 11:12:27 -04:00
Chunyan Liu	c282e1fdf7	cleanup QEMUOptionParameter Now that all backend drivers are using QemuOpts, remove all QEMUOptionParameter related codes. Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com> Signed-off-by: Chunyan Liu <cyliu@suse.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-06-16 17:23:21 +08:00
Chunyan Liu	83d0521a1e	change block layer to support both QemuOpts and QEMUOptionParamter Change block layer to support both QemuOpts and QEMUOptionParameter. After this patch, it will change backend drivers one by one. At the end, QEMUOptionParameter will be removed and only QemuOpts is kept. Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com> Signed-off-by: Chunyan Liu <cyliu@suse.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-06-16 17:23:20 +08:00
Fam Zheng	db519cba87	block: Move declaration of bdrv_get_aio_context to block.h block_int.h is for block layer and block drivers, other code shouldn't include it. But similar to bdrv_set_aio_context, bdrv_get_aio_context should also be accessible from outside of block layer. Move it. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-06-04 09:56:12 +02:00
Stefan Hajnoczi	dcd042282d	block: add bdrv_set_aio_context() Up until now all BlockDriverState instances have used the QEMU main loop for fd handlers, timers, and BHs. This is not scalable on SMP guests and hosts so we need to move to a model with multiple event loops on different host CPUs. bdrv_set_aio_context() assigns the AioContext event loop to use for a particular BlockDriverState. It first detaches the entire BlockDriverState graph from the current AioContext and then attaches to the new AioContext. This function will be used by virtio-blk data-plane to assign a BlockDriverState to its IOThread AioContext. Make bdrv_aio_set_context() public since data-plane should not include block_int.h. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-06-04 09:56:11 +02:00
Fam Zheng	826b6ca0b0	block: Add backing_blocker in BlockDriverState This makes use of op_blocker and blocks all the operations except for commit target, on each BlockDriverState->backing_hd. The asserts for op_blocker in bdrv_swap are removed because with this change, the target of block commit has at least the backing blocker of its child, so the assertion is not true. Callers should do their check. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-05-28 14:28:46 +02:00
Fam Zheng	3718d8ab65	block: Replace in_use with operation blocker This drops BlockDriverState.in_use with op_blockers: - Call bdrv_op_block_all in place of bdrv_set_in_use(bs, 1). - Call bdrv_op_unblock_all in place of bdrv_set_in_use(bs, 0). - Check bdrv_op_is_blocked() in place of bdrv_in_use(bs). The specific types are used, e.g. in place of starting block backup, bdrv_op_is_blocked(bs, BLOCK_OP_TYPE_BACKUP, ...). There is one exception in block_job_create, where bdrv_op_blocker_is_empty() is used, because we don't know the operation type here. This doesn't matter because in a few commits away we will drop the check and move it to callers that _do_ know the type. - Check bdrv_op_blocker_is_empty() in place of assert(!bs->in_use). Note: there is only bdrv_op_block_all and bdrv_op_unblock_all callers at this moment. So although the checks are specific to op types, this changes can still be seen as identical logic with previously with in_use. The difference is error message are improved because of blocker error info. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-05-28 14:28:46 +02:00
Fam Zheng	fbe40ff780	block: Introduce op_blockers to BlockDriverState BlockDriverState.op_blockers is an array of lists with BLOCK_OP_TYPE_MAX elements. Each list is a list of blockers of an operation type (BlockOpType), that marks this BDS as currently blocked for a certain type of operation with reason errors stored in the list. The rule of usage is: * BDS user who wants to take an operation should check if there's any blocker of the type with bdrv_op_is_blocked(). * BDS user who wants to block certain types of operation, should call bdrv_op_block (or bdrv_op_block_all to block all types of operations, which is similar to the existing bdrv_set_in_use()). * A blocker is only referenced by op_blockers, so the lifecycle is managed by caller, and shouldn't be lost until unblock, so typically a caller does these: - Allocate a blocker with error_setg or similar, call bdrv_op_block() to block some operations. - Hold the blocker, do his job. - Unblock operations that it blocked, with the same reason pointer passed to bdrv_op_unblock(). - Release the blocker with error_free(). Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Benoit Canet <benoit@irqsave.net> Reviewed-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-05-28 14:28:46 +02:00
Peter Lieven	465bee1da8	block: optimize zero writes with bdrv_write_zeroes this patch tries to optimize zero write requests by automatically using bdrv_write_zeroes if it is supported by the format. This significantly speeds up file system initialization and should speed zero write test used to test backend storage performance. I ran the following 2 tests on my internal SSD with a 50G QCOW2 container and on an attached iSCSI storage. a) mkfs.ext4 -E lazy_itable_init=0,lazy_journal_init=0 /dev/vdX QCOW2 [off] [on] [unmap] ----- runtime: 14secs 1.1secs 1.1secs filesize: 937M 18M 18M iSCSI [off] [on] [unmap] ---- runtime: 9.3s 0.9s 0.9s b) dd if=/dev/zero of=/dev/vdX bs=1M oflag=direct QCOW2 [off] [on] [unmap] ----- runtime: 246secs 18secs 18secs filesize: 51G 192K 192K throughput: 203M/s 2.3G/s 2.3G/s iSCSI* [off] [on] [unmap] ---- runtime: 8mins 45secs 33secs throughput: 106M/s 1.2G/s 1.6G/s allocated: 100% 100% 0% * The storage was connected via an 1Gbit interface. It seems to internally handle writing zeroes via WRITESAME16 very fast. Signed-off-by: Peter Lieven <pl@kamp.de> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-05-19 13:42:27 +02:00
Kevin Wolf	8bfea15dda	block: Unlink temporary files in raw-posix/win32 Instead of having unlink() calls in the generic block layer, where we aren't even guarateed to have a file name, move them to those block drivers that are actually used and that always have a filename. Gets us rid of some #ifdefs as well. The patch also converts bs->is_temporary to a new BDRV_O_TEMPORARY open flag so that it is inherited in the protocol layer and the raw-posix and raw-win32 drivers can unlink the file. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2014-04-30 11:05:00 +02:00
Kevin Wolf	5a8a30db47	block: Add error handling to bdrv_invalidate_cache() If it returns an error, the migrated VM will not be started, but qemu exits with an error message. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Benoit Canet <benoit@irqsave.net>	2014-03-19 09:39:41 +01:00
Benoît Canet	b5042a3622	block: Rewrite the snapshot authorization mechanism for block filters. This patch keep the recursive way of doing things but simplify it by giving two responsabilities to all block filters implementors. They will need to do two things: -Set the is_filter field of their block driver to true. -Implement the bdrv_recurse_is_first_non_filter method of their block driver like it is done on the Quorum block driver. (block/quorum.c) [Paolo Bonzini <pbonzini@redhat.com> pointed out that this patch changes the semantics of blkverify, which now recurses down both bs->file and s->test_file. -- Stefan] Reported-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Benoit Canet <benoit@irqsave.net> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-03-13 14:23:27 +01:00
Kevin Wolf	6460440f34	block: Allow wait_serialising_requests() at any point We can only have a single wait_serialising_requests() call per request because otherwise we can run into deadlocks where requests are waiting for each other. The same is true when wait_serialising_requests() is not at the very beginning of a request, so that other requests can be issued between the start of the tracking and wait_serialising_requests(). Fix this by changing wait_serialising_requests() to ignore requests that are already (directly or indirectly) waiting for the calling request. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Benoit Canet <benoit@irqsave.net>	2014-01-24 17:40:02 +01:00
Kevin Wolf	7327145f63	block: Make overlap range for serialisation dynamic Copy on Read wants to serialise with all requests touching the same cluster, so wait_serialising_requests() rounded to cluster boundaries. Other users like alignment RMW will have different requirements, though (requests touching the same sector), so make it dynamic. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Benoit Canet <benoit@irqsave.net>	2014-01-24 17:40:02 +01:00
Kevin Wolf	2dbafdc012	block: Generalise and optimise COR serialisation Change the API so that specific requests can be marked serialising. Only these requests are checked for overlaps then. This means that during a Copy on Read operation, not all requests overlapping other requests are serialised any more, but only those that actually overlap with the specific COR request. Also remove COR from function and variable names because this functionality can be useful in other contexts. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Benoit Canet <benoit@irqsave.net>	2014-01-24 17:40:02 +01:00
Kevin Wolf	793ed47a7a	block: Switch BdrvTrackedRequest to byte granularity Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Benoit Canet <benoit@irqsave.net>	2014-01-24 17:40:02 +01:00
Paolo Bonzini	c25f53b06e	raw: Probe required direct I/O alignment Add a bs->request_alignment field that contains the required offset/length alignment for I/O requests and fill it in the raw block drivers. Use ioctls if possible, else see what alignment it takes for O_DIRECT to succeed. While at it, also expose the memory alignment requirements, which may be (and in practice are) different from the disk alignment requirements. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2014-01-24 17:40:02 +01:00
Paolo Bonzini	1b7fd72955	block: rename buffer_alignment to guest_block_size The alignment field is now set to the value that is promised to the guest, rather than required by the host. The next patches will make QEMU aware of the host-provided values, so make this clear. The alignment is also not about memory buffers, but about the sectors on the disk, change the documentation of the field. At this point, the field is set by the device emulation, but completely ignored by the block layer. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Benoit Canet <benoit@irqsave.net>	2014-01-24 17:40:01 +01:00
Kevin Wolf	339064d506	block: Don't use guest sector size for qemu_blockalign() bs->buffer_alignment is set by the device emulation and contains the logical block size of the guest device. This isn't something that the block layer should know, and even less something to use for determining the right alignment of buffers to be used for the host. The new BlockLimits field opt_mem_alignment tells the qemu block layer the optimal alignment to be used so that no bounce buffer must be used in the driver. This patch may change the buffer alignment from 4k to 512 for all callers that used qemu_blockalign() with the top-level image format BlockDriverState. The value was never propagated to other levels in the tree, so in particular raw-posix never required anything else than 512. While on disks with 4k sectors direct I/O requires a 4k alignment, memory may still be okay when aligned to 512 byte boundaries. This is what must have happened in practice, because otherwise this would already have failed earlier. Therefore I don't expect regressions even with this intermediate state. Later, raw-posix can implement the hook and expose a different memory alignment requirement. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2014-01-24 17:40:01 +01:00
Kevin Wolf	d34682cd4a	block: Move initialisation of BlockLimits to bdrv_refresh_limits() This function separates filling the BlockLimits from bdrv_open(), which allows it to call it from other operations which may change the limits (e.g. modifications to the backing file chain or bdrv_reopen) Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Benoit Canet <benoit@irqsave.net>	2014-01-24 17:40:01 +01:00
Benoît Canet	212a5a8f09	block: Create authorizations mechanism for external snapshot and resize. Signed-off-by: Benoit Canet <benoit@irqsave.net> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-01-24 16:07:08 +01:00
Benoît Canet	dc364f4cdc	block: Add bs->node_name to hold the name of a bs node of the bs graph. Add the minimum of code to prepare for the following patches. Signed-off-by: Benoit Canet <benoit@irqsave.net> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-01-24 14:33:01 +01:00
Fam Zheng	03544a6e9e	block: Add commit_active_start() commit_active_start is implemented in block/mirror.c, It will create a job with "commit" type and designated base in block-commit command. This will be used for committing active layer of device. Sync mode is removed from MirrorBlockJob because there's no proper type for commit. The used information is is_none_mode. The common part of mirror_start and commit_active_start is moved to mirror_start_job(). Fix the comment wording for commit_start. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-12-20 16:26:16 +01:00
Peter Lieven	7337acaf21	block: add opt_transfer_length to BlockLimits Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Peter Lieven <pl@kamp.de> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-12-05 11:45:24 +01:00
Wenchao Xia	7b4c4781e3	snapshot: distinguish id and name in load_tmp Since later this function will be used so improve it. The only caller of it now is qemu-img, and it is not impacted by introduce function bdrv_snapshot_load_tmp_by_id_or_name() that call bdrv_snapshot_load_tmp() twice to keep old search logic. bdrv_snapshot_load_tmp_by_id_or_name() return int to let caller know the errno, and errno will be used later. Also fix a typo in comments of bdrv_snapshot_delete(). Signed-off-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-12-04 15:19:00 +01:00
Fam Zheng	4cc70e9337	blkdebug: add "remove_break" command This adds "remove_break" command which is the reverse of blkdebug command "break": it removes all breakpoints with given tag and resumes all the requests. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-11-29 13:40:37 +01:00
Liu Yuan	b3af018f3b	sheepdog: support user-defined redundancy option Sheepdog support two kinds of redundancy, full replication and erasure coding. # create a fully replicated vdi with x copies -o redundancy=x (1 <= x <= SD_MAX_COPIES) # create a erasure coded vdi with x data strips and y parity strips -o redundancy=x:y (x must be one of {2,4,8,16} and 1 <= y < SD_EC_MAX_STRIP) E.g, to convert a vdi into sheepdog vdi 'test' with 8:3 erasure coding scheme $ qemu-img convert -o redundancy=8:3 linux-0.2.img sheepdog:test Cc: Kevin Wolf <kwolf@redhat.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Liu Yuan <namei.unix@gmail.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-11-29 13:40:37 +01:00
Fam Zheng	e4654d2d94	block: per caller dirty bitmap Previously a BlockDriverState has only one dirty bitmap, so only one caller (e.g. a block job) can keep track of writing. This changes the dirty bitmap to a list and creates a BdrvDirtyBitmap for each caller, the lifecycle is managed with these new functions: bdrv_create_dirty_bitmap bdrv_release_dirty_bitmap Where BdrvDirtyBitmap is a linked list wrapper structure of HBitmap. In place of bdrv_set_dirty_tracking, a BdrvDirtyBitmap pointer argument is added to these functions, since each caller has its own dirty bitmap: bdrv_get_dirty bdrv_dirty_iter_init bdrv_get_dirty_count bdrv_set_dirty and bdrv_reset_dirty prototypes are unchanged but will internally walk the list of all dirty bitmaps and set them one by one. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-11-29 13:40:33 +01:00
Peter Lieven	fe81c2cca6	block: add BlockLimits structure to BlockDriverState this patch adds BlockLimits which introduces discard and write_zeroes limits and alignment information to the BlockDriverState. Signed-off-by: Peter Lieven <pl@kamp.de> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-11-28 10:30:51 +01:00
Peter Lieven	aa7bfbfff7	block: add flags to bdrv_*_write_zeroes Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Peter Lieven <pl@kamp.de> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-11-28 10:30:51 +01:00
Kevin Wolf	b94a261057	block: Avoid unecessary drv->bdrv_getlength() calls The block layer generally keeps the size of an image cached in bs->total_sectors so that it doesn't have to perform expensive operations to get the size whenever it needs it. This doesn't work however when using a backend that can change its size without qemu being aware of it, i.e. passthrough of removable media like CD-ROMs or floppy disks. For this reason, the caching is disabled when a removable device is used. It is obvious that checking whether the _guest_ device has removable media isn't the right thing to do when we want to know whether the size of the host backend can change. To make things worse, non-top-level BlockDriverStates never have any device attached, which makes qemu assume they are removable, so drv->bdrv_getlength() is always called on the protocol layer. In the case of raw-posix, this causes unnecessary lseek() system calls, which turned out to be rather expensive. This patch completely changes the logic and disables bs->total_sectors caching only for certain block driver types, for which a size change is expected: host_cdrom and host_floppy on POSIX, host_device on win32; also the raw format in case it sits on top of one of these protocols, but in the common case the nested bdrv_getlength() call on the protocol driver will use the cache again and avoid an expensive drv->bdrv_getlength() call. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>	2013-10-29 13:10:26 +01:00
Benoît Canet	f6186f49e2	block: Add BlockDriver.bdrv_check_ext_snapshot. This field is used by blkverify to disable external snapshots creation. It will also be used by block filters like quorum to disable external snapshot creation. Signed-off-by: Benoit Canet <benoit@irqsave.net> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-10-11 16:49:59 +02:00
Max Reitz	eae041fe6f	block: Add bdrv_get_specific_info Add a function for retrieving an ImageInfoSpecific object from a block driver. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-10-11 10:52:54 +02:00
Benoît Canet	030be32184	block: introduce BlockDriver.bdrv_needs_filename to enable some drivers. Some drivers will have driver specifics options but no filename. This new bool allow the block layer to treat them correctly. The .bdrv_needs_filename is set in drivers not having .bdrv_parse_filename and not having .bdrv_open. The first exception to this rule will be the quorum driver. Signed-off-by: Benoit Canet <benoit@irqsave.net> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-09-25 16:21:28 +02:00
Max Reitz	d5124c00d8	bdrv: Use "Error" for creating images Add an Error ** parameter to BlockDriver.bdrv_create to allow more specific error messages. Signed-off-by: Max Reitz <mreitz@redhat.com>	2013-09-12 10:12:48 +02:00
Max Reitz	015a1036a7	bdrv: Use "Error" for opening images Add an Error ** parameter to BlockDriver.bdrv_open and BlockDriver.bdrv_file_open to allow more specific error messages. Signed-off-by: Max Reitz <mreitz@redhat.com>	2013-09-12 10:12:47 +02:00
Wenchao Xia	a89d89d3e6	snapshot: distinguish id and name in snapshot delete Snapshot creation actually already distinguish id and name since it take a structured parameter sn, but delete can't. Later an accurate delete is needed in qmp_transaction abort and blockdev-snapshot-delete-sync, so change its prototype. Also errp is added to tip error, but return value is kepted to let caller check what kind of error happens. Existing caller for it are savevm, delvm and qemu-img, they are not impacted by introducing a new function bdrv_snapshot_delete_by_id_or_name(), which check the return value and do the operation again. Before this patch: For qcow2, it search id first then name to find the one to delete. For rbd, it search name. For sheepdog, it does nothing. After this patch: For qcow2, logic is the same by call it twice in caller. For rbd, it always fails in delete with id, but still search for name in second try, no change to user. Some code for *errp is based on Pavel's patch. Signed-off-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com> Signed-off-by: Pavel Hrdina <phrdina@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-09-12 10:12:47 +02:00
Max Reitz	6f176b48f9	block: Image file option amendment This patch adds the "amend" option to qemu-img which allows changing image options on existing image files. It also adds the generic bdrv implementation which is basically just a wrapper for the image format specific function. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-09-12 10:12:46 +02:00
Paolo Bonzini	b6b8a33354	block: introduce bdrv_get_block_status API For now, bdrv_get_block_status is just another name for bdrv_is_allocated. The next patches will add more flags. This also touches all block drivers with a mostly mechanical rename. The sole exception is cow; because it calls cow_co_is_allocated from the read code, we keep that function and make cow_co_get_block_status a wrapper. Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-09-06 15:25:09 +02:00
Fam Zheng	9fcb025146	block: implement reference count for BlockDriverState Introduce bdrv_ref/bdrv_unref to manage the lifecycle of BlockDriverState. They are unused for now but will used to replace bdrv_delete() later. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-09-06 15:25:08 +02:00
Benoît Canet	cc0681c454	block: Enable the new throttling code in the block layer. Signed-off-by: Benoit Canet <benoit@irqsave.net> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-09-06 15:25:07 +02:00
Alex Bligh	6a1751b7aa	aio / timers: Untangle include files include/qemu/timer.h has no need to include main-loop.h and doing so causes an issue for the next patch. Unfortunately various files assume including timers.h will pull in main-loop.h. Untangle this mess. Signed-off-by: Alex Bligh <alex@alex.org.uk> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-08-22 19:10:27 +02:00
Asias He	0d51b4debe	block: Introduce bs->zero_beyond_eof In 4146b46c42e0989cb5842e04d88ab6ccb1713a48 (block: Produce zeros when protocols reading beyond end of file), we break qemu-iotests ./check -qcow2 022. This happens because qcow2 temporarily sets ->growable = 1 for vmstate accesses (which are stored beyond the end of regular image data). We introduce the bs->zero_beyond_eof to allow qcow2_load_vmstate() to disable ->zero_beyond_eof temporarily in addition to enable ->growable. [Since the broken patch "block: Produce zeros when protocols reading beyond end of file" has not been merged yet, I have applied this fix first and will then apply the next patch to keep the tree bisectable. -- Stefan] Suggested-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Asias He <asias@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-08-22 14:10:21 +02:00
Ian Main	fc5d3f8432	Implement sync modes for drive-backup. This patch adds sync-modes to the drive-backup interface and implements the FULL, NONE and TOP modes of synchronization. FULL performs as before copying the entire contents of the drive while preserving the point-in-time using CoW. NONE only copies new writes to the target drive. TOP copies changes to the topmost drive image and preserves the point-in-time using CoW. For sync mode TOP are creating a new target image using the same backing file as the original disk image. Then any new data that has been laid on top of it since creation is copied in the main backup_run() loop. There is an extra check in the 'TOP' case so that we don't bother to copy all the data of the backing file as it already exists in the target. This is where the bdrv_co_is_allocated() is used to determine if the data exists in the topmost layer or below. Also any new data being written is intercepted via the write_notifier hook which ends up calling backup_do_cow() to copy old data out before it gets overwritten. For mode 'NONE' we create the new target image and only copy in the original data from the disk image starting from the time the call was made. This preserves the point in time data by only copying the parts that are going to change to the target image. This way we can reconstruct the final image by checking to see if the given block exists in the new target image first, and if it does not, you can get it from the original image. This is basically an optimization allowing you to do point-in-time snapshots with low overhead vs the 'FULL' version. Since there is no old data to copy out the loop in backup_run() for the NONE case just calls qemu_coroutine_yield() which only wakes up after an event (usually cancel in this case). The rest is handled by the before_write notifier which again calls backup_do_cow() to write out the old data so it can be preserved. Signed-off-by: Ian Main <imain@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-07-26 22:01:31 +02:00
Dietmar Maurer	98d2c6f2cd	block: add basic backup support to block driver backup_start() creates a block job that copies a point-in-time snapshot of a block device to a target block device. We call backup_do_cow() for each write during backup. That function reads the original data from the block device before it gets overwritten. The data is then written to the target device. Currently backup cluster size is hardcoded to 65536 bytes. [I made a number of changes to Dietmar's original patch and folded them in to make code review easy. Here is the full list: * Drop BackupDumpFunc interface in favor of a target block device * Detect zero clusters with buffer_is_zero() and use bdrv_co_write_zeroes() * Use 0 delay instead of 1us, like other block jobs * Unify creation/start functions into backup_start() * Simplify cleanup, free bitmap in backup_run() instead of cb * function * Use HBitmap to avoid duplicating bitmap code * Use bdrv_getlength() instead of accessing ->total_sectors * directly * Delete the backup.h header file, it is no longer necessary * Move ./backup.c to block/backup.c * Remove #ifdefed out code * Coding style and whitespace cleanups * Use bdrv_add_before_write_notifier() instead of blockjob-specific hooks * Keep our own in-flight CowRequest list instead of using block.c tracked requests. This means a little code duplication but is much simpler than trying to share the tracked requests list and use the backup block size. * Add on_source_error and on_target_error error handling. * Use trace events instead of DPRINTF() -- stefanha] Signed-off-by: Dietmar Maurer <dietmar@proxmox.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-06-28 09:20:26 +02:00
Stefan Hajnoczi	d616b22474	block: add bdrv_add_before_write_notifier() The bdrv_add_before_write_notifier() function installs a callback that is invoked before a write request is processed. This will be used to implement copy-on-write point-in-time snapshots where we need to copy out old data before overwriting it. Note that BdrvTrackedRequest is moved to block_int.h since it is passed to .notify() functions. Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-06-28 09:20:26 +02:00
Wenchao Xia	f364ec65b5	block: move qmp and info dump related code to block/qapi.c This patch is a pure code move patch, except following modification: 1 get_human_readable_size() is changed to static function. 2 dump_human_image_info() is renamed to bdrv_image_info_dump(). 3 in qmp_query_block() and qmp_query_blockstats, use bdrv_next(bs) instead of direct traverse of global array 'bdrv_states'. 4 collect_snapshots() and collect_image_info() are renamed, unused parameter *fmt in collect_image_info() is removed. 5 code style fix. To avoid conflict and tip better, macro in header file is BLOCK_QAPI_H instead of QAPI_H. Now block.h and snapshot.h are at the same level in include path, block_int.h and qapi.h will both include them. Signed-off-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-06-04 13:56:30 +02:00
Kevin Wolf	56d1b4d21d	block: Remove filename parameter from .bdrv_file_open() It is unused now in all block drivers. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2013-04-22 11:34:35 +02:00
Kevin Wolf	cf8074b382	block: Introduce bdrv_writev_vmstate Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-04-15 08:26:18 +02:00
Stefan Hajnoczi	ae29d6c64b	block: keep I/O throttling slice time constant It is not necessary to adjust the slice time at runtime. We already extend the current slice in order to carry over accounting into the next slice. Changing the actual slice time value introduces oscillations. The guest may experience large changes in throughput or IOPS from one moment to the next when slice times are adjusted. Reported-by: Benoît Canet <benoit@irqsave.net> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-By: Benoit Canet <benoit@irqsave.net> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-04-05 18:58:05 +02:00
Stefan Hajnoczi	5905fbc9c9	block: fix I/O throttling accounting blind spot I/O throttling relies on bdrv_acct_done() which is called when a request completes. This leaves a blind spot since we only charge for completed requests, not submitted requests. For example, if there is 1 operation remaining in this time slice the guest could submit 3 operations and they will all be submitted successfully since they don't actually get accounted for until they complete. Originally we probably thought this is okay since the requests will be accounted when the time slice is extended. In practice it causes fluctuations since the guest can exceed its I/O limit and it will be punished for this later on. Account for I/O upon submission so that I/O limits are enforced properly. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-By: Benoit Canet <benoit@irqsave.net> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-04-05 18:58:05 +02:00
Kevin Wolf	c2ad1b0c46	block: Allow omitting the file name when using driver-specific options After this patch, using -drive with an empty file name continues to open the file if driver-specific options are used. If no driver-specific options are specified, the semantics stay as it was: It defines a drive without an inserted medium. In order to achieve this, bdrv_open() must be made safe to work with a NULL filename parameter. The assumption that is made is that only block drivers which implement bdrv_parse_filename() support using driver specific options and could therefore work without a filename. These drivers must make sure to cope with NULL in their implementation of .bdrv_open() (this is only NBD for now). For all other drivers, the block layer code will make sure to error out before calling into their code - they can't possibly work without a filename. Now an NBD connection can be opened like this: qemu-system-x86_64 -drive file.driver=nbd,file.port=1234,file.host=::1 Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2013-03-22 17:51:32 +01:00
Kevin Wolf	6963a30d82	block: Introduce .bdrv_parse_filename callback If a driver needs structured data and not just a string, it can provide a .bdrv_parse_filename callback now that parses the command line string into separate options. Keeping this separate from .bdrv_open_filename ensures that the preferred way of directly specifying the options always works as well if parsing the string works. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2013-03-22 17:51:32 +01:00
Kevin Wolf	787e4a8500	block: Add options QDict to bdrv_file_open() prototypes The new parameter is unused yet. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2013-03-22 17:51:31 +01:00
Stefan Hajnoczi	85d126f3ee	block: add bdrv_get_aio_context() For now bdrv_get_aio_context() is just a stub that calls qemu_aio_get_context() since the block layer is currently tied to the main loop AioContext. Add the stub now so that the block layer can begin accessing its AioContext. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>	2013-03-15 16:07:51 +01:00
Kevin Wolf	de9c0cec6c	block: Add options QDict to bdrv_open() prototype It doesn't do anything yet except storing the options QDict in the BlockDriverState. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-03-15 16:07:49 +01:00
Kevin Wolf	1a86938f04	block: Add options QDict to .bdrv_open() Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-03-15 16:07:49 +01:00
Othmar Pasteka	7f2039f611	vmdk: Allow selecting SCSI adapter in image creation Introduce a new option "adapter_type" when converting to vmdk images. It can be one of the following: ide (default), buslogic, lsilogic or legacyESX (according to the vmdk spec from vmware). In case of a non-ide adapter, heads is set to 255 instead of the 16. The latter is used for "ide". Also see LP#545089 Signed-off-by: Othmar Pasteka <pasteka@kabsi.at> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-02-01 14:58:28 +01:00
Paolo Bonzini	08e4ed6cde	mirror: add buf-size argument to drive-mirror This makes sense when the next commit starts using the extra buffer space to perform many I/O operations asynchronously. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-01-25 18:18:34 +01:00
Paolo Bonzini	eee13dfe30	mirror: allow customizing the granularity The desired granularity may be very different depending on the kind of operation (e.g. continuous replication vs. collapse-to-raw) and whether the VM is expected to perform lots of I/O while mirroring is in progress. Allow the user to customize it, while providing a sane default so that in general there will be no extra allocated space in the target compared to the source. Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-01-25 18:18:34 +01:00
Paolo Bonzini	8f0720ecbc	block: implement dirty bitmap using HBitmap This actually uses the dirty bitmap in the block layer, and converts mirroring to use an HBitmapIter. Reviewed-by: Laszlo Ersek <lersek@redhat.com> (except block/mirror.c parts) Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-01-25 18:18:33 +01:00
Paolo Bonzini	1de7afc984	misc: move include files to include/qemu/ Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2012-12-19 08:32:39 +01:00
Paolo Bonzini	83c9089e73	monitor: move include files to include/monitor/ Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2012-12-19 08:31:32 +01:00
Paolo Bonzini	737e150e89	block: move include files to include/block/ Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2012-12-19 08:31:31 +01:00

... 2 3 4 5 6

291 Commits