mirrors/qemu - qemu - SynapseOS git

Author	SHA1	Message	Date
Kevin Wolf	40365552c2	mirror: Error out when a BDS would get two BBs bdrv_replace_in_backing_chain() asserts that not both old and new BlockDdriverState have a BlockBackend attached to them because both would have to end up pointing to the new BDS and we don't support more than one BB per BDS yet. Before we can safely allow references to existing nodes as backing files, we need to make sure that even if a backing file has a BB on it, this doesn't crash qemu. There are probably also some cases with the 'replaces' option set where drive-mirror could fail this assertion today. They are fixed with this error check as well. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2015-12-18 14:34:42 +01:00
Fam Zheng	176c36997f	mirror: Quiesce source during "mirror_exit" With dataplane, the ioeventfd events could be dispatched after mirror_run releases the dirty bitmap, but before mirror_exit actually does the device switch, because the iothread will still be running, and it will cause silent data loss. Fix this by adding a bdrv_drained_begin/end pair around the window, so that no new external request will be handled. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com>	2015-12-02 10:44:06 -05:00
Fam Zheng	18930ba3d1	blockjob: Introduce reference count and fix reference to job->bs Add reference count to block job, meanwhile move the ownership of the reference to job->bs from the caller (which is released in two completion callbacks) to the block job itself. It is necessary for block_job_complete_sync to work, because block job shouldn't live longer than its bs, as asserted in bdrv_delete. Now block_job_complete_sync can be simplified. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: John Snow <jsnow@redhat.com> Message-id: 1446765200-3054-6-git-send-email-jsnow@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-11-12 16:22:43 +01:00
Alberto Garcia	10f3cd15dd	mirror: block all operations on the target image during the job There's nothing preventing the target image from being used by other operations during the 'drive-mirror' job, so we should block them all until the job is done. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 82b88fd04cde918a08a6f9a4ab85626d7fd7b502.1446475331.git.berto@igalia.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2015-11-11 16:55:28 +01:00
Max Reitz	373340b26c	block: Move I/O status and error actions into BB These options are only relevant for the user of a whole BDS tree (like a guest device or a block job) and should thus be moved into the BlockBackend. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-10-23 18:18:23 +02:00
Kevin Wolf	3f09bfbc7b	block: Add and use bdrv_replace_in_backing_chain() This cleans up the mess we left behind in the mirror code after the previous patch. Instead of using bdrv_swap(), just change pointers. The interface change of the mirror job that callers must consider is that after job completion, their local BDS pointers still point to the same node now. qemu-img must change its code accordingly (which makes it easier to understand); the other callers stays unchanged because after completion they don't do anything with the BDS, but just with the job, and the job is still owned by the source BDS. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-10-16 15:34:30 +02:00
Kevin Wolf	8ccb9569a9	blockjob: Store device name at job creation Some block jobs change the block device graph on completion. This means that the device that owns the job and originally was addressed with its device name may no longer be what the corresponding BlockBackend points to. Previously, the effects of bdrv_swap() ensured that the job was (at least partially) transferred to the target image. Events that contain the device name could still use bdrv_get_device_name(job->bs) and get the same result. After removing bdrv_swap(), this won't work any more. Instead, save the device name at job creation and use that copy for QMP events and anything else identifying the job. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-10-16 15:34:30 +02:00
Kevin Wolf	5db15a5769	block: Manage backing file references in bdrv_set_backing_hd() This simplifies the code somewhat, especially when dropping whole backing file subchains. The exception is the mirroring code that does adventurous things with bdrv_swap() and in order to keep it working, I had to duplicate most of bdrv_set_backing_hd() locally. We'll get rid again of this ugliness shortly. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-10-16 15:34:29 +02:00
Kevin Wolf	760e006384	block: Convert bs->backing_hd to BdrvChild This is the final step in converting all of the BlockDriverState pointers that block drivers use to BdrvChild. After this patch, bs->children contains the full list of child nodes that are referenced by a given BDS, and these children are only referenced through BdrvChild, so that updating the pointer in there is enough for changing edges in the graph. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-10-16 15:34:29 +02:00
Paolo Bonzini	c84b31926f	block: switch from g_slice allocator to malloc Simplify memory allocation by sticking with a single API. GSlice is not that fast anyway (tcmalloc/jemalloc are better). Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-10-12 11:17:45 +01:00
Jeff Cody	5279efebcf	block: mirror - fix full sync mode when target does not support zero init During mirror, if the target device does not support zero init, a mirror may result in a corrupted image for sync="full" mode. This is due to how the initial dirty bitmap is set up prior to copying data - we did not mark sectors as dirty that are unallocated. This means those unallocated sectors are skipped over on the target, and for a device without zero init, invalid data may reside in those holes. If both of the following conditions are true, then we will explicitly mark all sectors as dirty: 1.) sync = "full" 2.) bdrv_has_zero_init(target) == false If the target does support zero init, but a target image is passed in with data already present (i.e. an "existing" image), it is assumed the data present in the existing image is valid data for those sectors. Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 91ed4bc5bda7e2b09eb508b07c83f4071fe0b3c9.1443705220.git.jcody@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2015-10-01 15:02:21 -04:00
Wen Congyang	e12f378409	block: more check for replaced node We use mirror+replace to fix quorum's broken child. bs/s->common.bs is quorum, and to_replace is the broken child. The new child is target_bs. Without this patch, the replace node can be any node, and it can be top BDS with BB, or another quorum's child. We just check if the broken child is part of the quorum BDS in this patch. Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Message-id: 55A86486.1000404@cn.fujitsu.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-09-02 14:56:39 +01:00
Kevin Wolf	e424aff5f3	mirror: Fix coroutine reentrance This fixes a regression introduced by commit `dcfb3beb` ("mirror: Do zero write on target if sectors not allocated"), which was reported to cause aborts with the message "Co-routine re-entered recursively". The cause for this bug is the following code in mirror_iteration_done(): if (s->common.busy) { qemu_coroutine_enter(s->common.co, NULL); } This has always been ugly because - unlike most places that reenter - it doesn't have a specific yield that it pairs with, but is more uncontrolled. What we really mean here is "reenter the coroutine if it's in one of the four explicit yields in mirror.c". This used to be equivalent with s->common.busy because neither mirror_run() nor mirror_iteration() call any function that could yield. However since commit `dcfb3beb` this doesn't hold true any more: bdrv_get_block_status_above() can yield. So what happens is that bdrv_get_block_status_above() wants to take a lock that is already held, so it adds itself to the queue of waiting coroutines and yields. Instead of being woken up by the unlock function, however, it gets woken up by mirror_iteration_done(), which is obviously wrong. In most cases the code actually happens to cope fairly well with such cases, but in this specific case, the unlock must already have scheduled the coroutine for wakeup when mirror_iteration_done() reentered it. And then the coroutine happened to process the scheduled restarts and tried to reenter itself recursively. This patch fixes the problem by pairing the reenter in mirror_iteration_done() with specific yields instead of abusing s->common.busy. Cc: qemu-stable@nongnu.org Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Message-id: 1439455310-11263-1-git-send-email-kwolf@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2015-08-14 09:51:31 -04:00
Stefan Hajnoczi	cae98cb87d	block/mirror: limit qiov to IOV_MAX elements If mirror has more free buffers than IOV_MAX, preadv(2)/pwritev(2) EINVAL failures may be encountered. It is possible to trigger this by setting granularity to a low value like 8192. This patch stops appending chunks once IOV_MAX is reached. The spurious EINVAL failure can be reproduced with a qcow2 image file and the following QMP invocation: qmp.command('drive-mirror', device='virtio0', target='/tmp/r7.s1', granularity=8192, sync='full', mode='absolute-paths', format='raw') While the guest is running dd if=/dev/zero of=/var/tmp/foo oflag=direct bs=4k. Cc: Jeff Cody <jcody@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 1435761950-26714-1-git-send-email-stefanha@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2015-08-06 04:41:09 -04:00
Fam Zheng	9990069758	mirror: Speed up bitmap initial scanning Limiting to sectors_per_chunk for each bdrv_is_allocated_above is slow, because the underlying protocol driver would issue much more queries than necessary. We should coalesce the query. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: <1436413678-7114-4-git-send-email-famz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-07-22 11:14:21 +01:00
Wen Congyang	48ac0a4df8	mirror: correct buf_size If bus_size is less than 0, the command fails. If buf_size is 0, use DEFAULT_MIRROR_BUF_SIZE. If buf_size % granularity is not 0, mirror_free_init() will do dangerous things. Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 5555A588.3080907@cn.fujitsu.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2015-07-14 21:50:13 -04:00
Fam Zheng	4c0cbd6fec	block/mirror: Sleep periodically during bitmap scanning Before, we only yield after initializing dirty bitmap, where the QMP command would return. That may take very long, and guest IO will be blocked. Add sleep points like the later mirror iterations. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Wen Congyang <wency@cn.fujitsu.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 1431486673-19280-1-git-send-email-famz@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2015-07-14 21:50:13 -04:00
Ting Wang	970311646a	blockjob: add block_job_release function There is job resource leak in function mirror_start_job, although bdrv_create_dirty_bitmap is unlikely failed. Add block_job_release for each release when needed. Signed-off-by: Ting Wang <kathy.wangting@huawei.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 1435311455-56048-1-git-send-email-kathy.wangting@huawei.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-07-07 14:27:14 +01:00
Fam Zheng	dcfb3beb51	mirror: Do zero write on target if sectors not allocated If guest discards a source cluster, mirroring with bdrv_aio_readv is overkill. Some protocols do zero upon discard, where it's best to use bdrv_aio_write_zeroes, otherwise, bdrv_aio_discard will be enough. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-07-02 10:06:23 +01:00
Fam Zheng	0fc9f8ea28	qmp: Add optional bool "unmap" to drive-mirror If specified as "true", it allows discarding on target sectors where source is not allocated. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-07-02 10:06:23 +01:00
John Snow	4b80ab2b7d	qapi: Rename 'dirty-bitmap' mode to 'incremental' If we wish to make differential backups a feature that's easy to access, it might be pertinent to rename the "dirty-bitmap" mode to "incremental" to make it clear what /type/ of backup the dirty-bitmap is helping us perform. This is an API breaking change, but 2.4 has not yet gone live, so we have this flexibility. Signed-off-by: John Snow <jsnow@redhat.com> Message-id: 1433463642-21840-2-git-send-email-jsnow@redhat.com Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-07-02 09:20:18 +01:00
Markus Armbruster	cc7a8ea740	Include qapi/qmp/qerror.h exactly where needed In particular, don't include it into headers. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Luiz Capitulino <lcapitulino@redhat.com>	2015-06-22 18:20:41 +02:00
Markus Armbruster	c6bd8c706a	qerror: Clean up QERR_ macros to expand into a single string These macros expand into error class enumeration constant, comma, string. Unclean. Has been that way since commit `13f59ae`. The error class is always ERROR_CLASS_GENERIC_ERROR since the previous commit. Clean up as follows: * Prepend every use of a QERR_ macro by ERROR_CLASS_GENERIC_ERROR, and delete it from the QERR_ macro. No change after preprocessing. * Rewrite error_set(ERROR_CLASS_GENERIC_ERROR, ...) into error_setg(...). Again, no change after preprocessing. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Luiz Capitulino <lcapitulino@redhat.com>	2015-06-22 18:20:40 +02:00
Max Reitz	001c95b740	block/mirror: Always call block_job_sleep_ns() The mirror block job is trying to take a clever shortcut if delay_ns is 0 and skips block_job_sleep_ns() in that case. But that function must be called in every block job iteration, because otherwise it is for example impossible to pause the job. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-04-28 15:36:11 +02:00
John Snow	20dca81075	block: Ensure consistent bitmap function prototypes We often don't need the BlockDriverState for functions that operate on bitmaps. Remove it. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 1429314609-29776-15-git-send-email-jsnow@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-04-28 15:36:10 +02:00
John Snow	d58d845397	qmp: Add support of "dirty-bitmap" sync mode for drive-backup For "dirty-bitmap" sync mode, the block job will iterate through the given dirty bitmap to decide if a sector needs backup (backup all the dirty clusters and skip clean ones), just as allocation conditions of "top" sync mode. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 1429314609-29776-11-git-send-email-jsnow@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-04-28 15:36:10 +02:00
John Snow	341ebc2f81	qmp: Add block-dirty-bitmap-add and block-dirty-bitmap-remove The new command pair is added to manage a user created dirty bitmap. The dirty bitmap's name is mandatory and must be unique for the same device, but different devices can have bitmaps with the same names. The granularity is an optional field. If it is not specified, we will choose a default granularity based on the cluster size if available, clamped to between 4K and 64K to mirror how the 'mirror' code was already choosing granularity. If we do not have cluster size info available, we choose 64K. This code has been factored out into a helper shared with block/mirror. This patch also introduces the 'block_dirty_bitmap_lookup' helper, which takes a device name and a dirty bitmap name and validates the lookup, returning NULL and setting errp if there is a problem with either field. This helper will be re-used in future patches in this series. The types added to block-core.json will be re-used in future patches in this series, see: 'qapi: Add transaction support to block-dirty-bitmap-{add, enable, disable}' Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 1429314609-29776-5-git-send-email-jsnow@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-04-28 15:36:10 +02:00
John Snow	5fba6c0e50	qmp: Ensure consistent granularity type We treat this field with a variety of different types everywhere in the code. Now it's just uint32_t. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 1429314609-29776-4-git-send-email-jsnow@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-04-28 15:36:10 +02:00
Fam Zheng	0db6e54a8a	qapi: Add optional field "name" to block dirty bitmap This field will be set for user created dirty bitmap. Also pass in an error pointer to bdrv_create_dirty_bitmap, so when a name is already taken on this BDS, it can report an error message. This is not global check, two BDSes can have dirty bitmap with a common name. Implemented bdrv_find_dirty_bitmap to find a dirty bitmap by name, will be used later when other QMP commands want to reference dirty bitmap by name. Add bdrv_dirty_bitmap_make_anon. This unsets the name of dirty bitmap. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 1429314609-29776-3-git-send-email-jsnow@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-04-28 15:36:10 +02:00
Fam Zheng	a7282330c0	blockjob: Update function name in comments Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 1428069921-2957-5-git-send-email-famz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-04-28 15:36:09 +02:00
Fam Zheng	751ebd76e6	blockjob: Allow nested pause This patch changes block_job_pause to increase the pause counter and block_job_resume to decrease it. The counter will allow calling block_job_pause/block_job_resume unconditionally on a job when we need to suspend the IO temporarily. From now on, each block_job_resume must be paired with a block_job_pause to keep the counter balanced. The user pause from QMP or HMP will only trigger block_job_pause once until it's resumed, this is achieved by adding a user_paused flag in BlockJob. One occurrence of block_job_resume in mirror_complete is replaced with block_job_enter which does what is necessary. In block_job_cancel, the cancel flag is good enough to instruct coroutines to quit loop, so use block_job_enter to replace the unpaired block_job_resume. Upon block job IO error, user is notified about the entering to the pause state, so this pause belongs to user pause, set the flag accordingly and expect a matching QMP resume. [Extended doc comments as suggested by Paolo Bonzini <pbonzini@redhat.com>. --Stefan] Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 1428069921-2957-2-git-send-email-famz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-04-28 15:36:09 +02:00
Jeff Cody	1d33936ea8	block: mirror - change string allocation to 2-bytes The backing_filename string in mirror_run() is only used to check for a NULL string, so we don't need to allocate 1024 bytes (or, later, PATH_MAX bytes), when we only need to copy the first 2 characters. We technically only need 1 byte, as we are just checking for NULL, but since backing_filename[] is populated by bdrv_get_backing_filename(), a string size of 1 will always only return '\0'; Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-01-23 18:17:06 +01:00
Vladimir Sementsov-Ogievskiy	c4237dfa63	block: fix spoiling all dirty bitmaps by mirror and migration Mirror and migration use dirty bitmaps for their purposes, and since commit [block: per caller dirty bitmap] they use their own bitmaps, not the global one. But they use old functions bdrv_set_dirty and bdrv_reset_dirty, which change all dirty bitmaps. Named dirty bitmaps series by Fam and Snow are affected: mirroring and migration will spoil all (not related to this mirroring or migration) named dirty bitmaps. This patch fixes this by adding bdrv_set_dirty_bitmap and bdrv_reset_dirty_bitmap, which change concrete bitmap. Also, to prevent such mistakes in future, old functions bdrv_(set,reset)_dirty are made static, for internal block usage. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com> CC: John Snow <jsnow@redhat.com> CC: Fam Zheng <famz@redhat.com> CC: Denis V. Lunev <den@openvz.org> CC: Stefan Hajnoczi <stefanha@redhat.com> CC: Kevin Wolf <kwolf@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 1417081246-3593-1-git-send-email-vsementsov@parallels.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2015-01-13 11:47:56 +00:00
Stefan Hajnoczi	5a7e7a0bad	block: let mirror blockjob run in BDS AioContext The mirror block job must run in the BlockDriverState AioContext so that it works with dataplane. Acquire the AioContext in blockdev.c so starting the block job is safe. Note that to_replace is treated separately from other BlockDriverStates in that it does not need to be in the same AioContext. Explicitly acquire/release to_replace's AioContext when accessing it. The completion code in block/mirror.c must perform BDS graph manipulation and bdrv_reopen() from the main loop. Use block_job_defer_to_main_loop() to achieve that. The bdrv_drain_all() call is not allowed outside the main loop since it could lead to lock ordering problems. Use bdrv_drain(bs) instead because we have acquired the AioContext so nothing else can sneak in I/O. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 1413889440-32577-10-git-send-email-stefanha@redhat.com	2014-11-03 11:41:49 +00:00
Max Reitz	b21c76529d	block/mirror: Improve progress report Instead of taking the total length of the block device as the block job's length, use the number of dirty sectors. The progress is now the number of sectors mirrored to the target block device. Note that this may result in the job's length increasing during operation, which is however in fact desirable. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-id: 1414159063-25977-8-git-send-email-mreitz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-11-03 11:41:48 +00:00
Markus Armbruster	097310b53e	block: Rename BlockDriverCompletionFunc to BlockCompletionFunc I'll use it with block backends shortly, and the name is going to fit badly there. It's a block layer thing anyway, not just a block driver thing. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-10-20 13:41:27 +02:00
Markus Armbruster	bfb197e0d9	block: Eliminate BlockDriverState member device_name[] device_name[] can become non-empty only in bdrv_new_root() and bdrv_move_feature_fields(). The latter is used only to undo damage done by bdrv_swap(). The former is called only by blk_new_with_bs(). Therefore, when a BlockDriverState's device_name[] is non-empty, then it's been created with a BlockBackend, and vice versa. Furthermore, blk_new_with_bs() keeps the two names equal. Therefore, device_name[] is redundant. Eliminate it. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-10-20 13:41:26 +02:00
Stefan Hajnoczi	6d0de8eb21	mirror: fix uninitialized variable delay_ns warnings The gcc 4.1.2 compiler warns that delay_ns may be uninitialized in mirror_iteration(). There are two break statements in the do ... while loop that skip over the delay_ns assignment. These are probably the cause of the warning. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>	2014-08-28 13:42:25 +01:00
Kevin Wolf	7504edf477	mirror: Handle failure for potentially large allocations Some code in the block layer makes potentially huge allocations. Failure is not completely unexpected there, so avoid aborting qemu and handle out-of-memory situations gracefully. This patch addresses the allocations in the mirror block job. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Benoit Canet <benoit@irqsave.net>	2014-08-15 15:07:16 +02:00
Kevin Wolf	5a0f6fd5c8	mirror: Fix qiov size for short requests When mirroring an image of a size that is not a multiple of the mirror job granularity, the last request would have the right nb_sectors argument, but a qiov that is rounded up to the next multiple of the granularity. Don't do this. This fixes a segfault that is caused by raw-posix being confused by this and allocating a buffer with request length, but operating on it with qiov length. [s/Driver/Drive/ in qemu-iotests 041 as suggested by Eric --Stefan] Reported-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Tested-by: Eric Blake <eblake@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-07-07 09:15:29 +02:00
Benoît Canet	09158f00e0	block: Add replaces argument to drive-mirror drive-mirror will bdrv_swap the new BDS named node-name with the one pointed by replaces when the mirroring is finished. Signed-off-by: Benoit Canet <benoit@irqsave.net> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-06-27 20:00:00 +02:00
Fam Zheng	9e48b02540	mirror: Go through ready -> complete process for 0 len image When mirroring or active committing a zero length image, BLOCK_JOB_READY is not reported now, instead the job completes because we short circuit the mirror job loop. This is inconsistent with non-zero length images, and only confuses management software. Let's do the same thing when seeing a 0-length image: report ready immediately; wait for block-job-cancel or block-job-complete; clear the cancel flag as existing non-zero image synced case (cancelled after ready); then jump to the exit. Reported-by: Eric Blake <eblake@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-06-26 13:50:57 +02:00
Wenchao Xia	bcada37b19	qapi event: convert other BLOCK_JOB events Since BLOCK_JOB_COMPLETED, BLOCK_JOB_CANCELLED, BLOCK_JOB_READY are related, convert them in one patch. The block_job_event_* functions are used to keep encapsulation of BlockJob structure. Signed-off-by: Wenchao Xia <wenchaoqemu@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>	2014-06-23 11:12:28 -04:00
Wenchao Xia	a589569f2f	qapi: adjust existing defines In order to let event defines use existing types later, instead of redefine new ones, some old type defines for spice and vnc are changed, and BlockErrorAction is moved from block.h to qapi schema. Note that BlockErrorAction is not merged with BlockdevOnError. At this point, VncInfo is not made a child of VncBasicInfo, because VncBasicInfo has mandatory fields where VncInfo makes them optional. Signed-off-by: Wenchao Xia <wenchaoqemu@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>	2014-06-23 11:01:25 -04:00
Fam Zheng	826b6ca0b0	block: Add backing_blocker in BlockDriverState This makes use of op_blocker and blocks all the operations except for commit target, on each BlockDriverState->backing_hd. The asserts for op_blocker in bdrv_swap are removed because with this change, the target of block commit has at least the backing blocker of its child, so the assertion is not true. Callers should do their check. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-05-28 14:28:46 +02:00
Fam Zheng	c3cc95bd15	mirror: Check for bdrv_get_info result bdrv_get_info could fail. Add check before using the returned value. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-04-29 13:43:08 +02:00
Fam Zheng	373df5b135	mirror: Fix resource leak when bdrv_getlength fails The direct return will skip releasing of all the resouces at immediate_exit, don't miss that. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-04-29 13:43:00 +02:00
Fam Zheng	f0e9736012	mirror: Use DIV_ROUND_UP Although bdrv_getlength() was just called above this, and checked for error, it is better to just use the value we already get, and use DIV_ROUND_UP. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-04-28 17:36:30 +02:00
Markus Armbruster	0fb6395c0c	Use error_is_set() only when necessary (again) error_is_set(&var) is the same as var != NULL, but it takes whole-program analysis to figure that out. Unnecessarily hard for optimizers, static checkers, and human readers. Commit `84d18f0` dumbed it down to obvious, but a few more have crept in since, and documentation was overlooked. Dumb these down, too. Signed-off-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-04-25 18:05:06 +02:00
Fam Zheng	b8afb520e4	block: Handle error of bdrv_getlength in bdrv_create_dirty_bitmap bdrv_getlength could fail, check the return value before using it. Return NULL and set errno if it fails. Callers are updated to handle the error case. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-04-22 11:57:02 +02:00
Stefan Hajnoczi	7b770c720b	mirror: fix early wake from sleep due to aio The mirror blockjob coroutine rate-limits itself by sleeping. The coroutine also performs I/O asynchronously so it's important that the aio callback doesn't wake the coroutine early as that breaks rate-limiting. Reported-by: Joaquim Barrera <jbarrera@ac.upc.edu> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-03-25 14:09:50 +01:00
Paolo Bonzini	cc8c9d6c6f	mirror: fix throttling delay calculation The throttling delay calculation was using an inaccurate sector count to calculate the time to sleep. This broke rate-limiting for the block mirror job. Move the delay calculation into mirror_iteration() where we know how many sectors were transferred. This lets us calculate an accurate delay time. Reported-by: Joaquim Barrera <jbarrera@ac.upc.edu> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-03-25 14:09:50 +01:00
Jeff Cody	50c75136be	block: mirror - remove code cruft that has no function Originally, this built up the error message with the backing filename, so that errp was set as follows: error_set(errp, QERR_OPEN_FILE_FAILED, backing_filename); However, we now propagate the local_error from the bdrv_open_backing_file() call instead, making these 2 lines useless code. Signed-off-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Benoit Canet <benoit@irqsave.net> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-03-06 11:47:40 +01:00
Jeff Cody	cc67f4d1f9	block: mirror - use local_err to avoid NULL errp When starting a block job, commit_active_start() relies on whether *errp is set by mirror_start_job. This allows it to determine if the mirror job start failed, so that it can clean up any changes to open flags from the bdrv_reopen(). If errp is NULL, then it will not be able to determine if mirror_start_job failed or not. To avoid this, use a local Error variable, and then propagate the error (if any) to errp. Reported-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-02-14 18:05:39 +01:00
Jeff Cody	39a611a3e0	block: Don't throw away errno via error_setg There are a handful of places in the block layer where a failure path has a valid -errno value, yet error_setg() is used. Those instances should instead use error_setg_errno(), to preserve as much error information as possible. This patch replaces those instances with error_setg_errno(), so that errno is passed up the stack in the error message. Reported-By: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-02-14 18:05:38 +01:00
Jeff Cody	4da8358596	block: resize backing image during active layer commit, if needed If the top image to commit is the active layer, and also larger than the base image, then an I/O error will likely be returned during block-commit. For instance, if we have a base image with a virtual size 10G, and a active layer image of size 20G, then committing the snapshot via 'block-commit' will likely fail. This will automatically attempt to resize the base image, if the active layer image to be committed is larger. Signed-off-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Benoit Canet <benoit@irqsave.net> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-01-24 16:12:49 +01:00
Zhang Min	6df3bf8eb3	drive mirror:fix memory leak In the function mirror_iteration() -> qemu_iovec_init(), it allocates memory for op->qiov.iov, when the write request calls back, but in the function mirror_iteration_done(), it only frees the op, not free the op->qiov.iov, so this causes memory leak. It should use qemu_iovec_destroy() to free op->qiov. Signed-off-by: Zhang Min <rudy.zhangmin@huawei.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-01-24 14:33:00 +01:00
Fam Zheng	20a63d2cec	commit: Support commit active layer If active is top, it will be mirrored to base, (with block/mirror.c code), then the image is switched when user completes the block job. QMP documentation is updated. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-12-20 16:26:16 +01:00
Fam Zheng	03544a6e9e	block: Add commit_active_start() commit_active_start is implemented in block/mirror.c, It will create a job with "commit" type and designated base in block-commit command. This will be used for committing active layer of device. Sync mode is removed from MirrorBlockJob because there's no proper type for commit. The used information is is_none_mode. The common part of mirror_start and commit_active_start is moved to mirror_start_job(). Fix the comment wording for commit_start. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-12-20 16:26:16 +01:00
Fam Zheng	5bc361b813	mirror: Move base to MirrorBlockJob This allows setting the base before entering mirror_run, commit will make use of it. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-12-20 16:26:16 +01:00
Fam Zheng	f95c625ce4	mirror: Don't close target Let reference count manage target and don't call bdrv_close here. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-12-20 16:26:16 +01:00
Fam Zheng	e4654d2d94	block: per caller dirty bitmap Previously a BlockDriverState has only one dirty bitmap, so only one caller (e.g. a block job) can keep track of writing. This changes the dirty bitmap to a list and creates a BdrvDirtyBitmap for each caller, the lifecycle is managed with these new functions: bdrv_create_dirty_bitmap bdrv_release_dirty_bitmap Where BdrvDirtyBitmap is a linked list wrapper structure of HBitmap. In place of bdrv_set_dirty_tracking, a BdrvDirtyBitmap pointer argument is added to these functions, since each caller has its own dirty bitmap: bdrv_get_dirty bdrv_dirty_iter_init bdrv_get_dirty_count bdrv_set_dirty and bdrv_reset_dirty prototypes are unchanged but will internally walk the list of all dirty bitmaps and set them one by one. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-11-29 13:40:33 +01:00
Fam Zheng	79e14bf778	qapi: make use of new BlockJobType Switch the string to enum type BlockJobType in BlockJobDriver. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-10-11 10:52:54 +02:00
Fam Zheng	3fc4b10af0	blockjob: rename BlockJobType to BlockJobDriver We will use BlockJobType as the enum type name of block jobs in QAPI, rename current BlockJobType to BlockJobDriver, which will eventually become a set of operations, similar to block drivers. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-10-11 10:52:54 +02:00
Max Reitz	34b5d2c68e	block: Error parameter for open functions Add an Error ** parameter to bdrv_open, bdrv_file_open and associated functions to allow more specific error messages. Signed-off-by: Max Reitz <mreitz@redhat.com>	2013-09-12 10:12:48 +02:00
Paolo Bonzini	4f5786376e	block: remove bdrv_is_allocated_above/bdrv_co_is_allocated_above distinction Now that bdrv_is_allocated detects coroutine context, the two can use the same code. Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-09-06 15:25:09 +02:00
Fam Zheng	4f6fd3491c	block: make bdrv_delete() static Manage BlockDriverState lifecycle with refcnt, so bdrv_delete() is no longer public and should be called by bdrv_unref() if refcnt is decreased to 0. This is an identical change because effectively, there's no multiple reference of BDS now: no caller of bdrv_ref() yet, only bdrv_new() sets bs->refcnt to 1, so all bdrv_unref() now actually delete the BDS. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-09-06 15:25:08 +02:00
Alex Bligh	bc72ad6754	aio / timers: Switch entire codebase to the new timer API This is an autogenerated patch using scripts/switch-timer-api. Switch the entire code base to using the new timer API. Note this patch may introduce some line length issues. Signed-off-by: Alex Bligh <alex@alex.org.uk> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-08-22 19:14:24 +02:00
Alex Bligh	7483d1e547	aio / timers: convert block_job_sleep_ns and co_sleep_ns to new API Convert block_job_sleep_ns and co_sleep_ns to use the new timer API. Signed-off-by: Alex Bligh <alex@alex.org.uk> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-08-22 19:14:24 +02:00
Kevin Wolf	f59fee8d50	block: Make BlockJobTypes const Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-06-28 09:20:27 +02:00
Luiz Capitulino	dacc26aae5	block: mirror_complete(): use error_setg_file_open() Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Acked-by: Kevin Wolf <kwolf@redhat.com>	2013-06-17 11:01:14 -04:00
Kevin Wolf	31ca6d077c	block: Add driver-specific options for backing files Options starting in "backing." are passed to the backing file now. If you don't need to specify the filename for the backing file, you can add it on the command line instead of in the image file: $ qemu-nbd -t /tmp/test.img $ qemu-img create -f qcow2 empty.qcow2 1G $ qemu-system-x86_64 -drive file=empty.qcow2,backing.file.driver=nbd,\ backing.file.host=localhost Note that this doesn't override the backing filename from the image. If the image has one, this will fail because NBD doesn't want the options and a filename at the same time. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2013-04-22 10:27:59 +02:00
Paolo Bonzini	88ff0e48ee	mirror: do nothing on zero-sized disk On a zero-sized disk we need to break out of the job successfully before bdrv_dirty_iter_init is called, otherwise you will get an assertion failure with the next patch. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Laszlo Ersek <lersek@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-01-25 18:18:35 +01:00
Paolo Bonzini	884fea4e87	mirror: support arbitrarily-sized iterations Yet another optimization is to extend the mirroring iteration to include more adjacent dirty blocks. This limits the number of I/O operations and makes mirroring efficient even with a small granularity. Most of the infrastructure is already in place; we only need to put a loop around the computation of the origin and sector count of the iteration. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-01-25 18:18:35 +01:00
Paolo Bonzini	402a47411b	mirror: support more than one in-flight AIO operation With AIO support in place, we can start copying more than one chunk in parallel. This patch introduces the required infrastructure for this: the buffer is split into multiple granularity-sized chunks, and there is a free list to access them. Because of copy-on-write, a single operation may already require multiple chunks to be available on the free list. In addition, two different iterations on the HBitmap may want to copy the same cluster. We avoid this by keeping a bitmap of in-flight I/O operations, and blocking until the previous iteration completes. This should be a pretty rare occurrence, though; as long as there is no overlap the next iteration can start before the previous one finishes. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-01-25 18:18:35 +01:00
Paolo Bonzini	08e4ed6cde	mirror: add buf-size argument to drive-mirror This makes sense when the next commit starts using the extra buffer space to perform many I/O operations asynchronously. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-01-25 18:18:34 +01:00
Paolo Bonzini	bd48bde8f0	mirror: switch mirror_iteration to AIO There is really no change in the behavior of the job here, since there is still a maximum of one in-flight I/O operation between the source and the target. However, this patch already introduces the AIO callbacks (which are unmodified in the next patch) and some of the logic to count in-flight operations and only complete the job when there is none. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-01-25 18:18:34 +01:00
Paolo Bonzini	eee13dfe30	mirror: allow customizing the granularity The desired granularity may be very different depending on the kind of operation (e.g. continuous replication vs. collapse-to-raw) and whether the VM is expected to perform lots of I/O while mirroring is in progress. Allow the user to customize it, while providing a sane default so that in general there will be no extra allocated space in the target compared to the source. Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-01-25 18:18:34 +01:00
Paolo Bonzini	50717e941b	block: allow customizing the granularity of the dirty bitmap Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-01-25 18:18:34 +01:00
Paolo Bonzini	acc906c6c5	block: return count of dirty sectors, not chunks Reviewed-by: Laszlo Ersek <lersek@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-01-25 18:18:33 +01:00
Paolo Bonzini	b812f6719c	mirror: perform COW if the cluster size is bigger than the granularity When mirroring runs, the backing files for the target may not yet be ready. However, this means that a copy-on-write operation on the target would fill the missing sectors with zeros. Copy-on-write only happens if the granularity of the dirty bitmap is smaller than the cluster size (and only for clusters that are allocated in the source after the job has started copying). So far, the granularity was fixed to 1MB; to avoid the problem we detected the situation and required the backing files to be available in that case only. However, we want to lower the granularity for efficiency, so we need a better solution. The solution is to always copy a whole cluster the first time it is touched. The code keeps a bitmap of clusters that have already been allocated by the mirroring job, and only does "manual" copy-on-write if the chunk being copied is zero in the bitmap. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-01-25 18:18:33 +01:00
Paolo Bonzini	8f0720ecbc	block: implement dirty bitmap using HBitmap This actually uses the dirty bitmap in the block layer, and converts mirroring to use an HBitmapIter. Reviewed-by: Laszlo Ersek <lersek@redhat.com> (except block/mirror.c parts) Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-01-25 18:18:33 +01:00
Markus Armbruster	7191bf311e	block: Fix how mirror_run() frees its buffer It allocates with qemu_blockalign(), therefore it must free with qemu_vfree(), not g_free(). Signed-off-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-01-15 17:28:55 +01:00
Paolo Bonzini	737e150e89	block: move include files to include/block/ Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2012-12-19 08:31:31 +01:00
Kevin Wolf	c57b6656c3	aio: Get rid of qemu_aio_flush() There are no remaining users, and new users should probably be using bdrv_drain_all() in the first place. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-12-11 11:04:25 +01:00
Paolo Bonzini	b952b5589a	mirror: add support for on-source-error/on-target-error Error management is important for mirroring; otherwise, an error on the target (even something as "innocent" as ENOSPC) requires to start again with a full copy. Similar to on_read_error/on_write_error, two separate knobs are provided for on_source_error (reads) and on_target_error (writes). The default is 'report' for both. The 'ignore' policy will leave the sector dirty, so that it will be retried later. Thus, it will not cause corruption. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-10-24 10:26:22 +02:00
Paolo Bonzini	d63ffd87ac	mirror: implement completion Switching to the target of the migration is done mostly asynchronously, and reported to management via the BLOCK_JOB_COMPLETED event; the only synchronous phase is opening the backing files. bdrv_open_backing_file can always be done, even for migration of the full image (aka sync: 'full'). In this case, qmp_drive_mirror will create the target disk with no backing file at all, and bdrv_open_backing_file will be a no-op. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-10-24 10:26:22 +02:00
Paolo Bonzini	893f7ebafe	mirror: introduce mirror job This patch adds the implementation of a new job that mirrors a disk to a new image while letting the guest continue using the old image. The target is treated as a "black box" and data is copied from the source to the target in the background. This can be used for several purposes, including storage migration, continuous replication, and observation of the guest I/O in an external program. It is also a first step in replacing the inefficient block migration code that is part of QEMU. The job is possibly never-ending, but it is logically structured into two phases: 1) copy all data as fast as possible until the target first gets in sync with the source; 2) keep target in sync and ensure that reopening to the target gets a correct (full) copy of the source data. The second phase is indicated by the progress in "info block-jobs" reporting the current offset to be equal to the length of the file. When the job is cancelled in the second phase, QEMU will run the job until the source is clean and quiescent, then it will report successful completion of the job. In other words, the BLOCK_JOB_CANCELLED event means that the target may _not_ be consistent with a past state of the source; the BLOCK_JOB_COMPLETED event means that the target is consistent with a past state of the source. (Note that it could already happen that management lost the race against QEMU and got a completion event instead of cancellation). It is not yet possible to complete the job and switch over to the target disk. The next patches will fix this and add many refinements to the basic idea introduced here. These include improved error management, some tunable knobs and performance optimizations. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-10-24 10:26:19 +02:00

... 2 3 4 5 6

288 Commits