mirrors/qemu - qemu - SynapseOS git

Author	SHA1	Message	Date
Kevin Wolf	5db15a5769	block: Manage backing file references in bdrv_set_backing_hd() This simplifies the code somewhat, especially when dropping whole backing file subchains. The exception is the mirroring code that does adventurous things with bdrv_swap() and in order to keep it working, I had to duplicate most of bdrv_set_backing_hd() locally. We'll get rid again of this ugliness shortly. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-10-16 15:34:29 +02:00
Kevin Wolf	760e006384	block: Convert bs->backing_hd to BdrvChild This is the final step in converting all of the BlockDriverState pointers that block drivers use to BdrvChild. After this patch, bs->children contains the full list of child nodes that are referenced by a given BDS, and these children are only referenced through BdrvChild, so that updating the pointer in there is enough for changing edges in the graph. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-10-16 15:34:29 +02:00
Kevin Wolf	b26e90f56a	block: Remove bdrv_open_image() It is unused now. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-10-16 15:34:29 +02:00
Kevin Wolf	9a4f4c3156	block: Convert bs->file to BdrvChild This patch removes the temporary duplication between bs->file and bs->file_child by converting everything to BdrvChild. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-10-16 15:34:29 +02:00
Kevin Wolf	1fdd693308	block: Introduce BDS.file_child Store the BdrvChild for bs->file. At this point, bs->file_child->bs just duplicates the bs->file pointer. Later, it will completely replace it. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-10-16 15:34:29 +02:00
Wen Congyang	9568b511c9	block: Introduce a new API bdrv_co_no_copy_on_readv() In some cases, we need to disable copy-on-read, and just read the data. Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Message-id: 1441682913-14320-2-git-send-email-wency@cn.fujitsu.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2015-09-25 08:37:07 -04:00
Kevin Wolf	4d2cb09251	block: Allow specifying driver-specific options to reopen Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2015-09-14 16:51:36 +02:00
Max Reitz	cf25ff850f	block: Drop bdrv_find_whitelisted_format() It is unused by now, so we can drop it. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-09-14 16:51:36 +02:00
Max Reitz	6ebf9aa2ef	block: Drop drv parameter from bdrv_open() Now that this parameter is effectively unused, we can drop it and just pass NULL on to bdrv_open_inherit(). Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-09-14 16:51:36 +02:00
Veres Lajos	67cc32ebfd	typofixes - v4 Signed-off-by: Veres Lajos <vlajos@gmail.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2015-09-11 10:45:43 +03:00
Daniel P. Berrange	b6af097528	maint: remove / fix many doubled words Many source files have doubled words (eg "the the", "to to", and so on). Most of these can simply be removed, but a couple were actual mis-spellings (eg "to to" instead of "to do"). There was even one triple word score "to to to" :-) Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2015-09-11 10:21:38 +03:00
Wen Congyang	e12f378409	block: more check for replaced node We use mirror+replace to fix quorum's broken child. bs/s->common.bs is quorum, and to_replace is the broken child. The new child is target_bs. Without this patch, the replace node can be any node, and it can be top BDS with BB, or another quorum's child. We just check if the broken child is part of the quorum BDS in this patch. Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Message-id: 55A86486.1000404@cn.fujitsu.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-09-02 14:56:39 +01:00
Stefan Hajnoczi	ca96ac44dc	AioContext: force event loop iteration using BH The notify_me optimization introduced in commit `eabc977973` ("AioContext: fix broken ctx->dispatching optimization") skips event_notifier_set() calls when the event loop thread is not blocked in ppoll(2). This optimization causes a deadlock if two aio_context_acquire() calls race. notify_me = 0 during the race so the winning thread can enter ppoll(2) unaware that the other thread is waiting its turn to acquire the AioContext. This patch forces ppoll(2) to return by scheduling a BH instead of calling aio_notify(). The following deadlock with virtio-blk dataplane is fixed: qemu ... -object iothread,id=iothread0 \ -drive if=none,id=drive0,file=test.img,... \ -device virtio-blk-pci,iothread=iothread0,drive=drive0 This command-line results in a hang early on without this patch. Thanks to Paolo Bonzini <pbonzini@redhat.com> for investigating this bug with me. Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Cornelia Huck <cornelia.huck@de.ibm.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 1438101249-25166-4-git-send-email-pbonzini@redhat.com Message-Id: <1438014819-18125-3-git-send-email-stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-07-29 10:02:06 +01:00
Paolo Bonzini	05e514b1d4	AioContext: optimize clearing the EventNotifier It is pretty rare for aio_notify to actually set the EventNotifier. It can happen with worker threads such as thread-pool.c's, but otherwise it should never be set thanks to the ctx->notify_me optimization. The previous patch, unfortunately, added an unconditional call to event_notifier_test_and_clear; now add a userspace fast path that avoids the call. Note that it is not possible to do the same with event_notifier_set; it would break, as proved (again) by the included formal model. This patch survived over 3000 reboots on aarch64 KVM. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Tested-by: Richard W.M. Jones <rjones@redhat.com> Message-id: 1437487673-23740-7-git-send-email-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-07-22 12:41:40 +01:00
Paolo Bonzini	eabc977973	AioContext: fix broken ctx->dispatching optimization This patch rewrites the ctx->dispatching optimization, which was the cause of some mysterious hangs that could be reproduced on aarch64 KVM only. The hangs were indirectly caused by aio_poll() and in particular by flash memory updates's call to blk_write(), which invokes aio_poll(). Fun stuff: they had an extremely short race window, so much that adding all kind of tracing to either the kernel or QEMU made it go away (a single printf made it half as reproducible). On the plus side, the failure mode (a hang until the next keypress) made it very easy to examine the state of the process with a debugger. And there was a very nice reproducer from Laszlo, which failed pretty often (more than half of the time) on any version of QEMU with a non-debug kernel; it also failed fast, while still in the firmware. So, it could have been worse. For some unknown reason they happened only with virtio-scsi, but that's not important. It's more interesting that they disappeared with io=native, making thread-pool.c a likely suspect for where the bug arose. thread-pool.c is also one of the few places which use bottom halves across threads, by the way. I hope that no other similar bugs exist, but just in case :) I am going to describe how the successful debugging went... Since the likely culprit was the ctx->dispatching optimization, which mostly affects bottom halves, the first observation was that there are two qemu_bh_schedule() invocations in the thread pool: the one in the aio worker and the one in thread_pool_completion_bh. The latter always causes the optimization to trigger, the former may or may not. In order to restrict the possibilities, I introduced new functions qemu_bh_schedule_slow() and qemu_bh_schedule_fast(): /* qemu_bh_schedule_slow: / ctx = bh->ctx; bh->idle = 0; if (atomic_xchg(&bh->scheduled, 1) == 0) { event_notifier_set(&ctx->notifier); } / qemu_bh_schedule_fast: / ctx = bh->ctx; bh->idle = 0; assert(ctx->dispatching); atomic_xchg(&bh->scheduled, 1); Notice how the atomic_xchg is still in qemu_bh_schedule_slow(). This was already debated a few months ago, so I assumed it to be correct. In retrospect this was a very good idea, as you'll see later. Changing thread_pool_completion_bh() to qemu_bh_schedule_fast() didn't trigger the assertion (as expected). Changing the worker's invocation to qemu_bh_schedule_slow() didn't hide the bug (another assumption which luckily held). This already limited heavily the amount of interaction between the threads, hinting that the problematic events must have triggered around thread_pool_completion_bh(). As mentioned early, invoking a debugger to examine the state of a hung process was pretty easy; the iothread was always waiting on a poll(..., -1) system call. Infinite timeouts are much rarer on x86, and this could be the reason why the bug was never observed there. With the buggy sequence more or less resolved to an interaction between thread_pool_completion_bh() and poll(..., -1), my "tracing" strategy was to just add a few qemu_clock_get_ns(QEMU_CLOCK_REALTIME) calls, hoping that the ordering of aio_ctx_prepare(), aio_ctx_dispatch, poll() and qemu_bh_schedule_fast() would provide some hint. The output was: (gdb) p last_prepare $3 = 103885451 (gdb) p last_dispatch $4 = 103876492 (gdb) p last_poll $5 = 115909333 (gdb) p last_schedule $6 = 115925212 Notice how the last call to qemu_poll_ns() came after aio_ctx_dispatch(). This makes little sense unless there is an aio_poll() call involved, and indeed with a slightly different instrumentation you can see that there is one: (gdb) p last_prepare $3 = 107569679 (gdb) p last_dispatch $4 = 107561600 (gdb) p last_aio_poll $5 = 110671400 (gdb) p last_schedule $6 = 110698917 So the scenario becomes clearer: iothread VCPU thread -------------------------------------------------------------------------- aio_ctx_prepare aio_ctx_check qemu_poll_ns(timeout=-1) aio_poll aio_dispatch thread_pool_completion_bh qemu_bh_schedule() At this point bh->scheduled = 1 and the iothread has not been woken up. The solution must be close, but this alone should not be a problem, because the bottom half is only rescheduled to account for rare situations (see commit `3c80ca1`, thread-pool: avoid deadlock in nested aio_poll() calls, 2014-07-15). Introducing a third thread---a thread pool worker thread, which also does qemu_bh_schedule()---does bring out the problematic case. The third thread must be awakened after* the callback is complete and thread_pool_completion_bh has redone the whole loop, explaining the short race window. And then this is what happens: thread pool worker -------------------------------------------------------------------------- <I/O completes> qemu_bh_schedule() Tada, bh->scheduled is already 1, so qemu_bh_schedule() does nothing and the iothread is never woken up. This is where the bh->scheduled optimization comes into play---it is correct, but removing it would have masked the bug. So, what is the bug? Well, the question asked by the ctx->dispatching optimization ("is any active aio_poll dispatching?") was wrong. The right question to ask instead is "is any active aio_poll not dispatching", i.e. in the prepare or poll phases? In that case, the aio_poll is sleeping or might go to sleep anytime soon, and the EventNotifier must be invoked to wake it up. In any other case (including if there is no active aio_poll at all!) we can just wait for the next prepare phase to pick up the event (e.g. a bottom half); the prepare phase will avoid the blocking and service the bottom half. Expressing the invariant with a logic formula, the broken one looked like: !(exists(thread): in_dispatching(thread)) => !optimize or equivalently: !(exists(thread): in_aio_poll(thread) && in_dispatching(thread)) => !optimize In the correct one, the negation is in a slightly different place: (exists(thread): in_aio_poll(thread) && !in_dispatching(thread)) => !optimize or equivalently: (exists(thread): in_prepare_or_poll(thread)) => !optimize Even if the difference boils down to moving an exclamation mark :) the implementation is quite different. However, I think the new one is simpler to understand. In the old implementation, the "exists" was implemented with a boolean value. This didn't really support well the case of multiple concurrent event loops, but I thought that this was okay: aio_poll holds the AioContext lock so there cannot be concurrent aio_poll invocations, and I was just considering nested event loops. However, aio_poll _could_ indeed be concurrent with the GSource. This is why I came up with the wrong invariant. In the new implementation, "exists" is computed simply by counting how many threads are in the prepare or poll phases. There are some interesting points to consider, but the gist of the idea remains: 1) AioContext can be used through GSource as well; as mentioned in the patch, bit 0 of the counter is reserved for the GSource. 2) the counter need not be updated for a non-blocking aio_poll, because it won't sleep forever anyway. This is just a matter of checking the "blocking" variable. This requires some changes to the win32 implementation, but is otherwise not too complicated. 3) as mentioned above, the new implementation will not call aio_notify when there is no active aio_poll at all. The tests have to be adjusted for this change. The calls to aio_notify in async.c are fine; they only want to kick aio_poll out of a blocking wait, but need not do anything if aio_poll is not running. 4) nested aio_poll: these just work with the new implementation; when a nested event loop is invoked, the outer event loop is never in the prepare or poll phases. The outer event loop thus has already decremented the counter. Reported-by: Richard W. M. Jones <rjones@redhat.com> Reported-by: Laszlo Ersek <lersek@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Tested-by: Richard W.M. Jones <rjones@redhat.com> Message-id: 1437487673-23740-5-git-send-email-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-07-22 12:41:40 +01:00
Kevin Wolf	80a1e13091	block: Fix backing file child when modifying graph This patch moves bdrv_attach_child() from the individual places that add a backing file to a BDS to bdrv_set_backing_hd(), which is called by all of them. It also adds bdrv_detach_child() there. For normal operation (starting with one backing file chain and not changing it until the topmost image is closed) and live snapshots, this constitutes no change in behaviour. For all other cases, this is a fix for the bug that the old backing file was still referenced as a child, and the new one wasn't referenced. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2015-07-14 17:15:23 +02:00
Kevin Wolf	33a604075c	block: Introduce bdrv_unref_child() This is the counterpart for bdrv_open_child(). It decreases the reference count of the child BDS and removes it from the list of children of the given parent BDS. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2015-07-14 17:15:23 +02:00
Kevin Wolf	b4b059f628	block: Introduce bdrv_open_child() It is the same as bdrv_open_image(), except that it doesn't only return success or failure, but the newly created BdrvChild object for the new child node. As the BdrvChild object already contains a BlockDriverState pointer (and this is supposed to become the only pointer so that bdrv_append() and friends can just change a single pointer in BdrvChild), the pbs parameter is removed for bdrv_open_child(). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2015-07-14 17:15:18 +02:00
Ting Wang	970311646a	blockjob: add block_job_release function There is job resource leak in function mirror_start_job, although bdrv_create_dirty_bitmap is unlikely failed. Add block_job_release for each release when needed. Signed-off-by: Ting Wang <kathy.wangting@huawei.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 1435311455-56048-1-git-send-email-kathy.wangting@huawei.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-07-07 14:27:14 +01:00
Fam Zheng	6e82e4bce1	block: Remove bdrv_reset_dirty Using this function would always be wrong because a dirty bitmap must have a specific owner that consumes the dirty bits and calls bdrv_reset_dirty_bitmap(). Remove the unused function to avoid future misuse. Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-07-02 10:06:23 +01:00
Fam Zheng	0fc9f8ea28	qmp: Add optional bool "unmap" to drive-mirror If specified as "true", it allows discarding on target sectors where source is not allocated. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-07-02 10:06:23 +01:00
Fam Zheng	ba3f0e2545	block: Add bdrv_get_block_status_above Like bdrv_is_allocated_above, this function follows the backing chain until seeing BDRV_BLOCK_ALLOCATED. Base is not included. Reimplement bdrv_is_allocated on top. [Initialized bdrv_co_get_block_status_above() ret to 0 to silence mingw64 compiler warning about the unitialized variable. assert(bs != base) prevents that case but I suppose the program could be compiled with -DNDEBUG. --Stefan] Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-07-02 10:03:50 +01:00
John Snow	4b80ab2b7d	qapi: Rename 'dirty-bitmap' mode to 'incremental' If we wish to make differential backups a feature that's easy to access, it might be pertinent to rename the "dirty-bitmap" mode to "incremental" to make it clear what /type/ of backup the dirty-bitmap is helping us perform. This is an API breaking change, but 2.4 has not yet gone live, so we have this flexibility. Signed-off-by: John Snow <jsnow@redhat.com> Message-id: 1433463642-21840-2-git-send-email-jsnow@redhat.com Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-07-02 09:20:18 +01:00
Markus Armbruster	a0b1a66ea3	Include monitor/monitor.h exactly where needed In particular, don't include it into headers. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Luiz Capitulino <lcapitulino@redhat.com>	2015-06-22 18:20:41 +02:00
Markus Armbruster	cc7a8ea740	Include qapi/qmp/qerror.h exactly where needed In particular, don't include it into headers. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Luiz Capitulino <lcapitulino@redhat.com>	2015-06-22 18:20:41 +02:00
Peter Maydell	f3e3b083d4	Block layer core and image format patches -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQIcBAABAgAGBQJVevYFAAoJEH8JsnLIjy/W3jEP/0hiQ3rCRZ/he8s5maTdT+TR YSeHkB5rKpz0Uopn1DMn1QrIbUVzX7dyb+uf9zQ0/xRQIzf6k8uxqU/NWrdoF3NK qx91dGWedwnG+TEBIMbcR7nMrw4dP6kH7uPz/VWMXDHVLz0HIcD95qhKgs0mSY6J dWqex6ACjXM68zJU5IioagU9evV80WZE1S8z7zfixxtTBx5hCaTVbwalkaCxcrXw PbZle55rjI8B10+OzgBw0fq10nias+NTndU9CwNBboxmEtAjq8/mQ663vcWlmiFo 9a/hkda27Z5ut/0Tqk1v4uLHauylp++rrAabPBAuCFMKes6cdkddP15Q/r52aJ29 5meodQtbet1rGrM+Aq4vuSuWId71PGypEI/3URDdNfYFNISoeLLsk4lcQUu7VrDD sRX3Jt8SI3nkIgOnhPyi7NDPmafxFt8yRt5vM8MyR5ynF8NS/2hiAc3wqnbXGjUj a5GqDCefb1yM0R5HvksuFFt3OnXlKJQ3J+ksXNUJf9DSAZPauqWD696pcTeg8wyy 3PIGkczgUuKTVfFWd3THZxJLAo7ZuqvXBHHV8o1SeMBDxwh4FhTd8Kjvm3rUNFfl VDox4qwZ1AcxLrxgqazKU7sD9iWBDHURRcpOoUsBys7oxQZnhcmMp1fRlEkTOyrD HNiSNByqBrtkfeVzHlSe =QgYk -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging Block layer core and image format patches # gpg: Signature made Fri Jun 12 16:08:53 2015 BST using RSA key ID C88F2FD6 # gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" * remotes/kevin/tags/for-upstream: (25 commits) block: Fix reopen flag inheritance block: Add BlockDriverState.inherits_from block: Add list of children to BlockDriverState queue.h: Add QLIST_FIX_HEAD_PTR() block: Drain requests before swapping nodes in bdrv_swap() block: Move flag inheritance to bdrv_open_inherit() block: Use QemuOpts in bdrv_open_common() block: Use macro for cache option names vmdk: Use bdrv_open_image() quorum: Use bdrv_open_image() check-qdict: Test cases for new functions qdict: Add qdict_{set,copy}_default() qdict: Add qdict_array_entries() iotests: Add tests for overriding BDRV_O_PROTOCOL block: driver should override flags in bdrv_open() block: Change bitmap truncate conditional to assertion block: record new size in bdrv_dirty_bitmap_truncate raw-posix: Fix .bdrv_co_get_block_status() for unaligned image size vmdk: Use vmdk_find_index_in_cluster everywhere vmdk: Fix index_in_cluster calculation in vmdk_co_get_block_status ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2015-06-15 10:43:06 +01:00
Peter Maydell	8aeaa055f5	-----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJVevNrAAoJEJykq7OBq3PIn3wH/iTkoxK+8c6vHY4gAyW/3Vel JAWyDULliLg/mJiyDyz+9hbyb3FmDsQ7Gve5laFCqSES8IChe/fAoR2dXAapHbJm VroBii+C4MIakAjB1eSGVTMeJVgyq0SxCbEgo6MSgCNRg8pH/ne4kGMn0elsyMQm VrxqsoW1qda0gLbybu7uRKmdZlb5Eklk0zQmBlzd25j3xgMXmo7ZcXW0d7dCi/xd ot01aPNHWKy2FyYsI+N1SMagCedjn0FFCHMMc08W34olUOAschV+Croxi6gupbTl RdlYNnB6BBYLgT7mxYuKKd20r4Q8Si3E5YjIsYaXIskbCOxDuryKboyheMXWqdA= =mx5x -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/stefanha/tags/block-pull-request' into staging # gpg: Signature made Fri Jun 12 15:57:47 2015 BST using RSA key ID 81AB73C8 # gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>" # gpg: aka "Stefan Hajnoczi <stefanha@gmail.com>" * remotes/stefanha/tags/block-pull-request: qemu-iotests: expand test 093 to support group throttling throttle: Update throttle infrastructure copyright throttle: add the name of the ThrottleGroup to BlockDeviceInfo throttle: acquire the ThrottleGroup lock in bdrv_swap() throttle: Add throttle group support throttle: Add throttle group infrastructure tests throttle: Add throttle group infrastructure throttle: Extract timers from ThrottleState into a separate structure raw-posix: Fix .bdrv_co_get_block_status() for unaligned image size Revert "iothread: release iothread around aio_poll" Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2015-06-12 18:04:14 +01:00
Kevin Wolf	bddcec3745	block: Add BlockDriverState.inherits_from Currently, the block layer assumes that any block node can have only one parent, and if it has a parent, that it inherits some options/flags from this parent. This is not true any more: With references used in block device creation, a single node can be used by multiple parents, or it can be created separately and not inherit flags from any parent. To handle reopens correctly, a node must know from which parent it inherited options. This patch adds the information to BlockDriverState. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2015-06-12 17:04:59 +02:00
Kevin Wolf	6e93e7c41f	block: Add list of children to BlockDriverState This allows iterating over all children of a given BDS, not only including bs->file and bs->backing_hd, but also driver-specific ones like VMDK extents or Quorum children. For bdrv_swap(), the list of children of the swapped BDS stays at that BDS (because that's where the pointers stay as well). The list head moves and pointers to it must be fixed up therefore. The list of children in the parent of the swapped BDS is not affected by the swap. The contents of the BDS objects is swapped, so the existing pointer in the parent automatically points to the newly swapped in BDS. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2015-06-12 17:04:59 +02:00
Kevin Wolf	f3930ed0bb	block: Move flag inheritance to bdrv_open_inherit() Instead of letting every caller of bdrv_open() determine the right flags for its child node manually and pass them to the function, pass the parent node and the role of the newly opened child (like backing file, protocol layer, etc.). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2015-06-12 17:04:59 +02:00
Kevin Wolf	54861b9280	block: Use macro for cache option names Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com>	2015-06-12 16:58:07 +02:00
Alberto Garcia	db6283385c	throttle: acquire the ThrottleGroup lock in bdrv_swap() bdrv_swap() touches the fields of a BlockDriverState that are protected by the ThrottleGroup lock. Although those fields end up in their original place, they are temporarily swapped in the process, so there's a chance that an operation on a member of the same group happening on a different thread can try to use them. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: d92dc40d7c4f1fc5cda5cbbf4ffb7a4670b79d17.1433779731.git.berto@igalia.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-06-12 14:00:00 +01:00
Alberto Garcia	76f4afb40f	throttle: Add throttle group support The throttle group support use a cooperative round robin scheduling algorithm. The principles of the algorithm are simple: - Each BDS of the group is used as a token in a circular way. - The active BDS computes if a wait must be done and arms the right timer. - If a wait must be done the token timer will be armed so the token will become the next active BDS. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: f0082a86f3ac01c46170f7eafe2101a92e8fde39.1433779731.git.berto@igalia.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-06-12 14:00:00 +01:00
Alberto Garcia	2ff1f2e3a3	throttle: Add throttle group infrastructure Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 2fdb4de17210b733a13eb472c33cd08b45f8fd21.1433779731.git.berto@igalia.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-06-12 14:00:00 +01:00
Benoît Canet	0e5b0a2d54	throttle: Extract timers from ThrottleState into a separate structure Group throttling will share ThrottleState between multiple bs. As a consequence the ThrottleState will be accessed by multiple aio context. Timers are tied to their aio context so they must go out of the ThrottleState structure. This commit paves the way for each bs of a common ThrottleState to have its own timer. Signed-off-by: Benoit Canet <benoit.canet@nodalink.com> Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 6cf9ea96d8b32ae2f8769cead38f68a6a0c8c909.1433779731.git.berto@igalia.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-06-12 14:00:00 +01:00
Fam Zheng	6484e42247	main-loop: Drop qemu_set_fd_handler2 All users are converted to qemu_set_fd_handler now, drop qemu_set_fd_handler2 and IOHandlerRecord.fd_read_poll. Signed-off-by: Fam Zheng <famz@redhat.com> Message-id: 1433400324-7358-9-git-send-email-famz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-06-12 13:26:21 +01:00
John Snow	9abe3bdc45	qapi: add dirty bitmap status Bitmaps can be in a handful of different states with potentially more to come as we tool around with migration and persistence patches. Management applications may need to know why certain bitmaps are unavailable for various commands, e.g. busy in another operation, busy being migrated, etc. Right now, all we offer is BlockDirtyInfo's boolean member 'frozen'. Instead of adding more booleans, replace it by an enumeration member 'status' with values 'active' and 'frozen'. Then add new value 'disabled'. Incompatible change. Fine because the changed part hasn't been released so far. Suggested-by: Eric Blake <eblake@redhat.com> Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> [Commit message tweaked] Signed-off-by: Markus Armbruster <armbru@redhat.com>	2015-05-29 12:53:12 +02:00
Denis V. Lunev	4196d2f030	block: minimal bounce buffer alignment The patch introduces new concept: minimal memory alignment for bounce buffers. Original so called "optimal" value is actually minimal required value for aligment. It should be used for validation that the IOVec is properly aligned and bounce buffer is not required. Though, from the performance point of view, it would be better if bounce buffer or IOVec allocated by QEMU will be aligned stricter. The patch does not change any alignment value yet. Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-id: 1431441056-26198-2-git-send-email-den@openvz.org CC: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> CC: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-05-22 09:37:33 +01:00
Stefan Hajnoczi	0eb7217e49	block: extract bdrv_setup_io_funcs() Move the code to install coroutine and aio emulation function pointers in a BlockDriver to its own function. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-04-28 15:36:17 +02:00
Stefan Hajnoczi	e0c47b6cb1	block: add bdrv_set_dirty()/bdrv_reset_dirty() to block_int.h The dirty bitmap functions are called from the block I/O processing code. Make them visible to block_int.h users so they can be used outside block.c. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-04-28 15:36:17 +02:00
John Snow	20dca81075	block: Ensure consistent bitmap function prototypes We often don't need the BlockDriverState for functions that operate on bitmaps. Remove it. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 1429314609-29776-15-git-send-email-jsnow@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-04-28 15:36:10 +02:00
John Snow	e74e6b78e6	qmp: add block-dirty-bitmap-clear Add bdrv_clear_dirty_bitmap and a matching QMP command, qmp_block_dirty_bitmap_clear that enables a user to reset the bitmap attached to a drive. This allows us to reset a bitmap in the event of a full drive backup. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 1429314609-29776-12-git-send-email-jsnow@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-04-28 15:36:10 +02:00
John Snow	d58d845397	qmp: Add support of "dirty-bitmap" sync mode for drive-backup For "dirty-bitmap" sync mode, the block job will iterate through the given dirty bitmap to decide if a sector needs backup (backup all the dirty clusters and skip clean ones), just as allocation conditions of "top" sync mode. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 1429314609-29776-11-git-send-email-jsnow@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-04-28 15:36:10 +02:00
John Snow	9bd2b08f27	block: Add bitmap successors A bitmap successor is an anonymous BdrvDirtyBitmap that is intended to be created just prior to a sensitive operation (e.g. Incremental Backup) that can either succeed or fail, but during the course of which we still want a bitmap tracking writes. On creating a successor, we "freeze" the parent bitmap which prevents its deletion, enabling, anonymization, or creating a bitmap with the same name. On success, the parent bitmap can "abdicate" responsibility to the successor, which will inherit its name. The successor will have been tracking writes during the course of the backup operation. The parent will be safely deleted. On failure, we can "reclaim" the successor from the parent, unifying them such that the resulting bitmap describes all writes occurring since the last successful backup, for instance. Reclamation will thaw the parent, but not explicitly re-enable it. BdrvDirtyBitmap operations that target a single bitmap are protected by assertions that the bitmap is not frozen and/or disabled. BdrvDirtyBitmap operations that target a group of bitmaps, such as bdrv_{set,reset}_dirty will ignore frozen/disabled drives with a conditional instead. Internal functions that enable/disable dirty bitmaps have assertions added to them to prevent modifying frozen bitmaps. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 1429314609-29776-10-git-send-email-jsnow@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-04-28 15:36:10 +02:00
John Snow	b8e6fb752e	block: Add bitmap disabled status Add a status indicating the enabled/disabled state of the bitmap. A bitmap is by default enabled, but you can lock the bitmap into a read-only state by setting disabled = true. A previous version of this patch added a QMP interface for changing the state of the bitmap, but it has since been removed for now until a use case emerges where this state must be revealed to the user. The disabled state WILL be used internally for bitmap migration and bitmap persistence. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 1429314609-29776-9-git-send-email-jsnow@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-04-28 15:36:10 +02:00
John Snow	592fdd02ae	block: Introduce bdrv_dirty_bitmap_granularity() This returns the granularity (in bytes) of dirty bitmap, which matches the QMP interface and the existing query interface. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 1429314609-29776-6-git-send-email-jsnow@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-04-28 15:36:10 +02:00
John Snow	341ebc2f81	qmp: Add block-dirty-bitmap-add and block-dirty-bitmap-remove The new command pair is added to manage a user created dirty bitmap. The dirty bitmap's name is mandatory and must be unique for the same device, but different devices can have bitmaps with the same names. The granularity is an optional field. If it is not specified, we will choose a default granularity based on the cluster size if available, clamped to between 4K and 64K to mirror how the 'mirror' code was already choosing granularity. If we do not have cluster size info available, we choose 64K. This code has been factored out into a helper shared with block/mirror. This patch also introduces the 'block_dirty_bitmap_lookup' helper, which takes a device name and a dirty bitmap name and validates the lookup, returning NULL and setting errp if there is a problem with either field. This helper will be re-used in future patches in this series. The types added to block-core.json will be re-used in future patches in this series, see: 'qapi: Add transaction support to block-dirty-bitmap-{add, enable, disable}' Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 1429314609-29776-5-git-send-email-jsnow@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-04-28 15:36:10 +02:00
John Snow	5fba6c0e50	qmp: Ensure consistent granularity type We treat this field with a variety of different types everywhere in the code. Now it's just uint32_t. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 1429314609-29776-4-git-send-email-jsnow@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-04-28 15:36:10 +02:00
Fam Zheng	0db6e54a8a	qapi: Add optional field "name" to block dirty bitmap This field will be set for user created dirty bitmap. Also pass in an error pointer to bdrv_create_dirty_bitmap, so when a name is already taken on this BDS, it can report an error message. This is not global check, two BDSes can have dirty bitmap with a common name. Implemented bdrv_find_dirty_bitmap to find a dirty bitmap by name, will be used later when other QMP commands want to reference dirty bitmap by name. Add bdrv_dirty_bitmap_make_anon. This unsets the name of dirty bitmap. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 1429314609-29776-3-git-send-email-jsnow@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-04-28 15:36:10 +02:00
Alberto Garcia	d5a8ee60a0	qmp: fill in the image field in BlockDeviceInfo The image field in BlockDeviceInfo is supposed to contain an ImageInfo object. However that is being filled in by bdrv_query_info(), not by bdrv_block_device_info(), which is where BlockDeviceInfo is actually created. Anyone calling bdrv_block_device_info() directly will get a null image field. As a consequence of this, the HMP command 'info block -n -v' crashes QEMU. This patch moves the code that fills in that field from bdrv_query_info() to bdrv_block_device_info(). Signed-off-by: Alberto Garcia <berto@igalia.com> Message-id: 1429271563-3765-1-git-send-email-berto@igalia.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-04-28 15:36:09 +02:00
Alberto Garcia	9b2aa84f87	block: add bdrv_get_device_or_node_name() This function gets the device name associated with a BlockDriverState, or its node name if the device name is empty. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 4fa30aa8d61d9052ce266fd5429a59a14e941255.1428485266.git.berto@igalia.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-04-28 15:36:09 +02:00
Fam Zheng	751ebd76e6	blockjob: Allow nested pause This patch changes block_job_pause to increase the pause counter and block_job_resume to decrease it. The counter will allow calling block_job_pause/block_job_resume unconditionally on a job when we need to suspend the IO temporarily. From now on, each block_job_resume must be paired with a block_job_pause to keep the counter balanced. The user pause from QMP or HMP will only trigger block_job_pause once until it's resumed, this is achieved by adding a user_paused flag in BlockJob. One occurrence of block_job_resume in mirror_complete is replaced with block_job_enter which does what is necessary. In block_job_cancel, the cancel flag is good enough to instruct coroutines to quit loop, so use block_job_enter to replace the unpaired block_job_resume. Upon block job IO error, user is notified about the entering to the pause state, so this pause belongs to user pause, set the flag accordingly and expect a matching QMP resume. [Extended doc comments as suggested by Paolo Bonzini <pbonzini@redhat.com>. --Stefan] Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 1428069921-2957-2-git-send-email-famz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-04-28 15:36:09 +02:00
Paolo Bonzini	49110174f8	AioContext: acquire/release AioContext during aio_poll This is the first step in pushing down acquire/release, and will let rfifolock drop the contention callback feature. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 1424449612-18215-3-git-send-email-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-04-28 15:36:08 +02:00
Paolo Bonzini	e98ab09709	aio-posix: move pollfds to thread-local storage By using thread-local storage, aio_poll can stop using global data during g_poll_ns. This will make it possible to drop callbacks from rfifolock. [Moved npfd = 0 assignment to end of walking_handlers region as suggested by Paolo. This resolves the assert(npfd == 0) assertion failure in pollfds_cleanup(). --Stefan] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 1424449612-18215-2-git-send-email-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-04-28 15:36:08 +02:00
Max Reitz	3f4726596d	nbd: Set block size to BDRV_SECTOR_SIZE Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <1424887718-10800-13-git-send-email-mreitz@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-03-18 12:07:01 +01:00
Max Reitz	ac97393dc7	nbd: Fix potential signed overflow issues Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <1424887718-10800-11-git-send-email-mreitz@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-03-18 12:06:56 +01:00
Max Reitz	98f44bbe70	nbd: Handle blk_getlength() failure Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <1424887718-10800-9-git-send-email-mreitz@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-03-18 12:06:50 +01:00
Fam Zheng	d51a2427f6	block: Drop bdrv_find All callers are converted, so drop it. Signed-off-by: Fam Zheng <famz@redhat.com> Message-id: 1425296209-1476-5-git-send-email-famz@redhat.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2015-03-16 12:10:30 -04:00
Ekaterina Tumanova	892b7de832	block: add bdrv functions for geometry and blocksize Add driver functions for geometry and blocksize detection Signed-off-by: Ekaterina Tumanova <tumanova@linux.vnet.ibm.com> Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 1424087278-49393-2-git-send-email-tumanova@linux.vnet.ibm.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-03-10 14:02:21 +01:00
Max Reitz	06d05fa738	qcow2: Allow creation with refcount order != 4 Add a creation option to qcow2 for setting the refcount order of images to be created, and respect that option's value. This breaks some test outputs, fix them. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-03-10 14:02:21 +01:00
Teruaki Ishizaki	876eb1b0cc	sheepdog: selectable object size support Previously, qemu block driver of sheepdog used hard-coded VDI object size. This patch enables users to handle VDI object size. When you start qemu, you don't need to specify additional command option. But when you create the VDI which doesn't have default object size with qemu-img command, you specify object_size option. If you want to create a VDI of 8MB object size, you need to specify following command option. # qemu-img create -o object_size=8M sheepdog:test1 100M In addition, when you don't specify qemu-img command option, a default value of sheepdog cluster is used for creating VDI. # qemu-img create sheepdog:test2 100M Signed-off-by: Teruaki Ishizaki <ishizaki.teruaki@lab.ntt.co.jp> Acked-by: Hitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-03-09 11:11:59 +01:00
Kevin Wolf	cd12bb567c	coroutine: Clean up qemu_coroutine_enter() qemu_coroutine_enter() is now the only user of coroutine_swap(). Both functions are short, so inline it. Also, using COROUTINE_YIELD is now even more confusing because this code is never called during qemu_coroutine_yield() any more. In fact, this value is never read back, so we can just introduce a new COROUTINE_ENTER which documents the purpose of the task switch better. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>	2015-03-09 11:11:59 +01:00
Fam Zheng	2e5b887cfc	block: Forbid bdrv_set_aio_context outside BQL Even if the caller has both the old and the new AioContext's, there can be a deadlock, due to the leading bdrv_drain_all. Suppose there are four io threads (A, B, A0, B0) with A and B owning a BDS for each (bs_a, bs_b); Now A wants to move bs_a to iothread A0, and B wants to move bs_b to B0, at the same time: iothread A iothread B -------------------------------------------------------------------------- aio_context_acquire(A0) /* OK / aio_context_acquire(B0) / OK / bdrv_set_aio_context(bs_a, A0) bdrv_set_aio_context(bs_b, B0) -> bdrv_drain_all() -> bdrv_drain_all() -> acquire A / OK / -> acquire A / blocked / -> acquire B / blocked */ -> acquire B ... ... Deadlock happens because A is waiting for B, and B is waiting for A. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <1423969591-23646-2-git-send-email-famz@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-02-27 14:43:45 +01:00
Max Reitz	c0191e763b	block: Remove "growable" from BDS Now that request clamping is done in the BlockBackend, the "growable" field can be removed from the BlockDriverState. All BDSs are now treated as being "growable" (that is, they are allowed to grow; they are not necessarily actually able to). Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 1423162705-32065-16-git-send-email-mreitz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-02-16 15:07:19 +00:00
Max Reitz	b65a5e12a4	block: Add Error parameter to bdrv_find_protocol() The argument given to bdrv_find_protocol() is just a file name, which makes it difficult for the caller to reconstruct what protocol bdrv_find_protocol() was hoping to find. This patch adds an Error parameter to that function to solve this issue. Suggested-by: Eric Blake <eblake@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 1423162705-32065-4-git-send-email-mreitz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-02-16 15:07:18 +00:00
Max Reitz	f53a829bb9	nbd: Drop BDS backpointer Before this patch, the "opaque" pointer in an NBD BDS points to a BDRVNBDState, which contains an NbdClientSession object, which in turn contains a pointer to the BDS. This pointer may become invalid due to bdrv_swap(), so drop it, and instead pass the BDS directly to the nbd-client.c functions which then retrieve the NbdClientSession object from there. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 1423256778-3340-2-git-send-email-mreitz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-02-16 14:36:03 +00:00
Markus Armbruster	4d2855a348	block: New bdrv_add_key(), convert monitor to use it Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 1422524221-8566-4-git-send-email-armbru@redhat.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2015-02-06 11:46:32 -05:00
Peter Lieven	75af1f34cd	block: introduce BDRV_REQUEST_MAX_SECTORS we check and adjust request sizes at several places with sometimes inconsistent checks or default values: INT_MAX INT_MAX >> BDRV_SECTOR_BITS UINT_MAX >> BDRV_SECTOR_BITS SIZE_MAX >> BDRV_SECTOR_BITS This patches introdocues a macro for the maximal allowed sectors per request and uses it at several places. Signed-off-by: Peter Lieven <pl@kamp.de> Reviewed-by: Denis V. Lunev <den@openvz.org> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-02-06 17:24:22 +01:00
Max Reitz	1ce52846d3	nbd: Improve error messages This patch makes use of the Error object for nbd_receive_negotiate() so that errors during negotiation look nicer. Furthermore, this patch adds an additional error message if the received magic was wrong, but would be correct for the other protocol version, respectively: So if an export name was specified, but the NBD server magic corresponds to an old handshake, this condition is explicitly signaled to the user, and vice versa. As these messages are now part of the "Could not open image" error message, additional filtering has to be employed in iotest 083, which this patch does as well. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-02-06 17:24:22 +01:00
Francesco Romani	e2462113b2	block: add event when disk usage exceeds threshold Managing applications, like oVirt (http://www.ovirt.org), make extensive use of thin-provisioned disk images. To let the guest run smoothly and be not unnecessarily paused, oVirt sets a disk usage threshold (so called 'high water mark') based on the occupation of the device, and automatically extends the image once the threshold is reached or exceeded. In order to detect the crossing of the threshold, oVirt has no choice but aggressively polling the QEMU monitor using the query-blockstats command. This lead to unnecessary system load, and is made even worse under scale: deployments with hundreds of VMs are no longer rare. To fix this, this patch adds: * A new monitor command `block-set-write-threshold', to set a mark for a given block device. * A new event `BLOCK_WRITE_THRESHOLD', to report if a block device usage exceeds the threshold. * A new `write_threshold' field into the `BlockDeviceInfo' structure, to report the configured threshold. This will allow the managing application to use smarter and more efficient monitoring, greatly reducing the need of polling. [Updated qemu-iotests 067 output to add the new 'write_threshold' property. --Stefan] [Changed g_assert_false() to !g_assert() to fix the build on older glib versions. --Kevin] Signed-off-by: Francesco Romani <fromani@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 1421068273-692-1-git-send-email-fromani@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-02-06 17:24:21 +01:00
Peter Lieven	f4564d53c6	block: add accounting for merged requests Signed-off-by: Peter Lieven <pl@kamp.de> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-02-06 17:24:21 +01:00
Jeff Cody	9a29e18f7d	block: update string sizes for filename,backing_file,exact_filename The string field entries 'filename', 'backing_file', and 'exact_filename' in the BlockDriverState struct are defined as 1024 bytes. However, many places that use these values accept a maximum of PATH_MAX bytes, so we have a mixture of 1024 byte and PATH_MAX byte allocations. This patch makes the BlockDriverStruct field string sizes match usage. This patch also does a few fixes related to the size that needs to happen now: * the block qapi driver is updated to use PATH_MAX bytes * the qcow and qcow2 drivers have an additional safety check * the block vvfat driver is updated to use PATH_MAX bytes for the size of backing_file, for systems where PATH_MAX is < 1024 bytes. * qemu-img uses PATH_MAX rather than 1024. These instances were not changed to be dynamically allocated, however, as the extra temporary 3K in stack usage for qemu-img does not seem worrisome. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-01-23 18:17:06 +01:00
Peter Maydell	b629a38a13	Mostly bugfixes and cleanups from qemu-devel. Yet another small patch from the record/replay series, and a few SCSI and i386 patches as well. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQEcBAABAgAGBQJUtjlCAAoJEL/70l94x66Dy5gH/0QIHoXVH/2wuA9apNK2/gBj 2U7g08QGKlc2wQGF4a48sQf523lSt5eirVxrwta0wmvFeznrdR84d4YGpolHM67A Q9Y5J2i+v1H6cfQH6ylq61QQ7rEC3+isa65wblLeMSCAb2W1CcV7avSKu4BSPZw2 jGr3jd2Ve7pOsULpPhiNsmmltYSeZc7sQBYc9C7fQEoxOGsNnRoKOUKPnIk1mJTc iYH480L1MnOL3enIz13K34lQofNRhJxJBLYKhYsBydQbOh0/Ls1eifOY4xEegXZ0 IUODy6c2pk+s/IUPARpBucKGKzDxdv0DLXDV60uGn5EsYT0CjCl9/sRs3bZvaQE= =eT8u -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging Mostly bugfixes and cleanups from qemu-devel. Yet another small patch from the record/replay series, and a few SCSI and i386 patches as well. # gpg: Signature made Wed 14 Jan 2015 09:39:14 GMT using RSA key ID 78C7AE83 # gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" # gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" # gpg: WARNING: This key is not certified with sufficiently trusted signatures! # gpg: It is not certain that the signature belongs to the owner. # Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1 # Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83 * remotes/bonzini/tags/for-upstream: cpus: consistently use QEMU_CLOCK_VIRTUAL_RT for icount_warp_rt timer qemu-timer: rename timer_init to timer_init_tl scsi: fix cancellation when I/O was completed but DMA was not. rules.mak: Fix module build hw/scsi/lsi53c895a: add support for additional diag / debug registers qemu-common.h: optimise muldiv64 if int128 is available target-i386: do not memcpy in and out of xmm_regs target-i386: fix movntsd on big-endian hosts vl.c: fix regression when reading memory size from config file vl: Don't silently change topology when all -smp options were set vl: fix max_cpus check vl: Avoid unnecessary 'if' nesting 9pfs: changed to use event_notifier instead of qemu_pipe vl.c: fix regression when reading machine type from config file char: restore stdio echo on resume from suspend. Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2015-01-14 18:02:47 +00:00
Paolo Bonzini	f186aa976b	qemu-timer: rename timer_init to timer_init_tl timer_init is not called that often. Free the name for an equivalent of timer_new. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-01-14 10:38:57 +01:00
Fam Zheng	bb00021de0	block: Split BLOCK_OP_TYPE_COMMIT to BLOCK_OP_TYPE_COMMIT_{SOURCE, TARGET} Like BLOCK_OP_TYPE_BACKUP_SOURCE and BLOCK_OP_TYPE_BACKUP_TARGET, block-commit involves two asymmetric devices. This change is not user-visible (yet), because commit only works with device names. But once we enable backing reference in blockdev-add, or specifying node-name in block-commit command, we don't want the user to start two commit jobs on the same backing chain, which will corrupt things because of the final bdrv_swap. Before we have per category blockers, splitting this type is still better. [Resolved virtio-blk dataplane conflict by replacing BLOCK_OP_TYPE_COMMIT with both BLOCK_OP_TYPE_COMMIT_{SOURCE, TARGET}. They are safe since the block job runs in the same AioContext as the dataplane IOThread. --Stefan] Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-01-13 13:43:29 +00:00
Paolo Bonzini	66552b894b	coroutine: drop qemu_coroutine_adjust_pool_size This is not needed anymore. The new TLS-based algorithm is adaptive. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 1417518350-6167-7-git-send-email-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-01-13 13:43:29 +00:00
Vladimir Sementsov-Ogievskiy	c4237dfa63	block: fix spoiling all dirty bitmaps by mirror and migration Mirror and migration use dirty bitmaps for their purposes, and since commit [block: per caller dirty bitmap] they use their own bitmaps, not the global one. But they use old functions bdrv_set_dirty and bdrv_reset_dirty, which change all dirty bitmaps. Named dirty bitmaps series by Fam and Snow are affected: mirroring and migration will spoil all (not related to this mirroring or migration) named dirty bitmaps. This patch fixes this by adding bdrv_set_dirty_bitmap and bdrv_reset_dirty_bitmap, which change concrete bitmap. Also, to prevent such mistakes in future, old functions bdrv_(set,reset)_dirty are made static, for internal block usage. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com> CC: John Snow <jsnow@redhat.com> CC: Fam Zheng <famz@redhat.com> CC: Denis V. Lunev <den@openvz.org> CC: Stefan Hajnoczi <stefanha@redhat.com> CC: Kevin Wolf <kwolf@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 1417081246-3593-1-git-send-email-vsementsov@parallels.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2015-01-13 11:47:56 +00:00
Max Reitz	9f07429e88	block: JSON filenames and relative backing files When using a relative backing file name, qemu needs to know the directory of the top image file. For JSON filenames, such a directory cannot be easily determined (e.g. how do you determine the directory of a qcow2 BDS directly on top of a quorum BDS?). Therefore, do not allow relative filenames for the backing file of BDSs only having a JSON filename. Furthermore, BDS::exact_filename should be used whenever possible. If BDS::filename is not equal to BDS::exact_filename, the former will always be a JSON object. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-01-13 11:47:56 +00:00
Max Reitz	0a82855a1a	block: Get full backing filename from string Introduce bdrv_get_full_backing_filename_from_filename(), a function which takes the name of the backed file and a potentially relative backing filename to produce the full (absolute) backing filename. Use this function from bdrv_get_full_backing_filename(). Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-01-13 11:47:56 +00:00
Stefan Hajnoczi	b5cf2c1b08	block: drop unused bdrv_clear_incoming_migration_all() prototype The bdrv_clear_incoming_migration_all() function has not existed since commit `7ea2d269cb` ("block/migration: Disable cache invalidate for incoming migration"). Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 1418212937-22222-1-git-send-email-stefanha@redhat.com	2014-12-12 16:55:16 +00:00
Max Reitz	5c98415b2a	vmdk: Fix error for JSON descriptor file names If vmdk blindly tries to use path_combine() using bs->file->filename as the base file name, this will result in a bad error message for JSON file names when calling bdrv_open(). It is better to only try bs->file->exact_filename; if that is empty, bs->file->filename will be useless for path_combine() and an error should be emitted (containing bs->file->filename because desc_file_path (which is bs->file->exact_filename) is empty). Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 1417615043-26174-2-git-send-email-mreitz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-12-12 13:14:10 +00:00
Max Reitz	5f535a941e	block: Make essential BlockDriver objects public There are some block drivers which are essential to QEMU and may not be removed: These are raw, file and qcow2 (as the default non-raw format). Make their BlockDriver objects public so they can be directly referenced throughout the block layer without needing to call bdrv_find_format() and having to deal with an error at runtime, while the real problem occurred during linking (where raw, file or qcow2 were not linked into qemu). Cc: qemu-stable@nongnu.org Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-12-10 10:31:19 +01:00
Kevin Wolf	38f3ef574b	raw: Prohibit dangerous writes for probed images If the user neglects to specify the image format, QEMU probes the image to guess it automatically, for convenience. Relying on format probing is insecure for raw images (CVE-2008-2004). If the guest writes a suitable header to the device, the next probe will recognize a format chosen by the guest. A malicious guest can abuse this to gain access to host files, e.g. by crafting a QCOW2 header with backing file /etc/shadow. Commit `1e72d3b` (April 2008) provided -drive parameter format to let users disable probing. Commit `f965509` (March 2009) extended QCOW2 to optionally store the backing file format, to let users disable backing file probing. QED has had a flag to suppress probing since the beginning (2010), set whenever a raw backing file is assigned. All of these additions that allow to avoid format probing have to be specified explicitly. The default still allows the attack. In order to fix this, commit `79368c8` (July 2010) put probed raw images in a restricted mode, in which they wouldn't be able to overwrite the first few bytes of the image so that they would identify as a different image. If a write to the first sector would write one of the signatures of another driver, qemu would instead zero out the first four bytes. This patch was later reverted in commit `8b33d9e` (September 2010) because it didn't get the handling of unaligned qiov members right. Today's block layer that is based on coroutines and has qiov utility functions makes it much easier to get this functionality right, so this patch implements it. The other differences of this patch to the old one are that it doesn't silently write something different than the guest requested by zeroing out some bytes (it fails the request instead) and that it doesn't maintain a list of signatures in the raw driver (it calls the usual probe function instead). Note that this change doesn't introduce new breakage for false positive cases where the guest legitimately writes data into the first sector that matches the signatures of an image format (e.g. for nested virt): These cases were broken before, only the failure mode changes from corruption after the next restart (when the wrong format is probed) to failing the problematic write request. Also note that like in the original patch, the restrictions only apply if the image format has been guessed by probing. Explicitly specifying a format allows guests to write anything they like. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 1416497234-29880-8-git-send-email-kwolf@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-12-10 10:31:13 +01:00
Kevin Wolf	7cddd3728e	block: Read only one sector for format probing The only image format driver that even potentially accesses anything after 512 bytes in its bdrv_probe() implementation is VMDK, which reads a plain-text descriptor file. In practice, the field it's looking for seems to come first and will be well within the first 512 bytes, too. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 1416497234-29880-7-git-send-email-kwolf@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-12-10 10:31:12 +01:00
Max Reitz	e140177d9c	nbd: Change external interface to BlockBackend Substitute BlockDriverState by BlockBackend in every globally visible function provided by nbd. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 1416309679-333-5-git-send-email-mreitz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-12-10 10:31:12 +01:00
Fam Zheng	20a9e77dfa	block: Add bdrv_get_node_name This returns the node name of a BDS. Remove the TODO comment and expect the callers to be explicit. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-12-10 10:25:29 +01:00
Fam Zheng	04df765ab4	block: Add bdrv_next_node Similar to bdrv_next, this traverses through graph_bdrv_states. Will be useful to enumerate all the named nodes. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-12-10 10:25:29 +01:00
Peter Maydell	776346cd63	trivial patches for 2014-11-11 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQEcBAABAgAGBQJUYh9vAAoJEL7lnXSkw9fbgPQH/065L5+SpaJR1Nte9Lz3N2s1 a6tGSI22yu85tKvYCdYjeoVHSkSTyR57FdTfUd2xc2QPj+J4sWXpA81KILBGTJUp NMpmLpWg4LOh8Ek4ViRgmFFdryzIFa4dT4gc1AcSAIAQ6jsgK1dM7m5kfncC3TN0 TUs248vJ2i/DaE0k8TOeJqxJTqInoFttlJEqG7RD+V5JznokE4zpFNXHDGx9BptE W2J38GJ/TKRPe9UrHMKZI1r6+ZBdXyE/CaqsNNKLJdqrHgSQuAyK/PS6dQbM4BLg M1qdP7Tp0wOlvv9qoEZMOEiUsi54XPqLgaLMbW74Yp5X459fqmLW2imy49pHXt8= =klsW -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/mjt/tags/pull-trivial-patches-2014-11-11' into staging trivial patches for 2014-11-11 # gpg: Signature made Tue 11 Nov 2014 14:38:39 GMT using RSA key ID A4C3D7DB # gpg: Good signature from "Michael Tokarev <mjt@tls.msk.ru>" # gpg: aka "Michael Tokarev <mjt@corpit.ru>" # gpg: aka "Michael Tokarev <mjt@debian.org>" * remotes/mjt/tags/pull-trivial-patches-2014-11-11: block: Fix comment for bdrv_co_get_block_status sysbus: Correct SYSTEM_BUS(obj) defines target-i386: cpu: keeping function parameters alignment on new line xen-hvm: Remove redundant variable 'xstate' coroutine-sigaltstack: Change jmp_buf to sigjmp_buf pc-bios: petalogix-s3adsp1800.dtb: Use 'xlnx, xps-ethernetlite-2.00.a' instead of 'xlnx, xps-ethernetlite-2.00.b' gdbstub: Add a missing case of signal number translation in gdbstub numa: make 'info numa' take into account hotplugged memory slirp/smbd: modify/set several parameters in generated smbd.conf qemu-doc.texi: fix typos in x509 examples icc_bus: fix typo ICC_BRIGDE -> ICC_BRIDGE Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2014-11-11 14:50:10 +00:00
Fam Zheng	705be728c0	block: Fix comment for bdrv_co_get_block_status It returns more information than binary, fix the comment. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2014-11-11 17:36:19 +03:00
Stefan Hajnoczi	5b98db0ad3	block: add bdrv_drain() Now that op blockers are in use, we can ensure that no other sources are generating I/O on a BlockDriverState. Therefore it is possible to drain requests for a single BDS. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 1413889440-32577-7-git-send-email-stefanha@redhat.com	2014-11-03 11:41:49 +00:00
Stefan Hajnoczi	dec7d421f8	blockjob: add block_job_defer_to_main_loop() Block jobs will run in the BlockDriverState's AioContext, which may not always be the QEMU main loop. There are some block layer APIs that are either not thread-safe or risk lock ordering problems. This includes bdrv_unref(), bdrv_close(), and anything that calls bdrv_drain_all(). The block_job_defer_to_main_loop() API allows a block job to schedule a function to run in the main loop with the BlockDriverState AioContext held. This function will be used to perform cleanup and backing chain manipulations in block jobs. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 1413889440-32577-6-git-send-email-stefanha@redhat.com	2014-11-03 11:41:49 +00:00
Max Reitz	7748543420	block: Add status callback to bdrv_amend_options() Depending on the changed options and the image format, bdrv_amend_options() may take a significant amount of time. In these cases, a way to be informed about the operation's status is desirable. Since the operation is rather complex and may fundamentally change the image, implementing it as AIO or a coroutine does not seem feasible. On the other hand, implementing it as a block job would be significantly more difficult than a simple callback and would not add benefits other than progress report to the amending operation, because it should not actually be run as a block job at all. A callback may not be very pretty, but it's very easy to implement and perfectly fits its purpose here. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Benoît Canet <benoit.canet@nodalink.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-id: 1414404776-4919-2-git-send-email-mreitz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-11-03 11:41:48 +00:00
Max Reitz	ef6dbf1e46	blockjob: Add "ready" field When a block job signals readiness, this is currently reported only through QMP. If qemu wants to use block jobs for internal tasks, there needs to be another way to correctly detect when a block job may be completed. For this reason, introduce a bool "ready" which is set when the block job may be completed. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-id: 1414159063-25977-6-git-send-email-mreitz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-11-03 11:41:48 +00:00
Max Reitz	345f9e1b04	blockjob: Introduce block_job_complete_sync() Implement block_job_complete_sync() by doing the exact same thing as block_job_cancel_sync() does, only with calling block_job_complete() instead of block_job_cancel(). Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 1414159063-25977-5-git-send-email-mreitz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-11-03 11:41:48 +00:00
Max Reitz	94054183da	qcow2: Optimize bdrv_make_empty() bdrv_make_empty() is currently only called if the current image represents an external snapshot that has been committed to its base image; it is therefore unlikely to have internal snapshots. In this case, bdrv_make_empty() can be greatly sped up by emptying the L1 and refcount table (while having the dirty flag set, which only works for compat=1.1) and creating a trivial refcount structure. If there are snapshots or for compat=0.10, fall back to the simple implementation (discard all clusters). [Applied s/clusters/cluster/ typo fix suggested by Eric Blake --Stefan] Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 1414159063-25977-4-git-send-email-mreitz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-11-03 11:41:48 +00:00
Peter Lieven	2647fab57d	BlockLimits: introduce max_transfer_length Signed-off-by: Peter Lieven <pl@kamp.de> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-11-03 09:48:41 +00:00
Max Reitz	9ebd844805	block: Add qemu_{,try_}blockalign0() These functions call their non-0-counterparts and then fill the allocated buffer with 0 (if the allocation has been successful). Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-10-23 15:34:01 +02:00
Markus Armbruster	a7f53e26a6	block: Lift device model API into BlockBackend Move device model attachment / detachment and the BlockDevOps device model callbacks and their wrappers from BlockDriverState to BlockBackend. Wrapper calls in block.c change from bdrv_dev_FOO_cb(bs, ...) to if (bs->blk) { bdrv_dev_FOO_cb(bs->blk, ...); } No change, because both bdrv_dev_change_media_cb() and bdrv_dev_resize_cb() do nothing when no device model is attached, and a device model can be attached only when bs->blk. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-10-20 14:03:50 +02:00
Markus Armbruster	d829a2115f	block/qapi: Convert qmp_query_block() to BlockBackend Much more command code needs conversion. I start with this one because it's using bdrv_dev_* functions, which I'm about to lift into BlockBackend. While there, give bdrv_query_info() internal linkage. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Benoît Canet <benoit.canet@nodalink.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-10-20 14:03:50 +02:00
Markus Armbruster	097310b53e	block: Rename BlockDriverCompletionFunc to BlockCompletionFunc I'll use it with block backends shortly, and the name is going to fit badly there. It's a block layer thing anyway, not just a block driver thing. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-10-20 13:41:27 +02:00
Markus Armbruster	7c84b1b831	block: Rename BlockDriverAIOCB* to BlockAIOCB* I'll use BlockDriverAIOCB with block backends shortly, and the name is going to fit badly there. It's a block layer thing anyway, not just a block driver thing. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-10-20 13:41:27 +02:00
Markus Armbruster	7f06d47eff	block: Merge BlockBackend and BlockDriverState name spaces BlockBackend's name space is separate only to keep the initial patches simple. Time to merge the two. Retain bdrv_find() and bdrv_get_device_name() for now, to keep this series manageable. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-10-20 13:41:26 +02:00
Markus Armbruster	bfb197e0d9	block: Eliminate BlockDriverState member device_name[] device_name[] can become non-empty only in bdrv_new_root() and bdrv_move_feature_fields(). The latter is used only to undo damage done by bdrv_swap(). The former is called only by blk_new_with_bs(). Therefore, when a BlockDriverState's device_name[] is non-empty, then it's been created with a BlockBackend, and vice versa. Furthermore, blk_new_with_bs() keeps the two names equal. Therefore, device_name[] is redundant. Eliminate it. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-10-20 13:41:26 +02:00
Markus Armbruster	fea68bb6e9	block: Eliminate bdrv_iterate(), use bdrv_next() Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Benoît Canet <benoit.canet@nodalink.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-10-20 13:41:26 +02:00
Markus Armbruster	7e7d56d9e0	block: Connect BlockBackend to BlockDriverState Convenience function blk_new_with_bs() creates a BlockBackend with its BlockDriverState. Callers have to unref both. The commit after next will relieve them of the need to unref the BlockDriverState. Complication: due to the silly way drive_del works, we need a way to hide a BlockBackend, just like bdrv_make_anon(). To emphasize its "special" status, give the function a suitably off-putting name: blk_hide_on_behalf_of_do_drive_del(). Unfortunately, hiding turns the BlockBackend's name into the empty string. Can't avoid that without breaking the blk->bs->device_name equals blk->name invariant. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-10-20 13:41:26 +02:00
Markus Armbruster	e4e9986b1c	block: Split bdrv_new_root() off bdrv_new() Creating an anonymous BDS can't fail. Make that obvious. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Benoît Canet <benoit.canet@nodalink.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-10-20 13:41:26 +02:00
Chrysostomos Nanakos	2f78e491d7	async: aio_context_new(): Handle event_notifier_init failure On a system with a low limit of open files the initialization of the event notifier could fail and QEMU exits without printing any error information to the user. The problem can be easily reproduced by enforcing a low limit of open files and start QEMU with enough I/O threads to hit this limit. The same problem raises, without the creation of I/O threads, while QEMU initializes the main event loop by enforcing an even lower limit of open files. This commit adds an error message on failure: # qemu [...] -object iothread,id=iothread0 -object iothread,id=iothread1 qemu: Failed to initialize event notifier: Too many open files in system Signed-off-by: Chrysostomos Nanakos <cnanakos@grnet.gr> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-09-22 11:39:48 +01:00
Fam Zheng	8007429a99	block: Rename qemu_aio_release -> qemu_aio_unref Suggested-by: Benoît Canet <benoit.canet@irqsave.net> Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-09-22 11:39:17 +01:00
Fam Zheng	ca5fd113b8	block: Drop AIOCBInfo.cancel Now that all the implementations are converted to asynchronous version and we can emulate synchronous cancellation with it. Let's drop the unused member. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-09-22 11:39:16 +01:00
Fam Zheng	02c50efe08	block: Add bdrv_aio_cancel_async This is the async version of bdrv_aio_cancel, which doesn't block the caller. It guarantees that the cb is called either before returning or some time later. bdrv_aio_cancel can base on bdrv_aio_cancel_async, later we can convert all .io_cancel implementations to .io_cancel_async, and the aio_poll is the common logic. In the end, .io_cancel can be dropped. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-09-22 11:38:58 +01:00
Fam Zheng	f197fe2b2c	block: Add refcnt in BlockDriverAIOCB This will be useful in synchronous cancel emulation with bdrv_aio_cancel_async. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-09-22 11:38:57 +01:00
Benoît Canet	5366d0c8bc	block: Make the block accounting functions operate on BlockAcctStats This is the next step for decoupling block accounting functions from BlockDriverState. In a future commit the BlockAcctStats structure will be moved from BlockDriverState to the device models structures. Note that bdrv_get_stats was introduced so device models can retrieve the BlockAcctStats structure of a BlockDriverState without being aware of it's layout. This function should go away when BlockAcctStats will be embedded in the device models structures. CC: Kevin Wolf <kwolf@redhat.com> CC: Stefan Hajnoczi <stefanha@redhat.com> CC: Keith Busch <keith.busch@intel.com> CC: Anthony Liguori <aliguori@amazon.com> CC: "Michael S. Tsirkin" <mst@redhat.com> CC: Paolo Bonzini <pbonzini@redhat.com> CC: Eric Blake <eblake@redhat.com> CC: Peter Maydell <peter.maydell@linaro.org> CC: Michael Tokarev <mjt@tls.msk.ru> CC: John Snow <jsnow@redhat.com> CC: Markus Armbruster <armbru@redhat.com> CC: Alexander Graf <agraf@suse.de> CC: Max Reitz <mreitz@redhat.com> Signed-off-by: Benoît Canet <benoit.canet@nodalink.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-09-10 10:41:29 +02:00
Benoît Canet	28298fd3d9	block: rename BlockAcctType members to start with BLOCK_ instead of BDRV_ The middle term goal is to move the BlockAcctStats structure in the device models. (Capturing I/O accounting statistics in the device models is good for billing) This patch make a small step in this direction by removing a reference to BDRV. CC: Kevin Wolf <kwolf@redhat.com> CC: Stefan Hajnoczi <stefanha@redhat.com> CC: Keith Busch <keith.busch@intel.com> CC: Anthony Liguori <aliguori@amazon.com> CC: "Michael S. Tsirkin" <mst@redhat.com> CC: Paolo Bonzini <pbonzini@redhat.com> CC: John Snow <jsnow@redhat.com> CC: Richard Henderson <rth@twiddle.net> CC: Markus Armbruster <armbru@redhat.com> CC: Alexander Graf <agraf@suse.de>i Signed-off-by: Benoît Canet <benoit.canet@nodalink.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-09-10 10:41:29 +02:00
Benoît Canet	5e5a94b605	block: Extract the block accounting code The plan is to add new accounting metrics (latency, invalid requests, failed requests, queue depth) and block.c is overpopulated so it will be better to work in a separate module. Moreover the long term plan is to have statistics in each of the BDS of the graph for metrology purpose; this means that the device model statistics must move from the topmost BDS to the device model. So we need to decouple the statistic code from BlockDriverState. This is another argument for the extraction of the code in a separate module. CC: Kevin Wolf <kwolf@redhat.com> CC: Stefan Hajnoczi <stefanha@redhat.com> CC: Max Reitz <mreitz@redhat.com> CC: Eric Blake <eblake@redhat.com> CC: Benoit Canet <benoit@irqsave.net> CC: Fam Zheng <famz@redhat.com> CC: Peter Crosthwaite <peter.crosthwaite@xilinx.com> CC: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Benoît Canet <benoit.canet@nodalink.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-09-10 10:41:29 +02:00
Benoît Canet	0ddd0ad96a	block: Extract the BlockAcctStats structure Extract the block accounting statistics into a structure so the block device models can hold them in the future. CC: Kevin Wolf <kwolf@redhat.com> CC: Stefan Hajnoczi <stefanha@redhat.com> CC: Max Reitz <mreitz@redhat.com> CC: Eric Blake <eblake@redhat.com> Signed-off-by: Benoît Canet <benoit.canet@nodalink.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-09-10 10:41:29 +02:00
Markus Armbruster	a31e69cf00	thread-pool: Drop unnecessary includes Dragging block_int.h into a header is not nice. Fortunately, this is the only offender. Signed-off-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-09-10 10:41:29 +02:00
Max Reitz	33384421b3	block: Add AIO context notifiers If a long-running operation on a BDS wants to always remain in the same AIO context, it somehow needs to keep track of the BDS changing its context. This adds a function for registering callbacks on a BDS which are called whenever the BDS is attached or detached from an AIO context. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-08-29 10:48:45 +01:00
Paolo Bonzini	b493317d34	aio-win32: add support for sockets Uses the same select/WSAEventSelect scheme as main-loop.c. WSAEventSelect() is edge-triggered, so it cannot be used directly, but it is still used as a way to exit from a blocking g_poll(). Before g_poll() is called, we poll sockets with a non-blocking select() to achieve the level-triggered semantics we require: if a socket is ready, the g_poll() is made non-blocking too. Based on a patch from Or Goshen. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-08-29 10:46:58 +01:00
Paolo Bonzini	a3462c6561	AioContext: introduce aio_prepare This will be used to implement socket polling on Windows. On Windows, select() and g_poll() are completely different; sockets are polled with select() before calling g_poll, and the g_poll must be nonblocking if select() says a socket is ready. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-08-29 10:46:58 +01:00
Paolo Bonzini	e4c7e2d12d	AioContext: export and use aio_dispatch So far, aio_poll's scheme was dispatch/poll/dispatch, where the first dispatch phase was used only in the GSource case in order to avoid a blocking poll. Earlier patches changed it to dispatch/prepare/poll/dispatch, where prepare is aio_compute_timeout. By making aio_dispatch public, we can remove the first dispatch phase altogether, so that both aio_poll and the GSource use the same prepare/poll/dispatch scheme. This patch breaks the invariant that aio_poll(..., true) will not block the first time it returns false. This used to be fundamental for qemu_aio_flush's implementation as "while (qemu_aio_wait()) {}" but no code in QEMU relies on this invariant anymore. The return value of aio_poll() is now comparable with that of g_main_context_iteration. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-08-29 10:46:58 +01:00
Paolo Bonzini	845ca10dd0	AioContext: take bottom halves into account when computing aio_poll timeout Right now, QEMU invokes aio_bh_poll before the "poll" phase of aio_poll. It is simpler to do it afterwards and skip the "poll" phase altogether when the OS-dependent parts of AioContext are invoked from GSource. This way, AioContext behaves more similarly when used as a GSource vs. when used as stand-alone. As a start, take bottom halves into account when computing the poll timeout. If a bottom half is ready, do a non-blocking poll. As a side effect, this makes idle bottom halves work with aio_poll; an improvement, but not really an important one since they are deprecated. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-08-29 10:46:58 +01:00
Fam Zheng	0b9caf9b31	coroutine: Drop co_sleep_ns block_job_sleep_ns is the only user. Since we are moving towards AioContext aware code, it's better to use the explicit version and drop the old one. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Benoît Canet <benoit.canet@nodalink.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-08-29 10:46:58 +01:00
Max Reitz	91af701412	block: Add bdrv_refresh_filename() Some block devices may not have a filename in their BDS; and for some, there may not even be a normal filename at all. To work around this, add a function which tries to construct a valid filename for the BDS.filename field. If a filename exists or a block driver is able to reconstruct a valid filename (which is placed in BDS.exact_filename), this can directly be used. If no filename can be constructed, we can still construct an options QDict which is then converted to a JSON object and prefixed with the "json:" pseudo protocol prefix. The QDict is placed in BDS.full_open_options. For most block drivers, this process can be done automatically; those that need special handling may define a .bdrv_refresh_filename() method to fill BDS.exact_filename and BDS.full_open_options themselves. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-08-20 14:31:56 +02:00
Kevin Wolf	7d2a35cc92	block: Introduce qemu_try_blockalign() This function returns NULL instead of aborting when an allocation fails. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Benoit Canet <benoit@irqsave.net>	2014-08-15 15:07:15 +02:00
Stefan Hajnoczi	ac2662a913	coroutine: make pool size dynamic Allow coroutine users to adjust the pool size. For example, if the guest has multiple emulated disk drives we should keep around more coroutines. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2014-08-15 15:07:14 +02:00
Markus Armbruster	65a9bb25d6	block: New bdrv_nb_sectors() A call to retrieve the image size converts between bytes and sectors several times: * BlockDriver method bdrv_getlength() returns bytes. * refresh_total_sectors() converts to sectors, rounding up, and stores in total_sectors. * bdrv_getlength() converts total_sectors back to bytes (now rounded up to a multiple of the sector size). * Callers wanting sectors rather bytes convert it right back. Example: bdrv_get_geometry(). bdrv_nb_sectors() provides a way to omit the last two conversions. It's exactly bdrv_getlength() with the conversion to bytes omitted. It's functionally like bdrv_get_geometry() without its odd error handling. Reimplement bdrv_getlength() and bdrv_get_geometry() on top of bdrv_nb_sectors(). The next patches will convert some users of bdrv_getlength() to bdrv_nb_sectors(). Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Benoit Canet <benoit@irqsave.net> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-08-15 15:07:12 +02:00
Kevin Wolf	3baca89139	block: Add Error argument to bdrv_refresh_limits() Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-07-18 13:18:43 +01:00
Paolo Bonzini	acfb23ad3d	AioContext: do not rely on aio_poll(ctx, true) result to end a loop Currently, whenever aio_poll(ctx, true) has completed all pending work it returns true and the next call to aio_poll(ctx, true) will not block. This invariant has its roots in qemu_aio_flush()'s implementation as "while (qemu_aio_wait()) {}". However, qemu_aio_flush() does not exist anymore and bdrv_drain_all() is implemented differently; and this invariant is complicated to maintain and subtly different from the return value of GMainLoop's g_main_context_iteration. All calls to aio_poll(ctx, true) except one are guarded by a while() loop checking for a request to be incomplete, or a BlockDriverState to be idle. The one remaining call (in iothread.c) uses this to delay the aio_context_release/acquire pair until the AioContext is quiescent, however: - we can do the same just by using non-blocking aio_poll, similar to how vl.c invokes main_loop_wait - it is buggy, because it does not ensure that the AioContext is released between an aio_notify and the next time the iothread goes to sleep. This leads to hangs when stopping the dataplane thread. In the end, these semantics are a bad match for the current users of AioContext. So modify that one exception in iothread.c, which also fixes the hangs, as well as the testcase so that it use the same idiom as the actual QEMU code. Reported-by: Christian Borntraeger <borntraeger@de.ibm.com> Tested-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-07-14 12:03:20 +02:00
Paolo Bonzini	0ceb849bd3	AioContext: speed up aio_notify In many cases, the call to event_notifier_set in aio_notify is unnecessary. In particular, if we are executing aio_dispatch, or if aio_poll is not blocking, we know that we will soon get to the next loop iteration (if necessary); the thread that hosts the AioContext's event loop does not need any nudging. The patch includes a Promela formal model that shows that this really works and does not need any further complication such as generation counts. It needs a memory barrier though. The generation counts are not needed because any change to ctx->dispatching after the memory barrier is okay for aio_notify. If it changes from zero to one, it is the right thing to skip event_notifier_set. If it changes from one to zero, the event_notifier_set is unnecessary but harmless. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-07-09 15:50:11 +02:00
Paolo Bonzini	87f68d3182	block: drop aio functions that operate on the main AioContext The main AioContext should be accessed explicitly via qemu_get_aio_context(). Most of the time, using it is not the right thing to do. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-07-09 15:50:11 +02:00
Ming Lei	448ad91db4	block: block: introduce APIs for submitting IO as a batch This patch introduces three APIs so that following patches can support queuing I/O requests and submitting them as a batch for improving I/O performance. Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Ming Lei <ming.lei@canonical.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-07-07 11:05:17 +02:00
Jeff Cody	54e2690090	block: extend block-commit to accept a string for the backing file On some image chains, QEMU may not always be able to resolve the filenames properly, when updating the backing file of an image after a block commit. For instance, certain relative pathnames may fail, or drives may have been specified originally by file descriptor (e.g. /dev/fd/???), or a relative protocol pathname may have been used. In these instances, QEMU may lack the information to be able to make the correct choice, but the user or management layer most likely does have that knowledge. With this extension to the block-commit api, the user is able to change the backing file of the overlay image as part of the block-commit operation. This allows the change to be 'safe', in the sense that if the attempt to write the overlay image metadata fails, then the block-commit operation returns failure, without disrupting the guest. If the commit top is the active layer, then specifying the backing file string will be treated as an error (there is no overlay image to modify in that case). If a backing file string is not specified in the command, the backing file string to use is determined in the same manner as it was previously. Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-07-01 10:47:01 +02:00
Jeff Cody	5a6684d2b9	block: add helper function to determine if a BDS is in a chain This is a small helper function, to determine if 'base' is in the chain of BlockDriverState 'top'. It returns true if it is in the chain, and false otherwise. If either argument is NULL, it will also return false. Reviewed-by: Benoit Canet <benoit@irqsave.net> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-07-01 10:47:01 +02:00
Chunyan Liu	4ab1559085	qemu-img create: add 'nocow' option Add 'nocow' option so that users could have a chance to set NOCOW flag to newly created files. It's useful on btrfs file system to enhance performance. Btrfs has low performance when hosting VM images, even more when the guest in those VM are also using btrfs as file system. One way to mitigate this bad performance is to turn off COW attributes on VM files. Generally, there are two ways to turn off NOCOW on btrfs: a) by mounting fs with nodatacow, then all newly created files will be NOCOW. b) per file. Add the NOCOW file attribute. It could only be done to empty or new files. This patch tries the second way, according to the option, it could add NOCOW per file. For most block drivers, since the create file step is in raw-posix.c, so we can do setting NOCOW flag ioctl in raw-posix.c only. But there are some exceptions, like block/vpc.c and block/vdi.c, they are creating file by calling qemu_open directly. For them, do the same setting NOCOW flag ioctl work in them separately. [Fixed up 082.out due to the new 'nocow' creation option --Stefan] Signed-off-by: Chunyan Liu <cyliu@suse.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-07-01 10:15:12 +02:00
Peter Maydell	8954000b9e	Merge remote-tracking branch 'remotes/bonzini/nbd-next' into staging * remotes/bonzini/nbd-next: nbd: Handle NBD_OPT_LIST option. nbd: Handle fixed new-style clients. nbd: Shutdown socket before closing. nbd: Don't validate from and len in NBD_CMD_DISC. nbd: Don't export a block device with no medium. Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2014-06-30 16:13:32 +01:00
Hani Benhabiles	32d7d2e068	nbd: Handle NBD_OPT_LIST option. Signed-off-by: Hani Benhabiles <kroosec@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-06-30 12:50:17 +02:00
Hani Benhabiles	f5076b5a75	nbd: Handle fixed new-style clients. When this flag is set, the server tells the client that it can send another option if the server received a request with an option that it doesn't understand instead of directly closing the connection. Also add link to the most up-to-date documentation. Signed-off-by: Hani Benhabiles <kroosec@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-06-30 12:50:17 +02:00
Peter Maydell	2d40fa6987	Block patches for 2.1.0-rc0 -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQIcBAABAgAGBQJTrbz4AAoJEH8JsnLIjy/WlIkP/RepIwS29f19i3B/idGzUdYW 9XJnVowRvpkUzDqUprrr7lPHMW/CwAswLNis9B1hZ59rx+tx4Hm/rZGARlqhSOSO ZMdW32GFW0SyC5PglFBwGQAk4U0FxwW5cJD6US7h3L4pACIdCkzFSNxehyfCMyU/ oJkjuAH4a2IQoQf/M7WMm5kPkrdpRp6ZgbQvJGHaR63cuulZDb7rbHMyG66MWH8P wahhFFPY1wOeMBiISxPbmcTus+AlfCffG5qPqq83OtaIuWzINTmWlpiFmtx+Aqwy HSvGnFJ4Rf7J6Fw8sdTsABdqUTc/gxDYmhAuftm/hsjD9MvPeuFSLPMPLfGg6aPR umKaeBOw8NoMTPgbxg403gxFTrHar+TidBu8KgZw5T189/oJSSpT2J53uHWazmd9 8USkcYQ7VHdFUQVXluLEzHMIWc7kf87ylQ8c9S1yCkNeWYxRZDZGgHEU49ov7FFU FnA0w+ZFyDkU8d5gryG+vxOeBDlmXD4UHa676gGlaYhs7YC/BY/JaMgqY4Fd6MMW dS5ibPjdtbxEZTh29eWEByMWpzuitr+iPPzsJEdC29LeIIj3XRQq/4FyiQ6EMAAO iOlcqE3tws0Ty8GEp78xsAYjaLuH3zmvOTa4aHUQ+K9kwpMPFSJKEcLkwPWWYRbs qR2ZL6M+95oQTYkYzv8i =Wwqx -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging Block patches for 2.1.0-rc0 # gpg: Signature made Fri 27 Jun 2014 19:50:32 BST using RSA key ID C88F2FD6 # gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" * remotes/kevin/tags/for-upstream: (47 commits) iotests: Fix 083 for out-of-tree builds iotests: Drop Python version from 065's Shebang iotests: Use $PYTHON for Python scripts iotests: Source common.env configure: Enable out-of-tree iotests iotests: Allow out-of-tree run block.c: Don't return success for bdrv_append_temp_snapshot() failure qemu-iotests: Add TestRepairQuorum to 041 to test drive-mirror node-name mode. block: Add replaces argument to drive-mirror blockjob: Fix recent BLOCK_JOB_ERROR regression blockjob: Fix recent BLOCK_JOB_READY regression virtio-blk: Rename complete_request_early to complete_request_vring virtio-blk: Unify {non-,}dataplane's request handlings virtio-blk: Schedule BH in the right context virtio-blk: Export request handling functions to dataplane virtio-blk: Make request completion function virtual block: acquire AioContext in qmp_query_blockstats() block: make bdrv_query_stats() static virtio-blk: Fix and clean up the in_sg and out_sg check virtio-blk: Fill in VirtIOBlockReq.out in dataplane code ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2014-06-29 15:24:54 +01:00
Chen Gang	6b8aeca574	block.c: Don't return success for bdrv_append_temp_snapshot() failure When failure occurs, 'ret' need be set, or may return 0 to indicate success. Previously, an error was set in errp, but 0 was returned anyway. So let bdrv_append_temp_snapshot() return an error code and use that for the bdrv_open() return value. Also, error_propagate() need be called only one time within a function. Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-06-27 20:00:00 +02:00
Benoît Canet	09158f00e0	block: Add replaces argument to drive-mirror drive-mirror will bdrv_swap the new BDS named node-name with the one pointed by replaces when the mirroring is finished. Signed-off-by: Benoit Canet <benoit@irqsave.net> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-06-27 20:00:00 +02:00
Stefan Hajnoczi	ac46821f2c	block: make bdrv_query_stats() static This function is only called from block/qapi.c. There is no need to keep it public. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com> Tested-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-06-27 18:19:57 +02:00
Wenchao Xia	2f44a08b3e	qapi event: clean up in callers This patch improves docs and address small issues in event callers. Signed-off-by: Wenchao Xia <wenchaoqemu@gmail.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>	2014-06-27 09:27:56 -04:00
Kevin Wolf	8ee79e707a	block: Catch backing files assigned to non-COW drivers Since we parse backing.* options to add a backing file from the command line when the driver didn't assign one, it has been possible to have a backing file for e.g. raw images (it just was never accessed). This is obvious nonsense and should be rejected. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2014-06-26 13:51:01 +02:00
Fam Zheng	dc71ce45de	blockjob: Add block_job_yield() This will unset busy flag and put coroutine to sleep, can be used to wait for QMP complete/cancel. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-06-26 12:12:22 +02:00
Wenchao Xia	bcada37b19	qapi event: convert other BLOCK_JOB events Since BLOCK_JOB_COMPLETED, BLOCK_JOB_CANCELLED, BLOCK_JOB_READY are related, convert them in one patch. The block_job_event_* functions are used to keep encapsulation of BlockJob structure. Signed-off-by: Wenchao Xia <wenchaoqemu@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>	2014-06-23 11:12:28 -04:00
Wenchao Xia	5a2d2cbd88	qapi event: convert BLOCK_IO_ERROR and BLOCK_JOB_ERROR Signed-off-by: Wenchao Xia <wenchaoqemu@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>	2014-06-23 11:12:27 -04:00
Wenchao Xia	a589569f2f	qapi: adjust existing defines In order to let event defines use existing types later, instead of redefine new ones, some old type defines for spice and vnc are changed, and BlockErrorAction is moved from block.h to qapi schema. Note that BlockErrorAction is not merged with BlockdevOnError. At this point, VncInfo is not made a child of VncBasicInfo, because VncBasicInfo has mandatory fields where VncInfo makes them optional. Signed-off-by: Wenchao Xia <wenchaoqemu@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>	2014-06-23 11:01:25 -04:00
Alexey Kardashevskiy	b9e77bc718	scsi: Print command name in debug This makes scsi_command_name() public. This makes use of scsi_command_name() in debug output for scsi-disk and spapr-vscsi host bus adapter. Before this, SCSI used to print hex numbers instead of human-friendly strings. This adds GET_EVENT_STATUS_NOTIFICATION and READ_DISC_INFORMATION to the list of SCSI commands supported by scsi_command_name(). Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-06-18 08:47:10 +02:00
Chunyan Liu	c282e1fdf7	cleanup QEMUOptionParameter Now that all backend drivers are using QemuOpts, remove all QEMUOptionParameter related codes. Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com> Signed-off-by: Chunyan Liu <cyliu@suse.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-06-16 17:23:21 +08:00
Chunyan Liu	83d0521a1e	change block layer to support both QemuOpts and QEMUOptionParamter Change block layer to support both QemuOpts and QEMUOptionParameter. After this patch, it will change backend drivers one by one. At the end, QEMUOptionParameter will be removed and only QemuOpts is kept. Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com> Signed-off-by: Chunyan Liu <cyliu@suse.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-06-16 17:23:20 +08:00
Fam Zheng	db519cba87	block: Move declaration of bdrv_get_aio_context to block.h block_int.h is for block layer and block drivers, other code shouldn't include it. But similar to bdrv_set_aio_context, bdrv_get_aio_context should also be accessible from outside of block layer. Move it. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-06-04 09:56:12 +02:00
Stefan Hajnoczi	76ef2cf549	raw-posix: drop raw_get_aio_fd() since it is no longer used virtio-blk data-plane now uses the QEMU block layer for I/O. We do not need raw_get_aio_fd() anymore. It was a layering violation anyway, so let's get rid of it. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-06-04 09:56:12 +02:00
Stefan Hajnoczi	dcd042282d	block: add bdrv_set_aio_context() Up until now all BlockDriverState instances have used the QEMU main loop for fd handlers, timers, and BHs. This is not scalable on SMP guests and hosts so we need to move to a model with multiple event loops on different host CPUs. bdrv_set_aio_context() assigns the AioContext event loop to use for a particular BlockDriverState. It first detaches the entire BlockDriverState graph from the current AioContext and then attaches to the new AioContext. This function will be used by virtio-blk data-plane to assign a BlockDriverState to its IOThread AioContext. Make bdrv_aio_set_context() public since data-plane should not include block_int.h. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-06-04 09:56:11 +02:00
Fam Zheng	826b6ca0b0	block: Add backing_blocker in BlockDriverState This makes use of op_blocker and blocks all the operations except for commit target, on each BlockDriverState->backing_hd. The asserts for op_blocker in bdrv_swap are removed because with this change, the target of block commit has at least the backing blocker of its child, so the assertion is not true. Callers should do their check. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-05-28 14:28:46 +02:00
Fam Zheng	8d24cce1e3	block: Add bdrv_set_backing_hd() This is the common but non-trivial steps to assign or change the backing_hd of BDS. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-05-28 14:28:46 +02:00
Fam Zheng	3718d8ab65	block: Replace in_use with operation blocker This drops BlockDriverState.in_use with op_blockers: - Call bdrv_op_block_all in place of bdrv_set_in_use(bs, 1). - Call bdrv_op_unblock_all in place of bdrv_set_in_use(bs, 0). - Check bdrv_op_is_blocked() in place of bdrv_in_use(bs). The specific types are used, e.g. in place of starting block backup, bdrv_op_is_blocked(bs, BLOCK_OP_TYPE_BACKUP, ...). There is one exception in block_job_create, where bdrv_op_blocker_is_empty() is used, because we don't know the operation type here. This doesn't matter because in a few commits away we will drop the check and move it to callers that _do_ know the type. - Check bdrv_op_blocker_is_empty() in place of assert(!bs->in_use). Note: there is only bdrv_op_block_all and bdrv_op_unblock_all callers at this moment. So although the checks are specific to op types, this changes can still be seen as identical logic with previously with in_use. The difference is error message are improved because of blocker error info. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-05-28 14:28:46 +02:00
Fam Zheng	fbe40ff780	block: Introduce op_blockers to BlockDriverState BlockDriverState.op_blockers is an array of lists with BLOCK_OP_TYPE_MAX elements. Each list is a list of blockers of an operation type (BlockOpType), that marks this BDS as currently blocked for a certain type of operation with reason errors stored in the list. The rule of usage is: * BDS user who wants to take an operation should check if there's any blocker of the type with bdrv_op_is_blocked(). * BDS user who wants to block certain types of operation, should call bdrv_op_block (or bdrv_op_block_all to block all types of operations, which is similar to the existing bdrv_set_in_use()). * A blocker is only referenced by op_blockers, so the lifecycle is managed by caller, and shouldn't be lost until unblock, so typically a caller does these: - Allocate a blocker with error_setg or similar, call bdrv_op_block() to block some operations. - Hold the blocker, do his job. - Unblock operations that it blocked, with the same reason pointer passed to bdrv_op_unblock(). - Release the blocker with error_free(). Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Benoit Canet <benoit@irqsave.net> Reviewed-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-05-28 14:28:46 +02:00
Fam Zheng	8574575f90	block: Add BlockOpType enum This adds the enum of all the operations that can be taken on a block device. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Benoit Canet <benoit@irqsave.net> Reviewed-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-05-28 14:28:46 +02:00
Peter Lieven	465bee1da8	block: optimize zero writes with bdrv_write_zeroes this patch tries to optimize zero write requests by automatically using bdrv_write_zeroes if it is supported by the format. This significantly speeds up file system initialization and should speed zero write test used to test backend storage performance. I ran the following 2 tests on my internal SSD with a 50G QCOW2 container and on an attached iSCSI storage. a) mkfs.ext4 -E lazy_itable_init=0,lazy_journal_init=0 /dev/vdX QCOW2 [off] [on] [unmap] ----- runtime: 14secs 1.1secs 1.1secs filesize: 937M 18M 18M iSCSI [off] [on] [unmap] ---- runtime: 9.3s 0.9s 0.9s b) dd if=/dev/zero of=/dev/vdX bs=1M oflag=direct QCOW2 [off] [on] [unmap] ----- runtime: 246secs 18secs 18secs filesize: 51G 192K 192K throughput: 203M/s 2.3G/s 2.3G/s iSCSI* [off] [on] [unmap] ---- runtime: 8mins 45secs 33secs throughput: 106M/s 1.2G/s 1.6G/s allocated: 100% 100% 0% * The storage was connected via an 1Gbit interface. It seems to internally handle writing zeroes via WRITESAME16 very fast. Signed-off-by: Peter Lieven <pl@kamp.de> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-05-19 13:42:27 +02:00
Kevin Wolf	e88ae2264d	block: Fix bdrv_is_allocated() for short backing files bdrv_is_allocated() shouldn't return true for sectors that are unallocated, but after the end of a short backing file, even though such sectors are (correctly) marked as containing zeros. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2014-05-19 11:36:48 +02:00
Kevin Wolf	b1e6fc0817	block: Fix open flags with BDRV_O_SNAPSHOT The immediately visible effect of this patch is that it fixes committing a temporary snapshot to its backing file. Previously, it would fail with a "permission denied" error because bdrv_inherited_flags() forced the backing file to be read-only, ignoring the r/w reopen of bdrv_commit(). The bigger problem this revealed is that the original open flags must actually only be applied to the temporary snapshot, and the original image file must be treated as a backing file of the temporary snapshot and get the right flags for that. Reported-by: Jan Kiszka <jan.kiszka@web.de> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-05-09 20:57:31 +02:00
Fam Zheng	85f49cad87	qemu-img: Convert by cluster size if target is compressed If target block driver forces compression, qemu-img convert needs to write by cluster size as well as "-c" option. Particularly, this applies for converting to VMDK streamOptimized format. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-05-09 13:32:16 +02:00
Kevin Wolf	8bfea15dda	block: Unlink temporary files in raw-posix/win32 Instead of having unlink() calls in the generic block layer, where we aren't even guarateed to have a file name, move them to those block drivers that are actually used and that always have a filename. Gets us rid of some #ifdefs as well. The patch also converts bs->is_temporary to a new BDRV_O_TEMPORARY open flag so that it is inherited in the protocol layer and the raw-posix and raw-win32 drivers can unlink the file. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2014-04-30 11:05:00 +02:00
Kevin Wolf	98522f63f4	block: Add errp to bdrv_new() This patch adds an errp parameter to bdrv_new() and updates all its callers. The next patches will make use of this in order to check for duplicate IDs. Most of the callers know that their ID is fine, so they can simply assert that there is no error. Behaviour doesn't change with this patch yet as bdrv_new() doesn't actually assign errors to errp. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2014-04-22 12:00:20 +02:00
Fam Zheng	b8afb520e4	block: Handle error of bdrv_getlength in bdrv_create_dirty_bitmap bdrv_getlength could fail, check the return value before using it. Return NULL and set errno if it fails. Callers are updated to handle the error case. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-04-22 11:57:02 +02:00
Kevin Wolf	b998875dcf	block: Fix snapshot=on for protocol parsed from filename Since commit `9fd3171a`, BDRV_O_SNAPSHOT uses an option QDict to specify the originally requested image as the backing file of the newly created temporary snapshot. This means that the filename is stored in "file.filename", which is an option that is not parsed for protocol names. Therefore things like -drive file=nbd:localhost:10809 were broken because it looked for a local file with the literal name 'nbd:localhost:10809'. This patch changes the way BDRV_O_SNAPSHOT works once again. We now open the originally requested image as normal, and then do a similar operation as for live snapshots to put the temporary snapshot on top. This way, both driver specific options and parsed filenames work. As a nice side effect, this results in code movement to factor bdrv_append_temp_snapshot() out. This is a good preparation for moving its call to drive_init() and friends eventually. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2014-04-04 19:35:51 +02:00
Kevin Wolf	5a8a30db47	block: Add error handling to bdrv_invalidate_cache() If it returns an error, the migrated VM will not be started, but qemu exits with an error message. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Benoit Canet <benoit@irqsave.net>	2014-03-19 09:39:41 +01:00
Stefan Hajnoczi	98563fc3ec	aio: add aio_context_acquire() and aio_context_release() It can be useful to run an AioContext from a thread which normally does not "own" the AioContext. For example, request draining can be implemented by acquiring the AioContext and looping aio_poll() until all requests have been completed. The following pattern should work: /* Event loop thread / while (running) { aio_context_acquire(ctx); aio_poll(ctx, true); aio_context_release(ctx); } / Another thread */ aio_context_acquire(ctx); bdrv_read(bs, 0x1000, buf, 1); aio_context_release(ctx); This patch implements aio_context_acquire() and aio_context_release(). Note that existing aio_poll() callers do not need to worry about acquiring and releasing - it is only needed when multiple threads will call aio_poll() on the same AioContext. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-03-13 14:42:24 +01:00
Benoît Canet	b5042a3622	block: Rewrite the snapshot authorization mechanism for block filters. This patch keep the recursive way of doing things but simplify it by giving two responsabilities to all block filters implementors. They will need to do two things: -Set the is_filter field of their block driver to true. -Implement the bdrv_recurse_is_first_non_filter method of their block driver like it is done on the Quorum block driver. (block/quorum.c) [Paolo Bonzini <pbonzini@redhat.com> pointed out that this patch changes the semantics of blkverify, which now recurses down both bs->file and s->test_file. -- Stefan] Reported-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Benoit Canet <benoit@irqsave.net> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-03-13 14:23:27 +01:00
Paolo Bonzini	537b41f501	nbd: move socket wrappers to qemu-nbd qemu-nbd is one of the few valid users of qerror_report_err. Move the error-reporting socket wrappers there. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-02-21 21:02:23 +01:00
Paolo Bonzini	c06b72781d	nbd: inline tcp_socket_incoming_spec into sole caller Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-02-21 21:02:22 +01:00
Paolo Bonzini	77e8b9ca64	nbd: correctly propagate errors Before: $ ./qemu-io-old qemu-io-old> open -r -o file.driver=nbd one of path and host must be specified. qemu-io-old: can't open device (null): Could not open image: Invalid argument $ ./qemu-io-old qemu-io-old> open -r -o file.driver=nbd,file.host=foo,file.path=bar path and host may not be used at the same time. qemu-io-old: can't open device (null): Could not open image: Invalid argument After: $ ./qemu-io qemu-io> open -r -o file.driver=nbd qemu-io: can't open device (null): one of path and host must be specified. $ ./qemu-io qemu-io> open -r -o file.driver=nbd,file.host=foo,file.path=bar qemu-io: can't open device (null): path and host may not be used at the same time. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-02-21 21:02:22 +01:00
Max Reitz	f7d9fd8c72	block: Remove bdrv_open_image()'s force_raw option This option is now unnecessary since specifying BDRV_O_PROTOCOL as flag will do exactly the same. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Benoit Canet <benoit@irqsave.net> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-02-21 21:02:22 +01:00
Max Reitz	2e40134bfd	block: Make bdrv_file_open() static Add the bdrv_open() option BDRV_O_PROTOCOL which results in passing the call to bdrv_file_open(). Additionally, make bdrv_file_open() static and therefore bdrv_open() the only way to call it. Consequently, all existing calls to bdrv_file_open() have to be adjusted to use bdrv_open() with the BDRV_O_PROTOCOL flag instead. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Benoit Canet <benoit@irqsave.net> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-02-21 21:02:22 +01:00
Max Reitz	ddf5636dc9	block: Add reference parameter to bdrv_open() Allow bdrv_open() to handle references to existing block devices just as bdrv_file_open() is already capable of. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-02-21 21:02:22 +01:00
Max Reitz	f67503e5bd	block: Change BDS parameter of bdrv_open() to ** Make bdrv_open() take a pointer to a BDS pointer, similarly to bdrv_file_open(). If a pointer to a NULL pointer is given, bdrv_open() will create a new BDS with an empty name; if the BDS pointer is not NULL, that existing BDS will be reused (in the same way as bdrv_open() already did). Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-02-21 21:02:21 +01:00
Kevin Wolf	9e1cb96d9a	qemu-iotests: Test pwritev RMW logic Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2014-01-24 17:40:25 +01:00
Kevin Wolf	8407d5d7e2	block: Make bdrv_pwrite() a bdrv_prwv_co() wrapper Instead of implementing the alignment adjustment here, use the now existing functionality of bdrv_co_do_pwritev(). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2014-01-24 17:40:03 +01:00
Kevin Wolf	6460440f34	block: Allow wait_serialising_requests() at any point We can only have a single wait_serialising_requests() call per request because otherwise we can run into deadlocks where requests are waiting for each other. The same is true when wait_serialising_requests() is not at the very beginning of a request, so that other requests can be issued between the start of the tracking and wait_serialising_requests(). Fix this by changing wait_serialising_requests() to ignore requests that are already (directly or indirectly) waiting for the calling request. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Benoit Canet <benoit@irqsave.net>	2014-01-24 17:40:02 +01:00
Kevin Wolf	7327145f63	block: Make overlap range for serialisation dynamic Copy on Read wants to serialise with all requests touching the same cluster, so wait_serialising_requests() rounded to cluster boundaries. Other users like alignment RMW will have different requirements, though (requests touching the same sector), so make it dynamic. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Benoit Canet <benoit@irqsave.net>	2014-01-24 17:40:02 +01:00
Kevin Wolf	2dbafdc012	block: Generalise and optimise COR serialisation Change the API so that specific requests can be marked serialising. Only these requests are checked for overlaps then. This means that during a Copy on Read operation, not all requests overlapping other requests are serialised any more, but only those that actually overlap with the specific COR request. Also remove COR from function and variable names because this functionality can be useful in other contexts. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Benoit Canet <benoit@irqsave.net>	2014-01-24 17:40:02 +01:00
Kevin Wolf	793ed47a7a	block: Switch BdrvTrackedRequest to byte granularity Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Benoit Canet <benoit@irqsave.net>	2014-01-24 17:40:02 +01:00
Paolo Bonzini	c25f53b06e	raw: Probe required direct I/O alignment Add a bs->request_alignment field that contains the required offset/length alignment for I/O requests and fill it in the raw block drivers. Use ioctls if possible, else see what alignment it takes for O_DIRECT to succeed. While at it, also expose the memory alignment requirements, which may be (and in practice are) different from the disk alignment requirements. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2014-01-24 17:40:02 +01:00
Paolo Bonzini	1b7fd72955	block: rename buffer_alignment to guest_block_size The alignment field is now set to the value that is promised to the guest, rather than required by the host. The next patches will make QEMU aware of the host-provided values, so make this clear. The alignment is also not about memory buffers, but about the sectors on the disk, change the documentation of the field. At this point, the field is set by the device emulation, but completely ignored by the block layer. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Benoit Canet <benoit@irqsave.net>	2014-01-24 17:40:01 +01:00
Kevin Wolf	339064d506	block: Don't use guest sector size for qemu_blockalign() bs->buffer_alignment is set by the device emulation and contains the logical block size of the guest device. This isn't something that the block layer should know, and even less something to use for determining the right alignment of buffers to be used for the host. The new BlockLimits field opt_mem_alignment tells the qemu block layer the optimal alignment to be used so that no bounce buffer must be used in the driver. This patch may change the buffer alignment from 4k to 512 for all callers that used qemu_blockalign() with the top-level image format BlockDriverState. The value was never propagated to other levels in the tree, so in particular raw-posix never required anything else than 512. While on disks with 4k sectors direct I/O requires a 4k alignment, memory may still be okay when aligned to 512 byte boundaries. This is what must have happened in practice, because otherwise this would already have failed earlier. Therefore I don't expect regressions even with this intermediate state. Later, raw-posix can implement the hook and expose a different memory alignment requirement. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2014-01-24 17:40:01 +01:00
Kevin Wolf	355ef4ac95	block: Update BlockLimits when they might have changed When reopening with different flags, or when backing files disappear from the chain, the limits may change. Make sure they get updated in these cases. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Benoît Canet <benoit@irqsave.net>	2014-01-24 17:40:01 +01:00
Kevin Wolf	d34682cd4a	block: Move initialisation of BlockLimits to bdrv_refresh_limits() This function separates filling the BlockLimits from bdrv_open(), which allows it to call it from other operations which may change the limits (e.g. modifications to the backing file chain or bdrv_reopen) Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Benoit Canet <benoit@irqsave.net>	2014-01-24 17:40:01 +01:00
Benoît Canet	212a5a8f09	block: Create authorizations mechanism for external snapshot and resize. Signed-off-by: Benoit Canet <benoit@irqsave.net> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-01-24 16:07:08 +01:00
Benoît Canet	12d3ba821d	qmp: Allow to change password on named block driver states. Signed-off-by: Benoit Canet <benoit@irqsave.net> Reviewed-by: Fam Zheng <famz@redhat.com> There was two candidate ways to implement named node manipulation: 1) { 'command': 'block_passwd', 'data': {'device': 'str', 'node-name': 'str', 'password': 'str'} } 2) { 'command': 'block_passwd', 'data': {'device': 'str', '*device-is-node': 'bool', 'password': 'str'} } Luiz proposed 1 and says 2 was an abuse of the QMP interface and proposed to rewrite the QMP block interface for 2.0. Luiz does not like in 1 the fact that 2 fields are optional but one of them must be specified leading to an abuse of the QMP semantic. Kevin argumented that 2 what a clear abuse of the device field and would not be practical when reading fast some log file because the user would read "device" and think that a device is manipulated when it's in fact a node name. Documentation of 1 make it pretty clear what to do for the user. Kevin argued that all bs are node including devices ones so 2 does not make sense. Kevin also argued that rewriting the QMP block interface would not make disapear the current one. Kevin pushed the argument that making the QAPI generator compatible with the semantic of the operation would need a rewrite that no one has done yet. A vote has been done on the list to elect the version to use and 1 won. For reference the complete thread is: "[Qemu-devel] [PATCH V4 4/7] qmp: Allow to change password on names block driver states." Signed-off-by: Benoit Canet <benoit@irqsave.net> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-01-24 16:07:08 +01:00
Benoît Canet	c13163fba1	qmp: Add QMP query-named-block-nodes to list the named BlockDriverState nodes. Signed-off-by: Benoit Canet <benoit@irqsave.net> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-01-24 16:07:08 +01:00
Benoît Canet	dc364f4cdc	block: Add bs->node_name to hold the name of a bs node of the bs graph. Add the minimum of code to prepare for the following patches. Signed-off-by: Benoit Canet <benoit@irqsave.net> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-01-24 14:33:01 +01:00
Max Reitz	da557aac18	block: Add bdrv_open_image() Add a common function for opening images to be used for block drivers specified through BlockdevRefs in an option QDict. The difference from bdrv_file_open() is that this function may invoke bdrv_open() instead, allowing auto-detection of the driver to be used; and second, it automatically extracts the BlockdevRef from the option QDict. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-01-22 12:07:18 +01:00
Max Reitz	72daa72eee	block: Allow reference for bdrv_file_open() Allow specifying a reference to an existing block device (by name) for bdrv_file_open() instead of a filename and/or options. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-01-22 12:07:17 +01:00
Fam Zheng	03544a6e9e	block: Add commit_active_start() commit_active_start is implemented in block/mirror.c, It will create a job with "commit" type and designated base in block-commit command. This will be used for committing active layer of device. Sync mode is removed from MirrorBlockJob because there's no proper type for commit. The used information is is_none_mode. The common part of mirror_start and commit_active_start is moved to mirror_start_job(). Fix the comment wording for commit_start. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-12-20 16:26:16 +01:00
Peter Lieven	7337acaf21	block: add opt_transfer_length to BlockLimits Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Peter Lieven <pl@kamp.de> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-12-05 11:45:24 +01:00
Wenchao Xia	8c116b0e41	qemu-nbd: support internal snapshot export Now it is possible to directly export an internal snapshot, which can be used to probe the snapshot's contents without qemu-img convert. Signed-off-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-12-04 15:19:00 +01:00
Wenchao Xia	7b4c4781e3	snapshot: distinguish id and name in load_tmp Since later this function will be used so improve it. The only caller of it now is qemu-img, and it is not impacted by introduce function bdrv_snapshot_load_tmp_by_id_or_name() that call bdrv_snapshot_load_tmp() twice to keep old search logic. bdrv_snapshot_load_tmp_by_id_or_name() return int to let caller know the errno, and errno will be used later. Also fix a typo in comments of bdrv_snapshot_delete(). Signed-off-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-12-04 15:19:00 +01:00
Paolo Bonzini	d5ef94d43d	block: add bdrv_aio_write_zeroes This will be used by the SCSI layer. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Peter Lieven <pl@kamp.de> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-12-03 15:26:49 +01:00
Paolo Bonzini	d20d9b7c67	block: add flags to BlockRequest This lets bdrv_co_do_rw receive flags, so that it can be used for zero writes. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Peter Lieven <pl@kamp.de> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-12-03 15:26:48 +01:00
Marc-André Lureau	7b6b145dbc	coroutine: remove unused CoQueue AioContext The AioContext ctx field is apparently unused in qemu codebase since `02ffb50448`. Signed-off-by: Marc-André Lureau <marcandre.lureau@gmail.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-12-02 17:11:49 +01:00
Marc-André Lureau	f287c41381	coroutine: remove qemu_co_queue_wait_insert_head qemu_co_queue_wait_insert_head() is unused in qemu code base now. Signed-off-by: Marc-André Lureau <marcandre.lureau@gmail.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-12-02 17:11:49 +01:00
Fam Zheng	4cc70e9337	blkdebug: add "remove_break" command This adds "remove_break" command which is the reverse of blkdebug command "break": it removes all breakpoints with given tag and resumes all the requests. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-11-29 13:40:37 +01:00
Liu Yuan	b3af018f3b	sheepdog: support user-defined redundancy option Sheepdog support two kinds of redundancy, full replication and erasure coding. # create a fully replicated vdi with x copies -o redundancy=x (1 <= x <= SD_MAX_COPIES) # create a erasure coded vdi with x data strips and y parity strips -o redundancy=x:y (x must be one of {2,4,8,16} and 1 <= y < SD_EC_MAX_STRIP) E.g, to convert a vdi into sheepdog vdi 'test' with 8:3 erasure coding scheme $ qemu-img convert -o redundancy=8:3 linux-0.2.img sheepdog:test Cc: Kevin Wolf <kwolf@redhat.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Liu Yuan <namei.unix@gmail.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-11-29 13:40:37 +01:00
Fam Zheng	21b5683508	qapi: Change BlockDirtyInfo to list We have multiple dirty bitmaps in BDS now, switch QAPI to allow query it (BlockInfo.dirty_bitmaps), and also drop old BlockInfo.dirty. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-11-29 13:40:36 +01:00
Fam Zheng	e4654d2d94	block: per caller dirty bitmap Previously a BlockDriverState has only one dirty bitmap, so only one caller (e.g. a block job) can keep track of writing. This changes the dirty bitmap to a list and creates a BdrvDirtyBitmap for each caller, the lifecycle is managed with these new functions: bdrv_create_dirty_bitmap bdrv_release_dirty_bitmap Where BdrvDirtyBitmap is a linked list wrapper structure of HBitmap. In place of bdrv_set_dirty_tracking, a BdrvDirtyBitmap pointer argument is added to these functions, since each caller has its own dirty bitmap: bdrv_get_dirty bdrv_dirty_iter_init bdrv_get_dirty_count bdrv_set_dirty and bdrv_reset_dirty prototypes are unchanged but will internally walk the list of all dirty bitmaps and set them one by one. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-11-29 13:40:33 +01:00
Peter Lieven	d75cbb5e68	block: introduce bdrv_make_zero this patch adds a call to completely zero out a block device. the operation is sped up by checking the block status and only writing zeroes to the device if they currently do not return zeroes. optionally the zero writing can be sped up by setting the flag BDRV_REQ_MAY_UNMAP to emulate the zero write by unmapping if the driver supports it. Signed-off-by: Peter Lieven <pl@kamp.de> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-11-28 10:30:52 +01:00
Peter Lieven	fe81c2cca6	block: add BlockLimits structure to BlockDriverState this patch adds BlockLimits which introduces discard and write_zeroes limits and alignment information to the BlockDriverState. Signed-off-by: Peter Lieven <pl@kamp.de> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-11-28 10:30:51 +01:00
Peter Lieven	4ce786914b	block: add wrappers for logical block provisioning information This adds 2 wrappers to read the unallocated_blocks_are_zero and can_write_zeroes_with_unmap info from the BDI. The wrappers are required to check for the existence of a backing_hd and if the devices are opened with the correct flags. Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Peter Lieven <pl@kamp.de> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-11-28 10:30:51 +01:00
Peter Lieven	e1a5c4bed4	block: add logical block provisioning info to BlockDriverInfo Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Peter Lieven <pl@kamp.de> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-11-28 10:30:51 +01:00
Peter Lieven	d32f35cbc5	block: introduce BDRV_REQ_MAY_UNMAP request flag Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Peter Lieven <pl@kamp.de> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-11-28 10:30:51 +01:00
Peter Lieven	aa7bfbfff7	block: add flags to bdrv_*_write_zeroes Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Peter Lieven <pl@kamp.de> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-11-28 10:30:51 +01:00
Peter Lieven	6faac15fa8	block: make BdrvRequestFlags public Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Peter Lieven <pl@kamp.de> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-11-28 10:30:51 +01:00
MORITA Kazutaka	3ab7bd1917	coroutine: add co_aio_sleep_ns() to allow sleep in block drivers This helper function behaves similarly to co_sleep_ns(), but the sleeping coroutine will be resumed when using qemu_aio_wait(). Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> Tested-by: Liu Yuan <namei.unix@gmail.com> Reviewed-by: Liu Yuan <namei.unix@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-10-30 12:22:09 +01:00
Kevin Wolf	b94a261057	block: Avoid unecessary drv->bdrv_getlength() calls The block layer generally keeps the size of an image cached in bs->total_sectors so that it doesn't have to perform expensive operations to get the size whenever it needs it. This doesn't work however when using a backend that can change its size without qemu being aware of it, i.e. passthrough of removable media like CD-ROMs or floppy disks. For this reason, the caching is disabled when a removable device is used. It is obvious that checking whether the _guest_ device has removable media isn't the right thing to do when we want to know whether the size of the host backend can change. To make things worse, non-top-level BlockDriverStates never have any device attached, which makes qemu assume they are removable, so drv->bdrv_getlength() is always called on the protocol layer. In the case of raw-posix, this causes unnecessary lseek() system calls, which turned out to be rather expensive. This patch completely changes the logic and disables bs->total_sectors caching only for certain block driver types, for which a size change is expected: host_cdrom and host_floppy on POSIX, host_device on win32; also the raw format in case it sits on top of one of these protocols, but in the common case the nested bdrv_getlength() call on the protocol driver will use the cache again and avoid an expensive drv->bdrv_getlength() call. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>	2013-10-29 13:10:26 +01:00
Benoît Canet	f6186f49e2	block: Add BlockDriver.bdrv_check_ext_snapshot. This field is used by blkverify to disable external snapshots creation. It will also be used by block filters like quorum to disable external snapshot creation. Signed-off-by: Benoit Canet <benoit@irqsave.net> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-10-11 16:49:59 +02:00
Peter Lieven	92bc50a5ad	block/get_block_status: avoid redundant callouts on raw devices if a raw device like an iscsi target or host device is used the current implementation makes a second call out to get the block status of bs->file. Signed-off-by: Peter Lieven <pl@kamp.de> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-10-11 16:49:59 +02:00
Max Reitz	a8d8ecb77f	block/qapi: Human-readable ImageInfoSpecific dump Add a function for generically dumping the ImageInfoSpecific information in a human-readable format to block/qapi.c. Use this function in bdrv_image_info_dump and qemu-io-cmds.c:info_f to allow qemu-img info resp. qemu-io -c info to print that format specific information. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-10-11 10:52:54 +02:00
Max Reitz	eae041fe6f	block: Add bdrv_get_specific_info Add a function for retrieving an ImageInfoSpecific object from a block driver. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-10-11 10:52:54 +02:00
Fam Zheng	79e14bf778	qapi: make use of new BlockJobType Switch the string to enum type BlockJobType in BlockJobDriver. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-10-11 10:52:54 +02:00
Fam Zheng	3fc4b10af0	blockjob: rename BlockJobType to BlockJobDriver We will use BlockJobType as the enum type name of block jobs in QAPI, rename current BlockJobType to BlockJobDriver, which will eventually become a set of operations, similar to block drivers. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-10-11 10:52:54 +02:00
Benoît Canet	030be32184	block: introduce BlockDriver.bdrv_needs_filename to enable some drivers. Some drivers will have driver specifics options but no filename. This new bool allow the block layer to treat them correctly. The .bdrv_needs_filename is set in drivers not having .bdrv_parse_filename and not having .bdrv_open. The first exception to this rule will be the quorum driver. Signed-off-by: Benoit Canet <benoit@irqsave.net> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-09-25 16:21:28 +02:00
Max Reitz	cc84d90ff5	block: Error parameter for create functions Add an Error ** parameter to bdrv_create and its associated functions to allow more specific error messages. Signed-off-by: Max Reitz <mreitz@redhat.com>	2013-09-12 10:12:48 +02:00
Max Reitz	34b5d2c68e	block: Error parameter for open functions Add an Error ** parameter to bdrv_open, bdrv_file_open and associated functions to allow more specific error messages. Signed-off-by: Max Reitz <mreitz@redhat.com>	2013-09-12 10:12:48 +02:00
Max Reitz	d5124c00d8	bdrv: Use "Error" for creating images Add an Error ** parameter to BlockDriver.bdrv_create to allow more specific error messages. Signed-off-by: Max Reitz <mreitz@redhat.com>	2013-09-12 10:12:48 +02:00
Max Reitz	015a1036a7	bdrv: Use "Error" for opening images Add an Error ** parameter to BlockDriver.bdrv_open and BlockDriver.bdrv_file_open to allow more specific error messages. Signed-off-by: Max Reitz <mreitz@redhat.com>	2013-09-12 10:12:47 +02:00
Wenchao Xia	a89d89d3e6	snapshot: distinguish id and name in snapshot delete Snapshot creation actually already distinguish id and name since it take a structured parameter sn, but delete can't. Later an accurate delete is needed in qmp_transaction abort and blockdev-snapshot-delete-sync, so change its prototype. Also errp is added to tip error, but return value is kepted to let caller check what kind of error happens. Existing caller for it are savevm, delvm and qemu-img, they are not impacted by introducing a new function bdrv_snapshot_delete_by_id_or_name(), which check the return value and do the operation again. Before this patch: For qcow2, it search id first then name to find the one to delete. For rbd, it search name. For sheepdog, it does nothing. After this patch: For qcow2, logic is the same by call it twice in caller. For rbd, it always fails in delete with id, but still search for name in second try, no change to user. Some code for *errp is based on Pavel's patch. Signed-off-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com> Signed-off-by: Pavel Hrdina <phrdina@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-09-12 10:12:47 +02:00
Wenchao Xia	2ea1dd758c	snapshot: new function bdrv_snapshot_find_by_id_and_name() To make it clear about id and name in searching, add this API to distinguish them. Caller can choose to search by id or name, *errp will be set only for exception. Some code are modified based on Pavel's patch. Signed-off-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com> Signed-off-by: Pavel Hrdina <phrdina@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-09-12 10:12:47 +02:00
Max Reitz	6f176b48f9	block: Image file option amendment This patch adds the "amend" option to qemu-img which allows changing image options on existing image files. It also adds the generic bdrv implementation which is basically just a wrapper for the image format specific function. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-09-12 10:12:46 +02:00
Paolo Bonzini	4333bb7140	block: define get_block_status return value Define the return value of get_block_status. Bits 0, 1, 2 and 9-62 are valid; bit 63 (the sign bit) is reserved for errors. Bits 3-8 are left for future extensions. The return code is compatible with the old is_allocated API: if a driver only returns 0 or 1 (aka BDRV_BLOCK_DATA) like is_allocated used to, clients of is_allocated will not have any change in behavior. Still, we will return more precise information in the next patches and the new definition of bdrv_is_allocated is already prepared for this. Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-09-06 15:25:09 +02:00
Paolo Bonzini	b6b8a33354	block: introduce bdrv_get_block_status API For now, bdrv_get_block_status is just another name for bdrv_is_allocated. The next patches will add more flags. This also touches all block drivers with a mostly mechanical rename. The sole exception is cow; because it calls cow_co_is_allocated from the read code, we keep that function and make cow_co_get_block_status a wrapper. Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-09-06 15:25:09 +02:00
Paolo Bonzini	4f5786376e	block: remove bdrv_is_allocated_above/bdrv_co_is_allocated_above distinction Now that bdrv_is_allocated detects coroutine context, the two can use the same code. Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-09-06 15:25:09 +02:00
Paolo Bonzini	bdad13b9de	block: make bdrv_co_is_allocated static bdrv_is_allocated can detect coroutine context and go through a fast path, similar to other block layer functions. Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-09-06 15:25:08 +02:00
Fam Zheng	4f6fd3491c	block: make bdrv_delete() static Manage BlockDriverState lifecycle with refcnt, so bdrv_delete() is no longer public and should be called by bdrv_unref() if refcnt is decreased to 0. This is an identical change because effectively, there's no multiple reference of BDS now: no caller of bdrv_ref() yet, only bdrv_new() sets bs->refcnt to 1, so all bdrv_unref() now actually delete the BDS. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-09-06 15:25:08 +02:00
Fam Zheng	9fcb025146	block: implement reference count for BlockDriverState Introduce bdrv_ref/bdrv_unref to manage the lifecycle of BlockDriverState. They are unused for now but will used to replace bdrv_delete() later. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-09-06 15:25:08 +02:00
Benoît Canet	cc0681c454	block: Enable the new throttling code in the block layer. Signed-off-by: Benoit Canet <benoit@irqsave.net> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-09-06 15:25:07 +02:00
Max Reitz	afa50193cd	qcow2-refcount: Repair shared refcount blocks If the refcount of a refcount block is greater than one, we can at least try to repair that problem by duplicating the affected block. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-09-02 10:06:59 +02:00
Alex Bligh	7483d1e547	aio / timers: convert block_job_sleep_ns and co_sleep_ns to new API Convert block_job_sleep_ns and co_sleep_ns to use the new timer API. Signed-off-by: Alex Bligh <alex@alex.org.uk> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-08-22 19:14:24 +02:00
Alex Bligh	4e29e8311a	aio / timers: Add aio_timer_init & aio_timer_new wrappers Add aio_timer_init and aio_timer_new wrapper functions. Signed-off-by: Alex Bligh <alex@alex.org.uk> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-08-22 19:10:28 +02:00
Alex Bligh	dae21b98b9	aio / timers: Add QEMUTimerListGroup to AioContext Add a QEMUTimerListGroup each AioContext (meaning a QEMUTimerList associated with each clock is added) and delete it when the AioContext is freed. Signed-off-by: Alex Bligh <alex@alex.org.uk> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-08-22 19:10:27 +02:00
Alex Bligh	6a1751b7aa	aio / timers: Untangle include files include/qemu/timer.h has no need to include main-loop.h and doing so causes an issue for the next patch. Unfortunately various files assume including timers.h will pull in main-loop.h. Untangle this mess. Signed-off-by: Alex Bligh <alex@alex.org.uk> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-08-22 19:10:27 +02:00
Asias He	0d51b4debe	block: Introduce bs->zero_beyond_eof In 4146b46c42e0989cb5842e04d88ab6ccb1713a48 (block: Produce zeros when protocols reading beyond end of file), we break qemu-iotests ./check -qcow2 022. This happens because qcow2 temporarily sets ->growable = 1 for vmstate accesses (which are stored beyond the end of regular image data). We introduce the bs->zero_beyond_eof to allow qcow2_load_vmstate() to disable ->zero_beyond_eof temporarily in addition to enable ->growable. [Since the broken patch "block: Produce zeros when protocols reading beyond end of file" has not been merged yet, I have applied this fix first and will then apply the next patch to keep the tree bisectable. -- Stefan] Suggested-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Asias He <asias@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-08-22 14:10:21 +02:00
Stefan Hajnoczi	f2e5dca46b	aio: drop io_flush argument The .io_flush() handler no longer exists and has no users. Drop the io_flush argument to aio_set_fd_handler() and related functions. The AioFlushEventNotifierHandler and AioFlushHandler typedefs are no longer used and are dropped too. Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-08-19 15:52:19 +02:00
Benoît Canet	b681a1c73e	block: Repair the throttling code. The throttling code was segfaulting since commit `02ffb50448` because some qemu_co_queue_next caller does not run in a coroutine. qemu_co_queue_do_restart assume that the caller is a coroutinne. As suggested by Stefan fix this by entering the coroutine directly. Also make sure like suggested that qemu_co_queue_next() and qemu_co_queue_restart_all() can be called only in coroutines. Signed-off-by: Benoit Canet <benoit@irqsave.net> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-07-29 17:07:37 +02:00
Ian Main	fc5d3f8432	Implement sync modes for drive-backup. This patch adds sync-modes to the drive-backup interface and implements the FULL, NONE and TOP modes of synchronization. FULL performs as before copying the entire contents of the drive while preserving the point-in-time using CoW. NONE only copies new writes to the target drive. TOP copies changes to the topmost drive image and preserves the point-in-time using CoW. For sync mode TOP are creating a new target image using the same backing file as the original disk image. Then any new data that has been laid on top of it since creation is copied in the main backup_run() loop. There is an extra check in the 'TOP' case so that we don't bother to copy all the data of the backing file as it already exists in the target. This is where the bdrv_co_is_allocated() is used to determine if the data exists in the topmost layer or below. Also any new data being written is intercepted via the write_notifier hook which ends up calling backup_do_cow() to copy old data out before it gets overwritten. For mode 'NONE' we create the new target image and only copy in the original data from the disk image starting from the time the call was made. This preserves the point in time data by only copying the parts that are going to change to the target image. This way we can reconstruct the final image by checking to see if the given block exists in the new target image first, and if it does not, you can get it from the original image. This is basically an optimization allowing you to do point-in-time snapshots with low overhead vs the 'FULL' version. Since there is no old data to copy out the loop in backup_run() for the NONE case just calls qemu_coroutine_yield() which only wakes up after an event (usually cancel in this case). The rest is handled by the before_write notifier which again calls backup_do_cow() to write out the old data so it can be preserved. Signed-off-by: Ian Main <imain@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-07-26 22:01:31 +02:00
Peter Lieven	4105eaaab9	block: add bdrv_write_zeroes() Signed-off-by: Peter Lieven <pl@kamp.de> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-07-19 12:29:21 +08:00
Liu Ping Fan	dcc772e2f2	QEMUBH: make AioContext's bh re-entrant BH will be used outside big lock, so introduce lock to protect between the writers, ie, bh's adders and deleter. The lock only affects the writers and bh's callback does not take this extra lock. Note that for the same AioContext, aio_bh_poll() can not run in parallel yet. Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-07-19 12:29:21 +08:00
Kevin Wolf	f0f0fdfeec	block: Add return value for bdrv_flush_all() bdrv_flush() can fail, and bdrv_flush_all() should return an error as well if this happens for a block device. It returns the first error return now, but still at least tries to flush the remaining devices even in error cases. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-07-15 09:51:27 +02:00
Kevin Wolf	98289620e0	block: Don't parse protocol from file.filename One of the major reasons for doing something new for -blockdev and blockdev-add was that the old block layer code parses filenames instead of just taking them literally. So we should really leave it untouched when it's passing using the new interfaces (like -drive file.filename=...). This allows opening relative file names that contain a colon. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2013-07-15 09:49:00 +02:00
Peter Lieven	3ac216270a	block: change default of .has_zero_init to 0 .has_zero_init defaults to 1 for all formats and protocols. this is a dangerous default since this means that all new added drivers need to manually overwrite it to 0 if they do not ensure that a device is zero initialized after bdrv_create(). if a driver needs to explicitly set this value to 1 its easier to verify the correctness in the review process. during review of the existing drivers it turned out that ssh and gluster had a wrong default of 1. both protocols support host_devices as backend which are not by default zero initialized. this wrong assumption will lead to possible corruption if qemu-img convert is used to write to such a backend. vpc and vmdk also defaulted to 1 altough they support fixed respectively flat extends. this has to be addresses in separate patches. both formats as well as the mentioned ssh and gluster are turned to the default of 0 with this patch for safety. a similar problem with the wrong default existed for iscsi most likely because the driver developer did oversee the default value of 1. Signed-off-by: Peter Lieven <pl@kamp.de> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-06-28 13:52:35 +02:00
Dietmar Maurer	98d2c6f2cd	block: add basic backup support to block driver backup_start() creates a block job that copies a point-in-time snapshot of a block device to a target block device. We call backup_do_cow() for each write during backup. That function reads the original data from the block device before it gets overwritten. The data is then written to the target device. Currently backup cluster size is hardcoded to 65536 bytes. [I made a number of changes to Dietmar's original patch and folded them in to make code review easy. Here is the full list: * Drop BackupDumpFunc interface in favor of a target block device * Detect zero clusters with buffer_is_zero() and use bdrv_co_write_zeroes() * Use 0 delay instead of 1us, like other block jobs * Unify creation/start functions into backup_start() * Simplify cleanup, free bitmap in backup_run() instead of cb * function * Use HBitmap to avoid duplicating bitmap code * Use bdrv_getlength() instead of accessing ->total_sectors * directly * Delete the backup.h header file, it is no longer necessary * Move ./backup.c to block/backup.c * Remove #ifdefed out code * Coding style and whitespace cleanups * Use bdrv_add_before_write_notifier() instead of blockjob-specific hooks * Keep our own in-flight CowRequest list instead of using block.c tracked requests. This means a little code duplication but is much simpler than trying to share the tracked requests list and use the backup block size. * Add on_source_error and on_target_error error handling. * Use trace events instead of DPRINTF() -- stefanha] Signed-off-by: Dietmar Maurer <dietmar@proxmox.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-06-28 09:20:26 +02:00

... 3 4 5 6 7 ...

501 Commits