mirrors/qemu - qemu - SynapseOS git

Author	SHA1	Message	Date
Kevin Wolf	e8eb863778	block: Make bdrv_is_whitelisted() public We'll use a separate source file for image creation, and we need to check there whether the requested driver is whitelisted. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2018-03-09 15:17:47 +01:00
Kevin Wolf	e1d74bc6c6	qcow2: Use BlockdevRef in qcow2_co_create() Instead of passing a separate BlockDriverState* into qcow2_co_create(), make use of the BlockdevRef that is included in BlockdevCreateOptions. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2018-03-09 15:17:47 +01:00
Paolo Bonzini	2fd6163884	block: convert bdrv_check callback to coroutine_fn Suggested-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <1516279431-30424-8-git-send-email-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-03-09 15:17:47 +01:00
Paolo Bonzini	2b148f392b	block: convert bdrv_invalidate_cache callback to coroutine_fn QED's bdrv_invalidate_cache implementation would like to reuse functions that acquire/release the metadata locks. Call it from coroutine context to simplify the logic. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <1516279431-30424-6-git-send-email-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-03-09 15:17:47 +01:00
Stefan Hajnoczi	b89d92f3cf	block: add aio_wait_bh_oneshot() Sometimes it's necessary for the main loop thread to run a BH in an IOThread and wait for its completion. This primitive is useful during startup/shutdown to synchronize and avoid race conditions. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 20180307144205.20619-2-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2018-03-08 17:38:51 +00:00
Peter Maydell	58e2e17dba	Block layer patches -----BEGIN PGP SIGNATURE----- iQIcBAABAgAGBQJanYJPAAoJEH8JsnLIjy/WxjUQAJA+DTOmGXvaNpMs65BrU79K /r/iGVrzHv/RMLmrWMnqj96W9SnpMuiAP9hVLNsekqClY9q4ME4DpGcXhWfhSvF5 FC51ehvFJdfo8cPorsevcqNj60iWebjcx3lFfUq2606UOyYih3oijYxr6gSwWbRc GAgdGMqsvGYpzgqAQVEWHUhaX0La49/OzY42aR+E+LCBNfTYvlydvyoc+tUTdIpW 1eM/ASGndGsN0Cf2vxlbKgJ0/P6v+cRZuuIDhKZqre+YG+yM+pq7yZb+o7nf/P36 TPR93BsT7FSVAizRK7VFRuPIynHpiaxYygrJERCXF0sxsV4OlKjpmt/uUPamWFh+ 46Jx2NK1AuAx87BdErgmA119ObO3oAPxK0+2p981obb6SphTbbPxDj6SOlYCt4mJ mhff4JtIiwCmDSckAwd2mkBI1Tvl9qqcELrpyd2t2eU4ec2vf7fPd85EsK/Mq6Kr dbfqFvjNaaMxChoqFgkHAveYJ7zYqRFI2IY5o9c1QyZehCGPWjScxHXZZYdpDl59 YF9DkYQDOyvEX2jmMECaO1r/0nnO+BqQHu5ItJuTte9rjP9Q0do3iBISiIefewtf yji6/QNn2hFrnr1HPAwLFFC3kPgc8Mq8mIUb53j8vG/01KhVRCcnJm2K6D4IUwLZ S6ZnQJB97eE4y7YR5dNt =2axz -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging Block layer patches # gpg: Signature made Mon 05 Mar 2018 17:45:51 GMT # gpg: using RSA key 7F09B272C88F2FD6 # gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" # Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6 * remotes/kevin/tags/for-upstream: (38 commits) block: Fix NULL dereference on empty drive error qcow2: Replace align_offset() with ROUND_UP() block/ssh: Add basic .bdrv_truncate() block/ssh: Make ssh_grow_file() blocking block/ssh: Pull ssh_grow_file() from ssh_create() qemu-img: Make resize error message more general qcow2: make qcow2_co_create2() a coroutine_fn block: rename .bdrv_create() to .bdrv_co_create_opts() Revert "IDE: Do not flush empty CDROM drives" block: test blk_aio_flush() with blk->root == NULL block: add BlockBackend->in_flight counter block: extract AIO_WAIT_WHILE() from BlockDriverState aio: rename aio_context_in_iothread() to in_aio_context_home_thread() docs: document how to use the l2-cache-entry-size parameter specs/qcow2: Fix documentation of the compressed cluster descriptor iotest 033: add misaligned write-zeroes test via truncate block: fix write with zero flag set and iovector provided block: Drop unused .bdrv_co_get_block_status() vvfat: Switch to .bdrv_co_block_status() vpc: Switch to .bdrv_co_block_status() ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org> # Conflicts: # include/block/block.h	2018-03-06 11:20:44 +00:00
Markus Armbruster	9af2398977	Include less of the generated modular QAPI headers In my "build everything" tree, a change to the types in qapi-schema.json triggers a recompile of about 4800 out of 5100 objects. The previous commit split up qmp-commands.h, qmp-event.h, qmp-visit.h, qapi-types.h. Each of these headers still includes all its shards. Reduce compile time by including just the shards we actually need. To illustrate the benefits: adding a type to qapi/migration.json now recompiles some 2300 instead of 4800 objects. The next commit will improve it further. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20180211093607.27351-24-armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> [eblake: rebase to master] Signed-off-by: Eric Blake <eblake@redhat.com>	2018-03-02 13:45:50 -06:00
Stefan Hajnoczi	efc75e2a4c	block: rename .bdrv_create() to .bdrv_co_create_opts() BlockDriver->bdrv_create() has been called from coroutine context since commit `5b7e1542cf` ("block: make bdrv_create adopt coroutine"). Make this explicit by renaming to .bdrv_co_create_opts() and add the coroutine_fn annotation. This makes it obvious to block driver authors that they may yield, use CoMutex, or other coroutine_fn APIs. bdrv_co_create is reserved for the QAPI-based version that Kevin is working on. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20170705102231.20711-2-stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-03-02 18:39:07 +01:00
Stefan Hajnoczi	7719f3c968	block: extract AIO_WAIT_WHILE() from BlockDriverState BlockDriverState has the BDRV_POLL_WHILE() macro to wait on event loop activity while a condition evaluates to true. This is used to implement synchronous operations where it acts as a condvar between the IOThread running the operation and the main loop waiting for the operation. It can also be called from the thread that owns the AioContext and in that case it's just a nested event loop. BlockBackend needs this behavior but doesn't always have a BlockDriverState it can use. This patch extracts BDRV_POLL_WHILE() into the AioWait abstraction, which can be used with AioContext and isn't tied to BlockDriverState anymore. This feature could be built directly into AioContext but then all users would kick the event loop even if they signal different conditions. Imagine an AioContext with many BlockDriverStates, each time a request completes any waiter would wake up and re-check their condition. It's nicer to keep a separate AioWait object for each condition instead. Please see "block/aio-wait.h" for details on the API. The name AIO_WAIT_WHILE() avoids the confusion between AIO_POLL_WHILE() and AioContext polling. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-03-02 18:39:07 +01:00
Stefan Hajnoczi	d2b63ba8dd	aio: rename aio_context_in_iothread() to in_aio_context_home_thread() The name aio_context_in_iothread() is misleading because it also returns true when called on the main AioContext from the main loop thread, which is not an IOThread. This patch renames it to in_aio_context_home_thread() and expands the doc comment to make the semantics clearer. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-03-02 18:39:07 +01:00
Eric Blake	636cb51258	block: Drop unused .bdrv_co_get_block_status() We are gradually moving away from sector-based interfaces, towards byte-based. Now that all drivers have been updated to provide the byte-based .bdrv_co_block_status(), we can delete the sector-based interface. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-03-02 18:39:07 +01:00
Eric Blake	3e4d0e72b7	block: Switch passthrough drivers to .bdrv_co_block_status() We are gradually moving away from sector-based interfaces, towards byte-based. Update the generic helpers, and all passthrough clients (blkdebug, commit, mirror, throttle) accordingly. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-03-02 18:39:07 +01:00
Eric Blake	86a3d5c688	block: Add .bdrv_co_block_status() callback We are gradually moving away from sector-based interfaces, towards byte-based. Now that the block layer exposes byte-based allocation, it's time to tackle the drivers. Add a new callback that operates on as small as byte boundaries. Subsequent patches will then update individual drivers, then finally remove .bdrv_co_get_block_status(). The new code also passes through the 'want_zero' hint, which will allow subsequent patches to further optimize callers that only care about how much of the image is allocated (want_zero is false), rather than full details about runs of zeroes and which offsets the allocation actually maps to (want_zero is true). As part of this effort, fix another part of the documentation: the claim in commit `4c41cb4` that BDRV_BLOCK_ALLOCATED is short for 'DATA \|\| ZERO' is a lie at the block layer (see commit `e88ae2264`), even though it is how the bit is computed from the driver layer. After all, there are intentionally cases where we return ZERO but not ALLOCATED at the block layer, when we know that a read sees zero because the backing file is too short. Note that the driver interface is thus slightly different than the public interface with regards to which bits will be set, and what guarantees are provided on input. We also add an assertion that any driver using the new callback will make progress (the only time pnum will be 0 is if the block layer already handled an out-of-bounds request, or if there is an error); the old driver interface did not provide this guarantee, which could lead to some inf-loops in drastic corner-case failures. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-03-02 18:39:07 +01:00
Vladimir Sementsov-Ogievskiy	25c146789f	nbd: BLOCK_STATUS constants Expose the new constants and structs that will be used by both server and client implementations of NBD_CMD_BLOCK_STATUS (the command is currently experimental at https://github.com/NetworkBlockDevice/nbd/blob/extension-blockstatus/doc/proto.md but will hopefully be stabilized soon). Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <1518702707-7077-4-git-send-email-vsementsov@virtuozzo.com> [eblake: split from larger patch on server implementation] Signed-off-by: Eric Blake <eblake@redhat.com>	2018-03-01 14:05:24 -06:00
Vladimir Sementsov-Ogievskiy	6bc8695725	nbd: change indenting in nbd.h Prepared indenting for the following patch. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <1518702707-7077-3-git-send-email-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2018-03-01 14:04:45 -06:00
Vladimir Sementsov-Ogievskiy	3e99da5e76	block: maintain persistent disabled bitmaps To maintain load/store disabled bitmap there is new approach: - deprecate @autoload flag of block-dirty-bitmap-add, make it ignored - store enabled bitmaps as "auto" to qcow2 - store disabled bitmaps without "auto" flag to qcow2 - on qcow2 open load "auto" bitmaps as enabled and others as disabled (except in_use bitmaps) Also, adjust iotests 165 and 176 appropriately. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20180202160752.143796-1-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2018-02-13 16:59:58 +01:00
Eric Blake	e24d813b29	block: Simplify bdrv_can_write_zeroes_with_unmap() We don't need the can_write_zeroes_with_unmap field in BlockDriverInfo, because it is redundant information with supported_zero_flags & BDRV_REQ_MAY_UNMAP. Note that BlockDriverInfo and supported_zero_flags are both per-device settings, rather than global state about the driver as a whole, which means one or both of these bits of information can already be conditional. Let's audit how they were set: crypto: always setting can_write_ to false is pointless (the struct starts life zero-initialized), no use of supported_ nbd: just recently fixed to set can_write_ if supported_ includes MAY_UNMAP (thus this commit effectively reverts bca80059e and solves the problem mentioned there in a more global way) file-posix, iscsi, qcow2: can_write_ is conditional, while supported_ was unconditional; but passing MAY_UNMAP would fail with ENOTSUP if the condition wasn't met qed: can_write_ is unconditional, but pwrite_zeroes lacks support for MAY_UNMAP and supported_ is not set. Perhaps support can be added later (since it would be similar to qcow2), but for now claiming false is no real loss all other drivers: can_write_ is not set, and supported_ is either unset or a passthrough Simplify the code by moving the conditional into supported_zero_flags for all drivers, then dropping the now-unused BDI field. For callers that relied on bdrv_can_write_zeroes_with_unmap(), we return the same per-device settings for drivers that had conditions (no observable change in behavior there); and can now return true (instead of false) for drivers that support passthrough (for example, the commit driver) which gives those drivers the same fix as nbd just got in bca80059e. For callers that relied on supported_zero_flags, we now have a few more places that can avoid a wasted call to pwrite_zeroes() that will just fail with ENOTSUP. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20180126193439.20219-1-eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2018-02-09 12:32:44 -06:00
Markus Armbruster	922a01a013	Move include qemu/option.h from qemu-common.h to actual users qemu-common.h includes qemu/option.h, but most places that include the former don't actually need the latter. Drop the include, and add it to the places that actually need it. While there, drop superfluous includes of both headers, and separate #include from file comment with a blank line. This cleanup makes the number of objects depending on qemu/option.h drop from 4545 (out of 4743) to 284 in my "build everything" tree. Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20180201111846.21846-20-armbru@redhat.com> [Semantic conflict with commit `bdd6a90a9e` in block/nvme.c resolved]	2018-02-09 13:52:16 +01:00
Markus Armbruster	452fcdbc49	Include qapi/qmp/qdict.h exactly where needed This cleanup makes the number of objects depending on qapi/qmp/qdict.h drop from 4550 (out of 4743) to 368 in my "build everything" tree. For qapi/qmp/qobject.h, the number drops from 4552 to 390. While there, separate #include from file comment with a blank line. Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20180201111846.21846-13-armbru@redhat.com>	2018-02-09 13:52:15 +01:00
Markus Armbruster	5ee9d2fe9e	Include qapi/qmp/qobject.h exactly where needed Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20180201111846.21846-11-armbru@redhat.com>	2018-02-09 13:52:15 +01:00
Markus Armbruster	522ece32d2	Drop superfluous includes of qapi-types.h and test-qapi-types.h Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20180201111846.21846-4-armbru@redhat.com>	2018-02-09 05:05:11 +01:00
Fam Zheng	a3d9a352d4	block: Move NVMe constants to a separate header Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20180116060901.17413-8-famz@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>	2018-02-08 09:22:03 +08:00
Fam Zheng	23d0ba9319	block: Introduce buf register API Allow block driver to map and unmap a buffer for later I/O, as a performance hint. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20180116060901.17413-5-famz@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>	2018-02-08 09:22:03 +08:00
Vladimir Sementsov-Ogievskiy	a3b0dc7582	qapi: add nbd-server-remove Add command for removing an export. It is needed for cases when we don't want to keep the export after the operation on it was completed. The other example is a temporary node, created with blockdev-add. If we want to delete it we should firstly remove any corresponding NBD export. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20180119135719.24745-3-vsementsov@virtuozzo.com> [eblake: drop dead nb_clients code] Signed-off-by: Eric Blake <eblake@redhat.com>	2018-01-26 09:37:20 -06:00
Vladimir Sementsov-Ogievskiy	420a4e9559	nbd: rename nbd_option and nbd_opt_reply Rename nbd_option and nbd_opt_reply to NBDOption and NBDOptionReply to correspond to Qemu coding style and other structures here. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20171122101958.17065-5-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2018-01-10 12:11:23 -06:00
Kevin Wolf	d736f119da	block: Allow graph changes in subtree drained section We need to remember how many of the drain sections in which a node is were recursive (i.e. subtree drain rather than node drain), so that they can be correctly applied when children are added or removed during the drained section. With this change, it is safe to modify the graph even inside a bdrv_subtree_drained_begin/end() section. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-12-22 15:05:32 +01:00
Kevin Wolf	b016558590	block: Add bdrv_subtree_drained_begin/end() bdrv_drained_begin() waits for the completion of requests in the whole subtree, but it only actually keeps its immediate bs parameter quiesced until bdrv_drained_end(). Add a version that keeps the whole subtree drained. As of this commit, graph changes cannot be allowed during a subtree drained section, but this will be fixed soon. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-12-22 15:05:32 +01:00
Kevin Wolf	0152bf400f	block: Don't notify parents in drain call chain This is in preparation for subtree drains, i.e. drained sections that affect not only a single node, but recursively all child nodes, too. Calling the parent callbacks for drain is pointless when we just came from that parent node recursively and leads to multiple increases of bs->quiesce_counter in a single drain call. Don't do it. In order for this to work correctly, the parent callback must be called for every bdrv_drain_begin/end() call, not only for the outermost one: If we have a node N with two parents A and B, recursive draining of A should cause the quiesce_counter of B to increase because its child N is drained independently of B. If now B is recursively drained, too, A must increase its quiesce_counter because N is drained independently of A only now, even if N is going from quiesce_counter 1 to 2. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-12-22 15:05:32 +01:00
Fam Zheng	8e77e0bceb	block: Remove unused bdrv_requests_pending Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-12-22 15:05:31 +01:00
Vladimir Sementsov-Ogievskiy	56207df55e	hbitmap: add next_zero function The function searches for next zero bit. Also add interface for BdrvDirtyBitmap and unit test. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20171012135313.227864-2-vsementsov@virtuozzo.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2017-12-18 10:54:13 -05:00
Paolo Bonzini	fc24908e7d	blockjob: reimplement block_job_sleep_ns to allow cancellation This reverts the effects of commit `4afeffc857` ("blockjob: do not allow coroutine double entry or entry-after-completion", 2017-11-21) This fixed the symptom of a bug rather than the root cause. Canceling the wait on a sleeping blockjob coroutine is generally fine, we just need to make it work correctly across AioContexts. To do so, use a QEMUTimer that calls block_job_enter. Use a mutex to ensure that block_job_enter synchronizes correctly with block_job_sleep_ns. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Tested-By: Jeff Cody <jcody@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-11-29 15:26:21 +01:00
Paolo Bonzini	5bf1d5a73a	blockjob: remove clock argument from block_job_sleep_ns All callers are using QEMU_CLOCK_REALTIME, and it will not be possible to support more than one clock when block_job_sleep_ns switches to a single timer stored in the BlockJob struct. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Tested-By: Jeff Cody <jcody@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-11-29 15:11:02 +01:00
Peter Maydell	64807cd779	-----BEGIN PGP SIGNATURE----- iQIcBAABAgAGBQJaFFvtAAoJEL2+eyfA3jBXHgoP/2Hw6ksjPOUKwuRvOWJYvS27 Vfr5S5yjayS3fB+6mSZJ4cX77ais0d7kyQgPyRfOI7ecMsgQXphYSJlf1ROM5VGY 4Jn3d/jrJ1ZiN3bdQyFH4x0T8+/UuHILc5Hu0C0Jekk1sxGI/8j+k8J9pDVZ7j/8 BvYLMqpnV+W1tGAXUlu8SwB3JqAjy8orO05bd9zN6Q0zPrrJuW+4evijHrB/LJwG HPSttN7khlCt+bPHSx1qDNKYr2lVE2RmaS6Nk3KS/lJgnmEerGCPTbLpno7IdQGu MJ+Lq2FzFNRJ2Il/7bqGio2KN3CmWlm2Kw1n6DuvjUjYXv1m4RlQ8x6EIZCcicN1 sDARn3qm6in3v85WV0uav1ARyqn3XM2YDhDSx3VBQcvgm9pgQYuJ6ztOx3tfVJPz 9T2R272CvOnlOFHyi5C9vZ1XEo9eJMgUNLVXPud+oMAjPodpOv04tK2IsoNOYfk0 OX2aVuV/AvsLMD3xol3bcIOFNhVrv/ePV7J5n8TLZxOpJfbPLYkgC/gTTMkOLce3 bO+W8YCp7GMPCmba6wx7x1frZOY5yg4gA4MVisN7vnnggr1LUU0AvkNw2CDZWj/m K6ZDJwTuBO1OjAUKURHxrRlcnqILaNeR3T7dS+7fS5tpkxxDkZlGZgBmJhw4/21K LvGpqDNcljvowdlKPKgB =s4RK -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/cody/tags/block-pull-request' into staging # gpg: Signature made Tue 21 Nov 2017 17:01:33 GMT # gpg: using RSA key 0xBDBE7B27C0DE3057 # gpg: Good signature from "Jeffrey Cody <jcody@redhat.com>" # gpg: aka "Jeffrey Cody <jeff@codyprime.org>" # gpg: aka "Jeffrey Cody <codyprime@gmail.com>" # Primary key fingerprint: 9957 4B4D 3474 90E7 9D98 D624 BDBE 7B27 C0DE 3057 * remotes/cody/tags/block-pull-request: qemu-iotest: add test for blockjob coroutine race condition qemu-iotests: add option in common.qemu for mismatch only coroutine: abort if we try to schedule or enter a pending coroutine blockjob: do not allow coroutine double entry or entry-after-completion Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2017-11-21 17:05:49 +00:00
Jeff Cody	4afeffc857	blockjob: do not allow coroutine double entry or entry-after-completion When block_job_sleep_ns() is called, the co-routine is scheduled for future execution. If we allow the job to be re-entered prior to the scheduled time, we present a race condition in which a coroutine can be entered recursively, or even entered after the coroutine is deleted. The job->busy flag is used by blockjobs when a coroutine is busy executing. The function 'block_job_enter()' obeys the busy flag, and will not enter a coroutine if set. If we sleep a job, we need to leave the busy flag set, so that subsequent calls to block_job_enter() are prevented. This changes the prior behavior of block_job_cancel() being able to immediately wake up and cancel a job; in practice, this should not be an issue, as the coroutine sleep times are generally very small, and the cancel will occur the next time the coroutine wakes up. This fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1508708 Signed-off-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-11-21 11:51:18 -05:00
Kevin Wolf	2b624fe079	block: Add errp to bdrv_all_goto_snapshot() Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: John Snow <jsnow@redhat.com>	2017-11-21 14:48:22 +01:00
Kevin Wolf	0b62bcbc61	block: Add errp to bdrv_snapshot_goto() Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: John Snow <jsnow@redhat.com>	2017-11-21 14:48:22 +01:00
Max Reitz	5e003f17ec	block: Make bdrv_next() keep strong references On one hand, it is a good idea for bdrv_next() to return a strong reference because ideally nearly every pointer should be refcounted. This fixes intermittent failure of iotest 194. On the other, it is absolutely necessary for bdrv_next() itself to keep a strong reference to both the BB (in its first phase) and the BDS (at least in the second phase) because when called the next time, it will dereference those objects to get a link to the next one. Therefore, it needs these objects to stay around until then. Just storing the pointer to the next in the iterator is not really viable because that pointer might become invalid as well. Both arguments taken together means we should probably just invoke bdrv_ref() and blk_ref() in bdrv_next(). This means we have to assert that bdrv_next() is always called from the main loop, but that was probably necessary already before this patch and judging from the callers, it also looks to actually be the case. Keeping these strong references means however that callers need to give them up if they decide to abort the iteration early. They can do so through the new bdrv_next_cleanup() function. Suggested-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20171110172545.32609-1-mreitz@redhat.com Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-11-17 18:21:31 +01:00
Eric Blake	efdc0c103d	nbd: Fix struct name for structured reads A closer read of the NBD spec shows that a structured reply chunk for a hole is not quite identical to the prefix of a data chunk, because the hole has to also send a 32-bit size field. Although we do not yet send holes, we should fix the misleading information in our header and make it easier for a future patch to support sparse reads. Messed up in commit `bae245d1`. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20171108215703.9295-5-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>	2017-11-09 10:17:12 -06:00
Vladimir Sementsov-Ogievskiy	f140e30003	nbd: Minimal structured read for client Minimal implementation: for structured error only error_report error message. Note that test 83 is now more verbose, because the implementation prints more warnings about unexpected communication errors; perhaps future patches should tone things down by using trace messages instead of traces, but the common case of successful communication is no noisier than before. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20171027104037.8319-13-eblake@redhat.com>	2017-10-30 21:48:41 +01:00
Eric Blake	56dc682bf5	nbd: Move nbd_read() to common header An upcoming change to block/nbd-client.c will want to read the tail of a structured reply chunk directly from the wire. Move this function to make it easier. Based on a patch from Vladimir Sementsov-Ogievskiy. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20171027104037.8319-12-eblake@redhat.com>	2017-10-30 21:48:36 +01:00
Vladimir Sementsov-Ogievskiy	d2febedb45	nbd/client: prepare nbd_receive_reply for structured reply In following patch nbd_receive_reply will be used both for simple and structured reply header receiving. NBDReply is altered into union of simple reply header and structured reply chunk header, simple error translation moved to block/nbd-client to be consistent with further structured reply error translation. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20171027104037.8319-11-eblake@redhat.com>	2017-10-30 21:48:32 +01:00
Eric Blake	bae245d19a	nbd: Expose constants and structs for structured read Upcoming patches will implement the NBD structured reply extension [1] for both client and server roles. Declare the constants, structs, and lookup routines that will be valuable whether the server or client code is backported in isolation. This includes moving one constant from an internal header to the public header, as part of the structured read processing will be done in block/nbd-client.c rather than nbd/client.c. [1]https://github.com/NetworkBlockDevice/nbd/blob/extension-structured-reply/doc/proto.md Based on patches from Vladimir Sementsov-Ogievskiy. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20171027104037.8319-4-eblake@redhat.com>	2017-10-30 21:07:21 +01:00
Eric Blake	dd68944049	nbd: Move nbd_errno_to_system_errno() to public header This is needed in preparation for structured reply handling, as we will be performing the translation from NBD error to system errno value higher in the stack at block/nbd-client.c. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20171027104037.8319-3-eblake@redhat.com>	2017-10-30 21:07:21 +01:00
Eric Blake	efa6e2ed64	block: Align block status requests Any device that has request_alignment greater than 512 should be unable to report status at a finer granularity; it may also be simpler for such devices to be guaranteed that the block layer has rounded things out to the granularity boundary (the way the block layer already rounds all other I/O out). Besides, getting the code correct for super-sector alignment also benefits us for the fact that our public interface now has byte granularity, even though none of our drivers have byte-level callbacks. Add an assertion in blkdebug that proves that the block layer never requests status of unaligned sections, similar to what it does on other requests (while still keeping the generic helper in place for when future patches add a throttle driver). Note that iotest 177 already covers this (it would fail if you use just the blkdebug.c hunk without the io.c changes). Meanwhile, we can drop assertions in callers that no longer have to pass in sector-aligned addresses. There is a mid-function scope added for 'count' and 'longret', for a couple of reasons: first, an upcoming patch will add an 'if' statement that checks whether a driver has an old- or new-style callback, and can conveniently use the same scope for less indentation churn at that time. Second, since we are trying to get rid of sector-based computations, wrapping things in a scope makes it easier to group and see what will be deleted in a final cleanup patch once all drivers have been converted to the new-style callback. Signed-off-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-10-26 14:45:57 +02:00
Eric Blake	3182664220	block: Convert bdrv_get_block_status_above() to bytes We are gradually moving away from sector-based interfaces, towards byte-based. In the common case, allocation is unlikely to ever use values that are not naturally sector-aligned, but it is possible that byte-based values will let us be more precise about allocation at the end of an unaligned file that can do byte-based access. Changing the name of the function from bdrv_get_block_status_above() to bdrv_block_status_above() ensures that the compiler enforces that all callers are updated. Likewise, since it a byte interface allows an offset mapping that might not be sector aligned, split the mapping out of the return value and into a pass-by-reference parameter. For now, the io.c layer still assert()s that all uses are sector-aligned, but that can be relaxed when a later patch implements byte-based block status in the drivers. For the most part this patch is just the addition of scaling at the callers followed by inverse scaling at bdrv_block_status(), plus updates for the new split return interface. But some code, particularly bdrv_block_status(), gets a lot simpler because it no longer has to mess with sectors. Likewise, mirror code no longer computes s->granularity >> BDRV_SECTOR_BITS, and can therefore drop an assertion about alignment because the loop no longer depends on alignment (never mind that we don't really have a driver that reports sub-sector alignments, so it's not really possible to test the effect of sub-sector mirroring). Fix a neighboring assertion to use is_power_of_2 while there. For ease of review, bdrv_get_block_status() was tackled separately. Signed-off-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-10-26 14:45:57 +02:00
Eric Blake	237d78f8fc	block: Convert bdrv_get_block_status() to bytes We are gradually moving away from sector-based interfaces, towards byte-based. In the common case, allocation is unlikely to ever use values that are not naturally sector-aligned, but it is possible that byte-based values will let us be more precise about allocation at the end of an unaligned file that can do byte-based access. Changing the name of the function from bdrv_get_block_status() to bdrv_block_status() ensures that the compiler enforces that all callers are updated. For now, the io.c layer still assert()s that all callers are sector-aligned, but that can be relaxed when a later patch implements byte-based block status in the drivers. There was an inherent limitation in returning the offset via the return value: we only have room for BDRV_BLOCK_OFFSET_MASK bits, which means an offset can only be mapped for sector-aligned queries (or, if we declare that non-aligned input is at the same relative position modulo 512 of the answer), so the new interface also changes things to return the offset via output through a parameter by reference rather than mashed into the return value. We'll have some glue code that munges between the two styles until we finish converting all uses. For the most part this patch is just the addition of scaling at the callers followed by inverse scaling at bdrv_block_status(), coupled with the tweak in calling convention. But some code, particularly bdrv_is_allocated(), gets a lot simpler because it no longer has to mess with sectors. For ease of review, bdrv_get_block_status_above() will be tackled separately. Signed-off-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-10-26 14:45:57 +02:00
Eric Blake	7cfd527525	block: Make bdrv_round_to_clusters() signature more useful In the process of converting sector-based interfaces to bytes, I'm finding it easier to represent a byte count as a 64-bit integer at the block layer (even if we are internally capped by SIZE_MAX or even INT_MAX for individual transactions, it's still nicer to not have to worry about truncation/overflow issues on as many variables). Update the signature of bdrv_round_to_clusters() to uniformly use int64_t, matching the signature already chosen for bdrv_is_allocated and the fact that off_t is also a signed type, then adjust clients according to the required fallout (even where the result could now exceed 32 bits, no client is directly assigning the result into a 32-bit value without breaking things into a loop first). Signed-off-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-10-26 14:45:57 +02:00
Eric Blake	298a1665a2	block: Allow NULL file for bdrv_get_block_status() Not all callers care about which BDS owns the mapping for a given range of the file. This patch merely simplifies the callers by consolidating the logic in the common call point, while guaranteeing a non-NULL file to all the driver callbacks, for no semantic change. The only caller that does not care about pnum is bdrv_is_allocated, as invoked by vvfat; we can likewise add assertions that the rest of the stack does not have to worry about a NULL pnum. Furthermore, this will also set the stage for a future cleanup: when a caller does not care about which BDS owns an offset, it would be nice to allow the driver to optimize things to not have to return BDRV_BLOCK_OFFSET_VALID in the first place. In the case of fragmented allocation (for example, it's fairly easy to create a qcow2 image where consecutive guest addresses are not at consecutive host addresses), the current contract requires bdrv_get_block_status() to clamp pnum to the limit where host addresses are no longer consecutive, but allowing a NULL file means that pnum could be set to the full length of known-allocated data. Signed-off-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-10-26 14:45:57 +02:00
Peter Maydell	79b2a13aa8	nbd patches for 2017-10-14 - Marc-André Lureau - NBD: use g_new() family of functions - Vladimir Sementsov-Ogievskiy - first half of 00/13 nbd minimal structured read -----BEGIN PGP SIGNATURE----- Comment: Public key at http://people.redhat.com/eblake/eblake.gpg iQEcBAABCAAGBQJZ4q4XAAoJEKeha0olJ0NqEooH/R8NKYACELA39xrLdEMUQuZY 1Lm3/OtpBIICKx7OiZ7LniqApAI++FgjNxOf6PAfNG0TmEA+wMFaZ6NJEdi9DAmv kJVLsxiqKLDD+WIKMq5XfZQoFMJ8rV8W2/BYx9cF3Pl4KMT20qDsumsncZJ7DGOR jjsbAI8Q6g45VBx6TJbxXiTMDj87nIyNaydAGzRQTmEHtnmh8mllPiuEhJu24l6G 7CQKfcu4/7Te/5PvJIPn7CxHdVjLYalgWDRkU3kXcwmO8vGQEkYoiHPoc8lGsGtw oXJ2YIODYBIjeICkF0/PjT9aoeJQG8EuHR1hT0CW5dVBZz/DlVP/j+EZ6IDV/8k= =ud0Z -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/ericb/tags/pull-nbd-2017-10-14' into staging nbd patches for 2017-10-14 - Marc-André Lureau - NBD: use g_new() family of functions - Vladimir Sementsov-Ogievskiy - first half of 00/13 nbd minimal structured read # gpg: Signature made Sun 15 Oct 2017 01:38:47 BST # gpg: using RSA key 0xA7A16B4A2527436A # gpg: Good signature from "Eric Blake <eblake@redhat.com>" # gpg: aka "Eric Blake (Free Software Programmer) <ebb9@byu.net>" # gpg: aka "[jpeg image of size 6874]" # Primary key fingerprint: 71C2 CC22 B1C4 6029 27D2 F3AA A7A1 6B4A 2527 436A * remotes/ericb/tags/pull-nbd-2017-10-14: nbd: header constants indenting nbd/server: simplify reply transmission nbd/server: refactor nbd_co_send_simple_reply parameters nbd/server: do not use NBDReply structure nbd/server: structurize simple reply header sending nbd: rename some simple-request related objects to be _simple_ block/nbd-client: refactor nbd_co_receive_reply block/nbd-client: assert qiov len once in nbd_co_request NBD: use g_new() family of functions Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2017-10-16 15:54:42 +01:00
Vladimir Sementsov-Ogievskiy	92652b1243	nbd: header constants indenting Prepare indenting for the following commit. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20171012095319.136610-9-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2017-10-13 09:27:38 -05:00
Manos Pitsidianakis	f8ea8dacf0	block: rename bdrv_co_drain to bdrv_co_drain_begin Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Manos Pitsidianakis <el13635@mail.ntua.gr> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-10-13 12:38:41 +01:00
Manos Pitsidianakis	481cad48e5	block: add bdrv_co_drain_end callback BlockDriverState has a bdrv_co_drain() callback but no equivalent for the end of the drain. The throttle driver (block/throttle.c) needs a way to mark the end of the drain in order to toggle io_limits_disabled correctly, thus bdrv_co_drain_end is needed. Signed-off-by: Manos Pitsidianakis <el13635@mail.ntua.gr> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-10-13 12:38:41 +01:00
Vladimir Sementsov-Ogievskiy	caad53845a	nbd/server: structurize simple reply header sending Use packed structure instead of pointer arithmetics. Also, merge two redundant traces into one. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20171012095319.136610-5-vsementsov@virtuozzo.com> [eblake: tweak and mention impact on traces, fix errp usage] Signed-off-by: Eric Blake <eblake@redhat.com>	2017-10-12 16:53:15 -05:00
Kevin Wolf	bde70715b6	commit: Remove overlay_bs We don't need to make any assumptions about the graph layout above the top node of the commit operation any more. Remove the use of bdrv_find_overlay() and related variables from the commit job code. bdrv_drop_intermediate() doesn't use the 'active' parameter any more, so we can just drop it. The overlay node was previously added to the block job to get a BLK_PERM_GRAPH_MOD. We really need to respect those permissions in bdrv_drop_intermediate() now, but as long as we haven't figured out yet how BLK_PERM_GRAPH_MOD is actually supposed to work, just leave a TODO comment there. With this change, it is now possible to perform another block job on an overlay node without conflicts. qemu-iotests 030 is changed accordingly. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2017-10-06 16:28:58 +02:00
Kevin Wolf	6858eba09e	block: Introduce BdrvChildRole.update_filename There is no good reason for bdrv_drop_intermediate() to know the active layer above the subchain it is operating on - even more so, because the assumption that there is a single active layer above it is not generally true. In order to prepare removal of the active parameter, use a BdrvChildRole callback to update the backing file string in the overlay image instead of directly calling bdrv_change_backing_file(). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2017-10-06 16:28:58 +02:00
Eric Blake	0fdf1a4f68	dirty-bitmap: Switch bdrv_set_dirty() to bytes Both callers already had bytes available, but were scaling to sectors. Move the scaling to internal code. In the case of bdrv_aligned_pwritev(), we are now passing the exact offset rather than a rounded sector-aligned value, but that's okay as long as dirty bitmap widens start/bytes to granularity boundaries. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-10-06 16:28:58 +02:00
Eric Blake	e0d7f73e63	dirty-bitmap: Change bdrv_[re]set_dirty_bitmap() to use bytes Some of the callers were already scaling bytes to sectors; others can be easily converted to pass byte offsets, all in our shift towards a consistent byte interface everywhere. Making the change will also make it easier to write the hold-out callers to use byte rather than sectors for their iterations; it also makes it easier for a future dirty-bitmap patch to offload scaling over to the internal hbitmap. Although all callers happen to pass sector-aligned values, make the internal scaling robust to any sub-sector requests. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-10-06 16:28:58 +02:00
Eric Blake	3b5d4df0c6	dirty-bitmap: Change bdrv_get_dirty_locked() to take bytes Half the callers were already scaling bytes to sectors; the other half can eventually be simplified to use byte iteration. Both callers were already using the result as a bool, so make that explicit. Making the change also makes it easier for a future dirty-bitmap patch to offload scaling over to the internal hbitmap. Remember, asking whether a byte is dirty is effectively asking whether the entire granularity containing the byte is dirty, since we only track dirtiness by granularity. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-10-06 16:28:58 +02:00
Eric Blake	715a74d819	dirty-bitmap: Set iterator start by offset, not sector All callers to bdrv_dirty_iter_new() passed 0 for their initial starting point, drop that parameter. Most callers to bdrv_set_dirty_iter() were scaling a byte offset to a sector number; the exception qcow2-bitmap will be converted later to use byte rather than sector iteration. Move the scaling to occur internally to dirty bitmap code instead, so that callers now pass in bytes. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-10-06 16:28:58 +02:00
Eric Blake	86f6ae67e1	dirty-bitmap: Change bdrv_dirty_bitmap_serialize() to take bytes Right now, the dirty-bitmap code exposes the fact that we use a scale of sector granularity in the underlying hbitmap to anything that wants to serialize a dirty bitmap. It's nicer to uniformly expose bytes as our dirty-bitmap interface, matching the previous change to bitmap size. The only caller to serialization is currently qcow2-cluster.c, which becomes a bit more verbose because it is still tracking sectors for other reasons, but a later patch will fix that to more uniformly use byte offsets everywhere. Likewise, within dirty-bitmap, we have to add more assertions that we are not truncating incorrectly, which can go away once the internal hbitmap is byte-based rather than sector-based. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-10-06 16:28:58 +02:00
Eric Blake	1b6cc579de	dirty-bitmap: Avoid size query failure during truncate We've previously fixed several places where we failed to account for possible errors from bdrv_nb_sectors(). Fix another one by making bdrv_dirty_bitmap_truncate() take the new size from the caller instead of querying itself; then adjust the sole caller bdrv_truncate() to pass the size just determined by a successful resize, or to reuse the size given to the original truncate operation when refresh_total_sectors() was not able to confirm the actual size (the two sizes can potentially differ according to rounding constraints), thus avoiding sizing the bitmaps to -1. This also fixes a bug where not all failure paths in bdrv_truncate() would set errp. Note that bdrv_truncate() is still a bit awkward. We may want to revisit it later and clean up things to better guarantee that a resize attempt either fails cleanly up front, or cannot fail after guest-visible changes have been made (if temporary changes are made, then they need to be cleanly rolled back). But that is a task for another day; for now, the goal is the bare minimum fix to ensure that just bdrv_dirty_bitmap_truncate() cannot fail. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-10-06 16:28:58 +02:00
Eric Blake	dfe55c3577	dirty-bitmap: Drop unused functions We had several functions that no one is currently using, and which use sector-based interfaces. I'm trying to convert towards byte-based interfaces, so it's easier to just drop the unused functions: bdrv_dirty_bitmap_get_meta bdrv_dirty_bitmap_get_meta_locked bdrv_dirty_bitmap_reset_meta bdrv_dirty_bitmap_meta_granularity Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-10-06 16:28:58 +02:00
Kevin Wolf	3045025991	block: Fix permissions after bdrv_reopen() If we switch between read-only and read-write, the permissions that image format drivers need on bs->file change, too. Make sure to update the permissions during bdrv_reopen(). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2017-09-26 14:46:23 +02:00
Kevin Wolf	148eb13c84	block: Base permissions on rw state after reopen When new permissions are calculated during bdrv_reopen(), they need to be based on the state of the graph as it will be after the reopen has completed, not on the current state of the involved nodes. This patch makes bdrv_is_writable() optionally accept a BlockReopenQueue from which the new flags are taken. This is then used for determining the new bs->file permissions of format drivers as soon as we add the code to actually pass a non-NULL reopen queue to the .bdrv_child_perm callbacks. While moving bdrv_is_writable(), make it static. It isn't used outside block.c. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2017-09-26 14:46:23 +02:00
Kevin Wolf	e0995dc3da	block: Add reopen_queue to bdrv_child_perm() In the context of bdrv_reopen(), we'll have to look at the state of the graph as it will be after the reopen. This interface addition is in preparation for the change. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2017-09-26 14:46:23 +02:00
Paolo Bonzini	08e2c9f19c	scsi: move block/scsi.h to include/scsi/constants.h Complete the transition by renaming this header, which was shared by block/iscsi.c and the SCSI emulation code. Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-09-19 14:09:31 +02:00
Paolo Bonzini	e5b5728cd3	scsi: move non-emulation specific code to scsi/ util/scsi.c includes some SCSI code that is shared by block/iscsi.c and hw/scsi, but the introduction of the persistent reservation helper will add many more instances of this. There is also include/block/scsi.h, which actually is not part of the core block layer. The persistent reservation manager will also need a home. A scsi/ directory provides one for both the aforementioned shared code and the PR manager code. Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-09-19 14:09:11 +02:00
Peter Maydell	75be9a52b1	nbd patches for 2017-09-06 - Daniel P. Berrange: [0/2] Fix / skip recent iotests with LUKS driver - Eric Blake: [0/3] nbd: Use common read/write-all qio functions -----BEGIN PGP SIGNATURE----- Comment: Public key at http://people.redhat.com/eblake/eblake.gpg iQEcBAABCAAGBQJZsBGjAAoJEKeha0olJ0NqVRoH/iiNEB2SlZFFl5W++wf3Ekq/ lvtZjK3rxpvRXvy6LiRsYVs27Etc8E9aSw2UK6aaqgA3qR8g3zdmwUZb9w3slkeI OXedt0fS5IpQ4UP0ORUBb/LgyOgW3uA0UjHBTEAKl0SyvFPx+TrTZXxqQUqlAc9A lFaA0g71xvfqWWhXmt0PQjRr9bBEpe+4L4NgOypa+Z3xbBAektx390S8N/b/P8fC FNwAqBPTY5XAgJGnEhL9EUOdUWnVgoyG1MR63puJzULYi+2+TlpR2w030qRif75b h7TqYUvwKLnoqMyhBb5LmyhcqwNdphz/1DsEudk18XGuvC94WYkopC3rT7TPWLs= =vGUc -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/ericb/tags/pull-nbd-2017-09-06' into staging nbd patches for 2017-09-06 - Daniel P. Berrange: [0/2] Fix / skip recent iotests with LUKS driver - Eric Blake: [0/3] nbd: Use common read/write-all qio functions # gpg: Signature made Wed 06 Sep 2017 16:17:55 BST # gpg: using RSA key 0xA7A16B4A2527436A # gpg: Good signature from "Eric Blake <eblake@redhat.com>" # gpg: aka "Eric Blake (Free Software Programmer) <ebb9@byu.net>" # gpg: aka "[jpeg image of size 6874]" # Primary key fingerprint: 71C2 CC22 B1C4 6029 27D2 F3AA A7A1 6B4A 2527 436A * remotes/ericb/tags/pull-nbd-2017-09-06: nbd: Use new qio_channel_*_all() functions io: Add new qio_channel_read{, v}_all_eof functions io: Yield rather than wait when already in coroutine iotests: blacklist 194 with the luks driver iotests: rewrite 192 to use _launch_qemu to fix LUKS support Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2017-09-07 17:53:59 +01:00
Eric Blake	030fa7f6f9	nbd: Use new qio_channel_*_all() functions Rather than open-coding our own read/write-all functions, we can make use of the recently-added qio code. It slightly changes the error message in one of the iotests. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20170905191114.5959-4-eblake@redhat.com> Reviewed-by: Daniel P. Berrange <berrange@redhat.com>	2017-09-06 10:11:54 -05:00
Manos Pitsidianakis	d8e7d87ec4	block: add throttle block filter driver block/throttle.c uses existing I/O throttle infrastructure inside a block filter driver. I/O operations are intercepted in the filter's read/write coroutines, and referred to block/throttle-groups.c The driver can be used with the syntax -drive driver=throttle,file.filename=foo.qcow2,throttle-group=bar which registers the throttle filter node with the ThrottleGroup 'bar'. The given group must be created beforehand with object-add or -object. Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Manos Pitsidianakis <el13635@mail.ntua.gr> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-09-06 10:12:02 +02:00
Manos Pitsidianakis	432d889e55	block: convert ThrottleGroup to object with QOM ThrottleGroup is converted to an object. This will allow the future throttle block filter drive easy creation and configuration of throttle groups in QMP and cli. A new QAPI struct, ThrottleLimits, is introduced to provide a shared struct for all throttle configuration needs in QMP. ThrottleGroups can be created via CLI as -object throttle-group,id=foo,x-iops-total=100,x-.. where x-* are individual limit properties. Since we can't add non-scalar properties in -object this interface must be used instead. However, setting these properties must be disabled after initialization because certain combinations of limits are forbidden and thus configuration changes should be done in one transaction. The individual properties will go away when support for non-scalar values in CLI is implemented and thus are marked as experimental. ThrottleGroup also has a `limits` property that uses the ThrottleLimits struct. It can be used to create ThrottleGroups or set the configuration in existing groups as follows: { "execute": "object-add", "arguments": { "qom-type": "throttle-group", "id": "foo", "props" : { "limits": { "iops-total": 100 } } } } { "execute" : "qom-set", "arguments" : { "path" : "foo", "property" : "limits", "value" : { "iops-total" : 99 } } } This also means a group's configuration can be fetched with qom-get. Signed-off-by: Manos Pitsidianakis <el13635@mail.ntua.gr> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-09-05 18:12:21 +02:00
Manos Pitsidianakis	c61791fc23	block: add aio_context field in ThrottleGroupMember timer_cb() needs to know about the current Aio context of the throttle request that is woken up. In order to make ThrottleGroupMember backend agnostic, this information is stored in an aio_context field instead of accessing it from BlockBackend. Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Manos Pitsidianakis <el13635@mail.ntua.gr> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-09-05 16:47:52 +02:00
Manos Pitsidianakis	022cdc9f40	block: move ThrottleGroup membership to ThrottleGroupMember This commit eliminates the 1:1 relationship between BlockBackend and throttle group state. Users will be able to create multiple throttle nodes, each with its own throttle group state, in the future. The throttle group state cannot be per-BlockBackend anymore, it must be per-throttle node. This is done by gathering ThrottleGroup membership details from BlockBackendPublic into ThrottleGroupMember and refactoring existing code to use the structure. Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Manos Pitsidianakis <el13635@mail.ntua.gr> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-09-05 16:47:51 +02:00
Daniel P. Berrange	b461151ff3	block: document semantics of bdrv_co_preadv\|pwritev Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Message-id: 20170831105456.9558-1-berrange@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-09-05 11:07:02 +01:00
Daniel P. Berrange	64182a6b8b	block: document semantics of bdrv_co_preadv\|pwritev Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-09-05 11:40:34 +02:00
Manos Pitsidianakis	f7cc69b326	block: add default implementations for bdrv_co_get_block_status() bdrv_co_get_block_status_from_file() and bdrv_co_get_block_status_from_backing() set *file to bs->file and bs->backing respectively, so that bdrv_co_get_block_status() can recurse to them. Future block drivers won't have to duplicate code to implement this. Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Manos Pitsidianakis <el13635@mail.ntua.gr> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-09-04 18:31:13 +02:00
Manos Pitsidianakis	f024aee867	block: remove unused bdrv_media_changed This function is not used anywhere, so remove it. Markus Armbruster adds: The i82078 floppy device model used to call bdrv_media_changed() to implement its media change bit when backed by a host floppy. This went away in `21fcf36` "fdc: simplify media change handling". Probably broke host floppy media change. Host floppy pass-through was dropped in commit `f709623`. bdrv_media_changed() has never been used for anything else. Remove it. (Source is Message-ID: <87y3ruaypm.fsf@dusky.pond.sub.org>) Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Manos Pitsidianakis <el13635@mail.ntua.gr> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-09-04 18:31:13 +02:00
Manos Pitsidianakis	5a612c009e	block: pass bdrv_* methods to bs->file by default in block filters The following functions fail if bs->drv is a filter and does not implement them: bdrv_probe_blocksizes bdrv_probe_geometry bdrv_truncate bdrv_has_zero_init bdrv_get_info Instead, the call should be passed to bs->file if it exists, to allow filter drivers to support those methods without implementing them. This commit makes `drv->is_filter = true` imply that these callbacks will be forwarded to bs->file by default, so disabling support for these functions must be done explicitly. Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Manos Pitsidianakis <el13635@mail.ntua.gr> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-09-04 18:31:13 +02:00
Vladimir Sementsov-Ogievskiy	490dc5ed9b	nbd/client: fix nbd_send_request to return int Fix nbd_send_request to return int, as it returns a return value of nbd_write (which is int), and the only user of nbd_send_request's return value (nbd_co_send_request) consider it as int too. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20170804151440.320927-5-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2017-08-30 13:00:38 -05:00
Vladimir Sementsov-Ogievskiy	ba8456442b	nbd/client: refactor nbd_receive_reply Refactor nbd_receive_reply to return 1 on success, 0 on eof, when no data was read and <0 for other cases, because returned size of read data is not actually used. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20170804151440.320927-4-vsementsov@virtuozzo.com> [eblake: tweak function comments] Signed-off-by: Eric Blake <eblake@redhat.com>	2017-08-30 13:00:38 -05:00
Kevin Wolf	54a32bfec1	block: Allow reopen rw without BDRV_O_ALLOW_RDWR BDRV_O_ALLOW_RDWR is a flag that tells whether qemu can internally reopen a node read-write temporarily because the user requested read-write for the top-level image, but qemu decided that read-only is enough for this node (a backing file). bdrv_reopen() is different, it is also used for cases where the user changed their mind and wants to update the options. There is no reason to forbid making a node read-write in that case. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com>	2017-08-08 15:19:16 +02:00
Paolo Bonzini	82346685b8	block: drop bdrv_set_key from BlockDriver This is not used anymore since `c01c214b69` ("block: remove all encryption handling APIs", 2017-07-11). Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-08-08 14:37:00 +02:00
Kevin Wolf	d3c8c67469	block: Skip implicit nodes in query-block/blockstats Commits `0db832f` and `6cdbceb` introduced the automatic insertion of filter nodes above the top layer of mirror and commit block jobs. The assumption made there was that since libvirt doesn't do node-level management of the block layer yet, it shouldn't be affected by added nodes. This is true as far as commands issued by libvirt are concerned. It only uses BlockBackend names to address nodes, so any operations it performs still operate on the root of the tree as intended. However, the assumption breaks down when you consider query commands, which return data for the wrong node now. These commands also return information on some child nodes (bs->file and/or bs->backing), which libvirt does make use of, and which refer to the wrong nodes, too. One of the consequences is that oVirt gets wrong information about the image size and stops the VM in response as long as a mirror or commit job is running: https://bugzilla.redhat.com/show_bug.cgi?id=1470634 This patch fixes the problem by hiding the implicit nodes created automatically by the mirror and commit block jobs in the output of query-block and BlockBackend-based query-blockstats as long as the user doesn't indicate that they are aware of those nodes by providing a node name for them in the QMP command to start the block job. The node-based commands query-named-block-nodes and query-blockstats with query-nodes=true still show all nodes, including implicit ones. This ensures that users that are capable of node-level management can still access the full information; users that only know BlockBackends won't use these commands. Cc: qemu-stable@nongnu.org Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Tested-by: Eric Blake <eblake@redhat.com>	2017-07-24 15:06:04 +02:00
Paolo Bonzini	61124f03ab	block: invoke .bdrv_drain callback in coroutine context and from AioContext This will let the callback take a CoMutex in the next patch. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20170629132749.997-8-pbonzini@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>	2017-07-17 11:28:15 +08:00
Eric Blake	081dd1fe36	nbd: Implement NBD_INFO_BLOCK_SIZE on client The upstream NBD Protocol has defined a new extension to allow the server to advertise block sizes to the client, as well as a way for the client to inform the server whether it intends to obey block sizes. When using the block layer as the client, we will obey block sizes; but when used as 'qemu-nbd -c' to hand off to the kernel nbd module as the client, we are still waiting for the kernel to implement a way for us to learn if it will honor block sizes (perhaps by an addition to sysfs, rather than an ioctl), as well as any way to tell the kernel what additional block sizes to obey (NBD_SET_BLKSIZE appears to be accurate for the minimum size, but preferred and maximum sizes would probably be new ioctl()s), so until then, we need to make our request for block sizes conditional. When using ioctl(NBD_SET_BLKSIZE) to hand off to the kernel, use the minimum block size as the sector size if it is larger than 512, which also has the nice effect of cooperating with (non-qemu) servers that don't do read-modify-write when exposing a block device with 4k sectors; it might also allow us to visit a file larger than 2T on a 32-bit kernel. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20170707203049.534-10-eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-07-14 12:04:42 +02:00
Eric Blake	3736cc5be3	nbd: Expose and debug more NBD constants The NBD protocol has several constants defined in various extensions that we are about to implement. Expose them to the code, along with an easy way to map various constants to strings during diagnostic messages. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20170707203049.534-4-eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-07-14 12:04:41 +02:00
Eric Blake	004a89fce9	nbd: Create struct for tracking export info The NBD Protocol is introducing some additional information about exports, such as minimum request size and alignment, as well as an advertised maximum request size. It will be easier to feed this information back to the block layer if we gather all the information into a struct, rather than adding yet more pointer parameters during negotiation. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20170707203049.534-2-eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-07-14 12:04:41 +02:00
Max Reitz	7ea37c3066	block: Add PreallocMode to bdrv_truncate() For block drivers that just pass a truncate request to the underlying protocol, we can now pass the preallocation mode instead of aborting if it is not PREALLOC_MODE_OFF. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20170613202107.10125-3-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-07-11 17:45:01 +02:00
Max Reitz	8243ccb743	block: Add PreallocMode to BD.bdrv_truncate() Add a PreallocMode parameter to the bdrv_truncate() function implemented by each block driver. Currently, we always pass PREALLOC_MODE_OFF and no driver accepts anything else. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20170613202107.10125-2-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-07-11 17:45:01 +02:00
Stefan Hajnoczi	90880ff107	block: add bdrv_measure() API bdrv_measure() provides a conservative maximum for the size of a new image. This information is handy if storage needs to be allocated (e.g. a SAN or an LVM volume) ahead of time. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 20170705125738.8777-2-stefanha@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-07-11 17:45:00 +02:00
Vladimir Sementsov-Ogievskiy	615b5dcf2d	block: release persistent bitmaps on inactivate We should release them here to reload on invalidate cache. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20170628120530.31251-31-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-07-11 17:44:59 +02:00
Vladimir Sementsov-Ogievskiy	56f364e6d7	block/dirty-bitmap: add bdrv_remove_persistent_dirty_bitmap Interface for removing persistent bitmap from its storage. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20170628120530.31251-28-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-07-11 17:44:59 +02:00
Vladimir Sementsov-Ogievskiy	a3b52535e8	qmp: add x-debug-block-dirty-bitmap-sha256 Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20170628120530.31251-26-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-07-11 17:44:59 +02:00
Vladimir Sementsov-Ogievskiy	67b792f5ed	block: add bdrv_can_store_new_dirty_bitmap This will be needed to check some restrictions before making bitmap persistent in qmp-block-dirty-bitmap-add (this functionality will be added by future patch) Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20170628120530.31251-22-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-07-11 17:44:59 +02:00
Vladimir Sementsov-Ogievskiy	3dd10a06d1	block/dirty-bitmap: add bdrv_dirty_bitmap_next() Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20170628120530.31251-19-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-07-11 17:44:58 +02:00
Vladimir Sementsov-Ogievskiy	a88b179fdb	block: introduce persistent dirty bitmaps New field BdrvDirtyBitmap.persistent means, that bitmap should be saved by format driver in .bdrv_close and .bdrv_inactivate. No format driver supports it for now. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20170628120530.31251-18-vsementsov@virtuozzo.com [mreitz: Fixed indentation] Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-07-11 17:44:58 +02:00
Vladimir Sementsov-Ogievskiy	a0319aacd4	block/dirty-bitmap: add autoload field to BdrvDirtyBitmap Mirror AUTO flag from Qcow2 bitmap in BdrvDirtyBitmap. This will be needed in future, to save this flag back to Qcow2 for persistent bitmaps. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20170628120530.31251-16-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-07-11 17:44:58 +02:00
Vladimir Sementsov-Ogievskiy	cb9ff6c25a	block: new bdrv_reopen_bitmaps_rw interface Add format driver handler, which should mark loaded read-only bitmaps as 'IN_USE' in the image and unset read_only field in corresponding BdrvDirtyBitmap's. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20170628120530.31251-14-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-07-11 17:44:58 +02:00
Vladimir Sementsov-Ogievskiy	d6883bc968	block/dirty-bitmap: add readonly field to BdrvDirtyBitmap It will be needed in following commits for persistent bitmaps. If bitmap is loaded from read-only storage (and we can't mark it "in use" in this storage) corresponding BdrvDirtyBitmap should be read-only. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20170628120530.31251-11-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-07-11 17:44:58 +02:00
Vladimir Sementsov-Ogievskiy	6bdc8b719a	block/dirty-bitmap: add deserialize_ones func Add bdrv_dirty_bitmap_deserialize_ones() function, which is needed for qcow2 bitmap loading, to handle unallocated bitmap parts, marked as all-ones. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20170628120530.31251-7-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-07-11 17:44:57 +02:00
Vladimir Sementsov-Ogievskiy	ba06ff1a5c	block: fix bdrv_dirty_bitmap_granularity signature Make getter signature const-correct. This allows other functions with const dirty bitmap parameter use bdrv_dirty_bitmap_granularity(). Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-id: 20170628120530.31251-6-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-07-11 17:44:57 +02:00
Daniel P. Berrange	c01c214b69	block: remove all encryption handling APIs Now that all encryption keys must be provided upfront via the QCryptoSecret API and associated block driver properties there is no need for any explicit encryption handling APIs in the block layer. Encryption can be handled transparently within the block driver. We only retain an API for querying whether an image is encrypted or not, since that is a potentially useful piece of metadata to report to the user. Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Message-id: 20170623162419.26068-18-berrange@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-07-11 17:44:56 +02:00
Daniel P. Berrange	0cb8d47ba9	block: deprecate "encryption=on" in favor of "encrypt.format=aes" Historically the qcow & qcow2 image formats supported a property "encryption=on" to enable their built-in AES encryption. We'll soon be supporting LUKS for qcow2, so need a more general purpose way to enable encryption, with a choice of formats. This introduces an "encrypt.format" option, which will later be joined by a number of other "encrypt.XXX" options. The use of a "encrypt." prefix instead of "encrypt-" is done to facilitate mapping to a nested QAPI schema at later date. e.g. the preferred syntax is now qemu-img create -f qcow2 -o encrypt.format=aes demo.qcow2 Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Message-id: 20170623162419.26068-8-berrange@redhat.com Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-07-11 17:44:55 +02:00
Eric Blake	51b0a48888	block: Make bdrv_is_allocated_above() byte-based We are gradually moving away from sector-based interfaces, towards byte-based. In the common case, allocation is unlikely to ever use values that are not naturally sector-aligned, but it is possible that byte-based values will let us be more precise about allocation at the end of an unaligned file that can do byte-based access. Changing the signature of the function to use int64_t *pnum ensures that the compiler enforces that all callers are updated. For now, the io.c layer still assert()s that all callers are sector-aligned, but that can be relaxed when a later patch implements byte-based block status. Therefore, for the most part this patch is just the addition of scaling at the callers followed by inverse scaling at bdrv_is_allocated(). But some code, particularly stream_run(), gets a lot simpler because it no longer has to mess with sectors. Leave comments where we can further simplify by switching to byte-based iterations, once later patches eliminate the need for sector-aligned operations. For ease of review, bdrv_is_allocated() was tackled separately. Signed-off-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-07-10 13:18:07 +02:00
Eric Blake	d6a644bbfe	block: Make bdrv_is_allocated() byte-based We are gradually moving away from sector-based interfaces, towards byte-based. In the common case, allocation is unlikely to ever use values that are not naturally sector-aligned, but it is possible that byte-based values will let us be more precise about allocation at the end of an unaligned file that can do byte-based access. Changing the signature of the function to use int64_t pnum ensures that the compiler enforces that all callers are updated. For now, the io.c layer still assert()s that all callers are sector-aligned on input and that pnum is sector-aligned on return to the caller, but that can be relaxed when a later patch implements byte-based block status. Therefore, this code adds usages like DIV_ROUND_UP(,BDRV_SECTOR_SIZE) to callers that still want aligned values, where the call might reasonbly give non-aligned results in the future; on the other hand, no rounding is needed for callers that should just continue to work with byte alignment. For the most part this patch is just the addition of scaling at the callers followed by inverse scaling at bdrv_is_allocated(). But some code, particularly bdrv_commit(), gets a lot simpler because it no longer has to mess with sectors; also, it is now possible to pass NULL if the caller does not care how much of the image is allocated beyond the initial offset. Leave comments where we can further simplify once a later patch eliminates the need for sector-aligned requests through bdrv_is_allocated(). For ease of review, bdrv_is_allocated_above() will be tackled separately. Signed-off-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-07-10 13:18:07 +02:00
Eric Blake	f6ac207893	backup: Switch block_backup.h to byte-based We are gradually converting to byte-based interfaces, as they are easier to reason about than sector-based. Continue by converting the public interface to backup jobs (no semantic change), including a change to CowRequest to track by bytes instead of cluster indices. Note that this does not change the difference between the public interface (starting point, and size of the subsequent range) and the internal interface (starting and end points). Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Xie Changlong <xiechanglong@cmss.chinamobile.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-07-10 13:18:06 +02:00
Eric Blake	e8a81e9cad	block: Drop unused bdrv_round_sectors_to_clusters() Now that the last user [mirror_iteration()] has converted to using bytes, we no longer need a function to round sectors to clusters. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-07-10 13:18:06 +02:00
Eric Blake	d5254033da	block: Simplify use of BDRV_BLOCK_RAW The lone caller that cares about a return of BDRV_BLOCK_RAW (namely, io.c:bdrv_co_get_block_status) completely replaces the return value, so there is no point in passing BDRV_BLOCK_DATA. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-07-10 13:18:05 +02:00
Eric Blake	fb0d8654ff	block: Add BDRV_BLOCK_EOF to bdrv_get_block_status() Just as the block layer already sets BDRV_BLOCK_ALLOCATED as a shortcut for subsequent operations, there are also some optimizations that are made easier if we can quickly tell that *pnum will advance us to the end of a file, via a new BDRV_BLOCK_EOF which gets set by the block layer. This just plumbs up the new bit; subsequent patches will make use of it. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20170505021500.19315-2-eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>	2017-06-30 21:48:06 +08:00
sochin.jiang	4172a00373	fix: avoid an infinite loop or a dangling pointer problem in img_commit img_commit could fall into an infinite loop calling run_block_job() if its blockjob fails on any I/O error, fix this already known problem. Signed-off-by: sochin.jiang <sochin.jiang@huawei.com> Message-id: 1497509253-28941-1-git-send-email-sochin.jiang@huawei.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-06-26 14:54:46 +02:00
Manos Pitsidianakis	f5a5ca7969	block: change variable names in BlockDriverState Change the 'int count' parameter in pwrite_zeros, pdiscard related functions (and some others) to 'int bytes', as they both refer to bytes. This helps with code legibility. Signed-off-by: Manos Pitsidianakis <el13635@mail.ntua.gr> Message-id: 20170609101808.13506-1-el13635@mail.ntua.gr Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-06-26 14:54:46 +02:00
Kevin Wolf	c5f1ad429c	block: Remove bdrv_aio_readv/writev/flush() These functions are unused now. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-06-26 14:51:15 +02:00
Peter Maydell	65a0e3e842	-----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQEtBAABCAAXBQJZQyPmEBxmYW16QHJlZGhhdC5jb20ACgkQyjViTGqRccaWrgf/ SCAHpi4gzWbr7AN03jP16Qy/kqNik6F7LTNSqrRbvBPb3TNchDd4z44SAghK5m/r +IlYQc20sBZ60tRHIHAUSF2WNcea2pj1v3ZVgjrI7hiJ3DXPiqqt/dAR/W/BLIDO tAHAVF6Pnrjm9DC4d2zATLDHvcHMzWOsnePh7XcOm44REbwUr3GDg6bf2+j+5yfS 9ewmXfh8z4w1IvSn+f5B+IeCvGvJNA1D55dqcGo8Ivlg9PnElziXFaXO2s7UiLIM mF3eTSIbJQNNN+E+0lpRpnqQiq+Txxggu61Q4f8bOTBhEOPa3etj1ydnXMVbvX25 6SUuBfGh51tyOIZOJz3GtA== =9b+J -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/famz/tags/docker-and-block-pull-request' into staging # gpg: Signature made Fri 16 Jun 2017 01:18:46 BST # gpg: using RSA key 0xCA35624C6A9171C6 # gpg: Good signature from "Fam Zheng <famz@redhat.com>" # gpg: WARNING: This key is not certified with sufficiently trusted signatures! # gpg: It is not certain that the signature belongs to the owner. # Primary key fingerprint: 5003 7CB7 9706 0F76 F021 AD56 CA35 624C 6A91 71C6 * remotes/famz/tags/docker-and-block-pull-request: (23 commits) block: make accounting thread-safe block: split BlockAcctStats creation and setup block: introduce block_account_one_io block: protect modification of dirty bitmaps with a mutex migration/block: reset dirty bitmap before reading block: introduce dirty_bitmap_mutex block: protect tracked_requests and flush_queue with reqs_lock block: access write_gen with atomics block: use Stat64 for wr_highest_offset util: add stats64 module throttle-groups: protect throttled requests with a CoMutex throttle-groups: do not use qemu_co_enter_next throttle-groups: only start one coroutine from drained_begin block: access io_plugged with atomic ops block: access wakeup with atomic ops block: access serialising_in_flight with atomic ops block: access io_limits_disabled with atomic ops block: access quiesce_counter with atomic ops block: access copy_on_read with atomic ops docker: Add flex and bison to centos6 image ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2017-06-20 16:01:15 +01:00
Paolo Bonzini	5b50bf77ce	block: make accounting thread-safe I'm not trying too hard yet. Later, with multiqueue support, this may cause mutex contention or cacheline bouncing. Cc: Alberto Garcia <berto@igalia.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20170605123908.18777-20-pbonzini@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Fam Zheng <famz@redhat.com>	2017-06-16 07:55:00 +08:00
Paolo Bonzini	9caa6f3dbe	block: split BlockAcctStats creation and setup block_acct_destroy is called unconditionally in blk_delete, but there is no BlockAcctStats function that is called unconditionally in blk_new. Split block_acct_init in two, so that it will be possible to create a QemuMutex in block_acct_init and destroy it in block_acct_cleanup. Cc: Alberto Garcia <berto@igalia.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20170605123908.18777-19-pbonzini@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Fam Zheng <famz@redhat.com>	2017-06-16 07:55:00 +08:00
Paolo Bonzini	b64bd51efa	block: protect modification of dirty bitmaps with a mutex Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20170605123908.18777-17-pbonzini@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>	2017-06-16 07:55:00 +08:00
Paolo Bonzini	2119882c7e	block: introduce dirty_bitmap_mutex It protects only the list of dirty bitmaps; in the next patch we will also protect their content. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20170605123908.18777-15-pbonzini@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>	2017-06-16 07:55:00 +08:00
Paolo Bonzini	3783fa3dd3	block: protect tracked_requests and flush_queue with reqs_lock Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20170605123908.18777-14-pbonzini@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>	2017-06-16 07:55:00 +08:00
Paolo Bonzini	47fec59941	block: access write_gen with atomics Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20170605123908.18777-13-pbonzini@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>	2017-06-16 07:55:00 +08:00
Paolo Bonzini	f7946da274	block: use Stat64 for wr_highest_offset Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20170605123908.18777-12-pbonzini@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>	2017-06-16 07:55:00 +08:00
Paolo Bonzini	850d54a2a9	block: access io_plugged with atomic ops Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20170605123908.18777-7-pbonzini@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>	2017-06-16 07:55:00 +08:00
Paolo Bonzini	e2a6ae7fe5	block: access wakeup with atomic ops Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20170605123908.18777-6-pbonzini@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>	2017-06-16 07:55:00 +08:00
Paolo Bonzini	20fc71b25c	block: access serialising_in_flight with atomic ops Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20170605123908.18777-5-pbonzini@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>	2017-06-16 07:55:00 +08:00
Paolo Bonzini	414c2ec358	block: access quiesce_counter with atomic ops Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20170605123908.18777-3-pbonzini@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>	2017-06-16 07:55:00 +08:00
Paolo Bonzini	d3faa13e5f	block: access copy_on_read with atomic ops Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20170605123908.18777-2-pbonzini@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>	2017-06-16 07:55:00 +08:00
Vladimir Sementsov-Ogievskiy	d1fdf257d5	nbd: rename read_sync and friends Rename nbd_wr_syncv -> nbd_rwv read_sync -> nbd_read read_sync_eof -> nbd_read_eof write_sync -> nbd_write drop_sync -> nbd_drop 1. nbd_ prefix read_sync and write_sync are already shared, so it is good to have a namespace prefix. drop_sync will be shared, and read_sync_eof is related to read_sync, so let's rename them all. 2. _sync suffix _sync is related to the fact that nbd_wr_syncv doesn't return if a write to socket returns EAGAIN. The first implementation of nbd_wr_syncv (was wr_sync in `7a5ca8648b`) just loops while getting EAGAIN, the current implementation yields in this case. Why we want to get rid of it: - it is normal for r/w functions to be synchronous, so having an additional suffix for it looks redundant (contrariwise, we have _aio suffix for async functions) - _sync suffix in block layer is used when function does flush (so using it for other thing is confusing a bit) - keep function names short after adding nbd_ prefix 3. for nbd_wr_syncv let's use more common notation 'rw' Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20170602150150.258222-2-vsementsov@virtuozzo.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-06-15 11:04:06 +02:00
Eric Blake	0c9390d978	nbd: Fix regression on resiliency to port scan Back in qemu 2.5, qemu-nbd was immune to port probes (a transient server would not quit, regardless of how many probe connections came and went, until a connection actually negotiated). But we broke that in commit `ee7d7aa` when removing the return value to nbd_client_new(), although that patch also introduced a bug causing an assertion failure on a client that fails negotiation. We then made it worse during refactoring in commit `1a6245a` (a segfault before we could even assert); the (masked) assertion was cleaned up in `d3780c2` (still in 2.6), and just recently we finally fixed the segfault ("nbd: Fully intialize client in case of failed negotiation"). But that still means that ever since we added TLS support to qemu-nbd, we have been vulnerable to an ill-timed port-scan being able to cause a denial of service by taking down qemu-nbd before a real client has a chance to connect. Since negotiation is now handled asynchronously via coroutines, we no longer have a synchronous point of return by re-adding a return value to nbd_client_new(). So this patch instead wires things up to pass the negotiation status through the close_fn callback function. Simple test across two terminals: $ qemu-nbd -f raw -p 30001 file $ nmap 127.0.0.1 -p 30001 && \ qemu-io -c 'r 0 512' -f raw nbd://localhost:30001 Note that this patch does not change what constitutes successful negotiation (thus, a client must enter transmission phase before that client can be considered as a reason to terminate the server when the connection ends). Perhaps we may want to tweak things in a later patch to also treat a client that uses NBD_OPT_ABORT as being a 'successful' negotiation (the client correctly talked the NBD protocol, and informed us it was not going to use our export after all), but that's a discussion for another day. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1451614 Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20170608222617.20376-1-eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-06-15 11:04:05 +02:00
Vladimir Sementsov-Ogievskiy	be41c100c0	nbd/client.c: use errp instead of LOG Move to modern errp scheme from just LOGging errors. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20170526110913.89098-1-vsementsov@virtuozzo.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-06-06 20:18:36 +02:00
Vladimir Sementsov-Ogievskiy	f260956536	nbd: add errp parameter to nbd_wr_syncv() Will be used in following patch to provide actual error message in some cases. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20170516094533.6160-4-vsementsov@virtuozzo.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-06-06 20:18:36 +02:00
Juan Quintela	5e22479ae2	migration: Create include for migration snapshots Start removing migration code from sysemu/sysemu.h. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-06-01 18:49:23 +02:00
Stefan Hajnoczi	0748b3526e	Block layer patches -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQIcBAABAgAGBQJZLDGTAAoJEH8JsnLIjy/WjY0P/jtXF+SQKs9H695Uy+U+EUr5 9w+lYTfjHXTUx9F5AKT4OAMww/uWSG5ywG8FZD8AJZyt1B8piDrSA7sW5YYgWCV4 XD1HMzN6pasVGrchzgfBSW/K9BgUlvBEjqFrLCNVii0osa68J7vX3uJsLyRA+yPo ijFseJaB/b0KnAPG9JqHXc4DVQgU2IhZH3ZsgE6Utb+KUrm8/veiORhWQboBbfCW eyTF+YWee/rI+up4nKn9+Le57EUXWr3Y50BBxFZw1/4wQbv0WEhOIbl/Z4ggxNRR VoyXGMqOIZFlGQ7bGxDawuEwGKfcOFFEHj3rhPrs7RPnod/6X8Vxm0rAr2GNZoOW 9tLMQKe4HLo3kVwX65AhzWd/WMukAd0MC/0oaULOjiAEdlWjflyEhx9H9uYefl5x kn77ZqQqIEL4aCYRBLJn9oJV3CIZbSd6cuKC9UPuXAwD8XDWTXUzT/aDMJUnfij7 vhS1qks1Wyk37wIjTuuERwrr0K6y8e3De30NeU6LqQuzXvJ1MYSp49BA1FCgHfmT x5RWbTSjLdjZiIo7biE2dSwdOtsxsnuxX19utqfGqOmegNU7LykRtu6mFh9kJhNe 5ladt1Mu//puZLQ/q4RH/J0pwkuS/OVrS3LBobcggPGHhUIiHhdNUI73bb7BK0+9 ME5DGLm4/c/REB3Lidr9 =kkU0 -----END PGP SIGNATURE----- Merge remote-tracking branch 'kwolf/tags/for-upstream' into staging Block layer patches # gpg: Signature made Mon 29 May 2017 03:34:59 PM BST # gpg: using RSA key 0x7F09B272C88F2FD6 # gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" # Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6 * kwolf/tags/for-upstream: block/file-: _parse_filename() and colons block: Fix backing paths for filenames with colons block: Tweak error message related to qemu-img amend qemu-img: Fix leakage of options on error qemu-img: copy *key-secret opts when opening newly created files qemu-img: introduce --target-image-opts for 'convert' command qemu-img: fix --image-opts usage with dd command qemu-img: add support for --object with 'dd' command qemu-img: Fix documentation of convert qcow2: remove extra local_error variable mirror: Drop permissions on s->target on completion nvme: Add support for Controller Memory Buffers iotests: 147: Don't test inet6 if not available qemu-iotests: Test streaming with missing job ID stream: fix crash in stream_start() when block_job_create() fails Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-05-30 14:15:15 +01:00
Max Reitz	03c320d803	block/file-: _parse_filename() and colons The file drivers' *_parse_filename() implementations just strip the optional protocol prefix off the filename. However, for e.g. "file:foo:bar", this would lead to "foo:bar" being stored as the BDS's filename which looks like it should be managed using the "foo" protocol. This is especially troublesome if you then try to resolve a backing filename based on "foo:bar". This issue can only occur if the stripped part is a relative filename ("file:/foo:bar" will be shortened to "/foo:bar" and having a slash before the first colon means that "/foo" is not recognized as a protocol part). Therefore, we can easily fix it by prepending "./" to such filenames. Before this patch: $ ./qemu-img create -f qcow2 backing.qcow2 64M Formatting 'backing.qcow2', fmt=qcow2 size=67108864 encryption=off cluster_size=65536 lazy_refcounts=off refcount_bits=16 $ ./qemu-img create -f qcow2 -b backing.qcow2 file🔝image.qcow2 Formatting 'file🔝image.qcow2', fmt=qcow2 size=67108864 backing_file=backing.qcow2 encryption=off cluster_size=65536 lazy_refcounts=off refcount_bits=16 $ ./qemu-io file🔝image.qcow2 can't open device file🔝image.qcow2: Could not open backing file: Unknown protocol 'top' After this patch: $ ./qemu-io file🔝image.qcow2 [no error] Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20170522195217.12991-3-mreitz@redhat.com Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-05-29 15:39:54 +02:00
Paolo Bonzini	eb05e011e2	blockjob: use deferred_to_main_loop to indicate the coroutine has ended All block jobs are using block_job_defer_to_main_loop as the final step just before the coroutine terminates. At this point, block_job_enter should do nothing, but currently it restarts the freed coroutine. Now, the job->co states should probably be changed to an enum (e.g. BEFORE_START, STARTED, YIELDED, COMPLETED) subsuming block_job_started, job->deferred_to_main_loop and job->busy. For now, this patch eliminates the problematic reenter by removing the reset of job->deferred_to_main_loop (which served no purpose, as far as I could see) and checking the flag in block_job_enter. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 20170508141310.8674-12-pbonzini@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2017-05-24 16:38:51 -04:00
Paolo Bonzini	f321dcb57f	blockjob: introduce block_job_pause/resume_all Remove use of block_job_pause/resume from outside blockjob.c, thus making them static. The new functions are used by the block layer, so place them in blockjob_int.h. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Message-id: 20170508141310.8674-5-pbonzini@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2017-05-24 16:38:51 -04:00
Paolo Bonzini	05b0d8e3b8	blockjob: introduce block_job_early_fail Outside blockjob.c, block_job_unref is only used when a block job fails to start, and block_job_ref is not used at all. The reference counting thus is pretty well hidden. Introduce a separate function to be used by block jobs; because block_job_ref and block_job_unref now become static, move them earlier in blockjob.c. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Message-id: 20170508141310.8674-4-pbonzini@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2017-05-24 16:38:51 -04:00
Paolo Bonzini	9f086abbe4	blockjob: remove iostatus_reset callback This is unused since commit `66a0fae` ("blockjob: Don't touch BDS iostatus", 2016-05-19). Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Message-id: 20170508141310.8674-3-pbonzini@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2017-05-24 16:38:51 -04:00
Stefan Hajnoczi	b54933eed5	-----BEGIN PGP SIGNATURE----- iQEcBAABAgAGBQJZFciYAAoJEJykq7OBq3PINCgH/1IXhfLR6AtPKAa0b2oABWEW lD90HiHIxeqYIDiVnppD6CEnIZ6VBmImSyRr6V9Fk7OTkCucVRLOXFZF63d7ETDo PBF8+FJ8YgdClJ80f3HkmPu6vzvnRzcs5mUBrrKhZPAy2adxNmIpfir/78p+Aomd HxCYl3jF6aVUtFEtzTwAmDjXU2NUNPQABAAKiGT5jwSUcksxogg7LSIM/ZlFzNhc ZMI9h6doRP0v3kLyh429qVEgXpCh7VaD9Swv6H3/sSHJdCBP5MILrTADrK+45tmL nIxHruOyt63Jgd8acGDdvYz1oHjPXb2Cn/87SiONsqwoyFJmkqbVXvOg4gqPuJ8= =yXgr -----END PGP SIGNATURE----- Merge tag 'block-pull-request' into staging # gpg: Signature made Fri 12 May 2017 10:37:12 AM EDT # gpg: using RSA key 0x9CA4ABB381AB73C8 # gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>" # gpg: aka "Stefan Hajnoczi <stefanha@gmail.com>" # Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35 775A 9CA4 ABB3 81AB 73C8 * tag 'block-pull-request': aio: add missing aio_notify() to aio_enable_external() block: Simplify BDRV_BLOCK_RAW recursion coroutine: remove GThread implementation Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-05-12 10:39:23 -04:00
Stefan Hajnoczi	321d1dba8b	aio: add missing aio_notify() to aio_enable_external() The main loop uses aio_disable_external()/aio_enable_external() to temporarily disable processing of external AioContext clients like device emulation. This allows monitor commands to quiesce I/O and prevent the guest from submitting new requests while a monitor command is in progress. The aio_enable_external() API is currently broken when an IOThread is in aio_poll() waiting for fd activity when the main loop re-enables external clients. Incrementing ctx->external_disable_cnt does not wake the IOThread from ppoll(2) so fd processing remains suspended and leads to unresponsive emulated devices. This patch adds an aio_notify() call to aio_enable_external() so the IOThread is kicked out of ppoll(2) and will re-arm the file descriptors. The bug can be reproduced as follows: $ qemu -M accel=kvm -m 1024 \ -object iothread,id=iothread0 \ -device virtio-scsi-pci,iothread=iothread0,id=virtio-scsi-pci0 \ -drive if=none,id=drive0,aio=native,cache=none,format=raw,file=test.img \ -device scsi-hd,id=scsi-hd0,drive=drive0 \ -qmp tcp::5555,server,nowait $ scripts/qmp/qmp-shell localhost:5555 (qemu) blockdev-snapshot-sync device=drive0 snapshot-file=sn1.qcow2 mode=absolute-paths format=qcow2 After blockdev-snapshot-sync completes the SCSI disk will be unresponsive. This leads to request timeouts inside the guest. Reported-by: Qianqian Zhu <qizhu@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20170508180705.20609-1-stefanha@redhat.com Suggested-by: Fam Zheng <famz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-05-12 10:36:46 -04:00
Eric Blake	4c41cb4955	block: Update comments on BDRV_BLOCK_* meanings We had some conflicting documentation: a nice 8-way table that described all possible combinations of DATA, ZERO, and OFFSET_VALID, contrasted with text that implied that OFFSET_VALID always meant raw data could be read directly. Furthermore, the text refers a lot to bs->file, even though the interface was updated back in `67a0fd2a` to let the driver pass back a specific BDS (not necessarily bs->file). As the 8-way table is the intended semantics, simplify the rest of the text to get rid of the confusion. ALLOCATED is always set by the block layer for convenience (drivers do not have to worry about it). RAW is used only internally, but by more than the raw driver. Document these additional items on the driver callback. Suggested-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20170507000552.20847-4-eblake@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-05-11 14:28:06 +02:00
Kevin Wolf	9c5e6594f1	block: Fix write/resize permissions for inactive images Format drivers for inactive nodes don't need write/resize permissions on their bs->file and can share write/resize with another VM (in fact, this is the whole point of keeping images inactive). Represent this fact in the op blocker system, so that image locking does the right thing without special-casing inactive images. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2017-05-11 12:08:24 +02:00
Kevin Wolf	cfa1a5723f	block: Drop permissions when migration completes With image locking, permissions affect other qemu processes as well. We want to be sure that the destination can run, so let's drop permissions on the source when migration completes. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2017-05-11 12:08:24 +02:00
Kevin Wolf	4417ab7adf	block: New BdrvChildRole.activate() for blk_resume_after_migration() Instead of manually calling blk_resume_after_migration() in migration code after doing bdrv_invalidate_cache_all(), integrate the BlockBackend activation with cache invalidation into a single function. This is achieved with a new callback in BdrvChildRole that is called by bdrv_invalidate_cache_all(). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2017-05-11 12:08:24 +02:00
Fam Zheng	5a9347c673	block: Add, parse and store "force-share" option Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-05-11 11:02:38 +02:00
Fam Zheng	5176196c32	block: Make bdrv_perm_names public It can be used outside of block.c for making user friendly messages. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-05-11 11:02:38 +02:00
Markus Armbruster	bd269ebc82	sockets: Limit SocketAddressLegacy to external interfaces SocketAddressLegacy is a simple union, and simple unions are awkward: they have their variant members wrapped in a "data" object on the wire, and require additional indirections in C. SocketAddress is the equivalent flat union. Convert all users of SocketAddressLegacy to SocketAddress, except for existing external interfaces. See also commit fce5d53..9445673 and 85a82e8..c5f1ae3. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <1493192202-3184-7-git-send-email-armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> [Minor editing accident fixed, commit message and a comment tweaked] Signed-off-by: Markus Armbruster <armbru@redhat.com>	2017-05-09 09:14:40 +02:00
Klim Kireev	d4a7f45ec9	block: fix obvious coding style mistakes in block_int.h Signed-off-by: Klim Kireev <proffk@virtuozzo.mipt.ru> Signed-off-by: Denis V. Lunev <den@openvz.org> CC: Kevin Wolf <kwolf@redhat.com> CC: Max Reitz <mreitz@redhat.com> Message-id: 1491405505-31620-2-git-send-email-den@openvz.org Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-04-28 16:02:03 +02:00
Max Reitz	4bff28b81a	block: Add errp to BD.bdrv_truncate() Add an Error parameter to the block drivers' bdrv_truncate() interface. If a block driver does not set this in case of an error, the generic bdrv_truncate() implementation will do so. Where it is obvious, this patch also makes some block drivers set this value. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20170328205129.15138-4-mreitz@redhat.com Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-04-28 16:02:03 +02:00
Max Reitz	ed3d2ec98a	block: Add errp to b{lk,drv}_truncate() For one thing, this allows us to drop the error message generation from qemu-img.c and blockdev.c and instead have it unified in bdrv_truncate(). Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20170328205129.15138-3-mreitz@redhat.com Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-04-28 16:02:02 +02:00
Jeff Cody	45803a0396	block: introduce bdrv_can_set_read_only() Introduce check function for setting read_only flags. Will return < 0 on error, with appropriate Error value set. Does not alter any flags. Signed-off-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: e2bba34ac3bc76a0c42adc390413f358ae0566e8.1491597120.git.jcody@redhat.com	2017-04-24 15:09:33 -04:00
Jeff Cody	e2b8247a32	block: do not set BDS read_only if copy_on_read enabled A few block drivers will set the BDS read_only flag from their .bdrv_open() function. This means the bs->read_only flag could be set after we enable copy_on_read, as the BDRV_O_COPY_ON_READ flag check occurs prior to the call to bdrv->bdrv_open(). This adds an error return to bdrv_set_read_only(), and an error will be return if we try to set the BDS to read_only while copy_on_read is enabled. This patch also changes the behavior of vvfat. Before, vvfat could override the drive 'readonly' flag with its own, internal 'rw' flag. For instance, this -drive parameter would result in a writable image: "-drive format=vvfat,dir=/tmp/vvfat,rw,if=virtio,readonly=on" This is not correct. Now, attempting to use the above -drive parameter will result in an error (i.e., 'rw' is incompatible with 'readonly=on'). Signed-off-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 0c5b4c1cc2c651471b131f21376dfd5ea24d2196.1491597120.git.jcody@redhat.com	2017-04-24 15:09:33 -04:00
Jeff Cody	fe5241bfe3	block: add bdrv_set_read_only() helper function We have a helper wrapper for checking for the BDS read_only flag, add a helper wrapper to set the read_only flag as well. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 9b18972d05f5fa2ac16c014f0af98d680553048d.1491597120.git.jcody@redhat.com	2017-04-24 15:09:33 -04:00
Fam Zheng	78bbd910bb	block: Make errp the last parameter of commit_active_start Signed-off-by: Fam Zheng <famz@redhat.com> Message-Id: <20170421122710.15373-9-famz@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>	2017-04-24 09:13:44 +02:00
Fam Zheng	9217283dc8	block: Make errp the last parameter of bdrv_img_create Signed-off-by: Fam Zheng <famz@redhat.com> Message-Id: <20170421122710.15373-6-famz@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>	2017-04-24 09:12:59 +02:00
Fam Zheng	91af091f92	block: Drain BH in bdrv_drained_begin During block job completion, nothing is preventing block_job_defer_to_main_loop_bh from being called in a nested aio_poll(), which is a trouble, such as in this code path: qmp_block_commit commit_active_start bdrv_reopen bdrv_reopen_multiple bdrv_reopen_prepare bdrv_flush aio_poll aio_bh_poll aio_bh_call block_job_defer_to_main_loop_bh stream_complete bdrv_reopen block_job_defer_to_main_loop_bh is the last step of the stream job, which should have been "paused" by the bdrv_drained_begin/end in bdrv_reopen_multiple, but it is not done because it's in the form of a main loop BH. Similar to why block jobs should be paused between drained_begin and drained_end, BHs they schedule must be excluded as well. To achieve this, this patch forces draining the BH in BDRV_POLL_WHILE. As a side effect this fixes a hang in block_job_detach_aio_context during system_reset when a block job is ready: #0 0x0000555555aa79f3 in bdrv_drain_recurse #1 0x0000555555aa825d in bdrv_drained_begin #2 0x0000555555aa8449 in bdrv_drain #3 0x0000555555a9c356 in blk_drain #4 0x0000555555aa3cfd in mirror_drain #5 0x0000555555a66e11 in block_job_detach_aio_context #6 0x0000555555a62f4d in bdrv_detach_aio_context #7 0x0000555555a63116 in bdrv_set_aio_context #8 0x0000555555a9d326 in blk_set_aio_context #9 0x00005555557e38da in virtio_blk_data_plane_stop #10 0x00005555559f9d5f in virtio_bus_stop_ioeventfd #11 0x00005555559fa49b in virtio_bus_stop_ioeventfd #12 0x00005555559f6a18 in virtio_pci_stop_ioeventfd #13 0x00005555559f6a18 in virtio_pci_reset #14 0x00005555559139a9 in qdev_reset_one #15 0x0000555555916738 in qbus_walk_children #16 0x0000555555913318 in qdev_walk_children #17 0x0000555555916738 in qbus_walk_children #18 0x00005555559168ca in qemu_devices_reset #19 0x000055555581fcbb in pc_machine_reset #20 0x00005555558a4d96 in qemu_system_reset #21 0x000055555577157a in main_loop_should_exit #22 0x000055555577157a in main_loop #23 0x000055555577157a in main The rationale is that the loop in block_job_detach_aio_context cannot make any progress in pausing/completing the job, because bs->in_flight is 0, so bdrv_drain doesn't process the block_job_defer_to_main_loop BH. With this patch, it does. Reported-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com> Message-Id: <20170418143044.12187-3-famz@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Tested-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>	2017-04-18 22:56:28 +08:00
Fam Zheng	052a75721f	block: Introduce bdrv_coroutine_enter Signed-off-by: Fam Zheng <famz@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com>	2017-04-11 20:07:15 +08:00
Fam Zheng	8865852e00	async: Introduce aio_co_enter They start the coroutine on the specified context. Signed-off-by: Fam Zheng <famz@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com>	2017-04-11 20:07:15 +08:00
Fam Zheng	14e9559f46	block: Make bdrv_parent_drained_begin/end public Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com>	2017-04-11 20:07:15 +08:00
Kevin Wolf	d35ff5e6b3	block: Ignore guest dev permissions during incoming migration Usually guest devices don't like other writers to the same image, so they use blk_set_perm() to prevent this from happening. In the migration phase before the VM is actually running, though, they don't have a problem with writes to the image. On the other hand, storage migration needs to be able to write to the image in this phase, so the restrictive blk_set_perm() call of qdev devices breaks it. This patch flags all BlockBackends with a qdev device as blk->disable_perm during incoming migration, which means that the requested permissions are stored in the BlockBackend, but not actually applied to its root node yet. Once migration has finished and the VM should be resumed, the permissions are applied. If they cannot be applied (e.g. because the NBD server used for block migration hasn't been shut down), resuming the VM fails. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Tested-by: Kashyap Chamarthy <kchamart@redhat.com>	2017-04-07 14:44:05 +02:00
Fam Zheng	c1cef67251	block: Always call bdrv_child_check_perm first bdrv_child_set_perm alone is not very usable because the caller must call bdrv_child_check_perm first. This is already encapsulated conveniently in bdrv_child_try_set_perm, so remove the other prototypes from the header and fix the one wrong caller, block/mirror.c. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-03-17 12:54:06 +01:00
Kevin Wolf	5fe31c25cc	block: Fix error handling in bdrv_replace_in_backing_chain() When adding an Error parameter, bdrv_replace_in_backing_chain() would become nothing more than a wrapper around change_parent_backing_link(). So make the latter public, renamed as bdrv_replace_node(), and remove bdrv_replace_in_backing_chain(). Most of the callers just remove a node from the graph that they just inserted, so they can use &error_abort, but completion of a mirror job with 'replaces' set can actually fail. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2017-03-07 14:53:28 +01:00
Kevin Wolf	46181129ea	block: Ignore multiple children in bdrv_check_update_perm() change_parent_backing_link() will need to update multiple BdrvChild objects at once. Checking permissions reference by reference doesn't work because permissions need to be consistent only with all parents moved to the new child. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2017-03-07 14:53:28 +01:00
Kevin Wolf	b2c2832c61	block: Add Error parameter to bdrv_append() Aborting on error in bdrv_append() isn't correct. This patch fixes it and lets the callers handle failures. Test case 085 needs a reference output update. This is caused by the reversed order of bdrv_set_backing_hd() and change_parent_backing_link() in bdrv_append(): When the backing file of the new node is set, the parent nodes are still pointing to the old top, so the backing blocker is now initialised with the node name rather than the BlockBackend name. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Acked-by: Fam Zheng <famz@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2017-02-28 20:47:51 +01:00
Kevin Wolf	12fa4af61f	block: Add Error parameter to bdrv_set_backing_hd() Not all callers of bdrv_set_backing_hd() know for sure that attaching the backing file will be allowed by the permission system. Return the error from the function rather than aborting. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Acked-by: Fam Zheng <famz@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2017-02-28 20:47:51 +01:00
Kevin Wolf	0db832f42e	commit: Add filter-node-name to block-commit Management tools need to be able to know about every node in the graph and need a way to address them. Changing the graph structure was okay because libvirt doesn't really manage the node level yet, but future libvirt versions need to deal with both new and old version of qemu. This new option to blockdev-commit allows the client to set a node-name for the automatically inserted filter driver, and at the same time serves as a witness for a future libvirt that this version of qemu does automatically insert a filter driver. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Acked-by: Fam Zheng <famz@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2017-02-28 20:47:50 +01:00
Kevin Wolf	6cdbceb12c	mirror: Add filter-node-name to blockdev-mirror Management tools need to be able to know about every node in the graph and need a way to address them. Changing the graph structure was okay because libvirt doesn't really manage the node level yet, but future libvirt versions need to deal with both new and old version of qemu. This new option to blockdev-mirror allows the client to set a node-name for the automatically inserted filter driver, and at the same time serves as a witness for a future libvirt that this version of qemu does automatically insert a filter driver. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Acked-by: Fam Zheng <famz@redhat.com>	2017-02-28 20:47:50 +01:00
Kevin Wolf	bbc02b90bc	blockjob: Factor out block_job_remove_all_bdrv() In some cases, we want to remove op blockers on intermediate nodes before the whole block job transaction has completed (because they block restoring the final graph state during completion). Provide a function for this. The whole block job lifecycle is a bit messed up and it's hard to actually do all things in the right order, but I'll leave simplifying this for another day. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Acked-by: Fam Zheng <famz@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2017-02-28 20:40:37 +01:00
Kevin Wolf	db95dbba3b	block: BdrvChildRole.attach/detach() callbacks Backing files are somewhat special compared to other kinds of children because they are attached and detached using bdrv_set_backing_hd() rather than the normal set of functions, which does a few more things like setting backing blockers, toggling the BDRV_O_NO_BACKING flag, setting parent_bs->backing_file, etc. These special features are a reason why change_parent_backing_link() can't handle backing files yet. With abstracting the additional features into .attach/.detach callbacks, we get a step closer to a function that can actually deal with this. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Acked-by: Fam Zheng <famz@redhat.com>	2017-02-28 20:40:37 +01:00
Kevin Wolf	76d554e20b	blockjob: Add permissions to block_job_add_bdrv() Block jobs don't actually do I/O through the the reference they create with block_job_add_bdrv(), but they might want to use the permisssion system to express what the block job does to intermediate nodes. This adds permissions to block_job_add_bdrv() to provide the means to request permissions. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Acked-by: Fam Zheng <famz@redhat.com>	2017-02-28 20:40:37 +01:00
Kevin Wolf	26de9438c1	block: Add BdrvChildRole.stay_at_node When the parents' child links are updated in bdrv_append() or bdrv_replace_in_backing_chain(), this should affect all child links of BlockBackends or other nodes, but not on child links held for other purposes (like for setting permissions). This patch allows to control the behaviour per BdrvChildRole. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Acked-by: Fam Zheng <famz@redhat.com>	2017-02-28 20:40:37 +01:00
Kevin Wolf	b541155587	block: Add BdrvChildRole.get_parent_desc() For meaningful error messages in the permission system, we need to get some human-readable description of the parent of a BdrvChild. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Acked-by: Fam Zheng <famz@redhat.com>	2017-02-28 20:40:37 +01:00
Kevin Wolf	c6cc12bfa7	blockjob: Add permissions to block_job_create() This functions creates a BlockBackend internally, so the block jobs need to tell it what they want to do with the BB. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Acked-by: Fam Zheng <famz@redhat.com>	2017-02-28 20:40:37 +01:00
Kevin Wolf	39829a01ae	block: Allow error return in BlockDevOps.change_media_cb() Some devices allow a media change between read-only and read-write media. They need to adapt the permissions in their .change_media_cb() implementation, which can fail. So add an Error parameter to the function. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Acked-by: Fam Zheng <famz@redhat.com>	2017-02-28 20:40:36 +01:00
Kevin Wolf	55880601d8	block: Add BDRV_O_RESIZE for blk_new_open() blk_new_open() is a convenience function that processes flags rather than QDict options as a simple way to just open an image file. In order to keep it convenient in the future, it must automatically request the necessary permissions. This can easily be inferred from the flags for read and write, but we need another flag that tells us whether to get the resize permission. We can't just always request it because that means that no block jobs can run on the resulting BlockBackend (which is something that e.g. qemu-img commit wants to do), but we also can't request it never because most of the .bdrv_create() implementations call blk_truncate(). The solution is to introduce another flag that is passed by all users that want to resize the image. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Acked-by: Fam Zheng <famz@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2017-02-28 20:40:36 +01:00
Kevin Wolf	91ef38257a	vvfat: Implement .bdrv_child_perm() vvfat is the last remaining driver that can have children, but doesn't implement .bdrv_child_perm() yet. The default handlers aren't suitable here, so let's implement a very simple driver-specific one that protects the internal child from being used by other users as good as our permissions permit. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Acked-by: Fam Zheng <famz@redhat.com>	2017-02-28 20:40:36 +01:00
Kevin Wolf	6b1a044afb	block: Default .bdrv_child_perm() for format drivers Almost all format drivers have the same characteristics as far as permissions are concerned: They have one or more children for storing their own data and, more importantly, metadata (can be written to and grow even without external write requests, must be protected against other writers and present consistent data) and optionally a backing file (this is just data, so like for a filter, it only depends on what the parent nodes need). This provides a default implementation that can be shared by most of our format drivers. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Acked-by: Fam Zheng <famz@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2017-02-28 20:40:36 +01:00
Kevin Wolf	6a1b9ee152	block: Default .bdrv_child_perm() for filter drivers Most filters need permissions related to read and write for their children, but only if the node has a parent that wants to use the same operation on the filter. The same is true for resize. This adds a default implementation that simply forwards all necessary permissions to all children of the node and leaves the other permissions unchanged. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Acked-by: Fam Zheng <famz@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2017-02-28 20:40:36 +01:00
Kevin Wolf	33a610c398	block: Involve block drivers in permission granting In many cases, the required permissions of one node on its children depend on what its parents require from it. For example, the raw format or most filter drivers only need to request consistent reads if that's something that one of their parents wants. In order to achieve this, this patch introduces two new BlockDriver callbacks. The first one lets drivers first check (recursively) whether the requested permissions can be set; the second one actually sets the new permission bitmask. Also add helper functions that drivers can use in their implementation of the callbacks to update their permissions on a specific child. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Acked-by: Fam Zheng <famz@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2017-02-28 20:40:36 +01:00
Kevin Wolf	d5e6f437c5	block: Let callers request permissions when attaching a child node When attaching a node as a child to a new parent, the required and shared permissions for this parent are checked against all other parents of the node now, and an error is returned if there is a conflict. This allows error returns to a function that previously always succeeded, and the same is true for quite a few callers and their callers. Converting all of them within the same patch would be too much, so for now everyone tells that they don't need any permissions and allow everyone else to do anything. This way we can use &error_abort initially and convert caller by caller to pass actual permission requirements and implement error handling. All these places are marked with FIXME comments and it will be the job of the next patches to clean them up again. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Acked-by: Fam Zheng <famz@redhat.com>	2017-02-28 20:40:36 +01:00
Kevin Wolf	8b2ff5291f	block: Add Error argument to bdrv_attach_child() It will have to return an error soon, so prepare the callers for it. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Acked-by: Fam Zheng <famz@redhat.com>	2017-02-28 20:40:35 +01:00
Kevin Wolf	7006c9a761	block: Add op blocker permission constants This patch defines the permission categories that will be used by the new op blocker system. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Acked-by: Fam Zheng <famz@redhat.com>	2017-02-28 20:40:35 +01:00
Kevin Wolf	680c7f9606	block: Add bdrv_new_open_driver() This function allows to create more or less normal BlockDriverStates even for BlockDrivers that aren't globally registered (e.g. helper filters for block jobs). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2017-02-24 16:09:23 +01:00
Kevin Wolf	52cdbc5869	block: Pass BdrvChild to bdrv_truncate() Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2017-02-24 16:09:23 +01:00
Paolo Bonzini	91bcea4899	block: document fields protected by AioContext lock Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Daniel P. Berrange <berrange@redhat.com> Message-id: 20170213135235.12274-19-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-02-21 11:39:40 +00:00
Paolo Bonzini	a153bf52b3	aio-posix: partially inline aio_dispatch into aio_poll This patch prepares for the removal of unnecessary lockcnt inc/dec pairs. Extract the dispatching loop for file descriptor handlers into a new function aio_dispatch_handlers, and then inline aio_dispatch into aio_poll. aio_dispatch can now become void. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Daniel P. Berrange <berrange@redhat.com> Message-id: 20170213135235.12274-17-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-02-21 11:39:39 +00:00
Paolo Bonzini	0c330a734b	aio: introduce aio_co_schedule and aio_co_wake aio_co_wake provides the infrastructure to start a coroutine on a "home" AioContext. It will be used by CoMutex and CoQueue, so that coroutines don't jump from one context to another when they go to sleep on a mutex or waitqueue. However, it can also be used as a more efficient alternative to one-shot bottom halves, and saves the effort of tracking which AioContext a coroutine is running on. aio_co_schedule is the part of aio_co_wake that starts a coroutine on a remove AioContext, but it is also useful to implement e.g. bdrv_set_aio_context callbacks. The implementation of aio_co_schedule is based on a lock-free multiple-producer, single-consumer queue. The multiple producers use cmpxchg to add to a LIFO stack. The consumer (a per-AioContext bottom half) grabs all items added so far, inverts the list to make it FIFO, and goes through it one item at a time until it's empty. The data structure was inspired by OSv, which uses it in the very code we'll "port" to QEMU for the thread-safe CoMutex. Most of the new code is really tests. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 20170213135235.12274-3-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-02-21 11:14:07 +00:00
Eric Farman	482652502e	block: Fix target variable of BLKSECTGET ioctl Commit `6f6071745b` ("raw-posix: Fetch max sectors for host block device") introduced a routine to call the kernel BLKSECTGET ioctl, which stores the result back to user space. However, the size of the data returned depends on the routine handling the ioctl. The (compat_)blkdev_ioctl returns a short, while sg_ioctl returns an int. Thus, on big-endian systems, we can find ourselves accidentally shifting the result to a much larger value. (On s390x, a short is 16 bits while an int is 32 bits.) Also, the two ioctl handlers return values in different scales (block returns sectors, while sg returns bytes), so some tweaking of the outputs is required such that hdev_get_max_transfer_length returns a value in a consistent set of units. Signed-off-by: Eric Farman <farman@linux.vnet.ibm.com> Message-Id: <20170120162527.66075-3-farman@linux.vnet.ibm.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-01-27 18:07:31 +01:00
Paolo Bonzini	7c690fd193	aio: document locking Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 20170112180800.21085-10-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-01-16 13:25:18 +00:00
Paolo Bonzini	d7c99a1282	aio: make ctx->list_lock a QemuLockCnt, subsuming ctx->walking_bh This will make it possible to walk the list of bottom halves without holding the AioContext lock---and in turn to call bottom half handlers without holding the lock. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 20170112180800.21085-4-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-01-16 13:25:17 +00:00
Paolo Bonzini	cf2c02c8ea	aio: rename bh_lock to list_lock This will be used for AioHandlers too. There is going to be little or no contention, so it is better to reuse the same lock. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 20170112180800.21085-2-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-01-16 13:25:17 +00:00
Paolo Bonzini	8f90b5e91d	block: get rid of bdrv_io_unplugged_begin/end bdrv_io_plug and bdrv_io_unplug are only called (via their BlockBackend equivalents) after starting asynchronous I/O. bdrv_drain is not going to be called while they are running, because---even if a coroutine runs for some reason---it will only drain in the next iteration of the event loop through bdrv_co_yield_to_drain. So this mechanism is unnecessary, get rid of it. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 20161129113334.605-1-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-01-16 13:25:17 +00:00
Eric Blake	c1bb86cd8a	block: Rename raw-{posix,win32} to file-*.c These files deal with the file protocol, not the raw format (the file protocol is often used with other formats, and the raw format is not forced to use the file protocol). Rename things to make it a bit easier to follow. Suggested-by: Daniel P. Berrange <berrange@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Laszlo Ersek <lersek@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-01-09 13:30:53 +01:00
Stefan Hajnoczi	82a4118694	aio: self-tune polling time This patch is based on the algorithm for the kvm.ko halt_poll_ns parameter in Linux. The initial polling time is zero. If the event loop is woken up within the maximum polling time it means polling could be effective, so grow polling time. If the event loop is woken up beyond the maximum polling time it means polling is not effective, so shrink polling time. If the event loop makes progress within the current polling time then the sweet spot has been reached. This algorithm adjusts the polling time so it can adapt to variations in workloads. The goal is to reach the sweet spot while also recognizing when polling would hurt more than help. Two new trace events, poll_grow and poll_shrink, are added for observing polling time adjustment. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 20161201192652.9509-13-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-01-03 16:38:50 +00:00
Stefan Hajnoczi	684e508c23	aio: add .io_poll_begin/end() callbacks The begin and end callbacks can be used to prepare for the polling loop and clean up when polling stops. Note that they may only be called once for multiple aio_poll() calls if polling continues to succeed. Once polling fails the end callback is invoked before aio_poll() resumes file descriptor monitoring. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 20161201192652.9509-11-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-01-03 16:38:50 +00:00
Stefan Hajnoczi	4a1cba3802	aio: add polling mode to AioContext The AioContext event loop uses ppoll(2) or epoll_wait(2) to monitor file descriptors or until a timer expires. In cases like virtqueues, Linux AIO, and ThreadPool it is technically possible to wait for events via polling (i.e. continuously checking for events without blocking). Polling can be faster than blocking syscalls because file descriptors, the process scheduler, and system calls are bypassed. The main disadvantage to polling is that it increases CPU utilization. In classic polling configuration a full host CPU thread might run at 100% to respond to events as quickly as possible. This patch implements a timeout so we fall back to blocking syscalls if polling detects no activity. After the timeout no CPU cycles are wasted on polling until the next event loop iteration. The run_poll_handlers_begin() and run_poll_handlers_end() trace events are added to aid performance analysis and troubleshooting. If you need to know whether polling mode is being used, trace these events to find out. Note that the AioContext is now re-acquired before disabling notify_me in the non-polling case. This makes the code cleaner since notify_me was enabled outside the non-polling AioContext release region. This change is correct since it's safe to keep notify_me enabled longer (disabling is an optimization) but potentially causes unnecessary event_notifer_set() calls. I think the chance of performance regression is small here. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 20161201192652.9509-4-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-01-03 16:38:48 +00:00
Stefan Hajnoczi	f6a51c84cd	aio: add AioPollFn and io_poll() interface The new AioPollFn io_poll() argument to aio_set_fd_handler() and aio_set_event_handler() is used in the next patch. Keep this code change separate due to the number of files it touches. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 20161201192652.9509-3-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-01-03 16:38:48 +00:00
Stefan Hajnoczi	721671ade7	aio: add flag to skip fds to aio_dispatch() Polling mode will not call ppoll(2)/epoll_wait(2). Therefore we know there are no fds ready and should avoid looping over fd handlers in aio_dispatch(). Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 20161201192652.9509-2-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-01-03 16:38:47 +00:00
Yaowei Bai	722f8d9099	block: drop remaining legacy aio functions in comment Commit `87f68d3182` (block: drop aio functions that operate on the main AioContext) drops qemu_aio_wait function references mostly while leaves these behind, clean up them. Signed-off-by: Yaowei Bai <baiyaowei@cmss.chinamobile.com> Message-Id: <1480566640-27264-3-git-send-email-baiyaowei@cmss.chinamobile.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-12-22 16:00:25 +01:00
John Snow	111049a4ec	blockjob: refactor backup_start as backup_job_create Refactor backup_start as backup_job_create, which only creates the job, but does not automatically start it. The old interface, 'backup_start', is not kept in favor of limiting the number of nearly-identical interfaces that would have to be edited to keep up with QAPI changes in the future. Callers that wish to synchronously start the backup_block_job can instead just call block_job_start immediately after calling backup_job_create. Transactions are updated to use the new interface, calling block_job_start only during the .commit phase, which helps prevent race conditions where jobs may finish before we even finish building the transaction. This may happen, for instance, during empty block backup jobs. Reported-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: John Snow <jsnow@redhat.com> Message-id: 1478587839-9834-6-git-send-email-jsnow@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2016-11-14 22:47:34 -05:00
John Snow	5ccac6f186	blockjob: add block_job_start Instead of automatically starting jobs at creation time via backup_start et al, we'd like to return a job object pointer that can be started manually at later point in time. For now, add the block_job_start mechanism and start the jobs automatically as we have been doing, with conversions job-by-job coming in later patches. Of note: cancellation of unstarted jobs will perform all the normal cleanup as if the job had started, particularly abort and clean. The only difference is that we will not emit any events, because the job never actually started. Signed-off-by: John Snow <jsnow@redhat.com> Message-id: 1478587839-9834-5-git-send-email-jsnow@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2016-11-14 22:47:34 -05:00
John Snow	a7815a764c	blockjob: add .start field Add an explicit start field to specify the entrypoint. We already have ownership of the coroutine itself AND managing the lifetime of the coroutine, let's take control of creation of the coroutine, too. This will allow us to delay creation of the actual coroutine until we know we'll actually start a BlockJob in block_job_start. This avoids the sticky question of how to "un-create" a Coroutine that hasn't been started yet. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-id: 1478587839-9834-4-git-send-email-jsnow@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2016-11-14 22:47:34 -05:00

... 2 3 4 5 6 ...

906 Commits