mirrors/qemu - qemu - SynapseOS git

Author	SHA1	Message	Date
Thomas Huth	4c386f8064	Do not include sysemu/sysemu.h if it's not really necessary Stop including sysemu/sysemu.h in files that don't need it. Signed-off-by: Thomas Huth <thuth@redhat.com> Message-Id: <20210416171314.2074665-2-thuth@redhat.com> Signed-off-by: Laurent Vivier <laurent@vivier.eu>	2021-05-02 17:24:50 +02:00
Kevin Wolf	35b7f4abd5	block: Add BDRV_O_NO_SHARE for blk_new_open() Normally, blk_new_open() just shares all permissions. This was fine originally when permissions only protected against uses in the same process because no other part of the code would actually get to access the block nodes opened with blk_new_open(). However, since we use it for file locking now, unsharing permissions becomes desirable. Add a new BDRV_O_NO_SHARE flag that is used in blk_new_open() to unshare any permissions that can be unshared. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20210422164344.283389-2-kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-04-30 12:27:48 +02:00
Vladimir Sementsov-Ogievskiy	228ca37e12	block: drop ctx argument from bdrv_root_attach_child Passing parent aio context is redundant, as child_class and parent opaque pointer are enough to retrieve it. Drop the argument and use new bdrv_child_get_parent_aio_context() interface. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20210428151804.439460-7-vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-04-30 12:27:47 +02:00
Vladimir Sementsov-Ogievskiy	3ca1f32257	block: BdrvChildClass: add .get_parent_aio_context handler Add new handler to get aio context and implement it in all child classes. Add corresponding public interface to be used soon. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20210428151804.439460-6-vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-04-30 12:27:47 +02:00
Philippe Mathieu-Daudé	538f049704	sysemu: Let VMChangeStateHandler take boolean 'running' argument The 'running' argument from VMChangeStateHandler does not require other value than 0 / 1. Make it a plain boolean. Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Acked-by: David Gibson <david@gibson.dropbear.id.au> Message-Id: <20210111152020.1422021-3-philmd@redhat.com> Signed-off-by: Laurent Vivier <laurent@vivier.eu>	2021-03-09 23:13:57 +01:00
Kevin Wolf	86b1cf3227	block: Separate blk_is_writable() and blk_supports_write_perm() Currently, blk_is_read_only() tells whether a given BlockBackend can only be used in read-only mode because its root node is read-only. Some callers actually try to answer a slightly different question: Is the BlockBackend configured to be writable, by taking write permissions on the root node? This can differ, for example, for CD-ROM devices which don't take write permissions, but may be backed by a writable image file. scsi-cd allows write requests to the drive if blk_is_read_only() returns false. However, the write request will immediately run into an assertion failure because the write permission is missing. This patch introduces separate functions for both questions. blk_supports_write_perm() answers the question whether the block node/image file can support writable devices, whereas blk_is_writable() tells whether the BlockBackend is currently configured to be writable. All calls of blk_is_read_only() are converted to one of the two new functions. Fixes: https://bugs.launchpad.net/bugs/1906693 Cc: qemu-stable@nongnu.org Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20210118123448.307825-2-kwolf@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-01-27 20:45:20 +01:00
Stefan Hajnoczi	d73415a315	qemu/atomic.h: rename atomic_ to qatomic_ clang's C11 atomic_fetch_() functions only take a C11 atomic type pointer argument. QEMU uses direct types (int, etc) and this causes a compiler error when a QEMU code calls these functions in a source file that also included <stdatomic.h> via a system header file: $ CC=clang CXX=clang++ ./configure ... && make ../util/async.c:79:17: error: address argument to atomic operation must be a pointer to _Atomic type ('unsigned int ' invalid) Avoid using atomic_*() names in QEMU's atomic.h since that namespace is used by <stdatomic.h>. Prefix QEMU's APIs with 'q' so that atomic.h and <stdatomic.h> can co-exist. I checked /usr/include on my machine and searched GitHub for existing "qatomic_" users but there seem to be none. This patch was generated using: $ git grep -h -o '\<atomic$64$\?_[a-z0-9_]\+' include/qemu/atomic.h \| \ sort -u >/tmp/changed_identifiers $ for identifier in $(</tmp/changed_identifiers); do sed -i "s%\<$identifier\>%q$identifier%g" \ $(git grep -I -l "\<$identifier\>") done I manually fixed line-wrap issues and misaligned rST tables. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20200923105646.47864-1-stefanha@redhat.com>	2020-09-23 16:07:44 +01:00
Max Reitz	9a71b9de3f	commit: Deal with filters This includes some permission limiting (for example, we only need to take the RESIZE permission if the base is smaller than the top). Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-09-07 12:31:31 +02:00
Stefan Hajnoczi	1d719ddc35	block: fix bdrv_aio_cancel() for ENOMEDIUM requests bdrv_aio_cancel() calls aio_poll() on the AioContext for the given I/O request until it has completed. ENOMEDIUM requests are special because there is no BlockDriverState when the drive has no medium! Define a .get_aio_context() function for BlkAioEmAIOCB requests so that bdrv_aio_cancel() can find the AioContext where the completion BH is pending. Without this function bdrv_aio_cancel() aborts on ENOMEDIUM requests! libFuzzer triggered the following assertion: cat << EOF \| qemu-system-i386 -M pc-q35-5.0 \ -nographic -monitor none -serial none \ -qtest stdio -trace ide\* outl 0xcf8 0x8000fa24 outl 0xcfc 0xe106c000 outl 0xcf8 0x8000fa04 outw 0xcfc 0x7 outl 0xcf8 0x8000fb20 write 0x0 0x3 0x2780e7 write 0xe106c22c 0xd 0x1130c218021130c218021130c2 write 0xe106c218 0x15 0x110010110010110010110010110010110010110010 EOF ide_exec_cmd IDE exec cmd: bus 0x56170a77a2b8; state 0x56170a77a340; cmd 0xe7 ide_reset IDEstate 0x56170a77a340 Aborted (core dumped) (gdb) bt #1 0x00007ffff4f93895 in abort () at /lib64/libc.so.6 #2 0x0000555555dc6c00 in bdrv_aio_cancel (acb=0x555556765550) at block/io.c:2745 #3 0x0000555555dac202 in blk_aio_cancel (acb=0x555556765550) at block/block-backend.c:1546 #4 0x0000555555b1bd74 in ide_reset (s=0x555557213340) at hw/ide/core.c:1318 #5 0x0000555555b1e3a1 in ide_bus_reset (bus=0x5555572132b8) at hw/ide/core.c:2422 #6 0x0000555555b2aa27 in ahci_reset_port (s=0x55555720eb50, port=2) at hw/ide/ahci.c:650 #7 0x0000555555b29fd7 in ahci_port_write (s=0x55555720eb50, port=2, offset=44, val=16) at hw/ide/ahci.c:360 #8 0x0000555555b2a564 in ahci_mem_write (opaque=0x55555720eb50, addr=556, val=16, size=1) at hw/ide/ahci.c:513 #9 0x000055555598415b in memory_region_write_accessor (mr=0x55555720eb80, addr=556, value=0x7fffffffb838, size=1, shift=0, mask=255, attrs=...) at softmmu/memory.c:483 Looking at bdrv_aio_cancel: 2728 /* async I/Os / 2729 2730 void bdrv_aio_cancel(BlockAIOCB acb) 2731 { 2732 qemu_aio_ref(acb); 2733 bdrv_aio_cancel_async(acb); 2734 while (acb->refcnt > 1) { 2735 if (acb->aiocb_info->get_aio_context) { 2736 aio_poll(acb->aiocb_info->get_aio_context(acb), true); 2737 } else if (acb->bs) { 2738 /* qemu_aio_ref and qemu_aio_unref are not thread-safe, so 2739 * assert that we're not using an I/O thread. Thread-safe 2740 * code should use bdrv_aio_cancel_async exclusively. 2741 */ 2742 assert(bdrv_get_aio_context(acb->bs) == qemu_get_aio_context()); 2743 aio_poll(bdrv_get_aio_context(acb->bs), true); 2744 } else { 2745 abort(); <=============== 2746 } 2747 } 2748 qemu_aio_unref(acb); 2749 } Fixes: `02c50efe08` ("block: Add bdrv_aio_cancel_async") Reported-by: Alexander Bulekov <alxndr@bu.edu> Buglink: https://bugs.launchpad.net/qemu/+bug/1878255 Originally-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20200720100141.129739-1-stefanha@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-07-21 12:00:38 +02:00
Greg Kurz	e6cada9231	block: Avoid stale pointer dereference in blk_get_aio_context() It is possible for blk_remove_bs() to race with blk_drain_all(), causing the latter to dereference a stale blk->root pointer: blk_remove_bs(blk) bdrv_root_unref_child(blk->root) child_bs = blk->root->bs bdrv_detach_child(blk->root) ... g_free(blk->root) <============== blk->root becomes stale bdrv_unref(child_bs) <============ yield at some point A blk_drain_all() can be triggered by some guest action in the meantime, eg. on POWER, SLOF might disable bus mastering on a virtio-scsi-pci device: virtio_write_config() virtio_pci_stop_ioeventfd() virtio_bus_stop_ioeventfd() virtio_scsi_dataplane_stop() blk_drain_all() blk_get_aio_context() bs = blk->root ? blk->root->bs : NULL ^^^^^^^^^ stale Then, depending on one's luck, QEMU either crashes with SEGV or hits the assertion in blk_get_aio_context(). blk->root is set by blk_insert_bs() which calls bdrv_root_attach_child() first. The blk_remove_bs() function should rollback the changes made by blk_insert_bs() in the opposite order (or it should be documented somewhere why this isn't the case). Clear blk->root before calling bdrv_root_unref_child() in blk_remove_bs(). Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <159430264541.389456.11925072456012783045.stgit@bahia.lan> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-07-14 15:24:15 +02:00
Max Reitz	1f38f04eac	block: Pass BdrvChildRole in remaining cases These calls have no real use for the child role yet, but it will not harm to give one. Notably, the bdrv_root_attach_child() call in blockjob.c is left unmodified because there is not much the generic BlockJob object wants from its children. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200513110544.176672-34-mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-05-18 19:05:25 +02:00
Max Reitz	3cdc69d31b	block: Pass parent_is_format to .inherit_options() We plan to unify the generic .inherit_options() functions. The resulting common function will need to decide whether to force-enable format probing, force-disable it, or leave it as-is. To make this decision, it will need to know whether the parent node is a format node or not (because we never want format probing if the parent is a format node already (except for the backing chain)). Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200513110544.176672-9-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-05-18 19:05:25 +02:00
Max Reitz	272c02eaef	block: Pass BdrvChildRole to .inherit_options() For now, all callers (effectively) pass 0 and no callee evaluates thie value. Later patches will change both. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200513110544.176672-8-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-05-18 19:05:25 +02:00
Max Reitz	258b776515	block: Add BdrvChildRole to BdrvChild For now, it is always set to 0. Later patches in this series will ensure that all callers pass an appropriate combination of flags. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200513110544.176672-6-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-05-18 19:05:25 +02:00
Max Reitz	bd86fb990c	block: Rename BdrvChildRole to BdrvChildClass This structure nearly only contains parent callbacks for child state changes. It cannot really reflect a child's role, because different roles may overlap (as we will see when real roles are introduced), and because parents can have custom callbacks even when the child fulfills a standard role. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-Id: <20200513110544.176672-4-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-05-18 19:05:25 +02:00
Max Reitz	2b7bbdbdef	block: Add blk_make_empty() Two callers of BlockDriver.bdrv_make_empty() remain that should not call this method directly. Both do not have access to a BdrvChild, but they can use a BlockBackend, so we add this function that lets them use it. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200429141126.85159-4-mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-05-18 19:05:25 +02:00
Eric Blake	a3aeeab557	block: Add blk_new_with_bs() helper There are several callers that need to create a new block backend from an existing BDS; make the task slightly easier with a common helper routine. Suggested-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20200424190903.522087-2-eblake@redhat.com> [mreitz: Set @ret only in error paths, see https://lists.nongnu.org/archive/html/qemu-block/2020-04/msg01216.html] Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200428192648.749066-2-eblake@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-05-05 13:17:36 +02:00
Kevin Wolf	8c6242b6f3	block-backend: Add flags to blk_truncate() Now that node level interface bdrv_truncate() supports passing request flags to the block driver, expose this on the BlockBackend level, too. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200424125448.63318-4-kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-04-30 17:51:07 +02:00
Kevin Wolf	7b8e485742	block: Add flags to bdrv(_co)_truncate() Now that block drivers can support flags for .bdrv_co_truncate, expose the parameter in the node level interfaces bdrv_co_truncate() and bdrv_truncate(). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200424125448.63318-3-kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-04-30 17:51:07 +02:00
Kevin Wolf	7f16476fab	block: Fix blk->in_flight during blk_wait_while_drained() Waiting in blk_wait_while_drained() while blk->in_flight is increased for the current request is wrong because it will cause the drain operation to deadlock. This patch makes sure that blk_wait_while_drained() is called with blk->in_flight increased exactly once for the current request, and that it temporarily decreases the counter while it waits. Fixes: `cf3129323f` Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200407121259.21350-4-kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-04-07 15:40:57 +02:00
Kevin Wolf	fbb92b6798	block: Increase BB.in_flight for coroutine and sync interfaces External callers of blk_co_() and of the synchronous blk_() functions don't currently increase the BlockBackend.in_flight counter, but calls from blk_aio_() do, so there is an inconsistency whether the counter has been increased or not. This patch moves the actual operations to static functions that can later know they will always be called with in_flight increased exactly once, even for external callers using the blk_co_() coroutine interfaces. If the public blk_co_*() interface is unused, remove it. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20200407121259.21350-3-kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-04-07 15:40:41 +02:00
Kevin Wolf	564806c529	block-backend: Reorder flush/pdiscard function definitions Move all variants of the flush/pdiscard functions to a single place and put the blk_co_*() version first because it is called by all other variants (and will become static in the next patch). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200407121259.21350-2-kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-04-07 15:40:28 +02:00
Max Reitz	c80d8b06cf	block: Add @exact parameter to bdrv_co_truncate() We have two drivers (iscsi and file-posix) that (in some cases) return success from their .bdrv_co_truncate() implementation if the block device is larger than the requested offset, but cannot be shrunk. Some callers do not want that behavior, so this patch adds a new parameter that they can use to turn off that behavior. This patch just adds the parameter and lets the block/io.c and block/block-backend.c functions pass it around. All other callers always pass false and none of the implementations evaluate it, so that this patch does not change existing behavior. Future patches take care of that. Suggested-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20190918095144.955-5-mreitz@redhat.com Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-28 12:00:07 +01:00
Vladimir Sementsov-Ogievskiy	b30168647f	block/block-backend: add blk_co_pwritev_part Add blk write function with qiov_offset parameter. It's needed for the following commit. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20191011090711.19940-4-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-28 11:22:30 +01:00
Pavel Dovgalyuk	e4ec5ad464	replay: add BH oneshot event for block layer Replay is capable of recording normal BH events, but sometimes there are single use callbacks scheduled with aio_bh_schedule_oneshot function. This patch enables recording and replaying such callbacks. Block layer uses these events for calling the completion function. Replaying these calls makes the execution deterministic. Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgaluk@ispras.ru> Acked-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-10-14 17:12:48 +02:00
Peter Maydell	e018ccb3fb	Block layer patches: - file-posix: Fix O_DIRECT alignment detection - Fixes for concurrent block jobs - block-backend: Queue requests while drained (fix IDE vs. job crashes) - qemu-img convert: Deprecate using -n and -o together - iotests: Migration tests with filter nodes - iotests: More media change tests -----BEGIN PGP SIGNATURE----- iQIcBAABAgAGBQJdVnduAAoJEH8JsnLIjy/W0IgQAKft/M3aDgt0sbTzQh8vdy6A yAfTnnSL4Z56+8qAsqhEnplC3rZxvTkg9AGOoNYHOZKl3FgRH9r8g9/Enemh4fWu MH52hiRf2ytlFVurIQal3aj9O+i0YTnzuvYbysvkH4ID5zbv2QnwdagtEcBxbbYL NZTMZBynDzp4rKIZ7p6T/kkaklLHh4vZrjW+Mzm3LQx9JJr8TwVNqqetSfc4VKIJ ByaNbbihDUVjQyIaJ24DXXJdzonGrrtSbSZycturc5FzXymzSRgrXZCeSKCs8X+i fjwMXH5v4/UfK511ILsXiumeuxBfD2Ck4sAblFxVo06oMPRNmsAKdRLeDByE7IC1 lWep/pB3y/au9CW2/pkWJOiaz5s5iuv2fFYidKUJ0KQ1dD7G8M9rzkQlV3FUmTZO jBKSxHEffXsYl0ojn0vGmZEd7FAPi3fsZibGGws1dVgxlWI93aUJsjCq0E+lHIRD hEmQcjqZZa4taKpj0Y3Me05GkL7tH6RYA153jDNb8rPdzriGRCLZSObEISrOJf8H Mh0gTLi8KJNh6bULd12Ake1tKn7ZeTXpHH+gadz9OU7eIModh1qYTSHPlhy5oAv0 Hm9BikNlS1Hzw+a+EbLcOW7TrsteNeGr7r8T6QKPMq1sfsYcp3svbC2c+zVlQ6Ll mLoTssksXOkgBevVqSiS =T7L5 -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging Block layer patches: - file-posix: Fix O_DIRECT alignment detection - Fixes for concurrent block jobs - block-backend: Queue requests while drained (fix IDE vs. job crashes) - qemu-img convert: Deprecate using -n and -o together - iotests: Migration tests with filter nodes - iotests: More media change tests # gpg: Signature made Fri 16 Aug 2019 10:29:18 BST # gpg: using RSA key 7F09B272C88F2FD6 # gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full] # Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6 * remotes/kevin/tags/for-upstream: file-posix: Handle undetectable alignment qemu-img convert: Deprecate using -n and -o together block-backend: Queue requests while drained mirror: Keep mirror_top_bs drained after dropping permissions block: Remove blk_pread_unthrottled() iotests: Add test for concurrent stream/commit tests: Test mid-drain bdrv_replace_child_noperm() tests: Test polling in bdrv_drop_intermediate() block: Reduce (un)drains when replacing a child block: Keep subtree drained in drop_intermediate block: Simplify bdrv_filter_default_perms() iotests: Test migration with all kinds of filter nodes iotests: Move migration helpers to iotests.py iotests/118: Add -blockdev based tests iotests/118: Create test classes dynamically iotests/118: Test media change for scsi-cd Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2019-08-16 16:43:46 +01:00
Markus Armbruster	54d31236b9	sysemu: Split sysemu/runstate.h off sysemu/sysemu.h sysemu/sysemu.h is a rather unfocused dumping ground for stuff related to the system-emulator. Evidence: * It's included widely: in my "build everything" tree, changing sysemu/sysemu.h still triggers a recompile of some 1100 out of 6600 objects (not counting tests and objects that don't depend on qemu/osdep.h, down from 5400 due to the previous two commits). * It pulls in more than a dozen additional headers. Split stuff related to run state management into its own header sysemu/runstate.h. Touching sysemu/sysemu.h now recompiles some 850 objects. qemu/uuid.h also drops from 1100 to 850, and qapi/qapi-types-run-state.h from 4400 to 4200. Touching new sysemu/runstate.h recompiles some 500 objects. Since I'm touching MAINTAINERS to add sysemu/runstate.h anyway, also add qemu/main-loop.h. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20190812052359.30071-30-armbru@redhat.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> [Unbreak OS-X build]	2019-08-16 13:37:36 +02:00
Markus Armbruster	db72581598	Include qemu/main-loop.h less In my "build everything" tree, changing qemu/main-loop.h triggers a recompile of some 5600 out of 6600 objects (not counting tests and objects that don't depend on qemu/osdep.h). It includes block/aio.h, which in turn includes qemu/event_notifier.h, qemu/notify.h, qemu/processor.h, qemu/qsp.h, qemu/queue.h, qemu/thread-posix.h, qemu/thread.h, qemu/timer.h, and a few more. Include qemu/main-loop.h only where it's needed. Touching it now recompiles only some 1700 objects. For block/aio.h and qemu/event_notifier.h, these numbers drop from 5600 to 2800. For the others, they shrink only slightly. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20190812052359.30071-21-armbru@redhat.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>	2019-08-16 13:31:52 +02:00
Markus Armbruster	13d4ff07e8	trace: Do not include qom/cpu.h into generated trace.h docs/devel/tracing.txt explains "since many source files include trace.h, [the generated trace.h use] a minimum of types and other header files included to keep the namespace clean and compile times and dependencies down." Commit `4815185902` "trace: Add per-vCPU tracing states for events with the 'vcpu' property" made them all include qom/cpu.h via control-internal.h. qom/cpu.h in turn includes about thirty headers. Ouch. Per-vCPU tracing is currently not supported in sub-directories' trace-events. In other words, qom/cpu.h can only be used in trace-root.h, not in any trace.h. Split trace/control-vcpu.h off trace/control.h and trace/control-internal.h. Have the generated trace.h include trace/control.h (which no longer includes qom/cpu.h), and trace-root.h include trace/control-vcpu.h (which includes it). The resulting improvement is a bit disappointing: in my "build everything" tree, some 1100 out of 6600 objects (not counting tests and objects that don't depend on qemu/osdep.h) depend on a trace.h, and about 600 of them no longer depend on qom/cpu.h. But more than 1300 others depend on trace-root.h. More work is clearly needed. Left for another day. Cc: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20190812052359.30071-8-armbru@redhat.com>	2019-08-16 13:31:52 +02:00
Kevin Wolf	cf3129323f	block-backend: Queue requests while drained This fixes devices like IDE that can still start new requests from I/O handlers in the CPU thread while the block backend is drained. The basic assumption is that in a drain section, no new requests should be allowed through a BlockBackend (blk_drained_begin/end don't exist, we get drain sections only on the node level). However, there are two special cases where requests should not be queued: 1. Block jobs: We already make sure that block jobs are paused in a drain section, so they won't start new requests. However, if the drain_begin is called on the job's BlockBackend first, it can happen that we deadlock because the job stays busy until it reaches a pause point - which it can't if its requests aren't processed any more. The proper solution here would be to make all requests through the job's filter node instead of using a BlockBackend. For now, just disabling request queuing on the job BlockBackend is simpler. 2. In test cases where making requests through bdrv_* would be cumbersome because we'd need a BdrvChild. As we already got the functionality to disable request queuing from 1., use it in tests, too, for convenience. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2019-08-16 10:25:16 +02:00
Kevin Wolf	421919d76b	block: Remove blk_pread_unthrottled() The functionality offered by blk_pread_unthrottled() goes back to commit `498e386c58`. Then, we couldn't perform I/O throttling with synchronous requests because timers wouldn't be executed in polling loops. So the commit automatically disabled I/O throttling as soon as a synchronous request was issued. However, for geometry detection during disk initialisation, we always used (and still use) synchronous requests even if guest requests use AIO later. Geometry detection was not wanted to disable I/O throttling, so bdrv_pread_unthrottled() was introduced which disabled throttling only temporarily. All of this isn't necessary any more because we do run timers in polling loop and even synchronous requests are now using coroutine infrastructure internally. For this reason, commit `90c78624f` already removed the automatic disabling of I/O throttling. It's time to get rid of the workaround for the removed code, and its abuse of blk_root_drained_begin()/end(), as well. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2019-08-16 10:25:16 +02:00
Max Reitz	e037c09c78	block: Do not poll in bdrv_do_drained_end() We should never poll anywhere in bdrv_do_drained_end() (including its recursive callees like bdrv_drain_invoke()), because it does not cope well with graph changes. In fact, it has been written based on the postulation that no graph changes will happen in it. Instead, the callers that want to poll must poll, i.e. all currently globally available wrappers: bdrv_drained_end(), bdrv_subtree_drained_end(), bdrv_unapply_subtree_drain(), and bdrv_drain_all_end(). Graph changes there do not matter. They can poll simply by passing a pointer to a drained_end_counter and wait until it reaches 0. This patch also adds a non-polling global wrapper for bdrv_do_drained_end() that takes a drained_end_counter pointer. We need such a variant because now no function called anywhere from bdrv_do_drained_end() must poll. This includes BdrvChildRole.drained_end(), which already must not poll according to its interface documentation, but bdrv_child_cb_drained_end() just violates that by invoking bdrv_drained_end() (which does poll). Therefore, BdrvChildRole.drained_end() must take a drained_end_counter parameter, which bdrv_child_cb_drained_end() can pass on to the new bdrv_drained_end_no_poll() function. Note that we now have a pattern of all drained_end-related functions either polling or receiving a drained_end_counter to let the caller poll based on that. A problem with a single poll loop is that when the drained section in bdrv_set_aio_context_ignore() ends, some nodes in the subgraph may be in the old contexts, while others are in the new context already. To let the collective poll in bdrv_drained_end() work correctly, we must not hold a lock to the old context, so that the old context can make progress in case it is different from the current context. (In the process, remove the comment saying that the current context is always the old context, because it is wrong.) In all other places, all nodes in a subtree must be in the same context, so we can just poll that. The exception of course is bdrv_drain_all_end(), but that always runs in the main context, so we can just poll NULL (like bdrv_drain_all_begin() does). Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-07-19 13:19:16 +02:00
Vladimir Sementsov-Ogievskiy	68d00e4293	block/block-backend: blk_iostatus_reset: drop usage of bs->job We are going to remove bs->job pointer. Drop it's usage in blk_iostatus_reset. blk_iostatus_reset() has only two callers: 1. blk_attach_dev(). This doesn't have anything to do with jobs and attaching a new guest device won't solve any problem the job encountered, so no reason to reset the iostatus for the job. 2. qmp_cont(). This resets the iostatus for everything. We can just call block_job_iostatus_reset() for all block jobs instead of going through BlockBackend. Suggested-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-06-18 16:41:10 +02:00
Kevin Wolf	132ada80c4	block: Adjust AioContexts when attaching nodes So far, we only made sure that updating the AioContext of a node affected the whole subtree. However, if a node is newly attached to a new parent, we also need to make sure that both the subtree of the node and the parent are in the same AioContext. This tries to move the new child node to the parent AioContext and returns an error if this isn't possible. BlockBackends now actually apply their AioContext to their root node. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-06-04 15:22:22 +02:00
Kevin Wolf	d861ab3acf	block: Add BlockBackend.ctx This adds a new parameter to blk_new() which requires its callers to declare from which AioContext this BlockBackend is going to be used (or the locks of which AioContext need to be taken anyway). The given context is only stored and kept up to date when changing AioContexts. Actually applying the stored AioContext to the root node is saved for another commit. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-06-04 15:22:22 +02:00
Kevin Wolf	97896a4887	block: Add Error to blk_set_aio_context() Add an Error parameter to blk_set_aio_context() and use bdrv_child_try_set_aio_context() internally to check whether all involved nodes can actually support the AioContext switch. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-06-04 15:22:22 +02:00
Alberto Garcia	b441dc71c0	block: Make bdrv_root_attach_child() unref child_bs on failure A consequence of the previous patch is that bdrv_attach_child() transfers the reference to child_bs from the caller to parent_bs, which will drop it on bdrv_close() or when someone calls bdrv_unref_child(). But this only happens when bdrv_attach_child() succeeds. If it fails then the caller is responsible for dropping the reference to child_bs. This patch makes bdrv_attach_child() take the reference also when there is an error, freeing the caller for having to do it. A similar situation happens with bdrv_root_attach_child(), so the changes on this patch affect both functions. Signed-off-by: Alberto Garcia <berto@igalia.com> Message-id: 20dfb3d9ccec559cdd1a9690146abad5d204a186.1557754872.git.berto@igalia.com [mreitz: Removed now superfluous BdrvChild * variable in bdrv_open_child()] Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-05-28 20:30:55 +02:00
Kevin Wolf	980b0f943a	block: Add blk_set_allow_aio_context_change() Some users (like block jobs) can tolerate an AioContext change for their BlockBackend. Add a function that tells the BlockBackend that it can allow changes. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-05-20 17:08:56 +02:00
Kevin Wolf	38475269d4	block: Implement .(can_)set_aio_ctx for BlockBackend bdrv_try_set_aio_context() currently fails if a BlockBackend is attached to a node because it doesn't implement the BdrvChildRole callbacks for AioContext management. We can allow changing the AioContext of monitor-owned BlockBackends as long as no device is attached to them. When setting the AioContext of the root node of a BlockBackend, we now need to pass blk->root as an ignored child because we don't want the root node to recursively call back into BlockBackend and execute blk_do_set_aio_context() a second time. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-05-20 17:08:56 +02:00
Eric Blake	4841211e0d	block: Add bdrv_get_request_alignment() The next patch needs access to a device's minimum permitted alignment, since NBD wants to advertise this to clients. Add an accessor function, borrowing from blk_get_max_transfer() for accessing a backend's block limits. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20190329042750.14704-6-eblake@redhat.com>	2019-04-01 08:46:52 -05:00
Peter Maydell	adf2e451f3	Block layer patches: - Block graph change fixes (avoid loops, cope with non-tree graphs) - bdrv_set_aio_context() related fixes - HMP snapshot commands: Use only tag, not the ID to identify snapshots - qmeu-img, commit: Error path fixes - block/nvme: Build fix for gcc 9 - MAINTAINERS updates - Fix various issues with bdrv_refresh_filename() - Fix various iotests - Include LUKS overhead in qemu-img measure for qcow2 - A fix for vmdk's image creation interface -----BEGIN PGP SIGNATURE----- iQIcBAABAgAGBQJcc/knAAoJEH8JsnLIjy/WptQP/3F8Lh52H4egXaP7NUUuDjQM AhqhuDAp/EZBS+xim9kLTogNJADe/rMWdSX/YB5aLpSPYbjasC66NgaLhd6QewgQ VIcsLUdlYAyZ5ZjJytimfMTLwm1X02RmVIe55y52DTY8LlfViZzOlf3qwqPm00ao EJB2cl8UJLM+PVEu59cCw3R0/06LY+WIJRB32d3tnCBRTkaJwfR9h4lrp/juVcFZ U+2eWU68KMbUHSYiWANowN+KRV3uPY4HVA98v3F0vDmcBxlVHOeBg6S+PcT7tK8p huzCMwcdwUyPMJgVs/+WBtUnbG0jN6SHUYmFLz859UMVgBnCw5tzBMf8qw1wOA4A Iw+zor27Pxj4IlxcLPp5f97YZ8k9acdMR2VKPH6xLJZ1JF+sKa54RfzESd5EJeIj Mfcp773H0lIaWcFJ6RY1F0L1E1ta7QigwNBiWMdYfh0a0EWHnDvGyYeaSPYEQ+rl e8bZOcfrYwVI7DTDiZOIkGA9D8DXEPDNp+sl6s1DxeY69D0NNaXTtCPqFNNAbFbd 20uD7yDRZlWq32cQB/K9D5cSkZRSOzdUpLfLU3nQU2+dz11x6OpM6m7DVboSrztD 1HtPPDzDEvH5dOP7ibd60s+ntjkSiNfNkUgnuVrBE/d/PocC1eHHpZt5V7f43Ofb RxVwH5+smzQ9nsNBfQR0 =gaah -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging Block layer patches: - Block graph change fixes (avoid loops, cope with non-tree graphs) - bdrv_set_aio_context() related fixes - HMP snapshot commands: Use only tag, not the ID to identify snapshots - qmeu-img, commit: Error path fixes - block/nvme: Build fix for gcc 9 - MAINTAINERS updates - Fix various issues with bdrv_refresh_filename() - Fix various iotests - Include LUKS overhead in qemu-img measure for qcow2 - A fix for vmdk's image creation interface # gpg: Signature made Mon 25 Feb 2019 14:18:15 GMT # gpg: using RSA key 7F09B272C88F2FD6 # gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full] # Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6 * remotes/kevin/tags/for-upstream: (71 commits) iotests: Skip 211 on insufficient memory vmdk: false positive of compat6 with hwversion not set iotests: add LUKS payload overhead to 178 qemu-img measure test qcow2: include LUKS payload overhead in qemu-img measure iotests.py: s/_/-/g on keys in qmp_log() iotests: Let 045 be run concurrently iotests: Filter SSH paths iotests.py: Filter filename in any string value iotests.py: Add is_str() iotests: Fix 207 to use QMP filters for qmp_log iotests: Fix 232 for LUKS iotests: Remove superfluous rm from 232 iotests: Fix 237 for Python 2.x iotests: Re-add filename filters iotests: Test json:{} filenames of internal BDSs block: BDS options may lack the "driver" option block/null: Generate filename even with latency-ns block/curl: Implement bdrv_refresh_filename() block/curl: Harmonize option defaults block/nvme: Fix bdrv_refresh_filename() ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2019-02-26 19:04:47 +00:00
Kevin Wolf	c90e2a9cfd	block-backend: Make blk_inc/dec_in_flight public For some users of BlockBackends, just increasing the in_flight counter is easier than implementing separate handlers in BlockDevOps. Make the helper functions for this public. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-02-25 15:03:19 +01:00
Vladimir Sementsov-Ogievskiy	ae5a9592ce	block/block-backend: use QEMU_IOVEC_INIT_BUF Use new QEMU_IOVEC_INIT_BUF() instead of qemu_iovec_init_external( ... , 1), which simplifies the code. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20190218140926.333779-4-vsementsov@virtuozzo.com Message-Id: <20190218140926.333779-4-vsementsov@virtuozzo.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2019-02-22 09:42:13 +00:00
Thomas Huth	d09ea2d227	block: Remove blk_attach_dev_legacy() / legacy_dev code The last user of blk_attach_dev_legacy() was the code in xen_disk which has recently been reworked. Now there is no user for this legacy function anymore. Thus we can finally remove all code related to the "legacy_dev" flag, too, and turn the related "void " in block-backend.c into proper "DeviceState " to fix some of the remaining TODOs there. Signed-off-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-02-01 13:46:45 +01:00
Kevin Wolf	4720cbeea1	block: Fix hangs in synchronous APIs with iothreads In the block layer, synchronous APIs are often implemented by creating a coroutine that calls the asynchronous coroutine-based implementation and then waiting for completion with BDRV_POLL_WHILE(). For this to work with iothreads (more specifically, when the synchronous API is called in a thread that is not the home thread of the block device, so that the coroutine will run in a different thread), we must make sure to call aio_wait_kick() at the end of the operation. Many places are missing this, so that BDRV_POLL_WHILE() keeps hanging even if the condition has long become false. Note that bdrv_dec_in_flight() involves an aio_wait_kick() call. This corresponds to the BDRV_POLL_WHILE() in the drain functions, but it is generally not enough for most other operations because they haven't set the return value in the coroutine entry stub yet. To avoid race conditions there, we need to kick after setting the return value. The race window is small enough that the problem doesn't usually surface in the common path. However, it does surface and causes easily reproducible hangs if the operation can return early before even calling bdrv_inc/dec_in_flight, which many of them do (trivial error or no-op success paths). The bug in bdrv_truncate(), bdrv_check() and bdrv_invalidate_cache() is slightly different: These functions even neglected to schedule the coroutine in the home thread of the node. This avoids the hang, but is obviously wrong, too. Fix those to schedule the coroutine in the right AioContext in addition to adding aio_wait_kick() calls. Cc: qemu-stable@nongnu.org Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2019-02-01 13:46:44 +01:00
Vladimir Sementsov-Ogievskiy	5d3b4e9946	qapi: add x-debug-query-block-graph Add a new command, returning block nodes (and their users) graph. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20181221170909.25584-2-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-01-31 00:38:19 +01:00
Liam Merwick	602414d123	block: Null pointer dereference in blk_root_get_parent_desc() The dev_id returned by the call to blk_get_attached_dev_id() in blk_root_get_parent_desc() can be NULL (an internal call to object_get_canonical_path may have returned NULL). Instead of just checking this case before before dereferencing, adjust blk_get_attached_dev_id() to return the empty string if no object path can be found (similar to the case when blk->dev is NULL and an empty string is returned). Signed-off-by: Liam Merwick <Liam.Merwick@oracle.com> Message-id: 1541453919-25973-3-git-send-email-Liam.Merwick@oracle.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2018-11-12 17:49:21 +01:00
Li Qiang	967105651b	block: change some function return type to bool Signed-off-by: Li Qiang <liq3ea@163.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-11-05 15:09:54 +01:00
Kevin Wolf	cb53460b70	block-backend: Set werror/rerror defaults in blk_new() Currently, the default values for werror and rerror have to be set explicitly with blk_set_on_error() by the callers of blk_new(). The only caller actually doing this is blockdev_init(), which is called for BlockBackends created using -drive. In particular, anonymous BlockBackends created with -device ...,drive=<node-name> didn't get the correct default set and instead defaulted to the integer value 0 (= BLOCKDEV_ON_ERROR_REPORT). This is the intended default for rerror anyway, but the default for werror should be BLOCKDEV_ON_ERROR_ENOSPC. Set the defaults in blk_new() instead so that they apply no matter what way the BlockBackend was created. Cc: qemu-stable@nongnu.org Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com>	2018-10-01 19:13:46 +02:00
Kevin Wolf	cfe29d8294	block: Use a single global AioWait When draining a block node, we recurse to its parent and for subtree drains also to its children. A single AIO_WAIT_WHILE() is then used to wait for bdrv_drain_poll() to become true, which depends on all of the nodes we recursed to. However, if the respective child or parent becomes quiescent and calls bdrv_wakeup(), only the AioWait of the child/parent is checked, while AIO_WAIT_WHILE() depends on the AioWait of the original node. Fix this by using a single AioWait for all callers of AIO_WAIT_WHILE(). This may mean that the draining thread gets a few more unnecessary wakeups because an unrelated operation got completed, but we already wake it up when something _could_ have changed rather than only if it has certainly changed. Apart from that, drain is a slow path anyway. In theory it would be possible to use wakeups more selectively and still correctly, but the gains are likely not worth the additional complexity. In fact, this patch is a nice simplification for some places in the code. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2018-09-25 15:50:15 +02:00
Kevin Wolf	46aaf2a566	block-backend: Decrease in_flight only after callback Request callbacks can do pretty much anything, including operations that will yield from the coroutine (such as draining the backend). In that case, a decreased in_flight would be visible to other code and could lead to a drain completing while the callback hasn't actually completed yet. Note that reordering these operations forbids calling drain directly inside an AIO callback. As Paolo explains, indirectly calling it is okay: - Calling it through a coroutine is okay, because then bdrv_drained_begin() goes through bdrv_co_yield_to_drain() and you have in_flight=2 when bdrv_co_yield_to_drain() yields, then soon in_flight=1 when the aio_co_wake() in the AIO callback completes, then in_flight=0 after the bottom half starts. - Calling it through a bottom half would be okay too, as long as the AIO callback remembers to do inc_in_flight/dec_in_flight just like bdrv_co_yield_to_drain() and bdrv_co_drain_bh_cb() do A few more important cases that come to mind: - A coroutine that yields because of I/O is okay, with a sequence similar to bdrv_co_yield_to_drain(). - A coroutine that yields with no I/O pending will correctly decrease in_flight to zero before yielding. - Calling more AIO from the callback won't overflow the counter just because of mutual recursion, because AIO functions always yield at least once before invoking the callback. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>	2018-09-25 15:50:15 +02:00
Kevin Wolf	5ca9d21bd1	block-backend: Fix potential double blk_delete() blk_unref() first decreases the refcount of the BlockBackend and calls blk_delete() if the refcount reaches zero. Requests can still be in flight at this point, they are only drained during blk_delete(): At this point, arbitrary callbacks can run. If any callback takes a temporary BlockBackend reference, it will first increase the refcount to 1 and then decrease it to 0 again, triggering another blk_delete(). This will cause a use-after-free crash in the outer blk_delete(). Fix it by draining the BlockBackend before decreasing to refcount to 0. Assert in blk_ref() that it never takes the first refcount (which would mean that the BlockBackend is already being deleted). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2018-09-25 15:50:15 +02:00
Kevin Wolf	fe5258a503	block-backend: Add .drained_poll callback A bdrv_drain operation must ensure that all parents are quiesced, this includes BlockBackends. Otherwise, callbacks called by requests that are completed on the BDS layer, but not quite yet on the BlockBackend layer could still create new requests. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2018-09-25 15:50:15 +02:00
Peter Xu	3ab72385b2	qapi: Drop qapi_event_send_FOO()'s Error argument The generated qapi_event_send_FOO() take an Error argument. They can't actually fail, because all they do with the argument is passing it to functions that can't fail: the QObject output visitor, and the @qmp_emit callback, which is either monitor_qapi_event_queue() or event_test_emit(). Drop the argument, and pass &error_abort to the QObject output visitor and @qmp_emit instead. Suggested-by: Eric Blake <eblake@redhat.com> Suggested-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180815133747.25032-4-peterx@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> [Commit message rewritten, update to qapi-code-gen.txt corrected] Signed-off-by: Markus Armbruster <armbru@redhat.com>	2018-08-28 18:21:38 +02:00
Kevin Wolf	572023f7b2	block: Remove deprecated -drive option serial This reinstates commit `b008326744`, which was temporarily reverted for the 3.0 release so that libvirt gets some extra time to update their command lines. The -drive option serial was deprecated in QEMU 2.10. It's time to remove it. Tests need to be updated to set the serial number with -global instead of using the -drive option. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com>	2018-08-15 12:50:39 +02:00
Fam Zheng	0b9fd3f467	block: Use BdrvChild to discard Other I/O functions are already using a BdrvChild pointer in the API, so make discard do the same. It makes it possible to initiate the same permission checks before doing I/O, and much easier to share the helper functions for this, which will be added and used by write, truncate and copy range paths. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-07-10 16:01:52 +02:00
Cornelia Huck	44e8b4689c	Revert "block: Remove deprecated -drive option serial" This reverts commit `b008326744`. Hold off removing this for one more QEMU release (current libvirt release still uses it.) Signed-off-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-07-10 14:36:11 +02:00
Vladimir Sementsov-Ogievskiy	67b51fb998	block: split flags in copy_range Pass read flags and write flags separately. This is needed to handle coming BDRV_REQ_NO_SERIALISING clearly in following patches. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-07-10 13:04:25 +02:00
Greg Kurz	f45280cbf6	block: fix QEMU crash with scsi-hd and drive_del Removing a drive with drive_del while it is being used to run an I/O intensive workload can cause QEMU to crash. An AIO flush can yield at some point: blk_aio_flush_entry() blk_co_flush(blk) bdrv_co_flush(blk->root->bs) ... qemu_coroutine_yield() and let the HMP command to run, free blk->root and give control back to the AIO flush: hmp_drive_del() blk_remove_bs() bdrv_root_unref_child(blk->root) child_bs = blk->root->bs bdrv_detach_child(blk->root) bdrv_replace_child(blk->root, NULL) blk->root->bs = NULL g_free(blk->root) <============== blk->root becomes stale bdrv_unref(child_bs) bdrv_delete(child_bs) bdrv_close() bdrv_drained_begin() bdrv_do_drained_begin() bdrv_drain_recurse() aio_poll() ... qemu_coroutine_switch() and the AIO flush completion ends up dereferencing blk->root: blk_aio_complete() scsi_aio_complete() blk_get_aio_context(blk) bs = blk_bs(blk) ie, bs = blk->root ? blk->root->bs : NULL ^^^^^ stale The problem is that we should avoid making block driver graph changes while we have in-flight requests. Let's drain all I/O for this BB before calling bdrv_root_unref_child(). Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-06-18 15:03:25 +02:00
Kevin Wolf	b008326744	block: Remove deprecated -drive option serial The -drive option serial was deprecated in QEMU 2.10. It's time to remove it. Tests need to be updated to set the serial number with -global instead of using the -drive option. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com>	2018-06-15 14:49:44 +02:00
Fam Zheng	b5679fa49c	block-backend: Add blk_co_copy_range It's a BlockBackend wrapper of the BDS interface. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20180601092648.24614-10-famz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2018-06-01 14:41:48 +01:00
Daniel Henrique Barboza	7803696d85	block-backend: simplify blk_get_aio_context blk_get_aio_context verifies if BlockDriverState bs is not NULL, return bdrv_get_aio_context(bs) if true or qemu_get_aio_context() otherwise. However, bdrv_get_aio_context from block.c already does this verification itself, also returning qemu_get_aio_context() if bs is NULL: AioContext bdrv_get_aio_context(BlockDriverState bs) { return bs ? bs->aio_context : qemu_get_aio_context(); } This patch simplifies blk_get_aio_context to simply call bdrv_get_aio_context instead of replicating the same logic. Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com> Reviewed-by: Darren Kenny <darren.kenny@oracle.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-05-15 16:11:41 +02:00
Stefan Hajnoczi	d03654eacd	block: let blk_add/remove_aio_context_notifier() tolerate BDS changes Commit `2019ba0a01` ("block: Add AioContextNotifier functions to BB") added blk_add/remove_aio_context_notifier() and implemented them by passing through the bdrv_() equivalent. This doesn't work across bdrv_append(), which detaches child->bs and re-attaches it to a new BlockDriverState. When blk_remove_aio_context_notifier() is called we will access the new BDS instead of the one where the notifier was added! >From the point of view of the blk_() API user, changes to the root BDS should be transparent. This patch maintains a list of AioContext notifiers in BlockBackend and adds/removes them from the BlockDriverState as needed. Reported-by: Stefano Panella <spanella@gmail.com> Cc: Max Reitz <mreitz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20180306204819.11266-2-stefanha@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2018-03-13 15:38:55 -05:00
Deepa Srinivasan	c060332c76	block: Fix qemu crash when using scsi-block Starting qemu with the following arguments causes qemu to segfault: ... -device lsi,id=lsi0 -drive file=iscsi:<...>,format=raw,if=none,node-name= iscsi1 -device scsi-block,bus=lsi0.0,id=<...>,drive=iscsi1 This patch fixes blk_aio_ioctl() so it does not pass stack addresses to blk_aio_ioctl_entry() which may be invoked after blk_aio_ioctl() returns. More details about the bug follow. blk_aio_ioctl() invokes blk_aio_prwv() with blk_aio_ioctl_entry as the coroutine parameter. blk_aio_prwv() ultimately calls aio_co_enter(). When blk_aio_ioctl() is executed from within a coroutine context (e.g. iscsi_bh_cb()), aio_co_enter() adds the coroutine (blk_aio_ioctl_entry) to the current coroutine's wakeup queue. blk_aio_ioctl() then returns. When blk_aio_ioctl_entry() executes later, it accesses an invalid pointer: .... BlkRwCo *rwco = &acb->rwco; rwco->ret = blk_co_ioctl(rwco->blk, rwco->offset, rwco->qiov->iov[0].iov_base); <--- qiov is invalid here ... In the case when blk_aio_ioctl() is called from a non-coroutine context, blk_aio_ioctl_entry() executes immediately. But if bdrv_co_ioctl() calls qemu_coroutine_yield(), blk_aio_ioctl() will return. When the coroutine execution is complete, control returns to blk_aio_ioctl_entry() after the call to blk_co_ioctl(). There is no invalid reference after this point, but the function is still holding on to invalid pointers. The fix is to change blk_aio_prwv() to accept a void pointer for the IO buffer rather than a QEMUIOVector. blk_aio_prwv() passes this through in BlkRwCo and the coroutine function casts it to QEMUIOVector or uses the void pointer directly. Signed-off-by: Deepa Srinivasan <deepa.srinivasan@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Mark Kanda <mark.kanda@oracle.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2018-03-08 15:43:11 +00:00
Peter Maydell	58e2e17dba	Block layer patches -----BEGIN PGP SIGNATURE----- iQIcBAABAgAGBQJanYJPAAoJEH8JsnLIjy/WxjUQAJA+DTOmGXvaNpMs65BrU79K /r/iGVrzHv/RMLmrWMnqj96W9SnpMuiAP9hVLNsekqClY9q4ME4DpGcXhWfhSvF5 FC51ehvFJdfo8cPorsevcqNj60iWebjcx3lFfUq2606UOyYih3oijYxr6gSwWbRc GAgdGMqsvGYpzgqAQVEWHUhaX0La49/OzY42aR+E+LCBNfTYvlydvyoc+tUTdIpW 1eM/ASGndGsN0Cf2vxlbKgJ0/P6v+cRZuuIDhKZqre+YG+yM+pq7yZb+o7nf/P36 TPR93BsT7FSVAizRK7VFRuPIynHpiaxYygrJERCXF0sxsV4OlKjpmt/uUPamWFh+ 46Jx2NK1AuAx87BdErgmA119ObO3oAPxK0+2p981obb6SphTbbPxDj6SOlYCt4mJ mhff4JtIiwCmDSckAwd2mkBI1Tvl9qqcELrpyd2t2eU4ec2vf7fPd85EsK/Mq6Kr dbfqFvjNaaMxChoqFgkHAveYJ7zYqRFI2IY5o9c1QyZehCGPWjScxHXZZYdpDl59 YF9DkYQDOyvEX2jmMECaO1r/0nnO+BqQHu5ItJuTte9rjP9Q0do3iBISiIefewtf yji6/QNn2hFrnr1HPAwLFFC3kPgc8Mq8mIUb53j8vG/01KhVRCcnJm2K6D4IUwLZ S6ZnQJB97eE4y7YR5dNt =2axz -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging Block layer patches # gpg: Signature made Mon 05 Mar 2018 17:45:51 GMT # gpg: using RSA key 7F09B272C88F2FD6 # gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" # Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6 * remotes/kevin/tags/for-upstream: (38 commits) block: Fix NULL dereference on empty drive error qcow2: Replace align_offset() with ROUND_UP() block/ssh: Add basic .bdrv_truncate() block/ssh: Make ssh_grow_file() blocking block/ssh: Pull ssh_grow_file() from ssh_create() qemu-img: Make resize error message more general qcow2: make qcow2_co_create2() a coroutine_fn block: rename .bdrv_create() to .bdrv_co_create_opts() Revert "IDE: Do not flush empty CDROM drives" block: test blk_aio_flush() with blk->root == NULL block: add BlockBackend->in_flight counter block: extract AIO_WAIT_WHILE() from BlockDriverState aio: rename aio_context_in_iothread() to in_aio_context_home_thread() docs: document how to use the l2-cache-entry-size parameter specs/qcow2: Fix documentation of the compressed cluster descriptor iotest 033: add misaligned write-zeroes test via truncate block: fix write with zero flag set and iovector provided block: Drop unused .bdrv_co_get_block_status() vvfat: Switch to .bdrv_co_block_status() vpc: Switch to .bdrv_co_block_status() ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org> # Conflicts: # include/block/block.h	2018-03-06 11:20:44 +00:00
Kevin Wolf	bfe1a14c18	block: Fix NULL dereference on empty drive error blk_error_action() sends a BLOCK_IO_ERROR QMP event which includes the node name of its root node. If the BlockBackend represents an empty drive, there is no root node, so we should not try to access its node name. Make the field optional in the event and include it only when the BlockBackend isn't empty. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2018-03-05 18:45:32 +01:00
Markus Armbruster	9af2398977	Include less of the generated modular QAPI headers In my "build everything" tree, a change to the types in qapi-schema.json triggers a recompile of about 4800 out of 5100 objects. The previous commit split up qmp-commands.h, qmp-event.h, qmp-visit.h, qapi-types.h. Each of these headers still includes all its shards. Reduce compile time by including just the shards we actually need. To illustrate the benefits: adding a type to qapi/migration.json now recompiles some 2300 instead of 4800 objects. The next commit will improve it further. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20180211093607.27351-24-armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> [eblake: rebase to master] Signed-off-by: Eric Blake <eblake@redhat.com>	2018-03-02 13:45:50 -06:00
Stefan Hajnoczi	33f2a75777	block: add BlockBackend->in_flight counter BlockBackend currently relies on BlockDriverState->in_flight to track requests for blk_drain(). There is a corner case where BlockDriverState->in_flight cannot be used though: blk->root can be NULL when there is no medium. This results in a segfault when the NULL pointer is dereferenced. Introduce a BlockBackend->in_flight counter for aio requests so it works even when blk->root == NULL. Based on a patch by Kevin Wolf <kwolf@redhat.com>. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-03-02 18:39:07 +01:00
Markus Armbruster	922a01a013	Move include qemu/option.h from qemu-common.h to actual users qemu-common.h includes qemu/option.h, but most places that include the former don't actually need the latter. Drop the include, and add it to the places that actually need it. While there, drop superfluous includes of both headers, and separate #include from file comment with a blank line. This cleanup makes the number of objects depending on qemu/option.h drop from 4545 (out of 4743) to 284 in my "build everything" tree. Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20180201111846.21846-20-armbru@redhat.com> [Semantic conflict with commit `bdd6a90a9e` in block/nvme.c resolved]	2018-02-09 13:52:16 +01:00
Markus Armbruster	e688df6bc4	Include qapi/error.h exactly where needed This cleanup makes the number of objects depending on qapi/error.h drop from 1910 (out of 4743) to 1612 in my "build everything" tree. While there, separate #include from file comment with a blank line, and drop a useless comment on why qemu/osdep.h is included first. Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20180201111846.21846-5-armbru@redhat.com> [Semantic conflict with commit `34e304e975` resolved, OSX breakage fixed]	2018-02-09 13:50:17 +01:00
Fam Zheng	23d0ba9319	block: Introduce buf register API Allow block driver to map and unmap a buffer for later I/O, as a performance hint. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20180116060901.17413-5-famz@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>	2018-02-08 09:22:03 +08:00
Kevin Wolf	1f4ad7d3b8	block: Don't request I/O permission with BDRV_O_NO_IO 'qemu-img info' makes sense even when BLK_PERM_CONSISTENT_READ cannot be granted because of a block job in a running qemu process. It already sets BDRV_O_NO_IO to indicate that it doesn't access the guest visible data at all. Check the BDRV_O_NO_IO flags in blk_new_open(), so that I/O related permissions are not unnecessarily requested and 'qemu-img info' can work even if BLK_PERM_CONSISTENT_READ cannot be granted. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com>	2017-11-21 14:48:22 +01:00
Max Reitz	5e003f17ec	block: Make bdrv_next() keep strong references On one hand, it is a good idea for bdrv_next() to return a strong reference because ideally nearly every pointer should be refcounted. This fixes intermittent failure of iotest 194. On the other, it is absolutely necessary for bdrv_next() itself to keep a strong reference to both the BB (in its first phase) and the BDS (at least in the second phase) because when called the next time, it will dereference those objects to get a link to the next one. Therefore, it needs these objects to stay around until then. Just storing the pointer to the next in the iterator is not really viable because that pointer might become invalid as well. Both arguments taken together means we should probably just invoke bdrv_ref() and blk_ref() in bdrv_next(). This means we have to assert that bdrv_next() is always called from the main loop, but that was probably necessary already before this patch and judging from the callers, it also looks to actually be the case. Keeping these strong references means however that callers need to give them up if they decide to abort the iteration early. They can do so through the new bdrv_next_cleanup() function. Suggested-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20171110172545.32609-1-mreitz@redhat.com Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-11-17 18:21:31 +01:00
Alberto Garcia	c89bcf3af0	block: Leave valid throttle timers when removing a BDS from a backend If a BlockBackend has I/O limits set then its ThrottleGroupMember structure uses the AioContext from its attached BlockDriverState. Those two contexts must be kept in sync manually. This is not ideal and will be fixed in the future by removing the throttling configuration from the BlockBackend and storing it in an implicit filter node instead, but for now we have to live with this. When you remove the BlockDriverState from the backend then the throttle timers are destroyed. If a new BlockDriverState is later inserted then they are created again using the new AioContext. There are a couple of problems with this: a) The code manipulates the timers directly, leaving the ThrottleGroupMember.aio_context field in an inconsisent state. b) If you remove the I/O limits (e.g by destroying the backend) when the timers are gone then throttle_group_unregister_tgm() will attempt to destroy them again, crashing QEMU. While b) could be fixed easily by allowing the timers to be freed twice, this would result in a situation in which we can no longer guarantee that a valid ThrottleState has a valid AioContext and timers. This patch ensures that the timers and AioContext are always valid when I/O limits are set, regardless of whether the BlockBackend has a BlockDriverState inserted or not. [Fixed "There'a" typo as suggested by Max Reitz <mreitz@redhat.com> --Stefan] Reported-by: sochin jiang <sochin.jiang@huawei.com> Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: e089c66e7c20289b046d782cea4373b765c5bc1d.1510339534.git.berto@igalia.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-11-13 15:43:49 +00:00
Alberto Garcia	48bf7ea81a	block: Check for inserted BlockDriverState in blk_io_limits_disable() When you set I/O limits using block_set_io_throttle or the command line throttling.* options they are kept in the BlockBackend regardless of whether a BlockDriverState is attached to the backend or not. Therefore when removing the limits using blk_io_limits_disable() we need to check if there's a BDS before attempting to drain it, else it will crash QEMU. This can be reproduced very easily using HMP: (qemu) drive_add 0 if=none,throttling.iops-total=5000 (qemu) drive_del none0 Reported-by: sochin jiang <sochin.jiang@huawei.com> Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 0d3a67ce8d948bb33e08672564714dcfb76a3d8c.1510339534.git.berto@igalia.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-11-13 14:38:46 +00:00
Stefan Hajnoczi	dc868fb03b	throttle-groups: drain before detaching ThrottleState I/O requests hang after stop/cont commands at least since QEMU 2.10.0 with -drive iops=100: (guest)$ dd if=/dev/zero of=/dev/vdb oflag=direct count=1000 (qemu) stop (qemu) cont ...I/O is stuck... This happens because blk_set_aio_context() detaches the ThrottleState while requests may still be in flight: if (tgm->throttle_state) { throttle_group_detach_aio_context(tgm); throttle_group_attach_aio_context(tgm, new_context); } This patch encloses the detach/attach calls in a drained region so no I/O request is left hanging. Also add assertions so we don't make the same mistake again in the future. Reported-by: Yongxue Hong <yhong@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 20171110151934.16883-1-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-11-13 14:02:09 +00:00
Zhengui	632a773543	block: all I/O should be completed before removing throttle timers. In blk_remove_bs, all I/O should be completed before removing throttle timers. If there has inflight I/O, removing throttle timers here will cause the inflight I/O never return. This patch add bdrv_drained_begin before throttle_timers_detach_aio_context to let all I/O completed before removing throttle timers. [Moved declaration of bs as suggested by Alberto Garcia <berto@igalia.com>. --Stefan] Signed-off-by: Zhengui <lizhengui@huawei.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 1508564040-120700-1-git-send-email-lizhengui@huawei.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-11-13 14:02:05 +00:00
Manos Pitsidianakis	f738cfc843	block: tidy ThrottleGroupMember initializations Move the CoMutex and CoQueue inits inside throttle_group_register_tgm() which is called whenever a ThrottleGroupMember is initialized. There's no need for them to be separate. Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Manos Pitsidianakis <el13635@mail.ntua.gr> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-09-05 16:47:52 +02:00
Manos Pitsidianakis	c61791fc23	block: add aio_context field in ThrottleGroupMember timer_cb() needs to know about the current Aio context of the throttle request that is woken up. In order to make ThrottleGroupMember backend agnostic, this information is stored in an aio_context field instead of accessing it from BlockBackend. Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Manos Pitsidianakis <el13635@mail.ntua.gr> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-09-05 16:47:52 +02:00
Manos Pitsidianakis	022cdc9f40	block: move ThrottleGroup membership to ThrottleGroupMember This commit eliminates the 1:1 relationship between BlockBackend and throttle group state. Users will be able to create multiple throttle nodes, each with its own throttle group state, in the future. The throttle group state cannot be per-BlockBackend anymore, it must be per-throttle node. This is done by gathering ThrottleGroup membership details from BlockBackendPublic into ThrottleGroupMember and refactoring existing code to use the structure. Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Manos Pitsidianakis <el13635@mail.ntua.gr> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-09-05 16:47:51 +02:00
Fam Zheng	ca2e214411	block-backend: Allow more "can inactivate" cases These two conditions corresponds to mirror job's source and target, which need to be allowed as they are part of the non-shared storage migration workflow: failing to inactivate either will result in a failure during migration completion. Signed-off-by: Fam Zheng <famz@redhat.com> Message-Id: <20170823134242.12080-3-famz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> [eblake: improve comment grammar] Signed-off-by: Eric Blake <eblake@redhat.com>	2017-08-23 10:21:55 -05:00
Fam Zheng	c16de8f59a	block-backend: Refactor inactivate check The logic will be fixed (extended), move it to a separate function. Signed-off-by: Fam Zheng <famz@redhat.com> Message-Id: <20170823134242.12080-2-famz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2017-08-23 10:21:55 -05:00
Fam Zheng	5f7772c4d0	block-backend: Defer shared_perm tightening migration completion As in the case of nbd_export_new(), bdrv_invalidate_cache() can be called when migration is still in progress. In this case we are not ready to tighten the shared permissions fenced by blk->disable_perm. Defer to a VM state change handler. Signed-off-by: Fam Zheng <famz@redhat.com> Message-Id: <20170815130740.31229-4-famz@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2017-08-15 10:03:28 -05:00
Kevin Wolf	a429b9b5f4	block: Make blk_all_next() public Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com>	2017-07-18 15:14:36 +02:00
Kevin Wolf	77beef8365	block: Make blk_get_attached_dev_id() public Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com>	2017-07-18 15:14:35 +02:00
Max Reitz	3a691c50f1	block: Add PreallocMode to blk_truncate() blk_truncate() itself will pass that value to bdrv_truncate(), and all callers of blk_truncate() just set the parameter to PREALLOC_MODE_OFF for now. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20170613202107.10125-4-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-07-11 17:45:01 +02:00
Max Reitz	7ea37c3066	block: Add PreallocMode to bdrv_truncate() For block drivers that just pass a truncate request to the underlying protocol, we can now pass the preallocation mode instead of aborting if it is not PREALLOC_MODE_OFF. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20170613202107.10125-3-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-07-11 17:45:01 +02:00
Manos Pitsidianakis	f5a5ca7969	block: change variable names in BlockDriverState Change the 'int count' parameter in pwrite_zeros, pdiscard related functions (and some others) to 'int bytes', as they both refer to bytes. This helps with code legibility. Signed-off-by: Manos Pitsidianakis <el13635@mail.ntua.gr> Message-id: 20170609101808.13506-1-el13635@mail.ntua.gr Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-06-26 14:54:46 +02:00
Paolo Bonzini	9caa6f3dbe	block: split BlockAcctStats creation and setup block_acct_destroy is called unconditionally in blk_delete, but there is no BlockAcctStats function that is called unconditionally in blk_new. Split block_acct_init in two, so that it will be possible to create a QemuMutex in block_acct_init and destroy it in block_acct_cleanup. Cc: Alberto Garcia <berto@igalia.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20170605123908.18777-19-pbonzini@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Fam Zheng <famz@redhat.com>	2017-06-16 07:55:00 +08:00
Paolo Bonzini	93001e9d87	throttle-groups: protect throttled requests with a CoMutex Another possibility is to use tg->lock, which we're holding anyway in both schedule_next_request and throttle_group_co_io_limits_intercept. This would require open-coding the CoQueue however, so I've chosen this alternative. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20170605123908.18777-10-pbonzini@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>	2017-06-16 07:55:00 +08:00
Paolo Bonzini	d993b85804	block: access io_limits_disabled with atomic ops Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20170605123908.18777-4-pbonzini@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>	2017-06-16 07:55:00 +08:00
Kevin Wolf	93c26503e0	block: Fix anonymous BBs in blk_root_inactivate() blk->name isn't an array, but a pointer that can be NULL. Checking for an anonymous BB must involve a NULL check first, otherwise we get crashes. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com>	2017-06-09 11:45:03 +02:00
Kevin Wolf	cfa1a5723f	block: Drop permissions when migration completes With image locking, permissions affect other qemu processes as well. We want to be sure that the destination can run, so let's drop permissions on the source when migration completes. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2017-05-11 12:08:24 +02:00
Kevin Wolf	4417ab7adf	block: New BdrvChildRole.activate() for blk_resume_after_migration() Instead of manually calling blk_resume_after_migration() in migration code after doing bdrv_invalidate_cache_all(), integrate the BlockBackend activation with cache invalidation into a single function. This is achieved with a new callback in BdrvChildRole that is called by bdrv_invalidate_cache_all(). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2017-05-11 12:08:24 +02:00
Max Reitz	ed3d2ec98a	block: Add errp to b{lk,drv}_truncate() For one thing, this allows us to drop the error message generation from qemu-img.c and blockdev.c and instead have it unified in bdrv_truncate(). Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20170328205129.15138-3-mreitz@redhat.com Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-04-28 16:02:02 +02:00
Krzysztof Kozlowski	0731a50feb	block: Constify data passed by pointer to blk_name blk_name() is not modifying data passed to it through pointer and it returns also a pointer to const so the argument can be made const for code safeness. Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-04-27 15:39:49 +02:00
Eric Blake	1606e4cf8a	throttle: Remove block from group on hot-unplug When a block device that is part of a throttle group is hot-unplugged, we forgot to remove it from the throttle group. This leaves stale memory around, and causes an easily reproducible crash: $ ./x86_64-softmmu/qemu-system-x86_64 -nodefaults -nographic -qmp stdio \ -device virtio-scsi-pci,bus=pci.0 -drive \ id=drive_image2,if=none,format=raw,file=file2,bps=512000,iops=100,group=foo \ -device scsi-hd,id=image2,drive=drive_image2 -drive \ id=drive_image3,if=none,format=raw,file=file3,bps=512000,iops=100,group=foo \ -device scsi-hd,id=image3,drive=drive_image3 {'execute':'qmp_capabilities'} {'execute':'device_del','arguments':{'id':'image3'}} {'execute':'system_reset'} Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1428810 Suggested-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Eric Blake <eblake@redhat.com> Message-id: 20170406190847.29347-1-eblake@redhat.com Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-04-11 15:33:00 +02:00
Fam Zheng	e92f0e1910	block: Use bdrv_coroutine_enter to start I/O coroutines BDRV_POLL_WHILE waits for the started I/O by releasing bs's ctx then polling the main context, which relies on the yielded coroutine continuing on bs->ctx before notifying qemu_aio_context with bdrv_wakeup(). Thus, using qemu_coroutine_enter to start I/O is wrong because if the coroutine is entered from main loop, co->ctx will be qemu_aio_context, as a result of the "release, poll, acquire" loop of BDRV_POLL_WHILE, race conditions happen when both main thread and the iothread access the same BDS: main loop iothread ----------------------------------------------------------------------- blockdev_snapshot aio_context_acquire(bs->ctx) virtio_scsi_data_plane_handle_cmd bdrv_drained_begin(bs->ctx) bdrv_flush(bs) bdrv_co_flush(bs) aio_context_acquire(bs->ctx).enter ... qemu_coroutine_yield(co) BDRV_POLL_WHILE() aio_context_release(bs->ctx) aio_context_acquire(bs->ctx).return ... aio_co_wake(co) aio_poll(qemu_aio_context) ... co_schedule_bh_cb() ... qemu_coroutine_enter(co) ... /* (A) bdrv_co_flush(bs) /* (B) I/O on bs / continues... / aio_context_release(bs->ctx) aio_context_acquire(bs->ctx) Note that in above case, bdrv_drained_begin() doesn't do the "release, poll, acquire" in BDRV_POLL_WHILE, because bs->in_flight == 0. Fix this by using bdrv_coroutine_enter and enter coroutine in the right context. iotests 109 output is updated because the coroutine reenter flow during mirror job complete is different (now through co_queue_wakeup, instead of the unconditional qemu_coroutine_switch before), making the end job len different. Signed-off-by: Fam Zheng <famz@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com>	2017-04-11 20:07:15 +08:00
Kevin Wolf	d35ff5e6b3	block: Ignore guest dev permissions during incoming migration Usually guest devices don't like other writers to the same image, so they use blk_set_perm() to prevent this from happening. In the migration phase before the VM is actually running, though, they don't have a problem with writes to the image. On the other hand, storage migration needs to be able to write to the image in this phase, so the restrictive blk_set_perm() call of qdev devices breaks it. This patch flags all BlockBackends with a qdev device as blk->disable_perm during incoming migration, which means that the requested permissions are stored in the BlockBackend, but not actually applied to its root node yet. Once migration has finished and the VM should be resumed, the permissions are applied. If they cannot be applied (e.g. because the NBD server used for block migration hasn't been shut down), resuming the VM fails. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Tested-by: Kashyap Chamarthy <kchamart@redhat.com>	2017-04-07 14:44:05 +02:00
John Snow	f4d9cc88ee	block-backend: add drained_begin / drained_end ops Allow block backends to forward drain requests to their devices/users. The initial intended purpose for this patch is to allow BBs to forward requests along to BlockJobs, which will want to pause if their associated BB has entered a drained region. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Message-id: 20170316212351.13797-3-jsnow@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2017-03-22 13:26:27 -04:00

1 2 3 4 5 ...

296 Commits