mirrors/qemu - qemu - SynapseOS git

Author	SHA1	Message	Date
Kevin Wolf	7b9e8b22bc	block: Mark preadv_snapshot/snapshot_block_status GRAPH_RDLOCK Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230203152202.49054-16-kwolf@redhat.com> Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2023-02-23 19:49:21 +01:00
Emanuele Giuseppe Esposito	742bf09b20	block: Mark bdrv_co_copy_range() GRAPH_RDLOCK This adds GRAPH_RDLOCK annotations to declare that callers of bdrv_co_copy_range() need to hold a reader lock for the graph. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230203152202.49054-15-kwolf@redhat.com> Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2023-02-23 19:49:20 +01:00
Kevin Wolf	b24a4c41ba	block: Mark bdrv_co_pwrite_sync() and callers GRAPH_RDLOCK This adds GRAPH_RDLOCK annotations to declare that callers of bdrv_co_pwrite_sync() need to hold a reader lock for the graph. For some places, we know that they will hold the lock, but we don't have the GRAPH_RDLOCK annotations yet. In this case, add assume_graph_lock() with a FIXME comment. These places will be removed once everything is properly annotated. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230203152202.49054-13-kwolf@redhat.com> Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2023-02-23 19:49:18 +01:00
Kevin Wolf	b9b10c35e5	block: Mark public read/write functions GRAPH_RDLOCK This adds GRAPH_RDLOCK annotations to declare that callers of bdrv_co_pread/pwrite() need to hold a reader lock for the graph. For some places, we know that they will hold the lock, but we don't have the GRAPH_RDLOCK annotations yet. In this case, add assume_graph_lock() with a FIXME comment. These places will be removed once everything is properly annotated. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230203152202.49054-12-kwolf@redhat.com> Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2023-02-23 19:49:17 +01:00
Kevin Wolf	7b1fb72e2c	block: Mark read/write in block/io.c GRAPH_RDLOCK This adds GRAPH_RDLOCK annotations to declare that callers of bdrv_driver_*() need to hold a reader lock for the graph. It doesn't add the annotation to public functions yet. For some places, we know that they will hold the lock, but we don't have the GRAPH_RDLOCK annotations yet. In this case, add assume_graph_lock() with a FIXME comment. These places will be removed once everything is properly annotated. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230203152202.49054-11-kwolf@redhat.com> Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2023-02-23 19:49:16 +01:00
Kevin Wolf	abaf8b750b	block: Mark bdrv_co_pwrite_zeroes() and callers GRAPH_RDLOCK This adds GRAPH_RDLOCK annotations to declare that callers of bdrv_co_pwrite_zeroes() need to hold a reader lock for the graph. For some places, we know that they will hold the lock, but we don't have the GRAPH_RDLOCK annotations yet. In this case, add assume_graph_lock() with a FIXME comment. These places will be removed once everything is properly annotated. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230203152202.49054-10-kwolf@redhat.com> Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2023-02-23 19:49:14 +01:00
Emanuele Giuseppe Esposito	9a5a1c621e	block: Mark bdrv_co_pdiscard() and callers GRAPH_RDLOCK This adds GRAPH_RDLOCK annotations to declare that callers of bdrv_co_pdiscard() need to hold a reader lock for the graph. For some places, we know that they will hold the lock, but we don't have the GRAPH_RDLOCK annotations yet. In this case, add assume_graph_lock() with a FIXME comment. These places will be removed once everything is properly annotated. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230203152202.49054-9-kwolf@redhat.com> Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2023-02-23 19:49:13 +01:00
Emanuele Giuseppe Esposito	8809534933	block: Mark bdrv_co_flush() and callers GRAPH_RDLOCK This adds GRAPH_RDLOCK annotations to declare that callers of bdrv_co_flush() need to hold a reader lock for the graph. For some places, we know that they will hold the lock, but we don't have the GRAPH_RDLOCK annotations yet. In this case, add assume_graph_lock() with a FIXME comment. These places will be removed once everything is properly annotated. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230203152202.49054-8-kwolf@redhat.com> Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2023-02-23 19:49:12 +01:00
Kevin Wolf	26c518ab1e	block: Mark bdrv_co_ioctl() and callers GRAPH_RDLOCK This adds GRAPH_RDLOCK annotations to declare that callers of bdrv_co_ioctl() need to hold a reader lock for the graph. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230203152202.49054-6-kwolf@redhat.com> Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2023-02-23 19:49:09 +01:00
Kevin Wolf	7ff9579e60	block: Mark bdrv_co_block_status() and callers GRAPH_RDLOCK This adds GRAPH_RDLOCK annotations to declare that callers of bdrv_co_block_status() need to hold a reader lock for the graph. For some places, we know that they will hold the lock, but we don't have the GRAPH_RDLOCK annotations yet. In this case, add assume_graph_lock() with a FIXME comment. These places will be removed once everything is properly annotated. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230203152202.49054-5-kwolf@redhat.com> Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2023-02-23 19:49:07 +01:00
Kevin Wolf	c2b8e31516	block: Mark bdrv_co_truncate() and callers GRAPH_RDLOCK This adds GRAPH_RDLOCK annotations to declare that callers of bdrv_co_truncate() need to hold a reader lock for the graph. For some places, we know that they will hold the lock, but we don't have the GRAPH_RDLOCK annotations yet. In this case, add assume_graph_lock() with a FIXME comment. These places will be removed once everything is properly annotated. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230203152202.49054-4-kwolf@redhat.com> Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2023-02-23 19:49:03 +01:00
Kevin Wolf	10e5d70787	block: Make bdrv_can_set_read_only() static It is never called outside of block.c. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230203152202.49054-2-kwolf@redhat.com> Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2023-02-23 19:48:28 +01:00
Kevin Wolf	4bee90e9da	block: Create no_co_wrappers for open functions Images can't be opened in coroutine context because opening needs to change the block graph. Add no_co_wrappers so that coroutines have a simple way of opening images in a BH instead. At the same time, mark the wrapped functions as no_coroutine_fn. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230126172432.436111-3-kwolf@redhat.com> Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Reviewed-by: Hanna Czenczek <hreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2023-02-17 11:22:19 +01:00
Kevin Wolf	d6ee2e324e	block-coroutine-wrapper: Introduce no_co_wrapper Some functions must not be called from coroutine context. The common pattern to use them anyway from a coroutine is running them in a BH and letting the calling coroutine yield to be woken up when the BH is completed. Instead of manually writing such wrappers, add support for generating them to block-coroutine-wrapper. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230126172432.436111-2-kwolf@redhat.com> Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Reviewed-by: Hanna Czenczek <hreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2023-02-17 11:22:19 +01:00
Markus Armbruster	02f95e91a6	block: Clean up includes This commit was created with scripts/clean-includes. All .c should include qemu/osdep.h first. The script performs three related cleanups: * Ensure .c files include qemu/osdep.h first. * Including it in a .h is redundant, since the .c already includes it. Drop such inclusions. * Likewise, including headers qemu/osdep.h includes is redundant. Drop these, too. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20230202133830.2152150-16-armbru@redhat.com>	2023-02-08 07:28:05 +01:00
Hanna Reitz	d570177b50	qemu-img: Change info key names for protocol nodes Currently, when querying a qcow2 image, qemu-img info reports something like this: image: test.qcow2 file format: qcow2 virtual size: 64 MiB (67108864 bytes) disk size: 196 KiB cluster_size: 65536 Format specific information: compat: 1.1 compression type: zlib lazy refcounts: false refcount bits: 16 corrupt: false extended l2: false Child node '/file': image: test.qcow2 file format: file virtual size: 192 KiB (197120 bytes) disk size: 196 KiB Format specific information: extent size hint: 1048576 Notably, the way the keys are named is specific for image files: The filename is shown under "image", the BDS driver under "file format", and the BDS length under "virtual size". This does not make much sense for nodes that are not actually supposed to be guest images, like the /file child node shown above. Give bdrv_node_info_dump() a @protocol parameter that gives a hint that the respective node is probably just used for data storage and does not necessarily present the data for a VM guest disk. This renames the keys so that with this patch, the output becomes: image: test.qcow2 [...] Child node '/file': filename: test.qcow2 protocol type: file file length: 192 KiB (197120 bytes) disk size: 196 KiB Format specific information: extent size hint: 1048576 (Perhaps we should also rename "Format specific information", but I could not come up with anything better that will not become problematic if we guess wrong with the protocol "heuristic".) This change affects iotest 302, which has protocol node information in its reference output. Signed-off-by: Hanna Reitz <hreitz@redhat.com> Message-Id: <20220620162704.80987-13-hreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2023-02-01 16:52:33 +01:00
Hanna Reitz	76c9e9750d	block/qapi: Add indentation to bdrv_node_info_dump() In order to let qemu-img info present a block graph, add a parameter to bdrv_node_info_dump() and bdrv_image_info_specific_dump() so that the information of nodes below the root level can be given an indentation. Signed-off-by: Hanna Reitz <hreitz@redhat.com> Message-Id: <20220620162704.80987-9-hreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2023-02-01 16:52:33 +01:00
Hanna Reitz	6cab33997b	block/qapi: Introduce BlockGraphInfo Introduce a new QAPI type BlockGraphInfo and an associated bdrv_query_block_graph_info() function that recursively gathers BlockNodeInfo objects through a block graph. A follow-up patch is going to make "qemu-img info" use this to print information about all nodes that are (usually implicitly) opened for a given image file. Signed-off-by: Hanna Reitz <hreitz@redhat.com> Message-Id: <20220620162704.80987-8-hreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2023-02-01 16:52:33 +01:00
Hanna Reitz	5d8813593f	block/qapi: Let bdrv_query_image_info() recurse There is no real reason why bdrv_query_image_info() should generally not recurse. The ImageInfo struct has a pointer to the backing image, so it should generally be filled, unless the caller explicitly opts out. This moves the recursing code from bdrv_block_device_info() into bdrv_query_image_info(). Signed-off-by: Hanna Reitz <hreitz@redhat.com> Message-Id: <20220620162704.80987-7-hreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2023-02-01 16:52:33 +01:00
Hanna Reitz	b1f4cd1589	qemu-img: Use BlockNodeInfo qemu-img info never uses ImageInfo's backing-image field, because it opens the backing chain one by one with BDRV_O_NO_BACKING, and prints all backing chain nodes' information consecutively. Use BlockNodeInfo to make it clear that we only print information about a single node, and that we are not using the backing-image field. Notably, bdrv_image_info_dump() does not evaluate the backing-image field, so we can easily make it take a BlockNodeInfo pointer (and consequentially rename it to bdrv_node_info_dump()). It makes more sense this way, because again, the interface now makes it syntactically clear that backing-image is ignored by this function. Signed-off-by: Hanna Reitz <hreitz@redhat.com> Message-Id: <20220620162704.80987-6-hreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2023-02-01 16:52:33 +01:00
Hanna Reitz	a2085f8909	block: Split BlockNodeInfo off of ImageInfo ImageInfo sometimes contains flat information, and sometimes it does not. Split off a BlockNodeInfo struct, which only contains information about a single node and has no link to the backing image. We do this so we can extend BlockNodeInfo to a BlockGraphInfo struct, which has links to all child nodes, not just the backing node. It would be strange to base BlockGraphInfo on ImageInfo, because then this extended struct would have two links to the backing node (one in BlockGraphInfo as one of all the child links, and one in ImageInfo). Furthermore, it is quite common to ignore the backing-image field altogether: bdrv_query_image_info() does not set it, and bdrv_image_info_dump() does not evaluate it. That signals that we should have different structs for describing a single node and one that has a link to the backing image. Still, bdrv_query_image_info() and bdrv_image_info_dump() are not changed too much in this patch. Follow-up patches will handle them. Signed-off-by: Hanna Reitz <hreitz@redhat.com> Message-Id: <20220620162704.80987-5-hreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2023-02-01 16:52:33 +01:00
Hanna Reitz	3716470b24	block: Improve empty format-specific info dump When a block driver supports obtaining format-specific information, but that object only contains optional fields, it is possible that none of them are present, so that dump_qobject() (called by bdrv_image_info_specific_dump()) will not print anything. The callers of bdrv_image_info_specific_dump() put a header above this information ("Format specific information:\n"), which will look strange when there is nothing below. Modify bdrv_image_info_specific_dump() to print this header instead of its callers, and only if there is indeed something to be printed. Signed-off-by: Hanna Reitz <hreitz@redhat.com> Message-Id: <20220620162704.80987-2-hreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2023-02-01 16:52:32 +01:00
Philippe Mathieu-Daudé	fcb9e05144	block/nbd: Add missing <qemu/bswap.h> include The inlined nbd_readXX() functions call beXX_to_cpu(), themselves declared in <qemu/bswap.h>. This fixes when refactoring: In file included from ../../block/nbd.c:44: include/block/nbd.h: In function 'nbd_read16': include/block/nbd.h:383:12: error: implicit declaration of function 'be16_to_cpu' [-Werror=implicit-function-declaration] 383 \| val = be##bits##_to_cpu(val); \ \| ^~ include/block/nbd.h:387:1: note: in expansion of macro 'DEF_NBD_READ_N' 387 \| DEF_NBD_READ_N(16) /* Defines nbd_read16(). */ \| ^~~~~~~~~~~~~~ Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-Id: <20221125175328.48539-1-philmd@linaro.org> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2023-02-01 16:52:32 +01:00
Emanuele Giuseppe Esposito	ca5e2ad98d	block: Rename bdrv_load/save_vmstate() to bdrv_co_load/save_vmstate() Since these functions always run in coroutine context, adjust their name to include "_co_", just like all other BlockDriver callbacks. No functional change intended. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230113204212.359076-15-kwolf@redhat.com> Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2023-02-01 16:52:32 +01:00
Emanuele Giuseppe Esposito	c834dc0586	block: Convert bdrv_debug_event() to co_wrapper_mixed bdrv_debug_event() is categorized as an I/O function, and it currently doesn't run in a coroutine. We should let it take a graph rdlock since it traverses the block nodes graph, which however is only possible in a coroutine. Therefore turn it into a co_wrapper_mixed to move the actual function into a coroutine where the lock can be taken. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230113204212.359076-14-kwolf@redhat.com> Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2023-02-01 16:52:32 +01:00
Emanuele Giuseppe Esposito	2c75261cc2	block: Convert bdrv_lock_medium() to co_wrapper bdrv_lock_medium() is categorized as an I/O function, and it currently doesn't run in a coroutine. We should let it take a graph rdlock since it traverses the block nodes graph, which however is only possible in a coroutine. The only caller of this function is blk_lock_medium(). Therefore make blk_lock_medium() a co_wrapper, so that it always creates a new coroutine, and then make bdrv_lock_medium() a coroutine_fn where the lock can be taken. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230113204212.359076-13-kwolf@redhat.com> Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2023-02-01 16:52:32 +01:00
Emanuele Giuseppe Esposito	2531b390fb	block: Convert bdrv_eject() to co_wrapper bdrv_eject() is categorized as an I/O function, and it currently doesn't run in a coroutine. We should let it take a graph rdlock since it traverses the block nodes graph, which however is only possible in a coroutine. The only caller of this function is blk_eject(). Therefore make blk_eject() a co_wrapper, so that it always creates a new coroutine, and then make bdrv_eject() coroutine_fn where the lock can be taken. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230113204212.359076-12-kwolf@redhat.com> Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2023-02-01 16:52:32 +01:00
Emanuele Giuseppe Esposito	3d47eb0a2a	block: Convert bdrv_get_info() to co_wrapper_mixed bdrv_get_info() is categorized as an I/O function, and it currently doesn't run in a coroutine. We should let it take a graph rdlock since it traverses the block nodes graph, which however is only possible in a coroutine. Therefore turn it into a co_wrapper to move the actual function into a coroutine where the lock can be taken. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230113204212.359076-11-kwolf@redhat.com> Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2023-02-01 16:52:32 +01:00
Emanuele Giuseppe Esposito	82618d7bc3	block: Convert bdrv_get_allocated_file_size() to co_wrapper bdrv_get_allocated_file_size() is categorized as an I/O function, and it currently doesn't run in a coroutine. We should let it take a graph rdlock since it traverses the block nodes graph, which however is only possible in a coroutine. Therefore turn it into a co_wrapper to move the actual function into a coroutine where the lock can be taken. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230113204212.359076-10-kwolf@redhat.com> Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2023-02-01 16:52:32 +01:00
Emanuele Giuseppe Esposito	c86422c554	block: Convert bdrv_refresh_total_sectors() to co_wrapper_mixed BlockDriver->bdrv_getlength is categorized as IO callback, and it currently doesn't run in a coroutine. We should let it take a graph rdlock since the callback traverses the block nodes graph, which however is only possible in a coroutine. Therefore turn it into a co_wrapper to move the actual function into a coroutine where the lock can be taken. Because now this function creates a new coroutine and polls, we need to take the AioContext lock where it is missing, for the only reason that internally co_wrapper calls AIO_WAIT_WHILE and it expects to release the AioContext lock. This is especially messy when a co_wrapper creates a coroutine and polls in bdrv_open_driver, because this function has so many callers in so many context that it can easily lead to deadlocks. Therefore the new rule for bdrv_open_driver is that the caller must always hold the AioContext lock of the given bs (except if it is a coroutine), because the function calls bdrv_refresh_total_sectors() which is now a co_wrapper. Once the rwlock is ultimated and placed in every place it needs to be, we will poll using AIO_WAIT_WHILE_UNLOCKED and remove the AioContext lock. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230113204212.359076-7-kwolf@redhat.com> Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2023-02-01 16:52:32 +01:00
Emanuele Giuseppe Esposito	c057960c4e	block: Rename refresh_total_sectors to bdrv_refresh_total_sectors The name is not good, not the least because we are going to convert this to a generated co_wrapper, which adds a _co infix after the first part of the name. No functional change intended. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230113204212.359076-6-kwolf@redhat.com> Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2023-02-01 16:52:32 +01:00
Emanuele Giuseppe Esposito	1e97be9156	block: Convert bdrv_is_inserted() to co_wrapper bdrv_is_inserted() is categorized as an I/O function, and it currently doesn't run in a coroutine. We should let it take a graph rdlock since it traverses the block nodes graph, which however is only possible in a coroutine. Therefore turn it into a co_wrapper to move the actual function into a coroutine where the lock can be taken. At the same time, add also blk_is_inserted as co_wrapper_mixed, since it is called in both coroutine and non-coroutine contexts. Because now this function creates a new coroutine and polls, we need to take the AioContext lock where it is missing, for the only reason that internally c_w_mixed_bdrv_rdlock calls AIO_WAIT_WHILE and it expects to release the AioContext lock. Once the rwlock is ultimated and placed in every place it needs to be, we will poll using AIO_WAIT_WHILE_UNLOCKED and remove the AioContext lock. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230113204212.359076-5-kwolf@redhat.com> Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2023-02-01 16:52:32 +01:00
Emanuele Giuseppe Esposito	09d9fc97f8	block: Convert bdrv_io_unplug() to co_wrapper BlockDriver->bdrv_io_unplug is categorized as IO callback, and it currently doesn't run in a coroutine. We should let it take a graph rdlock since the callback traverses the block nodes graph, which however is only possible in a coroutine. The only caller of this function is blk_io_unplug(), therefore make blk_io_unplug() a co_wrapper, so that we're always running in a coroutine where the lock can be taken. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230113204212.359076-4-kwolf@redhat.com> Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2023-02-01 16:52:32 +01:00
Emanuele Giuseppe Esposito	8f49745432	block: Convert bdrv_io_plug() to co_wrapper BlockDriver->bdrv_io_plug is categorized as IO callback, and it currently doesn't run in a coroutine. We should let it take a graph rdlock since the callback traverses the block nodes graph, which however is only possible in a coroutine. The only caller of this function is blk_io_plug(), therefore make blk_io_plug() a co_wrapper, so that we're always running in a coroutine where the lock can be taken. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230113204212.359076-3-kwolf@redhat.com> Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2023-02-01 16:52:32 +01:00
Paolo Bonzini	3d65110f0c	block: remove bdrv_coroutine_enter It has only one caller---inline it and remove the function. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20221215130225.476477-2-pbonzini@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2023-01-24 18:26:41 +01:00
Alberto Faria	0f3de970fe	block: Add no_coroutine_fn and coroutine_mixed_fn marker Add more annotations to functions, describing valid and invalid calls from coroutine to non-coroutine context. When applied to a function, no_coroutine_fn advertises that it should not be called from coroutine_fn functions. This can be because the function blocks or, in the case of generated_co_wrapper, to enforce that coroutine_fn functions directly call the coroutine_fn that backs the generated_co_wrapper. coroutine_mixed_fn instead is for function that can be called in both coroutine and non-coroutine context, but will suspend when called in coroutine context. Annotating them is a first step towards enforcing that non-annotated functions are absolutely not going to suspend. These can be used for example with the vrc tool: # find functions that really cannot be called from no_coroutine_fn (vrc) load --loader clang libblock.fa.p/meson-generated_.._block_block-gen.c.o (vrc) paths [no_coroutine_fn,!coroutine_mixed_fn] bdrv_remove_persistent_dirty_bitmap bdrv_create bdrv_can_store_new_dirty_bitmap # find how coroutine_fns end up calling a mixed function (vrc) load --loader clang --force libblock.fa.p/.c.o (vrc) paths [coroutine_fn] [!no_coroutine_fn] [coroutine_mixed_fn] ... bdrv_pread <- vhdx_log_write <- vhdx_log_write_and_flush <- vhdx_co_writev ... Signed-off-by: Alberto Faria <afaria@redhat.com> [Rebase, add coroutine_mixed_fn. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20221216110758.559947-3-pbonzini@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2023-01-24 18:26:41 +01:00
Markus Armbruster	e2c1c34f13	include/block: Untangle inclusion loops We have two inclusion loops: block/block.h -> block/block-global-state.h -> block/block-common.h -> block/blockjob.h -> block/block.h block/block.h -> block/block-io.h -> block/block-common.h -> block/blockjob.h -> block/block.h I believe these go back to Emanuele's reorganization of the block API, merged a few months ago in commit `d7e2fe4aac`. Fortunately, breaking them is merely a matter of deleting unnecessary includes from headers, and adding them back in places where they are now missing. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20221221133551.3967339-2-armbru@redhat.com>	2023-01-20 07:24:28 +01:00
Markus Armbruster	4369560135	coroutine: Use Coroutine typedef name instead of structure tag Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-Id: <20221221131435.3851212-6-armbru@redhat.com>	2023-01-20 07:23:45 +01:00
Markus Armbruster	68ba85cecc	coroutine: Split qemu/coroutine-core.h off qemu/coroutine.h qemu/coroutine.h and qemu/lockable.h include each other. They need each other only in macro expansions, so we could simply drop both inclusions to break the loop, and add suitable includes to files that expand the macros. Instead, move a part of qemu/coroutine.h to new qemu/coroutine-core.h so that qemu/coroutine-core.h doesn't need qemu/lockable.h, and qemu/lockable.h only needs qemu/coroutine-core.h. Result: qemu/coroutine.h includes qemu/lockable.h includes qemu/coroutine-core.h. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20221221131435.3851212-5-armbru@redhat.com> [Semantic rebase conflict with `7c10cb38cc` "accel/tcg: Add debuginfo support" resolved]	2023-01-20 07:21:46 +01:00
Markus Armbruster	af7f8eb591	coroutine: Move coroutine_fn to qemu/osdep.h, trim includes block/block-hmp-cmds.h and qemu/co-shared-resource.h use coroutine_fn without including qemu/coroutine.h. They compile only if it's already included from elsewhere. I could fix that, but pulling in qemu/coroutine.h and everything it includes just for a macro that expands into nothing feels silly. Instead, move the macro to qemu/osdep.h. Inclusions of qemu/coroutine.h just for coroutine_fn become superfluous. Drop them. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20221221131435.3851212-3-armbru@redhat.com>	2023-01-19 10:18:28 +01:00
Markus Armbruster	2379247810	coroutine: Clean up superfluous inclusion of qemu/coroutine.h Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20221221131435.3851212-2-armbru@redhat.com>	2023-01-19 10:18:28 +01:00
Kevin Wolf	1b3ff9feb9	block: GRAPH_RDLOCK for functions only called by co_wrappers The generated coroutine wrappers already take care to take the lock in the non-coroutine path, and assume that the lock is already taken in the coroutine path. The only thing we need to do for the wrapped function is adding the GRAPH_RDLOCK annotation. Doing so also allows us to mark the corresponding callbacks in BlockDriver as GRAPH_RDLOCK_PTR. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20221207131838.239125-19-kwolf@redhat.com> Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2022-12-15 16:08:23 +01:00
Emanuele Giuseppe Esposito	90830f5950	block: use co_wrapper_mixed_bdrv_rdlock in functions taking the rdlock Take the rdlock already, before we add the assertions. All these functions either read the graph recursively, or call BlockDriver callbacks that will eventually need to be protected by the graph rdlock. Do it now to all functions together, because many of these recursively call each other. For example, bdrv_co_truncate calls BlockDriver->bdrv_co_truncate, and some driver callbacks implement their own .bdrv_co_truncate by calling bdrv_flush inside. So if bdrv_flush asserts but bdrv_truncate does not take the rdlock yet, the assertion will always fail. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20221207131838.239125-18-kwolf@redhat.com> Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2022-12-15 16:08:23 +01:00
Emanuele Giuseppe Esposito	e6d3f7a602	block-coroutine-wrapper.py: introduce annotations that take the graph rdlock Add co_wrapper_bdrv_rdlock and co_wrapper_mixed_bdrv_rdlock option to the block-coroutine-wrapper.py script. This "_bdrv_rdlock" option takes and releases the graph rdlock when a coroutine function is created. This means that when used together with "_mixed", the function marked with co_wrapper_mixed_bdrv_rdlock will support both coroutine and non-coroutine case, and in the latter case it will create a coroutine that takes and releases the rdlock. When called from a coroutine, the caller must already hold the graph lock. Example: void co_wrapper_mixed_bdrv_rdlock bdrv_f1(); Becomes static void bdrv_co_enter_f1() { bdrv_graph_co_rdlock(); bdrv_co_function(); bdrv_graph_co_rdunlock(); } void bdrv_f1() { if (qemu_in_coroutine) { assume_graph_lock(); bdrv_co_function(); } else { qemu_co_enter(bdrv_co_enter_f1); ... } } When used alone, the function will not work in coroutine context, and when called in non-coroutine context it will create a new coroutine that takes care of taking and releasing the rdlock automatically. Example: void co_wrapper_bdrv_rdlock bdrv_f1(); Becomes static void bdrv_co_enter_f1() { bdrv_graph_co_rdlock(); bdrv_co_function(); bdrv_graph_co_rdunlock(); } void bdrv_f1() { assert(!qemu_in_coroutine()); qemu_co_enter(bdrv_co_enter_f1); ... } About their usage: - co_wrapper does not take the rdlock, so it can be used also outside the block layer. - co_wrapper_mixed will be used by many blk_* functions, since the coroutine function needs to call blk_wait_while_drained() and the rdlock must be taken afterwards, otherwise it's a deadlock. In the future this annotation will go away, and blk_* will use co_wrapper directly. - co_wrapper_bdrv_rdlock will be used by BlockDriver callbacks, ideally by all of them in the future. - co_wrapper_mixed_bdrv_rdlock will be used by the remaining functions that are still called by coroutine and non-coroutine context. In the future this annotation will go away, as we will split such mixed functions. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20221207131838.239125-17-kwolf@redhat.com> Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2022-12-15 16:08:23 +01:00
Kevin Wolf	303de47b2c	Mark assert_bdrv_graph_readable/writable() GRAPH_RD/WRLOCK Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20221207131838.239125-16-kwolf@redhat.com> Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2022-12-15 16:08:23 +01:00
Kevin Wolf	4002ffdc4f	graph-lock: TSA annotations for lock/unlock functions Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20221207131838.239125-15-kwolf@redhat.com> Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2022-12-15 16:08:23 +01:00
Emanuele Giuseppe Esposito	3f35f82e04	block: assert that graph read and writes are performed correctly Remove the old assert_bdrv_graph_writable, and replace it with the new version using graph-lock API. See the function documentation for more information. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20221207131838.239125-14-kwolf@redhat.com> Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2022-12-15 16:08:23 +01:00
Emanuele Giuseppe Esposito	8aa77000c2	graph-lock: Implement guard macros Similar to the implementation in lockable.h, implement macros to automatically take and release the rdlock. Create the empty GraphLockable and GraphLockableMainloop structs only to use it as a type for G_DEFINE_AUTOPTR_CLEANUP_FUNC. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20221207131838.239125-4-kwolf@redhat.com> Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2022-12-15 16:07:43 +01:00
Paolo Bonzini	aead9dc9d1	graph-lock: Introduce a lock to protect block graph operations Block layer graph operations are always run under BQL in the main loop. This is proved by the assertion qemu_in_main_thread() and its wrapper macro GLOBAL_STATE_CODE. However, there are also concurrent coroutines running in other iothreads that always try to traverse the graph. Currently this is protected (among various other things) by the AioContext lock, but once this is removed, we need to make sure that reads do not happen while modifying the graph. We distinguish between writer (main loop, under BQL) that modifies the graph, and readers (all other coroutines running in various AioContext), that go through the graph edges, reading ->parents and->children. The writer (main loop) has "exclusive" access, so it first waits for any current read to finish, and then prevents incoming ones from entering while it has the exclusive access. The readers (coroutines in multiple AioContext) are free to access the graph as long the writer is not modifying the graph. In case it is, they go in a CoQueue and sleep until the writer is done. If a coroutine changes AioContext, the counter in the original and new AioContext are left intact, since the writer does not care where the reader is, but only if there is one. As a result, some AioContexts might have a negative reader count, to balance the positive count of the AioContext that took the lock. This also means that when an AioContext is deleted it may have a nonzero reader count. In that case we transfer the count to a global shared counter so that the writer is always aware of all readers. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20221207131838.239125-3-kwolf@redhat.com> Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2022-12-15 16:07:43 +01:00
Kevin Wolf	da0bd74434	block: Factor out bdrv_drain_all_begin_nopoll() Provide a separate function that just quiesces the users of a node to prevent new requests from coming in, but without waiting for the already in-flight I/O to complete. This function can be used in contexts where polling is not allowed. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20221207131838.239125-2-kwolf@redhat.com> Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2022-12-15 16:07:43 +01:00
Emanuele Giuseppe Esposito	0508d0be4b	block/dirty-bitmap: convert coroutine-only functions to co_wrapper bdrv_can_store_new_dirty_bitmap and bdrv_remove_persistent_dirty_bitmap check if they are running in a coroutine, directly calling the coroutine callback if it's the case. Except that no coroutine calls such functions, therefore that check can be removed, and function creation can be offloaded to c_w. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Message-Id: <20221128142337.657646-15-eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2022-12-15 16:07:43 +01:00
Emanuele Giuseppe Esposito	741443eb43	block: convert bdrv_create to co_wrapper This function is never called in coroutine context, therefore instead of manually creating a new coroutine, delegate it to the block-coroutine-wrapper script, defining it as co_wrapper. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Message-Id: <20221128142337.657646-14-eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2022-12-15 16:07:43 +01:00
Emanuele Giuseppe Esposito	76a2f554c1	block-coroutine-wrapper.py: introduce co_wrapper This new annotation starts just a function wrapper that creates a new coroutine. It assumes the caller is not a coroutine. It will be the default annotation to be used in the future. This is much better as c_w_mixed, because it is clear if the caller is a coroutine or not, and provides the advantage of automating the code creation. In the future all c_w_mixed functions will be substituted by co_wrapper. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Message-Id: <20221128142337.657646-11-eesposit@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2022-12-15 16:07:43 +01:00
Emanuele Giuseppe Esposito	1bd542016c	block: rename generated_co_wrapper in co_wrapper_mixed In preparation to the incoming new function specifiers, rename g_c_w with a more meaningful name and document it. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Message-Id: <20221128142337.657646-10-eesposit@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2022-12-15 16:07:43 +01:00
Emanuele Giuseppe Esposito	2475a0d0f4	block: bdrv_create_file is a coroutine_fn It is always called in coroutine_fn callbacks, therefore it can directly call bdrv_co_create(). Rename it to bdrv_co_create_file too. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Message-Id: <20221128142337.657646-9-eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2022-12-15 16:07:43 +01:00
Emanuele Giuseppe Esposito	43a0d4f08b	block-copy: add coroutine_fn annotations These functions end up calling bdrv_common_block_status_above(), a generated_co_wrapper function. In addition, they also happen to be always called in coroutine context, meaning all callers are coroutine_fn. This means that the g_c_w function will enter the qemu_in_coroutine() case and eventually suspend (or in other words call qemu_coroutine_yield()). Therefore we can mark such functions coroutine_fn too. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Message-Id: <20221128142337.657646-3-eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2022-12-15 16:07:43 +01:00
Emanuele Giuseppe Esposito	7b52a921c1	block-io: introduce coroutine_fn duplicates for bdrv_common_block_status_above callers bdrv_common_block_status_above() is a g_c_w, and it is being called by many "wrapper" functions like bdrv_is_allocated(), bdrv_is_allocated_above() and bdrv_block_status_above(). Because we want to eventually split the coroutine from non-coroutine case in g_c_w, create duplicate wrappers that take care of directly calling the same coroutine functions called in the g_c_w. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Message-Id: <20221128142337.657646-2-eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2022-12-15 16:07:43 +01:00
Kevin Wolf	606ed756c1	block: Remove poll parameter from bdrv_parent_drained_begin_single() All callers of bdrv_parent_drained_begin_single() pass poll=false now, so we don't need the parameter any more. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20221118174110.55183-16-kwolf@redhat.com> Reviewed-by: Hanna Reitz <hreitz@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2022-12-15 16:07:42 +01:00
Kevin Wolf	2398747128	block: Don't poll in bdrv_replace_child_noperm() In order to make sure that bdrv_replace_child_noperm() doesn't have to poll any more, get rid of the bdrv_parent_drained_begin_single() call. This is possible now because we can require that the parent is already drained through the child in question when the function is called and we don't call the parent drain callbacks more than once. The additional drain calls needed in callers cause the test case to run its code in the drain handler too early (bdrv_attach_child() drains now), so modify it to only enable the code after the test setup has completed. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20221118174110.55183-15-kwolf@redhat.com> Reviewed-by: Hanna Reitz <hreitz@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2022-12-15 16:07:42 +01:00
Kevin Wolf	a82a3bd135	block: Remove ignore_bds_parents parameter from drain_begin/end. ignore_bds_parents is now ignored during drain_begin and drain_end, so we can just remove it there. It is still a valid optimisation for drain_all in bdrv_drained_poll(), so leave it around there. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20221118174110.55183-13-kwolf@redhat.com> Reviewed-by: Hanna Reitz <hreitz@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2022-12-15 16:07:42 +01:00
Kevin Wolf	57e05be343	block: Call drain callbacks only once We only need to call both the BlockDriver's callback and the parent callbacks when going from undrained to drained or vice versa. A second drain section doesn't make a difference for the driver or the parent, they weren't supposed to send new requests before and after the second drain. One thing that gets in the way is the 'ignore_bds_parents' parameter in bdrv_do_drained_begin_quiesce() and bdrv_do_drained_end(): It means that bdrv_drain_all_begin() increases bs->quiesce_counter, but does not quiesce the parent through BdrvChildClass callbacks. If an additional drain section is started now, bs->quiesce_counter will be non-zero, but we would still need to quiesce the parent through BdrvChildClass in order to keep things consistent (and unquiesce it on the matching bdrv_drained_end(), even though the counter would not reach 0 yet as long as the bdrv_drain_all() section is still active). Instead of keeping track of this, let's just get rid of the parameter. It was introduced in commit `6cd5c9d7b2` as an optimisation so that during bdrv_drain_all(), we wouldn't recursively drain all parents up to the root for each node, resulting in quadratic complexity. As it happens, calling the callbacks only once solves the same problem, so as of this patch, we'll still have O(n) complexity and ignore_bds_parents is not needed any more. This patch only ignores the 'ignore_bds_parents' parameter. It will be removed in a separate patch. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Hanna Reitz <hreitz@redhat.com> Message-Id: <20221118174110.55183-12-kwolf@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2022-12-15 16:07:42 +01:00
Kevin Wolf	299403aeda	block: Remove subtree drains Subtree drains are not used any more. Remove them. After this, BdrvChildClass.attach/detach() don't poll any more. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Reviewed-by: Hanna Reitz <hreitz@redhat.com> Message-Id: <20221118174110.55183-11-kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2022-12-15 16:07:42 +01:00
Kevin Wolf	92140b9f3f	stream: Replace subtree drain with a single node drain The subtree drain was introduced in commit `b1e1af394d` as a way to avoid graph changes between finding the base node and changing the block graph as necessary on completion of the image streaming job. The block graph could change between these two points because bdrv_set_backing_hd() first drains the parent node, which involved polling and can do anything. Subtree draining was an imperfect way to make this less likely (because with it, fewer callbacks are called during this window). Everyone agreed that it's not really the right solution, and it was only committed as a stopgap solution. This replaces the subtree drain with a solution that simply drains the parent node before we try to find the base node, and then call a version of bdrv_set_backing_hd() that doesn't drain, but just asserts that the parent node is already drained. This way, any graph changes caused by draining happen before we start looking at the graph and things stay consistent between finding the base node and changing the graph. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Reviewed-by: Hanna Reitz <hreitz@redhat.com> Message-Id: <20221118174110.55183-10-kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2022-12-15 16:07:42 +01:00
Kevin Wolf	2f65df6e16	block: Remove drained_end_counter drained_end_counter is unused now, nobody changes its value any more. It can be removed. In cases where we had two almost identical functions that only differed in whether the caller passes drained_end_counter, or whether they would poll for a local drained_end_counter to reach 0, these become a single function. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20221118174110.55183-5-kwolf@redhat.com> Reviewed-by: Hanna Reitz <hreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2022-12-15 16:07:42 +01:00
Kevin Wolf	5e8ac21717	block: Revert .bdrv_drained_begin/end to non-coroutine_fn Polling during bdrv_drained_end() can be problematic (and in the future, we may get cases for bdrv_drained_begin() where polling is forbidden, and we don't care about already in-flight requests, but just want to prevent new requests from arriving). The .bdrv_drained_begin/end callbacks running in a coroutine is the only reason why we have to do this polling, so make them non-coroutine callbacks again. None of the callers actually yield any more. This means that bdrv_drained_end() effectively doesn't poll any more, even if AIO_WAIT_WHILE() loops are still there (their condition is false from the beginning). This is generally not a problem, but in test-bdrv-drain, some additional explicit aio_poll() calls need to be added because the test case wants to verify the final state after BHs have executed. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Reviewed-by: Hanna Reitz <hreitz@redhat.com> Message-Id: <20221118174110.55183-4-kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2022-12-15 16:07:42 +01:00
Hanna Reitz	d5f8d79c2f	block: Make bdrv_child_get_parent_aio_context I/O We want to use bdrv_child_get_parent_aio_context() from bdrv_parent_drained_{begin,end}_single(), both of which are "I/O or GS" functions. Prior to `3ed4f708fe`, all the implementations were I/O code anyway. `3ed4f708fe` has put block jobs' AioContext field under the job mutex, so to make child_job_get_parent_aio_context() work in an I/O context, we need to take that lock there. Furthermore, blk_root_get_parent_aio_context() is not marked as anything, but is safe to run in an I/O context, so mark it that way now. (blk_get_aio_context() is an I/O code function.) With that done, all implementations explicitly are I/O code, so we can mark bdrv_child_get_parent_aio_context() as I/O code, too, so callers know it is safe to run from both GS and I/O contexts. Signed-off-by: Hanna Reitz <hreitz@redhat.com> Message-Id: <20221107151321.211175-2-hreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2022-11-10 14:58:34 +01:00
Stefan Hajnoczi	d5ab9490cd	Block layer patches - Cleanup bs->backing and bs->file handling - Refactor bdrv_try_set_aio_context using transactions - Changes for improved coroutine_fn consistency - vhost-user-blk: fix the resize crash - io_uring: Use of io_uring_register_ring_fd() led to breakage, revert - vvfat: Fix some problems with r/w mode - Code cleanup - MAINTAINERS: Fold "Block QAPI, monitor, ..." into "Block layer core" -----BEGIN PGP SIGNATURE----- iQJFBAABCAAvFiEE3D3rFZqa+V09dFb+fwmycsiPL9YFAmNazhIRHGt3b2xmQHJl ZGhhdC5jb20ACgkQfwmycsiPL9ZyTw/8Dfck/SuxfyeLlnQItkjaV4cnqWOU8vHs 9x0KhlptCs+HXdF/3iicpA0lHojn7mNnbdFGjPRY4E0LriQv91TQ5ycdEmrseFPf sgeQlgdKCVU/pHjZ2wYarm2pE43Cx85a5xuufmw+7w49dNNZn14l4t+DgviuClVM nuVaogfZFbYyetre+Qd2TgLl+gJ+0d4o7Zs5lSWLrT8t0L9AGkcWPA7Nrbl6loIE dOautV4G7jLjuMiCeJZOGcnuRVe3gCQ5rCGBFzzH4DUtz4BmiYx4hd3LMEsP0PMM CrsfDZS04Ztybl9M7TmJuwkAm1gx1JDMOuJuh18lbJocIOBvhkKKxY2wI5LIdZVI ZntmU36RowkX+GGu/PYpYyMjBDClJppZCl7vnjyLYsVt6r0Vu6SmlHpJhcRYabhe 96Kv1LXH9A6+ogKPU3Layw6JGjg01GNr1ALuT7PO3pGto/JshmOuBEJJDucoF84M 5AfxFCohMROVldwblA6M0eKnlQBgtr5BvtgbV54BBo88VlFJgDJFQn7R09cTFUEo UwaJoS+nIaiZ0bQQVZhZloVppUaTdVJojzfVRCZZctga96/tu1HSFnGLnbEFpUN3 KOf+XnVNS6Ro+nPSDf9bMjbIom2JicGFfV+6yMgIoxY/d5UA2dTZfefil4TAlSod 6PsTgg+jrm8= =/Fw0 -----END PGP SIGNATURE----- Merge tag 'for-upstream' of https://repo.or.cz/qemu/kevin into staging Block layer patches - Cleanup bs->backing and bs->file handling - Refactor bdrv_try_set_aio_context using transactions - Changes for improved coroutine_fn consistency - vhost-user-blk: fix the resize crash - io_uring: Use of io_uring_register_ring_fd() led to breakage, revert - vvfat: Fix some problems with r/w mode - Code cleanup - MAINTAINERS: Fold "Block QAPI, monitor, ..." into "Block layer core" # -----BEGIN PGP SIGNATURE----- # # iQJFBAABCAAvFiEE3D3rFZqa+V09dFb+fwmycsiPL9YFAmNazhIRHGt3b2xmQHJl # ZGhhdC5jb20ACgkQfwmycsiPL9ZyTw/8Dfck/SuxfyeLlnQItkjaV4cnqWOU8vHs # 9x0KhlptCs+HXdF/3iicpA0lHojn7mNnbdFGjPRY4E0LriQv91TQ5ycdEmrseFPf # sgeQlgdKCVU/pHjZ2wYarm2pE43Cx85a5xuufmw+7w49dNNZn14l4t+DgviuClVM # nuVaogfZFbYyetre+Qd2TgLl+gJ+0d4o7Zs5lSWLrT8t0L9AGkcWPA7Nrbl6loIE # dOautV4G7jLjuMiCeJZOGcnuRVe3gCQ5rCGBFzzH4DUtz4BmiYx4hd3LMEsP0PMM # CrsfDZS04Ztybl9M7TmJuwkAm1gx1JDMOuJuh18lbJocIOBvhkKKxY2wI5LIdZVI # ZntmU36RowkX+GGu/PYpYyMjBDClJppZCl7vnjyLYsVt6r0Vu6SmlHpJhcRYabhe # 96Kv1LXH9A6+ogKPU3Layw6JGjg01GNr1ALuT7PO3pGto/JshmOuBEJJDucoF84M # 5AfxFCohMROVldwblA6M0eKnlQBgtr5BvtgbV54BBo88VlFJgDJFQn7R09cTFUEo # UwaJoS+nIaiZ0bQQVZhZloVppUaTdVJojzfVRCZZctga96/tu1HSFnGLnbEFpUN3 # KOf+XnVNS6Ro+nPSDf9bMjbIom2JicGFfV+6yMgIoxY/d5UA2dTZfefil4TAlSod # 6PsTgg+jrm8= # =/Fw0 # -----END PGP SIGNATURE----- # gpg: Signature made Thu 27 Oct 2022 14:29:38 EDT # gpg: using RSA key DC3DEB159A9AF95D3D7456FE7F09B272C88F2FD6 # gpg: issuer "kwolf@redhat.com" # gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full] # Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6 * tag 'for-upstream' of https://repo.or.cz/qemu/kevin: (58 commits) block/block-backend: blk_set_enable_write_cache is IO_CODE monitor: switch to _co_ functions vmdk: switch to _co_ functions vhdx: switch to _co_ functions vdi: switch to _co_ functions qed: switch to _co_ functions qcow2: switch to _co_ functions qcow: switch to _co_ functions parallels: switch to _co_ functions mirror: switch to _co_ functions block: switch to _co_ functions commit: switch to _co_ functions vmdk: manually add more coroutine_fn annotations qcow2: manually add more coroutine_fn annotations qcow: manually add more coroutine_fn annotations blkdebug: add missing coroutine_fn annotation for indirect-called functions qcow2: add coroutine_fn annotation for indirect-called functions block: add missing coroutine_fn annotation to BlockDriverState callbacks coroutine-io: add missing coroutine_fn annotation to prototypes coroutine-lock: add missing coroutine_fn annotation to prototypes ... Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2022-10-30 15:15:12 -04:00
Alberto Faria	c2d7680893	block: add missing coroutine_fn annotation to BlockDriverState callbacks Signed-off-by: Alberto Faria <afaria@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20221013123711.620631-9-pbonzini@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2022-10-27 20:14:11 +02:00
Alberto Faria	16bb776f5b	block: add missing coroutine_fn annotation to prototypes The functions are marked coroutine_fn in the definition. Signed-off-by: Alberto Faria <afaria@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20221013123711.620631-6-pbonzini@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2022-10-27 20:14:11 +02:00
Alberto Faria	6894ee2bee	monitor: add missing coroutine_fn annotation hmp_block_resize and hmp_screendump are defined as a ".coroutine = true" command, so they must be coroutine_fn. Signed-off-by: Alberto Faria <afaria@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20221013123711.620631-4-pbonzini@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2022-10-27 20:14:11 +02:00
Emanuele Giuseppe Esposito	142e690712	block: remove bdrv_try_set_aio_context and replace it with bdrv_try_change_aio_context No functional change intended. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20221025084952.2139888-11-eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2022-10-27 20:14:11 +02:00
Emanuele Giuseppe Esposito	a41cfda126	block: rename bdrv_child_try_change_aio_context in bdrv_try_change_aio_context No functional changes intended. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20221025084952.2139888-10-eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2022-10-27 20:14:11 +02:00
Emanuele Giuseppe Esposito	d2aafbb68a	block: remove all unused ->can_set_aio_ctx and ->set_aio_ctx callbacks Together with all _can_set_ and _set_ APIs, as they are not needed anymore. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20221025084952.2139888-9-eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2022-10-27 20:14:11 +02:00
Emanuele Giuseppe Esposito	e08cc001d5	bdrv_change_aio_context: use hash table instead of list of visited nodes Minor performance improvement, but given that we have hash tables available, avoid iterating in the visited nodes list every time just to check if a node has been already visited. The data structure is not actually a proper hash map, but an hash set, as we are just adding nodes and not key,value pairs. Suggested-by: Hanna Reitz <hreitz@redhat.com> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20221025084952.2139888-4-eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2022-10-27 20:14:11 +02:00
Emanuele Giuseppe Esposito	7e8c182fb5	block: use transactions as a replacement of ->{can_}set_aio_context() Simplify the way the aiocontext can be changed in a BDS graph. There are currently two problems in bdrv_try_set_aio_context: - There is a confusion of AioContext locks taken and released, because we assume that old aiocontext is always taken and new one is taken inside. - It doesn't look very safe to call bdrv_drained_begin while some nodes have already switched to the new aiocontext and others haven't. This could be especially dangerous because bdrv_drained_begin polls, so something else could be executed while graph is in an inconsistent state. Additional minor nitpick: can_set and set_ callbacks both traverse the graph, both using the ignored list of visited nodes in a different way. Therefore, get rid of all of this and introduce a new callback, change_aio_context, that uses transactions to efficiently, cleanly and most importantly safely change the aiocontext of a graph. This new callback is a "merge" of the two previous ones: - Just like can_set_aio_context, recursively traverses the graph. Marks all nodes that are visited using a GList, and checks if they could change the aio_context. - For each node that passes the above check, drain it and add a new transaction that implements a callback that effectively changes the aiocontext. - Once done, the recursive function returns if all nodes can change the AioContext. If so, commit the above transactions. Regardless of the outcome, call transaction.clean() to undo all drains done in the recursion. - The transaction list is scanned only after all nodes are being drained, so we are sure that they all are in the same context, and then we switch their AioContext, concluding the drain only after all nodes switched to the new AioContext. In this way we make sure that bdrv_drained_begin() is always called under the old AioContext, and bdrv_drained_end() under the new one. - Because of the above, we don't need to release and re-acquire the old AioContext every time, as everything is done once (and not per-node drain and aiocontext change). Note that the "change" API is not yet invoked anywhere. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20221025084952.2139888-3-eesposit@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2022-10-27 20:14:11 +02:00
Vladimir Sementsov-Ogievskiy	5bb0474778	block: Manipulate bs->file / bs->backing pointers in .attach/.detach bs->file and bs->backing are a kind of duplication of part of bs->children. But very useful diplication, so let's not drop them at all:) We should manage bs->file and bs->backing in same place, where we manage bs->children, to keep them in sync. Moreover, generic io paths are unprepared to BdrvChild without a bs, so it's double good to clear bs->file / bs->backing when we detach the child. Detach is simple: if we detach bs->file or bs->backing child, just set corresponding field to NULL. Attach is a bit more complicated. But we still can precisely detect should we set one of bs->file / bs->backing or not: - if role is BDRV_CHILD_COW, we definitely deal with bs->backing - else, if role is BDRV_CHILD_FILTERED (it must be also BDRV_CHILD_PRIMARY), it's a filtered child. Use bs->drv->filtered_child_is_backing to chose the pointer field to modify. - else, if role is BDRV_CHILD_PRIMARY, we deal with bs->file - in all other cases, it's neither bs->backing nor bs->file. It's some other child and we shouldn't care OK. This change brings one more good thing: we can (and should) get rid of all indirect pointers in the block-graph-change transactions: bdrv_attach_child_common() stores BdrvChild into transaction to clear it on abort. bdrv_attach_child_common() has two callers: bdrv_attach_child_noperm() just pass-through this feature, bdrv_root_attach_child() doesn't need the feature. Look at bdrv_attach_child_noperm() callers: - bdrv_attach_child() doesn't need the feature - bdrv_set_file_or_backing_noperm() uses the feature to manage bs->file and bs->backing, we don't want it anymore - bdrv_append() uses the feature to manage bs->backing, again we don't want it anymore So, we should drop this stuff! Great! We could probably keep BdrvChild argument to keep the int return value, but it seems not worth the complexity. Finally, we now set .file / .backing automatically in generic code and want to restring setting them by hand outside of .attach/.detach. So, this patch cleanups all remaining places where they were set. To find such places I use: git grep '\->file =' git grep '\->backing =' git grep '&.\<backing\>' git grep '&.\<file\>' Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Reviewed-by: Hanna Reitz <hreitz@redhat.com> Message-Id: <20220726201134.924743-14-vsementsov@yandex-team.ru> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2022-10-27 20:14:11 +02:00
Vladimir Sementsov-Ogievskiy	71ca43852a	block: document connection between child roles and bs->backing/bs->file Make the informal rules formal. In further commit we'll add corresponding assertions. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Reviewed-by: Hanna Reitz <hreitz@redhat.com> Message-Id: <20220726201134.924743-8-vsementsov@yandex-team.ru> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2022-10-27 20:14:11 +02:00
Vladimir Sementsov-Ogievskiy	1921b4f786	test-bdrv-graph-mod: fix filters to be filters bdrv_pass_through is used as filter, even all node variables has corresponding names. We want to append it, so it should be backing-child-based filter like mirror_top. So, in test_update_perm_tree, first child should be DATA, as we don't want filters with two filtered children. bdrv_exclusive_writer is used as a filter once. So it should be filter anyway. We want to append it, so it should be backing-child-based fitler too. Make all FILTERED children to be PRIMARY as well. We are going to force this rule by assertion soon. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Reviewed-by: Hanna Reitz <hreitz@redhat.com> Message-Id: <20220726201134.924743-7-vsementsov@yandex-team.ru> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2022-10-27 20:14:11 +02:00
Vladimir Sementsov-Ogievskiy	8393078032	block: introduce bdrv_open_file_child() helper Almost all drivers call bdrv_open_child() similarly. Let's create a helper for this. The only not updated drivers that call bdrv_open_child() to set bs->file are raw-format and snapshot-access: raw-format sometimes want to have filtered child but don't set drv->is_filter to true. snapshot-access wants only DATA \| PRIMARY Possibly we should implement drv->is_filter_func() handler, to consider raw-format as filter when it works as filter.. But it's another story. Note also, that we decrease assignments to bs->file in code: it helps us restrict modifying this field in further commit. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Reviewed-by: Hanna Reitz <hreitz@redhat.com> Message-Id: <20220726201134.924743-3-vsementsov@yandex-team.ru> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2022-10-27 20:14:11 +02:00
Vladimir Sementsov-Ogievskiy	046fd84fac	block: BlockDriver: add .filtered_child_is_backing field Unfortunately not all filters use .file child as filtered child. Two exclusions are mirror_top and commit_top. Happily they both are private filters. Bad thing is that this inconsistency is observable through qmp commands query-block / query-named-block-nodes. So, could we just change mirror_top and commit_top to use file child as all other filter driver is an open question. Probably, we could do that with some kind of deprecation period, but how to warn users during it? For now, let's just add a field so we can distinguish them in generic code, it will be used in further commits. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Reviewed-by: Hanna Reitz <hreitz@redhat.com> Message-Id: <20220726201134.924743-2-vsementsov@yandex-team.ru> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2022-10-27 20:14:11 +02:00
Bin Meng	69fbfff95e	block: Refactor get_tmp_filename() At present there are two callers of get_tmp_filename() and they are inconsistent. One does: /* TODO: extra byte is a hack to ensure MAX_PATH space on Windows. / char tmp_filename = g_malloc0(PATH_MAX + 1); ... ret = get_tmp_filename(tmp_filename, PATH_MAX + 1); while the other does: s->qcow_filename = g_malloc(PATH_MAX); ret = get_tmp_filename(s->qcow_filename, PATH_MAX); As we can see different 'size' arguments are passed. There are also platform specific implementations inside the function, and the use of snprintf is really undesirable. The function name is also misleading. It creates a temporary file, not just a filename. Refactor this routine by changing its name and signature to: char create_tmp_file(Error *errp) and use g_get_tmp_dir() / g_mkstemp() for a consistent implementation. While we are here, add some comments to mention that /var/tmp is preferred over /tmp on non-win32 hosts. Signed-off-by: Bin Meng <bin.meng@windriver.com> Message-Id: <20221010040432.3380478-2-bin.meng@windriver.com> [kwolf: Fixed incorrect errno negation and iotest 051] Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2022-10-27 20:13:32 +02:00
Stefan Hajnoczi	f4ec04bae9	block: return errors from bdrv_register_buf() Registering an I/O buffer is only a performance optimization hint but it is still necessary to return errors when it fails. Later patches will need to detect errors when registering buffers but an immediate advantage is that error_report() calls are no longer needed in block driver .bdrv_register_buf() functions. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20221013185908.1297568-8-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2022-10-26 14:56:42 -04:00
Stefan Hajnoczi	e8b6535533	block: add BDRV_REQ_REGISTERED_BUF request flag Block drivers may optimize I/O requests accessing buffers previously registered with bdrv_register_buf(). Checking whether all elements of a request's QEMUIOVector are within previously registered buffers is expensive, so we need a hint from the user to avoid costly checks. Add a BDRV_REQ_REGISTERED_BUF request flag to indicate that all QEMUIOVector elements in an I/O request are known to be within previously registered buffers. Always pass the flag through to driver read/write functions. There is little harm in passing the flag to a driver that does not use it. Passing the flag to drivers avoids changes across many block drivers. Filter drivers would need to explicitly support the flag and pass through to their children when the children support it. That's a lot of code changes and it's hard to remember to do that everywhere, leading to silent reduced performance when the flag is accidentally dropped. The only problematic scenario with the approach in this patch is when a driver passes the flag through to internal I/O requests that don't use the same I/O buffer. In that case the hint may be set when it should actually be clear. This is a rare case though so the risk is low. Some drivers have assert(!flags), which no longer works when BDRV_REQ_REGISTERED_BUF is passed in. These assertions aren't very useful anyway since the functions are called almost exclusively by bdrv_driver_preadv/pwritev() so if we get flags handling right there then the assertion is not needed. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20221013185908.1297568-7-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2022-10-26 14:56:42 -04:00
Stefan Hajnoczi	98b3ddc78b	block: use BdrvRequestFlags type for supported flag fields Use the enum type so GDB displays the enum members instead of printing a numeric constant. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Message-id: 20221013185908.1297568-6-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2022-10-26 14:56:42 -04:00
Stefan Hajnoczi	4f384011c5	block: pass size to bdrv_unregister_buf() The only implementor of bdrv_register_buf() is block/nvme.c, where the size is not needed when unregistering a buffer. This is because util/vfio-helpers.c can look up mappings by address. Future block drivers that implement bdrv_register_buf() may not be able to do their job given only the buffer address. Add a size argument to bdrv_unregister_buf(). Also document the assumptions about bdrv_register_buf()/bdrv_unregister_buf() calls. The same <host, size> values that were given to bdrv_register_buf() must be given to bdrv_unregister_buf(). gcc 11.2.1 emits a spurious warning that img_bench()'s buf_size local variable might be uninitialized, so it's necessary to silence the compiler. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Message-id: 20221013185908.1297568-5-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2022-10-26 14:56:42 -04:00
Emanuele Giuseppe Esposito	ba6a910052	blockjob: remove unused functions These public functions are not used anywhere, thus can be dropped. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Message-Id: <20220926093214.506243-21-eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2022-10-07 12:11:41 +02:00
Emanuele Giuseppe Esposito	3937e12cf8	blockjob.h: categorize fields in struct BlockJob The same job lock is being used also to protect some of blockjob fields. Categorize them just as done in job.h. Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20220926093214.506243-15-eesposit@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2022-10-07 12:11:41 +02:00
Emanuele Giuseppe Esposito	f41ab73fa2	blockjob: introduce block_job _locked() APIs Just as done with job.h, create _locked() functions in blockjob.h These functions will be later useful when caller has already taken the lock. All blockjob _locked functions call job _locked functions. Note: at this stage, job_{lock/unlock} and job lock guard macros are nop. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20220926093214.506243-8-eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2022-10-07 12:11:41 +02:00
Emanuele Giuseppe Esposito	fd4b14e299	aio-wait.h: introduce AIO_WAIT_WHILE_UNLOCKED Same as AIO_WAIT_WHILE macro, but if we are in the Main loop do not release and then acquire ctx_ 's aiocontext. Once all Aiocontext locks go away, this macro will replace AIO_WAIT_WHILE. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Message-Id: <20220926093214.506243-5-eesposit@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2022-10-07 12:11:41 +02:00
Paolo Bonzini	9fb26291ac	nbd: remove incorrect coroutine_fn annotations nbd_co_establish_connection_cancel() cancels a coroutine but is not called from coroutine context itself, for example in nbd_cancel_in_flight() and in timer callbacks reconnect_delay_timer_cb() and open_timer_cb(). Reviewed-by: Alberto Faria <afaria@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20220922084924.201610-5-pbonzini@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2022-10-07 12:11:40 +02:00
Keith Busch	a7c5f67a78	block: move bdrv_qiov_is_aligned to file-posix There is only user of bdrv_qiov_is_aligned(), so move the alignment function to there and make it static. Signed-off-by: Keith Busch <kbusch@kernel.org> Message-Id: <20220929200523.3218710-2-kbusch@meta.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2022-09-30 18:43:44 +02:00
Denis V. Lunev	131498f775	block: make serializing requests functions 'void' Return codes of the following functions are never used in the code: * bdrv_wait_serialising_requests_locked * bdrv_wait_serialising_requests * bdrv_make_request_serialising Signed-off-by: Denis V. Lunev <den@openvz.org> CC: Kevin Wolf <kwolf@redhat.com> CC: Hanna Reitz <hreitz@redhat.com> CC: Stefan Hajnoczi <stefanha@redhat.com> CC: Fam Zheng <fam@euphon.net> CC: Ronnie Sahlberg <ronniesahlberg@gmail.com> CC: Paolo Bonzini <pbonzini@redhat.com> CC: Peter Lieven <pl@kamp.de> CC: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Message-Id: <20220817083736.40981-3-den@openvz.org> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2022-09-30 18:43:44 +02:00
Denis V. Lunev	b2aaf35477	block: pass OnOffAuto instead of bool to block_acct_setup() We would have one more place for block_acct_setup() calling, which should not corrupt original value. Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> CC: Peter Krempa <pkrempa@redhat.com> CC: Markus Armbruster <armbru@redhat.com> CC: John Snow <jsnow@redhat.com> CC: Kevin Wolf <kwolf@redhat.com> CC: Hanna Reitz <hreitz@redhat.com> Message-Id: <20220824095044.166009-2-den@openvz.org> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2022-09-30 18:02:30 +02:00
Darren Kenny	43f76aac49	nvme: Fix misleading macro when mixed with ternary operator Using the Parfait source code analyser and issue was found in hw/nvme/ctrl.c where the macros NVME_CAP_SET_CMBS and NVME_CAP_SET_PMRS are called with a ternary operatore in the second parameter, resulting in a potentially unexpected expansion of the form: x ? a: b & FLAG_TEST which will result in a different result to: (x ? a: b) & FLAG_TEST. The macros should wrap each of the parameters in brackets to ensure the correct result on expansion. Signed-off-by: Darren Kenny <darren.kenny@oracle.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2022-07-15 10:40:33 +02:00
Jinhao Fan	3f7fe8de3d	hw/nvme: Implement shadow doorbell buffer support Implement Doorbel Buffer Config command (Section 5.7 in NVMe Spec 1.3) and Shadow Doorbel buffer & EventIdx buffer handling logic (Section 7.13 in NVMe Spec 1.3). For queues created before the Doorbell Buffer Config command, the nvme_dbbuf_config function tries to associate each existing SQ and CQ with its Shadow Doorbel buffer and EventIdx buffer address. Queues created after the Doorbell Buffer Config command will have the doorbell buffers associated with them when they are initialized. In nvme_process_sq and nvme_post_cqe, proactively check for Shadow Doorbell buffer changes instead of wait for doorbell register changes. This reduces the number of MMIOs. In nvme_process_db(), update the shadow doorbell buffer value with the doorbell register value if it is the admin queue. This is a hack since hosts like Linux NVMe driver and SPDK do not use shadow doorbell buffer for the admin queue. Copying the doorbell register value to the shadow doorbell buffer allows us to support these hosts as well as spec-compliant hosts that use shadow doorbell buffer for the admin queue. Signed-off-by: Jinhao Fan <fanjinhao21s@ict.ac.cn> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Keith Busch <kbusch@kernel.org> [k.jensen: rebased] Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2022-07-15 10:40:33 +02:00
Alberto Faria	e97190a405	block: Add bdrv_co_pwrite_sync() Also convert bdrv_pwrite_sync() to being implemented using generated_co_wrapper. Signed-off-by: Alberto Faria <afaria@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20220609152744.3891847-9-afaria@redhat.com> Reviewed-by: Hanna Reitz <hreitz@redhat.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>	2022-07-12 12:14:56 +02:00
Alberto Faria	1d39c7098b	block: Implement bdrv_{pread,pwrite,pwrite_zeroes}() using generated_co_wrapper bdrv_{pread,pwrite}() now return -EIO instead of -EINVAL when 'bytes' is negative, making them consistent with bdrv_{preadv,pwritev}() and bdrv_co_{pread,pwrite,preadv,pwritev}(). bdrv_pwrite_zeroes() now also calls trace_bdrv_co_pwrite_zeroes() and clears the BDRV_REQ_MAY_UNMAP flag when appropriate, which it didn't previously. Signed-off-by: Alberto Faria <afaria@redhat.com> Message-Id: <20220609152744.3891847-8-afaria@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Hanna Reitz <hreitz@redhat.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>	2022-07-12 12:14:56 +02:00
Alberto Faria	c1458c66b2	block: Make 'bytes' param of bdrv_co_{pread,pwrite,preadv,pwritev}() an int64_t For consistency with other I/O functions, and in preparation to implement bdrv_{pread,pwrite}() using generated_co_wrapper. unsigned int fits in int64_t, so all callers remain correct. bdrv_check_request32() is called further down the stack and causes -EIO to be returned if 'bytes' is negative or greater than BDRV_REQUEST_MAX_BYTES, which in turns never exceeds SIZE_MAX. Signed-off-by: Alberto Faria <afaria@redhat.com> Message-Id: <20220609152744.3891847-7-afaria@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Hanna Reitz <hreitz@redhat.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>	2022-07-12 12:14:56 +02:00
Alberto Faria	ca71a64ee5	block: Make bdrv_co_pwrite() take a const buffer It does not mutate the buffer. Signed-off-by: Alberto Faria <afaria@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20220609152744.3891847-6-afaria@redhat.com> Reviewed-by: Hanna Reitz <hreitz@redhat.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>	2022-07-12 12:14:56 +02:00
Alberto Faria	32cc71def9	block: Change bdrv_{pread,pwrite,pwrite_sync}() param order Swap 'buf' and 'bytes' around for consistency with bdrv_co_{pread,pwrite}(), and in preparation to implement these functions using generated_co_wrapper. Callers were updated using this Coccinelle script: @@ expression child, offset, buf, bytes, flags; @@ - bdrv_pread(child, offset, buf, bytes, flags) + bdrv_pread(child, offset, bytes, buf, flags) @@ expression child, offset, buf, bytes, flags; @@ - bdrv_pwrite(child, offset, buf, bytes, flags) + bdrv_pwrite(child, offset, bytes, buf, flags) @@ expression child, offset, buf, bytes, flags; @@ - bdrv_pwrite_sync(child, offset, buf, bytes, flags) + bdrv_pwrite_sync(child, offset, bytes, buf, flags) Resulting overly-long lines were then fixed by hand. Signed-off-by: Alberto Faria <afaria@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Message-Id: <20220609152744.3891847-3-afaria@redhat.com> Reviewed-by: Hanna Reitz <hreitz@redhat.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>	2022-07-12 12:14:55 +02:00
Alberto Faria	53fb7844f0	block: Add a 'flags' param to bdrv_{pread,pwrite,pwrite_sync}() For consistency with other I/O functions, and in preparation to implement them using generated_co_wrapper. Callers were updated using this Coccinelle script: @@ expression child, offset, buf, bytes; @@ - bdrv_pread(child, offset, buf, bytes) + bdrv_pread(child, offset, buf, bytes, 0) @@ expression child, offset, buf, bytes; @@ - bdrv_pwrite(child, offset, buf, bytes) + bdrv_pwrite(child, offset, buf, bytes, 0) @@ expression child, offset, buf, bytes; @@ - bdrv_pwrite_sync(child, offset, buf, bytes) + bdrv_pwrite_sync(child, offset, buf, bytes, 0) Resulting overly-long lines were then fixed by hand. Signed-off-by: Alberto Faria <afaria@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Message-Id: <20220609152744.3891847-2-afaria@redhat.com> Reviewed-by: Hanna Reitz <hreitz@redhat.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>	2022-07-12 12:14:55 +02:00
Vladimir Sementsov-Ogievskiy	15df6e6987	block/block-copy: block_copy(): add timeout_ns parameter Add possibility to limit block_copy() call in time. To be used in the next commit. As timed-out block_copy() call will continue in background anyway (we can't immediately cancel IO operation), it's important also give user a possibility to pass a callback, to do some additional actions on block-copy call finish. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@openvz.org> Reviewed-by: Hanna Reitz <hreitz@redhat.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>	2022-06-29 10:56:12 +03:00
Emanuele Giuseppe Esposito	7455ff1aa0	aio_wait_kick: add missing memory barrier It seems that aio_wait_kick always required a memory barrier or atomic operation in the caller, but nobody actually took care of doing it. Let's put the barrier in the function instead, and pair it with another one in AIO_WAIT_WHILE. Read aio_wait_kick() comment for further explanation. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20220524173054.12651-1-eesposit@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2022-06-24 17:07:06 +02:00
Vladimir Sementsov-Ogievskiy	618af89e55	block: simplify handling of try to merge different sized bitmaps We have too much logic to simply check that bitmaps are of the same size. Let's just define that hbitmap_merge() and bdrv_dirty_bitmap_merge_internal() require their argument bitmaps be of same size, this simplifies things. Let's look through the callers: For backup_init_bcs_bitmap() we already assert that merge can't fail. In bdrv_reclaim_dirty_bitmap_locked() we gracefully handle the error that can't happen: successor always has same size as its parent, drop this logic. In bdrv_merge_dirty_bitmap() we already has assertion and separate check. Make the check explicit and improve error message. Signed-off-by: Vladimir Sementsov-Ogievskiy <v.sementsov-og@mail.ru> Reviewed-by: Nikita Lapshin <nikita.lapshin@virtuozzo.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20220517111206.23585-4-v.sementsov-og@mail.ru> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2022-06-24 17:07:06 +02:00
Stefan Hajnoczi	3399848b7f	block: drop unused bdrv_co_drain() API bdrv_co_drain() has not been used since commit `9a0cec664e` ("mirror: use bdrv_drained_begin/bdrv_drained_end") in 2016. Remove it so there are fewer drain scenarios to worry about. Use bdrv_drained_begin()/bdrv_drained_end() instead. They are "mixed" functions that can be called from coroutine context. Unlike bdrv_co_drain(), these functions provide control of the length of the drained section, which is usually the right thing. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20220521122714.3837731-1-stefanha@redhat.com> Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Reviewed-by: Alberto Faria <afaria@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2022-06-24 17:07:06 +02:00
Łukasz Gieryk	11871f53ef	hw/nvme: Add support for the Virtualization Management command With the new command one can: - assign flexible resources (queues, interrupts) to primary and secondary controllers, - toggle the online/offline state of given controller. Signed-off-by: Łukasz Gieryk <lukasz.gieryk@linux.intel.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2022-06-23 23:24:29 +02:00
Łukasz Gieryk	746d42b133	hw/nvme: Initialize capability structures for primary/secondary controllers With four new properties: - sriov_v{i,q}_flexible, - sriov_max_v{i,q}_per_vf, one can configure the number of available flexible resources, as well as the limits. The primary and secondary controller capability structures are initialized accordingly. Since the number of available queues (interrupts) now varies between VF/PF, BAR size calculation is also adjusted. Signed-off-by: Łukasz Gieryk <lukasz.gieryk@linux.intel.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2022-06-23 23:24:29 +02:00
Lukasz Maniak	99f48ae7ae	hw/nvme: Add support for Secondary Controller List Introduce handling for Secondary Controller List (Identify command with CNS value of 15h). Secondary controller ids are unique in the subsystem, hence they are reserved by it upon initialization of the primary controller to the number of sriov_max_vfs. ID reservation requires the addition of an intermediate controller slot state, so the reserved controller has the address 0xFFFF. A secondary controller is in the reserved state when it has no virtual function assigned, but its primary controller is realized. Secondary controller reservations are released to NULL when its primary controller is unregistered. Signed-off-by: Lukasz Maniak <lukasz.maniak@linux.intel.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2022-06-23 23:24:28 +02:00
Lukasz Maniak	5e6f963f01	hw/nvme: Add support for Primary Controller Capabilities Implementation of Primary Controller Capabilities data structure (Identify command with CNS value of 14h). Currently, the command returns only ID of a primary controller. Handling of remaining fields are added in subsequent patches implementing virtualization enhancements. Signed-off-by: Lukasz Maniak <lukasz.maniak@linux.intel.com> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2022-06-23 23:24:28 +02:00
Eric Blake	58a6fdcc9e	nbd/server: Allow MULTI_CONN for shared writable exports According to the NBD spec, a server that advertises NBD_FLAG_CAN_MULTI_CONN promises that multiple client connections will not see any cache inconsistencies: when properly separated by a single flush, actions performed by one client will be visible to another client, regardless of which client did the flush. We always satisfy these conditions in qemu - even when we support multiple clients, ALL clients go through a single point of reference into the block layer, with no local caching. The effect of one client is instantly visible to the next client. Even if our backend were a network device, we argue that any multi-path caching effects that would cause inconsistencies in back-to-back actions not seeing the effect of previous actions would be a bug in that backend, and not the fault of caching in qemu. As such, it is safe to unconditionally advertise CAN_MULTI_CONN for any qemu NBD server situation that supports parallel clients. Note, however, that we don't want to advertise CAN_MULTI_CONN when we know that a second client cannot connect (for historical reasons, qemu-nbd defaults to a single connection while nbd-server-add and QMP commands default to unlimited connections; but we already have existing means to let either style of NBD server creation alter those defaults). This is visible by no longer advertising MULTI_CONN for 'qemu-nbd -r' without -e, as in the iotest nbd-qemu-allocation. The harder part of this patch is setting up an iotest to demonstrate behavior of multiple NBD clients to a single server. It might be possible with parallel qemu-io processes, but I found it easier to do in python with the help of libnbd, and help from Nir and Vladimir in writing the test. Signed-off-by: Eric Blake <eblake@redhat.com> Suggested-by: Nir Soffer <nsoffer@redhat.com> Suggested-by: Vladimir Sementsov-Ogievskiy <v.sementsov-og@mail.ru> Message-Id: <20220512004924.417153-3-eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2022-05-12 13:10:52 +02:00
Eric Blake	a5fced4021	qemu-nbd: Pass max connections to blockdev layer The next patch wants to adjust whether the NBD server code advertises MULTI_CONN based on whether it is known if the server limits to exactly one client. For a server started by QMP, this information is obtained through nbd_server_start (which can support more than one export); but for qemu-nbd (which supports exactly one export), it is controlled only by the command-line option -e/--shared. Since we already have a hook function used by qemu-nbd, it's easiest to just alter its signature to fit our needs. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20220512004924.417153-2-eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2022-05-12 13:10:52 +02:00
Markus Armbruster	ea9cea93c6	Clean up decorations and whitespace around header guards Cleaned up with scripts/clean-header-guards.pl. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20220506134911.2856099-5-armbru@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>	2022-05-11 16:50:32 +02:00
Markus Armbruster	52581c718c	Clean up header guards that don't match their file name Header guard symbols should match their file name to make guard collisions less likely. Cleaned up with scripts/clean-header-guards.pl, followed by some renaming of new guard symbols picked by the script to better ones. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20220506134911.2856099-2-armbru@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> [Change to generated file ebpf/rss.bpf.skeleton.h backed out]	2022-05-11 16:49:06 +02:00
Nicolas Saenz Julienne	71ad4713cc	util/event-loop-base: Introduce options to set the thread pool size The thread pool regulates itself: when idle, it kills threads until empty, when in demand, it creates new threads until full. This behaviour doesn't play well with latency sensitive workloads where the price of creating a new thread is too high. For example, when paired with qemu's '-mlock', or using safety features like SafeStack, creating a new thread has been measured take multiple milliseconds. In order to mitigate this let's introduce a new 'EventLoopBase' property to set the thread pool size. The threads will be created during the pool's initialization or upon updating the property's value, remain available during its lifetime regardless of demand, and destroyed upon freeing it. A properly characterized workload will then be able to configure the pool to avoid any latency spikes. Signed-off-by: Nicolas Saenz Julienne <nsaenzju@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Acked-by: Markus Armbruster <armbru@redhat.com> Message-id: 20220425075723.20019-4-nsaenzju@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2022-05-09 10:43:23 +01:00
Hanna Reitz	15aee7ac95	block: Classify bdrv_get_flags() as I/O function This function is safe to call in an I/O context, and qcow2_do_open() does so (invoked in an I/O context by qcow2_co_invalidate_cache()). Signed-off-by: Hanna Reitz <hreitz@redhat.com> Message-Id: <20220427114057.36651-2-hreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2022-05-04 15:55:23 +02:00
Vladimir Sementsov-Ogievskiy	1466ef6cbe	qapi: rename BlockDirtyBitmapMergeSource to BlockDirtyBitmapOrStr Rename the type to be reused. Old name is "what is it for". To be natively reused for other needs, let's name it exactly "what is it". Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@openvz.org> Message-Id: <20220314213226.362217-2-v.sementsov-og@mail.ru> [eblake: Adjust S-o-b to Vladimir's new email, with permission] Reviewed-by: Eric Blake <eblake@redhat.com> Acked-by: John Snow <jsnow@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2022-04-26 13:13:50 -05:00
Marc-André Lureau	215aea0cb2	include: move qdict_{crumple,flatten} declarations Move them where they belong, since the functions are implemented in block-qdict.c. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20220420132624.2439741-25-marcandre.lureau@redhat.com>	2022-04-21 17:03:51 +04:00
Daniel P. Berrangé	046f98d075	block: pass desired TLS hostname through from block driver client In commit `a71d597b98` Author: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Date: Thu Jun 10 13:08:00 2021 +0300 block/nbd: reuse nbd_co_do_establish_connection() in nbd_open() the use of the 'hostname' field from the BDRVNBDState struct was lost, and 'nbd_connect' just hardcoded it to match the IP socket address. This was a harmless bug at the time since we block use with anything other than IP sockets. Shortly though, we want to allow the caller to override the hostname used in the TLS certificate checks. This is to allow for TLS when doing port forwarding or tunneling. Thus we need to reinstate the passing along of the 'hostname'. Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20220304193610.3293146-3-berrange@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2022-03-07 15:58:42 -06:00
Vladimir Sementsov-Ogievskiy	1c14eaabdb	block: introduce snapshot-access block driver The new block driver simply utilizes snapshot-access API of underlying block node. In further patches we want to use it like this: [guest] [NBD export] \| \| \| root \| root v file v [copy-before-write]<------[snapshot-access] \| \| \| file \| target v v [active-disk] [temp.img] This way, NBD client will be able to read snapshotted state of active disk, when active disk is continued to be written by guest. This is known as "fleecing", and currently uses another scheme based on qcow2 temporary image which backing file is active-disk. New scheme comes with benefits - see next commit. The other possible application is exporting internal snapshots of qcow2, like this: [guest] [NBD export] \| \| \| root \| root v file v [qcow2]<---------[snapshot-access] For this, we'll need to implement snapshot-access API handlers in qcow2 driver, and improve snapshot-access block driver (and API) to make it possible to select snapshot by name. Another thing to improve is size of snapshot. Now for simplicity we just use size of bs->file, which is OK for backup, but for qcow2 snapshots export we'll need to imporve snapshot-access API to get size of snapshot. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20220303194349.2304213-12-vsementsov@virtuozzo.com> [hreitz: Rebased on block GS/IO split] Signed-off-by: Hanna Reitz <hreitz@redhat.com>	2022-03-07 09:33:31 +01:00
Vladimir Sementsov-Ogievskiy	ce14f3b407	block/io: introduce block driver snapshot-access API Add new block driver handlers and corresponding generic wrappers. It will be used to allow copy-before-write filter to provide reach fleecing interface in further commit. In future this approach may be used to allow reading qcow2 internal snapshots, for example to export them through NBD. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Hanna Reitz <hreitz@redhat.com> Message-Id: <20220303194349.2304213-11-vsementsov@virtuozzo.com> [hreitz: Rebased on block GS/IO split] Signed-off-by: Hanna Reitz <hreitz@redhat.com>	2022-03-07 09:33:31 +01:00
Vladimir Sementsov-Ogievskiy	3b7ca26bdf	block/reqlist: add reqlist_wait_all() Add function to wait for all intersecting requests. To be used in the further commit. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Nikita Lapshin <nikita.lapshin@virtuozzo.com> Reviewed-by: Hanna Reitz <hreitz@redhat.com> Message-Id: <20220303194349.2304213-10-vsementsov@virtuozzo.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>	2022-03-07 09:33:30 +01:00
Vladimir Sementsov-Ogievskiy	a6426475a7	block/dirty-bitmap: introduce bdrv_dirty_bitmap_status() Add a convenient function similar with bdrv_block_status() to get status of dirty bitmap. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Hanna Reitz <hreitz@redhat.com> Message-Id: <20220303194349.2304213-9-vsementsov@virtuozzo.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>	2022-03-07 09:33:30 +01:00
Vladimir Sementsov-Ogievskiy	d088e6a48a	block: intoduce reqlist Split intersecting-requests functionality out of block-copy to be reused in copy-before-write filter. Note: while being here, fix tiny typo in MAINTAINERS. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Hanna Reitz <hreitz@redhat.com> Message-Id: <20220303194349.2304213-7-vsementsov@virtuozzo.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>	2022-03-07 09:33:30 +01:00
Vladimir Sementsov-Ogievskiy	177541e671	block/block-copy: add block_copy_reset() Split block_copy_reset() out of block_copy_reset_unallocated() to be used separately later. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Hanna Reitz <hreitz@redhat.com> Message-Id: <20220303194349.2304213-6-vsementsov@virtuozzo.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>	2022-03-07 09:33:30 +01:00
Vladimir Sementsov-Ogievskiy	1f7252e868	block/block-copy: block_copy_state_new(): add bitmap parameter This will be used in the following commit to bring "incremental" mode to copy-before-write filter. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Hanna Reitz <hreitz@redhat.com> Message-Id: <20220303194349.2304213-4-vsementsov@virtuozzo.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>	2022-03-07 09:33:30 +01:00
Vladimir Sementsov-Ogievskiy	34ffacb7f4	block/dirty-bitmap: bdrv_merge_dirty_bitmap(): add return value That simplifies handling failure in existing code and in further new usage of bdrv_merge_dirty_bitmap(). Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Hanna Reitz <hreitz@redhat.com> Message-Id: <20220303194349.2304213-3-vsementsov@virtuozzo.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>	2022-03-07 09:33:30 +01:00
Vladimir Sementsov-Ogievskiy	45e62b464a	block: fix preallocate filter: don't do unaligned preallocate requests There is a bug in handling BDRV_REQ_NO_WAIT flag: we still may wait in wait_serialising_requests() if request is unaligned. And this is possible for the only user of this flag (preallocate filter) if underlying file is unaligned to its request_alignment on start. So, we have to fix preallocate filter to do only aligned preallocate requests. Next, we should fix generic block/io.c somehow. Keeping in mind that preallocate is the only user of BDRV_REQ_NO_WAIT and that we have to fix its behavior now, it seems more safe to just assert that we never use BDRV_REQ_NO_WAIT with unaligned requests and add corresponding comment. Let's do so. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Denis V. Lunev <den@openvz.org> Message-Id: <20220215121609.38570-1-vsementsov@virtuozzo.com> [hreitz: Rebased on block GS/IO split] Signed-off-by: Hanna Reitz <hreitz@redhat.com>	2022-03-07 09:19:20 +01:00
Emanuele Giuseppe Esposito	abc5a79c64	block_int-common.h: split function pointers in BdrvChildClass Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20220303151616.325444-28-eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2022-03-04 18:18:25 +01:00
Emanuele Giuseppe Esposito	69c0bf1197	block_int-common.h: split function pointers in BlockDriver Similar to the header split, also the function pointers in BlockDriver can be split in I/O and global state. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20220303151616.325444-26-eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2022-03-04 18:18:25 +01:00
Emanuele Giuseppe Esposito	6b573efec8	include/block/snapshot: global state API + assertions Snapshots run also under the BQL, so they all are in the global state API. The aiocontext lock that they hold is currently an overkill and in future could be removed. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20220303151616.325444-23-eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2022-03-04 18:18:25 +01:00
Emanuele Giuseppe Esposito	4ad3387637	include/block/blockjob.h: global state API blockjob functions run always under the BQL lock. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20220303151616.325444-19-eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2022-03-04 18:18:25 +01:00
Emanuele Giuseppe Esposito	2015c4c28d	include/block/blockjob_int.h: split header into I/O and GS API Since the I/O functions are not many, keep a single file. Also split the function pointers in BlockJobDriver. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20220303151616.325444-16-eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2022-03-04 18:18:25 +01:00
Emanuele Giuseppe Esposito	696bf4c78c	block: introduce assert_bdrv_graph_writable We want to be sure that the functions that write the child and parent list of a bs are under BQL and drain. BQL prevents from concurrent writings from the GS API, while drains protect from I/O. TODO: drains are missing in some functions using this assert. Therefore a proper assertion will fail. Because adding drains requires additional discussions, they will be added in future series. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20220303151616.325444-15-eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2022-03-04 18:18:25 +01:00
Emanuele Giuseppe Esposito	967d7905d1	IO_CODE and IO_OR_GS_CODE for block_int I/O API Mark all I/O functions with IO_CODE, and all "I/O OR GS" with IO_OR_GS_CODE. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20220303151616.325444-14-eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2022-03-04 18:18:25 +01:00
Emanuele Giuseppe Esposito	ebc2752b08	include/block/block_int: split header into I/O and global state API Similarly to the previous patch, split block_int.h in block_int-io.h and block_int-global-state.h block_int-common.h contains the structures shared between the two headers, and the functions that can't be categorized as I/O or global state. Assertions are added in the next patch. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20220303151616.325444-12-eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2022-03-04 18:18:25 +01:00
Emanuele Giuseppe Esposito	384a48fb74	IO_CODE and IO_OR_GS_CODE for block I/O API Mark all I/O functions with IO_CODE, and all "I/O OR GS" with IO_OR_GS_CODE. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20220303151616.325444-6-eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2022-03-04 18:18:25 +01:00
Emanuele Giuseppe Esposito	3b491a9056	include/block/block: split header into I/O and global state API block.h currently contains a mix of functions: some of them run under the BQL and modify the block layer graph, others are instead thread-safe and perform I/O in iothreads. Some others can only be called by either the main loop or the iothread running the AioContext (and not other iothreads), and using them in another thread would cause deadlocks, and therefore it is not ideal to define them as I/O. It is not easy to understand which function is part of which group (I/O vs GS vs "I/O or GS"), and this patch aims to clarify it. The "GS" functions need the BQL, and often use aio_context_acquire/release and/or drain to be sure they can modify the graph safely. The I/O function are instead thread safe, and can run in any AioContext. "I/O or GS" functions run instead in the main loop or in a single iothread, and use BDRV_POLL_WHILE(). By splitting the header in two files, block-io.h and block-global-state.h we have a clearer view on what needs what kind of protection. block-common.h contains common structures shared by both headers. block.h is left there for legacy and to avoid changing all includes in all c files that use the block APIs. Assertions are added in the next patch. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20220303151616.325444-4-eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2022-03-04 18:18:25 +01:00
Emanuele Giuseppe Esposito	3b71719462	block: rename bdrv_invalidate_cache_all, blk_invalidate_cache and test_sync_op_invalidate_cache Following the bdrv_activate renaming, change also the name of the respective callers. bdrv_invalidate_cache_all -> bdrv_activate_all blk_invalidate_cache -> blk_activate test_sync_op_invalidate_cache -> test_sync_op_activate No functional change intended. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Hanna Reitz <hreitz@redhat.com> Message-Id: <20220209105452.1694545-5-eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2022-03-04 18:14:40 +01:00
Emanuele Giuseppe Esposito	a94750d956	block: introduce bdrv_activate This function is currently just a wrapper for bdrv_invalidate_cache(), but in future will contain the code of bdrv_co_invalidate_cache() that has to always be protected by BQL, and leave the rest in the I/O coroutine. Replace all bdrv_invalidate_cache() invokations with bdrv_activate(). Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Reviewed-by: Hanna Reitz <hreitz@redhat.com> Message-Id: <20220209105452.1694545-4-eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2022-03-04 18:14:40 +01:00
Emanuele Giuseppe Esposito	c1019d1687	crypto: perform permission checks under BQL Move the permission API calls into driver-specific callbacks that always run under BQL. In this case, bdrv_crypto_luks needs to perform permission checks before and after qcrypto_block_amend_options(). The problem is that the caller, block_crypto_amend_options_generic_luks(), can also run in I/O from .bdrv_co_amend(). This does not comply with Global State-I/O API split, as permissions API must always run under BQL. Firstly, introduce .bdrv_amend_pre_run() and .bdrv_amend_clean() callbacks. These two callbacks are guaranteed to be invoked under BQL, respectively before and after .bdrv_co_amend(). They take care of performing the permission checks in the same way as they are currently done before and after qcrypto_block_amend_options(). These callbacks are in preparation for next patch, where we delete the original permission check. Right now they just add redundant control. Then, call .bdrv_amend_pre_run() before job_start in qmp_x_blockdev_amend(), so that it will be run before the job coroutine is created and stay in the main loop. As a cleanup, use JobDriver's .clean() callback to call .bdrv_amend_clean(), and run amend-specific cleanup callbacks under BQL. After this patch, permission failures occur early in the blockdev-amend job to update a LUKS volume's keys. iotest 296 must now expect them in x-blockdev-amend's QMP reply instead of waiting for the actual job to fail later. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20220209105452.1694545-2-eesposit@redhat.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com> Message-Id: <20220304153729.711387-6-hreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2022-03-04 18:14:39 +01:00
Naveen Nagar	44219b6029	hw/nvme: 64-bit pi support This adds support for one possible new protection information format introduced in TP4068 (and integrated in NVMe 2.0): the 64-bit CRC guard and 48-bit reference tag. This version does not support storage tags. Like the CRC16 support already present, this uses a software implementation of CRC64 (so it is naturally pretty slow). But its good enough for verification purposes. This may go nicely hand-in-hand with the support that Keith submitted for the Linux kernel[1]. [1]: https://lore.kernel.org/linux-nvme/20220126165214.GA1782352@dhcp-10-100-145-180.wdc.com/T/ Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Naveen Nagar <naveen.n1@samsung.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2022-03-03 09:30:21 +01:00
Naveen Nagar	763c05dfb0	hw/nvme: add support for the lbafee hbs feature Add support for up to 64 LBA formats through the LBAFEE field of the Host Behavior Support feature. Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Naveen Nagar <naveen.n1@samsung.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2022-03-03 09:28:49 +01:00
Naveen Nagar	d0c0697b9e	hw/nvme: add host behavior support feature Add support for getting and setting the Host Behavior Support feature. Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Naveen Nagar <naveen.n1@samsung.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2022-03-03 09:28:48 +01:00
Klaus Jensen	e321b4cdc2	hw/nvme: add support for zoned random write area Add support for TP 4076 ("Zoned Random Write Area"), v2021.08.23 ("Ratified"). This adds three new namespace parameters: "zoned.numzrwa" (number of zrwa resources, i.e. number of zones that can have a zrwa), "zoned.zrwas" (zrwa size in LBAs), "zoned.zrwafg" (granularity in LBAs for flushes). Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2022-02-14 08:58:29 +01:00
Klaus Jensen	25872031e1	hw/nvme: add ozcs enum Add enumeration for OZCS values. Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2022-02-14 08:58:29 +01:00
Klaus Jensen	6190d92ff7	hw/nvme: add struct for zone management send Add struct for Zone Management Send in preparation for more zone send flags. Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2022-02-14 08:58:29 +01:00
Emanuele Giuseppe Esposito	751486c185	block.h: remove outdated comment The comment "disk I/O throttling" doesn't make any sense at all any more. It was added in commit `0563e19151` to describe bdrv_io_limits_enable()/disable(), which were removed in commit `97148076`, so the comment is just a forgotten leftover. Suggested-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20220131125615.74612-1-eesposit@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Hanna Reitz <hreitz@redhat.com>	2022-02-01 13:28:53 +01:00
Peter Maydell	1cd2ad11d3	Block layer patches - qemu-storage-daemon: Add vhost-user-blk help - block-backend: Fix use-after-free for BDS pointers after aio_poll() - qemu-img: Fix sparseness of output image with unaligned ranges - vvfat: Fix crashes in read-write mode - Fix device deletion events with -device JSON syntax - Code cleanups -----BEGIN PGP SIGNATURE----- iQJFBAABCAAvFiEE3D3rFZqa+V09dFb+fwmycsiPL9YFAmHhf5gRHGt3b2xmQHJl ZGhhdC5jb20ACgkQfwmycsiPL9YBMA//ZkaIigVsfjRoeUh2MccgOuvYpXZtq4po q7l6AwGLbBpTt5Fy468gYhwmXuwHCapTMRmvWf6mpb86jtJ6vdbE16L0Z4/Z9iiW C0w69fsAAP9XyI+f7Q5FNtzz3jWztKowgyhkU33izbwYM7dm5Xw1q5bDkOiIBNoO d8cdxLC1oQGEWJmGLgmbaM/ow0iDogFpT8zU5j0VE3uK01si8pblWlXm1SM3nOK9 b4uROqKYsTzTny/zX7KxD4SX3UGKYK393rQxr5HdmTiW14uGfB+EVfBxJmn07Qch lWM/v9tYoP1aVbR6IL5osAQdmbDYX0zsRMq5UA+dQ6OqnE3GpluVrYIFoaUSoShf S704hYdWgO0sKfpAYgJgGo6y0mglnp9Z7xO4Ng3XUNj0gvfgnOe3CdCdXIOeTFwC eP+KlFvbUT2xpTqI6ttBgKCcwKHA3hgWCnlo39C80bL1ZVKWSqh6zORfwmptouQ3 BmuhEqZRyoYrknrTELN+lIKK2gP6MLup/ymeXWOOOE58KSpmrdeBAXmgJNXX3ucx lAWGsIz0CxdaKQoZpKpikho4rhrGkqZ33B3H7mdcsKS6zYzmsDIqa9FzUjtpvN2V K/jXlK7dv58Y+LLzpcuJAf8HNnitA107WD5RA1s5nTw0ahD2GwR4UPzEhnSO9/nT yZ3dGUysj7Q= =dnBv -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging Block layer patches - qemu-storage-daemon: Add vhost-user-blk help - block-backend: Fix use-after-free for BDS pointers after aio_poll() - qemu-img: Fix sparseness of output image with unaligned ranges - vvfat: Fix crashes in read-write mode - Fix device deletion events with -device JSON syntax - Code cleanups # gpg: Signature made Fri 14 Jan 2022 13:50:16 GMT # gpg: using RSA key DC3DEB159A9AF95D3D7456FE7F09B272C88F2FD6 # gpg: issuer "kwolf@redhat.com" # gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full] # Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6 * remotes/kevin/tags/for-upstream: iotests/testrunner.py: refactor test_field_width block: drop BLK_PERM_GRAPH_MOD qemu-img: make is_allocated_sectors() more efficient iotests: Test qemu-img convert of zeroed data cluster vvfat: Fix vvfat_write() for writes before the root directory vvfat: Fix size of temporary qcow file iotests/308: Fix for CAP_DAC_OVERRIDE iotests/stream-error-on-reset: New test block-backend: prevent dangling BDS pointers across aio_poll() qapi/block: Restrict vhost-user-blk to CONFIG_VHOST_USER_BLK_SERVER qemu-storage-daemon: Add vhost-user-blk help docs: Correct 'vhost-user-blk' spelling softmmu: fix device deletion events with -device JSON syntax include/sysemu/blockdev.h: remove drive_get_max_devs include/sysemu/blockdev.h: remove drive_mark_claimed_by_board and inline drive_def block_int: make bdrv_backing_overridden static Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2022-01-14 15:56:30 +00:00
Vladimir Sementsov-Ogievskiy	64631f3681	block: drop BLK_PERM_GRAPH_MOD First, this permission never protected a node from being changed, as generic child-replacing functions don't check it. Second, it's a strange thing: it presents a permission of parent node to change its child. But generally, children are replaced by different mechanisms, like jobs or qmp commands, not by nodes. Graph-mod permission is hard to understand. All other permissions describe operations which done by parent node on its child: read, write, resize. Graph modification operations are something completely different. The only place where BLK_PERM_GRAPH_MOD is used as "perm" (not shared perm) is mirror_start_job, for s->target. Still modern code should use bdrv_freeze_backing_chain() to protect from graph modification, if we don't do it somewhere it may be considered as a bug. So, it's a bit risky to drop GRAPH_MOD, and analyzing of possible loss of protection is hard. But one day we should do it, let's do it now. One more bit of information is that locking the corresponding byte in file-posix doesn't make sense at all. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210902093754.2352-1-vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2022-01-14 12:03:16 +01:00
Emanuele Giuseppe Esposito	fa8fc1d09f	block_int: make bdrv_backing_overridden static bdrv_backing_overridden is only used in block.c, so there is no need to leave it in block_int.h Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20211215121140.456939-2-eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2022-01-14 12:03:16 +01:00
Stefan Hajnoczi	826cc32423	aio-posix: split poll check from ready handler Adaptive polling measures the execution time of the polling check plus handlers called when a polled event becomes ready. Handlers can take a significant amount of time, making it look like polling was running for a long time when in fact the event handler was running for a long time. For example, on Linux the io_submit(2) syscall invoked when a virtio-blk device's virtqueue becomes ready can take 10s of microseconds. This can exceed the default polling interval (32 microseconds) and cause adaptive polling to stop polling. By excluding the handler's execution time from the polling check we make the adaptive polling calculation more accurate. As a result, the event loop now stays in polling mode where previously it would have fallen back to file descriptor monitoring. The following data was collected with virtio-blk num-queues=2 event_idx=off using an IOThread. Before: 168k IOPS, IOThread syscalls: 9837.115 ( 0.020 ms): IO iothread1/620155 io_submit(ctx_id: 140512552468480, nr: 16, iocbpp: 0x7fcb9f937db0) = 16 9837.158 ( 0.002 ms): IO iothread1/620155 write(fd: 103, buf: 0x556a2ef71b88, count: 8) = 8 9837.161 ( 0.001 ms): IO iothread1/620155 write(fd: 104, buf: 0x556a2ef71b88, count: 8) = 8 9837.163 ( 0.001 ms): IO iothread1/620155 ppoll(ufds: 0x7fcb90002800, nfds: 4, tsp: 0x7fcb9f1342d0, sigsetsize: 8) = 3 9837.164 ( 0.001 ms): IO iothread1/620155 read(fd: 107, buf: 0x7fcb9f939cc0, count: 512) = 8 9837.174 ( 0.001 ms): IO iothread1/620155 read(fd: 105, buf: 0x7fcb9f939cc0, count: 512) = 8 9837.176 ( 0.001 ms): IO iothread1/620155 read(fd: 106, buf: 0x7fcb9f939cc0, count: 512) = 8 9837.209 ( 0.035 ms): IO iothread1/620155 io_submit(ctx_id: 140512552468480, nr: 32, iocbpp: 0x7fca7d0cebe0) = 32 174k IOPS (+3.6%), IOThread syscalls: 9809.566 ( 0.036 ms): IO iothread1/623061 io_submit(ctx_id: 140539805028352, nr: 32, iocbpp: 0x7fd0cdd62be0) = 32 9809.625 ( 0.001 ms): IO iothread1/623061 write(fd: 103, buf: 0x5647cfba5f58, count: 8) = 8 9809.627 ( 0.002 ms): IO iothread1/623061 write(fd: 104, buf: 0x5647cfba5f58, count: 8) = 8 9809.663 ( 0.036 ms): IO iothread1/623061 io_submit(ctx_id: 140539805028352, nr: 32, iocbpp: 0x7fd0d0388b50) = 32 Notice that ppoll(2) and eventfd read(2) syscalls are eliminated because the IOThread stays in polling mode instead of falling back to file descriptor monitoring. As usual, polling is not implemented on Windows so this patch ignores the new io_poll_read() callback in aio-win32.c. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Message-id: 20211207132336.36627-2-stefanha@redhat.com [Fixed up aio_set_event_notifier() calls in tests/unit/test-fdmon-epoll.c added after this series was queued. --Stefan] Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2022-01-12 17:09:39 +00:00
Vladimir Sementsov-Ogievskiy	985cac8f20	blockjob: drop BlockJob.blk field It's unused now (except for permission handling)[]. The only reasonable user of it was block-stream job, recently updated to use own blk. And other block jobs prefer to use own source node related objects. So, the arguments of dropping the field are: - block jobs prefer not to use it - block jobs usually has more then one node to operate on, and better to operate symmetrically (for example has both source and target blk's in specific block-job state structure) : BlockJob.blk is used to keep some permissions. We simply move permissions to block-job child created in block_job_create() together with blk. In mirror, we just should not care anymore about restoring state of blk. Most probably this code could be dropped long ago, after dropping bs->job pointer. Now it finally goes away together with BlockJob.blk itself. iotest 141 output is updated, as "bdrv_has_blk(bs)" check in qmp_blockdev_del() doesn't fail (we don't have blk now). Still, new error message looks even better. In iotest 283 we need to add a job id, otherwise "Invalid job ID" happens now earlier than permission check (as permissions moved from blk to block-job node). Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Nikita Lapshin <nikita.lapshin@virtuozzo.com>	2021-12-28 15:18:59 +01:00
Vladimir Sementsov-Ogievskiy	df9a316505	blockjob: implement and use block_job_get_aio_context We are going to drop BlockJob.blk. So let's retrieve block job context from underlying job instead of main node. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Nikita Lapshin <nikita.lapshin@virtuozzo.com>	2021-12-28 15:18:23 +01:00
Stefano Garzarella	68d7946648	linux-aio: add `dev_max_batch` parameter to laio_io_unplug() Between the submission of a request and the unplug, other devices with larger limits may have been queued new requests without flushing the batch. Using the new `dev_max_batch` parameter, laio_io_unplug() can check if the batch exceeds the device limit to flush the current batch. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-Id: <20211026162346.253081-4-sgarzare@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-11-02 13:03:35 +01:00
Stefano Garzarella	512da21101	linux-aio: add `dev_max_batch` parameter to laio_co_submit() This new parameter can be used by block devices to limit the Linux AIO batch size more than the limit set by the AIO context. file-posix backend supports this, passing its `aio-max-batch` option previously added. Add an helper function to calculate the maximum batch size. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-Id: <20211026162346.253081-3-sgarzare@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-11-02 13:03:35 +01:00
Paolo Bonzini	cc07162953	block: introduce max_hw_iov for use in scsi-generic Linux limits the size of iovecs to 1024 (UIO_MAXIOV in the kernel sources, IOV_MAX in POSIX). Because of this, on some host adapters requests with many iovecs are rejected with -EINVAL by the io_submit() or readv()/writev() system calls. In fact, the same limit applies to SG_IO as well. To fix both the EINVAL and the possible performance issues from using fewer iovecs than allowed by Linux (some HBAs have max_segments as low as 128), introduce a separate entry in BlockLimits to hold the max_segments value from sysfs. This new limit is used only for SG_IO and clamped to bs->bl.max_iov anyway, just like max_hw_transfer is clamped to bs->bl.max_transfer. Reported-by: Halil Pasic <pasic@linux.ibm.com> Cc: Hanna Reitz <hreitz@redhat.com> Cc: Kevin Wolf <kwolf@redhat.com> Cc: qemu-block@nongnu.org Cc: qemu-stable@nongnu.org Fixes: `18473467d5` ("file-posix: try BLKSECTGET on block devices too, do not round to power of 2", 2021-06-25) Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20210923130436.1187591-1-pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-10-06 10:25:55 +02:00
Vladimir Sementsov-Ogievskiy	621d17378a	block: implement bdrv_new_open_driver_opts() Add version of bdrv_new_open_driver() that supports QDict options. We'll use it in further commit. Simply add one more argument to bdrv_new_open_driver() is worse, as there are too many invocations of bdrv_new_open_driver() to update then. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Suggested-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20210920115538.264372-2-vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-10-06 10:25:55 +02:00
Emanuele Giuseppe Esposito	a6297e1ade	include/block.h: remove outdated comment There are a couple of errors in bdrv_drained_begin header comment: - block_job_pause does not exist anymore, it has been replaced with job_pause in `b15de82867` - job_pause is automatically invoked as a .drained_begin callback (child_job_drained_begin) by the child_job BdrvChildClass struct in blockjob.c. So no additional pause should be required. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20210903113800.59970-1-eesposit@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-10-06 10:25:55 +02:00
Vladimir Sementsov-Ogievskiy	0c8022876f	block: use int64_t instead of int in driver discard handlers We are generally moving to int64_t for both offset and bytes parameters on all io paths. Main motivation is realization of 64-bit write_zeroes operation for fast zeroing large disk chunks, up to the whole disk. We chose signed type, to be consistent with off_t (which is signed) and with possibility for signed return type (where negative value means error). So, convert driver discard handlers bytes parameter to int64_t. The only caller of all updated function is bdrv_co_pdiscard in block/io.c. It is already prepared to work with 64bit requests, but pass at most max(bs->bl.max_pdiscard, INT_MAX) to the driver. Let's look at all updated functions: blkdebug: all calculations are still OK, thanks to bdrv_check_qiov_request(). both rule_check and bdrv_co_pdiscard are 64bit blklogwrites: pass to blk_loc_writes_co_log which is 64bit blkreplay, copy-on-read, filter-compress: pass to bdrv_co_pdiscard, OK copy-before-write: pass to bdrv_co_pdiscard which is 64bit and to cbw_do_copy_before_write which is 64bit file-posix: one handler calls raw_account_discard() is 64bit and both handlers calls raw_do_pdiscard(). Update raw_do_pdiscard, which pass to RawPosixAIOData::aio_nbytes, which is 64bit (and calls raw_account_discard()) gluster: somehow, third argument of glfs_discard_async is size_t. Let's set max_pdiscard accordingly. iscsi: iscsi_allocmap_set_invalid is 64bit, !is_byte_request_lun_aligned is 64bit. list.num is uint32_t. Let's clarify max_pdiscard and pdiscard_alignment. mirror_top: pass to bdrv_mirror_top_do_write() which is 64bit nbd: protocol limitation. max_pdiscard is alredy set strict enough, keep it as is for now. nvme: buf.nlb is uint32_t and we do shift. So, add corresponding limits to nvme_refresh_limits(). preallocate: pass to bdrv_co_pdiscard() which is 64bit. rbd: pass to qemu_rbd_start_co() which is 64bit. qcow2: calculations are still OK, thanks to bdrv_check_qiov_request(), qcow2_cluster_discard() is 64bit. raw-format: raw_adjust_offset() is 64bit, bdrv_co_pdiscard too. throttle: pass to bdrv_co_pdiscard() which is 64bit and to throttle_group_co_io_limits_intercept() which is 64bit as well. test-block-iothread: bytes argument is unused Great! Now all drivers are prepared to handle 64bit discard requests, or else have explicit max_pdiscard limits. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210903102807.27127-11-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-09-29 13:46:32 -05:00
Vladimir Sementsov-Ogievskiy	39af49c0d7	block: make BlockLimits::max_pdiscard 64bit We are going to support 64 bit discard requests. Now update the limit variable. It's absolutely safe. The variable is set in some drivers, and used in bdrv_co_pdiscard(). Update also max_pdiscard variable in bdrv_co_pdiscard(), so that bdrv_co_pdiscard() is now prepared for 64bit requests. The remaining logic including num, offset and bytes variables is already supporting 64bit requests. So the only thing that prevents 64 bit requests is limiting max_pdiscard variable to INT_MAX in bdrv_co_pdiscard(). We'll drop this limitation after updating all block drivers. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20210903102807.27127-10-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-09-29 13:46:32 -05:00
Vladimir Sementsov-Ogievskiy	f34b2bcf8c	block: use int64_t instead of int in driver write_zeroes handlers We are generally moving to int64_t for both offset and bytes parameters on all io paths. Main motivation is realization of 64-bit write_zeroes operation for fast zeroing large disk chunks, up to the whole disk. We chose signed type, to be consistent with off_t (which is signed) and with possibility for signed return type (where negative value means error). So, convert driver write_zeroes handlers bytes parameter to int64_t. The only caller of all updated function is bdrv_co_do_pwrite_zeroes(). bdrv_co_do_pwrite_zeroes() itself is of course OK with widening of callee parameter type. Also, bdrv_co_do_pwrite_zeroes()'s max_write_zeroes is limited to INT_MAX. So, updated functions all are safe, they will not get "bytes" larger than before. Still, let's look through all updated functions, and add assertions to the ones which are actually unprepared to values larger than INT_MAX. For these drivers also set explicit max_pwrite_zeroes limit. Let's go: blkdebug: calculations can't overflow, thanks to bdrv_check_qiov_request() in generic layer. rule_check() and bdrv_co_pwrite_zeroes() both have 64bit argument. blklogwrites: pass to blk_log_writes_co_log() with 64bit argument. blkreplay, copy-on-read, filter-compress: pass to bdrv_co_pwrite_zeroes() which is OK copy-before-write: Calls cbw_do_copy_before_write() and bdrv_co_pwrite_zeroes, both have 64bit argument. file-posix: both handler calls raw_do_pwrite_zeroes, which is updated. In raw_do_pwrite_zeroes() calculations are OK due to bdrv_check_qiov_request(), bytes go to RawPosixAIOData::aio_nbytes which is uint64_t. Check also where that uint64_t gets handed: handle_aiocb_write_zeroes_block() passes a uint64_t[2] to ioctl(BLKZEROOUT), handle_aiocb_write_zeroes() calls do_fallocate() which takes off_t (and we compile to always have 64-bit off_t), as does handle_aiocb_write_zeroes_unmap. All look safe. gluster: bytes go to GlusterAIOCB::size which is int64_t and to glfs_zerofill_async works with off_t. iscsi: Aha, here we deal with iscsi_writesame16_task() that has uint32_t num_blocks argument and iscsi_writesame16_task() has uint16_t argument. Make comments, add assertions and clarify max_pwrite_zeroes calculation. iscsi_allocmap_() functions already has int64_t argument is_byte_request_lun_aligned is simple to update, do it. mirror_top: pass to bdrv_mirror_top_do_write which has uint64_t argument nbd: Aha, here we have protocol limitation, and NBDRequest::len is uint32_t. max_pwrite_zeroes is cleanly set to 32bit value, so we are OK for now. nvme: Again, protocol limitation. And no inherent limit for write-zeroes at all. But from code that calculates cdw12 it's obvious that we do have limit and alignment. Let's clarify it. Also, obviously the code is not prepared to handle bytes=0. Let's handle this case too. trace events already 64bit preallocate: pass to handle_write() and bdrv_co_pwrite_zeroes(), both 64bit. rbd: pass to qemu_rbd_start_co() which is 64bit. qcow2: offset + bytes and alignment still works good (thanks to bdrv_check_qiov_request()), so tail calculation is OK qcow2_subcluster_zeroize() has 64bit argument, should be OK trace events updated qed: qed_co_request wants int nb_sectors. Also in code we have size_t used for request length which may be 32bit. So, let's just keep INT_MAX as a limit (aligning it down to pwrite_zeroes_alignment) and don't care. raw-format: Is OK. raw_adjust_offset and bdrv_co_pwrite_zeroes are both 64bit. throttle: Both throttle_group_co_io_limits_intercept() and bdrv_co_pwrite_zeroes() are 64bit. vmdk: pass to vmdk_pwritev which is 64bit quorum: pass to quorum_co_pwritev() which is 64bit Hooray! At this point all block drivers are prepared to support 64bit write-zero requests, or have explicitly set max_pwrite_zeroes. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210903102807.27127-8-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> [eblake: use <= rather than < in assertions relying on max_pwrite_zeroes] Signed-off-by: Eric Blake <eblake@redhat.com>	2021-09-29 13:46:32 -05:00
Vladimir Sementsov-Ogievskiy	d544f5d3b1	block: make BlockLimits::max_pwrite_zeroes 64bit We are going to support 64 bit write-zeroes requests. Now update the limit variable. It's absolutely safe. The variable is set in some drivers, and used in bdrv_co_do_pwrite_zeroes(). Update also max_write_zeroes variable in bdrv_co_do_pwrite_zeroes(), so that bdrv_co_do_pwrite_zeroes() is now prepared to 64bit requests. The remaining logic including num, offset and bytes variables is already supporting 64bit requests. So the only thing that prevents 64 bit requests is limiting max_write_zeroes variable to INT_MAX in bdrv_co_do_pwrite_zeroes(). We'll drop this limitation after updating all block drivers. Ah, we also have bdrv_check_request32() in bdrv_co_pwritev_part(). It will be modified to do bdrv_check_request() for write-zeroes path. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210903102807.27127-7-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-09-29 13:46:32 -05:00
Vladimir Sementsov-Ogievskiy	485350497b	block: use int64_t instead of uint64_t in copy_range driver handlers We are generally moving to int64_t for both offset and bytes parameters on all io paths. Main motivation is realization of 64-bit write_zeroes operation for fast zeroing large disk chunks, up to the whole disk. We chose signed type, to be consistent with off_t (which is signed) and with possibility for signed return type (where negative value means error). So, convert driver copy_range handlers parameters which are already 64bit to signed type. Now let's consider all callers. Simple git grep '\->bdrv_co_copy_range' shows the only caller: bdrv_co_copy_range_internal(), which does bdrv_check_request32(), so everything is OK. Still, the functions may be called directly, not only by drv->... Let's check: git grep '\.bdrv_co_copy_range_$from\\|to$\s*=' \| \ awk '{print $4}' \| sed 's/,//' \| sed 's/&//' \| sort \| uniq \| \ while read func; do git grep "$func(" \| \ grep -v "$func(BlockDriverState"; done shows no more callers. So, we are done. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20210903102807.27127-6-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-09-29 13:46:32 -05:00
Vladimir Sementsov-Ogievskiy	e75abedab7	block: use int64_t instead of uint64_t in driver write handlers We are generally moving to int64_t for both offset and bytes parameters on all io paths. Main motivation is realization of 64-bit write_zeroes operation for fast zeroing large disk chunks, up to the whole disk. We chose signed type, to be consistent with off_t (which is signed) and with possibility for signed return type (where negative value means error). So, convert driver write handlers parameters which are already 64bit to signed type. While being here, convert also flags parameter to be BdrvRequestFlags. Now let's consider all callers. Simple git grep '\->bdrv_$aio\\|co$_pwritev$_part$\?' shows that's there three callers of driver function: bdrv_driver_pwritev() and bdrv_driver_pwritev_compressed() in block/io.c, both pass int64_t, checked by bdrv_check_qiov_request() to be non-negative. qcow2_save_vmstate() does bdrv_check_qiov_request(). Still, the functions may be called directly, not only by drv->... Let's check: git grep '\.bdrv_$aio\\|co$_pwritev$_part$\?\s*=' \| \ awk '{print $4}' \| sed 's/,//' \| sed 's/&//' \| sort \| uniq \| \ while read func; do git grep "$func(" \| \ grep -v "$func(BlockDriverState"; done shows several callers: qcow2: qcow2_co_truncate() write at most up to @offset, which is checked in generic qcow2_co_truncate() by bdrv_check_request(). qcow2_co_pwritev_compressed_task() pass the request (or part of the request) that already went through normal write path, so it should be OK qcow: qcow_co_pwritev_compressed() pass int64_t, it's updated by this patch quorum: quorum_co_pwrite_zeroes() pass int64_t and int - OK throttle: throttle_co_pwritev_compressed() pass int64_t, it's updated by this patch vmdk: vmdk_co_pwritev_compressed() pass int64_t, it's updated by this patch Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210903102807.27127-5-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-09-29 13:46:31 -05:00
Vladimir Sementsov-Ogievskiy	f7ef38dd13	block: use int64_t instead of uint64_t in driver read handlers We are generally moving to int64_t for both offset and bytes parameters on all io paths. Main motivation is realization of 64-bit write_zeroes operation for fast zeroing large disk chunks, up to the whole disk. We chose signed type, to be consistent with off_t (which is signed) and with possibility for signed return type (where negative value means error). So, convert driver read handlers parameters which are already 64bit to signed type. While being here, convert also flags parameter to be BdrvRequestFlags. Now let's consider all callers. Simple git grep '\->bdrv_$aio\\|co$_preadv$_part$\?' shows that's there three callers of driver function: bdrv_driver_preadv() in block/io.c, passes int64_t, checked by bdrv_check_qiov_request() to be non-negative. qcow2_load_vmstate() does bdrv_check_qiov_request(). do_perform_cow_read() has uint64_t argument. And a lot of things in qcow2 driver are uint64_t, so converting it is big job. But we must not work with requests that don't satisfy bdrv_check_qiov_request(), so let's just assert it here. Still, the functions may be called directly, not only by drv->... Let's check: git grep '\.bdrv_$aio\\|co$_preadv$_part$\?\s*=' \| \ awk '{print $4}' \| sed 's/,//' \| sed 's/&//' \| sort \| uniq \| \ while read func; do git grep "$func(" \| \ grep -v "$func(BlockDriverState"; done The only one such caller: QEMUIOVector qiov = QEMU_IOVEC_INIT_BUF(qiov, &data, 1); ... ret = bdrv_replace_test_co_preadv(bs, 0, 1, &qiov, 0); in tests/unit/test-bdrv-drain.c, and it's OK obviously. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210903102807.27127-4-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> [eblake: fix typos] Signed-off-by: Eric Blake <eblake@redhat.com>	2021-09-29 13:46:31 -05:00
Vladimir Sementsov-Ogievskiy	558902cc3d	qcow2: check request on vmstate save/load path We modify the request by adding an offset to vmstate. Let's check the modified request. It will help us to safely move .bdrv_co_preadv_part and .bdrv_co_pwritev_part to int64_t type of offset and bytes. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20210903102807.27127-3-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-09-29 13:46:31 -05:00
Naveen Nagar	07a3dfa7c4	hw/nvme: fix verification of select field in namespace attachment Fix is added to check for reserved value in select field for namespace attachment CC: Minwoo Im <minwoo.im.dev@gmail.com> Signed-off-by: Naveen Nagar <naveen.n1@samsung.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2021-09-24 08:43:52 +02:00
Hanna Reitz	5a1cfd2150	block: Clarify that @bytes is no limit on pnum .bdrv_co_block_status() implementations are free to return a pnum that exceeds @bytes, because bdrv_co_block_status() in block/io.c will clamp pnum as necessary. On the other hand, if drivers' implementations return values for pnum that are as large as possible, our recently introduced block-status cache will become more effective. So, make a note in block_int.h that @bytes is no upper limit for *pnum. Suggested-by: Eric Blake <eblake@redhat.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210812084148.14458-4-hreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2021-09-15 15:54:07 +02:00
Hanna Reitz	0bc329fbb0	block: block-status cache for data regions As we have attempted before (https://lists.gnu.org/archive/html/qemu-devel/2019-01/msg06451.html, "file-posix: Cache lseek result for data regions"; https://lists.nongnu.org/archive/html/qemu-block/2021-02/msg00934.html, "file-posix: Cache next hole"), this patch seeks to reduce the number of SEEK_DATA/HOLE operations the file-posix driver has to perform. The main difference is that this time it is implemented as part of the general block layer code. The problem we face is that on some filesystems or in some circumstances, SEEK_DATA/HOLE is unreasonably slow. Given the implementation is outside of qemu, there is little we can do about its performance. We have already introduced the want_zero parameter to bdrv_co_block_status() to reduce the number of SEEK_DATA/HOLE calls unless we really want zero information; but sometimes we do want that information, because for files that consist largely of zero areas, special-casing those areas can give large performance boosts. So the real problem is with files that consist largely of data, so that inquiring the block status does not gain us much performance, but where such an inquiry itself takes a lot of time. To address this, we want to cache data regions. Most of the time, when bad performance is reported, it is in places where the image is iterated over from start to end (qemu-img convert or the mirror job), so a simple yet effective solution is to cache only the current data region. (Note that only caching data regions but not zero regions means that returning false information from the cache is not catastrophic: Treating zeroes as data is fine. While we try to invalidate the cache on zero writes and discards, such incongruences may still occur when there are other processes writing to the image.) We only use the cache for nodes without children (i.e. protocol nodes), because that is where the problem is: Drivers that rely on block-status implementations outside of qemu (e.g. SEEK_DATA/HOLE). Resolves: https://gitlab.com/qemu-project/qemu/-/issues/307 Signed-off-by: Hanna Reitz <hreitz@redhat.com> Message-Id: <20210812084148.14458-3-hreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> [hreitz: Added `local_file == bs` assertion, as suggested by Vladimir] Signed-off-by: Hanna Reitz <hreitz@redhat.com>	2021-09-15 15:54:06 +02:00
Hanna Reitz	33ff4c9e08	block: Drop BDS comment regarding bdrv_append() There is a comment above the BDS definition stating care must be taken to consider handling newly added fields in bdrv_append(). Actually, this comment should have said "bdrv_swap()" as of `4ddc07cac` (nine years ago), and in any case, bdrv_swap() was dropped in `8e419aefa` (six years ago). So no such care is necessary anymore. Signed-off-by: Hanna Reitz <hreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20210812084148.14458-2-hreitz@redhat.com>	2021-09-15 15:54:06 +02:00
Vladimir Sementsov-Ogievskiy	abde8ac2a5	block/block-copy: block_copy_state_new(): drop extra arguments The only caller pass copy_range and compress both false. Let's just drop these arguments. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210824083856.17408-35-vsementsov@virtuozzo.com> Reviewed-by: Hanna Reitz <hreitz@redhat.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>	2021-09-01 14:38:08 +02:00
Vladimir Sementsov-Ogievskiy	b518e9e9ef	block/backup: move cluster size calculation to block-copy The main consumer of cluster-size is block-copy. Let's calculate it here instead of passing through backup-top. We are going to publish copy-before-write filter soon, so it will be created through options. But we don't want for now to make explicit option for cluster-size, let's continue to calculate it automatically. So, now is the time to get rid of cluster_size argument for bdrv_cbw_append(). Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210824083856.17408-10-vsementsov@virtuozzo.com> [hreitz: Add qemu/error-report.h include to block/block-copy.c] Signed-off-by: Hanna Reitz <hreitz@redhat.com>	2021-09-01 14:03:11 +02:00
Vladimir Sementsov-Ogievskiy	f8b9504bac	block/block-copy: introduce block_copy_set_copy_opts() We'll need a possibility to set compress and use_copy_range options after initialization of the state. So make corresponding part of block_copy_state_new() separate and public. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210824083856.17408-8-vsementsov@virtuozzo.com> Reviewed-by: Hanna Reitz <hreitz@redhat.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>	2021-09-01 12:57:31 +02:00
Vladimir Sementsov-Ogievskiy	49577723d4	block-copy: move detecting fleecing scheme to block-copy We want to simplify initialization interface of copy-before-write filter as we are going to make it public. So, let's detect fleecing scheme exactly in block-copy code, to not pass this information through extra levels. Why not just set BDRV_REQ_SERIALISING unconditionally: because we are going to implement new more efficient fleecing scheme which will not rely on backing feature. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Hanna Reitz <hreitz@redhat.com> Message-Id: <20210824083856.17408-7-vsementsov@virtuozzo.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>	2021-09-01 12:57:31 +02:00
Vladimir Sementsov-Ogievskiy	bd8f4c42c8	block: introduce bdrv_replace_child_bs() Add function to transactionally replace bs inside BdrvChild. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210824083856.17408-2-vsementsov@virtuozzo.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>	2021-09-01 12:57:31 +02:00
Klaus Jensen	a316aa50e6	hw/nvme: use symbolic names for registers Add the NvmeBarRegs enum and use these instead of explicit register offsets. Signed-off-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Gollu Appalanaidu <anaidu.gollu@samsung.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Keith Busch <kbusch@kernel.org>	2021-07-26 21:09:38 +02:00
Klaus Jensen	5d45edbeac	hw/nvme: split pmrmsc register into upper and lower The specification uses a set of 32 bit PMRMSCL and PMRMSCU registers to make up the 64 bit logical PMRMSC register. Make it so. Signed-off-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Keith Busch <kbusch@kernel.org>	2021-07-26 21:09:38 +02:00
Stefano Garzarella	1793ad0247	iothread: add aio-max-batch parameter The `aio-max-batch` parameter will be propagated to AIO engines and it will be used to control the maximum number of queued requests. When there are in queue a number of requests equal to `aio-max-batch`, the engine invokes the system call to forward the requests to the kernel. This parameter allows us to control the maximum batch size to reduce the latency that requests might accumulate while queued in the AIO engine queue. If `aio-max-batch` is equal to 0 (default value), the AIO engine will use its default maximum batch size value. Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-id: 20210721094211.69853-3-sgarzare@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2021-07-21 13:47:50 +01:00
Peter Maydell	42e1d798a6	Block layer patches - Make blockdev-reopen stable - Remove deprecated qemu-img backing file without format - rbd: Convert to coroutines and add write zeroes support - rbd: Updated MAINTAINERS - export/fuse: Allow other users access to the export - vhost-user: Fix backends without multiqueue support - Fix drive-backup transaction endless drained section -----BEGIN PGP SIGNATURE----- iQJFBAABCAAvFiEE3D3rFZqa+V09dFb+fwmycsiPL9YFAmDoRdIRHGt3b2xmQHJl ZGhhdC5jb20ACgkQfwmycsiPL9bvgQ/+Ogq24n1UOQc8FEKRYfyhajNToQ9ofzWN iLiblSGx2QDq+CauD3qdu6z7DLlqEXeoM4NYM462oIPumptQj+9XZt7ftfh6FLWW 4yJEbjfnVKOba+vFdJ+E0DStwnPaxYdnrPGd53cwHZfbZh4ZmkpTM350mzHHiLTb KYKOgWd+UHZbkYeCVNYTGe30SRBiKeAecTpsVZ5HVhe7LstjByuy5stk8dytLpdV YqdKOToZfOp77XiHr8YcLLp1HHBGlr5hw73V4SDas0beCp7hqtnAqsTYyXBue4xO 4zfD4Gujr5JVOCb0crDTyOmOQY5E+y2dqFoOUF00D5AoN2vj4nfQ9ESkbqlE9BVh mgJ1izSokYlN2X8rIwGXNR5fbxRmxxfkAA4rScNRytj1KxDHyrDxrp/k8YFemxSQ qwgb/FBm0fcr69evPRzovKwZFhcyPremksluHQE4rZZ66qBQ2cGuDJPE7PWVTpPH 67JCrIVK/O6n5p+4ilFHmQQ3aP3ol0frMFcboYVRchJ2MhIDTsfFL3F/tTK8hy86 AmrrdQ1BQIAoKNOKnAmOSOUdExM55OcfPmX69+AhEk2GeWP6kgz5Pks4H3qCiKGf YoRk8F1V+N4q+C0mFFovB61bNQ6COIlBuzmD9EtmpDD/Ta3Wib+3ZnoGVIdPS+OI jyj+qJxd9z4= =kH+r -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging Block layer patches - Make blockdev-reopen stable - Remove deprecated qemu-img backing file without format - rbd: Convert to coroutines and add write zeroes support - rbd: Updated MAINTAINERS - export/fuse: Allow other users access to the export - vhost-user: Fix backends without multiqueue support - Fix drive-backup transaction endless drained section # gpg: Signature made Fri 09 Jul 2021 13:49:22 BST # gpg: using RSA key DC3DEB159A9AF95D3D7456FE7F09B272C88F2FD6 # gpg: issuer "kwolf@redhat.com" # gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full] # Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6 * remotes/kevin/tags/for-upstream: (28 commits) block: Make blockdev-reopen stable API iotests: Test reopening multiple devices at the same time block: Support multiple reopening with x-blockdev-reopen block: Acquire AioContexts during bdrv_reopen_multiple() block: Add bdrv_reopen_queue_free() qcow2: Fix dangling pointer after reopen for 'file' qemu-img: Improve error for rebase without backing format qemu-img: Require -F with -b backing image qcow2: Prohibit backing file changes in 'qemu-img amend' blockdev: fix drive-backup transaction endless drained section vhost-user: Fix backends without multiqueue support MAINTAINERS: add block/rbd.c reviewer block/rbd: fix type of task->complete iotests/fuse-allow-other: Test allow-other iotests/308: Test +w on read-only FUSE exports export/fuse: Let permissions be adjustable export/fuse: Give SET_ATTR_SIZE its own branch export/fuse: Add allow-other option export/fuse: Pass default_permissions for mount util/uri: do not check argument of uri_free() ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2021-07-10 19:55:21 +01:00
Kevin Wolf	6cf42ca2f9	block: Acquire AioContexts during bdrv_reopen_multiple() As the BlockReopenQueue can contain nodes in multiple AioContexts, only one of which may be locked when AIO_WAIT_WHILE() can be called, we can't let the caller lock the right contexts. Instead, individually lock the AioContext of a single node when iterating the queue. Reintroduce bdrv_reopen() as a wrapper for reopening a single node that drains the node and temporarily drops the AioContext lock for bdrv_reopen_multiple(). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210708114709.206487-4-kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-07-09 13:19:11 +02:00
Alberto Garcia	ab5b522879	block: Add bdrv_reopen_queue_free() Move the code to free a BlockReopenQueue to a separate function. It will be used in a subsequent patch. [ kwolf: Also free explicit_options and options, and explicitly qobject_ref() the value when it continues to be used. This makes future memory leaks less likely. ] Signed-off-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210708114709.206487-3-kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-07-09 13:19:11 +02:00
Stefan Hajnoczi	0f08586c71	util/async: add a human-readable name to BHs for debugging It can be difficult to debug issues with BHs in production environments. Although BHs can usually be identified by looking up their ->cb() function pointer, this requires debug information for the program. It is also not possible to print human-readable diagnostics about BHs because they have no identifier. This patch adds a name to each BH. The name is not unique per instance but differentiates between cb() functions, which is usually enough. It's done by changing aio_bh_new() and friends to macros that stringify cb. The next patch will use the name field when reporting leaked BHs. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20210414200247.917496-2-stefanha@redhat.com>	2021-07-05 11:40:32 +01:00
Peter Maydell	9c2647f750	Block layer patches - Supporting changing 'file' in x-blockdev-reopen - ssh: add support for sha256 host key fingerprints - vhost-user-blk: Implement reconnection during realize - introduce QEMU_AUTO_VFREE - Don't require password of encrypted backing file for image creation - Code cleanups -----BEGIN PGP SIGNATURE----- iQJFBAABCAAvFiEE3D3rFZqa+V09dFb+fwmycsiPL9YFAmDclTcRHGt3b2xmQHJl ZGhhdC5jb20ACgkQfwmycsiPL9bzaw/+PYQ9vPG+ZROWl633TUOQu7IYGZynXCET ZHlV2JlXnFH8QoO8A53U72cgVg+GlwDpiMCCtjEGMG1yfMBNe+DXR1wFUMDne1Gs qIFX4gIVpPGDi3gPeQvefLCwoN8VXwIxvCJj40YR9BY0cqQR6joI81kjVlqKemqB cHG4qJHmnphihSLep/dd2BRInkeHWXWU63jK4d7ctdCwHZPNDf0u0FEraxEqS/d0 5tfdIl3haghhoDRRamyh5bLZBCW1KlpfTR98RdspcwpSlXq/FgV5N9CFa15XSYfQ rinSClSIOpLzG90+tBihROVBXvbugO5qZVQTG0Yg1tt4FGG8Cmiqf9MNXC5yctNg WnaQQipx/37deafGA4jqorZBJd1R87JLJTBFTpkB47XAFq/ltqsTDhrrfdS+jail Fd+qyqWg0Jx3JjdhSUpHvDKBBsErjoxtoyQIGakSreXGmj2UY6BFmGii7lnLZNLo +E81C7exnkCIGKkOHy+y9DkpVY/PEJKCG7uwcyy+F2qOqGUOxKLuZomWcLodo6Vf /eJ/UsLJt6HhXhXq/1ZZHmaORn8Lft1yr/9azoGXZ7er+jZcbEkhbcZmET+Y6ykq Vox/GmLkhyVkM96MA0lMW5hHPWUbF29m9Jmq3nNfvFWBcILEs4uWSlbd0M2oAmWj ung9sKIV/8s= =aB0a -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging Block layer patches - Supporting changing 'file' in x-blockdev-reopen - ssh: add support for sha256 host key fingerprints - vhost-user-blk: Implement reconnection during realize - introduce QEMU_AUTO_VFREE - Don't require password of encrypted backing file for image creation - Code cleanups # gpg: Signature made Wed 30 Jun 2021 17:00:55 BST # gpg: using RSA key DC3DEB159A9AF95D3D7456FE7F09B272C88F2FD6 # gpg: issuer "kwolf@redhat.com" # gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full] # Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6 * remotes/kevin/tags/for-upstream: (24 commits) vhost-user-blk: Implement reconnection during realize vhost-user-blk: Factor out vhost_user_blk_realize_connect() vhost: Distinguish errors in vhost_dev_get_config() vhost-user-blk: Add Error parameter to vhost_user_blk_start() vhost: Return 0/-errno in vhost_dev_init() vhost: Distinguish errors in vhost_backend_init() vhost: Add Error parameter to vhost_dev_init() block/ssh: add support for sha256 host key fingerprints block/commit: use QEMU_AUTO_VFREE introduce QEMU_AUTO_VFREE iotests: Test replacing files with x-blockdev-reopen block: Allow changing bs->file on reopen block: BDRVReopenState: drop replace_backing_bs field block: move supports_backing check to bdrv_set_file_or_backing_noperm() block: bdrv_reopen_parse_backing(): simplify handling implicit filters block: bdrv_reopen_parse_backing(): don't check frozen child block: bdrv_reopen_parse_backing(): don't check aio context block: introduce bdrv_set_file_or_backing_noperm() block: introduce bdrv_remove_file_or_backing_child() block: comment graph-modifying function not updating permissions ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2021-07-02 11:46:32 +01:00
Peter Maydell	1ec2cd0ce2	hw/nvme patches * namespace eui64 support (Heinrich) * aiocb refactoring (Klaus) * controller parameter for auto zone transitioning (Niklas) * misc fixes and additions (Gollu, Klaus, Keith) -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEUigzqnXi3OaiR2bATeGvMW1PDekFAmDbap8ACgkQTeGvMW1P Denz9wf9GtU4z7G0pQpFlv1smWPvQfv9qlkGiRvGcIQjEgPjcKjEsJC1CqtG6nZ5 BGYzey706iM7NAs6wPkyUyHgWXl+N/Ku5nNBvNJuzTN5J9aHgnqehe8civvS6ONU 0WtAQEEZCLu0Qbvw6KawCShXrJ6Jpmu+bYGggYEy5+baumLF0Qj50l2wRiDbImNu rE3iZuhEenwnNNSTTkyt2RT8lSzizAtth4bAqzl3Zw1Q3NO+qDCL0wPOWD7YuD8p LTZ8/ojB1+YUI3JrMM4MlzbJyGstdk2Y5zq6QpAcf86agaaFEmS0iqjOic52ImpL VzmZZZ0W7kmhJVc5H/Q5y/KAFIHnFA== =efjI -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/nvme/tags/nvme-next-pull-request' into staging hw/nvme patches * namespace eui64 support (Heinrich) * aiocb refactoring (Klaus) * controller parameter for auto zone transitioning (Niklas) * misc fixes and additions (Gollu, Klaus, Keith) # gpg: Signature made Tue 29 Jun 2021 19:46:55 BST # gpg: using RSA key 522833AA75E2DCE6A24766C04DE1AF316D4F0DE9 # gpg: Good signature from "Klaus Jensen <its@irrelevant.dk>" [unknown] # gpg: aka "Klaus Jensen <k.jensen@samsung.com>" [unknown] # gpg: WARNING: This key is not certified with a trusted signature! # gpg: There is no indication that the signature belongs to the owner. # Primary key fingerprint: DDCA 4D9C 9EF9 31CC 3468 4272 63D5 6FC5 E55D A838 # Subkey fingerprint: 5228 33AA 75E2 DCE6 A247 66C0 4DE1 AF31 6D4F 0DE9 * remotes/nvme/tags/nvme-next-pull-request: (23 commits) hw/nvme: add 'zoned.zasl' to documentation hw/nvme: fix pin-based interrupt behavior (again) hw/nvme: fix missing check for PMR capability hw/nvme: documentation fix hw/nvme: fix endianess conversion and add controller list Partially revert "hw/block/nvme: drain namespaces on sq deletion" hw/nvme: reimplement format nvm to allow cancellation hw/nvme: reimplement zone reset to allow cancellation hw/nvme: reimplement the copy command to allow aio cancellation hw/nvme: add dw0/1 to the req completion trace event hw/nvme: use prinfo directly in nvme_check_prinfo and nvme_dif_check hw/nvme: remove assert from nvme_get_zone_by_slba hw/nvme: save reftag when generating pi hw/nvme: reimplement dsm to allow cancellation hw/nvme: add nvme_block_status_all helper hw/nvme: reimplement flush to allow cancellation hw/nvme: default for namespace EUI-64 hw/nvme: namespace parameter for EUI-64 hw/nvme: fix csi field for cns 0x00 and 0x11 hw/nvme: add param to control auto zone transitioning to zone state closed ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2021-06-30 21:09:27 +01:00
Alberto Garcia	ecd30d2d97	block: Allow changing bs->file on reopen When the x-blockdev-reopen was added it allowed reconfiguring the graph by replacing backing files, but changing the 'file' option was forbidden. Because of this restriction some operations are not possible, notably inserting and removing block filters. This patch adds support for replacing the 'file' option. This is similar to replacing the backing file and the user is likewise responsible for the correctness of the resulting graph, otherwise this can lead to data corruption. Signed-off-by: Alberto Garcia <berto@igalia.com> [vsementsov: bdrv_reopen_parse_file_or_backing() is modified a lot] Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210610120537.196183-9-vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-06-29 16:51:00 +02:00
Vladimir Sementsov-Ogievskiy	3d0e8743f0	block: BDRVReopenState: drop replace_backing_bs field It's used only in bdrv_reopen_commit(). "backing" is covered by the loop through all children except for case when we removed backing child during reopen. Make it more obvious and drop extra boolean field: qdict_del will not fail if there is no such entry. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210610120537.196183-8-vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-06-29 16:51:00 +02:00
Gollu Appalanaidu	5f4eb94dbb	hw/nvme: fix endianess conversion and add controller list Add the controller identifiers list CNS 0x13, available list of ctrls in NVM Subsystem that may or may not be attached to namespaces. In Identify Ctrl List of the CNS 0x12 and 0x13 no endian conversion for the nsid field. These two CNS values shows affect when there exists a Subsystem. Added condition if there is no Subsystem return invalid field in command. Signed-off-by: Gollu Appalanaidu <anaidu.gollu@samsung.com> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2021-06-29 07:16:25 +02:00
Klaus Jensen	2a132309e4	hw/nvme: use prinfo directly in nvme_check_prinfo and nvme_dif_check The nvme_check_prinfo() and nvme_dif_check() functions operate on the 16 bit "control" member of the NvmeCmd. These functions do not otherwise operate on an NvmeCmd or an NvmeRequest, so change them to expect the actual 4 bit PRINFO field and add constants that work on this field as well. Signed-off-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Keith Busch <kbusch@kernel.org>	2021-06-29 07:16:25 +02:00
Gollu Appalanaidu	18de1526ba	hw/nvme: add identify namespace flbas/mc enums Add enums for the Identify Namespace FLBAS and MC fields. Signed-off-by: Gollu Appalanaidu <anaidu.gollu@samsung.com> [k.jensen: squashed separate flbas/mc commits into one] Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2021-06-29 07:16:25 +02:00
Peter Maydell	6512fa497c	* Some Meson test conversions * KVM dirty page ring buffer fix * KVM TSC scaling support * Fixes for SG_IO with /dev/sdX devices * (Non)support for host devices on iOS * -smp cleanups -----BEGIN PGP SIGNATURE----- iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmDV5TIUHHBib256aW5p QHJlZGhhdC5jb20ACgkQv/vSX3jHroNySgf9HMnAtLWp36p2ie74o4rrW9x3Ojrm fuCq2i3q3nBhEKqqiyp+QQJGubE44mXEZQYtX89tOfSFgg7o6SLIoAcQQskr+In6 f9I1jjpSVTls0AaGUO+iRn9KiTzeMWeo1l6Wht+2mfBL5XpNLaLLu/T49uPhjlvN zFi5blgILxIYMqMCD1joDBnIiqqDozr0p7QzRZD8re25sRhg0NHQxyIh3OxBPpJ9 3Jhy1Us0cDWrwvPbxz6S5N0zesLu1ojtojVPy6iKjyHSv+6eiE6bHyIbS8duG5+H zBC1THOsUV3X1UvPAjuSNlgfNeobGAzmxSJ/evLgWWkpkx1mLtsnL5RARQ== =YoOL -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/bonzini-gitlab/tags/for-upstream' into staging * Some Meson test conversions * KVM dirty page ring buffer fix * KVM TSC scaling support * Fixes for SG_IO with /dev/sdX devices * (Non)support for host devices on iOS * -smp cleanups # gpg: Signature made Fri 25 Jun 2021 15:16:18 BST # gpg: using RSA key F13338574B662389866C7682BFFBD25F78C7AE83 # gpg: issuer "pbonzini@redhat.com" # gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [full] # gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" [full] # Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1 # Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83 * remotes/bonzini-gitlab/tags/for-upstream: (28 commits) machine: reject -smp dies!=1 for non-PC machines machine: pass QAPI struct to mc->smp_parse machine: add error propagation to mc->smp_parse machine: move common smp_parse code to caller machine: move dies from X86MachineState to CpuTopology file-posix: handle EINTR during ioctl block: detect DKIOCGETBLOCKCOUNT/SIZE before use block: try BSD disk size ioctls one after another block: check for sys/disk.h block: feature detection for host block support file-posix: try BLKSECTGET on block devices too, do not round to power of 2 block: add max_hw_transfer to BlockLimits block-backend: align max_transfer to request alignment osdep: provide ROUND_DOWN macro scsi-generic: pass max_segments via max_iov field in BlockLimits file-posix: fix max_iov for /dev/sg devices KVM: Fix dirty ring mmap incorrect size due to renaming accident configure, meson: convert libusbredir detection to meson configure, meson: convert libcacard detection to meson configure, meson: convert libusb detection to meson ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2021-06-28 21:04:22 +01:00
Emanuele Giuseppe Esposito	149009bef4	block-copy: atomic .cancelled and .finished fields in BlockCopyCallState By adding acquire/release pairs, we ensure that .ret and .error_is_read fields are written by block_copy_dirty_clusters before .finished is true, and that they are read by API user after .finished is true. The atomic here are necessary because the fields are concurrently modified in coroutines, and read outside. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20210624072043.180494-6-eesposit@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>	2021-06-25 14:33:51 +03:00
Paolo Bonzini	24b36e9813	block: add max_hw_transfer to BlockLimits For block host devices, I/O can happen through either the kernel file descriptor I/O system calls (preadv/pwritev, io_submit, io_uring) or the SCSI passthrough ioctl SG_IO. In the latter case, the size of each transfer can be limited by the HBA, while for file descriptor I/O the kernel is able to split and merge I/O in smaller pieces as needed. Applying the HBA limits to file descriptor I/O results in more system calls and suboptimal performance, so this patch splits the max_transfer limit in two: max_transfer remains valid and is used in general, while max_hw_transfer is limited to the maximum hardware size. max_hw_transfer can then be included by the scsi-generic driver in the block limits page, to ensure that the stricter hardware limit is used. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-25 10:54:13 +02:00
Vladimir Sementsov-Ogievskiy	97cf89259e	nbd/client-connection: add option for non-blocking connection attempt We'll need a possibility of non-blocking nbd_co_establish_connection(), so that it returns immediately, and it returns success only if a connections was previously established in background. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20210610100802.5888-30-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-06-18 12:21:22 -05:00
Vladimir Sementsov-Ogievskiy	43cb34dede	nbd/client-connection: return only one io channel block/nbd doesn't need underlying sioc channel anymore. So, we can update nbd/client-connection interface to return only one top-most io channel, which is more straight forward. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20210610100802.5888-27-vsementsov@virtuozzo.com> [eblake: squash in Vladimir's fixes for uninit usage caught by clang] Signed-off-by: Eric Blake <eblake@redhat.com>	2021-06-18 12:20:53 -05:00
Vladimir Sementsov-Ogievskiy	e0e67cbe58	nbd/client-connection: implement connection retry Add an option for a thread to retry connecting until it succeeds. We'll use nbd/client-connection both for reconnect and for initial connection in nbd_open(), so we need a possibility to use same NBDClientConnection instance to connect once in nbd_open() and then use retry semantics for reconnect. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210610100802.5888-21-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> [eblake: grammar tweak] Signed-off-by: Eric Blake <eblake@redhat.com>	2021-06-18 10:59:53 -05:00
Vladimir Sementsov-Ogievskiy	130d49baa5	nbd/client-connection: add possibility of negotiation Add arguments and logic to support nbd negotiation in the same thread after successful connection. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210610100802.5888-20-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-06-18 10:59:53 -05:00
Vladimir Sementsov-Ogievskiy	5276c87c12	nbd: move connection code from block/nbd to nbd/client-connection We now have bs-independent connection API, which consists of four functions: nbd_client_connection_new() nbd_client_connection_release() nbd_co_establish_connection() nbd_co_establish_connection_cancel() Move them to a separate file together with NBDClientConnection structure which becomes private to the new API. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20210610100802.5888-18-vsementsov@virtuozzo.com> [eblake: comment tweaks] Signed-off-by: Eric Blake <eblake@redhat.com>	2021-06-18 10:59:53 -05:00
Paolo Bonzini	5f50be9b58	async: the main AioContext is only "current" if under the BQL If we want to wake up a coroutine from a worker thread, aio_co_wake() currently does not work. In that scenario, aio_co_wake() calls aio_co_enter(), but there is no current AioContext and therefore qemu_get_current_aio_context() returns the main thread. aio_co_wake() then attempts to call aio_context_acquire() instead of going through aio_co_schedule(). The default case of qemu_get_current_aio_context() was added to cover synchronous I/O started from the vCPU thread, but the main and vCPU threads are quite different. The main thread is an I/O thread itself, only running a more complicated event loop; the vCPU thread instead is essentially a worker thread that occasionally calls qemu_mutex_lock_iothread(). It is only in those critical sections that it acts as if it were the home thread of the main AioContext. Therefore, this patch detaches qemu_get_current_aio_context() from iothreads, which is a useless complication. The AioContext pointer is stored directly in the thread-local variable, including for the main loop. Worker threads (including vCPU threads) optionally behave as temporary home threads if they have taken the big QEMU lock, but if that is not the case they will always schedule coroutines on remote threads via aio_co_schedule(). With this change, the stub qemu_mutex_iothread_locked() must be changed from true to false. The previous value of true was needed because the main thread did not have an AioContext in the thread-local variable, but now it does have one. Reported-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20210609122234.544153-1-pbonzini@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Tested-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> [eblake: tweak commit message per Vladimir's review] Signed-off-by: Eric Blake <eblake@redhat.com>	2021-06-18 10:59:52 -05:00
Paolo Bonzini	c0d4aa82f8	vl: plumb keyval-based options into -readconfig Let -readconfig support parsing command line options into QDict or QemuOpts. This will be used to add back support for objects in -readconfig. Cc: Markus Armbruster <armbru@redhat.com> Cc: qemu-stable@nongnu.org Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20210524105752.3318299-3-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-04 13:50:04 +02:00
Vladimir Sementsov-Ogievskiy	260242a833	block: drop BlockBackendRootState::read_only Instead of keeping additional boolean field, let's store the information in BDRV_O_RDWR bit of BlockBackendRootState::open_flags. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210527154056.70294-4-vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-06-02 14:23:20 +02:00

... 2 3 4 5 6 ...

1581 Commits