mirrors/qemu - qemu - SynapseOS git

Author	SHA1	Message	Date
Emanuele Giuseppe Esposito	b4ad82aab1	assertions for block_int global state API Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20220303151616.325444-13-eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2022-03-04 18:18:25 +01:00
Emanuele Giuseppe Esposito	384a48fb74	IO_CODE and IO_OR_GS_CODE for block I/O API Mark all I/O functions with IO_CODE, and all "I/O OR GS" with IO_OR_GS_CODE. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20220303151616.325444-6-eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2022-03-04 18:18:25 +01:00
Emanuele Giuseppe Esposito	f791bf7f93	assertions for block global state API All the global state (GS) API functions will check that qemu_in_main_thread() returns true. If not, it means that the safety of BQL cannot be guaranteed, and they need to be moved to I/O. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20220303151616.325444-5-eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2022-03-04 18:18:25 +01:00
Hanna Reitz	113b727ce7	block/io: Update BSC only if want_zero is true We update the block-status cache whenever we get new information from a bdrv_co_block_status() call to the block driver. However, if we have passed want_zero=false to that call, it may flag areas containing zeroes as data, and so we would update the block-status cache with wrong information. Therefore, we should not update the cache with want_zero=false. Reported-by: Nir Soffer <nsoffer@redhat.com> Fixes: `0bc329fbb0` ("block: block-status cache for data regions") Reviewed-by: Nir Soffer <nsoffer@redhat.com> Cc: qemu-stable@nongnu.org Signed-off-by: Hanna Reitz <hreitz@redhat.com> Message-Id: <20220118170000.49423-2-hreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2022-01-28 16:52:40 -06:00
Paolo Bonzini	cc07162953	block: introduce max_hw_iov for use in scsi-generic Linux limits the size of iovecs to 1024 (UIO_MAXIOV in the kernel sources, IOV_MAX in POSIX). Because of this, on some host adapters requests with many iovecs are rejected with -EINVAL by the io_submit() or readv()/writev() system calls. In fact, the same limit applies to SG_IO as well. To fix both the EINVAL and the possible performance issues from using fewer iovecs than allowed by Linux (some HBAs have max_segments as low as 128), introduce a separate entry in BlockLimits to hold the max_segments value from sysfs. This new limit is used only for SG_IO and clamped to bs->bl.max_iov anyway, just like max_hw_transfer is clamped to bs->bl.max_transfer. Reported-by: Halil Pasic <pasic@linux.ibm.com> Cc: Hanna Reitz <hreitz@redhat.com> Cc: Kevin Wolf <kwolf@redhat.com> Cc: qemu-block@nongnu.org Cc: qemu-stable@nongnu.org Fixes: `18473467d5` ("file-posix: try BLKSECTGET on block devices too, do not round to power of 2", 2021-06-25) Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20210923130436.1187591-1-pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-10-06 10:25:55 +02:00
Vladimir Sementsov-Ogievskiy	6a8f3dbb19	block/io: allow 64bit discard requests Now that all drivers are updated by the previous commit, we can drop the last limiter on pdiscard path: INT_MAX in bdrv_co_pdiscard(). Now everything is prepared for implementing incredibly cool and fast big-discard requests in NBD and qcow2. And any other driver which wants it of course. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20210903102807.27127-12-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-09-29 13:46:32 -05:00
Vladimir Sementsov-Ogievskiy	39af49c0d7	block: make BlockLimits::max_pdiscard 64bit We are going to support 64 bit discard requests. Now update the limit variable. It's absolutely safe. The variable is set in some drivers, and used in bdrv_co_pdiscard(). Update also max_pdiscard variable in bdrv_co_pdiscard(), so that bdrv_co_pdiscard() is now prepared for 64bit requests. The remaining logic including num, offset and bytes variables is already supporting 64bit requests. So the only thing that prevents 64 bit requests is limiting max_pdiscard variable to INT_MAX in bdrv_co_pdiscard(). We'll drop this limitation after updating all block drivers. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20210903102807.27127-10-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-09-29 13:46:32 -05:00
Vladimir Sementsov-Ogievskiy	2aaa3f9b33	block/io: allow 64bit write-zeroes requests Now that all drivers are updated by previous commit, we can drop two last limiters on write-zeroes path: INT_MAX in bdrv_co_do_pwrite_zeroes() and bdrv_check_request32() in bdrv_co_pwritev_part(). Now everything is prepared for implementing incredibly cool and fast big-write-zeroes in NBD and qcow2. And any other driver which wants it of course. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20210903102807.27127-9-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-09-29 13:46:32 -05:00
Vladimir Sementsov-Ogievskiy	d544f5d3b1	block: make BlockLimits::max_pwrite_zeroes 64bit We are going to support 64 bit write-zeroes requests. Now update the limit variable. It's absolutely safe. The variable is set in some drivers, and used in bdrv_co_do_pwrite_zeroes(). Update also max_write_zeroes variable in bdrv_co_do_pwrite_zeroes(), so that bdrv_co_do_pwrite_zeroes() is now prepared to 64bit requests. The remaining logic including num, offset and bytes variables is already supporting 64bit requests. So the only thing that prevents 64 bit requests is limiting max_write_zeroes variable to INT_MAX in bdrv_co_do_pwrite_zeroes(). We'll drop this limitation after updating all block drivers. Ah, we also have bdrv_check_request32() in bdrv_co_pwritev_part(). It will be modified to do bdrv_check_request() for write-zeroes path. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210903102807.27127-7-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-09-29 13:46:32 -05:00
Vladimir Sementsov-Ogievskiy	e75abedab7	block: use int64_t instead of uint64_t in driver write handlers We are generally moving to int64_t for both offset and bytes parameters on all io paths. Main motivation is realization of 64-bit write_zeroes operation for fast zeroing large disk chunks, up to the whole disk. We chose signed type, to be consistent with off_t (which is signed) and with possibility for signed return type (where negative value means error). So, convert driver write handlers parameters which are already 64bit to signed type. While being here, convert also flags parameter to be BdrvRequestFlags. Now let's consider all callers. Simple git grep '\->bdrv_$aio\\|co$_pwritev$_part$\?' shows that's there three callers of driver function: bdrv_driver_pwritev() and bdrv_driver_pwritev_compressed() in block/io.c, both pass int64_t, checked by bdrv_check_qiov_request() to be non-negative. qcow2_save_vmstate() does bdrv_check_qiov_request(). Still, the functions may be called directly, not only by drv->... Let's check: git grep '\.bdrv_$aio\\|co$_pwritev$_part$\?\s*=' \| \ awk '{print $4}' \| sed 's/,//' \| sed 's/&//' \| sort \| uniq \| \ while read func; do git grep "$func(" \| \ grep -v "$func(BlockDriverState"; done shows several callers: qcow2: qcow2_co_truncate() write at most up to @offset, which is checked in generic qcow2_co_truncate() by bdrv_check_request(). qcow2_co_pwritev_compressed_task() pass the request (or part of the request) that already went through normal write path, so it should be OK qcow: qcow_co_pwritev_compressed() pass int64_t, it's updated by this patch quorum: quorum_co_pwrite_zeroes() pass int64_t and int - OK throttle: throttle_co_pwritev_compressed() pass int64_t, it's updated by this patch vmdk: vmdk_co_pwritev_compressed() pass int64_t, it's updated by this patch Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210903102807.27127-5-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-09-29 13:46:31 -05:00
Vladimir Sementsov-Ogievskiy	558902cc3d	qcow2: check request on vmstate save/load path We modify the request by adding an offset to vmstate. Let's check the modified request. It will help us to safely move .bdrv_co_preadv_part and .bdrv_co_pwritev_part to int64_t type of offset and bytes. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20210903102807.27127-3-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-09-29 13:46:31 -05:00
Vladimir Sementsov-Ogievskiy	b984b2968b	block/io: bring request check to bdrv_co_(read,write)v_vmstate Only qcow2 driver supports vmstate. In qcow2 these requests go through .bdrv_co_p{read,write}v_part handlers. So, let's do our basic check for the request on vmstate generic handlers. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20210903102807.27127-2-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-09-29 13:46:31 -05:00
Hanna Reitz	0bc329fbb0	block: block-status cache for data regions As we have attempted before (https://lists.gnu.org/archive/html/qemu-devel/2019-01/msg06451.html, "file-posix: Cache lseek result for data regions"; https://lists.nongnu.org/archive/html/qemu-block/2021-02/msg00934.html, "file-posix: Cache next hole"), this patch seeks to reduce the number of SEEK_DATA/HOLE operations the file-posix driver has to perform. The main difference is that this time it is implemented as part of the general block layer code. The problem we face is that on some filesystems or in some circumstances, SEEK_DATA/HOLE is unreasonably slow. Given the implementation is outside of qemu, there is little we can do about its performance. We have already introduced the want_zero parameter to bdrv_co_block_status() to reduce the number of SEEK_DATA/HOLE calls unless we really want zero information; but sometimes we do want that information, because for files that consist largely of zero areas, special-casing those areas can give large performance boosts. So the real problem is with files that consist largely of data, so that inquiring the block status does not gain us much performance, but where such an inquiry itself takes a lot of time. To address this, we want to cache data regions. Most of the time, when bad performance is reported, it is in places where the image is iterated over from start to end (qemu-img convert or the mirror job), so a simple yet effective solution is to cache only the current data region. (Note that only caching data regions but not zero regions means that returning false information from the cache is not catastrophic: Treating zeroes as data is fine. While we try to invalidate the cache on zero writes and discards, such incongruences may still occur when there are other processes writing to the image.) We only use the cache for nodes without children (i.e. protocol nodes), because that is where the problem is: Drivers that rely on block-status implementations outside of qemu (e.g. SEEK_DATA/HOLE). Resolves: https://gitlab.com/qemu-project/qemu/-/issues/307 Signed-off-by: Hanna Reitz <hreitz@redhat.com> Message-Id: <20210812084148.14458-3-hreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> [hreitz: Added `local_file == bs` assertion, as suggested by Vladimir] Signed-off-by: Hanna Reitz <hreitz@redhat.com>	2021-09-15 15:54:06 +02:00
Kevin Wolf	87ab880252	block: Fix in_flight leak in request padding error path When bdrv_pad_request() fails in bdrv_co_preadv_part(), bs->in_flight has been increased, but is never decreased again. This leads to a hang when trying to drain the block node. This bug was observed with Windows guests which issue a request that fully uses IOV_MAX during installation, so that when padding is necessary (O_DIRECT with a 4k sector size block device on the host), adding another entry causes failure. Call bdrv_dec_in_flight() to fix this. There is a larger problem to solve here because this request shouldn't even fail, but Windows doesn't seem to care and with this minimal fix the installation succeeds. So given that we're already in freeze, let's take this minimal fix for 6.1. Fixes: `98ca45494f` Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1972079 Reported-by: Qing Wang <qinwang@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20210727154923.91067-1-kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-08-03 15:43:30 +02:00
Akihiko Odaki	9f460c64e1	block/io: Merge discard request alignments Signed-off-by: Akihiko Odaki <akihiko.odaki@gmail.com> Message-id: 20210705130458.97642-3-akihiko.odaki@gmail.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2021-07-06 14:28:55 +01:00
Eric Blake	97efa8698e	block: Move read-only check during truncation earlier No need to start a tracked request that will always fail. The choice to check read-only after bdrv_inc_in_flight() predates `1bc5f09f2e` (block: Use tracked request for truncate), but waiting for serializing requests can make the effect more noticeable. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20210609163034.997943-1-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-06-29 16:51:00 +02:00
Paolo Bonzini	24b36e9813	block: add max_hw_transfer to BlockLimits For block host devices, I/O can happen through either the kernel file descriptor I/O system calls (preadv/pwritev, io_submit, io_uring) or the SCSI passthrough ioctl SG_IO. In the latter case, the size of each transfer can be limited by the HBA, while for file descriptor I/O the kernel is able to split and merge I/O in smaller pieces as needed. Applying the HBA limits to file descriptor I/O results in more system calls and suboptimal performance, so this patch splits the max_transfer limit in two: max_transfer remains valid and is used in general, while max_hw_transfer is limited to the maximum hardware size. max_hw_transfer can then be included by the scsi-generic driver in the block limits page, to ensure that the stricter hardware limit is used. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-25 10:54:13 +02:00
Vladimir Sementsov-Ogievskiy	307261b243	block: consistently use bdrv_is_read_only() It's better to use accessor function instead of bs->read_only directly. In some places use bdrv_is_writable() instead of checking both BDRV_O_RDWR set and BDRV_O_INACTIVE not set. In bdrv_open_common() it's a bit strange to add one more variable, but we are going to drop bs->read_only in the next patch, so new ro local variable substitutes it here. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210527154056.70294-2-vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-06-02 14:23:20 +02:00
Vladimir Sementsov-Ogievskiy	ad578c56d5	block: drop write notifiers They are unused now. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210506090621.11848-3-vsementsov@virtuozzo.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2021-05-14 16:14:10 +02:00
Vladimir Sementsov-Ogievskiy	94783301b8	block/write-threshold: don't use write notifiers write-notifiers are used only for write-threshold. New code for such purpose should create filters. Let's better special-case write-threshold and drop write notifiers at all. (Actually, write-threshold is special-cased anyway, as the only user of write-notifiers) So, create a new direct interface for bdrv_co_write_req_prepare() and drop all write-notifier related logic from write-threshold.c. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210506090621.11848-2-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> [mreitz: Adjusted comment as per Eric's suggestion] Signed-off-by: Max Reitz <mreitz@redhat.com>	2021-05-14 16:14:10 +02:00
Vladimir Sementsov-Ogievskiy	1e4c797c75	block: make bdrv_refresh_limits() to be a transaction action To be used in further commit. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20210428151804.439460-28-vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-04-30 12:27:48 +02:00
Vladimir Sementsov-Ogievskiy	bd54669a4a	block: add new BlockDriver handler: bdrv_cancel_in_flight It will be used to stop retrying NBD requests on mirror cancel. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20210205163720.887197-2-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-02-12 09:45:18 -06:00
Vladimir Sementsov-Ogievskiy	a5215b8fdf	block/io: use int64_t bytes in copy_range We are generally moving to int64_t for both offset and bytes parameters on all io paths. Main motivation is realization of 64-bit write_zeroes operation for fast zeroing large disk chunks, up to the whole disk. We chose signed type, to be consistent with off_t (which is signed) and with possibility for signed return type (where negative value means error). So, convert now copy_range parameters which are already 64bit to signed type. It's safe as we don't work with requests overflowing BDRV_MAX_LENGTH (which is less than INT64_MAX), and do check the requests in bdrv_co_copy_range_internal() (by bdrv_check_request32(), which calls bdrv_check_request()). Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20201211183934.169161-17-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-02-03 08:17:12 -06:00
Vladimir Sementsov-Ogievskiy	e9e52efdc5	block/io: support int64_t bytes in read/write wrappers We are generally moving to int64_t for both offset and bytes parameters on all io paths. Main motivation is realization of 64-bit write_zeroes operation for fast zeroing large disk chunks, up to the whole disk. We chose signed type, to be consistent with off_t (which is signed) and with possibility for signed return type (where negative value means error). Now, since bdrv_co_preadv_part() and bdrv_co_pwritev_part() have been updated, update all their wrappers. For all of them type of 'bytes' is widening, so callers are safe. We have update request_fn in blkverify.c simultaneously. Still it's just a pointer to one of bdrv_co_pwritev() or bdrv_co_preadv(), and type is widening for callers of the request_fn anyway. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20201211183934.169161-16-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> [eblake: grammar tweak] Signed-off-by: Eric Blake <eblake@redhat.com>	2021-02-03 08:17:12 -06:00
Vladimir Sementsov-Ogievskiy	37e9403ea8	block/io: support int64_t bytes in bdrv_co_p{read,write}v_part() We are generally moving to int64_t for both offset and bytes parameters on all io paths. Main motivation is realization of 64-bit write_zeroes operation for fast zeroing large disk chunks, up to the whole disk. We chose signed type, to be consistent with off_t (which is signed) and with possibility for signed return type (where negative value means error). So, prepare bdrv_co_preadv_part() and bdrv_co_pwritev_part() and their remaining dependencies now. bdrv_pad_request() is updated simultaneously, as pointer to bytes passed to it both from bdrv_co_pwritev_part() and bdrv_co_preadv_part(). So, all callers of bdrv_pad_request() are updated to pass 64bit bytes. bdrv_pad_request() is already good for 64bit requests, add corresponding assertion. Look at bdrv_co_preadv_part() and bdrv_co_pwritev_part(). Type is widening, so callers are safe. Let's look inside the functions. In bdrv_co_preadv_part() and bdrv_aligned_pwritev() we only pass bytes to other already int64_t interfaces (and some obviously safe calculations), it's OK. In bdrv_co_do_zero_pwritev() aligned_bytes may become large now, still it's passed to bdrv_aligned_pwritev which supports int64_t bytes. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20201211183934.169161-15-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-02-03 08:17:11 -06:00
Vladimir Sementsov-Ogievskiy	8b0c5d7659	block/io: support int64_t bytes in bdrv_aligned_preadv() We are generally moving to int64_t for both offset and bytes parameters on all io paths. Main motivation is realization of 64-bit write_zeroes operation for fast zeroing large disk chunks, up to the whole disk. We chose signed type, to be consistent with off_t (which is signed) and with possibility for signed return type (where negative value means error). So, prepare bdrv_aligned_preadv() now. Make the bytes variable in bdrv_padding_rmw_read() int64_t, as it is only used for pass-through to bdrv_aligned_preadv(). All bdrv_aligned_preadv() callers are safe as type is widening. Let's look inside: - add a new-style assertion that request is good. - callees bdrv_is_allocated(), bdrv_co_do_copy_on_readv() supports int64_t bytes - conversion of bytes_remaining is OK, as we never have requests overflowing BDRV_MAX_LENGTH - looping through bytes_remaining is ok, num is updated to int64_t - for bdrv_driver_preadv we have same limit of max_transfer - qemu_iovec_memset is OK, as bytes+qiov_offset should not overflow qiov->size anyway (thanks to bdrv_check_qiov_request()) Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20201211183934.169161-14-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> [eblake: grammar tweak] Signed-off-by: Eric Blake <eblake@redhat.com>	2021-02-03 08:17:11 -06:00
Vladimir Sementsov-Ogievskiy	9df5afbdd1	block/io: support int64_t bytes in bdrv_co_do_copy_on_readv() We are generally moving to int64_t for both offset and bytes parameters on all io paths. Main motivation is realization of 64-bit write_zeroes operation for fast zeroing large disk chunks, up to the whole disk. We chose signed type, to be consistent with off_t (which is signed) and with possibility for signed return type (where negative value means error). So, prepare bdrv_co_do_copy_on_readv() now. 'bytes' type widening, so callers are safe. Look at the function itself: bytes, skip_bytes and progress become int64_t. bdrv_round_to_clusters() is OK, cluster_bytes now may be large. trace_bdrv_co_do_copy_on_readv() is OK looping through cluster_bytes is still OK. pnum is still capped to max_transfer, and to MAX_BOUNCE_BUFFER when we are going to do COR operation. Therefor calculations in qemu_iovec_from_buf() and bdrv_driver_preadv() should not change. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20201211183934.169161-13-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-02-03 08:17:11 -06:00
Vladimir Sementsov-Ogievskiy	fcfd9ade68	block/io: support int64_t bytes in bdrv_aligned_pwritev() We are generally moving to int64_t for both offset and bytes parameters on all io paths. Main motivation is realization of 64-bit write_zeroes operation for fast zeroing large disk chunks, up to the whole disk. We chose signed type, to be consistent with off_t (which is signed) and with possibility for signed return type (where negative value means error). So, prepare bdrv_aligned_pwritev() now and convert the dependencies: bdrv_co_write_req_prepare() and bdrv_co_write_req_finish() to signed type bytes. Conversion of bdrv_co_write_req_prepare() and bdrv_co_write_req_finish() is definitely safe, as all requests in block/io must not overflow BDRV_MAX_LENGTH. Still add assertions. For bdrv_aligned_pwritev() 'bytes' type is widened, so callers are safe. Let's check usage of the parameter inside the function. Passing to bdrv_co_write_req_prepare() and bdrv_co_write_req_finish() is OK. Passing to qemu_iovec_* is OK after new assertion. All other callees are already updated to int64_t. Checking alignment is not changed, offset + bytes and qiov_offset + bytes calculations are safe (thanks to new assertions). max_transfer is kept to be int for now. It has a default of INT_MAX here, and some drivers may rely on it. It's to be refactored later. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20201211183934.169161-12-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-02-03 08:16:03 -06:00
Vladimir Sementsov-Ogievskiy	5ae07b1410	block/io: support int64_t bytes in bdrv_co_do_pwrite_zeroes() We are generally moving to int64_t for both offset and bytes parameters on all io paths. Main motivation is realization of 64-bit write_zeroes operation for fast zeroing large disk chunks, up to the whole disk. We chose signed type, to be consistent with off_t (which is signed) and with possibility for signed return type (where negative value means error). So, prepare bdrv_co_do_pwrite_zeroes() now. Callers are safe, as converting int to int64_t is safe. Concentrate on 'bytes' usage in the function (thx to Eric Blake): compute 'int tail' via % 'int alignment' - safe fragmentation loop 'int num' - still fragments with a cap on max_transfer use of 'num' within the loop MIN(bytes, max_transfer) as well as %alignment - still works, so calculations in if (head) {} are safe clamp size by 'int max_write_zeroes' - safe drv->bdrv_co_pwrite_zeroes(int) - safe because of clamping clamp size by 'int max_transfer' - safe buf allocation is still clamped to max_transfer qemu_iovec_init_buf(size_t) - safe because of clamping bdrv_driver_pwritev(uint64_t) - safe Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20201211183934.169161-11-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-02-03 08:16:03 -06:00
Vladimir Sementsov-Ogievskiy	17abcbeee2	block/io: use int64_t bytes in driver wrappers We are generally moving to int64_t for both offset and bytes parameters on all io paths. Main motivation is realization of 64-bit write_zeroes operation for fast zeroing large disk chunks, up to the whole disk. We chose signed type, to be consistent with off_t (which is signed) and with possibility for signed return type (where negative value means error). So, convert driver wrappers parameters which are already 64bit to signed type. Requests in block/io.c must never exceed BDRV_MAX_LENGTH (which is less than INT64_MAX), which makes the conversion to signed 64bit type safe. Add corresponding assertions. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20201211183934.169161-10-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-02-03 08:16:03 -06:00
Eric Blake	8024726459	block: use int64_t as bytes type in tracked requests We are generally moving to int64_t for both offset and bytes parameters on all io paths. Main motivation is realization of 64-bit write_zeroes operation for fast zeroing large disk chunks, up to the whole disk. We chose signed type, to be consistent with off_t (which is signed) and with possibility for signed return type (where negative value means error). All requests in block/io must not overflow BDRV_MAX_LENGTH, all external users of BdrvTrackedRequest already have corresponding assertions, so we are safe. Add some assertions still. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20201211183934.169161-9-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-02-03 08:14:15 -06:00
Vladimir Sementsov-Ogievskiy	63f4ad1186	block/io: improve bdrv_check_request: check qiov too Operations with qiov add more restrictions on bytes, let's cover it. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20201211183934.169161-8-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-02-03 08:14:00 -06:00
Vladimir Sementsov-Ogievskiy	98ca45494f	block/io: bdrv_pad_request(): support qemu_iovec_init_extended failure Make bdrv_pad_request() honest: return error if qemu_iovec_init_extended() failed. Update also bdrv_padding_destroy() to clean the structure for safety. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20201211183934.169161-6-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-02-03 08:14:00 -06:00
Vladimir Sementsov-Ogievskiy	f0deecff82	block/io: refactor bdrv_pad_request(): move bdrv_pad_request() up Prepare for the following patch when bdrv_pad_request() will be able to fail. Update the comments. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20201211183934.169161-5-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> [eblake: grammar tweak] Signed-off-by: Eric Blake <eblake@redhat.com>	2021-02-03 08:00:52 -06:00
Vladimir Sementsov-Ogievskiy	a56ed80c42	block: fix theoretical overflow in bdrv_init_padding() Calculation of sum may theoretically overflow, so use 64bit type and add some good assertions. Use int64_t constantly. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20201211183934.169161-4-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> [eblake: tweak assertion order] Signed-off-by: Eric Blake <eblake@redhat.com>	2021-02-03 08:00:33 -06:00
Vladimir Sementsov-Ogievskiy	4c002cef0e	util/iov: make qemu_iovec_init_extended() honest Actually, we can't extend the io vector in all cases. Handle possible MAX_IOV and size_t overflows. For now add assertion to callers (actually they rely on success anyway) and fix them in the following patch. Add also some additional good assertions to qemu_iovec_init_slice() while being here. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20201211183934.169161-3-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-02-03 08:00:33 -06:00
Vladimir Sementsov-Ogievskiy	69b55e03f7	block: refactor bdrv_check_request: add errp It's better to pass &error_abort than just assert that result is 0: on crash, we'll immediately see the reason in the backtrace. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20201211183934.169161-2-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> [eblake: fix iotest 206 fallout] Signed-off-by: Eric Blake <eblake@redhat.com>	2021-02-03 08:00:33 -06:00
Andrey Shinkevich	897dd0ec4f	block: include supported_read_flags into BDS structure Add the new member supported_read_flags to the BlockDriverState structure. It will control the flags set for copy-on-read operations. Make the block generic layer evaluate supported read flags before they go to a block driver. Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> [vsementsov: use assert instead of abort] Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20201216061703.70908-8-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2021-01-26 14:36:37 +01:00
Vladimir Sementsov-Ogievskiy	d1a764d126	block: introduce BDRV_REQ_NO_WAIT flag Add flag to make serialising request no wait: if there are conflicting requests, just return error immediately. It's will be used in upcoming preallocate filter. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20201021145859.11201-7-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-12-18 12:35:55 +01:00
Vladimir Sementsov-Ogievskiy	8ac5aab255	block: bdrv_mark_request_serialising: split non-waiting function We'll need a separate function, which will only "mark" request serialising with specified align but not wait for conflicting requests. So, it will be like old bdrv_mark_request_serialising(), before merging bdrv_wait_serialising_requests_locked() into it. To reduce the possible mess, let's do the following: Public function that does both marking and waiting will be called bdrv_make_request_serialising, and private function which will only "mark" will be called tracked_request_set_serialising(). Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20201021145859.11201-6-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-12-18 12:35:55 +01:00
Vladimir Sementsov-Ogievskiy	ec1c886831	block/io: bdrv_wait_serialising_requests_locked: drop extra bs arg bs is linked in req, so no needs to pass it separately. Most of tracked-requests API doesn't have bs argument. Actually, after this patch only tracked_request_begin has it, but it's for purpose. While being here, also add a comment about what "_locked" is. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20201021145859.11201-5-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-12-18 12:35:55 +01:00
Vladimir Sementsov-Ogievskiy	3183937ff9	block/io: split out bdrv_find_conflicting_request To be reused in separate. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20201021145859.11201-4-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-12-18 12:35:55 +01:00
Vladimir Sementsov-Ogievskiy	2e36da62cf	block/io.c: drop assertion on double waiting for request serialisation The comments states, that on misaligned request we should have already been waiting. But for bdrv_padding_rmw_read, we called bdrv_mark_request_serialising with align = request_alignment, and now we serialise with align = cluster_size. So we may have to wait again with larger alignment. Note, that the only user of BDRV_REQ_SERIALISING is backup which issues cluster-aligned requests, so seems the assertion should not fire for now. But it's wrong anyway. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20201021145859.11201-3-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-12-18 12:35:55 +01:00
Kevin Wolf	960d5fb3e8	block: Fix deadlock in bdrv_co_yield_to_drain() If bdrv_co_yield_to_drain() is called for draining a block node that runs in a different AioContext, it keeps that AioContext locked while it yields and schedules a BH in the AioContext to do the actual drain. As long as executing the BH is the very next thing that the event loop of the node's AioContext does, this actually happens to work, but when it tries to execute something else that wants to take the AioContext lock, it will deadlock. (In the bug report, this other thing is a virtio-scsi device running virtio_scsi_data_plane_handle_cmd().) Instead, always drop the AioContext lock across the yield and reacquire it only when the coroutine is reentered. The BH needs to unconditionally take the lock for itself now. This fixes the 'block_resize' QMP command on a block node that runs in an iothread. Cc: qemu-stable@nongnu.org Fixes: `eb94b81a94` Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1903511 Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20201203172311.68232-4-kwolf@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-12-11 17:52:40 +01:00
Vladimir Sementsov-Ogievskiy	8b1170012b	block: introduce BDRV_MAX_LENGTH We are going to modify block layer to work with 64bit requests. And first step is moving to int64_t type for both offset and bytes arguments in all block request related functions. It's mostly safe (when widening signed or unsigned int to int64_t), but switching from uint64_t is questionable. So, let's first establish the set of requests we want to work with. First signed int64_t should be enough, as off_t is signed anyway. Then, obviously offset + bytes should not overflow. And most interesting: (offset + bytes) being aligned up should not overflow as well. Aligned to what alignment? First thing that comes in mind is bs->bl.request_alignment, as we align up request to this alignment. But there is another thing: look at bdrv_mark_request_serialising(). It aligns request up to some given alignment. And this parameter may be bdrv_get_cluster_size(), which is often a lot greater than bs->bl.request_alignment. Note also, that bdrv_mark_request_serialising() uses signed int64_t for calculations. So, actually, we already depend on some restrictions. Happily, bdrv_get_cluster_size() returns int and bs->bl.request_alignment has 32bit unsigned type, but defined to be a power of 2 less than INT_MAX. So, we may establish, that INT_MAX is absolute maximum for any kind of alignment that may occur with the request. Note, that bdrv_get_cluster_size() is not documented to return power of 2, still bdrv_mark_request_serialising() behaves like it is. Also, backup uses bdi.cluster_size and is not prepared to it not being power of 2. So, let's establish that Qemu supports only power-of-2 clusters and alignments. So, alignment can't be greater than 2^30. Finally to be safe with calculations, to not calculate different maximums for different nodes (depending on cluster size and request_alignment), let's simply set QEMU_ALIGN_DOWN(INT64_MAX, 2^30) as absolute maximum bytes length for Qemu. Actually, it's not much less than INT64_MAX. OK, then, let's apply it to block/io. Let's consider all block/io entry points of offset/bytes: 4 bytes/offset interface functions: bdrv_co_preadv_part(), bdrv_co_pwritev_part(), bdrv_co_copy_range_internal() and bdrv_co_pdiscard() and we check them all with bdrv_check_request(). We also have one entry point with only offset: bdrv_co_truncate(). Check the offset. And one public structure: BdrvTrackedRequest. Happily, it has only three external users: file-posix.c: adopted by this patch write-threshold.c: only read fields test-write-threshold.c: sets obviously small constant values Better is to make the structure private and add corresponding interfaces.. Still it's not obvious what kind of interface is needed for file-posix.c. Let's keep it public but add corresponding assertions. After this patch we'll convert functions in block/io.c to int64_t bytes and offset parameters. We can assume that offset/bytes pair always satisfy new restrictions, and make corresponding assertions where needed. If we reach some offset/bytes point in block/io.c missing bdrv_check_request() it is considered a bug. As well, if block/io.c modifies a offset/bytes request, expanding it more then aligning up to request_alignment, it's a bug too. For all io requests except for discard we keep for now old restriction of 32bit request length. iotest 206 output error message changed, as now test disk size is larger than new limit. Add one more test case with new maximum disk size to cover too-big-L1 case. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20201203222713.13507-5-vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-12-11 17:52:40 +01:00
Vladimir Sementsov-Ogievskiy	f4dad307ef	block/io: bdrv_check_byte_request(): drop bdrv_is_inserted() Move bdrv_is_inserted() calls into callers. We are going to make bdrv_check_byte_request() a clean thing. bdrv_is_inserted() is not about checking the request, it's about checking the bs. So, it should be separate. With this patch we probably change error path for some failure scenarios. But depending on the fact that querying too big request on empty cdrom (or corrupted qcow2 node with no drv) will result in EIO and not ENOMEDIUM would be very strange. More over, we are going to move to 64bit requests, so larger requests will be allowed anyway. More over, keeping in mind that cdrom is the only driver that has .bdrv_is_inserted() handler it's strange that we should care so much about it in generic block layer, intuitively we should just do read and write, and cdrom driver should return correct errors if it is not inserted. But it's a work for another series. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20201203222713.13507-4-vsementsov@virtuozzo.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-12-11 17:52:40 +01:00
Vladimir Sementsov-Ogievskiy	33985614bd	block/io: bdrv_refresh_limits(): use ERRP_GUARD This simplifies following commit. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20201203222713.13507-3-vsementsov@virtuozzo.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-12-11 17:52:40 +01:00
Eric Blake	a92b1b065e	block: Return depth level during bdrv_is_allocated_above When checking for allocation across a chain, it's already easy to count the depth within the chain at which the allocation is found. Instead of throwing that information away, return it to the caller. Existing callers only cared about allocated/non-allocated, but having a depth available will be used by NBD in the next patch. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20201027050556.269064-9-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> [eblake: rebase to master] Signed-off-by: Eric Blake <eblake@redhat.com>	2020-10-30 15:21:23 -05:00
Greg Kurz	1a6d3bd229	block: End quiescent sections when a BDS is deleted If a BDS gets deleted during blk_drain_all(), it might miss a call to bdrv_do_drained_end(). This means missing a call to aio_enable_external() and the AIO context remains disabled for ever. This can cause a device to become irresponsive and to disrupt the guest execution, ie. hang, loop forever or worse. This scenario is quite easy to encounter with virtio-scsi on POWER when punching multiple blockdev-create QMP commands while the guest is booting and it is still running the SLOF firmware. This happens because SLOF disables/re-enables PCI devices multiple times via IO/MEM/MASTER bits of PCI_COMMAND register after the initial probe/feature negotiation, as it tends to work with a single device at a time at various stages like probing and running block/network bootloaders without doing a full reset in-between. This naturally generates many dataplane stops and starts, and thus many drain sections that can race with blockdev_create_run(). In the end, SLOF bails out. It is somehow reproducible on x86 but it requires to generate articial dataplane start/stop activity with stop/cont QMP commands. In this case, seabios ends up looping for ever, waiting for the virtio-scsi device to send a response to a command it never received. Add a helper that pairs all previously called bdrv_do_drained_begin() with a bdrv_do_drained_end() and call it from bdrv_close(). While at it, update the "/bdrv-drain/graph-change/drain_all" test in test-bdrv-drain so that it can catch the issue. BugId: https://bugzilla.redhat.com/show_bug.cgi?id=1874441 Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <160346526998.272601.9045392804399803158.stgit@bahia.lan> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-10-27 15:26:20 +01:00
Alberto Garcia	46cd1e8a47	qcow2: Skip copy-on-write when allocating a zero cluster Since commit `c8bb23cbdb` when a write request results in a new allocation QEMU first tries to see if the rest of the cluster outside the written area contains only zeroes. In that case, instead of doing a normal copy-on-write operation and writing explicit zero buffers to disk, the code zeroes the whole cluster efficiently using pwrite_zeroes() with BDRV_REQ_NO_FALLBACK. This improves performance very significantly but it only happens when we are writing to an area that was completely unallocated before. Zero clusters (QCOW2_CLUSTER_ZERO_*) are treated like normal clusters and are therefore slower to allocate. This happens because the code uses bdrv_is_allocated_above() rather bdrv_block_status_above(). The former is not as accurate for this purpose but it is faster. However in the case of qcow2 the underlying call does already report zero clusters just fine so there is no reason why we cannot use that information. After testing 4KB writes on an image that only contains zero clusters this patch results in almost five times more IOPS. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <6d77cab968c501c44d6e1089b9bc91b04170b49e.1603731354.git.berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-10-27 15:26:20 +01:00

1 2 3 4 5 ...

409 Commits