mirrors/qemu - qemu - SynapseOS git

Author	SHA1	Message	Date
Vladimir Sementsov-Ogievskiy	a2debf6506	qcow2-refcount: introduce fix_l2_entry_by_zero() Split fix_l2_entry_by_zero() out of check_refcounts_l2() to be reused in further patch. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Hanna Reitz <hreitz@redhat.com> Message-Id: <20210914122454.141075-5-vsementsov@virtuozzo.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>	2021-09-15 18:42:38 +02:00
Vladimir Sementsov-Ogievskiy	a6e098462b	qcow2: introduce qcow2_parse_compressed_l2_entry() helper Add helper to parse compressed l2_entry and use it everywhere instead of open-coding. Note, that in most places we move to precise coffset/csize instead of sector-aligned. Still it should work good enough for updating refcounts. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Hanna Reitz <hreitz@redhat.com> Message-Id: <20210914122454.141075-4-vsementsov@virtuozzo.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>	2021-09-15 18:42:38 +02:00
Vladimir Sementsov-Ogievskiy	9a3978a46b	qcow2: compressed read: simplify cluster descriptor passing Let's pass the whole L2 entry and not bother with L2E_COMPRESSED_OFFSET_SIZE_MASK. It also helps further refactoring that adds generic qcow2_parse_compressed_l2_entry() helper. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Hanna Reitz <hreitz@redhat.com> Message-Id: <20210914122454.141075-3-vsementsov@virtuozzo.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>	2021-09-15 18:42:38 +02:00
Vladimir Sementsov-Ogievskiy	786c22d9c2	qcow2-refcount: improve style of check_refcounts_l2() - don't use same name for size in bytes and in entries - use g_autofree for l2_table - add whitespace - fix block comment style Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Hanna Reitz <hreitz@redhat.com> Message-Id: <20210914122454.141075-2-vsementsov@virtuozzo.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>	2021-09-15 18:42:38 +02:00
Vladimir Sementsov-Ogievskiy	ff812c5563	qcow2: handle_dependencies(): relax conflict detection There is no conflict and no dependency if we have parallel writes to different subclusters of one cluster when the cluster itself is already allocated. So, relax extra dependency. Measure performance: First, prepare build/qemu-img-old and build/qemu-img-new images. cd scripts/simplebench ./img_bench_templater.py Paste the following to stdin of running script: qemu_img=../../build/qemu-img-{old\|new} $qemu_img create -f qcow2 -o extended_l2=on /ssd/x.qcow2 1G $qemu_img bench -c 100000 -d 8 [-s 2K\|-s 2K -o 512\|-s $((10242+512))] \ -w -t none -n /ssd/x.qcow2 The result: All results are in seconds ------------------ --------- --------- old new -s 2K 6.7 ± 15% 6.2 ± 12% -7% -s 2K -o 512 13 ± 3% 11 ± 5% -16% -s $((10242+512)) 9.5 ± 4% 8.4 -12% ------------------ --------- --------- So small writes are more independent now and that helps to keep deeper io queue which improves performance. 271 iotest output becomes racy for three allocation in one cluster. Second and third writes may finish in different order. Second and third requests don't depend on each other any more. Still they both depend on first request anyway. Filter out second and third write offsets to cover both possible outputs. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210824101517.59802-4-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Hanna Reitz <hreitz@redhat.com> [hreitz: s/ an / and /] Signed-off-by: Hanna Reitz <hreitz@redhat.com>	2021-09-15 15:54:07 +02:00
Vladimir Sementsov-Ogievskiy	6d207d3501	qcow2: refactor handle_dependencies() loop body No logic change, just prepare for the following commit. While being here do also small grammar fix in a comment. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Hanna Reitz <hreitz@redhat.com> Message-Id: <20210824101517.59802-3-vsementsov@virtuozzo.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>	2021-09-15 15:54:07 +02:00
Stefano Garzarella	66fed30c9c	block/mirror: fix NULL pointer dereference in mirror_wait_on_conflicts() In mirror_iteration() we call mirror_wait_on_conflicts() with `self` parameter set to NULL. Starting from commit `d44dae1a7c` we dereference `self` pointer in mirror_wait_on_conflicts() without checks if it is not NULL. Backtrace: Program terminated with signal SIGSEGV, Segmentation fault. #0 mirror_wait_on_conflicts (self=0x0, s=<optimized out>, offset=<optimized out>, bytes=<optimized out>) at ../block/mirror.c:172 172 self->waiting_for_op = op; [Current thread is 1 (Thread 0x7f0908931ec0 (LWP 380249))] (gdb) bt #0 mirror_wait_on_conflicts (self=0x0, s=<optimized out>, offset=<optimized out>, bytes=<optimized out>) at ../block/mirror.c:172 #1 0x00005610c5d9d631 in mirror_run (job=0x5610c76a2c00, errp=<optimized out>) at ../block/mirror.c:491 #2 0x00005610c5d58726 in job_co_entry (opaque=0x5610c76a2c00) at ../job.c:917 #3 0x00005610c5f046c6 in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at ../util/coroutine-ucontext.c:173 #4 0x00007f0909975820 in ?? () at ../sysdeps/unix/sysv/linux/x86_64/__start_context.S:91 from /usr/lib64/libc.so.6 Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2001404 Fixes: `d44dae1a7c` ("block/mirror: fix active mirror dead-lock in mirror_wait_on_conflicts") Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-Id: <20210910124533.288318-1-sgarzare@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>	2021-09-15 15:54:07 +02:00
Hanna Reitz	9dbf6455f4	block/iscsi: Do not force-cap pnum bdrv_co_block_status() does it for us, we do not need to do it here. The advantage of not capping pnum is that bdrv_co_block_status() can cache larger data regions than requested by its caller. Signed-off-by: Hanna Reitz <hreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20210812084148.14458-7-hreitz@redhat.com>	2021-09-15 15:54:07 +02:00
Hanna Reitz	72b4cabe5e	block/gluster: Do not force-cap pnum bdrv_co_block_status() does it for us, we do not need to do it here. The advantage of not capping pnum is that bdrv_co_block_status() can cache larger data regions than requested by its caller. Signed-off-by: Hanna Reitz <hreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20210812084148.14458-6-hreitz@redhat.com>	2021-09-15 15:54:07 +02:00
Hanna Reitz	869e7ee827	block/file-posix: Do not force-cap pnum bdrv_co_block_status() does it for us, we do not need to do it here. The advantage of not capping pnum is that bdrv_co_block_status() can cache larger data regions than requested by its caller. Signed-off-by: Hanna Reitz <hreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20210812084148.14458-5-hreitz@redhat.com>	2021-09-15 15:54:07 +02:00
Hanna Reitz	0bc329fbb0	block: block-status cache for data regions As we have attempted before (https://lists.gnu.org/archive/html/qemu-devel/2019-01/msg06451.html, "file-posix: Cache lseek result for data regions"; https://lists.nongnu.org/archive/html/qemu-block/2021-02/msg00934.html, "file-posix: Cache next hole"), this patch seeks to reduce the number of SEEK_DATA/HOLE operations the file-posix driver has to perform. The main difference is that this time it is implemented as part of the general block layer code. The problem we face is that on some filesystems or in some circumstances, SEEK_DATA/HOLE is unreasonably slow. Given the implementation is outside of qemu, there is little we can do about its performance. We have already introduced the want_zero parameter to bdrv_co_block_status() to reduce the number of SEEK_DATA/HOLE calls unless we really want zero information; but sometimes we do want that information, because for files that consist largely of zero areas, special-casing those areas can give large performance boosts. So the real problem is with files that consist largely of data, so that inquiring the block status does not gain us much performance, but where such an inquiry itself takes a lot of time. To address this, we want to cache data regions. Most of the time, when bad performance is reported, it is in places where the image is iterated over from start to end (qemu-img convert or the mirror job), so a simple yet effective solution is to cache only the current data region. (Note that only caching data regions but not zero regions means that returning false information from the cache is not catastrophic: Treating zeroes as data is fine. While we try to invalidate the cache on zero writes and discards, such incongruences may still occur when there are other processes writing to the image.) We only use the cache for nodes without children (i.e. protocol nodes), because that is where the problem is: Drivers that rely on block-status implementations outside of qemu (e.g. SEEK_DATA/HOLE). Resolves: https://gitlab.com/qemu-project/qemu/-/issues/307 Signed-off-by: Hanna Reitz <hreitz@redhat.com> Message-Id: <20210812084148.14458-3-hreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> [hreitz: Added `local_file == bs` assertion, as suggested by Vladimir] Signed-off-by: Hanna Reitz <hreitz@redhat.com>	2021-09-15 15:54:06 +02:00
Max Reitz	e24154d878	gluster: Align block-status tail gluster's block-status implementation is basically a copy of that in block/file-posix.c, there is only one thing missing, and that is aligning trailing data extents to the request alignment (as added by commit `9c3db310ff`). Note that `9c3db310ff` mentions that "there seems to be no other block driver that sets request_alignment and [...]", but while block/gluster.c does indeed not set request_alignment, block/io.c's bdrv_refresh_limits() will still default to an alignment of 512 because block/gluster.c does not provide a byte-aligned read function. Therefore, unaligned tails can conceivably occur, and so we should apply the change from `9c3db310ff` to gluster's block-status implementation. Reported-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210805143603.59503-1-mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>	2021-09-15 15:54:06 +02:00
Michael Tokarev	68857f13aa	spelling: sytem => system Signed-off-By: Michael Tokarev <mjt@tls.msk.ru> Reviewed-by: Laurent Vivier <laurent@vivier.eu> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <fefb5f5c-82bc-05e2-b4c1-665e9d6896ff@msgid.tls.msk.ru> Signed-off-by: Laurent Vivier <laurent@vivier.eu>	2021-09-15 15:51:07 +02:00
Philippe Mathieu-Daudé	9bd2788f49	block/nvme: Only report VFIO error on failed retry We expect the first qemu_vfio_dma_map() to fail (indicating DMA mappings exhaustion, see commit `15a730e7a3`). Do not report the first failure as error, since we are going to flush the mappings and retry. This removes spurious error message displayed on the monitor: (qemu) c (qemu) qemu-kvm: VFIO_MAP_DMA failed: No space left on device (qemu) info status VM status: running Reported-by: Tingting Mao <timao@redhat.com> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20210902070025.197072-12-philmd@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2021-09-07 09:08:24 +01:00
Philippe Mathieu-Daudé	521b97cd4e	util/vfio-helpers: Pass Error handle to qemu_vfio_dma_map() Currently qemu_vfio_dma_map() displays errors on stderr. When using management interface, this information is simply lost. Pass qemu_vfio_dma_map() an Error** handle so it can propagate the error to callers. Reviewed-by: Fam Zheng <fam@euphon.net> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20210902070025.197072-7-philmd@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2021-09-07 09:08:24 +01:00
Philippe Mathieu-Daudé	526c37c19d	block/nvme: Have nvme_create_queue_pair() report errors consistently nvme_create_queue_pair() does not return a boolean value (indicating eventual error) but a pointer, and is inconsistent in how it fills the error handler. To fulfill callers expectations, always set an error message on failure. Reported-by: Auger Eric <eric.auger@redhat.com> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20210902070025.197072-6-philmd@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2021-09-07 09:08:24 +01:00
Philippe Mathieu-Daudé	5ef1f4ec6f	block/nvme: Use safer trace format string Fix when building with -Wshorten-64-to-32: warning: implicit conversion loses integer precision: 'unsigned long' to 'int' [-Wshorten-64-to-32] Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20210902070025.197072-2-philmd@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2021-09-07 09:08:24 +01:00
Viktor Prutyanov	ebd979c74e	block/file-win32: add reopen handlers Make 'qemu-img commit' work on Windows. Command 'commit' requires reopening backing file in RW mode. So, add reopen prepare/commit/abort handlers and change dwShareMode for CreateFile call in order to allow further read/write reopening. Resolves: https://gitlab.com/qemu-project/qemu/-/issues/418 Suggested-by: Hanna Reitz <hreitz@redhat.com> Signed-off-by: Viktor Prutyanov <viktor.prutyanov@phystech.edu> Tested-by: Helge Konetzka <hk@zapateado.de> Message-Id: <20210825173625.19415-1-viktor.prutyanov@phystech.edu> Signed-off-by: Hanna Reitz <hreitz@redhat.com>	2021-09-01 14:38:08 +02:00
Fabrice Fontaine	28031d5c74	block/export/fuse.c: fix fuse-lseek on uclibc or musl Include linux/fs.h to avoid the following build failure on uclibc or musl raised since version 6.0.0: ../block/export/fuse.c: In function 'fuse_lseek': ../block/export/fuse.c:641:19: error: 'SEEK_HOLE' undeclared (first use in this function) 641 \| if (whence != SEEK_HOLE && whence != SEEK_DATA) { \| ^~~~~~~~~ ../block/export/fuse.c:641:19: note: each undeclared identifier is reported only once for each function it appears in ../block/export/fuse.c:641:42: error: 'SEEK_DATA' undeclared (first use in this function); did you mean 'SEEK_SET'? 641 \| if (whence != SEEK_HOLE && whence != SEEK_DATA) { \| ^~~~~~~~~ \| SEEK_SET Fixes: - http://autobuild.buildroot.org/results/33c90ebf04997f4d3557cfa66abc9cf9a3076137 Signed-off-by: Fabrice Fontaine <fontaine.fabrice@gmail.com> Message-Id: <20210827220301.272887-1-fontaine.fabrice@gmail.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>	2021-09-01 14:38:08 +02:00
Vladimir Sementsov-Ogievskiy	abde8ac2a5	block/block-copy: block_copy_state_new(): drop extra arguments The only caller pass copy_range and compress both false. Let's just drop these arguments. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210824083856.17408-35-vsementsov@virtuozzo.com> Reviewed-by: Hanna Reitz <hreitz@redhat.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>	2021-09-01 14:38:08 +02:00
Vladimir Sementsov-Ogievskiy	751cec7a26	block/copy-before-write: make public block driver Finally, copy-before-write gets own .bdrv_open and .bdrv_close handlers, block_init() call and becomes available through bdrv_open(). To achieve this: - cbw_init gets unused flags argument and becomes cbw_open - block_copy_state_free() call moved to new cbw_close() - in bdrv_cbw_append: - options are completed with driver and node-name, and we can simply use bdrv_insert_node() to do both open and drained replacing - in bdrv_cbw_drop: - cbw_close() is now responsible for freeing s->bcs, so don't do it here Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210824083856.17408-22-vsementsov@virtuozzo.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>	2021-09-01 14:03:47 +02:00
Vladimir Sementsov-Ogievskiy	201b4bb6c7	block/block-copy: make setting progress optional Now block-copy will crash if user don't set progress meter by block_copy_set_progress_meter(). copy-before-write filter will be used in separate of backup job, and it doesn't want any progress meter (for now). So, allow not setting it. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210824083856.17408-21-vsementsov@virtuozzo.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>	2021-09-01 14:03:47 +02:00
Vladimir Sementsov-Ogievskiy	06e0a9c164	block/copy-before-write: initialize block-copy bitmap We are going to publish copy-before-write filter to be used in separate of backup. Future step would support bitmap for the filter. But let's start from full set bitmap. We have to modify backup, as bitmap is first initialized by copy-before-write filter, and then backup modifies it. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210824083856.17408-20-vsementsov@virtuozzo.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>	2021-09-01 14:03:47 +02:00
Vladimir Sementsov-Ogievskiy	f44fd7399c	block/copy-before-write: cbw_init(): use options One more step closer to .bdrv_open(): use options instead of plain arguments. Move to bdrv_open_child() calls, native for drive open handlers. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Hanna Reitz <hreitz@redhat.com> Message-Id: <20210824083856.17408-19-vsementsov@virtuozzo.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>	2021-09-01 14:03:47 +02:00
Vladimir Sementsov-Ogievskiy	4c1e992bf2	block/copy-before-write: bdrv_cbw_append(): drop unused compress arg Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Hanna Reitz <hreitz@redhat.com> Message-Id: <20210824083856.17408-18-vsementsov@virtuozzo.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>	2021-09-01 14:03:47 +02:00
Vladimir Sementsov-Ogievskiy	5a50742674	block/copy-before-write: cbw_init(): use file child after attaching In the next commit we'll get rid of source argument of cbw_init(). Prepare to it now, to make next commit simpler: move the code block that uses source below attaching the child and use bs->file->bs instead of source variable. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210824083856.17408-17-vsementsov@virtuozzo.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>	2021-09-01 14:03:47 +02:00
Vladimir Sementsov-Ogievskiy	fe7ea40c0e	block/copy-before-write: cbw_init(): rename variables One more step closer to real .bdrv_open() handler: use more usual names for bs being initialized and its state. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210824083856.17408-16-vsementsov@virtuozzo.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>	2021-09-01 14:03:47 +02:00
Vladimir Sementsov-Ogievskiy	1f0cacb967	block/copy-before-write: introduce cbw_init() Move part of bdrv_cbw_append() to new function cbw_open(). It's an intermediate step for adding normal .bdrv_open() handler to the filter. With this commit no logic is changed, but we have a function which will be turned into .bdrv_open() handler in future commit. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210824083856.17408-15-vsementsov@virtuozzo.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>	2021-09-01 14:03:47 +02:00
Vladimir Sementsov-Ogievskiy	7ddbce2dec	block/copy-before-write: bdrv_cbw_append(): replace child at last Refactor the function to replace child at last. Thus we don't need to revert it and code is simplified. block-copy state initialization being done before replacing the child doesn't need any drained section. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210824083856.17408-14-vsementsov@virtuozzo.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>	2021-09-01 14:03:47 +02:00
Vladimir Sementsov-Ogievskiy	3c1e63277e	block/copy-before-write: use file child instead of backing We are going to publish copy-before-write filter, and there no public backing-child-based filter in Qemu. No reason to create a precedent, so let's refactor copy-before-write filter instead. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210824083856.17408-13-vsementsov@virtuozzo.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>	2021-09-01 14:03:47 +02:00
Vladimir Sementsov-Ogievskiy	451532311a	block/copy-before-write: drop extra bdrv_unref on failure path bdrv_attach_child() do bdrv_unref() on failure, so we shouldn't do it by hand here. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210824083856.17408-12-vsementsov@virtuozzo.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>	2021-09-01 14:03:47 +02:00
Vladimir Sementsov-Ogievskiy	3860c02019	block/copy-before-write: relax permission requirements when no parents We are going to publish copy-before-write filter. So, user should be able to create it with blockdev-add first, specifying both filtered and target children. And then do blockdev-reopen, to actually insert the filter where needed. Currently, filter unshares write permission unconditionally on source node. It's good, but it will not allow to do blockdev-add. So, let's relax restrictions when filter doesn't have any parent. Test output is modified, as now permission conflict happens only when job creates a blk parent for filter node. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210824083856.17408-11-vsementsov@virtuozzo.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>	2021-09-01 14:03:47 +02:00
Vladimir Sementsov-Ogievskiy	b518e9e9ef	block/backup: move cluster size calculation to block-copy The main consumer of cluster-size is block-copy. Let's calculate it here instead of passing through backup-top. We are going to publish copy-before-write filter soon, so it will be created through options. But we don't want for now to make explicit option for cluster-size, let's continue to calculate it automatically. So, now is the time to get rid of cluster_size argument for bdrv_cbw_append(). Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210824083856.17408-10-vsementsov@virtuozzo.com> [hreitz: Add qemu/error-report.h include to block/block-copy.c] Signed-off-by: Hanna Reitz <hreitz@redhat.com>	2021-09-01 14:03:11 +02:00
Vladimir Sementsov-Ogievskiy	2a6511dfeb	block/backup: set copy_range and compress after filter insertion We are going to publish copy-before-write filter, so it would be initialized through options. Still we don't want to publish compress and copy-range options, as 1. Modern way to enable compression is to use compress filter. 2. For copy-range it's unclean how to make proper interface: - it's has experimental prefix for backup job anyway - the whole BackupPerf structure doesn't make sense for the filter So, let's just add copy-range possibility to the filter later if needed. Still, we are going to continue support for compression and experimental copy-range in backup job. So, set these options after filter creation. Note, that we can drop "compress" argument of bdrv_cbw_append() now, as well as "perf". The only reason not doing so is that now, when I prepare this patch the big series around it is already reviewed and I want to avoid extra rebase conflicts to simplify review of the following version. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Hanna Reitz <hreitz@redhat.com> Message-Id: <20210824083856.17408-9-vsementsov@virtuozzo.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>	2021-09-01 12:57:31 +02:00
Vladimir Sementsov-Ogievskiy	f8b9504bac	block/block-copy: introduce block_copy_set_copy_opts() We'll need a possibility to set compress and use_copy_range options after initialization of the state. So make corresponding part of block_copy_state_new() separate and public. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210824083856.17408-8-vsementsov@virtuozzo.com> Reviewed-by: Hanna Reitz <hreitz@redhat.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>	2021-09-01 12:57:31 +02:00
Vladimir Sementsov-Ogievskiy	49577723d4	block-copy: move detecting fleecing scheme to block-copy We want to simplify initialization interface of copy-before-write filter as we are going to make it public. So, let's detect fleecing scheme exactly in block-copy code, to not pass this information through extra levels. Why not just set BDRV_REQ_SERIALISING unconditionally: because we are going to implement new more efficient fleecing scheme which will not rely on backing feature. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Hanna Reitz <hreitz@redhat.com> Message-Id: <20210824083856.17408-7-vsementsov@virtuozzo.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>	2021-09-01 12:57:31 +02:00
Vladimir Sementsov-Ogievskiy	d003e0aece	block: rename backup-top to copy-before-write We are going to convert backup_top to full featured public filter, which can be used in separate of backup job. Start from renaming from "how it used" to "what it does". While updating comments in 283 iotest, drop and rephrase also things about ".active", as this field is now dropped, and filter doesn't have "inactive" mode. Note that this change may be considered as incompatible interface change, as backup-top filter format name was visible through query-block and query-named-block-nodes. Still, consider the following reasoning: 1. backup-top was never documented, so if someone depends on format name (for driver that can't be used other than it is automatically inserted on backup job start), it's a kind of "undocumented feature use". So I think we are free to change it. 2. There is a hope, that there is no such users: it's a lot more native to give a good node-name to backup-top filter if need to operate with it somehow, and don't touch format name. 3. Another "incompatible" change in further commit would be moving copy-before-write filter from using backing child to file child. And this is even more reasonable than renaming: for now all public filters are file-child based. So, it's a risky change, but risk seems small and good interface worth it. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210824083856.17408-6-vsementsov@virtuozzo.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>	2021-09-01 12:57:31 +02:00
Vladimir Sementsov-Ogievskiy	ed089506ee	block: introduce blk_replace_bs Add function to change bs inside blk. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210824083856.17408-3-vsementsov@virtuozzo.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>	2021-09-01 12:57:31 +02:00
Stefan Hajnoczi	b68ce82409	raw-format: drop WRITE and RESIZE child perms when possible The following command-line fails due to a permissions conflict: $ qemu-storage-daemon \ --blockdev driver=nvme,node-name=nvme0,device=0000:08:00.0,namespace=1 \ --blockdev driver=raw,node-name=l1-1,file=nvme0,offset=0,size=1073741824 \ --blockdev driver=raw,node-name=l1-2,file=nvme0,offset=1073741824,size=1073741824 \ --nbd-server addr.type=unix,addr.path=/tmp/nbd.sock,max-connections=2 \ --export type=nbd,id=nbd-l1-1,node-name=l1-1,name=l1-1,writable=on \ --export type=nbd,id=nbd-l1-2,node-name=l1-2,name=l1-2,writable=on qemu-storage-daemon: --export type=nbd,id=nbd-l1-1,node-name=l1-1,name=l1-1,writable=on: Permission conflict on node 'nvme0': permissions 'resize' are both required by node 'l1-1' (uses node 'nvme0' as 'file' child) and unshared by node 'l1-2' (uses node 'nvme0' as 'file' child). The problem is that block/raw-format.c relies on bdrv_default_perms() to set permissions on the nvme node. The default permissions add RESIZE in anticipation of a format driver like qcow2 that needs to grow the image file. This fails because RESIZE is unshared, so we cannot get the RESIZE permission. Max Reitz pointed out that block/crypto.c already handles this case by implementing a custom ->bdrv_child_perm() function that adjusts the result of bdrv_default_perms(). This patch takes the same approach in block/raw-format.c so that RESIZE is only required if it's actually necessary (e.g. the parent is qcow2). Cc: Max Reitz <mreitz@redhat.com> Cc: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20210726122839.822900-1-stefanha@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>	2021-09-01 12:57:31 +02:00
Mao Zhongyi	8cca0bd289	block/monitor: Consolidate hmp_handle_error calls to reduce redundant code Signed-off-by: Mao Zhongyi <maozhongyi@cmss.chinamobile.com> Message-Id: <20210802062507.347555-1-maozhongyi@cmss.chinamobile.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>	2021-09-01 12:57:31 +02:00
Fabrice Fontaine	50482fda98	block/export/fuse.c: fix musl build Fix the following build failure on musl raised since version 6.0.0 and `4ca37a96a7` because musl does not define FALLOC_FL_ZERO_RANGE: ../block/export/fuse.c: In function 'fuse_fallocate': ../block/export/fuse.c:563:23: error: 'FALLOC_FL_ZERO_RANGE' undeclared (first use in this function) 563 \| } else if (mode & FALLOC_FL_ZERO_RANGE) { \| ^~~~~~~~~~~~~~~~~~~~ Fixes: - http://autobuild.buildroot.org/results/b96e3d364fd1f8bbfb18904a742e73327d308f64 Signed-off-by: Fabrice Fontaine <fontaine.fabrice@gmail.com> Message-Id: <20210809095101.1101336-1-fontaine.fabrice@gmail.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Denis V. Lunev <den@openvz.org> Signed-off-by: Hanna Reitz <hreitz@redhat.com>	2021-08-09 17:19:27 +02:00
Kevin Wolf	87ab880252	block: Fix in_flight leak in request padding error path When bdrv_pad_request() fails in bdrv_co_preadv_part(), bs->in_flight has been increased, but is never decreased again. This leads to a hang when trying to drain the block node. This bug was observed with Windows guests which issue a request that fully uses IOV_MAX during installation, so that when padding is necessary (O_DIRECT with a 4k sector size block device on the host), adding another entry causes failure. Call bdrv_dec_in_flight() to fix this. There is a larger problem to solve here because this request shouldn't even fail, but Windows doesn't seem to care and with this minimal fix the installation succeeds. So given that we're already in freeze, let's take this minimal fix for 6.1. Fixes: `98ca45494f` Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1972079 Reported-by: Qing Wang <qinwang@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20210727154923.91067-1-kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-08-03 15:43:30 +02:00
Fabian Ebner	54caccb365	block/io_uring: resubmit when result is -EAGAIN Linux SCSI can throw spurious -EAGAIN in some corner cases in its completion path, which will end up being the result in the completed io_uring request. Resubmitting such requests should allow block jobs to complete, even if such spurious errors are encountered. Co-authored-by: Stefan Hajnoczi <stefanha@gmail.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Fabian Ebner <f.ebner@proxmox.com> Message-id: 20210729091029.65369-1-f.ebner@proxmox.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2021-07-29 17:14:55 +01:00
Philippe Mathieu-Daudé	15a730e7a3	block/nvme: Fix VFIO_MAP_DMA failed: No space left on device When the NVMe block driver was introduced (see commit `bdd6a90a9e`, January 2018), Linux VFIO_IOMMU_MAP_DMA ioctl was only returning -ENOMEM in case of error. The driver was correctly handling the error path to recycle its volatile IOVA mappings. To fix CVE-2019-3882, Linux commit 492855939bdb ("vfio/type1: Limit DMA mappings per container", April 2019) added the -ENOSPC error to signal the user exhausted the DMA mappings available for a container. The block driver started to mis-behave: qemu-system-x86_64: VFIO_MAP_DMA failed: No space left on device (qemu) (qemu) info status VM status: paused (io-error) (qemu) c VFIO_MAP_DMA failed: No space left on device (qemu) c VFIO_MAP_DMA failed: No space left on device (The VM is not resumable from here, hence stuck.) Fix by handling the new -ENOSPC error (when DMA mappings are exhausted) without any distinction to the current -ENOMEM error, so we don't change the behavior on old kernels where the CVE-2019-3882 fix is not present. An easy way to reproduce this bug is to restrict the DMA mapping limit (65535 by default) when loading the VFIO IOMMU module: # modprobe vfio_iommu_type1 dma_entry_limit=666 Cc: qemu-stable@nongnu.org Cc: Fam Zheng <fam@euphon.net> Cc: Maxim Levitsky <mlevitsk@redhat.com> Cc: Alex Williamson <alex.williamson@redhat.com> Reported-by: Michal Prívozník <mprivozn@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20210723195843.1032825-1-philmd@redhat.com Fixes: `bdd6a90a9e` ("block: Add VFIO based NVMe driver") Buglink: https://bugs.launchpad.net/qemu/+bug/1863333 Resolves: https://gitlab.com/qemu-project/qemu/-/issues/65 Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2021-07-26 09:38:12 +01:00
Eric Blake	94075c28ee	iotests: Improve and rename test 291 to qemu-img-bitmap Enhance the test to demonstrate existing less-than-stellar behavior of qemu-img with a qcow2 image containing an inconsistent bitmap: we don't diagnose the problem until after copying the entire image (a potentially long time), and when we do diagnose the failure, we still end up leaving an empty bitmap in the destination. This mess will be cleaned up in the next patch. While at it, rename the test now that we support useful iotest names, and fix a missing newline in the error message thus exposed. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20210709153951.2801666-2-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Nir Soffer <nsoffer@redhat.com>	2021-07-21 14:14:41 -05:00
Stefano Garzarella	d7ddd0a161	linux-aio: limit the batch size using `aio-max-batch` parameter When there are multiple queues attached to the same AIO context, some requests may experience high latency, since in the worst case the AIO engine queue is only flushed when it is full (MAX_EVENTS) or there are no more queues plugged. Commit `2558cb8dd4` ("linux-aio: increasing MAX_EVENTS to a larger hardcoded value") changed MAX_EVENTS from 128 to 1024, to increase the number of in-flight requests. But this change also increased the potential maximum batch to 1024 elements. When there is a single queue attached to the AIO context, the issue is mitigated from laio_io_unplug() that will flush the queue every time is invoked since there can't be others queue plugged. Let's use the new `aio-max-batch` IOThread parameter to mitigate this issue, limiting the number of requests in a batch. We also define a default value (32): this value is obtained running some benchmarks and it represents a good tradeoff between the latency increase while a request is queued and the cost of the io_submit(2) system call. Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-id: 20210721094211.69853-4-sgarzare@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2021-07-21 13:47:50 +01:00
Max Reitz	8573823f3b	block/export: Conditionally ignore set-context error When invoking block-export-add with some iothread and fixed-iothread=false, and changing the node's iothread fails, the error is supposed to be ignored. However, it is still stored in errp, which is wrong. If a second error occurs, the "errp must be NULL" assertion in error_setv() fails: qemu-system-x86_64: ../util/error.c:59: error_setv: Assertion `*errp == NULL' failed. So if fixed-iothread=false, we should ignore the error by passing NULL to bdrv_try_set_aio_context(). Fixes: `f51d23c80a` ("block/export: add iothread and fixed-iothread options") Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210624083825.29224-2-mreitz@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-07-20 16:49:31 +02:00
Vladimir Sementsov-Ogievskiy	6af72274ef	block/vvfat: fix: drop backing Most probably this fake backing child doesn't work anyway (see notes about it in `a8a4d15c1c`). Still, since `25f78d9e2d` drivers are required to set .supports_backing if they want to call bdrv_set_backing_hd, so now vvfat just doesn't work because of this check. Let's finally drop this fake backing file. Fixes: `25f78d9e2d` Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210715124853.13335-1-vsementsov@virtuozzo.com> Tested-by: John Arbuckle <programmingkidx@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-07-20 16:30:20 +02:00
Lukas Straub	c2cf0ecab5	replication: Remove workaround Remove the workaround introduced in commit `6ecbc6c526` "replication: Avoid blk_make_empty() on read-only child". It is not needed anymore since s->hidden_disk is guaranteed to be writable when secondary_do_checkpoint() runs. Because replication_start(), _do_checkpoint() and _stop() are only called by COLO migration code and COLO-migration activates all disks via bdrv_invalidate_cache_all() before it calls these functions. Signed-off-by: Lukas Straub <lukasstraub2@web.de> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <d3acfad43879e9f376bffa7dd797ae74d0a7c81a.1626619393.git.lukasstraub2@web.de> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-07-20 16:11:53 +02:00
Lukas Straub	3b78420bb1	replication: Properly attach children The replication driver needs access to the children block-nodes of it's child so it can issue bdrv_make_empty() and bdrv_co_pwritev() to manage the replication. However, it does this by directly copying the BdrvChilds, which is wrong. Fix this by properly attaching the block-nodes with bdrv_attach_child() and requesting the required permissions. This ultimatively fixes a potential crash in replication_co_writev(), because it may write to s->secondary_disk if it is in state BLOCK_REPLICATION_FAILOVER_FAILED, without requesting write permissions first. And now the workaround in secondary_do_checkpoint() can be removed. Signed-off-by: Lukas Straub <lukasstraub2@web.de> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <5d0539d729afb8072d0d7cde977c5066285591b4.1626619393.git.lukasstraub2@web.de> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-07-20 16:11:53 +02:00
Lukas Straub	a990a42b39	replication: Reduce usage of s->hidden_disk and s->secondary_disk In preparation for the next patch, initialize s->hidden_disk and s->secondary_disk later and replace access to them with local variables in the places where they aren't initialized yet. Signed-off-by: Lukas Straub <lukasstraub2@web.de> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <1eb9dc179267207d9c7eccaeb30761758e32e9ab.1626619393.git.lukasstraub2@web.de> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-07-20 16:11:53 +02:00
Lukas Straub	1e12ecfd2c	replication: Remove s->active_disk s->active_disk is bs->file. Remove it and use local variables instead. Signed-off-by: Lukas Straub <lukasstraub2@web.de> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <2534f867ea9be5b666dfce19744b7d4e2b96c976.1626619393.git.lukasstraub2@web.de> Reviewed-by: Zhang Chen <chen.zhang@intel.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-07-20 16:08:38 +02:00
Vladimir Sementsov-Ogievskiy	d44dae1a7c	block/mirror: fix active mirror dead-lock in mirror_wait_on_conflicts It's possible that requests start to wait each other in mirror_wait_on_conflicts(). To avoid it let's use same technique as in block/io.c in bdrv_wait_serialising_requests_locked() / bdrv_find_conflicting_request(): don't wait on intersecting request if it is already waiting for some other request. For details of the dead-lock look at testIntersectingActiveIO() test-case which we actually fixing now. Fixes: `d06107ade0` Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210702211636.228981-4-vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-07-20 13:14:45 +02:00
Vladimir Sementsov-Ogievskiy	ead3f1bff9	block/mirror: set .co for active-write MirrorOp objects This field is unused, but it very helpful for debugging. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210702211636.228981-2-vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-07-20 13:14:45 +02:00
Emanuele Giuseppe Esposito	36109bff17	blkdebug: protect rules and suspended_reqs with a lock First, categorize the structure fields to identify what needs to be protected and what doesn't. We essentially need to protect only .state, and the 3 lists in BDRVBlkdebugState. Then, add the lock and mark the functions accordingly. Co-developed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20210614082931.24925-7-eesposit@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2021-07-19 17:38:38 +02:00
Emanuele Giuseppe Esposito	4153b553bd	block/blkdebug: remove new_state field and instead use a local variable There seems to be no benefit in using a field. Replace it with a local variable, and move the state update before the yields. The state update has do be done before the yields because now using a local variable does not allow the new updated state to be visible by the other yields. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20210614082931.24925-6-eesposit@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2021-07-19 17:38:38 +02:00
Emanuele Giuseppe Esposito	2196c341f7	blkdebug: do not suspend in the middle of QLIST_FOREACH_SAFE That would be unsafe in case a rule other than the current one is removed while the coroutine has yielded. Keep FOREACH_SAFE because suspend_request deletes the current rule. After this patch, all matching rules are deleted before suspending the coroutine, rather than just one. This doesn't affect the existing testcases. Use actions_count to see how many yield to issue. Co-developed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210614082931.24925-5-eesposit@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2021-07-19 17:38:38 +02:00
Emanuele Giuseppe Esposito	51a463680d	blkdebug: track all actions Add a counter for each action that a rule can trigger. This is mainly used to keep track of how many coroutine_yield() we need to perform after processing all rules in the list. Co-developed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210614082931.24925-4-eesposit@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2021-07-19 17:38:38 +02:00
Emanuele Giuseppe Esposito	f48ff5af13	blkdebug: move post-resume handling to resume_req_by_tag We want to move qemu_coroutine_yield() after the loop on rules, because QLIST_FOREACH_SAFE is wrong if the rule list is modified while the coroutine has yielded. Therefore move the suspended request to the heap and clean it up from the remove side. All that is left is for blkdebug_debug_event to handle the yielding. Co-developed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210614082931.24925-3-eesposit@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2021-07-19 17:38:38 +02:00
Emanuele Giuseppe Esposito	69d0690c10	blkdebug: refactor removal of a suspended request Extract to a separate function. Do not rely on FOREACH_SAFE, which is only "safe" if the current node is removed---not if another node is removed. Instead, just walk the entire list from the beginning when asked to resume all suspended requests with a given tag. Co-developed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20210614082931.24925-2-eesposit@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2021-07-19 17:38:38 +02:00
Lukas Straub	0b9cd6b947	nbd: register yank function earlier Although unlikely, qemu might hang in nbd_send_request(). Allow recovery in this case by registering the yank function before calling it. Signed-off-by: Lukas Straub <lukasstraub2@web.de> Message-Id: <20210704000730.1befb596@gecko.fritz.box> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-07-12 11:24:00 -05:00
Peter Maydell	d1987c8114	* More SVM fixes (Lara) * Module annotation database (Gerd) * Memory leak fixes (myself) * Build fixes (myself) * --with-devices-* support (Alex) -----BEGIN PGP SIGNATURE----- iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmDoeBgUHHBib256aW5p QHJlZGhhdC5jb20ACgkQv/vSX3jHroMtFAgAippmxRt3lt+tcdSrCOZlKmxW6veK nUidtzfH5uE8vQsh5Q98WCEq871C/C+St1gK+q2H/MLrJeAqZD39DV+SKTuZ6Tcp 3jL0iYC+oO0OjkHppDQTUDweF9KrsAW1WEeNz2th1OUDSjBXuXbZ+N497taouX18 p2UN0gKNsOO2/QFrKL5KO7vSC56eBGoZz6gKtw/7dDtJBtizf1xKBRHW43b+CnQJ mHLs7Tj6oMC+vnMHkUKLH/6za3WJF1XHs5fp2isRgqoOSP8m0r6CMg8JnFIvmQf/ tbLospKSWqcgD5C5PlFm2wSOjdU7zuPKM7wchhKrrEIvdDPhXaKrlpwi5Q== =GFX1 -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/bonzini-gitlab/tags/for-upstream' into staging * More SVM fixes (Lara) * Module annotation database (Gerd) * Memory leak fixes (myself) * Build fixes (myself) * --with-devices-* support (Alex) # gpg: Signature made Fri 09 Jul 2021 17:23:52 BST # gpg: using RSA key F13338574B662389866C7682BFFBD25F78C7AE83 # gpg: issuer "pbonzini@redhat.com" # gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [full] # gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" [full] # Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1 # Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83 * remotes/bonzini-gitlab/tags/for-upstream: (48 commits) meson: Use input/output for entitlements target configure: allow the selection of alternate config in the build configs: rename default-configs to configs and reorganise hw/arm: move CONFIG_V7M out of default-devices hw/arm: add dependency on OR_IRQ for XLNX_VERSAL meson: Introduce target-specific Kconfig meson: switch function tests from compilation to linking vl: fix leak of qdict_crumple return value target/i386: fix exceptions for MOV to DR target/i386: Added DR6 and DR7 consistency checks target/i386: Added MSRPM and IOPM size check monitor/tcg: move tcg hmp commands to accel/tcg, register them dynamically usb: build usb-host as module monitor/usb: register 'info usbhost' dynamically usb: drop usb_host_dev_is_scsi_storage hook monitor: allow register hmp commands accel: build tcg modular accel: add tcg module annotations accel: build qtest modular accel: add qtest module annotations ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2021-07-11 22:20:51 +01:00
Peter Maydell	42e1d798a6	Block layer patches - Make blockdev-reopen stable - Remove deprecated qemu-img backing file without format - rbd: Convert to coroutines and add write zeroes support - rbd: Updated MAINTAINERS - export/fuse: Allow other users access to the export - vhost-user: Fix backends without multiqueue support - Fix drive-backup transaction endless drained section -----BEGIN PGP SIGNATURE----- iQJFBAABCAAvFiEE3D3rFZqa+V09dFb+fwmycsiPL9YFAmDoRdIRHGt3b2xmQHJl ZGhhdC5jb20ACgkQfwmycsiPL9bvgQ/+Ogq24n1UOQc8FEKRYfyhajNToQ9ofzWN iLiblSGx2QDq+CauD3qdu6z7DLlqEXeoM4NYM462oIPumptQj+9XZt7ftfh6FLWW 4yJEbjfnVKOba+vFdJ+E0DStwnPaxYdnrPGd53cwHZfbZh4ZmkpTM350mzHHiLTb KYKOgWd+UHZbkYeCVNYTGe30SRBiKeAecTpsVZ5HVhe7LstjByuy5stk8dytLpdV YqdKOToZfOp77XiHr8YcLLp1HHBGlr5hw73V4SDas0beCp7hqtnAqsTYyXBue4xO 4zfD4Gujr5JVOCb0crDTyOmOQY5E+y2dqFoOUF00D5AoN2vj4nfQ9ESkbqlE9BVh mgJ1izSokYlN2X8rIwGXNR5fbxRmxxfkAA4rScNRytj1KxDHyrDxrp/k8YFemxSQ qwgb/FBm0fcr69evPRzovKwZFhcyPremksluHQE4rZZ66qBQ2cGuDJPE7PWVTpPH 67JCrIVK/O6n5p+4ilFHmQQ3aP3ol0frMFcboYVRchJ2MhIDTsfFL3F/tTK8hy86 AmrrdQ1BQIAoKNOKnAmOSOUdExM55OcfPmX69+AhEk2GeWP6kgz5Pks4H3qCiKGf YoRk8F1V+N4q+C0mFFovB61bNQ6COIlBuzmD9EtmpDD/Ta3Wib+3ZnoGVIdPS+OI jyj+qJxd9z4= =kH+r -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging Block layer patches - Make blockdev-reopen stable - Remove deprecated qemu-img backing file without format - rbd: Convert to coroutines and add write zeroes support - rbd: Updated MAINTAINERS - export/fuse: Allow other users access to the export - vhost-user: Fix backends without multiqueue support - Fix drive-backup transaction endless drained section # gpg: Signature made Fri 09 Jul 2021 13:49:22 BST # gpg: using RSA key DC3DEB159A9AF95D3D7456FE7F09B272C88F2FD6 # gpg: issuer "kwolf@redhat.com" # gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full] # Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6 * remotes/kevin/tags/for-upstream: (28 commits) block: Make blockdev-reopen stable API iotests: Test reopening multiple devices at the same time block: Support multiple reopening with x-blockdev-reopen block: Acquire AioContexts during bdrv_reopen_multiple() block: Add bdrv_reopen_queue_free() qcow2: Fix dangling pointer after reopen for 'file' qemu-img: Improve error for rebase without backing format qemu-img: Require -F with -b backing image qcow2: Prohibit backing file changes in 'qemu-img amend' blockdev: fix drive-backup transaction endless drained section vhost-user: Fix backends without multiqueue support MAINTAINERS: add block/rbd.c reviewer block/rbd: fix type of task->complete iotests/fuse-allow-other: Test allow-other iotests/308: Test +w on read-only FUSE exports export/fuse: Let permissions be adjustable export/fuse: Give SET_ATTR_SIZE its own branch export/fuse: Add allow-other option export/fuse: Pass default_permissions for mount util/uri: do not check argument of uri_free() ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2021-07-10 19:55:21 +01:00
Gerd Hoffmann	f8ade0dc01	modules: add block module annotations Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Reviewed-by: Jose R. Ziviani <jziviani@suse.de> Message-Id: <20210624103836.2382472-14-kraxel@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-07-09 18:20:27 +02:00
Paolo Bonzini	63a7f85306	meson: fix missing preprocessor symbols While most libraries do not need a CONFIG_* symbol because the "when:" clauses are enough, some do. Add them back or stop using them if possible. In the case of libpmem, the statement to add the CONFIG_* symbol was still in configure, but could not be triggered because it checked for "no" instead of "disabled" (and it would be wrong anyway since the test for the library has not been done yet). Reported-by: Li Zhijian <lizhijian@cn.fujitsu.com> Fixes: `587d59d6cc` ("configure, meson: convert virgl detection to meson", 2021-07-06) Fixes: `83ef16821a` ("configure, meson: convert libdaxctl detection to meson", 2021-07-06) Fixes: `e36e8c70f6` ("configure, meson: convert libpmem detection to meson", 2021-07-06) Fixes: `53c22b68e3` ("configure, meson: convert liburing detection to meson", 2021-07-06) Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-07-09 18:19:00 +02:00
Kevin Wolf	6cf42ca2f9	block: Acquire AioContexts during bdrv_reopen_multiple() As the BlockReopenQueue can contain nodes in multiple AioContexts, only one of which may be locked when AIO_WAIT_WHILE() can be called, we can't let the caller lock the right contexts. Instead, individually lock the AioContext of a single node when iterating the queue. Reintroduce bdrv_reopen() as a wrapper for reopening a single node that drains the node and temporarily drops the AioContext lock for bdrv_reopen_multiple(). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210708114709.206487-4-kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-07-09 13:19:11 +02:00
Kevin Wolf	bcfd86d6a6	qcow2: Fix dangling pointer after reopen for 'file' Without an external data file, s->data_file is a second pointer with the same value as bs->file. When changing bs->file to a different BdrvChild and freeing the old BdrvChild, s->data_file must also be updated, otherwise it points to freed memory and causes crashes. This problem was caught by iotests case 245. Fixes: df2b7086f169239ebad5d150efa29c9bb6d4f820 Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20210708114709.206487-2-kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-07-09 13:19:11 +02:00
Eric Blake	5a385bf5c5	qcow2: Prohibit backing file changes in 'qemu-img amend' This was deprecated back in `bc5ee6da7` (qcow2: Deprecate use of qemu-img amend to change backing file), and no one in the meantime has given any reasons why it should be supported. Time to make change attempts a hard error (but for convenience, specifying the _same_ backing chain is not forbidden). Update a couple of iotests to match. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20210503213600.569128-2-eblake@redhat.com> Reviewed-by: Connor Kuehl <ckuehl@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-07-09 12:26:05 +02:00
Peter Lieven	64cc845bdb	block/rbd: fix type of task->complete task->complete is a bool not an integer. Signed-off-by: Peter Lieven <pl@kamp.de> Message-Id: <20210707180449.32665-1-pl@kamp.de> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-07-09 12:26:05 +02:00
Max Reitz	6aeeaed29c	export/fuse: Let permissions be adjustable Allow changing the file mode, UID, and GID through SETATTR. Without allow_other, UID and GID are not allowed to be changed, because it would not make sense. Also, changing group or others' permissions is not allowed either. For read-only exports, +w cannot be set. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210625142317.271673-5-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-07-09 12:26:05 +02:00
Max Reitz	9bad96a8cc	export/fuse: Give SET_ATTR_SIZE its own branch In order to support changing other attributes than the file size in fuse_setattr(), we have to give each its own independent branch. This also applies to the only attribute we do support right now. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20210625142317.271673-4-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-07-09 12:26:05 +02:00
Max Reitz	8fc54f9428	export/fuse: Add allow-other option Without the allow_other mount option, no user (not even root) but the one who started qemu/the storage daemon can access the export. Allow users to configure the export such that such accesses are possible. While allow_other is probably what users want, we cannot make it an unconditional default, because passing it is only possible (for non-root users) if the global fuse.conf configuration file allows it. Thus, the default is an 'auto' mode, in which we first try with allow_other, and then fall back to without. FuseExport.allow_other reports whether allow_other was actually used as a mount option or not. Currently, this information is not used, but a future patch will let this field decide whether e.g. an export's UID and GID can be changed through chmod. One notable thing about 'auto' mode is that libfuse may print error messages directly to stderr, and so may fusermount (which it executes). Our export code cannot really filter or hide them. Therefore, if 'auto' fails its first attempt and has to fall back, fusermount will print an error message that mounting with allow_other failed. This behavior necessitates a change to iotest 308, namely we need to filter out this error message (because if the first attempt at mounting with allow_other succeeds, there will be no such message). Furthermore, common.rc's _make_test_img should use allow-other=off for FUSE exports, because iotests generally do not need to access images from other users, so allow-other=on or allow-other=auto have no advantage. OTOH, allow-other=on will not work on systems where user_allow_other is disabled, and with allow-other=auto, we get said error message that we would need to filter out again. Just disabling allow-other is simplest. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210625142317.271673-3-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-07-09 12:26:05 +02:00
Max Reitz	2c7dd057aa	export/fuse: Pass default_permissions for mount We do not do any permission checks in fuse_open(), so let the kernel do them. We already let fuse_getattr() report the proper UNIX permissions, so this should work the way we want. This causes a change in 308's reference output, because now opening a non-writable export with O_RDWR fails already, instead of only actually attempting to write to it. (That is an improvement.) Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210625142317.271673-2-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-07-09 12:26:05 +02:00
Heinrich Schuchardt	c2615bdfbd	util/uri: do not check argument of uri_free() uri_free() checks if its argument is NULL in uri_clean() and g_free(). There is no need to check the argument before the call. Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de> Message-Id: <20210629063602.4239-1-xypron.glpk@gmx.de> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Richard W.M. Jones <rjones@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-07-09 12:26:05 +02:00
Peter Lieven	eb06cbab7e	block/rbd: drop qemu_rbd_refresh_limits librbd supports 1 byte alignment for all aio operations. Currently, there is no API call to query limits from the Ceph ObjectStore backend. So drop the bdrv_refresh_limits completely until there is such an API call. Signed-off-by: Peter Lieven <pl@kamp.de> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Message-Id: <20210702172356.11574-7-idryomov@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-07-09 12:26:05 +02:00
Peter Lieven	c56ac27d2a	block/rbd: add write zeroes support This patch wittingly sets BDRV_REQ_NO_FALLBACK and silently ignores BDRV_REQ_MAY_UNMAP for older librbd versions. The rationale for this is as follows (citing Ilya Dryomov current RBD maintainer): ---8<--- a) remove the BDRV_REQ_MAY_UNMAP check in qemu_rbd_co_pwrite_zeroes() and as a consequence always unmap if librbd is too old It's not clear what qemu's expectation is but in general Write Zeroes is allowed to unmap. The only guarantee is that subsequent reads return zeroes, everything else is a hint. This is how it is specified in the kernel and in the NVMe spec. In particular, block/nvme.c implements it as follows: if (flags & BDRV_REQ_MAY_UNMAP) { cdw12 \|= (1 << 25); } This sets the Deallocate bit. But if it's not set, the device may still deallocate: """ If the Deallocate bit (CDW12.DEAC) is set to '1' in a Write Zeroes command, and the namespace supports clearing all bytes to 0h in the values read (e.g., bits 2:0 in the DLFEAT field are set to 001b) from a deallocated logical block and its metadata (excluding protection information), then for each specified logical block, the controller: - should deallocate that logical block; ... If the Deallocate bit is cleared to '0' in a Write Zeroes command, and the namespace supports clearing all bytes to 0h in the values read (e.g., bits 2:0 in the DLFEAT field are set to 001b) from a deallocated logical block and its metadata (excluding protection information), then, for each specified logical block, the controller: - may deallocate that logical block; """ https://nvmexpress.org/wp-content/uploads/NVM-Express-NVM-Command-Set-Specification-2021.06.02-Ratified-1.pdf b) set BDRV_REQ_NO_FALLBACK in supported_zero_flags Again, it's not clear what qemu expects here, but without it we end up in a ridiculous situation where specifying the "don't allow slow fallback" switch immediately fails all efficient zeroing requests on a device where Write Zeroes is always efficient: $ qemu-io -c 'help write' \| grep -- '-[zun]' -n, -- with -z, don't allow slow fallback -u, -- with -z, allow unmapping -z, -- write zeroes using blk_co_pwrite_zeroes $ qemu-io -f rbd -c 'write -z -u -n 0 1M' rbd:foo/bar write failed: Operation not supported --->8--- Signed-off-by: Peter Lieven <pl@kamp.de> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Message-Id: <20210702172356.11574-6-idryomov@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-07-09 12:26:05 +02:00
Peter Lieven	c3e5fac534	block/rbd: migrate from aio to coroutines Signed-off-by: Peter Lieven <pl@kamp.de> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Message-Id: <20210702172356.11574-5-idryomov@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-07-09 12:26:05 +02:00
Peter Lieven	6d9214189e	block/rbd: update s->image_size in qemu_rbd_getlength While at it just call rbd_get_size and avoid rbd_image_info_t. Signed-off-by: Peter Lieven <pl@kamp.de> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Message-Id: <20210702172356.11574-4-idryomov@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-07-09 12:26:05 +02:00
Peter Lieven	832a93dcb8	block/rbd: store object_size in BDRVRBDState Signed-off-by: Peter Lieven <pl@kamp.de> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Message-Id: <20210702172356.11574-3-idryomov@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-07-09 12:26:05 +02:00
Peter Lieven	48672ac058	block/rbd: bump librbd requirement to luminous release Ceph Luminous (version 12.2.z) is almost 4 years old at this point. Bump the requirement to get rid of the ifdef'ry in the code. Qemu 6.1 dropped the support for RHEL-7 which was the last supported OS that required an older librbd. Signed-off-by: Peter Lieven <pl@kamp.de> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Message-Id: <20210702172356.11574-2-idryomov@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-07-09 12:26:05 +02:00
Or Ozeri	42e4ac9ef5	block/rbd: Add support for rbd image encryption Starting from ceph Pacific, RBD has built-in support for image-level encryption. Currently supported formats are LUKS version 1 and 2. There are 2 new relevant librbd APIs for controlling encryption, both expect an open image context: rbd_encryption_format: formats an image (i.e. writes the LUKS header) rbd_encryption_load: loads encryptor/decryptor to the image IO stack This commit extends the qemu rbd driver API to support the above. Signed-off-by: Or Ozeri <oro@il.ibm.com> Message-Id: <20210627114635.39326-1-oro@il.ibm.com> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-07-09 12:26:05 +02:00
Akihiko Odaki	9f460c64e1	block/io: Merge discard request alignments Signed-off-by: Akihiko Odaki <akihiko.odaki@gmail.com> Message-id: 20210705130458.97642-3-akihiko.odaki@gmail.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2021-07-06 14:28:55 +01:00
Akihiko Odaki	0dfc7af2b2	block/file-posix: Optimize for macOS This commit introduces "punch hole" operation and optimizes transfer block size for macOS. Thanks to Konstantin Nazarov for detailed analysis of a flaw in an old version of this change: https://gist.github.com/akihikodaki/87df4149e7ca87f18dc56807ec5a1bc5#gistcomment-3654667 Signed-off-by: Akihiko Odaki <akihiko.odaki@gmail.com> Message-id: 20210705130458.97642-1-akihiko.odaki@gmail.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2021-07-06 14:28:55 +01:00
Peter Maydell	9c2647f750	Block layer patches - Supporting changing 'file' in x-blockdev-reopen - ssh: add support for sha256 host key fingerprints - vhost-user-blk: Implement reconnection during realize - introduce QEMU_AUTO_VFREE - Don't require password of encrypted backing file for image creation - Code cleanups -----BEGIN PGP SIGNATURE----- iQJFBAABCAAvFiEE3D3rFZqa+V09dFb+fwmycsiPL9YFAmDclTcRHGt3b2xmQHJl ZGhhdC5jb20ACgkQfwmycsiPL9bzaw/+PYQ9vPG+ZROWl633TUOQu7IYGZynXCET ZHlV2JlXnFH8QoO8A53U72cgVg+GlwDpiMCCtjEGMG1yfMBNe+DXR1wFUMDne1Gs qIFX4gIVpPGDi3gPeQvefLCwoN8VXwIxvCJj40YR9BY0cqQR6joI81kjVlqKemqB cHG4qJHmnphihSLep/dd2BRInkeHWXWU63jK4d7ctdCwHZPNDf0u0FEraxEqS/d0 5tfdIl3haghhoDRRamyh5bLZBCW1KlpfTR98RdspcwpSlXq/FgV5N9CFa15XSYfQ rinSClSIOpLzG90+tBihROVBXvbugO5qZVQTG0Yg1tt4FGG8Cmiqf9MNXC5yctNg WnaQQipx/37deafGA4jqorZBJd1R87JLJTBFTpkB47XAFq/ltqsTDhrrfdS+jail Fd+qyqWg0Jx3JjdhSUpHvDKBBsErjoxtoyQIGakSreXGmj2UY6BFmGii7lnLZNLo +E81C7exnkCIGKkOHy+y9DkpVY/PEJKCG7uwcyy+F2qOqGUOxKLuZomWcLodo6Vf /eJ/UsLJt6HhXhXq/1ZZHmaORn8Lft1yr/9azoGXZ7er+jZcbEkhbcZmET+Y6ykq Vox/GmLkhyVkM96MA0lMW5hHPWUbF29m9Jmq3nNfvFWBcILEs4uWSlbd0M2oAmWj ung9sKIV/8s= =aB0a -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging Block layer patches - Supporting changing 'file' in x-blockdev-reopen - ssh: add support for sha256 host key fingerprints - vhost-user-blk: Implement reconnection during realize - introduce QEMU_AUTO_VFREE - Don't require password of encrypted backing file for image creation - Code cleanups # gpg: Signature made Wed 30 Jun 2021 17:00:55 BST # gpg: using RSA key DC3DEB159A9AF95D3D7456FE7F09B272C88F2FD6 # gpg: issuer "kwolf@redhat.com" # gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full] # Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6 * remotes/kevin/tags/for-upstream: (24 commits) vhost-user-blk: Implement reconnection during realize vhost-user-blk: Factor out vhost_user_blk_realize_connect() vhost: Distinguish errors in vhost_dev_get_config() vhost-user-blk: Add Error parameter to vhost_user_blk_start() vhost: Return 0/-errno in vhost_dev_init() vhost: Distinguish errors in vhost_backend_init() vhost: Add Error parameter to vhost_dev_init() block/ssh: add support for sha256 host key fingerprints block/commit: use QEMU_AUTO_VFREE introduce QEMU_AUTO_VFREE iotests: Test replacing files with x-blockdev-reopen block: Allow changing bs->file on reopen block: BDRVReopenState: drop replace_backing_bs field block: move supports_backing check to bdrv_set_file_or_backing_noperm() block: bdrv_reopen_parse_backing(): simplify handling implicit filters block: bdrv_reopen_parse_backing(): don't check frozen child block: bdrv_reopen_parse_backing(): don't check aio context block: introduce bdrv_set_file_or_backing_noperm() block: introduce bdrv_remove_file_or_backing_child() block: comment graph-modifying function not updating permissions ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2021-07-02 11:46:32 +01:00
Daniel P. Berrangé	bf783261f0	block/ssh: add support for sha256 host key fingerprints Currently the SSH block driver supports MD5 and SHA1 for host key fingerprints. This is a cryptographically sensitive operation and so these hash algorithms are inadequate by modern standards. This adds support for SHA256 which has been supported in libssh since the 0.8.1 release. Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20210622115156.138458-1-berrange@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Acked-by: Richard W.M. Jones <rjones@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-06-30 12:45:32 +02:00
Philippe Mathieu-Daudé	7b3b616838	block/nbd: Use qcrypto_tls_creds_check_endpoint() Avoid accessing QCryptoTLSCreds internals by using the qcrypto_tls_creds_check_endpoint() helper. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>	2021-06-29 18:29:47 +01:00
Vladimir Sementsov-Ogievskiy	7170170866	block/commit: use QEMU_AUTO_VFREE Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210628121133.193984-3-vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-06-29 16:51:21 +02:00
Eric Blake	97efa8698e	block: Move read-only check during truncation earlier No need to start a tracked request that will always fail. The choice to check read-only after bdrv_inc_in_flight() predates `1bc5f09f2e` (block: Use tracked request for truncate), but waiting for serializing requests can make the effect more noticeable. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20210609163034.997943-1-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-06-29 16:51:00 +02:00
Peter Maydell	6512fa497c	* Some Meson test conversions * KVM dirty page ring buffer fix * KVM TSC scaling support * Fixes for SG_IO with /dev/sdX devices * (Non)support for host devices on iOS * -smp cleanups -----BEGIN PGP SIGNATURE----- iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmDV5TIUHHBib256aW5p QHJlZGhhdC5jb20ACgkQv/vSX3jHroNySgf9HMnAtLWp36p2ie74o4rrW9x3Ojrm fuCq2i3q3nBhEKqqiyp+QQJGubE44mXEZQYtX89tOfSFgg7o6SLIoAcQQskr+In6 f9I1jjpSVTls0AaGUO+iRn9KiTzeMWeo1l6Wht+2mfBL5XpNLaLLu/T49uPhjlvN zFi5blgILxIYMqMCD1joDBnIiqqDozr0p7QzRZD8re25sRhg0NHQxyIh3OxBPpJ9 3Jhy1Us0cDWrwvPbxz6S5N0zesLu1ojtojVPy6iKjyHSv+6eiE6bHyIbS8duG5+H zBC1THOsUV3X1UvPAjuSNlgfNeobGAzmxSJ/evLgWWkpkx1mLtsnL5RARQ== =YoOL -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/bonzini-gitlab/tags/for-upstream' into staging * Some Meson test conversions * KVM dirty page ring buffer fix * KVM TSC scaling support * Fixes for SG_IO with /dev/sdX devices * (Non)support for host devices on iOS * -smp cleanups # gpg: Signature made Fri 25 Jun 2021 15:16:18 BST # gpg: using RSA key F13338574B662389866C7682BFFBD25F78C7AE83 # gpg: issuer "pbonzini@redhat.com" # gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [full] # gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" [full] # Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1 # Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83 * remotes/bonzini-gitlab/tags/for-upstream: (28 commits) machine: reject -smp dies!=1 for non-PC machines machine: pass QAPI struct to mc->smp_parse machine: add error propagation to mc->smp_parse machine: move common smp_parse code to caller machine: move dies from X86MachineState to CpuTopology file-posix: handle EINTR during ioctl block: detect DKIOCGETBLOCKCOUNT/SIZE before use block: try BSD disk size ioctls one after another block: check for sys/disk.h block: feature detection for host block support file-posix: try BLKSECTGET on block devices too, do not round to power of 2 block: add max_hw_transfer to BlockLimits block-backend: align max_transfer to request alignment osdep: provide ROUND_DOWN macro scsi-generic: pass max_segments via max_iov field in BlockLimits file-posix: fix max_iov for /dev/sg devices KVM: Fix dirty ring mmap incorrect size due to renaming accident configure, meson: convert libusbredir detection to meson configure, meson: convert libcacard detection to meson configure, meson: convert libusb detection to meson ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2021-06-28 21:04:22 +01:00
Peter Maydell	9e654e1019	block: Make block-copy API thread-safe -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEi5wmzbL9FHyIDoahVh8kwfGfefsFAmDVzrgACgkQVh8kwfGf efuoKRAAsHqE46P2xwjjPROtwZC6HP/Ny5xfbF2CglfC4eX4ff89dwWSagjHfBig 1TQjHemOo5Fs8gyBGec0WPn0HaBVFthq75VGObt+QkHy7+owGDmtVghkXnEO8z2c gldPoVeOXAbU4DZXgITQYfN1ljCOdrdKQMrMYmKqXLmw/vMzAfKlBKqsOFQiyVys 4egn25QxyiOh+T29zyVwmGVABaH2TIuJqkDr+iMh4IypYZf0BlqJRw++kLhGdxGq RIpXQiXExy3lRm8htuh+GDAigSXNz93XKU3ZHe8RPBtPfdS0siGWwVTW2nLsH+Rc vfUvfkqBSGcFaFjGHg/eNGvoMTYAZKHx8yq72voqrfFIPtz3NAoS2ahz/hIU/NiL YLOmOcRZVx+xJ5lxjaxi/SvbTHVtZoBym/Aje/YiT3v4A8rziG6BeBUP9ec/bk+D YYxMZNfzW2jVq0Vl4TZyYmV8e/H8Ha3HbLLJip3tiLXsBIjajfNti9iJ/xac5NzD jV5pR27yIXilYHPR7GCYaMRp5LJv8uGiW704yAt2dwBizqonm7cgTg5Fcx0HFDyk +HyPwu/TGI3cEF7dl+8V+AmwKug+jFj3VFVh5UMtf4bqNeliYNx+QpCUuWPuPMFc U1nXWYBtcklD4JNPERUgUW7x2tmAJN5dGDY0LAa2brrOnKVTTwU= =JXBO -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/vsementsov/tags/pull-jobs-2021-06-25' into staging block: Make block-copy API thread-safe # gpg: Signature made Fri 25 Jun 2021 13:40:24 BST # gpg: using RSA key 8B9C26CDB2FD147C880E86A1561F24C1F19F79FB # gpg: Good signature from "Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>" [unknown] # gpg: WARNING: This key is not certified with a trusted signature! # gpg: There is no indication that the signature belongs to the owner. # Primary key fingerprint: 8B9C 26CD B2FD 147C 880E 86A1 561F 24C1 F19F 79FB * remotes/vsementsov/tags/pull-jobs-2021-06-25: block-copy: atomic .cancelled and .finished fields in BlockCopyCallState block-copy: add CoMutex lock block-copy: move progress_set_remaining in block_copy_task_end block-copy: streamline choice of copy_range vs. read/write block-copy: small refactor in block_copy_task_entry and block_copy_common co-shared-resource: protect with a mutex progressmeter: protect with a mutex blockjob: let ratelimit handle a speed of 0 block-copy: let ratelimit handle a speed of 0 ratelimit: treat zero speed as unlimited Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2021-06-28 18:58:19 +01:00
Emanuele Giuseppe Esposito	149009bef4	block-copy: atomic .cancelled and .finished fields in BlockCopyCallState By adding acquire/release pairs, we ensure that .ret and .error_is_read fields are written by block_copy_dirty_clusters before .finished is true, and that they are read by API user after .finished is true. The atomic here are necessary because the fields are concurrently modified in coroutines, and read outside. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20210624072043.180494-6-eesposit@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>	2021-06-25 14:33:51 +03:00
Emanuele Giuseppe Esposito	d0c389d2ce	block-copy: add CoMutex lock Group various structures fields, to better understand what we need to protect with a lock and what doesn't need it. Then, add a CoMutex to protect concurrent access of block-copy data structures. This mutex also protects .copy_bitmap, because its thread-safe API does not prevent it from assigning two tasks to the same bitmap region. Exceptions to the lock: - .sleep_state is handled in the series "coroutine: new sleep/wake API" and thus here left as TODO. - .finished, .cancelled and reads to .ret and .error_is_read will be protected in the following patch, because are used also outside coroutines. - .skip_unallocated is atomic. Including it under the mutex would increase the critical sections and make them also much more complex. We can have it as atomic since it is only written from outside and read by block-copy coroutines. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20210624072043.180494-5-eesposit@redhat.com> [vsementsov: fix typo in comment] Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>	2021-06-25 14:33:39 +03:00
Emanuele Giuseppe Esposito	e3dd339fee	block-copy: move progress_set_remaining in block_copy_task_end Moving this function in task_end ensures to update the progress anyways, even if there is an error. It also helps in next patch, allowing task_end to have only one critical section. Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20210624072043.180494-4-eesposit@redhat.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>	2021-06-25 14:33:35 +03:00
Paolo Bonzini	05d5e12b24	block-copy: streamline choice of copy_range vs. read/write Put the logic to determine the copy size in a separate function, so that there is a simple state machine for the possible methods of copying data from one BlockDriverState to the other. Use .method instead of .copy_range as in-out argument, and include also .zeroes as an additional copy method. While at it, store the common computation of block_copy_max_transfer into a new field of BlockCopyState, and make sure that we always obey max_transfer; that's more efficient even for the COPY_RANGE_READ_WRITE case. Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20210624072043.180494-3-eesposit@redhat.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>	2021-06-25 14:33:33 +03:00
Emanuele Giuseppe Esposito	c6a3e3df30	block-copy: small refactor in block_copy_task_entry and block_copy_common Use a local variable instead of referencing BlockCopyState through a BlockCopyCallState or BlockCopyTask every time. This is in preparation for next patches. No functional change intended. Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20210624072043.180494-2-eesposit@redhat.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>	2021-06-25 14:32:09 +03:00
Emanuele Giuseppe Esposito	a7b4f8fc09	progressmeter: protect with a mutex Progressmeter is protected by the AioContext mutex, which is taken by the block jobs and their caller (like blockdev). We would like to remove the dependency of block layer code on the AioContext mutex, since most drivers and the core I/O code are already not relying on it. Create a new C file to implement the ProgressMeter API, but keep the struct as public, to avoid forcing allocation on the heap. Also add a mutex to be able to provide an accurate snapshot of the progress values to the caller. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20210614081130.22134-5-eesposit@redhat.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>	2021-06-25 14:24:24 +03:00
Paolo Bonzini	ca657c99e6	block-copy: let ratelimit handle a speed of 0 Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20210614081130.22134-3-eesposit@redhat.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>	2021-06-25 14:24:16 +03:00
Paolo Bonzini	bd80936a4f	file-posix: handle EINTR during ioctl Similar to other handle_aiocb_* functions, handle_aiocb_ioctl needs to cater for the possibility that ioctl is interrupted by a signal. Otherwise, the I/O is incorrectly reported as a failure to the guest. Reported-by: Gordon Watson <gwatson@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-25 10:54:13 +02:00
Joelle van Dyne	09e20abdda	block: detect DKIOCGETBLOCKCOUNT/SIZE before use iOS hosts do not have these defined so we fallback to the default behaviour. Co-authored-by: Warner Losh <imp@bsdimp.com> Signed-off-by: Joelle van Dyne <j@getutm.app> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-25 10:54:13 +02:00
Paolo Bonzini	267cd53f5f	block: try BSD disk size ioctls one after another Try all the possible ioctls for disk size as long as they are supported, to keep the #if ladder simple. Extracted and cleaned up from a patch by Joelle van Dyne and Warner Losh. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-25 10:54:13 +02:00

1 2 3 4 5 ...

5424 Commits