mirrors/qemu - qemu - SynapseOS git

Author	SHA1	Message	Date
Emanuele Giuseppe Esposito	149009bef4	block-copy: atomic .cancelled and .finished fields in BlockCopyCallState By adding acquire/release pairs, we ensure that .ret and .error_is_read fields are written by block_copy_dirty_clusters before .finished is true, and that they are read by API user after .finished is true. The atomic here are necessary because the fields are concurrently modified in coroutines, and read outside. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20210624072043.180494-6-eesposit@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>	2021-06-25 14:33:51 +03:00
Emanuele Giuseppe Esposito	d0c389d2ce	block-copy: add CoMutex lock Group various structures fields, to better understand what we need to protect with a lock and what doesn't need it. Then, add a CoMutex to protect concurrent access of block-copy data structures. This mutex also protects .copy_bitmap, because its thread-safe API does not prevent it from assigning two tasks to the same bitmap region. Exceptions to the lock: - .sleep_state is handled in the series "coroutine: new sleep/wake API" and thus here left as TODO. - .finished, .cancelled and reads to .ret and .error_is_read will be protected in the following patch, because are used also outside coroutines. - .skip_unallocated is atomic. Including it under the mutex would increase the critical sections and make them also much more complex. We can have it as atomic since it is only written from outside and read by block-copy coroutines. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20210624072043.180494-5-eesposit@redhat.com> [vsementsov: fix typo in comment] Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>	2021-06-25 14:33:39 +03:00
Emanuele Giuseppe Esposito	e3dd339fee	block-copy: move progress_set_remaining in block_copy_task_end Moving this function in task_end ensures to update the progress anyways, even if there is an error. It also helps in next patch, allowing task_end to have only one critical section. Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20210624072043.180494-4-eesposit@redhat.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>	2021-06-25 14:33:35 +03:00
Paolo Bonzini	05d5e12b24	block-copy: streamline choice of copy_range vs. read/write Put the logic to determine the copy size in a separate function, so that there is a simple state machine for the possible methods of copying data from one BlockDriverState to the other. Use .method instead of .copy_range as in-out argument, and include also .zeroes as an additional copy method. While at it, store the common computation of block_copy_max_transfer into a new field of BlockCopyState, and make sure that we always obey max_transfer; that's more efficient even for the COPY_RANGE_READ_WRITE case. Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20210624072043.180494-3-eesposit@redhat.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>	2021-06-25 14:33:33 +03:00
Emanuele Giuseppe Esposito	c6a3e3df30	block-copy: small refactor in block_copy_task_entry and block_copy_common Use a local variable instead of referencing BlockCopyState through a BlockCopyCallState or BlockCopyTask every time. This is in preparation for next patches. No functional change intended. Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20210624072043.180494-2-eesposit@redhat.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>	2021-06-25 14:32:09 +03:00
Emanuele Giuseppe Esposito	a7b4f8fc09	progressmeter: protect with a mutex Progressmeter is protected by the AioContext mutex, which is taken by the block jobs and their caller (like blockdev). We would like to remove the dependency of block layer code on the AioContext mutex, since most drivers and the core I/O code are already not relying on it. Create a new C file to implement the ProgressMeter API, but keep the struct as public, to avoid forcing allocation on the heap. Also add a mutex to be able to provide an accurate snapshot of the progress values to the caller. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20210614081130.22134-5-eesposit@redhat.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>	2021-06-25 14:24:24 +03:00
Paolo Bonzini	ca657c99e6	block-copy: let ratelimit handle a speed of 0 Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20210614081130.22134-3-eesposit@redhat.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>	2021-06-25 14:24:16 +03:00
Paolo Bonzini	bd80936a4f	file-posix: handle EINTR during ioctl Similar to other handle_aiocb_* functions, handle_aiocb_ioctl needs to cater for the possibility that ioctl is interrupted by a signal. Otherwise, the I/O is incorrectly reported as a failure to the guest. Reported-by: Gordon Watson <gwatson@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-25 10:54:13 +02:00
Joelle van Dyne	09e20abdda	block: detect DKIOCGETBLOCKCOUNT/SIZE before use iOS hosts do not have these defined so we fallback to the default behaviour. Co-authored-by: Warner Losh <imp@bsdimp.com> Signed-off-by: Joelle van Dyne <j@getutm.app> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-25 10:54:13 +02:00
Paolo Bonzini	267cd53f5f	block: try BSD disk size ioctls one after another Try all the possible ioctls for disk size as long as they are supported, to keep the #if ladder simple. Extracted and cleaned up from a patch by Joelle van Dyne and Warner Losh. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-25 10:54:13 +02:00
Joelle van Dyne	14176c8d05	block: feature detection for host block support On Darwin (iOS), there are no system level APIs for directly accessing host block devices. We detect this at configure time. Signed-off-by: Joelle van Dyne <j@getutm.app> Message-Id: <20210315180341.31638-2-j@getutm.app> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-25 10:54:13 +02:00
Paolo Bonzini	18473467d5	file-posix: try BLKSECTGET on block devices too, do not round to power of 2 bs->sg is only true for character devices, but block devices can also be used with scsi-block and scsi-generic. Unfortunately BLKSECTGET returns bytes in an int for /dev/sgN devices, and sectors in a short for block devices, so account for that in the code. The maximum transfer also need not be a power of 2 (for example I have seen disks with 1280 KiB maximum transfer) so there's no need to pass the result through pow2floor. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-25 10:54:13 +02:00
Paolo Bonzini	24b36e9813	block: add max_hw_transfer to BlockLimits For block host devices, I/O can happen through either the kernel file descriptor I/O system calls (preadv/pwritev, io_submit, io_uring) or the SCSI passthrough ioctl SG_IO. In the latter case, the size of each transfer can be limited by the HBA, while for file descriptor I/O the kernel is able to split and merge I/O in smaller pieces as needed. Applying the HBA limits to file descriptor I/O results in more system calls and suboptimal performance, so this patch splits the max_transfer limit in two: max_transfer remains valid and is used in general, while max_hw_transfer is limited to the maximum hardware size. max_hw_transfer can then be included by the scsi-generic driver in the block limits page, to ensure that the stricter hardware limit is used. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-25 10:54:13 +02:00
Paolo Bonzini	b99f7fa08a	block-backend: align max_transfer to request alignment Block device requests must be aligned to bs->bl.request_alignment. It makes sense for drivers to align bs->bl.max_transfer the same way; however when there is no specified limit, blk_get_max_transfer just returns INT_MAX. Since the contract of the function does not specify that INT_MAX means "no maximum", just align the outcome of the function (whether INT_MAX or bs->bl.max_transfer) before returning it. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-25 10:54:13 +02:00
Paolo Bonzini	01ef8185b8	scsi-generic: pass max_segments via max_iov field in BlockLimits I/O to a disk via read/write is not limited by the number of segments allowed by the host adapter; the kernel can split requests if needed, and the limit imposed by the host adapter can be very low (256k or so) to avoid that SG_IO returns EINVAL if memory is heavily fragmented. Since this value is only interesting for SG_IO-based I/O, do not include it in the max_transfer and only take it into account when patching the block limits VPD page in the scsi-generic device. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2021-06-25 10:54:12 +02:00
Paolo Bonzini	8ad5ab6148	file-posix: fix max_iov for /dev/sg devices Even though it was only called for devices that have bs->sg set (which must be character devices), sg_get_max_segments looked at /sys/dev/block which only works for block devices. On Linux the sg driver has its own way to provide the maximum number of iovecs in a scatter/gather list, so add support for it. The block device path is kept because it will be reinstated in the next patches. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2021-06-25 10:54:12 +02:00
Max Reitz	32a9a245d7	block/snapshot: Clarify goto fallback behavior In the bdrv_snapshot_goto() fallback code, we work with a pointer to either bs->file or bs->backing. We detach that child, close the node (with .bdrv_close()), apply the snapshot on the child node, and then re-open the node (with .bdrv_open()). In order for .bdrv_open() to attach the same child node that we had before, we pass "file={child-node}" or "backing={child-node}" to it. Therefore, when .bdrv_open() has returned success, we can assume that bs->file or bs->backing (respectively) points to our original child again. This is verified by an assertion. All of this is not immediately clear from a quick glance at the code, so add a comment to the assertion what it is for, and why it is valid. It certainly confused Coverity. Reported-by: Coverity (CID 1452774) Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210503095418.31521-1-mreitz@redhat.com> [mreitz: s/close/detach/] Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>	2021-06-24 09:49:04 +02:00
Vladimir Sementsov-Ogievskiy	bbfb7c2f35	block/nbd: safer transition to receiving request req->receiving is a flag of request being in one concrete yield point in nbd_co_do_receive_one_chunk(). Such kind of boolean flag is always better to unset before scheduling the coroutine, to avoid double scheduling. So, let's be more careful. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20210610100802.5888-33-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-06-18 12:21:22 -05:00
Vladimir Sementsov-Ogievskiy	91e0998f5a	block/nbd: add nbd_client_connected() helper We already have two similar helpers for other state. Let's add another one for convenience. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20210610100802.5888-32-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-06-18 12:21:22 -05:00
Vladimir Sementsov-Ogievskiy	a71d597b98	block/nbd: reuse nbd_co_do_establish_connection() in nbd_open() The only last step we need to reuse the function is coroutine-wrapper. nbd_open() may be called from non-coroutine context. So, generate the wrapper and use it. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20210610100802.5888-31-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-06-18 12:21:22 -05:00
Vladimir Sementsov-Ogievskiy	97cf89259e	nbd/client-connection: add option for non-blocking connection attempt We'll need a possibility of non-blocking nbd_co_establish_connection(), so that it returns immediately, and it returns success only if a connections was previously established in background. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20210610100802.5888-30-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-06-18 12:21:22 -05:00
Vladimir Sementsov-Ogievskiy	51edbf537d	block/nbd: split nbd_co_do_establish_connection out of nbd_reconnect_attempt Split out the part that we want to reuse for nbd_open(). Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210610100802.5888-29-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-06-18 12:21:21 -05:00
Vladimir Sementsov-Ogievskiy	43cb34dede	nbd/client-connection: return only one io channel block/nbd doesn't need underlying sioc channel anymore. So, we can update nbd/client-connection interface to return only one top-most io channel, which is more straight forward. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20210610100802.5888-27-vsementsov@virtuozzo.com> [eblake: squash in Vladimir's fixes for uninit usage caught by clang] Signed-off-by: Eric Blake <eblake@redhat.com>	2021-06-18 12:20:53 -05:00
Vladimir Sementsov-Ogievskiy	95a078ea3e	block/nbd: drop BDRVNBDState::sioc Currently sioc pointer is used just to pass from socket-connection to nbd negotiation. Drop the field, and use local variables instead. With next commit we'll update nbd/client-connection.c to behave appropriately (return only top-most ioc, not two channels). Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20210610100802.5888-26-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-06-18 10:59:54 -05:00
Vladimir Sementsov-Ogievskiy	c2405af0e4	block/nbd: don't touch s->sioc in nbd_teardown_connection() Negotiation during reconnect is now done in a thread, and s->sioc is not available during negotiation. Negotiation in thread will be cancelled by nbd_client_connection_release() called from nbd_clear_bdrvstate(). So, we don't need this code chunk anymore. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20210610100802.5888-25-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-06-18 10:59:54 -05:00
Vladimir Sementsov-Ogievskiy	6d2b0332d3	block/nbd: use negotiation of NBDClientConnection Now that we can opt in to negotiation as part of the client connection thread, use that to simplify connection_co. This is another step on the way to moving all reconnect code into NBDClientConnection. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20210610100802.5888-24-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-06-18 10:59:54 -05:00
Vladimir Sementsov-Ogievskiy	e9ba7788b0	block/nbd: split nbd_handle_updated_info out of nbd_client_handshake() To be reused in the following patch. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Roman Kagan <rvkagan@yandex-team.ru> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20210610100802.5888-23-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-06-18 10:59:54 -05:00
Vladimir Sementsov-Ogievskiy	130d49baa5	nbd/client-connection: add possibility of negotiation Add arguments and logic to support nbd negotiation in the same thread after successful connection. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210610100802.5888-20-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-06-18 10:59:53 -05:00
Vladimir Sementsov-Ogievskiy	5276c87c12	nbd: move connection code from block/nbd to nbd/client-connection We now have bs-independent connection API, which consists of four functions: nbd_client_connection_new() nbd_client_connection_release() nbd_co_establish_connection() nbd_co_establish_connection_cancel() Move them to a separate file together with NBDClientConnection structure which becomes private to the new API. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20210610100802.5888-18-vsementsov@virtuozzo.com> [eblake: comment tweaks] Signed-off-by: Eric Blake <eblake@redhat.com>	2021-06-18 10:59:53 -05:00
Vladimir Sementsov-Ogievskiy	248d470198	block/nbd: introduce nbd_client_connection_release() This is a last step of creating bs-independent nbd connection interface. With next commit we can finally move it to separate file. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210610100802.5888-17-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-06-18 10:59:53 -05:00
Vladimir Sementsov-Ogievskiy	f68729747d	block/nbd: introduce nbd_client_connection_new() This is a step of creating bs-independent nbd connection interface. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Roman Kagan <rvkagan@yandex-team.ru> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20210610100802.5888-16-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-06-18 10:59:53 -05:00
Vladimir Sementsov-Ogievskiy	90ddc64fb2	block/nbd: rename NBDConnectThread to NBDClientConnection We are going to move the connection code to its own file, and want clear names and APIs first. The structure is shared between user and (possibly) several runs of connect-thread. So it's wrong to call it "thread". Let's rename to something more generic. Appropriately rename connect_thread and thr variables to conn. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Roman Kagan <rvkagan@yandex-team.ru> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20210610100802.5888-15-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-06-18 10:59:53 -05:00
Vladimir Sementsov-Ogievskiy	c3e7730485	block/nbd: make nbd_co_establish_connection_cancel() bs-independent nbd_co_establish_connection_cancel() actually needs only pointer to NBDConnectThread. So, make it clean. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Roman Kagan <rvkagan@yandex-team.ru> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20210610100802.5888-14-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-06-18 10:59:53 -05:00
Vladimir Sementsov-Ogievskiy	d33833d7af	block/nbd: bs-independent interface for nbd_co_establish_connection() We are going to split connection code to a separate file. Now we are ready to give nbd_co_establish_connection() clean and bs-independent interface. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Roman Kagan <rvkagan@yandex-team.ru> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20210610100802.5888-13-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-06-18 10:59:53 -05:00
Vladimir Sementsov-Ogievskiy	b8e8a3d116	block/nbd: drop thr->state We don't need all these states. The code refactored to use two boolean variables looks simpler. While moving the comment in nbd_co_establish_connection() rework it to give better information. Also, we are going to move the connection code to separate file and mentioning drained section would be confusing. Improve also the comment in NBDConnectThread, while dropping removed state names from it. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210610100802.5888-12-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> [eblake: comment tweak] Signed-off-by: Eric Blake <eblake@redhat.com>	2021-06-18 10:59:53 -05:00
Vladimir Sementsov-Ogievskiy	08ea55d068	block/nbd: simplify waking of nbd_co_establish_connection() Instead of managing connect_bh, bh_ctx, and wait_connect fields, we can use a single link to the waiting coroutine with proper mutex protection. So new logic is: nbd_co_establish_connection() sets wait_co under the mutex, releases the mutex, then yield()s. Note that wait_co may be scheduled by the thread immediately after unlocking the mutex. Still, the main thread (or iothread) will not reach the code for entering the coroutine until the yield(), so we are safe. connect_thread_func() and nbd_co_establish_connection_cancel() do the following to handle wait_co: Under the mutex, if thr->wait_co is not NULL, make it NULL and schedule it. This way, we avoid scheduling the coroutine twice. Still scheduling is a bit different: In connect_thread_func() we can just call aio_co_wake under mutex, after commit [async: the main AioContext is only "current" if under the BQL] we are sure that aio_co_wake() will not try to acquire the aio context and do qemu_aio_coroutine_enter() but simply schedule the coroutine by aio_co_schedule(). nbd_co_establish_connection_cancel() will be called from non-coroutine context in further patch and will be able to go through qemu_aio_coroutine_enter() path of aio_co_wake(). So keep current behavior of waking the coroutine after the critical section. Also, this commit reduces the dependence of nbd_co_establish_connection() on the internals of bs (we now use a generic pointer to the coroutine, instead of direct use of s->connection_co). This is a step towards splitting the connection API out of nbd.c. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210610100802.5888-11-vsementsov@virtuozzo.com> Reviewied-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-06-18 10:59:53 -05:00
Vladimir Sementsov-Ogievskiy	2def3edb4b	block/nbd: BDRVNBDState: drop unused connect_err and connect_status These fields are write-only. Drop them. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Roman Kagan <rvkagan@yandex-team.ru> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20210610100802.5888-10-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-06-18 10:59:53 -05:00
Vladimir Sementsov-Ogievskiy	2a25def4be	block/nbd: nbd_client_handshake(): fix leak of s->ioc Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Roman Kagan <rvkagan@yandex-team.ru> Message-Id: <20210610100802.5888-9-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-06-18 10:59:53 -05:00
Roman Kagan	e8b35bf5dc	block/nbd: ensure ->connection_thread is always valid Simplify lifetime management of BDRVNBDState->connect_thread by delaying the possible cleanup of it until the BDRVNBDState itself goes away. This also reverts `0267101af6` "block/nbd: fix possible use after free of s->connect_thread" as now s->connect_thread can't be cleared until the very end. Signed-off-by: Roman Kagan <rvkagan@yandex-team.ru> [vsementsov: rebase, revert `0267101af6` changes] Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> [eblake: tweak comment] Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20210610100802.5888-8-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-06-18 10:59:53 -05:00
Vladimir Sementsov-Ogievskiy	6cc702beac	block/nbd: call socket_address_parse_named_fd() in advance Detecting monitor by current coroutine works bad when we are not in coroutine context. And that's exactly so in nbd reconnect code, where qio_channel_socket_connect_sync() is called from thread. Monitor is needed only to parse named file descriptor. So, let's just parse it during nbd_open(), so that all further users of s->saddr don't need to access monitor. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210610100802.5888-7-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-06-18 10:59:53 -05:00
Vladimir Sementsov-Ogievskiy	fb392b548e	block/nbd: connect_thread_func(): do qio_channel_set_delay(false) nbd_open() does it (through nbd_establish_connection()). Actually we lost that call on reconnect path in `1dc4718d84` "block/nbd: use non-blocking connect: fix vm hang on connect()" when we have introduced reconnect thread. Fixes: `1dc4718d84` Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210610100802.5888-5-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-06-18 10:59:53 -05:00
Vladimir Sementsov-Ogievskiy	bbba1c376b	block/nbd: fix how state is cleared on nbd_open() failure paths We have two "return error" paths in nbd_open() after nbd_process_options(). Actually we should call nbd_clear_bdrvstate() on these paths. Interesting that nbd_process_options() calls nbd_clear_bdrvstate() by itself. Let's fix leaks and refactor things to be more obvious: - intialize yank at top of nbd_open() - move yank cleanup to nbd_clear_bdrvstate() - refactor nbd_open() so that all failure paths except for yank-register goes through nbd_clear_bdrvstate() Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Roman Kagan <rvkagan@yandex-team.ru> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20210610100802.5888-4-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-06-18 10:59:53 -05:00
Roman Kagan	3687ad4903	block/nbd: fix channel object leak nbd_free_connect_thread leaks the channel object if it hasn't been stolen. Unref it and fix the leak. Signed-off-by: Roman Kagan <rvkagan@yandex-team.ru> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20210610100802.5888-3-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-06-18 10:59:52 -05:00
Daniel P. Berrangé	39683553f9	block: use GDateTime for formatting timestamp when dumping snapshot info The GDateTime APIs provided by GLib avoid portability pitfalls, such as some platforms where 'struct timeval.tv_sec' field is still 'long' instead of 'time_t'. When combined with automatic cleanup, GDateTime often results in simpler code too. Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>	2021-06-14 13:28:50 +01:00
Daniel P. Berrangé	99be1ac366	block: remove duplicate trace.h include Reviewed-by: Connor Kuehl <ckuehl@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>	2021-06-14 13:28:50 +01:00
Daniel P. Berrangé	60ff2ae2a2	block: add trace point when fdatasync fails A flush failure is a critical failure scenario for some operations. For example, it will prevent migration from completing, as it will make vm_stop() report an error. Thus it is important to have a trace point present for debugging. Reviewed-by: Connor Kuehl <ckuehl@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>	2021-06-14 13:28:50 +01:00
Daniel P. Berrangé	c7ddc8821d	block: preserve errno from fdatasync failures When fdatasync() fails on a file backend we set a flag that short-circuits any future attempts to call fdatasync(). The first failure returns the true errno, but the later short- circuited calls return a generic EIO. The latter is unhelpful because fdatasync() can return a variety of errnos, including EACCESS. Reviewed-by: Connor Kuehl <ckuehl@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>	2021-06-14 13:28:50 +01:00
Paolo Bonzini	7fa1c63553	iscsi: link libm into the module Depending on the configuration of QEMU, some binaries might not need libm at all. In that case libiscsi, which uses exp(), will fail to load. Link it in the module explicitly. Reported-by: Yi Sun <yisun@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-04 13:47:07 +02:00
Paolo Bonzini	96acfb1f25	meson: allow optional dependencies for block modules Right now all dependencies for block modules are passed to module_ss.add(when: ...), so they are mandatory. In the next patch we will need to add a libm dependency to a module, but libm does not exist on all systems. So, modify the creation of module_ss and modsrc so that dependencies can also be passed to module_ss.add(if_true: ...). While touching the array, remove the useless dependency of the curl module on glib. glib is always linked in QEMU and in fact all other block modules also need it, but they don't have to specify it. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-04 13:47:07 +02:00
Peter Maydell	8e6dad2028	Block layer patches - NBD server: Fix crashes related to switching between AioContexts - file-posix: Workaround for discard/write_zeroes on buggy filesystems - Follow-up fixes for the reopen vs. permission changes - quorum: Fix error handling for flush - block-copy: Refactor copy_range handling - docs: Describe how to use 'null-co' block driver -----BEGIN PGP SIGNATURE----- iQJFBAABCAAvFiEE3D3rFZqa+V09dFb+fwmycsiPL9YFAmC3iy8RHGt3b2xmQHJl ZGhhdC5jb20ACgkQfwmycsiPL9ZP0hAAuh07CFWLzHCcRC7PBzekSPfzRYYBLDSW EObJ1Ov4mvz8UZoP6BDJ5QVzLPhel6hXkxTd83B1D7t/Dq+yJYR0z8Kv3USpaVJ4 2U26SsoGQM8BmtVDL1Q8tQ5eDWQ4ykxNx6F2/lKBe1EH1lfaun04Xj1rNh7jpilo nNmKMDDI1UOkH0lKDR3tqfEV0XQE7o+ZKfPlIbvYMjXk9ZPKUHfjNPGVdCLQVnqH VJI01hF7eEx1ykSMdlC+TzNoVGG+mCBokGuW0JlUvOpX6FcGnAlxXQx8u53c1I8O lggZV8b2IbrNBUVwXQHLLrXxjOo+u54Ct9y/gXUTAj8qai+9jVRp60Y1pnCyeIeu DzFx10xwy04PGleb7AAZ4dT8du2+PTkuyQ9KmlvQ2U4IUcgW124CrDeO7XYr1aif hCTJPeEDrC9YNU6AQ8rLXrYUtkumSm2zUzU5nZ//i5WH41475/vsmgP5A+Jr457A Xu0yiI2Gqkr9CNsP9ZzMkNj03oIBhPFuGxiwibLQsy/6UVnaDYS0+rQ8FXYnF5+K iEpgXe3vKTWxM097kzJMBTDVRMXRa75NtK7KWXMDgVpHTbcv1t1otsn+6dfv+B55 ULJM1ETsyYS0T6BqNglvdytsraSt7JgSF+ZLHbYk3KVDshwnq/0ksgSqHNNA14ca kYTzHhMgo5w= =gSq/ -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging Block layer patches - NBD server: Fix crashes related to switching between AioContexts - file-posix: Workaround for discard/write_zeroes on buggy filesystems - Follow-up fixes for the reopen vs. permission changes - quorum: Fix error handling for flush - block-copy: Refactor copy_range handling - docs: Describe how to use 'null-co' block driver # gpg: Signature made Wed 02 Jun 2021 14:44:15 BST # gpg: using RSA key DC3DEB159A9AF95D3D7456FE7F09B272C88F2FD6 # gpg: issuer "kwolf@redhat.com" # gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full] # Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6 * remotes/kevin/tags/for-upstream: docs/secure-coding-practices: Describe how to use 'null-co' block driver block-copy: refactor copy_range handling block-copy: fix block_copy_task_entry() progress update nbd/server: Use drained block ops to quiesce the server block-backend: add drained_poll block: improve permission conflict error message block: simplify bdrv_child_user_desc() block/vvfat: inherit child_vvfat_qcow from child_of_bds block: improve bdrv_child_get_parent_desc() block-backend: improve blk_root_get_parent_desc() block: document child argument of bdrv_attach_child_common() block/file-posix: Try other fallbacks after invalid FALLOC_FL_ZERO_RANGE block/file-posix: Fix problem with fallocate(PUNCH_HOLE) on GPFS block: drop BlockBackendRootState::read_only block: drop BlockDriverState::read_only block: consistently use bdrv_is_read_only() block/vvfat: fix vvfat_child_perm crash block/vvfat: child_vvfat_qcow: add .get_parent_aio_context, fix crash qemu-io-cmds: assert that we don't have .perm requested in no-blk case block/quorum: Provide .bdrv_co_flush instead of .bdrv_co_flush_to_disk Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2021-06-02 19:34:03 +01:00
Vladimir Sementsov-Ogievskiy	bed9523471	block-copy: refactor copy_range handling Currently we update s->use_copy_range and s->copy_size in block_copy_do_copy(). It's not very good: 1. block_copy_do_copy() is intended to be a simple function, that wraps bdrv_co_<io> functions for need of block copy. That's why we don't pass BlockCopyTask into it. So, block_copy_do_copy() is bad place for manipulation with generic state of block-copy process 2. We are going to make block-copy thread-safe. So, it's good to move manipulation with state of block-copy to the places where we'll need critical sections anyway, to not introduce extra synchronization primitives in block_copy_do_copy(). Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210528141628.44287-3-vsementsov@virtuozzo.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-06-02 14:23:20 +02:00
Vladimir Sementsov-Ogievskiy	8146b357d0	block-copy: fix block_copy_task_entry() progress update Don't report successful progress on failure, when call_state->ret is set. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210528141628.44287-2-vsementsov@virtuozzo.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-06-02 14:23:20 +02:00
Sergio Lopez	095cc4d0f6	block-backend: add drained_poll Allow block backends to poll their devices/users to check if they have been quiesced when entering a drained section. This will be used in the next patch to wait for the NBD server to be completely quiesced. Suggested-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Sergio Lopez <slp@redhat.com> Message-Id: <20210602060552.17433-2-slp@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-06-02 14:23:20 +02:00
Vladimir Sementsov-Ogievskiy	8081f064e4	block/vvfat: inherit child_vvfat_qcow from child_of_bds Recently we've fixed a crash by adding .get_parent_aio_context handler to child_vvfat_qcow. Now we want it to support .get_parent_desc as well. child_vvfat_qcow wants to implement own .inherit_options, it's not bad. But omitting all other handlers is a bad idea. Let's inherit the class from child_of_bds instead, similar to chain_child_class and detach_by_driver_cb_class in test-bdrv-drain.c. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210601075218.79249-5-vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-06-02 14:23:20 +02:00
Vladimir Sementsov-Ogievskiy	fd240a184b	block-backend: improve blk_root_get_parent_desc() We have different types of parents: block nodes, block backends and jobs. So, it makes sense to specify type together with name. While being here also use g_autofree. iotest 307 output is updated. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-Id: <20210601075218.79249-3-vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-06-02 14:23:20 +02:00
Thomas Huth	fa95e9fbab	block/file-posix: Try other fallbacks after invalid FALLOC_FL_ZERO_RANGE If fallocate(... FALLOC_FL_ZERO_RANGE ...) returns EINVAL, it's likely an indication that the file system is buggy and does not implement unaligned accesses right. We still might be lucky with the other fallback fallocate() calls later in this function, though, so we should not return immediately and try the others first. Since FALLOC_FL_ZERO_RANGE could also return EINVAL if the file descriptor is not a regular file, we ignore this filesystem bug silently, without printing an error message for the user. Signed-off-by: Thomas Huth <thuth@redhat.com> Message-Id: <20210527172020.847617-3-thuth@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-06-02 14:23:20 +02:00
Thomas Huth	73ebf29729	block/file-posix: Fix problem with fallocate(PUNCH_HOLE) on GPFS A customer reported that running qemu-img convert -t none -O qcow2 -f qcow2 input.qcow2 output.qcow2 fails for them with the following error message when the images are stored on a GPFS file system : qemu-img: error while writing sector 0: Invalid argument After analyzing the strace output, it seems like the problem is in handle_aiocb_write_zeroes(): The call to fallocate(FALLOC_FL_PUNCH_HOLE) returns EINVAL, which can apparently happen if the file system has a different idea of the granularity of the operation. It's arguably a bug in GPFS, since the PUNCH_HOLE mode should not result in EINVAL according to the man-page of fallocate(), but the file system is out there in production and so we have to deal with it. In commit `294682cc3a` ("block: workaround for unaligned byte range in fallocate()") we also already applied the a work-around for the same problem to the earlier fallocate(FALLOC_FL_ZERO_RANGE) call, so do it now similar with the PUNCH_HOLE call. But instead of silently catching and returning -ENOTSUP (which causes the caller to fall back to writing zeroes), let's rather inform the user once about the buggy file system and try the other fallback instead. Signed-off-by: Thomas Huth <thuth@redhat.com> Message-Id: <20210527172020.847617-2-thuth@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-06-02 14:23:20 +02:00
Vladimir Sementsov-Ogievskiy	260242a833	block: drop BlockBackendRootState::read_only Instead of keeping additional boolean field, let's store the information in BDRV_O_RDWR bit of BlockBackendRootState::open_flags. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210527154056.70294-4-vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-06-02 14:23:20 +02:00
Vladimir Sementsov-Ogievskiy	307261b243	block: consistently use bdrv_is_read_only() It's better to use accessor function instead of bs->read_only directly. In some places use bdrv_is_writable() instead of checking both BDRV_O_RDWR set and BDRV_O_INACTIVE not set. In bdrv_open_common() it's a bit strange to add one more variable, but we are going to drop bs->read_only in the next patch, so new ro local variable substitutes it here. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210527154056.70294-2-vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-06-02 14:23:20 +02:00
Vladimir Sementsov-Ogievskiy	39df2c6d57	block/vvfat: fix vvfat_child_perm crash It's wrong to rely on s->qcow in vvfat_child_perm, as on permission update during bdrv_open_child() call this field is not set yet. Still prior to `aa5a04c7db`, it didn't crash, as bdrv_open_child passed NULL as child to bdrv_child_perm(), and NULL was equal to NULL in assertion (still, it was bad guarantee for child being s->qcow, not backing :). Since `aa5a04c7db` "add bdrv_attach_child_noperm" bdrv_refresh_perms called on parent node when attaching child, and new correct child pointer is passed to .bdrv_child_perm. Still, s->qcow is NULL at the moment. Let's rely only on role instead. Without that fix, ./build/qemu-system-x86_64 -usb -device usb-storage,drive=fat16 \ -drive \ file=fat:rw:fat-type=16:"<path of a host folder>",id=fat16,format=raw,if=none crashes: (gdb) bt 0 raise () at /lib64/libc.so.6 1 abort () at /lib64/libc.so.6 2 _nl_load_domain.cold () at /lib64/libc.so.6 3 annobin_assert.c_end () at /lib64/libc.so.6 4 vvfat_child_perm (bs=0x559186f3d690, c=0x559186f1ed20, role=3, reopen_queue=0x0, perm=0, shared=31, nperm=0x7ffe56f28298, nshared=0x7ffe56f282a0) at ../block/vvfat.c:3214 5 bdrv_child_perm (bs=0x559186f3d690, child_bs=0x559186f60190, c=0x559186f1ed20, role=3, reopen_queue=0x0, parent_perm=0, parent_shared=31, nperm=0x7ffe56f28298, nshared=0x7ffe56f282a0) at ../block.c:2094 6 bdrv_node_refresh_perm (bs=0x559186f3d690, q=0x0, tran=0x559186f65850, errp=0x7ffe56f28530) at ../block.c:2336 7 bdrv_list_refresh_perms (list=0x559186db5b90 = {...}, q=0x0, tran=0x559186f65850, errp=0x7ffe56f28530) at ../block.c:2358 8 bdrv_refresh_perms (bs=0x559186f3d690, errp=0x7ffe56f28530) at ../block.c:2419 9 bdrv_attach_child (parent_bs=0x559186f3d690, child_bs=0x559186f60190, child_name=0x559184d83e3d "write-target", child_class=0x5591852f3b00 <child_vvfat_qcow>, child_role=3, errp=0x7ffe56f28530) at ../block.c:2959 10 bdrv_open_child (filename=0x559186f5cb80 "/var/tmp/vl.7WYmFU", options=0x559186f66c20, bdref_key=0x559184d83e3d "write-target", parent=0x559186f3d690, child_class=0x5591852f3b00 <child_vvfat_qcow>, child_role=3, allow_none=false, errp=0x7ffe56f28530) at ../block.c:3351 11 enable_write_target (bs=0x559186f3d690, errp=0x7ffe56f28530) at ../block/vvfat.c:3177 12 vvfat_open (bs=0x559186f3d690, options=0x559186f42db0, flags=155650, errp=0x7ffe56f28530) at ../block/vvfat.c:1236 13 bdrv_open_driver (bs=0x559186f3d690, drv=0x5591853d97e0 <bdrv_vvfat>, node_name=0x0, options=0x559186f42db0, open_flags=155650, errp=0x7ffe56f28640) at ../block.c:1557 14 bdrv_open_common (bs=0x559186f3d690, file=0x0, options=0x559186f42db0, errp=0x7ffe56f28640) at ../block.c:1833 ... (gdb) fr 4 #4 vvfat_child_perm (bs=0x559186f3d690, c=0x559186f1ed20, role=3, reopen_queue=0x0, perm=0, shared=31, nperm=0x7ffe56f28298, nshared=0x7ffe56f282a0) at ../block/vvfat.c:3214 3214 assert(c == s->qcow \|\| (role & BDRV_CHILD_COW)); (gdb) p role $1 = 3 # BDRV_CHILD_DATA \| BDRV_CHILD_METADATA (gdb) p c $2 = {bs = 0x559186f60190, name = 0x559186f669d0 "write-target", klass = 0x5591852f3b00 <child_vvfat_qcow>, role = 3, opaque = 0x559186f3d690, perm = 3, shared_perm = 4, frozen = false, parent_quiesce_counter = 0, next = {le_next = 0x0, le_prev = 0x559186f41818}, next_parent = {le_next = 0x0, le_prev = 0x559186f64320}} (gdb) p s->qcow $3 = (BdrvChild ) 0x0 Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210524101257.119377-3-vsementsov@virtuozzo.com> Tested-by: John Arbuckle <programmingkidx@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-06-02 14:23:20 +02:00
Vladimir Sementsov-Ogievskiy	fb62b58896	block/vvfat: child_vvfat_qcow: add .get_parent_aio_context, fix crash Commit `3ca1f32257` "block: BdrvChildClass: add .get_parent_aio_context handler" introduced new handler and commit `228ca37e12` "block: drop ctx argument from bdrv_root_attach_child" made a generic use of it. But `3ca1f32257` didn't update child_vvfat_qcow. Fix that. Before that fix the command ./build/qemu-system-x86_64 -usb -device usb-storage,drive=fat16 \ -drive file=fat:rw:fat-type=16:"<path of a host folder>",id=fat16,format=raw,if=none crashes: 1 bdrv_child_get_parent_aio_context (c=0x559d62426d20) at ../block.c:1440 2 bdrv_attach_child_common (child_bs=0x559d62468190, child_name=0x559d606f9e3d "write-target", child_class=0x559d60c58d20 <child_vvfat_qcow>, child_role=3, perm=3, shared_perm=4, opaque=0x559d62445690, child=0x7ffc74c2acc8, tran=0x559d6246ddd0, errp=0x7ffc74c2ae60) at ../block.c:2795 3 bdrv_attach_child_noperm (parent_bs=0x559d62445690, child_bs=0x559d62468190, child_name=0x559d606f9e3d "write-target", child_class=0x559d60c58d20 <child_vvfat_qcow>, child_role=3, child=0x7ffc74c2acc8, tran=0x559d6246ddd0, errp=0x7ffc74c2ae60) at ../block.c:2855 4 bdrv_attach_child (parent_bs=0x559d62445690, child_bs=0x559d62468190, child_name=0x559d606f9e3d "write-target", child_class=0x559d60c58d20 <child_vvfat_qcow>, child_role=3, errp=0x7ffc74c2ae60) at ../block.c:2953 5 bdrv_open_child (filename=0x559d62464b80 "/var/tmp/vl.h3TIS4", options=0x559d6246ec20, bdref_key=0x559d606f9e3d "write-target", parent=0x559d62445690, child_class=0x559d60c58d20 <child_vvfat_qcow>, child_role=3, allow_none=false, errp=0x7ffc74c2ae60) at ../block.c:3351 6 enable_write_target (bs=0x559d62445690, errp=0x7ffc74c2ae60) at ../block/vvfat.c:3176 7 vvfat_open (bs=0x559d62445690, options=0x559d6244adb0, flags=155650, errp=0x7ffc74c2ae60) at ../block/vvfat.c:1236 8 bdrv_open_driver (bs=0x559d62445690, drv=0x559d60d4f7e0 <bdrv_vvfat>, node_name=0x0, options=0x559d6244adb0, open_flags=155650, errp=0x7ffc74c2af70) at ../block.c:1557 9 bdrv_open_common (bs=0x559d62445690, file=0x0, options=0x559d6244adb0, errp=0x7ffc74c2af70) at ... (gdb) fr 1 #1 0x0000559d603ea3bf in bdrv_child_get_parent_aio_context (c=0x559d62426d20) at ../block.c:1440 1440 return c->klass->get_parent_aio_context(c); (gdb) p c->klass $1 = (const BdrvChildClass ) 0x559d60c58d20 <child_vvfat_qcow> (gdb) p c->klass->get_parent_aio_context $2 = (AioContext ()(BdrvChild )) 0x0 Fixes: `3ca1f32257` Fixes: `228ca37e12` Reported-by: John Arbuckle <programmingkidx@gmail.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210524101257.119377-2-vsementsov@virtuozzo.com> Tested-by: John Arbuckle <programmingkidx@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-06-02 14:23:20 +02:00
Lukas Straub	5529b02da2	block/quorum: Provide .bdrv_co_flush instead of .bdrv_co_flush_to_disk The quorum block driver uses a custom flush callback to handle the case when some children return io errors. In that case it still returns success if enough children are healthy. However, it provides it as the .bdrv_co_flush_to_disk callback, not as .bdrv_co_flush. This causes the block layer to do it's own generic flushing for the children instead, which doesn't handle errors properly. Fix this by providing .bdrv_co_flush instead of .bdrv_co_flush_to_disk so the block layer uses the custom flush callback. Signed-off-by: Lukas Straub <lukasstraub2@web.de> Reported-by: Minghao Yuan <meeho@qq.com> Message-Id: <20210518134214.11ccf05f@gecko.fritz.box> Tested-by: Zhang Chen <chen.zhang@intel.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-06-02 14:23:20 +02:00
Thomas Huth	b4c10fc6fe	block/ssh: Bump minimum libssh version to 0.8.7 It has been over two years since RHEL-8 was released, and thus per the platform build policy, we no longer need to support RHEL-7 as a build target. So from the RHEL-7 perspective, we do not have to support libssh v0.7 anymore now. Let's look at the versions from other distributions and operating systems - according to repology.org, current shipping versions are: RHEL-8: 0.9.4 Debian Buster: 0.8.7 openSUSE Leap 15.2: 0.8.7 Ubuntu LTS 18.04: 0.8.0 * Ubuntu LTS 20.04: 0.9.3 FreeBSD: 0.9.5 Fedora 33: 0.9.5 Fedora 34: 0.9.5 OpenBSD: 0.9.5 macOS HomeBrew: 0.9.5 HaikuPorts: 0.9.5 * The version of libssh in Ubuntu 18.04 claims to be 0.8.0 from the name of the package, but in reality it is a 0.7 patched up as a Frankenstein monster with patches from the 0.8 development branch. This gave us some headaches in the past already and so it never worked with QEMU. All attempts to get it supported have failed in the past, patches for QEMU have never been merged and a request to Ubuntu to fix it in their 18.04 distro has been ignored: https://bugs.launchpad.net/ubuntu/+source/libssh/+bug/1847514 Thus we really should ignore the libssh in Ubuntu 18.04 in QEMU, too. Fix it by bumping the minimum libssh version to something that is greater than 0.8.0 now. Debian Buster and openSUSE Leap have the oldest version and so 0.8.7 is the new minimum. Signed-off-by: Thomas Huth <thuth@redhat.com> Tested-by: Richard W.M. Jones <rjones@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Acked-by: Richard W.M. Jones <rjones@redhat.com> Message-Id: <20210519155859.344569-1-thuth@redhat.com> Signed-off-by: Thomas Huth <thuth@redhat.com>	2021-06-02 06:51:09 +02:00
Stefano Garzarella	d0fb9657a3	docs: fix references to docs/devel/tracing.rst Commit `e50caf4a5c` ("tracing: convert documentation to rST") converted docs/devel/tracing.txt to docs/devel/tracing.rst. We still have several references to the old file, so let's fix them with the following command: sed -i s/tracing.txt/tracing.rst/ $(git grep -l docs/devel/tracing.txt) Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20210517151702.109066-2-sgarzare@redhat.com> Signed-off-by: Thomas Huth <thuth@redhat.com>	2021-06-02 06:51:09 +02:00
Paolo Bonzini	b02629550d	replication: move include out of root directory The replication.h file is included from migration/colo.c and tests/unit/test-replication.c, so it should be in include/. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-05-26 14:49:46 +02:00
Paolo Bonzini	29a6ea24eb	coroutine-sleep: replace QemuCoSleepState pointer with struct in the API Right now, users of qemu_co_sleep_ns_wakeable are simply passing a pointer to QemuCoSleepState by reference to the function. But QemuCoSleepState really is just a Coroutine*; making the content of the struct public is just as efficient and lets us skip the user_state_pointer indirection. Since the usage is changed, take the occasion to rename the struct to QemuCoSleep. Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 20210517100548.28806-6-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2021-05-21 18:22:33 +01:00
Paolo Bonzini	eaee072085	coroutine-sleep: allow qemu_co_sleep_wake that wakes nothing All callers of qemu_co_sleep_wake are checking whether they are passing a NULL argument inside the pointer-to-pointer: do the check in qemu_co_sleep_wake itself. As a side effect, qemu_co_sleep_wake can be called more than once and it will only wake the coroutine once; after the first time, the argument will be set to NULL via sleep_state->user_state_pointer. However, this would not be safe unless co_sleep_cb keeps using the QemuCoSleepState directly, so make it go through the pointer-to-pointer instead. Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 20210517100548.28806-4-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2021-05-21 18:22:33 +01:00
Stefan Hajnoczi	1b0b2e6d06	block/export: improve vu_blk_sect_range_ok() The checks in vu_blk_sect_range_ok() assume VIRTIO_BLK_SECTOR_SIZE is equal to BDRV_SECTOR_SIZE. This is true, but let's add a QEMU_BUILD_BUG_ON() to make it explicit. We might as well check that the request buffer size is a multiple of VIRTIO_BLK_SECTOR_SIZE while we're at it. Suggested-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20210331142727.391465-1-stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-05-18 11:08:13 +02:00
Vladimir Sementsov-Ogievskiy	38b4409647	qcow2: set bdi->is_dirty Set bdi->is_dirty, so that qemu-img info could show dirty flag. After this commit the following check will show '"dirty-flag": true': ./build/qemu-img create -f qcow2 -o lazy_refcounts=on x 1M ./build/qemu-io x qemu-io> write 0 1M After "write" command success, kill the qemu-io process: kill -9 <qemu-io pid> ./build/qemu-img info --output=json x This will show '"dirty-flag": true' among other things. (before this commit it shows '"dirty-flag": false') Note, that qcow2's dirty-bit is not a "dirty bit for the image". It only protects qcow2 lazy refcounts feature. So, there are a lot of conditions when qcow2 session may be not closed correctly, but bit is 0. Still, when bit is set, the last session is definitely not finished correctly and it's better to report it. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210504160656.462836-1-vsementsov@virtuozzo.com> Tested-by: Kirill Tkhai <ktkhai@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-05-18 11:08:13 +02:00
Vladimir Sementsov-Ogievskiy	c61ebf362d	write-threshold: deal with includes "qemu/typedefs.h" is enough for include/block/write-threshold.h header with forward declaration of BlockDriverState. Also drop extra includes from block/write-threshold.c and tests/unit/test-write-threshold.c Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210506090621.11848-9-vsementsov@virtuozzo.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2021-05-14 16:14:10 +02:00
Vladimir Sementsov-Ogievskiy	2e0e9cbd89	block/write-threshold: drop extra APIs bdrv_write_threshold_exceeded() is unused. bdrv_write_threshold_is_set() is used only to double check the value of bs->write_threshold_offset in tests. No real sense in it (both tests do check real value with help of bdrv_write_threshold_get()) Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210506090621.11848-5-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> [mreitz: Adjusted commit message as per Eric's suggestion] Signed-off-by: Max Reitz <mreitz@redhat.com>	2021-05-14 16:14:10 +02:00
Vladimir Sementsov-Ogievskiy	ad578c56d5	block: drop write notifiers They are unused now. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210506090621.11848-3-vsementsov@virtuozzo.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2021-05-14 16:14:10 +02:00
Vladimir Sementsov-Ogievskiy	94783301b8	block/write-threshold: don't use write notifiers write-notifiers are used only for write-threshold. New code for such purpose should create filters. Let's better special-case write-threshold and drop write notifiers at all. (Actually, write-threshold is special-cased anyway, as the only user of write-notifiers) So, create a new direct interface for bdrv_co_write_req_prepare() and drop all write-notifier related logic from write-threshold.c. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210506090621.11848-2-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> [mreitz: Adjusted comment as per Eric's suggestion] Signed-off-by: Max Reitz <mreitz@redhat.com>	2021-05-14 16:14:10 +02:00
Vladimir Sementsov-Ogievskiy	bcc8584c83	block/copy-on-read: use bdrv_drop_filter() and drop s->active Now, after huge update of block graph permission update algorithm, we don't need this workaround with active state of the filter. Drop it and use new smart bdrv_drop_filter() function. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210506194143.394141-1-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2021-05-14 16:14:10 +02:00
Vladimir Sementsov-Ogievskiy	9c785cd714	mirror: stop cancelling in-flight requests on non-force cancel in READY If mirror is READY than cancel operation is not discarding the whole result of the operation, but instead it's a documented way get a point-in-time snapshot of source disk. So, we should not cancel any requests if mirror is READ and force=false. Let's fix that case. Note, that bug that we have before this commit is not critical, as the only .bdrv_cancel_in_flight implementation is nbd_cancel_in_flight() and it cancels only requests waiting for reconnection, so it should be rare case. Fixes: `521ff8b779` Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210421075858.40197-1-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2021-05-14 16:14:10 +02:00
Vladimir Sementsov-Ogievskiy	78632a3d16	monitor: hmp_qemu_io: acquire aio contex, fix crash Max reported the following bug: $ ./qemu-img create -f raw src.img 1G $ ./qemu-img create -f raw dst.img 1G $ (echo ' {"execute":"qmp_capabilities"} {"execute":"blockdev-mirror", "arguments":{"job-id":"mirror", "device":"source", "target":"target", "sync":"full", "filter-node-name":"mirror-top"}} '; sleep 3; echo ' {"execute":"human-monitor-command", "arguments":{"command-line": "qemu-io mirror-top \"write 0 1G\""}}') \ \| x86_64-softmmu/qemu-system-x86_64 \ -qmp stdio \ -blockdev file,node-name=source,filename=src.img \ -blockdev file,node-name=target,filename=dst.img \ -object iothread,id=iothr0 \ -device virtio-blk,drive=source,iothread=iothr0 crashes: 0 raise () at /usr/lib/libc.so.6 1 abort () at /usr/lib/libc.so.6 2 error_exit (err=<optimized out>, msg=msg@entry=0x55fbb1634790 <__func__.27> "qemu_mutex_unlock_impl") at ../util/qemu-thread-posix.c:37 3 qemu_mutex_unlock_impl (mutex=mutex@entry=0x55fbb25ab6e0, file=file@entry=0x55fbb1636957 "../util/async.c", line=line@entry=650) at ../util/qemu-thread-posix.c:109 4 aio_context_release (ctx=ctx@entry=0x55fbb25ab680) at ../util/async.c:650 5 bdrv_do_drained_begin (bs=bs@entry=0x55fbb3a87000, recursive=recursive@entry=false, parent=parent@entry=0x0, ignore_bds_parents=ignore_bds_parents@entry=false, poll=poll@entry=true) at ../block/io.c:441 6 bdrv_do_drained_begin (poll=true, ignore_bds_parents=false, parent=0x0, recursive=false, bs=0x55fbb3a87000) at ../block/io.c:448 7 blk_drain (blk=0x55fbb26c5a00) at ../block/block-backend.c:1718 8 blk_unref (blk=0x55fbb26c5a00) at ../block/block-backend.c:498 9 blk_unref (blk=0x55fbb26c5a00) at ../block/block-backend.c:491 10 hmp_qemu_io (mon=0x7fffaf3fc7d0, qdict=<optimized out>) at ../block/monitor/block-hmp-cmds.c:628 man pthread_mutex_unlock ... EPERM The mutex type is PTHREAD_MUTEX_ERRORCHECK or PTHREAD_MUTEX_RECURSIVE, or the mutex is a robust mutex, and the current thread does not own the mutex. So, thread doesn't own the mutex. And we have iothread here. Next, note that AIO_WAIT_WHILE() documents that ctx must be acquired exactly once by caller. But where is it acquired in the call stack? Seems nowhere. qemuio_command do acquire aio context.. But we need context acquired around blk_unref() as well and actually around blk_insert_bs() too. Let's refactor qemuio_command so that it doesn't acquire aio context but callers do that instead. This way we can cleanly acquire aio context in hmp_qemu_io() around all three calls. Reported-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210423134233.51495-1-vsementsov@virtuozzo.com> [mreitz: Fixed comment] Signed-off-by: Max Reitz <mreitz@redhat.com>	2021-05-14 16:14:10 +02:00
Connor Kuehl	2b99cfce08	block/rbd: Add an escape-aware strchr helper Sometimes the parser needs to further split a token it has collected from the token input stream. Right now, it does a cursory check to see if the relevant characters appear in the token to determine if it should break it down further. However, qemu_rbd_next_tok() will escape characters as it removes tokens from the token stream and plain strchr() won't. This can make the initial strchr() check slightly misleading since it implies qemu_rbd_next_tok() will find the token and split on it, except the reality is that qemu_rbd_next_tok() will pass over it if it is escaped. Use a custom strchr to avoid mixing escaped and unescaped string operations. Furthermore, this code is identical to how qemu_rbd_next_tok() seeks its next token, so incorporate this custom strchr into the body of that function to reduce duplication. Reported-by: Han Han <hhan@redhat.com> Fixes: https://bugzilla.redhat.com/1873913 Signed-off-by: Connor Kuehl <ckuehl@redhat.com> Message-Id: <20210421212343.85524-3-ckuehl@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2021-05-14 16:14:10 +02:00
Markus Armbruster	09ec85176e	block: Drop the sheepdog block driver It was deprecated in commit `e1c4269763`, v5.2.0. See that commit message for rationale. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20210501075747.3293186-1-armbru@redhat.com> ACKed-by: Peter Krempa <pkrempa@redhat.com>	2021-05-12 17:42:23 +02:00
Peter Maydell	4cc10cae64	* NetBSD NVMM support * RateLimit mutex * Prepare for Meson 0.57 upgrade -----BEGIN PGP SIGNATURE----- iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmCROukUHHBib256aW5p QHJlZGhhdC5jb20ACgkQv/vSX3jHroOFXgf/ThwuBCbwC6pwoHpZzFXHdJRXIqHa iKTqjCLymz9NQBRTaMeG5CWjXl4o9syHLzEXLQxuQaynHK8AjbyeMSllBVLzBUme TU9AY3qwLShRJm3XGXkuUilFE+IR8FXWFgrTOsZXgbT+JQlkCgiuhCRqfAcDEgi/ F5SNqlMzPNvF6G0FY9DFBBkoKF4YWROx25SgNl3fxgWwC94px/a22BXTVpOxaClZ HE/H+kbJH5sD2dOJR5cqbgFg7eBemNdxO3tSbR6WoP9pcvVPx0Dgh5hUJb5+pUXY fV5O5zZ+CdyNjWM4yAHg0y8kOlnqrLwv7pH+NdqWFaWiZ9uCSrVFR13ejQ== =sKO4 -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/bonzini-gitlab/tags/for-upstream' into staging * NetBSD NVMM support * RateLimit mutex * Prepare for Meson 0.57 upgrade # gpg: Signature made Tue 04 May 2021 13:15:37 BST # gpg: using RSA key F13338574B662389866C7682BFFBD25F78C7AE83 # gpg: issuer "pbonzini@redhat.com" # gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [full] # gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" [full] # Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1 # Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83 * remotes/bonzini-gitlab/tags/for-upstream: glib-compat: accept G_TEST_SLOW environment variable gitlab-ci: use --meson=internal for CFI jobs configure: handle meson options that have changed type configure: reindent meson invocation slirp: add configure option to disable smbd ratelimit: protect with a mutex Add NVMM Accelerator: add maintainers for NetBSD/NVMM Add NVMM accelerator: acceleration enlightenments Add NVMM accelerator: x86 CPU support Add NVMM accelerator: configure and build logic oslib-win32: do not rely on macro to get redefined function name Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2021-05-06 18:56:17 +01:00
Paolo Bonzini	4951967d84	ratelimit: protect with a mutex Right now, rate limiting is protected by the AioContext mutex, which is taken for example both by the block jobs and by qmp_block_job_set_speed (via find_block_job). We would like to remove the dependency of block layer code on the AioContext mutex, since most drivers and the core I/O code are already not relying on it. However, there is no existing lock that can easily be taken by both ratelimit_set_speed and ratelimit_calculate_delay, especially because the latter might run in coroutine context (and therefore under a CoMutex) but the former will not. Since concurrent calls to ratelimit_calculate_delay are not possible, one idea could be to use a seqlock to get a snapshot of slice_ns and slice_quota. But for now keep it simple, and just add a mutex to the RateLimit struct; block jobs are generally not performance critical to the point of optimizing the clock cycles spent in synchronization. This also requires the introduction of init/destroy functions, so add them to the two users of ratelimit.h. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-05-04 14:15:35 +02:00
Thomas Huth	4c386f8064	Do not include sysemu/sysemu.h if it's not really necessary Stop including sysemu/sysemu.h in files that don't need it. Signed-off-by: Thomas Huth <thuth@redhat.com> Message-Id: <20210416171314.2074665-2-thuth@redhat.com> Signed-off-by: Laurent Vivier <laurent@vivier.eu>	2021-05-02 17:24:50 +02:00
Kevin Wolf	35b7f4abd5	block: Add BDRV_O_NO_SHARE for blk_new_open() Normally, blk_new_open() just shares all permissions. This was fine originally when permissions only protected against uses in the same process because no other part of the code would actually get to access the block nodes opened with blk_new_open(). However, since we use it for file locking now, unsharing permissions becomes desirable. Add a new BDRV_O_NO_SHARE flag that is used in blk_new_open() to unshare any permissions that can be unshared. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20210422164344.283389-2-kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-04-30 12:27:48 +02:00
Vladimir Sementsov-Ogievskiy	72373e40fb	block: bdrv_reopen_multiple: refresh permissions on updated graph Move bdrv_reopen_multiple to new paradigm of permission update: first update graph relations, then do refresh the permissions. We have to modify reopen process in file-posix driver: with new scheme we don't have prepared permissions in raw_reopen_prepare(), so we should reconfigure fd in raw_check_perm(). Still this seems more native and simple anyway. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20210428151804.439460-31-vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-04-30 12:27:48 +02:00
Vladimir Sementsov-Ogievskiy	1e4c797c75	block: make bdrv_refresh_limits() to be a transaction action To be used in further commit. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20210428151804.439460-28-vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-04-30 12:27:48 +02:00
Vladimir Sementsov-Ogievskiy	b75d64b329	block/backup-top: drop .active We don't need this workaround anymore: bdrv_append is already smart enough and we can use new bdrv_drop_filter(). This commit efficiently reverts also recent `705dde27c6`, which checked .active on io path. Still it said that the problem should be theoretical. And the logic of filter removement is changed anyway. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20210428151804.439460-25-vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-04-30 12:27:48 +02:00
Vladimir Sementsov-Ogievskiy	228ca37e12	block: drop ctx argument from bdrv_root_attach_child Passing parent aio context is redundant, as child_class and parent opaque pointer are enough to retrieve it. Drop the argument and use new bdrv_child_get_parent_aio_context() interface. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20210428151804.439460-7-vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-04-30 12:27:47 +02:00
Vladimir Sementsov-Ogievskiy	3ca1f32257	block: BdrvChildClass: add .get_parent_aio_context handler Add new handler to get aio context and implement it in all child classes. Add corresponding public interface to be used soon. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20210428151804.439460-6-vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-04-30 12:27:47 +02:00
Vladimir Sementsov-Ogievskiy	ae9d441706	block: bdrv_append(): don't consume reference We have too much comments for this feature. It seems better just don't do it. Most of real users (tests don't count) have to create additional reference. Drop also comment in external_snapshot_prepare: - bdrv_append doesn't "remove" old bs in common sense, it sounds strange - the fact that bdrv_append can fail is obvious from the context - the fact that we must rollback all changes in transaction abort is known (it's the direct role of abort) Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20210428151804.439460-5-vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-04-30 12:27:47 +02:00
Vladimir Sementsov-Ogievskiy	0267101af6	block/nbd: fix possible use after free of s->connect_thread If on nbd_close() we detach the thread (in nbd_co_establish_connection_cancel() thr->state becomes CONNECT_THREAD_RUNNING_DETACHED), after that point we should not use s->connect_thread (which is set to NULL), as running thread may free it at any time. Still nbd_co_establish_connection() does exactly this: it saves s->connect_thread to local variable (just for better code style) and use it even after yield point, when thread may be already detached. Fix that. Also check thr to be non-NULL on nbd_co_establish_connection() start for safety. After this patch "case CONNECT_THREAD_RUNNING_DETACHED" becomes impossible in the second switch in nbd_co_establish_connection(). Still, don't add extra abort() just before the release. If it somehow possible to reach this "case:" it won't hurt. Anyway, good refactoring of all this reconnect mess will come soon. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210406155114.1057355-1-vsementsov@virtuozzo.com> Reviewed-by: Roman Kagan <rvkagan@yandex-team.ru> Signed-off-by: Max Reitz <mreitz@redhat.com>	2021-04-13 15:35:12 +02:00
Max Reitz	00769414cd	mirror: Do not enter a paused job on completion Currently, it is impossible to complete jobs on standby (i.e. paused ready jobs), but actually the only thing in mirror_complete() that does not work quite well with a paused job is the job_enter() at the end. If we make it conditional, this function works just fine even if the mirror job is paused. So technically this is a no-op, but obviously the intention is to accept block-job-complete even for jobs on standby, which we need this patch for first. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210409120422.144040-3-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-04-09 18:00:29 +02:00
Max Reitz	c41f5b96ee	mirror: Move open_backing_file to exit_common This is a graph change and therefore should be done in job-finalize (which is what invokes mirror_exit_common()). Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210409120422.144040-2-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-04-09 18:00:29 +02:00
Stefano Garzarella	b084b420d9	block/rbd: fix memory leak in qemu_rbd_co_create_opts() When we allocate 'q_namespace', we forgot to set 'has_q_namespace' to true. This can cause several issues, including a memory leak, since qapi_free_BlockdevCreateOptions() does not deallocate that memory, as reported by valgrind: 13 bytes in 1 blocks are definitely lost in loss record 7 of 96 at 0x4839809: malloc (vg_replace_malloc.c:307) by 0x48CEBB8: g_malloc (in /usr/lib64/libglib-2.0.so.0.6600.8) by 0x48E3FE3: g_strdup (in /usr/lib64/libglib-2.0.so.0.6600.8) by 0x180010: qemu_rbd_co_create_opts (rbd.c:446) by 0x1AE72C: bdrv_create_co_entry (block.c:492) by 0x241902: coroutine_trampoline (coroutine-ucontext.c:173) by 0x57530AF: ??? (in /usr/lib64/libc-2.32.so) by 0x1FFEFFFA6F: ??? Fix setting 'has_q_namespace' to true when we allocate 'q_namespace'. Fixes: `19ae9ae014` ("block/rbd: Add support for ceph namespaces") Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-Id: <20210329150129.121182-3-sgarzare@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-04-09 18:00:29 +02:00
Stefano Garzarella	c1c1f6cf51	block/rbd: fix memory leak in qemu_rbd_connect() In qemu_rbd_connect(), 'mon_host' is allocated by qemu_rbd_mon_host() using g_strjoinv(), but it's only freed in the error path, leaking memory in the success path as reported by valgrind: 80 bytes in 4 blocks are definitely lost in loss record 5,028 of 6,516 at 0x4839809: malloc (vg_replace_malloc.c:307) by 0x5315BB8: g_malloc (in /usr/lib64/libglib-2.0.so.0.6600.8) by 0x532B6FF: g_strjoinv (in /usr/lib64/libglib-2.0.so.0.6600.8) by 0x87D07E: qemu_rbd_mon_host (rbd.c:538) by 0x87D07E: qemu_rbd_connect (rbd.c:562) by 0x87E1CE: qemu_rbd_open (rbd.c:740) by 0x840EB1: bdrv_open_driver (block.c:1528) by 0x8453A9: bdrv_open_common (block.c:1802) by 0x8453A9: bdrv_open_inherit (block.c:3444) by 0x8464C2: bdrv_open (block.c:3537) by 0x8108CD: qmp_blockdev_add (blockdev.c:3569) by 0x8EA61B: qmp_marshal_blockdev_add (qapi-commands-block-core.c:1086) by 0x90B528: do_qmp_dispatch_bh (qmp-dispatch.c:131) by 0x907EA4: aio_bh_poll (async.c:164) Fix freeing 'mon_host' also when qemu_rbd_connect() ends correctly. Fixes: `0a55679b4a` Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-Id: <20210329150129.121182-2-sgarzare@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-04-09 18:00:29 +02:00
David Edmondson	07ee2ab4fd	block/vdi: Don't assume that blocks are larger than VdiHeader Given that the block size is read from the header of the VDI file, a wide variety of sizes might be seen. Rather than re-using a block sized memory region when writing the VDI header, allocate an appropriately sized buffer. Signed-off-by: David Edmondson <david.edmondson@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Max Reitz <mreitz@redhat.com> Message-id: 20210325112941.365238-3-pbonzini@redhat.com Message-Id: <20210309144015.557477-3-david.edmondson@oracle.com> Acked-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2021-03-31 10:44:21 +01:00
David Edmondson	574b8304cf	block/vdi: When writing new bmap entry fails, don't leak the buffer If a new bitmap entry is allocated, requiring the entire block to be written, avoiding leaking the buffer allocated for the block should the write fail. Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: David Edmondson <david.edmondson@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Max Reitz <mreitz@redhat.com> Message-id: 20210325112941.365238-2-pbonzini@redhat.com Message-Id: <20210309144015.557477-2-david.edmondson@oracle.com> Acked-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2021-03-31 10:44:21 +01:00
Max Reitz	484108293d	qcow2: Force preallocation with data-file-raw Setting the qcow2 data-file-raw bit means that you can ignore the qcow2 metadata when reading from the external data file. It does not mean that you have to ignore it, though. Therefore, the data read must be the same regardless of whether you interpret the metadata or whether you ignore it, and thus the L1/L2 tables must all be present and give a 1:1 mapping. This patch changes 244's output: First, the qcow2 file is larger right after creation, because of metadata preallocation. Second, the qemu-img map output changes: Everything that was not explicitly discarded or zeroed is now a data area. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210326145509.163455-2-mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2021-03-30 13:02:10 +02:00
Max Reitz	53431b9086	block/mirror: Fix mirror_top's permissions mirror_top currently shares all permissions, and takes only the WRITE permission (if some parent has taken that permission, too). That is wrong, though; mirror_top is a filter, so it should take permissions like any other filter does. For example, if the parent needs CONSISTENT_READ, we need to take that, too, and if it cannot share the WRITE permission, we cannot share it either. The exception is when mirror_top is used for active commit, where we cannot take CONSISTENT_READ (because it is deliberately unshared above the base node) and where we must share WRITE (so that it is shared for all images in the backing chain, so the mirror job can take it for the target BB). Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210211172242.146671-2-mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>	2021-03-29 18:09:00 +02:00
Pavel Dovgalyuk	ad0ce64279	qcow2: use external virtual timers Regular virtual timers are used to emulate timings related to vCPU and peripheral states. QCOW2 uses timers to clean the cache. These timers should have external flag. In the opposite case they affect the execution and it can't be recorded and replayed. This patch adds external flag to the timer for qcow2 cache clean. Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgalyuk@ispras.ru> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <161700516327.1141158.8366564693714562536.stgit@pasha-ThinkPad-X280> Signed-off-by: Max Reitz <mreitz@redhat.com>	2021-03-29 18:04:29 +02:00
Markus Armbruster	bdabafc683	block: Remove monitor command block_passwd Command block_passwd always fails since Commit `c01c214b69` "block: remove all encryption handling APIs" (v2.10.0) turned block_passwd into a stub that always fails, and hardcoded encryption_key_missing to false in query-named-block-nodes and query-block. Commit `ad1324e044` "block: remove 'encryption_key_missing' flag from QAPI" just landed. Complete the cleanup job: remove block_passwd. Cc: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20210323101951.3686029-1-armbru@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>	2021-03-23 22:31:56 +01:00
Stefan Hajnoczi	6f4b1996b4	block/export: disable VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD for now The vhost-user in-flight shmfd feature has not been tested with qemu-storage-daemon's vhost-user-blk server. Disable this optional feature for now because it requires MFD_ALLOW_SEALING, which is not available in some CI environments. If we need this feature in the future it can be re-enabled after testing. Reported-by: Peter Maydell <peter.maydell@linaro.org> Cc: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20210309094106.196911-2-stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-03-19 10:15:06 +01:00

1 2 3 4 5 ...

5334 Commits