This patch contains test which minimally tests write-zeroes on top of
working discard.
The following checks are added:
* write 2 clusters, write-zero to the first allocated cluster
* write 2 cluster, write-zero to the half the first allocated cluster
Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
The zero flag is missed in the Parallels format specification. We can
resort to discard if we have no backing file.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
This patch contains test which minimally tests discard and new cluster
allocation logic.
The following checks are added:
* write 2 clusters, discard the first allocated
* write another cluster, check that the hole is filled
* write 2 clusters, discard the first allocated, write 1 cluster at
non-aligned to cluster offset (2 new clusters should be allocated)
Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
* Discarding with backing stores is not supported by the format.
* There is no buffering/queueing of the discard operation.
* Only operations aligned to the cluster are supported.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
Replace 'space' representing the amount of data to preallocate with
'bytes'.
Rationale:
* 'space' at each place is converted to bytes
* the unit is more close to the variable name
Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
The access to the bitmap is not optimized completely.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
We should extend the bitmap if the file is extended and set the bit in
the image used bitmap once the cluster is allocated. Sanity check at
that moment also looks like a good idea.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
This would be useful in the next patch in allocate_clusters(). This
change would not imply serious performance drawbacks as usually image
is full of data or are at the end of the bitmap.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
The test is quite similar with the original one for duplicated clusters.
There is the only difference in the operation which should fix the
image.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
Original check is broken as supposed reading from 2 different clusters
results in read from the same file offset twice. This is definitely
wrong.
We should be sure that
* the content of both clusters is correct after repair
* clusters are at the different offsets after repair
In order to check the latter we write some content into the first one
and validate that fact.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
If the operation is failed, we need to check image consistency if the
problem is not about memory allocation.
Bitmap adjustments in allocate_cluster are not performed yet.
They worth to be separate. This was proven useful during debug of this
series. Kept as is for future bissecting.
It should be specifically noted that used bitmap must be recalculated
if data_off has been fixed during image consistency check.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
We have only check through self-repair and that proven to be not enough.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
Once we have repaired data_off field in the header we should update
s->data_start which is calculated on the base of it.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
Since
commit cfce1091d5
Author: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
Date: Tue Jul 18 12:44:29 2023 +0200
parallels: Image repairing in parallels_open()
there is a potential pit fall with calling
qemu-io -c "read"
The image is opened in read-write mode and thus could be potentially
repaired. This could ruin testing process.
The patch forces read-only opening for reads. In that case repairing
is impossible.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
This functionality is used twice already and next patch will add more
code with it.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
More conditions follows thus the check should be more scalable.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
At the beginning of the function we can return immediately until we
really allocate s->header.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
We do not need to perform any deallocation/cleanup if wrong format is
detected.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
This patch creates above mentioned helper and moves its usage to the
beginning of parallels_open(). This simplifies parallels_open() a bit.
The patch also ensures that we store prealloc_size on block driver state
always in sectors. This makes code cleaner and avoids wrong opinion at
the assignment that the value is in bytes.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
We should free opts allocated through qemu_opts_create() at the end.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
Parallels driver indeed support Parallels Dirty Bitmap Feature in
read-only mode. The patch adds bdrv_supports_persistent_dirty_bitmap()
callback which always return 1 to indicate that.
This will allow to copy CBT from Parallels image with qemu-img.
Note: read-write support is signalled through
bdrv_co_can_store_new_dirty_bitmap() and is different.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
Old code is ugly and contains tabulations. There are no functional
changes in this patch.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
All 32-bit hppa CPUs allow a fixed number of TLB entries to have a
different page size than the default 4k.
Those are called "Block-TLBs" and are created at startup by the
operating system and managed by the firmware of hppa machines
through the firmware PDC_BLOCK_TLB call.
This patchset adds the necessary glue to SeaBIOS-hppa and
qemu to allow up to 16 BTLB entries in the emulation.
Two patches from Mikulas Patocka fix signal delivery issues
in linux-user on hppa.
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQS86RI+GtKfB8BJu973ErUQojoPXwUCZQnz0wAKCRD3ErUQojoP
X6NDAP9F1Huhceot8peohGodRDOhnXWfDcjQZSDvadieKv/rJQEA60Z5QV5VlQgw
SyUT4AcoiB7N4nvS+iDa+6dKfRH/YQM=
=kqqt
-----END PGP SIGNATURE-----
Merge tag 'hppa-btlb-pull-request' of https://github.com/hdeller/qemu-hppa into staging
Block-TLB support and linux-user fixes for hppa target
All 32-bit hppa CPUs allow a fixed number of TLB entries to have a
different page size than the default 4k.
Those are called "Block-TLBs" and are created at startup by the
operating system and managed by the firmware of hppa machines
through the firmware PDC_BLOCK_TLB call.
This patchset adds the necessary glue to SeaBIOS-hppa and
qemu to allow up to 16 BTLB entries in the emulation.
Two patches from Mikulas Patocka fix signal delivery issues
in linux-user on hppa.
# -----BEGIN PGP SIGNATURE-----
#
# iHUEABYKAB0WIQS86RI+GtKfB8BJu973ErUQojoPXwUCZQnz0wAKCRD3ErUQojoP
# X6NDAP9F1Huhceot8peohGodRDOhnXWfDcjQZSDvadieKv/rJQEA60Z5QV5VlQgw
# SyUT4AcoiB7N4nvS+iDa+6dKfRH/YQM=
# =kqqt
# -----END PGP SIGNATURE-----
# gpg: Signature made Tue 19 Sep 2023 15:17:39 EDT
# gpg: using EDDSA key BCE9123E1AD29F07C049BBDEF712B510A23A0F5F
# gpg: Good signature from "Helge Deller <deller@gmx.de>" [unknown]
# gpg: aka "Helge Deller <deller@kernel.org>" [unknown]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 4544 8228 2CD9 10DB EF3D 25F8 3E5F 3D04 A7A2 4603
# Subkey fingerprint: BCE9 123E 1AD2 9F07 C049 BBDE F712 B510 A23A 0F5F
* tag 'hppa-btlb-pull-request' of https://github.com/hdeller/qemu-hppa:
linux-user/hppa: lock both words of function descriptor
linux-user/hppa: clear the PSW 'N' bit when delivering signals
target/hppa: Wire up diag instruction to support BTLB
target/hppa: Extract diagnose immediate value
target/hppa: Add BTLB support to hppa TLB functions
target/hppa: Report and clear BTLBs via fw_cfg at startup
target/hppa: Allow up to 16 BTLB entries
target/hppa: Update to SeaBIOS-hppa version 9
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
It is forbidden to block on the event loop during a coroutine, as that
can cause deadlocks due to recursive locking.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20230908075458.527013-1-pbonzini@redhat.com>
Reviewed-by: Michael Tokarev <mjt@tls.msk.ru>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Process zoned requests in the current thread's AioContext instead of in
the BlockBackend's AioContext.
There is no need to use the BlockBackend's AioContext thanks to CoMutex
bs->wps->colock, which protects zone metadata.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20230912231037.826804-5-stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Switch blk_aio_*() APIs over to multi-queue by using
qemu_get_current_aio_context() instead of blk_get_aio_context(). This
change will allow devices to process I/O in multiple IOThreads in the
future.
I audited existing blk_aio_*() callers:
- migration/block.c: blk_mig_lock() protects the data accessed by the
completion callback.
- The remaining emulated devices and exports run with
qemu_get_aio_context() == blk_get_aio_context().
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20230912231037.826804-4-stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This patch fixes a race condition in test-bdrv-drain that is difficult
to reproduce. test-bdrv-drain sometimes fails without an error message
on the block pull request sent by Kevin Wolf on Sep 4, 2023. I was able
to reproduce it locally and found that "block-backend: process I/O in
the current AioContext" (in this patch series) is the first commit where
it reproduces.
I do not know why "block-backend: process I/O in the current AioContext"
exposes this bug. It might be related to the fact that the test's preadv
request runs in the main thread instead of IOThread a after my commit.
That might simply change the timing of the test.
Now on to the race condition in test-bdrv-drain. The main thread
schedules a BH in IOThread a and then drains the BDS:
aio_bh_schedule_oneshot(ctx_a, test_iothread_main_thread_bh, &data);
/* The request is running on the IOThread a. Draining its block device
* will make sure that it has completed as far as the BDS is concerned,
* but the drain in this thread can continue immediately after
* bdrv_dec_in_flight() and aio_ret might be assigned only slightly
* later. */
do_drain_begin(drain_type, bs);
If the BH completes before do_drain_begin() then there is nothing to
worry about.
If the BH invokes bdrv_flush() before do_drain_begin(), then
do_drain_begin() waits for it to complete.
The problematic case is when do_drain_begin() runs before the BH enters
bdrv_flush(). Then do_drain_begin() misses the BH and the drain
mechanism has failed in quiescing I/O.
Fix this by incrementing the in_flight counter so that do_drain_begin()
waits for test_iothread_main_thread_bh().
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20230912231037.826804-3-stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The synchronous bdrv_aio_cancel() function needs the acb's AioContext so
it can call aio_poll() to wait for cancellation.
It turns out that all users run under the BQL in the main AioContext, so
this callback is not needed.
Remove the callback, mark bdrv_aio_cancel() GLOBAL_STATE_CODE just like
its blk_aio_cancel() caller, and poll the main loop AioContext.
The purpose of this cleanup is to identify bdrv_aio_cancel() as an API
that does not work with the multi-queue block layer.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20230912231037.826804-2-stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Right now "qemu-img map" reports compressed blocks as containing data
but having no host offset. This is not very informative. Instead,
let's add another boolean field named "compressed" in case JSON output
mode is specified. This is achieved by utilizing new allocation status
flag BDRV_BLOCK_COMPRESSED for bdrv_block_status().
Also update the expected qemu-iotests outputs to contain the new field.
Signed-off-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
Message-ID: <20230907210226.953821-3-andrey.drobyshev@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Functions qcow2_get_host_offset(), get_cluster_offset(),
vmdk_co_block_status() explicitly report compressed cluster types when data
is compressed. However, this information is never passed further. Let's
make use of it by adding new BDRV_BLOCK_COMPRESSED flag for
bdrv_block_status(), so that caller may know that the data range is
compressed. In particular, we're going to use this flag to tweak
"qemu-img map" output.
This new flag is only being utilized by qcow, qcow2 and vmdk formats, as only
those support compression.
Reviewed-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
Signed-off-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
Message-ID: <20230907210226.953821-2-andrey.drobyshev@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The functions read the parents list in the generic block layer, so we
need to hold the graph lock already there. The BlockDriver
implementations actually modify the graph, so it has to be a writer
lock.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20230911094620.45040-22-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Instead of taking the writer lock internally, require callers to already
hold it when calling bdrv_unref_child(). These callers will typically
already hold the graph lock once the locking work is completed, which
means that they can't call functions that take it internally.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20230911094620.45040-21-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Instead of taking the writer lock internally, require callers to already
hold it when calling bdrv_root_unref_child(). These callers will
typically already hold the graph lock once the locking work is
completed, which means that they can't call functions that take it
internally.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20230911094620.45040-20-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The function reads the parents list, so it needs to hold the graph lock.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Message-ID: <20230911094620.45040-19-kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The function reads the parents list, so it needs to hold the graph lock.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Message-ID: <20230911094620.45040-18-kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The function reads the parents list, so it needs to hold the graph lock.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20230911094620.45040-17-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This adds GRAPH_RDLOCK annotations to declare that callers of
bdrv_child_perm() need to hold a reader lock for the graph because
some implementations access the children list of a node.
The callers of bdrv_child_perm() conveniently already hold the lock.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20230911094620.45040-16-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The function reads the parents list, so it needs to hold the graph lock.
This happens to result in BlockDriver.bdrv_set_perm() to be called with
the graph lock held. For consistency, make it the same for all of the
BlockDriver callbacks for updating permissions and annotate the function
pointers with GRAPH_RDLOCK_PTR.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20230911094620.45040-15-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The function reads the parents list, so it needs to hold the graph lock.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20230911094620.45040-14-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Instead of taking the writer lock internally, require callers to already
hold it when calling bdrv_attach_child_common(). These callers will
typically already hold the graph lock once the locking work is
completed, which means that they can't call functions that take it
internally.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20230911094620.45040-13-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
In previous patches, we changed some transactionable functions to be
marked as GRAPH_WRLOCK, but required that tran_finalize() is still
called without the lock. This was because all callbacks that can be in
the same transaction need to follow the same convention.
Now that we don't have conflicting requirements any more, we can switch
all of the transaction callbacks to be declared GRAPH_WRLOCK, too, and
call tran_finalize() with the lock held.
Document for each of these transactionable functions that the lock needs
to be held when completing the transaction, and make sure that all
callers down to the place where the transaction is finalised actually
have the writer lock.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20230911094620.45040-12-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Instead of taking the writer lock internally, require callers to already
hold it when calling bdrv_attach_child_common(). These callers will
typically already hold the graph lock once the locking work is
completed, which means that they can't call functions that take it
internally.
Note that the transaction callbacks still take the lock internally, so
tran_finalize() must be called without the lock held. This is because
bdrv_append() also calls bdrv_replace_node_noperm(), which currently
requires the transaction callbacks to be called unlocked. In the next
step, both of them can be switched to locked tran_finalize() calls
together.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20230911094620.45040-11-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Instead of taking the writer lock internally, require callers to already
hold it when calling bdrv_replace_child_tran(). These callers will
typically already hold the graph lock once the locking work is
completed, which means that they can't call functions that take it
internally.
While a graph lock is held, polling is not allowed. Therefore draining
the necessary nodes can no longer be done in bdrv_remove_child() and
bdrv_replace_node_noperm(), but the callers must already make sure that
they are drained.
Note that the transaction callbacks still take the lock internally, so
tran_finalize() must be called without the lock held. This is because
bdrv_append() also calls bdrv_attach_child_noperm(), which currently
requires to be called unlocked. Once it changes, the transaction
callbacks can be changed, too.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20230911094620.45040-10-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Instead of taking the writer lock internally, require callers to already
hold it when calling bdrv_replace_child_noperm(). These callers will
typically already hold the graph lock once the locking work is
completed, which means that they can't call functions that take it
internally.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20230911094620.45040-9-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Don't assume specific parameter names like 'bs' or 'blk' in the
generated code, but use the actual name.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20230911094620.45040-8-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Add a new wrapper type for GRAPH_WRLOCK functions that should be called
from coroutine context.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20230911094620.45040-7-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
bdrv_unref() is called by a lot of places that need to hold the graph
lock (it naturally happens in the context of operations that change the
graph). However, bdrv_unref() takes the graph writer lock internally, so
it can't actually be called while already holding a graph lock without
causing a deadlock.
bdrv_unref() also can't just become GRAPH_WRLOCK because it drains the
node before closing it, and draining requires that the graph is
unlocked.
The solution is to defer deleting the node until we don't hold the lock
any more and draining is possible again.
Note that keeping images open for longer than necessary can create
problems, too: You can't open an image again before it is really closed
(if image locking didn't prevent it, it would cause corruption).
Reopening an image immediately happens at least during bdrv_open() and
bdrv_co_create().
In order to solve this problem, make sure to run the deferred unref in
bdrv_graph_wrunlock(), i.e. the first possible place where we can drain
again. This is also why bdrv_schedule_unref() is marked GRAPH_WRLOCK.
The output of iotest 051 is updated because the additional polling
changes the order of HMP output, resulting in a new "(qemu)" prompt in
the test output that was previously on a separate line and filtered out.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20230911094620.45040-6-kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The documentation for bdrv_append() says that the caller must hold the
AioContext lock for bs_top. Change all callers to actually adhere to the
contract.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20230911094620.45040-5-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>