Commit Graph

5566 Commits

Author SHA1 Message Date
Xie Yongji
779d82e1d3 vduse-blk: Add name option
Currently we use 'id' option as the name of VDUSE device.
It's a bit confusing since we use one value for two different
purposes: the ID to identfy the export within QEMU (must be
distinct from any other exports in the same QEMU process, but
can overlap with names used by other processes), and the VDUSE
name to uniquely identify it on the host (must be distinct from
other VDUSE devices on the same host, but can overlap with other
export types like NBD in the same process). To make it clear,
this patch adds a separate 'name' option to specify the VDUSE
name for the vduse-blk export instead.

Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Message-Id: <20220614051532.92-7-xieyongji@bytedance.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2022-06-24 17:07:06 +02:00
Xie Yongji
0862a087fd vduse-blk: Add serial option
Add a 'serial' option to allow user to specify this value
explicitly. And the default value is changed to an empty
string as what we did in "hw/block/virtio-blk.c".

Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Message-Id: <20220614051532.92-6-xieyongji@bytedance.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2022-06-24 17:07:06 +02:00
Eric Blake
2866ddd121 nbd: Drop dead code spotted by Coverity
CID 1488362 points out that the second 'rc >= 0' check is now dead
code.

Reported-by: Peter Maydell <peter.maydell@linaro.org>
Fixes: 172f5f1a40(nbd: remove peppering of nbd_client_connected)
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20220516210519.76135-1-eblake@redhat.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Vladimir Sementsov-Ogievskiy <v.sementsov-og@mail.ru>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2022-06-24 17:07:06 +02:00
Fabian Ebner
9b38fc56c0 block/gluster: correctly set max_pdiscard
On 64-bit platforms, assigning SIZE_MAX to the int64_t max_pdiscard
results in a negative value, and the following assertion would trigger
down the line (it's not the same max_pdiscard, but computed from the
other one):
qemu-system-x86_64: ../block/io.c:3166: bdrv_co_pdiscard: Assertion
`max_pdiscard >= bs->bl.request_alignment' failed.

On 32-bit platforms, it's fine to keep using SIZE_MAX.

The assertion in qemu_gluster_co_pdiscard() is checking that the value
of 'bytes' can safely be passed to glfs_discard_async(), which takes a
size_t for the argument in question, so it is kept as is. And since
max_pdiscard is still <= SIZE_MAX, relying on max_pdiscard is still
fine.

Fixes: 0c8022876f ("block: use int64_t instead of int in driver discard handlers")
Cc: qemu-stable@nongnu.org
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
Message-Id: <20220520075922.43972-1-f.ebner@proxmox.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2022-06-24 17:07:06 +02:00
Stefano Garzarella
66dc5f9606 block/rbd: report a better error when namespace does not exist
If the namespace does not exist, rbd_create() fails with -ENOENT and
QEMU reports a generic "error rbd create: No such file or directory":

    $ qemu-img create rbd:rbd/namespace/image 1M
    Formatting 'rbd:rbd/namespace/image', fmt=raw size=1048576
    qemu-img: rbd:rbd/namespace/image: error rbd create: No such file or directory

Unfortunately rados_ioctx_set_namespace() does not fail if the namespace
does not exist, so let's use rbd_namespace_exists() in qemu_rbd_connect()
to check if the namespace exists, reporting a more understandable error:

    $ qemu-img create rbd:rbd/namespace/image 1M
    Formatting 'rbd:rbd/namespace/image', fmt=raw size=1048576
    qemu-img: rbd:rbd/namespace/image: namespace 'namespace' does not exist

Reported-by: Tingting Mao <timao@redhat.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-Id: <20220517071012.6120-1-sgarzare@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2022-06-24 17:07:06 +02:00
Xie Yongji
d043e2db87 libvduse: Add support for reconnecting
To support reconnecting after restart or crash, VDUSE backend
might need to resubmit inflight I/Os. This stores the metadata
such as the index of inflight I/O's descriptors to a shm file so
that VDUSE backend can restore them during reconnecting.

Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Message-Id: <20220523084611.91-9-xieyongji@bytedance.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2022-06-24 17:07:06 +02:00
Xie Yongji
9e4dea6727 vduse-blk: Add vduse-blk resize support
To support block resize, this uses vduse_dev_update_config()
to update the capacity field in configuration space and inject
config interrupt on the block resize callback.

Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20220523084611.91-8-xieyongji@bytedance.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2022-06-24 17:07:06 +02:00
Xie Yongji
2a2359b844 vduse-blk: Implement vduse-blk export
This implements a VDUSE block backends based on
the libvduse library. We can use it to export the BDSs
for both VM and container (host) usage.

The new command-line syntax is:

$ qemu-storage-daemon \
    --blockdev file,node-name=drive0,filename=test.img \
    --export vduse-blk,node-name=drive0,id=vduse-export0,writable=on

After the qemu-storage-daemon started, we need to use
the "vdpa" command to attach the device to vDPA bus:

$ vdpa dev add name vduse-export0 mgmtdev vduse

Also the device must be removed via the "vdpa" command
before we stop the qemu-storage-daemon.

Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20220523084611.91-7-xieyongji@bytedance.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2022-06-24 17:07:06 +02:00
Xie Yongji
5c36802970 block/export: Abstract out the logic of virtio-blk I/O process
Abstract the common logic of virtio-blk I/O process to a function
named virtio_blk_process_req(). It's needed for the following commit.

Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Message-Id: <20220523084611.91-4-xieyongji@bytedance.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2022-06-24 17:07:06 +02:00
Xie Yongji
8e7fd6f623 block/export: Fix incorrect length passed to vu_queue_push()
Now the req->size is set to the correct value only
when handling VIRTIO_BLK_T_GET_ID request. This patch
fixes it.

Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Message-Id: <20220523084611.91-3-xieyongji@bytedance.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2022-06-24 17:07:06 +02:00
Xie Yongji
ac1fc3a3a9 block: Support passing NULL ops to blk_set_dev_ops()
This supports passing NULL ops to blk_set_dev_ops()
so that we can remove stale ops in some cases.

Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20220523084611.91-2-xieyongji@bytedance.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2022-06-24 17:07:06 +02:00
Vladimir Sementsov-Ogievskiy
618af89e55 block: simplify handling of try to merge different sized bitmaps
We have too much logic to simply check that bitmaps are of the same
size. Let's just define that hbitmap_merge() and
bdrv_dirty_bitmap_merge_internal() require their argument bitmaps be of
same size, this simplifies things.

Let's look through the callers:

For backup_init_bcs_bitmap() we already assert that merge can't fail.

In bdrv_reclaim_dirty_bitmap_locked() we gracefully handle the error
that can't happen: successor always has same size as its parent, drop
this logic.

In bdrv_merge_dirty_bitmap() we already has assertion and separate
check. Make the check explicit and improve error message.

Signed-off-by: Vladimir Sementsov-Ogievskiy <v.sementsov-og@mail.ru>
Reviewed-by: Nikita Lapshin <nikita.lapshin@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20220517111206.23585-4-v.sementsov-og@mail.ru>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2022-06-24 17:07:06 +02:00
Vladimir Sementsov-Ogievskiy
58cbfbdf73 block: improve block_dirty_bitmap_merge(): don't allocate extra bitmap
We don't need extra bitmap. All we need is to backup the original
bitmap when we do first merge. So, drop extra temporary bitmap and work
directly with target and backup.

Still to keep old semantics, that on failure target is unchanged and
user don't need to restore, we need a local_backup variable and do
restore ourselves on failure path.

Signed-off-by: Vladimir Sementsov-Ogievskiy <v.sementsov-og@mail.ru>
Message-Id: <20220517111206.23585-3-v.sementsov-og@mail.ru>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2022-06-24 17:07:06 +02:00
Vladimir Sementsov-Ogievskiy
775b30b305 block: block_dirty_bitmap_merge(): fix error path
At the end we ignore failure of bdrv_merge_dirty_bitmap() and report
success. And still set errp. That's wrong.

Signed-off-by: Vladimir Sementsov-Ogievskiy <v.sementsov-og@mail.ru>
Reviewed-by: Nikita Lapshin <nikita.lapshin@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20220517111206.23585-2-v.sementsov-og@mail.ru>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2022-06-24 17:07:06 +02:00
Stefan Hajnoczi
1ab5096b3a block: get rid of blk->guest_block_size
Commit 1b7fd72955 ("block: rename buffer_alignment to
guest_block_size") noted:

  At this point, the field is set by the device emulation, but completely
  ignored by the block layer.

The last time the value of buffer_alignment/guest_block_size was
actually used was before commit 339064d506 ("block: Don't use guest
sector size for qemu_blockalign()").

This value has not been used since 2013. Get rid of it.

Cc: Xie Yongji <xieyongji@bytedance.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20220518130945.2657905-1-stefanha@redhat.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alberto Faria <afaria@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2022-06-24 17:07:06 +02:00
Stefan Hajnoczi
3399848b7f block: drop unused bdrv_co_drain() API
bdrv_co_drain() has not been used since commit 9a0cec664e ("mirror:
use bdrv_drained_begin/bdrv_drained_end") in 2016. Remove it so there
are fewer drain scenarios to worry about.

Use bdrv_drained_begin()/bdrv_drained_end() instead. They are "mixed"
functions that can be called from coroutine context. Unlike
bdrv_co_drain(), these functions provide control of the length of the
drained section, which is usually the right thing.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20220521122714.3837731-1-stefanha@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Alberto Faria <afaria@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2022-06-24 17:07:06 +02:00
Stefan Hajnoczi
99b969fbe1 linux-aio: explain why max batch is checked in laio_io_unplug()
It may not be obvious why laio_io_unplug() checks max batch. I discussed
this with Stefano and have added a comment summarizing the reason.

Cc: Stefano Garzarella <sgarzare@redhat.com>
Cc: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Message-id: 20220609164712.1539045-3-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2022-06-15 16:43:42 +01:00
Stefan Hajnoczi
f387cac5af linux-aio: fix unbalanced plugged counter in laio_io_unplug()
Every laio_io_plug() call has a matching laio_io_unplug() call. There is
a plugged counter that tracks the number of levels of plugging and
allows for nesting.

The plugged counter must reflect the balance between laio_io_plug() and
laio_io_unplug() calls accurately. Otherwise I/O stalls occur since
io_submit(2) calls are skipped while plugged.

Reported-by: Nikolay Tenev <nt@storpool.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Message-id: 20220609164712.1539045-2-stefanha@redhat.com
Cc: Stefano Garzarella <sgarzare@redhat.com>
Fixes: 68d7946648 ("linux-aio: add `dev_max_batch` parameter to laio_io_unplug()")
[Stefano Garzarella suggested adding a Fixes tag.
--Stefan]
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2022-06-15 16:43:42 +01:00
Sam Li
e2848bc574 Use io_uring_register_ring_fd() to skip fd operations
Linux recently added a new io_uring(7) optimization API that QEMU
doesn't take advantage of yet. The liburing library that QEMU uses
has added a corresponding new API calling io_uring_register_ring_fd().
When this API is called after creating the ring, the io_uring_submit()
library function passes a flag to the io_uring_enter(2) syscall
allowing it to skip the ring file descriptor fdget()/fdput()
operations. This saves some CPU cycles.

Signed-off-by: Sam Li <faithilikerun@gmail.com>
Message-id: 20220531105011.111082-1-faithilikerun@gmail.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2022-06-15 14:50:41 +01:00
Paolo Bonzini
f0d43b1ece coroutine-lock: qemu_co_queue_restart_all is a coroutine-only qemu_co_enter_all
qemu_co_queue_restart_all is basically the same as qemu_co_enter_all
but without a QemuLockable argument.  That's perfectly fine, but only as
long as the function is marked coroutine_fn.  If used outside coroutine
context, qemu_co_queue_wait will attempt to take the lock and that
is just broken: if you are calling qemu_co_queue_restart_all outside
coroutine context, the lock is going to be a QemuMutex which cannot be
taken twice by the same thread.

The patch adds the marker to qemu_co_queue_restart_all and to its sole
non-coroutine_fn caller; it then reimplements the function in terms of
qemu_co_enter_all_impl, to remove duplicated code and to clarify that the
latter also works in coroutine context.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20220427130830.150180-4-pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-05-12 12:29:44 +02:00
Markus Armbruster
9c0928045c Clean up ill-advised or unusual header guards
Leading underscores are ill-advised because such identifiers are
reserved.  Trailing underscores are merely ugly.  Strip both.

Our header guards commonly end in _H.  Normalize the exceptions.

Macros should be ALL_CAPS.  Normalize the exception.

Done with scripts/clean-header-guards.pl.

include/hw/xen/interface/ and tools/virtiofsd/ left alone, because
these were imported from Xen and libfuse respectively.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20220506134911.2856099-3-armbru@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2022-05-11 16:50:01 +02:00
Markus Armbruster
52581c718c Clean up header guards that don't match their file name
Header guard symbols should match their file name to make guard
collisions less likely.

Cleaned up with scripts/clean-header-guards.pl, followed by some
renaming of new guard symbols picked by the script to better ones.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20220506134911.2856099-2-armbru@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
[Change to generated file ebpf/rss.bpf.skeleton.h backed out]
2022-05-11 16:49:06 +02:00
Hanna Reitz
6d17e28798 block/vmdk: Fix reopening bs->file
VMDK disk data is stored in extents, which may or may not be separate
from bs->file.  VmdkExtent.file points to where they are stored.  Each
that is stored in bs->file will simply reuse the exact pointer value of
bs->file.

(That is why vmdk_free_extents() will unref VmdkExtent.file (e->file)
only if e->file != bs->file.)

Reopen operations can change bs->file (they will replace the whole
BdrvChild object, not just the BDS stored in that BdrvChild), and then
we will need to change all .file pointers of all such VmdkExtents to
point to the new BdrvChild.

In vmdk_reopen_prepare(), we have to check which VmdkExtents are
affected, and in vmdk_reopen_commit(), we can modify them.  We have to
split this because:
- The new BdrvChild is created only after prepare, so we can change
  VmdkExtent.file only in commit
- In commit, there no longer is any (valid) reference to the old
  BdrvChild object, so there would be nothing to compare VmdkExtent.file
  against to see whether it was equal to bs->file before reopening
  (There is BDRVReopenState.old_file_bs, but the old bs->file
  BdrvChild's .bs pointer will be NULL-ed when the new BdrvChild is
  created, and so we cannot compare VmdkExtent.file->bs against
  BDRVReopenState.old_file_bs)

Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20220314162719.65384-2-hreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2022-05-04 15:55:23 +02:00
Hanna Reitz
06e9cd19a4 qcow2: Do not reopen data_file in invalidate_cache
qcow2_co_invalidate_cache() closes and opens the qcow2 file, by calling
qcow2_close() and qcow2_do_open().  These two functions must thus be
usable from both a global-state and an I/O context.

As they are, they are not safe to call in an I/O context, because they
use bdrv_unref_child() and bdrv_open_child() to close/open the data_file
child, respectively, both of which are global-state functions.  When
used from qcow2_co_invalidate_cache(), we do not need to close/open the
data_file child, though (we do not do this for bs->file or bs->backing
either), and so we should skip it in the qcow2_co_invalidate_cache()
path.

To do so, add a parameter to qcow2_do_open() and qcow2_close() to make
them skip handling s->data_file, and have qcow2_co_invalidate_cache()
exempt it from the memset() on the BDRVQcow2State.

(Note that the QED driver similarly closes/opens the QED image by
invoking bdrv_qed_close()+bdrv_qed_do_open(), but both functions seem
safe to use in an I/O context.)

Fixes: https://gitlab.com/qemu-project/qemu/-/issues/945
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20220427114057.36651-3-hreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2022-05-04 15:55:23 +02:00
Marc-André Lureau
ad24b679d2 block: move fcntl_setfl()
It is only used by block/file-posix.c, move it there.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2022-05-03 15:17:53 +04:00
Paolo Bonzini
620c5cb5da nbd: document what is protected by the CoMutexes
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220414175756.671165-10-pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Lukas Straub <lukasstraub2@web.de>
Signed-off-by: Eric Blake <eblake@redhat.com>
2022-04-26 13:16:42 -05:00
Paolo Bonzini
a80a9a1c73 nbd: take receive_mutex when reading requests[].receiving
requests[].receiving is set by nbd_receive_replies() under the receive_mutex;
Read it under the same mutex as well.  Waking up receivers on errors happens
after each reply finishes processing, in nbd_co_receive_one_chunk().
If there is no currently-active reply, there are two cases:

* either there is no active request at all, in which case no
element of request[] can have .receiving = true

* or nbd_receive_replies() must be running and owns receive_mutex;
in that case it will get back to nbd_co_receive_one_chunk() because
the socket has been shutdown, and all waiting coroutines will wake up
in turn.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220414175756.671165-9-pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@openvz.org>
Reviewed-by: Lukas Straub <lukasstraub2@web.de>
Signed-off-by: Eric Blake <eblake@redhat.com>
2022-04-26 13:16:41 -05:00
Paolo Bonzini
dba5156c0e nbd: move s->state under requests_lock
Remove the confusing, and most likely wrong, atomics.  The only function
that used to be somewhat in a hot path was nbd_client_connected(),
but it is not anymore after the previous patches.

The same logic is used both to check if a request had to be reissued
and also in nbd_reconnecting_attempt().  The former cases are outside
requests_lock, while nbd_reconnecting_attempt() does have the lock,
therefore the two have been separated in the previous commit.
nbd_client_will_reconnect() can simply take s->requests_lock, while
nbd_reconnecting_attempt() can inline the access now that no
complicated atomics are involved.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220414175756.671165-8-pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@openvz.org>
Reviewed-by: Lukas Straub <lukasstraub2@web.de>
Signed-off-by: Eric Blake <eblake@redhat.com>
2022-04-26 13:16:39 -05:00
Paolo Bonzini
8d45185cb7 nbd: code motion and function renaming
Prepare for the next patch, so that the diff is less confusing.

nbd_client_connecting is moved closer to the definition point.

nbd_client_connecting_wait() is kept only for the reconnection
logic; when it is used to check if a request has to be reissued,
use the renamed function nbd_client_will_reconnect().  In the
next patch, the two cases will have different locking requirements.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220414175756.671165-7-pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@openvz.org>
Reviewed-by: Lukas Straub <lukasstraub2@web.de>
Signed-off-by: Eric Blake <eblake@redhat.com>
2022-04-26 13:16:38 -05:00
Paolo Bonzini
ee19d953ec nbd: use a QemuMutex to synchronize yanking, reconnection and coroutines
The condition for waiting on the s->free_sema queue depends on
both s->in_flight and s->state.  The latter is currently using
atomics, but this is quite dubious and probably wrong.

Because s->state is written in the main thread too, for example by
the yank callback, it cannot be protected by a CoMutex.  Introduce a
separate lock that can be used by nbd_co_send_request(); later on this
lock will also be used for s->state.  There will not be any contention
on the lock unless there is a yank or reconnect, so this is not
performance sensitive.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220414175756.671165-6-pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Lukas Straub <lukasstraub2@web.de>
Signed-off-by: Eric Blake <eblake@redhat.com>
2022-04-26 13:16:37 -05:00
Paolo Bonzini
8610b4491f nbd: keep send_mutex/free_sema handling outside nbd_co_do_establish_connection
Elevate s->in_flight early so that other incoming requests will wait
on the CoQueue in nbd_co_send_request; restart them after getting back
from nbd_reconnect_attempt.  This could be after the reconnect timer or
nbd_cancel_in_flight have cancelled the attempt, so there is no
need anymore to cancel the requests there.

nbd_co_send_request now handles both stopping and restarting pending
requests after a successful connection, and there is no need to
hold send_mutex in nbd_co_do_establish_connection.  The current setup
is confusing because nbd_co_do_establish_connection is called both with
send_mutex taken and without it.  Before the patch it uses free_sema which
(at least in theory...) is protected by send_mutex, after the patch it
does not anymore.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220414175756.671165-5-pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[eblake: wrap long line]
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@openvz.org>
Reviewed-by: Lukas Straub <lukasstraub2@web.de>
Signed-off-by: Eric Blake <eblake@redhat.com>
2022-04-26 13:16:36 -05:00
Paolo Bonzini
172f5f1a40 nbd: remove peppering of nbd_client_connected
It is unnecessary to check nbd_client_connected() because every time
s->state is moved out of NBD_CLIENT_CONNECTED the socket is shut down
and all coroutines are resumed.

The only case where it was actually needed is when the NBD
server disconnects and there is no reconnect-delay.  In that
case, nbd_receive_replies() does not set s->reply.handle and
nbd_co_do_receive_one_chunk() cannot continue.  For that one case,
check the return value of nbd_receive_replies().

As to the others:

* nbd_receive_replies() can put the current coroutine to sleep if another
reply is ongoing; then it will be woken by nbd_channel_error(), called
by the ongoing reply.  Or it can try itself to read a reply header and
fail, thus calling nbd_channel_error() itself.

* nbd_co_send_request() will write the body of the request and fail

* nbd_reply_chunk_iter_receive() will call nbd_co_receive_one_chunk()
and then nbd_co_do_receive_one_chunk(), which will handle the failure as
above; or it will just detect a previous call to nbd_iter_channel_error()
via iter->ret < 0.

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220414175756.671165-4-pbonzini@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@openvz.org>
Reviewed-by: Lukas Straub <lukasstraub2@web.de>
Signed-off-by: Eric Blake <eblake@redhat.com>
2022-04-26 13:16:36 -05:00
Paolo Bonzini
0c43c6fc89 nbd: mark more coroutine_fns
Several coroutine functions in block/nbd.c are not marked as such.  This
patch adds a few more markers; it is not exhaustive, but it focuses
especially on:

- places that wake other coroutines, because aio_co_wake() has very
different semantics inside a coroutine (queuing after yield vs. entering
immediately);

- functions with _co_ in their names, to avoid confusion

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220414175756.671165-3-pbonzini@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@openvz.org>
Reviewed-by: Lukas Straub <lukasstraub2@web.de>
Signed-off-by: Eric Blake <eblake@redhat.com>
2022-04-26 13:16:35 -05:00
Paolo Bonzini
8846b7d1c1 nbd: safeguard against waking up invalid coroutine
The .reply_possible field of s->requests is never set to false.  This is
not a problem as it is only a safeguard to detect protocol errors,
but it's sloppy.  In fact, the field is actually not necessary at all,
because .coroutine is set to NULL in NBD_FOREACH_REPLY_CHUNK after
receiving the last chunk.  Thus, replace .reply_possible with .coroutine
and move the check before deciding the fate of this request.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220414175756.671165-2-pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@openvz.org>
Reviewed-by: Lukas Straub <lukasstraub2@web.de>
Signed-off-by: Eric Blake <eblake@redhat.com>
2022-04-26 13:16:24 -05:00
Vladimir Sementsov-Ogievskiy
1466ef6cbe qapi: rename BlockDirtyBitmapMergeSource to BlockDirtyBitmapOrStr
Rename the type to be reused. Old name is "what is it for". To be
natively reused for other needs, let's name it exactly "what is it".

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@openvz.org>
Message-Id: <20220314213226.362217-2-v.sementsov-og@mail.ru>
[eblake: Adjust S-o-b to Vladimir's new email, with permission]
Reviewed-by: Eric Blake <eblake@redhat.com>
Acked-by: John Snow <jsnow@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2022-04-26 13:13:50 -05:00
Denis V. Lunev
80dd5aff1b block: add 'force' parameter to 'blockdev-change-medium' command
'blockdev-change-medium' is a convinient wrapper for the following
sequence of commands:
 * blockdev-open-tray
 * blockdev-remove-medium
 * blockdev-insert-medium
 * blockdev-close-tray
and should be used f.e. to change ISO image inside the CD-ROM tray.
Though the guest could lock the tray and some linux guests like
CentOS 8.5 actually does that. In this case the execution if this
command results in the error like the following:
  Device 'scsi0-0-1-0' is locked and force was not specified,
  wait for tray to open and try again.

This situation is could be resolved 'blockdev-open-tray' by passing
flag 'force' inside. Thus is seems reasonable to add the same
capability for 'blockdev-change-medium' too.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@openvz.org>
Acked-by: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Hanna Reitz <hreitz@redhat.com>
CC: Eric Blake <eblake@redhat.com>
CC: Markus Armbruster <armbru@redhat.com>
Message-Id: <20220412221846.280723-1-den@openvz.org>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
2022-04-25 12:02:36 +02:00
Hanna Reitz
0423f75351 qcow2: Add errp to rebuild_refcount_structure()
Instead of fprint()-ing error messages in rebuild_refcount_structure()
and its rebuild_refcounts_write_refblocks() helper, pass them through an
Error object to qcow2_check_refcounts() (which will then print it).

Suggested-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20220405134652.19278-4-hreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2022-04-20 12:09:17 +02:00
Hanna Reitz
a8c07ec287 qcow2: Improve refcount structure rebuilding
When rebuilding the refcount structures (when qemu-img check -r found
errors with refcount = 0, but reference count > 0), the new refcount
table defaults to being put at the image file end[1].  There is no good
reason for that except that it means we will not have to rewrite any
refblocks we already wrote to disk.

Changing the code to rewrite those refblocks is not too difficult,
though, so let us do that.  That is beneficial for images on block
devices, where we cannot really write beyond the end of the image file.

Use this opportunity to add extensive comments to the code, and refactor
it a bit, getting rid of the backwards-jumping goto.

[1] Unless there is something allocated in the area pointed to by the
    last refblock, so we have to write that refblock.  In that case, we
    try to put the reftable in there.

Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1519071
Closes: https://gitlab.com/qemu-project/qemu/-/issues/941
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20220405134652.19278-2-hreitz@redhat.com>
2022-04-20 10:14:28 +02:00
Marc-André Lureau
0f9668e0c1 Remove qemu-common.h include from most units
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20220323155743.1585078-33-marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-06 14:31:55 +02:00
Marc-André Lureau
89fc45d5c6 include: move qemu_get_vm_name() to sysemu.h
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20220323155743.1585078-26-marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-06 14:31:43 +02:00
Marc-André Lureau
8e3b0cbb72 Replace qemu_real_host_page variables with inlined functions
Replace the global variables with inlined helper functions. getpagesize() is very
likely annotated with a "const" function attribute (at least with glibc), and thus
optimization should apply even better.

This avoids the need for a constructor initialization too.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20220323155743.1585078-12-marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-06 10:50:38 +02:00
Hanna Reitz
b1e1af394d block/stream: Drain subtree around graph change
When the stream block job cuts out the nodes between top and base in
stream_prepare(), it does not drain the subtree manually; it fetches the
base node, and tries to insert it as the top node's backing node with
bdrv_set_backing_hd().  bdrv_set_backing_hd() however will drain, and so
the actual base node might change (because the base node is actually not
part of the stream job) before the old base node passed to
bdrv_set_backing_hd() is installed.

This has two implications:

First, the stream job does not keep a strong reference to the base node.
Therefore, if it is deleted in bdrv_set_backing_hd()'s drain (e.g.
because some other block job is drained to finish), we will get a
use-after-free.  We should keep a strong reference to that node.

Second, even with such a strong reference, the problem remains that the
base node might change before bdrv_set_backing_hd() actually runs and as
a result the wrong base node is installed.

Both effects can be seen in 030's TestParallelOps.test_overlapping_5()
case, which has five nodes, and simultaneously streams from the middle
node to the top node, and commits the middle node down to the base node.
As it is, this will sometimes crash, namely when we encounter the
above-described use-after-free.

Taking a strong reference to the base node, we no longer get a crash,
but the resuling block graph is less than ideal: The expected result is
obviously that all middle nodes are cut out and the base node is the
immediate backing child of the top node.  However, if stream_prepare()
takes a strong reference to its base node (the middle node), and then
the commit job finishes in bdrv_set_backing_hd(), supposedly dropping
that middle node, the stream job will just reinstall it again.

Therefore, we need to keep the whole subtree drained in
stream_prepare(), so that the graph modification it performs is
effectively atomic, i.e. that the base node it fetches is still the base
node when bdrv_set_backing_hd() sets it as the top node's backing node.

Verify this by asserting in said 030's test case that the base node is
always the top node's immediate backing child when both jobs are done.

Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20220324140907.17192-1-hreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Acked-by: Vladimir Sementsov-Ogievskiy <v.sementsov-og@mail.ru>
2022-03-29 16:30:55 +02:00
Philippe Mathieu-Daudé
3f1db95917 block: Fix misleading hexadecimal format
"0x%u" format is very misleading, replace by "0x%x".

Found running:

  $ git grep -E '0x%[0-9]*([lL]*|" ?PRI)[dDuU]' block/

Inspired-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Denis V. Lunev <den@openvz.org>
Message-id: 20220323114718.58714-2-philippe.mathieu.daude@gmail.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2022-03-24 10:38:42 +00:00
Peter Maydell
04ddcda6a2 Fixes and cleanups for 7.0
Hi,
 
 A collection of fixes & cleanup patches that should be safe for 7.0 inclusion.
 -----BEGIN PGP SIGNATURE-----
 
 iQJQBAABCAA6FiEEh6m9kz+HxgbSdvYt2ujhCXWWnOUFAmI5vPIcHG1hcmNhbmRy
 ZS5sdXJlYXVAcmVkaGF0LmNvbQAKCRDa6OEJdZac5a7ED/9+DCc6b+yAeMsFR7SI
 kqxSvPW9RbgQrJo0LrJxX7H+xYs40JFpkNZFhuAGgWPrk6GlebMzg+aMgSlZi4XN
 B7y5/dAKUUPCC+kNQ7azP4Gp+xb+Pxg2ZZxQ9SnxsGgPWC1prliiB8Zbvs8f5lHl
 ACbh7wvfVOcSJoMaCAf5km4AFzWYQQkwn2w3CRl4CfWnuWUhjnnYL9DfjHrfaYPK
 JCbRCx534dy/amrMPgbAOcDRl0K9/9Tw+xATxOkQPLZ4Za4tclsAGZ9Hb2WoDuWS
 LYQ1ZJVouv37EnaPVMCyPyC2n4oLJ86L2RCSBqKgIgv7rmwTUcqlfYPVg7TZGxuw
 T234lIc8AXcm2UNQ4iTXLH/Od9RGHKseZSF8QYTVGNDtfvp3bDFVT6k5e2X/SpXY
 gVloTdFzmwYWM8dtREPepZlEhXNKz7XdltlrcwyDdKWW0OffLRyKkNIsuUja7EoL
 q4n8l4tq084iLTHpEUSWaFwZvu89b8n81hML0box6XXrOldk1qdf57Ka5gqxNrnk
 pJES7ocRoTANjZgASrJW8vPu3/GkdlmE/Khf5bnOzq/lWMwVxPqYEQY+PRoAU2zR
 MS1UJ9IITe3toJlx7+DqR8Lo6fUyralwKv/MUnBW65S45S7VkbCO4anELNnVvzAE
 CFfsa30VblNDEbppBMXwRFyX0Q==
 =fKgO
 -----END PGP SIGNATURE-----

Merge tag 'fixes-pull-request' of gitlab.com:marcandre.lureau/qemu into staging

Fixes and cleanups for 7.0

Hi,

A collection of fixes & cleanup patches that should be safe for 7.0 inclusion.

# gpg: Signature made Tue 22 Mar 2022 12:11:30 GMT
# gpg:                using RSA key 87A9BD933F87C606D276F62DDAE8E10975969CE5
# gpg:                issuer "marcandre.lureau@redhat.com"
# gpg: Good signature from "Marc-André Lureau <marcandre.lureau@redhat.com>" [full]
# gpg:                 aka "Marc-André Lureau <marcandre.lureau@gmail.com>" [full]
# Primary key fingerprint: 87A9 BD93 3F87 C606 D276  F62D DAE8 E109 7596 9CE5

* tag 'fixes-pull-request' of gitlab.com:marcandre.lureau/qemu: (21 commits)
  qapi: remove needless include
  Remove trailing ; after G_DEFINE_AUTO macro
  tests: remove needless include
  error: use GLib to remember the program name
  qga: remove bswap.h include
  qapi: remove needless include
  meson: fix CONFIG_ATOMIC128 check
  meson: move int128 checks from configure
  qapi: remove needless include
  util: remove the net/net.h dependency
  util: remove needless includes
  scripts/modinfo-collect: remove unused/dead code
  Move HOST_LONG_BITS to compiler.h
  Simplify HOST_LONG_BITS
  compiler.h: replace QEMU_SENTINEL with G_GNUC_NULL_TERMINATED
  compiler.h: replace QEMU_WARN_UNUSED_RESULT with G_GNUC_WARN_UNUSED_RESULT
  Replace GCC_FMT_ATTR with G_GNUC_PRINTF
  Drop qemu_foo() socket API wrapper
  m68k/nios2-semi: fix gettimeofday() result check
  vl: typo fix in a comment
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2022-03-22 18:43:03 +00:00
Peter Maydell
9d36d5f7e0 Block patches for 7.0-rc1:
- iotest fixes:
   - Fix some iotests for riscv targets
   - Use GNU sed in more places where required
   - Meson-related fixes (i.e. to print errors when they occur)
   - Have qemu-img calls (from Python tests) generally raise nicely
     formattable exceptions on errors
   - Fix iotest 207
 - Allow RBD images to be growable by writing zeroes past the end of
   file, fixing qcow2 on rbd
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEEy2LXoO44KeRfAE00ofpA0JgBnN8FAmI5uC4SHGhyZWl0ekBy
 ZWRoYXQuY29tAAoJEKH6QNCYAZzfsPIP/1iRUWmWd9h9m7gX5hSU+TWYXsH+Ua/G
 fKdDHqNlVjQHq6SDN+A3jDCjAV9I91vlHRVIMWXc4QlKbdNdY7yUP1zYKuRTbfig
 UXce6g1NQoZUIlBSHZDqzgEmrvjunwP1U2te2LWliQjmvlTJgdIHYUJn2VgmY3wg
 2vo8exE3YO1FArS9nFsiX/1Ju35Dm4+3w9NkI6KxnKvaFpY++jovVVy2Bj7Jmfsh
 bRnfiQtDVX+0H4FXYS02hDEX4h7PTzsd1DeapVbiiZW9KJrbb0rchTSGc+VKMdkC
 z2XDfU+Hw4jYlNomuWdSZy6qJMNzUaKEqJMji1OiLym4429OAyseL8EO5c6Utjcb
 RRqGKWBOp1ceFZcy8vmQ2Rxc7b3Nc/Jv41Ty7PbyHmrtd2nbgD+AVnlFH9qWDtZo
 clvFSaIHcHaC4k+MppsbGTKvbW7qRYUkdk1B+tFlZytwQpFvM4oK2CF8jNzoLYfY
 qJIvrBqgaBKnYzAGCV4qgH9I0gWY7WHwvwfevHy47rk7XDsErWKMfBFy294TYDyq
 +yU6K1VijWDEn/DdQZMSZQJeE7ByA4cfSSYGRmwtTLOZxmUi4mEBEEo6Lw3x4eI2
 eXj/fhRenbjfE+5p3xmWvEyaVvvyor9oXUPH45HN/LvGgUJmEsjs0XGwnNNwxJBT
 s2lp8U5Lo4Lb
 =0cG5
 -----END PGP SIGNATURE-----

Merge tag 'pull-block-2022-03-22' of https://gitlab.com/hreitz/qemu into staging

Block patches for 7.0-rc1:
- iotest fixes:
  - Fix some iotests for riscv targets
  - Use GNU sed in more places where required
  - Meson-related fixes (i.e. to print errors when they occur)
  - Have qemu-img calls (from Python tests) generally raise nicely
    formattable exceptions on errors
  - Fix iotest 207
- Allow RBD images to be growable by writing zeroes past the end of
  file, fixing qcow2 on rbd

# gpg: Signature made Tue 22 Mar 2022 11:51:10 GMT
# gpg:                using RSA key CB62D7A0EE3829E45F004D34A1FA40D098019CDF
# gpg:                issuer "hreitz@redhat.com"
# gpg: Good signature from "Hanna Reitz <hreitz@redhat.com>" [marginal]
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg:          It is not certain that the signature belongs to the owner.
# Primary key fingerprint: CB62 D7A0 EE38 29E4 5F00  4D34 A1FA 40D0 9801 9CDF

* tag 'pull-block-2022-03-22' of https://gitlab.com/hreitz/qemu: (25 commits)
  iotests/207: Filter host fingerprint
  iotests.py: Filters for VM.run_job()
  iotests: make qemu_img_log and img_info_log raise on error
  iotests: remove qemu_img_pipe_and_status()
  iotests: replace qemu_img_log('create', ...) calls
  iotests: use qemu_img() in has_working_luks()
  iotests: remove remaining calls to qemu_img_pipe()
  iotests/149: Remove qemu_img_pipe() call
  iotests: replace unchecked calls to qemu_img_pipe()
  iotests: change supports_quorum to use qemu_img
  iotests: add qemu_img_map() function
  iotests/remove-bitmap-from-backing: use qemu_img_info()
  iotests: add qemu_img_info()
  iotests: use qemu_img_json() when applicable
  iotests: add qemu_img_json()
  iotests: fortify compare_images() against crashes
  iotests: make qemu_img raise on non-zero rc by default
  iotests: Remove explicit checks for qemu_img() == 0
  python/utils: add VerboseProcessError
  python/utils: add add_visual_margin() text decoration utility
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2022-03-22 12:44:11 +00:00
Marc-André Lureau
c08401793a compiler.h: replace QEMU_WARN_UNUSED_RESULT with G_GNUC_WARN_UNUSED_RESULT
One less qemu-specific macro. It also helps to make some headers/units
only depend on glib, and thus moved in standalone projects eventually.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
2022-03-22 14:40:51 +04:00
Marc-André Lureau
9edc6313da Replace GCC_FMT_ATTR with G_GNUC_PRINTF
One less qemu-specific macro. It also helps to make some headers/units
only depend on glib, and thus moved in standalone projects eventually.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Richard W.M. Jones <rjones@redhat.com>
2022-03-22 14:40:51 +04:00
Stefano Garzarella
cc5387a544 block/rbd: fix write zeroes with growing images
Commit d24f80234b ("block/rbd: increase dynamically the image size")
added a workaround to support growing images (eg. qcow2), resizing
the image before write operations that exceed the current size.

We recently added support for write zeroes and without the
workaround we can have problems with qcow2.

So let's move the resize into qemu_rbd_start_co() and do it when
the command is RBD_AIO_WRITE or RBD_AIO_WRITE_ZEROES.

Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2020993
Fixes: c56ac27d2a ("block/rbd: add write zeroes support")
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-Id: <20220317162638.41192-1-sgarzare@redhat.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
2022-03-22 09:40:54 +01:00
Rao Lei
6690302b84 block/nbd.c: Fixed IO request coroutine not being wakeup when kill NBD server
During the IO stress test, the IO request coroutine has a probability that is
can't be awakened when the NBD server is killed.

The GDB stack is as follows:
(gdb) bt
0  0x00007f2ff990cbf6 in __ppoll (fds=0x55575de85000, nfds=1, timeout=<optimized out>, sigmask=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:44
1  0x000055575c302e7c in qemu_poll_ns (fds=0x55575de85000, nfds=1, timeout=599999603140) at ../util/qemu-timer.c:348
2  0x000055575c2d3c34 in fdmon_poll_wait (ctx=0x55575dc480f0, ready_list=0x7ffd9dd1dae0, timeout=599999603140) at ../util/fdmon-poll.c:80
3  0x000055575c2d350d in aio_poll (ctx=0x55575dc480f0, blocking=true) at ../util/aio-posix.c:655
4  0x000055575c16eabd in bdrv_do_drained_begin(bs=0x55575dee7fe0, recursive=false, parent=0x0, ignore_bds_parents=false, poll=true)at ../block/io.c:474
5  0x000055575c16eba6 in bdrv_drained_begin (bs=0x55575dee7fe0) at ../block/io.c:480
6  0x000055575c1aff33 in quorum_del_child (bs=0x55575dee7fe0, child=0x55575dcea690, errp=0x7ffd9dd1dd08) at ../block/quorum.c:1130
7  0x000055575c14239b in bdrv_del_child (parent_bs=0x55575dee7fe0, child=0x55575dcea690, errp=0x7ffd9dd1dd08) at ../block.c:7705
8  0x000055575c12da28 in qmp_x_blockdev_change(parent=0x55575df404c0 "colo-disk0", has_child=true, child=0x55575de867f0 "children.1", has_node=false, no   de=0x0, errp=0x7ffd9dd1dd08) at ../blockdev.c:3676
9  0x000055575c258435 in qmp_marshal_x_blockdev_change (args=0x7f2fec008190, ret=0x7f2ff7b0bd98, errp=0x7f2ff7b0bd90) at qapi/qapi-commands-block-core.c   :1675
10 0x000055575c2c6201 in do_qmp_dispatch_bh (opaque=0x7f2ff7b0be30) at ../qapi/qmp-dispatch.c:129
11 0x000055575c2ebb1c in aio_bh_call (bh=0x55575dc429c0) at ../util/async.c:141
12 0x000055575c2ebc2a in aio_bh_poll (ctx=0x55575dc480f0) at ../util/async.c:169
13 0x000055575c2d2d96 in aio_dispatch (ctx=0x55575dc480f0) at ../util/aio-posix.c:415
14 0x000055575c2ec07f in aio_ctx_dispatch (source=0x55575dc480f0, callback=0x0, user_data=0x0) at ../util/async.c:311
15 0x00007f2ff9e7cfbd in g_main_context_dispatch () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
16 0x000055575c2fd581 in glib_pollfds_poll () at ../util/main-loop.c:232
17 0x000055575c2fd5ff in os_host_main_loop_wait (timeout=0) at ../util/main-loop.c:255
18 0x000055575c2fd710 in main_loop_wait (nonblocking=0) at ../util/main-loop.c:531
19 0x000055575bfa7588 in qemu_main_loop () at ../softmmu/runstate.c:726
20 0x000055575bbee57a in main (argc=60, argv=0x7ffd9dd1e0e8, envp=0x7ffd9dd1e2d0) at ../softmmu/main.c:50

(gdb) qemu coroutine 0x55575e16aac0
0  0x000055575c2ee7dc in qemu_coroutine_switch (from_=0x55575e16aac0, to_=0x7f2ff830fba0, action=COROUTINE_YIELD) at ../util/coroutine-ucontext.c:302
1  0x000055575c2fe2a9 in qemu_coroutine_yield () at ../util/qemu-coroutine.c:195
2  0x000055575c2fe93c in qemu_co_queue_wait_impl (queue=0x55575dc46170, lock=0x7f2b32ad9850) at ../util/qemu-coroutine-lock.c:56
3  0x000055575c17ddfb in nbd_co_send_request (bs=0x55575ebfaf20, request=0x7f2b32ad9920, qiov=0x55575dfc15d8) at ../block/nbd.c:478
4  0x000055575c17f931 in nbd_co_request (bs=0x55575ebfaf20, request=0x7f2b32ad9920, write_qiov=0x55575dfc15d8) at ../block/nbd.c:1182
5  0x000055575c17fe14 in nbd_client_co_pwritev (bs=0x55575ebfaf20, offset=403487858688, bytes=4538368, qiov=0x55575dfc15d8, flags=0) at ../block/nbd.c:1284
6  0x000055575c170d25 in bdrv_driver_pwritev (bs=0x55575ebfaf20, offset=403487858688, bytes=4538368, qiov=0x55575dfc15d8, qiov_offset=0, flags=0)
    at ../block/io.c:1264
7  0x000055575c1733b4 in bdrv_aligned_pwritev
    (child=0x55575dff6890, req=0x7f2b32ad9ad0, offset=403487858688, bytes=4538368, align=1, qiov=0x55575dfc15d8, qiov_offset=0, flags=0) at ../block/io.c:2126
8  0x000055575c173c67 in bdrv_co_pwritev_part (child=0x55575dff6890, offset=403487858688, bytes=4538368, qiov=0x55575dfc15d8, qiov_offset=0, flags=0)
    at ../block/io.c:2314
9  0x000055575c17391b in bdrv_co_pwritev (child=0x55575dff6890, offset=403487858688, bytes=4538368, qiov=0x55575dfc15d8, flags=0) at ../block/io.c:2233
10 0x000055575c1ee506 in replication_co_writev (bs=0x55575e9824f0, sector_num=788062224, remaining_sectors=8864, qiov=0x55575dfc15d8, flags=0)
    at ../block/replication.c:270
11 0x000055575c170eed in bdrv_driver_pwritev (bs=0x55575e9824f0, offset=403487858688, bytes=4538368, qiov=0x55575dfc15d8, qiov_offset=0, flags=0)
    at ../block/io.c:1297
12 0x000055575c1733b4 in bdrv_aligned_pwritev
    (child=0x55575dcea690, req=0x7f2b32ad9e00, offset=403487858688, bytes=4538368, align=512, qiov=0x55575dfc15d8, qiov_offset=0, flags=0)
    at ../block/io.c:2126
13 0x000055575c173c67 in bdrv_co_pwritev_part (child=0x55575dcea690, offset=403487858688, bytes=4538368, qiov=0x55575dfc15d8, qiov_offset=0, flags=0)
    at ../block/io.c:2314
14 0x000055575c17391b in bdrv_co_pwritev (child=0x55575dcea690, offset=403487858688, bytes=4538368, qiov=0x55575dfc15d8, flags=0) at ../block/io.c:2233
15 0x000055575c1aeffa in write_quorum_entry (opaque=0x7f2fddaf8c50) at ../block/quorum.c:699
16 0x000055575c2ee4db in coroutine_trampoline (i0=1578543808, i1=21847) at ../util/coroutine-ucontext.c:173
17 0x00007f2ff9855660 in __start_context () at ../sysdeps/unix/sysv/linux/x86_64/__start_context.S:91

When we do failover in COLO mode, QEMU will hang while it is waiting for
the in-flight IO. From the call trace, we can see the IO request coroutine
has yielded in nbd_co_send_request(). When we kill the NBD server, it will never
be wake up. Actually, when we do IO stress test, it will have a lot of
requests in free_sema queue. When the NBD server is killed, current
MAX_NBD_REQUESTS finishes with errors but they wake up at most
MAX_NBD_REQEUSTS from the queue. So, let's move qemu_co_queue_next out
to fix this issue.

Signed-off-by: Lei Rao <lei.rao@intel.com>
Message-Id: <20220309074844.275450-1-lei.rao@intel.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2022-03-21 15:21:09 -05:00
Philippe Mathieu-Daudé
aa44d3f6b8 block/file-posix: Remove a deprecation warning on macOS 12
When building on macOS 12 we get:

  block/file-posix.c:3335:18: warning: 'IOMasterPort' is deprecated: first deprecated in macOS 12.0 [-Wdeprecated-declarations]
      kernResult = IOMasterPort( MACH_PORT_NULL, &masterPort );
                   ^~~~~~~~~~~~
                   IOMainPort

Replace by IOMainPort, redefining it to IOMasterPort if not available.

Suggested-by: Akihiko Odaki <akihiko.odaki@gmail.com>
Reviewed-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
Reviewed by: Cameron Esfahani <dirty@apple.com>
Reviewed-by: Akihiko Odaki <akihiko.odaki@gmail.com>
Tested-by: Akihiko Odaki <akihiko.odaki@gmail.com>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
2022-03-15 13:36:33 +01:00