Commit Graph

216 Commits

Author SHA1 Message Date
Stefan Hajnoczi
bfa36802d1 virtio-blk: avoid using ioeventfd state in irqfd conditional
Requests that complete in an IOThread use irqfd to notify the guest
while requests that complete in the main loop thread use the traditional
qdev irq code path. The reason for this conditional is that the irq code
path requires the BQL:

  if (s->ioeventfd_started && !s->ioeventfd_disabled) {
      virtio_notify_irqfd(vdev, req->vq);
  } else {
      virtio_notify(vdev, req->vq);
  }

There is a corner case where the conditional invokes the irq code path
instead of the irqfd code path:

  static void virtio_blk_stop_ioeventfd(VirtIODevice *vdev)
  {
      ...
      /*
       * Set ->ioeventfd_started to false before draining so that host notifiers
       * are not detached/attached anymore.
       */
      s->ioeventfd_started = false;

      /* Wait for virtio_blk_dma_restart_bh() and in flight I/O to complete */
      blk_drain(s->conf.conf.blk);

During blk_drain() the conditional produces the wrong result because
ioeventfd_started is false.

Use qemu_in_iothread() instead of checking the ioeventfd state.

Cc: qemu-stable@nongnu.org
Buglink: https://issues.redhat.com/browse/RHEL-15394
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20240122172625.415386-1-stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2024-02-08 09:37:33 +01:00
Hanna Czenczek
52bff01f64 virtio-blk: Use ioeventfd_attach in start_ioeventfd
Commit d3f6f294ae ("virtio-blk: always set
ioeventfd during startup") has made virtio_blk_start_ioeventfd() always
kick the virtqueue (set the ioeventfd), regardless of whether the BB is
drained.  That is no longer necessary, because attaching the host
notifier will now set the ioeventfd, too; this happens either
immediately right here in virtio_blk_start_ioeventfd(), or later when
the drain ends, in virtio_blk_ioeventfd_attach().

With event_notifier_set() removed, the code becomes the same as the one
in virtio_blk_ioeventfd_attach(), so we can reuse that function.

Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
Message-ID: <20240202153158.788922-4-hreitz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2024-02-07 21:51:06 +01:00
Stefan Hajnoczi
b3d9bb9a56 virtio-blk: do not use C99 mixed declarations
QEMU's coding style generally forbids C99 mixed declarations.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20240206140410.65650-1-stefanha@redhat.com>
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2024-02-07 15:08:26 +01:00
Stefan Hajnoczi
f2eea93c6b virtio-blk: add vq_rq[] bounds check in virtio_blk_dma_restart_cb()
Hanna Czenczek <hreitz@redhat.com> noted that the array index in
virtio_blk_dma_restart_cb() is not bounds-checked:

  g_autofree VirtIOBlockReq **vq_rq = g_new0(VirtIOBlockReq *, num_queues);
  ...
  while (rq) {
      VirtIOBlockReq *next = rq->next;
      uint16_t idx = virtio_get_queue_index(rq->vq);

      rq->next = vq_rq[idx];
                 ^^^^^^^^^^

The code is correct because both rq->vq and vq_rq[] depend on
num_queues, but this is indirect and not 100% obvious. Add an assertion.

Suggested-by: Hanna Czenczek <hreitz@redhat.com>
Reviewed-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20240206190610.107963-4-stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2024-02-07 14:44:13 +01:00
Stefan Hajnoczi
5fbcbd50fc virtio-blk: clarify that there is at least 1 virtqueue
It is not possible to instantiate a virtio-blk device with 0 virtqueues.
The following check is located in ->realize():

  if (!conf->num_queues) {
      error_setg(errp, "num-queues property must be larger than 0");
      return;
  }

Later on we access s->vq_aio_context[0] under the assumption that there
is as least one virtqueue. Hanna Czenczek <hreitz@redhat.com> noted that
it would help to show that the array index is already valid.

Add an assertion to document that s->vq_aio_context[0] is always
safe...and catch future code changes that break this assumption.

Suggested-by: Hanna Czenczek <hreitz@redhat.com>
Reviewed-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20240206190610.107963-3-stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2024-02-07 14:44:09 +01:00
Stefan Hajnoczi
1f995a4782 virtio-blk: enforce iothread-vq-mapping validation
Hanna Czenczek <hreitz@redhat.com> noticed that the safety of
`vq_aio_context[vq->value] = ctx;` with user-defined vq->value inputs is
not obvious.

The code is structured in validate() + apply() steps so input validation
is there, but it happens way earlier and there is nothing that
guarantees apply() can only be called with validated inputs.

This patch moves the validate() call inside the apply() function so
validation is guaranteed. I also added the bounds checking assertion
that Hanna suggested.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
Message-ID: <20240206190610.107963-2-stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2024-02-07 14:44:05 +01:00
Stefan Hajnoczi
d3f6f294ae virtio-blk: always set ioeventfd during startup
When starting ioeventfd it is common practice to set the event notifier
so that the ioeventfd handler is triggered to run immediately. There may
be no requests waiting to be processed, but the idea is that if a
request snuck in then we guarantee that it will be detected.

One scenario where self-triggering the ioeventfd is necessary is when
virtio_blk_handle_output() is called from a vCPU thread before the
VIRTIO Device Status transitions to DRIVER_OK. In that case we need to
self-trigger the ioeventfd so that the kick handled by the vCPU thread
causes the vq AioContext thread to take over handling the request(s).

Fixes: b6948ab01d ("virtio-blk: add iothread-vq-mapping parameter")
Reported-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20240119135748.270944-7-stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2024-01-26 11:16:58 +01:00
Stefan Hajnoczi
ea0736d7f8 virtio-blk: tolerate failure to set BlockBackend AioContext
We no longer rely on setting the AioContext since the block layer
IO_CODE APIs can be called from any thread. Now it's just a hint to help
block jobs and other operations co-locate themselves in a thread with
the guest I/O requests. Keep going if setting the AioContext fails.

Suggested-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20240119135748.270944-6-stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2024-01-26 11:16:58 +01:00
Stefan Hajnoczi
71ee0cdd14 virtio-blk: restart s->rq reqs in vq AioContexts
A virtio-blk device with the iothread-vq-mapping parameter has
per-virtqueue AioContexts. It is not thread-safe to process s->rq
requests in the BlockBackend AioContext since that may be different from
the virtqueue's AioContext to which this request belongs. The code
currently races and could crash.

Adapt virtio_blk_dma_restart_cb() to first split s->rq into per-vq lists
and then schedule a BH each vq's AioContext as necessary. This way
requests are safely processed in their vq's AioContext.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20240119135748.270944-5-stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2024-01-26 11:16:58 +01:00
Stefan Hajnoczi
3cdaf3dd4a virtio-blk: rename dataplane to ioeventfd
The dataplane code is really about using ioeventfd. It's used both for
IOThreads (what we think of as dataplane) and for the core virtio-pci
code's ioeventfd feature (which is enabled by default and used when no
IOThread has been specified). Rename the code to reflect this.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20240119135748.270944-4-stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2024-01-26 11:16:58 +01:00
Stefan Hajnoczi
57bc265893 virtio-blk: rename dataplane create/destroy functions
virtio_blk_data_plane_create() and virtio_blk_data_plane_destroy() are
actually about s->vq_aio_context[] rather than managing
dataplane-specific state.

As a prerequisite to using s->vq_aio_context[] in all code paths (even
when dataplane is not used), rename these functions to reflect that they
just manage s->vq_aio_context and call them regardless of whether or not
dataplane is in use.

Note that virtio-blk supports running with -device
virtio-blk-pci,ioevent=off where the vCPU thread enters the device
emulation code. In this mode ioeventfd is not used for virtqueue
processing. However, we still want to initialize s->vq_aio_context[] to
qemu_aio_context in that case since I/O completion callbacks will be
invoked in the main loop thread.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20240119135748.270944-3-stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2024-01-26 11:16:58 +01:00
Stefan Hajnoczi
3bcc17f065 virtio-blk: move dataplane code into virtio-blk.c
The dataplane code used to be significantly different from the
non-dataplane code and therefore had a separate source file.

Over time the difference has gotten smaller because the I/O code paths
were unified. Nowadays the distinction between the VirtIOBlock and
VirtIOBlockDataPlane structs is more of an inconvenience that hinders
code simplification.

Move hw/block/dataplane/virtio-blk.c into hw/block/virtio-blk.c, merging
VirtIOBlockDataPlane's fields into VirtIOBlock.

hw/block/virtio-blk.c used VirtIOBlock->dataplane to check if
virtio_blk_data_plane_create() was successful. This is not necessary
because ->dataplane_started and ->dataplane_disabled can be used
instead. This patch makes those changes in order to drop
VirtIOBlock->dataplane.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20240119135748.270944-2-stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2024-01-26 11:16:58 +01:00
Stefan Hajnoczi
0b2675c473 Rename "QEMU global mutex" to "BQL" in comments and docs
The term "QEMU global mutex" is identical to the more widely used Big
QEMU Lock ("BQL"). Update the code comments and documentation to use
"BQL" instead of "QEMU global mutex".

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Paul Durrant <paul@xen.org>
Reviewed-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
Message-id: 20240102153529.486531-6-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2024-01-08 10:45:43 -05:00
Richard Henderson
7d5dc0a367 hw/block: Constify VMState
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20231221031652.119827-25-richard.henderson@linaro.org>
2023-12-29 11:17:30 +11:00
Stefan Hajnoczi
b6948ab01d virtio-blk: add iothread-vq-mapping parameter
Add the iothread-vq-mapping parameter to assign virtqueues to IOThreads.
Store the vq:AioContext mapping in the new struct
VirtIOBlockDataPlane->vq_aio_context[] field and refactor the code to
use the per-vq AioContext instead of the BlockDriverState's AioContext.

Reimplement --device virtio-blk-pci,iothread= and non-IOThread mode by
assigning all virtqueues to the IOThread and main loop's AioContext in
vq_aio_context[], respectively.

The comment in struct VirtIOBlockDataPlane about EventNotifiers is
stale. Remove it.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20231220134755.814917-5-stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-12-21 22:49:28 +01:00
Stefan Hajnoczi
b49f4755c7 block: remove AioContext locking
This is the big patch that removes
aio_context_acquire()/aio_context_release() from the block layer and
affected block layer users.

There isn't a clean way to split this patch and the reviewers are likely
the same group of people, so I decided to do it in one patch.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Message-ID: <20231205182011.1976568-7-stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-12-21 22:49:27 +01:00
Stefan Hajnoczi
6b5aa3a0fd virtio-blk: don't lock AioContext in the submission code path
There is no need to acquire the AioContext lock around blk_aio_*() or
blk_get_geometry() anymore. I/O plugging (defer_call()) also does not
require the AioContext lock anymore.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20230914140101.1065008-5-stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-12-21 22:49:27 +01:00
Stefan Hajnoczi
c1135913c6 virtio-blk: don't lock AioContext in the completion code path
Nothing in the completion code path relies on the AioContext lock
anymore. Virtqueues are only accessed from one thread at any moment and
the s->rq global state is protected by its own lock now.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20230914140101.1065008-4-stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-12-21 22:49:27 +01:00
Stefan Hajnoczi
9c67f33fca virtio-blk: add lock to protect s->rq
s->rq is accessed from IO_CODE and GLOBAL_STATE_CODE. Introduce a lock
to protect s->rq and eliminate reliance on the AioContext lock.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20230914140101.1065008-3-stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-12-21 22:49:27 +01:00
Stefan Hajnoczi
433fcea40c util/defer-call: move defer_call() to util/
The networking subsystem may wish to use defer_call(), so move the code
to util/ where it can be reused.

As a reminder of what defer_call() does:

This API defers a function call within a defer_call_begin()/defer_call_end()
section, allowing multiple calls to batch up. This is a performance
optimization that is used in the block layer to submit several I/O requests
at once instead of individually:

  defer_call_begin(); <-- start of section
  ...
  defer_call(my_func, my_obj); <-- deferred my_func(my_obj) call
  defer_call(my_func, my_obj); <-- another
  defer_call(my_func, my_obj); <-- another
  ...
  defer_call_end(); <-- end of section, my_func(my_obj) is called once

Suggested-by: Ilya Maximets <i.maximets@ovn.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20230913200045.1024233-3-stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-10-31 15:41:42 +01:00
Stefan Hajnoczi
ccee48aa73 block: rename blk_io_plug_call() API to defer_call()
Prepare to move the blk_io_plug_call() API out of the block layer so
that other subsystems call use this deferred call mechanism. Rename it
to defer_call() but leave the code in block/plug.c.

The next commit will move the code out of the block layer.

Suggested-by: Ilya Maximets <i.maximets@ovn.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Paul Durrant <paul@xen.org>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20230913200045.1024233-2-stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-10-31 15:41:24 +01:00
Stefan Hajnoczi
41abca8c39 block: add blk_io_plug_call() API
Introduce a new API for thread-local blk_io_plug() that does not
traverse the block graph. The goal is to make blk_io_plug() multi-queue
friendly.

Instead of having block drivers track whether or not we're in a plugged
section, provide an API that allows them to defer a function call until
we're unplugged: blk_io_plug_call(fn, opaque). If blk_io_plug_call() is
called multiple times with the same fn/opaque pair, then fn() is only
called once at the end of the function - resulting in batching.

This patch introduces the API and changes blk_io_plug()/blk_io_unplug().
blk_io_plug()/blk_io_unplug() no longer require a BlockBackend argument
because the plug state is now thread-local.

Later patches convert block drivers to blk_io_plug_call() and then we
can finally remove .bdrv_co_io_plug() once all block drivers have been
converted.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Acked-by: Kevin Wolf <kwolf@redhat.com>
Message-id: 20230530180959.1108766-2-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-06-01 07:34:03 -04:00
Stefan Hajnoczi
1665d9326f virtio-blk: implement BlockDevOps->drained_begin()
Detach ioeventfds during drained sections to stop I/O submission from
the guest. virtio-blk is no longer reliant on aio_disable_external()
after this patch. This will allow us to remove the
aio_disable_external() API once all other code that relies on it is
converted.

Take extra care to avoid attaching/detaching ioeventfds if the data
plane is started/stopped during a drained section. This should be rare,
but maybe the mirror block job can trigger it.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20230516190238.8401-18-stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-05-30 17:32:02 +02:00
Sam Li
4e92acf74d virtio-blk: add some trace events for zoned emulation
Signed-off-by: Sam Li <faithilikerun@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20230508051916.178322-4-faithilikerun@gmail.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-05-15 08:18:10 -04:00
Sam Li
52eb76f4b1 block: add accounting for zone append operation
Taking account of the new zone append write operation for zoned devices,
BLOCK_ACCT_ZONE_APPEND enum is introduced as other I/O request type (read,
write, flush).

Signed-off-by: Sam Li <faithilikerun@gmail.com>
Message-id: 20230508051916.178322-3-faithilikerun@gmail.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-05-15 08:18:10 -04:00
Sam Li
4f7366506a virtio-blk: add zoned storage emulation for zoned devices
This patch extends virtio-blk emulation to handle zoned device commands
by calling the new block layer APIs to perform zoned device I/O on
behalf of the guest. It supports Report Zone, four zone oparations (open,
close, finish, reset), and Append Zone.

The VIRTIO_BLK_F_ZONED feature bit will only be set if the host does
support zoned block devices. Regular block devices(conventional zones)
will not be set.

The guest os can use blktests, fio to test those commands on zoned devices.
Furthermore, using zonefs to test zone append write is also supported.

Signed-off-by: Sam Li <faithilikerun@gmail.com>
Message-id: 20230508051916.178322-2-faithilikerun@gmail.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-05-15 08:18:10 -04:00
Emanuele Giuseppe Esposito
1f433e84c3 virtio-blk: add missing AioContext lock
virtio_blk_update_config() calls blk_get_geometry and blk_getlength,
and both functions eventually end up calling bdrv_poll_co when not
running in a coroutine:
- blk_getlength is a co_wrapper_mixed function
- blk_get_geometry calls bdrv_get_geometry -> bdrv_nb_sectors, a
  co_wrapper_mixed function too

Since we are not running in a coroutine, we need to take s->blk
AioContext lock, otherwise bdrv_poll_co will inevitably call
AIO_WAIT_WHILE and therefore try to un unlock() an AioContext lock
that was never acquired.

RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=2167838

Steps to reproduce the issue: simply boot a VM with
-object '{"qom-type":"iothread","id":"iothread1"}' \
-blockdev '{"driver":"file","filename":"$QCOW2","aio":"native","node-name":"libvirt-1-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-1-format","read-only":false,"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":"libvirt-1-storage"}' \
-device virtio-blk-pci,iothread=iothread1,drive=libvirt-1-format,id=virtio-disk0,bootindex=1,write-cache=on

and observe that it will fail not manage to boot with "qemu_mutex_unlock_impl: Operation not permitted"

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Lukáš Doktor <ldoktor@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20230208111148.1040083-1-eesposit@redhat.com>
2023-02-09 10:21:23 -05:00
Stefan Hajnoczi
a937f8e857 virtio-blk: simplify virtio_blk_dma_restart_cb()
virtio_blk_dma_restart_cb() is tricky because the BH must deal with
virtio_blk_data_plane_start()/virtio_blk_data_plane_stop() being called.

There are two issues with the code:

1. virtio_blk_realize() should use qdev_add_vm_change_state_handler()
   instead of qemu_add_vm_change_state_handler(). This ensures the
   ordering with virtio_init()'s vm change state handler that calls
   virtio_blk_data_plane_start()/virtio_blk_data_plane_stop() is
   well-defined. Then blk's AioContext is guaranteed to be up-to-date in
   virtio_blk_dma_restart_cb() and it's no longer necessary to have a
   special case for virtio_blk_data_plane_start().

2. Only blk_drain() waits for virtio_blk_dma_restart_cb()'s
   blk_inc_in_flight() to be decremented. The bdrv_drain() family of
   functions do not wait for BlockBackend's in_flight counter to reach
   zero. virtio_blk_data_plane_stop() relies on blk_set_aio_context()'s
   implicit drain, but that's a bdrv_drain() and not a blk_drain().
   Note that virtio_blk_reset() already correctly relies on blk_drain().
   If virtio_blk_data_plane_stop() switches to blk_drain() then we can
   properly wait for pending virtio_blk_dma_restart_bh() calls.

Once these issues are taken care of the code becomes simpler. This
change is in preparation for multiple IOThreads in virtio-blk where we
need to clean up the multi-threading behavior.

I ran the reproducer from commit 49b44549ac ("virtio-blk: On restart,
process queued requests in the proper context") to check that there is
no regression.

Cc: Sergio Lopez <slp@redhat.com>
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Message-id: 20221102182337.252202-1-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-01-23 15:01:23 -05:00
Stefan Hajnoczi
baf422684d virtio-blk: use BDRV_REQ_REGISTERED_BUF optimization hint
Register guest RAM using BlockRAMRegistrar and set the
BDRV_REQ_REGISTERED_BUF flag so block drivers can optimize memory
accesses in I/O requests.

This is for vdpa-blk, vhost-user-blk, and other I/O interfaces that rely
on DMA mapping/unmapping.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Message-id: 20221013185908.1297568-14-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2022-10-26 14:56:42 -04:00
Daniil Tatianin
d9cf55a86d virtio-blk: move config size params to virtio-blk-common
This way we can reuse it for other virtio-blk devices, e.g
vhost-user-blk, which currently does not control its config space size
dynamically.

Signed-off-by: Daniil Tatianin <d-tatianin@yandex-team.ru>
Reviewed-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
Message-Id: <20220906073111.353245-3-d-tatianin@yandex-team.ru>
Reviewed-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-10-07 09:41:51 -04:00
Daniil Tatianin
d74c30c811 virtio: introduce VirtIOConfigSizeParams & virtio_get_config_size
This is the first step towards moving all device config size calculation
logic into the virtio core code. In particular, this adds a struct that
contains all the necessary information for common virtio code to be able
to calculate the final config size for a device. This is expected to be
used with the new virtio_get_config_size helper, which calculates the
final length based on the provided host features.

This builds on top of already existing code like VirtIOFeature and
virtio_feature_get_config_size(), but adds additional fields, as well as
sanity checking so that device-specifc code doesn't have to duplicate it.

An example usage would be:

    static const VirtIOFeature dev_features[] = {
        {.flags = 1ULL << FEATURE_1_BIT,
         .end = endof(struct virtio_dev_config, feature_1)},
        {.flags = 1ULL << FEATURE_2_BIT,
         .end = endof(struct virtio_dev_config, feature_2)},
        {}
    };

    static const VirtIOConfigSizeParams dev_cfg_size_params = {
        .min_size = DEV_BASE_CONFIG_SIZE,
        .max_size = sizeof(struct virtio_dev_config),
        .feature_sizes = dev_features
    };

    // code inside my_dev_device_realize()
    size_t config_size = virtio_get_config_size(&dev_cfg_size_params,
                                                host_features);
    virtio_init(vdev, VIRTIO_ID_MYDEV, config_size);

Currently every device is expected to write its own boilerplate from the
example above in device_realize(), however, the next step of this
transition is moving VirtIOConfigSizeParams into VirtioDeviceClass,
so that it can be done automatically by the virtio initialization code.

All of the users of virtio_feature_get_config_size have been converted
to use virtio_get_config_size so it's no longer needed and is removed
with this commit.

Signed-off-by: Daniil Tatianin <d-tatianin@yandex-team.ru>
Message-Id: <20220906073111.353245-2-d-tatianin@yandex-team.ru>
Reviewed-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-10-07 09:41:51 -04:00
Stefan Hajnoczi
1ab5096b3a block: get rid of blk->guest_block_size
Commit 1b7fd72955 ("block: rename buffer_alignment to
guest_block_size") noted:

  At this point, the field is set by the device emulation, but completely
  ignored by the block layer.

The last time the value of buffer_alignment/guest_block_size was
actually used was before commit 339064d506 ("block: Don't use guest
sector size for qemu_blockalign()").

This value has not been used since 2013. Get rid of it.

Cc: Xie Yongji <xieyongji@bytedance.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20220518130945.2657905-1-stefanha@redhat.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alberto Faria <afaria@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2022-06-24 17:07:06 +02:00
Jonah Palmer
3857cd5c7f virtio: drop name parameter for virtio_init()
This patch drops the name parameter for the virtio_init function.

The pair between the numeric device ID and the string device ID
(name) of a virtio device already exists, but not in a way that
lets us map between them.

This patch lets us do this and removes the need for the name
parameter in the virtio_init function.

Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com>
Message-Id: <1648819405-25696-2-git-send-email-jonah.palmer@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-05-16 04:38:40 -04:00
Kevin Wolf
98e3ab3505 coroutine: Rename qemu_coroutine_inc/dec_pool_size()
It's true that these functions currently affect the batch size in which
coroutines are reused (i.e. moved from the global release pool to the
allocation pool of a specific thread), but this is a bug and will be
fixed in a separate patch.

In fact, the comment in the header file already just promises that it
influences the pool size, so reflect this in the name of the functions.
As a nice side effect, the shorter function name makes some line
wrapping unnecessary.

Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20220510151020.105528-2-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2022-05-12 12:20:45 +02:00
Hiroki Narukawa
4c41c69e05 util: adjust coroutine pool size to virtio block queue
Coroutine pool size was 64 from long ago, and the basis was organized in the commit message in 4d68e86b.

At that time, virtio-blk queue-size and num-queue were not configuable, and equivalent values were 128 and 1.

Coroutine pool size 64 was fine then.

Later queue-size and num-queue got configuable, and default values were increased.

Coroutine pool with size 64 exhausts frequently with random disk IO in new size, and slows down.

This commit adjusts coroutine pool size adaptively with new values.

This commit adds 64 by default, but now coroutine is not only for block devices,

and is not too much burdon comparing with new default.

pool size of 128 * vCPUs.

Signed-off-by: Hiroki Narukawa <hnarukaw@yahoo-corp.jp>
Message-id: 20220214115302.13294-2-hnarukaw@yahoo-corp.jp
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2022-02-14 17:11:25 +00:00
Stefan Hajnoczi
186b969173 virtio-blk: drop unused virtio_blk_handle_vq() return value
The return value of virtio_blk_handle_vq() is no longer used. Get rid of
it. This is a step towards unifying the dataplane and non-dataplane
virtqueue handler functions.

Prepare virtio_blk_handle_output() to be used by both dataplane and
non-dataplane by making the condition for starting ioeventfd more
specific. This way it won't trigger when dataplane has already been
started.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Message-id: 20211207132336.36627-4-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2022-01-12 17:09:39 +00:00
Philippe Mathieu-Daudé
01ce7724a1 virtio-blk: Constify VirtIOFeature feature_sizes[]
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20210511104157.2880306-3-philmd@redhat.com>
2021-05-14 08:12:09 -04:00
Akihiko Odaki
fb0b154c80 virtio-blk: Respect discard granularity
Report the configured granularity for discard operation to the
guest. If this is not set use the block size.

Since until now we have ignored the configured discard granularity
and always reported the block size, let's add
'report-discard-granularity' property and disable it for older
machine types to avoid migration issues.

Signed-off-by: Akihiko Odaki <akihiko.odaki@gmail.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20210225001239.47046-1-akihiko.odaki@gmail.com>
2021-03-15 09:48:53 +00:00
Philippe Mathieu-Daudé
538f049704 sysemu: Let VMChangeStateHandler take boolean 'running' argument
The 'running' argument from VMChangeStateHandler does not require
other value than 0 / 1. Make it a plain boolean.

Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <20210111152020.1422021-3-philmd@redhat.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2021-03-09 23:13:57 +01:00
Kevin Wolf
86b1cf3227 block: Separate blk_is_writable() and blk_supports_write_perm()
Currently, blk_is_read_only() tells whether a given BlockBackend can
only be used in read-only mode because its root node is read-only. Some
callers actually try to answer a slightly different question: Is the
BlockBackend configured to be writable, by taking write permissions on
the root node?

This can differ, for example, for CD-ROM devices which don't take write
permissions, but may be backed by a writable image file. scsi-cd allows
write requests to the drive if blk_is_read_only() returns false.
However, the write request will immediately run into an assertion
failure because the write permission is missing.

This patch introduces separate functions for both questions.
blk_supports_write_perm() answers the question whether the block
node/image file can support writable devices, whereas blk_is_writable()
tells whether the BlockBackend is currently configured to be writable.

All calls of blk_is_read_only() are converted to one of the two new
functions.

Fixes: https://bugs.launchpad.net/bugs/1906693
Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20210118123448.307825-2-kwolf@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-01-27 20:45:20 +01:00
Stefan Hajnoczi
7bd04a041a virtio-blk: undo destructive iov_discard_*() operations
Fuzzing discovered that virtqueue_unmap_sg() is being called on modified
req->in/out_sg iovecs. This means dma_memory_map() and
dma_memory_unmap() calls do not have matching memory addresses.

Fuzzing discovered that non-RAM addresses trigger a bug:

  void address_space_unmap(AddressSpace *as, void *buffer, hwaddr len,
                           bool is_write, hwaddr access_len)
  {
      if (buffer != bounce.buffer) {
          ^^^^^^^^^^^^^^^^^^^^^^^

A modified iov->iov_base is no longer recognized as a bounce buffer and
the wrong branch is taken.

There are more potential bugs: dirty memory is not tracked correctly and
MemoryRegion refcounts can be leaked.

Use the new iov_discard_undo() API to restore elem->in/out_sg before
virtqueue_push() is called.

Fixes: 827805a249 ("virtio-blk: Convert VirtIOBlockReq.out to structrue")
Reported-by: Alexander Bulekov <alxndr@bu.edu>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Li Qiang <liq3ea@gmail.com>
Buglink: https://bugs.launchpad.net/qemu/+bug/1890360
Message-Id: <20200917094455.822379-3-stefanha@redhat.com>
2020-09-23 13:41:58 +01:00
Stefan Hajnoczi
9445e1e15e virtio-blk-pci: default num_queues to -smp N
Automatically size the number of virtio-blk-pci request virtqueues to
match the number of vCPUs.  Other transports continue to default to 1
request virtqueue.

A 1:1 virtqueue:vCPU mapping ensures that completion interrupts are
handled on the same vCPU that submitted the request.  No IPI is
necessary to complete an I/O request and performance is improved.  The
maximum number of MSI-X vectors and virtqueues limit are respected.

Performance improves from 78k to 104k IOPS on a 32 vCPU guest with 101
virtio-blk-pci devices (ioengine=libaio, iodepth=1, bs=4k, rw=randread
with NVMe storage).

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Message-Id: <20200818143348.310613-7-stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-08-27 08:29:13 -04:00
Roman Kagan
c56ee92fcb block: consolidate blocksize properties consistency checks
Several block device properties related to blocksize configuration must
be in certain relationship WRT each other: physical block must be no
smaller than logical block; min_io_size, opt_io_size, and
discard_granularity must be a multiple of a logical block.

To ensure these requirements are met, add corresponding consistency
checks to blkconf_blocksizes, adjusting its signature to communicate
possible error to the caller.  Also remove the now redundant consistency
checks from the specific devices.

Signed-off-by: Roman Kagan <rvkagan@yandex-team.ru>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Message-Id: <20200528225516.1676602-3-rvkagan@yandex-team.ru>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-06-17 14:53:40 +02:00
Roman Kagan
6abee26085 virtio-blk: store opt_io_size with correct size
The width of opt_io_size in virtio_blk_config is 32bit.  However, it's
written with virtio_stw_p; this may result in value truncation, and on
big-endian systems with legacy virtio in completely bogus readings in
the guest.

Use the appropriate accessor to store it.

Signed-off-by: Roman Kagan <rvkagan@yandex-team.ru>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20200528225516.1676602-2-rvkagan@yandex-team.ru>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-06-17 14:53:40 +02:00
Sergio Lopez
49b44549ac virtio-blk: On restart, process queued requests in the proper context
On restart, we were scheduling a BH to process queued requests, which
would run before starting up the data plane, leading to those requests
being assigned and started on coroutines on the main context.

This could cause requests to be wrongly processed in parallel from
different threads (the main thread and the iothread managing the data
plane), potentially leading to multiple issues.

For example, stopping and resuming a VM multiple times while the guest
is generating I/O on a virtio_blk device can trigger a crash with a
stack tracing looking like this one:

<------>
 Thread 2 (Thread 0x7ff736765700 (LWP 1062503)):
 #0  0x00005567a13b99d6 in iov_memset
     (iov=0x6563617073206f4e, iov_cnt=1717922848, offset=516096, fillc=0, bytes=7018105756081554803)
     at util/iov.c:69
 #1  0x00005567a13bab73 in qemu_iovec_memset
     (qiov=0x7ff73ec99748, offset=516096, fillc=0, bytes=7018105756081554803) at util/iov.c:530
 #2  0x00005567a12f411c in qemu_laio_process_completion (laiocb=0x7ff6512ee6c0) at block/linux-aio.c:86
 #3  0x00005567a12f42ff in qemu_laio_process_completions (s=0x7ff7182e8420) at block/linux-aio.c:217
 #4  0x00005567a12f480d in ioq_submit (s=0x7ff7182e8420) at block/linux-aio.c:323
 #5  0x00005567a12f43d9 in qemu_laio_process_completions_and_submit (s=0x7ff7182e8420)
     at block/linux-aio.c:236
 #6  0x00005567a12f44c2 in qemu_laio_poll_cb (opaque=0x7ff7182e8430) at block/linux-aio.c:267
 #7  0x00005567a13aed83 in run_poll_handlers_once (ctx=0x5567a2b58c70, timeout=0x7ff7367645f8)
     at util/aio-posix.c:520
 #8  0x00005567a13aee9f in run_poll_handlers (ctx=0x5567a2b58c70, max_ns=16000, timeout=0x7ff7367645f8)
     at util/aio-posix.c:562
 #9  0x00005567a13aefde in try_poll_mode (ctx=0x5567a2b58c70, timeout=0x7ff7367645f8)
     at util/aio-posix.c:597
 #10 0x00005567a13af115 in aio_poll (ctx=0x5567a2b58c70, blocking=true) at util/aio-posix.c:639
 #11 0x00005567a109acca in iothread_run (opaque=0x5567a2b29760) at iothread.c:75
 #12 0x00005567a13b2790 in qemu_thread_start (args=0x5567a2b694c0) at util/qemu-thread-posix.c:519
 #13 0x00007ff73eedf2de in start_thread () at /lib64/libpthread.so.0
 #14 0x00007ff73ec10e83 in clone () at /lib64/libc.so.6

 Thread 1 (Thread 0x7ff743986f00 (LWP 1062500)):
 #0  0x00005567a13b99d6 in iov_memset
     (iov=0x6563617073206f4e, iov_cnt=1717922848, offset=516096, fillc=0, bytes=7018105756081554803)
     at util/iov.c:69
 #1  0x00005567a13bab73 in qemu_iovec_memset
     (qiov=0x7ff73ec99748, offset=516096, fillc=0, bytes=7018105756081554803) at util/iov.c:530
 #2  0x00005567a12f411c in qemu_laio_process_completion (laiocb=0x7ff6512ee6c0) at block/linux-aio.c:86
 #3  0x00005567a12f42ff in qemu_laio_process_completions (s=0x7ff7182e8420) at block/linux-aio.c:217
 #4  0x00005567a12f480d in ioq_submit (s=0x7ff7182e8420) at block/linux-aio.c:323
 #5  0x00005567a12f4a2f in laio_do_submit (fd=19, laiocb=0x7ff5f4ff9ae0, offset=472363008, type=2)
     at block/linux-aio.c:375
 #6  0x00005567a12f4af2 in laio_co_submit
     (bs=0x5567a2b8c460, s=0x7ff7182e8420, fd=19, offset=472363008, qiov=0x7ff5f4ff9ca0, type=2)
     at block/linux-aio.c:394
 #7  0x00005567a12f1803 in raw_co_prw
     (bs=0x5567a2b8c460, offset=472363008, bytes=20480, qiov=0x7ff5f4ff9ca0, type=2)
     at block/file-posix.c:1892
 #8  0x00005567a12f1941 in raw_co_pwritev
     (bs=0x5567a2b8c460, offset=472363008, bytes=20480, qiov=0x7ff5f4ff9ca0, flags=0)
     at block/file-posix.c:1925
 #9  0x00005567a12fe3e1 in bdrv_driver_pwritev
     (bs=0x5567a2b8c460, offset=472363008, bytes=20480, qiov=0x7ff5f4ff9ca0, qiov_offset=0, flags=0)
     at block/io.c:1183
 #10 0x00005567a1300340 in bdrv_aligned_pwritev
     (child=0x5567a2b5b070, req=0x7ff5f4ff9db0, offset=472363008, bytes=20480, align=512, qiov=0x7ff72c0425b8, qiov_offset=0, flags=0) at block/io.c:1980
 #11 0x00005567a1300b29 in bdrv_co_pwritev_part
     (child=0x5567a2b5b070, offset=472363008, bytes=20480, qiov=0x7ff72c0425b8, qiov_offset=0, flags=0)
     at block/io.c:2137
 #12 0x00005567a12baba1 in qcow2_co_pwritev_task
     (bs=0x5567a2b92740, file_cluster_offset=472317952, offset=487305216, bytes=20480, qiov=0x7ff72c0425b8, qiov_offset=0, l2meta=0x0) at block/qcow2.c:2444
 #13 0x00005567a12bacdb in qcow2_co_pwritev_task_entry (task=0x5567a2b48540) at block/qcow2.c:2475
 #14 0x00005567a13167d8 in aio_task_co (opaque=0x5567a2b48540) at block/aio_task.c:45
 #15 0x00005567a13cf00c in coroutine_trampoline (i0=738245600, i1=32759) at util/coroutine-ucontext.c:115
 #16 0x00007ff73eb622e0 in __start_context () at /lib64/libc.so.6
 #17 0x00007ff6626f1350 in  ()
 #18 0x0000000000000000 in  ()
<------>

This is also known to cause crashes with this message (assertion
failed):

 aio_co_schedule: Co-routine was already scheduled in 'aio_co_schedule'

RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1812765
Signed-off-by: Sergio Lopez <slp@redhat.com>
Message-Id: <20200603093240.40489-3-slp@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-06-17 14:53:39 +02:00
Sergio Lopez
7aa1c247b4 virtio-blk: Refactor the code that processes queued requests
Move the code that processes queued requests from
virtio_blk_dma_restart_bh() to its own, non-static, function. This
will allow us to call it from the virtio_blk_data_plane_start() in a
future patch.

Signed-off-by: Sergio Lopez <slp@redhat.com>
Message-Id: <20200603093240.40489-2-slp@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-06-17 14:53:39 +02:00
Markus Armbruster
b69c3c21a5 qdev: Unrealize must not fail
Devices may have component devices and buses.

Device realization may fail.  Realization is recursive: a device's
realize() method realizes its components, and device_set_realized()
realizes its buses (which should in turn realize the devices on that
bus, except bus_set_realized() doesn't implement that, yet).

When realization of a component or bus fails, we need to roll back:
unrealize everything we realized so far.  If any of these unrealizes
failed, the device would be left in an inconsistent state.  Must not
happen.

device_set_realized() lets it happen: it ignores errors in the roll
back code starting at label child_realize_fail.

Since realization is recursive, unrealization must be recursive, too.
But how could a partly failed unrealize be rolled back?  We'd have to
re-realize, which can fail.  This design is fundamentally broken.

device_set_realized() does not roll back at all.  Instead, it keeps
unrealizing, ignoring further errors.

It can screw up even for a device with no buses: if the lone
dc->unrealize() fails, it still unregisters vmstate, and calls
listeners' unrealize() callback.

bus_set_realized() does not roll back either.  Instead, it stops
unrealizing.

Fortunately, no unrealize method can fail, as we'll see below.

To fix the design error, drop parameter @errp from all the unrealize
methods.

Any unrealize method that uses @errp now needs an update.  This leads
us to unrealize() methods that can fail.  Merely passing it to another
unrealize method cannot cause failure, though.  Here are the ones that
do other things with @errp:

* virtio_serial_device_unrealize()

  Fails when qbus_set_hotplug_handler() fails, but still does all the
  other work.  On failure, the device would stay realized with its
  resources completely gone.  Oops.  Can't happen, because
  qbus_set_hotplug_handler() can't actually fail here.  Pass
  &error_abort to qbus_set_hotplug_handler() instead.

* hw/ppc/spapr_drc.c's unrealize()

  Fails when object_property_del() fails, but all the other work is
  already done.  On failure, the device would stay realized with its
  vmstate registration gone.  Oops.  Can't happen, because
  object_property_del() can't actually fail here.  Pass &error_abort
  to object_property_del() instead.

* spapr_phb_unrealize()

  Fails and bails out when remove_drcs() fails, but other work is
  already done.  On failure, the device would stay realized with some
  of its resources gone.  Oops.  remove_drcs() fails only when
  chassis_from_bus()'s object_property_get_uint() fails, and it can't
  here.  Pass &error_abort to remove_drcs() instead.

Therefore, no unrealize method can fail before this patch.

device_set_realized()'s recursive unrealization via bus uses
object_property_set_bool().  Can't drop @errp there, so pass
&error_abort.

We similarly unrealize with object_property_set_bool() elsewhere,
always ignoring errors.  Pass &error_abort instead.

Several unrealize methods no longer handle errors from other unrealize
methods: virtio_9p_device_unrealize(),
virtio_input_device_unrealize(), scsi_qdev_unrealize(), ...
Much of the deleted error handling looks wrong anyway.

One unrealize methods no longer ignore such errors:
usb_ehci_pci_exit().

Several realize methods no longer ignore errors when rolling back:
v9fs_device_realize_common(), pci_qdev_unrealize(),
spapr_phb_realize(), usb_qdev_realize(), vfio_ccw_realize(),
virtio_device_realize().

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20200505152926.18877-17-armbru@redhat.com>
2020-05-15 07:08:14 +02:00
Markus Armbruster
40c2281cc3 Drop more @errp parameters after previous commit
Several functions can't fail anymore: ich9_pm_add_properties(),
device_add_bootindex_property(), ppc_compat_add_property(),
spapr_caps_add_properties(), PropertyInfo.create().  Drop their @errp
parameter.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20200505152926.18877-16-armbru@redhat.com>
2020-05-15 07:08:14 +02:00
Pan Nengyuan
cfaf757edd virtio-blk: delete vqs on the error path in realize()
virtio_vqs forgot to free on the error path in realize(). Fix that.

The asan stack:
Direct leak of 14336 byte(s) in 1 object(s) allocated from:
    #0 0x7f58b93fd970 in __interceptor_calloc (/lib64/libasan.so.5+0xef970)
    #1 0x7f58b858249d in g_malloc0 (/lib64/libglib-2.0.so.0+0x5249d)
    #2 0x5562cc627f49 in virtio_add_queue /mnt/sdb/qemu/hw/virtio/virtio.c:2413
    #3 0x5562cc4b524a in virtio_blk_device_realize /mnt/sdb/qemu/hw/block/virtio-blk.c:1202
    #4 0x5562cc613050 in virtio_device_realize /mnt/sdb/qemu/hw/virtio/virtio.c:3615
    #5 0x5562ccb7a568 in device_set_realized /mnt/sdb/qemu/hw/core/qdev.c:891
    #6 0x5562cd39cd45 in property_set_bool /mnt/sdb/qemu/qom/object.c:2238

Reported-by: Euler Robot <euler.robot@huawei.com>
Signed-off-by: Pan Nengyuan <pannengyuan@huawei.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Message-Id: <20200328005705.29898-2-pannengyuan@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-03-29 09:52:13 -04:00
Denis Plotnikov
c9b7d9ec21 virtio: increase virtqueue size for virtio-scsi and virtio-blk
The goal is to reduce the amount of requests issued by a guest on
1M reads/writes. This rises the performance up to 4% on that kind of
disk access pattern.

The maximum chunk size to be used for the guest disk accessing is
limited with seg_max parameter, which represents the max amount of
pices in the scatter-geather list in one guest disk request.

Since seg_max is virqueue_size dependent, increasing the virtqueue
size increases seg_max, which, in turn, increases the maximum size
of data to be read/write from a guest disk.

More details in the original problem statment:
https://lists.gnu.org/archive/html/qemu-devel/2017-12/msg03721.html

Suggested-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Denis Plotnikov <dplotnikov@virtuozzo.com>
Message-id: 20200214074648.958-1-dplotnikov@virtuozzo.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-02-22 08:26:47 +00:00