qemu/hw/virtio
Laszlo Ersek d7dc0682f5 vhost-user: call VHOST_USER_SET_VRING_ENABLE synchronously
(1) The virtio-1.2 specification
<http://docs.oasis-open.org/virtio/virtio/v1.2/virtio-v1.2.html> writes:

> 3     General Initialization And Device Operation
> 3.1   Device Initialization
> 3.1.1 Driver Requirements: Device Initialization
>
> [...]
>
> 7. Perform device-specific setup, including discovery of virtqueues for
>    the device, optional per-bus setup, reading and possibly writing the
>    device’s virtio configuration space, and population of virtqueues.
>
> 8. Set the DRIVER_OK status bit. At this point the device is “live”.

and

> 4         Virtio Transport Options
> 4.1       Virtio Over PCI Bus
> 4.1.4     Virtio Structure PCI Capabilities
> 4.1.4.3   Common configuration structure layout
> 4.1.4.3.2 Driver Requirements: Common configuration structure layout
>
> [...]
>
> The driver MUST configure the other virtqueue fields before enabling the
> virtqueue with queue_enable.
>
> [...]

(The same statements are present in virtio-1.0 identically, at
<http://docs.oasis-open.org/virtio/virtio/v1.0/virtio-v1.0.html>.)

These together mean that the following sub-sequence of steps is valid for
a virtio-1.0 guest driver:

(1.1) set "queue_enable" for the needed queues as the final part of device
initialization step (7),

(1.2) set DRIVER_OK in step (8),

(1.3) immediately start sending virtio requests to the device.

(2) When vhost-user is enabled, and the VHOST_USER_F_PROTOCOL_FEATURES
special virtio feature is negotiated, then virtio rings start in disabled
state, according to
<https://qemu-project.gitlab.io/qemu/interop/vhost-user.html#ring-states>.
In this case, explicit VHOST_USER_SET_VRING_ENABLE messages are needed for
enabling vrings.

Therefore setting "queue_enable" from the guest (1.1) -- which is
technically "buffered" on the QEMU side until the guest sets DRIVER_OK
(1.2) -- is a *control plane* operation, which -- after (1.2) -- travels
from the guest through QEMU to the vhost-user backend, using a unix domain
socket.

Whereas sending a virtio request (1.3) is a *data plane* operation, which
evades QEMU -- it travels from guest to the vhost-user backend via
eventfd.

This means that operations ((1.1) + (1.2)) and (1.3) travel through
different channels, and their relative order can be reversed, as perceived
by the vhost-user backend.

That's exactly what happens when OVMF's virtiofs driver (VirtioFsDxe) runs
against the Rust-language virtiofsd version 1.7.2. (Which uses version
0.10.1 of the vhost-user-backend crate, and version 0.8.1 of the vhost
crate.)

Namely, when VirtioFsDxe binds a virtiofs device, it goes through the
device initialization steps (i.e., control plane operations), and
immediately sends a FUSE_INIT request too (i.e., performs a data plane
operation). In the Rust-language virtiofsd, this creates a race between
two components that run *concurrently*, i.e., in different threads or
processes:

- Control plane, handling vhost-user protocol messages:

  The "VhostUserSlaveReqHandlerMut::set_vring_enable" method
  [crates/vhost-user-backend/src/handler.rs] handles
  VHOST_USER_SET_VRING_ENABLE messages, and updates each vring's "enabled"
  flag according to the message processed.

- Data plane, handling virtio / FUSE requests:

  The "VringEpollHandler::handle_event" method
  [crates/vhost-user-backend/src/event_loop.rs] handles the incoming
  virtio / FUSE request, consuming the virtio kick at the same time. If
  the vring's "enabled" flag is set, the virtio / FUSE request is
  processed genuinely. If the vring's "enabled" flag is clear, then the
  virtio / FUSE request is discarded.

Note that OVMF enables the queue *first*, and sends FUSE_INIT *second*.
However, if the data plane processor in virtiofsd wins the race, then it
sees the FUSE_INIT *before* the control plane processor took notice of
VHOST_USER_SET_VRING_ENABLE and green-lit the queue for the data plane
processor. Therefore the latter drops FUSE_INIT on the floor, and goes
back to waiting for further virtio / FUSE requests with epoll_wait.
Meanwhile OVMF is stuck waiting for the FUSET_INIT response -- a deadlock.

The deadlock is not deterministic. OVMF hangs infrequently during first
boot. However, OVMF hangs almost certainly during reboots from the UEFI
shell.

The race can be "reliably masked" by inserting a very small delay -- a
single debug message -- at the top of "VringEpollHandler::handle_event",
i.e., just before the data plane processor checks the "enabled" field of
the vring. That delay suffices for the control plane processor to act upon
VHOST_USER_SET_VRING_ENABLE.

We can deterministically prevent the race in QEMU, by blocking OVMF inside
step (1.2) -- i.e., in the write to the device status register that
"unleashes" queue enablement -- until VHOST_USER_SET_VRING_ENABLE actually
*completes*. That way OVMF's VCPU cannot advance to the FUSE_INIT
submission before virtiofsd's control plane processor takes notice of the
queue being enabled.

Wait for VHOST_USER_SET_VRING_ENABLE completion by:

- setting the NEED_REPLY flag on VHOST_USER_SET_VRING_ENABLE, and waiting
  for the reply, if the VHOST_USER_PROTOCOL_F_REPLY_ACK vhost-user feature
  has been negotiated, or

- performing a separate VHOST_USER_GET_FEATURES *exchange*, which requires
  a backend response regardless of VHOST_USER_PROTOCOL_F_REPLY_ACK.

Cc: "Michael S. Tsirkin" <mst@redhat.com> (supporter:vhost)
Cc: Eugenio Perez Martin <eperezma@redhat.com>
Cc: German Maglione <gmaglione@redhat.com>
Cc: Liu Jiang <gerry@linux.alibaba.com>
Cc: Sergio Lopez Pascual <slp@redhat.com>
Cc: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Tested-by: Albert Esteve <aesteve@redhat.com>
[lersek@redhat.com: work Eugenio's explanation into the commit message,
 about QEMU containing step (1.1) until step (1.2)]
Reviewed-by: Eugenio Pérez <eperezma@redhat.com>
Message-Id: <20231002203221.17241-8-lersek@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-10-22 05:18:16 -04:00
..
Kconfig virtio-md-pci: New parent type for virtio-mem-pci and virtio-pmem-pci 2023-07-12 09:27:25 +02:00
meson.build virtio: add vhost-user-base and a generic vhost-user-device 2023-10-04 04:54:04 -04:00
trace-events vdpa: export vhost_vdpa_set_vring_ready 2023-10-04 04:54:19 -04:00
trace.h trace: switch position of headers to what Meson requires 2020-08-21 06:18:24 -04:00
vdpa-dev-pci.c vdpa: add vdpa-dev-pci support 2022-12-21 06:35:28 -05:00
vdpa-dev.c vdpa: move vhost_vdpa_set_vring_ready to the caller 2023-10-04 04:54:21 -04:00
vhost-backend.c vhost: add method vhost_set_vring_err 2022-06-27 18:53:18 -04:00
vhost-iova-tree.c util: accept iova_tree_remove_parameter by value 2022-09-02 10:22:39 +08:00
vhost-iova-tree.h util: accept iova_tree_remove_parameter by value 2022-09-02 10:22:39 +08:00
vhost-scsi-pci.c hw/virtio: move virtio-pci.h into shared include space 2022-05-16 04:38:40 -04:00
vhost-shadow-virtqueue.c vhost: Expose vhost_svq_available_slots() 2023-10-18 10:41:50 -04:00
vhost-shadow-virtqueue.h vhost: Expose vhost_svq_available_slots() 2023-10-18 10:41:50 -04:00
vhost-stub.c vhost: Add vhost_get_max_memslots() 2023-10-12 14:15:22 +02:00
vhost-user-blk-pci.c hw/virtio: move virtio-pci.h into shared include space 2022-05-16 04:38:40 -04:00
vhost-user-device-pci.c virtio: add vhost-user-base and a generic vhost-user-device 2023-10-04 04:54:04 -04:00
vhost-user-device.c hw/virtio: add config support to vhost-user-device 2023-10-04 04:54:05 -04:00
vhost-user-fs-pci.c hw/virtio: move virtio-pci.h into shared include space 2022-05-16 04:38:40 -04:00
vhost-user-fs.c hw/virtio: fix typo in VIRTIO_CONFIG_IRQ_IDX comments 2023-07-10 18:59:32 -04:00
vhost-user-gpio-pci.c hw/virtio: add vhost-user-gpio-pci boilerplate 2022-10-07 09:41:51 -04:00
vhost-user-gpio.c qmp: update virtio feature maps, vhost-user-gpio introspection 2023-10-04 04:54:26 -04:00
vhost-user-i2c-pci.c hw/virtio: move virtio-pci.h into shared include space 2022-05-16 04:38:40 -04:00
vhost-user-i2c.c virtio: i2c: Check notifier helpers for VIRTIO_CONFIG_IRQ_IDX 2023-04-24 22:56:55 -04:00
vhost-user-input-pci.c hw/virtio: move virtio-pci.h into shared include space 2022-05-16 04:38:40 -04:00
vhost-user-rng-pci.c hw/virtio: move virtio-pci.h into shared include space 2022-05-16 04:38:40 -04:00
vhost-user-rng.c vhost-user-rng: Back up vqs before cleaning up vhost_dev 2023-03-02 03:10:47 -05:00
vhost-user-scmi-pci.c hw/virtio: Add vhost-user-scmi-pci boilerplate 2023-07-10 16:17:08 -04:00
vhost-user-scmi.c hw/virtio: Add a protection against duplicate vu_scmi_stop calls 2023-08-03 16:06:49 -04:00
vhost-user-scsi-pci.c hw/virtio: move virtio-pci.h into shared include space 2022-05-16 04:38:40 -04:00
vhost-user-vsock-pci.c hw/virtio: move virtio-pci.h into shared include space 2022-05-16 04:38:40 -04:00
vhost-user-vsock.c hw/virtio: introduce virtio_device_should_start 2022-11-07 14:08:18 -05:00
vhost-user.c vhost-user: call VHOST_USER_SET_VRING_ENABLE synchronously 2023-10-22 05:18:16 -04:00
vhost-vdpa.c vhost: Remove vhost_backend_can_merge() callback 2023-10-12 14:15:21 +02:00
vhost-vsock-common.c hw/virtio: fix typo in VIRTIO_CONFIG_IRQ_IDX comments 2023-07-10 18:59:32 -04:00
vhost-vsock-pci.c hw/virtio: move virtio-pci.h into shared include space 2022-05-16 04:38:40 -04:00
vhost-vsock.c hw/virtio: introduce virtio_device_should_start 2022-11-07 14:08:18 -05:00
vhost.c memory,vhost: Allow for marking memory device memory regions unmergeable 2023-10-12 14:15:22 +02:00
virtio-9p-pci.c hw/virtio: move virtio-pci.h into shared include space 2022-05-16 04:38:40 -04:00
virtio-balloon-pci.c hw/virtio: move virtio-pci.h into shared include space 2022-05-16 04:38:40 -04:00
virtio-balloon.c hw: replace most qemu_bh_new calls with qemu_bh_new_guarded 2023-04-28 11:31:54 +02:00
virtio-blk-pci.c hw/virtio: move virtio-pci.h into shared include space 2022-05-16 04:38:40 -04:00
virtio-bus.c virtio: stop ioeventfd on reset 2022-06-14 16:50:30 +02:00
virtio-config-io.c hw/virtio: Extract config read/write accessors to virtio-config-io.c 2022-12-21 07:32:24 -05:00
virtio-crypto-pci.c Use DECLARE_*CHECKER* macros 2020-09-09 09:27:09 -04:00
virtio-crypto.c hw/other: spelling fixes 2023-09-21 11:31:16 +03:00
virtio-hmp-cmds.c virtio: Move HMP commands from monitor/ to hw/virtio/ 2023-02-04 07:56:54 +01:00
virtio-input-host-pci.c hw/virtio: move virtio-pci.h into shared include space 2022-05-16 04:38:40 -04:00
virtio-input-pci.c virtio-input-pci: add virtio-multitouch-pci 2023-05-28 13:08:25 +04:00
virtio-iommu-pci.c hw/virtio/virtio-iommu-pci: Enforce the device is plugged on the root bus 2022-11-07 13:12:19 -05:00
virtio-iommu.c virtio-iommu: Standardize granule extraction and formatting 2023-08-03 16:06:49 -04:00
virtio-md-pci.c virtio-md-pci: Support unplug requests for compatible devices 2023-07-12 09:27:30 +02:00
virtio-mem-pci.c virtio-mem: Expose device memory dynamically via multiple memslots if enabled 2023-10-12 14:15:22 +02:00
virtio-mem-pci.h virtio-md-pci: New parent type for virtio-mem-pci and virtio-pmem-pci 2023-07-12 09:27:25 +02:00
virtio-mem.c virtio-mem: Mark memslot alias memory regions unmergeable 2023-10-12 14:15:22 +02:00
virtio-mmio.c virtio: refresh vring region cache after updating a virtqueue size 2023-04-21 03:08:21 -04:00
virtio-net-pci.c hw/virtio: move virtio-pci.h into shared include space 2022-05-16 04:38:40 -04:00
virtio-pci.c virtio: Add shared memory capability 2023-10-16 11:29:56 +04:00
virtio-pmem-pci.c virtio-md-pci: New parent type for virtio-mem-pci and virtio-pmem-pci 2023-07-12 09:27:25 +02:00
virtio-pmem-pci.h virtio-md-pci: New parent type for virtio-mem-pci and virtio-pmem-pci 2023-07-12 09:27:25 +02:00
virtio-pmem.c thread-pool: avoid passing the pool parameter every time 2023-04-25 13:17:28 +02:00
virtio-qmp.c vhost-user: move VhostUserProtocolFeature definition to header file 2023-10-04 04:54:28 -04:00
virtio-qmp.h qmp: remove virtio_list, search QOM tree instead 2023-10-04 04:54:24 -04:00
virtio-rng-pci.c virtio-rng-pci: Allow setting nvectors, so we can use MSI-X 2022-11-07 13:12:20 -05:00
virtio-rng.c virtio: drop name parameter for virtio_init() 2022-05-16 04:38:40 -04:00
virtio-scsi-pci.c hw/virtio: move virtio-pci.h into shared include space 2022-05-16 04:38:40 -04:00
virtio-serial-pci.c hw/virtio: move virtio-pci.h into shared include space 2022-05-16 04:38:40 -04:00
virtio-stub.c qmp: add QMP command x-query-virtio-queue-element 2022-10-09 16:38:45 -04:00
virtio.c virtio: remove unused next argument from virtqueue_split_read_next_desc() 2023-10-04 18:15:06 -04:00