Commit Graph

1607 Commits

Author SHA1 Message Date
Eugenio Pérez
f6fe3e333f vdpa: move memory listener to vhost_vdpa_shared
Next patches will register the vhost_vdpa memory listener while the VM
is migrating at the destination, so we can map the memory to the device
before stopping the VM at the source.  The main goal is to reduce the
downtime.

However, the destination QEMU is unaware of which vhost_vdpa device will
register its memory_listener.  If the source guest has CVQ enabled, it
will be the CVQ device.  Otherwise, it  will be the first one.

Move the memory listener to a common place rather than always in the
first / last vhost_vdpa.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20231221174322.3130442-14-eperezma@redhat.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-12-26 04:51:07 -05:00
Eugenio Pérez
b06a38f2b0 vdpa: use dev_shared in vdpa_iommu
The memory listener functions can call these too.  Make vdpa_iommu work
with VhostVDPAShared.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20231221174322.3130442-13-eperezma@redhat.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-12-26 04:51:07 -05:00
Eugenio Pérez
6f03d9ef8a vdpa: use VhostVDPAShared in vdpa_dma_map and unmap
The callers only have the shared information by the end of this series.
Start converting this functions.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20231221174322.3130442-12-eperezma@redhat.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-12-26 04:51:07 -05:00
Eugenio Pérez
3c6d12a3b1 vdpa: move iommu_list to vhost_vdpa_shared
Next patches will register the vhost_vdpa memory listener while the VM
is migrating at the destination, so we can map the memory to the device
before stopping the VM at the source.  The main goal is to reduce the
downtime.

However, the destination QEMU is unaware of which vhost_vdpa device will
register its memory_listener.  If the source guest has CVQ enabled, it
will be the CVQ device.  Otherwise, it  will be the first one.

Move the iommu_list member to VhostVDPAShared so all vhost_vdpa can use
it, rather than always in the first / last vhost_vdpa.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20231221174322.3130442-11-eperezma@redhat.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-12-26 04:51:07 -05:00
Eugenio Pérez
74e76c7d5b vdpa: remove msg type of vhost_vdpa
It is always VHOST_IOTLB_MSG_V2. We can always make it back per
vhost_dev if needed.

This change makes easier for vhost_vdpa_map and unmap not to depend on
vhost_vdpa but only in VhostVDPAShared.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20231221174322.3130442-10-eperezma@redhat.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-12-26 04:51:07 -05:00
Eugenio Pérez
4da38d1a6d vdpa: move backend_cap to vhost_vdpa_shared
Next patches will register the vhost_vdpa memory listener while the VM
is migrating at the destination, so we can map the memory to the device
before stopping the VM at the source.  The main goal is to reduce the
downtime.

However, the destination QEMU is unaware of which vhost_vdpa device will
register its memory_listener.  If the source guest has CVQ enabled, it
will be the CVQ device.  Otherwise, it  will be the first one.

Move the backend_cap member to VhostVDPAShared so all vhost_vdpa can use
it, rather than always in the first / last vhost_vdpa.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20231221174322.3130442-9-eperezma@redhat.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-12-26 04:51:07 -05:00
Eugenio Pérez
7627f0a2de vdpa: move iotlb_batch_begin_sent to vhost_vdpa_shared
Next patches will register the vhost_vdpa memory listener while the VM
is migrating at the destination, so we can map the memory to the device
before stopping the VM at the source.  The main goal is to reduce the
downtime.

However, the destination QEMU is unaware of which vhost_vdpa device will
register its memory_listener.  If the source guest has CVQ enabled, it
will be the CVQ device.  Otherwise, it  will be the first one.

Move the iotlb_batch_begin_sent member to VhostVDPAShared so all
vhost_vdpa can use it, rather than always in the first / last
vhost_vdpa.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20231221174322.3130442-8-eperezma@redhat.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-12-26 04:51:07 -05:00
Eugenio Pérez
f12b2498e5 vdpa: move file descriptor to vhost_vdpa_shared
Next patches will register the vhost_vdpa memory listener while the VM
is migrating at the destination, so we can map the memory to the device
before stopping the VM at the source.  The main goal is to reduce the
downtime.

However, the destination QEMU is unaware of which vhost_vdpa device will
register its memory_listener.  If the source guest has CVQ enabled, it
will be the CVQ device.  Otherwise, it  will be the first one.

Move the file descriptor to VhostVDPAShared so all vhost_vdpa can use
it, rather than always in the first / last vhost_vdpa.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20231221174322.3130442-7-eperezma@redhat.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-12-26 04:51:07 -05:00
Eugenio Pérez
e36b9992fa vdpa: use vdpa shared for tracing
By the end of this series dma_map and dma_unmap functions don't have the
vdpa device for tracing.  Movinge trace function to shared member one.
Print it also in the vdpa initialization so log reader can relate them.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20231221174322.3130442-6-eperezma@redhat.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-12-26 04:51:07 -05:00
Eugenio Pérez
a6e823d40e vdpa: move shadow_data to vhost_vdpa_shared
Next patches will register the vhost_vdpa memory listener while the VM
is migrating at the destination, so we can map the memory to the device
before stopping the VM at the source.  The main goal is to reduce the
downtime.

However, the destination QEMU is unaware of which vhost_vdpa device will
register its memory_listener.  If the source guest has CVQ enabled, it
will be the CVQ device.  Otherwise, it  will be the first one.

Move the shadow_data member to VhostVDPAShared so all vhost_vdpa can use
it, rather than always in the first or last vhost_vdpa.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20231221174322.3130442-5-eperezma@redhat.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-12-26 04:51:07 -05:00
Eugenio Pérez
ae25ff41b7 vdpa: move iova_range to vhost_vdpa_shared
Next patches will register the vhost_vdpa memory listener while the VM
is migrating at the destination, so we can map the memory to the device
before stopping the VM at the source.  The main goal is to reduce the
downtime.

However, the destination QEMU is unaware of which vhost_vdpa device will
register its memory_listener.  If the source guest has CVQ enabled, it
will be the CVQ device.  Otherwise, it  will be the first one.

Move the iova range to VhostVDPAShared so all vhost_vdpa can use it,
rather than always in the first or last vhost_vdpa.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20231221174322.3130442-4-eperezma@redhat.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-12-26 04:51:07 -05:00
Eugenio Pérez
5edb02e800 vdpa: move iova tree to the shared struct
Next patches will register the vhost_vdpa memory listener while the VM
is migrating at the destination, so we can map the memory to the device
before stopping the VM at the source.  The main goal is to reduce the
downtime.

However, the destination QEMU is unaware of which vhost_vdpa device will
register its memory_listener.  If the source guest has CVQ enabled, it
will be the CVQ device.  Otherwise, it  will be the first one.

Move the iova tree to VhostVDPAShared so all vhost_vdpa can use it,
rather than always in the first or last vhost_vdpa.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20231221174322.3130442-3-eperezma@redhat.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-12-26 04:51:07 -05:00
Eugenio Pérez
bc865bfe2d vdpa: do not set virtio status bits if unneeded
Next commits will set DRIVER and ACKNOWLEDGE flags repeatedly in the
case of a migration destination.  Let's save ioctls with this.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Message-Id: <20231215172830.2540987-2-eperezma@redhat.com>
2023-12-26 04:51:07 -05:00
Mike Christie
9aad781959 vhost: Add worker backend callouts
This adds the vhost backend callouts for the worker ioctls added in the
6.4 linux kernel commit:

c1ecd8e95007 ("vhost: allow userspace to create workers")

Signed-off-by: Mike Christie <michael.christie@oracle.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

Message-Id: <20231204231618.21962-2-michael.christie@oracle.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-12-25 11:34:55 -05:00
Mathieu Poirier
df72f01104 virtio: rng: Check notifier helpers for VIRTIO_CONFIG_IRQ_IDX
Since the driver doesn't support interrupts, we must return early when
index is set to VIRTIO_CONFIG_IRQ_IDX.  Basically the same thing Viresh
did for "91208dd297f2 virtio: i2c: Check notifier helpers for
VIRTIO_CONFIG_IRQ_IDX".

Fixes: 544f0278af ("virtio: introduce macro VIRTIO_CONFIG_IRQ_IDX")
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Message-Id: <20231025171841.3379663-1-mathieu.poirier@linaro.org>
Tested-by: Leo Yan <leo.yan@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-12-25 11:34:55 -05:00
Li Feng
95e1019a4a vhost-user-scsi: free the inflight area when reset
Keep it the same to vhost-user-blk.
At the same time, fix the vhost_reset_device.

Signed-off-by: Li Feng <fengli@smartx.com>
Message-Id: <20231123055431.217792-3-fengli@smartx.com>
Reviewed-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-12-02 15:56:49 -05:00
Li Feng
298d4f892e vhost-user: fix the reconnect error
If the error occurs in vhost_dev_init, the value of s->connected is set to true
in advance, and there is no chance to enter this function execution again
in the future.

Signed-off-by: Li Feng <fengli@smartx.com>
Message-Id: <20231123055431.217792-2-fengli@smartx.com>
Reviewed-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-12-02 15:56:49 -05:00
Eric Auger
c8559fcb15 virtio-iommu: Remove useless !sdev check in virtio_iommu_probe()
The code already checks iommu_mr is not NULL so there is no
need to check container_of() is not NULL. Remove the check.

Fixes: CID 1523901
Fixes: 09b4c3d6a2 ("virtio-iommu: Record whether a probe request has
been issued")
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reported-by: Coverity (CID 1523901)
Message-Id: <20231109170715.259520-1-eric.auger@redhat.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-12-02 15:56:49 -05:00
Philippe Mathieu-Daudé
c9a4aa06df hw/virtio: Free VirtIOIOMMUPCI::vdev.reserved_regions[] on finalize()
Commit 0be6bfac62 ("qdev: Implement variable length array properties")
added the DEFINE_PROP_ARRAY() macro with the following comment:

  * It is the responsibility of the device deinit code to free the
  * @_arrayfield memory.

Commit 8077b8e549 added:

  DEFINE_PROP_ARRAY("reserved-regions", VirtIOIOMMUPCI,
                    vdev.nb_reserved_regions, vdev.reserved_regions,
                    qdev_prop_reserved_region, ReservedRegion),

but forgot to free the 'vdev.reserved_regions' array. Do it in the
instance_finalize() handler.

Cc: qemu-stable@nongnu.org
Fixes: 8077b8e549 ("virtio-iommu-pci: Add array of Interval properties") # v5.1.0+
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Message-id: 20231121174051.63038-3-philmd@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2023-11-27 15:27:41 +00:00
Philippe Mathieu-Daudé
837053a7f4 hw/virtio: Add VirtioPCIDeviceTypeInfo::instance_finalize field
The VirtioPCIDeviceTypeInfo structure, added in commit a4ee4c8baa
("virtio: Helper for registering virtio device types") got extended
in commit 8ea90ee690 ("virtio: add class_size") with the @class_size
field. Do similarly with the @instance_finalize field.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20231121174051.63038-2-philmd@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2023-11-27 15:27:39 +00:00
David Hildenbrand
364eff6885 virtio-mem: fix division by zero in virtio_mem_activate_memslots_to_plug()
When running with "dynamic-memslots=off", we enter
virtio_mem_activate_memslots_to_plug() to return immediately again
because "vmem->dynamic_memslots == false". However, the compiler might
not optimize out calculating start_idx+end_idx, where we divide by
vmem->memslot_size. In such a configuration, the memslot size is 0 and
we'll get a division by zero:

    (qemu) qom-set vmem0 requested-size 3G
    (qemu) q35.sh: line 38: 622940 Floating point exception(core dumped)

The same is true for virtio_mem_deactivate_unplugged_memslots(), however
we never really reach that code without a prior
virtio_mem_activate_memslots_to_plug() call.

Let's fix it by simply calling these functions only with
"dynamic-memslots=on".

This was found when using a debug build of QEMU.

Message-ID: <20231023111341.219317-1-david@redhat.com>
Reprted-by: Mario Casquero <mcasquer@redhat.com>
Fixes: 177f9b1ee4 ("virtio-mem: Expose device memory dynamically via multiple memslots if enabled")
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Tested-by: Mario Casquero <mcasquer@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
2023-11-13 09:35:44 +01:00
Stefan Hajnoczi
f6b615b52d virtio,pc,pci: features, fixes
virtio sound card support
 
 vhost-user: back-end state migration
 
 cxl:
      line length reduction
      enabling fabric management
 
 vhost-vdpa:
      shadow virtqueue hash calculation Support
      shadow virtqueue RSS Support
 
 tests:
     CPU topology related smbios test cases
 
 Fixes, cleanups all over the place
 
 Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmVKDDoPHG1zdEByZWRo
 YXQuY29tAAoJECgfDbjSjVRpF08H/0Zts8uvkHbgiOEJw4JMHU6/VaCipfIYsp01
 GSfwYOyEsXJ7GIxKWaCiMnWXEm7tebNCPKf3DoUtcAojQj3vuF9XbWBKw/bfRn83
 nGO/iiwbYViSKxkwqUI+Up5YiN9o0M8gBFrY0kScPezbnYmo5u2bcADdEEq6gH68
 D0Ea8i+WmszL891ypvgCDBL2ObDk3qX3vA5Q6J2I+HKX2ofJM59BwaKwS5ghw+IG
 BmbKXUZJNjUQfN9dQ7vJuiuqdknJ2xUzwW2Vn612ffarbOZB1DZ6ruWlrHty5TjX
 0w4IXEJPBgZYbX9oc6zvTQnbLDBJbDU89mnme0TcmNMKWmQKTtc=
 =vEv+
 -----END PGP SIGNATURE-----

Merge tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu into staging

virtio,pc,pci: features, fixes

virtio sound card support

vhost-user: back-end state migration

cxl:
     line length reduction
     enabling fabric management

vhost-vdpa:
     shadow virtqueue hash calculation Support
     shadow virtqueue RSS Support

tests:
    CPU topology related smbios test cases

Fixes, cleanups all over the place

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

# -----BEGIN PGP SIGNATURE-----
#
# iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmVKDDoPHG1zdEByZWRo
# YXQuY29tAAoJECgfDbjSjVRpF08H/0Zts8uvkHbgiOEJw4JMHU6/VaCipfIYsp01
# GSfwYOyEsXJ7GIxKWaCiMnWXEm7tebNCPKf3DoUtcAojQj3vuF9XbWBKw/bfRn83
# nGO/iiwbYViSKxkwqUI+Up5YiN9o0M8gBFrY0kScPezbnYmo5u2bcADdEEq6gH68
# D0Ea8i+WmszL891ypvgCDBL2ObDk3qX3vA5Q6J2I+HKX2ofJM59BwaKwS5ghw+IG
# BmbKXUZJNjUQfN9dQ7vJuiuqdknJ2xUzwW2Vn612ffarbOZB1DZ6ruWlrHty5TjX
# 0w4IXEJPBgZYbX9oc6zvTQnbLDBJbDU89mnme0TcmNMKWmQKTtc=
# =vEv+
# -----END PGP SIGNATURE-----
# gpg: Signature made Tue 07 Nov 2023 18:06:50 HKT
# gpg:                using RSA key 5D09FD0871C8F85B94CA8A0D281F0DB8D28D5469
# gpg:                issuer "mst@redhat.com"
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" [full]
# gpg:                 aka "Michael S. Tsirkin <mst@redhat.com>" [full]
# Primary key fingerprint: 0270 606B 6F3C DF3D 0B17  0970 C350 3912 AFBE 8E67
#      Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA  8A0D 281F 0DB8 D28D 5469

* tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu: (63 commits)
  acpi/tests/avocado/bits: enable console logging from bits VM
  acpi/tests/avocado/bits: enforce 32-bit SMBIOS entry point
  hw/cxl: Add tunneled command support to mailbox for switch cci.
  hw/cxl: Add dummy security state get
  hw/cxl/type3: Cleanup multiple CXL_TYPE3() calls in read/write functions
  hw/cxl/mbox: Add Get Background Operation Status Command
  hw/cxl: Add support for device sanitation
  hw/cxl/mbox: Wire up interrupts for background completion
  hw/cxl/mbox: Add support for background operations
  hw/cxl: Implement Physical Ports status retrieval
  hw/pci-bridge/cxl_downstream: Set default link width and link speed
  hw/cxl/mbox: Add Physical Switch Identify command.
  hw/cxl/mbox: Add Information and Status / Identify command
  hw/cxl: Add a switch mailbox CCI function
  hw/pci-bridge/cxl_upstream: Move defintion of device to header.
  hw/cxl/mbox: Generalize the CCI command processing
  hw/cxl/mbox: Pull the CCI definition out of the CXLDeviceState
  hw/cxl/mbox: Split mailbox command payload into separate input and output
  hw/cxl/mbox: Pull the payload out of struct cxl_cmd and make instances constant
  hw/cxl: Fix a QEMU_BUILD_BUG_ON() in switch statement scope issue.
  ...

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-11-07 18:59:41 +08:00
Hanna Czenczek
bca3e2a138 vhost-user-fs: Implement internal migration
A virtio-fs device's VM state consists of:
- the virtio device (vring) state (VMSTATE_VIRTIO_DEVICE)
- the back-end's (virtiofsd's) internal state

We get/set the latter via the new vhost operations to transfer migratory
state.  It is its own dedicated subsection, so that for external
migration, it can be disabled.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
Message-Id: <20231016134243.68248-8-hreitz@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-11-07 03:39:10 -05:00
Hanna Czenczek
4a00d5d7f4 vhost: Add high-level state save/load functions
vhost_save_backend_state() and vhost_load_backend_state() can be used by
vhost front-ends to easily save and load the back-end's state to/from
the migration stream.

Because we do not know the full state size ahead of time,
vhost_save_backend_state() simply reads the data in 1 MB chunks, and
writes each chunk consecutively into the migration stream, prefixed by
its length.  EOF is indicated by a 0-length chunk.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
Message-Id: <20231016134243.68248-7-hreitz@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-11-07 03:39:10 -05:00
Hanna Czenczek
cda83adc62 vhost-user: Interface for migration state transfer
Add the interface for transferring the back-end's state during migration
as defined previously in vhost-user.rst.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
Message-Id: <20231016134243.68248-6-hreitz@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-11-07 03:39:10 -05:00
Stefan Hajnoczi
bb59f3548f vfio queue:
* Support for non 64b IOVA space
 * Introduction of a PCIIOMMUOps callback structure to ease future
   extensions
 * Fix for a buffer overrun when writing the VF token
 * PPC cleanups preparing ground for IOMMUFD support
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEoPZlSPBIlev+awtgUaNDx8/77KEFAmVI+bIACgkQUaNDx8/7
 7KHW4g/9FmgX0k2Elm1BAul3slJtuBT8/iHKfK19rhXICxhxS5xBWJA8FmosTWAT
 91YqQJhOHARxLd9VROfv8Fq8sAo+Ys8bP3PTXh5satjY5gR9YtmMSVqvsAVLn7lv
 a/0xp7wPJt2UeKzvRNUqFXNr7yHPwxFxbJbmmAJbNte8p+TfE2qvojbJnu7BjJbg
 sTtS/vFWNJwtuNYTkMRoiZaUKEoEZ8LnslOqKUjgeO59g4i3Dq8e2JCmHANPFWUK
 cWmr7AqcXgXEnLSDWTtfN53bjcSCYkFVb4WV4Wv1/7hUF5jQ4UR0l3B64xWe0M3/
 Prak3bWOM/o7JwLBsgaWPngXA9V0WFBTXVF4x5qTwhuR1sSV8MxUvTKxI+qqiEzA
 FjU89oSZ+zXId/hEUuTL6vn1Th8/6mwD0L9ORchNOQUKzCjBzI4MVPB09nM3AdPC
 LGThlufsZktdoU2KjMHpc+gMIXQYsxkgvm07K5iZTZ5eJ4tV5KB0aPvTZppGUxe1
 YY9og9F3hxjDHQtEuSY2rzBQI7nrUpd1ZI5ut/3ZgDWkqD6aGRtMme4n4GsGsYb2
 Ht9+d2RL9S8uPUh+7rV8K/N3+vXgXRaEYTuAScKtflEbA7YnZA5nUdMng8x0kMTQ
 Y73XCd4UGWDfSSZsgaIHGkM/MRIHgmlrfcwPkWqWW9vF+92O6Hw=
 =/Du0
 -----END PGP SIGNATURE-----

Merge tag 'pull-vfio-20231106' of https://github.com/legoater/qemu into staging

vfio queue:

* Support for non 64b IOVA space
* Introduction of a PCIIOMMUOps callback structure to ease future
  extensions
* Fix for a buffer overrun when writing the VF token
* PPC cleanups preparing ground for IOMMUFD support

# -----BEGIN PGP SIGNATURE-----
#
# iQIzBAABCAAdFiEEoPZlSPBIlev+awtgUaNDx8/77KEFAmVI+bIACgkQUaNDx8/7
# 7KHW4g/9FmgX0k2Elm1BAul3slJtuBT8/iHKfK19rhXICxhxS5xBWJA8FmosTWAT
# 91YqQJhOHARxLd9VROfv8Fq8sAo+Ys8bP3PTXh5satjY5gR9YtmMSVqvsAVLn7lv
# a/0xp7wPJt2UeKzvRNUqFXNr7yHPwxFxbJbmmAJbNte8p+TfE2qvojbJnu7BjJbg
# sTtS/vFWNJwtuNYTkMRoiZaUKEoEZ8LnslOqKUjgeO59g4i3Dq8e2JCmHANPFWUK
# cWmr7AqcXgXEnLSDWTtfN53bjcSCYkFVb4WV4Wv1/7hUF5jQ4UR0l3B64xWe0M3/
# Prak3bWOM/o7JwLBsgaWPngXA9V0WFBTXVF4x5qTwhuR1sSV8MxUvTKxI+qqiEzA
# FjU89oSZ+zXId/hEUuTL6vn1Th8/6mwD0L9ORchNOQUKzCjBzI4MVPB09nM3AdPC
# LGThlufsZktdoU2KjMHpc+gMIXQYsxkgvm07K5iZTZ5eJ4tV5KB0aPvTZppGUxe1
# YY9og9F3hxjDHQtEuSY2rzBQI7nrUpd1ZI5ut/3ZgDWkqD6aGRtMme4n4GsGsYb2
# Ht9+d2RL9S8uPUh+7rV8K/N3+vXgXRaEYTuAScKtflEbA7YnZA5nUdMng8x0kMTQ
# Y73XCd4UGWDfSSZsgaIHGkM/MRIHgmlrfcwPkWqWW9vF+92O6Hw=
# =/Du0
# -----END PGP SIGNATURE-----
# gpg: Signature made Mon 06 Nov 2023 22:35:30 HKT
# gpg:                using RSA key A0F66548F04895EBFE6B0B6051A343C7CFFBECA1
# gpg: Good signature from "Cédric Le Goater <clg@redhat.com>" [unknown]
# gpg:                 aka "Cédric Le Goater <clg@kaod.org>" [unknown]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: A0F6 6548 F048 95EB FE6B  0B60 51A3 43C7 CFFB ECA1

* tag 'pull-vfio-20231106' of https://github.com/legoater/qemu: (22 commits)
  vfio/common: Move vfio_host_win_add/del into spapr.c
  vfio/spapr: Make vfio_spapr_create/remove_window static
  vfio/container: Move spapr specific init/deinit into spapr.c
  vfio/container: Move vfio_container_add/del_section_window into spapr.c
  vfio/container: Move IBM EEH related functions into spapr_pci_vfio.c
  util/uuid: Define UUID_STR_LEN from UUID_NONE string
  util/uuid: Remove UUID_FMT_LEN
  vfio/pci: Fix buffer overrun when writing the VF token
  util/uuid: Add UUID_STR_LEN definition
  hw/pci: modify pci_setup_iommu() to set PCIIOMMUOps
  test: Add some tests for range and resv-mem helpers
  virtio-iommu: Consolidate host reserved regions and property set ones
  virtio-iommu: Implement set_iova_ranges() callback
  virtio-iommu: Record whether a probe request has been issued
  range: Introduce range_inverse_array()
  virtio-iommu: Introduce per IOMMUDevice reserved regions
  util/reserved-region: Add new ReservedRegion helpers
  range: Make range_compare() public
  virtio-iommu: Rename reserved_regions into prop_resv_regions
  vfio: Collect container iova range info
  ...

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-11-07 09:41:52 +08:00
Maciej S. Szmigiero
2d7f108186 Revert "hw/virtio/virtio-pmem: Replace impossible check by assertion"
This reverts commit 5960f254db since the
previous commit made this situation possible again.

Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
2023-11-06 13:53:59 +01:00
Yi Liu
ba7d12eb8c hw/pci: modify pci_setup_iommu() to set PCIIOMMUOps
This patch modifies pci_setup_iommu() to set PCIIOMMUOps
instead of setting PCIIOMMUFunc. PCIIOMMUFunc is used to
get an address space for a PCI device in vendor specific
way. The PCIIOMMUOps still offers this functionality. But
using PCIIOMMUOps leaves space to add more iommu related
vendor specific operations.

Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Eric Auger <eric.auger@redhat.com>
Cc: Yi Sun <yi.y.sun@linux.intel.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Eric Auger <eric.auger@redhat.com>
Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Andrey Smirnov <andrew.smirnov@gmail.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Hervé Poussineau <hpoussin@reactos.org>
Cc: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Cc: BALATON Zoltan <balaton@eik.bme.hu>
Cc: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Cc: Jagannathan Raman <jag.raman@oracle.com>
Cc: Matthew Rosato <mjrosato@linux.ibm.com>
Cc: Eric Farman <farman@linux.ibm.com>
Cc: Halil Pasic <pasic@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Thomas Huth <thuth@redhat.com>
Cc: Helge Deller <deller@gmx.de>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
[ clg: - refreshed on latest QEMU
       - included hw/remote/iommu.c
       - documentation update
       - asserts in pci_setup_iommu()
       - removed checks on iommu_bus->iommu_ops->get_address_space
       - included Elroy PCI host (PA-RISC) ]
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2023-11-03 09:20:31 +01:00
Eric Auger
5c476ba3fa virtio-iommu: Consolidate host reserved regions and property set ones
Up to now we were exposing to the RESV_MEM probe requests the
reserved memory regions set though the reserved-regions array property.

Combine those with the host reserved memory regions if any. Those
latter are tagged as RESERVED. We don't have more information about
them besides then cannot be mapped. Reserved regions set by
property have higher priority.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: "Michael S. Tsirkin" <mst@redhat.com>
Tested-by: Yanghang Liu <yanghliu@redhat.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2023-11-03 09:20:31 +01:00
Eric Auger
30d40e39bd virtio-iommu: Implement set_iova_ranges() callback
The implementation populates the array of per IOMMUDevice
host reserved ranges.

It is forbidden to have conflicting sets of host IOVA ranges
to be applied onto the same IOMMU MR (implied by different
host devices).

In case the callback is called after the probe request has
been issues by the driver, a warning is issued.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: "Michael S. Tsirkin" <mst@redhat.com>
Tested-by: Yanghang Liu <yanghliu@redhat.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2023-11-03 09:20:31 +01:00
Eric Auger
09b4c3d6a2 virtio-iommu: Record whether a probe request has been issued
Add an IOMMUDevice 'probe_done' flag to record that the driver
already issued a probe request on that device.

This will be useful to double check host reserved regions aren't
notified after the probe and hence are not taken into account
by the driver.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Suggested-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Reviewed-by: "Michael S. Tsirkin" <mst@redhat.com>
Tested-by: Yanghang Liu <yanghliu@redhat.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2023-11-03 09:20:31 +01:00
Eric Auger
908cae0de4 virtio-iommu: Introduce per IOMMUDevice reserved regions
For the time being the per device reserved regions are
just a duplicate of IOMMU wide reserved regions. Subsequent
patches will combine those with host reserved regions, if any.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Yanghang Liu <yanghliu@redhat.com>
Reviewed-by: "Michael S. Tsirkin" <mst@redhat.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2023-11-03 09:20:31 +01:00
Eric Auger
41cc70cdf5 virtio-iommu: Rename reserved_regions into prop_resv_regions
Rename VirtIOIOMMU (nb_)reserved_regions fields with the "prop_" prefix
to highlight those fields are set through a property, at machine level.
They are IOMMU wide.

A subsequent patch will introduce per IOMMUDevice reserved regions
that will include both those IOMMU wide property reserved
regions plus, sometimes, host reserved regions, if the device is
backed by a host device protected by a physical IOMMU. Also change
nb_ prefix by nr_.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: "Michael S. Tsirkin" <mst@redhat.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2023-11-03 09:20:31 +01:00
Eric Auger
e8f433f80e memory: Let ReservedRegion use Range
A reserved region is a range tagged with a type. Let's directly use
the Range type in the prospect to reuse some of the library helpers
shipped with the Range type.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: "Michael S. Tsirkin" <mst@redhat.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2023-11-03 09:20:31 +01:00
Steve Sistare
89415796f6 cpr: relax vhost migration blockers
vhost blocks migration if logging is not supported to track dirty
memory, and vhost-user blocks it if the log cannot be saved to a shm fd.

vhost-vdpa blocks migration if both hosts do not support all the device's
features using a shadow VQ, for tracking requests and dirty memory.

vhost-scsi blocks migration if storage cannot be shared across hosts,
or if state cannot be migrated.

None of these conditions apply if the old and new qemu processes do
not run concurrently, and if new qemu starts on the same host as old,
which is the case for cpr.

Narrow the scope of these blockers so they only apply to normal mode.
They will not block cpr modes when they are added in subsequent patches.

No functional change until a new mode is added.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <1698263069-406971-5-git-send-email-steven.sistare@oracle.com>
2023-11-01 16:13:59 +01:00
Juan Quintela
99b16e8ee4 migration: Use vmstate_register_any()
This are the easiest cases, where we were already using
VMSTATE_INSTANCE_ID_ANY.

Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231020090731.28701-3-quintela@redhat.com>
2023-11-01 16:13:58 +01:00
Stefan Hajnoczi
84d61e5f36 virtio: use defer_call() in virtio_irqfd_notify()
virtio-blk and virtio-scsi invoke virtio_irqfd_notify() to send Used
Buffer Notifications from an IOThread. This involves an eventfd
write(2) syscall. Calling this repeatedly when completing multiple I/O
requests in a row is wasteful.

Use the defer_call() API to batch together virtio_irqfd_notify() calls
made during thread pool (aio=threads), Linux AIO (aio=native), and
io_uring (aio=io_uring) completion processing.

Behavior is unchanged for emulated devices that do not use
defer_call_begin()/defer_call_end() since defer_call() immediately
invokes the callback when called outside a
defer_call_begin()/defer_call_end() region.

fio rw=randread bs=4k iodepth=64 numjobs=8 IOPS increases by ~9% with a
single IOThread and 8 vCPUs. iodepth=1 decreases by ~1% but this could
be noise. Detailed performance data and configuration specifics are
available here:
https://gitlab.com/stefanha/virt-playbooks/-/tree/blk_io_plug-irqfd

This duplicates the BH that virtio-blk uses for batching. The next
commit will remove it.

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20230913200045.1024233-4-stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-10-31 15:42:14 +01:00
Paolo Bonzini
126e7f7803 kvm: require KVM_CAP_IOEVENTFD and KVM_CAP_IOEVENTFD_ANY_LENGTH
KVM_CAP_IOEVENTFD_ANY_LENGTH was added in Linux 4.4, released in 2016.
Assume that it is present.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-10-25 17:35:15 +02:00
Paolo Bonzini
5d9ec1f4c7 kvm: assume that many ioeventfds can be created
NR_IOBUS_DEVS was increased to 200 in Linux 2.6.34.  By Linux 3.5 it had
increased to 1000 and later ioeventfds were changed to not count against
the limit.  But the earlier limit of 200 would already be enough for
kvm_check_many_ioeventfds() to be true, so remove the check.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-10-25 17:35:15 +02:00
Stefan Hajnoczi
1b4a5a20da virtio,pc,pci: features, cleanups
infrastructure for vhost-vdpa shadow work
 piix south bridge rework
 reconnect for vhost-user-scsi
 dummy ACPI QTG DSM for cxl
 
 tests, cleanups, fixes all over the place
 
 Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmU06PMPHG1zdEByZWRo
 YXQuY29tAAoJECgfDbjSjVRpNIsH/0DlKti86VZLJ6PbNqsnKxoK2gg05TbEhPZU
 pQ+RPDaCHpFBsLC5qsoMJwvaEQFe0e49ZFemw7bXRzBxgmbbNnZ9ArCIPqT+rvQd
 7UBmyC+kacVyybZatq69aK2BHKFtiIRlT78d9Izgtjmp8V7oyKoz14Esh8wkE+FT
 ypHUa70Addi6alNm6BVkm7bxZxi0Wrmf3THqF8ViYvufzHKl7JR5e17fKWEG0BqV
 9W7AeHMnzJ7jkTvBGUw7g5EbzFn7hPLTbO4G/VW97k0puS4WRX5aIMkVhUazsRIa
 zDOuXCCskUWuRapiCwY0E4g7cCaT8/JR6JjjBaTgkjJgvo5Y8Eg=
 =ILek
 -----END PGP SIGNATURE-----

Merge tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu into staging

virtio,pc,pci: features, cleanups

infrastructure for vhost-vdpa shadow work
piix south bridge rework
reconnect for vhost-user-scsi
dummy ACPI QTG DSM for cxl

tests, cleanups, fixes all over the place

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

# -----BEGIN PGP SIGNATURE-----
#
# iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmU06PMPHG1zdEByZWRo
# YXQuY29tAAoJECgfDbjSjVRpNIsH/0DlKti86VZLJ6PbNqsnKxoK2gg05TbEhPZU
# pQ+RPDaCHpFBsLC5qsoMJwvaEQFe0e49ZFemw7bXRzBxgmbbNnZ9ArCIPqT+rvQd
# 7UBmyC+kacVyybZatq69aK2BHKFtiIRlT78d9Izgtjmp8V7oyKoz14Esh8wkE+FT
# ypHUa70Addi6alNm6BVkm7bxZxi0Wrmf3THqF8ViYvufzHKl7JR5e17fKWEG0BqV
# 9W7AeHMnzJ7jkTvBGUw7g5EbzFn7hPLTbO4G/VW97k0puS4WRX5aIMkVhUazsRIa
# zDOuXCCskUWuRapiCwY0E4g7cCaT8/JR6JjjBaTgkjJgvo5Y8Eg=
# =ILek
# -----END PGP SIGNATURE-----
# gpg: Signature made Sun 22 Oct 2023 02:18:43 PDT
# gpg:                using RSA key 5D09FD0871C8F85B94CA8A0D281F0DB8D28D5469
# gpg:                issuer "mst@redhat.com"
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" [full]
# gpg:                 aka "Michael S. Tsirkin <mst@redhat.com>" [full]
# Primary key fingerprint: 0270 606B 6F3C DF3D 0B17  0970 C350 3912 AFBE 8E67
#      Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA  8A0D 281F 0DB8 D28D 5469

* tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu: (62 commits)
  intel-iommu: Report interrupt remapping faults, fix return value
  MAINTAINERS: Add include/hw/intc/i8259.h to the PC chip section
  vhost-user: Fix protocol feature bit conflict
  tests/acpi: Update DSDT.cxl with QTG DSM
  hw/cxl: Add QTG _DSM support for ACPI0017 device
  tests/acpi: Allow update of DSDT.cxl
  hw/i386/cxl: ensure maxram is greater than ram size for calculating cxl range
  vhost-user: fix lost reconnect
  vhost-user-scsi: start vhost when guest kicks
  vhost-user-scsi: support reconnect to backend
  vhost: move and rename the conn retry times
  vhost-user-common: send get_inflight_fd once
  hw/i386/pc_piix: Make PIIX4 south bridge usable in PC machine
  hw/isa/piix: Implement multi-process QEMU support also for PIIX4
  hw/isa/piix: Resolve duplicate code regarding PCI interrupt wiring
  hw/isa/piix: Reuse PIIX3's PCI interrupt triggering in PIIX4
  hw/isa/piix: Rename functions to be shared for PCI interrupt triggering
  hw/isa/piix: Reuse PIIX3 base class' realize method in PIIX4
  hw/isa/piix: Share PIIX3's base class with PIIX4
  hw/isa/piix: Harmonize names of reset control memory regions
  ...

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-10-23 14:45:29 -07:00
Li Feng
f02a4b8e64 vhost-user: fix lost reconnect
When the vhost-user is reconnecting to the backend, and if the vhost-user fails
at the get_features in vhost_dev_init(), then the reconnect will fail
and it will not be retriggered forever.

The reason is:
When the vhost-user fails at get_features, the vhost_dev_cleanup will be called
immediately.

vhost_dev_cleanup calls 'memset(hdev, 0, sizeof(struct vhost_dev))'.

The reconnect path is:
vhost_user_blk_event
   vhost_user_async_close(.. vhost_user_blk_disconnect ..)
     qemu_chr_fe_set_handlers <----- clear the notifier callback
       schedule vhost_user_async_close_bh

The vhost->vdev is null, so the vhost_user_blk_disconnect will not be
called, then the event fd callback will not be reinstalled.

All vhost-user devices have this issue, including vhost-user-blk/scsi.

With this patch, if the vdev->vdev is null, the fd callback will still
be reinstalled.

Fixes: 71e076a07d ("hw/virtio: generalise CHR_EVENT_CLOSED handling")

Signed-off-by: Li Feng <fengli@smartx.com>
Reviewed-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
Message-Id: <20231009044735.941655-6-fengli@smartx.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-10-22 05:18:17 -04:00
Li Feng
4dfcc09f48 vhost: move and rename the conn retry times
Multiple devices need this macro, move it to a common header.

Signed-off-by: Li Feng <fengli@smartx.com>
Reviewed-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
Message-Id: <20231009044735.941655-3-fengli@smartx.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-10-22 05:18:17 -04:00
Stefan Hajnoczi
c0c4f14729 virtio: call ->vhost_reset_device() during reset
vhost-user-scsi has a VirtioDeviceClass->reset() function that calls
->vhost_reset_device(). The other vhost devices don't notify the vhost
device upon reset.

Stateful vhost devices may need to handle device reset in order to free
resources or prevent stale device state from interfering after reset.

Call ->vhost_device_reset() from virtio_reset() so that that vhost
devices are notified of device reset.

This patch affects behavior as follows:
- vhost-kernel: No change in behavior since ->vhost_reset_device() is
  not implemented.
- vhost-user: back-ends that negotiate
  VHOST_USER_PROTOCOL_F_RESET_DEVICE now receive a
  VHOST_USER_DEVICE_RESET message upon device reset. Otherwise there is
  no change in behavior. DPDK, SPDK, libvhost-user, and the
  vhost-user-backend crate do not negotiate
  VHOST_USER_PROTOCOL_F_RESET_DEVICE automatically.
- vhost-vdpa: an extra SET_STATUS 0 call is made during device reset.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20231004014532.1228637-4-stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
2023-10-22 05:18:16 -04:00
Stefan Hajnoczi
e6383293eb vhost-backend: remove vhost_kernel_reset_device()
vhost_kernel_reset_device() invokes RESET_OWNER, which disassociates the
owner process from the device. The device is left non-operational since
SET_OWNER is only called once during startup in vhost_dev_init().

vhost_kernel_reset_device() is never called so this latent bug never
appears. Get rid of vhost_kernel_reset_device() for now. If someone
needs it in the future they'll need to implement it correctly.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20231004014532.1228637-3-stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
2023-10-22 05:18:16 -04:00
Stefan Hajnoczi
22d2464f7e vhost-user: do not send RESET_OWNER on device reset
The VHOST_USER_RESET_OWNER message is deprecated in the spec:

   This is no longer used. Used to be sent to request disabling all
   rings, but some back-ends interpreted it to also discard connection
   state (this interpretation would lead to bugs).  It is recommended
   that back-ends either ignore this message, or use it to disable all
   rings.

The only caller of vhost_user_reset_device() is vhost_user_scsi_reset().
It checks that F_RESET_DEVICE was negotiated before calling it:

  static void vhost_user_scsi_reset(VirtIODevice *vdev)
  {
      VHostSCSICommon *vsc = VHOST_SCSI_COMMON(vdev);
      struct vhost_dev *dev = &vsc->dev;

      /*
       * Historically, reset was not implemented so only reset devices
       * that are expecting it.
       */
      if (!virtio_has_feature(dev->protocol_features,
                              VHOST_USER_PROTOCOL_F_RESET_DEVICE)) {
          return;
      }

      if (dev->vhost_ops->vhost_reset_device) {
          dev->vhost_ops->vhost_reset_device(dev);
      }
  }

Therefore VHOST_USER_RESET_OWNER is actually never sent by
vhost_user_reset_device(). Remove the dead code. This effectively moves
the vhost-user protocol specific code from vhost-user-scsi.c into
vhost-user.c where it belongs.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20231004014532.1228637-2-stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
2023-10-22 05:18:16 -04:00
Laszlo Ersek
d7dc0682f5 vhost-user: call VHOST_USER_SET_VRING_ENABLE synchronously
(1) The virtio-1.2 specification
<http://docs.oasis-open.org/virtio/virtio/v1.2/virtio-v1.2.html> writes:

> 3     General Initialization And Device Operation
> 3.1   Device Initialization
> 3.1.1 Driver Requirements: Device Initialization
>
> [...]
>
> 7. Perform device-specific setup, including discovery of virtqueues for
>    the device, optional per-bus setup, reading and possibly writing the
>    device’s virtio configuration space, and population of virtqueues.
>
> 8. Set the DRIVER_OK status bit. At this point the device is “live”.

and

> 4         Virtio Transport Options
> 4.1       Virtio Over PCI Bus
> 4.1.4     Virtio Structure PCI Capabilities
> 4.1.4.3   Common configuration structure layout
> 4.1.4.3.2 Driver Requirements: Common configuration structure layout
>
> [...]
>
> The driver MUST configure the other virtqueue fields before enabling the
> virtqueue with queue_enable.
>
> [...]

(The same statements are present in virtio-1.0 identically, at
<http://docs.oasis-open.org/virtio/virtio/v1.0/virtio-v1.0.html>.)

These together mean that the following sub-sequence of steps is valid for
a virtio-1.0 guest driver:

(1.1) set "queue_enable" for the needed queues as the final part of device
initialization step (7),

(1.2) set DRIVER_OK in step (8),

(1.3) immediately start sending virtio requests to the device.

(2) When vhost-user is enabled, and the VHOST_USER_F_PROTOCOL_FEATURES
special virtio feature is negotiated, then virtio rings start in disabled
state, according to
<https://qemu-project.gitlab.io/qemu/interop/vhost-user.html#ring-states>.
In this case, explicit VHOST_USER_SET_VRING_ENABLE messages are needed for
enabling vrings.

Therefore setting "queue_enable" from the guest (1.1) -- which is
technically "buffered" on the QEMU side until the guest sets DRIVER_OK
(1.2) -- is a *control plane* operation, which -- after (1.2) -- travels
from the guest through QEMU to the vhost-user backend, using a unix domain
socket.

Whereas sending a virtio request (1.3) is a *data plane* operation, which
evades QEMU -- it travels from guest to the vhost-user backend via
eventfd.

This means that operations ((1.1) + (1.2)) and (1.3) travel through
different channels, and their relative order can be reversed, as perceived
by the vhost-user backend.

That's exactly what happens when OVMF's virtiofs driver (VirtioFsDxe) runs
against the Rust-language virtiofsd version 1.7.2. (Which uses version
0.10.1 of the vhost-user-backend crate, and version 0.8.1 of the vhost
crate.)

Namely, when VirtioFsDxe binds a virtiofs device, it goes through the
device initialization steps (i.e., control plane operations), and
immediately sends a FUSE_INIT request too (i.e., performs a data plane
operation). In the Rust-language virtiofsd, this creates a race between
two components that run *concurrently*, i.e., in different threads or
processes:

- Control plane, handling vhost-user protocol messages:

  The "VhostUserSlaveReqHandlerMut::set_vring_enable" method
  [crates/vhost-user-backend/src/handler.rs] handles
  VHOST_USER_SET_VRING_ENABLE messages, and updates each vring's "enabled"
  flag according to the message processed.

- Data plane, handling virtio / FUSE requests:

  The "VringEpollHandler::handle_event" method
  [crates/vhost-user-backend/src/event_loop.rs] handles the incoming
  virtio / FUSE request, consuming the virtio kick at the same time. If
  the vring's "enabled" flag is set, the virtio / FUSE request is
  processed genuinely. If the vring's "enabled" flag is clear, then the
  virtio / FUSE request is discarded.

Note that OVMF enables the queue *first*, and sends FUSE_INIT *second*.
However, if the data plane processor in virtiofsd wins the race, then it
sees the FUSE_INIT *before* the control plane processor took notice of
VHOST_USER_SET_VRING_ENABLE and green-lit the queue for the data plane
processor. Therefore the latter drops FUSE_INIT on the floor, and goes
back to waiting for further virtio / FUSE requests with epoll_wait.
Meanwhile OVMF is stuck waiting for the FUSET_INIT response -- a deadlock.

The deadlock is not deterministic. OVMF hangs infrequently during first
boot. However, OVMF hangs almost certainly during reboots from the UEFI
shell.

The race can be "reliably masked" by inserting a very small delay -- a
single debug message -- at the top of "VringEpollHandler::handle_event",
i.e., just before the data plane processor checks the "enabled" field of
the vring. That delay suffices for the control plane processor to act upon
VHOST_USER_SET_VRING_ENABLE.

We can deterministically prevent the race in QEMU, by blocking OVMF inside
step (1.2) -- i.e., in the write to the device status register that
"unleashes" queue enablement -- until VHOST_USER_SET_VRING_ENABLE actually
*completes*. That way OVMF's VCPU cannot advance to the FUSE_INIT
submission before virtiofsd's control plane processor takes notice of the
queue being enabled.

Wait for VHOST_USER_SET_VRING_ENABLE completion by:

- setting the NEED_REPLY flag on VHOST_USER_SET_VRING_ENABLE, and waiting
  for the reply, if the VHOST_USER_PROTOCOL_F_REPLY_ACK vhost-user feature
  has been negotiated, or

- performing a separate VHOST_USER_GET_FEATURES *exchange*, which requires
  a backend response regardless of VHOST_USER_PROTOCOL_F_REPLY_ACK.

Cc: "Michael S. Tsirkin" <mst@redhat.com> (supporter:vhost)
Cc: Eugenio Perez Martin <eperezma@redhat.com>
Cc: German Maglione <gmaglione@redhat.com>
Cc: Liu Jiang <gerry@linux.alibaba.com>
Cc: Sergio Lopez Pascual <slp@redhat.com>
Cc: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Tested-by: Albert Esteve <aesteve@redhat.com>
[lersek@redhat.com: work Eugenio's explanation into the commit message,
 about QEMU containing step (1.1) until step (1.2)]
Reviewed-by: Eugenio Pérez <eperezma@redhat.com>
Message-Id: <20231002203221.17241-8-lersek@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-10-22 05:18:16 -04:00
Laszlo Ersek
75b6b6da21 vhost-user: allow "vhost_set_vring" to wait for a reply
The "vhost_set_vring" function already centralizes the common parts of
"vhost_user_set_vring_num", "vhost_user_set_vring_base" and
"vhost_user_set_vring_enable". We'll want to allow some of those callers
to wait for a reply.

Therefore, rebase "vhost_set_vring" from just "vhost_user_write" to
"vhost_user_write_sync", exposing the "wait_for_reply" parameter.

This is purely refactoring -- there is no observable change. That's
because:

- all three callers pass in "false" for "wait_for_reply", which disables
  all logic in "vhost_user_write_sync" except the call to
  "vhost_user_write";

- the fds=NULL and fd_num=0 arguments of the original "vhost_user_write"
  call inside "vhost_set_vring" are hard-coded within
  "vhost_user_write_sync".

Cc: "Michael S. Tsirkin" <mst@redhat.com> (supporter:vhost)
Cc: Eugenio Perez Martin <eperezma@redhat.com>
Cc: German Maglione <gmaglione@redhat.com>
Cc: Liu Jiang <gerry@linux.alibaba.com>
Cc: Sergio Lopez Pascual <slp@redhat.com>
Cc: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Tested-by: Albert Esteve <aesteve@redhat.com>
Reviewed-by: Eugenio Pérez <eperezma@redhat.com>
Message-Id: <20231002203221.17241-7-lersek@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-10-22 05:18:16 -04:00
Laszlo Ersek
df3b2abc32 vhost-user: hoist "write_sync", "get_features", "get_u64"
In order to avoid a forward-declaration for "vhost_user_write_sync" in a
subsequent patch, hoist "vhost_user_write_sync" ->
"vhost_user_get_features" -> "vhost_user_get_u64" just above
"vhost_set_vring".

This is purely code movement -- no observable change.

Cc: "Michael S. Tsirkin" <mst@redhat.com> (supporter:vhost)
Cc: Eugenio Perez Martin <eperezma@redhat.com>
Cc: German Maglione <gmaglione@redhat.com>
Cc: Liu Jiang <gerry@linux.alibaba.com>
Cc: Sergio Lopez Pascual <slp@redhat.com>
Cc: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Tested-by: Albert Esteve <aesteve@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Eugenio Pérez <eperezma@redhat.com>
Message-Id: <20231002203221.17241-6-lersek@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-10-22 05:18:16 -04:00
Laszlo Ersek
99ad9ec89d vhost-user: flatten "enforce_reply" into "vhost_user_write_sync"
At this point, only "vhost_user_write_sync" calls "enforce_reply"; embed
the latter into the former.

This is purely refactoring -- no observable change.

Cc: "Michael S. Tsirkin" <mst@redhat.com> (supporter:vhost)
Cc: Eugenio Perez Martin <eperezma@redhat.com>
Cc: German Maglione <gmaglione@redhat.com>
Cc: Liu Jiang <gerry@linux.alibaba.com>
Cc: Sergio Lopez Pascual <slp@redhat.com>
Cc: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Tested-by: Albert Esteve <aesteve@redhat.com>
Reviewed-by: Eugenio Pérez <eperezma@redhat.com>
Message-Id: <20231002203221.17241-5-lersek@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-10-22 05:18:16 -04:00
Laszlo Ersek
54ae36822f vhost-user: factor out "vhost_user_write_sync"
The tails of the "vhost_user_set_vring_addr" and "vhost_user_set_u64"
functions are now byte-for-byte identical. Factor the common tail out to a
new function called "vhost_user_write_sync".

This is purely refactoring -- no observable change.

Cc: "Michael S. Tsirkin" <mst@redhat.com> (supporter:vhost)
Cc: Eugenio Perez Martin <eperezma@redhat.com>
Cc: German Maglione <gmaglione@redhat.com>
Cc: Liu Jiang <gerry@linux.alibaba.com>
Cc: Sergio Lopez Pascual <slp@redhat.com>
Cc: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Tested-by: Albert Esteve <aesteve@redhat.com>
Reviewed-by: Eugenio Pérez <eperezma@redhat.com>
Message-Id: <20231002203221.17241-4-lersek@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-10-22 05:18:16 -04:00