mirrors/qemu - qemu - SynapseOS git

Author	SHA1	Message	Date
Steve Sistare	d9cda21303	migration: simplify notifiers Pass the callback function to add_migration_state_change_notifier so that migration can initialize the notifier on add and clear it on delete, which simplifies the call sites. Shorten the function names so the extra arg can be added more legibly. Hide the global notifier list in a new function migration_call_notifiers, and make it externally visible so future live update code can call it. No functional change. Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Reviewed-by: Peter Xu <peterx@redhat.com> Tested-by: Michael Galaxy <mgalaxy@akamai.com> Reviewed-by: Michael Galaxy <mgalaxy@akamai.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Message-ID: <1686148954-250144-1-git-send-email-steven.sistare@oracle.com>	2023-10-20 08:51:41 +02:00
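As a rough illustration of the shape of this change, the C sketch below models a notifier list. The add helper takes the callback, the remove helper unlinks the notifier and clears it, and a single function walks the list. All names (MigNotifier, mig_add_notifier, mig_remove_notifier, mig_call_notifiers, count_calls) are hypothetical stand-ins, not QEMU's actual types or signatures.

```c
#include <stddef.h>

/* Hypothetical, simplified notifier list (not QEMU's Notifier/NotifierList
 * types): the add helper receives the callback, and removal clears it. */
typedef struct MigNotifier MigNotifier;
typedef void (*MigNotifyFn)(MigNotifier *n, void *data);

struct MigNotifier {
    MigNotifyFn notify;
    MigNotifier *next;
};

/* The global list is private to this file; callers use the helpers. */
static MigNotifier *mig_notifiers;

/* Initialize the notifier on add, so call sites need no separate setup. */
static void mig_add_notifier(MigNotifier *n, MigNotifyFn fn)
{
    n->notify = fn;
    n->next = mig_notifiers;
    mig_notifiers = n;
}

/* Unlink and clear on delete; removing a cleared notifier is a no-op. */
static void mig_remove_notifier(MigNotifier *n)
{
    if (!n->notify) {
        return;
    }
    for (MigNotifier **p = &mig_notifiers; *p; p = &(*p)->next) {
        if (*p == n) {
            *p = n->next;
            break;
        }
    }
    n->notify = NULL;
    n->next = NULL;
}

/* One exported entry point walks the list (cf. migration_call_notifiers). */
static void mig_call_notifiers(void *data)
{
    for (MigNotifier *n = mig_notifiers; n; n = n->next) {
        n->notify(n, data);
    }
}

/* Example callback: counts how many times it was invoked. */
static void count_calls(MigNotifier *n, void *data)
{
    (void)n;
    (*(int *)data)++;
}
```

Because removal clears the callback, calling the remove helper twice is harmless in this sketch.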
Steve Sistare	c8a7fc5179	migration: simplify blockers Modify migrate_add_blocker and migrate_del_blocker to take an Error reason. This allows migration to own the Error object, so that if an error occurs in migrate_add_blocker, migration code can free the Error and clear the client handle, simplifying client code. It also simplifies the migrate_del_blocker call site. In addition, this is a pre-requisite for a proposed future patch that would add a mode argument to migration requests to support live update, and maintain a list of blockers for each mode. A blocker may apply to a single mode or to multiple modes, and passing Error will allow one Error object to be registered for multiple modes. No functional change. Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Tested-by: Michael Galaxy <mgalaxy@akamai.com> Reviewed-by: Michael Galaxy <mgalaxy@akamai.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Message-ID: <1697634216-84215-1-git-send-email-steven.sistare@oracle.com>	2023-10-20 08:51:41 +02:00
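The C sketch below shows the ownership rule described above. It assumes a toy Error type and a fixed-size blocker table, both hypothetical and not QEMU's Error API or its migrate_add_blocker implementation. The add function consumes the caller's Error handle and clears it on failure, so the delete function can be called unconditionally.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical minimal Error type; QEMU's real Error is opaque. */
typedef struct Error {
    const char *msg;
} Error;

static Error *error_new(const char *msg)
{
    Error *e = malloc(sizeof(*e));
    if (e) {
        e->msg = msg;
    }
    return e;
}

static void error_free(Error *e)
{
    free(e);  /* free(NULL) is a no-op */
}

#define MAX_BLOCKERS 16
static Error *blockers[MAX_BLOCKERS];
static int nr_blockers;
static bool only_migratable;  /* stands in for --only-migratable */

/* Takes ownership of *reasonp. On failure the blocker code frees the reason
 * and clears the caller's handle, so the caller has nothing to clean up. */
static int add_blocker(Error **reasonp, Error **errp)
{
    if (only_migratable || nr_blockers == MAX_BLOCKERS) {
        *errp = error_new("disallowing migration blocker");
        error_free(*reasonp);
        *reasonp = NULL;
        return -1;
    }
    blockers[nr_blockers++] = *reasonp;
    return 0;
}

/* Safe whether or not the add succeeded: a NULL handle is a no-op. */
static void del_blocker(Error **reasonp)
{
    if (!*reasonp) {
        return;
    }
    for (int i = 0; i < nr_blockers; i++) {
        if (blockers[i] == *reasonp) {
            blockers[i] = blockers[--nr_blockers];
            break;
        }
    }
    error_free(*reasonp);
    *reasonp = NULL;
}
```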
Marc-André Lureau	8741781157	hw/vfio: add ramfb migration support Add a "VFIODisplay" subsection whenever "x-ramfb-migrate" is turned on. Turn it off by default on machines <= 8.1 for compatibility reasons. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Laszlo Ersek <lersek@redhat.com> Acked-by: Gerd Hoffmann <kraxel@redhat.com> [ clg: - checkpatch fixes - improved warn_report() in vfio_realize() ] Signed-off-by: Cédric Le Goater <clg@redhat.com>	2023-10-18 10:10:49 +02:00
Eric Auger	c0f527f4cc	vfio/pci: Remove vfio_detach_device from vfio_realize error path In vfio_realize, on the error path, we currently call vfio_detach_device() after a successful vfio_attach_device. While this looks natural, vfio_instance_finalize also induces a vfio_detach_device(), and it seems to be the right place instead, because it also releases other resources, and releasing those is a prerequisite for a successful UNSET_CONTAINER. So let's rely on the finalize vfio_detach_device call to free all the relevant resources. Fixes: a28e06621170 ("vfio/pci: Introduce vfio_[attach/detach]_device") Reported-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Signed-off-by: Eric Auger <eric.auger@redhat.com> Tested-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>	2023-10-18 10:10:49 +02:00
Zhenzhong Duan	88ceb67a6f	vfio/ap: Remove pointless apdev variable No need to double-cast, call VFIO_AP_DEVICE() on DeviceState. No functional changes. Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Cédric Le Goater <clg@redhat.com>	2023-10-18 10:10:49 +02:00
Zhenzhong Duan	fde4dbb7e6	vfio/pci: Fix a potential memory leak in vfio_listener_region_add When there is a failure in vfio_listener_region_add() and the section belongs to a ram device, an inaccurate error is reported that has nothing to do with a vfio_dma_map failure. The memory holding err is also leaked on every failure. Fix it by reporting the real error and freeing it. Fixes: `567b5b309a` ("vfio/pci: Relax DMA map errors for MMIO regions") Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>	2023-10-18 10:10:49 +02:00
Yi Liu	7e63b31138	vfio/common: Move legacy VFIO backend code into separate container.c Move all the code really dependent on the legacy VFIO container/group into a separate file: container.c. What does remain in common.c is the code related to VFIOAddressSpace, MemoryListeners, migration and all other general operations. Signed-off-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>	2023-10-18 10:10:49 +02:00
Zhenzhong Duan	3d779abafe	vfio/common: Introduce a global VFIODevice list Some functions iterate over all the VFIODevices. This is currently achieved by iterating over all groups/devices. Let's introduce a global list of VFIODevices simplifying that scan. This will also be useful while migrating to IOMMUFD by hiding the group specificity. Signed-off-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Suggested-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>	2023-10-18 10:10:49 +02:00
Zhenzhong Duan	0bddd88027	vfio/common: Store the parent container in VFIODevice Let's store the parent container within the VFIODevice. This simplifies the logic in vfio_viommu_preset() and has the benefit of hiding the group specificity, which is useful for the IOMMUFD migration. Signed-off-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>	2023-10-18 10:10:49 +02:00
Zhenzhong Duan	7103ef7e76	vfio/common: Introduce a per container device list Several functions need to iterate over the VFIO devices attached to a given container. This is currently achieved by iterating over the groups attached to the container and then over the devices in the group. Let's introduce a per-container device list that simplifies this search. The per-container list is used in the following functions: vfio_devices_all_dirty_tracking, vfio_devices_all_device_dirty_tracking, vfio_devices_all_running_and_mig_active, vfio_devices_dma_logging_stop, vfio_devices_dma_logging_start, vfio_devices_query_dirty_bitmap. This will also ease the migration to IOMMUFD by hiding the group specificity. Suggested-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>	2023-10-18 10:10:49 +02:00
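The following C sketch uses hypothetical types rather than QEMU's VFIOContainer and VFIODevice. It shows how a per-container device list plus a stored parent pointer turn the nested group-then-device scan into a single loop. container_all_dirty_tracking is a simplified, made-up predicate in the spirit of vfio_devices_all_dirty_tracking, not its real logic.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified types; not QEMU's VFIOContainer/VFIODevice. */
typedef struct Device Device;

typedef struct Container {
    Device *devices;            /* per-container device list head */
} Container;

struct Device {
    const char *name;
    bool dirty_tracking;
    Container *container;       /* parent container stored in the device */
    Device *container_next;     /* link in the container's device list */
};

static void container_attach(Container *c, Device *d)
{
    d->container = c;
    d->container_next = c->devices;
    c->devices = d;
}

/* One flat loop instead of iterating over groups and then their devices. */
static bool container_all_dirty_tracking(const Container *c)
{
    for (const Device *d = c->devices; d; d = d->container_next) {
        if (!d->dirty_tracking) {
            return false;
        }
    }
    return true;
}
```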
Zhenzhong Duan	c8fcb90c96	vfio/common: Move VFIO reset handler registration to a group agnostic function Move the reset handler registration/unregistration to a place that is not group specific. vfio_[get/put]_address_space are the best places for that purpose. Signed-off-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>	2023-10-18 10:10:49 +02:00
Eric Auger	e08041ece7	vfio/ccw: Use vfio_[attach/detach]_device Let the vfio-ccw device use vfio_attach_device() and vfio_detach_device(), hence hiding the details of the used IOMMU backend. Note that the migration reduces the following trace "vfio: subchannel %s has already been attached" (featuring cssid.ssid.devid) into "device is already attached" Also now all the devices have been migrated to use the new vfio_attach_device/vfio_detach_device API, let's turn the legacy functions into static functions, local to container.c. Signed-off-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>	2023-10-18 10:10:49 +02:00
Eric Auger	c95d128ee3	vfio/ap: Use vfio_[attach/detach]_device Let the vfio-ap device use vfio_attach_device() and vfio_detach_device(), hence hiding the details of the used IOMMU backend. We take the opportunity to use g_path_get_basename() which is preferred, as suggested by `3e015d815b` ("use g_path_get_basename instead of basename") Signed-off-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>	2023-10-18 10:10:49 +02:00
Eric Auger	da5ed43299	vfio/platform: Use vfio_[attach/detach]_device Let the vfio-platform device use vfio_attach_device() and vfio_detach_device(), hence hiding the details of the used IOMMU backend. Drop the trace event for vfio-platform as we have similar one in vfio_attach_device. Signed-off-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>	2023-10-18 10:10:49 +02:00
Eric Auger	5456b1867d	vfio/pci: Introduce vfio_[attach/detach]_device We want the VFIO devices to be able to use two different IOMMU backends, the legacy VFIO one and the new iommufd one. Introduce vfio_[attach/detach]_device, which aim at hiding the underlying IOMMU backend (IOCTLs, datatypes, ...). Once vfio_attach_device completes, the device is attached to a security context and its fd can be used. Conversely, when vfio_detach_device completes, the device has been detached from the security context. At the moment only the implementation based on the legacy container/group exists. Let's use it from the vfio-pci device. Subsequent patches will handle other devices. We also take advantage of this patch to properly free vbasedev->name on failure. Signed-off-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>	2023-10-18 10:10:49 +02:00
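To make the abstraction concrete, here is a hedged C sketch of hiding an IOMMU backend behind attach/detach entry points with a table of function pointers. The type and function names (IommuBackend, device_attach, device_detach, legacy_backend) are invented for illustration; QEMU's real interface may be structured differently.

```c
/* Hypothetical backend interface: callers only see attach/detach, while the
 * IOMMU-specific work (legacy container/group today, iommufd later) sits
 * behind a table of function pointers. None of these names are QEMU's. */
typedef struct Device Device;

typedef struct {
    const char *name;
    int (*attach)(Device *dev);
    void (*detach)(Device *dev);
} IommuBackend;

struct Device {
    const char *name;
    const IommuBackend *backend;
    int attached;
};

/* Placeholder legacy implementation: real code would open the group,
 * bind it to a container and obtain the device fd here. */
static int legacy_attach(Device *dev) { dev->attached = 1; return 0; }
static void legacy_detach(Device *dev) { dev->attached = 0; }

static const IommuBackend legacy_backend = {
    "legacy", legacy_attach, legacy_detach
};

/* Backend-agnostic entry points used by every device type. */
static int device_attach(Device *dev, const IommuBackend *be)
{
    if (dev->attached) {
        return -1;  /* "device is already attached" */
    }
    dev->backend = be;
    return be->attach(dev);
}

static void device_detach(Device *dev)
{
    if (dev->attached && dev->backend) {
        dev->backend->detach(dev);
    }
}
```

Adding a second backend would then mean providing another IommuBackend table, without touching the device-type code that calls device_attach and device_detach.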
Zhenzhong Duan	5621c02d5a	vfio/common: Extract out vfio_kvm_device_[add/del]_fd Introduce two new helpers, vfio_kvm_device_[add/del]_fd which take as input a file descriptor which can be either a group fd or a cdev fd. This uses the new KVM_DEV_VFIO_FILE VFIO KVM device group, which aliases to the legacy KVM_DEV_VFIO_GROUP. vfio_kvm_device_[add/del]_group then call those new helpers. Signed-off-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>	2023-10-18 10:10:49 +02:00
Eric Auger	a33832b194	vfio/common: Introduce vfio_container_add|del_section_window() Introduce helper functions that isolate the code used for VFIO_SPAPR_TCE_v2_IOMMU. Those helpers hide implementation details beneath the container object and make the vfio_listener_region_add/del() implementations more readable. No code change intended. Signed-off-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>	2023-10-18 10:10:49 +02:00
Eric Auger	77c212599d	vfio/common: Propagate KVM_SET_DEVICE_ATTR error if any In the VFIO_SPAPR_TCE_v2_IOMMU container case, when KVM_SET_DEVICE_ATTR fails, we currently don't propagate the error as we do on the vfio_spapr_create_window() failure case. Let's align the code. Take the opportunity to reword the error message and make it more explicit. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>	2023-10-18 10:10:49 +02:00
Yi Liu	1e09f52f4d	vfio/common: Move IOMMU agnostic helpers to a separate file Move low-level iommu agnostic helpers to a separate helpers.c file. They relate to regions, interrupts, device/region capabilities and etc. Signed-off-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>	2023-10-18 10:10:49 +02:00
Jing Liu	eaadba6f9b	vfio/pci: enable MSI-X in interrupt restoring on dynamic allocation During migration restore, vfio_enable_vectors() is called to re-enable MSI-X interrupts for assigned devices. It passes the range 0 to nr_vectors to the kernel to enable MSI-X and the vectors unmasked in the guest. While MSI-X is being enabled, all the vectors within the range are allocated by the VFIO_DEVICE_SET_IRQS ioctl. When dynamic MSI-X allocation is supported, we only want the vectors unmasked in the guest to be allocated and enabled. Use vector 0 with an invalid fd to get MSI-X enabled; after that, vectors can be allocated as needed. Signed-off-by: Jing Liu <jing2.liu@intel.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>	2023-10-05 22:04:51 +02:00
Jing Liu	5ebffa4e87	vfio/pci: use an invalid fd to enable MSI-X Guests typically enable MSI-X with all of the vectors masked in the MSI-X vector table. To match the guest state of the device, QEMU enables MSI-X by enabling vector 0 with userspace triggering and immediately releasing it. However, the release function does not actually release it, because userspace mode is already in use. There is no need to enable triggering on the host and rely on the mask bit to avoid spurious interrupts. Using an invalid fd (i.e. fd = -1) is enough to get MSI-X enabled. Once dynamic MSI-X allocation is supported, interrupt restoring also needs to enable MSI-X this way; therefore, create a function for that. Suggested-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Jing Liu <jing2.liu@intel.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>	2023-10-05 22:04:51 +02:00
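The request that enables MSI-X with an invalid fd can be sketched in C as below. To keep the example buildable without kernel headers, it uses a local mirror of struct vfio_irq_set. The flag and index values are assumed to match VFIO_IRQ_SET_DATA_EVENTFD, VFIO_IRQ_SET_ACTION_TRIGGER and VFIO_PCI_MSIX_IRQ_INDEX from <linux/vfio.h>. The ioctl itself is left out, and build_msix_enable_no_vec is a hypothetical helper name.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Local mirror of struct vfio_irq_set from <linux/vfio.h>, so the sketch
 * builds without kernel headers (assumed layout: five u32 fields + data). */
struct irq_set {
    uint32_t argsz;
    uint32_t flags;
    uint32_t index;
    uint32_t start;
    uint32_t count;
    uint8_t data[];
};

#define IRQ_SET_DATA_EVENTFD   (1u << 2)  /* VFIO_IRQ_SET_DATA_EVENTFD */
#define IRQ_SET_ACTION_TRIGGER (1u << 5)  /* VFIO_IRQ_SET_ACTION_TRIGGER */
#define PCI_MSIX_IRQ_INDEX     2u         /* VFIO_PCI_MSIX_IRQ_INDEX */

/* Build a request that "triggers" vector 0 with fd -1: MSI-X becomes
 * enabled, but no eventfd is attached to any vector. */
static struct irq_set *build_msix_enable_no_vec(void)
{
    uint32_t argsz = sizeof(struct irq_set) + sizeof(int32_t);
    struct irq_set *s = calloc(1, argsz);
    int32_t fd = -1;

    if (!s) {
        return NULL;
    }
    s->argsz = argsz;
    s->flags = IRQ_SET_DATA_EVENTFD | IRQ_SET_ACTION_TRIGGER;
    s->index = PCI_MSIX_IRQ_INDEX;
    s->start = 0;
    s->count = 1;
    memcpy(s->data, &fd, sizeof(fd));
    /* A real caller would now issue
     * ioctl(device_fd, VFIO_DEVICE_SET_IRQS, s) and then free(s). */
    return s;
}
```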
Jing Liu	d9e6710d7d	vfio/pci: enable vector on dynamic MSI-X allocation The vector_use callback is used to enable a vector that is unmasked in the guest. The kernel used to only support static MSI-X allocation. When allocating a new interrupt using "static MSI-X allocation" kernels, QEMU first disables all previously allocated vectors and then re-allocates all of them, including the new one. The nr_vectors of VFIOPCIDevice indicates that all vectors from 0 to nr_vectors are allocated (and may be enabled), which is used to loop over all the possibly used vectors when, e.g., disabling MSI-X interrupts. Extend the vector_use function to support dynamic MSI-X allocation when the host supports the capability. QEMU can then individually allocate and enable a new interrupt without affecting others or losing interrupts at runtime. Use nr_vectors to calculate the upper bound of enabled vectors in dynamic MSI-X allocation mode, since looping over all msix_entries_nr is inefficient and unnecessary. Signed-off-by: Jing Liu <jing2.liu@intel.com> Tested-by: Reinette Chatre <reinette.chatre@intel.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>	2023-10-05 22:04:51 +02:00
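Below is a toy C model of the two allocation modes, under stated assumptions. MsixState and vector_use are hypothetical. The dynamic field stands for "the host cleared VFIO_IRQ_INFO_NORESIZE", and host_reallocs simply counts how often static mode would have to disable and re-allocate every vector.

```c
#include <stdbool.h>

/* Hypothetical model of MSI-X vector allocation; not QEMU code. */
typedef struct {
    bool dynamic;            /* host supports dynamic MSI-X allocation */
    unsigned nr_vectors;     /* vectors 0..nr_vectors-1 may be allocated */
    unsigned host_reallocs;  /* full disable + re-allocate cycles */
} MsixState;

/* Guest unmasked vector nr. */
static void vector_use(MsixState *s, unsigned nr)
{
    if (nr + 1 <= s->nr_vectors) {
        return;  /* already inside the allocated range */
    }
    /* nr_vectors remains the upper bound used when looping over vectors. */
    s->nr_vectors = nr + 1;
    if (!s->dynamic) {
        /* Static allocation: tear everything down and re-allocate 0..nr,
         * which can lose interrupts on the other vectors. */
        s->host_reallocs++;
    }
    /* Dynamic allocation: only vector nr is allocated and enabled. */
}
```

With static allocation, growing the range one vector at a time costs a full teardown each time. With dynamic allocation, nr_vectors still tracks the upper bound, but no teardown happens.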
Jing Liu	45d85f6228	vfio/pci: detect the support of dynamic MSI-X allocation The kernel indicates that a passthrough device supports dynamic MSI-X allocation by clearing the VFIO_IRQ_INFO_NORESIZE flag. Fetch the flags from the host to determine whether dynamic MSI-X allocation is supported. Originally-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Jing Liu <jing2.liu@intel.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>	2023-10-05 22:04:51 +02:00
Zhenzhong Duan	c06327c9db	vfio/pci: rename vfio_put_device to vfio_pci_put_device vfio_put_device() is a VFIO PCI specific function; rename it with a 'vfio_pci' prefix to avoid confusion. No functional change. Suggested-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>	2023-10-05 22:04:51 +02:00
Alex Williamson	931150e56b	vfio/display: Fix missing update to set backing fields The below referenced commit renames scanout_width/height to backing_width/height, but also promotes these fields in various portions of the egl interface. Meanwhile vfio dmabuf support has never used the previous scanout fields and is therefore missed in the update. This results in a black screen when transitioning from ramfb to dmabuf display when using Intel vGPU with these features. Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1891 Link: https://lists.gnu.org/archive/html/qemu-devel/2023-08/msg02726.html Fixes: `9ac06df8b6` ("virtio-gpu-udmabuf: correct naming of QemuDmaBuf size properties") Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Tested-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>	2023-10-05 22:04:51 +02:00
Cédric Le Goater	44fa20c928	spapr: Remove support for NVIDIA V100 GPU with NVLink2 NVLink2 support was removed from the PPC PowerNV platform and VFIO in Linux 5.13 with commits : 562d1e207d32 ("powerpc/powernv: remove the nvlink support") b392a1989170 ("vfio/pci: remove vfio_pci_nvlink2") This was 2.5 years ago. Do the same in QEMU with a revert of commit `ec132efaa8` ("spapr: Support NVIDIA V100 GPU with NVLink2"). Some adjustments are required in the NUMA part. Cc: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com> Message-ID: <20230918091717.149950-1-clg@kaod.org> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>	2023-09-18 07:25:28 -03:00
Joao Martins	a31fe5daea	vfio/common: Separate vfio-pci ranges QEMU computes the DMA logging ranges for two predefined ranges: 32-bit and 64-bit. In the OVMF case, when the dynamic MMIO window is enabled, QEMU includes in the 64-bit range the RAM regions at the lower part and vfio-pci device RAM regions which are at the top of the address space. This range contains a large gap and the size can be bigger than the dirty tracking HW limits of some devices (MLX5 has a 2^42 limit). To avoid such large ranges, introduce a new PCI range covering the vfio-pci device RAM regions, this only if the addresses are above 4GB to avoid breaking potential SeaBIOS guests. [ clg: - wrote commit log - fixed overlapping 32-bit and PCI ranges when using SeaBIOS ] Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Signed-off-by: Cédric Le Goater <clg@redhat.com> Fixes: `5255bbf4ec` ("vfio/common: Add device dirty page tracking start/stop") Signed-off-by: Cédric Le Goater <clg@redhat.com>	2023-09-11 08:34:06 +02:00
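The range bookkeeping can be sketched in C as follows. TrackingRanges, ranges_init and ranges_update are hypothetical names, and the caller is assumed to already know whether a section is vfio-pci device RAM. The sketch only shows how each mapped section is folded into one of three min/max windows.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tracker: three [min, max] windows, as described above. */
typedef struct {
    uint64_t min32, max32;
    uint64_t min64, max64;
    uint64_t minpci64, maxpci64;
} TrackingRanges;

static void ranges_init(TrackingRanges *r)
{
    r->min32 = r->min64 = r->minpci64 = UINT64_MAX;
    r->max32 = r->max64 = r->maxpci64 = 0;
}

/* Fold one mapped section [iova, end] into the proper window. vfio-pci
 * device RAM starting at or above 4 GiB gets its own window, so the 64-bit
 * window does not have to span the gap up to the top of the address space. */
static void ranges_update(TrackingRanges *r, uint64_t iova, uint64_t end,
                          bool is_vfio_pci_ram)
{
    uint64_t *min, *max;

    if (is_vfio_pci_ram && iova > UINT32_MAX) {
        min = &r->minpci64;
        max = &r->maxpci64;
    } else if (end <= UINT32_MAX) {
        min = &r->min32;
        max = &r->max32;
    } else {
        min = &r->min64;
        max = &r->max64;
    }
    if (iova < *min) {
        *min = iova;
    }
    if (end > *max) {
        *max = end;
    }
}
```

For example, with guest RAM between 4 GiB and 10 GiB and a device BAR at 0x380000000000 (56 TiB), a single 64-bit window would span roughly 56 TiB, well beyond the 2^42 (4 TiB) limit mentioned above. With the separate PCI window, the 64-bit window covers only the 6 GiB of RAM.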
Avihai Horon	615379764a	vfio/migration: Block VFIO migration with background snapshot Background snapshot allows creating a snapshot of the VM while it's running and keeping it small by not including dirty RAM pages. The way it works is by first stopping the VM, saving the non-iterable devices' state and then starting the VM and saving the RAM while write protecting it with UFFD. The resulting snapshot represents the VM state at snapshot start. VFIO migration is not compatible with background snapshot. First of all, VFIO device state is not even saved in background snapshot because only non-iterable device state is saved. But even if it was saved, after starting the VM, a VFIO device could dirty pages without it being detected by UFFD write protection. This would corrupt the snapshot, as the RAM in it would not represent the RAM at snapshot start. To prevent this, block VFIO migration with background snapshot. Signed-off-by: Avihai Horon <avihaih@nvidia.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>	2023-09-11 08:34:06 +02:00
Avihai Horon	bf7ef7a2da	vfio/migration: Block VFIO migration with postcopy migration VFIO migration is not compatible with postcopy migration. A VFIO device in the destination can't handle page faults for pages that have not been sent yet. Doing such migration will cause the VM to crash in the destination: qemu-system-x86_64: VFIO_MAP_DMA failed: Bad address qemu-system-x86_64: vfio_dma_map(0x55a28c7659d0, 0xc0000, 0xb000, 0x7f1b11a00000) = -14 (Bad address) qemu: hardware error: vfio: DMA mapping failed, unable to continue To prevent this, block VFIO migration with postcopy migration. Reported-by: Yanghang Liu <yanghliu@redhat.com> Signed-off-by: Avihai Horon <avihaih@nvidia.com> Tested-by: Yanghang Liu <yanghliu@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>	2023-09-11 08:34:06 +02:00
Avihai Horon	8118349b1b	vfio/migration: Fail adding device with enable-migration=on and existing blocker If a device with enable-migration=on is added and it causes a migration blocker, adding the device should fail with a proper error. This is not the case with multiple device migration blocker when the blocker already exists. If the blocker already exists and a device with enable-migration=on is added which causes a migration blocker, adding the device will succeed. Fix it by failing adding the device in such case. Fixes: `8bbcb64a71` ("vfio/migration: Make VFIO migration non-experimental") Signed-off-by: Avihai Horon <avihaih@nvidia.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>	2023-09-11 08:34:06 +02:00
Avihai Horon	5c7a4b6035	vfio/migration: Allow migration of multiple P2P supporting devices Now that P2P support has been added to VFIO migration, allow migration of multiple devices if all of them support P2P migration. Single device migration is allowed regardless of P2P migration support. Signed-off-by: Avihai Horon <avihaih@nvidia.com> Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Tested-by: YangHang Liu <yanghliu@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>	2023-09-11 08:34:05 +02:00
Avihai Horon	94f775e428	vfio/migration: Add P2P support for VFIO migration VFIO migration uAPI defines an optional intermediate P2P quiescent state. While in the P2P quiescent state, P2P DMA transactions cannot be initiated by the device, but the device can respond to incoming ones. Additionally, all outstanding P2P transactions are guaranteed to have been completed by the time the device enters this state. The purpose of this state is to support migration of multiple devices that might do P2P transactions between themselves. Add support for P2P migration by transitioning all the devices to the P2P quiescent state before stopping or starting the devices. Use the new VMChangeStateHandler prepare_cb to achieve that behavior. This will allow migration of multiple VFIO devices if all of them support P2P migration. Signed-off-by: Avihai Horon <avihaih@nvidia.com> Tested-by: YangHang Liu <yanghliu@redhat.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>	2023-09-11 08:34:05 +02:00
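The ordering that P2P support introduces can be sketched in C as below. The state names echo the uAPI states mentioned above, but the enum, Dev, set_state, stop_all, start_all and the transition log are hypothetical simplifications. The multi-device check follows the rule from "Allow migration of multiple P2P supporting devices": several devices may migrate only if all of them support P2P.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device states named after the VFIO migration uAPI states;
 * these are not the kernel's enum values. */
typedef enum { ST_STOP, ST_RUNNING, ST_RUNNING_P2P } DevState;

typedef struct {
    bool p2p_capable;
    DevState state;
} Dev;

/* Record of transitions, so the ordering can be inspected. */
#define MAX_LOG 32
static struct { size_t dev; DevState to; } trans_log[MAX_LOG];
static size_t trans_len;

static void set_state(Dev *devs, size_t i, DevState to)
{
    devs[i].state = to;
    if (trans_len < MAX_LOG) {
        trans_log[trans_len].dev = i;
        trans_log[trans_len].to = to;
        trans_len++;
    }
}

/* Several devices may migrate only if all support P2P; one device always may. */
static bool multi_device_migration_allowed(const Dev *devs, size_t n)
{
    if (n <= 1) {
        return true;
    }
    for (size_t i = 0; i < n; i++) {
        if (!devs[i].p2p_capable) {
            return false;
        }
    }
    return true;
}

/* Stop: quiesce P2P on every device first (the "prepare" pass), then stop
 * them, so no device initiates P2P DMA towards an already stopped peer. */
static void stop_all(Dev *devs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (devs[i].p2p_capable) {
            set_state(devs, i, ST_RUNNING_P2P);
        }
    }
    for (size_t i = 0; i < n; i++) {
        set_state(devs, i, ST_STOP);
    }
}

/* Start is the mirror image: every device can respond to P2P before any of
 * them is allowed to initiate it. */
static void start_all(Dev *devs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (devs[i].p2p_capable) {
            set_state(devs, i, ST_RUNNING_P2P);
        }
    }
    for (size_t i = 0; i < n; i++) {
        set_state(devs, i, ST_RUNNING);
    }
}
```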
Joao Martins	3d4d0f0e06	vfio/migration: Refactor PRE_COPY and RUNNING state checks Move the PRE_COPY and RUNNING state checks to helper functions. This is in preparation for adding P2P VFIO migration support, where these helpers will also test for PRE_COPY_P2P and RUNNING_P2P states. Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Signed-off-by: Avihai Horon <avihaih@nvidia.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Tested-by: YangHang Liu <yanghliu@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>	2023-09-11 08:34:05 +02:00
Avihai Horon	5485298ce0	vfio/migration: Move from STOP_COPY to STOP in vfio_save_cleanup() Changing the device state from STOP_COPY to STOP can take time as the device may need to free resources and do other operations as part of the transition. Currently, this is done in vfio_save_complete_precopy() and therefore it is counted in the migration downtime. To avoid this, change the device state from STOP_COPY to STOP in vfio_save_cleanup(), which is called after migration has completed and thus is not part of migration downtime. Signed-off-by: Avihai Horon <avihaih@nvidia.com> Tested-by: YangHang Liu <yanghliu@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>	2023-09-11 08:34:05 +02:00
Alex Williamson	c00aac6f14	vfio/pci: Enable AtomicOps completers on root ports Dynamically enable Atomic Ops completer support around realize/exit of vfio-pci devices reporting host support for these accesses and adhering to a minimal configuration standard. While the Atomic Ops completer bits in the root port device capabilities2 register are read-only, the PCIe spec does allow RO bits to change to reflect hardware state. We take advantage of that here around the realize and exit functions of the vfio-pci device. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Robin Voetter <robin@streamhpc.com> Tested-by: Robin Voetter <robin@streamhpc.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>	2023-07-10 09:52:52 +02:00
Tony Krowiak	1360b2ad1f	s390x/ap: Wire up the device request notifier interface Let's wire up the device request notifier interface to handle device unplug requests for AP. Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com> Link: https://lore.kernel.org/qemu-devel/20230530225544.280031-1-akrowiak@linux.ibm.com/ Signed-off-by: Cédric Le Goater <clg@redhat.com>	2023-07-10 09:52:52 +02:00
Avihai Horon	8af87a3ec7	vfio: Fix null pointer dereference bug in vfio_bars_finalize() vfio_realize() has the following flow: 1. vfio_bars_prepare() -- sets VFIOBAR->size. 2. msix_early_setup(). 3. vfio_bars_register() -- allocates VFIOBAR->mr. After vfio_bars_prepare() is called msix_early_setup() can fail. If it does fail, vfio_bars_register() is never called and VFIOBAR->mr is not allocated. In this case, vfio_bars_finalize() is called as part of the error flow to free the bars' resources. However, vfio_bars_finalize() calls object_unparent() for VFIOBAR->mr after checking only VFIOBAR->size, and thus we get a null pointer dereference. Fix it by checking VFIOBAR->mr in vfio_bars_finalize(). Fixes: `89d5202edc` ("vfio/pci: Allow relocating MSI-X MMIO") Signed-off-by: Avihai Horon <avihaih@nvidia.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>	2023-07-10 09:52:52 +02:00
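The core of the fix is to guard the release on the pointer that will actually be released. This can be sketched in C with a hypothetical Bar type standing in for VFIOBAR. Here malloc stands in for the MemoryRegion allocation and free for object_unparent.

```c
#include <stdlib.h>

/* Hypothetical stand-in for VFIOBAR: size is set in the "prepare" step,
 * the region is allocated later in the "register" step. */
typedef struct {
    size_t size;
    void *mr;
} Bar;

static void bar_prepare(Bar *b, size_t size)
{
    b->size = size;
    b->mr = NULL;
}

static int bar_register(Bar *b)
{
    b->mr = malloc(1);  /* placeholder for the MemoryRegion */
    return b->mr ? 0 : -1;
}

/* Safe on partially initialized BARs: test the pointer being released,
 * not a field (size) that an earlier step already set. */
static void bar_finalize(Bar *b)
{
    if (!b->mr) {
        return;
    }
    free(b->mr);
    b->mr = NULL;
}
```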
Zhenzhong Duan	d4a2af747d	vfio/migration: Return bool type for vfio_migration_realize() Make vfio_migration_realize() adhere to the convention of other realize() callbacks (like qdev_realize) by returning bool instead of int. Suggested-by: Cédric Le Goater <clg@redhat.com> Suggested-by: Joao Martins <joao.m.martins@oracle.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>	2023-07-10 09:52:52 +02:00
Zhenzhong Duan	0520d63c77	vfio/migration: Remove print of "Migration disabled" Property enable_migration supports [on/off/auto]. In ON mode, the error pointer is passed to errp and logged. In OFF mode, we don't need to log "Migration disabled" as it's intentional. In AUTO mode, we should only ever see errors or warnings if the device supports migration and an error or incompatibility occurs while further probing or configuring it. Lack of support for migration shouldn't generate an error or warning. Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>	2023-07-10 09:52:52 +02:00
Zhenzhong Duan	2b43b2995b	vfio/migration: Free resources when vfio_migration_realize fails When vfio_realize() succeeds, hot unplug will call vfio_exitfn() to free resources allocated in vfio_realize(); when vfio_realize() fails, vfio_exitfn() is never called and we need to free resources in vfio_realize(). In the case that vfio_migration_realize() fails, e.g: with -only-migratable & enable-migration=off, we see the following: (qemu) device_add vfio-pci,host=81:11.1,id=vfio1,bus=root1,enable-migration=off 0000:81:11.1: Migration disabled Error: disallowing migration blocker (--only-migratable) for: 0000:81:11.1: Migration is disabled for VFIO device If we hotplug again we should see the same log as above, but we see: (qemu) device_add vfio-pci,host=81:11.1,id=vfio1,bus=root1,enable-migration=off Error: vfio 0000:81:11.1: device is already attached That's because some references to the VFIO device aren't released. For resources allocated in vfio_migration_realize(), free them by jumping to the out_deinit path, which calls a new function vfio_migration_deinit(). For resources allocated in vfio_realize(), free them by jumping to the de-register path in vfio_realize(). Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Fixes: `a22651053b` ("vfio: Make vfio-pci device migration capable") Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>	2023-07-10 09:52:52 +02:00
Zhenzhong Duan	3c26c80a0a	vfio/migration: Change vIOMMU blocker from global to per device Contrary to multiple device blocker which needs to consider already-attached devices to unblock/block dynamically, the vIOMMU migration blocker is a device specific config. Meaning it only needs to know whether the device is bypassing or not the vIOMMU (via machine property, or per pxb-pcie::bypass_iommu), and does not need the state of currently present devices. For this reason, the vIOMMU global migration blocker can be consolidated into the per-device migration blocker, allowing us to remove some unnecessary code. This change also makes vfio_mig_active() more accurate as it doesn't check for global blocker. Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>	2023-07-10 09:52:52 +02:00
Zhenzhong Duan	adee0da036	vfio/pci: Disable INTx in vfio_realize error path When vfio realize fails, INTx isn't disabled if it has been enabled. This may confuse host side with unhandled interrupt report. Fixes: `c5478fea27` ("vfio/pci: Respond to KVM irqchip change notifier") Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>	2023-07-10 09:52:52 +02:00
Alex Williamson	0ddcb39c93	hw/vfio/pci-quirks: Sanitize capability pointer Coverity reports a tainted scalar when traversing the capabilities chain (CID 1516589). In practice I've never seen a device with a chain so broken as to cause an issue, but it's also pretty easy to sanitize. Fixes: `f6b30c1984` ("hw/vfio/pci-quirks: Support alternate offset for GPUDirect Cliques") Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>	2023-07-10 09:52:52 +02:00
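For illustration, a bounds-checked walk of a conventional PCI capability list might look like the C sketch below. It is a generic, hypothetical helper (find_cap), not the code from this commit. Pointers read from the device are masked to dword alignment and rejected if they point into the standard header, and the walk is capped so that a looping chain terminates.

```c
#include <stdint.h>

#define CFG_SIZE      256   /* conventional PCI config space */
#define CAP_LIST_PTR  0x34  /* capabilities pointer register */
#define CAP_MIN_OFF   0x40  /* capabilities live after the standard header */
#define CAP_MAX_COUNT ((CFG_SIZE - CAP_MIN_OFF) / 4)  /* 48 entries at most */

/* Return the config-space offset of capability `id`, or 0 if it is absent
 * or the chain is malformed. Every pointer read from the (untrusted) device
 * is masked and bounds-checked, and the number of steps is capped. */
static unsigned find_cap(const uint8_t cfg[CFG_SIZE], uint8_t id)
{
    unsigned pos = cfg[CAP_LIST_PTR] & ~3u;

    for (unsigned n = 0; pos && n < CAP_MAX_COUNT; n++) {
        if (pos < CAP_MIN_OFF || pos > CFG_SIZE - 2) {
            return 0;  /* pointer outside the capability area */
        }
        if (cfg[pos] == id) {
            return pos;
        }
        pos = cfg[pos + 1] & ~3u;  /* next pointer, dword aligned */
    }
    return 0;
}
```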
Zhenzhong Duan	0cc889c882	vfio/pci: Free leaked timer in vfio_realize error path When vfio_realize fails, the mmap_timer used for INTx optimization isn't freed. As this timer isn't activated yet, the potential impact is just a piece of leaked memory. Fixes: `ea486926b0` ("vfio-pci: Update slow path INTx algorithm timer related") Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>	2023-06-30 06:02:51 +02:00
Zhenzhong Duan	357bd7932a	vfio/pci: Fix a segfault in vfio_realize The kvm irqchip notifier is only registered if the device supports INTx, however it's unconditionally removed in vfio realize error path. If the assigned device does not support INTx, this will cause QEMU to crash when vfio realize fails. Change it to conditionally remove the notifier only if the notify hook is set up. Before fix: (qemu) device_add vfio-pci,host=81:11.1,id=vfio1,bus=root1,xres=1 Connection closed by foreign host. After fix: (qemu) device_add vfio-pci,host=81:11.1,id=vfio1,bus=root1,xres=1 Error: vfio 0000:81:11.1: xres and yres properties require display=on (qemu) Fixes: `c5478fea27` ("vfio/pci: Respond to KVM irqchip change notifier") Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>	2023-06-30 06:02:51 +02:00
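A minimal sketch of why the unconditional removal crashes and how checking the notify hook avoids it. It uses a hypothetical intrusive list (`notifier_add`, `notifier_remove`, `realize_error_cleanup` are illustrative names, not QEMU's API) whose removal, like a typical intrusive doubly linked list, assumes the node is linked and dereferences its back pointer. A zero-initialized notifier that was never added has a NULL back pointer, so the guard on `notify` is what keeps the error path safe.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Notifier Notifier;
struct Notifier {
    void (*notify)(Notifier *n, void *data);  /* non-NULL once registered */
    Notifier *next;
    Notifier **prev;  /* address of the pointer that points at this node */
};

static Notifier *notifier_list;  /* hypothetical global list head */

static void notifier_add(Notifier *n, void (*cb)(Notifier *, void *))
{
    n->notify = cb;
    n->next = notifier_list;
    if (n->next) {
        n->next->prev = &n->next;
    }
    n->prev = &notifier_list;
    notifier_list = n;
}

/* Assumes n is linked: dereferences n->prev, which is NULL otherwise. */
static void notifier_remove(Notifier *n)
{
    if (n->next) {
        n->next->prev = n->prev;
    }
    *n->prev = n->next;
    n->notify = NULL;
}

static void irqchip_changed(Notifier *n, void *data)
{
    (void)n;
    (void)data;
}

/* Error-path cleanup: only remove the notifier if it was actually set up. */
static void realize_error_cleanup(Notifier *irqchip_notifier)
{
    if (irqchip_notifier->notify) {
        notifier_remove(irqchip_notifier);
    }
}
```

Keying the check on the hook rather than on a separate flag means the registration itself records whether cleanup is needed.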
Avihai Horon	8bbcb64a71	vfio/migration: Make VFIO migration non-experimental The major parts of VFIO migration are supported today in QEMU. This includes basic VFIO migration, device dirty page tracking and precopy support. Thus, at this point in time, it seems appropriate to make VFIO migration non-experimental: remove the x prefix from enable_migration property, change it to ON_OFF_AUTO and let the default value be AUTO. In addition, make the following adjustments: 1. When enable_migration is ON and migration is not supported, fail VFIO device realization. 2. When enable_migration is AUTO (i.e., not explicitly enabled), require device dirty tracking support. This is because device dirty tracking is currently the only method to do dirty page tracking, which is essential for migrating in a reasonable downtime. Setting enable_migration to ON will not require device dirty tracking. 3. Make migration error and blocker messages more elaborate. 4. Remove error prints in vfio_migration_query_flags(). 5. Rename trace_vfio_migration_probe() to trace_vfio_migration_realize(). Signed-off-by: Avihai Horon <avihaih@nvidia.com> Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>	2023-06-30 06:02:51 +02:00
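The ON/OFF/AUTO rules in points 1 and 2 above can be summarized as a small decision function. This is a hedged sketch, not QEMU's `vfio_migration_realize()`: the enum and function names are hypothetical, and treating OFF as simply leaving the device non-migratable is an assumption the commit message does not spell out.

```c
#include <assert.h>
#include <stdbool.h>

/* Shaped like QEMU's OnOffAuto, but local to this sketch. */
typedef enum { MIG_OFF, MIG_ON, MIG_AUTO } MigSetting;

typedef enum {
    MIG_ENABLED,       /* device can be migrated */
    MIG_BLOCKED,       /* realize succeeds, a migration blocker is added */
    MIG_REALIZE_FAIL,  /* user asked for ON but migration is unsupported */
} MigOutcome;

static MigOutcome decide_migration(MigSetting s, bool mig_supported,
                                   bool dirty_tracking)
{
    if (s == MIG_OFF) {
        return MIG_BLOCKED;  /* assumption: OFF just makes the device non-migratable */
    }
    if (!mig_supported) {
        /* Point 1: explicit ON fails realize; AUTO degrades to a blocker. */
        return s == MIG_ON ? MIG_REALIZE_FAIL : MIG_BLOCKED;
    }
    /* Point 2: AUTO requires device dirty tracking, ON does not. */
    if (s == MIG_AUTO && !dirty_tracking) {
        return MIG_BLOCKED;
    }
    return MIG_ENABLED;
}
```

The asymmetry is deliberate in the commit: an explicit ON is a user request that should fail loudly, while AUTO only enables migration when downtime is likely to be reasonable.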
Avihai Horon	808642a2f6	vfio/migration: Reset bytes_transferred properly Currently, VFIO bytes_transferred is not reset properly: 1. bytes_transferred is not reset after a VM snapshot (so a migration following a snapshot will report incorrect value). 2. bytes_transferred is a single counter for all VFIO devices, however upon migration failure it is reset multiple times, by each VFIO device. Fix it by introducing a new function vfio_reset_bytes_transferred() and calling it during migration and snapshot start. Remove existing bytes_transferred reset in VFIO migration state notifier, which is not needed anymore. Fixes: `3710586caa` ("qapi: Add VFIO devices migration stats in Migration stats") Signed-off-by: Avihai Horon <avihaih@nvidia.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>	2023-06-30 06:02:51 +02:00
Shameer Kolothum	c174088923	vfio/pci: Call vfio_prepare_kvm_msi_virq_batch() in MSI retry path When vfio_enable_vectors() returns with less than requested nr_vectors we retry with what kernel reported back. But the retry path doesn't call vfio_prepare_kvm_msi_virq_batch() and this results in, qemu-system-aarch64: vfio: Error: Failed to enable 4 MSI vectors, retry with 1 qemu-system-aarch64: ../hw/vfio/pci.c:602: vfio_commit_kvm_msi_virq_batch: Assertion `vdev->defer_kvm_irq_routing' failed Fixes: `dc580d51f7` ("vfio: defer to commit kvm irq routing when enable msi/msix") Reviewed-by: Longpeng <longpeng2@huawei.com> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>	2023-06-30 06:02:51 +02:00
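The failure above comes from a prepare/commit pairing that the retry path skipped. A minimal sketch of the invariant, assuming a hypothetical device struct and a fake backend that grants fewer vectors than requested (none of these names are QEMU's): the prepare step has to run on every attempt, including retries, or the commit's assertion fires.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct {
    bool defer_irq_routing;  /* set by prepare, consumed by commit */
    int commits;
} FakeVdev;

static void prepare_irq_batch(FakeVdev *v)
{
    v->defer_irq_routing = true;
}

static void commit_irq_batch(FakeVdev *v)
{
    assert(v->defer_irq_routing);  /* same invariant as the reported assertion */
    v->defer_irq_routing = false;
    v->commits++;
}

/* Hypothetical backend: 0 on success, a smaller vector count to retry
 * with if too many were requested, or -1 if none are available. */
static int fake_enable_vectors(int requested, int available)
{
    if (requested <= available) {
        return 0;
    }
    return available > 0 ? available : -1;
}

static int enable_msi(FakeVdev *v, int nr_vectors, int available)
{
    for (;;) {
        prepare_irq_batch(v);  /* must run on the retry path too */
        /* ... per-vector KVM routes would be added to the batch here ... */
        commit_irq_batch(v);
        int ret = fake_enable_vectors(nr_vectors, available);
        if (ret <= 0) {
            return ret < 0 ? ret : nr_vectors;
        }
        nr_vectors = ret;  /* retry with what the backend reported */
    }
}
```

Putting the prepare call at the top of the loop body makes it impossible for a retry to jump past it.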
Alex Williamson	f6b30c1984	hw/vfio/pci-quirks: Support alternate offset for GPUDirect Cliques NVIDIA Turing and newer GPUs implement the MSI-X capability at the offset previously reserved for use by hypervisors to implement the GPUDirect Cliques capability. A revised specification provides an alternate location. Add a config space walk to the quirk to check for conflicts, allowing us to fall back to the new location or generate an error at the quirk setup rather than when the real conflicting capability is added should there be no available location. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>	2023-06-30 06:02:51 +02:00
Alex Williamson	634f38f0f7	vfio: Implement a common device info helper A common helper implementing the realloc algorithm for handling capabilities. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Robin Voetter <robin@streamhpc.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>	2023-06-30 06:02:51 +02:00
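The realloc algorithm mentioned here is the usual VFIO pattern: call with a buffer of `argsz` bytes, and if the kernel reports a larger `argsz` (because capabilities did not fit), grow the buffer and ask again. The sketch below assumes a simplified `struct dev_info` and a fake `fake_get_info()` standing in for the `VFIO_DEVICE_GET_INFO` ioctl, so it runs without a device; neither is QEMU's helper.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for struct vfio_device_info. */
struct dev_info {
    uint32_t argsz;       /* in: buffer size; out: size actually needed */
    uint32_t flags;
    uint32_t cap_offset;  /* 0 if capabilities did not fit */
};

/* Hypothetical backend: pretends the capability chain needs 64 bytes. */
static int fake_get_info(struct dev_info *info)
{
    const uint32_t needed = sizeof(*info) + 64;
    uint32_t given = info->argsz;

    info->argsz = needed;
    if (given < needed) {
        info->cap_offset = 0;  /* too small: report required size only */
        return 0;
    }
    info->cap_offset = sizeof(*info);
    memset((uint8_t *)info + sizeof(*info), 0xAB, 64);  /* fake caps */
    return 0;
}

/* Caller frees the result. Returns NULL on allocation or query failure. */
static struct dev_info *get_device_info(void)
{
    uint32_t argsz = sizeof(struct dev_info);
    struct dev_info *info = calloc(1, argsz);

    for (;;) {
        if (!info) {
            return NULL;
        }
        info->argsz = argsz;
        if (fake_get_info(info) != 0) {
            free(info);
            return NULL;
        }
        if (info->argsz <= argsz) {
            return info;  /* everything, capabilities included, fit */
        }
        argsz = info->argsz;  /* grow to the reported size and retry */
        struct dev_info *bigger = realloc(info, argsz);
        if (!bigger) {
            free(info);
            return NULL;
        }
        info = bigger;
    }
}
```

Centralizing the loop in one helper means each caller no longer repeats the size check and retry.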

577 Commits