mirrors/qemu - qemu - SynapseOS git

Author	SHA1	Message	Date
Zhenzhong Duan	2b43b2995b	vfio/migration: Free resources when vfio_migration_realize fails When vfio_realize() succeeds, hot unplug will call vfio_exitfn() to free resources allocated in vfio_realize(); when vfio_realize() fails, vfio_exitfn() is never called and we need to free resources in vfio_realize(). In the case that vfio_migration_realize() fails, e.g: with -only-migratable & enable-migration=off, we see below: (qemu) device_add vfio-pci,host=81:11.1,id=vfio1,bus=root1,enable-migration=off 0000:81:11.1: Migration disabled Error: disallowing migration blocker (--only-migratable) for: 0000:81:11.1: Migration is disabled for VFIO device If we hotplug again we should see same log as above, but we see: (qemu) device_add vfio-pci,host=81:11.1,id=vfio1,bus=root1,enable-migration=off Error: vfio 0000:81:11.1: device is already attached That's because some references to VFIO device isn't released. For resources allocated in vfio_migration_realize(), free them by jumping to out_deinit path with calling a new function vfio_migration_deinit(). For resources allocated in vfio_realize(), free them by jumping to de-register path in vfio_realize(). Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Fixes: `a22651053b` ("vfio: Make vfio-pci device migration capable") Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>	2023-07-10 09:52:52 +02:00
Zhenzhong Duan	3c26c80a0a	vfio/migration: Change vIOMMU blocker from global to per device Contrary to multiple device blocker which needs to consider already-attached devices to unblock/block dynamically, the vIOMMU migration blocker is a device specific config. Meaning it only needs to know whether the device is bypassing or not the vIOMMU (via machine property, or per pxb-pcie::bypass_iommu), and does not need the state of currently present devices. For this reason, the vIOMMU global migration blocker can be consolidated into the per-device migration blocker, allowing us to remove some unnecessary code. This change also makes vfio_mig_active() more accurate as it doesn't check for global blocker. Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>	2023-07-10 09:52:52 +02:00
Zhenzhong Duan	adee0da036	vfio/pci: Disable INTx in vfio_realize error path When vfio realize fails, INTx isn't disabled if it has been enabled. This may confuse host side with unhandled interrupt report. Fixes: `c5478fea27` ("vfio/pci: Respond to KVM irqchip change notifier") Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>	2023-07-10 09:52:52 +02:00
Zhenzhong Duan	0cc889c882	vfio/pci: Free leaked timer in vfio_realize error path When vfio_realize fails, the mmap_timer used for INTx optimization isn't freed. As this timer isn't activated yet, the potential impact is just a piece of leaked memory. Fixes: `ea486926b0` ("vfio-pci: Update slow path INTx algorithm timer related") Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>	2023-06-30 06:02:51 +02:00
Zhenzhong Duan	357bd7932a	vfio/pci: Fix a segfault in vfio_realize The kvm irqchip notifier is only registered if the device supports INTx, however it's unconditionally removed in vfio realize error path. If the assigned device does not support INTx, this will cause QEMU to crash when vfio realize fails. Change it to conditionally remove the notifier only if the notify hook is setup. Before fix: (qemu) device_add vfio-pci,host=81:11.1,id=vfio1,bus=root1,xres=1 Connection closed by foreign host. After fix: (qemu) device_add vfio-pci,host=81:11.1,id=vfio1,bus=root1,xres=1 Error: vfio 0000:81:11.1: xres and yres properties require display=on (qemu) Fixes: `c5478fea27` ("vfio/pci: Respond to KVM irqchip change notifier") Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>	2023-06-30 06:02:51 +02:00
Avihai Horon	8bbcb64a71	vfio/migration: Make VFIO migration non-experimental The major parts of VFIO migration are supported today in QEMU. This includes basic VFIO migration, device dirty page tracking and precopy support. Thus, at this point in time, it seems appropriate to make VFIO migration non-experimental: remove the x prefix from enable_migration property, change it to ON_OFF_AUTO and let the default value be AUTO. In addition, make the following adjustments: 1. When enable_migration is ON and migration is not supported, fail VFIO device realization. 2. When enable_migration is AUTO (i.e., not explicitly enabled), require device dirty tracking support. This is because device dirty tracking is currently the only method to do dirty page tracking, which is essential for migrating in a reasonable downtime. Setting enable_migration to ON will not require device dirty tracking. 3. Make migration error and blocker messages more elaborate. 4. Remove error prints in vfio_migration_query_flags(). 5. Rename trace_vfio_migration_probe() to trace_vfio_migration_realize(). Signed-off-by: Avihai Horon <avihaih@nvidia.com> Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>	2023-06-30 06:02:51 +02:00
Shameer Kolothum	c174088923	vfio/pci: Call vfio_prepare_kvm_msi_virq_batch() in MSI retry path When vfio_enable_vectors() returns with less than requested nr_vectors we retry with what kernel reported back. But the retry path doesn't call vfio_prepare_kvm_msi_virq_batch() and this results in, qemu-system-aarch64: vfio: Error: Failed to enable 4 MSI vectors, retry with 1 qemu-system-aarch64: ../hw/vfio/pci.c:602: vfio_commit_kvm_msi_virq_batch: Assertion `vdev->defer_kvm_irq_routing' failed Fixes: `dc580d51f7` ("vfio: defer to commit kvm irq routing when enable msi/msix") Reviewed-by: Longpeng <longpeng2@huawei.com> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>	2023-06-30 06:02:51 +02:00
Zhenzhong Duan	b83b40b614	vfio/pci: Fix a use-after-free issue vbasedev->name is freed wrongly which leads to garbage VFIO trace log. Fix it by allocating a dup of vbasedev->name and then free the dup. Fixes: `2dca1b37a7` ("vfio/pci: add support for VF token") Suggested-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Cédric Le Goater <clg@redhat.com>	2023-05-24 09:21:22 +02:00
Alex Williamson	b5048a4cbf	vfio/pci: Static Resizable BAR capability The PCI Resizable BAR (ReBAR) capability is currently hidden from the VM because the protocol for interacting with the capability does not support a mechanism for the device to reject an advertised supported BAR size. However, when assigned to a VM, the act of resizing the BAR requires adjustment of host resources for the device, which absolutely can fail. Linux does not currently allow us to reserve resources for the device independent of the current usage. The only writable field within the ReBAR capability is the BAR Size register. The PCIe spec indicates that when written, the device should immediately begin to operate with the provided BAR size. The spec however also notes that software must only write values corresponding to supported sizes as indicated in the capability and control registers. Writing unsupported sizes produces undefined results. Therefore, if the hypervisor were to virtualize the capability and control registers such that the current size is the only indicated available size, then a write of anything other than the current size falls into the category of undefined behavior, where we can essentially expose the modified ReBAR capability as read-only. This may seem pointless, but users have reported that virtualizing the capability in this way not only allows guest software to expose related features as available (even if only cosmetic), but in some scenarios can resolve guest driver issues. Additionally, no regressions in behavior have been reported for this change. A caveat here is that the PCIe spec requires for compatibility that devices report support for a size in the range of 1MB to 512GB, therefore if the current BAR size falls outside that range we revert to hiding the capability. Reviewed-by: Cédric Le Goater <clg@redhat.com> Link: https://lore.kernel.org/r/20230505232308.2869912-1-alex.williamson@redhat.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2023-05-09 09:30:13 -06:00
Minwoo Im	2dca1b37a7	vfio/pci: add support for VF token VF token was introduced [1] to kernel vfio-pci along with SR-IOV support [2]. This patch adds support VF token among PF and VF(s). To passthu PCIe VF to a VM, kernel >= v5.7 needs this. It can be configured with UUID like: -device vfio-pci,host=DDDD:BB:DD:F,vf-token=<uuid>,... [1] https://lore.kernel.org/linux-pci/158396393244.5601.10297430724964025753.stgit@gimli.home/ [2] https://lore.kernel.org/linux-pci/158396044753.5601.14804870681174789709.stgit@gimli.home/ Cc: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Minwoo Im <minwoo.im@samsung.com> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Link: https://lore.kernel.org/r/20230320073522epcms2p48f682ecdb73e0ae1a4850ad0712fd780@epcms2p4 Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2023-05-09 09:30:13 -06:00
Alex Williamson	8249cffc62	vfio/migration: Rename entry points Pick names that align with the section drivers should use them from, avoiding the confusion of calling a _finalize() function from _exit() and generalizing the actual _finalize() to handle removing the viommu blocker. Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Link: https://lore.kernel.org/r/167820912978.606734.12740287349119694623.stgit@omen Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2023-03-07 11:19:07 -07:00
Joao Martins	e46883204c	vfio/migration: Block migration with vIOMMU Migrating with vIOMMU will require either tracking maximum IOMMU supported address space (e.g. 39/48 address width on Intel) or range-track current mappings and dirty track the new ones post starting dirty tracking. This will be done as a separate series, so add a live migration blocker until that is fixed. Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Link: https://lore.kernel.org/r/20230307125450.62409-14-joao.m.martins@oracle.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2023-03-07 10:21:22 -07:00
Eric Auger	0d570a2572	vfio/pci: Use vbasedev local variable in vfio_realize() Using a VFIODevice handle local variable to improve the code readability. no functional change intended Signed-off-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Link: https://lore.kernel.org/r/20220502094223.36384-3-yi.l.liu@intel.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-05-06 09:06:50 -06:00
Eric Auger	9d38ffc5d8	hw/vfio/pci: fix vfio_pci_hot_reset_result trace point "%m" format specifier is not interpreted by the trace infrastructure and thus "%m" is output instead of the actual errno string. Fix it by outputting strerror(errno). Signed-off-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Link: https://lore.kernel.org/r/20220502094223.36384-2-yi.l.liu@intel.com [aw: replace commit log as provided by Eric] Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-05-06 09:06:50 -06:00
Longpeng(Mike)	dc580d51f7	vfio: defer to commit kvm irq routing when enable msi/msix In migration resume phase, all unmasked msix vectors need to be setup when loading the VF state. However, the setup operation would take longer if the VM has more VFs and each VF has more unmasked vectors. The hot spot is kvm_irqchip_commit_routes, it'll scan and update all irqfds that are already assigned each invocation, so more vectors means need more time to process them. vfio_pci_load_config vfio_msix_enable msix_set_vector_notifiers for (vector = 0; vector < dev->msix_entries_nr; vector++) { vfio_msix_vector_do_use vfio_add_kvm_msi_virq kvm_irqchip_commit_routes <-- expensive } We can reduce the cost by only committing once outside the loop. The routes are cached in kvm_state, we commit them first and then bind irqfd for each vector. The test VM has 128 vcpus and 8 VF (each one has 65 vectors), we measure the cost of the vfio_msix_enable for each VF, and we can see 90+% costs can be reduce. VF Count of irqfds[] Original With this patch 1st 65 8 2 2nd 130 15 2 3rd 195 22 2 4th 260 24 3 5th 325 36 2 6th 390 44 3 7th 455 51 3 8th 520 58 4 Total 258ms 21ms [] Count of irqfds How many irqfds that already assigned and need to process in this round. The optimization can be applied to msi type too. Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com> Link: https://lore.kernel.org/r/20220326060226.1892-6-longpeng2@huawei.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-05-06 09:06:50 -06:00
Longpeng(Mike)	75d546fc18	Revert "vfio: Avoid disabling and enabling vectors repeatedly in VFIO migration" Commit `ecebe53fe9` ("vfio: Avoid disabling and enabling vectors repeatedly in VFIO migration") avoids inefficiently disabling and enabling vectors repeatedly and lets the unmasked vectors be enabled one by one. But we want to batch multiple routes and defer the commit, and only commit once outside the loop of setting vector notifiers, so we cannot enable the vectors one by one in the loop now. Revert that commit and we will take another way in the next patch, it can not only avoid disabling/enabling vectors repeatedly, but also satisfy our requirement of defer to commit. Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com> Link: https://lore.kernel.org/r/20220326060226.1892-5-longpeng2@huawei.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-05-06 09:06:50 -06:00
Longpeng(Mike)	8ab217d5d3	vfio: simplify the failure path in vfio_msi_enable Use vfio_msi_disable_common to simplify the error handling in vfio_msi_enable. Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com> Link: https://lore.kernel.org/r/20220326060226.1892-4-longpeng2@huawei.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-05-06 09:06:50 -06:00
Longpeng(Mike)	be4a46eccf	vfio: move re-enabling INTX out of the common helper Move re-enabling INTX out, and the callers should decide to re-enable it or not. Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com> Link: https://lore.kernel.org/r/20220326060226.1892-3-longpeng2@huawei.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-05-06 09:06:50 -06:00
Longpeng(Mike)	a6f5770fb2	vfio: simplify the conditional statements in vfio_msi_enable It's unnecessary to test against the specific return value of VFIO_DEVICE_SET_IRQS, since any positive return is an error indicating the number of vectors we should retry with. Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com> Link: https://lore.kernel.org/r/20220326060226.1892-2-longpeng2@huawei.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-05-06 09:06:50 -06:00
Marc-André Lureau	8e3b0cbb72	Replace qemu_real_host_page variables with inlined functions Replace the global variables with inlined helper functions. getpagesize() is very likely annotated with a "const" function attribute (at least with glibc), and thus optimization should apply even better. This avoids the need for a constructor initialization too. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20220323155743.1585078-12-marcandre.lureau@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-04-06 10:50:38 +02:00
Markus Armbruster	b21e238037	Use g_new() & friends where that makes obvious sense g_new(T, n) is neater than g_malloc(sizeof(T) * n). It's also safer, for two reasons. One, it catches multiplication overflowing size_t. Two, it returns T * rather than void *, which lets the compiler catch more type errors. This commit only touches allocations with size arguments of the form sizeof(T). Patch created mechanically with: $ spatch --in-place --sp-file scripts/coccinelle/use-g_new-etc.cocci \ --macro-file scripts/cocci-macro-file.h FILES... Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20220315144156.1595462-4-armbru@redhat.com> Reviewed-by: Pavel Dovgalyuk <Pavel.Dovgalyuk@ispras.ru>	2022-03-21 15:44:44 +01:00
Longpeng(Mike)	def4c5570c	kvm/msi: do explicit commit when adding msi routes We invoke the kvm_irqchip_commit_routes() for each addition to MSI route table, which is not efficient if we are adding lots of routes in some cases. This patch lets callers invoke the kvm_irqchip_commit_routes(), so the callers can decide how to optimize. [1] https://lists.gnu.org/archive/html/qemu-devel/2021-11/msg00967.html Signed-off-by: Longpeng <longpeng2@huawei.com> Message-Id: <20220222141116.2091-3-longpeng2@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-03-15 11:26:20 +01:00
Kunkun Jiang	f36d4fb85f	vfio/pci: Add support for mmapping sub-page MMIO BARs after live migration We can expand MemoryRegions of sub-page MMIO BARs in vfio_pci_write_config() to improve IO performance for some devices. However, the MemoryRegions of destination VM are not expanded any more after live migration. Because their addresses have been updated in vmstate_load_state() (vfio_pci_load_config) and vfio_sub_page_bar_update_mapping() will not be called. This may result in poor performance after live migration. So iterate BARs in vfio_pci_load_config() and try to update sub-page BARs. Reported-by: Nianyao Tang <tangnianyao@huawei.com> Reported-by: Qixin Gan <ganqixin@huawei.com> Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com> Link: https://lore.kernel.org/r/20211027090406.761-2-jiangkunkun@huawei.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2021-11-01 12:17:51 -06:00
Kevin Wolf	f3558b1b76	qdev: Base object creation on QDict rather than QemuOpts QDicts are both what QMP natively uses and what the keyval parser produces. Going through QemuOpts isn't useful for either one, so switch the main device creation function to QDicts. By sharing more code with the -object/object-add code path, we can even reduce the code size a bit. This commit doesn't remove the detour through QemuOpts from any code path yet, but it allows the following commits to do so. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20211008133442.141332-15-kwolf@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Tested-by: Peter Krempa <pkrempa@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-10-15 16:11:22 +02:00
Cai Huoqing	631ba5a128	hw/vfio: Fix typo in comments Fix typo in comments: programatically ==> programmatically disconecting ==> disconnecting mulitple ==> multiple timout ==> timeout regsiter ==> register forumula ==> formula Signed-off-by: Cai Huoqing <caihuoqing@baidu.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20210730012613.2198-1-caihuoqing@baidu.com> Signed-off-by: Laurent Vivier <laurent@vivier.eu>	2021-09-16 11:57:01 +02:00
Cai Huoqing	1bd9f1b14d	vfio/pci: Add pba_offset PCI quirk for BAIDU KUNLUN AI processor Fix pba_offset initialization value for BAIDU KUNLUN Virtual Function device. The KUNLUN hardware returns an incorrect value for the VF PBA offset, and add a quirk to instead return a hardcoded value of 0xb400. Signed-off-by: Cai Huoqing <caihuoqing@baidu.com> Link: https://lore.kernel.org/r/20210713093743.942-1-caihuoqing@baidu.com [aw: comment & whitespace tuning] Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2021-07-14 13:47:17 -06:00
Cai Huoqing	936555bc4f	vfio/pci: Change to use vfio_pci_is() Make use of vfio_pci_is() helper function. Signed-off-by: Cai Huoqing <caihuoqing@baidu.com> Link: https://lore.kernel.org/r/20210713014831.742-1-caihuoqing@baidu.com [aw: commit log wording] Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2021-07-14 13:47:17 -06:00
Thomas Huth	4c386f8064	Do not include sysemu/sysemu.h if it's not really necessary Stop including sysemu/sysemu.h in files that don't need it. Signed-off-by: Thomas Huth <thuth@redhat.com> Message-Id: <20210416171314.2074665-2-thuth@redhat.com> Signed-off-by: Laurent Vivier <laurent@vivier.eu>	2021-05-02 17:24:50 +02:00
Shenming Lu	ecebe53fe9	vfio: Avoid disabling and enabling vectors repeatedly in VFIO migration In VFIO migration resume phase and some guest startups, there are already unmasked vectors in the vector table when calling vfio_msix_enable(). So in order to avoid inefficiently disabling and enabling vectors repeatedly, let's allocate all needed vectors first and then enable these unmasked vectors one by one without disabling. Signed-off-by: Shenming Lu <lushenming@huawei.com> Message-Id: <20210310030233.1133-4-lushenming@huawei.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2021-03-16 10:06:44 -06:00
Philippe Mathieu-Daudé	4eda914cac	hw/vfio/pci-quirks: Replace the word 'blacklist' Follow the inclusive terminology from the "Conscious Language in your Open Source Projects" guidelines [] and replace the word "blacklist" appropriately. [] https://github.com/conscious-lang/conscious-lang-docs/blob/main/faq.md Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20210205171817.2108907-9-philmd@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2021-03-16 10:06:44 -06:00
Eduardo Habkost	ce35e2295e	qdev: Move softmmu properties to qdev-properties-system.h Move the property types and property macros implemented in qdev-properties-system.c to a new qdev-properties-system.h header. Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20201211220529.2290218-16-ehabkost@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2020-12-18 15:20:17 -05:00
Kirti Wankhede	bb0990d174	vfio: Change default dirty pages tracking behavior during migration By default dirty pages tracking is enabled during iterative phase (pre-copy phase). Added per device opt-out option 'x-pre-copy-dirty-page-tracking' to disable dirty pages tracking during iterative phase. If the option 'x-pre-copy-dirty-page-tracking=off' is set for any VFIO device, dirty pages tracking during iterative phase will be disabled. Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2020-11-23 10:05:58 -07:00
Alex Williamson	cf254988a5	vfio: Make migration support experimental Support for migration of vfio devices is still in flux. Developers are attempting to add support for new devices and new architectures, but none are yet readily available for validation. We have concerns whether we're transferring device resources at the right point in the migration, whether we're guaranteeing that updates during pre-copy are migrated, and whether we can provide bit-stream compatibility should any of this change. Even the question of whether devices should participate in dirty page tracking during pre-copy seems contentious. In short, migration support has not had enough soak time and it feels premature to mark it as supported. Create an experimental option such that we can continue to develop. [Retaining previous acks/reviews for a previously identical code change with different specifics in the commit log.] Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Acked-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2020-11-23 08:29:29 -07:00
Kirti Wankhede	a22651053b	vfio: Make vfio-pci device migration capable If the device is not a failover primary device, call vfio_migration_probe() and vfio_migration_finalize() to enable migration support for those devices that support it respectively to tear it down again. Removed migration blocker from VFIO PCI device specific structure and use migration blocker from generic structure of VFIO device. Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Reviewed-by: Neo Jia <cjia@nvidia.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2020-11-01 12:30:51 -07:00
Kirti Wankhede	c5e2fb3ce4	vfio: Add save and load functions for VFIO PCI devices Added functions to save and restore PCI device specific data, specifically config space of PCI device. Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Reviewed-by: Neo Jia <cjia@nvidia.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2020-11-01 12:30:50 -07:00
Kirti Wankhede	e93b733bcf	vfio: Add vfio_get_object callback to VFIODeviceOps Hook vfio_get_object callback for PCI devices. Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Reviewed-by: Neo Jia <cjia@nvidia.com> Suggested-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2020-11-01 12:30:50 -07:00
Eduardo Habkost	01b4606440	vfio: Rename PCI_VFIO to VFIO_PCI Make the type checking macro name consistent with the TYPE_* constant. Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Message-Id: <20200902224311.1321159-56-ehabkost@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2020-09-09 13:20:22 -04:00
Eduardo Habkost	42db0fb5e0	vfio/pci: Move QOM macros to header This will make future conversion to OBJECT_DECLARE* easier. Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Tested-By: Roman Bolshakov <r.bolshakov@yadro.com> Message-Id: <20200825192110.3528606-43-ehabkost@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2020-08-27 14:04:55 -04:00
Markus Armbruster	af175e85f9	error: Eliminate error_propagate() with Coccinelle, part 2 When all we do with an Error we receive into a local variable is propagating to somewhere else, we can just as well receive it there right away. The previous commit did that with a Coccinelle script I consider fairly trustworthy. This commit uses the same script with the matching of return taken out, i.e. we convert if (!foo(..., &err)) { ... error_propagate(errp, err); ... } to if (!foo(..., errp)) { ... ... } This is unsound: @err could still be read between afterwards. I don't know how to express "no read of @err without an intervening write" in Coccinelle. Instead, I manually double-checked for uses of @err. Suboptimal line breaks tweaked manually. qdev_realize() simplified further to placate scripts/checkpatch.pl. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200707160613.848843-36-armbru@redhat.com>	2020-07-10 15:18:08 +02:00
Markus Armbruster	668f62ec62	error: Eliminate error_propagate() with Coccinelle, part 1 When all we do with an Error we receive into a local variable is propagating to somewhere else, we can just as well receive it there right away. Convert if (!foo(..., &err)) { ... error_propagate(errp, err); ... return ... } to if (!foo(..., errp)) { ... ... return ... } where nothing else needs @err. Coccinelle script: @rule1 forall@ identifier fun, err, errp, lbl; expression list args, args2; binary operator op; constant c1, c2; symbol false; @@ if ( ( - fun(args, &err, args2) + fun(args, errp, args2) \| - !fun(args, &err, args2) + !fun(args, errp, args2) \| - fun(args, &err, args2) op c1 + fun(args, errp, args2) op c1 ) ) { ... when != err when != lbl: when strict - error_propagate(errp, err); ... when != err ( return; \| return c2; \| return false; ) } @rule2 forall@ identifier fun, err, errp, lbl; expression list args, args2; expression var; binary operator op; constant c1, c2; symbol false; @@ - var = fun(args, &err, args2); + var = fun(args, errp, args2); ... when != err if ( ( var \| !var \| var op c1 ) ) { ... when != err when != lbl: when strict - error_propagate(errp, err); ... when != err ( return; \| return c2; \| return false; \| return var; ) } @depends on rule1 \|\| rule2@ identifier err; @@ - Error *err = NULL; ... when != err Not exactly elegant, I'm afraid. The "when != lbl:" is necessary to avoid transforming if (fun(args, &err)) { goto out } ... out: error_propagate(errp, err); even though other paths to label out still need the error_propagate(). For an actual example, see sclp_realize(). Without the "when strict", Coccinelle transforms vfio_msix_setup(), incorrectly. I don't know what exactly "when strict" does, only that it helps here. The match of return is narrower than what I want, but I can't figure out how to express "return where the operand doesn't use @err". For an example where it's too narrow, see vfio_intx_enable(). Silently fails to convert hw/arm/armsse.c, because Coccinelle gets confused by ARMSSE being used both as typedef and function-like macro there. Converted manually. Line breaks tidied up manually. One nested declaration of @local_err deleted manually. Preexisting unwanted blank line dropped in hw/riscv/sifive_e.c. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200707160613.848843-35-armbru@redhat.com>	2020-07-10 15:18:08 +02:00
David Hildenbrand	aff92b8286	vfio: Convert to ram_block_discard_disable() VFIO is (except devices without a physical IOMMU or some mediated devices) incompatible with discarding of RAM. The kernel will pin basically all VM memory. Let's convert to ram_block_discard_disable(), which can now fail, in contrast to qemu_balloon_inhibit(). Leave "x-balloon-allowed" named as it is for now. Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com> Acked-by: Cornelia Huck <cohuck@redhat.com> Cc: Cornelia Huck <cohuck@redhat.com> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Tony Krowiak <akrowiak@linux.ibm.com> Cc: Halil Pasic <pasic@linux.ibm.com> Cc: Pierre Morel <pmorel@linux.ibm.com> Cc: Eric Farman <farman@linux.ibm.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20200626072248.78761-4-david@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2020-07-02 05:54:59 -04:00
Peter Xu	97a3757616	vfio/pci: Use kvm_irqchip_add_irqfd_notifier_gsi() for irqfds VFIO is currently the only one left that is not using the generic function (kvm_irqchip_add_irqfd_notifier_gsi()) to register irqfds. Let VFIO use the common framework too. Follow up patches will introduce extra features for kvm irqfd, so that VFIO can easily leverage that after the switch. Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20200318145204.74483-3-peterx@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-06-10 12:10:28 -04:00
Markus Armbruster	40c2281cc3	Drop more @errp parameters after previous commit Several functions can't fail anymore: ich9_pm_add_properties(), device_add_bootindex_property(), ppc_compat_add_property(), spapr_caps_add_properties(), PropertyInfo.create(). Drop their @errp parameter. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20200505152926.18877-16-armbru@redhat.com>	2020-05-15 07:08:14 +02:00
Marc-André Lureau	4f67d30b5e	qdev: set properties with device_class_set_props() The following patch will need to handle properties registration during class_init time. Let's use a device_class_set_props() setter. spatch --macro-file scripts/cocci-macro-file.h --sp-file ./scripts/coccinelle/qdev-set-props.cocci --keep-comments --in-place --dir . @@ typedef DeviceClass; DeviceClass *d; expression val; @@ - d->props = val + device_class_set_props(d, val) Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20200110153039.1379601-20-marcandre.lureau@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-01-24 20:59:15 +01:00
Peter Xu	0446f81217	vfio/pci: Don't remove irqchip notifier if not registered The kvm irqchip notifier is only registered if the device supports INTx, however it's unconditionally removed. If the assigned device does not support INTx, this will cause QEMU to crash when unplugging the device from the system. Change it to conditionally remove the notifier only if the notify hook is setup. CC: Eduardo Habkost <ehabkost@redhat.com> CC: David Gibson <david@gibson.dropbear.id.au> CC: Alex Williamson <alex.williamson@redhat.com> Cc: qemu-stable@nongnu.org # v4.2 Reported-by: yanghliu@redhat.com Debugged-by: Eduardo Habkost <ehabkost@redhat.com> Fixes: `c5478fea27` ("vfio/pci: Respond to KVM irqchip change notifier") Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1782678 Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2020-01-06 14:19:42 -07:00
David Gibson	c5478fea27	vfio/pci: Respond to KVM irqchip change notifier VFIO PCI devices already respond to the pci intx routing notifier, in order to update kernel irqchip mappings when routing is updated. However this won't handle the case where the irqchip itself is replaced by a different model while retaining the same routing. This case can happen on the pseries machine type due to PAPR feature negotiation. To handle that case, add a handler for the irqchip change notifier, which does much the same thing as the routing notifier, but is unconditional, rather than being a no-op when the routing hasn't changed. Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Tested-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org> Acked-by: Alex Williamson <alex.williamson@redhat.com>	2019-11-26 10:11:30 +11:00
David Gibson	ad54dbd89d	vfio/pci: Split vfio_intx_update() This splits the vfio_intx_update() function into one part doing the actual reconnection with the KVM irqchip (vfio_intx_update(), now taking an argument with the new routing) and vfio_intx_routing_notifier() which handles calls to the pci device intx routing notifier and calling vfio_intx_update() when necessary. This will make adding support for the irqchip change notifier easier. Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Tested-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org> Acked-by: Alex Williamson <alex.williamson@redhat.com>	2019-11-26 10:11:30 +11:00
Jens Freimann	ed92369a57	vfio: don't ignore return value of migrate_add_blocker When an error occurs in migrate_add_blocker() it sets a negative return value and uses error pointer we pass in. Instead of just looking at the error pointer check for a negative return value and avoid a coverity error because the return value is set but never used. This fixes CID 1407219. Reported-by: Coverity (CID 1407219) Fixes: `f045a0104c` ("vfio: unplug failover primary device before migration") Signed-off-by: Jens Freimann <jfreimann@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2019-11-18 10:41:48 -07:00
Michal Privoznik	1335d64323	hw/vfio/pci: Fix double free of migration_blocker When user tries to hotplug a VFIO device, but the operation fails somewhere in the middle (in my testing it failed because of RLIMIT_MEMLOCK forbidding more memory allocation), then a double free occurs. In vfio_realize() the vdev->migration_blocker is allocated, then something goes wrong which causes control to jump onto 'error' label where the error is freed. But the pointer is left pointing to invalid memory. Later, when vfio_instance_finalize() is called, the memory is freed again. In my testing the second hunk was sufficient to fix the bug, but I figured the first hunk doesn't hurt either. ==169952== Invalid read of size 8 ==169952== at 0xA47DCD: error_free (error.c:266) ==169952== by 0x4E0A18: vfio_instance_finalize (pci.c:3040) ==169952== by 0x8DF74C: object_deinit (object.c:606) ==169952== by 0x8DF7BE: object_finalize (object.c:620) ==169952== by 0x8E0757: object_unref (object.c:1074) ==169952== by 0x45079C: memory_region_unref (memory.c:1779) ==169952== by 0x45376B: do_address_space_destroy (memory.c:2793) ==169952== by 0xA5C600: call_rcu_thread (rcu.c:283) ==169952== by 0xA427CB: qemu_thread_start (qemu-thread-posix.c:519) ==169952== by 0x80A8457: start_thread (in /lib64/libpthread-2.29.so) ==169952== by 0x81C96EE: clone (in /lib64/libc-2.29.so) ==169952== Address 0x143137e0 is 0 bytes inside a block of size 48 free'd ==169952== at 0x4A342BB: free (vg_replace_malloc.c:530) ==169952== by 0xA47E05: error_free (error.c:270) ==169952== by 0x4E0945: vfio_realize (pci.c:3025) ==169952== by 0x76A4FF: pci_qdev_realize (pci.c:2099) ==169952== by 0x689B9A: device_set_realized (qdev.c:876) ==169952== by 0x8E2C80: property_set_bool (object.c:2080) ==169952== by 0x8E0EF6: object_property_set (object.c:1272) ==169952== by 0x8E3FC8: object_property_set_qobject (qom-qobject.c:26) ==169952== by 0x8E11DB: object_property_set_bool (object.c:1338) ==169952== by 0x5E7BDD: qdev_device_add (qdev-monitor.c:673) ==169952== by 0x5E81E5: qmp_device_add (qdev-monitor.c:798) ==169952== by 0x9E18A8: do_qmp_dispatch (qmp-dispatch.c:132) ==169952== Block was alloc'd at ==169952== at 0x4A35476: calloc (vg_replace_malloc.c:752) ==169952== by 0x51B1158: g_malloc0 (in /usr/lib64/libglib-2.0.so.0.6000.6) ==169952== by 0xA47357: error_setv (error.c:61) ==169952== by 0xA475D9: error_setg_internal (error.c:97) ==169952== by 0x4DF8C2: vfio_realize (pci.c:2737) ==169952== by 0x76A4FF: pci_qdev_realize (pci.c:2099) ==169952== by 0x689B9A: device_set_realized (qdev.c:876) ==169952== by 0x8E2C80: property_set_bool (object.c:2080) ==169952== by 0x8E0EF6: object_property_set (object.c:1272) ==169952== by 0x8E3FC8: object_property_set_qobject (qom-qobject.c:26) ==169952== by 0x8E11DB: object_property_set_bool (object.c:1338) ==169952== by 0x5E7BDD: qdev_device_add (qdev-monitor.c:673) Fixes: `f045a0104c` ("vfio: unplug failover primary device before migration") Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2019-11-18 10:41:47 -07:00
Jens Freimann	f045a0104c	vfio: unplug failover primary device before migration As usual block all vfio-pci devices from being migrated, but make an exception for failover primary devices. This is achieved by setting unmigratable to 0 but also add a migration blocker for all vfio-pci devices except failover primary devices. These will be unplugged before migration happens by the migration handler of the corresponding virtio-net standby device. Signed-off-by: Jens Freimann <jfreimann@redhat.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> Message-Id: <20191029114905.6856-12-jfreimann@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2019-10-29 18:55:26 -04:00

1 2 3 4

200 Commits