mirrors/qemu - qemu - SynapseOS git

Author	SHA1	Message	Date
Eric Auger	e04cff9d97	vfio/pci: Remove vfio_populate_device returned value The returned value (either -errno or -1) is not used anymore by the caller, vfio_realize, since the error now is stored in the error object. So let's remove it. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-10-17 10:58:02 -06:00
Eric Auger	ec3bcf424e	vfio/pci: Remove vfio_msix_early_setup returned value The returned value is not used anymore by the caller, vfio_realize, since the error now is stored in the error object. So let's remove it. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-10-17 10:58:01 -06:00
Eric Auger	1a22aca1d0	vfio/pci: Conversion to realize This patch converts VFIO PCI to realize function. Also original initfn errors now are propagated using QEMU error objects. All errors are formatted with the same pattern: "vfio: %s: the error description" Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-10-17 10:58:01 -06:00
Eric Auger	9bdbfbd50d	vfio/platform: Pass an error object to vfio_base_device_init This patch propagates errors encountered during vfio_base_device_init up to the realize function. In case the host value is not set or badly formed we now report an error. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-10-17 10:58:01 -06:00
Eric Auger	0d84f47bff	vfio/platform: fix a wrong returned value in vfio_populate_device In case the vfio_init_intp fails we currently do not return an error value. This patch fixes the bug. The returned value is not explicit but in practice the error object is the one used to report the error to the end-user and the actual returned error value is not used. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-10-17 10:58:00 -06:00
Eric Auger	5ff7419d4c	vfio/platform: Pass an error object to vfio_populate_device Propagate the vfio_populate_device errors up to vfio_base_device_init. The error object also is passed to vfio_init_intp. At the moment we only report the error. Subsequent patches will propagate the error up to the realize function. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-10-17 10:58:00 -06:00
Eric Auger	59f7d6743c	vfio: Pass an error object to vfio_get_device Pass an error object to prepare for migration to VFIO-PCI realize. In vfio platform vfio_base_device_init we currently just report the error. Subsequent patches will propagate the error up to the realize function. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-10-17 10:58:00 -06:00
Eric Auger	1b808d5be0	vfio: Pass an error object to vfio_get_group Pass an error object to prepare for migration to VFIO-PCI realize. For the time being let's just simply report the error in vfio platform's vfio_base_device_init(). A subsequent patch will duly propagate the error up to vfio_platform_realize. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-10-17 10:57:59 -06:00
Eric Auger	01905f58f1	vfio: Pass an Error object to vfio_connect_container The error is currently simply reported in vfio_get_group. Don't bother too much with the prefix which will be handled at upper level, later on. Also return an error value in case container->error is not 0 and the container is teared down. On vfio_spapr_remove_window failure, we also report an error whereas it was silent before. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-10-17 10:57:59 -06:00
Eric Auger	7237011d05	vfio/pci: Pass an error object to vfio_pci_igd_opregion_init Pass an error object to prepare for migration to VFIO-PCI realize. In vfio_probe_igd_bar4_quirk, simply report the error. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-10-17 10:57:59 -06:00
Eric Auger	7ef165b9a8	vfio/pci: Pass an error object to vfio_add_capabilities Pass an error object to prepare for migration to VFIO-PCI realize. The error is cascaded downto vfio_add_std_cap and then vfio_msi(x)_setup, vfio_setup_pcie_cap. vfio_add_ext_cap does not return anything else than 0 so let's transform it into a void function. Also use pci_add_capability2 which takes an error object. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-10-17 10:57:58 -06:00
Eric Auger	7dfb34247e	vfio/pci: Pass an error object to vfio_intx_enable Pass an error object to prepare for migration to VFIO-PCI realize. The error object is propagated down to vfio_intx_enable_kvm(). The three other callers, vfio_intx_enable_kvm(), vfio_msi_disable_common() and vfio_pci_post_reset() do not propagate the error and simply call error_reportf_err() with the ERR_PREFIX formatting. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-10-17 10:57:58 -06:00
Eric Auger	008d0e2d7b	vfio/pci: Pass an error object to vfio_msix_early_setup Pass an error object to prepare for migration to VFIO-PCI realize. The returned value will be removed later on. We now format an error in case of reading failure for - the MSIX flags - the MSIX table, - the MSIX PBA. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-10-17 10:57:58 -06:00
Eric Auger	2312d907dd	vfio/pci: Pass an error object to vfio_populate_device Pass an error object to prepare for migration to VFIO-PCI realize. The returned value will be removed later on. The case where error recovery cannot be enabled is not converted into an error object but directly reported through error_report, as before. Populating an error instead would cause the future realize function to fail, which is not wanted. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-10-17 10:57:57 -06:00
Eric Auger	cde4279baa	vfio/pci: Pass an error object to vfio_populate_vga Pass an error object to prepare for the same operation in vfio_populate_device. Eventually this contributes to the migration to VFIO-PCI realize. We now report an error on vfio_get_region_info failure. vfio_probe_igd_bar4_quirk is not involved in the migration to realize and simply calls error_reportf_err. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-10-17 10:57:57 -06:00
Eric Auger	426ec9049e	vfio/pci: Use local error object in vfio_initfn To prepare for migration to realize, let's use a local error object in vfio_initfn. Also let's use the same error prefix for all error messages. On top of the 1-1 conversion, we start using a common error prefix for all error messages. We also introduce a similar warning prefix which will be used later on. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-10-17 10:57:56 -06:00
Peter Xu	cdb3081269	memory: introduce IOMMUNotifier and its caps IOMMU Notifier list is used for notifying IO address mapping changes. Currently VFIO is the only user. However it is possible that future consumer like vhost would like to only listen to part of its notifications (e.g., cache invalidations). This patch introduced IOMMUNotifier and IOMMUNotfierFlag bits for a finer grained control of it. IOMMUNotifier contains a bitfield for the notify consumer describing what kind of notification it is interested in. Currently two kinds of notifications are defined: - IOMMU_NOTIFIER_MAP: for newly mapped entries (additions) - IOMMU_NOTIFIER_UNMAP: for entries to be removed (cache invalidates) When registering the IOMMU notifier, we need to specify one or multiple types of messages to listen to. When notifications are triggered, its type will be checked against the notifier's type bits, and only notifiers with registered bits will be notified. (For any IOMMU implementation, an in-place mapping change should be notified with an UNMAP followed by a MAP.) Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <1474606948-14391-2-git-send-email-peterx@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-09-27 08:59:16 +02:00
David Gibson	6d17a018d0	vfio/pci: Fix regression in MSI routing configuration `d1f6af6` "kvm-irqchip: simplify kvm_irqchip_add_msi_route" was a cleanup of kvmchip routing configuration, that was mostly intended for x86. However, it also contains a subtle change in behaviour which breaks EEH[1] error recovery on certain VFIO passthrough devices on spapr guests. So far it's only been seen on a BCM5719 NIC on a POWER8 server, but there may be other hardware with the same problem. It's also possible there could be circumstances where it causes a bug on x86 as well, though I don't know of any obvious candidates. Prior to `d1f6af6`, both vfio_msix_vector_do_use() and vfio_add_kvm_msi_virq() used msg == NULL as a special flag to mark this as the "dummy" vector used to make the host hardware state sync with the guest expected hardware state in terms of MSI configuration. Specifically that flag caused vfio_add_kvm_msi_virq() to become a no-op, meaning the dummy irq would always be delivered via qemu. `d1f6af6` changed vfio_add_kvm_msi_virq() so it takes a vector number instead of the msg parameter, and determines the correct message itself. The test for !msg was removed, and not replaced with anything there or in the caller. With an spapr guest which has a VFIO device, if an EEH error occurs on the host hardware, then the device will be isolated then reset. This is a combination of host and guest action, mediated by some EEH related hypercalls. I haven't fully traced the mechanics, but somehow installing the kvm irqchip route for the dummy irq on the BCM5719 means that after EEH reset and recovery, at least some irqs are no longer delivered to the guest. In particular, the guest never gets the link up event, and so the NIC is effectively dead. [1] EEH (Enhanced Error Handling) is an IBM POWER server specific PCI-* error reporting and recovery mechanism. The concept is somewhat similar to PCI-E AER, but the details are different. Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1373802 Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Peter Xu <peterx@redhat.com> Cc: Gavin Shan <gwshan@au1.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Cc: qemu-stable@nongnu.org Fixes: `d1f6af6a17` ("kvm-irqchip: simplify kvm_irqchip_add_msi_route") Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-09-15 10:41:36 -06:00
Laurent Vivier	e723b87103	trace-events: fix first line comment in trace-events Documentation is docs/tracing.txt instead of docs/trace-events.txt. find . -name trace-events -exec \ sed -i "s?See docs/trace-events.txt for syntax documentation.?See docs/tracing.txt for syntax documentation.?" \ {} \; Signed-off-by: Laurent Vivier <lvivier@redhat.com> Message-id: 1470669081-17860-1-git-send-email-lvivier@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-08-12 10:36:01 +01:00
Markus Armbruster	fea1c0999a	vfio: Use error_report() instead of error_printf() for errors Cc: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <1470224274-31522-4-git-send-email-armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2016-08-08 09:01:18 +02:00
Peter Xu	3f1fea0fb5	kvm-irqchip: do explicit commit when update irq In the past, we are doing gsi route commit for each irqchip route update. This is not efficient if we are updating lots of routes in the same time. This patch removes the committing phase in kvm_irqchip_update_msi_route(). Instead, we do explicit commit after all routes updated. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2016-07-21 20:44:19 +03:00
Peter Xu	d1f6af6a17	kvm-irqchip: simplify kvm_irqchip_add_msi_route Changing the original MSIMessage parameter in kvm_irqchip_add_msi_route into the vector number. Vector index provides more information than the MSIMessage, we can retrieve the MSIMessage using the vector easily. This will avoid fetching MSIMessage every time before adding MSI routes. Meanwhile, the vector info will be used in the coming patches to further enable gsi route update notifications. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2016-07-21 20:44:18 +03:00
Alex Williamson	383a7af7ec	vfio/pci: Hide ARI capability QEMU supports ARI on downstream ports and assigned devices may support ARI in their extended capabilities. The endpoint ARI capability specifies the next function, such that the OS doesn't need to walk each possible function, however this next function is relative to the host, not the guest. This leads to device discovery issues when we combine separate functions into virtual multi-function packages in a guest. For example, SR-IOV VFs are not enumerated by simply probing the function address space, therefore the ARI next-function field is zero. When we combine multiple VFs together as a multi-function device in the guest, the guest OS identifies ARI is enabled, relies on this next-function field, and stops looking for additional function after the first is found. Long term we should expose the ARI capability to the guest to enable configurations with more than 8 functions per slot, but this requires additional QEMU PCI infrastructure to manage the next-function field for multiple, otherwise independent devices. In the short term, hiding this capability allows equivalent functionality to what we currently have on non-express chipsets. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>	2016-07-18 10:55:17 -06:00
David Gibson	21bb3093e6	vfio/spapr: Remove stale ioctl() call This ioctl() call to VFIO_IOMMU_SPAPR_TCE_REMOVE was left over from an earlier version of the code and has since been folded into vfio_spapr_remove_window(). It wasn't caught because although the argument structure has been removed, the libc function remove() means this didn't trigger a compile failure. The ioctl() was also almost certain to fail silently and harmlessly with the bogus argument, so this wasn't caught in testing. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>	2016-07-18 10:40:27 +10:00
Markus Armbruster	a9c94277f0	Use #include "..." for our own headers, <...> for others Tracked down with an ugly, brittle and probably buggy Perl script. Also move includes converted to <...> up so they get included before ours where that's obviously okay. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Tested-by: Eric Blake <eblake@redhat.com> Reviewed-by: Richard Henderson <rth@twiddle.net>	2016-07-12 16:19:16 +02:00
Peter Maydell	791b7d2340	pc, pci, virtio: new features, cleanups, fixes iommus can not be added with -device. cleanups and fixes all over the place Signed-off-by: Michael S. Tsirkin <mst@redhat.com> -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJXe4l4AAoJECgfDbjSjVRpIz4IALye7mKG61/POA4Gqmhalc3d HnlNSZ2YcKAuvPg7WWkBuRacrQvVY/MbW1mLloG1lY0tdFgZG8Cy+CY6wJg1NE4c cXd+77vHkIyrnl+Nil+QOgTFiAsMnD+mXHHsnCDw2jGn3JbgVNuCMi7V34fGkQd2 PDkZyYfwTqO3HytuG0/j2Somc9du1gjYdn+9qigfZVgP96jGDojBuJWuuU5flCB3 Kj5xrOuI01XlbdTk71tVjBJBektQurWr6r7GECDqZIpUfc+BI70FU9jPh+OlLTO/ 92yi29ncjyStz4tRnf18xoQ8uBgH/tD1xigEUPRtnm1+0i/tgONBL8cAdBF9FBE= =ABGE -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging pc, pci, virtio: new features, cleanups, fixes iommus can not be added with -device. cleanups and fixes all over the place Signed-off-by: Michael S. Tsirkin <mst@redhat.com> # gpg: Signature made Tue 05 Jul 2016 11:18:32 BST # gpg: using RSA key 0x281F0DB8D28D5469 # gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" # gpg: aka "Michael S. Tsirkin <mst@redhat.com>" # Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67 # Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469 * remotes/mst/tags/for_upstream: (30 commits) vmw_pvscsi: remove unnecessary internal msi state flag e1000e: remove unnecessary internal msi state flag vmxnet3: remove unnecessary internal msi state flag mptsas: remove unnecessary internal msi state flag megasas: remove unnecessary megasas_use_msi() pci: Convert msi_init() to Error and fix callers to check it pci bridge dev: change msi property type megasas: change msi/msix property type mptsas: change msi property type intel-hda: change msi property type usb xhci: change msi/msix property type change pvscsi_init_msi() type to void tests: add APIC.cphp and DSDT.cphp blobs tests: acpi: add CPU hotplug testcase log: Permit -dfilter 0..0xffffffffffffffff range: Replace internal representation of Range range: Eliminate direct Range member access log: Clean up misuse of Range for -dfilter pci_register_bar: cleanup Revert "virtio-net: unbreak self announcement and guest offloads after migration" ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2016-07-05 16:48:24 +01:00
Cao jin	1108b2f8a9	pci: Convert msi_init() to Error and fix callers to check it msi_init() reports errors with error_report(), which is wrong when it's used in realize(). Fix by converting it to Error. Fix its callers to handle failure instead of ignoring it. For those callers who don't handle the failure, it might happen: when user want msi on, but he doesn't get what he want because of msi_init fails silently. cc: Gerd Hoffmann <kraxel@redhat.com> cc: John Snow <jsnow@redhat.com> cc: Dmitry Fleytman <dmitry@daynix.com> cc: Jason Wang <jasowang@redhat.com> cc: Michael S. Tsirkin <mst@redhat.com> cc: Hannes Reinecke <hare@suse.de> cc: Paolo Bonzini <pbonzini@redhat.com> cc: Alex Williamson <alex.williamson@redhat.com> cc: Markus Armbruster <armbru@redhat.com> cc: Marcel Apfelbaum <marcel@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Hannes Reinecke <hare@suse.com>	2016-07-05 13:14:41 +03:00
Alexey Kardashevskiy	2e4109de8e	vfio/spapr: Create DMA window dynamically (SPAPR IOMMU v2) New VFIO_SPAPR_TCE_v2_IOMMU type supports dynamic DMA window management. This adds ability to VFIO common code to dynamically allocate/remove DMA windows in the host kernel when new VFIO container is added/removed. This adds a helper to vfio_listener_region_add which makes VFIO_IOMMU_SPAPR_TCE_CREATE ioctl and adds just created IOMMU into the host IOMMU list; the opposite action is taken in vfio_listener_region_del. When creating a new window, this uses heuristic to decide on the TCE table levels number. This should cause no guest visible change in behavior. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> [dwg: Added some casts to prevent printf() warnings on certain targets where the kernel headers' __u64 doesn't match uint64_t or PRIx64] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2016-07-05 14:31:08 +10:00
Alexey Kardashevskiy	f4ec5e26ed	vfio: Add host side DMA window capabilities There are going to be multiple IOMMUs per a container. This moves the single host IOMMU parameter set to a list of VFIOHostDMAWindow. This should cause no behavioral change and will be used later by the SPAPR TCE IOMMU v2 which will also add a vfio_host_win_del() helper. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2016-07-05 14:31:08 +10:00
Alexey Kardashevskiy	318f67ce13	vfio: spapr: Add DMA memory preregistering (SPAPR IOMMU v2) This makes use of the new "memory registering" feature. The idea is to provide the userspace ability to notify the host kernel about pages which are going to be used for DMA. Having this information, the host kernel can pin them all once per user process, do locked pages accounting (once) and not spent time on doing that in real time with possible failures which cannot be handled nicely in some cases. This adds a prereg memory listener which listens on address_space_memory and notifies a VFIO container about memory which needs to be pinned/unpinned. VFIO MMIO regions (i.e. "skip dump" regions) are skipped. The feature is only enabled for SPAPR IOMMU v2. The host kernel changes are required. Since v2 does not need/support VFIO_IOMMU_ENABLE, this does not call it when v2 is detected and enabled. This enforces guest RAM blocks to be host page size aligned; however this is not new as KVM already requires memory slots to be host page size aligned. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> [dwg: Fix compile error on 32-bit host] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2016-07-05 14:30:54 +10:00
Alexey Kardashevskiy	d22d8956b1	memory: Add MemoryRegionIOMMUOps.notify_started/stopped callbacks The IOMMU driver may change behavior depending on whether a notifier client is present. In the case of POWER, this represents a change in the visibility of the IOTLB, for other drivers such as intel-iommu and future AMD-Vi emulation, notifier support is not yet enabled and this provides the opportunity to flag that incompatibility. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Peter Xu <peterx@redhat.com> Tested-by: Peter Xu <peterx@redhat.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> [new log & extracted from [PATCH qemu v17 12/12] spapr_iommu, vfio, memory: Notify IOMMU about starting/stopping listening] Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-06-30 13:00:23 -06:00
Alex Williamson	e37dac06dc	vfio/pci: Hide SR-IOV capability The kernel currently exposes the SR-IOV capability as read-only through vfio-pci. This is sufficient to protect the host kernel, but has the potential to confuse guests without further virtualization. In particular, OVMF tries to size the VF BARs and comes up with absurd results, ending with an assert. There's not much point in adding virtualization to a read-only capability, so we simply hide it for now. If the kernel ever enables SR-IOV virtualization, we should easily be able to test it through VF BAR sizing or explicit flags. Testing whether we should parse extended capabilities is also pulled into the function to keep these assumptions in one place. Tested-by: Laszlo Ersek <lersek@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-06-30 13:00:23 -06:00
Chen Fan	325ae8d548	vfio: add pcie extended capability support For vfio pcie device, we could expose the extended capability on PCIE bus. due to add a new pcie capability at the tail of the chain, in order to avoid config space overwritten, we introduce a copy config for parsing extended caps. and rebuild the pcie extended config space. Signed-off-by: Chen Fan <chen.fan.fnst@cn.fujitsu.com> Tested-by: Laszlo Ersek <lersek@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-06-30 13:00:23 -06:00
Alex Williamson	4d3fc4fdc6	vfio/pci: Fix VGA quirks Commit `2d82f8a3cd` ("vfio/pci: Convert all MemoryRegion to dynamic alloc and consistent functions") converted VFIOPCIDevice.vga to be dynamically allocted, negating the need for VFIOPCIDevice.has_vga. Unfortunately not all of the has_vga users were converted, nor was the field removed from the structure. Correct these oversights. Reported-by: Peter Maloney <peter.maloney@brockmann-consult.de> Tested-by: Peter Maloney <peter.maloney@brockmann-consult.de> Fixes: `2d82f8a3cd` ("vfio/pci: Convert all MemoryRegion to dynamic alloc and consistent functions") Fixes: https://bugs.launchpad.net/qemu/+bug/1591628 Cc: qemu-stable@nongnu.org Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-06-30 13:00:22 -06:00
Alexey Kardashevskiy	f682e9c244	memory: Add reporting of supported page sizes Every IOMMU has some granularity which MemoryRegionIOMMUOps::translate uses when translating, however this information is not available outside the translate context for various checks. This adds a get_min_page_size callback to MemoryRegionIOMMUOps and a wrapper for it so IOMMU users (such as VFIO) can know the minimum actual page size supported by an IOMMU. As IOMMU MR represents a guest IOMMU, this uses TARGET_PAGE_SIZE as fallback. This removes vfio_container_granularity() and uses new helper in memory_region_iommu_replay() when replaying IOMMU mappings on added IOMMU memory region. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Acked-by: Alex Williamson <alex.williamson@redhat.com> [dwg: Removed an unnecessary calculation] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2016-06-22 11:13:09 +10:00
Daniel P. Berrange	1cf6ebc73f	trace: split out trace events for hw/vfio/ directory Move all trace-events for files in the hw/vfio/ directory to their own file. Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Message-id: 1466066426-16657-30-git-send-email-berrange@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-06-20 17:22:16 +01:00
Gavin Shan	d917e88d85	vfio: Fix broken EEH vfio_eeh_container_op() is the backend that communicates with host kernel to support EEH functionality in QEMU. However, the functon should return the value from host kernel instead of 0 unconditionally. dwg: Specifically the problem occurs for the handful of EEH sub-operations which can return a non-zero, non-error result. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> [dwg: clarification to commit message] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2016-06-17 15:59:18 +10:00
Paolo Bonzini	02d0e09503	os-posix: include sys/mman.h qemu/osdep.h checks whether MAP_ANONYMOUS is defined, but this check is bogus without a previous inclusion of sys/mman.h. Include it in sysemu/os-posix.h and remove it from everywhere else. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-06-16 18:39:03 +02:00
Alexey Kardashevskiy	f1f9365019	vfio: Check that IOMMU MR translates to system address space At the moment IOMMU MR only translate to the system memory. However if some new code changes this, we will need clear indication why it is not working so here is the check. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-05-26 11:12:09 -06:00
Alexey Kardashevskiy	d78c19b5cf	memory: Fix IOMMU replay base address Since `a788f227` "memory: Allow replay of IOMMU mapping notifications" when new VFIO listener is added, all existing IOMMU mappings are replayed. However there is a problem that the base address of an IOMMU memory region (IOMMU MR) is ignored which is not a problem for the existing user (which is pseries) with its default 32bit DMA window starting at 0 but it is if there is another DMA window. This stores the IOMMU's offset_within_address_space and adjusts the IOVA before calling vfio_dma_map/vfio_dma_unmap. As the IOMMU notifier expects IOVA offset rather than the absolute address, this also adjusts IOVA in sPAPR H_PUT_TCE handler before calling notifier(s). Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-05-26 11:12:08 -06:00
Alexey Kardashevskiy	7a057b4fb9	vfio: Fix 128 bit handling when deleting region `7532d3cbf` "vfio: Fix 128 bit handling" added support for 64bit IOMMU memory regions when those are added to VFIO address space; however removing code cannot cope with these as int128_get64() will fail on 1<<64. This copies 128bit handling from region_add() to region_del(). Since the only machine type which is actually going to use 64bit IOMMU is pseries and it never really removes them (instead it will dynamically add/remove subregions), this should cause no behavioral change. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-05-26 11:12:07 -06:00
Alex Williamson	6ced0bba70	vfio/pci: Add a separate option for IGD OpRegion support The IGD OpRegion is enabled automatically when running in legacy mode, but it can sometimes be useful in universal passthrough mode as well. Without an OpRegion, output spigots don't work, and even though Intel doesn't officially support physical outputs in UPT mode, it's a useful feature. Note that if an OpRegion is enabled but a monitor is not connected, some graphics features will be disabled in the guest versus a headless system without an OpRegion, where they would work. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Gerd Hoffmann <kraxel@redhat.com> Tested-by: Gerd Hoffmann <kraxel@redhat.com>	2016-05-26 11:12:03 -06:00
Alex Williamson	c4c45e943e	vfio/pci: Intel graphics legacy mode assignment Enable quirks to support SandyBridge and newer IGD devices as primary VM graphics. This requires new vfio-pci device specific regions added in kernel v4.6 to expose the IGD OpRegion, the shadow ROM, and config space access to the PCI host bridge and LPC/ISA bridge. VM firmware support, SeaBIOS only so far, is also required for reserving memory regions for IGD specific use. In order to enable this mode, IGD must be assigned to the VM at PCI bus address 00:02.0, it must have a ROM, it must be able to enable VGA, it must have or be able to create on its own an LPC/ISA bridge of the proper type at PCI bus address 00:1f.0 (sorry, not compatible with Q35 yet), and it must have the above noted vfio-pci kernel features and BIOS. The intention is that to enable this mode, a user simply needs to assign 00:02.0 from the host to 00:02.0 in the VM: -device vfio-pci,host=0000:00:02.0,bus=pci.0,addr=02.0 and everything either happens automatically or it doesn't. In the case that it doesn't, we leave error reports, but assume the device will operate in universal passthrough mode (UPT), which doesn't require any of this, but has a much more narrow window of supported devices, supported use cases, and supported guest drivers. When using IGD in this mode, the VM firmware is required to reserve some VM RAM for the OpRegion (on the order or several 4k pages) and stolen memory for the GTT (up to 8MB for the latest GPUs). An additional option, x-igd-gms allows the user to specify some amount of additional memory (value is number of 32MB chunks up to 512MB) that is pre-allocated for graphics use. TBH, I don't know of anything that requires this or makes use of this memory, which is why we don't allocate any by default, but the specification suggests this is not actually a valid combination, so the option exists as a workaround. Please report if it's actually necessary in some environment. See code comments for further discussion about the actual operation of the quirks necessary to assign these devices. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Gerd Hoffmann <kraxel@redhat.com> Tested-by: Gerd Hoffmann <kraxel@redhat.com>	2016-05-26 11:12:01 -06:00
Alex Williamson	581406e0e3	vfio/pci: Setup BAR quirks after capabilities probing Capability probing modifies wmask, which quirks may be interested in changing themselves. Apply our BAR quirks after the capability scan to make this possible. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Gerd Hoffmann <kraxel@redhat.com> Tested-by: Gerd Hoffmann <kraxel@redhat.com>	2016-05-26 11:12:00 -06:00
Alex Williamson	182bca4592	vfio/pci: Consolidate VGA setup Combine VGA discovery and registration. Quirks can have dependencies on BARs, so the quirks push out until after we've scanned the BARs. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Gerd Hoffmann <kraxel@redhat.com> Tested-by: Gerd Hoffmann <kraxel@redhat.com>	2016-05-26 11:11:58 -06:00
Alex Williamson	4225f2b670	vfio/pci: Fix return of vfio_populate_vga() This function returns success if either we setup the VGA region or the host vfio doesn't return enough regions to support the VGA index. This latter case doesn't make any sense. If we're asked to populate VGA, fail if it doesn't exist and let the caller decide if that's important. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Gerd Hoffmann <kraxel@redhat.com> Tested-by: Gerd Hoffmann <kraxel@redhat.com>	2016-05-26 11:11:56 -06:00
Alex Williamson	e61a424f05	vfio: Create device specific region info helper Given a device specific region type and sub-type, find it. Also cleanup return point on error in vfio_get_region_info() so that we always return 0 with a valid pointer or -errno and NULL. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Gerd Hoffmann <kraxel@redhat.com> Tested-by: Gerd Hoffmann <kraxel@redhat.com>	2016-05-26 11:04:50 -06:00
Alex Williamson	b53b0f696b	vfio: Enable sparse mmap capability The sparse mmap capability in a vfio region info allows vfio to tell us which sub-areas of a region may be mmap'd. Thus rather than assuming a single mmap covers the entire region and later frobbing it ourselves for things like the PCI MSI-X vector table, we can read that directly from vfio. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Gerd Hoffmann <kraxel@redhat.com> Tested-by: Gerd Hoffmann <kraxel@redhat.com>	2016-05-26 09:43:20 -06:00
Paolo Bonzini	e81096b1c8	explicitly include linux/kvm.h Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-05-19 16:42:27 +02:00
Peter Maydell	7cd592bc65	VFIO updates 2016-03-28 - Use 128bit math to avoid asserts with IOMMU regions (Bandan Das) -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.14 (GNU/Linux) iQIcBAABAgAGBQJW+a1UAAoJECObm247sIsiqMsP/RX+lfr2LF7MA7mTEHFIya51 K0LkICysQ5Kh/nIa7yPuPKuGJaDazpYrASq4P5dwm/5IPHk3r/eE7oue7/f1itOK Rgv1sv2nsVkd0p9adTpMkuIAYzLPyp2enL6iuFZo8urwFDfjfAxo8Q0pFd/nxWz7 u7ft69vV40uWtHDg3TZx1EH19UJc0S2ouJeP1Q/MBLZ6FrJq2/SkgujNdX/OvnAv 0zi9mKOykc+lDEzQShXyivIDnl8NYwnEDdcfMvCf3JoV5j/SkCtLyryRoKGOV2LW 53687rNucVX5HLiFEs2hBuYhpxo5/Y7XOxAgbtRkkX1Dh12oy0u1NlV21BV5sPpQ KfiIimtVcq/EuLhs/HLMvwT83EwIRvlXmXiJxii5vWU7Nimmx+dpBqtkrwJ0qzch SPs+SxcCKCNLYeSPQxZ5mQTaf4uNgvBWMVm0nJtQvrWfUp/iLdDMeOx5Dx2wnoT4 8ksHkJin7j2JQnmQiCrPHgLsp47NF0cJixe10DkA9AaQBUcPaExfyr/n5qZIkbvz gxbW0QG1bkpYPD8uNxgFIblSlQGXVTpcvVUaUW7crEH+Dw3GK4h8UcTkKH7rqHGk 9eg0NfHMQDpS/Hr8MneXuOOlUOH2s9ZR0zVPsEwR1kqV1GNEmPbAz1tiLrGLwH3r /UBzyVFjBrdeP4gAJt4/ =DpB8 -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/awilliam/tags/vfio-update-20160328.0' into staging VFIO updates 2016-03-28 - Use 128bit math to avoid asserts with IOMMU regions (Bandan Das) # gpg: Signature made Mon 28 Mar 2016 23:16:52 BST using RSA key ID 3BB08B22 # gpg: Good signature from "Alex Williamson <alex.williamson@redhat.com>" # gpg: aka "Alex Williamson <alex@shazbot.org>" # gpg: aka "Alex Williamson <alwillia@redhat.com>" # gpg: aka "Alex Williamson <alex.l.williamson@gmail.com>" * remotes/awilliam/tags/vfio-update-20160328.0: vfio: convert to 128 bit arithmetic calculations when adding mem regions Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2016-03-29 17:39:41 +01:00
Bandan Das	55efcc537d	vfio: convert to 128 bit arithmetic calculations when adding mem regions vfio_listener_region_add for a iommu mr results in an overflow assert since iommu memory region is initialized with UINT64_MAX. Convert calculations to 128 bit arithmetic for iommu memory regions and let int128_get64 assert for non iommu regions if there's an overflow. Suggested-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Bandan Das <bsd@redhat.com> [missed (end - 1) on 2nd trace call, move llsize closer to use] Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-03-28 13:27:49 -06:00
Markus Armbruster	da34e65cb4	include/qemu/osdep.h: Don't include qapi/error.h Commit `57cb38b` included qapi/error.h into qemu/osdep.h to get the Error typedef. Since then, we've moved to include qemu/osdep.h everywhere. Its file comment explains: "To avoid getting into possible circular include dependencies, this file should not include any other QEMU headers, with the exceptions of config-host.h, compiler.h, os-posix.h and os-win32.h, all of which are doing a similar job to this file and are under similar constraints." qapi/error.h doesn't do a similar job, and it doesn't adhere to similar constraints: it includes qapi-types.h. That's in excess of 100KiB of crap most .c files don't actually need. Add the typedef to qemu/typedefs.h, and include that instead of qapi/error.h. Include qapi/error.h in .c files that need it and don't get it now. Include qapi-types.h in qom/object.h for uint16List. Update scripts/clean-includes accordingly. Update it further to match reality: replace config.h by config-target.h, add sysemu/os-posix.h, sysemu/os-win32.h. Update the list of includes in the qemu/osdep.h comment quoted above similarly. This reduces the number of objects depending on qapi/error.h from "all of them" to less than a third. Unfortunately, the number depending on qapi-types.h shrinks only a little. More work is needed for that one. Signed-off-by: Markus Armbruster <armbru@redhat.com> [Fix compilation without the spice devel packages. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-03-22 22:20:15 +01:00
David Gibson	3356128cd1	vfio: Eliminate vfio_container_ioctl() vfio_container_ioctl() was a bad interface that bypassed abstraction boundaries, had semantics that sat uneasily with its name, and was unsafe in many realistic circumstances. Now that spapr-pci-vfio-host-bridge has been folded into spapr-pci-host-bridge, there are no more users, so remove it. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Acked-by: Alex Williamson <alex.williamson@redhat.com>	2016-03-16 09:55:11 +11:00
David Gibson	3153119e9b	vfio: Start improving VFIO/EEH interface At present the code handling IBM's Enhanced Error Handling (EEH) interface on VFIO devices operates by bypassing the usual VFIO logic with vfio_container_ioctl(). That's a poorly designed interface with unclear semantics about exactly what can be operated on. In particular it operates on a single vfio container internally (hence the name), but takes an address space and group id, from which it deduces the container in a rather roundabout way. groupids are something that code outside vfio shouldn't even be aware of. This patch creates new interfaces for EEH operations. Internally we have vfio_eeh_container_op() which takes a VFIOContainer object directly. For external use we have vfio_eeh_as_ok() which determines if an AddressSpace is usable for EEH (at present this means it has a single container with exactly one group attached), and vfio_eeh_as_op() which will perform an operation on an AddressSpace in the unambiguous case, and otherwise returns an error. This interface still isn't great, but it's enough of an improvement to allow a number of cleanups in other places. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Acked-by: Alex Williamson <alex.williamson@redhat.com>	2016-03-16 09:55:10 +11:00
Neo Jia	062ed5d8d6	vfio/pci: replace fixed string limit by g_strdup_printf A trivial change to remove string limit by using g_strdup_printf Tested-by: Neo Jia <cjia@nvidia.com> Signed-off-by: Neo Jia <cjia@nvidia.com> Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-03-10 20:50:43 -07:00
Alex Williamson	e593c0211b	vfio/pci: Split out VGA setup This could be setup later by device specific code, such as IGD initialization. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-03-10 20:50:41 -07:00
Alex Williamson	e2e5ee9c56	vfio/pci: Fixup PCI option ROMs Devices like Intel graphics are known to not only have bad checksums, but also the wrong device ID. This is not so surprising given that the video BIOS is typically part of the system firmware image rather that embedded into the device and needs to support any IGD device installed into the system. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-03-10 20:50:39 -07:00
Alex Williamson	2d82f8a3cd	vfio/pci: Convert all MemoryRegion to dynamic alloc and consistent functions Match common vfio code with setup, exit, and finalize functions for BAR, quirk, and VGA management. VGA is also changed to dynamic allocation to match the other MemoryRegions. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-03-10 20:50:38 -07:00
Alex Williamson	db0da029a1	vfio: Generalize region support Both platform and PCI vfio drivers create a "slow", I/O memory region with one or more mmap memory regions overlayed when supported by the device. Generalize this to a set of common helpers in the core that pulls the region info from vfio, fills the region data, configures slow mapping, and adds helpers for comleting the mmap, enable/disable, and teardown. This can be immediately used by the PCI MSI-X code, which needs to mmap around the MSI-X vector table. This also changes VFIORegion.mem to be dynamically allocated because otherwise we don't know how the caller has allocated VFIORegion and therefore don't know whether to unreference it to destroy the MemoryRegion or not. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-03-10 20:03:16 -07:00
Alex Williamson	469002263a	vfio: Wrap VFIO_DEVICE_GET_REGION_INFO In preparation for supporting capability chains on regions, wrap ioctl(VFIO_DEVICE_GET_REGION_INFO) so we don't duplicate the code for each caller. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-03-10 09:39:07 -07:00
Alex Williamson	7df9381b7a	vfio: Add sysfsdev property for pci & platform vfio-pci currently requires a host= parameter, which comes in the form of a PCI address in [domain:]<bus:slot.function> notation. We expect to find a matching entry in sysfs for that under /sys/bus/pci/devices/. vfio-platform takes a similar approach, but defines the host= parameter to be a string, which can be matched directly under /sys/bus/platform/devices/. On the PCI side, we have some interest in using vfio to expose vGPU devices. These are not actual discrete PCI devices, so they don't have a compatible host PCI bus address or a device link where QEMU wants to look for it. There's also really no requirement that vfio can only be used to expose physical devices, a new vfio bus and iommu driver could expose a completely emulated device. To fit within the vfio framework, it would need a kernel struct device and associated IOMMU group, but those are easy constraints to manage. To support such devices, which would include vGPUs, that honor the VFIO PCI programming API, but are not necessarily backed by a unique PCI address, add support for specifying any device in sysfs. The vfio API already has support for probing the device type to ensure compatibility with either vfio-pci or vfio-platform. With this, a vfio-pci device could either be specified as: -device vfio-pci,host=02:00.0 or -device vfio-pci,sysfsdev=/sys/devices/pci0000:00/0000:00:1c.0/0000:02:00.0 or even -device vfio-pci,sysfsdev=/sys/bus/pci/devices/0000:02:00.0 When vGPU support comes along, this might look something more like: -device vfio-pci,sysfsdev=/sys/devices/virtual/intel-vgpu/vgpu0@0000:00:02.0 NB - This is only a made up example path The same change is made for vfio-platform, specifying sysfsdev has precedence over the old host option. Tested-by: Eric Auger <eric.auger@linaro.org> Reviewed-by: Eric Auger <eric.auger@linaro.org> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-03-10 09:39:07 -07:00
Peter Maydell	974dc73d77	all: Clean up includes Clean up includes so that osdep.h is included first and headers which it implies are not included manually. This commit was created with scripts/clean-includes. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- This just catches a couple of stragglers since I posted the last clean-includes patchset last week.	2016-02-23 12:43:05 +00:00
Wei Yang	b58b17f744	vfio/pci: use PCI_MSIX_FLAGS on retrieving the MSIX entries Even PCI_CAP_FLAGS has the same value as PCI_MSIX_FLAGS, the later one is the more proper on retrieving MSIX entries. This patch uses PCI_MSIX_FLAGS to retrieve the MSIX entries. Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-02-19 09:42:32 -07:00
Eric Auger	62d9551247	hw/vfio/platform: amd-xgbe device This patch introduces the amd-xgbe VFIO platform device. It allows the guest to do passthrough on a device exposing an "amd,xgbe-seattle-v1a" compat string. Signed-off-by: Eric Auger <eric.auger@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-02-19 09:42:29 -07:00
Wei Yang	3fc1c182c1	vfio/pci: replace 1 with PCI_CAP_LIST_NEXT to make code self-explain Use the macro PCI_CAP_LIST_NEXT instead of 1, so that the code would be more self-explain. This patch makes this change and also fixs one typo in comment. Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-02-19 09:42:29 -07:00
Chen Fan	88caf177ac	vfio: make the 4 bytes aligned for capability size this function search the capability from the end, the last size should 0x100 - pos, not 0xff - pos. Signed-off-by: Chen Fan <chen.fan.fnst@cn.fujitsu.com> Reviewed-by: Marcel Apfelbaum <marcel@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-02-19 09:42:28 -07:00
Peter Maydell	c6eacb1ac0	hw/vfio: Clean up includes Clean up includes so that osdep.h is included first and headers which it implies are not included manually. This commit was created with scripts/clean-includes. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Message-id: 1453832250-766-22-git-send-email-peter.maydell@linaro.org	2016-01-29 15:07:24 +00:00
Alex Williamson	95239e1625	vfio/pci: Lazy PBA emulation The PCI spec recommends devices use additional alignment for MSI-X data structures to allow software to map them to separate processor pages. One advantage of doing this is that we can emulate those data structures without a significant performance impact to the operation of the device. Some devices fail to implement that suggestion and assigned device performance suffers. One such case of this is a Mellanox MT27500 series, ConnectX-3 VF, where the MSI-X vector table and PBA are aligned on separate 4K pages. If PBA emulation is enabled, performance suffers. It's not clear how much value we get from PBA emulation, but the solution here is to only lazily enable the emulated PBA when a masked MSI-X vector fires. We then attempt to more aggresively disable the PBA memory region any time a vector is unmasked. The expectation is then that a typical VM will run entirely with PBA emulation disabled, and only when used is that emulation re-enabled. Reported-by: Shyam Kaushik <shyam.kaushik@gmail.com> Tested-by: Shyam Kaushik <shyam.kaushik@gmail.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-01-19 11:33:42 -07:00
Alex Williamson	f5793fd9e1	vfio/pci-quirks: Only quirk to size of PCI config space For quirks that support the full PCIe extended config space, limit the quirk to only the size of config space available through vfio. This allows host systems with broken MMCONFIG regions to still make use of these quirks without generating bad address faults trying to access beyond the end of config space exposed through vfio. This may expose direct access to the mirror of extended config space, only trapping the sub-range of standard config space, but allowing this makes the quirk, and thus the device, functional. We expect that only device specific accesses make use of the mirror, not general extended PCI capability accesses, so any virtualization in this space is likely unnecessary anyway, and the device is still IOMMU isolated, so it should only be able to hurt itself through any bogus configurations enabled by this space. Link: https://www.redhat.com/archives/vfio-users/2015-November/msg00192.html Reported-by: Ronnie Swanink <ronnie@ronnieswanink.nl> Reviewed-by: Laszlo Ersek <lersek@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-01-19 11:33:41 -07:00
Markus Armbruster	bdd81addf4	vfio: Use g_new() & friends where that makes obvious sense g_new(T, n) is neater than g_malloc(sizeof(T) * n). It's also safer, for two reasons. One, it catches multiplication overflowing size_t. Two, it returns T * rather than void *, which lets the compiler catch more type errors. This commit only touches allocations with size arguments of the form sizeof(T). Same Coccinelle semantic patch as in commit `b45c03f`. Signed-off-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-11-10 12:11:08 -07:00
Alex Williamson	0282abf078	vfio/pci: Hide device PCIe capability on non-express buses for PCIe VMs When we have a PCIe VM, such as Q35, guests start to care more about valid configurations of devices relative to the VM view of the PCI topology. Windows will error with a Code 10 for an assigned device if a PCIe capability is found for a device on a conventional bus. We also have the possibility of IOMMUs, like VT-d, where the where the guest may be acutely aware of valid express capabilities on physical hardware. Some devices, like tg3 are adversely affected by this due to driver dependencies on the PCIe capability. The only solution for such devices is to attach them to an express capable bus in the VM. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-11-10 12:11:08 -07:00
Pavel Fedin	dc9f06ca81	kvm: Pass PCI device pointer to MSI routing functions In-kernel ITS emulation on ARM64 will require to supply requester IDs. These IDs can now be retrieved from the device pointer using new pci_requester_id() function. This patch adds pci_dev pointer to KVM GSI routing functions and makes callers passing it. x86 architecture does not use requester IDs, but hw/i386/kvm/pci-assign.c also made passing PCI device pointer instead of NULL for consistency with the rest of the code. Signed-off-by: Pavel Fedin <p.fedin@samsung.com> Message-Id: <ce081423ba2394a4efc30f30708fca07656bc500.1444916432.git.p.fedin@samsung.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-10-19 10:13:07 +02:00
David Gibson	508ce5eb00	vfio: Allow hotplug of containers onto existing guest IOMMU mappings At present the memory listener used by vfio to keep host IOMMU mappings in sync with the guest memory image assumes that if a guest IOMMU appears, then it has no existing mappings. This may not be true if a VFIO device is hotplugged onto a guest bus which didn't previously include a VFIO device, and which has existing guest IOMMU mappings. Therefore, use the memory_region_register_iommu_notifier_replay() function in order to fix this case, replaying existing guest IOMMU mappings, bringing the host IOMMU into sync with the guest IOMMU. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-10-05 12:39:47 -06:00
David Gibson	7a140a57c6	vfio: Record host IOMMU's available IO page sizes Depending on the host IOMMU type we determine and record the available page sizes for IOMMU translation. We'll need this for other validation in future patches. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Laurent Vivier <lvivier@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-10-05 12:38:41 -06:00
David Gibson	3898aad323	vfio: Check guest IOVA ranges against host IOMMU capabilities The current vfio core code assumes that the host IOMMU is capable of mapping any IOVA the guest wants to use to where we need. However, real IOMMUs generally only support translating a certain range of IOVAs (the "DMA window") not a full 64-bit address space. The common x86 IOMMUs support a wide enough range that guests are very unlikely to go beyond it in practice, however the IOMMU used on IBM Power machines - in the default configuration - supports only a much more limited IOVA range, usually 0..2GiB. If the guest attempts to set up an IOVA range that the host IOMMU can't map, qemu won't report an error until it actually attempts to map a bad IOVA. If guest RAM is being mapped directly into the IOMMU (i.e. no guest visible IOMMU) then this will show up very quickly. If there is a guest visible IOMMU, however, the problem might not show up until much later when the guest actually attempt to DMA with an IOVA the host can't handle. This patch adds a test so that we will detect earlier if the guest is attempting to use IOVA ranges that the host IOMMU won't be able to deal with. For now, we assume that "Type1" (x86) IOMMUs can support any IOVA, this is incorrect, but no worse than what we have already. We can't do better for now because the Type1 kernel interface doesn't tell us what IOVA range the IOMMU actually supports. For the Power "sPAPR TCE" IOMMU, however, we can retrieve the supported IOVA range and validate guest IOVA ranges against it, and this patch does so. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Laurent Vivier <lvivier@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-10-05 12:38:13 -06:00
David Gibson	ac6dc3894f	vfio: Generalize vfio_listener_region_add failure path If a DMA mapping operation fails in vfio_listener_region_add() it checks to see if we've already completed initial setup of the container. If so it reports an error so the setup code can fail gracefully, otherwise throws a hw_error(). There are other potential failure cases in vfio_listener_region_add() which could benefit from the same logic, so move it to its own fail: block. Later patches can use this to extend other failure cases to fail as gracefully as possible under the circumstances. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Laurent Vivier <lvivier@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-10-05 12:37:02 -06:00
David Gibson	ee0bf0e59b	vfio: Remove unneeded union from VFIOContainer Currently the VFIOContainer iommu_data field contains a union with different information for different host iommu types. However: * It only actually contains information for the x86-like "Type1" iommu * Because we have a common listener the Type1 fields are actually used on all IOMMU types, including the SPAPR TCE type as well In fact we now have a general structure for the listener which is unlikely to ever need per-iommu-type information, so this patch removes the union. In a similar way we can unify the setup of the vfio memory listener in vfio_connect_container() that is currently split across a switch on iommu type, but is effectively the same in both cases. The iommu_data.release pointer was only needed as a cleanup function which would handle potentially different data in the union. With the union gone, it too can be removed. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Laurent Vivier <lvivier@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-10-05 12:36:08 -06:00
Eric Auger	a5b39cd3f6	hw/vfio/platform: do not set resamplefd for edge-sensitive IRQS In irqfd mode, current code attempts to set a resamplefd whatever the type of the IRQ. For an edge-sensitive IRQ this attempt fails and as a consequence, the whole irqfd setup fails and we fall back to the slow mode. This patch bypasses the resamplefd setting for non level-sentive IRQs. Signed-off-by: Eric Auger <eric.auger@linaro.org> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-10-05 12:30:12 -06:00
Eric Auger	a22313deca	hw/vfio/platform: change interrupt/unmask fields into pointer unmask EventNotifier might not be initialized in case of edge sensitive irq. Using EventNotifier pointers make life simpler to handle the edge-sensitive irqfd setup. Signed-off-by: Eric Auger <eric.auger@linaro.org> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-10-05 12:30:12 -06:00
Eric Auger	58892b447f	hw/vfio/platform: irqfd setup sequence update With current implementation, eventfd VFIO signaling is first set up and then irqfd is setup, if supported and allowed. This start sequence causes several issues with IRQ forwarding setup which, if supported, is transparently attempted on irqfd setup: IRQ forwarding setup is likely to fail if the IRQ is detected as under injection into the guest (active at irqchip level or VFIO masked). This currently always happens because the current sequence explicitly VFIO-masks the IRQ before setting irqfd. Even if that masking were removed, we couldn't prevent the case where the IRQ is under injection into the guest. So the simpler solution is to remove this 2-step startup and directly attempt irqfd setup. This is what this patch does. Also in case the eventfd setup fails, there is no reason to go farther: let's abort. Signed-off-by: Eric Auger <eric.auger@linaro.org> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-10-05 12:30:12 -06:00
Alex Williamson	9d146b2e2f	vfio/pci: Remove use of g_malloc0_n() from quirks For compatibility with glib 2.22. Reported-by: Wen Congyang <wency@cn.fujitsu.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-09-23 21:27:17 -06:00
Alex Williamson	89dcccc593	vfio/pci: Add emulated PCI IDs Specifying an emulated PCI vendor/device ID can be useful for testing various quirk paths, even though the behavior and functionality of the device with bogus IDs is fully unsupportable. We need to use a uint32_t for the vendor/device IDs, even though the registers themselves are only 16-bit in order to be able to determine whether the value is valid and user set. The same support is added for subsystem vendor/device ID, though these have the possibility of being useful and supported for more than a testing tool. An emulated platform might want to impose their own subsystem IDs or at least hide the physical subsystem ID. Windows guests will often reinstall drivers due to a change in subsystem IDs, something that VM users may want to avoid. Of course careful attention would be required to ensure that guest drivers do not rely on the subsystem ID as a basis for device driver quirks. All of these options are added using the standard experimental option prefix and should not be considered stable. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-09-23 13:04:49 -06:00
Alex Williamson	ff635e3775	vfio/pci: Cache vendor and device ID Simplify access to commonly referenced PCI vendor and device ID by caching it on the VFIOPCIDevice struct. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-09-23 13:04:49 -06:00
Alex Williamson	c9c5000991	vfio/pci: Move AMD device specific reset to quirks This is just another quirk, for reset rather than affecting memory regions. Move it to our new quirks file. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-09-23 13:04:49 -06:00
Alex Williamson	958d553405	vfio/pci: Remove old config window and mirror quirks These are now unused. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-09-23 13:04:48 -06:00
Alex Williamson	0d38fb1c5f	vfio/pci: Config mirror quirk Re-implement our mirror quirk using the new infrastructure. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-09-23 13:04:48 -06:00
Alex Williamson	0e54f24a5b	vfio/pci: Config window quirks Config windows make use of an address register and a data register. In VGA cards, these are often used to provide real mode code in the BIOS an easy way to access MMIO registers since the window often resides in an I/O port register. When the MMIO register has a mirror of PCI config space, we need to trap those accesses and redirect them to emulated config space. The previous version of this functionality made use of a single MemoryRegion and single match address. This version uses separate MemoryRegions for each of the address and data registers and allows for multiple match addresses. This is useful for Nvidia cards which have two ranges which index into PCI config space. The previous implementation is left for the follow-on patch for a more reviewable diff. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-09-23 13:04:48 -06:00
Alex Williamson	954258a5f1	vfio/pci: Rework RTL8168 quirk Another rework of this quirk, this time to update to the new quirk structure. We can handle the address and data registers with separate MemoryRegions and a quirk specific data structure, making the code much more understandable. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-09-23 13:04:47 -06:00
Alex Williamson	6029a424be	vfio/pci: Cleanup Nvidia 0x3d0 quirk The Nvidia 0x3d0 quirk makes use of a two separate registers and gives us our first chance to make use of separate memory regions for each to simplify the code a bit. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-09-23 13:04:47 -06:00
Alex Williamson	b946d28611	vfio/pci: Cleanup ATI 0x3c3 quirk This is an easy quirk that really doesn't need a data structure if its own. We can pass vdev as the opaque data and access to the MemoryRegion isn't required. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-09-23 13:04:47 -06:00
Alex Williamson	8c4f234853	vfio/pci: Foundation for new quirk structure VFIOQuirk hosts a single memory region and a fixed set of data fields that try to handle all the quirk cases, but end up making those that don't exactly match really confusing. This patch introduces a struct intended to provide more flexibility and simpler code. VFIOQuirk is stripped to its basics, an opaque data pointer for quirk specific data and a pointer to an array of MemoryRegions with a counter. This still allows us to have common teardown routines, but adds much greater flexibility to support multiple memory regions and quirk specific data structures that are easier to maintain. The existing VFIOQuirk is transformed into VFIOLegacyQuirk, which further patches will eliminate entirely. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-09-23 13:04:46 -06:00
Alex Williamson	056dfcb695	vfio/pci: Cleanup ROM blacklist quirk Create a vendor:device ID helper that we'll also use as we rework the rest of the quirks. Re-reading the config entries, even if we get more blacklist entries, is trivial overhead and only incurred during device setup. There's no need to typedef the blacklist structure, it's a static private data type used once. The elements get bumped up to uint32_t to avoid future maintenance issues if PCI_ANY_ID gets used for a blacklist entry (avoiding an actual hardware match). Our test loop is also crying out to be simplified as a for loop. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-09-23 13:04:45 -06:00
Alex Williamson	c00d61d8fa	vfio/pci: Split quirks to a separate file Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-09-23 13:04:45 -06:00
Alex Williamson	78f33d2bfd	vfio/pci: Extract PCI structures to a separate header Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-09-23 13:04:44 -06:00
Alex Williamson	5e15d79b86	vfio: Change polarity of our no-mmap option The default should be to allow mmap and new drivers shouldn't need to expose an option or set it to other than the allocation default in their initfn. Take advantage of the experimental flag to change this option to the correct polarity. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-09-23 13:04:44 -06:00
Alex Williamson	46746dbaa8	vfio/pci: Make interrupt bypass runtime configurable Tracing is more effective when we can completely disable all KVM bypass paths. Make these runtime rather than build-time configurable. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-09-23 13:04:44 -06:00
Alex Williamson	0de70dc7ba	vfio/pci: Rename MSI/X functions for easier tracing This allows vfio_msi* tracing. The MSI/X interrupt tracing is also pulled out of #ifdef DEBUG_VFIO to avoid a recompile for tracing this path. A few cycles to read the message is hardly anything if we're already in QEMU. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-09-23 13:04:43 -06:00
Alex Williamson	870cb6f104	vfio/pci: Rename INTx functions for easier tracing Rename functions and tracing callbacks so that we can trace vfio_intx* to see all the INTx related activities. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-09-23 13:04:43 -06:00
Alex Williamson	b5bd049fa9	vfio/pci: Cleanup vfio_early_setup_msix() error path With the addition of the Chelsio quirk we have an error path out of vfio_early_setup_msix() that doesn't free the allocated VFIOMSIXInfo struct. This doesn't introduce a leak as it still gets freed in the vfio_put_device() path, but it's complicated and sloppy to rely on that. Restructure to free the allocated data on error and only link it into the vdev on success. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Reported-by: Laszlo Ersek <lersek@redhat.com> Reviewed-by: Laszlo Ersek <lersek@redhat.com>	2015-09-23 13:04:43 -06:00
Alex Williamson	d451008e0f	vfio/pci: Cleanup RTL8168 quirk and tracing There's quite a bit of cleanup that can be done to the RTL8168 quirk, as well as the tracing to prevent a spew of uninteresting accesses for anything else the driver might choose to use the window registers for besides the MSI-X table. There should be no functional change, but it's now possible to get compact and useful traces by enabling vfio_rtl8168_quirk*, ex: vfio_rtl8168_quirk_write 0000:04:00.0 [address]: 0x1f000 vfio_rtl8168_quirk_read 0000:04:00.0 [address]: 0x8001f000 vfio_rtl8168_quirk_read 0000:04:00.0 [data]: 0xfee0100c vfio_rtl8168_quirk_write 0000:04:00.0 [address]: 0x1f004 vfio_rtl8168_quirk_read 0000:04:00.0 [address]: 0x8001f004 vfio_rtl8168_quirk_read 0000:04:00.0 [data]: 0x0 vfio_rtl8168_quirk_write 0000:04:00.0 [address]: 0x1f008 vfio_rtl8168_quirk_read 0000:04:00.0 [address]: 0x8001f008 vfio_rtl8168_quirk_read 0000:04:00.0 [data]: 0x49b1 vfio_rtl8168_quirk_write 0000:04:00.0 [address]: 0x1f00c vfio_rtl8168_quirk_read 0000:04:00.0 [address]: 0x8001f00c vfio_rtl8168_quirk_read 0000:04:00.0 [data]: 0x0 Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-09-23 13:04:42 -06:00
Veres Lajos	67cc32ebfd	typofixes - v4 Signed-off-by: Veres Lajos <vlajos@gmail.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2015-09-11 10:45:43 +03:00
John Snow	594fd21102	trivial: remove trailing newline from error_report Minor cleanup. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Gonglei <arei.gonglei@huawei.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2015-09-11 10:21:38 +03:00
Daniel P. Berrange	d7646f241c	maint: remove unused include for dirent.h A number of files were including dirent.h but not using any of the functions it provides Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2015-09-11 10:21:38 +03:00
Daniel P. Berrange	b6af097528	maint: remove / fix many doubled words Many source files have doubled words (eg "the the", "to to", and so on). Most of these can simply be removed, but a couple were actual mis-spellings (eg "to to" instead of "to do"). There was even one triple word score "to to to" :-) Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2015-09-11 10:21:38 +03:00
Alex Williamson	759b484c5d	vfio/pci: Fix bootindex bootindex was incorrectly changed to a device Property during the platform code split, resulting in it no longer working. Remove it. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Cc: qemu-stable@nongnu.org # v2.3+	2015-07-22 14:56:01 -06:00
Alex Williamson	69970fcef9	vfio/pci: Fix RTL8168 NIC quirks The RTL8168 quirk correctly describes using bit 31 as a signal to mark a latch/completion, but the code mistakenly uses bit 28. This causes the Realtek driver to spin on this register for quite a while, 20k cycles on Windows 7 v7.092 driver. Then it gets frustrated and tries to set the bit itself and spins for another 20k cycles. For some this still results in a working driver, for others not. About the only thing the code really does in its current form is protect the guest from sneaking in writes to the real hardware MSI-X table. The fix is obviously to use bit 31 as we document that we should. The other problem doesn't seem to affect current drivers as nobody seems to use these window registers for writes to the MSI-X table, but we need to use the stored data when a write is triggered, not the value of the current write, which only provides the offset. Note that only the Windows drivers from Realtek seem to use these registers, the Microsoft drivers provided with Windows 8.1 do not access them, nor do Linux in-kernel drivers. Link: https://bugs.launchpad.net/qemu/+bug/1384892 Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Cc: qemu-stable@nongnu.org # v2.1+	2015-07-22 14:56:01 -06:00
Gabriel Laupre	4330296996	vfio/pci : Add pba_offset PCI quirk for Chelsio T5 devices Fix pba_offset initialization value for Chelsio T5 Virtual Function device. The T5 hardware has a bug in it where it reports a Pending Interrupt Bit Array Offset of 0x8000 for its SR-IOV Virtual Functions instead of the 0x1000 that the hardware actually uses internally. As the hardware doesn't return the correct pba_offset value, add a quirk to instead return a hardcoded value of 0x1000 when a Chelsio T5 VF device is detected. This bug has been fixed in the Chelsio's next chip series T6 but there are no plans to respin the T5 ASIC for this bug. It is just documented in the T5 Errata and left it at that. Signed-off-by: Gabriel Laupre <glaupre@chelsio.com> Reviewed-by: Bandan Das <bsd@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-07-06 12:15:15 -06:00
Alexey Kardashevskiy	f8d8a94400	vfio: Unregister IOMMU notifiers when container is destroyed On systems with guest visible IOMMU, adding a new memory region onto PCI bus calls vfio_listener_region_add() for every DMA window. This installs a notifier for IOMMU memory regions. The notifier is supposed to be removed vfio_listener_region_del(), however in the case of mixed PHB (emulated + VFIO devices) when last VFIO device is unplugged and container gets destroyed, all existing DMA windows stay alive altogether with the notifiers which are on the linked list which head was in the destroyed container. This unregisters IOMMU memory region notifier when a container is destroyed. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-07-06 12:15:15 -06:00
Eric Auger	fb5f816499	hw/vfio/platform: add irqfd support This patch aims at optimizing IRQ handling using irqfd framework. Instead of handling the eventfds on user-side they are handled on kernel side using - the KVM irqfd framework, - the VFIO driver virqfd framework. the virtual IRQ completion is trapped at interrupt controller This removes the need for fast/slow path swap. Overall this brings significant performance improvements. Signed-off-by: Alvise Rigo <a.rigo@virtualopensystems.com> Signed-off-by: Eric Auger <eric.auger@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Tested-by: Vikram Sethi <vikrams@codeaurora.org> Acked-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-07-06 12:15:14 -06:00
Eric Auger	1c9b71a731	kvm: rename kvm_irqchip_[add,remove]_irqfd_notifier with gsi suffix Anticipating for the introduction of new add/remove functions taking a qemu_irq parameter, let's rename existing ones with a gsi suffix. Signed-off-by: Eric Auger <eric.auger@linaro.org> Tested-by: Vikram Sethi <vikrams@codeaurora.org> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-07-06 12:15:13 -06:00
Peter Crosthwaite	f7ceed190d	vfio: cpu: Use "real" page size API This is system level code, and should only depend on the host page size, not the target page size. Note that HOST_PAGE_SIZE is misleadingly lead and is really aligning to both host and target page size. Hence it's replacement with REAL_HOST_PAGE_SIZE. Signed-off-by: Peter Crosthwaite <crosthwaite.peter@gmail.com> Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-07-06 12:15:12 -06:00
Paolo Bonzini	7d489dcdf5	vfio: fix return type of pread size_t is an unsigned type, thus the error case is never reached in the below call to pread. If bytes is negative, it will be seen as a very high positive value. Spotted by Coverity. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-07-06 12:15:12 -06:00
Leon Alrae	e207527751	vfio: fix build error on CentOS 5.7 Include linux/vfio.h after sys/ioctl.h, just like in hw/vfio/common.c. Signed-off-by: Leon Alrae <leon.alrae@imgtec.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> Message-id: 1434544500-22405-1-git-send-email-leon.alrae@imgtec.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2015-06-18 10:35:59 +01:00
Eric Auger	0b70743d4f	hw/vfio/platform: replace g_malloc0_n by g_new0 g_malloc0_n() is introduced since glib-2.24 while QEMU currently requires glib-2.22. This may cause a link error on some distributions. Signed-off-by: Eric Auger <eric.auger@linaro.org> Reviewed-by: Gonglei <arei.gonglei@huawei.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2015-06-11 14:22:57 +01:00
Eric Auger	7a8d15d770	hw/vfio/platform: calxeda xgmac device The platform device class has become abstract. This patch introduces a calxeda xgmac device that derives from it. Signed-off-by: Eric Auger <eric.auger@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-06-09 08:17:17 -06:00
Eric Auger	38559979bf	hw/vfio/platform: add irq assignment This patch adds the code requested to assign interrupts to a guest. The interrupts are mediated through user handled eventfds only. Signed-off-by: Eric Auger <eric.auger@linaro.org> Tested-by: Vikram Sethi <vikrams@codeaurora.org> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-06-08 09:25:26 -06:00
Eric Auger	0ea2730bef	hw/vfio/platform: vfio-platform skeleton Minimal VFIO platform implementation supporting register space user mapping but not IRQ assignment. Signed-off-by: Kim Phillips <kim.phillips@linaro.org> Signed-off-by: Eric Auger <eric.auger@linaro.org> Tested-by: Vikram Sethi <vikrams@codeaurora.org> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-06-08 09:25:25 -06:00
Paolo Bonzini	41063e1e7a	exec: move rcu_read_lock/unlock to address_space_translate callers Once address_space_translate will be called outside the BQL, the returned MemoryRegion might disappear as soon as the RCU read-side critical section ends. Avoid this by moving the critical section to the callers. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <1426684909-95030-3-git-send-email-pbonzini@redhat.com>	2015-04-30 16:55:32 +02:00
Alex Williamson	5655f931ab	vfio-pci: Reset workaround for AMD Bonaire and Hawaii GPUs Somehow these GPUs manage not to respond to a PCI bus reset, removing our primary mechanism for resetting graphics cards. The result is that these devices typically work well for a single VM boot. If the VM is rebooted or restarted, the guest driver is not able to init the card from the dirty state, resulting in a blue screen for Windows guests. The workaround is to use a device specific reset. This is not 100% reliable though since it depends on the incoming state of the device, but it substantially improves the usability of these devices in a VM. Credit to Alex Deucher <alexander.deucher@amd.com> for his guidance. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-04-28 11:14:02 -06:00
Alex Williamson	c6d231e2fd	vfio-pci: Fix error path sign This is an impossible error path due to the fact that we're reading a kernel provided, rather than user provided link, which will certainly always fit in PATH_MAX. Currently it returns a fixed 26 char path plus %d group number, which typically maxes out at double digits. However, the caller of the initfn certainly expects a less-than zero return value on error, not just a non-zero value. Therefore we should correct the sign here. Reported-by: Laszlo Ersek <lersek@redhat.com> Reviewed-by: Laszlo Ersek <lersek@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-04-28 11:14:02 -06:00
Alex Williamson	07ceaf9880	vfio-pci: Further fix BAR size overflow In an analysis by Laszlo, the resulting type of our calculation for the end of the MSI-X table, and thus the start of memory after the table, is uint32_t. We're therefore not correctly preventing the corner case overflow that we intended to fix here where a BAR >=4G could place the MSI-X table to end exactly at the 4G boundary. The MSI-X table offset is defined by the hardware spec to 32bits, so we simply use a cast rather than changing data structure types. This scenario is purely theoretically, typically the MSI-X table is located at the front of the BAR. Reported-by: Laszlo Ersek <lersek@redhat.com> Reviewed-by: Laszlo Ersek <lersek@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-04-28 11:14:02 -06:00
Peter Maydell	3b64349539	memory: Replace io_mem_read/write with memory_region_dispatch_read/write Rather than retaining io_mem_read/write as simple wrappers around the memory_region_dispatch_read/write functions, make the latter public and change all the callers to use them, since we need to touch all the callsites anyway to add MemTxAttrs and MemTxResult support. Delete io_mem_read and io_mem_write entirely. (All the callers currently pass MEMTXATTRS_UNSPECIFIED and convert the return value back to bool or ignore it.) Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org>	2015-04-26 16:49:23 +01:00
Gonglei	78e5b17f04	vfio: Remove superfluous '\n' around error_report() Signed-off-by: Gonglei <arei.gonglei@huawei.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2015-03-10 08:15:33 +03:00
Gavin Shan	2aad88f4b0	sPAPR: Implement sPAPRPHBClass EEH callbacks The patch implements sPAPRPHBClass EEH callbacks so that the EEH RTAS requests can be routed to VFIO for further handling. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Alexander Graf <agraf@suse.de>	2015-03-09 15:00:08 +01:00
Alex Williamson	47cbe50cc8	vfio-pci: Enable device request notification support Linux v4.0-rc1 vfio-pci introduced a new virtual interrupt to allow the kernel to request a device from the user. When signaled, QEMU will by default attmempt to hot-unplug the device. This is a one- shot attempt with the expectation that the kernel will continue to poll for the device if it is not returned. Returning the device when requested is the expected standard model of cooperative usage, but we also add an option option to disable this feature. Initially this opt-out is set as an experimental option because we really should honor kernel requests for the device. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-03-02 11:38:55 -07:00
Samuel Pitoiset	6ee47c9008	vfio: allow to disable MMAP per device with -x-mmap=off option Disabling MMAP support uses the slower read/write accesses but allows to trace all MMIO accesses, which is not good for performance, but very useful for reverse engineering PCI drivers. This option allows to disable MMAP per device without a compile-time change. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-03-02 11:38:55 -07:00
Alexey Kardashevskiy	51b833f440	vfio: Make type1 listener symbols static They are not used from anywhere but common.c which is where these are defined so make them static. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-03-02 11:38:55 -07:00
Alexey Kardashevskiy	46f770d4a5	vfio: Add ioctl number to error report This makes the error report more informative. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-03-02 11:38:54 -07:00
Alexey Kardashevskiy	bc5baffa35	vfio: Fix debug message compile error This fixes a compiler error which occurs if DEBUG_VFIO is defined. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-02-10 10:25:44 -07:00
Alex Williamson	2e6e697e16	vfio: Use vfio type1 v2 IOMMU interface The difference between v1 and v2 is fairly subtle, simply more deterministic behavior for unmaps. The v1 interface allows the user to attempt to unmap sub-regions of previous mappings, returning success with zero size if unable to comply. This was a reflection of the underlying IOMMU API. The v2 interface requires that the user may only unmap fully contained mappings, ie. an unmap cannot intersect or bisect a previous mapping, but may cover multiple mappings. QEMU never made use of the sub-region v1 support anyway, so we can support either v1 or v2. We'll favor v2 since it's newer. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-02-10 10:25:44 -07:00
Paolo Bonzini	ba5e6bfa1a	vfio: unmap and free BAR data in instance_finalize In the case of VFIO, the unrealize callback is too early to munmap the BARs. The munmap must be delayed until memory accesses are complete. To do this, split vfio_unmap_bars in two. The removal step, now called vfio_unregister_bars, remains in vfio_exitfn. The reclamation step is vfio_unmap_bars and is moved to the instance_finalize callback. Similarly, quirk MemoryRegions have to be removed during vfio_unregister_bars, but freeing the data structure must be delayed to vfio_unmap_bars. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-02-10 10:25:44 -07:00
Paolo Bonzini	77a10d04d0	vfio: free dynamically-allocated data in instance_finalize In order to enable out-of-BQL address space lookup, destruction of devices needs to be split in two phases. Unrealize is the first phase; once it complete no new accesses will be started, but there may still be pending memory accesses can still be completed. The second part is freeing the device, which only happens once all memory accesses are complete. At this point the reference count has dropped to zero, an RCU grace period must have completed (because the RCU-protected FlatViews hold a reference to the device via memory_region_ref). This is when instance_finalize is called. Freeing data belongs in an instance_finalize callback, because the dynamically allocated memory can still be used after unrealize by the pending memory accesses. This starts the process by creating an instance_finalize callback and freeing most of the dynamically-allocated data in instance_finalize. Because instance_finalize is also called on error paths or also when the device is actually not realized, the common code needs some changes to be ready for this. The error path in vfio_initfn can be simplified too. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-02-10 10:25:44 -07:00
Paolo Bonzini	217e9fdcad	vfio: cleanup vfio_get_device error path, remove vfio_populate_device callback Now that vfio_put_base_device is called unconditionally at instance_finalize time, it can be called twice if vfio_populate_device fails. This works but it is slightly harder to follow. Change vfio_get_device to not touch the vbasedev struct until it will definitely succeed, moving the vfio_populate_device call back to vfio-pci. This way, vfio_put_base_device will only be called once. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-02-10 10:25:44 -07:00
Alex Williamson	3a4dbe6aa9	vfio-pci: Fix missing unparent of dynamically allocated MemoryRegion Commit `d8d9581460` added explicit object_unparent() calls for dynamically allocated MemoryRegions. The VFIOMSIXInfo structure also contains such a MemoryRegion, covering the mmap'd region of a PCI BAR above the MSI-X table. This structure is freed as part of the class exit function and therefore also needs an explicit object_unparent(). Failing to do this results in random segfaults due to fields within the structure, often the class pointer, being reclaimed and corrupted by the time object_finalize_child_property() is called for the object. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Cc: qemu-stable@nongnu.org # 2.2	2015-02-04 11:45:32 -07:00
Chen Fan	39cb514f02	vfio: fix wrong initialize vfio_group_list Signed-off-by: Chen Fan <chen.fan.fnst@cn.fujitsu.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-02-04 11:45:32 -07:00
Alex Williamson	b3e27c3aee	vfio-pci: Fix interrupt disabling When disabling MSI/X interrupts the disable functions will leave the device in INTx mode (when available). This matches how hardware operates, INTx is enabled unless MSI/X is enabled (DisINTx is handled separately). Therefore when we really want to disable all interrupts, such as when removing the device, and we start with the device in MSI/X mode, we need to pass through INTx on our way to being completely quiesced. In well behaved situations, the guest driver will have shutdown the device and it will start vfio_exitfn() in INTx mode, producing the desired result. If hot-unplug causes the guest to crash, we may get the device in MSI/X state, which will leave QEMU with a bogus handler installed. Fix this by re-ordering our disable routine so that it should always finish in VFIO_INT_NONE state, which is what all callers expect. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-01-09 08:50:53 -07:00
Alex Williamson	29c6e6df49	vfio-pci: Fix BAR size overflow We use an unsigned int when working with the PCI BAR size, which can obviously overflow if the BAR is 4GB or larger. This needs to change to a fixed length uint64_t. A similar issue is possible, though even more unlikely, when mapping the region above an MSI-X table. The start of the MSI-X vector table must be below 4GB, but the end, and therefore the start of the next mapping region, could still land at 4GB. Suggested-by: Nishank Trivedi <nishank.trivedi@netapp.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Don Slutz <dslutz@verizon.com> Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru>	2015-01-09 08:50:53 -07:00
Alex Williamson	dcbfc5cefb	vfio: Cleanup error_report()s With the conversion to tracepoints, a couple previous DPRINTKs are now quite a bit more visible and are really just informational. Remove these and add a bit more description to another. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2014-12-22 10:37:27 -07:00
Eric Auger	e2c7d025ad	hw/vfio: create common module A new common module is created. It implements all functions that have no device specificity (PCI, Platform). This patch only consists in move (no functional changes) Signed-off-by: Kim Phillips <kim.phillips@linaro.org> Signed-off-by: Eric Auger <eric.auger@linaro.org> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2014-12-22 09:54:51 -07:00
Eric Auger	df92ee4448	hw/vfio/pci: use name field in format strings Signed-off-by: Eric Auger <eric.auger@linaro.org> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2014-12-22 09:54:49 -07:00
Eric Auger	62356b7292	hw/vfio/pci: rename group_list into vfio_group_list better fit in the rest of the namespace Signed-off-by: Eric Auger <eric.auger@linaro.org> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2014-12-22 09:54:46 -07:00
Eric Auger	d13dd2d7a9	hw/vfio/pci: split vfio_get_device vfio_get_device now takes a VFIODevice as argument. The function is split into 2 parts: vfio_get_device which is generic and vfio_populate_device which is bus specific. 3 new fields are introduced in VFIODevice to store dev_info. vfio_put_base_device is created. Signed-off-by: Eric Auger <eric.auger@linaro.org> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2014-12-22 09:54:38 -07:00
Eric Auger	a664477db8	hw/vfio/pci: Introduce VFIORegion This structure is going to be shared by VFIOPCIDevice and VFIOPlatformDevice. VFIOBAR includes it. vfio_eoi becomes an ops of VFIODevice specialized by parent device. This makes possible to transform vfio_bar_write/read into generic vfio_region_write/read that will be used by VFIOPlatformDevice too. vfio_mmap_bar becomes vfio_map_region Signed-off-by: Eric Auger <eric.auger@linaro.org> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2014-12-22 09:54:37 -07:00
Eric Auger	b47d8efa9f	hw/vfio/pci: handle reset at VFIODevice Since we can potentially have both PCI and platform devices in the same VFIO group, this latter now owns a list of VFIODevices. A unified reset handler, vfio_reset_handler, is registered, looping through this VFIODevice list. 2 specialized operations are introduced (vfio_compute_needs_reset and vfio_hot_reset_multi): they allow to implement type specific behavior. also reset_works and needs_reset VFIOPCIDevice fields are moved into VFIODevice. Signed-off-by: Eric Auger <eric.auger@linaro.org> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2014-12-22 09:54:35 -07:00
Eric Auger	462037c9e8	hw/vfio/pci: add type, name and group fields in VFIODevice Add 3 new fields in the VFIODevice struct. Type is set to VFIO_DEVICE_TYPE_PCI. The type enum value will later be used to discriminate between VFIO PCI and platform devices. The name is set to domain🚌slot:function. Currently used to test whether the device already is attached to the group. Later on, the name will be used to simplify all traces. The group is simply moved from VFIOPCIDevice to VFIODevice. Signed-off-by: Eric Auger <eric.auger@linaro.org> [Fix g_strdup_printf() usage] Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2014-12-22 09:54:31 -07:00
Eric Auger	5546a621a8	hw/vfio/pci: introduce minimalist VFIODevice with fd Introduce a new base VFIODevice strcut that will be used by both PCI and Platform VFIO device. Move VFIOPCIDevice fd field there. Obviously other fields from VFIOPCIDevice will be moved there but this patch file is introduced to ease the review. Also vfio_mask_single_irqindex, vfio_unmask_single_irqindex, vfio_disable_irqindex now take a VFIODevice handle as argument. Signed-off-by: Eric Auger <eric.auger@linaro.org> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2014-12-19 15:24:31 -07:00
Eric Auger	079eb19cbb	hw/vfio/pci: generalize mask/unmask to any IRQ index To prepare for platform device introduction, rename vfio_mask_intx and vfio_unmask_intx into vfio_mask_single_irqindex and respectively unmask_single_irqindex. Also use a nex index parameter. With that name and prototype the function will be usable for other indexes than VFIO_PCI_INTX_IRQ_INDEX. Signed-off-by: Eric Auger <eric.auger@linaro.org> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2014-12-19 15:24:24 -07:00
Eric Auger	9ee27d7381	hw/vfio/pci: Rename VFIODevice into VFIOPCIDevice This prepares for the introduction of VFIOPlatformDevice Signed-off-by: Eric Auger <eric.auger@linaro.org> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2014-12-19 15:24:15 -07:00
Kim Phillips	cf7087db10	vfio: move hw/misc/vfio.c to hw/vfio/pci.c Move vfio.h into include/hw/vfio This is done in preparation for the addition of VFIO platform device support. Signed-off-by: Kim Phillips <kim.phillips@linaro.org> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2014-12-19 15:24:06 -07:00

1 2 3 4 5

249 Commits