mirrors/qemu - qemu - SynapseOS git

Author	SHA1	Message	Date
Cao jin	ee640c625e	pci: Convert msix_init() to Error and fix callers msix_init() reports errors with error_report(), which is wrong when it's used in realize(). The same issue was fixed for msi_init() in commit `1108b2f`. In order to make the API change as small as possible, leave the return value check to a later patch. For some devices (like e1000e, vmxnet3, nvme) that won't fail because of msix_init's failure, suppress the error report by passing a NULL error object. Bonus: add comment for msix_init. CC: Jiri Pirko <jiri@resnulli.us> CC: Gerd Hoffmann <kraxel@redhat.com> CC: Dmitry Fleytman <dmitry@daynix.com> CC: Jason Wang <jasowang@redhat.com> CC: Michael S. Tsirkin <mst@redhat.com> CC: Hannes Reinecke <hare@suse.de> CC: Paolo Bonzini <pbonzini@redhat.com> CC: Alex Williamson <alex.williamson@redhat.com> CC: Markus Armbruster <armbru@redhat.com> CC: Marcel Apfelbaum <marcel@redhat.com> Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2017-02-01 03:37:18 +02:00
Cao jin	8907379204	vfio: remove a duplicated word in comments Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2017-01-24 23:26:53 +03:00
Yongji Xie	95251725e3	vfio: Add support for mmapping sub-page MMIO BARs Now the kernel commit 05f0c03fbac1 ("vfio-pci: Allow to mmap sub-page MMIO BARs if the mmio page is exclusive") allows VFIO to mmap sub-page BARs. This is the corresponding QEMU patch. With those patches applied, we could passthrough sub-page BARs to guest, which can help to improve IO performance for some devices. In this patch, we expand MemoryRegions of these sub-page MMIO BARs to PAGE_SIZE in vfio_pci_write_config(), so that the BARs could be passed to KVM ioctl KVM_SET_USER_MEMORY_REGION with a valid size. The expanding size will be recovered when the base address of sub-page BAR is changed and not page aligned any more in guest. And we also set the priority of these BARs' memory regions to zero in case of overlap with BARs which share the same page with sub-page BARs in guest. Signed-off-by: Yongji Xie <xyjxie@linux.vnet.ibm.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-10-31 09:53:04 -06:00
Ido Yariv	a52a4c4717	vfio/pci: fix out-of-sync BAR information on reset When a PCI device is reset, pci_do_device_reset resets all BAR addresses in the relevant PCIDevice's config buffer. The VFIO configuration space stays untouched, so the guest OS may choose to skip restoring the BAR addresses as they would seem intact. The PCI device may be left non-operational. One example of such a scenario is when the guest exits S3. Fix this by resetting the BAR addresses in the VFIO configuration space as well. Signed-off-by: Ido Yariv <ido@wizery.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-10-31 09:53:04 -06:00
Cao jin	893bfc3cc8	vfio: fix duplicate function call When a vfio device is reset (on FLR or bus reset) and a bus reset is needed (vfio_pci_hot_reset_one is called), vfio_pci_pre_reset and vfio_pci_post_reset are called twice. Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-10-17 10:58:03 -06:00
Eric Auger	4a94626850	vfio/pci: Handle host oversight In case the end-user calls qemu with -vfio-pci option without passing either sysfsdev or host property value, the device is interpreted as 0000:00:00.0. Let's create a specific error message to guide the end-user. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-10-17 10:58:02 -06:00
Eric Auger	e04cff9d97	vfio/pci: Remove vfio_populate_device returned value The returned value (either -errno or -1) is not used anymore by the caller, vfio_realize, since the error now is stored in the error object. So let's remove it. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-10-17 10:58:02 -06:00
Eric Auger	ec3bcf424e	vfio/pci: Remove vfio_msix_early_setup returned value The returned value is not used anymore by the caller, vfio_realize, since the error now is stored in the error object. So let's remove it. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-10-17 10:58:01 -06:00
Eric Auger	1a22aca1d0	vfio/pci: Conversion to realize This patch converts VFIO PCI to realize function. Also original initfn errors now are propagated using QEMU error objects. All errors are formatted with the same pattern: "vfio: %s: the error description" Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-10-17 10:58:01 -06:00
Eric Auger	59f7d6743c	vfio: Pass an error object to vfio_get_device Pass an error object to prepare for migration to VFIO-PCI realize. In vfio platform vfio_base_device_init we currently just report the error. Subsequent patches will propagate the error up to the realize function. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-10-17 10:58:00 -06:00
Eric Auger	1b808d5be0	vfio: Pass an error object to vfio_get_group Pass an error object to prepare for migration to VFIO-PCI realize. For the time being let's just simply report the error in vfio platform's vfio_base_device_init(). A subsequent patch will duly propagate the error up to vfio_platform_realize. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-10-17 10:57:59 -06:00
Eric Auger	7237011d05	vfio/pci: Pass an error object to vfio_pci_igd_opregion_init Pass an error object to prepare for migration to VFIO-PCI realize. In vfio_probe_igd_bar4_quirk, simply report the error. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-10-17 10:57:59 -06:00
Eric Auger	7ef165b9a8	vfio/pci: Pass an error object to vfio_add_capabilities Pass an error object to prepare for migration to VFIO-PCI realize. The error is cascaded down to vfio_add_std_cap and then vfio_msi(x)_setup, vfio_setup_pcie_cap. vfio_add_ext_cap does not return anything other than 0 so let's transform it into a void function. Also use pci_add_capability2 which takes an error object. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-10-17 10:57:58 -06:00
Eric Auger	7dfb34247e	vfio/pci: Pass an error object to vfio_intx_enable Pass an error object to prepare for migration to VFIO-PCI realize. The error object is propagated down to vfio_intx_enable_kvm(). The three other callers, vfio_intx_enable_kvm(), vfio_msi_disable_common() and vfio_pci_post_reset() do not propagate the error and simply call error_reportf_err() with the ERR_PREFIX formatting. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-10-17 10:57:58 -06:00
Eric Auger	008d0e2d7b	vfio/pci: Pass an error object to vfio_msix_early_setup Pass an error object to prepare for migration to VFIO-PCI realize. The returned value will be removed later on. We now format an error in case of reading failure for - the MSIX flags - the MSIX table, - the MSIX PBA. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-10-17 10:57:58 -06:00
Eric Auger	2312d907dd	vfio/pci: Pass an error object to vfio_populate_device Pass an error object to prepare for migration to VFIO-PCI realize. The returned value will be removed later on. The case where error recovery cannot be enabled is not converted into an error object but directly reported through error_report, as before. Populating an error instead would cause the future realize function to fail, which is not wanted. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-10-17 10:57:57 -06:00
Eric Auger	cde4279baa	vfio/pci: Pass an error object to vfio_populate_vga Pass an error object to prepare for the same operation in vfio_populate_device. Eventually this contributes to the migration to VFIO-PCI realize. We now report an error on vfio_get_region_info failure. vfio_probe_igd_bar4_quirk is not involved in the migration to realize and simply calls error_reportf_err. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-10-17 10:57:57 -06:00
Eric Auger	426ec9049e	vfio/pci: Use local error object in vfio_initfn To prepare for migration to realize, let's use a local error object in vfio_initfn. Also let's use the same error prefix for all error messages. On top of the 1-1 conversion, we start using a common error prefix for all error messages. We also introduce a similar warning prefix which will be used later on. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-10-17 10:57:56 -06:00
David Gibson	6d17a018d0	vfio/pci: Fix regression in MSI routing configuration `d1f6af6` "kvm-irqchip: simplify kvm_irqchip_add_msi_route" was a cleanup of kvmchip routing configuration, that was mostly intended for x86. However, it also contains a subtle change in behaviour which breaks EEH[1] error recovery on certain VFIO passthrough devices on spapr guests. So far it's only been seen on a BCM5719 NIC on a POWER8 server, but there may be other hardware with the same problem. It's also possible there could be circumstances where it causes a bug on x86 as well, though I don't know of any obvious candidates. Prior to `d1f6af6`, both vfio_msix_vector_do_use() and vfio_add_kvm_msi_virq() used msg == NULL as a special flag to mark this as the "dummy" vector used to make the host hardware state sync with the guest expected hardware state in terms of MSI configuration. Specifically that flag caused vfio_add_kvm_msi_virq() to become a no-op, meaning the dummy irq would always be delivered via qemu. `d1f6af6` changed vfio_add_kvm_msi_virq() so it takes a vector number instead of the msg parameter, and determines the correct message itself. The test for !msg was removed, and not replaced with anything there or in the caller. With an spapr guest which has a VFIO device, if an EEH error occurs on the host hardware, then the device will be isolated then reset. This is a combination of host and guest action, mediated by some EEH related hypercalls. I haven't fully traced the mechanics, but somehow installing the kvm irqchip route for the dummy irq on the BCM5719 means that after EEH reset and recovery, at least some irqs are no longer delivered to the guest. In particular, the guest never gets the link up event, and so the NIC is effectively dead. [1] EEH (Enhanced Error Handling) is an IBM POWER server specific PCI-* error reporting and recovery mechanism. The concept is somewhat similar to PCI-E AER, but the details are different. 
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1373802 Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Peter Xu <peterx@redhat.com> Cc: Gavin Shan <gwshan@au1.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Cc: qemu-stable@nongnu.org Fixes: `d1f6af6a17` ("kvm-irqchip: simplify kvm_irqchip_add_msi_route") Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-09-15 10:41:36 -06:00
Peter Xu	3f1fea0fb5	kvm-irqchip: do explicit commit when update irq In the past, we did a gsi route commit for each irqchip route update. This is inefficient if we are updating lots of routes at the same time. This patch removes the committing phase in kvm_irqchip_update_msi_route(). Instead, we do an explicit commit after all routes are updated. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2016-07-21 20:44:19 +03:00
Peter Xu	d1f6af6a17	kvm-irqchip: simplify kvm_irqchip_add_msi_route Changing the original MSIMessage parameter in kvm_irqchip_add_msi_route into the vector number. Vector index provides more information than the MSIMessage, we can retrieve the MSIMessage using the vector easily. This will avoid fetching MSIMessage every time before adding MSI routes. Meanwhile, the vector info will be used in the coming patches to further enable gsi route update notifications. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2016-07-21 20:44:18 +03:00
Alex Williamson	383a7af7ec	vfio/pci: Hide ARI capability QEMU supports ARI on downstream ports and assigned devices may support ARI in their extended capabilities. The endpoint ARI capability specifies the next function, such that the OS doesn't need to walk each possible function, however this next function is relative to the host, not the guest. This leads to device discovery issues when we combine separate functions into virtual multi-function packages in a guest. For example, SR-IOV VFs are not enumerated by simply probing the function address space, therefore the ARI next-function field is zero. When we combine multiple VFs together as a multi-function device in the guest, the guest OS identifies ARI is enabled, relies on this next-function field, and stops looking for additional function after the first is found. Long term we should expose the ARI capability to the guest to enable configurations with more than 8 functions per slot, but this requires additional QEMU PCI infrastructure to manage the next-function field for multiple, otherwise independent devices. In the short term, hiding this capability allows equivalent functionality to what we currently have on non-express chipsets. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>	2016-07-18 10:55:17 -06:00
Cao jin	1108b2f8a9	pci: Convert msi_init() to Error and fix callers to check it msi_init() reports errors with error_report(), which is wrong when it's used in realize(). Fix by converting it to Error. Fix its callers to handle failure instead of ignoring it. For callers that don't handle the failure, the user may request MSI but not get it, because msi_init fails silently. cc: Gerd Hoffmann <kraxel@redhat.com> cc: John Snow <jsnow@redhat.com> cc: Dmitry Fleytman <dmitry@daynix.com> cc: Jason Wang <jasowang@redhat.com> cc: Michael S. Tsirkin <mst@redhat.com> cc: Hannes Reinecke <hare@suse.de> cc: Paolo Bonzini <pbonzini@redhat.com> cc: Alex Williamson <alex.williamson@redhat.com> cc: Markus Armbruster <armbru@redhat.com> cc: Marcel Apfelbaum <marcel@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Hannes Reinecke <hare@suse.com>	2016-07-05 13:14:41 +03:00
Alex Williamson	e37dac06dc	vfio/pci: Hide SR-IOV capability The kernel currently exposes the SR-IOV capability as read-only through vfio-pci. This is sufficient to protect the host kernel, but has the potential to confuse guests without further virtualization. In particular, OVMF tries to size the VF BARs and comes up with absurd results, ending with an assert. There's not much point in adding virtualization to a read-only capability, so we simply hide it for now. If the kernel ever enables SR-IOV virtualization, we should easily be able to test it through VF BAR sizing or explicit flags. Testing whether we should parse extended capabilities is also pulled into the function to keep these assumptions in one place. Tested-by: Laszlo Ersek <lersek@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-06-30 13:00:23 -06:00
Chen Fan	325ae8d548	vfio: add pcie extended capability support For a vfio PCIe device, we can expose the extended capabilities on a PCIe bus. Because each new PCIe capability is added at the tail of the chain, we parse the extended caps from a copy of the config space to avoid overwriting it, and then rebuild the PCIe extended config space. Signed-off-by: Chen Fan <chen.fan.fnst@cn.fujitsu.com> Tested-by: Laszlo Ersek <lersek@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-06-30 13:00:23 -06:00
Paolo Bonzini	02d0e09503	os-posix: include sys/mman.h qemu/osdep.h checks whether MAP_ANONYMOUS is defined, but this check is bogus without a previous inclusion of sys/mman.h. Include it in sysemu/os-posix.h and remove it from everywhere else. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-06-16 18:39:03 +02:00
Alex Williamson	6ced0bba70	vfio/pci: Add a separate option for IGD OpRegion support The IGD OpRegion is enabled automatically when running in legacy mode, but it can sometimes be useful in universal passthrough mode as well. Without an OpRegion, output spigots don't work, and even though Intel doesn't officially support physical outputs in UPT mode, it's a useful feature. Note that if an OpRegion is enabled but a monitor is not connected, some graphics features will be disabled in the guest versus a headless system without an OpRegion, where they would work. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Gerd Hoffmann <kraxel@redhat.com> Tested-by: Gerd Hoffmann <kraxel@redhat.com>	2016-05-26 11:12:03 -06:00
Alex Williamson	c4c45e943e	vfio/pci: Intel graphics legacy mode assignment Enable quirks to support SandyBridge and newer IGD devices as primary VM graphics. This requires new vfio-pci device specific regions added in kernel v4.6 to expose the IGD OpRegion, the shadow ROM, and config space access to the PCI host bridge and LPC/ISA bridge. VM firmware support, SeaBIOS only so far, is also required for reserving memory regions for IGD specific use. In order to enable this mode, IGD must be assigned to the VM at PCI bus address 00:02.0, it must have a ROM, it must be able to enable VGA, it must have or be able to create on its own an LPC/ISA bridge of the proper type at PCI bus address 00:1f.0 (sorry, not compatible with Q35 yet), and it must have the above noted vfio-pci kernel features and BIOS. The intention is that to enable this mode, a user simply needs to assign 00:02.0 from the host to 00:02.0 in the VM: -device vfio-pci,host=0000:00:02.0,bus=pci.0,addr=02.0 and everything either happens automatically or it doesn't. In the case that it doesn't, we leave error reports, but assume the device will operate in universal passthrough mode (UPT), which doesn't require any of this, but has a much more narrow window of supported devices, supported use cases, and supported guest drivers. When using IGD in this mode, the VM firmware is required to reserve some VM RAM for the OpRegion (on the order of several 4k pages) and stolen memory for the GTT (up to 8MB for the latest GPUs). An additional option, x-igd-gms allows the user to specify some amount of additional memory (value is number of 32MB chunks up to 512MB) that is pre-allocated for graphics use. TBH, I don't know of anything that requires this or makes use of this memory, which is why we don't allocate any by default, but the specification suggests this is not actually a valid combination, so the option exists as a workaround. Please report if it's actually necessary in some environment. 
See code comments for further discussion about the actual operation of the quirks necessary to assign these devices. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Gerd Hoffmann <kraxel@redhat.com> Tested-by: Gerd Hoffmann <kraxel@redhat.com>	2016-05-26 11:12:01 -06:00
Alex Williamson	581406e0e3	vfio/pci: Setup BAR quirks after capabilities probing Capability probing modifies wmask, which quirks may be interested in changing themselves. Apply our BAR quirks after the capability scan to make this possible. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Gerd Hoffmann <kraxel@redhat.com> Tested-by: Gerd Hoffmann <kraxel@redhat.com>	2016-05-26 11:12:00 -06:00
Alex Williamson	182bca4592	vfio/pci: Consolidate VGA setup Combine VGA discovery and registration. Quirks can have dependencies on BARs, so the quirks push out until after we've scanned the BARs. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Gerd Hoffmann <kraxel@redhat.com> Tested-by: Gerd Hoffmann <kraxel@redhat.com>	2016-05-26 11:11:58 -06:00
Alex Williamson	4225f2b670	vfio/pci: Fix return of vfio_populate_vga() This function returns success if either we setup the VGA region or the host vfio doesn't return enough regions to support the VGA index. This latter case doesn't make any sense. If we're asked to populate VGA, fail if it doesn't exist and let the caller decide if that's important. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Gerd Hoffmann <kraxel@redhat.com> Tested-by: Gerd Hoffmann <kraxel@redhat.com>	2016-05-26 11:11:56 -06:00
Neo Jia	062ed5d8d6	vfio/pci: replace fixed string limit by g_strdup_printf A trivial change to remove string limit by using g_strdup_printf Tested-by: Neo Jia <cjia@nvidia.com> Signed-off-by: Neo Jia <cjia@nvidia.com> Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-03-10 20:50:43 -07:00
Alex Williamson	e593c0211b	vfio/pci: Split out VGA setup This could be setup later by device specific code, such as IGD initialization. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-03-10 20:50:41 -07:00
Alex Williamson	e2e5ee9c56	vfio/pci: Fixup PCI option ROMs Devices like Intel graphics are known to not only have bad checksums, but also the wrong device ID. This is not so surprising given that the video BIOS is typically part of the system firmware image rather than embedded into the device and needs to support any IGD device installed into the system. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-03-10 20:50:39 -07:00
Alex Williamson	2d82f8a3cd	vfio/pci: Convert all MemoryRegion to dynamic alloc and consistent functions Match common vfio code with setup, exit, and finalize functions for BAR, quirk, and VGA management. VGA is also changed to dynamic allocation to match the other MemoryRegions. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-03-10 20:50:38 -07:00
Alex Williamson	db0da029a1	vfio: Generalize region support Both platform and PCI vfio drivers create a "slow", I/O memory region with one or more mmap memory regions overlaid when supported by the device. Generalize this to a set of common helpers in the core that pulls the region info from vfio, fills the region data, configures slow mapping, and adds helpers for completing the mmap, enable/disable, and teardown. This can be immediately used by the PCI MSI-X code, which needs to mmap around the MSI-X vector table. This also changes VFIORegion.mem to be dynamically allocated because otherwise we don't know how the caller has allocated VFIORegion and therefore don't know whether to unreference it to destroy the MemoryRegion or not. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-03-10 20:03:16 -07:00
Alex Williamson	469002263a	vfio: Wrap VFIO_DEVICE_GET_REGION_INFO In preparation for supporting capability chains on regions, wrap ioctl(VFIO_DEVICE_GET_REGION_INFO) so we don't duplicate the code for each caller. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-03-10 09:39:07 -07:00
Alex Williamson	7df9381b7a	vfio: Add sysfsdev property for pci & platform vfio-pci currently requires a host= parameter, which comes in the form of a PCI address in [domain:]<bus:slot.function> notation. We expect to find a matching entry in sysfs for that under /sys/bus/pci/devices/. vfio-platform takes a similar approach, but defines the host= parameter to be a string, which can be matched directly under /sys/bus/platform/devices/. On the PCI side, we have some interest in using vfio to expose vGPU devices. These are not actual discrete PCI devices, so they don't have a compatible host PCI bus address or a device link where QEMU wants to look for it. There's also really no requirement that vfio can only be used to expose physical devices, a new vfio bus and iommu driver could expose a completely emulated device. To fit within the vfio framework, it would need a kernel struct device and associated IOMMU group, but those are easy constraints to manage. To support such devices, which would include vGPUs, that honor the VFIO PCI programming API, but are not necessarily backed by a unique PCI address, add support for specifying any device in sysfs. The vfio API already has support for probing the device type to ensure compatibility with either vfio-pci or vfio-platform. With this, a vfio-pci device could either be specified as: -device vfio-pci,host=02:00.0 or -device vfio-pci,sysfsdev=/sys/devices/pci0000:00/0000:00:1c.0/0000:02:00.0 or even -device vfio-pci,sysfsdev=/sys/bus/pci/devices/0000:02:00.0 When vGPU support comes along, this might look something more like: -device vfio-pci,sysfsdev=/sys/devices/virtual/intel-vgpu/vgpu0@0000:00:02.0 NB - This is only a made up example path The same change is made for vfio-platform, specifying sysfsdev has precedence over the old host option. Tested-by: Eric Auger <eric.auger@linaro.org> Reviewed-by: Eric Auger <eric.auger@linaro.org> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-03-10 09:39:07 -07:00
Wei Yang	b58b17f744	vfio/pci: use PCI_MSIX_FLAGS on retrieving the MSIX entries Even though PCI_CAP_FLAGS has the same value as PCI_MSIX_FLAGS, the latter is more appropriate for retrieving MSIX entries. This patch uses PCI_MSIX_FLAGS to retrieve the MSIX entries. Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-02-19 09:42:32 -07:00
Wei Yang	3fc1c182c1	vfio/pci: replace 1 with PCI_CAP_LIST_NEXT to make code self-explain Use the macro PCI_CAP_LIST_NEXT instead of 1, so that the code is more self-explanatory. This patch makes this change and also fixes one typo in a comment. Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-02-19 09:42:29 -07:00
Chen Fan	88caf177ac	vfio: make the 4 bytes aligned for capability size This function searches the capabilities from the end; the last size should be 0x100 - pos, not 0xff - pos. Signed-off-by: Chen Fan <chen.fan.fnst@cn.fujitsu.com> Reviewed-by: Marcel Apfelbaum <marcel@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-02-19 09:42:28 -07:00
Peter Maydell	c6eacb1ac0	hw/vfio: Clean up includes Clean up includes so that osdep.h is included first and headers which it implies are not included manually. This commit was created with scripts/clean-includes. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Message-id: 1453832250-766-22-git-send-email-peter.maydell@linaro.org	2016-01-29 15:07:24 +00:00
Alex Williamson	95239e1625	vfio/pci: Lazy PBA emulation The PCI spec recommends devices use additional alignment for MSI-X data structures to allow software to map them to separate processor pages. One advantage of doing this is that we can emulate those data structures without a significant performance impact to the operation of the device. Some devices fail to implement that suggestion and assigned device performance suffers. One such case of this is a Mellanox MT27500 series, ConnectX-3 VF, where the MSI-X vector table and PBA are aligned on separate 4K pages. If PBA emulation is enabled, performance suffers. It's not clear how much value we get from PBA emulation, but the solution here is to only lazily enable the emulated PBA when a masked MSI-X vector fires. We then attempt to more aggressively disable the PBA memory region any time a vector is unmasked. The expectation is then that a typical VM will run entirely with PBA emulation disabled, and only when used is that emulation re-enabled. Reported-by: Shyam Kaushik <shyam.kaushik@gmail.com> Tested-by: Shyam Kaushik <shyam.kaushik@gmail.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-01-19 11:33:42 -07:00
Markus Armbruster	bdd81addf4	vfio: Use g_new() & friends where that makes obvious sense g_new(T, n) is neater than g_malloc(sizeof(T) * n). It's also safer, for two reasons. One, it catches multiplication overflowing size_t. Two, it returns T * rather than void *, which lets the compiler catch more type errors. This commit only touches allocations with size arguments of the form sizeof(T). Same Coccinelle semantic patch as in commit `b45c03f`. Signed-off-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-11-10 12:11:08 -07:00
Alex Williamson	0282abf078	vfio/pci: Hide device PCIe capability on non-express buses for PCIe VMs When we have a PCIe VM, such as Q35, guests start to care more about valid configurations of devices relative to the VM view of the PCI topology. Windows will error with a Code 10 for an assigned device if a PCIe capability is found for a device on a conventional bus. We also have the possibility of IOMMUs, like VT-d, where the guest may be acutely aware of valid express capabilities on physical hardware. Some devices, like tg3 are adversely affected by this due to driver dependencies on the PCIe capability. The only solution for such devices is to attach them to an express capable bus in the VM. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-11-10 12:11:08 -07:00
Pavel Fedin	dc9f06ca81	kvm: Pass PCI device pointer to MSI routing functions In-kernel ITS emulation on ARM64 will require to supply requester IDs. These IDs can now be retrieved from the device pointer using new pci_requester_id() function. This patch adds pci_dev pointer to KVM GSI routing functions and makes callers passing it. x86 architecture does not use requester IDs, but hw/i386/kvm/pci-assign.c also made passing PCI device pointer instead of NULL for consistency with the rest of the code. Signed-off-by: Pavel Fedin <p.fedin@samsung.com> Message-Id: <ce081423ba2394a4efc30f30708fca07656bc500.1444916432.git.p.fedin@samsung.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-10-19 10:13:07 +02:00
Alex Williamson	89dcccc593	vfio/pci: Add emulated PCI IDs Specifying an emulated PCI vendor/device ID can be useful for testing various quirk paths, even though the behavior and functionality of the device with bogus IDs is fully unsupportable. We need to use a uint32_t for the vendor/device IDs, even though the registers themselves are only 16-bit in order to be able to determine whether the value is valid and user set. The same support is added for subsystem vendor/device ID, though these have the possibility of being useful and supported for more than a testing tool. An emulated platform might want to impose their own subsystem IDs or at least hide the physical subsystem ID. Windows guests will often reinstall drivers due to a change in subsystem IDs, something that VM users may want to avoid. Of course careful attention would be required to ensure that guest drivers do not rely on the subsystem ID as a basis for device driver quirks. All of these options are added using the standard experimental option prefix and should not be considered stable. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-09-23 13:04:49 -06:00
Alex Williamson	ff635e3775	vfio/pci: Cache vendor and device ID Simplify access to commonly referenced PCI vendor and device ID by caching it on the VFIOPCIDevice struct. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-09-23 13:04:49 -06:00
Alex Williamson	c9c5000991	vfio/pci: Move AMD device specific reset to quirks This is just another quirk, for reset rather than affecting memory regions. Move it to our new quirks file. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-09-23 13:04:49 -06:00
Alex Williamson	c00d61d8fa	vfio/pci: Split quirks to a separate file Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-09-23 13:04:45 -06:00
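
Several commits above (ee640c625e, 1108b2f8a9, and the vfio/pci series ending in 1a22aca1d0) move helpers from reporting errors directly to filling in a caller-supplied error object, so that a realize function can propagate the failure and callers that tolerate failure can pass NULL. A minimal C sketch of that calling convention follows. It assumes a hypothetical SketchError type and sketch_* helpers invented purely for illustration; it is not QEMU's Error API, and the failure condition is made up.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for an error object; NOT QEMU's Error type. */
typedef struct SketchError {
    char msg[128];
} SketchError;

/* Store a message in *errp. A NULL errp means the caller ignores errors. */
static void sketch_error_set(SketchError **errp, const char *msg)
{
    if (errp == NULL) {
        return;
    }
    assert(*errp == NULL); /* an existing error must not be overwritten */
    *errp = malloc(sizeof(**errp));
    if (*errp == NULL) {
        abort();
    }
    snprintf((*errp)->msg, sizeof((*errp)->msg), "%s", msg);
}

static void sketch_error_free(SketchError *err)
{
    free(err);
}

/*
 * Hypothetical helper: fills in errp instead of printing, and still
 * returns a status code. The "nvectors <= 0" check is an invented
 * failure condition for the sake of the example.
 */
static int sketch_msix_init(int nvectors, SketchError **errp)
{
    if (nvectors <= 0) {
        sketch_error_set(errp, "invalid vector count");
        return -1;
    }
    return 0;
}

/* Realize-style caller: collects the error locally, then hands it up. */
static void sketch_realize(int nvectors, SketchError **errp)
{
    SketchError *local = NULL;

    sketch_msix_init(nvectors, &local);
    if (local != NULL) {
        if (errp != NULL) {
            assert(*errp == NULL);
            *errp = local;            /* propagate to our caller */
        } else {
            sketch_error_free(local); /* caller is not interested */
        }
        return;
    }
    /* ... the rest of device setup would continue here ... */
}

/* Caller that tolerates failure: passes NULL to suppress the report. */
static int sketch_optional_msix(int nvectors)
{
    return sketch_msix_init(nvectors, NULL);
}
```

In this sketch, sketch_realize(0, &err) leaves err pointing at an object whose msg is "invalid vector count", which the caller must release with sketch_error_free(), while sketch_optional_msix(0) only returns -1 and allocates nothing.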


90 Commits