mirrors/qemu - qemu - SynapseOS git

Author	SHA1	Message	Date
Alexey Kardashevskiy	07bc681a33	vfio/spapr: Use iommu memory region's get_attr() In order to enable TCE operations support in KVM, we have to inform the KVM about VFIO groups being attached to specific LIOBNs. The KVM already knows about VFIO groups, the only bit missing is which in-kernel TCE table (the one with user visible TCEs) should update the attached broups. There is an KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE attribute of the VFIO KVM device which receives a groupfd/tablefd couple. This uses a new memory_region_iommu_get_attr() helper to get the IOMMU fd and calls KVM to establish the link. As get_attr() is not implemented yet, this should cause no behavioural change. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2018-02-06 11:08:24 -07:00
Philippe Mathieu-Daudé	bf85388169	qdev: use device_class_set_parent_realize/unrealize/reset() changes generated using the following Coccinelle patch: @@ type DeviceParentClass; DeviceParentClass pc; DeviceClass dc; identifier parent_fn; identifier child_fn; @@ ( +device_class_set_parent_realize(dc, child_fn, &pc->parent_fn); -pc->parent_fn = dc->realize; ... -dc->realize = child_fn; \| +device_class_set_parent_unrealize(dc, child_fn, &pc->parent_fn); -pc->parent_fn = dc->unrealize; ... -dc->unrealize = child_fn; \| +device_class_set_parent_reset(dc, child_fn, &pc->parent_fn); -pc->parent_fn = dc->reset; ... -dc->reset = child_fn; ) Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-Id: <20180114020412.26160-4-f4bug@amsat.org> Reviewed-by: Marcel Apfelbaum <marcel@redhat.com> Acked-by: David Gibson <david@gibson.dropbear.id.au> Acked-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Laurent Vivier <laurent@vivier.eu> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-02-05 13:54:38 +01:00
Michael S. Tsirkin	acc95bc850	Merge remote-tracking branch 'origin/master' into HEAD Resolve conflicts around apb. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2018-01-11 22:03:50 +02:00
Philippe Mathieu-Daudé	e9808d0969	hw: use "qemu/osdep.h" as first #include in source files applied using ./scripts/clean-includes Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Acked-by: David Gibson <david@gibson.dropbear.id.au> Acked-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2017-12-18 17:07:02 +03:00
Alexey Kardashevskiy	2fb9636ebf	vfio-pci: Remove unused fields from VFIOMSIXInfo When support for multiple mappings per a region were added, this was left behind, let's finish and remove unused bits. Fixes: `db0da029a1` ("vfio: Generalize region support") Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2017-12-13 10:19:34 -07:00
Alexey Kardashevskiy	c6e7958eb7	vfio/spapr: Allow fallback to SPAPR TCE IOMMU v1 The vfio_iommu_spapr_tce driver advertises kernel's support for v1 and v2 IOMMU support, however it is not always possible to use the requested IOMMU type. For example, a pseries host platform does not support dynamic DMA windows so v2 cannot initialize and QEMU fails to start. This adds a fallback to the v1 IOMMU if v2 cannot be used. Fixes: `318f67ce13` ("vfio: spapr: Add DMA memory preregistering (SPAPR IOMMU v2)") Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2017-12-13 10:19:33 -07:00
Liu, Yi L	f7f9c7b232	vfio/common: init giommu_list and hostwin_list of vfio container The init of giommu_list and hostwin_list is missed during container initialization. Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2017-12-13 10:19:33 -07:00
Alex Williamson	2016986aed	vfio: Fix vfio-kvm group registration Commit `8c37faa475` ("vfio-pci, ppc64/spapr: Reorder group-to-container attaching") moved registration of groups with the vfio-kvm device from vfio_get_group() to vfio_connect_container(), but it missed the case where a group is attached to an existing container and takes an early exit. Perhaps this is a less common case on ppc64/spapr, but on x86 (without viommu) all groups are connected to the same container and thus only the first group gets registered with the vfio-kvm device. This becomes a problem if we then hot-unplug the devices associated with that first group and we end up with KVM being misinformed about any vfio connections that might remain. Fix by including the call to vfio_kvm_device_add_group() in this early exit path. Fixes: `8c37faa475` ("vfio-pci, ppc64/spapr: Reorder group-to-container attaching") Cc: qemu-stable@nongnu.org # qemu-2.10+ Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Peter Xu <peterx@redhat.com> Tested-by: Peter Xu <peterx@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2017-12-13 10:19:32 -07:00
David Gibson	fd56e0612b	pci: Eliminate redundant PCIDevice::bus pointer The bus pointer in PCIDevice is basically redundant with QOM information. It's always initialized to the qdev_get_parent_bus(), the only difference is the type. Therefore this patch eliminates the field, instead creating a pci_get_bus() helper to do the type mangling to derive it conveniently from the QOM Device object underneath. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Reviewed-by: Marcel Apfelbaum <marcel@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com>	2017-12-05 19:13:45 +02:00
Halil Pasic	66dc50f705	s390x: improve error handling for SSCH and RSCH Simplify the error handling of the SSCH and RSCH handler avoiding arbitrary and cryptic error codes being used to tell how the instruction is supposed to end. Let the code detecting the condition tell how it's to be handled in a less ambiguous way. It's best to handle SSCH and RSCH in one go as the emulation of the two shares a lot of code. For passthrough this change isn't pure refactoring, but changes the way kernel reported EFAULT is handled. After clarifying the kernel interface we decided that EFAULT shall be mapped to unit exception. Same goes for unexpected error codes and absence of required ORB flags. Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com> Message-Id: <20171017140453.51099-4-pasic@linux.vnet.ibm.com> Tested-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> [CH: cosmetic changes] Reviewed-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> Signed-off-by: Cornelia Huck <cohuck@redhat.com>	2017-10-20 13:32:10 +02:00
Eduardo Habkost	fd3b02c889	pci: Add INTERFACE_CONVENTIONAL_PCI_DEVICE to Conventional PCI devices Add INTERFACE_CONVENTIONAL_PCI_DEVICE to all direct subtypes of TYPE_PCI_DEVICE, except: 1) The ones that already have INTERFACE_PCIE_DEVICE set: * base-xhci * e1000e * nvme * pvscsi * vfio-pci * virtio-pci * vmxnet3 2) base-pci-bridge Not all PCI bridges are Conventional PCI devices, so INTERFACE_CONVENTIONAL_PCI_DEVICE is added only to the subtypes that are actually Conventional PCI: * dec-21154-p2p-bridge * i82801b11-bridge * pbm-bridge * pci-bridge The direct subtypes of base-pci-bridge not touched by this patch are: * xilinx-pcie-root: Already marked as PCIe-only. * pcie-pci-bridge: Already marked as PCIe-only. * pcie-port: all non-abstract subtypes of pcie-port are already marked as PCIe-only devices. 3) megasas-base Not all megasas devices are Conventional PCI devices, so the interface names are added to the subclasses registered by megasas_register_types(), according to information in the megasas_devices[] array. "megasas-gen2" already implements INTERFACE_PCIE_DEVICE, so add INTERFACE_CONVENTIONAL_PCI_DEVICE only to "megasas". Acked-by: Alberto Garcia <berto@igalia.com> Acked-by: John Snow <jsnow@redhat.com> Acked-by: Anthony PERARD <anthony.perard@citrix.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Acked-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Marcel Apfelbaum <marcel@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2017-10-15 05:54:43 +03:00
Eduardo Habkost	a5fa336f11	pci: Add interface names to hybrid PCI devices The following devices support both PCI Express and Conventional PCI, by including special code to handle the QEMU_PCI_CAP_EXPRESS flag and/or conditional pcie_endpoint_cap_init() calls: * vfio-pci (is_express=1, but legacy PCI handled by vfio_populate_device()) * vmxnet3 (is_express=0, but PCIe handled by vmxnet3_realize()) * pvscsi (is_express=0, but PCIe handled by pvscsi_realize()) * virtio-pci (is_express=0, but PCIe handled by virtio_pci_dc_realize(), and additional legacy PCI code at virtio_pci_realize()) * base-xhci (is_express=1, but pcie_endpoint_cap_init() call is conditional on pci_bus_is_express(dev->bus) * Note that xhci does not clear QEMU_PCI_CAP_EXPRESS like the other hybrid devices Cc: Dmitry Fleytman <dmitry@daynix.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Gerd Hoffmann <kraxel@redhat.com> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Marcel Apfelbaum <marcel@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2017-10-15 05:54:42 +03:00
Cornelia Huck	bd2aef1065	s390x: sort some devices into categories Add missing categorizations for some s390x devices: - zpci device -> misc - 3270 -> display - vfio-ccw -> misc Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Cornelia Huck <cohuck@redhat.com>	2017-10-06 10:53:02 +02:00
Alex Williamson	dfbee78db8	vfio/pci: Add NVIDIA GPUDirect Cliques support NVIDIA has defined a specification for creating GPUDirect "cliques", where devices with the same clique ID support direct peer-to-peer DMA. When running on bare-metal, tools like NVIDIA's p2pBandwidthLatencyTest (part of cuda-samples) determine which GPUs can support peer-to-peer based on chipset and topology. When running in a VM, these tools have no visibility to the physical hardware support or topology. This option allows the user to specify hints via a vendor defined capability. For instance: <qemu:commandline> <qemu:arg value='-set'/> <qemu:arg value='device.hostdev0.x-nv-gpudirect-clique=0'/> <qemu:arg value='-set'/> <qemu:arg value='device.hostdev1.x-nv-gpudirect-clique=1'/> <qemu:arg value='-set'/> <qemu:arg value='device.hostdev2.x-nv-gpudirect-clique=1'/> </qemu:commandline> This enables two cliques. The first is a singleton clique with ID 0, for the first hostdev defined in the XML (note that since cliques define peer-to-peer sets, singleton clique offer no benefit). The subsequent two hostdevs are both added to clique ID 1, indicating peer-to-peer is possible between these devices. QEMU only provides validation that the clique ID is valid and applied to an NVIDIA graphics device, any validation that the resulting cliques are functional and valid is the user's responsibility. The NVIDIA specification allows a 4-bit clique ID, thus valid values are 0-15. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2017-10-03 12:57:36 -06:00
Alex Williamson	e3f79f3bd4	vfio/pci: Add virtual capabilities quirk infrastructure If the hypervisor needs to add purely virtual capabilties, give us a hook through quirks to do that. Note that we determine the maximum size for a capability based on the physical device, if we insert a virtual capability, that can change. Therefore if maximum size is smaller after added virt capabilities, use that. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2017-10-03 12:57:36 -06:00
Alex Williamson	5b31c8229d	vfio/pci: Do not unwind on error If vfio_add_std_cap() errors then going to out prepends irrelevant errors for capabilities we haven't attempted to add as we unwind our recursive stack. Just return error. Fixes: `7ef165b9a8` ("vfio/pci: Pass an error object to vfio_add_capabilities") Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2017-10-03 12:57:35 -06:00
Alexey Kardashevskiy	e100161b69	vfio, spapr: Fix levels calculation The existing tries to round up the number of pages but @pages is always calculated as the rounded up value minus one which makes ctz64() always return 0 and have create.levels always set 1. This removes wrong "-1" and allows having more than 1 levels. This becomes handy for >128GB guests with standard 64K pages as this requires blocks with zone order 9 and the popular limit of CONFIG_FORCE_MAX_ZONEORDER=9 means that only blocks up to order 8 are allowed. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-09-15 10:29:48 +10:00
Vladimir Sementsov-Ogievskiy	8908eb1a4a	trace-events: fix code style: print 0x before hex numbers The only exception are groups of numers separated by symbols '.', ' ', ':', '/', like 'ab.09.7d'. This patch is made by the following: > find . -name trace-events \| xargs python script.py where script.py is the following python script: ========================= #!/usr/bin/env python import sys import re import fileinput rhex = '%[-+ .0-9](?:[hljztL]\|ll\|hh)?(?:x\|X\|"\sPRI[xX][^"]"?)' rgroup = re.compile('((?:' + rhex + '[.:/ ])+' + rhex + ')') rbad = re.compile('(?<!0x)' + rhex) files = sys.argv[1:] for fname in files: for line in fileinput.input(fname, inplace=True): arr = re.split(rgroup, line) for i in range(0, len(arr), 2): arr[i] = re.sub(rbad, '0x\g<0>', arr[i]) sys.stdout.write(''.join(arr)) ========================= Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Acked-by: Cornelia Huck <cohuck@redhat.com> Message-id: 20170731160135.12101-5-vsementsov@virtuozzo.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-08-01 12:13:07 +01:00
Philippe Mathieu-Daudé	87e0331c5a	docs: fix broken paths to docs/devel/tracing.txt With the move of some docs/ to docs/devel/ on `ac06724a71`, no references were updated. Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2017-07-31 13:12:53 +03:00
Philippe Mathieu-Daudé	96d2c2c574	vfio/pci: fix use of freed memory hw/vfio/pci.c:308:29: warning: Use of memory after it is freed qemu_set_fd_handler(*pfd, NULL, NULL, vdev); ^~~~ Reported-by: Clang Static Analyzer Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2017-07-26 11:38:18 -06:00
Philippe Mathieu-Daudé	418c69813f	vfio/platform: fix use of freed memory free the data _after_ using it. hw/vfio/platform.c:126:29: warning: Use of memory after it is freed qemu_set_fd_handler(*pfd, NULL, NULL, NULL); ^~~~ Reported-by: Clang Static Analyzer Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2017-07-26 11:38:17 -06:00
Dong Jia Shi	6a79dd4631	vfio/ccw: fix initialization of the Object DeviceState pointer in the common base-device Commit `7da624e2` ("vfio: Test realized when using VFIOGroup.device_list iterator") introduced a pointer to the Object DeviceState in the VFIO common base-device and skipped non-realized devices as we iterate VFIOGroup.device_list. While it missed to initialize the pointer for the vfio-ccw case. Let's fix it. Fixes: `7da624e2` ("vfio: Test realized when using VFIOGroup.device_list iterator") Cc: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Halil Pasic <pasic@linux.vnet.ibm.com> Signed-off-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Message-Id: <20170718014926.44781-3-bjsdjshi@linux.vnet.ibm.com> Signed-off-by: Cornelia Huck <cohuck@redhat.com>	2017-07-25 09:17:42 +02:00
Jing Zhang	28e22d4bae	vfio/ccw: allocate irq info with the right size When allocating memory for the vfio_irq_info parameter of the VFIO_DEVICE_GET_IRQ_INFO ioctl, we used the wrong size. Let's fix it by using the right size. Reviewed-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> Signed-off-by: Jing Zhang <bjzhjing@linux.vnet.ibm.com> Signed-off-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> Message-Id: <20170718014926.44781-2-bjsdjshi@linux.vnet.ibm.com> Signed-off-by: Cornelia Huck <cohuck@redhat.com>	2017-07-25 09:17:42 +02:00
Alexey Kardashevskiy	8c37faa475	vfio-pci, ppc64/spapr: Reorder group-to-container attaching At the moment VFIO PCI device initialization works as follows: vfio_realize vfio_get_group vfio_connect_container register memory listeners (1) update QEMU groups lists vfio_kvm_device_add_group Then (example for pseries) the machine reset hook triggers region_add() for all regions where listeners from (1) are listening: ppc_spapr_reset spapr_phb_reset spapr_tce_table_enable memory_region_add_subregion vfio_listener_region_add vfio_spapr_create_window This scheme works fine until we need to handle VFIO PCI device hotplug and we want to enable PPC64/sPAPR in-kernel TCE acceleration on, i.e. after PCI hotplug we need a place to call ioctl(vfio_kvm_device_fd, KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE). Since the ioctl needs a LIOBN fd (from sPAPRTCETable) and a IOMMU group fd (from VFIOGroup), vfio_listener_region_add() seems to be the only place for this ioctl(). However this only works during boot time because the machine reset happens strictly after all devices are finalized. When hotplug happens, vfio_listener_region_add() is called when a memory listener is registered but when this happens: 1. new group is not added to the container->group_list yet; 2. VFIO KVM device is unaware of the new IOMMU group. This moves bits around to have all necessary VFIO infrastructure in place for both initial startup and hotplug cases. [aw: ie, register vfio groups with kvm prior to memory listener registration such that kvm-vfio pseudo device ioctls are available during the region_add callback] Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2017-07-17 12:39:09 -06:00
Alexey Kardashevskiy	3df9d74806	memory/iommu: QOM'fy IOMMU MemoryRegion This defines new QOM object - IOMMUMemoryRegion - with MemoryRegion as a parent. This moves IOMMU-related fields from MR to IOMMU MR. However to avoid dymanic QOM casting in fast path (address_space_translate, etc), this adds an @is_iommu boolean flag to MR and provides new helper to do simple cast to IOMMU MR - memory_region_get_iommu. The flag is set in the instance init callback. This defines memory_region_is_iommu as memory_region_get_iommu()!=NULL. This switches MemoryRegion to IOMMUMemoryRegion in most places except the ones where MemoryRegion may be an alias. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Message-Id: <20170711035620.4232-2-aik@ozlabs.ru> Acked-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-07-14 12:04:41 +02:00
Alex Williamson	47985727e3	vfio/pci: Fixup v0 PCIe capabilities Intel 82599 VFs report a PCIe capability version of 0, which is invalid. The earliest version of the PCIe spec used version 1. This causes Windows to fail startup on the device and it will be disabled with error code 10. Our choices are either to drop the PCIe cap on such devices, which has the side effect of likely preventing the guest from discovering any extended capabilities, or performing a fixup to update the capability to the earliest valid version. This implements the latter. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2017-07-10 10:39:43 -06:00
Alex Williamson	7da624e26a	vfio: Test realized when using VFIOGroup.device_list iterator VFIOGroup.device_list is effectively our reference tracking mechanism such that we can teardown a group when all of the device references are removed. However, we also use this list from our machine reset handler for processing resets that affect multiple devices. Generally device removals are fully processed (exitfn + finalize) when this reset handler is invoked, however if the removal is triggered via another reset handler (piix4_reset->acpi_pcihp_reset) then the device exitfn may run, but not finalize. In this case we hit asserts when we start trying to access PCI helpers since much of the PCI state of the device is released. To resolve this, add a pointer to the Object DeviceState in our common base-device and skip non-realized devices as we iterate. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2017-07-10 10:39:43 -06:00
Mao Zhongyi	2784127857	pci: Replace pci_add_capability2() with pci_add_capability() After the patch 'Make errp the last parameter of pci_add_capability()', pci_add_capability() and pci_add_capability2() now do exactly the same. So drop the wrapper pci_add_capability() of pci_add_capability2(), then replace the pci_add_capability2() with pci_add_capability() everywhere. Cc: pbonzini@redhat.com Cc: rth@twiddle.net Cc: ehabkost@redhat.com Cc: mst@redhat.com Cc: dmitry@daynix.com Cc: jasowang@redhat.com Cc: marcel@redhat.com Cc: alex.williamson@redhat.com Cc: armbru@redhat.com Suggested-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Mao Zhongyi <maozy.fnst@cn.fujitsu.com> Reviewed-by: Marcel Apfelbaum <marcel@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2017-07-03 22:29:49 +03:00
Mao Zhongyi	9a7c2a5970	pci: Make errp the last parameter of pci_add_capability() Add Error argument for pci_add_capability() to leverage the errp to pass info on errors. This way is helpful for its callers to make a better error handling when moving to 'realize'. Cc: pbonzini@redhat.com Cc: rth@twiddle.net Cc: ehabkost@redhat.com Cc: mst@redhat.com Cc: jasowang@redhat.com Cc: marcel@redhat.com Cc: alex.williamson@redhat.com Cc: armbru@redhat.com Signed-off-by: Mao Zhongyi <maozy.fnst@cn.fujitsu.com> Reviewed-by: Marcel Apfelbaum <marcel@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2017-07-03 22:29:49 +03:00
Mao Zhongyi	9a815774bb	pci: Fix the wrong assertion. pci_add_capability returns a strictly positive value on success, correct asserts. Cc: dmitry@daynix.com Cc: jasowang@redhat.com Cc: kraxel@redhat.com Cc: alex.williamson@redhat.com Cc: armbru@redhat.com Cc: marcel@redhat.com Signed-off-by: Mao Zhongyi <maozy.fnst@cn.fujitsu.com> Reviewed-by: Marcel Apfelbaum <marcel@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2017-07-03 22:29:49 +03:00
Stefan Hajnoczi	a3203e7dd3	pci, virtio, vhost: fixes A bunch of fixes all over the place. Most notably this fixes the new MTU feature when using vhost. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> -----BEGIN PGP SIGNATURE----- iQEcBAABAgAGBQJZK2bwAAoJECgfDbjSjVRpNBgIALmNG7VaixhNUlnfX1n1JBnh +HBP2zNfvi0q5roBuPFmlziKa3IBHb2Fcte4nb6QxmPg+uoaj39AOzfrrvz210kR h2j5Qk2bCdMeWBpxI+xDDScwi/Im23Y6KN1eZyMekFr2CaSGiqOHZPPdbsyEcHPB VylM0uHqSTZL5JAAzEuYlH+LLfPu91HoxMsIAdNuQX+qKyM2DZ4eICBQ0zA73USt OduZltcRMk7UpvQMqY+2iaEXapXQQEUGrP2Mo8ZyqeIl2ItC33GspqBQIKjuZdrr tpr/T1VWsLdZnURZXyELrFqrErDXvKaP9HROwvyLyYPXZF+pJ3LA7TopS5UmfNQ= =Z4xG -----END PGP SIGNATURE----- Merge remote-tracking branch 'mst/tags/for_upstream' into staging pci, virtio, vhost: fixes A bunch of fixes all over the place. Most notably this fixes the new MTU feature when using vhost. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> # gpg: Signature made Mon 29 May 2017 01:10:24 AM BST # gpg: using RSA key 0x281F0DB8D28D5469 # gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" # gpg: aka "Michael S. Tsirkin <mst@redhat.com>" # Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67 # Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469 * mst/tags/for_upstream: acpi-test: update expected files pc: ACPI BIOS: use highest NUMA node for hotplug mem hole SRAT entry vhost-user: pass message as a pointer to process_message_reply() virtio_net: Bypass backends for MTU feature negotiation intel_iommu: turn off pt before 2.9 intel_iommu: support passthrough (PT) intel_iommu: allow dev-iotlb context entry conditionally intel_iommu: use IOMMU_ACCESS_FLAG() intel_iommu: provide vtd_ce_get_type() intel_iommu: renaming context entry helpers x86-iommu: use DeviceClass properties memory: remove the last param in memory_region_iommu_replay() memory: tune last param of iommu_ops.translate() Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-05-30 14:15:04 +01:00
Peter Xu	ad523590f6	memory: remove the last param in memory_region_iommu_replay() We were always passing in that one as "false" to assume that's an read operation, and we also assume that IOMMU translation would always have that read permission. A better permission would be IOMMU_NONE since the replay is after all not a real read operation, but just a page table rebuilding process. CC: David Gibson <david@gibson.dropbear.id.au> CC: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Jason Wang <jasowang@redhat.com>	2017-05-25 21:25:27 +03:00
Dong Jia Shi	334e76850b	vfio/ccw: update sense data if a unit check is pending Concurrent-sense data is currently not delivered. This patch stores the concurrent-sense data to the subchannel if a unit check is pending and the concurrent-sense bit is enabled. Then a TSCH can retreive the right IRB data back to the guest. Acked-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> Message-Id: <20170517004813.58227-13-bjsdjshi@linux.vnet.ibm.com> Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>	2017-05-19 12:29:01 +02:00
Xiao Feng Ren	8ca2b376b4	s390x/css: introduce and realize ccw-request callback Introduce a new callback on subchannel to handle ccw-request. Realize the callback in vfio-ccw device. Besides, resort to the event notifier handler to handling the ccw-request results. 1. Pread the I/O results via MMIO region. 2. Update the scsw info to guest. 3. Inject an I/O interrupt to notify guest the I/O result. Acked-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Xiao Feng Ren <renxiaof@linux.vnet.ibm.com> Signed-off-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> Message-Id: <20170517004813.58227-11-bjsdjshi@linux.vnet.ibm.com> Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>	2017-05-19 12:29:01 +02:00
Dong Jia Shi	4886b3e9f0	vfio/ccw: get irqs info and set the eventfd fd vfio-ccw resorts to the eventfd mechanism to communicate with userspace. We fetch the irqs info via the ioctl VFIO_DEVICE_GET_IRQ_INFO, register a event notifier to get the eventfd fd which is sent to kernel via the ioctl VFIO_DEVICE_SET_IRQS, then we can implement read operation once kernel sends the signal. Reviewed-by: Eric Auger <eric.auger@redhat.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> Message-Id: <20170517004813.58227-10-bjsdjshi@linux.vnet.ibm.com> Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>	2017-05-19 12:29:01 +02:00
Dong Jia Shi	c14e706ce9	vfio/ccw: get io region info vfio-ccw provides an MMIO region for I/O operations. We fetch its information via ioctls here, then we can use it performing I/O instructions and retrieving I/O results later on. Reviewed-by: Eric Auger <eric.auger@redhat.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> Message-Id: <20170517004813.58227-9-bjsdjshi@linux.vnet.ibm.com> Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>	2017-05-19 12:29:01 +02:00
Xiao Feng Ren	1dcac3e152	vfio/ccw: vfio based subchannel passthrough driver We use the IOMMU_TYPE1 of VFIO to realize the subchannels passthrough, implement a vfio based subchannels passthrough driver called "vfio-ccw". Support qemu parameters in the style of: "-device vfio-ccw,sysfsdev=$mdev_file_path,devno=xx.x.xxxx' Reviewed-by: Eric Auger <eric.auger@redhat.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Xiao Feng Ren <renxiaof@linux.vnet.ibm.com> Signed-off-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> Message-Id: <20170517004813.58227-8-bjsdjshi@linux.vnet.ibm.com> Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>	2017-05-19 12:29:01 +02:00
Eduardo Habkost	e4f4fb1eca	sysbus: Set user_creatable=false by default on TYPE_SYS_BUS_DEVICE commit `33cd52b5d7` unset cannot_instantiate_with_device_add_yet in TYPE_SYSBUS, making all sysbus devices appear on "-device help" and lack the "no-user" flag in "info qdm". To fix this, we can set user_creatable=false by default on TYPE_SYS_BUS_DEVICE, but this requires setting user_creatable=true explicitly on the sysbus devices that actually work with -device. Fortunately today we have just a few has_dynamic_sysbus=1 machines: virt, pc-q35-, ppce500, and spapr. virt, ppce500, and spapr have extra checks to ensure just a few device types can be instantiated: virt supports only TYPE_VFIO_CALXEDA_XGMAC, TYPE_VFIO_AMD_XGBE. * ppce500 supports only TYPE_ETSEC_COMMON. * spapr supports only TYPE_SPAPR_PCI_HOST_BRIDGE. This patch sets user_creatable=true explicitly on those 4 device classes. Now, the more complex cases: pc-q35-: q35 has no sysbus device whitelist yet (which is a separate bug). We are in the process of fixing it and building a sysbus whitelist on q35, but in the meantime we can fix the "-device help" and "info qdm" bugs mentioned above. Also, despite not being strictly necessary for fixing the q35 bug, reducing the list of user_creatable=true devices will help us be more confident when building the q35 whitelist. xen: We also have a hack at xen_set_dynamic_sysbus(), that sets has_dynamic_sysbus=true at runtime when using the Xen accelerator. This hack is only used to allow xen-backend devices to be dynamically plugged/unplugged. This means today we can use -device with the following 22 device types, that are the ones compiled into the qemu-system-x86_64 and qemu-system-i386 binaries: allwinner-ahci * amd-iommu * cfi.pflash01 * esp * fw_cfg_io * fw_cfg_mem * generic-sdhci * hpet * intel-iommu * ioapic * isabus-bridge * kvmclock * kvm-ioapic * kvmvapic * SUNW,fdtwo * sysbus-ahci * sysbus-fdc * sysbus-ohci * unimplemented-device * virtio-mmio * xen-backend * xen-sysdev This patch adds user_creatable=true explicitly to those devices, temporarily, just to keep 100% compatibility with existing behavior of q35. Subsequent patches will remove user_creatable=true from the devices that are really not meant to user-creatable on any machine, and remove the FIXME comment from the ones that are really supposed to be user-creatable. This is being done in separate patches because we still don't have an obvious list of devices that will be whitelisted by q35, and I would like to get each device reviewed individually. Cc: Alexander Graf <agraf@suse.de> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Alistair Francis <alistair.francis@xilinx.com> Cc: Beniamino Galvani <b.galvani@gmail.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Cornelia Huck <cornelia.huck@de.ibm.com> Cc: David Gibson <david@gibson.dropbear.id.au> Cc: "Edgar E. Iglesias" <edgar.iglesias@gmail.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Frank Blaschka <frank.blaschka@de.ibm.com> Cc: Gabriel L. Somlo <somlo@cmu.edu> Cc: Gerd Hoffmann <kraxel@redhat.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: John Snow <jsnow@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kevin Wolf <kwolf@redhat.com> Cc: Laszlo Ersek <lersek@redhat.com> Cc: Marcel Apfelbaum <marcel@redhat.com> Cc: Markus Armbruster <armbru@redhat.com> Cc: Max Reitz <mreitz@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Maydell <peter.maydell@linaro.org> Cc: Pierre Morel <pmorel@linux.vnet.ibm.com> Cc: Prasad J Pandit <pjp@fedoraproject.org> Cc: qemu-arm@nongnu.org Cc: qemu-block@nongnu.org Cc: qemu-ppc@nongnu.org Cc: Richard Henderson <rth@twiddle.net> Cc: Rob Herring <robh@kernel.org> Cc: Shannon Zhao <zhaoshenglong@huawei.com> Cc: sstabellini@kernel.org Cc: Thomas Huth <thuth@redhat.com> Cc: Yi Min Zhao <zyimin@linux.vnet.ibm.com> Acked-by: John Snow <jsnow@redhat.com> Acked-by: Juergen Gross <jgross@suse.com> Acked-by: Marcel Apfelbaum <marcel@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Message-Id: <20170503203604.31462-3-ehabkost@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> [ehabkost: Small changes at sysbus_device_class_init() comments] Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2017-05-17 10:37:01 -03:00
Dong Jia Shi	6e4e6f0d40	vfio/pci: Fix incorrect error message When the "No host device provided" error occurs, the hint message that starts with "Use -vfio-pci," makes no sense, since "-vfio-pci" is not a valid command line parameter. Correct this by replacing "-vfio-pci" with "-device vfio-pci". Signed-off-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2017-05-03 14:52:35 -06:00
Jose Ricardo Ziviani	38d49e8c15	vfio: enable 8-byte reads/writes to vfio This patch enables 8-byte writes and reads to VFIO. Such implemention is already done but it's missing the 'case' to handle such accesses in both vfio_region_write and vfio_region_read and the MemoryRegionOps: impl.max_access_size and impl.min_access_size. After this patch, 8-byte writes such as: qemu_mutex_lock locked mutex 0x10905ad8 vfio_region_write (0001:03:00.0:region1+0xc0, 0x4140c, 4) vfio_region_write (0001:03:00.0:region1+0xc4, 0xa0000, 4) qemu_mutex_unlock unlocked mutex 0x10905ad8 goes like this: qemu_mutex_lock locked mutex 0x10905ad8 vfio_region_write (0001:03:00.0:region1+0xc0, 0xbfd0008, 8) qemu_mutex_unlock unlocked mutex 0x10905ad8 Signed-off-by: Jose Ricardo Ziviani <joserz@linux.vnet.ibm.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2017-05-03 14:52:34 -06:00
Jose Ricardo Ziviani	15126cba86	vfio: Set MemoryRegionOps:max_access_size and min_access_size Sets valid.max_access_size and valid.min_access_size to ensure safe 8-byte accesses to vfio. Today, 8-byte accesses are broken into pairs of 4-byte calls that goes unprotected: qemu_mutex_lock locked mutex 0x10905ad8 vfio_region_write (0001:03:00.0:region1+0xc0, 0x2020c, 4) qemu_mutex_unlock unlocked mutex 0x10905ad8 qemu_mutex_lock locked mutex 0x10905ad8 vfio_region_write (0001:03:00.0:region1+0xc4, 0xa0000, 4) qemu_mutex_unlock unlocked mutex 0x10905ad8 which occasionally leads to: qemu_mutex_lock locked mutex 0x10905ad8 vfio_region_write (0001:03:00.0:region1+0xc0, 0x2030c, 4) qemu_mutex_unlock unlocked mutex 0x10905ad8 qemu_mutex_lock locked mutex 0x10905ad8 vfio_region_write (0001:03:00.0:region1+0xc0, 0x1000c, 4) qemu_mutex_unlock unlocked mutex 0x10905ad8 qemu_mutex_lock locked mutex 0x10905ad8 vfio_region_write (0001:03:00.0:region1+0xc4, 0xb0000, 4) qemu_mutex_unlock unlocked mutex 0x10905ad8 qemu_mutex_lock locked mutex 0x10905ad8 vfio_region_write (0001:03:00.0:region1+0xc4, 0xa0000, 4) qemu_mutex_unlock unlocked mutex 0x10905ad8 causing strange errors in guest OS. With this patch, such accesses are protected by the same lock guard: qemu_mutex_lock locked mutex 0x10905ad8 vfio_region_write (0001:03:00.0:region1+0xc0, 0x2000c, 4) vfio_region_write (0001:03:00.0:region1+0xc4, 0xb0000, 4) qemu_mutex_unlock unlocked mutex 0x10905ad8 This happens because the 8-byte write should be broken into 4-byte writes by memory.c:access_with_adjusted_size() in order to be under the same lock. Today, it's done in exec.c:address_space_write_continue() which was able to handle only 4 bytes due to a zero'ed valid.max_access_size (see exec.c:memory_access_size()). Signed-off-by: Jose Ricardo Ziviani <joserz@linux.vnet.ibm.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2017-05-03 14:52:34 -06:00
Peter Xu	698feb5e13	memory: add section range info for IOMMU notifier In this patch, IOMMUNotifier.{start\|end} are introduced to store section information for a specific notifier. When notification occurs, we not only check the notification type (MAP\|UNMAP), but also check whether the notified iova range overlaps with the range of specific IOMMU notifier, and skip those notifiers if not in the listened range. When removing an region, we need to make sure we removed the correct VFIOGuestIOMMU by checking the IOMMUNotifier.start address as well. This patch is solving the problem that vfio-pci devices receive duplicated UNMAP notification on x86 platform when vIOMMU is there. The issue is that x86 IOMMU has a (0, 2^64-1) IOMMU region, which is splitted by the (0xfee00000, 0xfeefffff) IRQ region. AFAIK this (splitted IOMMU region) is only happening on x86. This patch also helps vhost to leverage the new interface as well, so that vhost won't get duplicated cache flushes. In that sense, it's an slight performance improvement. Suggested-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <1491562755-23867-2-git-send-email-peterx@redhat.com> [ehabkost: included extra vhost_iommu_region_del() change from Peter Xu] Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2017-04-20 15:22:41 -03:00
Alex Williamson	8f419c5b43	vfio/pci-quirks: Exclude non-ioport BAR from NVIDIA quirk The NVIDIA BAR5 quirk is targeting an ioport BAR. Some older devices have a BAR5 which is not ioport and can induce a segfault here. Test the BAR type to skip these devices. Link: https://bugs.launchpad.net/qemu/+bug/1678466 Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2017-04-06 16:03:26 -06:00
Xiong Zhang	93587e3af3	Revert "vfio/pci-quirks.c: Disable stolen memory for igd VFIO" This reverts commit `c2b2e158cc`. The original patch intend to prevent linux i915 driver from using stolen meory. But this patch breaks windows IGD driver loading on Gen9+, as IGD HW will use stolen memory on Gen9+, once windows IGD driver see zero size stolen memory, it will unload. Meanwhile stolen memory will be disabled in 915 when i915 run as a guest. Signed-off-by: Xiong Zhang <xiong.y.zhang@intel.com> [aw: Gen9+ is SkyLake and newer] Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2017-03-31 10:04:41 -06:00
XiongZhang	c2b2e158cc	vfio/pci-quirks.c: Disable stolen memory for igd VFIO Regardless of running in UPT or legacy mode, the guest igd drivers may attempt to use stolen memory, however only legacy mode has BIOS support for reserving stolen memmory in the guest VM. We zero out the stolen memory size in all cases, then guest igd driver won't use stolen memory. In legacy mode, user could use x-igd-gms option to specify the amount of stolen memory which will be pre-allocated and reserved by bios for igd use. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99028 https://bugs.freedesktop.org/show_bug.cgi?id=99025 Signed-off-by: Xiong Zhang <xiong.y.zhang@intel.com> Tested-by: Terrence Xu <terrence.xu@intel.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2017-02-22 13:19:59 -07:00
Alex Williamson	d0d1cd70d1	vfio/pci: Improve extended capability comments, skip masked caps Since commit `4bb571d857` ("pci/pcie: don't assume cap id 0 is reserved") removes the internal use of extended capability ID 0, the comment here becomes invalid. However, peeling back the onion, the code is still correct and we still can't seed the capability chain with ID 0, unless we want to muck with using the version number to force the header to be non-zero, which is much uglier to deal with. The comment also now covers some of the subtleties of using cap ID 0, such as transparently indicating absence of capabilities if none are added. This doesn't detract from the correctness of the referenced commit as vfio in the kernel also uses capability ID zero to mask capabilties. In fact, we should skip zero capabilities precisely because the kernel might also expose such a capability at the head position and re-introduce the problem. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Tested-by: Peter Xu <peterx@redhat.com> Reported-by: Jintack Lim <jintack@cs.columbia.edu> Tested-by: Jintack Lim <jintack@cs.columbia.edu>	2017-02-22 13:19:58 -07:00
Alex Williamson	35c7cb4caf	vfio/pci: Report errors from qdev_unplug() via device request Currently we ignore this error, report it with error_reportf_err() Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>	2017-02-22 13:19:58 -07:00
Peter Xu	dfbd90e5b9	vfio: allow to notify unmap for very large region Linux vfio driver supports to do VFIO_IOMMU_UNMAP_DMA for a very big region. This can be leveraged by QEMU IOMMU implementation to cleanup existing page mappings for an entire iova address space (by notifying with an IOTLB with extremely huge addr_mask). However current vfio_iommu_map_notify() does not allow that. It make sure that all the translated address in IOTLB is falling into RAM range. The check makes sense, but it should only be a sensible checker for mapping operations, and mean little for unmap operations. This patch moves this check into map logic only, so that we'll get faster unmap handling (no need to translate again), and also we can then better support unmapping a very big region when it covers non-ram ranges or even not-existing ranges. Acked-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2017-02-17 21:52:31 +02:00
Peter Xu	4a4b88fbe1	vfio: introduce vfio_get_vaddr() A cleanup for vfio_iommu_map_notify(). Now we will fetch vaddr even if the operation is unmap, but it won't hurt much. One thing to mention is that we need the RCU read lock to protect the whole translation and map/unmap procedure. Acked-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2017-02-17 21:52:31 +02:00
Peter Xu	3213835720	vfio: trace map/unmap for notify as well We traces its range, but we don't know whether it's a MAP/UNMAP. Let's dump it as well. Acked-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2017-02-17 21:52:31 +02:00
Thomas Huth	e197de50c6	hw/vfio: Add CONFIG switches for calxeda-xgmac and amd-xgbe Both devices seem to be specific to the ARM platform. It's confusing for the users if they show up on other target architectures, too (e.g. when the user runs QEMU with "-device ?" to get a list of supported devices). Thus let's introduce proper configuration switches so that the devices are only compiled and included when they are really required. Signed-off-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2017-02-10 13:12:03 -07:00
Thomas Huth	f23363ea44	hw/vfio/pci-quirks: Set category of the "vfio-pci-igd-lpc-bridge" device The device has "bridge" in its name, so it should obviously be in the category DEVICE_CATEGORY_BRIDGE. Signed-off-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2017-02-10 13:12:03 -07:00
Alex Williamson	ac2a9862b7	vfio-pci: Fix GTT wrap-around for Skylake+ IGD Previous IGD, up through Broadwell, only seem to write GTT values into the first 1MB of space allocated for the BDSM, but clearly the GTT can be multiple MB in size. Our test in vfio_igd_quirk_data_write() correctly filters out indexes beyond 1MB, but given the 1MB mask we're using, we re-apply writes only to the first 1MB of the guest allocated BDSM. We can't assume either the host or guest BDSM is naturally aligned, so we can't simply apply a different mask. Instead, save the host BDSM and do the arithmetic to subtract the host value to get the BDSM offset and add it to the guest allocated BDSM. Reported-by: Alexander Indenbaum <alexander.indenbaum@gmail.com> Tested-by: Alexander Indenbaum <alexander.indenbaum@gmail.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2017-02-10 13:12:03 -07:00
Peter Maydell	4e9f5244e1	-----BEGIN PGP SIGNATURE----- iQEcBAABAgAGBQJYkeZAAAoJEJykq7OBq3PI6oUH/3qlRvQrWmhWLR+XCtwU0gON HRApL57Of+B1YbqJzb8wzjLMLfzZQYLoT7kf3FDRON751Iwpv2Qyl6j79kbmOQwy txvtgUTtPZrOZ9HMk6M1VboiKrkM1t0I1QiRYy/af2f1gD3KTqIt8YN1ic3xatKD Fgmx+oD+6EkrNilthemvDyaXtGsdTl4GC9ZbGcJB2VJzzWkksRUfeZWysIu9p2zP l6viegW/1+o5wYgBt6DxMalfNGbEiuBgXgx6PVFPbkw0xNURC52qDHhQ91xTSWt1 pvFrIhYWR/ETN0twJh+jtmCjkawKWSsx2nrLlrSh4H0EpwFoRfFqH/ZrOFSg0wg= =QnCX -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/stefanha/tags/tracing-pull-request' into staging # gpg: Signature made Wed 01 Feb 2017 13:44:32 GMT # gpg: using RSA key 0x9CA4ABB381AB73C8 # gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>" # gpg: aka "Stefan Hajnoczi <stefanha@gmail.com>" # Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35 775A 9CA4 ABB3 81AB 73C8 * remotes/stefanha/tags/tracing-pull-request: trace: clean up trace-events files qapi: add missing trace_visit_type_enum() call trace: improve error reporting when parsing simpletrace header trace: update docs to reflect new code generation approach trace: switch to modular code generation for sub-directories trace: move setting of group name into Makefiles trace: move hw/i386/xen events to correct subdir trace: move hw/xen events to correct subdir trace: move hw/block/dataplane events to correct subdir make: move top level dir to end of include search path # Conflicts: # Makefile Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2017-02-02 16:08:28 +00:00
Cao jin	ee640c625e	pci: Convert msix_init() to Error and fix callers msix_init() reports errors with error_report(), which is wrong when it's used in realize(). The same issue was fixed for msi_init() in commit `1108b2f`. In order to make the API change as small as possible, leave the return value check to later patch. For some devices(like e1000e, vmxnet3, nvme) who won't fail because of msix_init's failure, suppress the error report by passing NULL error object. Bonus: add comment for msix_init. CC: Jiri Pirko <jiri@resnulli.us> CC: Gerd Hoffmann <kraxel@redhat.com> CC: Dmitry Fleytman <dmitry@daynix.com> CC: Jason Wang <jasowang@redhat.com> CC: Michael S. Tsirkin <mst@redhat.com> CC: Hannes Reinecke <hare@suse.de> CC: Paolo Bonzini <pbonzini@redhat.com> CC: Alex Williamson <alex.williamson@redhat.com> CC: Markus Armbruster <armbru@redhat.com> CC: Marcel Apfelbaum <marcel@redhat.com> Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2017-02-01 03:37:18 +02:00
Stefan Hajnoczi	7f4076c1bb	trace: clean up trace-events files There are a number of unused trace events that scripts/cleanup-trace-events.pl finds. The "hw/vfio/pci-quirks.c" filename was typoed and "qapi/qapi-visit-core.c" was missing the qapi/ directory prefix. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 20170126171613.1399-3-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-01-31 17:12:15 +00:00
Cao jin	8907379204	vfio: remove a duplicated word in comments Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2017-01-24 23:26:53 +03:00
Stefan Weil	b12227afb1	hw: Fix typos found by codespell Signed-off-by: Stefan Weil <sw@weilnetz.de> Acked-by: Alistair Francis <alistair.francis@xilinx.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2017-01-24 23:26:52 +03:00
Yongji Xie	95251725e3	vfio: Add support for mmapping sub-page MMIO BARs Now the kernel commit 05f0c03fbac1 ("vfio-pci: Allow to mmap sub-page MMIO BARs if the mmio page is exclusive") allows VFIO to mmap sub-page BARs. This is the corresponding QEMU patch. With those patches applied, we could passthrough sub-page BARs to guest, which can help to improve IO performance for some devices. In this patch, we expand MemoryRegions of these sub-page MMIO BARs to PAGE_SIZE in vfio_pci_write_config(), so that the BARs could be passed to KVM ioctl KVM_SET_USER_MEMORY_REGION with a valid size. The expanding size will be recovered when the base address of sub-page BAR is changed and not page aligned any more in guest. And we also set the priority of these BARs' memory regions to zero in case of overlap with BARs which share the same page with sub-page BARs in guest. Signed-off-by: Yongji Xie <xyjxie@linux.vnet.ibm.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-10-31 09:53:04 -06:00
Ido Yariv	a52a4c4717	vfio/pci: fix out-of-sync BAR information on reset When a PCI device is reset, pci_do_device_reset resets all BAR addresses in the relevant PCIDevice's config buffer. The VFIO configuration space stays untouched, so the guest OS may choose to skip restoring the BAR addresses as they would seem intact. The PCI device may be left non-operational. One example of such a scenario is when the guest exits S3. Fix this by resetting the BAR addresses in the VFIO configuration space as well. Signed-off-by: Ido Yariv <ido@wizery.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-10-31 09:53:04 -06:00
Alex Williamson	24acf72b9a	vfio: Handle zero-length sparse mmap ranges As reported in the link below, user has a PCI device with a 4KB BAR which contains the MSI-X table. This seems to hit a corner case in the kernel where the region reports being mmap capable, but the sparse mmap information reports a zero sized range. It's not entirely clear that the kernel is incorrect in doing this, but regardless, we need to handle it. To do this, fill our mmap array only with non-zero sized sparse mmap entries and add an error return from the function so we can tell the difference between nr_mmaps being zero based on sparse mmap info vs lack of sparse mmap info. NB, this doesn't actually change the behavior of the device, it only removes the scary "Failed to mmap ... Performance may be slow" error message. We cannot currently create an mmap over the MSI-X table. Link: http://lists.nongnu.org/archive/html/qemu-discuss/2016-10/msg00009.html Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-10-31 09:53:03 -06:00
Alex Williamson	21e00fa55f	memory: Replace skip_dump flag with "ram_device" Setting skip_dump on a MemoryRegion allows us to modify one specific code path, but the restriction we're trying to address encompasses more than that. If we have a RAM MemoryRegion backed by a physical device, it not only restricts our ability to dump that region, but also affects how we should manipulate it. Here we recognize that MemoryRegions do not change to sometimes allow dumps and other times not, so we replace setting the skip_dump flag with a new initializer so that we know exactly the type of region to which we're applying this behavior. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com>	2016-10-31 09:53:03 -06:00
Cao jin	893bfc3cc8	vfio: fix duplicate function call When vfio device is reset(encounter FLR, or bus reset), if need to do bus reset(vfio_pci_hot_reset_one is called), vfio_pci_pre_reset & vfio_pci_post_reset will be called twice. Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-10-17 10:58:03 -06:00
Thorsten Kohfeldt	31e6a7b17b	vfio/pci: Fix vfio_rtl8168_quirk_data_read address offset Introductory comment for rtl8168 VFIO MSI-X quirk states: At BAR2 offset 0x70 there is a dword data register, offset 0x74 is a dword address register. vfio: vfio_bar_read(0000:05:00.0:BAR2+0x70, 4) = 0xfee00398 // read data Thus, correct offset for data read is 0x70, but function vfio_rtl8168_quirk_data_read() wrongfully uses offset 0x74. Signed-off-by: Thorsten Kohfeldt <thorsten.kohfeldt@gmx.de> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-10-17 10:58:02 -06:00
Eric Auger	4a94626850	vfio/pci: Handle host oversight In case the end-user calls qemu with -vfio-pci option without passing either sysfsdev or host property value, the device is interpreted as 0000:00:00.0. Let's create a specific error message to guide the end-user. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-10-17 10:58:02 -06:00
Eric Auger	e04cff9d97	vfio/pci: Remove vfio_populate_device returned value The returned value (either -errno or -1) is not used anymore by the caller, vfio_realize, since the error now is stored in the error object. So let's remove it. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-10-17 10:58:02 -06:00
Eric Auger	ec3bcf424e	vfio/pci: Remove vfio_msix_early_setup returned value The returned value is not used anymore by the caller, vfio_realize, since the error now is stored in the error object. So let's remove it. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-10-17 10:58:01 -06:00
Eric Auger	1a22aca1d0	vfio/pci: Conversion to realize This patch converts VFIO PCI to realize function. Also original initfn errors now are propagated using QEMU error objects. All errors are formatted with the same pattern: "vfio: %s: the error description" Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-10-17 10:58:01 -06:00
Eric Auger	9bdbfbd50d	vfio/platform: Pass an error object to vfio_base_device_init This patch propagates errors encountered during vfio_base_device_init up to the realize function. In case the host value is not set or badly formed we now report an error. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-10-17 10:58:01 -06:00
Eric Auger	0d84f47bff	vfio/platform: fix a wrong returned value in vfio_populate_device In case the vfio_init_intp fails we currently do not return an error value. This patch fixes the bug. The returned value is not explicit but in practice the error object is the one used to report the error to the end-user and the actual returned error value is not used. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-10-17 10:58:00 -06:00
Eric Auger	5ff7419d4c	vfio/platform: Pass an error object to vfio_populate_device Propagate the vfio_populate_device errors up to vfio_base_device_init. The error object also is passed to vfio_init_intp. At the moment we only report the error. Subsequent patches will propagate the error up to the realize function. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-10-17 10:58:00 -06:00
Eric Auger	59f7d6743c	vfio: Pass an error object to vfio_get_device Pass an error object to prepare for migration to VFIO-PCI realize. In vfio platform vfio_base_device_init we currently just report the error. Subsequent patches will propagate the error up to the realize function. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-10-17 10:58:00 -06:00
Eric Auger	1b808d5be0	vfio: Pass an error object to vfio_get_group Pass an error object to prepare for migration to VFIO-PCI realize. For the time being let's just simply report the error in vfio platform's vfio_base_device_init(). A subsequent patch will duly propagate the error up to vfio_platform_realize. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-10-17 10:57:59 -06:00
Eric Auger	01905f58f1	vfio: Pass an Error object to vfio_connect_container The error is currently simply reported in vfio_get_group. Don't bother too much with the prefix which will be handled at upper level, later on. Also return an error value in case container->error is not 0 and the container is teared down. On vfio_spapr_remove_window failure, we also report an error whereas it was silent before. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-10-17 10:57:59 -06:00
Eric Auger	7237011d05	vfio/pci: Pass an error object to vfio_pci_igd_opregion_init Pass an error object to prepare for migration to VFIO-PCI realize. In vfio_probe_igd_bar4_quirk, simply report the error. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-10-17 10:57:59 -06:00
Eric Auger	7ef165b9a8	vfio/pci: Pass an error object to vfio_add_capabilities Pass an error object to prepare for migration to VFIO-PCI realize. The error is cascaded downto vfio_add_std_cap and then vfio_msi(x)_setup, vfio_setup_pcie_cap. vfio_add_ext_cap does not return anything else than 0 so let's transform it into a void function. Also use pci_add_capability2 which takes an error object. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-10-17 10:57:58 -06:00
Eric Auger	7dfb34247e	vfio/pci: Pass an error object to vfio_intx_enable Pass an error object to prepare for migration to VFIO-PCI realize. The error object is propagated down to vfio_intx_enable_kvm(). The three other callers, vfio_intx_enable_kvm(), vfio_msi_disable_common() and vfio_pci_post_reset() do not propagate the error and simply call error_reportf_err() with the ERR_PREFIX formatting. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-10-17 10:57:58 -06:00
Eric Auger	008d0e2d7b	vfio/pci: Pass an error object to vfio_msix_early_setup Pass an error object to prepare for migration to VFIO-PCI realize. The returned value will be removed later on. We now format an error in case of reading failure for - the MSIX flags - the MSIX table, - the MSIX PBA. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-10-17 10:57:58 -06:00
Eric Auger	2312d907dd	vfio/pci: Pass an error object to vfio_populate_device Pass an error object to prepare for migration to VFIO-PCI realize. The returned value will be removed later on. The case where error recovery cannot be enabled is not converted into an error object but directly reported through error_report, as before. Populating an error instead would cause the future realize function to fail, which is not wanted. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-10-17 10:57:57 -06:00
Eric Auger	cde4279baa	vfio/pci: Pass an error object to vfio_populate_vga Pass an error object to prepare for the same operation in vfio_populate_device. Eventually this contributes to the migration to VFIO-PCI realize. We now report an error on vfio_get_region_info failure. vfio_probe_igd_bar4_quirk is not involved in the migration to realize and simply calls error_reportf_err. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-10-17 10:57:57 -06:00
Eric Auger	426ec9049e	vfio/pci: Use local error object in vfio_initfn To prepare for migration to realize, let's use a local error object in vfio_initfn. Also let's use the same error prefix for all error messages. On top of the 1-1 conversion, we start using a common error prefix for all error messages. We also introduce a similar warning prefix which will be used later on. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-10-17 10:57:56 -06:00
Peter Xu	cdb3081269	memory: introduce IOMMUNotifier and its caps IOMMU Notifier list is used for notifying IO address mapping changes. Currently VFIO is the only user. However it is possible that future consumer like vhost would like to only listen to part of its notifications (e.g., cache invalidations). This patch introduced IOMMUNotifier and IOMMUNotfierFlag bits for a finer grained control of it. IOMMUNotifier contains a bitfield for the notify consumer describing what kind of notification it is interested in. Currently two kinds of notifications are defined: - IOMMU_NOTIFIER_MAP: for newly mapped entries (additions) - IOMMU_NOTIFIER_UNMAP: for entries to be removed (cache invalidates) When registering the IOMMU notifier, we need to specify one or multiple types of messages to listen to. When notifications are triggered, its type will be checked against the notifier's type bits, and only notifiers with registered bits will be notified. (For any IOMMU implementation, an in-place mapping change should be notified with an UNMAP followed by a MAP.) Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <1474606948-14391-2-git-send-email-peterx@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-09-27 08:59:16 +02:00
David Gibson	6d17a018d0	vfio/pci: Fix regression in MSI routing configuration `d1f6af6` "kvm-irqchip: simplify kvm_irqchip_add_msi_route" was a cleanup of kvmchip routing configuration, that was mostly intended for x86. However, it also contains a subtle change in behaviour which breaks EEH[1] error recovery on certain VFIO passthrough devices on spapr guests. So far it's only been seen on a BCM5719 NIC on a POWER8 server, but there may be other hardware with the same problem. It's also possible there could be circumstances where it causes a bug on x86 as well, though I don't know of any obvious candidates. Prior to `d1f6af6`, both vfio_msix_vector_do_use() and vfio_add_kvm_msi_virq() used msg == NULL as a special flag to mark this as the "dummy" vector used to make the host hardware state sync with the guest expected hardware state in terms of MSI configuration. Specifically that flag caused vfio_add_kvm_msi_virq() to become a no-op, meaning the dummy irq would always be delivered via qemu. `d1f6af6` changed vfio_add_kvm_msi_virq() so it takes a vector number instead of the msg parameter, and determines the correct message itself. The test for !msg was removed, and not replaced with anything there or in the caller. With an spapr guest which has a VFIO device, if an EEH error occurs on the host hardware, then the device will be isolated then reset. This is a combination of host and guest action, mediated by some EEH related hypercalls. I haven't fully traced the mechanics, but somehow installing the kvm irqchip route for the dummy irq on the BCM5719 means that after EEH reset and recovery, at least some irqs are no longer delivered to the guest. In particular, the guest never gets the link up event, and so the NIC is effectively dead. [1] EEH (Enhanced Error Handling) is an IBM POWER server specific PCI-* error reporting and recovery mechanism. The concept is somewhat similar to PCI-E AER, but the details are different. Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1373802 Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Peter Xu <peterx@redhat.com> Cc: Gavin Shan <gwshan@au1.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Cc: qemu-stable@nongnu.org Fixes: `d1f6af6a17` ("kvm-irqchip: simplify kvm_irqchip_add_msi_route") Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-09-15 10:41:36 -06:00
Laurent Vivier	e723b87103	trace-events: fix first line comment in trace-events Documentation is docs/tracing.txt instead of docs/trace-events.txt. find . -name trace-events -exec \ sed -i "s?See docs/trace-events.txt for syntax documentation.?See docs/tracing.txt for syntax documentation.?" \ {} \; Signed-off-by: Laurent Vivier <lvivier@redhat.com> Message-id: 1470669081-17860-1-git-send-email-lvivier@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-08-12 10:36:01 +01:00
Markus Armbruster	fea1c0999a	vfio: Use error_report() instead of error_printf() for errors Cc: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <1470224274-31522-4-git-send-email-armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2016-08-08 09:01:18 +02:00
Peter Xu	3f1fea0fb5	kvm-irqchip: do explicit commit when update irq In the past, we are doing gsi route commit for each irqchip route update. This is not efficient if we are updating lots of routes in the same time. This patch removes the committing phase in kvm_irqchip_update_msi_route(). Instead, we do explicit commit after all routes updated. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2016-07-21 20:44:19 +03:00
Peter Xu	d1f6af6a17	kvm-irqchip: simplify kvm_irqchip_add_msi_route Changing the original MSIMessage parameter in kvm_irqchip_add_msi_route into the vector number. Vector index provides more information than the MSIMessage, we can retrieve the MSIMessage using the vector easily. This will avoid fetching MSIMessage every time before adding MSI routes. Meanwhile, the vector info will be used in the coming patches to further enable gsi route update notifications. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2016-07-21 20:44:18 +03:00
Alex Williamson	383a7af7ec	vfio/pci: Hide ARI capability QEMU supports ARI on downstream ports and assigned devices may support ARI in their extended capabilities. The endpoint ARI capability specifies the next function, such that the OS doesn't need to walk each possible function, however this next function is relative to the host, not the guest. This leads to device discovery issues when we combine separate functions into virtual multi-function packages in a guest. For example, SR-IOV VFs are not enumerated by simply probing the function address space, therefore the ARI next-function field is zero. When we combine multiple VFs together as a multi-function device in the guest, the guest OS identifies ARI is enabled, relies on this next-function field, and stops looking for additional function after the first is found. Long term we should expose the ARI capability to the guest to enable configurations with more than 8 functions per slot, but this requires additional QEMU PCI infrastructure to manage the next-function field for multiple, otherwise independent devices. In the short term, hiding this capability allows equivalent functionality to what we currently have on non-express chipsets. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>	2016-07-18 10:55:17 -06:00
David Gibson	21bb3093e6	vfio/spapr: Remove stale ioctl() call This ioctl() call to VFIO_IOMMU_SPAPR_TCE_REMOVE was left over from an earlier version of the code and has since been folded into vfio_spapr_remove_window(). It wasn't caught because although the argument structure has been removed, the libc function remove() means this didn't trigger a compile failure. The ioctl() was also almost certain to fail silently and harmlessly with the bogus argument, so this wasn't caught in testing. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>	2016-07-18 10:40:27 +10:00
Markus Armbruster	a9c94277f0	Use #include "..." for our own headers, <...> for others Tracked down with an ugly, brittle and probably buggy Perl script. Also move includes converted to <...> up so they get included before ours where that's obviously okay. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Tested-by: Eric Blake <eblake@redhat.com> Reviewed-by: Richard Henderson <rth@twiddle.net>	2016-07-12 16:19:16 +02:00
Peter Maydell	791b7d2340	pc, pci, virtio: new features, cleanups, fixes iommus can not be added with -device. cleanups and fixes all over the place Signed-off-by: Michael S. Tsirkin <mst@redhat.com> -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJXe4l4AAoJECgfDbjSjVRpIz4IALye7mKG61/POA4Gqmhalc3d HnlNSZ2YcKAuvPg7WWkBuRacrQvVY/MbW1mLloG1lY0tdFgZG8Cy+CY6wJg1NE4c cXd+77vHkIyrnl+Nil+QOgTFiAsMnD+mXHHsnCDw2jGn3JbgVNuCMi7V34fGkQd2 PDkZyYfwTqO3HytuG0/j2Somc9du1gjYdn+9qigfZVgP96jGDojBuJWuuU5flCB3 Kj5xrOuI01XlbdTk71tVjBJBektQurWr6r7GECDqZIpUfc+BI70FU9jPh+OlLTO/ 92yi29ncjyStz4tRnf18xoQ8uBgH/tD1xigEUPRtnm1+0i/tgONBL8cAdBF9FBE= =ABGE -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging pc, pci, virtio: new features, cleanups, fixes iommus can not be added with -device. cleanups and fixes all over the place Signed-off-by: Michael S. Tsirkin <mst@redhat.com> # gpg: Signature made Tue 05 Jul 2016 11:18:32 BST # gpg: using RSA key 0x281F0DB8D28D5469 # gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" # gpg: aka "Michael S. Tsirkin <mst@redhat.com>" # Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67 # Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469 * remotes/mst/tags/for_upstream: (30 commits) vmw_pvscsi: remove unnecessary internal msi state flag e1000e: remove unnecessary internal msi state flag vmxnet3: remove unnecessary internal msi state flag mptsas: remove unnecessary internal msi state flag megasas: remove unnecessary megasas_use_msi() pci: Convert msi_init() to Error and fix callers to check it pci bridge dev: change msi property type megasas: change msi/msix property type mptsas: change msi property type intel-hda: change msi property type usb xhci: change msi/msix property type change pvscsi_init_msi() type to void tests: add APIC.cphp and DSDT.cphp blobs tests: acpi: add CPU hotplug testcase log: Permit -dfilter 0..0xffffffffffffffff range: Replace internal representation of Range range: Eliminate direct Range member access log: Clean up misuse of Range for -dfilter pci_register_bar: cleanup Revert "virtio-net: unbreak self announcement and guest offloads after migration" ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2016-07-05 16:48:24 +01:00
Cao jin	1108b2f8a9	pci: Convert msi_init() to Error and fix callers to check it msi_init() reports errors with error_report(), which is wrong when it's used in realize(). Fix by converting it to Error. Fix its callers to handle failure instead of ignoring it. For those callers who don't handle the failure, it might happen: when user want msi on, but he doesn't get what he want because of msi_init fails silently. cc: Gerd Hoffmann <kraxel@redhat.com> cc: John Snow <jsnow@redhat.com> cc: Dmitry Fleytman <dmitry@daynix.com> cc: Jason Wang <jasowang@redhat.com> cc: Michael S. Tsirkin <mst@redhat.com> cc: Hannes Reinecke <hare@suse.de> cc: Paolo Bonzini <pbonzini@redhat.com> cc: Alex Williamson <alex.williamson@redhat.com> cc: Markus Armbruster <armbru@redhat.com> cc: Marcel Apfelbaum <marcel@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Hannes Reinecke <hare@suse.com>	2016-07-05 13:14:41 +03:00
Alexey Kardashevskiy	2e4109de8e	vfio/spapr: Create DMA window dynamically (SPAPR IOMMU v2) New VFIO_SPAPR_TCE_v2_IOMMU type supports dynamic DMA window management. This adds ability to VFIO common code to dynamically allocate/remove DMA windows in the host kernel when new VFIO container is added/removed. This adds a helper to vfio_listener_region_add which makes VFIO_IOMMU_SPAPR_TCE_CREATE ioctl and adds just created IOMMU into the host IOMMU list; the opposite action is taken in vfio_listener_region_del. When creating a new window, this uses heuristic to decide on the TCE table levels number. This should cause no guest visible change in behavior. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> [dwg: Added some casts to prevent printf() warnings on certain targets where the kernel headers' __u64 doesn't match uint64_t or PRIx64] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2016-07-05 14:31:08 +10:00
Alexey Kardashevskiy	f4ec5e26ed	vfio: Add host side DMA window capabilities There are going to be multiple IOMMUs per a container. This moves the single host IOMMU parameter set to a list of VFIOHostDMAWindow. This should cause no behavioral change and will be used later by the SPAPR TCE IOMMU v2 which will also add a vfio_host_win_del() helper. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2016-07-05 14:31:08 +10:00
Alexey Kardashevskiy	318f67ce13	vfio: spapr: Add DMA memory preregistering (SPAPR IOMMU v2) This makes use of the new "memory registering" feature. The idea is to provide the userspace ability to notify the host kernel about pages which are going to be used for DMA. Having this information, the host kernel can pin them all once per user process, do locked pages accounting (once) and not spent time on doing that in real time with possible failures which cannot be handled nicely in some cases. This adds a prereg memory listener which listens on address_space_memory and notifies a VFIO container about memory which needs to be pinned/unpinned. VFIO MMIO regions (i.e. "skip dump" regions) are skipped. The feature is only enabled for SPAPR IOMMU v2. The host kernel changes are required. Since v2 does not need/support VFIO_IOMMU_ENABLE, this does not call it when v2 is detected and enabled. This enforces guest RAM blocks to be host page size aligned; however this is not new as KVM already requires memory slots to be host page size aligned. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> [dwg: Fix compile error on 32-bit host] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2016-07-05 14:30:54 +10:00
Alexey Kardashevskiy	d22d8956b1	memory: Add MemoryRegionIOMMUOps.notify_started/stopped callbacks The IOMMU driver may change behavior depending on whether a notifier client is present. In the case of POWER, this represents a change in the visibility of the IOTLB, for other drivers such as intel-iommu and future AMD-Vi emulation, notifier support is not yet enabled and this provides the opportunity to flag that incompatibility. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Peter Xu <peterx@redhat.com> Tested-by: Peter Xu <peterx@redhat.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> [new log & extracted from [PATCH qemu v17 12/12] spapr_iommu, vfio, memory: Notify IOMMU about starting/stopping listening] Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-06-30 13:00:23 -06:00
Alex Williamson	e37dac06dc	vfio/pci: Hide SR-IOV capability The kernel currently exposes the SR-IOV capability as read-only through vfio-pci. This is sufficient to protect the host kernel, but has the potential to confuse guests without further virtualization. In particular, OVMF tries to size the VF BARs and comes up with absurd results, ending with an assert. There's not much point in adding virtualization to a read-only capability, so we simply hide it for now. If the kernel ever enables SR-IOV virtualization, we should easily be able to test it through VF BAR sizing or explicit flags. Testing whether we should parse extended capabilities is also pulled into the function to keep these assumptions in one place. Tested-by: Laszlo Ersek <lersek@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-06-30 13:00:23 -06:00
Chen Fan	325ae8d548	vfio: add pcie extended capability support For vfio pcie device, we could expose the extended capability on PCIE bus. due to add a new pcie capability at the tail of the chain, in order to avoid config space overwritten, we introduce a copy config for parsing extended caps. and rebuild the pcie extended config space. Signed-off-by: Chen Fan <chen.fan.fnst@cn.fujitsu.com> Tested-by: Laszlo Ersek <lersek@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-06-30 13:00:23 -06:00
Alex Williamson	4d3fc4fdc6	vfio/pci: Fix VGA quirks Commit `2d82f8a3cd` ("vfio/pci: Convert all MemoryRegion to dynamic alloc and consistent functions") converted VFIOPCIDevice.vga to be dynamically allocted, negating the need for VFIOPCIDevice.has_vga. Unfortunately not all of the has_vga users were converted, nor was the field removed from the structure. Correct these oversights. Reported-by: Peter Maloney <peter.maloney@brockmann-consult.de> Tested-by: Peter Maloney <peter.maloney@brockmann-consult.de> Fixes: `2d82f8a3cd` ("vfio/pci: Convert all MemoryRegion to dynamic alloc and consistent functions") Fixes: https://bugs.launchpad.net/qemu/+bug/1591628 Cc: qemu-stable@nongnu.org Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-06-30 13:00:22 -06:00
Alexey Kardashevskiy	f682e9c244	memory: Add reporting of supported page sizes Every IOMMU has some granularity which MemoryRegionIOMMUOps::translate uses when translating, however this information is not available outside the translate context for various checks. This adds a get_min_page_size callback to MemoryRegionIOMMUOps and a wrapper for it so IOMMU users (such as VFIO) can know the minimum actual page size supported by an IOMMU. As IOMMU MR represents a guest IOMMU, this uses TARGET_PAGE_SIZE as fallback. This removes vfio_container_granularity() and uses new helper in memory_region_iommu_replay() when replaying IOMMU mappings on added IOMMU memory region. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Acked-by: Alex Williamson <alex.williamson@redhat.com> [dwg: Removed an unnecessary calculation] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2016-06-22 11:13:09 +10:00

1 2 3 4 5 ...

264 Commits