mirrors/qemu - qemu - SynapseOS git

Author	SHA1	Message	Date
Yi Liu	44ee6aaae0	vfio/common: Rename VFIOGuestIOMMU::iommu into ::iommu_mr Rename VFIOGuestIOMMU iommu field into iommu_mr. Then it becomes clearer it is an IOMMU memory region. no functional change intended Signed-off-by: Yi Liu <yi.l.liu@intel.com> Link: https://lore.kernel.org/r/20220502094223.36384-4-yi.l.liu@intel.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-05-06 09:06:51 -06:00
Eric Auger	851d6d1a0f	vfio/common: remove spurious tpm-crb-cmd misalignment warning The CRB command buffer currently is a RAM MemoryRegion and given its base address alignment, it causes an error report on vfio_listener_region_add(). This region could have been a RAM device region, easing the detection of such safe situation but this option was not well received. So let's add a helper function that uses the memory region owner type to detect the situation is safe wrt the assignment. Other device types can be checked here if such kind of problem occurs again. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Acked-by: Stefan Berger <stefanb@linux.ibm.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Link: https://lore.kernel.org/r/20220506132510.1847942-3-eric.auger@redhat.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-05-06 09:06:50 -06:00
Xiang Chen	99510d271b	vfio/common: Fix a small boundary issue of a trace It uses [offset, offset + size - 1] to indicate that the length of range is size in most places in vfio trace code (such as trace_vfio_region_region_mmap()) except trace_vfio_region_sparse_mmap_entry(). So change it for trace_vfio_region_sparse_mmap_entry(), but if size is zero, the trace will be weird with an underflow, so move the trace and trace it only if size is not zero. Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Link: https://lore.kernel.org/r/1650100104-130737-1-git-send-email-chenxiang66@hisilicon.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-05-06 09:06:50 -06:00
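The inclusive-range convention and the zero-size guard described in 99510d271b can be illustrated with a small sketch. `format_inclusive_range` is a hypothetical helper written for this illustration, not a QEMU trace function.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper (not QEMU code): format the inclusive range
 * [offset, offset + size - 1] into buf and return the number of
 * characters written. Returns 0 and leaves buf untouched when size is
 * zero, because "offset + size - 1" would wrap below offset.
 */
static int format_inclusive_range(char *buf, size_t len,
                                  uint64_t offset, uint64_t size)
{
    if (size == 0) {
        return 0;
    }
    return snprintf(buf, len, "[0x%" PRIx64 " - 0x%" PRIx64 "]",
                    offset, offset + size - 1);
}
```

With offset 0x1000 and size 0x1000 this yields `[0x1000 - 0x1fff]`, so the printed span covers exactly `size` bytes.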
Marc-André Lureau	8e3b0cbb72	Replace qemu_real_host_page variables with inlined functions Replace the global variables with inlined helper functions. getpagesize() is very likely annotated with a "const" function attribute (at least with glibc), and thus optimization should apply even better. This avoids the need for a constructor initialization too. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20220323155743.1585078-12-marcandre.lureau@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-04-06 10:50:38 +02:00
Peng Liang	f3bc3a73c9	vfio: Fix memory leak of hostwin hostwin is allocated and added to hostwin_list in vfio_host_win_add, but it is only deleted from hostwin_list in vfio_host_win_del, which causes a memory leak. Also, freeing all elements in hostwin_list is missing in vfio_disconnect_container. Fix: `2e4109de8e` ("vfio/spapr: Create DMA window dynamically (SPAPR IOMMU v2)") CC: qemu-stable@nongnu.org Signed-off-by: Peng Liang <liangpeng10@huawei.com> Link: https://lore.kernel.org/r/20211117014739.1839263-1-liangpeng10@huawei.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2021-11-17 11:25:55 -07:00
Kunkun Jiang	e4b3470838	vfio/common: Add a trace point when a MMIO RAM section cannot be mapped The MSI-X structures of some devices and other non-MSI-X structures may be in the same BAR. They may share one host page, especially in the case of large page granularity, such as 64K. For example, MSIX-Table size of 82599 NIC is 0x30 and the offset in Bar 3(size 64KB) is 0x0. vfio_listener_region_add() will be called to map the remaining range (0x30-0xffff). If host page size is 64KB, it will return early at 'int128_ge((int128_make64(iova), llend))' without any message. Let's add a trace point to inform users like commit `5c08600547` ("vfio: Use a trace point when a RAM section cannot be DMA mapped") did. Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com> Link: https://lore.kernel.org/r/20211027090406.761-3-jiangkunkun@huawei.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2021-11-01 12:17:51 -06:00
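The early return described in e4b3470838 can be reproduced with plain integer arithmetic. The sketch below is an approximation under stated assumptions: the helper names are hypothetical, `end` is exclusive, `page_size` is a power of two, and 64-bit values stand in for the Int128 type the real listener uses.

```c
#include <stdbool.h>
#include <stdint.h>

/* Round addr up to the next multiple of page_size (a power of two). */
static uint64_t page_align_up(uint64_t addr, uint64_t page_size)
{
    return (addr + page_size - 1) & ~(page_size - 1);
}

/* Round addr down to a multiple of page_size (a power of two). */
static uint64_t page_align_down(uint64_t addr, uint64_t page_size)
{
    return addr & ~(page_size - 1);
}

/*
 * Hypothetical check: true when [start, end) still contains at least one
 * whole host page after the start is aligned up and the end aligned down.
 */
static bool section_is_mappable(uint64_t start, uint64_t end,
                                uint64_t page_size)
{
    uint64_t iova = page_align_up(start, page_size);
    uint64_t llend = page_align_down(end, page_size);

    return iova < llend;
}
```

For the 82599 example, the range 0x30-0xffff with a 64 KiB host page aligns to an empty interval (both bounds become 0x10000), while the same range with 4 KiB pages leaves 0x1000-0xffff mappable.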
Peter Xu	142518bda5	memory: Name all the memory listeners Provide a name field for all the memory listeners. It can be used to identify which memory listener is which. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Message-Id: <20210817013553.30584-2-peterx@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-09-30 15:30:24 +02:00
Sean Christopherson	56918a126a	memory: Add RAM_PROTECTED flag to skip IOMMU mappings Add a new RAMBlock flag to denote "protected" memory, i.e. memory that looks and acts like RAM but is inaccessible via normal mechanisms, including DMA. Use the flag to skip protected memory regions when mapping RAM for DMA in VFIO. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Yang Zhong <yang.zhong@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-09-30 14:50:19 +02:00
David Hildenbrand	a5dba9bc05	vfio: Fix CID 1458134 in vfio_register_ram_discard_listener() CID 1458134: Integer handling issues (BAD_SHIFT) In expression "1 << ctz64(container->pgsizes)", left shifting by more than 31 bits has undefined behavior. The shift amount, "ctz64(container->pgsizes)", is 64. Commit `5e3b981c33` ("vfio: Support for RamDiscardManager in the !vIOMMU case") added an assertion that our granularity is at least as big as the page size. Although unlikely, we could have a page size that does not fit into 32 bit. In that case, we'd try shifting by more than 31 bit. Let's use 1ULL instead and make sure we're not shifting by more than 63 bit by asserting that any bit in container->pgsizes is set. Fixes: CID 1458134 Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: Peter Xu <peterx@redhat.com> Cc: Auger Eric <eric.auger@redhat.com> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: teawater <teawaterz@linux.alibaba.com> Cc: Marek Kedzierski <mkedzier@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Pankaj Gupta <pankaj.gupta@ionos.com> Link: https://lore.kernel.org/r/20210712083135.15755-1-david@redhat.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2021-07-14 13:47:17 -06:00
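The fix described in a5dba9bc05 (use `1ULL` and assert that at least one bit is set) can be sketched as follows. `ctz64_sketch` and `min_page_size` are hypothetical names for this illustration; the first only mimics the behaviour the message attributes to `ctz64()`, namely returning 64 for a zero input.

```c
#include <assert.h>
#include <stdint.h>

/* Count trailing zero bits; returns 64 when val is 0. */
static int ctz64_sketch(uint64_t val)
{
    int n = 0;

    if (val == 0) {
        return 64;
    }
    while ((val & 1) == 0) {
        val >>= 1;
        n++;
    }
    return n;
}

/*
 * Smallest page size in a page-size bitmask. The assertion keeps the
 * shift amount at 63 or below, and 1ULL makes the shifted operand
 * 64 bits wide, so shifts between 32 and 63 are well defined.
 */
static uint64_t min_page_size(uint64_t pgsizes)
{
    assert(pgsizes != 0);
    return 1ULL << ctz64_sketch(pgsizes);
}
```

With a plain `1` the operand is an `int`, so any shift of 32 or more would be undefined even though the mask itself is 64 bits wide.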
David Hildenbrand	53d1b5fcfb	vfio: Disable only uncoordinated discards for VFIO_TYPE1 iommus We support coordinated discarding of RAM using the RamDiscardManager for the VFIO_TYPE1 iommus. Let's unlock support for coordinated discards, keeping uncoordinated discards (e.g., via virtio-balloon) disabled if possible. This unlocks virtio-mem + vfio on x86-64. Note that vfio used via "nvme://" by the block layer has to be implemented/unlocked separately. For now, virtio-mem only supports x86-64; we don't restrict RamDiscardManager to x86-64, though: arm64 and s390x are supposed to work as well, and we'll test once unlocking virtio-mem support. The spapr IOMMUs will need special care, to be tackled later, e.g., once supporting virtio-mem. Note: The block size of a virtio-mem device has to be set to sane sizes, depending on the maximum hotplug size - to not run out of vfio mappings. The default virtio-mem block size is usually in the range of a couple of MBs. The maximum number of mapping is 64k, shared with other users. Assume you want to hotplug 256GB using virtio-mem - the block size would have to be set to at least 8 MiB (resulting in 32768 separate mappings). Acked-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: Peter Xu <peterx@redhat.com> Cc: Auger Eric <eric.auger@redhat.com> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: teawater <teawaterz@linux.alibaba.com> Cc: Marek Kedzierski <mkedzier@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20210413095531.25603-14-david@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2021-07-08 15:54:45 -04:00
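The block-size arithmetic in 53d1b5fcfb can be checked with a few lines of C. This is a worked example only: `worst_case_mappings` is a hypothetical helper, "256GB" is read as 256 GiB, and every virtio-mem block is assumed to end up as its own DMA mapping.

```c
#include <assert.h>
#include <stdint.h>

#define MiB (1024ULL * 1024ULL)
#define GiB (1024ULL * MiB)

/*
 * Worst-case number of separate mappings when every block of
 * block_bytes is mapped individually (rounded up).
 */
static uint64_t worst_case_mappings(uint64_t hotplug_bytes,
                                    uint64_t block_bytes)
{
    assert(block_bytes != 0);
    return (hotplug_bytes + block_bytes - 1) / block_bytes;
}
```

`worst_case_mappings(256 * GiB, 8 * MiB)` gives 32768, matching the figure in the message and leaving half of the 64k limit for other users. A 4 MiB block size would give 65536 and consume the whole limit.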
David Hildenbrand	0fd7616e0f	vfio: Support for RamDiscardManager in the vIOMMU case vIOMMU support works already with RamDiscardManager as long as guests only map populated memory. Both, populated and discarded memory is mapped into &address_space_memory, where vfio_get_xlat_addr() will find that memory, to create the vfio mapping. Sane guests will never map discarded memory (e.g., unplugged memory blocks in virtio-mem) into an IOMMU - or keep it mapped into an IOMMU while memory is getting discarded. However, there are two cases where a malicious guest could trigger pinning of more memory than intended. One case is easy to handle: the guest trying to map discarded memory into an IOMMU. The other case is harder to handle: the guest keeping memory mapped in the IOMMU while it is getting discarded. We would have to walk over all mappings when discarding memory and identify if any mapping would be a violation. Let's keep it simple for now and print a warning, indicating that setting RLIMIT_MEMLOCK can mitigate such attacks. We have to take care of incoming migration: at the point the IOMMUs get restored and start creating mappings in vfio, RamDiscardManager implementations might not be back up and running yet: let's add runstate priorities to enforce the order when restoring. Acked-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: Peter Xu <peterx@redhat.com> Cc: Auger Eric <eric.auger@redhat.com> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: teawater <teawaterz@linux.alibaba.com> Cc: Marek Kedzierski <mkedzier@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20210413095531.25603-10-david@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2021-07-08 15:54:45 -04:00
David Hildenbrand	a74317f636	vfio: Sanity check maximum number of DMA mappings with RamDiscardManager Although RamDiscardManager can handle running into the maximum number of DMA mappings by propagating errors when creating a DMA mapping, we want to sanity check and warn the user early that there is a theoretical setup issue and that virtio-mem might not be able to provide as much memory towards a VM as desired. As suggested by Alex, let's use the number of KVM memory slots to guess how many other mappings we might see over time. Acked-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: Peter Xu <peterx@redhat.com> Cc: Auger Eric <eric.auger@redhat.com> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: teawater <teawaterz@linux.alibaba.com> Cc: Marek Kedzierski <mkedzier@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20210413095531.25603-9-david@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2021-07-08 15:54:45 -04:00
David Hildenbrand	3eed155caf	vfio: Query and store the maximum number of possible DMA mappings Let's query the maximum number of possible DMA mappings by querying the available mappings when creating the container (before any mappings are created). We'll use this information soon to perform some sanity checks and warn the user. Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: Peter Xu <peterx@redhat.com> Cc: Auger Eric <eric.auger@redhat.com> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: teawater <teawaterz@linux.alibaba.com> Cc: Marek Kedzierski <mkedzier@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20210413095531.25603-8-david@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2021-07-08 15:54:45 -04:00
David Hildenbrand	5e3b981c33	vfio: Support for RamDiscardManager in the !vIOMMU case Implement support for RamDiscardManager, to prepare for virtio-mem support. Instead of mapping the whole memory section, we only map "populated" parts and update the mapping when notified about discarding/population of memory via the RamDiscardListener. Similarly, when syncing the dirty bitmaps, sync only the actually mapped (populated) parts by replaying via the notifier. Using virtio-mem with vfio is still blocked via ram_block_discard_disable()/ram_block_discard_require() after this patch. Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: Peter Xu <peterx@redhat.com> Cc: Auger Eric <eric.auger@redhat.com> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: teawater <teawaterz@linux.alibaba.com> Cc: Marek Kedzierski <mkedzier@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20210413095531.25603-7-david@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2021-07-08 15:54:45 -04:00
Keqian Zhu	758b96b61d	vfio/migrate: Move switch of dirty tracking into vfio_memory_listener For now the switch of vfio dirty page tracking is integrated into @vfio_save_handler. The reason is that some PCI vendor driver may start to track dirty base on _SAVING state of device, so if dirty tracking is started before setting device state, vfio will report full-dirty to QEMU. However, the dirty bmap of all ramblocks are fully set when setup ram saving, so it's not matter whether the device is in _SAVING state when start vfio dirty tracking. Moreover, this logic causes some problems [1]. The object of dirty tracking is guest memory, but the object of @vfio_save_handler is device state, which produces unnecessary coupling and conflicts: 1. Coupling: Their saving granule is different (perVM vs perDevice). vfio will enable dirty_page_tracking for each devices, actually once is enough. 2. Conflicts: The ram_save_setup() traverses all memory_listeners to execute their log_start() and log_sync() hooks to get the first round dirty bitmap, which is used by the bulk stage of ram saving. However, as vfio dirty tracking is not yet started, it can't get dirty bitmap from vfio. Then we give up the chance to handle vfio dirty page at bulk stage. Move the switch of vfio dirty_page_tracking into vfio_memory_listener can solve above problems. Besides, Do not require devices in SAVING state for vfio_sync_dirty_bitmap(). [1] https://www.spinics.net/lists/kvm/msg229967.html Reported-by: Zenghui Yu <yuzenghui@huawei.com> Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com> Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20210309031913.11508-1-zhukeqian1@huawei.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2021-03-16 10:06:44 -06:00
Kunkun Jiang	1eb7f64275	vfio: Support host translation granule size The cpu_physical_memory_set_dirty_lebitmap() can quickly deal with the dirty pages of memory by bitmap-traveling, regardless of whether the bitmap is aligned correctly or not. cpu_physical_memory_set_dirty_lebitmap() supports pages in bitmap of host page size. So it'd better to set bitmap_pgsize to host page size to support more translation granule sizes. [aw: The Fixes commit below introduced code to restrict migration support to configurations where the target page size intersects the host dirty page support. For example, a 4K guest on a 4K host. Due to the above flexibility in bitmap handling, this restriction unnecessarily prevents mixed target/host pages size that could otherwise be supported. Use host page size for dirty bitmap.] Fixes: `87ea529c50` ("vfio: Get migration capability flags for container") Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com> Message-Id: <20210304133446.1521-1-jiangkunkun@huawei.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2021-03-16 10:06:44 -06:00
Eric Auger	8dca037b48	vfio: Do not register any IOMMU_NOTIFIER_DEVIOTLB_UNMAP notifier In an attempt to fix smmu/virtio-iommu - vhost regression, commit `958ec334bc` ("vhost: Unbreak SMMU and virtio-iommu on dev-iotlb support") broke virtio-iommu integration. This is due to the fact VFIO registers IOMMU_NOTIFIER_ALL notifiers, which includes IOMMU_NOTIFIER_DEVIOTLB_UNMAP and this latter now is rejected by the virtio-iommu. As a consequence, the registration fails. VHOST behaves like a device with an ATC cache. The VFIO device does not support this scheme yet. Let's register only legacy MAP and UNMAP notifiers. Fixes: `958ec334bc` ("vhost: Unbreak SMMU and virtio-iommu on dev-iotlb support") Signed-off-by: Eric Auger <eric.auger@redhat.com> Message-Id: <20210209213233.40985-2-eric.auger@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2021-03-16 10:06:44 -06:00
Zenghui Yu	4292d50193	vfio: Fix vfio_listener_log_sync function name typo There is an obvious typo in the function name of the .log_sync() callback. Spell it correctly. Signed-off-by: Zenghui Yu <yuzenghui@huawei.com> Message-Id: <20201204014240.772-1-yuzenghui@huawei.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2021-03-16 10:06:44 -06:00
Kirti Wankhede	bb0990d174	vfio: Change default dirty pages tracking behavior during migration By default dirty pages tracking is enabled during iterative phase (pre-copy phase). Added per device opt-out option 'x-pre-copy-dirty-page-tracking' to disable dirty pages tracking during iterative phase. If the option 'x-pre-copy-dirty-page-tracking=off' is set for any VFIO device, dirty pages tracking during iterative phase will be disabled. Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2020-11-23 10:05:58 -07:00
Jean-Philippe Brucker	1b296c3def	vfio: Don't issue full 2^64 unmap IOMMUs may declare memory regions spanning from 0 to UINT64_MAX. When attempting to deal with such region, vfio_listener_region_del() passes a size of 2^64 to int128_get64() which throws an assertion failure. Even ignoring this, the VFIO_IOMMU_DMA_MAP ioctl cannot handle this size since the size field is 64-bit. Split the request in two. Acked-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Message-Id: <20201030180510.747225-11-jean-philippe@linaro.org> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2020-11-03 16:39:05 -05:00
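The "split the request in two" approach from 1b296c3def can be sketched without QEMU's Int128 type by describing the region with inclusive bounds. `struct unmap_req` and `split_unmap` are hypothetical names for this illustration, and the sketch assumes `first <= last`.

```c
#include <stdint.h>

/* One unmap request whose size fits in a 64-bit field. */
struct unmap_req {
    uint64_t iova;
    uint64_t size;
};

/*
 * Turn the inclusive range [first, last] into requests with 64-bit sizes.
 * Only the full [0, UINT64_MAX] range has a length of 2^64, which a
 * 64-bit size cannot hold, so it is split into two halves of 2^63 bytes.
 * Returns the number of requests written to out (1 or 2).
 */
static int split_unmap(uint64_t first, uint64_t last,
                       struct unmap_req out[2])
{
    if (first == 0 && last == UINT64_MAX) {
        out[0] = (struct unmap_req){ 0, 1ULL << 63 };
        out[1] = (struct unmap_req){ 1ULL << 63, 1ULL << 63 };
        return 2;
    }
    out[0] = (struct unmap_req){ first, last - first + 1 };
    return 1;
}
```

Every other range has a length of at most 2^64 - 1 and passes through as a single request.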
Bharat Bhushan	b917749842	vfio: Set IOMMU page size as per host supported page size Set IOMMU supported page size mask same as host Linux supported page size mask. Acked-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Bharat Bhushan <bbhushan2@marvell.com> Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Message-Id: <20201030180510.747225-9-jean-philippe@linaro.org> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2020-11-03 07:19:27 -05:00
Zhengui li	c624b6b312	vfio: fix incorrect print type The type of input variable is unsigned int while the printer type is int. So fix incorrect print type. Signed-off-by: Zhengui li <lizhengui@huawei.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2020-11-01 12:30:52 -07:00
Matthew Rosato	92fe289ace	vfio: Add routine for finding VFIO_DEVICE_GET_INFO capabilities Now that VFIO_DEVICE_GET_INFO supports capability chains, add a helper function to find specific capabilities in the chain. Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2020-11-01 12:30:52 -07:00
Matthew Rosato	7486a62845	vfio: Find DMA available capability The underlying host may be limiting the number of outstanding DMA requests for type 1 IOMMU. Add helper functions to check for the DMA available capability and retrieve the current number of DMA mappings allowed. Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> [aw: vfio_get_info_dma_avail moved inside CONFIG_LINUX] Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2020-11-01 12:30:51 -07:00
Matthew Rosato	3ab7a0b40d	vfio: Create shared routine for scanning info capabilities Rather than duplicating the same loop in multiple locations, create a static function to do the work. Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2020-11-01 12:30:51 -07:00
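A shared capability-scanning routine of the kind 3ab7a0b40d describes might look roughly like the sketch below. It is not the QEMU function: `struct cap_header` is a local stand-in whose layout (id, version, next offset, with 0 ending the chain) is an assumption modelled on the kernel's capability header, `find_cap_offset` is a hypothetical name, and the buffer is assumed to come from a trusted source with no cycles in the chain.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local stand-in for a capability header inside an info buffer. */
struct cap_header {
    uint16_t id;       /* capability identifier */
    uint16_t version;  /* capability version */
    uint32_t next;     /* offset of the next header, 0 ends the chain */
};

/*
 * Walk the chain that starts at cap_offset inside info and return the
 * byte offset of the first header with the requested id, or 0 if the
 * capability is absent or a header would fall outside the buffer.
 */
static uint32_t find_cap_offset(const uint8_t *info, size_t info_size,
                                uint32_t cap_offset, uint16_t id)
{
    uint32_t off = cap_offset;

    while (off != 0) {
        struct cap_header hdr;

        if ((size_t)off + sizeof(hdr) > info_size) {
            return 0;
        }
        /* memcpy avoids unaligned reads from the byte buffer. */
        memcpy(&hdr, info + off, sizeof(hdr));
        if (hdr.id == id) {
            return off;
        }
        off = hdr.next;
    }
    return 0;
}
```

Keeping the loop in one function lets each caller (region info, device info, IOMMU info) pass only its own starting offset and capability id.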
Kirti Wankhede	3710586caa	qapi: Add VFIO devices migration stats in Migration stats Added amount of bytes transferred to the VM at destination by all VFIO devices Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2020-11-01 12:30:51 -07:00
Kirti Wankhede	9e7b0442f2	vfio: Add ioctl to get dirty pages bitmap during dma unmap With vIOMMU, IO virtual address range can get unmapped while in pre-copy phase of migration. In that case, unmap ioctl should return pages pinned in that range and QEMU should find its corresponding guest physical addresses and report those dirty. Suggested-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Reviewed-by: Neo Jia <cjia@nvidia.com> [aw: fix error_report types, fix cpu_physical_memory_set_dirty_lebitmap() cast] Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2020-11-01 12:30:51 -07:00
Kirti Wankhede	9a04fe0957	vfio: Dirty page tracking when vIOMMU is enabled When vIOMMU is enabled, register MAP notifier from log_sync when all devices in container are in stop and copy phase of migration. Call replay and get dirty pages from notifier callback. Suggested-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Reviewed-by: Yan Zhao <yan.y.zhao@intel.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2020-11-01 12:30:51 -07:00
Kirti Wankhede	b6dd6504e3	vfio: Add vfio_listener_log_sync to mark dirty pages vfio_listener_log_sync gets list of dirty pages from container using VFIO_IOMMU_GET_DIRTY_BITMAP ioctl and mark those pages dirty when all devices are stopped and saving state. Return early for the RAM block section of mapped MMIO region. Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Reviewed-by: Neo Jia <cjia@nvidia.com> [aw: fix error_report types, fix cpu_physical_memory_set_dirty_lebitmap() cast] Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2020-11-01 12:30:51 -07:00
Kirti Wankhede	87ea529c50	vfio: Get migration capability flags for container Added helper functions to get IOMMU info capability chain. Added function to get migration capability information from that capability chain for IOMMU container. Similar change was proposed earlier: https://lists.gnu.org/archive/html/qemu-devel/2018-05/msg03759.html Disable migration for devices if IOMMU module doesn't support migration capability. Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Cc: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Cc: Eric Auger <eric.auger@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2020-11-01 12:30:50 -07:00
Kirti Wankhede	0f7a903ba3	vfio: Add function to unmap VFIO region This function will be used for migration region. Migration region is mmaped when migration starts and will be unmapped when migration is complete. Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Reviewed-by: Neo Jia <cjia@nvidia.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2020-11-01 12:30:50 -07:00
Daniel P. Berrangé	448058aa99	util: rename qemu_open() to qemu_open_old() We want to introduce a new version of qemu_open() that uses an Error object for reporting problems and make it the preferred interface. Rename the existing method to release the namespace for the new impl. Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>	2020-09-16 10:33:48 +01:00
David Hildenbrand	aff92b8286	vfio: Convert to ram_block_discard_disable() VFIO is (except devices without a physical IOMMU or some mediated devices) incompatible with discarding of RAM. The kernel will pin basically all VM memory. Let's convert to ram_block_discard_disable(), which can now fail, in contrast to qemu_balloon_inhibit(). Leave "x-balloon-allowed" named as it is for now. Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com> Acked-by: Cornelia Huck <cohuck@redhat.com> Cc: Cornelia Huck <cohuck@redhat.com> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Tony Krowiak <akrowiak@linux.ibm.com> Cc: Halil Pasic <pasic@linux.ibm.com> Cc: Pierre Morel <pmorel@linux.ibm.com> Cc: Eric Farman <farman@linux.ibm.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20200626072248.78761-4-david@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2020-07-02 05:54:59 -04:00
Michal Privoznik	b09d51c909	Report stringified errno in VFIO related errors In a few places we report errno formatted as a negative integer. This is not as user friendly as it can be. Use strerror() and/or error_setg_errno() instead. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Message-Id: <4949c3ecf1a32189b8a4b5eb4b0fd04c1122501d.1581674006.git.mprivozn@redhat.com> Signed-off-by: Laurent Vivier <laurent@vivier.eu>	2020-02-18 20:20:49 +01:00
Eric Auger	549d400587	memory: allow memory_region_register_iommu_notifier() to fail Currently, when a notifier is attempted to be registered and its flags are not supported (especially the MAP one) by the IOMMU MR, we generally abruptly exit in the IOMMU code. The failure could be handled more nicely in the caller and especially in the VFIO code. So let's allow memory_region_register_iommu_notifier() to fail as well as notify_flag_changed() callback. All sites implementing the callback are updated. This patch does not yet remove the exit(1) in the amd_iommu code. in SMMUv3 we turn the warning message into an error message saying that the assigned device would not work properly. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2019-10-04 18:49:18 +02:00
Eric Auger	d7d8783647	vfio: Turn the container error into an Error handle The container error integer field is currently used to store the first error potentially encountered during any vfio_listener_region_add() call. However this fails to propagate detailed error messages up to the vfio_connect_container caller. Instead of using an integer, let's use an Error handle. Messages are slightly reworded to accommodate the propagation. Signed-off-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2019-10-04 18:49:18 +02:00
Markus Armbruster	db72581598	Include qemu/main-loop.h less In my "build everything" tree, changing qemu/main-loop.h triggers a recompile of some 5600 out of 6600 objects (not counting tests and objects that don't depend on qemu/osdep.h). It includes block/aio.h, which in turn includes qemu/event_notifier.h, qemu/notify.h, qemu/processor.h, qemu/qsp.h, qemu/queue.h, qemu/thread-posix.h, qemu/thread.h, qemu/timer.h, and a few more. Include qemu/main-loop.h only where it's needed. Touching it now recompiles only some 1700 objects. For block/aio.h and qemu/event_notifier.h, these numbers drop from 5600 to 2800. For the others, they shrink only slightly. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20190812052359.30071-21-armbru@redhat.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>	2019-08-16 13:31:52 +02:00
Markus Armbruster	71e8a91585	Include sysemu/reset.h a lot less In my "build everything" tree, changing sysemu/reset.h triggers a recompile of some 2600 out of 6600 objects (not counting tests and objects that don't depend on qemu/osdep.h). The main culprit is hw/hw.h, which supposedly includes it for convenience. Include sysemu/reset.h only where it's needed. Touching it now recompiles less than 200 objects. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20190812052359.30071-9-armbru@redhat.com>	2019-08-16 13:31:52 +02:00
Eric Auger	201a733145	vfio/common: Introduce vfio_set_irq_signaling helper The code used to assign an interrupt index/subindex to an eventfd is duplicated many times. Let's introduce an helper that allows to set/unset the signaling for an ACTION_TRIGGER, ACTION_MASK or ACTION_UNMASK action. In the error message, we now use errno in case of any VFIO_DEVICE_SET_IRQS ioctl failure. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Li Qiang <liq3ea@gmail.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2019-06-13 09:57:37 -06:00
Alexey Kardashevskiy	013002f0fb	vfio: Make vfio_get_region_info_cap public This makes vfio_get_region_info_cap() to be used in quirks. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Acked-by: Alex Williamson <alex.williamson@redhat.com> Message-Id: <20190307050518.64968-3-aik@ozlabs.ru> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-03-12 16:17:35 +11:00
Eric Auger	2b6326c0bf	hw/vfio/common: Refactor container initialization We introduce the vfio_init_container_type() helper. It computes the highest usable iommu type and then set the container and the iommu type. Its usage in vfio_connect_container() makes the code ready for addition of new iommu types. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2019-02-21 21:07:03 -07:00
Alex Williamson	567d7d3e6b	vfio/common: Work around kernel overflow bug in DMA unmap A kernel bug was introduced in v4.15 via commit 71a7d3d78e3c which adds a test for address space wrap-around in the vfio DMA unmap path. Unfortunately due to overflow, the kernel detects an unmap of the last page in the 64-bit address space as a wrap-around. In QEMU, a Q35 guest with VT-d emulation and guest IOMMU enabled will attempt to make such an unmap request during VM system reset, triggering an error: qemu-kvm: VFIO_UNMAP_DMA: -22 qemu-kvm: vfio_dma_unmap(0x561f059948f0, 0xfef00000, 0xffffffff01100000) = -22 (Invalid argument) Here the IOVA start address (0xfef00000) and the size parameter (0xffffffff01100000) add to exactly 2^64, triggering the bug. A kernel fix is queued for the Linux v5.0 release to address this. This patch implements a workaround to retry the unmap, excluding the final page of the range when we detect an unmap failing which matches the requirements for this issue. This is expected to be a safe and complete workaround as the VT-d address space does not extend to the full 64-bit space and therefore the last page should never be mapped. This workaround can be removed once all kernels with this bug are sufficiently deprecated. Link: https://bugzilla.redhat.com/show_bug.cgi?id=1662291 Reported-by: Pei Zhang <pezhang@redhat.com> Debugged-by: Peter Xu <peterx@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2019-02-21 21:07:03 -07:00
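The arithmetic behind the 567d7d3e6b workaround can be checked with a minimal sketch. It is not the QEMU or kernel code: `naive_wraps`, `trimmed_size`, and the 4 KiB `PAGE_SIZE` are hypothetical names and values chosen for illustration, and `uint64_t` wrap-around is modeled with an explicit mask.

```python
# Sketch only: models the uint64_t wrap-around described in the commit message.
# All names here are hypothetical; PAGE_SIZE = 4 KiB is an assumption.
U64_MASK = (1 << 64) - 1
PAGE_SIZE = 0x1000


def naive_wraps(iova: int, size: int) -> bool:
    """A wrap-around test of the form 'iova + size < iova' in 64-bit arithmetic."""
    return ((iova + size) & U64_MASK) < iova


def trimmed_size(iova: int, size: int, page_size: int = PAGE_SIZE) -> int:
    """If the range ends exactly at 2^64, exclude the final page; else keep size."""
    if size and ((iova + size) & U64_MASK) == 0:
        return size - page_size
    return size


if __name__ == "__main__":
    iova, size = 0xFEF00000, 0xFFFFFFFF01100000  # values from the error message
    print(iova + size == 1 << 64)                # the range ends exactly at 2^64
    print(naive_wraps(iova, size))               # the naive test flags it
    retry = trimmed_size(iova, size)
    print(hex(retry), naive_wraps(iova, retry))  # the retried size no longer wraps
```

Dropping one page keeps the end of the range below 2^64, so a test of this form no longer misfires on the retry.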
Paolo Bonzini	f481ee2d5e	qemu/queue.h: typedef QTAILQ heads This will be needed when we change the QTAILQ head and elem structs to unions. However, it is also consistent with the usage elsewhere in QEMU for other list head structs (see for example FsMountList). Note that most QTAILQs only need their name in order to do backwards walks. Those do not break with the struct->union change, and anyway the change will also remove the need to name heads when doing backwards walks, so those are not touched here. Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2019-01-11 15:46:55 +01:00
Paolo Bonzini	10ca76b4d2	vfio: make vfio_address_spaces static It is not used outside hw/vfio/common.c, so it does not need to be extern. Acked-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2019-01-11 15:46:54 +01:00
Alex Williamson	8709b3954d	vfio/pci: Fix failure to close file descriptor on error A new error path fails to close the device file descriptor when triggered by a ballooning incompatibility within the group. Fix it. Fixes: `238e917285` ("vfio/ccw/pci: Allow devices to opt-in for ballooning") Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2018-08-23 10:45:58 -06:00
Alexey Kardashevskiy	c26bc185b7	vfio/spapr: Allow backing bigger guest IOMMU pages with smaller physical pages At the moment the PPC64/pseries guest only supports 4K/64K/16M IOMMU pages and POWER8 CPU supports the exact same set of page size so so far things worked fine. However POWER9 supports different set of sizes - 4K/64K/2M/1G and the last two - 2M and 1G - are not even allowed in the paravirt interface (RTAS DDW) so we always end up using 64K IOMMU pages, although we could back guest's 16MB IOMMU pages with 2MB pages on the host. This stores the supported host IOMMU page sizes in VFIOContainer and uses this later when creating a new DMA window. This uses the system page size (64k normally, 2M/16M/1G if hugepages used) as the upper limit of the IOMMU pagesize. This changes the type of @pagesize to uint64_t as this is what memory_region_iommu_get_min_page_size() returns and clz64() takes. There should be no behavioral changes on platforms other than pseries. The guest will keep using the IOMMU page size selected by the PHB pagesize property as this only changes the underlying hardware TCE table granularity. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2018-08-21 14:28:45 +10:00
Alex Williamson	238e917285	vfio/ccw/pci: Allow devices to opt-in for ballooning If a vfio assigned device makes use of a physical IOMMU, then memory ballooning is necessarily inhibited due to the page pinning, lack of page level granularity at the IOMMU, and sufficient notifiers to both remove the page on balloon inflation and add it back on deflation. However, not all devices are backed by a physical IOMMU. In the case of mediated devices, if a vendor driver is well synchronized with the guest driver, such that only pages actively used by the guest driver are pinned by the host mdev vendor driver, then there should be no overlap between pages available for the balloon driver and pages actively in use by the device. Under these conditions, ballooning should be safe. vfio-ccw devices are always mediated devices and always operate under the constraints above. Therefore we can consider all vfio-ccw devices as balloon compatible. The situation is far from straightforward with vfio-pci. These devices can be physical devices with physical IOMMU backing or mediated devices where it is unknown whether a physical IOMMU is in use or whether the vendor driver is well synchronized to the working set of the guest driver. The safest approach is therefore to assume all vfio-pci devices are incompatible with ballooning, but allow user opt-in should they have further insight into mediated devices. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2018-08-17 09:27:16 -06:00
Alex Williamson	c65ee43315	vfio: Inhibit ballooning based on group attachment to a container We use a VFIOContainer to associate an AddressSpace to one or more VFIOGroups. The VFIOContainer represents the DMA context for that AddressSpace for those VFIOGroups and is synchronized to changes in that AddressSpace via a MemoryListener. For IOMMU backed devices, maintaining the DMA context for a VFIOGroup generally involves pinning a host virtual address in order to create a stable host physical address and then mapping a translation from the associated guest physical address to that host physical address into the IOMMU. While the above maintains the VFIOContainer synchronized to the QEMU memory API of the VM, memory ballooning occurs outside of that API. Inflating the memory balloon (ie. cooperatively capturing pages from the guest for use by the host) simply uses MADV_DONTNEED to "zap" pages from QEMU's host virtual address space. The page pinning and IOMMU mapping above remains in place, negating the host's ability to reuse the page, but the host virtual to host physical mapping of the page is invalidated outside of QEMU's memory API. When the balloon is later deflated, attempting to cooperatively return pages to the guest, the page is simply freed by the guest balloon driver, allowing it to be used in the guest and incurring a page fault when that occurs. The page fault maps a new host physical page backing the existing host virtual address, meanwhile the VFIOContainer still maintains the translation to the original host physical address. At this point the guest vCPU and any assigned devices will map different host physical addresses to the same guest physical address. Badness. The IOMMU typically does not have page level granularity with which it can track this mapping without also incurring inefficiencies in using page size mappings throughout.
MMU notifiers in the host kernel also provide indicators for invalidating the mapping on balloon inflation, not for updating the mapping when the balloon is deflated. For these reasons we assume a default behavior that the mapping of each VFIOGroup into the VFIOContainer is incompatible with memory ballooning and increment the balloon inhibitor to match the attached VFIOGroups. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2018-08-17 09:27:16 -06:00
Peter Maydell	cb1efcf462	iommu: Add IOMMU index argument to notifier APIs Add support for multiple IOMMU indexes to the IOMMU notifier APIs. When initializing a notifier with iommu_notifier_init(), the caller must pass the IOMMU index that it is interested in. When a change happens, the IOMMU implementation must pass memory_region_notify_iommu() the IOMMU index that has changed and that notifiers must be called for. IOMMUs which support only a single index don't need to change. Callers which only really support working with IOMMUs with a single index can use the result of passing MEMTXATTRS_UNSPECIFIED to memory_region_iommu_attrs_to_index(). Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Message-id: 20180604152941.20374-3-peter.maydell@linaro.org	2018-06-15 15:23:34 +01:00
Peter Maydell	bc6b1cec84	Make address_space_translate{, _cached}() take a MemTxAttrs argument As part of plumbing MemTxAttrs down to the IOMMU translate method, add MemTxAttrs as an argument to address_space_translate() and address_space_translate_cached(). Callers either have an attrs value to hand, or don't care and can use MEMTXATTRS_UNSPECIFIED. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20180521140402.23318-4-peter.maydell@linaro.org	2018-05-31 14:50:52 +01:00
Eric Auger	5c08600547	vfio: Use a trace point when a RAM section cannot be DMA mapped Commit `567b5b309a` ("vfio/pci: Relax DMA map errors for MMIO regions") added an error message if a passed memory section address or size is not aligned to the page size and thus cannot be DMA mapped. This patch fixes the trace by printing the region name and the memory region section offset within the address space (instead of offset_within_region). We also turn the error_report into a trace event. Indeed, in some cases, the traces can be confusing to non-expert end-users and make them think the use case does not work (whereas it works as before). This is the case where a BAR is successively mapped at different GPAs and its sections are not compatible with DMA map. The listener is called several times and traces are issued for each intermediate mapping. The end-user cannot easily match those GPAs against the final GPA output by lspci. So let's reserve this information for informed users. In the mid term, the plan is to advise the user about BAR relocation relevance. Fixes: `567b5b309a` ("vfio/pci: Relax DMA map errors for MMIO regions") Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2018-04-05 10:48:52 -06:00
Alexey Kardashevskiy	ae0215b2bb	vfio-pci: Allow mmap of MSIX BAR At the moment we unconditionally avoid mapping MSIX data of a BAR and emulate MSIX table in QEMU. However it is 1) not always necessary as a platform may provide a paravirt interface for MSIX configuration; 2) can affect the speed of MMIO access by emulating them in QEMU when frequently accessed registers share same system page with MSIX data, this is particularly a problem for systems with the page size bigger than 4KB. A new capability - VFIO_REGION_INFO_CAP_MSIX_MAPPABLE - has been added to the kernel [1] which tells the userspace that mapping of the MSIX data is possible now. This makes use of it so from now on QEMU tries mapping the entire BAR as a whole and emulate MSIX on top of that. [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a32295c612c57990d17fb0f41e7134394b2f35f6 Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2018-03-13 11:17:31 -06:00
Alexey Kardashevskiy	567b5b309a	vfio/pci: Relax DMA map errors for MMIO regions At the moment if vfio_memory_listener is registered in the system memory address space, it maps/unmaps every RAM memory region for DMA. It expects system page size aligned memory sections so vfio_dma_map would not fail and so far this has been the case. A mapping failure would be fatal. A side effect of such behavior is that some MMIO pages would not be mapped silently. However we are going to change MSIX BAR handling so we will end having non-aligned sections in vfio_memory_listener (more details is in the next patch) and vfio_dma_map will exit QEMU. In order to avoid fatal failures on what previously was not a failure and was just silently ignored, this checks the section alignment to the smallest supported IOMMU page size and prints an error if not aligned; it also prints an error if vfio_dma_map failed despite the page size check. Both errors are not fatal; only MMIO RAM regions are checked (aka "RAM device" regions). If the amount of errors printed is overwhelming, the MSIX relocation could be used to avoid excessive error output. This is unlikely to cause any behavioral change. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> [aw: Fix Int128 bit ops] Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2018-03-13 11:17:30 -06:00
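The alignment test described in 567b5b309a can be sketched as follows. This is an illustration under stated assumptions, not the QEMU implementation: the supported IOMMU page sizes are assumed to arrive as a bitmap with one bit set per size, and `min_page_size` and `is_section_aligned` are hypothetical helper names.

```python
# Sketch only: checks a section against the smallest supported IOMMU page size.
# The bitmap representation and both function names are assumptions.

def min_page_size(pgsizes_bitmap: int) -> int:
    """Return the smallest page size, i.e. the lowest set bit of the bitmap."""
    if pgsizes_bitmap <= 0:
        raise ValueError("no supported IOMMU page sizes")
    return pgsizes_bitmap & -pgsizes_bitmap


def is_section_aligned(offset: int, size: int, pgsizes_bitmap: int) -> bool:
    """True if both the start offset and the size are multiples of that page size."""
    pgmask = min_page_size(pgsizes_bitmap) - 1
    return (offset & pgmask) == 0 and (size & pgmask) == 0


if __name__ == "__main__":
    pgsizes = 0x1000 | 0x200000  # hypothetical host: 4 KiB and 2 MiB pages
    print(is_section_aligned(0xFEBD0000, 0x1000, pgsizes))  # page-aligned section
    print(is_section_aligned(0xFEBD0800, 0x800, pgsizes))   # sub-page section
```

A section that fails this test would be reported and skipped rather than passed on to the DMA map call, which matches the non-fatal behavior the commit message describes.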
Gerd Hoffmann	92f86bff08	vfio/common: cleanup in vfio_region_finalize Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Reviewed by: Kirti Wankhede <kwankhede@nvidia.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2018-03-13 11:17:29 -06:00
Peter Maydell	7b213bb475	Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging * socket option parsing fix (Daniel) * SCSI fixes (Fam) * Readline double-free fix (Greg) * More HVF attribution fixes (Izik) * WHPX (Windows Hypervisor Platform Extensions) support (Justin) * POLLHUP handler (Klim) * ivshmem fixes (Ladi) * memfd memory backend (Marc-André) * improved error message (Marcelo) * Memory fixes (Peter Xu, Zhecheng) * Remove obsolete code and comments (Peter M.) * qdev API improvements (Philippe) * Add CONFIG_I2C switch (Thomas) # gpg: Signature made Wed 07 Feb 2018 15:24:08 GMT # gpg: using RSA key BFFBD25F78C7AE83 # gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" # gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" # Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1 # Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83 * remotes/bonzini/tags/for-upstream: (47 commits) Add the WHPX acceleration enlightenments Introduce the WHPX impl Add the WHPX vcpu API Add the Windows Hypervisor Platform accelerator. tests/test-filter-redirector: move close() tests: use memfd in vhost-user-test vhost-user-test: make read-guest-mem setup its own qemu tests: keep compiling failing vhost-user tests Add memfd based hostmem memfd: add hugetlbsize argument memfd: add hugetlb support memfd: add error argument, instead of perror() cpus: join thread when removing a vCPU cpus: hvf: unregister thread with RCU cpus: tcg: unregister thread with RCU, fix exiting of loop on unplug cpus: dummy: unregister thread with RCU, exit loop on unplug cpus: kvm: unregister thread with RCU cpus: hax: register/unregister thread with RCU, exit loop on unplug ivshmem: Disable irqfd on device reset ivshmem: Improve MSI irqfd error handling ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org> # Conflicts: # cpus.c	2018-02-07 20:40:36 +00:00
Peter Xu	369686267a	vfio: listener unregister before unset container After next patch, listener unregister will need the container to be alive. Let's move this unregister phase to be before unset container, since that operation will free the backend container in kernel, otherwise we'll get these after next patch: qemu-system-x86_64: VFIO_UNMAP_DMA: -22 qemu-system-x86_64: vfio_dma_unmap(0x559bf53a4590, 0x0, 0xa0000) = -22 (Invalid argument) Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180122060244.29368-4-peterx@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-02-07 14:09:24 +01:00
Alexey Kardashevskiy	a5b04f7c53	vfio/common: Remove redundant copy of local variable There is already @hostwin in vfio_listener_region_add() so there is no point in having the other one. Fixes: `2e4109de8e` ("vfio/spapr: Create DMA window dynamically (SPAPR IOMMU v2)") Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2018-02-06 11:08:27 -07:00
Alexey Kardashevskiy	07bc681a33	vfio/spapr: Use iommu memory region's get_attr() In order to enable TCE operations support in KVM, we have to inform the KVM about VFIO groups being attached to specific LIOBNs. The KVM already knows about VFIO groups, the only bit missing is which in-kernel TCE table (the one with user visible TCEs) should update the attached groups. There is a KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE attribute of the VFIO KVM device which receives a groupfd/tablefd couple. This uses a new memory_region_iommu_get_attr() helper to get the IOMMU fd and calls KVM to establish the link. As get_attr() is not implemented yet, this should cause no behavioural change. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2018-02-06 11:08:24 -07:00
Alexey Kardashevskiy	c6e7958eb7	vfio/spapr: Allow fallback to SPAPR TCE IOMMU v1 The vfio_iommu_spapr_tce driver advertises kernel's support for v1 and v2 IOMMU support, however it is not always possible to use the requested IOMMU type. For example, a pseries host platform does not support dynamic DMA windows so v2 cannot initialize and QEMU fails to start. This adds a fallback to the v1 IOMMU if v2 cannot be used. Fixes: `318f67ce13` ("vfio: spapr: Add DMA memory preregistering (SPAPR IOMMU v2)") Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2017-12-13 10:19:33 -07:00
Liu, Yi L	f7f9c7b232	vfio/common: init giommu_list and hostwin_list of vfio container The init of giommu_list and hostwin_list is missed during container initialization. Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2017-12-13 10:19:33 -07:00
Alex Williamson	2016986aed	vfio: Fix vfio-kvm group registration Commit `8c37faa475` ("vfio-pci, ppc64/spapr: Reorder group-to-container attaching") moved registration of groups with the vfio-kvm device from vfio_get_group() to vfio_connect_container(), but it missed the case where a group is attached to an existing container and takes an early exit. Perhaps this is a less common case on ppc64/spapr, but on x86 (without viommu) all groups are connected to the same container and thus only the first group gets registered with the vfio-kvm device. This becomes a problem if we then hot-unplug the devices associated with that first group and we end up with KVM being misinformed about any vfio connections that might remain. Fix by including the call to vfio_kvm_device_add_group() in this early exit path. Fixes: `8c37faa475` ("vfio-pci, ppc64/spapr: Reorder group-to-container attaching") Cc: qemu-stable@nongnu.org # qemu-2.10+ Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Peter Xu <peterx@redhat.com> Tested-by: Peter Xu <peterx@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2017-12-13 10:19:32 -07:00
Alexey Kardashevskiy	8c37faa475	vfio-pci, ppc64/spapr: Reorder group-to-container attaching At the moment VFIO PCI device initialization works as follows: vfio_realize vfio_get_group vfio_connect_container register memory listeners (1) update QEMU groups lists vfio_kvm_device_add_group Then (example for pseries) the machine reset hook triggers region_add() for all regions where listeners from (1) are listening: ppc_spapr_reset spapr_phb_reset spapr_tce_table_enable memory_region_add_subregion vfio_listener_region_add vfio_spapr_create_window This scheme works fine until we need to handle VFIO PCI device hotplug and we want to enable PPC64/sPAPR in-kernel TCE acceleration on, i.e. after PCI hotplug we need a place to call ioctl(vfio_kvm_device_fd, KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE). Since the ioctl needs a LIOBN fd (from sPAPRTCETable) and a IOMMU group fd (from VFIOGroup), vfio_listener_region_add() seems to be the only place for this ioctl(). However this only works during boot time because the machine reset happens strictly after all devices are finalized. When hotplug happens, vfio_listener_region_add() is called when a memory listener is registered but when this happens: 1. new group is not added to the container->group_list yet; 2. VFIO KVM device is unaware of the new IOMMU group. This moves bits around to have all necessary VFIO infrastructure in place for both initial startup and hotplug cases. [aw: ie, register vfio groups with kvm prior to memory listener registration such that kvm-vfio pseudo device ioctls are available during the region_add callback] Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2017-07-17 12:39:09 -06:00
Alexey Kardashevskiy	3df9d74806	memory/iommu: QOM'fy IOMMU MemoryRegion This defines new QOM object - IOMMUMemoryRegion - with MemoryRegion as a parent. This moves IOMMU-related fields from MR to IOMMU MR. However to avoid dynamic QOM casting in fast path (address_space_translate, etc), this adds an @is_iommu boolean flag to MR and provides new helper to do simple cast to IOMMU MR - memory_region_get_iommu. The flag is set in the instance init callback. This defines memory_region_is_iommu as memory_region_get_iommu()!=NULL. This switches MemoryRegion to IOMMUMemoryRegion in most places except the ones where MemoryRegion may be an alias. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Message-Id: <20170711035620.4232-2-aik@ozlabs.ru> Acked-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-07-14 12:04:41 +02:00
Alex Williamson	7da624e26a	vfio: Test realized when using VFIOGroup.device_list iterator VFIOGroup.device_list is effectively our reference tracking mechanism such that we can teardown a group when all of the device references are removed. However, we also use this list from our machine reset handler for processing resets that affect multiple devices. Generally device removals are fully processed (exitfn + finalize) when this reset handler is invoked, however if the removal is triggered via another reset handler (piix4_reset->acpi_pcihp_reset) then the device exitfn may run, but not finalize. In this case we hit asserts when we start trying to access PCI helpers since much of the PCI state of the device is released. To resolve this, add a pointer to the Object DeviceState in our common base-device and skip non-realized devices as we iterate. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2017-07-10 10:39:43 -06:00
Peter Xu	ad523590f6	memory: remove the last param in memory_region_iommu_replay() We were always passing in that one as "false" to assume that's an read operation, and we also assume that IOMMU translation would always have that read permission. A better permission would be IOMMU_NONE since the replay is after all not a real read operation, but just a page table rebuilding process. CC: David Gibson <david@gibson.dropbear.id.au> CC: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Jason Wang <jasowang@redhat.com>	2017-05-25 21:25:27 +03:00
Jose Ricardo Ziviani	38d49e8c15	vfio: enable 8-byte reads/writes to vfio This patch enables 8-byte writes and reads to VFIO. Such implementation is already done but it's missing the 'case' to handle such accesses in both vfio_region_write and vfio_region_read and the MemoryRegionOps: impl.max_access_size and impl.min_access_size. After this patch, 8-byte writes such as: qemu_mutex_lock locked mutex 0x10905ad8 vfio_region_write (0001:03:00.0:region1+0xc0, 0x4140c, 4) vfio_region_write (0001:03:00.0:region1+0xc4, 0xa0000, 4) qemu_mutex_unlock unlocked mutex 0x10905ad8 goes like this: qemu_mutex_lock locked mutex 0x10905ad8 vfio_region_write (0001:03:00.0:region1+0xc0, 0xbfd0008, 8) qemu_mutex_unlock unlocked mutex 0x10905ad8 Signed-off-by: Jose Ricardo Ziviani <joserz@linux.vnet.ibm.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2017-05-03 14:52:34 -06:00
Jose Ricardo Ziviani	15126cba86	vfio: Set MemoryRegionOps:max_access_size and min_access_size Sets valid.max_access_size and valid.min_access_size to ensure safe 8-byte accesses to vfio. Today, 8-byte accesses are broken into pairs of 4-byte calls that goes unprotected: qemu_mutex_lock locked mutex 0x10905ad8 vfio_region_write (0001:03:00.0:region1+0xc0, 0x2020c, 4) qemu_mutex_unlock unlocked mutex 0x10905ad8 qemu_mutex_lock locked mutex 0x10905ad8 vfio_region_write (0001:03:00.0:region1+0xc4, 0xa0000, 4) qemu_mutex_unlock unlocked mutex 0x10905ad8 which occasionally leads to: qemu_mutex_lock locked mutex 0x10905ad8 vfio_region_write (0001:03:00.0:region1+0xc0, 0x2030c, 4) qemu_mutex_unlock unlocked mutex 0x10905ad8 qemu_mutex_lock locked mutex 0x10905ad8 vfio_region_write (0001:03:00.0:region1+0xc0, 0x1000c, 4) qemu_mutex_unlock unlocked mutex 0x10905ad8 qemu_mutex_lock locked mutex 0x10905ad8 vfio_region_write (0001:03:00.0:region1+0xc4, 0xb0000, 4) qemu_mutex_unlock unlocked mutex 0x10905ad8 qemu_mutex_lock locked mutex 0x10905ad8 vfio_region_write (0001:03:00.0:region1+0xc4, 0xa0000, 4) qemu_mutex_unlock unlocked mutex 0x10905ad8 causing strange errors in guest OS. With this patch, such accesses are protected by the same lock guard: qemu_mutex_lock locked mutex 0x10905ad8 vfio_region_write (0001:03:00.0:region1+0xc0, 0x2000c, 4) vfio_region_write (0001:03:00.0:region1+0xc4, 0xb0000, 4) qemu_mutex_unlock unlocked mutex 0x10905ad8 This happens because the 8-byte write should be broken into 4-byte writes by memory.c:access_with_adjusted_size() in order to be under the same lock. Today, it's done in exec.c:address_space_write_continue() which was able to handle only 4 bytes due to a zero'ed valid.max_access_size (see exec.c:memory_access_size()). Signed-off-by: Jose Ricardo Ziviani <joserz@linux.vnet.ibm.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2017-05-03 14:52:34 -06:00
Peter Xu	698feb5e13	memory: add section range info for IOMMU notifier In this patch, IOMMUNotifier.{start|end} are introduced to store section information for a specific notifier. When notification occurs, we not only check the notification type (MAP|UNMAP), but also check whether the notified iova range overlaps with the range of specific IOMMU notifier, and skip those notifiers if not in the listened range. When removing a region, we need to make sure we removed the correct VFIOGuestIOMMU by checking the IOMMUNotifier.start address as well. This patch is solving the problem that vfio-pci devices receive duplicated UNMAP notification on x86 platform when vIOMMU is there. The issue is that x86 IOMMU has a (0, 2^64-1) IOMMU region, which is split by the (0xfee00000, 0xfeefffff) IRQ region. AFAIK this (split IOMMU region) is only happening on x86. This patch also helps vhost to leverage the new interface as well, so that vhost won't get duplicated cache flushes. In that sense, it's a slight performance improvement. Suggested-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <1491562755-23867-2-git-send-email-peterx@redhat.com> [ehabkost: included extra vhost_iommu_region_del() change from Peter Xu] Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2017-04-20 15:22:41 -03:00
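The range filtering described in 698feb5e13 amounts to a closed-interval overlap test. The sketch below is a simplified model, not the QEMU code: the `Notifier` class, the `notified` helper, and the field names are assumptions, and an IOTLB entry is taken to cover `[iova, iova + addr_mask]`.

```python
# Sketch only: skip notifiers whose listened range does not overlap the
# notified IOVA range. Class and function names are hypothetical.
from dataclasses import dataclass

U64_MAX = (1 << 64) - 1


@dataclass
class Notifier:
    name: str
    start: int  # first IOVA the notifier listens to (inclusive)
    end: int    # last IOVA the notifier listens to (inclusive)


def notified(notifiers, iova: int, addr_mask: int):
    """Return the names of notifiers whose range overlaps [iova, iova + addr_mask]."""
    last = iova + addr_mask
    return [n.name for n in notifiers if n.start <= last and n.end >= iova]


if __name__ == "__main__":
    # The x86 IOMMU region (0, 2^64-1), split by the (0xfee00000, 0xfeefffff) IRQ region.
    halves = [
        Notifier("below-irq", 0x0, 0xFEDFFFFF),
        Notifier("above-irq", 0xFEF00000, U64_MAX),
    ]
    print(notified(halves, 0x1000, 0xFFF))       # only the lower half is notified
    print(notified(halves, 0xFEF00000, 0xFFF))   # only the upper half is notified
```

Without the range check, both halves would receive every notification, which is the duplicated UNMAP the commit message reports.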
Peter Xu	dfbd90e5b9	vfio: allow to notify unmap for very large region The Linux vfio driver supports VFIO_IOMMU_UNMAP_DMA for a very big region. This can be leveraged by the QEMU IOMMU implementation to clean up existing page mappings for an entire iova address space (by notifying with an IOTLB with an extremely huge addr_mask). However the current vfio_iommu_map_notify() does not allow that. It makes sure that every translated address in the IOTLB falls into a RAM range. The check makes sense, but it is only a sensible check for mapping operations, and means little for unmap operations. This patch moves this check into the map logic only, so that we'll get faster unmap handling (no need to translate again), and also we can then better support unmapping a very big region when it covers non-RAM ranges or even non-existent ranges. Acked-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2017-02-17 21:52:31 +02:00
Peter Xu	4a4b88fbe1	vfio: introduce vfio_get_vaddr() A cleanup for vfio_iommu_map_notify(). Now we will fetch vaddr even if the operation is unmap, but it won't hurt much. One thing to mention is that we need the RCU read lock to protect the whole translation and map/unmap procedure. Acked-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2017-02-17 21:52:31 +02:00
Peter Xu	3213835720	vfio: trace map/unmap for notify as well We trace its range, but we don't know whether it's a MAP/UNMAP. Let's dump it as well. Acked-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2017-02-17 21:52:31 +02:00
Yongji Xie	95251725e3	vfio: Add support for mmapping sub-page MMIO BARs Now the kernel commit 05f0c03fbac1 ("vfio-pci: Allow to mmap sub-page MMIO BARs if the mmio page is exclusive") allows VFIO to mmap sub-page BARs. This is the corresponding QEMU patch. With those patches applied, we could passthrough sub-page BARs to guest, which can help to improve IO performance for some devices. In this patch, we expand MemoryRegions of these sub-page MMIO BARs to PAGE_SIZE in vfio_pci_write_config(), so that the BARs could be passed to KVM ioctl KVM_SET_USER_MEMORY_REGION with a valid size. The expanding size will be recovered when the base address of sub-page BAR is changed and not page aligned any more in guest. And we also set the priority of these BARs' memory regions to zero in case of overlap with BARs which share the same page with sub-page BARs in guest. Signed-off-by: Yongji Xie <xyjxie@linux.vnet.ibm.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-10-31 09:53:04 -06:00
Alex Williamson	24acf72b9a	vfio: Handle zero-length sparse mmap ranges As reported in the link below, user has a PCI device with a 4KB BAR which contains the MSI-X table. This seems to hit a corner case in the kernel where the region reports being mmap capable, but the sparse mmap information reports a zero sized range. It's not entirely clear that the kernel is incorrect in doing this, but regardless, we need to handle it. To do this, fill our mmap array only with non-zero sized sparse mmap entries and add an error return from the function so we can tell the difference between nr_mmaps being zero based on sparse mmap info vs lack of sparse mmap info. NB, this doesn't actually change the behavior of the device, it only removes the scary "Failed to mmap ... Performance may be slow" error message. We cannot currently create an mmap over the MSI-X table. Link: http://lists.nongnu.org/archive/html/qemu-discuss/2016-10/msg00009.html Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-10-31 09:53:03 -06:00
Alex Williamson	21e00fa55f	memory: Replace skip_dump flag with "ram_device" Setting skip_dump on a MemoryRegion allows us to modify one specific code path, but the restriction we're trying to address encompasses more than that. If we have a RAM MemoryRegion backed by a physical device, it not only restricts our ability to dump that region, but also affects how we should manipulate it. Here we recognize that MemoryRegions do not change to sometimes allow dumps and other times not, so we replace setting the skip_dump flag with a new initializer so that we know exactly the type of region to which we're applying this behavior. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com>	2016-10-31 09:53:03 -06:00
Eric Auger	59f7d6743c	vfio: Pass an error object to vfio_get_device Pass an error object to prepare for migration to VFIO-PCI realize. In vfio platform vfio_base_device_init we currently just report the error. Subsequent patches will propagate the error up to the realize function. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-10-17 10:58:00 -06:00
Eric Auger	1b808d5be0	vfio: Pass an error object to vfio_get_group Pass an error object to prepare for migration to VFIO-PCI realize. For the time being let's just simply report the error in vfio platform's vfio_base_device_init(). A subsequent patch will duly propagate the error up to vfio_platform_realize. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-10-17 10:57:59 -06:00
Eric Auger	01905f58f1	vfio: Pass an Error object to vfio_connect_container The error is currently simply reported in vfio_get_group. Don't bother too much with the prefix which will be handled at upper level, later on. Also return an error value in case container->error is not 0 and the container is torn down. On vfio_spapr_remove_window failure, we also report an error whereas it was silent before. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-10-17 10:57:59 -06:00
Peter Xu	cdb3081269	memory: introduce IOMMUNotifier and its caps IOMMU Notifier list is used for notifying IO address mapping changes. Currently VFIO is the only user. However it is possible that future consumer like vhost would like to only listen to part of its notifications (e.g., cache invalidations). This patch introduced IOMMUNotifier and IOMMUNotfierFlag bits for a finer grained control of it. IOMMUNotifier contains a bitfield for the notify consumer describing what kind of notification it is interested in. Currently two kinds of notifications are defined: - IOMMU_NOTIFIER_MAP: for newly mapped entries (additions) - IOMMU_NOTIFIER_UNMAP: for entries to be removed (cache invalidates) When registering the IOMMU notifier, we need to specify one or multiple types of messages to listen to. When notifications are triggered, its type will be checked against the notifier's type bits, and only notifiers with registered bits will be notified. (For any IOMMU implementation, an in-place mapping change should be notified with an UNMAP followed by a MAP.) Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <1474606948-14391-2-git-send-email-peterx@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-09-27 08:59:16 +02:00
Markus Armbruster	a9c94277f0	Use #include "..." for our own headers, <...> for others Tracked down with an ugly, brittle and probably buggy Perl script. Also move includes converted to <...> up so they get included before ours where that's obviously okay. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Tested-by: Eric Blake <eblake@redhat.com> Reviewed-by: Richard Henderson <rth@twiddle.net>	2016-07-12 16:19:16 +02:00
Alexey Kardashevskiy	2e4109de8e	vfio/spapr: Create DMA window dynamically (SPAPR IOMMU v2) New VFIO_SPAPR_TCE_v2_IOMMU type supports dynamic DMA window management. This adds ability to VFIO common code to dynamically allocate/remove DMA windows in the host kernel when new VFIO container is added/removed. This adds a helper to vfio_listener_region_add which makes VFIO_IOMMU_SPAPR_TCE_CREATE ioctl and adds just created IOMMU into the host IOMMU list; the opposite action is taken in vfio_listener_region_del. When creating a new window, this uses heuristic to decide on the TCE table levels number. This should cause no guest visible change in behavior. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> [dwg: Added some casts to prevent printf() warnings on certain targets where the kernel headers' __u64 doesn't match uint64_t or PRIx64] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2016-07-05 14:31:08 +10:00
Alexey Kardashevskiy	f4ec5e26ed	vfio: Add host side DMA window capabilities There are going to be multiple IOMMUs per a container. This moves the single host IOMMU parameter set to a list of VFIOHostDMAWindow. This should cause no behavioral change and will be used later by the SPAPR TCE IOMMU v2 which will also add a vfio_host_win_del() helper. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2016-07-05 14:31:08 +10:00
Alexey Kardashevskiy	318f67ce13	vfio: spapr: Add DMA memory preregistering (SPAPR IOMMU v2) This makes use of the new "memory registering" feature. The idea is to provide the userspace ability to notify the host kernel about pages which are going to be used for DMA. Having this information, the host kernel can pin them all once per user process, do locked pages accounting (once) and not spend time on doing that in real time with possible failures which cannot be handled nicely in some cases. This adds a prereg memory listener which listens on address_space_memory and notifies a VFIO container about memory which needs to be pinned/unpinned. VFIO MMIO regions (i.e. "skip dump" regions) are skipped. The feature is only enabled for SPAPR IOMMU v2. The host kernel changes are required. Since v2 does not need/support VFIO_IOMMU_ENABLE, this does not call it when v2 is detected and enabled. This enforces guest RAM blocks to be host page size aligned; however this is not new as KVM already requires memory slots to be host page size aligned. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> [dwg: Fix compile error on 32-bit host] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2016-07-05 14:30:54 +10:00
Alexey Kardashevskiy	d22d8956b1	memory: Add MemoryRegionIOMMUOps.notify_started/stopped callbacks The IOMMU driver may change behavior depending on whether a notifier client is present. In the case of POWER, this represents a change in the visibility of the IOTLB, for other drivers such as intel-iommu and future AMD-Vi emulation, notifier support is not yet enabled and this provides the opportunity to flag that incompatibility. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Peter Xu <peterx@redhat.com> Tested-by: Peter Xu <peterx@redhat.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> [new log & extracted from [PATCH qemu v17 12/12] spapr_iommu, vfio, memory: Notify IOMMU about starting/stopping listening] Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-06-30 13:00:23 -06:00
Alexey Kardashevskiy	f682e9c244	memory: Add reporting of supported page sizes Every IOMMU has some granularity which MemoryRegionIOMMUOps::translate uses when translating, however this information is not available outside the translate context for various checks. This adds a get_min_page_size callback to MemoryRegionIOMMUOps and a wrapper for it so IOMMU users (such as VFIO) can know the minimum actual page size supported by an IOMMU. As IOMMU MR represents a guest IOMMU, this uses TARGET_PAGE_SIZE as fallback. This removes vfio_container_granularity() and uses new helper in memory_region_iommu_replay() when replaying IOMMU mappings on added IOMMU memory region. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Acked-by: Alex Williamson <alex.williamson@redhat.com> [dwg: Removed an unnecessary calculation] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2016-06-22 11:13:09 +10:00
Gavin Shan	d917e88d85	vfio: Fix broken EEH vfio_eeh_container_op() is the backend that communicates with host kernel to support EEH functionality in QEMU. However, the function should return the value from host kernel instead of 0 unconditionally. dwg: Specifically the problem occurs for the handful of EEH sub-operations which can return a non-zero, non-error result. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> [dwg: clarification to commit message] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2016-06-17 15:59:18 +10:00
Paolo Bonzini	02d0e09503	os-posix: include sys/mman.h qemu/osdep.h checks whether MAP_ANONYMOUS is defined, but this check is bogus without a previous inclusion of sys/mman.h. Include it in sysemu/os-posix.h and remove it from everywhere else. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-06-16 18:39:03 +02:00
Alexey Kardashevskiy	f1f9365019	vfio: Check that IOMMU MR translates to system address space At the moment IOMMU MR only translates to the system memory. However, if some new code changes this, we will need a clear indication of why it is not working, so here is the check. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-05-26 11:12:09 -06:00
Alexey Kardashevskiy	d78c19b5cf	memory: Fix IOMMU replay base address Since `a788f227` "memory: Allow replay of IOMMU mapping notifications" when new VFIO listener is added, all existing IOMMU mappings are replayed. However there is a problem that the base address of an IOMMU memory region (IOMMU MR) is ignored which is not a problem for the existing user (which is pseries) with its default 32bit DMA window starting at 0 but it is if there is another DMA window. This stores the IOMMU's offset_within_address_space and adjusts the IOVA before calling vfio_dma_map/vfio_dma_unmap. As the IOMMU notifier expects IOVA offset rather than the absolute address, this also adjusts IOVA in sPAPR H_PUT_TCE handler before calling notifier(s). Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-05-26 11:12:08 -06:00
Alexey Kardashevskiy	7a057b4fb9	vfio: Fix 128 bit handling when deleting region `7532d3cbf` "vfio: Fix 128 bit handling" added support for 64bit IOMMU memory regions when those are added to VFIO address space; however removing code cannot cope with these as int128_get64() will fail on 1<<64. This copies 128bit handling from region_add() to region_del(). Since the only machine type which is actually going to use 64bit IOMMU is pseries and it never really removes them (instead it will dynamically add/remove subregions), this should cause no behavioral change. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-05-26 11:12:07 -06:00
Alex Williamson	e61a424f05	vfio: Create device specific region info helper Given a device specific region type and sub-type, find it. Also cleanup return point on error in vfio_get_region_info() so that we always return 0 with a valid pointer or -errno and NULL. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Gerd Hoffmann <kraxel@redhat.com> Tested-by: Gerd Hoffmann <kraxel@redhat.com>	2016-05-26 11:04:50 -06:00
Alex Williamson	b53b0f696b	vfio: Enable sparse mmap capability The sparse mmap capability in a vfio region info allows vfio to tell us which sub-areas of a region may be mmap'd. Thus rather than assuming a single mmap covers the entire region and later frobbing it ourselves for things like the PCI MSI-X vector table, we can read that directly from vfio. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Gerd Hoffmann <kraxel@redhat.com> Tested-by: Gerd Hoffmann <kraxel@redhat.com>	2016-05-26 09:43:20 -06:00
Paolo Bonzini	e81096b1c8	explicitly include linux/kvm.h Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-05-19 16:42:27 +02:00
Bandan Das	55efcc537d	vfio: convert to 128 bit arithmetic calculations when adding mem regions vfio_listener_region_add for a iommu mr results in an overflow assert since iommu memory region is initialized with UINT64_MAX. Convert calculations to 128 bit arithmetic for iommu memory regions and let int128_get64 assert for non iommu regions if there's an overflow. Suggested-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Bandan Das <bsd@redhat.com> [missed (end - 1) on 2nd trace call, move llsize closer to use] Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-03-28 13:27:49 -06:00
David Gibson	3356128cd1	vfio: Eliminate vfio_container_ioctl() vfio_container_ioctl() was a bad interface that bypassed abstraction boundaries, had semantics that sat uneasily with its name, and was unsafe in many realistic circumstances. Now that spapr-pci-vfio-host-bridge has been folded into spapr-pci-host-bridge, there are no more users, so remove it. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Acked-by: Alex Williamson <alex.williamson@redhat.com>	2016-03-16 09:55:11 +11:00
David Gibson	3153119e9b	vfio: Start improving VFIO/EEH interface At present the code handling IBM's Enhanced Error Handling (EEH) interface on VFIO devices operates by bypassing the usual VFIO logic with vfio_container_ioctl(). That's a poorly designed interface with unclear semantics about exactly what can be operated on. In particular it operates on a single vfio container internally (hence the name), but takes an address space and group id, from which it deduces the container in a rather roundabout way. groupids are something that code outside vfio shouldn't even be aware of. This patch creates new interfaces for EEH operations. Internally we have vfio_eeh_container_op() which takes a VFIOContainer object directly. For external use we have vfio_eeh_as_ok() which determines if an AddressSpace is usable for EEH (at present this means it has a single container with exactly one group attached), and vfio_eeh_as_op() which will perform an operation on an AddressSpace in the unambiguous case, and otherwise returns an error. This interface still isn't great, but it's enough of an improvement to allow a number of cleanups in other places. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Acked-by: Alex Williamson <alex.williamson@redhat.com>	2016-03-16 09:55:10 +11:00
Alex Williamson	db0da029a1	vfio: Generalize region support Both platform and PCI vfio drivers create a "slow", I/O memory region with one or more mmap memory regions overlaid when supported by the device. Generalize this to a set of common helpers in the core that pulls the region info from vfio, fills the region data, configures slow mapping, and adds helpers for completing the mmap, enable/disable, and teardown. This can be immediately used by the PCI MSI-X code, which needs to mmap around the MSI-X vector table. This also changes VFIORegion.mem to be dynamically allocated because otherwise we don't know how the caller has allocated VFIORegion and therefore don't know whether to unreference it to destroy the MemoryRegion or not. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-03-10 20:03:16 -07:00
Alex Williamson	469002263a	vfio: Wrap VFIO_DEVICE_GET_REGION_INFO In preparation for supporting capability chains on regions, wrap ioctl(VFIO_DEVICE_GET_REGION_INFO) so we don't duplicate the code for each caller. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-03-10 09:39:07 -07:00
Peter Maydell	c6eacb1ac0	hw/vfio: Clean up includes Clean up includes so that osdep.h is included first and headers which it implies are not included manually. This commit was created with scripts/clean-includes. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Message-id: 1453832250-766-22-git-send-email-peter.maydell@linaro.org	2016-01-29 15:07:24 +00:00
David Gibson	508ce5eb00	vfio: Allow hotplug of containers onto existing guest IOMMU mappings At present the memory listener used by vfio to keep host IOMMU mappings in sync with the guest memory image assumes that if a guest IOMMU appears, then it has no existing mappings. This may not be true if a VFIO device is hotplugged onto a guest bus which didn't previously include a VFIO device, and which has existing guest IOMMU mappings. Therefore, use the memory_region_register_iommu_notifier_replay() function in order to fix this case, replaying existing guest IOMMU mappings, bringing the host IOMMU into sync with the guest IOMMU. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-10-05 12:39:47 -06:00
David Gibson	7a140a57c6	vfio: Record host IOMMU's available IO page sizes Depending on the host IOMMU type we determine and record the available page sizes for IOMMU translation. We'll need this for other validation in future patches. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Laurent Vivier <lvivier@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-10-05 12:38:41 -06:00

168 Commits