ISM devices are sensitive to manipulation of the IOMMU, so the ISM device
needs to be reset before the vfio-pci device is reset (triggering a full
UNMAP). In order to ensure this occurs, trigger ISM device resets from
subsystem_reset before triggering the PCI bus reset (which will also
trigger vfio-pci reset). This only needs to be done for ISM devices
which were enabled for use by the guest.
Further, ensure that AIF is disabled as part of the reset event.
Fixes: ef1535901a ("s390x: do a subsystem reset before the unprotect on reboot")
Fixes: 03451953c7 ("s390x/pci: reset ISM passthrough devices on shutdown and system reset")
Reported-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
Message-ID: <20240118185151.265329-4-mjrosato@linux.ibm.com>
Reviewed-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Use a flag to keep track of whether AIF is currently enabled. This can be
used to avoid enabling/disabling AIF multiple times as well as to determine
whether or not it should be disabled during reset processing.
Fixes: d0bc7091c2 ("s390x/pci: enable adapter event notification for interpreted devices")
Reported-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Eric Farman <farman@linux.ibm.com>
Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
Message-ID: <20240118185151.265329-2-mjrosato@linux.ibm.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
ISM device firmware stores unique state information that can
can cause a wholesale unmap of the associated IOMMU (e.g. when
we get a termination signal for QEMU) to trigger firmware errors
because firmware believes we are attempting to invalidate entries
that are still in-use by the guest OS (when in fact that guest is
in the process of being terminated or rebooted).
To alleviate this, register both a shutdown notifier (for unexpected
termination cases e.g. virsh destroy) as well as a reset callback
(for cases like guest OS reboot). For each of these scenarios, trigger
PCI device reset; this is enough to indicate to firmware that the IOMMU
is no longer in-use by the guest OS, making it safe to invalidate any
associated IOMMU entries.
Fixes: 15d0e7942d ("s390x/pci: don't fence interpreted devices without MSI-X")
Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
Message-Id: <20221209195700.263824-1-mjrosato@linux.ibm.com>
Reviewed-by: Eric Farman <farman@linux.ibm.com>
[thuth: Adjusted the hunk in s390-pci-vfio.c due to different context]
Signed-off-by: Thomas Huth <thuth@redhat.com>
Currently, s390x-pci performs accounting against the vfio DMA
limit and triggers the guest to clean up mappings when the limit
is reached. Let's go a step further and also limit the size of
the supported DMA aperture reported to the guest based upon the
initial vfio DMA limit reported for the container (if less than
than the size reported by the firmware/host zPCI layer). This
avoids processing sections of the guest DMA table during global
refresh that, for common use cases, will never be used anway, and
makes exhausting the vfio DMA limit due to mismatch between guest
aperture size and host limit far less likely and more indicitive
of an error.
Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
Message-Id: <20221028194758.204007-4-mjrosato@linux.ibm.com>
Reviewed-by: Eric Farman <farman@linux.ibm.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Let's use the reserved pool of simulated PCI groups to allow intercept
devices to have separate groups from interpreted devices as some group
values may be different. If we run out of simulated PCI groups, subsequent
intercept devices just get the default group.
Furthermore, if we encounter any PCI groups from hostdevs that are marked
as simulated, let's just assign them to the default group to avoid
conflicts between host simulated groups and our own simulated groups.
Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
Reviewed-by: Pierre Morel <pmorel@linux.ibm.com>
Message-Id: <20220902172737.170349-7-mjrosato@linux.ibm.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Use the associated kvm ioctl operation to enable adapter event notification
and forwarding for devices when requested. This feature will be set up
with or without firmware assist based upon the 'forwarding_assist' setting.
Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
Message-Id: <20220902172737.170349-6-mjrosato@linux.ibm.com>
[thuth: Rename "forwarding_assist" property to "forwarding-assist"]
Signed-off-by: Thomas Huth <thuth@redhat.com>
If the ZPCI_OP ioctl reports that is is available and usable, then the
underlying KVM host will enable load/store intepretation for any guest
device without a SHM bit in the guest function handle. For a device that
will be using interpretation support, ensure the guest function handle
matches the host function handle; this value is re-checked every time the
guest issues a SET PCI FN to enable the guest device as it is the only
opportunity to reflect function handle changes.
By default, unless interpret=off is specified, interpretation support will
always be assumed and exploited if the necessary ioctl and features are
available on the host kernel. When these are unavailable, we will silently
revert to the interception model; this allows existing guest configurations
to work unmodified on hosts with and without zPCI interpretation support,
allowing QEMU to choose the best support model available.
Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
Acked-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20220902172737.170349-4-mjrosato@linux.ibm.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
The DTSM is a mask that specifies which I/O Address Translation designation
types are supported. Today QEMU only supports DT=1.
Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
Reviewed-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Pierre Morel <pmorel@linux.ibm.com>
Message-Id: <20211203142706.427279-5-mjrosato@linux.ibm.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
The current default PCI group being used can technically collide with a
real group ID passed from a hostdev. Let's instead use a group ID that
comes from a special pool (0xF0-0xFF) that is architected to be reserved
for simulated devices.
Fixes: 28dc86a072 ("s390x/pci: use a PCI Group structure")
Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
Reviewed-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Pierre Morel <pmorel@linux.ibm.com>
Message-Id: <20211203142706.427279-2-mjrosato@linux.ibm.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
The PAGE_SIZE macro is causing trouble on Alpine Linux since it
clashes with a macro from a system header there. We already have
the TARGET_PAGE_SIZE, TARGET_PAGE_MASK and TARGET_PAGE_BITS macros
in QEMU anyway, so let's simply replace the PAGE_SIZE, PAGE_MASK
and PAGE_SHIFT macro with their TARGET_* counterparts.
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/572
Message-Id: <20210901125800.611183-1-thuth@redhat.com>
Reviewed-by: Halil Pasic <pasic@linux.ibm.com>
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Reviewed-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
We use the capability chains of the VFIO_DEVICE_GET_INFO ioctl to retrieve
the CLP information that the kernel exports.
To be compatible with previous kernel versions we fall back on previous
predefined values, same as the emulation values, when the ioctl is found
to not support capability chains. If individual CLP capabilities are not
found, we fall back on default values for only those capabilities missing
from the chain.
This patch is based on work previously done by Pierre Morel.
Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
[aw: non-Linux build fixes]
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
We use a ClpRspQueryPci structure to hold the information related to a
zPCI Function.
This allows us to be ready to support different zPCI functions and to
retrieve the zPCI function information from the host.
Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
We use a S390PCIGroup structure to hold the information related to a
zPCI Function group.
This allows us to be ready to support multiple groups and to retrieve
the group information from the host.
Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
To have a clean separation between s390-pci-bus.h and s390-pci-inst.h
headers we export the PCI CLP instructions in a dedicated header.
Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
When an s390 guest is using lazy unmapping, it can result in a very
large number of oustanding DMA requests, far beyond the default
limit configured for vfio. Let's track DMA usage similar to vfio
in the host, and trigger the guest to flush their DMA mappings
before vfio runs out.
Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
[aw: non-Linux build fixes]
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Seems a more appropriate location for them.
Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>