vfio: vfio-pci device assignment driver
This adds the core of the QEMU VFIO-based PCI device assignment driver.
To make use of this driver, enable CONFIG_VFIO, CONFIG_VFIO_IOMMU_TYPE1,
and CONFIG_VFIO_PCI in your host Linux kernel config. Load the vfio-pci
module. To assign device 0000:05:00.0 to a guest, do the following:
    for dev in $(ls /sys/bus/pci/devices/0000:05:00.0/iommu_group/devices); do
        vendor=$(cat /sys/bus/pci/devices/$dev/vendor)
        device=$(cat /sys/bus/pci/devices/$dev/device)
        if [ -e /sys/bus/pci/devices/$dev/driver ]; then
            echo $dev > /sys/bus/pci/devices/$dev/driver/unbind
        fi
        echo $vendor $device > /sys/bus/pci/drivers/vfio-pci/new_id
    done
See Documentation/vfio.txt in the Linux kernel tree for further
description of IOMMU groups and VFIO.
Then launch qemu including the option:
-device vfio-pci,host=0000:05:00.0
Legacy PCI interrupts (INTx) currently make use of a kludge where we
trap BAR accesses and assume the access is in response to an interrupt,
therefore de-asserting and unmasking the interrupt. It's not quite as
targeted as using the EOI for this, but it's self-contained and seems
to work across all architectures. The side effect is a significant
performance slowdown for devices in INTx mode. Some devices, like
graphics cards, don't really use their interrupt, so this can be turned
off with the x-intx=off option, which disables INTx altogether. This
should be considered an experimental option until we refine this code.
Both MSI and MSI-X are supported and avoid these issues.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2012-09-26 21:19:32 +04:00
/*
 * vfio based device assignment support
 *
 * Copyright Red Hat, Inc. 2012
 *
 * Authors:
 *  Alex Williamson <alex.williamson@redhat.com>
 *
 * This work is licensed under the terms of the GNU GPL, version 2. See
 * the COPYING file in the top-level directory.
 *
 * Based on qemu-kvm device-assignment:
 *  Adapted for KVM by Qumranet.
 *  Copyright (c) 2007, Neocleus, Alex Novik (alex@neocleus.com)
 *  Copyright (c) 2007, Neocleus, Guy Zana (guy@neocleus.com)
 *  Copyright (C) 2008, Qumranet, Amit Shah (amit.shah@qumranet.com)
 *  Copyright (C) 2008, Red Hat, Amit Shah (amit.shah@redhat.com)
 *  Copyright (C) 2008, IBM, Muli Ben-Yehuda (muli@il.ibm.com)
 */
2016-01-26 21:17:14 +03:00
#include "qemu/osdep.h"
2023-11-21 11:44:09 +03:00
#include CONFIG_DEVICES /* CONFIG_IOMMUFD */
2013-04-01 23:35:40 +04:00
#include <linux/vfio.h>
2012-09-26 21:19:32 +04:00
#include <sys/ioctl.h>

2019-08-12 08:23:48 +03:00
#include "hw/hw.h"
2013-02-04 18:40:22 +04:00
#include "hw/pci/msi.h"
#include "hw/pci/msix.h"
2015-11-10 22:11:08 +03:00
#include "hw/pci/pci_bridge.h"
2019-08-12 08:23:51 +03:00
#include "hw/qdev-properties.h"
2020-12-12 01:05:12 +03:00
#include "hw/qdev-properties-system.h"
2019-08-12 08:23:45 +03:00
#include "migration/vmstate.h"
2021-10-08 16:34:41 +03:00
#include "qapi/qmp/qdict.h"
2012-12-17 21:20:00 +04:00
#include "qemu/error-report.h"
Include qemu/main-loop.h less
In my "build everything" tree, changing qemu/main-loop.h triggers a
recompile of some 5600 out of 6600 objects (not counting tests and
objects that don't depend on qemu/osdep.h). It includes block/aio.h,
which in turn includes qemu/event_notifier.h, qemu/notify.h,
qemu/processor.h, qemu/qsp.h, qemu/queue.h, qemu/thread-posix.h,
qemu/thread.h, qemu/timer.h, and a few more.
Include qemu/main-loop.h only where it's needed. Touching it now
recompiles only some 1700 objects. For block/aio.h and
qemu/event_notifier.h, these numbers drop from 5600 to 2800. For the
others, they shrink only slightly.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20190812052359.30071-21-armbru@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
2019-08-12 08:23:50 +03:00
#include "qemu/main-loop.h"
2019-05-23 17:35:07 +03:00
#include "qemu/module.h"
2012-12-17 21:20:00 +04:00
#include "qemu/range.h"
2018-06-25 15:42:29 +03:00
#include "qemu/units.h"
2013-04-01 23:35:40 +04:00
#include "sysemu/kvm.h"
2019-08-12 08:23:59 +03:00
#include "sysemu/runstate.h"
2015-09-23 22:04:44 +03:00
#include "pci.h"
2014-12-20 00:40:06 +03:00
#include "trace.h"
2016-06-20 09:13:39 +03:00
#include "qapi/error.h"
2019-10-29 14:49:05 +03:00
#include "migration/blocker.h"
2020-10-26 12:36:13 +03:00
#include "migration/qemu-file.h"
2023-11-21 11:44:09 +03:00
#include "sysemu/iommufd.h"
2014-02-26 21:33:45 +04:00

2019-08-22 09:49:09 +03:00
#define TYPE_VFIO_PCI_NOHOTPLUG "vfio-pci-nohotplug"
2019-05-21 18:15:40 +03:00

2022-03-26 09:02:26 +03:00
/* Protected by BQL */
static KVMRouteChange vfio_route_change;

2014-12-20 01:24:15 +03:00
static void vfio_disable_interrupts(VFIOPCIDevice *vdev);
static void vfio_mmap_set_enabled(VFIOPCIDevice *vdev, bool enabled);
2022-03-26 09:02:24 +03:00
static void vfio_msi_disable_common(VFIOPCIDevice *vdev);
vfio-pci: Update slow path INTx algorithm
We can't afford the overhead of switching out and back into mmap mode
around each interrupt, but we can do it lazily via a timer. On INTx
interrupt, disable the mmap'd memory regions and set a timer. On
every interrupt, push the timer out. If the timer expires and the
interrupt is no longer pending, switch back to mmap mode.
This has the benefit that things like graphics cards, which rarely or
never fire an interrupt, don't need manual user intervention to add
the x-intx=off parameter. They'll just remain in mmap mode until they
trigger an interrupt, and if they don't continue to regularly fire
interrupts, they'll switch back.
The default timeout is tuned for network cards so that a ping is just
enough to keep them in non-mmap mode, where they have much better
latency. It is tunable with an experimental option,
x-intx-mmap-timeout-ms. A value of 0 keeps the device in non-mmap
mode after the first interrupt.
It's possible we could look at the class code of devices and come up
with reasonable per-class defaults based on expected interrupt
frequency and latency. None of this is used for MSI interrupts and
also won't be used if we can bypass through KVM.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2012-10-08 18:45:29 +04:00
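The lazy switching described in the commit message above can be illustrated with a small stand-alone model. This is only a sketch under stated assumptions, not QEMU code: `IntxModel` and the `intx_model_*` functions are hypothetical names, the caller passes the current time in milliseconds instead of using a QEMU timer, and servicing the interrupt is modeled as a plain function call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the device state; not QEMU's VFIOPCIDevice. */
typedef struct {
    bool pending;        /* INTx raised and not yet serviced by the guest */
    bool mmap_enabled;   /* fast path: BARs are mmap'd, accesses not trapped */
    bool timer_armed;
    int64_t deadline_ms; /* meaningful only while timer_armed is true */
    int64_t timeout_ms;  /* 0 means: never switch back to mmap mode */
} IntxModel;

static void intx_model_init(IntxModel *m, int64_t timeout_ms)
{
    assert(timeout_ms >= 0);
    *m = (IntxModel){ .mmap_enabled = true, .timeout_ms = timeout_ms };
}

/* An interrupt arrives: leave mmap mode and push the timer out. */
static void intx_model_interrupt(IntxModel *m, int64_t now_ms)
{
    m->pending = true;
    m->mmap_enabled = false;
    if (m->timeout_ms) {
        m->timer_armed = true;
        m->deadline_ms = now_ms + m->timeout_ms;
    }
}

/* The guest serviced the interrupt. */
static void intx_model_eoi(IntxModel *m)
{
    m->pending = false;
}

/* Advance the clock; run the timer callback if its deadline has passed. */
static void intx_model_tick(IntxModel *m, int64_t now_ms)
{
    if (!m->timer_armed || now_ms < m->deadline_ms) {
        return;
    }
    if (m->pending) {
        /* Still unserviced: re-arm instead of re-enabling mmaps. */
        m->deadline_ms = now_ms + m->timeout_ms;
        return;
    }
    m->timer_armed = false;
    m->mmap_enabled = true;
}
```

A caller would invoke `intx_model_interrupt()` for each interrupt and `intx_model_tick()` as time advances. With a timeout of 0 the timer is never armed, so the model stays in non-mmap mode after the first interrupt, matching the behavior the commit message describes for x-intx-mmap-timeout-ms=0.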
/*
 * Disabling BAR mmapping can be slow, but toggling it around INTx can
 * also be a huge overhead. We try to get the best of both worlds by
 * waiting until an interrupt to disable mmaps (subsequent transitions
 * to the same state are effectively no overhead). If the interrupt has
 * been serviced and the time gap is long enough, we re-enable mmaps for
 * performance. This works well for things like graphics cards, which
 * may not use their interrupt at all and are penalized to an unusable
 * level by read/write BAR traps. Other devices, like NICs, have more
 * regular interrupts and see much better latency by staying in non-mmap
 * mode. We therefore set the default mmap_timeout such that a ping
 * is just enough to keep the mmap disabled. Users can experiment with
 * other options with the x-intx-mmap-timeout-ms parameter (a value of
 * zero disables the timer).
 */
static void vfio_intx_mmap_enable(void *opaque)
{
2014-12-20 01:24:15 +03:00
    VFIOPCIDevice *vdev = opaque;
2012-10-08 18:45:29 +04:00

    if (vdev->intx.pending) {
2013-08-21 19:03:08 +04:00
        timer_mod(vdev->intx.mmap_timer,
                  qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) + vdev->intx.mmap_timeout);
2012-10-08 18:45:29 +04:00
        return;
    }

    vfio_mmap_set_enabled(vdev, true);
}
2012-09-26 21:19:32 +04:00
static void vfio_intx_interrupt(void *opaque)
{
2014-12-20 01:24:15 +03:00
    VFIOPCIDevice *vdev = opaque;
2012-09-26 21:19:32 +04:00

    if (!event_notifier_test_and_clear(&vdev->intx.interrupt)) {
        return;
    }

2014-12-22 19:54:49 +03:00
    trace_vfio_intx_interrupt(vdev->vbasedev.name, 'A' + vdev->intx.pin);
2012-09-26 21:19:32 +04:00

    vdev->intx.pending = true;
2013-10-07 11:36:38 +04:00
    pci_irq_assert(&vdev->pdev);
2012-10-08 18:45:29 +04:00
    vfio_mmap_set_enabled(vdev, false);
    if (vdev->intx.mmap_timeout) {
2013-08-21 19:03:08 +04:00
        timer_mod(vdev->intx.mmap_timer,
                  qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) + vdev->intx.mmap_timeout);
2012-10-08 18:45:29 +04:00
    }
2012-09-26 21:19:32 +04:00
}
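vfio_intx_interrupt() begins by testing and clearing the notifier that signals the interrupt. As a rough illustration of that idiom, the sketch below assumes a Linux host and a notifier backed by an eventfd(2) descriptor opened with EFD_NONBLOCK. `notifier_signal` and `notifier_test_and_clear` are hypothetical helper names chosen for this example; they are not QEMU's EventNotifier API.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical helper: signal the eventfd once (adds 1 to its counter). */
static int notifier_signal(int fd)
{
    uint64_t one = 1;
    ssize_t n;

    do {
        n = write(fd, &one, sizeof(one));
    } while (n < 0 && errno == EINTR);
    return n == (ssize_t)sizeof(one) ? 0 : -1;
}

/*
 * Hypothetical helper: return true if the eventfd was signalled since the
 * last call, resetting its counter to zero. The descriptor must have been
 * created with EFD_NONBLOCK so that an unsignalled descriptor fails with
 * EAGAIN instead of blocking.
 */
static bool notifier_test_and_clear(int fd)
{
    uint64_t count;
    ssize_t n;

    do {
        n = read(fd, &count, sizeof(count));
    } while (n < 0 && errno == EINTR);
    return n == (ssize_t)sizeof(count);
}
```

Because a non-semaphore eventfd accumulates signals in one counter, several signals that arrive before the handler runs collapse into a single positive test. That suits a level-triggered interrupt, where the handler only needs to know that the line fired at least once.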
2015-09-23 22:04:43 +03:00
static void vfio_intx_eoi(VFIODevice *vbasedev)
2012-09-26 21:19:32 +04:00
{
2014-12-22 19:54:37 +03:00
    VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
2012-09-26 21:19:32 +04:00
    if (!vdev->intx.pending) {
        return;
    }

2015-09-23 22:04:43 +03:00
    trace_vfio_intx_eoi(vbasedev->name);
2012-09-26 21:19:32 +04:00

    vdev->intx.pending = false;
2013-10-07 11:36:38 +04:00
    pci_irq_deassert(&vdev->pdev);
2014-12-22 19:54:37 +03:00
    vfio_unmask_single_irqindex(vbasedev, VFIO_PCI_INTX_IRQ_INDEX);
2012-09-26 21:19:32 +04:00
}
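The interplay between vfio_intx_interrupt() and vfio_intx_eoi() can be pictured as a small handshake on one level-triggered line. The model below is a sketch only: `IntxLine`, `intx_line_raise`, and `intx_line_eoi` are hypothetical names, and it assumes the host masks the line when it delivers an interrupt, so nothing further arrives until the EOI path unmasks it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one INTx line; not a QEMU or kernel structure. */
typedef struct {
    bool host_masked; /* host side will not deliver further interrupts */
    bool pending;     /* guest was signalled and has not serviced it yet */
    bool guest_level; /* level of the emulated INTx pin seen by the guest */
} IntxLine;

/*
 * The device raises its interrupt. Assumption: the host masks the line on
 * delivery, so at most one interrupt is outstanding at a time.
 * Returns true if the interrupt was delivered to the guest.
 */
static bool intx_line_raise(IntxLine *l)
{
    assert(l != NULL);
    if (l->host_masked) {
        return false;
    }
    l->host_masked = true;
    l->pending = true;
    l->guest_level = true;
    return true;
}

/*
 * The guest services the interrupt. Follows the order used in
 * vfio_intx_eoi(): clear pending, de-assert the pin, then unmask.
 * Returns false if nothing was pending (spurious EOI).
 */
static bool intx_line_eoi(IntxLine *l)
{
    assert(l != NULL);
    if (!l->pending) {
        return false;
    }
    l->pending = false;
    l->guest_level = false;
    l->host_masked = false;
    return true;
}
```

The early return on a spurious EOI corresponds to the `!vdev->intx.pending` check in vfio_intx_eoi(): an EOI that arrives when nothing is pending must not unmask the line.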
2024-05-22 07:40:03 +03:00
static bool vfio_intx_enable_kvm(VFIOPCIDevice *vdev, Error **errp)
2012-11-13 23:27:40 +04:00
{
#ifdef CONFIG_KVM
2020-03-18 17:52:01 +03:00
    int irq_fd = event_notifier_get_fd(&vdev->intx.interrupt);
2012-11-13 23:27:40 +04:00

2015-09-23 22:04:44 +03:00
    if (vdev->no_kvm_intx || !kvm_irqfds_enabled() ||
2012-11-13 23:27:40 +04:00
        vdev->intx.route.mode != PCI_INTX_ENABLED ||
2014-10-31 16:38:19 +03:00
        !kvm_resamplefds_enabled()) {
2024-05-22 07:40:03 +03:00
        return true;
2012-11-13 23:27:40 +04:00
    }

    /* Get to a known interrupt state */
2020-03-18 17:52:01 +03:00
    qemu_set_fd_handler(irq_fd, NULL, NULL, vdev);
2014-12-20 01:24:31 +03:00
    vfio_mask_single_irqindex(&vdev->vbasedev, VFIO_PCI_INTX_IRQ_INDEX);
2012-11-13 23:27:40 +04:00
    vdev->intx.pending = false;
2013-10-07 11:36:38 +04:00
    pci_irq_deassert(&vdev->pdev);
2012-11-13 23:27:40 +04:00

    /* Get an eventfd for resample/unmask */
    if (event_notifier_init(&vdev->intx.unmask, 0)) {
2016-10-17 19:57:58 +03:00
        error_setg(errp, "event_notifier_init failed eoi");
2012-11-13 23:27:40 +04:00
        goto fail;
    }

2020-03-18 17:52:01 +03:00
    if (kvm_irqchip_add_irqfd_notifier_gsi(kvm_state,
                                           &vdev->intx.interrupt,
                                           &vdev->intx.unmask,
                                           vdev->intx.route.irq)) {
2016-10-17 19:57:58 +03:00
        error_setg_errno(errp, errno, "failed to setup resample irqfd");
2012-11-13 23:27:40 +04:00
        goto fail_irqfd;
    }

2024-05-22 07:39:59 +03:00
    if (!vfio_set_irq_signaling(&vdev->vbasedev, VFIO_PCI_INTX_IRQ_INDEX, 0,
                                VFIO_IRQ_SET_ACTION_UNMASK,
                                event_notifier_get_fd(&vdev->intx.unmask),
                                errp)) {
2012-11-13 23:27:40 +04:00
        goto fail_vfio;
    }

    /* Let'em rip */
2014-12-20 01:24:31 +03:00
    vfio_unmask_single_irqindex(&vdev->vbasedev, VFIO_PCI_INTX_IRQ_INDEX);
2012-11-13 23:27:40 +04:00

    vdev->intx.kvm_accel = true;

2015-09-23 22:04:43 +03:00
    trace_vfio_intx_enable_kvm(vdev->vbasedev.name);
2012-11-13 23:27:40 +04:00
|
|
|
|
2024-05-22 07:40:03 +03:00
|
|
|
return true;
|
2012-11-13 23:27:40 +04:00
|
|
|
|
|
|
|
fail_vfio:
|
2020-03-18 17:52:01 +03:00
|
|
|
kvm_irqchip_remove_irqfd_notifier_gsi(kvm_state, &vdev->intx.interrupt,
|
|
|
|
vdev->intx.route.irq);
|
2012-11-13 23:27:40 +04:00
|
|
|
fail_irqfd:
|
|
|
|
event_notifier_cleanup(&vdev->intx.unmask);
|
|
|
|
fail:
|
2020-03-18 17:52:01 +03:00
|
|
|
qemu_set_fd_handler(irq_fd, vfio_intx_interrupt, NULL, vdev);
|
2014-12-20 01:24:31 +03:00
|
|
|
vfio_unmask_single_irqindex(&vdev->vbasedev, VFIO_PCI_INTX_IRQ_INDEX);
|
2024-05-22 07:40:03 +03:00
|
|
|
return false;
|
|
|
|
#else
|
|
|
|
return true;
|
2012-11-13 23:27:40 +04:00
|
|
|
#endif
|
|
|
|
}
|
|
|
|
|
2015-09-23 22:04:43 +03:00
|
|
|
static void vfio_intx_disable_kvm(VFIOPCIDevice *vdev)
|
2012-11-13 23:27:40 +04:00
|
|
|
{
|
|
|
|
#ifdef CONFIG_KVM
|
|
|
|
if (!vdev->intx.kvm_accel) {
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Get to a known state, hardware masked, QEMU ready to accept new
|
|
|
|
* interrupts, QEMU IRQ de-asserted.
|
|
|
|
*/
|
2014-12-20 01:24:31 +03:00
|
|
|
vfio_mask_single_irqindex(&vdev->vbasedev, VFIO_PCI_INTX_IRQ_INDEX);
|
2012-11-13 23:27:40 +04:00
|
|
|
vdev->intx.pending = false;
|
2013-10-07 11:36:38 +04:00
|
|
|
pci_irq_deassert(&vdev->pdev);
|
2012-11-13 23:27:40 +04:00
|
|
|
|
|
|
|
/* Tell KVM to stop listening for an INTx irqfd */
|
2020-03-18 17:52:01 +03:00
|
|
|
if (kvm_irqchip_remove_irqfd_notifier_gsi(kvm_state, &vdev->intx.interrupt,
|
|
|
|
vdev->intx.route.irq)) {
|
error: Strip trailing '\n' from error string arguments (again)
Commit 6daf194d and be62a2eb got rid of a bunch, but they keep coming
back. Tracked down with this Coccinelle semantic patch:
@r@
expression err, eno, cls, fmt;
position p;
@@
(
error_report(fmt, ...)@p
|
error_set(err, cls, fmt, ...)@p
|
error_set_errno(err, eno, cls, fmt, ...)@p
|
error_setg(err, fmt, ...)@p
|
error_setg_errno(err, eno, fmt, ...)@p
)
@script:python@
fmt << r.fmt;
p << r.p;
@@
if "\\n" in str(fmt):
print "%s:%s:%s:%s" % (p[0].file, p[0].line, p[0].column, fmt)
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-id: 1360354939-10994-4-git-send-email-armbru@redhat.com
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2013-02-09 00:22:16 +04:00
|
|
|
error_report("vfio: Error: Failed to disable INTx irqfd: %m");
|
2012-11-13 23:27:40 +04:00
|
|
|
}
|
|
|
|
|
|
|
|
/* We only need to close the eventfd for VFIO to cleanup the kernel side */
|
|
|
|
event_notifier_cleanup(&vdev->intx.unmask);
|
|
|
|
|
|
|
|
/* QEMU starts listening for interrupt events. */
|
2020-03-18 17:52:01 +03:00
|
|
|
qemu_set_fd_handler(event_notifier_get_fd(&vdev->intx.interrupt),
|
|
|
|
vfio_intx_interrupt, NULL, vdev);
|
2012-11-13 23:27:40 +04:00
|
|
|
|
|
|
|
vdev->intx.kvm_accel = false;
|
|
|
|
|
|
|
|
/* If we've missed an event, let it re-fire through QEMU */
|
2014-12-20 01:24:31 +03:00
|
|
|
vfio_unmask_single_irqindex(&vdev->vbasedev, VFIO_PCI_INTX_IRQ_INDEX);
|
2012-11-13 23:27:40 +04:00
|
|
|
|
2015-09-23 22:04:43 +03:00
|
|
|
trace_vfio_intx_disable_kvm(vdev->vbasedev.name);
|
2012-11-13 23:27:40 +04:00
|
|
|
#endif
|
|
|
|
}
|
|
|
|
|
2019-10-17 03:52:45 +03:00
|
|
|
static void vfio_intx_update(VFIOPCIDevice *vdev, PCIINTxRoute *route)
|
2012-11-13 23:27:40 +04:00
|
|
|
{
|
2016-10-17 19:57:58 +03:00
|
|
|
Error *err = NULL;
|
2012-11-13 23:27:40 +04:00
|
|
|
|
2015-09-23 22:04:43 +03:00
|
|
|
trace_vfio_intx_update(vdev->vbasedev.name,
|
2019-10-17 03:52:45 +03:00
|
|
|
vdev->intx.route.irq, route->irq);
|
2012-11-13 23:27:40 +04:00
|
|
|
|
2015-09-23 22:04:43 +03:00
|
|
|
vfio_intx_disable_kvm(vdev);
|
2012-11-13 23:27:40 +04:00
|
|
|
|
2019-10-17 03:52:45 +03:00
|
|
|
vdev->intx.route = *route;
|
2012-11-13 23:27:40 +04:00
|
|
|
|
2019-10-17 03:52:45 +03:00
|
|
|
if (route->mode != PCI_INTX_ENABLED) {
|
2012-11-13 23:27:40 +04:00
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
2024-05-22 07:40:03 +03:00
|
|
|
if (!vfio_intx_enable_kvm(vdev, &err)) {
|
2018-10-17 11:26:29 +03:00
|
|
|
warn_reportf_err(err, VFIO_MSG_PREFIX, vdev->vbasedev.name);
|
2016-10-17 19:57:58 +03:00
|
|
|
}
|
2012-11-13 23:27:40 +04:00
|
|
|
|
|
|
|
/* Re-enable the interrupt in case we missed an EOI */
|
2015-09-23 22:04:43 +03:00
|
|
|
vfio_intx_eoi(&vdev->vbasedev);
|
2012-11-13 23:27:40 +04:00
|
|
|
}
|
|
|
|
|
2019-10-17 03:52:45 +03:00
|
|
|
static void vfio_intx_routing_notifier(PCIDevice *pdev)
|
|
|
|
{
|
2020-09-03 01:43:03 +03:00
|
|
|
VFIOPCIDevice *vdev = VFIO_PCI(pdev);
|
2019-10-17 03:52:45 +03:00
|
|
|
PCIINTxRoute route;
|
|
|
|
|
|
|
|
if (vdev->interrupt != VFIO_INT_INTx) {
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
|
|
|
route = pci_device_route_intx_to_irq(&vdev->pdev, vdev->intx.pin);
|
|
|
|
|
|
|
|
if (pci_intx_route_changed(&vdev->intx.route, &route)) {
|
|
|
|
vfio_intx_update(vdev, &route);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2019-10-17 04:38:30 +03:00
|
|
|
static void vfio_irqchip_change(Notifier *notify, void *data)
|
|
|
|
{
|
|
|
|
VFIOPCIDevice *vdev = container_of(notify, VFIOPCIDevice,
|
|
|
|
irqchip_change_notifier);
|
|
|
|
|
|
|
|
vfio_intx_update(vdev, &vdev->intx.route);
|
|
|
|
}
|
|
|
|
|
2024-05-22 07:40:06 +03:00
|
|
|
static bool vfio_intx_enable(VFIOPCIDevice *vdev, Error **errp)
|
2012-09-26 21:19:32 +04:00
|
|
|
{
|
|
|
|
uint8_t pin = vfio_pci_read_config(&vdev->pdev, PCI_INTERRUPT_PIN, 1);
|
2016-10-17 19:57:58 +03:00
|
|
|
Error *err = NULL;
|
2019-06-13 18:57:37 +03:00
|
|
|
int32_t fd;
|
|
|
|
int ret;
|
|
|
|
|
2012-09-26 21:19:32 +04:00
|
|
|
|
vfio-pci: Update slow path INTx algorithm
We can't afford the overhead of switching out and back into mmap mode
around each interrupt, but we can do it lazily via a timer. On INTx
interrupt, disable the mmap'd memory regions and set a timer. On
every interrupt, push the timer out. If the timer expires and the
interrupt is no longer pending, switch back to mmap mode.
This has the benefit that things like graphics cards, which rarely or
never fire an interrupt, don't need manual user intervention to add
the x-intx=off parameter. They'll just remain in mmap mode until they
trigger an interrupt, and if they don't continue to regularly fire
interrupts, they'll switch back.
The default timeout is tuned for network cards so that a ping is just
enough to keep them in non-mmap mode, where they have much better
latency. It is tunable with an experimental option,
x-intx-mmap-timeout-ms. A value of 0 keeps the device in non-mmap
mode after the first interrupt.
It's possible we could look at the class code of devices and come up
with reasonable per-class defaults based on expected interrupt
frequency and latency. None of this is used for MSI interrupts and
also won't be used if we can bypass through KVM.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2012-10-08 18:45:29 +04:00
|
|
|
if (!pin) {
|
2024-05-22 07:40:06 +03:00
|
|
|
return true;
|
2012-09-26 21:19:32 +04:00
|
|
|
}
|
|
|
|
|
|
|
|
vfio_disable_interrupts(vdev);
|
|
|
|
|
|
|
|
vdev->intx.pin = pin - 1; /* Pin A (1) -> irq[0] */
|
2013-10-07 11:36:38 +04:00
|
|
|
pci_config_set_interrupt_pin(vdev->pdev.config, pin);
|
2012-11-13 23:27:40 +04:00
|
|
|
|
|
|
|
#ifdef CONFIG_KVM
|
|
|
|
/*
|
|
|
|
* Only conditional to avoid generating error messages on platforms
|
|
|
|
* where we won't actually use the result anyway.
|
|
|
|
*/
|
2014-10-31 16:38:19 +03:00
|
|
|
if (kvm_irqfds_enabled() && kvm_resamplefds_enabled()) {
|
2012-11-13 23:27:40 +04:00
|
|
|
vdev->intx.route = pci_device_route_intx_to_irq(&vdev->pdev,
|
|
|
|
vdev->intx.pin);
|
|
|
|
}
|
|
|
|
#endif
|
|
|
|
|
2012-09-26 21:19:32 +04:00
|
|
|
ret = event_notifier_init(&vdev->intx.interrupt, 0);
|
|
|
|
if (ret) {
|
2016-10-17 19:57:58 +03:00
|
|
|
error_setg_errno(errp, -ret, "event_notifier_init failed");
|
2024-05-22 07:40:06 +03:00
|
|
|
return false;
|
2012-09-26 21:19:32 +04:00
|
|
|
}
|
2019-06-13 18:57:37 +03:00
|
|
|
fd = event_notifier_get_fd(&vdev->intx.interrupt);
|
|
|
|
qemu_set_fd_handler(fd, vfio_intx_interrupt, NULL, vdev);
|
2012-09-26 21:19:32 +04:00
|
|
|
|
2024-05-22 07:39:59 +03:00
|
|
|
if (!vfio_set_irq_signaling(&vdev->vbasedev, VFIO_PCI_INTX_IRQ_INDEX, 0,
|
|
|
|
VFIO_IRQ_SET_ACTION_TRIGGER, fd, errp)) {
|
2019-06-13 18:57:37 +03:00
|
|
|
qemu_set_fd_handler(fd, NULL, NULL, vdev);
|
2012-10-08 18:45:30 +04:00
|
|
|
event_notifier_cleanup(&vdev->intx.interrupt);
|
2024-05-22 07:40:06 +03:00
|
|
|
return false;
|
2012-09-26 21:19:32 +04:00
|
|
|
}
|
|
|
|
|
2024-05-22 07:40:03 +03:00
|
|
|
if (!vfio_intx_enable_kvm(vdev, &err)) {
|
2018-10-17 11:26:29 +03:00
|
|
|
warn_reportf_err(err, VFIO_MSG_PREFIX, vdev->vbasedev.name);
|
2016-10-17 19:57:58 +03:00
|
|
|
}
|
2012-11-13 23:27:40 +04:00
|
|
|
|
2012-09-26 21:19:32 +04:00
|
|
|
vdev->interrupt = VFIO_INT_INTx;
|
|
|
|
|
2015-09-23 22:04:43 +03:00
|
|
|
trace_vfio_intx_enable(vdev->vbasedev.name);
|
2024-05-22 07:40:06 +03:00
|
|
|
return true;
|
2012-09-26 21:19:32 +04:00
|
|
|
}
|
|
|
|
|
2015-09-23 22:04:43 +03:00
|
|
|
static void vfio_intx_disable(VFIOPCIDevice *vdev)
|
2012-09-26 21:19:32 +04:00
|
|
|
{
|
|
|
|
int fd;
|
|
|
|
|
2013-08-21 19:03:08 +04:00
|
|
|
timer_del(vdev->intx.mmap_timer);
|
2015-09-23 22:04:43 +03:00
|
|
|
vfio_intx_disable_kvm(vdev);
|
2014-12-20 01:24:31 +03:00
|
|
|
vfio_disable_irqindex(&vdev->vbasedev, VFIO_PCI_INTX_IRQ_INDEX);
|
2012-09-26 21:19:32 +04:00
|
|
|
vdev->intx.pending = false;
|
2013-10-07 11:36:38 +04:00
|
|
|
pci_irq_deassert(&vdev->pdev);
|
2012-09-26 21:19:32 +04:00
    vfio_mmap_set_enabled(vdev, true);

    fd = event_notifier_get_fd(&vdev->intx.interrupt);
    qemu_set_fd_handler(fd, NULL, NULL, vdev);
    event_notifier_cleanup(&vdev->intx.interrupt);

    vdev->interrupt = VFIO_INT_NONE;

    trace_vfio_intx_disable(vdev->vbasedev.name);
}

/*
 * MSI/X
 */
static void vfio_msi_interrupt(void *opaque)
{
    VFIOMSIVector *vector = opaque;
    VFIOPCIDevice *vdev = vector->vdev;
    MSIMessage (*get_msg)(PCIDevice *dev, unsigned vector);
    void (*notify)(PCIDevice *dev, unsigned vector);
    MSIMessage msg;
    int nr = vector - vdev->msi_vectors;

    if (!event_notifier_test_and_clear(&vector->interrupt)) {
        return;
    }

    if (vdev->interrupt == VFIO_INT_MSIX) {
        get_msg = msix_get_message;
        notify = msix_notify;

        /* A masked vector firing needs to use the PBA, enable it */
        if (msix_is_masked(&vdev->pdev, nr)) {
            set_bit(nr, vdev->msix->pending);
            memory_region_set_enabled(&vdev->pdev.msix_pba_mmio, true);
            trace_vfio_msix_pba_enable(vdev->vbasedev.name);
        }
    } else if (vdev->interrupt == VFIO_INT_MSI) {
        get_msg = msi_get_message;
        notify = msi_notify;
    } else {
        abort();
    }

    msg = get_msg(&vdev->pdev, nr);
    trace_vfio_msi_interrupt(vdev->vbasedev.name, nr, msg.address, msg.data);
    notify(&vdev->pdev, nr);
}

/*
 * Get MSI-X enabled, but no vector enabled, by setting vector 0 with an invalid
 * fd to kernel.
 */
static int vfio_enable_msix_no_vec(VFIOPCIDevice *vdev)
{
    g_autofree struct vfio_irq_set *irq_set = NULL;
    int ret = 0, argsz;
    int32_t *fd;

    argsz = sizeof(*irq_set) + sizeof(*fd);

    irq_set = g_malloc0(argsz);
    irq_set->argsz = argsz;
    irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD |
                     VFIO_IRQ_SET_ACTION_TRIGGER;
    irq_set->index = VFIO_PCI_MSIX_IRQ_INDEX;
    irq_set->start = 0;
    irq_set->count = 1;
    fd = (int32_t *)&irq_set->data;
    *fd = -1;

    ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_SET_IRQS, irq_set);

    return ret;
}

static int vfio_enable_vectors(VFIOPCIDevice *vdev, bool msix)
{
    struct vfio_irq_set *irq_set;
    int ret = 0, i, argsz;
    int32_t *fds;

    /*
     * If dynamic MSI-X allocation is supported, the vectors to be allocated
     * and enabled can be scattered. Before kernel enabling MSI-X, setting
     * nr_vectors causes all these vectors to be allocated on host.
     *
     * To keep allocation as needed, use vector 0 with an invalid fd to get
     * MSI-X enabled first, then set vectors with a potentially sparse set of
     * eventfds to enable interrupts only when enabled in guest.
     */
    if (msix && !vdev->msix->noresize) {
        ret = vfio_enable_msix_no_vec(vdev);

        if (ret) {
            return ret;
        }
    }

    argsz = sizeof(*irq_set) + (vdev->nr_vectors * sizeof(*fds));

    irq_set = g_malloc0(argsz);
    irq_set->argsz = argsz;
    irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD | VFIO_IRQ_SET_ACTION_TRIGGER;
    irq_set->index = msix ? VFIO_PCI_MSIX_IRQ_INDEX : VFIO_PCI_MSI_IRQ_INDEX;
    irq_set->start = 0;
    irq_set->count = vdev->nr_vectors;
    fds = (int32_t *)&irq_set->data;

    for (i = 0; i < vdev->nr_vectors; i++) {
        int fd = -1;

        /*
         * MSI vs MSI-X - The guest has direct access to MSI mask and pending
         * bits, therefore we always use the KVM signaling path when setup.
         * MSI-X mask and pending bits are emulated, so we want to use the
         * KVM signaling path only when configured and unmasked.
         */
        if (vdev->msi_vectors[i].use) {
            if (vdev->msi_vectors[i].virq < 0 ||
                (msix && msix_is_masked(&vdev->pdev, i))) {
                fd = event_notifier_get_fd(&vdev->msi_vectors[i].interrupt);
            } else {
                fd = event_notifier_get_fd(&vdev->msi_vectors[i].kvm_interrupt);
            }
        }

        fds[i] = fd;
    }

    ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_SET_IRQS, irq_set);

    g_free(irq_set);

    return ret;
}

static void vfio_add_kvm_msi_virq(VFIOPCIDevice *vdev, VFIOMSIVector *vector,
                                  int vector_n, bool msix)
{
    if ((msix && vdev->no_kvm_msix) || (!msix && vdev->no_kvm_msi)) {
        return;
    }

    vector->virq = kvm_irqchip_add_msi_route(&vfio_route_change,
                                             vector_n, &vdev->pdev);
}

static void vfio_connect_kvm_msi_virq(VFIOMSIVector *vector)
{
    if (vector->virq < 0) {
        return;
    }

    if (event_notifier_init(&vector->kvm_interrupt, 0)) {
        goto fail_notifier;
    }

    if (kvm_irqchip_add_irqfd_notifier_gsi(kvm_state, &vector->kvm_interrupt,
                                           NULL, vector->virq) < 0) {
        goto fail_kvm;
    }

    return;

fail_kvm:
    event_notifier_cleanup(&vector->kvm_interrupt);
fail_notifier:
    kvm_irqchip_release_virq(kvm_state, vector->virq);
    vector->virq = -1;
}

static void vfio_remove_kvm_msi_virq(VFIOMSIVector *vector)
{
    kvm_irqchip_remove_irqfd_notifier_gsi(kvm_state, &vector->kvm_interrupt,
                                          vector->virq);
    kvm_irqchip_release_virq(kvm_state, vector->virq);
    vector->virq = -1;
    event_notifier_cleanup(&vector->kvm_interrupt);
}

static void vfio_update_kvm_msi_virq(VFIOMSIVector *vector, MSIMessage msg,
                                     PCIDevice *pdev)
{
    kvm_irqchip_update_msi_route(kvm_state, vector->virq, msg, pdev);
    kvm_irqchip_commit_routes(kvm_state);
}

static int vfio_msix_vector_do_use(PCIDevice *pdev, unsigned int nr,
                                   MSIMessage *msg, IOHandler *handler)
{
    VFIOPCIDevice *vdev = VFIO_PCI(pdev);
    VFIOMSIVector *vector;
    int ret;
    bool resizing = !!(vdev->nr_vectors < nr + 1);

    trace_vfio_msix_vector_do_use(vdev->vbasedev.name, nr);

    vector = &vdev->msi_vectors[nr];

    if (!vector->use) {
        vector->vdev = vdev;
        vector->virq = -1;
        if (event_notifier_init(&vector->interrupt, 0)) {
            error_report("vfio: Error: event_notifier_init failed");
        }
        vector->use = true;
        msix_vector_use(pdev, nr);
    }

    qemu_set_fd_handler(event_notifier_get_fd(&vector->interrupt),
                        handler, NULL, vector);

    /*
     * Attempt to enable route through KVM irqchip,
     * default to userspace handling if unavailable.
     */
    if (vector->virq >= 0) {
        if (!msg) {
            vfio_remove_kvm_msi_virq(vector);
        } else {
            vfio_update_kvm_msi_virq(vector, *msg, pdev);
2012-09-26 21:19:32 +04:00
|
|
|
}
|
2014-06-30 19:50:33 +04:00
|
|
|
} else {
|
vfio/pci: Fix regression in MSI routing configuration
d1f6af6 "kvm-irqchip: simplify kvm_irqchip_add_msi_route" was a cleanup
of kvmchip routing configuration, that was mostly intended for x86.
However, it also contains a subtle change in behaviour which breaks EEH[1]
error recovery on certain VFIO passthrough devices on spapr guests. So far
it's only been seen on a BCM5719 NIC on a POWER8 server, but there may be
other hardware with the same problem. It's also possible there could be
circumstances where it causes a bug on x86 as well, though I don't know of
any obvious candidates.
Prior to d1f6af6, both vfio_msix_vector_do_use() and
vfio_add_kvm_msi_virq() used msg == NULL as a special flag to mark this
as the "dummy" vector used to make the host hardware state sync with the
guest expected hardware state in terms of MSI configuration.
Specifically that flag caused vfio_add_kvm_msi_virq() to become a no-op,
meaning the dummy irq would always be delivered via qemu. d1f6af6 changed
vfio_add_kvm_msi_virq() so it takes a vector number instead of the msg
parameter, and determines the correct message itself. The test for !msg
was removed, and not replaced with anything there or in the caller.
With an spapr guest which has a VFIO device, if an EEH error occurs on the
host hardware, then the device will be isolated then reset. This is a
combination of host and guest action, mediated by some EEH related
hypercalls. I haven't fully traced the mechanics, but somehow installing
the kvm irqchip route for the dummy irq on the BCM5719 means that after EEH
reset and recovery, at least some irqs are no longer delivered to the
guest.
In particular, the guest never gets the link up event, and so the NIC is
effectively dead.
[1] EEH (Enhanced Error Handling) is an IBM POWER server specific PCI-*
error reporting and recovery mechanism. The concept is somewhat
similar to PCI-E AER, but the details are different.
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1373802
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Gavin Shan <gwshan@au1.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Cc: qemu-stable@nongnu.org
Fixes: d1f6af6a17a6 ("kvm-irqchip: simplify kvm_irqchip_add_msi_route")
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-09-15 09:11:48 +03:00
|
|
|
if (msg) {
|
2022-03-26 09:02:26 +03:00
|
|
|
if (vdev->defer_kvm_irq_routing) {
|
|
|
|
vfio_add_kvm_msi_virq(vdev, vector, nr, true);
|
|
|
|
} else {
|
|
|
|
vfio_route_change = kvm_irqchip_begin_route_changes(kvm_state);
|
|
|
|
vfio_add_kvm_msi_virq(vdev, vector, nr, true);
|
|
|
|
kvm_irqchip_commit_route_changes(&vfio_route_change);
|
|
|
|
vfio_connect_kvm_msi_virq(vector);
|
|
|
|
}
|
2016-09-15 09:11:48 +03:00
|
|
|
}
|
2012-09-26 21:19:32 +04:00
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
2023-09-26 05:14:05 +03:00
|
|
|
* When dynamic allocation is not supported, we don't want to have the
|
|
|
|
* host allocate all possible MSI vectors for a device if they're not
|
|
|
|
* in use, so we shut down and incrementally increase them as needed.
|
|
|
|
* nr_vectors represents the total number of vectors allocated.
|
|
|
|
*
|
|
|
|
* When dynamic allocation is supported, let the host only allocate
|
|
|
|
* and enable a vector when it is in use in guest. nr_vectors represents
|
|
|
|
* the upper bound of vectors being enabled (but not all of the range
|
|
|
|
* is allocated or enabled).
|
2012-09-26 21:19:32 +04:00
|
|
|
*/
|
2023-09-26 05:14:05 +03:00
|
|
|
if (resizing) {
|
2012-09-26 21:19:32 +04:00
|
|
|
vdev->nr_vectors = nr + 1;
|
2023-09-26 05:14:05 +03:00
|
|
|
}
|
|
|
|
|
|
|
|
if (!vdev->defer_kvm_irq_routing) {
|
|
|
|
if (vdev->msix->noresize && resizing) {
|
2022-03-26 09:02:26 +03:00
|
|
|
vfio_disable_irqindex(&vdev->vbasedev, VFIO_PCI_MSIX_IRQ_INDEX);
|
|
|
|
ret = vfio_enable_vectors(vdev, true);
|
|
|
|
if (ret) {
|
|
|
|
error_report("vfio: failed to enable vectors, %d", ret);
|
|
|
|
}
|
2014-06-30 19:50:33 +04:00
|
|
|
} else {
|
2023-09-26 05:14:05 +03:00
|
|
|
Error *err = NULL;
|
|
|
|
int32_t fd;
|
2012-10-08 18:45:31 +04:00
|
|
|
|
2023-09-26 05:14:05 +03:00
|
|
|
if (vector->virq >= 0) {
|
|
|
|
fd = event_notifier_get_fd(&vector->kvm_interrupt);
|
|
|
|
} else {
|
|
|
|
fd = event_notifier_get_fd(&vector->interrupt);
|
|
|
|
}
|
|
|
|
|
2024-05-22 07:39:59 +03:00
|
|
|
if (!vfio_set_irq_signaling(&vdev->vbasedev,
|
|
|
|
VFIO_PCI_MSIX_IRQ_INDEX, nr,
|
|
|
|
VFIO_IRQ_SET_ACTION_TRIGGER, fd,
|
|
|
|
&err)) {
|
2023-09-26 05:14:05 +03:00
|
|
|
error_reportf_err(err, VFIO_MSG_PREFIX, vdev->vbasedev.name);
|
|
|
|
}
|
2012-09-26 21:19:32 +04:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2016-01-19 21:33:42 +03:00
|
|
|
/* Disable PBA emulation when nothing more is pending. */
|
|
|
|
clear_bit(nr, vdev->msix->pending);
|
|
|
|
if (find_first_bit(vdev->msix->pending,
|
|
|
|
vdev->nr_vectors) == vdev->nr_vectors) {
|
|
|
|
memory_region_set_enabled(&vdev->pdev.msix_pba_mmio, false);
|
|
|
|
trace_vfio_msix_pba_disable(vdev->vbasedev.name);
|
|
|
|
}
|
|
|
|
|
2012-09-26 21:19:32 +04:00
|
|
|
return 0;
|
|
|
|
}
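The pending-bit check at the end of vfio_msix_vector_do_use() clears the bit for vector nr and then treats "find_first_bit() returned nr_vectors" as "nothing is pending any more". A minimal sketch of that idiom follows. The sketch_* helpers are hypothetical stand-ins written for this illustration, not QEMU's bitmap API, and the real code additionally disables the PBA memory region and emits a trace event.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical helper: set bit nr in a word-array bitmap. */
static void sketch_set_bit(size_t nr, unsigned long *map)
{
    map[nr / SKETCH_BITS_PER_WORD] |= 1UL << (nr % SKETCH_BITS_PER_WORD);
}

/* Hypothetical helper: clear bit nr in a word-array bitmap. */
static void sketch_clear_bit(size_t nr, unsigned long *map)
{
    map[nr / SKETCH_BITS_PER_WORD] &= ~(1UL << (nr % SKETCH_BITS_PER_WORD));
}

/* Return the index of the first set bit below nbits, or nbits if none. */
static size_t sketch_find_first_bit(const unsigned long *map, size_t nbits)
{
    for (size_t i = 0; i < nbits; i++) {
        if (map[i / SKETCH_BITS_PER_WORD] &
            (1UL << (i % SKETCH_BITS_PER_WORD))) {
            return i;
        }
    }
    return nbits;
}

/*
 * Clear the pending bit for vector nr. Returns true when no bit in
 * [0, nr_vectors) remains set, which is the condition under which the
 * original code switches PBA emulation off.
 */
static bool sketch_clear_pending(unsigned long *pending, size_t nr,
                                 size_t nr_vectors)
{
    assert(nr < nr_vectors);
    sketch_clear_bit(nr, pending);
    return sketch_find_first_bit(pending, nr_vectors) == nr_vectors;
}
```

With bits 3 and 40 set, clearing vector 3 leaves the emulation needed, and clearing vector 40 afterwards reports that it can be turned off.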
|
|
|
|
|
2013-01-09 01:09:03 +04:00
|
|
|
static int vfio_msix_vector_use(PCIDevice *pdev,
|
|
|
|
unsigned int nr, MSIMessage msg)
|
|
|
|
{
|
|
|
|
return vfio_msix_vector_do_use(pdev, nr, &msg, vfio_msi_interrupt);
|
|
|
|
}
|
|
|
|
|
2012-09-26 21:19:32 +04:00
|
|
|
static void vfio_msix_vector_release(PCIDevice *pdev, unsigned int nr)
|
|
|
|
{
|
2020-09-03 01:43:03 +03:00
|
|
|
VFIOPCIDevice *vdev = VFIO_PCI(pdev);
|
2012-09-26 21:19:32 +04:00
|
|
|
VFIOMSIVector *vector = &vdev->msi_vectors[nr];
|
|
|
|
|
2014-12-22 19:54:49 +03:00
|
|
|
trace_vfio_msix_vector_release(vdev->vbasedev.name, nr);
|
2012-09-26 21:19:32 +04:00
|
|
|
|
|
|
|
/*
|
2014-06-30 19:50:33 +04:00
|
|
|
* There are still old guests that mask and unmask vectors on every
|
|
|
|
* interrupt. If we're using QEMU bypass with a KVM irqfd, leave all of
|
|
|
|
* the KVM setup in place, simply switch VFIO to use the non-bypass
|
|
|
|
* eventfd. We'll then fire the interrupt through QEMU and the MSI-X
|
|
|
|
* core will mask the interrupt and set pending bits, allowing it to
|
|
|
|
* be re-asserted on unmask. Nothing to do if already using QEMU mode.
|
2012-09-26 21:19:32 +04:00
|
|
|
*/
|
2014-06-30 19:50:33 +04:00
|
|
|
if (vector->virq >= 0) {
|
2019-06-13 18:57:37 +03:00
|
|
|
int32_t fd = event_notifier_get_fd(&vector->interrupt);
|
2019-07-02 20:18:16 +03:00
|
|
|
Error *err = NULL;
|
2012-10-08 18:45:31 +04:00
|
|
|
|
2024-05-22 07:39:59 +03:00
|
|
|
if (!vfio_set_irq_signaling(&vdev->vbasedev, VFIO_PCI_MSIX_IRQ_INDEX,
|
|
|
|
nr, VFIO_IRQ_SET_ACTION_TRIGGER, fd,
|
|
|
|
&err)) {
|
2019-07-02 20:18:16 +03:00
|
|
|
error_reportf_err(err, VFIO_MSG_PREFIX, vdev->vbasedev.name);
|
|
|
|
}
|
2012-09-26 21:19:32 +04:00
|
|
|
}
|
|
|
|
}
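The use and release paths above both come down to choosing which eventfd VFIO should signal. On use, a vector with a KVM route (virq >= 0) is wired to the KVM bypass eventfd, otherwise to the QEMU-handled one. On release, only a vector currently in bypass mode is switched back to the QEMU eventfd. A minimal sketch of that decision, using a hypothetical sketch_vector struct with plain integer fds in place of EventNotifier, could look like this:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the per-vector state. */
struct sketch_vector {
    int virq;             /* >= 0 when a KVM irqchip route exists */
    int interrupt_fd;     /* eventfd handled in QEMU userspace */
    int kvm_interrupt_fd; /* eventfd injected directly by KVM */
};

/* fd to hand to VFIO when the guest starts using the vector. */
static int sketch_use_fd(const struct sketch_vector *v)
{
    assert(v != NULL);
    return v->virq >= 0 ? v->kvm_interrupt_fd : v->interrupt_fd;
}

/*
 * fd to switch VFIO to when the guest masks the vector, or -1 when the
 * vector is already in QEMU mode and nothing needs to change.
 */
static int sketch_release_fd(const struct sketch_vector *v)
{
    assert(v != NULL);
    return v->virq >= 0 ? v->interrupt_fd : -1;
}
```

Returning -1 for "nothing to do" is a choice made for this sketch only; the original function simply skips the vfio_set_irq_signaling() call in that case.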
|
|
|
|
|
2022-03-26 09:02:26 +03:00
|
|
|
static void vfio_prepare_kvm_msi_virq_batch(VFIOPCIDevice *vdev)
|
|
|
|
{
|
|
|
|
assert(!vdev->defer_kvm_irq_routing);
|
|
|
|
vdev->defer_kvm_irq_routing = true;
|
|
|
|
vfio_route_change = kvm_irqchip_begin_route_changes(kvm_state);
|
|
|
|
}
|
|
|
|
|
|
|
|
static void vfio_commit_kvm_msi_virq_batch(VFIOPCIDevice *vdev)
|
|
|
|
{
|
|
|
|
int i;
|
|
|
|
|
|
|
|
assert(vdev->defer_kvm_irq_routing);
|
|
|
|
vdev->defer_kvm_irq_routing = false;
|
|
|
|
|
|
|
|
kvm_irqchip_commit_route_changes(&vfio_route_change);
|
|
|
|
|
|
|
|
for (i = 0; i < vdev->nr_vectors; i++) {
|
|
|
|
vfio_connect_kvm_msi_virq(&vdev->msi_vectors[i]);
|
|
|
|
}
|
|
|
|
}
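The prepare/commit pair above brackets a run of route additions so that the routing table is committed once instead of once per vector. The following sketch models only that bookkeeping. The sketch_dev struct and its counters are hypothetical and stand in for the KVM route-change state; no real KVM call is made.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a device that can defer route commits. */
struct sketch_dev {
    bool defer_routing;  /* plays the role of defer_kvm_irq_routing */
    int pending_routes;  /* routes added but not yet committed */
    int routes;          /* routes committed so far */
    int commits;         /* number of (expensive) commit operations */
};

/* Push all pending routes in one commit. */
static void sketch_commit(struct sketch_dev *d)
{
    d->routes += d->pending_routes;
    d->pending_routes = 0;
    d->commits++;
}

/* Add one route; commit immediately unless a batch is open. */
static void sketch_add_route(struct sketch_dev *d)
{
    d->pending_routes++;
    if (!d->defer_routing) {
        sketch_commit(d);
    }
}

/* Open a batch; nesting is treated as a programming error. */
static void sketch_batch_begin(struct sketch_dev *d)
{
    assert(!d->defer_routing);
    d->defer_routing = true;
}

/* Close the batch and commit everything added since it was opened. */
static void sketch_batch_commit(struct sketch_dev *d)
{
    assert(d->defer_routing);
    d->defer_routing = false;
    sketch_commit(d);
}
```

Adding eight routes without a batch costs eight commits in this model; wrapping the same eight additions in sketch_batch_begin() and sketch_batch_commit() costs one.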
|
|
|
|
|
2015-09-23 22:04:43 +03:00
|
|
|
static void vfio_msix_enable(VFIOPCIDevice *vdev)
|
2012-10-08 18:45:29 +04:00
|
|
|
{
|
2023-09-26 05:14:06 +03:00
|
|
|
int ret;
|
|
|
|
|
2012-10-08 18:45:29 +04:00
|
|
|
vfio_disable_interrupts(vdev);
|
|
|
|
|
2015-11-10 22:11:08 +03:00
|
|
|
vdev->msi_vectors = g_new0(VFIOMSIVector, vdev->msix->entries);
|
2012-10-08 18:45:29 +04:00
|
|
|
|
|
|
|
vdev->interrupt = VFIO_INT_MSIX;
|
|
|
|
|
2013-01-09 01:09:03 +04:00
|
|
|
/*
|
2022-03-26 09:02:26 +03:00
|
|
|
* Setting vector notifiers triggers synchronous vector-use
|
|
|
|
* callbacks for each active vector. Committing the KVM
|
|
|
|
* routes once rather than per vector provides a substantial
|
|
|
|
* performance improvement.
|
2013-01-09 01:09:03 +04:00
|
|
|
*/
|
2022-03-26 09:02:26 +03:00
|
|
|
vfio_prepare_kvm_msi_virq_batch(vdev);
|
2013-01-09 01:09:03 +04:00
|
|
|
|
2022-03-26 09:02:25 +03:00
|
|
|
if (msix_set_vector_notifiers(&vdev->pdev, vfio_msix_vector_use,
|
2012-12-12 18:10:02 +04:00
|
|
|
vfio_msix_vector_release, NULL)) {
|
error: Strip trailing '\n' from error string arguments (again)
Commit 6daf194d and be62a2eb got rid of a bunch, but they keep coming
back. Tracked down with this Coccinelle semantic patch:
@r@
expression err, eno, cls, fmt;
position p;
@@
(
error_report(fmt, ...)@p
|
error_set(err, cls, fmt, ...)@p
|
error_set_errno(err, eno, cls, fmt, ...)@p
|
error_setg(err, fmt, ...)@p
|
error_setg_errno(err, eno, fmt, ...)@p
)
@script:python@
fmt << r.fmt;
p << r.p;
@@
if "\\n" in str(fmt):
print "%s:%s:%s:%s" % (p[0].file, p[0].line, p[0].column, fmt)
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-id: 1360354939-10994-4-git-send-email-armbru@redhat.com
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2013-02-09 00:22:16 +04:00
|
|
|
error_report("vfio: msix_set_vector_notifiers failed");
|
2012-10-08 18:45:29 +04:00
|
|
|
}
|
|
|
|
|
2022-03-26 09:02:26 +03:00
|
|
|
vfio_commit_kvm_msi_virq_batch(vdev);
|
|
|
|
|
|
|
|
if (vdev->nr_vectors) {
|
|
|
|
ret = vfio_enable_vectors(vdev, true);
|
|
|
|
if (ret) {
|
|
|
|
error_report("vfio: failed to enable vectors, %d", ret);
|
|
|
|
}
|
|
|
|
} else {
|
|
|
|
/*
|
|
|
|
* Some communication channels between VF & PF or PF & fw rely on the
|
|
|
|
* physical state of the device and expect that enabling MSI-X from the
|
|
|
|
* guest enables the same on the host. When our guest is Linux, the
|
|
|
|
* guest driver call to pci_enable_msix() sets the enabling bit in the
|
|
|
|
* MSI-X capability, but leaves the vector table masked. We therefore
|
|
|
|
* can't rely on a vector_use callback (from request_irq() in the guest)
|
|
|
|
* to switch the physical device into MSI-X mode because that may come a
|
2023-09-26 05:14:06 +03:00
|
|
|
* long time after pci_enable_msix(). This code sets vector 0 with an
|
|
|
|
* invalid fd to make the physical device MSI-X enabled, but with no
|
|
|
|
* vectors enabled, just like the guest view.
|
2022-03-26 09:02:26 +03:00
|
|
|
*/
|
2023-09-26 05:14:06 +03:00
|
|
|
ret = vfio_enable_msix_no_vec(vdev);
|
|
|
|
if (ret) {
|
|
|
|
error_report("vfio: failed to enable MSI-X, %d", ret);
|
|
|
|
}
|
2022-03-26 09:02:26 +03:00
|
|
|
}
|
|
|
|
|
2015-09-23 22:04:43 +03:00
|
|
|
trace_vfio_msix_enable(vdev->vbasedev.name);
}

static void vfio_msi_enable(VFIOPCIDevice *vdev)
{
    int ret, i;

    vfio_disable_interrupts(vdev);

    vdev->nr_vectors = msi_nr_vectors_allocated(&vdev->pdev);
retry:
    /*
     * Setting vector notifiers needs to enable route for each vector.
     * Deferring to commit the KVM routes once rather than per vector
     * provides a substantial performance improvement.
     */
    vfio_prepare_kvm_msi_virq_batch(vdev);

    vdev->msi_vectors = g_new0(VFIOMSIVector, vdev->nr_vectors);

    for (i = 0; i < vdev->nr_vectors; i++) {
        VFIOMSIVector *vector = &vdev->msi_vectors[i];

        vector->vdev = vdev;
        vector->virq = -1;
        vector->use = true;

        if (event_notifier_init(&vector->interrupt, 0)) {
            error_report("vfio: Error: event_notifier_init failed");
        }

        qemu_set_fd_handler(event_notifier_get_fd(&vector->interrupt),
                            vfio_msi_interrupt, NULL, vector);

        /*
         * Attempt to enable route through KVM irqchip,
         * default to userspace handling if unavailable.
         */
        vfio_add_kvm_msi_virq(vdev, vector, i, false);
    }

    vfio_commit_kvm_msi_virq_batch(vdev);

    /* Set interrupt type prior to possible interrupts */
    vdev->interrupt = VFIO_INT_MSI;

    ret = vfio_enable_vectors(vdev, false);
    if (ret) {
        if (ret < 0) {
            error_report("vfio: Error: Failed to setup MSI fds: %m");
        } else {
            error_report("vfio: Error: Failed to enable %d "
                         "MSI vectors, retry with %d", vdev->nr_vectors, ret);
        }

        vfio_msi_disable_common(vdev);

        if (ret > 0) {
            vdev->nr_vectors = ret;
            goto retry;
        }

        /*
         * Failing to setup MSI doesn't really fall within any specification.
         * Let's try leaving interrupts disabled and hope the guest figures
         * out to fall back to INTx for this device.
         */
        error_report("vfio: Error: Failed to enable MSI");

        return;
    }

    trace_vfio_msi_enable(vdev->vbasedev.name, vdev->nr_vectors);
}

static void vfio_msi_disable_common(VFIOPCIDevice *vdev)
{
    int i;

    for (i = 0; i < vdev->nr_vectors; i++) {
        VFIOMSIVector *vector = &vdev->msi_vectors[i];
        if (vdev->msi_vectors[i].use) {
            if (vector->virq >= 0) {
                vfio_remove_kvm_msi_virq(vector);
            }
            qemu_set_fd_handler(event_notifier_get_fd(&vector->interrupt),
                                NULL, NULL, NULL);
            event_notifier_cleanup(&vector->interrupt);
        }
    }

    g_free(vdev->msi_vectors);
    vdev->msi_vectors = NULL;
    vdev->nr_vectors = 0;
    vdev->interrupt = VFIO_INT_NONE;
}

static void vfio_msix_disable(VFIOPCIDevice *vdev)
{
    Error *err = NULL;
    int i;

    msix_unset_vector_notifiers(&vdev->pdev);

    /*
     * MSI-X will only release vectors if MSI-X is still enabled on the
     * device, check through the rest and release it ourselves if necessary.
     */
    for (i = 0; i < vdev->nr_vectors; i++) {
        if (vdev->msi_vectors[i].use) {
            vfio_msix_vector_release(&vdev->pdev, i);
            msix_vector_unuse(&vdev->pdev, i);
        }
    }

    /*
     * Always clear MSI-X IRQ index. A PF device could have enabled
     * MSI-X with no vectors. See vfio_msix_enable().
     */
    vfio_disable_irqindex(&vdev->vbasedev, VFIO_PCI_MSIX_IRQ_INDEX);

    vfio_msi_disable_common(vdev);
    if (!vfio_intx_enable(vdev, &err)) {
        error_reportf_err(err, VFIO_MSG_PREFIX, vdev->vbasedev.name);
    }

    memset(vdev->msix->pending, 0,
           BITS_TO_LONGS(vdev->msix->entries) * sizeof(unsigned long));

    trace_vfio_msix_disable(vdev->vbasedev.name);
}

static void vfio_msi_disable(VFIOPCIDevice *vdev)
{
    Error *err = NULL;

    vfio_disable_irqindex(&vdev->vbasedev, VFIO_PCI_MSI_IRQ_INDEX);
    vfio_msi_disable_common(vdev);
    vfio_intx_enable(vdev, &err);
    if (err) {
        error_reportf_err(err, VFIO_MSG_PREFIX, vdev->vbasedev.name);
    }

    trace_vfio_msi_disable(vdev->vbasedev.name);
|
2012-09-26 21:19:32 +04:00
|
|
|
}
|
|
|
|
|
2014-12-20 01:24:15 +03:00
|
|
|
static void vfio_update_msi(VFIOPCIDevice *vdev)
|
2013-10-02 22:52:38 +04:00
|
|
|
{
|
|
|
|
int i;
|
|
|
|
|
|
|
|
for (i = 0; i < vdev->nr_vectors; i++) {
|
|
|
|
VFIOMSIVector *vector = &vdev->msi_vectors[i];
|
|
|
|
MSIMessage msg;
|
|
|
|
|
|
|
|
if (!vector->use || vector->virq < 0) {
|
|
|
|
continue;
|
|
|
|
}
|
|
|
|
|
|
|
|
msg = msi_get_message(&vdev->pdev, i);
|
2015-10-15 16:44:52 +03:00
|
|
|
vfio_update_kvm_msi_virq(vector, msg, &vdev->pdev);
|
2013-10-02 22:52:38 +04:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2014-12-20 01:24:15 +03:00
|
|
|
static void vfio_pci_load_rom(VFIOPCIDevice *vdev)
|
2013-10-02 22:52:38 +04:00
|
|
|
{
|
2024-05-22 07:40:12 +03:00
|
|
|
g_autofree struct vfio_region_info *reg_info = NULL;
|
2013-10-02 22:52:38 +04:00
|
|
|
uint64_t size;
|
|
|
|
off_t off = 0;
|
2015-07-06 21:15:12 +03:00
|
|
|
ssize_t bytes;
|
2013-10-02 22:52:38 +04:00
|
|
|
|
2016-03-10 19:39:07 +03:00
|
|
|
if (vfio_get_region_info(&vdev->vbasedev,
|
|
|
|
VFIO_PCI_ROM_REGION_INDEX, &reg_info)) {
|
2013-10-02 22:52:38 +04:00
|
|
|
error_report("vfio: Error getting ROM info: %m");
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
2016-03-10 19:39:07 +03:00
|
|
|
trace_vfio_pci_load_rom(vdev->vbasedev.name, (unsigned long)reg_info->size,
|
|
|
|
(unsigned long)reg_info->offset,
|
|
|
|
(unsigned long)reg_info->flags);
|
|
|
|
|
|
|
|
vdev->rom_size = size = reg_info->size;
|
|
|
|
vdev->rom_offset = reg_info->offset;
|
2013-10-02 22:52:38 +04:00
|
|
|
|
|
|
|
if (!vdev->rom_size) {
|
2014-01-15 21:11:52 +04:00
|
|
|
vdev->rom_read_failed = true;
|
2014-01-15 21:11:06 +04:00
|
|
|
error_report("vfio-pci: Cannot read device rom at "
|
2014-12-22 19:54:49 +03:00
|
|
|
"%s", vdev->vbasedev.name);
|
2014-01-15 21:11:06 +04:00
|
|
|
error_printf("Device option ROM contents are probably invalid "
|
|
|
|
"(check dmesg).\nSkip option ROM probe with rombar=0, "
|
|
|
|
"or load from file with romfile=\n");
|
2013-10-02 22:52:38 +04:00
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
|
|
|
vdev->rom = g_malloc(size);
|
|
|
|
memset(vdev->rom, 0xff, size);
|
|
|
|
|
|
|
|
while (size) {
|
2014-12-20 01:24:31 +03:00
|
|
|
bytes = pread(vdev->vbasedev.fd, vdev->rom + off,
|
|
|
|
size, vdev->rom_offset + off);
|
2013-10-02 22:52:38 +04:00
|
|
|
if (bytes == 0) {
|
|
|
|
break;
|
|
|
|
} else if (bytes > 0) {
|
|
|
|
off += bytes;
|
|
|
|
size -= bytes;
|
|
|
|
} else {
|
|
|
|
if (errno == EINTR || errno == EAGAIN) {
|
|
|
|
continue;
|
|
|
|
}
|
|
|
|
error_report("vfio: Error reading device ROM: %m");
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
}
|
2016-03-10 19:39:08 +03:00
|
|
|
|
|
|
|
/*
 * Test the ROM signature against our device. If the vendor is correct
 * but the device ID doesn't match, store the correct device ID and
 * recompute the checksum. Intel IGD devices need this and are known
 * to have bogus checksums, so we can't simply adjust the checksum.
 */
|
|
|
|
if (pci_get_word(vdev->rom) == 0xaa55 &&
|
|
|
|
pci_get_word(vdev->rom + 0x18) + 8 < vdev->rom_size &&
|
|
|
|
!memcmp(vdev->rom + pci_get_word(vdev->rom + 0x18), "PCIR", 4)) {
|
|
|
|
uint16_t vid, did;
|
|
|
|
|
|
|
|
vid = pci_get_word(vdev->rom + pci_get_word(vdev->rom + 0x18) + 4);
|
|
|
|
did = pci_get_word(vdev->rom + pci_get_word(vdev->rom + 0x18) + 6);
|
|
|
|
|
|
|
|
if (vid == vdev->vendor_id && did != vdev->device_id) {
|
|
|
|
int i;
|
|
|
|
uint8_t csum, *data = vdev->rom;
|
|
|
|
|
|
|
|
pci_set_word(vdev->rom + pci_get_word(vdev->rom + 0x18) + 6,
|
|
|
|
vdev->device_id);
|
|
|
|
data[6] = 0;
|
|
|
|
|
|
|
|
for (csum = 0, i = 0; i < vdev->rom_size; i++) {
|
|
|
|
csum += data[i];
|
|
|
|
}
|
|
|
|
|
|
|
|
data[6] = -csum;
|
|
|
|
}
|
|
|
|
}
|
2013-10-02 22:52:38 +04:00
|
|
|
}
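/*
 * Illustrative sketch only, not part of the driver: the device ID fixup
 * at the end of vfio_pci_load_rom() can be expressed against a plain byte
 * buffer. The helpers rd16(), wr16() and rom_fixup_device_id() are
 * hypothetical names; rd16()/wr16() stand in for pci_get_word() and
 * pci_set_word(). As in the code above, byte 6 of the image is treated as
 * the checksum byte and is chosen so that all bytes sum to zero modulo 256.
 */

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a 16-bit little-endian value (stand-in for pci_get_word). */
static uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Write a 16-bit little-endian value (stand-in for pci_set_word). */
static void wr16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v & 0xff);
    p[1] = (uint8_t)(v >> 8);
}

/*
 * Returns 1 if the device ID was patched and the checksum recomputed,
 * 0 if the image was left untouched.
 */
static int rom_fixup_device_id(uint8_t *rom, size_t size,
                               uint16_t vendor_id, uint16_t device_id)
{
    uint16_t pcir;
    uint8_t csum = 0;
    size_t i;

    assert(rom != NULL);

    /* Need the 0xaa55 signature and the PCIR pointer at offset 0x18. */
    if (size < 0x1a || rd16(rom) != 0xaa55) {
        return 0;
    }
    pcir = rd16(rom + 0x18);
    if ((size_t)pcir + 8 >= size || memcmp(rom + pcir, "PCIR", 4) != 0) {
        return 0;
    }
    /* Only patch when the vendor matches and the device ID differs. */
    if (rd16(rom + pcir + 4) != vendor_id ||
        rd16(rom + pcir + 6) == device_id) {
        return 0;
    }

    wr16(rom + pcir + 6, device_id);

    /* Zero the checksum byte, sum the image, then store the negation. */
    rom[6] = 0;
    for (i = 0; i < size; i++) {
        csum += rom[i];
    }
    rom[6] = (uint8_t)-csum;
    return 1;
}
```

/*
 * Recomputing from scratch, rather than adjusting the old checksum by the
 * difference in device ID bytes, gives a valid result even when the
 * original checksum was already wrong.
 */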
|
|
|
|
|
|
|
|
static uint64_t vfio_rom_read(void *opaque, hwaddr addr, unsigned size)
|
|
|
|
{
|
2014-12-20 01:24:15 +03:00
|
|
|
VFIOPCIDevice *vdev = opaque;
|
2014-09-23 01:27:43 +04:00
|
|
|
union {
|
|
|
|
uint8_t byte;
|
|
|
|
uint16_t word;
|
|
|
|
uint32_t dword;
|
|
|
|
uint64_t qword;
|
|
|
|
} val;
|
|
|
|
uint64_t data = 0;
|
2013-10-02 22:52:38 +04:00
|
|
|
|
|
|
|
/* Load the ROM lazily when the guest tries to read it */
|
2014-03-25 18:24:20 +04:00
|
|
|
if (unlikely(!vdev->rom && !vdev->rom_read_failed)) {
|
2013-10-02 22:52:38 +04:00
|
|
|
vfio_pci_load_rom(vdev);
|
|
|
|
}
|
|
|
|
|
2014-09-23 01:26:36 +04:00
|
|
|
memcpy(&val, vdev->rom + addr,
|
2013-10-02 22:52:38 +04:00
|
|
|
(addr < vdev->rom_size) ? MIN(size, vdev->rom_size - addr) : 0);
|
|
|
|
|
2014-09-23 01:27:43 +04:00
|
|
|
switch (size) {
|
|
|
|
case 1:
|
|
|
|
data = val.byte;
|
|
|
|
break;
|
|
|
|
case 2:
|
|
|
|
data = le16_to_cpu(val.word);
|
|
|
|
break;
|
|
|
|
case 4:
|
|
|
|
data = le32_to_cpu(val.dword);
|
|
|
|
break;
|
|
|
|
default:
|
|
|
|
hw_error("vfio: unsupported read size, %d bytes\n", size);
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
|
2014-12-22 19:54:49 +03:00
|
|
|
trace_vfio_rom_read(vdev->vbasedev.name, addr, size, data);
|
2013-10-02 22:52:38 +04:00
|
|
|
|
2014-09-23 01:27:43 +04:00
|
|
|
return data;
|
2013-10-02 22:52:38 +04:00
|
|
|
}
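/*
 * Illustrative sketch only, not part of the driver: a bounded
 * little-endian read from a ROM shadow buffer, in the spirit of
 * vfio_rom_read(). rom_read_le() is a hypothetical helper. Unlike the
 * function above, it pre-fills the scratch buffer with 0xff so that bytes
 * past the end of the image read as all ones; that fill value is an
 * assumption of this sketch, not something the driver guarantees.
 */

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint64_t rom_read_le(const uint8_t *rom, size_t rom_size,
                            size_t addr, unsigned size)
{
    uint8_t buf[4] = { 0xff, 0xff, 0xff, 0xff };
    uint64_t data = 0;
    unsigned i;

    assert(size == 1 || size == 2 || size == 4);

    /* Copy only the bytes that actually lie inside the image. */
    if (addr < rom_size) {
        size_t n = rom_size - addr < size ? rom_size - addr : size;
        memcpy(buf, rom + addr, n);
    }

    /* Assemble the value byte by byte, least significant byte first. */
    for (i = 0; i < size; i++) {
        data |= (uint64_t)buf[i] << (8 * i);
    }
    return data;
}
```

/*
 * Assembling the value with shifts avoids any dependence on host byte
 * order, which the union plus le16_to_cpu()/le32_to_cpu() approach above
 * handles explicitly.
 */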
|
|
|
|
|
2013-10-04 18:51:36 +04:00
|
|
|
static void vfio_rom_write(void *opaque, hwaddr addr,
|
|
|
|
uint64_t data, unsigned size)
|
|
|
|
{
|
|
|
|
}
|
|
|
|
|
2013-10-02 22:52:38 +04:00
|
|
|
static const MemoryRegionOps vfio_rom_ops = {
|
|
|
|
.read = vfio_rom_read,
|
2013-10-04 18:51:36 +04:00
|
|
|
.write = vfio_rom_write,
|
2014-09-23 01:26:36 +04:00
|
|
|
.endianness = DEVICE_LITTLE_ENDIAN,
|
2013-10-02 22:52:38 +04:00
|
|
|
};
|
|
|
|
|
2014-12-20 01:24:15 +03:00
|
|
|
static void vfio_pci_size_rom(VFIOPCIDevice *vdev)
|
2013-10-02 22:52:38 +04:00
|
|
|
{
|
2013-10-04 22:50:51 +04:00
|
|
|
uint32_t orig, size = cpu_to_le32((uint32_t)PCI_ROM_ADDRESS_MASK);
|
2013-10-02 22:52:38 +04:00
|
|
|
off_t offset = vdev->config_offset + PCI_ROM_ADDRESS;
|
2014-02-26 21:33:45 +04:00
|
|
|
DeviceState *dev = DEVICE(vdev);
|
2016-03-10 19:39:09 +03:00
|
|
|
char *name;
|
2014-12-20 01:24:31 +03:00
|
|
|
int fd = vdev->vbasedev.fd;
|
2013-10-02 22:52:38 +04:00
|
|
|
|
|
|
|
if (vdev->pdev.romfile || !vdev->pdev.rom_bar) {
|
2014-02-26 21:33:45 +04:00
|
|
|
/* Since pci handles romfile, just print a message and return */
|
2021-02-05 20:18:17 +03:00
|
|
|
if (vfio_opt_rom_in_denylist(vdev) && vdev->pdev.romfile) {
|
2019-04-17 22:06:33 +03:00
|
|
|
warn_report("Device at %s is known to cause system instability"
|
|
|
|
" issues during option rom execution",
|
|
|
|
vdev->vbasedev.name);
|
|
|
|
error_printf("Proceeding anyway since user specified romfile\n");
|
2014-02-26 21:33:45 +04:00
|
|
|
}
|
2013-10-02 22:52:38 +04:00
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Use the same size ROM BAR as the physical device. The contents
|
|
|
|
* will get filled in later when the guest tries to read it.
|
|
|
|
*/
|
2014-12-20 01:24:31 +03:00
|
|
|
if (pread(fd, &orig, 4, offset) != 4 ||
|
|
|
|
pwrite(fd, &size, 4, offset) != 4 ||
|
|
|
|
pread(fd, &size, 4, offset) != 4 ||
|
|
|
|
pwrite(fd, &orig, 4, offset) != 4) {
|
vfio: Add sysfsdev property for pci & platform
vfio-pci currently requires a host= parameter, which comes in the
form of a PCI address in [domain:]<bus:slot.function> notation. We
expect to find a matching entry in sysfs for that under
/sys/bus/pci/devices/. vfio-platform takes a similar approach, but
defines the host= parameter to be a string, which can be matched
directly under /sys/bus/platform/devices/. On the PCI side, we have
some interest in using vfio to expose vGPU devices. These are not
actual discrete PCI devices, so they don't have a compatible host PCI
bus address or a device link where QEMU wants to look for it. There's
also really no requirement that vfio can only be used to expose
physical devices, a new vfio bus and iommu driver could expose a
completely emulated device. To fit within the vfio framework, it
would need a kernel struct device and associated IOMMU group, but
those are easy constraints to manage.
To support such devices, which would include vGPUs, that honor the
VFIO PCI programming API, but are not necessarily backed by a unique
PCI address, add support for specifying any device in sysfs. The
vfio API already has support for probing the device type to ensure
compatibility with either vfio-pci or vfio-platform.
With this, a vfio-pci device could either be specified as:
-device vfio-pci,host=02:00.0
or
-device vfio-pci,sysfsdev=/sys/devices/pci0000:00/0000:00:1c.0/0000:02:00.0
or even
-device vfio-pci,sysfsdev=/sys/bus/pci/devices/0000:02:00.0
When vGPU support comes along, this might look something more like:
-device vfio-pci,sysfsdev=/sys/devices/virtual/intel-vgpu/vgpu0@0000:00:02.0
NB - This is only a made up example path
The same change is made for vfio-platform, specifying sysfsdev has
precedence over the old host option.
Tested-by: Eric Auger <eric.auger@linaro.org>
Reviewed-by: Eric Auger <eric.auger@linaro.org>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-03-10 19:39:07 +03:00
|
|
|
error_report("%s(%s) failed: %m", __func__, vdev->vbasedev.name);
|
2013-10-02 22:52:38 +04:00
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
2013-10-04 22:50:51 +04:00
|
|
|
size = ~(le32_to_cpu(size) & PCI_ROM_ADDRESS_MASK) + 1;
|
2013-10-02 22:52:38 +04:00
|
|
|
|
|
|
|
if (!size) {
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
2021-02-05 20:18:17 +03:00
|
|
|
if (vfio_opt_rom_in_denylist(vdev)) {
|
2021-10-08 16:34:41 +03:00
|
|
|
if (dev->opts && qdict_haskey(dev->opts, "rombar")) {
|
2019-04-17 22:06:33 +03:00
|
|
|
warn_report("Device at %s is known to cause system instability"
|
|
|
|
" issues during option rom execution",
|
|
|
|
vdev->vbasedev.name);
|
|
|
|
error_printf("Proceeding anyway since user specified"
|
|
|
|
" non zero value for rombar\n");
|
2014-02-26 21:33:45 +04:00
|
|
|
} else {
|
2019-04-17 22:06:33 +03:00
|
|
|
warn_report("Rom loading for device at %s has been disabled"
|
|
|
|
" due to system instability issues",
|
|
|
|
vdev->vbasedev.name);
|
|
|
|
error_printf("Specify rombar=1 or romfile to force\n");
|
2014-02-26 21:33:45 +04:00
|
|
|
return;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2014-12-22 19:54:49 +03:00
|
|
|
trace_vfio_pci_size_rom(vdev->vbasedev.name, size);
|
2013-10-02 22:52:38 +04:00
|
|
|
|
2016-03-10 19:39:09 +03:00
|
|
|
name = g_strdup_printf("vfio[%s].rom", vdev->vbasedev.name);
|
2013-10-02 22:52:38 +04:00
|
|
|
|
|
|
|
memory_region_init_io(&vdev->pdev.rom, OBJECT(vdev),
|
|
|
|
&vfio_rom_ops, vdev, name, size);
|
2016-03-10 19:39:09 +03:00
|
|
|
g_free(name);
|
2013-10-02 22:52:38 +04:00
|
|
|
|
|
|
|
pci_register_bar(&vdev->pdev, PCI_ROM_SLOT,
|
|
|
|
PCI_BASE_ADDRESS_SPACE_MEMORY, &vdev->pdev.rom);
|
|
|
|
|
2014-01-15 21:11:52 +04:00
|
|
|
vdev->rom_read_failed = false;
|
2013-10-02 22:52:38 +04:00
|
|
|
}
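/*
 * Illustrative sketch only, not part of the driver: the size decode used
 * by vfio_pci_size_rom(). After all ones are written to the ROM BAR, the
 * device returns zeros in the address bits it does not implement, so the
 * two's complement of the masked readback is the BAR size.
 * ROM_ADDRESS_MASK, rom_bar_size() and rom_bar_size_example() are
 * hypothetical names; the mask value is assumed to match
 * PCI_ROM_ADDRESS_MASK, and the readback is assumed to be in host byte
 * order already.
 */

```c
#include <assert.h>
#include <stdint.h>

/* Assumed equal to PCI_ROM_ADDRESS_MASK (~0x7ff truncated to 32 bits). */
#define ROM_ADDRESS_MASK 0xfffff800U

/* Decode the value read back after writing all ones to the ROM BAR. */
static uint32_t rom_bar_size(uint32_t readback)
{
    return ~(readback & ROM_ADDRESS_MASK) + 1;
}

/* Usage: a device decoding 128 KiB hardwires the low 17 address bits to 0. */
void rom_bar_size_example(void)
{
    assert(rom_bar_size(0xfffe0000U) == 0x20000U);
    /* The enable bit (bit 0) is masked off before the size is computed. */
    assert(rom_bar_size(0xfffe0001U) == 0x20000U);
    /* An unimplemented ROM BAR reads back as zero and yields size 0. */
    assert(rom_bar_size(0) == 0);
}
```

/*
 * The zero result for an unimplemented BAR relies on unsigned 32-bit
 * wraparound, which is why the function above can test "if (!size)".
 */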
|
|
|
|
|
2015-09-23 22:04:45 +03:00
|
|
|
void vfio_vga_write(void *opaque, hwaddr addr,
|
2013-04-01 23:33:44 +04:00
|
|
|
uint64_t data, unsigned size)
|
|
|
|
{
|
|
|
|
VFIOVGARegion *region = opaque;
|
|
|
|
VFIOVGA *vga = container_of(region, VFIOVGA, region[region->nr]);
|
|
|
|
union {
|
|
|
|
uint8_t byte;
|
|
|
|
uint16_t word;
|
|
|
|
uint32_t dword;
|
|
|
|
uint64_t qword;
|
|
|
|
} buf;
|
|
|
|
off_t offset = vga->fd_offset + region->offset + addr;
|
|
|
|
|
|
|
|
switch (size) {
|
|
|
|
case 1:
|
|
|
|
buf.byte = data;
|
|
|
|
break;
|
|
|
|
case 2:
|
|
|
|
buf.word = cpu_to_le16(data);
|
|
|
|
break;
|
|
|
|
case 4:
|
|
|
|
buf.dword = cpu_to_le32(data);
|
|
|
|
break;
|
|
|
|
default:
|
2014-03-25 22:08:52 +04:00
|
|
|
hw_error("vfio: unsupported write size, %d bytes", size);
|
2013-04-01 23:33:44 +04:00
|
|
|
break;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (pwrite(vga->fd, &buf, size, offset) != size) {
|
|
|
|
error_report("%s(,0x%"HWADDR_PRIx", 0x%"PRIx64", %d) failed: %m",
|
|
|
|
__func__, region->offset + addr, data, size);
|
|
|
|
}
|
|
|
|
|
2014-12-20 00:40:06 +03:00
|
|
|
trace_vfio_vga_write(region->offset + addr, data, size);
|
2013-04-01 23:33:44 +04:00
|
|
|
}
|
|
|
|
|
2015-09-23 22:04:45 +03:00
|
|
|
uint64_t vfio_vga_read(void *opaque, hwaddr addr, unsigned size)
|
2013-04-01 23:33:44 +04:00
|
|
|
{
|
|
|
|
VFIOVGARegion *region = opaque;
|
|
|
|
VFIOVGA *vga = container_of(region, VFIOVGA, region[region->nr]);
|
|
|
|
union {
|
|
|
|
uint8_t byte;
|
|
|
|
uint16_t word;
|
|
|
|
uint32_t dword;
|
|
|
|
uint64_t qword;
|
|
|
|
} buf;
|
|
|
|
uint64_t data = 0;
|
|
|
|
off_t offset = vga->fd_offset + region->offset + addr;
|
|
|
|
|
|
|
|
if (pread(vga->fd, &buf, size, offset) != size) {
|
|
|
|
error_report("%s(,0x%"HWADDR_PRIx", %d) failed: %m",
|
|
|
|
__func__, region->offset + addr, size);
|
|
|
|
return (uint64_t)-1;
|
|
|
|
}
|
|
|
|
|
|
|
|
switch (size) {
|
|
|
|
case 1:
|
|
|
|
data = buf.byte;
|
|
|
|
break;
|
|
|
|
case 2:
|
|
|
|
data = le16_to_cpu(buf.word);
|
|
|
|
break;
|
|
|
|
case 4:
|
|
|
|
data = le32_to_cpu(buf.dword);
|
|
|
|
break;
|
|
|
|
default:
|
2014-03-25 22:08:52 +04:00
|
|
|
hw_error("vfio: unsupported read size, %d bytes", size);
|
2013-04-01 23:33:44 +04:00
|
|
|
break;
|
|
|
|
}
|
|
|
|
|
2014-12-20 00:40:06 +03:00
|
|
|
trace_vfio_vga_read(region->offset + addr, size, data);
|
2013-04-01 23:33:44 +04:00
|
|
|
|
|
|
|
return data;
|
|
|
|
}
|
|
|
|
|
|
|
|
static const MemoryRegionOps vfio_vga_ops = {
|
|
|
|
.read = vfio_vga_read,
|
|
|
|
.write = vfio_vga_write,
|
|
|
|
.endianness = DEVICE_LITTLE_ENDIAN,
|
|
|
|
};
|
|
|
|
|
2016-10-31 18:53:04 +03:00
|
|
|
/*
 * Expand the memory region of a sub-page (size < PAGE_SIZE) MMIO BAR to a
 * full page when the BAR occupies an exclusive page on the host, so that
 * the BAR can be mapped into the guest. The sub-page BAR may not occupy
 * an exclusive page in the guest, however, so set the priority of the
 * expanded memory region to zero in case it overlaps other BARs that
 * share the same guest page. Also restore the original size of the
 * sub-page BAR when the guest changes its base address to one that is no
 * longer page aligned.
 */
|
|
|
|
static void vfio_sub_page_bar_update_mapping(PCIDevice *pdev, int bar)
|
|
|
|
{
|
2020-09-03 01:43:03 +03:00
|
|
|
VFIOPCIDevice *vdev = VFIO_PCI(pdev);
|
2016-10-31 18:53:04 +03:00
|
|
|
VFIORegion *region = &vdev->bars[bar].region;
|
2018-02-06 21:08:25 +03:00
|
|
|
MemoryRegion *mmap_mr, *region_mr, *base_mr;
|
2016-10-31 18:53:04 +03:00
|
|
|
PCIIORegion *r;
|
|
|
|
pcibus_t bar_addr;
|
|
|
|
uint64_t size = region->size;
|
|
|
|
|
|
|
|
/* Make sure that the whole region is allowed to be mmapped */
|
|
|
|
if (region->nr_mmaps != 1 || !region->mmaps[0].mmap ||
|
|
|
|
region->mmaps[0].size != region->size) {
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
|
|
|
r = &pdev->io_regions[bar];
|
|
|
|
bar_addr = r->addr;
|
2018-02-06 21:08:25 +03:00
|
|
|
base_mr = vdev->bars[bar].mr;
|
|
|
|
region_mr = region->mem;
|
2016-10-31 18:53:04 +03:00
|
|
|
mmap_mr = &region->mmaps[0].mem;
|
|
|
|
|
|
|
|
/* If BAR is mapped and page aligned, update to fill PAGE_SIZE */
|
|
|
|
if (bar_addr != PCI_BAR_UNMAPPED &&
|
2022-03-23 18:57:22 +03:00
|
|
|
!(bar_addr & ~qemu_real_host_page_mask())) {
|
|
|
|
size = qemu_real_host_page_size();
|
2016-10-31 18:53:04 +03:00
|
|
|
}
|
|
|
|
|
|
|
|
memory_region_transaction_begin();
|
|
|
|
|
2018-02-06 21:08:25 +03:00
|
|
|
if (vdev->bars[bar].size < size) {
|
|
|
|
memory_region_set_size(base_mr, size);
|
|
|
|
}
|
|
|
|
memory_region_set_size(region_mr, size);
|
2016-10-31 18:53:04 +03:00
|
|
|
memory_region_set_size(mmap_mr, size);
|
2018-02-06 21:08:25 +03:00
|
|
|
if (size != vdev->bars[bar].size && memory_region_is_mapped(base_mr)) {
|
|
|
|
memory_region_del_subregion(r->address_space, base_mr);
|
2016-10-31 18:53:04 +03:00
|
|
|
memory_region_add_subregion_overlap(r->address_space,
|
2018-02-06 21:08:25 +03:00
|
|
|
bar_addr, base_mr, 0);
|
2016-10-31 18:53:04 +03:00
|
|
|
}
|
|
|
|
|
|
|
|
memory_region_transaction_commit();
|
|
|
|
}
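/*
 * Illustrative sketch only, not part of the driver: the size decision
 * made by vfio_sub_page_bar_update_mapping(), isolated from the memory
 * API. BAR_UNMAPPED and sub_page_bar_size() are hypothetical names;
 * BAR_UNMAPPED is assumed to play the role of PCI_BAR_UNMAPPED, and the
 * page size is passed in instead of being queried from the host.
 */

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for PCI_BAR_UNMAPPED (assumed to be all ones). */
#define BAR_UNMAPPED UINT64_MAX

/*
 * Return the size the BAR's memory regions should have: a full page when
 * the guest has mapped the BAR at a page-aligned address, otherwise the
 * original region size.
 */
static uint64_t sub_page_bar_size(uint64_t bar_addr, uint64_t region_size,
                                  uint64_t page_size)
{
    /* The alignment test below requires a power-of-two page size. */
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);

    if (bar_addr != BAR_UNMAPPED && !(bar_addr & (page_size - 1))) {
        return page_size;
    }
    return region_size;
}
```

/*
 * For a 256-byte BAR with 4 KiB pages, a guest address of 0x10000 expands
 * the region to 4096 bytes, while 0x10100 or an unmapped BAR leaves it at
 * 256 bytes.
 */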
|
|
|
|
|
2012-09-26 21:19:32 +04:00
|
|
|
/*
|
|
|
|
* PCI config space
|
|
|
|
*/
|
2015-09-23 22:04:45 +03:00
|
|
|
uint32_t vfio_pci_read_config(PCIDevice *pdev, uint32_t addr, int len)
|
2012-09-26 21:19:32 +04:00
|
|
|
{
|
2020-09-03 01:43:03 +03:00
|
|
|
VFIOPCIDevice *vdev = VFIO_PCI(pdev);
|
2013-04-01 21:50:04 +04:00
|
|
|
uint32_t emu_bits = 0, emu_val = 0, phys_val = 0, val;
|
2012-09-26 21:19:32 +04:00
|
|
|
|
2013-04-01 21:50:04 +04:00
|
|
|
memcpy(&emu_bits, vdev->emulated_config_bits + addr, len);
|
|
|
|
emu_bits = le32_to_cpu(emu_bits);
|
2012-09-26 21:19:32 +04:00
|
|
|
|
2013-04-01 21:50:04 +04:00
|
|
|
if (emu_bits) {
|
|
|
|
emu_val = pci_default_read_config(pdev, addr, len);
|
|
|
|
}
|
|
|
|
|
|
|
|
if (~emu_bits & (0xffffffffU >> (32 - len * 8))) {
|
|
|
|
ssize_t ret;
|
|
|
|
|
2014-12-20 01:24:31 +03:00
|
|
|
ret = pread(vdev->vbasedev.fd, &phys_val, len,
|
|
|
|
vdev->config_offset + addr);
|
2013-04-01 21:50:04 +04:00
|
|
|
if (ret != len) {
|
2016-03-10 19:39:07 +03:00
|
|
|
error_report("%s(%s, 0x%x, 0x%x) failed: %m",
|
|
|
|
__func__, vdev->vbasedev.name, addr, len);
|
2012-09-26 21:19:32 +04:00
|
|
|
return -errno;
|
|
|
|
}
|
2013-04-01 21:50:04 +04:00
|
|
|
phys_val = le32_to_cpu(phys_val);
|
2012-09-26 21:19:32 +04:00
|
|
|
}
|
|
|
|
|
2013-04-01 21:50:04 +04:00
|
|
|
val = (emu_val & emu_bits) | (phys_val & ~emu_bits);
|
vfio: vfio-pci device assignment driver
This adds the core of the QEMU VFIO-based PCI device assignment driver.
To make use of this driver, enable CONFIG_VFIO, CONFIG_VFIO_IOMMU_TYPE1,
and CONFIG_VFIO_PCI in your host Linux kernel config. Load the vfio-pci
module. To assign device 0000:05:00.0 to a guest, do the following:
for dev in $(ls /sys/bus/pci/devices/0000:05:00.0/iommu_group/devices); do
vendor=$(cat /sys/bus/pci/devices/$dev/vendor)
device=$(cat /sys/bus/pci/devices/$dev/device)
if [ -e /sys/bus/pci/devices/$dev/driver ]; then
echo $dev > /sys/bus/pci/devices/$dev/driver/unbind
fi
echo $vendor $device > /sys/bus/pci/drivers/vfio-pci/new_id
done
See Documentation/vfio.txt in the Linux kernel tree for further
description of IOMMU groups and VFIO.
Then launch qemu including the option:
-device vfio-pci,host=0000:05:00.0
Legacy PCI interrupts (INTx) currently makes use of a kludge where we
trap BAR accesses and assume the access is in response to an interrupt,
therefore de-asserting and unmasking the interrupt. It's not quite as
targetted as using the EOI for this, but it's self contained and seems
to work across all architectures. The side-effect is a significant
performance slow-down for device in INTx mode. Some devices, like
graphics cards, don't really use their interrupt, so this can be turned
off with the x-intx=off option, which disables INTx alltogether. This
should be considered an experimental option until we refine this code.
Both MSI and MSI-X are supported and avoid these issues.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2012-09-26 21:19:32 +04:00
|
|
|
|
2014-12-22 19:54:49 +03:00
|
|
|
trace_vfio_pci_read_config(vdev->vbasedev.name, addr, len, val);
|
vfio: vfio-pci device assignment driver
This adds the core of the QEMU VFIO-based PCI device assignment driver.
To make use of this driver, enable CONFIG_VFIO, CONFIG_VFIO_IOMMU_TYPE1,
and CONFIG_VFIO_PCI in your host Linux kernel config. Load the vfio-pci
module. To assign device 0000:05:00.0 to a guest, do the following:
for dev in $(ls /sys/bus/pci/devices/0000:05:00.0/iommu_group/devices); do
vendor=$(cat /sys/bus/pci/devices/$dev/vendor)
device=$(cat /sys/bus/pci/devices/$dev/device)
if [ -e /sys/bus/pci/devices/$dev/driver ]; then
echo $dev > /sys/bus/pci/devices/$dev/driver/unbind
fi
echo $vendor $device > /sys/bus/pci/drivers/vfio-pci/new_id
done
See Documentation/vfio.txt in the Linux kernel tree for further
description of IOMMU groups and VFIO.
Then launch qemu including the option:
-device vfio-pci,host=0000:05:00.0
Legacy PCI interrupts (INTx) currently makes use of a kludge where we
trap BAR accesses and assume the access is in response to an interrupt,
therefore de-asserting and unmasking the interrupt. It's not quite as
targetted as using the EOI for this, but it's self contained and seems
to work across all architectures. The side-effect is a significant
performance slow-down for device in INTx mode. Some devices, like
graphics cards, don't really use their interrupt, so this can be turned
off with the x-intx=off option, which disables INTx alltogether. This
should be considered an experimental option until we refine this code.
Both MSI and MSI-X are supported and avoid these issues.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2012-09-26 21:19:32 +04:00
|
|
|
|
|
|
|
return val;
|
|
|
|
}
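
The read path above returns emulated bits from QEMU's view of config space and every other bit from the physical device. The merge step can be sketched on its own as follows; `merge_config_bits` is a hypothetical helper name used only for illustration and is not part of the driver.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not a QEMU function): combine an emulated value
 * and a physical value. Bits set in emu_bits come from emu_val; all
 * remaining bits come from phys_val.
 */
static uint32_t merge_config_bits(uint32_t emu_val, uint32_t emu_bits,
                                  uint32_t phys_val)
{
    return (emu_val & emu_bits) | (phys_val & ~emu_bits);
}
```

With `emu_bits` equal to zero the physical value passes through unchanged, and with all bits set only the emulated value is visible.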

void vfio_pci_write_config(PCIDevice *pdev,
                           uint32_t addr, uint32_t val, int len)
{
    VFIOPCIDevice *vdev = VFIO_PCI(pdev);
    uint32_t val_le = cpu_to_le32(val);

    trace_vfio_pci_write_config(vdev->vbasedev.name, addr, val, len);

    /* Write everything to VFIO, let it filter out what we can't write */
    if (pwrite(vdev->vbasedev.fd, &val_le, len, vdev->config_offset + addr)
                != len) {
vfio: Add sysfsdev property for pci & platform
vfio-pci currently requires a host= parameter, which comes in the
form of a PCI address in [domain:]<bus:slot.function> notation. We
expect to find a matching entry in sysfs for that under
/sys/bus/pci/devices/. vfio-platform takes a similar approach, but
defines the host= parameter to be a string, which can be matched
directly under /sys/bus/platform/devices/. On the PCI side, we have
some interest in using vfio to expose vGPU devices. These are not
actual discrete PCI devices, so they don't have a compatible host PCI
bus address or a device link where QEMU wants to look for it. There's
also really no requirement that vfio can only be used to expose
physical devices, a new vfio bus and iommu driver could expose a
completely emulated device. To fit within the vfio framework, it
would need a kernel struct device and associated IOMMU group, but
those are easy constraints to manage.
To support such devices, which would include vGPUs, that honor the
VFIO PCI programming API, but are not necessarily backed by a unique
PCI address, add support for specifying any device in sysfs. The
vfio API already has support for probing the device type to ensure
compatibility with either vfio-pci or vfio-platform.
With this, a vfio-pci device could either be specified as:
-device vfio-pci,host=02:00.0
or
-device vfio-pci,sysfsdev=/sys/devices/pci0000:00/0000:00:1c.0/0000:02:00.0
or even
-device vfio-pci,sysfsdev=/sys/bus/pci/devices/0000:02:00.0
When vGPU support comes along, this might look something more like:
-device vfio-pci,sysfsdev=/sys/devices/virtual/intel-vgpu/vgpu0@0000:00:02.0
NB - This is only a made up example path
The same change is made for vfio-platform, specifying sysfsdev has
precedence over the old host option.
Tested-by: Eric Auger <eric.auger@linaro.org>
Reviewed-by: Eric Auger <eric.auger@linaro.org>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-03-10 19:39:07 +03:00
        error_report("%s(%s, 0x%x, 0x%x, 0x%x) failed: %m",
                     __func__, vdev->vbasedev.name, addr, val, len);
    }

    /* MSI/MSI-X Enabling/Disabling */
    if (pdev->cap_present & QEMU_PCI_CAP_MSI &&
        ranges_overlap(addr, len, pdev->msi_cap, vdev->msi_cap_size)) {
        int is_enabled, was_enabled = msi_enabled(pdev);

        pci_default_write_config(pdev, addr, val, len);

        is_enabled = msi_enabled(pdev);

        if (!was_enabled) {
            if (is_enabled) {
                vfio_msi_enable(vdev);
            }
        } else {
            if (!is_enabled) {
                vfio_msi_disable(vdev);
            } else {
                vfio_update_msi(vdev);
            }
        }
    } else if (pdev->cap_present & QEMU_PCI_CAP_MSIX &&
        ranges_overlap(addr, len, pdev->msix_cap, MSIX_CAP_LENGTH)) {
        int is_enabled, was_enabled = msix_enabled(pdev);

        pci_default_write_config(pdev, addr, val, len);

        is_enabled = msix_enabled(pdev);

        if (!was_enabled && is_enabled) {
            vfio_msix_enable(vdev);
        } else if (was_enabled && !is_enabled) {
            vfio_msix_disable(vdev);
        }
    } else if (ranges_overlap(addr, len, PCI_BASE_ADDRESS_0, 24) ||
        range_covers_byte(addr, len, PCI_COMMAND)) {
        pcibus_t old_addr[PCI_NUM_REGIONS - 1];
        int bar;

        for (bar = 0; bar < PCI_ROM_SLOT; bar++) {
            old_addr[bar] = pdev->io_regions[bar].addr;
        }

        pci_default_write_config(pdev, addr, val, len);

        for (bar = 0; bar < PCI_ROM_SLOT; bar++) {
            if (old_addr[bar] != pdev->io_regions[bar].addr &&
                vdev->bars[bar].region.size > 0 &&
                vdev->bars[bar].region.size < qemu_real_host_page_size()) {
                vfio_sub_page_bar_update_mapping(pdev, bar);
            }
        }
    } else {
        /* Write everything to QEMU to keep emulated bits correct */
        pci_default_write_config(pdev, addr, val, len);
    }
}
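
The MSI branch of the write path compares the enable state before and after the emulated write and acts only on the transition. A minimal standalone sketch of that decision, and of the kind of range test that selects the branch, might look like this. The enum, `classify_msi_write` and `sketch_ranges_overlap` are hypothetical names invented for the example; the overlap helper is a simplified stand-in and not QEMU's own `ranges_overlap()`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical labels for what the MSI branch does after a config write. */
enum msi_action {
    MSI_ACTION_NONE,     /* was off, still off */
    MSI_ACTION_ENABLE,   /* off -> on */
    MSI_ACTION_DISABLE,  /* on -> off */
    MSI_ACTION_UPDATE,   /* on -> on: refresh the programmed message */
};

/* Map the (before, after) enable state onto one of the actions above. */
static enum msi_action classify_msi_write(bool was_enabled, bool is_enabled)
{
    if (!was_enabled) {
        return is_enabled ? MSI_ACTION_ENABLE : MSI_ACTION_NONE;
    }
    return is_enabled ? MSI_ACTION_UPDATE : MSI_ACTION_DISABLE;
}

/*
 * Simplified overlap test for half-open ranges [first, first + len).
 * Assumes both lengths are non-zero and the sums do not wrap around.
 */
static bool sketch_ranges_overlap(uint64_t first1, uint64_t len1,
                                  uint64_t first2, uint64_t len2)
{
    return first1 < first2 + len2 && first2 < first1 + len1;
}
```

Note that the MSI-X branch in the original code handles only the enable and disable transitions; the sketch follows the MSI branch, which also refreshes state when MSI stays enabled.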

/*
 * Interrupt setup
 */
static void vfio_disable_interrupts(VFIOPCIDevice *vdev)
{
    /*
     * More complicated than it looks. Disabling MSI/X transitions the
     * device to INTx mode (if supported). Therefore we need to first
     * disable MSI/X and then cleanup by disabling INTx.
     */
    if (vdev->interrupt == VFIO_INT_MSIX) {
        vfio_msix_disable(vdev);
    } else if (vdev->interrupt == VFIO_INT_MSI) {
        vfio_msi_disable(vdev);
    }

    if (vdev->interrupt == VFIO_INT_INTx) {
        vfio_intx_disable(vdev);
    }
}
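
The ordering in vfio_disable_interrupts matters because disabling MSI or MSI-X can leave the device in INTx mode, so the INTx check has to come second. The toy model below illustrates that two-step teardown under stated assumptions: `struct model_dev`, the `INT_*` enum and the `model_*` functions are all hypothetical and only mimic the behaviour described in the comment, not the real driver state.

```c
#include <stdbool.h>

/* Hypothetical interrupt modes, loosely mirroring the VFIO_INT_* names. */
enum int_mode { INT_NONE, INT_INTX, INT_MSI, INT_MSIX };

struct model_dev {
    enum int_mode interrupt;
    bool intx_supported;  /* assumption: whether the device can use INTx */
};

/* Disabling MSI or MSI-X falls back to INTx when the device supports it. */
static void model_msi_disable(struct model_dev *d)
{
    d->interrupt = d->intx_supported ? INT_INTX : INT_NONE;
}

static void model_intx_disable(struct model_dev *d)
{
    d->interrupt = INT_NONE;
}

/* First leave MSI/MSI-X, then clean up whatever INTx state remains. */
static void model_disable_interrupts(struct model_dev *d)
{
    if (d->interrupt == INT_MSIX || d->interrupt == INT_MSI) {
        model_msi_disable(d);
    }
    if (d->interrupt == INT_INTX) {
        model_intx_disable(d);
    }
}
```

If the two checks were swapped, a device that started in MSI mode would be left in INTx mode after the call.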
|
|
|
|
|
2024-05-22 07:40:08 +03:00
|
|
|
static bool vfio_msi_setup(VFIOPCIDevice *vdev, int pos, Error **errp)
|
vfio: vfio-pci device assignment driver
This adds the core of the QEMU VFIO-based PCI device assignment driver.
To make use of this driver, enable CONFIG_VFIO, CONFIG_VFIO_IOMMU_TYPE1,
and CONFIG_VFIO_PCI in your host Linux kernel config. Load the vfio-pci
module. To assign device 0000:05:00.0 to a guest, do the following:
for dev in $(ls /sys/bus/pci/devices/0000:05:00.0/iommu_group/devices); do
vendor=$(cat /sys/bus/pci/devices/$dev/vendor)
device=$(cat /sys/bus/pci/devices/$dev/device)
if [ -e /sys/bus/pci/devices/$dev/driver ]; then
echo $dev > /sys/bus/pci/devices/$dev/driver/unbind
fi
echo $vendor $device > /sys/bus/pci/drivers/vfio-pci/new_id
done
See Documentation/vfio.txt in the Linux kernel tree for further
description of IOMMU groups and VFIO.
Then launch qemu including the option:
-device vfio-pci,host=0000:05:00.0
Legacy PCI interrupts (INTx) currently makes use of a kludge where we
trap BAR accesses and assume the access is in response to an interrupt,
therefore de-asserting and unmasking the interrupt. It's not quite as
targetted as using the EOI for this, but it's self contained and seems
to work across all architectures. The side-effect is a significant
performance slow-down for device in INTx mode. Some devices, like
graphics cards, don't really use their interrupt, so this can be turned
off with the x-intx=off option, which disables INTx alltogether. This
should be considered an experimental option until we refine this code.
Both MSI and MSI-X are supported and avoid these issues.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2012-09-26 21:19:32 +04:00
|
|
|
{
|
|
|
|
uint16_t ctrl;
|
|
|
|
bool msi_64bit, msi_maskbit;
|
|
|
|
int ret, entries;
|
2016-06-20 09:13:39 +03:00
|
|
|
Error *err = NULL;
|
2012-09-26 21:19:32 +04:00
|
|
|
|
2014-12-20 01:24:31 +03:00
|
|
|
if (pread(vdev->vbasedev.fd, &ctrl, sizeof(ctrl),
|
2012-09-26 21:19:32 +04:00
|
|
|
vdev->config_offset + pos + PCI_CAP_FLAGS) != sizeof(ctrl)) {
|
2016-10-17 19:57:58 +03:00
|
|
|
error_setg_errno(errp, errno, "failed reading MSI PCI_CAP_FLAGS");
|
2024-05-22 07:40:08 +03:00
|
|
|
return false;
|
2012-09-26 21:19:32 +04:00
|
|
|
}
|
|
|
|
ctrl = le16_to_cpu(ctrl);
|
|
|
|
|
|
|
|
msi_64bit = !!(ctrl & PCI_MSI_FLAGS_64BIT);
|
|
|
|
msi_maskbit = !!(ctrl & PCI_MSI_FLAGS_MASKBIT);
|
|
|
|
entries = 1 << ((ctrl & PCI_MSI_FLAGS_QMASK) >> 1);
|
|
|
|
|
2015-09-23 22:04:43 +03:00
|
|
|
trace_vfio_msi_setup(vdev->vbasedev.name, pos);
|
2012-09-26 21:19:32 +04:00
|
|
|
|
2016-06-20 09:13:39 +03:00
|
|
|
ret = msi_init(&vdev->pdev, pos, entries, msi_64bit, msi_maskbit, &err);
|
2012-09-26 21:19:32 +04:00
|
|
|
if (ret < 0) {
|
2012-10-08 18:45:30 +04:00
|
|
|
if (ret == -ENOTSUP) {
|
2024-05-22 07:40:08 +03:00
|
|
|
return true;
|
2012-10-08 18:45:30 +04:00
|
|
|
}
|
2018-10-17 11:26:25 +03:00
|
|
|
error_propagate_prepend(errp, err, "msi_init failed: ");
|
2024-05-22 07:40:08 +03:00
|
|
|
return false;
|
2012-09-26 21:19:32 +04:00
|
|
|
}
|
|
|
|
vdev->msi_cap_size = 0xa + (msi_maskbit ? 0xa : 0) + (msi_64bit ? 0x4 : 0);
|
|
|
|
|
2024-05-22 07:40:08 +03:00
|
|
|
return true;
|
2012-09-26 21:19:32 +04:00
|
|
|
}
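The flag decoding in the function above can be tried in isolation. The following is a minimal sketch, not the QEMU implementation: `struct msi_info` and `decode_msi_ctrl` are hypothetical names introduced for illustration, and the three bit masks are assumed to match the MSI Message Control layout from the Linux `pci_regs.h` header. The capability-size arithmetic is copied from the listing above.

```c
#include <stdbool.h>
#include <stdint.h>

/* MSI Message Control bits (assumed to match linux/pci_regs.h). */
#define MSI_FLAGS_QMASK   0x000e  /* Multiple Message Capable, bits 3:1 */
#define MSI_FLAGS_64BIT   0x0080  /* 64-bit message address supported */
#define MSI_FLAGS_MASKBIT 0x0100  /* per-vector masking supported */

/* Hypothetical container for the decoded values. */
struct msi_info {
    int entries;        /* number of vectors the device can request */
    bool is_64bit;
    bool has_maskbit;
    unsigned cap_size;  /* capability size, same formula as the listing */
};

/* Decode a CPU-endian MSI Message Control value. */
static struct msi_info decode_msi_ctrl(uint16_t ctrl)
{
    struct msi_info info;

    info.is_64bit = !!(ctrl & MSI_FLAGS_64BIT);
    info.has_maskbit = !!(ctrl & MSI_FLAGS_MASKBIT);
    /* The QMASK field holds log2 of the vector count. */
    info.entries = 1 << ((ctrl & MSI_FLAGS_QMASK) >> 1);
    info.cap_size = 0xa + (info.has_maskbit ? 0xa : 0) +
                    (info.is_64bit ? 0x4 : 0);
    return info;
}
```

For instance, a control value of 0x0184 (mask bit set, 64-bit set, QMASK field equal to 2) decodes to 4 vectors and a capability size of 0x18 bytes under this formula.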
|
|
|
|
|
2016-03-10 19:39:07 +03:00
|
|
|
static void vfio_pci_fixup_msix_region(VFIOPCIDevice *vdev)
|
|
|
|
{
|
|
|
|
off_t start, end;
|
|
|
|
VFIORegion *region = &vdev->bars[vdev->msix->table_bar].region;
|
|
|
|
|
2018-03-13 20:17:31 +03:00
|
|
|
/*
|
|
|
|
* If the host driver allows mapping of the MSI-X data, we map the
|
|
|
|
* entire BAR and emulate the MSI-X table on top of that.
|
|
|
|
*/
|
|
|
|
if (vfio_has_region_cap(&vdev->vbasedev, region->nr,
|
|
|
|
VFIO_REGION_INFO_CAP_MSIX_MAPPABLE)) {
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
2016-03-10 19:39:07 +03:00
|
|
|
/*
|
|
|
|
* We expect to find a single mmap covering the whole BAR, anything else
|
|
|
|
* means it's either unsupported or already set up.
|
|
|
|
*/
|
|
|
|
if (region->nr_mmaps != 1 || region->mmaps[0].offset ||
|
|
|
|
region->size != region->mmaps[0].size) {
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* MSI-X table start and end aligned to host page size */
|
2022-03-23 18:57:22 +03:00
|
|
|
start = vdev->msix->table_offset & qemu_real_host_page_mask();
|
2016-03-10 19:39:07 +03:00
|
|
|
end = REAL_HOST_PAGE_ALIGN((uint64_t)vdev->msix->table_offset +
|
|
|
|
(vdev->msix->entries * PCI_MSIX_ENTRY_SIZE));
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Does the MSI-X table cover the beginning of the BAR? The whole BAR?
|
|
|
|
* NB - Host page size is necessarily a power of two and so is the PCI
|
|
|
|
* BAR (not counting EA yet), therefore if we have host page aligned
|
|
|
|
* @start and @end, then any remainder of the BAR before or after those
|
|
|
|
* must be at least host page sized and therefore mmap'able.
|
|
|
|
*/
|
|
|
|
if (!start) {
|
|
|
|
if (end >= region->size) {
|
|
|
|
region->nr_mmaps = 0;
|
|
|
|
g_free(region->mmaps);
|
|
|
|
region->mmaps = NULL;
|
|
|
|
trace_vfio_msix_fixup(vdev->vbasedev.name,
|
|
|
|
vdev->msix->table_bar, 0, 0);
|
|
|
|
} else {
|
|
|
|
region->mmaps[0].offset = end;
|
|
|
|
region->mmaps[0].size = region->size - end;
|
|
|
|
trace_vfio_msix_fixup(vdev->vbasedev.name,
|
|
|
|
vdev->msix->table_bar, region->mmaps[0].offset,
|
|
|
|
region->mmaps[0].offset + region->mmaps[0].size);
|
|
|
|
}
|
|
|
|
|
|
|
|
/* Maybe it's aligned at the end of the BAR */
|
|
|
|
} else if (end >= region->size) {
|
|
|
|
region->mmaps[0].size = start;
|
|
|
|
trace_vfio_msix_fixup(vdev->vbasedev.name,
|
|
|
|
vdev->msix->table_bar, region->mmaps[0].offset,
|
|
|
|
region->mmaps[0].offset + region->mmaps[0].size);
|
|
|
|
|
|
|
|
/* Otherwise it must split the BAR */
|
|
|
|
} else {
|
|
|
|
region->nr_mmaps = 2;
|
|
|
|
region->mmaps = g_renew(VFIOMmap, region->mmaps, 2);
|
|
|
|
|
|
|
|
memcpy(&region->mmaps[1], &region->mmaps[0], sizeof(VFIOMmap));
|
|
|
|
|
|
|
|
region->mmaps[0].size = start;
|
|
|
|
trace_vfio_msix_fixup(vdev->vbasedev.name,
|
|
|
|
vdev->msix->table_bar, region->mmaps[0].offset,
|
|
|
|
region->mmaps[0].offset + region->mmaps[0].size);
|
|
|
|
|
|
|
|
region->mmaps[1].offset = end;
|
|
|
|
region->mmaps[1].size = region->size - end;
|
|
|
|
trace_vfio_msix_fixup(vdev->vbasedev.name,
|
|
|
|
vdev->msix->table_bar, region->mmaps[1].offset,
|
|
|
|
region->mmaps[1].offset + region->mmaps[1].size);
|
|
|
|
}
|
|
|
|
}
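The three cases handled by the function above (table at the start of the BAR, at the end, or in the middle) can be sketched outside QEMU as follows. This is an illustration under stated assumptions, not the QEMU code: `struct range` and `split_bar_around_msix` are hypothetical names, the page size is passed in explicitly instead of being queried from the host, and an MSI-X table entry is assumed to be 16 bytes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mmap-able window inside a BAR. */
struct range {
    uint64_t offset;
    uint64_t size;
};

/*
 * Compute which parts of a BAR remain directly mappable once the
 * page-aligned area holding the MSI-X table is carved out.
 * Returns the number of ranges written to out[] (0, 1 or 2).
 */
static int split_bar_around_msix(uint64_t bar_size, uint64_t table_offset,
                                 unsigned entries, uint64_t page_size,
                                 struct range out[2])
{
    const uint64_t entry_size = 16; /* assumed bytes per table entry */
    uint64_t start, end;

    assert(page_size && (page_size & (page_size - 1)) == 0);

    /* Table start rounded down, table end rounded up, to page size. */
    start = table_offset & ~(page_size - 1);
    end = (table_offset + (uint64_t)entries * entry_size + page_size - 1) &
          ~(page_size - 1);

    if (start == 0) {
        if (end >= bar_size) {
            return 0;                     /* table pages cover the BAR */
        }
        out[0].offset = end;              /* table at the beginning */
        out[0].size = bar_size - end;
        return 1;
    }
    if (end >= bar_size) {
        out[0].offset = 0;                /* table at the end */
        out[0].size = start;
        return 1;
    }
    out[0].offset = 0;                    /* table splits the BAR */
    out[0].size = start;
    out[1].offset = end;
    out[1].size = bar_size - end;
    return 2;
}
```

With a 64KB BAR, a 16-entry table at offset 0xe000 and 4KB pages, the sketch yields two windows, [0, 0xe000) and [0xf000, 0x10000). With 64KB pages the same device yields no mappable window at all. Like the function above, the sketch only accounts for the vector table, not the PBA.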
|
|
|
|
|
2024-05-22 07:40:04 +03:00
|
|
|
static bool vfio_pci_relocate_msix(VFIOPCIDevice *vdev, Error **errp)
|
vfio/pci: Allow relocating MSI-X MMIO
Recently proposed vfio-pci kernel changes (v4.16) remove the
restriction preventing userspace from mmap'ing PCI BARs in areas
overlapping the MSI-X vector table. This change is primarily intended
to benefit host platforms which make use of system page sizes larger
than the PCI spec recommendation for alignment of MSI-X data
structures (i.e. not x86_64). In the case of POWER systems, the SPAPR
spec requires the VM to program MSI-X using hypercalls, rendering the
MSI-X vector table unused in the VM view of the device. However,
ARM64 platforms also support 64KB pages and rely on QEMU emulation of
MSI-X. Regardless of the kernel driver allowing mmaps overlapping
the MSI-X vector table, emulation of the MSI-X vector table also
prevents direct mapping of device MMIO spaces overlapping this page.
Thanks to the fact that PCI devices have a standard self discovery
mechanism, we can try to resolve this by relocating the MSI-X data
structures, either by creating a new PCI BAR or extending an existing
BAR and updating the MSI-X capability for the new location. There's
even a very slim chance that this could benefit devices which do not
adhere to the PCI spec alignment guidelines on x86_64 systems.
This new x-msix-relocation option accepts the following choices:
off: Disable MSI-X relocation, use native device config (default)
auto: Use a known good combination for the platform/device (none yet)
bar0..bar5: Specify the target BAR for MSI-X data structures
If compatible, the target BAR will either be created or extended and
the new portion will be used for MSI-X emulation.
The first obvious user question with this option is how to determine
whether a given platform and device might benefit from this option.
In most cases, the answer is that it won't, especially on x86_64.
Devices often dedicate an entire BAR to MSI-X and therefore no
performance sensitive registers overlap the MSI-X area. Take for
example:
# lspci -vvvs 0a:00.0
0a:00.0 Ethernet controller: Intel Corporation I350 Gigabit Network Connection
...
Region 0: Memory at db680000 (32-bit, non-prefetchable) [size=512K]
Region 3: Memory at db7f8000 (32-bit, non-prefetchable) [size=16K]
...
Capabilities: [70] MSI-X: Enable+ Count=10 Masked-
Vector table: BAR=3 offset=00000000
PBA: BAR=3 offset=00002000
This device uses the 16K bar3 for MSI-X with the vector table at
offset zero and the pending bits array at offset 8K, fully honoring
the PCI spec alignment guidance. The data sheet specifically refers
to this as an MSI-X BAR. This device would not see a benefit from
MSI-X relocation regardless of the platform, regardless of the page
size.
However, here's another example:
# lspci -vvvs 02:00.0
02:00.0 Serial Attached SCSI controller: xxxxxxxx
...
Region 0: I/O ports at c000 [size=256]
Region 1: Memory at ef640000 (64-bit, non-prefetchable) [size=64K]
Region 3: Memory at ef600000 (64-bit, non-prefetchable) [size=256K]
...
Capabilities: [c0] MSI-X: Enable+ Count=16 Masked-
Vector table: BAR=1 offset=0000e000
PBA: BAR=1 offset=0000f000
Here the MSI-X data structures are placed on separate 4K pages at the
end of a 64KB BAR. If our host page size is 4K, we're likely fine,
but at 64KB page size, MSI-X emulation at that location prevents the
entire BAR from being directly mapped into the VM address space.
Overlapping performance sensitive registers then starts to be a very
likely scenario on such a platform. At this point, the user could
enable tracing on vfio_region_read and vfio_region_write to determine
more conclusively if device accesses are being trapped through QEMU.
Upon finding a device and platform in need of MSI-X relocation, the
next problem is how to choose a target PCI BAR to host the MSI-X data
structures. A few key rules to keep in mind for this selection
include:
* There are only 6 BAR slots, bar0..bar5
* 64-bit BARs occupy two BAR slots, 'lspci -vvv' lists the first slot
* PCI BARs are always a power of 2 in size, extending == doubling
* The maximum size of a 32-bit BAR is 2GB
* MSI-X data structures must reside in an MMIO BAR
Using these rules, we can evaluate each BAR of the second example
device above as follows:
bar0: I/O port BAR, incompatible with MSI-X tables
bar1: BAR could be extended, incurring another 64KB of MMIO
bar2: Unavailable, bar1 is 64-bit, this register is used by bar1
bar3: BAR could be extended, incurring another 256KB of MMIO
bar4: Unavailable, bar3 is 64-bit, this register is used by bar3
bar5: Available, empty BAR, minimum additional MMIO
A secondary optimization we might wish to make in relocating MSI-X
is to minimize the additional MMIO required for the device, therefore
we might test the available choices in order of preference as bar5,
bar1, and finally bar3. The original proposal for this feature
included an 'auto' option which would choose bar5 in this case, but
various drivers have been found that make assumptions about the
properties of the "first" BAR or the size of BARs such that there
appears to be no foolproof automatic selection available, requiring
known good combinations to be sourced from users. This patch is
pre-enabled for an 'auto' selection making use of a validated lookup
table, but no entries are yet identified.
Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2018-02-06 21:08:26 +03:00
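The BAR selection rules in the commit message above can be expressed as a small cost function. This is a rough sketch for reasoning about candidates, not what QEMU does: `enum bar_kind`, `struct bar_desc`, `msix_min_size` and `relocation_cost` are hypothetical names, an MSI-X table entry is assumed to be 16 bytes, and extending an existing BAR is simplified to doubling the larger of the BAR size and the MSI-X size.

```c
#include <stdint.h>

/* Hypothetical description of one of the six BAR slots. */
enum bar_kind { BAR_EMPTY, BAR_IO, BAR_MEM32, BAR_MEM64, BAR_MEM64_HI };

struct bar_desc {
    enum bar_kind kind;  /* BAR_MEM64_HI: slot used by the previous 64-bit BAR */
    uint64_t size;       /* current size in bytes, 0 for an empty slot */
};

static uint64_t round_up_pow2(uint64_t v)
{
    uint64_t p = 1;

    while (p < v) {
        p <<= 1;
    }
    return p;
}

/* Table plus PBA, rounded up to a whole page and then to a power of 2. */
static uint64_t msix_min_size(unsigned entries, uint64_t page_size)
{
    uint64_t table = (uint64_t)entries * 16;            /* assumed entry size */
    uint64_t pba = (((uint64_t)entries + 63) / 64) * 8; /* one bit per vector */
    uint64_t sz = table + pba;

    sz = (sz + page_size - 1) & ~(page_size - 1);
    return round_up_pow2(sz);
}

/* Additional MMIO in bytes if this slot hosts MSI-X, or -1 if unusable. */
static int64_t relocation_cost(const struct bar_desc *bar, uint64_t msix_sz)
{
    uint64_t base, new_size;

    switch (bar->kind) {
    case BAR_IO:        /* MSI-X structures must reside in an MMIO BAR */
    case BAR_MEM64_HI:  /* register belongs to the preceding 64-bit BAR */
        return -1;
    case BAR_EMPTY:
        return (int64_t)msix_sz;
    default:
        break;
    }

    /* Simplification: extending doubles the larger of the two sizes. */
    base = bar->size > msix_sz ? bar->size : msix_sz;
    new_size = base * 2;
    if (bar->kind == BAR_MEM32 && new_size > 0x80000000ULL) {
        return -1;      /* a 32-bit BAR cannot exceed 2GB */
    }
    return (int64_t)(new_size - bar->size);
}
```

Applied to the second example device with 4KB pages, the sketch reports bar0, bar2 and bar4 as unusable, 64KB of extra MMIO for bar1, 256KB for bar3 and 4KB for bar5, matching the ordering of preference given in the commit message. It says nothing about whether a guest driver tolerates the changed BAR layout, which is the reason the 'auto' option relies on a lookup table.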
|
|
|
{
|
|
|
|
int target_bar = -1;
|
|
|
|
size_t msix_sz;
|
|
|
|
|
|
|
|
if (!vdev->msix || vdev->msix_relo == OFF_AUTOPCIBAR_OFF) {
|
2024-05-22 07:40:04 +03:00
|
|
|
return true;
|
2018-02-06 21:08:26 +03:00
|
|
|
}
|
|
|
|
|
|
|
|
/* The actual minimum size of MSI-X structures */
|
|
|
|
msix_sz = (vdev->msix->entries * PCI_MSIX_ENTRY_SIZE) +
|
|
|
|
(QEMU_ALIGN_UP(vdev->msix->entries, 64) / 8);
|
|
|
|
/* Round up to host pages, we don't want to share a page */
|
|
|
|
msix_sz = REAL_HOST_PAGE_ALIGN(msix_sz);
|
|
|
|
/* PCI BARs must be a power of 2 */
|
|
|
|
msix_sz = pow2ceil(msix_sz);
|
|
|
|
|
|
|
|
if (vdev->msix_relo == OFF_AUTOPCIBAR_AUTO) {
|
|
|
|
/*
|
|
|
|
* TODO: Lookup table for known devices.
|
|
|
|
*
|
|
|
|
* Logically we might use an algorithm here to select the BAR adding
|
2021-07-30 04:26:13 +03:00
|
|
|
* the least additional MMIO space, but we cannot programmatically
|
2018-02-06 21:08:26 +03:00
         * predict the driver dependency on BAR ordering or sizing, therefore
         * 'auto' becomes a lookup for combinations reported to work.
         */
        if (target_bar < 0) {
            error_setg(errp, "No automatic MSI-X relocation available for "
                       "device %04x:%04x", vdev->vendor_id, vdev->device_id);
2024-05-22 07:40:04 +03:00
            return false;
vfio/pci: Allow relocating MSI-X MMIO
Recently proposed vfio-pci kernel changes (v4.16) remove the
restriction preventing userspace from mmap'ing PCI BARs in areas
overlapping the MSI-X vector table. This change is primarily intended
to benefit host platforms which make use of system page sizes larger
than the PCI spec recommendation for alignment of MSI-X data
structures (i.e. not x86_64). In the case of POWER systems, the SPAPR
spec requires the VM to program MSI-X using hypercalls, rendering the
MSI-X vector table unused in the VM view of the device. However,
ARM64 platforms also support 64KB pages and rely on QEMU emulation of
MSI-X. Regardless of the kernel driver allowing mmaps overlapping
the MSI-X vector table, emulation of the MSI-X vector table also
prevents direct mapping of device MMIO spaces overlapping this page.
Thanks to the fact that PCI devices have a standard self discovery
mechanism, we can try to resolve this by relocating the MSI-X data
structures, either by creating a new PCI BAR or extending an existing
BAR and updating the MSI-X capability for the new location. There's
even a very slim chance that this could benefit devices which do not
adhere to the PCI spec alignment guidelines on x86_64 systems.
This new x-msix-relocation option accepts the following choices:
off: Disable MSI-X relocation, use native device config (default)
auto: Use a known good combination for the platform/device (none yet)
bar0..bar5: Specify the target BAR for MSI-X data structures
If compatible, the target BAR will either be created or extended and
the new portion will be used for MSI-X emulation.
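The "created or extended" behaviour can be sketched in isolation. The following is a minimal illustration, not the QEMU implementation: `relocate_layout`, `struct msix_layout` and `MSIX_ENTRY_SIZE` are hypothetical names, and `msix_sz` is assumed to be already rounded up to a host page and to a power of 2.

```c
#include <assert.h>
#include <stdint.h>

#define MSIX_ENTRY_SIZE 16u /* bytes per MSI-X vector table entry */

/* Hypothetical result type, for illustration only. */
struct msix_layout {
    uint64_t bar_size;     /* size of the target BAR after relocation */
    uint64_t table_offset; /* vector table offset within the BAR */
    uint64_t pba_offset;   /* pending bit array offset within the BAR */
};

/*
 * cur_size: current size of the target BAR, 0 if the slot is empty.
 * msix_sz:  space reserved for MSI-X, assumed page-rounded and a power of 2.
 * entries:  number of MSI-X vectors.
 */
static struct msix_layout relocate_layout(uint64_t cur_size, uint64_t msix_sz,
                                          unsigned entries)
{
    struct msix_layout l;

    assert(msix_sz != 0 && (msix_sz & (msix_sz - 1)) == 0);

    if (cur_size == 0) {
        /* New BAR: MSI-X is its only content and starts at offset 0. */
        l.bar_size = msix_sz;
        l.table_offset = 0;
    } else {
        /* Extended BAR: double it, or grow to 2 * msix_sz if that is larger. */
        uint64_t doubled = cur_size * 2;

        l.bar_size = doubled > msix_sz * 2 ? doubled : msix_sz * 2;
        /* MSI-X then starts halfway into the resized BAR. */
        l.table_offset = l.bar_size / 2;
    }
    /* The PBA is placed directly after the vector table. */
    l.pba_offset = l.table_offset + (uint64_t)entries * MSIX_ENTRY_SIZE;
    return l;
}
```

With these assumptions, an empty slot and a 4K `msix_sz` give a 4K BAR with the table at offset 0, while an existing 64K BAR grows to 128K with the table at offset 64K.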
The first obvious user question with this option is how to determine
whether a given platform and device might benefit from this option.
In most cases, the answer is that it won't, especially on x86_64.
Devices often dedicate an entire BAR to MSI-X and therefore no
performance sensitive registers overlap the MSI-X area. Take for
example:
# lspci -vvvs 0a:00.0
0a:00.0 Ethernet controller: Intel Corporation I350 Gigabit Network Connection
...
Region 0: Memory at db680000 (32-bit, non-prefetchable) [size=512K]
Region 3: Memory at db7f8000 (32-bit, non-prefetchable) [size=16K]
...
Capabilities: [70] MSI-X: Enable+ Count=10 Masked-
Vector table: BAR=3 offset=00000000
PBA: BAR=3 offset=00002000
This device uses the 16K bar3 for MSI-X with the vector table at
offset zero and the pending bits array at offset 8K, fully honoring
the PCI spec alignment guidance. The data sheet specifically refers
to this as an MSI-X BAR. This device would not see a benefit from
MSI-X relocation regardless of the platform, regardless of the page
size.
However, here's another example:
# lspci -vvvs 02:00.0
02:00.0 Serial Attached SCSI controller: xxxxxxxx
...
Region 0: I/O ports at c000 [size=256]
Region 1: Memory at ef640000 (64-bit, non-prefetchable) [size=64K]
Region 3: Memory at ef600000 (64-bit, non-prefetchable) [size=256K]
...
Capabilities: [c0] MSI-X: Enable+ Count=16 Masked-
Vector table: BAR=1 offset=0000e000
PBA: BAR=1 offset=0000f000
Here the MSI-X data structures are placed on separate 4K pages at the
end of a 64KB BAR. If our host page size is 4K, we're likely fine,
but at 64KB page size, MSI-X emulation at that location prevents the
entire BAR from being directly mapped into the VM address space.
Overlapping performance sensitive registers then starts to be a very
likely scenario on such a platform. At this point, the user could
enable tracing on vfio_region_read and vfio_region_write to determine
more conclusively if device accesses are being trapped through QEMU.
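The page-size effect on the second device can be checked with a little arithmetic. This is a rough sketch under simplifying assumptions: `mappable_bytes` is a hypothetical helper, and every host page touched by the range from the start of the vector table to the end of the PBA is treated as trapped.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Bytes of a BAR that remain directly mappable when the byte range
 * [emu_start, emu_end) must be emulated and trapping is done at host
 * page granularity. page_size must be a power of 2.
 */
static uint64_t mappable_bytes(uint64_t bar_size, uint64_t emu_start,
                               uint64_t emu_end, uint64_t page_size)
{
    uint64_t lo, hi;

    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    assert(emu_start < emu_end && emu_end <= bar_size);

    /* Widen the emulated range to whole host pages. */
    lo = emu_start & ~(page_size - 1);
    hi = (emu_end + page_size - 1) & ~(page_size - 1);
    if (hi > bar_size) {
        hi = bar_size; /* a page larger than the BAR swallows all of it */
    }
    return bar_size - (hi - lo);
}
```

For the 64K bar1 above, with the table at 0xe000 and the 8-byte PBA for 16 vectors at 0xf000, `mappable_bytes(0x10000, 0xe000, 0xf008, 4096)` leaves 56K mappable, while a 64K page size leaves nothing.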
Upon finding a device and platform in need of MSI-X relocation, the
next problem is how to choose a target PCI BAR to host the MSI-X data
structures. A few key rules to keep in mind for this selection
include:
* There are only 6 BAR slots, bar0..bar5
* 64-bit BARs occupy two BAR slots, 'lspci -vvv' lists the first slot
* PCI BARs are always a power of 2 in size, extending == doubling
* The maximum size of a 32-bit BAR is 2GB
* MSI-X data structures must reside in an MMIO BAR
Using these rules, we can evaluate each BAR of the second example
device above as follows:
bar0: I/O port BAR, incompatible with MSI-X tables
bar1: BAR could be extended, incurring another 64KB of MMIO
bar2: Unavailable, bar1 is 64-bit, this register is used by bar1
bar3: BAR could be extended, incurring another 256KB of MMIO
bar4: Unavailable, bar3 is 64-bit, this register is used by bar3
bar5: Available, empty BAR, minimum additional MMIO
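The per-BAR evaluation above can be expressed as a small cost function. This is an illustrative sketch only: `struct bar_info` and `relocation_cost` are hypothetical, with fields named after the `size`, `ioport` and `mem64` members of `vdev->bars[]`, and `msix_sz` assumed to be page-rounded and a power of 2.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_BARS 6
#define GIB (1ULL << 30)

/* Hypothetical per-BAR description, for illustration only. */
struct bar_info {
    uint64_t size; /* 0 if the slot is empty */
    bool ioport;   /* I/O port BAR */
    bool mem64;    /* 64-bit memory BAR, also consumes slot n + 1 */
};

/*
 * Additional MMIO in bytes if MSI-X were relocated to bars[n],
 * or -1 if that slot cannot host the MSI-X structures.
 */
static int64_t relocation_cost(const struct bar_info *bars, int n,
                               uint64_t msix_sz)
{
    const struct bar_info *b;
    uint64_t new_size;

    assert(n >= 0 && n < NUM_BARS);
    b = &bars[n];

    if (b->ioport) {
        return -1; /* MSI-X must reside in an MMIO BAR */
    }
    if (b->size == 0 && n > 0 && bars[n - 1].mem64) {
        return -1; /* slot is the upper half of a 64-bit BAR */
    }
    if (b->size > GIB && !b->mem64) {
        return -1; /* a 32-bit BAR cannot double past 2GB */
    }
    if (b->size == 0) {
        return (int64_t)msix_sz; /* new BAR holds only MSI-X */
    }
    /* Extending means at least doubling the existing BAR. */
    new_size = b->size * 2 > msix_sz * 2 ? b->size * 2 : msix_sz * 2;
    return (int64_t)(new_size - b->size);
}
```

Applied to the second example device with a 4K `msix_sz`, this yields -1 for bar0, bar2 and bar4, 64K for bar1, 256K for bar3 and 4K for bar5, matching the list above.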
A secondary optimization we might wish to make in relocating MSI-X
is to minimize the additional MMIO required for the device, therefore
we might test the available choices in order of preference as bar5,
bar1, and finally bar3. The original proposal for this feature
included an 'auto' option which would choose bar5 in this case, but
various drivers have been found that make assumptions about the
properties of the "first" BAR or the size of BARs such that there
appears to be no foolproof automatic selection available, requiring
known good combinations to be sourced from users. This patch is
pre-enabled for an 'auto' selection making use of a validated lookup
table, but no entries are yet identified.
Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2018-02-06 21:08:26 +03:00
        }
    } else {
        target_bar = (int)(vdev->msix_relo - OFF_AUTOPCIBAR_BAR0);
    }

    /* I/O port BARs cannot host MSI-X structures */
    if (vdev->bars[target_bar].ioport) {
        error_setg(errp, "Invalid MSI-X relocation BAR %d, "
                   "I/O port BAR", target_bar);
2024-05-22 07:40:04 +03:00
        return false;
2018-02-06 21:08:26 +03:00
    }

    /* Cannot use a BAR in the "shadow" of a 64-bit BAR */
    if (!vdev->bars[target_bar].size &&
        target_bar > 0 && vdev->bars[target_bar - 1].mem64) {
        error_setg(errp, "Invalid MSI-X relocation BAR %d, "
                   "consumed by 64-bit BAR %d", target_bar, target_bar - 1);
2024-05-22 07:40:04 +03:00
        return false;
2018-02-06 21:08:26 +03:00
    }

    /* 2GB max size for 32-bit BARs, cannot double if already > 1G */
2018-06-25 15:42:29 +03:00
    if (vdev->bars[target_bar].size > 1 * GiB &&
2018-02-06 21:08:26 +03:00
        !vdev->bars[target_bar].mem64) {
        error_setg(errp, "Invalid MSI-X relocation BAR %d, "
                   "no space to extend 32-bit BAR", target_bar);
2024-05-22 07:40:04 +03:00
        return false;
2018-02-06 21:08:26 +03:00
    }

    /*
     * If adding a new BAR, test if we can make it 64bit. We make it
     * prefetchable since QEMU MSI-X emulation has no read side effects
     * and doing so makes mapping more flexible.
     */
    if (!vdev->bars[target_bar].size) {
        if (target_bar < (PCI_ROM_SLOT - 1) &&
            !vdev->bars[target_bar + 1].size) {
            vdev->bars[target_bar].mem64 = true;
            vdev->bars[target_bar].type = PCI_BASE_ADDRESS_MEM_TYPE_64;
        }
        vdev->bars[target_bar].type |= PCI_BASE_ADDRESS_MEM_PREFETCH;
        vdev->bars[target_bar].size = msix_sz;
        vdev->msix->table_offset = 0;
    } else {
        vdev->bars[target_bar].size = MAX(vdev->bars[target_bar].size * 2,
                                          msix_sz * 2);
        /*
         * Due to above size calc, MSI-X always starts halfway into the BAR,
         * which will always be a separate host page.
         */
        vdev->msix->table_offset = vdev->bars[target_bar].size / 2;
    }

    vdev->msix->table_bar = target_bar;
    vdev->msix->pba_bar = target_bar;
    /* Requires 8-byte alignment, but PCI_MSIX_ENTRY_SIZE guarantees that */
    vdev->msix->pba_offset = vdev->msix->table_offset +
                             (vdev->msix->entries * PCI_MSIX_ENTRY_SIZE);

    trace_vfio_msix_relo(vdev->vbasedev.name,
                         vdev->msix->table_bar, vdev->msix->table_offset);
2024-05-22 07:40:04 +03:00
    return true;
vfio/pci: Allow relocating MSI-X MMIO
Recently proposed vfio-pci kernel changes (v4.16) remove the
restriction preventing userspace from mmap'ing PCI BARs in areas
overlapping the MSI-X vector table. This change is primarily intended
to benefit host platforms which make use of system page sizes larger
than the PCI spec recommendation for alignment of MSI-X data
structures (i.e. not x86_64). In the case of POWER systems, the SPAPR
spec requires the VM to program MSI-X using hypercalls, rendering the
MSI-X vector table unused in the VM view of the device. However,
ARM64 platforms also support 64KB pages and rely on QEMU emulation of
MSI-X. Regardless of the kernel driver allowing mmaps overlapping
the MSI-X vector table, emulation of the MSI-X vector table also
prevents direct mapping of device MMIO spaces overlapping this page.
Thanks to the fact that PCI devices have a standard self discovery
mechanism, we can try to resolve this by relocating the MSI-X data
structures, either by creating a new PCI BAR or extending an existing
BAR and updating the MSI-X capability for the new location. There's
even a very slim chance that this could benefit devices which do not
adhere to the PCI spec alignment guidelines on x86_64 systems.
This new x-msix-relocation option accepts the following choices:
off: Disable MSI-X relocation, use native device config (default)
auto: Use a known good combination for the platform/device (none yet)
bar0..bar5: Specify the target BAR for MSI-X data structures
If compatible, the target BAR will either be created or extended and
the new portion will be used for MSI-X emulation.
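The layout arithmetic behind "created or extended" can be sketched as follows. This is a minimal illustration, not the QEMU implementation: the struct, the function name relocate_msix(), and the exact sizing rule (page-align the MSI-X area, round to a power of two, then double an existing BAR) are assumptions made for the example. Only the "table starts halfway into the BAR" and "PBA follows the table" placement is taken from the driver code.

```c
#include <assert.h>
#include <stdint.h>

#define MSIX_ENTRY_SIZE 16u /* bytes per MSI-X vector table entry */

/* Hypothetical container for the relocated layout; not a QEMU type. */
struct msix_layout {
    uint64_t bar_size;     /* size of the target BAR after relocation */
    uint64_t table_offset; /* vector table offset within the BAR */
    uint64_t pba_offset;   /* pending bit array offset within the BAR */
};

/* Round v up to the next power of two (v must be non-zero). */
static uint64_t pow2_ceil(uint64_t v)
{
    uint64_t p = 1;
    while (p < v) {
        p <<= 1;
    }
    return p;
}

/*
 * Compute where MSI-X would live if moved into a BAR of old_bar_size
 * bytes (0 means the BAR slot is empty and gets created).
 */
static struct msix_layout relocate_msix(uint64_t old_bar_size,
                                        uint32_t entries,
                                        uint64_t host_page_size)
{
    struct msix_layout l;
    uint64_t table_sz = (uint64_t)entries * MSIX_ENTRY_SIZE;
    uint64_t pba_sz = (((uint64_t)entries + 63) / 64) * 8; /* qword per 64 vectors */
    uint64_t msix_sz = table_sz + pba_sz;

    assert(host_page_size && (host_page_size & (host_page_size - 1)) == 0);

    /* Assumed sizing rule: whole host pages, then a power of two. */
    msix_sz = (msix_sz + host_page_size - 1) & ~(host_page_size - 1);
    msix_sz = pow2_ceil(msix_sz);

    if (old_bar_size == 0) {
        /* New BAR: MSI-X is the only content, table at offset zero. */
        l.bar_size = msix_sz;
        l.table_offset = 0;
    } else {
        /* Extended BAR: at least double, MSI-X takes the upper half. */
        uint64_t doubled = old_bar_size * 2;
        l.bar_size = doubled > msix_sz * 2 ? doubled : msix_sz * 2;
        l.table_offset = l.bar_size / 2;
    }
    l.pba_offset = l.table_offset + table_sz;
    return l;
}
```

Under these assumptions, extending a 64KB BAR for a 16-vector device on a 64KB-page host gives a 128KB BAR with the vector table at offset 64KB, so the original 64KB stays on its own host page.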
The first obvious user question with this option is how to determine
whether a given platform and device might benefit from this option.
In most cases, the answer is that it won't, especially on x86_64.
Devices often dedicate an entire BAR to MSI-X and therefore no
performance sensitive registers overlap the MSI-X area. Take for
example:
# lspci -vvvs 0a:00.0
0a:00.0 Ethernet controller: Intel Corporation I350 Gigabit Network Connection
...
Region 0: Memory at db680000 (32-bit, non-prefetchable) [size=512K]
Region 3: Memory at db7f8000 (32-bit, non-prefetchable) [size=16K]
...
Capabilities: [70] MSI-X: Enable+ Count=10 Masked-
Vector table: BAR=3 offset=00000000
PBA: BAR=3 offset=00002000
This device uses the 16K bar3 for MSI-X with the vector table at
offset zero and the pending bits array at offset 8K, fully honoring
the PCI spec alignment guidance. The data sheet specifically refers
to this as an MSI-X BAR. This device would not see a benefit from
MSI-X relocation regardless of the platform, regardless of the page
size.
However, here's another example:
# lspci -vvvs 02:00.0
02:00.0 Serial Attached SCSI controller: xxxxxxxx
...
Region 0: I/O ports at c000 [size=256]
Region 1: Memory at ef640000 (64-bit, non-prefetchable) [size=64K]
Region 3: Memory at ef600000 (64-bit, non-prefetchable) [size=256K]
...
Capabilities: [c0] MSI-X: Enable+ Count=16 Masked-
Vector table: BAR=1 offset=0000e000
PBA: BAR=1 offset=0000f000
Here the MSI-X data structures are placed on separate 4K pages at the
end of a 64KB BAR. If our host page size is 4K, we're likely fine,
but at 64KB page size, MSI-X emulation at that location prevents the
entire BAR from being directly mapped into the VM address space.
Overlapping performance sensitive registers then starts to be a very
likely scenario on such a platform. At this point, the user could
enable tracing on vfio_region_read and vfio_region_write to determine
more conclusively if device accesses are being trapped through QEMU.
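The page size effect described above can be checked with a little arithmetic. The following is a rough sketch only: the function name directly_mappable_bytes() is hypothetical, and it assumes that every host page touched by the vector table is trapped while the rest of the BAR can be mapped directly. It ignores the PBA and any other quirks.

```c
#include <assert.h>
#include <stdint.h>

#define MSIX_ENTRY_SIZE 16u /* bytes per MSI-X vector table entry */

/*
 * Return how many bytes of a BAR remain directly mappable once every
 * host page touched by the MSI-X vector table is trapped for emulation.
 * Assumption for this sketch: only the vector table forces trapping.
 */
static uint64_t directly_mappable_bytes(uint64_t bar_size,
                                        uint64_t table_offset,
                                        uint32_t entries,
                                        uint64_t page_size)
{
    uint64_t table_end = table_offset + (uint64_t)entries * MSIX_ENTRY_SIZE;
    uint64_t start, end;

    assert(page_size && (page_size & (page_size - 1)) == 0);

    /* Round the table out to host page boundaries. */
    start = table_offset & ~(page_size - 1);
    end = (table_end + page_size - 1) & ~(page_size - 1);
    if (end > bar_size) {
        end = bar_size; /* a BAR smaller than a page is trapped entirely */
    }
    if (start >= end) {
        return bar_size; /* table lies outside the BAR, nothing trapped */
    }
    return bar_size - (end - start);
}
```

For the SAS controller above (64KB bar1, table at 0xe000, 16 vectors) this gives 60KB of directly mappable space with 4K pages and zero with 64KB pages, which matches the behavior described in the text.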
Upon finding a device and platform in need of MSI-X relocation, the
next problem is how to choose a target PCI BAR to host the MSI-X data
structures. A few key rules to keep in mind for this selection
include:
* There are only 6 BAR slots, bar0..bar5
* 64-bit BARs occupy two BAR slots, 'lspci -vvv' lists the first slot
* PCI BARs are always a power of 2 in size, extending == doubling
* The maximum size of a 32-bit BAR is 2GB
* MSI-X data structures must reside in an MMIO BAR
Using these rules, we can evaluate each BAR of the second example
device above as follows:
bar0: I/O port BAR, incompatible with MSI-X tables
bar1: BAR could be extended, incurring another 64KB of MMIO
bar2: Unavailable, bar1 is 64-bit, this register is used by bar1
bar3: BAR could be extended, incurring another 256KB of MMIO
bar4: Unavailable, bar3 is 64-bit, this register is used by bar3
bar5: Available, empty BAR, minimum additional MMIO
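The per-BAR evaluation above can be written down as a small ranking helper. This is an illustrative sketch, not code from the driver: the enum, struct, and function names are hypothetical, and the cost model (extra MMIO equals the size added by doubling, or the MSI-X area size for an empty slot) is an assumption derived from the rules listed earlier.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical classification of the six BAR slots. */
enum bar_kind {
    BAR_EMPTY,       /* unimplemented slot, can host a new BAR */
    BAR_IO,          /* I/O port BAR, cannot hold MSI-X structures */
    BAR_MEM32,       /* 32-bit memory BAR */
    BAR_MEM64,       /* first slot of a 64-bit memory BAR */
    BAR_MEM64_UPPER  /* second slot consumed by a 64-bit BAR */
};

struct bar_desc {
    enum bar_kind kind;
    uint64_t size; /* current size in bytes, 0 for empty slots */
};

/*
 * Additional MMIO needed to host an MSI-X area of msix_sz bytes in slot
 * idx, or -1 if the slot cannot be used.
 */
static int64_t relocation_cost(const struct bar_desc *bars, int idx,
                               uint64_t msix_sz)
{
    const struct bar_desc *b;
    uint64_t new_size;

    assert(idx >= 0 && idx < 6);
    b = &bars[idx];

    switch (b->kind) {
    case BAR_IO:
    case BAR_MEM64_UPPER:
        return -1;
    case BAR_EMPTY:
        return (int64_t)msix_sz;
    case BAR_MEM32:
    case BAR_MEM64:
        /* Extending means at least doubling the BAR. */
        new_size = b->size * 2 > msix_sz * 2 ? b->size * 2 : msix_sz * 2;
        if (b->kind == BAR_MEM32 && new_size > 0x80000000ULL) {
            return -1; /* a 32-bit BAR cannot exceed 2GB */
        }
        return (int64_t)(new_size - b->size);
    }
    return -1;
}

/* Return the usable slot with the least additional MMIO, or -1. */
static int best_bar(const struct bar_desc bars[6], uint64_t msix_sz)
{
    int best = -1;
    int64_t best_cost = -1;

    for (int i = 0; i < 6; i++) {
        int64_t cost = relocation_cost(bars, i, msix_sz);
        if (cost >= 0 && (best < 0 || cost < best_cost)) {
            best = i;
            best_cost = cost;
        }
    }
    return best;
}
```

With the second example device and a 4KB MSI-X area, the costs come out as 4KB for bar5, 64KB for bar1, and 256KB for bar3. A ranking like this only minimizes MMIO; it says nothing about whether the guest driver tolerates the changed BAR layout.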
A secondary optimization we might wish to make in relocating MSI-X
is to minimize the additional MMIO required for the device, therefore
we might test the available choices in order of preference as bar5,
bar1, and finally bar3. The original proposal for this feature
included an 'auto' option which would choose bar5 in this case, but
various drivers have been found that make assumptions about the
properties of the "first" BAR or the size of BARs such that there
appears to be no foolproof automatic selection available, requiring
known good combinations to be sourced from users. This patch is
pre-enabled for an 'auto' selection making use of a validated lookup
table, but no entries are yet identified.
Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2018-02-06 21:08:26 +03:00
|
|
|
}
|
|
|
|
|
2012-09-26 21:19:32 +04:00
|
|
|
/*
|
|
|
|
* We don't have any control over how pci_add_capability() inserts
|
|
|
|
* capabilities into the chain. In order to set up MSI-X we need a
|
|
|
|
* MemoryRegion for the BAR. In order to set up the BAR and not
|
|
|
|
* attempt to mmap the MSI-X table area, which VFIO won't allow, we
|
|
|
|
* need to first look for where the MSI-X table lives. So we
|
|
|
|
* unfortunately split MSI-X setup across two functions.
|
|
|
|
*/
|
2024-05-22 07:40:04 +03:00
|
|
|
static bool vfio_msix_early_setup(VFIOPCIDevice *vdev, Error **errp)
|
2012-09-26 21:19:32 +04:00
|
|
|
{
|
|
|
|
uint8_t pos;
|
|
|
|
uint16_t ctrl;
|
|
|
|
uint32_t table, pba;
|
2023-09-26 05:14:04 +03:00
|
|
|
int ret, fd = vdev->vbasedev.fd;
|
|
|
|
struct vfio_irq_info irq_info = { .argsz = sizeof(irq_info),
|
|
|
|
.index = VFIO_PCI_MSIX_IRQ_INDEX };
|
2015-09-23 22:04:43 +03:00
|
|
|
VFIOMSIXInfo *msix;
|
2012-09-26 21:19:32 +04:00
|
|
|
|
|
|
|
pos = pci_find_capability(&vdev->pdev, PCI_CAP_ID_MSIX);
|
|
|
|
if (!pos) {
|
2024-05-22 07:40:04 +03:00
|
|
|
return true;
|
2012-09-26 21:19:32 +04:00
|
|
|
}
|
|
|
|
|
2014-12-20 01:24:31 +03:00
|
|
|
if (pread(fd, &ctrl, sizeof(ctrl),
|
2016-02-19 19:42:32 +03:00
|
|
|
vdev->config_offset + pos + PCI_MSIX_FLAGS) != sizeof(ctrl)) {
|
2016-10-17 19:57:58 +03:00
|
|
|
error_setg_errno(errp, errno, "failed to read PCI MSIX FLAGS");
|
2024-05-22 07:40:04 +03:00
|
|
|
return false;
|
2012-09-26 21:19:32 +04:00
|
|
|
}
|
|
|
|
|
2014-12-20 01:24:31 +03:00
|
|
|
if (pread(fd, &table, sizeof(table),
|
2012-09-26 21:19:32 +04:00
|
|
|
vdev->config_offset + pos + PCI_MSIX_TABLE) != sizeof(table)) {
|
2016-10-17 19:57:58 +03:00
|
|
|
error_setg_errno(errp, errno, "failed to read PCI MSIX TABLE");
|
2024-05-22 07:40:04 +03:00
|
|
|
return false;
|
2012-09-26 21:19:32 +04:00
|
|
|
}
|
|
|
|
|
2014-12-20 01:24:31 +03:00
|
|
|
if (pread(fd, &pba, sizeof(pba),
|
2012-09-26 21:19:32 +04:00
|
|
|
vdev->config_offset + pos + PCI_MSIX_PBA) != sizeof(pba)) {
|
2016-10-17 19:57:58 +03:00
|
|
|
error_setg_errno(errp, errno, "failed to read PCI MSIX PBA");
|
2024-05-22 07:40:04 +03:00
|
|
|
return false;
|
2012-09-26 21:19:32 +04:00
|
|
|
}
|
|
|
|
|
|
|
|
ctrl = le16_to_cpu(ctrl);
|
|
|
|
table = le32_to_cpu(table);
|
|
|
|
pba = le32_to_cpu(pba);
|
|
|
|
|
2015-09-23 22:04:43 +03:00
|
|
|
msix = g_malloc0(sizeof(*msix));
|
|
|
|
msix->table_bar = table & PCI_MSIX_FLAGS_BIRMASK;
|
|
|
|
msix->table_offset = table & ~PCI_MSIX_FLAGS_BIRMASK;
|
|
|
|
msix->pba_bar = pba & PCI_MSIX_FLAGS_BIRMASK;
|
|
|
|
msix->pba_offset = pba & ~PCI_MSIX_FLAGS_BIRMASK;
|
|
|
|
msix->entries = (ctrl & PCI_MSIX_FLAGS_QSIZE) + 1;
|
2012-09-26 21:19:32 +04:00
|
|
|
|
2023-09-26 05:14:04 +03:00
|
|
|
ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_IRQ_INFO, &irq_info);
|
|
|
|
if (ret < 0) {
|
|
|
|
error_setg_errno(errp, errno, "failed to get MSI-X irq info");
|
|
|
|
g_free(msix);
|
2024-05-22 07:40:04 +03:00
|
|
|
return false;
|
2023-09-26 05:14:04 +03:00
|
|
|
}
|
|
|
|
|
|
|
|
msix->noresize = !!(irq_info.flags & VFIO_IRQ_INFO_NORESIZE);
|
|
|
|
|
2015-07-06 21:15:15 +03:00
|
|
|
/*
|
|
|
|
* Test the size of the pba_offset variable and catch if it extends outside
|
|
|
|
* of the specified BAR. If it is the case, we need to apply a hardware
|
|
|
|
* specific quirk if the device is known or we have a broken configuration.
|
|
|
|
*/
|
2015-09-23 22:04:43 +03:00
|
|
|
if (msix->pba_offset >= vdev->bars[msix->pba_bar].region.size) {
|
2015-07-06 21:15:15 +03:00
|
|
|
/*
|
|
|
|
* Chelsio T5 Virtual Function devices are encoded as 0x58xx for T5
|
|
|
|
* adapters. The T5 hardware returns an incorrect value of 0x8000 for
|
|
|
|
* the VF PBA offset while the BAR itself is only 8k. The correct value
|
|
|
|
* is 0x1000, so we hard code that here.
|
|
|
|
*/
|
2015-09-23 22:04:49 +03:00
|
|
|
if (vdev->vendor_id == PCI_VENDOR_ID_CHELSIO &&
|
|
|
|
(vdev->device_id & 0xff00) == 0x5800) {
|
2015-09-23 22:04:43 +03:00
|
|
|
msix->pba_offset = 0x1000;
|
2021-07-13 12:37:43 +03:00
|
|
|
/*
|
|
|
|
* BAIDU KUNLUN Virtual Function devices for KUNLUN AI processor
|
|
|
|
* return an incorrect value of 0x460000 for the VF PBA offset while
|
|
|
|
* the BAR itself is only 0x10000. The correct value is 0xb400.
|
|
|
|
*/
|
|
|
|
} else if (vfio_pci_is(vdev, PCI_VENDOR_ID_BAIDU,
|
|
|
|
PCI_DEVICE_ID_KUNLUN_VF)) {
|
|
|
|
msix->pba_offset = 0xb400;
|
2019-06-13 18:57:36 +03:00
|
|
|
} else if (vdev->msix_relo == OFF_AUTOPCIBAR_OFF) {
|
2016-10-17 19:57:58 +03:00
|
|
|
error_setg(errp, "hardware reports invalid configuration, "
|
|
|
|
"MSIX PBA outside of specified BAR");
|
2015-09-23 22:04:43 +03:00
|
|
|
g_free(msix);
|
2024-05-22 07:40:04 +03:00
|
|
|
return false;
|
2015-07-06 21:15:15 +03:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2015-09-23 22:04:43 +03:00
|
|
|
trace_vfio_msix_early_setup(vdev->vbasedev.name, pos, msix->table_bar,
|
2023-09-26 05:14:04 +03:00
|
|
|
msix->table_offset, msix->entries,
|
|
|
|
msix->noresize);
|
2015-09-23 22:04:43 +03:00
|
|
|
vdev->msix = msix;
|
2012-09-26 21:19:32 +04:00
|
|
|
|
2016-03-10 19:39:07 +03:00
|
|
|
vfio_pci_fixup_msix_region(vdev);
|
vfio/pci: Allow relocating MSI-X MMIO
Recently proposed vfio-pci kernel changes (v4.16) remove the
restriction preventing userspace from mmap'ing PCI BARs in areas
overlapping the MSI-X vector table. This change is primarily intended
to benefit host platforms which make use of system page sizes larger
than the PCI spec recommendation for alignment of MSI-X data
structures (ie. not x86_64). In the case of POWER systems, the SPAPR
spec requires the VM to program MSI-X using hypercalls, rendering the
MSI-X vector table unused in the VM view of the device. However,
ARM64 platforms also support 64KB pages and rely on QEMU emulation of
MSI-X. Regardless of the kernel driver allowing mmaps overlapping
the MSI-X vector table, emulation of the MSI-X vector table also
prevents direct mapping of device MMIO spaces overlapping this page.
Thanks to the fact that PCI devices have a standard self discovery
mechanism, we can try to resolve this by relocating the MSI-X data
structures, either by creating a new PCI BAR or extending an existing
BAR and updating the MSI-X capability for the new location. There's
even a very slim chance that this could benefit devices which do not
adhere to the PCI spec alignment guidelines on x86_64 systems.
This new x-msix-relocation option accepts the following choices:
off: Disable MSI-X relocation, use native device config (default)
auto: Use a known good combination for the platform/device (none yet)
bar0..bar5: Specify the target BAR for MSI-X data structures
If compatible, the target BAR will either be created or extended and
the new portion will be used for MSI-X emulation.
The first obvious user question with this option is how to determine
whether a given platform and device might benefit from this option.
In most cases, the answer is that it won't, especially on x86_64.
Devices often dedicate an entire BAR to MSI-X and therefore no
performance sensitive registers overlap the MSI-X area. Take for
example:
# lspci -vvvs 0a:00.0
0a:00.0 Ethernet controller: Intel Corporation I350 Gigabit Network Connection
...
Region 0: Memory at db680000 (32-bit, non-prefetchable) [size=512K]
Region 3: Memory at db7f8000 (32-bit, non-prefetchable) [size=16K]
...
Capabilities: [70] MSI-X: Enable+ Count=10 Masked-
Vector table: BAR=3 offset=00000000
PBA: BAR=3 offset=00002000
This device uses the 16K bar3 for MSI-X with the vector table at
offset zero and the pending bits arrary at offset 8K, fully honoring
the PCI spec alignment guidance. The data sheet specifically refers
to this as an MSI-X BAR. This device would not see a benefit from
MSI-X relocation regardless of the platform, regardless of the page
size.
However, here's another example:
# lspci -vvvs 02:00.0
02:00.0 Serial Attached SCSI controller: xxxxxxxx
...
Region 0: I/O ports at c000 [size=256]
Region 1: Memory at ef640000 (64-bit, non-prefetchable) [size=64K]
Region 3: Memory at ef600000 (64-bit, non-prefetchable) [size=256K]
...
Capabilities: [c0] MSI-X: Enable+ Count=16 Masked-
Vector table: BAR=1 offset=0000e000
PBA: BAR=1 offset=0000f000
Here the MSI-X data structures are placed on separate 4K pages at the
end of a 64KB BAR. If our host page size is 4K, we're likely fine,
but at 64KB page size, MSI-X emulation at that location prevents the
entire BAR from being directly mapped into the VM address space.
Overlapping performance sensitive registers then starts to be a very
likely scenario on such a platform. At this point, the user could
enable tracing on vfio_region_read and vfio_region_write to determine
more conclusively if device accesses are being trapped through QEMU.
Upon finding a device and platform in need of MSI-X relocation, the
next problem is how to choose target PCI BAR to host the MSI-X data
structures. A few key rules to keep in mind for this selection
include:
* There are only 6 BAR slots, bar0..bar5
* 64-bit BARs occupy two BAR slots, 'lspci -vvv' lists the first slot
* PCI BARs are always a power of 2 in size, extending == doubling
* The maximum size of a 32-bit BAR is 2GB
* MSI-X data structures must reside in an MMIO BAR
Using these rules, we can evaluate each BAR of the second example
device above as follows:
bar0: I/O port BAR, incompatible with MSI-X tables
bar1: BAR could be extended, incurring another 64KB of MMIO
bar2: Unavailable, bar1 is 64-bit, this register is used by bar1
bar3: BAR could be extended, incurring another 256KB of MMIO
bar4: Unavailable, bar3 is 64-bit, this register is used by bar3
bar5: Available, empty BAR, minimum additional MMIO
A secondary optimization we might wish to make in relocating MSI-X
is to minimize the additional MMIO required for the device, therefore
we might test the available choices in order of preference as bar5,
bar1, and finally bar3. The original proposal for this feature
included an 'auto' option which would choose bar5 in this case, but
various drivers have been found that make assumptions about the
properties of the "first" BAR or the size of BARs such that there
appears to be no foolproof automatic selection available, requiring
known good combinations to be sourced from users. This patch is
pre-enabled for an 'auto' selection making use of a validated lookup
table, but no entries are yet identified.
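The rules and the per-BAR evaluation above can be sketched as a small ranking routine. This is an illustration only, assuming hypothetical names (struct bar_slot, relocation_cost, pick_msix_bar) that are not part of QEMU, and it ranks purely by additional MMIO. As noted above, driver assumptions mean such a ranking is not a foolproof automatic selection.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_BARS 6

/* Hypothetical description of one BAR slot; not a QEMU structure. */
struct bar_slot {
    uint64_t size;  /* 0 if unimplemented or the upper half of a 64-bit BAR */
    bool ioport;    /* I/O port BAR */
    bool mem64;     /* 64-bit MMIO BAR, which also consumes slot nr + 1 */
};

/*
 * Additional MMIO needed to host the MSI-X structures in slot nr, or -1 if
 * the slot cannot be used. msix_size is the power-of-2 size assumed for the
 * relocated MSI-X structures.
 */
static int64_t relocation_cost(const struct bar_slot *bars, int nr,
                               uint64_t msix_size)
{
    const struct bar_slot *bar = &bars[nr];
    uint64_t new_size;

    if (nr > 0 && bars[nr - 1].size && bars[nr - 1].mem64) {
        return -1;                  /* register is used by the previous BAR */
    }
    if (bar->ioport) {
        return -1;                  /* MSI-X must reside in an MMIO BAR */
    }
    if (!bar->size) {
        return (int64_t)msix_size;  /* empty slot: a new, minimal BAR */
    }

    /* Extending means doubling until the added space fits the structures. */
    new_size = bar->size;
    do {
        new_size *= 2;
    } while (new_size - bar->size < msix_size);

    if (!bar->mem64 && new_size > (1ULL << 31)) {
        return -1;                  /* a 32-bit BAR cannot exceed 2GB */
    }
    return (int64_t)(new_size - bar->size);
}

/* Return the usable slot with the least additional MMIO, or -1 if none. */
static int pick_msix_bar(const struct bar_slot *bars, uint64_t msix_size)
{
    int best = -1;
    int64_t best_cost = -1;

    for (int nr = 0; nr < NUM_BARS; nr++) {
        int64_t cost = relocation_cost(bars, nr, msix_size);

        if (cost >= 0 && (best < 0 || cost < best_cost)) {
            best = nr;
            best_cost = cost;
        }
    }
    return best;
}
```

Filled in with the second example device (bar0 as a 256-byte I/O BAR, bar1 as a 64KB 64-bit BAR, bar3 as a 256KB 64-bit BAR, the rest empty) and an assumed 4K MSI-X size, the costs come out as 4K for bar5, 64KB for bar1 and 256KB for bar3, matching the preference order described above.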
Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2018-02-06 21:08:26 +03:00
|
|
|
|
2024-05-22 07:40:04 +03:00
|
|
|
return vfio_pci_relocate_msix(vdev, errp);
|
2012-09-26 21:19:32 +04:00
|
|
|
}
|
|
|
|
|
2024-05-22 07:40:08 +03:00
|
|
|
static bool vfio_msix_setup(VFIOPCIDevice *vdev, int pos, Error **errp)
|
2012-09-26 21:19:32 +04:00
|
|
|
{
|
|
|
|
int ret;
|
2017-01-17 09:18:48 +03:00
|
|
|
Error *err = NULL;
|
2012-09-26 21:19:32 +04:00
|
|
|
|
2022-03-15 17:41:56 +03:00
|
|
|
vdev->msix->pending = g_new0(unsigned long,
|
|
|
|
BITS_TO_LONGS(vdev->msix->entries));
|
2012-09-26 21:19:32 +04:00
|
|
|
ret = msix_init(&vdev->pdev, vdev->msix->entries,
|
2018-02-06 21:08:25 +03:00
|
|
|
vdev->bars[vdev->msix->table_bar].mr,
|
2012-09-26 21:19:32 +04:00
|
|
|
vdev->msix->table_bar, vdev->msix->table_offset,
|
2018-02-06 21:08:25 +03:00
|
|
|
vdev->bars[vdev->msix->pba_bar].mr,
|
2017-01-17 09:18:48 +03:00
|
|
|
vdev->msix->pba_bar, vdev->msix->pba_offset, pos,
|
|
|
|
&err);
|
2012-09-26 21:19:32 +04:00
|
|
|
if (ret < 0) {
|
2012-10-08 18:45:30 +04:00
|
|
|
if (ret == -ENOTSUP) {
|
2018-10-17 11:26:29 +03:00
|
|
|
warn_report_err(err);
|
2024-05-22 07:40:08 +03:00
|
|
|
return true;
|
2012-10-08 18:45:30 +04:00
|
|
|
}
|
2017-01-17 09:18:48 +03:00
|
|
|
|
|
|
|
error_propagate(errp, err);
|
2024-05-22 07:40:08 +03:00
|
|
|
return false;
|
2012-09-26 21:19:32 +04:00
|
|
|
}
|
|
|
|
|
2016-01-19 21:33:42 +03:00
|
|
|
/*
|
|
|
|
* The PCI spec suggests that devices provide additional alignment for
|
|
|
|
* MSI-X structures and avoid overlapping non-MSI-X related registers.
|
|
|
|
* For an assigned device, this hopefully means that emulation of MSI-X
|
|
|
|
* structures does not affect the performance of the device. If devices
|
|
|
|
* fail to provide that alignment, a significant performance penalty may
|
|
|
|
* result, for instance Mellanox MT27500 VFs:
|
|
|
|
* http://www.spinics.net/lists/kvm/msg125881.html
|
|
|
|
*
|
|
|
|
* The PBA is simply not that important for such a serious regression and
|
|
|
|
* most drivers do not appear to look at it. The solution for this is to
|
|
|
|
* disable the PBA MemoryRegion unless it's being used. We disable it
|
|
|
|
* here and only enable it if a masked vector fires through QEMU. As the
|
|
|
|
* vector-use notifier is called, which occurs on unmask, we test whether
|
|
|
|
* PBA emulation is needed and again disable if not.
|
|
|
|
*/
|
|
|
|
memory_region_set_enabled(&vdev->pdev.msix_pba_mmio, false);
|
|
|
|
|
2018-03-13 20:17:31 +03:00
|
|
|
/*
|
|
|
|
* The emulated machine may provide a paravirt interface for MSIX setup
|
|
|
|
* so it is not strictly necessary to emulate MSIX here. This becomes
|
|
|
|
* helpful when frequently accessed MMIO registers are located in
|
|
|
|
* subpages adjacent to the MSIX table but the MSIX data containing page
|
|
|
|
* cannot be mapped because of a host page size bigger than the MSIX table
|
|
|
|
* alignment.
|
|
|
|
*/
|
|
|
|
if (object_property_get_bool(OBJECT(qdev_get_machine()),
|
|
|
|
"vfio-no-msix-emulation", NULL)) {
|
|
|
|
memory_region_set_enabled(&vdev->pdev.msix_table_mmio, false);
|
|
|
|
}
|
|
|
|
|
2024-05-22 07:40:08 +03:00
|
|
|
return true;
|
2012-09-26 21:19:32 +04:00
|
|
|
}
|
|
|
|
|
2014-12-20 01:24:15 +03:00
|
|
|
static void vfio_teardown_msi(VFIOPCIDevice *vdev)
|
2012-09-26 21:19:32 +04:00
|
|
|
{
|
|
|
|
msi_uninit(&vdev->pdev);
|
|
|
|
|
|
|
|
if (vdev->msix) {
|
2014-12-22 19:54:37 +03:00
|
|
|
msix_uninit(&vdev->pdev,
|
2018-02-06 21:08:25 +03:00
|
|
|
vdev->bars[vdev->msix->table_bar].mr,
|
|
|
|
vdev->bars[vdev->msix->pba_bar].mr);
|
2016-01-19 21:33:42 +03:00
|
|
|
g_free(vdev->msix->pending);
|
2012-09-26 21:19:32 +04:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Resource setup
|
|
|
|
*/
|
2014-12-20 01:24:15 +03:00
|
|
|
static void vfio_mmap_set_enabled(VFIOPCIDevice *vdev, bool enabled)
|
2012-09-26 21:19:32 +04:00
|
|
|
{
|
|
|
|
int i;
|
|
|
|
|
|
|
|
for (i = 0; i < PCI_ROM_SLOT; i++) {
|
2016-03-10 19:39:07 +03:00
|
|
|
vfio_region_mmaps_set_enabled(&vdev->bars[i].region, enabled);
|
2012-09-26 21:19:32 +04:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2018-02-06 21:08:25 +03:00
|
|
|
static void vfio_bar_prepare(VFIOPCIDevice *vdev, int nr)
|
2012-09-26 21:19:32 +04:00
|
|
|
{
|
|
|
|
VFIOBAR *bar = &vdev->bars[nr];
|
|
|
|
|
|
|
|
uint32_t pci_bar;
|
|
|
|
int ret;
|
|
|
|
|
|
|
|
/* Skip both unimplemented BARs and the upper half of 64bit BARS. */
|
2016-03-10 19:39:08 +03:00
|
|
|
if (!bar->region.size) {
|
2012-09-26 21:19:32 +04:00
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* Determine what type of BAR this is for registration */
|
2014-12-20 01:24:31 +03:00
|
|
|
ret = pread(vdev->vbasedev.fd, &pci_bar, sizeof(pci_bar),
|
2012-09-26 21:19:32 +04:00
|
|
|
vdev->config_offset + PCI_BASE_ADDRESS_0 + (4 * nr));
|
|
|
|
if (ret != sizeof(pci_bar)) {
|
error: Strip trailing '\n' from error string arguments (again)
Commit 6daf194d and be62a2eb got rid of a bunch, but they keep coming
back. Tracked down with this Coccinelle semantic patch:
@r@
expression err, eno, cls, fmt;
position p;
@@
(
error_report(fmt, ...)@p
|
error_set(err, cls, fmt, ...)@p
|
error_set_errno(err, eno, cls, fmt, ...)@p
|
error_setg(err, fmt, ...)@p
|
error_setg_errno(err, eno, fmt, ...)@p
)
@script:python@
fmt << r.fmt;
p << r.p;
@@
if "\\n" in str(fmt):
print "%s:%s:%s:%s" % (p[0].file, p[0].line, p[0].column, fmt)
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-id: 1360354939-10994-4-git-send-email-armbru@redhat.com
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2013-02-09 00:22:16 +04:00
|
|
|
error_report("vfio: Failed to read BAR %d (%m)", nr);
|
2012-09-26 21:19:32 +04:00
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
|
|
|
pci_bar = le32_to_cpu(pci_bar);
|
2013-07-16 01:48:11 +04:00
|
|
|
bar->ioport = (pci_bar & PCI_BASE_ADDRESS_SPACE_IO);
|
|
|
|
bar->mem64 = bar->ioport ? 0 : (pci_bar & PCI_BASE_ADDRESS_MEM_TYPE_64);
|
2018-02-06 21:08:25 +03:00
|
|
|
bar->type = pci_bar & (bar->ioport ? ~PCI_BASE_ADDRESS_IO_MASK :
|
|
|
|
~PCI_BASE_ADDRESS_MEM_MASK);
|
|
|
|
bar->size = bar->region.size;
|
|
|
|
}
|
|
|
|
|
|
|
|
static void vfio_bars_prepare(VFIOPCIDevice *vdev)
|
|
|
|
{
|
|
|
|
int i;
|
|
|
|
|
|
|
|
for (i = 0; i < PCI_ROM_SLOT; i++) {
|
|
|
|
vfio_bar_prepare(vdev, i);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
static void vfio_bar_register(VFIOPCIDevice *vdev, int nr)
|
|
|
|
{
|
|
|
|
VFIOBAR *bar = &vdev->bars[nr];
|
|
|
|
char *name;
|
2012-09-26 21:19:32 +04:00
|
|
|
|
2018-02-06 21:08:25 +03:00
|
|
|
if (!bar->size) {
|
|
|
|
return;
|
2012-09-26 21:19:32 +04:00
|
|
|
}
|
2013-04-01 23:34:40 +04:00
|
|
|
|
2018-02-06 21:08:25 +03:00
|
|
|
bar->mr = g_new0(MemoryRegion, 1);
|
|
|
|
name = g_strdup_printf("%s base BAR %d", vdev->vbasedev.name, nr);
|
|
|
|
memory_region_init_io(bar->mr, OBJECT(vdev), NULL, NULL, name, bar->size);
|
|
|
|
g_free(name);
|
|
|
|
|
|
|
|
if (bar->region.size) {
|
|
|
|
memory_region_add_subregion(bar->mr, 0, bar->region.mem);
|
|
|
|
|
|
|
|
if (vfio_region_mmap(&bar->region)) {
|
|
|
|
error_report("Failed to mmap %s BAR %d. Performance may be slow",
|
|
|
|
vdev->vbasedev.name, nr);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
pci_register_bar(&vdev->pdev, nr, bar->type, bar->mr);
|
2012-09-26 21:19:32 +04:00
|
|
|
}
|
|
|
|
|
2018-02-06 21:08:25 +03:00
|
|
|
static void vfio_bars_register(VFIOPCIDevice *vdev)
|
2012-09-26 21:19:32 +04:00
|
|
|
{
|
|
|
|
int i;
|
|
|
|
|
|
|
|
for (i = 0; i < PCI_ROM_SLOT; i++) {
|
2018-02-06 21:08:25 +03:00
|
|
|
vfio_bar_register(vdev, i);
|
2012-09-26 21:19:32 +04:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2016-03-10 19:39:08 +03:00
|
|
|
static void vfio_bars_exit(VFIOPCIDevice *vdev)
|
2012-09-26 21:19:32 +04:00
|
|
|
{
|
|
|
|
int i;
|
|
|
|
|
|
|
|
for (i = 0; i < PCI_ROM_SLOT; i++) {
|
2018-02-06 21:08:25 +03:00
|
|
|
VFIOBAR *bar = &vdev->bars[i];
|
|
|
|
|
2016-03-10 19:39:08 +03:00
|
|
|
vfio_bar_quirk_exit(vdev, i);
|
2018-02-06 21:08:25 +03:00
|
|
|
vfio_region_exit(&bar->region);
|
|
|
|
if (bar->region.size) {
|
|
|
|
memory_region_del_subregion(bar->mr, bar->region.mem);
|
|
|
|
}
|
2012-09-26 21:19:32 +04:00
|
|
|
}
|
2013-04-01 23:33:44 +04:00
|
|
|
|
2016-03-10 19:39:08 +03:00
|
|
|
if (vdev->vga) {
|
2013-04-01 23:33:44 +04:00
|
|
|
pci_unregister_vga(&vdev->pdev);
|
2016-03-10 19:39:08 +03:00
|
|
|
vfio_vga_quirk_exit(vdev);
|
2013-04-01 23:33:44 +04:00
|
|
|
}
|
2012-09-26 21:19:32 +04:00
|
|
|
}
|
|
|
|
|
2016-03-10 19:39:08 +03:00
|
|
|
static void vfio_bars_finalize(VFIOPCIDevice *vdev)
|
2015-02-10 20:25:44 +03:00
|
|
|
{
|
|
|
|
int i;
|
|
|
|
|
|
|
|
for (i = 0; i < PCI_ROM_SLOT; i++) {
|
2018-02-06 21:08:25 +03:00
|
|
|
VFIOBAR *bar = &vdev->bars[i];
|
|
|
|
|
2016-03-10 19:39:08 +03:00
|
|
|
vfio_bar_quirk_finalize(vdev, i);
|
2018-02-06 21:08:25 +03:00
|
|
|
vfio_region_finalize(&bar->region);
|
2023-07-04 16:39:27 +03:00
|
|
|
if (bar->mr) {
|
|
|
|
assert(bar->size);
|
2018-02-06 21:08:25 +03:00
|
|
|
object_unparent(OBJECT(bar->mr));
|
|
|
|
g_free(bar->mr);
|
2023-07-04 16:39:27 +03:00
|
|
|
bar->mr = NULL;
|
2018-02-06 21:08:25 +03:00
|
|
|
}
|
2015-02-10 20:25:44 +03:00
|
|
|
}
|
|
|
|
|
2016-03-10 19:39:08 +03:00
|
|
|
if (vdev->vga) {
|
|
|
|
vfio_vga_quirk_finalize(vdev);
|
|
|
|
for (i = 0; i < ARRAY_SIZE(vdev->vga->region); i++) {
|
|
|
|
object_unparent(OBJECT(&vdev->vga->region[i].mem));
|
|
|
|
}
|
|
|
|
g_free(vdev->vga);
|
2015-02-10 20:25:44 +03:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2012-09-26 21:19:32 +04:00
|
|
|
/*
|
|
|
|
* General setup
|
|
|
|
*/
|
|
|
|
static uint8_t vfio_std_cap_max_size(PCIDevice *pdev, uint8_t pos)
|
|
|
|
{
|
2016-02-19 19:42:28 +03:00
|
|
|
uint8_t tmp;
|
|
|
|
uint16_t next = PCI_CONFIG_SPACE_SIZE;
|
2012-09-26 21:19:32 +04:00
|
|
|
|
|
|
|
for (tmp = pdev->config[PCI_CAPABILITY_LIST]; tmp;
|
2016-02-19 19:42:29 +03:00
|
|
|
tmp = pdev->config[tmp + PCI_CAP_LIST_NEXT]) {
|
2012-09-26 21:19:32 +04:00
|
|
|
if (tmp > pos && tmp < next) {
|
|
|
|
next = tmp;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
return next - pos;
|
|
|
|
}
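The size calculation in vfio_std_cap_max_size() can be tried out on a plain byte array. The following is only a sketch under stated assumptions: the names `std_cap_max_size`, `CFG_SPACE_SIZE`, `CAP_LIST_PTR` and `CAP_NEXT` are hypothetical stand-ins for the QEMU/PCI definitions, and the buffer is assumed to hold a well-formed 256-byte conventional config space. The idea is that a capability can extend at most to the nearest capability that starts after it, or to the end of config space if there is none.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the PCI definitions used by the driver. */
#define CFG_SPACE_SIZE 256   /* conventional PCI config space size */
#define CAP_LIST_PTR   0x34  /* offset of the capabilities pointer */
#define CAP_NEXT       1     /* offset of the "next" pointer in a capability */

/*
 * Upper bound on the size of the capability at 'pos': the distance to the
 * closest capability that starts after it, or to the end of config space.
 * The list need not be sorted by offset, so every entry is visited.
 */
static uint8_t std_cap_max_size(const uint8_t *config, uint8_t pos)
{
    uint16_t next = CFG_SPACE_SIZE;
    uint8_t tmp;

    for (tmp = config[CAP_LIST_PTR]; tmp; tmp = config[tmp + CAP_NEXT]) {
        if (tmp > pos && tmp < next) {
            next = tmp;
        }
    }

    return (uint8_t)(next - pos);
}
```

Walking the whole list rather than following only the next pointer matters because the link order of capabilities is independent of their placement in config space.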
|
|
|
|
|
2016-06-30 22:00:23 +03:00
|
|
|
|
|
|
|
static uint16_t vfio_ext_cap_max_size(const uint8_t *config, uint16_t pos)
|
|
|
|
{
|
|
|
|
uint16_t tmp, next = PCIE_CONFIG_SPACE_SIZE;
|
|
|
|
|
|
|
|
for (tmp = PCI_CONFIG_SPACE_SIZE; tmp;
|
|
|
|
tmp = PCI_EXT_CAP_NEXT(pci_get_long(config + tmp))) {
|
|
|
|
if (tmp > pos && tmp < next) {
|
|
|
|
next = tmp;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
return next - pos;
|
|
|
|
}
|
|
|
|
|
2013-04-01 21:50:04 +04:00
|
|
|
static void vfio_set_word_bits(uint8_t *buf, uint16_t val, uint16_t mask)
|
|
|
|
{
|
|
|
|
pci_set_word(buf, (pci_get_word(buf) & ~mask) | val);
|
|
|
|
}
|
|
|
|
|
2014-12-20 01:24:15 +03:00
|
|
|
static void vfio_add_emulated_word(VFIOPCIDevice *vdev, int pos,
|
2013-04-01 21:50:04 +04:00
|
|
|
uint16_t val, uint16_t mask)
|
|
|
|
{
|
|
|
|
vfio_set_word_bits(vdev->pdev.config + pos, val, mask);
|
|
|
|
vfio_set_word_bits(vdev->pdev.wmask + pos, ~mask, mask);
|
|
|
|
vfio_set_word_bits(vdev->emulated_config_bits + pos, mask, mask);
|
|
|
|
}
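The three updates made by vfio_add_emulated_word() can be followed on plain buffers. This is a minimal sketch, not the driver code: `struct emu_dev`, `get_le16`, `set_le16`, `set_word_bits` and `add_emulated_word` are hypothetical names, and the little-endian helpers are assumed to behave like pci_get_word()/pci_set_word(). It shows that the masked bits take the emulated value, become read-only for the guest, and are recorded as emulated.

```c
#include <stdint.h>

/* Hypothetical device model: three parallel views of config space. */
struct emu_dev {
    uint8_t config[256];    /* values the guest reads */
    uint8_t wmask[256];     /* bits the guest may write */
    uint8_t emulated[256];  /* bits served by emulation, not the device */
};

/* Little-endian 16-bit accessors (assumed stand-ins for pci_get/set_word). */
static uint16_t get_le16(const uint8_t *buf)
{
    return (uint16_t)(buf[0] | (buf[1] << 8));
}

static void set_le16(uint8_t *buf, uint16_t v)
{
    buf[0] = (uint8_t)(v & 0xff);
    buf[1] = (uint8_t)(v >> 8);
}

/* Clear the bits in 'mask', then OR in 'val' (val is not masked here). */
static void set_word_bits(uint8_t *buf, uint16_t val, uint16_t mask)
{
    set_le16(buf, (uint16_t)((get_le16(buf) & ~mask) | val));
}

static void add_emulated_word(struct emu_dev *d, int pos,
                              uint16_t val, uint16_t mask)
{
    set_word_bits(d->config + pos, val, mask);              /* new value */
    set_word_bits(d->wmask + pos, (uint16_t)~mask, mask);   /* read-only */
    set_word_bits(d->emulated + pos, mask, mask);           /* mark emulated */
}
```

Note that the write-mask update ORs in `~mask`, so bits outside the mask end up writable in this sketch regardless of their previous state.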
|
|
|
|
|
|
|
|
static void vfio_set_long_bits(uint8_t *buf, uint32_t val, uint32_t mask)
|
|
|
|
{
|
|
|
|
pci_set_long(buf, (pci_get_long(buf) & ~mask) | val);
|
|
|
|
}
|
|
|
|
|
2014-12-20 01:24:15 +03:00
|
|
|
static void vfio_add_emulated_long(VFIOPCIDevice *vdev, int pos,
|
2013-04-01 21:50:04 +04:00
|
|
|
uint32_t val, uint32_t mask)
|
|
|
|
{
|
|
|
|
vfio_set_long_bits(vdev->pdev.config + pos, val, mask);
|
|
|
|
vfio_set_long_bits(vdev->pdev.wmask + pos, ~mask, mask);
|
|
|
|
vfio_set_long_bits(vdev->emulated_config_bits + pos, mask, mask);
|
|
|
|
}
|
|
|
|
|
2023-05-27 02:15:58 +03:00
|
|
|
static void vfio_pci_enable_rp_atomics(VFIOPCIDevice *vdev)
|
|
|
|
{
|
|
|
|
struct vfio_device_info_cap_pci_atomic_comp *cap;
|
|
|
|
g_autofree struct vfio_device_info *info = NULL;
|
|
|
|
PCIBus *bus = pci_get_bus(&vdev->pdev);
|
|
|
|
PCIDevice *parent = bus->parent_dev;
|
|
|
|
struct vfio_info_cap_header *hdr;
|
|
|
|
uint32_t mask = 0;
|
|
|
|
uint8_t *pos;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* PCIe Atomic Ops completer support is only added automatically for single
|
|
|
|
* function devices downstream of a root port supporting DEVCAP2. Support
|
|
|
|
* is added during realize and, if added, removed during device exit. The
|
|
|
|
* single function requirement avoids conflicting requirements should a
|
|
|
|
* slot be composed of multiple devices with differing capabilities.
|
|
|
|
*/
|
|
|
|
if (pci_bus_is_root(bus) || !parent || !parent->exp.exp_cap ||
|
|
|
|
pcie_cap_get_type(parent) != PCI_EXP_TYPE_ROOT_PORT ||
|
|
|
|
pcie_cap_get_version(parent) != PCI_EXP_FLAGS_VER2 ||
|
|
|
|
vdev->pdev.devfn ||
|
|
|
|
vdev->pdev.cap_present & QEMU_PCI_CAP_MULTIFUNCTION) {
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
|
|
|
pos = parent->config + parent->exp.exp_cap + PCI_EXP_DEVCAP2;
|
|
|
|
|
|
|
|
/* Abort if there's already an Atomic Ops configuration on the root port */
|
|
|
|
if (pci_get_long(pos) & (PCI_EXP_DEVCAP2_ATOMIC_COMP32 |
|
|
|
|
PCI_EXP_DEVCAP2_ATOMIC_COMP64 |
|
|
|
|
PCI_EXP_DEVCAP2_ATOMIC_COMP128)) {
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
|
|
|
info = vfio_get_device_info(vdev->vbasedev.fd);
|
|
|
|
if (!info) {
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
|
|
|
hdr = vfio_get_device_info_cap(info, VFIO_DEVICE_INFO_CAP_PCI_ATOMIC_COMP);
|
|
|
|
if (!hdr) {
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
|
|
|
cap = (void *)hdr;
|
|
|
|
if (cap->flags & VFIO_PCI_ATOMIC_COMP32) {
|
|
|
|
mask |= PCI_EXP_DEVCAP2_ATOMIC_COMP32;
|
|
|
|
}
|
|
|
|
if (cap->flags & VFIO_PCI_ATOMIC_COMP64) {
|
|
|
|
mask |= PCI_EXP_DEVCAP2_ATOMIC_COMP64;
|
|
|
|
}
|
|
|
|
if (cap->flags & VFIO_PCI_ATOMIC_COMP128) {
|
|
|
|
mask |= PCI_EXP_DEVCAP2_ATOMIC_COMP128;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (!mask) {
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
|
|
|
pci_long_test_and_set_mask(pos, mask);
|
|
|
|
vdev->clear_parent_atomics_on_exit = true;
|
|
|
|
}
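The flag translation at the end of vfio_pci_enable_rp_atomics() can be isolated as below. This is a rough sketch only: the `SKETCH_*` constants are hypothetical stand-ins whose values are assumed to mirror the kernel's VFIO_PCI_ATOMIC_COMP* and PCI_EXP_DEVCAP2_ATOMIC_COMP* definitions, and `enable_atomics` operates on a plain 32-bit register copy rather than on the root port's config space. The topology checks and the VFIO info ioctl are left out.

```c
#include <stdint.h>

/* Assumed values, standing in for the VFIO capability flags. */
#define SKETCH_VFIO_ATOMIC_COMP32     (1u << 0)
#define SKETCH_VFIO_ATOMIC_COMP64     (1u << 1)
#define SKETCH_VFIO_ATOMIC_COMP128    (1u << 2)

/* Assumed values, standing in for the DEVCAP2 completer bits. */
#define SKETCH_DEVCAP2_ATOMIC_COMP32  0x00000080u
#define SKETCH_DEVCAP2_ATOMIC_COMP64  0x00000100u
#define SKETCH_DEVCAP2_ATOMIC_COMP128 0x00000200u

/* Translate host-reported completer widths into DEVCAP2 bits. */
static uint32_t atomic_flags_to_devcap2(uint32_t flags)
{
    uint32_t mask = 0;

    if (flags & SKETCH_VFIO_ATOMIC_COMP32) {
        mask |= SKETCH_DEVCAP2_ATOMIC_COMP32;
    }
    if (flags & SKETCH_VFIO_ATOMIC_COMP64) {
        mask |= SKETCH_DEVCAP2_ATOMIC_COMP64;
    }
    if (flags & SKETCH_VFIO_ATOMIC_COMP128) {
        mask |= SKETCH_DEVCAP2_ATOMIC_COMP128;
    }
    return mask;
}

/*
 * Set completer bits only when none are configured yet.
 * Returns 1 if the register was modified (so the caller knows to
 * clear the bits again on exit), 0 otherwise.
 */
static int enable_atomics(uint32_t *devcap2, uint32_t flags)
{
    const uint32_t all = SKETCH_DEVCAP2_ATOMIC_COMP32 |
                         SKETCH_DEVCAP2_ATOMIC_COMP64 |
                         SKETCH_DEVCAP2_ATOMIC_COMP128;
    uint32_t mask = atomic_flags_to_devcap2(flags);

    if ((*devcap2 & all) || !mask) {
        return 0;
    }
    *devcap2 |= mask;
    return 1;
}
```

The return value plays the role of clear_parent_atomics_on_exit: only a caller that actually set the bits should remove them later.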
|
|
|
|
|
|
|
|
static void vfio_pci_disable_rp_atomics(VFIOPCIDevice *vdev)
|
|
|
|
{
|
|
|
|
if (vdev->clear_parent_atomics_on_exit) {
|
|
|
|
PCIDevice *parent = pci_get_bus(&vdev->pdev)->parent_dev;
|
|
|
|
uint8_t *pos = parent->config + parent->exp.exp_cap + PCI_EXP_DEVCAP2;
|
|
|
|
|
|
|
|
pci_long_test_and_clear_mask(pos, PCI_EXP_DEVCAP2_ATOMIC_COMP32 |
|
|
|
|
PCI_EXP_DEVCAP2_ATOMIC_COMP64 |
|
|
|
|
PCI_EXP_DEVCAP2_ATOMIC_COMP128);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2024-05-22 07:40:08 +03:00
|
|
|
static bool vfio_setup_pcie_cap(VFIOPCIDevice *vdev, int pos, uint8_t size,
|
|
|
|
Error **errp)
|
2013-04-01 21:50:04 +04:00
|
|
|
{
|
|
|
|
uint16_t flags;
|
|
|
|
uint8_t type;
|
|
|
|
|
|
|
|
flags = pci_get_word(vdev->pdev.config + pos + PCI_CAP_FLAGS);
|
|
|
|
type = (flags & PCI_EXP_FLAGS_TYPE) >> 4;
|
|
|
|
|
|
|
|
if (type != PCI_EXP_TYPE_ENDPOINT &&
|
|
|
|
type != PCI_EXP_TYPE_LEG_END &&
|
|
|
|
type != PCI_EXP_TYPE_RC_END) {
|
|
|
|
|
2016-10-17 19:57:58 +03:00
|
|
|
error_setg(errp, "assignment of PCIe type 0x%x "
|
|
|
|
"devices is not currently supported", type);
|
2024-05-22 07:40:08 +03:00
|
|
|
return false;
|
2013-04-01 21:50:04 +04:00
|
|
|
}
|
|
|
|
|
2017-11-29 11:46:27 +03:00
|
|
|
if (!pci_bus_is_express(pci_get_bus(&vdev->pdev))) {
|
|
|
|
PCIBus *bus = pci_get_bus(&vdev->pdev);
|
2015-11-10 22:11:08 +03:00
|
|
|
PCIDevice *bridge;
|
|
|
|
|
2013-04-01 21:50:04 +04:00
|
|
|
/*
|
2015-11-10 22:11:08 +03:00
|
|
|
* Traditionally PCI device assignment exposes the PCIe capability
|
|
|
|
* as-is on non-express buses. The reason being that some drivers
|
|
|
|
* simply assume that it's there, for example tg3. However when
|
|
|
|
* we're running on a native PCIe machine type, like Q35, we need
|
|
|
|
* to hide the PCIe capability. The reason for this is twofold;
|
|
|
|
* first Windows guests get a Code 10 error when the PCIe capability
|
|
|
|
* is exposed in this configuration. Therefore express devices won't
|
|
|
|
* work at all unless they're attached to express buses in the VM.
|
|
|
|
* Second, a native PCIe machine introduces the possibility of fine
|
|
|
|
* granularity IOMMUs supporting both translation and isolation.
|
|
|
|
* Guest code to discover the IOMMU visibility of a device, such as
|
|
|
|
* IOMMU grouping code on Linux, is very aware of device types and
|
|
|
|
* valid transitions between bus types. An express device on a non-
|
|
|
|
* express bus is not a valid combination on bare metal systems.
|
|
|
|
*
|
|
|
|
* Drivers that require a PCIe capability to make the device
|
|
|
|
* functional are simply going to need to have their devices placed
|
|
|
|
* on a PCIe bus in the VM.
|
2013-04-01 21:50:04 +04:00
|
|
|
*/
|
2015-11-10 22:11:08 +03:00
|
|
|
while (!pci_bus_is_root(bus)) {
|
|
|
|
bridge = pci_bridge_get_device(bus);
|
2017-11-29 11:46:27 +03:00
|
|
|
bus = pci_get_bus(bridge);
|
2015-11-10 22:11:08 +03:00
|
|
|
}
|
|
|
|
|
|
|
|
if (pci_bus_is_express(bus)) {
|
2024-05-22 07:40:08 +03:00
|
|
|
return true;
|
2015-11-10 22:11:08 +03:00
|
|
|
}
|
|
|
|
|
2017-11-29 11:46:27 +03:00
|
|
|
} else if (pci_bus_is_root(pci_get_bus(&vdev->pdev))) {
|
2013-04-01 21:50:04 +04:00
|
|
|
/*
|
|
|
|
* On a Root Complex bus Endpoints become Root Complex Integrated
|
|
|
|
* Endpoints, which changes the type and clears the LNK & LNK2 fields.
|
|
|
|
*/
|
|
|
|
if (type == PCI_EXP_TYPE_ENDPOINT) {
|
|
|
|
vfio_add_emulated_word(vdev, pos + PCI_CAP_FLAGS,
|
|
|
|
PCI_EXP_TYPE_RC_END << 4,
|
|
|
|
PCI_EXP_FLAGS_TYPE);
|
|
|
|
|
|
|
|
/* Link Capabilities, Status, and Control go away */
|
|
|
|
if (size > PCI_EXP_LNKCTL) {
|
|
|
|
vfio_add_emulated_long(vdev, pos + PCI_EXP_LNKCAP, 0, ~0);
|
|
|
|
vfio_add_emulated_word(vdev, pos + PCI_EXP_LNKCTL, 0, ~0);
|
|
|
|
vfio_add_emulated_word(vdev, pos + PCI_EXP_LNKSTA, 0, ~0);
|
|
|
|
|
|
|
|
#ifndef PCI_EXP_LNKCAP2
|
|
|
|
#define PCI_EXP_LNKCAP2 44
|
|
|
|
#endif
|
|
|
|
#ifndef PCI_EXP_LNKSTA2
|
|
|
|
#define PCI_EXP_LNKSTA2 50
|
|
|
|
#endif
|
|
|
|
/* Link 2 Capabilities, Status, and Control go away */
|
|
|
|
if (size > PCI_EXP_LNKCAP2) {
|
|
|
|
vfio_add_emulated_long(vdev, pos + PCI_EXP_LNKCAP2, 0, ~0);
|
|
|
|
vfio_add_emulated_word(vdev, pos + PCI_EXP_LNKCTL2, 0, ~0);
|
|
|
|
vfio_add_emulated_word(vdev, pos + PCI_EXP_LNKSTA2, 0, ~0);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
} else if (type == PCI_EXP_TYPE_LEG_END) {
|
|
|
|
/*
|
|
|
|
* Legacy endpoints don't belong on the root complex. Windows
|
|
|
|
* seems to be happier with devices if we skip the capability.
|
|
|
|
*/
|
2024-05-22 07:40:08 +03:00
|
|
|
return true;
|
2013-04-01 21:50:04 +04:00
|
|
|
}
|
|
|
|
|
|
|
|
} else {
|
|
|
|
/*
|
|
|
|
* Convert Root Complex Integrated Endpoints to regular endpoints.
|
|
|
|
* These devices don't support LNK/LNK2 capabilities, so make them up.
|
|
|
|
*/
|
|
|
|
if (type == PCI_EXP_TYPE_RC_END) {
|
|
|
|
vfio_add_emulated_word(vdev, pos + PCI_CAP_FLAGS,
|
|
|
|
PCI_EXP_TYPE_ENDPOINT << 4,
|
|
|
|
PCI_EXP_FLAGS_TYPE);
|
|
|
|
vfio_add_emulated_long(vdev, pos + PCI_EXP_LNKCAP,
|
2018-12-12 22:38:41 +03:00
|
|
|
QEMU_PCI_EXP_LNKCAP_MLW(QEMU_PCI_EXP_LNK_X1) |
|
|
|
|
QEMU_PCI_EXP_LNKCAP_MLS(QEMU_PCI_EXP_LNK_2_5GT), ~0);
|
2013-04-01 21:50:04 +04:00
|
|
|
vfio_add_emulated_word(vdev, pos + PCI_EXP_LNKCTL, 0, ~0);
|
|
|
|
}
|
2023-05-27 02:15:58 +03:00
|
|
|
|
|
|
|
vfio_pci_enable_rp_atomics(vdev);
|
2013-04-01 21:50:04 +04:00
|
|
|
}
|
|
|
|
|
2017-07-10 19:39:43 +03:00
|
|
|
/*
|
|
|
|
* Intel 82599 SR-IOV VFs report an invalid PCIe capability version 0
|
|
|
|
* (Niantic erratum #35) causing Windows to error with a Code 10 for the
|
|
|
|
* device on Q35. Fixup any such devices to report version 1. If we
|
|
|
|
* were to remove the capability entirely the guest would lose extended
|
|
|
|
* config space.
|
|
|
|
*/
|
|
|
|
if ((flags & PCI_EXP_FLAGS_VERS) == 0) {
|
|
|
|
vfio_add_emulated_word(vdev, pos + PCI_CAP_FLAGS,
|
|
|
|
1, PCI_EXP_FLAGS_VERS);
|
|
|
|
}
|
|
|
|
|
2017-06-27 09:16:50 +03:00
|
|
|
pos = pci_add_capability(&vdev->pdev, PCI_CAP_ID_EXP, pos, size,
|
|
|
|
errp);
|
|
|
|
if (pos < 0) {
|
2024-05-22 07:40:08 +03:00
|
|
|
return false;
|
2013-04-01 21:50:04 +04:00
|
|
|
}
|
|
|
|
|
2017-06-27 09:16:50 +03:00
|
|
|
vdev->pdev.exp.exp_cap = pos;
|
|
|
|
|
2024-05-22 07:40:08 +03:00
|
|
|
return true;
|
2013-04-01 21:50:04 +04:00
|
|
|
}
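/*
 * Illustrative sketch only, not part of the driver: the type and version
 * fixups above both come down to replacing a masked bit field in the 16-bit
 * PCIe capability flags register. The helper names below (emulate_word,
 * flags_to_rc_end, flags_fix_version) are hypothetical, and the
 * masked-replace semantics are an assumption about what an emulated-word
 * write achieves. The register constants use the values from Linux's
 * pci_regs.h.
 */

```c
#include <stdint.h>

/* PCIe capability flags register fields (values as in Linux pci_regs.h). */
#define PCI_EXP_FLAGS_VERS    0x000f  /* capability version */
#define PCI_EXP_FLAGS_TYPE    0x00f0  /* device/port type */
#define PCI_EXP_TYPE_ENDPOINT 0x0     /* Express Endpoint */
#define PCI_EXP_TYPE_RC_END   0x9     /* Root Complex Integrated Endpoint */

/* Hypothetical helper: replace only the bits selected by mask. */
static uint16_t emulate_word(uint16_t hw, uint16_t val, uint16_t mask)
{
    return (uint16_t)((hw & ~mask) | (val & mask));
}

/* Endpoint on a root bus: present it as an RC Integrated Endpoint. */
static uint16_t flags_to_rc_end(uint16_t flags)
{
    return emulate_word(flags, PCI_EXP_TYPE_RC_END << 4, PCI_EXP_FLAGS_TYPE);
}

/* A reported version of 0 is invalid: present version 1 instead. */
static uint16_t flags_fix_version(uint16_t flags)
{
    if ((flags & PCI_EXP_FLAGS_VERS) == 0) {
        flags = emulate_word(flags, 1, PCI_EXP_FLAGS_VERS);
    }
    return flags;
}
```

/*
 * Only the selected field changes; the other bits of the register keep the
 * value read from the physical device.
 */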
|
|
|
|
|
2014-12-20 01:24:15 +03:00
|
|
|
static void vfio_check_pcie_flr(VFIOPCIDevice *vdev, uint8_t pos)
|
2013-10-02 22:52:38 +04:00
|
|
|
{
|
|
|
|
uint32_t cap = pci_get_long(vdev->pdev.config + pos + PCI_EXP_DEVCAP);
|
|
|
|
|
|
|
|
if (cap & PCI_EXP_DEVCAP_FLR) {
|
2014-12-22 19:54:49 +03:00
|
|
|
trace_vfio_check_pcie_flr(vdev->vbasedev.name);
|
2013-10-02 22:52:38 +04:00
|
|
|
vdev->has_flr = true;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2014-12-20 01:24:15 +03:00
|
|
|
static void vfio_check_pm_reset(VFIOPCIDevice *vdev, uint8_t pos)
|
2013-10-02 22:52:38 +04:00
|
|
|
{
|
|
|
|
uint16_t csr = pci_get_word(vdev->pdev.config + pos + PCI_PM_CTRL);
|
|
|
|
|
|
|
|
if (!(csr & PCI_PM_CTRL_NO_SOFT_RESET)) {
|
2014-12-22 19:54:49 +03:00
|
|
|
trace_vfio_check_pm_reset(vdev->vbasedev.name);
|
2013-10-02 22:52:38 +04:00
|
|
|
vdev->has_pm_reset = true;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2014-12-20 01:24:15 +03:00
|
|
|
static void vfio_check_af_flr(VFIOPCIDevice *vdev, uint8_t pos)
|
2013-10-02 22:52:38 +04:00
|
|
|
{
|
|
|
|
uint8_t cap = pci_get_byte(vdev->pdev.config + pos + PCI_AF_CAP);
|
|
|
|
|
|
|
|
if ((cap & PCI_AF_CAP_TP) && (cap & PCI_AF_CAP_FLR)) {
|
2014-12-22 19:54:49 +03:00
|
|
|
trace_vfio_check_af_flr(vdev->vbasedev.name);
|
2013-10-02 22:52:38 +04:00
|
|
|
vdev->has_flr = true;
|
|
|
|
}
|
|
|
|
}
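/*
 * Illustrative sketch only, not part of the driver: the three checks above
 * each test one or two bits at a fixed offset from a capability's position
 * in config space. The version below runs them against a plain byte array.
 * The helper names (read_le16, read_le32, pcie_has_flr, pm_has_soft_reset,
 * af_has_flr) are hypothetical; the offsets and bit values are those from
 * Linux's pci_regs.h.
 */

```c
#include <stdbool.h>
#include <stdint.h>

#define PCI_EXP_DEVCAP            0x04        /* Device Capabilities */
#define PCI_EXP_DEVCAP_FLR        0x10000000  /* Function Level Reset */
#define PCI_PM_CTRL               0x04        /* PM control/status */
#define PCI_PM_CTRL_NO_SOFT_RESET 0x0008      /* no reset on D3hot->D0 */
#define PCI_AF_CAP                0x03        /* Advanced Features caps */
#define PCI_AF_CAP_TP             0x01        /* Transactions Pending */
#define PCI_AF_CAP_FLR            0x02        /* Function Level Reset */

/* Config space is little-endian regardless of host byte order. */
static uint16_t read_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* PCIe capability: FLR is advertised in Device Capabilities. */
static bool pcie_has_flr(const uint8_t *cfg, uint8_t pos)
{
    return read_le32(cfg + pos + PCI_EXP_DEVCAP) & PCI_EXP_DEVCAP_FLR;
}

/* PM capability: a soft reset is available when NoSoftRst is clear. */
static bool pm_has_soft_reset(const uint8_t *cfg, uint8_t pos)
{
    return !(read_le16(cfg + pos + PCI_PM_CTRL) & PCI_PM_CTRL_NO_SOFT_RESET);
}

/* AF capability: FLR counts only if Transactions Pending is also there. */
static bool af_has_flr(const uint8_t *cfg, uint8_t pos)
{
    uint8_t cap = cfg[pos + PCI_AF_CAP];

    return (cap & PCI_AF_CAP_TP) && (cap & PCI_AF_CAP_FLR);
}
```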
|
|
|
|
|
2024-05-22 07:40:08 +03:00
|
|
|
static bool vfio_add_vendor_specific_cap(VFIOPCIDevice *vdev, int pos,
|
|
|
|
uint8_t size, Error **errp)
|
2024-05-03 17:51:42 +03:00
|
|
|
{
|
|
|
|
PCIDevice *pdev = &vdev->pdev;
|
|
|
|
|
|
|
|
pos = pci_add_capability(pdev, PCI_CAP_ID_VNDR, pos, size, errp);
|
|
|
|
if (pos < 0) {
|
2024-05-22 07:40:08 +03:00
|
|
|
return false;
|
2024-05-03 17:51:42 +03:00
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Exempt config space check for Vendor Specific Information during
|
|
|
|
* restore/load.
|
|
|
|
* Config space check is still enforced for 3 byte VSC header.
|
|
|
|
*/
|
|
|
|
if (vdev->skip_vsc_check && size > 3) {
|
|
|
|
memset(pdev->cmask + pos + 3, 0, size - 3);
|
|
|
|
}
|
|
|
|
|
2024-05-22 07:40:08 +03:00
|
|
|
return true;
|
2024-05-03 17:51:42 +03:00
|
|
|
}
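/*
 * Illustrative sketch only, not part of the driver: the exemption above
 * zeroes the check mask for every byte of a vendor-specific capability
 * after its 3-byte header, so only the header is still compared on
 * restore/load. The helper name exempt_vsc_body and the VSC_HEADER_LEN
 * macro are hypothetical. The header layout (ID, next pointer, length) is
 * the standard vendor-specific capability layout.
 */

```c
#include <stdint.h>
#include <string.h>

#define VSC_HEADER_LEN 3  /* capability ID, next pointer, length */

/*
 * Clear the check mask for the vendor-defined body of a capability that
 * starts at pos and spans size bytes. The header bytes stay checked.
 */
static void exempt_vsc_body(uint8_t *cmask, uint8_t pos, uint8_t size)
{
    if (size > VSC_HEADER_LEN) {
        memset(cmask + pos + VSC_HEADER_LEN, 0, size - VSC_HEADER_LEN);
    }
}
```

/*
 * A capability that is exactly 3 bytes long has no body, so its mask is
 * left untouched.
 */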
|
|
|
|
|
2024-05-22 07:40:08 +03:00
|
|
|
static bool vfio_add_std_cap(VFIOPCIDevice *vdev, uint8_t pos, Error **errp)
|
|
|
|
{
|
hw/vfio/pci: Fix missing ERRP_GUARD() for error_prepend()
As the comment in qapi/error, passing @errp to error_prepend() requires
ERRP_GUARD():
* = Why, when and how to use ERRP_GUARD() =
*
* Without ERRP_GUARD(), use of the @errp parameter is restricted:
...
* - It should not be passed to error_prepend(), error_vprepend() or
* error_append_hint(), because that doesn't work with &error_fatal.
* ERRP_GUARD() lifts these restrictions.
*
* To use ERRP_GUARD(), add it right at the beginning of the function.
* @errp can then be used without worrying about the argument being
* NULL or &error_fatal.
ERRP_GUARD() could avoid the case when @errp is &error_fatal, the user
can't see this additional information, because exit() happens in
error_setg earlier than information is added [1].
In hw/vfio/pci.c, there are 2 functions passing @errp to error_prepend()
without ERRP_GUARD():
- vfio_add_std_cap()
- vfio_realize()
The @errp of vfio_add_std_cap() is also from vfio_realize(). And
vfio_realize(), as a PCIDeviceClass.realize method, its @errp is from
DeviceClass.realize so that there is no guarantee that the @errp won't
point to @error_fatal.
To avoid the issue like [1] said, add missing ERRP_GUARD() at their
beginning.
[1]: Issue description in the commit message of commit ae7c80a7bd73
("error: New macro ERRP_GUARD()").
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Message-ID: <20240311033822.3142585-24-zhao1.liu@linux.intel.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
2024-03-11 06:38:16 +03:00
|
|
|
ERRP_GUARD();
|
|
|
|
PCIDevice *pdev = &vdev->pdev;
|
|
|
|
uint8_t cap_id, next, size;
|
2024-05-22 07:40:08 +03:00
|
|
|
bool ret;
|
|
|
|
|
|
|
|
cap_id = pdev->config[pos];
|
2016-02-19 19:42:29 +03:00
|
|
|
next = pdev->config[pos + PCI_CAP_LIST_NEXT];
|
|
|
|
|
|
|
|
/*
|
|
|
|
* If it becomes important to configure capabilities to their actual
|
|
|
|
* size, use this as the default when it's something we don't recognize.
|
|
|
|
* Since QEMU doesn't actually handle many of the config accesses,
|
|
|
|
* exact size doesn't seem worthwhile.
|
|
|
|
*/
|
|
|
|
size = vfio_std_cap_max_size(pdev, pos);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* pci_add_capability always inserts the new capability at the head
|
|
|
|
* of the chain. Therefore to end up with a chain that matches the
|
|
|
|
* physical device, we insert from the end by making this recursive.
|
2016-02-19 19:42:29 +03:00
|
|
|
* This is also why we pre-calculate size above as cached config space
|
|
|
|
* will be changed as we unwind the stack.
|
|
|
|
*/
|
|
|
|
if (next) {
|
2024-05-22 07:40:08 +03:00
|
|
|
if (!vfio_add_std_cap(vdev, next, errp)) {
|
|
|
|
return false;
|
|
|
|
}
|
|
|
|
} else {
|
2013-04-01 21:50:04 +04:00
|
|
|
/* Begin the rebuild, use QEMU emulated list bits */
|
|
|
|
pdev->config[PCI_CAPABILITY_LIST] = 0;
|
|
|
|
vdev->emulated_config_bits[PCI_CAPABILITY_LIST] = 0xff;
|
|
|
|
vdev->emulated_config_bits[PCI_STATUS] |= PCI_STATUS_CAP_LIST;
|
2017-08-30 01:05:39 +03:00
|
|
|
|
2024-05-22 07:40:11 +03:00
|
|
|
if (!vfio_add_virt_caps(vdev, errp)) {
|
2024-05-22 07:40:08 +03:00
|
|
|
return false;
|
2017-08-30 01:05:39 +03:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2017-08-30 01:05:39 +03:00
|
|
|
/* Scale down size, esp in case virt caps were added above */
|
|
|
|
size = MIN(size, vfio_std_cap_max_size(pdev, pos));
|
|
|
|
|
2013-04-01 21:50:04 +04:00
|
|
|
/* Use emulated next pointer to allow dropping caps */
|
2016-02-19 19:42:29 +03:00
|
|
|
pci_set_byte(vdev->emulated_config_bits + pos + PCI_CAP_LIST_NEXT, 0xff);
|
2013-04-01 21:50:04 +04:00
|
|
|
|
|
|
|
switch (cap_id) {
|
|
|
|
case PCI_CAP_ID_MSI:
|
2016-10-17 19:57:58 +03:00
|
|
|
ret = vfio_msi_setup(vdev, pos, errp);
|
|
|
|
break;
|
2013-04-01 21:50:04 +04:00
|
|
|
case PCI_CAP_ID_EXP:
|
2013-10-02 22:52:38 +04:00
|
|
|
vfio_check_pcie_flr(vdev, pos);
|
2016-10-17 19:57:58 +03:00
|
|
|
ret = vfio_setup_pcie_cap(vdev, pos, size, errp);
|
2013-04-01 21:50:04 +04:00
|
|
|
break;
|
|
|
|
case PCI_CAP_ID_MSIX:
|
2016-10-17 19:57:58 +03:00
|
|
|
ret = vfio_msix_setup(vdev, pos, errp);
|
|
|
|
break;
|
2013-04-01 23:35:08 +04:00
|
|
|
case PCI_CAP_ID_PM:
|
2013-10-02 22:52:38 +04:00
|
|
|
vfio_check_pm_reset(vdev, pos);
|
2013-04-01 23:35:08 +04:00
|
|
|
vdev->pm_cap = pos;
|
2024-05-22 07:40:08 +03:00
|
|
|
ret = pci_add_capability(pdev, cap_id, pos, size, errp) >= 0;
|
2013-10-02 22:52:38 +04:00
|
|
|
break;
|
|
|
|
case PCI_CAP_ID_AF:
|
|
|
|
vfio_check_af_flr(vdev, pos);
|
2024-05-22 07:40:08 +03:00
|
|
|
ret = pci_add_capability(pdev, cap_id, pos, size, errp) >= 0;
|
2013-10-02 22:52:38 +04:00
|
|
|
break;
|
2024-05-03 17:51:42 +03:00
|
|
|
case PCI_CAP_ID_VNDR:
|
|
|
|
ret = vfio_add_vendor_specific_cap(vdev, pos, size, errp);
|
|
|
|
break;
|
|
|
|
default:
|
2024-05-22 07:40:08 +03:00
|
|
|
ret = pci_add_capability(pdev, cap_id, pos, size, errp) >= 0;
|
|
|
|
break;
|
|
|
|
}
|
2017-08-30 01:05:32 +03:00
|
|
|
|
2024-05-22 07:40:08 +03:00
|
|
|
if (!ret) {
|
2016-10-17 19:57:58 +03:00
|
|
|
error_prepend(errp,
|
|
|
|
"failed to add PCI capability 0x%x[0x%x]@0x%x: ",
|
|
|
|
cap_id, size, pos);
|
2012-09-26 21:19:32 +04:00
|
|
|
}
|
|
|
|
|
2024-05-22 07:40:08 +03:00
|
|
|
return ret;
|
2012-09-26 21:19:32 +04:00
|
|
|
}
|
|
|
|
|
vfio/pci: Static Resizable BAR capability
The PCI Resizable BAR (ReBAR) capability is currently hidden from the
VM because the protocol for interacting with the capability does not
support a mechanism for the device to reject an advertised supported
BAR size. However, when assigned to a VM, the act of resizing the
BAR requires adjustment of host resources for the device, which
absolutely can fail. Linux does not currently allow us to reserve
resources for the device independent of the current usage.
The only writable field within the ReBAR capability is the BAR Size
register. The PCIe spec indicates that when written, the device
should immediately begin to operate with the provided BAR size. The
spec however also notes that software must only write values
corresponding to supported sizes as indicated in the capability and
control registers. Writing unsupported sizes produces undefined
results. Therefore, if the hypervisor were to virtualize the
capability and control registers such that the current size is the
only indicated available size, then a write of anything other than
the current size falls into the category of undefined behavior,
where we can essentially expose the modified ReBAR capability as
read-only.
This may seem pointless, but users have reported that virtualizing
the capability in this way not only allows guest software to expose
related features as available (even if only cosmetic), but in some
scenarios can resolve guest driver issues. Additionally, no
regressions in behavior have been reported for this change.
A caveat here is that the PCIe spec requires for compatibility that
devices report support for a size in the range of 1MB to 512GB,
therefore if the current BAR size falls outside that range we revert
to hiding the capability.
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Link: https://lore.kernel.org/r/20230505232308.2869912-1-alex.williamson@redhat.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2023-05-04 23:42:48 +03:00
|
|
|
static int vfio_setup_rebar_ecap(VFIOPCIDevice *vdev, uint16_t pos)
|
|
|
|
{
|
|
|
|
uint32_t ctrl;
|
|
|
|
int i, nbar;
|
|
|
|
|
|
|
|
ctrl = pci_get_long(vdev->pdev.config + pos + PCI_REBAR_CTRL);
|
|
|
|
nbar = (ctrl & PCI_REBAR_CTRL_NBAR_MASK) >> PCI_REBAR_CTRL_NBAR_SHIFT;
|
|
|
|
|
|
|
|
for (i = 0; i < nbar; i++) {
|
|
|
|
uint32_t cap;
|
|
|
|
int size;
|
|
|
|
|
|
|
|
ctrl = pci_get_long(vdev->pdev.config + pos + PCI_REBAR_CTRL + (i * 8));
|
|
|
|
size = (ctrl & PCI_REBAR_CTRL_BAR_SIZE) >> PCI_REBAR_CTRL_BAR_SHIFT;
|
|
|
|
|
|
|
|
/* The cap register reports sizes 1MB to 128TB, with 4 reserved bits */
|
|
|
|
cap = size <= 27 ? 1U << (size + 4) : 0;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* The PCIe spec (v6.0.1, 7.8.6) requires HW to support at least one
|
|
|
|
* size in the range 1MB to 512GB. We intend to mask all sizes except
|
|
|
|
* the one currently enabled in the size field, therefore if it's
|
|
|
|
* outside the range, hide the whole capability as this virtualization
|
|
|
|
* trick won't work. If >512GB resizable BARs start to appear, we
|
|
|
|
* might need an opt-in or reservation scheme in the kernel.
|
|
|
|
*/
|
|
|
|
if (!(cap & PCI_REBAR_CAP_SIZES)) {
|
|
|
|
return -EINVAL;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* Hide all sizes reported in the ctrl reg per above requirement. */
|
|
|
|
ctrl &= (PCI_REBAR_CTRL_BAR_SIZE |
|
|
|
|
PCI_REBAR_CTRL_NBAR_MASK |
|
|
|
|
PCI_REBAR_CTRL_BAR_IDX);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* The BAR size field is RW, however we've mangled the capability
|
|
|
|
* register such that we only report a single size, i.e. the current
|
|
|
|
* BAR size. A write of an unsupported value is undefined, therefore
|
|
|
|
* the register field is essentially RO.
|
|
|
|
*/
|
|
|
|
vfio_add_emulated_long(vdev, pos + PCI_REBAR_CAP + (i * 8), cap, ~0);
|
|
|
|
vfio_add_emulated_long(vdev, pos + PCI_REBAR_CTRL + (i * 8), ctrl, ~0);
|
|
|
|
}
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
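The size encoding used by vfio_setup_rebar_ecap() above can be sketched in isolation. This is a minimal illustration, not the QEMU implementation: the REBAR_* constants are assumed values modelled on the Linux pci_regs.h definitions, and both helpers are hypothetical names introduced only for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values, modelled on the Linux pci_regs.h definitions. */
#define REBAR_CAP_SIZES      0x00FFFFF0u  /* bits 4..23: 1MB .. 512GB */
#define REBAR_CTRL_BAR_SIZE  0x00001F00u  /* current BAR size field */
#define REBAR_CTRL_BAR_SHIFT 8

/* Hypothetical helper: size field n encodes a BAR of 2^(n + 20) bytes. */
static uint64_t rebar_size_bytes(unsigned size)
{
    assert(size < 44);              /* keep the shift within 64 bits */
    return 1ULL << (size + 20);
}

/*
 * Hypothetical helper: build a capability register that advertises only
 * the size currently selected in ctrl. A result of 0 means the current
 * size lies outside 1MB..512GB, the case where the capability is hidden.
 */
static uint32_t rebar_single_size_cap(uint32_t ctrl)
{
    unsigned size = (ctrl & REBAR_CTRL_BAR_SIZE) >> REBAR_CTRL_BAR_SHIFT;
    uint32_t cap = size <= 27 ? 1U << (size + 4) : 0;

    return cap & REBAR_CAP_SIZES;
}
```

A size field of 8 selects a 256MB BAR and maps to capability bit 12; a size field of 20 (1TB) falls outside the masked range and yields 0.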
|
|
|
|
|
2016-10-17 19:57:58 +03:00
|
|
|
static void vfio_add_ext_cap(VFIOPCIDevice *vdev)
|
2016-06-30 22:00:23 +03:00
|
|
|
{
|
|
|
|
PCIDevice *pdev = &vdev->pdev;
|
|
|
|
uint32_t header;
|
|
|
|
uint16_t cap_id, next, size;
|
|
|
|
uint8_t cap_ver;
|
|
|
|
uint8_t *config;
|
|
|
|
|
2016-06-30 22:00:23 +03:00
|
|
|
/* Only add extended caps if we have them and the guest can see them */
|
2017-11-29 11:46:27 +03:00
|
|
|
if (!pci_is_express(pdev) || !pci_bus_is_express(pci_get_bus(pdev)) ||
|
2016-06-30 22:00:23 +03:00
|
|
|
!pci_get_long(pdev->config + PCI_CONFIG_SPACE_SIZE)) {
|
2016-10-17 19:57:58 +03:00
|
|
|
return;
|
2016-06-30 22:00:23 +03:00
|
|
|
}
|
|
|
|
|
2016-06-30 22:00:23 +03:00
|
|
|
/*
|
|
|
|
* pcie_add_capability always inserts the new capability at the tail
|
|
|
|
* of the chain. Therefore to end up with a chain that matches the
|
|
|
|
* physical device, we cache the config space to avoid overwriting
|
|
|
|
* the original config space when we parse the extended capabilities.
|
|
|
|
*/
|
|
|
|
config = g_memdup(pdev->config, vdev->config_size);
|
|
|
|
|
2016-06-30 22:00:23 +03:00
|
|
|
/*
|
|
|
|
* Extended capabilities are chained with each pointing to the next, so we
|
|
|
|
* can drop anything other than the head of the chain simply by modifying
|
2017-02-22 23:19:58 +03:00
|
|
|
* the previous next pointer. Seed the head of the chain here such that
|
|
|
|
* we can simply skip any capabilities we want to drop below, regardless
|
|
|
|
* of their position in the chain. If this stub capability still exists
|
|
|
|
* after we add the capabilities we want to expose, update the capability
|
|
|
|
* ID to zero. Note that we cannot seed with the capability header being
|
|
|
|
* zero as this conflicts with the definition of an absent capability chain
|
|
|
|
* and prevents capabilities beyond the head of the list from being added.
|
|
|
|
* By replacing the dummy capability ID with zero after walking the device
|
|
|
|
* chain, we also transparently mark extended capabilities as absent if
|
|
|
|
* no capabilities were added. Note that the PCIe spec defines an absence
|
|
|
|
* of extended capabilities to be determined by a value of zero for the
|
|
|
|
* capability ID, version, AND next pointer. A non-zero next pointer
|
|
|
|
* should be sufficient to indicate additional capabilities are present,
|
|
|
|
* which will occur if we call pcie_add_capability() below. The entire
|
|
|
|
* first dword is emulated to support this.
|
|
|
|
*
|
|
|
|
* NB. The kernel side does similar masking, so be prepared that our
|
|
|
|
* view of the device may also contain a capability ID zero in the head
|
|
|
|
* of the chain. Skip it for the same reason that we cannot seed the
|
|
|
|
* chain with a zero capability.
|
2016-06-30 22:00:23 +03:00
|
|
|
*/
|
|
|
|
pci_set_long(pdev->config + PCI_CONFIG_SPACE_SIZE,
|
|
|
|
PCI_EXT_CAP(0xFFFF, 0, 0));
|
|
|
|
pci_set_long(pdev->wmask + PCI_CONFIG_SPACE_SIZE, 0);
|
|
|
|
pci_set_long(vdev->emulated_config_bits + PCI_CONFIG_SPACE_SIZE, ~0);
|
|
|
|
|
2016-06-30 22:00:23 +03:00
|
|
|
for (next = PCI_CONFIG_SPACE_SIZE; next;
|
|
|
|
next = PCI_EXT_CAP_NEXT(pci_get_long(config + next))) {
|
|
|
|
header = pci_get_long(config + next);
|
|
|
|
cap_id = PCI_EXT_CAP_ID(header);
|
|
|
|
cap_ver = PCI_EXT_CAP_VER(header);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* If it becomes important to configure extended capabilities to their
|
|
|
|
* actual size, use this as the default when it's something we don't
|
|
|
|
* recognize. Since QEMU doesn't actually handle many of the config
|
|
|
|
* accesses, exact size doesn't seem worthwhile.
|
|
|
|
*/
|
|
|
|
size = vfio_ext_cap_max_size(config, next);
|
|
|
|
|
|
|
|
/* Use emulated next pointer to allow dropping extended caps */
|
|
|
|
pci_long_test_and_set_mask(vdev->emulated_config_bits + next,
|
|
|
|
PCI_EXT_CAP_NEXT_MASK);
|
2016-06-30 22:00:23 +03:00
|
|
|
|
|
|
|
switch (cap_id) {
|
2017-02-22 23:19:58 +03:00
|
|
|
case 0: /* kernel masked capability */
|
2016-06-30 22:00:23 +03:00
|
|
|
case PCI_EXT_CAP_ID_SRIOV: /* Read-only VF BARs confuse OVMF */
|
vfio/pci: Hide ARI capability
QEMU supports ARI on downstream ports and assigned devices may support
ARI in their extended capabilities. The endpoint ARI capability
specifies the next function, such that the OS doesn't need to walk
each possible function, however this next function is relative to the
host, not the guest. This leads to device discovery issues when we
combine separate functions into virtual multi-function packages in a
guest. For example, SR-IOV VFs are not enumerated by simply probing
the function address space, therefore the ARI next-function field is
zero. When we combine multiple VFs together as a multi-function
device in the guest, the guest OS identifies ARI is enabled, relies on
this next-function field, and stops looking for additional functions
after the first is found.
Long term we should expose the ARI capability to the guest to enable
configurations with more than 8 functions per slot, but this requires
additional QEMU PCI infrastructure to manage the next-function field
for multiple, otherwise independent devices. In the short term,
hiding this capability allows equivalent functionality to what we
currently have on non-express chipsets.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
2016-07-18 19:55:17 +03:00
|
|
|
case PCI_EXT_CAP_ID_ARI: /* XXX Needs next function virtualization */
|
2016-06-30 22:00:23 +03:00
|
|
|
trace_vfio_add_ext_cap_dropped(vdev->vbasedev.name, cap_id, next);
|
|
|
|
break;
|
2023-05-04 23:42:48 +03:00
|
|
|
case PCI_EXT_CAP_ID_REBAR:
|
|
|
|
if (!vfio_setup_rebar_ecap(vdev, next)) {
|
|
|
|
pcie_add_capability(pdev, cap_id, cap_ver, next, size);
|
|
|
|
}
|
|
|
|
break;
|
2016-06-30 22:00:23 +03:00
|
|
|
default:
|
|
|
|
pcie_add_capability(pdev, cap_id, cap_ver, next, size);
|
|
|
|
}
|
|
|
|
|
|
|
|
}
|
|
|
|
|
|
|
|
/* Cleanup chain head ID if necessary */
|
|
|
|
if (pci_get_word(pdev->config + PCI_CONFIG_SPACE_SIZE) == 0xFFFF) {
|
|
|
|
pci_set_word(pdev->config + PCI_CONFIG_SPACE_SIZE, 0);
|
2016-06-30 22:00:23 +03:00
|
|
|
}
|
|
|
|
|
|
|
|
g_free(config);
|
2016-10-17 19:57:58 +03:00
|
|
|
return;
|
2016-06-30 22:00:23 +03:00
|
|
|
}
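The chain walk in vfio_add_ext_cap() above follows each header's next pointer through a cached copy of config space and skips the capabilities it wants to drop. The following is a simplified sketch of that walk on a plain byte buffer. The macros are assumed values following the PCIe extended capability header layout, and cfg_get_long(), cfg_set_long() and count_exposed_ext_caps() are hypothetical stand-ins, not QEMU functions.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values, following the PCIe extended capability header layout. */
#define CFG_SPACE_SIZE   0x100u   /* offset of the first extended capability */
#define EXT_CAP_ID(h)    ((h) & 0xffffu)
#define EXT_CAP_NEXT(h)  (((h) >> 20) & 0xffcu)
#define EXT_CAP(id, ver, next) \
    ((uint32_t)(id) | ((uint32_t)(ver) << 16) | ((uint32_t)(next) << 20))

/* Hypothetical little-endian accessors for a raw config space buffer. */
static uint32_t cfg_get_long(const uint8_t *cfg, uint16_t off)
{
    return (uint32_t)cfg[off] | ((uint32_t)cfg[off + 1] << 8) |
           ((uint32_t)cfg[off + 2] << 16) | ((uint32_t)cfg[off + 3] << 24);
}

static void cfg_set_long(uint8_t *cfg, uint16_t off, uint32_t val)
{
    cfg[off] = val & 0xff;
    cfg[off + 1] = (val >> 8) & 0xff;
    cfg[off + 2] = (val >> 16) & 0xff;
    cfg[off + 3] = (val >> 24) & 0xff;
}

/*
 * Walk the chain in a 4KB config space and count the capabilities that
 * would be exposed when ID 0 (masked) and drop_id are skipped.
 */
static int count_exposed_ext_caps(const uint8_t *cfg, uint16_t drop_id)
{
    int exposed = 0;
    int budget = (0x1000 - 0x100) / 4;  /* guard against a looping chain */
    uint16_t next;

    assert(cfg);
    for (next = CFG_SPACE_SIZE; next && budget-- > 0;
         next = EXT_CAP_NEXT(cfg_get_long(cfg, next))) {
        uint16_t id = EXT_CAP_ID(cfg_get_long(cfg, next));

        if (id == 0 || id == drop_id) {
            continue;                   /* dropped: nothing is added */
        }
        exposed++;
    }
    return exposed;
}
```

Because the walk reads only from the buffer it is given, the same loop works on a cached copy while another buffer is being rewritten, which is the reason the function above duplicates config space first.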
|
|
|
|
|
2024-05-22 07:40:08 +03:00
|
|
|
static bool vfio_add_capabilities(VFIOPCIDevice *vdev, Error **errp)
|
2012-09-26 21:19:32 +04:00
|
|
|
{
|
|
|
|
PCIDevice *pdev = &vdev->pdev;
|
|
|
|
|
|
|
|
if (!(pdev->config[PCI_STATUS] & PCI_STATUS_CAP_LIST) ||
|
|
|
|
!pdev->config[PCI_CAPABILITY_LIST]) {
|
2024-05-22 07:40:08 +03:00
|
|
|
return true; /* Nothing to add */
|
2012-09-26 21:19:32 +04:00
|
|
|
}
|
|
|
|
|
2024-05-22 07:40:08 +03:00
|
|
|
if (!vfio_add_std_cap(vdev, pdev->config[PCI_CAPABILITY_LIST], errp)) {
|
|
|
|
return false;
|
2016-06-30 22:00:23 +03:00
|
|
|
}
|
|
|
|
|
2016-10-17 19:57:58 +03:00
|
|
|
vfio_add_ext_cap(vdev);
|
2024-05-22 07:40:08 +03:00
|
|
|
return true;
|
2012-09-26 21:19:32 +04:00
|
|
|
}
|
|
|
|
|
2023-11-21 11:44:07 +03:00
|
|
|
void vfio_pci_pre_reset(VFIOPCIDevice *vdev)
|
2013-10-02 23:51:00 +04:00
|
|
|
{
|
|
|
|
PCIDevice *pdev = &vdev->pdev;
|
|
|
|
uint16_t cmd;
|
|
|
|
|
|
|
|
vfio_disable_interrupts(vdev);
|
|
|
|
|
|
|
|
/* Make sure the device is in D0 */
|
|
|
|
if (vdev->pm_cap) {
|
|
|
|
uint16_t pmcsr;
|
|
|
|
uint8_t state;
|
|
|
|
|
|
|
|
pmcsr = vfio_pci_read_config(pdev, vdev->pm_cap + PCI_PM_CTRL, 2);
|
|
|
|
state = pmcsr & PCI_PM_CTRL_STATE_MASK;
|
|
|
|
if (state) {
|
|
|
|
pmcsr &= ~PCI_PM_CTRL_STATE_MASK;
|
|
|
|
vfio_pci_write_config(pdev, vdev->pm_cap + PCI_PM_CTRL, pmcsr, 2);
|
|
|
|
/* vfio handles the necessary delay here */
|
|
|
|
pmcsr = vfio_pci_read_config(pdev, vdev->pm_cap + PCI_PM_CTRL, 2);
|
|
|
|
state = pmcsr & PCI_PM_CTRL_STATE_MASK;
|
|
|
|
if (state) {
|
2014-03-25 22:08:52 +04:00
|
|
|
error_report("vfio: Unable to power on device, stuck in D%d",
|
2013-10-02 23:51:00 +04:00
|
|
|
state);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
2021-07-30 04:26:13 +03:00
|
|
|
* Stop any ongoing DMA by disconnecting I/O, MMIO, and bus master.
|
2013-10-02 23:51:00 +04:00
|
|
|
* Also put INTx Disable in known state.
|
|
|
|
*/
|
|
|
|
cmd = vfio_pci_read_config(pdev, PCI_COMMAND, 2);
|
|
|
|
cmd &= ~(PCI_COMMAND_IO | PCI_COMMAND_MEMORY | PCI_COMMAND_MASTER |
|
|
|
|
PCI_COMMAND_INTX_DISABLE);
|
|
|
|
vfio_pci_write_config(pdev, PCI_COMMAND, cmd, 2);
|
|
|
|
}
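The two register updates performed by vfio_pci_pre_reset() above are plain bit masks. This sketch shows them without the device access. The constants are assumed values matching the standard PCI register definitions, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values, matching the standard PCI register definitions. */
#define PM_CTRL_STATE_MASK 0x0003u  /* PMCSR power state, 0 = D0 */
#define CMD_IO             0x0001u
#define CMD_MEMORY         0x0002u
#define CMD_MASTER         0x0004u
#define CMD_INTX_DISABLE   0x0400u

/* Hypothetical helper: PMCSR value that requests the D0 power state. */
static uint16_t pmcsr_to_d0(uint16_t pmcsr)
{
    return (uint16_t)(pmcsr & ~PM_CTRL_STATE_MASK);
}

/*
 * Hypothetical helper: command register value with I/O, MMIO and bus
 * mastering disconnected and INTx Disable cleared to a known state.
 */
static uint16_t command_quiesce(uint16_t cmd)
{
    uint16_t out = (uint16_t)(cmd & ~(CMD_IO | CMD_MEMORY | CMD_MASTER |
                                      CMD_INTX_DISABLE));

    assert((out & CMD_MASTER) == 0);  /* no DMA can be initiated */
    return out;
}
```

All other bits of either register are left as they were read.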
|
|
|
|
|
2023-11-21 11:44:07 +03:00
|
|
|
void vfio_pci_post_reset(VFIOPCIDevice *vdev)
|
2013-10-02 23:51:00 +04:00
|
|
|
{
|
2016-10-17 19:57:58 +03:00
|
|
|
Error *err = NULL;
|
2016-10-31 18:53:04 +03:00
|
|
|
int nr;
|
2016-10-17 19:57:58 +03:00
|
|
|
|
2024-05-22 07:40:06 +03:00
|
|
|
if (!vfio_intx_enable(vdev, &err)) {
|
2018-10-17 11:26:30 +03:00
|
|
|
error_reportf_err(err, VFIO_MSG_PREFIX, vdev->vbasedev.name);
|
2016-10-17 19:57:58 +03:00
|
|
|
}
|
2016-10-31 18:53:04 +03:00
|
|
|
|
|
|
|
for (nr = 0; nr < PCI_NUM_REGIONS - 1; ++nr) {
|
|
|
|
off_t addr = vdev->config_offset + PCI_BASE_ADDRESS_0 + (4 * nr);
|
|
|
|
uint32_t val = 0;
|
|
|
|
uint32_t len = sizeof(val);
|
|
|
|
|
|
|
|
if (pwrite(vdev->vbasedev.fd, &val, len, addr) != len) {
|
|
|
|
error_report("%s(%s) reset bar %d failed: %m", __func__,
|
|
|
|
vdev->vbasedev.name, nr);
|
|
|
|
}
|
|
|
|
}
|
2018-06-05 17:23:17 +03:00
|
|
|
|
|
|
|
vfio_quirk_reset(vdev);
|
2013-10-02 23:51:00 +04:00
|
|
|
}
|
|
|
|
|
2023-11-21 11:44:07 +03:00
|
|
|
bool vfio_pci_host_match(PCIHostDeviceAddress *addr, const char *name)
|
2013-10-02 23:51:00 +04:00
|
|
|
{
|
vfio: Add sysfsdev property for pci & platform
vfio-pci currently requires a host= parameter, which comes in the
form of a PCI address in [domain:]<bus:slot.function> notation. We
expect to find a matching entry in sysfs for that under
/sys/bus/pci/devices/. vfio-platform takes a similar approach, but
defines the host= parameter to be a string, which can be matched
directly under /sys/bus/platform/devices/. On the PCI side, we have
some interest in using vfio to expose vGPU devices. These are not
actual discrete PCI devices, so they don't have a compatible host PCI
bus address or a device link where QEMU wants to look for it. There's
also really no requirement that vfio can only be used to expose
physical devices, a new vfio bus and iommu driver could expose a
completely emulated device. To fit within the vfio framework, it
would need a kernel struct device and associated IOMMU group, but
those are easy constraints to manage.
To support such devices, which would include vGPUs, that honor the
VFIO PCI programming API, but are not necessarily backed by a unique
PCI address, add support for specifying any device in sysfs. The
vfio API already has support for probing the device type to ensure
compatibility with either vfio-pci or vfio-platform.
With this, a vfio-pci device could either be specified as:
-device vfio-pci,host=02:00.0
or
-device vfio-pci,sysfsdev=/sys/devices/pci0000:00/0000:00:1c.0/0000:02:00.0
or even
-device vfio-pci,sysfsdev=/sys/bus/pci/devices/0000:02:00.0
When vGPU support comes along, this might look something more like:
-device vfio-pci,sysfsdev=/sys/devices/virtual/intel-vgpu/vgpu0@0000:00:02.0
NB - This is only a made up example path
The same change is made for vfio-platform, specifying sysfsdev has
precedence over the old host option.
Tested-by: Eric Auger <eric.auger@linaro.org>
Reviewed-by: Eric Auger <eric.auger@linaro.org>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-03-10 19:39:07 +03:00
|
|
|
char tmp[13];
|
|
|
|
|
|
|
|
sprintf(tmp, "%04x:%02x:%02x.%1x", addr->domain,
|
|
|
|
addr->bus, addr->slot, addr->function);
|
|
|
|
|
|
|
|
return (strcmp(tmp, name) == 0);
|
2013-10-02 23:51:00 +04:00
|
|
|
}
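The comparison in vfio_pci_host_match() above depends on the formatted address fitting in 12 characters plus the terminator. A bounded variant can be sketched as follows. This is an illustration only: struct host_addr and host_addr_match() are hypothetical names, and the use of snprintf() with a truncation check is a swapped-in technique rather than what the function above does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a host PCI address. */
struct host_addr {
    unsigned int domain;
    unsigned int bus;
    unsigned int slot;
    unsigned int function;
};

/* Format as dddd:bb:ss.f and compare; an oversized address never matches. */
static bool host_addr_match(const struct host_addr *addr, const char *name)
{
    char tmp[13];
    int n;

    assert(addr && name);
    n = snprintf(tmp, sizeof(tmp), "%04x:%02x:%02x.%1x",
                 addr->domain, addr->bus, addr->slot, addr->function);
    if (n < 0 || (size_t)n >= sizeof(tmp)) {
        return false;   /* output was truncated, so it cannot be trusted */
    }
    return strcmp(tmp, name) == 0;
}
```

With this variant a domain wider than four hex digits is reported as a mismatch instead of writing past the buffer.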
|
|
|
|
|
2023-11-21 11:44:06 +03:00
|
|
|
int vfio_pci_get_pci_hot_reset_info(VFIOPCIDevice *vdev,
|
|
|
|
struct vfio_pci_hot_reset_info **info_p)
|
2013-10-02 23:51:00 +04:00
|
|
|
{
|
|
|
|
struct vfio_pci_hot_reset_info *info;
|
2023-11-21 11:44:06 +03:00
|
|
|
int ret, count;
|
2013-10-02 23:51:00 +04:00
|
|
|
|
2023-11-21 11:44:06 +03:00
|
|
|
assert(info_p && !*info_p);
|
2013-10-02 23:51:00 +04:00
|
|
|
|
|
|
|
info = g_malloc0(sizeof(*info));
|
|
|
|
info->argsz = sizeof(*info);
|
|
|
|
|
2014-12-20 01:24:31 +03:00
|
|
|
ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_PCI_HOT_RESET_INFO, info);
|
2013-10-02 23:51:00 +04:00
|
|
|
if (ret && errno != ENOSPC) {
|
|
|
|
ret = -errno;
|
2023-11-21 11:44:06 +03:00
|
|
|
g_free(info);
|
2013-10-02 23:51:00 +04:00
|
|
|
if (!vdev->has_pm_reset) {
|
2016-03-10 19:39:07 +03:00
|
|
|
error_report("vfio: Cannot reset device %s, "
|
|
|
|
"no available reset mechanism.", vdev->vbasedev.name);
|
2013-10-02 23:51:00 +04:00
|
|
|
}
|
2023-11-21 11:44:06 +03:00
|
|
|
return ret;
|
2013-10-02 23:51:00 +04:00
|
|
|
}
|
|
|
|
|
|
|
|
count = info->count;
|
2023-11-21 11:44:06 +03:00
|
|
|
info = g_realloc(info, sizeof(*info) + (count * sizeof(info->devices[0])));
|
|
|
|
info->argsz = sizeof(*info) + (count * sizeof(info->devices[0]));
|
2013-10-02 23:51:00 +04:00
|
|
|
|
2014-12-20 01:24:31 +03:00
|
|
|
ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_PCI_HOT_RESET_INFO, info);
|
2013-10-02 23:51:00 +04:00
|
|
|
if (ret) {
|
|
|
|
ret = -errno;
|
2023-11-21 11:44:06 +03:00
|
|
|
g_free(info);
|
2013-10-02 23:51:00 +04:00
|
|
|
error_report("vfio: hot reset info failed: %m");
|
2023-11-21 11:44:06 +03:00
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
|
|
|
*info_p = info;
|
|
|
|
return 0;
|
|
|
|
}
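vfio_pci_get_pci_hot_reset_info() above uses a two-pass query: a first call with a header-sized buffer learns the entry count (reported alongside ENOSPC), and a second call with a resized buffer retrieves the entries. The sketch below reproduces that pattern against a mock. Nothing here talks to a real kernel: struct reset_info, mock_query() and get_reset_info() are hypothetical, and the mock always reports three dependent devices.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

struct dep_dev { uint32_t group_id; };

struct reset_info {
    uint32_t argsz;              /* size of the buffer the caller provides */
    uint32_t count;              /* number of entries the callee has */
    struct dep_dev devices[];
};

/* Mock of the query: always has 3 entries, fails with ENOSPC if too small. */
static int mock_query(struct reset_info *info)
{
    const uint32_t n = 3;
    size_t need = sizeof(*info) + n * sizeof(info->devices[0]);
    uint32_t i;

    info->count = n;
    if (info->argsz < need) {
        errno = ENOSPC;
        return -1;
    }
    for (i = 0; i < n; i++) {
        info->devices[i].group_id = 10 + i;
    }
    return 0;
}

/* Two-pass query: learn the count first, then fetch the full list. */
static int get_reset_info(struct reset_info **info_p)
{
    struct reset_info *info, *tmp;
    size_t full;

    assert(info_p && !*info_p);
    info = calloc(1, sizeof(*info));
    if (!info) {
        return -ENOMEM;
    }
    info->argsz = sizeof(*info);
    if (mock_query(info) && errno != ENOSPC) {
        int ret = -errno;
        free(info);
        return ret;
    }

    full = sizeof(*info) + info->count * sizeof(info->devices[0]);
    tmp = realloc(info, full);
    if (!tmp) {
        free(info);
        return -ENOMEM;
    }
    info = tmp;
    info->argsz = (uint32_t)full;
    if (mock_query(info)) {
        int ret = -errno;
        free(info);
        return ret;
    }
    *info_p = info;              /* caller owns and frees the buffer */
    return 0;
}
```

As in the function above, ownership of the buffer passes to the caller only on success.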
|
|
|
|
|
|
|
|
static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
|
|
|
|
{
|
2023-11-21 11:44:07 +03:00
|
|
|
VFIODevice *vbasedev = &vdev->vbasedev;
|
2024-06-17 09:34:06 +03:00
|
|
|
const VFIOIOMMUClass *vioc = VFIO_IOMMU_GET_CLASS(vbasedev->bcontainer);
|
2013-10-02 23:51:00 +04:00
|
|
|
|
2024-06-17 09:34:06 +03:00
|
|
|
return vioc->pci_hot_reset(vbasedev, single);
|
2013-10-02 23:51:00 +04:00
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
2021-07-30 04:26:13 +03:00
|
|
|
* We want to differentiate hot reset of multiple in-use devices vs hot reset
|
2013-10-02 23:51:00 +04:00
|
|
|
* of a single in-use device. VFIO_DEVICE_RESET will already handle the case
|
|
|
|
* of doing hot resets when there is only a single device per bus. The in-use
|
|
|
|
* here refers to how many VFIODevices are affected. A hot reset that affects
|
|
|
|
* multiple devices, but only a single in-use device, means that we can call
|
|
|
|
* it from our bus ->reset() callback since the extent is effectively a single
|
|
|
|
* device. This allows us to make use of it in the hotplug path. When there
|
|
|
|
* are multiple in-use devices, we can only trigger the hot reset during a
|
|
|
|
* system reset and thus from our reset handler. We separate _one vs _multi
|
|
|
|
* here so that we don't overlap and do a double reset on the system reset
|
|
|
|
* path where both our reset handler and ->reset() callback are used. Calling
|
|
|
|
* _one() will only do a hot reset for the one in-use device case, calling
|
|
|
|
* _multi() will do nothing if a _one() would have been sufficient.
|
|
|
|
*/
|
2014-12-20 01:24:15 +03:00
|
|
|
static int vfio_pci_hot_reset_one(VFIOPCIDevice *vdev)
{
    return vfio_pci_hot_reset(vdev, true);
}

static int vfio_pci_hot_reset_multi(VFIODevice *vbasedev)
{
    VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
    return vfio_pci_hot_reset(vdev, false);
}

static void vfio_pci_compute_needs_reset(VFIODevice *vbasedev)
{
    VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
    if (!vbasedev->reset_works || (!vdev->has_flr && vdev->has_pm_reset)) {
        vbasedev->needs_reset = true;
    }
}

static Object *vfio_pci_get_object(VFIODevice *vbasedev)
{
    VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);

    return OBJECT(vdev);
}

static bool vfio_msix_present(void *opaque, int version_id)
{
    PCIDevice *pdev = opaque;

    return msix_present(pdev);
}

static bool vfio_display_migration_needed(void *opaque)
{
    VFIOPCIDevice *vdev = opaque;

    /*
     * We need to migrate the VFIODisplay object if ramfb *migration* was
     * explicitly requested (in which case we enforced both ramfb=on and
     * display=on), or ramfb migration was left at the default "auto"
     * setting, and *ramfb* was explicitly requested (in which case we
     * enforced display=on).
     */
    return vdev->ramfb_migrate == ON_OFF_AUTO_ON ||
        (vdev->ramfb_migrate == ON_OFF_AUTO_AUTO && vdev->enable_ramfb);
}

static const VMStateDescription vmstate_vfio_display = {
    .name = "VFIOPCIDevice/VFIODisplay",
    .version_id = 1,
    .minimum_version_id = 1,
    .needed = vfio_display_migration_needed,
    .fields = (const VMStateField[]){
        VMSTATE_STRUCT_POINTER(dpy, VFIOPCIDevice, vfio_display_vmstate,
                               VFIODisplay),
        VMSTATE_END_OF_LIST()
    }
};

static const VMStateDescription vmstate_vfio_pci_config = {
    .name = "VFIOPCIDevice",
    .version_id = 1,
    .minimum_version_id = 1,
    .fields = (const VMStateField[]) {
        VMSTATE_PCI_DEVICE(pdev, VFIOPCIDevice),
        VMSTATE_MSIX_TEST(pdev, VFIOPCIDevice, vfio_msix_present),
        VMSTATE_END_OF_LIST()
    },
    .subsections = (const VMStateDescription * const []) {
        &vmstate_vfio_display,
        NULL
    }
};

static int vfio_pci_save_config(VFIODevice *vbasedev, QEMUFile *f, Error **errp)
{
    VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);

    return vmstate_save_state_with_err(f, &vmstate_vfio_pci_config, vdev, NULL,
                                       errp);
}

static int vfio_pci_load_config(VFIODevice *vbasedev, QEMUFile *f)
{
    VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
    PCIDevice *pdev = &vdev->pdev;
    pcibus_t old_addr[PCI_NUM_REGIONS - 1];
    int bar, ret;

    for (bar = 0; bar < PCI_ROM_SLOT; bar++) {
        old_addr[bar] = pdev->io_regions[bar].addr;
    }

    ret = vmstate_load_state(f, &vmstate_vfio_pci_config, vdev, 1);
    if (ret) {
        return ret;
    }

    vfio_pci_write_config(pdev, PCI_COMMAND,
                          pci_get_word(pdev->config + PCI_COMMAND), 2);

    for (bar = 0; bar < PCI_ROM_SLOT; bar++) {
        /*
         * The address may not be changed in some scenarios
         * (e.g. the VF driver isn't loaded in VM).
         */
        if (old_addr[bar] != pdev->io_regions[bar].addr &&
            vdev->bars[bar].region.size > 0 &&
            vdev->bars[bar].region.size < qemu_real_host_page_size()) {
            vfio_sub_page_bar_update_mapping(pdev, bar);
        }
    }

    if (msi_enabled(pdev)) {
        vfio_msi_enable(vdev);
    } else if (msix_enabled(pdev)) {
        vfio_msix_enable(vdev);
    }

    return ret;
}

static VFIODeviceOps vfio_pci_ops = {
    .vfio_compute_needs_reset = vfio_pci_compute_needs_reset,
    .vfio_hot_reset_multi = vfio_pci_hot_reset_multi,
    .vfio_eoi = vfio_intx_eoi,
    .vfio_get_object = vfio_pci_get_object,
    .vfio_save_config = vfio_pci_save_config,
    .vfio_load_config = vfio_pci_load_config,
};

bool vfio_populate_vga(VFIOPCIDevice *vdev, Error **errp)
{
    VFIODevice *vbasedev = &vdev->vbasedev;
    g_autofree struct vfio_region_info *reg_info = NULL;
    int ret;

    ret = vfio_get_region_info(vbasedev, VFIO_PCI_VGA_REGION_INDEX, &reg_info);
    if (ret) {
        error_setg_errno(errp, -ret,
                         "failed getting region info for VGA region index %d",
                         VFIO_PCI_VGA_REGION_INDEX);
        return false;
    }

    if (!(reg_info->flags & VFIO_REGION_INFO_FLAG_READ) ||
        !(reg_info->flags & VFIO_REGION_INFO_FLAG_WRITE) ||
        reg_info->size < 0xbffff + 1) {
        error_setg(errp, "unexpected VGA info, flags 0x%lx, size 0x%lx",
                   (unsigned long)reg_info->flags,
                   (unsigned long)reg_info->size);
        return false;
    }

    vdev->vga = g_new0(VFIOVGA, 1);

    vdev->vga->fd_offset = reg_info->offset;
    vdev->vga->fd = vdev->vbasedev.fd;

    vdev->vga->region[QEMU_PCI_VGA_MEM].offset = QEMU_PCI_VGA_MEM_BASE;
    vdev->vga->region[QEMU_PCI_VGA_MEM].nr = QEMU_PCI_VGA_MEM;
    QLIST_INIT(&vdev->vga->region[QEMU_PCI_VGA_MEM].quirks);

    memory_region_init_io(&vdev->vga->region[QEMU_PCI_VGA_MEM].mem,
                          OBJECT(vdev), &vfio_vga_ops,
                          &vdev->vga->region[QEMU_PCI_VGA_MEM],
                          "vfio-vga-mmio@0xa0000",
                          QEMU_PCI_VGA_MEM_SIZE);

    vdev->vga->region[QEMU_PCI_VGA_IO_LO].offset = QEMU_PCI_VGA_IO_LO_BASE;
    vdev->vga->region[QEMU_PCI_VGA_IO_LO].nr = QEMU_PCI_VGA_IO_LO;
    QLIST_INIT(&vdev->vga->region[QEMU_PCI_VGA_IO_LO].quirks);

    memory_region_init_io(&vdev->vga->region[QEMU_PCI_VGA_IO_LO].mem,
                          OBJECT(vdev), &vfio_vga_ops,
                          &vdev->vga->region[QEMU_PCI_VGA_IO_LO],
                          "vfio-vga-io@0x3b0",
                          QEMU_PCI_VGA_IO_LO_SIZE);

    vdev->vga->region[QEMU_PCI_VGA_IO_HI].offset = QEMU_PCI_VGA_IO_HI_BASE;
    vdev->vga->region[QEMU_PCI_VGA_IO_HI].nr = QEMU_PCI_VGA_IO_HI;
    QLIST_INIT(&vdev->vga->region[QEMU_PCI_VGA_IO_HI].quirks);

    memory_region_init_io(&vdev->vga->region[QEMU_PCI_VGA_IO_HI].mem,
                          OBJECT(vdev), &vfio_vga_ops,
                          &vdev->vga->region[QEMU_PCI_VGA_IO_HI],
                          "vfio-vga-io@0x3c0",
                          QEMU_PCI_VGA_IO_HI_SIZE);

    pci_register_vga(&vdev->pdev, &vdev->vga->region[QEMU_PCI_VGA_MEM].mem,
                     &vdev->vga->region[QEMU_PCI_VGA_IO_LO].mem,
                     &vdev->vga->region[QEMU_PCI_VGA_IO_HI].mem);

    return true;
}

static bool vfio_populate_device(VFIOPCIDevice *vdev, Error **errp)
{
    VFIODevice *vbasedev = &vdev->vbasedev;
    g_autofree struct vfio_region_info *reg_info = NULL;
    struct vfio_irq_info irq_info = { .argsz = sizeof(irq_info) };
    int i, ret = -1;

    /* Sanity check device */
    if (!(vbasedev->flags & VFIO_DEVICE_FLAGS_PCI)) {
        error_setg(errp, "this isn't a PCI device");
        return false;
    }

    if (vbasedev->num_regions < VFIO_PCI_CONFIG_REGION_INDEX + 1) {
        error_setg(errp, "unexpected number of io regions %u",
                   vbasedev->num_regions);
        return false;
    }

    if (vbasedev->num_irqs < VFIO_PCI_MSIX_IRQ_INDEX + 1) {
        error_setg(errp, "unexpected number of irqs %u", vbasedev->num_irqs);
        return false;
    }

    for (i = VFIO_PCI_BAR0_REGION_INDEX; i < VFIO_PCI_ROM_REGION_INDEX; i++) {
        char *name = g_strdup_printf("%s BAR %d", vbasedev->name, i);

        ret = vfio_region_setup(OBJECT(vdev), vbasedev,
                                &vdev->bars[i].region, i, name);
        g_free(name);

        if (ret) {
            error_setg_errno(errp, -ret, "failed to get region %d info", i);
            return false;
        }

        QLIST_INIT(&vdev->bars[i].quirks);
    }

    ret = vfio_get_region_info(vbasedev,
                               VFIO_PCI_CONFIG_REGION_INDEX, &reg_info);
    if (ret) {
        error_setg_errno(errp, -ret, "failed to get config info");
        return false;
    }

    trace_vfio_populate_device_config(vdev->vbasedev.name,
                                      (unsigned long)reg_info->size,
                                      (unsigned long)reg_info->offset,
                                      (unsigned long)reg_info->flags);

    vdev->config_size = reg_info->size;
    if (vdev->config_size == PCI_CONFIG_SPACE_SIZE) {
        vdev->pdev.cap_present &= ~QEMU_PCI_CAP_EXPRESS;
    }
    vdev->config_offset = reg_info->offset;

    if (vdev->features & VFIO_FEATURE_ENABLE_VGA) {
        if (!vfio_populate_vga(vdev, errp)) {
            error_append_hint(errp, "device does not support "
                              "requested feature x-vga\n");
            return false;
        }
    }

    irq_info.index = VFIO_PCI_ERR_IRQ_INDEX;

    ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_IRQ_INFO, &irq_info);
    if (ret) {
        /* This can fail for an old kernel or legacy PCI dev */
        trace_vfio_populate_device_get_irq_info_failure(strerror(errno));
    } else if (irq_info.count == 1) {
        vdev->pci_aer = true;
    } else {
        warn_report(VFIO_MSG_PREFIX
                    "Could not enable error recovery for the device",
                    vbasedev->name);
    }

    return true;
}

static void vfio_pci_put_device(VFIOPCIDevice *vdev)
|
2012-09-26 21:19:32 +04:00
|
|
|
{
|
2023-10-09 12:09:09 +03:00
|
|
|
vfio_detach_device(&vdev->vbasedev);
|
|
|
|
|
2014-12-22 19:54:31 +03:00
|
|
|
g_free(vdev->vbasedev.name);
|
2016-03-10 19:39:07 +03:00
|
|
|
g_free(vdev->msix);
|
2012-09-26 21:19:32 +04:00
|
|
|
}
|
|
|
|
|
2013-07-16 01:49:49 +04:00
|
|
|
static void vfio_err_notifier_handler(void *opaque)
|
|
|
|
{
|
2014-12-20 01:24:15 +03:00
|
|
|
VFIOPCIDevice *vdev = opaque;
|
2013-07-16 01:49:49 +04:00
|
|
|
|
|
|
|
if (!event_notifier_test_and_clear(&vdev->err_notifier)) {
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* TBD. Retrieve the error details and decide what action
|
|
|
|
* needs to be taken. One of the actions could be to pass
|
|
|
|
* the error to the guest and have the guest driver recover
|
|
|
|
* from the error. This requires that PCIe capabilities be
|
|
|
|
* exposed to the guest. For now, we just terminate the
|
|
|
|
* guest to contain the error.
|
|
|
|
*/
|
|
|
|
|
vfio: Add sysfsdev property for pci & platform
vfio-pci currently requires a host= parameter, which comes in the
form of a PCI address in [domain:]<bus:slot.function> notation. We
expect to find a matching entry in sysfs for that under
/sys/bus/pci/devices/. vfio-platform takes a similar approach, but
defines the host= parameter to be a string, which can be matched
directly under /sys/bus/platform/devices/. On the PCI side, we have
some interest in using vfio to expose vGPU devices. These are not
actual discrete PCI devices, so they don't have a compatible host PCI
bus address or a device link where QEMU wants to look for it. There's
also really no requirement that vfio can only be used to expose
physical devices, a new vfio bus and iommu driver could expose a
completely emulated device. To fit within the vfio framework, it
would need a kernel struct device and associated IOMMU group, but
those are easy constraints to manage.
To support such devices, which would include vGPUs, that honor the
VFIO PCI programming API, but are not necessarily backed by a unique
PCI address, add support for specifying any device in sysfs. The
vfio API already has support for probing the device type to ensure
compatibility with either vfio-pci or vfio-platform.
With this, a vfio-pci device could either be specified as:
-device vfio-pci,host=02:00.0
or
-device vfio-pci,sysfsdev=/sys/devices/pci0000:00/0000:00:1c.0/0000:02:00.0
or even
-device vfio-pci,sysfsdev=/sys/bus/pci/devices/0000:02:00.0
When vGPU support comes along, this might look something more like:
-device vfio-pci,sysfsdev=/sys/devices/virtual/intel-vgpu/vgpu0@0000:00:02.0
NB - This is only a made up example path
The same change is made for vfio-platform, specifying sysfsdev has
precedence over the old host option.
Tested-by: Eric Auger <eric.auger@linaro.org>
Reviewed-by: Eric Auger <eric.auger@linaro.org>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-03-10 19:39:07 +03:00
|
|
|
error_report("%s(%s) Unrecoverable error detected. Please collect any data possible and then kill the guest", __func__, vdev->vbasedev.name);
|
2013-07-16 01:49:49 +04:00
|
|
|
|
2014-06-30 19:56:08 +04:00
|
|
|
vm_stop(RUN_STATE_INTERNAL_ERROR);
|
2013-07-16 01:49:49 +04:00
|
|
|
}
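The handler above returns early unless `event_notifier_test_and_clear()` reports a pending event. As a rough standalone model of that "test and clear" behaviour, the sketch below uses the Linux `eventfd(2)` interface directly. The helper names `notifier_open`, `notifier_signal` and `notifier_test_and_clear` are hypothetical and are not QEMU's EventNotifier implementation; the sketch only assumes a Linux host.

```c
#include <stdbool.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create a non-blocking eventfd, or return -1. */
static int notifier_open(void)
{
    return eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
}

/* Hypothetical helper: add 1 to the eventfd counter to signal an event. */
static bool notifier_signal(int fd)
{
    uint64_t one = 1;

    return write(fd, &one, sizeof(one)) == (ssize_t)sizeof(one);
}

/*
 * Hypothetical helper: report whether at least one event is pending and
 * reset the counter. A non-blocking read fails with EAGAIN when the
 * counter is zero, which is treated here as "nothing pending".
 */
static bool notifier_test_and_clear(int fd)
{
    uint64_t count;

    return read(fd, &count, sizeof(count)) == (ssize_t)sizeof(count);
}
```

Several signals that arrive before the handler runs collapse into a single pending event in this model, which is why a handler written this way checks the flag once and then acts.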
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Registers error notifier for devices supporting error recovery.
|
|
|
|
* If we encounter a failure in this function, we report an error
|
|
|
|
* and continue after disabling error recovery support for the
|
|
|
|
* device.
|
|
|
|
*/
|
2014-12-20 01:24:15 +03:00
|
|
|
static void vfio_register_err_notifier(VFIOPCIDevice *vdev)
|
2013-07-16 01:49:49 +04:00
|
|
|
{
|
2019-06-13 18:57:37 +03:00
|
|
|
Error *err = NULL;
|
|
|
|
int32_t fd;
|
2013-07-16 01:49:49 +04:00
|
|
|
|
|
|
|
if (!vdev->pci_aer) {
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (event_notifier_init(&vdev->err_notifier, 0)) {
|
2013-10-02 22:52:38 +04:00
|
|
|
error_report("vfio: Unable to init event notifier for error detection");
|
2013-07-16 01:49:49 +04:00
|
|
|
vdev->pci_aer = false;
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
2019-06-13 18:57:37 +03:00
|
|
|
fd = event_notifier_get_fd(&vdev->err_notifier);
|
|
|
|
qemu_set_fd_handler(fd, vfio_err_notifier_handler, NULL, vdev);
|
2013-07-16 01:49:49 +04:00
|
|
|
|
2024-05-22 07:39:59 +03:00
|
|
|
if (!vfio_set_irq_signaling(&vdev->vbasedev, VFIO_PCI_ERR_IRQ_INDEX, 0,
|
|
|
|
VFIO_IRQ_SET_ACTION_TRIGGER, fd, &err)) {
|
2019-06-13 18:57:37 +03:00
|
|
|
error_reportf_err(err, VFIO_MSG_PREFIX, vdev->vbasedev.name);
|
|
|
|
qemu_set_fd_handler(fd, NULL, NULL, vdev);
|
2013-07-16 01:49:49 +04:00
|
|
|
event_notifier_cleanup(&vdev->err_notifier);
|
|
|
|
vdev->pci_aer = false;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2014-12-20 01:24:15 +03:00
|
|
|
static void vfio_unregister_err_notifier(VFIOPCIDevice *vdev)
|
2013-07-16 01:49:49 +04:00
|
|
|
{
|
2019-06-13 18:57:37 +03:00
|
|
|
Error *err = NULL;
|
2013-07-16 01:49:49 +04:00
|
|
|
|
|
|
|
if (!vdev->pci_aer) {
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
2024-05-22 07:39:59 +03:00
|
|
|
if (!vfio_set_irq_signaling(&vdev->vbasedev, VFIO_PCI_ERR_IRQ_INDEX, 0,
|
|
|
|
VFIO_IRQ_SET_ACTION_TRIGGER, -1, &err)) {
|
2019-06-13 18:57:37 +03:00
|
|
|
error_reportf_err(err, VFIO_MSG_PREFIX, vdev->vbasedev.name);
|
2013-07-16 01:49:49 +04:00
|
|
|
}
|
|
|
|
qemu_set_fd_handler(event_notifier_get_fd(&vdev->err_notifier),
|
|
|
|
NULL, NULL, vdev);
|
|
|
|
event_notifier_cleanup(&vdev->err_notifier);
|
|
|
|
}
|
|
|
|
|
2015-03-02 21:38:55 +03:00
|
|
|
static void vfio_req_notifier_handler(void *opaque)
|
|
|
|
{
|
|
|
|
VFIOPCIDevice *vdev = opaque;
|
2017-02-22 23:19:58 +03:00
|
|
|
Error *err = NULL;
|
2015-03-02 21:38:55 +03:00
|
|
|
|
|
|
|
if (!event_notifier_test_and_clear(&vdev->req_notifier)) {
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
2019-05-28 19:40:19 +03:00
|
|
|
qdev_unplug(DEVICE(vdev), &err);
|
2017-02-22 23:19:58 +03:00
|
|
|
if (err) {
|
2018-10-17 11:26:29 +03:00
|
|
|
warn_reportf_err(err, VFIO_MSG_PREFIX, vdev->vbasedev.name);
|
2017-02-22 23:19:58 +03:00
|
|
|
}
|
2015-03-02 21:38:55 +03:00
|
|
|
}
|
|
|
|
|
|
|
|
static void vfio_register_req_notifier(VFIOPCIDevice *vdev)
|
|
|
|
{
|
|
|
|
struct vfio_irq_info irq_info = { .argsz = sizeof(irq_info),
|
|
|
|
.index = VFIO_PCI_REQ_IRQ_INDEX };
|
2019-06-13 18:57:37 +03:00
|
|
|
Error *err = NULL;
|
|
|
|
int32_t fd;
|
2015-03-02 21:38:55 +03:00
|
|
|
|
|
|
|
if (!(vdev->features & VFIO_FEATURE_ENABLE_REQ)) {
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (ioctl(vdev->vbasedev.fd,
|
|
|
|
VFIO_DEVICE_GET_IRQ_INFO, &irq_info) < 0 || irq_info.count < 1) {
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (event_notifier_init(&vdev->req_notifier, 0)) {
|
|
|
|
error_report("vfio: Unable to init event notifier for device request");
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
2019-06-13 18:57:37 +03:00
|
|
|
fd = event_notifier_get_fd(&vdev->req_notifier);
|
|
|
|
qemu_set_fd_handler(fd, vfio_req_notifier_handler, NULL, vdev);
|
2015-03-02 21:38:55 +03:00
|
|
|
|
2024-05-22 07:39:59 +03:00
|
|
|
if (!vfio_set_irq_signaling(&vdev->vbasedev, VFIO_PCI_REQ_IRQ_INDEX, 0,
|
|
|
|
VFIO_IRQ_SET_ACTION_TRIGGER, fd, &err)) {
|
2019-06-13 18:57:37 +03:00
|
|
|
error_reportf_err(err, VFIO_MSG_PREFIX, vdev->vbasedev.name);
|
|
|
|
qemu_set_fd_handler(fd, NULL, NULL, vdev);
|
2015-03-02 21:38:55 +03:00
|
|
|
event_notifier_cleanup(&vdev->req_notifier);
|
|
|
|
} else {
|
|
|
|
vdev->req_enabled = true;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
static void vfio_unregister_req_notifier(VFIOPCIDevice *vdev)
|
|
|
|
{
|
2019-06-13 18:57:37 +03:00
|
|
|
Error *err = NULL;
|
2015-03-02 21:38:55 +03:00
|
|
|
|
|
|
|
if (!vdev->req_enabled) {
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
2024-05-22 07:39:59 +03:00
|
|
|
if (!vfio_set_irq_signaling(&vdev->vbasedev, VFIO_PCI_REQ_IRQ_INDEX, 0,
|
|
|
|
VFIO_IRQ_SET_ACTION_TRIGGER, -1, &err)) {
|
2019-06-13 18:57:37 +03:00
|
|
|
error_reportf_err(err, VFIO_MSG_PREFIX, vdev->vbasedev.name);
|
2015-03-02 21:38:55 +03:00
|
|
|
}
|
|
|
|
qemu_set_fd_handler(event_notifier_get_fd(&vdev->req_notifier),
|
|
|
|
NULL, NULL, vdev);
|
|
|
|
event_notifier_cleanup(&vdev->req_notifier);
|
|
|
|
|
|
|
|
vdev->req_enabled = false;
|
|
|
|
}
|
|
|
|
|
2016-10-17 19:58:01 +03:00
|
|
|
static void vfio_realize(PCIDevice *pdev, Error **errp)
|
2012-09-26 21:19:32 +04:00
|
|
|
{
|
hw/vfio/pci: Fix missing ERRP_GUARD() for error_prepend()
As the comment in qapi/error explains, passing @errp to error_prepend()
requires ERRP_GUARD():
* = Why, when and how to use ERRP_GUARD() =
*
* Without ERRP_GUARD(), use of the @errp parameter is restricted:
...
* - It should not be passed to error_prepend(), error_vprepend() or
* error_append_hint(), because that doesn't work with &error_fatal.
* ERRP_GUARD() lifts these restrictions.
*
* To use ERRP_GUARD(), add it right at the beginning of the function.
* @errp can then be used without worrying about the argument being
* NULL or &error_fatal.
ERRP_GUARD() avoids the case where @errp is &error_fatal and the user never
sees this additional information, because exit() happens in error_setg()
before the information is added [1].
In hw/vfio/pci.c, there are 2 functions passing @errp to error_prepend()
without ERRP_GUARD():
- vfio_add_std_cap()
- vfio_realize()
The @errp of vfio_add_std_cap() also comes from vfio_realize(). And
vfio_realize() is a PCIDeviceClass.realize method whose @errp comes from
DeviceClass.realize, so there is no guarantee that the @errp won't
point to &error_fatal.
To avoid the issue described in [1], add the missing ERRP_GUARD() at their
beginning.
[1]: Issue description in the commit message of commit ae7c80a7bd73
("error: New macro ERRP_GUARD()").
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Message-ID: <20240311033822.3142585-24-zhao1.liu@linux.intel.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
2024-03-11 06:38:16 +03:00
|
|
|
ERRP_GUARD();
|
2020-09-03 01:43:03 +03:00
|
|
|
VFIOPCIDevice *vdev = VFIO_PCI(pdev);
|
2022-05-02 12:42:22 +03:00
|
|
|
VFIODevice *vbasedev = &vdev->vbasedev;
|
2024-05-07 09:42:42 +03:00
|
|
|
char *subsys;
|
2016-05-26 18:43:21 +03:00
|
|
|
int i, ret;
|
2018-08-17 18:27:16 +03:00
|
|
|
bool is_mdev;
|
2023-10-26 10:06:35 +03:00
|
|
|
char uuid[UUID_STR_LEN];
|
2024-05-07 09:42:42 +03:00
|
|
|
g_autofree char *name = NULL;
|
|
|
|
g_autofree char *tmp = NULL;
|
2012-09-26 21:19:32 +04:00
|
|
|
|
2023-11-21 11:44:10 +03:00
|
|
|
if (vbasedev->fd < 0 && !vbasedev->sysfsdev) {
|
2016-10-17 19:58:02 +03:00
|
|
|
if (!(~vdev->host.domain || ~vdev->host.bus ||
|
|
|
|
~vdev->host.slot || ~vdev->host.function)) {
|
|
|
|
error_setg(errp, "No provided host device");
|
2017-05-03 23:52:35 +03:00
|
|
|
error_append_hint(errp, "Use -device vfio-pci,host=DDDD:BB:DD.F "
|
2023-11-21 11:44:10 +03:00
|
|
|
#ifdef CONFIG_IOMMUFD
|
|
|
|
"or -device vfio-pci,fd=DEVICE_FD "
|
|
|
|
#endif
|
2017-05-03 23:52:35 +03:00
|
|
|
"or -device vfio-pci,sysfsdev=PATH_TO_DEVICE\n");
|
2016-10-17 19:58:02 +03:00
|
|
|
return;
|
|
|
|
}
|
2022-05-02 12:42:22 +03:00
|
|
|
vbasedev->sysfsdev =
|
2016-03-10 19:39:07 +03:00
|
|
|
g_strdup_printf("/sys/bus/pci/devices/%04x:%02x:%02x.%01x",
|
|
|
|
vdev->host.domain, vdev->host.bus,
|
|
|
|
vdev->host.slot, vdev->host.function);
|
|
|
|
}
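The block above falls back to building a sysfs path from the `host=` address when neither `fd` nor `sysfsdev` was given, and rejects the case where every address field is still all ones. A minimal standalone sketch of both steps follows. The helpers `host_is_unset` and `pci_sysfs_path` are hypothetical names introduced for illustration, and the assumption that "unset" means all-ones in every field is read off the `~` tests above rather than taken from any documentation.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical helper: true when no host address was supplied, under the
 * assumption that each unset field holds all ones (so ~field == 0).
 */
static bool host_is_unset(unsigned int domain, unsigned int bus,
                          unsigned int slot, unsigned int function)
{
    return !(~domain || ~bus || ~slot || ~function);
}

/*
 * Hypothetical helper: format the sysfs path with the same format string
 * as the code above. Returns snprintf()'s result.
 */
static int pci_sysfs_path(char *buf, size_t len, unsigned int domain,
                          unsigned int bus, unsigned int slot,
                          unsigned int function)
{
    return snprintf(buf, len, "/sys/bus/pci/devices/%04x:%02x:%02x.%01x",
                    domain, bus, slot, function);
}
```

For the address 0000:05:00.0 this yields `/sys/bus/pci/devices/0000:05:00.0`, the same directory the `host=` option is expected to match in sysfs.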
|
|
|
|
|
2024-05-22 07:40:00 +03:00
|
|
|
if (!vfio_device_get_name(vbasedev, errp)) {
|
2016-10-17 19:58:01 +03:00
|
|
|
return;
|
2012-09-26 21:19:32 +04:00
|
|
|
}
|
2014-12-22 19:54:31 +03:00
|
|
|
|
2018-08-17 18:27:16 +03:00
|
|
|
/*
|
2020-06-26 10:22:30 +03:00
|
|
|
* Mediated devices *might* operate compatibly with discarding of RAM, but
|
2018-08-17 18:27:16 +03:00
|
|
|
* we cannot know for certain, it depends on whether the mdev vendor driver
|
|
|
|
* stays in sync with the active working set of the guest driver. Prevent
|
|
|
|
* the x-balloon-allowed option unless this is minimally an mdev device.
|
|
|
|
*/
|
2022-05-02 12:42:22 +03:00
|
|
|
tmp = g_strdup_printf("%s/subsystem", vbasedev->sysfsdev);
|
2018-08-17 18:27:16 +03:00
|
|
|
subsys = realpath(tmp, NULL);
|
2018-08-23 19:45:57 +03:00
|
|
|
is_mdev = subsys && (strcmp(subsys, "/sys/bus/mdev") == 0);
|
2018-08-17 18:27:16 +03:00
|
|
|
free(subsys);
|
|
|
|
|
2022-05-02 12:42:22 +03:00
|
|
|
trace_vfio_mdev(vbasedev->name, is_mdev);
|
2018-08-17 18:27:16 +03:00
|
|
|
|
2022-05-02 12:42:22 +03:00
|
|
|
if (vbasedev->ram_block_discard_allowed && !is_mdev) {
|
2018-08-17 18:27:16 +03:00
|
|
|
error_setg(errp, "x-balloon-allowed only potentially compatible "
|
|
|
|
"with mdev devices");
|
|
|
|
goto error;
|
|
|
|
}
|
|
|
|
|
2023-03-20 10:35:22 +03:00
|
|
|
if (!qemu_uuid_is_null(&vdev->vf_token)) {
|
|
|
|
qemu_uuid_unparse(&vdev->vf_token, uuid);
|
|
|
|
name = g_strdup_printf("%s vf_token=%s", vbasedev->name, uuid);
|
|
|
|
} else {
|
2023-05-17 05:46:51 +03:00
|
|
|
name = g_strdup(vbasedev->name);
|
2023-03-20 10:35:22 +03:00
|
|
|
}
|
|
|
|
|
2024-05-07 09:42:44 +03:00
|
|
|
if (!vfio_attach_device(name, vbasedev,
|
|
|
|
pci_device_iommu_address_space(pdev), errp)) {
|
2016-10-17 19:57:56 +03:00
|
|
|
goto error;
|
2012-09-26 21:19:32 +04:00
|
|
|
}
|
|
|
|
|
2024-05-22 07:40:05 +03:00
|
|
|
if (!vfio_populate_device(vdev, errp)) {
|
2023-10-11 23:09:34 +03:00
|
|
|
goto error;
|
2015-02-10 20:25:44 +03:00
|
|
|
}
|
|
|
|
|
2012-09-26 21:19:32 +04:00
|
|
|
/* Get a copy of config space */
|
2022-05-02 12:42:22 +03:00
|
|
|
ret = pread(vbasedev->fd, vdev->pdev.config,
|
2012-09-26 21:19:32 +04:00
|
|
|
MIN(pci_config_size(&vdev->pdev), vdev->config_size),
|
|
|
|
vdev->config_offset);
|
|
|
|
if (ret < (int)MIN(pci_config_size(&vdev->pdev), vdev->config_size)) {
|
|
|
|
ret = ret < 0 ? -errno : -EFAULT;
|
2016-10-17 19:58:01 +03:00
|
|
|
error_setg_errno(errp, -ret, "failed to read device config space");
|
2023-10-11 23:09:34 +03:00
|
|
|
goto error;
|
2012-09-26 21:19:32 +04:00
|
|
|
}
|
|
|
|
|
2013-04-01 21:50:04 +04:00
|
|
|
/* vfio emulates a lot for us, but some bits need extra love */
|
|
|
|
vdev->emulated_config_bits = g_malloc0(vdev->config_size);
|
|
|
|
|
|
|
|
/* QEMU can choose to expose the ROM or not */
|
|
|
|
memset(vdev->emulated_config_bits + PCI_ROM_ADDRESS, 0xff, 4);
|
2018-02-06 21:08:25 +03:00
|
|
|
/* QEMU can also add or extend BARs */
|
|
|
|
memset(vdev->emulated_config_bits + PCI_BASE_ADDRESS_0, 0xff, 6 * 4);
|
2013-04-01 21:50:04 +04:00
|
|
|
|
2015-09-23 22:04:49 +03:00
|
|
|
/*
|
|
|
|
* The PCI spec reserves vendor ID 0xffff as an invalid value. The
|
|
|
|
* device ID is managed by the vendor and need only be a 16-bit value.
|
|
|
|
* Allow any 16-bit value for subsystem so they can be hidden or changed.
|
|
|
|
*/
|
|
|
|
if (vdev->vendor_id != PCI_ANY_ID) {
|
|
|
|
if (vdev->vendor_id >= 0xffff) {
|
2016-10-17 19:58:01 +03:00
|
|
|
error_setg(errp, "invalid PCI vendor ID provided");
|
2023-10-11 23:09:34 +03:00
|
|
|
goto error;
|
2015-09-23 22:04:49 +03:00
|
|
|
}
|
|
|
|
vfio_add_emulated_word(vdev, PCI_VENDOR_ID, vdev->vendor_id, ~0);
|
2022-05-02 12:42:22 +03:00
|
|
|
trace_vfio_pci_emulated_vendor_id(vbasedev->name, vdev->vendor_id);
|
2015-09-23 22:04:49 +03:00
|
|
|
} else {
|
|
|
|
vdev->vendor_id = pci_get_word(pdev->config + PCI_VENDOR_ID);
|
|
|
|
}
|
|
|
|
|
|
|
|
if (vdev->device_id != PCI_ANY_ID) {
|
|
|
|
if (vdev->device_id > 0xffff) {
|
2016-10-17 19:58:01 +03:00
|
|
|
error_setg(errp, "invalid PCI device ID provided");
|
2023-10-11 23:09:34 +03:00
|
|
|
goto error;
|
2015-09-23 22:04:49 +03:00
|
|
|
}
|
|
|
|
vfio_add_emulated_word(vdev, PCI_DEVICE_ID, vdev->device_id, ~0);
|
2022-05-02 12:42:22 +03:00
|
|
|
trace_vfio_pci_emulated_device_id(vbasedev->name, vdev->device_id);
|
2015-09-23 22:04:49 +03:00
|
|
|
} else {
|
|
|
|
vdev->device_id = pci_get_word(pdev->config + PCI_DEVICE_ID);
|
|
|
|
}
|
|
|
|
|
|
|
|
if (vdev->sub_vendor_id != PCI_ANY_ID) {
|
|
|
|
if (vdev->sub_vendor_id > 0xffff) {
|
2016-10-17 19:58:01 +03:00
|
|
|
error_setg(errp, "invalid PCI subsystem vendor ID provided");
|
2023-10-11 23:09:34 +03:00
|
|
|
goto error;
|
2015-09-23 22:04:49 +03:00
|
|
|
}
|
|
|
|
vfio_add_emulated_word(vdev, PCI_SUBSYSTEM_VENDOR_ID,
|
|
|
|
vdev->sub_vendor_id, ~0);
|
2022-05-02 12:42:22 +03:00
|
|
|
trace_vfio_pci_emulated_sub_vendor_id(vbasedev->name,
|
2015-09-23 22:04:49 +03:00
|
|
|
vdev->sub_vendor_id);
|
|
|
|
}
|
|
|
|
|
|
|
|
if (vdev->sub_device_id != PCI_ANY_ID) {
|
|
|
|
if (vdev->sub_device_id > 0xffff) {
|
2016-10-17 19:58:01 +03:00
|
|
|
error_setg(errp, "invalid PCI subsystem device ID provided");
|
2023-10-11 23:09:34 +03:00
|
|
|
goto error;
|
2015-09-23 22:04:49 +03:00
|
|
|
}
|
|
|
|
vfio_add_emulated_word(vdev, PCI_SUBSYSTEM_ID, vdev->sub_device_id, ~0);
|
2022-05-02 12:42:22 +03:00
|
|
|
trace_vfio_pci_emulated_sub_device_id(vbasedev->name,
|
2015-09-23 22:04:49 +03:00
|
|
|
vdev->sub_device_id);
|
|
|
|
}
|
2015-09-23 22:04:49 +03:00
|
|
|
|
2013-04-01 21:50:04 +04:00
|
|
|
/* QEMU can change multi-function devices to single function, or reverse */
|
|
|
|
vdev->emulated_config_bits[PCI_HEADER_TYPE] =
|
|
|
|
PCI_HEADER_TYPE_MULTI_FUNCTION;
|
|
|
|
|
2013-11-12 22:53:24 +04:00
|
|
|
/* Restore or clear multifunction, this is always controlled by QEMU */
|
|
|
|
if (vdev->pdev.cap_present & QEMU_PCI_CAP_MULTIFUNCTION) {
|
|
|
|
vdev->pdev.config[PCI_HEADER_TYPE] |= PCI_HEADER_TYPE_MULTI_FUNCTION;
|
|
|
|
} else {
|
|
|
|
vdev->pdev.config[PCI_HEADER_TYPE] &= ~PCI_HEADER_TYPE_MULTI_FUNCTION;
|
|
|
|
}
|
|
|
|
|
2012-09-26 21:19:32 +04:00
|
|
|
/*
|
|
|
|
* Clear host resource mapping info. If we choose not to register a
|
|
|
|
* BAR, such as might be the case with the option ROM, we can get
|
|
|
|
* confusing, unwritable, residual addresses from the host here.
|
|
|
|
*/
|
|
|
|
memset(&vdev->pdev.config[PCI_BASE_ADDRESS_0], 0, 24);
|
|
|
|
memset(&vdev->pdev.config[PCI_ROM_ADDRESS], 0, 4);
|
|
|
|
|
2013-10-02 22:52:38 +04:00
|
|
|
vfio_pci_size_rom(vdev);
|
2012-09-26 21:19:32 +04:00
|
|
|
|
vfio/pci: Allow relocating MSI-X MMIO
Recently proposed vfio-pci kernel changes (v4.16) remove the
restriction preventing userspace from mmap'ing PCI BARs in areas
overlapping the MSI-X vector table. This change is primarily intended
to benefit host platforms which make use of system page sizes larger
than the PCI spec recommendation for alignment of MSI-X data
structures (i.e. not x86_64). In the case of POWER systems, the SPAPR
spec requires the VM to program MSI-X using hypercalls, rendering the
MSI-X vector table unused in the VM view of the device. However,
ARM64 platforms also support 64KB pages and rely on QEMU emulation of
MSI-X. Regardless of the kernel driver allowing mmaps overlapping
the MSI-X vector table, emulation of the MSI-X vector table also
prevents direct mapping of device MMIO spaces overlapping this page.
Thanks to the fact that PCI devices have a standard self discovery
mechanism, we can try to resolve this by relocating the MSI-X data
structures, either by creating a new PCI BAR or extending an existing
BAR and updating the MSI-X capability for the new location. There's
even a very slim chance that this could benefit devices which do not
adhere to the PCI spec alignment guidelines on x86_64 systems.
This new x-msix-relocation option accepts the following choices:
off: Disable MSI-X relocation, use native device config (default)
auto: Use a known good combination for the platform/device (none yet)
bar0..bar5: Specify the target BAR for MSI-X data structures
If compatible, the target BAR will either be created or extended and
the new portion will be used for MSI-X emulation.
The first obvious user question with this option is how to determine
whether a given platform and device might benefit from this option.
In most cases, the answer is that it won't, especially on x86_64.
Devices often dedicate an entire BAR to MSI-X and therefore no
performance sensitive registers overlap the MSI-X area. Take for
example:
# lspci -vvvs 0a:00.0
0a:00.0 Ethernet controller: Intel Corporation I350 Gigabit Network Connection
...
Region 0: Memory at db680000 (32-bit, non-prefetchable) [size=512K]
Region 3: Memory at db7f8000 (32-bit, non-prefetchable) [size=16K]
...
Capabilities: [70] MSI-X: Enable+ Count=10 Masked-
Vector table: BAR=3 offset=00000000
PBA: BAR=3 offset=00002000
This device uses the 16K bar3 for MSI-X with the vector table at
offset zero and the pending bits array at offset 8K, fully honoring
the PCI spec alignment guidance. The data sheet specifically refers
to this as an MSI-X BAR. This device would not see a benefit from
MSI-X relocation regardless of the platform, regardless of the page
size.
However, here's another example:
# lspci -vvvs 02:00.0
02:00.0 Serial Attached SCSI controller: xxxxxxxx
...
Region 0: I/O ports at c000 [size=256]
Region 1: Memory at ef640000 (64-bit, non-prefetchable) [size=64K]
Region 3: Memory at ef600000 (64-bit, non-prefetchable) [size=256K]
...
Capabilities: [c0] MSI-X: Enable+ Count=16 Masked-
Vector table: BAR=1 offset=0000e000
PBA: BAR=1 offset=0000f000
Here the MSI-X data structures are placed on separate 4K pages at the
end of a 64KB BAR. If our host page size is 4K, we're likely fine,
but at 64KB page size, MSI-X emulation at that location prevents the
entire BAR from being directly mapped into the VM address space.
Overlapping performance sensitive registers then starts to be a very
likely scenario on such a platform. At this point, the user could
enable tracing on vfio_region_read and vfio_region_write to determine
more conclusively if device accesses are being trapped through QEMU.
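To make the page-size effect concrete, here is a minimal sketch, not
QEMU code. The names struct msix_area and trapped_bytes() are
hypothetical, and the 16-byte vector table entry size is taken from the
PCI specification. It rounds the MSI-X vector table out to host page
boundaries and reports how many bytes of the BAR would have to be
trapped rather than directly mapped.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of one MSI-X structure inside a BAR. */
struct msix_area {
    uint64_t offset; /* byte offset of the structure within the BAR */
    uint64_t size;   /* size of the structure in bytes */
};

/*
 * Return how many bytes of a BAR can no longer be directly mapped when
 * every host page touching `area` must be trapped for emulation.
 */
static uint64_t trapped_bytes(uint64_t bar_size, struct msix_area area,
                              uint64_t page_size)
{
    /* Page sizes are powers of two; the area must lie inside the BAR. */
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    assert(area.size != 0 && area.offset + area.size <= bar_size);

    uint64_t start = area.offset & ~(page_size - 1);
    uint64_t end = (area.offset + area.size + page_size - 1) &
                   ~(page_size - 1);

    /* A BAR smaller than one page is trapped in its entirety. */
    if (end > bar_size) {
        end = bar_size;
    }
    return end - start;
}
```

For the SAS controller above (64KB bar1, vector table at offset 0xe000,
16 vectors of 16 bytes each), this gives 4KB trapped with 4K pages and
the full 64KB with 64KB pages.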
Upon finding a device and platform in need of MSI-X relocation, the
next problem is how to choose the target PCI BAR to host the MSI-X data
structures. A few key rules to keep in mind for this selection
include:
* There are only 6 BAR slots, bar0..bar5
* 64-bit BARs occupy two BAR slots, 'lspci -vvv' lists the first slot
* PCI BARs are always a power of 2 in size, extending == doubling
* The maximum size of a 32-bit BAR is 2GB
* MSI-X data structures must reside in an MMIO BAR
Using these rules, we can evaluate each BAR of the second example
device above as follows:
bar0: I/O port BAR, incompatible with MSI-X tables
bar1: BAR could be extended, incurring another 64KB of MMIO
bar2: Unavailable, bar1 is 64-bit, this register is used by bar1
bar3: BAR could be extended, incurring another 256KB of MMIO
bar4: Unavailable, bar3 is 64-bit, this register is used by bar3
bar5: Available, empty BAR, minimum additional MMIO
A secondary optimization we might wish to make in relocating MSI-X
is to minimize the additional MMIO required for the device, therefore
we might test the available choices in order of preference as bar5,
bar1, and finally bar3. The original proposal for this feature
included an 'auto' option which would choose bar5 in this case, but
various drivers have been found that make assumptions about the
properties of the "first" BAR or the size of BARs such that there
appears to be no foolproof automatic selection available, requiring
known good combinations to be sourced from users. This patch is
pre-enabled for an 'auto' selection making use of a validated lookup
table, but no entries are yet identified.
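The rules and the per-BAR evaluation above can be sketched as a small
cost function. This is an illustration only and not the selection logic
in QEMU: enum bar_kind, struct bar_desc, relocation_cost() and
pick_relocation_bar() are hypothetical names, and the 8KB MSI-X footprint
used in the usage note is an assumed value. It ranks BARs purely by
additional MMIO, which is exactly the heuristic the text warns is not
foolproof.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_BARS  6
#define UNUSABLE  UINT64_MAX
#define MAX_BAR32 (1ULL << 31)     /* a 32-bit BAR tops out at 2GB */

enum bar_kind { BAR_EMPTY, BAR_IO, BAR_MEM32, BAR_MEM64, BAR_MEM64_HI };

struct bar_desc {
    enum bar_kind kind;
    uint64_t size;                 /* current size in bytes, 0 if unused */
};

/* Smallest power of two that is >= v. */
static uint64_t pow2_roundup(uint64_t v)
{
    uint64_t p = 1;
    while (p < v) {
        p <<= 1;
    }
    return p;
}

/* Additional MMIO needed to host MSI-X in this BAR, or UNUSABLE. */
static uint64_t relocation_cost(const struct bar_desc *bar,
                                uint64_t msix_size)
{
    assert(msix_size != 0);

    switch (bar->kind) {
    case BAR_EMPTY:
        /* A new BAR only needs to be large enough for MSI-X itself. */
        return pow2_roundup(msix_size);
    case BAR_MEM32:
    case BAR_MEM64: {
        assert(bar->size != 0 && (bar->size & (bar->size - 1)) == 0);
        /* Extending means doubling until the new portion fits MSI-X. */
        uint64_t new_size = bar->size;
        do {
            new_size <<= 1;
        } while (new_size - bar->size < msix_size);
        if (bar->kind == BAR_MEM32 && new_size > MAX_BAR32) {
            return UNUSABLE;
        }
        return new_size - bar->size;
    }
    default:
        /* I/O port BARs and the upper half of a 64-bit BAR. */
        return UNUSABLE;
    }
}

/* Index of the cheapest usable BAR, or -1 if none qualifies. */
static int pick_relocation_bar(const struct bar_desc bars[NUM_BARS],
                               uint64_t msix_size)
{
    int best = -1;
    uint64_t best_cost = UNUSABLE;

    for (int i = 0; i < NUM_BARS; i++) {
        uint64_t cost = relocation_cost(&bars[i], msix_size);
        if (cost < best_cost) {
            best = i;
            best_cost = cost;
        }
    }
    return best;
}
```

Fed the second example device with an assumed 8KB MSI-X footprint, the
costs come out as 8KB for bar5, 64KB for bar1 and 256KB for bar3, which
matches the order of preference given above.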
Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2018-02-06 21:08:26 +03:00
|
|
|
vfio_bars_prepare(vdev);
|
|
|
|
|
2024-05-22 07:40:04 +03:00
|
|
|
if (!vfio_msix_early_setup(vdev, errp)) {
|
2023-10-11 23:09:34 +03:00
|
|
|
goto error;
|
2012-09-26 21:19:32 +04:00
|
|
|
}
|
|
|
|
|
2018-02-06 21:08:25 +03:00
|
|
|
vfio_bars_register(vdev);
|
2012-09-26 21:19:32 +04:00
|
|
|
|
2024-06-05 11:30:40 +03:00
|
|
|
if (!pci_device_set_iommu_device(pdev, vbasedev->hiod, errp)) {
|
|
|
|
error_prepend(errp, "Failed to set iommu_device: ");
|
2012-09-26 21:19:32 +04:00
|
|
|
goto out_teardown;
|
|
|
|
}
|
|
|
|
|
2024-06-05 11:30:40 +03:00
|
|
|
if (!vfio_add_capabilities(vdev, errp)) {
|
|
|
|
goto out_unset_idev;
|
|
|
|
}
|
|
|
|
|
2016-05-26 18:43:21 +03:00
|
|
|
if (vdev->vga) {
|
|
|
|
vfio_vga_quirk_setup(vdev);
|
|
|
|
}
|
|
|
|
|
2016-05-26 18:43:21 +03:00
|
|
|
for (i = 0; i < PCI_ROM_SLOT; i++) {
|
|
|
|
vfio_bar_quirk_setup(vdev, i);
|
|
|
|
}
|
|
|
|
|
2016-05-26 18:43:22 +03:00
|
|
|
if (!vdev->igd_opregion &&
|
|
|
|
vdev->features & VFIO_FEATURE_ENABLE_IGD_OPREGION) {
|
2024-05-22 07:40:09 +03:00
|
|
|
g_autofree struct vfio_region_info *opregion = NULL;
|
2016-05-26 18:43:22 +03:00
|
|
|
|
|
|
|
if (vdev->pdev.qdev.hotplugged) {
|
2016-10-17 19:58:01 +03:00
|
|
|
error_setg(errp,
|
2016-10-17 19:57:56 +03:00
|
|
|
"cannot support IGD OpRegion feature on hotplugged "
|
|
|
|
"device");
|
2024-06-05 11:30:40 +03:00
|
|
|
goto out_unset_idev;
|
2016-05-26 18:43:22 +03:00
|
|
|
}
|
|
|
|
|
2022-05-02 12:42:22 +03:00
|
|
|
ret = vfio_get_dev_region_info(vbasedev,
|
2016-05-26 18:43:22 +03:00
|
|
|
VFIO_REGION_TYPE_PCI_VENDOR_TYPE | PCI_VENDOR_ID_INTEL,
|
|
|
|
VFIO_REGION_SUBTYPE_INTEL_IGD_OPREGION, &opregion);
|
|
|
|
if (ret) {
|
2016-10-17 19:58:01 +03:00
|
|
|
error_setg_errno(errp, -ret,
|
2016-10-17 19:57:56 +03:00
|
|
|
"does not support requested IGD OpRegion feature");
|
2024-06-05 11:30:40 +03:00
|
|
|
goto out_unset_idev;
|
2016-05-26 18:43:22 +03:00
|
|
|
}
|
|
|
|
|
2024-05-22 07:40:10 +03:00
|
|
|
if (!vfio_pci_igd_opregion_init(vdev, opregion, errp)) {
|
2024-06-05 11:30:40 +03:00
|
|
|
goto out_unset_idev;
|
2016-05-26 18:43:22 +03:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2013-04-01 21:50:04 +04:00
|
|
|
/* QEMU emulates all of MSI & MSIX */
|
|
|
|
if (pdev->cap_present & QEMU_PCI_CAP_MSIX) {
|
|
|
|
memset(vdev->emulated_config_bits + pdev->msix_cap, 0xff,
|
|
|
|
MSIX_CAP_LENGTH);
|
|
|
|
}
|
|
|
|
|
|
|
|
if (pdev->cap_present & QEMU_PCI_CAP_MSI) {
|
|
|
|
memset(vdev->emulated_config_bits + pdev->msi_cap, 0xff,
|
|
|
|
vdev->msi_cap_size);
|
|
|
|
}
|
|
|
|
|
2012-09-26 21:19:32 +04:00
|
|
|
if (vfio_pci_read_config(&vdev->pdev, PCI_INTERRUPT_PIN, 1)) {
|
2013-08-21 19:03:08 +04:00
|
|
|
vdev->intx.mmap_timer = timer_new_ms(QEMU_CLOCK_VIRTUAL,
|
vfio-pci: Update slow path INTx algorithm
We can't afford the overhead of switching out and back into mmap mode
around each interrupt, but we can do it lazily via a timer. On INTx
interrupt, disable the mmap'd memory regions and set a timer. On
every interrupt, push the timer out. If the timer expires and the
interrupt is no longer pending, switch back to mmap mode.
This has the benefit that things like graphics cards, which rarely or
never fire an interrupt, don't need manual user intervention to add
the x-intx=off parameter. They'll just remain in mmap mode until they
trigger an interrupt, and if they don't continue to regularly fire
interrupts, they'll switch back.
The default timeout is tuned for network cards so that a ping is just
enough to keep them in non-mmap mode, where they have much better
latency. It is tunable with an experimental option,
x-intx-mmap-timeout-ms. A value of 0 keeps the device in non-mmap
mode after the first interrupt.
It's possible we could look at the class code of devices and come up
with reasonable per-class defaults based on expected interrupt
frequency and latency. None of this is used for MSI interrupts and
also won't be used if we can bypass through KVM.
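The lazy switching described above can be modelled as a small state
machine. This is a simplified sketch under stated assumptions, not the
driver code: struct intx_state and the intx_*() helpers are hypothetical
names, time is passed in explicitly instead of using a QEMU timer, and
the 1000 ms timeout in the usage note is an arbitrary value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the INTx slow-path state. */
struct intx_state {
    bool mmap_enabled;     /* BARs are directly mapped (fast path) */
    bool pending;          /* interrupt asserted and not yet serviced */
    bool timer_armed;
    uint64_t deadline_ms;
    uint64_t timeout_ms;   /* 0 means never switch back to mmap mode */
};

static void intx_init(struct intx_state *s, uint64_t timeout_ms)
{
    assert(s);
    s->mmap_enabled = true;
    s->pending = false;
    s->timer_armed = false;
    s->deadline_ms = 0;
    s->timeout_ms = timeout_ms;
}

/* On every interrupt: leave mmap mode and push the timer out. */
static void intx_interrupt(struct intx_state *s, uint64_t now_ms)
{
    assert(s);
    s->pending = true;
    s->mmap_enabled = false;
    if (s->timeout_ms) {
        s->timer_armed = true;
        s->deadline_ms = now_ms + s->timeout_ms;
    }
}

/* The guest serviced the interrupt (seen via a trapped BAR access). */
static void intx_eoi(struct intx_state *s)
{
    assert(s);
    s->pending = false;
}

/* Timer check: return to mmap mode only once expired and idle. */
static void intx_tick(struct intx_state *s, uint64_t now_ms)
{
    assert(s);
    if (!s->timer_armed || now_ms < s->deadline_ms) {
        return;
    }
    if (s->pending) {
        /* Still pending: stay in trapping mode and try again later. */
        s->deadline_ms = now_ms + s->timeout_ms;
        return;
    }
    s->timer_armed = false;
    s->mmap_enabled = true;
}
```

With a 1000 ms timeout, a device that interrupts once and is then
serviced drops back to mmap mode at the first tick after the deadline,
while a device that keeps interrupting never reaches the deadline and
stays in trapping mode.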
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2012-10-08 18:45:29 +04:00
|
|
|
vfio_intx_mmap_enable, vdev);
|
2019-10-17 03:52:45 +03:00
|
|
|
pci_device_set_intx_routing_notifier(&vdev->pdev,
|
|
|
|
vfio_intx_routing_notifier);
|
2019-10-17 04:38:30 +03:00
|
|
|
vdev->irqchip_change_notifier.notify = vfio_irqchip_change;
|
|
|
|
kvm_irqchip_add_change_notifier(&vdev->irqchip_change_notifier);
|
2024-05-22 07:40:06 +03:00
|
|
|
if (!vfio_intx_enable(vdev, errp)) {
|
2019-10-17 04:38:30 +03:00
|
|
|
goto out_deregister;
|
2012-09-26 21:19:32 +04:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2018-03-13 20:17:29 +03:00
|
|
|
if (vdev->display != ON_OFF_AUTO_OFF) {
|
2024-05-22 07:39:57 +03:00
|
|
|
if (!vfio_display_probe(vdev, errp)) {
|
2019-10-17 04:38:30 +03:00
|
|
|
goto out_deregister;
|
2018-03-13 20:17:29 +03:00
|
|
|
}
|
|
|
|
}
|
2018-10-15 19:52:09 +03:00
|
|
|
if (vdev->enable_ramfb && vdev->dpy == NULL) {
|
|
|
|
error_setg(errp, "ramfb=on requires display=on");
|
2019-10-17 04:38:30 +03:00
|
|
|
goto out_deregister;
|
2018-10-15 19:52:09 +03:00
|
|
|
}
|
2019-03-11 20:14:40 +03:00
|
|
|
if (vdev->display_xres || vdev->display_yres) {
|
|
|
|
if (vdev->dpy == NULL) {
|
|
|
|
error_setg(errp, "xres and yres properties require display=on");
|
2019-10-17 04:38:30 +03:00
|
|
|
goto out_deregister;
|
2019-03-11 20:14:40 +03:00
|
|
|
}
|
|
|
|
if (vdev->dpy->edid_regs == NULL) {
|
|
|
|
error_setg(errp, "xres and yres properties need edid support");
|
2019-10-17 04:38:30 +03:00
|
|
|
goto out_deregister;
|
2019-03-11 20:14:40 +03:00
|
|
|
}
|
|
|
|
}
|
2018-03-13 20:17:29 +03:00
|
|
|
|
2023-10-09 09:32:47 +03:00
|
|
|
if (vdev->ramfb_migrate == ON_OFF_AUTO_ON && !vdev->enable_ramfb) {
|
|
|
|
warn_report("x-ramfb-migrate=on but ramfb=off. "
|
|
|
|
"Forcing x-ramfb-migrate to off.");
|
|
|
|
vdev->ramfb_migrate = ON_OFF_AUTO_OFF;
|
|
|
|
}
|
|
|
|
if (vbasedev->enable_migration == ON_OFF_AUTO_OFF) {
|
|
|
|
if (vdev->ramfb_migrate == ON_OFF_AUTO_AUTO) {
|
|
|
|
vdev->ramfb_migrate = ON_OFF_AUTO_OFF;
|
|
|
|
} else if (vdev->ramfb_migrate == ON_OFF_AUTO_ON) {
|
|
|
|
error_setg(errp, "x-ramfb-migrate requires enable-migration");
|
|
|
|
goto out_deregister;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2020-10-26 12:36:26 +03:00
|
|
|
if (!pdev->failover_pair_id) {
|
2023-07-03 10:15:10 +03:00
|
|
|
if (!vfio_migration_realize(vbasedev, errp)) {
|
vfio/migration: Free resources when vfio_migration_realize fails
When vfio_realize() succeeds, hot unplug will call vfio_exitfn()
to free resources allocated in vfio_realize(); when vfio_realize()
fails, vfio_exitfn() is never called and we need to free resources
in vfio_realize().
In the case that vfio_migration_realize() fails,
e.g. with -only-migratable & enable-migration=off, we see below:
(qemu) device_add vfio-pci,host=81:11.1,id=vfio1,bus=root1,enable-migration=off
0000:81:11.1: Migration disabled
Error: disallowing migration blocker (--only-migratable) for: 0000:81:11.1: Migration is disabled for VFIO device
If we hotplug again we should see the same log as above, but we see:
(qemu) device_add vfio-pci,host=81:11.1,id=vfio1,bus=root1,enable-migration=off
Error: vfio 0000:81:11.1: device is already attached
That's because some references to the VFIO device aren't released.
For resources allocated in vfio_migration_realize(), free them by
jumping to the out_deinit path, which calls a new function,
vfio_migration_deinit(). For resources allocated in vfio_realize(),
free them by jumping to the de-register path in vfio_realize().
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Fixes: a22651053b59 ("vfio: Make vfio-pci device migration capable")
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2023-07-03 10:15:08 +03:00
|
|
|
goto out_deregister;
|
2020-10-26 12:36:26 +03:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2013-07-16 01:49:49 +04:00
|
|
|
vfio_register_err_notifier(vdev);
|
2015-03-02 21:38:55 +03:00
|
|
|
vfio_register_req_notifier(vdev);
|
2015-09-23 22:04:49 +03:00
|
|
|
vfio_setup_resetfn_quirk(vdev);
|
2013-04-01 23:35:24 +04:00
|
|
|
|
2016-10-17 19:58:01 +03:00
|
|
|
return;
|
2012-09-26 21:19:32 +04:00
|
|
|
|
2019-10-17 04:38:30 +03:00
|
|
|
out_deregister:
|
2023-07-03 10:15:06 +03:00
|
|
|
if (vdev->interrupt == VFIO_INT_INTx) {
|
|
|
|
vfio_intx_disable(vdev);
|
|
|
|
}
|
2012-09-26 21:19:32 +04:00
|
|
|
pci_device_set_intx_routing_notifier(&vdev->pdev, NULL);
|
vfio/pci: Fix a segfault in vfio_realize
The kvm irqchip notifier is only registered if the device supports
INTx, however it's unconditionally removed in vfio realize error
path. If the assigned device does not support INTx, this will cause
QEMU to crash when vfio realize fails. Change it to conditionally
remove the notifier only if the notify hook is set up.
Before fix:
(qemu) device_add vfio-pci,host=81:11.1,id=vfio1,bus=root1,xres=1
Connection closed by foreign host.
After fix:
(qemu) device_add vfio-pci,host=81:11.1,id=vfio1,bus=root1,xres=1
Error: vfio 0000:81:11.1: xres and yres properties require display=on
(qemu)
Fixes: c5478fea27ac ("vfio/pci: Respond to KVM irqchip change notifier")
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2023-06-29 11:40:38 +03:00
|
|
|
if (vdev->irqchip_change_notifier.notify) {
    kvm_irqchip_remove_change_notifier(&vdev->irqchip_change_notifier);
}
|
2023-06-29 11:40:39 +03:00
|
|
|
if (vdev->intx.mmap_timer) {
    timer_free(vdev->intx.mmap_timer);
}
|
2024-06-05 11:30:40 +03:00
|
|
|
out_unset_idev:
    pci_device_unset_iommu_device(pdev);
|
2019-10-17 04:38:30 +03:00
|
|
|
out_teardown:
|
2012-09-26 21:19:32 +04:00
|
|
|
vfio_teardown_msi(vdev);
|
2016-03-10 19:39:08 +03:00
|
|
|
vfio_bars_exit(vdev);
|
2016-10-17 19:57:56 +03:00
|
|
|
error:
|
2022-05-02 12:42:22 +03:00
|
|
|
error_prepend(errp, VFIO_MSG_PREFIX, vbasedev->name);
|
2015-02-10 20:25:44 +03:00
|
|
|
}
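The error path above removes the irqchip change notifier only when its `notify` hook is set, because the notifier is registered only for devices that support INTx. A minimal standalone sketch of that guard follows. The `Notifier` and `NotifierList` types and every helper name here are hypothetical stand-ins written for illustration, not QEMU's notifier API.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a notifier; not QEMU's Notifier type. */
typedef struct Notifier Notifier;
struct Notifier {
    void (*notify)(Notifier *n, void *data); /* NULL until registered */
    Notifier *prev, *next;                    /* circular list links */
};

typedef struct {
    Notifier head; /* sentinel node of a circular doubly linked list */
} NotifierList;

static void notifier_list_init(NotifierList *l)
{
    l->head.notify = NULL;
    l->head.prev = l->head.next = &l->head;
}

/* Register: set the hook and link the node right after the sentinel. */
static void notifier_add(NotifierList *l, Notifier *n,
                         void (*fn)(Notifier *, void *))
{
    n->notify = fn;
    n->next = l->head.next;
    n->prev = &l->head;
    l->head.next->prev = n;
    l->head.next = n;
}

/*
 * Unlinking a notifier that was never registered would dereference its
 * NULL links, so check the hook first. Returns true if it was removed.
 */
static bool notifier_remove_if_registered(Notifier *n)
{
    if (!n->notify) {
        return false;
    }
    n->prev->next = n->next;
    n->next->prev = n->prev;
    n->prev = n->next = NULL;
    n->notify = NULL;
    return true;
}

static bool notifier_list_empty(const NotifierList *l)
{
    return l->head.next == &l->head;
}

/* Placeholder callback used only to mark a notifier as registered. */
static void dummy_notify(Notifier *n, void *data)
{
    (void)n;
    (void)data;
}
```

Clearing the hook on removal makes a second call a harmless no-op. That is a choice of this sketch, not something the source states.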
|
|
|
|
|
|
|
|
static void vfio_instance_finalize(Object *obj)
{
|
2020-09-03 01:43:03 +03:00
|
|
|
VFIOPCIDevice *vdev = VFIO_PCI(obj);
|
2015-02-10 20:25:44 +03:00
|
|
|
|
2018-03-13 20:17:29 +03:00
|
|
|
vfio_display_finalize(vdev);
|
2016-03-10 19:39:08 +03:00
|
|
|
vfio_bars_finalize(vdev);
|
2013-04-01 21:50:04 +04:00
|
|
|
g_free(vdev->emulated_config_bits);
|
2015-02-10 20:25:44 +03:00
|
|
|
g_free(vdev->rom);
|
vfio/pci: Intel graphics legacy mode assignment
Enable quirks to support SandyBridge and newer IGD devices as primary
VM graphics. This requires new vfio-pci device specific regions added
in kernel v4.6 to expose the IGD OpRegion, the shadow ROM, and config
space access to the PCI host bridge and LPC/ISA bridge. VM firmware
support, SeaBIOS only so far, is also required for reserving memory
regions for IGD specific use. In order to enable this mode, IGD must
be assigned to the VM at PCI bus address 00:02.0, it must have a ROM,
it must be able to enable VGA, it must have or be able to create on
its own an LPC/ISA bridge of the proper type at PCI bus address
00:1f.0 (sorry, not compatible with Q35 yet), and it must have the
above noted vfio-pci kernel features and BIOS. The intention is that
to enable this mode, a user simply needs to assign 00:02.0 from the
host to 00:02.0 in the VM:
-device vfio-pci,host=0000:00:02.0,bus=pci.0,addr=02.0
and everything either happens automatically or it doesn't. In the
case that it doesn't, we leave error reports, but assume the device
will operate in universal passthrough mode (UPT), which doesn't
require any of this, but has a much more narrow window of supported
devices, supported use cases, and supported guest drivers.
When using IGD in this mode, the VM firmware is required to reserve
some VM RAM for the OpRegion (on the order of several 4k pages) and
stolen memory for the GTT (up to 8MB for the latest GPUs). An
additional option, x-igd-gms allows the user to specify some amount
of additional memory (value is number of 32MB chunks up to 512MB) that
is pre-allocated for graphics use. TBH, I don't know of anything that
requires this or makes use of this memory, which is why we don't
allocate any by default, but the specification suggests this is not
actually a valid combination, so the option exists as a workaround.
Please report if it's actually necessary in some environment.
See code comments for further discussion about the actual operation
of the quirks necessary to assign these devices.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Gerd Hoffmann <kraxel@redhat.com>
Tested-by: Gerd Hoffmann <kraxel@redhat.com>
2016-05-26 18:43:21 +03:00
|
|
|
/*
 * XXX Leaking igd_opregion is not an oversight, we can't remove the
 * fw_cfg entry therefore leaking this allocation seems like the safest
 * option.
 *
 * g_free(vdev->igd_opregion);
 */
|
2023-09-22 05:52:23 +03:00
|
|
|
vfio_pci_put_device(vdev);
|
2012-09-26 21:19:32 +04:00
|
|
|
}
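The IGD commit message quoted in this function describes `x-igd-gms` as a count of 32MB chunks, capped at 512MB. The arithmetic can be sketched as below. The helper name and constants are hypothetical and come only from that description; this is not the quirk code itself, which may encode the value differently.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values taken from the commit message; names are illustrative only. */
#define IGD_GMS_CHUNK_BYTES (32ULL * 1024 * 1024)   /* one unit = 32MB */
#define IGD_GMS_MAX_BYTES   (512ULL * 1024 * 1024)  /* 512MB ceiling */

/*
 * Convert an x-igd-gms style chunk count to a byte size.
 * Returns false (leaving *bytes untouched) if the result exceeds 512MB.
 */
static bool igd_gms_to_bytes(uint32_t gms, uint64_t *bytes)
{
    uint64_t size = (uint64_t)gms * IGD_GMS_CHUNK_BYTES;

    if (size > IGD_GMS_MAX_BYTES) {
        return false;
    }
    *bytes = size;
    return true;
}
```

Under these assumptions, values 0 through 16 are accepted and 17 or more are rejected.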
|
|
|
|
|
|
|
|
static void vfio_exitfn(PCIDevice *pdev)
{
|
2020-09-03 01:43:03 +03:00
|
|
|
VFIOPCIDevice *vdev = VFIO_PCI(pdev);
|
2024-06-05 11:30:40 +03:00
|
|
|
VFIODevice *vbasedev = &vdev->vbasedev;
|
2012-09-26 21:19:32 +04:00
|
|
|
|
2015-03-02 21:38:55 +03:00
|
|
|
vfio_unregister_req_notifier(vdev);
|
2013-07-16 01:49:49 +04:00
|
|
|
vfio_unregister_err_notifier(vdev);
|
2012-09-26 21:19:32 +04:00
|
|
|
pci_device_set_intx_routing_notifier(&vdev->pdev, NULL);
|
2020-01-06 23:34:45 +03:00
|
|
|
if (vdev->irqchip_change_notifier.notify) {
    kvm_irqchip_remove_change_notifier(&vdev->irqchip_change_notifier);
}
|
2012-09-26 21:19:32 +04:00
|
|
|
vfio_disable_interrupts(vdev);
|
vfio-pci: Update slow path INTx algorithm
We can't afford the overhead of switching out and back into mmap mode
around each interrupt, but we can do it lazily via a timer. On INTx
interrupt, disable the mmap'd memory regions and set a timer. On
every interrupt, push the timer out. If the timer expires and the
interrupt is no longer pending, switch back to mmap mode.
This has the benefit that things like graphics cards, which rarely or
never fire an interrupt, don't need manual user intervention to add
the x-intx=off parameter. They'll just remain in mmap mode until they
trigger an interrupt, and if they don't continue to regularly fire
interrupts, they'll switch back.
The default timeout is tuned for network cards so that a ping is just
enough to keep them in non-mmap mode, where they have much better
latency. It is tunable with an experimental option,
x-intx-mmap-timeout-ms. A value of 0 keeps the device in non-mmap
mode after the first interrupt.
It's possible we could look at the class code of devices and come up
with reasonable per-class defaults based on expected interrupt
frequency and latency. None of this is used for MSI interrupts and
also won't be used if we can bypass through KVM.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2012-10-08 18:45:29 +04:00
|
|
|
if (vdev->intx.mmap_timer) {
|
2013-08-21 19:03:08 +04:00
|
|
|
timer_free(vdev->intx.mmap_timer);
|
2012-10-08 18:45:29 +04:00
|
|
|
}
|
2012-09-26 21:19:32 +04:00
|
|
|
vfio_teardown_msi(vdev);
|
2023-05-27 02:15:58 +03:00
|
|
|
vfio_pci_disable_rp_atomics(vdev);
|
2016-03-10 19:39:08 +03:00
|
|
|
vfio_bars_exit(vdev);
|
2024-06-05 11:30:40 +03:00
|
|
|
vfio_migration_exit(vbasedev);
pci_device_unset_iommu_device(pdev);
|
2012-09-26 21:19:32 +04:00
|
|
|
}
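The `intx.mmap_timer` freed in `vfio_exitfn` belongs to the lazy INTx scheme from the "Update slow path INTx algorithm" commit message: an interrupt disables mmap mode and pushes a timer out, and mmap mode returns only when the timer expires with no interrupt pending. The following is a small simulation of that state machine, not the driver code. `IntxSim` and its helpers are hypothetical, the clock is passed in by the caller, and re-arming the timer when an interrupt is still pending is an assumption of this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the lazy INTx/mmap switch; not QEMU's structs. */
typedef struct {
    bool mmap_enabled;    /* true: BARs mapped directly (fast path) */
    bool pending;         /* INTx asserted and not yet serviced */
    bool timer_armed;
    uint64_t deadline_ms;
    uint32_t timeout_ms;  /* 0: stay in non-mmap mode after first INTx */
} IntxSim;

static void intx_sim_init(IntxSim *s, uint32_t timeout_ms)
{
    s->mmap_enabled = true;
    s->pending = false;
    s->timer_armed = false;
    s->deadline_ms = 0;
    s->timeout_ms = timeout_ms;
}

/* On every interrupt: leave mmap mode and push the timer out. */
static void intx_sim_interrupt(IntxSim *s, uint64_t now_ms)
{
    s->pending = true;
    s->mmap_enabled = false;
    if (s->timeout_ms) {
        s->timer_armed = true;
        s->deadline_ms = now_ms + s->timeout_ms;
    }
}

/* A trapped BAR access is taken as the guest servicing the interrupt. */
static void intx_sim_bar_access(IntxSim *s)
{
    s->pending = false;
}

/* Timer check: return to mmap mode only if nothing is pending. */
static void intx_sim_tick(IntxSim *s, uint64_t now_ms)
{
    if (!s->timer_armed || now_ms < s->deadline_ms) {
        return;
    }
    if (s->pending) {
        /* Assumption: try again one timeout later. */
        s->deadline_ms = now_ms + s->timeout_ms;
        return;
    }
    s->timer_armed = false;
    s->mmap_enabled = true;
}
```

With a timeout of 1100 ms, a device that interrupts more often than that never leaves the trapping mode, while a device that goes quiet drifts back to the mmap fast path.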
|
|
|
|
|
|
|
|
static void vfio_pci_reset(DeviceState *dev)
{
|
2020-09-03 01:43:03 +03:00
|
|
|
VFIOPCIDevice *vdev = VFIO_PCI(dev);
|
2012-09-26 21:19:32 +04:00
|
|
|
|
2014-12-22 19:54:49 +03:00
|
|
|
trace_vfio_pci_reset(vdev->vbasedev.name);
|
2012-10-08 18:45:30 +04:00
|
|
|
|
2013-10-02 23:51:00 +04:00
|
|
|
vfio_pci_pre_reset(vdev);
|
2013-04-01 23:35:08 +04:00
|
|
|
|
2018-04-27 12:11:06 +03:00
|
|
|
if (vdev->display != ON_OFF_AUTO_OFF) {
    vfio_display_reset(vdev);
}
|
|
|
|
|
2015-04-28 20:14:02 +03:00
|
|
|
if (vdev->resetfn && !vdev->resetfn(vdev)) {
    goto post_reset;
}
|
|
|
|
|
2014-12-22 19:54:35 +03:00
|
|
|
if (vdev->vbasedev.reset_works &&
|
|
|
|
(vdev->has_flr || !vdev->has_pm_reset) &&
|
2014-12-20 01:24:31 +03:00
|
|
|
!ioctl(vdev->vbasedev.fd, VFIO_DEVICE_RESET)) {
|
2014-12-22 19:54:49 +03:00
|
|
|
trace_vfio_pci_reset_flr(vdev->vbasedev.name);
|
2013-10-02 23:51:00 +04:00
|
|
|
goto post_reset;
|
2013-04-01 23:35:08 +04:00
|
|
|
}
|
|
|
|
|
2013-10-02 23:51:00 +04:00
|
|
|
/* See if we can do our own bus reset */
if (!vfio_pci_hot_reset_one(vdev)) {
    goto post_reset;
}
|
2012-10-08 18:45:30 +04:00
|
|
|
|
2013-10-02 23:51:00 +04:00
|
|
|
/* If nothing else works and the device supports PM reset, use it */
|
2014-12-22 19:54:35 +03:00
|
|
|
if (vdev->vbasedev.reset_works && vdev->has_pm_reset &&
|
2014-12-20 01:24:31 +03:00
|
|
|
!ioctl(vdev->vbasedev.fd, VFIO_DEVICE_RESET)) {
|
2014-12-22 19:54:49 +03:00
|
|
|
trace_vfio_pci_reset_pm(vdev->vbasedev.name);
|
2013-10-02 23:51:00 +04:00
|
|
|
goto post_reset;
|
2012-09-26 21:19:32 +04:00
|
|
|
}
|
2012-10-08 18:45:30 +04:00
|
|
|
|
2013-10-02 23:51:00 +04:00
|
|
|
post_reset:
    vfio_pci_post_reset(vdev);
|
2012-09-26 21:19:32 +04:00
|
|
|
}
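`vfio_pci_reset` above tries reset methods in a fixed order and jumps to `post_reset` at the first success: a device-specific `resetfn`, then the kernel device reset when the device has FLR or lacks PM reset, then a hot (bus) reset, and finally the kernel device reset again as a PM reset. A standalone sketch of that ordering follows. `ResetCaps`, `ResetKind`, and `pick_reset` are hypothetical, and the `*_ok` booleans stand in for the real calls' return values.

```c
#include <stdbool.h>

typedef enum {
    RESET_NONE,
    RESET_DEVICE_SPECIFIC,
    RESET_FLR,  /* named after the trace point used on this path */
    RESET_HOT,
    RESET_PM
} ResetKind;

/* Hypothetical capability and outcome model; not QEMU's structures. */
typedef struct {
    bool has_resetfn;     /* a device-specific reset hook exists */
    bool resetfn_ok;      /* ...and it would succeed */
    bool reset_works;     /* kernel reports the device reset as usable */
    bool has_flr;
    bool has_pm_reset;
    bool device_reset_ok; /* stand-in for the device reset ioctl result */
    bool hot_reset_ok;    /* stand-in for the hot reset result */
} ResetCaps;

/* Return the first method that would succeed, in the order used above. */
static ResetKind pick_reset(const ResetCaps *c)
{
    if (c->has_resetfn && c->resetfn_ok) {
        return RESET_DEVICE_SPECIFIC;
    }
    if (c->reset_works && (c->has_flr || !c->has_pm_reset) &&
        c->device_reset_ok) {
        return RESET_FLR;
    }
    if (c->hot_reset_ok) {
        return RESET_HOT;
    }
    /* If nothing else works and the device supports PM reset, use it. */
    if (c->reset_works && c->has_pm_reset && c->device_reset_ok) {
        return RESET_PM;
    }
    return RESET_NONE;
}
```

The ordering means a device whose only function-level reset is a PM reset gets a bus reset first when one is possible, which matches the comment in the source.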
|
|
|
|
|
2014-10-07 12:00:25 +04:00
|
|
|
static void vfio_instance_init(Object *obj)
{
PCIDevice *pci_dev = PCI_DEVICE(obj);
|
2020-09-03 01:43:03 +03:00
|
|
|
VFIOPCIDevice *vdev = VFIO_PCI(obj);
|
2023-11-21 11:44:21 +03:00
|
|
|
VFIODevice *vbasedev = &vdev->vbasedev;
|
2014-10-07 12:00:25 +04:00
|
|
|
|
|
|
|
device_add_bootindex_property(obj, &vdev->bootindex,
|
|
|
|
"bootindex", NULL,
|
2020-05-05 18:29:23 +03:00
|
|
|
&pci_dev->qdev);
|
2016-10-17 19:58:02 +03:00
|
|
|
vdev->host.domain = ~0U;
vdev->host.bus = ~0U;
vdev->host.slot = ~0U;
vdev->host.function = ~0U;
|
2023-11-21 11:44:21 +03:00
|
|
|
|
2023-11-21 11:44:25 +03:00
|
|
|
vfio_device_init(vbasedev, VFIO_DEVICE_TYPE_PCI, &vfio_pci_ops,
                 DEVICE(vdev), false);
|
2017-08-30 01:05:47 +03:00
|
|
|
|
|
|
|
vdev->nv_gpudirect_clique = 0xFF;
|
2018-01-16 15:34:56 +03:00
|
|
|
|
|
|
|
/* QEMU_PCI_CAP_EXPRESS initialization does not depend on QEMU command
 * line, therefore, no need to wait to realize like other devices */
pci_dev->cap_present |= QEMU_PCI_CAP_EXPRESS;
|
2014-10-07 12:00:25 +04:00
|
|
|
}
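`vfio_instance_init` fills every field of `vdev->host` with `~0U`, which works as a "not provided" sentinel: if all four fields still hold that value later, no `host=` address was given. A minimal sketch of such a check, plus formatting a set address as a sysfs path, is shown below. `HostAddr` and both helpers are hypothetical, and the path layout is an assumption based on the `/sys/bus/pci/devices/` examples in the commit messages.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical mirror of the four host address fields. */
typedef struct {
    uint32_t domain, bus, slot, function;
} HostAddr;

/* Mark the address as "not provided", as the init function does. */
static void host_addr_clear(HostAddr *h)
{
    h->domain = h->bus = h->slot = h->function = ~0U;
}

/* The address counts as provided if any field left the sentinel value. */
static bool host_addr_is_set(const HostAddr *h)
{
    return h->domain != UINT32_MAX || h->bus != UINT32_MAX ||
           h->slot != UINT32_MAX || h->function != UINT32_MAX;
}

/* Format domain:bus:slot.function under sysfs; 0 on success, -1 on error. */
static int host_addr_to_sysfs(const HostAddr *h, char *buf, size_t len)
{
    int n;

    if (!host_addr_is_set(h)) {
        return -1;
    }
    n = snprintf(buf, len, "/sys/bus/pci/devices/%04x:%02x:%02x.%01x",
                 (unsigned)h->domain, (unsigned)h->bus,
                 (unsigned)h->slot, (unsigned)h->function);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Using all-ones as the sentinel keeps `0000:00:00.0` available as a legitimate address, which a zero-initialized struct could not distinguish from "unset".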
|
|
|
|
|
2012-09-26 21:19:32 +04:00
|
|
|
static Property vfio_pci_dev_properties[] = {
|
2014-12-20 01:24:15 +03:00
|
|
|
DEFINE_PROP_PCI_HOST_DEVADDR("host", VFIOPCIDevice, host),
|
2023-03-20 10:35:22 +03:00
|
|
|
DEFINE_PROP_UUID_NODEFAULT("vf-token", VFIOPCIDevice, vf_token),
|
vfio: Add sysfsdev property for pci & platform
vfio-pci currently requires a host= parameter, which comes in the
form of a PCI address in [domain:]<bus:slot.function> notation. We
expect to find a matching entry in sysfs for that under
/sys/bus/pci/devices/. vfio-platform takes a similar approach, but
defines the host= parameter to be a string, which can be matched
directly under /sys/bus/platform/devices/. On the PCI side, we have
some interest in using vfio to expose vGPU devices. These are not
actual discrete PCI devices, so they don't have a compatible host PCI
bus address or a device link where QEMU wants to look for it. There's
also really no requirement that vfio can only be used to expose
physical devices, a new vfio bus and iommu driver could expose a
completely emulated device. To fit within the vfio framework, it
would need a kernel struct device and associated IOMMU group, but
those are easy constraints to manage.
To support such devices, which would include vGPUs, that honor the
VFIO PCI programming API, but are not necessarily backed by a unique
PCI address, add support for specifying any device in sysfs. The
vfio API already has support for probing the device type to ensure
compatibility with either vfio-pci or vfio-platform.
With this, a vfio-pci device could either be specified as:
-device vfio-pci,host=02:00.0
or
-device vfio-pci,sysfsdev=/sys/devices/pci0000:00/0000:00:1c.0/0000:02:00.0
or even
-device vfio-pci,sysfsdev=/sys/bus/pci/devices/0000:02:00.0
When vGPU support comes along, this might look something more like:
-device vfio-pci,sysfsdev=/sys/devices/virtual/intel-vgpu/vgpu0@0000:00:02.0
NB - This is only a made up example path
The same change is made for vfio-platform; specifying sysfsdev takes
precedence over the old host option.
Tested-by: Eric Auger <eric.auger@linaro.org>
Reviewed-by: Eric Auger <eric.auger@linaro.org>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-03-10 19:39:07 +03:00
|
|
|
DEFINE_PROP_STRING("sysfsdev", VFIOPCIDevice, vbasedev.sysfsdev),
|
2020-11-23 17:23:19 +03:00
|
|
|
DEFINE_PROP_ON_OFF_AUTO("x-pre-copy-dirty-page-tracking", VFIOPCIDevice,
|
|
|
|
vbasedev.pre_copy_dirty_page_tracking,
|
|
|
|
ON_OFF_AUTO_ON),
|
2018-03-13 20:17:29 +03:00
|
|
|
DEFINE_PROP_ON_OFF_AUTO("display", VFIOPCIDevice,
|
2018-06-05 17:23:18 +03:00
|
|
|
display, ON_OFF_AUTO_OFF),
|
2019-03-11 20:14:40 +03:00
|
|
|
DEFINE_PROP_UINT32("xres", VFIOPCIDevice, display_xres, 0),
|
|
|
|
DEFINE_PROP_UINT32("yres", VFIOPCIDevice, display_yres, 0),
|
2014-12-20 01:24:15 +03:00
|
|
|
DEFINE_PROP_UINT32("x-intx-mmap-timeout-ms", VFIOPCIDevice,
|
vfio-pci: Update slow path INTx algorithm
We can't afford the overhead of switching out and back into mmap mode
around each interrupt, but we can do it lazily via a timer. On INTx
interrupt, disable the mmap'd memory regions and set a timer. On
every interrupt, push the timer out. If the timer expires and the
interrupt is no longer pending, switch back to mmap mode.
This has the benefit that things like graphics cards, which rarely or
never fire an interrupt, don't need manual user intervention to add
the x-intx=off parameter. They'll just remain in mmap mode until they
trigger an interrupt, and if they don't continue to regularly fire
interrupts, they'll switch back.
The default timeout is tuned for network cards so that a ping is just
enough to keep them in non-mmap mode, where they have much better
latency. It is tunable with an experimental option,
x-intx-mmap-timeout-ms. A value of 0 keeps the device in non-mmap
mode after the first interrupt.
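The lazy switch described above can be sketched as a small state machine. This is an illustration under stated assumptions, not the driver's implementation: the struct, field names and function names are hypothetical, time is passed in explicitly rather than read from a timer subsystem, and re-arming the timer while the interrupt is still pending is an assumption about what "no longer pending" implies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-device state; names are illustrative only. */
struct intx_state {
    bool mmap_enabled;     /* BARs directly mapped (fast path) */
    bool pending;          /* interrupt asserted, not yet serviced */
    bool timer_armed;
    uint64_t deadline_ms;  /* meaningful only while timer_armed */
    uint32_t timeout_ms;   /* x-intx-mmap-timeout-ms; 0 = never switch back */
};

/* INTx fired: leave mmap mode and push the timer out. */
void intx_interrupt(struct intx_state *s, uint64_t now_ms)
{
    assert(s != NULL);
    s->pending = true;
    s->mmap_enabled = false;
    if (s->timeout_ms != 0) {
        s->timer_armed = true;
        s->deadline_ms = now_ms + s->timeout_ms;
    }
}

/* The guest serviced the interrupt (seen via a trapped BAR access). */
void intx_serviced(struct intx_state *s)
{
    assert(s != NULL);
    s->pending = false;
}

/* Timer check: return to mmap mode only if expired and nothing is pending. */
void intx_timer_check(struct intx_state *s, uint64_t now_ms)
{
    assert(s != NULL);
    if (!s->timer_armed || now_ms < s->deadline_ms) {
        return;
    }
    if (s->pending) {
        /* Assumption: stay in trapped mode and try again later. */
        s->deadline_ms = now_ms + s->timeout_ms;
        return;
    }
    s->timer_armed = false;
    s->mmap_enabled = true;
}
```

With a timeout of 0 the timer is never armed, so the device stays in non-mmap mode after its first interrupt, as the message describes.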
It's possible we could look at the class code of devices and come up
with reasonable per-class defaults based on expected interrupt
frequency and latency. None of this is used for MSI interrupts and
also won't be used if we can bypass through KVM.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2012-10-08 18:45:29 +04:00
|
|
|
intx.mmap_timeout, 1100),
|
2014-12-20 01:24:15 +03:00
|
|
|
DEFINE_PROP_BIT("x-vga", VFIOPCIDevice, features,
|
2013-04-01 23:33:44 +04:00
|
|
|
VFIO_FEATURE_ENABLE_VGA_BIT, false),
|
2015-03-02 21:38:55 +03:00
|
|
|
DEFINE_PROP_BIT("x-req", VFIOPCIDevice, features,
|
|
|
|
VFIO_FEATURE_ENABLE_REQ_BIT, true),
|
2016-05-26 18:43:22 +03:00
|
|
|
DEFINE_PROP_BIT("x-igd-opregion", VFIOPCIDevice, features,
|
|
|
|
VFIO_FEATURE_ENABLE_IGD_OPREGION_BIT, false),
|
2023-06-28 10:31:12 +03:00
|
|
|
DEFINE_PROP_ON_OFF_AUTO("enable-migration", VFIOPCIDevice,
|
|
|
|
vbasedev.enable_migration, ON_OFF_AUTO_AUTO),
|
2024-05-15 16:21:36 +03:00
|
|
|
DEFINE_PROP_BOOL("migration-events", VFIOPCIDevice,
|
|
|
|
vbasedev.migration_events, false),
|
2015-09-23 22:04:44 +03:00
|
|
|
DEFINE_PROP_BOOL("x-no-mmap", VFIOPCIDevice, vbasedev.no_mmap, false),
|
2018-08-17 18:27:16 +03:00
|
|
|
DEFINE_PROP_BOOL("x-balloon-allowed", VFIOPCIDevice,
|
2020-06-26 10:22:30 +03:00
|
|
|
vbasedev.ram_block_discard_allowed, false),
|
2015-09-23 22:04:44 +03:00
|
|
|
DEFINE_PROP_BOOL("x-no-kvm-intx", VFIOPCIDevice, no_kvm_intx, false),
|
|
|
|
DEFINE_PROP_BOOL("x-no-kvm-msi", VFIOPCIDevice, no_kvm_msi, false),
|
|
|
|
DEFINE_PROP_BOOL("x-no-kvm-msix", VFIOPCIDevice, no_kvm_msix, false),
|
2018-02-06 21:08:27 +03:00
|
|
|
DEFINE_PROP_BOOL("x-no-geforce-quirks", VFIOPCIDevice,
|
|
|
|
no_geforce_quirks, false),
|
vfio/quirks: ioeventfd quirk acceleration
The NVIDIA BAR0 quirks virtualize the PCI config space mirrors found
in device MMIO space. Normally PCI config space is considered a slow
path and further optimization is unnecessary, however NVIDIA uses a
register here to enable the MSI interrupt to re-trigger. Exiting to
QEMU for this MSI-ACK handling can therefore rate limit our interrupt
handling. Fortunately the MSI-ACK write is easily detected since the
quirk MemoryRegion otherwise has very few accesses, so simply looking
for consecutive writes with the same data is sufficient, in this case
10 consecutive writes with the same data and size is arbitrarily
chosen. We configure the KVM ioeventfd with data match, so there's
no risk of triggering for the wrong data or size, but we do risk that
pathological driver behavior might consume all of QEMU's file
descriptors, so we cap ourselves to 10 ioeventfds for this purpose.
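The detection heuristic can be sketched as below. It is a simplified model, not the quirk code itself: the tracker struct and function name are hypothetical, matching on the write address is an added assumption (the message only mentions data and size), and "registering an ioeventfd" is reduced to a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define HITS_FOR_IOEVENTFD 10  /* consecutive identical writes, per the text */
#define MAX_IOEVENTFDS     10  /* cap on ioeventfds, per the text */

/* Hypothetical tracker; not the structure QEMU uses. */
struct write_tracker {
    uint64_t addr;
    uint64_t data;
    unsigned size;
    unsigned hits;              /* length of the current identical run */
    unsigned ioeventfds_added;  /* how many have been set up so far */
};

/*
 * Record one trapped write. Returns true when this write completes a run
 * of HITS_FOR_IOEVENTFD identical writes and the cap is not yet reached,
 * i.e. the caller would now set up an ioeventfd with data match.
 */
bool track_write(struct write_tracker *t, uint64_t addr, uint64_t data,
                 unsigned size)
{
    assert(t != NULL);

    if (t->hits > 0 && t->addr == addr && t->data == data &&
        t->size == size) {
        t->hits++;
    } else {
        /* Different write: start a new run. */
        t->addr = addr;
        t->data = data;
        t->size = size;
        t->hits = 1;
    }

    if (t->hits == HITS_FOR_IOEVENTFD &&
        t->ioeventfds_added < MAX_IOEVENTFDS) {
        t->ioeventfds_added++;
        t->hits = 0;
        return true;
    }
    return false;
}
```

Once the cap is reached the function keeps counting but never reports another candidate, which mirrors the file-descriptor safeguard described above.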
In support of the above, generic ioeventfd infrastructure is added
for vfio quirks. This automatically initializes an ioeventfd list
per quirk, disables and frees ioeventfds on exit, and allows
ioeventfds marked as dynamic to be dropped on device reset. The
rationale for this latter feature is that useful ioeventfds may
depend on specific driver behavior and since we necessarily place a
cap on our use of ioeventfds, a machine reset is a reasonable point
at which to assume a new driver and re-profile.
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2018-06-05 17:23:17 +03:00
|
|
|
DEFINE_PROP_BOOL("x-no-kvm-ioeventfd", VFIOPCIDevice, no_kvm_ioeventfd,
|
|
|
|
false),
|
2018-06-05 17:23:17 +03:00
|
|
|
DEFINE_PROP_BOOL("x-no-vfio-ioeventfd", VFIOPCIDevice, no_vfio_ioeventfd,
|
|
|
|
false),
|
2015-09-23 22:04:49 +03:00
|
|
|
DEFINE_PROP_UINT32("x-pci-vendor-id", VFIOPCIDevice, vendor_id, PCI_ANY_ID),
|
|
|
|
DEFINE_PROP_UINT32("x-pci-device-id", VFIOPCIDevice, device_id, PCI_ANY_ID),
|
|
|
|
DEFINE_PROP_UINT32("x-pci-sub-vendor-id", VFIOPCIDevice,
|
|
|
|
sub_vendor_id, PCI_ANY_ID),
|
|
|
|
DEFINE_PROP_UINT32("x-pci-sub-device-id", VFIOPCIDevice,
|
|
|
|
sub_device_id, PCI_ANY_ID),
|
vfio/pci: Intel graphics legacy mode assignment
Enable quirks to support SandyBridge and newer IGD devices as primary
VM graphics. This requires new vfio-pci device specific regions added
in kernel v4.6 to expose the IGD OpRegion, the shadow ROM, and config
space access to the PCI host bridge and LPC/ISA bridge. VM firmware
support, SeaBIOS only so far, is also required for reserving memory
regions for IGD specific use. In order to enable this mode, IGD must
be assigned to the VM at PCI bus address 00:02.0, it must have a ROM,
it must be able to enable VGA, it must have or be able to create on
its own an LPC/ISA bridge of the proper type at PCI bus address
00:1f.0 (sorry, not compatible with Q35 yet), and it must have the
above noted vfio-pci kernel features and BIOS. The intention is that
to enable this mode, a user simply needs to assign 00:02.0 from the
host to 00:02.0 in the VM:
-device vfio-pci,host=0000:00:02.0,bus=pci.0,addr=02.0
and everything either happens automatically or it doesn't. In the
case that it doesn't, we leave error reports, but assume the device
will operate in universal passthrough mode (UPT), which doesn't
require any of this, but has a much more narrow window of supported
devices, supported use cases, and supported guest drivers.
When using IGD in this mode, the VM firmware is required to reserve
some VM RAM for the OpRegion (on the order of several 4k pages) and
stolen memory for the GTT (up to 8MB for the latest GPUs). An
additional option, x-igd-gms, allows the user to specify some amount
of additional memory (value is number of 32MB chunks up to 512MB) that
is pre-allocated for graphics use. TBH, I don't know of anything that
requires this or makes use of this memory, which is why we don't
allocate any by default, but the specification suggests this is not
actually a valid combination, so the option exists as a workaround.
Please report if it's actually necessary in some environment.
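The x-igd-gms arithmetic is simple enough to show directly. The helper below is hypothetical and only encodes what the message states: each unit is 32MB and the total may not exceed 512MB, so valid values run from 0 to 16.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define IGD_GMS_CHUNK_BYTES (UINT64_C(32) << 20)   /* one unit = 32MB */
#define IGD_GMS_MAX_BYTES   (UINT64_C(512) << 20)  /* upper bound = 512MB */

/*
 * Hypothetical helper: convert an x-igd-gms value to a byte count.
 * Returns false (leaving *bytes untouched) if the value exceeds 512MB.
 */
bool igd_gms_to_bytes(uint32_t gms, uint64_t *bytes)
{
    uint64_t total = (uint64_t)gms * IGD_GMS_CHUNK_BYTES;

    assert(bytes != NULL);
    if (total > IGD_GMS_MAX_BYTES) {
        return false;
    }
    *bytes = total;
    return true;
}
```

For example, x-igd-gms=2 would correspond to 64MB of pre-allocated memory, and the default of 0 allocates none.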
See code comments for further discussion about the actual operation
of the quirks necessary to assign these devices.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Gerd Hoffmann <kraxel@redhat.com>
Tested-by: Gerd Hoffmann <kraxel@redhat.com>
2016-05-26 18:43:21 +03:00
|
|
|
DEFINE_PROP_UINT32("x-igd-gms", VFIOPCIDevice, igd_gms, 0),
|
2017-08-30 01:05:47 +03:00
|
|
|
DEFINE_PROP_UNSIGNED_NODEFAULT("x-nv-gpudirect-clique", VFIOPCIDevice,
|
|
|
|
nv_gpudirect_clique,
|
|
|
|
qdev_prop_nv_gpudirect_clique, uint8_t),
|
vfio/pci: Allow relocating MSI-X MMIO
Recently proposed vfio-pci kernel changes (v4.16) remove the
restriction preventing userspace from mmap'ing PCI BARs in areas
overlapping the MSI-X vector table. This change is primarily intended
to benefit host platforms which make use of system page sizes larger
than the PCI spec recommendation for alignment of MSI-X data
structures (ie. not x86_64). In the case of POWER systems, the SPAPR
spec requires the VM to program MSI-X using hypercalls, rendering the
MSI-X vector table unused in the VM view of the device. However,
ARM64 platforms also support 64KB pages and rely on QEMU emulation of
MSI-X. Regardless of the kernel driver allowing mmaps overlapping
the MSI-X vector table, emulation of the MSI-X vector table also
prevents direct mapping of device MMIO spaces overlapping this page.
Thanks to the fact that PCI devices have a standard self discovery
mechanism, we can try to resolve this by relocating the MSI-X data
structures, either by creating a new PCI BAR or extending an existing
BAR and updating the MSI-X capability for the new location. There's
even a very slim chance that this could benefit devices which do not
adhere to the PCI spec alignment guidelines on x86_64 systems.
This new x-msix-relocation option accepts the following choices:
off: Disable MSI-X relocation, use native device config (default)
auto: Use a known good combination for the platform/device (none yet)
bar0..bar5: Specify the target BAR for MSI-X data structures
If compatible, the target BAR will either be created or extended and
the new portion will be used for MSI-X emulation.
The first obvious user question with this option is how to determine
whether a given platform and device might benefit from this option.
In most cases, the answer is that it won't, especially on x86_64.
Devices often dedicate an entire BAR to MSI-X and therefore no
performance sensitive registers overlap the MSI-X area. Take for
example:
# lspci -vvvs 0a:00.0
0a:00.0 Ethernet controller: Intel Corporation I350 Gigabit Network Connection
...
Region 0: Memory at db680000 (32-bit, non-prefetchable) [size=512K]
Region 3: Memory at db7f8000 (32-bit, non-prefetchable) [size=16K]
...
Capabilities: [70] MSI-X: Enable+ Count=10 Masked-
Vector table: BAR=3 offset=00000000
PBA: BAR=3 offset=00002000
This device uses the 16K bar3 for MSI-X with the vector table at
offset zero and the pending bits array at offset 8K, fully honoring
the PCI spec alignment guidance. The data sheet specifically refers
to this as an MSI-X BAR. This device would not see a benefit from
MSI-X relocation regardless of the platform, regardless of the page
size.
However, here's another example:
# lspci -vvvs 02:00.0
02:00.0 Serial Attached SCSI controller: xxxxxxxx
...
Region 0: I/O ports at c000 [size=256]
Region 1: Memory at ef640000 (64-bit, non-prefetchable) [size=64K]
Region 3: Memory at ef600000 (64-bit, non-prefetchable) [size=256K]
...
Capabilities: [c0] MSI-X: Enable+ Count=16 Masked-
Vector table: BAR=1 offset=0000e000
PBA: BAR=1 offset=0000f000
Here the MSI-X data structures are placed on separate 4K pages at the
end of a 64KB BAR. If our host page size is 4K, we're likely fine,
but at 64KB page size, MSI-X emulation at that location prevents the
entire BAR from being directly mapped into the VM address space.
Overlapping performance sensitive registers then starts to be a very
likely scenario on such a platform. At this point, the user could
enable tracing on vfio_region_read and vfio_region_write to determine
more conclusively if device accesses are being trapped through QEMU.
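The page-size effect in this example can be checked with a little arithmetic. The sketch below assumes that the emulated vector table blocks direct mapping of every host page it touches; the function and struct names are hypothetical, and the 16-byte entry size is the MSI-X table entry size from the PCI specification.

```c
#include <assert.h>
#include <stdint.h>

#define MSIX_ENTRY_BYTES 16u  /* size of one MSI-X vector table entry */

struct range {
    uint64_t start;  /* inclusive BAR offset */
    uint64_t end;    /* exclusive BAR offset */
};

/*
 * Hypothetical helper: return the page-aligned BAR range that cannot be
 * directly mapped because the MSI-X vector table lives in it.
 * page_size must be a power of two.
 */
struct range msix_trapped_range(uint64_t table_offset, unsigned vectors,
                                uint64_t page_size)
{
    struct range r;
    uint64_t mask = page_size - 1;
    uint64_t table_end = table_offset + (uint64_t)vectors * MSIX_ENTRY_BYTES;

    assert(page_size != 0 && (page_size & mask) == 0);

    r.start = table_offset & ~mask;       /* round down to a page */
    r.end = (table_end + mask) & ~mask;   /* round up to a page */
    return r;
}
```

For the SAS controller above (table at 0xe000, 16 vectors), a 4K page size traps only 0xe000..0xf000, while a 64KB page size traps 0x0..0x10000, which is the whole 64K BAR.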
Upon finding a device and platform in need of MSI-X relocation, the
next problem is how to choose the target PCI BAR to host the MSI-X data
structures. A few key rules to keep in mind for this selection
include:
* There are only 6 BAR slots, bar0..bar5
* 64-bit BARs occupy two BAR slots, 'lspci -vvv' lists the first slot
* PCI BARs are always a power of 2 in size, extending == doubling
* The maximum size of a 32-bit BAR is 2GB
* MSI-X data structures must reside in an MMIO BAR
Using these rules, we can evaluate each BAR of the second example
device above as follows:
bar0: I/O port BAR, incompatible with MSI-X tables
bar1: BAR could be extended, incurring another 64KB of MMIO
bar2: Unavailable, bar1 is 64-bit, this register is used by bar1
bar3: BAR could be extended, incurring another 256KB of MMIO
bar4: Unavailable, bar3 is 64-bit, this register is used by bar3
bar5: Available, empty BAR, minimum additional MMIO
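The per-BAR evaluation above follows mechanically from the listed rules and can be sketched as below. The types and function are hypothetical, not QEMU's, and the size of a newly created BAR is left as a caller-supplied parameter because the message only calls it "minimum additional MMIO".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum bar_kind {
    BAR_EMPTY,        /* unimplemented slot */
    BAR_IO,           /* I/O port BAR */
    BAR_MEM32,        /* 32-bit MMIO BAR */
    BAR_MEM64,        /* 64-bit MMIO BAR (first slot) */
    BAR_MEM64_UPPER   /* slot consumed by the preceding 64-bit BAR */
};

struct bar {
    enum bar_kind kind;
    uint64_t size;    /* bytes; 0 for empty or upper-half slots */
};

/*
 * Hypothetical helper: decide whether MSI-X structures could be relocated
 * to this BAR and report the additional MMIO that would cost in *extra.
 * new_bar_size is the assumed size of a freshly created BAR.
 */
bool msix_relo_candidate(const struct bar *b, uint64_t new_bar_size,
                         uint64_t *extra)
{
    assert(b != NULL && extra != NULL);

    switch (b->kind) {
    case BAR_EMPTY:
        *extra = new_bar_size;          /* create a new BAR */
        return true;
    case BAR_MEM32:
        if (b->size >= (UINT64_C(1) << 31)) {
            return false;               /* a 2GB 32-bit BAR cannot double */
        }
        *extra = b->size;               /* extending == doubling */
        return true;
    case BAR_MEM64:
        *extra = b->size;               /* extending == doubling */
        return true;
    default:
        return false;                   /* I/O BAR or upper half of 64-bit */
    }
}
```

Applied to the second example device, this reports bar1 (+64KB), bar3 (+256KB) and bar5 (new BAR) as candidates and rejects bar0, bar2 and bar4.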
A secondary optimization we might wish to make in relocating MSI-X
is to minimize the additional MMIO required for the device, therefore
we might test the available choices in order of preference as bar5,
bar1, and finally bar3. The original proposal for this feature
included an 'auto' option which would choose bar5 in this case, but
various drivers have been found that make assumptions about the
properties of the "first" BAR or the size of BARs such that there
appears to be no foolproof automatic selection available, requiring
known good combinations to be sourced from users. This patch is
pre-enabled for an 'auto' selection making use of a validated lookup
table, but no entries are yet identified.
Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2018-02-06 21:08:26 +03:00
|
|
|
DEFINE_PROP_OFF_AUTO_PCIBAR("x-msix-relocation", VFIOPCIDevice, msix_relo,
|
|
|
|
OFF_AUTOPCIBAR_OFF),
|
2023-11-21 11:44:09 +03:00
|
|
|
#ifdef CONFIG_IOMMUFD
|
|
|
|
DEFINE_PROP_LINK("iommufd", VFIOPCIDevice, vbasedev.iommufd,
|
|
|
|
TYPE_IOMMUFD_BACKEND, IOMMUFDBackend *),
|
|
|
|
#endif
|
2024-05-03 17:51:42 +03:00
|
|
|
DEFINE_PROP_BOOL("skip-vsc-check", VFIOPCIDevice, skip_vsc_check, true),
|
2012-09-26 21:19:32 +04:00
|
|
|
DEFINE_PROP_END_OF_LIST(),
|
|
|
|
};
|
|
|
|
|
2023-11-21 11:44:10 +03:00
|
|
|
#ifdef CONFIG_IOMMUFD
|
|
|
|
static void vfio_pci_set_fd(Object *obj, const char *str, Error **errp)
|
|
|
|
{
|
|
|
|
vfio_device_set_fd(&VFIO_PCI(obj)->vbasedev, str, errp);
|
|
|
|
}
|
|
|
|
#endif
|
|
|
|
|
2012-09-26 21:19:32 +04:00
|
|
|
static void vfio_pci_dev_class_init(ObjectClass *klass, void *data)
|
|
|
|
{
|
|
|
|
DeviceClass *dc = DEVICE_CLASS(klass);
|
|
|
|
PCIDeviceClass *pdc = PCI_DEVICE_CLASS(klass);
|
|
|
|
|
|
|
|
dc->reset = vfio_pci_reset;
|
2020-01-10 18:30:32 +03:00
|
|
|
device_class_set_props(dc, vfio_pci_dev_properties);
|
2023-11-21 11:44:10 +03:00
|
|
|
#ifdef CONFIG_IOMMUFD
|
|
|
|
object_class_property_add_str(klass, "fd", NULL, vfio_pci_set_fd);
|
|
|
|
#endif
|
2012-10-17 21:20:14 +04:00
|
|
|
dc->desc = "VFIO-based PCI device assignment";
|
2013-07-29 18:17:45 +04:00
|
|
|
set_bit(DEVICE_CATEGORY_MISC, dc->categories);
|
2016-10-17 19:58:01 +03:00
|
|
|
pdc->realize = vfio_realize;
|
2012-09-26 21:19:32 +04:00
|
|
|
pdc->exit = vfio_exitfn;
|
|
|
|
pdc->config_read = vfio_pci_read_config;
|
|
|
|
pdc->config_write = vfio_pci_write_config;
|
|
|
|
}
|
|
|
|
|
|
|
|
static const TypeInfo vfio_pci_dev_info = {
|
2018-10-15 19:52:10 +03:00
|
|
|
.name = TYPE_VFIO_PCI,
|
2012-09-26 21:19:32 +04:00
|
|
|
.parent = TYPE_PCI_DEVICE,
|
2014-12-20 01:24:15 +03:00
|
|
|
.instance_size = sizeof(VFIOPCIDevice),
|
2012-09-26 21:19:32 +04:00
|
|
|
.class_init = vfio_pci_dev_class_init,
|
2014-10-07 12:00:25 +04:00
|
|
|
.instance_init = vfio_instance_init,
|
2015-02-10 20:25:44 +03:00
|
|
|
.instance_finalize = vfio_instance_finalize,
|
2017-09-27 22:56:32 +03:00
|
|
|
.interfaces = (InterfaceInfo[]) {
|
|
|
|
{ INTERFACE_PCIE_DEVICE },
|
|
|
|
{ INTERFACE_CONVENTIONAL_PCI_DEVICE },
|
|
|
|
{ }
|
|
|
|
},
|
2012-09-26 21:19:32 +04:00
|
|
|
};
|
|
|
|
|
2018-10-15 19:52:09 +03:00
|
|
|
static Property vfio_pci_dev_nohotplug_properties[] = {
|
|
|
|
DEFINE_PROP_BOOL("ramfb", VFIOPCIDevice, enable_ramfb, false),
|
2023-10-09 09:32:47 +03:00
|
|
|
DEFINE_PROP_ON_OFF_AUTO("x-ramfb-migrate", VFIOPCIDevice, ramfb_migrate,
|
|
|
|
ON_OFF_AUTO_AUTO),
|
2018-10-15 19:52:09 +03:00
|
|
|
DEFINE_PROP_END_OF_LIST(),
|
|
|
|
};
|
|
|
|
|
|
|
|
static void vfio_pci_nohotplug_dev_class_init(ObjectClass *klass, void *data)
|
|
|
|
{
|
|
|
|
DeviceClass *dc = DEVICE_CLASS(klass);
|
|
|
|
|
2020-01-10 18:30:32 +03:00
|
|
|
device_class_set_props(dc, vfio_pci_dev_nohotplug_properties);
|
2018-10-15 19:52:09 +03:00
|
|
|
dc->hotpluggable = false;
|
|
|
|
}
|
|
|
|
|
|
|
|
static const TypeInfo vfio_pci_nohotplug_dev_info = {
|
2019-08-22 09:49:09 +03:00
|
|
|
.name = TYPE_VFIO_PCI_NOHOTPLUG,
|
2019-05-21 18:15:40 +03:00
|
|
|
.parent = TYPE_VFIO_PCI,
|
2018-10-15 19:52:09 +03:00
|
|
|
.instance_size = sizeof(VFIOPCIDevice),
|
|
|
|
.class_init = vfio_pci_nohotplug_dev_class_init,
|
|
|
|
};
|
|
|
|
|
2012-09-26 21:19:32 +04:00
|
|
|
static void register_vfio_pci_dev_type(void)
|
|
|
|
{
|
|
|
|
type_register_static(&vfio_pci_dev_info);
|
2018-10-15 19:52:09 +03:00
|
|
|
type_register_static(&vfio_pci_nohotplug_dev_info);
|
2012-09-26 21:19:32 +04:00
|
|
|
}
|
|
|
|
|
|
|
|
type_init(register_vfio_pci_dev_type)
|