Commit Graph

239 Commits

Author SHA1 Message Date
Peter Crosthwaite
f7ceed190d vfio: cpu: Use "real" page size API
This is system level code, and should only depend on the host page
size, not the target page size.

Note that HOST_PAGE_SIZE is misleadingly lead and is really aligning
to both host and target page size. Hence it's replacement with
REAL_HOST_PAGE_SIZE.

Signed-off-by: Peter Crosthwaite <crosthwaite.peter@gmail.com>
Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2015-07-06 12:15:12 -06:00
Paolo Bonzini
7d489dcdf5 vfio: fix return type of pread
size_t is an unsigned type, thus the error case is never reached in
the below call to pread.  If bytes is negative, it will be seen as
a very high positive value.

Spotted by Coverity.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2015-07-06 12:15:12 -06:00
Leon Alrae
e207527751 vfio: fix build error on CentOS 5.7
Include linux/vfio.h after sys/ioctl.h, just like in hw/vfio/common.c.

Signed-off-by: Leon Alrae <leon.alrae@imgtec.com>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Message-id: 1434544500-22405-1-git-send-email-leon.alrae@imgtec.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-06-18 10:35:59 +01:00
Eric Auger
0b70743d4f hw/vfio/platform: replace g_malloc0_n by g_new0
g_malloc0_n() is introduced since glib-2.24 while QEMU currently
requires glib-2.22. This may cause a link error on some distributions.

Signed-off-by: Eric Auger <eric.auger@linaro.org>
Reviewed-by: Gonglei <arei.gonglei@huawei.com>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-06-11 14:22:57 +01:00
Eric Auger
7a8d15d770 hw/vfio/platform: calxeda xgmac device
The platform device class has become abstract. This patch introduces
a calxeda xgmac device that derives from it.

Signed-off-by: Eric Auger <eric.auger@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2015-06-09 08:17:17 -06:00
Eric Auger
38559979bf hw/vfio/platform: add irq assignment
This patch adds the code requested to assign interrupts to
a guest. The interrupts are mediated through user handled
eventfds only.

Signed-off-by: Eric Auger <eric.auger@linaro.org>
Tested-by: Vikram Sethi <vikrams@codeaurora.org>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2015-06-08 09:25:26 -06:00
Eric Auger
0ea2730bef hw/vfio/platform: vfio-platform skeleton
Minimal VFIO platform implementation supporting register space
user mapping but not IRQ assignment.

Signed-off-by: Kim Phillips <kim.phillips@linaro.org>
Signed-off-by: Eric Auger <eric.auger@linaro.org>
Tested-by: Vikram Sethi <vikrams@codeaurora.org>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2015-06-08 09:25:25 -06:00
Paolo Bonzini
41063e1e7a exec: move rcu_read_lock/unlock to address_space_translate callers
Once address_space_translate will be called outside the BQL, the returned
MemoryRegion might disappear as soon as the RCU read-side critical section
ends.  Avoid this by moving the critical section to the callers.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <1426684909-95030-3-git-send-email-pbonzini@redhat.com>
2015-04-30 16:55:32 +02:00
Alex Williamson
5655f931ab vfio-pci: Reset workaround for AMD Bonaire and Hawaii GPUs
Somehow these GPUs manage not to respond to a PCI bus reset, removing
our primary mechanism for resetting graphics cards.  The result is
that these devices typically work well for a single VM boot.  If the
VM is rebooted or restarted, the guest driver is not able to init the
card from the dirty state, resulting in a blue screen for Windows
guests.

The workaround is to use a device specific reset.  This is not 100%
reliable though since it depends on the incoming state of the device,
but it substantially improves the usability of these devices in a VM.

Credit to Alex Deucher <alexander.deucher@amd.com> for his guidance.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2015-04-28 11:14:02 -06:00
Alex Williamson
c6d231e2fd vfio-pci: Fix error path sign
This is an impossible error path due to the fact that we're reading a
kernel provided, rather than user provided link, which will certainly
always fit in PATH_MAX.  Currently it returns a fixed 26 char path
plus %d group number, which typically maxes out at double digits.
However, the caller of the initfn certainly expects a less-than zero
return value on error, not just a non-zero value.  Therefore we
should correct the sign here.

Reported-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2015-04-28 11:14:02 -06:00
Alex Williamson
07ceaf9880 vfio-pci: Further fix BAR size overflow
In an analysis by Laszlo, the resulting type of our calculation for
the end of the MSI-X table, and thus the start of memory after the
table, is uint32_t.  We're therefore not correctly preventing the
corner case overflow that we intended to fix here where a BAR >=4G
could place the MSI-X table to end exactly at the 4G boundary.  The
MSI-X table offset is defined by the hardware spec to 32bits, so we
simply use a cast rather than changing data structure types.  This
scenario is purely theoretically, typically the MSI-X table is located
at the front of the BAR.

Reported-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2015-04-28 11:14:02 -06:00
Peter Maydell
3b64349539 memory: Replace io_mem_read/write with memory_region_dispatch_read/write
Rather than retaining io_mem_read/write as simple wrappers around
the memory_region_dispatch_read/write functions, make the latter
public and change all the callers to use them, since we need to
touch all the callsites anyway to add MemTxAttrs and MemTxResult
support. Delete io_mem_read and io_mem_write entirely.

(All the callers currently pass MEMTXATTRS_UNSPECIFIED
and convert the return value back to bool or ignore it.)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
2015-04-26 16:49:23 +01:00
Gonglei
78e5b17f04 vfio: Remove superfluous '\n' around error_report()
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2015-03-10 08:15:33 +03:00
Gavin Shan
2aad88f4b0 sPAPR: Implement sPAPRPHBClass EEH callbacks
The patch implements sPAPRPHBClass EEH callbacks so that the EEH
RTAS requests can be routed to VFIO for further handling.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
2015-03-09 15:00:08 +01:00
Alex Williamson
47cbe50cc8 vfio-pci: Enable device request notification support
Linux v4.0-rc1 vfio-pci introduced a new virtual interrupt to allow
the kernel to request a device from the user.  When signaled, QEMU
will by default attmempt to hot-unplug the device.  This is a one-
shot attempt with the expectation that the kernel will continue to
poll for the device if it is not returned.  Returning the device when
requested is the expected standard model of cooperative usage, but we
also add an option option to disable this feature.  Initially this
opt-out is set as an experimental option because we really should
honor kernel requests for the device.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2015-03-02 11:38:55 -07:00
Samuel Pitoiset
6ee47c9008 vfio: allow to disable MMAP per device with -x-mmap=off option
Disabling MMAP support uses the slower read/write accesses but allows to
trace all MMIO accesses, which is not good for performance, but very
useful for reverse engineering PCI drivers. This option allows to
disable MMAP per device without a compile-time change.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2015-03-02 11:38:55 -07:00
Alexey Kardashevskiy
51b833f440 vfio: Make type1 listener symbols static
They are not used from anywhere but common.c which is where these are
defined so make them static.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2015-03-02 11:38:55 -07:00
Alexey Kardashevskiy
46f770d4a5 vfio: Add ioctl number to error report
This makes the error report more informative.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2015-03-02 11:38:54 -07:00
Alexey Kardashevskiy
bc5baffa35 vfio: Fix debug message compile error
This fixes a compiler error which occurs if DEBUG_VFIO is defined.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2015-02-10 10:25:44 -07:00
Alex Williamson
2e6e697e16 vfio: Use vfio type1 v2 IOMMU interface
The difference between v1 and v2 is fairly subtle, simply more
deterministic behavior for unmaps.  The v1 interface allows the user
to attempt to unmap sub-regions of previous mappings, returning
success with zero size if unable to comply.  This was a reflection of
the underlying IOMMU API.  The v2 interface requires that the user
may only unmap fully contained mappings, ie. an unmap cannot intersect
or bisect a previous mapping, but may cover multiple mappings.  QEMU
never made use of the sub-region v1 support anyway, so we can support
either v1 or v2.  We'll favor v2 since it's newer.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2015-02-10 10:25:44 -07:00
Paolo Bonzini
ba5e6bfa1a vfio: unmap and free BAR data in instance_finalize
In the case of VFIO, the unrealize callback is too early to munmap the
BARs.  The munmap must be delayed until memory accesses are complete.
To do this, split vfio_unmap_bars in two.  The removal step, now called
vfio_unregister_bars, remains in vfio_exitfn.  The reclamation step
is vfio_unmap_bars and is moved to the instance_finalize callback.

Similarly, quirk MemoryRegions have to be removed during
vfio_unregister_bars, but freeing the data structure must be delayed
to vfio_unmap_bars.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2015-02-10 10:25:44 -07:00
Paolo Bonzini
77a10d04d0 vfio: free dynamically-allocated data in instance_finalize
In order to enable out-of-BQL address space lookup, destruction of
devices needs to be split in two phases.

Unrealize is the first phase; once it complete no new accesses will
be started, but there may still be pending memory accesses can still
be completed.

The second part is freeing the device, which only happens once all memory
accesses are complete.  At this point the reference count has dropped to
zero, an RCU grace period must have completed (because the RCU-protected
FlatViews hold a reference to the device via memory_region_ref).  This is
when instance_finalize is called.

Freeing data belongs in an instance_finalize callback, because the
dynamically allocated memory can still be used after unrealize by the
pending memory accesses.

This starts the process by creating an instance_finalize callback and
freeing most of the dynamically-allocated data in instance_finalize.
Because instance_finalize is also called on error paths or also when
the device is actually not realized, the common code needs some changes
to be ready for this.  The error path in vfio_initfn can be simplified too.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2015-02-10 10:25:44 -07:00
Paolo Bonzini
217e9fdcad vfio: cleanup vfio_get_device error path, remove vfio_populate_device callback
Now that vfio_put_base_device is called unconditionally at instance_finalize
time, it can be called twice if vfio_populate_device fails.  This works
but it is slightly harder to follow.

Change vfio_get_device to not touch the vbasedev struct until it will
definitely succeed, moving the vfio_populate_device call back to vfio-pci.
This way, vfio_put_base_device will only be called once.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2015-02-10 10:25:44 -07:00
Alex Williamson
3a4dbe6aa9 vfio-pci: Fix missing unparent of dynamically allocated MemoryRegion
Commit d8d9581460 added explicit object_unparent() calls for
dynamically allocated MemoryRegions.  The VFIOMSIXInfo structure also
contains such a MemoryRegion, covering the mmap'd region of a PCI BAR
above the MSI-X table.  This structure is freed as part of the class
exit function and therefore also needs an explicit object_unparent().
Failing to do this results in random segfaults due to fields within
the structure, often the class pointer, being reclaimed and corrupted
by the time object_finalize_child_property() is called for the object.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: qemu-stable@nongnu.org # 2.2
2015-02-04 11:45:32 -07:00
Chen Fan
39cb514f02 vfio: fix wrong initialize vfio_group_list
Signed-off-by: Chen Fan <chen.fan.fnst@cn.fujitsu.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2015-02-04 11:45:32 -07:00
Alex Williamson
b3e27c3aee vfio-pci: Fix interrupt disabling
When disabling MSI/X interrupts the disable functions will leave the
device in INTx mode (when available).  This matches how hardware
operates, INTx is enabled unless MSI/X is enabled (DisINTx is handled
separately).  Therefore when we really want to disable all interrupts,
such as when removing the device, and we start with the device in
MSI/X mode, we need to pass through INTx on our way to being
completely quiesced.

In well behaved situations, the guest driver will have shutdown the
device and it will start vfio_exitfn() in INTx mode, producing the
desired result.  If hot-unplug causes the guest to crash, we may get
the device in MSI/X state, which will leave QEMU with a bogus handler
installed.

Fix this by re-ordering our disable routine so that it should always
finish in VFIO_INT_NONE state, which is what all callers expect.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2015-01-09 08:50:53 -07:00
Alex Williamson
29c6e6df49 vfio-pci: Fix BAR size overflow
We use an unsigned int when working with the PCI BAR size, which can
obviously overflow if the BAR is 4GB or larger.  This needs to change
to a fixed length uint64_t.  A similar issue is possible, though even
more unlikely, when mapping the region above an MSI-X table.  The
start of the MSI-X vector table must be below 4GB, but the end, and
therefore the start of the next mapping region, could still land at
4GB.

Suggested-by: Nishank Trivedi <nishank.trivedi@netapp.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Don Slutz <dslutz@verizon.com>
Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru>
2015-01-09 08:50:53 -07:00
Alex Williamson
dcbfc5cefb vfio: Cleanup error_report()s
With the conversion to tracepoints, a couple previous DPRINTKs are
now quite a bit more visible and are really just informational.
Remove these and add a bit more description to another.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2014-12-22 10:37:27 -07:00
Eric Auger
e2c7d025ad hw/vfio: create common module
A new common module is created. It implements all functions
that have no device specificity (PCI, Platform).

This patch only consists in move (no functional changes)

Signed-off-by: Kim Phillips <kim.phillips@linaro.org>
Signed-off-by: Eric Auger <eric.auger@linaro.org>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2014-12-22 09:54:51 -07:00
Eric Auger
df92ee4448 hw/vfio/pci: use name field in format strings
Signed-off-by: Eric Auger <eric.auger@linaro.org>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2014-12-22 09:54:49 -07:00
Eric Auger
62356b7292 hw/vfio/pci: rename group_list into vfio_group_list
better fit in the rest of the namespace

Signed-off-by: Eric Auger <eric.auger@linaro.org>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2014-12-22 09:54:46 -07:00
Eric Auger
d13dd2d7a9 hw/vfio/pci: split vfio_get_device
vfio_get_device now takes a VFIODevice as argument. The function is split
into 2 parts: vfio_get_device which is generic and vfio_populate_device
which is bus specific.

3 new fields are introduced in VFIODevice to store dev_info.

vfio_put_base_device is created.

Signed-off-by: Eric Auger <eric.auger@linaro.org>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2014-12-22 09:54:38 -07:00
Eric Auger
a664477db8 hw/vfio/pci: Introduce VFIORegion
This structure is going to be shared by VFIOPCIDevice and
VFIOPlatformDevice. VFIOBAR includes it.

vfio_eoi becomes an ops of VFIODevice specialized by parent device.
This makes possible to transform vfio_bar_write/read into generic
vfio_region_write/read that will be used by VFIOPlatformDevice too.

vfio_mmap_bar becomes vfio_map_region

Signed-off-by: Eric Auger <eric.auger@linaro.org>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2014-12-22 09:54:37 -07:00
Eric Auger
b47d8efa9f hw/vfio/pci: handle reset at VFIODevice
Since we can potentially have both PCI and platform devices in
the same VFIO group, this latter now owns a list of VFIODevices.
A unified reset handler, vfio_reset_handler, is registered, looping
through this VFIODevice list. 2 specialized operations are introduced
(vfio_compute_needs_reset and vfio_hot_reset_multi): they allow to
implement type specific behavior. also reset_works and needs_reset
VFIOPCIDevice fields are moved into VFIODevice.

Signed-off-by: Eric Auger <eric.auger@linaro.org>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2014-12-22 09:54:35 -07:00
Eric Auger
462037c9e8 hw/vfio/pci: add type, name and group fields in VFIODevice
Add 3 new fields in the VFIODevice struct. Type is set to
VFIO_DEVICE_TYPE_PCI. The type enum value will later be used
to discriminate between VFIO PCI and platform devices. The name is
set to domain🚌slot:function. Currently used to test whether
the device already is attached to the group. Later on, the name
will be used to simplify all traces. The group is simply moved
from VFIOPCIDevice to VFIODevice.

Signed-off-by: Eric Auger <eric.auger@linaro.org>
[Fix g_strdup_printf() usage]
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2014-12-22 09:54:31 -07:00
Eric Auger
5546a621a8 hw/vfio/pci: introduce minimalist VFIODevice with fd
Introduce a new base VFIODevice strcut that will be used by both PCI
and Platform VFIO device. Move VFIOPCIDevice fd field there. Obviously
other fields from VFIOPCIDevice will be moved there but this patch
file is introduced to ease the review.

Also vfio_mask_single_irqindex, vfio_unmask_single_irqindex,
vfio_disable_irqindex now take a VFIODevice handle as argument.

Signed-off-by: Eric Auger <eric.auger@linaro.org>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2014-12-19 15:24:31 -07:00
Eric Auger
079eb19cbb hw/vfio/pci: generalize mask/unmask to any IRQ index
To prepare for platform device introduction, rename vfio_mask_intx
and vfio_unmask_intx into vfio_mask_single_irqindex and respectively
unmask_single_irqindex. Also use a nex index parameter.

With that name and prototype the function will be usable for other
indexes than VFIO_PCI_INTX_IRQ_INDEX.

Signed-off-by: Eric Auger <eric.auger@linaro.org>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2014-12-19 15:24:24 -07:00
Eric Auger
9ee27d7381 hw/vfio/pci: Rename VFIODevice into VFIOPCIDevice
This prepares for the introduction of VFIOPlatformDevice

Signed-off-by: Eric Auger <eric.auger@linaro.org>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2014-12-19 15:24:15 -07:00
Kim Phillips
cf7087db10 vfio: move hw/misc/vfio.c to hw/vfio/pci.c Move vfio.h into include/hw/vfio
This is done in preparation for the addition of VFIO platform
device support.

Signed-off-by: Kim Phillips <kim.phillips@linaro.org>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2014-12-19 15:24:06 -07:00