All the PCI enumeration and device node creation was off-loaded to
SLOF. With PCI hotplug support, code needed to be added to add device
node. This creates multiple copy of the code one in SLOF and other in
hotplug code. To unify this, the patch adds the pci device node
creation in Qemu. For backward compatibility, a flag
"qemu,phb-enumerated" is added to the phb, suggesting to SLOF to not
do device node creation.
Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
[ Squashed Michael's drc_index changes ]
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
Since we now require GLib 2.22+ (commit f40685c), we don't have to
work around lack of g_hash_table_iter_init() & friends anymore.
This reverts commit f8833a37c0.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
Device node names should encode the unit address as hex, while the
code was encodind it as integers.
Also, use FDT_NAME_MAX macro for allocating and composing the name.
Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
Current code missed the Prog IF register. All Class Code, Subclass,
and Prog IF registers are needed to identify the accurate device type.
For example: USB controllers use the PROG IF for denoting: USB
FullSpeed, HighSpeed or SuperSpeed.
Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
The properties reg/assigned-resources need to encode 64-bit memory
address space as part of phys.hi dword.
00 if configuration space
01 if IO region,
10 if 32-bit MEM region
11 if 64-bit MEM region
Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
The code for -machine pseries maintains a global sPAPREnvironment structure
which keeps track of general state information about the guest platform.
This predates the existence of the MachineState structure, but performs
basically the same function.
Now that we have the generic MachineState, fold sPAPREnvironment into
sPAPRMachineState, the pseries specific subclass of MachineState.
This is mostly a matter of search and replace, although a few places which
relied on the global spapr variable are changed to find the structure via
qdev_get_machine().
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
These macros expand into error class enumeration constant, comma,
string. Unclean. Has been that way since commit 13f59ae.
The error class is always ERROR_CLASS_GENERIC_ERROR since the previous
commit.
Clean up as follows:
* Prepend every use of a QERR_ macro by ERROR_CLASS_GENERIC_ERROR, and
delete it from the QERR_ macro. No change after preprocessing.
* Rewrite error_set(ERROR_CLASS_GENERIC_ERROR, ...) into
error_setg(...). Again, no change after preprocessing.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Luiz Capitulino <lcapitulino@redhat.com>
This uses extension of existing EPOW interrupt/event mechanism
to notify userspace tools like librtas/drmgr to handle
in-guest configuration/cleanup operations in response to
device_add/device_del.
Userspace tools that don't implement this extension will need
to be run manually in response/advance of device_add/device_del,
respectively.
Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
This enables hotplug of PCI devices to a PHB. Upon hotplug we
generate the OF-nodes required by PAPR specification and
IEEE 1275-1994 "PCI Bus Binding to Open Firmware" for the
device.
We associate the corresponding FDT for these nodes with the DRC
corresponding to the slot, which will be fetched via
ibm,configure-connector RTAS calls by the guest as described by PAPR
specification.
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
These will be used to support hotplug/unplug of PCI devices to the PCI
bus associated with a particular PHB.
We also set up device-tree properties in each PHBs initial FDT to
describe the DRCs associated with them. This advertises to guests that
each PHB is DR-capable device with physical hotpluggable slots, each
managed by the corresponding DRC. This is necessary for allowing
hotplugging of devices to it later via bus rescan or guest rpaphp
hotplug module.
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
This option enables/disables PCI hotplug for a particular PHB.
Also add machine compatibility code to disable it by default for machine
types prior to pseries-2.4.
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
[agraf: move commas for compat fields]
Signed-off-by: Alexander Graf <agraf@suse.de>
This replaces object_child_foreach() and callback with existing
SPAPR_PCI_LIOBN() and spapr_tce_find_by_liobn() to make the code easier
to read.
This is a mechanical patch so no behaviour change is expected.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
This makes find_phb()/find_dev() public and changed its names
to spapr_pci_find_phb()/spapr_pci_find_dev() as they are going to
be used from other parts of QEMU such as VFIO DDW (dynamic DMA window)
or VFIO PCI error injection or VFIO EEH handling - in all these
cases there are RTAS calls which are addressed to BUID+config_addr
in IEEE1275 format.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
This gets rid of a magic constant describing the default DMA window size
for an emulated PHB.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
We are going to have multiple DMA windows per PHB and we want them to
migrate so we need a predictable way of assigning LIOBNs.
This introduces a macro which makes up a LIOBN from fixed prefix,
PHB index (unique PHB id) and window number.
This introduces a SPAPR_PCI_DMA_WINDOW_NUM() to know the window number
from LIOBN. It is used to distinguish the default 32bit windows from
dynamic windows and avoid picking default DMA window properties from
a wrong TCE table.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
spapr_pci.c contains a number of expressions of the form (uval == -1) or
(uval != -1), where 'uval' is an unsigned value.
This mostly works in practice, because as long as the width of uval is
greater or equal than that of (int), the -1 will be promoted to the
unsigned type, which is the expected outcome.
However, at least for the cases where uval is uint32_t, this would break
on platforms where sizeof(int) > 4 (and a few such do exist), because then
the uint32_t value would be promoted to the larger int type, and never be
equal to -1.
This patch fixes these errors. The fixes for the (uint32_t) cases are
necessary as described above. I've made similar fixes to (uint64_t) and
(hwaddr) cases. Those are strictly theoretical, since I don't know of any
platforms where sizeof(int) > 8, but hey, it's not that hard so we might
as well be strictly C standard compliant.
Reported-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
The emulation for EEH RTAS requests from guest isn't covered
by QEMU yet and the patch implements them.
The patch defines constants used by EEH RTAS calls and adds
callbacks sPAPRPHBClass::{eeh_set_option, eeh_get_state, eeh_reset,
eeh_configure}, which are going to be used as follows:
* RTAS calls are received in spapr_pci.c, sanity check is done
there.
* RTAS handlers handle what they can. If there is something it
cannot handle and the corresponding sPAPRPHBClass callback is
defined, it is called.
* Those callbacks are only implemented for VFIO now. They do ioctl()
to the IOMMU container fd to complete the calls. Error codes from
that ioctl() are transferred back to the guest.
[aik: defined RTAS tokens for EEH RTAS calls]
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
When the guest switches the interrupt endian mode, which essentially
means a global machine endian switch, we want to change the VGA
framebuffer endian mode as well in order to be backward compatible
with existing guests who don't know about the new endian control
register.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
At the moment sPAPR only supports 512MB window for MMIO BARs. However
modern devices might want bigger 64bit BARs.
This extends MMIO window from 512MB to 62GB (aligned to
SPAPR_PCI_WINDOW_SPACING) and advertises it in 2 records in
the PHB "ranges" property. 32bit gets the space from
SPAPR_PCI_MEM_WIN_BUS_OFFSET till the end of 4GB, 64bit gets the rest
of the space. If no space is left, 64bit range is not advertised.
The MMIO space size is set to old value of 0x20000000 by default
for pseries machines older than 2.3.
The approach changes the device tree which is a guest visible change, however
it won't break migration as:
1. we do not support migration to older QEMU versions
2. migration to newer QEMU will migrate the device tree as well and since
the new layout only extends the old one and does not change address mappigns,
no breakage is expected here too.
SLOF change is required to utilize this extension.
Suggested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
pseries guests can have large numbers of PCI host bridges. To avoid the
user having to specify a number of different configuration values for every
one, the device supports an "index" property which is a shorthand setting
the various window and configuration addresses from a predefined sensible
set.
There are some problems with the details at present:
* The "index" propery is signed, but negative values will create PCI
windows below where we expect, potentially colliding with other devices
* No limit is imposed on the "index" property and large values can
translate to extremely large window addresses. With PCI passthrough in
particular this can mean we exceed various mapping and physical address
limits causing the guest host bridge to not work in strange ways.
This patch addresses this, by making "index" unsigned, and imposing a
limit. Currently the limit allows indices from 0..255 which is probably
enough host bridges for the time being. It's fairly easy to extend if
we discover we need more.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
The g_hash_table_iter_* functions for iterating through a hash table
are not present in glib 2.12, which is our current minimum requirement.
Rewrite the code to use g_hash_table_foreach() instead.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
On sPAPR, virtio devices are connected to the PCI bus and use MSI-X.
Commit cc943c36fa has modified MSI-X
so that writes are made using the bus master address space and follow
the IOMMU path.
Unfortunately, the IOMMU address space address space does not have an
MSI window: the notification is silently dropped in unassigned_mem_write
instead of reaching the guest... The most visible effect is that all
virtio devices are non-functional on sPAPR since then. :(
This patch does the following:
1) map the MSI window into the IOMMU address space for each PHB
- since each PHB instantiates its own IOMMU address space, we
can safely map the window at a fixed address (SPAPR_PCI_MSI_WINDOW)
- no real need to keep the MSI window setup in a separate function,
the spapr_pci_msi_init() code moves to spapr_phb_realize().
2) kill the global MSI window as it is not needed in the end
Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
When disabling MSI/MSIX via "ibm,change-msi" RTAS call, no check was made
if MSI or MSIX is actually supported and the MSI message was reset
unconditionally. If this happened on a device which does not support MSI
(but does support MSIX, otherwise "ibm,change-msi" would not be called),
this device would have PCIDevice::msi_cap field (MSI capability offset)
set to zero and writing a vector would actually clear PCI status.
This clears MSI message only if MSI or MSIX is present on a device.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
Currently SPAPR PHB keeps track of all allocated MSI (here and below
MSI stands for both MSI and MSIX) interrupt because
XICS used to be unable to reuse interrupts. This is a problem for
dynamic MSI reconfiguration which happens when guest reloads a driver
or performs PCI hotplug. Another problem is that the existing
implementation can enable MSI on 32 devices maximum
(SPAPR_MSIX_MAX_DEVS=32) and there is no good reason for that.
This makes use of new XICS ability to reuse interrupts.
This reorganizes MSI information storage in sPAPRPHBState. Instead of
static array of 32 descriptors (one per a PCI function), this patch adds
a GHashTable when @config_addr is a key and (first_irq, num) pair is
a value. GHashTable can dynamically grow and shrink so the initial limit
of 32 devices is gone.
This changes migration stream as @msi_table was a static array while new
@msi_devs is a dynamic hash table. This adds temporary array which is
used for migration, it is populated in "spapr_pci"::pre_save() callback
and expanded into the hash table in post_load() callback. Since
the destination side does not know the number of MSI-enabled devices
in advance and cannot pre-allocate the temporary array to receive
migration state, this makes use of new VMSTATE_STRUCT_VARRAY_ALLOC macro
which allocates the array automatically.
This resets the MSI configuration space when interrupts are released by
the ibm,change-msi RTAS call.
This fixed traces to be more informative.
This changes vmstate_spapr_pci_msi name from "...lsi" to "...msi" which
was incorrect by accident. As the internal representation changed,
thus bumps migration version number.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
[agraf: drop g_malloc_n usage]
Signed-off-by: Alexander Graf <agraf@suse.de>
The current allocator returns IRQ numbers from a pool and does not
support IRQs reuse in any form as it did not keep track of what it
previously returned, it only keeps the last returned IRQ. Some use
cases such as PCI hot(un)plug may require IRQ release and reallocation.
This moves an allocator from SPAPR to XICS.
This switches IRQ users to use new API.
This uses LSI/MSI flags to know if interrupt is allocated.
The interrupt release function will be posted as a separate patch.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
POWER KVM supports an KVM_CAP_SPAPR_TCE capability which allows allocating
TCE tables in the host kernel memory and handle H_PUT_TCE requests
targeted to specific LIOBN (logical bus number) right in the host without
switching to QEMU. At the moment this is used for emulated devices only
and the handler only puts TCE to the table. If the in-kernel H_PUT_TCE
handler finds a LIOBN and corresponding table, it will put a TCE to
the table and complete hypercall execution. The user space will not be
notified.
Upcoming VFIO support is going to use the same sPAPRTCETable device class
so KVM_CAP_SPAPR_TCE is going to be used as well. That means that TCE
tables for VFIO are going to be allocated in the host as well.
However VFIO operates with real IOMMU tables and simple copying of
a TCE to the real hardware TCE table will not work as guest physical
to host physical address translation is requited.
So until the host kernel gets VFIO support for H_PUT_TCE, we better not
to register VFIO's TCE in the host.
This adds a place holder for KVM_CAP_SPAPR_TCE_VFIO capability. It is not
in upstream yet and being discussed so now it is always false which means
that in-kernel VFIO acceleration is not supported.
This adds a bool @vfio_accel flag to the sPAPRTCETable device telling
that sPAPRTCETable should not try allocating TCE table in the host kernel
for VFIO. The flag is false now as at the moment there is no VFIO.
This adds an vfio_accel parameter to spapr_tce_new_table(), the semantic
is the same. Since there is only emulated PCI and VIO now, the flag is set
to false. Upcoming VFIO support will set it to true.
This is a preparation patch so no change in behaviour is expected
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
At the moment spapr_rtas_register() allocates a new token number for every
new RTAS callback so numbers are not fixed and depend on the number of
supported RTAS handlers and the exact order of spapr_rtas_register() calls.
These tokens are copied into the device tree and remain the same during
the guest lifetime.
When we start another guest to receive a migration, it calls
spapr_rtas_register() as well. If the number of RTAS handlers or their
order is different in QEMU on source and destination sides, the "/rtas"
node in the device tree will differ. Since migration overwrites the device
tree (as it overwrites the entire RAM), the actual RTAS config on
the destination side gets broken.
This defines global contant values for every RTAS token which QEMU
is using today.
This changes spapr_rtas_register() to accept a token number instead of
allocating one. This changes all users of spapr_rtas_register().
This changes XICS-KVM not to cache tokens registered with KVM as they
constant now.
This makes TOKEN_BASE global as RTAS_XXX use TOKEN_BASE as
a base. TOKEN_MAX is moved and renamed too and its value is changed
to the last token + 1. Boundary checks for token values are adjusted.
This reserves token numbers for "os-term" handlers and PCI hotplug
which we are working on.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
Hotplug of multiple disks fails due to MSI vector quota check.
Number of MSI vectors default to 8 allowing only 4 devices.
This happens on RHEL6.5 guest. RHEL7 and SLES11 guests fallback
to INTX.
One way to workaround the issue is to increase total MSIs,
so that MSI quota check allows us to hotplug multiple disks.
This sets the quota to the maximum number of interupts XICS has
which is 1024 now (XICS_IRQS). This moves XICS_IRQS from spapr.c
to xics.h for wider visibility.
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
[aik: put XICS_IRQS=1024 instead of 64i, fixed endianness and size]
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
This adds @bus_offset into sPAPRTCETable to tell where TCE table starts
from. It is set to 0 for emulated devices. Dynamic DMA windows will use
other offset.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
At the moment only 4K pages are supported by sPAPRTCETable. Since sPAPR
spec allows other page sizes and we are going to implement them, we need
page size to be configrable.
This adds @page_shift into sPAPRTCETable and replaces SPAPR_TCE_PAGE_SHIFT
with it where it is possible.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
This removes window_size as it is basically a copy of nb_table
shifted by SPAPR_TCE_PAGE_SHIFT. As new dynamic DMA windows are
going to support windows as big as the entire RAM and this number
will be bigger that 32 capacity, we will have to do something
about @window_size anyway and removal seems to be the right way to go.
This removes dma_window_start/dma_window_size from sPAPRPHBState as
they are no longer used.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
At the moment sPAPRPHBState contains a @tcet pointer to the only
TCE table. However sPAPR spec allows having more than one DMA window.
Since the TCE object is already a child of SPAPR PHB object, there is
no need to keep an additional pointer to it in sPAPRPHBState so remove it.
This changes the way sPAPRPHBState::reset performs reset of sPAPRTCETable
objects.
This changes the default DMA window properties calculation.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
Currently the default DMA window is represented by a single MemoryRegion.
However there can be more than just one window so we need
a "root" memory region to be separated from the actual DMA window(s).
This introduces a "root" IOMMU memory region and adds a subregion for
the default DMA 32bit window. Following patches will add other
subregion(s).
This initializes a default DMA window subregion size to the guest RAM
size as this window can be switched into "bypass" mode which implements
direct DMA mapping.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
The spapr-pci PHB initializes IOMMU for emulated devices only.
The upcoming VFIO support will do it different. However both emulated
and VFIO PHB types share most of the initialization code.
For the type specific things a new finish_realize() callback is
introduced.
This introduces sPAPRPHBClass derived from PCIHostBridgeClass and
adds the callback pointer.
This implements finish_realize() for emulated devices.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
[agraf: Fix compilation]
Signed-off-by: Alexander Graf <agraf@suse.de>
At the moment XICS does not support interrupts reuse so sPAPR PHB
implements this. sPAPRPHBState holds array of 32 spapr_pci_msi to
describe PCI config address, first MSI and number of MSIs. Once
allocated for a device, QEMU tries reusing this config until the number
of MSIs changes.
Existing SPAPR guests call ibm,change-msi in a loop until the handler
returns the requested number of vectors.
Recently introduced check for the maximum number of MSI/MSIX vectors
supported by a device only works for a device which is new for PHB's
MSI cache. If it is already there, the check is not performed which
leads to new IRQ block allocation. This happens during PCI hotplug
even when the user hot plug the same device which he just hot unplugged.
This moves the check earlier.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
Current guest kernels try allocating as many vectors as the quota is.
For example, in the case of virtio-net (which has just 3 vectors)
the guest requests 4 vectors (that is the quota in the test) and
the existing ibm,change-msi handler returns 4. But before it returns,
it calls msix_set_message() in a loop and corrupts memory behind
the end of msix_table.
This limits the number of vectors returned by ibm,change-msi to
the maximum supported by the actual device.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Cc: qemu-stable@nongnu.org
[agraf: squash in bugfix from aik]
Signed-off-by: Alexander Graf <agraf@suse.de>
In the past, IO space could not be mapped into the memory address space
so we introduced a workaround for that. Nowadays it does not look
necessary so we can remove the workaround and make sPAPR PCI
configuration simplier.
Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Acked-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
After previous Peter patch, they are redundant. This way we don't
assign them except when needed. Once there, there were lots of case
where the ".fields" indentation was wrong:
.fields = (VMStateField []) {
and
.fields = (VMStateField []) {
Change all the combinations to:
.fields = (VMStateField[]){
The biggest problem (appart from aesthetics) was that checkpatch complained
when we copy&pasted the code from one place to another.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Acked-by: Alexey Kardashevskiy <aik@ozlabs.ru>
This converts the old-style SysBusDevice::init() callback to a new-style
DeviceClass::realize() callback.
As a part of conversion, this replaces fprintf(stderr) with error_setg()
as realize() does not "return" any value, instead it puts the extended
error into **errp.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Mike Day <ncmike@ncultra.org>
Signed-off-by: Andreas Färber <afaerber@suse.de>
Previously libvirt required the first/default PCI bus to have name "pci".
Since QEMU can support multiple buses now, libvirt wants "pci.0" now.
This removes custom bus name and lets QEMU make up default names.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Andreas Färber <afaerber@suse.de>
We currently size the msi window trap page according to the host's page
size so that we poke a working hole into a memory slot in case we overlap.
However, this is only ever necessary with KVM active. Without KVM, we should
rather try to be host platform agnostic and use a constant size: 4k.
This fixes a build breakage on win32 hosts.
Signed-off-by: Alexander Graf <agraf@suse.de>
Recent changes introduced cannot_instantiate_with_device_add_yet
and removed capability of adding yet another PCI host bridge via
command line for SPAPR platform (POWERPC64 server).
This brings the capability back and puts SPAPR PHB into "bridge"
category.
This is not much use for emulated PHB but it is absolutely required
for VFIO as we put an IOMMU group onto a separate PHB on SPAPR.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
Replace them with uint8/32/64.
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Andreas Färber <afaerber@suse.de>
It doesn't make sense for a region to be INT64_MAX in size:
memory core uses UINT64_MAX as a special value meaning
"all 64 bit" this is what was meant here.
While this should never affect the spapr system which at the moment always
has < 63 bit size, this makes us hit all kind of corner case bugs with
sub-pages, so users are probably better off if we just use UINT64_MAX
instead.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Alexander Graf <agraf@suse.de>
This enables IRQFD for LSI (level triggered INTx interrupts) by adding
a spapr_route_intx_pin_to_irq() callback to the sPAPR PCI host bus. This
callback is called to know the global interrupt number to link resampling fd
with IRQFD's fd in KVM.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
On the sPAPR platform a guest allocates MSI/MSIX vectors via RTAS
hypercalls which return global IRQ numbers to a guest so it only
operates with those and never touches MSIMessage.
Therefore MSIMessage handling is completely hidden in QEMU.
Previously every sPAPR PCI host bridge implemented its own MSI window
to catch msi_notify()/msix_notify() calls from QEMU devices (virtio-pci
or vfio) and route them to the guest via qemu_pulse_irq().
MSIMessage used to be encoded as:
.addr - address within the PHB MSI window;
.data - the device index on PHB plus vector number.
The MSI MR write function translated this MSIMessage to a global IRQ
number and called qemu_pulse_irq().
However the total number of IRQs is not really big (at the moment it is
1024 IRQs starting from 4096) and even 16bit data field of MSIMessage
seems to be enough to store an IRQ number there.
This simplifies MSI handling in sPAPR PHB. Specifically, this does:
1. remove a MSI window from a PHB;
2. add a single memory region for all MSIs to sPAPREnvironment
and spapr_pci_msi_init() to initialize it;
3. encode MSIMessage as:
* .addr - a fixed address of SPAPR_PCI_MSI_WINDOW==0x40000000000ULL;
* .data as an IRQ number.
4. change IRQ allocator to align first IRQ number in a block for MSI.
MSI uses lower bits to specify the vector number so the first IRQ has to
be aligned. MSIX does not need any special allocator though.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Anthony Liguori <aliguori@us.ibm.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
spapr-pci config space accessors use find_dev() to find a PCI device.
However find_dev() only searched on a primary bus and did not do
recursive search through secondary buses so config space access was not
possible for devices other that on a primary bus.
This fixed find_dev() by using the PCI API pci_find_device() function.
This effectively enabled pci bridges on spapr.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
This adds the necessary support for saving the state of the PAPR virtual
PCI host bridge (or host bridges).
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Anthony Liguori <aliguori@us.ibm.com>
Message-id: 1374175984-8930-10-git-send-email-aliguori@us.ibm.com
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Model TCE tables as a device that's hooked up as a child object to
the owner. Besides the code cleanup, we get a few nice benefits:
1) free actually works now (it was dead code before)
2) the TCE information is visible in the device tree
3) we can expose table information as properties such that if we
change the window_size, we can use globals to keep migration
working.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Message-id: 1374175984-8930-6-git-send-email-aliguori@us.ibm.com
[dwg: pseries: savevm support for PAPR TCE tables]
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
[alexey: ppc kvm: fix to compile]
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
This fixes endianness bugs in I/O port access.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Anthony Liguori <aliguori@us.ibm.com>
Message-id: 1374501278-31549-5-git-send-email-pbonzini@redhat.com
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
i686-w64-mingw32-gcc (GCC) 4.6.3 from Debian wheezy reports these warnings:
hw/ppc/spapr_hcall.c:188:1: warning:
control reaches end of non-void function [-Wreturn-type]
hw/ppc/spapr_pci.c:454:1: warning:
control reaches end of non-void function [-Wreturn-type]
Both warnings are fixed by using g_assert_not_reached instead of assert.
A second line with assert(0) in spapr_pci.c which did not raise a compiler
warning was modified, too, because g_assert_not_reached documents the
purpose of that statement and is not removed in release builds.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
This includes some pci enhancements:
Better support for systems with multiple PCI root buses
FW cfg interface for more robust pci programming in BIOS
Minor fixes/cleanups for fw cfg and cross-version migration -
because of dependencies with other patches
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.13 (GNU/Linux)
iQEcBAABAgAGBQJR2ctmAAoJECgfDbjSjVRpQpAH/Rk00yLrQ2R5ScNa8AL9LeaJ
gVFndBmmuRz4gdhyATx6lzR98ic32iTr0+YR5mL51btgmM5a0bEd/SIu34nXriWj
PsM0wdXfo/oEygdttxhvzJOH17tohRV9xg2WA2d8BEwDzrDyqoQ4J0VJlHlG7u3W
nq4KVDVUpLNQFKG8ZgJ2vW0WMw/mBSj2rluhQUALhcuvChphtvAFZ2rsSfJr6bzD
aBELrtIvfLvPGN/0WVeYs9qlp4EE03H3X6gN61QvV3/YElxubKUV5XyMDOX2dW3D
2j0NQi84LYHn0SFap2r/Kgm47/F6Q56SFk5lrgZrg60mhQTwocw7PfL8CGxjXRI=
=gxxc
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'mst/tags/for_anthony' into staging
pci,misc enhancements
This includes some pci enhancements:
Better support for systems with multiple PCI root buses
FW cfg interface for more robust pci programming in BIOS
Minor fixes/cleanups for fw cfg and cross-version migration -
because of dependencies with other patches
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
# gpg: Signature made Sun 07 Jul 2013 03:11:18 PM CDT using RSA key ID D28D5469
# gpg: Can't check signature: public key not found
# By David Gibson (10) and others
# Via Michael S. Tsirkin
* mst/tags/for_anthony:
pci: Fold host_buses list into PCIHostState functionality
pci: Remove domain from PCIHostBus
pci: Simpler implementation of primary PCI bus
pci: Add root bus parameter to pci_nic_init()
pci: Add root bus argument to pci_get_bus_devfn()
pci: Replace pci_find_domain() with more general pci_root_bus_path()
pci: Use helper to find device's root bus in pci_find_domain()
pci: Abolish pci_find_root_bus()
pci: Move pci_read_devaddr to pci-hotplug-old.c
pci: Cleanup configuration for pci-hotplug.c
pvpanic: fix fwcfg for big endian hosts
pvpanic: initialization cleanup
MAINTAINERS: s/Marcelo/Paolo/
e1000: cleanup process_tx_desc
pc_piix: cleanup init compat handling
pc: pass PCI hole ranges to Guests
pci: store PCI hole ranges in guestinfo structure
range: add Range structure
Message-id: 1373228271-31223-1-git-send-email-mst@redhat.com
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
pci_find_domain() is used in a number of places where we want an id for a
whole PCI domain (i.e. the subtree under a PCI root bus). The trouble is
that many platforms may support multiple independent host bridges with no
hardware supplied notion of domain number.
This patch, therefore, replaces calls to pci_find_domain() with calls to
a new pci_root_bus_path() returning a string. The new call is implemented
in terms of a new callback in the host bridge class, so it can be defined
in some way that's well defined for the platform. When no callback is
available we fall back on the qbus name.
Most current uses of pci_find_domain() are for error or informational
messages, so the change in identifiers should be harmless. The exception
is pci_get_dev_path(), whose results form part of migration streams. To
maintain compatibility with old migration streams, the PIIX PCI host is
altered to always supply "0000" for this path, which matches the old domain
number (since the code didn't actually support domains other than 0).
For the pseries (spapr) PCI bridge we use a different platform-unique
identifier (pseries machines can routinely have dozens of PCI host
bridges). Theoretically that breaks migration streams, but given that we
don't yet have migration support for pseries, it doesn't matter.
Any other machines that have working migration support including PCI
devices will need to be updated to maintain migration stream compatibility.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
RTAS is a hypervisor provided binary blob that a guest loads and
calls into to execute certain functions. It's similar to the
vsyscall page in Linux or the short lived VMCI paravirt interface
from VMware.
The QEMU implementation of the RTAS blob is simply a passthrough
that proxies all RTAS calls to the hypervisor via an hypercall.
While we pass a CPU argument for hypercall handling in QEMU, we
don't pass it for RTAS calls. Since some RTAs calls require
making hypercalls (normally RTAS is implemented as guest code) we
have nasty hacks to allow that.
Add a CPU argument to RTAS call handling so we can more easily
invoke hypercalls just as guest code would.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
The "info mtree" command in QEMU console prints only "memory" and "I/O"
address spaces while there are actually a lot more other AddressSpace
structs created by PCI and VIO devices. Those devices do not normally
have names and therefore not present in "info mtree" output.
The patch fixes this.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Use the new iommu support in the memory core for iommu support. The only
user, spapr, is also converted, but it still provides a DMAContext
interface until the non-PCI bits switch to AddressSpace.
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Avi Kivity <avi.kivity@gmail.com>
[ Do not calls memory_region_del_subregion() on the device's
bus_master_enable_region, it is an alias; return an AddressSpace
from the IOMMU hook and remove the destructor hook. - David Gibson ]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The TCE table is currently returned as a DMAContext, and non-type-safe
APIs are called later passing back the DMAContext. Since we want to move
away from DMAContext, use an opaque type instead, and add an accessor
to retrieve the DMAContext from it.
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>