With the previous changes, we can now let the ICS_KVM class inherit
directly from ICS_BASE class and not from the intermediate ICS_SIMPLE.
It makes the class hierarchy much cleaner.
What is left in the top classes is the low level interface to access
the KVM XICS device in ICS_KVM and the XICS emulating handlers in
ICS_SIMPLE.
This should not break migration compatibility.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Just like for the realize handlers, this makes possible to move the
common ICSState code of the reset handlers in the ics-base class.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This makes possible to move the common ICSState code of the realize
handlers in the ics-base class.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This changes the ICP realize and reset handlers in DeviceRealize and
DeviceReset handlers. parent handlers are now called from the
inheriting classes which is a cleaner object pattern.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
On CentOS 7.5, gcc-4.8.5-28.el7_5.1.ppc64le fails to build QEMU due to :
hw/intc/xics_kvm.c: In function ‘ics_set_kvm_state’:
hw/intc/xics_kvm.c:281:13: error: ‘ret’ may be used uninitialized in this
function [-Werror=maybe-uninitialized]
return ret;
Fix the breakage and also remove the extra error reporting as
kvm_device_access() already provides a substantial error message.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The KVM helpers hide the low level interface used to communicate to
the XICS KVM device and provide a good cleanup to the XICS KVM models.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
When using the emulated XICS, the 'info pic' monitor command shows:
CPU 0 XIRR=ff000000 ((nil)) PP=ff MFRR=ff
ICS 1000..13ff 0x10040060340
1000 MSI 05 00
1001 MSI 05 00
1002 MSI 05 00
1003 MSI ff 00
1004 LSI ff 00
1005 LSI ff 00
1006 LSI ff 00
1007 LSI ff 00
1008 MSI 05 00
1009 MSI 05 00
100a MSI 05 00
100b MSI 05 00
100c MSI 05 00
but when using the in-kernel XICS with the very same guest, we get:
CPU 0 XIRR=00000000 ((nil)) PP=ff MFRR=ff
ICS 1000..13ff 0x10032e00340
1000 MSI ff 00
1001 MSI ff 00
1002 MSI ff 00
1003 MSI ff 00
1004 LSI ff 00
1005 LSI ff 00
1006 LSI ff 00
1007 LSI ff 00
1008 MSI ff 00
1009 MSI ff 00
100a MSI ff 00
100b MSI ff 00
100c MSI ff 00
ie, all irqs are masked and XIRR is null, while we should get the
same output as with the emulated XICS.
If the guest is then migrated, 'info pic' shows the expected values
on both source and destination.
The problem is that QEMU doesn't synchronize with KVM before printing
the XICS state. Migration happens to fix the output because it enforces
synchronization with KVM.
To fix the invalid output of 'info pic', this patch introduces a new
synchronize_state operation for both ICPStateClass and ICSStateClass.
The ICP operation relies on run_on_cpu() in order to kick the vCPU
and avoid sleeping on KVM_GET_ONE_REG.
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The cpu_setup() handler is only implemented by xics_kvm, where it really
does a typical "realize" job. Moreover, the realize() handler is called
shortly after cpu_setup(), on the same path.
This patch converts xics_kvm to implement realize() instead of cpu_setup().
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
It makes more sense to pass an IPCState * to handlers of ICPStateClass
instead of a DeviceState *, if only to benefit from compile time type
checking. The same goes with ICSStateClass.
While here, we also change the declaration of ICPStateClass in xics.h
for consistency.
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Taking into account that qemu_set_irq() returns immediatly if its first
argument is NULL, icp_kvm_reset() largely duplicates icp_reset().
This patch introduces a reset() handler, so that the common logic can
be implemented in icp_reset() only.
While there we can also drop icp_kvm_realize() and icp_kvm_unrealize(). This
causes icp-kvm to be realized in icp_realize(), which sets icp->xics, but
it has no impact.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Now that ICPState objects get finalized on CPU unplug, we should unregister
reset handlers as well to avoid a QEMU crash at machine reset time.
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Since commit a45863bda9 ("xics_kvm: Don't enable KVM_CAP_IRQ_XICS if
already enabled"), we were able to re-hotplug a vCPU that had been hot-
unplugged ealier, thanks to a boolean flag in ICPState that we set when
enabling KVM_CAP_IRQ_XICS.
This could work because the lifecycle of all ICPState objects was the
same as the machine. Commit 5bc8d26de2 ("spapr: allocate the ICPState
object from under sPAPRCPUCore") broke this assumption and now we always
pass a freshly allocated ICPState object (ie, with the flag unset) to
icp_kvm_cpu_setup().
This cause re-hotplug to fail with:
Unable to connect CPU8 to kernel XICS: Device or resource busy
Let's fix this by caching all the vCPU ids for which KVM_CAP_IRQ_XICS was
enabled. This also drops the now useless boolean flag from ICPState.
Reported-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Greg Kurz <groug@kaod.org>
Tested-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Kernel commit 17d48610ae0f ("KVM: PPC: Book 3S: XICS: Implement ICS
P/Q states") added new bits to the state used by KVM IRQs. Currently,
QEMU does not preserve these bits, so migrating (or otherwise saving
and restoring) the guest state causes the P and Q bits to be cleared.
Clearing the P bit has no effect, because the kernel will set it based
on other data, but the loss of a set Q bit will cause a lost
interrupt.
This patch preserves the P and Q bits, correcting the problem.
Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
ics_get_kvm_state() "or"s set bits into irq->status but does not mask
out clear bits.
Correct this by initializing the IRQ status to zero before adding bits
to it.
Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The recent changes on the XICS layer removed the XICSState object to
let the sPAPR machine handle the ICP and ICS directly. The reset of
these objects was previously handled by XICSState, which was a SysBus
device, and to keep the same behavior, the ICP and ICS were assigned
to SysbBus.
But that broke the 'info qtree' command in the monitor. 'qtree'
performs a loop on the children of a bus to print their properties and
SysBus devices are expected to be found under SysBus, which is not the
case anymore.
The fix for this problem is to register reset handlers for the ICP and
ICS objects and stop using SysBus for such devices.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Tested-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
'ICPState *' variables are currently named 'ss'. This is confusing, so
let's give them an appropriate name: 'icp'.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The XICSState classes are not used anymore. They have now been fully
deprecated by the XICSFabric QOM interface. Do the cleanups.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
There is nothing left related to the XICS object in the realize
functions of the KVMXICSState and XICSState class. So adapt the
interfaces to call these routines directly from the sPAPR machine init
sequence.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The cpu_setup() handler is currently under the XICSState class but it
really belongs under ICPState as it is setting up an individual vCPU.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The cpu_setup() handler currently takes a 'XICSState *' argument to
grab the kernel ICP file descriptor. This interface can be simplified
by using the 'xics' backlink of the ICP object.
This change is also required by subsequent patches which makes use of
the QOM interface for XICS.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The kernel ICP file descriptor is the only reason behind the
KVMXICSState class and it's in the way of more cleanups. Let's make it
a static for the moment and move forward.
If this is problem, we could use an attribute under the sPAPR machine
later on.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Today, the ICP (Interrupt Controller Presenter) objects are created by
the 'nr_servers' property handler of the XICS object and a class
handler. They are realized in the XICS object realize routine.
Let's simplify the process by creating the ICP objects along with the
XICS object at the machine level.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Today, the ICS (Interrupt Controller Source) object is created and
realized by the init and realize routines of the XICS object, but some
of the parameters are only known at the machine level.
These parameters are passed from the sPAPR machine to the ICS object
in a rather convoluted way using property handlers and a class handler
of the XICS object. The number of irqs required to allocate the IRQ
state objects in the ICS realize routine is one of them.
Let's simplify the process by creating the ICS object along with the
XICS object at the machine level and link the ICS into the XICS list
of ICSs at this level also. In the sPAPR machine, there is only a
single ICS but that will change with the PowerNV machine.
Also, QOMify the creation of the objects and get rid of the
superfluous code.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
xics_spapr and xics_kvm nearly define the same 'set_nr_servers'
handler. Only the type of the ICP differs. So let's make a common one
to remove some duplicated code.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The existing implementation remains same and ics-base is introduced. The
type name "ics" is retained, and all the related functions renamed as
ics_simple_*
This will allow different implementations for the source controllers
such as the MSI support of PHB3 on Power8 which uses in-memory state
tables for example.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
[ clg: added ICS_BASE_GET_CLASS and related fixes, based on :
http://patchwork.ozlabs.org/patch/646010/ ]
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Instead of an array of fixed sized blocks, use a list, as we will need
to have sources with variable number of interrupts. SPAPR only uses
a single entry. Native will create more. If performance becomes an
issue we can add some hashed lookup but for now this will do fine.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[ move the initialization of list to xics_common_initfn,
restore xirr_owner after migration and move restoring to
icp_post_load]
Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
[ clg: removed the icp_post_load() changes from nikunj patchset v3:
http://patchwork.ozlabs.org/patch/646008/ ]
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
We abort a few lines above if kernel_xics_fd == -1.
This is only code cleanup.
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The "ICP" is a different object than the "XICS". For historical reasons,
we have a number of places where we name a variable "icp" while it contains
a XICSState pointer. There *is* an ICPState structure too so this makes
the code really confusing.
This is a mechanical replacement of all those instances to use the name
"xics" instead. There should be no functional change.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[spapr_cpu_init has been moved to spapr_cpu_core.c, change there]
Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The common class doesn't change, the KVM one is sPAPR specific. Rename
variables and functions to xics_spapr.
Retain the type name as "xics" to preserve migration for existing sPAPR
guests.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
XICS is setup for each CPU during initialization. Provide a routine
to undo the same when CPU is unplugged. While here, move ss->cs management
into xics from xics_kvm since there is nothing KVM specific in it.
Also ensure xics reset doesn't set irq for CPUs that are already unplugged.
This allows reboot of a VM that has undergone CPU hotplug and unplug
to work correctly.
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Memory barriers are needed also by Xen and, when the ioeventfd
bugs are fixed, by TCG as well.
sysemu/kvm.h is not anymore needed in sysemu/dma.h, move it to
the actual users.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Commit 57cb38b included qapi/error.h into qemu/osdep.h to get the
Error typedef. Since then, we've moved to include qemu/osdep.h
everywhere. Its file comment explains: "To avoid getting into
possible circular include dependencies, this file should not include
any other QEMU headers, with the exceptions of config-host.h,
compiler.h, os-posix.h and os-win32.h, all of which are doing a
similar job to this file and are under similar constraints."
qapi/error.h doesn't do a similar job, and it doesn't adhere to
similar constraints: it includes qapi-types.h. That's in excess of
100KiB of crap most .c files don't actually need.
Add the typedef to qemu/typedefs.h, and include that instead of
qapi/error.h. Include qapi/error.h in .c files that need it and don't
get it now. Include qapi-types.h in qom/object.h for uint16List.
Update scripts/clean-includes accordingly. Update it further to match
reality: replace config.h by config-target.h, add sysemu/os-posix.h,
sysemu/os-win32.h. Update the list of includes in the qemu/osdep.h
comment quoted above similarly.
This reduces the number of objects depending on qapi/error.h from "all
of them" to less than a third. Unfortunately, the number depending on
qapi-types.h shrinks only a little. More work is needed for that one.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
[Fix compilation without the spice devel packages. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Clean up includes so that osdep.h is included first and headers
which it implies are not included manually.
This commit was created with scripts/clean-includes.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1453832250-766-6-git-send-email-peter.maydell@linaro.org
When supporting CPU hot removal by parking the vCPU fd and reusing
it during hotplug again, there can be cases where we try to reenable
KVM_CAP_IRQ_XICS CAP for the vCPU for which it was already enabled.
Introduce a boolean member in ICPState to track this and don't
reenable the CAP if it was already enabled earlier.
Re-enabling this CAP should ideally work, but currently it results in
kernel trying to create and associate ICP with this vCPU and that
fails since there is already an ICP associated with it. Hence this
patch is needed to work around this problem in the kernel.
This change allows CPU hot removal to work for sPAPR.
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
The code for -machine pseries maintains a global sPAPREnvironment structure
which keeps track of general state information about the guest platform.
This predates the existence of the MachineState structure, but performs
basically the same function.
Now that we have the generic MachineState, fold sPAPREnvironment into
sPAPRMachineState, the pseries specific subclass of MachineState.
This is mostly a matter of search and replace, although a few places which
relied on the global spapr variable are changed to find the structure via
qdev_get_machine().
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
Compute kvm_irqfds_allowed by checking the KVM_CAP_IRQFD extension.
Remove direct settings in architecture specific files.
Add a new kvm_resamplefds_allowed variable, initialized by
checking the KVM_CAP_IRQFD_RESAMPLE extension. Add a corresponding
kvm_resamplefds_enabled() function.
A special notice for s390 where KVM_CAP_IRQFD was not immediatly
advirtised when irqfd capability was introduced in the kernel.
KVM_CAP_IRQ_ROUTING was advertised instead.
This was fixed in "KVM: s390: announce irqfd capability",
ebc3226202d5956a5963185222982d435378b899 whereas irqfd support
was brought in 84223598778ba08041f4297fda485df83414d57e,
"KVM: s390: irq routing for adapter interrupts". Both commits
first appear in 3.15 so there should not be any kernel
version impacted by this QEMU modification.
Signed-off-by: Eric Auger <eric.auger@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Since islsi[] array has been merged into the ICSState struct,
we must not reset flags as they tell if the interrupt is in use.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
The existing interrupt allocation scheme in SPAPR assumes that
interrupts are allocated at the start time, continously and the config
will not change. However, there are cases when this is not going to work
such as:
1. migration - we will have to have an ability to choose interrupt
numbers for devices in the command line and this will create gaps in
interrupt space.
2. PCI hotplug - interrupts from unplugged device need to be returned
back to interrupt pool, otherwise we will quickly run out of interrupts.
This replaces a separate lslsi[] array with a byte in the ICSIRQState
struct and defines "LSI" and "MSI" flags. Neither of these flags set
signals that the descriptor is not allocated and not in use.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
At the moment spapr_rtas_register() allocates a new token number for every
new RTAS callback so numbers are not fixed and depend on the number of
supported RTAS handlers and the exact order of spapr_rtas_register() calls.
These tokens are copied into the device tree and remain the same during
the guest lifetime.
When we start another guest to receive a migration, it calls
spapr_rtas_register() as well. If the number of RTAS handlers or their
order is different in QEMU on source and destination sides, the "/rtas"
node in the device tree will differ. Since migration overwrites the device
tree (as it overwrites the entire RAM), the actual RTAS config on
the destination side gets broken.
This defines global contant values for every RTAS token which QEMU
is using today.
This changes spapr_rtas_register() to accept a token number instead of
allocating one. This changes all users of spapr_rtas_register().
This changes XICS-KVM not to cache tokens registered with KVM as they
constant now.
This makes TOKEN_BASE global as RTAS_XXX use TOKEN_BASE as
a base. TOKEN_MAX is moved and renamed too and its value is changed
to the last token + 1. Boundary checks for token values are adjusted.
This reserves token numbers for "os-term" handlers and PCI hotplug
which we are working on.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
Convert existing users of KVM_ENABLE_CAP to new helper.
Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Reviewed-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Currently interrupt priorities are set to 0 (highest) at the very
beginning of the guest execution which is not correct and makes the guest
produce random interrupt error messages such as:
"Interrupt 0x1001 (real) is invalid, disabling it".
This also prevents interrupt states from correct migration.
This initializes priority to 0xFF as the emulated XICS does.
Suggested-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Andreas Färber <afaerber@suse.de>
This makes use of @cpu_dt_id and related API in:
1. emulated XICS hypercall handlers as they receive fixed CPU indexes;
2. XICS-KVM to enable in-kernel XICS on right CPU;
3. device-tree renderer.
This removes @cpu_index fixup as @cpu_dt_id is used instead so QEMU monitor
can accept command-line CPU indexes again.
This changes kvm_arch_vcpu_id() to use ppc_get_vcpu_dt_id() as at the moment
KVM CPU id and device tree ID are calculated using the same algorithm.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Acked-by: Mike Day <ncmike@ncultra.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
This enables IRQFD support for sPAPR. The feature decreases the latency
of interrupt handling.
To enable IRQFD for MSI, this sets kvm_gsi_direct_mapping to true which
enables direct MSI mapping.
To enable IRQFD for LSI (level triggered INTx interrupts), a PCI host bus
callback is required. The patch for that is coming next.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
Recent (host) kernels support emulating the PAPR defined "XICS" interrupt
controller system within KVM. This patch allows qemu to initialize and
configure the in-kernel XICS, and keep its state in sync with qemu's XICS
state as necessary.
This should give considerable performance improvements. e.g. on a simple
IPI ping-pong test between hardware threads, using qemu XICS gives us
around 5,000 irqs/second, whereas the in-kernel XICS gives us around
70,000 irqs/s on the same hardware configuration.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
[Mike Qiu <qiudayu@linux.vnet.ibm.com>: fixed mistype which caused ics_set_kvm_state() to fail]
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Alexander Graf <agraf@suse.de>