memory_region_allocate_system_memory() API is going away, so
replace it with memdev allocated MemoryRegion. The later is
initialized by generic code, so board only needs to opt in
to memdev scheme by providing
MachineClass::default_ram_id
and using MachineState::ram instead of manually initializing
RAM memory region.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: BALATON Zoltan <balaton@eik.bme.hu>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <20200219160953.13771-67-imammedo@redhat.com>
If user provided non-sense RAM size, board will complain and
continue running with max RAM size supported or sometimes
crash like this:
%QEMU -M bamboo -m 1
exec.c:1926: find_ram_offset: Assertion `size != 0' failed.
Aborted (core dumped)
Also RAM is going to be allocated by generic code, so it won't be
possible for board to fix things up for user.
Make it error message and exit to force user fix CLI,
instead of accepting non-sense CLI values.
That also fixes crash issue, since wrongly calculated size
isn't used to allocate RAM
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: BALATON Zoltan <balaton@eik.bme.hu>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <20200219160953.13771-66-imammedo@redhat.com>
This patch includes migration support for machine check
handling. Especially this patch blocks VM migration
requests until the machine check error handling is
complete as these errors are specific to the source
hardware and is irrelevant on the target hardware.
Signed-off-by: Aravinda Prasad <arawinda.p@gmail.com>
[Do not set FWNMI cap in post_load, now its done in .apply hook]
Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com>
Message-Id: <20200130184423.20519-7-ganeshgr@linux.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This patch adds support in QEMU to handle "ibm,nmi-register"
and "ibm,nmi-interlock" RTAS calls.
The machine check notification address is saved when the
OS issues "ibm,nmi-register" RTAS call.
This patch also handles the case when multiple processors
experience machine check at or about the same time by
handling "ibm,nmi-interlock" call. In such cases, as per
PAPR, subsequent processors serialize waiting for the first
processor to issue the "ibm,nmi-interlock" call. The second
processor that also received a machine check error waits
till the first processor is done reading the error log.
The first processor issues "ibm,nmi-interlock" call
when the error log is consumed.
Signed-off-by: Aravinda Prasad <arawinda.p@gmail.com>
[Register fwnmi RTAS calls in core_rtas_register_types()
where other RTAS calls are registered]
Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com>
Message-Id: <20200130184423.20519-6-ganeshgr@linux.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Upon a machine check exception (MCE) in a guest address space,
KVM causes a guest exit to enable QEMU to build and pass the
error to the guest in the PAPR defined rtas error log format.
This patch builds the rtas error log, copies it to the rtas_addr
and then invokes the guest registered machine check handler. The
handler in the guest takes suitable action(s) depending on the type
and criticality of the error. For example, if an error is
unrecoverable memory corruption in an application inside the
guest, then the guest kernel sends a SIGBUS to the application.
For recoverable errors, the guest performs recovery actions and
logs the error.
Signed-off-by: Aravinda Prasad <arawinda.p@gmail.com>
[Assume SLOF has allocated enough room for rtas error log]
Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <20200130184423.20519-5-ganeshgr@linux.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Memory error such as bit flips that cannot be corrected
by hardware are passed on to the kernel for handling.
If the memory address in error belongs to guest then
the guest kernel is responsible for taking suitable action.
Patch [1] enhances KVM to exit guest with exit reason
set to KVM_EXIT_NMI in such cases. This patch handles
KVM_EXIT_NMI exit.
[1] https://www.spinics.net/lists/kvm-ppc/msg12637.html
(e20bbd3d and related commits)
Signed-off-by: Aravinda Prasad <arawinda.p@gmail.com>
Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Message-Id: <20200130184423.20519-4-ganeshgr@linux.ibm.com>
[dwg: #ifdefs to fix compile for 32-bit target]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Introduce fwnmi an spapr capability and add a helper function
which tries to enable it, which would be used by following patch
of the series. This patch by itself does not change the existing
behavior.
Signed-off-by: Aravinda Prasad <arawinda.p@gmail.com>
[eliminate cap_ppc_fwnmi, add fwnmi cap to migration state
and reprhase the commit message]
Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <20200130184423.20519-3-ganeshgr@linux.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This is a model of the PCIe Host Bridge (PHB3) found on a POWER8
processor. It includes the PowerBus logic interface (PBCQ), IOMMU
support, a single PCIe Gen.3 Root Complex, and support for MSI and LSI
interrupt sources as found on a POWER8 system using the XICS interrupt
controller.
The POWER8 processor comes in different flavors: Venice, Murano,
Naple, each having a different number of PHBs. To make things simpler,
the models provides 3 PHB3 per chip. Some platforms, like the
Firestone, can also couple PHBs on the first chip to provide more
bandwidth but this is too specific to model in QEMU.
XICS requires some adjustment to support the PHB3 MSI. The changes are
provided here but they could be decoupled in prereq patches.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20200127144506.11132-3-clg@kaod.org>
[dwg: Use device_class_set_props()]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
These changes introduces models for the PCIe Host Bridge (PHB4) of the
POWER9 processor. It includes the PowerBus logic interface (PBCQ),
IOMMU support, a single PCIe Gen.4 Root Complex, and support for MSI
and LSI interrupt sources as found on a POWER9 system using the XIVE
interrupt controller.
POWER9 processor comes with 3 PHB4 PEC (PCI Express Controller) and
each PEC can have several PHBs. By default,
* PEC0 provides 1 PHB (PHB0)
* PEC1 provides 2 PHBs (PHB1 and PHB2)
* PEC2 provides 3 PHBs (PHB3, PHB4 and PHB5)
Each PEC has a set "global" registers and some "per-stack" (per-PHB)
registers. Those are organized in two XSCOM ranges, the "Nest" range
and the "PCI" range, each range contains both some "PEC" registers and
some "per-stack" registers.
No default device layout is provided and PCI devices can be added on
any of the available PCIe Root Port (pcie.0 .. 2 of a Power9 chip)
with address 0x0 as the firwware (skiboot) only accepts a single
device per root port. To run a simple system with a network and a
storage adapters, use a command line options such as :
-device e1000e,netdev=net0,mac=C0:FF:EE:00:00:02,bus=pcie.0,addr=0x0
-netdev bridge,id=net0,helper=/usr/libexec/qemu-bridge-helper,br=virbr0,id=hostnet0
-device megasas,id=scsi0,bus=pcie.1,addr=0x0
-drive file=$disk,if=none,id=drive-scsi0-0-0-0,format=qcow2,cache=none
-device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=2
If more are needed, include a bridge.
Multi chip is supported, each chip adding its set of PHB4 controllers
and its PCI busses. The model doesn't emulate the EEH error handling.
This model is not ready for hotplug yet.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[ clg: - numerous cleanups
- commit log
- fix for broken LSI support
- PHB pic printinfo
- large QOM rework ]
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20200127144506.11132-2-clg@kaod.org>
[dwg: Use device_class_set_props()]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
For devices that cannot be statically initialized, implement a
get_dt_compatible() callback that allows us to ask the device for
the 'compatible' value.
Signed-off-by: Stefan Berger <stefanb@linux.ibm.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <20200121152935.649898-3-stefanb@linux.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
When the "hb-mode" option is activated on the powernv machine, the
firmware is mapped at 0x8000000 and the HRMOR of the HW threads are
set to the same address.
The PNOR mapping on the FW address space of the LPC bus is left enabled
to let the firmware load any other images required to boot the host.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20200127144154.10170-4-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
It's been deprecated since QEMU v3.1. The 40p machine should be
used nowadays instead.
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Acked-by: Hervé Poussineau <hpoussin@reactos.org>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20200114114617.28854-1-thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
blk_getlength() returns an int64_t but the result is stored in a
uint32_t. Errors (negative values) won't be caught by the check in
pnv_pnor_realize() and blk_blockalign() will allocate a very large
buffer in such cases.
Fixes Coverity issue CID 1412226.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20200107171809.15556-3-clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
And use it instead of reaching out to the machine. This allows to get
rid of pnv_get_chip().
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20200106145645.4539-11-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This will be used in subsequent patches to access the XIVE associated to
a TCTX without reaching out to the machine through qdev_get_machine().
Signed-off-by: Cédric Le Goater <clg@kaod.org>
[ groug: - split patch
- write subject and changelog ]
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20200106145645.4539-9-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This allows to get rid of a call to qdev_get_machine().
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20200106145645.4539-8-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Set it at chip creation and forward it to the cores. This allows to drop
a call to qdev_get_machine().
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <20200106145645.4539-7-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
In order to get rid of qdev_get_machine(), first add a pointer to the
XIVE fabric under the XIVE router and make it configurable through a
QOM link property.
Configure it in the spapr and pnv machine. In the case of pnv, the XIVE
routers are under the chip, so this is done with a QOM alias property of
the POWER9 pnv chip.
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20200106145645.4539-5-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The XIVE router base class currently inherits an empty realize hook
from the sysbus device base class, but it will soon implement one
of its own to perform some sanity checks. Do the preliminary plumbing
to have it called.
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <20200106145645.4539-4-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
POWER8 is the only chip using the XICS interface. Add a "xics" link
and a XICSFabric attribute under this chip to remove the use of
qdev_get_machine()
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <20200106145645.4539-3-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The XIVE router base class currently inherits an empty realize hook
from the sysbus device base class, but it will soon implement one
of its own to perform some sanity checks. Do the preliminary plumbing
to have it called.
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20191219181155.32530-6-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
It isn't used anymore.
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <157623844102.360005.12070225703151669294.stgit@bahia.lan>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The XSCOM bus is implemented with a QOM interface, which is mostly
generic from a CPU type standpoint, except for the computation of
addresses on the Pervasive Connect Bus (PCB) network. This is handled
by the pnv_xscom_pcba() function with a switch statement based on
the chip_type class level attribute of the CPU chip.
This can be achieved using QOM. Also the address argument is masked with
PNV_XSCOM_SIZE - 1, which is for POWER8 only. Addresses may have different
sizes with other CPU types. Have each CPU chip type handle the appropriate
computation with a QOM xscom_pcba() method.
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <157623843543.360005.13996472463887521794.stgit@bahia.lan>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
They aren't used anymore.
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <157623842986.360005.1787401623906380181.stgit@bahia.lan>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Since pnv_dt_xscom() is called from chip specific dt_populate() hooks,
it shouldn't have to guess the chip type in order to populate the
"compatible" property. Just pass the compat string and its size as
arguments.
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <157623842430.360005.9513965612524265862.stgit@bahia.lan>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Since pnv_dt_xscom() is called from chip specific dt_populate() hooks,
it shouldn't have to guess the chip type in order to populate the "reg"
property. Just pass the base address and address size as arguments.
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <157623841868.360005.17577624823547136435.stgit@bahia.lan>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The pnv_chip_core_realize() function configures the XSCOM MMIO subregion
for each core of a single chip. The base address of the subregion depends
on the CPU type. Its computation is currently open-code using the
pnv_chip_is_powerXX() helpers. This can be achieved with QOM. Introduce
a method for this in the base chip class and implement it in child classes.
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <157623841311.360005.4705705734873339545.stgit@bahia.lan>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The pnv_pic_print_info() callback checks the type of the chip in order
to forward to the request appropriate interrupt controller. This can
be achieved with QOM. Introduce a method for this in the base chip class
and implement it in child classes.
This also prepares ground for the upcoming interrupt controller of POWER10
chips.
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <157623840755.360005.5002022339473369934.stgit@bahia.lan>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
They aren't used anymore.
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <157623840200.360005.1300941274565357363.stgit@bahia.lan>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
We add an extra node to advertise power management on some machines,
namely powernv9 and powernv10. This is achieved by using the
pnv_is_power9() and pnv_is_power10() helpers.
This can be achieved with QOM. Add a method to the base class for
powernv machines and have it implemented by machine types that
support power management instead.
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <157623839642.360005.9243510140436689941.stgit@bahia.lan>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The pnv_dt_create() function generates different contents for the
"compatible" property of the root node in the DT, depending on the
CPU type. This is open coded with multiple ifs using pnv_is_powerXX()
helpers.
It seems cleaner to achieve with QOM. Introduce a base class for the
powernv machine and a compat attribute that each child class can use
to provide the value for the "compatible" property.
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <157623839085.360005.4046508784077843216.stgit@bahia.lan>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
[dwg: Folded in small fix Greg spotted after posting]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
It isn't used anymore.
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <157623838530.360005.15470128760871845396.stgit@bahia.lan>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The Processor Service Interface (PSI) model has a chip_type class level
attribute, which is used to generate the content of the "compatible" DT
property according to the CPU type.
Since the PSI model already has specialized classes for each supported
CPU type, it seems cleaner to achieve this with QOM. Provide the content
of the "compatible" property with a new class level attribute.
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <157623837974.360005.14706607446188964477.stgit@bahia.lan>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <157623837421.360005.412120366652768311.stgit@bahia.lan>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The OCC common area is mapped at a unique address on the system and
each OCC is assigned a segment to expose its sensor data :
-------------------------------------------------------------------------
| Start (Offset from | End | Size |Description |
| BAR2 base address) | | | |
-------------------------------------------------------------------------
| 0x00580000 | 0x005A57FF |150kB |OCC 0 Sensor Data Block|
| 0x005A5800 | 0x005CAFFF |150kB |OCC 1 Sensor Data Block|
| : | : | : | : |
| 0x00686800 | 0x006ABFFF |150kB |OCC 7 Sensor Data Block|
| 0x006AC000 | 0x006FFFFF |336kB |Reserved |
-------------------------------------------------------------------------
Maximum size is 1.5MB.
We could define a "OCC common area" memory region at the machine level
and sub regions for each OCC. But it adds some extra complexity to the
models. Fix the current layout with a simpler model.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20191211082912.2625-3-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The PBA bridge unit (Power Bus Access) connects the OCC (On Chip
Controller) to the Power bus and System Memory. The PBA is used to
gather sensor data, for power management, for sleep states, for
initial boot, among other things.
The PBA logic provides a set of four registers PowerBus Access Base
Address Registers (PBABAR0..3) which map the OCC address space to the
PowerBus space. These registers are setup by the initial FW and define
the PowerBus Range of system memory that can be accessed by PBA.
The current modeling of the PBABAR registers is done under the common
XSCOM handlers. We introduce a specific XSCOM regions for these
registers and fix :
- BAR sizes and BAR masks
- The mapping of the OCC common area. It is common to all chips and
should be mapped once. We will address per-OCC area in the next
change.
- OCC common area is in BAR 3 on P8
Inspired by previous work of Balamuruhan S <bala24@linux.ibm.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20191211082912.2625-2-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
PnvXScomInterface is an interface instance. It should never be
dereferenced. Drop the dummy type definition for extra safety,
which is the common practice with QOM interfaces.
While here also convert the bogus OBJECT_CHECK() to INTERFACE_CHECK().
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <157608025541.186670.1577861507610404326.stgit@bahia.lan>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The Processor Utilisation of Resources Register (PURR) and Scaled
Processor Utilisation of Resources Register (SPURR) provide an estimate
of the resources used by the thread, present on POWER7 and later
processors.
Currently the [S]PURR registers simply count at the rate of the
timebase.
Preserve this behaviour but rework the implementation to store an offset
like the timebase rather than doing the calculation manually. Also allow
hypervisor write access to the register along with the currently
available read access.
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
[ clg: rebased on current ppc tree ]
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20191128134700.16091-3-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The virtual timebase register (VTB) is a 64-bit register which
increments at the same rate as the timebase register, present on POWER8
and later processors.
The register is able to be read/written by the hypervisor and read by
the supervisor. All other accesses are illegal.
Currently the VTB is just an alias for the timebase (TB) register.
Implement the VTB so that is can be read/written independent of the TB.
Make use of the existing method for accessing timebase facilities where
by the compensation is stored and used to compute the value on reads/is
updated on writes.
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
[ clg: rebased on current ppc tree ]
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20191128134700.16091-2-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Same a POWER9, only the MMIO window changes.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20191205184454.10722-6-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The POWER10 PSIHB controller is very similar to the one on POWER9. We
should probably introduce a common PnvPsiXive object.
The ESB page size should be changed to 64k when P10 support is ready.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20191205184454.10722-5-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This is an empty shell with the XSCOM bus and cores. The chip controllers
will come later.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20191205184454.10722-3-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
When a CPU is reset, QEMU makes sure no interrupt is pending by clearing
CPUPPCstate::pending_interrupts in ppc_cpu_reset(). In the case of a
complete machine emulation, eg. a sPAPR machine, an external interrupt
request could still be pending in KVM though, eg. an IPI. It will be
eventually presented to the guest, which is supposed to acknowledge it at
the interrupt controller. If the interrupt controller is emulated in QEMU,
either XICS or XIVE, ppc_set_irq() won't deassert the external interrupt
pin in KVM since it isn't pending anymore for QEMU. When the vCPU re-enters
the guest, the interrupt request is still pending and the vCPU will try
again to acknowledge it. This causes an infinite loop and eventually hangs
the guest.
The code has been broken since the beginning. The issue wasn't hit before
because accel=kvm,kernel-irqchip=off is an awkward setup that never got
used until recently with the LC92x IBM systems (aka, Boston).
Add a ppc_irq_reset() function to do the necessary cleanup, ie. deassert
the IRQ pins of the CPU in QEMU and most importantly the external interrupt
pin for this vCPU in KVM.
Reported-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <157548861740.3650476.16879693165328764758.stgit@bahia.lan>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
spapr_ovec_diff(ov, old, new) has somewhat complex semantics. ov is set
to those bits which are in new but not old, and it returns as a boolean
whether or not there are any bits in old but not new.
It turns out that both callers only care about the second, not the first.
This is basically equivalent to a bitmap subset operation, which is easier
to understand and implement. So replace spapr_ovec_diff() with
spapr_ovec_subset().
Cc: Mike Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cedric Le Goater <clg@fr.ibm.com>
spapr_h_cas_compose_response() handles the last piece of the PAPR feature
negotiation process invoked via the ibm,client-architecture-support OF
call. Its only caller is h_client_architecture_support() which handles
most of the rest of that process.
I believe it was placed in a separate file originally to handle some
fiddly dependencies between functions, but mostly it's just confusing
to have the CAS process split into two pieces like this. Now that
compose response is simplified (by just generating the whole device
tree anew), it's cleaner to just fold it into
h_client_architecture_support().
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cedric Le Goater <clg@fr.ibm.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
This is useful to dump the saved contexts of the vCPUs : configuration
of the base END index of the vCPU and the Interrupt Pending Buffer
register, which is updated when an interrupt can not be presented.
When dumping the NVT table, we skip empty indirect pages which are not
necessarily allocated.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20191125065820.927-21-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
When doing CAM line compares, fetch the block id from the interrupt
controller which can have set the PC_TCTXT_CHIPID field.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20191125065820.927-20-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
When PC_TCTXT_CHIPID_OVERRIDE is configured, the PC_TCTXT_CHIPID field
overrides the hardwired chip ID in the Powerbus operations and for CAM
compares. This is typically used in the one block-per-chip configuration
to associate a unique block id number to each IC of the system.
Simplify the model with a pnv_xive_block_id() helper and remove
'tctx_chipid' which becomes useless.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20191125065820.927-19-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
We will use it to resend missed interrupts when a vCPU context is
pushed on a HW thread.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20191125065820.927-17-clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
It is now unused.
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20191125065820.927-16-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
On the P9 Processor, the thread interrupt context registers of a CPU
can be accessed "directly" when by load/store from the CPU or
"indirectly" by the IC through an indirect TIMA page. This requires to
configure first the PC_TCTXT_INDIRx registers.
Today, we rely on the get_tctx() handler to deduce from the CPU PIR
the chip from which the TIMA access is being done. By handling the
TIMA memory ops under the interrupt controller model of each machine,
we can uniformize the TIMA direct and indirect ops under PowerNV. We
can also check that the CPUs have been enabled in the XIVE controller.
This prepares ground for the future versions of XIVE.
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20191125065820.927-15-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The TIMA region gives access to the thread interrupt context registers
of a CPU. It is mapped at the same address on all chips and can be
accessed by any CPU of the system. To identify the chip from which the
access is being done, the PowerBUS uses a 'chip' field in the
load/store messages. QEMU does not model these messages, instead, we
extract the chip id from the CPU PIR and do a lookup at the machine
level to fetch the targeted interrupt controller.
Introduce pnv_get_chip() and pnv_xive_tm_get_xive() helpers to clarify
this process in pnv_xive_get_tctx(). The latter will be removed in the
subsequent patches but the same principle will be kept.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20191125065820.927-14-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The XIVE and XICS-on-XIVE KVM devices on POWER9 hosts can greatly reduce
their consumption of some scarce HW resources, namely Virtual Presenter
identifiers, if they know the maximum number of vCPUs that may run in the
VM.
Prepare ground for this by passing the value down to xics_kvm_connect()
and kvmppc_xive_connect(). This is purely mechanical, no functional
change.
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <157478678301.67101.2717368060417156338.stgit@bahia.tlslab.ibm.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The TIMA operations are performed on behalf of the XIVE IVPE sub-engine
(Presenter) on the thread interrupt context registers. The current
operations supported by the model are simple and do not require access
to the controller but more complex operations will need access to the
controller NVT table and to its configuration.
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20191125065820.927-13-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The XiveFabric QOM interface acts as the PowerBUS interface between
the interrupt controller and the system and should be implemented by
the QEMU machine. On HW, the XIVE sub-engine is responsible for the
communication with the other chip is the Common Queue (CQ) bridge
unit.
This interface offers a 'match_nvt' handler to perform the CAM line
matching when looking for a XIVE Presenter with a dispatched NVT.
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20191125065820.927-9-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
When the TIMA of a CPU needs to be accessed from the indirect page,
the thread id of the target CPU is first stored in the PC_TCTXT_INDIR0
register. This thread id is relative to the chip and not to the system.
Introduce a helper routine to look for a CPU of a given PIR and fix
pnv_xive_get_indirect_tctx() to scan only the threads of the local
chip and not the whole machine.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20191125065820.927-8-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
and use this helper to exclude CPUs which are not enabled in the XIVE
controller.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20191125065820.927-7-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Allocating a big void * array to store multiple objects isn't a
recommended practice for various reasons:
- no compile time type checking
- potential dangling pointers if a reference on an individual is
taken and the array is freed later on
- duplicate boiler plate everywhere the array is browsed through
Allocate an array of pointers and populate it instead.
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20191125065820.927-4-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
When the XIVE IVRE sub-engine (XiveRouter) looks for a Notification
Virtual Target (NVT) to notify, it broadcasts a message on the
PowerBUS to find an XIVE IVPE sub-engine (Presenter) with the NVT
dispatched on one of its HW threads, and then forwards the
notification if any response was received.
The current XIVE presenter model is sufficient for the pseries machine
because it has a single interrupt controller device, but the PowerNV
machine can have multiple chips each having its own interrupt
controller. In this case, the XIVE presenter model is too simple and
the CAM line matching should scan all chips of the system.
To start fixing this issue, we first extend the XIVE Router model with
a new XivePresenter QOM interface representing the XIVE IVPE
sub-engine. This interface exposes a 'match_nvt' handler which the
sPAPR and PowerNV XIVE Router models will need to implement to perform
the CAM line matching.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20191125065820.927-2-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The BMC of the OpenPOWER systems monitors the machine state using
sensors, controls the power and controls the access to the PNOR flash
device containing the firmware image required to boot the host.
QEMU models the power cycle process, access to the sensors and access
to the PNOR device. But, for these features to be available, the QEMU
PowerNV machine needs two extras devices on the command line, an IPMI
BT device for communication and a BMC backend device:
-device ipmi-bmc-sim,id=bmc0 -device isa-ipmi-bt,bmc=bmc0,irq=10
The BMC properties are then defined accordingly in the device tree and
OPAL self adapts. If a BMC device and an IPMI BT device are not
available, OPAL does not try to communicate with the BMC in any
manner. This is not how real systems behave.
To be closer to the default behavior, create an IPMI BMC simulator
device and an IPMI BT device at machine initialization time. We loose
the ability to define an external BMC device but there are benefits:
- a better match with real systems,
- a better test coverage of the OPAL code,
- system powerdown and reset commands that work,
- a QEMU device tree compliant with the specifications (*).
(*) Still needs a MBOX device.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20191121162340.11049-1-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This activates HIOMAP support on the QEMU PowerNV machine. The PnvPnor
model is used to access the flash contents. The model simply maps the
contents at a fix offset and enables or disables the mapping.
HIOMAP Protocol description :
https://github.com/openbmc/hiomapd/blob/master/Documentation/protocol.md
Reviewed-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20191028070027.22752-3-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Each vCPU in the system is identified with an NVT identifier which is
pushed in the OS CAM line (QW1W2) of the HW thread interrupt context
register when the vCPU is dispatched on a HW thread. This identifier
is used by the presenter subengine to find a matching target to notify
of an event. It is also used to fetch the associate NVT structure
which may contain pending interrupts that need a resend.
Add a couple of helpers for the NVT ids. The NVT space is 19 bits
wide, giving a maximum of 512K per chip.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20191115162436.30548-3-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
When an interrupt can not be presented to a vCPU, because it is not
running on any of the HW treads, the XIVE presenter updates the
Interrupt Pending Buffer register of the associated XIVE NVT
structure. This is only done if backlog is activated in the END but
this is generally the case.
The current code assumes that the fields of the NVT structure is
architected with the same layout of the thread interrupt context
registers. Fix this assumption and define an offset for the IPB
register backup value in the NVT.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20191115162436.30548-2-clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
On a POWERPC PowerNV system, the host firmware is stored in a PNOR
flash chip which contents is mapped on the LPC bus. This model adds a
simple dummy device to map the contents of a block device in the host
address space.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20191021131215.3693-2-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
SpaprInterruptControllerClass and PnvChipClass have an intc_create() method
that calls the appropriate routine, ie. icp_create() or xive_tctx_create(),
to establish the link between the VCPU and the presenter component of the
interrupt controller during realize.
There aren't any symmetrical call to be called when the VCPU gets unrealized
though. It is assumed that object_unparent() is the only thing to do.
This is questionable because the parenting logic around the CPU and
presenter objects is really an implementation detail of the interrupt
controller. It shouldn't be open-coded in the machine code.
Fix this by adding an intc_destroy() method that undoes what was done in
intc_create(). Also NULLify the presenter pointers to avoid having
stale pointers around. This will allow to reliably check if a vCPU has
a valid presenter.
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <157192724208.3146912.7254684777515287626.stgit@bahia.lan>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
When a Virtual Processor is scheduled to run on a HW thread, the
hypervisor pushes its identifier in the OS CAM line. When running with
kernel_irqchip=off, QEMU needs to emulate the same behavior.
Set the OS CAM line when the interrupt presenter of the sPAPR core is
reset. This will also cover the case of hot-plugged CPUs.
This change also has the benefit to remove the use of CPU_FOREACH()
which can be unsafe.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Message-Id: <20191022163812.330-8-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
On the sPAPR machine and PowerNV machine, the interrupt presenters are
created by a machine handler at the core level and are reset
independently. This is not consistent and it raises issues when it
comes to handle hot-plugged CPUs. In that case, the presenters are not
reset. This is less of an issue in XICS, although a zero MFFR could
be a concern, but in XIVE, the OS CAM line is not set and this breaks
the presenting algorithm. The current code has workarounds which need
a global cleanup.
Extend the sPAPR IRQ backend and the PowerNV Chip class with a new
cpu_intc_reset() handler called by the CPU reset handler and remove
the XiveTCTX reset handler which is now redundant.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20191022163812.330-6-clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
We will use it to reset the interrupt presenter from the CPU reset
handler.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Message-Id: <20191022163812.330-5-clg@kaod.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
For the benefit of peripheral device allocation, the number of available
irqs really wants to be the same on a given machine type version,
regardless of what irq backends we are using. That's the case now, but
only because we make sure the different SpaprIrq instances have the same
value except for the special legacy one.
Since this really only depends on machine type version, move the value to
SpaprMachineClass instead of SpaprIrq. This also puts the code to set it
to the lower value on old machine types right next to setting
legacy_irq_allocation, which needs to go hand in hand.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
The nr_msis value we use here has to line up with whether we're using
legacy or modern irq allocation. Therefore it's safer to derive it based
on legacy_irq_allocation rather than having SpaprIrq contain a canned
value.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
The remaining logic in the post_load hook really belongs to the interrupt
controller backends, and just needs to be called on the active controller
(after the active controller is set to the right thing based on the
incoming migration in the generic spapr_irq_post_load() logic).
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
It turns out that all the logic in the SpaprIrq::reset hooks (and some in
the SpaprIrq::post_load hooks) isn't really related to resetting the irq
backend (that's handled by the backends' own reset routines). Rather its
about getting the backend ready to be the active interrupt controller or
stopping being the active interrupt controller - reset (and post_load) is
just the only time that changes at present.
To make this flow clearer, move the logic into the explicit backend
activate and deactivate hooks.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
This hook is a bit odd. The only caller is spapr_irq_init_kvm(), but
it explicitly takes an SpaprIrq *, so it's never really called through the
current SpaprIrq. Essentially this is just a way of passing through a
function pointer so that spapr_irq_init_kvm() can handle some
configuration and error handling logic without duplicating it between the
xics and xive reset paths.
So, make it just take that function pointer. Because of earlier reworks
to the KVM connect/disconnect code in the xics and xive backends we can
also eliminate some wrapper functions and streamline error handling a bit.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Both XICS and XIVE have routines to connect and disconnect KVM with
similar but not identical signatures. This adjusts them to match
exactly, which will be useful for further cleanups later.
While we're there, we add an explicit return value to the connect path
to streamline error reporting in the callers. We remove error
reporting the disconnect path. In the XICS case this wasn't used at
all. In the XIVE case the only error case was if the KVM device was
set up, but KVM didn't have the capability to do so which is pretty
obviously impossible.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
This method depends only on the active irq controller. Now that we've
formalized the notion of active controller we can dispatch directly
through that, rather than dispatching via SpaprIrq with the dual
version having to do a second conditional dispatch.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
This method depends only on the active irq controller. Now that we've
formalized the notion of active controller we can dispatch directly
through that, rather than dispatching via SpaprIrq with the dual
version having to do a second conditional dispatch.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
This method depends only on the active irq controller. Now that we've
formalized the notion of active controller we can dispatch directly through
that, rather than dispatching via SpaprIrq with the dual version having
to do a second conditional dispatch.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
spapr now has the mechanism of constructing both XICS and XIVE instances of
the SpaprInterruptController interface. However, only one of the interrupt
controllers will actually be active at any given time, depending on feature
negotiation with the guest. This is handled in the current code via
spapr_irq_current() which checks the OV5 vector from feature negotiation to
determine the current backend.
Determining the active controller at the point we need it like this
can be pretty confusing, because it makes it very non obvious at what
points the active controller can change. This can make it difficult
to reason about the code and where a change of active controller could
appear in sequence with other events.
Make this mechanism more explicit by adding an 'active_intc' pointer
and an explicit spapr_irq_update_active_intc() function to update it
from the CAS state. We also add hooks on the intc backend which will
get called when it is activated or deactivated.
For now we just introduce the switch and hooks, later patches will
actually start using them.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
These methods, like cpu_intc_create, really belong to the interrupt
controller, but need to be called on all possible intcs.
Like cpu_intc_create, therefore, make them methods on the intc and
always call it for all existing intcs.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
This method essentially represents code which belongs to the interrupt
controller, but needs to be called on all possible intcs, rather than
just the currently active one. The "dual" version therefore calls
into the xics and xive versions confusingly.
Handle this more directly, by making it instead a method on the intc
backend, and always calling it on every backend that exists.
While we're there, streamline the error reporting a bit.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
The SpaprIrq structure is used to represent ths spapr machine's irq
backend. Except that it kind of conflates two concepts: one is the
backend proper - a specific interrupt controller that we might or
might not be using, the other is the irq configuration which covers
the layout of irq space and which interrupt controllers are allowed.
This leads to some pretty confusing code paths for the "dual"
configuration where its hooks redirect to other SpaprIrq structures
depending on the currently active irq controller.
To clean this up, we start by introducing a new
SpaprInterruptController QOM interface to represent strictly an
interrupt controller backend, not counting anything configuration
related. We implement this interface in the XICs and XIVE interrupt
controllers, and in future we'll move relevant methods from SpaprIrq
into it.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Support for setting VSMT is available in KVM since linux-4.13. Most distros
that support KVM on POWER already have it. It thus seem reasonable enough
to have the default machine to set VSMT to smp_threads.
This brings contiguous VCPU ids and thus brings their upper bound down to
the machine's max_cpus. This is especially useful for XIVE KVM devices,
which may thus allocate only one VP descriptor per VCPU.
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <157010411885.246126.12610015369068227139.stgit@bahia.lan>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The trigger data is used for both triggers of a HW source interrupts,
PHB, PSI, and triggers for rerouting interrupts between interrupt
controllers.
When an interrupt is rerouted, the trigger data follows an "END
trigger" format. In that case, the remote IC needs EAS containing an
END index to perform a lookup of an END.
An END trigger, bit0 of word0 set to '1', is defined as :
|0123|4567|0123|4567|0123|4567|0123|4567|
W0 E=1 |1P--|BLOC| END IDX |
W1 E=1 |M | END DATA |
An EAS is defined as :
|0123|4567|0123|4567|0123|4567|0123|4567|
W0 |V---|BLOC| END IDX |
W1 |M | END DATA |
The END trigger adds an extra 'PQ' bit, bit1 of word0 set to '1',
signaling that the PQ bits have been checked. That bit is unused in
the initial EAS definition.
When a HW device performs the trigger, the trigger data follows an
"EAS trigger" format because the trigger data in that case contains an
EAS index which the IC needs to look for.
An EAS trigger, bit0 of word0 set to '0', is defined as :
|0123|4567|0123|4567|0123|4567|0123|4567|
W0 E=0 |0P--|---- ---- ---- ---- ---- ---- ----|
W1 E=0 |BLOC| EAS INDEX |
There is also a 'PQ' bit, bit1 of word0 to '1', signaling that the
PQ bits have been checked.
Introduce these new trigger bits and rename the XIVE_SRCNO macros in
XIVE_EAS to reflect better the nature of the data.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20191007084102.29776-2-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This method is used to set up the interrupt backends for the current
configuration. However, this means some confusing redirection between
the "dual" mode init and the init hooks for xics only and xive only modes.
Since we now have simple flags indicating whether XICS and/or XIVE are
supported, it's easier to just open code each initialization directly in
spapr_irq_init(). This will also make some future cleanups simpler.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
SpaprIrq::ov5 stores the value for a particular byte in PAPR option vector
5 which indicates whether XICS, XIVE or both interrupt controllers are
available. As usual for PAPR, the encoding is kind of overly complicated
and confusing (though to be fair there are some backwards compat things it
has to handle).
But to make our internal code clearer, have SpaprIrq encode more directly
which backends are available as two booleans, and derive the OV5 value from
that at the point we need it.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
spapr_xive_irq_claim() returns a bool to indicate if it succeeded.
But most of the callers and one callee use int return values and/or an
Error * with more information instead. In any case, ints are a more
common idiom for success/failure states than bools (one never knows
what sense they'll be in).
So instead change to an int return value to indicate presence of error
+ an Error * to describe the details through that call chain.
It also didn't actually check if the irq was already claimed, which is
one of the primary purposes of the claim path, so do that.
spapr_xive_irq_free() also returned a bool... which no callers checked
and was always true, so just drop it.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
spapr_irq_free() can be used to free multiple irqs at once. That's useful
for its callers, but there's no need to make the individual backend hooks
handle this. We can loop across the irqs in spapr_irq_free() itself and
have the hooks just do one at time.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
This method is used to determine the name of the irq backend's node in the
device tree, so that we can find its phandle (after SLOF may have modified
it from the phandle we initially gave it).
But, in the two cases the only difference between the node name is the
presence of a unit address. Searching for a node name without considering
unit address is standard practice for the device tree, and
fdt_subnode_offset() will do exactly that, making this method unecessary.
While we're there, remove the XICS_NODENAME define. The name
"interrupt-controller" is required by PAPR (and IEEE1275), and a bunch of
places assume it already.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Currently spapr_qirq(), whic is used to find the qemu_irq for an spapr
global irq number, redirects through the SpaprIrq::qirq method. But
the array of qemu_irqs is allocated in the PAPR layer, not the
backends, and so the method implementations all return the same thing,
just differing in the preliminary checks they make.
So, we can remove the method, and just implement spapr_qirq() directly,
including all the relevant checks in one place. We change all those
checks into assert()s as well, since a failure here indicates an error in
the calling code.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
The only reason this parameter was needed was to work around the
inconsistent meaning of nr_irqs between xics and xive. Now that we've
fixed that, we can consistently use the number directly in the SpaprIrq
configuration.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Both the XICS and XIVE interrupt backends have a "nr-irqs" property, but
it means slightly different things. For XICS (or, strictly, the ICS) it
indicates the number of "real" external IRQs. Those start at XICS_IRQ_BASE
(0x1000) and don't include the special IPI vector. For XIVE, however, it
includes the whole IRQ space, including XIVE's many IPI vectors.
The spapr code currently doesn't handle this sensibly, with the
nr_irqs value in SpaprIrq having different meanings depending on the
backend. We fix this by renaming nr_irqs to nr_xirqs and making it
always indicate just the number of external irqs, adjusting the value
we pass to XIVE accordingly. We also move to using common constants
in most of the irq configurations, to make it clearer that the IRQ
space looks the same to the guest (and emulated devices), even if the
backend is different.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Every caller of spapr_vio_qirq() immediately calls qemu_irq_pulse() with
the result, so we might as well just fold that into the helper.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
We create a subtype of TYPE_ICS specifically for sPAPR. For now all this
does is move the setup of the PAPR specific hcalls and RTAS calls to
the realize() function for this, rather than requiring the PAPR code to
explicitly call xics_spapr_init(). In future it will have some more
function.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
TYPE_ICS_SIMPLE is the only subtype of TYPE_ICS_BASE that's ever
instantiated. The existence of different classes is mostly a hang
over from when we (misguidedly) had separate subtypes for the KVM and
non-KVM version of the device.
There could be some call for an abstract base type for ICS variants
that use a different representation of their state (PowerNV PHB3 might
want this). The current split isn't really in the right place for
that though. If we need this in future, we can re-implement it more
in line with what we actually need.
So, collapse the two classes together into just TYPE_ICS.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Currently TYPE_XICS_BASE and TYPE_XICS_SIMPLE have their own reset methods,
using the standard technique for having the subtype call the supertype's
methods before doing its own thing.
But TYPE_XICS_SIMPLE is the only subtype of TYPE_XICS_BASE ever
instantiated, so there's no point having the split here. Merge them
together into just an ics_reset() function.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
There are a number of ics_simple_*() functions that aren't actually
specific to TYPE_XICS_SIMPLE at all, and are equally valid on
TYPE_XICS_BASE. Rename them to ics_*() accordingly.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Currently ics_reject(), ics_resend() and ics_eoi() indirect through
class methods. But there's only one implementation of each method,
the one in TYPE_ICS_SIMPLE. TYPE_ICS_BASE has no implementation, but
it's never instantiated, and has no other subtypes.
So clean up by eliminating the method and just having ics_reject(),
ics_resend() and ics_eoi() contain the logic directly.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Interface instances should never be directly dereferenced. So, the common
practice is to make them incomplete types to make sure no-one does that.
XICSFrabric, however, had a dummy type which is less safe.
We were also using OBJECT_CHECK() where we should have been using
INTERFACE_CHECK().
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
SLOF implements one itself so let's remove it from QEMU. It is one less
image and simpler setup as the RTAS blob never stays in its initial place
anyway as the guest OS always decides where to put it.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>