Commit Graph

746 Commits

Author SHA1 Message Date
Cédric Le Goater
c6b8cc370d ppc/pnv: Add support for PQ offload on PHB5
The PQ_disable configuration bit disables the check done on the PQ
state bits when processing new MSI interrupts. When bit 9 is enabled,
the PHB forwards any MSI trigger to the XIVE interrupt controller
without checking the PQ state bits. The XIVE IC knows from the trigger
message that the PQ bits have not been checked and performs the check
locally.

This configuration bit only applies to MSIs and LSIs are still checked
on the PHB to handle the assertion level.

PQ_disable enablement is a requirement for StoreEOI.

Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
2022-03-02 06:51:39 +01:00
Cédric Le Goater
0aa2612a01 ppc/xive: Add support for PQ state bits offload
The trigger message coming from a HW source contains a special bit
informing the XIVE interrupt controller that the PQ bits have been
checked at the source or not. Depending on the value, the IC can
perform the check and the state transition locally using its own PQ
state bits.

The following changes add new accessors to the XiveRouter required to
query and update the PQ state bits. This only applies to the PowerNV
machine. sPAPR accessors are provided but the pSeries machine should
not be concerned by such complex configuration for the moment.

Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
2022-03-02 06:51:39 +01:00
Cédric Le Goater
aadf13abaa ppc/xive2: Add support for notification injection on ESB pages
This is an internal offset used to inject triggers when the PQ state
bits are not controlled locally. Such as for LSIs when the PHB5 are
using the Address-Based Interrupt Trigger mode and on the END.

Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
2022-03-02 06:51:39 +01:00
Cédric Le Goater
924996766b ppc/pnv: Add a HOMER model to POWER10
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
2022-03-02 06:51:39 +01:00
Cédric Le Goater
623575e16c ppc/pnv: Add model for POWER10 PHB5 PCIe Host bridge
PHB4 and PHB5 are very similar. Use the PHB4 models with some minor
adjustements in a subclass for P10.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
2022-03-02 06:51:39 +01:00
Cédric Le Goater
ae4c68e366 ppc/pnv: Add POWER10 quads
and use a pnv_chip_power10_quad_realize() helper to avoid code
duplication with P9. This still needs some refinements on the XSCOM
registers handling in PnvQuad.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
2022-03-02 06:51:39 +01:00
Cédric Le Goater
8bf682a349 ppc/pnv: Add a OCC model for POWER10
Our OCC model is very mininal and POWER10 can simply reuse the OCC
model we introduced for POWER9.

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
2022-03-02 06:51:39 +01:00
Cédric Le Goater
da71b7e3ed ppc/pnv: Add a XIVE2 controller to the POWER10 chip
The XIVE2 interrupt controller of the POWER10 processor follows the
same logic than on POWER9 but the HW interface has been largely
reviewed.  It has a new register interface, different BARs, extra
VSDs, new layout for the XIVE2 structures, and a set of new features
which are described below.

This is a model of the POWER10 XIVE2 interrupt controller for the
PowerNV machine. It focuses primarily on the needs of the skiboot
firmware but some initial hypervisor support is implemented for KVM
use (escalation).

Support for new features will be implemented in time and will require
new support from the OS.

* XIVE2 BARS

The interrupt controller BARs have a different layout outlined below.
Each sub-engine has now own its range and the indirect TIMA access was
replaced with a set of pages, one per CPU, under the IC BAR:

  - IC BAR (Interrupt Controller)
    . 4 pages, one per sub-engine
    . 128 indirect TIMA pages
  - TM BAR (Thread Interrupt Management Area)
    . 4 pages
  - ESB BAR (ESB pages for IPIs)
    . up to 1TB
  - END BAR (ESB pages for ENDs)
    . up to 2TB
  - NVC BAR (Notification Virtual Crowd)
    . up to 128
  - NVPG BAR (Notification Virtual Process and Group)
    . up to 1TB
  - Direct mapped Thread Context Area (reads & writes)

OPAL does not use the grouping and crowd capability.

* Virtual Structure Tables

XIVE2 adds new tables types and also changes the field layout of the END
and NVP Virtualization Structure Descriptors.

  - EAS
  - END new layout
  - NVT was splitted in :
    . NVP (Processor), 32B
    . NVG (Group), 32B
    . NVC (Crowd == P9 block group) 32B
  - IC for remote configuration
  - SYNC for cache injection
  - ERQ for event input queue

The setup is slighly different on XIVE2 because the indexing has changed
for some of the tables, block ID or the chip topology ID can be used.

* XIVE2 features

SCOM and MMIO registers have a new layout and XIVE2 adds a new global
capability and configuration registers.

The lowlevel hardware offers a set of new features among which :

  - a configurable number of priorities : 1 - 8
  - StoreEOI with load-after-store ordering is activated by default
  - Gen2 TIMA layout
  - A P9-compat mode, or Gen1, TIMA toggle bit for SW compatibility
  - increase to 24bit for VP number

Other features will have some impact on the Hypervisor and guest OS
when activated, but this is not required for initial support of the
controller.

Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
2022-03-02 06:51:38 +01:00
Cédric Le Goater
09a67f3d0e ppc/xive2: Introduce a presenter matching routine
The VP space is larger in XIVE2 (P10), 24 bits instead of 19bits on
XIVE (P9), and the CAM line can use a 7bits or 8bits thread id.

For now, we only use 7bits thread ids, same as P9, but because of the
change of the size of the VP space, the CAM matching routine is
different between P9 and P10. It is easier to duplicate the whole
routine than to add extra handlers in xive_presenter_tctx_match() used
for P9.

We might come with a better solution later on, after we have added
some more support for the XIVE2 controller.

Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
2022-03-02 06:51:38 +01:00
Cédric Le Goater
f8a233dedf ppc/xive2: Introduce a XIVE2 core framework
The XIVE2 interrupt controller of the POWER10 processor as the same
logic as on POWER9 but its SW interface has been largely reworked. The
interrupt controller has a new register interface, different BARs,
extra VSDs. These will be described when we add the device model for
the baremetal machine.

The XIVE internal structures for the EAS, END, NVT have different
layouts which is a problem for the current core XIVE framework. To
avoid adding too much complexity in the XIVE models, a new XIVE2 core
framework is introduced. It duplicates the models which are closely
linked to the XIVE internal structures : Xive2Router and
Xive2ENDSource and reuses the XiveSource, XivePresenter, XiveTCTX
models, as they are more generic.

Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
2022-03-02 06:51:38 +01:00
Nicholas Piggin
120f738a46 spapr: implement nested-hv capability for the virtual hypervisor
This implements the Nested KVM HV hcall API for spapr under TCG.

The L2 is switched in when the H_ENTER_NESTED hcall is made, and the
L1 is switched back in returned from the hcall when a HV exception
is sent to the vhyp. Register state is copied in and out according to
the nested KVM HV hcall API specification.

The hdecr timer is started when the L2 is switched in, and it provides
the HDEC / 0x980 return to L1.

The MMU re-uses the bare metal radix 2-level page table walker by
using the get_pate method to point the MMU to the nested partition
table entry. MMU faults due to partition scope errors raise HV
exceptions and accordingly are routed back to the L1.

The MMU does not tag translations for the L1 (direct) vs L2 (nested)
guests, so the TLB is flushed on any L1<->L2 transition (hcall entry
and exit).

Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
[ clg: checkpatch fixes ]
Message-Id: <20220216102545.1808018-10-npiggin@gmail.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
2022-02-18 08:34:14 +01:00
Nicholas Piggin
93aeb70210 ppc: allow the hdecr timer to be created/destroyed
Machines which don't emulate the HDEC facility are able to use the
timer for something else. Provide functions to start and stop the
hdecr timer.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[ clg: checkpatch fixes ]
Message-Id: <20220216102545.1808018-4-npiggin@gmail.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
2022-02-18 08:34:14 +01:00
Shivaprasad G Bhat
b5513584a0 spapr: nvdimm: Implement H_SCM_FLUSH hcall
The patch adds support for the SCM flush hcall for the nvdimm devices.
To be available for exploitation by guest through the next patch. The
hcall is applicable only for new SPAPR specific device class which is
also introduced in this patch.

The hcall expects the semantics such that the flush to return with
H_LONG_BUSY_ORDER_10_MSEC when the operation is expected to take longer
time along with a continue_token. The hcall to be called again by providing
the continue_token to get the status. So, all fresh requests are put into
a 'pending' list and flush worker is submitted to the thread pool. The
thread pool completion callbacks move the requests to 'completed' list,
which are cleaned up after collecting the return status for the guest
in subsequent hcall from the guest.

The semantics makes it necessary to preserve the continue_tokens and
their return status across migrations. So, the completed flush states
are forwarded to the destination and the pending ones are restarted
at the destination in post_load. The necessary nvdimm flush specific
vmstate structures are also introduced in this patch which are to be
saved in the new SPAPR specific nvdimm device to be introduced in the
following patch.

Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <164396254862.109112.16675611182159105748.stgit@ltczzess4.aus.stglabs.ibm.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
2022-02-18 08:34:14 +01:00
Philippe Mathieu-Daudé
dc10da64e1 hw/ppc/vof: Add missing includes
vof.h requires "qom/object.h" for DECLARE_CLASS_CHECKERS(),
"exec/memory.h" for address_space_read/write(),
"exec/address-spaces.h" for address_space_memory
and more importantly "cpu.h" for target_ulong.

vof.c doesn't need "exec/ram_addr.h".

Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20220122003104.84391-1-f4bug@amsat.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
2022-01-28 13:15:03 +01:00
Cédric Le Goater
eb93c82888 ppc/pnv: Move num_phbs under Pnv8Chip
It is not used elsewhere so that's where it belongs.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20220105212338.49899-10-danielhb413@gmail.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
2022-01-12 11:28:27 +01:00
Cédric Le Goater
c29dd0034d ppc/pnv: Reparent user created PHB3 devices to the PnvChip
The powernv machine uses the object hierarchy to populate the device
tree and each device should be parented to the chip it belongs to.
This is not the case for user created devices which are parented to
the container "/unattached".

Make sure a PHB3 device is parented to its chip by reparenting the
object if necessary.

Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reviewed-by: Frederic Barrat <fbarrat@linux.ibm.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20220105212338.49899-8-danielhb413@gmail.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
2022-01-12 11:28:27 +01:00
Cédric Le Goater
1f6a88fffc ppc/pnv: Introduce support for user created PHB3 devices
PHB3 devices and PCI devices can now be added to the powernv8 machine
using :

  -device pnv-phb3,chip-id=0,index=1 \
  -device nec-usb-xhci,bus=pci.1,addr=0x0

The 'index' property identifies the PHB3 in the chip. In case of user
created devices, a lookup on 'chip-id' is required to assign the
owning chip.

Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20220105212338.49899-7-danielhb413@gmail.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
2022-01-12 11:28:27 +01:00
Cédric Le Goater
a71cd51e2a ppc/pnv: Attach PHB3 root port device when defaults are enabled
This cleanups the PHB3 model a bit more since the root port is an
independent device and it will ease our task when adding user created
PHB3s.

pnv_phb_attach_root_port() is made public in pnv.c so it can be reused
with the pnv_phb4 root port later.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20220105212338.49899-4-danielhb413@gmail.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
2022-01-12 11:28:27 +01:00
Philippe Mathieu-Daudé
cd1db8df74 dma: Let ld*_dma() propagate MemTxResult
dma_memory_read() returns a MemTxResult type. Do not discard
it, return it to the caller.

Update the few callers.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20211223115554.3155328-19-philmd@redhat.com>
2021-12-31 01:05:27 +01:00
Philippe Mathieu-Daudé
34cdea1db6 dma: Let ld*_dma() take MemTxAttrs argument
Let devices specify transaction attributes when calling ld*_dma().

Keep the default MEMTXATTRS_UNSPECIFIED in the few callers.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20211223115554.3155328-17-philmd@redhat.com>
2021-12-31 01:05:27 +01:00
Philippe Mathieu-Daudé
2280c27afc dma: Let st*_dma() take MemTxAttrs argument
Let devices specify transaction attributes when calling st*_dma().

Keep the default MEMTXATTRS_UNSPECIFIED in the few callers.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20211223115554.3155328-16-philmd@redhat.com>
2021-12-31 01:05:27 +01:00
Philippe Mathieu-Daudé
ba06fe8add dma: Let dma_memory_read/write() take MemTxAttrs argument
Let devices specify transaction attributes when calling
dma_memory_read() or dma_memory_write().

Patch created mechanically using spatch with this script:

  @@
  expression E1, E2, E3, E4;
  @@
  (
  - dma_memory_read(E1, E2, E3, E4)
  + dma_memory_read(E1, E2, E3, E4, MEMTXATTRS_UNSPECIFIED)
  |
  - dma_memory_write(E1, E2, E3, E4)
  + dma_memory_write(E1, E2, E3, E4, MEMTXATTRS_UNSPECIFIED)
  )

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Li Qiang <liq3ea@gmail.com>
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20211223115554.3155328-6-philmd@redhat.com>
2021-12-30 17:16:32 +01:00
Philippe Mathieu-Daudé
7a36e42d91 dma: Let dma_memory_set() take MemTxAttrs argument
Let devices specify transaction attributes when calling
dma_memory_set().

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Li Qiang <liq3ea@gmail.com>
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20211223115554.3155328-3-philmd@redhat.com>
2021-12-30 17:16:32 +01:00
Philippe Mathieu-Daudé
7ccb391ccd dma: Let dma_memory_valid() take MemTxAttrs argument
Let devices specify transaction attributes when calling
dma_memory_valid().

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Li Qiang <liq3ea@gmail.com>
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20211223115554.3155328-2-philmd@redhat.com>
2021-12-30 17:16:32 +01:00
Cédric Le Goater
422fd92e61 ppc/pnv: Introduce a num_pecs class attribute for PHB4 PEC devices
POWER9 processor comes with 3 PHB4 PEC (PCI Express Controller) and
each PEC can have several PHBs :

  * PEC0 provides 1 PHB  (PHB0)
  * PEC1 provides 2 PHBs (PHB1 and PHB2)
  * PEC2 provides 3 PHBs (PHB3, PHB4 and PHB5)

A num_pecs class attribute represents better the logic units of the
POWER9 chip. Use that instead of num_phbs which fits POWER8 chips.
This will ease adding support for user created devices.

Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20211213132830.108372-8-clg@kaod.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
2021-12-17 17:57:19 +01:00
Cédric Le Goater
621f70d210 spapr/xive: Add source status helpers
and use them to set and test the ASSERTED bit of LSI sources.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20211004212141.432954-1-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-10-21 11:42:47 +11:00
Bin Meng
06caae8af0 hw/intc: openpic: Clean up the styles
Correct the multi-line comment format. No functional changes.

Signed-off-by: Bin Meng <bin.meng@windriver.com>

Message-Id: <20210918032653.646370-3-bin.meng@windriver.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-09-30 12:26:06 +10:00
Bin Meng
86229b68a2 hw/intc: openpic: Drop Raven related codes
There is no machine that uses Motorola MCP750 (aka Raven) model.
Drop the related codes.

While we are here, drop the mentioning of Intel GW80314 I/O
companion chip in the comments as it has been obsolete for years,
and correct a typo too.

Signed-off-by: Bin Meng <bin.meng@windriver.com>
Message-Id: <20210918032653.646370-2-bin.meng@windriver.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-09-30 12:26:06 +10:00
Daniel Henrique Barboza
e0eb84d4f5 spapr_numa.c: FORM2 NUMA affinity support
The main feature of FORM2 affinity support is the separation of NUMA
distances from ibm,associativity information. This allows for a more
flexible and straightforward NUMA distance assignment without relying on
complex associations between several levels of NUMA via
ibm,associativity matches. Another feature is its extensibility. This base
support contains the facilities for NUMA distance assignment, but in the
future more facilities will be added for latency, performance, bandwidth
and so on.

This patch implements the base FORM2 affinity support as follows:

- the use of FORM2 associativity is indicated by using bit 2 of byte 5
of ibm,architecture-vec-5. A FORM2 aware guest can choose to use FORM1
or FORM2 affinity. Setting both forms will default to FORM2. We're not
advertising FORM2 for pseries-6.1 and older machine versions to prevent
guest visible changes in those;

- ibm,associativity-reference-points has a new semantic. Instead of
being used to calculate distances via NUMA levels, it's now used to
indicate the primary domain index in the ibm,associativity domain of
each resource. In our case it's set to {0x4}, matching the position
where we already place logical_domain_id;

- two new RTAS DT artifacts are introduced: ibm,numa-lookup-index-table
and ibm,numa-distance-table. The index table is used to list all the
NUMA logical domains of the platform, in ascending order, and allows for
spartial NUMA configurations (although QEMU ATM doesn't support that).
ibm,numa-distance-table is an array that contains all the distances from
the first NUMA node to all other nodes, then the second NUMA node
distances to all other nodes and so on;

- get_max_dist_ref_points(), get_numa_assoc_size() and get_associativity()
now checks for OV5_FORM2_AFFINITY and returns FORM2 values if the guest
selected FORM2 affinity during CAS.

Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210920174947.556324-7-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-09-30 12:26:06 +10:00
Daniel Henrique Barboza
5dab5abe62 spapr: move FORM1 verifications to post CAS
FORM2 NUMA affinity is prepared to deal with empty (memory/cpu less)
NUMA nodes. This is used by the DAX KMEM driver to locate a PAPR SCM
device that has a different latency than the original NUMA node from the
regular memory. FORM2 is also able  to deal with asymmetric NUMA
distances gracefully, something that our FORM1 implementation doesn't
do.

Move these FORM1 verifications to a new function and wait until after
CAS, when we're sure that we're sticking with FORM1, to enforce them.

Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210920174947.556324-6-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-09-30 12:26:06 +10:00
Daniel Henrique Barboza
a165ac67c3 spapr_numa.c: rename numa_assoc_array to FORM1_assoc_array
Introducing a new NUMA affinity, FORM2, requires a new mechanism to
switch between affinity modes after CAS. Also, we want FORM2 data
structures and functions to be completely separated from the existing
FORM1 code, allowing us to avoid adding new code that inherits the
existing complexity of FORM1.

The idea of switching values used by the write_dt() functions in
spapr_numa.c was already introduced in the previous patch, and
the same approach will be used when dealing with the FORM1 and FORM2
arrays.

We can accomplish that by that by renaming the existing numa_assoc_array
to FORM1_assoc_array, which now is used exclusively to handle FORM1 affinity
data. A new helper get_associativity() is then introduced to be used by the
write_dt() functions to retrieve the current ibm,associativity array of
a given node, after considering affinity selection that might have been
done during CAS. All code that was using numa_assoc_array now needs to
retrieve the array by calling this function.

This will allow for an easier plug of FORM2 data later on.

Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210920174947.556324-5-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-09-30 12:26:06 +10:00
Daniel Henrique Barboza
3a6e4ce684 spapr_numa.c: parametrize FORM1 macros
The next preliminary step to introduce NUMA FORM2 affinity is to make
the existing code independent of FORM1 macros and values, i.e.
MAX_DISTANCE_REF_POINTS, NUMA_ASSOC_SIZE and VCPU_ASSOC_SIZE. This patch
accomplishes that by doing the following:

- move the NUMA related macros from spapr.h to spapr_numa.c where they
are used. spapr.h gets instead a 'NUMA_NODES_MAX_NUM' macro that is used
to refer to the maximum number of NUMA nodes, including GPU nodes, that
the machine can support;

- MAX_DISTANCE_REF_POINTS and NUMA_ASSOC_SIZE are renamed to
FORM1_DIST_REF_POINTS and FORM1_NUMA_ASSOC_SIZE. These FORM1 specific
macros are used in FORM1 init functions;

- code that uses MAX_DISTANCE_REF_POINTS now retrieves the
max_dist_ref_points value using get_max_dist_ref_points().
NUMA_ASSOC_SIZE is replaced by get_numa_assoc_size() and VCPU_ASSOC_SIZE
is replaced by get_vcpu_assoc_size(). These functions are used by the
generic device tree functions and h_home_node_associativity() and will
allow them to switch between FORM1 and FORM2 without changing their core
logic.

Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210920174947.556324-4-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-09-30 12:26:06 +10:00
Cédric Le Goater
92612f1550 ppc/pnv: Rename "id" to "quad-id" in PnvQuad
This to avoid possible conflicts with the "id" property of QOM objects.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20210901094153.227671-9-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-09-29 19:37:38 +10:00
Cédric Le Goater
daf115cf9a ppc/xive: Export xive_tctx_word2() helper
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20210901094153.227671-8-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-09-29 19:37:38 +10:00
Cédric Le Goater
89d2468d96 ppc/xive: Export priority_to_ipb() helper
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20210901094153.227671-7-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-09-29 19:37:38 +10:00
Cédric Le Goater
dd4e4d1296 ppc/xive: Export xive_presenter_notify()
It's generic enough to be used from the XIVE2 router and avoid more
duplication.

Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20210809134547.689560-9-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-08-27 12:41:13 +10:00
Cédric Le Goater
fb8dc327f4 ppc/xive: Export PQ get/set routines
These will be shared with the XIVE2 router.

Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20210809134547.689560-8-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-08-27 12:41:13 +10:00
Cédric Le Goater
ab17a3fe74 ppc/pnv: Use a simple incrementing index for the chip-id
When the QEMU PowerNV machine was introduced, multi chip support
modeled a two socket system with dual chip modules as found on some P8
Tuleta systems (8286-42A). But this is hardly used and not relevant
for QEMU. Use a simple index instead.

With this change, we can now increase the max socket number to 16 as
found on high end systems.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20210809134547.689560-5-clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-08-27 12:41:13 +10:00
Cédric Le Goater
6bc8c04648 ppc/pnv: Change the POWER10 machine to support DD2 only
There is no need to keep the DD1 chip model as it will never be
publicly available.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20210809134547.689560-3-clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-08-27 12:41:13 +10:00
Alexey Kardashevskiy
14c7e06e72 ppc/vof: Fix Coverity issues
Coverity reported issues which are caused by mixing of signed return codes
from DTC and unsigned return codes of the client interface.

This introduces PROM_ERROR and makes distinction between the error types.

This fixes NEGATIVE_RETURNS, OVERRUN issues reported by Coverity.

This adds a comment about the return parameters number in the VOF hcall.
The reason for such counting is to keep the numbers look the same in
vof_client_handle() and the Linux (an OF client).

vmc->client_architecture_support() returns target_ulong and we want to
propagate this to the client (for example H_MULTI_THREADS_ACTIVE).
The VOF path to do_client_architecture_support() needs chopping off
the top 32bit but SLOF's H_CAS does not; and either way the return values
are either 0 or 32bit negative error code. For now this chops
the top 32bits.

This makes "claim" fail if the allocated address is above 4GB as
the client interface is 32bit. This still allows claiming memory above
4GB as potentially initrd can be put there and the client can read
the address from the FDT's "available" property.

Fixes: CID 1458139, 1458138, 1458137, 1458133, 1458132
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Message-Id: <20210720050726.2737405-1-aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-07-29 10:59:49 +10:00
Bharata B Rao
82123b756a target/ppc: Support for H_RPT_INVALIDATE hcall
If KVM_CAP_RPT_INVALIDATE KVM capability is enabled, then

- indicate the availability of H_RPT_INVALIDATE hcall to the guest via
  ibm,hypertas-functions property.
- Enable the hcall

Both the above are done only if the new sPAPR machine capability
cap-rpt-invalidate is set.

Signed-off-by: Bharata B Rao <bharata@linux.ibm.com>
Message-Id: <20210706112440.1449562-3-bharata@linux.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-07-09 11:01:06 +10:00
Alexey Kardashevskiy
21bde1ecb6 spapr: Fix implementation of Open Firmware client interface
This addresses the comments from v22.

The functional changes are (the VOF ones need retesting with Pegasos2):

(VOF) setprop will start failing if the machine class callback
did not handle it;
(VOF) unit addresses are lowered in path_offset();
(SPAPR) /chosen/bootargs is initialized from kernel_cmdline if
the client did not change it.

Fixes: 5c991e5d4378 ("spapr: Implement Open Firmware client interface")
Cc: BALATON Zoltan <balaton@eik.bme.hu>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Message-Id: <20210708065625.548396-1-aik@ozlabs.ru>
Tested-by: BALATON Zoltan <balaton@eik.bme.hu>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-07-09 10:55:11 +10:00
Nicholas Piggin
17fd09c021 target/ppc/spapr: Update H_GET_CPU_CHARACTERISTICS L1D cache flush bits
There are several new L1D cache flush bits added to the hcall which reflect
hardware security features for speculative cache access issues.

These behaviours are now being specified as negative in order to simplify
patched kernel compatibility with older firmware (a new problem found in
existing systems would automatically be vulnerable).

[dwg: Technically this changes behaviour for existing machine types.
 After discussion with Nick, we've determined this is safe, because
 the worst that will happen if a guest gets the wrong information due
 to a migration is that it will perform some unnecessary workarounds,
 but will remain correct and secure (well, as secure as it was going
 to be anyway).  In addition the change only affects cap-cfpc=safe
 which is not enabled by default, and in fact is not possible to set
 on any current hardware (though it's expected it will be possible on
 POWER10)]

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Message-Id: <20210615044107.1481608-1-npiggin@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-07-09 10:38:19 +10:00
Alexey Kardashevskiy
fc8c745d50 spapr: Implement Open Firmware client interface
The PAPR platform describes an OS environment that's presented by
a combination of a hypervisor and firmware. The features it specifies
require collaboration between the firmware and the hypervisor.

Since the beginning, the runtime component of the firmware (RTAS) has
been implemented as a 20 byte shim which simply forwards it to
a hypercall implemented in qemu. The boot time firmware component is
SLOF - but a build that's specific to qemu, and has always needed to be
updated in sync with it. Even though we've managed to limit the amount
of runtime communication we need between qemu and SLOF, there's some,
and it has become increasingly awkward to handle as we've implemented
new features.

This implements a boot time OF client interface (CI) which is
enabled by a new "x-vof" pseries machine option (stands for "Virtual Open
Firmware). When enabled, QEMU implements the custom H_OF_CLIENT hcall
which implements Open Firmware Client Interface (OF CI). This allows
using a smaller stateless firmware which does not have to manage
the device tree.

The new "vof.bin" firmware image is included with source code under
pc-bios/. It also includes RTAS blob.

This implements a handful of CI methods just to get -kernel/-initrd
working. In particular, this implements the device tree fetching and
simple memory allocator - "claim" (an OF CI memory allocator) and updates
"/memory@0/available" to report the client about available memory.

This implements changing some device tree properties which we know how
to deal with, the rest is ignored. To allow changes, this skips
fdt_pack() when x-vof=on as not packing the blob leaves some room for
appending.

In absence of SLOF, this assigns phandles to device tree nodes to make
device tree traversing work.

When x-vof=on, this adds "/chosen" every time QEMU (re)builds a tree.

This adds basic instances support which are managed by a hash map
ihandle -> [phandle].

Before the guest started, the used memory is:
0..e60 - the initial firmware
8000..10000 - stack
400000.. - kernel
3ea0000.. - initramdisk

This OF CI does not implement "interpret".

Unlike SLOF, this does not format uninitialized nvram. Instead, this
includes a disk image with pre-formatted nvram.

With this basic support, this can only boot into kernel directly.
However this is just enough for the petitboot kernel and initradmdisk to
boot from any possible source. Note this requires reasonably recent guest
kernel with:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=df5be5be8735

The immediate benefit is much faster booting time which especially
crucial with fully emulated early CPU bring up environments. Also this
may come handy when/if GRUB-in-the-userspace sees light of the day.

This separates VOF and sPAPR in a hope that VOF bits may be reused by
other POWERPC boards which do not support pSeries.

This assumes potential support for booting from QEMU backends
such as blockdev or netdev without devices/drivers used.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Message-Id: <20210625055155.2252896-1-aik@ozlabs.ru>
Reviewed-by: BALATON Zoltan <balaton@eik.bme.hu>
[dwg: Adjusted some includes which broke compile in some more obscure
 compilation setups]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-07-09 10:38:19 +10:00
Alexey Kardashevskiy
7381c5d11f spapr: tune rtas-size
QEMU reserves space for RTAS via /rtas/rtas-size which tells the client
how much space the RTAS requires to work which includes the RTAS binary
blob implementing RTAS runtime. Because pseries supports FWNMI which
requires plenty of space, QEMU reserves more than 2KB which is
enough for the RTAS blob as it is just 20 bytes (under QEMU).

Since FWNMI reset delivery was added, RTAS_SIZE macro is not used anymore.
This replaces RTAS_SIZE with RTAS_MIN_SIZE and uses it in
the /rtas/rtas-size calculation to account for the RTAS blob.

Fixes: 0e236d3477 ("ppc/spapr: Implement FWNMI System Reset delivery")
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Message-Id: <20210622070336.1463250-1-aik@ozlabs.ru>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-07-09 10:38:18 +10:00
Shivaprasad G Bhat
f93c8f148c spapr: nvdimm: Forward declare and move the definitions
The subsequent patches add definitions which tend to get
the compilation to cyclic dependency. So, prepare with
forward declarations, move the definitions and clean up.

Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
Message-Id: <162133925415.610.11584121797866216417.stgit@4f1e6f2bd33e>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-06-03 13:22:06 +10:00
Greg Kurz
3bf0844f3b spapr: Don't hijack current_machine->boot_order
QEMU 6.0 moved all the -boot variables to the machine. Especially, the
removal of the boot_order static changed the handling of '-boot once'
from:

    if (boot_once) {
        qemu_boot_set(boot_once, &error_fatal);
        qemu_register_reset(restore_boot_order, g_strdup(boot_order));
    }

to

    if (current_machine->boot_once) {
        qemu_boot_set(current_machine->boot_once, &error_fatal);
        qemu_register_reset(restore_boot_order,
                            g_strdup(current_machine->boot_order));
    }

This means that we now register as subsequent boot order a copy
of current_machine->boot_once that was just set with the previous
call to qemu_boot_set(), i.e. we never transition away from the
once boot order.

It is certainly fragile^Wwrong for the spapr code to hijack a
field of the base machine type object like that. The boot order
rework simply turned this software boundary violation into an
actual bug.

Have the spapr code to handle that with its own field in
SpaprMachineState. Also kfree() the initial boot device
string when "once" was used.

Fixes: 4b7acd2ac8 ("vl: clean up -boot variables")
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1960119
Cc: pbonzini@redhat.com
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <20210521160735.1901914-1-groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-06-03 13:22:06 +10:00
Lucas Mateus Castro (alqotel)
962104f044 hw/ppc: moved hcalls that depend on softmmu
The hypercalls h_enter, h_remove, h_bulk_remove, h_protect, and h_read,
have been moved to spapr_softmmu.c with the functions they depend on. The
functions is_ram_address and push_sregs_to_kvm_pr are not static anymore
as functions on both spapr_hcall.c and spapr_softmmu.c depend on them.
The hypercalls h_resize_hpt_prepare and h_resize_hpt_commit have been
divided, the KVM part stayed in spapr_hcall.c while the softmmu part
was moved to spapr_softmmu.c

Signed-off-by: Lucas Mateus Castro (alqotel) <lucas.araujo@eldorado.org.br>
Message-Id: <20210506163941.106984-2-lucas.araujo@eldorado.org.br>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-05-19 10:30:28 +10:00
Fabiano Rosas
068479e1e1 hw/ppc/spapr.c: Extract MMU mode error reporting into a function
A following patch will make use of it.

Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com>
Message-Id: <20210505001130.3999968-2-farosas@linux.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-05-19 10:30:28 +10:00
Daniel Henrique Barboza
b7573092ab spapr.h: increase FDT_MAX_SIZE
Certain SMP topologies stress, e.g. 1 thread/core, 2048 cores and
1 socket, stress the current maximum size of the pSeries FDT:

Calling ibm,client-architecture-support...qemu-system-ppc64: error
creating device tree: (fdt_setprop(fdt, offset,
"ibm,processor-segment-sizes", segs, sizeof(segs))): FDT_ERR_NOSPACE

2048 is the default NR_CPUS value for the pSeries kernel. It's expected
that users will want QEMU to be able to handle this kind of
configuration.

Bumping FDT_MAX_SIZE to 2MB is enough for these setups to be created.

Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210408204049.221802-3-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-05-04 11:41:25 +10:00