qemu/include/hw/ppc
David Gibson 8897ea5a9f spapr: Don't attempt to clamp RMA to VRMA constraint
The Real Mode Area (RMA) is the part of memory which a guest can access
when in real (MMU off) mode.  Of course, for a guest under KVM, the MMU
isn't really turned off, it's just in a special translation mode - Virtual
Real Mode Area (VRMA) - which looks like real mode in guest mode.

The mechanics of how this works when using the hash MMU (HPT) put a
constraint on the size of the RMA, which depends on the size of the
HPT.  So, the latter part of spapr_setup_hpt_and_vrma() clamps the RMA
we advertise to the guest based on this VRMA limit.

There are several things wrong with this:
 1) spapr_setup_hpt_and_vrma() doesn't actually clamp, it takes the minimum
    of Node 0 memory size and the VRMA limit.  That will *often* work the
    same as clamping, but there can be other constraints on RMA size which
    supersede Node 0 memory size.  We have real bugs caused by this
    (currently worked around in the guest kernel)
 2) Some callers of spapr_setup_hpt_and_vrma() are in a situation where
    we're past the point that we can actually advertise an RMA limit to the
    guest
 3) But most fundamentally, the VRMA limit depends on host configuration
    (page size) which shouldn't be visible to the guest, but this partially
    exposes it.  This can cause problems with migration in certain edge
    cases, although we will mostly get away with it.

In practice, this clamping is almost never applied anyway.  With 64kiB
pages and the normal rules for sizing of the HPT, the theoretical VRMA
limit will be 4x(guest memory size) and so never hit.  It will hit with
4kiB pages, where it will be (guest memory size)/4.  However all mainstream
distro kernels for POWER have used a 64kiB page size for at least 10 years.

So, simply replace this logic with a check that the RMA we've calculated
based only on guest visible configuration will fit within the host implied
VRMA limit.  This can break if running HPT guests on a host kernel with
4kiB page size.  As noted that's very rare.  There also exist several
possible workarounds:
  * Change the host kernel to use 64kiB pages
  * Use radix MMU (RPT) guests instead of HPT
  * Use 64kiB hugepages on the host to back guest memory
  * Increase the guest memory size so that the RMA hits one of the fixed
    limits before the RMA limit.  This is relatively easy on POWER8 which
    has a 16GiB limit, harder on POWER9 which has a 1TiB limit.
  * Use a guest NUMA configuration which artificially constrains the RMA
    within the VRMA limit (the RMA must always fit within Node 0).

Previously, on KVM, we also temporarily reduced the rma_size to 256M so
that the we'd load the kernel and initrd safely, regardless of the VRMA
limit.  This was a) confusing, b) could significantly limit the size of
images we could load and c) introduced a behavioural difference between
KVM and TCG.  So we remove that as well.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Greg Kurz <groug@kaod.org>
2020-03-17 09:41:15 +11:00
..
fdt.h target/ppc: Pass cpu instead of env to ppc_create_page_sizes_prop() 2018-04-27 18:05:22 +10:00
mac_dbdma.h mac_dbdma: remove DBDMA_init() function 2017-09-27 13:05:41 +10:00
openpic_kvm.h openpic: move KVM-specific declarations into separate openpic_kvm.h file 2018-03-06 13:16:29 +11:00
openpic.h hw/core: Move cpu.c, cpu.h from qom/ to hw/core/ 2019-08-21 13:24:01 +02:00
pnv_core.h ppc/pnv: Add support for "hostboot" mode 2020-02-02 14:07:57 +11:00
pnv_homer.h ppc/pnv: Introduce PBA registers 2019-12-17 10:39:48 +11:00
pnv_lpc.h ppc/pnv: add a LPC Controller model for POWER10 2019-12-17 10:39:48 +11:00
pnv_occ.h ppc/pnv: Fix OCC common area region mapping 2019-12-17 10:39:48 +11:00
pnv_pnor.h ppc/pnv: fix check on return value of blk_getlength() 2020-01-08 12:01:14 +11:00
pnv_psi.h ppc/pnv: Drop PnvPsiClass::chip_type 2019-12-17 10:39:48 +11:00
pnv_xive.h pnv/xive: Use device_class_set_parent_realize() 2020-01-08 11:01:59 +11:00
pnv_xscom.h ppc/pnv: Add models for POWER8 PHB3 PCIe Host bridge 2020-02-02 14:07:57 +11:00
pnv.h ppc/pnv: Add models for POWER8 PHB3 PCIe Host bridge 2020-02-02 14:07:57 +11:00
ppc4xx.h ppc/{ppc440_bamboo, sam460ex}: use memdev for RAM 2020-02-19 16:50:00 +00:00
ppc_e500.h intc/openpic: Build openpic only once 2013-07-09 21:33:02 +02:00
ppc.h hw/ppc/prep: Remove the deprecated "prep" machine and the OpenHackware BIOS 2020-02-02 14:07:57 +11:00
spapr_cpu_core.h spapr: Implement H_PROD 2019-08-21 17:17:12 +10:00
spapr_drc.h spapr: Don't use spapr_drc_needed() in CAS code 2020-02-21 09:15:04 +11:00
spapr_irq.h spapr: Pass the maximum number of vCPUs to the KVM interrupt controller 2019-12-17 10:39:48 +11:00
spapr_nvdimm.h spapr: Add NVDIMM device support 2020-02-21 09:15:04 +11:00
spapr_ovec.h spapr: Simplify ovec diff 2019-12-17 10:39:48 +11:00
spapr_rtas.h tests: add RTAS command in the protocol 2016-09-23 10:29:40 +10:00
spapr_tpm_proxy.h spapr: initial implementation for H_TPM_COMM/spapr-tpm-proxy 2019-08-21 17:17:12 +10:00
spapr_vio.h spapr: Implement get_dt_compatible() callback 2020-02-02 14:07:57 +11:00
spapr_xive.h spapr/xive: Use device_class_set_parent_realize() 2020-01-08 11:01:59 +11:00
spapr.h spapr: Don't attempt to clamp RMA to VRMA constraint 2020-03-17 09:41:15 +11:00
xics_spapr.h spapr: Pass the maximum number of vCPUs to the KVM interrupt controller 2019-12-17 10:39:48 +11:00
xics.h ppc/pnv: Add models for POWER8 PHB3 PCIe Host bridge 2020-02-02 14:07:57 +11:00
xive_regs.h ppc/pnv: Dump the XIVE NVT table 2019-12-17 10:39:48 +11:00
xive.h xive: Add a "presenter" link property to the TCTX object 2020-01-08 11:01:59 +11:00