qemu/include/hw
David Gibson 8897ea5a9f spapr: Don't attempt to clamp RMA to VRMA constraint
The Real Mode Area (RMA) is the part of memory which a guest can access
when in real (MMU off) mode.  Of course, for a guest under KVM, the MMU
isn't really turned off, it's just in a special translation mode - Virtual
Real Mode Area (VRMA) - which looks like real mode in guest mode.

The mechanics of how this works when using the hash MMU (HPT) put a
constraint on the size of the RMA, which depends on the size of the
HPT.  So, the latter part of spapr_setup_hpt_and_vrma() clamps the RMA
we advertise to the guest based on this VRMA limit.

There are several things wrong with this:
 1) spapr_setup_hpt_and_vrma() doesn't actually clamp, it takes the minimum
    of Node 0 memory size and the VRMA limit.  That will *often* work the
    same as clamping, but there can be other constraints on RMA size which
    supersede Node 0 memory size.  We have real bugs caused by this
    (currently worked around in the guest kernel)
 2) Some callers of spapr_setup_hpt_and_vrma() are in a situation where
    we're past the point that we can actually advertise an RMA limit to the
    guest
 3) But most fundamentally, the VRMA limit depends on host configuration
    (page size) which shouldn't be visible to the guest, but this partially
    exposes it.  This can cause problems with migration in certain edge
    cases, although we will mostly get away with it.

In practice, this clamping is almost never applied anyway.  With 64kiB
pages and the normal rules for sizing of the HPT, the theoretical VRMA
limit will be 4x(guest memory size) and so never hit.  It will hit with
4kiB pages, where it will be (guest memory size)/4.  However all mainstream
distro kernels for POWER have used a 64kiB page size for at least 10 years.
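
The two ratios above can be reproduced with a little arithmetic. What follows is a minimal sketch, not the code in QEMU: it assumes the HPT is sized at 1/128 of guest RAM (rounded up to a power of two, with a 256KiB floor) and that the VRMA limit is 2^(page_shift + hpt_shift - 7) bytes. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: log2 of the smallest power of two >= x. */
static unsigned ceil_log2(uint64_t x)
{
    unsigned shift = 0;

    assert(x > 0);
    while (shift < 63 && (UINT64_C(1) << shift) < x) {
        shift++;
    }
    return shift;
}

/*
 * Assumed sizing rule: the HPT is 1/128 of guest RAM, rounded up to a
 * power of two, and never smaller than 2^18 bytes.
 */
static unsigned hpt_shift_for_ramsize(uint64_t ram_size)
{
    unsigned shift = ceil_log2(ram_size);

    shift = shift > 7 ? shift - 7 : 0;
    return shift < 18 ? 18 : shift;
}

/*
 * Assumed limit: 2^(page_shift + hpt_shift - 7) bytes, where page_shift
 * is log2 of the host page size backing guest memory.
 */
static uint64_t vrma_limit(unsigned page_shift, unsigned hpt_shift)
{
    return UINT64_C(1) << (page_shift + hpt_shift - 7);
}
```

Under these assumptions a 4GiB guest gets hpt_shift = 25, so the limit is 16GiB (4x guest memory) with 64kiB pages and 1GiB (guest memory / 4) with 4kiB pages.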

So, simply replace this logic with a check that the RMA we've calculated
based only on guest visible configuration will fit within the host implied
VRMA limit.  This can break if running HPT guests on a host kernel with
4kiB page size.  As noted, that's very rare.  There also exist several
possible workarounds:
  * Change the host kernel to use 64kiB pages
  * Use radix MMU (RPT) guests instead of HPT
  * Use 64kiB hugepages on the host to back guest memory
  * Increase the guest memory size so that the RMA hits one of the fixed
    limits before the VRMA limit.  This is relatively easy on POWER8 which
    has a 16GiB limit, harder on POWER9 which has a 1TiB limit.
  * Use a guest NUMA configuration which artificially constrains the RMA
    within the VRMA limit (the RMA must always fit within Node 0).
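
The replacement logic described above amounts to a single comparison instead of a clamp. The sketch below illustrates that shape only. The function name, the error text and the use of stderr are assumptions made for illustration, not the code in the patch.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical check: the RMA size is computed from guest-visible
 * configuration only, and is then validated (not reduced) against the
 * limit implied by the host.  Returns true if the RMA fits.
 */
static bool rma_fits_vrma(uint64_t rma_size, uint64_t vrma_limit)
{
    assert(vrma_limit > 0);

    if (rma_size > vrma_limit) {
        fprintf(stderr,
                "RMA of %" PRIu64 " bytes exceeds VRMA limit of %" PRIu64
                " bytes\n", rma_size, vrma_limit);
        return false;
    }
    return true;
}
```

A caller would treat a false result as a configuration error and refuse to start the guest, rather than silently advertising a smaller RMA.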

Previously, on KVM, we also temporarily reduced the rma_size to 256M so
that we'd load the kernel and initrd safely, regardless of the VRMA
limit.  This was a) confusing, b) could significantly limit the size of
images we could load and c) introduced a behavioural difference between
KVM and TCG.  So we remove that as well.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Greg Kurz <groug@kaod.org>
2020-03-17 09:41:15 +11:00
..
acpi piix4: Add a MC146818 RTC Controller as specified in datasheet 2019-11-05 23:33:12 +01:00
adc include: Make headers more self-contained 2019-08-16 13:31:51 +02:00
arm hw/arm/virt: Introduce finalize_gic_version() 2020-03-12 16:27:33 +00:00
audio Include hw/qdev-properties.h less 2019-08-16 13:31:53 +02:00
block block: Support providing LCHS from user 2019-10-31 11:47:11 -04:00
char mips: inline serial_init() 2020-01-07 17:24:29 +04:00
core cpu: Introduce cpu_class_set_parent_reset() 2020-01-24 20:59:06 +01:00
cpu Include hw/qdev-properties.h less 2019-08-16 13:31:53 +02:00
cris etraxfs: remove PROP_PTR usage 2020-01-07 17:24:29 +04:00
display hw/m68k: add Nubus macfb video card 2019-10-28 19:06:49 +01:00
dma Include hw/hw.h exactly where needed 2019-08-16 13:31:52 +02:00
firmware machine: Refactor smp-related call chains to pass MachineState 2019-07-05 17:07:36 -03:00
gpio hw/gpio: Add basic Aspeed GPIO model for AST2400 and AST2500 2019-09-13 16:05:00 +01:00
hyperv hyperv: process POST_MESSAGE hypercall 2018-10-19 13:44:14 +02:00
i2c aspeed/i2c: Add support for DMA transfers 2019-12-16 10:46:34 +00:00
i386 hw/i386/pc: Clean up includes 2020-03-09 15:59:31 +01:00
ide hw/ide: Let the DMAIntFunc prototype use a boolean 'is_write' argument 2020-02-20 14:47:08 +01:00
input hppa: add emulation of LASI PS2 controllers 2020-01-27 10:49:51 -08:00
intc arm_gic: Mask the un-supported priority bits 2020-02-28 16:14:57 +00:00
ipack Include hw/qdev-properties.h less 2019-08-16 13:31:53 +02:00
ipmi ipmi: Add support to customize OEM functions 2019-12-17 10:39:47 +11:00
isa hw/isa/isa-bus: cleanup irq functions 2019-12-17 19:33:51 +01:00
kvm Supply missing header guards 2019-06-12 13:20:21 +02:00
lm32 Include qemu-common.h exactly where needed 2019-06-12 13:20:20 +02:00
m68k m68k: Add NeXTcube machine 2019-09-07 08:31:51 +02:00
mem nvdimm: add uuid property to nvdimm 2020-02-21 09:15:04 +11:00
mips Include hw/irq.h a lot less 2019-08-16 13:31:52 +02:00
misc hw/arm/allwinner-h3: add SDRAM controller device 2020-03-12 16:27:33 +00:00
net hw/arm/allwinner-h3: add EMAC ethernet device 2020-03-12 16:27:33 +00:00
nubus hw/m68k: add Nubus support 2019-10-28 19:06:47 +01:00
nvram fw_cfg: add "modify" functions for all types 2019-10-22 09:39:54 +02:00
pci pcie_root_port: Add hotplug disabling option 2020-03-08 09:18:29 -04:00
pci-bridge Supply missing header guards 2019-06-12 13:20:21 +02:00
pci-host hw/pci-host/q35: Remove unused includes 2020-03-09 15:59:31 +01:00
ppc spapr: Don't attempt to clamp RMA to VRMA constraint 2020-03-17 09:41:15 +11:00
rdma {hmp, hw/pvrdma}: Expose device internals via monitor interface 2019-03-16 15:52:44 +02:00
riscv hw/riscv: Provide rdtime callback for TCG in CLINT emulation 2020-02-27 13:46:37 -08:00
rtc hw/arm/allwinner: add RTC device support 2020-03-12 16:27:33 +00:00
s390x Include hw/qdev-properties.h less 2019-08-16 13:31:53 +02:00
scsi scsi: Propagate unrealize() callback to scsi-hd 2019-10-31 11:47:25 -04:00
sd hw/arm/allwinner: add SD/MMC host controller 2020-03-12 16:27:33 +00:00
semihosting semihosting: add qemu_semihosting_console_inc for SYS_READC 2020-01-09 11:41:29 +00:00
sh4 Include hw/irq.h a lot less 2019-08-16 13:31:52 +02:00
southbridge hw/pci-host/piix: Extract PIIX3 functions to hw/isa/piix3.c 2019-11-05 23:33:12 +01:00
sparc Include hw/qdev-properties.h less 2019-08-16 13:31:53 +02:00
ssi aspeed/smc: Add AST2600 timings registers 2019-12-16 10:46:34 +00:00
timer Fix typos and docs, trivial changes and RTC devices split 2019-10-25 14:17:08 +01:00
tricore Include hw/irq.h a lot less 2019-08-16 13:31:52 +02:00
unicore32 hw/unicore32: restrict hw addr defines to source file 2017-12-18 17:07:02 +03:00
usb usb: Add basic code to emulate Chipidea USB IP 2018-02-09 10:40:30 +00:00
vfio vfio: Turn the container error into an Error handle 2019-10-04 18:49:18 +02:00
virtio virtio-iommu-pci: Add virtio iommu pci support 2020-02-27 03:46:10 -05:00
watchdog watchdog/aspeed: Fix AST2600 frequency behaviour 2019-12-16 10:46:34 +00:00
xen xen-bus/block: explicitly assign event channels to an AioContext 2020-02-27 11:50:30 +00:00
xtensa Include hw/irq.h a lot less 2019-08-16 13:31:52 +02:00
boards.h hw: Make MachineClass::is_default a boolean type 2020-02-28 14:57:19 -05:00
elf_ops.h hw/core/loader: Let load_elf() populate a field with CPU-specific flags 2020-01-29 19:28:52 +01:00
empty_slot.h include: Make headers more self-contained 2019-08-16 13:31:51 +02:00
fw-path-provider.h Include qemu-common.h exactly where needed 2019-06-12 13:20:20 +02:00
hotplug.h qom: make interface types abstract 2018-12-11 15:45:22 -02:00
hw.h Include hw/hw.h exactly where needed 2019-08-16 13:31:52 +02:00
ide.h ide/via: Rename functions to match device name 2019-01-25 14:52:12 -05:00
irq.h Revert "irq: introduce qemu_irq_proxy()" 2019-11-05 23:33:12 +01:00
loader-fit.h Use #include "..." for our own headers, <...> for others 2018-02-09 05:05:11 +01:00
loader.h hw/core/loader: Let load_elf() populate a field with CPU-specific flags 2020-01-29 19:28:52 +01:00
nmi.h hw/nmi: Fix the NMI() macro, based on INTERFACE_CHECK() 2020-02-28 14:57:19 -05:00
or-irq.h hw/core/or-irq: Increase limit of or-lines to 48 2020-01-23 16:34:15 +00:00
pcmcia.h Include hw/qdev-properties.h less 2019-08-16 13:31:53 +02:00
platform-bus.h platform-bus-device: use device plug callback instead of machine_done notifier 2018-05-10 18:10:56 +01:00
ptimer.h ptimer: Remove old ptimer_init_with_bh() API 2019-11-11 13:44:16 +00:00
qdev-core.h hw/core: deprecate old reset functions and introduce new ones 2020-01-30 16:02:04 +00:00
qdev-dma.h Supply missing header guards 2019-06-12 13:20:21 +02:00
qdev-properties.h multifd: Add multifd-compression parameter 2020-02-28 09:24:43 +01:00
register.h hw: register: Run post_write hook on reset 2018-03-01 11:05:43 +00:00
registerfields.h Use #include "..." for our own headers, <...> for others 2018-02-09 05:05:11 +01:00
resettable.h hw/core: deprecate old reset functions and introduce new ones 2020-01-30 16:02:04 +00:00
stream.h Include qemu-common.h exactly where needed 2019-06-12 13:20:20 +02:00
sysbus.h sysbus: remove outdated comment 2020-01-07 16:06:59 +04:00
usb.h Include hw/qdev-properties.h less 2019-08-16 13:31:53 +02:00
vmstate-if.h vmstate: add qom interface to get id 2020-01-06 18:41:32 +04:00