qemu/ppc at 8897ea5a9fc0aafa5ed7eee1e0c49893b91a2d87 - qemu

David Gibson 8897ea5a9f spapr: Don't attempt to clamp RMA to VRMA constraint

The Real Mode Area (RMA) is the part of memory which a guest can access when in real (MMU off) mode. Of course, for a guest under KVM, the MMU isn't really turned off, it's just in a special translation mode - Virtual Real Mode Area (VRMA) - which looks like real mode in guest mode.

The mechanics of how this works when using the hash MMU (HPT) put a constraint on the size of the RMA, which depends on the size of the HPT. So, the latter part of spapr_setup_hpt_and_vrma() clamps the RMA we advertise to the guest based on this VRMA limit.

There are several things wrong with this:

1) spapr_setup_hpt_and_vrma() doesn't actually clamp, it takes the minimum of Node 0 memory size and the VRMA limit. That will often work the same as clamping, but there can be other constraints on RMA size which supersede Node 0 memory size. We have real bugs caused by this (currently worked around in the guest kernel)

2) Some callers of spapr_setup_hpt_and_vrma() are in a situation where we're past the point that we can actually advertise an RMA limit to the guest

3) But most fundamentally, the VRMA limit depends on host configuration (page size) which shouldn't be visible to the guest, but this partially exposes it. This can cause problems with migration in certain edge cases, although we will mostly get away with it.

In practice, this clamping is almost never applied anyway. With 64kiB pages and the normal rules for sizing of the HPT, the theoretical VRMA limit will be 4x(guest memory size) and so never hit. It will hit with 4kiB pages, where it will be (guest memory size)/4. However all mainstream distro kernels for POWER have used a 64kiB page size for at least 10 years.

So, simply replace this logic with a check that the RMA we've calculated based only on guest visible configuration will fit within the host implied VRMA limit. This can break if running HPT guests on a host kernel with 4kiB page size. As noted that's very rare. There also exist several possible workarounds:

* Change the host kernel to use 64kiB pages
* Use radix MMU (RPT) guests instead of HPT
* Use 64kiB hugepages on the host to back guest memory
* Increase the guest memory size so that the RMA hits one of the fixed limits before the RMA limit. This is relatively easy on POWER8 which has a 16GiB limit, harder on POWER9 which has a 1TiB limit.
* Use a guest NUMA configuration which artificially constrains the RMA within the VRMA limit (the RMA must always fit within Node 0).

Previously, on KVM, we also temporarily reduced the rma_size to 256M so that we'd load the kernel and initrd safely, regardless of the VRMA limit. This was a) confusing, b) could significantly limit the size of images we could load and c) introduced a behavioural difference between KVM and TCG. So we remove that as well.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Greg Kurz <groug@kaod.org>

2020-03-17 09:41:15 +11:00
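The figures quoted in the commit message (4x guest memory with 64kiB pages, guest memory/4 with 4kiB pages) can be reproduced with a little shift arithmetic. The following is only a sketch, not the code from the patch: the function names are hypothetical, and it assumes the HPT is sized at 1/128 of guest RAM (rounded up to a power of two, bounded to shifts 18..46) and that the VRMA limit is `1 << (page_shift + hpt_shift - 7)`. It replaces the old clamp with a plain "does the RMA fit?" check, which is the behaviour the message describes.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest shift such that (1 << shift) >= value. */
static unsigned example_ceil_log2(uint64_t value)
{
    unsigned shift = 0;

    assert(value > 0);
    while (shift < 63 && (UINT64_C(1) << shift) < value) {
        shift++;
    }
    return shift;
}

/*
 * Hypothetical helper: HPT size (as a shift) under the assumed
 * "1/128 of guest RAM" rule, bounded to the range 18..46.
 */
static unsigned example_hpt_shift(uint64_t ram_bytes)
{
    unsigned shift = example_ceil_log2(ram_bytes);

    shift = shift > 7 ? shift - 7 : 0;
    if (shift < 18) {
        shift = 18;
    }
    if (shift > 46) {
        shift = 46;
    }
    return shift;
}

/*
 * Hypothetical helper: VRMA limit in bytes for a given host page
 * size and HPT size, under the assumed formula described above.
 */
static uint64_t example_vrma_limit(unsigned page_shift, unsigned hpt_shift)
{
    return UINT64_C(1) << (page_shift + hpt_shift - 7);
}

/*
 * Check, rather than clamp: returns 1 if an RMA computed purely from
 * guest-visible configuration fits within the host-implied limit.
 */
static int example_rma_fits(uint64_t rma_bytes, unsigned page_shift,
                            uint64_t ram_bytes)
{
    unsigned hpt_shift = example_hpt_shift(ram_bytes);

    return rma_bytes <= example_vrma_limit(page_shift, hpt_shift);
}
```

For a 4GiB guest, this gives an HPT shift of 25, a limit of 16GiB with 64kiB pages (`page_shift` 16) and 1GiB with 4kiB pages (`page_shift` 12). A 4GiB RMA therefore passes the check on a 64kiB host and fails it on a 4kiB host, which is the breakage case the workarounds above address.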
e500-ccsr.h	ppc: do not use ../ in include files	2013-03-01 13:57:33 +01:00
e500.c	Merge tag 'patchew/20200219160953.13771-1-imammedo@redhat.com' of https://github.com/patchew-project/qemu into HEAD	2020-02-25 09:19:00 +01:00
e500.h	platform-bus-device: use device plug callback instead of machine_done notifier	2018-05-10 18:10:56 +01:00
e500plat.c	ppc/e500: use memdev for RAM	2020-02-19 16:50:00 +00:00
fdt.c	target/ppc: Split page size information into a separate allocation	2018-04-27 18:05:22 +10:00
fw_cfg.c	hw/ppc: Implement fw_cfg_arch_key_name()	2019-05-23 14:10:31 +02:00
Kconfig	ppc/pnv: Fix PCI_EXPRESS dependency	2020-02-21 09:15:03 +11:00
mac_newworld.c	ppc/mac_newworld: use memdev for RAM	2020-02-19 16:50:00 +00:00
mac_oldworld.c	hw: Make MachineClass::is_default a boolean type	2020-02-28 14:57:19 -05:00
mac.h	ide: Include hw/ide/internal a bit less outside hw/ide/	2019-08-16 13:31:52 +02:00
Makefile.objs	spapr: Add NVDIMM device support	2020-02-21 09:15:04 +11:00
mpc8544_guts.c	sysemu: Split sysemu/runstate.h off sysemu/sysemu.h	2019-08-16 13:37:36 +02:00
mpc8544ds.c	ppc/e500: use memdev for RAM	2020-02-19 16:50:00 +00:00
pnv_bmc.c	ppc/pnv: Add a "pnor" const link property to the BMC internal simulator	2020-01-08 11:01:59 +11:00
pnv_core.c	ppc/pnv: change the PowerNV machine devices to be non user creatable	2020-02-02 14:07:57 +11:00
pnv_homer.c	ppc/pnv: change the PowerNV machine devices to be non user creatable	2020-02-02 14:07:57 +11:00
pnv_lpc.c	hw/ppc/pnv: Fix typo in comment	2020-03-17 09:41:14 +11:00
pnv_occ.c	ppc/pnv: change the PowerNV machine devices to be non user creatable	2020-02-02 14:07:57 +11:00
pnv_pnor.c	ppc/pnv: improve error logging when a PNOR update fails	2020-02-02 14:07:57 +11:00
pnv_psi.c	add device_legacy_reset function to prepare for reset api change	2020-01-30 16:02:03 +00:00
pnv_xscom.c	ppc/pnv: Introduce PnvChipClass::xscom_pcba() method	2019-12-17 10:59:11 +11:00
pnv.c	Merge tag 'patchew/20200219160953.13771-1-imammedo@redhat.com' of https://github.com/patchew-project/qemu into HEAD	2020-02-25 09:19:00 +01:00
ppc4xx_devs.c	ppc/{ppc440_bamboo, sam460ex}: use memdev for RAM	2020-02-19 16:50:00 +00:00
ppc4xx_pci.c	Include hw/hw.h exactly where needed	2019-08-16 13:31:52 +02:00
ppc405_boards.c	ppc/ppc405_boards: use memdev for RAM	2020-02-19 16:50:00 +00:00
ppc405_uc.c	Include hw/boards.h a bit less	2019-08-16 13:31:53 +02:00
ppc405.h	ppc4xx: Export ECB and PLB emulation	2017-09-08 09:30:55 +10:00
ppc440_bamboo.c	ppc/{ppc440_bamboo, sam460ex}: use memdev for RAM	2020-02-19 16:50:00 +00:00
ppc440_pcix.c	Include hw/hw.h exactly where needed	2019-08-16 13:31:52 +02:00
ppc440_uc.c	Let cpu_[physical]_memory() calls pass a boolean 'is_write' argument	2020-02-20 14:47:08 +01:00
ppc440.h	ppc440_uc: Basic emulation of PPC440 DMA controller	2018-07-03 09:56:52 +10:00
ppc_booke.c	sysemu: Split sysemu/runstate.h off sysemu/sysemu.h	2019-08-16 13:37:36 +02:00
ppc.c	hw/ppc/prep: Remove the deprecated "prep" machine and the OpenHackware BIOS	2020-02-02 14:07:57 +11:00
ppce500_spin.c	Clean up inclusion of sysemu/sysemu.h	2019-08-16 13:31:53 +02:00
prep_systemio.c	qdev: set properties with device_class_set_props()	2020-01-24 20:59:15 +01:00
prep.c	hw/ppc/prep: Remove the deprecated "prep" machine and the OpenHackware BIOS	2020-02-02 14:07:57 +11:00
rs6000_mc.c	qdev: set properties with device_class_set_props()	2020-01-24 20:59:15 +01:00
sam460ex.c	ppc/{ppc440_bamboo, sam460ex}: use memdev for RAM	2020-02-19 16:50:00 +00:00
spapr_caps.c	ppc: spapr: Handle "ibm,nmi-register" and "ibm,nmi-interlock" RTAS calls	2020-02-03 11:33:11 +11:00
spapr_cpu_core.c	spapr, ppc: Remove VPM0/RMLS hacks for POWER9	2020-03-17 09:41:15 +11:00
spapr_drc.c	spapr: Fix handling of unplugged devices during CAS and migration	2020-02-21 09:15:04 +11:00
spapr_events.c	spapr: Handle pending hot plug/unplug requests at CAS	2020-03-17 09:41:14 +11:00
spapr_hcall.c	spapr: Don't attempt to clamp RMA to VRMA constraint	2020-03-17 09:41:15 +11:00
spapr_iommu.c	vmstate: replace DeviceState with VMStateIf	2020-01-06 18:41:32 +04:00
spapr_irq.c	spapr, pnv, xive: Add a "xive-fabric" link to the XIVE router	2020-01-08 11:01:59 +11:00
spapr_nvdimm.c	spapr: Fix Coverity warning while validating nvdimm options	2020-03-17 09:41:14 +11:00
spapr_ovec.c	spapr: Simplify ovec diff	2019-12-17 10:39:48 +11:00
spapr_pci_nvlink2.c	error: Clean up unusual names of Error * variables	2019-12-18 08:36:15 +01:00
spapr_pci_vfio.c	Include qemu-common.h exactly where needed	2019-06-12 13:20:20 +02:00
spapr_pci.c	add device_legacy_reset function to prepare for reset api change	2020-01-30 16:02:03 +00:00
spapr_rng.c	qdev: set properties with device_class_set_props()	2020-01-24 20:59:15 +01:00
spapr_rtas_ddw.c	Include qemu/module.h where needed, drop it from qemu-common.h	2019-06-12 13:18:33 +02:00
spapr_rtas.c	spapr/rtas: Print message from "ibm,os-term"	2020-02-21 09:15:03 +11:00
spapr_rtc.c	Include migration/vmstate.h less	2019-08-16 13:31:52 +02:00
spapr_tpm_proxy.c	qdev: set properties with device_class_set_props()	2020-01-24 20:59:15 +01:00
spapr_vio.c	spapr: Implement get_dt_compatible() callback	2020-02-02 14:07:57 +11:00
spapr.c	spapr: Don't attempt to clamp RMA to VRMA constraint	2020-03-17 09:41:15 +11:00
trace-events	spapr: Remove unhelpful tracepoints from spapr_irq_free_xics()	2019-10-04 19:08:22 +10:00
virtex_ml507.c	Merge tag 'patchew/20200219160953.13771-1-imammedo@redhat.com' of https://github.com/patchew-project/qemu into HEAD	2020-02-25 09:19:00 +01:00