mirrors/qemu - qemu - SynapseOS git

Author	SHA1	Message	Date
Alexey Kardashevskiy	21bde1ecb6	spapr: Fix implementation of Open Firmware client interface This addresses the comments from v22. The functional changes are (the VOF ones need retesting with Pegasos2): (VOF) setprop will start failing if the machine class callback did not handle it; (VOF) unit addresses are lowered in path_offset(); (SPAPR) /chosen/bootargs is initialized from kernel_cmdline if the client did not change it. Fixes: 5c991e5d4378 ("spapr: Implement Open Firmware client interface") Cc: BALATON Zoltan <balaton@eik.bme.hu> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Message-Id: <20210708065625.548396-1-aik@ozlabs.ru> Tested-by: BALATON Zoltan <balaton@eik.bme.hu> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-07-09 10:55:11 +10:00
Nicholas Piggin	17fd09c021	target/ppc/spapr: Update H_GET_CPU_CHARACTERISTICS L1D cache flush bits There are several new L1D cache flush bits added to the hcall which reflect hardware security features for speculative cache access issues. These behaviours are now being specified as negative in order to simplify patched kernel compatibility with older firmware (a new problem found in existing systems would automatically be vulnerable). [dwg: Technically this changes behaviour for existing machine types. After discussion with Nick, we've determined this is safe, because the worst that will happen if a guest gets the wrong information due to a migration is that it will perform some unnecessary workarounds, but will remain correct and secure (well, as secure as it was going to be anyway). In addition the change only affects cap-cfpc=safe which is not enabled by default, and in fact is not possible to set on any current hardware (though it's expected it will be possible on POWER10)] Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Message-Id: <20210615044107.1481608-1-npiggin@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-07-09 10:38:19 +10:00
Alexey Kardashevskiy	fc8c745d50	spapr: Implement Open Firmware client interface The PAPR platform describes an OS environment that's presented by a combination of a hypervisor and firmware. The features it specifies require collaboration between the firmware and the hypervisor. Since the beginning, the runtime component of the firmware (RTAS) has been implemented as a 20 byte shim which simply forwards it to a hypercall implemented in qemu. The boot time firmware component is SLOF - but a build that's specific to qemu, and has always needed to be updated in sync with it. Even though we've managed to limit the amount of runtime communication we need between qemu and SLOF, there's some, and it has become increasingly awkward to handle as we've implemented new features. This implements a boot time OF client interface (CI) which is enabled by a new "x-vof" pseries machine option (stands for "Virtual Open Firmware). When enabled, QEMU implements the custom H_OF_CLIENT hcall which implements Open Firmware Client Interface (OF CI). This allows using a smaller stateless firmware which does not have to manage the device tree. The new "vof.bin" firmware image is included with source code under pc-bios/. It also includes RTAS blob. This implements a handful of CI methods just to get -kernel/-initrd working. In particular, this implements the device tree fetching and simple memory allocator - "claim" (an OF CI memory allocator) and updates "/memory@0/available" to report the client about available memory. This implements changing some device tree properties which we know how to deal with, the rest is ignored. To allow changes, this skips fdt_pack() when x-vof=on as not packing the blob leaves some room for appending. In absence of SLOF, this assigns phandles to device tree nodes to make device tree traversing work. When x-vof=on, this adds "/chosen" every time QEMU (re)builds a tree. This adds basic instances support which are managed by a hash map ihandle -> [phandle]. Before the guest started, the used memory is: 0..e60 - the initial firmware 8000..10000 - stack 400000.. - kernel 3ea0000.. - initramdisk This OF CI does not implement "interpret". Unlike SLOF, this does not format uninitialized nvram. Instead, this includes a disk image with pre-formatted nvram. With this basic support, this can only boot into kernel directly. However this is just enough for the petitboot kernel and initradmdisk to boot from any possible source. Note this requires reasonably recent guest kernel with: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=df5be5be8735 The immediate benefit is much faster booting time which especially crucial with fully emulated early CPU bring up environments. Also this may come handy when/if GRUB-in-the-userspace sees light of the day. This separates VOF and sPAPR in a hope that VOF bits may be reused by other POWERPC boards which do not support pSeries. This assumes potential support for booting from QEMU backends such as blockdev or netdev without devices/drivers used. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Message-Id: <20210625055155.2252896-1-aik@ozlabs.ru> Reviewed-by: BALATON Zoltan <balaton@eik.bme.hu> [dwg: Adjusted some includes which broke compile in some more obscure compilation setups] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-07-09 10:38:19 +10:00
Lucas Mateus Castro (alqotel)	03282a3ab8	hw/ppc: moved has_spr to cpu.h Moved has_spr to cpu.h as ppc_has_spr and turned it into an inline function. Change spr verification in pnv.c and spapr.c to a version that can compile in a !TCG environment. Signed-off-by: Lucas Mateus Castro (alqotel) <lucas.araujo@eldorado.org.br> Message-Id: <20210507164146.67086-1-lucas.araujo@eldorado.org.br> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-05-19 10:30:28 +10:00
Lucas Mateus Castro (alqotel)	962104f044	hw/ppc: moved hcalls that depend on softmmu The hypercalls h_enter, h_remove, h_bulk_remove, h_protect, and h_read, have been moved to spapr_softmmu.c with the functions they depend on. The functions is_ram_address and push_sregs_to_kvm_pr are not static anymore as functions on both spapr_hcall.c and spapr_softmmu.c depend on them. The hypercalls h_resize_hpt_prepare and h_resize_hpt_commit have been divided, the KVM part stayed in spapr_hcall.c while the softmmu part was moved to spapr_softmmu.c Signed-off-by: Lucas Mateus Castro (alqotel) <lucas.araujo@eldorado.org.br> Message-Id: <20210506163941.106984-2-lucas.araujo@eldorado.org.br> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-05-19 10:30:28 +10:00
Fabiano Rosas	068479e1e1	hw/ppc/spapr.c: Extract MMU mode error reporting into a function A following patch will make use of it. Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com> Message-Id: <20210505001130.3999968-2-farosas@linux.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-05-19 10:30:28 +10:00
Peter Maydell	d90f154867	ppc patch queue 2021-05-04 Here's the first ppc pull request for qemu-6.1. It has a wide variety of stuff accumulated during the 6.0 freeze. Highlights are: * Multi-phase reset cleanups for PAPR * Preliminary cleanups towards allowing !CONFIG_TCG for the ppc target * Cleanup of AIL logic and extension to POWER10 * Further improvements to handling of hot unplug failures on PAPR * Allow much larger numbers of CPU on pseries * Support for the H_SCM_HEALTH hypercall * Add support for the Pegasos II board * Substantial cleanup to hflag handling * Assorted minor fixes and cleanups -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEdfRlhq5hpmzETofcbDjKyiDZs5IFAmCQ4ScACgkQbDjKyiDZ s5KmNhAAsICdDqeu/jm1uhRCr0DDT/Wa6KE1xlglQ53ybWb5Hm2ae0Uwzti5ZWkt T9yryObX++wiugbU5Dlx9eXTiJIPgTbDoBV1wfOa3a1BAxSEES1t70jwuwAXXBpX mgU++SurQB70IB7vVvyXDi2Z592qGvMiKXqT0sdkfoexPHzAL0+KkQPyJZLeFchM Ap/zRHAodXf9SuWAl+LwLXeb350jivXYXBWNcFRrBbOGpbVT0AJMYrk/TEa2ZIpi SvbzAWuW+9mX0EOmk7JK5JfkT41cGNdcBcwd0bt4xyvUpmkXLaTMFDLVHj3HWSUn PFA4RB3uKXyTfISVtWdxJBbFOzMpchI6lEiRJHCS+KuY7UsACqV1T/y54ATOUauC ycLc9APgRaStdNPxfDl+xeFfoVb/f0mQsNwcmY1tv7z+3qE/trY9bMyrbgaebBFn /TAkmPvXfwtAREnx8xF/57poarWUkvupGTQkANNosdFokpExmrLj8T0sKv90hh5Y vkGf5zP4pYGN1Rs8qhOdHu+IjhVJvUl/L3LZYWcoMI6E61D8rGRc0Dkacx7gcja+ sluFi5Yh2fQn55y6LTi3049cB1wMd6wly0214F11RKoBswguiGuaqJmL4sNDO/s4 IcMCy5mg6C0jNZA5kHcdWmqsVzD2+XwP5J29n/LedlmgXoHYF+M= =N0qr -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/dg-gitlab/tags/ppc-for-6.1-20210504' into staging ppc patch queue 2021-05-04 Here's the first ppc pull request for qemu-6.1. It has a wide variety of stuff accumulated during the 6.0 freeze. Highlights are: * Multi-phase reset cleanups for PAPR * Preliminary cleanups towards allowing !CONFIG_TCG for the ppc target * Cleanup of AIL logic and extension to POWER10 * Further improvements to handling of hot unplug failures on PAPR * Allow much larger numbers of CPU on pseries * Support for the H_SCM_HEALTH hypercall * Add support for the Pegasos II board * Substantial cleanup to hflag handling * Assorted minor fixes and cleanups # gpg: Signature made Tue 04 May 2021 06:52:39 BST # gpg: using RSA key 75F46586AE61A66CC44E87DC6C38CACA20D9B392 # gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>" [full] # gpg: aka "David Gibson (Red Hat) <dgibson@redhat.com>" [full] # gpg: aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>" [full] # gpg: aka "David Gibson (kernel.org) <dwg@kernel.org>" [unknown] # Primary key fingerprint: 75F4 6586 AE61 A66C C44E 87DC 6C38 CACA 20D9 B392 * remotes/dg-gitlab/tags/ppc-for-6.1-20210504: (46 commits) hw/ppc/pnv_psi: Use device_cold_reset() instead of device_legacy_reset() hw/ppc/spapr_vio: Reset TCE table object with device_cold_reset() hw/intc/spapr_xive: Use device_cold_reset() instead of device_legacy_reset() target/ppc: removed VSCR from SPR registration target/ppc: Reduce the size of ppc_spr_t target/ppc: Clean up _spr_register et al target/ppc: Add POWER10 exception model target/ppc: rework AIL logic in interrupt delivery target/ppc: move opcode table logic to translate.c target/ppc: code motion from translate_init.c.inc to gdbstub.c spapr_drc.c: handle hotunplug errors in drc_unisolate_logical() spapr.h: increase FDT_MAX_SIZE spapr.c: do not use MachineClass::max_cpus to limit CPUs ppc: Rename current DAWR macros and variables target/ppc: POWER10 supports scv target/ppc: Fix POWER9 radix guest HV interrupt AIL behaviour docs/system: ppc: Add documentation for ppce500 machine roms/u-boot: Bump ppce500 u-boot to v2021.04 to fix broken pci support roms/Makefile: Update ppce500 u-boot build directory name ppc/spapr: Add support for implement support for H_SCM_HEALTH ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2021-05-05 20:29:14 +01:00
Nicholas Piggin	526cdce771	target/ppc: Add POWER10 exception model POWER10 adds a new bit that modifies interrupt behaviour, LPCR[HAIL], and it removes support for the LPCR[AIL]=0b10 mode. Reviewed-by: Cédric Le Goater <clg@kaod.org> Tested-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Message-Id: <20210501072436.145444-3-npiggin@gmail.com> [dwg: Corrected tab indenting] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-05-04 13:12:46 +10:00
Nicholas Piggin	8b7e6b07a4	target/ppc: rework AIL logic in interrupt delivery The AIL logic is becoming unmanageable spread all over powerpc_excp(), and it is slated to get even worse with POWER10 support. Move it all to a new helper function. Reviewed-by: Cédric Le Goater <clg@kaod.org> Tested-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Message-Id: <20210501072436.145444-2-npiggin@gmail.com> [dwg: Corrected tab indenting] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-05-04 13:12:02 +10:00
Thomas Huth	2068cabd3f	Do not include cpu.h if it's not really necessary Stop including cpu.h in files that don't need it. Signed-off-by: Thomas Huth <thuth@redhat.com> Message-Id: <20210416171314.2074665-4-thuth@redhat.com> Signed-off-by: Laurent Vivier <laurent@vivier.eu>	2021-05-02 17:24:51 +02:00
Daniel Henrique Barboza	eb72b63988	spapr_hcall.c: make do_client_architecture_support static The function is called only inside spapr_hcall.c. Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20210114180628.1675603-3-danielhb413@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-01-19 10:20:29 +11:00
Greg Kurz	babb819f94	spapr: Introduce spapr_drc_reset_all() No need to expose the way DRCs are traversed outside of spapr_drc.c. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <20201218103400.689660-4-groug@kaod.org> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> Tested-by: Daniel Henrique Barboza <danielhb413@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-01-06 11:09:59 +11:00
Greg Kurz	930ef3b5c2	spapr: Fix reset of transient DR connectors Documentation of object_property_iter_init() clearly stipulates that "it is forbidden to modify the property list while iterating". But this is exactly what we do when resetting transient DR connectors during CAS. The call to spapr_drc_reset() can finalize the hot-unplug sequence of a PHB or a PCI bridge, both of which will then in turn destroy their PCI DRCs. This could potentially invalidate the iterator. It is pure luck that this haven't caused any issues so far. Change spapr_drc_reset() to return true if it caused a device to be removed. Restart from scratch in this case. This can potentially increase the overall DRC reset time, especially with a high maxmem which generates a lot of LMB DRCs. But this kind of setup is rare, and so is the use case of rebooting a guest while doing hot-unplug. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <20201218103400.689660-3-groug@kaod.org> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> Tested-by: Daniel Henrique Barboza <danielhb413@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-01-06 11:09:59 +11:00
Greg Kurz	cd725bd748	spapr: Call spapr_drc_reset() for all DRCs at CAS Non-transient DRCs are either in the empty or the ready state, which means spapr_drc_reset() doesn't change their state. It is thus not needed to do any checking. Call spapr_drc_reset() unconditionally and squash spapr_drc_transient() into its only user, spapr_drc_needed(). Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <20201218103400.689660-2-groug@kaod.org> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> Tested-by: Daniel Henrique Barboza <danielhb413@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-01-06 11:09:59 +11:00
Greg Kurz	c4c81d7d51	spapr: Pass sPAPR machine state down to spapr_pci_switch_vga() This allows to drop a user of qdev_get_machine(). Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <20201209170052.1431440-4-groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-12-14 15:54:12 +11:00
Greg Kurz	f29b959dc6	spapr: Convert hpt_prepare_thread() to use qemu_try_memalign() HPT resizing is asynchronous: the guest first kicks off the creation of a new HPT, then it waits for that new HPT to be actually created and finally it asks the current HPT to be replaced by the new one. In the case of a userland allocated HPT, this currently relies on calling qemu_memalign() which aborts on OOM and never returns NULL. Since we seem to have path to report the failure to the guest with an H_NO_MEM return value, use qemu_try_memalign() instead of qemu_memalign(). Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <160398563636.32380.1747166034877173994.stgit@bahia.lan> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-11-05 12:18:48 +11:00
Greg Kurz	7e92da81be	spapr: Simplify error handling in do_client_architecture_support() Use the return value of ppc_set_compat_all() to check failures, which is preferred over hijacking local_err. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <20200914123505.612812-7-groug@kaod.org> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-10-09 10:15:06 +11:00
Greg Kurz	121afbe487	spapr: Get rid of cas_check_pvr() error reporting The cas_check_pvr() function has two purposes: - finding the "best" logical PVR, ie. the most recent one supported by the guest for this CPU type - checking if the guest supports the real PVR of this CPU type, which is just an optional extra information to workaround the lack of support for "compat" mode in PR KVM This logic doesn't need error reporting, really. If we don't find a suitable logical PVR, we return the special value 0 which is definitely not a valid PVR. Let the caller decide on whether it should error out or not. This doesn't change the behavior. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <20200914123505.612812-6-groug@kaod.org> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-10-09 10:15:06 +11:00
Daniel Henrique Barboza	f8a13fc381	spapr: move h_home_node_associativity to spapr_numa.c The implementation of this hypercall will be modified to use spapr->numa_assoc_arrays input. Moving it to spapr_numa.c makes make more sense. Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20200904172422.617460-2-danielhb413@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-09-08 10:37:32 +10:00
Markus Armbruster	552d7f49ee	qom: Crash more nicely on object_property_get_link() failure Pass &error_abort instead of NULL where the returned value is dereferenced or asserted to be non-null. Drop a now redundant assertion. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200707160613.848843-24-armbru@redhat.com>	2020-07-10 15:18:08 +02:00
Greg Kurz	087820e37f	spapr: Drop CAS reboot flag The CAS reboot flag is false by default and all the locations that could set it to true have been dropped. This means that all code blocks depending on the flag being set is dead code and the other code blocks should be executed always. Just do that and drop the now uneeded CAS reboot flag. Fix a comment on the way to make checkpatch happy. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <158514994893.478799.11772512888322840990.stgit@bahia.lan> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-05-07 11:10:50 +10:00
Alexey Kardashevskiy	91067db1ab	spapr/cas: Separate CAS handling from rebuilding the FDT At the moment "ibm,client-architecture-support" ("CAS") is implemented in SLOF and QEMU assists via the custom H_CAS hypercall which copies an updated flatten device tree (FDT) blob to the SLOF memory which it then uses to update its internal tree. When we enable the OpenFirmware client interface in QEMU, we won't need to copy the FDT to the guest as the client is expected to fetch the device tree using the client interface. This moves FDT rebuild out to a separate helper which is going to be called from the "ibm,client-architecture-support" handler and leaves writing FDT to the guest in the H_CAS handler. This should not cause any behavioral change. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Message-Id: <20200310050733.29805-3-aik@ozlabs.ru> Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <158514994229.478799.2178881312094922324.stgit@bahia.lan> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-05-07 11:10:50 +10:00
Greg Kurz	b4b83312e7	spapr: Simplify selection of radix/hash during CAS The guest can select the MMU mode by setting bits 0-1 of byte 24 in OV5 to to 0b00 for hash or 0b01 for radix. As required by the architecture, we terminate the boot process if any other value is found there. The usual way to negotiate features in OV5 is basically ANDing the bitfield provided by the guest and the bitfield of features supported by QEMU, previously populated at machine init. For some not documented reason, MMU is treated differently : bit 1 of byte 24 (the radix/hash bit) is cleared from the guest OV5 and explicitely set in the final negotiated OV5 if radix was requested. Since the only expected input from the guest is the radix/hash bit being set or not, it seems more appropriate to handle this like we do for XIVE. Set the radix bit in spapr->ov5 at machine init if it has a chance to work (ie. power9, either TCG or a radix capable KVM) and rely exclusively on spapr_ovec_intersect() to set the radix bit in spapr->ov5_cas. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <158514993621.478799.4204740354545734293.stgit@bahia.lan> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-05-07 11:10:50 +10:00
Greg Kurz	86962462f8	spapr: Don't check capabilities removed between CAS calls We currently check if some capability in OV5 was removed by the guest since the previous CAS, and we trigger a CAS reboot in that case. This was required because it could call for a device-tree property or node removal, that we didn't support until recently (see commit `6787d27b04` "spapr: add option vector handling in CAS-generated resets" for details). Now that we render a full FDT at CAS and that SLOF is able to handle node removal, we don't need to do a CAS reset in this case anymore. Also, this check can only return true if the guest has already called CAS since the last full system reset (otherwise spapr->ov5_cas is empty). Linux doesn't do that so this can be considered as dead code for the vast majority of existing setups. Drop the check. Since the only use of the ov5_cas_old variable is precisely the check itself, drop the variable as well. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <158514993021.478799.10928618293640651819.stgit@bahia.lan> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-05-07 11:10:50 +10:00
Greg Kurz	ce05fa0fcc	spapr: Fix memory leak in h_client_architecture_support() This is the only error path that needs to free the previously allocated ov1. Reported-by: Coverity (CID 1421924) Fixes: `cbd0d7f363` "spapr: Fail CAS if option vector table cannot be parsed" Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <158481206205.336182.16106097429336044843.stgit@bahia.lan> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>	2020-03-24 11:56:37 +11:00
David Gibson	8897ea5a9f	spapr: Don't attempt to clamp RMA to VRMA constraint The Real Mode Area (RMA) is the part of memory which a guest can access when in real (MMU off) mode. Of course, for a guest under KVM, the MMU isn't really turned off, it's just in a special translation mode - Virtual Real Mode Area (VRMA) - which looks like real mode in guest mode. The mechanics of how this works when using the hash MMU (HPT) put a constraint on the size of the RMA, which depends on the size of the HPT. So, the latter part of spapr_setup_hpt_and_vrma() clamps the RMA we advertise to the guest based on this VRMA limit. There are several things wrong with this: 1) spapr_setup_hpt_and_vrma() doesn't actually clamp, it takes the minimum of Node 0 memory size and the VRMA limit. That will often work the same as clamping, but there can be other constraints on RMA size which supersede Node 0 memory size. We have real bugs caused by this (currently worked around in the guest kernel) 2) Some callers of spapr_setup_hpt_and_vrma() are in a situation where we're past the point that we can actually advertise an RMA limit to the guest 3) But most fundamentally, the VRMA limit depends on host configuration (page size) which shouldn't be visible to the guest, but this partially exposes it. This can cause problems with migration in certain edge cases, although we will mostly get away with it. In practice, this clamping is almost never applied anyway. With 64kiB pages and the normal rules for sizing of the HPT, the theoretical VRMA limit will be 4x(guest memory size) and so never hit. It will hit with 4kiB pages, where it will be (guest memory size)/4. However all mainstream distro kernels for POWER have used a 64kiB page size for at least 10 years. So, simply replace this logic with a check that the RMA we've calculated based only on guest visible configuration will fit within the host implied VRMA limit. This can break if running HPT guests on a host kernel with 4kiB page size. As noted that's very rare. There also exist several possible workarounds: * Change the host kernel to use 64kiB pages * Use radix MMU (RPT) guests instead of HPT * Use 64kiB hugepages on the host to back guest memory * Increase the guest memory size so that the RMA hits one of the fixed limits before the RMA limit. This is relatively easy on POWER8 which has a 16GiB limit, harder on POWER9 which has a 1TiB limit. * Use a guest NUMA configuration which artificially constrains the RMA within the VRMA limit (the RMA must always fit within Node 0). Previously, on KVM, we also temporarily reduced the rma_size to 256M so that the we'd load the kernel and initrd safely, regardless of the VRMA limit. This was a) confusing, b) could significantly limit the size of images we could load and c) introduced a behavioural difference between KVM and TCG. So we remove that as well. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Greg Kurz <groug@kaod.org>	2020-03-17 09:41:15 +11:00
Greg Kurz	ad334d89a6	spapr: Handle pending hot plug/unplug requests at CAS If a hot plug or unplug request is pending at CAS, we currently trigger a CAS reboot, which severely increases the guest boot time. This is because SLOF doesn't handle hot plug events and we had no way to fix the FDT that gets presented to the guest. We can do better thanks to recent changes in QEMU and SLOF: - we now return a full FDT to SLOF during CAS - SLOF was fixed to correctly detect any device that was either added or removed since boot time and to update its internal DT accordingly. The right solution is to process all pending hot plug/unplug requests during CAS: convert hot plugged devices to cold plugged devices and remove the hot unplugged ones, which is exactly what spapr_drc_reset() does. Also clear all hot plug events that are currently queued since they're no longer relevant. Note that SLOF cannot currently populate hot plugged PCI bridges or PHBs at CAS. Until this limitation is lifted, SLOF will reset the machine when this scenario occurs : this will allow the FDT to be fully processed when SLOF is started again (ie. the same effect as the CAS reboot that would occur anyway without this patch). Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <158257222352.4102917.8984214333937947307.stgit@bahia.lan> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-03-17 09:41:14 +11:00
Paolo Bonzini	9e264985ff	Merge branch 'exec_rw_const_v4' of https://github.com/philmd/qemu into HEAD	2020-02-25 13:41:48 +01:00
Greg Kurz	4b63db1289	spapr: Don't use spapr_drc_needed() in CAS code We currently don't support hotplug of devices between boot and CAS. If this happens a CAS reboot is triggered. We detect this during CAS using the spapr_drc_needed() function which is essentially a VMStateDescription .needed callback. Even if the condition for CAS reboot happens to be the same as for DRC migration, it looks wrong to piggyback a migration helper for this. Introduce a helper with slightly more explicit name and use it in both CAS and DRC migration code. Since a subsequent patch will enhance this helper to cover the case of hot unplug, let's go for spapr_drc_transient(). While here convert spapr_hotplugged_dev_before_cas() to the "transient" wording as well. This doesn't change any behaviour. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <158169248180.3465937.9531405453362718771.stgit@bahia.lan> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-02-21 09:15:04 +11:00
Philippe Mathieu-Daudé	85eb7c18ee	Let cpu_[physical]_memory() calls pass a boolean 'is_write' argument Use an explicit boolean type. This commit was produced with the included Coccinelle script scripts/coccinelle/exec_rw_const. Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>	2020-02-20 14:47:08 +01:00
Greg Kurz	12b3868ead	spapr: Don't allow multiple active vCPUs at CAS According to the description of "ibm,client-architecture-support" that can found in LoPAPR "B.6.2.3 Root Node Methods": If multiple partition processors or threads are active at the time of the ibm,client-architecture-support method call, or an error is detected in the format of the ibm,architecture.vec structure, the err? boolean shall be TRUE; else FALSE. We certainly don't want to temper with the platform or with the PCR of the other vCPUs if they happen to be active. Ensure we have only one active vCPU and fail CAS otherwise. This is just for conformance and robustness, it doesn't fix any known bugs. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <157969867170.571404.12117797348882189656.stgit@bahia.lan> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-02-02 14:07:57 +11:00
Greg Kurz	cbd0d7f363	spapr: Fail CAS if option vector table cannot be parsed Most of the option vector helpers have assertions to check their arguments aren't null. The guest can provide an arbitrary address for the CAS structure that would result in such null arguments. Fail CAS with H_PARAMETER and print a warning instead of aborting QEMU. Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <157925255250.397143.10855183619366882459.stgit@bahia.lan> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-02-02 14:07:57 +11:00
David Gibson	d1d32d6255	spapr: Simplify ovec diff spapr_ovec_diff(ov, old, new) has somewhat complex semantics. ov is set to those bits which are in new but not old, and it returns as a boolean whether or not there are any bits in old but not new. It turns out that both callers only care about the second, not the first. This is basically equivalent to a bitmap subset operation, which is easier to understand and implement. So replace spapr_ovec_diff() with spapr_ovec_subset(). Cc: Mike Roth <mdroth@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Cedric Le Goater <clg@fr.ibm.com>	2019-12-17 10:39:48 +11:00
David Gibson	0c21e07354	spapr: Fold h_cas_compose_response() into h_client_architecture_support() spapr_h_cas_compose_response() handles the last piece of the PAPR feature negotiation process invoked via the ibm,client-architecture-support OF call. Its only caller is h_client_architecture_support() which handles most of the rest of that process. I believe it was placed in a separate file originally to handle some fiddly dependencies between functions, but mostly it's just confusing to have the CAS process split into two pieces like this. Now that compose response is simplified (by just generating the whole device tree anew), it's cleaner to just fold it into h_client_architecture_support(). Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Cedric Le Goater <clg@fr.ibm.com> Reviewed-by: Greg Kurz <groug@kaod.org>	2019-12-17 10:39:48 +11:00
David Gibson	8deb8019d6	spapr: Don't trigger a CAS reboot for XICS/XIVE mode changeover PAPR allows the interrupt controller used on a POWER9 machine (XICS or XIVE) to be selected by the guest operating system, by using the ibm,client-architecture-support (CAS) feature negotiation call. Currently, if the guest selects an interrupt controller different from the one selected at initial boot, this causes the system to be reset with the new model and the boot starts again. This means we run through the SLOF boot process twice, as well as any other bootloader (e.g. grub) in use before the OS calls CAS. This can be confusing and/or inconvenient for users. Thanks to two fairly recent changes, we no longer need this reboot. 1) we now completely regenerate the device tree when CAS is called (meaning we don't need special case updates for all the device tree changes caused by the interrupt controller mode change), 2) we now have explicit code paths to activate and deactivate the different interrupt controllers, rather than just implicitly calling those at machine reset time. We can therefore eliminate the reboot for changing irq mode, simply by putting a call to spapr_irq_update_active_intc() before we call spapr_h_cas_compose_response() (which gives the updated device tree to the guest firmware and OS). Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Cedric Le Goater <clg@fr.ibm.com> Reviewed-by: Greg Kurz <groug@kaod.org>	2019-12-17 10:39:48 +11:00
David Gibson	ca62823b79	spapr: Use less cryptic representation of which irq backends are supported SpaprIrq::ov5 stores the value for a particular byte in PAPR option vector 5 which indicates whether XICS, XIVE or both interrupt controllers are available. As usual for PAPR, the encoding is kind of overly complicated and confusing (though to be fair there are some backwards compat things it has to handle). But to make our internal code clearer, have SpaprIrq encode more directly which backends are available as two booleans, and derive the OV5 value from that at the point we need it. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org>	2019-10-04 19:08:23 +10:00
David Gibson	daa36379ce	spapr: Simplify handling of pre ISA 3.0 guest workaround handling Certain old guest versions don't understand the radix MMU introduced with POWER ISA 3.0, but incorrectly select it if presented with the option at CAS time. We workaround this in qemu by explicitly excluding the radix (and other ISA 3.0 linked) options if the guest doesn't explicitly note support for ISA 3.0. This is handled by the 'cas_legacy_guest_workaround' flag, which is pretty vague. Rename it to 'cas_pre_isa3_guest' to be clearer about what it's for. In addition, we unnecessarily call spapr_populate_pa_features() with different options when initially constructing the device tree and when adjusting it at CAS time. At the initial construct time cas_pre_isa3_guest is already false, so we can still use the flag, rather than explicitly overriding it to be false at the callsite. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>	2019-10-04 10:25:23 +10:00
David Gibson	9146206eb2	spapr: Use SHUTDOWN_CAUSE_SUBSYSTEM_RESET for CAS reboots The sPAPR platform includes feature negotiation between the guest and platform. That sometimes requires reconfiguring the virtual hardware, and in some cases that is a complex enough process that we trigger a system reset to handle it. That interacts badly with -no-reboot - we trigger the reboot, -no-reboot means we exit and so the guest never gets to try again. Eventually we want to get rid of CAS reboots entirely, since they're odd and irritating for the user. But in the meantime we can fix the -no-reboot problem by using SHUTDOWN_CAUSE_SUBSYSTEM_RESET which ignores -no-reboot and seems to be designed for this sort of faux-reset for internal purposes only. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-08-29 09:46:07 +10:00
Michael Roth	0fb6bd0732	spapr: initial implementation for H_TPM_COMM/spapr-tpm-proxy This implements the H_TPM_COMM hypercall, which is used by an Ultravisor to pass TPM commands directly to the host's TPM device, or a TPM Resource Manager associated with the device. This also introduces a new virtual device, spapr-tpm-proxy, which is used to configure the host TPM path to be used to service requests sent by H_TPM_COMM hcalls, for example: -device spapr-tpm-proxy,id=tpmp0,host-path=/dev/tpmrm0 By default, no spapr-tpm-proxy will be created, and hcalls will return H_FUNCTION. The full specification for this hypercall can be found in docs/specs/ppc-spapr-uv-hcalls.txt Since SVM-related hcalls like H_TPM_COMM use a reserved range of 0xEF00-0xEF80, we introduce a separate hcall table here to handle them. Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com Message-Id: <20190717205842.17827-3-mdroth@linux.vnet.ibm.com> [dwg: Corrected #include for upstream change] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-08-21 17:17:12 +10:00
Nicholas Piggin	107413142b	spapr: Implement H_JOIN This has been useful to modify and test the Linux pseries suspend code but it requires modification to the guest to call it (due to being gated by other unimplemented features). It is not otherwise used by Linux yet, but work is slowly progressing there. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Message-Id: <20190718034214.14948-5-npiggin@gmail.com> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-08-21 17:17:12 +10:00
Nicholas Piggin	e8ce0e40ee	spapr: Implement H_CONFER This does not do directed yielding and is not quite as strict as PAPR specifies in terms of precise dispatch behaviour. This generally will mean suboptimal performance, rather than guest misbehaviour. Linux does not rely on exact dispatch behaviour. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Message-Id: <20190718034214.14948-4-npiggin@gmail.com> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-08-21 17:17:12 +10:00
Nicholas Piggin	3a6e6224a9	spapr: Implement H_PROD H_PROD is added, and H_CEDE is modified to test the prod bit according to PAPR. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Message-Id: <20190718034214.14948-3-npiggin@gmail.com> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-08-21 17:17:12 +10:00
Nicholas Piggin	03ef074c04	spapr: Implement dispatch tracking for tcg Implement cpu_exec_enter/exit on ppc which calls into new methods of the same name in PPCVirtualHypervisorClass. These are used by spapr to implement the splpar VPA dispatch counter initially. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Message-Id: <20190718034214.14948-2-npiggin@gmail.com> [dwg: Removed unnecessary CONFIG_USER_ONLY checks as suggested by gkurz] Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-08-21 17:17:11 +10:00
Shivaprasad G Bhat	00005f2229	ppc: fix leak in h_client_architecture_support Free all SpaprOptionVector local pointers after use. Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com> Message-Id: <156335160761.82682.11912058325777251614.stgit@lep8c.aus.stglabs.ibm.com> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-08-21 17:17:11 +10:00
Markus Armbruster	54d31236b9	sysemu: Split sysemu/runstate.h off sysemu/sysemu.h sysemu/sysemu.h is a rather unfocused dumping ground for stuff related to the system-emulator. Evidence: * It's included widely: in my "build everything" tree, changing sysemu/sysemu.h still triggers a recompile of some 1100 out of 6600 objects (not counting tests and objects that don't depend on qemu/osdep.h, down from 5400 due to the previous two commits). * It pulls in more than a dozen additional headers. Split stuff related to run state management into its own header sysemu/runstate.h. Touching sysemu/sysemu.h now recompiles some 850 objects. qemu/uuid.h also drops from 1100 to 850, and qapi/qapi-types-run-state.h from 4400 to 4200. Touching new sysemu/runstate.h recompiles some 500 objects. Since I'm touching MAINTAINERS to add sysemu/runstate.h anyway, also add qemu/main-loop.h. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20190812052359.30071-30-armbru@redhat.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> [Unbreak OS-X build]	2019-08-16 13:37:36 +02:00
Markus Armbruster	db72581598	Include qemu/main-loop.h less In my "build everything" tree, changing qemu/main-loop.h triggers a recompile of some 5600 out of 6600 objects (not counting tests and objects that don't depend on qemu/osdep.h). It includes block/aio.h, which in turn includes qemu/event_notifier.h, qemu/notify.h, qemu/processor.h, qemu/qsp.h, qemu/queue.h, qemu/thread-posix.h, qemu/thread.h, qemu/timer.h, and a few more. Include qemu/main-loop.h only where it's needed. Touching it now recompiles only some 1700 objects. For block/aio.h and qemu/event_notifier.h, these numbers drop from 5600 to 2800. For the others, they shrink only slightly. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20190812052359.30071-21-armbru@redhat.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>	2019-08-16 13:31:52 +02:00
Markus Armbruster	0b8fa32f55	Include qemu/module.h where needed, drop it from qemu-common.h Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20190523143508.25387-4-armbru@redhat.com> [Rebased with conflicts resolved automatically, except for hw/usb/dev-hub.c hw/misc/exynos4210_rng.c hw/misc/bcm2835_rng.c hw/misc/aspeed_scu.c hw/display/virtio-vga.c hw/arm/stm32f205_soc.c; ui/cocoa.m fixed up]	2019-06-12 13:18:33 +02:00
Greg Kurz	75de59416d	spapr: Print out extra hints when CAS negotiation of interrupt mode fails Let's suggest to the user how the machine should be configured to allow the guest to boot successfully. Suggested-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com> Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <155799221739.527449.14907564571096243745.stgit@bahia.lan> Reviewed-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com> Tested-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com> [dwg: Adjusted for style error] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-05-29 11:39:45 +10:00
Greg Kurz	e7f78db9fb	spapr/xive: Sanity checks of OV5 during CAS If a machine is started with ic-mode=xive but the guest only knows about XICS, eg. an RHEL 7.6 guest, the kernel panics. This is expected but a bit unfortunate since the crash doesn't provide much information for the end user to guess what's happening. Detect that during CAS and exit QEMU with a proper error message instead, like it is already done for the MMU. Even if this is less likely to happen, the opposite case of a guest that only knows about XIVE would certainly fail all the same if the machine is started with ic-mode=xics. Also, the only valid values a guest can pass in byte 23 of OV5 during CAS are 0b00 (XIVE legacy mode) and 0b01 (XIVE exploitation mode). Any other value is a bug, at least with the current spec. Again, it does not seem right to let the guest go on without a precise idea of the interrupt mode it asked for. Handle these cases as well. Reported-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com> Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <155793986451.464434.12887933000007255549.stgit@bahia.lan> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-05-29 11:39:45 +10:00
Benjamin Herrenschmidt	a2dd4e83e7	ppc/hash64: Rework R and C bit updates With MT-TCG, we are now running translation in a racy way, thus we need to mimic hardware when it comes to updating the R and C bits, by doing byte stores. The current "store_hpte" abstraction is ill suited for this, we replace it with two separate callbacks for setting R and C. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20190411080004.8690-4-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-04-26 11:37:57 +10:00

1 2 3 4

177 Commits