mirrors/qemu - qemu - SynapseOS git

Author	SHA1	Message	Date
Greg Kurz	f29b959dc6	spapr: Convert hpt_prepare_thread() to use qemu_try_memalign() HPT resizing is asynchronous: the guest first kicks off the creation of a new HPT, then it waits for that new HPT to be actually created and finally it asks the current HPT to be replaced by the new one. In the case of a userland allocated HPT, this currently relies on calling qemu_memalign() which aborts on OOM and never returns NULL. Since we seem to have path to report the failure to the guest with an H_NO_MEM return value, use qemu_try_memalign() instead of qemu_memalign(). Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <160398563636.32380.1747166034877173994.stgit@bahia.lan> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-11-05 12:18:48 +11:00
Greg Kurz	7e92da81be	spapr: Simplify error handling in do_client_architecture_support() Use the return value of ppc_set_compat_all() to check failures, which is preferred over hijacking local_err. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <20200914123505.612812-7-groug@kaod.org> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-10-09 10:15:06 +11:00
Greg Kurz	121afbe487	spapr: Get rid of cas_check_pvr() error reporting The cas_check_pvr() function has two purposes: - finding the "best" logical PVR, ie. the most recent one supported by the guest for this CPU type - checking if the guest supports the real PVR of this CPU type, which is just an optional extra information to workaround the lack of support for "compat" mode in PR KVM This logic doesn't need error reporting, really. If we don't find a suitable logical PVR, we return the special value 0 which is definitely not a valid PVR. Let the caller decide on whether it should error out or not. This doesn't change the behavior. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <20200914123505.612812-6-groug@kaod.org> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-10-09 10:15:06 +11:00
Daniel Henrique Barboza	f8a13fc381	spapr: move h_home_node_associativity to spapr_numa.c The implementation of this hypercall will be modified to use spapr->numa_assoc_arrays input. Moving it to spapr_numa.c makes make more sense. Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20200904172422.617460-2-danielhb413@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-09-08 10:37:32 +10:00
Markus Armbruster	552d7f49ee	qom: Crash more nicely on object_property_get_link() failure Pass &error_abort instead of NULL where the returned value is dereferenced or asserted to be non-null. Drop a now redundant assertion. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200707160613.848843-24-armbru@redhat.com>	2020-07-10 15:18:08 +02:00
Greg Kurz	087820e37f	spapr: Drop CAS reboot flag The CAS reboot flag is false by default and all the locations that could set it to true have been dropped. This means that all code blocks depending on the flag being set is dead code and the other code blocks should be executed always. Just do that and drop the now uneeded CAS reboot flag. Fix a comment on the way to make checkpatch happy. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <158514994893.478799.11772512888322840990.stgit@bahia.lan> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-05-07 11:10:50 +10:00
Alexey Kardashevskiy	91067db1ab	spapr/cas: Separate CAS handling from rebuilding the FDT At the moment "ibm,client-architecture-support" ("CAS") is implemented in SLOF and QEMU assists via the custom H_CAS hypercall which copies an updated flatten device tree (FDT) blob to the SLOF memory which it then uses to update its internal tree. When we enable the OpenFirmware client interface in QEMU, we won't need to copy the FDT to the guest as the client is expected to fetch the device tree using the client interface. This moves FDT rebuild out to a separate helper which is going to be called from the "ibm,client-architecture-support" handler and leaves writing FDT to the guest in the H_CAS handler. This should not cause any behavioral change. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Message-Id: <20200310050733.29805-3-aik@ozlabs.ru> Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <158514994229.478799.2178881312094922324.stgit@bahia.lan> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-05-07 11:10:50 +10:00
Greg Kurz	b4b83312e7	spapr: Simplify selection of radix/hash during CAS The guest can select the MMU mode by setting bits 0-1 of byte 24 in OV5 to to 0b00 for hash or 0b01 for radix. As required by the architecture, we terminate the boot process if any other value is found there. The usual way to negotiate features in OV5 is basically ANDing the bitfield provided by the guest and the bitfield of features supported by QEMU, previously populated at machine init. For some not documented reason, MMU is treated differently : bit 1 of byte 24 (the radix/hash bit) is cleared from the guest OV5 and explicitely set in the final negotiated OV5 if radix was requested. Since the only expected input from the guest is the radix/hash bit being set or not, it seems more appropriate to handle this like we do for XIVE. Set the radix bit in spapr->ov5 at machine init if it has a chance to work (ie. power9, either TCG or a radix capable KVM) and rely exclusively on spapr_ovec_intersect() to set the radix bit in spapr->ov5_cas. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <158514993621.478799.4204740354545734293.stgit@bahia.lan> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-05-07 11:10:50 +10:00
Greg Kurz	86962462f8	spapr: Don't check capabilities removed between CAS calls We currently check if some capability in OV5 was removed by the guest since the previous CAS, and we trigger a CAS reboot in that case. This was required because it could call for a device-tree property or node removal, that we didn't support until recently (see commit `6787d27b04` "spapr: add option vector handling in CAS-generated resets" for details). Now that we render a full FDT at CAS and that SLOF is able to handle node removal, we don't need to do a CAS reset in this case anymore. Also, this check can only return true if the guest has already called CAS since the last full system reset (otherwise spapr->ov5_cas is empty). Linux doesn't do that so this can be considered as dead code for the vast majority of existing setups. Drop the check. Since the only use of the ov5_cas_old variable is precisely the check itself, drop the variable as well. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <158514993021.478799.10928618293640651819.stgit@bahia.lan> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-05-07 11:10:50 +10:00
Greg Kurz	ce05fa0fcc	spapr: Fix memory leak in h_client_architecture_support() This is the only error path that needs to free the previously allocated ov1. Reported-by: Coverity (CID 1421924) Fixes: `cbd0d7f363` "spapr: Fail CAS if option vector table cannot be parsed" Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <158481206205.336182.16106097429336044843.stgit@bahia.lan> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>	2020-03-24 11:56:37 +11:00
David Gibson	8897ea5a9f	spapr: Don't attempt to clamp RMA to VRMA constraint The Real Mode Area (RMA) is the part of memory which a guest can access when in real (MMU off) mode. Of course, for a guest under KVM, the MMU isn't really turned off, it's just in a special translation mode - Virtual Real Mode Area (VRMA) - which looks like real mode in guest mode. The mechanics of how this works when using the hash MMU (HPT) put a constraint on the size of the RMA, which depends on the size of the HPT. So, the latter part of spapr_setup_hpt_and_vrma() clamps the RMA we advertise to the guest based on this VRMA limit. There are several things wrong with this: 1) spapr_setup_hpt_and_vrma() doesn't actually clamp, it takes the minimum of Node 0 memory size and the VRMA limit. That will often work the same as clamping, but there can be other constraints on RMA size which supersede Node 0 memory size. We have real bugs caused by this (currently worked around in the guest kernel) 2) Some callers of spapr_setup_hpt_and_vrma() are in a situation where we're past the point that we can actually advertise an RMA limit to the guest 3) But most fundamentally, the VRMA limit depends on host configuration (page size) which shouldn't be visible to the guest, but this partially exposes it. This can cause problems with migration in certain edge cases, although we will mostly get away with it. In practice, this clamping is almost never applied anyway. With 64kiB pages and the normal rules for sizing of the HPT, the theoretical VRMA limit will be 4x(guest memory size) and so never hit. It will hit with 4kiB pages, where it will be (guest memory size)/4. However all mainstream distro kernels for POWER have used a 64kiB page size for at least 10 years. So, simply replace this logic with a check that the RMA we've calculated based only on guest visible configuration will fit within the host implied VRMA limit. This can break if running HPT guests on a host kernel with 4kiB page size. As noted that's very rare. There also exist several possible workarounds: * Change the host kernel to use 64kiB pages * Use radix MMU (RPT) guests instead of HPT * Use 64kiB hugepages on the host to back guest memory * Increase the guest memory size so that the RMA hits one of the fixed limits before the RMA limit. This is relatively easy on POWER8 which has a 16GiB limit, harder on POWER9 which has a 1TiB limit. * Use a guest NUMA configuration which artificially constrains the RMA within the VRMA limit (the RMA must always fit within Node 0). Previously, on KVM, we also temporarily reduced the rma_size to 256M so that the we'd load the kernel and initrd safely, regardless of the VRMA limit. This was a) confusing, b) could significantly limit the size of images we could load and c) introduced a behavioural difference between KVM and TCG. So we remove that as well. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Greg Kurz <groug@kaod.org>	2020-03-17 09:41:15 +11:00
Greg Kurz	ad334d89a6	spapr: Handle pending hot plug/unplug requests at CAS If a hot plug or unplug request is pending at CAS, we currently trigger a CAS reboot, which severely increases the guest boot time. This is because SLOF doesn't handle hot plug events and we had no way to fix the FDT that gets presented to the guest. We can do better thanks to recent changes in QEMU and SLOF: - we now return a full FDT to SLOF during CAS - SLOF was fixed to correctly detect any device that was either added or removed since boot time and to update its internal DT accordingly. The right solution is to process all pending hot plug/unplug requests during CAS: convert hot plugged devices to cold plugged devices and remove the hot unplugged ones, which is exactly what spapr_drc_reset() does. Also clear all hot plug events that are currently queued since they're no longer relevant. Note that SLOF cannot currently populate hot plugged PCI bridges or PHBs at CAS. Until this limitation is lifted, SLOF will reset the machine when this scenario occurs : this will allow the FDT to be fully processed when SLOF is started again (ie. the same effect as the CAS reboot that would occur anyway without this patch). Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <158257222352.4102917.8984214333937947307.stgit@bahia.lan> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-03-17 09:41:14 +11:00
Paolo Bonzini	9e264985ff	Merge branch 'exec_rw_const_v4' of https://github.com/philmd/qemu into HEAD	2020-02-25 13:41:48 +01:00
Greg Kurz	4b63db1289	spapr: Don't use spapr_drc_needed() in CAS code We currently don't support hotplug of devices between boot and CAS. If this happens a CAS reboot is triggered. We detect this during CAS using the spapr_drc_needed() function which is essentially a VMStateDescription .needed callback. Even if the condition for CAS reboot happens to be the same as for DRC migration, it looks wrong to piggyback a migration helper for this. Introduce a helper with slightly more explicit name and use it in both CAS and DRC migration code. Since a subsequent patch will enhance this helper to cover the case of hot unplug, let's go for spapr_drc_transient(). While here convert spapr_hotplugged_dev_before_cas() to the "transient" wording as well. This doesn't change any behaviour. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <158169248180.3465937.9531405453362718771.stgit@bahia.lan> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-02-21 09:15:04 +11:00
Philippe Mathieu-Daudé	85eb7c18ee	Let cpu_[physical]_memory() calls pass a boolean 'is_write' argument Use an explicit boolean type. This commit was produced with the included Coccinelle script scripts/coccinelle/exec_rw_const. Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>	2020-02-20 14:47:08 +01:00
Greg Kurz	12b3868ead	spapr: Don't allow multiple active vCPUs at CAS According to the description of "ibm,client-architecture-support" that can found in LoPAPR "B.6.2.3 Root Node Methods": If multiple partition processors or threads are active at the time of the ibm,client-architecture-support method call, or an error is detected in the format of the ibm,architecture.vec structure, the err? boolean shall be TRUE; else FALSE. We certainly don't want to temper with the platform or with the PCR of the other vCPUs if they happen to be active. Ensure we have only one active vCPU and fail CAS otherwise. This is just for conformance and robustness, it doesn't fix any known bugs. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <157969867170.571404.12117797348882189656.stgit@bahia.lan> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-02-02 14:07:57 +11:00
Greg Kurz	cbd0d7f363	spapr: Fail CAS if option vector table cannot be parsed Most of the option vector helpers have assertions to check their arguments aren't null. The guest can provide an arbitrary address for the CAS structure that would result in such null arguments. Fail CAS with H_PARAMETER and print a warning instead of aborting QEMU. Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <157925255250.397143.10855183619366882459.stgit@bahia.lan> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-02-02 14:07:57 +11:00
David Gibson	d1d32d6255	spapr: Simplify ovec diff spapr_ovec_diff(ov, old, new) has somewhat complex semantics. ov is set to those bits which are in new but not old, and it returns as a boolean whether or not there are any bits in old but not new. It turns out that both callers only care about the second, not the first. This is basically equivalent to a bitmap subset operation, which is easier to understand and implement. So replace spapr_ovec_diff() with spapr_ovec_subset(). Cc: Mike Roth <mdroth@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Cedric Le Goater <clg@fr.ibm.com>	2019-12-17 10:39:48 +11:00
David Gibson	0c21e07354	spapr: Fold h_cas_compose_response() into h_client_architecture_support() spapr_h_cas_compose_response() handles the last piece of the PAPR feature negotiation process invoked via the ibm,client-architecture-support OF call. Its only caller is h_client_architecture_support() which handles most of the rest of that process. I believe it was placed in a separate file originally to handle some fiddly dependencies between functions, but mostly it's just confusing to have the CAS process split into two pieces like this. Now that compose response is simplified (by just generating the whole device tree anew), it's cleaner to just fold it into h_client_architecture_support(). Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Cedric Le Goater <clg@fr.ibm.com> Reviewed-by: Greg Kurz <groug@kaod.org>	2019-12-17 10:39:48 +11:00
David Gibson	8deb8019d6	spapr: Don't trigger a CAS reboot for XICS/XIVE mode changeover PAPR allows the interrupt controller used on a POWER9 machine (XICS or XIVE) to be selected by the guest operating system, by using the ibm,client-architecture-support (CAS) feature negotiation call. Currently, if the guest selects an interrupt controller different from the one selected at initial boot, this causes the system to be reset with the new model and the boot starts again. This means we run through the SLOF boot process twice, as well as any other bootloader (e.g. grub) in use before the OS calls CAS. This can be confusing and/or inconvenient for users. Thanks to two fairly recent changes, we no longer need this reboot. 1) we now completely regenerate the device tree when CAS is called (meaning we don't need special case updates for all the device tree changes caused by the interrupt controller mode change), 2) we now have explicit code paths to activate and deactivate the different interrupt controllers, rather than just implicitly calling those at machine reset time. We can therefore eliminate the reboot for changing irq mode, simply by putting a call to spapr_irq_update_active_intc() before we call spapr_h_cas_compose_response() (which gives the updated device tree to the guest firmware and OS). Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Cedric Le Goater <clg@fr.ibm.com> Reviewed-by: Greg Kurz <groug@kaod.org>	2019-12-17 10:39:48 +11:00
David Gibson	ca62823b79	spapr: Use less cryptic representation of which irq backends are supported SpaprIrq::ov5 stores the value for a particular byte in PAPR option vector 5 which indicates whether XICS, XIVE or both interrupt controllers are available. As usual for PAPR, the encoding is kind of overly complicated and confusing (though to be fair there are some backwards compat things it has to handle). But to make our internal code clearer, have SpaprIrq encode more directly which backends are available as two booleans, and derive the OV5 value from that at the point we need it. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org>	2019-10-04 19:08:23 +10:00
David Gibson	daa36379ce	spapr: Simplify handling of pre ISA 3.0 guest workaround handling Certain old guest versions don't understand the radix MMU introduced with POWER ISA 3.0, but incorrectly select it if presented with the option at CAS time. We workaround this in qemu by explicitly excluding the radix (and other ISA 3.0 linked) options if the guest doesn't explicitly note support for ISA 3.0. This is handled by the 'cas_legacy_guest_workaround' flag, which is pretty vague. Rename it to 'cas_pre_isa3_guest' to be clearer about what it's for. In addition, we unnecessarily call spapr_populate_pa_features() with different options when initially constructing the device tree and when adjusting it at CAS time. At the initial construct time cas_pre_isa3_guest is already false, so we can still use the flag, rather than explicitly overriding it to be false at the callsite. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>	2019-10-04 10:25:23 +10:00
David Gibson	9146206eb2	spapr: Use SHUTDOWN_CAUSE_SUBSYSTEM_RESET for CAS reboots The sPAPR platform includes feature negotiation between the guest and platform. That sometimes requires reconfiguring the virtual hardware, and in some cases that is a complex enough process that we trigger a system reset to handle it. That interacts badly with -no-reboot - we trigger the reboot, -no-reboot means we exit and so the guest never gets to try again. Eventually we want to get rid of CAS reboots entirely, since they're odd and irritating for the user. But in the meantime we can fix the -no-reboot problem by using SHUTDOWN_CAUSE_SUBSYSTEM_RESET which ignores -no-reboot and seems to be designed for this sort of faux-reset for internal purposes only. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-08-29 09:46:07 +10:00
Michael Roth	0fb6bd0732	spapr: initial implementation for H_TPM_COMM/spapr-tpm-proxy This implements the H_TPM_COMM hypercall, which is used by an Ultravisor to pass TPM commands directly to the host's TPM device, or a TPM Resource Manager associated with the device. This also introduces a new virtual device, spapr-tpm-proxy, which is used to configure the host TPM path to be used to service requests sent by H_TPM_COMM hcalls, for example: -device spapr-tpm-proxy,id=tpmp0,host-path=/dev/tpmrm0 By default, no spapr-tpm-proxy will be created, and hcalls will return H_FUNCTION. The full specification for this hypercall can be found in docs/specs/ppc-spapr-uv-hcalls.txt Since SVM-related hcalls like H_TPM_COMM use a reserved range of 0xEF00-0xEF80, we introduce a separate hcall table here to handle them. Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com Message-Id: <20190717205842.17827-3-mdroth@linux.vnet.ibm.com> [dwg: Corrected #include for upstream change] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-08-21 17:17:12 +10:00
Nicholas Piggin	107413142b	spapr: Implement H_JOIN This has been useful to modify and test the Linux pseries suspend code but it requires modification to the guest to call it (due to being gated by other unimplemented features). It is not otherwise used by Linux yet, but work is slowly progressing there. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Message-Id: <20190718034214.14948-5-npiggin@gmail.com> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-08-21 17:17:12 +10:00
Nicholas Piggin	e8ce0e40ee	spapr: Implement H_CONFER This does not do directed yielding and is not quite as strict as PAPR specifies in terms of precise dispatch behaviour. This generally will mean suboptimal performance, rather than guest misbehaviour. Linux does not rely on exact dispatch behaviour. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Message-Id: <20190718034214.14948-4-npiggin@gmail.com> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-08-21 17:17:12 +10:00
Nicholas Piggin	3a6e6224a9	spapr: Implement H_PROD H_PROD is added, and H_CEDE is modified to test the prod bit according to PAPR. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Message-Id: <20190718034214.14948-3-npiggin@gmail.com> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-08-21 17:17:12 +10:00
Nicholas Piggin	03ef074c04	spapr: Implement dispatch tracking for tcg Implement cpu_exec_enter/exit on ppc which calls into new methods of the same name in PPCVirtualHypervisorClass. These are used by spapr to implement the splpar VPA dispatch counter initially. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Message-Id: <20190718034214.14948-2-npiggin@gmail.com> [dwg: Removed unnecessary CONFIG_USER_ONLY checks as suggested by gkurz] Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-08-21 17:17:11 +10:00
Shivaprasad G Bhat	00005f2229	ppc: fix leak in h_client_architecture_support Free all SpaprOptionVector local pointers after use. Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com> Message-Id: <156335160761.82682.11912058325777251614.stgit@lep8c.aus.stglabs.ibm.com> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-08-21 17:17:11 +10:00
Markus Armbruster	54d31236b9	sysemu: Split sysemu/runstate.h off sysemu/sysemu.h sysemu/sysemu.h is a rather unfocused dumping ground for stuff related to the system-emulator. Evidence: * It's included widely: in my "build everything" tree, changing sysemu/sysemu.h still triggers a recompile of some 1100 out of 6600 objects (not counting tests and objects that don't depend on qemu/osdep.h, down from 5400 due to the previous two commits). * It pulls in more than a dozen additional headers. Split stuff related to run state management into its own header sysemu/runstate.h. Touching sysemu/sysemu.h now recompiles some 850 objects. qemu/uuid.h also drops from 1100 to 850, and qapi/qapi-types-run-state.h from 4400 to 4200. Touching new sysemu/runstate.h recompiles some 500 objects. Since I'm touching MAINTAINERS to add sysemu/runstate.h anyway, also add qemu/main-loop.h. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20190812052359.30071-30-armbru@redhat.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> [Unbreak OS-X build]	2019-08-16 13:37:36 +02:00
Markus Armbruster	db72581598	Include qemu/main-loop.h less In my "build everything" tree, changing qemu/main-loop.h triggers a recompile of some 5600 out of 6600 objects (not counting tests and objects that don't depend on qemu/osdep.h). It includes block/aio.h, which in turn includes qemu/event_notifier.h, qemu/notify.h, qemu/processor.h, qemu/qsp.h, qemu/queue.h, qemu/thread-posix.h, qemu/thread.h, qemu/timer.h, and a few more. Include qemu/main-loop.h only where it's needed. Touching it now recompiles only some 1700 objects. For block/aio.h and qemu/event_notifier.h, these numbers drop from 5600 to 2800. For the others, they shrink only slightly. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20190812052359.30071-21-armbru@redhat.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>	2019-08-16 13:31:52 +02:00
Markus Armbruster	0b8fa32f55	Include qemu/module.h where needed, drop it from qemu-common.h Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20190523143508.25387-4-armbru@redhat.com> [Rebased with conflicts resolved automatically, except for hw/usb/dev-hub.c hw/misc/exynos4210_rng.c hw/misc/bcm2835_rng.c hw/misc/aspeed_scu.c hw/display/virtio-vga.c hw/arm/stm32f205_soc.c; ui/cocoa.m fixed up]	2019-06-12 13:18:33 +02:00
Greg Kurz	75de59416d	spapr: Print out extra hints when CAS negotiation of interrupt mode fails Let's suggest to the user how the machine should be configured to allow the guest to boot successfully. Suggested-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com> Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <155799221739.527449.14907564571096243745.stgit@bahia.lan> Reviewed-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com> Tested-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com> [dwg: Adjusted for style error] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-05-29 11:39:45 +10:00
Greg Kurz	e7f78db9fb	spapr/xive: Sanity checks of OV5 during CAS If a machine is started with ic-mode=xive but the guest only knows about XICS, eg. an RHEL 7.6 guest, the kernel panics. This is expected but a bit unfortunate since the crash doesn't provide much information for the end user to guess what's happening. Detect that during CAS and exit QEMU with a proper error message instead, like it is already done for the MMU. Even if this is less likely to happen, the opposite case of a guest that only knows about XIVE would certainly fail all the same if the machine is started with ic-mode=xics. Also, the only valid values a guest can pass in byte 23 of OV5 during CAS are 0b00 (XIVE legacy mode) and 0b01 (XIVE exploitation mode). Any other value is a bug, at least with the current spec. Again, it does not seem right to let the guest go on without a precise idea of the interrupt mode it asked for. Handle these cases as well. Reported-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com> Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <155793986451.464434.12887933000007255549.stgit@bahia.lan> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-05-29 11:39:45 +10:00
Benjamin Herrenschmidt	a2dd4e83e7	ppc/hash64: Rework R and C bit updates With MT-TCG, we are now running translation in a racy way, thus we need to mimic hardware when it comes to updating the R and C bits, by doing byte stores. The current "store_hpte" abstraction is ill suited for this, we replace it with two separate callbacks for setting R and C. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20190411080004.8690-4-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-04-26 11:37:57 +10:00
Benjamin Herrenschmidt	993aaf0c00	ppc/spapr: Use proper HPTE accessors for H_READ Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20190411080004.8690-3-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-04-26 11:37:57 +10:00
David Gibson	49e9fdd741	spapr: Correctly set LPCR[GTSE] in H_REGISTER_PROCESS_TABLE `176dccee` "target/ppc/spapr: Clear partition table entry when allocating hash table" reworked the H_REGISTER_PROCESS_TABLE hypercall, but unfortunately due to a small error no longer correctly sets the LPCR[GTSE] bit which allows the guest to directly execute (some types of) tlbie (TLB flush) instructions without involving the hypervisor. We got away with this, initially, because POWER9 did not have hypervisor mode enabled in its msr_mask, which meant we didn't actually run hypervisor privilege checks in TCG at all. However, `da874d90` "target/ppc: add HV support for POWER9" turned on HV support on POWER9 for the benefit of the powernv machine type. This exposed the earlier bug in H_REGISTER_PROCESS_TABLE, and causes guests which rely on LPCR[GTSE] (i.e. basically all of them) to crash during early boot when their first tlbie instruction causes an unexpected trap. Fixes: `176dccee` target/ppc/spapr: Clear partition table entry when allocating hash table Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org> Tested-by: Cleber Rosa <crosa@redhat.com>	2019-03-19 15:20:14 +11:00
David Gibson	ce2918cbc3	spapr: Use CamelCase properly The qemu coding standard is to use CamelCase for type and structure names, and the pseries code follows that... sort of. There are quite a lot of places where we bend the rules in order to preserve the capitalization of internal acronyms like "PHB", "TCE", "DIMM" and most commonly "sPAPR". That was a bad idea - it frequently leads to names ending up with hard to read clusters of capital letters, and means they don't catch the eye as type identifiers, which is kind of the point of the CamelCase convention in the first place. In short, keeping type identifiers look like CamelCase is more important than preserving standard capitalization of internal "words". So, this patch renames a heap of spapr internal type names to a more standard CamelCase. In addition to case changes, we also make some other identifier renames: VIOsPAPR* -> SpaprVio* The reverse word ordering was only ever used to mitigate the capital cluster, so revert to the natural ordering. VIOsPAPRVTYDevice -> SpaprVioVty VIOsPAPRVLANDevice -> SpaprVioVlan Brevity, since the "Device" didn't add useful information sPAPRDRConnector -> SpaprDrc sPAPRDRConnectorClass -> SpaprDrcClass Brevity, and makes it clearer this is the same thing as a "DRC" mentioned in many other places in the code This is 100% a mechanical search-and-replace patch. It will, however, conflict with essentially any and all outstanding patches touching the spapr code. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-03-12 14:33:05 +11:00
Suraj Jitindar Singh	176dcceedd	target/ppc/spapr: Clear partition table entry when allocating hash table If we allocate a hash page table then we know that the guest won't be using process tables, so set the partition table entry maintained for the guest to zero. If this isn't done, then the guest radix bit will remain set in the entry. This means that when the guest calls H_REGISTER_PROCESS_TABLE there will be a mismatch between then flags and the value in spapr->patb_entry, and the call will fail. The guest will then panic: Failed to register process table (rc=-4) kernel BUG at arch/powerpc/platforms/pseries/lpar.c:959 The result being that it isn't possible to boot a hash guest on a P9 system. Also fix a bug in the flags parsing in h_register_process_table() which was introduced by the same patch, and simplify the handling to make it less likely that errors will be introduced in the future. The effect would have been setting the host radix bit LPCR_HR for a hash guest using process tables, which currently isn't supported and so couldn't have been triggered. Fixes: `00fd075e18` "target/ppc/spapr: Set LPCR:HR when using Radix mode" Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Message-Id: <20190305022102.17610-1-sjitindarsingh@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-03-12 14:33:04 +11:00
Suraj Jitindar Singh	8ff43ee404	target/ppc/spapr: Add SPAPR_CAP_CCF_ASSIST Introduce a new spapr_cap SPAPR_CAP_CCF_ASSIST to be used to indicate the requirement for a hw-assisted version of the count cache flush workaround. The count cache flush workaround is a software workaround which can be used to flush the count cache on context switch. Some revisions of hardware may have a hardware accelerated flush, in which case the software flush can be shortened. This cap is used to set the availability of such hardware acceleration for the count cache flush routine. The availability of such hardware acceleration is indicated by the H_CPU_CHAR_BCCTR_FLUSH_ASSIST flag being set in the characteristics returned from the KVM_PPC_GET_CPU_CHAR ioctl. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Message-Id: <20190301031912.28809-2-sjitindarsingh@gmail.com> [dwg: Small style fixes] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-03-12 12:07:49 +11:00
Suraj Jitindar Singh	399b2896d4	target/ppc/spapr: Add workaround option to SPAPR_CAP_IBS The spapr_cap SPAPR_CAP_IBS is used to indicate the level of capability for mitigations for indirect branch speculation. Currently the available values are broken (default), fixed-ibs (fixed by serialising indirect branches) and fixed-ccd (fixed by diabling the count cache). Introduce a new value for this capability denoted workaround, meaning that software can work around the issue by flushing the count cache on context switch. This option is available if the hypervisor sets the H_CPU_BEHAV_FLUSH_COUNT_CACHE flag in the cpu behaviours returned from the KVM_PPC_GET_CPU_CHAR ioctl. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Message-Id: <20190301031912.28809-1-sjitindarsingh@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-03-12 12:07:49 +11:00
Benjamin Herrenschmidt	79825f4d58	target/ppc: Rename PATB/PATBE -> PATE That "b" means "base address" and thus shouldn't be in the name of actual entries and related constants. This patch keeps the synthetic patb_entry field of the spapr virtual hypervisor unchanged until I figure out if that has an impact on the migration stream. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20190215170029.15641-11-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-02-26 09:21:25 +11:00
Benjamin Herrenschmidt	00fd075e18	target/ppc/spapr: Set LPCR:HR when using Radix mode The HW relies on LPCR:HR along with the PATE to determine whether to use Radix or Hash mode. In fact it uses LPCR:HR more commonly than the PATE. For us, it's also more efficient to do so, especially since unlike the HW we do not maintain a cache of the current PATE and HV PATE in a generic place. Prepare the grounds for that by ensuring that LPCR:HR is set properly on SPAPR machines. Another option would have been to use a callback to get the PATE but this gets messy when implementing bare metal support, it's much simpler (and faster) to use LPCR. Since existing migration streams may not have it, fix it up in spapr_post_load() as well based on the pseudo-PATE entry that we keep. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20190215170029.15641-2-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-02-26 09:21:25 +11:00
Cédric Le Goater	13db0cd9b8	spapr: introduce a new sPAPR IRQ backend supporting XIVE and XICS The 'dual' sPAPR IRQ backend supports both interrupt mode, XIVE exploitation mode and the legacy compatibility mode (XICS). both modes are not supported at the same time. The machine starts with the legacy mode and a new interrupt mode can then be negotiated by the CAS process. In this case, the new mode is activated after a reset to take into account the required changes in the machine. These impact the device tree layout, the interrupt presenter object and the exposed MMIO regions in the case of XIVE. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-01-09 09:28:14 +11:00
Alexey Kardashevskiy	fea35ca4b8	ppc/spapr: Receive and store device tree blob from SLOF SLOF receives a device tree and updates it with various properties before switching to the guest kernel and QEMU is not aware of any changes made by SLOF. Since there is no real RTAS (QEMU implements it), it makes sense to pass the SLOF final device tree to QEMU to let it implement RTAS related tasks better, such as PCI host bus adapter hotplug. Specifially, now QEMU can find out the actual XICS phandle (for PHB hotplug) and the RTAS linux,rtas-entry/base properties (for firmware assisted NMI - FWNMI). This stores the initial DT blob in the sPAPR machine and replaces it in the KVMPPC_H_UPDATE_DT (new private hypercall) handler. This adds an @update_dt_enabled machine property to allow backward migration. SLOF already has a hypercall since https://github.com/aik/SLOF/commit/e6fc84652c9c0073f9183 This makes use of the new fdt_check_full() helper. In order to allow the configure script to pick the correct DTC version, this adjusts the DTC presense test. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-01-09 09:28:13 +11:00
Laurent Vivier	c24ba3d0a3	spapr: Add H-Call H_HOME_NODE_ASSOCIATIVITY H_HOME_NODE_ASSOCIATIVITY H-Call returns the associativity domain designation associated with the identifier input parameter This fixes a crash when we try to hotplug a CPU in memory-less and CPU-less numa node. In this case, the kernel tries to online the node, but without the information provided by this h-call, the node id, it cannot and the CPU is started while the node is not onlined. It also removes the warning message from the kernel: VPHN is not supported. Disabling polling.. Signed-off-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-01-09 09:28:13 +11:00
David Gibson	7388efafc2	target/ppc, spapr: Move VPA information to machine_data CPUPPCState currently contains a number of fields containing the state of the VPA. The VPA is a PAPR specific concept covering several guest/host shared memory areas used to communicate some information with the hypervisor. As a PAPR concept this is really machine specific information, although it is per-cpu, so it doesn't really belong in the core CPU state structure. There's also other information that's per-cpu, but platform/machine specific. So create a (void *)machine_data in PowerPCCPU which can be used by the machine to locate per-cpu data. Intialization, lifetime and cleanup of machine_data is entirely up to the machine type. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org> Tested-by: Greg Kurz <groug@kaod.org>	2018-06-16 16:32:50 +10:00
Greg Kurz	2c9dfdacc5	spapr: fix leak in h_client_architecture_support() If the negotiated compat mode can't be set, but raw mode is supported, we decide to ignore the error. An so, we should free it to prevent a memory leak. Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2018-06-16 16:32:33 +10:00
David Hildenbrand	e017da370b	machine: rename MemoryHotplugState to DeviceMemoryState Rename it to better match the new terminology. Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20180423165126.15441-9-david@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2018-05-07 10:00:02 -03:00
David Hildenbrand	b0c14ec4ef	machine: make MemoryHotplugState accessible via the machine Let's allow to query the MemoryHotplugState directly from the machine. If the pointer is NULL, the machine does not support memory devices. If the pointer is !NULL, the machine supports memory devices and the data structure contains information about the applicable physical guest address space region. This allows us to generically detect if a certain machine has support for memory devices, and to generically manage it (find free address range, plug/unplug a memory region). We will rename "MemoryHotplugState" to something more meaningful ("DeviceMemory") after we completed factoring out the pc-dimm code into MemoryDevice code. Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20180423165126.15441-3-david@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> [ehabkost: rebased series, solved conflicts at spapr.c] [ehabkost: squashed fix to use g_malloc0()] Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2018-05-07 10:00:02 -03:00

1 2 3 4

162 Commits