mirrors/qemu - qemu - SynapseOS git

Author	SHA1	Message	Date
David Gibson	2d1fb9bc8e	spapr: Handle Decimal Floating Point (DFP) as an optional capability Decimal Floating Point has been available on POWER7 and later (server) cpus. However, it can be disabled on the hypervisor, meaning that it's not available to guests. We currently handle this by conditionally advertising DFP support in the device tree depending on whether the guest CPU model supports it - which can also depend on what's allowed in the host for -cpu host. That can lead to confusion on migration, since host properties are silently affecting guest visible properties. This patch handles it by treating it as an optional capability for the pseries machine type. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org>	2018-01-17 09:35:24 +11:00
David Gibson	2938664286	spapr: Handle VMX/VSX presence as an spapr capability flag We currently have some conditionals in the spapr device tree code to decide whether or not to advertise the availability of the VMX (aka Altivec) and VSX vector extensions to the guest, based on whether the guest cpu has those features. This can lead to confusion and subtle failures on migration, since it makes a guest visible change based only on host capabilities. We now have a better mechanism for this, in spapr capabilities flags, which explicitly depend on user options rather than host capabilities. Rework the advertisement of VSX and VMX based on a new VSX capability. We no longer bother with a conditional for VMX support, because every CPU that's ever been supported by the pseries machine type supports VMX. NOTE: Some userspace distributions (e.g. RHEL7.4) already rely on availability of VSX in libc, so using cap-vsx=off may lead to a fatal SIGILL in init. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org>	2018-01-17 09:35:24 +11:00
David Gibson	be85537d65	spapr: Validate capabilities on migration Now that the "pseries" machine type implements optional capabilities (well, one so far) there's the possibility of having different capabilities available at either end of a migration. Although arguably a user error, it would be nice to catch this situation and fail as gracefully as we can. This adds code to migrate the capabilities flags. These aren't pulled directly into the destination's configuration since what the user has specified on the destination command line should take precedence. However, they are checked against the destination capabilities. If the source was using a capability which is absent on the destination, we fail the migration, since that could easily cause a guest crash or other bad behaviour. If the source lacked a capability which is present on the destination we warn, but allow the migration to proceed. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org>	2018-01-17 09:35:24 +11:00
David Gibson	ee76a09fc7	spapr: Treat Hardware Transactional Memory (HTM) as an optional capability This adds an spapr capability bit for Hardware Transactional Memory. It is enabled by default for pseries-2.11 and earlier machine types. with POWER8 or later CPUs (as it must be, since earlier qemu versions would implicitly allow it). However it is disabled by default for the latest pseries-2.12 machine type. This means that with the latest machine type, HTM will not be available, regardless of CPU, unless it is explicitly enabled on the command line. That change is made on the basis that: * This way running with -M pseries,accel=tcg will start with whatever cpu and will provide the same guest visible model as with accel=kvm. - More specifically, this means existing make check tests don't have to be modified to use cap-htm=off in order to run with TCG * We hope to add a new "HTM without suspend" feature in the not too distant future which could work on both POWER8 and POWER9 cpus, and could be enabled by default. * Best guesses suggest that future POWER cpus may well only support the HTM-without-suspend model, not the (frankly, horribly overcomplicated) POWER8 style HTM with suspend. * Anecdotal evidence suggests problems with HTM being enabled when it wasn't wanted are more common than being missing when it was. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org>	2018-01-17 09:35:24 +11:00
David Gibson	33face6b89	spapr: Capabilities infrastructure Because PAPR is a paravirtual environment access to certain CPU (or other) facilities can be blocked by the hypervisor. PAPR provides ways to advertise in the device tree whether or not those features are available to the guest. In some places we automatically determine whether to make a feature available based on whether our host can support it, in most cases this is based on limitations in the available KVM implementation. Although we correctly advertise this to the guest, it means that host factors might make changes to the guest visible environment which is bad: as well as generaly reducing reproducibility, it means that a migration between different host environments can easily go bad. We've mostly gotten away with it because the environments considered mature enough to be well supported (basically, KVM on POWER8) have had consistent feature availability. But, it's still not right and some limitations on POWER9 is going to make it more of an issue in future. This introduces an infrastructure for defining "sPAPR capabilities". These are set by default based on the machine version, masked by the capabilities of the chosen cpu, but can be overriden with machine properties. The intention is at reset time we verify that the requested capabilities can be supported on the host (considering TCG, KVM and/or host cpu limitations). If not we simply fail, rather than silently modifying the advertised featureset to the guest. This does mean that certain configurations that "worked" may now fail, but such configurations were already more subtly broken. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org>	2018-01-17 09:35:24 +11:00
David Gibson	51f84465dd	spapr: Correct compatibility mode setting for hotplugged CPUs Currently the pseries machine sets the compatibility mode for the guest's cpus in two places: 1) at machine reset and 2) after CAS negotiation. This means that if we set or negotiate a compatiblity mode, then hotplug a cpu, the hotplugged cpu doesn't get the right mode set and will incorrectly have the full native features. To correct this, we set the compatibility mode on a cpu when it is brought online with the 'start-cpu' RTAS call. Given that we no longer need to set the compatibility mode on all CPUs at machine reset, so we change that to only set the mode for the boot cpu. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reported-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com> Tested-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>	2018-01-10 12:53:00 +11:00
Laurent Vivier	1481fe5fcf	spapr: don't initialize PATB entry if max-cpu-compat < power9 if KVM is enabled and KVM capabilities MMU radix is available, the partition table entry (patb_entry) for the radix mode is initialized by default in ppc_spapr_reset(). It's a problem if we want to migrate the guest to a POWER8 host while the kernel is not started to set the value to the one expected for a POWER8 CPU. The "-machine max-cpu-compat=power8" should allow to migrate a POWER9 KVM host to a POWER8 KVM host, but because patb_entry is set, the destination QEMU tries to enable radix mode on the POWER8 host. This fails and cancels the migration: Process table config unsupported by the host error while loading state for instance 0x0 of device 'spapr' load of migration failed: Invalid argument This patch doesn't set the PATB entry if the user provides a CPU compatibility mode that doesn't support radix mode. Signed-off-by: Laurent Vivier <lvivier@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-12-15 09:50:29 +11:00
David Gibson	4f441474c6	spapr: Assume msi_nonbroken We conditionally adjust part of the guest device tree based on the global msi_nonbroken flag. However, the main machine type code initializes msi_nonbroken to true and there's nothing that would set it to false again. So replace the test with an assert(). Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>	2017-12-15 09:49:24 +11:00
David Gibson	bcb5ce08cf	spapr: Rename machine init functions for clarity Machine objects have two init functions - the generic QOM level instance_init which should only do static object initialization, and the Machine specific MachineClass::init which does the actual construction of the machine. In spapr the functions implementing these two have names - ppc_machine_initfn() and ppc_spapr_init() - which don't correspond closely to either of those. To prevent people (read, me) from confusing which is which, rename them spapr_instance_init() and spapr_machine_init() to make it clearer which is which. While we're there rename ppc_spapr_reset() to spapr_machine_reset() to match. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>	2017-12-15 09:49:24 +11:00
Igor Mammedov	f47bd1c839	spapr: replace numa_get_node() with lookup in pc-dimm list SPAPR is the last user of numa_get_node() and a bunch of supporting code to maintain numa_info[x].addr list. Get LMB node id from pc-dimm list, which allows to remove ~80LOC maintaining dynamic address range lookup list. It also removes pc-dimm dependency on numa_[un]set_mem_node_id() and makes pc-dimms a sole source of information about which node it belongs to and removes duplicate data from global numa_info. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-12-15 09:49:24 +11:00
Cédric Le Goater	7718375584	spapr: introduce a spapr_qirq() helper xics_get_qirq() is only used by the sPAPR machine. Let's move it there and change its name to reflect its scope. It will be useful for XIVE support which will use its own set of qirqs. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-12-15 09:49:24 +11:00
Cédric Le Goater	9e7dc5fc2e	spapr: introduce a spapr_irq_set_lsi() helper It will make synchronisation easier with the XIVE interrupt mode when available. The 'irq' parameter refers to the global IRQ number space. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-12-15 09:49:24 +11:00
Cédric Le Goater	60c6823b9b	spapr: move the IRQ allocation routines under the machine Also change the prototype to use a sPAPRMachineState and prefix them with spapr_irq_. It will let us synchronise the IRQ allocation with the XIVE interrupt mode when available. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-12-15 09:49:24 +11:00
Greg Kurz	94ad93bd97	spapr_cpu_core: instantiate CPUs separately The current code assumes that only the CPU core object holds a reference on each individual CPU object, and happily frees their allocated memory when the core is unrealized. This is dangerous as some other code can legitimely keep a pointer to a CPU if it calls object_ref(), but it would end up with a dangling pointer. Let's allocate all CPUs with object_new() and let QOM free them when their reference count reaches zero. This greatly simplify the code as we don't have to fiddle with the instance size anymore. Signed-off-by: Greg Kurz <groug@kaod.org> Acked-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-12-15 09:49:23 +11:00
David Gibson	2b6154120c	spapr: Add pseries-2.12 machine type While we're at it fix a couple of small errors in the 2.11 and 2.10 models (they didn't have any real effect, but don't quite match the template). Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-12-15 09:49:23 +11:00
David Gibson	768a20f3a4	spapr: Include "pre-plugged" DIMMS in ram size calculation at reset At guest reset time, we allocate a hash page table (HPT) for the guest based on the guest's RAM size. If dynamic HPT resizing is not available we use the maximum RAM size, if it is we use the current RAM size. But the "current RAM size" calculation is incorrect - we just use the "base" ram_size from the machine structure. This doesn't include any pluggable DIMMs that are already plugged at reset time. This means that if you try to start a 'pseries' machine with a DIMM specified on the command line that's much larger than the "base" RAM size, then the guest will get a woefully inadequate HPT. This can lead to a guest freeze during boot as it runs out of HPT space during initial MMU setup. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org> Tested-by: Greg Kurz <groug@kaod.org>	2017-12-04 11:31:22 +11:00
Laurent Vivier	0c86b2df78	pseries: fix TCG migration Migration of pseries is broken with TCG because QEMU tries to restore KVM MMU state unconditionally. The result is a SIGSEGV in kvm_vm_ioctl(): #0 kvm_vm_ioctl (s=0x0, type=-2146390353) at qemu/accel/kvm/kvm-all.c:2032 #1 0x00000001003e3e2c in kvmppc_configure_v3_mmu (cpu=<optimized out>, radix=<optimized out>, gtse=<optimized out>, proc_tbl=<optimized out>) at qemu/target/ppc/kvm.c:396 #2 0x00000001002f8b88 in spapr_post_load (opaque=0x1019103c0, version_id=<optimized out>) at qemu/hw/ppc/spapr.c:1578 #3 0x000000010059e4cc in vmstate_load_state (f=0x106230000, vmsd=0x1009479e0 <vmstate_spapr>, opaque=0x1019103c0, version_id=<optimized out>) at qemu/migration/vmstate.c:165 #4 0x00000001005987e0 in vmstate_load (f=<optimized out>, se=<optimized out>) at qemu/migration/savevm.c:748 This patch fixes the problem by not calling the KVM function with the TCG mode. Fixes: `d39c90f5f3` ("spapr: Fix migration of Radix guests") Signed-off-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-11-30 13:57:51 +11:00
Suraj Jitindar Singh	ee4d9ecc36	target/ppc: Move setting of patb_entry on hash table init The patb_entry is used to store the location of the process table in guest memory. The msb is also used to indicate the mmu mode of the guest, that is patb_entry & 1 << 63 ? radix_mode : hash_mode. Currently we set this to zero in spapr_setup_hpt_and_vrma() since if this function gets called then we know we're hash. However some code paths, such as setting up the hpt on incoming migration of a hash guest, call spapr_reallocate_hpt() directly bypassing this higher level function. Since we assume radix if the host is capable this results in the msb in patb_entry being left set so in spapr_post_load() we call kvmppc_configure_v3_mmu() and tell the host we're radix which as expected means addresses cannot be translated once we actually run the cpu. To fix this move the zeroing of patb_entry into spapr_reallocate_hpt(). Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-11-27 12:20:11 +11:00
Thomas Huth	bac658d1a4	hw/ppc/spapr: Fix virtio-scsi bootindex handling for LUNs >= 256 LUNs >= 256 have to be encoded with the so-called "flat space addressing method" for virtio-scsi, where an additional bit has to be set. SLOF already took care of this with the following commit: https://git.qemu.org/?p=SLOF.git;a=commitdiff;h=f72a37713fea47da (see https://bugzilla.redhat.com/show_bug.cgi?id=1431584 for details) But QEMU does not use this encoding yet for device tree paths that have to be handed over to SLOF to deal with the "bootindex" property, so SLOF currently fails to boot from virtio-scsi devices with LUNs >= 256 in the right boot order. Fix it by using the bit to indicate the "flat space addressing method" for LUNs >= 256. Signed-off-by: Thomas Huth <thuth@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-11-22 15:28:37 +11:00
Greg Kurz	8251248394	spapr: reset DRCs after devices A DRC with a pending unplug request releases its associated device at machine reset time. In the case of LMB, when all DRCs for a DIMM device have been reset, the DIMM gets unplugged, causing guest memory to disappear. This may be very confusing for anything still using this memory. This is exactly what happens with vhost backends, and QEMU aborts with: qemu-system-ppc64: used ring relocated for ring 2 qemu-system-ppc64: qemu/hw/virtio/vhost.c:649: vhost_commit: Assertion `r >= 0' failed. The issue is that each DRC registers a QEMU reset handler, and we don't control the order in which these handlers are called (ie, a LMB DRC will unplug a DIMM before the virtio device using the memory on this DIMM could stop its vhost backend). To avoid such situations, let's reset DRCs after all devices have been reset. Reported-by: Mallesh N. Koti <mallesh@linux.vnet.ibm.com> Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com> Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-11-20 10:10:56 +11:00
Suraj Jitindar Singh	7abd43baec	target/ppc: Update setting of cpu features to account for compat modes The device tree nodes ibm,arch-vec-5-platform-support and ibm,pa-features are used to communicate features of the cpu to the guest operating system. The properties of each of these are determined based on the selected cpu model and the availability of hypervisor features. Currently the compatibility mode of the cpu is not taken into account. The ibm,arch-vec-5-platform-support node is used to communicate the level of support for various ISAv3 processor features to the guest before CAS to inform the guests' request. The available mmu mode should only be hash unless the cpu is a POWER9 which is not in a prePOWER9 compat mode, in which case the available modes depend on the accelerator and the hypervisor capabilities. The ibm,pa-featues node is used to communicate the level of cpu support for various features to the guest os. This should only contain features relevant to the operating mode of the processor, that is the selected cpu model taking into account any compat mode. This means that the compat mode should be taken into account when choosing the properties of ibm,pa-features and they should match the compat mode selected, or the cpu model selected if no compat mode. Update the setting of these cpu features in the device tree as described above to properly take into account any compat mode. We use the ppc_check_compat function which takes into account the current processor model and the cpu compat mode. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-11-20 10:07:49 +11:00
Igor Mammedov	2e9c10eba0	ppc: spapr: use generic cpu_model parsing use generic cpu_model parsing introduced by (`6063d4c0f` vl.c: convert cpu_model to cpu type and set of global properties before machine_init()) it allows to: * replace sPAPRMachineClass::tcg_default_cpu with MachineClass::default_cpu_type * drop cpu_parse_cpu_model() from hw/ppc/spapr.c and reuse one in vl.c * simplify spapr_get_cpu_core_type() by removing not needed anymore recurrsion since alias look up happens earlier at vl.c and spapr_get_cpu_core_type() works only with resulted from that cpu type. * spapr no more needs to parse/depend on being phased out MachineState::cpu_model, all tha parsing done by generic code and target specific callback. Signed-off-by: Igor Mammedov <imammedo@redhat.com> [dwg: Correct minor compile error] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-10-17 10:34:01 +11:00
Igor Mammedov	17be88a713	ppc: spapr: use cpu model names as tcg defaults instead of aliases Signed-off-by: Igor Mammedov <imammedo@redhat.com> Acked-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-10-17 10:34:01 +11:00
Igor Mammedov	b51d3c8818	ppc: spapr: use cpu type name directly replace sPAPRCPUCoreClass::cpu_class with cpu type name since it were needed just to get that at points it were accessed. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org> Acked-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-10-17 10:34:00 +11:00
Igor Mammedov	b8e999673b	ppc: move '-cpu foo,compat=xxx' parsing into ppc_cpu_parse_featurestr() there is a dedicated callback CPUClass::parse_features which purpose is to convert -cpu features into a set of global properties AND deal with compat/legacy features that couldn't be directly translated into CPU's properties. Create ppc variant of it (ppc_cpu_parse_featurestr) and move 'compat=val' handling from spapr_cpu_core.c into it. That removes a dependency of board/core code on cpu_model parsing and would let to reuse common -cpu parsing introduced by `6063d4c0` Set "max-cpu-compat" property only if it exists, in practice it should limit 'compat' hack to spapr machine and allow to avoid including machine/spapr headers in target/ppc/cpu.c Signed-off-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-10-17 10:34:00 +11:00
Daniel Henrique Barboza	2a129767eb	hw/ppc/spapr.c: abort unplug_request if previous unplug isn't done LMB removal is completed only when the spapr_lmb_release callback is called after all DRCs of the dimm are detached. During this time, it is possible that a unplug request for the same dimm arrives, trying to detach DRCs that were detached by the guest in the first unplug_request. BQL doesn't help in this case - the lock will prevent any concurrent removal from happening until the end of spapr_memory_unplug_request only. What happens is that the second unplug_request ends up calling spapr_drc_detach in a DRC that were detached already, causing an assert error in spapr_drc_detach (e.g https://bugs.launchpad.net/qemu/+bug/1718118). spapr_lmb_release uses a structure called sPAPRDIMMState, stored in the spapr->pending_dimm_unplugs QTAIL, to track how many LMB DRCs are left to be detached by the guest. When there are no more DRCs left, this structure is deleted and the pc-dimm unplug handler is called to finish the process. This patch reuses the sPAPRDIMMState to allow unplug_request to know if there is an ongoing unplug process for a given dimm, aborting the unplug request in this case, by doing the following changes: - in spapr_lmb_release callback, move the dimm state removal to the end, after pc-dimm unplug handler. With this change we can check for the existence of the dimm state to see if the unplug process is done. - use spapr_pending_dimm_unplugs_find in spapr_memory_unplug_request to check if the dimm state exists. If positive, there is an unplug operation already in progress for this dimm, meaning that we should abort it and warn the user about it. Fixes: https://bugs.launchpad.net/qemu/+bug/1718118 Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-10-17 10:34:00 +11:00
Greg Kurz	827b17c468	spapr: sanity check size of the CAS buffer The CAS buffer is provided by SLOF. A broken SLOF could pass a silly size: either smaller than the diff header, in which case the current code will try to allocate 16 Exabytes of memory and g_malloc0() will abort, or bigger than the maximum memory provisioned for SLOF (ie, 40 Megabytes), which doesn't make sense. Both cases indicate that SLOF has a bug. Let's print out an explicit error message and exit since rebooting as we do with other errors would only result in a reset loop. Signed-off-by: Greg Kurz <groug@kaod.org> [dwg: Fix format specifier that broke 32-bit builds] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-10-17 10:34:00 +11:00
Greg Kurz	dc1b5eee86	spapr: fix OF word name in comment Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-10-17 10:34:00 +11:00
Greg Kurz	a4f3885c74	hw/ppc: use 0 instead of fdt_path_offset(fdt, "/") The offset of the root node is guaranteed to be 0. This doesn't fix anything, it's just trivial cleanup of the two remaining places where this was done under hw/ppc. Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-10-17 10:34:00 +11:00
Greg Kurz	1ec26c757d	spapr: fix the value of SDR1 in kvmppc_put_books_sregs() When running with KVM PR, if a new HPT is allocated we need to inform KVM about the HPT address and size. This is currently done by hacking the value of SDR1 and pushing it to KVM in several places. Also, migration breaks the guest since it is very unlikely the HPT has the same address in source and destination, but we push the incoming value of SDR1 to KVM anyway. This patch introduces a new virtual hypervisor hook so that the spapr code can provide the correct value of SDR1 to be pushed to KVM each time kvmppc_put_books_sregs() is called. It allows to get rid of all the hacking in the spapr/kvmppc code and it fixes migration of nested KVM PR. Suggested-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-09-27 13:05:41 +10:00
Greg Kurz	332f7721cb	spapr: introduce helpers to migrate HPT chunks and the end marker This consolidates some duplicated code in a dedicated helpers. Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-09-27 13:05:41 +10:00
Greg Kurz	14b0d74887	ppc/kvm: generalize the use of kvmppc_get_htab_fd() The use of KVM_PPC_GET_HTAB_FD is open-coded in kvmppc_read_hptes() and kvmppc_write_hpte(). This patch modifies kvmppc_get_htab_fd() so that it can be used everywhere we need to access the in-kernel htab: - add an index argument => only kvmppc_read_hptes() passes an actual index, all other users pass 0 - add an errp argument to propagate error messages to the caller. => spapr migration code prints the error => hpte helpers pass &error_abort to keep the current behavior of hw_error() While here, this also fixes a bug in kvmppc_write_hpte() so that it opens the htab fd for writing instead of reading as it currently does. This never broke anything because we currently never call this code, as explained in the changelog of commit `c138593380`: "This support updating htab managed by the hypervisor. Currently we don't have any user for this feature. This actually bring the store_hpte interface in-line with the load_hpte one. We may want to use this when we want to emulate henter hcall in qemu for HV kvm." The above is still true today. Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-09-27 13:05:41 +10:00
Greg Kurz	82be8e7394	ppc/kvm: change kvmppc_get_htab_fd() to return -errno on error When kvmppc_get_htab_fd() fails, its return value is propagated up to qemu_savevm_state_iterate() or to qemu_savevm_state_complete_precopy(). All savevm handlers expect to receive a negative errno on error. Let's patch kvmppc_get_htab_fd() accordingly. While here, let's change htab_load() in the spapr code to also propagate the error, since it doesn't make sense to abort() if we couldn't get the htab fd from KVM. Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-09-27 13:05:41 +10:00
Igor Mammedov	79e0793614	numa: cpu: calculate/set default node-ids after all -numa CLI options are parsed Calculating default node-ids for CPUs in possible_cpu_arch_ids() is rather fragile since defaults calculation uses nb_numa_nodes but callback might be potentially called early before all -numa CLI options are parsed, which would lead to cpus assigned only upto nb_numa_nodes at the time possible_cpu_arch_ids() is called. Issue was introduced by (`7c88e65` numa: mirror cpu to node mapping in MachineState::possible_cpus) and for example CLI: -smp 4 -numa node,cpus=0 -numa node would set props.node-id in possible_cpus array for every non explicitly mapped CPU to the first node. Issue is not visible to guest nor to mgmt interface due to 1) implictly mapped cpus are forced to the first node in case of partial mapping 2) in case of default mapping possible_cpu_arch_ids() is called after all -numa options are parsed (resulting in correct mapping). However it's fragile to rely on late execution of possible_cpu_arch_ids(), therefore add machine specific callback that returns node-id for CPU and use it to calculate/ set defaults at machine_numa_finish_init() time when all -numa options are parsed. Reported-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <1496314408-163972-1-git-send-email-imammedo@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2017-09-19 16:51:33 -03:00
Cédric Le Goater	21f3f8db0e	ppc/xive: fix OV5_XIVE_EXPLOIT bits On POWER9, the Client Architecture Support (CAS) negotiation process determines whether the guest operates in XIVE Legacy compatibility or in XIVE exploitation mode. Now that we have initial guest support for the XIVE interrupt controller, let's fix the bits definition which have evolved in the latest specs. The platform advertises the XIVE Exploitation Mode support using the property "ibm,arch-vec-5-platform-support-vec-5", byte 23 bits 0-1 : - 0b00 XIVE legacy mode Only - 0b01 XIVE exploitation mode Only - 0b10 XIVE legacy or exploitation mode The OS asks for XIVE Exploitation Mode support using the property "ibm,architecture-vec-5", byte 23 bits 0-1: - 0b00 XIVE legacy mode Only - 0b01 XIVE exploitation mode Only Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-09-15 10:29:48 +10:00
Daniel Henrique Barboza	c86c1affae	hw/ppc/spapr.c: cleaning up qdev_get_machine() calls This patch removes the qdev_get_machine() calls that are made in spapr.c in situations where we can get an existing pointer for the MachineState by either passing it as an argument to the function or by using other already available pointers. The following changes were made: - spapr_node0_size: static function that is called two times: at spapr_setup_hpt_and_vrma and ppc_spapr_init. In both cases we can pass an existing MachineState pointer to it. - spapr_build_fdt: MachineState pointer can be retrieved from the existing sPAPRMachineState pointer. - spapr_boot_set: the opaque in the first arg is a sPAPRMachineState pointer as we can see inside ppc_spapr_init: qemu_register_boot_set(spapr_boot_set, spapr); We can get a MachineState pointer from it. - spapr_machine_device_plug and spapr_machine_device_unplug_request: the MachineState, sPAPRMachineState, MachineClass and sPAPRMachineClass pointers can all be retrieved from the HotplugHandler pointer. Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-09-15 10:29:48 +10:00
Sam Bobroff	fa98fbfcdf	PPC: KVM: Support machine option to set VSMT mode KVM now allows writing to KVM_CAP_PPC_SMT which has previously been read only. Doing so causes KVM to act, for that VM, as if the host's SMT mode was the given value. This is particularly important on Power 9 systems because their default value is 1, but they are able to support values up to 8. This patch introduces a way to control this capability via a new machine property called VSMT ("Virtual SMT"). If the value is not set on the command line a default is chosen that is, when possible, compatible with legacy systems. Note that the intialization of KVM_CAP_PPC_SMT has changed slightly because it has changed (in KVM) from a global capability to a VM-specific one. This won't cause a problem on older KVMs because VM capabilities fall back to global ones. Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-09-08 09:30:55 +10:00
Sam Bobroff	2e886fb391	ppc: spapr: Make VCPU ID handling private to SPAPR The concept of a VCPU ID that differs from the CPU's index (cpu->cpu_index) exists only within SPAPR machines so, move the functions ppc_get_vcpu_id() and ppc_get_cpu_by_vcpu_id() into spapr.c and rename them appropriately. Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-09-08 09:30:55 +10:00
Sam Bobroff	81210c2009	ppc: spapr: Rename cpu_dt_id to vcpu_id This field actually records the VCPU ID used by KVM and, although the value is also used in the device tree it is primarily the VCPU ID so rename it as such. Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com> [dwg: Updated comment missed in cpu.h] Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-09-08 09:30:55 +10:00
Greg Kurz	e2676b1697	spapr: add pseries-2.11 machine type Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-09-08 09:30:55 +10:00
Daniel Henrique Barboza	10f12e6450	hw/ppc: CAS reset on early device hotplug This patch is a follow up on the discussions made in patch "hw/ppc: disable hotplug before CAS is completed" that can be found at [1]. At this moment, we do not support CPU/memory hotplug in early boot stages, before CAS. When a hotplug occurs, the event is logged in an internal RTAS event log queue and an IRQ pulse is fired. In regular conditions, the guest handles the interrupt by executing check_exception, fetching the generated hotplug event and enabling the device for use. In early boot, this IRQ isn't caught (SLOF does not handle hotplug events), leaving the event in the rtas event log queue. If the guest executes check_exception due to another hotplug event, the re-assertion of the IRQ ends up de-queuing the first hotplug event as well. In short, a device hotplugged before CAS is considered coldplugged by SLOF. This leads to device misbehavior and, in some cases, guest kernel Ooops when trying to unplug the device. A proper fix would be to turn every device hotplugged before CAS as a colplugged device. This is not trivial to do with the current code base though - the FDT is written in the guest memory at ppc_spapr_reset and can't be retrieved without adding extra state (fdt_size for example) that will need to managed and migrated. Adding the hotplugged DT in the middle of CAS negotiation via the updated DT tree works with CPU devs, but panics the guest kernel at boot. Additional analysis would be necessary for LMBs and PCI devices. There are questions to be made in QEMU/SLOF/kernel level about how we can make this change in a sustainable way. With Linux guests, a fix would be the kernel executing check_exception at boot time, de-queueing the events that happened in early boot and processing them. However, even if/when the newer kernels start fetching these events at boot time, we need to take care of older kernels that won't be doing that. This patch works around the situation by issuing a CAS reset if a hotplugged device is detected during CAS: - the DRC conditions that warrant a CAS reset is the same as those that triggers a DRC migration - the DRC must have a device attached and the DRC state is not equal to its ready_state. With that in mind, this patch makes use of 'spapr_drc_needed' to determine if a CAS reset is needed. - In the middle of CAS negotiations, the function 'spapr_hotplugged_dev_before_cas' goes through all the DRCs to see if there are any DRC that requires a reset, using spapr_drc_needed. If that happens, returns '1' in 'spapr_h_cas_compose_response' which will set spapr->cas_reboot to true, causing the machine to reboot. No changes are made for coldplug devices. [1] http://lists.nongnu.org/archive/html/qemu-devel/2017-08/msg02855.html Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-09-08 09:30:54 +10:00
Daniel Henrique Barboza	5625817423	hw/ppc: clear pending_events on machine reset The sPAPR machine isn't clearing up the pending events QTAILQ on machine reboot. This allows for unprocessed hotplug/epow events to persist in the queue after reset and, when reasserting the IRQs in check_exception later on, these will be being processed by the OS. This patch implements a new function called 'spapr_clear_pending_events' that clears up the pending_events QTAILQ. This helper is then called inside ppc_spapr_reset to clear up the events queue, preventing old/deprecated events from persisting after a reset. Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-09-08 09:30:54 +10:00
Thomas Huth	0479097859	hw/ppc/spapr: Fix segfault when instantiating a 'pc-dimm' without 'memdev' QEMU currently crashes when trying to use a 'pc-dimm' on the pseries machine without specifying its 'memdev' property. This happens because pc_dimm_get_memory_region() does not check whether the 'memdev' property has properly been set by the user. Looking closer at this function, it's also obvious that it is using &error_abort to call another function - and this is bad in a function that is used in the hot-plugging calling chain since this can also cause QEMU to exit unexpectedly. So let's fix these issues in a proper way now: Add a "Error **errp" parameter to pc_dimm_get_memory_region() which we use in case the 'memdev' property has not been set by the user, and which we can use instead of the &error_abort, and change the callers of get_memory_region() to make use of this "errp" parameter for proper error checking. Signed-off-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-08-22 21:26:46 +10:00
David Gibson	fc7e0765fc	Revert "spapr: populate device tree depending on XIVE_EXPLOIT option" This reverts commit `b87680427e`. I thought this was a harmless preliminary for XIVE enablement patches we expect later on. However, due to some subtle interactions between qemu and SLOF (guest firmware) this breaks some things. Revert it for now, we'll work out how to fix it when the rest of the XIVE patches are ready. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-07-29 16:22:14 +10:00
Bharata B Rao	8d5981c4fc	spapr: Fix QEMU abort during memory unplug Commit `0cffce56` (hw/ppc/spapr.c: adding pending_dimm_unplugs to sPAPRMachineState) introduced a new way to track pending LMBs of DIMM device that is marked for removal. Since this commit we can hit the assert in spapr_pending_dimm_unplugs_add() in the following situation: - DIMM device removal fails as the guest doesn't allow the removal. - Subsequent attempt to remove the same DIMM would hit the assert as the corresponding sPAPRDIMMState is still part of the pending_dimm_unplugs list. Fix this by removing the assert and conditionally adding the sPAPRDIMMState to pending_dimm_unplugs list only when it is not already present. Fixes: `0cffce56ae` Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> [dwg: Tweaked to avoid returning NULL when spapr_pending_dimm_unplugs_add() does find an existing entry] Reviewed-by: Daniel Barboza <danielhb@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-07-25 11:14:25 +10:00
Laurent Vivier	e8cd4247e9	spapr/htab: fix savevm Commit `3a38429` ("spapr: Add a "no HPT" encoding to HTAB migration stream") allows to migrate an empty HPT, but doesn't mark correctly the end of the migration stream. The end condition (value returned by htab_save_iterate()) should be 1, whereas in `3a38429` it returns 0. The problem can be reproduced with QEMU monitor command "savevm": the command never stops and the disk image grows without limit. Fixes: `3a38429748` Signed-off-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-07-25 11:14:25 +10:00
Greg Kurz	df8658de43	spapr: fix memory leak in spapr_core_pre_plug() In case of error, we must ensure the dynamically allocated base_core_type is freed, like it is done everywhere else in this function. This is a regression introduced in QEMU 2.9 by commit `8149e2992f`. Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-07-17 15:07:05 +10:00
David Gibson	2772cf6be9	pseries: Use smaller default hash page tables when guest can resize We've now implemented a PAPR extension allowing PAPR guest to resize their hash page table (HPT) during runtime. This patch makes use of that facility to allocate smaller HPTs by default. Specifically when a guest is aware of the HPT resize facility, qemu sizes the HPT to the initial memory size, rather than the maximum memory size on the assumption that the guest will resize its HPT if necessary for hot plugged memory. When the initial memory size is much smaller than the maximum memory size (a common configuration with e.g. oVirt / RHEV) then this can save significant memory on the HPT. If the guest does not advertise HPT resize awareness when it makes the ibm,client-architecture-support call, qemu resizes the HPT for maxmimum memory size (unless it's been configured not to allow such guests at all). For now we make that reallocation assuming the guest has not yet used the HPT at all. That's true in practice, but not, strictly, an architectural or PAPR requirement. If we need to in future we can fix this by having the client-architecture-support call reboot the guest with the revised HPT size (the client-architecture-support call is explicitly permitted to trigger a reboot in this way). Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>	2017-07-17 15:07:05 +10:00
David Gibson	52b81ab5e9	pseries: Enable HPT resizing for 2.10 We've now implemented a PAPR extensions which allows PAPR guests (i.e. "pseries" machine type) to resize their hash page table during runtime. However, that extension is only enabled if explicitly chosen on the command line. This patch enables it by default for spapr-2.10, but leaves it disabled (by default) for older machine types. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Laurent Vivier <lvivier@redhat.com>	2017-07-17 15:07:05 +10:00
David Gibson	0b0b831016	pseries: Implement HPT resizing This patch implements hypercalls allowing a PAPR guest to resize its own hash page table. This will eventually allow for more flexible memory hotplug. The implementation is partially asynchronous, handled in a special thread running the hpt_prepare_thread() function. The state of a pending resize is stored in SPAPR_MACHINE->pending_hpt. The H_RESIZE_HPT_PREPARE hypercall will kick off creation of a new HPT, or, if one is already in progress, monitor it for completion. If there is an existing HPT resize in progress that doesn't match the size specified in the call, it will cancel it, replacing it with a new one matching the given size. The H_RESIZE_HPT_COMMIT completes transition to a resized HPT, and can only be called successfully once H_RESIZE_HPT_PREPARE has successfully completed initialization of a new HPT. The guest must ensure that there are no concurrent accesses to the existing HPT while this is called (this effectively means stop_machine() for Linux guests). For now H_RESIZE_HPT_COMMIT goes through the whole old HPT, rehashing each HPTE into the new HPT. This can have quite high latency, but it seems to be of the order of typical migration downtime latencies for HPTs of size up to ~2GiB (which would be used in a 256GiB guest). In future we probably want to move more of the rehashing to the "prepare" phase, by having H_ENTER and other hcalls update both current and pending HPTs. That's a project for another day, but should be possible without any changes to the guest interface. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-07-17 15:07:05 +10:00

1 2 3 4 5 ...

481 Commits