There are four scenarios being handled in this function:
- single stepping
- hardware breakpoints
- software breakpoints
- fallback (no debug supported)
A future patch will add code to handle specific single step and
software breakpoints cases so let's split each scenario into its own
function now to avoid hurting readability.
Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Message-Id: <20190228225759.21328-5-farosas@linux.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This is in preparation for a refactoring of the kvm_handle_debug
function in the next patch.
Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com>
Message-Id: <20190228225759.21328-4-farosas@linux.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Introduce a new spapr_cap SPAPR_CAP_CCF_ASSIST to be used to indicate
the requirement for a hw-assisted version of the count cache flush
workaround.
The count cache flush workaround is a software workaround which can be
used to flush the count cache on context switch. Some revisions of
hardware may have a hardware accelerated flush, in which case the
software flush can be shortened. This cap is used to set the
availability of such hardware acceleration for the count cache flush
routine.
The availability of such hardware acceleration is indicated by the
H_CPU_CHAR_BCCTR_FLUSH_ASSIST flag being set in the characteristics
returned from the KVM_PPC_GET_CPU_CHAR ioctl.
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Message-Id: <20190301031912.28809-2-sjitindarsingh@gmail.com>
[dwg: Small style fixes]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The spapr_cap SPAPR_CAP_IBS is used to indicate the level of capability
for mitigations for indirect branch speculation. Currently the available
values are broken (default), fixed-ibs (fixed by serialising indirect
branches) and fixed-ccd (fixed by diabling the count cache).
Introduce a new value for this capability denoted workaround, meaning that
software can work around the issue by flushing the count cache on
context switch. This option is available if the hypervisor sets the
H_CPU_BEHAV_FLUSH_COUNT_CACHE flag in the cpu behaviours returned from
the KVM_PPC_GET_CPU_CHAR ioctl.
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Message-Id: <20190301031912.28809-1-sjitindarsingh@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Implement support to allow KVM guests to take advantage of the large
decrementer introduced on POWER9 cpus.
To determine if the host can support the requested large decrementer
size, we check it matches that specified in the ibm,dec-bits device-tree
property. We also need to enable it in KVM by setting the LPCR_LD bit in
the LPCR. Note that to do this we need to try and set the bit, then read
it back to check the host allowed us to set it, if so we can use it but
if we were unable to set it the host cannot support it and we must not
use the large decrementer.
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190301024317.22137-3-sjitindarsingh@gmail.com>
[dwg: Small style fixes]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
It has been there since the enablement of PR KVM for PAPR, ie, commit
f61b4bedaf in 2011. Not sure why at that time, but it is definitely
not needed with the current code.
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The VSX register array is a block of 64 128-bit registers where the first 32
registers consist of the existing 64-bit FP registers extended to 128-bit
using new VSR registers, and the last 32 registers are the VMX 128-bit
registers as show below:
64-bit 64-bit
+--------------------+--------------------+
| FP0 | | VSR0
+--------------------+--------------------+
| FP1 | | VSR1
+--------------------+--------------------+
| ... | ... | ...
+--------------------+--------------------+
| FP30 | | VSR30
+--------------------+--------------------+
| FP31 | | VSR31
+--------------------+--------------------+
| VMX0 | VSR32
+-----------------------------------------+
| VMX1 | VSR33
+-----------------------------------------+
| ... | ...
+-----------------------------------------+
| VMX30 | VSR62
+-----------------------------------------+
| VMX31 | VSR63
+-----------------------------------------+
In order to allow for future conversion of VSX instructions to use TCG vector
operations, recreate the same layout using an aligned version of the existing
vsr register array.
Since the old fpr and avr register arrays are removed, the existing callers
must also be updated to use the correct offset in the vsr register array. This
also includes switching the relevant VMState fields over to using subarrays
to make sure that migration is preserved.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Add the spapr cap SPAPR_CAP_NESTED_KVM_HV to be used to control the
availability of nested kvm-hv to the level 1 (L1) guest.
Assuming a hypervisor with support enabled an L1 guest can be allowed to
use the kvm-hv module (and thus run it's own kvm-hv guests) by setting:
-machine pseries,cap-nested-hv=true
or disabled with:
-machine pseries,cap-nested-hv=false
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Set the newly added register(KVM_REG_PPC_ONLINE) to indicate if the vcpu is
online(1) or offline(0)
KVM will use this information to set the RWMR register, which controls the PURR
and SPURR accumulation.
CC: paulus@samba.org
Signed-off-by: Nikunj A Dadhania <nikunj@linux.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
There is no known available OS for ppc around anymore that uses page
sizes below 4k, so it does not make much sense that we keep wasting
our time on building and testing the ppcemb-softmmu target. It has
been deprecated since two releases, and nobody complained, so let's
remove this now.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
In a future patch the machine code will need to retrieve the MMU
information from KVM during machine initialization before the CPUs
are created.
Actually, KVM_PPC_GET_SMMU_INFO is a VM class ioctl, and thus, we
don't need to have a CPU object around. We just need for KVM to
be initialized and use the kvm_state global. This patch just does
that.
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Now that we're checking our MMU configuration is supported by KVM,
rather than adjusting it to KVM, it doesn't really make sense to
have a fallback for kvm_get_smmu_info(). If KVM is too old or buggy
to provide the details, we should rather treat this as an error.
This patch thus adds error reporting to kvm_get_smmu_info() and get
rid of the fallback code. QEMU will now terminate if KVM fails to
provide MMU details. This may break some very old setups, but the
simplification is worth the sacrifice.
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Currently during KVM initialization on POWER, kvm_fixup_page_sizes()
rewrites a bunch of information in the cpu state to reflect the
capabilities of the host MMU and KVM. This overwrites the information
that's already there reflecting how the TCG implementation of the MMU will
operate.
This means that we can get guest-visibly different behaviour between KVM
and TCG (and between different KVM implementations). That's bad. It also
prevents migration between KVM and TCG.
The pseries machine type now has filtering of the pagesizes it allows the
guest to use which means it can present a consistent model of the MMU
across all accelerators.
So, we can now replace kvm_fixup_page_sizes() with kvm_check_mmu() which
merely verifies that the expected cpu model can be faithfully handled by
KVM, rather than updating the cpu model to match KVM.
We call kvm_check_mmu() from the spapr cpu reset code. This is a hack:
conceptually it makes more sense where fixup_page_sizes() was - in the KVM
cpu init path. However, doing that would require moving the platform's
pagesize filtering much earlier, which would require a lot of work making
further adjustments. There wouldn't be a lot of concrete point to doing
that, since the only KVM implementation which has the awkward MMU
restrictions is KVM HV, which can only work with an spapr guest anyway.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
The way we used to handle KVM allowable guest pagesizes for PAPR guests
required some convoluted checking of memory attached to the guest.
The allowable pagesizes advertised to the guest cpus depended on the memory
which was attached at boot, but then we needed to ensure that any memory
later hotplugged didn't change which pagesizes were allowed.
Now that we have an explicit machine option to control the allowable
maximum pagesize we can simplify this. We just check all memory backends
against that declared pagesize. We check base and cold-plugged memory at
reset time, and hotplugged memory at pre_plug() time.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
KVM HV has a restriction that for HPT mode guests, guest pages must be hpa
contiguous as well as gpa contiguous. We have to account for that in
various places. We determine whether we're subject to this restriction
from the SMMU information exposed by KVM.
Planned cleanups to the way we handle this will require knowing whether
this restriction is in play in wider parts of the code. So, expose a
helper function which returns it.
This does mean some redundant calls to kvm_get_smmu_info(), but they'll go
away again with future cleanups.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
CPUPPCState currently contains a number of fields containing the state of
the VPA. The VPA is a PAPR specific concept covering several guest/host
shared memory areas used to communicate some information with the
hypervisor.
As a PAPR concept this is really machine specific information, although it
is per-cpu, so it doesn't really belong in the core CPU state structure.
There's also other information that's per-cpu, but platform/machine
specific. So create a (void *)machine_data in PowerPCCPU which can be
used by the machine to locate per-cpu data. Intialization, lifetime and
cleanup of machine_data is entirely up to the machine type.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Tested-by: Greg Kurz <groug@kaod.org>
For cap_ppc_safe_cache to be set to workaround, we require both a l1d
cache flush instruction and private l1d cache.
On POWER8 don't require private l1d cache. This means a guest on a
POWER8 machine can make use of the cache flush workarounds.
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Factor out the parsing of struct kvm_ppc_cpu_char in
kvmppc_get_cpu_characteristics() into a separate function for each cap
for simplicity.
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
cpu_ppc_set_papr() does several things:
1) it sets up the virtual hypervisor interface
2) it prevents the cpu from ever entering hypervisor mode
3) it tells KVM that we're emulating a cpu in PAPR mode
and 4) it configures the LPCR and AMOR (hypervisor privileged registers)
so that TCG will behave correctly for PAPR guests, without
attempting to emulate the cpu in hypervisor mode
(1) & (2) make sense for any virtual hypervisor (if another one ever
exists).
(3) belongs more properly in the machine type specific to a PAPR guest, so
move it to spapr_cpu_init(). While we're at it, remove an ugly test on
kvm_enabled() by making kvmppc_set_papr() a safe no-op on non-KVM.
(4) also belongs more properly in the machine type specific code. (4) is
done by mangling the default values of the SPRs, so that they will be set
correctly at reset time. Manipulating usually-static parameters of the cpu
model like this is kind of ugly, especially since the values used really
have more to do with the platform than the cpu.
The spapr code already has places for PAPR specific initializations of
register state in spapr_cpu_reset(), so move this handling there.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Tested-by: Cédric Le Goater <clg@kaod.org>
Current POWER cpus allow for a VRMA, a special mapping which describes a
guest's view of memory when in real mode (MMU off, from the guest's point
of view). Older cpus didn't have that which meant that to support a guest
a special host-contiguous region of memory was needed to give the guest its
Real Mode Area (RMA).
KVM used to provide special calls to allocate a contiguous RMA for those
cases. This was useful in the early days of KVM on Power to allow it to be
tested on PowerPC 970 chips as used in Macintosh G5 machines. Now, those
machines are so old as to be almost irrelevant.
The normal qemu deprecation process would require this to be marked
deprecated then removed in 2 releases. However, this can only be used
with corresponding support in the host kernel - which was dropped
years ago (in c17b98cf "KVM: PPC: Book3S HV: Remove code for PPC970
processors" of 2014-12-03 to be precise). Therefore it should be ok
to drop this immediately.
Just to be clear this only affects *KVM HV* guests with PowerPC 970,
and those already require an ancient host kernel. TCG and KVM PR
guests with PowerPC 970 should still work.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Acked-by: Thomas Huth <thuth@redhat.com>
The env->slb_nr field gives the size of the SLB (Segment Lookaside Buffer).
This is another static-after-initialization parameter of the specific
version of the 64-bit hash MMU in the CPU. So, this patch folds the field
into PPCHash64Options with the other hash MMU options.
This is a bit more complicated that the things previously put in there,
because slb_nr was foolishly included in the migration stream. So we need
some of the usual dance to handle backwards compatible migration.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
These macros were introduced to deal with the fact that the mmu_model
field has bit flags mixed in with what's otherwise an enum of various mmu
types.
We've now eliminated all those flags except for one, and that one -
POWERPC_MMU_64 - is already included/compared in the MMU_VER macros. So,
we can get rid of those macros and just directly compare mmu_model values
in the places it was used.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
The ci_large_pages boolean in CPUPPCState is only relevant to 64-bit hash
MMU machines, indicating whether it's possible to map large (> 4kiB) pages
as cache-inhibitied (i.e. for IO, rather than memory). Fold it as another
flag into the PPCHash64Options structure.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Currently env->mmu_model is a bit of an unholy mess of an enum of distinct
MMU types, with various flag bits as well. This makes which bits of the
field should be compared pretty confusing.
Make a start on cleaning that up by moving two of the flags bits -
POWERPC_MMU_1TSEG and POWERPC_MMU_AMR - which are specific to the 64-bit
hash MMU into a new flags field in PPCHash64Options structure.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
env->sps contains page size encoding information as an embedded structure.
Since this information is specific to 64-bit hash MMUs, split it out into
a separately allocated structure, to reduce the basic env size for other
cpus. Along the way we make a few other cleanups:
* Rename to PPCHash64Options which is more in line with qemu name
conventions, and reflects that we're going to merge some more hash64
mmu specific details in there in future. Also rename its
substructures to match qemu conventions.
* Move structure definitions to the mmu-hash64.[ch] files.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
CPU definitions for cpus with the 64-bit hash MMU can include a table of
available pagesizes. If this isn't supplied ppc_cpu_instance_init() will
fill it in a fallback table based on the POWERPC_MMU_64K bit in mmu_model.
However, it turns out all the cpus which support 64K pages already include
an explicit table of page sizes, so there's no point to the fallback table
including 64k pages.
That removes the only place which tests POWERPC_MMU_64K, so we can remove
it. Which in turn allows some logic to be removed from
kvm_fixup_page_sizes().
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
There are a couple places (one generic, one target specific) where we need
to get the host page size associated with a particular memory backend. I
have some upcoming code which will add another place which wants this. So,
for convenience, add a helper function to calculate this.
host_memory_backend_pagesize() returns the host pagesize for a given
HostMemoryBackend object.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
qemu_mempath_getpagesize() gets the effective (host side) page size for
a block of memory backed by an mmap()ed file on the host. It requires
the mem_path parameter to be non-NULL.
This ends up meaning all the callers need a different case for handling
anonymous memory (for memory-backend-ram or default memory with -mem-path
is not specified).
We can make all those callers a little simpler by having
qemu_mempath_getpagesize() accept NULL, and treat that as the anonymous
memory case.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Convert cap-ibs (indirect branch speculation) to a custom spapr-cap
type.
All tristate caps have now been converted to custom spapr-caps, so
remove the remaining support for them.
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
[dwg: Don't explicitly list "?"/help option, trust convention]
[dwg: Fold tristate removal into here, to not break bisect]
[dwg: Fix minor style problems]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Check the character and character_mask field when setting
cap_ppc_safe_indirect_branch based on the hypervisor response
to KVM_PPC_GET_CPU_CHAR. Previously the mask field wasn't checked
which was incorrect.
Fixes: 8acc2ae5 (target/ppc/kvm: Add cap_ppc_safe_[cache/bounds_check/indirect_branch])
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
In order to enable TCE operations support in KVM, we have to inform
the KVM about VFIO groups being attached to specific LIOBNs;
the necessary bits are implemented already by IOMMU MR and VFIO.
This defines get_attr() for the SPAPR TCE IOMMU MR which makes VFIO
call the KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE ioctl and establish
LIOBN-to-IOMMU link.
This changes spapr_tce_set_need_vfio() to avoid TCE table reallocation
if the kernel supports the TCE acceleration.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
[aw - remove unnecessary sys/ioctl.h include]
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Add three new kvm capabilities used to represent the level of host support
for three corresponding workarounds.
Host support for each of the capabilities is queried through the
new ioctl KVM_PPC_GET_CPU_CHAR which returns four uint64 quantities. The
first two, character and behaviour, represent the available
characteristics of the cpu and the behaviour of the cpu respectively.
The second two, c_mask and b_mask, represent the mask of known bits for
the character and beheviour dwords respectively.
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
[dwg: Correct some compile errors due to name change in final kernel
patch version]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
When constructing the "host" cpu class we modify whether the VMX and VSX
vector extensions and DFP (Decimal Floating Point) are available
based on whether KVM can support those instructions. This can depend on
policy in the host kernel as well as on the actual host cpu capabilities.
However, the way we probe for this is not very nice: we explicitly check
the host's device tree. That works in practice, but it's not really
correct, since the device tree is a property of the host kernel's platform
which we don't really know about. We get away with it because the only
modern POWER platforms happen to encode VMX, VSX and DFP availability in
the device tree in the same way.
Arguably we should have an explicit KVM capability for this, but we haven't
needed one so far. Barring specific KVM policies which don't yet exist,
each of these instruction classes will be available in the guest if and
only if they're available in the qemu userspace process. We can determine
that from the ELF AUX vector we're supplied with.
Once reworked like this, there are no more callers for kvmppc_get_vmx() and
kvmppc_get_dfp() so remove them.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
As stated in the 1ad9f0a464 commit log, the returned entries are not
a whole PTEG. It was not a problem before 1ad9f0a464 as it would read
a single record assuming it contains a whole PTEG but now the code tries
reading the entire PTEG and "if ((n - i) < invalid)" produces negative
values which then are converted to size_t for memset() and that throws
seg fault.
This fixes the math.
While here, fix the last @i increment as well.
Fixes: 1ad9f0a464 "target/ppc: Fix KVM-HV HPTE accessors"
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
applied using ./scripts/clean-includes
not needed since 7ebaf79556
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
use generic cpu_model parsing introduced by
(6063d4c0f vl.c: convert cpu_model to cpu type and set of global properties before machine_init())
it allows to:
* replace sPAPRMachineClass::tcg_default_cpu with
MachineClass::default_cpu_type
* drop cpu_parse_cpu_model() from hw/ppc/spapr.c and reuse
one in vl.c
* simplify spapr_get_cpu_core_type() by removing
not needed anymore recurrsion since alias look up
happens earlier at vl.c and spapr_get_cpu_core_type()
works only with resulted from that cpu type.
* spapr no more needs to parse/depend on being phased out
MachineState::cpu_model, all tha parsing done by generic
code and target specific callback.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
[dwg: Correct minor compile error]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
consolidate 'host' core type registration by moving it from
KVM specific code into spapr_cpu_core.c, similar like it's
done in x86 target.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
replace sPAPRCPUCoreClass::cpu_class with cpu type name
since it were needed just to get that at points it were
accessed.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
When running with KVM PR, if a new HPT is allocated we need to inform
KVM about the HPT address and size. This is currently done by hacking
the value of SDR1 and pushing it to KVM in several places.
Also, migration breaks the guest since it is very unlikely the HPT has
the same address in source and destination, but we push the incoming
value of SDR1 to KVM anyway.
This patch introduces a new virtual hypervisor hook so that the spapr
code can provide the correct value of SDR1 to be pushed to KVM each
time kvmppc_put_books_sregs() is called.
It allows to get rid of all the hacking in the spapr/kvmppc code and
it fixes migration of nested KVM PR.
Suggested-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The use of KVM_PPC_GET_HTAB_FD is open-coded in kvmppc_read_hptes()
and kvmppc_write_hpte().
This patch modifies kvmppc_get_htab_fd() so that it can be used
everywhere we need to access the in-kernel htab:
- add an index argument
=> only kvmppc_read_hptes() passes an actual index, all other users
pass 0
- add an errp argument to propagate error messages to the caller.
=> spapr migration code prints the error
=> hpte helpers pass &error_abort to keep the current behavior
of hw_error()
While here, this also fixes a bug in kvmppc_write_hpte() so that it
opens the htab fd for writing instead of reading as it currently does.
This never broke anything because we currently never call this code,
as explained in the changelog of commit c138593380:
"This support updating htab managed by the hypervisor. Currently
we don't have any user for this feature. This actually bring the
store_hpte interface in-line with the load_hpte one. We may want
to use this when we want to emulate henter hcall in qemu for HV
kvm."
The above is still true today.
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
When kvmppc_get_htab_fd() fails, its return value is propagated up to
qemu_savevm_state_iterate() or to qemu_savevm_state_complete_precopy().
All savevm handlers expect to receive a negative errno on error.
Let's patch kvmppc_get_htab_fd() accordingly.
While here, let's change htab_load() in the spapr code to also
propagate the error, since it doesn't make sense to abort() if
we couldn't get the htab fd from KVM.
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
It never got used since its introduction (commit 7c43bca004).
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The following capabilities are VM specific:
- KVM_CAP_PPC_SMT_POSSIBLE
- KVM_CAP_PPC_HTAB_FD
- KVM_CAP_PPC_ALLOC_HTAB
If both KVM HV and KVM PR are present, checking them always return
the HV value, even if we explicitely requested to use PR.
This has no visible effect for KVM_CAP_PPC_ALLOC_HTAB, because we also
try the KVM_PPC_ALLOCATE_HTAB ioctl which is only suppored by HV. As
a consequence, the spapr code doesn't even check KVM_CAP_PPC_HTAB_FD.
However, this will cause kvmppc_hint_smt_possible(), introduced by
commit fa98fbfcdf, to report several VSMT modes (eg, Available
VSMT modes: 8 4 2 1) whereas PR only support mode 1.
This patch fixes all three anyway to use kvm_vm_check_extension(). It
is okay since the VM is already created at the time kvm_arch_init() or
kvmppc_reset_htab() is called.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
If the host has both KVM PR and KVM HV loaded and we pass:
-machine pseries,accel=kvm,kvm-type=PR
the kvmppc_is_pr() returns false instead of true. Since the helper
is mostly used as fallback, it doesn't have any real impact with
recent kernels. A notable exception is the workaround to allow
migration between compatible hosts with different PVRs (eg, POWER8
and POWER8E), since KVM still doesn't provide a way to check if a
specific PVR is supported (see commit c363a37a45 for details).
According to the official KVM API documentation [1], KVM_PPC_GET_PVINFO
is "vm ioctl", but we check it as a global ioctl. The following function
in KVM is hence called with kvm == NULL and considers we're in HV mode.
int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
{
int r;
/* Assume we're using HV mode when the HV module is loaded */
int hv_enabled = kvmppc_hv_ops ? 1 : 0;
if (kvm) {
/*
* Hooray - we know which VM type we're running on. Depend on
* that rather than the guess above.
*/
hv_enabled = is_kvmppc_hv_enabled(kvm);
}
Let's use kvm_vm_check_extension() to fix the issue.
[1] https://www.kernel.org/doc/Documentation/virtual/kvm/api.txt
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Caching there practically doesn't give any benefits
and that at slow path druring querying supported CPU list.
But it introduces non conventional path of where from
comes used CPU type name (kvm_ppc_register_host_cpu_type).
Taking in account that kvm_ppc_register_host_cpu_type()
fixes up models the aliases point to, it's sufficient to
make ppc_cpu_class_by_name() translate cpu alias to
correct cpu type name.
So drop PowerPCCPUAlias::oc field + ppc_cpu_class_by_alias()
and let ppc_cpu_class_by_name() do conversion to cpu type name,
which simplifies code a little bit saving ~20LOC and trouble
wondering why ppc_cpu_class_by_alias() is necessary.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
PPC handles -cpu FOO rather incosistently,
i.e. it does case-insensitive matching of FOO to
a CPU type (see: ppc_cpu_compare_class_name) but
handles alias names as case-sensitive, as result:
# qemu-system-ppc64 -M mac99 -cpu g3
qemu-system-ppc64: unable to find CPU model ' kN�U'
# qemu-system-ppc64 -cpu 970MP_V1.1
qemu-system-ppc64: Unable to find sPAPR CPU Core definition
while
# qemu-system-ppc64 -M mac99 -cpu G3
# qemu-system-ppc64 -cpu 970MP_v1.1
start up just fine.
Considering we can't take case-insensitive matching away,
make it case-insensitive for all alias/type/core_type
lookups.
As side effect it allows to remove duplicate core types
which are the same except of using different cased letters in name.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Replace
"-" TYPE_POWERPC_CPU
when composing cpu type name from cpu model string literal
and the same pattern in format strings with
POWERPC_CPU_TYPE_SUFFIX and POWERPC_CPU_TYPE_NAME(model)
macroses like we do in x86.
Later POWERPC_CPU_TYPE_NAME() will be used to define default
cpu type per machine type and as bonus it will be consistent
and easy grep-able pattern across all other targets that I'm
plannig to treat the same way.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
KVM now allows writing to KVM_CAP_PPC_SMT which has previously been
read only. Doing so causes KVM to act, for that VM, as if the host's
SMT mode was the given value. This is particularly important on Power
9 systems because their default value is 1, but they are able to
support values up to 8.
This patch introduces a way to control this capability via a new
machine property called VSMT ("Virtual SMT"). If the value is not set
on the command line a default is chosen that is, when possible,
compatible with legacy systems.
Note that the intialization of KVM_CAP_PPC_SMT has changed slightly
because it has changed (in KVM) from a global capability to a
VM-specific one. This won't cause a problem on older KVMs because VM
capabilities fall back to global ones.
Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The concept of a VCPU ID that differs from the CPU's index
(cpu->cpu_index) exists only within SPAPR machines so, move the
functions ppc_get_vcpu_id() and ppc_get_cpu_by_vcpu_id() into spapr.c
and rename them appropriately.
Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This field actually records the VCPU ID used by KVM and, although the
value is also used in the device tree it is primarily the VCPU ID so
rename it as such.
Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
[dwg: Updated comment missed in cpu.h]
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>