The number of MSI interrupts a sPAPR machine can allocate is in direct
relation with the number of interrupts of the sPAPRIrq backend. Define
statically this value at the sPAPRIrq class level and use it for the
"ibm,pe-total-#msi" property of the sPAPR PHB.
According to the PAPR specs, "ibm,pe-total-#msi" defines the maximum
number of MSIs that are available to the PE. We choose to advertise
the maximum number of MSIs that are available to the machine for
simplicity of the model and to avoid segmenting the MSI interrupt pool
which can be easily shared. If the pool limit is reached, it can be
extended dynamically.
Finally, remove XICS_IRQS_SPAPR which is now unused.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This proposal moves all the related IRQ routines of the sPAPR machine
behind a sPAPR IRQ backend interface 'spapr_irq' to prepare for future
changes. First of which will be to increase the size of the IRQ number
space, then, will follow a new backend for the POWER9 XIVE IRQ controller.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This proposal introduces a new IRQ number space layout using static
numbers for all devices, depending on a device index, and a bitmap
allocator for the MSI IRQ numbers which are negotiated by the guest at
runtime.
As the VIO device model does not have a device index but a "reg"
property, we introduce a formula to compute an IRQ number from a "reg"
value. It should minimize most of the collisions.
The previous layout is kept in pre-3.1 machines raising the
'legacy_irq_allocation' machine class flag.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The last user of the PowerPCCPU typedef in "hw/ppc/xics.h" vanished with
commit b1fd36c363. It isn't necessary to
include "target/ppc/cpu-qom.h" there anymore.
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Here's a last minue pull request before today's soft freeze. Ideally
I would have sent this earlier, but I was waiting for a couple of
extra fixes I knew were close. And the freeze crept up on me, like
always.
Most of the changes here are bugfixes in any case. There are some
cleanups as well, which have been in my staging tree for a little
while. There are a couple of truly new features (some extensions to
the sam460ex platform), but these are low risk, since they only affect
a new and not really stabilized machine type anyway.
Higlights are:
* Mac platform improvements from Mark Cave-Ayland
* Sam460ex improvements from BALATON Zoltan et al.
* XICS interrupt handler cleanups from Cédric Le Goater
* TCG improvements for atomic loads and stores from Richard
Henderson
* Assorted other bugfixes
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEdfRlhq5hpmzETofcbDjKyiDZs5IFAls7D8oACgkQbDjKyiDZ
s5Lxmg//YzPfC/nKqTTKkyJPzh/NnSC+kRTMAT3mbxdRIc7yfgMqJtWGGbS1iKgK
EeJ9hl5Qm0HfscfDuzf0xasU62ZEv3kNdLnWJEIgkqiXrxoO5KCnC0y4D8NN1W03
mvINNCa8+QDg2OsirGmNUTkriiG3wLIrHTpLZ4+JuC2Bd9H3nTHZgJ0MXON/1VWY
oRgr6kMZ5+IAzPhvYLFR6l3nPI883fgJOFyRo7YqYrkVBKFrFkfK0Xjw6vpsNxcx
2dE/YCHhNIriLuBG5noewL7GuqZRtLnl6rjjee5VAKIe1EmFeR+jsXwNjzGOVOJg
dhjOtsJsQQ3WdEw5uImJzE64kV228WCgmkeXzZd1010JBLr7sUkrd2EuoZ23vvat
uvZAHVSBrJg5WvzMo1VMEoPU3VeeZQ5HL+MI80iKiU6oUgRK11gVJcebtA0sEKt+
zhJC4JiUlHtZLTGIpMBmU8DJZ3Tyk1cBEm+Ky+SaPE+dsz16UHI0fazFQXJnXphE
MLHEGAyQgzWYp7kIcAjUFev0Geq/Uovy4JKIGI6ISop1wRPEQDxkthfkfRyQxQkE
zuse4EBcEH/Undw9KrmEQa0hCe+8BRkxklVbPesFPPdqH3PKNxtHYuWpSShQF0PW
XMjw43O2Rbsl8kBUHCpy4pYSugD1hpfgaw/mVUOU1u/M1O6toTw=
=AHrx
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-3.0-20180703' into staging
ppc patch queue 2018-07-03
Here's a last minue pull request before today's soft freeze. Ideally
I would have sent this earlier, but I was waiting for a couple of
extra fixes I knew were close. And the freeze crept up on me, like
always.
Most of the changes here are bugfixes in any case. There are some
cleanups as well, which have been in my staging tree for a little
while. There are a couple of truly new features (some extensions to
the sam460ex platform), but these are low risk, since they only affect
a new and not really stabilized machine type anyway.
Higlights are:
* Mac platform improvements from Mark Cave-Ayland
* Sam460ex improvements from BALATON Zoltan et al.
* XICS interrupt handler cleanups from Cédric Le Goater
* TCG improvements for atomic loads and stores from Richard
Henderson
* Assorted other bugfixes
# gpg: Signature made Tue 03 Jul 2018 06:55:22 BST
# gpg: using RSA key 6C38CACA20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>"
# gpg: aka "David Gibson (Red Hat) <dgibson@redhat.com>"
# gpg: aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>"
# gpg: aka "David Gibson (kernel.org) <dwg@kernel.org>"
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E 87DC 6C38 CACA 20D9 B392
* remotes/dgibson/tags/ppc-for-3.0-20180703: (35 commits)
ppc: Include vga cirrus card into the compiling process
target/ppc: Relax reserved bitmask of indexed store instructions
target/ppc: set is_jmp on ppc_tr_breakpoint_check
spapr: compute default value of "hpt-max-page-size" later
target/ppc/kvm: don't pass cpu to kvm_get_smmu_info()
target/ppc/kvm: get rid of kvm_get_fallback_smmu_info()
ppc440_uc: Basic emulation of PPC440 DMA controller
sam460ex: Add RTC device
hw/timer: Add basic M41T80 emulation
ppc4xx_i2c: Rewrite to model hardware more closely
hw/ppc: Give sam46ex its own config option
fpu_helper.c: fix setting FPSCR[FI] bit
target/ppc: Implement the rest of gen_st_atomic
target/ppc: Implement the rest of gen_ld_atomic
target/ppc: Use atomic min/max helpers
target/ppc: Use MO_ALIGN for EXIWX and ECOWX
target/ppc: Split out gen_st_atomic
target/ppc: Split out gen_ld_atomic
target/ppc: Split out gen_load_locked
target/ppc: Tidy gen_conditional_store
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
# Conflicts:
# hw/ppc/spapr.c
Just like for the realize handlers, this makes possible to move the
common ICSState code of the reset handlers in the ics-base class.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This makes possible to move the common ICSState code of the realize
handlers in the ics-base class.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This changes the ICP realize and reset handlers in DeviceRealize and
DeviceReset handlers. parent handlers are now called from the
inheriting classes which is a cleaner object pattern.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
It eases code review, unit is explicit.
Patch generated using:
$ git grep -E '(1024|2048|4096|8192|(<<|>>).?(10|20|30))' hw/ include/hw/
and modified manually.
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <20180625124238.25339-33-f4bug@amsat.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The way we used to handle KVM allowable guest pagesizes for PAPR guests
required some convoluted checking of memory attached to the guest.
The allowable pagesizes advertised to the guest cpus depended on the memory
which was attached at boot, but then we needed to ensure that any memory
later hotplugged didn't change which pagesizes were allowed.
Now that we have an explicit machine option to control the allowable
maximum pagesize we can simplify this. We just check all memory backends
against that declared pagesize. We check base and cold-plugged memory at
reset time, and hotplugged memory at pre_plug() time.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
The way the POWER Hash Page Table (HPT) MMU is virtualized by KVM HV means
that every page that the guest puts in the pagetables must be truly
physically contiguous, not just GPA-contiguous. In effect this means that
an HPT guest can't use any pagesizes greater than the host page size used
to back its memory.
At present we handle this by changing what we advertise to the guest based
on the backing pagesizes. This is pretty bad, because it means the guest
sees a different environment depending on what should be host configuration
details.
As a start on fixing this, we add a new capability parameter to the
pseries machine type which gives the maximum allowed pagesizes for an
HPT guest. For now we just create and validate the parameter without
making it do anything.
For backwards compatibility, on older machine types we set it to the max
available page size for the host. For the 3.0 machine type, we fix it to
16, the intention being to only allow HPT pagesizes up to 64kiB by default
in future.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
spapr_irq_alloc_block and spapr_irq_alloc() are now deprecated.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Today, when a device requests for IRQ number in a sPAPR machine, the
spapr_irq_alloc() routine first scans the ICSState status array to
find an empty slot and then performs the assignement of the selected
numbers. Split this sequence in two distinct routines : spapr_irq_find()
for lookups and spapr_irq_claim() for claiming the IRQ numbers.
This will ease the introduction of a static layout of IRQ numbers.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
spapr capabilities have an apply hook to actually activate (or deactivate)
the feature in the system at reset time. However, a number of capabilities
affect the setup of cpus, and need to be applied to each of them -
including hotplugged cpus for extra complication. To make this simpler,
add an optional cpu_apply hook that is called from spapr_cpu_reset().
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Previously, the effective values of the various spapr capability flags
were only determined at machine reset time. That was a lazy way of making
sure it was after cpu initialization so it could use the cpu object to
inform the defaults.
But we've now improved the compat checking code so that we don't need to
instantiate the cpus to use it. That lets us move the resolution of the
capability defaults much earlier.
This is going to be necessary for some future capabilities.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
It introduces a base PnvChip class from which the specific processor
chip classes, Pnv8Chip and Pnv9Chip, inherit. Each of them needs to
define an init and a realize routine which will create the controllers
of the target processor. For the moment, the base PnvChip class
handles the XSCOM bus and the cores.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
A per-CPU machine data pointer was recently added to PowerPCCPU. The
motivation is to to hide platform specific details from the core CPU
code. This per-CPU data can hold state which is relevant to the guest
though, eg, Virtual Processor Areas, and we should migrate this state.
This patch adds the plumbing so that we can migrate the per-CPU data
for PAPR guests. We only do this for newer machine types for the sake
of backward compatibility. No state is migrated for the moment: the
vmstate_spapr_cpu_state structure will be populated by subsequent
patches.
Signed-off-by: Greg Kurz <groug@kaod.org>
[dwg: Fix some trivial spelling and spacing errors]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This moves the details of the ISA bus creation under the LPC model but
more important, the new PnvChip operation will let us choose the chip
class to use when we introduce the different chip classes for Power9
and Power8. It hides away the processor chip controllers from the
machine.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
On Power9, the thread interrupt presenter has a different type and is
linked to the chip owning the cores.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
CPUPPCState currently contains a number of fields containing the state of
the VPA. The VPA is a PAPR specific concept covering several guest/host
shared memory areas used to communicate some information with the
hypervisor.
As a PAPR concept this is really machine specific information, although it
is per-cpu, so it doesn't really belong in the core CPU state structure.
There's also other information that's per-cpu, but platform/machine
specific. So create a (void *)machine_data in PowerPCCPU which can be
used by the machine to locate per-cpu data. Intialization, lifetime and
cleanup of machine_data is entirely up to the machine type.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Tested-by: Greg Kurz <groug@kaod.org>
Currently, we allocate space for all the cpu objects within a single core
in one big block. This was copied from an older version of the spapr code
and requires some ugly pointer manipulation to extract the individual
objects.
This design was due to a misunderstanding of qemu lifetime conventions and
has already been changed in spapr (in 94ad93bd "spapr_cpu_core: instantiate
CPUs separately".
Make an equivalent change in pnv_core to get rid of the nasty pointer
arithmetic.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
This option allows the VIA configuration to be controlled between 3
different possible setups: cuda, pmu-adb and pmu with USB rather than ADB
keyboard/mouse.
For the moment we don't do anything with the configuration except to pass
it to the macio device (the via-cuda parent) and also to the firmware via
the fw_cfg interface so that it can present the correct device tree.
The default is cuda which is the current default and so will have no
change in behaviour.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
A specific MemoryRegion is required for the LPC HC Firmware address
space.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This is used in OpenBIOS to define the memory layout of the NVRAM device. Whilst
currently left at its default value, add the missing definition to ensure it is
reserved.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
There is no need to include pci.h in these files.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Let's make it clear at relevant places that we are dealing with device
memory. That it can be used for memory hotplug is just a special case.
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20180423165126.15441-11-david@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
[ehabkost: rebased series, solved conflicts at spapr.c]
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Let's allow to query the MemoryHotplugState directly from the machine.
If the pointer is NULL, the machine does not support memory devices. If
the pointer is !NULL, the machine supports memory devices and the
data structure contains information about the applicable physical
guest address space region.
This allows us to generically detect if a certain machine has support
for memory devices, and to generically manage it (find free address
range, plug/unplug a memory region).
We will rename "MemoryHotplugState" to something more meaningful
("DeviceMemory") after we completed factoring out the pc-dimm code into
MemoryDevice code.
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20180423165126.15441-3-david@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
[ehabkost: rebased series, solved conflicts at spapr.c]
[ehabkost: squashed fix to use g_malloc0()]
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Under PAPR, only the boot CPU is active when the system starts. Other cpus
must be explicitly activated using an RTAS call. The entry state for the
boot and secondary cpus isn't identical, but it has some things in common.
We're going to add a bit more common setup later, too, so to simplify
make a helper which sets up the common entry state for both boot and
secondary cpu threads.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Tested-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
The new property ibm,dynamic-memory-v2 allows memory to be represented
in a more compact manner in device tree.
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
As a rule we prefer to pass PowerPCCPU instead of CPUPPCState, and this
change will make some things simpler later on.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Convert cap-ibs (indirect branch speculation) to a custom spapr-cap
type.
All tristate caps have now been converted to custom spapr-caps, so
remove the remaining support for them.
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
[dwg: Don't explicitly list "?"/help option, trust convention]
[dwg: Fold tristate removal into here, to not break bisect]
[dwg: Fix minor style problems]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This is to faciliate access to OpenPICState when wiring up the PIC to the macio
controller.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This is needed before the next patch because the target-dependent kvm stub
uses the existing kvm_openpic_connect_vcpu() declaration, making it impossible
to move the device-specific declarations into the same file without breaking
ppc-linux-user compilation.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
In my "build everything" tree, a change to the types in
qapi-schema.json triggers a recompile of about 4800 out of 5100
objects.
The previous commit split up qmp-commands.h, qmp-event.h, qmp-visit.h,
qapi-types.h. Each of these headers still includes all its shards.
Reduce compile time by including just the shards we actually need.
To illustrate the benefits: adding a type to qapi/migration.json now
recompiles some 2300 instead of 4800 objects. The next commit will
improve it further.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20180211093607.27351-24-armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
[eblake: rebase to master]
Signed-off-by: Eric Blake <eblake@redhat.com>
The spapr_vcpu_id() function is an accessor actually. Let's rename it
for symmetry with the recently added spapr_set_vcpu_id() helper.
The motivation behind this is that a later patch will consolidate
the VCPU id formula in a function and spapr_vcpu_id looks like an
appropriate name.
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The VCPU ids are currently computed and assigned to each individual
CPU threads in spapr_cpu_core_realize(). But the numbering logic
of VCPU ids is actually a machine-level concept, and many places
in hw/ppc/spapr.c also have to compute VCPU ids out of CPU indexes.
The current formula used in spapr_cpu_core_realize() is:
vcpu_id = (cc->core_id * spapr->vsmt / smp_threads) + i
where:
cc->core_id is a multiple of smp_threads
cpu_index = cc->core_id + i
0 <= i < smp_threads
So we have:
cpu_index % smp_threads == i
cc->core_id / smp_threads == cpu_index / smp_threads
hence:
vcpu_id =
(cpu_index / smp_threads) * spapr->vsmt + cpu_index % smp_threads;
This formula was used before VSMT at the time VCPU ids where computed
at the target emulation level. It has the advantage of being useable
to derive a VPCU id out of a CPU index only. It is fitted for all the
places where the machine code has to compute a VCPU id.
This patch introduces an accessor to set the VCPU id in a PowerPCCPU object
using the above formula. It is a first step to consolidate all the VCPU id
logic in a single place.
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The new H-Call H_GET_CPU_CHARACTERISTICS is used by the guest to query
behaviours and available characteristics of the cpu.
Implement the handler for this new H-Call which formulates its response
based on the setting of the spapr_caps cap-cfpc, cap-sbbc and cap-ibs.
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Add new tristate cap cap-ibs to represent the indirect branch
serialisation capability.
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Add new tristate cap cap-sbbc to represent the speculation barrier
bounds checking capability.
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Add new tristate cap cap-cfpc to represent the cache flush on privilege
change capability.
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
spapr_caps are used to represent the level of support for various
capabilities related to the spapr machine type. Currently there is
only support for boolean capabilities.
Add support for tristate capabilities by implementing their get/set
functions. These capabilities can have the values 0, 1 or 2
corresponding to broken, workaround and fixed.
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Add three new kvm capabilities used to represent the level of host support
for three corresponding workarounds.
Host support for each of the capabilities is queried through the
new ioctl KVM_PPC_GET_CPU_CHAR which returns four uint64 quantities. The
first two, character and behaviour, represent the available
characteristics of the cpu and the behaviour of the cpu respectively.
The second two, c_mask and b_mask, represent the mask of known bits for
the character and beheviour dwords respectively.
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
[dwg: Correct some compile errors due to name change in final kernel
patch version]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This redefinition generates warnings on some clang compilers and older
gcc4.4.
...include/hw/ppc/pnv_xscom.h:24:24: warning: redefinition of typedef 'PnvChip' is a C11
feature [-Wtypedef-redefinition]
typedef struct PnvChip PnvChip;
^
...include/hw/ppc/pnv.h:65:3: note: previous definition is here
} PnvChip;
^
1 warning generated.
CC ppc64-softmmu/hw/ppc/pnv_xscom.o
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The XSCOM base address of the core chiplet was wrongly calculated. Use
the OPAL macros to fix that and do a couple of renames.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
These are useful when instantiating device models which are shared
between the POWER8 and the POWER9 processor families.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Currently spapr_caps are tied to boolean values (on or off). This patch
reworks the caps so that they can have any uint8 value. This allows more
capabilities with various values to be represented in the same way
internally. Capabilities are numbered in ascending order. The internal
representation of capability values is an array of uint8s in the
sPAPRMachineState, indexed by capability number.
Capabilities can have their own name, description, options, getter and
setter functions, type and allow functions. They also each have their own
section in the migration stream. Capabilities are only migrated if they
were explictly set on the command line, with the assumption that
otherwise the default will match.
On migration we ensure that the capability value on the destination
is greater than or equal to the capability value from the source. So
long at this remains the case then the migration is considered
compatible and allowed to continue.
This patch implements generic getter and setter functions for boolean
capabilities. It also converts the existings cap-htm, cap-vsx and
cap-dfp capabilities to this new format.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Decimal Floating Point has been available on POWER7 and later (server)
cpus. However, it can be disabled on the hypervisor, meaning that it's
not available to guests.
We currently handle this by conditionally advertising DFP support in the
device tree depending on whether the guest CPU model supports it - which
can also depend on what's allowed in the host for -cpu host. That can lead
to confusion on migration, since host properties are silently affecting
guest visible properties.
This patch handles it by treating it as an optional capability for the
pseries machine type.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
We currently have some conditionals in the spapr device tree code to decide
whether or not to advertise the availability of the VMX (aka Altivec) and
VSX vector extensions to the guest, based on whether the guest cpu has
those features.
This can lead to confusion and subtle failures on migration, since it makes
a guest visible change based only on host capabilities. We now have a
better mechanism for this, in spapr capabilities flags, which explicitly
depend on user options rather than host capabilities.
Rework the advertisement of VSX and VMX based on a new VSX capability. We
no longer bother with a conditional for VMX support, because every CPU
that's ever been supported by the pseries machine type supports VMX.
NOTE: Some userspace distributions (e.g. RHEL7.4) already rely on
availability of VSX in libc, so using cap-vsx=off may lead to a fatal
SIGILL in init.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Now that the "pseries" machine type implements optional capabilities (well,
one so far) there's the possibility of having different capabilities
available at either end of a migration. Although arguably a user error,
it would be nice to catch this situation and fail as gracefully as we can.
This adds code to migrate the capabilities flags. These aren't pulled
directly into the destination's configuration since what the user has
specified on the destination command line should take precedence. However,
they are checked against the destination capabilities.
If the source was using a capability which is absent on the destination,
we fail the migration, since that could easily cause a guest crash or other
bad behaviour. If the source lacked a capability which is present on the
destination we warn, but allow the migration to proceed.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
This adds an spapr capability bit for Hardware Transactional Memory. It is
enabled by default for pseries-2.11 and earlier machine types. with POWER8
or later CPUs (as it must be, since earlier qemu versions would implicitly
allow it). However it is disabled by default for the latest pseries-2.12
machine type.
This means that with the latest machine type, HTM will not be available,
regardless of CPU, unless it is explicitly enabled on the command line.
That change is made on the basis that:
* This way running with -M pseries,accel=tcg will start with whatever cpu
and will provide the same guest visible model as with accel=kvm.
- More specifically, this means existing make check tests don't have
to be modified to use cap-htm=off in order to run with TCG
* We hope to add a new "HTM without suspend" feature in the not too
distant future which could work on both POWER8 and POWER9 cpus, and
could be enabled by default.
* Best guesses suggest that future POWER cpus may well only support the
HTM-without-suspend model, not the (frankly, horribly overcomplicated)
POWER8 style HTM with suspend.
* Anecdotal evidence suggests problems with HTM being enabled when it
wasn't wanted are more common than being missing when it was.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Because PAPR is a paravirtual environment access to certain CPU (or other)
facilities can be blocked by the hypervisor. PAPR provides ways to
advertise in the device tree whether or not those features are available to
the guest.
In some places we automatically determine whether to make a feature
available based on whether our host can support it, in most cases this is
based on limitations in the available KVM implementation.
Although we correctly advertise this to the guest, it means that host
factors might make changes to the guest visible environment which is bad:
as well as generaly reducing reproducibility, it means that a migration
between different host environments can easily go bad.
We've mostly gotten away with it because the environments considered mature
enough to be well supported (basically, KVM on POWER8) have had consistent
feature availability. But, it's still not right and some limitations on
POWER9 is going to make it more of an issue in future.
This introduces an infrastructure for defining "sPAPR capabilities". These
are set by default based on the machine version, masked by the capabilities
of the chosen cpu, but can be overriden with machine properties.
The intention is at reset time we verify that the requested capabilities
can be supported on the host (considering TCG, KVM and/or host cpu
limitations). If not we simply fail, rather than silently modifying the
advertised featureset to the guest.
This does mean that certain configurations that "worked" may now fail, but
such configurations were already more subtly broken.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
The 'pnv' prefix is now used for all and the routines populating the
device tree start with 'pnv_dt'. The handler of the PnvXScomInterface
is also renamed to 'dt_xscom' which should reflect that it is
populating the device tree under the 'xscom@' node of the chip.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
LoPAPR 1.1 B.6.9.1.2 describes the "#interrupt-cells" property of the
PowerPC External Interrupt Source Controller node as follows:
“#interrupt-cells”
Standard property name to define the number of cells in an interrupt-
specifier within an interrupt domain.
prop-encoded-array: An integer, encoded as with encode-int, that denotes
the number of cells required to represent an interrupt specifier in its
child nodes.
The value of this property for the PowerPC External Interrupt option shall
be 2. Thus all interrupt specifiers (as used in the standard “interrupts”
property) shall consist of two cells, each containing an integer encoded
as with encode-int. The first integer represents the interrupt number the
second integer is the trigger code: 0 for edge triggered, 1 for level
triggered.
This patch fixes the interrupt specifiers in the "interrupt-map" property
of the PHB node, that were setting the second cell to 8 (confusion with
IRQ_TYPE_LEVEL_LOW ?) instead of 1.
VIO devices and RTAS event sources use the same format for interrupt
specifiers: while here, we introduce a common helper to handle the
encoding details.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Tested-by: Cédric Le Goater <clg@kaod.org>
--
v3: - reference public LoPAPR instead of internal PAPR+ in changelog
- change helper name to spapr_dt_xics_irq()
v2: - drop the erroneous changes to the "interrupts" prop in PCI device nodes
- introduce a common helper to encode interrupt specifiers
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
xics_get_qirq() is only used by the sPAPR machine. Let's move it there
and change its name to reflect its scope. It will be useful for XIVE
support which will use its own set of qirqs.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Also change the prototype to use a sPAPRMachineState and prefix them
with spapr_irq_. It will let us synchronise the IRQ allocation with
the XIVE interrupt mode when available.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The sPAPR and the PowerNV core objects create the interrupt presenter
object of the CPUs in a very similar way. Let's provide a common
routine in which we use the presenter 'type' as a child identifier.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The current code assumes that only the CPU core object holds a
reference on each individual CPU object, and happily frees their
allocated memory when the core is unrealized. This is dangerous
as some other code can legitimely keep a pointer to a CPU if it
calls object_ref(), but it would end up with a dangling pointer.
Let's allocate all CPUs with object_new() and let QOM free them
when their reference count reaches zero. This greatly simplify the
code as we don't have to fiddle with the instance size anymore.
Signed-off-by: Greg Kurz <groug@kaod.org>
Acked-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
When using the emulated XICS, the 'info pic' monitor command shows:
CPU 0 XIRR=ff000000 ((nil)) PP=ff MFRR=ff
ICS 1000..13ff 0x10040060340
1000 MSI 05 00
1001 MSI 05 00
1002 MSI 05 00
1003 MSI ff 00
1004 LSI ff 00
1005 LSI ff 00
1006 LSI ff 00
1007 LSI ff 00
1008 MSI 05 00
1009 MSI 05 00
100a MSI 05 00
100b MSI 05 00
100c MSI 05 00
but when using the in-kernel XICS with the very same guest, we get:
CPU 0 XIRR=00000000 ((nil)) PP=ff MFRR=ff
ICS 1000..13ff 0x10032e00340
1000 MSI ff 00
1001 MSI ff 00
1002 MSI ff 00
1003 MSI ff 00
1004 LSI ff 00
1005 LSI ff 00
1006 LSI ff 00
1007 LSI ff 00
1008 MSI ff 00
1009 MSI ff 00
100a MSI ff 00
100b MSI ff 00
100c MSI ff 00
ie, all irqs are masked and XIRR is null, while we should get the
same output as with the emulated XICS.
If the guest is then migrated, 'info pic' shows the expected values
on both source and destination.
The problem is that QEMU doesn't synchronize with KVM before printing
the XICS state. Migration happens to fix the output because it enforces
synchronization with KVM.
To fix the invalid output of 'info pic', this patch introduces a new
synchronize_state operation for both ICPStateClass and ICSStateClass.
The ICP operation relies on run_on_cpu() in order to kick the vCPU
and avoid sleeping on KVM_GET_ONE_REG.
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
deduce core type directly from chip type instead of
maintaining type mapping in PnvChipClass::cpu_model.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
deduce cpu type directly from core type instead of
maintaining type mapping in PnvCoreClass::cpu_oc and doing
extra cpu_model parsing in pnv_core_class_init()
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
typically for cpus/core type names following convention is used
new_type_prefix-superclass_typename
make PNV core/chip to follow common convention.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
use common cpu_model prasing in vl.c and set default cpu_model
using generic MachineClass::default_cpu_type.
Beside of switching to generic infrastructure it solves several
issues.
* ppc_cpu_class_by_name() is used to deal with lower/upper case
and alias translations into actual cpu type, which fixes
'-M powernv -cpu power8' and '-M powernv -cpu power9_v1.0'
usecases which error out with:
'invalid CPU model 'FOO' for powernv machine'
* allows to switch to lower-case typenames in pnv chip/core name
(by convention typnames should be lower-case)
* replace aliased names /power8, power9, .../ with exact cpu model
names (i.e. typenames should be stable but aliases might decide to
point to other cpu model withi family or changed by kvm). It will
also help to simplify pnv_chip/core code and get rid of dependency
on cpu_model parsing.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
[dwg: Updated to make DD2.0 as default POWER9 chip]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
use generic cpu_model parsing introduced by
(6063d4c0f vl.c: convert cpu_model to cpu type and set of global properties before machine_init())
it allows to:
* replace sPAPRMachineClass::tcg_default_cpu with
MachineClass::default_cpu_type
* drop cpu_parse_cpu_model() from hw/ppc/spapr.c and reuse
one in vl.c
* simplify spapr_get_cpu_core_type() by removing
not needed anymore recurrsion since alias look up
happens earlier at vl.c and spapr_get_cpu_core_type()
works only with resulted from that cpu type.
* spapr no more needs to parse/depend on being phased out
MachineState::cpu_model, all tha parsing done by generic
code and target specific callback.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
[dwg: Correct minor compile error]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
consolidate 'host' core type registration by moving it from
KVM specific code into spapr_cpu_core.c, similar like it's
done in x86 target.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
replace sPAPRCPUCoreClass::cpu_class with cpu type name
since it were needed just to get that at points it were
accessed.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
spapr core type definition doesn't have any fields that
require it to be defined at runtime. So replace code
that fills in TypeInfo at runtime with static TypeInfo
array that does the same at complie time.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
there is a dedicated callback CPUClass::parse_features
which purpose is to convert -cpu features into a set of
global properties AND deal with compat/legacy features
that couldn't be directly translated into CPU's properties.
Create ppc variant of it (ppc_cpu_parse_featurestr) and
move 'compat=val' handling from spapr_cpu_core.c into it.
That removes a dependency of board/core code on cpu_model
parsing and would let to reuse common -cpu parsing
introduced by 6063d4c0
Set "max-cpu-compat" property only if it exists, in practice
it should limit 'compat' hack to spapr machine and allow
to avoid including machine/spapr headers in target/ppc/cpu.c
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
ppc_cpu_parse_features() is doing practically the same thing as
generic cpu_parse_cpu_model(). So remove duplicated impl. and
reuse generic one.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Instead we can now instantiate the MAC_DBDMA object directly within the
macio device. We also add the DBDMA device as a child property so that
it is possible to retrieve later.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
These fields were used to manually handle IO requests that weren't aligned
to a sector boundary before this feature was supported by the block API.
Once the block API changed to support byte-aligned IO requests, the macio
controller was switched over to use it in commit be1e343 but these fields
were accidentally left behind. Remove them, including the initialisation
in DBDMA_init().
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Although none of the existing macro call-sites were broken,
it's always better to write macros that properly parenthesize
arguments that can be complex expressions, so that the intended
order of operations is not broken.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Apple uses an IBM MPIC2A without timers, it has 64 sources.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
On POWER9, the Client Architecture Support (CAS) negotiation process
determines whether the guest operates in XIVE Legacy compatibility or
in XIVE exploitation mode. Now that we have initial guest support for
the XIVE interrupt controller, let's fix the bits definition which have
evolved in the latest specs.
The platform advertises the XIVE Exploitation Mode support using the
property "ibm,arch-vec-5-platform-support-vec-5", byte 23 bits 0-1 :
- 0b00 XIVE legacy mode Only
- 0b01 XIVE exploitation mode Only
- 0b10 XIVE legacy or exploitation mode
The OS asks for XIVE Exploitation Mode support using the property
"ibm,architecture-vec-5", byte 23 bits 0-1:
- 0b00 XIVE legacy mode Only
- 0b01 XIVE exploitation mode Only
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
KVM now allows writing to KVM_CAP_PPC_SMT which has previously been
read only. Doing so causes KVM to act, for that VM, as if the host's
SMT mode was the given value. This is particularly important on Power
9 systems because their default value is 1, but they are able to
support values up to 8.
This patch introduces a way to control this capability via a new
machine property called VSMT ("Virtual SMT"). If the value is not set
on the command line a default is chosen that is, when possible,
compatible with legacy systems.
Note that the intialization of KVM_CAP_PPC_SMT has changed slightly
because it has changed (in KVM) from a global capability to a
VM-specific one. This won't cause a problem on older KVMs because VM
capabilities fall back to global ones.
Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Allow MAL with more RX and TX channels as found in newer versions.
Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This device appears in other SoCs as well not just in 405 ones
Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The concept of a VCPU ID that differs from the CPU's index
(cpu->cpu_index) exists only within SPAPR machines so, move the
functions ppc_get_vcpu_id() and ppc_get_cpu_by_vcpu_id() into spapr.c
and rename them appropriately.
Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This patch is a follow up on the discussions made in patch
"hw/ppc: disable hotplug before CAS is completed" that can be
found at [1].
At this moment, we do not support CPU/memory hotplug in early
boot stages, before CAS. When a hotplug occurs, the event is logged
in an internal RTAS event log queue and an IRQ pulse is fired. In
regular conditions, the guest handles the interrupt by executing
check_exception, fetching the generated hotplug event and enabling
the device for use.
In early boot, this IRQ isn't caught (SLOF does not handle hotplug
events), leaving the event in the rtas event log queue. If the guest
executes check_exception due to another hotplug event, the re-assertion
of the IRQ ends up de-queuing the first hotplug event as well. In short,
a device hotplugged before CAS is considered coldplugged by SLOF.
This leads to device misbehavior and, in some cases, guest kernel
Ooops when trying to unplug the device.
A proper fix would be to turn every device hotplugged before CAS
as a colplugged device. This is not trivial to do with the current
code base though - the FDT is written in the guest memory at
ppc_spapr_reset and can't be retrieved without adding extra state
(fdt_size for example) that will need to managed and migrated. Adding
the hotplugged DT in the middle of CAS negotiation via the updated DT
tree works with CPU devs, but panics the guest kernel at boot. Additional
analysis would be necessary for LMBs and PCI devices. There are
questions to be made in QEMU/SLOF/kernel level about how we can make
this change in a sustainable way.
With Linux guests, a fix would be the kernel executing check_exception
at boot time, de-queueing the events that happened in early boot and
processing them. However, even if/when the newer kernels start
fetching these events at boot time, we need to take care of older
kernels that won't be doing that.
This patch works around the situation by issuing a CAS reset if a hotplugged
device is detected during CAS:
- the DRC conditions that warrant a CAS reset is the same as those that
triggers a DRC migration - the DRC must have a device attached and
the DRC state is not equal to its ready_state. With that in mind, this
patch makes use of 'spapr_drc_needed' to determine if a CAS reset
is needed.
- In the middle of CAS negotiations, the function
'spapr_hotplugged_dev_before_cas' goes through all the DRCs to see
if there are any DRC that requires a reset, using spapr_drc_needed. If
that happens, returns '1' in 'spapr_h_cas_compose_response' which will set
spapr->cas_reboot to true, causing the machine to reboot.
No changes are made for coldplug devices.
[1] http://lists.nongnu.org/archive/html/qemu-devel/2017-08/msg02855.html
Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The sPAPR machine isn't clearing up the pending events QTAILQ on
machine reboot. This allows for unprocessed hotplug/epow events
to persist in the queue after reset and, when reasserting the IRQs in
check_exception later on, these will be being processed by the OS.
This patch implements a new function called 'spapr_clear_pending_events'
that clears up the pending_events QTAILQ. This helper is then called
inside ppc_spapr_reset to clear up the events queue, preventing
old/deprecated events from persisting after a reset.
Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
We've now implemented a PAPR extension allowing PAPR guest to resize
their hash page table (HPT) during runtime.
This patch makes use of that facility to allocate smaller HPTs by default.
Specifically when a guest is aware of the HPT resize facility, qemu sizes
the HPT to the initial memory size, rather than the maximum memory size on
the assumption that the guest will resize its HPT if necessary for hot
plugged memory.
When the initial memory size is much smaller than the maximum memory size
(a common configuration with e.g. oVirt / RHEV) then this can save
significant memory on the HPT.
If the guest does *not* advertise HPT resize awareness when it makes the
ibm,client-architecture-support call, qemu resizes the HPT for maxmimum
memory size (unless it's been configured not to allow such guests at all).
For now we make that reallocation assuming the guest has not yet used the
HPT at all. That's true in practice, but not, strictly, an architectural
or PAPR requirement. If we need to in future we can fix this by having
the client-architecture-support call reboot the guest with the revised
HPT size (the client-architecture-support call is explicitly permitted to
trigger a reboot in this way).
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
This patch implements hypercalls allowing a PAPR guest to resize its own
hash page table. This will eventually allow for more flexible memory
hotplug.
The implementation is partially asynchronous, handled in a special thread
running the hpt_prepare_thread() function. The state of a pending resize
is stored in SPAPR_MACHINE->pending_hpt.
The H_RESIZE_HPT_PREPARE hypercall will kick off creation of a new HPT, or,
if one is already in progress, monitor it for completion. If there is an
existing HPT resize in progress that doesn't match the size specified in
the call, it will cancel it, replacing it with a new one matching the
given size.
The H_RESIZE_HPT_COMMIT completes transition to a resized HPT, and can only
be called successfully once H_RESIZE_HPT_PREPARE has successfully
completed initialization of a new HPT. The guest must ensure that there
are no concurrent accesses to the existing HPT while this is called (this
effectively means stop_machine() for Linux guests).
For now H_RESIZE_HPT_COMMIT goes through the whole old HPT, rehashing each
HPTE into the new HPT. This can have quite high latency, but it seems to
be of the order of typical migration downtime latencies for HPTs of size
up to ~2GiB (which would be used in a 256GiB guest).
In future we probably want to move more of the rehashing to the "prepare"
phase, by having H_ENTER and other hcalls update both current and
pending HPTs. That's a project for another day, but should be possible
without any changes to the guest interface.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This introduces stub implementations of the H_RESIZE_HPT_PREPARE and
H_RESIZE_HPT_COMMIT hypercalls which we hope to add in a PAPR
extension to allow run time resizing of a guest's hash page table. It
also adds a new machine property for controlling whether this new
facility is available.
For now we only allow resizing with TCG, allowing it with KVM will require
kernel changes as well.
Finally, it adds a new string to the hypertas property in the device
tree, advertising to the guest the availability of the HPT resizing
hypercalls. This is a tentative suggested value, and would need to be
standardized by PAPR before being merged.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
e6f7e110ee "ppc/xics: remove the XICSState classes" got rid of
XICSState, this is just an leftover.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
According to PAPR, the DR-indicator should only be valid for physical DRCs,
not logical DRCs. At the moment we implement it for all DRCs, so restrict
it to physical ones only.
We move the state to the physical DRC subclass, which means adding some
QOM boilerplate to handle the newly distinct type.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>
Tested-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>
Most of the time, the state of a DRC object is contained in the single
'state' variable. However, during the transition from UNISOLATE to
CONFIGURED state requires multiple calls to the ibm,configure-connector
RTAS call to retrieve the device tree for the attached device. We need
some extra state to keep track of where we're up to in delivering the
device tree information to the guest.
Currently that extra state is in a sPAPRConfigureConnectorState
substructure which is only allocated when we're in the middle of the
configure connector process. That sounds like a good idea, but the extra
state is only two integers - on many platforms that will take up the same
room as the (maybe NULL) ccs pointer even before malloc() overhead. Plus
it's another object whose lifetime we need to manage. In short, it's not
worth it.
So, fold the sPAPRConfigureConnectorState substructure directly into the
DRC object.
Previously the structure was allocated lazily when the configure-connector
call discovers it's not there. Now, we need to initialize the subfields
pre-emptively, as soon as we enter UNISOLATE state.
Although it's not strictly necessary (the field values should only ever
be consulted when in UNISOLATE state), we try to keep them at -1 when in
other states, as a debugging aid.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>
Tested-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>
Each DRC has three fields describing its state: isolation_state,
allocation_state and configured. At first this seems like a reasonable
representation, since its based directly on the PAPR defined
isolation-state and allocation-state indicators. However:
* Only a few combinations of the two fields' values are permitted
* allocation_state isn't used at all for physical DRCs
* The indicators are write only so they don't really have a well
defined current value independent of each other
This replaces these variables with a single state variable, whose names
and numbers are based on the diagram in LoPAPR section 13.4. Along with
this we add code to check the current state on various operations and make
sure the requested transition is permitted.
Strictly speaking, this makes guest visible changes to behaviour (since we
probably allowed some transitions we shouldn't have before). However, a
hypothetical guest broken by that wasn't PAPR compliant, and probably
wouldn't have worked under PowerVM.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>
Tested-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>
'awaiting_release' indicates that the host has requested an unplug of the
device attached to the DRC, but the guest has not (yet) put the device
into a state where it is safe to complete removal.
1. Rename it to 'unplug_requested' which to me at least is clearer
2. Remove the ->release_pending() method used to check this from outside
spapr_drc.c. The method only plausibly has one implementation, so use
a plain function (spapr_drc_unplug_requested()) instead.
3. Remove it from the migration stream. Attempting to migrate mid-unplug
is broken not just for spapr - in general management has no good way to
determine if the device should be present on the destination or not. So,
until that's fixed, there's no point adding extra things to the stream.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Tested-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>
This function has two unused parameters - remove them.
It also sets awaiting_release on all paths, except one. On that path
setting it is harmless, since it will be immediately cleared by
spapr_drc_release(). So factor it out of the if statements.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Tested-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>
The awaiting_allocation flag in the DRC was introduced by aab9913
"spapr_drc: Prevent detach racing against attach for CPU DR", allegedly to
prevent a guest crash on racing attach and detach. Except.. information
from the BZ actually suggests a qemu crash, not a guest crash. And there
shouldn't be a problem here anyway: if the guest has already moved the DRC
away from UNUSABLE state, the detach would already be deferred, and if it
hadn't it should be safe to detach it (the guest should fail gracefully
when it attempts to change the allocation state).
I think this was probably just a bandaid for some other problem in the
state management. So, remove awaiting_allocation and associated code.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Tested-by: Greg Kurz <groug@kaod.org>
Tested-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>
When migrating a guest which has already had devices hotplugged,
libvirt typically starts the destination qemu with -incoming defer,
adds those hotplugged devices with qmp, then initiates the incoming
migration.
This causes problems for the management of spapr DRC state. Because
the device is treated as hotplugged, it goes into a DRC state for a
device immediately after it's plugged, but before the guest has
acknowledged its presence. However, chances are the guest on the
source machine *has* acknowledged the device's presence and configured
it.
If the source has fully configured the device, then DRC state won't be
sent in the migration stream: for maximum migration compatibility with
earlier versions we don't migrate DRCs in coldplug-equivalent state.
That means that the DRC effectively changes state over the migrate,
causing problems later on.
In addition, logging hotplug events for these devices isn't what we
want because a) those events should already have been issued on the
source host and b) the event queue should get wiped out by the
incoming state anyway.
In short, what we really want is to treat devices added before an
incoming migration as if they were coldplugged.
To do this, we first add a spapr_drc_hotplugged() helper which
determines if the device is hotplugged in the sense relevant for DRC
state management. We only send hotplug events when this is true.
Second, when we add a device which isn't hotplugged in this sense, we
force a reset of the DRC state - this ensures the DRC is in a
coldplug-equivalent state (there isn't usually a system reset between
these device adds and the incoming migration).
This is based on an earlier patch by Laurent Vivier, cleaned up and
extended.
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Tested-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>
The rtas_error_log structure is marked packed, which strongly suggests its
precise layout is important to match an external interface. Along with
that one could expect it to have a fixed endianness to match the same
interface. That used to be the case - matching the layout of PAPR RTAS
event format and requiring BE fields.
Now, however, it's only used embedded within sPAPREventLogEntry with the
fields in native order, since they're processed internally.
Clear that up by removing the nested structure in sPAPREventLogEntry.
struct rtas_error_log is moved back to spapr_events.c where it is used as
a temporary to help convert the fields in sPAPREventLogEntry to the correct
in memory format when delivering an event to the guest.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
In racing situations between hotplug events and migration operation,
a rtas hotplug event could have not yet be delivered to the source
guest when migration is started. In this case the pending_events of
spapr state need be transmitted to the target so that the hotplug
event can be finished on the target.
To achieve the minimal VMSD possible to migrate the pending_events list,
this patch makes the changes in spapr_events.c:
- 'log_type' of sPAPREventLogEntry struct deleted. This information can be
derived by inspecting the rtas_error_log summary field. A new function
called 'spapr_event_log_entry_type' was added to retrieve the type of
a given sPAPREventLogEntry.
- sPAPREventLogEntry, epow_log_full and hp_log_full were redesigned. The
only data we're going to migrate in the VMSD is the event log data itself,
which can be divided in two parts: a rtas_error_log header and an extended
event log field. The rtas_error_log header contains information about the
size of the extended log field, which can be used inside VMSD as the size
parameter of the VBUFFER_ALOC field that will store it. To allow this use,
the header.extended_length field must be exposed inline to the VMSD instead
of embedded into a 'data' field that holds everything. With this in mind,
the following changes were done:
* a new 'header' field was added to sPAPREventLogEntry. This field holds a
a struct rtas_error_log inline.
* the declaration of the 'rtas_error_log' struct was moved to spapr.h
to be visible to the VMSD macros.
* 'data' field of sPAPREventLogEntry was renamed to 'extended_log' and
now holds only the contents of the extended event log.
* 'struct rtas_error_log hdr' were taken away from both epow_log_full
and hp_log_full. This information is now available at the header field of
sPAPREventLogEntry.
* epow_log_full and hp_log_full were renamed to epow_extended_log and
hp_extended_log respectively. This rename makes it clearer to understand
the new purpose of both structures: hold the information of an extended
event log field.
* spapr_powerdown_req and spapr_hotplug_req_event now creates a
sPAPREventLogEntry structure that contains the full rtas log entry.
* rtas_event_log_queue and rtas_event_log_dequeue now receives a
sPAPREventLogEntry pointer as a parameter instead of a void pointer.
- the endianess of the sPAPREventLogEntry header is now native instead
of be32. We can use the fields in native endianess internally and write
them in be32 in the guest physical memory inside 'check_exception'. This
allows the VMSD inside spapr.c to read the correct size of the
entended_log field.
- inside spapr.c, pending_events is put in a subsection in the spapr state
VMSD to make sure migration across different versions is not broken.
A small change in rtas_event_log_queue and rtas_event_log_dequeue were also
made: instead of calling qdev_get_machine(), both functions now receive
a pointer to the sPAPRMachineState. This pointer is already available in
the callers of these functions and we don't need to waste resources
calling qdev() again.
Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This finishes QOM'fication of IOMMUMemoryRegion by introducing
a IOMMUMemoryRegionClass. This also provides a fastpath analog for
IOMMU_MEMORY_REGION_GET_CLASS().
This makes IOMMUMemoryRegion an abstract class.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Message-Id: <20170711035620.4232-3-aik@ozlabs.ru>
Acked-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This defines new QOM object - IOMMUMemoryRegion - with MemoryRegion
as a parent.
This moves IOMMU-related fields from MR to IOMMU MR. However to avoid
dymanic QOM casting in fast path (address_space_translate, etc),
this adds an @is_iommu boolean flag to MR and provides new helper to
do simple cast to IOMMU MR - memory_region_get_iommu. The flag
is set in the instance init callback. This defines
memory_region_is_iommu as memory_region_get_iommu()!=NULL.
This switches MemoryRegion to IOMMUMemoryRegion in most places except
the ones where MemoryRegion may be an alias.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <20170711035620.4232-2-aik@ozlabs.ru>
Acked-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
On POWER9, the Client Architecture Support (CAS) negotiation process
determines whether the guest operates in XIVE Legacy compatibility
(the former POWER8 interrupt model) or in XIVE exploitation mode (the
newer POWER9 interrupt model).
Bit 7 of Byte 23 of vector 5 is used for this purpose.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
spapr_drc_attach() has a 'coldplug' parameter which sets the DRC into
configured state initially, instead of the usual ISOLATED/UNUSABLE state.
It turns out this is unnecessary: although coldplugged devices do need to
be in CONFIGURED state once the guest starts, that will already be
accomplished by the reset code which will move DRCs for already plugged
devices into a coldplug equivalent state.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
At the moment, spapr_drc_release() has an ugly switch on the DRC type to
call the right, device-specific release function. This cleans it up by
doing that via a proper QOM method.
It's still arguably an abstraction violation for the DRC code to call into
the specific device code, but one mess at a time.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
We have more of these since the addition of KVMPPC_H_LOGICAL_MEMOP in 2012.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Since commit ff9006ddbf ("spapr: move spapr_core_[foo]plug() callbacks
close to machine code in spapr.c"), this function doesn't need to be extern
anymore.
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>