Commit Graph

197 Commits

Author SHA1 Message Date
David Gibson
64d4a53431 spapr: Add forgotten capability to migration stream
spapr machine capabilities are supposed to be sent in the migration stream
so that we can sanity check the source and destination have compatible
configuration.  Unfortunately, when we added the hpt-max-page-size
capability, we forgot to add it to the migration state.  This means that we
can generate spurious warnings when both ends are configured for large
pages, or potentially fail to warn if the source is configured for huge
pages, but the destination is not.

Fixes: 2309832afd "spapr: Maximum (HPT) pagesize property"

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
2019-05-29 11:39:45 +10:00
Benjamin Herrenschmidt
a2dd4e83e7 ppc/hash64: Rework R and C bit updates
With MT-TCG, we are now running translation in a racy way, thus
we need to mimic hardware when it comes to updating the R and
C bits, by doing byte stores.

The current "store_hpte" abstraction is ill suited for this, we
replace it with two separate callbacks for setting R and C.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190411080004.8690-4-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-04-26 11:37:57 +10:00
Cédric Le Goater
64db6c70dc spapr/rtas: modify spapr_rtas_register() to remove RTAS handlers
Removing RTAS handlers will become necessary when the new pseries
machine supporting multiple interrupt mode is introduced.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190321144914.19934-9-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-04-26 10:41:23 +10:00
Alexey Kardashevskiy
ec132efaa8 spapr: Support NVIDIA V100 GPU with NVLink2
NVIDIA V100 GPUs have on-board RAM which is mapped into the host memory
space and accessible as normal RAM via an NVLink bus. The VFIO-PCI driver
implements special regions for such GPUs and emulates an NVLink bridge.
NVLink2-enabled POWER9 CPUs also provide address translation services
which includes an ATS shootdown (ATSD) register exported via the NVLink
bridge device.

This adds a quirk to VFIO to map the GPU memory and create an MR;
the new MR is stored in a PCI device as a QOM link. The sPAPR PCI uses
this to get the MR and map it to the system address space.
Another quirk does the same for ATSD.

This adds additional steps to sPAPR PHB setup:

1. Search for specific GPUs and NPUs, collect findings in
sPAPRPHBState::nvgpus, manage system address space mappings;

2. Add device-specific properties such as "ibm,npu", "ibm,gpu",
"memory-block", "link-speed" to advertise the NVLink2 function to
the guest;

3. Add "mmio-atsd" to vPHB to advertise the ATSD capability;

4. Add new memory blocks (with extra "linux,memory-usable" to prevent
the guest OS from accessing the new memory until it is onlined) and
npuphb# nodes representing an NPU unit for every vPHB as the GPU driver
uses it for link discovery.

This allocates space for GPU RAM and ATSD like we do for MMIOs by
adding 2 new parameters to the phb_placement() hook. Older machine types
set these to zero.

This puts new memory nodes in a separate NUMA node to as the GPU RAM
needs to be configured equally distant from any other node in the system.
Unlike the host setup which assigns numa ids from 255 downwards, this
adds new NUMA nodes after the user configures nodes or from 1 if none
were configured.

This adds requirement similar to EEH - one IOMMU group per vPHB.
The reason for this is that ATSD registers belong to a physical NPU
so they cannot invalidate translations on GPUs attached to another NPU.
It is guaranteed by the host platform as it does not mix NVLink bridges
or GPUs from different NPU in the same IOMMU group. If more than one
IOMMU group is detected on a vPHB, this disables ATSD support for that
vPHB and prints a warning.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
[aw: for vfio portions]
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Message-Id: <20190312082103.130561-1-aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-04-26 10:41:23 +10:00
David Gibson
0a794529bd spapr: Simplify handling of host-serial and host-model values
27461d69a0 "ppc: add host-serial and host-model machine attributes
(CVE-2019-8934)" introduced 'host-serial' and 'host-model' machine
properties for spapr to explicitly control the values advertised to the
guest in device tree properties with the same names.

The previous behaviour on KVM was to unconditionally populate the device
tree with the real host serial number and model, which leaks possibly
sensitive information about the host to the guest.

To maintain compatibility for old machine types, we allowed those props
to be set to "passthrough" to take the value from the host as before.  Or
they could be set to "none" to explicitly omit the device tree items.

Special casing specific values on what's otherwise a user supplied string
is very ugly.  So, this patch simplifies things by implementing the
backwards compatibility in a different way: we have a machine class flag
set for the older machines, and we only load the host values into the
device tree if A) they're not set by the user and B) we have that flag set.

This does mean that the "passthrough" functionality is no longer available
with the current machine type.  That's ok though: if a user or management
layer really wants the information passed through they can read it
themselves (OpenStack Nova already does something similar for x86).

It also means the user can't explicitly ask for the values to be omitted
on the old machine types.  I think that's an acceptable trade-off: if you
care enough about not leaking the host information you can either move to
the new machine type, or use a dummy value for the properties.

For the new machine type, this also removes an odd inconsistency
between running on a POWER and non-POWER (or non-Linux) hosts: if the
host information couldn't be read from where we expect (in the host's
device tree as exposed by Linux), we'd fallback to omitting the guest
device tree items.

While we're there, improve some poorly worded comments, and the help text
for the properties.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Tested-by: Greg Kurz <groug@kaod.org>
2019-03-29 10:25:50 +11:00
David Gibson
ce2918cbc3 spapr: Use CamelCase properly
The qemu coding standard is to use CamelCase for type and structure names,
and the pseries code follows that... sort of.  There are quite a lot of
places where we bend the rules in order to preserve the capitalization of
internal acronyms like "PHB", "TCE", "DIMM" and most commonly "sPAPR".

That was a bad idea - it frequently leads to names ending up with hard to
read clusters of capital letters, and means they don't catch the eye as
type identifiers, which is kind of the point of the CamelCase convention in
the first place.

In short, keeping type identifiers look like CamelCase is more important
than preserving standard capitalization of internal "words".  So, this
patch renames a heap of spapr internal type names to a more standard
CamelCase.

In addition to case changes, we also make some other identifier renames:
  VIOsPAPR* -> SpaprVio*
    The reverse word ordering was only ever used to mitigate the capital
    cluster, so revert to the natural ordering.
  VIOsPAPRVTYDevice -> SpaprVioVty
  VIOsPAPRVLANDevice -> SpaprVioVlan
    Brevity, since the "Device" didn't add useful information
  sPAPRDRConnector -> SpaprDrc
  sPAPRDRConnectorClass -> SpaprDrcClass
    Brevity, and makes it clearer this is the same thing as a "DRC"
    mentioned in many other places in the code

This is 100% a mechanical search-and-replace patch.  It will, however,
conflict with essentially any and all outstanding patches touching the
spapr code.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-03-12 14:33:05 +11:00
Alexey Kardashevskiy
5f36666722 spapr_iommu: Do not replay mappings from just created DMA window
On sPAPR vfio_listener_region_add() is called in 2 situations:
1. a new listener is registered from vfio_connect_container();
2. a new IOMMU Memory Region is added from rtas_ibm_create_pe_dma_window().

In both cases vfio_listener_region_add() calls
memory_region_iommu_replay() to notify newly registered IOMMU notifiers
about existing mappings which is totally desirable for case 1.

However for case 2 it is nothing but noop as the window has just been
created and has no valid mappings so replaying those does not do anything.
It is barely noticeable with usual guests but if the window happens to be
really big, such no-op replay might take minutes and trigger RCU stall
warnings in the guest.

For example, a upcoming GPU RAM memory region mapped at 64TiB (right
after SPAPR_PCI_LIMIT) causes a 64bit DMA window to be at least 128TiB
which is (128<<40)/0x10000=2.147.483.648 TCEs to replay.

This mitigates the problem by adding an "skipping_replay" flag to
sPAPRTCETable and defining sPAPR own IOMMU MR replay() hook which does
exactly the same thing as the generic one except it returns early if
@skipping_replay==true.

Another way of fixing this would be delaying replay till the very first
H_PUT_TCE but this does not work if in-kernel H_PUT_TCE handler is
enabled (a likely case).

When "ibm,create-pe-dma-window" is complete, the guest will map only
required regions of the huge DMA window.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Message-Id: <20190307050518.64968-2-aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-03-12 14:33:04 +11:00
David Gibson
e075623aa5 spapr: Force SPAPR_MEMORY_BLOCK_SIZE to be a hwaddr (64-bit)
SPAPR_MEMORY_BLOCK_SIZE is logically a difference in memory addresses, and
hence of type hwaddr which is 64-bit.  Previously it wasn't marked as such
which means that it could be treated as 32-bit.  That will work in some
circumstances but if multiplied by another 32-bit value it could lead to
a 32-bit overflow and an incorrect result.

One specific instance of this in spapr_lmb_dt_populate() was spotted by
Coverity (CID 1399145).

Reported-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-03-12 14:33:04 +11:00
Suraj Jitindar Singh
8ff43ee404 target/ppc/spapr: Add SPAPR_CAP_CCF_ASSIST
Introduce a new spapr_cap SPAPR_CAP_CCF_ASSIST to be used to indicate
the requirement for a hw-assisted version of the count cache flush
workaround.

The count cache flush workaround is a software workaround which can be
used to flush the count cache on context switch. Some revisions of
hardware may have a hardware accelerated flush, in which case the
software flush can be shortened. This cap is used to set the
availability of such hardware acceleration for the count cache flush
routine.

The availability of such hardware acceleration is indicated by the
H_CPU_CHAR_BCCTR_FLUSH_ASSIST flag being set in the characteristics
returned from the KVM_PPC_GET_CPU_CHAR ioctl.

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Message-Id: <20190301031912.28809-2-sjitindarsingh@gmail.com>
[dwg: Small style fixes]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-03-12 12:07:49 +11:00
Suraj Jitindar Singh
399b2896d4 target/ppc/spapr: Add workaround option to SPAPR_CAP_IBS
The spapr_cap SPAPR_CAP_IBS is used to indicate the level of capability
for mitigations for indirect branch speculation. Currently the available
values are broken (default), fixed-ibs (fixed by serialising indirect
branches) and fixed-ccd (fixed by diabling the count cache).

Introduce a new value for this capability denoted workaround, meaning that
software can work around the issue by flushing the count cache on
context switch. This option is available if the hypervisor sets the
H_CPU_BEHAV_FLUSH_COUNT_CACHE flag in the cpu behaviours returned from
the KVM_PPC_GET_CPU_CHAR ioctl.

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Message-Id: <20190301031912.28809-1-sjitindarsingh@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-03-12 12:07:49 +11:00
Suraj Jitindar Singh
c982f5cf9a target/ppc/spapr: Add SPAPR_CAP_LARGE_DECREMENTER
Add spapr_cap SPAPR_CAP_LARGE_DECREMENTER to be used to control the
availability of the large decrementer for a guest.

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Message-Id: <20190301024317.22137-1-sjitindarsingh@gmail.com>
[dwg: Trivial style fix]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-03-12 12:07:49 +11:00
Greg Kurz
bb2bdd812e spapr: add hotplug hooks for PHB hotplug
Hotplugging PHBs is a machine-level operation, but PHBs reside on the
main system bus, so we register spapr machine as the handler for the
main system bus.

Provide the usual pre-plug, plug and unplug-request handlers.

Move the checking of the PHB index to the pre-plug handler. It is okay
to do that and assert in the realize function because the pre-plug
handler is always called, even for the oldest machine types we support.

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
(Fixed interrupt controller phandle in "interrupt-map" and
 TCE table size in "ibm,dma-window" FDT fragment, Greg Kurz)
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <155059672926.1466090.13612804072190051439.stgit@bahia.lab.toulouse-stg.fr.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26 09:21:25 +11:00
Michael Roth
962b6c3650 spapr: create DR connectors for PHBs
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <155059670389.1466090.10015601248906623076.stgit@bahia.lab.toulouse-stg.fr.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26 09:21:25 +11:00
Greg Kurz
345b12b99e spapr: Generate FDT fragment for CPUs at configure connector time
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <155059666839.1466090.3833376527523126752.stgit@bahia.lab.toulouse-stg.fr.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26 09:21:25 +11:00
Greg Kurz
62d38c9bd3 spapr: Generate FDT fragment for LMBs at configure connector time
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <155059666331.1466090.6766540766297333313.stgit@bahia.lab.toulouse-stg.fr.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26 09:21:25 +11:00
Benjamin Herrenschmidt
00fd075e18 target/ppc/spapr: Set LPCR:HR when using Radix mode
The HW relies on LPCR:HR along with the PATE to determine whether
to use Radix or Hash mode. In fact it uses LPCR:HR more commonly
than the PATE.

For us, it's also more efficient to do so, especially since unlike
the HW we do not maintain a cache of the current PATE and HV PATE
in a generic place.

Prepare the grounds for that by ensuring that LPCR:HR is set
properly on SPAPR machines.

Another option would have been to use a callback to get the PATE
but this gets messy when implementing bare metal support, it's
much simpler (and faster) to use LPCR.

Since existing migration streams may not have it, fix it up in
spapr_post_load() as well based on the pseudo-PATE entry that
we keep.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190215170029.15641-2-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26 09:21:25 +11:00
Prasad J Pandit
27461d69a0 ppc: add host-serial and host-model machine attributes (CVE-2019-8934)
On ppc hosts, hypervisor shares following system attributes

  - /proc/device-tree/system-id
  - /proc/device-tree/model

with a guest. This could lead to information leakage and misuse.[*]
Add machine attributes to control such system information exposure
to a guest.

[*] https://wiki.openstack.org/wiki/OSSN/OSSN-0028

Reported-by: Daniel P. Berrangé <berrange@redhat.com>
Fix-suggested-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Message-Id: <20190218181349.23885-1-ppandit@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26 09:21:25 +11:00
Greg Kurz
56af66566d spapr/irq: Use the base ICP class for KVM
The base ICP class knows how to interact with KVM. Adapt sPAPR to use it
instead of the ICP KVM class.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <155023080638.1011724.792095453419098948.stgit@bahia.lan>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-18 10:34:56 +11:00
Greg Kurz
5c7adcf422 spapr: Rename xics to intc in interrupt controller agnostic code
All this code is used with both the XICS and XIVE interrupt controllers.

Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-17 21:54:02 +11:00
Thomas Huth
0d8d6a24fc ppc: Fix duplicated typedefs to be able to compile with Clang in gnu99 mode
When compiling the ppc code with clang and -std=gnu99, there are a
couple of warnings/errors like this one:

  CC      ppc64-softmmu/hw/intc/xics.o
In file included from hw/intc/xics.c:35:
include/hw/ppc/xics.h:43:25: error: redefinition of typedef 'ICPState' is a C11 feature
      [-Werror,-Wtypedef-redefinition]
typedef struct ICPState ICPState;
                        ^
target/ppc/cpu.h:1181:25: note: previous definition is here
typedef struct ICPState ICPState;
                        ^
Work around the problems by including the proper headers in spapr.h
and by using struct forward declarations in cpu.h.

Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Thomas Huth <thuth@redhat.com>
2019-01-22 05:14:33 +01:00
Cédric Le Goater
872ff3dea3 spapr: move the qemu_irq array under the machine
The qemu_irq array is now allocated at the machine level using a sPAPR
IRQ set_irq handler depending on the chosen interrupt mode. The use of
this handler is slightly inefficient today but it will become necessary
when the 'dual' interrupt mode is introduced.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-01-09 09:28:14 +11:00
Alexey Kardashevskiy
fea35ca4b8 ppc/spapr: Receive and store device tree blob from SLOF
SLOF receives a device tree and updates it with various properties
before switching to the guest kernel and QEMU is not aware of any changes
made by SLOF. Since there is no real RTAS (QEMU implements it), it makes
sense to pass the SLOF final device tree to QEMU to let it implement
RTAS related tasks better, such as PCI host bus adapter hotplug.

Specifially, now QEMU can find out the actual XICS phandle (for PHB
hotplug) and the RTAS linux,rtas-entry/base properties (for firmware
assisted NMI - FWNMI).

This stores the initial DT blob in the sPAPR machine and replaces it
in the KVMPPC_H_UPDATE_DT (new private hypercall) handler.

This adds an @update_dt_enabled machine property to allow backward
migration.

SLOF already has a hypercall since
https://github.com/aik/SLOF/commit/e6fc84652c9c0073f9183

This makes use of the new fdt_check_full() helper. In order to allow
the configure script to pick the correct DTC version, this adjusts
the DTC presense test.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-01-09 09:28:13 +11:00
Laurent Vivier
c24ba3d0a3 spapr: Add H-Call H_HOME_NODE_ASSOCIATIVITY
H_HOME_NODE_ASSOCIATIVITY H-Call returns the associativity domain
designation associated with the identifier input parameter

This fixes a crash when we try to hotplug a CPU in memory-less and
CPU-less numa node. In this case, the kernel tries to online the
node, but without the information provided by this h-call, the node id,
it cannot and the CPU is started while the node is not onlined.

It also removes the warning message from the kernel:
  VPHN is not supported. Disabling polling..

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-01-09 09:28:13 +11:00
Cédric Le Goater
3ba3d0bc33 spapr: introduce an 'ic-mode' machine option
This option is used to select the interrupt controller mode (XICS or
XIVE) with which the machine will operate. XICS being the default
mode for now.

When running a machine with the XIVE interrupt mode backend, the guest
OS is required to have support for the XIVE exploitation mode. In the
case of legacy OS, the mode selected by CAS should be XICS and the OS
should fail to boot. However, QEMU could possibly detect it, terminate
the boot process and reset to stop in the SLOF firmware. This is not
yet handled.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-12-21 09:40:43 +11:00
Cédric Le Goater
db592b5b16 spapr: add an extra OV5 field to the sPAPR IRQ backend
The interrupt modes supported by the hypervisor are advertised to the
guest with new bits definitions of the option vector 5 of property
"ibm,arch-vec-5-platform-support. The byte 23 bits 0-1 of the OV5 are
defined as follow :

  0b00   PAPR 2.7 and earlier (Legacy systems)
  0b01   XIVE Exploitation mode only
  0b10   Either available

If the client/guest selects the XIVE interrupt mode, it informs the
hypervisor by returning the value 0b01 in byte 23 bits 0-1. A 0b00
value indicates the use of the XICS interrupt mode (Legacy systems).

The sPAPR IRQ backend is extended with these definitions and the
values are directly used to populate the "ibm,arch-vec-5-platform-support"
property. The interrupt mode is advertised under TCG and under KVM.
Although a KVM XIVE device is not yet available, the machine can still
operate with kernel_irqchip=off. However, we apply a restriction on
the CPU which is required to be a POWER9 when a XIVE interrupt
controller is in use.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-12-21 09:40:43 +11:00
Cédric Le Goater
23bcd5eb9a spapr: add hcalls support for the XIVE exploitation interrupt mode
The different XIVE virtualization structures (sources and event queues)
are configured with a set of Hypervisor calls :

 - H_INT_GET_SOURCE_INFO

   used to obtain the address of the MMIO page of the Event State
   Buffer (ESB) entry associated with the source.

 - H_INT_SET_SOURCE_CONFIG

   assigns a source to a "target".

 - H_INT_GET_SOURCE_CONFIG

   determines which "target" and "priority" is assigned to a source

 - H_INT_GET_QUEUE_INFO

   returns the address of the notification management page associated
   with the specified "target" and "priority".

 - H_INT_SET_QUEUE_CONFIG

   sets or resets the event queue for a given "target" and "priority".
   It is also used to set the notification configuration associated
   with the queue, only unconditional notification is supported for
   the moment. Reset is performed with a queue size of 0 and queueing
   is disabled in that case.

 - H_INT_GET_QUEUE_CONFIG

   returns the queue settings for a given "target" and "priority".

 - H_INT_RESET

   resets all of the guest's internal interrupt structures to their
   initial state, losing all configuration set via the hcalls
   H_INT_SET_SOURCE_CONFIG and H_INT_SET_QUEUE_CONFIG.

 - H_INT_SYNC

   issue a synchronisation on a source to make sure all notifications
   have reached their queue.

Calls that still need to be addressed :

   H_INT_SET_OS_REPORTING_LINE
   H_INT_GET_OS_REPORTING_LINE

See the code for more documentation on each hcall.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
[dwg: Folded in fix for field accessors]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-12-21 09:37:38 +11:00
Cédric Le Goater
dcc345b61e spapr: introduce a new machine IRQ backend for XIVE
The XIVE IRQ backend uses the same layout as the new XICS backend but
covers the full range of the IRQ number space. The IRQ numbers for the
CPU IPIs are allocated at the bottom of this space, below 4K, to
preserve compatibility with XICS which does not use that range.

This should be enough given that the maximum number of CPUs is 1024
for the sPAPR machine under QEMU. For the record, the biggest POWER8
or POWER9 system has a maximum of 1536 HW threads (16 sockets, 192
cores, SMT8).

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-12-21 09:37:38 +11:00
Cédric Le Goater
1a518e7693 spapr: export and rename the xics_max_server_number() routine
The XIVE sPAPR IRQ backend will use it to define the number of ENDs of
the IC controller.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-12-21 09:29:10 +11:00
Suraj Jitindar Singh
b9a477b725 ppc/spapr_caps: Add SPAPR_CAP_NESTED_KVM_HV
Add the spapr cap SPAPR_CAP_NESTED_KVM_HV to be used to control the
availability of nested kvm-hv to the level 1 (L1) guest.

Assuming a hypervisor with support enabled an L1 guest can be allowed to
use the kvm-hv module (and thus run it's own kvm-hv guests) by setting:
-machine pseries,cap-nested-hv=true
or disabled with:
-machine pseries,cap-nested-hv=false

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-11-08 13:08:35 +11:00
Thomas Huth
0e947a89ce hw/ppc/spapr_rng: Introduce CONFIG_SPAPR_RNG switch for spapr_rng.c
The spapr-rng device is suboptimal when compared to virtio-rng, so
users might want to disable it in their builds. Thus let's introduce
a proper CONFIG switch to allow us to compile QEMU without this device.
The function spapr_rng_populate_dt is required for linking, so move it
to a different location.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-11-08 12:04:40 +11:00
Cédric Le Goater
ef01ed9d19 spapr: introduce a IRQ controller backend to the machine
This proposal moves all the related IRQ routines of the sPAPR machine
behind a sPAPR IRQ backend interface 'spapr_irq' to prepare for future
changes. First of which will be to increase the size of the IRQ number
space, then, will follow a new backend for the POWER9 XIVE IRQ controller.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-08-21 14:28:45 +10:00
Cédric Le Goater
82cffa2eb2 spapr: introduce a fixed IRQ number space
This proposal introduces a new IRQ number space layout using static
numbers for all devices, depending on a device index, and a bitmap
allocator for the MSI IRQ numbers which are negotiated by the guest at
runtime.

As the VIO device model does not have a device index but a "reg"
property, we introduce a formula to compute an IRQ number from a "reg"
value. It should minimize most of the collisions.

The previous layout is kept in pre-3.1 machines raising the
'legacy_irq_allocation' machine class flag.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-08-21 14:28:45 +10:00
Philippe Mathieu-Daudé
ab3dd74924 hw/ppc: Use the IEC binary prefix definitions
It eases code review, unit is explicit.

Patch generated using:

  $ git grep -E '(1024|2048|4096|8192|(<<|>>).?(10|20|30))' hw/ include/hw/

and modified manually.

Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <20180625124238.25339-33-f4bug@amsat.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-07-02 15:41:16 +02:00
David Gibson
123eec6552 spapr: Use maximum page size capability to simplify memory backend checking
The way we used to handle KVM allowable guest pagesizes for PAPR guests
required some convoluted checking of memory attached to the guest.

The allowable pagesizes advertised to the guest cpus depended on the memory
which was attached at boot, but then we needed to ensure that any memory
later hotplugged didn't change which pagesizes were allowed.

Now that we have an explicit machine option to control the allowable
maximum pagesize we can simplify this.  We just check all memory backends
against that declared pagesize.  We check base and cold-plugged memory at
reset time, and hotplugged memory at pre_plug() time.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
2018-06-22 14:19:07 +10:00
David Gibson
2309832afd spapr: Maximum (HPT) pagesize property
The way the POWER Hash Page Table (HPT) MMU is virtualized by KVM HV means
that every page that the guest puts in the pagetables must be truly
physically contiguous, not just GPA-contiguous.  In effect this means that
an HPT guest can't use any pagesizes greater than the host page size used
to back its memory.

At present we handle this by changing what we advertise to the guest based
on the backing pagesizes.  This is pretty bad, because it means the guest
sees a different environment depending on what should be host configuration
details.

As a start on fixing this, we add a new capability parameter to the
pseries machine type which gives the maximum allowed pagesizes for an
HPT guest.  For now we just create and validate the parameter without
making it do anything.

For backwards compatibility, on older machine types we set it to the max
available page size for the host.  For the 3.0 machine type, we fix it to
16, the intention being to only allow HPT pagesizes up to 64kiB by default
in future.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
2018-06-22 14:19:07 +10:00
Cédric Le Goater
71b5c8d26e spapr: remove unused spapr_irq routines
spapr_irq_alloc_block and spapr_irq_alloc() are now deprecated.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-06-21 21:22:53 +10:00
Cédric Le Goater
4fe75a8ccd spapr: split the IRQ allocation sequence
Today, when a device requests for IRQ number in a sPAPR machine, the
spapr_irq_alloc() routine first scans the ICSState status array to
find an empty slot and then performs the assignement of the selected
numbers. Split this sequence in two distinct routines : spapr_irq_find()
for lookups and spapr_irq_claim() for claiming the IRQ numbers.

This will ease the introduction of a static layout of IRQ numbers.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-06-21 21:22:53 +10:00
David Gibson
e2e4f64118 spapr: Add cpu_apply hook to capabilities
spapr capabilities have an apply hook to actually activate (or deactivate)
the feature in the system at reset time.  However, a number of capabilities
affect the setup of cpus, and need to be applied to each of them -
including hotplugged cpus for extra complication.  To make this simpler,
add an optional cpu_apply hook that is called from spapr_cpu_reset().

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
2018-06-21 21:22:53 +10:00
David Gibson
9f6edd066e spapr: Compute effective capability values earlier
Previously, the effective values of the various spapr capability flags
were only determined at machine reset time.  That was a lazy way of making
sure it was after cpu initialization so it could use the cpu object to
inform the defaults.

But we've now improved the compat checking code so that we don't need to
instantiate the cpus to use it.  That lets us move the resolution of the
capability defaults much earlier.

This is going to be necessary for some future capabilities.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
2018-06-21 21:22:53 +10:00
David Hildenbrand
0c9269a52d spapr: rename "hotplug memory" terminology to "device memory"
Let's make it clear at relevant places that we are dealing with device
memory. That it can be used for memory hotplug is just a special case.

Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20180423165126.15441-11-david@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
[ehabkost: rebased series, solved conflicts at spapr.c]
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2018-05-07 10:00:02 -03:00
David Hildenbrand
b0c14ec4ef machine: make MemoryHotplugState accessible via the machine
Let's allow to query the MemoryHotplugState directly from the machine.
If the pointer is NULL, the machine does not support memory devices. If
the pointer is !NULL, the machine supports memory devices and the
data structure contains information about the applicable physical
guest address space region.

This allows us to generically detect if a certain machine has support
for memory devices, and to generically manage it (find free address
range, plug/unplug a memory region).

We will rename "MemoryHotplugState" to something more meaningful
("DeviceMemory") after we completed factoring out the pc-dimm code into
MemoryDevice code.

Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20180423165126.15441-3-david@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
[ehabkost: rebased series, solved conflicts at spapr.c]
[ehabkost: squashed fix to use g_malloc0()]
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2018-05-07 10:00:02 -03:00
Suraj Jitindar Singh
c76c0d3090 ppc/spapr-caps: Convert cap-ibs to custom spapr-cap
Convert cap-ibs (indirect branch speculation) to a custom spapr-cap
type.

All tristate caps have now been converted to custom spapr-caps, so
remove the remaining support for them.

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
[dwg: Don't explicitly list "?"/help option, trust convention]
[dwg: Fold tristate removal into here, to not break bisect]
[dwg: Fix minor style problems]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-03-06 13:16:29 +11:00
Greg Kurz
14bb4486c8 spapr: rename spapr_vcpu_id() to spapr_get_vcpu_id()
The spapr_vcpu_id() function is an accessor actually. Let's rename it
for symmetry with the recently added spapr_set_vcpu_id() helper.

The motivation behind this is that a later patch will consolidate
the VCPU id formula in a function and spapr_vcpu_id looks like an
appropriate name.

Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-02-16 12:14:26 +11:00
Greg Kurz
648edb6475 spapr: move VCPU calculation to core machine code
The VCPU ids are currently computed and assigned to each individual
CPU threads in spapr_cpu_core_realize(). But the numbering logic
of VCPU ids is actually a machine-level concept, and many places
in hw/ppc/spapr.c also have to compute VCPU ids out of CPU indexes.

The current formula used in spapr_cpu_core_realize() is:

    vcpu_id = (cc->core_id * spapr->vsmt / smp_threads) + i

where:

    cc->core_id is a multiple of smp_threads
    cpu_index = cc->core_id + i
    0 <= i < smp_threads

So we have:

    cpu_index % smp_threads == i
    cc->core_id / smp_threads == cpu_index / smp_threads

hence:

    vcpu_id =
        (cpu_index / smp_threads) * spapr->vsmt + cpu_index % smp_threads;

This formula was used before VSMT at the time VCPU ids where computed
at the target emulation level. It has the advantage of being useable
to derive a VPCU id out of a CPU index only. It is fitted for all the
places where the machine code has to compute a VCPU id.

This patch introduces an accessor to set the VCPU id in a PowerPCCPU object
using the above formula. It is a first step to consolidate all the VCPU id
logic in a single place.

Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-02-16 12:14:26 +11:00
Suraj Jitindar Singh
c59704b254 target/ppc/spapr: Add H-Call H_GET_CPU_CHARACTERISTICS
The new H-Call H_GET_CPU_CHARACTERISTICS is used by the guest to query
behaviours and available characteristics of the cpu.

Implement the handler for this new H-Call which formulates its response
based on the setting of the spapr_caps cap-cfpc, cap-sbbc and cap-ibs.

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-01-29 14:24:55 +11:00
Suraj Jitindar Singh
4be8d4e7d9 target/ppc/spapr_caps: Add new tristate cap safe_indirect_branch
Add new tristate cap cap-ibs to represent the indirect branch
serialisation capability.

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-01-29 14:24:55 +11:00
Suraj Jitindar Singh
09114fd817 target/ppc/spapr_caps: Add new tristate cap safe_bounds_check
Add new tristate cap cap-sbbc to represent the speculation barrier
bounds checking capability.

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-01-29 14:24:55 +11:00
Suraj Jitindar Singh
8f38eaf8f9 target/ppc/spapr_caps: Add new tristate cap safe_cache
Add new tristate cap cap-cfpc to represent the cache flush on privilege
change capability.

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-01-29 14:24:55 +11:00
Suraj Jitindar Singh
6898aed77f target/ppc/spapr_caps: Add support for tristate spapr_capabilities
spapr_caps are used to represent the level of support for various
capabilities related to the spapr machine type. Currently there is
only support for boolean capabilities.

Add support for tristate capabilities by implementing their get/set
functions. These capabilities can have the values 0, 1 or 2
corresponding to broken, workaround and fixed.

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-01-29 14:24:55 +11:00
Suraj Jitindar Singh
8acc2ae5e9 target/ppc/kvm: Add cap_ppc_safe_[cache/bounds_check/indirect_branch]
Add three new kvm capabilities used to represent the level of host support
for three corresponding workarounds.

Host support for each of the capabilities is queried through the
new ioctl KVM_PPC_GET_CPU_CHAR which returns four uint64 quantities. The
first two, character and behaviour, represent the available
characteristics of the cpu and the behaviour of the cpu respectively.
The second two, c_mask and b_mask, represent the mask of known bits for
the character and beheviour dwords respectively.

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
[dwg: Correct some compile errors due to name change in final kernel
 patch version]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-01-29 14:24:55 +11:00