Commit Graph

1893 Commits

Author SHA1 Message Date
David Gibson
0b0e52b131 spapr, xics, xive: Move irq claim and free from SpaprIrq to SpaprInterruptController
These methods, like cpu_intc_create, really belong to the interrupt
controller, but need to be called on all possible intcs.

Like cpu_intc_create, therefore, make them methods on the intc and
always call it for all existing intcs.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
2019-10-24 09:36:55 +11:00
David Gibson
ebd6be089b spapr, xics, xive: Move cpu_intc_create from SpaprIrq to SpaprInterruptController
This method essentially represents code which belongs to the interrupt
controller, but needs to be called on all possible intcs, rather than
just the currently active one.  The "dual" version therefore calls
into the xics and xive versions confusingly.

Handle this more directly, by making it instead a method on the intc
backend, and always calling it on every backend that exists.

While we're there, streamline the error reporting a bit.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
2019-10-24 09:36:55 +11:00
David Gibson
150e25f85b spapr, xics, xive: Introduce SpaprInterruptController QOM interface
The SpaprIrq structure is used to represent ths spapr machine's irq
backend.  Except that it kind of conflates two concepts: one is the
backend proper - a specific interrupt controller that we might or
might not be using, the other is the irq configuration which covers
the layout of irq space and which interrupt controllers are allowed.

This leads to some pretty confusing code paths for the "dual"
configuration where its hooks redirect to other SpaprIrq structures
depending on the currently active irq controller.

To clean this up, we start by introducing a new
SpaprInterruptController QOM interface to represent strictly an
interrupt controller backend, not counting anything configuration
related.  We implement this interface in the XICs and XIVE interrupt
controllers, and in future we'll move relevant methods from SpaprIrq
into it.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
2019-10-24 09:36:55 +11:00
Greg Kurz
29cb418749 spapr: Set VSMT to smp_threads by default
Support for setting VSMT is available in KVM since linux-4.13. Most distros
that support KVM on POWER already have it. It thus seem reasonable enough
to have the default machine to set VSMT to smp_threads.

This brings contiguous VCPU ids and thus brings their upper bound down to
the machine's max_cpus. This is especially useful for XIVE KVM devices,
which may thus allocate only one VP descriptor per VCPU.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <157010411885.246126.12610015369068227139.stgit@bahia.lan>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-10-24 09:36:55 +11:00
Cédric Le Goater
06d26eeb47 ppc/pnv: Use address_space_stq_be() when triggering an interrupt from PSI
Include the XIVE_TRIGGER_PQ bit in the trigger data which is how
hardware signals to the IC that the PQ bits of the interrupt source
have been checked.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20191007084102.29776-3-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-10-24 09:36:55 +11:00
Tao Xu
0533ef5f20 numa: Introduce MachineClass::auto_enable_numa for implicit NUMA node
Add MachineClass::auto_enable_numa field. When it is true, a NUMA node
is expected to be created implicitly.

Acked-by: David Gibson <david@gibson.dropbear.id.au>
Suggested-by: Igor Mammedov <imammedo@redhat.com>
Suggested-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Tao Xu <tao3.xu@intel.com>
Message-Id: <20190905083238.1799-1-tao3.xu@intel.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2019-10-15 18:18:08 -03:00
Peter Maydell
0f0b43868a ppc patch queue 2019-10-04
Here's the next batch of ppc and spapr patches.  Includes:
   * Fist part of a large cleanup to irq infrastructure
   * Recreate the full FDT at CAS time, instead of making a difficult
     to follow set of updates.  This will help us move towards
     eliminating CAS reboots altogether
   * No longer provide RTAS blob to SLOF - SLOF can include it just as
     well itself, since guests will generally need to relocate it with
     a call to instantiate-rtas
   * A number of DFP fixes and cleanups from Mark Cave-Ayland
   * Assorted bugfixes
   * Several new small devices for powernv
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEdfRlhq5hpmzETofcbDjKyiDZs5IFAl2XEn0ACgkQbDjKyiDZ
 s5I6bA/7B5sjY/QxuE8axm5KupoAnE8zf205hN8mbYASwtDfFwgaeNreVaOSJUpr
 fgcx/g9G3rAryGZv3O6i02+wcRgNw1DnJ3ynCthIrExZEcfbTYJiS4s9apwPEQy8
 HFmBNdPDqrhFI0aFvXEUauiOp1aapPUUklm34eFscs94lJXxphRUEfa3XT5uEhUh
 xrIZwYq20A+ih4UHwk3Onyx/cvFpl6BRB2nVEllQFqzwF5eTTfz9t8+JGTebxD/7
 8qqt8ti0KM3wxSDTQnmyMUmpgy+C1iCvNYvv6nWFg+07QuGs48EHlQUUVVni4r9j
 kUrDwKS2eC+8e8gP/xdIXEq3R2DsAMq+wFIswXZ3X6x4DoUV0OAJSHc9iMD4l+pr
 LyWnVpDprc6XhJHWKpuHZ5w9EuBnZFbIXdlZGFno+8UvXtusnbbuwAZzHTrRJRqe
 /AWVpFwGAoOF4KxIOFlPVBI8m4vFad/soVojC0vzIbRqaogOFZAjiL/yD5GwLmMa
 tywOEMBUJ/j2lgudTCyKn5uCa/Ew3DS1TSdenJjyqRi/gZM0IaORIhJhyFYW/eO1
 U7Uh8BnbC+4J11wwvFR5+W789dgM2+EEtAX9uI08VcE/R2ASabZlN4Zwrl0w4cb/
 VRybMT4bgmjzHRpfrqYPxpn8wqPcIw0BCeipSOjY3QU1Q25TEYQ=
 =PXXe
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-4.2-20191004' into staging

ppc patch queue 2019-10-04

Here's the next batch of ppc and spapr patches.  Includes:
  * Fist part of a large cleanup to irq infrastructure
  * Recreate the full FDT at CAS time, instead of making a difficult
    to follow set of updates.  This will help us move towards
    eliminating CAS reboots altogether
  * No longer provide RTAS blob to SLOF - SLOF can include it just as
    well itself, since guests will generally need to relocate it with
    a call to instantiate-rtas
  * A number of DFP fixes and cleanups from Mark Cave-Ayland
  * Assorted bugfixes
  * Several new small devices for powernv

# gpg: Signature made Fri 04 Oct 2019 10:35:57 BST
# gpg:                using RSA key 75F46586AE61A66CC44E87DC6C38CACA20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>" [full]
# gpg:                 aka "David Gibson (Red Hat) <dgibson@redhat.com>" [full]
# gpg:                 aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>" [full]
# gpg:                 aka "David Gibson (kernel.org) <dwg@kernel.org>" [unknown]
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E  87DC 6C38 CACA 20D9 B392

* remotes/dgibson/tags/ppc-for-4.2-20191004: (53 commits)
  ppc/pnv: Remove the XICSFabric Interface from the POWER9 machine
  spapr: Eliminate SpaprIrq::init hook
  spapr: Add return value to spapr_irq_check()
  spapr: Use less cryptic representation of which irq backends are supported
  xive: Improve irq claim/free path
  spapr, xics, xive: Better use of assert()s on irq claim/free paths
  spapr: Handle freeing of multiple irqs in frontend only
  spapr: Remove unhelpful tracepoints from spapr_irq_free_xics()
  spapr: Eliminate SpaprIrq:get_nodename method
  spapr: Simplify spapr_qirq() handling
  spapr: Fix indexing of XICS irqs
  spapr: Eliminate nr_irqs parameter to SpaprIrq::init
  spapr: Clarify and fix handling of nr_irqs
  spapr: Replace spapr_vio_qirq() helper with spapr_vio_irq_pulse() helper
  spapr: Fold spapr_phb_lsi_qirq() into its single caller
  xics: Create sPAPR specific ICS subtype
  xics: Merge TYPE_ICS_BASE and TYPE_ICS_SIMPLE classes
  xics: Eliminate reset hook
  xics: Rename misleading ics_simple_*() functions
  xics: Eliminate 'reject', 'resend' and 'eoi' class hooks
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2019-10-07 13:49:02 +01:00
Eric Auger
549d400587 memory: allow memory_region_register_iommu_notifier() to fail
Currently, when a notifier is attempted to be registered and its
flags are not supported (especially the MAP one) by the IOMMU MR,
we generally abruptly exit in the IOMMU code. The failure could be
handled more nicely in the caller and especially in the VFIO code.

So let's allow memory_region_register_iommu_notifier() to fail as
well as notify_flag_changed() callback.

All sites implementing the callback are updated. This patch does
not yet remove the exit(1) in the amd_iommu code.

in SMMUv3 we turn the warning message into an error message saying
that the assigned device would not work properly.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-04 18:49:18 +02:00
Cédric Le Goater
1aba8716c8 ppc/pnv: Remove the XICSFabric Interface from the POWER9 machine
The POWER8 PowerNV machine needs to implement a XICSFabric interface
as this is the POWER8 interrupt controller model. But the POWER9
machine uselessly inherits of XICSFabric from the common PowerNV
machine definition.

Open code machine definitions to have a better control on the
different interfaces each machine should define.

Fixes: f30c843ced ("ppc/pnv: Introduce PowerNV machines with fixed CPU models")
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20191003143617.21682-1-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-10-04 19:08:23 +10:00
David Gibson
f478d9af21 spapr: Eliminate SpaprIrq::init hook
This method is used to set up the interrupt backends for the current
configuration.  However, this means some confusing redirection between
the "dual" mode init and the init hooks for xics only and xive only modes.

Since we now have simple flags indicating whether XICS and/or XIVE are
supported, it's easier to just open code each initialization directly in
spapr_irq_init().  This will also make some future cleanups simpler.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
2019-10-04 19:08:23 +10:00
David Gibson
0a3fd3df6f spapr: Add return value to spapr_irq_check()
Explicitly return success or failure, rather than just relying on the
Error ** parameter.  This makes handling it less verbose in the caller.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
2019-10-04 19:08:23 +10:00
David Gibson
ca62823b79 spapr: Use less cryptic representation of which irq backends are supported
SpaprIrq::ov5 stores the value for a particular byte in PAPR option vector
5 which indicates whether XICS, XIVE or both interrupt controllers are
available.  As usual for PAPR, the encoding is kind of overly complicated
and confusing (though to be fair there are some backwards compat things it
has to handle).

But to make our internal code clearer, have SpaprIrq encode more directly
which backends are available as two booleans, and derive the OV5 value from
that at the point we need it.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
2019-10-04 19:08:23 +10:00
David Gibson
e594c2ad1c xive: Improve irq claim/free path
spapr_xive_irq_claim() returns a bool to indicate if it succeeded.
But most of the callers and one callee use int return values and/or an
Error * with more information instead.  In any case, ints are a more
common idiom for success/failure states than bools (one never knows
what sense they'll be in).

So instead change to an int return value to indicate presence of error
+ an Error * to describe the details through that call chain.

It also didn't actually check if the irq was already claimed, which is
one of the primary purposes of the claim path, so do that.

spapr_xive_irq_free() also returned a bool... which no callers checked
and was always true, so just drop it.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
2019-10-04 19:08:23 +10:00
David Gibson
580dde5e4a spapr, xics, xive: Better use of assert()s on irq claim/free paths
The irq claim and free paths for both XICS and XIVE check for some
validity conditions.  Some of these represent genuine runtime failures,
however others - particularly checking that the basic irq number is in a
sane range - could only fail in the case of bugs in the callin code.
Therefore use assert()s instead of runtime failures for those.

In addition the non backend-specific part of the claim/free paths should
only be used for PAPR external irqs, that is in the range SPAPR_XIRQ_BASE
to the maximum irq number.  Put assert()s for that into the top level
dispatchers as well.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
2019-10-04 19:08:23 +10:00
David Gibson
f233cee97b spapr: Handle freeing of multiple irqs in frontend only
spapr_irq_free() can be used to free multiple irqs at once. That's useful
for its callers, but there's no need to make the individual backend hooks
handle this.  We can loop across the irqs in spapr_irq_free() itself and
have the hooks just do one at time.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
2019-10-04 19:08:23 +10:00
David Gibson
85d0425652 spapr: Remove unhelpful tracepoints from spapr_irq_free_xics()
These traces contain some useless information (the always-0 source#) and
have no equivalents for XIVE mode.  For now just remove them, and we can
put back something more sensible if and when we need it.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
2019-10-04 19:08:22 +10:00
David Gibson
14789694cd spapr: Eliminate SpaprIrq:get_nodename method
This method is used to determine the name of the irq backend's node in the
device tree, so that we can find its phandle (after SLOF may have modified
it from the phandle we initially gave it).

But, in the two cases the only difference between the node name is the
presence of a unit address.  Searching for a node name without considering
unit address is standard practice for the device tree, and
fdt_subnode_offset() will do exactly that, making this method unecessary.

While we're there, remove the XICS_NODENAME define.  The name
"interrupt-controller" is required by PAPR (and IEEE1275), and a bunch of
places assume it already.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
2019-10-04 19:08:22 +10:00
David Gibson
af1861511d spapr: Simplify spapr_qirq() handling
Currently spapr_qirq(), whic is used to find the qemu_irq for an spapr
global irq number, redirects through the SpaprIrq::qirq method.  But
the array of qemu_irqs is allocated in the PAPR layer, not the
backends, and so the method implementations all return the same thing,
just differing in the preliminary checks they make.

So, we can remove the method, and just implement spapr_qirq() directly,
including all the relevant checks in one place.  We change all those
checks into assert()s as well, since a failure here indicates an error in
the calling code.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
2019-10-04 19:08:22 +10:00
David Gibson
9f53c0db19 spapr: Fix indexing of XICS irqs
spapr global irq numbers are different from the source numbers on the ICS
when using XICS - they're offset by XICS_IRQ_BASE (0x1000).  But
spapr_irq_set_irq_xics() was passing through the global irq number to
the ICS code unmodified.

We only got away with this because of a counteracting bug - we were
incorrectly adjusting the qemu_irq we returned for a requested global irq
number.

That approach mostly worked but is very confusing, incorrectly relies on
the way the qemu_irq array is allocated, and undermines the intention of
having the global array of qemu_irqs for spapr have a consistent meaning
regardless of irq backend.

So, fix both set_irq and qemu_irq indexing.  We rename some parameters at
the same time to make it clear that they are referring to spapr global
irq numbers.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
2019-10-04 19:08:22 +10:00
David Gibson
fe9b61b246 spapr: Eliminate nr_irqs parameter to SpaprIrq::init
The only reason this parameter was needed was to work around the
inconsistent meaning of nr_irqs between xics and xive.  Now that we've
fixed that, we can consistently use the number directly in the SpaprIrq
configuration.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
2019-10-04 19:08:22 +10:00
David Gibson
ad8de98636 spapr: Clarify and fix handling of nr_irqs
Both the XICS and XIVE interrupt backends have a "nr-irqs" property, but
it means slightly different things.  For XICS (or, strictly, the ICS) it
indicates the number of "real" external IRQs.  Those start at XICS_IRQ_BASE
(0x1000) and don't include the special IPI vector.  For XIVE, however, it
includes the whole IRQ space, including XIVE's many IPI vectors.

The spapr code currently doesn't handle this sensibly, with the
nr_irqs value in SpaprIrq having different meanings depending on the
backend.  We fix this by renaming nr_irqs to nr_xirqs and making it
always indicate just the number of external irqs, adjusting the value
we pass to XIVE accordingly.  We also move to using common constants
in most of the irq configurations, to make it clearer that the IRQ
space looks the same to the guest (and emulated devices), even if the
backend is different.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
2019-10-04 19:08:22 +10:00
David Gibson
7678b74a94 spapr: Replace spapr_vio_qirq() helper with spapr_vio_irq_pulse() helper
Every caller of spapr_vio_qirq() immediately calls qemu_irq_pulse() with
the result, so we might as well just fold that into the helper.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
2019-10-04 19:08:22 +10:00
David Gibson
258aa5ce1c spapr: Fold spapr_phb_lsi_qirq() into its single caller
No point having a two-line helper that's used exactly once, and not likely
to be used anywhere else in future.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
2019-10-04 19:08:22 +10:00
David Gibson
9db8c551c9 xics: Create sPAPR specific ICS subtype
We create a subtype of TYPE_ICS specifically for sPAPR.  For now all this
does is move the setup of the PAPR specific hcalls and RTAS calls to
the realize() function for this, rather than requiring the PAPR code to
explicitly call xics_spapr_init().  In future it will have some more
function.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
2019-10-04 19:08:22 +10:00
David Gibson
642e92719e xics: Merge TYPE_ICS_BASE and TYPE_ICS_SIMPLE classes
TYPE_ICS_SIMPLE is the only subtype of TYPE_ICS_BASE that's ever
instantiated.  The existence of different classes is mostly a hang
over from when we (misguidedly) had separate subtypes for the KVM and
non-KVM version of the device.

There could be some call for an abstract base type for ICS variants
that use a different representation of their state (PowerNV PHB3 might
want this).  The current split isn't really in the right place for
that though.  If we need this in future, we can re-implement it more
in line with what we actually need.

So, collapse the two classes together into just TYPE_ICS.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
2019-10-04 19:08:22 +10:00
David Gibson
28976c99cf xics: Rename misleading ics_simple_*() functions
There are a number of ics_simple_*() functions that aren't actually
specific to TYPE_XICS_SIMPLE at all, and are equally valid on
TYPE_XICS_BASE.  Rename them to ics_*() accordingly.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
2019-10-04 19:08:22 +10:00
Alexey Kardashevskiy
e68cd0cb5c spapr: Render full FDT on ibm,client-architecture-support
The ibm,client-architecture-support call is a way for the guest to
negotiate capabilities with a hypervisor. It is implemented as:
- the guest calls SLOF via client interface;
- SLOF calls QEMU (H_CAS hypercall) with an options vector from the guest;
- QEMU returns a device tree diff (which uses FDT format with
an additional header before it);
- SLOF walks through the partial diff tree and updates its internal tree
with the values from the diff.

This changes QEMU to simply re-render the entire tree and send it as
an update. SLOF can handle this already mostly, [1] is needed before this
can be applied. This stores the resulting tree in the spapr machine to have
the latest valid FDT copy possible (this should not matter much as
H_UPDATE_DT happens right after that but nevertheless).

The benefit is reduced code size as there is no need for another set of
DT rendering helpers such as spapr_fixup_cpu_dt().

The downside is that the updates are bigger now (as they include all
nodes and properties) but the difference on a '-smp 256,threads=1' system
before/after is 2.35s vs. 2.5s.

[1] https://patchwork.ozlabs.org/patch/1152915/

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-10-04 19:08:21 +10:00
Alexey Kardashevskiy
c4ec08ab70 spapr-pci: Stop providing assigned-addresses
QEMU does not allocate PCI resources (BARs) in any case - coldplug devices
are configured by the firmware and hotplug devices rely on the guest
system to do the assignment via the PCI rescan mechanism. Also in order
to create non empty "assigned-addresses", the device has to be enabled
(i.e. PCI_COMMAND needs the MMIO bit set) first as otherwise
io_regions[i].addr are -1, and devices are not enabled at this point.

This removes "assigned-addresses" and leaves it to those who actually
do resource allocation.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Message-Id: <20190927022651.71642-1-aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-10-04 19:08:21 +10:00
Alexey Kardashevskiy
744a928cce spapr: Stop providing RTAS blob
SLOF implements one itself so let's remove it from QEMU. It is one less
image and simpler setup as the RTAS blob never stays in its initial place
anyway as the guest OS always decides where to put it.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-10-04 10:25:23 +10:00
Alexey Kardashevskiy
5ced78955f spapr: Do not put empty properties for -kernel/-initrd/-append
We are going to use spapr_build_fdt() for the boot time FDT and as an
update for SLOF during handling of H_CAS. SLOF will apply all properties
from the QEMU's FDT which is usually ok unless there are properties
changed by grub or guest kernel. The properties are:
bootargs, linux,initrd-start, linux,initrd-end, linux,stdout-path,
linux,rtas-base, linux,rtas-entry. Resetting those during CAS will most
likely cause grub failure.

Don't create such properties if we're booting without "-kernel" and
"-initrd" so they won't get included into the DT update blob and
therefore the guest is more likely to boot successfully.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
[dwg: Tweaked commit message based on Greg Kurz's input]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-10-04 10:25:23 +10:00
Alexey Kardashevskiy
3a17e38f6e spapr: Skip leading zeroes from memory@ DT node names
The device tree build by QEMU at the machine reset time is used by SLOF
to build its internal device tree but the node names are not preserved
exactly so when QEMU provides a device tree update in response to H_CAS,
it might become tricky to match a node from the update blob to
the actual node in SLOF.

This removed leading zeroes from "memory@" nodes and makes
the DTC checker happy.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
2019-10-04 10:25:23 +10:00
Alexey Kardashevskiy
f767b1ac57 spapr: Fixes a leak in CAS
Add a missing g_free(fdt) if the resulting tree is bigger
than the space allocated by SLOF.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
2019-10-04 10:25:23 +10:00
David Gibson
db5127b28a spapr: Move handling of special NVLink numa node from reset to init
The number of NUMA nodes in the system is fixed from the command line.
Therefore, there's no need to recalculate it at reset time, and we can
determine the special gpu_numa_id value used for NVLink2 devices at init
time.

This simplifies the reset path a bit which will make further improvements
easier.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
2019-10-04 10:25:23 +10:00
David Gibson
daa36379ce spapr: Simplify handling of pre ISA 3.0 guest workaround handling
Certain old guest versions don't understand the radix MMU introduced with
POWER ISA 3.0, but incorrectly select it if presented with the option at
CAS time.  We workaround this in qemu by explicitly excluding the radix
(and other ISA 3.0 linked) options if the guest doesn't explicitly note
support for ISA 3.0.

This is handled by the 'cas_legacy_guest_workaround' flag, which is pretty
vague.  Rename it to 'cas_pre_isa3_guest' to be clearer about what it's for.

In addition, we unnecessarily call spapr_populate_pa_features() with
different options when initially constructing the device tree and when
adjusting it at CAS time.  At the initial construct time cas_pre_isa3_guest
is already false, so we can still use the flag, rather than explicitly
overriding it to be false at the callsite.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
2019-10-04 10:25:23 +10:00
Cédric Le Goater
4a99d40551 spapr/irq: Introduce an ics_irq_free() helper
It will help us to discard interrupt numbers which have not been
claimed in the next patch.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190911133937.2716-2-clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-10-04 10:25:23 +10:00
Balamuruhan S
3887d24123 hw/ppc/pnv_homer: add PowerNV homer device model
add PnvHomer device model to emulate homer memory access
for pstate table, occ-sensors, slw, occ static and dynamic
values for Power8 and Power9 chips.

Signed-off-by: Balamuruhan S <bala24@linux.ibm.com>
Message-Id: <20190912093056.4516-4-bala24@linux.ibm.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-10-04 10:25:23 +10:00
Balamuruhan S
f3db82660d hw/ppc/pnv_occ: add sram device model for occ common area
emulate occ common area region with occ sram device model which
occ and skiboot uses it to communicate regarding sensors, slw
and HWMON in PowerNV emulated host.

Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Balamuruhan S <bala24@linux.ibm.com>
Message-Id: <20190912093056.4516-3-bala24@linux.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-10-04 10:25:23 +10:00
Balamuruhan S
7454558c69 hw/ppc/pnv_xscom: retrieve homer/occ base address from PBA BARs
During PowerNV boot skiboot populates the device tree by
retrieving base address of homer/occ common area from
PBA BARs and prd ipoll mask by accessing xscom read/write
accesses.

Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Balamuruhan S <bala24@linux.ibm.com>
Message-Id: <20190912093056.4516-2-bala24@linux.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-10-04 10:25:23 +10:00
Greg Kurz
f041d6af55 spapr: Report kvm_irqchip_in_kernel() in 'info pic'
Unless the machine was started with kernel-irqchip=on, we cannot easily
tell if we're actually using an in-kernel or an emulated irqchip. This
information is important enough that it is worth printing it in 'info
pic'.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <156829860985.2073005.5893493824873412773.stgit@bahia.tls.ibm.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-10-04 10:25:23 +10:00
Balamuruhan S
59b7c1c283 hw/ppc/pnv: fix checkpatch.pl coding style warnings
There were few trailing comments after `/*` instead in
new line and line more than 80 character, these fixes are
trivial and doesn't change any logic in code.

Signed-off-by: Balamuruhan S <bala24@linux.ibm.com>
Message-Id: <20190911142925.19197-5-bala24@linux.ibm.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-10-04 10:25:23 +10:00
Greg Kurz
226c9d15df spapr-tpm-proxy: Drop misleading check
Coverity is reporting in CID 1405304 that tpm_execute() may pass a NULL
tpm_proxy->host_path pointer to open(). This is based on the fact that
h_tpm_comm() does a NULL check on tpm_proxy->host_path and then passes
tpm_proxy to tpm_execute().

The check in h_tpm_comm() is abusive actually since a spapr-proxy-tpm
requires a non NULL host_path property, as checked during realize.

Fixes: 0fb6bd0732
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <156805260916.1779401.11054185183758185247.stgit@bahia.lan>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-10-04 10:25:23 +10:00
Cédric Le Goater
f42b6f535c ppc/pnv: fix "bmc" node name in DT
Fixes the dtc output :

ERROR (node_name_chars): //bmc: Bad character '/' in node name
Warning (avoid_unnecessary_addr_size): /bmc: unnecessary #address-cells/#size-cells without "ranges" or child "reg" property

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190902092932.20200-1-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-10-04 10:25:23 +10:00
Laurent Vivier
58c46efa45 pseries: do not allow memory-less/cpu-less NUMA node
When we hotplug a CPU on memory-less/cpu-less node, the linux kernel
crashes.

This happens because linux kernel needs to know the NUMA topology at
start to be able to initialize the distance lookup table.

On pseries, the topology is provided by the firmware via the existing
CPUs and memory information. Thus a node without memory and CPU cannot be
discovered by the kernel.

To avoid the kernel crash, do not allow to start pseries with empty
nodes.

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Message-Id: <20190830161345.22436-1-lvivier@redhat.com>
[dwg: Rework to cope with movement of numa state from globals to MachineState]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-10-04 10:25:23 +10:00
Dr. David Alan Gilbert
ce62df5378 migration: register_savevm_live doesn't need dev
Commit 78dd48df3 removed the last caller of register_savevm_live for an
instantiable device (rather than a single system wide device);
so trim out the parameter.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20190822115433.12070-1-dgilbert@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2019-09-12 11:15:03 +01:00
Peter Maydell
f884294bd7 Machine + x86 queue, 2019-09-03
Bug fixes:
 * Fix die-id validation regression (Eduardo Habkost)
 * vmmouse: Properly reset state (Jan Kiszka)
 * hostmem-file: fix pmem file size check (Stefan Hajnoczi)
 * Keep query-hotpluggable-cpus output compatible with older QEMU
   if '-smp dies' is not set (Igor Mammedov)
 * migration: Do not re-read the clock on pre_save in case of paused guest
   (Maxiwell S. Garcia)
 
 Cleanups:
 * NUMA code cleanups (Tao Xu)
 * Remove stale externs from includes (Alex Bennée)
 
 Features:
 * qapi: report the default CPU type for each machine (Daniel P. Berrangé)
 -----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCAAyFiEEWjIv1avE09usz9GqKAeTb5hNxaYFAl1u08EUHGVoYWJrb3N0
 QHJlZGhhdC5jb20ACgkQKAeTb5hNxaaKGQ//WQY+JQgXj2M7i5bAuz1lkR0QKJvh
 n++70ugqNmmlj1YH7LKmZNll0tz+auo25PLgEBOamPZPFQXxkRhPBxTUnOdQJ1UC
 bSwyRzHrFluVITXD/nGkIXgmP4rjXil5QBWTxneWb7zYsXDGBEnauZnC1YsXzc9T
 5LISvc5zEz6pEzz5s3LdUJ947jTui/dDHVHupeyK/5bPkiPoKVoymsd4p8rvAmFw
 4obMftjuFzklm8oLPKpHYAm7VvXj5yb92/FE/ZKdaahcLPGStWixiHJ7xJlGMBti
 GqcWca+2sdbsraOz4Pg05x//vbOgiwIECqgKJRlJSAnG7Roz7E6J/xXQIYIkhpkL
 Sn0+s181WtFeNFlQgEP056iTUCq81oBjek2XzgsXzuQyFip5IJGLLQox4E+w0ty6
 7houoCkJD70ddl3sEj/koXi6rBeswNStfuxVYxUgwYa7HecehNvVD5q9NlElRhev
 Lce4szuWJzHBbhW5ubGmN6rCbXNa+mPrBunrDwbjApl12DFkr163dj9DsyN/DUgy
 MmfsgqpKZ+g18VSajck2QtvTg+9Oqv0bv3SWtpDwzDxS9VULz0r2wfcN9TZDipV0
 qCZWg39BpCIgdd4s5L0q6bamC9+eSwoByFx54WrkoQT81odHJqUHNsCE9wnoNvmG
 aZlV3idjGmsTFiE=
 =u5HZ
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/ehabkost/tags/machine-next-pull-request' into staging

Machine + x86 queue, 2019-09-03

Bug fixes:
* Fix die-id validation regression (Eduardo Habkost)
* vmmouse: Properly reset state (Jan Kiszka)
* hostmem-file: fix pmem file size check (Stefan Hajnoczi)
* Keep query-hotpluggable-cpus output compatible with older QEMU
  if '-smp dies' is not set (Igor Mammedov)
* migration: Do not re-read the clock on pre_save in case of paused guest
  (Maxiwell S. Garcia)

Cleanups:
* NUMA code cleanups (Tao Xu)
* Remove stale externs from includes (Alex Bennée)

Features:
* qapi: report the default CPU type for each machine (Daniel P. Berrangé)

# gpg: Signature made Tue 03 Sep 2019 21:57:37 BST
# gpg:                using RSA key 5A322FD5ABC4D3DBACCFD1AA2807936F984DC5A6
# gpg:                issuer "ehabkost@redhat.com"
# gpg: Good signature from "Eduardo Habkost <ehabkost@redhat.com>" [full]
# Primary key fingerprint: 5A32 2FD5 ABC4 D3DB ACCF  D1AA 2807 936F 984D C5A6

* remotes/ehabkost/tags/machine-next-pull-request:
  migration: Do not re-read the clock on pre_save in case of paused guest
  x86: do not advertise die-id in query-hotpluggbale-cpus if '-smp dies' is not set
  i386/vmmouse: Properly reset state
  hostmem-file: fix pmem file size check
  qapi: report the default CPU type for each machine
  pc: Don't make die-id mandatory unless necessary
  pc: Improve error message when die-id is omitted
  pc: Fix error message on die-id validation
  numa: move numa global variable numa_info into MachineState
  numa: move numa global variable have_numa_distance into MachineState
  numa: move numa global variable nb_numa_nodes into MachineState
  hw/arm: simplify arm_load_dtb
  includes: remove stale [smp|max]_cpus externs

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2019-09-04 14:44:54 +01:00
Tao Xu
7e721e7b10 numa: move numa global variable numa_info into MachineState
Move existing numa global numa_info (renamed as "nodes") into NumaState.

Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Suggested-by: Igor Mammedov <imammedo@redhat.com>
Suggested-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Tao Xu <tao3.xu@intel.com>
Message-Id: <20190809065731.9097-5-tao3.xu@intel.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2019-09-03 11:26:55 -03:00
Tao Xu
aa57020774 numa: move numa global variable nb_numa_nodes into MachineState
Add struct NumaState in MachineState and move existing numa global
nb_numa_nodes(renamed as "num_nodes") into NumaState. And add variable
numa_support into MachineClass to decide which submachines support NUMA.

Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Suggested-by: Igor Mammedov <imammedo@redhat.com>
Suggested-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Tao Xu <tao3.xu@intel.com>
Message-Id: <20190809065731.9097-3-tao3.xu@intel.com>
[ehabkost: include hw/boards.h again to fix build failures]
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2019-09-03 11:26:55 -03:00
Greg Kurz
b1e8156743 spapr: Set compat mode in spapr_core_plug()
A recent change in spapr_machine_reset() showed that resetting the compat
mode in spapr_machine_reset() for the boot vCPU and in spapr_cpu_reset()
for all other vCPUs was fragile. The fix was thus to reset the compat mode
for all vCPUs in spapr_machine_reset(), but we still have to propagate
it to hot-plugged CPUs. This is still performed from spapr_cpu_reset(),
hence resulting in ppc_set_compat() being called twice for every vCPU at
machine reset. Apart from wasting cycles, which isn't really an issue
during machine reset, this seems to indicate that spapr_cpu_reset() isn't
the best place to set the compat mode.

A natural candidate for CPU-hotplug specific code is spapr_core_plug().
Also, it sits in the same file as spapr_machine_reset() : this makes
it easier for someone who wants to know when the compat PVR is set.

Call ppc_set_compat() from there. This doesn't need to be done for
initial vCPUs since the compat PVR is 0 and spapr_machine_reset() sets
the appropriate value later. No need to do this on manually added vCPUS
on the destination QEMU during migration since the compat PVR is
part of the migrated vCPU state. Both conditions can be checked with
spapr_drc_hotplugged().

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <156701285312.499757.7807417667750711711.stgit@bahia.lan>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-08-29 09:46:07 +10:00
Greg Kurz
572ebd08b3 spapr/pci: Convert types to QEMU coding style
The QEMU coding style requires:
- to typedef structured types (HACKING)
- to use CamelCase for types and structure names (CODING_STYLE)

Do that for PCI and Nvlink2 code.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <156701644465.505236.2850655823182656869.stgit@bahia.lan>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-08-29 09:46:07 +10:00
Alexey Kardashevskiy
6c3829a265 spapr_pci: Advertise BAR reallocation capability
The pseries guests do not normally allocate PCI resources and rely on
the system firmware doing so. Furthermore at least at some point in
the past the pseries guests won't even allowed to change BARs, probably
it is still the case for phyp. So since the initial commit we have [1]
which prevents resource reallocation.

This is not a problem until we want specific BAR alignments, for example,
PAGE_SIZE==64k to make sure we can still map MMIO BARs directly. For
the boot time devices we handle this in SLOF [2] but since QEMU's RTAS
does not allocate BARs, the guest does this instead and does not align
BARs even if Linux is given pci=resource_alignment=16@pci:0:0 as
PCI_PROBE_ONLY makes Linux ignore alignment requests.

ARM folks added a dial to control PCI_PROBE_ONLY via the device tree [3].
This makes use of the dial to advertise to the guest that we can handle
BAR reassignments. This limits the change to the latest pseries machine
to avoid old guests explosion.

We do not remove the flag from [1] as pseries guests are still supported
under phyp so having that removed may cause problems.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/powerpc/platforms/pseries/setup.c?h=v5.1#n773
[2] https://git.qemu.org/?p=SLOF.git;a=blob;f=board-qemu/slof/pci-phb.fs;h=06729bcf77a0d4e900c527adcd9befe2a269f65d;hb=HEAD#l338
[3] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f81c11af
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Message-Id: <20190719043734.108462-1-aik@ozlabs.ru>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-08-29 09:46:07 +10:00