The function is called only inside spapr_hcall.c.
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210114180628.1675603-3-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This whitespace was messing with lots of diffs if you happen
to use an editor that eliminates trailing whitespaces on file
save.
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210114180628.1675603-2-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Since commit 1e8b5b1aa1 ("spapr: Allow memory unplug to always succeed")
trying to unplug memory from a guest that doesn't support it (eg. rhel6)
no longer generates an error like it used to. Instead, it leaves the
memory around : only a subsequent reboot or manual use of drmgr within
the guest can complete the hot-unplug sequence. A flag was added to
SpaprMachineClass so that this new behavior only applies to the default
machine type.
We can do better. CAS processes all pending hot-unplug requests. This
means that we don't really care about what the guest supports if
the hot-unplug request happens before CAS.
All guests that we care for, even old ones, set enough bits in OV5
that lead to a non-empty bitmap in spapr->ov5_cas. Use that as a
heuristic to decide if CAS has already occured or not.
Always accept unplug requests that happen before CAS since CAS will
process them. Restore the previous behavior of rejecting them after
CAS when we know that the guest doesn't support memory hot-unplug.
This behavior is suitable for all machine types : this allows to
drop the pre_6_0_memory_unplug flag.
Fixes: 1e8b5b1aa1 ("spapr: Allow memory unplug to always succeed")
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <161012708715.801107.11418801796987916516.stgit@bahia.lan>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Now we've converted all the callsites to directly create the QOM UIC
device themselves, the ppcuic_init() function is unused and can be
removed. The enum defining PPCUIC symbolic constants can be moved
to the ppc-uic.h header where it more naturally belongs.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Tested-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Message-Id: <20210108171212.16500-5-peter.maydell@linaro.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
No need to expose the way DRCs are traversed outside of spapr_drc.c.
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <20201218103400.689660-4-groug@kaod.org>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Tested-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Documentation of object_property_iter_init() clearly stipulates that
"it is forbidden to modify the property list while iterating". But this
is exactly what we do when resetting transient DR connectors during CAS.
The call to spapr_drc_reset() can finalize the hot-unplug sequence of a
PHB or a PCI bridge, both of which will then in turn destroy their PCI
DRCs. This could potentially invalidate the iterator. It is pure luck
that this haven't caused any issues so far.
Change spapr_drc_reset() to return true if it caused a device to be
removed. Restart from scratch in this case. This can potentially
increase the overall DRC reset time, especially with a high maxmem
which generates a lot of LMB DRCs. But this kind of setup is rare,
and so is the use case of rebooting a guest while doing hot-unplug.
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <20201218103400.689660-3-groug@kaod.org>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Tested-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Non-transient DRCs are either in the empty or the ready state,
which means spapr_drc_reset() doesn't change their state. It
is thus not needed to do any checking. Call spapr_drc_reset()
unconditionally and squash spapr_drc_transient() into its
only user, spapr_drc_needed().
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <20201218103400.689660-2-groug@kaod.org>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Tested-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Running a guest with 128 NUMA nodes crashes QEMU:
../../util/error.c:59: error_setv: Assertion `*errp == NULL' failed.
The crash happens when setting the FWNMI migration blocker:
2861 if (spapr_get_cap(spapr, SPAPR_CAP_FWNMI) == SPAPR_CAP_ON) {
2862 /* Create the error string for live migration blocker */
2863 error_setg(&spapr->fwnmi_migration_blocker,
2864 "A machine check is being handled during migration. The handler"
2865 "may run and log hardware error on the destination");
2866 }
Inspection reveals that papr->fwnmi_migration_blocker isn't NULL:
(gdb) p spapr->fwnmi_migration_blocker
$1 = (Error *) 0x8000000004000000
Since this is the only place where papr->fwnmi_migration_blocker is
set, this means someone wrote there in our back. Further analysis
points to spapr_numa_associativity_init(), especially the part
that initializes the associative arrays for NVLink GPUs:
max_nodes_with_gpus = nb_numa_nodes + NVGPU_MAX_NUM;
ie. max_nodes_with_gpus = 128 + 6, but the array isn't sized to
accommodate the 6 extra nodes:
struct SpaprMachineState {
.
.
.
uint32_t numa_assoc_array[MAX_NODES][NUMA_ASSOC_SIZE];
Error *fwnmi_migration_blocker;
};
and the following loops happily overwrite spapr->fwnmi_migration_blocker,
and probably more:
for (i = nb_numa_nodes; i < max_nodes_with_gpus; i++) {
spapr->numa_assoc_array[i][0] = cpu_to_be32(MAX_DISTANCE_REF_POINTS);
for (j = 1; j < MAX_DISTANCE_REF_POINTS; j++) {
uint32_t gpu_assoc = smc->pre_5_1_assoc_refpoints ?
SPAPR_GPU_NUMA_ID : cpu_to_be32(i);
spapr->numa_assoc_array[i][j] = gpu_assoc;
}
spapr->numa_assoc_array[i][MAX_DISTANCE_REF_POINTS] = cpu_to_be32(i);
}
Fix the size of the array. This requires "hw/ppc/spapr.h" to see
NVGPU_MAX_NUM. Including "hw/pci-host/spapr.h" introduces a
circular dependency that breaks the build, so this moves the
definition of NVGPU_MAX_NUM to "hw/ppc/spapr.h" instead.
Reported-by: Min Deng <mdeng@redhat.com>
BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1908693
Fixes: dd7e1d7ae4 ("spapr_numa: move NVLink2 associativity handling to spapr_numa.c")
Cc: danielhb413@gmail.com
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <160829960428.734871.12634150161215429514.stgit@bahia.lan>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
It is currently impossible to hot-unplug a memory device between
machine reset and CAS.
(qemu) device_del dimm1
Error: Memory hot unplug not supported for this guest
This limitation was introduced in order to provide an explicit
error path for older guests that didn't support hot-plug event
sources (and thus memory hot-unplug).
The linux kernel has been supporting these since 4.11. All recent
enough guests are thus capable of handling the removal of a memory
device at all time, including during early boot.
Lift the limitation for the latest machine type. This means that
trying to unplug memory from a guest that doesn't support it will
likely just do nothing and the memory will only get removed at
next reboot. Such older guests can still get the existing behavior
by using an older machine type.
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <160794035064.23292.17560963281911312439.stgit@bahia.lan>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This allows to drop a user of qdev_get_machine().
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <20201209170052.1431440-4-groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
All users are passing &error_abort already. Document the fact
that spapr_drc_attach() should only be passed a free DRC, which
is supposedly the case if appropriate checking is done earlier.
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <20201201113728.885700-5-groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Read documentation in "qapi/error.h" and changelog of commit
e3fe3988d7 ("error: Document Error API usage rules") for
rationale.
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <20201120234208.683521-7-groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Pre-plug of a memory device, be it an NVDIMM or a PC-DIMM, ensures
that the memory slot is available and that addresses don't overlap
with existing memory regions. The corresponding DRCs in the LMB
and PMEM namespaces are thus necessarily attachable at plug time.
Pass &error_abort to spapr_drc_attach() in spapr_add_lmbs() and
spapr_add_nvdimm(). This allows to greatly simplify error handling
on the plug path.
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <20201120234208.683521-3-groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Never used from the start.
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <20201120174646.619395-6-groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
There is no "version 2" of the "Lesser" General Public License.
It is either "GPL version 2.0" or "Lesser GPL version 2.1".
This patch replaces all occurrences of "Lesser GPL version 2" with
"Lesser GPL version 2.1" in comment section.
This patch contains all the files, whose maintainer I could not get
from ‘get_maintainer.pl’ script.
Signed-off-by: Chetan Pant <chetan4windows@gmail.com>
Message-Id: <20201023124424.20177-1-chetan4windows@gmail.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
[thuth: Adapted exec.c and qdev-monitor.c to new location]
Signed-off-by: Thomas Huth <thuth@redhat.com>
There is no "version 2" of the "Lesser" General Public License.
It is either "GPL version 2.0" or "Lesser GPL version 2.1".
This patch replaces all occurrences of "Lesser GPL version 2" with
"Lesser GPL version 2.1" in comment section.
Signed-off-by: Chetan Pant <chetan4windows@gmail.com>
Message-Id: <20201016145346.27167-1-chetan4windows@gmail.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Thomas Huth <thuth@redhat.com>
spapr_reallocate_hpt() has three users, two of which pass &error_fatal
and the third one, htab_load(), passes &local_err, uses it to detect
failures and simply propagates -EINVAL up to vmstate_load(), which will
cause QEMU to exit. It is thus confusing that spapr_reallocate_hpt()
doesn't return right away when an error is detected in some cases. Also,
the comment suggesting that the caller is welcome to try to carry on
seems like a remnant in this respect.
This can be improved:
- change spapr_reallocate_hpt() to always report a negative errno on
failure, either as reported by KVM or -ENOSPC if the HPT is smaller
than what was asked,
- use that to detect failures in htab_load() which is preferred over
checking &local_err,
- propagate this negative errno to vmstate_load() because it is more
accurate than propagating -EINVAL for all possible errors.
[dwg: Fix compile error due to omitted prelim patch]
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <160371605460.305923.5890143959901241157.stgit@bahia.lan>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
As recommended in "qapi/error.h", add a bool return value to
spapr_add_lmbs() and spapr_add_nvdimm(), and use them instead
of local_err in spapr_memory_plug().
This allows to get rid of the error propagation overhead.
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <160309734178.2739814.3488437759887793902.stgit@bahia.lan>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The spapr_create_nvdimm_dr_connectors() function doesn't need to access
any internal details of the sPAPR NVDIMM implementation. Also, pretty
much like for the LMBs, only spapr_machine_init() is responsible for the
creation of DR connectors for NVDIMMs.
Make this clear by making this function static in hw/ppc/spapr.c.
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <160249772183.757627.7396780936543977766.stgit@bahia.lan>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The changes to come to NUMA support are all guest visible. In
theory we could just create a new 5_1 class option flag to
avoid the changes to cascade to 5.1 and under. The reality is that
these changes are only relevant if the machine has more than one
NUMA node. There is no need to change guest behavior that has
been around for years needlesly.
This new helper will be used by the next patches to determine
whether we should retain the (soon to be) legacy NUMA behavior
in the pSeries machine. The new behavior will only be exposed
if:
- machine is pseries-5.2 and newer;
- more than one NUMA node is declared in NUMA state.
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20201007172849.302240-2-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
As recommended in "qapi/error.h", return true on success and false on
failure. This allows to reduce error propagation overhead in the callers.
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <20200914123505.612812-14-groug@kaod.org>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
As recommended in "qapi/error.h", return true on success and false on
failure. This allows to reduce error propagation overhead in the callers.
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <20200914123505.612812-13-groug@kaod.org>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
As recommended in "qapi/error.h", return true on success and false on
failure. This allows to reduce error propagation overhead in the callers.
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <20200914123505.612812-11-groug@kaod.org>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
As recommended in "qapi/error.h", return true on success and false on
failure. This allows to reduce error propagation overhead in the callers.
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <20200914123505.612812-9-groug@kaod.org>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
One of the goals of having less boilerplate on QOM declarations
is to avoid human error. Requiring an extra argument that is
never used is an opportunity for mistakes.
Remove the unused argument from OBJECT_DECLARE_TYPE and
OBJECT_DECLARE_SIMPLE_TYPE.
Coccinelle patch used to convert all users of the macros:
@@
declarer name OBJECT_DECLARE_TYPE;
identifier InstanceType, ClassType, lowercase, UPPERCASE;
@@
OBJECT_DECLARE_TYPE(InstanceType, ClassType,
- lowercase,
UPPERCASE);
@@
declarer name OBJECT_DECLARE_SIMPLE_TYPE;
identifier InstanceType, lowercase, UPPERCASE;
@@
OBJECT_DECLARE_SIMPLE_TYPE(InstanceType,
- lowercase,
UPPERCASE);
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Acked-by: Cornelia Huck <cohuck@redhat.com>
Acked-by: Igor Mammedov <imammedo@redhat.com>
Acked-by: Paul Durrant <paul@xen.org>
Acked-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20200916182519.415636-4-ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Separate run of the TypeCheckMacro converter using the --force
flag, for the cases where typedefs weren't found in the same
header nor in typedefs.h.
Generated initially using:
$ ./scripts/codeconverter/converter.py --force -i \
--pattern=TypeCheckMacro $(git grep -l '' -- '*.[ch]')
Then each case was manually reviewed, and a comment was added
indicating what's unusual about those type checking
macros/functions. Despite not following the usual pattern, the
changes in this patch were found to be safe.
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20200831210740.126168-15-ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Some typedefs and macros are defined after the type check macros.
This makes it difficult to automatically replace their
definitions with OBJECT_DECLARE_TYPE.
Patch generated using:
$ ./scripts/codeconverter/converter.py -i \
--pattern=QOMStructTypedefSplit $(git grep -l '' -- '*.[ch]')
which will split "typdef struct { ... } TypedefName"
declarations.
Followed by:
$ ./scripts/codeconverter/converter.py -i --pattern=MoveSymbols \
$(git grep -l '' -- '*.[ch]')
which will:
- move the typedefs and #defines above the type check macros
- add missing #include "qom/object.h" lines if necessary
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-Id: <20200831210740.126168-9-ehabkost@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-Id: <20200831210740.126168-10-ehabkost@redhat.com>
Message-Id: <20200831210740.126168-11-ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
The work to be done in h_home_node_associativity() intersects
with what is already done in spapr_numa_fixup_cpu_dt(). This
patch creates a new helper, spapr_numa_get_vcpu_assoc(), to
be used for both spapr_numa_fixup_cpu_dt() and
h_home_node_associativity().
While we're at it, use memcpy() instead of loop assignment
to created the returned array.
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20200904172422.617460-3-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
In a similar fashion as the previous patch, let's move the
handling of ibm,associativity-lookup-arrays from spapr.c to
spapr_numa.c. A spapr_numa_write_assoc_lookup_arrays() helper was
created, and spapr_dt_dynamic_reconfiguration_memory() can now
use it to advertise the lookup-arrays.
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20200903220639.563090-4-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Vcpus have an additional paramenter to be appended, vcpu_id. This
also changes the size of the of property itself, which is being
represented in index 0 of numa_assoc_array[cpu->node_id],
and defaults to MAX_DISTANCE_REF_POINTS for all cases but
vcpus.
All this logic makes more sense in spapr_numa.c, where we handle
everything NUMA and associativity. A new helper spapr_numa_fixup_cpu_dt()
was added, and spapr.c uses it the same way as it was using the former
spapr_fixup_cpu_numa_dt().
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20200903220639.563090-3-danielhb413@gmail.com>
[dwg: Correct uint to int type, which can break windows builds]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The next step to centralize all NUMA/associativity handling in
the spapr machine is to create a 'one stop place' for all
things ibm,associativity.
This patch introduces numa_assoc_array, a 2 dimensional array
that will store all ibm,associativity arrays of all NUMA nodes.
This array is initialized in a new spapr_numa_associativity_init()
function, called in spapr_machine_init(). It is being initialized
with the same values used in other ibm,associativity properties
around spapr files (i.e. all zeros, last value is node_id).
The idea is to remove all hardcoded definitions and FDT writes
of ibm,associativity arrays, doing instead a call to the new
helper spapr_numa_write_associativity_dt() helper, that will
be able to write the DT with the correct values.
We'll start small, handling the trivial cases first. The
remaining instances of ibm,associativity will be handled
next.
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20200903220639.563090-2-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This function is only used inside spapr_nvdimm.c.
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20200901125645.118026-3-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
We're going to make changes in how spapr handles all
ibm,associativity* related properties to enhance our current NUMA
support.
At this moment we have associativity code scattered all around
spapr_* files, with hardcoded values and array sizes. This
makes it harder to change any NUMA specific parameters in
the future. Having everything in the same place allows not
only for easier tuning, but also easier understanding since all
NUMA related code is on the same file.
This patch introduces a new file to gather all NUMA/associativity
handling code in spapr, spapr_numa.c. To get things started, let's
remove associativity-reference-points and max-associativity-domains
code from spapr_dt_rtas() to a new helper called spapr_numa_write_rtas_dt().
This will decouple spapr_dt_rtas() from the NUMA changes that
are going to happen in those two properties.
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20200901125645.118026-2-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
NVDIMM has different contraints and conditions than the regular
DIMM and we'll need to add at least one more.
Instead of relying on 'if (nvdimm)' conditionals in the body of
spapr_memory_pre_plug(), use the existing spapr_nvdimm_validate_opts()
and put all NVDIMM handling code there. Rename it to
spapr_nvdimm_validate() to reflect that the function is now checking
more than the nvdimm device options. This makes spapr_memory_pre_plug()
a bit easier to follow, and we can tune in NVDIMM parameters
and validation in the same place.
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20200825215749.213536-3-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
On POWER9, the KVM XIVE device uses priority 7 for the escalation
interrupts. On POWER10, the host can use a reduced set of priorities
and KVM will configure the escalation priority to a lower number. In
any case, the guest is allowed to use priorities in a single range :
[ 0 .. (maxprio - 1) ].
Introduce a 'hv-prio' property to represent the escalation priority
number and use it to compute the "ibm,plat-res-int-priorities"
property defining the priority ranges reserved by the hypervisor.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20200819130843.2230799-2-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
spapr_drc.h includes typechecker macro boilerplate for the many different
DRC subclasses. However, most of these types don't actually have different
data in their class and/or instance, making these unneeded, unused, and in
fact a bad idea. Remove them.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Move the typedef from spapr_irq.h to spapr.h, and use "struct
SpaprMachineState" in the spapr_*.h headers (to avoid circular
header dependencies).
This will make future conversion to OBJECT_DECLARE* easier.
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Tested-By: Roman Bolshakov <r.bolshakov@yadro.com>
Message-Id: <20200825192110.3528606-28-ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Now that kvmppc_xive_cpu_get_state() returns negative on error, use that
and get rid of the temporary Error object and error_propagate().
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <159707852916.1489912.8376334685349668124.stgit@bahia.lan>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Since kvm_device_access() returns a negative errno on failure, convert
kvmppc_xive_set_source_config() to use it for error checking. This allows
to get rid of the local_err boilerplate.
Propagate the return value so that callers may use it as well to check
failures.
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <159707848764.1489912.17078842252160674523.stgit@bahia.lan>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Since kvm_device_access() returns a negative errno on failure, convert
kvmppc_xive_get_queue_config() and kvmppc_xive_set_queue_config() to
use it for error checking. This allows to get rid of the local_err
boilerplate.
Propagate the return value so that callers may use it as well to check
failures.
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <159707847357.1489912.2032291280645236480.stgit@bahia.lan>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
kvm_set_one_reg() returns a negative errno on failure, use that instead
of errno. Also propagate it to callers so they can use it to check
for failures and hopefully get rid of their local_err boilerplate.
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <159707846665.1489912.14267225652103441921.stgit@bahia.lan>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Use error_setg_errno() instead of error_setg(strerror()). While here,
use -ret instead of errno since kvm_vcpu_enable_cap() returns a negative
errno on failure.
Use ERRP_GUARD() to ensure that errp can be passed to error_append_hint(),
and get rid of the local_err boilerplate.
Propagate the return value so that callers may use it as well to check
failures.
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <159707844549.1489912.4862921680328017645.stgit@bahia.lan>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Calls to the KVM XIVE device are guarded by kvm_irqchip_in_kernel(). This
ensures that QEMU won't try to use the device if KVM is disabled or if
an in-kernel irqchip isn't required.
When using ic-mode=dual with the pseries machine, we have two possible
interrupt controllers: XIVE and XICS. The kvm_irqchip_in_kernel() helper
will return true as soon as any of the KVM device is created. It might
lure QEMU to think that the other one is also around, while it is not.
This is exactly what happens with ic-mode=dual at machine init when
claiming IRQ numbers, which must be done on all possible IRQ backends,
eg. RTAS event sources or the PHB0 LSI table : only the KVM XICS device
is active but we end up calling kvmppc_xive_source_reset_one() anyway,
which fails. This doesn't cause any trouble because of another bug :
kvmppc_xive_source_reset_one() lacks an error_setg() and callers don't
see the failure.
Most of the other kvmppc_xive_* functions have similar xive->fd
checks to filter out the case when KVM XIVE isn't active. It
might look safer to have idempotent functions but it doesn't
really help to understand what's going on when debugging.
Since we already have all the kvm_irqchip_in_kernel() in place,
also have the callers to check xive->fd as well before calling
KVM XIVE specific code. This is straight-forward for the spapr
specific XIVE code. Some more care is needed for the platform
agnostic XIVE code since it cannot access xive->fd directly.
Introduce new in_kernel() methods in some base XIVE classes
for this purpose and implement them only in spapr.
In all cases, we still need to call kvm_irqchip_in_kernel() so that
compilers can optimize the kvmppc_xive_* calls away when CONFIG_KVM
isn't defined, thus avoiding the need for stubs.
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <159679993438.876294.7285654331498605426.stgit@bahia.lan>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Depending on whether XIVE is emultated or backed with a KVM XIVE device,
the ESB MMIOs of a XIVE source point to an I/O memory region or a mapped
memory region.
This is currently handled by checking kvm_irqchip_in_kernel() returns
false in xive_source_realize(). This is a bit awkward as we usually
need to do extra things when we're using the in-kernel backend, not
less. But most important, we can do better: turn the existing "xive.esb"
memory region into a plain container, introduce an "xive.esb-emulated"
I/O subregion and rename the existing "xive.esb" subregion in the KVM
code to "xive.esb-kvm". Since "xive.esb-kvm" is added with overlap
and a higher priority, it prevails over "xive.esb-emulated" (ie.
a guest using KVM XIVE will interact with "xive.esb-kvm" instead of
the default "xive.esb-emulated" region.
While here, consolidate the computation of the MMIO region size in
a common helper.
Suggested-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <159679992680.876294.7520540158586170894.stgit@bahia.lan>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
NUMA nodes corresponding to GPU memory currently have the same
affinity/distance as normal memory nodes. Add a third NUMA associativity
reference point enabling us to give GPU nodes more distance.
This is guest visible information, which shouldn't change under a
running guest across migration between different qemu versions, so make
the change effective only in new (pseries > 5.0) machine types.
Before, `numactl -H` output in a guest with 4 GPUs (nodes 2-5):
node distances:
node 0 1 2 3 4 5
0: 10 40 40 40 40 40
1: 40 10 40 40 40 40
2: 40 40 10 40 40 40
3: 40 40 40 10 40 40
4: 40 40 40 40 10 40
5: 40 40 40 40 40 10
After:
node distances:
node 0 1 2 3 4 5
0: 10 40 80 80 80 80
1: 40 10 80 80 80 80
2: 80 80 10 80 80 80
3: 80 80 80 10 80 80
4: 80 80 80 80 10 80
5: 80 80 80 80 80 10
These are the same distances as on the host, mirroring the change made
to host firmware in skiboot commit f845a648b8cb ("numa/associativity:
Add a new level of NUMA for GPU's").
Signed-off-by: Reza Arbab <arbab@linux.ibm.com>
Message-Id: <20200716225655.24289-1-arbab@linux.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>