QEMU is failing to launch a CGS pSeries guest in a host that has PEF
support:
qemu-system-ppc64: ../softmmu/vl.c:2585: qemu_machine_creation_done: Assertion `machine->cgs->ready' failed.
Aborted
This is happening because we're not setting the cgs->ready flag that is
asserted in qemu_machine_creation_done() during machine start.
cgs->ready is set in s390_pv_kvm_init() and sev_kvm_init(). Let's set it
in kvmppc_svm_init() as well.
Reported-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210528201619.52363-1-danielhb413@gmail.com>
Acked-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
TCG does not keep track of AIL mode in a central place, it's based on
the current LPCR[AIL] bits. Synchronize the new CPU's LPCR to the
current LPCR in rtas_start_cpu(), similarly to the way the ILE bit is
synchronized.
Open-code the ILE setting as well now that the caller's LPCR is
available directly, there is no need for the indirection.
Without this, under both TCG and KVM, adding a POWER8/9/10 class CPU
with a new core ID after a modern Linux has booted results in the new
CPU's LPCR missing the LPCR[AIL]=0b11 setting that the other CPUs have.
This can cause crashes and unexpected behaviour.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Message-Id: <20210526091626.3388262-3-npiggin@gmail.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Commit 47a9b55154 ("spapr: Clean up handling of LPCR power-saving exit
bits") moved this logic but did not remove the comment from the
previous location.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Message-Id: <20210526091626.3388262-2-npiggin@gmail.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The FDT code is adding the pmem root node by name "persistent-memory"
which should have been "ibm,persistent-memory".
The linux fetches the device tree nodes by type and it has been working
correctly as the type is correct. If someone searches by its intended
name it would fail, so fix that.
Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
Message-Id: <162204278956.219.9061511386011411578.stgit@cc493db1e665>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The subsequent patches add definitions which tend to get
the compilation to cyclic dependency. So, prepare with
forward declarations, move the definitions and clean up.
Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
Message-Id: <162133925415.610.11584121797866216417.stgit@4f1e6f2bd33e>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
With upstream kernel, especially after commit 98ba956f6a389
("powerpc/pseries/eeh: Rework device EEH PE determination") we see that KVM
guest isn't able to enable EEH option for PCI pass-through devices anymore.
[root@atest-guest ~]# dmesg | grep EEH
[ 0.032337] EEH: pSeries platform initialized
[ 0.298207] EEH: No capable adapters found: recovery disabled.
[root@atest-guest ~]#
So far the linux kernel was assuming pe_config_addr equal to device's
config_addr and using it to enable EEH on the PE through ibm,set-eeh-option
RTAS call. Which wasn't the correct way as per PAPR. The linux kernel
commit 98ba956f6a389 fixed this flow. With that fixed, linux now uses PE
config address returned by ibm,get-config-addr-info2 RTAS call to enable
EEH option per-PE basis instead of per-device basis. However this has
uncovered a bug in qemu where ibm,set-eeh-option is treating PE config
address as per-device config address.
Hence in qemu guest with recent kernel the ibm,set-eeh-option RTAS call
fails with -3 return value indicating that there is no PCI device exist for
the specified PE config address. The rtas_ibm_set_eeh_option call uses
pci_find_device() to get the PC device that matches specific bus and devfn
extracted from PE config address passed as argument. Thus it tries to map
the PE config address to a single specific PCI device 'bus->devices[devfn]'
which always results into checking device on slot 0 'bus->devices[0]'.
This succeeds when there is a pass-through device (vfio-pci) present on
slot 0. But in cases where there is no pass-through device present in slot
0, but present in non-zero slots, ibm,set-eeh-option call fails to enable
the EEH capability.
hw/ppc/spapr_pci_vfio.c: spapr_phb_vfio_eeh_set_option()
case RTAS_EEH_ENABLE: {
PCIHostState *phb;
PCIDevice *pdev;
/*
* The EEH functionality is enabled on basis of PCI device,
* instead of PE. We need check the validity of the PCI
* device address.
*/
phb = PCI_HOST_BRIDGE(sphb);
pdev = pci_find_device(phb->bus,
(addr >> 16) & 0xFF, (addr >> 8) & 0xFF);
if (!pdev || !object_dynamic_cast(OBJECT(pdev), "vfio-pci")) {
return RTAS_OUT_PARAM_ERROR;
}
hw/pci/pci.c:pci_find_device()
PCIDevice *pci_find_device(PCIBus *bus, int bus_num, uint8_t devfn)
{
bus = pci_find_bus_nr(bus, bus_num);
if (!bus)
return NULL;
return bus->devices[devfn];
}
This patch fixes ibm,set-eeh-option to check for presence of any PCI device
(vfio-pci) under specified bus and enable the EEH if found. The current
code already makes sure that all the devices on that bus are from same
iommu group (within same PE) and fail very early if it does not.
After this fix guest is able to find EEH capable devices and enable EEH
recovery on it.
[root@atest-guest ~]# dmesg | grep EEH
[ 0.048139] EEH: pSeries platform initialized
[ 0.405115] EEH: Capable adapter found: recovery enabled.
[root@atest-guest ~]#
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.ibm.com>
Message-Id: <162158429107.145117.5843504911924013125.stgit@jupiter>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
QEMU 6.0 moved all the -boot variables to the machine. Especially, the
removal of the boot_order static changed the handling of '-boot once'
from:
if (boot_once) {
qemu_boot_set(boot_once, &error_fatal);
qemu_register_reset(restore_boot_order, g_strdup(boot_order));
}
to
if (current_machine->boot_once) {
qemu_boot_set(current_machine->boot_once, &error_fatal);
qemu_register_reset(restore_boot_order,
g_strdup(current_machine->boot_order));
}
This means that we now register as subsequent boot order a copy
of current_machine->boot_once that was just set with the previous
call to qemu_boot_set(), i.e. we never transition away from the
once boot order.
It is certainly fragile^Wwrong for the spapr code to hijack a
field of the base machine type object like that. The boot order
rework simply turned this software boundary violation into an
actual bug.
Have the spapr code to handle that with its own field in
SpaprMachineState. Also kfree() the initial boot device
string when "once" was used.
Fixes: 4b7acd2ac8 ("vl: clean up -boot variables")
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1960119
Cc: pbonzini@redhat.com
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <20210521160735.1901914-1-groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Commit e50caf4a5c ("tracing: convert documentation to rST")
converted docs/devel/tracing.txt to docs/devel/tracing.rst.
We still have several references to the old file, so let's fix them
with the following command:
sed -i s/tracing.txt/tracing.rst/ $(git grep -l docs/devel/tracing.txt)
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20210517151702.109066-2-sgarzare@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Per the kconfig.rst:
A device should be listed [...] ``imply`` if (depending on
the QEMU command line) the board may or may not be started
without it.
This is the case with the NVDIMM device, so use the 'imply'
weak reverse dependency to select the symbol.
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20210511155354.3069141-2-philmd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Moved has_spr to cpu.h as ppc_has_spr and turned it into an inline function.
Change spr verification in pnv.c and spapr.c to a version that can
compile in a !TCG environment.
Signed-off-by: Lucas Mateus Castro (alqotel) <lucas.araujo@eldorado.org.br>
Message-Id: <20210507164146.67086-1-lucas.araujo@eldorado.org.br>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The function ppc_hash64_filter_pagesizes has been moved from a function
with prototype in mmu-hash64.h and implemented in mmu-hash64.c to
a static function in hw/ppc/spapr_caps.c as it's only used in that file.
Signed-off-by: Lucas Mateus Castro (alqotel) <lucas.araujo@eldorado.org.br>
Message-Id: <20210506163941.106984-3-lucas.araujo@eldorado.org.br>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The hypercalls h_enter, h_remove, h_bulk_remove, h_protect, and h_read,
have been moved to spapr_softmmu.c with the functions they depend on. The
functions is_ram_address and push_sregs_to_kvm_pr are not static anymore
as functions on both spapr_hcall.c and spapr_softmmu.c depend on them.
The hypercalls h_resize_hpt_prepare and h_resize_hpt_commit have been
divided, the KVM part stayed in spapr_hcall.c while the softmmu part
was moved to spapr_softmmu.c
Signed-off-by: Lucas Mateus Castro (alqotel) <lucas.araujo@eldorado.org.br>
Message-Id: <20210506163941.106984-2-lucas.araujo@eldorado.org.br>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Starting with Linux kernel v5.12 we dropped support[1] in KVM for
hosts that can't have their threads running in different MMU modes
(POWER9 < DD2.2). In these hosts, KVM will no longer report the
KVM_CAP_PPC_MMU_HASH_V3 capability[2] when the host is running Radix.
For guests that support both MMU modes, the negotiation during CAS
will make sure it selects the correct one.
For guests that only support Hash, such as P8 compat mode guests, the
following error is currently thrown:
$ ~/qemu-system-ppc64 -machine pseries,accel=kvm,max-cpu-compat=power8 ...
error: kvm run failed Invalid argument
NIP 0000000000000100 LR 0000000000000000 CTR 0000000000000000 XER 0000000000000000 CPU#0
MSR 8000000000001000 HID0 0000000000000000 HF 8000000000000000 iidx 3 didx 3
TB 00000000 00000000 DECR 0
GPR00 0000000000000000 0000000000000000 0000000000000000 000000007ff00000
GPR04 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR08 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR12 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR16 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR20 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR24 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR28 0000000000000000 0000000000000000 0000000000000000 0000000000000000
CR 00000000 [ - - - - - - - - ] RES ffffffffffffffff
SRR0 0000000000000000 SRR1 0000000000000000 PVR 00000000004e1201 VRSAVE 0000000000000000
SPRG0 0000000000000000 SPRG1 0000000000000000 SPRG2 0000000000000000 SPRG3 0000000000000000
SPRG4 0000000000000000 SPRG5 0000000000000000 SPRG6 0000000000000000 SPRG7 0000000000000000
HSRR0 0000000000000000 HSRR1 0000000000000000
CFAR 0000000000000000
LPCR 000000000004f01f
PTCR 0000000000000000 DAR 0000000000000000 DSISR 0000000000000000
This patch adds a verification during the writing of the platform
support vector so that we error out as soon as we determine this guest
only supports Hash and the host doesn't.
~/qemu-system-ppc64 -machine pseries,accel=kvm,max-cpu-compat=power8 ...
qemu-system-ppc64: Guest requested unavailable MMU mode (hash).
1- https://git.kernel.org/torvalds/p/b1b1697ae0cc8
2- https://git.kernel.org/torvalds/p/a722076e94702
Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com>
Message-Id: <20210505001130.3999968-3-farosas@linux.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
A following patch will make use of it.
Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com>
Message-Id: <20210505001130.3999968-2-farosas@linux.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Here's the first ppc pull request for qemu-6.1. It has a wide variety
of stuff accumulated during the 6.0 freeze. Highlights are:
* Multi-phase reset cleanups for PAPR
* Preliminary cleanups towards allowing !CONFIG_TCG for the ppc target
* Cleanup of AIL logic and extension to POWER10
* Further improvements to handling of hot unplug failures on PAPR
* Allow much larger numbers of CPU on pseries
* Support for the H_SCM_HEALTH hypercall
* Add support for the Pegasos II board
* Substantial cleanup to hflag handling
* Assorted minor fixes and cleanups
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEdfRlhq5hpmzETofcbDjKyiDZs5IFAmCQ4ScACgkQbDjKyiDZ
s5KmNhAAsICdDqeu/jm1uhRCr0DDT/Wa6KE1xlglQ53ybWb5Hm2ae0Uwzti5ZWkt
T9yryObX++wiugbU5Dlx9eXTiJIPgTbDoBV1wfOa3a1BAxSEES1t70jwuwAXXBpX
mgU++SurQB70IB7vVvyXDi2Z592qGvMiKXqT0sdkfoexPHzAL0+KkQPyJZLeFchM
Ap/zRHAodXf9SuWAl+LwLXeb350jivXYXBWNcFRrBbOGpbVT0AJMYrk/TEa2ZIpi
SvbzAWuW+9mX0EOmk7JK5JfkT41cGNdcBcwd0bt4xyvUpmkXLaTMFDLVHj3HWSUn
PFA4RB3uKXyTfISVtWdxJBbFOzMpchI6lEiRJHCS+KuY7UsACqV1T/y54ATOUauC
ycLc9APgRaStdNPxfDl+xeFfoVb/f0mQsNwcmY1tv7z+3qE/trY9bMyrbgaebBFn
/TAkmPvXfwtAREnx8xF/57poarWUkvupGTQkANNosdFokpExmrLj8T0sKv90hh5Y
vkGf5zP4pYGN1Rs8qhOdHu+IjhVJvUl/L3LZYWcoMI6E61D8rGRc0Dkacx7gcja+
sluFi5Yh2fQn55y6LTi3049cB1wMd6wly0214F11RKoBswguiGuaqJmL4sNDO/s4
IcMCy5mg6C0jNZA5kHcdWmqsVzD2+XwP5J29n/LedlmgXoHYF+M=
=N0qr
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/dg-gitlab/tags/ppc-for-6.1-20210504' into staging
ppc patch queue 2021-05-04
Here's the first ppc pull request for qemu-6.1. It has a wide variety
of stuff accumulated during the 6.0 freeze. Highlights are:
* Multi-phase reset cleanups for PAPR
* Preliminary cleanups towards allowing !CONFIG_TCG for the ppc target
* Cleanup of AIL logic and extension to POWER10
* Further improvements to handling of hot unplug failures on PAPR
* Allow much larger numbers of CPU on pseries
* Support for the H_SCM_HEALTH hypercall
* Add support for the Pegasos II board
* Substantial cleanup to hflag handling
* Assorted minor fixes and cleanups
# gpg: Signature made Tue 04 May 2021 06:52:39 BST
# gpg: using RSA key 75F46586AE61A66CC44E87DC6C38CACA20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>" [full]
# gpg: aka "David Gibson (Red Hat) <dgibson@redhat.com>" [full]
# gpg: aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>" [full]
# gpg: aka "David Gibson (kernel.org) <dwg@kernel.org>" [unknown]
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E 87DC 6C38 CACA 20D9 B392
* remotes/dg-gitlab/tags/ppc-for-6.1-20210504: (46 commits)
hw/ppc/pnv_psi: Use device_cold_reset() instead of device_legacy_reset()
hw/ppc/spapr_vio: Reset TCE table object with device_cold_reset()
hw/intc/spapr_xive: Use device_cold_reset() instead of device_legacy_reset()
target/ppc: removed VSCR from SPR registration
target/ppc: Reduce the size of ppc_spr_t
target/ppc: Clean up _spr_register et al
target/ppc: Add POWER10 exception model
target/ppc: rework AIL logic in interrupt delivery
target/ppc: move opcode table logic to translate.c
target/ppc: code motion from translate_init.c.inc to gdbstub.c
spapr_drc.c: handle hotunplug errors in drc_unisolate_logical()
spapr.h: increase FDT_MAX_SIZE
spapr.c: do not use MachineClass::max_cpus to limit CPUs
ppc: Rename current DAWR macros and variables
target/ppc: POWER10 supports scv
target/ppc: Fix POWER9 radix guest HV interrupt AIL behaviour
docs/system: ppc: Add documentation for ppce500 machine
roms/u-boot: Bump ppce500 u-boot to v2021.04 to fix broken pci support
roms/Makefile: Update ppce500 u-boot build directory name
ppc/spapr: Add support for implement support for H_SCM_HEALTH
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The pnv_psi.c code uses device_legacy_reset() for two purposes:
* to reset itself from its qemu_register_reset() handler
* to reset a XiveSource object it has
Neither it nor the XiveSource have any qbuses, so the new
device_cold_reset() function (which resets both the device and its
child buses) is equivalent here to device_legacy_reset() and we can
just switch to the new API.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20210503151849.8766-4-peter.maydell@linaro.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The spapr_vio_quiesce_one() function resets the TCE table object
(TYPE_SPAPR_TCE_TABLE) via device_legacy_reset(). We know that
objects of that type do not have a qbus of their own, so the new
device_cold_reset() function (which resets both the device and its
child buses) is equivalent here to device_legacy_reset() and we can
just switch to the new API.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20210503151849.8766-3-peter.maydell@linaro.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
POWER10 adds a new bit that modifies interrupt behaviour, LPCR[HAIL],
and it removes support for the LPCR[AIL]=0b10 mode.
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Tested-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Message-Id: <20210501072436.145444-3-npiggin@gmail.com>
[dwg: Corrected tab indenting]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The AIL logic is becoming unmanageable spread all over powerpc_excp(),
and it is slated to get even worse with POWER10 support.
Move it all to a new helper function.
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Tested-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Message-Id: <20210501072436.145444-2-npiggin@gmail.com>
[dwg: Corrected tab indenting]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
At this moment, PAPR does not provide a way to report errors during a
device removal operation. This led the pSeries machine to implement
extra mechanisms to try to fallback and recover from an error that might
have happened during the hotunplug in the guest side. This started to
change a bit with commit fe1831eff8 ("spapr_drc.c: use DRC
reconfiguration to cleanup DIMM unplug state"), where one way to
fallback from a memory removal error was introduced.
Around the same time, in [1], the idea of using RTAS set-indicator for
this role was first introduced. The RTAS set-indicator call, when
attempting to UNISOLATE a DRC that is already UNISOLATED or CONFIGURED,
returns RTAS_OK and does nothing else for both QEMU and phyp. This gives
us an opportunity to use this behavior to signal the hypervisor layer
when a device removal errir happens, allowing QEMU/phyp to do a proper
error handling. Using set-indicator to report HP errors isn't strange to
PAPR, as per R1-13.5.3.4-4. of table 13.7 of current PAPR [2]:
"For all DR options: If this is a DR operation that involves the user
insert- ing a DR entity, then if the firmware can determine that the
inserted entity would cause a system disturbance, then the set-indicator
RTAS call must not unisolate the entity and must return an error status
which is unique to the particular error."
A change was proposed to the pSeries Linux kernel to call set-indicator
to move a DRC to 'unisolate' in the case of a hotunplug error in the
guest side [3]. Setting a DRC that is already unisolated or configured to
'unisolate' is a no-op (returns RTAS_OK) for QEMU and also for phyp.
Being a benign change for hypervisors that doesn't care about handling
such errors, we expect the kernel to accept this change at some point.
This patch prepares the pSeries machine for this new kernel feature by
changing drc_unisolate_logical() to handle guest side hotunplug errors.
For CPUs it's a simple matter of setting drc->unplug_requested to 'false',
while for LMBs the process is similar to the rollback that is done in
rtas_ibm_configure_connector().
[1] https://lists.gnu.org/archive/html/qemu-devel/2021-02/msg06395.html
[2] https://openpowerfoundation.org/wp-content/uploads/2020/07/LoPAR-20200611.pdf
[3] https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20210416210216.380291-3-danielhb413@gmail.com/
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210420165100.108368-2-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Up to this patch, 'max_cpus' value is hardcoded to 1024 (commit
6244bb7e58). In theory this patch would simply bump it to 2048, since
it's the default NR_CPUS kernel setting for ppc64 servers nowadays, but
the whole mechanic of MachineClass:max_cpus is flawed for the pSeries
machine. The two supported accelerators, KVM and TCG, can live without
it.
TCG guests don't have a theoretical limit. The user must be free to
emulate as many CPUs as the hardware is capable of. And even if there
were a limit, max_cpus is not the proper way to report it since it's a
common value checked by SMP code in machine_smp_parse() for KVM as well.
For KVM guests, the proper way to limit KVM CPUs is by host
configuration via NR_CPUS, not a QEMU hardcoded value. There is no
technical reason for a pSeries QEMU guest to forcefully stay below
NR_CPUS.
This hardcoded value also disregard hosts that might have a lower
NR_CPUS limit, say 512. In this case, machine.c:machine_smp_parse() will
allow a 1024 value to pass, but then kvm_init() will complain about it
because it will exceed NR_CPUS:
Number of SMP cpus requested (1024) exceeds the maximum cpus supported
by KVM (512)
A better 'max_cpus' value would consider host settings, but
MachineClass::max_cpus is defined well before machine_init() and
kvm_init(). We can't check for KVM limits because it's too soon, so we
end up making a guess.
This patch makes MachineClass:max_cpus settings innocuous by setting it
to INT32_MAX. machine.c:machine_smp_parse() will not fail the
verification based on max_cpus, letting kvm_init() do the checking with
actual host settings. And TCG guests get to do whatever the hardware is
capable of emulating.
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210408204049.221802-2-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Add support for H_SCM_HEALTH hcall described at [1] for spapr
nvdimms. This enables guest to detect the 'unarmed' status of a
specific spapr nvdimm identified by its DRC and if its unarmed, mark
the region backed by the nvdimm as read-only.
The patch adds h_scm_health() to handle the H_SCM_HEALTH hcall which
returns two 64-bit bitmaps (health bitmap, health bitmap mask) derived
from 'struct nvdimm->unarmed' member.
Linux kernel side changes to enable handling of 'unarmed' nvdimms for
ppc64 are proposed at [2].
References:
[1] "Hypercall Op-codes (hcalls)"
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/powerpc/papr_hcalls.rst#n220
[2] "powerpc/papr_scm: Mark nvdimm as unarmed if needed during probe"
https://lore.kernel.org/linux-nvdimm/20210329113103.476760-1-vaibhav@linux.ibm.com/
Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Message-Id: <20210402102128.213943-1-vaibhav@linux.ibm.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
SLOF instantiates RTAS since
744a928cce ("spapr: Stop providing RTAS blob")
so the max address applies to the FDT only.
This renames the macro and fixes up the comment.
This should not cause any behavioral change.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Message-Id: <20210331025123.29310-1-aik@ozlabs.ru>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Add new machine called pegasos2 emulating the Genesi/bPlan Pegasos II,
a PowerPC board based on the Marvell MV64361 system controller and the
VIA VT8231 integrated south bridge/superio chips. It can run Linux,
AmigaOS and a wide range of MorphOS versions. Currently a firmware ROM
image is needed to boot and only MorphOS has a video driver to produce
graphics output. Linux could work too but distros that supported this
machine don't include usual video drivers so those only run with
serial console for now.
Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <30cbfb9cbe6f46a1e15a69a75fac45ac39340122.1616680239.git.balaton@eik.bme.hu>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20210315184615.1985590-16-richard.henderson@linaro.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20210315184615.1985590-15-richard.henderson@linaro.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
On Mac99 and newer machines, the Uninorth PCI host bridge maps
the PCI hole region at 2GiB, so the RAM area beside 2GiB is not
accessible by the CPU. Restrict the memory to 2GiB to avoid
problems such the one reported in the buglink.
Buglink: https://bugs.launchpad.net/qemu/+bug/1922391
Reported-by: Håvard Eidnes <he@NetBSD.org>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20210406084842.2859664-1-f4bug@amsat.org>
Reviewed-by: BALATON Zoltan <balaton@eik.bme.hu>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Stop including exec/address-spaces.h in files that don't need it.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20210416171314.2074665-5-thuth@redhat.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Stop including cpu.h in files that don't need it.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20210416171314.2074665-4-thuth@redhat.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Stop including hw/boards.h in files that don't need it.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20210416171314.2074665-3-thuth@redhat.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Stop including sysemu/sysemu.h in files that don't need it.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20210416171314.2074665-2-thuth@redhat.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Many files include qemu/log.h without needing it. Remove the superfluous
include statements.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-Id: <20210328054833.2351597-1-thuth@redhat.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Many files include hw/irq.h without needing it. Remove the superfluous
include statements.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-Id: <20210327050236.2232347-1-thuth@redhat.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Commit 47c8c915b1 fixed a problem where multiple spapr_drc_detach()
requests were breaking QEMU. The solution was to just spapr_drc_detach()
once, and use spapr_drc_unplug_requested() to filter whether we already
detached it or not. The commit also tied the hotplug request to the
guest in the same condition.
Turns out that there is a reliable way for a CPU hotunplug to fail. If a
guest with one CPU hotplugs a CPU1, then offline CPU0s via 'echo 0 >
/sys/devices/system/cpu/cpu0/online', then attempts to hotunplug CPU1,
the kernel will refuse it because it's the last online CPU of the
system. Given that we're pulsing the IRQ only in the first try, in a
failed attempt, all other CPU1 hotunplug attempts will fail, regardless
of the online state of CPU1 in the kernel, because we're simply not
letting the guest know that we want to hotunplug the device.
Let's move spapr_hotplug_req_remove_by_index() back out of the "if
(!spapr_drc_unplug_requested(drc))" conditional, allowing for multiple
'device_del' requests to the same CPU core to reach the guest, in case
the CPU core didn't fully hotunplugged previously.
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210401000437.131140-3-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The pseries machines introduced the concept of 'unplug timeout' for CPU
hotunplugs. The idea was to circunvent a deficiency in the pSeries
specification (PAPR), that currently does not define a proper way for
the hotunplug to fail. If the guest refuses to release the CPU (see [1]
for an example) there is no way for QEMU to detect the failure.
Further discussions about how to send a QAPI event to inform about the
hotunplug timeout [2] exposed problems that weren't predicted back when
the idea was developed. Other QEMU machines don't have any type of
hotunplug timeout mechanism for any device, e.g. ACPI based machines
have a way to make hotunplug errors visible to the hypervisor. This
would make this timeout mechanism exclusive to pSeries, which is not
ideal.
The real problem is that a QAPI event that reports hotunplug timeouts
puts the management layer (namely Libvirt) in a weird spot. We're not
telling that the hotunplug failed, because we can't be 100% sure of
that, and yet we're resetting the unplug state back, preventing any
DEVICE_DEL events to reach out in case the guest decides to release the
device. Libvirt would need to inspect the guest itself to see if the
device was released or not, otherwise the internal domain states will be
inconsistent. Moreover, Libvirt already has an 'unplug timeout'
concept, and a QEMU side timeout would need to be juggled together with
the existing Libvirt timeout.
All this considered, this solution ended up creating more trouble than
it solved. This patch reverts the 3 commits that introduced the timeout
mechanism for CPU hotplugs in pSeries machines.
This reverts commit 4515a5f786
"qemu_timer.c: add timer_deadline_ms() helper"
This reverts commit d1c2e3ce3d
"spapr_drc.c: add hotunplug timeout for CPUs"
This reverts commit 51254ffb32
"spapr_drc.c: introduce unplug_timeout_timer"
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1911414
[2] https://lists.gnu.org/archive/html/qemu-devel/2021-03/msg04682.html
CC: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210401000437.131140-2-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The e500plat machine device plug callback currently calls
platform_bus_link_device() for any sysbus device. This is overly
broad, because platform_bus_link_device() will unconditionally grab
the IRQs and MMIOs of the device it is passed, whether it was
intended for the platform bus or not. Restrict hotpluggability of
sysbus devices to only those devices on the dynamic sysbus allowlist.
We were mostly getting away with this because the board creates the
platform bus as the last device it creates, and so the hotplug
callback did not do anything for all the sysbus devices created by
the board itself. However if the user plugged in a device which
itself uses a sysbus device internally we would have mishandled this
and probably asserted. An example of this is:
qemu-system-ppc64 -M ppce500 -device macio-oldworld
This isn't a sensible command because the macio-oldworld device
is really specific to the 'g3beige' machine, but we now fail
with a reasonable error message rather than asserting:
qemu-system-ppc64: Device heathrow is not supported by this machine yet.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Message-id: 20210325153310.9131-5-peter.maydell@linaro.org
spapr_memory_unplug() is the last step of the hot unplug sequence.
It is indirectly called by:
spapr_lmb_release()
hotplug_handler_unplug()
and spapr_lmb_release() already buys us that DIMM unplug state is
present : it gets restored with spapr_recover_pending_dimm_state()
if missing.
g_assert() that spapr_pending_dimm_unplugs_find() cannot return NULL
in spapr_memory_unplug() to make this clear and silence Coverity.
Fixes: Coverity CID 1450767
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <161562021166.948373.15092876234470478331.stgit@bahia.lan>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Per devicetree spec v0.3 [1] chapter 2.3.5:
The #address-cells and #size-cells properties are not inherited
from ancestors in the devicetree. They shall be explicitly defined.
If missing, a client program should assume a default value of 2
for #address-cells, and a value of 1 for #size-cells.
These properties are currently missing, causing the <reg> property
of the queue-group subnode to be incorrectly parsed using default
values.
[1] https://github.com/devicetree-org/devicetree-specification/releases/download/v0.3/devicetree-specification-v0.3.pdf
Fixes: fdfb7f2cdb ("e500: Add support for eTSEC in device tree")
Signed-off-by: Bin Meng <bin.meng@windriver.com>
Message-Id: <20210311081608.66891-1-bmeng.cn@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The 'ide-hd' and 'ide-cd' devices provide suitable alternatives.
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Previous work on dev-iotlb message broke spapr_iommu/vhost integration
as it did for SMMU and virtio-iommu. The spapr_iommu currently
only sends IOMMU_NOTIFIER_UNMAP notifications. Since commit
958ec334bc ("vhost: Unbreak SMMU and virtio-iommu on dev-iotlb support"),
VHOST first tries to register IOMMU_NOTIFIER_DEVIOTLB_UNMAP notifier
and if it fails, falls back to legacy IOMMU_NOTIFIER_UNMAP. So
spapr_iommu must fail on the IOMMU_NOTIFIER_DEVIOTLB_UNMAP
registration.
Reported-by: Peter Xu <peterx@redhat.com>
Fixes: b68ba1ca57 ("memory: Add IOMMU_NOTIFIER_DEVIOTLB_UNMAP IOMMUTLBNotificationType")
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Message-Id: <20210209213233.40985-3-eric.auger@redhat.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Next batch of patches for the ppc target and machine types. Includes:
* Several cleanups for sm501 from Peter Maydell
* An update to the SLOF guest firmware
* Improved handling of hotplug failures in spapr, associated cleanups
to the hotplug handling code
* Several etsec fixes and cleanups from Bin Meng
* Assorted other fixes and cleanups
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEdfRlhq5hpmzETofcbDjKyiDZs5IFAmBIRlUACgkQbDjKyiDZ
s5JGlxAApWKpxdtMwrxvQ7EX95XtDWY0v2Jpl3ZKLhYgWJ28pt1SfsDUlA9KhlDd
syXITpyspECe9kjOAKEim4J0y5sMVlTw8KjzIVPMik4uyoLTOBwE+nRmwPnmnWEy
9ZH0J+QOonQYh3jCp7JbTGU2ZW5pJ9s/sv8bPbzXfrR07HbAJ2+MjUkTVxkSVJAq
QUvo/jMntu+a1HFU8Eiw8VyyIcIOAQyS469xzUiHHzKFlR8XodE56Vj+oh6ZFtaA
cB2h4U51uzGfpz+GISm3lZUHSVnWQSFwLAc4x66aRsnLiQ66iAu8N0jRh8lsoW0y
FHF+uGp3AFUARHOiCRk0r7+s29gbu+lX2jogfddj+qj7mGIZXd2tMfrrG3eWsB2C
HvNby4xzyyDaguHK7N0/C42B8OX5dy2pxOP5lvdzL20ip97AKRGXngyM7LhYH8yw
4uzdebYVFu0KkLri4Qzxjm/GxgzrCbWIe5ImsDIlnmY1cJ7NKQYPzFX56xqq147y
6USFQu7RM9E03vj3c9UIkmK0KhL8GQvYxX4dMWIUjtjeLGJuN5seKBkl5mH2OSEJ
D9svKOanXmsZYS0A25VX9FRX263zbJ1HIkDmGzpLi7HULdRy78e89rJk6490WNDr
mnLogO+ttBvhEaLUsIVrWwLd21JW/A2NHuEz0+KELr9ZOQMYRj8=
=/uyx
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/dg-gitlab/tags/ppc-for-6.0-20210310' into staging
ppc patch queue for 2021-03-10
Next batch of patches for the ppc target and machine types. Includes:
* Several cleanups for sm501 from Peter Maydell
* An update to the SLOF guest firmware
* Improved handling of hotplug failures in spapr, associated cleanups
to the hotplug handling code
* Several etsec fixes and cleanups from Bin Meng
* Assorted other fixes and cleanups
# gpg: Signature made Wed 10 Mar 2021 04:08:53 GMT
# gpg: using RSA key 75F46586AE61A66CC44E87DC6C38CACA20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>" [full]
# gpg: aka "David Gibson (Red Hat) <dgibson@redhat.com>" [full]
# gpg: aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>" [full]
# gpg: aka "David Gibson (kernel.org) <dwg@kernel.org>" [unknown]
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E 87DC 6C38 CACA 20D9 B392
* remotes/dg-gitlab/tags/ppc-for-6.0-20210310:
spapr.c: send QAPI event when memory hotunplug fails
spapr.c: remove duplicated assert in spapr_memory_unplug_request()
target/ppc: fix icount support on Book-e vms accessing SPRs
qemu_timer.c: add timer_deadline_ms() helper
spapr_pci.c: add 'unplug already in progress' message for PCI unplug
spapr.c: add 'unplug already in progress' message for PHB unplug
hw/ppc: e500: Add missing <ranges> in the eTSEC node
hw/net: fsl_etsec: Fix build error when HEX_DUMP is on
spapr_drc.c: use DRC reconfiguration to cleanup DIMM unplug state
spapr_drc.c: add hotunplug timeout for CPUs
spapr_drc.c: introduce unplug_timeout_timer
target/ppc: Fix bcdsub. emulation when result overflows
docs/system: Extend PPC section
spapr: rename spapr_drc_detach() to spapr_drc_unplug_request()
spapr_drc.c: use spapr_drc_release() in isolate_physical/set_unusable
pseries: Update SLOF firmware image
spapr_drc.c: do not call spapr_drc_detach() in drc_isolate_logical()
hw/display/sm501: Inline template header into C file
hw/display/sm501: Expand out macros in template header
hw/display/sm501: Remove dead code for non-32-bit RGB surfaces
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The 'running' argument from VMChangeStateHandler does not require
other value than 0 / 1. Make it a plain boolean.
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <20210111152020.1422021-3-philmd@redhat.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Recent changes allowed the pSeries machine to rollback the hotunplug
process for the DIMM when the guest kernel signals, via a
reconfiguration of the DR connector, that it's not going to release the
LMBs.
Let's also warn QAPI listerners about it. One place to do it would be
right after the unplug state is cleaned up,
spapr_clear_pending_dimm_unplug_state(). This would mean that the
function is now doing more than cleaning up the pending dimm state
though.
This patch does the following changes in spapr.c:
- send a QAPI event to inform that we experienced a failure in the
hotunplug of the DIMM;
- rename spapr_clear_pending_dimm_unplug_state() to
spapr_memory_unplug_rollback(). This is a better fit for what the
function is now doing, and it makes callers care more about what the
function goal is and less about spapr.c internals such as clearing
the pending dimm unplug state.
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210302141019.153729-3-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
We are asserting the existence of the first DRC LMB after sending unplug
requests to all LMBs of the DIMM, where every DRC is being asserted
inside the loop. This means that the first DRC is being asserted twice.
Remove the duplicated assert.
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210302141019.153729-2-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The pSeries machine is using QEMUTimer internals to return the timeout
in seconds for a timer object, in hw/ppc/spapr.c, function
spapr_drc_unplug_timeout_remaining_sec().
Create a helper in qemu-timer.c to retrieve the deadline for a QEMUTimer
object, in ms, to avoid exposing timer internals to the PPC code.
CC: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210301124133.23800-2-danielhb413@gmail.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Hotunplug for all other devices are warning the user when the hotunplug
is already in progress. Do the same for PCI devices in
spapr_pci_unplug_request().
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210226163301.419727-5-danielhb413@gmail.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Both CPU hotunplug and PC_DIMM unplug reports an user warning,
mentioning that the hotunplug is in progress, if consecutive
'device_del' are issued in quick succession.
Do the same for PHBs in spapr_phb_unplug_request().
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210226163301.419727-4-danielhb413@gmail.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The eTSEC node should provide an empty <ranges> property in the
eTSEC node, otherwise of_translate_address() in the Linux kernel
fails to get the eTSEC register base, reporting:
OF: ** translation for device /platform@f00000000/ethernet@0/queue-group **
OF: bus is default (na=1, ns=1) on /platform@f00000000/ethernet@0
OF: translating address: 00000000
OF: parent bus is default (na=1, ns=1) on /platform@f00000000
OF: no ranges; cannot translate
Per devicetree spec v0.3 [1] chapter 2.3.8:
If the property is not present in a bus node, it is assumed that
no mapping exists between children of the node and the parent
address space.
This is why of_translate_address() aborts the address translation.
Apparently U-Boot devicetree parser seems to be tolerant with
missing <ranges> as this was not noticed when testing with U-Boot.
The empty <ranges> property is present in all kernel shipped dtsi
files for eTSEC, Let's add it to conform with the spec.
[1] https://github.com/devicetree-org/devicetree-specification/releases/download/v0.3/devicetree-specification-v0.3.pdf
Fixes: fdfb7f2cdb ("e500: Add support for eTSEC in device tree")
Signed-off-by: Bin Meng <bin.meng@windriver.com>
Message-Id: <1614158919-9473-1-git-send-email-bmeng.cn@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>