We can get the job done in spapr_numa_write_assoc_lookup_arrays() a bit
cleaner:
- 'cur_index = int_buf = g_malloc0(..)' is doing a g_malloc0() in the
'int_buf' pointer and making 'cur_index' point to 'int_buf' all in a
single line. No problem with that, but splitting into 2 lines is clearer
to follow
- use g_autofree in 'int_buf' to avoid a g_free() call later on
- 'buf_len' is only being used to store the size of 'int_buf' malloc.
Remove the var and just use the value in g_malloc0() directly
- remove the 'ret' var and just return the result of fdt_setprop()
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20220228175004.8862-12-danielhb413@gmail.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Use g_autoptr() with GArray* and GString* pointers to avoid calling
g_free() and the need for the 'out' label.
'drc_name' can also be g_autofreed to avoid a g_free() call at the end
of the while() loop.
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20220228175004.8862-7-danielhb413@gmail.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20220228175004.8862-6-danielhb413@gmail.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20220228175004.8862-5-danielhb413@gmail.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
And get rid of the 'out' label since it's now unused.
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20220228175004.8862-4-danielhb413@gmail.com>
[ clg: Fixed typo in commit log ]
Signed-off-by: Cédric Le Goater <clg@kaod.org>
The firmware check consists on a file search (qemu_find_file) and load
it via load_imag_targphys(). This validation is not dependent on any
other machine state but it currently being done at the end of
spapr_machine_init(). This means that we can do a lot of stuff and end
up failing at the end for something that we can verify right out of the
gate.
Move this validation to the start of spapr_machine_init() to fail
earlier. While we're at it, use g_autofree in the 'filename' pointer.
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20220228175004.8862-3-danielhb413@gmail.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20220228175004.8862-2-danielhb413@gmail.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
The trigger message coming from a HW source contains a special bit
informing the XIVE interrupt controller that the PQ bits have been
checked at the source or not. Depending on the value, the IC can
perform the check and the state transition locally using its own PQ
state bits.
The following changes add new accessors to the XiveRouter required to
query and update the PQ state bits. This only applies to the PowerNV
machine. sPAPR accessors are provided but the pSeries machine should
not be concerned by such complex configuration for the moment.
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
POWER10 adds support for StoreEOI operation and 64K ESB pages on PSIHB
to be consistent with the other interrupt sources of the system.
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
and use a pnv_chip_power10_quad_realize() helper to avoid code
duplication with P9. This still needs some refinements on the XSCOM
registers handling in PnvQuad.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Our OCC model is very mininal and POWER10 can simply reuse the OCC
model we introduced for POWER9.
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
The XIVE2 interrupt controller of the POWER10 processor follows the
same logic than on POWER9 but the HW interface has been largely
reviewed. It has a new register interface, different BARs, extra
VSDs, new layout for the XIVE2 structures, and a set of new features
which are described below.
This is a model of the POWER10 XIVE2 interrupt controller for the
PowerNV machine. It focuses primarily on the needs of the skiboot
firmware but some initial hypervisor support is implemented for KVM
use (escalation).
Support for new features will be implemented in time and will require
new support from the OS.
* XIVE2 BARS
The interrupt controller BARs have a different layout outlined below.
Each sub-engine has now own its range and the indirect TIMA access was
replaced with a set of pages, one per CPU, under the IC BAR:
- IC BAR (Interrupt Controller)
. 4 pages, one per sub-engine
. 128 indirect TIMA pages
- TM BAR (Thread Interrupt Management Area)
. 4 pages
- ESB BAR (ESB pages for IPIs)
. up to 1TB
- END BAR (ESB pages for ENDs)
. up to 2TB
- NVC BAR (Notification Virtual Crowd)
. up to 128
- NVPG BAR (Notification Virtual Process and Group)
. up to 1TB
- Direct mapped Thread Context Area (reads & writes)
OPAL does not use the grouping and crowd capability.
* Virtual Structure Tables
XIVE2 adds new tables types and also changes the field layout of the END
and NVP Virtualization Structure Descriptors.
- EAS
- END new layout
- NVT was splitted in :
. NVP (Processor), 32B
. NVG (Group), 32B
. NVC (Crowd == P9 block group) 32B
- IC for remote configuration
- SYNC for cache injection
- ERQ for event input queue
The setup is slighly different on XIVE2 because the indexing has changed
for some of the tables, block ID or the chip topology ID can be used.
* XIVE2 features
SCOM and MMIO registers have a new layout and XIVE2 adds a new global
capability and configuration registers.
The lowlevel hardware offers a set of new features among which :
- a configurable number of priorities : 1 - 8
- StoreEOI with load-after-store ordering is activated by default
- Gen2 TIMA layout
- A P9-compat mode, or Gen1, TIMA toggle bit for SW compatibility
- increase to 24bit for VP number
Other features will have some impact on the Hypervisor and guest OS
when activated, but this is not required for initial support of the
controller.
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Determine the IRQ number in the same way as for pnv_dt_ipmi_bt(). This
resolves one usage of ISADevice::isairq[] which allows it to be removed
eventually.
Signed-off-by: Bernhard Beschow <shentey@gmail.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20220301220037.76555-6-shentey@gmail.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Event RTC_CHANGE is "emitted when the guest changes the RTC time" (and
the RTC supports the event). What if there's more than one RTC?
Which one changed? New @qom-path identifies it.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <87a6ejnm80.fsf@pond.sub.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
This commit effectively reverts commit 183e4281a3, which moved
the RTC_CHANGE event to the target schema. That change was an
attempt to make the event target-specific to improve introspection,
but the event isn't really target-specific: it's machine or device
specific. Putting RTC_CHANGE in the target schema with an ifdef list
reduces maintainability (by adding an if: list with a long list of
targets that needs to be manually updated as architectures are added
or removed or as new devices gain the RTC_CHANGE functionality) and
increases compile time (by preventing RTC devices which emit the
event from being "compile once" rather than "compile once per
target", because qapi-events-misc-target.h uses TARGET_* ifdefs,
which are poisoned in "compile once" files.)
Move RTC_CHANGE back to misc.json.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Acked-by: Greg Kurz <groug@kaod.org>
Message-Id: <20220221192123.749970-2-peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
More than 1k of TypeInfo instances are already marked as const. Mark the
remaining ones, too.
This commit was created with:
git grep -z -l 'static TypeInfo' -- '*.c' | \
xargs -0 sed -i 's/static TypeInfo/static const TypeInfo/'
Signed-off-by: Bernhard Beschow <shentey@gmail.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Acked-by: Corey Minyard <cminyard@mvista.com>
Message-id: 20220117145805.173070-2-shentey@gmail.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
This implements the Nested KVM HV hcall API for spapr under TCG.
The L2 is switched in when the H_ENTER_NESTED hcall is made, and the
L1 is switched back in returned from the hcall when a HV exception
is sent to the vhyp. Register state is copied in and out according to
the nested KVM HV hcall API specification.
The hdecr timer is started when the L2 is switched in, and it provides
the HDEC / 0x980 return to L1.
The MMU re-uses the bare metal radix 2-level page table walker by
using the get_pate method to point the MMU to the nested partition
table entry. MMU faults due to partition scope errors raise HV
exceptions and accordingly are routed back to the L1.
The MMU does not tag translations for the L1 (direct) vs L2 (nested)
guests, so the TLB is flushed on any L1<->L2 transition (hcall entry
and exit).
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
[ clg: checkpatch fixes ]
Message-Id: <20220216102545.1808018-10-npiggin@gmail.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Introduce virtual hypervisor methods that can support a "Nested KVM HV"
implementation using the bare metal 2-level radix MMU, and using HV
exceptions to return from H_ENTER_NESTED (rather than cause interrupts).
HV exceptions can now be raised in the TCG spapr machine when running a
nested KVM HV guest. The main ones are the lev==1 syscall, the hdecr,
hdsi and hisi, hv fu, and hv emu, and h_virt external interrupts.
HV exceptions are intercepted in the exception handler code and instead
of causing interrupts in the guest and switching the machine to HV mode,
they go to the vhyp where it may exit the H_ENTER_NESTED hcall with the
interrupt vector numer as return value as required by the hcall API.
Address translation is provided by the 2-level page table walker that is
implemented for the bare metal radix MMU. The partition scope page table
is pointed to the L1's partition scope by the get_pate vhc method.
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20220216102545.1808018-9-npiggin@gmail.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
In prepartion for implementing a full partition table option for
vhyp, update the get_pate method to take an lpid and return a
success/fail indicator.
The spapr implementation currently just asserts lpid is always 0
and always return success.
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[ clg: checkpatch fixes ]
Message-Id: <20220216102545.1808018-6-npiggin@gmail.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Machines which don't emulate the HDEC facility are able to use the
timer for something else. Provide functions to start and stop the
hdecr timer.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[ clg: checkpatch fixes ]
Message-Id: <20220216102545.1808018-4-npiggin@gmail.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
The spapr virtual hypervisor does not require the hdecr timer.
Remove it.
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Message-Id: <20220216102545.1808018-3-npiggin@gmail.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
If the device backend is not persistent memory for the nvdimm, there is
need for explicit IO flushes on the backend to ensure persistence.
On SPAPR, the issue is addressed by adding a new hcall to request for
an explicit flush from the guest when the backend is not pmem. So, the
approach here is to convey when the hcall flush is required in a device
tree property. The guest once it knows the device backend is not pmem,
makes the hcall whenever flush is required.
To set the device tree property, a new PAPR specific device type inheriting
the nvdimm device is implemented. When the backend doesn't have pmem=on
the device tree property "ibm,hcall-flush-required" is set, and the guest
makes hcall H_SCM_FLUSH requesting for an explicit flush. The new device
has boolean property pmem-override which when "on" advertises the device
tree property even when pmem=on for the backend. The flush function
invokes the fdatasync or pmem_persist() based on the type of backend.
The vmstate structures are made part of the spapr-nvdimm device object.
The patch attempts to keep the migration compatibility between source and
destination while rejecting the incompatibles ones with failures.
Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <164396256092.109112.17933240273840803354.stgit@ltczzess4.aus.stglabs.ibm.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
The patch adds support for the SCM flush hcall for the nvdimm devices.
To be available for exploitation by guest through the next patch. The
hcall is applicable only for new SPAPR specific device class which is
also introduced in this patch.
The hcall expects the semantics such that the flush to return with
H_LONG_BUSY_ORDER_10_MSEC when the operation is expected to take longer
time along with a continue_token. The hcall to be called again by providing
the continue_token to get the status. So, all fresh requests are put into
a 'pending' list and flush worker is submitted to the thread pool. The
thread pool completion callbacks move the requests to 'completed' list,
which are cleaned up after collecting the return status for the guest
in subsequent hcall from the guest.
The semantics makes it necessary to preserve the continue_tokens and
their return status across migrations. So, the completed flush states
are forwarded to the destination and the pending ones are restarted
at the destination in post_load. The necessary nvdimm flush specific
vmstate structures are also introduced in this patch which are to be
saved in the new SPAPR specific nvdimm device to be introduced in the
following patch.
Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <164396254862.109112.16675611182159105748.stgit@ltczzess4.aus.stglabs.ibm.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
The PowerPC 601 processor is the first generation of processors to
implement the PowerPC architecture. It was designed as a bridge
processor and also could execute most of the instructions of the
previous POWER architecture. It was found on the first Macs and IBM
RS/6000 workstations.
There is not much interest in keeping the CPU model of this
POWER-PowerPC bridge processor. We have the 603 and 604 CPU models of
the 60x family which implement the complete PowerPC instruction set.
Cc: "Hervé Poussineau" <hpoussin@reactos.org>
Cc: Laurent Vivier <laurent@vivier.eu>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Message-Id: <20220203142756.1302515-1-clg@kaod.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Hi
This time I have disabled vmstate canary patches form Dave Gilbert.
Let's see if it works.
Later, Juan.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEGJn/jt6/WMzuA0uC9IfvGFhy1yMFAmH0NkEACgkQ9IfvGFhy
1yM4VQ/+MML5ugA9XA5hOFV+Stwv2ENtMR4r4raQsC7UKdKMaCNuoj1BdlXMRaki
E2TpoHYq99rfJX+AA0XihxHh84I1l9fpoiXrcr8pgNmhcj0qkBykY9Elzf95woMM
UMyinL2jhHfHjby29AaE7BDelUZIA0BgyzQ3TMq8rO+l/ZqFYA8U1SEgPlDYj7cn
gkDWFkPJx6IKgcI8M1obHw11azHgS7dmjjl9lXzxJ2/WfXnoZCuU0BtHd6a1rnAS
qcO3gwLfCo+3aTGKRseJie1Cljz6sIP+ke0Xgn5O+e7alWjCOtlVZrWwd2MqQ07K
2bf7uuTC2KQicLLH8DCnoH/BSvHmpyl/FglFrETRk/55KKg0bi+ZltXaTs9bC2uO
jzNbBSRf8UMcX6Bp3ukhPaFQ1vxqP7KxN9bM+7LYP9aX7Lt/NCJciYjw1jCTwcwi
nz0RS4d7cscMhoMEarPCKcaNJR6PJetdZY2VXavWjXv6er3407yTocvuei0Epdyb
WZtbFnpI2tfx1GEr/Bz6Mnk/qn7kwo7BFEUtJoweFE05g5wHa1PojsblrrsqeOuc
llpK8o8c8NFACxeiLa0z0VBkTjdOtao206eLhF+Se3ukubImayRQwZiOCEBBXwB3
+LmVcmwNDfNonSWI04AA2WAy9gAdM3Ko/gBfWsuOPR5oIs65wns=
=F/ek
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/quintela-gitlab/tags/migration-20220128-pull-request' into staging
Migration Pull request (Take 2)
Hi
This time I have disabled vmstate canary patches form Dave Gilbert.
Let's see if it works.
Later, Juan.
# gpg: Signature made Fri 28 Jan 2022 18:30:25 GMT
# gpg: using RSA key 1899FF8EDEBF58CCEE034B82F487EF185872D723
# gpg: Good signature from "Juan Quintela <quintela@redhat.com>" [full]
# gpg: aka "Juan Quintela <quintela@trasno.org>" [full]
# Primary key fingerprint: 1899 FF8E DEBF 58CC EE03 4B82 F487 EF18 5872 D723
* remotes/quintela-gitlab/tags/migration-20220128-pull-request: (36 commits)
migration: Move temp page setup and cleanup into separate functions
migration: Simplify unqueue_page()
migration: Add postcopy_has_request()
migration: Enable UFFD_FEATURE_THREAD_ID even without blocktime feat
migration: No off-by-one for pss->page update in host page size
migration: Tally pre-copy, downtime and post-copy bytes independently
migration: Introduce ram_transferred_add()
migration: Don't return for postcopy_send_discard_bm_ram()
migration: Drop return code for disgard ram process
migration: Do chunk page in postcopy_each_ram_send_discard()
migration: Drop postcopy_chunk_hostpages()
migration: Don't return for postcopy_chunk_hostpages()
migration: Drop dead code of ram_debug_dump_bitmap()
migration/ram: clean up unused comment.
migration: Report the error returned when save_live_iterate fails
migration/migration.c: Remove the MIGRATION_STATUS_ACTIVE when migration finished
migration/migration.c: Avoid COLO boot in postcopy migration
migration/migration.c: Add missed default error handler for migration state
Remove unnecessary minimum_version_id_old fields
multifd: Rename pages_used to normal_pages
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The migration code will not look at a VMStateDescription's
minimum_version_id_old field unless that VMSD has set the
load_state_old field to something non-NULL. (The purpose of
minimum_version_id_old is to specify what migration version is needed
for the code in the function pointed to by load_state_old to be able
to handle it on incoming migration.)
We have exactly one VMSD which still has a load_state_old,
in the PPC CPU; every other VMSD which sets minimum_version_id_old
is doing so unnecessarily. Delete all the unnecessary ones.
Commit created with:
sed -i '/\.minimum_version_id_old/d' $(git grep -l '\.minimum_version_id_old')
with the one legitimate use then hand-edited back in.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
---
It missed vmstate_ppc_cpu.
softmmu/rtc.c defines two public functions: qemu_get_timedate() and
qemu_timedate_diff(). Currently we keep the prototypes for these in
qemu-common.h, but most files don't need them. Move them to their
own header, a new include/sysemu/rtc.h.
Since the C files using these two functions did not need to include
qemu-common.h for any other reason, we can remove those include lines
when we add the include of the new rtc.h.
The license for the .h file follows that of the softmmu/rtc.c
where both the functions are defined.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
vof.h requires "qom/object.h" for DECLARE_CLASS_CHECKERS(),
"exec/memory.h" for address_space_read/write(),
"exec/address-spaces.h" for address_space_memory
and more importantly "cpu.h" for target_ulong.
vof.c doesn't need "exec/ram_addr.h".
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20220122003104.84391-1-f4bug@amsat.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
spapr_get_fw_dev_path() is an impl of
FWPathProviderClass::get_dev_path(). This interface is used by
hw/core/qdev-fw.c via fw_path_provider_try_get_dev_path() in two
functions:
- static char *qdev_get_fw_dev_path_from_handler(), which is used only in
qdev_get_fw_dev_path_helper() and it's guarded by "if (dev &&
dev->parent_bus)";
- char *qdev_get_own_fw_dev_path_from_handler(), which is used in
softmmu/bootdevice.c in get_boot_device_path() like this:
if (dev) {
d = qdev_get_own_fw_dev_path_from_handler(dev->parent_bus, dev);
This means that, when called via softmmu/bootdevice.c, there's no check
of 'dev->parent_bus' being not NULL. The result is that the "BusState
*bus" arg of spapr_get_fw_dev_path() can potentially be NULL and if, at
the same time, "SCSIDevice *d" is not NULL, we'll hit this line:
void *spapr = CAST(void, bus->parent, "spapr-vscsi");
And we'll SIGINT because 'bus' is NULL and we're accessing bus->parent.
Adding a simple 'bus != NULL' check to guard the instances where we
access 'bus->parent' can avoid this altogether.
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20220121213852.30243-1-danielhb413@gmail.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
"PowerPC Processor binding to IEEE 1275" says in
"8.2.1. Initial Register Values" that the initial state is defined as
32bit so do it for both SLOF and VOF.
This should not cause behavioral change as SLOF switches to 64bit very
early anyway. As nothing enforces LE anywhere, this drops it for VOF.
The goal is to make VOF work with TCG as otherwise it barfs with
qemu: fatal: TCG hflags mismatch (current:0x6c000004 rebuilt:0x6c000000)
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20220107072423.2278113-1-aik@ozlabs.ru>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
This patch introduces pnv-phb4 user creatable devices that are created
in a similar manner as pnv-phb3 devices, allowing the user to interact
with the PHBs directly instead of creating PCI Express Controllers that
will create a certain amount of PHBs per controller index.
We accomplish this by doing the following:
- add a pnv_phb4_get_stack() helper to retrieve which stack an user
created phb4 would occupy;
- when dealing with an user created pnv-phb4 (detected by checking if
phb->stack is NULL at the start of phb4_realize()), retrieve its stack
and initialize its properties as done in stk_realize();
- use 'defaults_enabled()' in stk_realize() to avoid creating and
initializing a 'stack->phb' qdev that might be overwritten by an user
created pnv-phb4 device. This process is wrapped into a new helper
called pnv_pec_stk_default_phb_realize().
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20220111131027.599784-5-danielhb413@gmail.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
It is not used elsewhere so that's where it belongs.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20220105212338.49899-10-danielhb413@gmail.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
PHB3s ared SysBus devices and should be allowed to be dynamically
created.
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20220105212338.49899-9-danielhb413@gmail.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
The powernv machine uses the object hierarchy to populate the device
tree and each device should be parented to the chip it belongs to.
This is not the case for user created devices which are parented to
the container "/unattached".
Make sure a PHB3 device is parented to its chip by reparenting the
object if necessary.
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reviewed-by: Frederic Barrat <fbarrat@linux.ibm.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20220105212338.49899-8-danielhb413@gmail.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
PHB3 devices and PCI devices can now be added to the powernv8 machine
using :
-device pnv-phb3,chip-id=0,index=1 \
-device nec-usb-xhci,bus=pci.1,addr=0x0
The 'index' property identifies the PHB3 in the chip. In case of user
created devices, a lookup on 'chip-id' is required to assign the
owning chip.
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20220105212338.49899-7-danielhb413@gmail.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
This cleanups the PHB3 model a bit more since the root port is an
independent device and it will ease our task when adding user created
PHB3s.
pnv_phb_attach_root_port() is made public in pnv.c so it can be reused
with the pnv_phb4 root port later.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20220105212338.49899-4-danielhb413@gmail.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
POWER5+ (ISA v2.03) processors are supported by the pseries machine
but they do not have Altivec instructions. Do not advertise support
for it in the DT.
To be noted that this test is in contradiction with the assert in
cap_vsx_apply().
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Tested-by: Fabiano Rosas <farosas@linux.ibm.com>
Message-Id: <20220105095142.3990430-3-clg@kaod.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Add 7.0 machine types for arm/i440fx/q35/s390x/spapr.
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20211217143948.289995-1-cohuck@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Timers are already initialized in ppc4xx_init(). No need to do it a
second time with a wrong set.
Fixes: d715ea9612 ("PPC: 405: Fix ppc405ep initialization")
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20211222064025.1541490-7-clg@kaod.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20220103063441.3424853-8-clg@kaod.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
This is a small cleanup to ease reading. It includes the removal of a
check done on the returned value of g_malloc0(), which can not fail.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20211222064025.1541490-6-clg@kaod.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20220103063441.3424853-7-clg@kaod.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
The 405 timers were broken when booke support was added. Assumption
was made that the register numbers were the same but it's not :
SPR_BOOKE_TSR (0x150)
SPR_BOOKE_TCR (0x154)
SPR_40x_TSR (0x3D8)
SPR_40x_TCR (0x3DA)
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Fixes: ddd1055b07 ("PPC: booke timers")
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20211222064025.1541490-5-clg@kaod.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20220103063441.3424853-6-clg@kaod.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Use a QEMU log primitive for errors and trace events for debug.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.drobear.id.au>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20211222064025.1541490-3-clg@kaod.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20220103063441.3424853-4-clg@kaod.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
The POWER8 processors with a NVLink logic unit have 4 PHB3 devices per
chip.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20211222063817.1541058-2-clg@kaod.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
When -nodefaults is supported for PHB4 devices, the pecs array under
the chip will be empty. This will break the 'info pic' HMP command.
Do a QOM loop on the chip children and look for PEC PHB4 devices
instead.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20211213132830.108372-15-clg@kaod.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
This change will help us providing support for user created PHB4
devices.
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20211213132830.108372-14-clg@kaod.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
This is not useful and will be in the way for support of user created
PHB4 devices.
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reviewed-by: Frederic Barrat <fbarrat@linux.ibm.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20211213132830.108372-13-clg@kaod.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Use the num_stacks class attribute to compute the PHB index depending
on the PEC index :
* PEC0 provides 1 PHB (PHB0)
* PEC1 provides 2 PHBs (PHB1 and PHB2)
* PEC2 provides 3 PHBs (PHB3, PHB4 and PHB5)
The routine pnv_pec_phb_offset() is a bit complex but it also prepares
ground for PHB5 which has a different layout of stacks: 3 per PECs.
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20211213132830.108372-12-clg@kaod.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Each PEC device of the POWER9 chip has a predefined number of stacks,
equivalent of a root port complex:
PEC0 -> 1 stack
PEC1 -> 2 stacks
PEC2 -> 3 stacks
Introduce a class attribute to hold these values and remove the
"num-stacks" property.
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reviewed-by: Frederic Barrat <fbarrat@linux.ibm.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20211213132830.108372-11-clg@kaod.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
And check the PEC index using the chip class.
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reviewed-by: Frederic Barrat <fbarrat@linux.ibm.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20211213132830.108372-10-clg@kaod.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
It prepares ground for PHB5 which has different values.
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reviewed-by: Frederic Barrat <fbarrat@linux.ibm.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20211213132830.108372-9-clg@kaod.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
POWER9 processor comes with 3 PHB4 PEC (PCI Express Controller) and
each PEC can have several PHBs :
* PEC0 provides 1 PHB (PHB0)
* PEC1 provides 2 PHBs (PHB1 and PHB2)
* PEC2 provides 3 PHBs (PHB3, PHB4 and PHB5)
A num_pecs class attribute represents better the logic units of the
POWER9 chip. Use that instead of num_phbs which fits POWER8 chips.
This will ease adding support for user created devices.
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20211213132830.108372-8-clg@kaod.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
When -nodefaults is supported for PHB3 devices, the phbs array under
the chip will be empty. This will break the XICSFabric handlers, and
all interrupt delivery, and the 'info pic' HMP command.
Do a QOM loop on the chip children and look for PHB3 devices instead.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20211213132830.108372-7-clg@kaod.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
This change will help us providing support for user created PHB3
devices.
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reviewed-by: Frederic Barrat <fbarrat@linux.ibm.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20211213132830.108372-6-clg@kaod.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
It is never used.
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reviewed-by: Frederic Barrat <fbarrat@linux.ibm.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20211213132830.108372-5-clg@kaod.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
This change will help us move the mapping of XSCOM regions under the
PHB3 realize routine, which will be necessary for user created PHB3
devices.
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reviewed-by: Frederic Barrat <fbarrat@linux.ibm.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20211213132830.108372-3-clg@kaod.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
This patch starts an IBM Power8+ compatible PMU implementation by adding
the representation of PMU events that we are going to sample,
PMUEventType. This enum represents a Perf event that is being sampled by
a specific counter 'sprn'. Events that aren't available (i.e. no event
was set in MMCR1) will be of type 'PMU_EVENT_INVALID'. Events that are
inactive due to frozen counter bits state are of type
'PMU_EVENT_INACTIVE'. Other types added in this patch are
PMU_EVENT_CYCLES and PMU_EVENT_INSTRUCTIONS. More types will be added
later on.
Let's also add the required PMU cycle overflow timers. They will be used
to trigger cycle overflows when cycle events are being sampled. This
timer will call cpu_ppc_pmu_timer_cb(), which in turn calls
fire_PMC_interrupt(). Both functions are stubs that will be implemented
later on when EBB support is added.
Two new helper files are created to host this new logic.
cpu_ppc_pmu_init() will init all overflow timers during CPU init time.
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20211201151734.654994-2-danielhb413@gmail.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Adapt the fields offset in the board information for Linux. Since
Linux relies on the CPU frequency value, I wonder how it ever worked.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20211206103712.1866296-15-clg@kaod.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
The board information for the 405EP first appeared in commit 04f20795ac
("Move PowerPC 405 specific definitions into a separate file ...")
An Ethernet address is a 6 byte number. Fix that.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20211206103712.1866296-14-clg@kaod.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
These values are computed and updated by U-Boot at startup. Use them
as defaults to improve direct Linux boot.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20211206103712.1866296-13-clg@kaod.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
The machine can already boot with kernel and initrd U-boot images if a
firmware is loaded first. Adapt and improve the load sequence to let
the machine boot directly from a Linux kernel ELF image and a usual
initrd image if a firmware image is not provided. For that, install a
custom CPU reset handler to setup the registers and to start the CPU
from the Linux kernel entry point.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20211206103712.1866296-12-clg@kaod.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
This routine is a small helper to cleanup the code. The update of the
flash fields were removed because there are not of any use when booting
from a Linux kernel image. It should be functionally equivalent.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20211206103712.1866296-11-clg@kaod.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
QEMU installs a custom U-Boot in-memory descriptor to share board
information with Linux, which means that the QEMU machine was
initially designed to support booting Linux directly without using the
loaded FW. But, it's not that simple because the CPU still starts at
address 0xfffffffc where nothing is currently mapped. Support must
have been broken these last years.
Since we can not find a "ppc405_rom.bin" firmware file, request one to
be specified on the command line. A consequence of this change is that
the machine can be booted directly from Linux without any FW being
loaded. This is still broken and the CPU start address will be fixed
in the next changes.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20211206103712.1866296-10-clg@kaod.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
It is currently impossible to find a "ppc405_rom.bin" firmware file or
a full flash image for the PPC405EP evalution board. Even if it should
be technically possible to recreate such an image, it's unlikely that
anyone will do it since the board is obsolete and support in QEMU has
been broken for about 10 years.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20211206103712.1866296-9-clg@kaod.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
I will be useful to rework the boot from Linux.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20211206103712.1866296-7-clg@kaod.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
It was introduced in commit b8d3f5d126 ("Add flags to support
PowerPC 405 bootinfos variations.") but since its value has always
been set to '1'.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20211206103712.1866296-6-clg@kaod.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
and one error message to a LOG_GUEST_ERROR.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20211206103712.1866296-5-clg@kaod.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
The PPC 405 CPU is a system-on-a-chip, so all 405 machines are very similar,
except for some external periphery. However, the periphery of the 'taihu'
machine is hardly emulated at all (e.g. neither the LCD nor the USB part had
been implemented), so there is not much value added by this board. The users
can use the 'ref405ep' machine to test their PPC405 code instead.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20211203164904.290954-2-thuth@redhat.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20211206103712.1866296-3-clg@kaod.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
The default addresses to load the kernel, fdt, initrd of AMCC boards
in U-Boot v2015.10 are :
"kernel_addr_r=1000000\0"
"fdt_addr_r=1800000\0"
"ramdisk_addr_r=1900000\0"
The taihu is one of these boards, the ref405ep is not but we don't
have much information on it and both boards have a very similar
address space layout.
Also, if loaded at address 0, U-Boot will partially overwrite the
uImage because of a bug in get_ram_size() (U-Boot v2015.10) not
restoring properly the probed RAM contents and because the exception
vectors are installed in the same range. Finally, a gzipped kernel
image will be uncompressed at 0x0. These are all good reasons for not
mappping a kernel image at this address.
Change the kernel load address to match U-Boot expectations and fix
loading.
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20211202191446.1292125-1-clg@kaod.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20211206103712.1866296-2-clg@kaod.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Setting -uuid in the pnv machine does not work:
./qemu-system-ppc64 -machine powernv8,accel=tcg -uuid 7ff61ca1-a4a0-4bc1-944c-abd114a35e80
qemu-system-ppc64: error creating device tree: (fdt_property_string(fdt, "system-id", buf)): FDT_ERR_BADSTATE
This happens because we're using fdt_property_string(), which is a
sequential write function that is supposed to be used when we're
building a new FDT, in a case where read/writing into an existing FDT.
Fix it by using fdt_setprop_string() instead.
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20211207094858.744386-1-danielhb413@gmail.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
If one tries to use -machine powernv9,accel=kvm in a Power9 host, a
cryptic error will be shown:
qemu-system-ppc64: Register sync failed... If you're using kvm-hv.ko, only "-cpu host" is possible
qemu-system-ppc64: kvm_init_vcpu: kvm_arch_init_vcpu failed (0): Invalid argument
Appending '-cpu host' will throw another error:
qemu-system-ppc64: invalid chip model 'host' for powernv9 machine
The root cause is that in IBM PowerPC we have different specs for the bare-metal
and the guests. The bare-metal follows OPAL, the guests follow PAPR. The kernel
KVM modules presented in the ppc kernels implements PAPR. This means that we
can't use KVM accel when using the powernv machine, which is the emulation of
the bare-metal host.
All that said, let's give a more informative error in this case.
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <20211130133153.444601-2-danielhb413@gmail.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
The mac.h header defines a MAX_CPUS macro. This is confusingly named,
because it suggests it's a generic setting, but in fact it's used
by only the g3beige and mac99 machines. It's also using a single
macro for two values which aren't inherently the same -- if one
of these two machines was updated to support SMP configurations
then it would want a different max_cpus value to the other.
Since the macro is used in only two places, just expand it out
and get rid of it. If hypothetical future work to support SMP
in these boards needs a compile-time-known limit on the number
of CPUs, we can give it a suitable name at that point.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20211105184216.120972-1-peter.maydell@linaro.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
When updating the R bit of a PTE, the Hash64 MMU was using a wrong byte
offset, causing the first byte of the adjacent PTE to be corrupted.
This caused a panic when booting FreeBSD, using the Hash MMU.
Fixes: a2dd4e83e7 ("ppc/hash64: Rework R and C bit updates")
Signed-off-by: Leandro Lupori <leandro.lupori@eldorado.org.br>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Commit 71e6fae3a9 fixed an issue with FORM2 affinity guests with NUMA
nodes in which the distance info is absent in
machine_state->numa_state->nodes. This happens when QEMU adds a default
NUMA node and when the user adds NUMA nodes without specifying the
distances.
During the discussions of the forementioned patch [1] it was found that
FORM1 guests were behaving in a strange way in the same scenario, with
the kernel seeing the distances between the nodes as '160', as we can
see in this example with 4 NUMA nodes without distance information:
$ numactl -H
available: 4 nodes (0-3)
(...)
node distances:
node 0 1 2 3
0: 10 160 160 160
1: 160 10 160 160
2: 160 160 10 160
3: 160 160 160 10
Turns out that we have the same problem with FORM1 guests - we are
calculating associativity domain using zeroed values. And as it also
turns out, the solution from 71e6fae3a9 applies to FORM1 as well.
This patch creates a wrapper called 'get_numa_distance' that contains
the logic used in FORM2 to define node distances when this information
is absent. This helper is then used in all places where we need to read
distance information from machine_state->numa_state->nodes. That way
we'll guarantee that the NUMA node distance is always being curated
before being used.
After this patch, the FORM1 guest mentioned above will have the
following topology:
$ numactl -H
available: 4 nodes (0-3)
(...)
node distances:
node 0 1 2 3
0: 10 20 20 20
1: 20 10 20 20
2: 20 20 10 20
3: 20 20 20 10
This is compatible with what FORM2 guests and other archs do in this
case.
[1] https://lists.gnu.org/archive/html/qemu-devel/2021-11/msg01960.html
Fixes: 690fbe4295 ("spapr_numa: consider user input when defining associativity")
CC: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
CC: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
A configuration that specifies multiple nodes without distance info
results in the non-local points in the FORM2 matrix having a distance of
0. This causes Linux to complain "Invalid distance value range" because
a node distance is smaller than the local distance.
Fix this by building a simple local / remote fallback for points where
distance information is missing.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Message-Id: <20211105135137.1584840-1-npiggin@gmail.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Test is wrong and the backend can never updated. It could have led to
a QEMU crash but since the firmware deactivates flash access if a valid
layout is not detected, it went unnoticed.
Reported-by: Coverity CID 1465223
Fixes: 35dde57662 ("ppc/pnv: Add a PNOR model")
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20211102162905.762078-1-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
They're actually more commonly used than the helper without _under_bus, because
most callers do have the pci bus on hand. After exporting we can switch a lot
of the call sites to use these two helpers.
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20211028043129.38871-3-peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Recent Linux kernels are accessing the PCI device in slot 0 that
represents the PCI host bridge. This causes ppc4xx_pci_map_irq()
to return -1 which causes an assert() later:
hw/pci/pci.c:262: pci_bus_change_irq_level: Assertion `irq_num >= 0' failed.
Thus we should allocate an IRQ line for the device in slot 0, too.
To avoid changes to the outside of ppc4xx_pci.c, we map it to
the internal IRQ number 4 which will then happily be ignored since
ppc440_bamboo.c does not wire it up.
With these changes it is now possible again to use recent Linux
kernels for the bamboo board.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20211019091817.469003-1-thuth@redhat.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Tested-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This only helps Linux guests as only that seems to use it.
Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Message-Id: <1c1e030f2bbc86e950b3310fb5922facdc21ef86.1634241019.git.balaton@eik.bme.hu>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Define a constant for PCI config addresses to make it clearer what
these numbers are.
Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Message-Id: <9bd8e84d02d91693b71082a1fadeb86e6bce3025.1634241019.git.balaton@eik.bme.hu>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Instead of relying on the mapped address of the MV64361 registers
access them via their memory region. This is not a problem at reset
time when these registers are mapped at the default address but the
guest could change this later and then the RTAS calls accessing PCI
config registers could fail. None of the guests actually do this so
this only avoids a theoretical problem not seen in practice.
Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Message-Id: <b6f768023603dc2c4d130720bcecdbea459b7668.1634241019.git.balaton@eik.bme.hu>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This is needed for Linux to access RTC time.
Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Message-Id: <6233eb07c680d6c74427e11b9641958f98d53378.1634241019.git.balaton@eik.bme.hu>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Issue a warning when using VOF (which is the default) but no -kernel
option given to let users know that it will likely fail as the guest
has nothing to run. It is not a hard error because it may still be
useful to start the machine without further options for testing or
inspecting it from monitor without actually booting it.
Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Message-Id: <a4ec9a900df772b91e9f69ca7a0799d8ae293e5a.1634241019.git.balaton@eik.bme.hu>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The CHRP spec this board confirms to only allows 2 GiB of system
memory below 4 GiB as the high 2 GiB is allocated to IO and system
resources. To avoid problems with memory overlapping these areas
restrict RAM to 2 GiB similar to mac_newworld.
Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Message-Id: <54f58229a69c9c1cca21bcecad700b3d7052edd5.1634241019.git.balaton@eik.bme.hu>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
When using u-boot as firmware with the taihu board, QEMU aborts with
this assertion:
ERROR:../accel/tcg/tcg-accel-ops.c:79:tcg_handle_interrupt: assertion failed:
(qemu_mutex_iothread_locked())
Running QEMU with "-d in_asm" shows that the crash happens when writing
to SPR 0x3f2, so we are missing to lock the iothread in the code path
here.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20211006071140.565952-1-thuth@redhat.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Tested-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Commit 962104f044 ("hw/ppc: moved hcalls that depend on softmmu")
introduced a lot of unnecessary #include directives. Remove them.
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20211006170801.178023-1-philmd@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Commit 4d9b8ef9b5 ("target/ppc: Fix 64-bit decrementer") introduced
new int64t variables and broke the test triggering the decrementer
exception. Revert partially the change to evaluate both clause of the
if statement.
Reported-by: Coverity CID 1464061
Fixes: 4d9b8ef9b5 ("target/ppc: Fix 64-bit decrementer")
Suggested-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20211005053324.441132-1-clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
DeviceState.id is a pointer to a string that is stored in the QemuOpts
object DeviceState.opts and freed together with it. We want to create
devices without going through QemuOpts in the future, so make this a
separately allocated string.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20211008133442.141332-9-kwolf@redhat.com>
Reviewed-by: Damien Hedde <damien.hedde@greensocs.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Peter Krempa <pkrempa@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Now we have a common structure SMPCompatProps used to store information
about SMP compatibility stuff, so we can also move smp_prefer_sockets
there for cleaner code.
No functional change intended.
Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20210929025816.21076-15-wangyanan55@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
In the real SMP hardware topology world, it's much more likely that
we have high cores-per-socket counts and few sockets totally. While
the current preference of sockets over cores in smp parsing results
in a virtual cpu topology with low cores-per-sockets counts and a
large number of sockets, which is just contrary to the real world.
Given that it is better to make the virtual cpu topology be more
reflective of the real world and also for the sake of compatibility,
we start to prefer cores over sockets over threads in smp parsing
since machine type 6.2 for different arches.
In this patch, a boolean "smp_prefer_sockets" is added, and we only
enable the old preference on older machines and enable the new one
since type 6.2 for all arches by using the machine compat mechanism.
Suggested-by: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Acked-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Reviewed-by: Pankaj Gupta <pankaj.gupta@ionos.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20210929025816.21076-10-wangyanan55@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Rename the "allocate and return" qbus creation function to
qbus_new(), to bring it into line with our _init vs _new convention.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Corey Minyard <cminyard@mvista.com>
Message-id: 20210923121153.23754-6-peter.maydell@linaro.org
This patch has a handful of modifications for the recent added
FORM2 support:
- to not allocate more than the necessary size in 'distance_table'.
At this moment the array is oversized due to allocating uint32_t for
all elements, when most of them fits in an uint8_t. Fix it by
changing the array to uint8_t and allocating the exact size;
- use stl_be_p() to store the uint32_t at the start of 'distance_table';
- use sizeof(uint32_t) to skip the uint32_t length when populating the
distances;
- use the NUMA_DISTANCE_MIN macro from sysemu/numa.h to avoid hardcoding
the local distance value.
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210922122852.130054-2-danielhb413@gmail.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The current way the mask is built can overflow with a 64-bit decrementer.
Use sextract64() to extract the signed values and remove the logic to
handle negative values which has become useless.
Cc: Luis Fernando Fujita Pires <luis.pires@eldorado.org.br>
Fixes: a8dafa5251 ("target/ppc: Implement large decrementer support for TCG")
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20210920061203.989563-5-clg@kaod.org>
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
numa_complete_configuration() in hw/core/numa.c always adds a NUMA node
for the pSeries machine if none was specified, but without node distance
information for the single node created.
NUMA FORM1 affinity code didn't rely on numa_state information to do its
job, but FORM2 does. As is now, this is the result of a pSeries guest
with NUMA FORM2 affinity when no NUMA nodes is specified:
$ numactl -H
available: 1 nodes (0)
node 0 cpus: 0
node 0 size: 16222 MB
node 0 free: 15681 MB
No distance information available.
This can be amended in spapr_numa_FORM2_write_rtas_tables(). We're
enforcing that the local distance (the distance to the node to itself) is
always 10. This allows for the proper creation of the NUMA distance tables,
fixing the output of 'numactl -H' in the guest:
$ numactl -H
available: 1 nodes (0)
node 0 cpus: 0
node 0 size: 16222 MB
node 0 free: 15685 MB
node distances:
node 0
0: 10
CC: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210920174947.556324-8-danielhb413@gmail.com>
Acked-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The main feature of FORM2 affinity support is the separation of NUMA
distances from ibm,associativity information. This allows for a more
flexible and straightforward NUMA distance assignment without relying on
complex associations between several levels of NUMA via
ibm,associativity matches. Another feature is its extensibility. This base
support contains the facilities for NUMA distance assignment, but in the
future more facilities will be added for latency, performance, bandwidth
and so on.
This patch implements the base FORM2 affinity support as follows:
- the use of FORM2 associativity is indicated by using bit 2 of byte 5
of ibm,architecture-vec-5. A FORM2 aware guest can choose to use FORM1
or FORM2 affinity. Setting both forms will default to FORM2. We're not
advertising FORM2 for pseries-6.1 and older machine versions to prevent
guest visible changes in those;
- ibm,associativity-reference-points has a new semantic. Instead of
being used to calculate distances via NUMA levels, it's now used to
indicate the primary domain index in the ibm,associativity domain of
each resource. In our case it's set to {0x4}, matching the position
where we already place logical_domain_id;
- two new RTAS DT artifacts are introduced: ibm,numa-lookup-index-table
and ibm,numa-distance-table. The index table is used to list all the
NUMA logical domains of the platform, in ascending order, and allows for
spartial NUMA configurations (although QEMU ATM doesn't support that).
ibm,numa-distance-table is an array that contains all the distances from
the first NUMA node to all other nodes, then the second NUMA node
distances to all other nodes and so on;
- get_max_dist_ref_points(), get_numa_assoc_size() and get_associativity()
now checks for OV5_FORM2_AFFINITY and returns FORM2 values if the guest
selected FORM2 affinity during CAS.
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210920174947.556324-7-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
FORM2 NUMA affinity is prepared to deal with empty (memory/cpu less)
NUMA nodes. This is used by the DAX KMEM driver to locate a PAPR SCM
device that has a different latency than the original NUMA node from the
regular memory. FORM2 is also able to deal with asymmetric NUMA
distances gracefully, something that our FORM1 implementation doesn't
do.
Move these FORM1 verifications to a new function and wait until after
CAS, when we're sure that we're sticking with FORM1, to enforce them.
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210920174947.556324-6-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Introducing a new NUMA affinity, FORM2, requires a new mechanism to
switch between affinity modes after CAS. Also, we want FORM2 data
structures and functions to be completely separated from the existing
FORM1 code, allowing us to avoid adding new code that inherits the
existing complexity of FORM1.
The idea of switching values used by the write_dt() functions in
spapr_numa.c was already introduced in the previous patch, and
the same approach will be used when dealing with the FORM1 and FORM2
arrays.
We can accomplish that by that by renaming the existing numa_assoc_array
to FORM1_assoc_array, which now is used exclusively to handle FORM1 affinity
data. A new helper get_associativity() is then introduced to be used by the
write_dt() functions to retrieve the current ibm,associativity array of
a given node, after considering affinity selection that might have been
done during CAS. All code that was using numa_assoc_array now needs to
retrieve the array by calling this function.
This will allow for an easier plug of FORM2 data later on.
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210920174947.556324-5-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The next preliminary step to introduce NUMA FORM2 affinity is to make
the existing code independent of FORM1 macros and values, i.e.
MAX_DISTANCE_REF_POINTS, NUMA_ASSOC_SIZE and VCPU_ASSOC_SIZE. This patch
accomplishes that by doing the following:
- move the NUMA related macros from spapr.h to spapr_numa.c where they
are used. spapr.h gets instead a 'NUMA_NODES_MAX_NUM' macro that is used
to refer to the maximum number of NUMA nodes, including GPU nodes, that
the machine can support;
- MAX_DISTANCE_REF_POINTS and NUMA_ASSOC_SIZE are renamed to
FORM1_DIST_REF_POINTS and FORM1_NUMA_ASSOC_SIZE. These FORM1 specific
macros are used in FORM1 init functions;
- code that uses MAX_DISTANCE_REF_POINTS now retrieves the
max_dist_ref_points value using get_max_dist_ref_points().
NUMA_ASSOC_SIZE is replaced by get_numa_assoc_size() and VCPU_ASSOC_SIZE
is replaced by get_vcpu_assoc_size(). These functions are used by the
generic device tree functions and h_home_node_associativity() and will
allow them to switch between FORM1 and FORM2 without changing their core
logic.
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210920174947.556324-4-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
When first introduced, 'legacy_numa' was a way to refer to guests that
either wouldn't be affected by associativity domain calculations, namely
the ones with only 1 NUMA node, and pre 5.2 guests that shouldn't be
affected by it because it would be an userspace change. Calling these
cases 'legacy_numa' was a convenient way to label these cases.
We're about to introduce a new NUMA affinity, FORM2, and this concept
of 'legacy_numa' is now a bit misleading because, although it is called
'legacy' it is in fact a FORM1 exclusive contraint.
This patch removes spapr_machine_using_legacy_numa() and open code the
conditions in each caller. While we're at it, move the chunk inside
spapr_numa_FORM1_affinity_init() that sets all numa_assoc_array domains
with 'node_id' to spapr_numa_define_FORM1_domains(). This chunk was
being executed if !pre_5_2_numa_associativity and num_nodes => 1, the
same conditions in which spapr_numa_define_FORM1_domains() is called
shortly after.
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210920174947.556324-3-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The upcoming FORM2 NUMA affinity will support asymmetric NUMA topologies
and doesn't need be concerned with all the legacy support for older
pseries FORM1 guests.
We're also not going to calculate associativity domains based on numa
distance (via spapr_numa_define_associativity_domains) since the
distances will be written directly into new DT properties.
Let's split FORM1 code into its own functions to allow for easier
insertion of FORM2 logic later on.
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210920174947.556324-2-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
If an unknown pin of the IRQ controller is raised, something is very
wrong in the QEMU model. It is better to abort.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20210920061203.989563-3-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Linux Kernel 5.12 is now unisolating CPU DRCs in the device_removal
error path, signalling that the hotunplug process wasn't successful.
This allow us to send a DEVICE_UNPLUG_GUEST_ERROR in drc_unisolate_logical()
to signal this error to the management layer.
We also have another error path in spapr_memory_unplug_rollback() for
configured LMB DRCs. Kernels older than 5.13 will not unisolate the LMBs
in the hotunplug error path, but it will reconfigure them. Let's send
the DEVICE_UNPLUG_GUEST_ERROR event in that code path as well to cover the
case of older kernels.
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210907004755.424931-7-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The error_report() call in drc_unisolate_logical() is not considering
that drc->dev->id can be NULL, and the underlying functions error_report()
calls to do its job (vprintf(), g_strdup_printf() ...) has undefined
behavior when trying to handle "%s" with NULL arguments.
Besides, there is no utility into reporting that an unknown device was
rejected by the guest.
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210907004755.424931-4-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
As done in hw/acpi/memory_hotplug.c, pass an empty string if dev->id
is NULL to qapi_event_send_mem_unplug_error() to avoid relying on
a behavior that can be changed in the future.
Suggested-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210907004755.424931-3-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20210902130928.528803-3-clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This to avoid possible conflicts with the "id" property of QOM objects.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20210901094153.227671-9-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
On P10, the chip id is calculated from the "Primary topology table
index". See skiboot commits for more information [1].
This information is extracted from the hdata on real systems which
QEMU needs to emulate. Add this property for all machines even if it
is only used on POWER10.
[1] https://github.com/open-power/skiboot/commit/2ce3f083f399https://github.com/open-power/skiboot/commit/a2d4d7f9e14a
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20210901094153.227671-4-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20210901094153.227671-3-clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Add 6.2 machine types for arm/i440fx/q35/s390x/spapr.
Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Pankaj Gupta <pankaj.gupta@ionos.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
First ppc pull request for qemu-6.2. As usual, there's a fair bit
here, since it's been queued during the 6.1 freeze. Highlights are:
* Some fixes for 128 bit arithmetic and some vector opcodes that use
them
* Significant improvements to the powernv to support POWER10 cpus
(more to come though)
* Several cleanups to the ppc softmmu code
* A few other assorted fixes
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEdfRlhq5hpmzETofcbDjKyiDZs5IFAmEoj5gACgkQbDjKyiDZ
s5JFPw/+JOmi1G6eY3u/kYJ8TJhe65s6TJDQhGQiQSBoBShRBJ1+bro3fPGA8pkT
48NAb9RnTnLqys+vhScF7qt2wIxXJFVoVyMhAj2Xv11VQzDPpbLGg6+2Qt7WFraQ
zyeEKBQQTV29RtV7UBUEmx4ZGmnoc0cmzl3QGO3Jq17ucOHNTSW19QpxU60wClU1
PZIUDoWdt7FBS8lvj/55736H3z6ZRnBqZtW9m64ln+CBQuuKo5UkAkaooaJhEFJx
OUZYeo+zky8YaYSWwTFGIxBYhwptnAWCsqkzeJUxPw1ICAzwj/kQX7ckVhbgTpbE
CADpgkATXTbQzLFipzxJ45UMP0yMsk5IOPZ6FS9G+JfsP2T92RMwy7XhqPfWCoov
WKqX/xpmGTnJONuQ7SO/bWUyPH4K7hYgSPPlLAcwDYCg4szWRIbTCs9Yr9rzAPhk
KqKUGLb7D7Rbi1ulSC2ieqsTqVmp6plfnjxR2gPcbp0FltqGln6tVZEHEyPjTEv0
5b7w+3AHDwh9a4NyzULaxxBKktNU1KXKe74/U86qhJtx4kXFSkAhoeztcR30zmUX
W1xjb5eoRgFbHnoDTCtDYAUwuz2w1/I2OLA5kfnSQnRQS0YiqUeicbBkW6iIE61z
oM86ZwEQX1lyf7agECRgpfdcPa6uyAQ72QUR5wgvXDW59PSNNxk=
=C5XY
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/dg-gitlab/tags/ppc-for-6.2-20210827' into staging
ppc patch queue 2021-08-27
First ppc pull request for qemu-6.2. As usual, there's a fair bit
here, since it's been queued during the 6.1 freeze. Highlights are:
* Some fixes for 128 bit arithmetic and some vector opcodes that use
them
* Significant improvements to the powernv to support POWER10 cpus
(more to come though)
* Several cleanups to the ppc softmmu code
* A few other assorted fixes
# gpg: Signature made Fri 27 Aug 2021 08:09:12 BST
# gpg: using RSA key 75F46586AE61A66CC44E87DC6C38CACA20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>" [full]
# gpg: aka "David Gibson (Red Hat) <dgibson@redhat.com>" [full]
# gpg: aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>" [full]
# gpg: aka "David Gibson (kernel.org) <dwg@kernel.org>" [unknown]
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E 87DC 6C38 CACA 20D9 B392
* remotes/dg-gitlab/tags/ppc-for-6.2-20210827:
target/ppc: fix vector registers access in gdbstub for little-endian
include/qemu/int128.h: introduce bswap128s
target/ppc: fix vextu[bhw][lr]x helpers
include/qemu/int128.h: define struct Int128 according to the host endianness
ppc/xive: Export xive_presenter_notify()
ppc/xive: Export PQ get/set routines
ppc/pnv: add a chip topology index for POWER10
ppc/pnv: Distribute RAM among the chips
ppc/pnv: Use a simple incrementing index for the chip-id
ppc/pnv: powerpc_excp: Do not discard HDECR exception when entering power-saving mode
ppc/pnv: Change the POWER10 machine to support DD2 only
ppc: Add a POWER10 DD2 CPU
ppc/pnv: update skiboot to commit 820d43c0a775.
target/ppc: moved store_40x_sler to helper_regs.c
target/ppc: moved ppc_store_sdr1 to mmu_common.c
target/ppc: divided mmu_helper.c in 2 files
spapr_pci: Fix leak in spapr_phb_vfio_get_loc_code() with g_autofree
xive: Remove extra '0x' prefix in trace events
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
But always give the first 1GB to chip 0 as skiboot requires it.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20210809134547.689560-6-clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
When the QEMU PowerNV machine was introduced, multi chip support
modeled a two socket system with dual chip modules as found on some P8
Tuleta systems (8286-42A). But this is hardly used and not relevant
for QEMU. Use a simple index instead.
With this change, we can now increase the max socket number to 16 as
found on high end systems.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20210809134547.689560-5-clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
There is no need to keep the DD1 chip model as it will never be
publicly available.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20210809134547.689560-3-clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This uses g_autofree to simplify logic in spapr_phb_vfio_get_loc_code(),
in the process fixing a leak in one of the paths. I'm told this fixes
Coverity error CID 1460454
Reported-by: Peter Maydell <peter.maydell@linaro.org>
Fixes: 16b0ea1d85 ("spapr_pci: populate ibm,loc-code")
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
arch_init.h only defines the QEMU_ARCH_* enumeration and the
arch_type global. Don't include it in files that don't use those.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-id: 20210730105947.28215-8-peter.maydell@linaro.org
spapr_mce_req_event() makes an effort to prevent migration from
degrading the reporting of FWNMIs. It adds a migration blocker when
it receives one, and deletes it when it's done handling it. This is a
best effort.
Commit 2500fb423a "migration: Include migration support for machine
check handling" tried to explain this in a comment. Rewrite the
comment for clarity, and reposition it to make it clear it applies to
all failure modes, not just "migration already in progress".
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: Aravinda Prasad <arawinda.p@gmail.com>
Cc: Ganesh Goudar <ganeshgr@linux.ibm.com>
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20210720125408.387910-4-armbru@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Coverity reported issues which are caused by mixing of signed return codes
from DTC and unsigned return codes of the client interface.
This introduces PROM_ERROR and makes distinction between the error types.
This fixes NEGATIVE_RETURNS, OVERRUN issues reported by Coverity.
This adds a comment about the return parameters number in the VOF hcall.
The reason for such counting is to keep the numbers look the same in
vof_client_handle() and the Linux (an OF client).
vmc->client_architecture_support() returns target_ulong and we want to
propagate this to the client (for example H_MULTI_THREADS_ACTIVE).
The VOF path to do_client_architecture_support() needs chopping off
the top 32bit but SLOF's H_CAS does not; and either way the return values
are either 0 or 32bit negative error code. For now this chops
the top 32bits.
This makes "claim" fail if the allocated address is above 4GB as
the client interface is 32bit. This still allows claiming memory above
4GB as potentially initrd can be put there and the client can read
the address from the FDT's "available" property.
Fixes: CID 1458139, 1458138, 1458137, 1458133, 1458132
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Message-Id: <20210720050726.2737405-1-aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The -append option is currently not compatible with -bios (as we don't
yet emulate nvram so we can only put it in the environment with VOF).
Therefore a warning is printed if -append is used with -bios but
because the default value of kernel_cmdline seems to be an empty
string instead of NULL this warning was printed even without -append
when -bios is used. Only print warning if -append is given.
Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Message-Id: <483ac599a1407b766179aaea2794aed60cc09f53.1626367844.git.balaton@eik.bme.hu>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
From clang-13:
hw/ppc/spapr_events.c:937:14: error: variable 'xinfo' set but not used \
[-Werror,-Wunused-but-set-variable]
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
The ATI VGA device isn't a requisite for the Pegasos2 machine
because Linux only uses the serial console; see commit ba7e5ac18e
("hw/ppc: Add emulation of Genesi/bPlan Pegasos II") for rationale.
Using the default devices we don't have any problem:
$ qemu-system-ppc -M pegasos2
qemu-system-ppc: standard VGA not available
But when trying to explicitly use the ATI device we get an error:
$ qemu-system-ppc -M pegasos2 -vga none -bios pegasos2.rom -device ati-vga,romfile=
qemu-system-ppc: -device ati-vga,romfile=: 'ati-vga' is not a valid device model name
Add it as an implicit Kconfig dependency.
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20210515173716.358295-13-philmd@redhat.com>
Acked-by: BALATON Zoltan <balaton@eik.bme.hu>
Reviewed-by: Bin Meng <bmeng.cn@gmail.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Commit 7abb479c7a ("PPC: E500: Add FSL I2C controller and integrate
RTC with it") added a global dependency on the DS1338 model, instead
of a machine one (via Kconfig). This gives trouble when building
standalone machines not exposing I2C bus:
The following clauses were found for DS1338
CONFIG_DS1338=y
config DS1338 depends on I2C
Fix by selecting the DS1338 symbol in the single machine requiring
it, the E500.
Fixes: 7abb479c7a ("PPC: E500: Add FSL I2C controller and integrate RTC with it")
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Bin Meng <bmeng.cn@gmail.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20210513163858.3928976-9-philmd@redhat.com>
Linux needs setprop to fix up the device tree, otherwise it's not
finding devices and cannot boot. Since recent VOF change now we need
to add a callback to allow this which is what this patch does.
Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Message-Id: <20210709132920.6544E7457EF@zero.eik.bme.hu>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The ASIC PCI bridge chipset from Motorola is named 'Raven'.
This chipset is used in the PowerPC Reference Platform (PReP),
but not restricted to it. Rename it accordingly.
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <20210417103028.601124-5-f4bug@amsat.org>
Introduce an usb device flag instead, set it when usb-host looks at the
device descriptors anyway. Also set it for emulated storage devices,
for consistency. Add an inline helper function to check the flag.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Jose R. Ziviani <jziviani@suse.de>
Message-Id: <20210624103836.2382472-32-kraxel@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
If KVM_CAP_RPT_INVALIDATE KVM capability is enabled, then
- indicate the availability of H_RPT_INVALIDATE hcall to the guest via
ibm,hypertas-functions property.
- Enable the hcall
Both the above are done only if the new sPAPR machine capability
cap-rpt-invalidate is set.
Signed-off-by: Bharata B Rao <bharata@linux.ibm.com>
Message-Id: <20210706112440.1449562-3-bharata@linux.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This addresses the comments from v22.
The functional changes are (the VOF ones need retesting with Pegasos2):
(VOF) setprop will start failing if the machine class callback
did not handle it;
(VOF) unit addresses are lowered in path_offset();
(SPAPR) /chosen/bootargs is initialized from kernel_cmdline if
the client did not change it.
Fixes: 5c991e5d4378 ("spapr: Implement Open Firmware client interface")
Cc: BALATON Zoltan <balaton@eik.bme.hu>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Message-Id: <20210708065625.548396-1-aik@ozlabs.ru>
Tested-by: BALATON Zoltan <balaton@eik.bme.hu>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Linux uses RTAS functions to access PCI devices so we need to provide
these with VOF. Implement some of the most important functions to
allow booting Linux with VOF. With this the board is now usable
without a binary ROM image and we can enable it by default as other
boards.
Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Message-Id: <20210708215113.B3F747456E3@zero.eik.bme.hu>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The pegasos2 board comes with an Open Firmware compliant ROM based on
SmartFirmware but it has some changes that are not open source
therefore the ROM binary cannot be included in QEMU. Guests running on
the board however depend on services provided by the firmware. The
Virtual Open Firmware recently added to QEMU implements a minimal set
of these services to allow some guests to boot without the original
firmware. This patch adds VOF as the default firmware for pegasos2
which allows booting Linux and MorphOS via -kernel option while a ROM
image can still be used with -bios for guests that don't run with VOF.
Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Message-Id: <1d6ed6f290c5c1f0b5a1e1c51cf1151452d70d9a.1624811233.git.balaton@eik.bme.hu>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
There are several new L1D cache flush bits added to the hcall which reflect
hardware security features for speculative cache access issues.
These behaviours are now being specified as negative in order to simplify
patched kernel compatibility with older firmware (a new problem found in
existing systems would automatically be vulnerable).
[dwg: Technically this changes behaviour for existing machine types.
After discussion with Nick, we've determined this is safe, because
the worst that will happen if a guest gets the wrong information due
to a migration is that it will perform some unnecessary workarounds,
but will remain correct and secure (well, as secure as it was going
to be anyway). In addition the change only affects cap-cfpc=safe
which is not enabled by default, and in fact is not possible to set
on any current hardware (though it's expected it will be possible on
POWER10)]
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Message-Id: <20210615044107.1481608-1-npiggin@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Add own machine state structure which will be used to store state
needed for firmware emulation.
Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <7f6d5fbf4f70c64dba001483174a2921dd616ecd.1624811233.git.balaton@eik.bme.hu>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The PAPR platform describes an OS environment that's presented by
a combination of a hypervisor and firmware. The features it specifies
require collaboration between the firmware and the hypervisor.
Since the beginning, the runtime component of the firmware (RTAS) has
been implemented as a 20 byte shim which simply forwards it to
a hypercall implemented in qemu. The boot time firmware component is
SLOF - but a build that's specific to qemu, and has always needed to be
updated in sync with it. Even though we've managed to limit the amount
of runtime communication we need between qemu and SLOF, there's some,
and it has become increasingly awkward to handle as we've implemented
new features.
This implements a boot time OF client interface (CI) which is
enabled by a new "x-vof" pseries machine option (stands for "Virtual Open
Firmware). When enabled, QEMU implements the custom H_OF_CLIENT hcall
which implements Open Firmware Client Interface (OF CI). This allows
using a smaller stateless firmware which does not have to manage
the device tree.
The new "vof.bin" firmware image is included with source code under
pc-bios/. It also includes RTAS blob.
This implements a handful of CI methods just to get -kernel/-initrd
working. In particular, this implements the device tree fetching and
simple memory allocator - "claim" (an OF CI memory allocator) and updates
"/memory@0/available" to report the client about available memory.
This implements changing some device tree properties which we know how
to deal with, the rest is ignored. To allow changes, this skips
fdt_pack() when x-vof=on as not packing the blob leaves some room for
appending.
In absence of SLOF, this assigns phandles to device tree nodes to make
device tree traversing work.
When x-vof=on, this adds "/chosen" every time QEMU (re)builds a tree.
This adds basic instances support which are managed by a hash map
ihandle -> [phandle].
Before the guest started, the used memory is:
0..e60 - the initial firmware
8000..10000 - stack
400000.. - kernel
3ea0000.. - initramdisk
This OF CI does not implement "interpret".
Unlike SLOF, this does not format uninitialized nvram. Instead, this
includes a disk image with pre-formatted nvram.
With this basic support, this can only boot into kernel directly.
However this is just enough for the petitboot kernel and initradmdisk to
boot from any possible source. Note this requires reasonably recent guest
kernel with:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=df5be5be8735
The immediate benefit is much faster booting time which especially
crucial with fully emulated early CPU bring up environments. Also this
may come handy when/if GRUB-in-the-userspace sees light of the day.
This separates VOF and sPAPR in a hope that VOF bits may be reused by
other POWERPC boards which do not support pSeries.
This assumes potential support for booting from QEMU backends
such as blockdev or netdev without devices/drivers used.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Message-Id: <20210625055155.2252896-1-aik@ozlabs.ru>
Reviewed-by: BALATON Zoltan <balaton@eik.bme.hu>
[dwg: Adjusted some includes which broke compile in some more obscure
compilation setups]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
QEMU reserves space for RTAS via /rtas/rtas-size which tells the client
how much space the RTAS requires to work which includes the RTAS binary
blob implementing RTAS runtime. Because pseries supports FWNMI which
requires plenty of space, QEMU reserves more than 2KB which is
enough for the RTAS blob as it is just 20 bytes (under QEMU).
Since FWNMI reset delivery was added, RTAS_SIZE macro is not used anymore.
This replaces RTAS_SIZE with RTAS_MIN_SIZE and uses it in
the /rtas/rtas-size calculation to account for the RTAS blob.
Fixes: 0e236d3477 ("ppc/spapr: Implement FWNMI System Reset delivery")
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Message-Id: <20210622070336.1463250-1-aik@ozlabs.ru>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
QEMU is failing to launch a CGS pSeries guest in a host that has PEF
support:
qemu-system-ppc64: ../softmmu/vl.c:2585: qemu_machine_creation_done: Assertion `machine->cgs->ready' failed.
Aborted
This is happening because we're not setting the cgs->ready flag that is
asserted in qemu_machine_creation_done() during machine start.
cgs->ready is set in s390_pv_kvm_init() and sev_kvm_init(). Let's set it
in kvmppc_svm_init() as well.
Reported-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210528201619.52363-1-danielhb413@gmail.com>
Acked-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
TCG does not keep track of AIL mode in a central place, it's based on
the current LPCR[AIL] bits. Synchronize the new CPU's LPCR to the
current LPCR in rtas_start_cpu(), similarly to the way the ILE bit is
synchronized.
Open-code the ILE setting as well now that the caller's LPCR is
available directly, there is no need for the indirection.
Without this, under both TCG and KVM, adding a POWER8/9/10 class CPU
with a new core ID after a modern Linux has booted results in the new
CPU's LPCR missing the LPCR[AIL]=0b11 setting that the other CPUs have.
This can cause crashes and unexpected behaviour.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Message-Id: <20210526091626.3388262-3-npiggin@gmail.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Commit 47a9b55154 ("spapr: Clean up handling of LPCR power-saving exit
bits") moved this logic but did not remove the comment from the
previous location.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Message-Id: <20210526091626.3388262-2-npiggin@gmail.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The FDT code is adding the pmem root node by name "persistent-memory"
which should have been "ibm,persistent-memory".
The linux fetches the device tree nodes by type and it has been working
correctly as the type is correct. If someone searches by its intended
name it would fail, so fix that.
Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
Message-Id: <162204278956.219.9061511386011411578.stgit@cc493db1e665>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The subsequent patches add definitions which tend to get
the compilation to cyclic dependency. So, prepare with
forward declarations, move the definitions and clean up.
Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
Message-Id: <162133925415.610.11584121797866216417.stgit@4f1e6f2bd33e>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
With upstream kernel, especially after commit 98ba956f6a389
("powerpc/pseries/eeh: Rework device EEH PE determination") we see that KVM
guest isn't able to enable EEH option for PCI pass-through devices anymore.
[root@atest-guest ~]# dmesg | grep EEH
[ 0.032337] EEH: pSeries platform initialized
[ 0.298207] EEH: No capable adapters found: recovery disabled.
[root@atest-guest ~]#
So far the linux kernel was assuming pe_config_addr equal to device's
config_addr and using it to enable EEH on the PE through ibm,set-eeh-option
RTAS call. Which wasn't the correct way as per PAPR. The linux kernel
commit 98ba956f6a389 fixed this flow. With that fixed, linux now uses PE
config address returned by ibm,get-config-addr-info2 RTAS call to enable
EEH option per-PE basis instead of per-device basis. However this has
uncovered a bug in qemu where ibm,set-eeh-option is treating PE config
address as per-device config address.
Hence in qemu guest with recent kernel the ibm,set-eeh-option RTAS call
fails with -3 return value indicating that there is no PCI device exist for
the specified PE config address. The rtas_ibm_set_eeh_option call uses
pci_find_device() to get the PC device that matches specific bus and devfn
extracted from PE config address passed as argument. Thus it tries to map
the PE config address to a single specific PCI device 'bus->devices[devfn]'
which always results into checking device on slot 0 'bus->devices[0]'.
This succeeds when there is a pass-through device (vfio-pci) present on
slot 0. But in cases where there is no pass-through device present in slot
0, but present in non-zero slots, ibm,set-eeh-option call fails to enable
the EEH capability.
hw/ppc/spapr_pci_vfio.c: spapr_phb_vfio_eeh_set_option()
case RTAS_EEH_ENABLE: {
PCIHostState *phb;
PCIDevice *pdev;
/*
* The EEH functionality is enabled on basis of PCI device,
* instead of PE. We need check the validity of the PCI
* device address.
*/
phb = PCI_HOST_BRIDGE(sphb);
pdev = pci_find_device(phb->bus,
(addr >> 16) & 0xFF, (addr >> 8) & 0xFF);
if (!pdev || !object_dynamic_cast(OBJECT(pdev), "vfio-pci")) {
return RTAS_OUT_PARAM_ERROR;
}
hw/pci/pci.c:pci_find_device()
PCIDevice *pci_find_device(PCIBus *bus, int bus_num, uint8_t devfn)
{
bus = pci_find_bus_nr(bus, bus_num);
if (!bus)
return NULL;
return bus->devices[devfn];
}
This patch fixes ibm,set-eeh-option to check for presence of any PCI device
(vfio-pci) under specified bus and enable the EEH if found. The current
code already makes sure that all the devices on that bus are from same
iommu group (within same PE) and fail very early if it does not.
After this fix guest is able to find EEH capable devices and enable EEH
recovery on it.
[root@atest-guest ~]# dmesg | grep EEH
[ 0.048139] EEH: pSeries platform initialized
[ 0.405115] EEH: Capable adapter found: recovery enabled.
[root@atest-guest ~]#
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.ibm.com>
Message-Id: <162158429107.145117.5843504911924013125.stgit@jupiter>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
QEMU 6.0 moved all the -boot variables to the machine. Especially, the
removal of the boot_order static changed the handling of '-boot once'
from:
if (boot_once) {
qemu_boot_set(boot_once, &error_fatal);
qemu_register_reset(restore_boot_order, g_strdup(boot_order));
}
to
if (current_machine->boot_once) {
qemu_boot_set(current_machine->boot_once, &error_fatal);
qemu_register_reset(restore_boot_order,
g_strdup(current_machine->boot_order));
}
This means that we now register as subsequent boot order a copy
of current_machine->boot_once that was just set with the previous
call to qemu_boot_set(), i.e. we never transition away from the
once boot order.
It is certainly fragile^Wwrong for the spapr code to hijack a
field of the base machine type object like that. The boot order
rework simply turned this software boundary violation into an
actual bug.
Have the spapr code to handle that with its own field in
SpaprMachineState. Also kfree() the initial boot device
string when "once" was used.
Fixes: 4b7acd2ac8 ("vl: clean up -boot variables")
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1960119
Cc: pbonzini@redhat.com
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <20210521160735.1901914-1-groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Commit e50caf4a5c ("tracing: convert documentation to rST")
converted docs/devel/tracing.txt to docs/devel/tracing.rst.
We still have several references to the old file, so let's fix them
with the following command:
sed -i s/tracing.txt/tracing.rst/ $(git grep -l docs/devel/tracing.txt)
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20210517151702.109066-2-sgarzare@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Per the kconfig.rst:
A device should be listed [...] ``imply`` if (depending on
the QEMU command line) the board may or may not be started
without it.
This is the case with the NVDIMM device, so use the 'imply'
weak reverse dependency to select the symbol.
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20210511155354.3069141-2-philmd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Moved has_spr to cpu.h as ppc_has_spr and turned it into an inline function.
Change spr verification in pnv.c and spapr.c to a version that can
compile in a !TCG environment.
Signed-off-by: Lucas Mateus Castro (alqotel) <lucas.araujo@eldorado.org.br>
Message-Id: <20210507164146.67086-1-lucas.araujo@eldorado.org.br>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The function ppc_hash64_filter_pagesizes has been moved from a function
with prototype in mmu-hash64.h and implemented in mmu-hash64.c to
a static function in hw/ppc/spapr_caps.c as it's only used in that file.
Signed-off-by: Lucas Mateus Castro (alqotel) <lucas.araujo@eldorado.org.br>
Message-Id: <20210506163941.106984-3-lucas.araujo@eldorado.org.br>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The hypercalls h_enter, h_remove, h_bulk_remove, h_protect, and h_read,
have been moved to spapr_softmmu.c with the functions they depend on. The
functions is_ram_address and push_sregs_to_kvm_pr are not static anymore
as functions on both spapr_hcall.c and spapr_softmmu.c depend on them.
The hypercalls h_resize_hpt_prepare and h_resize_hpt_commit have been
divided, the KVM part stayed in spapr_hcall.c while the softmmu part
was moved to spapr_softmmu.c
Signed-off-by: Lucas Mateus Castro (alqotel) <lucas.araujo@eldorado.org.br>
Message-Id: <20210506163941.106984-2-lucas.araujo@eldorado.org.br>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Starting with Linux kernel v5.12 we dropped support[1] in KVM for
hosts that can't have their threads running in different MMU modes
(POWER9 < DD2.2). In these hosts, KVM will no longer report the
KVM_CAP_PPC_MMU_HASH_V3 capability[2] when the host is running Radix.
For guests that support both MMU modes, the negotiation during CAS
will make sure it selects the correct one.
For guests that only support Hash, such as P8 compat mode guests, the
following error is currently thrown:
$ ~/qemu-system-ppc64 -machine pseries,accel=kvm,max-cpu-compat=power8 ...
error: kvm run failed Invalid argument
NIP 0000000000000100 LR 0000000000000000 CTR 0000000000000000 XER 0000000000000000 CPU#0
MSR 8000000000001000 HID0 0000000000000000 HF 8000000000000000 iidx 3 didx 3
TB 00000000 00000000 DECR 0
GPR00 0000000000000000 0000000000000000 0000000000000000 000000007ff00000
GPR04 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR08 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR12 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR16 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR20 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR24 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR28 0000000000000000 0000000000000000 0000000000000000 0000000000000000
CR 00000000 [ - - - - - - - - ] RES ffffffffffffffff
SRR0 0000000000000000 SRR1 0000000000000000 PVR 00000000004e1201 VRSAVE 0000000000000000
SPRG0 0000000000000000 SPRG1 0000000000000000 SPRG2 0000000000000000 SPRG3 0000000000000000
SPRG4 0000000000000000 SPRG5 0000000000000000 SPRG6 0000000000000000 SPRG7 0000000000000000
HSRR0 0000000000000000 HSRR1 0000000000000000
CFAR 0000000000000000
LPCR 000000000004f01f
PTCR 0000000000000000 DAR 0000000000000000 DSISR 0000000000000000
This patch adds a verification during the writing of the platform
support vector so that we error out as soon as we determine this guest
only supports Hash and the host doesn't.
~/qemu-system-ppc64 -machine pseries,accel=kvm,max-cpu-compat=power8 ...
qemu-system-ppc64: Guest requested unavailable MMU mode (hash).
1- https://git.kernel.org/torvalds/p/b1b1697ae0cc8
2- https://git.kernel.org/torvalds/p/a722076e94702
Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com>
Message-Id: <20210505001130.3999968-3-farosas@linux.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
A following patch will make use of it.
Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com>
Message-Id: <20210505001130.3999968-2-farosas@linux.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Here's the first ppc pull request for qemu-6.1. It has a wide variety
of stuff accumulated during the 6.0 freeze. Highlights are:
* Multi-phase reset cleanups for PAPR
* Preliminary cleanups towards allowing !CONFIG_TCG for the ppc target
* Cleanup of AIL logic and extension to POWER10
* Further improvements to handling of hot unplug failures on PAPR
* Allow much larger numbers of CPU on pseries
* Support for the H_SCM_HEALTH hypercall
* Add support for the Pegasos II board
* Substantial cleanup to hflag handling
* Assorted minor fixes and cleanups
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEdfRlhq5hpmzETofcbDjKyiDZs5IFAmCQ4ScACgkQbDjKyiDZ
s5KmNhAAsICdDqeu/jm1uhRCr0DDT/Wa6KE1xlglQ53ybWb5Hm2ae0Uwzti5ZWkt
T9yryObX++wiugbU5Dlx9eXTiJIPgTbDoBV1wfOa3a1BAxSEES1t70jwuwAXXBpX
mgU++SurQB70IB7vVvyXDi2Z592qGvMiKXqT0sdkfoexPHzAL0+KkQPyJZLeFchM
Ap/zRHAodXf9SuWAl+LwLXeb350jivXYXBWNcFRrBbOGpbVT0AJMYrk/TEa2ZIpi
SvbzAWuW+9mX0EOmk7JK5JfkT41cGNdcBcwd0bt4xyvUpmkXLaTMFDLVHj3HWSUn
PFA4RB3uKXyTfISVtWdxJBbFOzMpchI6lEiRJHCS+KuY7UsACqV1T/y54ATOUauC
ycLc9APgRaStdNPxfDl+xeFfoVb/f0mQsNwcmY1tv7z+3qE/trY9bMyrbgaebBFn
/TAkmPvXfwtAREnx8xF/57poarWUkvupGTQkANNosdFokpExmrLj8T0sKv90hh5Y
vkGf5zP4pYGN1Rs8qhOdHu+IjhVJvUl/L3LZYWcoMI6E61D8rGRc0Dkacx7gcja+
sluFi5Yh2fQn55y6LTi3049cB1wMd6wly0214F11RKoBswguiGuaqJmL4sNDO/s4
IcMCy5mg6C0jNZA5kHcdWmqsVzD2+XwP5J29n/LedlmgXoHYF+M=
=N0qr
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/dg-gitlab/tags/ppc-for-6.1-20210504' into staging
ppc patch queue 2021-05-04
Here's the first ppc pull request for qemu-6.1. It has a wide variety
of stuff accumulated during the 6.0 freeze. Highlights are:
* Multi-phase reset cleanups for PAPR
* Preliminary cleanups towards allowing !CONFIG_TCG for the ppc target
* Cleanup of AIL logic and extension to POWER10
* Further improvements to handling of hot unplug failures on PAPR
* Allow much larger numbers of CPU on pseries
* Support for the H_SCM_HEALTH hypercall
* Add support for the Pegasos II board
* Substantial cleanup to hflag handling
* Assorted minor fixes and cleanups
# gpg: Signature made Tue 04 May 2021 06:52:39 BST
# gpg: using RSA key 75F46586AE61A66CC44E87DC6C38CACA20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>" [full]
# gpg: aka "David Gibson (Red Hat) <dgibson@redhat.com>" [full]
# gpg: aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>" [full]
# gpg: aka "David Gibson (kernel.org) <dwg@kernel.org>" [unknown]
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E 87DC 6C38 CACA 20D9 B392
* remotes/dg-gitlab/tags/ppc-for-6.1-20210504: (46 commits)
hw/ppc/pnv_psi: Use device_cold_reset() instead of device_legacy_reset()
hw/ppc/spapr_vio: Reset TCE table object with device_cold_reset()
hw/intc/spapr_xive: Use device_cold_reset() instead of device_legacy_reset()
target/ppc: removed VSCR from SPR registration
target/ppc: Reduce the size of ppc_spr_t
target/ppc: Clean up _spr_register et al
target/ppc: Add POWER10 exception model
target/ppc: rework AIL logic in interrupt delivery
target/ppc: move opcode table logic to translate.c
target/ppc: code motion from translate_init.c.inc to gdbstub.c
spapr_drc.c: handle hotunplug errors in drc_unisolate_logical()
spapr.h: increase FDT_MAX_SIZE
spapr.c: do not use MachineClass::max_cpus to limit CPUs
ppc: Rename current DAWR macros and variables
target/ppc: POWER10 supports scv
target/ppc: Fix POWER9 radix guest HV interrupt AIL behaviour
docs/system: ppc: Add documentation for ppce500 machine
roms/u-boot: Bump ppce500 u-boot to v2021.04 to fix broken pci support
roms/Makefile: Update ppce500 u-boot build directory name
ppc/spapr: Add support for implement support for H_SCM_HEALTH
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The pnv_psi.c code uses device_legacy_reset() for two purposes:
* to reset itself from its qemu_register_reset() handler
* to reset a XiveSource object it has
Neither it nor the XiveSource have any qbuses, so the new
device_cold_reset() function (which resets both the device and its
child buses) is equivalent here to device_legacy_reset() and we can
just switch to the new API.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20210503151849.8766-4-peter.maydell@linaro.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The spapr_vio_quiesce_one() function resets the TCE table object
(TYPE_SPAPR_TCE_TABLE) via device_legacy_reset(). We know that
objects of that type do not have a qbus of their own, so the new
device_cold_reset() function (which resets both the device and its
child buses) is equivalent here to device_legacy_reset() and we can
just switch to the new API.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20210503151849.8766-3-peter.maydell@linaro.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
POWER10 adds a new bit that modifies interrupt behaviour, LPCR[HAIL],
and it removes support for the LPCR[AIL]=0b10 mode.
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Tested-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Message-Id: <20210501072436.145444-3-npiggin@gmail.com>
[dwg: Corrected tab indenting]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The AIL logic is becoming unmanageable spread all over powerpc_excp(),
and it is slated to get even worse with POWER10 support.
Move it all to a new helper function.
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Tested-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Message-Id: <20210501072436.145444-2-npiggin@gmail.com>
[dwg: Corrected tab indenting]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
At this moment, PAPR does not provide a way to report errors during a
device removal operation. This led the pSeries machine to implement
extra mechanisms to try to fallback and recover from an error that might
have happened during the hotunplug in the guest side. This started to
change a bit with commit fe1831eff8 ("spapr_drc.c: use DRC
reconfiguration to cleanup DIMM unplug state"), where one way to
fallback from a memory removal error was introduced.
Around the same time, in [1], the idea of using RTAS set-indicator for
this role was first introduced. The RTAS set-indicator call, when
attempting to UNISOLATE a DRC that is already UNISOLATED or CONFIGURED,
returns RTAS_OK and does nothing else for both QEMU and phyp. This gives
us an opportunity to use this behavior to signal the hypervisor layer
when a device removal errir happens, allowing QEMU/phyp to do a proper
error handling. Using set-indicator to report HP errors isn't strange to
PAPR, as per R1-13.5.3.4-4. of table 13.7 of current PAPR [2]:
"For all DR options: If this is a DR operation that involves the user
insert- ing a DR entity, then if the firmware can determine that the
inserted entity would cause a system disturbance, then the set-indicator
RTAS call must not unisolate the entity and must return an error status
which is unique to the particular error."
A change was proposed to the pSeries Linux kernel to call set-indicator
to move a DRC to 'unisolate' in the case of a hotunplug error in the
guest side [3]. Setting a DRC that is already unisolated or configured to
'unisolate' is a no-op (returns RTAS_OK) for QEMU and also for phyp.
Being a benign change for hypervisors that doesn't care about handling
such errors, we expect the kernel to accept this change at some point.
This patch prepares the pSeries machine for this new kernel feature by
changing drc_unisolate_logical() to handle guest side hotunplug errors.
For CPUs it's a simple matter of setting drc->unplug_requested to 'false',
while for LMBs the process is similar to the rollback that is done in
rtas_ibm_configure_connector().
[1] https://lists.gnu.org/archive/html/qemu-devel/2021-02/msg06395.html
[2] https://openpowerfoundation.org/wp-content/uploads/2020/07/LoPAR-20200611.pdf
[3] https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20210416210216.380291-3-danielhb413@gmail.com/
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210420165100.108368-2-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Up to this patch, 'max_cpus' value is hardcoded to 1024 (commit
6244bb7e58). In theory this patch would simply bump it to 2048, since
it's the default NR_CPUS kernel setting for ppc64 servers nowadays, but
the whole mechanic of MachineClass:max_cpus is flawed for the pSeries
machine. The two supported accelerators, KVM and TCG, can live without
it.
TCG guests don't have a theoretical limit. The user must be free to
emulate as many CPUs as the hardware is capable of. And even if there
were a limit, max_cpus is not the proper way to report it since it's a
common value checked by SMP code in machine_smp_parse() for KVM as well.
For KVM guests, the proper way to limit KVM CPUs is by host
configuration via NR_CPUS, not a QEMU hardcoded value. There is no
technical reason for a pSeries QEMU guest to forcefully stay below
NR_CPUS.
This hardcoded value also disregard hosts that might have a lower
NR_CPUS limit, say 512. In this case, machine.c:machine_smp_parse() will
allow a 1024 value to pass, but then kvm_init() will complain about it
because it will exceed NR_CPUS:
Number of SMP cpus requested (1024) exceeds the maximum cpus supported
by KVM (512)
A better 'max_cpus' value would consider host settings, but
MachineClass::max_cpus is defined well before machine_init() and
kvm_init(). We can't check for KVM limits because it's too soon, so we
end up making a guess.
This patch makes MachineClass:max_cpus settings innocuous by setting it
to INT32_MAX. machine.c:machine_smp_parse() will not fail the
verification based on max_cpus, letting kvm_init() do the checking with
actual host settings. And TCG guests get to do whatever the hardware is
capable of emulating.
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210408204049.221802-2-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Add support for H_SCM_HEALTH hcall described at [1] for spapr
nvdimms. This enables guest to detect the 'unarmed' status of a
specific spapr nvdimm identified by its DRC and if its unarmed, mark
the region backed by the nvdimm as read-only.
The patch adds h_scm_health() to handle the H_SCM_HEALTH hcall which
returns two 64-bit bitmaps (health bitmap, health bitmap mask) derived
from 'struct nvdimm->unarmed' member.
Linux kernel side changes to enable handling of 'unarmed' nvdimms for
ppc64 are proposed at [2].
References:
[1] "Hypercall Op-codes (hcalls)"
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/powerpc/papr_hcalls.rst#n220
[2] "powerpc/papr_scm: Mark nvdimm as unarmed if needed during probe"
https://lore.kernel.org/linux-nvdimm/20210329113103.476760-1-vaibhav@linux.ibm.com/
Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Message-Id: <20210402102128.213943-1-vaibhav@linux.ibm.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
SLOF instantiates RTAS since
744a928cce ("spapr: Stop providing RTAS blob")
so the max address applies to the FDT only.
This renames the macro and fixes up the comment.
This should not cause any behavioral change.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Message-Id: <20210331025123.29310-1-aik@ozlabs.ru>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Add new machine called pegasos2 emulating the Genesi/bPlan Pegasos II,
a PowerPC board based on the Marvell MV64361 system controller and the
VIA VT8231 integrated south bridge/superio chips. It can run Linux,
AmigaOS and a wide range of MorphOS versions. Currently a firmware ROM
image is needed to boot and only MorphOS has a video driver to produce
graphics output. Linux could work too but distros that supported this
machine don't include usual video drivers so those only run with
serial console for now.
Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <30cbfb9cbe6f46a1e15a69a75fac45ac39340122.1616680239.git.balaton@eik.bme.hu>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20210315184615.1985590-16-richard.henderson@linaro.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20210315184615.1985590-15-richard.henderson@linaro.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
On Mac99 and newer machines, the Uninorth PCI host bridge maps
the PCI hole region at 2GiB, so the RAM area beside 2GiB is not
accessible by the CPU. Restrict the memory to 2GiB to avoid
problems such the one reported in the buglink.
Buglink: https://bugs.launchpad.net/qemu/+bug/1922391
Reported-by: Håvard Eidnes <he@NetBSD.org>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20210406084842.2859664-1-f4bug@amsat.org>
Reviewed-by: BALATON Zoltan <balaton@eik.bme.hu>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Stop including exec/address-spaces.h in files that don't need it.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20210416171314.2074665-5-thuth@redhat.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Stop including cpu.h in files that don't need it.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20210416171314.2074665-4-thuth@redhat.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Stop including hw/boards.h in files that don't need it.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20210416171314.2074665-3-thuth@redhat.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Stop including sysemu/sysemu.h in files that don't need it.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20210416171314.2074665-2-thuth@redhat.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Many files include qemu/log.h without needing it. Remove the superfluous
include statements.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-Id: <20210328054833.2351597-1-thuth@redhat.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Many files include hw/irq.h without needing it. Remove the superfluous
include statements.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-Id: <20210327050236.2232347-1-thuth@redhat.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Commit 47c8c915b1 fixed a problem where multiple spapr_drc_detach()
requests were breaking QEMU. The solution was to just spapr_drc_detach()
once, and use spapr_drc_unplug_requested() to filter whether we already
detached it or not. The commit also tied the hotplug request to the
guest in the same condition.
Turns out that there is a reliable way for a CPU hotunplug to fail. If a
guest with one CPU hotplugs a CPU1, then offline CPU0s via 'echo 0 >
/sys/devices/system/cpu/cpu0/online', then attempts to hotunplug CPU1,
the kernel will refuse it because it's the last online CPU of the
system. Given that we're pulsing the IRQ only in the first try, in a
failed attempt, all other CPU1 hotunplug attempts will fail, regardless
of the online state of CPU1 in the kernel, because we're simply not
letting the guest know that we want to hotunplug the device.
Let's move spapr_hotplug_req_remove_by_index() back out of the "if
(!spapr_drc_unplug_requested(drc))" conditional, allowing for multiple
'device_del' requests to the same CPU core to reach the guest, in case
the CPU core didn't fully hotunplugged previously.
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210401000437.131140-3-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The pseries machines introduced the concept of 'unplug timeout' for CPU
hotunplugs. The idea was to circunvent a deficiency in the pSeries
specification (PAPR), that currently does not define a proper way for
the hotunplug to fail. If the guest refuses to release the CPU (see [1]
for an example) there is no way for QEMU to detect the failure.
Further discussions about how to send a QAPI event to inform about the
hotunplug timeout [2] exposed problems that weren't predicted back when
the idea was developed. Other QEMU machines don't have any type of
hotunplug timeout mechanism for any device, e.g. ACPI based machines
have a way to make hotunplug errors visible to the hypervisor. This
would make this timeout mechanism exclusive to pSeries, which is not
ideal.
The real problem is that a QAPI event that reports hotunplug timeouts
puts the management layer (namely Libvirt) in a weird spot. We're not
telling that the hotunplug failed, because we can't be 100% sure of
that, and yet we're resetting the unplug state back, preventing any
DEVICE_DEL events to reach out in case the guest decides to release the
device. Libvirt would need to inspect the guest itself to see if the
device was released or not, otherwise the internal domain states will be
inconsistent. Moreover, Libvirt already has an 'unplug timeout'
concept, and a QEMU side timeout would need to be juggled together with
the existing Libvirt timeout.
All this considered, this solution ended up creating more trouble than
it solved. This patch reverts the 3 commits that introduced the timeout
mechanism for CPU hotplugs in pSeries machines.
This reverts commit 4515a5f786
"qemu_timer.c: add timer_deadline_ms() helper"
This reverts commit d1c2e3ce3d
"spapr_drc.c: add hotunplug timeout for CPUs"
This reverts commit 51254ffb32
"spapr_drc.c: introduce unplug_timeout_timer"
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1911414
[2] https://lists.gnu.org/archive/html/qemu-devel/2021-03/msg04682.html
CC: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210401000437.131140-2-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The e500plat machine device plug callback currently calls
platform_bus_link_device() for any sysbus device. This is overly
broad, because platform_bus_link_device() will unconditionally grab
the IRQs and MMIOs of the device it is passed, whether it was
intended for the platform bus or not. Restrict hotpluggability of
sysbus devices to only those devices on the dynamic sysbus allowlist.
We were mostly getting away with this because the board creates the
platform bus as the last device it creates, and so the hotplug
callback did not do anything for all the sysbus devices created by
the board itself. However if the user plugged in a device which
itself uses a sysbus device internally we would have mishandled this
and probably asserted. An example of this is:
qemu-system-ppc64 -M ppce500 -device macio-oldworld
This isn't a sensible command because the macio-oldworld device
is really specific to the 'g3beige' machine, but we now fail
with a reasonable error message rather than asserting:
qemu-system-ppc64: Device heathrow is not supported by this machine yet.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Message-id: 20210325153310.9131-5-peter.maydell@linaro.org
spapr_memory_unplug() is the last step of the hot unplug sequence.
It is indirectly called by:
spapr_lmb_release()
hotplug_handler_unplug()
and spapr_lmb_release() already buys us that DIMM unplug state is
present : it gets restored with spapr_recover_pending_dimm_state()
if missing.
g_assert() that spapr_pending_dimm_unplugs_find() cannot return NULL
in spapr_memory_unplug() to make this clear and silence Coverity.
Fixes: Coverity CID 1450767
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <161562021166.948373.15092876234470478331.stgit@bahia.lan>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Per devicetree spec v0.3 [1] chapter 2.3.5:
The #address-cells and #size-cells properties are not inherited
from ancestors in the devicetree. They shall be explicitly defined.
If missing, a client program should assume a default value of 2
for #address-cells, and a value of 1 for #size-cells.
These properties are currently missing, causing the <reg> property
of the queue-group subnode to be incorrectly parsed using default
values.
[1] https://github.com/devicetree-org/devicetree-specification/releases/download/v0.3/devicetree-specification-v0.3.pdf
Fixes: fdfb7f2cdb ("e500: Add support for eTSEC in device tree")
Signed-off-by: Bin Meng <bin.meng@windriver.com>
Message-Id: <20210311081608.66891-1-bmeng.cn@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The 'ide-hd' and 'ide-cd' devices provide suitable alternatives.
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Previous work on dev-iotlb message broke spapr_iommu/vhost integration
as it did for SMMU and virtio-iommu. The spapr_iommu currently
only sends IOMMU_NOTIFIER_UNMAP notifications. Since commit
958ec334bc ("vhost: Unbreak SMMU and virtio-iommu on dev-iotlb support"),
VHOST first tries to register IOMMU_NOTIFIER_DEVIOTLB_UNMAP notifier
and if it fails, falls back to legacy IOMMU_NOTIFIER_UNMAP. So
spapr_iommu must fail on the IOMMU_NOTIFIER_DEVIOTLB_UNMAP
registration.
Reported-by: Peter Xu <peterx@redhat.com>
Fixes: b68ba1ca57 ("memory: Add IOMMU_NOTIFIER_DEVIOTLB_UNMAP IOMMUTLBNotificationType")
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Message-Id: <20210209213233.40985-3-eric.auger@redhat.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Next batch of patches for the ppc target and machine types. Includes:
* Several cleanups for sm501 from Peter Maydell
* An update to the SLOF guest firmware
* Improved handling of hotplug failures in spapr, associated cleanups
to the hotplug handling code
* Several etsec fixes and cleanups from Bin Meng
* Assorted other fixes and cleanups
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEdfRlhq5hpmzETofcbDjKyiDZs5IFAmBIRlUACgkQbDjKyiDZ
s5JGlxAApWKpxdtMwrxvQ7EX95XtDWY0v2Jpl3ZKLhYgWJ28pt1SfsDUlA9KhlDd
syXITpyspECe9kjOAKEim4J0y5sMVlTw8KjzIVPMik4uyoLTOBwE+nRmwPnmnWEy
9ZH0J+QOonQYh3jCp7JbTGU2ZW5pJ9s/sv8bPbzXfrR07HbAJ2+MjUkTVxkSVJAq
QUvo/jMntu+a1HFU8Eiw8VyyIcIOAQyS469xzUiHHzKFlR8XodE56Vj+oh6ZFtaA
cB2h4U51uzGfpz+GISm3lZUHSVnWQSFwLAc4x66aRsnLiQ66iAu8N0jRh8lsoW0y
FHF+uGp3AFUARHOiCRk0r7+s29gbu+lX2jogfddj+qj7mGIZXd2tMfrrG3eWsB2C
HvNby4xzyyDaguHK7N0/C42B8OX5dy2pxOP5lvdzL20ip97AKRGXngyM7LhYH8yw
4uzdebYVFu0KkLri4Qzxjm/GxgzrCbWIe5ImsDIlnmY1cJ7NKQYPzFX56xqq147y
6USFQu7RM9E03vj3c9UIkmK0KhL8GQvYxX4dMWIUjtjeLGJuN5seKBkl5mH2OSEJ
D9svKOanXmsZYS0A25VX9FRX263zbJ1HIkDmGzpLi7HULdRy78e89rJk6490WNDr
mnLogO+ttBvhEaLUsIVrWwLd21JW/A2NHuEz0+KELr9ZOQMYRj8=
=/uyx
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/dg-gitlab/tags/ppc-for-6.0-20210310' into staging
ppc patch queue for 2021-03-10
Next batch of patches for the ppc target and machine types. Includes:
* Several cleanups for sm501 from Peter Maydell
* An update to the SLOF guest firmware
* Improved handling of hotplug failures in spapr, associated cleanups
to the hotplug handling code
* Several etsec fixes and cleanups from Bin Meng
* Assorted other fixes and cleanups
# gpg: Signature made Wed 10 Mar 2021 04:08:53 GMT
# gpg: using RSA key 75F46586AE61A66CC44E87DC6C38CACA20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>" [full]
# gpg: aka "David Gibson (Red Hat) <dgibson@redhat.com>" [full]
# gpg: aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>" [full]
# gpg: aka "David Gibson (kernel.org) <dwg@kernel.org>" [unknown]
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E 87DC 6C38 CACA 20D9 B392
* remotes/dg-gitlab/tags/ppc-for-6.0-20210310:
spapr.c: send QAPI event when memory hotunplug fails
spapr.c: remove duplicated assert in spapr_memory_unplug_request()
target/ppc: fix icount support on Book-e vms accessing SPRs
qemu_timer.c: add timer_deadline_ms() helper
spapr_pci.c: add 'unplug already in progress' message for PCI unplug
spapr.c: add 'unplug already in progress' message for PHB unplug
hw/ppc: e500: Add missing <ranges> in the eTSEC node
hw/net: fsl_etsec: Fix build error when HEX_DUMP is on
spapr_drc.c: use DRC reconfiguration to cleanup DIMM unplug state
spapr_drc.c: add hotunplug timeout for CPUs
spapr_drc.c: introduce unplug_timeout_timer
target/ppc: Fix bcdsub. emulation when result overflows
docs/system: Extend PPC section
spapr: rename spapr_drc_detach() to spapr_drc_unplug_request()
spapr_drc.c: use spapr_drc_release() in isolate_physical/set_unusable
pseries: Update SLOF firmware image
spapr_drc.c: do not call spapr_drc_detach() in drc_isolate_logical()
hw/display/sm501: Inline template header into C file
hw/display/sm501: Expand out macros in template header
hw/display/sm501: Remove dead code for non-32-bit RGB surfaces
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The 'running' argument from VMChangeStateHandler does not require
other value than 0 / 1. Make it a plain boolean.
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <20210111152020.1422021-3-philmd@redhat.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Recent changes allowed the pSeries machine to rollback the hotunplug
process for the DIMM when the guest kernel signals, via a
reconfiguration of the DR connector, that it's not going to release the
LMBs.
Let's also warn QAPI listerners about it. One place to do it would be
right after the unplug state is cleaned up,
spapr_clear_pending_dimm_unplug_state(). This would mean that the
function is now doing more than cleaning up the pending dimm state
though.
This patch does the following changes in spapr.c:
- send a QAPI event to inform that we experienced a failure in the
hotunplug of the DIMM;
- rename spapr_clear_pending_dimm_unplug_state() to
spapr_memory_unplug_rollback(). This is a better fit for what the
function is now doing, and it makes callers care more about what the
function goal is and less about spapr.c internals such as clearing
the pending dimm unplug state.
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210302141019.153729-3-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
We are asserting the existence of the first DRC LMB after sending unplug
requests to all LMBs of the DIMM, where every DRC is being asserted
inside the loop. This means that the first DRC is being asserted twice.
Remove the duplicated assert.
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210302141019.153729-2-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The pSeries machine is using QEMUTimer internals to return the timeout
in seconds for a timer object, in hw/ppc/spapr.c, function
spapr_drc_unplug_timeout_remaining_sec().
Create a helper in qemu-timer.c to retrieve the deadline for a QEMUTimer
object, in ms, to avoid exposing timer internals to the PPC code.
CC: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210301124133.23800-2-danielhb413@gmail.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Hotunplug for all other devices are warning the user when the hotunplug
is already in progress. Do the same for PCI devices in
spapr_pci_unplug_request().
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210226163301.419727-5-danielhb413@gmail.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Both CPU hotunplug and PC_DIMM unplug reports an user warning,
mentioning that the hotunplug is in progress, if consecutive
'device_del' are issued in quick succession.
Do the same for PHBs in spapr_phb_unplug_request().
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210226163301.419727-4-danielhb413@gmail.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The eTSEC node should provide an empty <ranges> property in the
eTSEC node, otherwise of_translate_address() in the Linux kernel
fails to get the eTSEC register base, reporting:
OF: ** translation for device /platform@f00000000/ethernet@0/queue-group **
OF: bus is default (na=1, ns=1) on /platform@f00000000/ethernet@0
OF: translating address: 00000000
OF: parent bus is default (na=1, ns=1) on /platform@f00000000
OF: no ranges; cannot translate
Per devicetree spec v0.3 [1] chapter 2.3.8:
If the property is not present in a bus node, it is assumed that
no mapping exists between children of the node and the parent
address space.
This is why of_translate_address() aborts the address translation.
Apparently U-Boot devicetree parser seems to be tolerant with
missing <ranges> as this was not noticed when testing with U-Boot.
The empty <ranges> property is present in all kernel shipped dtsi
files for eTSEC, Let's add it to conform with the spec.
[1] https://github.com/devicetree-org/devicetree-specification/releases/download/v0.3/devicetree-specification-v0.3.pdf
Fixes: fdfb7f2cdb ("e500: Add support for eTSEC in device tree")
Signed-off-by: Bin Meng <bin.meng@windriver.com>
Message-Id: <1614158919-9473-1-git-send-email-bmeng.cn@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Handling errors in memory hotunplug in the pSeries machine is more
complex than any other device type, because there are all the
complications that other devices has, and more.
For instance, determining a timeout for a DIMM hotunplug must consider
if it's a Hash-MMU or a Radix-MMU guest, because Hash guests takes
longer to hotunplug DIMMs. The size of the DIMM is also a factor, given
that longer DIMMs naturally takes longer to be hotunplugged from the
kernel. And there's also the guest memory usage to be considered: if
there's a process that is consuming memory that would be lost by the
DIMM unplug, the kernel will postpone the unplug process until the
process finishes, and then initiate the regular hotunplug process. The
first two considerations are manageable, but the last one is a deal
breaker.
There is no sane way for the pSeries machine to determine the memory
load in the guest when attempting a DIMM hotunplug - and even if there
was a way, the guest can start using all the RAM in the middle of the
unplug process and invalidate our previous assumptions - and in result
we can't even begin to calculate a timeout for the operation. This means
that we can't implement a viable timeout mechanism for memory unplug in
pSeries.
Going back to why we would consider an unplug timeout, the reason is
that we can't know if the kernel is giving up the unplug. Turns out
that, sometimes, we can. Consider a failed memory hotunplug attempt
where the kernel will error out with the following message:
'pseries-hotplug-mem: Memory indexed-count-remove failed, adding any
removed LMBs'
This happens when there is a LMB that the kernel gave up in removing,
and the LMBs previously marked for removal are now being added back.
This happens in the pseries kernel in [1], dlpar_memory_remove_by_ic()
into dlpar_add_lmb(), and after that update_lmb_associativity_index().
In this function, the kernel is configuring the LMB DRC connector again.
Note that this is a valid usage in LOPAR, as stated in section
"ibm,configure-connector RTAS Call":
'A subsequent sequence of calls to ibm,configure-connector with the same
entry from the “ibm,drc-indexes” or “ibm,drc-info” property will restart
the configuration of devices which were not completely configured.'
We can use this kernel behavior in our favor. If a DRC connector
reconfiguration for a LMB that we marked as unplug pending happens, this
indicates that the kernel changed its mind about the unplug and is
reasserting that it will keep using all the LMBs of the DIMM. In this
case, it's safe to assume that the whole DIMM device unplug was
cancelled.
This patch hops into rtas_ibm_configure_connector() and, in the scenario
described above, clear the unplug state for the DIMM device. This will
not solve all the problems we still have with memory unplug, but it will
cover this case where the kernel reconfigures LMBs after a failed
unplug. We are a bit more resilient, without using an unreliable
timeout, and we didn't make the remaining error cases any worse.
[1] arch/powerpc/platforms/pseries/hotplug-memory.c
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210222194531.62717-6-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
There is a reliable way to make a CPU hotunplug fail in the pseries
machine. Hotplug a CPU A, then offline all other CPUs inside the guest
but A. When trying to hotunplug A the guest kernel will refuse to do it,
because A is now the last online CPU of the guest. PAPR has no 'error
callback' in this situation to report back to the platform, so the guest
kernel will deny the unplug in silent and QEMU will never know what
happened. The unplug pending state of A will remain until the guest is
shutdown or rebooted.
Previous attempts of fixing it (see [1] and [2]) were aimed at trying to
mitigate the effects of the problem. In [1] we were trying to guess
which guest CPUs were online to forbid hotunplug of the last online CPU
in the QEMU layer, avoiding the scenario described above because QEMU is
now failing in behalf of the guest. This is not robust because the last
online CPU of the guest can change while we're in the middle of the
unplug process, and our initial assumptions are now invalid. In [2] we
were accepting that our unplug process is uncertain and the user should
be allowed to spam the IRQ hotunplug queue of the guest in case the CPU
hotunplug fails.
This patch presents another alternative, using the timeout
infrastructure introduced in the previous patch. CPU hotunplugs in the
pSeries machine will now timeout after 15 seconds. This is a long time
for a single CPU unplug to occur, regardless of guest load - although
the user is *strongly* encouraged to *not* hotunplug devices from a
guest under high load - and we can be sure that something went wrong if
it takes longer than that for the guest to release the CPU (the same
can't be said about memory hotunplug - more on that in the next patch).
Timing out the unplug operation will reset the unplug state of the CPU
and allow the user to try it again, regardless of the error situation
that prevented the hotunplug to occur. Of all the not so pretty
fixes/mitigations for CPU hotunplug errors in pSeries, timing out the
operation is an admission that we have no control in the process, and
must assume the worst case if the operation doesn't succeed in a
sensible time frame.
[1] https://lists.gnu.org/archive/html/qemu-devel/2021-01/msg03353.html
[2] https://lists.gnu.org/archive/html/qemu-devel/2021-01/msg04400.html
Reported-by: Xujun Ma <xuma@redhat.com>
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1911414
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210222194531.62717-5-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The LoPAR spec provides no way for the guest kernel to report failure of
hotplug/hotunplug events. This wouldn't be bad if those operations were
granted to always succeed, but that's far for the reality.
What ends up happening is that, in the case of a failed hotunplug,
regardless of whether it was a QEMU error or a guest misbehavior, the
pSeries machine is retaining the unplug state of the device in the
running guest. This state is cleanup in machine reset, where it is
assumed that this state represents a device that is pending unplug, and
the device is hotunpluged from the board. Until the reset occurs, any
hotunplug operation of the same device is forbid because there is a
pending unplug state.
This behavior has at least one undesirable side effect. A long standing
pending unplug state is, more often than not, the result of a hotunplug
error. The user had to dealt with it, since retrying to unplug the
device is noy allowed, and then in the machine reset we're removing the
device from the guest. This means that we're failing the user twice -
failed to hotunplug when asked, then hotunplugged without notice.
Solutions to this problem range between trying to predict when the
hotunplug will fail and forbid the operation from the QEMU layer, from
opening up the IRQ queue to allow for multiple hotunplug attempts, from
telling the users to 'reboot the machine if something goes wrong'. The
first solution is flawed because we can't fully predict guest behavior
from QEMU, the second solution is a trial and error remediation that
counts on a hope that the unplug will eventually succeed, and the third
is ... well.
This patch introduces a crude, but effective solution to hotunplug
errors in the pSeries machine. For each unplug done, we'll timeout after
some time. If a certain amount of time passes, we'll cleanup the
hotunplug state from the machine. During the timeout period, any unplug
operations in the same device will still be blocked. After that, we'll
assume that the guest failed the operation, and allow the user to try
again. If the timeout is too short we'll prevent legitimate hotunplug
situations to occur, so we'll need to overestimate the regular time an
unplug operation takes to succeed to account that.
The true solution for the hotunplug errors in the pSeries machines is a
PAPR change to allow for the guest to warn the platform about it. For
now, the work done in this timeout design can be used for the new PAPR
'abort hcall' in the future, given that for both cases we'll need code
to cleanup the existing unplug states of the DRCs.
At this moment we're adding the basic wiring of the timer into the DRC.
Next patch will use the timer to timeout failed CPU hotunplugs.
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210222194531.62717-4-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
spapr_drc_detach() is not the best name for what the function does. The
function does not detach the DRC, it makes an uncommited attempt to do
it. It'll mark the DRC as pending unplug, via the 'unplug_request'
flag, and only if the DRC state is drck->empty_state it will detach the
DRC, via spapr_drc_release().
This is a contrast with its pair spapr_drc_attach(), where the function
is indeed creating the DRC QOM object. If you know what
spapr_drc_attach() does, you can be misled into thinking that
spapr_drc_detach() is removing the DRC from QEMU internal state, which
isn't true.
The current role of this function is better described as a request for
detach, since there's no guarantee that we're going to detach the DRC in
the end. Rename the function to spapr_drc_unplug_request to reflect
what is is doing.
The initial idea was to change the name to spapr_drc_detach_request(),
and later on change the unplug_request flag to detach_request. However,
unplug_request is a migratable boolean for a long time now and renaming
it is not worth the trouble. spapr_drc_unplug_request() setting
drc->unplug_request is more natural than spapr_drc_detach_request
setting drc->unplug_request.
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210222194531.62717-3-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
When moving a physical DRC to "Available", drc_isolate_physical() will
move the DRC state to STATE_PHYSICAL_POWERON and, if the DRC is marked
for unplug, call spapr_drc_detach(). For physical DRCs,
drck->empty_state is STATE_PHYSICAL_POWERON, meaning that we're sure
that spapr_drc_detach() will end up calling spapr_drc_release() in the
end.
Likewise, for logical DRCs, drc_set_unusable will move the DRC to
"Unusable" state, setting drc->state to STATE_LOGICAL_UNUSABLE, which is
the drck->empty_state for logical DRCs. spapr_drc_detach() will call
spapr_drc_release() in this case as well.
In both scenarios, spapr_drc_detach() is being used as a
spapr_drc_release(), wrapper, where we also set unplug_requested (which
is already true, otherwise spapr_drc_detach() wouldn't be called in the
first place) and check if drc->state == drck->empty_state, which we also
know it's guaranteed to be true because we just set it.
Just use spapr_drc_release() in these functions to be clear of our
intentions in both these functions.
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210222194531.62717-2-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
drc_isolate_logical() is used to move the DRC from the "Configured" to
the "Available" state, erroring out if the DRC is in the unexpected
"Unisolate" state and doing nothing (with RTAS_OUT_SUCCESS) if the DRC
is already in "Available" or in "Unusable" state.
When moving from "Configured" to "Available", the DRC is moved to the
LOGICAL_AVAILABLE state, a drc->unplug_requested check is done and, if
true, spapr_drc_detach() is called.
What spapr_drc_detach() does then is:
- set drc->unplug_requested to true. In fact, this is the only place
where unplug_request is set to true;
- does nothing else if drc->state != drck->empty_state. If the DRC
state is equal to drck->empty_state, spapr_drc_release() is
called. For logical DRCs, drck->empty_state = LOGICAL_UNUSABLE.
In short, calling spapr_drc_detach() in drc_isolate_logical() does
nothing. It'll set unplug_request to true again ('again' since it was
already true - otherwise the function wouldn't be called), and will
return without calling spapr_drc_release() because the DRC is not in
LOGICAL_UNUSABLE, since drc_isolate_logical() just moved it to
LOGICAL_AVAILABLE. The only place where the logical DRC is released is
when called from drc_set_unusable(), when it is moved to the
"Unusable" state. As it should, according to PAPR.
Even though calling spapr_drc_detach() in drc_isolate_logical() is
benign, removing it will avoid further thought about the matter. So
let's go ahead and do that.
As a note, this logic was introduced in commit bbf5c878ab. Since
then, the DRC handling code was refactored and enhanced, and PAPR
itself went through some changes in the DRC area as well. It is
expected that some assumptions we had back then are now deprecated.
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210211225246.17315-2-danielhb413@gmail.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
We forward-declare Object typedef in "qemu/typedefs.h" since commit
ca27b5eb7c ("qom/object: Move Object typedef to 'qemu/typedefs.h'").
Use it everywhere to make the code simpler.
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Message-Id: <20210225182003.3629342-1-philmd@redhat.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
An assorted set of spelling fixes in various places.
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Reviewed-by: Stefan Weil <sw@weilnetz.de>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20210309111510.79495-1-mjt@msgid.tls.msk.ru>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
There are 23 files that include the "sysemu/qtest.h",
but they do not use any qtest functions.
Signed-off-by: Chen Qun <kuhn.chenqun@huawei.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20210226081414.205946-1-kuhn.chenqun@huawei.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
At present the <clock-frequency> property of the serial node is
populated with value zero. U-Boot's ns16550 driver is not happy
about this, so let's fill in a meaningful value.
Signed-off-by: Bin Meng <bin.meng@windriver.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <1612362288-22216-2-git-send-email-bmeng.cn@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
At present the platform clock frequency is using a magic number.
Convert it to a macro and use it everywhere.
Signed-off-by: Bin Meng <bin.meng@windriver.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <1612362288-22216-1-git-send-email-bmeng.cn@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The current logic for calculating 'maxdomain' making it a sum of
numa_state->num_nodes with spapr->gpu_numa_id. spapr->gpu_numa_id is
used as a index to determine the next available NUMA id that a
given NVGPU can use.
The problem is that the initial value of gpu_numa_id, for any topology
that has more than one NUMA node, is equal to numa_state->num_nodes.
This means that our maxdomain will always be, at least, twice the
amount of existing NUMA nodes. This means that a guest with 4 NUMA
nodes will end up with the following max-associativity-domains:
rtas/ibm,max-associativity-domains
00000004 00000008 00000008 00000008 00000008
This overtuning of maxdomains doesn't go unnoticed in the guest, being
detected in SLUB during boot:
dmesg | grep SLUB
[ 0.000000] SLUB: HWalign=128, Order=0-3, MinObjects=0, CPUs=4, Nodes=8
SLUB is detecting 8 total nodes, with 4 nodes being online.
This patch fixes ibm,max-associativity-domains by considering the amount
of NVGPUs NUMA nodes presented in the guest, instead of just
spapr->gpu_numa_id.
Reported-by: Cédric Le Goater <clg@kaod.org>
Tested-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210128174213.1349181-4-danielhb413@gmail.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
We'll need to check the initial value given to spapr->gpu_numa_id when
building the rtas DT, so put it in a helper for easier access and to
avoid repetition.
Tested-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210128174213.1349181-3-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This function is used only in spapr_numa.c.
Tested-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210128174213.1349181-2-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This to map the PNOR from the machine init handler directly and finish
the cleanup of the LPC model.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20210126171059.307867-8-clg@kaod.org>
Reviewed-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
On PowerNV systems, the BMC is in charge of mapping the PNOR contents
on the LPC FW address space using the HIOMAP protocol. Under QEMU, we
emulate this behavior and we also add an extra control on the flash
accesses by letting the HIOMAP command handler decide whether the
memory region is accessible or not depending on the firmware requests.
However, this behavior is not compatible with hostboot like firmwares
which need this mapping to be always available. For this reason, the
PNOR memory region is initially disabled for skiboot mode only.
This is badly placed under the LPC model and requires the use of the
machine. Since it doesn't add much, simply remove the initial setting.
The extra control in the HIOMAP command handler will still be performed.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20210126171059.307867-7-clg@kaod.org>
Reviewed-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The PowerNV machine can be run with an external IPMI BMC device
connected to a remote QEMU machine acting as BMC, using these options :
-chardev socket,id=ipmi0,host=localhost,port=9002,reconnect=10 \
-device ipmi-bmc-extern,id=bmc0,chardev=ipmi0 \
-device isa-ipmi-bt,bmc=bmc0,irq=10 \
-nodefaults
In that case, some aspects of the BMC initialization should be
skipped, since they rely on the simulator interface.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20210126171059.307867-6-clg@kaod.org>
Reviewed-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
and reuse pnv_bmc_set_pnor() to share the setting of the PNOR.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20210126171059.307867-5-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The current settings are useful to load large kernels (with debug) but
it moves the initrd image in a memory region not protected by
skiboot. If skiboot is compiled with DEBUG=1, memory poisoning will
corrupt the initrd.
Cc: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20210126171059.307867-4-clg@kaod.org>
Reviewed-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
Reviewed-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
It is currently not possible to perform a strict boot from USB storage:
$ qemu-system-ppc64 -accel kvm -nodefaults -nographic -serial stdio \
-boot strict=on \
-device qemu-xhci \
-device usb-storage,drive=disk,bootindex=0 \
-blockdev driver=file,node-name=disk,filename=fedora-ppc64le.qcow2
SLOF **********************************************************************
QEMU Starting
Build Date = Jul 17 2020 11:15:24
FW Version = git-e18ddad8516ff2cf
Press "s" to enter Open Firmware.
Populating /vdevice methods
Populating /vdevice/vty@71000000
Populating /vdevice/nvram@71000001
Populating /pci@800000020000000
00 0000 (D) : 1b36 000d serial bus [ usb-xhci ]
No NVRAM common partition, re-initializing...
Scanning USB
XHCI: Initializing
USB Storage
SCSI: Looking for devices
101000000000000 DISK : "QEMU QEMU HARDDISK 2.5+"
Using default console: /vdevice/vty@71000000
Welcome to Open Firmware
Copyright (c) 2004, 2017 IBM Corporation All rights reserved.
This program and the accompanying materials are made available
under the terms of the BSD License available at
http://www.opensource.org/licenses/bsd-license.php
Trying to load: from: /pci@800000020000000/usb@0/storage@1/disk@101000000000000 ...
E3405: No such device
E3407: Load failed
Type 'boot' and press return to continue booting the system.
Type 'reset-all' and press return to reboot the system.
Ready!
0 >
The device tree handed over by QEMU to SLOF indeed contains:
qemu,boot-list =
"/pci@800000020000000/usb@0/storage@1/disk@101000000000000 HALT";
but the device node is named usb-xhci@0, not usb@0.
This happens because the firmware names of PCI devices returned
by get_boot_devices_list() come from pcibus_get_fw_dev_path(),
while the sPAPR PHB code uses a different naming scheme for
device nodes. This inconsistency has always been there but it was
hidden for a long time because SLOF used to rename USB device
nodes, until this commit, merged in QEMU 4.2.0 :
commit 85164ad4ed
Author: Alexey Kardashevskiy <aik@ozlabs.ru>
Date: Wed Sep 11 16:24:32 2019 +1000
pseries: Update SLOF firmware image
This fixes USB host bus adapter name in the device tree to match QEMU's
one.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Fortunately, sPAPR implements the firmware path provider interface.
This provides a way to override the default firmware paths.
Just factor out the sPAPR PHB naming logic from spapr_dt_pci_device()
to a helper, and use it in the sPAPR firmware path provider hook.
Fixes: 85164ad4ed ("pseries: Update SLOF firmware image")
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <20210122170157.246374-1-groug@kaod.org>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
In the CPU hotunplug bug [1] the guest kernel throws a scary
message in dmesg:
pseries-hotplug-cpu: Failed to offline CPU <NULL>, rc: -16
The reason isn't related to the bug though. This happens because the
kernel file arch/powerpc/platform/pseries/hotplug-cpu.c, function
dlpar_cpu_remove(), is not finding the device_node.name of the offending
CPU.
We're not populating the 'name' property for hotplugged CPUs. Since the
kernel relies on device_node.name for identifying CPU nodes, and the
CPUs that are coldplugged has the 'name' property filled by SLOF, this
is creating an unneeded inconsistency between hotplug and coldplug CPUs
in the kernel.
Let's fill the 'name' property for hotplugged CPUs as well. This will
make the guest dmesg throws a less intimidating message when we try to
unplug the last online CPU:
pseries-hotplug-cpu: Failed to offline CPU PowerPC,POWER9@1, rc: -16
[1] https://bugzilla.redhat.com/1911414
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210120232305.241521-3-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Next patch will use the 'nodename' string in spapr_core_dt_populate()
after the point it's being freed today.
Instead of moving 'g_free(nodename)' around, let's do a QoL change in
both CPU DT functions where 'nodename' is being freed, and use
g_autofree to avoid the 'g_free()' call altogether.
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210120232305.241521-2-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
v2
Dropped vmstate: Fix memory leak in vmstate_handle_alloc
Broke on Power
Added migration: only check page size match if RAM postcopy is enabled
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEERfXHG0oMt/uXep+pBRYzHrxb/ecFAmAhIE4ACgkQBRYzHrxb
/ecPuA/+Pgo++1ZSseJUgbLePwyTVc0jahdcvYEDmLUn8UM6ikBcBXBgUKHdkFW3
bjSSVgB/xxvXSiafBK4xFNrCqSgqMSr3DJcHmvWgv2wVARcYf6Z26Da53LZq1Qru
0tvRyb40Od1f9zb8Zj7e2Y3pjQ9ybLLbjfNhgnOBbQivqWkjZI31oV2KUCWY2+eV
T1BEwr6mgYepqhmeB6OvQZtaQVC5toirS6NajNF4nt0vZEIGIvK6/A9erCVU8Tze
5ch1J0MUqgc3q6ZSE/I9BHEy6MaL0X8G6H+ezjxdoRQtbt1iM/YqZJCSrXkAxiLC
ROohryb6qVk26+UYuana79faLwrw359WlkwNEE6SEIRSENu+6p7bgN3LZuCILCO7
xJEkeTgy6r40IGCkDC9aWa8pyLHpNX9gyLpGBHdIRD6zEOWaKNtzh7E2uo/T0ann
BpcfgQOsYN25hIHiiXnxozUREbx71VDfMq7GqGB6eC3u2+a3U6jpSJb1nNq5NB89
FJYLZy5Rbuy7OStMwfMsxRs7E63XvGgnwrN8FczU/pumCPX4lDYIpnocqinUmP8p
XubRQQVaVDSKIq1mvzw7iR/1NsP9vfYvnrAIv941f38NBmDKqdPuMOXR/qB/Kp2Y
jB7b1L5/JcXbWsQmK7fda9jmPzFwSO2cTeTiUonk9RfuuDEws0A=
=4tbe
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/dgilbert/tags/pull-migration-20210208a' into staging
Migration pull 2021-02-08
v2
Dropped vmstate: Fix memory leak in vmstate_handle_alloc
Broke on Power
Added migration: only check page size match if RAM postcopy is enabled
# gpg: Signature made Mon 08 Feb 2021 11:28:14 GMT
# gpg: using RSA key 45F5C71B4A0CB7FB977A9FA90516331EBC5BFDE7
# gpg: Good signature from "Dr. David Alan Gilbert (RH2) <dgilbert@redhat.com>" [full]
# Primary key fingerprint: 45F5 C71B 4A0C B7FB 977A 9FA9 0516 331E BC5B FDE7
* remotes/dgilbert/tags/pull-migration-20210208a: (27 commits)
migration: only check page size match if RAM postcopy is enabled
migration: introduce snapshot-{save, load, delete} QMP commands
iotests: fix loading of common.config from tests/ subdir
iotests: add support for capturing and matching QMP events
migration: introduce a delete_snapshot wrapper
migration: wire up support for snapshot device selection
migration: control whether snapshots are ovewritten
block: rename and alter bdrv_all_find_snapshot semantics
block: allow specifying name of block device for vmstate storage
block: add ability to specify list of blockdevs during snapshot
migration: stop returning errno from load_snapshot()
migration: Make save_snapshot() return bool, not 0/-1
block: push error reporting into bdrv_all_*_snapshot functions
migration: Display the migration blockers
migration: Add blocker information
migration: Fix a few absurdly defective error messages
migration: Fix cache_init()'s "Failed to allocate" error messages
migration: Clean up signed vs. unsigned XBZRLE cache-size
migration: Fix migrate-set-parameters argument validation
migration: introduce 'userfaultfd-wrlat.py' script
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
When VM migrate VMState of spapr_pci, the field(msi_devs) of spapr_pci
having a flag of VMS_ALLOC need to allocate memory. If the src doesn't free
memory of msi_devs in SaveStateEntry of spapr_pci after QEMUFile save
VMState of spapr_pci, it may result in memory leak of msi_devs. We add the
post_save func to free memory, which prevents memory leak.
Reported-by: Euler Robot <euler.robot@huawei.com>
Signed-off-by: Jinhao Gao <gaojinhao@huawei.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20201231061020.828-2-gaojinhao@huawei.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
We haven't yet implemented the fairly involved handshaking that will be
needed to migrate PEF protected guests. For now, just use a migration
blocker so we get a meaningful error if someone attempts this (this is the
same approach used by AMD SEV).
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Some upcoming POWER machines have a system called PEF (Protected
Execution Facility) which uses a small ultravisor to allow guests to
run in a way that they can't be eavesdropped by the hypervisor. The
effect is roughly similar to AMD SEV, although the mechanisms are
quite different.
Most of the work of this is done between the guest, KVM and the
ultravisor, with little need for involvement by qemu. However qemu
does need to tell KVM to allow secure VMs.
Because the availability of secure mode is a guest visible difference
which depends on having the right hardware and firmware, we don't
enable this by default. In order to run a secure guest you need to
create a "pef-guest" object and set the confidential-guest-support
property to point to it.
Note that this just *allows* secure guests, the architecture of PEF is
such that the guest still needs to talk to the ultravisor to enter
secure mode. Qemu has no direct way of knowing if the guest is in
secure mode, and certainly can't know until well after machine
creation time.
To start a PEF-capable guest, use the command line options:
-object pef-guest,id=pef0 -machine confidential-guest-support=pef0
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Currently, blk_is_read_only() tells whether a given BlockBackend can
only be used in read-only mode because its root node is read-only. Some
callers actually try to answer a slightly different question: Is the
BlockBackend configured to be writable, by taking write permissions on
the root node?
This can differ, for example, for CD-ROM devices which don't take write
permissions, but may be backed by a writable image file. scsi-cd allows
write requests to the drive if blk_is_read_only() returns false.
However, the write request will immediately run into an assertion
failure because the write permission is missing.
This patch introduces separate functions for both questions.
blk_supports_write_perm() answers the question whether the block
node/image file can support writable devices, whereas blk_is_writable()
tells whether the BlockBackend is currently configured to be writable.
All calls of blk_is_read_only() are converted to one of the two new
functions.
Fixes: https://bugs.launchpad.net/bugs/1906693
Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20210118123448.307825-2-kwolf@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Use g_autoptr() with Object and g_autofree with the string to
avoid the need of a cleanup path.
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210114180628.1675603-6-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The function is called only inside spapr_hcall.c.
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210114180628.1675603-3-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Since commit 1e8b5b1aa1 ("spapr: Allow memory unplug to always succeed")
trying to unplug memory from a guest that doesn't support it (eg. rhel6)
no longer generates an error like it used to. Instead, it leaves the
memory around : only a subsequent reboot or manual use of drmgr within
the guest can complete the hot-unplug sequence. A flag was added to
SpaprMachineClass so that this new behavior only applies to the default
machine type.
We can do better. CAS processes all pending hot-unplug requests. This
means that we don't really care about what the guest supports if
the hot-unplug request happens before CAS.
All guests that we care for, even old ones, set enough bits in OV5
that lead to a non-empty bitmap in spapr->ov5_cas. Use that as a
heuristic to decide if CAS has already occured or not.
Always accept unplug requests that happen before CAS since CAS will
process them. Restore the previous behavior of rejecting them after
CAS when we know that the guest doesn't support memory hot-unplug.
This behavior is suitable for all machine types : this allows to
drop the pre_6_0_memory_unplug flag.
Fixes: 1e8b5b1aa1 ("spapr: Allow memory unplug to always succeed")
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <161012708715.801107.11418801796987916516.stgit@bahia.lan>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Use the PCI_BUS type cast macro to convert result of qdev_get_child_bus().
Also remove the check for NULL afterwards which should not be needed
because sysbus_create_simple() uses error_abort and we create the PCI
host object here that's expected to have a PCI bus so this shouldn't
fail. Even if it would fail that would be due to a programmer error so
an error message is not necessary.
Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Message-Id: <a4dc55b56eed3ce899b7bf9835b980a114c52598.1610143658.git.balaton@eik.bme.hu>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This reverts commit e6d5106786 which was added mistakenly. While this
change works it was suggested during review that keeping dependencies
explicit for each board may be better than listing them in a common
option so keep the previous version and revert this change.
Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Message-Id: <8c65807fc7dc1c4c4f6320f2fd6409a3091c88ff.1610143658.git.balaton@eik.bme.hu>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This reverts commit 038da2adf that was mistakenly added, this
dependency is still needed to get libfdt dependencies even if fdt.o is
not needed by sam460ex.
Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Message-Id: <15a9fa72eed4f02bdbeaef206803d5e22260e2de.1610143658.git.balaton@eik.bme.hu>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Now we've converted all the callsites to directly create the QOM UIC
device themselves, the ppcuic_init() function is unused and can be
removed. The enum defining PPCUIC symbolic constants can be moved
to the ppc-uic.h header where it more naturally belongs.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Tested-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Message-Id: <20210108171212.16500-5-peter.maydell@linaro.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Switch the ppc405_uc boards to directly creating and configuring the
UIC, rather than doing it via the old ppcuic_init() helper function.
We retain the API feature of ppc405ep_init() where it passes back
something allowing the callers to wire up devices to the UIC if
they need to, even though neither of the callsites currently makes
use of this ability -- instead of passing back the qemu_irq array
we pass back the UIC DeviceState.
This fixes a trivial Coverity-detected memory leak where
we were leaking the array of IRQs returned by ppcuic_init().
Fixes: Coverity CID 1421922
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20210108171212.16500-4-peter.maydell@linaro.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The function ppc405cr_init() has apparently been unused since it was
added in commit 8ecc791352 in 2007.
Remove this dead code, so we don't have to convert it away from using
ppcuic_init().
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20210108171212.16500-3-peter.maydell@linaro.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Switch the sam460ex board to directly creating and configuring the
UIC, rather than doing it via the old ppcuic_init() helper function.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20210108171212.16500-2-peter.maydell@linaro.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The OpenPIC device is located within the macio device on real hardware so make it
a child of the macio-newworld device. This also removes the need for setting and
checking a separate PIC object property link on the macio-newworld device which
currently causes the automated QOM introspection tests to fail.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <20201229175619.6051-6-mark.cave-ayland@ilande.co.uk>
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
In order to move the OpenPIC device to the macio device, the PCI bus needs to be
initialised before the macio device and also before wiring the OpenPIC IRQs.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <20201229175619.6051-5-mark.cave-ayland@ilande.co.uk>
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
The heathrow PIC is located within the macio device on real hardware so make it
a child of the macio-oldworld device. This also removes the need for setting and
checking a separate PIC object property link on the macio-oldworld device which
currently causes the automated QOM introspection tests to fail.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <20201229175619.6051-4-mark.cave-ayland@ilande.co.uk>
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
In order to move the heathrow PIC to the macio device, the PCI bus needs to be
initialised before the macio device and also before wiring the PIC IRQs.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <20201229175619.6051-3-mark.cave-ayland@ilande.co.uk>
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
This condition will have already been caught when wiring the heathrow PIC
IRQs to the CPU.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <20201229175619.6051-2-mark.cave-ayland@ilande.co.uk>
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
First pull request for 2021, which has a bunch of things accumulated
over the holidays. Includes:
* A number of cleanups to sam460ex and ppc440 code from BALATON Zoltan
* Several fixes for builds with --without-default-devices from Greg Kurz
* Fixes for some DRC reset problems from Greg Kurz
* QOM conversion of the PPC 4xx UIC devices from Peter Maydell
* Some other assorted fixes and cleanups
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEdfRlhq5hpmzETofcbDjKyiDZs5IFAl/1L38ACgkQbDjKyiDZ
s5JmvQ//RddzvCrHewdtRys+XnLDsKbWng3rQGKh2rSpKMYM11ilmo7FOGoOMNwq
aiZXm5z3t2lpSUTGZorVuPAnYKExzkuAQkPsFZ65uf9wfDhB2wg3BIr97GgZBF2S
MvK9DlxhUNJI+1W8Y+hwj9xDMOX3oFqZp24g2i6EQPRcpqE7GtRpOzt6PdL15sNz
KiJtIeyZ32uGDQaqNlWHJ/pBiYECEQTVpaZIztg2WLdfMICzgYMSCSZzbUrYXCii
WPDJ9sr69sMFwX2oEAgmfmJeFaTOFMt/xTOwFvi2ex4Rd1Rzqb9XToZ+ihOeOAFr
c4a7fpZzx0ePYLIAfOAZ2exV8Nh04dWjRyr2ykgo1ik3DaJ1Ck80O7jYyPQN1Dir
wKpWW59a3pjdABa/ZAoMoFwJh1zPAwGuiN4Higy87Ux8X+JOlTzzkP9ja9v2fgRC
DNb8VYvehUbY6bbHkqs57JcVyYLX56yphfq6Pr2D3DE6y1Ekph2G2vR8YXnqbRmY
Pw5VJ9q1SdYypGVZdMmIXseM7XerFA9YlIfIAQ7DiEW5wH9sx5QjDxlSt07l56J0
TlK6m9Fgc3koLLtVqDlK0NPx39xqVa1JUkrvPeWNKqn1FG/0tfPU6oPVjdQx3ouk
X2cv4A99MJsWSoyUMCH5r5+CHdMCscILOSOZ6OiWAHMEdqCxH0Q=
=7Eiy
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/dg-gitlab/tags/ppc-for-6.0-20210106' into staging
ppc patch queue 2021-01-06
First pull request for 2021, which has a bunch of things accumulated
over the holidays. Includes:
* A number of cleanups to sam460ex and ppc440 code from BALATON Zoltan
* Several fixes for builds with --without-default-devices from Greg Kurz
* Fixes for some DRC reset problems from Greg Kurz
* QOM conversion of the PPC 4xx UIC devices from Peter Maydell
* Some other assorted fixes and cleanups
# gpg: Signature made Wed 06 Jan 2021 03:33:19 GMT
# gpg: using RSA key 75F46586AE61A66CC44E87DC6C38CACA20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>" [full]
# gpg: aka "David Gibson (Red Hat) <dgibson@redhat.com>" [full]
# gpg: aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>" [full]
# gpg: aka "David Gibson (kernel.org) <dwg@kernel.org>" [unknown]
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E 87DC 6C38 CACA 20D9 B392
* remotes/dg-gitlab/tags/ppc-for-6.0-20210106: (22 commits)
ppc440_pcix: Fix up pci config access
ppc440_pcix: Fix register write trace event
ppc440_pcix: Improve comment for IRQ mapping
sam460ex: Remove FDT_PPC dependency from KConfig
ppc4xx: Move common dependency on serial to common option
pnv: Fix reverse dependency on PCI express root ports
ppc: Simplify reverse dependencies of POWERNV and PSERIES on XICS and XIVE
ppc: Fix build with --without-default-devices
spapr: Add drc_ prefix to the DRC realize and unrealize functions
spapr: Use spapr_drc_reset_all() at machine reset
spapr: Introduce spapr_drc_reset_all()
spapr: Fix reset of transient DR connectors
spapr: Call spapr_drc_reset() for all DRCs at CAS
spapr: Fix buffer overflow in spapr_numa_associativity_init()
spapr: Allow memory unplug to always succeed
spapr: Fix DR properties of the root node
spapr/xive: Make spapr_xive_pic_print_info() static
spapr: DRC lookup cannot fail
hw/ppc/ppc440_bamboo: Drop use of ppcuic_init()
hw/ppc/virtex_ml507: Drop use of ppcuic_init()
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>