The number of NUMA nodes in the system is fixed from the command line.
Therefore, there's no need to recalculate it at reset time, and we can
determine the special gpu_numa_id value used for NVLink2 devices at init
time.
This simplifies the reset path a bit which will make further improvements
easier.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Certain old guest versions don't understand the radix MMU introduced with
POWER ISA 3.0, but incorrectly select it if presented with the option at
CAS time. We workaround this in qemu by explicitly excluding the radix
(and other ISA 3.0 linked) options if the guest doesn't explicitly note
support for ISA 3.0.
This is handled by the 'cas_legacy_guest_workaround' flag, which is pretty
vague. Rename it to 'cas_pre_isa3_guest' to be clearer about what it's for.
In addition, we unnecessarily call spapr_populate_pa_features() with
different options when initially constructing the device tree and when
adjusting it at CAS time. At the initial construct time cas_pre_isa3_guest
is already false, so we can still use the flag, rather than explicitly
overriding it to be false at the callsite.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
On POWER8 systems the Directed Privileged Door-bell Exception State
register (DPDES) stores doorbell pending status, one bit per a thread
of a core, set by "msgsndp" instruction. The register is shared among
threads of the same core and KVM on POWER9 emulates it in a similar way
(POWER9 does not have DPDES).
DPDES is shared but QEMU assumes all SPRs are per thread so the only safe
way to write DPDES back to VCPU before running a guest is doing so
while all threads are pulled out of the guest so DPDES cannot change.
There is only one situation when this condition is met: incoming migration
when all threads are stopped. Otherwise any QEMU HMP/QMP command causing
kvm_arch_put_registers() (for example printing registers or dumping memory)
can clobber DPDES in a race with other vcpu threads.
This changes DPDES handling so it is not written to KVM at runtime.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Message-Id: <20190923084110.34643-1-aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
There are FPSCR-related defines in target/ppc/cpu.h which can be used in
place of constants and explicit shifts which arguably improve the code a
bit in places.
Signed-off-by: Paul A. Clarke <pc@us.ibm.com>
Message-Id: <1568817169-1721-1-git-send-email-pc@us.ibm.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
ISA 3.0B added a set of Floating-Point Status and Control Register (FPSCR)
instructions: mffsce, mffscdrn, mffscdrni, mffscrn, mffscrni, mffsl.
This patch adds support for 'mffsce' instruction.
'mffsce' is identical to 'mffs', except that it also clears the exception
enable bits in the FPSCR.
On CPUs without support for 'mffsce' (below ISA 3.0), the
instruction will execute identically to 'mffs'.
Signed-off-by: Paul A. Clarke <pc@us.ibm.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <1568817082-1384-1-git-send-email-pc@us.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
ISA 3.0B added a set of Floating-Point Status and Control Register (FPSCR)
instructions: mffsce, mffscdrn, mffscdrni, mffscrn, mffscrni, mffsl.
This patch adds support for 'mffscrn' and 'mffscrni' instructions.
'mffscrn' and 'mffscrni' are similar to 'mffsl', except they do not return
the status bits (FI, FR, FPRF) and they also set the rounding mode in the
FPSCR.
On CPUs without support for 'mffscrn'/'mffscrni' (below ISA 3.0), the
instructions will execute identically to 'mffs'.
Signed-off-by: Paul A. Clarke <pc@us.ibm.com>
Message-Id: <1568817081-1345-1-git-send-email-pc@us.ibm.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
A typical pseries VM with 16 vCPUs, one disk, one network adapater
uses less than 100 interrupts but the whole IRQ number space of the
QEMU machine is allocated at reset time and it is 8K wide. This is
wasting a considerable amount of interrupt numbers in the global IRQ
space which has 1M interrupts per socket on a POWER9.
To optimise the HW resources, only request at the KVM level interrupts
which have been claimed by the guest. This will help to increase the
maximum number of VMs per system and also help supporting nested guests
using the XIVE interrupt mode.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190911133937.2716-3-clg@kaod.org>
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <156942766014.1274533.10792048853177121231.stgit@bahia.lan>
[dwg: Folded in fix up from Greg Kurz]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
It will help us to discard interrupt numbers which have not been
claimed in the next patch.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190911133937.2716-2-clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
add PnvHomer device model to emulate homer memory access
for pstate table, occ-sensors, slw, occ static and dynamic
values for Power8 and Power9 chips.
Signed-off-by: Balamuruhan S <bala24@linux.ibm.com>
Message-Id: <20190912093056.4516-4-bala24@linux.ibm.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
emulate occ common area region with occ sram device model which
occ and skiboot uses it to communicate regarding sensors, slw
and HWMON in PowerNV emulated host.
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Balamuruhan S <bala24@linux.ibm.com>
Message-Id: <20190912093056.4516-3-bala24@linux.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
During PowerNV boot skiboot populates the device tree by
retrieving base address of homer/occ common area from
PBA BARs and prd ipoll mask by accessing xscom read/write
accesses.
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Balamuruhan S <bala24@linux.ibm.com>
Message-Id: <20190912093056.4516-2-bala24@linux.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Unless the machine was started with kernel-irqchip=on, we cannot easily
tell if we're actually using an in-kernel or an emulated irqchip. This
information is important enough that it is worth printing it in 'info
pic'.
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <156829860985.2073005.5893493824873412773.stgit@bahia.tls.ibm.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
There were few trailing comments after `/*` instead in
new line and line more than 80 character, these fixes are
trivial and doesn't change any logic in code.
Signed-off-by: Balamuruhan S <bala24@linux.ibm.com>
Message-Id: <20190911142925.19197-5-bala24@linux.ibm.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Coverity is reporting in CID 1405304 that tpm_execute() may pass a NULL
tpm_proxy->host_path pointer to open(). This is based on the fact that
h_tpm_comm() does a NULL check on tpm_proxy->host_path and then passes
tpm_proxy to tpm_execute().
The check in h_tpm_comm() is abusive actually since a spapr-proxy-tpm
requires a non NULL host_path property, as checked during realize.
Fixes: 0fb6bd0732
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <156805260916.1779401.11054185183758185247.stgit@bahia.lan>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Fixes the dtc output :
ERROR (node_name_chars): //bmc: Bad character '/' in node name
Warning (avoid_unnecessary_addr_size): /bmc: unnecessary #address-cells/#size-cells without "ranges" or child "reg" property
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190902092932.20200-1-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
When we hotplug a CPU on memory-less/cpu-less node, the linux kernel
crashes.
This happens because linux kernel needs to know the NUMA topology at
start to be able to initialize the distance lookup table.
On pseries, the topology is provided by the firmware via the existing
CPUs and memory information. Thus a node without memory and CPU cannot be
discovered by the kernel.
To avoid the kernel crash, do not allow to start pseries with empty
nodes.
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Message-Id: <20190830161345.22436-1-lvivier@redhat.com>
[dwg: Rework to cope with movement of numa state from globals to MachineState]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Mostly fix errors and warnings reported by 'checkpatch.pl -f'.
Signed-off-by: Aleksandar Markovic <amarkovic@wavecomp.com>
Reviewed-by: Aleksandar Rikalo <arikalo@wavecomp.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <1569331602-2586-7-git-send-email-aleksandar.markovic@rt-rk.com>
Mostly fix errors and warnings reported by 'checkpatch.pl -f'.
Signed-off-by: Aleksandar Markovic <amarkovic@wavecomp.com>
Reviewed-by: Aleksandar Rikalo <arikalo@wavecomp.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <1569331602-2586-5-git-send-email-aleksandar.markovic@rt-rk.com>
Mostly fix errors and warnings reported by 'checkpatch.pl -f'.
Signed-off-by: Aleksandar Markovic <amarkovic@wavecomp.com>
Reviewed-by: Aleksandar Rikalo <arikalo@wavecomp.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <1569331602-2586-4-git-send-email-aleksandar.markovic@rt-rk.com>
Mostly fix errors and warnings reported by 'checkpatch.pl -f'.
Signed-off-by: Aleksandar Markovic <amarkovic@wavecomp.com>
Reviewed-by: Aleksandar Rikalo <arikalo@wavecomp.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <1569331602-2586-3-git-send-email-aleksandar.markovic@rt-rk.com>
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEENUvIs9frKmtoZ05fOHC0AOuRhlMFAl2PebUSHGFybWJydUBy
ZWRoYXQuY29tAAoJEDhwtADrkYZT/VEP/Rru2BgOV/6OSIunW9Ii6qUsVNYtYM0L
4XrOaVJBhNnakAdPPphXmhk6iRCJESZ+4z+s4Iw+yD4aSRYEWowMFi1blaIk+3lH
BNBllepKrvX9ZZyMHCooKrrO2PEPOPRin5izCDn5O93onQjwzXfWtuxZRn6WaLzB
MOtixr340ysa+KNktpAWWPH/NJFq+LLvyQVUdN5xR1i5YBEVE9s0uZg9uq+zpFEg
xSw45BDQSCjNNywd5mqJ0x+y7PCeGAINS9el43ernn654qVXpNgYepE3PpkCjCH7
snFKFJA8+h1VjXPd8/amJimD11+CFIUlhvtlVwccCWme8PRE6Hveerf7FvexnjpN
yGfDOV2ezgrnFspAjI3hrPfsF1Vfxe+eDOUg+y9xGtRgD5LsWmkUBEukKZocVYQG
H52BT6qt3IlUbWVPuHlWEUOsWKk7e3IeAZgnoaafRxwmLHaVWX0SPZCGm1Bfvxe2
LxXnS/pPWubyTrmQA9xCDhILrvkquCyOVBufFnS6D3h45psQKXr2Rh+vVjnYslRR
/B1tDAFZvu9Q6+Y9//AnnLhTZlEPf+qg7Ajv+rH8PTl7SbB/tNVI+vrJJCp5AJab
OifkqahV4H12STf4N2R3g4j/HqPj2PB1/GdLpmOc94ZY5h8GMxviEpwZkKTnw5xD
zhwDdMVqL6c1
=CNvl
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/armbru/tags/pull-qapi-2019-09-28' into staging
QAPI patches for 2019-09-28
# gpg: Signature made Sat 28 Sep 2019 16:18:13 BST
# gpg: using RSA key 354BC8B3D7EB2A6B68674E5F3870B400EB918653
# gpg: issuer "armbru@redhat.com"
# gpg: Good signature from "Markus Armbruster <armbru@redhat.com>" [full]
# gpg: aka "Markus Armbruster <armbru@pond.sub.org>" [full]
# Primary key fingerprint: 354B C8B3 D7EB 2A6B 6867 4E5F 3870 B400 EB91 8653
* remotes/armbru/tags/pull-qapi-2019-09-28: (27 commits)
qapi: Improve source file read error handling
qapi: Improve reporting of redefinition
qapi: Improve reporting of missing documentation comment
qapi: Eliminate check_keys(), rename check_known_keys()
qapi: Improve reporting of invalid 'if' further
qapi: Avoid redundant definition references in error messages
qapi: Improve reporting of missing / unknown definition keys
qapi: Improve reporting of invalid flags
qapi: Improve reporting of invalid 'if' errors
qapi: Move context-free checking to the proper place
qapi: Move context-sensitive checking to the proper place
qapi: Inline check_name() into check_union()
qapi: Plumb info to the QAPISchemaMember
qapi: Make check_type()'s array case a bit more obvious
qapi: Move check for reserved names out of add_name()
qapi: Report invalid '*' prefix like any other invalid name
qapi: Use check_name_str() where it suffices
qapi: Improve reporting of invalid name errors
qapi: Reorder check_FOO() parameters for consistency
qapi: Improve reporting of member name clashes
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Everybody who used something like "-machine accel=kvm:tcg" in the past
might be tempted to specify a similar list with the -accel parameter,
too, for example "-accel kvm:tcg". However, this is not how this
options is thought to be used, since each "-accel" should only take care
of one specific accelerator.
In the long run, we really should rework the "-accel" code completely,
so that it does not set "-machine accel=..." anymore internally, but
is completely independent from "-machine". For the short run, let's
make sure that users cannot use "-accel xyz:tcg", so that we avoid
that we have to deal with such cases in the wild later.
Message-Id: <20190930123505.11607-1-thuth@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Both, "rom->addr" and "addr" are derived from the binary image
that can be loaded with the "-kernel" paramer. The code in
rom_copy() then calculates:
d = dest + (rom->addr - addr);
and uses "d" as destination in a memcpy() some lines later. Now with
bad kernel images, it is possible that rom->addr is smaller than addr,
thus "rom->addr - addr" gets negative and the memcpy() then tries to
copy contents from the image to a bad memory location. This could
maybe be used to inject code from a kernel image into the QEMU binary,
so we better fix it with an additional sanity check here.
Cc: qemu-stable@nongnu.org
Reported-by: Guangming Liu
Buglink: https://bugs.launchpad.net/qemu/+bug/1844635
Message-Id: <20190925130331.27825-1-thuth@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Coverity currently complains that the "if (0x00 & (0x80 >> (phase - 8))"
in next-cube.c can never be true. Right it is. The "0x00" is meant as value
of the control register of the RTC, which is currently not implemented yet.
Thus, let's add a register variable for this now. However, the RTC
registers are currently defined as static variables in nextscr2_write(),
which is quite ugly. Thus let's also move the RTC variables to the main
machine state instead. In the long run, we should likely even refactor
the whole RTC code into a separate device in a separate file, but that's
something for calm winter nights later... as a first step, cleaning up
the static variables and shutting up the warning from Coverity should
be sufficient.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20190921091738.26953-1-huth@tuxfamily.org>
Signed-off-by: Thomas Huth <huth@tuxfamily.org>
While at it, simplify using $(land).
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20190926111955.17276-3-marcandre.lureau@redhat.com>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Fixes: dad5ddcea3 ("check: Only test usb-ehci when it is compiled in")
Signed-off-by: Thomas Huth <thuth@redhat.com>
Fixes commit
e5758de4e8 ("tests/libqtest: Make
qtest_qmp_device_add/del independent from global_qtest")
and commit
dd21074972 ("tests/libqtest: Use
libqtest-single.h in tests that require global_qtest").
Cc: Thomas Huth <thuth@redhat.com>
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20190926111955.17276-2-marcandre.lureau@redhat.com>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Instead of splitting at an unaligned address, we can simply split at
4TB.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Igor Mammedov <imammedo@redhat.com>
s390 was trying to solve limited KVM memslot size issue by abusing
memory_region_allocate_system_memory(), which breaks API contract
where the function might be called only once.
Beside an invalid use of API, the approach also introduced migration
issue, since RAM chunks for each KVM_SLOT_MAX_BYTES are transferred in
migration stream as separate RAMBlocks.
After discussion [1], it was agreed to break migration from older
QEMU for guest with RAM >8Tb (as it was relatively new (since 2.12)
and considered to be not actually used downstream).
Migration should keep working for guests with less than 8TB and for
more than 8TB with QEMU 4.2 and newer binary.
In case user tries to migrate more than 8TB guest, between incompatible
QEMU versions, migration should fail gracefully due to non-exiting
RAMBlock ID or RAMBlock size mismatch.
Taking in account above and that now KVM code is able to split too
big MemorySection into several memslots, partially revert commit
(bb223055b s390-ccw-virtio: allow for systems larger that 7.999TB)
and use kvm_set_max_memslot_size() to set KVMSlot size to
KVM_SLOT_MAX_BYTES.
1) [PATCH RFC v2 4/4] s390: do not call memory_region_allocate_system_memory() multiple times
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20190924144751.24149-5-imammedo@redhat.com>
Acked-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Max memslot size supported by kvm on s390 is 8Tb,
move logic of splitting RAM in chunks upto 8T to KVM code.
This way it will hide KVM specific restrictions in KVM code
and won't affect board level design decisions. Which would allow
us to avoid misusing memory_region_allocate_system_memory() API
and eventually use a single hostmem backend for guest RAM.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20190924144751.24149-4-imammedo@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Currently MemoryRegionSection has 1:1 mapping to KVMSlot.
However next patch will allow splitting MemoryRegionSection into
several KVMSlot-s, make sure that kvm_physical_log_slot_clear()
is able to handle such 1:N mapping.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Message-Id: <20190924144751.24149-3-imammedo@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
We may need to clear the dirty bitmap for more than one KVM memslot.
First do some code movement with no semantic change.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Message-Id: <20190924144751.24149-2-imammedo@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
[fixup line break]
On IBM Z, KVM in the kernel is only implemented for 64-bit mode, and
with regards to TCG, we also only support 64-bit host CPUs (see the
check at the beginning of tcg/s390/tcg-target.inc.c), so we should
remove s390 (without "x", i.e. the old 31-bit mode CPUs) from the
list of supported CPUs.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20190928190334.6897-1-thuth@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Return the correct error code when the SCCB buffer is too small to
contain all of the output, for the Read SCP Information and
Read CPU Information commands.
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Reviewed-by: Jason J. Herne <jjherne@linux.ibm.com>
Message-Id: <1569591203-15258-5-git-send-email-imbrenda@linux.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>