Commit Graph

903 Commits

Author SHA1 Message Date
Michael Tokarev
e6a19a6477 ppc: spelling fixes
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
2023-09-20 07:54:34 +03:00
Cédric Le Goater
44fa20c928 spapr: Remove support for NVIDIA V100 GPU with NVLink2
NVLink2 support was removed from the PPC PowerNV platform and VFIO in
Linux 5.13 with commits :

  562d1e207d32 ("powerpc/powernv: remove the nvlink support")
  b392a1989170 ("vfio/pci: remove vfio_pci_nvlink2")

This was 2.5 years ago. Do the same in QEMU with a revert of commit
ec132efaa8 ("spapr: Support NVIDIA V100 GPU with NVLink2"). Some
adjustements are required on the NUMA part.

Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Message-ID: <20230918091717.149950-1-clg@kaod.org>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2023-09-18 07:25:28 -03:00
Nicholas Piggin
b27fcb288b spapr: Fix record-replay machine reset consuming too many events
spapr_machine_reset gets a random number to populate the device-tree
rng seed with. When loading a snapshot for record-replay, the machine
is reset again, and that tries to consume the random event record
again, crashing due to inconsistent record

Fix this by saving the seed to populate the device tree with, and
skipping the rng on snapshot load.

Acked-by: Pavel Dovgalyuk <Pavel.Dovgalyuk@ispras.ru>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
2023-09-06 11:19:33 +02:00
Nicholas Piggin
9c7b7f01f9 spapr: Fix machine reset deadlock from replay-record
When the machine is reset to load a new snapshot while being debugged
with replay-record, it is done from another thread, so the CPU does
not run the register setting operations. Set CPU registers directly in
machine reset.

Cc: Pavel Dovgalyuk <Pavel.Dovgalyuk@ispras.ru>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
2023-09-06 11:19:33 +02:00
Stefan Hajnoczi
50e7a40af3 target-arm queue:
* hw/gpio/nrf51: implement DETECT signal
  * accel/kvm: Specify default IPA size for arm64
  * ptw: refactor, fix some FEAT_RME bugs
  * target/arm: Adjust PAR_EL1.SH for Device and Normal-NC memory types
  * target/arm/helper: Implement CNTHCTL_EL2.CNT[VP]MASK
  * Fix SME ST1Q
  * Fix 64-bit SSRA
 -----BEGIN PGP SIGNATURE-----
 
 iQJNBAABCAA3FiEE4aXFk81BneKOgxXPPCUl7RQ2DN4FAmTnIoUZHHBldGVyLm1h
 eWRlbGxAbGluYXJvLm9yZwAKCRA8JSXtFDYM3vufEACPJcwyFvSBHDv4VQ6tbgOU
 zwjpUMv4RMKhCOjuxBlJ2DICwOcGNuKer0tc6wkH2T5Ebhoego1osYbRZZoawAJf
 ntg+Ndrx1QH9ORuGqYccLXtHnP741KiKggDHM05BJqB7rqtuH+N4fEn7Cdsw/DNg
 XuCYD5QrxMYvkSOD1l8W0aqp81ucYPgkFqLufypgxrXUiRZ1RBAmPF47BFFdnM8f
 NmrmT1LTF5jr70ySRB+ukK6BAGDc0CUfs6R6nYRwUjRPmSG2rrtUDGo+nOQGDqJo
 PHWmt7rdZQG2w7HVyE/yc3h/CQ3NciwWKbCkRlaoujxHx/B6DRynSeO3NXsP8ELu
 Gizoi3ltwHDQVIGQA19P5phZKHZf7x3MXmK4fDBGB9znvoSFTcjJqkdaN/ARXXO3
 e1vnK1MqnPI8Z1nGdeVIAUIrqhtLHnrrM7jf1tI/e4sjpl3prHq2PvQkakXu8clr
 H8bPZ9zZzyrrSbl4NhpaFTsUiYVxeLoJsNKAmG8dHb+9YsFGXTvEBhtR9eUxnbaV
 XyZ3jEdeW7/ngQ4C6XMD2ZDiKVdx2xJ2Pp5npvljldjmtGUvwQabKo+fPDt2fKjM
 BwjhHA50I633k4fYIwm8YOb70I4oxoL9Lr6PkKriWPMTI5r7+dtwgigREVwnCn+Y
 RsiByKMkDO2TcoQjvBZlCA==
 =3MJ8
 -----END PGP SIGNATURE-----

Merge tag 'pull-target-arm-20230824' of https://git.linaro.org/people/pmaydell/qemu-arm into staging

target-arm queue:
 * hw/gpio/nrf51: implement DETECT signal
 * accel/kvm: Specify default IPA size for arm64
 * ptw: refactor, fix some FEAT_RME bugs
 * target/arm: Adjust PAR_EL1.SH for Device and Normal-NC memory types
 * target/arm/helper: Implement CNTHCTL_EL2.CNT[VP]MASK
 * Fix SME ST1Q
 * Fix 64-bit SSRA

# -----BEGIN PGP SIGNATURE-----
#
# iQJNBAABCAA3FiEE4aXFk81BneKOgxXPPCUl7RQ2DN4FAmTnIoUZHHBldGVyLm1h
# eWRlbGxAbGluYXJvLm9yZwAKCRA8JSXtFDYM3vufEACPJcwyFvSBHDv4VQ6tbgOU
# zwjpUMv4RMKhCOjuxBlJ2DICwOcGNuKer0tc6wkH2T5Ebhoego1osYbRZZoawAJf
# ntg+Ndrx1QH9ORuGqYccLXtHnP741KiKggDHM05BJqB7rqtuH+N4fEn7Cdsw/DNg
# XuCYD5QrxMYvkSOD1l8W0aqp81ucYPgkFqLufypgxrXUiRZ1RBAmPF47BFFdnM8f
# NmrmT1LTF5jr70ySRB+ukK6BAGDc0CUfs6R6nYRwUjRPmSG2rrtUDGo+nOQGDqJo
# PHWmt7rdZQG2w7HVyE/yc3h/CQ3NciwWKbCkRlaoujxHx/B6DRynSeO3NXsP8ELu
# Gizoi3ltwHDQVIGQA19P5phZKHZf7x3MXmK4fDBGB9znvoSFTcjJqkdaN/ARXXO3
# e1vnK1MqnPI8Z1nGdeVIAUIrqhtLHnrrM7jf1tI/e4sjpl3prHq2PvQkakXu8clr
# H8bPZ9zZzyrrSbl4NhpaFTsUiYVxeLoJsNKAmG8dHb+9YsFGXTvEBhtR9eUxnbaV
# XyZ3jEdeW7/ngQ4C6XMD2ZDiKVdx2xJ2Pp5npvljldjmtGUvwQabKo+fPDt2fKjM
# BwjhHA50I633k4fYIwm8YOb70I4oxoL9Lr6PkKriWPMTI5r7+dtwgigREVwnCn+Y
# RsiByKMkDO2TcoQjvBZlCA==
# =3MJ8
# -----END PGP SIGNATURE-----
# gpg: Signature made Thu 24 Aug 2023 05:27:33 EDT
# gpg:                using RSA key E1A5C593CD419DE28E8315CF3C2525ED14360CDE
# gpg:                issuer "peter.maydell@linaro.org"
# gpg: Good signature from "Peter Maydell <peter.maydell@linaro.org>" [full]
# gpg:                 aka "Peter Maydell <pmaydell@gmail.com>" [full]
# gpg:                 aka "Peter Maydell <pmaydell@chiark.greenend.org.uk>" [full]
# gpg:                 aka "Peter Maydell <peter@archaic.org.uk>" [unknown]
# Primary key fingerprint: E1A5 C593 CD41 9DE2 8E83  15CF 3C25 25ED 1436 0CDE

* tag 'pull-target-arm-20230824' of https://git.linaro.org/people/pmaydell/qemu-arm: (35 commits)
  target/arm: Fix 64-bit SSRA
  target/arm: Fix SME ST1Q
  target/arm/helper: Implement CNTHCTL_EL2.CNT[VP]MASK
  target/arm/helper: Check SCR_EL3.{NSE, NS} encoding for AT instructions
  target/arm: Pass security space rather than flag for AT instructions
  target/arm: Skip granule protection checks for AT instructions
  target/arm/helper: Fix tlbmask and tlbbits for TLBI VAE2*
  target/arm/ptw: Load stage-2 tables from realm physical space
  target/arm: Adjust PAR_EL1.SH for Device and Normal-NC memory types
  target/arm/ptw: Report stage 2 fault level for stage 2 faults on stage 1 ptw
  target/arm/ptw: Check for block descriptors at invalid levels
  target/arm/ptw: Set attributes correctly for MMU disabled data accesses
  target/arm/ptw: Drop S1Translate::out_secure
  target/arm/ptw: Remove S1Translate::in_secure
  target/arm/ptw: Remove last uses of ptw->in_secure
  target/arm/ptw: Only fold in NSTable bit effects in Secure state
  target/arm: Pass an ARMSecuritySpace to arm_is_el2_enabled_secstate()
  target/arm/ptw: Pass an ARMSecuritySpace to arm_hcr_el2_eff_secstate()
  target/arm/ptw: Pass ARMSecurityState to regime_translation_disabled()
  target/arm/ptw: Pass ptw into get_phys_addr_pmsa*() and get_phys_addr_disabled()
  ...

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-08-24 10:08:33 -04:00
Cornelia Huck
95f5c89eca hw: Add compat machines for 8.2
Add 8.2 machine types for arm/i440fx/m68k/q35/s390x/spapr.

Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Message-Id: <20230718142235.135319-1-cohuck@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Acked-by: Laurent Vivier <laurent@vivier.eu>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
2023-08-23 12:06:39 +02:00
Akihiko Odaki
bc3e41a0e8 accel/kvm: Use negative KVM type for error propagation
On MIPS, kvm_arch_get_default_type() returns a negative value when an
error occurred so handle the case. Also, let other machines return
negative values when errors occur and declare returning a negative
value as the correct way to propagate an error that happened when
determining KVM type.

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Message-id: 20230727073134.134102-5-akihiko.odaki@daynix.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
2023-08-22 17:31:03 +01:00
David Hildenbrand
c0ce7b4acb hw/ppc/spapr: Use machine_memory_devices_init()
Let's use our new helper and stop always allocating ms->device_memory.
There is no difference in common memory-device code anymore between
ms->device_memory being NULL or the size being 0. So we only have to
teach spapr code that ms->device_memory isn't always around.

We can now modify two maxram_size checks to rely on ms->device_memory
for detecting whether we have memory devices.

Cc: Daniel Henrique Barboza <danielhb413@gmail.com>
Cc: "Cédric Le Goater" <clg@kaod.org>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: Greg Kurz <groug@kaod.org>
Cc: Harsh Prateek Bora <harshpb@linux.ibm.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20230623124553.400585-5-david@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
2023-07-12 09:25:37 +02:00
Nicholas Piggin
dc5e072188 spapr: TCG allow up to 8-thread SMT on POWER8 and newer CPUs
PPC TCG supports SMT CPU configurations for non-hypervisor state, so
permit POWER8-10 pseries machines to enable SMT.

This requires PIR and TIR be set, because that's how sibling thread
matching is done by TCG.

spapr's nested-HV capability does not currently coexist with SMT, so
that combination is prohibited (interestingly somewhat analogous to
LPAR-per-core mode on real hardware which also does not support KVM).

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
[ clg: Also test smp_threads when checking for POWER8 CPU and above ]
Signed-off-by: Cédric Le Goater <clg@kaod.org>
2023-06-25 22:41:30 +02:00
Philippe Mathieu-Daudé
516cd73733 hw/ppc/spapr: Test whether TCG is enabled with tcg_enabled()
Although the PPC target only supports the TCG and KVM
accelerators, QEMU supports more. We can not assume that
'!kvm == tcg', so test for the correct accelerator. This
also eases code review, because here we don't care about
KVM, we really want to test for TCG.

Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
[np: Fix changelog typo noticed by Zoltan]
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
2023-06-25 22:41:30 +02:00
Nicholas Piggin
6b8a05373b ppc/spapr: Move spapr nested HV to a new file
Create spapr_nested.c for most of the nested HV implementation.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
2023-06-25 22:41:30 +02:00
Nicholas Piggin
277ee17212 target/ppc: Add POWER9 DD2.2 model
POWER9 DD2.1 and earlier had significant limitations when running KVM,
including lack of "mixed mode" MMU support (ability to run HPT and RPT
mode on threads of the same core), and a translation prefetch issue
which is worked around by disabling "AIL" mode for the guest.

These processors are not widely available, and it's difficult to deal
with all these quirks in qemu +/- KVM, so create a POWER9 DD2.2 CPU
and make it the default POWER9 CPU.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Frederic Barrat <fbarrat@linux.ibm.com>
Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
Message-Id: <20230515160201.394587-1-npiggin@gmail.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2023-05-28 13:25:11 -03:00
Nicholas Piggin
ccc5a4c5e1 spapr: Add SPAPR_CAP_AIL_MODE_3 for AIL mode 3 support for H_SET_MODE hcall
The behaviour of the Address Translation Mode on Interrupt resource is
not consistently supported by all CPU versions or all KVM versions: KVM
HV does not support mode 2, and does not support mode 3 on POWER7 or
early POWER9 processesors. KVM PR only supports mode 0. TCG supports all
modes (0, 2, 3) on CPUs with support for the corresonding LPCR[AIL] mode.
This leads to inconsistencies in guest behaviour and could cause problems
migrating guests.

This was not noticable for Linux guests for a long time because the
kernel only uses modes 0 and 3, and it used to consider AIL-3 to be
advisory in that it would always keep the AIL-0 vectors around, so it
did not matter whether or not interrupts were delivered according to
the AIL mode. Recent Linux guests depend on AIL mode 3 working as
specified in order to support the SCV facility interrupt. If AIL-3 can
not be provided, then H_SET_MODE must return an error to Linux so it can
disable the SCV facility (failure to do so can lead to userspace being
able to crash the guest kernel).

Add the ail-mode-3 capability to specify that AIL-3 is supported. AIL-0
is implied as the baseline, and AIL-2 is no longer supported by spapr.
AIL-2 is not known to be used by any software, but support in TCG could
be restored with an ail-mode-2 capability quite easily if a regression
is reported.

Modify the H_SET_MODE Address Translation Mode on Interrupt resource
handler to check capabilities and correctly return error if not
supported.

KVM has a cap to advertise support for AIL-3.

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Message-Id: <20230515160216.394612-1-npiggin@gmail.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2023-05-28 07:13:54 -03:00
Juan Quintela
e1fde0e038 migration: Move rate_limit_max and rate_limit_used to migration_stats
These way we can make them atomic and use this functions from any
place.  I also moved all functions that use rate_limit to
migration-stats.

Functions got renamed, they are not qemu_file anymore.

qemu_file_rate_limit -> migration_rate_exceeded
qemu_file_set_rate_limit -> migration_rate_set
qemu_file_get_rate_limit -> migration_rate_get
qemu_file_reset_rate_limit -> migration_rate_reset
qemu_file_acct_rate_limit -> migration_rate_account.

Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-Id: <20230515195709.63843-6-quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-05-18 18:40:51 +02:00
Cornelia Huck
f9be4771d3 hw: Add compat machines for 8.1
Add 8.1 machine types for arm/i440fx/m68k/q35/s390x/spapr.

Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Message-Id: <20230314173009.152667-1-cohuck@redhat.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Acked-by: Gavin Shan <gshan@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
2023-04-20 06:44:15 +02:00
Peter Maydell
222059a0fc ppc patch queue for 2022-12-21:
This queue contains a MAINTAINERS update, the implementation of the Freescale eSDHC,
 the introduction of the DEXCR/HDEXCR instructions and other assorted fixes (most of
 them for the e500 board).
 -----BEGIN PGP SIGNATURE-----
 
 iIwEABYKADQWIQQX6/+ZI9AYAK8oOBk82cqW3gMxZAUCY6M//RYcZGFuaWVsaGI0
 MTNAZ21haWwuY29tAAoJEDzZypbeAzFkaNABAKfQ/zpg2ugr/SmC7Ee9tnFNxDrq
 JsNw+roXpUZvnkUZAQCMRm4BxfaXhXikRaSL2ZfGRtybKXki5o3Ez+rLxISiAg==
 =gRo7
 -----END PGP SIGNATURE-----

Merge tag 'pull-ppc-20221221' of https://gitlab.com/danielhb/qemu into staging

ppc patch queue for 2022-12-21:

This queue contains a MAINTAINERS update, the implementation of the Freescale eSDHC,
the introduction of the DEXCR/HDEXCR instructions and other assorted fixes (most of
them for the e500 board).

# gpg: Signature made Wed 21 Dec 2022 17:18:53 GMT
# gpg:                using EDDSA key 17EBFF9923D01800AF2838193CD9CA96DE033164
# gpg:                issuer "danielhb413@gmail.com"
# gpg: Good signature from "Daniel Henrique Barboza <danielhb413@gmail.com>" [unknown]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 17EB FF99 23D0 1800 AF28  3819 3CD9 CA96 DE03 3164

* tag 'pull-ppc-20221221' of https://gitlab.com/danielhb/qemu:
  target/ppc: Check DEXCR on hash{st, chk} instructions
  target/ppc: Implement the DEXCR and HDEXCR
  hw/ppc/e500: Move comment to more appropriate place
  hw/ppc/e500: Resolve variable shadowing
  hw/ppc/e500: Prefer local variable over qdev_get_machine()
  hw/ppc/virtex_ml507: Prefer local over global variable
  target/ppc/mmu_common: Fix table layout of "info tlb" HMP command
  target/ppc/mmu_common: Log which effective address had no TLB entry found
  hw/ppc/spapr: Reduce "vof.h" inclusion
  hw/ppc/vof: Do not include the full "cpu.h"
  target/ppc/kvm: Add missing "cpu.h" and "exec/hwaddr.h"
  hw/ppc/e500: Add Freescale eSDHC to e500plat
  hw/sd/sdhci: Support big endian SD host controller interfaces
  MAINTAINERS: downgrade PPC KVM/TCG CPUs and pSeries to 'Odd Fixes'

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2022-12-21 18:08:09 +00:00
Philippe Mathieu-Daudé
46d80a56a1 hw/ppc/spapr: Reduce "vof.h" inclusion
Currently objects including "hw/ppc/spapr.h" are forced to be
target specific due to the inclusion of "vof.h" in "spapr.h".

"spapr.h" only uses a Vof pointer, so doesn't require the structure
declaration. The only place where Vof structure is accessed is in
spapr.c, so include "vof.h" there, and forward declare the structure
in "spapr.h".

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20221213123550.39302-4-philmd@linaro.org>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2022-12-21 14:17:55 -03:00
Cornelia Huck
db723c80b1 hw: Add compat machines for 8.0
Add 8.0 machine types for arm/i440fx/m68k/q35/s390x/spapr.

Reviewed-by: Cédric Le Goater <clg@kaod.org> [ppc]
Reviewed-by: Thomas Huth <thuth@redhat.com> [s390x]
Reviewed-by: Greg Kurz <groug@kaod.org> [ppc]
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Message-Id: <20221212152145.124317-2-cohuck@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
2022-12-21 06:35:28 -05:00
Markus Armbruster
047f2ca1ce qapi qdev qom: Elide redundant has_FOO in generated C
The has_FOO for pointer-valued FOO are redundant, except for arrays.
They are also a nuisance to work with.  Recent commit "qapi: Start to
elide redundant has_FOO in generated C" provided the means to elide
them step by step.  This is the step for qapi/qdev.json and
qapi/qom.json.

Said commit explains the transformation in more detail.  The invariant
violations mentioned there do not occur here.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Daniel P. Berrangé <berrange@redhat.com>
Cc: Eduardo Habkost <eduardo@habkost.net>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20221104160712.3005652-21-armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
2022-12-14 20:05:07 +01:00
Jason A. Donenfeld
7966d70f6f reset: allow registering handlers that aren't called by snapshot loading
Snapshot loading only expects to call deterministic handlers, not
non-deterministic ones. So introduce a way of registering handlers that
won't be called when reseting for snapshots.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Message-id: 20221025004327.568476-2-Jason@zx2c4.com
[PMM: updated json doc comment with Markus' text; fixed
 checkpatch style nit]
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2022-10-27 11:34:31 +01:00
Daniel Henrique Barboza
d890f2fa9f hw/ppc: set machine->fdt in spapr machine
The pSeries machine never bothered with the common machine->fdt
attribute. We do all the FDT related work using spapr->fdt_blob.

We're going to introduce a QMP/HMP command to dump the FDT, which will
rely on setting machine->fdt properly to work across all machine
archs/types.

Let's set machine->fdt in two places where we manipulate the FDT:
spapr_machine_reset() and CAS. There are other places where the FDT is
manipulated in the pSeries machines, most notably the hotplug/unplug
path. For now we'll acknowledge that we won't have the most accurate
representation of the FDT, depending on the current machine state, when
using this QMP/HMP fdt command. Making the internal FDT representation
always match the actual FDT representation that the guest is using is a
problem for another day.

spapr->fdt_blob is left untouched for now. To replace it with
machine->fdt, since we're migrating spapr->fdt_blob, we would need to
migrate machine->fdt as well. This is something that we would like to to
do keep our code simpler but it's also a work we'll leave for later.

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20220926173855.1159396-14-danielhb413@gmail.com>
2022-10-17 16:15:10 -03:00
Philippe Mathieu-Daudé
a580fdcd60 hw/ppc/pnv: Avoid dynamic stack allocation
Use autofree heap allocation instead of variable-length
array on the stack.

Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-id: 20220819153931.3147384-7-peter.maydell@linaro.org
2022-09-22 16:38:28 +01:00
Xuzhou Cheng
cb5b5ab9a5 hw/ppc: spapr: Use qemu_vfree() to free spapr->htab
spapr->htab is allocated by qemu_memalign(), hence we should use
qemu_vfree() to free it.

Fixes: c5f54f3e31 ("pseries: Move hash page table allocation to reset time")
Fixes: b4db54132f ("target/ppc: Implement H_REGISTER_PROCESS_TABLE H_CALL"")
Signed-off-by: Xuzhou Cheng <xuzhou.cheng@windriver.com>
Signed-off-by: Bin Meng <bin.meng@windriver.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20220920103159.1865256-28-bmeng.cn@gmail.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2022-09-20 12:31:53 -03:00
Cornelia Huck
f514e1477f hw: Add compat machines for 7.2
Add 7.2 machine types for arm/i440fx/m68k/q35/s390x/spapr.

Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20220727121755.395894-1-cohuck@redhat.com>
[thuth: fixed conflict with pcmc->legacy_no_rng_seed]
Signed-off-by: Thomas Huth <thuth@redhat.com>
2022-08-25 21:59:04 +02:00
Leandro Lupori
3c2e80ad2f ppc: Check partition and process table alignment
Check if partition and process tables are properly aligned, in
their size, according to PowerISA 3.1B, Book III 6.7.6 programming
note. Hardware and KVM also raise an exception in these cases.

Signed-off-by: Leandro Lupori <leandro.lupori@eldorado.org.br>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Message-Id: <20220628133959.15131-2-leandro.lupori@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2022-07-18 13:59:43 -03:00
Jason A. Donenfeld
c4b075318e hw/ppc: pass random seed to fdt
If the FDT contains /chosen/rng-seed, then the Linux RNG will use it to
initialize early. Set this using the usual guest random number
generation function. This is confirmed to successfully initialize the
RNG on Linux 5.19-rc6. The rng-seed node is part of the DT spec. Set
this on the paravirt platforms, spapr and e500, just as is done on other
architectures with paravirt hardware.

Cc: Daniel Henrique Barboza <danielhb413@gmail.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20220712135114.289855-1-Jason@zx2c4.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2022-07-18 13:59:43 -03:00
Alexey Kardashevskiy
81b205cecf ppc/spapr: Implement H_WATCHDOG
The new PAPR 2.12 defines a watchdog facility managed via the new
H_WATCHDOG hypercall.

This adds H_WATCHDOG support which a proposed driver for pseries uses:
https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=303120

This was tested by running QEMU with a debug kernel and command line:
-append \
 "pseries-wdt.timeout=60 pseries-wdt.nowayout=1 pseries-wdt.action=2"

and running "echo V > /dev/watchdog0" inside the VM.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20220622051008.1067464-1-aik@ozlabs.ru>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2022-07-06 10:22:38 -03:00
Alexey Kardashevskiy
5bb55f3e3b spapr: Use address from elf parser for kernel address
tl;dr: This allows Big Endian zImage booting via -kernel + x-vof=on.

QEMU loads the kernel at 0x400000 by default which works most of
the time as Linux kernels are relocatable, 64bit and compiled with "-pie"
(position independent code). This works for a little endian zImage too.

However a big endian zImage is compiled without -pie, is 32bit, linked to
0x4000000 so current QEMU ends up loading it at
0x4400000 but keeps spapr->kernel_addr unchanged so booting fails.

This uses the kernel address returned from load_elf().
If the default kernel_addr is used, there is no change in behavior (as
translate_kernel_address() takes care of this), which is:
LE/BE vmlinux and LE zImage boot, BE zImage does not.
If the VM created with "-machine kernel-addr=0,x-vof=on", then QEMU
prints a warning and BE zImage boots.

Note #1: SLOF (x-vof=off) still cannot boot a big endian zImage as
SLOF enables MSR_SF for everything loaded by QEMU and this leads to early
crash of 32bit zImage.

Note #2: BE/LE vmlinux images set MSR_SF in early boot so these just work;
a LE zImage restores MSR_SF after every CI call and we are lucky enough
not to crash before the first CI call.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Tested-by: Joel Stanley <joel@jms.id.au>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Message-Id: <20220504065536.3534488-1-aik@ozlabs.ru>
[danielhb: use PRIx64 instead of lx in warn_report]
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2022-05-26 17:11:32 -03:00
Paolo Bonzini
f73eb9484b pseries: allow setting stdout-path even on machines with a VGA
-machine graphics=off is the usual way to tell the firmware or the OS that the
user wants a serial console.  The pseries machine however does not support
this, and never adds the stdout-path node to the device tree if a VGA device
is provided.  This is in addition to the other magic behavior of VGA devices,
which is to add a keyboard and mouse to the default USB bus.

Split spapr->has_graphics in two variables so that the two behaviors can be
separated: the USB devices remains the same, but the stdout-path is added
even with "-device VGA -machine graphics=off".

Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220507054826.124936-1-pbonzini@redhat.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2022-05-26 17:11:32 -03:00
Paolo Bonzini
97ec4d21e0 machine: use QAPI struct for boot configuration
As part of converting -boot to a property with a QAPI type, define
the struct and use it throughout QEMU to access boot configuration.
machine_boot_parse takes care of doing the QemuOpts->QAPI conversion by
hand, for now.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220414165300.555321-2-pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-05-12 12:29:43 +02:00
Gautam Agrawal
f9bcb2d684 Warn user if the vga flag is passed but no vga device is created
A global boolean variable "vga_interface_created"(declared in softmmu/globals.c)
has been used to track the creation of vga interface. If the vga flag is passed
in the command line "default_vga"(declared in softmmu/vl.c) variable is set to 0.
To warn user, the condition checks if vga_interface_created is false
and default_vga is equal to 0. If "-vga none" is passed, this patch will not warn the
user regarding the creation of VGA device.

The warning "A -vga option was passed but this
machine type does not use that option; no VGA device has been created"
is logged if vga flag is passed but no vga device is created.

This patch has been tested for x86_64, i386, sparc, sparc64 and arm boards.

Signed-off-by: Gautam Agrawal <gautamnagrawal@gmail.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/581
Message-Id: <20220501122505.29202-1-gautamnagrawal@gmail.com>
[thuth: Fix wrong warning with "-device" in some cases as reported by Paolo]
Signed-off-by: Thomas Huth <thuth@redhat.com>
2022-05-09 08:21:14 +02:00
Víctor Colombo
d41ccf6eea target/ppc: Remove msr_pr macro
msr_pr macro hides the usage of env->msr, which is a bad behavior
Substitute it with FIELD_EX64 calls that explicitly use env->msr
as a parameter.

Suggested-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Víctor Colombo <victor.colombo@eldorado.org.br>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20220504210541.115256-4-victor.colombo@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2022-05-05 15:36:17 -03:00
Cornelia Huck
0ca703662e hw: Add compat machines for 7.1
Add 7.1 machine types for arm/i440fx/m68k/q35/s390x/spapr.

Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Message-Id: <20220316145521.1224083-1-cohuck@redhat.com>
Reviewed-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Acked-by: Laurent Vivier <laurent@vivier.eu>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Yanan Wang <wangyanan55@huawei.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
2022-04-20 09:36:24 +02:00
Marc-André Lureau
0f9668e0c1 Remove qemu-common.h include from most units
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20220323155743.1585078-33-marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-06 14:31:55 +02:00
Markus Armbruster
b21e238037 Use g_new() & friends where that makes obvious sense
g_new(T, n) is neater than g_malloc(sizeof(T) * n).  It's also safer,
for two reasons.  One, it catches multiplication overflowing size_t.
Two, it returns T * rather than void *, which lets the compiler catch
more type errors.

This commit only touches allocations with size arguments of the form
sizeof(T).

Patch created mechanically with:

    $ spatch --in-place --sp-file scripts/coccinelle/use-g_new-etc.cocci \
	     --macro-file scripts/cocci-macro-file.h FILES...

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20220315144156.1595462-4-armbru@redhat.com>
Reviewed-by: Pavel Dovgalyuk <Pavel.Dovgalyuk@ispras.ru>
2022-03-21 15:44:44 +01:00
Peter Maydell
5df022cf2e osdep: Move memalign-related functions to their own header
Move the various memalign-related functions out of osdep.h and into
their own header, which we include only where they are used.
While we're doing this, add some brief documentation comments.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20220226180723.1706285-10-peter.maydell@linaro.org
2022-03-07 13:16:49 +00:00
Daniel Henrique Barboza
5f2b96b38e hw/ppc/spapr.c: fail early if no firmware found in machine_init()
The firmware check consists on a file search (qemu_find_file) and load
it via load_imag_targphys(). This validation is not dependent on any
other machine state but it currently being done at the end of
spapr_machine_init(). This means that we can do a lot of stuff and end
up failing at the end for something that we can verify right out of the
gate.

Move this validation to the start of spapr_machine_init() to fail
earlier.  While we're at it, use g_autofree in the 'filename' pointer.

Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20220228175004.8862-3-danielhb413@gmail.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
2022-03-02 06:51:39 +01:00
Daniel Henrique Barboza
aebb9b9cb2 hw/ppc/spapr.c: use g_autofree in spapr_dt_chosen()
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20220228175004.8862-2-danielhb413@gmail.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
2022-03-02 06:51:39 +01:00
Nicholas Piggin
120f738a46 spapr: implement nested-hv capability for the virtual hypervisor
This implements the Nested KVM HV hcall API for spapr under TCG.

The L2 is switched in when the H_ENTER_NESTED hcall is made, and the
L1 is switched back in returned from the hcall when a HV exception
is sent to the vhyp. Register state is copied in and out according to
the nested KVM HV hcall API specification.

The hdecr timer is started when the L2 is switched in, and it provides
the HDEC / 0x980 return to L1.

The MMU re-uses the bare metal radix 2-level page table walker by
using the get_pate method to point the MMU to the nested partition
table entry. MMU faults due to partition scope errors raise HV
exceptions and accordingly are routed back to the L1.

The MMU does not tag translations for the L1 (direct) vs L2 (nested)
guests, so the TLB is flushed on any L1<->L2 transition (hcall entry
and exit).

Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
[ clg: checkpatch fixes ]
Message-Id: <20220216102545.1808018-10-npiggin@gmail.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
2022-02-18 08:34:14 +01:00
Nicholas Piggin
7cebc5db2e target/ppc: Introduce a vhyp framework for nested HV support
Introduce virtual hypervisor methods that can support a "Nested KVM HV"
implementation using the bare metal 2-level radix MMU, and using HV
exceptions to return from H_ENTER_NESTED (rather than cause interrupts).

HV exceptions can now be raised in the TCG spapr machine when running a
nested KVM HV guest. The main ones are the lev==1 syscall, the hdecr,
hdsi and hisi, hv fu, and hv emu, and h_virt external interrupts.

HV exceptions are intercepted in the exception handler code and instead
of causing interrupts in the guest and switching the machine to HV mode,
they go to the vhyp where it may exit the H_ENTER_NESTED hcall with the
interrupt vector numer as return value as required by the hcall API.

Address translation is provided by the 2-level page table walker that is
implemented for the bare metal radix MMU. The partition scope page table
is pointed to the L1's partition scope by the get_pate vhc method.

Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20220216102545.1808018-9-npiggin@gmail.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
2022-02-18 08:34:14 +01:00
Nicholas Piggin
f32d4ab41c target/ppc: make vhyp get_pate method take lpid and return success
In prepartion for implementing a full partition table option for
vhyp, update the get_pate method to take an lpid and return a
success/fail indicator.

The spapr implementation currently just asserts lpid is always 0
and always return success.

Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[ clg: checkpatch fixes ]
Message-Id: <20220216102545.1808018-6-npiggin@gmail.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
2022-02-18 08:34:14 +01:00
Shivaprasad G Bhat
b5513584a0 spapr: nvdimm: Implement H_SCM_FLUSH hcall
The patch adds support for the SCM flush hcall for the nvdimm devices.
To be available for exploitation by guest through the next patch. The
hcall is applicable only for new SPAPR specific device class which is
also introduced in this patch.

The hcall expects the semantics such that the flush to return with
H_LONG_BUSY_ORDER_10_MSEC when the operation is expected to take longer
time along with a continue_token. The hcall to be called again by providing
the continue_token to get the status. So, all fresh requests are put into
a 'pending' list and flush worker is submitted to the thread pool. The
thread pool completion callbacks move the requests to 'completed' list,
which are cleaned up after collecting the return status for the guest
in subsequent hcall from the guest.

The semantics makes it necessary to preserve the continue_tokens and
their return status across migrations. So, the completed flush states
are forwarded to the destination and the pending ones are restarted
at the destination in post_load. The necessary nvdimm flush specific
vmstate structures are also introduced in this patch which are to be
saved in the new SPAPR specific nvdimm device to be introduced in the
following patch.

Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <164396254862.109112.16675611182159105748.stgit@ltczzess4.aus.stglabs.ibm.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
2022-02-18 08:34:14 +01:00
Daniel Henrique Barboza
1977434bbf spapr.c: check bus != NULL in spapr_get_fw_dev_path()
spapr_get_fw_dev_path() is an impl of
FWPathProviderClass::get_dev_path(). This interface is used by
hw/core/qdev-fw.c via fw_path_provider_try_get_dev_path() in two
functions:

- static char *qdev_get_fw_dev_path_from_handler(), which is used only in
qdev_get_fw_dev_path_helper() and it's guarded by "if (dev &&
dev->parent_bus)";

- char *qdev_get_own_fw_dev_path_from_handler(), which is used in
softmmu/bootdevice.c in get_boot_device_path() like this:

    if (dev) {
        d = qdev_get_own_fw_dev_path_from_handler(dev->parent_bus, dev);

This means that, when called via softmmu/bootdevice.c, there's no check
of 'dev->parent_bus' being not NULL. The result is that the "BusState
*bus" arg of spapr_get_fw_dev_path() can potentially be NULL and if, at
the same time, "SCSIDevice *d" is not NULL, we'll hit this line:

    void *spapr = CAST(void, bus->parent, "spapr-vscsi");

And we'll SIGINT because 'bus' is NULL and we're accessing bus->parent.

Adding a simple 'bus != NULL' check to guard the instances where we
access 'bus->parent' can avoid this altogether.

Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20220121213852.30243-1-danielhb413@gmail.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
2022-01-28 13:15:02 +01:00
Cédric Le Goater
2460e1d75b spapr: Fix support of POWER5+ processors
POWER5+ (ISA v2.03) processors are supported by the pseries machine
but they do not have Altivec instructions. Do not advertise support
for it in the DT.

To be noted that this test is in contradiction with the assert in
cap_vsx_apply().

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Tested-by: Fabiano Rosas <farosas@linux.ibm.com>
Message-Id: <20220105095142.3990430-3-clg@kaod.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
2022-01-12 11:28:26 +01:00
Cornelia Huck
01854af2cf hw: Add compat machines for 7.0
Add 7.0 machine types for arm/i440fx/q35/s390x/spapr.

Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20211217143948.289995-1-cohuck@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
2022-01-05 09:06:36 +01:00
Leandro Lupori
7bf00dfb51 target/ppc: fix Hash64 MMU update of PTE bit R
When updating the R bit of a PTE, the Hash64 MMU was using a wrong byte
offset, causing the first byte of the adjacent PTE to be corrupted.
This caused a panic when booting FreeBSD, using the Hash MMU.

Fixes: a2dd4e83e7 ("ppc/hash64: Rework R and C bit updates")
Signed-off-by: Leandro Lupori <leandro.lupori@eldorado.org.br>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
2021-11-29 21:00:08 +01:00
Yanan Wang
2b52619994 machine: Move smp_prefer_sockets to struct SMPCompatProps
Now we have a common structure SMPCompatProps used to store information
about SMP compatibility stuff, so we can also move smp_prefer_sockets
there for cleaner code.

No functional change intended.

Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20210929025816.21076-15-wangyanan55@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-10-01 15:29:15 +02:00
Yanan Wang
4a0af2930a machine: Prefer cores over sockets in smp parsing since 6.2
In the real SMP hardware topology world, it's much more likely that
we have high cores-per-socket counts and few sockets totally. While
the current preference of sockets over cores in smp parsing results
in a virtual cpu topology with low cores-per-sockets counts and a
large number of sockets, which is just contrary to the real world.

Given that it is better to make the virtual cpu topology be more
reflective of the real world and also for the sake of compatibility,
we start to prefer cores over sockets over threads in smp parsing
since machine type 6.2 for different arches.

In this patch, a boolean "smp_prefer_sockets" is added, and we only
enable the old preference on older machines and enable the new one
since type 6.2 for all arches by using the machine compat mechanism.

Suggested-by: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Acked-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Reviewed-by: Pankaj Gupta <pankaj.gupta@ionos.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20210929025816.21076-10-wangyanan55@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-10-01 15:28:16 +02:00
Daniel Henrique Barboza
e0eb84d4f5 spapr_numa.c: FORM2 NUMA affinity support
The main feature of FORM2 affinity support is the separation of NUMA
distances from ibm,associativity information. This allows for a more
flexible and straightforward NUMA distance assignment without relying on
complex associations between several levels of NUMA via
ibm,associativity matches. Another feature is its extensibility. This base
support contains the facilities for NUMA distance assignment, but in the
future more facilities will be added for latency, performance, bandwidth
and so on.

This patch implements the base FORM2 affinity support as follows:

- the use of FORM2 associativity is indicated by using bit 2 of byte 5
of ibm,architecture-vec-5. A FORM2 aware guest can choose to use FORM1
or FORM2 affinity. Setting both forms will default to FORM2. We're not
advertising FORM2 for pseries-6.1 and older machine versions to prevent
guest visible changes in those;

- ibm,associativity-reference-points has a new semantic. Instead of
being used to calculate distances via NUMA levels, it's now used to
indicate the primary domain index in the ibm,associativity domain of
each resource. In our case it's set to {0x4}, matching the position
where we already place logical_domain_id;

- two new RTAS DT artifacts are introduced: ibm,numa-lookup-index-table
and ibm,numa-distance-table. The index table is used to list all the
NUMA logical domains of the platform, in ascending order, and allows for
spartial NUMA configurations (although QEMU ATM doesn't support that).
ibm,numa-distance-table is an array that contains all the distances from
the first NUMA node to all other nodes, then the second NUMA node
distances to all other nodes and so on;

- get_max_dist_ref_points(), get_numa_assoc_size() and get_associativity()
now checks for OV5_FORM2_AFFINITY and returns FORM2 values if the guest
selected FORM2 affinity during CAS.

Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210920174947.556324-7-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-09-30 12:26:06 +10:00
Daniel Henrique Barboza
5dab5abe62 spapr: move FORM1 verifications to post CAS
FORM2 NUMA affinity is prepared to deal with empty (memory/cpu less)
NUMA nodes. This is used by the DAX KMEM driver to locate a PAPR SCM
device that has a different latency than the original NUMA node from the
regular memory. FORM2 is also able  to deal with asymmetric NUMA
distances gracefully, something that our FORM1 implementation doesn't
do.

Move these FORM1 verifications to a new function and wait until after
CAS, when we're sure that we're sticking with FORM1, to enforce them.

Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210920174947.556324-6-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-09-30 12:26:06 +10:00