Currently helper.c includes some code which is part of the arm
target's gdbstub support. This code has a better home: in gdbstub.c
and gdbstub64.c. Move it there.
Because aarch64_fpu_gdb_get_reg() and aarch64_fpu_gdb_set_reg() move
into gdbstub64.c, this means that they're now compiled only for
TARGET_AARCH64 rather than always. That is the only case when they
would ever be used, but it does mean that the ifdef in
arm_cpu_register_gdb_regs_for_features() needs to be adjusted to
match.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210921162901.17508-4-peter.maydell@linaro.org
We're going to move this code to a different file; fix the coding
style first so checkpatch doesn't complain. This includes deleting
the spurious 'break' statements after returns in the
vfp_gdb_get_reg() function.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210921162901.17508-3-peter.maydell@linaro.org
The aarch64-linux QEMU usermode binaries can never run 32-bit
code, so they do not need to include the GDB XML for it.
(arm_cpu_register_gdb_regs_for_features() will not use these
XML files if the CPU has ARM_FEATURE_AARCH64, so we will not
advertise to gdb that we have them.)
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210921162901.17508-2-peter.maydell@linaro.org
Add BBRAM and eFUSE usage to the Xilinx Versal Virt board
document.
Signed-off-by: Tong Ho <tong.ho@xilinx.com>
Message-id: 20210917052400.1249094-10-tong.ho@xilinx.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Connect the support for ZynqMP eFUSE one-time field-programmable
bit array.
The command argument:
-drive if=pflash,index=3,...
Can be used to optionally connect the bit array to a
backend storage, such that field-programmed values
in one invocation can be made available to next
invocation.
The backend storage must be a seekable binary file, and
its size must be 768 bytes or larger. A file with all
binary 0's is a 'blank'.
Signed-off-by: Tong Ho <tong.ho@xilinx.com>
Message-id: 20210917052400.1249094-9-tong.ho@xilinx.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Connect the support for Xilinx ZynqMP Battery-Backed RAM (BBRAM)
The command argument:
-drive if=pflash,index=2,...
Can be used to optionally connect the bbram to a backend
storage, such that field-programmed values in one
invocation can be made available to next invocation.
The backend storage must be a seekable binary file, and
its size must be 36 bytes or larger. A file with all
binary 0's is a 'blank'.
Signed-off-by: Tong Ho <tong.ho@xilinx.com>
Message-id: 20210917052400.1249094-8-tong.ho@xilinx.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Connect the support for Versal eFUSE one-time field-programmable
bit array.
The command argument:
-drive if=pflash,index=1,...
Can be used to optionally connect the bit array to a
backend storage, such that field-programmed values
in one invocation can be made available to next
invocation.
The backend storage must be a seekable binary file, and
its size must be 3072 bytes or larger. A file with all
binary 0's is a 'blank'.
Signed-off-by: Tong Ho <tong.ho@xilinx.com>
Message-id: 20210917052400.1249094-7-tong.ho@xilinx.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Connect the support for Versal Battery-Backed RAM (BBRAM)
The command argument:
-drive if=pflash,index=0,...
Can be used to optionally connect the bbram to a backend
storage, such that field-programmed values in one
invocation can be made available to next invocation.
The backend storage must be a seekable binary file, and
its size must be 36 bytes or larger. A file with all
binary 0's is a 'blank'.
Signed-off-by: Tong Ho <tong.ho@xilinx.com>
Message-id: 20210917052400.1249094-6-tong.ho@xilinx.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
This device is present in Versal and ZynqMP product
families to store a 256-bit encryption key.
Co-authored-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Co-authored-by: Sai Pavan Boddu <sai.pavan.boddu@xilinx.com>
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Signed-off-by: Sai Pavan Boddu <sai.pavan.boddu@xilinx.com>
Signed-off-by: Tong Ho <tong.ho@xilinx.com>
Message-id: 20210917052400.1249094-5-tong.ho@xilinx.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
This implements the Xilinx ZynqMP eFuse, an one-time
field-programmable non-volatile storage device. There is
only one such device in the Xilinx ZynqMP product family.
Co-authored-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Co-authored-by: Sai Pavan Boddu <sai.pavan.boddu@xilinx.com>
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Signed-off-by: Sai Pavan Boddu <sai.pavan.boddu@xilinx.com>
Signed-off-by: Tong Ho <tong.ho@xilinx.com>
Message-id: 20210917052400.1249094-4-tong.ho@xilinx.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
This implements the Xilinx Versal eFuse, an one-time
field-programmable non-volatile storage device. There is
only one such device in the Xilinx Versal product family.
This device has two separate mmio interfaces, a controller
and a flatten readback.
The controller provides interfaces for field-programming,
configuration, control, and status.
The flatten readback is a cache to provide a byte-accessible
read-only interface to efficiently read efuse array.
Co-authored-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Co-authored-by: Sai Pavan Boddu <sai.pavan.boddu@xilinx.com>
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Signed-off-by: Sai Pavan Boddu <sai.pavan.boddu@xilinx.com>
Signed-off-by: Tong Ho <tong.ho@xilinx.com>
Message-id: 20210917052400.1249094-3-tong.ho@xilinx.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
This introduces the QOM for Xilinx eFuse, an one-time
field-programmable storage bit array.
The actual mmio interface to the array varies by device
families and will be provided in different change-sets.
Co-authored-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Co-authored-by: Sai Pavan Boddu <sai.pavan.boddu@xilinx.com>
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Signed-off-by: Sai Pavan Boddu <sai.pavan.boddu@xilinx.com>
Signed-off-by: Tong Ho <tong.ho@xilinx.com>
Message-id: 20210917052400.1249094-2-tong.ho@xilinx.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The SMCCC 1.3 spec section 5.2 says
The Unknown SMC Function Identifier is a sign-extended value of (-1)
that is returned in the R0, W0 or X0 registers. An implementation must
return this error code when it receives:
* An SMC or HVC call with an unknown Function Identifier
* An SMC or HVC call for a removed Function Identifier
* An SMC64/HVC64 call from AArch32 state
To comply with these statements, let's always return -1 when we encounter
an unknown HVC or SMC call.
Signed-off-by: Alexander Graf <agraf@csgraf.de>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The Allwinner H3 SoC uses Cortex-A7 cores which support virtualization.
However, today we are configuring QEMU to use HVC as PSCI conduit.
That means HVC calls get trapped into QEMU instead of the guest's own
emulated CPU and thus break the guest's ability to execute virtualization.
Fix this by moving to SMC as conduit, freeing up HYP completely to the VM.
Signed-off-by: Alexander Graf <agraf@csgraf.de>
Message-id: 20210920203931.66527-1-agraf@csgraf.de
Fixes: 740dafc0ba ("hw/arm: add Allwinner H3 System-on-Chip")
Reviewed-by: Niek Linnenbank <nieklinnenbank@gmail.com>
Tested-by: Niek Linnenbank <nieklinnenbank@gmail.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
- Vladimir Sementsov-Ogievskiy: Rework coroutines of qemu NBD client
to improve reconnect support
- Eric Blake: Relax server in regards to NBD_OPT_LIST_META_CONTEXT
- Vladimir Sementsov-Ogievskiy: Plumb up 64-bit bulk-zeroing support
in block layer, in preparation for future NBD spec extensions
- Nir Soffer: Default to writeback cache in qemu-nbd
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEccLMIrHEYCkn0vOqp6FrSiUnQ2oFAmFU1a4ACgkQp6FrSiUn
Q2obEggAq4KgnBILBip6TsFc9p7yzVpK9qWzri0vhJHOI7p+dE5Bxqt8sLBEdMQP
OiV5WE/ZRHogGULXXJChBmUTK2/z0U57AXVvbCpKWgr48kSj818dk8uhuUjBeWaM
5Qc8PsH+/Rij5WRlVmTePu7QMwXp1h3+gkw3fmLlHRkvzO4MmOBn1lrOqnCxSGbo
xnFvfdeplNiexmpImdO6QSaHfDsmnUqOFpBPZUsKXfdHOiRqg2I3eU1ibv8eQH7B
oTtBxWI0KHc6kWaJUWqZbEe4ChJTHvAsVdZxmB+il1diIl46lq+s/Zn7nT+0HXTD
pS/Fws9rrGKS/7PnGeFOfskVVCRnIw==
=VGqR
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/ericb/tags/pull-nbd-2021-09-27-v2' into staging
nbd patches for 2021-09-27
- Vladimir Sementsov-Ogievskiy: Rework coroutines of qemu NBD client
to improve reconnect support
- Eric Blake: Relax server in regards to NBD_OPT_LIST_META_CONTEXT
- Vladimir Sementsov-Ogievskiy: Plumb up 64-bit bulk-zeroing support
in block layer, in preparation for future NBD spec extensions
- Nir Soffer: Default to writeback cache in qemu-nbd
# gpg: Signature made Wed 29 Sep 2021 22:07:58 BST
# gpg: using RSA key 71C2CC22B1C4602927D2F3AAA7A16B4A2527436A
# gpg: Good signature from "Eric Blake <eblake@redhat.com>" [full]
# gpg: aka "Eric Blake (Free Software Programmer) <ebb9@byu.net>" [full]
# gpg: aka "[jpeg image of size 6874]" [full]
# Primary key fingerprint: 71C2 CC22 B1C4 6029 27D2 F3AA A7A1 6B4A 2527 436A
* remotes/ericb/tags/pull-nbd-2021-09-27-v2:
block/nbd: check that received handle is valid
block/nbd: drop connection_co
block/nbd: refactor nbd_recv_coroutines_wake_all()
block/nbd: move nbd_recv_coroutines_wake_all() up
block/nbd: nbd_channel_error() shutdown channel unconditionally
nbd/client-connection: nbd_co_establish_connection(): fix non set errp
nbd/server: Allow LIST_META_CONTEXT without STRUCTURED_REPLY
block/io: allow 64bit discard requests
block: use int64_t instead of int in driver discard handlers
block: make BlockLimits::max_pdiscard 64bit
block/io: allow 64bit write-zeroes requests
block: use int64_t instead of int in driver write_zeroes handlers
block: make BlockLimits::max_pwrite_zeroes 64bit
block: use int64_t instead of uint64_t in copy_range driver handlers
block: use int64_t instead of uint64_t in driver write handlers
block: use int64_t instead of uint64_t in driver read handlers
qcow2: check request on vmstate save/load path
block/io: bring request check to bdrv_co_(read,write)v_vmstate
qemu-nbd: Change default cache mode to writeback
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
qemu/KVM on Power is no longer my primary job responsibility, nor Greg
Kurz'. I still have some time for upstream maintenance, but it's no longer
accurate to say that I'm paid to do so. Therefore, reduce sPAPR (the
"pseries" machine type) from Supported to Maintained.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
The OpenPIC interrupt controller was once the de facto standard on ppc
machines. In qemu it's now only used on some Macintosh and the
Freescale e500 machine. It has no listed maintainer, and as far as I
know, no-one who's really familiar with it any more.
Since I'm moving away from the area, I no longer have capacity to do even
minimal maintenance of it under the auspices of the ppc targets in general.
Therefore, this patch lists the main part of openpic, and marks it as
"Odd Fixes" to be looked after by Mark Cave-Ayland who handles the
Macintosh targets. The openpic_kvm variant is only used on e500, so
add it to the files for that machine type (itself already Orphaned).
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
With our interests moving to other areas, Greg and myself no longer have
capacity to be regular reviewers of code for the powernv machine type, let
alone co-maintainers. Additionally, not being IBM employees, we don't have
easy access to the hardware information we'd need for good review.
Therefore, remove our names as reviewers and/or co-maintainers of the
powernv machine type, and the related XIVE interrupt controller.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Acked-by: Cédric Le Goater <clg@kaod.org>
There are a nunber of old embedded ppc machine types which have been little
changed and in "Odd Fixes" state for a long time. With both myself and
Greg Kurz moving toward other areas, we no longer have the capacity to
keep reviewing and maintaining even the rare patches that come in for those
platforms.
Therefore, remove our names as reviewers and mark these platforms as
orphaned.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Greg and I are moving towards other areas and no longer have capacity to
act as regular reviewers for several of the secondary ppc machine types.
So, remove ourselves as reviewers for Macintosh, PReP, sam460ex and
pegasos2 in MAINTAINERS.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Acked-by: BALATON Zoltan <balaton@eik.bme.hu>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Currently the PowerPC TCG CPUs entry in MAINTAINERS lists all of hw/ppc/
and include/hw/ppc. Nearly all the files in those places are related to
specific ppc machine types, rather than to the actual CPUs however. Those
machine types list their own files separately, often overlapping with this.
For greater clarity, remove these misleading entries from the TCG CPUs
stanza, leaving just hw/ppc/ppc.c and hw/ppc/ppc_booke.c which are the only
ones common to a wide range of PPC TCG cpus each.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The trace event was placed in the wrong routine. Move it under
kvmppc_xive_source_reset_one().
Fixes: 4e960974d4 ("xive: Add trace events")
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20210922070205.1235943-1-clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This patch has a handful of modifications for the recent added
FORM2 support:
- to not allocate more than the necessary size in 'distance_table'.
At this moment the array is oversized due to allocating uint32_t for
all elements, when most of them fits in an uint8_t. Fix it by
changing the array to uint8_t and allocating the exact size;
- use stl_be_p() to store the uint32_t at the start of 'distance_table';
- use sizeof(uint32_t) to skip the uint32_t length when populating the
distances;
- use the NUMA_DISTANCE_MIN macro from sysemu/numa.h to avoid hardcoding
the local distance value.
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210922122852.130054-2-danielhb413@gmail.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Correct the multi-line comment format. No functional changes.
Signed-off-by: Bin Meng <bin.meng@windriver.com>
Message-Id: <20210918032653.646370-3-bin.meng@windriver.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
There is no machine that uses Motorola MCP750 (aka Raven) model.
Drop the related codes.
While we are here, drop the mentioning of Intel GW80314 I/O
companion chip in the comments as it has been obsolete for years,
and correct a typo too.
Signed-off-by: Bin Meng <bin.meng@windriver.com>
Message-Id: <20210918032653.646370-2-bin.meng@windriver.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The reset value of IPIDR should be zero for Freescale chipset, per
the following 2 manuals I checked:
- P2020RM (https://www.nxp.com/webapp/Download?colCode=P2020RM)
- P4080RM (https://www.nxp.com/webapp/Download?colCode=P4080RM)
Currently it is set to 1, which leaves the IPI enabled on core 0
after power-on reset. Such may cause unexpected interrupt to be
delivered to core 0 if the IPI is triggered from core 0 to other
cores later.
Fixes: ffd5e9fe02 ("openpic: Reset IRQ source private members")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/584
Signed-off-by: Bin Meng <bin.meng@windriver.com>
Message-Id: <20210918032653.646370-1-bin.meng@windriver.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The current way the mask is built can overflow with a 64-bit decrementer.
Use sextract64() to extract the signed values and remove the logic to
handle negative values which has become useless.
Cc: Luis Fernando Fujita Pires <luis.pires@eldorado.org.br>
Fixes: a8dafa5251 ("target/ppc: Implement large decrementer support for TCG")
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20210920061203.989563-5-clg@kaod.org>
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
numa_complete_configuration() in hw/core/numa.c always adds a NUMA node
for the pSeries machine if none was specified, but without node distance
information for the single node created.
NUMA FORM1 affinity code didn't rely on numa_state information to do its
job, but FORM2 does. As is now, this is the result of a pSeries guest
with NUMA FORM2 affinity when no NUMA nodes is specified:
$ numactl -H
available: 1 nodes (0)
node 0 cpus: 0
node 0 size: 16222 MB
node 0 free: 15681 MB
No distance information available.
This can be amended in spapr_numa_FORM2_write_rtas_tables(). We're
enforcing that the local distance (the distance to the node to itself) is
always 10. This allows for the proper creation of the NUMA distance tables,
fixing the output of 'numactl -H' in the guest:
$ numactl -H
available: 1 nodes (0)
node 0 cpus: 0
node 0 size: 16222 MB
node 0 free: 15685 MB
node distances:
node 0
0: 10
CC: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210920174947.556324-8-danielhb413@gmail.com>
Acked-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The main feature of FORM2 affinity support is the separation of NUMA
distances from ibm,associativity information. This allows for a more
flexible and straightforward NUMA distance assignment without relying on
complex associations between several levels of NUMA via
ibm,associativity matches. Another feature is its extensibility. This base
support contains the facilities for NUMA distance assignment, but in the
future more facilities will be added for latency, performance, bandwidth
and so on.
This patch implements the base FORM2 affinity support as follows:
- the use of FORM2 associativity is indicated by using bit 2 of byte 5
of ibm,architecture-vec-5. A FORM2 aware guest can choose to use FORM1
or FORM2 affinity. Setting both forms will default to FORM2. We're not
advertising FORM2 for pseries-6.1 and older machine versions to prevent
guest visible changes in those;
- ibm,associativity-reference-points has a new semantic. Instead of
being used to calculate distances via NUMA levels, it's now used to
indicate the primary domain index in the ibm,associativity domain of
each resource. In our case it's set to {0x4}, matching the position
where we already place logical_domain_id;
- two new RTAS DT artifacts are introduced: ibm,numa-lookup-index-table
and ibm,numa-distance-table. The index table is used to list all the
NUMA logical domains of the platform, in ascending order, and allows for
spartial NUMA configurations (although QEMU ATM doesn't support that).
ibm,numa-distance-table is an array that contains all the distances from
the first NUMA node to all other nodes, then the second NUMA node
distances to all other nodes and so on;
- get_max_dist_ref_points(), get_numa_assoc_size() and get_associativity()
now checks for OV5_FORM2_AFFINITY and returns FORM2 values if the guest
selected FORM2 affinity during CAS.
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210920174947.556324-7-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
FORM2 NUMA affinity is prepared to deal with empty (memory/cpu less)
NUMA nodes. This is used by the DAX KMEM driver to locate a PAPR SCM
device that has a different latency than the original NUMA node from the
regular memory. FORM2 is also able to deal with asymmetric NUMA
distances gracefully, something that our FORM1 implementation doesn't
do.
Move these FORM1 verifications to a new function and wait until after
CAS, when we're sure that we're sticking with FORM1, to enforce them.
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210920174947.556324-6-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Introducing a new NUMA affinity, FORM2, requires a new mechanism to
switch between affinity modes after CAS. Also, we want FORM2 data
structures and functions to be completely separated from the existing
FORM1 code, allowing us to avoid adding new code that inherits the
existing complexity of FORM1.
The idea of switching values used by the write_dt() functions in
spapr_numa.c was already introduced in the previous patch, and
the same approach will be used when dealing with the FORM1 and FORM2
arrays.
We can accomplish that by that by renaming the existing numa_assoc_array
to FORM1_assoc_array, which now is used exclusively to handle FORM1 affinity
data. A new helper get_associativity() is then introduced to be used by the
write_dt() functions to retrieve the current ibm,associativity array of
a given node, after considering affinity selection that might have been
done during CAS. All code that was using numa_assoc_array now needs to
retrieve the array by calling this function.
This will allow for an easier plug of FORM2 data later on.
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210920174947.556324-5-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The next preliminary step to introduce NUMA FORM2 affinity is to make
the existing code independent of FORM1 macros and values, i.e.
MAX_DISTANCE_REF_POINTS, NUMA_ASSOC_SIZE and VCPU_ASSOC_SIZE. This patch
accomplishes that by doing the following:
- move the NUMA related macros from spapr.h to spapr_numa.c where they
are used. spapr.h gets instead a 'NUMA_NODES_MAX_NUM' macro that is used
to refer to the maximum number of NUMA nodes, including GPU nodes, that
the machine can support;
- MAX_DISTANCE_REF_POINTS and NUMA_ASSOC_SIZE are renamed to
FORM1_DIST_REF_POINTS and FORM1_NUMA_ASSOC_SIZE. These FORM1 specific
macros are used in FORM1 init functions;
- code that uses MAX_DISTANCE_REF_POINTS now retrieves the
max_dist_ref_points value using get_max_dist_ref_points().
NUMA_ASSOC_SIZE is replaced by get_numa_assoc_size() and VCPU_ASSOC_SIZE
is replaced by get_vcpu_assoc_size(). These functions are used by the
generic device tree functions and h_home_node_associativity() and will
allow them to switch between FORM1 and FORM2 without changing their core
logic.
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210920174947.556324-4-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
When first introduced, 'legacy_numa' was a way to refer to guests that
either wouldn't be affected by associativity domain calculations, namely
the ones with only 1 NUMA node, and pre 5.2 guests that shouldn't be
affected by it because it would be an userspace change. Calling these
cases 'legacy_numa' was a convenient way to label these cases.
We're about to introduce a new NUMA affinity, FORM2, and this concept
of 'legacy_numa' is now a bit misleading because, although it is called
'legacy' it is in fact a FORM1 exclusive contraint.
This patch removes spapr_machine_using_legacy_numa() and open code the
conditions in each caller. While we're at it, move the chunk inside
spapr_numa_FORM1_affinity_init() that sets all numa_assoc_array domains
with 'node_id' to spapr_numa_define_FORM1_domains(). This chunk was
being executed if !pre_5_2_numa_associativity and num_nodes => 1, the
same conditions in which spapr_numa_define_FORM1_domains() is called
shortly after.
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210920174947.556324-3-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The upcoming FORM2 NUMA affinity will support asymmetric NUMA topologies
and doesn't need be concerned with all the legacy support for older
pseries FORM1 guests.
We're also not going to calculate associativity domains based on numa
distance (via spapr_numa_define_associativity_domains) since the
distances will be written directly into new DT properties.
Let's split FORM1 code into its own functions to allow for easier
insertion of FORM2 logic later on.
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210920174947.556324-2-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
PowerISA v3.0B made tlbie[l] hypervisor privileged when PSR=0 and HR=1.
To allow the check at translation time, we'll use the HR bit of LPCR to
check the MMU mode instead of the PATE.HR.
Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
Message-Id: <20210917114751.206845-3-matheus.ferst@eldorado.org.br>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Add a Host Radix field (hr) in DisasContext with LPCR[HR] value to allow
us to decide between Radix and HPT while validating instructions
arguments. Note that PowerISA v3.1 does not require LPCR[HR] and PATE.HR
to match if the thread is in ultravisor/hypervisor real addressing mode,
so ctx->hr may be invalid if ctx->hv and ctx->dr are set.
Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210917114751.206845-2-matheus.ferst@eldorado.org.br>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
If an unknown pin of the IRQ controller is raised, something is very
wrong in the QEMU model. It is better to abort.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20210920061203.989563-3-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
MEM_UNPLUG_ERROR is deprecated since the introduction of
DEVICE_UNPLUG_GUEST_ERROR. Keep emitting both while the deprecation of
MEM_UNPLUG_ERROR is pending.
CC: Michael S. Tsirkin <mst@redhat.com>
CC: Igor Mammedov <imammedo@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210907004755.424931-8-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Linux Kernel 5.12 is now unisolating CPU DRCs in the device_removal
error path, signalling that the hotunplug process wasn't successful.
This allow us to send a DEVICE_UNPLUG_GUEST_ERROR in drc_unisolate_logical()
to signal this error to the management layer.
We also have another error path in spapr_memory_unplug_rollback() for
configured LMB DRCs. Kernels older than 5.13 will not unisolate the LMBs
in the hotunplug error path, but it will reconfigure them. Let's send
the DEVICE_UNPLUG_GUEST_ERROR event in that code path as well to cover the
case of older kernels.
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210907004755.424931-7-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
At this moment we only provide one event to report a hotunplug error,
MEM_UNPLUG_ERROR. As of Linux kernel 5.12 and QEMU 6.0.0, the pseries
machine is now able to report unplug errors for other device types, such
as CPUs.
Instead of creating a (device_type)_UNPLUG_ERROR for each new device,
create a generic DEVICE_UNPLUG_GUEST_ERROR event that can be used by all
guest side unplug errors in the future. This event has a similar API as
the existing DEVICE_DELETED event, always providing the QOM path of the
device and dev->id if there's any.
With this new generic event, MEM_UNPLUG_ERROR is now marked as deprecated.
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210907004755.424931-6-danielhb413@gmail.com>
[dwg: Correct missing ')' in stubs/qdev.c]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
If we don't have active request, that waiting for this handle to be
received, we should report an error.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210902103805.25686-6-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
OK, that's a big rewrite of the logic.
Pre-patch we have an always running coroutine - connection_co. It does
reply receiving and reconnecting. And it leads to a lot of difficult
and unobvious code around drained sections and context switch. We also
abuse bs->in_flight counter which is increased for connection_co and
temporary decreased in points where we want to allow drained section to
begin. One of these place is in another file: in nbd_read_eof() in
nbd/client.c.
We also cancel reconnect and requests waiting for reconnect on drained
begin which is not correct. And this patch fixes that.
Let's finally drop this always running coroutine and go another way:
do both reconnect and receiving in request coroutines.
The detailed list of changes below (in the sequence of diff hunks).
1. receiving coroutines are woken directly from nbd_channel_error, when
we change s->state
2. nbd_co_establish_connection_cancel(): we don't have drain_begin now,
and in nbd_teardown_connection() all requests should already be
finished (and reconnect is done from request). So
nbd_co_establish_connection_cancel() is called from
nbd_cancel_in_flight() (to cancel the request that is doing
nbd_co_establish_connection()) and from reconnect_delay_timer_cb()
(previously we didn't need it, as reconnect delay only should cancel
active requests not the reconnection itself). But now reconnection
itself is done in the separate thread (we now call
nbd_client_connection_enable_retry() in nbd_open()), and we need to
cancel the requests that wait in nbd_co_establish_connection()
now).
2A. We do receive headers in request coroutine. But we also should
dispatch replies for other pending requests. So,
nbd_connection_entry() is turned into nbd_receive_replies(), which
does reply dispatching while it receives other request headers, and
returns when it receives the requested header.
3. All old staff around drained sections and context switch is dropped.
In details:
- we don't need to move connection_co to new aio context, as we
don't have connection_co anymore
- we don't have a fake "request" of connection_co (extra increasing
in_flight), so don't care with it in drain_begin/end
- we don't stop reconnection during drained section anymore. This
means that drain_begin may wait for a long time (up to
reconnect_delay). But that's an improvement and more correct
behavior see below[*]
4. In nbd_teardown_connection() we don't have to wait for
connection_co, as it is dropped. And cleanup for s->ioc and nbd_yank
is moved here from removed connection_co.
5. In nbd_co_do_establish_connection() we now should handle
NBD_CLIENT_CONNECTING_NOWAIT: if new request comes when we are in
NBD_CLIENT_CONNECTING_NOWAIT, it still should call
nbd_co_establish_connection() (who knows, maybe the connection was
already established by another thread in the background). But we
shouldn't wait: if nbd_co_establish_connection() can't return new
channel immediately the request should fail (we are in
NBD_CLIENT_CONNECTING_NOWAIT state).
6. nbd_reconnect_attempt() is simplified: it's now easier to wait for
other requests in the caller, so here we just assert that fact.
Also delay time is now initialized here: we can easily detect first
attempt and start a timer.
7. nbd_co_reconnect_loop() is dropped, we don't need it. Reconnect
retries are fully handle by thread (nbd/client-connection.c), delay
timer we initialize in nbd_reconnect_attempt(), we don't have to
bother with s->drained and friends. nbd_reconnect_attempt() now
called from nbd_co_send_request().
8. nbd_connection_entry is dropped: reconnect is now handled by
nbd_co_send_request(), receiving reply is now handled by
nbd_receive_replies(): all handled from request coroutines.
9. So, welcome new nbd_receive_replies() called from request coroutine,
that receives reply header instead of nbd_connection_entry().
Like with sending requests, only one coroutine may receive in a
moment. So we introduce receive_mutex, which is locked around
nbd_receive_reply(). It also protects some related fields. Still,
full audit of thread-safety in nbd driver is a separate task.
New function waits for a reply with specified handle being received
and works rather simple:
Under mutex:
- if current handle is 0, do receive by hand. If another handle
received - switch to other request coroutine, release mutex and
yield. Otherwise return success
- if current handle == requested handle, we are done
- otherwise, release mutex and yield
10: in nbd_co_send_request() we now do nbd_reconnect_attempt() if
needed. Also waiting in free_sema queue we now wait for one of two
conditions:
- connectED, in_flight < MAX_NBD_REQUESTS (so we can start new one)
- connectING, in_flight == 0, so we can call
nbd_reconnect_attempt()
And this logic is protected by s->send_mutex
Also, on failure we don't have to care of removed s->connection_co
11. nbd_co_do_receive_one_chunk(): now instead of yield() and wait for
s->connection_co we just call new nbd_receive_replies().
12. nbd_co_receive_one_chunk(): place where s->reply.handle becomes 0,
which means that handling of the whole reply is finished. Here we
need to wake one of coroutines sleeping in nbd_receive_replies().
If none are sleeping - do nothing. That's another behavior change: we
don't have endless recv() in the idle time. It may be considered as
a drawback. If so, it may be fixed later.
13. nbd_reply_chunk_iter_receive(): don't care about removed
connection_co, just ping in_flight waiters.
14. Don't create connection_co, enable retry in the connection thread
(we don't have own reconnect loop anymore)
15. We now need to add a nbd_co_establish_connection_cancel() call in
nbd_cancel_in_flight(), to cancel the request that is doing a
connection attempt.
[*], ok, now we don't cancel reconnect on drain begin. That's correct:
reconnect feature leads to possibility of long-running requests (up
to reconnect delay). Still, drain begin is not a reason to kill
long requests. We should wait for them.
This also means, that we can again reproduce a dead-lock, described
in 8c517de24a.
Why we are OK with it:
1. Now this is not absolutely-dead dead-lock: the vm is unfrozen
after reconnect delay. Actually 8c517de24a fixed a bug in
NBD logic, that was not described in 8c517de24a and led to
forever dead-lock. The problem was that nobody woke the free_sema
queue, but drain_begin can't finish until there is a request in
free_sema queue. Now we have a reconnect delay timer that works
well.
2. It's not a problem of the NBD driver, but of the ide code,
because it does drain_begin under the global mutex; the problem
doesn't reproduce when using scsi instead of ide.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210902103805.25686-5-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[eblake: grammar and comment tweaks]
Signed-off-by: Eric Blake <eblake@redhat.com>
Split out nbd_recv_coroutine_wake_one(), as it will be used
separately.
Rename the function and add a possibility to wake only first found
sleeping coroutine.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210902103805.25686-4-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[eblake: grammar tweak]
Signed-off-by: Eric Blake <eblake@redhat.com>
We are going to use it in nbd_channel_error(), so move it up. Note,
that we are going also refactor and rename
nbd_recv_coroutines_wake_all() in future anyway, so keeping it where it
is and making forward declaration doesn't make real sense.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20210902103805.25686-3-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Don't rely on connection being totally broken in case of -EIO. Safer
and more correct is to just shut down the channel anyway, since we
change the state and plan on reconnecting.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20210902103805.25686-2-vsementsov@virtuozzo.com>
[eblake: grammar tweaks]
Signed-off-by: Eric Blake <eblake@redhat.com>
When we don't have a connection and blocking is false, we return NULL
but don't set errp. That's wrong.
We have two paths for calling nbd_co_establish_connection():
1. nbd_open() -> nbd_do_establish_connection() -> ...
but that will never set blocking=false
2. nbd_reconnect_attempt() -> nbd_co_do_establish_connection() -> ...
but that uses errp=NULL
So, we are safe with our wrong errp policy in
nbd_co_establish_connection(). Still let's fix it.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210906190654.183421-2-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
The NBD protocol just relaxed the requirements on
NBD_OPT_LIST_META_CONTEXT:
https://github.com/NetworkBlockDevice/nbd/commit/13a4e33a87
Since listing is not stateful (unlike SET_META_CONTEXT), we don't care
if a client asks for meta contexts without first requesting structured
replies. Well-behaved clients will still ask for structured reply
first (if for no other reason than for back-compat to older servers),
but that's no reason to avoid this change.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20210907173505.1499709-1-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>