hw/sd/sdhci: Default I/O ops to little endian
hw/mips/loongson3-virt: Only use default USB if available
hw/char/escc: Implement loopback mode to allow self-testing
target/mips: Avoid overruns and shifts by negative number
target/sparc: Handle FPRS correctly on big-endian hosts
target/tricore: Rename tricore_feature to avoid clash with libcapstone
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE+qvnXhKRciHc/Wuy4+MsLN6twN4FAmS/4ksACgkQ4+MsLN6t
wN6OSg//cZY9C6fRXNNaIqkmnhjbaV6KLtjE7mOKp0RUyh3aN0dtTwWIjdJc0O5C
iipHESYhcbHTiN/TxK0zXg4KgtKmtwqGsa3QTXGdTlSkTY/dMNioSpb7p82becu0
fhCvGRLJ97j7/mhebiBNT/urrcG5h3n7CjA5IoFMMA4f+cajsGZHwmq5TTzc2ehy
4FuchjFUw+cgqU1peNYoqt2dfnxFg0EgKBSRikl8MyPf9lFzTlXOKbgd+qppG6hI
2fAUHyMqBkU22sAoK0eB0077LjgjPPQfmn8UPGkpGD5QZQcvBRNArg4fyHxCKTS7
zOsO1Qc+4D2l2RJlIHgct2pmcHdT29TlTn2T4Lg900Hm09KelZh1XF+1BemCC13z
cGWjPcYozvGFFiHlhazINtbGpB6XaP/Z3OwroRHRn+Mn3ss+FaU+j/p+4YlEVyFi
4yoEyjhNma6/hssmstifSQsaOf6XthzpS+XdKNB6G1b2WuRSc1Z59b2gcPBTwbXY
B52lfI61nzSrP9pLuS8c/6hQXQvADIEndeWEcWZ50h3WW2Cemj9jTDVgfjWC4Vg9
wV2U6NeTr+g54cSU5vcKiZrqsQHUoLiKbZFRJkXF7EEMbOErIQnyIS5l8xf71Pay
YPxuPf1VprRiR07d+ZaA+wmEaBxLCUPEl1CEuu5NPVA9S4yIIWE=
=F+Wb
-----END PGP SIGNATURE-----
Merge tag 'misc-fixes-20230725' of https://github.com/philmd/qemu into staging
Misc patches queue
hw/sd/sdhci: Default I/O ops to little endian
hw/mips/loongson3-virt: Only use default USB if available
hw/char/escc: Implement loopback mode to allow self-testing
target/mips: Avoid overruns and shifts by negative number
target/sparc: Handle FPRS correctly on big-endian hosts
target/tricore: Rename tricore_feature to avoid clash with libcapstone
# -----BEGIN PGP SIGNATURE-----
#
# iQIzBAABCAAdFiEE+qvnXhKRciHc/Wuy4+MsLN6twN4FAmS/4ksACgkQ4+MsLN6t
# wN6OSg//cZY9C6fRXNNaIqkmnhjbaV6KLtjE7mOKp0RUyh3aN0dtTwWIjdJc0O5C
# iipHESYhcbHTiN/TxK0zXg4KgtKmtwqGsa3QTXGdTlSkTY/dMNioSpb7p82becu0
# fhCvGRLJ97j7/mhebiBNT/urrcG5h3n7CjA5IoFMMA4f+cajsGZHwmq5TTzc2ehy
# 4FuchjFUw+cgqU1peNYoqt2dfnxFg0EgKBSRikl8MyPf9lFzTlXOKbgd+qppG6hI
# 2fAUHyMqBkU22sAoK0eB0077LjgjPPQfmn8UPGkpGD5QZQcvBRNArg4fyHxCKTS7
# zOsO1Qc+4D2l2RJlIHgct2pmcHdT29TlTn2T4Lg900Hm09KelZh1XF+1BemCC13z
# cGWjPcYozvGFFiHlhazINtbGpB6XaP/Z3OwroRHRn+Mn3ss+FaU+j/p+4YlEVyFi
# 4yoEyjhNma6/hssmstifSQsaOf6XthzpS+XdKNB6G1b2WuRSc1Z59b2gcPBTwbXY
# B52lfI61nzSrP9pLuS8c/6hQXQvADIEndeWEcWZ50h3WW2Cemj9jTDVgfjWC4Vg9
# wV2U6NeTr+g54cSU5vcKiZrqsQHUoLiKbZFRJkXF7EEMbOErIQnyIS5l8xf71Pay
# YPxuPf1VprRiR07d+ZaA+wmEaBxLCUPEl1CEuu5NPVA9S4yIIWE=
# =F+Wb
# -----END PGP SIGNATURE-----
# gpg: Signature made Tue 25 Jul 2023 15:55:07 BST
# gpg: using RSA key FAABE75E12917221DCFD6BB2E3E32C2CDEADC0DE
# gpg: Good signature from "Philippe Mathieu-Daudé (F4BUG) <f4bug@amsat.org>" [full]
# Primary key fingerprint: FAAB E75E 1291 7221 DCFD 6BB2 E3E3 2C2C DEAD C0DE
* tag 'misc-fixes-20230725' of https://github.com/philmd/qemu:
target/tricore: Rename tricore_feature
target/sparc: Handle FPRS correctly on big-endian hosts
target/mips: Avoid shift by negative number in page_table_walk_refill()
target/mips: Pass directory/leaf shift values to walk_directory()
target/mips/mxu: Avoid overrun in gen_mxu_q8adde()
target/mips/mxu: Avoid overrun in gen_mxu_S32SLT()
target/mips/mxu: Replace magic array size by its definition
hw/char/escc: Implement loopback mode
hw/mips: Improve the default USB settings in the loongson3-virt machine
hw/sd/sdhci: Do not force sdhci_mmio_*_ops onto all SD controllers
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
This change is cosmetic. A comment is added explaining why we need to check for
the availability of function 0 when we hotplug a device.
CC: mst@redhat.com
CC: mjt@tls.msk.ru
Signed-off-by: Ani Sinha <anisinha@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
The firmware of the m68k next-cube machine uses the loopback mode
for self-testing the hardware and currently fails during this step.
By implementing the loopback mode, we can make the firmware pass
to the next step.
Signed-off-by: Thomas Huth <huth@tuxfamily.org>
Reviewed-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20230716153519.31722-1-huth@tuxfamily.org>
It's possible to compile QEMU without the USB devices (e.g. when using
"--without-default-devices" as option for the "configure" script).
To be still able to run the loongson3-virt machine in default mode with
such a QEMU binary, we have to check here for the availability of the
OHCI controller first before instantiating the USB devices.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20230714104903.284845-1-thuth@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Since commit c0a55a0c9d "hw/sd/sdhci: Support big endian SD host controller
interfaces" sdhci_common_realize() forces all SD card controllers to use either
sdhci_mmio_le_ops or sdhci_mmio_be_ops, depending on the "endianness" property.
However, there are device models which use different MMIO ops: TYPE_IMX_USDHC
uses usdhc_mmio_ops and TYPE_S3C_SDHCI uses sdhci_s3c_mmio_ops.
Forcing sdhci_mmio_le_ops breaks SD card handling on the "sabrelite" board, for
example. Fix this by defaulting the io_ops to little endian and switch to big
endian in sdhci_common_realize() only if there is a matchig big endian variant
available.
Fixes: c0a55a0c9d ("hw/sd/sdhci: Support big endian SD host controller
interfaces")
Signed-off-by: Bernhard Beschow <shentey@gmail.com>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Message-Id: <20230709080950.92489-1-shentey@gmail.com>
The implementation of the SMMUv3 has multiple places where it reads a
data structure from the guest and directly operates on it without
doing a guest-to-host endianness conversion. Since all SMMU data
structures are little-endian, this means that the SMMU doesn't work
on a big-endian host. In particular, this causes the Avocado test
machine_aarch64_virt.py:Aarch64VirtMachine.test_alpine_virt_tcg_gic_max
to fail on an s390x host.
Add appropriate byte-swapping on reads and writes of guest in-memory
data structures so that the device works correctly on big-endian
hosts.
As part of this we constrain queue_read() to operate only on Cmd
structs and queue_write() on Evt structs, because in practice these
are the only data structures the two functions are used with, and we
need to know what the data structure is to be able to byte-swap its
parts correctly.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Tested-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Message-id: 20230717132641.764660-1-peter.maydell@linaro.org
Cc: qemu-stable@nongnu.org
* Fix LMUL check to use VLEN
* Fix typo field in NUMA error_report
* check priv_ver before auto-enable zca/zcd/zcf
* Fix disas output of upper immediates
* tidy CPU firmware section
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEaukCtqfKh31tZZKWr3yVEwxTgBMFAmS3akMACgkQr3yVEwxT
gBPQ/BAArrieEkrRco3tIQJFZqTLfII28M0cYdwN+gjMAkL6RlauCh5yKkc+gsGy
bhhpr0AE+EzrjKfJgdyMQe2ZH08WEpoAfJHAmLTSm2ktgIlnDAjyJtVksZ3FSwfG
MRK3v0CChyOav3EfDZzK9jcaXeaSSfjCIG8JW3enoZxf2TnpoXlsCIQdRTnMw7Um
C73BWoOGOfixFehywHBnkkAPo/nkQPofELrRKNTlefAIsH1RcgYw+s3IgCIuYxJN
zCjM1y6ye1aiaQhKcNJiLoiP4Eq2R6vUuL8RKWkXqTP3QBZUqKMPnRVgI+W0qRAj
9DS+l37zMdxytovQ4gmIqnENT8ty9bholOtWM8nI54subJBplQhkRednG3RBFYjH
hqbsakcHfE1lyyNI7WoBpO8UMtnOad6eBNmMOM48VduSdNuBZN3ksoRVomnJTlCY
nq1ZdteywHEZ3uBqk3k/4yzKH+jLj0McPz5FswxsMIGScVjd6H8rMYmM95r1He4k
YTJ8GwnOTBs1tFxOz5DaM3BVfq5hrzB0SbpDHMOdQHNXnqkyfvSd/QWeXfnY09Ux
kbNvSpzjn7wWRSP7s4KMcTmas4oGtPS2dheREB/gmoC1ubrfuhbzduDNXJt+omuC
GDcn9cpouyE/Vp/358PuEe1gW9GFMH0CbYBJ66P0hI/76iPfwLY=
=MOsI
-----END PGP SIGNATURE-----
Merge tag 'pull-riscv-to-apply-20230719-1' of https://github.com/alistair23/qemu into staging
Fourth RISC-V PR for 8.1
* Fix LMUL check to use VLEN
* Fix typo field in NUMA error_report
* check priv_ver before auto-enable zca/zcd/zcf
* Fix disas output of upper immediates
* tidy CPU firmware section
# -----BEGIN PGP SIGNATURE-----
#
# iQIzBAABCAAdFiEEaukCtqfKh31tZZKWr3yVEwxTgBMFAmS3akMACgkQr3yVEwxT
# gBPQ/BAArrieEkrRco3tIQJFZqTLfII28M0cYdwN+gjMAkL6RlauCh5yKkc+gsGy
# bhhpr0AE+EzrjKfJgdyMQe2ZH08WEpoAfJHAmLTSm2ktgIlnDAjyJtVksZ3FSwfG
# MRK3v0CChyOav3EfDZzK9jcaXeaSSfjCIG8JW3enoZxf2TnpoXlsCIQdRTnMw7Um
# C73BWoOGOfixFehywHBnkkAPo/nkQPofELrRKNTlefAIsH1RcgYw+s3IgCIuYxJN
# zCjM1y6ye1aiaQhKcNJiLoiP4Eq2R6vUuL8RKWkXqTP3QBZUqKMPnRVgI+W0qRAj
# 9DS+l37zMdxytovQ4gmIqnENT8ty9bholOtWM8nI54subJBplQhkRednG3RBFYjH
# hqbsakcHfE1lyyNI7WoBpO8UMtnOad6eBNmMOM48VduSdNuBZN3ksoRVomnJTlCY
# nq1ZdteywHEZ3uBqk3k/4yzKH+jLj0McPz5FswxsMIGScVjd6H8rMYmM95r1He4k
# YTJ8GwnOTBs1tFxOz5DaM3BVfq5hrzB0SbpDHMOdQHNXnqkyfvSd/QWeXfnY09Ux
# kbNvSpzjn7wWRSP7s4KMcTmas4oGtPS2dheREB/gmoC1ubrfuhbzduDNXJt+omuC
# GDcn9cpouyE/Vp/358PuEe1gW9GFMH0CbYBJ66P0hI/76iPfwLY=
# =MOsI
# -----END PGP SIGNATURE-----
# gpg: Signature made Wed 19 Jul 2023 05:44:51 BST
# gpg: using RSA key 6AE902B6A7CA877D6D659296AF7C95130C538013
# gpg: Good signature from "Alistair Francis <alistair@alistair23.me>" [unknown]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 6AE9 02B6 A7CA 877D 6D65 9296 AF7C 9513 0C53 8013
* tag 'pull-riscv-to-apply-20230719-1' of https://github.com/alistair23/qemu:
target/riscv: Fix LMUL check to use VLEN
hw/riscv: Fix typo field in error_report
target/riscv/cpu.c: check priv_ver before auto-enable zca/zcd/zcf
riscv/disas: Fix disas output of upper immediates
docs/system/target-riscv.rst: tidy CPU firmware section
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
In commit 2fda0726e5 ("hw/nvme: fix missing endian conversions for
doorbell buffers"), we fixed shadow doorbells for big-endian guests
running on little endian hosts. But I did not fix little-endian guests
on big-endian hosts. Fix this.
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1765
Fixes: 3f7fe8de3d ("hw/nvme: Implement shadow doorbell buffer support")
Cc: qemu-stable@nongnu.org
Reported-by: Thomas Huth <thuth@redhat.com>
Tested-by: Cédric Le Goater <clg@redhat.com>
Tested-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
"smp.cpus" means the number of online CPUs and "smp.max_cpus" means the
total number of CPUs.
riscv_numa_get_default_cpu_node_id() checks "smp.cpus" and the
"available CPUs" description in the next error message also indicates
online CPUs.
So report "smp.cpus" in error_report() instand of "smp.max_cpus".
Since "smp.cpus" is "unsigned int", use "%u".
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-Id: <20230718080712.503333-1-zhao1.liu@linux.intel.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
If QEMU is built with --without-default-devices, the s390-flic-kvm
device is missing and QEMU aborts when started with the KVM accelerator.
Make sure it's available by selecting S390_FLIC_KVM in Kconfig.
Consequently, this also fixes an abort in tests/qtest/migration-test.
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Message-Id: <20230711151440.716822-1-clg@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Replace 'width' and 'height' in QemuDmaBuf with 'backing_widht'
and 'backing_height' as these commonly indicate the size of the
whole surface (e.g. guest's Xorg extended display). Then use
'width' and 'height' for sub region in there (e.g. guest's
scanouts).
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Marc-André Lureau <marcandre.lureau@redhat.com>
Cc: Vivek Kasireddy <vivek.kasireddy@intel.com>
Signed-off-by: Dongwon Kim <dongwon.kim@intel.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-ID: <20230713040444.32267-1-dongwon.kim@intel.com>
The primary guest scanout shows the booting screen right after reboot
but additional guest displays (i.e. max_ouptuts > 1) will keep displaying
the old frames until the guest virtio gpu driver gets initialized, which
could cause some confusion. A better way is to to replace the surface with
a place holder that tells the display is not active during the reset of
virtio-gpu device.
And to immediately update the surface with the place holder image after
the switch, displaychangelistener_gfx_switch needs to be called with
'update == TRUE' in dpy_gfx_replace_surface when the new surface is NULL.
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Marc-André Lureau <marcandre.lureau@redhat.com>
Cc: Vivek Kasireddy <vivek.kasireddy@intel.com>
Signed-off-by: Dongwon Kim <dongwon.kim@intel.com>
Acked-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-ID: <20230627224451.11739-1-dongwon.kim@intel.com>
Commit 9462ff4695 ("virtio-gpu/win32: allocate shareable 2d
resources/images") introduces a division, which can lead to crashes when
"height" is 0.
Fixes: https://gitlab.com/qemu-project/qemu/-/issues/1744
Reviewed-by: Alexander Bulekov <alxndr@bu.edu>
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Add a check in the bit-set operation to write the backstore
only if the affected bit is 0 before.
With this in place, there will be no need for callers to
do the checking in order to avoid unnecessary writes.
Signed-off-by: Tong Ho <tong.ho@amd.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Francisco Iglesias <frasse.iglesias@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
This extends the slots of xhci to 64, since the default xhci_sysbus
just supports one slot.
Signed-off-by: Wang Yuquan <wangyuquan1236@phytium.com.cn>
Signed-off-by: Chen Baozi <chenbaozi@phytium.com.cn>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Marcin Juszkiewicz <marcin.juszkiewicz@linaro.org>
Tested-by: Marcin Juszkiewicz <marcin.juszkiewicz@linaro.org>
Message-id: 20230710063750.473510-2-wangyuquan1236@phytium.com.cn
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The ppi command line option for the TIS device on sysbus never worked
and caused an immediate segfault. Remove support for it since it also
needs support in the firmware and needs testing inside the VM.
Reproducer with the ppi=on option passed:
qemu-system-aarch64 \
-machine virt,gic-version=3 \
-m 4G \
-nographic -no-acpi \
-chardev socket,id=chrtpm,path=/tmp/mytpm1/swtpm-sock \
-tpmdev emulator,id=tpm0,chardev=chrtpm \
-device tpm-tis-device,tpmdev=tpm0,ppi=on
[...]
Segmentation fault (core dumped)
Signed-off-by: Stefan Berger <stefanb@linux.ibm.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Message-id: 20230713171955.149236-1-stefanb@linux.ibm.com
scsi_clear_unit_attention() now only handles REPORTED LUNS DATA HAS
CHANGED.
This only happens when we handle REPORT LUNS commands, so let's rename
the function in scsi_clear_reported_luns_changed() and call it only in
scsi_target_emulate_report_luns().
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-ID: <20230712134352.118655-4-sgarzare@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The previous commit moved the unit attention clearing when we create
the request. So now we can clean scsi_clear_unit_attention() to handle
only the case of the REPORT LUNS command: this is the only case in
which a UNIT ATTENTION is cleared without having been reported.
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-ID: <20230712134352.118655-3-sgarzare@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Commit 1880ad4f4e ("virtio-scsi: Batched prepare for cmd reqs") split
calls to scsi_req_new() and scsi_req_enqueue() in the virtio-scsi device.
No ill effects were observed until commit 8cc5583abe ("virtio-scsi: Send
"REPORTED LUNS CHANGED" sense data upon disk hotplug events") added a
unit attention that was easy to trigger with device hotplug and
hot-unplug.
Because the two calls were separated, all requests in the batch were
prepared calling scsi_req_new() to report a sense. The first one
submitted would report the right sense and reset it to NO_SENSE, while
the others reported CHECK_CONDITION with no sense data. This caused
SCSI errors in Linux.
To solve this issue, let's fetch the unit attention as early as possible
when we prepare the request, so that only the first request in the batch
will use the unit attention SCSIReqOps and the others will not report
CHECK CONDITION.
Fixes: 1880ad4f4e ("virtio-scsi: Batched prepare for cmd reqs")
Fixes: 8cc5583abe ("virtio-scsi: Send "REPORTED LUNS CHANGED" sense data upon disk hotplug events")
Reported-by: Thomas Huth <thuth@redhat.com>
Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2176702
Co-developed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-ID: <20230712134352.118655-2-sgarzare@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
It is useful to extend the number of available PCIe devices to KVM guests
for passthrough scenarios and also to expose these models to a different
(big endian) architecture. Introduce a new config PCIE_DEVICES to select
models, Intel Ethernet adapters and one USB controller. These devices all
support MSI-X which is a requirement on s390x as legacy INTx are not
supported.
Cc: Matthew Rosato <mjrosato@linux.ibm.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Thomas Huth <thuth@redhat.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Message-ID: <20230712080146.839113-1-clg@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
According to the 82371FB documentation (82371FB.pdf, 2.3.9. BMIBA-BUS
MASTER INTERFACE BASE ADDRESS REGISTER, April 1997), the register is
32bit wide. To properly reset it to default values, all 32bit need to be
cleared. Bit #0 "Resource Type Indicator (RTE)" needs to be enabled.
The initial change wrote just the lower 8 bit, leaving parts of the "Bus
Master Interface Base Address" address at bit 15:4 unchanged.
Fixes: e6a71ae327 ("Add support for 82371FB (Step A1) and Improved support for 82371SB (Function 1)")
Signed-off-by: Olaf Hering <olaf@aepfle.de>
Reviewed-by: Bernhard Beschow <shentey@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20230712074721.14728-1-olaf@aepfle.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The main loop thread can consume 100% CPU when using --device
virtio-blk-pci,iothread=<iothread>. ppoll() constantly returns but
reading virtqueue host notifiers fails with EAGAIN. The file descriptors
are stale and remain registered with the AioContext because of bugs in
the virtio-blk dataplane start/stop code.
The problem is that the dataplane start/stop code involves drain
operations, which call virtio_blk_drained_begin() and
virtio_blk_drained_end() at points where the host notifier is not
operational:
- In virtio_blk_data_plane_start(), blk_set_aio_context() drains after
vblk->dataplane_started has been set to true but the host notifier has
not been attached yet.
- In virtio_blk_data_plane_stop(), blk_drain() and blk_set_aio_context()
drain after the host notifier has already been detached but with
vblk->dataplane_started still set to true.
I would like to simplify ->ioeventfd_start/stop() to avoid interactions
with drain entirely, but couldn't find a way to do that. Instead, this
patch accepts the fragile nature of the code and reorders it so that
vblk->dataplane_started is false during drain operations. This way the
virtio_blk_drained_begin() and virtio_blk_drained_end() calls don't
touch the host notifier. The result is that
virtio_blk_data_plane_start() and virtio_blk_data_plane_stop() have
complete control over the host notifier and stale file descriptors are
no longer left in the AioContext.
This patch fixes the 100% CPU consumption in the main loop thread and
correctly moves host notifier processing to the IOThread.
Fixes: 1665d9326f ("virtio-blk: implement BlockDevOps->drained_begin()")
Reported-by: Lukáš Doktor <ldoktor@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Lukas Doktor <ldoktor@redhat.com>
Message-id: 20230704151527.193586-1-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Let's support device unplug by forwarding the unplug_request_check()
callback to the virtio-mem device.
Further, disallow changing the requested-size once an unplug request is
pending.
Disallowing requested-size changes handles corner cases such as
(1) pausing the VM (2) requesting device unplug and (3) adjusting the
requested size. If the VM would plug memory (due to the requested size
change) before processing the unplug request, we would be in trouble.
Message-ID: <20230711153445.514112-8-david@redhat.com>
Tested-by: Mario Casquero <mcasquer@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
In many cases, blindly unplugging a virtio-mem device is problematic. We
can only safely remove a device once:
* The guest is not expecting to be able to read unplugged memory
(unplugged-inaccessible == on)
* The virtio-mem device does not have memory plugged (size == 0)
* The virtio-mem device does not have outstanding requests to the VM to
plug memory (requested-size == 0)
So let's add a callback to the virtio-mem device class to check for that.
We'll wire-up virtio-mem-pci next.
Message-ID: <20230711153445.514112-7-david@redhat.com>
Tested-by: Mario Casquero <mcasquer@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Let's support unplug requests for virtio-md-pci devices that provide
a unplug_request_check() callback.
We'll wire that up for virtio-mem-pci next.
Message-ID: <20230711153445.514112-6-david@redhat.com>
Tested-by: Mario Casquero <mcasquer@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
While we fence unplug requests from the outside, the VM can still
trigger unplug of virtio based memory devices, for example, in Linux
doing on a virtio-mem-pci device:
# echo 0 > /sys/bus/pci/slots/3/power
While doing that is not really expected to work without harming the
guest OS (e.g., removing a virtio-mem device while it still provides
memory), let's make sure that we properly handle it on the QEMU side.
We'll add support for unplugging of virtio-mem devices in some
configurations next.
Message-ID: <20230711153445.514112-5-david@redhat.com>
Tested-by: Mario Casquero <mcasquer@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Let's use our new helper functions. Note that virtio-pmem-pci is not
enabled for arm and, therefore, not compiled in.
Message-ID: <20230711153445.514112-4-david@redhat.com>
Tested-by: Mario Casquero <mcasquer@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Let's factor out (un)plug handling, to be reused from arm/virt code.
Provide stubs for the case that CONFIG_VIRTIO_MD is not selected because
neither virtio-mem nor virtio-pmem is enabled. While this cannot
currently happen for x86, it will be possible for arm/virt.
Message-ID: <20230711153445.514112-3-david@redhat.com>
Tested-by: Mario Casquero <mcasquer@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Let's add a new abstract "virtio memory device" type, and use it as
parent class of virtio-mem-pci and virtio-pmem-pci.
Message-ID: <20230711153445.514112-2-david@redhat.com>
Tested-by: Mario Casquero <mcasquer@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
To achieve desired "x-ignore-shared" functionality, we should not
discard all RAM when realizing the device and not mess with
preallocation/postcopy when loading device state. In essence, we should
not touch RAM content.
As "x-ignore-shared" gets set after realizing the device, we cannot
rely on that. Let's simply skip discarding of RAM on incoming migration.
Note that virtio_mem_post_load() will call
virtio_mem_restore_unplugged() -- unless "x-ignore-shared" is set. So
once migration finished we'll have a consistent state.
The initial system reset will also not discard any RAM, because
virtio_mem_unplug_all() will not call virtio_mem_unplug_all() when no
memory is plugged (which is the case before loading the device state).
Note that something like VM templating -- see commit b17fbbe55c
("migration: allow private destination ram with x-ignore-shared") -- is
currently incompatible with virtio-mem and ram_block_discard_range() will
warn in case a private file mapping is supplied by virtio-mem.
For VM templating with virtio-mem, it makes more sense to either
(a) Create the template without the virtio-mem device and hotplug a
virtio-mem device to the new VM instances using proper own memory
backend.
(b) Use a virtio-mem device that doesn't provide any memory in the
template (requested-size=0) and use private anonymous memory.
Message-ID: <20230706075612.67404-5-david@redhat.com>
Tested-by: Mario Casquero <mcasquer@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Already when starting QEMU we perform one system reset that ends up
triggering virtio_mem_unplug_all() with no actual memory plugged yet.
That, in turn will trigger ram_block_discard_range() and perform some
other actions that are not required in that case.
Let's optimize virtio_mem_unplug_all() for the case that no memory is
plugged. This will be beneficial for x-ignore-shared support as well.
Message-ID: <20230706075612.67404-3-david@redhat.com>
Tested-by: Mario Casquero <mcasquer@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Let's avoid iterating over all devices and simply track it in the
DeviceMemoryState.
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20230623124553.400585-11-david@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Let's move memory_device_check_addable() and basic checks out of
memory_device_get_free_addr() directly into memory_device_pre_plug().
Separating basic checks from address assignment is cleaner and
prepares for further changes.
As all memory device users now use memory_devices_init(), and that
function enforces that the size is 0, we can drop the check for an empty
region.
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20230623124553.400585-10-david@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
There are no remaining users in the tree. Libvirt never used that
property and a quick internet search revealed no other users.
Further, we renamed that property already in commit f2ffbe2b7d
("pc: rename "hotplug memory" terminology to "device memory"") without
anybody complaining.
So let's just get rid of it.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Eduardo Habkost <eduardo@habkost.net>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20230623124553.400585-9-david@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
We're already looking at machine->device_memory when calling
build_srat_memory(), so let's simply avoid going via
PC_MACHINE_DEVMEM_REGION_SIZE to get the size and rely on
machine->device_memory directly.
Once machine->device_memory is set, we know that the size > 0. The code now
looks much more similar the hw/arm/virt-acpi-build.c variant.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Eduardo Habkost <eduardo@habkost.net>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20230623124553.400585-8-david@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Let's use our new helper and stop always allocating ms->device_memory.
Once allcoated, we're sure that the size > 0 and that the base was
initialized.
Adjust the code in pc_memory_init() to check for machine->device_memory
instead of pcmc->has_reserved_memory and machine->device_memory->base.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Eduardo Habkost <eduardo@habkost.net>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20230623124553.400585-7-david@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Let's use our new helper. While at it, use VIRT_HIGHMEM_BASE.
Cc: Xiaojuan Yang <yangxiaojuan@loongson.cn>
Cc: Song Gao <gaosong@loongson.cn>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Song Gao <gaosong@loongson.cn>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20230623124553.400585-6-david@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Let's use our new helper and stop always allocating ms->device_memory.
There is no difference in common memory-device code anymore between
ms->device_memory being NULL or the size being 0. So we only have to
teach spapr code that ms->device_memory isn't always around.
We can now modify two maxram_size checks to rely on ms->device_memory
for detecting whether we have memory devices.
Cc: Daniel Henrique Barboza <danielhb413@gmail.com>
Cc: "Cédric Le Goater" <clg@kaod.org>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: Greg Kurz <groug@kaod.org>
Cc: Harsh Prateek Bora <harshpb@linux.ibm.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20230623124553.400585-5-david@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Let's use our new helper. We'll add the subregion to system RAM now
earlier. That shouldn't matter, because the system RAM memory region should
already be alive at that point.
Cc: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20230623124553.400585-4-david@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Let's intrduce a new helper that we will use to replace existing memory
device setup code during machine initialization. We'll enforce that the
size has to be > 0.
Once all machines were converted, we'll only allocate ms->device_memory
if the size > 0.
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20230623124553.400585-3-david@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Let's unify the error messages, such that we can simply stop allocating
ms->device_memory if the size would be 0 (and there are no memory
devices ever).
The case of "not supported by the machine" should barely pop up either
way: if the machine doesn't support memory devices, it usually doesn't
call the pre_plug handler ...
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20230623124553.400585-2-david@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>