Commit 3b8022269c added the capability of named features/profile
extensions to be added in riscv,isa. To do that we had to assign priv
versions for each one of them in isa_edata_arr[]. But this resulted in a
side-effect: vendor CPUs that aren't running priv_version_latest started
to experience warnings for these profile extensions [1]:
| $ qemu-system-riscv32 -M sifive_e
| qemu-system-riscv32: warning: disabling zic64b extension for hart
0x00000000 because privilege spec version does not match
| qemu-system-riscv32: warning: disabling ziccamoa extension for
hart 0x00000000 because privilege spec version does not match
This is benign as far as the CPU behavior is concerned since disabling
both extensions is a no-op (aside from riscv,isa). But the warnings are
unpleasant to deal with, especially because we're sending user warnings
for extensions that users can't enable/disable.
Instead of enabling all named features all the time, separate them by
priv version. During finalize() time, after we decided which
priv_version the CPU is running, enable/disable all the named extensions
based on the priv spec chosen. This will be enough for a bug fix, but as
a future work we should look into how we can name these extensions in a
way that we don't need an explicit ext_name => priv_ver as we're doing
here.
The named extensions being added in isa_edata_arr[] that will be
enabled/disabled based solely on priv version can be removed from
riscv_cpu_named_features[]. 'zic64b' is an extension that can be
disabled based on block sizes so it'll retain its own flag and entry.
[1] https://lists.gnu.org/archive/html/qemu-devel/2024-03/msg02592.html
Reported-by: Clément Chigot <chigot@adacore.com>
Fixes: 3b8022269c ("target/riscv: add riscv,isa to named features")
Suggested-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Tested-by: Clément Chigot <chigot@adacore.com>
Message-ID: <20240312203214.350980-1-dbarboza@ventanamicro.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
an attempt to spawn child process is made
* Reduce priority of POLLHUP handling for socket chardevs
to increase likelihood of pending data being processed
* Fix chardev I/O main loop integration when TLS is enabled
* Fix broken crypto test suite when distro disables
SM4 algorithm
* Improve diagnosis of failed crypto tests
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE2vOm/bJrYpEtDo4/vobrtBUQT98FAmX585EACgkQvobrtBUQ
T98TIg//ekc/f0JrRs68hjmo/vfcHWGHDMbZagj48zZNIn8DhJmQdt+qrCjMrMGW
353nTawFuF3EO9ju/eRLO54T+p1+a3zX8TyO4tL1W+RY9HARPeqssmFemDPfkMfQ
IFGv0M0vaxGZpBna7jlXfDK/hCbJexKoChyT4eSF9H1Tp9o6T2J9AWvB5WTYLoQ2
GzusDqBLKTkKhxMTCqevkFD/yCkgIQKlX8mG188PoJnGMqpGzQLTyw9lo5Npi1nE
nhXa2MrrSfusk0rtwEzT14sQ58U+MF4fLQxUC+knNX81FSv8Q6QDu4Stfhwc+az7
ynO4b/3IzK+VCICb2QM1ZNoTZNLcLfw1jdFTIAt8wiE+BMSySNQtdneURZOynydy
Qd0alPNb4zfVRIGVjoOj38HiOmIKp5riIsUsI03jjBAgJu47tYRi60Tq2t6KxVoP
rpDd5Vmsd0AR+7acO29rp0aLB+x2/ANDY+1N1Xi4tQdblmKIziHPZzx6H49wbwev
8Jdghg10RpbdqIGOfZ9fn13iCDO+1/gy6g/jTe2tMZrZsyov904tDqyUCDCzAbTz
B8lvnr0LfSX2DYBryGEHIa/eMN2TxPuzpvZP0JFO1QxJnOs9w3aHr1T6A1sCV4a3
JjTu71LsomNMXj3t3ImBHzMlgQZoL5Bxoh7b7jbLO4cvnhRbiJk=
=4HKW
-----END PGP SIGNATURE-----
Merge tag 'misc-fixes-pull-request' of https://gitlab.com/berrange/qemu into staging
* Use EPERM for seccomp filter instead of killing QEMU when
an attempt to spawn child process is made
* Reduce priority of POLLHUP handling for socket chardevs
to increase likelihood of pending data being processed
* Fix chardev I/O main loop integration when TLS is enabled
* Fix broken crypto test suite when distro disables
SM4 algorithm
* Improve diagnosis of failed crypto tests
# -----BEGIN PGP SIGNATURE-----
#
# iQIzBAABCAAdFiEE2vOm/bJrYpEtDo4/vobrtBUQT98FAmX585EACgkQvobrtBUQ
# T98TIg//ekc/f0JrRs68hjmo/vfcHWGHDMbZagj48zZNIn8DhJmQdt+qrCjMrMGW
# 353nTawFuF3EO9ju/eRLO54T+p1+a3zX8TyO4tL1W+RY9HARPeqssmFemDPfkMfQ
# IFGv0M0vaxGZpBna7jlXfDK/hCbJexKoChyT4eSF9H1Tp9o6T2J9AWvB5WTYLoQ2
# GzusDqBLKTkKhxMTCqevkFD/yCkgIQKlX8mG188PoJnGMqpGzQLTyw9lo5Npi1nE
# nhXa2MrrSfusk0rtwEzT14sQ58U+MF4fLQxUC+knNX81FSv8Q6QDu4Stfhwc+az7
# ynO4b/3IzK+VCICb2QM1ZNoTZNLcLfw1jdFTIAt8wiE+BMSySNQtdneURZOynydy
# Qd0alPNb4zfVRIGVjoOj38HiOmIKp5riIsUsI03jjBAgJu47tYRi60Tq2t6KxVoP
# rpDd5Vmsd0AR+7acO29rp0aLB+x2/ANDY+1N1Xi4tQdblmKIziHPZzx6H49wbwev
# 8Jdghg10RpbdqIGOfZ9fn13iCDO+1/gy6g/jTe2tMZrZsyov904tDqyUCDCzAbTz
# B8lvnr0LfSX2DYBryGEHIa/eMN2TxPuzpvZP0JFO1QxJnOs9w3aHr1T6A1sCV4a3
# JjTu71LsomNMXj3t3ImBHzMlgQZoL5Bxoh7b7jbLO4cvnhRbiJk=
# =4HKW
# -----END PGP SIGNATURE-----
# gpg: Signature made Tue 19 Mar 2024 20:20:33 GMT
# gpg: using RSA key DAF3A6FDB26B62912D0E8E3FBE86EBB415104FDF
# gpg: Good signature from "Daniel P. Berrange <dan@berrange.com>" [full]
# gpg: aka "Daniel P. Berrange <berrange@redhat.com>" [full]
# Primary key fingerprint: DAF3 A6FD B26B 6291 2D0E 8E3F BE86 EBB4 1510 4FDF
* tag 'misc-fixes-pull-request' of https://gitlab.com/berrange/qemu:
crypto: report which ciphers are being skipped during tests
crypto: use error_abort for unexpected failures
crypto: query gcrypt for cipher availability
crypto: factor out conversion of QAPI to gcrypt constants
Revert "chardev: use a child source for qio input source"
Revert "chardev/char-socket: Fix TLS io channels sending too much data to the backend"
chardev: lower priority of the HUP GSource in socket chardev
seccomp: report EPERM instead of killing process for spawn set
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The "link_depends" key has not been used since commit c46f76d158
("meson: specify fuzz linker script as a project arg", 2020-09-08),
and even before that it was only used for fork-fuzzing which we
removed in commit d2e6f9272d ("fuzz: remove fork-fuzzing scaffolding",
2023-02-16).
So, remove it for a very small simplification of meson.build.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
PAuth makes run timeout on CI so add tests using 'max' without
it and with impdef one.
Signed-off-by: Marcin Juszkiewicz <marcin.juszkiewicz@linaro.org>
Tested-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20240318-sbsa-ref-firmware-update-v3-4-1c33b995a538@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
PAuth makes run timeout on CI so add tests using 'max' without it
and with impdef one.
Signed-off-by: Marcin Juszkiewicz <marcin.juszkiewicz@linaro.org>
Tested-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20240318-sbsa-ref-firmware-update-v3-3-1c33b995a538@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
sbsa-ref is supposed to emulate real hardware so virtio-rng-pci
does not fit here
Signed-off-by: Marcin Juszkiewicz <marcin.juszkiewicz@linaro.org>
Message-Id: <20240318-sbsa-ref-firmware-update-v3-2-1c33b995a538@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
We now have CI job to build those and publish in space with
readable urls.
Firmware is built using Debian 'bookworm' cross toolchain (gcc 12.2.0).
Used versions:
- Trusted Firmware v2.10.2
- Tianocore EDK2 stable202402
- Tianocore EDK2 Platforms code commit 085c2fb
Signed-off-by: Marcin Juszkiewicz <marcin.juszkiewicz@linaro.org>
Tested-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20240318-sbsa-ref-firmware-update-v3-1-1c33b995a538@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
This avoids fetching blobs and tree references for branches we are not
going to worry about. Also skip tag references which are similarly not
useful and keep the default --prune. This keeps the .git data to
around 100M rather than the ~400M even a shallow clone takes.
So we can check the savings we also run a quick du while setting up
the build.
We also have to have special settings of GIT_FETCH_EXTRA_FLAGS for the
Windows build, the migration legacy test and the custom runners. In
the case of the custom runners we also move the free floating variable
to the runner template.
Reviewed-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20240312170011.1688444-1-alex.bennee@linaro.org>
rec->count.score is inside rec, which is freed before rec->count.score is.
Reorder the instructions
Reported by Coverity as CID 1539967.
Cc: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
monitor_puts() doesn't check the monitor pointer, but do_inject_x86_mce()
may have a parameter with NULL monitor pointer. Revert monitor_puts() in
do_inject_x86_mce() to fix, then the fact that we send the same message to
monitor and log is again more obvious.
Fixes: bf0c50d4aa (monitor: expose monitor_puts to rest of code)
Reviwed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Tao Su <tao1.su@linux.intel.com>
Message-ID: <20240320083640.523287-1-tao1.su@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Building dbus-display1.c explicitly as a static library drops -fPIC by
default, which may not be correct if it ends up linked to a shared
library.
Let the target decide how to build the unit, with or without -fPIC. This
makes commit 186acfbaf7 ("tests/qtest: Depend on dbus_display1_dep") no
longer relevant, as dbus-display1.c will be recompiled.
Fixes: c172136ea3 ("meson: ensure dbus-display generated code is built
before other units")
Reported-by: Olaf Hering <olaf@aepfle.de>
Reviewed-by: Michael Tokarev <mjt@tls.msk.ru>
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
ui/curses is the only user of console_select(). Move the implementation
to ui/curses.
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20240319-console-v2-4-3fd6feef321a@daynix.com>
ui/cocoa needs to update the UI info and reset the keyboard state
tracker when switching the console, or the new console will see the
stale UI info or keyboard state. Previously, updating the UI info was
done with cocoa_switch(), but it is meant to be called when the surface
is being replaced, and may be called even when not switching the
console. ui/cocoa never reset the keyboard state, which resulted in
stuck keys.
Add ui/cocoa's own implementation of console_select(), which updates the
UI info and resets the keyboard state tracker.
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20240319-console-v2-3-3fd6feef321a@daynix.com>
console_select() is shared by other displays and a console_select() call
from one of them triggers console switching also in ui/curses,
circumventing key state reinitialization that needs to be performed in
preparation and resulting in stuck keys.
Use its internal state to track the current active console to prevent
such a surprise console switch.
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20240319-console-v2-2-3fd6feef321a@daynix.com>
A chardev-vc used to inherit the size of a graphic console when its
size not explicitly specified, but it often did not make sense. If a
chardev-vc is instantiated during the startup, the active graphic
console has no content at the time, so it will have the size of graphic
console placeholder, which contains no useful information. It's better
to have the standard size of text console instead.
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20240319-console-v2-1-3fd6feef321a@daynix.com>
On gen_ll, if a->imm is zero, make_address_x return src1,
but the load to destination may clobber src1. We use a new
destination to fix this problem.
Fixes: c5af6628f4 (target/loongarch: Extract make_address_i() helper)
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Suggested-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Song Gao <gaosong@loongson.cn>
Message-Id: <20240320013955.1561311-1-gaosong@loongson.cn>
When we use qemu tcg simulation, the page size of bios is 4KB.
When using the level 2 super huge page (page size is 1G) to create the page table,
it is found that the content of the corresponding address space is abnormal,
resulting in the bios can not start the operating system and graphical interface normally.
The lddir and ldpte instruction emulation has
a problem with the use of super huge page processing above level 2.
The page size is not correctly calculated,
resulting in the wrong page size of the table entry found by tlb.
Signed-off-by: Xianglai Li <lixianglai@loongson.cn>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Song Gao <gaosong@loongson.cn>
Message-Id: <20240318070332.1273939-1-lixianglai@loongson.cn>
Interrupt number in loop sentence should be base irq plus
loop index, it is missing on checking whether the irq
is pending.
Fixes: 428a6ef439 ("Add vmstate post_load support")
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Reviewed-by: Song Gao <gaosong@loongson.cn>
Signed-off-by: Song Gao <gaosong@loongson.cn>
Message-Id: <20240313093932.2653518-1-maobibo@loongson.cn>
stdby,e,m was writing data from the wrong half of the register
into memory for cases 0-3.
Fixes: 25460fc5a7 ("target/hppa: Implement STDBY")
Signed-off-by: Sven Schnelle <svens@stackframe.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20240319161921.487080-7-svens@stackframe.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
mfia should return only the iaoq bits without privilege
bits.
Fixes: 98a9cb792c ("target-hppa: Implement system and memory-management insns")
Signed-off-by: Sven Schnelle <svens@stackframe.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Helge Deller <deller@gmx.de>
Message-Id: <20240319161921.487080-6-svens@stackframe.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
When the guest modifies the tb it is currently executing from,
it executes a fic instruction. Exit the tb on such instruction,
otherwise we might execute stale code.
Signed-off-by: Sven Schnelle <svens@stackframe.org>
Message-Id: <20240319161921.487080-5-svens@stackframe.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Since the ciphers can be dynamically disabled at runtime, when running
unit tests it is helpful to report which ciphers we can skipped for
testing.
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
This improves the error diagnosis from the unit test when a cipher
is unexpected not available from
ERROR:../tests/unit/test-crypto-cipher.c:683:test_cipher: assertion failed: (err == NULL)
Bail out! ERROR:../tests/unit/test-crypto-cipher.c:683:test_cipher: assertion failed: (err == NULL)
Aborted (core dumped)
to
Unexpected error in qcrypto_cipher_ctx_new() at ../crypto/cipher-gcrypt.c.inc:262:
./build//tests/unit/test-crypto-cipher: Cannot initialize cipher: Invalid cipher algorithm
Aborted (core dumped)
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Just because a cipher is defined in the gcrypt header file, does not
imply that it can be used. Distros can filter the list of ciphers when
building gcrypt. For example, RHEL-9 disables the SM4 cipher. It is
also possible that running in FIPS mode might dynamically change what
ciphers are available at runtime.
qcrypto_cipher_supports must therefore query gcrypt directly to check
for cipher availability.
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
The conversion of cipher mode will shortly be required in more
than one place.
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
This reverts commit a7077b8e35,
and add comments to explain why child sources cannot be used.
When a GSource is added as a child of another GSource, if its
'prepare' function indicates readiness, then the parent's
'prepare' function will never be run. The io_watch_poll_prepare
absolutely *must* be run on every iteration of the main loop,
to ensure that the chardev backend doesn't feed data to the
frontend that it is unable to consume.
At the time a7077b8e35 was made,
all the child GSource impls were relying on poll'ing an FD,
so their 'prepare' functions would never indicate readiness
ahead of poll() being invoked. So the buggy behaviour was
not noticed and lay dormant.
Relatively recently the QIOChannelTLS impl introduced a
level 2 child GSource, which checks with GNUTLS whether it
has cached any data that was decoded but not yet consumed:
commit ffda5db65a
Author: Antoine Damhet <antoine.damhet@shadow.tech>
Date: Tue Nov 15 15:23:29 2022 +0100
io/channel-tls: fix handling of bigger read buffers
Since the TLS backend can read more data from the underlying QIOChannel
we introduce a minimal child GSource to notify if we still have more
data available to be read.
Signed-off-by: Antoine Damhet <antoine.damhet@shadow.tech>
Signed-off-by: Charles Frey <charles.frey@shadow.tech>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
With this, it is now quite common for the 'prepare' function
on a QIOChannelTLS GSource to indicate immediate readiness,
bypassing the parent GSource 'prepare' function. IOW, the
critical 'io_watch_poll_prepare' is being skipped on some
iterations of the main loop. As a result chardev frontend
asserts are now being triggered as they are fed data they
are not ready to consume.
A reproducer is as follows:
* In terminal 1 run a GNUTLS *echo* server
$ gnutls-serv --echo \
--x509cafile ca-cert.pem \
--x509keyfile server-key.pem \
--x509certfile server-cert.pem \
-p 9000
* In terminal 2 run a QEMU guest
$ qemu-system-s390x \
-nodefaults \
-display none \
-object tls-creds-x509,id=tls0,dir=$PWD,endpoint=client \
-chardev socket,id=con0,host=localhost,port=9000,tls-creds=tls0 \
-device sclpconsole,chardev=con0 \
-hda Fedora-Cloud-Base-39-1.5.s390x.qcow2
After the previous patch revert, but before this patch revert,
this scenario will crash:
qemu-system-s390x: ../hw/char/sclpconsole.c:73: chr_read: Assertion
`size <= SIZE_BUFFER_VT220 - scon->iov_data_len' failed.
This assert indicates that 'tcp_chr_read' was called without
'tcp_chr_read_poll' having first been checked for ability to
receive more data
QEMU's use of a 'prepare' function to create/delete another
GSource is rather a hack and not normally the kind of thing that
is expected to be done by a GSource. There is no mechanism to
force GLib to always run the 'prepare' function of a parent
GSource. The best option is to simply not use the child source
concept, and go back to the functional approach previously
relied on.
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Tested-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
This commit results in unexpected termination of the TLS connection.
When 'fd_can_read' returns 0, the code goes on to pass a zero length
buffer to qio_channel_read. The TLS impl calls into gnutls_recv()
with this zero length buffer, at which point GNUTLS returns an error
GNUTLS_E_INVALID_REQUEST. This is treated as fatal by QEMU's TLS code
resulting in the connection being torn down by the chardev.
Simply skipping the qio_channel_read when the buffer length is zero
is also not satisfactory, as it results in a high CPU burn busy loop
massively slowing QEMU's functionality.
The proper solution is to avoid tcp_chr_read being called at all
unless the frontend is able to accept more data. This will be done
in a followup commit.
This reverts commit 462945cd22
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
The socket chardev often has 2 GSource object registered against the
same FD. One is registered all the time and is just intended to handle
POLLHUP events, while the other gets registered & unregistered on the
fly as the frontend is ready to receive more data or not.
It is very common for poll() to signal a POLLHUP event at the same time
as there is pending incoming data from the disconnected client. It is
therefore essential to process incoming data prior to processing HUP.
The problem with having 2 GSource on the same FD is that there is no
guaranteed ordering of execution between them, so the chardev code may
process HUP first and thus discard data.
This failure scenario is non-deterministic but can be seen fairly
reliably by reverting a7077b8e35, and
then running 'tests/unit/test-char', which will sometimes fail with
missing data.
Ideally QEMU would only have 1 GSource, but that's a complex code
refactoring job. The next best solution is to try to ensure ordering
between the 2 GSource objects. This can be achieved by lowering the
priority of the HUP GSource, so that it is never dispatched if the
main GSource is also ready to dispatch. Counter-intuitively, lowering
the priority of a GSource is done by raising its priority number.
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
When something tries to run one of the spawn syscalls (eg clone),
our seccomp deny filter is set to cause a fatal trap which kills
the process.
This is found to be unhelpful when QEMU has loaded the nvidia
GL library. This tries to spawn a process to modprobe the nvidia
kmod. This is a dubious thing to do, but at the same time, the
code will gracefully continue if this fails. Our seccomp filter
rightly blocks the spawning, but prevent the graceful continue.
Switching to reporting EPERM will make QEMU behave more gracefully
without impacting the level of protect we have.
https://gitlab.com/qemu-project/qemu/-/issues/2116
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
This fix solves the "failed to set up stack guard page" error that has been
reported on Linux hosts where the QEMU coroutine pool exceeds the
vm.max_map_count limit.
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEhpWov9P5fNqsNXdanKSrs4Grc8gFAmX5qq0ACgkQnKSrs4Gr
c8ginQf8DRKzA7K8OivEegKpf0TgGcAcw9/xKc6zJH3X0/GXi1my61tzz+XUkbNy
/R9HRrjBUb4MhSmJzP9kxuPFcBD5fZeipg4eTqtJCdi+DQ57+YypShVpsDrD7eNv
X5dxeeONdWwP+k9JiOj9NtSOMmTKExn/Q/w45G2eeBlJh4yRA+56XN/dDXTFlidm
NEpOGrKbyFKuAf/ZwYmeBr4aqIGTN3UgOVco/rqkGPYPTYpKlCoE5rSTEnQrbR7/
C9KojlrGawJXlKjxfu/6i7yGHrv0eJ2N1VauvR/DHhQvdRhojVVt3NFGG/WJi+cL
CMbxNyYeQJLNFtfPWzokjKEudxkshg==
=lznr
-----END PGP SIGNATURE-----
Merge tag 'block-pull-request' of https://gitlab.com/stefanha/qemu into staging
Pull request
This fix solves the "failed to set up stack guard page" error that has been
reported on Linux hosts where the QEMU coroutine pool exceeds the
vm.max_map_count limit.
# -----BEGIN PGP SIGNATURE-----
#
# iQEzBAABCAAdFiEEhpWov9P5fNqsNXdanKSrs4Grc8gFAmX5qq0ACgkQnKSrs4Gr
# c8ginQf8DRKzA7K8OivEegKpf0TgGcAcw9/xKc6zJH3X0/GXi1my61tzz+XUkbNy
# /R9HRrjBUb4MhSmJzP9kxuPFcBD5fZeipg4eTqtJCdi+DQ57+YypShVpsDrD7eNv
# X5dxeeONdWwP+k9JiOj9NtSOMmTKExn/Q/w45G2eeBlJh4yRA+56XN/dDXTFlidm
# NEpOGrKbyFKuAf/ZwYmeBr4aqIGTN3UgOVco/rqkGPYPTYpKlCoE5rSTEnQrbR7/
# C9KojlrGawJXlKjxfu/6i7yGHrv0eJ2N1VauvR/DHhQvdRhojVVt3NFGG/WJi+cL
# CMbxNyYeQJLNFtfPWzokjKEudxkshg==
# =lznr
# -----END PGP SIGNATURE-----
# gpg: Signature made Tue 19 Mar 2024 15:09:33 GMT
# gpg: using RSA key 8695A8BFD3F97CDAAC35775A9CA4ABB381AB73C8
# gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>" [full]
# gpg: aka "Stefan Hajnoczi <stefanha@gmail.com>" [full]
# Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35 775A 9CA4 ABB3 81AB 73C8
* tag 'block-pull-request' of https://gitlab.com/stefanha/qemu:
coroutine: cap per-thread local pool size
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The coroutine pool implementation can hit the Linux vm.max_map_count
limit, causing QEMU to abort with "failed to allocate memory for stack"
or "failed to set up stack guard page" during coroutine creation.
This happens because per-thread pools can grow to tens of thousands of
coroutines. Each coroutine causes 2 virtual memory areas to be created.
Eventually vm.max_map_count is reached and memory-related syscalls fail.
The per-thread pool sizes are non-uniform and depend on past coroutine
usage in each thread, so it's possible for one thread to have a large
pool while another thread's pool is empty.
Switch to a new coroutine pool implementation with a global pool that
grows to a maximum number of coroutines and per-thread local pools that
are capped at hardcoded small number of coroutines.
This approach does not leave large numbers of coroutines pooled in a
thread that may not use them again. In order to perform well it
amortizes the cost of global pool accesses by working in batches of
coroutines instead of individual coroutines.
The global pool is a list. Threads donate batches of coroutines to when
they have too many and take batches from when they have too few:
.-----------------------------------.
| Batch 1 | Batch 2 | Batch 3 | ... | global_pool
`-----------------------------------'
Each thread has up to 2 batches of coroutines:
.-------------------.
| Batch 1 | Batch 2 | per-thread local_pool (maximum 2 batches)
`-------------------'
The goal of this change is to reduce the excessive number of pooled
coroutines that cause QEMU to abort when vm.max_map_count is reached
without losing the performance of an adequately sized coroutine pool.
Here are virtio-blk disk I/O benchmark results:
RW BLKSIZE IODEPTH OLD NEW CHANGE
randread 4k 1 113725 117451 +3.3%
randread 4k 8 192968 198510 +2.9%
randread 4k 16 207138 209429 +1.1%
randread 4k 32 212399 215145 +1.3%
randread 4k 64 218319 221277 +1.4%
randread 128k 1 17587 17535 -0.3%
randread 128k 8 17614 17616 +0.0%
randread 128k 16 17608 17609 +0.0%
randread 128k 32 17552 17553 +0.0%
randread 128k 64 17484 17484 +0.0%
See files/{fio.sh,test.xml.j2} for the benchmark configuration:
https://gitlab.com/stefanha/virt-playbooks/-/tree/coroutine-pool-fix-sizing
Buglink: https://issues.redhat.com/browse/RHEL-28947
Reported-by: Sanjay Rao <srao@redhat.com>
Reported-by: Boaz Ben Shabat <bbenshab@redhat.com>
Reported-by: Joe Mario <jmario@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20240318183429.1039340-1-stefanha@redhat.com>
* user device fixes for Aspeed and PowerNV machines
* coverity fix for iommufd
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEoPZlSPBIlev+awtgUaNDx8/77KEFAmX5mm0ACgkQUaNDx8/7
7KE/MQ/9GeX4yNBxY2iTATdmPXwjMw8AtKyfIQb605nIO0ch1Z98ywl5VMwCNohn
ppY9L5bFpEASgRlFVm73X4DGxKyRGpRPqylsvINh0hKciRpmRkELHY3llhnXsd7P
Q197pDtFr54FeX8j4+hSAu4paT97fPENlKn0J6lto2I1cXGcD1LYNDFhysoXdGme
brJgo7KjQJZPZ560ZewskL5FWf3G9EkRjpqd8y0G5OtNmAPgAaahOMHhDCXan182
J89I9CHI5xN45MRfAs8JamSaj/GyNsr4h04WhPa0+VZQ5vsaeW2Ekt4ypj+oAV+p
wykhYzQk4ALZcmmph2flSAtLa7uheI+imyqubMthQCDj3G8onSQBMd5/4WRK6O49
0oE1DpPDEfhlJEQYxaYhOeqeA9iaP+w6V+yE+L5oGlMO66cR7GZsPu0x7kXailbH
IoHw9mO+vMkpuyeP7M3hA8WRFCdFpf1Nn1Ao5Jz3KoiTyJWlIvX5VSaj12sjddQ2
fU9SKu2Q5QqS5uQGakkY64EyUy7RkGIX6zY2NIscVe2lfAfKf3mZwu7OIuLjEy5O
lRn35vWV8fOdRooKoDPTNcdBCaNPi+RApin8chOv5P+F+ie7+Twf9sb1AgH/pIcv
HptvTXbvSFNbbdb+OE8a5qsqTvnrN8d31IXzrWRYsJB07x2IyoA=
=zR3v
-----END PGP SIGNATURE-----
Merge tag 'pull-for-9.0-20240319' of https://github.com/legoater/qemu into staging
aspeed, pnv, vfio queue:
* user device fixes for Aspeed and PowerNV machines
* coverity fix for iommufd
# -----BEGIN PGP SIGNATURE-----
#
# iQIzBAABCAAdFiEEoPZlSPBIlev+awtgUaNDx8/77KEFAmX5mm0ACgkQUaNDx8/7
# 7KE/MQ/9GeX4yNBxY2iTATdmPXwjMw8AtKyfIQb605nIO0ch1Z98ywl5VMwCNohn
# ppY9L5bFpEASgRlFVm73X4DGxKyRGpRPqylsvINh0hKciRpmRkELHY3llhnXsd7P
# Q197pDtFr54FeX8j4+hSAu4paT97fPENlKn0J6lto2I1cXGcD1LYNDFhysoXdGme
# brJgo7KjQJZPZ560ZewskL5FWf3G9EkRjpqd8y0G5OtNmAPgAaahOMHhDCXan182
# J89I9CHI5xN45MRfAs8JamSaj/GyNsr4h04WhPa0+VZQ5vsaeW2Ekt4ypj+oAV+p
# wykhYzQk4ALZcmmph2flSAtLa7uheI+imyqubMthQCDj3G8onSQBMd5/4WRK6O49
# 0oE1DpPDEfhlJEQYxaYhOeqeA9iaP+w6V+yE+L5oGlMO66cR7GZsPu0x7kXailbH
# IoHw9mO+vMkpuyeP7M3hA8WRFCdFpf1Nn1Ao5Jz3KoiTyJWlIvX5VSaj12sjddQ2
# fU9SKu2Q5QqS5uQGakkY64EyUy7RkGIX6zY2NIscVe2lfAfKf3mZwu7OIuLjEy5O
# lRn35vWV8fOdRooKoDPTNcdBCaNPi+RApin8chOv5P+F+ie7+Twf9sb1AgH/pIcv
# HptvTXbvSFNbbdb+OE8a5qsqTvnrN8d31IXzrWRYsJB07x2IyoA=
# =zR3v
# -----END PGP SIGNATURE-----
# gpg: Signature made Tue 19 Mar 2024 14:00:13 GMT
# gpg: using RSA key A0F66548F04895EBFE6B0B6051A343C7CFFBECA1
# gpg: Good signature from "Cédric Le Goater <clg@kaod.org>" [undefined]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: A0F6 6548 F048 95EB FE6B 0B60 51A3 43C7 CFFB ECA1
* tag 'pull-for-9.0-20240319' of https://github.com/legoater/qemu:
aspeed/smc: Only wire flash devices at reset
ppc/pnv: I2C controller is not user creatable
vfio/iommufd: Fix memory leak
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>