Our current usage of MMU indexes when EL3 is AArch32 is confused.
Architecturally, when EL3 is AArch32, all Secure code runs under the
Secure PL1&0 translation regime:
* code at EL3, which might be Mon, or SVC, or any of the
other privileged modes (PL1)
* code at EL0 (Secure PL0)
This is different from when EL3 is AArch64, in which case EL3 is its
own translation regime, and EL1 and EL0 (whether AArch32 or AArch64)
have their own regime.
We claimed to be mapping Secure PL1 to our ARMMMUIdx_EL3, but didn't
do anything special about Secure PL0, which meant it used the same
ARMMMUIdx_EL10_0 that NonSecure PL0 does. This resulted in a bug
where arm_sctlr() incorrectly picked the NonSecure SCTLR as the
controlling register when in Secure PL0, which meant we were
spuriously generating alignment faults because we were looking at the
wrong SCTLR control bits.
The use of ARMMMUIdx_EL3 for Secure PL1 also resulted in the bug that
we wouldn't honour the PAN bit for Secure PL1, because there's no
equivalent _PAN mmu index for it.
We could fix this in one of two ways:
* The most straightforward is to add new MMU indexes EL30_0,
EL30_3, EL30_3_PAN to correspond to "Secure PL1&0 at PL0",
"Secure PL1&0 at PL1", and "Secure PL1&0 at PL1 with PAN".
This matches how we use indexes for the AArch64 regimes, and
preserves propirties like being able to determine the privilege
level from an MMU index without any other information. However
it would add two MMU indexes (we can share one with ARMMMUIdx_EL3),
and we are already using 14 of the 16 the core TLB code permits.
* The more complicated approach is the one we take here. We use
the same MMU indexes (E10_0, E10_1, E10_1_PAN) for Secure PL1&0
than we do for NonSecure PL1&0. This saves on MMU indexes, but
means we need to check in some places whether we're in the
Secure PL1&0 regime or not before we interpret an MMU index.
The changes in this commit were created by auditing all the places
where we use specific ARMMMUIdx_ values, and checking whether they
needed to be changed to handle the new index value usage.
Note for potential stable backports: taking also the previous
(comment-change-only) commit might make the backport easier.
Cc: qemu-stable@nongnu.org
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2326
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Tested-by: Bernhard Beschow <shentey@gmail.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20240809160430.1144805-3-peter.maydell@linaro.org
We have a long comment describing the Arm architectural translation
regimes and how we map them to QEMU MMU indexes. This comment has
got a bit out of date:
* FEAT_SEL2 allows Secure EL2 and corresponding new regimes
* FEAT_RME introduces Realm state and its translation regimes
* We now model the Cortex-R52 so that is no longer a hypothetical
* We separated Secure Stage 2 and NonSecure Stage 2 MMU indexes
* We have an MMU index per physical address spacea
Add the missing pieces so that the list of architectural translation
regimes matches the Arm ARM, and the list and count of QEMU MMU
indexes in the comment matches the enum.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Tested-by: Bernhard Beschow <shentey@gmail.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20240809160430.1144805-2-peter.maydell@linaro.org
AdvSIMD instructions are supposed to zero bits beyond 128.
Affects SSHLL, USHLL, SSHLL2, USHLL2.
Cc: qemu-stable@nongnu.org
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20240717060903.205098-15-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
This commit adds validation checks for the MCOPRE and MCOSEL values in
the rcc_update_cfgr_register function. If the MCOPRE value exceeds
0b100 or the MCOSEL value exceeds 0b111, an error is logged and the
corresponding clock mux is disabled. This helps in identifying and
handling invalid configurations in the RCC registers.
Reproducer:
cat << EOF | qemu-system-aarch64 -display \
none -machine accel=qtest, -m 512M -machine b-l475e-iot01a -qtest \
stdio
writeq 0x40021008 0xffffffff
EOF
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2356
Signed-off-by: Zheyu Ma <zheyuma97@gmail.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Rather that enumerating the types that can produce
MMX operands, examine the unit. No functional change.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Link: https://lore.kernel.org/r/20240812025844.58956-3-richard.henderson@linaro.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When cross compiling QEMU configured with --static, I've been getting
configure errors like the following:
Build-time dependency glib-2.0 found: NO
../target/hexagon/meson.build:303:15: ERROR: Dependency lookup for glib-2.0 with method 'pkgconfig' failed: Could not generate libs for glib-2.0:
Package libpcre2-8 was not found in the pkg-config search path.
Perhaps you should add the directory containing `libpcre2-8.pc'
to the PKG_CONFIG_PATH environment variable
Package 'libpcre2-8', required by 'glib-2.0', not found
This happens because --static sets the prefer_static Meson option, but
my build machine doesn't have a static libpcre2. I don't think it
makes sense to insist that native dependencies are static, just
because I want the non-native QEMU binaries to be static.
Signed-off-by: Alyssa Ross <hi@alyssa.is>
Link: https://lore.kernel.org/r/20240805104921.4035256-1-hi@alyssa.is
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Fix for hosts with an older libblkio.
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEhpWov9P5fNqsNXdanKSrs4Grc8gFAma6MIoACgkQnKSrs4Gr
c8i+7gf/Q1vTYE9U1ksbmASQGVJSyCfZlKB0fNxgsGgdnhcIF2uipSxNiDTVVAgn
rKfMXCvFrPQ7cjbKiiHe4Aj9GqjI6nY6vimnuxqxq9FCd1+RiGGZWDRBfS+6ZQjg
815BFB7tkc7ejoL5plMk95XHM+2uHHV0xvK/zelrZ5VOeWdot0yUgL1QLMpAvzMQ
dY3pwarG8txlnTrMuE+Ig03hjkPf0Z6aK6kdaI5xn9G6O1+799NYXpjqKNtDbisc
Sf9iq5hmbfASECBBUJH9iWrLdgnieADPebRbOAmDpUsM1bGV6UW9KHUE7zC0h394
jz8fSjMOjY03rDQjOpzV1wtR8zwpDw==
=Asvz
-----END PGP SIGNATURE-----
Merge tag 'block-pull-request' of https://gitlab.com/stefanha/qemu into staging
Pull request
Fix for hosts with an older libblkio.
# -----BEGIN PGP SIGNATURE-----
#
# iQEzBAABCAAdFiEEhpWov9P5fNqsNXdanKSrs4Grc8gFAma6MIoACgkQnKSrs4Gr
# c8i+7gf/Q1vTYE9U1ksbmASQGVJSyCfZlKB0fNxgsGgdnhcIF2uipSxNiDTVVAgn
# rKfMXCvFrPQ7cjbKiiHe4Aj9GqjI6nY6vimnuxqxq9FCd1+RiGGZWDRBfS+6ZQjg
# 815BFB7tkc7ejoL5plMk95XHM+2uHHV0xvK/zelrZ5VOeWdot0yUgL1QLMpAvzMQ
# dY3pwarG8txlnTrMuE+Ig03hjkPf0Z6aK6kdaI5xn9G6O1+799NYXpjqKNtDbisc
# Sf9iq5hmbfASECBBUJH9iWrLdgnieADPebRbOAmDpUsM1bGV6UW9KHUE7zC0h394
# jz8fSjMOjY03rDQjOpzV1wtR8zwpDw==
# =Asvz
# -----END PGP SIGNATURE-----
# gpg: Signature made Tue 13 Aug 2024 01:55:54 AM AEST
# gpg: using RSA key 8695A8BFD3F97CDAAC35775A9CA4ABB381AB73C8
# gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>" [full]
# gpg: aka "Stefan Hajnoczi <stefanha@gmail.com>" [full]
* tag 'block-pull-request' of https://gitlab.com/stefanha/qemu:
block/blkio: use FUA flag on write zeroes only if supported
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* include: Fix typo in name of MAKE_IDENTFIER macro
* docs: Various txt-to-rST conversions
* hw/core/ptimer: fix timer zero period condition for freq > 1GHz
* arm/virt: place power button pin number on a define
-----BEGIN PGP SIGNATURE-----
iQJNBAABCAA3FiEE4aXFk81BneKOgxXPPCUl7RQ2DN4FAma5+4wZHHBldGVyLm1h
eWRlbGxAbGluYXJvLm9yZwAKCRA8JSXtFDYM3pX3D/9UVutdg5TsB9N8y5mPaVSn
Yx0awBgxK5SHWeVgQJBkSdqh6LiGhhukR3VHfNanDELq24s0uLqLW86thgj+iB0H
51rnVHJtWtT9mIt0Qq9BlXX8+j0th6hELy/z+/aYdrWI1pmKsGYgF1gRh1vXrg+I
0s/S7kZY5CNDBbTXoBNtJfbZRe8fzyy5gUqc/tnw6Qonp8XM1OeG6sg/qF0KwzbB
8R7IvnY7gaBWm3daXqrFoxYuR+9i6F8uaFflOm+CarKQc9foH6KEzmfLAYLfGkFZ
2ZVHg3uC4k4OicyrpYcWsgumNTzOj8RTI4kV7M8NAj5TXCr+0pO6lnhlAKVGTWiL
nJrW62dN56w8NVOzcy0tB0xqTHnKIxioGZyU4RDVKHjD/Fy0x7LX7KVmaBEZgyxJ
oA4zY4KOrCNFsXQlqZgx38v/1hshnIYFN7V5AmfGEfbbKpBznKBQKmuyJ9VwSfGT
jLwlwU4VMJPsj2Rs70seEl6obgyZicAXIAbqPgtMsvt3H2kKI2jtsNPFka3WaY62
0jOEbbFrsKV1//ZExBZdFhqBH/CoiZMvM4jsq1Y/oxAxIWtGv5dmJJsAA3w33YE4
kNWXfHKAAhydZKeQloMgeOdLliP5UiCfF1FltwAWkLo59GV3TkjwagDU8+pWs9OF
plOKWaKDUzkHq6G197uaBA==
=ftoZ
-----END PGP SIGNATURE-----
Merge tag 'pull-target-arm-20240812' of https://git.linaro.org/people/pmaydell/qemu-arm into staging
* Fix BTI versus CF_PCREL
* include: Fix typo in name of MAKE_IDENTFIER macro
* docs: Various txt-to-rST conversions
* hw/core/ptimer: fix timer zero period condition for freq > 1GHz
* arm/virt: place power button pin number on a define
# -----BEGIN PGP SIGNATURE-----
#
# iQJNBAABCAA3FiEE4aXFk81BneKOgxXPPCUl7RQ2DN4FAma5+4wZHHBldGVyLm1h
# eWRlbGxAbGluYXJvLm9yZwAKCRA8JSXtFDYM3pX3D/9UVutdg5TsB9N8y5mPaVSn
# Yx0awBgxK5SHWeVgQJBkSdqh6LiGhhukR3VHfNanDELq24s0uLqLW86thgj+iB0H
# 51rnVHJtWtT9mIt0Qq9BlXX8+j0th6hELy/z+/aYdrWI1pmKsGYgF1gRh1vXrg+I
# 0s/S7kZY5CNDBbTXoBNtJfbZRe8fzyy5gUqc/tnw6Qonp8XM1OeG6sg/qF0KwzbB
# 8R7IvnY7gaBWm3daXqrFoxYuR+9i6F8uaFflOm+CarKQc9foH6KEzmfLAYLfGkFZ
# 2ZVHg3uC4k4OicyrpYcWsgumNTzOj8RTI4kV7M8NAj5TXCr+0pO6lnhlAKVGTWiL
# nJrW62dN56w8NVOzcy0tB0xqTHnKIxioGZyU4RDVKHjD/Fy0x7LX7KVmaBEZgyxJ
# oA4zY4KOrCNFsXQlqZgx38v/1hshnIYFN7V5AmfGEfbbKpBznKBQKmuyJ9VwSfGT
# jLwlwU4VMJPsj2Rs70seEl6obgyZicAXIAbqPgtMsvt3H2kKI2jtsNPFka3WaY62
# 0jOEbbFrsKV1//ZExBZdFhqBH/CoiZMvM4jsq1Y/oxAxIWtGv5dmJJsAA3w33YE4
# kNWXfHKAAhydZKeQloMgeOdLliP5UiCfF1FltwAWkLo59GV3TkjwagDU8+pWs9OF
# plOKWaKDUzkHq6G197uaBA==
# =ftoZ
# -----END PGP SIGNATURE-----
# gpg: Signature made Mon 12 Aug 2024 10:09:48 PM AEST
# gpg: using RSA key E1A5C593CD419DE28E8315CF3C2525ED14360CDE
# gpg: issuer "peter.maydell@linaro.org"
# gpg: Good signature from "Peter Maydell <peter.maydell@linaro.org>" [full]
# gpg: aka "Peter Maydell <pmaydell@gmail.com>" [full]
# gpg: aka "Peter Maydell <pmaydell@chiark.greenend.org.uk>" [full]
# gpg: aka "Peter Maydell <peter@archaic.org.uk>" [unknown]
* tag 'pull-target-arm-20240812' of https://git.linaro.org/people/pmaydell/qemu-arm:
arm/virt: place power button pin number on a define
hw/core/ptimer: fix timer zero period condition for freq > 1GHz
docs: Typo fix in live disk backup
docs/interop/prl-xml.rst: Fix minor grammar nits
docs/interop/prl-xml.txt: Convert to rST
docs/interop/parallels.txt: Convert to rST
docs/interop/nbd.txt: Convert to rST
docs/specs/rocker.txt: Convert to rST
include: Fix typo in name of MAKE_IDENTFIER macro
target/arm: Fix BTI versus CF_PCREL
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
libblkio supports BLKIO_REQ_FUA with write zeros requests only since
version 1.4.0, so let's inform the block layer that the blkio driver
supports it only in this case. Otherwise we can have runtime errors
as reported in https://issues.redhat.com/browse/RHEL-32878
Fixes: fd66dbd424 ("blkio: add libblkio block driver")
Cc: qemu-stable@nongnu.org
Buglink: https://issues.redhat.com/browse/RHEL-32878
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20240808080545.40744-1-sgarzare@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Having magic numbers inside the code is not a good idea, as it
is error-prone. So, instead, create a macro with the number
definition.
Link: https://lore.kernel.org/qemu-devel/CAFEAcA-PYnZ-32MRX+PgvzhnoAV80zBKMYg61j2f=oHaGfwSsg@mail.gmail.com/
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Message-id: ef0e7f5fca6cd94eda415ecee670c3028c671b74.1723121692.git.mchehab+huawei@kernel.org
Suggested-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The real period is zero when both period and period_frac are zero.
Check the method ptimer_set_freq, if freq is larger than 1000 MHz,
the period is zero, but the period_frac is not, in this case, the
ptimer will work but the current code incorrectly recognizes that
the ptimer is disabled.
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2306
Signed-off-by: JianZhou Yue <JianZhou.Yue@verisilicon.com>
Message-id: 3DA024AEA8B57545AF1B3CAA37077D0FB75E82C8@SHASXM03.verisilicon.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
iQEzBAABCAAdFiEEIV1G9IJGaJ7HfzVi7wSWWzmNYhEFAma5uNkACgkQ7wSWWzmN
YhFpLwf+J9+cBWKUze7FZkxNHU78GJ/b+oVQfLYPnrCRrVKoyTr9yiKfMDS8qf5/
tPd+xFABwcHb8UL3EeAe9w5aB0QCqqdmZMFRkWuaZ7HEbZkYNt9cJck5iMdNaPBm
cKiFRLb8FDVA3aegCcsBqnwCxgFW+3P3rrnHQz1C+GQAOm7FER+HiFnYucjrrLSM
SaXZYIH/LPqL01gbZcbixQkhgL5XFWUToFXQEYECGS07uZZ1WSJkxIP6WZDchJ4+
vYO8/fWXVdrjvDirraZQRYnurWQGpTUk0Ocn2R8MaJsF8TK031MrMRJ3YP9zXp4n
wMe0BZO/YG5oi2gFrJpYL2AZqh2MgQ==
=DhS+
-----END PGP SIGNATURE-----
Merge tag 'net-pull-request' of https://github.com/jasowang/qemu into staging
# -----BEGIN PGP SIGNATURE-----
#
# iQEzBAABCAAdFiEEIV1G9IJGaJ7HfzVi7wSWWzmNYhEFAma5uNkACgkQ7wSWWzmN
# YhFpLwf+J9+cBWKUze7FZkxNHU78GJ/b+oVQfLYPnrCRrVKoyTr9yiKfMDS8qf5/
# tPd+xFABwcHb8UL3EeAe9w5aB0QCqqdmZMFRkWuaZ7HEbZkYNt9cJck5iMdNaPBm
# cKiFRLb8FDVA3aegCcsBqnwCxgFW+3P3rrnHQz1C+GQAOm7FER+HiFnYucjrrLSM
# SaXZYIH/LPqL01gbZcbixQkhgL5XFWUToFXQEYECGS07uZZ1WSJkxIP6WZDchJ4+
# vYO8/fWXVdrjvDirraZQRYnurWQGpTUk0Ocn2R8MaJsF8TK031MrMRJ3YP9zXp4n
# wMe0BZO/YG5oi2gFrJpYL2AZqh2MgQ==
# =DhS+
# -----END PGP SIGNATURE-----
# gpg: Signature made Mon 12 Aug 2024 05:25:13 PM AEST
# gpg: using RSA key 215D46F48246689EC77F3562EF04965B398D6211
# gpg: Good signature from "Jason Wang (Jason Wang on RedHat) <jasowang@redhat.com>" [undefined]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 215D 46F4 8246 689E C77F 3562 EF04 965B 398D 6211
* tag 'net-pull-request' of https://github.com/jasowang/qemu:
net: Fix '-net nic,model=' for non-help arguments
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Oops, don't *delete* the model option when checking for 'help'.
Fixes: 64f75f57f9 ("net: Reinstate '-net nic, model=help' output as documented in man page")
Reported-by: Hans <sungdgdhtryrt@gmail.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Cc: qemu-stable@nongnu.org
Reviewed-by: Michael Tokarev <mjt@tls.msk.ru>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Add in the missing space in the section header.
Fixes: 1084159b31 ("qapi: deprecate drive-backup", v6.2.0)
Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Fix some minor grammar nits in the prl-xml documentation.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 20240801170131.3977807-6-peter.maydell@linaro.org
Convert prl-xml.txt to rST format.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 20240801170131.3977807-5-peter.maydell@linaro.org
Convert parallels.txt to rST format.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 20240801170131.3977807-4-peter.maydell@linaro.org
Convert nbd.txt to rST format.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 20240801170131.3977807-3-peter.maydell@linaro.org
Convert the rocker.txt specification document to rST format. We make
extensive use of the :: marker to introduce a literal block for all
the tables and ASCII art, rather than trying to convert the tables to
rST table syntax. This produces a valid rST document without needing
a huge diff.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20240801170131.3977807-2-peter.maydell@linaro.org
In commit bb71846325 we added some macro magic to avoid
variable-shadowing when using some of our more complicated
macros. One of the internal components of this is a macro
named MAKE_IDENTFIER. Fix the typo in its name: it should
be MAKE_IDENTIFIER.
Commit created with
sed -i -e 's/MAKE_IDENTFIER/MAKE_IDENTIFIER/g' include/qemu/*.h include/qapi/qmp/qobject.h
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20240801102516.3843780-1-peter.maydell@linaro.org
With pcrel, we cannot check the guarded page bit at translation
time, as different mappings of the same physical page may or may
not have the GP bit set.
Instead, add a couple of helpers to check the page at runtime,
after all other filters that might obviate the need for the check.
The set_btype_for_br call must be moved after the gen_a64_set_pc
call to ensure the current pc can still be computed.
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20240802003028.795476-1-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
A malicious client can attempt to connect to an NBD server, and then
intentionally delay progress in the handshake, including if it does
not know the TLS secrets. Although the previous two patches reduce
this behavior by capping the default max-connections parameter and
killing slow clients, they did not eliminate the possibility of a
client waiting to close the socket until after the QMP nbd-server-stop
command is executed, at which point qemu would SEGV when trying to
dereference the NULL nbd_server global which is no longer present.
This amounts to a denial of service attack. Worse, if another NBD
server is started before the malicious client disconnects, I cannot
rule out additional adverse effects when the old client interferes
with the connection count of the new server (although the most likely
is a crash due to an assertion failure when checking
nbd_server->connections > 0).
For environments without this patch, the CVE can be mitigated by
ensuring (such as via a firewall) that only trusted clients can
connect to an NBD server. Note that using frameworks like libvirt
that ensure that TLS is used and that nbd-server-stop is not executed
while any trusted clients are still connected will only help if there
is also no possibility for an untrusted client to open a connection
but then stall on the NBD handshake.
Given the previous patches, it would be possible to guarantee that no
clients remain connected by having nbd-server-stop sleep for longer
than the default handshake deadline before finally freeing the global
nbd_server object, but that could make QMP non-responsive for a long
time. So intead, this patch fixes the problem by tracking all client
sockets opened while the server is running, and forcefully closing any
such sockets remaining without a completed handshake at the time of
nbd-server-stop, then waiting until the coroutines servicing those
sockets notice the state change. nbd-server-stop now has a second
AIO_WAIT_WHILE_UNLOCKED (the first is indirectly through the
blk_exp_close_all_type() that disconnects all clients that completed
handshakes), but forced socket shutdown is enough to progress the
coroutines and quickly tear down all clients before the server is
freed, thus finally fixing the CVE.
This patch relies heavily on the fact that nbd/server.c guarantees
that it only calls nbd_blockdev_client_closed() from the main loop
(see the assertion in nbd_client_put() and the hoops used in
nbd_client_put_nonzero() to achieve that); if we did not have that
guarantee, we would also need a mutex protecting our accesses of the
list of connections to survive re-entrancy from independent iothreads.
Although I did not actually try to test old builds, it looks like this
problem has existed since at least commit 862172f45c (v2.12.0, 2017) -
even back when that patch started using a QIONetListener to handle
listening on multiple sockets, nbd_server_free() was already unaware
that the nbd_blockdev_client_closed callback can be reached later by a
client thread that has not completed handshakes (and therefore the
client's socket never got added to the list closed in
nbd_export_close_all), despite that patch intentionally tearing down
the QIONetListener to prevent new clients.
Reported-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
Fixes: CVE-2024-7409
CC: qemu-stable@nongnu.org
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-ID: <20240807174943.771624-14-eblake@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
A client that opens a socket but does not negotiate is merely hogging
qemu's resources (an open fd and a small amount of memory); and a
malicious client that can access the port where NBD is listening can
attempt a denial of service attack by intentionally opening and
abandoning lots of unfinished connections. The previous patch put a
default bound on the number of such ongoing connections, but once that
limit is hit, no more clients can connect (including legitimate ones).
The solution is to insist that clients complete handshake within a
reasonable time limit, defaulting to 10 seconds. A client that has
not successfully completed NBD_OPT_GO by then (including the case of
where the client didn't know TLS credentials to even reach the point
of NBD_OPT_GO) is wasting our time and does not deserve to stay
connected. Later patches will allow fine-tuning the limit away from
the default value (including disabling it for doing integration
testing of the handshake process itself).
Note that this patch in isolation actually makes it more likely to see
qemu SEGV after nbd-server-stop, as any client socket still connected
when the server shuts down will now be closed after 10 seconds rather
than at the client's whims. That will be addressed in the next patch.
For a demo of this patch in action:
$ qemu-nbd -f raw -r -t -e 10 file &
$ nbdsh --opt-mode -c '
H = list()
for i in range(20):
print(i)
H.insert(i, nbd.NBD())
H[i].set_opt_mode(True)
H[i].connect_uri("nbd://localhost")
'
$ kill $!
where later connections get to start progressing once earlier ones are
forcefully dropped for taking too long, rather than hanging.
Suggested-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-ID: <20240807174943.771624-13-eblake@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
[eblake: rebase to changes earlier in series, reduce scope of timer]
Signed-off-by: Eric Blake <eblake@redhat.com>
Allowing an unlimited number of clients to any web service is a recipe
for a rudimentary denial of service attack: the client merely needs to
open lots of sockets without closing them, until qemu no longer has
any more fds available to allocate.
For qemu-nbd, we default to allowing only 1 connection unless more are
explicitly asked for (-e or --shared); this was historically picked as
a nice default (without an explicit -t, a non-persistent qemu-nbd goes
away after a client disconnects, without needing any additional
follow-up commands), and we are not going to change that interface now
(besides, someday we want to point people towards qemu-storage-daemon
instead of qemu-nbd).
But for qemu proper, and the newer qemu-storage-daemon, the QMP
nbd-server-start command has historically had a default of unlimited
number of connections, in part because unlike qemu-nbd it is
inherently persistent until nbd-server-stop. Allowing multiple client
sockets is particularly useful for clients that can take advantage of
MULTI_CONN (creating parallel sockets to increase throughput),
although known clients that do so (such as libnbd's nbdcopy) typically
use only 8 or 16 connections (the benefits of scaling diminish once
more sockets are competing for kernel attention). Picking a number
large enough for typical use cases, but not unlimited, makes it
slightly harder for a malicious client to perform a denial of service
merely by opening lots of connections withot progressing through the
handshake.
This change does not eliminate CVE-2024-7409 on its own, but reduces
the chance for fd exhaustion or unlimited memory usage as an attack
surface. On the other hand, by itself, it makes it more obvious that
with a finite limit, we have the problem of an unauthenticated client
holding 100 fds opened as a way to block out a legitimate client from
being able to connect; thus, later patches will further add timeouts
to reject clients that are not making progress.
This is an INTENTIONAL change in behavior, and will break any client
of nbd-server-start that was not passing an explicit max-connections
parameter, yet expects more than 100 simultaneous connections. We are
not aware of any such client (as stated above, most clients aware of
MULTI_CONN get by just fine on 8 or 16 connections, and probably cope
with later connections failing by relying on the earlier connections;
libvirt has not yet been passing max-connections, but generally
creates NBD servers with the intent for a single client for the sake
of live storage migration; meanwhile, the KubeSAN project anticipates
a large cluster sharing multiple clients [up to 8 per node, and up to
100 nodes in a cluster], but it currently uses qemu-nbd with an
explicit --shared=0 rather than qemu-storage-daemon with
nbd-server-start).
We considered using a deprecation period (declare that omitting
max-parameters is deprecated, and make it mandatory in 3 releases -
then we don't need to pick an arbitrary default); that has zero risk
of breaking any apps that accidentally depended on more than 100
connections, and where such breakage might not be noticed under unit
testing but only under the larger loads of production usage. But it
does not close the denial-of-service hole until far into the future,
and requires all apps to change to add the parameter even if 100 was
good enough. It also has a drawback that any app (like libvirt) that
is accidentally relying on an unlimited default should seriously
consider their own CVE now, at which point they are going to change to
pass explicit max-connections sooner than waiting for 3 qemu releases.
Finally, if our changed default breaks an app, that app can always
pass in an explicit max-parameters with a larger value.
It is also intentional that the HMP interface to nbd-server-start is
not changed to expose max-connections (any client needing to fine-tune
things should be using QMP).
Suggested-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-ID: <20240807174943.771624-12-eblake@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
[ericb: Expand commit message to summarize Dan's argument for why we
break corner-case back-compat behavior without a deprecation period]
Signed-off-by: Eric Blake <eblake@redhat.com>
Upcoming patches to fix a CVE need to track an opaque pointer passed
in by the owner of a client object, as well as request for a time
limit on how fast negotiation must complete. Prepare for that by
changing the signature of nbd_client_new() and adding an accessor to
get at the opaque pointer, although for now the two servers
(qemu-nbd.c and blockdev-nbd.c) do not change behavior even though
they pass in a new default timeout value.
Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-ID: <20240807174943.771624-11-eblake@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
[eblake: s/LIMIT/MAX_SECS/ as suggested by Dan]
Signed-off-by: Eric Blake <eblake@redhat.com>
Touch up a comment with the wrong type name, and an over-long line,
both noticed while working on upcoming patches.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-ID: <20240807174943.771624-10-eblake@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Define a hexagon_cpu_properties list to match the idiom used
by other targets.
Signed-off-by: Brian Cain <bcain@quicinc.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Taylor Simpson <ltaylorsimpson@gmail.com>
For now, v66 behavior is the same as other CPUs.
Signed-off-by: Brian Cain <bcain@quicinc.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Taylor Simpson <ltaylorsimpson@gmail.com>
Add my git tree for hexagon. Note that the branch is "hex-next" and not
"hex.next" as had been used previously. But I'll keep the "hex.next" branch
in sync with "hex-next" until this commit lands to avoid confusion.
Signed-off-by: Brian Cain <bcain@quicinc.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Tested-by: Philippe Mathieu-Daudé <philmd@linaro.org>
The self assignment is clearly useless, and @1.last_column does not have
to be set for an expression with only a single token, so remove it.
Reported-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Anton Johansson <anjo@rev.ng>
Reviewed-by: Brian Cain <bcain@quicinc.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20230713120853.27023-1-anjo@rev.ng>
Signed-off-by: Brian Cain <bcain@quicinc.com>
The implementation for these instructions handles -0 as an invalid float
point value, whereas the Hexagon hardware considers it the same as +0
(which is valid). Let's fix that and add a regression test.
Signed-off-by: Matheus Tavares Bernardino <quic_mathbern@quicinc.com>
Reviewed-by: Brian Cain <bcain@quicinc.com>
Reviewed-by: Taylor Simpson <ltaylorsimpson@gmail.com>
Signed-off-by: Brian Cain <bcain@quicinc.com>
Ensure the code structure is the same for matching constraints
and emitting code, lest we allow constants that cannot be
trivially tested.
Cc: qemu-stable@nongnu.org
Fixes: ad788aebba ("tcg/ppc: Support TCG_COND_TST{EQ,NE}")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2487
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <44328324-af73-4439-9d2b-d414e0e13dd7@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Apparently 'qemu-img info' doesn't report the backing file format field
for qed (as it does for qcow2):
$ qemu-img create -f qed base.qed 1M && qemu-img create -f qed -b base.qed -F qed top.qed 1M
$ qemu-img create -f qcow2 base.qcow2 1M && qemu-img create -f qcow2 -b base.qcow2 -F qcow2 top.qcow2 1M
$ qemu-img info top.qed | grep 'backing file format'
$ qemu-img info top.qcow2 | grep 'backing file format'
backing file format: qcow2
This leads to the 024 test failure with -qed. Let's just filter the
field out and exclude it from the output.
This is a fixup for the commit f93e65ee51 ("iotests/{024, 271}: add
testcases for qemu-img rebase").
Reported-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
Message-ID: <20240730094701.790624-1-andrey.drobyshev@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Added several tests to verify the implementation of the vvfat driver.
We needed a way to interact with it, so created a basic `fat16.py` driver
that handled writing correct sectors for us.
Added `vvfat` to the non-generic formats, as its not a normal image format.
Signed-off-by: Amjad Alsharafi <amjadsharafi10@gmail.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Tested-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <bb8149c945301aefbdf470a0924c07f69f9c087d.1721470238.git.amjadsharafi10@gmail.com>
[kwolf: Made mypy and pylint happy to unbreak 297]
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
When reading with `read_cluster` we get the `mapping` with
`find_mapping_for_cluster` and then we call `open_file` for this
mapping.
The issue appear when its the same file, but a second cluster that is
not immediately after it, imagine clusters `500 -> 503`, this will give
us 2 mappings one has the range `500..501` and another `503..504`, both
point to the same file, but different offsets.
When we don't open the file since the path is the same, we won't assign
`s->current_mapping` and thus accessing way out of bound of the file.
From our example above, after `open_file` (that didn't open anything) we
will get the offset into the file with
`s->cluster_size*(cluster_num-s->current_mapping->begin)`, which will
give us `0x2000 * (504-500)`, which is out of bound for this mapping and
will produce some issues.
Signed-off-by: Amjad Alsharafi <amjadsharafi10@gmail.com>
Message-ID: <1f3ea115779abab62ba32c788073cdc99f9ad5dd.1721470238.git.amjadsharafi10@gmail.com>
[kwolf: Simplified the patch based on Amjad's analysis and input]
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
How this `abort` was intended to check for was:
- if the `mapping->first_mapping_index` is not the same as
`first_mapping_index`, which **should** happen only in one case,
when we are handling the first mapping, in that case
`mapping->first_mapping_index == -1`, in all other cases, the other
mappings after the first should have the condition `true`.
- From above, we know that this is the first mapping, so if the offset
is not `0`, then abort, since this is an invalid state.
The issue was that `first_mapping_index` is not set if we are
checking from the middle, the variable `first_mapping_index` is
only set if we passed through the check `cluster_was_modified` with the
first mapping, and in the same function call we checked the other
mappings.
One approach is to go into the loop even if `cluster_was_modified`
is not true so that we will be able to set `first_mapping_index` for the
first mapping, but since `first_mapping_index` is only used here,
another approach is to just check manually for the
`mapping->first_mapping_index != -1` since we know that this is the
value for the only entry where `offset == 0` (i.e. first mapping).
Signed-off-by: Amjad Alsharafi <amjadsharafi10@gmail.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <b0fbca3ee208c565885838f6a7deeaeb23f4f9c2.1721470238.git.amjadsharafi10@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The field is marked as "the offset in the file (in clusters)", but it
was being used like this
`cluster_size*(nums)+mapping->info.file.offset`, which is incorrect.
Signed-off-by: Amjad Alsharafi <amjadsharafi10@gmail.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <72f19a7903886dda1aa78bcae0e17702ee939262.1721470238.git.amjadsharafi10@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Before this commit, the behavior when calling `commit_one_file` for
example with `offset=0x2000` (second cluster), what will happen is that
we won't fetch the next cluster from the fat, and instead use the first
cluster for the read operation.
This is due to off-by-one error here, where `i=0x2000 !< offset=0x2000`,
thus not fetching the next cluster.
Signed-off-by: Amjad Alsharafi <amjadsharafi10@gmail.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Tested-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <b97c1e1f1bc2f776061ae914f95d799d124fcd73.1721470238.git.amjadsharafi10@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
In the case of scsi-block, RESERVATION_CONFLICT is not a backend error,
but indicates that the guest tried to make a request that it isn't
allowed to execute. Pass the error to the guest so that it can decide
what to do with it.
Without this, if we stop the VM in response to a RESERVATION_CONFLICT
(as is the default policy in management software such as oVirt or
KubeVirt), it can happen that the VM cannot be resumed any more because
every attempt to resume it immediately runs into the same error and
stops the VM again.
One case that expects RESERVATION_CONFLICT errors to be visible in the
guest is running the validation tests in Windows 2019's Failover Cluster
Manager, which intentionally tries to execute invalid requests to see if
they are properly rejected.
Buglink: https://issues.redhat.com/browse/RHEL-50000
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20240731123207.27636-5-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
scsi_block_sgio_complete() has surprising behaviour in that there are
error cases in which it directly completes the request and never calls
the passed callback. In the current state of the code, this doesn't seem
to result in bugs, but with future code changes, we must be careful to
never rely on the callback doing some cleanup until this code smell is
fixed. For now, just add warnings to make people aware of the trap.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20240731123207.27636-4-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Instead of calling into scsi_handle_rw_error() directly from
scsi_block_sgio_complete() and skipping the normal callback, go through
the normal cleanup path by calling the callback with a positive error
value.
The important difference here is not only that the code path is cleaner,
but that the callbacks set r->req.aiocb = NULL. If we skip setting this
and the error action is BLOCK_ERROR_ACTION_STOP, resuming the VM runs
into an assertion failure in scsi_read_data() or scsi_write_data()
because the dangling aiocb pointer is unexpected.
Fixes: a108557bbf ("scsi: inline sg_io_sense_from_errno() into the callers.")
Buglink: https://issues.redhat.com/browse/RHEL-50000
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20240731123207.27636-3-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
In some error cases, scsi_block_sgio_complete() never calls the passed
callback, but directly completes the request. This leads to bugs because
its error paths are not exact copies of what the callback would normally
do.
In preparation to fix this, allow passing positive return values to the
callbacks that represent the status code that should be used to complete
the request.
scsi_handle_rw_error() already handles positive values for its ret
parameter because scsi_block_sgio_complete() calls directly into it.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20240731123207.27636-2-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Upstream clang 18 (and backports to clang 17 in Fedora and RHEL)
implemented support for __attribute__((cleanup())) in its Thread Safety
Analysis, so we can now actually have a proper implementation of
WITH_GRAPH_RDLOCK_GUARD() that understands when we acquire and when we
release the lock.
-Wthread-safety is now only enabled if the compiler is new enough to
understand this pattern. In theory, we could have used some #ifdefs to
keep the existing basic checks on old compilers, but as long as someone
runs a newer compiler (and our CI does), we will catch locking problems,
so it's probably not worth keeping multiple implementations for this.
The implementation can't use g_autoptr any more because the glib macros
define wrapper functions that don't have the right TSA attributes, so
the compiler would complain about them. Just use the cleanup attribute
directly instead.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20240627181245.281403-3-kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>