116600 Commits

Author SHA1 Message Date
Thomas Huth
bc9da794cc hw/s390x: Re-enable the pci-bridge device on s390x
Commit e779e5c05a ("hw/pci-bridge: Add a Kconfig switch for the
normal PCI bridge") added a config switch for the pci-bridge, so
that the device is not included in the s390x target anymore (since
the pci-bridge is not really useful on s390x).

However, it seems like libvirt is still adding pci-bridge devices
automatically to the guests' XML definitions (when adding a PCI
device to a non-zero PCI bus), so these guests are now broken due
to the missing pci-bridge in the QEMU binary.

To avoid disruption of the users, let's re-enable the pci-bridge
device on s390x for the time being.

Message-ID: <20241024130405.62134-1-thuth@redhat.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Boris Fiuczynski <fiuczy@linux.ibm.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Thomas Huth <thuth@redhat.com>
2024-11-04 14:16:11 +01:00
Thomas Huth
e6a401d7a6 tests/functional: Fix the s390x and ppc64 tuxrun tests
I forgot to add the tests to the meson.build file and looks
like I even managed to somehow mix up the hashsums in the
ppc64 test!

Message-ID: <20241023141919.930689-1-thuth@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
2024-11-04 14:16:11 +01:00
Thomas Huth
a3c946ec88 tests/vm/openbsd: Remove the "Time appears wrong" workaround
Seems like the server now reports the right time again, so we have
to drop the workaround to get the installer working again.

Message-ID: <20241023072414.827732-1-thuth@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
2024-11-04 14:16:11 +01:00
Thomas Huth
62728ddcba tests/functional: Add a test for sh4eb
Now that we are aware of binaries that are available for sh4eb,
we should make sure that there are no regressions with this
target and test it regularly in our CI.

Message-ID: <20241024082735.42324-3-thuth@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Thomas Huth <thuth@redhat.com>
2024-11-04 14:16:11 +01:00
Thomas Huth
51cdb6806f Revert "Remove the unused sh4eb target"
This reverts commit 73ceb12960e686b763415f0880cc5171ccce01cf.

The "r2d" machine can work in big endian mode, see:

 https://lore.kernel.org/qemu-devel/d6755445-1060-48a8-82b6-2f392c21f9b9@landley.net/

So the reasoning for removing sh4eb was wrong.

Message-ID: <20241024082735.42324-2-thuth@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Rob Landley <rob@landley.net>
Signed-off-by: Thomas Huth <thuth@redhat.com>
2024-11-04 14:16:11 +01:00
Daniel P. Berrangé
786bc22552 tests/functional: make cached asset files read-only
This ensures that if a functional test runs QEMU with a writable
disk pointing to a cached asset, an error will be reported, rather
than silently modifying the cache file.

As an example, tweaking test_sbsaref.py to set snapshot=off,
results in a clear error:

  Command: ./build/qemu-system-aarch64 ...snip... -drive file=/var/home/berrange/.cache/qemu/download/44cdbae275ef1bb6dab1d5fbb59473d4f741e1c8ea8a80fd9e906b531d6ad461,format=raw,snapshot=off -cpu max,pauth=off
  Output: qemu-system-aarch64: Could not open '/var/home/berrange/.cache/qemu/download/44cdbae275ef1bb6dab1d5fbb59473d4f741e1c8ea8a80fd9e906b531d6ad461': Permission denied

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Tested-by: Thomas Huth <thuth@redhat.com>
Message-ID: <20241025092659.2312118-3-berrange@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
2024-11-04 14:16:11 +01:00
Daniel P. Berrangé
c9daf680d1 tests/functional: make tuxrun disk images writable
The zstd command will preserve the input archive permissions on the
output file. So when we decompress the readonly cached image, the
resulting per-test run private disk image will also be readonly.
We need it to be writable, so make it so.

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-ID: <20241025092659.2312118-2-berrange@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
2024-11-04 14:16:11 +01:00
Thomas Huth
9094f7c934 .gitlab-ci.d/cirrus: Remove the macos-15 job
Cirrus-CI stopped providing the possibility to run macOS 15 jobs.
Quoting https://cirrus-ci.org/guide/macOS/ :

 "Cirrus CI Cloud only allows ghcr.io/cirruslabs/macos-runner:sonoma image ..."

If you still try to run a Sequoia image, it gets automatically "upgraded"
to Sonoma instead. So the macos-15 job in the QEMU CI now does not
make sense anymore, thus let's remove it.

Message-ID: <20241021124722.139348-1-thuth@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Acked-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Thomas Huth <thuth@redhat.com>
2024-11-04 14:16:07 +01:00
Peter Maydell
c94bee4cd6 * target/i386: new feature bits for AMD processors
* target/i386/tcg: improvements around flag handling
 * target/i386: add AVX10 support
 * target/i386: add GraniteRapids-v2 model
 * dockerfiles: add libcbor
 * New nitro-enclave machine type
 * qom: cleanups to object_new
 * configure: detect 64-bit MIPS for rust
 * configure: deprecate 32-bit MIPS
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmcjvkQUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroPIKgf/etNpO2T+eLFtWN/Qd5eopBXqNd9k
 KmeK9EgW9lqx2IPGNen33O+uKpb/TsMmubSsSF+YxTp7pmkc8+71f3rBMaIAD02r
 /paHSMVw0+f12DAFQz1jdvGihR7Mew0wcF/UdEt737y6vEmPxLTyYG3Gfa4NSZwT
 /V5jTOIcfUN/UEjNgIp6NTuOEESKmlqt22pfMapgkwMlAJYeeJU2X9eGYE86wJbq
 ZSXNgK3jL9wGT2XKa3e+OKzHfFpSkrB0JbQbdico9pefnBokN/hTeeUJ81wBAc7u
 i00W1CEQVJ5lhBc121d4AWMp83ME6HijJUOTMmJbFIONPsITFPHK1CAkng==
 =D4nR
 -----END PGP SIGNATURE-----

Merge tag 'for-upstream-i386' of https://gitlab.com/bonzini/qemu into staging

* target/i386: new feature bits for AMD processors
* target/i386/tcg: improvements around flag handling
* target/i386: add AVX10 support
* target/i386: add GraniteRapids-v2 model
* dockerfiles: add libcbor
* New nitro-enclave machine type
* qom: cleanups to object_new
* configure: detect 64-bit MIPS for rust
* configure: deprecate 32-bit MIPS

# -----BEGIN PGP SIGNATURE-----
#
# iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmcjvkQUHHBib256aW5p
# QHJlZGhhdC5jb20ACgkQv/vSX3jHroPIKgf/etNpO2T+eLFtWN/Qd5eopBXqNd9k
# KmeK9EgW9lqx2IPGNen33O+uKpb/TsMmubSsSF+YxTp7pmkc8+71f3rBMaIAD02r
# /paHSMVw0+f12DAFQz1jdvGihR7Mew0wcF/UdEt737y6vEmPxLTyYG3Gfa4NSZwT
# /V5jTOIcfUN/UEjNgIp6NTuOEESKmlqt22pfMapgkwMlAJYeeJU2X9eGYE86wJbq
# ZSXNgK3jL9wGT2XKa3e+OKzHfFpSkrB0JbQbdico9pefnBokN/hTeeUJ81wBAc7u
# i00W1CEQVJ5lhBc121d4AWMp83ME6HijJUOTMmJbFIONPsITFPHK1CAkng==
# =D4nR
# -----END PGP SIGNATURE-----
# gpg: Signature made Thu 31 Oct 2024 17:28:36 GMT
# gpg:                using RSA key F13338574B662389866C7682BFFBD25F78C7AE83
# gpg:                issuer "pbonzini@redhat.com"
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [full]
# gpg:                 aka "Paolo Bonzini <pbonzini@redhat.com>" [full]
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4  E2F7 7E15 100C CD36 69B1
#      Subkey fingerprint: F133 3857 4B66 2389 866C  7682 BFFB D25F 78C7 AE83

* tag 'for-upstream-i386' of https://gitlab.com/bonzini/qemu: (49 commits)
  target/i386: Introduce GraniteRapids-v2 model
  target/i386: Add AVX512 state when AVX10 is supported
  target/i386: Add feature dependencies for AVX10
  target/i386: add CPUID.24 features for AVX10
  target/i386: add AVX10 feature and AVX10 version property
  target/i386: return bool from x86_cpu_filter_features
  target/i386: do not rely on ExtSaveArea for accelerator-supported XCR0 bits
  target/i386: cpu: set correct supported XCR0 features for TCG
  target/i386: use + to put flags together
  target/i386: use higher-precision arithmetic to compute CF
  target/i386: use compiler builtin to compute PF
  target/i386: make flag variables unsigned
  target/i386: add a note about gen_jcc1
  target/i386: add a few more trivial CCPrepare cases
  target/i386: optimize TEST+Jxx sequences
  target/i386: optimize computation of ZF from CC_OP_DYNAMIC
  target/i386: Wrap cc_op_live with a validity check
  target/i386: Introduce cc_op_size
  target/i386: Rearrange CCOp
  target/i386: remove CC_OP_CLR
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2024-11-02 16:21:38 +00:00
Tao Su
1a519388a8 target/i386: Introduce GraniteRapids-v2 model
Update GraniteRapids CPU model to add AVX10 and the missing features(ss,
tsc-adjust, cldemote, movdiri, movdir64b).

Tested-by: Xuelian Guo <xuelian.guo@intel.com>
Signed-off-by: Tao Su <tao1.su@linux.intel.com>
Link: https://lore.kernel.org/r/20241028024512.156724-7-tao1.su@linux.intel.com
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lore.kernel.org/r/20241031085233.425388-9-tao1.su@linux.intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-10-31 18:28:33 +01:00
Tao Su
0d7475be3b target/i386: Add AVX512 state when AVX10 is supported
AVX10 state enumeration in CPUID leaf D and enabling in XCR0 register
are identical to AVX512 state regardless of the supported vector lengths.

Given that some E-cores will support AVX10 but not support AVX512, add
AVX512 state components to guest when AVX10 is enabled.

Based on a patch by Tao Su <tao1.su@linux.intel.com>

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Tested-by: Xuelian Guo <xuelian.guo@intel.com>
Signed-off-by: Tao Su <tao1.su@linux.intel.com>
Link: https://lore.kernel.org/r/20241031085233.425388-8-tao1.su@linux.intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-10-31 18:28:33 +01:00
Tao Su
150ab84b2d target/i386: Add feature dependencies for AVX10
Since the highest supported vector length for a processor implies that
all lesser vector lengths are also supported, add the dependencies of
the supported vector lengths. If all vector lengths aren't supported,
clear AVX10 enable bit as well.

Note that the order of AVX10 related dependencies should be kept as:
        CPUID_24_0_EBX_AVX10_128     -> CPUID_24_0_EBX_AVX10_256,
        CPUID_24_0_EBX_AVX10_256     -> CPUID_24_0_EBX_AVX10_512,
        CPUID_24_0_EBX_AVX10_VL_MASK -> CPUID_7_1_EDX_AVX10,
        CPUID_7_1_EDX_AVX10          -> CPUID_24_0_EBX,
so that prevent user from setting weird CPUID combinations, e.g. 256-bits
and 512-bits are supported but 128-bits is not, no vector lengths are
supported but AVX10 enable bit is still set.

Since AVX10_128 will be reserved as 1, adding these dependencies has the
bonus that when user sets -cpu host,-avx10-128, CPUID_7_1_EDX_AVX10 and
CPUID_24_0_EBX will be disabled automatically.

Tested-by: Xuelian Guo <xuelian.guo@intel.com>
Signed-off-by: Tao Su <tao1.su@linux.intel.com>
Link: https://lore.kernel.org/r/20241028024512.156724-5-tao1.su@linux.intel.com
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lore.kernel.org/r/20241031085233.425388-7-tao1.su@linux.intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-10-31 18:28:33 +01:00
Tao Su
2d055b8fe1 target/i386: add CPUID.24 features for AVX10
Introduce features for the supported vector bit lengths.

Signed-off-by: Tao Su <tao1.su@linux.intel.com>
Link: https://lore.kernel.org/r/20241028024512.156724-3-tao1.su@linux.intel.com
Link: https://lore.kernel.org/r/20241028024512.156724-4-tao1.su@linux.intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Tested-by: Xuelian Guo <xuelian.guo@intel.com>
Link: https://lore.kernel.org/r/20241031085233.425388-6-tao1.su@linux.intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-10-31 18:28:33 +01:00
Tao Su
bccfb846fd target/i386: add AVX10 feature and AVX10 version property
When AVX10 enable bit is set, the 0x24 leaf will be present as "AVX10
Converged Vector ISA leaf" containing fields for the version number and
the supported vector bit lengths.

Introduce avx10-version property so that avx10 version can be controlled
by user and cpu model. Per spec, avx10 version can never be 0, the default
value of avx10-version is set to 0 to determine whether it is specified by
user.  The default can come from the device model or, for the max model,
from KVM's reported value.

Signed-off-by: Tao Su <tao1.su@linux.intel.com>
Link: https://lore.kernel.org/r/20241028024512.156724-3-tao1.su@linux.intel.com
Link: https://lore.kernel.org/r/20241028024512.156724-4-tao1.su@linux.intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Tested-by: Xuelian Guo <xuelian.guo@intel.com>
Link: https://lore.kernel.org/r/20241031085233.425388-5-tao1.su@linux.intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-10-31 18:28:33 +01:00
Paolo Bonzini
3507c6f046 target/i386: return bool from x86_cpu_filter_features
Prepare for filtering non-boolean features such as AVX10 version.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Signed-off-by: Tao Su <tao1.su@linux.intel.com>
Link: https://lore.kernel.org/r/20241031085233.425388-4-tao1.su@linux.intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-10-31 18:28:33 +01:00
Paolo Bonzini
b888c78070 target/i386: do not rely on ExtSaveArea for accelerator-supported XCR0 bits
Right now, QEMU is using the "feature" and "bits" fields of ExtSaveArea
to query the accelerator for the support status of extended save areas.
This is a problem for AVX10, which attaches two feature bits (AVX512F
and AVX10) to the same extended save states.

To keep the AVX10 hacks to the minimum, limit usage of esa->features
and esa->bits.  Instead, just query the accelerator for the 0xD leaf.
Do it in common code and clear esa->size if an extended save state is
unsupported.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20241031085233.425388-3-tao1.su@linux.intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-10-31 18:28:33 +01:00
Paolo Bonzini
33098002a8 target/i386: cpu: set correct supported XCR0 features for TCG
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20241031085233.425388-2-tao1.su@linux.intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-10-31 18:28:33 +01:00
Paolo Bonzini
6d8623b5c0 target/i386: use + to put flags together
This gives greater opportunity for reassociation on x86 targets,
since addition can use the LEA instruction.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-10-31 18:28:33 +01:00
Paolo Bonzini
134ffcb276 target/i386: use higher-precision arithmetic to compute CF
If the operands of the arithmetic instruction fit within a half-register,
it's easiest to use a comparison instruction to compute the carry.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-10-31 18:28:33 +01:00
Paolo Bonzini
24899cdcd2 target/i386: use compiler builtin to compute PF
This removes the 256 byte parity table from the executable.

Suggested-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-10-31 18:28:33 +01:00
Paolo Bonzini
46c04e4bcf target/i386: make flag variables unsigned
This makes it easier for the compiler to understand which bits are set,
and it also removes "cltq" instructions to canonicalize the output value
as 32-bit signed.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-10-31 18:28:33 +01:00
Paolo Bonzini
44d58e938b target/i386: add a note about gen_jcc1
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-10-31 18:28:33 +01:00
Paolo Bonzini
cea677e821 target/i386: add a few more trivial CCPrepare cases
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-10-31 18:28:33 +01:00
Paolo Bonzini
37df7c4d57 target/i386: optimize TEST+Jxx sequences
Mostly used for TEST+JG and TEST+JLE, but it is easy to cover
also JBE/JA and JL/JGE; shaves about 0.5% TCG ops.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-10-31 18:28:33 +01:00
Paolo Bonzini
ae14b33de8 target/i386: optimize computation of ZF from CC_OP_DYNAMIC
Most uses of CC_OP_DYNAMIC are for CMP/JB/JE or similar sequences.
We can optimize many of them to avoid computation of the flags.
This eliminates both TCG ops to set up the new cc_op, and helper
instructions because evaluating just ZF is much cheaper.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-10-31 18:28:33 +01:00
Richard Henderson
1f7f72bdc4 target/i386: Wrap cc_op_live with a validity check
Assert that op is known and that cc_op_live_ is populated.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-10-31 18:28:33 +01:00
Richard Henderson
f359b2fb71 target/i386: Introduce cc_op_size
Replace arithmetic on cc_op with a helper function.
Assert that the op has a size and that it is valid
for the configuration.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Link: https://lore.kernel.org/r/20240701025115.1265117-6-richard.henderson@linaro.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-10-31 18:28:33 +01:00
Richard Henderson
ee806f9f67 target/i386: Rearrange CCOp
Give the first few enumerators explicit integer constants,
align the BWLQ enumerators.

This will be used to simplify ((op - CC_OP_*B) & 3).

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Link: https://lore.kernel.org/r/20240701025115.1265117-4-richard.henderson@linaro.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-10-31 18:28:33 +01:00
Paolo Bonzini
e09447c39f target/i386: remove CC_OP_CLR
Just use CC_OP_EFLAGS; it is not that likely that the flags computed by
CC_OP_CLR survive the end of the basic block, in which case there is no
need to spill cc_op_src.

cc_op_src now does need spilling if the XOR is followed by a memory
operation, but this only costs 0.2% extra TCG ops.  They will be recouped
by simplifications in how QEMU evaluates ZF at runtime, which are even
greater with this change.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-10-31 18:28:33 +01:00
Richard Henderson
c2954745f2 target/i386: Tidy cc_op_str usage
Make const.  Use the read-only strings directly; do not copy
them into an on-stack buffer with snprintf.  Allow for holes
in the cc_op_str array, now present with CC_OP_POPCNT.

Fixes: 460231ad369 ("target/i386: give CC_OP_POPCNT low bits corresponding to MO_TL")
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Link: https://lore.kernel.org/r/20240701025115.1265117-2-richard.henderson@linaro.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-10-31 18:28:33 +01:00
Paolo Bonzini
a635390f05 target/i386: use tcg_gen_ext_tl when applicable
Prefer it to gen_ext_tl in the common case where the destination is known.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-10-31 18:28:33 +01:00
Paolo Bonzini
cf4344639b ci: always invoke meson through pyvenv
Do not assume that the distro-installed meson is compatible with the one
in the virtual environment.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-10-31 18:28:33 +01:00
Dorjoy Chowdhury
05bad41ba9 docs/nitro-enclave: Documentation for nitro-enclave machine type
Signed-off-by: Dorjoy Chowdhury <dorjoychy111@gmail.com>
Reviewed-by: Alexander Graf <graf@amazon.com>
Link: https://lore.kernel.org/r/20241008211727.49088-7-dorjoychy111@gmail.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-10-31 18:28:33 +01:00
Dorjoy Chowdhury
f1826463d2 machine/nitro-enclave: New machine type for AWS Nitro Enclaves
AWS nitro enclaves[1] is an Amazon EC2[2] feature that allows creating
isolated execution environments, called enclaves, from Amazon EC2
instances which are used for processing highly sensitive data. Enclaves
have no persistent storage and no external networking. The enclave VMs
are based on the Firecracker microvm with a vhost-vsock device for
communication with the parent EC2 instance that spawned it and a Nitro
Secure Module (NSM) device for cryptographic attestation. The parent
instance VM always has CID 3 while the enclave VM gets a dynamic CID.

An EIF (Enclave Image Format)[3] file is used to boot an AWS nitro enclave
virtual machine. This commit adds support for AWS nitro enclave emulation
using a new machine type option '-M nitro-enclave'. This new machine type
is based on the 'microvm' machine type, similar to how real nitro enclave
VMs are based on Firecracker microvm. For nitro-enclave to boot from an
EIF file, the kernel and ramdisk(s) are extracted into a temporary kernel
and a temporary initrd file which are then hooked into the regular x86
boot mechanism along with the extracted cmdline. The EIF file path should
be provided using the '-kernel' QEMU option.

In QEMU, the vsock emulation for nitro enclave is added using vhost-user-
vsock as opposed to vhost-vsock. vhost-vsock doesn't support sibling VM
communication which is needed for nitro enclaves. So for the vsock
communication to CID 3 to work, another process that does the vsock
emulation in  userspace must be run, for example, vhost-device-vsock[4]
from rust-vmm, with necessary vsock communication support in another
guest VM with CID 3. Using vhost-user-vsock also enables the possibility
to implement some proxying support in the vhost-user-vsock daemon that
will forward all the packets to the host machine instead of CID 3 so
that users of nitro-enclave can run the necessary applications in their
host machine instead of running another whole VM with CID 3. The following
mandatory nitro-enclave machine option has been added related to the
vhost-user-vsock device.
  - 'vsock': The chardev id from the '-chardev' option for the
vhost-user-vsock device.

AWS Nitro Enclaves have built-in Nitro Secure Module (NSM) device which
has been added using the virtio-nsm device added in a previous commit.
In Nitro Enclaves, all the PCRs start in a known zero state and the first
16 PCRs are locked from boot and reserved. The PCR0, PCR1, PCR2 and PCR8
contain the SHA384 hashes related to the EIF file used to boot the VM
for validation. The following optional nitro-enclave machine options
have been added related to the NSM device.
  - 'id': Enclave identifier, reflected in the module-id of the NSM
device. If not provided, a default id will be set.
  - 'parent-role': Parent instance IAM role ARN, reflected in PCR3
of the NSM device.
  - 'parent-id': Parent instance identifier, reflected in PCR4 of the
NSM device.

[1] https://docs.aws.amazon.com/enclaves/latest/user/nitro-enclave.html
[2] https://aws.amazon.com/ec2/
[3] https://github.com/aws/aws-nitro-enclaves-image-format
[4] https://github.com/rust-vmm/vhost-device/tree/main/vhost-device-vsock

Signed-off-by: Dorjoy Chowdhury <dorjoychy111@gmail.com>
Reviewed-by: Alexander Graf <graf@amazon.com>
Link: https://lore.kernel.org/r/20241008211727.49088-6-dorjoychy111@gmail.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-10-31 18:28:33 +01:00
Dorjoy Chowdhury
1a9867498d core/machine: Make create_default_memdev machine a virtual method
This is in preparation for the next commit where the nitro-enclave
machine type will need to instead use a memfd backend, for the built-in
vhost-user-vsock device to work.

Signed-off-by: Dorjoy Chowdhury <dorjoychy111@gmail.com>
Reviewed-by: Alexander Graf <graf@amazon.com>
Link: https://lore.kernel.org/r/20241008211727.49088-5-dorjoychy111@gmail.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-10-31 18:28:33 +01:00
Dorjoy Chowdhury
63d2a5c787 hw/core: Add Enclave Image Format (EIF) related helpers
An EIF (Enclave Image Format)[1] file is used to boot an AWS nitro
enclave[2] virtual machine. The EIF file contains the necessary kernel,
cmdline, ramdisk(s) sections to boot.

Some helper functions have been introduced for extracting the necessary
sections from an EIF file and then writing them to temporary files as
well as computing SHA384 hashes from the section data. These will be
used in the following commit to add support for nitro-enclave machine
type in QEMU.

The files added in this commit are not compiled yet but will be added
to the hw/core/meson.build file in the following commit where
CONFIG_NITRO_ENCLAVE will be introduced.

[1] https://github.com/aws/aws-nitro-enclaves-image-format
[2] https://docs.aws.amazon.com/enclaves/latest/user/nitro-enclave.html

Signed-off-by: Dorjoy Chowdhury <dorjoychy111@gmail.com>
Reviewed-by: Alexander Graf <graf@amazon.com>
Link: https://lore.kernel.org/r/20241008211727.49088-4-dorjoychy111@gmail.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-10-31 18:28:32 +01:00
Dorjoy Chowdhury
bb154e3e0c device/virtio-nsm: Support for Nitro Secure Module device
Nitro Secure Module (NSM)[1] device is used in AWS Nitro Enclaves[2]
for stripped down TPM functionality like cryptographic attestation.
The requests to and responses from NSM device are CBOR[3] encoded.

This commit adds support for NSM device in QEMU. Although related to
AWS Nitro Enclaves, the virito-nsm device is independent and can be
used in other machine types as well. The libcbor[4] library has been
used for the CBOR encoding and decoding functionalities.

[1] https://lists.oasis-open.org/archives/virtio-comment/202310/msg00387.html
[2] https://docs.aws.amazon.com/enclaves/latest/user/nitro-enclave.html
[3] http://cbor.io/
[4] https://libcbor.readthedocs.io/en/latest/

Signed-off-by: Dorjoy Chowdhury <dorjoychy111@gmail.com>
Reviewed-by: Alexander Graf <graf@amazon.com>
Link: https://lore.kernel.org/r/20241008211727.49088-3-dorjoychy111@gmail.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-10-31 18:28:32 +01:00
Dorjoy Chowdhury
1ac32dc8ea tests/lcitool: Update libvirt-ci and add libcbor dependency
libcbor dependecy is necessary for adding virtio-nsm and nitro-enclave
machine support in the following commits. libvirt-ci has already been
updated with the dependency upstream and this commit updates libvirt-ci
submodule in QEMU to latest upstream. Also the libcbor dependency has
been added to tests/lcitool/projects/qemu.yml.

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dorjoy Chowdhury <dorjoychy111@gmail.com>
Reviewed-by: Alexander Graf <graf@amazon.com>
Link: https://lore.kernel.org/r/20241008211727.49088-2-dorjoychy111@gmail.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-10-31 18:28:32 +01:00
Paolo Bonzini
7cac7aa704 target/i386/hvf: fix handling of XSAVE-related CPUID bits
The call to xgetbv() is passing the ecx value for cpuid function 0xD,
index 0. The xgetbv call thus returns false (OSXSAVE is bit 27, which is
well out of the range of CPUID[0xD,0].ECX) and eax is not modified. While
fixing it, cache the whole computation of supported XCR0 bits since it
will be used for more than just CPUID leaf 0xD.

Furthermore, unsupported subleafs of CPUID 0xD (including all those
corresponding to zero bits in host's XCR0) must be hidden; if OSXSAVE
is not set at all, the whole of CPUID leaf 0xD plus the XSAVE bit must
be hidden.

Finally, unconditionally drop XSTATE_BNDREGS_MASK and XSTATE_BNDCSR_MASK;
real hardware will only show them if the MPX bit is set in CPUID;
this is never the case for hvf_get_supported_cpuid() because QEMU's
Hypervisor.framework support does not handle the VMX fields related to
MPX (even in the unlikely possibility that the host has MPX enabled).
So hide those bits in the new cache_host_xcr0().

Cc: Phil Dennis-Jordan <lists@philjordan.eu>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-10-31 18:28:32 +01:00
Babu Moger
9c07a7af5d target/i386: Expose new feature bits in CPUID 8000_0021_EAX/EBX
Newer AMD CPUs support ERAPS (Enhanced Return Address Prediction Security)
feature that enables the auto-clear of RSB entries on a TLB flush, context
switches and VMEXITs. The number of default RSP entries is reflected in
RapSize.

Add the feature bit and feature word to support these features.

CPUID_Fn80000021_EAX
Bits   Feature Description
24     ERAPS:
       Indicates support for enhanced return address predictor security.

CPUID_Fn80000021_EBX
Bits   Feature Description
31-24  Reserved
23:16  RapSize:
       Return Address Predictor size. RapSize x 8 is the minimum number
       of CALL instructions software needs to execute to flush the RAP.
15-00  MicrocodePatchSize. Read-only.
       Reports the size of the Microcode patch in 16-byte multiples.
       If 0, the size of the patch is at most 5568 (15C0h) bytes.

Link: https://www.amd.com/content/dam/amd/en/documents/epyc-technical-docs/programmer-references/57238.zip
Signed-off-by: Babu Moger <babu.moger@amd.com>
Link: https://lore.kernel.org/r/7c62371fe60af1e9bbd853f5f8e949bf2d908bd0.1729807947.git.babu.moger@amd.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-10-31 18:28:32 +01:00
Babu Moger
2ec282b8ea target/i386: Expose bits related to SRSO vulnerability
Add following bits related Speculative Return Stack Overflow (SRSO).
Guests can make use of these bits if supported.

These bits are reported via CPUID Fn8000_0021_EAX.
===================================================================
Bit Feature Description
===================================================================
27  SBPB                Indicates support for the Selective Branch Predictor Barrier.
28  IBPB_BRTYPE         MSR_PRED_CMD[IBPB] flushes all branch type predictions.
29  SRSO_NO             Not vulnerable to SRSO.
30  SRSO_USER_KERNEL_NO Not vulnerable to SRSO at the user-kernel boundary.
===================================================================

Link: https://www.amd.com/content/dam/amd/en/documents/corporate/cr/speculative-return-stack-overflow-whitepaper.pdf
Link: https://www.amd.com/content/dam/amd/en/documents/epyc-technical-docs/programmer-references/57238.zip
Signed-off-by: Babu Moger <babu.moger@amd.com>
Link: https://lore.kernel.org/r/dadbd70c38f4e165418d193918a3747bd715c5f4.1729807947.git.babu.moger@amd.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-10-31 18:28:32 +01:00
Sandipan Das
209b0ac120 target/i386: Add PerfMonV2 feature bit
CPUID leaf 0x80000022, i.e. ExtPerfMonAndDbg, advertises new performance
monitoring features for AMD processors. Bit 0 of EAX indicates support
for Performance Monitoring Version 2 (PerfMonV2) features. If found to
be set during PMU initialization, the EBX bits can be used to determine
the number of available counters for different PMUs. It also denotes the
availability of global control and status registers.

Add the required CPUID feature word and feature bit to allow guests to
make use of the PerfMonV2 features.

Signed-off-by: Sandipan Das <sandipan.das@amd.com>
Signed-off-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/a96f00ee2637674c63c61e9fc4dee343ea818053.1729807947.git.babu.moger@amd.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-10-31 18:28:32 +01:00
Babu Moger
9c882ad4dc target/i386: Fix minor typo in NO_NESTED_DATA_BP feature bit
Rename CPUID_8000_0021_EAX_No_NESTED_DATA_BP to
       CPUID_8000_0021_EAX_NO_NESTED_DATA_BP.

No functional change intended.

Signed-off-by: Babu Moger <babu.moger@amd.com>
Link: https://lore.kernel.org/r/a6749acd125670d3930f4ca31736a91b1d965f2f.1729807947.git.babu.moger@amd.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-10-31 18:28:32 +01:00
Paolo Bonzini
f41823e059 qom: allow user-creatable classes to be in modules
There is no real reason to make user-creatable classes different
from other backends in this respect.  This also allows modularized
character devices to be treated by qom-list-properties just like
builtin ones.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-10-31 18:28:32 +01:00
Paolo Bonzini
02009a12bc qom: let object_new use a module if the type is not present
object_initialize() can use modules (it was added there because
virtio-gpu-device is a child device of virtio-gpu-pci; commit
64f7aece8ea, "object_initialize: try module load", 2020-09-15).
object_new() cannot; make things consistent.

qdev_new() is now just a simple wrapper that returns DeviceState.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-10-31 18:28:32 +01:00
Paolo Bonzini
144d80f69e qom: centralize module-loading functionality
Put together the common code of object_initialize() and
module_object_class_by_name() into a function that supports
Error **.  Rename the existing function type_get_by_name() to
clarify that it will only look at defined types; this is often
okay within object.c to look at the parents, but not outside it.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-10-31 18:28:32 +01:00
Paolo Bonzini
b801e3cb2a qom: use object_new_with_class when possible
A small optimization/code simplification, that also makes it clear that
we won't look for a type in a not-loaded-yet module---the module will
have been loaded by a call to module_object_class_by_name(), if present.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-10-31 18:28:32 +01:00
Paolo Bonzini
845b54efaf qom: remove unused function
The function has been unused since commit 4fa28f23906 ("ppc/pnv:
Instantiate cores separately", 2019-12-17).  The idea was that
you could use it to build an array of objects via pointer
arithmetic, but no one is doing it anymore.

Cc: Dr. David Alan Gilbert <dave@treblig.org>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-10-31 18:28:32 +01:00
Xiaoyao Li
855bdb6c8a i386/cpu: Drop the check of phys_bits in host_cpu_realizefn()
The check of cpu->phys_bits to be in range between
[32, TARGET_PHYS_ADDR_SPACE_BITS] in host_cpu_realizefn()
is duplicated with check in x86_cpu_realizefn().

Since the ckeck in x86_cpu_realizefn() is called later and can cover all
the x86 cases. Remove the one in host_cpu_realizefn().

Opportunistically adjust cpu->phys_bits directly in
host_cpu_adjust_phys_bits(), which matches more with the function name.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20240929085747.2023198-1-xiaoyao.li@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-10-31 18:28:32 +01:00
Paolo Bonzini
8aade934df accel: remove dead statement and useless assertion
ops is assigned again just below, and the result of the assignment must
be non-NULL.

Originally, the check for NULL was meant to be a check for the existence
of the ops class:

    ops = ACCEL_OPS_CLASS(object_class_by_name(ops_name));
    ...
    g_assert(ops != NULL);

(where the ops assignment begot the one that I am removing); but this is
meaningless now that oc is checked to be non-NULL before ops is assigned
(commit 5141e9a23fc, "accel: abort if we fail to load the accelerator
plugin", 2022-11-06).

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-10-31 18:28:32 +01:00