v2:
- Patch "migration: Move cpu-throttle.c from system to migration",
fix build on MacOS, and subject spelling
NOTE: checkpatch.pl could report a false positive on this branch:
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#21:
{include/sysemu => migration}/cpu-throttle.h | 0
That's covered by "F: migration/" entry.
Changelog:
- Peter's cleanup patch on migrate_fd_cleanup()
- Peter's cleanup patch to introduce thread name macros
- Hanna's error path fix for vmstate subsection save()s
- Hyman's auto converge enhancement on background dirty sync
- Peter's additional tracepoints for save state entries
- Thomas's build fix for OpenBSD in dirtyrate.c
- Peter's deprecation of query-migrationthreads command
- Peter's cleanup/fixes from the "export misc.h" series
- Maciej's two small patches from multifd+vfio series
-----BEGIN PGP SIGNATURE-----
iIgEABYKADAWIQS5GE3CDMRX2s990ak7X8zN86vXBgUCZyTbVRIccGV0ZXJ4QHJl
ZGhhdC5jb20ACgkQO1/MzfOr1wan3wD+L4TVNDc34Hy4mvWu7u1lCOePX0GBdUEc
oEeBGblwbrcBAIR8d+5z9O5YcWH1coozG1aUC4qCtSHHk5TGbJk4/UUD
=XB5Q
-----END PGP SIGNATURE-----
Merge tag 'migration-20241030-pull-request' of https://gitlab.com/peterx/qemu into staging
Migration pull request for softfreeze
v2:
- Patch "migration: Move cpu-throttle.c from system to migration",
fix build on MacOS, and subject spelling
NOTE: checkpatch.pl could report a false positive on this branch:
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#21:
{include/sysemu => migration}/cpu-throttle.h | 0
That's covered by "F: migration/" entry.
Changelog:
- Peter's cleanup patch on migrate_fd_cleanup()
- Peter's cleanup patch to introduce thread name macros
- Hanna's error path fix for vmstate subsection save()s
- Hyman's auto converge enhancement on background dirty sync
- Peter's additional tracepoints for save state entries
- Thomas's build fix for OpenBSD in dirtyrate.c
- Peter's deprecation of query-migrationthreads command
- Peter's cleanup/fixes from the "export misc.h" series
- Maciej's two small patches from multifd+vfio series
# -----BEGIN PGP SIGNATURE-----
#
# iIgEABYKADAWIQS5GE3CDMRX2s990ak7X8zN86vXBgUCZyTbVRIccGV0ZXJ4QHJl
# ZGhhdC5jb20ACgkQO1/MzfOr1wan3wD+L4TVNDc34Hy4mvWu7u1lCOePX0GBdUEc
# oEeBGblwbrcBAIR8d+5z9O5YcWH1coozG1aUC4qCtSHHk5TGbJk4/UUD
# =XB5Q
# -----END PGP SIGNATURE-----
# gpg: Signature made Fri 01 Nov 2024 13:44:53 GMT
# gpg: using EDDSA key B9184DC20CC457DACF7DD1A93B5FCCCDF3ABD706
# gpg: issuer "peterx@redhat.com"
# gpg: Good signature from "Peter Xu <xzpeter@gmail.com>" [marginal]
# gpg: aka "Peter Xu <peterx@redhat.com>" [marginal]
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg: It is not certain that the signature belongs to the owner.
# Primary key fingerprint: B918 4DC2 0CC4 57DA CF7D D1A9 3B5F CCCD F3AB D706
* tag 'migration-20241030-pull-request' of https://gitlab.com/peterx/qemu:
migration/multifd: Zero p->flags before starting filling a packet
migration/ram: Add load start trace event
migration: Drop migration_is_idle()
migration: Drop migration_is_setup_or_active()
migration: Unexport ram_mig_init()
migration: Unexport dirty_bitmap_mig_init()
migration: Take migration object refcount earlier for threads
migration: Deprecate query-migrationthreads command
migration/dirtyrate: Silence warning about strcpy() on OpenBSD
tests/migration: Add case for periodic ramblock dirty sync
migration: Support periodic RAMBlock dirty bitmap sync
migration: Remove "rs" parameter in migration_bitmap_sync_precopy
migration: Move cpu-throttle.c from system to migration
migration: Stop CPU throttling conditionally
accel/tcg/icount-common: Remove the reference to the unused header file
migration: Ensure vmstate_save() sets errp
migration: Put thread names together with macros
migration: Cleanup migrate_fd_cleanup() on accessing to_dst_file
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
This way there aren't stale flags there.
p->flags can't contain SYNC to be sent at the next RAM packet since syncs
are now handled separately in multifd_send_thread.
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Link: https://lore.kernel.org/r/1c96b6cdb797e6f035eb1a4ad9bfc24f4c7f5df8.1730203967.git.maciej.szmigiero@oracle.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Now with the current migration_is_running(), it will report exactly the
opposite of what will be reported by migration_is_idle().
Drop migration_is_idle(), instead use "!migration_is_running()" which
should be identical on functionality.
In reality, most of the idle check is inverted, so it's even easier to
write with "migrate_is_running()" check.
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20241024213056.1395400-6-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
This helper is mostly the same as migration_is_running(), except that one
has COLO reported as true, the other has CANCELLING reported as true.
Per my past years experience on the state changes, none of them should
matter.
To make it slightly safer, report both COLO || CANCELLING to be true in
migration_is_running(), then drop the other one. We kept the 1st only
because the name is simpler, and clear enough.
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20241024213056.1395400-5-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
It's only used within migration/, so it shouldn't be exported.
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20241024213056.1395400-3-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Both migration thread or background snapshot thread will take a refcount of
the migration object at the entrace of the thread function.
That makes sense, because it protects the object from being freed by the
main thread in migration_shutdown() later, but it might still race with it
if the thread is scheduled too late. Consider the case right after
pthread_create() happened, VM shuts down with the object released, but
right after that the migration thread finally got created, referencing
MigrationState* in the opaque pointer which is already freed.
The only 100% safe way to make sure it won't get freed is taking the
refcount right before the thread is created, meanwhile when BQL is held.
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20241024213056.1395400-2-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Per previous discussion [1,2], this patch deprecates query-migrationthreads
command.
To summarize, the major reason of the deprecation is due to no sensible way
to consume the API properly:
(1) The reported list of threads are incomplete (ignoring destination
threads and non-multifd threads).
(2) For CPU pinning, there's no way to properly pin the threads with
the API if the threads will start running right away after migration
threads can be queried, so the threads will always run on the default
cores for a short window.
(3) For VM debugging, one can use "-name $VM,debug-threads=on" instead,
which will provide proper names for all migration threads.
[1] https://lore.kernel.org/r/20240930195837.825728-1-peterx@redhat.com
[2] https://lore.kernel.org/r/20241011153417.516715-1-peterx@redhat.com
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Acked-by: Markus Armbruster <armbru@redhat.com>
Link: https://lore.kernel.org/r/20241022194501.1022443-1-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
The linker on OpenBSD complains:
ld: warning: dirtyrate.c:447 (../src/migration/dirtyrate.c:447)(...):
warning: strcpy() is almost always misused, please use strlcpy()
It's currently not a real problem in this case since both arrays
have the same size (256 bytes). But just in case somebody changes
the size of the source array in the future, let's better play safe
and use g_strlcpy() here instead, with an additional check that the
string has been copied as a whole.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Hyman Huang <yong.huang@smartx.com>
Link: https://lore.kernel.org/r/20241022063402.184213-1-thuth@redhat.com
[peterx: Fix over-80 chars]
Signed-off-by: Peter Xu <peterx@redhat.com>
When VM is configured with huge memory, the current throttle logic
doesn't look like to scale, because migration_trigger_throttle()
is only called for each iteration, so it won't be invoked for a long
time if one iteration can take a long time.
The periodic dirty sync aims to fix the above issue by synchronizing
the ramblock from remote dirty bitmap and, when necessary, triggering
the CPU throttle multiple times during a long iteration.
This is a trade-off between synchronization overhead and CPU throttle
impact.
Signed-off-by: Hyman Huang <yong.huang@smartx.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/f61f1b3653f2acf026901103e1c73d157d38b08f.1729146786.git.yong.huang@smartx.com
[peterx: make prev_cnt global, and reset for each migration]
Signed-off-by: Peter Xu <peterx@redhat.com>
The global static variable ram_state in fact is referred to by the
"rs" parameter in migration_bitmap_sync_precopy. For ease of calling
by the callees, use the global variable directly in
migration_bitmap_sync_precopy and remove "rs" parameter.
The migration_bitmap_sync_precopy will be exported in the next commit.
Signed-off-by: Hyman Huang <yong.huang@smartx.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/283c335d61463bf477160da91b24da45cdaf3e43.1729146786.git.yong.huang@smartx.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Move cpu-throttle.c from system to migration since it's
only used for migration; this makes us avoid exporting the
util functions and variables in misc.h but export them in
migration.h when implementing the periodic ramblock dirty
sync feature in the upcoming commits.
Since CPU throttle timers are only used in migration, move
their registry to migration_object_init.
Signed-off-by: Hyman Huang <yong.huang@smartx.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/c1b3efaa0cb49e03d422e9da97bdb65cc3d234d1.1729146786.git.yong.huang@smartx.com
[peterx: Fix build on MacOS on cocoa.m, not move cpu-throttle.h yet]
[peterx: Fix subject spelling, per pm215]
Signed-off-by: Peter Xu <peterx@redhat.com>
migration/savevm.c contains some calls to vmstate_save() that are
followed by migrate_set_error() if the integer return value indicates an
error. migrate_set_error() requires that the `Error *` object passed to
it is set. Therefore, vmstate_save() is assumed to always set *errp on
error.
Right now, that assumption is not met: vmstate_save_state_v() (called
internally by vmstate_save()) will not set *errp if
vmstate_subsection_save() or vmsd->post_save() fail. Fix that by adding
an *errp parameter to vmstate_subsection_save(), and by generating a
generic error in case post_save() fails (as is already done for
pre_save()).
Without this patch, qemu will crash after vmstate_subsection_save() or
post_save() have failed inside of a vmstate_save() call (unless
migrate_set_error() then happen to discard the new error because
s->error is already set). This happens e.g. when receiving the state
from a virtio-fs back-end (virtiofsd) fails.
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
Link: https://lore.kernel.org/r/20241015170437.310358-1-hreitz@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Keep migration thread names together, so it's easier to see a list of all
possible migration threads.
Still two functional changes below besides the macro defintions:
- There's one dirty rate thread that we overlooked before, now we add
that too and name it as "mig/dirtyrate" following the old rules.
- The old name "mig/src/rp-thr" has "-thr" but it may not be useful if
it's a thread name anyway, while "rp" can be slightly hard to read.
Taking this chance to rename it to "mig/src/return", hopefully a better
name.
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Acked-by: Hyman Huang <yong.huang@smartx.com>
Reviewed-by: Zhang Chen <chen.zhang@intel.com>
Link: https://lore.kernel.org/r/20241011153652.517440-1-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
The cleanup function can in many cases needs cleanup on its own.
The major thing we want to do here is not referencing to_dst_file when
without the file mutex. When at it, touch things elsewhere too to make it
look slightly better in general.
One thing to mention is, migration_thread has its own "running" boolean, so
it doesn't need to rely on to_dst_file being non-NULL. Multifd has a
dependency so it needs to be skipped if to_dst_file is not yet set; add a
richer comment for such reason.
Resolves: Coverity CID 1527402
Reported-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240919163042.116767-1-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
AVX10 state enumeration in CPUID leaf D and enabling in XCR0 register
are identical to AVX512 state regardless of the supported vector lengths.
Given that some E-cores will support AVX10 but not support AVX512, add
AVX512 state components to guest when AVX10 is enabled.
Based on a patch by Tao Su <tao1.su@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Tested-by: Xuelian Guo <xuelian.guo@intel.com>
Signed-off-by: Tao Su <tao1.su@linux.intel.com>
Link: https://lore.kernel.org/r/20241031085233.425388-8-tao1.su@linux.intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Since the highest supported vector length for a processor implies that
all lesser vector lengths are also supported, add the dependencies of
the supported vector lengths. If all vector lengths aren't supported,
clear AVX10 enable bit as well.
Note that the order of AVX10 related dependencies should be kept as:
CPUID_24_0_EBX_AVX10_128 -> CPUID_24_0_EBX_AVX10_256,
CPUID_24_0_EBX_AVX10_256 -> CPUID_24_0_EBX_AVX10_512,
CPUID_24_0_EBX_AVX10_VL_MASK -> CPUID_7_1_EDX_AVX10,
CPUID_7_1_EDX_AVX10 -> CPUID_24_0_EBX,
so that prevent user from setting weird CPUID combinations, e.g. 256-bits
and 512-bits are supported but 128-bits is not, no vector lengths are
supported but AVX10 enable bit is still set.
Since AVX10_128 will be reserved as 1, adding these dependencies has the
bonus that when user sets -cpu host,-avx10-128, CPUID_7_1_EDX_AVX10 and
CPUID_24_0_EBX will be disabled automatically.
Tested-by: Xuelian Guo <xuelian.guo@intel.com>
Signed-off-by: Tao Su <tao1.su@linux.intel.com>
Link: https://lore.kernel.org/r/20241028024512.156724-5-tao1.su@linux.intel.com
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lore.kernel.org/r/20241031085233.425388-7-tao1.su@linux.intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When AVX10 enable bit is set, the 0x24 leaf will be present as "AVX10
Converged Vector ISA leaf" containing fields for the version number and
the supported vector bit lengths.
Introduce avx10-version property so that avx10 version can be controlled
by user and cpu model. Per spec, avx10 version can never be 0, the default
value of avx10-version is set to 0 to determine whether it is specified by
user. The default can come from the device model or, for the max model,
from KVM's reported value.
Signed-off-by: Tao Su <tao1.su@linux.intel.com>
Link: https://lore.kernel.org/r/20241028024512.156724-3-tao1.su@linux.intel.com
Link: https://lore.kernel.org/r/20241028024512.156724-4-tao1.su@linux.intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Tested-by: Xuelian Guo <xuelian.guo@intel.com>
Link: https://lore.kernel.org/r/20241031085233.425388-5-tao1.su@linux.intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Prepare for filtering non-boolean features such as AVX10 version.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Signed-off-by: Tao Su <tao1.su@linux.intel.com>
Link: https://lore.kernel.org/r/20241031085233.425388-4-tao1.su@linux.intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Right now, QEMU is using the "feature" and "bits" fields of ExtSaveArea
to query the accelerator for the support status of extended save areas.
This is a problem for AVX10, which attaches two feature bits (AVX512F
and AVX10) to the same extended save states.
To keep the AVX10 hacks to the minimum, limit usage of esa->features
and esa->bits. Instead, just query the accelerator for the 0xD leaf.
Do it in common code and clear esa->size if an extended save state is
unsupported.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20241031085233.425388-3-tao1.su@linux.intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This gives greater opportunity for reassociation on x86 targets,
since addition can use the LEA instruction.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
If the operands of the arithmetic instruction fit within a half-register,
it's easiest to use a comparison instruction to compute the carry.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This removes the 256 byte parity table from the executable.
Suggested-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This makes it easier for the compiler to understand which bits are set,
and it also removes "cltq" instructions to canonicalize the output value
as 32-bit signed.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Mostly used for TEST+JG and TEST+JLE, but it is easy to cover
also JBE/JA and JL/JGE; shaves about 0.5% TCG ops.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Most uses of CC_OP_DYNAMIC are for CMP/JB/JE or similar sequences.
We can optimize many of them to avoid computation of the flags.
This eliminates both TCG ops to set up the new cc_op, and helper
instructions because evaluating just ZF is much cheaper.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Assert that op is known and that cc_op_live_ is populated.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Replace arithmetic on cc_op with a helper function.
Assert that the op has a size and that it is valid
for the configuration.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Link: https://lore.kernel.org/r/20240701025115.1265117-6-richard.henderson@linaro.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Give the first few enumerators explicit integer constants,
align the BWLQ enumerators.
This will be used to simplify ((op - CC_OP_*B) & 3).
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Link: https://lore.kernel.org/r/20240701025115.1265117-4-richard.henderson@linaro.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Just use CC_OP_EFLAGS; it is not that likely that the flags computed by
CC_OP_CLR survive the end of the basic block, in which case there is no
need to spill cc_op_src.
cc_op_src now does need spilling if the XOR is followed by a memory
operation, but this only costs 0.2% extra TCG ops. They will be recouped
by simplifications in how QEMU evaluates ZF at runtime, which are even
greater with this change.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Make const. Use the read-only strings directly; do not copy
them into an on-stack buffer with snprintf. Allow for holes
in the cc_op_str array, now present with CC_OP_POPCNT.
Fixes: 460231ad36 ("target/i386: give CC_OP_POPCNT low bits corresponding to MO_TL")
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Link: https://lore.kernel.org/r/20240701025115.1265117-2-richard.henderson@linaro.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Prefer it to gen_ext_tl in the common case where the destination is known.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Do not assume that the distro-installed meson is compatible with the one
in the virtual environment.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
AWS nitro enclaves[1] is an Amazon EC2[2] feature that allows creating
isolated execution environments, called enclaves, from Amazon EC2
instances which are used for processing highly sensitive data. Enclaves
have no persistent storage and no external networking. The enclave VMs
are based on the Firecracker microvm with a vhost-vsock device for
communication with the parent EC2 instance that spawned it and a Nitro
Secure Module (NSM) device for cryptographic attestation. The parent
instance VM always has CID 3 while the enclave VM gets a dynamic CID.
An EIF (Enclave Image Format)[3] file is used to boot an AWS nitro enclave
virtual machine. This commit adds support for AWS nitro enclave emulation
using a new machine type option '-M nitro-enclave'. This new machine type
is based on the 'microvm' machine type, similar to how real nitro enclave
VMs are based on Firecracker microvm. For nitro-enclave to boot from an
EIF file, the kernel and ramdisk(s) are extracted into a temporary kernel
and a temporary initrd file which are then hooked into the regular x86
boot mechanism along with the extracted cmdline. The EIF file path should
be provided using the '-kernel' QEMU option.
In QEMU, the vsock emulation for nitro enclave is added using vhost-user-
vsock as opposed to vhost-vsock. vhost-vsock doesn't support sibling VM
communication which is needed for nitro enclaves. So for the vsock
communication to CID 3 to work, another process that does the vsock
emulation in userspace must be run, for example, vhost-device-vsock[4]
from rust-vmm, with necessary vsock communication support in another
guest VM with CID 3. Using vhost-user-vsock also enables the possibility
to implement some proxying support in the vhost-user-vsock daemon that
will forward all the packets to the host machine instead of CID 3 so
that users of nitro-enclave can run the necessary applications in their
host machine instead of running another whole VM with CID 3. The following
mandatory nitro-enclave machine option has been added related to the
vhost-user-vsock device.
- 'vsock': The chardev id from the '-chardev' option for the
vhost-user-vsock device.
AWS Nitro Enclaves have built-in Nitro Secure Module (NSM) device which
has been added using the virtio-nsm device added in a previous commit.
In Nitro Enclaves, all the PCRs start in a known zero state and the first
16 PCRs are locked from boot and reserved. The PCR0, PCR1, PCR2 and PCR8
contain the SHA384 hashes related to the EIF file used to boot the VM
for validation. The following optional nitro-enclave machine options
have been added related to the NSM device.
- 'id': Enclave identifier, reflected in the module-id of the NSM
device. If not provided, a default id will be set.
- 'parent-role': Parent instance IAM role ARN, reflected in PCR3
of the NSM device.
- 'parent-id': Parent instance identifier, reflected in PCR4 of the
NSM device.
[1] https://docs.aws.amazon.com/enclaves/latest/user/nitro-enclave.html
[2] https://aws.amazon.com/ec2/
[3] https://github.com/aws/aws-nitro-enclaves-image-format
[4] https://github.com/rust-vmm/vhost-device/tree/main/vhost-device-vsock
Signed-off-by: Dorjoy Chowdhury <dorjoychy111@gmail.com>
Reviewed-by: Alexander Graf <graf@amazon.com>
Link: https://lore.kernel.org/r/20241008211727.49088-6-dorjoychy111@gmail.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This is in preparation for the next commit where the nitro-enclave
machine type will need to instead use a memfd backend, for the built-in
vhost-user-vsock device to work.
Signed-off-by: Dorjoy Chowdhury <dorjoychy111@gmail.com>
Reviewed-by: Alexander Graf <graf@amazon.com>
Link: https://lore.kernel.org/r/20241008211727.49088-5-dorjoychy111@gmail.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
An EIF (Enclave Image Format)[1] file is used to boot an AWS nitro
enclave[2] virtual machine. The EIF file contains the necessary kernel,
cmdline, ramdisk(s) sections to boot.
Some helper functions have been introduced for extracting the necessary
sections from an EIF file and then writing them to temporary files as
well as computing SHA384 hashes from the section data. These will be
used in the following commit to add support for nitro-enclave machine
type in QEMU.
The files added in this commit are not compiled yet but will be added
to the hw/core/meson.build file in the following commit where
CONFIG_NITRO_ENCLAVE will be introduced.
[1] https://github.com/aws/aws-nitro-enclaves-image-format
[2] https://docs.aws.amazon.com/enclaves/latest/user/nitro-enclave.html
Signed-off-by: Dorjoy Chowdhury <dorjoychy111@gmail.com>
Reviewed-by: Alexander Graf <graf@amazon.com>
Link: https://lore.kernel.org/r/20241008211727.49088-4-dorjoychy111@gmail.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
libcbor dependecy is necessary for adding virtio-nsm and nitro-enclave
machine support in the following commits. libvirt-ci has already been
updated with the dependency upstream and this commit updates libvirt-ci
submodule in QEMU to latest upstream. Also the libcbor dependency has
been added to tests/lcitool/projects/qemu.yml.
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dorjoy Chowdhury <dorjoychy111@gmail.com>
Reviewed-by: Alexander Graf <graf@amazon.com>
Link: https://lore.kernel.org/r/20241008211727.49088-2-dorjoychy111@gmail.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The call to xgetbv() is passing the ecx value for cpuid function 0xD,
index 0. The xgetbv call thus returns false (OSXSAVE is bit 27, which is
well out of the range of CPUID[0xD,0].ECX) and eax is not modified. While
fixing it, cache the whole computation of supported XCR0 bits since it
will be used for more than just CPUID leaf 0xD.
Furthermore, unsupported subleafs of CPUID 0xD (including all those
corresponding to zero bits in host's XCR0) must be hidden; if OSXSAVE
is not set at all, the whole of CPUID leaf 0xD plus the XSAVE bit must
be hidden.
Finally, unconditionally drop XSTATE_BNDREGS_MASK and XSTATE_BNDCSR_MASK;
real hardware will only show them if the MPX bit is set in CPUID;
this is never the case for hvf_get_supported_cpuid() because QEMU's
Hypervisor.framework support does not handle the VMX fields related to
MPX (even in the unlikely possibility that the host has MPX enabled).
So hide those bits in the new cache_host_xcr0().
Cc: Phil Dennis-Jordan <lists@philjordan.eu>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>