mirrors/qemu - qemu - SynapseOS git

Author	SHA1	Message	Date
Het Gala	4f2f5b694d	tests/qtest/migration: Replace migrate_get_connect_uri inplace of migrate_get_socket_address Refactor migrate_get_socket_address to internally utilize 'socket-address' parameter, reducing redundancy in the function definition. migrate_get_socket_address implicitly converts SocketAddress into str. Move migrate_get_socket_address inside migrate_get_connect_uri which should return the uri string instead. Signed-off-by: Het Gala <het.gala@nutanix.com> Suggested-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240312202634.63349-4-het.gala@nutanix.com Signed-off-by: Peter Xu <peterx@redhat.com>	2024-04-23 18:36:01 -04:00
Het Gala	d1155fd485	tests/qtest/migration: Replace connect_uri and move migrate_get_socket_address inside migrate_qmp Move the calls to migrate_get_socket_address() into migrate_qmp(). Get rid of connect_uri and replace it with args->connect_uri only because 'to' object will help to generate connect_uri with the correct port number. Signed-off-by: Het Gala <het.gala@nutanix.com> Suggested-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240312202634.63349-3-het.gala@nutanix.com Signed-off-by: Peter Xu <peterx@redhat.com>	2024-04-23 18:36:01 -04:00
Het Gala	8c47168cca	tests/qtest/migration: Add 'to' object into migrate_qmp() Add the 'to' object into migrate_qmp(), so we can use migrate_get_socket_address() inside migrate_qmp() to get the port value. This is not applied to other migrate_qmp* because they don't need the port. Signed-off-by: Het Gala <het.gala@nutanix.com> Suggested-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240312202634.63349-2-het.gala@nutanix.com Signed-off-by: Peter Xu <peterx@redhat.com>	2024-04-23 18:36:01 -04:00
Mark Cave-Ayland	7653b44534	target/i386/translate.c: always write 32-bits for SGDT and SIDT The various Intel CPU manuals claim that SGDT and SIDT can write either 24-bits or 32-bits depending upon the operand size, but this is incorrect. Not only do the Intel CPU manuals give contradictory information between processor revisions, but this information doesn't even match real-life behaviour. In fact, tests on real hardware show that the CPU always writes 32-bits for SGDT and SIDT, and this behaviour is required for at least OS/2 Warp and WFW 3.11 with Win32s to function correctly. Remove the masking applied due to the operand size for SGDT and SIDT so that the TCG behaviour matches the behaviour on real hardware. Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2198 -- MCA: Whilst I don't have a copy of OS/2 Warp handy, I've confirmed that this patch fixes the issue in WFW 3.11 with Win32s. For more technical information I highly recommend the excellent write-up at https://www.os2museum.com/wp/sgdtsidt-fiction-and-reality/. Message-ID: <20240419195147.434894-1-mark.cave-ayland@ilande.co.uk> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-04-23 17:35:26 +02:00
Paolo Bonzini	9c05071719	pythondeps.toml: warn about updates needed to docs/requirements.txt docs/requirements.txt is expected by readthedocs and should be in sync with pythondeps.toml. Add a comment to both. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-04-23 17:35:26 +02:00
Zhao Liu	94da7b6e9a	accel/tcg/icount-common: Consolidate the use of warn_report_once() Use warn_report_once() to get rid of the static local variable "notified". Signed-off-by: Zhao Liu <zhao1.liu@intel.com> Message-ID: <20240418100716.1085491-1-zhao1.liu@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-04-23 17:35:26 +02:00
Zhao Liu	aec202cb0e	target/i386/cpu: Merge the warning and error messages for AMD HT check Currently, the difference between warn_report_once() and error_report_once() is the former has the "warning:" prefix, while the latter does not have a similar level prefix. At the meantime, considering that there is no error handling logic here, and the purpose of error_report_once() is only to prompt the user with an abnormal message, there is no need to use an error-level message here, and instead we can just use a warning. Therefore, downgrade the message in error_report_once() to warning, and merge it into the previous warn_report_once(). Signed-off-by: Zhao Liu <zhao1.liu@intel.com> Message-ID: <20240327103951.3853425-4-zhao1.liu@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-04-23 17:35:26 +02:00
Zhao Liu	8e3991ebc8	target/i386/cpu: Consolidate the use of warn_report_once() The difference between error_printf() and error_report() is the latter may contain more information, such as the name of the program ("qemu-system-x86_64"). Thus its variant error_report_once() and warn_report()'s variant warn_report_once() can be used here to print the information only once without a static local variable "ht_warned". Signed-off-by: Zhao Liu <zhao1.liu@intel.com> Message-ID: <20240327103951.3853425-3-zhao1.liu@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-04-23 17:35:26 +02:00
Zhao Liu	7502ffb2f3	target/i386/host-cpu: Consolidate the use of warn_report_once() Use warn_report_once() to get rid of the static local variable "warned". Signed-off-by: Zhao Liu <zhao1.liu@intel.com> Message-ID: <20240327103951.3853425-2-zhao1.liu@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-04-23 17:35:26 +02:00
Isaku Yamahata	565f4768bb	kvm/tdx: Ignore memory conversion to shared of unassigned region TDX requires vMMIO region to be shared. For KVM, MMIO region is the region which kvm memslot isn't assigned to (except in-kernel emulation). qemu has the memory region for vMMIO at each device level. While OVMF issues MapGPA(to-shared) conservatively on 32bit PCI MMIO region, qemu doesn't find corresponding vMMIO region because it's before PCI device allocation and memory_region_find() finds the device region, not PCI bus region. It's safe to ignore MapGPA(to-shared) because when guest accesses those region they use GPA with shared bit set for vMMIO. Ignore memory conversion request of non-assigned region to shared and return success. Otherwise OVMF is confused and panics there. Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Message-ID: <20240229063726.610065-35-xiaoyao.li@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-04-23 17:35:26 +02:00
Isaku Yamahata	c5d9425ef4	kvm/tdx: Don't complain when converting vMMIO region to shared Because vMMIO region needs to be shared region, guest TD may explicitly convert such region from private to shared. Don't complain such conversion. Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Message-ID: <20240229063726.610065-34-xiaoyao.li@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-04-23 17:35:26 +02:00
Chao Peng	c15e568407	kvm: handle KVM_EXIT_MEMORY_FAULT Upon an KVM_EXIT_MEMORY_FAULT exit, userspace needs to do the memory conversion on the RAMBlock to turn the memory into desired attribute, switching between private and shared. Currently only KVM_MEMORY_EXIT_FLAG_PRIVATE in flags is valid when KVM_EXIT_MEMORY_FAULT happens. Note, KVM_EXIT_MEMORY_FAULT makes sense only when the RAMBlock has guest_memfd memory backend. Note, KVM_EXIT_MEMORY_FAULT returns with -EFAULT, so special handling is added. When page is converted from shared to private, the original shared memory can be discarded via ram_block_discard_range(). Note, shared memory can be discarded only when it's not back'ed by hugetlb because hugetlb is supposed to be pre-allocated and no need for discarding. Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com> Co-developed-by: Xiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Message-ID: <20240320083945.991426-13-michael.roth@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-04-23 17:35:26 +02:00
Xiaoyao Li	b2e9426c04	physmem: Introduce ram_block_discard_guest_memfd_range() When memory page is converted from private to shared, the original private memory is back'ed by guest_memfd. Introduce ram_block_discard_guest_memfd_range() for discarding memory in guest_memfd. Based on a patch by Isaku Yamahata <isaku.yamahata@intel.com>. Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Michael Roth <michael.roth@amd.com> Message-ID: <20240320083945.991426-12-michael.roth@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-04-23 17:35:26 +02:00
Paolo Bonzini	852f0048f3	RAMBlock: make guest_memfd require uncoordinated discard Some subsystems like VFIO might disable ram block discard, but guest_memfd uses discard operations to implement conversions between private and shared memory. Because of this, sequences like the following can result in stale IOMMU mappings: 1. allocate shared page 2. convert page shared->private 3. discard shared page 4. convert page private->shared 5. allocate shared page 6. issue DMA operations against that shared page This is not a use-after-free, because after step 3 VFIO is still pinning the page. However, DMA operations in step 6 will hit the old mapping that was allocated in step 1. Address this by taking ram_block_discard_is_enabled() into account when deciding whether or not to discard pages. Since kvm_convert_memory()/guest_memfd doesn't implement a RamDiscardManager handler to convey and replay discard operations, this is a case of uncoordinated discard, which is blocked/released by ram_block_discard_require(). Interestingly, this function had no use so far. Alternative approaches would be to block discard of shared pages, but this would cause guests to consume twice the memory if they use VFIO; or to implement a RamDiscardManager and only block uncoordinated discard, i.e. use ram_block_coordinated_discard_require(). [Commit message mostly by Michael Roth <michael.roth@amd.com>] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-04-23 17:35:26 +02:00
Xiaoyao Li	37662d85b0	HostMem: Add mechanism to opt in kvm guest memfd via MachineState Add a new member "guest_memfd" to memory backends. When it's set to true, it enables RAM_GUEST_MEMFD in ram_flags, thus private kvm guest_memfd will be allocated during RAMBlock allocation. Memory backend's @guest_memfd is wired with @require_guest_memfd field of MachineState. It avoids looking up the machine in physmem.c. MachineState::require_guest_memfd is supposed to be set by any VMs that require KVM guest memfd as private memory, e.g., TDX VM. Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Reviewed-by: David Hildenbrand <david@redhat.com> Message-ID: <20240320083945.991426-8-michael.roth@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-04-23 17:35:25 +02:00
Xiaoyao Li	bd3bcf6962	kvm/memory: Make memory type private by default if it has guest memfd backend KVM side leaves the memory to shared by default, which may incur the overhead of paging conversion on the first visit of each page. Because the expectation is that page is likely to private for the VMs that require private memory (has guest memfd). Explicitly set the memory to private when memory region has valid guest memfd backend. Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: Michael Roth <michael.roth@amd.com> Message-ID: <20240320083945.991426-16-michael.roth@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-04-23 17:35:25 +02:00
Chao Peng	ce5a983233	kvm: Enable KVM_SET_USER_MEMORY_REGION2 for memslot Switch to KVM_SET_USER_MEMORY_REGION2 when supported by KVM. With KVM_SET_USER_MEMORY_REGION2, QEMU can set up memory region that backend'ed both by hva-based shared memory and guest memfd based private memory. Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com> Co-developed-by: Xiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Message-ID: <20240320083945.991426-10-michael.roth@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-04-23 17:35:25 +02:00
Xiaoyao Li	15f7a80c49	RAMBlock: Add support of KVM private guest memfd Add KVM guest_memfd support to RAMBlock so both normal hva based memory and kvm guest memfd based private memory can be associated in one RAMBlock. Introduce new flag RAM_GUEST_MEMFD. When it's set, it calls KVM ioctl to create private guest_memfd during RAMBlock setup. Allocating a new RAM_GUEST_MEMFD flag to instruct the setup of guest memfd is more flexible and extensible than simply relying on the VM type because in the future we may have the case that not all the memory of a VM need guest memfd. As a benefit, it also avoid getting MachineState in memory subsystem. Note, RAM_GUEST_MEMFD is supposed to be set for memory backends of confidential guests, such as TDX VM. How and when to set it for memory backends will be implemented in the following patches. Introduce memory_region_has_guest_memfd() to query if the MemoryRegion has KVM guest_memfd allocated. Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Reviewed-by: David Hildenbrand <david@redhat.com> Message-ID: <20240320083945.991426-7-michael.roth@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-04-23 17:35:25 +02:00
Xiaoyao Li	0811baed49	kvm: Introduce support for memory_attributes Introduce the helper functions to set the attributes of a range of memory to private or shared. This is necessary to notify KVM the private/shared attribute of each gpa range. KVM needs the information to decide the GPA needs to be mapped at hva-based shared memory or guest_memfd based private memory. Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Message-ID: <20240320083945.991426-11-michael.roth@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-04-23 17:35:25 +02:00
Xiaoyao Li	72853afc63	trace/kvm: Split address space and slot id in trace_kvm_set_user_memory() The upper 16 bits of kvm_userspace_memory_region::slot are address space id. Parse it separately in trace_kvm_set_user_memory(). Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Message-ID: <20240229063726.610065-5-xiaoyao.li@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-04-23 17:35:25 +02:00
Michael Roth	ea7fbd3753	hw/i386/sev: Use legacy SEV VM types for older machine types Newer 9.1 machine types will default to using the KVM_SEV_INIT2 API for creating SEV/SEV-ES going forward. However, this API results in guest measurement changes which are generally not expected for users of these older guest types and can cause disruption if they switch to a newer QEMU/kernel version. Avoid this by continuing to use the older KVM_SEV_INIT/KVM_SEV_ES_INIT APIs for older machine types. Signed-off-by: Michael Roth <michael.roth@amd.com> Message-ID: <20240409230743.962513-4-michael.roth@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-04-23 17:35:25 +02:00
Michael Roth	023267334d	i386/sev: Add 'legacy-vm-type' parameter for SEV guest objects QEMU will currently automatically make use of the KVM_SEV_INIT2 API for initializing SEV and SEV-ES guests versus the older KVM_SEV_INIT/KVM_SEV_ES_INIT interfaces. However, the older interfaces will silently avoid sync'ing FPU/XSAVE state to the VMSA prior to encryption, thus relying on behavior and measurements that assume the related fields to be all zero. With KVM_SEV_INIT2, this state is now synced into the VMSA, resulting in measurement changes and, theoretically, behavioral changes, though the latter are unlikely to be seen in practice. To allow a smooth transition to the newer interface, while still providing a mechanism to maintain backward compatibility with VMs created using the older interfaces, provide a new command-line parameter: -object sev-guest,legacy-vm-type=true,... and have it default to false. Signed-off-by: Michael Roth <michael.roth@amd.com> Message-ID: <20240409230743.962513-2-michael.roth@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-04-23 17:35:25 +02:00
Paolo Bonzini	663e2f443e	target/i386: SEV: use KVM_SEV_INIT2 if possible Implement support for the KVM_X86_SEV_VM and KVM_X86_SEV_ES_VM virtual machine types, and the KVM_SEV_INIT2 function of KVM_MEMORY_ENCRYPT_OP. These replace the KVM_SEV_INIT and KVM_SEV_ES_INIT functions, and have several advantages: - sharing the initialization sequence with SEV-SNP and TDX - allowing arguments including the set of desired VMSA features - protection against invalid use of KVM_GET/SET_* ioctls for guests with encrypted state If the KVM_X86_SEV_VM and KVM_X86_SEV_ES_VM types are not supported, fall back to KVM_SEV_INIT and KVM_SEV_ES_INIT (which use the default x86 VM type). Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-04-23 17:35:25 +02:00
Paolo Bonzini	ee88612df1	target/i386: Implement mc->kvm_type() to get VM type KVM is introducing a new API to create confidential guests, which will be used by TDX and SEV-SNP but is also available for SEV and SEV-ES. The API uses the VM type argument to KVM_CREATE_VM to identify which confidential computing technology to use. Since there are no other expected uses of VM types, delegate mc->kvm_type() for x86 boards to the confidential-guest-support object pointed to by ms->cgs. For example, if a sev-guest object is specified to confidential-guest-support, like, qemu -machine ...,confidential-guest-support=sev0 \ -object sev-guest,id=sev0,... it will check if a VM type KVM_X86_SEV_VM or KVM_X86_SEV_ES_VM is supported, and if so use them together with the KVM_SEV_INIT2 function of the KVM_MEMORY_ENCRYPT_OP ioctl. If not, it will fall back to KVM_SEV_INIT and KVM_SEV_ES_INIT. This is a preparatory work towards TDX and SEV-SNP support, but it will also enable support for VMSA features such as DebugSwap, which are only available via KVM_SEV_INIT2. Co-developed-by: Xiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-04-23 17:35:25 +02:00
Paolo Bonzini	d82e9c843d	target/i386: introduce x86-confidential-guest Introduce a common superclass for x86 confidential guest implementations. It will extend ConfidentialGuestSupportClass with a method that provides the VM type to be passed to KVM_CREATE_VM. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-04-23 17:35:25 +02:00
Paolo Bonzini	a99c0c66eb	KVM: remove kvm_arch_cpu_check_are_resettable Board reset requires writing a fresh CPU state. As far as KVM is concerned, the only thing that blocks reset is that CPU state is encrypted; therefore, kvm_cpus_are_resettable() can simply check if that is the case. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-04-23 17:35:25 +02:00
Paolo Bonzini	5c3131c392	KVM: track whether guest state is encrypted So far, KVM has allowed KVM_GET/SET_* ioctls to execute even if the guest state is encrypted, in which case they do nothing. For the new API using VM types, instead, the ioctls will fail which is a safer and more robust approach. The new API will be the only one available for SEV-SNP and TDX, but it is also usable for SEV and SEV-ES. In preparation for that, require architecture-specific KVM code to communicate the point at which guest state is protected (which must be after kvm_cpu_synchronize_post_init(), though that might change in the future in order to support migration). From that point, skip reading registers so that cpu->vcpu_dirty is never true: if it ever becomes true, kvm_arch_put_registers() will fail miserably. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-04-23 17:35:25 +02:00
Paolo Bonzini	08b2d15cdd	runstate: skip initial CPU reset if reset is not actually possible Right now, the system reset is concluded by a call to cpu_synchronize_all_post_reset() in order to sync any changes that the machine reset callback applied to the CPU state. However, for VMs with encrypted state such as SEV-ES guests (currently the only case of guests with non-resettable CPUs) this cannot be done, because guest state has already been finalized by machine-init-done notifiers. cpu_synchronize_all_post_reset() does nothing on these guests, and actually we would like to make it fail if called once guest has been encrypted. So, assume that boards that support non-resettable CPUs do not touch CPU state and that all such setup is done before, at the time of cpu_synchronize_all_post_init(). Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-04-23 17:35:25 +02:00
Paolo Bonzini	ab0c7fb22b	linux-headers: update to current kvm/next Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-04-23 17:35:25 +02:00
Michael Roth	b40b8eb609	scripts/update-linux-headers: Add bits.h to file imports Signed-off-by: Michael Roth <michael.roth@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-04-23 17:35:25 +02:00
Michael Roth	66210a1a30	scripts/update-linux-headers: Add setup_data.h to import list Data structures like struct setup_data have been moved to a separate setup_data.h header which bootparam.h relies on. Add setup_data.h to the cp_portable() list and sync it along with the other header files. Note that currently struct setup_data is stripped away as part of generating bootparam.h, but that handling is not currently needed for setup_data.h since it doesn't pull in many external headers/dependencies. However, QEMU currently redefines struct setup_data in hw/i386/x86.c, so that will need to be removed as part of any header update that pulls in the new setup_data.h to avoid build bisect breakage. Because <asm/setup_data.h> is the first architecture specific #include in include/standard-headers/, add a new sed substitution to rewrite asm/ include to the standard-headers/asm-* subdirectory for the current architecture. And while at it, remove asm-generic/kvm_para.h from the list of allowed includes: it does not have a matching substitution, and therefore it would not be possible to use it on non-Linux systems where there is no /usr/include/asm-generic/ directory. Signed-off-by: Michael Roth <michael.roth@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-04-23 17:35:25 +02:00
Xiaoyao Li	a14a2b0148	s390: Switch to use confidential_guest_kvm_init() Use unified confidential_guest_kvm_init() for consistency with other architectures. Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Message-Id: <20240229060038.606591-1-xiaoyao.li@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-04-23 17:35:25 +02:00
Xiaoyao Li	00a238b1a8	ppc/pef: switch to use confidential_guest_kvm_init/reset() Use the unified interface to call confidential guest related kvm_init() and kvm_reset(), to avoid exposing pef specific functions. As a bonus, pef.h goes away since there is no direct call from sPAPR board code to PEF code anymore. Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-04-23 17:35:25 +02:00
Xiaoyao Li	637c95b37b	i386/sev: Switch to use confidential_guest_kvm_init() Use confidential_guest_kvm_init() instead of calling SEV specific sev_kvm_init(). This allows the introduction of multiple confidential-guest-support subclasses for different x86 vendors. As a bonus, stubs are not needed anymore since there is no direct call from target/i386/kvm/kvm.c to SEV code. Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Message-Id: <20240229060038.606591-1-xiaoyao.li@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-04-23 17:35:25 +02:00
Xiaoyao Li	41a605944e	confidential guest support: Add kvm_init() and kvm_reset() in class Different confidential VMs in different architectures all have the same need to do their specific initialization (and maybe resetting) with KVM. Currently each of them exposes individual *_kvm_init() functions and lets machine code or kvm code call it. To facilitate the introduction of confidential guest technology from different x86 vendors, add two virtual functions, kvm_init() and kvm_reset() in ConfidentialGuestSupportClass, and expose two helper functions for invoking them. Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Message-Id: <20240229060038.606591-1-xiaoyao.li@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-04-23 17:35:25 +02:00
Xiaoyao Li	292dd287e7	hw/i386/acpi: Set PCAT_COMPAT bit only when pic is not disabled A value 1 of PCAT_COMPAT (bit 0) of MADT.Flags indicates that the system also has a PC-AT-compatible dual-8259 setup, i.e., the PIC. When PIC is not enabled (pic=off) for x86 machine, the PCAT_COMPAT bit needs to be cleared. The PIC probe should then print: [ 0.155970] Using NULL legacy PIC However, no such log printed in guest kernel unless PCAT_COMPAT is cleared. Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Message-ID: <20240403145953.3082491-1-xiaoyao.li@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-04-23 17:35:25 +02:00
Isaku Yamahata	b07bf7b73f	q35: Introduce smm_ranges property for q35-pci-host Add a q35 property to check whether or not SMM ranges, e.g. SMRAM, TSEG, etc... exist for the target platform. TDX doesn't support SMM and doesn't play nice with QEMU modifying related guest memory ranges. Signed-off-by: Isaku Yamahata <isaku.yamahata@linux.intel.com> Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: Michael Roth <michael.roth@amd.com> Message-ID: <20240320083945.991426-19-michael.roth@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-04-23 17:35:25 +02:00
Isaku Yamahata	42c11ae241	pci-host/q35: Move PAM initialization above SMRAM initialization In mch_realize(), process PAM initialization before SMRAM initialization so that a later patch can skip everything SMRAM-related with a single check. Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: Michael Roth <michael.roth@amd.com> Message-ID: <20240320083945.991426-18-michael.roth@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-04-23 17:35:25 +02:00
Pawan Gupta	41bdd98128	target/i386: Export RFDS bit to guests Register File Data Sampling (RFDS) is a CPU side-channel vulnerability that may expose stale register value. CPUs that set RFDS_NO bit in MSR IA32_ARCH_CAPABILITIES indicate that they are not vulnerable to RFDS. Similarly, RFDS_CLEAR indicates that CPU is affected by RFDS, and has the microcode to help mitigate RFDS. Make RFDS_CLEAR and RFDS_NO bits available to guests. Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Message-ID: <9a38877857392b5c2deae7e7db1b170d15510314.1710341348.git.pawan.kumar.gupta@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-04-23 17:35:25 +02:00
Tao Su	6e82d3b622	target/i386: Add new CPU model SierraForest According to table 1-2 in Intel Architecture Instruction Set Extensions and Future Features (rev 051) [1], SierraForest has the following new features which have already been virtualized: - CMPCCXADD CPUID.(EAX=7,ECX=1):EAX[bit 7] - AVX-IFMA CPUID.(EAX=7,ECX=1):EAX[bit 23] - AVX-VNNI-INT8 CPUID.(EAX=7,ECX=1):EDX[bit 4] - AVX-NE-CONVERT CPUID.(EAX=7,ECX=1):EDX[bit 5] Add above features to new CPU model SierraForest. Comparing with GraniteRapids CPU model, SierraForest bare-metal removes the following features: - HLE CPUID.(EAX=7,ECX=0):EBX[bit 4] - RTM CPUID.(EAX=7,ECX=0):EBX[bit 11] - AVX512F CPUID.(EAX=7,ECX=0):EBX[bit 16] - AVX512DQ CPUID.(EAX=7,ECX=0):EBX[bit 17] - AVX512_IFMA CPUID.(EAX=7,ECX=0):EBX[bit 21] - AVX512CD CPUID.(EAX=7,ECX=0):EBX[bit 28] - AVX512BW CPUID.(EAX=7,ECX=0):EBX[bit 30] - AVX512VL CPUID.(EAX=7,ECX=0):EBX[bit 31] - AVX512_VBMI CPUID.(EAX=7,ECX=0):ECX[bit 1] - AVX512_VBMI2 CPUID.(EAX=7,ECX=0):ECX[bit 6] - AVX512_VNNI CPUID.(EAX=7,ECX=0):ECX[bit 11] - AVX512_BITALG CPUID.(EAX=7,ECX=0):ECX[bit 12] - AVX512_VPOPCNTDQ CPUID.(EAX=7,ECX=0):ECX[bit 14] - LA57 CPUID.(EAX=7,ECX=0):ECX[bit 16] - TSXLDTRK CPUID.(EAX=7,ECX=0):EDX[bit 16] - AMX-BF16 CPUID.(EAX=7,ECX=0):EDX[bit 22] - AVX512_FP16 CPUID.(EAX=7,ECX=0):EDX[bit 23] - AMX-TILE CPUID.(EAX=7,ECX=0):EDX[bit 24] - AMX-INT8 CPUID.(EAX=7,ECX=0):EDX[bit 25] - AVX512_BF16 CPUID.(EAX=7,ECX=1):EAX[bit 5] - fast zero-length MOVSB CPUID.(EAX=7,ECX=1):EAX[bit 10] - fast short CMPSB, SCASB CPUID.(EAX=7,ECX=1):EAX[bit 12] - AMX-FP16 CPUID.(EAX=7,ECX=1):EAX[bit 21] - PREFETCHI CPUID.(EAX=7,ECX=1):EDX[bit 14] - XFD CPUID.(EAX=0xD,ECX=1):EAX[bit 4] - EPT_PAGE_WALK_LENGTH_5 VMX_EPT_VPID_CAP(0x48c)[bit 7] Add all features of GraniteRapids CPU model except above features to SierraForest CPU model. SierraForest doesn’t support TSX and RTM but supports TAA_NO. When RTM is not enabled in host, KVM will not report TAA_NO. So, just don't include TAA_NO in SierraForest CPU model. [1] https://cdrdv2.intel.com/v1/dl/getContent/671368 Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: Tao Su <tao1.su@linux.intel.com> Message-ID: <20240320021044.508263-1-tao1.su@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-04-23 17:35:25 +02:00
Zhenzhong Duan	c895fa54e3	target/i386: Introduce Icelake-Server-v7 to enable TSX When starting an L2 guest with both L1/L2 using Icelake-Server-v3 or above, QEMU reports the warning below: "warning: host doesn't support requested feature: MSR(10AH).taa-no [bit 8]" The reason is that QEMU Icelake-Server-v3 has the TSX feature disabled but enables the taa-no bit. It's meaningless to claim TSX is secure when TSX isn't supported. So L1 KVM doesn't expose taa-no to L2 if TSX is unsupported, and starting L2 then triggers the warning. Fix it by introducing a new version Icelake-Server-v7 which has both TSX and taa-no features. Then the guest can use TSX securely when it sees taa-no. This matches the production Icelake which supports TSX and isn't susceptible to TSX Async Abort (TAA) vulnerabilities, a.k.a, taa-no. Ideally, TSX should have been enabled together with taa-no since v3, but for compatibility, we'd better add v7 to enable it. Fixes: `d965dc3559` ("target/i386: Add ARCH_CAPABILITIES related bits into Icelake-Server CPU model") Tested-by: Xiangfei Ma <xiangfeix.ma@intel.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Message-ID: <20240320093138.80267-2-zhenzhong.duan@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-04-23 17:35:25 +02:00
Sean Christopherson	a5acf4f26c	i386/kvm: Move architectural CPUID leaf generation to separate helper Move the architectural (for lack of a better term) CPUID leaf generation to a separate helper so that the generation code can be reused by TDX, which needs to generate a canonical VM-scoped configuration. For now this is just a cleanup, so keep the function static. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Message-ID: <20240229063726.610065-23-xiaoyao.li@intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-04-23 17:35:13 +02:00
Peter Maydell	c25df57ae8	Update version for 9.0.0 release Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2024-04-23 14:19:21 +01:00
Gerd Hoffmann	0d08c42368	kvm: add support for guest physical bits Query kvm for supported guest physical address bits, in cpuid function 80000008, eax[23:16]. Usually this is identical to host physical address bits. With NPT or EPT being used this might be restricted to 48 (max 4-level paging address space size) even if the host cpu supports more physical address bits. When set pass this to the guest, using cpuid too. Guest firmware can use this to figure how big the usable guest physical address space is, so PCI bar mapping are actually reachable. Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Message-ID: <20240318155336.156197-2-kraxel@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-04-18 11:17:28 +02:00
Gerd Hoffmann	513ba32dcc	target/i386: add guest-phys-bits cpu property Allows to set guest-phys-bits (cpuid leaf 80000008, eax[23:16]) via -cpu $model,guest-phys-bits=$nr. Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Message-ID: <20240318155336.156197-3-kraxel@redhat.com> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-04-18 11:17:28 +02:00
Paolo Bonzini	85fa9acda8	hw: Add compat machines for 9.1 Add 9.1 machine types for arm/i440fx/m68k/q35/s390x/spapr. Reviewed-by: Cornelia Huck <cohuck@redhat.com> Acked-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Cc: Gavin Shan <gshan@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-04-18 11:17:28 +02:00
Paolo Bonzini	1e1e48792a	kvm: use configs/ definition to conditionalize debug support If an architecture adds support for KVM_CAP_SET_GUEST_DEBUG but QEMU does not have the necessary code, QEMU will fail to build after updating kernel headers. Avoid this by using a #define in config-target.h instead of KVM_CAP_SET_GUEST_DEBUG. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-04-18 11:17:27 +02:00
Paolo Bonzini	f89761d349	vga: move dirty memory region code together Take into account split screen mode close to wrap around, which is the other special case for dirty memory region computation. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-04-18 11:17:27 +02:00
Paolo Bonzini	ab75ecb79b	vga: optimize computation of dirty memory region The depth == 0 and depth == 15 have to be special cased because width * depth / 8 does not provide the correct scanline length. However, thanks to the recent reorganization of vga_draw_graphic() the correct value of VRAM bits per pixel is available in "bits". Use it (via the same "bwidth" computation that is used later in the function), thus restricting the slow path to the wraparound case. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-04-18 11:17:27 +02:00
Paolo Bonzini	748e62dbf5	stubs: move monitor_fdsets_cleanup with other fdset stubs Even though monitor_get_fd() has to remain separate because it is mocked by tests/unit/test-util-sockets, monitor_fdsets_cleanup() is logically part of the stubs for monitor/fds.c, so move it there. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-ID: <20240408155330.522792-19-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-04-18 11:17:27 +02:00
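
Two of the commits listed above describe values packed into bit fields: guest-phys-bits lives in cpuid leaf 80000008, eax[23:16], and the upper 16 bits of kvm_userspace_memory_region::slot carry the address space id. The following is a minimal Python sketch of how such fields can be decoded. The function names are hypothetical and are not QEMU or KVM functions, and treating the lower 16 bits of the slot value as the slot id is an assumption based on the 32-bit width of that field.

```python
# Illustrative sketch only: helper names are hypothetical, not QEMU/KVM APIs.

def guest_phys_bits(eax: int) -> int:
    """Extract bits 23:16 of CPUID leaf 0x80000008 EAX (guest physical address bits)."""
    return (eax >> 16) & 0xFF


def split_slot(slot: int) -> tuple:
    """Split a 32-bit memslot value into (address space id, slot id).

    The upper 16 bits are the address space id, as stated in the commit message;
    the lower 16 bits are assumed here to be the slot id.
    """
    return (slot >> 16) & 0xFFFF, slot & 0xFFFF


if __name__ == "__main__":
    # 0x30 in bits 23:16 means 48 guest physical address bits.
    print(guest_phys_bits(0x00303034))
    # Address space 1, slot 3.
    print(split_slot(0x00010003))
```

A value of 48 in the first field corresponds to the 4-level paging limit mentioned in the "kvm: add support for guest physical bits" commit.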

112545 Commits