x86_cpu_get_supported_cpuid() already checks for KVM/HVF
accelerators, so it is not needed to manually check it via
a call to accel_uses_host_cpuid() before calling it.
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20230913093009.83520-3-philmd@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
In case more code is added after the kvm_hyperv_expand_features()
call, check its return value (since it can fail).
Fixes: 071ce4b03b ("i386: expand Hyper-V features during CPU feature expansion time")
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20230913093009.83520-2-philmd@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Latest Intel platform GraniteRapids-D introduces AMX-COMPLEX, which adds
two instructions to perform matrix multiplication of two tiles containing
complex elements and accumulate the results into a packed single precision
tile.
AMX-COMPLEX is enumerated via CPUID.(EAX=7,ECX=1):EDX[bit 8]. Add the CPUID
definition for AMX-COMPLEX, AMX-COMPLEX will be enabled automatically when
using '-cpu host' and KVM advertises AMX-COMPLEX to userspace.
Signed-off-by: Tao Su <tao1.su@linux.intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-ID: <20230830074324.84059-1-tao1.su@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Current QEMU can expose waitpkg to guests when it is available. However,
VMX_SECONDARY_EXEC_ENABLE_USER_WAIT_PAUSE is still not recognized and
masked by QEMU. This can lead to an unexpected situation when a L1
hypervisor wants to expose waitpkg to a L2 guest. The L1 hypervisor can
assume that VMX_SECONDARY_EXEC_ENABLE_USER_WAIT_PAUSE exists as waitpkg is
available. The L1 hypervisor then can accidentally expose waitpkg to the
L2 guest. This will cause invalid opcode exception in the L2 guest when
it executes waitpkg related instructions.
This patch adds VMX_SECONDARY_EXEC_ENABLE_USER_WAIT_PAUSE support, and
sets up dependency between the bit and CPUID_7_0_ECX_WAITPKG. QEMU should
not expose waitpkg feature if VMX_SECONDARY_EXEC_ENABLE_USER_WAIT_PAUSE is
not available to avoid unexpected invalid opcode exception in L2 guests.
Signed-off-by: Ake Koomsin <ake@igel.co.jp>
Message-ID: <20230807093339.32091-2-ake@igel.co.jp>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The GraniteRapids CPU model mainly adds the following new features
based on SapphireRapids:
- PREFETCHITI CPUID.(EAX=7,ECX=1):EDX[bit 14]
- AMX-FP16 CPUID.(EAX=7,ECX=1):EAX[bit 21]
And adds the following security fix for corresponding vulnerabilities:
- MCDT_NO CPUID.(EAX=7,ECX=2):EDX[bit 5]
- SBDR_SSDP_NO MSR_IA32_ARCH_CAPABILITIES[bit 13]
- FBSDP_NO MSR_IA32_ARCH_CAPABILITIES[bit 14]
- PSDP_NO MSR_IA32_ARCH_CAPABILITIES[bit 15]
- PBRSB_NO MSR_IA32_ARCH_CAPABILITIES[bit 24]
Signed-off-by: Tao Su <tao1.su@linux.intel.com>
Tested-by: Xuelian Guo <xuelian.guo@intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-ID: <20230706054949.66556-7-tao1.su@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
SapphireRapids has bit 13, 14 and 15 of MSR_IA32_ARCH_CAPABILITIES
enabled, which are related to some security fixes.
Add version 2 of SapphireRapids CPU model with those bits enabled also.
Signed-off-by: Lei Wang <lei4.wang@intel.com>
Signed-off-by: Tao Su <tao1.su@linux.intel.com>
Message-ID: <20230706054949.66556-6-tao1.su@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
CPUID.(EAX=7,ECX=2):EDX[bit 5] enumerates MCDT_NO. Processors enumerate
this bit as 1 do not exhibit MXCSR Configuration Dependent Timing (MCDT)
behavior and do not need to be mitigated to avoid data-dependent behavior
for certain instructions.
Since MCDT_NO is in a new sub-leaf, add a new CPUID feature word
FEAT_7_2_EDX. Also update cpuid_level_func7 by FEAT_7_2_EDX.
Signed-off-by: Tao Su <tao1.su@linux.intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-ID: <20230706054949.66556-3-tao1.su@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
If FEAT_7_1_EAX is 0 and FEAT_7_1_EDX is non-zero, as is the case
with a Granite Rapids host and
'-cpu host,-avx-vnni,-avx512-bf16,-fzrm,-fsrs,-fsrc,-amx-fp16', we can't
get CPUID_7_1 leaf even though CPUID_7_1_EDX has non-zero value.
Update cpuid_level_func7 according to CPUID_7_1_EDX, otherwise
guest may report wrong maximum number sub-leaves in leaf 07H.
Fixes: eaaa197d5b ("target/i386: Add support for AVX-VNNI-INT8 in CPUID enumeration")
Cc: qemu-stable@nongnu.org
Signed-off-by: Tao Su <tao1.su@linux.intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-ID: <20230706054949.66556-2-tao1.su@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
32-bit binaries can run on a long mode processor even if the kernel
is 64-bit, of course, and this can have slightly different behavior;
for example, SYSCALL is allowed on Intel processors.
Allow reporting LM to programs running under user mode emulation,
so that "-cpu" can be used with named CPU models even for qemu-i386
and even without disabling LM by hand.
Fortunately, most of the runtime code in QEMU has to depend on HF_LMA_MASK
or on HF_CS64_MASK (which is anyway false for qemu-i386's 32-bit code
segment) rather than TARGET_X86_64, therefore all that is needed is an
update of linux-user's ring 0 setup.
Fixes: https://gitlab.com/qemu-project/qemu/-/issues/1534
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Features such as PCID are only accessible through privileged operations,
and therefore have no impact on any user-mode operation. Allow reporting
them to programs running under user mode emulation, so that "-cpu" can be
used with more named CPU models.
XSAVES would be similar, but it doesn't make sense to provide it until
XSAVEC is implemented.
With this change, all CPUs up to Broadwell-v4 can be emulate. Skylake-Client
requires XSAVEC, while EPYC also requires SHA-NI, MISALIGNSSE and TOPOEXT.
MISALIGNSSE is not hard to implement, but I am not sure it is worth using
a precious hflags bit for it.
Fixes: https://gitlab.com/qemu-project/qemu/-/issues/1534
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
ARCH_CAPABILITIES is only accessible through a read-only MSR, so it has
no impact on any user-mode operation (user-mode cannot read the MSR).
So do not bother printing warnings about it in user mode emulation.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
On Intel CPUs there are certain bits in MSR_ARCH_CAPABILITIES that
indicates if the CPU is not affected by a vulnerability. Without these
bits guests may try to deploy the mitigation even if the CPU is not
affected.
Export the bits to guests that indicate immunity to hardware
vulnerabilities.
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Message-ID: <63d85cc76d4cdc51e6c732478b81d8f13be11e5a.1687551881.git.pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
AMD supports both 32-bit and 64-bit SYSCALL/SYSRET, but the TCG only
exposes it for 64-bit targets. For system emulation just reuse the
helper; for user-mode emulation the ABI is the same as "int $80".
The BSDs does not support any fast system call mechanism in 32-bit
mode so add to bsd-user the same stub that FreeBSD has for 64-bit
compatibility mode.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
RDPID corresponds to a RDMSR(TSC_AUX); however, it is unprivileged
so for user-mode emulation we must provide the value that the kernel
places in the MSR. For Linux, it is a combination of the current CPU
and the current NUMA node, both of which can be retrieved with getcpu(2).
Also try sched_getcpu(), which might be there on the BSDs. If there is
no portable way to retrieve the current CPU id from userspace, return 0.
RDTSCP is reimplemented as RDTSC + RDPID ECX; the differences in terms
of serializability are not relevant to QEMU.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
WBNOINVD is the same as INVD or WBINVD as far as TCG is concerned,
since there is no cache in TCG and therefore no invalidation side effect
in WBNOINVD.
With respect to SVM emulation, processors that do not support WBNOINVD
will ignore the prefix and treat it as WBINVD, while those that support
it will generate exactly the same vmexit.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
XSAVEERPTR is actually a fix for an errata; TCG does not have the issue.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
TCG implements RDSEED, and in fact uses qcrypto_random_bytes which is
secure enough to match hardware behavior. Expose it to guests.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The AMD prefetch(w) instructions have not been deprecated together with the rest
of 3DNow!, and in fact are even supported by newer Intel processor. Mark them
as supported by TCG, as it supports all of 3DNow!.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Based on the kernel commit "b0563468ee x86/CPU/AMD: Disable XSAVES on
AMD family 0x17", host system with EPYC-Rome can clear XSAVES capability
bit. In another words, EPYC-Rome host without XSAVES can occur. Thus, we
need an EPYC-Rome cpu model (without this feature) that matches the
solution of fixing this erratum
Signed-off-by: Maksim Davydov <davydov-max@yandex-team.ru>
Message-Id: <20230524213748.8918-1-davydov-max@yandex-team.ru>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
As reported by the Intel's doc:
"FB_CLEAR: The processor will overwrite fill buffer values as part of
MD_CLEAR operations with the VERW instruction.
On these processors, L1D_FLUSH does not overwrite fill buffer values."
If this cpu feature is present in host, allow QEMU to choose whether to
show it to the guest too.
One disadvantage of not exposing it is that the guest will report
a non existing vulnerability in
/sys/devices/system/cpu/vulnerabilities/mmio_stale_data
because the mitigation is present only when the cpu has
(FLUSH_L1D and MD_CLEAR) or FB_CLEAR
features enabled.
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Message-Id: <20230201135759.555607-3-eesposit@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
As reported by Intel's doc:
"L1D_FLUSH: Writeback and invalidate the L1 data cache"
If this cpu feature is present in host, allow QEMU to choose whether to
show it to the guest too.
One disadvantage of not exposing it is that the guest will report
a non existing vulnerability in
/sys/devices/system/cpu/vulnerabilities/mmio_stale_data
because the mitigation is present only when the cpu has
(FLUSH_L1D and MD_CLEAR) or FB_CLEAR
features enabled.
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Message-Id: <20230201135759.555607-2-eesposit@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Adds the support for AMD EPYC Genoa generation processors. The model
display for the new processor will be EPYC-Genoa.
Adds the following new feature bits on top of the feature bits from
the previous generation EPYC models.
avx512f : AVX-512 Foundation instruction
avx512dq : AVX-512 Doubleword & Quadword Instruction
avx512ifma : AVX-512 Integer Fused Multiply Add instruction
avx512cd : AVX-512 Conflict Detection instruction
avx512bw : AVX-512 Byte and Word Instructions
avx512vl : AVX-512 Vector Length Extension Instructions
avx512vbmi : AVX-512 Vector Byte Manipulation Instruction
avx512_vbmi2 : AVX-512 Additional Vector Byte Manipulation Instruction
gfni : AVX-512 Galois Field New Instructions
avx512_vnni : AVX-512 Vector Neural Network Instructions
avx512_bitalg : AVX-512 Bit Algorithms, add bit algorithms Instructions
avx512_vpopcntdq: AVX-512 AVX-512 Vector Population Count Doubleword and
Quadword Instructions
avx512_bf16 : AVX-512 BFLOAT16 instructions
la57 : 57-bit virtual address support (5-level Page Tables)
vnmi : Virtual NMI (VNMI) allows the hypervisor to inject the NMI
into the guest without using Event Injection mechanism
meaning not required to track the guest NMI and intercepting
the IRET.
auto-ibrs : The AMD Zen4 core supports a new feature called Automatic IBRS.
It is a "set-and-forget" feature that means that, unlike e.g.,
s/w-toggled SPEC_CTRL.IBRS, h/w manages its IBRS mitigation
resources automatically across CPL transitions.
Signed-off-by: Babu Moger <babu.moger@amd.com>
Message-Id: <20230504205313.225073-8-babu.moger@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add the following featute bits.
vnmi: Virtual NMI (VNMI) allows the hypervisor to inject the NMI into the
guest without using Event Injection mechanism meaning not required to
track the guest NMI and intercepting the IRET.
The presence of this feature is indicated via the CPUID function
0x8000000A_EDX[25].
automatic-ibrs :
The AMD Zen4 core supports a new feature called Automatic IBRS.
It is a "set-and-forget" feature that means that, unlike e.g.,
s/w-toggled SPEC_CTRL.IBRS, h/w manages its IBRS mitigation
resources automatically across CPL transitions.
The presence of this feature is indicated via the CPUID function
0x80000021_EAX[8].
The documention for the features are available in the links below.
a. Processor Programming Reference (PPR) for AMD Family 19h Model 01h,
Revision B1 Processors
b. AMD64 Architecture Programmer’s Manual Volumes 1–5 Publication No. Revision
40332 4.05 Date October 2022
Signed-off-by: Santosh Shukla <santosh.shukla@amd.com>
Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Babu Moger <babu.moger@amd.com>
Link: https://www.amd.com/system/files/TechDocs/55898_B1_pub_0.50.zip
Link: https://www.amd.com/system/files/TechDocs/40332_4.05.pdf
Message-Id: <20230504205313.225073-7-babu.moger@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add the following feature bits for EPYC-Milan model and bump the version.
vaes : Vector VAES(ENC|DEC), VAES(ENC|DEC)LAST instruction support
vpclmulqdq : Vector VPCLMULQDQ instruction support
stibp-always-on : Single Thread Indirect Branch Prediction Mode has enhanced
performance and may be left Always on
amd-psfd : Predictive Store Forward Disable
no-nested-data-bp : Processor ignores nested data breakpoints
lfence-always-serializing : LFENCE instruction is always serializing
null-sel-clr-base : Null Selector Clears Base. When this bit is
set, a null segment load clears the segment base
These new features will be added in EPYC-Milan-v2. The "-cpu help" output
after the change will be.
x86 EPYC-Milan (alias configured by machine type)
x86 EPYC-Milan-v1 AMD EPYC-Milan Processor
x86 EPYC-Milan-v2 AMD EPYC-Milan Processor
The documentation for the features are available in the links below.
a. Processor Programming Reference (PPR) for AMD Family 19h Model 01h,
Revision B1 Processors
b. SECURITY ANALYSIS OF AMD PREDICTIVE STORE FORWARDING
c. AMD64 Architecture Programmer’s Manual Volumes 1–5 Publication No. Revision
40332 4.05 Date October 2022
Signed-off-by: Babu Moger <babu.moger@amd.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Link: https://www.amd.com/system/files/TechDocs/55898_B1_pub_0.50.zip
Link: https://www.amd.com/system/files/documents/security-analysis-predictive-store-forwarding.pdf
Link: https://www.amd.com/system/files/TechDocs/40332_4.05.pdf
Message-Id: <20230504205313.225073-6-babu.moger@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add the following feature bits.
no-nested-data-bp : Processor ignores nested data breakpoints.
lfence-always-serializing : LFENCE instruction is always serializing.
null-sel-cls-base : Null Selector Clears Base. When this bit is
set, a null segment load clears the segment base.
The documentation for the features are available in the links below.
a. Processor Programming Reference (PPR) for AMD Family 19h Model 01h,
Revision B1 Processors
b. AMD64 Architecture Programmer’s Manual Volumes 1–5 Publication No. Revision
40332 4.05 Date October 2022
Signed-off-by: Babu Moger <babu.moger@amd.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Link: https://www.amd.com/system/files/TechDocs/55898_B1_pub_0.50.zip
Link: https://www.amd.com/system/files/TechDocs/40332_4.05.pdf
Message-Id: <20230504205313.225073-5-babu.moger@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add the following feature bits.
amd-psfd : Predictive Store Forwarding Disable:
PSF is a hardware-based micro-architectural optimization
designed to improve the performance of code execution by
predicting address dependencies between loads and stores.
While SSBD (Speculative Store Bypass Disable) disables both
PSF and speculative store bypass, PSFD only disables PSF.
PSFD may be desirable for the software which is concerned
with the speculative behavior of PSF but desires a smaller
performance impact than setting SSBD.
Depends on the following kernel commit:
b73a54321ad8 ("KVM: x86: Expose Predictive Store Forwarding Disable")
stibp-always-on :
Single Thread Indirect Branch Prediction mode has enhanced
performance and may be left always on.
The documentation for the features are available in the links below.
a. Processor Programming Reference (PPR) for AMD Family 19h Model 01h,
Revision B1 Processors
b. SECURITY ANALYSIS OF AMD PREDICTIVE STORE FORWARDING
Signed-off-by: Babu Moger <babu.moger@amd.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Link: https://www.amd.com/system/files/documents/security-analysis-predictive-store-forwarding.pdf
Link: https://www.amd.com/system/files/TechDocs/55898_B1_pub_0.50.zip
Message-Id: <20230504205313.225073-4-babu.moger@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Introduce new EPYC cpu versions: EPYC-v4 and EPYC-Rome-v3.
The only difference vs. older models is an updated cache_info with
the 'complex_indexing' bit unset, since this bit is not currently
defined for AMD and may cause problems should it be used for
something else in the future. Setting this bit will also cause
CPUID validation failures when running SEV-SNP guests.
Signed-off-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Babu Moger <babu.moger@amd.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20230504205313.225073-3-babu.moger@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
New EPYC CPUs versions require small changes to their cache_info's.
Because current QEMU x86 CPU definition does not support versioned
cach_info, we would have to declare a new CPU type for each such case.
To avoid the dup work, add "cache_info" in X86CPUVersionDefinition",
to allow new cache_info pointers to be specified for a new CPU version.
Co-developed-by: Wei Huang <wei.huang2@amd.com>
Signed-off-by: Wei Huang <wei.huang2@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Babu Moger <babu.moger@amd.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20230504205313.225073-2-babu.moger@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Latest Intel platform Granite Rapids has introduced a new instruction -
PREFETCHIT0/1, which moves code to memory (cache) closer to the
processor depending on specific hints.
The bit definition:
CPUID.(EAX=7,ECX=1):EDX[bit 14]
Add CPUID definition for PREFETCHIT0/1.
Signed-off-by: Jiaxi Chen <jiaxi.chen@linux.intel.com>
Signed-off-by: Tao Su <tao1.su@linux.intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-Id: <20230303065913.1246327-7-tao1.su@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
AVX-NE-CONVERT is a new set of instructions which can convert low
precision floating point like BF16/FP16 to high precision floating point
FP32, as well as convert FP32 elements to BF16. This instruction allows
the platform to have improved AI capabilities and better compatibility.
The bit definition:
CPUID.(EAX=7,ECX=1):EDX[bit 5]
Add CPUID definition for AVX-NE-CONVERT.
Signed-off-by: Jiaxi Chen <jiaxi.chen@linux.intel.com>
Signed-off-by: Tao Su <tao1.su@linux.intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-Id: <20230303065913.1246327-6-tao1.su@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
AVX-VNNI-INT8 is a new set of instructions in the latest Intel platform
Sierra Forest, aims for the platform to have superior AI capabilities.
This instruction multiplies the individual bytes of two unsigned or
unsigned source operands, then adds and accumulates the results into the
destination dword element size operand.
The bit definition:
CPUID.(EAX=7,ECX=1):EDX[bit 4]
AVX-VNNI-INT8 is on a new feature bits leaf. Add a CPUID feature word
FEAT_7_1_EDX for this leaf.
Add CPUID definition for AVX-VNNI-INT8.
Signed-off-by: Jiaxi Chen <jiaxi.chen@linux.intel.com>
Signed-off-by: Tao Su <tao1.su@linux.intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-Id: <20230303065913.1246327-5-tao1.su@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
AVX-IFMA is a new instruction in the latest Intel platform Sierra
Forest. This instruction packed multiplies unsigned 52-bit integers and
adds the low/high 52-bit products to Qword Accumulators.
The bit definition:
CPUID.(EAX=7,ECX=1):EAX[bit 23]
Add CPUID definition for AVX-IFMA.
Signed-off-by: Jiaxi Chen <jiaxi.chen@linux.intel.com>
Signed-off-by: Tao Su <tao1.su@linux.intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-Id: <20230303065913.1246327-4-tao1.su@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Latest Intel platform Granite Rapids has introduced a new instruction -
AMX-FP16, which performs dot-products of two FP16 tiles and accumulates
the results into a packed single precision tile. AMX-FP16 adds FP16
capability and allows a FP16 GPU trained model to run faster without
loss of accuracy or added SW overhead.
The bit definition:
CPUID.(EAX=7,ECX=1):EAX[bit 21]
Add CPUID definition for AMX-FP16.
Signed-off-by: Jiaxi Chen <jiaxi.chen@linux.intel.com>
Signed-off-by: Tao Su <tao1.su@linux.intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-Id: <20230303065913.1246327-3-tao1.su@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
CMPccXADD is a new set of instructions in the latest Intel platform
Sierra Forest. This new instruction set includes a semaphore operation
that can compare and add the operands if condition is met, which can
improve database performance.
The bit definition:
CPUID.(EAX=7,ECX=1):EAX[bit 7]
Add CPUID definition for CMPCCXADD.
Signed-off-by: Jiaxi Chen <jiaxi.chen@linux.intel.com>
Signed-off-by: Tao Su <tao1.su@linux.intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-Id: <20230303065913.1246327-2-tao1.su@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Update the setting of CPUID 0x8000001F EBX to clearly document the ranges
associated with fields being set.
Fixes: 6cb8f2a663 ("cpu/i386: populate CPUID 0x8000_001F when SEV is active")
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <5822fd7d02b575121380e1f493a8f6d9eba2b11a.1664550870.git.thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
We want to get rid of the "#ifdef TARGET_X86_64" compile-time switch
in the long run, so we can drop the separate compilation of the
"qemu-system-i386" binary one day - but we then still need a way to
run a guest with max. CPU settings in 32-bit mode. So the "max" CPU
should determine its family/model/stepping settings according to the
"large mode" (LM) CPU feature bit during runtime, so that it is
possible to run "qemu-system-x86_64 -cpu max,lm=off" and still get
a sane family/model/stepping setting for the guest CPU.
To be able to check the LM bit, we have to move the code that sets
up these properties to a "realize" function, since the LM setting is
not available yet when the "instance_init" function is being called.
Message-Id: <20230306154311.476458-1-thuth@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
The previous patch wrongly replaced FEAT_XSAVE_XCR0_{LO|HI} with
FEAT_XSAVE_XSS_{LO|HI} in CPUID(EAX=12,ECX=1):{ECX,EDX}. As a result,
SGX enclaves only supported SSE and x87 feature (xfrm=0x3).
Fixes: 301e90675c ("target/i386: Enable support for XSAVES based features")
Signed-off-by: Yang Zhong <yang.zhong@linux.intel.com>
Reviewed-by: Yang Weijiang <weijiang.yang@intel.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Message-Id: <20230406064041.420039-1-yang.zhong@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This had been pulled in via qemu/plugin.h from hw/core/cpu.h,
but that will be removed.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20230310195252.210956-5-richard.henderson@linaro.org>
[AJB: add various additional cases shown by CI]
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20230315174331.2959-15-alex.bennee@linaro.org>
Reviewed-by: Emilio Cota <cota@braap.org>
Since commit a0e61807a3 ("qapi: Remove QMP events and commands from
user-mode builds") we don't generate the "qapi-commands-machine.h"
header in a user-emulation-only build.
Guard qmp_query_cpu_definitions() within CONFIG_USER_ONLY; move
x86_cpu_class_check_missing_features() closer since it is only used
by this QMP command handler.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20230223155540.30370-3-philmd@linaro.org>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Anton Johansson <anjo@rev.ng>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20230227135202.9710-3-anjo@rev.ng>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Introduce support for emulating CPUID for Xen HVM guests. It doesn't make
sense to advertise the KVM leaves to a Xen guest, so do Xen unconditionally
when the xen-version machine property is set.
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
[dwmw2: Obtain xen_version from KVM property, make it automatic]
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>
The new CPU model mostly inherits features from Icelake-Server, while
adding new features:
- AMX (Advance Matrix eXtensions)
- Bus Lock Debug Exception
and new instructions:
- AVX VNNI (Vector Neural Network Instruction):
- VPDPBUS: Multiply and Add Unsigned and Signed Bytes
- VPDPBUSDS: Multiply and Add Unsigned and Signed Bytes with Saturation
- VPDPWSSD: Multiply and Add Signed Word Integers
- VPDPWSSDS: Multiply and Add Signed Integers with Saturation
- FP16: Replicates existing AVX512 computational SP (FP32) instructions
using FP16 instead of FP32 for ~2X performance gain
- SERIALIZE: Provide software with a simple way to force the processor to
complete all modifications, faster, allowed in all privilege levels and
not causing an unconditional VM exit
- TSX Suspend Load Address Tracking: Allows programmers to choose which
memory accesses do not need to be tracked in the TSX read set
- AVX512_BF16: Vector Neural Network Instructions supporting BFLOAT16
inputs and conversion instructions from IEEE single precision
- fast zero-length MOVSB (KVM doesn't support yet)
- fast short STOSB (KVM doesn't support yet)
- fast short CMPSB, SCASB (KVM doesn't support yet)
Features that may be added in future versions:
- CET (virtualization support hasn't been merged)
Signed-off-by: Wang, Lei <lei4.wang@intel.com>
Reviewed-by: Robert Hoo <robert.hu@linux.intel.com>
Message-Id: <20220812055751.14553-1-lei4.wang@intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
These are three more markers for string operation optimizations.
They can all be added to TCG, whose string operations are more or
less as fast as they can be for short lengths.
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Fast short REP MOVS can be added to TCG, since a trivial translation
of string operation is a good option for short lengths.
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The new SGX Asynchronous Exit (AEX) notification mechanism (AEX-notify)
allows one enclave to receive a notification in the ERESUME after the
enclave exit due to an AEX. EDECCSSA is a new SGX user leaf function
(ENCLU[EDECCSSA]) to facilitate the AEX notification handling.
Whether the hardware supports to create enclave with AEX-notify support
is enumerated via CPUID.(EAX=0x12,ECX=0x1):EAX[10]. The new EDECCSSA
user leaf function is enumerated via CPUID.(EAX=0x12,ECX=0x0):EAX[11].
Add support to allow to expose the new SGX AEX-notify feature and the
new EDECCSSA user leaf function to KVM guest.
Link: https://lore.kernel.org/lkml/166760360549.4906.809756297092548496.tip-bot2@tip-bot2/
Link: https://lore.kernel.org/lkml/166760360934.4906.2427175408052308969.tip-bot2@tip-bot2/
Reviewed-by: Yang Zhong <yang.zhong@linux.intel.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
Message-Id: <20221109024834.172705-1-kai.huang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Convert the i386 CPU class to use 3-phase reset, so it doesn't
need to use device_class_set_parent_reset() any more.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Edgar E. Iglesias <edgar@zeroasic.com>
Reviewed-by: Taylor Simpson <tsimpson@quicinc.com>
Message-id: 20221124115023.2437291-7-peter.maydell@linaro.org
The has_FOO for pointer-valued FOO are redundant, except for arrays.
They are also a nuisance to work with. Recent commit "qapi: Start to
elide redundant has_FOO in generated C" provided the means to elide
them step by step. This is the step for qapi/machine*.json.
Said commit explains the transformation in more detail. The invariant
violations mentioned there do not occur here.
Cc: Eduardo Habkost <eduardo@habkost.net>
Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Cc: Philippe Mathieu-Daudé <f4bug@amsat.org>
Cc: Yanan Wang <wangyanan55@huawei.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20221104160712.3005652-16-armbru@redhat.com>
The only issue with FMA instructions is that there are _a lot_ of them (30
opcodes, each of which comes in up to 4 versions depending on VEX.W and
VEX.L; a total of 96 possibilities). However, they can be implement with
only 6 helpers, two for scalar operations and four for packed operations.
(Scalar versions do not do any merging; they only affect the bottom 32
or 64 bits of the output operand. Therefore, there is no separate XMM
and YMM of the scalar helpers).
First, we can reduce the number of helpers to one third by passing four
operands (one output and three inputs); the reordering of which operands
go to the multiply and which go to the add is done in emit.c.
Second, the different instructions also dispatch to the same softfloat
function, so the flags for float32_muladd and float64_muladd are passed
in the helper as int arguments, with a little extra complication to
handle FMADDSUB and FMSUBADD.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
F16C only consists of two instructions, which are a bit peculiar
nevertheless.
First, they access only the low half of an YMM or XMM register for the
packed-half operand; the exact size still depends on the VEX.L flag.
This is similar to the existing avx_movx flag, but not exactly because
avx_movx is hardcoded to affect operand 2. To this end I added a "ph"
format name; it's possible to reuse this approach for the VPMOVSX and
VPMOVZX instructions, though that would also require adding two more
formats for the low-quarter and low-eighth of an operand.
Second, VCVTPS2PH is somewhat weird because it *stores* the result of
the instruction into memory rather than loading it.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Include AVX, AVX2 and VAES in the guest cpuid features supported by TCG.
Signed-off-by: Paul Brook <paul@nowt.org>
Message-Id: <20220424220204.2493824-40-paul@nowt.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The semantic difference between the deprecated device_legacy_reset()
function and the newer device_cold_reset() function is that the new
function resets both the device itself and any qbuses it owns,
whereas the legacy function resets just the device itself and nothing
else.
The x86_cpu_after_reset() function uses device_legacy_reset() to reset
the APIC; this is an APICCommonState and does not have any qbuses, so
for this purpose the two functions behave identically and we can stop
using the deprecated one.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20221013171926.1447899-1-peter.maydell@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Resetting a guest that has Hyper-V VMBus support enabled triggers a QEMU
assertion failure:
hw/hyperv/hyperv.c:131: synic_reset: Assertion `QLIST_EMPTY(&synic->sint_routes)' failed.
This happens both on normal guest reboot or when using "system_reset" HMP
command.
The failing assertion was introduced by commit 64ddecc88b ("hyperv: SControl is optional to enable SynIc")
to catch dangling SINT routes on SynIC reset.
The root cause of this problem is that the SynIC itself is reset before
devices using SINT routes have chance to clean up these routes.
Since there seems to be no existing mechanism to force reset callbacks (or
methods) to be executed in specific order let's use a similar method that
is already used to reset another interrupt controller (APIC) after devices
have been reset - by invoking the SynIC reset from the machine reset
handler via a new x86_cpu_after_reset() function co-located with
the existing x86_cpu_reset() in target/i386/cpu.c.
Opportunistically move the APIC reset handler there, too.
Fixes: 64ddecc88b ("hyperv: SControl is optional to enable SynIc") # exposed the bug
Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Message-Id: <cb57cee2e29b20d06f81dce054cbcea8b5d497e8.1664552976.git.maciej.szmigiero@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add support for saving/restoring extended save states when signals
are delivered. This allows using AVX, MPX or PKRU registers in
signal handlers.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
For the direct triple faults, i.e. hardware detected and KVM morphed
to VM-Exit, KVM will never lose them. But for triple faults sythesized
by KVM, e.g. the RSM path, if KVM exits to userspace before the request
is serviced, userspace could migrate the VM and lose the triple fault.
A new flag KVM_VCPUEVENT_VALID_TRIPLE_FAULT is defined to signal that
the event.triple_fault_pending field contains a valid state if the
KVM_CAP_X86_TRIPLE_FAULT_EVENT capability is enabled.
Acked-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
Message-Id: <20220929072014.20705-2-chenyi.qiang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When the user queries CPU models via QMP there is a 'deprecated' flag
present, however, this is not done for the CLI '-cpu help' command.
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20220707163720.1421716-5-berrange@redhat.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Thomas Huth <thuth@redhat.com>
When QEMU is started with '-cpu host,host-cache-info=on', it will
passthrough host's number of logical processors sharing cache and
number of processor cores in the physical package. QEMU already
fixes up the later to correctly reflect number of configured cores
for VM, however number of logical processors sharing cache is still
comes from host CPU, which confuses guest started with:
-machine q35,accel=kvm \
-cpu host,host-cache-info=on,l3-cache=off \
-smp 20,sockets=2,dies=1,cores=10,threads=1 \
-numa node,nodeid=0,memdev=ram-node0 \
-numa node,nodeid=1,memdev=ram-node1 \
-numa cpu,socket-id=0,node-id=0 \
-numa cpu,socket-id=1,node-id=1
on 2 socket Xeon 4210R host with 10 cores per socket
with CPUID[04H]:
...
--- cache 3 ---
cache type = unified cache (3)
cache level = 0x3 (3)
self-initializing cache level = true
fully associative cache = false
maximum IDs for CPUs sharing cache = 0x1f (31)
maximum IDs for cores in pkg = 0xf (15)
...
that doesn't match number of logical processors VM was
configured with and as result RHEL 9.0 guest complains:
sched: CPU #10's llc-sibling CPU #0 is not on the same node! [node: 1 != 0]. Ignoring dependency.
WARNING: CPU: 10 PID: 0 at arch/x86/kernel/smpboot.c:421 topology_sane.isra.0+0x67/0x80
...
Call Trace:
set_cpu_sibling_map+0x176/0x590
start_secondary+0x5b/0x150
secondary_startup_64_no_verify+0xc2/0xcb
Fix it by capping max number of logical processors to vcpus/socket
as it was configured, which fixes the issue.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=2088311
Message-Id: <20220524151020.2541698-3-imammedo@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Accourding Intel's CPUID[EAX=04H] resulting bits 31 - 26 in EAX
should be:
"
**** The nearest power-of-2 integer that is not smaller than (1 + EAX[31:26]) is the number of unique
Core_IDs reserved for addressing different processor cores in a physical package. Core ID is a subset of
bits of the initial APIC ID.
"
ensure that values stored in EAX[31-26] always meets this condition.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20220524151020.2541698-2-imammedo@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The previous patch used wrong count setting with index value, which got wrong
value from CPUID(EAX=12,ECX=0):EAX. So the SGX1 instruction can't be exposed
to VM and the SGX decice can't work in VM.
Fixes: d19d6ffa07 ("target/i386: introduce helper to access supported CPUID")
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Message-Id: <20220530131834.1222801-1-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Hyper-V TLFS allows for L0 and L1 hypervisors to collaborate on L2's
TLB flush hypercalls handling. With the correct setup, L2's TLB flush
hypercalls can be handled by L0 directly, without the need to exit to
L1.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20220525115949.1294004-6-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
KVM kind of supported "extended GVA ranges" (up to 4095 additional GFNs
per hypercall) since the implementation of Hyper-V PV TLB flush feature
(Linux-4.18) as regardless of the request, full TLB flush was always
performed. "Extended GVA ranges for TLB flush hypercalls" feature bit
wasn't exposed then. Now, as KVM gains support for fine-grained TLB
flush handling, exposing this feature starts making sense.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20220525115949.1294004-5-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Hyper-V specification allows to pass parameters for certain hypercalls
using XMM registers ("XMM Fast Hypercall Input"). When the feature is
in use, it allows for faster hypercalls processing as KVM can avoid
reading guest's memory.
KVM supports the feature since v5.14.
Rename HV_HYPERCALL_{PARAMS_XMM_AVAILABLE -> XMM_INPUT_AVAILABLE} to
comply with KVM.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20220525115949.1294004-4-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The newly introduced enlightenment allow L0 (KVM) and L1 (Hyper-V)
hypervisors to collaborate to avoid unnecessary updates to L2
MSR-Bitmap upon vmexits.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20220525115949.1294004-3-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Since KVM commit 5f76f6f5ff96 ("KVM: nVMX: Do not expose MPX VMX controls when guest MPX disabled")
it is not possible to disable MPX on a "-cpu host" just by adding "-mpx"
there if the host CPU does indeed support MPX.
QEMU will fail to set MSR_IA32_VMX_TRUE_{EXIT,ENTRY}_CTLS MSRs in this case
and so trigger an assertion failure.
Instead, besides "-mpx" one has to explicitly add also
"-vmx-exit-clear-bndcfgs" and "-vmx-entry-load-bndcfgs" to QEMU command
line to make it work, which is a bit convoluted.
Make the MPX-related bits in FEAT_VMX_{EXIT,ENTRY}_CTLS dependent on MPX
being actually enabled so such workarounds are no longer necessary.
Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Message-Id: <51aa2125c76363204cc23c27165e778097c33f0b.1653323077.git.maciej.szmigiero@oracle.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
If CPUID.(EAX=07H, ECX=0):EDX[19] is set to 1, the processor
supports Architectural LBRs. In this case, CPUID leaf 01CH
indicates details of the Architectural LBRs capabilities.
XSAVE support for Architectural LBRs is enumerated in
CPUID.(EAX=0DH, ECX=0FH).
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Message-Id: <20220215195258.29149-9-weijiang.yang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Define Arch LBR bit in XSS and save/restore structure
for XSAVE area size calculation.
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Message-Id: <20220215195258.29149-6-weijiang.yang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
There're some new features, including Arch LBR, depending
on XSAVES/XRSTORS support, the new instructions will
save/restore data based on feature bits enabled in XCR0 | XSS.
This patch adds the basic support for related CPUID enumeration
and meanwhile changes the name from FEAT_XSAVE_COMP_{LO|HI} to
FEAT_XSAVE_XCR0_{LO|HI} to differentiate clearly the feature
bits in XCR0 and those in XSS.
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Message-Id: <20220215195258.29149-5-weijiang.yang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The Last Branch Recording (LBR) is a performance monitor unit (PMU)
feature on Intel processors which records a running trace of the most
recent branches taken by the processor in the LBR stack. This option
indicates the LBR format to enable for guest perf.
The LBR feature is enabled if below conditions are met:
1) KVM is enabled and the PMU is enabled.
2) msr-based-feature IA32_PERF_CAPABILITIES is supporterd on KVM.
3) Supported returned value for lbr_fmt from above msr is non-zero.
4) Guest vcpu model does support FEAT_1_ECX.CPUID_EXT_PDCM.
5) User-provided lbr-fmt value doesn't violate its bitmask (0x3f).
6) Target guest LBR format matches that of host.
Co-developed-by: Like Xu <like.xu@linux.intel.com>
Signed-off-by: Like Xu <like.xu@linux.intel.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Message-Id: <20220215195258.29149-3-weijiang.yang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Icelake, is the codename for Intel 3rd generation Xeon Scalable server
processors. There isn't ever client variants. This "Icelake-Client" CPU
model was added wrongly and imaginarily.
It has been deprecated since v5.2, now it's time to remove it completely
from code.
Signed-off-by: Robert Hoo <robert.hu@linux.intel.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <1647247859-4947-1-git-send-email-robert.hu@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When cache_info_passthrough is requested, QEMU passes the host values
of the cache information CPUID leaves down to the guest. However,
it blindly assumes that the CPUID leaf exists on the host, and this
cannot be guaranteed: for example, KVM has recently started to
synthesize AMD leaves up to 0x80000021 in order to provide accurate
CPU bug information to guests.
Querying a nonexistent host leaf fills the output arguments of
host_cpuid with data that (albeit deterministic) is nonsensical
as cache information, namely the data in the highest Intel CPUID
leaf. If said highest leaf is not ECX-dependent, this can even
cause an infinite loop when kvm_arch_init_vcpu prepares the input
to KVM_SET_CPUID2. The infinite loop is only terminated by an
abort() when the array gets full.
Reported-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Capstone should be superior to the old libopcode disassembler,
so we can drop the old file nowadays.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20220412165836.355850-4-thuth@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
SynDbg commands can come from two different flows:
1. Hypercalls, in this mode the data being sent is fully
encapsulated network packets.
2. SynDbg specific MSRs, in this mode only the data that needs to be
transfered is passed.
Signed-off-by: Jon Doron <arilou@gmail.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Message-Id: <20220216102500.692781-4-arilou@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Some versions of Windows hang on reboot if their TSC value is greater
than 2^54. The calibration of the Hyper-V reference time overflows
and fails; as a result the processors' clock sources are out of sync.
The issue is that the TSC _should_ be reset to 0 on CPU reset and
QEMU tries to do that. However, KVM special cases writing 0 to the
TSC and thinks that QEMU is trying to hot-plug a CPU, which is
correct the first time through but not later. Thwart this valiant
effort and reset the TSC to 1 instead, but only if the CPU has been
run once.
For this to work, env->tsc has to be moved to the part of CPUArchState
that is not zeroed at the beginning of x86_cpu_reset.
Reported-by: Vadim Rozenfeld <vrozenfe@redhat.com>
Supersedes: <20220324082346.72180-1-pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Some AMD processors expose the PKRU extended save state even if they do not have
the related PKU feature in CPUID. Worse, when they do they report a size of
64, whereas the expected size of the PKRU extended save state is 8, therefore
the esa->size == eax assertion does not hold.
The state is already ignored by KVM_GET_SUPPORTED_CPUID because it
was not enabled in the host XCR0. However, QEMU kvm_cpu_xsave_init()
runs before QEMU invokes arch_prctl() to enable dynamically-enabled
save states such as XTILEDATA, and KVM_GET_SUPPORTED_CPUID hides save
states that have yet to be enabled. Therefore, kvm_cpu_xsave_init()
needs to consult the host CPUID instead of KVM_GET_SUPPORTED_CPUID,
and dies with an assertion failure.
When setting up the ExtSaveArea array to match the host, ignore features that
KVM does not report as supported. This will cause QEMU to skip the incorrect
CPUID leaf instead of tripping the assertion.
Closes: https://gitlab.com/qemu-project/qemu/-/issues/916
Reported-by: Daniel P. Berrangé <berrange@redhat.com>
Analyzed-by: Yang Zhong <yang.zhong@intel.com>
Reported-by: Peter Krempa <pkrempa@redhat.com>
Tested-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Even when the feature is not supported in guest CPUID,
still set the msr to the default value which will
be the only value KVM will accept in this case
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20220223115824.319821-1-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Windows 11 with WSL2 enabled (Hyper-V) fails to boot with Icelake-Server
{-v5} CPU model but boots well with '-cpu host'. Apparently, it expects
5-level paging and 5-level EPT support to come in pair but QEMU's
Icelake-Server CPU model lacks the later. Introduce 'Icelake-Server-v6'
CPU model with 'vmx-page-walk-5' enabled by default.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20220221145316.576138-1-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add AMX primary feature bits XFD and AMX_TILE to
enumerate the CPU's AMX capability. Meanwhile, add
AMX TILE and TMUL CPUID leaf and subleaves which
exist when AMX TILE is present to provide the maximum
capability of TILE and TMUL.
Signed-off-by: Jing Liu <jing2.liu@intel.com>
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Message-Id: <20220217060434.52460-6-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Intel introduces XFD faulting mechanism for extended
XSAVE features to dynamically enable the features in
runtime. If CPUID (EAX=0Dh, ECX=n, n>1).ECX[2] is set
as 1, it indicates support for XFD faulting of this
state component.
Signed-off-by: Jing Liu <jing2.liu@intel.com>
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Message-Id: <20220217060434.52460-5-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Kernel allocates 4K xstate buffer by default. For XSAVE features
which require large state component (e.g. AMX), Linux kernel
dynamically expands the xstate buffer only after the process has
acquired the necessary permissions. Those are called dynamically-
enabled XSAVE features (or dynamic xfeatures).
There are separate permissions for native tasks and guests.
Qemu should request the guest permissions for dynamic xfeatures
which will be exposed to the guest. This only needs to be done
once before the first vcpu is created.
KVM implemented one new ARCH_GET_XCOMP_SUPP system attribute API to
get host side supported_xcr0 and Qemu can decide if it can request
dynamically enabled XSAVE features permission.
https://lore.kernel.org/all/20220126152210.3044876-1-pbonzini@redhat.com/
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Signed-off-by: Jing Liu <jing2.liu@intel.com>
Message-Id: <20220217060434.52460-4-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The AMX TILECFG register and the TMMx tile data registers are
saved/restored via XSAVE, respectively in state component 17
(64 bytes) and state component 18 (8192 bytes).
Add AMX feature bits to x86_ext_save_areas array to set
up AMX components. Add structs that define the layout of
AMX XSAVE areas and use QEMU_BUILD_BUG_ON to validate the
structs sizes.
Signed-off-by: Jing Liu <jing2.liu@intel.com>
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Message-Id: <20220217060434.52460-3-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The extended state subleaves (EAX=0Dh, ECX=n, n>1).ECX[1]
indicate whether the extended state component locates
on the next 64-byte boundary following the preceding state
component when the compacted format of an XSAVE area is
used.
Right now, they are all zero because no supported component
needed the bit to be set, but the upcoming AMX feature will
use it. Fix the subleaves value according to KVM's supported
cpuid.
Signed-off-by: Jing Liu <jing2.liu@intel.com>
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Message-Id: <20220217060434.52460-2-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The "hardware version" machinery (qemu_set_hw_version(),
qemu_hw_version(), and the QEMU_HW_VERSION define) is used by fewer
than 10 files. Move it out from osdep.h into a new
qemu/hw-version.h.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220208200856.3558249-6-peter.maydell@linaro.org
* DMA support in the multiboot option ROM
* Rename default-bus-bypass-iommu
* Deprecate -watchdog and cleanup -watchdog-action
* HVF fix for <PAGE_SIZE regions
* Support TSC scaling for AMD nested virtualization
* Fix for ESP fuzzing bug
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmGBUeEUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroOh+Qf+OMRhRiv6dYjbK/5zXrx81AgxYAY3
dBUSr8v16LyrMl1U3DZWzhD+MzQsC83m/Xsh4lGxlHDWtkK9QQA5xDG95JZdY26i
MGCbbjnFHISbyBQV9Y724gPfPjOOODuoFbzafSx6VLITOcyv1ye0cm7TOjOPB+tt
E4c3JqTZ7g8a5yMe8ItkVhz5pPY+oVw8dxMNRp6Sup5Dbfx0DjacIwLasLsHfPL7
qBADfqB20ovHUzLjXu7oWgEd4KxJ6kiSCaJJu/KD36hg0wB8+WVP1o43j4PkczHT
QjU7eZaeaTrN5Cf34ttPge6QReMi5SFNCaA9O9/HLqrQgdEtt/diZWuqjQ==
=a2mC
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging
* Build system fixes and cleanups
* DMA support in the multiboot option ROM
* Rename default-bus-bypass-iommu
* Deprecate -watchdog and cleanup -watchdog-action
* HVF fix for <PAGE_SIZE regions
* Support TSC scaling for AMD nested virtualization
* Fix for ESP fuzzing bug
# gpg: Signature made Tue 02 Nov 2021 10:57:37 AM EDT
# gpg: using RSA key F13338574B662389866C7682BFFBD25F78C7AE83
# gpg: issuer "pbonzini@redhat.com"
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [full]
# gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" [full]
* remotes/bonzini/tags/for-upstream: (27 commits)
configure: fix --audio-drv-list help message
configure: Remove the check for the __thread keyword
Move the l2tpv3 test from configure to meson.build
meson: remove unnecessary coreaudio test program
meson: remove pointless warnings
meson.build: Allow to disable OSS again
meson: bump submodule to 0.59.3
qtest/am53c974-test: add test for cancelling in-flight requests
esp: ensure in-flight SCSI requests are always cancelled
KVM: SVM: add migration support for nested TSC scaling
hw/i386: fix vmmouse registration
watchdog: remove select_watchdog_action
vl: deprecate -watchdog
watchdog: add information from -watchdog help to -device help
hw/i386: Rename default_bus_bypass_iommu
hvf: Avoid mapping regions < PAGE_SIZE as ram
configure: do not duplicate CPU_CFLAGS into QEMU_LDFLAGS
configure: remove useless NPTL probe
target/i386: use DMA-enabled multiboot ROM for new-enough QEMU machine types
optionrom: add a DMA-enabled multiboot ROM
...
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
I noticed -cpu help printing enough trailing spaces to make the output
at least 84 characters wide. Looks ugly unless the terminal is wider.
Ugly or not, trailing spaces are stupid.
The culprit is this line in x86_cpu_list_entry():
qemu_printf("x86 %-20s %-58s\n", name, desc);
This prints a string with minimum field left-justified right before a
newline. Change it to
qemu_printf("x86 %-20s %s\n", name, desc);
which avoids the trailing spaces and is simpler to boot.
A search for the pattern with "git-grep -E '%-[0-9]+s\\n'" found a few
more instances. Change them similarly.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Acked-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Acked-by: Max Filippov <jcmvbkbc@gmail.com>
Message-Id: <20211009152401.2982862-1-armbru@redhat.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Because core-capability releated features are model-specific and KVM
won't support it, remove the core-capability in CPU model to avoid the
warning message.
Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
Message-Id: <20210827064818.4698-3-chenyi.qiang@intel.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Only declare sev_enabled() and sev_es_enabled() when CONFIG_SEV is
set, to allow the compiler to elide unused code. Remove unnecessary
stubs.
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20211007161716.453984-17-philmd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
SEV is a x86 specific feature, and the "sev_i386.h" header
is already in target/i386/. Rename it as "sev.h" to simplify.
Patch created mechanically using:
$ git mv target/i386/sev_i386.h target/i386/sev.h
$ sed -i s/sev_i386.h/sev.h/ $(git grep -l sev_i386.h)
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20211007161716.453984-15-philmd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Commit 00b8105324 ("target-i386: Remove assert_no_error usage")
forgot to add the "qapi/error.h" for &error_abort, add it now.
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Connor Kuehl <ckuehl@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20211007161716.453984-8-philmd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
KVM implements some Hyper-V 2016 functions so providing WS2008R2 version
is somewhat incorrect. While generally guests shouldn't care about it
and always check feature bits, it is known that some tools in Windows
actually check version info.
For compatibility reasons make the change for 6.2 machine types only.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210902093530.345756-9-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Currently, we hardcode Hyper-V version id (CPUID 0x40000002) to
WS2008R2 and it is known that certain tools in Windows check this. It
seems useful to provide some flexibility by making it possible to change
this info at will. CPUID information is defined in TLFS as:
EAX: Build Number
EBX Bits 31-16: Major Version
Bits 15-0: Minor Version
ECX Service Pack
EDX Bits 31-24: Service Branch
Bits 23-0: Service Number
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210902093530.345756-8-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The enlightenment allows to use Hyper-V SynIC with hardware APICv/AVIC
enabled. Normally, Hyper-V SynIC disables these hardware features and
suggests the guest to use paravirtualized AutoEOI feature. Linux-4.15
gains support for conditional APICv/AVIC disablement, the feature
stays on until the guest tries to use AutoEOI feature with SynIC. With
'HV_DEPRECATING_AEOI_RECOMMENDED' bit exposed, modern enough Windows/
Hyper-V versions should follow the recommendation and not use the
(unwanted) feature.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210902093530.345756-7-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>