Observed the following failure while booting the SEV-SNP guest and the
guest fails to boot with the smp parameters:
"-smp 192,sockets=1,dies=12,cores=8,threads=2".
qemu-system-x86_64: sev_snp_launch_update: SNP_LAUNCH_UPDATE ret=-5 fw_error=22 'Invalid parameter'
qemu-system-x86_64: SEV-SNP: CPUID validation failed for function 0x8000001e, index: 0x0.
provided: eax:0x00000000, ebx: 0x00000100, ecx: 0x00000b00, edx: 0x00000000
expected: eax:0x00000000, ebx: 0x00000100, ecx: 0x00000300, edx: 0x00000000
qemu-system-x86_64: SEV-SNP: failed update CPUID page
Reason for the failure is due to overflowing of bits used for "Node per
processor" in CPUID Fn8000001E_ECX. This field's width is 3 bits wide and
can hold maximum value 0x7. With dies=12 (0xB), it overflows and spills
over into the reserved bits. In the case of SEV-SNP, this causes CPUID
enforcement failure and guest fails to boot.
The PPR documentation for CPUID_Fn8000001E_ECX [Node Identifiers]
=================================================================
Bits Description
31:11 Reserved.
10:8 NodesPerProcessor: Node per processor. Read-only.
ValidValues:
Value Description
0h 1 node per processor.
7h-1h Reserved.
7:0 NodeId: Node ID. Read-only. Reset: Fixed,XXh.
=================================================================
As in the spec, the valid value for "node per processor" is 0 and rest
are reserved.
Looking back at the history of decoding of CPUID_Fn8000001E_ECX, noticed
that there were cases where "node per processor" can be more than 1. It
is valid only for pre-F17h (pre-EPYC) architectures. For EPYC or later
CPUs, the linux kernel does not use this information to build the L3
topology.
Also noted that the CPUID Function 0x8000001E_ECX is available only when
TOPOEXT feature is enabled. This feature is enabled only for EPYC(F17h)
or later processors. So, previous generation of processors do not not
enumerate 0x8000001E_ECX leaf.
There could be some corner cases where the older guests could enable the
TOPOEXT feature by running with -cpu host, in which case legacy guests
might notice the topology change. To address those cases introduced a
new CPU property "legacy-multi-node". It will be true for older machine
types to maintain compatibility. By default, it will be false, so new
decoding will be used going forward.
The documentation is taken from Preliminary Processor Programming
Reference (PPR) for AMD Family 19h Model 11h, Revision B1 Processors 55901
Rev 0.25 - Oct 6, 2022.
Cc: qemu-stable@nongnu.org
Fixes: 31ada106d8 ("Simplify CPUID_8000_001E for AMD")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Signed-off-by: Babu Moger <babu.moger@amd.com>
Message-ID: <0ee4b0a8293188a53970a2b0e4f4ef713425055e.1714757834.git.babu.moger@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
- Extract page-protection definitions to page-protection.h
- Rework in accel/tcg in preparation of extracting TCG fields from CPUState
- More uses of get_task_state() in user emulation
- Xen refactors in preparation for adding multiple map caches (Juergen & Edgar)
- MAINTAINERS updates (Aleksandar and Bin)
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE+qvnXhKRciHc/Wuy4+MsLN6twN4FAmY40CAACgkQ4+MsLN6t
wN5drxAA1oIsuUzpAJmlMIxZwlzbICiuexgn/HH9DwWNlrarKo7V1l4YB8jd9WOg
IKuj7c39kJKsDEB8BXApYwcly+l7DYdnAAI8Z7a+eN+ffKNl/0XBaLjsGf58RNwY
fb39/cXWI9ZxKxsHMSyjpiu68gOGvZ5JJqa30Fr+eOGuug9Fn/fOe1zC6l/dMagy
Dnym72stpD+hcsN5sVwohTBIk+7g9og1O/ctRx6Q3ZCOPz4p0+JNf8VUu43/reaR
294yRK++JrSMhOVFRzP+FH1G25NxiOrVCFXZsUTYU+qPDtdiKtjH1keI/sk7rwZ7
U573lesl7ewQFf1PvMdaVf0TrQyOe6kUGr9Mn2k8+KgjYRAjTAQk8V4Ric/+xXSU
0rd7Cz7lyQ8jm0DoOElROv+lTDQs4dvm3BopF3Bojo4xHLHd3SFhROVPG4tvGQ3H
72Q5UPR2Jr2QZKiImvPceUOg0z5XxoN6KRUkSEpMFOiTRkbwnrH59z/qPijUpe6v
8l5IlI9GjwkL7pcRensp1VC6e9KC7F5Od1J/2RLDw3UQllMQXqVw2bxD3CEtDRJL
QSZoS4d1jUCW4iAYdqh/8+2cOIPiCJ4ai5u7lSdjrIJkRErm32FV/pQLZauoHlT5
eTPUgzDoRXVgI1X1slTpVXlEEvRNbhZqSkYLkXr80MLn5hTafo0=
=3Qkg
-----END PGP SIGNATURE-----
Merge tag 'accel-20240506' of https://github.com/philmd/qemu into staging
Accelerator patches
- Extract page-protection definitions to page-protection.h
- Rework in accel/tcg in preparation of extracting TCG fields from CPUState
- More uses of get_task_state() in user emulation
- Xen refactors in preparation for adding multiple map caches (Juergen & Edgar)
- MAINTAINERS updates (Aleksandar and Bin)
# -----BEGIN PGP SIGNATURE-----
#
# iQIzBAABCAAdFiEE+qvnXhKRciHc/Wuy4+MsLN6twN4FAmY40CAACgkQ4+MsLN6t
# wN5drxAA1oIsuUzpAJmlMIxZwlzbICiuexgn/HH9DwWNlrarKo7V1l4YB8jd9WOg
# IKuj7c39kJKsDEB8BXApYwcly+l7DYdnAAI8Z7a+eN+ffKNl/0XBaLjsGf58RNwY
# fb39/cXWI9ZxKxsHMSyjpiu68gOGvZ5JJqa30Fr+eOGuug9Fn/fOe1zC6l/dMagy
# Dnym72stpD+hcsN5sVwohTBIk+7g9og1O/ctRx6Q3ZCOPz4p0+JNf8VUu43/reaR
# 294yRK++JrSMhOVFRzP+FH1G25NxiOrVCFXZsUTYU+qPDtdiKtjH1keI/sk7rwZ7
# U573lesl7ewQFf1PvMdaVf0TrQyOe6kUGr9Mn2k8+KgjYRAjTAQk8V4Ric/+xXSU
# 0rd7Cz7lyQ8jm0DoOElROv+lTDQs4dvm3BopF3Bojo4xHLHd3SFhROVPG4tvGQ3H
# 72Q5UPR2Jr2QZKiImvPceUOg0z5XxoN6KRUkSEpMFOiTRkbwnrH59z/qPijUpe6v
# 8l5IlI9GjwkL7pcRensp1VC6e9KC7F5Od1J/2RLDw3UQllMQXqVw2bxD3CEtDRJL
# QSZoS4d1jUCW4iAYdqh/8+2cOIPiCJ4ai5u7lSdjrIJkRErm32FV/pQLZauoHlT5
# eTPUgzDoRXVgI1X1slTpVXlEEvRNbhZqSkYLkXr80MLn5hTafo0=
# =3Qkg
# -----END PGP SIGNATURE-----
# gpg: Signature made Mon 06 May 2024 05:42:08 AM PDT
# gpg: using RSA key FAABE75E12917221DCFD6BB2E3E32C2CDEADC0DE
# gpg: Good signature from "Philippe Mathieu-Daudé (F4BUG) <f4bug@amsat.org>" [full]
* tag 'accel-20240506' of https://github.com/philmd/qemu: (28 commits)
MAINTAINERS: Update my email address
MAINTAINERS: Update Aleksandar Rikalo email
system: Pass RAM MemoryRegion and is_write in xen_map_cache()
xen: mapcache: Break out xen_map_cache_init_single()
xen: mapcache: Break out xen_invalidate_map_cache_single()
xen: mapcache: Refactor xen_invalidate_map_cache_entry_unlocked
xen: mapcache: Refactor xen_replace_cache_entry_unlocked
xen: mapcache: Break out xen_ram_addr_from_mapcache_single
xen: mapcache: Refactor xen_remap_bucket for multi-instance
xen: mapcache: Refactor xen_map_cache for multi-instance
xen: mapcache: Refactor lock functions for multi-instance
xen: let xen_ram_addr_from_mapcache() return -1 in case of not found entry
system: let qemu_map_ram_ptr() use qemu_ram_ptr_length()
user: Use get_task_state() helper
user: Declare get_task_state() once in 'accel/tcg/vcpu-state.h'
user: Forward declare TaskState type definition
accel/tcg: Move @plugin_mem_cbs from CPUState to CPUNegativeOffsetState
accel/tcg: Restrict cpu_plugin_mem_cbs_enabled() to TCG
accel/tcg: Restrict qemu_plugin_vcpu_exit_hook() to TCG plugins
accel/tcg: Update CPUNegativeOffsetState::can_do_io field documentation
...
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Add the missing features(ss, tsc-adjust, cldemote, movdiri, movdir64b) in
the SapphireRapids-v3 CPU model.
Signed-off-by: Lei Wang <lei4.wang@intel.com>
Message-ID: <20240424072912.43188-1-lei4.wang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Printing an "x86" in front of each CPU name is not helpful at all:
It is confusing for the users since they don't know whether they
have to specify these letters for the "-cpu" parameter, too, and
it also takes some precious space in the dense output of the CPU
entries. Let's simply remove this now and use two spaces at the
beginning of the lines for the indentation of the entries instead,
like most other target architectures are doing it for their CPU help
output already.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Michael Tokarev <mjt@tls.msk.ru>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
We pass a ResetType argument to the Resettable class enter
phase method, but we don't pass it to hold and exit, even though
the callsites have it readily available. This means that if
a device cared about the ResetType it would need to record it
in the enter phase method to use later on. Pass the type to
all three of the phase methods to avoid having to do that.
Commit created with
for dir in hw target include; do \
spatch --macro-file scripts/cocci-macro-file.h \
--sp-file scripts/coccinelle/reset-type.cocci \
--keep-comments --smpl-spacing --in-place \
--include-headers --dir $dir; done
and no manual edits.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@amd.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Luc Michel <luc.michel@amd.com>
Message-id: 20240412160809.1260625-5-peter.maydell@linaro.org
Currently, the difference between warn_report_once() and
error_report_once() is the former has the "warning:" prefix, while the
latter does not have a similar level prefix.
At the meantime, considering that there is no error handling logic here,
and the purpose of error_report_once() is only to prompt the user with
an abnormal message, there is no need to use an error-level message here,
and instead we can just use a warning.
Therefore, downgrade the message in error_report_once() to warning, and
merge it into the previous warn_report_once().
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Message-ID: <20240327103951.3853425-4-zhao1.liu@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The difference between error_printf() and error_report() is the latter
may contain more information, such as the name of the program
("qemu-system-x86_64").
Thus its variant error_report_once() and warn_report()'s variant
warn_report_once() can be used here to print the information only once
without a static local variable "ht_warned".
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Message-ID: <20240327103951.3853425-3-zhao1.liu@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Register File Data Sampling (RFDS) is a CPU side-channel vulnerability
that may expose stale register value. CPUs that set RFDS_NO bit in MSR
IA32_ARCH_CAPABILITIES indicate that they are not vulnerable to RFDS.
Similarly, RFDS_CLEAR indicates that CPU is affected by RFDS, and has
the microcode to help mitigate RFDS.
Make RFDS_CLEAR and RFDS_NO bits available to guests.
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Message-ID: <9a38877857392b5c2deae7e7db1b170d15510314.1710341348.git.pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
According to table 1-2 in Intel Architecture Instruction Set Extensions and
Future Features (rev 051) [1], SierraForest has the following new features
which have already been virtualized:
- CMPCCXADD CPUID.(EAX=7,ECX=1):EAX[bit 7]
- AVX-IFMA CPUID.(EAX=7,ECX=1):EAX[bit 23]
- AVX-VNNI-INT8 CPUID.(EAX=7,ECX=1):EDX[bit 4]
- AVX-NE-CONVERT CPUID.(EAX=7,ECX=1):EDX[bit 5]
Add above features to new CPU model SierraForest. Comparing with GraniteRapids
CPU model, SierraForest bare-metal removes the following features:
- HLE CPUID.(EAX=7,ECX=0):EBX[bit 4]
- RTM CPUID.(EAX=7,ECX=0):EBX[bit 11]
- AVX512F CPUID.(EAX=7,ECX=0):EBX[bit 16]
- AVX512DQ CPUID.(EAX=7,ECX=0):EBX[bit 17]
- AVX512_IFMA CPUID.(EAX=7,ECX=0):EBX[bit 21]
- AVX512CD CPUID.(EAX=7,ECX=0):EBX[bit 28]
- AVX512BW CPUID.(EAX=7,ECX=0):EBX[bit 30]
- AVX512VL CPUID.(EAX=7,ECX=0):EBX[bit 31]
- AVX512_VBMI CPUID.(EAX=7,ECX=0):ECX[bit 1]
- AVX512_VBMI2 CPUID.(EAX=7,ECX=0):ECX[bit 6]
- AVX512_VNNI CPUID.(EAX=7,ECX=0):ECX[bit 11]
- AVX512_BITALG CPUID.(EAX=7,ECX=0):ECX[bit 12]
- AVX512_VPOPCNTDQ CPUID.(EAX=7,ECX=0):ECX[bit 14]
- LA57 CPUID.(EAX=7,ECX=0):ECX[bit 16]
- TSXLDTRK CPUID.(EAX=7,ECX=0):EDX[bit 16]
- AMX-BF16 CPUID.(EAX=7,ECX=0):EDX[bit 22]
- AVX512_FP16 CPUID.(EAX=7,ECX=0):EDX[bit 23]
- AMX-TILE CPUID.(EAX=7,ECX=0):EDX[bit 24]
- AMX-INT8 CPUID.(EAX=7,ECX=0):EDX[bit 25]
- AVX512_BF16 CPUID.(EAX=7,ECX=1):EAX[bit 5]
- fast zero-length MOVSB CPUID.(EAX=7,ECX=1):EAX[bit 10]
- fast short CMPSB, SCASB CPUID.(EAX=7,ECX=1):EAX[bit 12]
- AMX-FP16 CPUID.(EAX=7,ECX=1):EAX[bit 21]
- PREFETCHI CPUID.(EAX=7,ECX=1):EDX[bit 14]
- XFD CPUID.(EAX=0xD,ECX=1):EAX[bit 4]
- EPT_PAGE_WALK_LENGTH_5 VMX_EPT_VPID_CAP(0x48c)[bit 7]
Add all features of GraniteRapids CPU model except above features to
SierraForest CPU model.
SierraForest doesn’t support TSX and RTM but supports TAA_NO. When RTM is
not enabled in host, KVM will not report TAA_NO. So, just don't include
TAA_NO in SierraForest CPU model.
[1] https://cdrdv2.intel.com/v1/dl/getContent/671368
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Tao Su <tao1.su@linux.intel.com>
Message-ID: <20240320021044.508263-1-tao1.su@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When start L2 guest with both L1/L2 using Icelake-Server-v3 or above,
QEMU reports below warning:
"warning: host doesn't support requested feature: MSR(10AH).taa-no [bit 8]"
Reason is QEMU Icelake-Server-v3 has TSX feature disabled but enables taa-no
bit. It's meaningless that TSX isn't supported but still claim TSX is secure.
So L1 KVM doesn't expose taa-no to L2 if TSX is unsupported, then starting L2
triggers the warning.
Fix it by introducing a new version Icelake-Server-v7 which has both TSX
and taa-no features. Then guest can use TSX securely when it see taa-no.
This matches the production Icelake which supports TSX and isn't susceptible
to TSX Async Abort (TAA) vulnerabilities, a.k.a, taa-no.
Ideally, TSX should have being enabled together with taa-no since v3, but for
compatibility, we'd better to add v7 to enable it.
Fixes: d965dc3559 ("target/i386: Add ARCH_CAPABILITIES related bits into Icelake-Server CPU model")
Tested-by: Xiangfei Ma <xiangfeix.ma@intel.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Message-ID: <20240320093138.80267-2-zhenzhong.duan@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Allows to set guest-phys-bits (cpuid leaf 80000008, eax[23:16])
via -cpu $model,guest-phys-bits=$nr.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-ID: <20240318155336.156197-3-kraxel@redhat.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The low bit of MMU indices for x86 TCG indicates whether the processor is
in 32-bit mode and therefore linear addresses have to be masked to 32 bits.
However, the index was computed incorrectly, leading to possible conflicts
in the TLB for any address above 4G.
Analyzed-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Fixes: b1661801c1 ("target/i386: Fix physical address truncation", 2024-02-28)
Cc: qemu-stable@nongnu.org
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2206
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Since CPU() macro is a simple cast, the following are equivalent:
Object *obj;
CPUState *cs = CPU(obj)
In order to ease static analysis when running
scripts/coccinelle/cpu_env.cocci from the previous commit,
replace:
- CPU_GET_CLASS(cpu);
+ CPU_GET_CLASS(obj);
Most code use the 'cs' variable name for CPUState handle.
Replace few 's' -> 'cs' to unify cpu_reset_hold() style.
No logical change in this patch.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Acked-by: Richard Henderson <richard.henderson@linaro.org>
Message-ID: <20240129164514.73104-7-philmd@linaro.org>
Signed-off-by: Thomas Huth <thuth@redhat.com>
GDBFeature has the num_regs member so use it where applicable to
remove magic numbers.
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Message-Id: <20231213-gdb-v17-8-777047380591@daynix.com>
[AJB: remove core reg check from microblaze read reg]
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20240227144335.1196131-13-alex.bennee@linaro.org>
The address translation logic in get_physical_address() will currently
truncate physical addresses to 32 bits unless long mode is enabled.
This is incorrect when using physical address extensions (PAE) outside
of long mode, with the result that a 32-bit operating system using PAE
to access memory above 4G will experience undefined behaviour.
The truncation code was originally introduced in commit 33dfdb5 ("x86:
only allow real mode to access 32bit without LMA"), where it applied
only to translations performed while paging is disabled (and so cannot
affect guests using PAE).
Commit 9828198 ("target/i386: Add MMU_PHYS_IDX and MMU_NESTED_IDX")
rearranged the code such that the truncation also applied to the use
of MMU_PHYS_IDX and MMU_NESTED_IDX. Commit 4a1e9d4 ("target/i386: Use
atomic operations for pte updates") brought this truncation into scope
for page table entry accesses, and is the first commit for which a
Windows 10 32-bit guest will reliably fail to boot if memory above 4G
is present.
The truncation code however is not completely redundant. Even though the
maximum address size for any executed instruction is 32 bits, helpers for
operations such as BOUND, FSAVE or XSAVE may ask get_physical_address()
to translate an address outside of the 32-bit range, if invoked with an
argument that is close to the 4G boundary. Likewise for processor
accesses, for example TSS or IDT accesses, when EFER.LMA==0.
So, move the address truncation in get_physical_address() so that it
applies to 32-bit MMU indexes, but not to MMU_PHYS_IDX and MMU_NESTED_IDX.
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2040
Fixes: 4a1e9d4d11 ("target/i386: Use atomic operations for pte updates", 2022-10-18)
Cc: qemu-stable@nongnu.org
Co-developed-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Accesses from a 32-bit environment (32-bit code segment for instruction
accesses, EFER.LMA==0 for processor accesses) have to mask away the
upper 32 bits of the address. While a bit wasteful, the easiest way
to do so is to use separate MMU indexes. These days, QEMU anyway is
compiled with a fixed value for NB_MMU_MODES. Split MMU_USER_IDX,
MMU_KSMAP_IDX and MMU_KNOSMAP_IDX in two.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The processor tracing features in cpu_x86_cpuid() are hardcoded to a set
that should be safe on all processor that support PT virtualization.
But as an additional check, x86_cpu_filter_features() also checks
that the accelerator supports that safe subset, and if not it marks
CPUID_7_0_EBX_INTEL_PT as unavailable.
This check fails on accelerators other than KVM, but it is actually
unnecessary to do it because KVM is the only accelerator that uses the
safe subset. Everything else just provides nonzero values for CPUID
leaf 0x14 (TCG/HVF because processor tracing is not supported; qtest
because nothing is able to read CPUID anyway). Restricting the check
to KVM fixes a warning with the qtest accelerator:
$ qemu-system-x86_64 -display none -cpu max,mmx=off -accel qtest
qemu-system-x86_64: warning: TCG doesn't support requested feature: CPUID.07H:EBX.intel-pt [bit 25]
The warning also happens in the test-x86-cpuid-compat qtest.
Reported-by: Peter Maydell <peter.maydell@linaro.org>
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2096
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20240221162910.101327-1-pbonzini@redhat.com>
Fixes: d047402436 ("target/i386: Call accel-agnostic x86_cpu_get_supported_cpuid()")
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
KVM_FEATURE_ASYNC_PF_VMEXIT has been introduced for years, however QEMU
doesn't support expose it to guest. Add support for it.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-ID: <20231024083354.1171308-1-xiaoyao.li@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The value of FEAT_XSAVE_XCR0_HI leaf and FEAT_XSAVE_XSS_HI leaf also
need to be masked by XCR0 and XSS mask respectively, to make it
logically correct.
Fixes: 301e90675c ("target/i386: Enable support for XSAVES based features")
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Yang Weijiang <weijiang.yang@intel.com>
Message-ID: <20240115091325.1904229-3-xiaoyao.li@intel.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Leaf FEAT_XSAVE_XSS_LO and FEAT_XSAVE_XSS_HI also need to be cleared
when CPUID_EXT_XSAVE is not set.
Fixes: 301e90675c ("target/i386: Enable support for XSAVES based features")
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Yang Weijiang <weijiang.yang@intel.com>
Message-ID: <20240115091325.1904229-2-xiaoyao.li@intel.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Bernhard Beschow <shentey@gmail.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20240106132546.21248-4-shentey@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
This commit adds support for x2APIC transitions when writing to
MSR_IA32_APICBASE register and finally adds CPUID_EXT_X2APIC to
TCG_EXT_FEATURES.
The set_base in APICCommonClass now returns an integer to indicate error in
execution. apic_set_base return -1 on invalid APIC state transition,
accelerator can use this to raise appropriate exception.
Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
Message-Id: <20240111154404.5333-4-minhquangbui99@gmail.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
For user-only mode, use MMU_USER_IDX.
For system mode, use CPUClass.mmu_index.
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
'tcg_cflags' is specific to TCG.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20231130075958.21285-1-philmd@linaro.org>
Use generic cpu_model_from_type() when the CPU model name needs to
be extracted from the CPU type name.
Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-ID: <20231114235628.534334-23-gshan@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
The main difficulty here is that a page fault when writing to the destination
must not overwrite the flags. Therefore, the flags computation must be
inlined instead of using gen_jcc1*.
For simplicity, I am using an unconditional cmpxchg operation, that becomes
a NOP if the comparison fails.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
vCPU "reset" is only possible with system emulation.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Warner Losh <imp@bsdimp.com>
Reviewed-by: Song Gao <gaosong@loongson.cn>
Message-ID: <20231212113640.30287-5-philmd@linaro.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Improve
$ qemu-system-x86_64 -device max-x86_64-cpu,vendor=me
qemu-system-x86_64: -device max-x86_64-cpu,vendor=me: Property '.vendor' doesn't take value 'me'
to
qemu-system-x86_64: -device max-x86_64-cpu,vendor=0123456789abc: value of property 'vendor' must consist of exactly 12 characters
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-ID: <20231031111059.3407803-8-armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
[Typo corrected]
From CPUState.nr_cores' comment, it represents "number of cores within
this CPU package".
After 003f230e37 ("machine: Tweak the order of topology members in
struct CpuTopology"), the meaning of smp.cores changed to "the number of
cores in one die", but this commit missed to change CPUState.nr_cores'
calculation, so that CPUState.nr_cores became wrong and now it
misses to consider numbers of clusters and dies.
At present, only i386 is using CPUState.nr_cores.
But as for i386, which supports die level, the uses of CPUState.nr_cores
are very confusing:
Early uses are based on the meaning of "cores per package" (before die
is introduced into i386), and later uses are based on "cores per die"
(after die's introduction).
This difference is due to that commit a94e142899 ("target/i386: Add
CPUID.1F generation support for multi-dies PCMachine") misunderstood
that CPUState.nr_cores means "cores per die" when calculated
CPUID.1FH.01H:EBX. After that, the changes in i386 all followed this
wrong understanding.
With the influence of 003f230e37 and a94e142899, for i386 currently
the result of CPUState.nr_cores is "cores per die", thus the original
uses of CPUState.cores based on the meaning of "cores per package" are
wrong when multiple dies exist:
1. In cpu_x86_cpuid() of target/i386/cpu.c, CPUID.01H:EBX[bits 23:16] is
incorrect because it expects "cpus per package" but now the
result is "cpus per die".
2. In cpu_x86_cpuid() of target/i386/cpu.c, for all leaves of CPUID.04H:
EAX[bits 31:26] is incorrect because they expect "cpus per package"
but now the result is "cpus per die". The error not only impacts the
EAX calculation in cache_info_passthrough case, but also impacts other
cases of setting cache topology for Intel CPU according to cpu
topology (specifically, the incoming parameter "num_cores" expects
"cores per package" in encode_cache_cpuid4()).
3. In cpu_x86_cpuid() of target/i386/cpu.c, CPUID.0BH.01H:EBX[bits
15:00] is incorrect because the EBX of 0BH.01H (core level) expects
"cpus per package", which may be different with 1FH.01H (The reason
is 1FH can support more levels. For QEMU, 1FH also supports die,
1FH.01H:EBX[bits 15:00] expects "cpus per die").
4. In cpu_x86_cpuid() of target/i386/cpu.c, when CPUID.80000001H is
calculated, here "cpus per package" is expected to be checked, but in
fact, now it checks "cpus per die". Though "cpus per die" also works
for this code logic, this isn't consistent with AMD's APM.
5. In cpu_x86_cpuid() of target/i386/cpu.c, CPUID.80000008H:ECX expects
"cpus per package" but it obtains "cpus per die".
6. In simulate_rdmsr() of target/i386/hvf/x86_emu.c, in
kvm_rdmsr_core_thread_count() of target/i386/kvm/kvm.c, and in
helper_rdmsr() of target/i386/tcg/sysemu/misc_helper.c,
MSR_CORE_THREAD_COUNT expects "cpus per package" and "cores per
package", but in these functions, it obtains "cpus per die" and
"cores per die".
On the other hand, these uses are correct now (they are added in/after
a94e142899):
1. In cpu_x86_cpuid() of target/i386/cpu.c, topo_info.cores_per_die
meets the actual meaning of CPUState.nr_cores ("cores per die").
2. In cpu_x86_cpuid() of target/i386/cpu.c, vcpus_per_socket (in CPUID.
04H's calculation) considers number of dies, so it's correct.
3. In cpu_x86_cpuid() of target/i386/cpu.c, CPUID.1FH.01H:EBX[bits
15:00] needs "cpus per die" and it gets the correct result, and
CPUID.1FH.02H:EBX[bits 15:00] gets correct "cpus per package".
When CPUState.nr_cores is correctly changed to "cores per package" again
, the above errors will be fixed without extra work, but the "currently"
correct cases will go wrong and need special handling to pass correct
"cpus/cores per die" they want.
Fix CPUState.nr_cores' calculation to fit the original meaning "cores
per package", as well as changing calculation of topo_info.cores_per_die,
vcpus_per_socket and CPUID.1FH.
Fixes: a94e142899 ("target/i386: Add CPUID.1F generation support for multi-dies PCMachine")
Fixes: 003f230e37 ("machine: Tweak the order of topology members in struct CpuTopology")
Signed-off-by: Zhuocheng Ding <zhuocheng.ding@intel.com>
Co-developed-by: Zhao Liu <zhao1.liu@intel.com>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Tested-by: Yongwei Ma <yongwei.ma@intel.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Message-ID: <20231024090323.1859210-4-zhao1.liu@linux.intel.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
PAE mode in x86 supports 36 bit address space. Check the PAE CPUID on the
guest processor and set phys_bits to 36 if PAE feature is set. This is in
addition to checking the presence of PSE36 CPUID feature for setting 36 bit
phys_bits.
Signed-off-by: Ani Sinha <anisinha@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Message-ID: <20230912120650.371781-1-anisinha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The implementation was validated with OpenSSL and with the test vectors in
https://github.com/rust-lang/stdarch/blob/master/crates/core_arch/src/x86/sha.rs.
The instructions provide a ~25% improvement on hashing a 64 MiB file:
runtime goes down from 1.8 seconds to 1.4 seconds; instruction count on
the host goes down from 5.8 billion to 4.8 billion with slightly better
IPC too. Good job Intel. ;)
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When explicitly booting a multiple vcpus vm with "-cpu +ht", it gets
warning of
warning: host doesn't support requested feature: CPUID.01H:EDX.ht [bit 28]
Make CPUID_HT as supported unconditionally can resolve the warning.
However it introduces another issue that it also expose CPUID_HT to
guest when "-cpu host/max" with only 1 vcpu. To fix this, need mark
CPUID_HT as the no_autoenable_flags.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-ID: <20231010060539.210258-1-xiaoyao.li@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
All implementations of gdb_arch_name() returns dynamic duplicates of
static strings. It's also unlikely that there will be an implementation
of gdb_arch_name() that returns a truly dynamic value due to the nature
of the function returning a well-known identifiers. Qualify the value
gdb_arch_name() with const and make all of its implementations return
static strings.
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-Id: <20230912224107.29669-8-akihiko.odaki@daynix.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20231009164104.369749-15-alex.bennee@linaro.org>
Since we *might* have user emulation with softmmu,
replace the system emulation check by !user emulation one.
(target/ was cleaned from invalid CONFIG_SOFTMMU uses at
commit cab35c73be, but these files were merged few days
after, thus missed the cleanup.)
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20231004082239.27251-1-philmd@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
hvf_get_supported_cpuid() is only defined for x86 targets
(in target/i386/hvf/x86_cpuid.c).
Its declaration is pointless on all other targets.
All the calls to it in target/i386/cpu.c are guarded by
a call on hvf_enabled(), so are elided when HVF is not
built in. Therefore we can remove the unnecessary function
stub.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20231004092510.39498-3-philmd@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This function is now empty, so remove it. In the case of
m68k and tricore, this empties the class instance initfn,
so remove those as well.
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
The omission of alignment has technically been wrong since
269bd5d8f6, where QEMU_ALIGNED was added to CPUTLBDescFast.
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Gather Data Sampling (GDS) is a side-channel attack using Gather
instructions. Some Intel processors will set ARCH_CAP_GDS_NO bit in
MSR IA32_ARCH_CAPABILITIES to report that they are not vulnerable to
GDS.
Make this bit available to guests.
Closes: https://lore.kernel.org/qemu-devel/CAMGffEmG6TNq0n3+4OJAgXc8J0OevY60KHZekXCBs3LoK9vehA@mail.gmail.com/
Reported-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Tested-by: Jack Wang <jinpu.wang@ionos.com>
Tested-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Message-ID: <fde42d81ce454477ca8e27d5429a190b7366fe86.1692074650.git.pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
On parts that enumerate IA32_VMX_BASIC MSR bit as 1, any exception vector
can be delivered with or without an error code if the other consistency
checks are satisfied.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
x86_cpu_get_supported_cpuid() is generic and handles the different
accelerators. Use it instead of kvm_arch_get_supported_cpuid().
That fixes a link failure introduced by commit 3adce820cf
("target/i386: Remove unused KVM stubs") when QEMU is configured
as:
$ ./configure --cc=clang \
--target-list=x86_64-linux-user,x86_64-softmmu \
--enable-debug
We were getting:
[71/71] Linking target qemu-x86_64
FAILED: qemu-x86_64
/usr/bin/ld: libqemu-x86_64-linux-user.fa.p/target_i386_cpu.c.o: in function `cpu_x86_cpuid':
cpu.c:(.text+0x1374): undefined reference to `kvm_arch_get_supported_cpuid'
/usr/bin/ld: libqemu-x86_64-linux-user.fa.p/target_i386_cpu.c.o: in function `x86_cpu_filter_features':
cpu.c:(.text+0x81c2): undefined reference to `kvm_arch_get_supported_cpuid'
/usr/bin/ld: cpu.c:(.text+0x81da): undefined reference to `kvm_arch_get_supported_cpuid'
/usr/bin/ld: cpu.c:(.text+0x81f2): undefined reference to `kvm_arch_get_supported_cpuid'
/usr/bin/ld: cpu.c:(.text+0x820a): undefined reference to `kvm_arch_get_supported_cpuid'
/usr/bin/ld: libqemu-x86_64-linux-user.fa.p/target_i386_cpu.c.o:cpu.c:(.text+0x8225): more undefined references to `kvm_arch_get_supported_cpuid' follow
clang: error: linker command failed with exit code 1 (use -v to see invocation)
ninja: build stopped: subcommand failed.
For the record, this is because '--enable-debug' disables
optimizations (CFLAGS=-O0).
While at this (un)optimization level GCC eliminate the
following dead code (CPP output of mentioned build):
static void x86_cpu_get_supported_cpuid(uint32_t func, uint32_t index,
uint32_t *eax, uint32_t *ebx,
uint32_t *ecx, uint32_t *edx)
{
if ((0)) {
*eax = kvm_arch_get_supported_cpuid(kvm_state, func, index, R_EAX);
*ebx = kvm_arch_get_supported_cpuid(kvm_state, func, index, R_EBX);
*ecx = kvm_arch_get_supported_cpuid(kvm_state, func, index, R_ECX);
*edx = kvm_arch_get_supported_cpuid(kvm_state, func, index, R_EDX);
} else if (0) {
*eax = 0;
*ebx = 0;
*ecx = 0;
*edx = 0;
} else {
*eax = 0;
*ebx = 0;
*ecx = 0;
*edx = 0;
}
Clang does not (see commit 2140cfa51d "i386: Fix build by
providing stub kvm_arch_get_supported_cpuid()").
Cc: qemu-stable@nongnu.org
Fixes: 3adce820cf ("target/i386: Remove unused KVM stubs")
Reported-by: Kevin Wolf <kwolf@redhat.com>
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20230913093009.83520-4-philmd@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
x86_cpu_get_supported_cpuid() already checks for KVM/HVF
accelerators, so it is not needed to manually check it via
a call to accel_uses_host_cpuid() before calling it.
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20230913093009.83520-3-philmd@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
In case more code is added after the kvm_hyperv_expand_features()
call, check its return value (since it can fail).
Fixes: 071ce4b03b ("i386: expand Hyper-V features during CPU feature expansion time")
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20230913093009.83520-2-philmd@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Latest Intel platform GraniteRapids-D introduces AMX-COMPLEX, which adds
two instructions to perform matrix multiplication of two tiles containing
complex elements and accumulate the results into a packed single precision
tile.
AMX-COMPLEX is enumerated via CPUID.(EAX=7,ECX=1):EDX[bit 8]. Add the CPUID
definition for AMX-COMPLEX, AMX-COMPLEX will be enabled automatically when
using '-cpu host' and KVM advertises AMX-COMPLEX to userspace.
Signed-off-by: Tao Su <tao1.su@linux.intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-ID: <20230830074324.84059-1-tao1.su@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Current QEMU can expose waitpkg to guests when it is available. However,
VMX_SECONDARY_EXEC_ENABLE_USER_WAIT_PAUSE is still not recognized and
masked by QEMU. This can lead to an unexpected situation when a L1
hypervisor wants to expose waitpkg to a L2 guest. The L1 hypervisor can
assume that VMX_SECONDARY_EXEC_ENABLE_USER_WAIT_PAUSE exists as waitpkg is
available. The L1 hypervisor then can accidentally expose waitpkg to the
L2 guest. This will cause invalid opcode exception in the L2 guest when
it executes waitpkg related instructions.
This patch adds VMX_SECONDARY_EXEC_ENABLE_USER_WAIT_PAUSE support, and
sets up dependency between the bit and CPUID_7_0_ECX_WAITPKG. QEMU should
not expose waitpkg feature if VMX_SECONDARY_EXEC_ENABLE_USER_WAIT_PAUSE is
not available to avoid unexpected invalid opcode exception in L2 guests.
Signed-off-by: Ake Koomsin <ake@igel.co.jp>
Message-ID: <20230807093339.32091-2-ake@igel.co.jp>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>