Commit Graph

82 Commits

Author SHA1 Message Date
Yang Weijiang
5a778a5f82 target/i386: Add kvm_get_one_msr helper
When try to get one msr from KVM, I found there's no such kind of
existing interface while kvm_put_one_msr() is there. So here comes
the patch. It'll remove redundant preparation code before finally
call KVM_GET_MSRS IOCTL.

No functional change intended.

Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Message-Id: <20220215195258.29149-4-weijiang.yang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-05-14 12:32:41 +02:00
Jon Doron
d8701185f4 hw: hyperv: Initial commit for Synthetic Debugging device
Signed-off-by: Jon Doron <arilou@gmail.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Message-Id: <20220216102500.692781-5-arilou@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-06 14:31:56 +02:00
Jon Doron
73d2407407 hyperv: Add support to process syndbg commands
SynDbg commands can come from two different flows:
1. Hypercalls, in this mode the data being sent is fully
   encapsulated network packets.
2. SynDbg specific MSRs, in this mode only the data that needs to be
   transfered is passed.

Signed-off-by: Jon Doron <arilou@gmail.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Message-Id: <20220216102500.692781-4-arilou@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-06 14:31:56 +02:00
Jon Doron
ccbdf5e81b hyperv: Add definitions for syndbg
Add all required definitions for hyperv synthetic debugger interface.

Signed-off-by: Jon Doron <arilou@gmail.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Message-Id: <20220216102500.692781-3-arilou@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-06 14:31:56 +02:00
Marc-André Lureau
0f9668e0c1 Remove qemu-common.h include from most units
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20220323155743.1585078-33-marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-06 14:31:55 +02:00
Paolo Bonzini
58f7db26f2 KVM: x86: workaround invalid CPUID[0xD,9] info on some AMD processors
Some AMD processors expose the PKRU extended save state even if they do not have
the related PKU feature in CPUID.  Worse, when they do they report a size of
64, whereas the expected size of the PKRU extended save state is 8, therefore
the esa->size == eax assertion does not hold.

The state is already ignored by KVM_GET_SUPPORTED_CPUID because it
was not enabled in the host XCR0.  However, QEMU kvm_cpu_xsave_init()
runs before QEMU invokes arch_prctl() to enable dynamically-enabled
save states such as XTILEDATA, and KVM_GET_SUPPORTED_CPUID hides save
states that have yet to be enabled.  Therefore, kvm_cpu_xsave_init()
needs to consult the host CPUID instead of KVM_GET_SUPPORTED_CPUID,
and dies with an assertion failure.

When setting up the ExtSaveArea array to match the host, ignore features that
KVM does not report as supported.  This will cause QEMU to skip the incorrect
CPUID leaf instead of tripping the assertion.

Closes: https://gitlab.com/qemu-project/qemu/-/issues/916
Reported-by: Daniel P. Berrangé <berrange@redhat.com>
Analyzed-by: Yang Zhong <yang.zhong@intel.com>
Reported-by: Peter Krempa <pkrempa@redhat.com>
Tested-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-03-23 14:13:58 +01:00
luofei
cb48748af7 i386: Set MCG_STATUS_RIPV bit for mce SRAR error
In the physical machine environment, when a SRAR error occurs,
the IA32_MCG_STATUS RIPV bit is set, but qemu does not set this
bit. When qemu injects an SRAR error into virtual machine, the
virtual machine kernel just call do_machine_check() to kill the
current task, but not call memory_failure() to isolate the faulty
page, which will cause the faulty page to be allocated and used
repeatedly. If used by the virtual machine kernel, it will cause
the virtual machine to crash

Signed-off-by: luofei <luofei@unicloud.com>
Message-Id: <20220120084634.131450-1-luofei@unicloud.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-03-23 12:22:25 +01:00
Philippe Mathieu-Daudé
dcebbb65b8 target/i386/kvm: Free xsave_buf when destroying vCPU
Fix vCPU hot-unplug related leak reported by Valgrind:

  ==132362== 4,096 bytes in 1 blocks are definitely lost in loss record 8,440 of 8,549
  ==132362==    at 0x4C3B15F: memalign (vg_replace_malloc.c:1265)
  ==132362==    by 0x4C3B288: posix_memalign (vg_replace_malloc.c:1429)
  ==132362==    by 0xB41195: qemu_try_memalign (memalign.c:53)
  ==132362==    by 0xB41204: qemu_memalign (memalign.c:73)
  ==132362==    by 0x7131CB: kvm_init_xsave (kvm.c:1601)
  ==132362==    by 0x7148ED: kvm_arch_init_vcpu (kvm.c:2031)
  ==132362==    by 0x91D224: kvm_init_vcpu (kvm-all.c:516)
  ==132362==    by 0x9242C9: kvm_vcpu_thread_fn (kvm-accel-ops.c:40)
  ==132362==    by 0xB2EB26: qemu_thread_start (qemu-thread-posix.c:556)
  ==132362==    by 0x7EB2159: start_thread (in /usr/lib64/libpthread-2.28.so)
  ==132362==    by 0x9D45DD2: clone (in /usr/lib64/libc-2.28.so)

Reported-by: Mark Kanda <mark.kanda@oracle.com>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Tested-by: Mark Kanda <mark.kanda@oracle.com>
Message-Id: <20220322120522.26200-1-philippe.mathieu.daude@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-03-23 12:22:25 +01:00
Paolo Bonzini
3ec5ad4008 target/i386: kvm: do not access uninitialized variable on older kernels
KVM support for AMX includes a new system attribute, KVM_X86_XCOMP_GUEST_SUPP.
Commit 19db68ca68 ("x86: Grant AMX permission for guest", 2022-03-15) however
did not fully consider the behavior on older kernels.  First, it warns
too aggressively.  Second, it invokes the KVM_GET_DEVICE_ATTR ioctl
unconditionally and then uses the "bitmask" variable, which remains
uninitialized if the ioctl fails.  Third, kvm_ioctl returns -errno rather
than -1 on errors.

While at it, explain why the ioctl is needed and KVM_GET_SUPPORTED_CPUID
is not enough.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-03-20 20:38:52 +01:00
Zeng Guang
cdec2b753b x86: Support XFD and AMX xsave data migration
XFD(eXtended Feature Disable) allows to enable a
feature on xsave state while preventing specific
user threads from using the feature.

Support save and restore XFD MSRs if CPUID.D.1.EAX[4]
enumerate to be valid. Likewise migrate the MSRs and
related xsave state necessarily.

Signed-off-by: Zeng Guang <guang.zeng@intel.com>
Signed-off-by: Wei Wang <wei.w.wang@intel.com>
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Message-Id: <20220217060434.52460-8-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-03-15 11:50:50 +01:00
Jing Liu
e56dd3c70a x86: add support for KVM_CAP_XSAVE2 and AMX state migration
When dynamic xfeatures (e.g. AMX) are used by the guest, the xsave
area would be larger than 4KB. KVM_GET_XSAVE2 and KVM_SET_XSAVE
under KVM_CAP_XSAVE2 works with a xsave buffer larger than 4KB.
Always use the new ioctls under KVM_CAP_XSAVE2 when KVM supports it.

Signed-off-by: Jing Liu <jing2.liu@intel.com>
Signed-off-by: Zeng Guang <guang.zeng@intel.com>
Signed-off-by: Wei Wang <wei.w.wang@intel.com>
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Message-Id: <20220217060434.52460-7-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-03-15 11:50:50 +01:00
Jing Liu
f21a48171c x86: Add AMX CPUIDs enumeration
Add AMX primary feature bits XFD and AMX_TILE to
enumerate the CPU's AMX capability. Meanwhile, add
AMX TILE and TMUL CPUID leaf and subleaves which
exist when AMX TILE is present to provide the maximum
capability of TILE and TMUL.

Signed-off-by: Jing Liu <jing2.liu@intel.com>
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Message-Id: <20220217060434.52460-6-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-03-15 11:50:50 +01:00
Yang Zhong
19db68ca68 x86: Grant AMX permission for guest
Kernel allocates 4K xstate buffer by default. For XSAVE features
which require large state component (e.g. AMX), Linux kernel
dynamically expands the xstate buffer only after the process has
acquired the necessary permissions. Those are called dynamically-
enabled XSAVE features (or dynamic xfeatures).

There are separate permissions for native tasks and guests.

Qemu should request the guest permissions for dynamic xfeatures
which will be exposed to the guest. This only needs to be done
once before the first vcpu is created.

KVM implemented one new ARCH_GET_XCOMP_SUPP system attribute API to
get host side supported_xcr0 and Qemu can decide if it can request
dynamically enabled XSAVE features permission.
https://lore.kernel.org/all/20220126152210.3044876-1-pbonzini@redhat.com/

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Signed-off-by: Jing Liu <jing2.liu@intel.com>
Message-Id: <20220217060434.52460-4-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-03-15 11:50:50 +01:00
Jing Liu
131266b756 x86: Fix the 64-byte boundary enumeration for extended state
The extended state subleaves (EAX=0Dh, ECX=n, n>1).ECX[1]
indicate whether the extended state component locates
on the next 64-byte boundary following the preceding state
component when the compacted format of an XSAVE area is
used.

Right now, they are all zero because no supported component
needed the bit to be set, but the upcoming AMX feature will
use it.  Fix the subleaves value according to KVM's supported
cpuid.

Signed-off-by: Jing Liu <jing2.liu@intel.com>
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Message-Id: <20220217060434.52460-2-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-03-15 11:50:50 +01:00
Longpeng(Mike)
def4c5570c kvm/msi: do explicit commit when adding msi routes
We invoke the kvm_irqchip_commit_routes() for each addition to MSI route
table, which is not efficient if we are adding lots of routes in some cases.

This patch lets callers invoke the kvm_irqchip_commit_routes(), so the
callers can decide how to optimize.

[1] https://lists.gnu.org/archive/html/qemu-devel/2021-11/msg00967.html

Signed-off-by: Longpeng <longpeng2@huawei.com>
Message-Id: <20220222141116.2091-3-longpeng2@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-03-15 11:26:20 +01:00
Peter Maydell
5df022cf2e osdep: Move memalign-related functions to their own header
Move the various memalign-related functions out of osdep.h and into
their own header, which we include only where they are used.
While we're doing this, add some brief documentation comments.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20220226180723.1706285-10-peter.maydell@linaro.org
2022-03-07 13:16:49 +00:00
Paolo Bonzini
1520f8bb67 KVM: x86: ignore interrupt_bitmap field of KVM_GET/SET_SREGS
This is unnecessary, because the interrupt would be retrieved and queued
anyway by KVM_GET_VCPU_EVENTS and KVM_SET_VCPU_EVENTS respectively,
and it makes the flow more similar to the one for KVM_GET/SET_SREGS2.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-12 14:09:06 +01:00
Maxim Levitsky
8f515d3869 KVM: use KVM_{GET|SET}_SREGS2 when supported.
This allows to make PDPTRs part of the migration
stream and thus not reload them after migration which
is against X86 spec.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20211101132300.192584-2-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-12 14:09:06 +01:00
Philippe Mathieu-Daudé
dc7d6cafce target/i386/kvm: Replace use of __u32 type
QEMU coding style mandates to not use Linux kernel internal
types for scalars types. Replace __u32 by uint32_t.

Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20211116193955.2793171-1-philmd@redhat.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2021-12-17 10:40:51 +01:00
Maxim Levitsky
cabf9862e4 KVM: SVM: add migration support for nested TSC scaling
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20211101132300.192584-4-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-02 15:57:27 +01:00
Philippe Mathieu-Daudé
deae846f94 target/i386/sev: Declare system-specific functions in 'sev.h'
"sysemu/sev.h" is only used from x86-specific files. Let's move it
to include/hw/i386, and merge it with target/i386/sev.h.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20211007161716.453984-16-philmd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-10-13 10:47:49 +02:00
Philippe Mathieu-Daudé
93777de365 target/i386/sev: Rename sev_i386.h -> sev.h
SEV is a x86 specific feature, and the "sev_i386.h" header
is already in target/i386/. Rename it as "sev.h" to simplify.

Patch created mechanically using:

  $ git mv target/i386/sev_i386.h target/i386/sev.h
  $ sed -i s/sev_i386.h/sev.h/ $(git grep -l sev_i386.h)

Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20211007161716.453984-15-philmd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-10-13 10:47:49 +02:00
Philippe Mathieu-Daudé
773ab6cb16 target/i386/kvm: Restrict SEV stubs to x86 architecture
SEV is x86-specific, no need to add its stub to other
architectures. Move the stub file to target/i386/kvm/.

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20211007161716.453984-5-philmd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-10-13 10:47:49 +02:00
Philippe Mathieu-Daudé
02310f3a91 target/i386/kvm: Introduce i386_softmmu_kvm Meson source set
Introduce the i386_softmmu_kvm Meson source set to be able to
add features dependent on CONFIG_KVM.

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20211007161716.453984-4-philmd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-10-13 10:47:49 +02:00
Vitaly Kuznetsov
af7228b88d i386: Make Hyper-V version id configurable
Currently, we hardcode Hyper-V version id (CPUID 0x40000002) to
WS2008R2 and it is known that certain tools in Windows check this. It
seems useful to provide some flexibility by making it possible to change
this info at will. CPUID information is defined in TLFS as:

EAX: Build Number
EBX Bits 31-16: Major Version
    Bits 15-0: Minor Version
ECX Service Pack
EDX Bits 31-24: Service Branch
    Bits 23-0: Service Number

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210902093530.345756-8-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-10-01 19:04:45 +02:00
Vitaly Kuznetsov
e1f9a8e8c9 i386: Implement pseudo 'hv-avic' ('hv-apicv') enlightenment
The enlightenment allows to use Hyper-V SynIC with hardware APICv/AVIC
enabled. Normally, Hyper-V SynIC disables these hardware features and
suggests the guest to use paravirtualized AutoEOI feature. Linux-4.15
gains support for conditional APICv/AVIC disablement, the feature
stays on until the guest tries to use AutoEOI feature with SynIC. With
'HV_DEPRECATING_AEOI_RECOMMENDED' bit exposed, modern enough Windows/
Hyper-V versions should follow the recommendation and not use the
(unwanted) feature.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210902093530.345756-7-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-10-01 19:04:45 +02:00
Vitaly Kuznetsov
050716292a i386: Move HV_APIC_ACCESS_RECOMMENDED bit setting to hyperv_fill_cpuids()
In preparation to enabling Hyper-V + APICv/AVIC move
HV_APIC_ACCESS_RECOMMENDED setting out of kvm_hyperv_properties[]: the
'real' feature bit for the vAPIC features is HV_APIC_ACCESS_AVAILABLE,
HV_APIC_ACCESS_RECOMMENDED is a recommendation to use the feature which
we may not always want to give.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210902093530.345756-6-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-10-01 19:04:45 +02:00
Vitaly Kuznetsov
70367f0917 i386: Support KVM_CAP_HYPERV_ENFORCE_CPUID
By default, KVM allows the guest to use all currently supported Hyper-V
enlightenments when Hyper-V CPUID interface was exposed, regardless of if
some features were not announced in guest visible CPUIDs. hv-enforce-cpuid
feature alters this behavior and only allows the guest to use exposed
Hyper-V enlightenments. The feature is supported by Linux >= 5.14 and is
not enabled by default in QEMU.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210902093530.345756-5-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-10-01 19:04:45 +02:00
Vitaly Kuznetsov
988f7b8bfe i386: Support KVM_CAP_ENFORCE_PV_FEATURE_CPUID
By default, KVM allows the guest to use all currently supported PV features
even when they were not announced in guest visible CPUIDs. Introduce a new
"kvm-pv-enforce-cpuid" flag to limit the supported feature set to the
exposed features. The feature is supported by Linux >= 5.10 and is not
enabled by default in QEMU.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210902093530.345756-4-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-10-01 19:04:45 +02:00
Peter Xu
142518bda5 memory: Name all the memory listeners
Provide a name field for all the memory listeners.  It can be used to identify
which memory listener is which.

Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Message-Id: <20210817013553.30584-2-peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-09-30 15:30:24 +02:00
Sean Christopherson
b9edbadefb i386: Propagate SGX CPUID sub-leafs to KVM
The SGX sub-leafs are enumerated at CPUID 0x12.  Indices 0 and 1 are
always present when SGX is supported, and enumerate SGX features and
capabilities.  Indices >=2 are directly correlated with the platform's
EPC sections.  Because the number of EPC sections is dynamic and user
defined, the number of SGX sub-leafs is "NULL" terminated.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Message-Id: <20210719112136.57018-15-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-09-30 14:50:20 +02:00
Sean Christopherson
c22f546785 i386: kvm: Add support for exposing PROVISIONKEY to guest
If the guest want to fully use SGX, the guest needs to be able to
access provisioning key. Add a new KVM_CAP_SGX_ATTRIBUTE to KVM to
support provisioning key to KVM guests.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Message-Id: <20210719112136.57018-14-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-09-30 14:50:20 +02:00
Sean Christopherson
a04835414b i386: Add feature control MSR dependency when SGX is enabled
SGX adds multiple flags to FEATURE_CONTROL to enable SGX and Flexible
Launch Control.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Message-Id: <20210719112136.57018-12-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-09-30 14:50:20 +02:00
Sean Christopherson
db88806523 i386: Add get/set/migrate support for SGX_LEPUBKEYHASH MSRs
On real hardware, on systems that supports SGX Launch Control, those
MSRs are initialized to digest of Intel's signing key; on systems that
don't support SGX Launch Control, those MSRs are not available but
hardware always uses digest of Intel's signing key in EINIT.

KVM advertises SGX LC via CPUID if and only if the MSRs are writable.
Unconditionally initialize those MSRs to digest of Intel's signing key
when CPU is realized and reset to reflect the fact. This avoids
potential bug in case kvm_arch_put_registers() is called before
kvm_arch_get_registers() is called, in which case guest's virtual
SGX_LEPUBKEYHASH MSRs will be set to 0, although KVM initializes those
to digest of Intel's signing key by default, since KVM allows those MSRs
to be updated by Qemu to support live migration.

Save/restore the SGX Launch Enclave Public Key Hash MSRs if SGX Launch
Control (LC) is exposed to the guest. Likewise, migrate the MSRs if they
are writable by the guest.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Message-Id: <20210719112136.57018-11-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-09-30 14:50:20 +02:00
Markus Armbruster
436c831a28 migration: Unify failure check for migrate_add_blocker()
Most callers check the return value.  Some check whether it set an
error.  Functionally equivalent, but the former tends to be easier on
the eyes, so do that everywhere.

Prior art: commit c6ecec43b2 "qemu-option: Check return value instead
of @err where convenient".

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20210720125408.387910-10-armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
2021-08-26 17:15:28 +02:00
Markus Armbruster
a5c051b2cf i386: Never free migration blocker objects instead of sometimes
invtsc_mig_blocker has static storage duration.  When a CPU with
certain features is initialized, and invtsc_mig_blocker is still null,
we add a migration blocker and store it in invtsc_mig_blocker.

The object is freed when migrate_add_blocker() fails, leaving
invtsc_mig_blocker dangling.  It is not freed on later failures.

Same for hv_passthrough_mig_blocker and hv_no_nonarch_cs_mig_blocker.

All failures are actually fatal, so whether we free or not doesn't
really matter, except as bad examples to be copied / imitated.

Clean this up in a minimal way: never free these blocker objects.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20210720125408.387910-7-armbru@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
2021-08-26 17:15:28 +02:00
Vitaly Kuznetsov
e4adb09f79 i386: assert 'cs->kvm_state' is not null
Coverity reports potential NULL pointer dereference in
get_supported_hv_cpuid_legacy() when 'cs->kvm_state' is NULL. While
'cs->kvm_state' can indeed be NULL in hv_cpuid_get_host(),
kvm_hyperv_expand_features() makes sure that it only happens when
KVM_CAP_SYS_HYPERV_CPUID is supported and KVM_CAP_SYS_HYPERV_CPUID
implies KVM_CAP_HYPERV_CPUID so get_supported_hv_cpuid_legacy() is
never really called. Add asserts to strengthen the protection against
broken KVM behavior.

Coverity: CID 1458243
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210716115852.418293-1-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-07-29 10:15:51 +02:00
Claudio Fontana
5b8978d804 i386: do not call cpudef-only models functions for max, host, base
Some cpu properties have to be set only for cpu models in builtin_x86_defs,
registered with x86_register_cpu_model_type, and not for
cpu models "base", "max", and the subclass "host".

These properties are the ones set by function x86_cpu_apply_props,
(also including kvm_default_props, tcg_default_props),
and the "vendor" property for the KVM and HVF accelerators.

After recent refactoring of cpu, which also affected these properties,
they were instead set unconditionally for all x86 cpus.

This has been detected as a bug with Nested on AMD with cpu "host",
as svm was not turned on by default, due to the wrongful setting of
kvm_default_props via x86_cpu_apply_props, which set svm to "off".

Rectify the bug introduced in commit "i386: split cpu accelerators"
and document the functions that are builtin_x86_defs-only.

Signed-off-by: Claudio Fontana <cfontana@suse.de>
Tested-by: Alexander Bulekov <alxndr@bu.edu>
Fixes: f5cc5a5c ("i386: split cpu accelerators from cpu.c,"...)
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/477
Message-Id: <20210723112921.12637-1-cfontana@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-07-23 15:47:13 +02:00
Vitaly Kuznetsov
cce087f628 i386: Hyper-V SynIC requires POST_MESSAGES/SIGNAL_EVENTS privileges
When Hyper-V SynIC is enabled, we may need to allow Windows guests to make
hypercalls (POST_MESSAGES/SIGNAL_EVENTS). No issue is currently observed
because KVM is very permissive, allowing these hypercalls regarding of
guest visible CPUid bits.

Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210608120817.1325125-9-vkuznets@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2021-07-13 09:13:29 -04:00
Vitaly Kuznetsov
b26f68c36b i386: HV_HYPERCALL_AVAILABLE privilege bit is always needed
According to TLFS, Hyper-V guest is supposed to check
HV_HYPERCALL_AVAILABLE privilege bit before accessing
HV_X64_MSR_GUEST_OS_ID/HV_X64_MSR_HYPERCALL MSRs but at least some
Windows versions ignore that. As KVM is very permissive and allows
accessing these MSRs unconditionally, no issue is observed. We may,
however, want to tighten the checks eventually. Conforming to the
spec is probably also a good idea.

Enable HV_HYPERCALL_AVAILABLE bit unconditionally.

Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210608120817.1325125-8-vkuznets@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2021-07-13 09:13:29 -04:00
Vitaly Kuznetsov
5ce48fa354 i386: kill off hv_cpuid_check_and_set()
hv_cpuid_check_and_set() does too much:
- Checks if the feature is supported by KVM;
- Checks if all dependencies are enabled;
- Sets the feature bit in cpu->hyperv_features for 'passthrough' mode.

To reduce the complexity, move all the logic except for dependencies
check out of it. Also, in 'passthrough' mode we don't really need to
check dependencies because KVM is supposed to provide a consistent
set anyway.

Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210608120817.1325125-7-vkuznets@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2021-07-13 09:13:29 -04:00
Vitaly Kuznetsov
071ce4b03b i386: expand Hyper-V features during CPU feature expansion time
To make Hyper-V features appear in e.g. QMP query-cpu-model-expansion we
need to expand and set the corresponding CPUID leaves early. Modify
x86_cpu_get_supported_feature_word() to call newly intoduced Hyper-V
specific kvm_hv_get_supported_cpuid() instead of
kvm_arch_get_supported_cpuid(). We can't use kvm_arch_get_supported_cpuid()
as Hyper-V specific CPUID leaves intersect with KVM's.

Note, early expansion will only happen when KVM supports system wide
KVM_GET_SUPPORTED_HV_CPUID ioctl (KVM_CAP_SYS_HYPERV_CPUID).

Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210608120817.1325125-6-vkuznets@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2021-07-13 09:13:29 -04:00
Vitaly Kuznetsov
d7652b772f i386: make hyperv_expand_features() return bool
Return 'false' when hyperv_expand_features() sets an error.

No functional change intended.

Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210608120817.1325125-5-vkuznets@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2021-07-13 09:13:29 -04:00
Vitaly Kuznetsov
07454e2ea8 i386: hardcode supported eVMCS version to '1'
Currently, the only eVMCS version, supported by KVM (and described in TLFS)
is '1'. When Enlightened VMCS feature is enabled, QEMU takes the supported
eVMCS version range (from KVM_CAP_HYPERV_ENLIGHTENED_VMCS enablement) and
puts it to guest visible CPUIDs. When (and if) eVMCS ver.2 appears a
problem on migration is expected: it doesn't seem to be possible to migrate
from a host supporting eVMCS ver.2 to a host, which only support eVMCS
ver.1.

Hardcode eVMCS ver.1 as the result of 'hv-evmcs' enablement for now. Newer
eVMCS versions will have to have their own enablement options (e.g.
'hv-evmcs=2').

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20210608120817.1325125-4-vkuznets@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2021-07-13 09:13:29 -04:00
David Edmondson
fea4500841 target/i386: Populate x86_ext_save_areas offsets using cpuid where possible
Rather than relying on the X86XSaveArea structure definition,
determine the offset of XSAVE state areas using CPUID leaf 0xd where
possible (KVM and HVF).

Signed-off-by: David Edmondson <david.edmondson@oracle.com>
Message-Id: <20210705104632.2902400-8-david.edmondson@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-07-06 08:33:48 +02:00
David Edmondson
c0198c5f87 target/i386: Pass buffer and length to XSAVE helper
In preparation for removing assumptions about XSAVE area offsets, pass
a buffer pointer and buffer length to the XSAVE helper functions.

Signed-off-by: David Edmondson <david.edmondson@oracle.com>
Message-Id: <20210705104632.2902400-5-david.edmondson@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-07-06 07:54:53 +02:00
David Edmondson
436463b84b target/i386: Consolidate the X86XSaveArea offset checks
Rather than having similar but different checks in cpu.h and kvm.c,
move them all to cpu.h.
Message-Id: <20210705104632.2902400-3-david.edmondson@oracle.com>

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-07-06 07:54:53 +02:00
Paolo Bonzini
9ce8af4d92 target/i386: kvm: add support for TSC scaling
Linux 5.14 will add support for nested TSC scaling.  Add the
corresponding feature in QEMU; to keep support for existing kernels,
do not add it to any processor yet.

The handling of the VMCS enumeration MSR is ugly; once we have more than
one case, we may want to add a table to check VMX features against.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-25 10:53:46 +02:00
Chenyi Qiang
035d1ef265 i386: Add ratelimit for bus locks acquired in guest
A bus lock is acquired through either split locked access to writeback
(WB) memory or any locked access to non-WB memory. It is typically >1000
cycles slower than an atomic operation within a cache and can also
disrupts performance on other cores.

Virtual Machines can exploit bus locks to degrade the performance of
system. To address this kind of performance DOS attack coming from the
VMs, bus lock VM exit is introduced in KVM and it can report the bus
locks detected in guest. If enabled in KVM, it would exit to the
userspace to let the user enforce throttling policies once bus locks
acquired in VMs.

The availability of bus lock VM exit can be detected through the
KVM_CAP_X86_BUS_LOCK_EXIT. The returned bitmap contains the potential
policies supported by KVM. The field KVM_BUS_LOCK_DETECTION_EXIT in
bitmap is the only supported strategy at present. It indicates that KVM
will exit to userspace to handle the bus locks.

This patch adds a ratelimit on the bus locks acquired in guest as a
mitigation policy.

Introduce a new field "bus_lock_ratelimit" to record the limited speed
of bus locks in the target VM. The user can specify it through the
"bus-lock-ratelimit" as a machine property. In current implementation,
the default value of the speed is 0 per second, which means no
restrictions on the bus locks.

As for ratelimit on detected bus locks, simply set the ratelimit
interval to 1s and restrict the quota of bus lock occurence to the value
of "bus_lock_ratelimit". A potential alternative is to introduce the
time slice as a property which can help the user achieve more precise
control.

The detail of bus lock VM exit can be found in spec:
https://software.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html

Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
Message-Id: <20210521043820.29678-1-chenyi.qiang@intel.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2021-06-17 14:11:06 -04:00
Claudio Fontana
662175b91f i386: reorder call to cpu_exec_realizefn
i386 realizefn code is sensitive to ordering, and recent commits
aimed at refactoring it, splitting accelerator-specific code,
broke assumptions which need to be fixed.

We need to:

* process hyper-v enlightements first, as they assume features
  not to be expanded

* only then, expand features

* after expanding features, attempt to check them and modify them in the
  accel-specific realizefn code called by cpu_exec_realizefn().

* after the framework has been called via cpu_exec_realizefn,
  the code can check for what has or hasn't been set by accel-specific
  code, or extend its results, ie:

  - check and evenually set code_urev default
  - modify cpu->mwait after potentially being set from host CPUID.
  - finally check for phys_bits assuming all user and accel-specific
    adjustments have already been taken into account.

Fixes: f5cc5a5c ("i386: split cpu accelerators from cpu.c"...)
Fixes: 30565f10 ("cpu: call AccelCPUClass::cpu_realizefn in"...)
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Claudio Fontana <cfontana@suse.de>
Message-Id: <20210603123001.17843-2-cfontana@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-04 13:47:08 +02:00