mirrors/qemu - qemu - SynapseOS git

Author	SHA1	Message	Date
Eduardo Habkost	30c60f77a8	x86-iommu: Rename QOM type macros Some QOM macros were using a X86_IOMMU_DEVICE prefix, and others were using a X86_IOMMU prefix. Rename all of them to use the same X86_IOMMU_DEVICE prefix. This will make future conversion to OBJECT_DECLARE* easier. Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Message-Id: <20200825192110.3528606-47-ehabkost@redhat.com> Acked-by: Peter Xu <peterx@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2020-09-02 07:29:25 -04:00
Vitaly Kuznetsov	0baa4b445e	KVM: fix CPU reset wrt HF2_GIF_MASK HF2_GIF_MASK is set in env->hflags2 unconditionally on CPU reset (see x86_cpu_reset()) but when calling KVM_SET_NESTED_STATE, KVM_STATE_NESTED_GIF_SET is only valid for nSVM as e.g. nVMX code looks like if (kvm_state->hdr.vmx.vmxon_pa == -1ull) { if (kvm_state->flags & ~KVM_STATE_NESTED_EVMCS) return -EINVAL; } Also, when adjusting the environment after KVM_GET_NESTED_STATE we need not reset HF2_GIF_MASK on VMX as e.g. x86_cpu_pending_interrupt() expects it to be set. Alternatively, we could've made env->hflags2 SVM-only. Reported-by: Jan Kiszka <jan.kiszka@siemens.com> Fixes: `b16c0e20c7` ("KVM: add support for AMD nested live migration") Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20200723142701.2521161-1-vkuznets@redhat.com> Tested-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2020-07-23 15:03:54 -04:00
Paolo Bonzini	e1e43813e7	KVM: x86: believe what KVM says about WAITPKG Currently, QEMU is overriding KVM_GET_SUPPORTED_CPUID's answer for the WAITPKG bit depending on the "-overcommit cpu-pm" setting. This is a bad idea because it does not even check if the host supports it, but it can be done in x86_cpu_realizefn just like we do for the MONITOR bit. This patch moves it there, while making it conditional on host support for the related UMWAIT MSR. Cc: qemu-stable@nongnu.org Reported-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-07-10 18:02:22 -04:00
Paolo Bonzini	b16c0e20c7	KVM: add support for AMD nested live migration Support for nested guest live migration is part of Linux 5.8, add the corresponding code to QEMU. The migration format consists of a few flags, is an opaque 4k blob. The blob is in VMCB format (the control area represents the L1 VMCB control fields, the save area represents the pre-vmentry state; KVM does not use the host save area since the AMD manual allows that) but QEMU does not really care about that. However, the flags need to be copied to hflags/hflags2 and back. In addition, support for retrieving and setting the AMD nested virtualization states allows the L1 guest to be reset while running a nested guest, but a small bug in CPU reset needs to be fixed for that to work. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-07-10 18:02:17 -04:00
Marcelo Tosatti	74aaddc628	kvm: i386: allow TSC to differ by NTP correction bounds without TSC scaling The Linux TSC calibration procedure is subject to small variations (its common to see +-1 kHz difference between reboots on a given CPU, for example). So migrating a guest between two hosts with identical processor can fail, in case of a small variation in calibrated TSC between them. Allow a conservative 250ppm error between host TSC and VM TSC frequencies, rather than requiring an exact match. NTP daemon in the guest can correct this difference. Also change migration to accept this bound. KVM_SET_TSC_KHZ depends on a kernel interface change. Without this change, the behaviour remains the same: in case of a different frequency between host and VM, KVM_SET_TSC_KHZ will fail and QEMU will exit. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Message-Id: <20200616165805.GA324612@fuller.cnet> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-06-26 09:39:40 -04:00
Like Xu	ea39f9b643	target/i386: define a new MSR based feature word - FEAT_PERF_CAPABILITIES The Perfmon and Debug Capability MSR named IA32_PERF_CAPABILITIES is a feature-enumerating MSR, which only enumerates the feature full-width write (via bit 13) by now which indicates the processor supports IA32_A_PMCx interface for updating bits 32 and above of IA32_PMCx. The existence of MSR IA32_PERF_CAPABILITIES is enumerated by CPUID.1:ECX[15]. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: qemu-devel@nongnu.org Signed-off-by: Like Xu <like.xu@linux.intel.com> Message-Id: <20200529074347.124619-5-like.xu@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-06-10 12:10:47 -04:00
Pan Nengyuan	2a69314258	i386/kvm: fix a use-after-free when vcpu plug/unplug When we hotplug vcpus, cpu_update_state is added to vm_change_state_head in kvm_arch_init_vcpu(). But it forgot to delete in kvm_arch_destroy_vcpu() after unplug. Then it will cause a use-after-free access. This patch delete it in kvm_arch_destroy_vcpu() to fix that. Reproducer: virsh setvcpus vm1 4 --live virsh setvcpus vm1 2 --live virsh suspend vm1 virsh resume vm1 The UAF stack: ==qemu-system-x86_64==28233==ERROR: AddressSanitizer: heap-use-after-free on address 0x62e00002e798 at pc 0x5573c6917d9e bp 0x7fff07139e50 sp 0x7fff07139e40 WRITE of size 1 at 0x62e00002e798 thread T0 #0 0x5573c6917d9d in cpu_update_state /mnt/sdb/qemu/target/i386/kvm.c:742 #1 0x5573c699121a in vm_state_notify /mnt/sdb/qemu/vl.c:1290 #2 0x5573c636287e in vm_prepare_start /mnt/sdb/qemu/cpus.c:2144 #3 0x5573c6362927 in vm_start /mnt/sdb/qemu/cpus.c:2150 #4 0x5573c71e8304 in qmp_cont /mnt/sdb/qemu/monitor/qmp-cmds.c:173 #5 0x5573c727cb1e in qmp_marshal_cont qapi/qapi-commands-misc.c:835 #6 0x5573c7694c7a in do_qmp_dispatch /mnt/sdb/qemu/qapi/qmp-dispatch.c:132 #7 0x5573c7694c7a in qmp_dispatch /mnt/sdb/qemu/qapi/qmp-dispatch.c:175 #8 0x5573c71d9110 in monitor_qmp_dispatch /mnt/sdb/qemu/monitor/qmp.c:145 #9 0x5573c71dad4f in monitor_qmp_bh_dispatcher /mnt/sdb/qemu/monitor/qmp.c:234 Reported-by: Euler Robot <euler.robot@huawei.com> Signed-off-by: Pan Nengyuan <pannengyuan@huawei.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200513132630.13412-1-pannengyuan@huawei.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-06-10 12:10:01 -04:00
Liran Alon	73b994f6d7	i386/cpu: Store LAPIC bus frequency in CPU structure No functional change. This information will be used by following patches. Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com> Signed-off-by: Liran Alon <liran.alon@oracle.com> Message-Id: <20200312165431.82118-15-liran.alon@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-06-10 12:09:52 -04:00
Dongjiu Geng	6b552b9bc8	KVM: Move hwpoison page related functions into kvm-all.c kvm_hwpoison_page_add() and kvm_unpoison_all() will both be used by X86 and ARM platforms, so moving them into "accel/kvm/kvm-all.c" to avoid duplicate code. For architectures that don't use the poison-list functionality the reset handler will harmlessly do nothing, so let's register the kvm_unpoison_all() function in the generic kvm_init() function. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com> Signed-off-by: Xiang Zheng <zhengxiang9@huawei.com> Acked-by: Xiang Zheng <zhengxiang9@huawei.com> Message-id: 20200512030609.19593-8-gengdongjiu@huawei.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2020-05-14 15:03:09 +01:00
Vitaly Kuznetsov	4a910e1f6a	target/i386: do not set unsupported VMX secondary execution controls Commit `048c95163b` ("target/i386: work around KVM_GET_MSRS bug for secondary execution controls") added a workaround for KVM pre-dating commit 6defc591846d ("KVM: nVMX: include conditional controls in /dev/kvm KVM_GET_MSRS") which wasn't setting certain available controls. The workaround uses generic CPUID feature bits to set missing VMX controls. It was found that in some cases it is possible to observe hosts which have certain CPUID features but lack the corresponding VMX control. In particular, it was reported that Azure VMs have RDSEED but lack VMX_SECONDARY_EXEC_RDSEED_EXITING; attempts to enable this feature bit result in QEMU abort. Resolve the issue but not applying the workaround when we don't have to. As there is no good way to find out if KVM has the fix itself, use 95c5c7c77c ("KVM: nVMX: list VMX MSRs in KVM_GET_MSR_INDEX_LIST") instead as these [are supposed to] come together. Fixes: `048c95163b` ("target/i386: work around KVM_GET_MSRS bug for secondary execution controls") Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20200331162752.1209928-1-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-04-02 14:55:45 -04:00
Paolo Bonzini	6702514814	target/i386: check for availability of MSR_IA32_UCODE_REV as an emulated MSR Even though MSR_IA32_UCODE_REV has been available long before Linux 5.6, which added it to the emulated MSR list, a bug caused the microcode version to revert to 0x100000000 on INIT. As a result, processors other than the bootstrap processor would not see the host microcode revision; some Windows version complain loudly about this and crash with a fairly explicit MICROCODE REVISION MISMATCH error. [If running 5.6 prereleases, the kernel fix "KVM: x86: do not reset microcode version on INIT or RESET" should also be applied.] Reported-by: Alex Williamson <alex.williamson@redhat.com> Message-id: <20200211175516.10716-1-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-02-12 16:29:40 +01:00
Philippe Mathieu-Daudé	4f7f589381	accel: Replace current_machine->accelerator by current_accel() wrapper We actually want to access the accelerator, not the machine, so use the current_accel() wrapper instead. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200121110349.25842-10-philmd@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-01-24 20:59:11 +01:00
Paolo Bonzini	32c87d70ff	target/i386: kvm: initialize microcode revision from KVM KVM can return the host microcode revision as a feature MSR. Use it as the default value for -cpu host. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <1579544504-3616-4-git-send-email-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-01-24 20:59:10 +01:00
Paolo Bonzini	420ae1fc51	target/i386: kvm: initialize feature MSRs very early Some read-only MSRs affect the behavior of ioctls such as KVM_SET_NESTED_STATE. We can initialize them once and for all right after the CPU is realized, since they will never be modified by the guest. Reported-by: Qingua Cheng <qcheng@redhat.com> Cc: qemu-stable@nongnu.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <1579544504-3616-2-git-send-email-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-01-24 20:59:09 +01:00
Michal Privoznik	8f54bbd0b4	x86: Check for machine state object class before typecasting it In `ed9e923c3c` ("x86: move SMM property to X86MachineState", 2019-12-17) In v4.2.0-246-ged9e923c3c the SMM property was moved from PC machine class to x86 machine class. Makes sense, but the change was too aggressive: in target/i386/kvm.c:kvm_arch_init() it altered check which sets SMRAM if given machine has SMM enabled. The line that detects whether given machine object is class of PC_MACHINE was removed from the check. This makes qemu try to enable SMRAM for all machine types, which is not what we want. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Fixes: `ed9e923c3c` ("x86: move SMM property to X86MachineState", 2019-12-17) Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <7cc91bab3191bfd7e071bdd3fdf7fe2a2991deb0.1577692822.git.mprivozn@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-01-07 12:08:38 +01:00
Eiichi Tsukata	7529a79607	target/i386: remove unused pci-assign codes Legacy PCI device assignment has been already removed in commit `ab37bfc7d6` ("pci-assign: Remove"), but some codes remain unused. CC: qemu-trivial@nongnu.org Signed-off-by: Eiichi Tsukata <devel@etsukata.com> Message-Id: <20191209072932.313056-1-devel@etsukata.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2019-12-18 02:34:11 +01:00
Paolo Bonzini	89a289c7e9	x86: move more x86-generic functions out of PC files These are needed by microvm too, so move them outside of PC-specific files. With this patch, microvm.c need not include pc.h anymore. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2019-12-17 19:33:50 +01:00
Paolo Bonzini	ed9e923c3c	x86: move SMM property to X86MachineState Add it to microvm as well, it is a generic property of the x86 architecture. Suggested-by: Sergio Lopez <slp@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2019-12-17 19:33:50 +01:00
Paolo Bonzini	4376c40ded	kvm: introduce kvm_kernel_irqchip_* functions The KVMState struct is opaque, so provide accessors for the fields that will be moved from current_machine to the accelerator. For now they just forward to the machine object, but this will change. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2019-12-17 19:32:45 +01:00
Paolo Bonzini	23b0898e44	kvm: convert "-machine kvm_shadow_mem" to an accelerator property Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2019-12-17 19:32:27 +01:00
Yang Zhong	2605188240	target/i386: disable VMX features if nested=0 If kvm does not support VMX feature by nested=0, the kvm_vmx_basic can't get the right value from MSR_IA32_VMX_BASIC register, which make qemu coredump when qemu do KVM_SET_MSRS. The coredump info: error: failed to set MSR 0x480 to 0x0 kvm_put_msrs: Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed. Signed-off-by: Yang Zhong <yang.zhong@intel.com> Message-Id: <20191206071111.12128-1-yang.zhong@intel.com> Reported-by: Catherine Ho <catherine.hecx@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2019-12-06 12:35:40 +01:00
Paolo Bonzini	2a9758c51e	target/i386: add support for MSR_IA32_TSX_CTRL The MSR_IA32_TSX_CTRL MSR can be used to hide TSX (also known as the Trusty Side-channel Extension). By virtualizing the MSR, KVM guests can disable TSX and avoid paying the price of mitigating TSX-based attacks on microarchitectural side channels. Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2019-11-21 16:35:05 +01:00
Tao Xu	6508799707	target/i386: Add support for save/load IA32_UMWAIT_CONTROL MSR UMWAIT and TPAUSE instructions use 32bits IA32_UMWAIT_CONTROL at MSR index E1H to determines the maximum time in TSC-quanta that the processor can reside in either C0.1 or C0.2. This patch is to Add support for save/load IA32_UMWAIT_CONTROL MSR in guest. Co-developed-by: Jingqi Liu <jingqi.liu@intel.com> Signed-off-by: Jingqi Liu <jingqi.liu@intel.com> Signed-off-by: Tao Xu <tao3.xu@intel.com> Message-Id: <20191011074103.30393-3-tao3.xu@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2019-10-23 17:50:27 +02:00
Tao Xu	67192a298f	x86/cpu: Add support for UMONITOR/UMWAIT/TPAUSE UMONITOR, UMWAIT and TPAUSE are a set of user wait instructions. This patch adds support for user wait instructions in KVM. Availability of the user wait instructions is indicated by the presence of the CPUID feature flag WAITPKG CPUID.0x07.0x0:ECX[5]. User wait instructions may be executed at any privilege level, and use IA32_UMWAIT_CONTROL MSR to set the maximum time. The patch enable the umonitor, umwait and tpause features in KVM. Because umwait and tpause can put a (psysical) CPU into a power saving state, by default we dont't expose it to kvm and enable it only when guest CPUID has it. And use QEMU command-line "-overcommit cpu-pm=on" (enable_cpu_pm is enabled), a VM can use UMONITOR, UMWAIT and TPAUSE instructions. If the instruction causes a delay, the amount of time delayed is called here the physical delay. The physical delay is first computed by determining the virtual delay (the time to delay relative to the VMâ€™s timestamp counter). Otherwise, UMONITOR, UMWAIT and TPAUSE cause an invalid-opcode exception(#UD). The release document ref below link: https://software.intel.com/sites/default/files/\ managed/39/c5/325462-sdm-vol-1-2abcd-3abcd.pdf Co-developed-by: Jingqi Liu <jingqi.liu@intel.com> Signed-off-by: Jingqi Liu <jingqi.liu@intel.com> Signed-off-by: Tao Xu <tao3.xu@intel.com> Message-Id: <20191011074103.30393-2-tao3.xu@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2019-10-23 17:50:27 +02:00
Vitaly Kuznetsov	30d6ff662d	i386/kvm: add NoNonArchitecturalCoreSharing Hyper-V enlightenment Hyper-V TLFS specifies this enlightenment as: "NoNonArchitecturalCoreSharing - Indicates that a virtual processor will never share a physical core with another virtual processor, except for virtual processors that are reported as sibling SMT threads. This can be used as an optimization to avoid the performance overhead of STIBP". However, STIBP is not the only implication. It was found that Hyper-V on KVM doesn't pass MD_CLEAR bit to its guests if it doesn't see NoNonArchitecturalCoreSharing bit. KVM reports NoNonArchitecturalCoreSharing in KVM_GET_SUPPORTED_HV_CPUID to indicate that SMT on the host is impossible (not supported of forcefully disabled). Implement NoNonArchitecturalCoreSharing support in QEMU as tristate: 'off' - the feature is disabled (default) 'on' - the feature is enabled. This is only safe if vCPUS are properly pinned and correct topology is exposed. As CPU pinning is done outside of QEMU the enablement decision will be made on a higher level. 'auto' - copy KVM setting. As during live migration SMT settings on the source and destination host may differ this requires us to add a migration blocker. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20191018163908.10246-1-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2019-10-22 09:38:42 +02:00
Mario Smarduch	73284563dc	target/i386: log MCE guest and host addresses Patch logs MCE AO, AR messages injected to guest or taken by QEMU itself. We print the QEMU address for guest MCEs, helps on hypervisors that have another source of MCE logging like mce log, and when they go missing. For example we found these QEMU logs: September 26th 2019, 17:36:02.309 Droplet-153258224: Guest MCE Memory Error at qemu addr 0x7f8ce14f5000 and guest 3d6f5000 addr of type BUS_MCEERR_AR injected qemu-system-x86_64 amsN ams3nodeNNNN September 27th 2019, 06:25:03.234 Droplet-153258224: Guest MCE Memory Error at qemu addr 0x7f8ce14f5000 and guest 3d6f5000 addr of type BUS_MCEERR_AR injected qemu-system-x86_64 amsN ams3nodeNNNN The first log had a corresponding mce log entry, the second didnt (logging thresholds) we can infer from second entry same PA and mce type. Signed-off-by: Mario Smarduch <msmarduch@digitalocean.com> Message-Id: <20191009164459.8209-3-msmarduch@digitalocean.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2019-10-22 09:38:38 +02:00
Eduardo Habkost	af95cafb87	i386: Omit all-zeroes entries from KVM CPUID table KVM has a 80-entry limit at KVM_SET_CPUID2. With the introduction of CPUID[0x1F], it is now possible to hit this limit with unusual CPU configurations, e.g.: $ ./x86_64-softmmu/qemu-system-x86_64 \ -smp 1,dies=2,maxcpus=2 \ -cpu EPYC,check=off,enforce=off \ -machine accel=kvm qemu-system-x86_64: kvm_init_vcpu failed: Argument list too long This happens because QEMU adds a lot of all-zeroes CPUID entries for unused CPUID leaves. In the example above, we end up creating 48 all-zeroes CPUID entries. KVM already returns all-zeroes when emulating the CPUID instruction if an entry is missing, so the all-zeroes entries are redundant. Skip those entries. This reduces the CPUID table size by half while keeping CPUID output unchanged. Reported-by: Yumei Huang <yuhuang@redhat.com> Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1741508 Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Message-Id: <20190822225210.32541-1-ehabkost@redhat.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2019-10-15 18:34:44 -03:00
Thomas Huth	a1834d975f	target/i386/kvm: Silence warning from Valgrind about uninitialized bytes When I run QEMU with KVM under Valgrind, I currently get this warning: Syscall param ioctl(generic) points to uninitialised byte(s) at 0x95BA45B: ioctl (in /usr/lib64/libc-2.28.so) by 0x429DC3: kvm_ioctl (kvm-all.c:2365) by 0x51B249: kvm_arch_get_supported_msr_feature (kvm.c:469) by 0x4C2A49: x86_cpu_get_supported_feature_word (cpu.c:3765) by 0x4C4116: x86_cpu_expand_features (cpu.c:5065) by 0x4C7F8D: x86_cpu_realizefn (cpu.c:5242) by 0x5961F3: device_set_realized (qdev.c:835) by 0x7038F6: property_set_bool (object.c:2080) by 0x707EFE: object_property_set_qobject (qom-qobject.c:26) by 0x705814: object_property_set_bool (object.c:1338) by 0x498435: pc_new_cpu (pc.c:1549) by 0x49C67D: pc_cpus_init (pc.c:1681) Address 0x1ffeffee74 is on thread 1's stack in frame #2, created by kvm_arch_get_supported_msr_feature (kvm.c:445) It's harmless, but a little bit annoying, so silence it by properly initializing the whole structure with zeroes. Signed-off-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2019-10-04 18:49:20 +02:00
Paolo Bonzini	048c95163b	target/i386: work around KVM_GET_MSRS bug for secondary execution controls Some secondary controls are automatically enabled/disabled based on the CPUID values that are set for the guest. However, they are still available at a global level and therefore should be present when KVM_GET_MSRS is sent to /dev/kvm. Unfortunately KVM forgot to include those, so fix that. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2019-10-04 18:49:20 +02:00
Paolo Bonzini	20a78b02d3	target/i386: add VMX features Add code to convert the VMX feature words back into MSR values, allowing the user to enable/disable VMX features as they wish. The same infrastructure enables support for limiting VMX features in named CPU models. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2019-10-04 18:49:20 +02:00
Paolo Bonzini	ede146c2e7	target/i386: expand feature words to 64 bits VMX requires 64-bit feature words for the IA32_VMX_EPT_VPID_CAP and IA32_VMX_BASIC MSRs. (The VMX control MSRs are 64-bit wide but actually have only 32 bits of information). Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2019-10-04 18:49:19 +02:00
Philippe Mathieu-Daudé	d6d059ca07	hw/i386/pc: Extract e820 memory layout code Suggested-by: Samuel Ortiz <sameo@linux.intel.com> Reviewed-by: Li Qiang <liq3ea@gmail.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20190818225414.22590-3-philmd@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2019-09-16 17:13:07 +02:00
Wanpeng Li	d38d201f0e	i386/kvm: support guest access CORE cstate Allow guest reads CORE cstate when exposing host CPU power management capabilities to the guest. PKG cstate is restricted to avoid a guest to get the whole package information in multi-tenant scenario. Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Message-Id: <1563154124-18579-1-git-send-email-wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2019-09-16 12:32:20 +02:00
Jing Liu	80db491da4	x86: Intel AVX512_BF16 feature enabling Intel CooperLake cpu adds AVX512_BF16 instruction, defining as CPUID.(EAX=7,ECX=1):EAX[bit 05]. The patch adds a property for setting the subleaf of CPUID leaf 7 in case that people would like to specify it. The release spec link as follows, https://software.intel.com/sites/default/files/managed/c5/15/\ architecture-instruction-set-extensions-programming-reference.pdf Signed-off-by: Jing Liu <jing2.liu@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2019-08-20 20:00:52 +02:00
Andrey Shinkevich	1f670a95b3	i386/kvm: initialize struct at full before ioctl call Not the whole structure is initialized before passing it to the KVM. Reduce the number of Valgrind reports. Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Message-Id: <1564502498-805893-4-git-send-email-andrey.shinkevich@virtuozzo.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2019-08-20 17:26:19 +02:00
Li Qiang	de428cead6	target-i386: kvm: 'kvm_get_supported_msrs' cleanup Function 'kvm_get_supported_msrs' is only called once now, get rid of the static variable 'kvm_supported_msrs'. Signed-off-by: Li Qiang <liq3ea@163.com> Message-Id: <20190725151639.21693-1-liq3ea@163.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2019-08-20 17:26:19 +02:00
Marcelo Tosatti	d645e13287	kvm: i386: halt poll control MSR support Add support for halt poll control MSR: save/restore, migration and new feature name. The purpose of this MSR is to allow the guest to disable host halt poll. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Message-Id: <20190603230408.GA7938@amt.cnet> [Do not enable by default, as pointed out by Mark Kanda. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2019-08-20 17:26:17 +02:00
Markus Armbruster	54d31236b9	sysemu: Split sysemu/runstate.h off sysemu/sysemu.h sysemu/sysemu.h is a rather unfocused dumping ground for stuff related to the system-emulator. Evidence: * It's included widely: in my "build everything" tree, changing sysemu/sysemu.h still triggers a recompile of some 1100 out of 6600 objects (not counting tests and objects that don't depend on qemu/osdep.h, down from 5400 due to the previous two commits). * It pulls in more than a dozen additional headers. Split stuff related to run state management into its own header sysemu/runstate.h. Touching sysemu/sysemu.h now recompiles some 850 objects. qemu/uuid.h also drops from 1100 to 850, and qapi/qapi-types-run-state.h from 4400 to 4200. Touching new sysemu/runstate.h recompiles some 500 objects. Since I'm touching MAINTAINERS to add sysemu/runstate.h anyway, also add qemu/main-loop.h. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20190812052359.30071-30-armbru@redhat.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> [Unbreak OS-X build]	2019-08-16 13:37:36 +02:00
Markus Armbruster	db72581598	Include qemu/main-loop.h less In my "build everything" tree, changing qemu/main-loop.h triggers a recompile of some 5600 out of 6600 objects (not counting tests and objects that don't depend on qemu/osdep.h). It includes block/aio.h, which in turn includes qemu/event_notifier.h, qemu/notify.h, qemu/processor.h, qemu/qsp.h, qemu/queue.h, qemu/thread-posix.h, qemu/thread.h, qemu/timer.h, and a few more. Include qemu/main-loop.h only where it's needed. Touching it now recompiles only some 1700 objects. For block/aio.h and qemu/event_notifier.h, these numbers drop from 5600 to 2800. For the others, they shrink only slightly. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20190812052359.30071-21-armbru@redhat.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>	2019-08-16 13:31:52 +02:00
Markus Armbruster	71e8a91585	Include sysemu/reset.h a lot less In my "build everything" tree, changing sysemu/reset.h triggers a recompile of some 2600 out of 6600 objects (not counting tests and objects that don't depend on qemu/osdep.h). The main culprit is hw/hw.h, which supposedly includes it for convenience. Include sysemu/reset.h only where it's needed. Touching it now recompiles less than 200 objects. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20190812052359.30071-9-armbru@redhat.com>	2019-08-16 13:31:52 +02:00
Jan Kiszka	bec7156a45	i386/kvm: Do not sync nested state during runtime Writing the nested state e.g. after a vmport access can invalidate important parts of the kernel-internal state, and it is not needed as well. So leave this out from KVM_PUT_RUNTIME_STATE. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Message-Id: <bdd53f40-4e60-f3ae-7ec6-162198214953@siemens.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2019-07-24 11:21:59 +02:00
Paolo Bonzini	1e44f3ab71	target/i386: skip KVM_GET/SET_NESTED_STATE if VMX disabled, or for SVM Do not allocate env->nested_state unless we later need to migrate the nested virtualization state. With this change, nested_state_needed() will return false if the VMX flag is not included in the virtual machine. KVM_GET/SET_NESTED_STATE is also disabled for SVM which is safer (we know that at least the NPT root and paging mode have to be saved/loaded), and thus the corresponding subsection can go away as well. Inspired by a patch from Liran Alon. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2019-07-19 18:02:22 +02:00
Liran Alon	79a197ab18	target/i386: kvm: Demand nested migration kernel capabilities only when vCPU may have enabled VMX Previous to this change, a vCPU exposed with VMX running on a kernel without KVM_CAP_NESTED_STATE or KVM_CAP_EXCEPTION_PAYLOAD resulted in adding a migration blocker. This was because when the code was written it was thought there is no way to reliably know if a vCPU is utilising VMX or not at runtime. However, it turns out that this can be known to some extent: In order for a vCPU to enter VMX operation it must have CR4.VMXE set. Since it was set, CR4.VMXE must remain set as long as the vCPU is in VMX operation. This is because CR4.VMXE is one of the bits set in MSR_IA32_VMX_CR4_FIXED1. There is one exception to the above statement when vCPU enters SMM mode. When a vCPU enters SMM mode, it temporarily exits VMX operation and may also reset CR4.VMXE during execution in SMM mode. When the vCPU exits SMM mode, vCPU state is restored to be in VMX operation and CR4.VMXE is restored to its original state of being set. Therefore, when the vCPU is not in SMM mode, we can infer whether VMX is being used by examining CR4.VMXE. Otherwise, we cannot know for certain but assume the worse that vCPU may utilise VMX. Summaring all the above, a vCPU may have enabled VMX in case CR4.VMXE is set or vCPU is in SMM mode. Therefore, remove migration blocker and check before migration (cpu_pre_save()) if the vCPU may have enabled VMX. If true, only then require relevant kernel capabilities. While at it, demand KVM_CAP_EXCEPTION_PAYLOAD only when the vCPU is in guest-mode and there is a pending/injected exception. Otherwise, this kernel capability is not required for proper migration. Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Signed-off-by: Liran Alon <liran.alon@oracle.com> Reviewed-by: Maran Wilson <maran.wilson@oracle.com> Tested-by: Maran Wilson <maran.wilson@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2019-07-19 18:01:47 +02:00
Peter Maydell	c4107e8208	Bugfixes. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQEcBAABAgAGBQJdH7FgAAoJEL/70l94x66DdUUH+gLr/ZdjLIdfYy9cjcnevf4E cJlxdaW9KvUsK2uVgqQ/3b1yF+GCGk10n6n8ZTIbClhs+6NpqEMz5O3FA/Na6FGA 48M2DwJaJ2H9AG/lQBlSBNUZfLsEJ9rWy7DHvNut5XMJFuWGwdtF/jRhUm3KqaRq vAaOgcQHbzHU9W8r1NJ7l6pnPebeO7S0JQV+82T/ITTz2gEBDUkJ36boO6fedkVQ jLb9nZyG3CJXHm2WlxGO4hkqbLFzURnCi6imOh2rMdD8BCu1eIVl59tD1lC/A0xv Pp3xXnv9SgJXsV4/I/N3/nU85ZhGVMPQZXkxaajHPtJJ0rQq7FAG8PJMEj9yPe8= =poke -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging Bugfixes. # gpg: Signature made Fri 05 Jul 2019 21:21:52 BST # gpg: using RSA key BFFBD25F78C7AE83 # gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [full] # gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" [full] # Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1 # Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83 * remotes/bonzini/tags/for-upstream: ioapic: use irq number instead of vector in ioapic_eoi_broadcast hw/i386: Fix linker error when ISAPC is disabled Makefile: generate header file with the list of devices enabled target/i386: kvm: Fix when nested state is needed for migration minikconf: do not include variables from MINIKCONF_ARGS in config-all-devices.mak target/i386: fix feature check in hyperv-stub.c ioapic: clear irq_eoi when updating the ioapic redirect table entry intel_iommu: Fix unexpected unmaps during global unmap intel_iommu: Fix incorrect "end" for vtd_address_space_unmap i386/kvm: Fix build with -m32 checkpatch: do not warn for multiline parenthesized returned value pc: fix possible NULL pointer dereference in pc_machine_get_device_memory_region_size() Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2019-07-08 10:26:18 +01:00
Max Reitz	9dc83cd9c3	i386/kvm: Fix build with -m32 find_next_bit() takes a pointer of type "const unsigned long ", but the first argument passed here is a "uint64_t ". These types are incompatible when compiling qemu with -m32. Just use ctz64() instead. Fixes: `c686193072` Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Message-Id: <20190624193913.28343-1-mreitz@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2019-07-05 22:16:46 +02:00
Like Xu	a94e142899	target/i386: Add CPUID.1F generation support for multi-dies PCMachine The CPUID.1F as Intel V2 Extended Topology Enumeration Leaf would be exposed if guests want to emulate multiple software-visible die within each package. Per Intel's SDM, the 0x1f is a superset of 0xb, thus they can be generated by almost same code as 0xb except die_offset setting. If the number of dies per package is greater than 1, the cpuid_min_level would be adjusted to 0x1f regardless of whether the host supports CPUID.1F. Likewise, the CPUID.1F wouldn't be exposed if env->nr_dies < 2. Suggested-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Like Xu <like.xu@linux.intel.com> Message-Id: <20190620054525.37188-2-like.xu@linux.intel.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2019-07-05 17:08:04 -03:00
Liran Alon	12604092e2	target/i386: kvm: Add nested migration blocker only when kernel lacks required capabilities Previous commits have added support for migration of nested virtualization workloads. This was done by utilising two new KVM capabilities: KVM_CAP_NESTED_STATE and KVM_CAP_EXCEPTION_PAYLOAD. Both which are required in order to correctly migrate such workloads. Therefore, change code to add a migration blocker for vCPUs exposed with Intel VMX or AMD SVM in case one of these kernel capabilities is missing. Signed-off-by: Liran Alon <liran.alon@oracle.com> Reviewed-by: Maran Wilson <maran.wilson@oracle.com> Message-Id: <20190619162140.133674-11-liran.alon@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2019-06-21 13:25:28 +02:00
Liran Alon	fd13f23b8c	target/i386: kvm: Add support for KVM_CAP_EXCEPTION_PAYLOAD Kernel commit c4f55198c7c2 ("kvm: x86: Introduce KVM_CAP_EXCEPTION_PAYLOAD") introduced a new KVM capability which allows userspace to correctly distinguish between pending and injected exceptions. This distinguish is important in case of nested virtualization scenarios because a L2 pending exception can still be intercepted by the L1 hypervisor while a L2 injected exception cannot. Furthermore, when an exception is attempted to be injected by QEMU, QEMU should specify the exception payload (CR2 in case of #PF or DR6 in case of #DB) instead of having the payload already delivered in the respective vCPU register. Because in case exception is injected to L2 guest and is intercepted by L1 hypervisor, then payload needs to be reported to L1 intercept (VMExit handler) while still preserving respective vCPU register unchanged. This commit adds support for QEMU to properly utilise this new KVM capability (KVM_CAP_EXCEPTION_PAYLOAD). Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com> Signed-off-by: Liran Alon <liran.alon@oracle.com> Message-Id: <20190619162140.133674-10-liran.alon@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2019-06-21 13:25:27 +02:00
Liran Alon	ebbfef2f34	target/i386: kvm: Add support for save and restore nested state Kernel commit 8fcc4b5923af ("kvm: nVMX: Introduce KVM_CAP_NESTED_STATE") introduced new IOCTLs to extract and restore vCPU state related to Intel VMX & AMD SVM. Utilize these IOCTLs to add support for migration of VMs which are running nested hypervisors. Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com> Reviewed-by: Maran Wilson <maran.wilson@oracle.com> Tested-by: Maran Wilson <maran.wilson@oracle.com> Signed-off-by: Liran Alon <liran.alon@oracle.com> Message-Id: <20190619162140.133674-9-liran.alon@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2019-06-21 13:23:47 +02:00
Liran Alon	18ab37ba1c	target/i386: kvm: Block migration for vCPUs exposed with nested virtualization Commit `d98f26073b` ("target/i386: kvm: add VMX migration blocker") added a migration blocker for vCPU exposed with Intel VMX. However, migration should also be blocked for vCPU exposed with AMD SVM. Both cases should be blocked because QEMU should extract additional vCPU state from KVM that should be migrated as part of vCPU VMState. E.g. Whether vCPU is running in guest-mode or host-mode. Fixes: `d98f26073b` ("target/i386: kvm: add VMX migration blocker") Reviewed-by: Maran Wilson <maran.wilson@oracle.com> Signed-off-by: Liran Alon <liran.alon@oracle.com> Message-Id: <20190619162140.133674-6-liran.alon@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2019-06-21 13:23:44 +02:00

1 2 3

142 Commits