The value comes from tb->flags, which is uint32_t.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20210514151342.384376-21-richard.henderson@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20210514151342.384376-20-richard.henderson@linaro.org>
Treat this flag exactly like we treat the other rex bits.
The -1 initialization is unused; the two tests are > 0 and == 1,
so the value can be reduced to a bool.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20210514151342.384376-19-richard.henderson@linaro.org>
Treat this flag exactly like we treat rex_b and rex_x.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20210514151342.384376-18-richard.henderson@linaro.org>
Change the storage from int to uint8_t since the value is in {0,8}.
For x86_64 add 0 in the macros to (1) promote the type back to int,
and (2) make the macro an rvalue.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20210514151342.384376-17-richard.henderson@linaro.org>
The existing flag, x86_64_hregs, does not accurately describe
its setting. It is true if and only if a REX prefix has been
seen. Yes, that affects the "h" regs, but that's secondary.
Add PREFIX_REX and include this bit in s->prefix. Add REX_PREFIX
so that the check folds away when x86_64 is compiled out.
Fold away the reg >= 8 check, because bit 3 of the register
number comes from the REX prefix in the first place.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20210514151342.384376-16-richard.henderson@linaro.org>
LMA disables traditional segmentation, exposing a flat address space.
This means that ADDSEG is off.
Since we're adding an accessor macro, pull the value directly out
of flags otherwise.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20210514151342.384376-15-richard.henderson@linaro.org>
LMA is a pre-requisite for CODE64, so there is no way to disable it
for x86_64-linux-user, and there is no way to enable it for i386.
Since we're adding an accessor macro, pull the value directly out
of flags when we're not assuming a constant.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20210514151342.384376-14-richard.henderson@linaro.org>
For x86_64 user-only, there is no way to leave 64-bit mode.
Without x86_64, there is no way to enter 64-bit mode. There is
an existing macro to aid with that; simply place it in the right
place in the ifdef chain.
Since we're adding an accessor macro, pull the value directly out
of flags when we're not assuming a constant.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20210514151342.384376-13-richard.henderson@linaro.org>
For user-only, SS32 == !VM86, because we are never in
real-mode. Since we cannot enter vm86 mode for x86_64
user-only, SS32 is always set.
Since we're adding an accessor macro, pull the value
directly out of flags otherwise.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20210514151342.384376-12-richard.henderson@linaro.org>
For user-only, CODE32 == !VM86, because we are never in real-mode.
Since we cannot enter vm86 mode for x86_64 user-only, CODE32 is
always set.
Since we're adding an accessor macro, pull the value directly out
of flags otherwise.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20210514151342.384376-11-richard.henderson@linaro.org>
For i386-linux-user, we can enter vm86 mode via the vm86(2) syscall.
That syscall explicitly returns to 32-bit mode, and the syscall does
not exist for a 64-bit x86_64 executable.
Since we're adding an accessor macro, pull the value directly out of
flags otherwise.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20210514151342.384376-10-richard.henderson@linaro.org>
On real hardware, the linux kernel has the iopl(2) syscall which
can set IOPL to 3, to allow e.g. the xserver to briefly disable
interrupts while programming the graphics card.
However, QEMU cannot and does not implement this syscall, so the
IOPL is never changed from 0. Which means that all of the checks
vs CPL <= IOPL are false for user-only.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20210514151342.384376-9-richard.henderson@linaro.org>
A user-mode executable always runs in ring 3.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20210514151342.384376-8-richard.henderson@linaro.org>
A user-mode executable is never in real-mode. Since we're adding
an accessor macro, pull the value directly out of flags for sysemu.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20210514151342.384376-7-richard.henderson@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20210514151342.384376-6-richard.henderson@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20210514151342.384376-5-richard.henderson@linaro.org>
In vm86 mode, we use the same helper as real-mode, but with
an extra check for IOPL. All non-exceptional paths set EFLAGS.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20210514151342.384376-4-richard.henderson@linaro.org>
Split out the check for CPL != 0 and the raising of #GP.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20210514151342.384376-3-richard.henderson@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20210514151342.384376-2-richard.henderson@linaro.org>
Ram block notifiers are currently not aware of resizes. To properly
handle resizes during migration, we want to teach ram block notifiers about
resizeable ram.
Introduce the basic infrastructure but keep using max_size in the
existing notifiers. Supply the max_size when adding and removing ram
blocks. Also, notify on resizes.
Acked-by: Paul Durrant <paul@xen.org>
Reviewed-by: Peter Xu <peterx@redhat.com>
Cc: xen-devel@lists.xenproject.org
Cc: haxm-team@intel.com
Cc: Paul Durrant <paul@xen.org>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Anthony Perard <anthony.perard@citrix.com>
Cc: Wenchao Wang <wenchao.wang@intel.com>
Cc: Colin Xu <colin.xu@intel.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20210429112708.12291-3-david@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Unify the duplicate code between get_hphys and mmu_translate, by simply
making get_hphys call mmu_translate. This also fixes the support for
5-level nested page tables.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
In order to unify the two stages of page table lookup, we need
mmu_translate to use either the host CR0/EFER/CR4 or the guest's.
To do so, make mmu_translate use the same pg_mode constants that
were used for the NPT lookup.
This also prepares for adding 5-level NPT support, which however does
not work yet.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Extract the page table lookup out of handle_mmu_fault, which only has
to invoke mmu_translate and either fill the TLB or deliver the page
fault.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
We will reuse the page walker for both SVM and regular accesses. To do
so we will build a function that receives the currently active paging
mode; start by including in cpu.h the constants and the function to go
from cr4/hflags/efer to the paging mode.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
while on x86 all CPU classes can use the same set of TCGCPUOps,
on ARM the right accel behavior depends on the type of the CPU.
So we need a way to specialize the accel behavior according to
the CPU. Therefore, add a second initialization, after the
accel_cpu->cpu_class_init, that allows to do this.
Signed-off-by: Claudio Fontana <cfontana@suse.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20210322132800.7470-24-cfontana@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
cpu_load_efer is now used only for sysemu code.
Therefore, move this function implementation to
sysemu-only section of helper.c
Signed-off-by: Claudio Fontana <cfontana@suse.de>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20210322132800.7470-22-cfontana@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
a number of registers are read as 64bit under the condition that
(hflags & HF_CS64_MASK) || TARGET_X86_64)
and a number of registers are written as 64bit under the condition that
(hflags & HF_CS64_MASK).
Provide some auxiliary functions that do that.
Signed-off-by: Claudio Fontana <cfontana@suse.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20210322132800.7470-20-cfontana@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
For now we just copy over the previous user stubs, but really,
everything that requires s->cpl == 0 should be impossible
to trigger from user-mode emulation.
Later on we should add a check that asserts this easily f.e.:
static bool check_cpl0(DisasContext *s)
{
int cpl = s->cpl;
#ifdef CONFIG_USER_ONLY
assert(cpl == 3);
#endif
if (cpl != 0) {
gen_exception(s, EXCP0D_GPF, s->pc_start - s->cs_base);
return false;
}
return true;
}
Signed-off-by: Claudio Fontana <cfontana@suse.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20210322132800.7470-17-cfontana@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
create a separate tcg/sysemu/fpu_helper.c for the sysemu-only parts.
For user mode, some small #ifdefs remain in tcg/fpu_helper.c
which do not seem worth splitting into their own user-mode module.
Signed-off-by: Claudio Fontana <cfontana@suse.de>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20210322132800.7470-16-cfontana@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
for user-mode, assert that the hidden IOBPT flags are not set
while attempting to generate io_bpt helpers.
Signed-off-by: Claudio Fontana <cfontana@suse.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20210322132800.7470-14-cfontana@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
smm is only really useful for sysemu, split in two modules
around the CONFIG_USER_ONLY, in order to remove the ifdef
and use the build system instead.
add cpu_abort() when detecting attempts to enter SMM mode via
SMI interrupt in user-mode, and assert that the cpu is not
in SMM mode while translating RSM instructions.
Signed-off-by: Claudio Fontana <cfontana@suse.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20210322132800.7470-12-cfontana@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
overall, all devices' realize functions take an Error **errp, but return void.
hw/core/qdev.c code, which realizes devices, therefore does:
local_err = NULL;
dc->realize(dev, &local_err);
if (local_err != NULL) {
goto fail;
}
However, we can improve at least accel_cpu to return a meaningful bool value.
Signed-off-by: Claudio Fontana <cfontana@suse.de>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20210322132800.7470-9-cfontana@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
move the check for phys_bits outside of host_cpu_adjust_phys_bits,
because otherwise it is impossible to return an error condition
explicitly.
Signed-off-by: Claudio Fontana <cfontana@suse.de>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20210322132800.7470-8-cfontana@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
avoid open coding the accesses to cpu->accel_cpu interfaces,
and instead introduce:
accel_cpu_instance_init,
accel_cpu_realizefn
to be used by the targets/ initfn code,
and by cpu_exec_realizefn respectively.
Signed-off-by: Claudio Fontana <cfontana@suse.de>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20210322132800.7470-7-cfontana@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
move the call to accel_cpu->cpu_realizefn to the general
cpu_exec_realizefn from target/i386, so it does not need to be
called for every target explicitly as we enable more targets.
Signed-off-by: Claudio Fontana <cfontana@suse.de>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20210322132800.7470-6-cfontana@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
i386 is the first user of AccelCPUClass, allowing to split
cpu.c into:
cpu.c cpuid and common x86 cpu functionality
host-cpu.c host x86 cpu functions and "host" cpu type
kvm/kvm-cpu.c KVM x86 AccelCPUClass
hvf/hvf-cpu.c HVF x86 AccelCPUClass
tcg/tcg-cpu.c TCG x86 AccelCPUClass
Signed-off-by: Claudio Fontana <cfontana@suse.de>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
[claudio]:
Rebased on commit b8184135 ("target/i386: allow modifying TCG phys-addr-bits")
Signed-off-by: Claudio Fontana <cfontana@suse.de>
Message-Id: <20210322132800.7470-5-cfontana@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The helper_* functions must use GETPC() to unwind from TCG.
The cpu_x86_* functions cannot, and directly calling the
helper_* functions is a bug. Split out new functions that
perform the work and can be used by both.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Claudio Fontana <cfontana@suse.de>
Tested-by: Claudio Fontana <cfontana@suse.de>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20210322132800.7470-4-cfontana@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Change the prefix from "helper" to "do". The former should be
reserved for those functions that are called from TCG; the latter
is in use within the file already for those functions that are
called from the helper functions, adding a "retaddr" argument.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Claudio Fontana <cfontana@suse.de>
Tested-by: Claudio Fontana <cfontana@suse.de>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20210322132800.7470-3-cfontana@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Stop including exec/address-spaces.h in files that don't need it.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20210416171314.2074665-5-thuth@redhat.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Found the following cpu feature bits missing from EPYC-Rome model.
ibrs : Indirect Branch Restricted Speculation
ssbd : Speculative Store Bypass Disable
These new features will be added in EPYC-Rome-v2. The -cpu help output
after the change.
x86 EPYC-Rome (alias configured by machine type)
x86 EPYC-Rome-v1 AMD EPYC-Rome Processor
x86 EPYC-Rome-v2 AMD EPYC-Rome Processor
Reported-by: Pankaj Gupta <pankaj.gupta@cloud.ionos.com>
Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Pankaj Gupta <pankaj.gupta@cloud.ionos.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: David Edmondson <david.edmondson@oracle.com>
Message-Id: <161478622280.16275.6399866734509127420.stgit@bmoger-ubuntu>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
These two opcodes only allow a memory operand.
Lacking the check for a register operand, we used the A0 temp
without initialization, which led to a tcg abort.
Buglink: https://bugs.launchpad.net/qemu/+bug/1921138
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20210324164650.128608-1-richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
KVM doesn't fully support Hyper-V reenlightenment notifications on
migration. In particular, it doesn't support emulating TSC frequency
of the source host by trapping all TSC accesses so unless TSC scaling
is supported on the destination host and KVM_SET_TSC_KHZ succeeds, it
is unsafe to proceed with migration.
KVM_SET_TSC_KHZ is called from two sites: kvm_arch_init_vcpu() and
kvm_arch_put_registers(). The later (intentionally) doesn't propagate
errors allowing migrations to succeed even when TSC scaling is not
supported on the destination. This doesn't suit 're-enlightenment'
use-case as we have to guarantee that TSC frequency stays constant.
Require 'tsc-frequency=' command line option to be specified for successful
migration when re-enlightenment was enabled by the guest.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210319123801.1111090-1-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Even the name of this section is 'cpu/msr_hyperv_hypercall',
'hypercall_hypercall' is clearly a typo.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210318160249.1084178-3-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
env->error_code is only 32-bits wide, so the high 32 bits of EXITINFO1
are being lost. However, even though saving guest state and restoring
host state must be delayed to do_vmexit, because they might take tb_lock,
it is always possible to write to the VMCB. So do this for the exit
code and EXITINFO1, just like it is already being done for EXITINFO2.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The 'running' argument from VMChangeStateHandler does not require
other value than 0 / 1. Make it a plain boolean.
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <20210111152020.1422021-3-philmd@redhat.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
An assorted set of spelling fixes in various places.
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Reviewed-by: Stefan Weil <sw@weilnetz.de>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20210309111510.79495-1-mjt@msgid.tls.msk.ru>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
If kvm_arch_remove_sw_breakpoint finds that a software breakpoint does not
have an INT3 instruction, it fails. This can happen if one sets a
software breakpoint in a kernel module and then reloads it. gdb then
thinks the breakpoint cannot be deleted and there is no way to add it
back.
Suggested-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Bus lock debug exception is a feature that can notify the kernel by
generate an #DB trap after the instruction acquires a bus lock when
CPL>0. This allows the kernel to enforce user application throttling or
mitigations.
This feature is enumerated via CPUID.(EAX=7,ECX=0).ECX[bit 24].
Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
Message-Id: <20210202090224.13274-1-chenyi.qiang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The preferred syntax is to use "foo=on|off", rather than a bare
"+foo" or "-foo"
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20210216191027.595031-11-berrange@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Adds the support for AMD 3rd generation processors. The model
display for the new processor will be EPYC-Milan.
Adds the following new feature bits on top of the feature bits from
the first and second generation EPYC models.
pcid : Process context identifiers support
ibrs : Indirect Branch Restricted Speculation
ssbd : Speculative Store Bypass Disable
erms : Enhanced REP MOVSB/STOSB support
fsrm : Fast Short REP MOVSB support
invpcid : Invalidate processor context ID
pku : Protection keys support
svme-addr-chk : SVM instructions address check for #GP handling
Depends on the following kernel commits:
14c2bf81fcd2 ("KVM: SVM: Fix #GP handling for doubly-nested virtualization")
3b9c723ed7cf ("KVM: SVM: Add support for SVM instruction address check change")
4aa2691dcbd3 ("8ce1c461188799d863398dd2865d KVM: x86: Factor out x86 instruction emulation with decoding")
4407a797e941 ("KVM: SVM: Enable INVPCID feature on AMD")
9715092f8d7e ("KVM: X86: Move handling of INVPCID types to x86")
3f3393b3ce38 ("KVM: X86: Rename and move the function vmx_handle_memory_failure to x86.c")
830bd71f2c06 ("KVM: SVM: Remove set_cr_intercept, clr_cr_intercept and is_cr_intercept")
4c44e8d6c193 ("KVM: SVM: Add new intercept word in vmcb_control_area")
c62e2e94b9d4 ("KVM: SVM: Modify 64 bit intercept field to two 32 bit vectors")
9780d51dc2af ("KVM: SVM: Modify intercept_exceptions to generic intercepts")
30abaa88382c ("KVM: SVM: Change intercept_dr to generic intercepts")
03bfeeb988a9 ("KVM: SVM: Change intercept_cr to generic intercepts")
c45ad7229d13 ("KVM: SVM: Introduce vmcb_(set_intercept/clr_intercept/_is_intercept)")
a90c1ed9f11d ("(pcid) KVM: nSVM: Remove unused field")
fa44b82eb831 ("KVM: x86: Move MPK feature detection to common code")
38f3e775e9c2 ("x86/Kconfig: Update config and kernel doc for MPK feature on AMD")
37486135d3a7 ("KVM: x86: Fix pkru save/restore when guest CR4.PKE=0, move it to x86.c")
Signed-off-by: Babu Moger <babu.moger@amd.com>
Message-Id: <161290460478.11352.8933244555799318236.stgit@bmoger-ubuntu>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
The CPUID function 1 has a bit called OSXSAVE which tells user space the
status of the CR4.OSXSAVE bit. Our generic CPUID function injects that bit
based on the status of CR4.
With Hypervisor.framework, we do not synchronize full CPU state often enough
for this function to see the CR4 update before guest user space asks for it.
To be on the save side, let's just always synchronize it when we receive a
CPUID(1) request. That way we can set the bit with real confidence.
Reported-by: Asad Ali <asad@osaro.com>
Signed-off-by: Alexander Graf <agraf@csgraf.de>
Message-Id: <20210123004129.6364-1-agraf@csgraf.de>
[RB: resolved conflict with another CPUID change]
Signed-off-by: Roman Bolshakov <r.bolshakov@yadro.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Some guests (ex. Darwin-XNU) can attemp to read this MSR to retrieve and
validate CPU topology comparing it to ACPI MADT content
MSR description from Intel Manual:
35H: MSR_CORE_THREAD_COUNT: Configured State of Enabled Processor Core
Count and Logical Processor Count
Bits 15:0 THREAD_COUNT The number of logical processors that are
currently enabled in the physical package
Bits 31:16 Core_COUNT The number of processor cores that are currently
enabled in the physical package
Bits 63:32 Reserved
Signed-off-by: Vladislav Yaroshchuk <yaroshchuk2000@gmail.com>
Message-Id: <20210113205323.33310-1-yaroshchuk2000@gmail.com>
[RB: reordered MSR definition and dropped u suffix from shift offset]
Signed-off-by: Roman Bolshakov <r.bolshakov@yadro.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The hvf i386 has a few struct and cpp definitions that are never
used. Remove them.
Suggested-by: Roman Bolshakov <r.bolshakov@yadro.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Alexander Graf <agraf@csgraf.de>
Message-Id: <20210120224444.71840-3-agraf@csgraf.de>
Signed-off-by: Roman Bolshakov <r.bolshakov@yadro.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
For `-accel hvf` cpu_x86_cpuid() is wrapped with hvf_cpu_x86_cpuid() to
add paravirtualization cpuid leaf 0x40000010
https://lkml.org/lkml/2008/10/1/246
Leaf 0x40000010, Timing Information:
EAX: (Virtual) TSC frequency in kHz.
EBX: (Virtual) Bus (local apic timer) frequency in kHz.
ECX, EDX: RESERVED (Per above, reserved fields are set to zero).
On macOS TSC and APIC Bus frequencies can be readed by sysctl call with
names `machdep.tsc.frequency` and `hw.busfrequency`
This options is required for Darwin-XNU guest to be synchronized with
host
Leaf 0x40000000 not exposes HVF leaving hypervisor signature empty
Signed-off-by: Vladislav Yaroshchuk <yaroshchuk2000@gmail.com>
Message-Id: <20210122150518.3551-1-yaroshchuk2000@gmail.com>
Signed-off-by: Roman Bolshakov <r.bolshakov@yadro.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This prevents illegal instruction on cpus that do not support xgetbv.
Buglink: https://bugs.launchpad.net/qemu/+bug/1758819
Reviewed-by: Cameron Esfahani <dirty@apple.com>
Signed-off-by: Hill Ma <maahiuzeon@gmail.com>
Message-Id: <X/6OJ7qk0W6bHkHQ@Hills-Mac-Pro.local>
Signed-off-by: Roman Bolshakov <r.bolshakov@yadro.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Update the sev_es_enabled() function return value to be based on the SEV
policy that has been specified. SEV-ES is enabled if SEV is enabled and
the SEV-ES policy bit is set in the policy object.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Venu Busireddy <venu.busireddy@oracle.com>
Message-Id: <c69f81c6029f31fc4c52a9f35f1bd704362476a5.1611682609.git.thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
SMM is not currently supported for an SEV-ES guest by KVM. Change the SMM
capability check from a KVM-wide check to a per-VM check in order to have
a finer-grained SMM capability check.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Venu Busireddy <venu.busireddy@oracle.com>
Message-Id: <f851903809e9d4e6a22d5dfd738dac8da991e28d.1611682609.git.thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
An SEV-ES guest does not allow register state to be altered once it has
been measured. When an SEV-ES guest issues a reboot command, Qemu will
reset the vCPU state and resume the guest. This will cause failures under
SEV-ES. Prevent that from occuring by introducing an arch-specific
callback that returns a boolean indicating whether vCPUs are resettable.
Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: Aurelien Jarno <aurelien@aurel32.net>
Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
Cc: Aleksandar Rikalo <aleksandar.rikalo@syrmia.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: David Hildenbrand <david@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Venu Busireddy <venu.busireddy@oracle.com>
Message-Id: <1ac39c441b9a3e970e9556e1cc29d0a0814de6fd.1611682609.git.thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When SEV-ES is enabled, it is not possible modify the guests register
state after it has been initially created, encrypted and measured.
Normally, an INIT-SIPI-SIPI request is used to boot the AP. However, the
hypervisor cannot emulate this because it cannot update the AP register
state. For the very first boot by an AP, the reset vector CS segment
value and the EIP value must be programmed before the register has been
encrypted and measured. Search the guest firmware for the guest for a
specific GUID that tells Qemu the value of the reset vector to use.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Message-Id: <22db2bfb4d6551aed661a9ae95b4fdbef613ca21.1611682609.git.thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
In prep for AP booting, require the use of in-kernel irqchip support. This
lessens the Qemu support burden required to boot APs.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Venu Busireddy <venu.busireddy@oracle.com>
Message-Id: <e9aec5941e613456f0757f5a73869cdc5deea105.1611682609.git.thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Provide initial support for SEV-ES. This includes creating a function to
indicate the guest is an SEV-ES guest (which will return false until all
support is in place), performing the proper SEV initialization and
ensuring that the guest CPU state is measured as part of the launch.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Co-developed-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Venu Busireddy <venu.busireddy@oracle.com>
Message-Id: <2e6386cbc1ddeaf701547dd5677adf5ddab2b6bd.1611682609.git.thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
If the gpa isn't specified, it's value is extracted from the OVMF
properties table located below the reset vector (and if this doesn't
exist, an error is returned). OVMF has defined the GUID for the SEV
secret area as 4c2eb361-7d9b-4cc3-8081-127c90d3d294 and the format of
the <data> is: <base>|<size> where both are uint32_t. We extract
<base> and use it as the gpa for the injection.
Note: it is expected that the injected secret will also be GUID
described but since qemu can't interpret it, the format is left
undefined here.
Signed-off-by: James Bottomley <jejb@linux.ibm.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20210204193939.16617-3-jejb@linux.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
OVMF is developing a mechanism for depositing a GUIDed table just
below the known location of the reset vector. The table goes
backwards in memory so all entries are of the form
<data>|len|<GUID>
Where <data> is arbtrary size and type, <len> is a uint16_t and
describes the entire length of the entry from the beginning of the
data to the end of the guid.
The foot of the table is of this form and <len> for this case
describes the entire size of the table. The table foot GUID is
defined by OVMF as 96b582de-1fb2-45f7-baea-a366c55a082d and if the
table is present this GUID is just below the reset vector, 48 bytes
before the end of the firmware file.
Add a parser for the ovmf reset block which takes a copy of the block,
if the table foot guid is found, minus the footer and a function for
later traversal to return the data area of any specified GUIDs.
Signed-off-by: James Bottomley <jejb@linux.ibm.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20210204193939.16617-2-jejb@linux.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Use g2h_untagged in contexts that have no cpu, e.g. the binary
loaders that operate before the primary cpu is created. As a
colollary, target_mmap and friends must use untagged addresses,
since they are used by the loaders.
Use g2h_untagged on values returned from target_mmap, as the
kernel never applies a tag itself.
Use g2h_untagged on all pc values. The only current user of
tags, aarch64, removes tags from code addresses upon branch,
so "pc" is always untagged.
Use g2h with the cpu context on hand wherever possible.
Use g2h_untagged in lock_user, which will be updated soon.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210212184902.1251044-13-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Expose the VMX exit/entry load pkrs control bits in
VMX_TRUE_EXIT_CTLS/VMX_TRUE_ENTRY_CTLS MSRs to guest, which supports the
PKS in nested VM.
Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
Message-Id: <20210205083325.13880-3-chenyi.qiang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
PKS introduces MSR IA32_PKRS(0x6e1) to manage the supervisor protection
key rights. Page access and writes can be managed via the MSR update
without TLB flushes when permissions change.
Add the support to save/load IA32_PKRS MSR in guest.
Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
Message-Id: <20210205083325.13880-2-chenyi.qiang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Protection Keys for Supervisor-mode pages is a simple extension of
the PKU feature that QEMU already implements. For supervisor-mode
pages, protection key restrictions come from a new MSR. The MSR
has no XSAVE state associated to it.
PKS is only respected in long mode. However, in principle it is
possible to set the MSR even outside long mode, and in fact
even the XSAVE state for PKRU could be set outside long mode
using XRSTOR. So do not limit the migration subsections for
PKRU and PKRS to long mode.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This patch fixes a translation bug for a subset of x86 BMI instructions
such as the following:
c4 e2 f9 f7 c0 shlxq %rax, %rax, %rax
Currently, these incorrectly generate an undefined instruction exception
when SSE is disabled via CR4, while instructions like "shrxq" work fine.
The problem appears to be related to BMI instructions encoded using VEX
and with a mandatory prefix of "0x66" (data). Instructions with this
data prefix (such as shlxq) are currently rejected. Instructions with
other mandatory prefixes (such as shrxq) translate as expected.
This patch removes the incorrect check in "gen_sse" that causes the
exception to be generated. For the non-BMI cases, the check is
redundant: prefixes are already checked at line 3696.
Buglink: https://bugs.launchpad.net/qemu/+bug/1748296
Signed-off-by: David Greenaway <dgreenaway@google.com>
Message-Id: <20210114063958.1508050-1-dgreenaway@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Newer AMD CPUs will add CPUID_0x8000000A_EDX[28] bit, which indicates
that SVM instructions (VMRUN/VMSAVE/VMLOAD) will trigger #VMEXIT before
CPU checking their EAX against reserved memory regions. This change will
allow the hypervisor to avoid intercepting #GP and emulating SVM
instructions. KVM turns on this CPUID bit for nested VMs. In order to
support it, let us populate this bit, along with other SVM feature bits,
in FEAT_SVM.
Signed-off-by: Wei Huang <wei.huang2@amd.com>
Message-Id: <20210126202456.589932-1-wei.huang2@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
32-bit targets by definition do not support long mode; therefore, the
bit must be masked in the features supported by the accelerator.
As a side effect, this avoids setting up the 0x80000008 CPUID leaf
for
qemu-system-i386 -cpu host
which since commit 5a140b255d ("x86/cpu: Use max host physical address
if -cpu max option is applied") would have printed this error:
qemu-system-i386: phys-bits should be between 32 and 36 (but is 48)
Reported-by: Nathan Chancellor <natechancellor@gmail.com>
Tested-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
While we've abstracted some (potential) differences between mechanisms for
securing guest memory, the initialization is still specific to SEV. Given
that, move it into x86's kvm_arch_init() code, rather than the generic
kvm_init() code.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
The platform specific details of mechanisms for implementing
confidential guest support may require setup at various points during
initialization. Thus, it's not really feasible to have a single cgs
initialization hook, but instead each mechanism needs its own
initialization calls in arch or machine specific code.
However, to make it harder to have a bug where a mechanism isn't
properly initialized under some circumstances, we want to have a
common place, late in boot, where we verify that cgs has been
initialized if it was requested.
This patch introduces a ready flag to the ConfidentialGuestSupport
base type to accomplish this, which we verify in
qemu_machine_creation_done().
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
This allows failures to be reported richly and idiomatically.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Currently the "memory-encryption" property is only looked at once we
get to kvm_init(). Although protection of guest memory from the
hypervisor isn't something that could really ever work with TCG, it's
not conceptually tied to the KVM accelerator.
In addition, the way the string property is resolved to an object is
almost identical to how a QOM link property is handled.
So, create a new "confidential-guest-support" link property which sets
this QOM interface link directly in the machine. For compatibility we
keep the "memory-encryption" property, but now implemented in terms of
the new property.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
When AMD's SEV memory encryption is in use, flash memory banks (which are
initialed by pc_system_flash_map()) need to be encrypted with the guest's
key, so that the guest can read them.
That's abstracted via the kvm_memcrypt_encrypt_data() callback in the KVM
state.. except, that it doesn't really abstract much at all.
For starters, the only call site is in code specific to the 'pc'
family of machine types, so it's obviously specific to those and to
x86 to begin with. But it makes a bunch of further assumptions that
need not be true about an arbitrary confidential guest system based on
memory encryption, let alone one based on other mechanisms:
* it assumes that the flash memory is defined to be encrypted with the
guest key, rather than being shared with hypervisor
* it assumes that that hypervisor has some mechanism to encrypt data into
the guest, even though it can't decrypt it out, since that's the whole
point
* the interface assumes that this encrypt can be done in place, which
implies that the hypervisor can write into a confidential guests's
memory, even if what it writes isn't meaningful
So really, this "abstraction" is actually pretty specific to the way SEV
works. So, this patch removes it and instead has the PC flash
initialization code call into a SEV specific callback.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Several architectures have mechanisms which are designed to protect
guest memory from interference or eavesdropping by a compromised
hypervisor. AMD SEV does this with in-chip memory encryption and
Intel's TDX can do similar things. POWER's Protected Execution
Framework (PEF) accomplishes a similar goal using an ultravisor and
new memory protection features, instead of encryption.
To (partially) unify handling for these, this introduces a new
ConfidentialGuestSupport QOM base class. "Confidential" is kind of vague,
but "confidential computing" seems to be the buzzword about these schemes,
and "secure" or "protected" are often used in connection to unrelated
things (such as hypervisor-from-guest or guest-from-guest security).
The "support" in the name is significant because in at least some of the
cases it requires the guest to take specific actions in order to protect
itself from hypervisor eavesdropping.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This will allow us to centralize the registration of
the cpus.c module accelerator operations (in accel/accel-softmmu.c),
and trigger it automatically using object hierarchy lookup from the
new accel_init_interfaces() initialization step, depending just on
which accelerators are available in the code.
Rename all tcg-cpus.c, kvm-cpus.c, etc to tcg-accel-ops.c,
kvm-accel-ops.c, etc, matching the object type names.
Signed-off-by: Claudio Fontana <cfontana@suse.de>
Message-Id: <20210204163931.7358-18-cfontana@suse.de>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
we cannot in principle make the TCG Operations field definitions
conditional on CONFIG_TCG in code that is included by both common_ss
and specific_ss modules.
Therefore, what we can do safely to restrict the TCG fields to TCG-only
builds, is to move all tcg cpu operations into a separate header file,
which is only included by TCG, target-specific code.
This leaves just a NULL pointer in the cpu.h for the non-TCG builds.
This also tidies up the code in all targets a bit, having all TCG cpu
operations neatly contained by a dedicated data struct.
Signed-off-by: Claudio Fontana <cfontana@suse.de>
Message-Id: <20210204163931.7358-16-cfontana@suse.de>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>