The current way of injecting MCE events without updating of and
synchronizing with the CPUState is broken and causes spurious
corruptions of the MCE-related parts of the CPUState.
As a first step towards a fix, enhance the state writeback code with
support for injecting events that are pending in the CPUState. A pending
exception will then be signaled via cpu_interrupt(CPU_INTERRUPT_MCE).
And, just like for TCG, we need to leave the halt state when
CPU_INTERRUPT_MCE is pending (left broken for the to-be-removed old KVM
code).
This will also allow to unify TCG and KVM injection code.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
CC: Huang Ying <ying.huang@intel.com>
CC: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
CC: Jin Dongming <jin.dongming@np.css.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
We will broaden the scope of this function on x86 beyond irqchip events.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Pure function suffling to avoid multiple #ifdef KVM_CAP_MCE sections,
no functional changes. While at it, annotate some #ifdef sections.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
CC: Huang Ying <ying.huang@intel.com>
CC: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
CC: Jin Dongming <jin.dongming@np.css.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Allow to tell cpu_x86_inject_mce that it should ignore Action Optional
MCE events when the target VCPU is still processing another one. This
will be used by KVM soon.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
CC: Huang Ying <ying.huang@intel.com>
CC: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
CC: Jin Dongming <jin.dongming@np.css.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
As this service is used by the human monitor, make sure that errors get
reported to the right channel, and also raise the verbosity.
This requires to move Monitor typedef in qemu-common.h to resolve the
include dependency.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
CC: Huang Ying <ying.huang@intel.com>
CC: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
CC: Jin Dongming <jin.dongming@np.css.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
MCEs can be injected asynchronously, so they can also terminate the halt
state.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
CC: Huang Ying <ying.huang@intel.com>
CC: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
CC: Jin Dongming <jin.dongming@np.css.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
All implementations are now the same, and there is only one caller,
so inline the function there.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
We have qemu_cpu_self and qemu_thread_self. The latter is retrieving the
current thread, the former is checking for equality (using CPUState). We
also have qemu_thread_equal which is only used like qemu_cpu_self.
This refactors the interfaces, creating qemu_cpu_is_self and
qemu_thread_is_self as well ass qemu_thread_get_self.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
We do not need to abort, but the user should be notified that weird
things go on.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
We do not check them, and the only arch with non-empty implementations
always returns 0 (this is also true for qemu-kvm).
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
CC: Alexander Graf <agraf@suse.de>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Effectively no functional change yet as kvm_irqchip_in_kernel still only
returns 0, but this patch will allow qemu-kvm to adopt the VCPU loop of
upsteam KVM.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Mixing up TCG bits with KVM already led to problems around eflags
emulation on x86. Moreover, quite some code that TCG requires on cpu
enty/exit is useless for KVM. So dispatch between tcg_cpu_exec and
kvm_cpu_exec as early as possible.
The core logic of cpu_halted from cpu_exec is added to
kvm_arch_process_irqchip_events. Moving away from cpu_exec makes
exception_index meaningless for KVM, we can simply pass the exit reason
directly (only "EXCP_DEBUG vs. rest" is relevant).
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
If the machine is stopped, we should not record two different tsc values
upon a save operation. The same problem happens with kvmclock.
But kvmclock is taking a different diretion, being now seen as a separate
device. Since this is unlikely to happen with the tsc, I am taking the
approach here of simply registering a handler for state change, and
using a per-CPUState variable that prevents double updates for the TSC.
Signed-off-by: Glauber Costa <glommer@redhat.com>
CC: Jan Kiszka <jan.kiszka@web.de>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
KVM requires to reenter the kernel after IO exits in order to complete
instruction emulation. Failing to do so will leave the kernel state
inconsistently behind. To ensure that we will get back ASAP, we issue a
self-signal that will cause KVM_RUN to return once the pending
operations are completed.
We can move kvm_arch_process_irqchip_events out of the inner VCPU loop.
The only state that mattered at its old place was a pending INIT
request. Catch it in kvm_arch_pre_run and also trigger a self-signal to
process the request on next kvm_cpu_exec.
This patch also fixes the missing exit_request check in kvm_cpu_exec in
the CONFIG_IOTHREAD case.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
CC: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Provide arch-independent kvm_on_sigbus* stubs to remove the #ifdef'ery
from cpus.c. This patch also fixes --disable-kvm build by providing the
missing kvm_on_sigbus_vcpu kvm-stub.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
When broadcasting MCEs, we need to set MCIP and RIPV in mcg_status like
it is done for KVM. Use the symbolic constants at this chance.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
If the kernel does not support KVM_CAP_ASYNC_PF, it also does not know
about the related MSR. So skip it during state synchronization in that
case. Fixes annoying kernel warnings.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
All CPUX86State variables before CPU_COMMON are automatically cleared on
reset. Reorder nmi_injected and nmi_pending to avoid having to touch
them explicitly.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
In order to support loading BIOSes > 256K, reorder the code, adjusting
the base if the kernel supports moving the identity map.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Instead of splattering the code with #ifdefs and runtime checks for
capabilities we cannot work without anyway, provide central test
infrastructure for verifying their availability both at build and
runtime.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
If we lack kvm_para.h, MSR_KVM_ASYNC_PF_EN is not defined. The change in
kvm_arch_init_vcpu is just for consistency reasons.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Make sure to write the cleared MSR_KVM_SYSTEM_TIME, MSR_KVM_WALL_CLOCK,
and MSR_KVM_ASYNC_PF_EN to the kernel state so that a freshly booted
guest cannot be disturbed by old values.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
CC: Glauber Costa <glommer@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Simplify kvm_has_msr_star/hsave_pa to booleans and push their one-time
initialization into kvm_arch_init. Also handle potential errors of that
setup procedure.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
For unknown reasons, xcr0 reset ended up in kvm_arch_update_guest_debug
on upstream merge. Fix this and also remove the misleading comment (1 is
THE reset value).
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
kvm_arch_reset_vcpu initializes mp_state, and that function is invoked
right after kvm_arch_init_vcpu.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
This code path will not yet be taken as we still lack in-kernel irqchip
support. But qemu-kvm can already make use of it and drop its own
mp_state access services.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
The ordering doesn't matter in this case, but better keep it consistent.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Introduce the cpu_dump_state flag CPU_DUMP_CODE and implement it for
x86. This writes out the code bytes around the current instruction
pointer. Make use of this feature in KVM to help debugging fatal vm
exits.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Report KVM_EXIT_UNKNOWN, KVM_EXIT_FAIL_ENTRY, and KVM_EXIT_EXCEPTION
with more details to stderr. The latter two are so far x86-only, so move
them into the arch-specific handler. Integrate the Intel real mode
warning on KVM_EXIT_FAIL_ENTRY that qemu-kvm carries, but actually
restrict it to Intel CPUs. Moreover, always dump the CPU state in case
we fail.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Ensure that we stop the guest whenever we face a fatal or unknown exit
reason. If we stop, we also have to enforce a cpu loop exit.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
This exit only triggers activity in the common exit path, but we should
accept it in order to be able to detect unknown exit types.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
This unbreaks guest debugging when the 4th hardware breakpoint used for
guest debugging is a watchpoint of 4 or 8 byte lenght. The 31st bit of
DR7 is set in that case and used to cause a sign extension to the high
word which was breaking the guest state (vm entry failure).
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
This seems to date back to the days KVM didn't support real mode. The
check is no longer needed and, even worse, is corrupting the guest state
in case SS.RPL != DPL.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
The DPL is stored in the flags and not in the selector. In fact, the RPL
may differ from the DPL at some point in time, and so we were corrupting
the guest state so far.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Share same error handing, and rename this function after
MCIP (Machine Check In Progress) flag.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Jin Dongming <jin.dongming@np.css.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Add function for checking whether current CPU support mca broadcast.
Signed-off-by: Jin Dongming <jin.dongming@np.css.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
When the following test case is injected with mce command, maybe user could not
get the expected result.
DATA
command cpu bank status mcg_status addr misc
(qemu) mce 1 1 0xbd00000000000000 0x05 0x1234 0x8c
Expected Result
panic type: "Fatal Machine check"
That is because each mce command can only inject the given cpu and could not
inject mce interrupt to other cpus. So user will get the following result:
panic type: "Fatal machine check on current CPU"
"broadcast" option is used for injecting dummy data into other cpus. Injecting
mce with this option the expected result could be gotten.
Usage:
Broadcast[on]
command broadcast cpu bank status mcg_status addr misc
(qemu) mce -b 1 1 0xbd00000000000000 0x05 0x1234 0x8c
Broadcast[off]
command cpu bank status mcg_status addr misc
(qemu) mce 1 1 0xbd00000000000000 0x05 0x1234 0x8c
Signed-off-by: Jin Dongming <jin.dongming@np.css.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Clean up cpu_inject_x86_mce() for later patch.
Signed-off-by: Jin Dongming <jin.dongming@np.css.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
simple cleanup and use existing helper: kvm_check_extension().
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Make use of the new KVM_NMI IOCTL to send NMIs into the KVM guest if the
user space raised them. (example: qemu monitor's "nmi" command)
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Use this for assignment to the low byte or low word of a register.
Acked-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@gmail.com>
Only bits 8..23 of the segment flags contain valid data, so only dump
those when printing the CPU state.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
softfloat.h's uint64 type has least-width semantics.
Use uint64_t instead since that is used in helpers.
v4:
* Summary change.
v3:
* Split off.
Signed-off-by: Andreas Färber <andreas.faerber@web.de>
Acked-by: Huang Ying <ying.huang@intel.com>
Acked-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
fprintf_function uses format checking with GCC_FMT_ATTR.
Format errors were fixed in
* target-i386/helper.c
* target-mips/translate.c
* target-ppc/translate.c
Cc: Blue Swirl <blauwirbel@gmail.com>
Signed-off-by: Stefan Weil <weil@mail.berlios.de>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
This patch removes following warnings:
target-i386/kvm.c: In function 'kvm_put_msrs':
target-i386/kvm.c:782: error: unused variable 'i'
target-i386/kvm.c: In function 'kvm_get_msrs':
target-i386/kvm.c:1083: error: label at end of compound statement
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
There is no reason why SRAO event received by the main thread
is the only one that being broadcasted.
According to the x86 ASDM vol.3A 15.10.4.1,
MCE signal is broadcast on processor version 06H_EH or later.
This change is required to handle SRAR in smp guests.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
And restruct this block to call kvm_mce_in_exception() only when it is
required.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Port qemu-kvm's
commit 1bab5d11545d8de5facf46c28630085a2f9651ae
Author: Huang Ying <ying.huang@intel.com>
Date: Wed Mar 3 16:52:46 2010 +0800
Add savevm/loadvm support for MCE
MCE registers are saved/load into/from CPUState in
kvm_arch_save/load_regs. To simulate the MCG_STATUS clearing upon
reset, MSR_MCG_STATUS is set to 0 for KVM_PUT_RESET_STATE.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Port qemu-kvm's
commit 4b62fff1101a7ad77553147717a8bd3bf79df7ef
Author: Huang Ying <ying.huang@intel.com>
Date: Mon Sep 21 10:43:25 2009 +0800
MCE: Relay UCR MCE to guest
UCR (uncorrected recovery) MCE is supported in recent Intel CPUs,
where some hardware error such as some memory error can be reported
without PCC (processor context corrupted). To recover from such MCE,
the corresponding memory will be unmapped, and all processes accessing
the memory will be killed via SIGBUS.
For KVM, if QEMU/KVM is killed, all guest processes will be killed
too. So we relay SIGBUS from host OS to guest system via a UCR MCE
injection. Then guest OS can isolate corresponding memory and kill
necessary guest processes only. SIGBUS sent to main thread (not VCPU
threads) will be broadcast to all VCPU threads as UCR MCE.
aliguori: fix build
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Port qemu-kvm's MCE support
commit c68b2374c9048812f488e00ffb95db66c0bc07a7
Author: Huang Ying <ying.huang@intel.com>
Date: Mon Jul 20 10:00:53 2009 +0800
Add MCE simulation support to qemu/kvm
KVM ioctls are used to initialize MCE simulation and inject MCE. The
real MCE simulation is implemented in Linux kernel. The Kernel part
has been merged.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
This patch adds the svm cpuid feature flags to the qemu
intialization path. It also adds the svm features available
on phenom to its cpu-definition and extends the host cpu
type to support all svm features KVM can provide.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
This patch cleans the (stack-allocated) cpuid definition to
0 before actually initializing it.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Compiling with GCC 4.6.0 20100925 produced warnings:
/src/qemu/target-i386/op_helper.c: In function 'switch_tss':
/src/qemu/target-i386/op_helper.c:283:53: error: variable 'new_trap' set but not used [-Werror=unused-but-set-variable]
Fix by adding a dummy cast so that the variable is not unused. Add also
pointer to docs.
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Switch tree to lookup-by-name using qemu_find_opts().
Also hook up virtfs options so qemu_find_opts works for them too.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Correct the calculation of the offset in the msrpm
for the MSR range 0 - 0x1fff.
Signed-off-by: Adam Lackorzynski <adam@os.inf.tu-dresden.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Replace array size calculations with ARRAY_SIZE macro.
Implemented with this Coccinelle semantic patch, adapted from
Linux kernel:
@@
type T;
T[] E;
@@
- (sizeof(E)/sizeof(*E))
+ ARRAY_SIZE(E)
@@
type T;
T[] E;
@@
- (sizeof(E)/sizeof(E[...]))
+ ARRAY_SIZE(E)
@@
type T;
T[] E;
@@
- (sizeof(E)/sizeof(T))
+ ARRAY_SIZE(E)
Some files (*-dis.c, tests/*) had to be filtered out.
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
This patch simplifies target-i386/translate.c a bit by replacing some
code with gen_update_cc_op()
Signed-off-by: Jun Koi <junkoi2004@gmail.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
This patch replaces constant value assigned for (DisasContext
*)->is_jmp with DISAS_TB_JUMP.
Signed-off-by: Jun Koi <junkoi2004@gmail.com>
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@gmail.com>
i386 cpuid.c currently claims XSAVE is supported in the CPUID filter,
but that's not true: Only FXSAVE is supported. Remove that bit
from the filter.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
ssse3 uses tables with only two entries per op, but it is indexed
with b1 which can contain variables upto 3. This happens when ssse3
or sse4 are used with REP* prefixes.
Add boundary checking for this case.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
We were ignoring REX_B while special-casing NOP, i.e. xchg eax,eax.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
We currently only clear SVM_EVTINJ_VALID after successful interrupt
delivery. This apparently does not match real hardware which clears the
whole event_inj field on every vmexit, including unsuccessful interrupt
delivery.
Reported-by: Erik van der Kouwe <vdkouwe@cs.vu.nl>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
We can support it in KVM now. The 0xd leaf is queried from KVM.
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
the meaning of vendor_override is actually the opposite of how it
is currently used :-(
Fix it to allow KVM to export the non-native CPUID vendor if
explicitly requested by the user.
The intended behavior is:
With TCG:
- always inject the configured vendor (either hard-coded, in config
files or via ",vendor=" commandline)
With KVM:
- by default inject the host's vendor
- if the user specifies ",vendor=" on the commandline, use this
instead of the host's vendor
- all pre-configured vendors (hard-coded, config file) are ignored
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
This fixes a regression of 0e26b7b892: Reset halted also on INIT.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Make APICState completely private to apic.c by using DeviceState
in external APIs.
Move apic_init() to pc.c.
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Move the actual CPUState contents handling to cpu.h and cpuid.c.
Handle CPU reset and set env->halted in pc.c.
Add a function to get the local APIC state of the current
CPU for the MMIO.
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Some hosts (amd64, ia64) have an ABI that ignores the high bits
of the 64-bit register when passing 32-bit arguments. Others
require the value to be properly sign-extended for the type.
I.e. "int32_t" must be sign-extended and "uint32_t" must be
zero-extended to 64-bits.
To effect this, extend the "sizemask" parameter to tcg_gen_callN
to include the signedness of the type of each parameter. If the
tcg target requires it, extend each 32-bit argument into a 64-bit
temp and pass that to the function call.
This ABI feature is required by sparc64, ppc64 and s390x.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Negative four byte displacements need to be sign-extended after
c086b783eb. Do so.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Create a kvm32 CPU model that describes a least common denominator
for KVM capable guest CPUs. Useful for migration purposes.
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
On AMD some bits from 1.EDX are reported in 80000001.EDX. The mask used
to copy bits from 1.EDX to 80000001.EDX is incorrect resulting in
unsupported features passed into a guest.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Continue vcpu execution in case emulation failure happened while vcpu
was in userspace. In this case #UD will be injected into the guest
allowing guest OS to kill offending process and continue.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Validate that KVM vcpu state is only read/written from cpu thread itself
or that cpu is stopped.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
The proper logging for -d cpu is done in generic code.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
If we use larger BIOS image than current 256KB, we would need move reserved
TSS and EPT identity mapping pages. Currently TSS support this, but not
EPT.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Make use of the new KVM_GET/SET_DEBUGREGS to save/restore the x86 debug
registers.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Fixes clang errors:
CC i386-softmmu/kvm.o
/src/qemu/target-i386/kvm.c:40:9: error: 'dprintf' macro redefined
In file included from /src/qemu/target-i386/kvm.c:21:
In file included from /src/qemu/qemu-common.h:27:
In file included from /usr/include/stdio.h:910:
/usr/include/bits/stdio2.h:189:12: note: previous definition is here
CC i386-softmmu/kvm-all.o
/src/qemu/kvm-all.c:39:9: error: 'dprintf' macro redefined
In file included from /src/qemu/kvm-all.c:23:
In file included from /src/qemu/qemu-common.h:27:
In file included from /usr/include/stdio.h:910:
/usr/include/bits/stdio2.h:189:12: note: previous definition is here
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
With argument checking for cpu_fprintf, gcc throws this warning:
CC i386-softmmu/helper.o
cc1: warnings being treated as errors
/qemu/ar7/target-i386/helper.c: In function ‘cpu_x86_dump_seg_cache’:
/qemu/ar7/target-i386/helper.c:220: error: format not a string literal and no format arguments
The code is correct, but current gcc versions don't detect this.
Therefore the patch rewrites the statement to satisfy the compiler.
Signed-off-by: Stefan Weil <weil@mail.berlios.de>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
QEMU uses a fixed page size for the CPU TLB. If the guest uses large
pages then we effectively split these into multiple smaller pages, and
populate the corresponding TLB entries on demand.
When the guest invalidates the TLB by virtual address we must invalidate
all entries covered by the large page. However the address used to
invalidate the entry may not be present in the QEMU TLB, so we do not
know which regions to clear.
Implementing a full vaiable size TLB is hard and slow, so just keep a
simple address/mask pair to record which addresses may have been mapped by
large pages. If the guest invalidates this region then flush the
whole TLB.
Signed-off-by: Paul Brook <paul@codesourcery.com>
Enable all features of real CPU, unsupported features will be
trimmed depending on TCG or KVM capabilities.
Move the list of unsupported TCG features near the TCG capabilities
masks.
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Bump up the xlevel number for qemu32 to allow parsing of the processor
name string for this model.
Similiarly the 486 processor should have at least the feature bit
leaf enabled.
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Since 64-bit capability is just another CPUID bit we now properly
mask, there is no reason anymore to hide the 64-bit capable CPU
models from a 32-bit only QEMU. All 64-bit CPUs can be used
perfectly in 32-bit legacy mode anyway, so these models also make
sense for 32-bit.
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
In KVM we trim the user provided CPUID bits to match the host CPU's
one. Introduce a similar feature to QEMU/TCG. Create a mask of TCG's
capabilities and apply it to the user bits.
This allows to let the CPU models reflect their native archetypes.
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Correct me if I am wrong, but kvm_trim looks like a really bloated
implementation of a bitwise AND. So remove this function and replace
it with the real stuff(TM).
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Some CPUID feature flags had no string value, so they could not be
switched on or off from the command line.
Add names for the missing ones mentioned in the current public CPUID
specification from both Intel and AMD. Those only mentioned in the
Linux kernel source I put as comments.
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
the host_cpuid function was located at the end of the file and had
a prototype before it's first use. Move it up and remove the
prototype.
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
This one was accidently removed with commit
bb0300dc57
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
CPUID leaf Fn8000_0001.EDX contains a copy of many Fn0000_0001.EDX bits.
Define a name for this mask to improve readability and avoid typos.
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
about half of target-i386/helper.c consist of CPUID related functions.
Only one of them is a real TCG helper function. So move the whole
CPUID stuff out of this into a separate file to get better
maintainable parts.
This is only code reordering and should not affect QEMU's
functionality.
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
The commit c22549204a led movntps &
movntdq to be translated incorrectly.
Signed-off-by: TeLeMan <geleman@gmail.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Removes a set of ifdefs from exec.c.
Introduce TARGET_VIRT_ADDR_SPACE_BITS for all targets other
than Alpha. This will be used for page_find_alloc, which is
supposed to be using virtual addresses in the first place.
Signed-off-by: Richard Henderson <rth@twiddle.net>
A SIB byte with an index of 4 means "no scaled index", even if the scale
value is not 0. In 64-bit mode, if REX.X is used, an index of 4 selects
%r12. This is correctly handled by the computation of the index variable,
which includes the index bits, and also the REX.X prefix:
index = ((code >> 3) & 7) | REX_X(s);
Thanks to Avi Kivity, Jamie Lokier and Malc for the analysis of the
problem and the initial patch.
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Do not write nmi_pending, sipi_vector, and mpstate unless we at least go
through a reset. And TSC as well as KVM wallclocks should only be
written on full sync, otherwise we risk to drop some time on state
read-modify-write.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
This grand cleanup drops all reset and vmsave/load related
synchronization points in favor of four(!) generic hooks:
- cpu_synchronize_all_states in qemu_savevm_state_complete
(initial sync from kernel before vmsave)
- cpu_synchronize_all_post_init in qemu_loadvm_state
(writeback after vmload)
- cpu_synchronize_all_post_init in main after machine init
- cpu_synchronize_all_post_reset in qemu_system_reset
(writeback after system reset)
These writeback points + the existing one of VCPU exec after
cpu_synchronize_state map on three levels of writeback:
- KVM_PUT_RUNTIME_STATE (during runtime, other VCPUs continue to run)
- KVM_PUT_RESET_STATE (on synchronous system reset, all VCPUs stopped)
- KVM_PUT_FULL_STATE (on init or vmload, all VCPUs stopped as well)
This level is passed to the arch-specific VCPU state writing function
that will decide which concrete substates need to be written. That way,
no writer of load, save or reset functions that interact with in-kernel
KVM states will ever have to worry about synchronization again. That
also means that a lot of reasons for races, segfaults and deadlocks are
eliminated.
cpu_synchronize_state remains untouched, just as Anthony suggested. We
continue to need it before reading or writing of VCPU states that are
also tracked by in-kernel KVM subsystems.
Consequently, this patch removes many cpu_synchronize_state calls that
are now redundant, just like remaining explicit register syncs.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
So far we synchronized any dirty VCPU state back into the kernel before
updating the guest debug state. This was a tribute to a deficite in x86
kernels before 2.6.33. But as this is an arch-dependent issue, it is
better handle in the x86 part of KVM and remove the writeback point for
generic code. This also avoids overwriting the flushed state later on if
user space decides to change some more registers before resuming the
guest.
We furthermore need to reinject guest exceptions via the appropriate
mechanism. That is KVM_SET_GUEST_DEBUG for older kernels and
KVM_SET_VCPU_EVENTS for recent ones. Using both mechanisms at the same
time will cause state corruptions.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
If you make use of hw breakpoints on a 32bit x86 linux host, qemu
will segmentation fault when processing the exception.
The problem is that the value of env is stored in $ebp in the op_helper
raise_exception() function, and it can have the wrong value when
calling it from non generated code.
It is possible to work around the problem by restoring the value of
env before calling raise_exception() using a new helper function that
takes (CPUState *) as one of the arguments.
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
helper.o(.text+0x11e0): In function `listflags':
/src/qemu/target-i386/helper.c:661: warning: sprintf() is often misused, please use snprintf()
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
This is a reimplementation of prior versions which adds
the ability to define cpu models for contemporary processors.
The added models are likewise selected via -cpu <name>,
and are intended to displace the existing convention
of "-cpu qemu64" augmented with a series of feature flags.
A primary motivation was determination of a least common
denominator within a given processor class to simplify guest
migration. It is still possible to modify an arbitrary model
via additional feature flags however the goal here was to
make doing so unnecessary in typical usage. The other
consideration was providing models names reflective of
current processors. Both AMD and Intel have reviewed the
models in terms of balancing generality of migration vs.
excessive feature downgrade relative to released silicon.
This version of the patch replaces the prior hard wired
definitions with a configuration file approach for new
models. Existing models are thus far left as-is but may
easily be transitioned to (or may be overridden by) the
configuration file representation.
Proposed new model definitions are provided here for current
AMD and Intel processors. Each model consists of a name
used to select it on the command line (-cpu <name>), and a
model_id which corresponds to a least common denominator
commercial instance of the processor class.
A table of names/model_ids may be queried via "-cpu ?model":
:
x86 Opteron_G3 AMD Opteron 23xx (Gen 3 Class Opteron)
x86 Opteron_G2 AMD Opteron 22xx (Gen 2 Class Opteron)
x86 Opteron_G1 AMD Opteron 240 (Gen 1 Class Opteron)
x86 Nehalem Intel Core i7 9xx (Nehalem Class Core i7)
x86 Penryn Intel Core 2 Duo P9xxx (Penryn Class Core 2)
x86 Conroe Intel Celeron_4x0 (Conroe/Merom Class Core 2)
:
Also added is "-cpu ?dump" which exhaustively outputs all config
data for all defined models, and "-cpu ?cpuid" which enumerates
all qemu recognized CPUID feature flags.
The pseudo cpuid flag 'check' when added to the feature flag list
will warn when feature flags (either implicit in a cpu model or
explicit on the command line) would have otherwise been quietly
unavailable to a guest:
# qemu-system-x86_64 ... -cpu Nehalem,check
warning: host cpuid 0000_0001 lacks requested flag 'sse4.2|sse4_2' [0x00100000]
warning: host cpuid 0000_0001 lacks requested flag 'popcnt' [0x00800000]
A similar 'enforce' pseudo flag exists which in addition
to the above causes qemu to error exit if requested flags are
unavailable.
Configuration data for a cpu model resides in the target config
file which by default will be installed as:
/usr/local/etc/qemu/target-<arch>.conf
The format of this file should be self explanatory given the
definitions for the above six models and essentially mimics
the structure of the static x86_def_t x86_defs.
Encoding of cpuid flags names now allows aliases for both the
configuration file and the command line which reconciles some
Intel/AMD/Linux/Qemu naming differences.
This patch was tested relative to qemu.git.
Signed-off-by: john cooper <john.cooper@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Remove all references to KVM_CR3_CACHE as it was never implemented.
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Hi,
Kevin and I have agreed on the approach for this one now. So here is
the latest version of the patch for QEMU, submitting e820 reservation
entries via fw_cfg.
Cheers,
Jes
Use qemu-cfg to provide the BIOS with an optional table of e820 entries.
Notify the BIOS of the location of the TSS+EPT range to by reserving
it via the e820 table.
This matches a corresponding patch for Seabios, however older versions
of Seabios will default to the hardcoded address range and stay
compatible with current QEMU.
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
From qemu-kvm: Kernels before 2.6.30 misreported some essential CPU
features via KVM_GET_SUPPORTED_CPUID. Fix them up.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
The final version of VCPU events in 2.6.33 will allow to skip
nmi_pending and sipi_vector on KVM_SET_VCPU_EVENTS. For now let's write
them unconditionally, which is unproblematic for upstream due to missing
SMP support. Future version which enable SMP will write them only on
reset.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
These are unused since edea5f0 (no need to define global registers in
cpu-exec.c, 2008-05-10).
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Initialize KVM paravirt cpuid leaf and allow user to control guest
visible PV features through -cpu flag.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Without this qemu can even start on kvm modules with events support
since default value of exception_injected in zero and this is #DE
exception.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Now, if we inject a fatal MCE into guest OS, for example Linux, Linux
will go panic and then reboot. But if we inject another MCE now,
system will reset directly instead of go panic firstly, because
MCG_STATUS.MCIP is set to 1 and not cleared after reboot. This is does
not follow the behavior in real hardware.
This patch fixes this via set env->mcg_status to 0 during system reset.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Similarly to what is done in 32938e127f
for "jmp im", trunc the immediate to 32-bit when not running in 64-bit
mode.
Reported-by: Kevin O'Connor <kevin@koconnor.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
This reverts commit ebbc8a3d8e.
As suggested by Jan Kiszka,
"It was obsoleted by d1793b836f8f123b961c613de1bb1c0c185c84cc and now
saves/restores a useless field."
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
hw_breakpoint_type and hw_breakpoint_len used the wrong index multiplier
to extract type and len.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Marcelo correctly remarked that there are usage conflicts between QEMU
core code and KVM /wrt exception_index. So spend a separate field and
also save/restore it properly.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
The CPUID features QEMU presented to the guest were not up-to-date
with QEMU's emulated feature set.
Add the missing bits of recent (and not so recent) additions to
QEMU's emulation engine.
For stability reasons only the user mode usable bits are exposed for
now, features like Monitor or CR8LEG are left out.
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Currently, the msrs involved in setting up pvclock are not saved over
migration and/or save/restore. This patch puts their value in special
fields in our CPUState, and deal with them using vmstate.
kvm also has to account for it, by including them in the msr list
for the ioctls.
This is a backport from qemu-kvm.git
[v2: sucessfully build without kerneldir ]
Signed-off-by: Glauber Costa <glommer@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
As KVM now makes use of exception_index to keep pending exceptions, we
have to save&restore this field as well.
NOTE: We have to nail the arch-independent exception_index down to a
certain bit width for proper vmstate processing, namely to 32 bit.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
The multicore CPUID code detects whether the guest is an Intel or an
AMD CPU, because the Linux kernel is picky about the CmpLegacy bit.
KVM by default passes through the host's vendor, which was not
catched by the code. So fork out the vendor determining bits into a
separate function to be used from both places and always get the real
vendor.
This fixes KVM's multicore setup on Intel CPUs.
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Reported-by: Dietmar Maurer <dietmar@proxmox.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
KVM_GET_MSR_INDEX_LIST returns -E2BIG when the provided space is too
small for all MSRs. But this is precisely the error we trigger with the
initial request in order to obtain that size. Do not fail in that case.
This caused a subtle corruption of the guest state as MSR_STAR was not
properly saved/restored. The corruption became visible with latest kvm
optimizing the MSR updates.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
This patch extends the qemu-kvm state sync logic with support for
KVM_GET/SET_VCPU_EVENTS, giving access to yet missing exception,
interrupt and NMI states.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Drop interrupt_bitmap from the cpustate and solely rely on the integer
interupt_injected. This prepares us for the new injected-interrupt
interface, which will deprecate the bitmap, while preserving
compatibility.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
There is absolutely no need to call reset functions when initializing
devices. Since we are already registering them, calling qemu_system_reset()
should suffice. Actually, it is what happens when we reboot the machine,
and using the same process instead of a special case semantics will even
allow us to find bugs easier.
Furthermore, the fact that we initialize things like the cpu quite early,
leads to the need to introduce synchronization stuff like qemu_system_cond.
This patch removes it entirely. All we need to do is call qemu_system_reset()
only when we're already sure the system is up and running
I tested it with qemu (with and without io-thread) and qemu-kvm, and it
seems to be doing okay - although qemu-kvm uses a slightly different patch.
[ v2: user mode still needs cpu_reset, so put it in ifdef. ]
[ v3: leave qemu_system_cond for now. ]
Signed-off-by: Glauber Costa <glommer@redhat.com>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
This allows to define VMSTATE_SINGLE with VMSTATE_SINGLE_TEST
Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
lzcnt is a AMD Phenom/Barcelona added instruction returning the
number of leading zero bits in a word.
As this is similar to the "bsr" instruction, reuse the existing
code. There need to be some more changes, though, as lzcnt always
returns a valid value (in opposite to bsr, which has a special
case when the operand is 0).
lzcnt is guarded by the ABM CPUID bit (Fn8000_0001:ECX_5).
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
The arpl implementation in target-i386/translate.c uses cpu_A0
temporary across a brcond op. This patch fixes that issue.
Signed-off-by: Laurent Desnogues <laurent.desnogues@gmail.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
This reduce the impact on hosts that have addressing modes with limited
offsets. Suggested by Laurent Desnogues.
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
There was a missmerge, and then we got a tail recursive call to cpu_post_load
without case base :)
Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Commit 56aebc8916 changed gdbstub in way
that debugging 32 or 16-bit guest code is no longer possible with qemu
for x86_64 guest CPUs. Since that commit, qemu only provides registers
sets for 64-bit, forcing current and foreseeable gdb to also switch its
architecture to 64-bit. And this breaks if the inferior is 32 or 16 bit.
No question, this is a gdb issue. But, as it was confirmed in several
discusssions with gdb people, it is a non-trivial thing to fix. So until
qemu finds a gdb version attach with a rework x86 support, we have to
work around it by switching the register layout as the guest switches
its execution mode between 16/32 and 64 bit.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
mce_banks is always MCE_BANKS_DEF * 4 in size, value never change
CC: Huang Ying <ying.huang@intel.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Don't even ask, being able to load/save between 64<->80bit floats should be forbidden
Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
We save more that fpus on that 16 bits (fpstt), we need an additional field
Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
This makes the savevm code correct, and sign extensins gives us exactly
what we need (namely, sign extend to 64 bits when used with 64bit addresess.
Once there, change 0x100000 for 1 << 20, that maks all a20 use the same syntax.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
This patch corrects the following aspects of exception generation in
fxsave/fxrstor:
* Generate #GP if the operand is not aligned to a 16 byte boundary
* Generate #UD if the LOCK prefix is used
* For CR0.EM = 1 #NM is generated, not #UD
Signed-off-by: Kevin Wolf <mail@kevin-wolf.de>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
RDTSCP reads the time stamp counter and atomically also the content
of a 32-bit MSR, which can be freely set by the OS. This allows CPU
local data to be queried by userspace.
Linux uses this to allow a fast implementation of the getcpu()
syscall, which uses the vsyscall page to avoid a context switch.
AMD CPUs since K8RevF and Intel CPUs since Nehalem support this
instruction.
RDTSCP is guarded by the RDTSCP CPUID bit (Fn8000_0001:EDX[27]).
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
This adds support for the AMD Phenom/Barcelona's SSE4a instructions.
Those include insertq and extrq, which are doing shift and mask on
XMM registers, in two versions (immediate shift/length values and
stored in another XMM register).
Additionally it implements movntss, movntsd, which are scalar
non-temporal stores (avoiding cache trashing). These are implemented
as normal stores, though.
SSE4a is guarded by the SSE4A CPUID bit (Fn8000_0001:ECX[6]).
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
AMD CPUs featuring a shortcut to access CR8 even from 32-bit mode.
If you use the LOCK prefix with "mov CR0", it accesses CR8 instead.
This behavior is guarded by the CR8_LEGACY CPUID bit
(Fn8000_0001:ECX[1]).
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
In the very least, a change like this requires discussion on the list.
The naming convention is goofy and it causes a massive merge problem. Something
like this _must_ be presented on the list first so people can provide input
and cope with it.
This reverts commit 99a0949b72.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
The CPU state parameter is not used, remove it and adjust callers. Now we
can compile ioport.c once for all targets.
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Problem: Our file sys-queue.h is a copy of the BSD file, but there are
some additions and it's not entirely compatible. Because of that, there have
been conflicts with system headers on BSD systems. Some hacks have been
introduced in the commits 15cc923584,
f40d753718,
96555a96d7 and
3990d09adf but the fixes were fragile.
Solution: Avoid the conflict entirely by renaming the functions and the
file. Revert the previous hacks.
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Direct call to kvm_arch_get_registers() bypass logic in
cpu_synchronize_state()
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
cpu_synchronize_state() is a little unreadable since the 'modified'
argument isn't self-explanatory. Simplify it by making it always
synchronize the kernel state into qemu, and automatically flush the
registers back to the kernel if they've been synchronized on this
exit.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
In addition to the TCG based qemu64 type let's introduce a kvm64 CPU type,
which is the least common denominator of all KVM-capable x86-CPUs
(based on Intel Pentium 4 Prescott). It can be used as a base type
for migration.
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
The CPUID level determines how many CPUID leafs are exposed to the guest.
Some features (like multi-core) cannot be propagated without the proper
level, but guests maybe confused by bogus entries in some leafs.
So add level= and xlevel= to the list of -cpu options to allow the user to
override the default settings. While at it, merge unnecessary local
variables into one and allow hexadecimal arguments.
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Controlled by the enhanced -smp option set the CPUID bits to present the
guest the desired topology. This is vendor specific, but (with the exception
of the CMP_LEGACY bit) not conflicting, so we set all bits everytime.
There is no real multithreading support for AMD CPUs, so report cores
instead.
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Intel CPUs store the number of cores in CPUID leaf 4. So push
the maxleaf value to 4 to allow the guests access to this leaf.
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
handle_cpu_signal is very nearly copy-paste code for each target, with a
few minor variations. This patch sets up appropriate defaults for a
generic handle_cpu_signal and provides overrides for particular targets
that did things differently. Fixing things like the persistent (XXX:
use sigsetjmp) should now become somewhat easier.
Previous comments on this patch suggest that the "activate soft MMU for
this block" comments refer to defunct functionality. I have removed
such blocks for the appropriate targets in this patch.
Signed-off-by: Nathan Froyd <froydnj@codesourcery.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
kqemu introduces a number of restrictions on the i386 target. The worst is that
it prevents large memory from working in the default build.
Furthermore, kqemu is fundamentally flawed in a number of ways. It relies on
the TSC as a time source which will not be reliable on a multiple processor
system in userspace. Since most modern processors are multicore, this severely
limits the utility of kqemu.
kvm is a viable alternative for people looking to accelerate qemu and has the
benefit of being supported by the upstream Linux kernel. If someone can
implement work arounds to remove the restrictions introduced by kqemu, I'm
happy to avoid and/or revert this patch.
N.B. kqemu will still function in the 0.11 series but this patch removes it from
the 0.12 series.
Paul, please Ack or Nack this patch.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Since we recently do not disable 3DNOW! support anymore, we should
avoid setting the bits in the default qemu64 CPU model to ease
migration. TCG does not support it anyway and even AMD deprecates
it's usage nowadays.
If you want to use it in KVM, use the phenom, athlon or host CPU
model.
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
This allows to set segment registers via gdb also in system emulation
mode. Basic sanity checks are applied and nothing is changed if they
fail. But screwing up the target via this interface will never be
complicated, so I avoided being too paranoid here.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Allocate enough memory for KVM_GET_MSR_INDEX_LIST as older kernels shot
far beyond their limits, corrupting user space memory.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
- MCE features are initialized when VCPU is intialized according to CPUID.
- A monitor command "mce" is added to inject a MCE.
- A new interrupt mask: CPU_INTERRUPT_MCE is added to inject the MCE.
aliguori: fix build for linux-user
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
This patch aligns the KVM-related layout and encoding of the CPU state
to be saved to disk or migrated with qemu-kvm. The major differences are
reordering of fields and a compressed interrupt_bitmap into a single
number as there can be no more than one pending IRQ at a time.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
The KVM kernel will disable all bits in CPUID which are not present in
the host. As this is mostly true for the hypervisor bit (1.ecx),
preserve its value before the trim and restore it afterwards.
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
KVM provides an in-kernel feature to disable CPUID bits that are not
present in the current host. So there is no need here to duplicate this
work. Additionally allows 3DNow! on capable processors, since the
restriction seems to apply to QEMU/TCG only.
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
If we want to trim the user provided CPUID bits for KVM to be not greater
than that of the host, we should not remove the bits _after_ we sent
them to the kernel.
This fixes the masking of features that are not present on the host by
moving the trim function and it's call from helper.c to kvm.c.
It helps to use -cpu host.
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Although the guest's CPUID bits can be controlled in a fine grained way
in QEMU, a simple way to inject the host CPU is missing. This is handy
for KVM desktop virtualization, where one wants the guest to support the
full host feature set.
Introduce another CPU type called 'host', which will propagate the host's
CPUID bits to the guest. Unwanted bits can still be turned off by using
the existing syntax (-cpu host,-skinit)
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
KVM defaults to the hypervisor CPUID bit to be set, whereas pure
QEMU clears it. On some occasions one wants to set or clear it the
other way round (for instance to get HyperV running inside a guest).
Move the bit-set to be done before the command line parsing and
enable it by default. One can disable it by using: -cpu qemu64,-hypervisor
Fix some whitespace damage on the way.
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
This should fix compilation problem in case of CONFIG_USER_ONLY.
Currently INIT/SIPI is handled in the context of CPU that sends IPI.
This patch changes this to handle them like all other events in a main
cpu exec loop. When KVM will gain thread per vcpu capability it will
be much more clear to handle those event by cpu thread itself and not
modify one cpu's state from the context of the other.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
As per the IA32 processor manual, the accessed bit is set to 1 in the
processor state after reset. qemu pc cpu_reset code was missing this
accessed bit setting.
Signed-off-by: Nitin A Kamble <nitin.a.kamble@intel.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
KVM-enabled QEMU will always report the vendor ID of the physical CPU it is
running on. Allow to override this if explicitly requested on the
command line. It will not suffice to name a CPU type (like -cpu phenom),
but you have to explicitly set the vendor: -cpu phenom,vendor=AuthenticAMD
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Save and restore all so far neglected KVM-specific CPU states. Handling
the TSC stabilizes migration in KVM mode. The interrupt_bitmap and
mp_state are currently unused, but will become relevant for in-kernel
irqchip support. By including proper saving/restoring already, we avoid
having to increment CPU_SAVE_VERSION later on once again.
v2:
- initialize mp_state runnable (for the boot CPU)
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
This patch adds the missing hooks to allow live migration in KVM mode.
It adds proper synchronization before/after saving/restoring the VCPU
states (note: PPC is untested), hooks into
cpu_physical_memory_set_dirty_tracking() to enable dirty memory logging
at KVM level, and synchronizes that drity log into QEMU's view before
running ram_live_save().
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
KVM_GET_SUPPORTED_CPUID has been known to fail to return -E2BIG
when it runs out of entries. Detect this by always trying again
with a bigger table if the ioctl() fills the table.
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Include assert.h from qemu-common.h and remove other direct uses.
cpu-all.h still need to include it because of the dyngen-exec.h hacks
Signed-off-by: Paul Brook <paul@codesourcery.com>
Remove cpu features that are not supported by kvm from the cpuid features
reported to the guest.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
QEMU allows adding or removing cpu features by using the syntax '-cpu +feature'
or '-cpu -feature'. Some cpuid features cause more than one bit to be set or
cleared; but QEMU stops after just one bit has been modified, causing the
feature bits to be inconsistent.
Fix by allowing all feature bits corresponding to a given name to be set.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
kvm does not support all cpu features; add support for dunamically querying
the supported feature set.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
This broke due to r7230.
Signed-off-by: Laurent Desnogues <laurent.desnogues@gmail.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@7233 c046a42c-6fe2-441c-8c8c-71466251a162
If fault happened during event delivery exit_int_info should contain
valid info about the event on vm exit.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@7230 c046a42c-6fe2-441c-8c8c-71466251a162
- configure script and build system changes.
- wind up new machine type.
- add -xen-* command line options.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@7219 c046a42c-6fe2-441c-8c8c-71466251a162
Parse the descriptor flags that segment registers refer to and show the
result in a more human-friendly format. The output of info registers eg.
then looks like this:
[...]
ES =007b 00000000 ffffffff 00cff300 DPL=3 DS [-WA]
CS =0060 00000000 ffffffff 00c09b00 DPL=0 CS32 [-RA]
SS =0068 00000000 ffffffff 00c09300 DPL=0 DS [-WA]
DS =007b 00000000 ffffffff 00cff300 DPL=3 DS [-WA]
FS =0000 00000000 00000000 00000000
GS =0033 b7dd66c0 ffffffff b7dff3dd DPL=3 DS [-WA]
LDT=0000 00000000 00000000 00008200 DPL=0 LDT
TR =0080 c06da700 0000206b 00008900 DPL=0 TSS32-avl
[...]
Changes in this version:
- refactoring so that only a single helper is used for dumping the
segment descriptor cache
- tiny typo fixed that broke 64-bit segment type names
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@7179 c046a42c-6fe2-441c-8c8c-71466251a162