To both avoid that kvm_irqchip_in_kernel always has to be paired with
kvm_enabled and that the former ends up in a function call, implement it
like the latter. This means keeping the state in a global variable and
defining kvm_irqchip_in_kernel as a preprocessor macro.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
This introduces the alternative APIC device which makes use of KVM's
in-kernel device model. External NMI injection via LINT1 is emulated by
checking the current state of the in-kernel APIC, only injecting a NMI
into the VCPU if LINT1 is unmasked and configured to DM_NMI.
MSI is not yet supported, so we disable this when the in-kernel model is
in use.
CC: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
KVM is forced to disable the IRQ0 override when we run with in-kernel
irqchip but without IRQ routing support of the kernel. Set the fwcfg
value correspondingly. This aligns us with qemu-kvm.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Add the basic infrastructure to active in-kernel irqchip support, inject
interrupts into these models, and maintain IRQ routes.
Routing is optional and depends on the host arch supporting
KVM_CAP_IRQ_ROUTING. When it's not available on x86, we looe the HPET as
we can't route GSI0 to IOAPIC pin 2.
In-kernel irqchip support will once be controlled by the machine
property 'kernel_irqchip', but this is not yet wired up.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
kvm_arch_get_supported_cpuid checks for global cpuid restrictions, it
does not require any CPUState reference. Changing its interface allows
to call it before any VCPU is initialized.
CC: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
There are no generic bits remaining in the handling of KVM_EXIT_DEBUG.
So push its logic completely into arch hands, i.e. only x86 so far.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
We will broaden the scope of this function on x86 beyond irqchip events.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
KVM-assisted devices need access to it but we have no clean channel to
distribute a reference. As a workaround until there is a better
solution, export kvm_state for global use, though use should remain
restricted to the mentioned scenario.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
In order to use log_start/log_stop with Xen as well in the vga code,
this two operations have been put in CPUPhysMemoryClient.
The two new functions cpu_physical_log_start,cpu_physical_log_stop are
used in hw/vga.c and replace the kvm_log_start/stop. With this, vga does
no longer depends on kvm header.
[ Jan: rebasing and style fixlets ]
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
We do not check them, and the only arch with non-empty implementations
always returns 0 (this is also true for qemu-kvm).
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
CC: Alexander Graf <agraf@suse.de>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Provide arch-independent kvm_on_sigbus* stubs to remove the #ifdef'ery
from cpus.c. This patch also fixes --disable-kvm build by providing the
missing kvm_on_sigbus_vcpu kvm-stub.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Instead of splattering the code with #ifdefs and runtime checks for
capabilities we cannot work without anyway, provide central test
infrastructure for verifying their availability both at build and
runtime.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
There used to be a limit of 6 KVM io bus devices in the kernel.
On such a kernel, we can't use many ioeventfds for host notification
since the limit is reached too easily.
Add an API to test for this condition.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Port qemu-kvm's
commit 4b62fff1101a7ad77553147717a8bd3bf79df7ef
Author: Huang Ying <ying.huang@intel.com>
Date: Mon Sep 21 10:43:25 2009 +0800
MCE: Relay UCR MCE to guest
UCR (uncorrected recovery) MCE is supported in recent Intel CPUs,
where some hardware error such as some memory error can be reported
without PCC (processor context corrupted). To recover from such MCE,
the corresponding memory will be unmapped, and all processes accessing
the memory will be killed via SIGBUS.
For KVM, if QEMU/KVM is killed, all guest processes will be killed
too. So we relay SIGBUS from host OS to guest system via a UCR MCE
injection. Then guest OS can isolate corresponding memory and kill
necessary guest processes only. SIGBUS sent to main thread (not VCPU
threads) will be broadcast to all VCPU threads as UCR MCE.
aliguori: fix build
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
In QEMU-KVM, physical address != RAM address. While MCE simulation
needs physical address instead of RAM address. So
kvm_physical_memory_addr_from_ram() is implemented to do the
conversion, and it is invoked before being filled in the IA32_MCi_ADDR
MSR.
Reported-by: Dean Nelson <dnelson@redhat.com>
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Continue vcpu execution in case emulation failure happened while vcpu
was in userspace. In this case #UD will be injected into the guest
allowing guest OS to kill offending process and continue.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Make use of the new KVM_GET/SET_DEBUGREGS to save/restore the x86 debug
registers.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
This allows limited use of kvm functions (which will return ENOSYS)
even in once-compiled modules. The patch also improves a bit the error
messages for KVM initialization.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[blauwirbel@gmail.com: fixed Win32 build]
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Comment on kvm usage: rather than require users to do if (kvm_enabled())
and/or ifdefs, this patch adds an API that, internally, is defined to
stub function on non-kvm build, and checks kvm_enabled for non-kvm
run.
While rest of qemu code still uses if (kvm_enabled()), I think this
approach is cleaner, and we should convert rest of code to it
long term.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
This grand cleanup drops all reset and vmsave/load related
synchronization points in favor of four(!) generic hooks:
- cpu_synchronize_all_states in qemu_savevm_state_complete
(initial sync from kernel before vmsave)
- cpu_synchronize_all_post_init in qemu_loadvm_state
(writeback after vmload)
- cpu_synchronize_all_post_init in main after machine init
- cpu_synchronize_all_post_reset in qemu_system_reset
(writeback after system reset)
These writeback points + the existing one of VCPU exec after
cpu_synchronize_state map on three levels of writeback:
- KVM_PUT_RUNTIME_STATE (during runtime, other VCPUs continue to run)
- KVM_PUT_RESET_STATE (on synchronous system reset, all VCPUs stopped)
- KVM_PUT_FULL_STATE (on init or vmload, all VCPUs stopped as well)
This level is passed to the arch-specific VCPU state writing function
that will decide which concrete substates need to be written. That way,
no writer of load, save or reset functions that interact with in-kernel
KVM states will ever have to worry about synchronization again. That
also means that a lot of reasons for races, segfaults and deadlocks are
eliminated.
cpu_synchronize_state remains untouched, just as Anthony suggested. We
continue to need it before reading or writing of VCPU states that are
also tracked by in-kernel KVM subsystems.
Consequently, this patch removes many cpu_synchronize_state calls that
are now redundant, just like remaining explicit register syncs.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
So far we synchronized any dirty VCPU state back into the kernel before
updating the guest debug state. This was a tribute to a deficite in x86
kernels before 2.6.33. But as this is an arch-dependent issue, it is
better handle in the x86 part of KVM and remove the writeback point for
generic code. This also avoids overwriting the flushed state later on if
user space decides to change some more registers before resuming the
guest.
We furthermore need to reinject guest exceptions via the appropriate
mechanism. That is KVM_SET_GUEST_DEBUG for older kernels and
KVM_SET_VCPU_EVENTS for recent ones. Using both mechanisms at the same
time will cause state corruptions.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Change the way the internal qemu signal, used for communication between
iothread and vcpus, is handled.
Block and consume it with sigtimedwait on the outer vcpu loop, which
allows more precise timing control.
Change from standard signal (SIGUSR1) to real-time one, so multiple
signals are not collapsed.
Set the signal number on KVM's in-kernel allowed sigmask.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
remove direct kvm calls from exec.c, make
kvm use memory notifiers framework instead.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
The default action of coalesced MMIO is, cache the writing in buffer, until:
1. The buffer is full.
2. Or the exit to QEmu due to other reasons.
But this would result in a very late writing in some condition.
1. The each time write to MMIO content is small.
2. The writing interval is big.
3. No need for input or accessing other devices frequently.
This issue was observed in a experimental embbed system. The test image
simply print "test" every 1 seconds. The output in QEmu meets expectation,
but the output in KVM is delayed for seconds.
Per Avi's suggestion, I hooked flushing coalesced MMIO buffer in VGA update
handler. By this way, We don't need vcpu explicit exit to QEmu to
handle this issue.
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
This patch extends the qemu-kvm state sync logic with support for
KVM_GET/SET_VCPU_EVENTS, giving access to yet missing exception,
interrupt and NMI states.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
In the very least, a change like this requires discussion on the list.
The naming convention is goofy and it causes a massive merge problem. Something
like this _must_ be presented on the list first so people can provide input
and cope with it.
This reverts commit 99a0949b72.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Problem: Our file sys-queue.h is a copy of the BSD file, but there are
some additions and it's not entirely compatible. Because of that, there have
been conflicts with system headers on BSD systems. Some hacks have been
introduced in the commits 15cc923584,
f40d753718,
96555a96d7 and
3990d09adf but the fixes were fragile.
Solution: Avoid the conflict entirely by renaming the functions and the
file. Revert the previous hacks.
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
cpu_synchronize_state() is a little unreadable since the 'modified'
argument isn't self-explanatory. Simplify it by making it always
synchronize the kernel state into qemu, and automatically flush the
registers back to the kernel if they've been synchronized on this
exit.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
KVM can have an in-kernel pit or irqchip. While we don't implement it
yet, having a way for test for it (that always returns zero) will allow us
to reuse code in qemu-kvm that tests for it.
Signed-off-by: Glauber Costa <glommer@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
During startup and after reset we have to synchronize user space to the
in-kernel KVM state. Namely, we need to transfer the VCPU registers when
they change due to VCPU as well as APIC reset.
This patch refactors the required hooks so that kvm_init_vcpu registers
its own per-VCPU reset handler and adds a cpu_synchronize_state to the
APIC reset. That way we no longer depend on the new reset order (and can
drop this disliked interface again) and we can even drop a KVM hook in
main().
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Save and restore all so far neglected KVM-specific CPU states. Handling
the TSC stabilizes migration in KVM mode. The interrupt_bitmap and
mp_state are currently unused, but will become relevant for in-kernel
irqchip support. By including proper saving/restoring already, we avoid
having to increment CPU_SAVE_VERSION later on once again.
v2:
- initialize mp_state runnable (for the boot CPU)
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Extend kvm_physical_sync_dirty_bitmap() so that is can sync across
multiple slots. Useful for updating the whole dirty log during
migration. Moreover, properly pass down errors the whole call chain.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Introduce a global dirty logging flag that enforces logging for all
slots. This can be used by the live migration code to enable/disable
global logging withouth destroying the per-slot setting.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
kvm does not support all cpu features; add support for dunamically querying
the supported feature set.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Avi Kivity wrote:
> Suggest wrapping in a function and hiding it deep inside kvm-all.c.
>
Done in v2:
---------->
If the KVM MMU is asynchronous (kernel does not support MMU_NOTIFIER),
we have to avoid COW for the guest memory. Otherwise we risk serious
breakage when guest pages change there physical locations due to COW
after fork. Seen when forking smbd during runtime via -smb.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>