KVM Hyper-V based guests can notify hypervisor about
occurred guest crash by writing into Hyper-V crash MSR's.
This patch does handling and migration of HV_X64_MSR_CRASH_P0-P4,
HV_X64_MSR_CRASH_CTL msrs. User can enable these MSR's by
'hv-crash' option.
Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Andreas Färber <afaerber@suse.de>
Message-Id: <1435924905-8926-13-git-send-email-den@openvz.org>
[Folks, stop abrviating variable names!!! Also fix compilation on
non-Linux/x86. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The Hyper-V definitions are an industry standard and can be used
from code that is not KVM-specific.
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
ARAT signals that the APIC timer does not stop in power saving states.
As our APICs are emulated, it's fine to expose this feature to guests,
at least when asking for KVM host features or with CPU types that
include the flag. The exact model number that introduced the feature is
not known, but reports can be found that it's at least available since
Sandy Bridge.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
The property can take values on, off or auto. The default is "off"
for KVM and pre-2.4 machines, otherwise "auto" (which makes it
available on TCG or on new-enough kernels).
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Apart from the MSR, the smi field of struct kvm_vcpu_events has to be
translated into the corresponding CPUX86State fields. Also,
memory transaction flags depend on SMM state, so pull it from struct
kvm_run on every exit from KVM to userspace.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This opens the path to get rid of the iothread lock on vmexits in KVM
mode. On x86, the in-kernel irqchips has to be used because we otherwise
need to synchronize APIC and other per-cpu state accesses that could be
changed concurrently.
Regarding pre/post-run callbacks, s390x and ARM should be fine without
specific locking as the callbacks are empty. MIPS and POWER require
locking for the pre-run callback.
For the handle_exit callback, it is non-empty in x86, POWER and s390.
Some POWER cases could do without the locking, but it is left in
place for now.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <1434646046-27150-7-git-send-email-pbonzini@redhat.com>
In particular, don't include it into headers.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Luiz Capitulino <lcapitulino@redhat.com>
On ARM the MSI data corresponds to the shared peripheral interrupt (SPI)
ID. This latter equals to the SPI index + 32. to retrieve the SPI index,
matching the gsi, an architecture specific function is introduced.
Signed-off-by: Eric Auger <eric.auger@linaro.org>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Let kvm_arch_post_run convert fields in the kvm_run struct to MemTxAttrs.
These are then passed to address_space_rw.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
A bunch of fixes all over the place, some of the
bugs fixed are actually regressions.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJVAH/uAAoJECgfDbjSjVRprq0H/iyqLSHQIv6gNOPYQbLXOCv0
pkCeLx6kTMO9lSwxZcsZvMsYPeiEL3CHRKJcEjq0+Ap0uen0pa2Yl3WzyJcnBcib
xwkHk/UftFYAiZAzVtd4moXujvVLYNL1ukvr/wPOdIkTEn8U6K3NaT3pLooc369f
oTyQhlL3E9HJ5S6X0HXJIFwtsOIhPfS3NCLoDFbFjtb9mIsqTx7N5s2C5hctF+ir
JtyuwPx5oT73WYxoYmjSP6n/Nf5cuJdqtm6o2KijjhWWYMJ6epYVBo/DD6dIFbmJ
V/23dxpon+lvhae2c2LAVrkiJ1Boon/eMbJK/mNwpFX7vW35ataLPy6pYpaiEJs=
=RUld
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging
misc fixes and cleanups
A bunch of fixes all over the place, some of the
bugs fixed are actually regressions.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
# gpg: Signature made Wed Mar 11 17:48:30 2015 GMT using RSA key ID D28D5469
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>"
# gpg: aka "Michael S. Tsirkin <mst@redhat.com>"
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67
# Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469
* remotes/mst/tags/for_upstream: (25 commits)
virtio-scsi: remove empty wrapper for cmd
virtio-scsi: clean out duplicate cdb field
virtio-scsi: fix cdb/sense size
uapi/virtio_scsi: allow overriding CDB/SENSE size
virtio-scsi: drop duplicate CDB/SENSE SIZE
exec: don't include hw/boards for linux-user
acpi: specify format for build_append_namestring
MAINTAINERS: drop aliguori@amazon.com
tpm: Move memory subregion function into realize function
virtio-pci: Convert to realize()
pci: Convert pci_nic_init() to Error to avoid qdev_init()
machine: query mem-merge machine property
machine: query dump-guest-core machine property
hw/boards: make it safe to include for linux-user
machine: query phandle-start machine property
machine: query kvm-shadow-mem machine property
kvm: add machine state to kvm_arch_init
machine: query kernel-irqchip property
machine: allowed/required kernel-irqchip support
machine: replace qemu opts with iommu property
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Commit e79d5a6 ("machine: remove qemu_machine_opts global list") removed
the global option descriptions and moved them to MachineState's QOM
properties.
Query kvm-shadow-mem by accessing machine properties through designated
wrappers.
Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Needed to query machine's properties.
Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
The field doesn't need to be inside CPUX86State, and it is not specific
for the CPUID instruction, so move and rename it.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Andreas Färber <afaerber@suse.de>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
This reverts commit b8a173b25c, reversing
changes made to 5de090464f.
(I applied this pull request when I should not have done so, and
am now immediately reverting it.)
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The field doesn't need to be inside CPUState, and it is not specific for
the CPUID instruction, so move and rename it.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Right now, the AVX512 registers are split in many different fields:
xmm_regs for the low 128 bits of the first 16 registers, ymmh_regs
for the next 128 bits of the same first 16 registers, zmmh_regs
for the next 256 bits of the same first 16 registers, and finally
hi16_zmm_regs for the full 512 bits of the second 16 bit registers.
This makes it simple to move data in and out of the xsave region,
but would be a nightmare for a hypothetical TCG implementation and
leads to a proliferation of [XYZ]MM_[BWLSQD] macros. Instead,
this patch marshals data manually from the xsave region to a single
32x512-bit array, simplifying the macro jungle and clarifying which
bits are in which vmstate subsection.
The migration format is unaffected.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
the record/replay series, and a few SCSI and i386 patches as well.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQEcBAABAgAGBQJUtjlCAAoJEL/70l94x66Dy5gH/0QIHoXVH/2wuA9apNK2/gBj
2U7g08QGKlc2wQGF4a48sQf523lSt5eirVxrwta0wmvFeznrdR84d4YGpolHM67A
Q9Y5J2i+v1H6cfQH6ylq61QQ7rEC3+isa65wblLeMSCAb2W1CcV7avSKu4BSPZw2
jGr3jd2Ve7pOsULpPhiNsmmltYSeZc7sQBYc9C7fQEoxOGsNnRoKOUKPnIk1mJTc
iYH480L1MnOL3enIz13K34lQofNRhJxJBLYKhYsBydQbOh0/Ls1eifOY4xEegXZ0
IUODy6c2pk+s/IUPARpBucKGKzDxdv0DLXDV60uGn5EsYT0CjCl9/sRs3bZvaQE=
=eT8u
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging
Mostly bugfixes and cleanups from qemu-devel. Yet another small patch from
the record/replay series, and a few SCSI and i386 patches as well.
# gpg: Signature made Wed 14 Jan 2015 09:39:14 GMT using RSA key ID 78C7AE83
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>"
# gpg: aka "Paolo Bonzini <pbonzini@redhat.com>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg: It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1
# Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83
* remotes/bonzini/tags/for-upstream:
cpus: consistently use QEMU_CLOCK_VIRTUAL_RT for icount_warp_rt timer
qemu-timer: rename timer_init to timer_init_tl
scsi: fix cancellation when I/O was completed but DMA was not.
rules.mak: Fix module build
hw/scsi/lsi53c895a: add support for additional diag / debug registers
qemu-common.h: optimise muldiv64 if int128 is available
target-i386: do not memcpy in and out of xmm_regs
target-i386: fix movntsd on big-endian hosts
vl.c: fix regression when reading memory size from config file
vl: Don't silently change topology when all -smp options were set
vl: fix max_cpus check
vl: Avoid unnecessary 'if' nesting
9pfs: changed to use event_notifier instead of qemu_pipe
vl.c: fix regression when reading machine type from config file
char: restore stdio echo on resume from suspend.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
After the next patch, we will move the high parts of AVX and AVX512 registers
in the same array as the SSE registers. This will make it impossible to
memcpy an array of 128-bit values in and out of xmm_regs in one swoop.
Use a for loop instead.
Similarly, always use XMM_Q in translate.c. This avoids introducing bugs
such as the one fixed in the previous patch.
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
on s390 MSI-X irqs are presented as thin or adapter interrupts
for this we have to reorganize the routing entry to contain
valid information for the adapter interrupt code on s390.
To minimize impact on existing code we introduce an architecture
function to fixup the routing entry.
Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
g_new(T, n) is neater than g_malloc(sizeof(T) * n). It's also safer,
for two reasons. One, it catches multiplication overflowing size_t.
Two, it returns T * rather than void *, which lets the compiler catch
more type errors.
This commit only touches allocations with size arguments of the form
sizeof(T).
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add xsaves related definition, it also adds corresponding part
to kvm_get/put, and vmstate.
Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
struct kvm_vcpu_events contains reserved fields. Let's use a
designated initializer to avoid false positives in valgrind.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
struct kvm_msrs contains a pad field. Let's use a designated
initializer on the info part to avoid false positives from
valgrind/memcheck.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
struct kvm_msrs contains padding bytes. Let's use a designated
initializer on the info part to avoid false positives from
valgrind/memcheck. Do the same for generic MSRS, the TSC and
feature control.
We also need to zero out the reserved fields in the entries.
We do this in kvm_msr_entry_set as suggested by Paolo. This
avoids a big memset that a designated initializer on the
full structure would do.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
struct kvm_xcrs contains padding bytes. Let's use a designated
initializer to avoid false positives from valgrind/memcheck.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Compute kvm_irqfds_allowed by checking the KVM_CAP_IRQFD extension.
Remove direct settings in architecture specific files.
Add a new kvm_resamplefds_allowed variable, initialized by
checking the KVM_CAP_IRQFD_RESAMPLE extension. Add a corresponding
kvm_resamplefds_enabled() function.
A special notice for s390 where KVM_CAP_IRQFD was not immediatly
advirtised when irqfd capability was introduced in the kernel.
KVM_CAP_IRQ_ROUTING was advertised instead.
This was fixed in "KVM: s390: announce irqfd capability",
ebc3226202d5956a5963185222982d435378b899 whereas irqfd support
was brought in 84223598778ba08041f4297fda485df83414d57e,
"KVM: s390: irq routing for adapter interrupts". Both commits
first appear in 3.15 so there should not be any kernel
version impacted by this QEMU modification.
Signed-off-by: Eric Auger <eric.auger@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The MTRR state in KVM currently runs completely independent of the
QEMU state in CPUX86State.mtrr_*. This means that on migration, the
target loses MTRR state from the source. Generally that's ok though
because KVM ignores it and maps everything as write-back anyway. The
exception to this rule is when we have an assigned device and an IOMMU
that doesn't promote NoSnoop transactions from that device to be cache
coherent. In that case KVM trusts the guest mapping of memory as
configured in the MTRR.
This patch updates kvm_get|put_msrs() so that we retrieve the actual
vCPU MTRR settings and therefore keep CPUX86State synchronized for
migration. kvm_put_msrs() is also used on vCPU reset and therefore
allows future modificaitons of MTRR state at reset to be realized.
Note that the entries array used by both functions was already
slightly undersized for holding every possible MSR, so this patch
increases it beyond the 28 new entries necessary for MTRR state.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Invariant TSC documentation mentions that "invariant TSC will run at a
constant rate in all ACPI P-, C-. and T-states".
This is not the case if migration to a host with different TSC frequency
is allowed, or if savevm is performed. So block migration/savevm.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
[AF+mtosatti: Updated error message]
Signed-off-by: Andreas Färber <afaerber@suse.de>
Since Linux kernel 3.5, KVM has documented eax for leaf 0x40000000
to be KVM_CPUID_FEATURES:
57c22e5f35
But qemu still tries to set it to 0. It would be better to make qemu
and kvm consistent. This patch just fixes this issue.
Signed-off-by: Jidong Xiao <jidong.xiao@gmail.com>
[Include kvm_base in the value. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The latest Nvidia driver (337.88) specifically checks for KVM as the
hypervisor and reports Code 43 for the driver in a Windows guest when
found. Removing or changing the KVM signature is sufficient for the
driver to load and work. This patch adds an option to easily allow
the KVM hypervisor signature to be hidden using '-cpu kvm=off'. We
continue to expose KVM via the cpuid value by default. The state of
this option does not supercede or replace -enable-kvm or the accel=kvm
machine option. This only changes the visibility of KVM to the guest
and paravirtual features specifically tied to the KVM cpuid.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
CS.RPL is not equal to the CPL in the few instructions between
setting CR0.PE and reloading CS. We get this right in the common
case, because writes to CR0 do not modify the CPL, but it would
not be enough if an SMI comes exactly during that brief period.
Were this to happen, the RSM instruction would erroneously set
CPL to the low two bits of the real-mode selector; and if they are
not 00, the next instruction fetch cannot access the code segment
and causes a triple fault.
However, SS.DPL *is* always equal to the CPL. In real processors
(AMD only) there is a weird case of SYSRET setting SS.DPL=SS.RPL
from the STAR register while forcing CPL=3, but we do not emulate
that.
Tested-by: Kevin O'Connor <kevin@koconnor.net>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Now that we have a CPU object with a reset method, it is better to
keep the KVM reset close to the CPU reset. Using qemu_register_reset
as we do now keeps them far apart.
With this patch, PPC no longer calls the kvm_arch_ function, so
it can get removed there. Other arches call it from their CPU
reset handler, and the function gets an ARMCPU/X86CPU/S390CPU.
Note that ARM- and s390-specific functions are called kvm_arm_*
and kvm_s390_*, while x86-specific functions are called kvm_arch_*.
That follows the convention used by the different architectures.
Changing that is the topic of a separate patch.
Reviewed-by: Gleb Natapov <gnatapov@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This fixes warnings from the static code analysis (smatch).
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
http://msdn.microsoft.com/en-us/library/windows/hardware/ff541625%28v=vs.85%29.aspx
This code is generic for activating reference time counter or virtual reference time stamp counter
Signed-off-by: Vadim Rozenfeld <vrozenfe@redhat.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The MS docs specify HV_X64_MSR_HYPERCALL as a mandatory interface,
thus we must provide the MSRs even if the user only specified
features that, like relaxed timing, in principle don't require them.
And the MSRs are only there if the hypervisor has KVM_CAP_HYPERV.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
kvm_arch_init_vcpu's initialization of the KVM leaves at 0x40000100
is broken, because KVM_CPUID_FEATURES is left at 0x40000001. Move
it to 0x40000101 if Hyper-V is enabled.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* qemu-kvm/uq/master:
kvm: always update the MPX model specific register
KVM: fix addr type for KVM_IOEVENTFD
KVM: Retry KVM_CREATE_VM on EINTR
mempath prefault: fix off-by-one error
kvm: x86: Separately write feature control MSR on reset
roms: Flush icache when writing roms to guest memory
target-i386: clear guest TSC on reset
target-i386: do not special case TSC writeback
target-i386: Intel MPX
Conflicts:
exec.c
aliguori: fix trivial merge conflict in exec.c
Signed-off-by: Anthony Liguori <aliguori@amazon.com>
The original patch from Liu Jinsong restricted them to reset or full
state updates, but that's unnecessary (and wrong) since the BNDCFGS
MSR has no side effects.
Cc: Liu Jinsong <jinsong.liu@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This motion is preparing for refactoring vCPU APIC subsequently.
Signed-off-by: Chen Fan <chen.fan.fnst@cn.fujitsu.com>
Signed-off-by: Andreas Färber <afaerber@suse.de>
If the guest is running in nested mode on system reset, clearing the
feature MSR signals the kernel to leave this mode. Recent kernels
processes this properly, but leave the VCPU state undefined behind. It
is the job of userspace to bring it to a proper shape. Therefore, write
this specific MSR first so that no state transfer gets lost.
This allows to cleanly reset a guest with VMX in use.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
VCPU TSC is not cleared by a warm reset (*), which leaves some types of Linux
guests (non-pvops guests and those with the kernel parameter no-kvmclock set)
vulnerable to the overflow in cyc2ns_offset fixed by upstream commit
9993bc635d01a6ee7f6b833b4ee65ce7c06350b1 ("sched/x86: Fix overflow in
cyc2ns_offset").
To put it in a nutshell, if such a Linux guest without the patch above applied
has been up more than 208 days and attempts a warm reset chances are that
the newly booted kernel will panic or hang.
(*) Intel Xeon E5 processors show the same broken behavior due to
the errata "TSC is Not Affected by Warm Reset" (Intel® Xeon®
Processor E5 Family Specification Update - August 2013): "The
TSC (Time Stamp Counter MSR 10H) should be cleared on
reset. Due to this erratum the TSC is not affected by warm
reset."
Cc: Will Auld <will.auld@intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Fernando Luis Vázquez Cao <fernando_b1@lab.ntt.co.jp>