The commit 317b0a6d8 fixed an issue which caused by the outdated
env->tsc value, but the fix lead to 'cpu_synchronize_all_states()'
called twice during live migration. The 'cpu_synchronize_all_states()'
takes about 130us for a VM which has 4 vcpus, it's a bit expensive.
Synchronize the whole CPU context just for updating env->tsc is too
wasting, this patch use a new function to update the env->tsc.
Comparing to 'cpu_synchronize_all_states()', it only takes about 20us.
Signed-off-by: Liang Li <liang.z.li@intel.com>
Message-Id: <1446695464-27116-2-git-send-email-liang.z.li@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
According to Microsoft documentation, the signature in the standard
hypervisor CPUID leaf at 0x40000000 identifies the Vendor ID and is
for reporting and diagnostic purposes only. We can therefore allow
the user to change it to whatever they want, within the 12 character
limit. Add a new hv-vendor-id option to the -cpu flag to allow
for this, ex:
-cpu host,hv_time,hv-vendor-id=KeenlyKVM
Link: http://msdn.microsoft.com/library/windows/hardware/hh975392
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Message-Id: <20151016153356.28104.48612.stgit@gimli.home>
[Adjust error message to match the property name, use error_report. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The functions for checking xcrs, xsave and pit_state2 are
only used on x86, so they should reside in target-i386/kvm.c.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <1444933820-6968-1-git-send-email-thuth@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
In-kernel ITS emulation on ARM64 will require to supply requester IDs.
These IDs can now be retrieved from the device pointer using new
pci_requester_id() function.
This patch adds pci_dev pointer to KVM GSI routing functions and makes
callers passing it.
x86 architecture does not use requester IDs, but hw/i386/kvm/pci-assign.c
also made passing PCI device pointer instead of NULL for consistency with
the rest of the code.
Signed-off-by: Pavel Fedin <p.fedin@samsung.com>
Message-Id: <ce081423ba2394a4efc30f30708fca07656bc500.1444916432.git.p.fedin@samsung.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
HV_X64_MSR_VP_RUNTIME msr used by guest to get
"the time the virtual processor consumes running guest code,
and the time the associated logical processor spends running
hypervisor code on behalf of that guest."
Calculation of that time is performed by task_cputime_adjusted()
for vcpu task by KVM side.
Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Richard Henderson <rth@twiddle.net>
CC: Eduardo Habkost <ehabkost@redhat.com>
CC: "Andreas Färber" <afaerber@suse.de>
CC: Marcelo Tosatti <mtosatti@redhat.com>
Message-Id: <1442397584-16698-4-git-send-email-den@openvz.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Hyper-V features bit HV_X64_MSR_VP_INDEX_AVAILABLE value is
based on cpu option "hv-vpindex" and kernel support of
HV_X64_MSR_VP_INDEX.
Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Richard Henderson <rth@twiddle.net>
CC: Eduardo Habkost <ehabkost@redhat.com>
CC: "Andreas Färber" <afaerber@suse.de>
CC: Marcelo Tosatti <mtosatti@redhat.com>
Message-Id: <1442397584-16698-3-git-send-email-den@openvz.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
HV_X64_MSR_RESET msr is used by Hyper-V based Windows guest
to reset guest VM by hypervisor. This msr is stateless so
no migration/fetch/update is required.
This code checks cpu option "hv-reset" and support by
kernel. If both conditions are met appropriate Hyper-V features
cpuid bit is set.
Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Richard Henderson <rth@twiddle.net>
CC: Eduardo Habkost <ehabkost@redhat.com>
CC: "Andreas Färber" <afaerber@suse.de>
CC: Marcelo Tosatti <mtosatti@redhat.com>
Message-Id: <1442397584-16698-2-git-send-email-den@openvz.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
There's one report of migration breaking due to missing MSR_TSC_AUX
save/restore. Fix this by adding a new subsection that saves the state
of this MSR.
https://bugzilla.redhat.com/show_bug.cgi?id=1261797
Reported-by: Xiaoqing Wei <xwei@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Juan Quintela <quintela@redhat.com>
CC: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
CC: Marcelo Tosatti <mtosatti@redhat.com>
CC: Richard Henderson <rth@twiddle.net>
CC: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
KVM Hyper-V based guests can notify hypervisor about
occurred guest crash by writing into Hyper-V crash MSR's.
This patch does handling and migration of HV_X64_MSR_CRASH_P0-P4,
HV_X64_MSR_CRASH_CTL msrs. User can enable these MSR's by
'hv-crash' option.
Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Andreas Färber <afaerber@suse.de>
Message-Id: <1435924905-8926-13-git-send-email-den@openvz.org>
[Folks, stop abrviating variable names!!! Also fix compilation on
non-Linux/x86. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The Hyper-V definitions are an industry standard and can be used
from code that is not KVM-specific.
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
ARAT signals that the APIC timer does not stop in power saving states.
As our APICs are emulated, it's fine to expose this feature to guests,
at least when asking for KVM host features or with CPU types that
include the flag. The exact model number that introduced the feature is
not known, but reports can be found that it's at least available since
Sandy Bridge.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
The property can take values on, off or auto. The default is "off"
for KVM and pre-2.4 machines, otherwise "auto" (which makes it
available on TCG or on new-enough kernels).
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Apart from the MSR, the smi field of struct kvm_vcpu_events has to be
translated into the corresponding CPUX86State fields. Also,
memory transaction flags depend on SMM state, so pull it from struct
kvm_run on every exit from KVM to userspace.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This opens the path to get rid of the iothread lock on vmexits in KVM
mode. On x86, the in-kernel irqchips has to be used because we otherwise
need to synchronize APIC and other per-cpu state accesses that could be
changed concurrently.
Regarding pre/post-run callbacks, s390x and ARM should be fine without
specific locking as the callbacks are empty. MIPS and POWER require
locking for the pre-run callback.
For the handle_exit callback, it is non-empty in x86, POWER and s390.
Some POWER cases could do without the locking, but it is left in
place for now.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <1434646046-27150-7-git-send-email-pbonzini@redhat.com>
In particular, don't include it into headers.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Luiz Capitulino <lcapitulino@redhat.com>
On ARM the MSI data corresponds to the shared peripheral interrupt (SPI)
ID. This latter equals to the SPI index + 32. to retrieve the SPI index,
matching the gsi, an architecture specific function is introduced.
Signed-off-by: Eric Auger <eric.auger@linaro.org>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Let kvm_arch_post_run convert fields in the kvm_run struct to MemTxAttrs.
These are then passed to address_space_rw.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
A bunch of fixes all over the place, some of the
bugs fixed are actually regressions.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJVAH/uAAoJECgfDbjSjVRprq0H/iyqLSHQIv6gNOPYQbLXOCv0
pkCeLx6kTMO9lSwxZcsZvMsYPeiEL3CHRKJcEjq0+Ap0uen0pa2Yl3WzyJcnBcib
xwkHk/UftFYAiZAzVtd4moXujvVLYNL1ukvr/wPOdIkTEn8U6K3NaT3pLooc369f
oTyQhlL3E9HJ5S6X0HXJIFwtsOIhPfS3NCLoDFbFjtb9mIsqTx7N5s2C5hctF+ir
JtyuwPx5oT73WYxoYmjSP6n/Nf5cuJdqtm6o2KijjhWWYMJ6epYVBo/DD6dIFbmJ
V/23dxpon+lvhae2c2LAVrkiJ1Boon/eMbJK/mNwpFX7vW35ataLPy6pYpaiEJs=
=RUld
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging
misc fixes and cleanups
A bunch of fixes all over the place, some of the
bugs fixed are actually regressions.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
# gpg: Signature made Wed Mar 11 17:48:30 2015 GMT using RSA key ID D28D5469
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>"
# gpg: aka "Michael S. Tsirkin <mst@redhat.com>"
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67
# Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469
* remotes/mst/tags/for_upstream: (25 commits)
virtio-scsi: remove empty wrapper for cmd
virtio-scsi: clean out duplicate cdb field
virtio-scsi: fix cdb/sense size
uapi/virtio_scsi: allow overriding CDB/SENSE size
virtio-scsi: drop duplicate CDB/SENSE SIZE
exec: don't include hw/boards for linux-user
acpi: specify format for build_append_namestring
MAINTAINERS: drop aliguori@amazon.com
tpm: Move memory subregion function into realize function
virtio-pci: Convert to realize()
pci: Convert pci_nic_init() to Error to avoid qdev_init()
machine: query mem-merge machine property
machine: query dump-guest-core machine property
hw/boards: make it safe to include for linux-user
machine: query phandle-start machine property
machine: query kvm-shadow-mem machine property
kvm: add machine state to kvm_arch_init
machine: query kernel-irqchip property
machine: allowed/required kernel-irqchip support
machine: replace qemu opts with iommu property
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Commit e79d5a6 ("machine: remove qemu_machine_opts global list") removed
the global option descriptions and moved them to MachineState's QOM
properties.
Query kvm-shadow-mem by accessing machine properties through designated
wrappers.
Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Needed to query machine's properties.
Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
The field doesn't need to be inside CPUX86State, and it is not specific
for the CPUID instruction, so move and rename it.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Andreas Färber <afaerber@suse.de>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
This reverts commit b8a173b25c, reversing
changes made to 5de090464f.
(I applied this pull request when I should not have done so, and
am now immediately reverting it.)
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The field doesn't need to be inside CPUState, and it is not specific for
the CPUID instruction, so move and rename it.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Right now, the AVX512 registers are split in many different fields:
xmm_regs for the low 128 bits of the first 16 registers, ymmh_regs
for the next 128 bits of the same first 16 registers, zmmh_regs
for the next 256 bits of the same first 16 registers, and finally
hi16_zmm_regs for the full 512 bits of the second 16 bit registers.
This makes it simple to move data in and out of the xsave region,
but would be a nightmare for a hypothetical TCG implementation and
leads to a proliferation of [XYZ]MM_[BWLSQD] macros. Instead,
this patch marshals data manually from the xsave region to a single
32x512-bit array, simplifying the macro jungle and clarifying which
bits are in which vmstate subsection.
The migration format is unaffected.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
the record/replay series, and a few SCSI and i386 patches as well.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQEcBAABAgAGBQJUtjlCAAoJEL/70l94x66Dy5gH/0QIHoXVH/2wuA9apNK2/gBj
2U7g08QGKlc2wQGF4a48sQf523lSt5eirVxrwta0wmvFeznrdR84d4YGpolHM67A
Q9Y5J2i+v1H6cfQH6ylq61QQ7rEC3+isa65wblLeMSCAb2W1CcV7avSKu4BSPZw2
jGr3jd2Ve7pOsULpPhiNsmmltYSeZc7sQBYc9C7fQEoxOGsNnRoKOUKPnIk1mJTc
iYH480L1MnOL3enIz13K34lQofNRhJxJBLYKhYsBydQbOh0/Ls1eifOY4xEegXZ0
IUODy6c2pk+s/IUPARpBucKGKzDxdv0DLXDV60uGn5EsYT0CjCl9/sRs3bZvaQE=
=eT8u
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging
Mostly bugfixes and cleanups from qemu-devel. Yet another small patch from
the record/replay series, and a few SCSI and i386 patches as well.
# gpg: Signature made Wed 14 Jan 2015 09:39:14 GMT using RSA key ID 78C7AE83
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>"
# gpg: aka "Paolo Bonzini <pbonzini@redhat.com>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg: It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1
# Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83
* remotes/bonzini/tags/for-upstream:
cpus: consistently use QEMU_CLOCK_VIRTUAL_RT for icount_warp_rt timer
qemu-timer: rename timer_init to timer_init_tl
scsi: fix cancellation when I/O was completed but DMA was not.
rules.mak: Fix module build
hw/scsi/lsi53c895a: add support for additional diag / debug registers
qemu-common.h: optimise muldiv64 if int128 is available
target-i386: do not memcpy in and out of xmm_regs
target-i386: fix movntsd on big-endian hosts
vl.c: fix regression when reading memory size from config file
vl: Don't silently change topology when all -smp options were set
vl: fix max_cpus check
vl: Avoid unnecessary 'if' nesting
9pfs: changed to use event_notifier instead of qemu_pipe
vl.c: fix regression when reading machine type from config file
char: restore stdio echo on resume from suspend.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
After the next patch, we will move the high parts of AVX and AVX512 registers
in the same array as the SSE registers. This will make it impossible to
memcpy an array of 128-bit values in and out of xmm_regs in one swoop.
Use a for loop instead.
Similarly, always use XMM_Q in translate.c. This avoids introducing bugs
such as the one fixed in the previous patch.
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
on s390 MSI-X irqs are presented as thin or adapter interrupts
for this we have to reorganize the routing entry to contain
valid information for the adapter interrupt code on s390.
To minimize impact on existing code we introduce an architecture
function to fixup the routing entry.
Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
g_new(T, n) is neater than g_malloc(sizeof(T) * n). It's also safer,
for two reasons. One, it catches multiplication overflowing size_t.
Two, it returns T * rather than void *, which lets the compiler catch
more type errors.
This commit only touches allocations with size arguments of the form
sizeof(T).
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add xsaves related definition, it also adds corresponding part
to kvm_get/put, and vmstate.
Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
struct kvm_vcpu_events contains reserved fields. Let's use a
designated initializer to avoid false positives in valgrind.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
struct kvm_msrs contains a pad field. Let's use a designated
initializer on the info part to avoid false positives from
valgrind/memcheck.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
struct kvm_msrs contains padding bytes. Let's use a designated
initializer on the info part to avoid false positives from
valgrind/memcheck. Do the same for generic MSRS, the TSC and
feature control.
We also need to zero out the reserved fields in the entries.
We do this in kvm_msr_entry_set as suggested by Paolo. This
avoids a big memset that a designated initializer on the
full structure would do.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
struct kvm_xcrs contains padding bytes. Let's use a designated
initializer to avoid false positives from valgrind/memcheck.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Compute kvm_irqfds_allowed by checking the KVM_CAP_IRQFD extension.
Remove direct settings in architecture specific files.
Add a new kvm_resamplefds_allowed variable, initialized by
checking the KVM_CAP_IRQFD_RESAMPLE extension. Add a corresponding
kvm_resamplefds_enabled() function.
A special notice for s390 where KVM_CAP_IRQFD was not immediatly
advirtised when irqfd capability was introduced in the kernel.
KVM_CAP_IRQ_ROUTING was advertised instead.
This was fixed in "KVM: s390: announce irqfd capability",
ebc3226202d5956a5963185222982d435378b899 whereas irqfd support
was brought in 84223598778ba08041f4297fda485df83414d57e,
"KVM: s390: irq routing for adapter interrupts". Both commits
first appear in 3.15 so there should not be any kernel
version impacted by this QEMU modification.
Signed-off-by: Eric Auger <eric.auger@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The MTRR state in KVM currently runs completely independent of the
QEMU state in CPUX86State.mtrr_*. This means that on migration, the
target loses MTRR state from the source. Generally that's ok though
because KVM ignores it and maps everything as write-back anyway. The
exception to this rule is when we have an assigned device and an IOMMU
that doesn't promote NoSnoop transactions from that device to be cache
coherent. In that case KVM trusts the guest mapping of memory as
configured in the MTRR.
This patch updates kvm_get|put_msrs() so that we retrieve the actual
vCPU MTRR settings and therefore keep CPUX86State synchronized for
migration. kvm_put_msrs() is also used on vCPU reset and therefore
allows future modificaitons of MTRR state at reset to be realized.
Note that the entries array used by both functions was already
slightly undersized for holding every possible MSR, so this patch
increases it beyond the 28 new entries necessary for MTRR state.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Invariant TSC documentation mentions that "invariant TSC will run at a
constant rate in all ACPI P-, C-. and T-states".
This is not the case if migration to a host with different TSC frequency
is allowed, or if savevm is performed. So block migration/savevm.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
[AF+mtosatti: Updated error message]
Signed-off-by: Andreas Färber <afaerber@suse.de>
Since Linux kernel 3.5, KVM has documented eax for leaf 0x40000000
to be KVM_CPUID_FEATURES:
57c22e5f35
But qemu still tries to set it to 0. It would be better to make qemu
and kvm consistent. This patch just fixes this issue.
Signed-off-by: Jidong Xiao <jidong.xiao@gmail.com>
[Include kvm_base in the value. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The latest Nvidia driver (337.88) specifically checks for KVM as the
hypervisor and reports Code 43 for the driver in a Windows guest when
found. Removing or changing the KVM signature is sufficient for the
driver to load and work. This patch adds an option to easily allow
the KVM hypervisor signature to be hidden using '-cpu kvm=off'. We
continue to expose KVM via the cpuid value by default. The state of
this option does not supercede or replace -enable-kvm or the accel=kvm
machine option. This only changes the visibility of KVM to the guest
and paravirtual features specifically tied to the KVM cpuid.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
CS.RPL is not equal to the CPL in the few instructions between
setting CR0.PE and reloading CS. We get this right in the common
case, because writes to CR0 do not modify the CPL, but it would
not be enough if an SMI comes exactly during that brief period.
Were this to happen, the RSM instruction would erroneously set
CPL to the low two bits of the real-mode selector; and if they are
not 00, the next instruction fetch cannot access the code segment
and causes a triple fault.
However, SS.DPL *is* always equal to the CPL. In real processors
(AMD only) there is a weird case of SYSRET setting SS.DPL=SS.RPL
from the STAR register while forcing CPL=3, but we do not emulate
that.
Tested-by: Kevin O'Connor <kevin@koconnor.net>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Now that we have a CPU object with a reset method, it is better to
keep the KVM reset close to the CPU reset. Using qemu_register_reset
as we do now keeps them far apart.
With this patch, PPC no longer calls the kvm_arch_ function, so
it can get removed there. Other arches call it from their CPU
reset handler, and the function gets an ARMCPU/X86CPU/S390CPU.
Note that ARM- and s390-specific functions are called kvm_arm_*
and kvm_s390_*, while x86-specific functions are called kvm_arch_*.
That follows the convention used by the different architectures.
Changing that is the topic of a separate patch.
Reviewed-by: Gleb Natapov <gnatapov@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This fixes warnings from the static code analysis (smatch).
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
http://msdn.microsoft.com/en-us/library/windows/hardware/ff541625%28v=vs.85%29.aspx
This code is generic for activating reference time counter or virtual reference time stamp counter
Signed-off-by: Vadim Rozenfeld <vrozenfe@redhat.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The MS docs specify HV_X64_MSR_HYPERCALL as a mandatory interface,
thus we must provide the MSRs even if the user only specified
features that, like relaxed timing, in principle don't require them.
And the MSRs are only there if the hypervisor has KVM_CAP_HYPERV.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
kvm_arch_init_vcpu's initialization of the KVM leaves at 0x40000100
is broken, because KVM_CPUID_FEATURES is left at 0x40000001. Move
it to 0x40000101 if Hyper-V is enabled.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* qemu-kvm/uq/master:
kvm: always update the MPX model specific register
KVM: fix addr type for KVM_IOEVENTFD
KVM: Retry KVM_CREATE_VM on EINTR
mempath prefault: fix off-by-one error
kvm: x86: Separately write feature control MSR on reset
roms: Flush icache when writing roms to guest memory
target-i386: clear guest TSC on reset
target-i386: do not special case TSC writeback
target-i386: Intel MPX
Conflicts:
exec.c
aliguori: fix trivial merge conflict in exec.c
Signed-off-by: Anthony Liguori <aliguori@amazon.com>
The original patch from Liu Jinsong restricted them to reset or full
state updates, but that's unnecessary (and wrong) since the BNDCFGS
MSR has no side effects.
Cc: Liu Jinsong <jinsong.liu@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This motion is preparing for refactoring vCPU APIC subsequently.
Signed-off-by: Chen Fan <chen.fan.fnst@cn.fujitsu.com>
Signed-off-by: Andreas Färber <afaerber@suse.de>
If the guest is running in nested mode on system reset, clearing the
feature MSR signals the kernel to leave this mode. Recent kernels
processes this properly, but leave the VCPU state undefined behind. It
is the job of userspace to bring it to a proper shape. Therefore, write
this specific MSR first so that no state transfer gets lost.
This allows to cleanly reset a guest with VMX in use.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
VCPU TSC is not cleared by a warm reset (*), which leaves some types of Linux
guests (non-pvops guests and those with the kernel parameter no-kvmclock set)
vulnerable to the overflow in cyc2ns_offset fixed by upstream commit
9993bc635d01a6ee7f6b833b4ee65ce7c06350b1 ("sched/x86: Fix overflow in
cyc2ns_offset").
To put it in a nutshell, if such a Linux guest without the patch above applied
has been up more than 208 days and attempts a warm reset chances are that
the newly booted kernel will panic or hang.
(*) Intel Xeon E5 processors show the same broken behavior due to
the errata "TSC is Not Affected by Warm Reset" (Intel® Xeon®
Processor E5 Family Specification Update - August 2013): "The
TSC (Time Stamp Counter MSR 10H) should be cleared on
reset. Due to this erratum the TSC is not affected by warm
reset."
Cc: Will Auld <will.auld@intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Fernando Luis Vázquez Cao <fernando_b1@lab.ntt.co.jp>
Newer kernels are capable of synchronizing TSC values of multiple VCPUs
on writeback, but we were excluding the power up case, which is not needed
anymore.
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Fernando Luis Vázquez Cao <fernando_b1@lab.ntt.co.jp>
Add some MPX related definiation, and hardcode sizes and offsets
of xsave features 3 and 4. It also add corresponding part to
kvm_get/put_xsave, and vmstate.
Signed-off-by: Liu Jinsong <jinsong.liu@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This error was reported by valgrind when running qemu-system-x86_64
with kvm:
==975== Conditional jump or move depends on uninitialised value(s)
==975== at 0x521C38: cpuid_find_entry (kvm.c:176)
==975== by 0x5235BA: kvm_arch_init_vcpu (kvm.c:686)
==975== by 0x4D5175: kvm_init_vcpu (kvm-all.c:267)
==975== by 0x45035B: qemu_kvm_cpu_thread_fn (cpus.c:858)
==975== by 0xD361E0D: start_thread (pthread_create.c:311)
==975== by 0xD65E9EC: clone (clone.S:113)
==975== Uninitialised value was created by a stack allocation
==975== at 0x5226E4: kvm_arch_init_vcpu (kvm.c:446)
Instead of adding more memset calls for parts of cpuid_data, the existing
calls were removed and cpuid_data is now initialized completely in one
call.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Only the first item of the array was ever looked at. No
practical effect, but still worth fixing.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
* qemu-kvm/uq/master:
kvm-stub: fix compilation
kvm: shorten the parameter list for get_real_device()
kvm: i386: fix LAPIC TSC deadline timer save/restore
kvm-all.c: max_cpus should not exceed KVM vcpu limit
kvm: Simplify kvm_handle_io
kvm: x86: fix setting IA32_FEATURE_CONTROL with nested VMX disabled
kvm: add KVM_IRQFD_FLAG_RESAMPLE support
kvm: migrate vPMU state
target-i386: remove tabs from target-i386/cpu.h
Initialize IA32_FEATURE_CONTROL MSR in reset and migration
Conflicts:
target-i386/cpu.h
target-i386/kvm.c
aliguori: fixup trivial conflicts due to whitespace and added cpu
argument
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
The configuration of the timer represented by MSR_IA32_TSCDEADLINE depends on:
- APIC LVT Timer register.
- TSC value.
Change the order to respect the dependency.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This patch is to fix the bug https://bugs.launchpad.net/qemu-kvm/+bug/1207623
IA32_FEATURE_CONTROL is pointless if not expose VMX or SMX bits to
cpuid.1.ecx of vcpu. Current qemu-kvm will error return when kvm_put_msrs
or kvm_get_msrs.
Signed-off-by: Liu Jinsong <jinsong.liu@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
- since hyperv_* helper functions are used only in target-i386/kvm.c
move them there as static helpers
Requested-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Andreas Färber <afaerber@suse.de>
Passing a CPUState pointer instead of a CPUArchState pointer eliminates
the last target dependent data type in sysemu/kvm.h.
It also simplifies the code.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Andreas Färber <afaerber@suse.de>
The recent KVM patch adds IA32_FEATURE_CONTROL support. QEMU needs
to clear this MSR when reset vCPU and keep the value of it when
migration. This patch add this feature.
Signed-off-by: Arthur Chunqi Li <yzt356@gmail.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Prepares for changing cpu_single_step() argument to CPUState.
Acked-by: Michael Walle <michael@walle.cc> (for lm32)
Signed-off-by: Andreas Färber <afaerber@suse.de>
Move next_cpu from CPU_COMMON to CPUState.
Move first_cpu variable to qom/cpu.h.
gdbstub needs to use CPUState::env_ptr for now.
cpu_copy() no longer needs to save and restore cpu_next.
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
[AF: Rebased, simplified cpu_copy()]
Signed-off-by: Andreas Färber <afaerber@suse.de>
Multiple -machine options with the same ID are merged. All but the
one without an ID are to be silently ignored.
In most places, we query these options with a null ID. This is
correct.
In some places, we instead query whatever options come first in the
list. This is wrong. When the -machine processed first happens to
have an ID, options are taken from that ID, and the ones specified
without ID are silently ignored.
Example:
$ upstream-qemu -nodefaults -S -display none -monitor stdio -machine id=foo -machine accel=kvm,usb=on
$ upstream-qemu -nodefaults -S -display none -monitor stdio -machine id=foo,accel=kvm,usb=on -machine accel=xen
$ upstream-qemu -nodefaults -S -display none -monitor stdio -machine accel=xen -machine id=foo,accel=kvm,usb=on
$ qemu-system-x86_64 -nodefaults -S -display none -monitor stdio -machine accel=kvm,usb=on
QEMU 1.5.50 monitor - type 'help' for more information
(qemu) info kvm
kvm support: enabled
(qemu) info usb
(qemu) q
$ qemu-system-x86_64 -nodefaults -S -display none -monitor stdio -machine id=foo -machine accel=kvm,usb=on
QEMU 1.5.50 monitor - type 'help' for more information
(qemu) info kvm
kvm support: disabled
(qemu) info usb
(qemu) q
$ qemu-system-x86_64 -nodefaults -S -display none -monitor stdio -machine id=foo,accel=kvm,usb=on -machine accel=xen
QEMU 1.5.50 monitor - type 'help' for more information
(qemu) info kvm
kvm support: enabled
(qemu) info usb
USB support not enabled
(qemu) q
$ qemu-system-x86_64 -nodefaults -S -display none -monitor stdio -machine accel=xen -machine id=foo,accel=kvm,usb=on
xc: error: Could not obtain handle on privileged command interface (2 = No such file or directory): Internal error
xen be core: can't open xen interface
failed to initialize Xen: Operation not permitted
Option usb is queried correctly, and the one without an ID wins,
regardless of option order.
Option accel is queried incorrectly, and which one wins depends on
option order and ID.
Affected options are accel (and its sugared forms -enable-kvm and
-no-kvm), kernel_irqchip, kvm_shadow_mem.
Additionally, option kernel_irqchip is normally on by default, except
it's off when no -machine options are given. Bug can't bite, because
kernel_irqchip is used only when KVM is enabled, KVM is off by
default, and enabling always creates -machine options. Downstreams
that enable KVM by default do get bitten, though.
Use qemu_get_machine_opts() to fix these bugs.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-id: 1372943363-24081-5-git-send-email-armbru@redhat.com
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Change Monitor::mon_cpu to CPUState as well.
Reviewed-by: liguang <lig.fnst@cn.fujitsu.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Andreas Färber <afaerber@suse.de>
It no longer relies on CPUArchState since 20d695a.
Reviewed-by: liguang <lig.fnst@cn.fujitsu.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Andreas Färber <afaerber@suse.de>
This replaces the feature-bit fields on both X86CPU and x86_def_t
structs with an array.
With this, we will be able to simplify code that simply does the same
operation on all feature words (e.g. kvm_check_features_against_host(),
filter_features_for_kvm(), add_flagname_to_bitmaps(), CPU feature-bit
property lookup/registration, and the proposed "feature-words" property)
The following field replacements were made on X86CPU and x86_def_t:
(cpuid_)features -> features[FEAT_1_EDX]
(cpuid_)ext_features -> features[FEAT_1_ECX]
(cpuid_)ext2_features -> features[FEAT_8000_0001_EDX]
(cpuid_)ext3_features -> features[FEAT_8000_0001_ECX]
(cpuid_)ext4_features -> features[FEAT_C000_0001_EDX]
(cpuid_)kvm_features -> features[FEAT_KVM]
(cpuid_)svm_features -> features[FEAT_SVM]
(cpuid_)7_0_ebx_features -> features[FEAT_7_0_EBX]
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Andreas Färber <afaerber@suse.de>
Add appropriate spaces around operators, and break line where it needs
to be broken to allow feature-words array to be introduced without
having too-long lines.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Andreas Färber <afaerber@suse.de>
Read and write steal time MSR, so that reporting is functional across
migration.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Many of these should be cleaned up with proper qdev-/QOM-ification.
Right now there are many catch-all headers in include/hw/ARCH depending
on cpu.h, and this makes it necessary to compile these files per-target.
However, fixing this does not belong in these patches.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Both fields are used in VMState, thus need to be moved together.
Explicitly zero them on reset since they were located before
breakpoints.
Pass PowerPCCPU to kvmppc_handle_halt().
Signed-off-by: Andreas Färber <afaerber@suse.de>
* qemu-kvm/uq/master:
target-i386: kvm: prevent buffer overflow if -cpu foo, [x]level is too big
vmxcap: bit 9 of VMX_PROCBASED_CTLS2 is 'virtual interrupt delivery'
Conflicts:
target-i386/kvm.c
Trivial merge resolution due to lack of context.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Stack corruption may occur if too big 'level' or 'xlevel' values passed
on command line with KVM enabled, due to limited size of cpuid_data
in kvm_arch_init_vcpu().
reproduces with:
qemu -enable-kvm -cpu qemu64,level=4294967295
or
qemu -enable-kvm -cpu qemu64,xlevel=4294967295
Check if there is space in cpuid_data before passing it to cpu_x86_cpuid()
or abort() if there is not space.
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Andreas Faerber <afaerber@suse.de>
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
The CPU ID in KVM is supposed to be the APIC ID, so change the
KVM_CREATE_VCPU call to match it. The current behavior didn't break
anything yet because today the APIC ID is assumed to be equal to the CPU
index, but this won't be true in the future.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Acked-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Andreas Färber <afaerber@suse.de>
This will allow each architecture to define how the VCPU ID is set on
the KVM_CREATE_VCPU ioctl call.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Acked-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Andreas Färber <afaerber@suse.de>
Various header files rely on qemu-char.h including qemu-config.h or
main-loop.h, but they really do not need qemu-char.h at all (particularly
interesting is the case of the block layer!). Clean this up, and also
add missing inclusions of qemu-char.h itself.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
CPUID.7.0.EBX[1]=1 indicates IA32_TSC_ADJUST MSR 0x3b is supported
Basic design is to emulate the MSR by allowing reads and writes to the
hypervisor vcpu specific locations to store the value of the emulated MSRs.
In this way the IA32_TSC_ADJUST value will be included in all reads to
the TSC MSR whether through rdmsr or rdtsc.
As this is a new MSR that the guest may access and modify its value needs
to be migrated along with the other MRSs. The changes here are specifically
for recognizing when IA32_TSC_ADJUST is enabled in CPUID and code added
for migrating its value.
Signed-off-by: Will Auld <will.auld@intel.com>
Reviewed-by: Andreas Färber <afaerber@suse.de>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Pass around CPUArchState instead of using global cpu_single_env.
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Reviewed-by: Andreas Färber <afaerber@suse.de>