* remotes/kvm/uq/master:
kvm: Fix eax for cpuid leaf 0x40000000
kvmclock: Ensure proper env->tsc value for kvmclock_current_nsec calculation
kvm: Enable -cpu option to hide KVM
kvm: Ensure negative return value on kvm_init() error handling path
target-i386: set CC_OP to CC_OP_EFLAGS in cpu_load_eflags
target-i386: get CPL from SS.DPL
target-i386: rework CPL checks during task switch, preparing for next patch
target-i386: fix segment flags for SMM and VM86 mode
target-i386: Fix vm86 mode regression introduced in fd460606fd.
kvm_stat: allow choosing between tracepoints and old stats
kvmclock: Ensure time in migration never goes backward
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
We need to ensure ret < 0 when going through the error path, or QEMU may
try to run the half-initialized VM and crash.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Make use of the new s390 adapter irq routing support to enable real
in-kernel irqfds for virtio-ccw with adapter interrupts.
Note that s390 doesn't provide the common KVM_CAP_IRQCHIP capability, but
rather needs KVM_CAP_S390_IRQCHIP to be enabled. This is to ensure backward
compatibility.
Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Now that we have a CPU object with a reset method, it is better to
keep the KVM reset close to the CPU reset. Using qemu_register_reset
as we do now keeps them far apart.
With this patch, PPC no longer calls the kvm_arch_ function, so
it can get removed there. Other arches call it from their CPU
reset handler, and the function gets an ARMCPU/X86CPU/S390CPU.
Note that ARM- and s390-specific functions are called kvm_arm_*
and kvm_s390_*, while x86-specific functions are called kvm_arch_*.
That follows the convention used by the different architectures.
Changing that is the topic of a separate patch.
Reviewed-by: Gleb Natapov <gnatapov@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
s390x introduced helper functions for getting/setting one_regs with
commit 860643bc. However, nothing about these is s390-specific.
Alexey Kardashevskiy had already posted a general version, so let's
merge the two patches and massage the code a bit.
CC: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This minimizes QEMUMachine usage, as part of machine QOM-ification.
Signed-off-by: Marcel Apfelbaum <marcel.a@redhat.com>
Signed-off-by: Andreas Färber <afaerber@suse.de>
This reverts commit b533f658a9.
The original code was wrong, because effectively it ignored errors
from kernel, because kernel does not return -1 on error case but
returns -errno, and does not return -EPERM for this particular ioctl.
But in some cases kernel actually returned unsuccessful result,
namely, when the dirty bitmap in requested slot does not exist
it returns -ENOENT. With new code this condition becomes an
error when it shouldn't be.
Revert that patch instead of fixing it properly this late in the
release process. I disagree with this approach, but let's make
things move _somewhere_, instead of arguing endlessly whch of
the 2 proposed fixes is better.
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Message-id: 1397477644-902-1-git-send-email-mjt@msgid.tls.msk.ru
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Fix return condition check from kvm_vm_ioctl(s, KVM_GET_DIRTY_LOG, &d) to
handle internal failures or no support for memory slot dirty bitmap.
Otherwise the ioctl succeeds and continues with migration.
Addresses BUG# 1294227
Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
* remotes/kvm/uq/master:
target-i386: bugfix of Intel MPX
file_ram_alloc: unify mem-path,mem-prealloc error handling
kvm-all: exit in case max vcpus exceeded
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Targets like ppc64 support different types of KVM, one which use
hypervisor mode and the other which doesn't. Add a new machine
option kvm-type that helps in selecting the respective ones
We also add a new QEMUMachine callback get_vm_type that helps
in mapping the string representation of kvm type specified.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
[agraf: spelling fixes, use error_report(), use qemumachine.h]
Signed-off-by: Alexander Graf <agraf@suse.de>
Rather than fall back to TCG (so the user has to discover
whats happening, in case of no access to qemu stdout/stderr).
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Introduces two simple functions:
int kvm_device_ioctl(int fd, int type, ...);
int kvm_create_device(KVMState *s, uint64_t type, bool test);
These functions wrap the basic ioctl-based interactions with KVM in a
way similar to other KVM ioctl wrappers.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Message-id: 1392687720-26806-4-git-send-email-christoffer.dall@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Introduce kvm_arch_irqchip_create an arch-specific hook in preparation
for architecture-specific use of the device control API to create IRQ
chips.
Following patches will implement the ARM irqchip create method to prefer
the device control API over the older KVM_CREATE_IRQCHIP API.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Message-id: 1392687720-26806-3-git-send-email-christoffer.dall@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Commit 94ccff13 introduced a more verbose failure message and retry
operations on KVM VM creation. However, it ended up using a variable
for its failure message that hasn't been initialized yet.
Fix it to use the value it meant to set.
Cc: qemu-stable@nongnu.org
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* remotes/kvm/uq/master:
target-i386: Move KVM default-vendor hack to instance_init
target-i386: Don't change x86_def_t struct on cpu_x86_register()
target-i386: Eliminate CONFIG_KVM #ifdefs
kvm: add support for hyper-v timers
kvm: make hyperv vapic assist page migratable
kvm: make hyperv hypercall and guest os id MSRs migratable.
kvm: make availability of Hyper-V enlightenments dependent on KVM_CAP_HYPERV
KVM: fix coexistence of KVM and Hyper-V leaves
kvm: print suberror on all internal errors
target-i386: kvm_check_features_against_host(): Kill feature word array
target-i386: kvm_cpu_fill_host(): Fill feature words in a loop
target-i386: kvm_cpu_fill_host(): Set all feature words at end of function
target-i386: kvm_cpu_fill_host(): No need to check xlevel2
target-i386: kvm_cpu_fill_host(): No need to check CPU vendor
target-i386: kvm_cpu_fill_host(): No need to check level
target-i386: kvm_cpu_fill_host(): Kill unused code
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
KVM introduced internal error exit reason and suberror at the same time,
and later extended it with internal error data.
QEMU does not report suberror on hosts between these two events because
we check for the extension. (half a year in 2009, but it is misleading)
Fix by removing KVM_CAP_INTERNAL_ERROR_DATA condition on printf.
(partially improved by bb44e0d12d and ba4047cf84 in the past)
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* qemu-kvm/uq/master:
kvm: always update the MPX model specific register
KVM: fix addr type for KVM_IOEVENTFD
KVM: Retry KVM_CREATE_VM on EINTR
mempath prefault: fix off-by-one error
kvm: x86: Separately write feature control MSR on reset
roms: Flush icache when writing roms to guest memory
target-i386: clear guest TSC on reset
target-i386: do not special case TSC writeback
target-i386: Intel MPX
Conflicts:
exec.c
aliguori: fix trivial merge conflict in exec.c
Signed-off-by: Anthony Liguori <aliguori@amazon.com>
There is a HOST_PAGE_ALIGN macro which makes sense for KVM accelerator
but it uses qemu_host_page_size/qemu_host_page_mask which initialized
for TCG only.
This moves qemu_host_page_size/qemu_host_page_mask initialization from
TCG's page_init() and adds a call for it from kvm_init().
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
The @addr here is a guest physical address and can easily be bigger
than 4G.
This changes uint32_t to hwaddr.
Cc: qemu-stable@nongnu.org
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Upstreaming this change from Android (https://android-review.googlesource.com/54211).
On heavily loaded machines with many VM instances we see KVM_CREATE_VM
failing with EINTR on this path:
kvm_dev_ioctl_create_vm -> kvm_create_vm -> kvm_init_mmu_notifier -> mmu_notifier_register -> do_mmu_notifier_register -> mm_take_all_locks
which checks if any signals have been raised while it was attaining locks
and returns EINTR. Retrying the system call greatly improves reliability.
Cc: qemu-stable@nongnu.org
Signed-off-by: thomas knych <thomaswk@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
We want to have all the functions that handle directly the dirty
bitmap near. We will change it later.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Orit Wasserman <owasserm@redhat.com>
Performance is important in this function, and we want to optimize even further.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Orit Wasserman <owasserm@redhat.com>
KVM reports the number of available memory slots (KVM_CAP_NR_MEMSLOTS)
using the extension interface. Both x86 and s390 implement this, ARM
and powerpc do not yet enable it. Convert the static slots array to
be dynamically allocated, supporting more slots when available.
Default to 32 when KVM_CAP_NR_MEMSLOTS is not implemented. The
motivation for this change is to support more assigned devices, where
memory mapped PCI MMIO BARs typically take one slot each.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
# By Alexey Kardashevskiy (3) and others
# Via Paolo Bonzini
* qemu-kvm/uq/master:
target-i386: add feature kvm_pv_unhalt
linux-headers: update to 3.12-rc1
target-i386: forward CPUID cache leaves when -cpu host is used
linux-headers: update to 3.11
kvm: fix traces to use %x instead of %d
kvmvapic: Clear also physical ROM address when entering INACTIVE state
kvmvapic: Enter inactive state on hardware reset
kvmvapic: Catch invalid ROM size
kvm irqfd: support direct msimessage to irq translation
fix steal time MSR vmsd callback to proper opaque type
kvm: warn if num cpus is greater than num recommended
cpu: Move cpu state syncs up into cpu_dump_state()
exec: always use MADV_DONTFORK
Message-id: 1379694292-1601-1-git-send-email-pbonzini@redhat.com
On PPC64 systems MSI Messages are translated to system IRQ in a PCI
host bridge. This is already supported for emulated MSI/MSIX but
not for irqfd where the current QEMU allocates IRQ numbers from
irqchip and maps MSIMessages to IRQ in the host kernel.
This adds a new direct mapping flag which tells
the kvm_irqchip_add_msi_route() function that a new VIRQ
should not be allocated, instead the value from MSIMessage::data
should be used. It is up to the platform code to make sure that
this contains a valid IRQ number as sPAPR does in spapr_pci.c.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The comment in kvm_max_vcpus() states that it's using the recommended
procedure from the kernel API documentation to get the max number
of vcpus that kvm supports. It is, but by always returning the
maximum number supported. The maximum number should only be used
for development purposes. qemu should check KVM_CAP_NR_VCPUS for
the recommended number of vcpus. This patch adds a warning if a user
specifies a number of cpus between the recommended and max.
Signed-off-by: Andrew Jones <drjones@redhat.com>
Acked-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Make it a generic hook rather than a KVM hook. Less code and
ifdeffery.
Since the only user of the hook is old S390 KVM, there's hope we can
get rid of it some day.
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Acked-by: Laszlo Ersek <lersek@redhat.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Message-id: 1375276272-15988-5-git-send-email-armbru@redhat.com
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
* qemu-kvm/uq/master:
kvm-stub: fix compilation
kvm: shorten the parameter list for get_real_device()
kvm: i386: fix LAPIC TSC deadline timer save/restore
kvm-all.c: max_cpus should not exceed KVM vcpu limit
kvm: Simplify kvm_handle_io
kvm: x86: fix setting IA32_FEATURE_CONTROL with nested VMX disabled
kvm: add KVM_IRQFD_FLAG_RESAMPLE support
kvm: migrate vPMU state
target-i386: remove tabs from target-i386/cpu.h
Initialize IA32_FEATURE_CONTROL MSR in reset and migration
Conflicts:
target-i386/cpu.h
target-i386/kvm.c
aliguori: fixup trivial conflicts due to whitespace and added cpu
argument
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
maxcpus, which specifies the maximum number of hotpluggable CPUs,
should not exceed KVM's vcpu limit.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
[Reword message. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Now that cpu_in/out is just a wrapper around address_space_rw, we can
also call the latter directly. As host endianness == guest endianness,
there is no need for the memory access helpers st*_p/ld*_p as well.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Added an EventNotifier* parameter to
kvm-all.c:kvm_irqchip_add_irqfd_notifier(), in order to give KVM
another eventfd to be used as "resamplefd". See the documentation
in the linux kernel sources in Documentation/virtual/kvm/api.txt
(section 4.75) for more details.
When the added parameter is passed NULL, the behaviour of the
function is unchanged with respect to the previous versions.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Vincenzo Maffione <v.maffione@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Passing a CPUState pointer instead of a CPUArchState pointer eliminates
the last target dependent data type in sysemu/kvm.h.
It also simplifies the code.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Andreas Färber <afaerber@suse.de>
CPUArchState is no longer directly used since converting CPU loops to
CPUState.
Prepares for changing GDBState::c_cpu to CPUState.
Signed-off-by: Andreas Färber <afaerber@suse.de>
Prepares for changing cpu_single_step() argument to CPUState.
Acked-by: Michael Walle <michael@walle.cc> (for lm32)
Signed-off-by: Andreas Färber <afaerber@suse.de>
Move next_cpu from CPU_COMMON to CPUState.
Move first_cpu variable to qom/cpu.h.
gdbstub needs to use CPUState::env_ptr for now.
cpu_copy() no longer needs to save and restore cpu_next.
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
[AF: Rebased, simplified cpu_copy()]
Signed-off-by: Andreas Färber <afaerber@suse.de>
Since CPU loops are done as last step in kvm_{insert,remove}_breakpoint()
and kvm_remove_all_breakpoints(), we do not need to distinguish between
invoking CPU and iterated CPUs and can thereby free the identifier for
use as a global variable.
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Andreas Färber <afaerber@suse.de>
Multiple -machine options with the same ID are merged. All but the
one without an ID are to be silently ignored.
In most places, we query these options with a null ID. This is
correct.
In some places, we instead query whatever options come first in the
list. This is wrong. When the -machine processed first happens to
have an ID, options are taken from that ID, and the ones specified
without ID are silently ignored.
Example:
$ upstream-qemu -nodefaults -S -display none -monitor stdio -machine id=foo -machine accel=kvm,usb=on
$ upstream-qemu -nodefaults -S -display none -monitor stdio -machine id=foo,accel=kvm,usb=on -machine accel=xen
$ upstream-qemu -nodefaults -S -display none -monitor stdio -machine accel=xen -machine id=foo,accel=kvm,usb=on
$ qemu-system-x86_64 -nodefaults -S -display none -monitor stdio -machine accel=kvm,usb=on
QEMU 1.5.50 monitor - type 'help' for more information
(qemu) info kvm
kvm support: enabled
(qemu) info usb
(qemu) q
$ qemu-system-x86_64 -nodefaults -S -display none -monitor stdio -machine id=foo -machine accel=kvm,usb=on
QEMU 1.5.50 monitor - type 'help' for more information
(qemu) info kvm
kvm support: disabled
(qemu) info usb
(qemu) q
$ qemu-system-x86_64 -nodefaults -S -display none -monitor stdio -machine id=foo,accel=kvm,usb=on -machine accel=xen
QEMU 1.5.50 monitor - type 'help' for more information
(qemu) info kvm
kvm support: enabled
(qemu) info usb
USB support not enabled
(qemu) q
$ qemu-system-x86_64 -nodefaults -S -display none -monitor stdio -machine accel=xen -machine id=foo,accel=kvm,usb=on
xc: error: Could not obtain handle on privileged command interface (2 = No such file or directory): Internal error
xen be core: can't open xen interface
failed to initialize Xen: Operation not permitted
Option usb is queried correctly, and the one without an ID wins,
regardless of option order.
Option accel is queried incorrectly, and which one wins depends on
option order and ID.
Affected options are accel (and its sugared forms -enable-kvm and
-no-kvm), kernel_irqchip, kvm_shadow_mem.
Additionally, option kernel_irqchip is normally on by default, except
it's off when no -machine options are given. Bug can't bite, because
kernel_irqchip is used only when KVM is enabled, KVM is off by
default, and enabling always creates -machine options. Downstreams
that enable KVM by default do get bitten, though.
Use qemu_get_machine_opts() to fix these bugs.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-id: 1372943363-24081-5-git-send-email-armbru@redhat.com
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
# By Paolo Bonzini (50) and others
# Via Paolo Bonzini
* bonzini/iommu-for-anthony: (66 commits)
exec: change some APIs to take AddressSpaceDispatch
exec: remove cur_map
exec: put memory map in AddressSpaceDispatch
exec: separate current radix tree from the one being built
exec: move listener from AddressSpaceDispatch to AddressSpace
memory: move MemoryListener declaration earlier
exec: separate current memory map from the one being built
exec: change well-known physical sections to macros
qom: Use atomics for object refcounting
memory: add reference counting to FlatView
memory: use a new FlatView pointer on every topology update
memory: access FlatView from a local variable
add a header file for atomic operations
hw/[u-x]*: pass owner to memory_region_init* functions
hw/t*: pass owner to memory_region_init* functions
hw/s*: pass owner to memory_region_init* functions
hw/p*: pass owner to memory_region_init* functions
hw/n*: pass owner to memory_region_init* functions
hw/m*: pass owner to memory_region_init* functions
hw/i*: pass owner to memory_region_init* functions
...
Message-id: 1372950842-32422-1-git-send-email-pbonzini@redhat.com
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Add ref/unref calls at the following places:
- places where memory regions are stashed by a listener and
used outside the BQL (including in Xen or KVM).
- memory_region_find callsites
- creation of aliases and containers (only the aliased/contained
region gets a reference to avoid loops)
- around calls to del_subregion/add_subregion, where the region
could disappear after the first call
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Some guests do a large number of mask/unmask
calls which currently trigger expensive route update
system calls.
Detect that route in unchanged and skip the system call.
Reported-by: "Zhanghaoyu (A)" <haoyu.zhang@huawei.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
kvm_add_routing_entry makes an attempt to
zero-initialize any new routing entry.
However, it fails to initialize padding
within the u field of the structure
kvm_irq_routing_entry.
Other functions like kvm_irqchip_update_msi_route
also fail to initialize the padding field in
kvm_irq_routing_entry.
It's better to just make sure all input is initialized.
Once it is, we can also drop complex field by field assignment and just
do the simple *a = *b to update a route entry.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
I try to hotplug 28 * 8 multiple-function devices to guest with
old host kernel, ioeventfds in host kernel will be exhausted, then
qemu fails to allocate ioeventfds for blk/nic devices.
It's better to add detail error here.
Signed-off-by: Amos Kong <akong@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
The current logic updates KVM's view of our interrupt map every time we
change it. While this is nice and bullet proof, it slows things down
badly for me. QEMU spends about 3 seconds on every start telling KVM what
news it has on its routing maps.
Instead, let's just synchronize the whole irq routing map as a whole when
we're done constructing it. For things that change during runtime, we can
still update the routing table on demand.
Signed-off-by: Alexander Graf <agraf@suse.de>
The usual MSI injection mechanism writes msi.data into memory using an
le32 wrapper. So on big endian guests, this swaps msg.data into the
expected byte order.
For irqfd however, we don't swap the payload right now, rendering
in-kernel MPIC emulation broken on PowerPC.
Swap msg.data to the correct endianness whenever we touch it.
Signed-off-by: Alexander Graf <agraf@suse.de>