On PPC, we can have different types of interrupt controllers, so we really
only know that we are going to use one once we have created it.
Export kvm_init_irq_routing() to common code, so that we don't have to call
kvm_irqchip_create().
Signed-off-by: Alexander Graf <agraf@suse.de>
On PPC, we don't support MP state. So far it's not necessary and I'm
not convinced yet that we really need to support it ever.
However, the current idle logic in QEMU assumes that an in-kernel PIC
also means we support MP state. This assumption is not true anymore.
Let's split up the two cases into two different variables. That way
PPC can expose an in-kernel PIC, while not implementing MP state.
Signed-off-by: Alexander Graf <agraf@suse.de>
CC: Jan Kiszka <jan.kiszka@siemens.com>
It no longer uses CPUArchState.
Prepares for changing qemu_kvm_cpu_thread_fn() opaque to CPUState.
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Andreas Färber <afaerber@suse.de>
It no longer uses CPUArchState.
Prepares for changing kvm_cpu_exec() argument to CPUState.
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Andreas Färber <afaerber@suse.de>
Make cpustats monitor command available unconditionally.
Prepares for changing kvm_handle_internal_error() and kvm_cpu_exec()
arguments to CPUState.
Signed-off-by: Andreas Färber <afaerber@suse.de>
CPUArchState is no longer needed.
Prepares for changing qemu_kvm_init_cpu_signals() argument to CPUState.
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Andreas Färber <afaerber@suse.de>
It no longer relies on CPUArchState since 20d695a.
Reviewed-by: liguang <lig.fnst@cn.fujitsu.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Andreas Färber <afaerber@suse.de>
So far, the size of all regions passed to listeners could fit in 64 bits,
because artificial regions (containers and aliases) are eliminated by
the memory core, leaving only device regions which have reasonable sizes.
An IOMMU however cannot be eliminated by the memory core, and may have
an artificial size, hence we may need 65 bits to represent its size.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Luiz Capitulino reported that a guest refused to boot and qemu
complained with:
kvm_set_phys_mem: error unregistering overlapping slot: Invalid argument
This is caused by commit 235e8982ad, which freed the memslot twice,
so that the second attempt raises the -EINVAL error.
Fix it by resetting the memory size only if needed.
Reported-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
For readonly memory regions and rom devices in romd_mode,
we make use of the KVM_MEM_READONLY. A slot that uses
KVM_MEM_READONLY can be read from and code can execute from the
region, but writes will exit to qemu.
For rom devices with !romd_mode, we force the slot to be
removed so reads or writes to the region will exit to qemu.
(Note that a memory region in this state is not executable
within kvm.)
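A minimal sketch of the slot decision described above. The struct, the
function names and the DEMO_* constants are hypothetical stand-ins for
illustration; real code would use the flag values from <linux/kvm.h>.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-ins; real code takes these from <linux/kvm.h>. */
#define DEMO_MEM_LOG_DIRTY_PAGES (1u << 0)
#define DEMO_MEM_READONLY        (1u << 1)

/* Hypothetical, simplified view of a memory region. */
struct demo_region {
    bool readonly;    /* plain read-only memory region */
    bool rom_device;  /* device that can behave like ROM */
    bool romd_mode;   /* rom device is currently readable as memory */
    bool log_dirty;   /* dirty logging requested */
};

/* A rom device outside romd_mode gets no slot, so that both reads and
 * writes exit to userspace. */
bool demo_region_needs_slot(const struct demo_region *r)
{
    return !(r->rom_device && !r->romd_mode);
}

/* Flags for regions that do get a slot. */
uint32_t demo_slot_flags(const struct demo_region *r)
{
    uint32_t flags = 0;

    if (r->log_dirty) {
        flags |= DEMO_MEM_LOG_DIRTY_PAGES;
    }
    if (r->readonly || (r->rom_device && r->romd_mode)) {
        /* Reads and execution stay in the guest, writes exit. */
        flags |= DEMO_MEM_READONLY;
    }
    return flags;
}
```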
v7:
* Update for readable => romd_mode rename (5f9a5ea1)
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> (v4)
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> (v5)
Message-id: 1369816047-16384-4-git-send-email-jordan.l.justen@intel.com
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
This is preparatory to the introduction of a separate freeing API.
Reported-by: Amos Kong <akong@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Amos Kong <akong@redhat.com>
Message-id: 1368454796-14989-2-git-send-email-pbonzini@redhat.com
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
This patch lets us see the exit reason of KVM_RUN. It will help us
find out where a problem is caused.
Signed-off-by: Kazuya Saito <saito.kazuya@jp.fujitsu.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
This patch adds tracepoints at the ioctls issued to kvm. Tracing these
ioctls helps clarify whether the cause of a problem lies in qemu or in kvm.
Signed-off-by: Kazuya Saito <saito.kazuya@jp.fujitsu.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
If hotplugged, synchronize CPU state to KVM.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Andreas Färber <afaerber@suse.de>
This adds a new device that we can use for testing PCI PIO and MMIO, with and
without ioeventfd in different configurations. FAST_MMIO will be added if/when
kvm supports it. Also included are minor cleanups in kvm APIs that it needs.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Merge remote-tracking branch 'mst/tags/for_anthony' into staging
pci: add pci test device
# gpg: Signature made Mon 15 Apr 2013 05:42:24 PM CDT using RSA key ID D28D5469
# gpg: Can't check signature: public key not found
# By Michael S. Tsirkin
# Via Michael S. Tsirkin
* mst/tags/for_anthony:
pci: add pci test device
kvm: support non datamatch ioeventfd
kvm: support any size for pio eventfd
kvm: remove unused APIs
Message-id: cover.1366272004.git.mst@redhat.com
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
... so it could be called without requiring CPUArchState.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Andreas Färber <afaerber@suse.de>
Move it to qom/cpu.h to avoid issues with include order.
Change pc_acpi_smi_interrupt() opaque to X86CPU.
Signed-off-by: Andreas Färber <afaerber@suse.de>
Both fields are used in VMState, thus need to be moved together.
Explicitly zero them on reset since they were located before
breakpoints.
Pass PowerPCCPU to kvmppc_handle_halt().
Signed-off-by: Andreas Färber <afaerber@suse.de>
Since commit 20d695a925 (kvm: Pass
CPUState to kvm_arch_*) CPUArchState is no longer needed.
This also allows changing the qemu_kvm_eat_signals() argument.
Signed-off-by: Andreas Färber <afaerber@suse.de>
Reviewed-by: Gleb Natapov <gleb@redhat.com>
This will allow each architecture to define how the VCPU ID is set on
the KVM_CREATE_VCPU ioctl call.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Acked-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Andreas Färber <afaerber@suse.de>
ppc64 build needs this stub to build with virtio enabled.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Andreas Färber <afaerber@suse.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Note that target-alpha accesses this field from TCG, now using a
negative offset. Therefore the field is placed last in CPUState.
Pass PowerPCCPU to [kvm]ppc_fixup_cpu() to facilitate this change.
Move common parts of mips cpu_state_reset() to mips_cpu_reset().
Acked-by: Richard Henderson <rth@twiddle.net> (for alpha)
[AF: Rebased onto ppc CPU subclasses and openpic changes]
Signed-off-by: Andreas Färber <afaerber@suse.de>
QEMU allocates a dirty map large enough for 4K pages. However, the system
page size can be 64K (for example on POWER), in which case the host kernel
uses only a small part of the map, as one bit stores the dirty flag for
16 pages of 4K each. The hpratio variable stores this ratio, and
the kvm_get_dirty_pages_log_range function handles it correctly.
However, kvm_get_dirty_pages_log_range still goes beyond the data
provided by the host kernel, which is not correct. This does not cause
errors at the moment, as the whole bitmap is zeroed before doing the KVM
ioctl.
The patch reduces the number of iterations over the map.
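As an illustration only, the reduced iteration range could be computed
as below. This is a sketch and not the code from the patch: the function
names are hypothetical, and hpratio is assumed to be the host page size
divided by the 4K target page size.

```c
#include <limits.h>
#include <stddef.h>

#define DEMO_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Number of bitmap words the host kernel actually fills: one bit per
 * host page, where each host page covers hpratio target pages. */
size_t demo_dirty_words(size_t target_pages, size_t hpratio)
{
    size_t host_pages = (target_pages + hpratio - 1) / hpratio;

    return (host_pages + DEMO_BITS_PER_LONG - 1) / DEMO_BITS_PER_LONG;
}

/* Count target pages covered by dirty host pages, looking only at the
 * words the host provided. Each set bit stands for hpratio target pages. */
size_t demo_count_dirty_target_pages(const unsigned long *bitmap,
                                     size_t target_pages, size_t hpratio)
{
    size_t words = demo_dirty_words(target_pages, hpratio);
    size_t dirty = 0;

    for (size_t i = 0; i < words; i++) {
        unsigned long w = bitmap[i];

        while (w) {
            w &= w - 1;       /* clear the lowest set bit */
            dirty += hpratio;
        }
    }
    return dirty;
}
```

With 4096 target pages and hpratio 16, only 256 bits are meaningful, so
words past that point are never read.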
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
So far we only removed them from the guest, leaving their state in the
list. This made it impossible for gdb to re-enable breakpoints on the
same address after re-attaching.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Change return type to bool, move to include/qemu/cpu.h and
add documentation.
Signed-off-by: Andreas Färber <afaerber@suse.de>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
[AF: Updated new caller qemu_in_vcpu_thread()]
target_phys_addr_t is unwieldy, violates the C standard (_t suffixes are
reserved) and its purpose doesn't match the name (most target_phys_addr_t
addresses are not target specific). Replace it with a finger-friendly,
standards conformant hwaddr.
Outstanding patchsets can be fixed up with the command
git rebase -i --exec 'find -name "*.[ch]"
| xargs sed -i s/target_phys_addr_t/hwaddr/g' origin
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Using the AddressSpace type reduces confusion, as you can't accidentally
supply the MemoryRegion you're interested in.
Reviewed-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Instead of calling a global function on coalesced mmio changes, which
routes the call to kvm if enabled, add coalesced mmio hooks to
MemoryListener and make kvm use that instead.
The motivation is support for multiple address spaces (which means we
need to filter the call on the right address space) but the result
is cleaner as well.
Signed-off-by: Avi Kivity <avi@redhat.com>
The construct
if (address_space == get_system_memory()) {
// memory thing
} else {
// io thing
}
fails if we have more than two address spaces. Use a separate listener
for memory and I/O, and utilize MemoryListener's address space filtering to
fix this.
Signed-off-by: Avi Kivity <avi@redhat.com>
* stefanha/trivial-patches:
configure: fix seccomp check
arch_init.c: add missing '%' symbols before PRIu64 in debug printfs
kvm: Fix warning from static code analysis
qapi: Fix enumeration typo error
console: Clean up bytes per pixel calculation
Fix copy&paste typos in documentation comments
linux-user: Remove #if 0'd cpu_get_real_ticks() definition
ui: Fix spelling in comment (ressource -> resource)
Spelling fixes in comments and macro names (ressource -> resource)
Fix spelling (licenced -> licensed) in GPL
Spelling fixes in comments and documentation
srp: Don't use QEMU_PACKED for single elements of a structured type
Report from smatch:
kvm-all.c:1373 kvm_init(135) warn:
variable dereferenced before check 's' (see line 1360)
's' cannot be NULL (it was allocated using g_malloc0), so there is no need
to check it here.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Stefan Hajnoczi <stefanha@gmail.com>
This variable is no longer bound to irqchip, and the IOCTL sets the IRQ
level rather than directly injecting it. No functional changes.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
The memory subsystem will now take care of flushing whenever affected
regions are accessed or the memory mapping changes.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Move the init of the irqchip_inject_ioctl field of KVMState out of
kvm_irqchip_create() and into kvm_init(), so that kvm_set_irq()
can be used even when no irqchip is created (for architectures
that support async interrupt notification even without an in
kernel irqchip).
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Will be used by PCI device assignment code.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
This service allows updating an MSI route without releasing/reacquiring
the associated VIRQ. Will be used by PCI device assignment, later on
likely also by virtio/vhost and VFIO.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
No need to expose the fd-based interface, everyone will already be fine
with the more handy EventNotifier variant. Rename the latter to clarify
that we are still talking about irqfds here.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
valgrind with kvm produces a large number of false positives regarding
"Conditional jump or move depends on uninitialised value(s)". This
happens because the guest memory is allocated with qemu_vmalloc, which
boils down to posix_memalign etc. This function is (correctly) considered
by valgrind as returning undefined memory.
Since valgrind is based on jitting code, it will not be able to see
changes made by the guest to guest memory if this is done by KVM_RUN,
thus keeping most of the guest memory undefined.
Many places in qemu then use guest memory to decide how to behave.
To avoid the flood of these messages, let's declare the whole guest
memory as defined. This will reduce the noise and allow us to see real
problems.
In the future we might want to make this conditional, since there
is actually something that we can use those false positives for:
These messages will point to code that depends on guest memory, so
we can use these backtraces to actually make an audit that is focused
only on those code places. For normal development we don't want to
see those messages, though.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Don't assume having an in-kernel irqchip means that GSI
routing is enabled.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Avi Kivity <avi@redhat.com>
Decouple another x86-specific assumption about what irqchips imply.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Avi Kivity <avi@redhat.com>
Instead of assuming that we can use irqfds if and only if
kvm_irqchip_in_kernel(), add a bool to the KVMState which
indicates this, and is set only on x86 and only if the
irqchip is in the kernel.
The kernel documentation implies that the only thing
you need to use KVM_IRQFD is that KVM_CAP_IRQFD is
advertised, but this seems to be untrue. In particular
the kernel does not (alas) return a sensible error if you
try to set up an irqfd when you haven't created an irqchip.
If it did we could remove all this nonsense and let the
kernel return the error code.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Acked-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
kvm_allows_irq0_override() is a totally x86 specific concept:
move it to the target-specific source file where it belongs.
This means we need a new header file for the prototype:
kvm_i386.h, in line with the existing kvm_ppc.h.
While we are moving it, fix the return type to be 'bool' rather
than 'int'.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Avi Kivity <avi@redhat.com>
Rename the function kvm_irqchip_set_irq() to kvm_set_irq(),
since it can be used for sending (asynchronous) interrupts whether
there is a full irqchip model in the kernel or not. (We don't
include 'async' in the function name since asynchronous is the
normal case.)
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Avi Kivity <avi@redhat.com>
On x86 userspace delivers interrupts to the kernel asynchronously
(and therefore VCPU idle management is done in the kernel) if and
only if there is an in-kernel irqchip. On other architectures this
isn't necessarily true (they may always send interrupts
asynchronously), so define a new kvm_async_interrupts_enabled()
function instead of misusing kvm_irqchip_in_kernel().
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Avi Kivity <avi@redhat.com>
Add a helper function for fetching max cpus supported by kvm.
Make QEMU exit with an error message if smp_cpus exceeds limit
of VCPU count retrieved by invoking this helper function.
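A sketch of what such a helper might look like. The capability names,
the probe callback and the fallback value of 4 are assumptions made for
this illustration; real code would query the kernel through the KVM
capability interface.

```c
#include <stdbool.h>

/* Hypothetical capability identifiers for this sketch. */
enum demo_cap { DEMO_CAP_MAX_VCPUS, DEMO_CAP_NR_VCPUS };

/* Hypothetical probe: returns 0 when the capability is not reported. */
typedef int (*demo_check_cap_fn)(enum demo_cap cap);

/* Prefer the hard maximum, then the recommended count, then fall back
 * to an assumed conservative default. */
int demo_max_vcpus(demo_check_cap_fn check)
{
    int ret = check(DEMO_CAP_MAX_VCPUS);

    if (ret > 0) {
        return ret;
    }
    ret = check(DEMO_CAP_NR_VCPUS);
    if (ret > 0) {
        return ret;
    }
    return 4;
}

/* The caller would print an error and exit when this returns false. */
bool demo_smp_cpus_ok(int smp_cpus, demo_check_cap_fn check)
{
    return smp_cpus <= demo_max_vcpus(check);
}

/* Example probes used to exercise the helper. */
int demo_probe_modern(enum demo_cap cap)
{
    return cap == DEMO_CAP_MAX_VCPUS ? 255 : 64;
}

int demo_probe_old(enum demo_cap cap)
{
    (void)cap;
    return 0;
}
```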
Signed-off-by: Dunrong Huang <riegamaths@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
* qemu-kvm/uq/master:
virtio: move common irqfd handling out of virtio-pci
virtio: move common ioeventfd handling out of virtio-pci
event_notifier: add event_notifier_set_handler
memory: pass EventNotifier, not eventfd
ivshmem: wrap ivshmem_del_eventfd loops with transaction
ivshmem: use EventNotifier and memory API
event_notifier: add event_notifier_init_fd
event_notifier: remove event_notifier_test
event_notifier: add event_notifier_set
apic: Defer interrupt updates to VCPU thread
apic: Reevaluate pending interrupts on LVT_LINT0 changes
apic: Resolve potential endless loop around apic_update_irq
kvm: expose tsc deadline timer feature to guest
kvm_pv_eoi: add flag support
kvm: Don't abort on kvm_irqchip_add_msi_route()
All transports can use the same event handler for the irqfd, though the
exact mechanics of the assignment will be specific. Note that there
are three states: handled by the kernel, handled in userspace, disabled.
This also lets virtio use event_notifier_set_handler.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Under Win32, EventNotifiers will not have event_notifier_get_fd, so we
cannot call it in common code such as hw/virtio-pci.c. Pass a pointer to
the notifier, and only retrieve the file descriptor in kvm-specific code.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
By default qemu will use MAP_PRIVATE for guest pages. This will write
protect pages and thus break on s390 systems that don't support this feature.
Therefore qemu has a hack to always use MAP_SHARED for s390. But MAP_SHARED
has other problems (no dirty pages tracking, a lot more swap overhead etc.)
Newer systems allow the distinction via KVM_CAP_S390_COW. With this feature
qemu can use the standard qemu alloc if available, otherwise it will use
the old s390 hack.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Acked-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Anyone using these functions has to be prepared that irqchip
support may not be present. It shouldn't be up to the core
code to determine whether this is a fatal error. Currently
code written as:
virq = kvm_irqchip_add_msi_route(...)
if (virq < 0) {
<slow path>
} else {
<fast path>
}
works on x86 with and without kvm irqchip enabled, works
without kvm support compiled in, but aborts() on !x86 with
kvm support.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
These are included via monitor.h right now, add them explicitly.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
A type definition and a KVMState field initialization escaped the
required wrapping with KVM_CAP_IRQ_ROUTING. Also, we need to provide a
dummy kvm_irqchip_release_virq as virtio-pci references (but does not
use) it.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Acked-by: Ben Collins <bcollins@ubuntu.com>
Tested-by: Andreas Färber <afaerber@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Add services to associate an eventfd file descriptor as input with an
IRQ line as output. Such a line can be an input pin of an in-kernel
irqchip or a virtual line returned by kvm_irqchip_add_route.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Automatically commit route changes after kvm_add_routing_entry and
kvm_irqchip_release_virq. There is no performance relevant use case for
which collecting multiple route changes is beneficial. This makes
kvm_irqchip_commit_routes an internal service which assert()s that the
corresponding IOCTL will always succeed.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
This allows dropping routes created by kvm_irqchip_add_irq/msi_route
again.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Add a service that establishes a static route from a virtual IRQ line to
an MSI message. Will be used for IRQFD and device assignment. As we will
use this service outside of CONFIG_KVM protected code, stub it properly.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
We will add kvm_irqchip_add_msi_route, so let's make the difference
clearer.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
As MSI is now fully supported by KVM (with respect to the features
available upstream), we can finally enable the in-kernel irqchip by default.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
If the kernel supports KVM_SIGNAL_MSI, we can avoid the route-based
MSI injection mechanism.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
This patch basically adds kvm_irqchip_send_msi, a service for sending
arbitrary MSI messages to KVM's in-kernel irqchip models.
As the original KVM API requires us to establish a static route from a
pseudo GSI to the target MSI message and inject the MSI via toggling
that virtual IRQ, we need to play some tricks to make this interface
transparent. We create those routes on demand and keep them in a hash
table. Succeeding messages can then search for an existing route in the
table first and reuse it whenever possible. If we should run out of
limited GSIs, we simply flush the table and rebuild it as messages are
sent.
This approach is rather simple and could be optimized further. However,
the latest kernels contain a more efficient MSI injection interface that
will make the GSI-based dynamic injection obsolete.
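The on-demand route handling described above can be sketched roughly as
follows. This is an illustration, not the patch's code: it swaps the hash
table for a small linear array to stay short, and all names and the tiny
GSI limit are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define DEMO_MAX_ROUTES 4   /* deliberately tiny stand-in for the GSI limit */

struct demo_msi {
    uint64_t address;
    uint32_t data;
};

struct demo_route_cache {
    struct demo_msi msg[DEMO_MAX_ROUTES];
    int used;           /* entries [0, used) are valid; the index is the GSI */
    unsigned flushes;   /* how often the table had to be rebuilt */
};

void demo_cache_init(struct demo_route_cache *c)
{
    memset(c, 0, sizeof(*c));
}

/* Return the pseudo GSI routed to this message, creating a route on
 * demand. When no GSI is left, flush everything and start over. */
int demo_route_for_msi(struct demo_route_cache *c, struct demo_msi m)
{
    for (int i = 0; i < c->used; i++) {
        if (c->msg[i].address == m.address && c->msg[i].data == m.data) {
            return i;   /* reuse the existing route */
        }
    }
    if (c->used == DEMO_MAX_ROUTES) {
        c->used = 0;    /* out of GSIs: drop all routes */
        c->flushes++;
    }
    c->msg[c->used] = m;
    return c->used++;
}
```

A caller would then inject the message by toggling the returned virtual
IRQ line.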
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Instead of the bitmap size, store the maximum number of GSIs the kernel
supports. Move the GSI limit assertion to the API function
kvm_irqchip_add_route and make it stricter.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
If the kernel page size is larger than TARGET_PAGE_SIZE, which
happens for example on ppc64 with kernels compiled for 64K pages,
the dirty tracking doesn't work.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Avi Kivity <avi@redhat.com>
The current kvm_init_irq_routing() doesn't set up the used_gsi_bitmap
correctly, and as a consequence pins max_gsi to 32 when it really
should be 1024. I ran into this limitation while testing pci
passthrough, where I consistently got an -ENOSPC return from
kvm_get_irq_route_gsi() called from assigned_dev_update_msix_mmio().
Signed-off-by: Jason Baron <jbaron@redhat.com>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
We use a 2-byte ioeventfd for virtio memory;
add support for this.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Amos Kong <akong@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
In kvm-all.c we store an ioctl cmd number in the irqchip_inject_ioctl field
of KVMState, which has type 'int'. This seems to make sense since the
ioctl() man page says that the cmd parameter has type int.
However, the kernel treats ioctl numbers as unsigned - sys_ioctl() takes an
unsigned int, and the macros which generate ioctl numbers expand to
unsigned expressions. Furthermore, some ioctls (IOC_READ ioctls on x86
and IOC_WRITE ioctls on powerpc) have bit 31 set, and so would be negative
if interpreted as an int. This has the surprising and compile-breaking
consequence that in kvm_irqchip_set_irq() where we do:
return (s->irqchip_inject_ioctl == KVM_IRQ_LINE) ? 1 : event.status;
We will get a "comparison is always false due to limited range of data
type" warning from gcc if KVM_IRQ_LINE is one of the bit-31-set ioctls,
which it is on powerpc.
So, despite the fact that the man page and posix say ioctl numbers are
signed, they're actually unsigned. The kernel uses unsigned, the glibc
header uses unsigned long, and FreeBSD, NetBSD and OSX also use unsigned
long ioctl numbers in the code.
Therefore, this patch changes the variable to be unsigned, fixing the
compile.
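A small sketch of the pitfall, under the assumption of a 32-bit int and
a compiler that wraps out-of-range conversions to int (as gcc and clang
do). The function names and the request value used to exercise them are
hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Store a request number in an int, then compare it against the wide
 * unsigned constant. A value with bit 31 set becomes negative and is
 * sign-extended on conversion, so the comparison fails. */
bool demo_match_when_stored_as_int(uint64_t request)
{
    int stored = (int)request;   /* implementation-defined if out of range */

    return (uint64_t)stored == request;
}

/* The same comparison with unsigned storage is zero-extended and
 * therefore matches for any value that fits in 32 bits. */
bool demo_match_when_stored_as_unsigned(uint64_t request)
{
    unsigned int stored = (unsigned int)request;

    return (uint64_t)stored == request;
}
```

For a made-up request number such as 0x8008ae61, the int variant reports
a mismatch while the unsigned variant matches.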
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
Scripted conversion:
for file in *.[hc] hw/*.[hc] hw/kvm/*.[hc] linux-user/*.[hc] linux-user/m68k/*.[hc] bsd-user/*.[hc] darwin-user/*.[hc] tcg/*/*.[hc] target-*/cpu.h; do
sed -i "s/CPUState/CPUArchState/g" $file
done
All occurrences of CPUArchState are expected to be replaced by QOM CPUState,
once all targets are QOM'ified and common fields have been extracted.
Signed-off-by: Andreas Färber <afaerber@suse.de>
Reviewed-by: Anthony Liguori <aliguori@us.ibm.com>
* stefanha/trivial-patches:
configure: Quote the configure args printed in config.log
osdep: Remove local definition of macro offsetof
libcacard: Spelling and grammar fixes in documentation
Spelling fixes in comments (it's -> its)
vnc: Add break statement
libcacard: Use format specifier %u instead of %d for unsigned values
Fix sign of sscanf format specifiers
block/vmdk: Fix warning from splint (comparision of unsigned value)
qmp: Fix spelling fourty -> forty
qom: Fix spelling in documentation
sh7750: Remove redundant 'struct' from MemoryRegionOps
* it's -> its (fixed for all files)
* dont -> don't (only fixed in a line which was touched by the previous fix)
* distrub -> disturb (fixed in the same line)
Reviewed-by: Andreas Färber <afaerber@suse.de>
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
valgrind warns about padding fields which are passed
to vcpu ioctls uninitialized.
This is not an error in practice because kvm ignores padding.
Since the ioctls in question are off data path and
the cost is zero anyway, initialize padding to 0
to suppress these errors.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
* qemu-kvm/memory/core: (30 commits)
memory: allow phys_map tree paths to terminate early
memory: unify PhysPageEntry::node and ::leaf
memory: change phys_page_set() to set multiple pages
memory: switch phys_page_set() to a recursive implementation
memory: replace phys_page_find_alloc() with phys_page_set()
memory: simplify multipage/subpage registration
memory: give phys_page_find() its own tree search loop
memory: make phys_page_find() return a MemoryRegionSection
memory: move tlb flush to MemoryListener commit callback
memory: unify the two branches of cpu_register_physical_memory_log()
memory: fix RAM subpages in newly initialized pages
memory: compress phys_map node pointers to 16 bits
memory: store MemoryRegionSection pointers in phys_map
memory: unify phys_map last level with intermediate levels
memory: remove first level of l1_phys_map
memory: change memory registration to rebuild the memory map on each change
memory: support stateless memory listeners
memory: split memory listener for the two address spaces
xen: ignore I/O memory regions
memory: allow MemoryListeners to observe a specific address space
...
kvm_set_phys_mem() may be passed sections that are not aligned to a page
boundary. The current code simply brute-forces the alignment which leads
to an inconsistency and an abort().
Fix by aligning the start and the end of the section correctly, discarding
any unaligned head or tail.
This was triggered by a guest sizing a 64-bit BAR that is smaller than a page
with PCI_COMMAND_MEMORY enabled and the upper dword clear.
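The alignment rule can be sketched as below. This is an illustration
rather than the code of the fix; the function name and the fixed 4K page
size are assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE UINT64_C(4096)
#define DEMO_PAGE_MASK (~(DEMO_PAGE_SIZE - 1))

/* Round the start up and the end down to a page boundary. Returns false
 * when no whole page remains, in which case nothing is registered.
 * Assumes start + size does not overflow. */
bool demo_align_section(uint64_t start, uint64_t size,
                        uint64_t *out_start, uint64_t *out_size)
{
    uint64_t end = start + size;
    uint64_t aligned_start = (start + DEMO_PAGE_SIZE - 1) & DEMO_PAGE_MASK;
    uint64_t aligned_end = end & DEMO_PAGE_MASK;

    if (aligned_end <= aligned_start) {
        return false;   /* sub-page section: head and tail consume it all */
    }
    *out_start = aligned_start;
    *out_size = aligned_end - aligned_start;
    return true;
}
```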
Signed-off-by: Avi Kivity <avi@redhat.com>
Current memory listeners are incremental; that is, they are expected to
maintain their own state, and receive callbacks for changes to that state.
This patch adds support for stateless listeners; these work by receiving
a ->begin() callback (which tells them that new state is coming), a
sequence of ->region_add() and ->region_nop() callbacks, and then a
->commit() callback which signifies the end of the new state. They should
ignore ->region_del() callbacks.
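A rough sketch of a stateless listener following this callback sequence.
The struct and function names are hypothetical, and the listener merely
counts regions instead of recording them.

```c
#include <stddef.h>

/* Hypothetical stateless listener: it rebuilds its view on every update. */
struct demo_listener {
    size_t pending;     /* regions seen since the last begin() */
    size_t committed;   /* regions in the last complete view */
};

void demo_begin(struct demo_listener *l)      { l->pending = 0; }
void demo_region_add(struct demo_listener *l) { l->pending++; }
void demo_region_nop(struct demo_listener *l) { l->pending++; }
void demo_region_del(struct demo_listener *l) { (void)l; /* ignored */ }
void demo_commit(struct demo_listener *l)     { l->committed = l->pending; }

/* Drive one update: removed regions are reported via region_del(), new
 * ones via region_add(), unchanged ones via region_nop(). */
void demo_update(struct demo_listener *l, size_t added, size_t unchanged,
                 size_t removed)
{
    demo_begin(l);
    for (size_t i = 0; i < removed; i++) {
        demo_region_del(l);
    }
    for (size_t i = 0; i < added; i++) {
        demo_region_add(l);
    }
    for (size_t i = 0; i < unchanged; i++) {
        demo_region_nop(l);
    }
    demo_commit(l);
}
```

Because the view is rebuilt from the add and nop callbacks alone, the
listener ends up with the right state without tracking deletions.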
Signed-off-by: Avi Kivity <avi@redhat.com>
This allows reverse iteration, which in turn allows consistent ordering
among multiple listeners:
l1->add
l2->add
l2->del
l1->del
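The ordering shown above can be reproduced with a small sketch that
notifies listeners front to back on add and back to front on del. The
log structure and function names are hypothetical and exist only to make
the call order visible.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical event log so the call order can be inspected. */
struct demo_log {
    char buf[128];
};

static void demo_record(struct demo_log *log, const char *name,
                        const char *event)
{
    size_t used = strlen(log->buf);

    snprintf(log->buf + used, sizeof(log->buf) - used, "%s->%s ",
             name, event);
}

/* Additions are delivered in registration order. */
void demo_notify_add(const char *const *listeners, size_t n,
                     struct demo_log *log)
{
    for (size_t i = 0; i < n; i++) {
        demo_record(log, listeners[i], "add");
    }
}

/* Deletions are delivered in reverse order, so teardown mirrors setup. */
void demo_notify_del(const char *const *listeners, size_t n,
                     struct demo_log *log)
{
    for (size_t i = n; i-- > 0; ) {
        demo_record(log, listeners[i], "del");
    }
}
```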
Signed-off-by: Avi Kivity <avi@redhat.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
As we have thread-local cpu_single_env now and KVM uses exactly one
thread per VCPU, we can drop the cpu_single_env updates from the loop
and initialize this variable only once during setup.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
To avoid both having to pair kvm_irqchip_in_kernel with kvm_enabled
everywhere and having the former end up as a function call, implement it
like the latter. This means keeping the state in a global variable and
defining kvm_irqchip_in_kernel as a preprocessor macro.
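A sketch of the pattern, using hypothetical demo_* names and a made-up
DEMO_CONFIG_KVM switch rather than the real identifiers:

```c
#include <stdbool.h>

#define DEMO_CONFIG_KVM 1   /* hypothetical build switch */

#ifdef DEMO_CONFIG_KVM
bool demo_kvm_allowed;          /* set once when the accelerator is chosen */
bool demo_kvm_kernel_irqchip;   /* only ever true when KVM is in use */
#define demo_kvm_enabled()           (demo_kvm_allowed)
#define demo_kvm_irqchip_in_kernel() (demo_kvm_kernel_irqchip)
#else
/* Without KVM support both checks collapse to compile-time constants. */
#define demo_kvm_enabled()           (false)
#define demo_kvm_irqchip_in_kernel() (false)
#endif

/* Hypothetical init hook: the irqchip flag can only be set together
 * with KVM itself, so callers never need to test both. */
void demo_accel_init(bool use_kvm, bool want_kernel_irqchip)
{
#ifdef DEMO_CONFIG_KVM
    demo_kvm_allowed = use_kvm;
    demo_kvm_kernel_irqchip = use_kvm && want_kernel_irqchip;
#else
    (void)use_kvm;
    (void)want_kernel_irqchip;
#endif
}
```

The macro expands to a plain variable read, which avoids the function
call, and the variable stays false whenever KVM is off.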
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>