# By Michael R. Hines (9) and others
# Via Juan Quintela
* quintela/migration.next:
rdma: introduce capability x-rdma-pin-all
rdma: new QEMUFileOps hooks
rdma: introduce qemu_ram_foreach_block()
rdma: export qemu_fflush()
rdma: introduce qemu_file_mode_is_not_valid()
rdma: export throughput w/ MigrationStats QMP
rdma: export yield_until_fd_readable()
rdma: introduce qemu_update_position()
rdma: add documentation
migration: do not overwrite zero pages
Revert "migration: do not sent zero pages in bulk stage"
arch_init/ram_load: add error message for block length mismatch
Message-id: 1372329455-5995-1-git-send-email-quintela@redhat.com
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
This capability allows you to disable dynamic chunk registration
for better throughput on high-performance links.
For example, using an 8GB RAM virtual machine with all 8GB of memory in
active use and the VM itself is completely idle using a 40 gbps infiniband link:
1. x-rdma-pin-all disabled total time: approximately 7.5 seconds @ 9.5 Gbps
2. x-rdma-pin-all enabled total time: approximately 4 seconds @ 26 Gbps
These numbers would of course scale up to whatever size virtual machine
you have to migrate using RDMA.
Enabling this feature does *not* have any measurable affect on
migration *downtime*. This is because, without this feature, all of the
memory will have already been registered already in advance during
the bulk round and does not need to be re-registered during the successive
iteration rounds.
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Chegu Vinod <chegu_vinod@hp.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Tested-by: Chegu Vinod <chegu_vinod@hp.com>
Tested-by: Michael R. Hines <mrhines@us.ibm.com>
Signed-off-by: Michael R. Hines <mrhines@us.ibm.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
These are the prototypes and implementation of new hooks that
RDMA takes advantage of to perform dynamic page registration.
An optional hook is also introduced for a custom function
to be able to override the default save_page function.
Also included are the prototypes and accessor methods used by
arch_init.c which invoke funtions inside savevm.c to call out
to the hooks that may or may not have been overridden
inside of QEMUFileOps.
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Chegu Vinod <chegu_vinod@hp.com>
Tested-by: Chegu Vinod <chegu_vinod@hp.com>
Tested-by: Michael R. Hines <mrhines@us.ibm.com>
Signed-off-by: Michael R. Hines <mrhines@us.ibm.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
This is used during RDMA initialization in order to
transmit a description of all the RAM blocks to the
peer for later dynamic chunk registration purposes.
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Chegu Vinod <chegu_vinod@hp.com>
Tested-by: Chegu Vinod <chegu_vinod@hp.com>
Tested-by: Michael R. Hines <mrhines@us.ibm.com>
Signed-off-by: Michael R. Hines <mrhines@us.ibm.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
RDMA uses this to flush the control channel before sending its
own message to handle page registrations.
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Chegu Vinod <chegu_vinod@hp.com>
Tested-by: Chegu Vinod <chegu_vinod@hp.com>
Tested-by: Michael R. Hines <mrhines@us.ibm.com>
Signed-off-by: Michael R. Hines <mrhines@us.ibm.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
QEMUFileRDMA also has read and write modes. This function is now
shared to reduce code duplication.
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Chegu Vinod <chegu_vinod@hp.com>
Tested-by: Chegu Vinod <chegu_vinod@hp.com>
Tested-by: Michael R. Hines <mrhines@us.ibm.com>
Signed-off-by: Michael R. Hines <mrhines@us.ibm.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
This exposes throughput (in megabits/sec) through QMP.
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Chegu Vinod <chegu_vinod@hp.com>
Tested-by: Chegu Vinod <chegu_vinod@hp.com>
Tested-by: Michael R. Hines <mrhines@us.ibm.com>
Signed-off-by: Michael R. Hines <mrhines@us.ibm.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
The RDMA event channel can be made non-blocking just like a TCP
socket. Exporting this function allows us to yield so that the
QEMU monitor remains available.
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Chegu Vinod <chegu_vinod@hp.com>
Tested-by: Chegu Vinod <chegu_vinod@hp.com>
Tested-by: Michael R. Hines <mrhines@us.ibm.com>
Signed-off-by: Michael R. Hines <mrhines@us.ibm.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
RDMA writes happen asynchronously, and thus the performance accounting
also needs to be able to occur asynchronously. This allows anybody
to call into savevm.c to update both f->pos as well as into arch_init.c
to update the acct_info structure with up-to-date values when
the RDMA transfer actually completes.
Reviewed-by: Juan Quintela <quintela@redhat.com>
Tested-by: Chegu Vinod <chegu_vinod@hp.com>
Tested-by: Michael R. Hines <mrhines@us.ibm.com>
Signed-off-by: Michael R. Hines <mrhines@us.ibm.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
docs/rdma.txt contains full documentation,
wiki links, github url and contact information.
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Chegu Vinod <chegu_vinod@hp.com>
Tested-by: Chegu Vinod <chegu_vinod@hp.com>
Tested-by: Michael R. Hines <mrhines@us.ibm.com>
Signed-off-by: Michael R. Hines <mrhines@us.ibm.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
on incoming migration do not memset pages to zero if they already read as zero.
this will allocate a new zero page and consume memory unnecessarily. even
if we madvise a MADV_DONTNEED later this will only deallocate the memory
asynchronously.
Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Not sending zero pages breaks migration if a page is zero
at the source but not at the destination. This can e.g. happen
if different BIOS versions are used at source and destination.
It has also been reported that migration on pseries is completely
broken with this patch.
This effectively reverts commit f1c72795af.
Conflicts:
arch_init.c
Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Makes it easier to debug situations where the source and target have
different ram blocks in a device and migration fails due to that, for
instance a BAR size change on a PCI device.
Signed-off-by: Alon Levy <alevy@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
# By Peter Crosthwaite (3) and others
# Via Peter Maydell
* pmaydell/arm-devs.for-upstream:
nand: Don't inherit from Sysbus
block/nand: Convert Sysbus::init to Device::realize
block/nand: QOM casting sweep
i.MX31: Fix PRCS bit test
arm/boot: Free dtb blob memory after use
i.MX: Rework functions/types name and use new style initialization
i.MX: Implement a more complete version of the GPT timer.
ARM: Allow dumping of device tree
Message-id: 1372184516-32397-1-git-send-email-peter.maydell@linaro.org
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
# By Peter Maydell
# Via Peter Maydell
* pmaydell/target-arm.for-upstream:
target-arm: Make LPAE feature imply V7MP
target-arm: Use tuple list to sync cp regs with KVM
target-arm: Reinitialize all KVM VCPU registers on reset
target-arm: Initialize cpreg list from KVM when using KVM
target-arm: Convert TCG to using (index,value) list for cp migration
target-arm: mark up cpregs for no-migrate or raw access
target-arm: Add raw_readfn and raw_writefn to ARMCPRegInfo
target-arm: Allow special cpregs to have flags set
Message-id: 1372181592-32170-1-git-send-email-peter.maydell@linaro.org
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
# By Cornelia Huck (2) and Christian Borntraeger (1)
# Via Cornelia Huck
* cohuck/virtio-ccw-upstr:
virtio-ccw: Wire up guest and host notifies.
virtio-ccw: Wire up ioeventfd.
s390/virtio-ccw: Fix virtio reset
Message-id: 1372177538-9812-1-git-send-email-cornelia.huck@de.ibm.com
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
# By Paul Durrant (2) and Stefano Stabellini (1)
# Via Stefano Stabellini
* sstabellini/xen-20130625:
Move hardcoded initialization of xen-platform device.
Allow use of pc machine type (accel=xen) for Xen HVM domains.
Revert "xen: start PCI hole at 0xe0000000 (same as pc_init1 and qemu-xen-traditional)"
Message-id: alpine.DEB.2.02.1306251323220.4782@kaball.uk.xensource.com
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Nand chips are not sysbus devices - they do not have any sense of MMIO,
nor interrupts. Re-parent to TYPE_DEVICE accordingly.
Cc: afaerber@suse.de
Signed-off-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
Reviewed-by: Andreas Färber <afaerber@suse.de>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The prescribed transition from Sysbus::init function to a
Device::realize.
Cc: afaerber@suse.de
Signed-off-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
Reviewed-by: Andreas Färber <afaerber@suse.de>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Define and use standard QOM cast macro. Remove usages of DO_UPCAST and
direct -> style casting.
Cc: afaerber@suse.de
Signed-off-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
Reviewed-by: Andreas Färber <afaerber@suse.de>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
cppcheck detected a condition which was always false.
According to the MCIMX31 Reference Manual, the PRCS bits have to be 01
to select the Frequency Pre-Multiplier (FPM). PRCS uses bits 1 and 2,
so we have to test for 2.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Peter Chubb <peter.chubb@nicta.com.au>
Message-id: 1370810662-32320-1-git-send-email-sw@weilnetz.de
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The dtb blob returned by load_device_tree() is in memory allocated
with g_malloc(). Free it accordingly once we have copied its
contents into the guest memory. To make this easy, we need also to
clean up the error handling in load_dtb() so that we consistently
handle errors in the same way (by printing a message and then
returning -1, rather than either plowing on or exiting immediately).
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Andreas Färber <afaerber@suse.de>
Message-id: 1371209256-11408-1-git-send-email-peter.maydell@linaro.org
* use dynamic cast whenever possible
* Change function names to some more meaningful prefix
* Change type names to a more meaningful one
* use new style device initialization
Signed-off-by: Jean-Christophe DUBOIS <jcd@tribudubois.net>
Message-id: 1369898943-1993-3-git-send-email-jcd@tribudubois.net
Reviewed-by: Peter Chubb <peter.chubb@nicta.com.au>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
By calling qemu_devtree_dumpdtb near the end of load_dtb.
Signed-off-by: John Rigby <john.rigby@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The v7 ARM ARM specifies that the Large Physical Address
Extension requires implementation of the Multiprocessing
Extensions, so make our LPAE feature imply V7MP rather
than specifying both in the A15 CPU initfn.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Andreas Färber <afaerber@suse.de>
Message-id: 1371127899-10364-1-git-send-email-peter.maydell@linaro.org
Use the tuple list of cp registers for syncing KVM state to QEMU,
rather than only syncing a very minimal set by hand.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Since the ARM KVM API doesn't include a "reset this VCPU"
ioctl, we have to capture the initial values of every
register it knows about so that we can reset the VCPU
by feeding those values back again.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
When using KVM, use the kernel's initial state to set up the
cpreg list, and sync to and from the kernel when doing
migration.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Convert the TCG ARM target to using an (index,value) list for migrating
coprocessors. The primary benefit of the (index,value) list is for
passing state between KVM and QEMU, but it works for TCG-to-TCG
migration as well and is a useful self-contained first step.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Mark up coprocessor register definitions to add raw access
functions or mark the register as non-migratable where necessary.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
For reading and writing register values from the kernel for KVM,
we need to provide accessor functions which are guaranteed to succeed
and don't impose access checks, mask out unwritable bits, etc.
Define new fields raw_readfn and raw_writefn for this purpose;
these only need to be provided if there is a readfn or writefn
already and it is not suitable.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Relax the "is this a valid ARMCPRegInfo type value?" check to permit
"special" cpregs to have flags other than ARM_CP_SPECIAL set. At
the moment none of the other flags are relevant for special regs,
but the migration related flag we're about to introduce can apply
here too.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Guest and host notifiers are needed by vhost. We use ioeventfds for
the guest notifiers, but need to fall back on qemu injecting interrupts
for the host notifiers.
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
On hosts that support ioeventfd, make use of it for host-to-guest
notifications via diagnose 500.
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
On virtio reset we must reset the indicator to avoid stale interrupts,
e.g. after a reset.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Creation of the xen-platform device is currently hardcoded into machine
type pc's initialization code, guarded by a test for the whether the xen
accelerator is enabled. This patch moves the creation of xen-platform into
the initialization code of the xenfv machine type. This maintains backwards
compatibility for that machine type but allows more flexibility if another
machine type is used with Xen HVM domains.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Xen HVM domains normally spawn QEMU with a dedicated xenfv machine type. The
initialization code for this machine type can easily be pulled into the
generic pc initialization code and guarded with a test for whether the xen
accelerator options is specified, which is more consistent with the way
other accelerators are used.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
This reverts commit 9f24a8030a.
The start of the PCI hole is actually set to 0xf0000000 by hvmloader.
In order to retain ABI compatibility with Xen we leave the start of the
PCI hole at 0xf0000000 in QEMU (for Xen) too.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
# By Kevin Wolf (22) and Fam Zheng (1)
# Via Stefan Hajnoczi
* stefanha/block: (23 commits)
vmdk: refuse to open higher version than supported
block: Always enable discard on the protocol level
qcow2: Batch discards
qcow2: Options to enable discard for freed clusters
qcow2: Add refcount update reason to all callers
Revert "block: Disable driver-specific options for 1.5"
ide: Clean up ide_exec_cmd()
ide: Convert SMART commands to ide_cmd_table handler
ide: Convert CF-ATA commands to ide_cmd_table handler
ide: Convert ATAPI commands to ide_cmd_table handler
ide: Convert SEEK to ide_cmd_table handler
ide: Convert FLUSH CACHE to ide_cmd_table handler
ide: Convert SET FEATURES to ide_cmd_table handler
ide: Convert CHECK POWER MDOE to ide_cmd_table handler
ide: Convert READ NATIVE MAX ADDRESS to ide_cmd_table handler
ide: Convert DMA read/write commands to ide_cmd_table handler
ide: Convert PIO read/write commands to ide_cmd_table handler
ide: Convert read/write multiple commands to ide_cmd_table handler
ide: Convert verify commands to ide_cmd_table handler
ide: Convert cmd_nop commands to ide_cmd_table handler
...
Message-id: 1372065035-19601-1-git-send-email-stefanha@redhat.com
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
# By Andreas Färber (3) and others
# Via Gerd Hoffmann
* kraxel/usb.84:
usb: fix serial number for hid devices
usb: add serial bus property
usb-host-libusb: set USB_DEV_FLAG_IS_HOST
usb/host-libusb: Fix building with libusb git master code
usb/hcd-ehci: Add Faraday FUSBH200 support
usb/hcd-ehci: Replace PORTSC macros with variables
usb/hcd-ehci: Add Tegra2 SysBus EHCI device
usb/hcd-ehci: Split off instance_init from realize
usb/hcd-ehci-sysbus: Convert to QOM realize
# By Stefan Weil (5) and others
# Via Michael Tokarev
* mjt/trivial-patches:
configure: Add signed*signed check to [u]int128_t test
Makefile: pass include directives to dtc via CPPFLAGS, not CFLAGS
qapi: lack of two commas in dict
sd: pass bool parameter for sd_init
qemu-char: use bool in qemu_chr_open_socket and simplify code a bit
vnc: use booleans for vnc_connect, vnc_listen_read and vnc_display_add_client
block/nand: Formatting sweep
qxl: Fix QXLRam initialisation.
acl: acl_add can't insert before last list element, fix
configure: Fix "ERROR: ERROR: " for missing/incompatible DTC
audio: Replace static functions in header file by macros, remove GCC_ATTR
libcacard: Fix cppcheck warning and remove unneeded code
savevm: Fix potential memory leak
kvm: Fix potential resource leak (missing fclose)
qemu-img: Add missing GCC_FMT_ATTR
qemu-options: trivial fix for -mon args help
vl: reformat SDL ifdeffery a bit
Message-id: 1371893076-9643-1-git-send-email-mjt@msgid.tls.msk.ru
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Refuse to open higher version for safety.
Although we try to be compatible with published VMDK spec, VMware has
newer version from ESXi 5.1 exported OVF/OVA, which we have no knowledge
what's changed in it. And it is very likely to have more new versions in
the future, so it's not safe to open them blindly.
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Turning on discard options in qcow2 doesn't help a lot when the discard
requests that it issues are thrown away by the raw-posix layer. This
patch always enables discard functionality on the protocol level so that
it's the image format's responsibility to send (or not) discard
requests. Requests sent by the guest will be allowed or ignored by the
top level BlockDriverState, which depends on the discard=... option like
before.
In particular, this means that even without specifying options, the
qcow2 default of discarding deleted snapshots actually takes effect now,
both for qemu and qemu-img.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
This optimises the discard operation for freed clusters by batching
discard requests (both snapshot deletion and bdrv_discard end up
updating the refcounts cluster by cluster).
Note that we don't discard asynchronously, but keep s->lock held. This
is to avoid that a freed cluster is reallocated and written to while the
discard is still in flight.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Deleted snapshots are discarded in the image file by default, discard
requests take their default from the -drive discard=... option and other
places that free clusters must always be enabled explicitly.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
This adds a refcount update reason to all callers of update_refcounts(),
so that a follow-up patch can use this information to decide whether
clusters that reach a refcount of 0 should be discarded in the image
file.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>