If a user chooses to turn on the auto-converge migration capability
these changes detect the lack of convergence and throttle down the
guest. i.e. force the VCPUs out of the guest for some duration
and let the migration thread catchup and help converge.
Verified the convergence using the following :
- Java Warehouse workload running on a 20VCPU/256G guest(~80% busy)
- OLTP like workload running on a 80VCPU/512G guest (~80% busy)
Sample results with Java warehouse workload : (migrate speed set to 20Gb and
migrate downtime set to 4seconds).
(qemu) info migrate
capabilities: xbzrle: off auto-converge: off <----
Migration status: active
total time: 1487503 milliseconds
expected downtime: 519 milliseconds
transferred ram: 383749347 kbytes
remaining ram: 2753372 kbytes
total ram: 268444224 kbytes
duplicate: 65461532 pages
skipped: 64901568 pages
normal: 95750218 pages
normal bytes: 383000872 kbytes
dirty pages rate: 67551 pages
---
(qemu) info migrate
capabilities: xbzrle: off auto-converge: on <----
Migration status: completed
total time: 241161 milliseconds
downtime: 6373 milliseconds
transferred ram: 28235307 kbytes
remaining ram: 0 kbytes
total ram: 268444224 kbytes
duplicate: 64946416 pages
skipped: 64903523 pages
normal: 7044971 pages
normal bytes: 28179884 kbytes
Signed-off-by: Chegu Vinod <chegu_vinod@hp.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
We have stayed at 800x600x15 as default graphics mode for the last 9 years.
If there ever was a reason to be there, surely nobody remembers it.
However, recently non-Linux PPC guests started to show bad effects on 15 bit
color mode. They do work just fine with 32 bits however.
So let's switch to 32 bit color as the default graphic mode.
Reported-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: Alexander Graf <agraf@suse.de>
length is a ram_addr_t, so RAM_ADDR_FMT must be used instead of %ld.
This fixes a recently introduced regression for w64 builds.
Using RAM_ADDR_FMT also changes decimal output to sedecimal.
This is good here because length and block->length should both
use the same base in the error message.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Message-id: 1372359606-2759-1-git-send-email-sw@weilnetz.de
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
RDMA writes happen asynchronously, and thus the performance accounting
also needs to be able to occur asynchronously. This allows anybody
to call into savevm.c to update both f->pos as well as into arch_init.c
to update the acct_info structure with up-to-date values when
the RDMA transfer actually completes.
Reviewed-by: Juan Quintela <quintela@redhat.com>
Tested-by: Chegu Vinod <chegu_vinod@hp.com>
Tested-by: Michael R. Hines <mrhines@us.ibm.com>
Signed-off-by: Michael R. Hines <mrhines@us.ibm.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
on incoming migration do not memset pages to zero if they already read as zero.
this will allocate a new zero page and consume memory unnecessarily. even
if we madvise a MADV_DONTNEED later this will only deallocate the memory
asynchronously.
Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Not sending zero pages breaks migration if a page is zero
at the source but not at the destination. This can e.g. happen
if different BIOS versions are used at source and destination.
It has also been reported that migration on pseries is completely
broken with this patch.
This effectively reverts commit f1c72795af.
Conflicts:
arch_init.c
Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Makes it easier to debug situations where the source and target have
different ram blocks in a device and migration fails due to that, for
instance a BAR size change on a PCI device.
Signed-off-by: Alon Levy <alevy@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
# By Paolo Bonzini (4) and others
# Via Peter Maydell
* pmaydell/configury.next:
ppc: Remove CONFIG_FDT conditionals
microblaze: Remove CONFIG_FDT conditionals
arm: Remove CONFIG_FDT conditionals
configure: Require libfdt for arm, ppc, microblaze softmmu targets
configure: dtc: Probe for libfdt_env.h
build: drop TARGET_TYPE
main: use TARGET_ARCH only for the target-specific #define
build: do not use TARGET_ARCH
build: rename TARGET_ARCH2 to TARGET_NAME
Add a stp file for usage from build directory
Message-id: 1371221594-11556-1-git-send-email-peter.maydell@linaro.org
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Just use the TARGET_NAME free string.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 1370349928-20419-6-git-send-email-pbonzini@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Everything else needs to match the executable name, which is
TARGET_NAME.
Before:
$ sh4eb-linux-user/qemu-sh4eb --help
usage: qemu-sh4 [options] program [arguments...]
Linux CPU emulator (compiled for sh4 emulation)
After:
$ sh4eb-linux-user/qemu-sh4eb --help
usage: qemu-sh4eb [options] program [arguments...]
Linux CPU emulator (compiled for sh4eb emulation)
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1370349928-20419-5-git-send-email-pbonzini@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Having size precede the associated pointer is odd. Swap them, and fix
up the types.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Laszlo "ever the optimist" Ersek <lersek@redhat.com>
Message-id: 1370610036-10577-5-git-send-email-armbru@redhat.com
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Improves diagnistics from ad hoc messages like
Invalid SMBIOS UUID string
to
qemu-system-x86_64: -smbios type=1,uuid=gaga: Invalid UUID
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Laszlo "ever the optimist" Ersek <lersek@redhat.com>
Message-id: 1370610036-10577-4-git-send-email-armbru@redhat.com
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Since this is a MemoryListener operation, it only makes sense
on an AddressSpace granularity.
Suggested-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Several targets can have wavcapture/-soundhw support via PCI cards.
HAS_AUDIO is a useless limitation, remove it.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1366303444-24620-4-git-send-email-pbonzini@redhat.com
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Headers in include/exec/ are for the deepest innards of QEMU,
they should almost never be included directly.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Functions defined in acpi/ should be declared in
acpi.h
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Many of these should be cleaned up with proper qdev-/QOM-ification.
Right now there are many catch-all headers in include/hw/ARCH depending
on cpu.h, and this makes it necessary to compile these files per-target.
However, fixing this does not belong in these patches.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
As one consequence, strtok() -- which modifies its argument -- is replaced
with g_strsplit().
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Anthony Liguori <aliguori@us.ibm.com>
Message-id: 1363821803-3380-6-git-send-email-lersek@redhat.com
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
This will remove an unneeded copy of guest memory pages.
For the page header and device state we still copy the data to the
static buffer the other option is to allocate the memory on demand
which is more expensive.
Signed-off-by: Orit Wasserman <owasserm@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
at the beginning of migration all pages are marked dirty and
in the first round a bulk migration of all pages is performed.
currently all these pages are copied to the page cache regardless
of whether they are frequently updated or not. this doesn't make sense
since most of these pages are never transferred again.
this patch changes the XBZRLE transfer to only be used after
the bulk stage has been completed. that means a page is added
to the page cache the second time it is transferred and XBZRLE
can benefit from the third time of transfer.
since the page cache is likely smaller than the number of pages
it's also likely that in the second round the page is missing in the
cache due to collisions in the bulk phase.
on the other hand a lot of unnecessary mallocs, memdups and frees
are saved.
the following results have been taken earlier while executing
the test program from docs/xbzrle.txt. (+) with the patch and (-)
without. (thanks to Eric Blake for reformatting and comments)
+ total time: 22185 milliseconds
- total time: 22410 milliseconds
Shaved 0.3 seconds, better than 1%!
+ downtime: 29 milliseconds
- downtime: 21 milliseconds
Not sure why downtime seemed worse, but probably not the end of the world.
+ transferred ram: 706034 kbytes
- transferred ram: 721318 kbytes
Fewer bytes sent - good.
+ remaining ram: 0 kbytes
- remaining ram: 0 kbytes
+ total ram: 1057216 kbytes
- total ram: 1057216 kbytes
+ duplicate: 108556 pages
- duplicate: 105553 pages
+ normal: 175146 pages
- normal: 179589 pages
+ normal bytes: 700584 kbytes
- normal bytes: 718356 kbytes
Fewer normal bytes...
+ cache size: 67108864 bytes
- cache size: 67108864 bytes
+ xbzrle transferred: 3127 kbytes
- xbzrle transferred: 630 kbytes
...and more compressed pages sent - good.
+ xbzrle pages: 117811 pages
- xbzrle pages: 21527 pages
+ xbzrle cache miss: 18750
- xbzrle cache miss: 179589
And very good improvement on the cache miss rate.
+ xbzrle overflow : 0
- xbzrle overflow : 0
Signed-off-by: Peter Lieven <pl@kamp.de>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Orit Wasserman <owasserm@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
avoid searching for dirty pages just increment the
page offset. all pages are dirty anyway.
Signed-off-by: Peter Lieven <pl@kamp.de>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Orit Wasserman <owasserm@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
during bulk stage of ram migration if a page is a
zero page do not send it at all.
the memory at the destination reads as zero anyway.
even if there is an madvise with QEMU_MADV_DONTNEED
at the target upon receipt of a zero page I have observed
that the target starts swapping if the memory is overcommitted.
it seems that the pages are dropped asynchronously.
this patch also updates QMP to return the number of
skipped pages in MigrationStats.
Signed-off-by: Peter Lieven <pl@kamp.de>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
the first round of ram transfer is special since all pages
are dirty and thus all memory pages are transferred to
the target. this patch adds a boolean variable to track
this stage.
Signed-off-by: Peter Lieven <pl@kamp.de>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Orit Wasserman <owasserm@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
virtually all dup pages are zero pages. remove
the special is_dup_page() function and use the
optimized buffer_find_nonzero_offset() function
instead.
here buffer_find_nonzero_offset() is used directly
to avoid the unnecssary additional checks in
buffer_is_zero().
raw performace gain checking 1 GByte zeroed memory
over is_dup_page() is approx. 10-12% with SSE2
and 8-10% with unsigned long arithmedtic.
Signed-off-by: Peter Lieven <pl@kamp.de>
Reviewed-by: Orit Wasserman <owasserm@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
vector optimizations will now be used at various places
not just in is_dup_page() in arch_init.c
Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Juan Quintela <quintela@redhat.com>
The page cache frees all data on finish, on resize and
if there is collision on insert. So it should be the caches
responsibility to dup the data that is stored in the cache.
Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Orit Wasserman <owasserm@redhat.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Only the migration_bitmap_sync() call needs the iothread lock.
Reviewed-by: Orit Wasserman <owasserm@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
This makes it possible to do blocking writes directly to the socket,
with no buffer in the middle. For RAM, only the migration_bitmap_sync()
call needs the iothread lock. For block migration, it is needed by
the block layer (including bdrv_drain_all and dirty bitmap access),
but because some code is shared between iterate and complete, all of
mig_save_device_dirty is run with the lock taken.
In the savevm case, the iterate callback runs within the big lock.
This is annoying because it complicates the rules. Luckily we do not
need to do anything about it: the RAM iterate callback does not need
the iothread lock, and block migration never runs during savevm.
Reviewed-by: Orit Wasserman <owasserm@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
We removed the calculation in commit e4ed1541ac
Now we add it back. We need to create dirty_bytes_rate because we
can't include cpu-all.h from migration.c, and there is no other way to
include TARGET_PAGE_SIZE.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Orit Wasserman <owasserm@redhat.com>
Before this fix we couldn't load a guest from
XBZRLE compressed file.
For example:
The user activated the XBZRLE capability
The user run migrate -d "exec:gzip -c > vm.gz"
The user won't be able to load vm.gz and get an error.
Signed-off-by: Orit Wasserman <owasserm@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Reviewed-by: Eric Blake <eblake@redhat.com>
It could only return 0 if we only found dirty xbzrle pages that hadn't
changed (i.e. they were written with the same content). We don't care
about that case, it is the same than nothing dirty.
So now the return of the function is how much have it written, nothing
else. Adjust callers.
And we also made ram_save_iterate() return the number of transferred
bytes, not the number of transferred pages.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Instead of testing each page individually, we search what is the next
dirty page with a bitmap operation. We have to reorganize the code to
move from a "for" loop, to a while(dirty) loop.
Signed-off-by: Juan Quintela <quintela@redhat.com>
This avoids having to do two walks over the dirty bitmap, once reading
the dirty bits, and anthoer cleaning them.
Signed-off-by: Juan Quintela <quintela@redhat.com>
This is the last block from where we have sent data.
Signed-off-by: Orit Wasserman <owasserm@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Code just now does (simplified for clarity)
if (qemu_savevm_state_iterate(s->file) == 1) {
vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
qemu_savevm_state_complete(s->file);
}
Problem here is that qemu_savevm_state_iterate() returns 1 when it
knows that remaining memory to sent takes less than max downtime.
But this means that we could end spending 2x max_downtime, one
downtime in qemu_savevm_iterate, and the other in
qemu_savevm_state_complete.
Changed code to:
pending_size = qemu_savevm_state_pending(s->file, max_size);
DPRINTF("pending size %lu max %lu\n", pending_size, max_size);
if (pending_size >= max_size) {
ret = qemu_savevm_state_iterate(s->file);
} else {
vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
qemu_savevm_state_complete(s->file);
}
So what we do is: at current network speed, we calculate the maximum
number of bytes we can sent: max_size.
Then we ask every save_live section how much they have pending. If
they are less than max_size, we move to complete phase, otherwise we
do an iterate one.
This makes things much simpler, because now individual sections don't
have to caluclate the bandwidth (it was implossible to do right from
there).
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Add the new mutex that protects shared state between ram_save_live
and the iothread. If the iothread mutex has to be taken together
with the ramlist mutex, the iothread shall always be _outside_.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Umesh Deshpande <udeshpan@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Orit Wasserman <owasserm@redhat.com>
This will be used to detect if last_block might have become invalid
across different calls to ram_save_live.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Umesh Deshpande <udeshpan@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Orit Wasserman <owasserm@redhat.com>
Most of the time, only 2 items will be active (from/to for a string operation,
or code/data). But TCG guests likely won't have gigabytes of memory, so
this actually goes down to 1 item.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>