This patch fixed as follows:
Thread 1 (Thread 0x7f34ee738d80 (LWP 11212)):
#0 __pthread_clockjoin_ex (threadid=139847152957184, thread_return=0x7f30b1febf30, clockid=<optimized out>, abstime=<optimized out>, block=<optimized out>) at pthread_join_common.c:145
#1 0x0000563401998e36 in qemu_thread_join (thread=0x563402d66610) at util/qemu-thread-posix.c:587
#2 0x00005634017a79fa in process_incoming_migration_co (opaque=0x0) at migration/migration.c:502
#3 0x00005634019b59c9 in coroutine_trampoline (i0=63395504, i1=22068) at util/coroutine-ucontext.c:115
#4 0x00007f34ef860660 in ?? () at ../sysdeps/unix/sysv/linux/x86_64/__start_context.S:91 from /lib/x86_64-linux-gnu/libc.so.6
#5 0x00007f30b21ee730 in ?? ()
#6 0x0000000000000000 in ?? ()
Thread 13 (Thread 0x7f30b3dff700 (LWP 11747)):
#0 __lll_lock_wait (futex=futex@entry=0x56340218ffa0 <qemu_global_mutex>, private=0) at lowlevellock.c:52
#1 0x00007f34efa000a3 in _GI__pthread_mutex_lock (mutex=0x56340218ffa0 <qemu_global_mutex>) at ../nptl/pthread_mutex_lock.c:80
#2 0x0000563401997f99 in qemu_mutex_lock_impl (mutex=0x56340218ffa0 <qemu_global_mutex>, file=0x563401b7a80e "migration/colo.c", line=806) at util/qemu-thread-posix.c:78
#3 0x0000563401407144 in qemu_mutex_lock_iothread_impl (file=0x563401b7a80e "migration/colo.c", line=806) at /home/workspace/colo-qemu/cpus.c:1899
#4 0x00005634017ba8e8 in colo_process_incoming_thread (opaque=0x563402d664c0) at migration/colo.c:806
#5 0x0000563401998b72 in qemu_thread_start (args=0x5634039f8370) at util/qemu-thread-posix.c:519
#6 0x00007f34ef9fd609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#7 0x00007f34ef924293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
The QEMU main thread is holding the lock:
(gdb) p qemu_global_mutex
$1 = {lock = {_data = {lock = 2, __count = 0, __owner = 11212, __nusers = 9, __kind = 0, __spins = 0, __elision = 0, __list = {_prev = 0x0, __next = 0x0}},
__size = "\002\000\000\000\000\000\000\000\314+\000\000\t", '\000' <repeats 26 times>, __align = 2}, file = 0x563401c07e4b "util/main-loop.c", line = 240,
initialized = true}
>From the call trace, we can see it is a deadlock bug. and the QEMU main thread holds the global mutex to wait until the COLO thread ends. and the colo thread
wants to acquire the global mutex, which will cause a deadlock. So, we should release the qemu_global_mutex before waiting colo thread ends.
Signed-off-by: Lei Rao <lei.rao@intel.com>
Reviewed-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
This patch fixes the following:
qemu-system-x86_64: invalid runstate transition: 'shutdown' -> 'running'
Aborted (core dumped)
The gdb bt as following:
0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
1 0x00007faa3d613859 in __GI_abort () at abort.c:79
2 0x000055c5a21268fd in runstate_set (new_state=RUN_STATE_RUNNING) at vl.c:723
3 0x000055c5a1f8cae4 in vm_prepare_start () at /home/workspace/colo-qemu/cpus.c:2206
4 0x000055c5a1f8cb1b in vm_start () at /home/workspace/colo-qemu/cpus.c:2213
5 0x000055c5a2332bba in migration_iteration_finish (s=0x55c5a4658810) at migration/migration.c:3376
6 0x000055c5a2332f3b in migration_thread (opaque=0x55c5a4658810) at migration/migration.c:3527
7 0x000055c5a251d68a in qemu_thread_start (args=0x55c5a5491a70) at util/qemu-thread-posix.c:519
8 0x00007faa3d7e9609 in start_thread (arg=<optimized out>) at pthread_create.c:477
9 0x00007faa3d710293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
Signed-off-by: Lei Rao <lei.rao@intel.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
If the compression migration fails or is canceled, the query for the value of
compression_counters during the next compression migration is wrong.
Signed-off-by: yuxiating <yuxiating@huawei.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
This avoids to call migrate_get_current() in the caller function
whereas migration_cancel() already needs the pointer to the current
migration state.
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Let's use QEMU_ALIGN_DOWN() and friends to make the code a bit easier to
read.
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
An internal version that removes -only-migratable implications. It can be used
for temporary migration blockers like dump-guest-memory.
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
save_snapshot() checks migration blocker, which looks sane. At the meantime we
should also teach the blocker add helper to fail if during a snapshot, just
like for migrations.
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
To: <quintela@redhat.com>, <dgilbert@redhat.com>, <qemu-devel@nongnu.org>
CC: Li Zhijian <lizhijian@cn.fujitsu.com>
Date: Sat, 31 Jul 2021 22:05:52 +0800 (5 weeks, 4 days, 17 hours ago)
And change the default to true so that in '-incoming defer' case, user is able
to change multifd capability.
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
To: <quintela@redhat.com>, <dgilbert@redhat.com>, <qemu-devel@nongnu.org>
CC: Li Zhijian <lizhijian@cn.fujitsu.com>
Date: Sat, 31 Jul 2021 22:05:51 +0800 (5 weeks, 4 days, 17 hours ago)
multifd with unsupported protocol will cause a segment fault.
(gdb) bt
#0 0x0000563b4a93faf8 in socket_connect (addr=0x0, errp=0x7f7f02675410) at ../util/qemu-sockets.c:1190
#1 0x0000563b4a797a03 in qio_channel_socket_connect_sync
(ioc=0x563b4d16e8c0, addr=0x0, errp=0x7f7f02675410) at
../io/channel-socket.c:145
#2 0x0000563b4a797abf in qio_channel_socket_connect_worker (task=0x563b4cd86c30, opaque=0x0) at ../io/channel-socket.c:168
#3 0x0000563b4a792631 in qio_task_thread_worker (opaque=0x563b4cd86c30) at ../io/task.c:124
#4 0x0000563b4a91da69 in qemu_thread_start (args=0x563b4c44bb80) at ../util/qemu-thread-posix.c:541
#5 0x00007f7fe9b5b3f9 in ?? ()
#6 0x0000000000000000 in ?? ()
It's enough to check migrate_multifd_is_allowed() in multifd cleanup() and
multifd setup() though there are so many other places using migrate_use_multifd().
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
qemu_savevm_state_complete_postcopy assumes the iothread lock (BQL)
to be held, but instead it isn't.
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20211005080751.3797161-3-eesposit@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Commit b673eab4e2 "multifd: Make multifd_load_setup() get an Error
parameter" changed migration_incoming_setup() to take an Error **
argument, and adjusted the callers accordingly. It neglected to
change adjust multifd_load_setup(): it still exit()s on error. Clean
that up.
The error now gets propagated up two call chains: via
migration_fd_process_incoming() to rdma_accept_incoming_migration(),
and via migration_ioc_process_incoming() to
migration_channel_process_incoming(). Both chain ends report the
error with error_report_err(), but otherwise ignore it. Behavioral
change: we no longer exit() on this error.
This is consistent with how we handle other errors here, e.g. from
multifd_recv_new_channel() via migration_ioc_process_incoming() to
migration_channel_process_incoming(). Whether it's consistently right
or consistently wrong I can't tell.
Also clean up the return value from the unusual 0 on success, 1 on
error to the more common true on success, false on error.
Cc: Juan Quintela <quintela@redhat.com>
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20210720125408.387910-11-armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Pankaj Gupta <pankaj.gupta@ionos.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
We did this with scripts/coccinelle/use-error_fatal.cocci before, in
commit 50beeb6809 and 007b06578a. This commit cleans up rarer
variations that don't seem worth matching with Coccinelle.
Cc: Thomas Huth <thuth@redhat.com>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Juan Quintela <quintela@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Marc-André Lureau <marcandre.lureau@redhat.com>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20210720125408.387910-2-armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
It's efficient, but hackish to call yank unregister calls in channel_close(),
especially it'll be hard to debug when qemu crashed with some yank function
leaked.
Remove that hack, but instead explicitly unregister yank functions at the
places where needed, they are:
(on src)
- migrate_fd_cleanup
- postcopy_pause
(on dst)
- migration_incoming_state_destroy
- postcopy_pause_incoming
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20210722175841.938739-6-peterx@redhat.com>
Reviewed-by: Lukas Straub <lukasstraub2@web.de>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Accessing from_dst_file is potentially racy in current code base like below:
if (s->from_dst_file)
do_something(s->from_dst_file);
Because from_dst_file can be reset right after the check in another
thread (rp_thread). One example is migrate_fd_cancel().
Use the same qemu_file_lock to protect it too, just like to_dst_file.
When it's safe to access without lock, comment it.
There's one special reference in migration_thread() that can be replaced by
the newly introduced rp_thread_created flag.
Reported-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Lukas Straub <lukasstraub2@web.de>
Message-Id: <20210722175841.938739-3-peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
with Peter's fixup
It's possible that the migration thread skip the join() of the rp_thread in
below race and crash on src right at finishing migration:
migration_thread rp_thread
---------------- ---------
migration_completion()
(before rp_thread quits)
from_dst_file=NULL
[thread got scheduled out]
s->rp_state.from_dst_file==NULL
(skip join() of rp_thread)
migrate_fd_cleanup()
qemu_fclose(s->to_dst_file)
yank_unregister_instance()
assert(yank_find_entry()) <------- crash
It could mostly happen with postcopy, but that shouldn't be required, e.g., I
think it could also trigger with MIGRATION_CAPABILITY_RETURN_PATH set.
It's suspected that above race could be the root cause of a recent (but rare)
migration-test break reported by either Dave or PMM:
https://lore.kernel.org/qemu-devel/YPamXAHwan%2FPPXLf@work-vm/
The issue is: from_dst_file is reset in the rp_thread, so if the thread reset
it to NULL fast enough then the migration thread will assume there's no
rp_thread at all.
This could potentially cause more severe issue (e.g. crash) after the yank code.
Fix it by using a boolean to keep "whether we've created rp_thread".
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20210722175841.938739-2-peterx@redhat.com>
Reviewed-by: Lukas Straub <lukasstraub2@web.de>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
For each "migrate" command, remember to clear the s->error before going on.
For one reason, when there's a new error it'll be still remembered; see
migrate_set_error() who only sets the error if error==NULL. Meanwhile if a
failed migration completes (e.g., postcopy recovered and finished), we
shouldn't dump an error when calling migrate_fd_cleanup() at last.
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20210708190653.252961-4-peterx@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Below process could crash qemu with postcopy recovery:
1. (hmp) migrate -d ..
2. (hmp) migrate_start_postcopy
3. [network down, postcopy paused]
4. (hmp) migrate -r $WRONG_PORT
when try the recover on an invalid $WRONG_PORT, cleanup_bh will be cleared
5. (hmp) migrate -r $RIGHT_PORT
[qemu crash on assert(cleanup_bh)]
The thing is we shouldn't cleanup if it's postcopy resume; the error is set
mostly because the channel is wrong, so we return directly waiting for the user
to retry.
migrate_fd_cleanup() should only be called when migration is cancelled or
completed.
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20210708190653.252961-3-peterx@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
When postcopy pause triggered, we rely on the migration thread to cleanup the
to_dst_file handle, and the return path thread to cleanup the from_dst_file
handle (which is stored in the local variable "rp").
Within the process, from_dst_file cleanup (qemu_fclose) is postponed until it's
setup again due to a postcopy recovery.
It used to work before yank was born; after yank is introduced we rely on the
refcount of IOC to correctly unregister yank function in channel_close(). If
without the early and on-time release of from_dst_file handle the yank function
will be leftover during paused postcopy.
Without this patch, below steps (quoted from Xiaohui) could trigger qemu src
crash:
1.Boot vm on src host
2.Boot vm on dst host
3.Enable postcopy on src&dst host
4.Load stressapptest in vm and set postcopy speed to 50M
5.Start migration from src to dst host, change into postcopy mode when migration is active.
6.When postcopy is active, down the network card(do migration via this network) on dst host.
7.Wait untill postcopy is paused on src&dst host.
8.Before up network card, recover migration on dst host, will get error like following.
9.Ignore the error of step 8, go on recovering migration on src host:
After step 9, qemu on src host will core dump after some seconds:
qemu-kvm: ../util/yank.c:107: yank_unregister_instance: Assertion `QLIST_EMPTY(&entry->yankfns)' failed.
1.sh: line 38: 44662 Aborted (core dumped)
Reported-by: Li Xiaohui <xiaohli@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20210708190653.252961-2-peterx@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
When the migration fails or is canceled we wait the end of the unplug
operation to be able to plug it back. But if the unplug operation
is never finished we stop to wait and QEMU emits a warning to inform
the user.
Based-on: 20210629155007.629086-1-lvivier@redhat.com
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Message-Id: <20210701131458.112036-1-lvivier@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
If the user cancels the migration in the unplug-wait state,
QEMU will try to plug back the card and this fails because the card
is partially unplugged.
To avoid the problem, continue to wait the card unplug, but to
allow the migration to be canceled if the card never finishes to unplug
use a timeout.
Bug: https://bugzilla.redhat.com/show_bug.cgi?id=1976852
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20210629155007.629086-3-lvivier@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
The loop is used in migration_thread() and bg_migration_thread(),
so we can move it to its own function and call it from these both places.
Moreover, in migration_thread() we have a wrong state transition from
SETUP to ACTIVE while state could be WAIT_UNPLUG. This is correctly
managed in bg_migration_thread() so use this code instead.
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Message-Id: <20210629155007.629086-2-lvivier@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
It's possible qemu_start_incoming_migration() failed at any point, when it
happens we should reset postcopy_recover_triggered to false so that the user
can still retry with a saner incoming port.
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20210629181356.217312-3-peterx@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Starting from commit b5eea99ec2, qmp_migrate_recover() calls unregister
before calling qemu_start_incoming_migration(). I believe it wanted to mitigate
the next call to yank_register_instance(), but I think that's wrong.
Firstly, if during recover, we should keep the yank instance there, not
"quickly removing and adding it back".
Meanwhile, calling qmp_migrate_recover() twice with b5eea99ec2 will directly
crash the dest qemu (right now it can't; but it'll start to work right after
the next patch) because the 1st call of qmp_migrate_recover() will unregister
permanently when the channel failed to establish, then the 2nd call of
qmp_migrate_recover() crashes at yank_unregister_instance().
This patch fixes it by moving yank ops out of qemu_start_incoming_migration()
into qmp_migrate_incoming. For qmp_migrate_recover(), drop the unregister of
yank instance too since we keep it there during the recovery phase.
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20210629181356.217312-2-peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
This is a critical failure scenario for migration that is hard to
diagnose from existing probes. Most likely it is caused by an error
from bdrv_flush(), but we're not logging the errno anywhere, hence
this new probe.
Reviewed-by: Connor Kuehl <ckuehl@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
We can detect disk migration in migrate_prepare, if disk migration
is enabled in COLO mode, we can directly report an error.and there
is no need to disable block migration at every checkpoint.
Signed-off-by: Lei Rao <lei.rao@intel.com>
Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Zhang Chen <chen.zhang@intel.com>
Reviewed-by: Lukas Straub <lukasstraub2@web.de>
Tested-by: Lukas Straub <lukasstraub2@web.de>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Add a cleanup hook for incoming migration that gets called
at the end as a way for a transport to allow cleanup.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20210421112834.107651-4-dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
The CONFIG_VFIO switch only works in target specific code. Since
migration/migration.c is common code, the #ifdef does not have
the intended behavior here. Move the related code to a separate
file now which gets compiled via specific_ss instead.
Fixes: 3710586caa ("qapi: Add VFIO devices migration stats in Migration stats")
Message-Id: <20210414112004.943383-3-thuth@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Resizing while migrating is dangerous and does not work as expected.
The whole migration code works on the usable_length of ram blocks and does
not expect this to change at random points in time.
In the case of precopy, the ram block size must not change on the source,
after syncing the RAM block list in ram_save_setup(), so as long as the
guest is still running on the source.
Resizing can be trigger *after* (but not during) a reset in
ACPI code by the guest
- hw/arm/virt-acpi-build.c:acpi_ram_update()
- hw/i386/acpi-build.c:acpi_ram_update()
Use the ram block notifier to get notified about resizes. Let's simply
cancel migration and indicate the reason. We'll continue running on the
source. No harm done.
Update the documentation. Postcopy will be handled separately.
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20210429112708.12291-5-david@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Manual merge
Result @blocked is redundant. Unfortunately, we realized this too
close to the release to risk dropping it, so we deprecated it
instead, in commit e11ce6c06.
Since it was deprecated from the start, we can delete it without
the customary grace period. Do so.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20210429140424.2802929-1-armbru@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
This commit solves the issue with userfault_fd WP feature that
background snapshot is based on. For any never poluated or discarded
memory page, the UFFDIO_WRITEPROTECT ioctl() would skip updating
PTE for that page, thereby loosing WP setting for it.
So we need to pre-fault pages for each RAM block to be protected
before making a userfault_fd wr-protect ioctl().
Fixes: 278e2f551a (migration: support
UFFD write fault processing in ram_save_iterate())
Signed-off-by: Andrey Gruzdev <andrey.gruzdev@virtuozzo.com>
Reported-by: David Hildenbrand <david@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Message-Id: <20210401092226.102804-4-andrey.gruzdev@virtuozzo.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
dgilbert:
Bodged ifdef __linux__ on ram_write_tracking_prepare, should really
go in a stub
The same thing as for incoming postcopy - we cannot deal with concurrent
RAM discards when using background snapshot feature in outgoing migration.
Fixes: 8518278a6a (migration: implementation
of background snapshot thread)
Signed-off-by: Andrey Gruzdev <andrey.gruzdev@virtuozzo.com>
Reported-by: David Hildenbrand <david@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Message-Id: <20210401092226.102804-3-andrey.gruzdev@virtuozzo.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Added missing qemu_fflush() on buffer file holding precopy device state.
Increased initial QIOChannelBuffer allocation to 512KB to avoid reallocs.
Typical configurations often require >200KB for device state and VMDESC.
Fixes: 8518278a6a (migration: implementation
of background snapshot thread)
Signed-off-by: Andrey Gruzdev <andrey.gruzdev@virtuozzo.com>
Message-Id: <20210401092226.102804-2-andrey.gruzdev@virtuozzo.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
The generic 'migrate_set_parameters' command handle all types of param.
Only the QMP commands were documented in the deprecations page, but the
rationale for deprecating applies equally to HMP, and the replacements
exist. Furthermore the HMP commands are just shims to the QMP commands,
so removing the latter breaks the former unless they get re-implemented.
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Replaced various qemu_mutex_lock calls and their respective
qemu_mutex_unlock calls with QEMU_LOCK_GUARD macro. This simplifies
the code by eliminating the respective qemu_mutex_unlock calls.
Signed-off-by: Mahmoud Mandour <ma.mandourr@gmail.com>
Message-Id: <20210311031538.5325-7-ma.mandourr@gmail.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Modify query-migrate so that it has a flag indicating if outbound
migration is blocked, and if it is a list of reasons.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20210202135522.127380-2-dgilbert@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
migrate_params_check() has a number of error messages of the form
Parameter 'NAME' expects is invalid, it should be ...
Fix them to something like
Parameter 'NAME' expects a ...
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20210202141734.2488076-5-armbru@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
73af8dd8d7 "migration: Make xbzrle_cache_size a migration
parameter" (v2.11.0) made the new parameter unsigned (QAPI type
'size', uint64_t in C). It neglected to update existing code, which
continues to use int64_t.
migrate_xbzrle_cache_size() returns the new parameter. Adjust its
return type.
QMP query-migrate-cache-size returns migrate_xbzrle_cache_size().
Adjust its return type.
migrate-set-parameters passes the new parameter to
xbzrle_cache_resize(). Adjust its parameter type.
xbzrle_cache_resize() passes it on to cache_init(). Adjust its
parameter type.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20210202141734.2488076-3-armbru@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Introducing implementation of 'background' snapshot thread
which in overall follows the logic of precopy migration
while internally utilizes completely different mechanism
to 'freeze' vmstate at the start of snapshot creation.
This mechanism is based on userfault_fd with wr-protection
support and is Linux-specific.
Signed-off-by: Andrey Gruzdev <andrey.gruzdev@virtuozzo.com>
Acked-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20210129101407.103458-5-andrey.gruzdev@virtuozzo.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Add new capability to 'qapi/migration.json' schema.
Update migrate_caps_check() to validate enabled capability set
against introduced one. Perform checks for required kernel features
and compatibility with guest memory backends.
Signed-off-by: Andrey Gruzdev <andrey.gruzdev@virtuozzo.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20210129101407.103458-2-andrey.gruzdev@virtuozzo.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
These cases require a bit more thought to review; in each case, the
code was appending to a list, but not with a FOOList **tail variable.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210113221013.390592-6-eblake@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
[Flawed change to qmp_guest_network_get_interfaces() dropped]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Register yank functions on sockets to shut them down.
Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <484c6a14cc2506bebedd5a237259b91363ff8f88.1609167865.git.lukasstraub2@web.de>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEENUvIs9frKmtoZ05fOHC0AOuRhlMFAl/dynUSHGFybWJydUBy
ZWRoYXQuY29tAAoJEDhwtADrkYZT3igP/3bWwsKR5vKVsDUTmMfrhcgaFvQiaYoG
F29Bond8Xy0Zd0gl7OWh/5jKL0vGlrEVPrKfYLUjMnfkeRec/pOkIB2oOmIxpnPs
9zi4kh2hQ3dEoRBuvSnnZzedetYPTuCpWMIjlztkgfxgcimqm8TPNVSxRaSApjC3
Y8108wGwBWVf2C0rhKO9E2xA51uo6khy05i1psUtqUlC+PuDQ/OwzQHM2dnWdDB6
kUwBDK17nhL6WwsYqCyKLSiDModReYfDiY8GS5MDLo74dzwXiatEefCR7+sbM4xq
eX/SBoqoeS1jLPNuCryNeGNKvNA2KAbEJTnbQA2NxBXHgZ9/1SxVZFxuPp4nDMSQ
N7BDuDI8YtJE479RjT/ZzRG65xadGBSe/HXkXM9mZwh1zitop8SVZ9fArFBHvNzw
Y5zAv3fQd54+87psffg4dYFK0wGmqTabLEEuVzM8KIVqcAdYA2yC2b2EHy+vsxuq
GMkr0WaA6Sq2gthXmzdTjmUPuHdan/NIhuV6d66SbPNH2oH31piptFxuznyFWSKV
isciFFdUrkg5QrF8DSt2nmdwMFf8QGbszqP8QIGMzhJCCS9GXIiGG8f149++q8X8
HO1lFAdLQJdrDwCYmfx36tOvi2rS/rcoTGgvg66UX3xKko1ruoxR1ZWcS54obJN6
vEQDZ+PxubDg
=vGLy
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/armbru/tags/pull-qapi-2020-12-19' into staging
QAPI patches patches for 2020-12-19
# gpg: Signature made Sat 19 Dec 2020 09:40:05 GMT
# gpg: using RSA key 354BC8B3D7EB2A6B68674E5F3870B400EB918653
# gpg: issuer "armbru@redhat.com"
# gpg: Good signature from "Markus Armbruster <armbru@redhat.com>" [full]
# gpg: aka "Markus Armbruster <armbru@pond.sub.org>" [full]
# Primary key fingerprint: 354B C8B3 D7EB 2A6B 6867 4E5F 3870 B400 EB91 8653
* remotes/armbru/tags/pull-qapi-2020-12-19: (33 commits)
qobject: Make QString immutable
block: Use GString instead of QString to build filenames
keyval: Use GString to accumulate value strings
json: Use GString instead of QString to accumulate strings
migration: Replace migration's JSON writer by the general one
qobject: Factor JSON writer out of qobject_to_json()
qobject: Factor quoted_str() out of to_json()
qobject: Drop qstring_get_try_str()
qobject: Drop qobject_get_try_str()
Revert "qobject: let object_property_get_str() use new API"
block: Avoid qobject_get_try_str()
qmp: Fix tracing of non-string command IDs
qobject: Move internals to qobject-internal.h
hw/rdma: Replace QList by GQueue
Revert "qstring: add qstring_free()"
qobject: Change qobject_to_json()'s value to GString
qobject: Use GString instead of QString to accumulate JSON
qobject: Make qobject_to_json_pretty() take a pretty argument
monitor: Use GString instead of QString for output buffer
hmp: Simplify how qmp_human_monitor_command() gets output
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Anywhere we create a list of just one item or by prepending items
(typically because order doesn't matter), we can use
QAPI_LIST_PREPEND(). But places where we must keep the list in order
by appending remain open-coded until later patches.
Note that as a side effect, this also performs a cleanup of two minor
issues in qga/commands-posix.c: the old code was performing
new = g_malloc0(sizeof(*ret));
which 1) is confusing because you have to verify whether 'new' and
'ret' are variables with the same type, and 2) would conflict with C++
compilation (not an actual problem for this file, but makes
copy-and-paste harder).
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20201113011340.463563-5-eblake@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
[Straightforward conflicts due to commit a8aa94b5f8 "qga: update
schema for guest-get-disks 'dependents' field" and commit a10b453a52
"target/mips: Move mips_cpu_add_definition() from helper.c to cpu.c"
resolved. Commit message tweaked.]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Instead of taking a list parameter and returning a new head at a
distance, just return the new item for the caller to insert into a
list via QAPI_LIST_PREPEND.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20201113011340.463563-4-eblake@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Move the property types and property macros implemented in
qdev-properties-system.c to a new qdev-properties-system.h
header.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20201211220529.2290218-16-ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
The following steps will cause qemu assertion failure:
- pause vm by executing 'virsh suspend'
- create external snapshot of memory and disk using 'virsh snapshot-create-as'
- doing the above operation again will cause qemu crash
The backtrace looks like:
at /build/qemu-5.0/migration/savevm.c:1401
at /build/qemu-5.0/migration/savevm.c:1453
When the first migration completes, bs->open_flags will set BDRV_O_INACTIVE
flag by bdrv_inactivate_all(), and during the second migration the
bdrv_inactivate_recurse assert that the bs->open_flags is already
BDRV_O_INACTIVE enabled which cause crash.
As Vladimir suggested, this patch makes migrate_prepare check the state of vm and
return error if it is in RUN_STATE_POSTMIGRATE state.
Signed-off-by: Tuguoyi <tu.guoyi@h3c.com>
Message-Id: <6b704294ad2e405781c38fb38d68c744@h3c.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reported-by: Li Zhang <li.zhang@cloud.ionos.com>
Reviewed-by: Pankaj Gupta <pankaj.gupta@cloud.ionos.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Make qemu_start_incoming_migration local to migration/migration.c.
By using the runstate instead of a separate flag, vl need not do
anything to setup deferred incoming migration.
qmp_migrate_incoming also does not need the deferred_incoming flag
anymore, because "-incoming PROTOCOL" will clear the "once" flag
before the main loop starts. Therefore, later invocations of
the migrate-incoming command will fail with the existing
"The incoming migration has already been started" error message.
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Some very simple initialization routines can be nested in existing
subsystem-level functions, do that to simplify qemu_init.
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>