Commit Graph

2011 Commits

Author SHA1 Message Date
Juan Quintela
19df4f3226 migration/RDMA: It is accounting for zero/normal pages in two places
Remove the one in control_save_page().

Reviewed-by: Leonardo Bras <leobras@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-Id: <20230515195709.63843-12-quintela@redhat.com>
2023-09-29 18:11:21 +02:00
Juan Quintela
67c31c9c1a migration: Don't abuse qemu_file transferred for RDMA
Just create a variable for it, the same way that multifd does.  This
way it is safe to use for other thread, etc, etc.

Reviewed-by: Leonardo Bras <leobras@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-Id: <20230515195709.63843-11-quintela@redhat.com>
2023-09-29 18:11:21 +02:00
Juan Quintela
f16ecfa9f9 migration: Use qemu_file_transferred_noflush() for block migration.
We only care about the amount of bytes transferred.  Flushing is done
by the system somewhere else.

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230530183941.7223-4-quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-09-29 17:05:23 +02:00
Tejus GK
f4e1b61336 migration: Refactor repeated call of yank_unregister_instance
In the function qmp_migrate(), yank_unregister_instance() gets called
twice which isn't required. Hence, refactoring it so that it gets called
during the local_error cleanup.

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Acked-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Tejus GK <tejus.gk@nutanix.com>
Message-ID: <20230621130940.178659-3-tejus.gk@nutanix.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-09-29 17:05:23 +02:00
Markus Armbruster
7f3de3f02f migration: Clean up local variable shadowing
Local variables shadowing other local variables or parameters make the
code needlessly hard to understand.  Tracked down with -Wshadow=local.
Clean up: delete inner declarations when they are actually redundant,
else rename variables.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Message-ID: <20230921121312.1301864-3-armbru@redhat.com>
2023-09-29 08:13:57 +02:00
Markus Armbruster
bbde656263 migration/rdma: Fix save_page method to fail on polling error
qemu_rdma_save_page() reports polling error with error_report(), then
succeeds anyway.  This is because the variable holding the polling
status *shadows* the variable the function returns.  The latter
remains zero.

Broken since day one, and duplicated more recently.

Fixes: 2da776db48 (rdma: core logic)
Fixes: b390afd8c5 (migration/rdma: Fix out of order wrid)
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Message-ID: <20230921121312.1301864-2-armbru@redhat.com>
2023-09-29 08:13:57 +02:00
Fabiano Rosas
36e9aab3c5 migration: Move return path cleanup to main migration thread
Now that the return path thread is allowed to finish during a paused
migration, we can move the cleanup of the QEMUFiles to the main
migration thread.

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20230918172822.19052-9-farosas@suse.de>
2023-09-27 13:58:02 -04:00
Fabiano Rosas
ef796ee93b migration: Replace the return path retry logic
Replace the return path retry logic with finishing and restarting the
thread. This fixes a race when resuming the migration that leads to a
segfault.

Currently when doing postcopy we consider that an IO error on the
return path file could be due to a network intermittency. We then keep
the thread alive but have it do cleanup of the 'from_dst_file' and
wait on the 'postcopy_pause_rp' semaphore. When the user issues a
migrate resume, a new return path is opened and the thread is allowed
to continue.

There's a race condition in the above mechanism. It is possible for
the new return path file to be setup *before* the cleanup code in the
return path thread has had a chance to run, leading to the *new* file
being closed and the pointer set to NULL. When the thread is released
after the resume, it tries to dereference 'from_dst_file' and crashes:

Thread 7 "return path" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffd1dbf700 (LWP 9611)]
0x00005555560e4893 in qemu_file_get_error_obj (f=0x0, errp=0x0) at ../migration/qemu-file.c:154
154         return f->last_error;

(gdb) bt
 #0  0x00005555560e4893 in qemu_file_get_error_obj (f=0x0, errp=0x0) at ../migration/qemu-file.c:154
 #1  0x00005555560e4983 in qemu_file_get_error (f=0x0) at ../migration/qemu-file.c:206
 #2  0x0000555555b9a1df in source_return_path_thread (opaque=0x555556e06000) at ../migration/migration.c:1876
 #3  0x000055555602e14f in qemu_thread_start (args=0x55555782e780) at ../util/qemu-thread-posix.c:541
 #4  0x00007ffff38d76ea in start_thread (arg=0x7fffd1dbf700) at pthread_create.c:477
 #5  0x00007ffff35efa6f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Here's the race (important bit is open_return_path happening before
migration_release_dst_files):

migration                 | qmp                         | return path
--------------------------+-----------------------------+---------------------------------
			    qmp_migrate_pause()
			     shutdown(ms->to_dst_file)
			      f->last_error = -EIO
migrate_detect_error()
 postcopy_pause()
  set_state(PAUSED)
  wait(postcopy_pause_sem)
			    qmp_migrate(resume)
			    migrate_fd_connect()
			     resume = state == PAUSED
			     open_return_path <-- TOO SOON!
			     set_state(RECOVER)
			     post(postcopy_pause_sem)
							(incoming closes to_src_file)
							res = qemu_file_get_error(rp)
							migration_release_dst_files()
							ms->rp_state.from_dst_file = NULL
  post(postcopy_pause_rp_sem)
							postcopy_pause_return_path_thread()
							  wait(postcopy_pause_rp_sem)
							rp = ms->rp_state.from_dst_file
							goto retry
							qemu_file_get_error(rp)
							SIGSEGV
-------------------------------------------------------------------------------------------

We can keep the retry logic without having the thread alive and
waiting. The only piece of data used by it is the 'from_dst_file' and
it is only allowed to proceed after a migrate resume is issued and the
semaphore released at migrate_fd_connect().

Move the retry logic to outside the thread by waiting for the thread
to finish before pausing the migration.

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20230918172822.19052-8-farosas@suse.de>
2023-09-27 13:58:02 -04:00
Fabiano Rosas
d50f5dc075 migration: Consolidate return path closing code
We'll start calling the await_return_path_close_on_source() function
from other parts of the code, so move all of the related checks and
tracepoints into it.

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20230918172822.19052-7-farosas@suse.de>
2023-09-27 13:58:02 -04:00
Fabiano Rosas
b3b101157d migration: Remove redundant cleanup of postcopy_qemufile_src
This file is owned by the return path thread which is already doing
cleanup.

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20230918172822.19052-6-farosas@suse.de>
2023-09-27 13:58:02 -04:00
Fabiano Rosas
7478fb0df9 migration: Fix possible race when shutting down to_dst_file
It's not safe to call qemu_file_shutdown() on the to_dst_file without
first checking for the file's presence under the lock. The cleanup of
this file happens at postcopy_pause() and migrate_fd_cleanup() which
are not necessarily running in the same thread as migrate_fd_cancel().

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20230918172822.19052-5-farosas@suse.de>
2023-09-27 13:58:02 -04:00
Fabiano Rosas
639decf529 migration: Fix possible races when shutting down the return path
We cannot call qemu_file_shutdown() on the return path file without
taking the file lock. The return path thread could be running it's
cleanup code and have just cleared the from_dst_file pointer.

Checking ms->to_dst_file for errors could also race with
migrate_fd_cleanup() which clears the to_dst_file pointer.

Protect both accesses by taking the file lock.

This was caught by inspection, it should be rare, but the next patches
will start calling this code from other places, so let's do the
correct thing.

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20230918172822.19052-4-farosas@suse.de>
2023-09-27 13:58:02 -04:00
Fabiano Rosas
28a8347281 migration: Fix possible race when setting rp_state.error
We don't need to set the rp_state.error right after a shutdown because
qemu_file_shutdown() always sets the QEMUFile error, so the return
path thread would have seen it and set the rp error itself.

Setting the error outside of the thread is also racy because the
thread could clear it after we set it.

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20230918172822.19052-3-farosas@suse.de>
2023-09-27 13:58:02 -04:00
Peter Xu
cf02f29e1e migration: Fix race that dest preempt thread close too early
We hit intermit CI issue on failing at migration-test over the unit test
preempt/plain:

qemu-system-x86_64: Unable to read from socket: Connection reset by peer
Memory content inconsistency at 5b43000 first_byte = bd last_byte = bc current = 4f hit_edge = 1
**
ERROR:../tests/qtest/migration-test.c:300:check_guests_ram: assertion failed: (bad == 0)
(test program exited with status code -6)

Fabiano debugged into it and found that the preempt thread can quit even
without receiving all the pages, which can cause guest not receiving all
the pages and corrupt the guest memory.

To make sure preempt thread finished receiving all the pages, we can rely
on the page_requested_count being zero because preempt channel will only
receive requested page faults. Note, not all the faulted pages are required
to be sent via the preempt channel/thread; imagine the case when a
requested page is just queued into the background main channel for
migration, the src qemu will just still send it via the background channel.

Here instead of spinning over reading the count, we add a condvar so the
main thread can wait on it if that unusual case happened, without burning
the cpu for no good reason, even if the duration is short; so even if we
spin in this rare case is probably fine.  It's just better to not do so.

The condvar is only used when that special case is triggered.  Some memory
ordering trick is needed to guarantee it from happening (against the
preempt thread status field), so the main thread will always get a kick
when that triggers correctly.

Closes: https://gitlab.com/qemu-project/qemu/-/issues/1886
Debugged-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20230918172822.19052-2-farosas@suse.de>
2023-09-27 13:58:02 -04:00
Avihai Horon
08fc4cb517 migration: Add .save_prepare() handler to struct SaveVMHandlers
Add a new .save_prepare() handler to struct SaveVMHandlers. This handler
is called early, even before migration starts, and can be used by
devices to perform early checks.

Refactor migrate_init() to be able to return errors and call
.save_prepare() from there.

Suggested-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2023-09-11 08:34:06 +02:00
Avihai Horon
f543aa222d migration: Move more initializations to migrate_init()
Initialization of mig_stats, compression_counters and VFIO bytes
transferred is hard-coded in migration code path and snapshot code path.

Make the code cleaner by initializing them in migrate_init().

Suggested-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2023-09-11 08:34:06 +02:00
Avihai Horon
38c482b477 migration: Add migration prefix to functions in target.c
The functions in target.c are not static, yet they don't have a proper
migration prefix. Add such prefix.

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2023-09-11 08:34:06 +02:00
Stefan Hajnoczi
06e0f098d6 io: follow coroutine AioContext in qio_channel_yield()
The ongoing QEMU multi-queue block layer effort makes it possible for multiple
threads to process I/O in parallel. The nbd block driver is not compatible with
the multi-queue block layer yet because QIOChannel cannot be used easily from
coroutines running in multiple threads. This series changes the QIOChannel API
to make that possible.

In the current API, calling qio_channel_attach_aio_context() sets the
AioContext where qio_channel_yield() installs an fd handler prior to yielding:

  qio_channel_attach_aio_context(ioc, my_ctx);
  ...
  qio_channel_yield(ioc); // my_ctx is used here
  ...
  qio_channel_detach_aio_context(ioc);

This API design has limitations: reading and writing must be done in the same
AioContext and moving between AioContexts involves a cumbersome sequence of API
calls that is not suitable for doing on a per-request basis.

There is no fundamental reason why a QIOChannel needs to run within the
same AioContext every time qio_channel_yield() is called. QIOChannel
only uses the AioContext while inside qio_channel_yield(). The rest of
the time, QIOChannel is independent of any AioContext.

In the new API, qio_channel_yield() queries the AioContext from the current
coroutine using qemu_coroutine_get_aio_context(). There is no need to
explicitly attach/detach AioContexts anymore and
qio_channel_attach_aio_context() and qio_channel_detach_aio_context() are gone.
One coroutine can read from the QIOChannel while another coroutine writes from
a different AioContext.

This API change allows the nbd block driver to use QIOChannel from any thread.
It's important to keep in mind that the block driver already synchronizes
QIOChannel access and ensures that two coroutines never read simultaneously or
write simultaneously.

This patch updates all users of qio_channel_attach_aio_context() to the
new API. Most conversions are simple, but vhost-user-server requires a
new qemu_coroutine_yield() call to quiesce the vu_client_trip()
coroutine when not attached to any AioContext.

While the API is has become simpler, there is one wart: QIOChannel has a
special case for the iohandler AioContext (used for handlers that must not run
in nested event loops). I didn't find an elegant way preserve that behavior, so
I added a new API called qio_channel_set_follow_coroutine_ctx(ioc, true|false)
for opting in to the new AioContext model. By default QIOChannel uses the
iohandler AioHandler. Code that formerly called
qio_channel_attach_aio_context() now calls
qio_channel_set_follow_coroutine_ctx(ioc, true) once after the QIOChannel is
created.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Acked-by: Daniel P. Berrangé <berrange@redhat.com>
Message-ID: <20230830224802.493686-5-stefanha@redhat.com>
[eblake: also fix migration/rdma.c]
Signed-off-by: Eric Blake <eblake@redhat.com>
2023-09-07 20:32:11 -05:00
Stefan Hajnoczi
156618d9ea Pull request
v3:
 - Drop UFS emulation due to CI failures
 - Add "aio-posix: zero out io_uring sqe user_data"
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEhpWov9P5fNqsNXdanKSrs4Grc8gFAmTvLIEACgkQnKSrs4Gr
 c8itVggAka3RMkEclbeW7JKJBOolm3oUuJTobV8oJfDNMQ8mmom9JkXVUctyPWQT
 EF+oeqZz1omjr0Dk7YEA2toCahTbXm/UsG7i6cZg8JXPl6e9sOne0j+p5zO5x/kc
 YlG43SBQJHdp/BfTm/gvwUh0W2on0wadaeEV82m3ZyIrZGTgNcrC1p1gj5dwF5VX
 SqW02mgALETECyJpo8O7y9vNUYGxEtETG9jzAhtrugGpYk4bPeXlm/rc+2zwV+ET
 YCnfUvhjhlu5vS4nkta6natg0If16ODjy35vWYm/aGlgveGTqQq9HWgTL71eNuxm
 Smn+hJHuvkyBclKjbGiiO1W1MuG1/g==
 =UvNK
 -----END PGP SIGNATURE-----

Merge tag 'block-pull-request' of https://gitlab.com/stefanha/qemu into staging

Pull request

v3:
- Drop UFS emulation due to CI failures
- Add "aio-posix: zero out io_uring sqe user_data"

# -----BEGIN PGP SIGNATURE-----
#
# iQEzBAABCAAdFiEEhpWov9P5fNqsNXdanKSrs4Grc8gFAmTvLIEACgkQnKSrs4Gr
# c8itVggAka3RMkEclbeW7JKJBOolm3oUuJTobV8oJfDNMQ8mmom9JkXVUctyPWQT
# EF+oeqZz1omjr0Dk7YEA2toCahTbXm/UsG7i6cZg8JXPl6e9sOne0j+p5zO5x/kc
# YlG43SBQJHdp/BfTm/gvwUh0W2on0wadaeEV82m3ZyIrZGTgNcrC1p1gj5dwF5VX
# SqW02mgALETECyJpo8O7y9vNUYGxEtETG9jzAhtrugGpYk4bPeXlm/rc+2zwV+ET
# YCnfUvhjhlu5vS4nkta6natg0If16ODjy35vWYm/aGlgveGTqQq9HWgTL71eNuxm
# Smn+hJHuvkyBclKjbGiiO1W1MuG1/g==
# =UvNK
# -----END PGP SIGNATURE-----
# gpg: Signature made Wed 30 Aug 2023 07:48:17 EDT
# gpg:                using RSA key 8695A8BFD3F97CDAAC35775A9CA4ABB381AB73C8
# gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>" [ultimate]
# gpg:                 aka "Stefan Hajnoczi <stefanha@gmail.com>" [ultimate]
# Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35  775A 9CA4 ABB3 81AB 73C8

* tag 'block-pull-request' of https://gitlab.com/stefanha/qemu:
  aio-posix: zero out io_uring sqe user_data
  tests/qemu-iotests/197: add testcase for CoR with subclusters
  block/io: align requests to subcluster_size
  block: add subcluster_size field to BlockDriverInfo
  block-migration: Ensure we don't crash during migration cleanup

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-08-30 09:20:27 -04:00
Fabiano Rosas
f187609f27 block-migration: Ensure we don't crash during migration cleanup
We can fail the blk_insert_bs() at init_blk_migration(), leaving the
BlkMigDevState without a dirty_bitmap and BlockDriverState. Account
for the possibly missing elements when doing cleanup.

Fix the following crashes:

Thread 1 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
0x0000555555ec83ef in bdrv_release_dirty_bitmap (bitmap=0x0) at ../block/dirty-bitmap.c:359
359         BlockDriverState *bs = bitmap->bs;
 #0  0x0000555555ec83ef in bdrv_release_dirty_bitmap (bitmap=0x0) at ../block/dirty-bitmap.c:359
 #1  0x0000555555bba331 in unset_dirty_tracking () at ../migration/block.c:371
 #2  0x0000555555bbad98 in block_migration_cleanup_bmds () at ../migration/block.c:681

Thread 1 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
0x0000555555e971ff in bdrv_op_unblock (bs=0x0, op=BLOCK_OP_TYPE_BACKUP_SOURCE, reason=0x0) at ../block.c:7073
7073        QLIST_FOREACH_SAFE(blocker, &bs->op_blockers[op], list, next) {
 #0  0x0000555555e971ff in bdrv_op_unblock (bs=0x0, op=BLOCK_OP_TYPE_BACKUP_SOURCE, reason=0x0) at ../block.c:7073
 #1  0x0000555555e9734a in bdrv_op_unblock_all (bs=0x0, reason=0x0) at ../block.c:7095
 #2  0x0000555555bbae13 in block_migration_cleanup_bmds () at ../migration/block.c:690

Signed-off-by: Fabiano Rosas <farosas@suse.de>
Message-id: 20230731203338.27581-1-farosas@suse.de
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-08-30 07:39:10 -04:00
Andrei Gudkov
3eb82637fb migration/dirtyrate: Fix precision losses and g_usleep overshoot
Signed-off-by: Andrei Gudkov <gudkov.andrei@huawei.com>
Reviewed-by: Hyman Huang <yong.huang@smartx.com>
Message-Id: <8ddb0d40d143f77aab8f602bd494e01e5fa01614.1691161009.git.gudkov.andrei@huawei.com>
Signed-off-by: Hyman Huang <yong.huang@smartx.com>
2023-08-29 10:19:03 +08:00
Juan Quintela
697c4c86ab migration/rdma: Split qemu_fopen_rdma() into input/output functions
This is how everything else in QEMUFile is structured.
As a bonus they are three less lines of code.

Reviewed-by: Peter Xu <peterx@redhat.com>
Message-ID: <20230530183941.7223-17-quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-07-26 10:55:56 +02:00
Juan Quintela
ac6f48e15d qemu-file: Make qemu_file_get_error_obj() static
It was not used outside of qemu_file.c anyways.

Reviewed-by: Peter Xu <peterx@redhat.com>
Message-ID: <20230530183941.7223-21-quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-07-26 10:55:56 +02:00
Juan Quintela
8c5ee0bfb8 qemu-file: Simplify qemu_file_shutdown()
Reviewed-by: Peter Xu <peterx@redhat.com>
Message-ID: <20230530183941.7223-20-quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-07-26 10:55:56 +02:00
Juan Quintela
9ccf83f486 qemu_file: Make qemu_file_is_writable() static
It is not used outside of qemu_file, and it shouldn't.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230530183941.7223-19-quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-07-26 10:55:56 +02:00
Juan Quintela
cf786549ce migration: Change qemu_file_transferred to noflush
We do a qemu_fclose() just after that, that also does a qemu_fflush(),
so remove one qemu_fflush().

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20230530183941.7223-3-quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-07-26 10:55:56 +02:00
Juan Quintela
fc95c63b60 qemu-file: Rename qemu_file_transferred_ fast -> noflush
Fast don't say much.  Noflush indicates more clearly that it is like
qemu_file_transferred but without the flush.

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20230530183941.7223-2-quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-07-26 10:55:56 +02:00
Wei Wang
82137e6c8c migration: enforce multifd and postcopy preempt to be set before incoming
qemu_start_incoming_migration needs to check the number of multifd
channels or postcopy ram channels to configure the backlog parameter (i.e.
the maximum length to which the queue of pending connections for sockfd
may grow) of listen(). So enforce the usage of postcopy-preempt and
multifd as below:
- need to use "-incoming defer" on the destination; and
- set_capability and set_parameter need to be done before migrate_incoming

Otherwise, disable the use of the features and report error messages to
remind users to adjust the commands.

Signed-off-by: Wei Wang <wei.w.wang@intel.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230606101910.20456-2-wei.w.wang@intel.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Acked-by: Juan Quintela <quintela@redhat.com>
2023-07-26 10:55:56 +02:00
Tejus GK
908927db28 migration: Update error description whenever migration fails
There are places in migration.c where the migration is marked failed with
MIGRATION_STATUS_FAILED, but the failure reason is never updated. Hence
libvirt doesn't know why the migration failed when it queries for it.

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Tejus GK <tejus.gk@nutanix.com>
Message-ID: <20230621130940.178659-2-tejus.gk@nutanix.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-07-26 10:55:56 +02:00
Hyman Huang(黄勇)
15699cf542 migration: Extend query-migrate to provide dirty page limit info
Extend query-migrate to provide throttle time and estimated
ring full time with dirty-limit capability enabled, through which
we can observe if dirty limit take effect during live migration.

Signed-off-by: Hyman Huang(黄勇) <yong.huang@smartx.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-ID: <168733225273.5845.15871826788879741674-8@git.sr.ht>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-07-26 10:55:56 +02:00
Hyman Huang(黄勇)
acac51ba24 migration: Implement dirty-limit convergence algo
Implement dirty-limit convergence algo for live migration,
which is kind of like auto-converge algo but using dirty-limit
instead of cpu throttle to make migration convergent.

Enable dirty page limit if dirty_rate_high_cnt greater than 2
when dirty-limit capability enabled, Disable dirty-limit if
migration be canceled.

Note that "set_vcpu_dirty_limit", "cancel_vcpu_dirty_limit"
commands are not allowed during dirty-limit live migration.

Signed-off-by: Hyman Huang(黄勇) <yong.huang@smartx.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Message-ID: <168733225273.5845.15871826788879741674-7@git.sr.ht>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-07-26 10:55:56 +02:00
Hyman Huang(黄勇)
310ad5625e migration: Put the detection logic before auto-converge checking
This commit is prepared for the implementation of dirty-limit
convergence algo.

The detection logic of throttling condition can apply to both
auto-converge and dirty-limit algo, putting it's position
before the checking logic for auto-converge feature.

Signed-off-by: Hyman Huang(黄勇) <yong.huang@smartx.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-ID: <168733225273.5845.15871826788879741674-6@git.sr.ht>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-07-26 10:55:56 +02:00
Hyman Huang(黄勇)
bb9993c672 migration: Refactor auto-converge capability logic
Check if block migration is running before throttling
guest down in auto-converge way.

Note that this modification is kind of like code clean,
because block migration does not depend on auto-converge
capability, so the order of checks can be adjusted.

Signed-off-by: Hyman Huang(黄勇) <yong.huang@smartx.com>
Acked-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-Id: <168618975839.6361.17407633874747688653-5@git.sr.ht>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-07-26 10:55:56 +02:00
Hyman Huang(黄勇)
dc62395557 migration: Introduce dirty-limit capability
Introduce migration dirty-limit capability, which can
be turned on before live migration and limit dirty
page rate durty live migration.

Introduce migrate_dirty_limit function to help check
if dirty-limit capability enabled during live migration.

Meanwhile, refactor vcpu_dirty_rate_stat_collect
so that period can be configured instead of hardcoded.

dirty-limit capability is kind of like auto-converge
but using dirty limit instead of traditional cpu-throttle
to throttle guest down. To enable this feature, turn on
the dirty-limit capability before live migration using
migrate-set-capabilities, and set the parameters
"x-vcpu-dirty-limit-period", "vcpu-dirty-limit" suitably
to speed up convergence.

Signed-off-by: Hyman Huang(黄勇) <yong.huang@smartx.com>
Acked-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-Id: <168618975839.6361.17407633874747688653-4@git.sr.ht>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-07-26 10:55:56 +02:00
Hyman Huang(黄勇)
09f9ec9913 qapi/migration: Introduce vcpu-dirty-limit parameters
Introduce "vcpu-dirty-limit" migration parameter used
to limit dirty page rate during live migration.

"vcpu-dirty-limit" and "x-vcpu-dirty-limit-period" are
two dirty-limit-related migration parameters, which can
be set before and during live migration by qmp
migrate-set-parameters.

This two parameters are used to help implement the dirty
page rate limit algo of migration.

Signed-off-by: Hyman Huang(黄勇) <yong.huang@smartx.com>
Acked-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-Id: <168618975839.6361.17407633874747688653-3@git.sr.ht>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-07-26 10:55:56 +02:00
Hyman Huang(黄勇)
4d80785719 qapi/migration: Introduce x-vcpu-dirty-limit-period parameter
Introduce "x-vcpu-dirty-limit-period" migration experimental
parameter, which is in the range of 1 to 1000ms and used to
make dirtyrate calculation period configurable.

Currently with the "x-vcpu-dirty-limit-period" varies, the
total time of live migration changes, test results show the
optimal value of "x-vcpu-dirty-limit-period" ranges from
500ms to 1000 ms. "x-vcpu-dirty-limit-period" should be made
stable once it proves best value can not be determined with
developer's experiments.

Signed-off-by: Hyman Huang(黄勇) <yong.huang@smartx.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-Id: <168618975839.6361.17407633874747688653-2@git.sr.ht>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-07-26 10:55:56 +02:00
Fabiano Rosas
01ec0f3a92 migration/multifd: Protect accesses to migration_threads
This doubly linked list is common for all the multifd and migration
threads so we need to avoid concurrent access.

Add a mutex to protect the data from concurrent access. This fixes a
crash when removing two MigrationThread objects from the list at the
same time during cleanup of multifd threads.

Fixes: 671326201d ("migration: Introduce interface query-migrationthreads")
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-Id: <20230607161306.31425-3-farosas@suse.de>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-07-26 10:55:56 +02:00
Fabiano Rosas
788fa68041 migration/multifd: Rename threadinfo.c functions
We're about to add more functions to this file so make it use the same
coding style as the rest of the code.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Peter Xu <peterx@redhat.com>
Message-Id: <20230607161306.31425-2-farosas@suse.de>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-07-26 10:55:56 +02:00
Michael Tokarev
d8b71d96b3 migration: spelling fixes
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
2023-07-25 17:13:20 +03:00
David Hildenbrand
f161c88a03 migration/ram: Expose ramblock_is_ignored() as migrate_ram_is_ignored()
virtio-mem wants to know whether it should not mess with the RAMBlock
content (e.g., discard RAM, preallocate memory) on incoming migration.

So let's expose that function as migrate_ram_is_ignored() in
migration/misc.h

Message-ID: <20230706075612.67404-4-david@redhat.com>
Acked-by: Peter Xu <peterx@redhat.com>
Tested-by: Mario Casquero <mcasquer@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
2023-07-12 09:25:37 +02:00
Laszlo Ersek
aaf26bd382 migration: unexport migrate_fd_error()
The only migrate_fd_error() call sites are in "migration/migration.c",
which is also where we define migrate_fd_error(). Make the function
static, and remove its declaration from "migration/migration.h".

Cc: Juan Quintela <quintela@redhat.com> (maintainer:Migration)
Cc: Leonardo Bras <leobras@redhat.com> (reviewer:Migration)
Cc: Peter Xu <peterx@redhat.com> (reviewer:Migration)
Cc: qemu-trivial@nongnu.org
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2018404
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Michael Tokarev <mjt@tls.msk.ru>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2023-07-08 07:24:38 +03:00
Laszlo Ersek
8c69ae9eff migration: factor out "resume_requested" in qmp_migrate()
It cuts back on those awkward, duplicated !(has_resume && resume)
expressions.

Cc: Juan Quintela <quintela@redhat.com> (maintainer:Migration)
Cc: Leonardo Bras <leobras@redhat.com> (reviewer:Migration)
Cc: Peter Xu <peterx@redhat.com> (reviewer:Migration)
Cc: qemu-trivial@nongnu.org
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2018404
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Michael Tokarev <mjt@tls.msk.ru>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2023-07-08 07:24:38 +03:00
Avihai Horon
808642a2f6 vfio/migration: Reset bytes_transferred properly
Currently, VFIO bytes_transferred is not reset properly:
1. bytes_transferred is not reset after a VM snapshot (so a migration
   following a snapshot will report incorrect value).
2. bytes_transferred is a single counter for all VFIO devices, however
   upon migration failure it is reset multiple times, by each VFIO
   device.

Fix it by introducing a new function vfio_reset_bytes_transferred() and
calling it during migration and snapshot start.

Remove existing bytes_transferred reset in VFIO migration state
notifier, which is not needed anymore.

Fixes: 3710586caa ("qapi: Add VFIO devices migration stats in Migration stats")
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2023-06-30 06:02:51 +02:00
Avihai Horon
538ef4fe2f migration: Enable switchover ack capability
Now that switchover ack logic has been implemented, enable the
capability.

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Tested-by: YangHang Liu <yanghliu@redhat.com>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2023-06-30 06:02:51 +02:00
Avihai Horon
1b4adb10f8 migration: Implement switchover ack logic
Implement switchover ack logic. This prevents the source from stopping
the VM and completing the migration until an ACK is received from the
destination that it's OK to do so.

To achieve this, a new SaveVMHandlers handler switchover_ack_needed()
and a new return path message MIG_RP_MSG_SWITCHOVER_ACK are added.

The switchover_ack_needed() handler is called during migration setup in
the destination to check if switchover ack is used by the migrated
device.

When switchover is approved by all migrated devices in the destination
that support this capability, the MIG_RP_MSG_SWITCHOVER_ACK return path
message is sent to the source to notify it that it's OK to do
switchover.

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Tested-by: YangHang Liu <yanghliu@redhat.com>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2023-06-30 06:02:51 +02:00
Avihai Horon
6574232fff migration: Add switchover ack capability
Migration downtime estimation is calculated based on bandwidth and
remaining migration data. This assumes that loading of migration data in
the destination takes a negligible amount of time and that downtime
depends only on network speed.

While this may be true for RAM, it's not necessarily true for other
migrated devices. For example, loading the data of a VFIO device in the
destination might require from the device to allocate resources, prepare
internal data structures and so on. These operations can take a
significant amount of time which can increase migration downtime.

This patch adds a new capability "switchover ack" that prevents the
source from stopping the VM and completing the migration until an ACK
is received from the destination that it's OK to do so.

This can be used by migrated devices in various ways to reduce downtime.
For example, a device can send initial precopy metadata to pre-allocate
resources in the destination and use this capability to make sure that
the pre-allocation is completed before the source VM is stopped, so it
will have full effect.

This new capability relies on the return path capability to communicate
from the destination back to the source.

The actual implementation of the capability will be added in the
following patches.

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Tested-by: YangHang Liu <yanghliu@redhat.com>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2023-06-30 06:02:51 +02:00
Philippe Mathieu-Daudé
de6cd7599b meson: Replace softmmu_ss -> system_ss
We use the user_ss[] array to hold the user emulation sources,
and the softmmu_ss[] array to hold the system emulation ones.
Hold the latter in the 'system_ss[]' array for parity with user
emulation.

Mechanical change doing:

  $ sed -i -e s/softmmu_ss/system_ss/g $(git grep -l softmmu_ss)

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20230613133347.82210-10-philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-06-20 10:01:30 +02:00
Philippe Mathieu-Daudé
c7b64948f8 meson: Replace CONFIG_SOFTMMU -> CONFIG_SYSTEM_ONLY
Since we *might* have user emulation with softmmu,
use the clearer 'CONFIG_SYSTEM_ONLY' key to check
for system emulation.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20230613133347.82210-9-philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-06-20 10:01:30 +02:00
Steve Sistare
b0182e537e exec/memory: Introduce RAM_NAMED_FILE flag
migrate_ignore_shared() is an optimization that avoids copying memory
that is visible and can be mapped on the target.  However, a
memory-backend-ram or a memory-backend-memfd block with the RAM_SHARED
flag set is not migrated when migrate_ignore_shared() is true.  This is
wrong, because the block has no named backing store, and its contents will
be lost.  To fix, ignore shared memory iff it is a named file.  Define a
new flag RAM_NAMED_FILE to distinguish this case.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Message-Id: <1686151116-253260-1-git-send-email-steven.sistare@oracle.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
2023-06-13 11:28:58 +02:00
Philippe Mathieu-Daudé
7d5b0d6864 bulk: Remove pointless QOM casts
Mechanical change running Coccinelle spatch with content
generated from the qom-cast-macro-clean-cocci-gen.py added
in the previous commit.

Suggested-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20230601093452.38972-3-philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Thomas Huth <thuth@redhat.com>
2023-06-05 20:48:34 +02:00
Fiona Ebner
3a8b81f2e6 migration: stop tracking ram writes when cancelling background migration
Currently, it is only done when the iteration finishes successfully.
Not cleaning up the userfaultfd write protection can lead to
symptoms/issues such as the process hanging in memmove or GDB not
being able to attach.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Message-Id: <20230526115908.196171-1-f.ebner@proxmox.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-06-02 01:03:19 +02:00
Vladimir Sementsov-Ogievskiy
a4c6275aa1 migration: restore vmstate on migration failure
1. Otherwise failed migration just drops guest-panicked state, which is
   not good for management software.

2. We do keep different paused states like guest-panicked during
   migration with help of global_state state.

3. We do restore running state on source when migration is cancelled or
   failed.

4. "postmigrate" state is documented as "guest is paused following a
   successful 'migrate'", so originally it's only for successful path
   and we never documented current behavior.

Let's restore paused states like guest-panicked in case of cancel or
fail too. Allow same transitions like for inmigrate state.

This commit changes the behavior that was introduced by commit
42da5550d6 "migration: set state to post-migrate on failure" and
provides a bit different fix on related
  https://bugzilla.redhat.com/show_bug.cgi?id=1355683

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-Id: <20230517123752.21615-6-vsementsov@yandex-team.ru>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-06-02 01:03:19 +02:00
Vladimir Sementsov-Ogievskiy
f4584076fc migration: switch from .vm_was_running to .vm_old_state
No logic change here, only refactoring. That's a preparation for next
commit where we finally restore the stopped vm state on migration
failure or cancellation.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-Id: <20230517123752.21615-5-vsementsov@yandex-team.ru>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-06-02 01:03:19 +02:00
Vladimir Sementsov-Ogievskiy
c33f1829f8 migration: never fail in global_state_store()
Actually global_state_store() can never fail. Let's get rid of extra
error paths.

To make things clear, use new runstate_get() and use same approach for
global_state_store() and global_state_store_running().

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-Id: <20230517123752.21615-3-vsementsov@yandex-team.ru>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-06-02 01:03:19 +02:00
Stefan Hajnoczi
60f782b6b7 aio: remove aio_disable_external() API
All callers now pass is_external=false to aio_set_fd_handler() and
aio_set_event_notifier(). The aio_disable_external() API that
temporarily disables fd handlers that were registered is_external=true
is therefore dead code.

Remove aio_disable_external(), aio_enable_external(), and the
is_external arguments to aio_set_fd_handler() and
aio_set_event_notifier().

The entire test-fdmon-epoll test is removed because its sole purpose was
testing aio_disable_external().

Parts of this patch were generated using the following coccinelle
(https://coccinelle.lip6.fr/) semantic patch:

  @@
  expression ctx, fd, is_external, io_read, io_write, io_poll, io_poll_ready, opaque;
  @@
  - aio_set_fd_handler(ctx, fd, is_external, io_read, io_write, io_poll, io_poll_ready, opaque)
  + aio_set_fd_handler(ctx, fd, io_read, io_write, io_poll, io_poll_ready, opaque)

  @@
  expression ctx, notifier, is_external, io_read, io_poll, io_poll_ready;
  @@
  - aio_set_event_notifier(ctx, notifier, is_external, io_read, io_poll, io_poll_ready)
  + aio_set_event_notifier(ctx, notifier, io_read, io_poll, io_poll_ready)

Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20230516190238.8401-21-stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-05-30 17:37:26 +02:00
Richard Henderson
b5c0d842d6 migration: Build migration_files once
The items in migration_files are built for libmigration and included
info softmmu_ss from there; no need to also include them directly.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-05-23 16:51:18 -07:00
Richard Henderson
7ba7db9fa1 migration/xbzrle: Use i386 host/cpuinfo.h
Perform the function selection once, and only if CONFIG_AVX512_OPT
is enabled.  Centralize the selection to xbzrle.c, instead of
spreading the init across 3 files.

Remove xbzrle-bench.c.  The benefit of being able to benchmark
the different implementations is less important than not peeking
into the internals of the implementation.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-05-23 16:51:18 -07:00
Richard Henderson
1b48d0abdf migration/xbzrle: Shuffle function order
Place the CONFIG_AVX512BW_OPT block at the top,
which will aid function selection in the next patch.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-05-23 16:51:18 -07:00
Richard Henderson
146f515110 Migration Pull request
Hi
 
 Based on latest reviewed parts of migration:
 - Disable colo (vladimir)
 - Migration atomic counters (juan)
 
 Please apply.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEGJn/jt6/WMzuA0uC9IfvGFhy1yMFAmRmXJUACgkQ9IfvGFhy
 1yNRAxAAjDYJELL34Qovt/WE9qKhYJEvIUGTl1IMWJ22YMFnqIFKRdka57dWoU3P
 7EK1BHmokEEtzGT7Fe1ecERXsOwQIJDIkDTJ5g8Oc8Jt1iqY1AC8h5T+LghijCar
 mbZ6qWHaSjsg2lmek/xc9quymzFGGK36PSyB5WkaLRviKQn4RIkEDpUaWny7nDbA
 Q8zJJpBqNFqKfC5/DN0ePa3QQscXQJhey3nxqFd8hYp8RFNIV5UJVW5Lf6ombtK7
 atgdWC4ckkfO2z3OsghKeo/UiMFWpPktgBVVMhDLmk+P/E6czc2gfzD6SCvrPKTj
 XowI8hro22HVmq9bEY8PtbjMOfpxrAxer+tM2KR/0O9l3UzUacFsi7KGqCJ1/trQ
 1tSDjlgyczb8GOgLwwxj8XE+jPHPfVrzCNfDqrBKBNxz6nnZSdZUwhV5mG8FdVtm
 oVVV96BIrNXLl/lIxYIFD/Zyvl8/lrSWQdLkEHTzihYQeXaQfyvPVbV/dOLT4sii
 YUuGCuEhF+DW/qz43G1krwq5/bfxsiZoQzrMV/Odtf0wYQKkabA3KNBIda/vxBCR
 dsLQ7QtmOwKmCzjqw4LUov9vDNYOYr98o7ZqwJ3qeKL4QgFwtEZUFO3VW6UR8fnF
 arVXiTn9wVlkTpu4sT5hLm9400iadhX4Fppji7Ce0tUpLbWbghA=
 =3x32
 -----END PGP SIGNATURE-----

Merge tag 'migration-20230518-pull-request' of https://gitlab.com/juan.quintela/qemu into staging

Migration Pull request

Hi

Based on latest reviewed parts of migration:
- Disable colo (vladimir)
- Migration atomic counters (juan)

Please apply.

# -----BEGIN PGP SIGNATURE-----
#
# iQIzBAABCAAdFiEEGJn/jt6/WMzuA0uC9IfvGFhy1yMFAmRmXJUACgkQ9IfvGFhy
# 1yNRAxAAjDYJELL34Qovt/WE9qKhYJEvIUGTl1IMWJ22YMFnqIFKRdka57dWoU3P
# 7EK1BHmokEEtzGT7Fe1ecERXsOwQIJDIkDTJ5g8Oc8Jt1iqY1AC8h5T+LghijCar
# mbZ6qWHaSjsg2lmek/xc9quymzFGGK36PSyB5WkaLRviKQn4RIkEDpUaWny7nDbA
# Q8zJJpBqNFqKfC5/DN0ePa3QQscXQJhey3nxqFd8hYp8RFNIV5UJVW5Lf6ombtK7
# atgdWC4ckkfO2z3OsghKeo/UiMFWpPktgBVVMhDLmk+P/E6czc2gfzD6SCvrPKTj
# XowI8hro22HVmq9bEY8PtbjMOfpxrAxer+tM2KR/0O9l3UzUacFsi7KGqCJ1/trQ
# 1tSDjlgyczb8GOgLwwxj8XE+jPHPfVrzCNfDqrBKBNxz6nnZSdZUwhV5mG8FdVtm
# oVVV96BIrNXLl/lIxYIFD/Zyvl8/lrSWQdLkEHTzihYQeXaQfyvPVbV/dOLT4sii
# YUuGCuEhF+DW/qz43G1krwq5/bfxsiZoQzrMV/Odtf0wYQKkabA3KNBIda/vxBCR
# dsLQ7QtmOwKmCzjqw4LUov9vDNYOYr98o7ZqwJ3qeKL4QgFwtEZUFO3VW6UR8fnF
# arVXiTn9wVlkTpu4sT5hLm9400iadhX4Fppji7Ce0tUpLbWbghA=
# =3x32
# -----END PGP SIGNATURE-----
# gpg: Signature made Thu 18 May 2023 10:12:53 AM PDT
# gpg:                using RSA key 1899FF8EDEBF58CCEE034B82F487EF185872D723
# gpg: Good signature from "Juan Quintela <quintela@redhat.com>" [undefined]
# gpg:                 aka "Juan Quintela <quintela@trasno.org>" [undefined]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 1899 FF8E DEBF 58CC EE03  4B82 F487 EF18 5872 D723

* tag 'migration-20230518-pull-request' of https://gitlab.com/juan.quintela/qemu:
  migration: Fix duplicated included in meson.build
  migration/multifd: Compute transferred bytes correctly
  migration: We don't need the field rate_limit_used anymore
  migration: Use migration_transferred_bytes() to calculate rate_limit
  migration: Add a trace for migration_transferred_bytes
  migration: Move migration_total_bytes() to migration-stats.c
  migration: Move rate_limit_max and rate_limit_used to migration_stats
  qemu-file: Account for rate_limit usage on qemu_fflush()
  migration: Don't use INT64_MAX for unlimited rate
  migration: process_incoming_migration_co(): move colo part to colo
  migration: split migration_incoming_co
  configure: add --disable-colo-proxy option

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-05-18 11:07:06 -07:00
Juan Quintela
ba9d2cbc01 migration: Fix duplicated included in meson.build
This is the commint with the merge error (not in the submited patch).

commit 52623f23b0
Author: Lukas Straub <lukasstraub2@web.de>
Date:   Thu Apr 20 11:48:35 2023 +0200

    ram-compress.c: Make target independent

    Make ram-compress.c target independent.

Fixes: 52623f23b0
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Message-Id: <20230509170217.83246-1-quintela@redhat.com>
2023-05-18 18:41:53 +02:00
Juan Quintela
cbec7eb768 migration/multifd: Compute transferred bytes correctly
In the past, we had to put the in the main thread all the operations
related with sizes due to qemu_file not beeing thread safe.  As now
all counters are atomic, we can update the counters just after the
do the write.  As an aditional bonus, we are able to use the right
value for the compression methods.  Right now we were assuming that
there were no compression at all.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Message-Id: <20230515195709.63843-17-quintela@redhat.com>
2023-05-18 18:41:46 +02:00
Juan Quintela
bd7ceaf6d5 migration: We don't need the field rate_limit_used anymore
Since previous commit, we calculate how much data we have send with
migration_transferred_bytes() so no need to maintain this counter and
remember to always update it.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20230515195709.63843-10-quintela@redhat.com>
2023-05-18 18:40:51 +02:00
Juan Quintela
813cd61669 migration: Use migration_transferred_bytes() to calculate rate_limit
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20230515195709.63843-9-quintela@redhat.com>
2023-05-18 18:40:51 +02:00
Juan Quintela
3db9c05a90 migration: Add a trace for migration_transferred_bytes
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20230515195709.63843-8-quintela@redhat.com>
2023-05-18 18:40:51 +02:00
Juan Quintela
99319e2daf migration: Move migration_total_bytes() to migration-stats.c
Once there rename it to migration_transferred_bytes() and pass a
QEMUFile instead of a migration object.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20230515195709.63843-7-quintela@redhat.com>
2023-05-18 18:40:51 +02:00
Juan Quintela
e1fde0e038 migration: Move rate_limit_max and rate_limit_used to migration_stats
These way we can make them atomic and use this functions from any
place.  I also moved all functions that use rate_limit to
migration-stats.

Functions got renamed, they are not qemu_file anymore.

qemu_file_rate_limit -> migration_rate_exceeded
qemu_file_set_rate_limit -> migration_rate_set
qemu_file_get_rate_limit -> migration_rate_get
qemu_file_reset_rate_limit -> migration_rate_reset
qemu_file_acct_rate_limit -> migration_rate_account.

Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-Id: <20230515195709.63843-6-quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-05-18 18:40:51 +02:00
Juan Quintela
de37f8b9c2 qemu-file: Account for rate_limit usage on qemu_fflush()
That is the moment we know we have transferred something.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20230515195709.63843-5-quintela@redhat.com>
2023-05-18 18:40:51 +02:00
Juan Quintela
8e4b2a7059 migration: Don't use INT64_MAX for unlimited rate
Define and use RATE_LIMIT_DISABLED instead.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
Message-Id: <20230515195709.63843-2-quintela@redhat.com>
2023-05-18 18:40:51 +02:00
Vladimir Sementsov-Ogievskiy
d0a14a2ba0 migration: process_incoming_migration_co(): move colo part to colo
Let's make better public interface for COLO: instead of
colo_process_incoming_thread and not trivial logic around creating the
thread let's make simple colo_incoming_co(), hiding implementation from
generic code.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-Id: <20230515130640.46035-4-vsementsov@yandex-team.ru>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-05-18 18:40:51 +02:00
Vladimir Sementsov-Ogievskiy
dd42ce24a3 migration: split migration_incoming_co
Originally, migration_incoming_co was introduced by
25d0c16f62
   "migration: Switch to COLO process after finishing loadvm"
to be able to enter from COLO code to one specific yield point, added
by 25d0c16f62.

Later in 923709896b
 "migration: poll the cm event for destination qemu"
we reused this variable to wake the migration incoming coroutine from
RDMA code.

That was doubtful idea. Entering coroutines is a very fragile thing:
you should be absolutely sure which yield point you are going to enter.

I don't know how much is it safe to enter during qemu_loadvm_state()
which I think what RDMA want to do. But for sure RDMA shouldn't enter
the special COLO-related yield-point. As well, COLO code doesn't want
to enter during qemu_loadvm_state(), it want to enter it's own specific
yield-point.

As well, when in 8e48ac9586
 "COLO: Add block replication into colo process" we added
bdrv_invalidate_cache_all() call (now it's called activate_all())
it became possible to enter the migration incoming coroutine during
that call which is wrong too.

So, let't make these things separate and disjoint: loadvm_co for RDMA,
non-NULL during qemu_loadvm_state(), and colo_incoming_co for COLO,
non-NULL only around specific yield.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-Id: <20230515130640.46035-3-vsementsov@yandex-team.ru>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-05-18 18:40:51 +02:00
Gavin Shan
1e493be587 migration: Add last stage indicator to global dirty log
The global dirty log synchronization is used when KVM and dirty ring
are enabled. There is a particularity for ARM64 where the backup
bitmap is used to track dirty pages in non-running-vcpu situations.
It means the dirty ring works with the combination of ring buffer
and backup bitmap. The dirty bits in the backup bitmap needs to
collected in the last stage of live migration.

In order to identify the last stage of live migration and pass it
down, an extra parameter is added to the relevant functions and
callbacks. This last stage indicator isn't used until the dirty
ring is enabled in the subsequent patches.

No functional change intended.

Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Tested-by: Zhenyu Zhang <zhenyzha@redhat.com>
Message-Id: <20230509022122.20888-2-gshan@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-05-18 08:53:50 +02:00
Philippe Mathieu-Daudé
1e05888ab5 sysemu/kvm: Remove unused headers
All types used are forward-declared in "qemu/typedefs.h".

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20230405160454.97436-2-philmd@linaro.org>
[thuth: Add hw/core/cpu.h to migration/dirtyrate.c to fix compile failure]
Signed-off-by: Thomas Huth <thuth@redhat.com>
2023-05-16 09:13:34 +02:00
Juan Quintela
6da835d42a qemu-file: Remove total from qemu_file_total_transferred_*()
Function is already quite long.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20230508130909.65420-7-quintela@redhat.com>
2023-05-15 13:46:14 +02:00
Juan Quintela
f87e4d6d43 qemu-file: Make rate_limit_used an uint64_t
Change all the functions that use it.  It was already passed as
uint64_t.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20230508130909.65420-6-quintela@redhat.com>
2023-05-15 13:45:33 +02:00
Juan Quintela
bffc0441d5 qemu-file: make qemu_file_[sg]et_rate_limit() use an uint64_t
It is really size_t.  Everything else uses uint64_t, so move this to
uint64_t as well.  A size can't be negative anyways.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20230508130909.65420-5-quintela@redhat.com>
2023-05-15 13:44:38 +02:00
Juan Quintela
9d3ebbe217 migration: We set the rate_limit by a second
That the implementation does the check every 100 milliseconds is an
implementation detail that shouldn't be seen on the interfaz.
Notice that all callers of qemu_file_set_rate_limit() used the
division or pass 0, so this change is a NOP.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20230508130909.65420-4-quintela@redhat.com>
2023-05-15 13:44:07 +02:00
Juan Quintela
52d01d4a5d migration: A rate limit value of 0 is valid
And it is the best way to not have rate_limit.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20230508130909.65420-2-quintela@redhat.com>
2023-05-15 13:42:07 +02:00
Juan Quintela
dc2836c380 migration: Make dirtyrate.c target independent
After the previous two patches, there is nothing else that is target
specific.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20230511141208.17779-6-quintela@redhat.com>
2023-05-15 10:33:05 +02:00
Juan Quintela
148b1ad83c migration: Teach dirtyrate about qemu_target_page_bits()
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20230511141208.17779-5-quintela@redhat.com>
2023-05-15 10:33:05 +02:00
Juan Quintela
edd83a70dc migration: Teach dirtyrate about qemu_target_page_size()
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20230511141208.17779-4-quintela@redhat.com>
2023-05-15 10:33:04 +02:00
Juan Quintela
beeda9b7cd Use new created qemu_target_pages_to_MiB()
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20230511141208.17779-3-quintela@redhat.com>
2023-05-15 10:33:04 +02:00
Andrei Gudkov
00a3f9c60a migration/calc-dirty-rate: replaced CRC32 with xxHash
This significantly reduces overhead of dirty page
rate calculation in sampling mode.
Tested using 32GiB VM on E5-2690 CPU.

With CRC32:
total_pages=8388608 sampled_pages=16384 millis=71

With xxHash:
total_pages=8388608 sampled_pages=16384 millis=14

Signed-off-by: Andrei Gudkov <gudkov.andrei@huawei.com>
Message-Id: <cd115a89fc81d5f2eeb4ea7d57a98b84f794f340.1682598010.git.gudkov.andrei@huawei.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-05-15 10:33:03 +02:00
Jamie Iles
370ed60029 cpu: expose qemu_cpu_list_lock for lock-guard use
Expose qemu_cpu_list_lock globally so that we can use
WITH_QEMU_LOCK_GUARD and QEMU_LOCK_GUARD to simplify a few code paths
now and in future.

Signed-off-by: Jamie Iles <quic_jiles@quicinc.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20230427020925.51003-2-quic_jiles@quicinc.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-05-11 09:53:41 +01:00
Vladimir Sementsov-Ogievskiy
121ccedc2b migration: block incoming colo when capability is disabled
We generally require same set of capabilities on source and target.
Let's require x-colo capability to use COLO on target.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Lukas Straub <lukasstraub2@web.de>
Reviewed-by: Zhang Chen <chen.zhang@intel.com>
Message-Id: <20230428194928.1426370-11-vsementsov@yandex-team.ru>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-05-10 18:48:12 +02:00
Vladimir Sementsov-Ogievskiy
d70178a88f migration: disallow change capabilities in COLO state
COLO is not listed as running state in migrate_is_running(), so, it's
theoretically possible to disable colo capability in COLO state and the
unexpected error in migration_iteration_finish() is reachable.

Let's disallow that in qmp_migrate_set_capabilities. Than the error
becomes absolutely unreachable: we can get into COLO state only with
enabled capability and can't disable it while we are in COLO state. So
substitute the error by simple assertion.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Message-Id: <20230428194928.1426370-10-vsementsov@yandex-team.ru>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-05-10 18:48:12 +02:00
Vladimir Sementsov-Ogievskiy
ecbfec6d77 migration: process_incoming_migration_co: simplify code flow around ret
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Zhang Chen <chen.zhang@intel.com>
Message-Id: <20230428194928.1426370-7-vsementsov@yandex-team.ru>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-05-10 18:48:11 +02:00
Vladimir Sementsov-Ogievskiy
1d4cfcd409 migration: drop colo_incoming_thread from MigrationIncomingState
have_colo_incoming_thread variable is unused. colo_incoming_thread can
be local.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Zhang Chen <chen.zhang@intel.com>
Message-Id: <20230428194928.1426370-6-vsementsov@yandex-team.ru>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-05-10 18:48:11 +02:00
Vladimir Sementsov-Ogievskiy
51e47cf860 build: move COLO under CONFIG_REPLICATION
We don't allow to use x-colo capability when replication is not
configured. So, no reason to build COLO when replication is disabled,
it's unusable in this case.

Note also that the check in migrate_caps_check() is not the only
restriction: some functions in migration/colo.c will just abort if
called with not defined CONFIG_REPLICATION, for example:

    migration_iteration_finish()
       case MIGRATION_STATUS_COLO:
           migrate_start_colo_process()
               colo_process_checkpoint()
                   abort()

It could probably make sense to have possibility to enable COLO without
REPLICATION, but this requires deeper audit of colo & replication code,
which may be done later if needed.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Acked-by: Dr. David Alan Gilbert <dave@treblig.org>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-Id: <20230428194928.1426370-4-vsementsov@yandex-team.ru>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-05-10 18:48:11 +02:00
Vladimir Sementsov-Ogievskiy
4332ffcd7b colo: make colo_checkpoint_notify static and provide simpler API
colo_checkpoint_notify() is mostly used in colo.c. Outside we use it
once when x-checkpoint-delay migration parameter is set. So, let's
simplify the external API to only that function - notify COLO that
parameter was set. This make external API more robust and hides
implementation details from external callers. Also this helps us to
make COLO module optional in further patch (i.e. we are going to add
possibility not build the COLO module).

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Zhang Chen <chen.zhang@intel.com>
Message-Id: <20230428194928.1426370-3-vsementsov@yandex-team.ru>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-05-10 18:48:11 +02:00
Lukas Straub
5d1d1fcf43 multifd: Add the ramblock to MultiFDRecvParams
This will be used in the next commits to add colo support to multifd.

Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-Id: <88135197411df1a71d7832962b39abf60faf0021.1683572883.git.lukasstraub2@web.de>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-05-10 18:48:11 +02:00
Lukas Straub
9d638407ef ram: Let colo_flush_ram_cache take the bitmap_mutex
This is not required, colo_flush_ram_cache does not run concurrently
with the multifd threads since the cache is only flushed after
everything has been received. But it makes me more comfortable.

This will be used in the next commits to add colo support to multifd.

Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-Id: <35cb23ba854151d38a31e3a5c8a1020e4283cb4a.1683572883.git.lukasstraub2@web.de>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-05-10 18:48:11 +02:00
Lukas Straub
871cfc5400 ram: Add public helper to set colo bitmap
The overhead of the mutex in non-multifd mode is negligible,
because in that case its just the single thread taking the mutex.

This will be used in the next commits to add colo support to multifd.

Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-Id: <22d83cb428f37929563155531bfb69fd8953cc61.1683572883.git.lukasstraub2@web.de>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-05-10 18:48:11 +02:00
Eric Blake
6dab4c93ec migration: Attempt disk reactivation in more failure scenarios
Commit fe904ea824 added a fail_inactivate label, which tries to
reactivate disks on the source after a failure while s->state ==
MIGRATION_STATUS_ACTIVE, but didn't actually use the label if
qemu_savevm_state_complete_precopy() failed.  This failure to
reactivate is also present in commit 6039dd5b1c (also covering the new
s->state == MIGRATION_STATUS_DEVICE state) and 403d18ae (ensuring
s->block_inactive is set more reliably).

Consolidate the two labels back into one - no matter HOW migration is
failed, if there is any chance we can reach vm_start() after having
attempted inactivation, it is essential that we have tried to restart
disks before then.  This also makes the cleanup more like
migrate_fd_cancel().

Suggested-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20230502205212.134680-1-eblake@redhat.com>
Acked-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-05-10 14:16:53 +02:00
Lukas Straub
c323518a7a migration: Initialize and cleanup decompression in migration.c
This fixes compress with colo.

Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-05-08 15:25:27 +02:00
Lukas Straub
52623f23b0 ram-compress.c: Make target independent
Make ram-compress.c target independent.

Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-05-08 15:25:26 +02:00
Lukas Straub
4024cc8506 ram compress: Assert that the file buffer matches the result
Before this series, "nothing to send" was handled by the file buffer
being empty. Now it is tracked via param->result.

Assert that the file buffer state matches the result.

Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-05-08 15:25:26 +02:00
Lukas Straub
b1f17720c1 ram.c: Move core decompression code into its own file
No functional changes intended.

Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-05-08 15:25:26 +02:00
Lukas Straub
b5ca3368d9 ram.c: Move core compression code into its own file
No functional changes intended.

Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-05-08 15:25:26 +02:00
Lukas Straub
ef4f5f5d5a ram.c: Remove last ram.c dependency from the core compress code
Make compression interfaces take send_queued_data() as an argument.
Remove save_page_use_compression() from flush_compressed_data().

This removes the last ram.c dependency from the core compress code.

Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-05-08 15:25:26 +02:00
Lukas Straub
680628d200 ram.c: Call update_compress_thread_counts from compress_send_queued_data
This makes the core compress code more independend from ram.c.

Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-05-08 15:25:26 +02:00
Lukas Straub
3e81763e4c ram.c: Do not call save_page_header() from compress threads
save_page_header() accesses several global variables, so calling it
from multiple threads is pretty ugly.

Instead, call save_page_header() before writing out the compressed
data from the compress buffer to the migration stream.

This also makes the core compress code more independend from ram.c.

Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-05-08 15:25:26 +02:00
Lukas Straub
b5cf1cd3e8 ram.c: Reset result after sending queued data
And take the param->mutex lock for the whole section to ensure
thread-safety.
Now, it is explicitly clear if there is no queued data to send.
Before, this was handled by param->file stream being empty and thus
qemu_put_qemu_file() not sending anything.

This will be used in the next commits to move save_page_header()
out of compress code.

Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-05-08 15:25:26 +02:00
Lukas Straub
10c2f7b747 ram.c: Dont change param->block in the compress thread
Instead introduce a extra parameter to trigger the compress thread.
Now, when the compress thread is done, we know what RAMBlock and
offset it did compress.

This will be used in the next commits to move save_page_header()
out of compress code.

Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-05-08 15:25:26 +02:00
Lukas Straub
97274a871f ram.c: Let the compress threads return a CompressResult enum
This will be used in the next commits to move save_page_header()
out of compress code.

Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-05-08 15:25:26 +02:00
Juan Quintela
fae4009fb5 qemu-file: Make ram_control_save_page() use accessors for rate_limit
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Message-Id: <20230504113841.23130-9-quintela@redhat.com>
2023-05-05 02:01:59 +02:00
Juan Quintela
61abf1ebdc qemu-file: Make total_transferred an uint64_t
Change all the functions that use it.  It was already passed as
uint64_t.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20230504113841.23130-8-quintela@redhat.com>
2023-05-05 02:01:59 +02:00
Juan Quintela
ac7d25b816 qemu-file: remove shutdown member
The first thing that we do after setting the shutdown value is set the
error as -EIO if there is not a previous error.

So this value is redundant.  Just remove it and use
qemu_file_get_error() in the places that it was tested.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Message-Id: <20230504113841.23130-7-quintela@redhat.com>
2023-05-05 02:01:59 +02:00
Juan Quintela
27a1243f14 qemu-file: No need to check for shutdown in qemu_file_rate_limit
After calling qemu_file_shutdown() we set the error as -EIO if there
is no another previous error, so no need to check it here.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20230504113841.23130-6-quintela@redhat.com>
2023-05-05 02:01:59 +02:00
Juan Quintela
f3030d3440 migration: qemu_file_total_transferred() function is monotonic
So delta_bytes can only be greater or equal to zero.  Never negative.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20230504113841.23130-3-quintela@redhat.com>
2023-05-05 01:04:33 +02:00
Juan Quintela
520333490a migration: max_postcopy_bandwidth is a size parameter
So make everything that uses it uint64_t no int64_t.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20230504113841.23130-2-quintela@redhat.com>
2023-05-05 01:04:33 +02:00
Juan Quintela
cd01a60231 migration/rdma: Check for postcopy sooner
It makes no sense first try to see if there is an rdma error and then
do nothing on postcopy stage.  Change it so we check we are in
postcopy before doing anything.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20230504114443.23891-6-quintela@redhat.com>
2023-05-05 01:04:33 +02:00
Juan Quintela
8c90815797 migration/rdma: It makes no sense to recive that flag without RDMA
This could only happen if the source sent
RAM_SAVE_FLAG_HOOK (i.e. rdma) and destination don't have CONFIG_RDMA.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20230504114443.23891-5-quintela@redhat.com>
2023-05-05 01:04:32 +02:00
Juan Quintela
93dc710585 migration/rdma: We can calculate the rioc from the QEMUFile
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20230504114443.23891-4-quintela@redhat.com>
2023-05-05 01:04:32 +02:00
Juan Quintela
cf7fe0c5b0 migration/rdma: simplify ram_control_load_hook()
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20230504114443.23891-3-quintela@redhat.com>
2023-05-05 01:04:32 +02:00
Juan Quintela
5f1e7540b4 migration: Make RAM_SAVE_FLAG_HOOK a normal case entry
Fixes this commit, clearly a bad merge after a rebase or similar, it
should have been its own case since that point.

commit 5b0e9dd46f
Author: Peter Lieven <pl@kamp.de>
Date:   Tue Jun 24 11:32:36 2014 +0200

    migration: catch unknown flag combinations in ram_load

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20230504114443.23891-2-quintela@redhat.com>
2023-05-05 01:04:32 +02:00
Juan Quintela
f3095cc8a7 migration: Rename xbzrle_enabled xbzrle_started
Otherwise it is confusing with the function xbzrle_enabled().

Suggested-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-Id: <20230504115323.24407-1-quintela@redhat.com>
2023-05-05 01:04:32 +02:00
Juan Quintela
40f240a764 migration: Put zero_pages in alphabetical order
I forgot to move it when I rename it from duplicated_pages.

Message-Id: <20230504103357.22130-3-quintela@redhat.com>
Reviewed-by: David Edmondson <david.edmondson@oracle.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-05-05 01:04:32 +02:00
Juan Quintela
e2ee200558 migration: Document all migration_stats
Message-Id: <20230504103357.22130-2-quintela@redhat.com>
Reviewed-by: David Edmondson <david.edmondson@oracle.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-05-05 01:04:32 +02:00
Juan Quintela
3ec6828a79 migration/rdma: Don't pass the QIOChannelRDMA as an opaque
We can calculate it from the QEMUFile like the caller.

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20230503131847.11603-6-quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-05-05 01:04:32 +02:00
Juan Quintela
3cba22c9ad migration: Fix block_bitmap_mapping migration
It is valid that params->has_block_bitmap_mapping is true and
params->block_bitmap_mapping is NULL.  So we can't use the trick of
having a single function.

Move to two functions one for each value and the tests are fixed.

Fixes: b804b35b1c
       migration: Create migrate_block_bitmap_mapping() function

Reported-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Message-Id: <20230503181036.14890-1-quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-05-05 01:04:32 +02:00
Juan Quintela
0deb7e9b6c migration: Drop unused parameter for migration_tls_client_create()
It is not needed since we moved the accessor for tls properties to
options.c.

Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
2023-05-03 11:24:20 +02:00
Juan Quintela
3f461a0c0b migration: Drop unused parameter for migration_tls_get_creds()
It is not needed since we moved the accessor for tls properties to
options.c.

Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
2023-05-03 11:24:20 +02:00
Juan Quintela
5690756d7c migration/rdma: Unfold last user of acct_update_position()
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Lukas Straub <lukasstraub2@web.de>
2023-05-03 11:24:20 +02:00
Juan Quintela
c61d2faa93 migration/rdma: Split the zero page case from acct_update_position
Now that we have atomic counters, we can do it on the place that we
need it, no need to do it inside ram.c.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Lukas Straub <lukasstraub2@web.de>
2023-05-03 11:24:20 +02:00
Juan Quintela
96820df24e migration: Rename RAMStats to MigrationAtomicStats
It is lousely based on MigrationStats, but that name is taken, so this
is the best one that I came with.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Lukas Straub <lukasstraub2@web.de>

---

If you have any good suggestion for the name, I am all ears.
2023-05-03 11:24:20 +02:00
Juan Quintela
aff3f6606d migration: Rename ram_counters to mig_stats
migration_stats is just too long, and it is going to have more than
ram counters in the near future.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Lukas Straub <lukasstraub2@web.de>
2023-05-03 11:24:20 +02:00
Juan Quintela
947701cc1a migration: Move ram_stats to its own file migration-stats.[ch]
There is already include/qemu/stats.h, so stats.h was a bad idea.
We want this file to not depend on anything else, we will move all the
migration counters/stats to this struct.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Lukas Straub <lukasstraub2@web.de>
2023-05-03 11:24:19 +02:00
Juan Quintela
e232199aad multifd: We already account for this packet on the multifd thread
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Lukas Straub <lukasstraub2@web.de>
2023-05-03 11:24:19 +02:00
Richard Henderson
dc165fcd4e migration/xbzrle: Use __attribute__((target)) for avx512
Use the attribute, which is supported by clang, instead of
the #pragma, which is not supported and, for some reason,
also not detected by the meson probe, so we fail by -Werror.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-Id: <20230501210555.289806-1-richard.henderson@linaro.org>
2023-05-02 13:05:45 -07:00
Juan Quintela
73208a336e migration: Make dirty_bytes_last_sync atomic
As we set its value, it needs to be operated with atomics.
We rename it from remaining to better reflect its meaning.

Statistics always return the real reamaining bytes.  This was used to
store how much pages where dirty on the previous generation, so we can
calculate the expected downtime as: dirty_bytes_last_sync /
current_bandwith.

If we use the actual remaining bytes, we would see a very small value
at the end of the iteration.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

---

I am open to use ram_bytes_remaining() in its only use and be more
"optimistic" about the downtime.

Don't use __nocheck() functions.
Use stat64_get() now that it exists.
2023-04-27 16:39:54 +02:00
Juan Quintela
72f8e58707 migration: Make dirty_pages_rate atomic
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Peter Xu <peterx@redhat.com>

---

Don't use __nocheck() variants
Use stat64_get()
2023-04-27 16:39:49 +02:00
Juan Quintela
294e5a4034 multifd: Only flush once each full round of memory
We need to add a new flag to mean to flush at that point.
Notice that we still flush at the end of setup and at the end of
complete stages.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Acked-by: Peter Xu <peterx@redhat.com>

---

Add missing qemu_fflush(), now it passes all tests always.
In the previous version, the check that changes the default value to
false got lost in some rebase.  Get it back.
2023-04-27 16:37:28 +02:00
Juan Quintela
b05292c237 multifd: Protect multifd_send_sync_main() calls
We only need to do that on the ram_save_iterate() call on sending and
on destination when we get a RAM_SAVE_FLAG_EOS.

In setup() and complete() we need to synch in both new and old cases,
so don't add a check there.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Acked-by: Peter Xu <peterx@redhat.com>

---

Remove the wrappers that we take out on patch 5.
2023-04-27 16:37:28 +02:00
Juan Quintela
77c259a4cb multifd: Create property multifd-flush-after-each-section
We used to flush all channels at the end of each RAM section
sent.  That is not needed, so preparing to only flush after a full
iteration through all the RAM.

Default value of the property is false.  But we return "true" in
migrate_multifd_flush_after_each_section() until we implement the code
in following patches.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Acked-by: Peter Xu <peterx@redhat.com>

---

Rename each-iteration to after-each-section
Rename multifd-sync-after-each-section to
       multifd-flush-after-each-section
Move to machine-8.0 (peter)
2023-04-27 16:37:28 +02:00
Juan Quintela
f9436522c8 migration: Move migration_properties to options.c
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
2023-04-27 16:37:28 +02:00
Juan Quintela
b804b35b1c migration: Create migrate_block_bitmap_mapping() function
Notice that we changed the test of ->has_block_bitmap_mapping
for the test that block_bitmap_mapping is not NULL.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>

---

Make it return const (vladimir)
2023-04-27 16:37:28 +02:00
Juan Quintela
1f2f366c32 migration: Create migrate_tls_hostname() function
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>

---

Moved the type to const char * (vladimir)
2023-04-27 16:37:28 +02:00
Juan Quintela
2eb0308bbd migration: Create migrate_tls_authz() function
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>

---

Moved the type to const char * (vladimir)
2023-04-27 16:37:28 +02:00
Juan Quintela
d5c3e1959c migration: Create migrate_tls_creds() function
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>

---

Moved the type to const char * (vladimir)
2023-04-27 16:37:28 +02:00
Juan Quintela
b1a8795654 migration: Remove MigrationState from block_cleanup_parameters()
This makes the function more regular with everything else.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
2023-04-27 16:37:28 +02:00
Juan Quintela
b7b73122dd migration: Move block_cleanup_parameters() to options.c
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
2023-04-27 16:37:28 +02:00
Juan Quintela
87c2290109 migration: Move migrate_set_block_incremental() to options.c
Once there, make it more regular and remove the need for
MigrationState parameter.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
2023-04-27 16:37:28 +02:00
Juan Quintela
f5da8ba477 migration: Create migrate_downtime_limit() function
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
2023-04-27 16:37:28 +02:00
Juan Quintela
8f9c532756 migration: Make all functions check have the same format
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
2023-04-27 16:37:28 +02:00
Juan Quintela
61a174e227 migration: Create migrate_params_init() function
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
2023-04-27 16:37:28 +02:00
Juan Quintela
d2026ee117 multifd: Fix the number of channels ready
We don't wait in the sem when we are doing a sync_main.  Make it wait
there.  To make things clearer, we mark the channel ready at the
begining of the thread loop.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
2023-04-27 16:37:28 +02:00
Peter Xu
12c81e5ae9 migration/vmstate-dump: Dump array size too as "num"
For VMS_ARRAY typed vmsd fields, also dump the number of entries in the
array in -vmstate-dump.

Without such information, vmstate static checker can report false negatives
of incompatible vmsd on VMS_ARRAY typed fields, when the src/dst do not
have the same type of array defined.  It's because in the checker we only
check against size of fields within a VMSD field.

One example: e1000e used to have a field defined as a boolean array with 5
entries, then removed it and replaced it with UNUSED (in 31e3f318c8):

-        VMSTATE_BOOL_ARRAY(core.eitr_intr_pending, E1000EState,
-                           E1000E_MSIX_VEC_NUM),
+        VMSTATE_UNUSED(E1000E_MSIX_VEC_NUM),

It's a legal replacement but vmstate static checker is not happy with it,
because it checks only against the "size" field between the two
fields (here one is BOOL_ARRAY, the other is UNUSED):

For BOOL_ARRAY:

      {
        "field": "core.eitr_intr_pending",
        "version_id": 0,
        "field_exists": false,
        "size": 1
      },

For UNUSED:

      {
        "field": "unused",
        "version_id": 0,
        "field_exists": false,
        "size": 5
      },

It's not the script to blame because there's just not enough information
dumped to show the total size of the entry for an array.  Add it.

Note that this will not break old vmstate checker because the field will
just be ignored.

Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-04-27 10:18:25 +02:00
Peter Xu
74c38cf7fd migration: Allow postcopy_ram_supported_by_host() to report err
Instead of print it to STDERR, bring the error upwards so that it can be
reported via QMP responses.

E.g.:

{ "execute": "migrate-set-capabilities" ,
  "arguments": { "capabilities":
  [ { "capability": "postcopy-ram", "state": true } ] } }

{ "error":
  { "class": "GenericError",
    "desc": "Postcopy is not supported: Host backend files need to be TMPFS
    or HUGETLBFS only" } }

Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-04-27 10:18:25 +02:00
Juan Quintela
09d6c96584 migration: Move qmp_migrate_set_parameters() to options.c
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
2023-04-27 10:18:25 +02:00
Juan Quintela
10d4703be5 migration: Move migrate_use_tls() to options.c
Once there, rename it to migrate_tls() and make it return bool for
consistency.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>

---

Fix typos found by fabiano
2023-04-27 10:18:25 +02:00