Commit Graph

1922 Commits

Author SHA1 Message Date
Peter Xu
579cedf430 migration: Unify and trace vmstate field_exists() checks
For both save/load we actually share the logic on deciding whether a field
should exist.  Merge the checks into a helper and use it for both save and
load.  When doing so, add documentations and reformat the code to make it
much easier to read.

The real benefit here (besides code cleanups) is we add a trace-point for
this; this is a known spot where we can easily break migration
compatibilities between binaries, and this trace point will be critical for
us to identify such issues.

For example, this will be handy when debugging things like:

https://gitlab.com/qemu-project/qemu/-/issues/932

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230906204722.514474-1-peterx@redhat.com>
2023-10-04 13:19:47 +02:00
Steve Sistare
385f510df5 migration: file URI offset
Allow an offset option to be specified as part of the file URI, in
the form "file:filename,offset=offset", where offset accepts the common
size suffixes, or the 0x prefix, but not both.  Migration data is written
to and read from the file starting at offset.  If unspecified, it defaults
to 0.

This is needed by libvirt to store its own data at the head of the file.

Suggested-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <1694182931-61390-3-git-send-email-steven.sistare@oracle.com>
2023-10-04 13:18:08 +02:00
Steve Sistare
2a9e2e595f migration: file URI
Extend the migration URI to support file:<filename>.  This can be used for
any migration scenario that does not require a reverse path.  It can be
used as an alternative to 'exec:cat > file' in minimized containers that
do not contain /bin/sh, and it is easier to use than the fd:<fdname> URI.
It can be used in HMP commands, and as a qemu command-line parameter.

For best performance, guest ram should be shared and x-ignore-shared
should be true, so guest pages are not written to the file, in which case
the guest may remain running.  If ram is not so configured, then the user
is advised to stop the guest first.  Otherwise, a busy guest may re-dirty
the same page, causing it to be appended to the file multiple times,
and the file may grow unboundedly.  That issue is being addressed in the
"fixed-ram" patch series.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Tested-by: Michael Galaxy <mgalaxy@akamai.com>
Reviewed-by: Michael Galaxy <mgalaxy@akamai.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <1694182931-61390-2-git-send-email-steven.sistare@oracle.com>
2023-10-04 13:16:58 +02:00
Li Zhijian
2ada4b63f1 migration/rdma: zore out head.repeat to make the error more clear
Previously, we got a confusion error that complains
the RDMAControlHeader.repeat:
qemu-system-x86_64: rdma: Too many requests in this message (3638950032).Bailing.

Actually, it's caused by an unexpected RDMAControlHeader.type.
After this patch, error will become:
qemu-system-x86_64: Unknown control message QEMU FILE

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230926100103.201564-2-lizhijian@fujitsu.com>
2023-10-04 10:54:41 +02:00
Tejus GK
848a050342 migration: Update error description outside migration.c
A few code paths exist in the source code,where a migration is
marked as failed via MIGRATION_STATUS_FAILED, but the failure happens
outside	of migration.c

In such	cases, an error_report() call is made, however the current
MigrationState is never	updated	with the error description, and	hence
clients	like libvirt never know	the actual reason for the failure.

This patch covers such cases outside of	migration.c and	updates	the
error description at the appropriate places.

Acked-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Tejus GK <tejus.gk@nutanix.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231003065538.244752-3-tejus.gk@nutanix.com>
2023-10-04 10:54:40 +02:00
Tejus GK
969298f9d7 migration/vmstate: Introduce vmstate_save_state_with_err
Currently, a few code paths exist in the function vmstate_save_state_v,
which ultimately leads to a migration failure. However, an update in the
current MigrationState for the error description is never done.

vmstate.c somehow doesn't seem to allow	the use	of migrate_set_error due
to some	dependencies for unit tests. Hence, this patch introduces a new
function vmstate_save_state_with_err, which will eventually propagate
the error message to savevm.c where a migrate_set_error	call can be
eventually done.

Acked-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Tejus GK <tejus.gk@nutanix.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231003065538.244752-2-tejus.gk@nutanix.com>
2023-10-04 10:54:40 +02:00
Stefan Hajnoczi
50d0bfd0ed Migration Pull request (20231002)
In this migration pull request:
 
 - Refactor repeated call of yank_unregister_instance (tejus)
 - More migraton-test changes
 
 Please, apply.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEGJn/jt6/WMzuA0uC9IfvGFhy1yMFAmUatX4ACgkQ9IfvGFhy
 1yMlbQ/+Kp7m1Mr5LUM/8mvh9LZTVvWauBHch1pdvpCsJO+Grdtv6MtZL5UKT2ue
 xYksZvf/rT4bdt2H1lSsG1o2GOcIf4qyWICgYNDo8peaxm1IrvgAbimaWHWLeORX
 sBxKcBBuTac55vmEKzbPSbwGCGGTU/11UGXQ4ruGN3Hwbd2JZHAK6GxGIzANToZc
 JtwBr/31SxJ2YndNLaPMEnD3cHbRbD2UyODeTt1KI5LdTGgXHoB6PgCk2AMQP1Ko
 LlaPLsrEKC06h2CJ27BB36CNVEGMN2iFa3aKz1FC85Oj2ckatspAFw78t9guj6eM
 MYxn0ipSsjjWjMsc3zEDxi7JrA///5bp1e6e7WdLpOaMBPpV4xuvVvA6Aku2es7D
 fMPOMdftBp6rrXp8edBMTs1sOHdE1k8ZsyJ90m96ckjfLX39TPAiJRm4pWD2UuP5
 Wjr+/IU+LEp/KCqimMj0kYMRz4rM3PP8hOakPZLiRR5ZG6sgbHZK44iPXB/Udz/g
 TCZ87siIpI8YHb3WCaO5CvbdjPrszg1j9v7RimtDeGLDR/hNokkQ1EEeszDTGpgt
 xst4S4wVmex2jYyi53woH4V1p8anP7iqa8elPehAaYPobp47pmBV53ZaSwibqzPN
 TmO7P9rfyQGCiXXZRvrAQJa+gmAkQlSEI7mSssV77pU+1gdEj9c=
 =hD/8
 -----END PGP SIGNATURE-----

Merge tag 'migration-20231002-pull-request' of https://gitlab.com/juan.quintela/qemu into staging

Migration Pull request (20231002)

In this migration pull request:

- Refactor repeated call of yank_unregister_instance (tejus)
- More migraton-test changes

Please, apply.

# -----BEGIN PGP SIGNATURE-----
#
# iQIzBAABCAAdFiEEGJn/jt6/WMzuA0uC9IfvGFhy1yMFAmUatX4ACgkQ9IfvGFhy
# 1yMlbQ/+Kp7m1Mr5LUM/8mvh9LZTVvWauBHch1pdvpCsJO+Grdtv6MtZL5UKT2ue
# xYksZvf/rT4bdt2H1lSsG1o2GOcIf4qyWICgYNDo8peaxm1IrvgAbimaWHWLeORX
# sBxKcBBuTac55vmEKzbPSbwGCGGTU/11UGXQ4ruGN3Hwbd2JZHAK6GxGIzANToZc
# JtwBr/31SxJ2YndNLaPMEnD3cHbRbD2UyODeTt1KI5LdTGgXHoB6PgCk2AMQP1Ko
# LlaPLsrEKC06h2CJ27BB36CNVEGMN2iFa3aKz1FC85Oj2ckatspAFw78t9guj6eM
# MYxn0ipSsjjWjMsc3zEDxi7JrA///5bp1e6e7WdLpOaMBPpV4xuvVvA6Aku2es7D
# fMPOMdftBp6rrXp8edBMTs1sOHdE1k8ZsyJ90m96ckjfLX39TPAiJRm4pWD2UuP5
# Wjr+/IU+LEp/KCqimMj0kYMRz4rM3PP8hOakPZLiRR5ZG6sgbHZK44iPXB/Udz/g
# TCZ87siIpI8YHb3WCaO5CvbdjPrszg1j9v7RimtDeGLDR/hNokkQ1EEeszDTGpgt
# xst4S4wVmex2jYyi53woH4V1p8anP7iqa8elPehAaYPobp47pmBV53ZaSwibqzPN
# TmO7P9rfyQGCiXXZRvrAQJa+gmAkQlSEI7mSssV77pU+1gdEj9c=
# =hD/8
# -----END PGP SIGNATURE-----
# gpg: Signature made Mon 02 Oct 2023 08:20:14 EDT
# gpg:                using RSA key 1899FF8EDEBF58CCEE034B82F487EF185872D723
# gpg: Good signature from "Juan Quintela <quintela@redhat.com>" [full]
# gpg:                 aka "Juan Quintela <quintela@trasno.org>" [full]
# Primary key fingerprint: 1899 FF8E DEBF 58CC EE03  4B82 F487 EF18 5872 D723

* tag 'migration-20231002-pull-request' of https://gitlab.com/juan.quintela/qemu:
  migration/rdma: Simplify the function that saves a page
  migration: Remove unused qemu_file_credit_transfer()
  migration/rdma: Don't use imaginary transfers
  migration/rdma: Remove QEMUFile parameter when not used
  migration/RDMA: It is accounting for zero/normal pages in two places
  migration: Don't abuse qemu_file transferred for RDMA
  migration: Use qemu_file_transferred_noflush() for block migration.
  migration: Refactor repeated call of yank_unregister_instance
  migration-test: simplify shmem_opts handling
  migration-test: dirtylimit checks for x86_64 arch before
  migration-test: Add bootfile_create/delete() functions
  migration-test: bootpath is the same for all tests and for all archs
  migration-test: Create kvm_opts

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-10-02 14:42:44 -04:00
Juan Quintela
9c53d369e5 migration/rdma: Simplify the function that saves a page
When we sent a page through QEMUFile hooks (RDMA) there are three
posiblities:
- We are not using RDMA. return RAM_SAVE_CONTROL_DELAYED and
  control_save_page() returns false to let anything else to proceed.
- There is one error but we are using RDMA.  Then we return a negative
  value, control_save_page() needs to return true.
- Everything goes well and RDMA start the sent of the page
  asynchronously.  It returns RAM_SAVE_CONTROL_DELAYED and we need to
  return 1 for ram_save_page_legacy.

Clear?

I know, I know, the interface is as bad as it gets.  I think that now
it is a bit clearer, but this needs to be done some other way.

Reviewed-by: Leonardo Bras <leobras@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-Id: <20230515195709.63843-16-quintela@redhat.com>
2023-09-29 18:13:53 +02:00
Juan Quintela
9f51fe9239 migration: Remove unused qemu_file_credit_transfer()
After this change, nothing abuses QEMUFile to account for data
transferrefd during migration.

Reviewed-by: Leonardo Bras <leobras@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-Id: <20230515195709.63843-15-quintela@redhat.com>
2023-09-29 18:13:36 +02:00
Juan Quintela
2ebe5d4d5a migration/rdma: Don't use imaginary transfers
RDMA protocol is completely asynchronous, so in qemu_rdma_save_page()
they "invent" that a byte has been transferred.  And then they call
qemu_file_credit_transfer() and ram_transferred_add() with that byte.
Just remove that calls as nothing has been sent.

Reviewed-by: Leonardo Bras <leobras@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-Id: <20230515195709.63843-14-quintela@redhat.com>
2023-09-29 18:11:21 +02:00
Juan Quintela
e33780351c migration/rdma: Remove QEMUFile parameter when not used
Reviewed-by: Leonardo Bras <leobras@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-Id: <20230515195709.63843-13-quintela@redhat.com>
2023-09-29 18:11:21 +02:00
Juan Quintela
19df4f3226 migration/RDMA: It is accounting for zero/normal pages in two places
Remove the one in control_save_page().

Reviewed-by: Leonardo Bras <leobras@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-Id: <20230515195709.63843-12-quintela@redhat.com>
2023-09-29 18:11:21 +02:00
Juan Quintela
67c31c9c1a migration: Don't abuse qemu_file transferred for RDMA
Just create a variable for it, the same way that multifd does.  This
way it is safe to use for other thread, etc, etc.

Reviewed-by: Leonardo Bras <leobras@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-Id: <20230515195709.63843-11-quintela@redhat.com>
2023-09-29 18:11:21 +02:00
Juan Quintela
f16ecfa9f9 migration: Use qemu_file_transferred_noflush() for block migration.
We only care about the amount of bytes transferred.  Flushing is done
by the system somewhere else.

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230530183941.7223-4-quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-09-29 17:05:23 +02:00
Tejus GK
f4e1b61336 migration: Refactor repeated call of yank_unregister_instance
In the function qmp_migrate(), yank_unregister_instance() gets called
twice which isn't required. Hence, refactoring it so that it gets called
during the local_error cleanup.

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Acked-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Tejus GK <tejus.gk@nutanix.com>
Message-ID: <20230621130940.178659-3-tejus.gk@nutanix.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-09-29 17:05:23 +02:00
Markus Armbruster
7f3de3f02f migration: Clean up local variable shadowing
Local variables shadowing other local variables or parameters make the
code needlessly hard to understand.  Tracked down with -Wshadow=local.
Clean up: delete inner declarations when they are actually redundant,
else rename variables.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Message-ID: <20230921121312.1301864-3-armbru@redhat.com>
2023-09-29 08:13:57 +02:00
Markus Armbruster
bbde656263 migration/rdma: Fix save_page method to fail on polling error
qemu_rdma_save_page() reports polling error with error_report(), then
succeeds anyway.  This is because the variable holding the polling
status *shadows* the variable the function returns.  The latter
remains zero.

Broken since day one, and duplicated more recently.

Fixes: 2da776db48 (rdma: core logic)
Fixes: b390afd8c5 (migration/rdma: Fix out of order wrid)
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Message-ID: <20230921121312.1301864-2-armbru@redhat.com>
2023-09-29 08:13:57 +02:00
Fabiano Rosas
36e9aab3c5 migration: Move return path cleanup to main migration thread
Now that the return path thread is allowed to finish during a paused
migration, we can move the cleanup of the QEMUFiles to the main
migration thread.

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20230918172822.19052-9-farosas@suse.de>
2023-09-27 13:58:02 -04:00
Fabiano Rosas
ef796ee93b migration: Replace the return path retry logic
Replace the return path retry logic with finishing and restarting the
thread. This fixes a race when resuming the migration that leads to a
segfault.

Currently when doing postcopy we consider that an IO error on the
return path file could be due to a network intermittency. We then keep
the thread alive but have it do cleanup of the 'from_dst_file' and
wait on the 'postcopy_pause_rp' semaphore. When the user issues a
migrate resume, a new return path is opened and the thread is allowed
to continue.

There's a race condition in the above mechanism. It is possible for
the new return path file to be setup *before* the cleanup code in the
return path thread has had a chance to run, leading to the *new* file
being closed and the pointer set to NULL. When the thread is released
after the resume, it tries to dereference 'from_dst_file' and crashes:

Thread 7 "return path" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffd1dbf700 (LWP 9611)]
0x00005555560e4893 in qemu_file_get_error_obj (f=0x0, errp=0x0) at ../migration/qemu-file.c:154
154         return f->last_error;

(gdb) bt
 #0  0x00005555560e4893 in qemu_file_get_error_obj (f=0x0, errp=0x0) at ../migration/qemu-file.c:154
 #1  0x00005555560e4983 in qemu_file_get_error (f=0x0) at ../migration/qemu-file.c:206
 #2  0x0000555555b9a1df in source_return_path_thread (opaque=0x555556e06000) at ../migration/migration.c:1876
 #3  0x000055555602e14f in qemu_thread_start (args=0x55555782e780) at ../util/qemu-thread-posix.c:541
 #4  0x00007ffff38d76ea in start_thread (arg=0x7fffd1dbf700) at pthread_create.c:477
 #5  0x00007ffff35efa6f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Here's the race (important bit is open_return_path happening before
migration_release_dst_files):

migration                 | qmp                         | return path
--------------------------+-----------------------------+---------------------------------
			    qmp_migrate_pause()
			     shutdown(ms->to_dst_file)
			      f->last_error = -EIO
migrate_detect_error()
 postcopy_pause()
  set_state(PAUSED)
  wait(postcopy_pause_sem)
			    qmp_migrate(resume)
			    migrate_fd_connect()
			     resume = state == PAUSED
			     open_return_path <-- TOO SOON!
			     set_state(RECOVER)
			     post(postcopy_pause_sem)
							(incoming closes to_src_file)
							res = qemu_file_get_error(rp)
							migration_release_dst_files()
							ms->rp_state.from_dst_file = NULL
  post(postcopy_pause_rp_sem)
							postcopy_pause_return_path_thread()
							  wait(postcopy_pause_rp_sem)
							rp = ms->rp_state.from_dst_file
							goto retry
							qemu_file_get_error(rp)
							SIGSEGV
-------------------------------------------------------------------------------------------

We can keep the retry logic without having the thread alive and
waiting. The only piece of data used by it is the 'from_dst_file' and
it is only allowed to proceed after a migrate resume is issued and the
semaphore released at migrate_fd_connect().

Move the retry logic to outside the thread by waiting for the thread
to finish before pausing the migration.

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20230918172822.19052-8-farosas@suse.de>
2023-09-27 13:58:02 -04:00
Fabiano Rosas
d50f5dc075 migration: Consolidate return path closing code
We'll start calling the await_return_path_close_on_source() function
from other parts of the code, so move all of the related checks and
tracepoints into it.

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20230918172822.19052-7-farosas@suse.de>
2023-09-27 13:58:02 -04:00
Fabiano Rosas
b3b101157d migration: Remove redundant cleanup of postcopy_qemufile_src
This file is owned by the return path thread which is already doing
cleanup.

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20230918172822.19052-6-farosas@suse.de>
2023-09-27 13:58:02 -04:00
Fabiano Rosas
7478fb0df9 migration: Fix possible race when shutting down to_dst_file
It's not safe to call qemu_file_shutdown() on the to_dst_file without
first checking for the file's presence under the lock. The cleanup of
this file happens at postcopy_pause() and migrate_fd_cleanup() which
are not necessarily running in the same thread as migrate_fd_cancel().

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20230918172822.19052-5-farosas@suse.de>
2023-09-27 13:58:02 -04:00
Fabiano Rosas
639decf529 migration: Fix possible races when shutting down the return path
We cannot call qemu_file_shutdown() on the return path file without
taking the file lock. The return path thread could be running it's
cleanup code and have just cleared the from_dst_file pointer.

Checking ms->to_dst_file for errors could also race with
migrate_fd_cleanup() which clears the to_dst_file pointer.

Protect both accesses by taking the file lock.

This was caught by inspection, it should be rare, but the next patches
will start calling this code from other places, so let's do the
correct thing.

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20230918172822.19052-4-farosas@suse.de>
2023-09-27 13:58:02 -04:00
Fabiano Rosas
28a8347281 migration: Fix possible race when setting rp_state.error
We don't need to set the rp_state.error right after a shutdown because
qemu_file_shutdown() always sets the QEMUFile error, so the return
path thread would have seen it and set the rp error itself.

Setting the error outside of the thread is also racy because the
thread could clear it after we set it.

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20230918172822.19052-3-farosas@suse.de>
2023-09-27 13:58:02 -04:00
Peter Xu
cf02f29e1e migration: Fix race that dest preempt thread close too early
We hit intermit CI issue on failing at migration-test over the unit test
preempt/plain:

qemu-system-x86_64: Unable to read from socket: Connection reset by peer
Memory content inconsistency at 5b43000 first_byte = bd last_byte = bc current = 4f hit_edge = 1
**
ERROR:../tests/qtest/migration-test.c:300:check_guests_ram: assertion failed: (bad == 0)
(test program exited with status code -6)

Fabiano debugged into it and found that the preempt thread can quit even
without receiving all the pages, which can cause guest not receiving all
the pages and corrupt the guest memory.

To make sure preempt thread finished receiving all the pages, we can rely
on the page_requested_count being zero because preempt channel will only
receive requested page faults. Note, not all the faulted pages are required
to be sent via the preempt channel/thread; imagine the case when a
requested page is just queued into the background main channel for
migration, the src qemu will just still send it via the background channel.

Here instead of spinning over reading the count, we add a condvar so the
main thread can wait on it if that unusual case happened, without burning
the cpu for no good reason, even if the duration is short; so even if we
spin in this rare case is probably fine.  It's just better to not do so.

The condvar is only used when that special case is triggered.  Some memory
ordering trick is needed to guarantee it from happening (against the
preempt thread status field), so the main thread will always get a kick
when that triggers correctly.

Closes: https://gitlab.com/qemu-project/qemu/-/issues/1886
Debugged-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20230918172822.19052-2-farosas@suse.de>
2023-09-27 13:58:02 -04:00
Avihai Horon
08fc4cb517 migration: Add .save_prepare() handler to struct SaveVMHandlers
Add a new .save_prepare() handler to struct SaveVMHandlers. This handler
is called early, even before migration starts, and can be used by
devices to perform early checks.

Refactor migrate_init() to be able to return errors and call
.save_prepare() from there.

Suggested-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2023-09-11 08:34:06 +02:00
Avihai Horon
f543aa222d migration: Move more initializations to migrate_init()
Initialization of mig_stats, compression_counters and VFIO bytes
transferred is hard-coded in migration code path and snapshot code path.

Make the code cleaner by initializing them in migrate_init().

Suggested-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2023-09-11 08:34:06 +02:00
Avihai Horon
38c482b477 migration: Add migration prefix to functions in target.c
The functions in target.c are not static, yet they don't have a proper
migration prefix. Add such prefix.

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2023-09-11 08:34:06 +02:00
Stefan Hajnoczi
06e0f098d6 io: follow coroutine AioContext in qio_channel_yield()
The ongoing QEMU multi-queue block layer effort makes it possible for multiple
threads to process I/O in parallel. The nbd block driver is not compatible with
the multi-queue block layer yet because QIOChannel cannot be used easily from
coroutines running in multiple threads. This series changes the QIOChannel API
to make that possible.

In the current API, calling qio_channel_attach_aio_context() sets the
AioContext where qio_channel_yield() installs an fd handler prior to yielding:

  qio_channel_attach_aio_context(ioc, my_ctx);
  ...
  qio_channel_yield(ioc); // my_ctx is used here
  ...
  qio_channel_detach_aio_context(ioc);

This API design has limitations: reading and writing must be done in the same
AioContext and moving between AioContexts involves a cumbersome sequence of API
calls that is not suitable for doing on a per-request basis.

There is no fundamental reason why a QIOChannel needs to run within the
same AioContext every time qio_channel_yield() is called. QIOChannel
only uses the AioContext while inside qio_channel_yield(). The rest of
the time, QIOChannel is independent of any AioContext.

In the new API, qio_channel_yield() queries the AioContext from the current
coroutine using qemu_coroutine_get_aio_context(). There is no need to
explicitly attach/detach AioContexts anymore and
qio_channel_attach_aio_context() and qio_channel_detach_aio_context() are gone.
One coroutine can read from the QIOChannel while another coroutine writes from
a different AioContext.

This API change allows the nbd block driver to use QIOChannel from any thread.
It's important to keep in mind that the block driver already synchronizes
QIOChannel access and ensures that two coroutines never read simultaneously or
write simultaneously.

This patch updates all users of qio_channel_attach_aio_context() to the
new API. Most conversions are simple, but vhost-user-server requires a
new qemu_coroutine_yield() call to quiesce the vu_client_trip()
coroutine when not attached to any AioContext.

While the API is has become simpler, there is one wart: QIOChannel has a
special case for the iohandler AioContext (used for handlers that must not run
in nested event loops). I didn't find an elegant way preserve that behavior, so
I added a new API called qio_channel_set_follow_coroutine_ctx(ioc, true|false)
for opting in to the new AioContext model. By default QIOChannel uses the
iohandler AioHandler. Code that formerly called
qio_channel_attach_aio_context() now calls
qio_channel_set_follow_coroutine_ctx(ioc, true) once after the QIOChannel is
created.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Acked-by: Daniel P. Berrangé <berrange@redhat.com>
Message-ID: <20230830224802.493686-5-stefanha@redhat.com>
[eblake: also fix migration/rdma.c]
Signed-off-by: Eric Blake <eblake@redhat.com>
2023-09-07 20:32:11 -05:00
Stefan Hajnoczi
156618d9ea Pull request
v3:
 - Drop UFS emulation due to CI failures
 - Add "aio-posix: zero out io_uring sqe user_data"
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEhpWov9P5fNqsNXdanKSrs4Grc8gFAmTvLIEACgkQnKSrs4Gr
 c8itVggAka3RMkEclbeW7JKJBOolm3oUuJTobV8oJfDNMQ8mmom9JkXVUctyPWQT
 EF+oeqZz1omjr0Dk7YEA2toCahTbXm/UsG7i6cZg8JXPl6e9sOne0j+p5zO5x/kc
 YlG43SBQJHdp/BfTm/gvwUh0W2on0wadaeEV82m3ZyIrZGTgNcrC1p1gj5dwF5VX
 SqW02mgALETECyJpo8O7y9vNUYGxEtETG9jzAhtrugGpYk4bPeXlm/rc+2zwV+ET
 YCnfUvhjhlu5vS4nkta6natg0If16ODjy35vWYm/aGlgveGTqQq9HWgTL71eNuxm
 Smn+hJHuvkyBclKjbGiiO1W1MuG1/g==
 =UvNK
 -----END PGP SIGNATURE-----

Merge tag 'block-pull-request' of https://gitlab.com/stefanha/qemu into staging

Pull request

v3:
- Drop UFS emulation due to CI failures
- Add "aio-posix: zero out io_uring sqe user_data"

# -----BEGIN PGP SIGNATURE-----
#
# iQEzBAABCAAdFiEEhpWov9P5fNqsNXdanKSrs4Grc8gFAmTvLIEACgkQnKSrs4Gr
# c8itVggAka3RMkEclbeW7JKJBOolm3oUuJTobV8oJfDNMQ8mmom9JkXVUctyPWQT
# EF+oeqZz1omjr0Dk7YEA2toCahTbXm/UsG7i6cZg8JXPl6e9sOne0j+p5zO5x/kc
# YlG43SBQJHdp/BfTm/gvwUh0W2on0wadaeEV82m3ZyIrZGTgNcrC1p1gj5dwF5VX
# SqW02mgALETECyJpo8O7y9vNUYGxEtETG9jzAhtrugGpYk4bPeXlm/rc+2zwV+ET
# YCnfUvhjhlu5vS4nkta6natg0If16ODjy35vWYm/aGlgveGTqQq9HWgTL71eNuxm
# Smn+hJHuvkyBclKjbGiiO1W1MuG1/g==
# =UvNK
# -----END PGP SIGNATURE-----
# gpg: Signature made Wed 30 Aug 2023 07:48:17 EDT
# gpg:                using RSA key 8695A8BFD3F97CDAAC35775A9CA4ABB381AB73C8
# gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>" [ultimate]
# gpg:                 aka "Stefan Hajnoczi <stefanha@gmail.com>" [ultimate]
# Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35  775A 9CA4 ABB3 81AB 73C8

* tag 'block-pull-request' of https://gitlab.com/stefanha/qemu:
  aio-posix: zero out io_uring sqe user_data
  tests/qemu-iotests/197: add testcase for CoR with subclusters
  block/io: align requests to subcluster_size
  block: add subcluster_size field to BlockDriverInfo
  block-migration: Ensure we don't crash during migration cleanup

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-08-30 09:20:27 -04:00
Fabiano Rosas
f187609f27 block-migration: Ensure we don't crash during migration cleanup
We can fail the blk_insert_bs() at init_blk_migration(), leaving the
BlkMigDevState without a dirty_bitmap and BlockDriverState. Account
for the possibly missing elements when doing cleanup.

Fix the following crashes:

Thread 1 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
0x0000555555ec83ef in bdrv_release_dirty_bitmap (bitmap=0x0) at ../block/dirty-bitmap.c:359
359         BlockDriverState *bs = bitmap->bs;
 #0  0x0000555555ec83ef in bdrv_release_dirty_bitmap (bitmap=0x0) at ../block/dirty-bitmap.c:359
 #1  0x0000555555bba331 in unset_dirty_tracking () at ../migration/block.c:371
 #2  0x0000555555bbad98 in block_migration_cleanup_bmds () at ../migration/block.c:681

Thread 1 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
0x0000555555e971ff in bdrv_op_unblock (bs=0x0, op=BLOCK_OP_TYPE_BACKUP_SOURCE, reason=0x0) at ../block.c:7073
7073        QLIST_FOREACH_SAFE(blocker, &bs->op_blockers[op], list, next) {
 #0  0x0000555555e971ff in bdrv_op_unblock (bs=0x0, op=BLOCK_OP_TYPE_BACKUP_SOURCE, reason=0x0) at ../block.c:7073
 #1  0x0000555555e9734a in bdrv_op_unblock_all (bs=0x0, reason=0x0) at ../block.c:7095
 #2  0x0000555555bbae13 in block_migration_cleanup_bmds () at ../migration/block.c:690

Signed-off-by: Fabiano Rosas <farosas@suse.de>
Message-id: 20230731203338.27581-1-farosas@suse.de
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-08-30 07:39:10 -04:00
Andrei Gudkov
3eb82637fb migration/dirtyrate: Fix precision losses and g_usleep overshoot
Signed-off-by: Andrei Gudkov <gudkov.andrei@huawei.com>
Reviewed-by: Hyman Huang <yong.huang@smartx.com>
Message-Id: <8ddb0d40d143f77aab8f602bd494e01e5fa01614.1691161009.git.gudkov.andrei@huawei.com>
Signed-off-by: Hyman Huang <yong.huang@smartx.com>
2023-08-29 10:19:03 +08:00
Juan Quintela
697c4c86ab migration/rdma: Split qemu_fopen_rdma() into input/output functions
This is how everything else in QEMUFile is structured.
As a bonus they are three less lines of code.

Reviewed-by: Peter Xu <peterx@redhat.com>
Message-ID: <20230530183941.7223-17-quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-07-26 10:55:56 +02:00
Juan Quintela
ac6f48e15d qemu-file: Make qemu_file_get_error_obj() static
It was not used outside of qemu_file.c anyways.

Reviewed-by: Peter Xu <peterx@redhat.com>
Message-ID: <20230530183941.7223-21-quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-07-26 10:55:56 +02:00
Juan Quintela
8c5ee0bfb8 qemu-file: Simplify qemu_file_shutdown()
Reviewed-by: Peter Xu <peterx@redhat.com>
Message-ID: <20230530183941.7223-20-quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-07-26 10:55:56 +02:00
Juan Quintela
9ccf83f486 qemu_file: Make qemu_file_is_writable() static
It is not used outside of qemu_file, and it shouldn't.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230530183941.7223-19-quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-07-26 10:55:56 +02:00
Juan Quintela
cf786549ce migration: Change qemu_file_transferred to noflush
We do a qemu_fclose() just after that, that also does a qemu_fflush(),
so remove one qemu_fflush().

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20230530183941.7223-3-quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-07-26 10:55:56 +02:00
Juan Quintela
fc95c63b60 qemu-file: Rename qemu_file_transferred_ fast -> noflush
Fast don't say much.  Noflush indicates more clearly that it is like
qemu_file_transferred but without the flush.

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20230530183941.7223-2-quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-07-26 10:55:56 +02:00
Wei Wang
82137e6c8c migration: enforce multifd and postcopy preempt to be set before incoming
qemu_start_incoming_migration needs to check the number of multifd
channels or postcopy ram channels to configure the backlog parameter (i.e.
the maximum length to which the queue of pending connections for sockfd
may grow) of listen(). So enforce the usage of postcopy-preempt and
multifd as below:
- need to use "-incoming defer" on the destination; and
- set_capability and set_parameter need to be done before migrate_incoming

Otherwise, disable the use of the features and report error messages to
remind users to adjust the commands.

Signed-off-by: Wei Wang <wei.w.wang@intel.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230606101910.20456-2-wei.w.wang@intel.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Acked-by: Juan Quintela <quintela@redhat.com>
2023-07-26 10:55:56 +02:00
Tejus GK
908927db28 migration: Update error description whenever migration fails
There are places in migration.c where the migration is marked failed with
MIGRATION_STATUS_FAILED, but the failure reason is never updated. Hence
libvirt doesn't know why the migration failed when it queries for it.

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Tejus GK <tejus.gk@nutanix.com>
Message-ID: <20230621130940.178659-2-tejus.gk@nutanix.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-07-26 10:55:56 +02:00
Hyman Huang(黄勇)
15699cf542 migration: Extend query-migrate to provide dirty page limit info
Extend query-migrate to provide throttle time and estimated
ring full time with dirty-limit capability enabled, through which
we can observe if dirty limit take effect during live migration.

Signed-off-by: Hyman Huang(黄勇) <yong.huang@smartx.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-ID: <168733225273.5845.15871826788879741674-8@git.sr.ht>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-07-26 10:55:56 +02:00
Hyman Huang(黄勇)
acac51ba24 migration: Implement dirty-limit convergence algo
Implement dirty-limit convergence algo for live migration,
which is kind of like auto-converge algo but using dirty-limit
instead of cpu throttle to make migration convergent.

Enable dirty page limit if dirty_rate_high_cnt greater than 2
when dirty-limit capability enabled, Disable dirty-limit if
migration be canceled.

Note that "set_vcpu_dirty_limit", "cancel_vcpu_dirty_limit"
commands are not allowed during dirty-limit live migration.

Signed-off-by: Hyman Huang(黄勇) <yong.huang@smartx.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Message-ID: <168733225273.5845.15871826788879741674-7@git.sr.ht>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-07-26 10:55:56 +02:00
Hyman Huang(黄勇)
310ad5625e migration: Put the detection logic before auto-converge checking
This commit is prepared for the implementation of dirty-limit
convergence algo.

The detection logic of throttling condition can apply to both
auto-converge and dirty-limit algo, putting it's position
before the checking logic for auto-converge feature.

Signed-off-by: Hyman Huang(黄勇) <yong.huang@smartx.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-ID: <168733225273.5845.15871826788879741674-6@git.sr.ht>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-07-26 10:55:56 +02:00
Hyman Huang(黄勇)
bb9993c672 migration: Refactor auto-converge capability logic
Check if block migration is running before throttling
guest down in auto-converge way.

Note that this modification is kind of like code clean,
because block migration does not depend on auto-converge
capability, so the order of checks can be adjusted.

Signed-off-by: Hyman Huang(黄勇) <yong.huang@smartx.com>
Acked-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-Id: <168618975839.6361.17407633874747688653-5@git.sr.ht>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-07-26 10:55:56 +02:00
Hyman Huang(黄勇)
dc62395557 migration: Introduce dirty-limit capability
Introduce migration dirty-limit capability, which can
be turned on before live migration and limit dirty
page rate durty live migration.

Introduce migrate_dirty_limit function to help check
if dirty-limit capability enabled during live migration.

Meanwhile, refactor vcpu_dirty_rate_stat_collect
so that period can be configured instead of hardcoded.

dirty-limit capability is kind of like auto-converge
but using dirty limit instead of traditional cpu-throttle
to throttle guest down. To enable this feature, turn on
the dirty-limit capability before live migration using
migrate-set-capabilities, and set the parameters
"x-vcpu-dirty-limit-period", "vcpu-dirty-limit" suitably
to speed up convergence.

Signed-off-by: Hyman Huang(黄勇) <yong.huang@smartx.com>
Acked-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-Id: <168618975839.6361.17407633874747688653-4@git.sr.ht>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-07-26 10:55:56 +02:00
Hyman Huang(黄勇)
09f9ec9913 qapi/migration: Introduce vcpu-dirty-limit parameters
Introduce "vcpu-dirty-limit" migration parameter used
to limit dirty page rate during live migration.

"vcpu-dirty-limit" and "x-vcpu-dirty-limit-period" are
two dirty-limit-related migration parameters, which can
be set before and during live migration by qmp
migrate-set-parameters.

This two parameters are used to help implement the dirty
page rate limit algo of migration.

Signed-off-by: Hyman Huang(黄勇) <yong.huang@smartx.com>
Acked-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-Id: <168618975839.6361.17407633874747688653-3@git.sr.ht>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-07-26 10:55:56 +02:00
Hyman Huang(黄勇)
4d80785719 qapi/migration: Introduce x-vcpu-dirty-limit-period parameter
Introduce "x-vcpu-dirty-limit-period" migration experimental
parameter, which is in the range of 1 to 1000ms and used to
make dirtyrate calculation period configurable.

Currently with the "x-vcpu-dirty-limit-period" varies, the
total time of live migration changes, test results show the
optimal value of "x-vcpu-dirty-limit-period" ranges from
500ms to 1000 ms. "x-vcpu-dirty-limit-period" should be made
stable once it proves best value can not be determined with
developer's experiments.

Signed-off-by: Hyman Huang(黄勇) <yong.huang@smartx.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-Id: <168618975839.6361.17407633874747688653-2@git.sr.ht>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-07-26 10:55:56 +02:00
Fabiano Rosas
01ec0f3a92 migration/multifd: Protect accesses to migration_threads
This doubly linked list is common for all the multifd and migration
threads so we need to avoid concurrent access.

Add a mutex to protect the data from concurrent access. This fixes a
crash when removing two MigrationThread objects from the list at the
same time during cleanup of multifd threads.

Fixes: 671326201d ("migration: Introduce interface query-migrationthreads")
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-Id: <20230607161306.31425-3-farosas@suse.de>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-07-26 10:55:56 +02:00
Fabiano Rosas
788fa68041 migration/multifd: Rename threadinfo.c functions
We're about to add more functions to this file so make it use the same
coding style as the rest of the code.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Peter Xu <peterx@redhat.com>
Message-Id: <20230607161306.31425-2-farosas@suse.de>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-07-26 10:55:56 +02:00
Michael Tokarev
d8b71d96b3 migration: spelling fixes
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
2023-07-25 17:13:20 +03:00