Report shows that commit 34a8892dec broke iotest 055:
https://lore.kernel.org/r/b8806360-a2b6-4608-83a3-db67e264c733@linaro.org
Denis Rastyogin reported more such issues:
https://lore.kernel.org/r/20241107114256.106831-1-gerben@altlinux.org
In that commit, the migration_is_idle() function was replaced with
migration_is_running(). However, the NULL pointer check for `s` was
removed, leading to a dereference of `s` when using qemu-system-x86_64
-hda *.vdi.
When replacing migration_is_idle() with "!migration_is_running()", it was
overlooked that the idle helper also checks that current_migration is
available first. Sample stack dump:
migration_is_running
is_busy
migrate_add_blocker_modes
migrate_add_blocker_normal
vmdk_open
bdrv_open_driver
bdrv_open_common
bdrv_open_inherit
bdrv_open
blk_new_open
blockdev_init
drive_new
drive_init_func
qemu_opts_foreach
configure_blockdev
qemu_create_early_backends
qemu_init
main
The check would have been there if the whole series had been applied, but
since the last patches in the previous series rely on some other patches
landing first, we need to restore the behavior of migration_is_idle() before
the whole set gets merged.
I left migration_is_active / migration_is_device alone, as I don't think
it's possible for them to hit an uninitialized current_migration. They are
also likely to be removed soon on the VFIO side.
Cc: Peter Maydell <peter.maydell@linaro.org>
Fixes: 34a8892dec ("migration: Drop migration_is_idle()")
Reported-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Reported-by: Denis Rastyogin <gerben@altlinux.org>
Tested-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Tested-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20241105182725.2393425-1-peterx@redhat.com
[peterx: enhance commit msg]
Signed-off-by: Peter Xu <peterx@redhat.com>
Now, with the current migration_is_running(), it reports exactly the
opposite of what migration_is_idle() reports.
Drop migration_is_idle() and use "!migration_is_running()" instead, which
should be functionally identical.
In reality, most of the idle checks are inverted anyway, so they're even
easier to write with a migration_is_running() check.
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20241024213056.1395400-6-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
This helper is mostly the same as migration_is_running(), except that one
reports COLO as true while the other reports CANCELLING as true.
From my experience with these state changes over the past years, neither
difference should matter.
To make it slightly safer, report both COLO and CANCELLING as true in
migration_is_running(), then drop the other helper. We keep the former only
because the name is simpler and clear enough.
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20241024213056.1395400-5-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Both the migration thread and the background snapshot thread take a
refcount of the migration object at the entrance of the thread function.
That makes sense, because it protects the object from being freed by the
main thread in migration_shutdown() later, but it can still race with it
if the thread is scheduled too late. Consider the case right after
pthread_create() happens: the VM shuts down and the object is released,
but right after that the migration thread finally gets created, referencing
a MigrationState* in the opaque pointer which is already freed.
The only 100% safe way to make sure it won't get freed is to take the
refcount right before the thread is created, while the BQL is held.
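A sketch of the resulting call-site shape (simplified; exact details may
differ):

    /* Take the reference while the BQL is held, before the thread exists,
     * so the opaque MigrationState cannot be freed underneath it. */
    object_ref(OBJECT(s));
    qemu_thread_create(&s->thread, "mig/src/main",
                       migration_thread, s, QEMU_THREAD_JOINABLE);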
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20241024213056.1395400-2-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
When a VM is configured with huge memory, the current throttle logic
doesn't seem to scale, because migration_trigger_throttle()
is only called once per iteration, so it won't be invoked for a long
time if one iteration takes a long time.
The periodic dirty sync aims to fix the above issue by synchronizing
the ramblock from remote dirty bitmap and, when necessary, triggering
the CPU throttle multiple times during a long iteration.
This is a trade-off between synchronization overhead and CPU throttle
impact.
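A hypothetical sketch of the mechanism; the helper and field names below
are illustrative assumptions, not the actual implementation:

    static void periodic_dirty_sync_cb(void *opaque)
    {
        MigrationState *s = opaque;

        /* Re-sync the dirty bitmap and re-evaluate CPU throttling even
         * though the current iteration hasn't finished yet. */
        dirty_sync_and_maybe_throttle(s);        /* assumed helper */

        /* Re-arm; the period trades sync overhead against throttle
         * responsiveness. */
        timer_mod(s->dirty_sync_timer,           /* assumed field */
                  qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL_RT) + 1000);
    }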
Signed-off-by: Hyman Huang <yong.huang@smartx.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/f61f1b3653f2acf026901103e1c73d157d38b08f.1729146786.git.yong.huang@smartx.com
[peterx: make prev_cnt global, and reset for each migration]
Signed-off-by: Peter Xu <peterx@redhat.com>
Move cpu-throttle.c from system to migration since it's
only used for migration; this way we avoid exporting the
utility functions and variables in misc.h and instead export them
in migration.h when implementing the periodic ramblock dirty
sync feature in the upcoming commits.
Since the CPU throttle timers are only used in migration, move
their registration to migration_object_init().
Signed-off-by: Hyman Huang <yong.huang@smartx.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/c1b3efaa0cb49e03d422e9da97bdb65cc3d234d1.1729146786.git.yong.huang@smartx.com
[peterx: Fix build on MacOS on cocoa.m, not move cpu-throttle.h yet]
[peterx: Fix subject spelling, per pm215]
Signed-off-by: Peter Xu <peterx@redhat.com>
Keep migration thread names together, so it's easier to see a list of all
possible migration threads.
Besides the macro definitions, there are still two functional changes:
- There's one dirty rate thread that we overlooked before; now we add
  it too and name it "mig/dirtyrate", following the existing rules.
- The old name "mig/src/rp-thr" has a "-thr" suffix, which adds little
  for something that is already a thread name, while "rp" can be slightly
  hard to read. Take this chance to rename it to "mig/src/return",
  hopefully a better name.
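A sketch of what the macro definitions could look like; the macro names
are assumptions, while the strings follow the naming rules described above:

    #define MIGRATION_THREAD_DIRTY_RATE   "mig/dirtyrate"
    #define MIGRATION_THREAD_SRC_MAIN     "mig/src/main"
    #define MIGRATION_THREAD_SRC_RETURN   "mig/src/return"
    #define MIGRATION_THREAD_SRC_MULTIFD  "mig/src/send_%d"
    #define MIGRATION_THREAD_DST_MULTIFD  "mig/dst/recv_%d"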
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Acked-by: Hyman Huang <yong.huang@smartx.com>
Reviewed-by: Zhang Chen <chen.zhang@intel.com>
Link: https://lore.kernel.org/r/20241011153652.517440-1-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
The cleanup function itself needs some cleanup in many cases.
The major thing we want to do here is to stop referencing to_dst_file
without holding the file mutex. While at it, touch things elsewhere too to
make it look slightly better in general.
One thing to mention is that migration_thread has its own "running" boolean,
so it doesn't need to rely on to_dst_file being non-NULL. Multifd has a
dependency, so it needs to be skipped if to_dst_file is not yet set; add a
richer comment explaining why.
Resolves: Coverity CID 1527402
Reported-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240919163042.116767-1-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
../migration/dirtyrate.c:186:5: error: ‘records’ may be used uninitialized [-Werror=maybe-uninitialized]
../migration/dirtyrate.c:168:12: error: ‘gen_id’ may be used uninitialized [-Werror=maybe-uninitialized]
../migration/migration.c:2273:5: error: ‘file’ may be used uninitialized [-Werror=maybe-uninitialized]
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Acked-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Hyman Huang <yong.huang@smartx.com>
Fix a segmentation fault in multifd when rb->receivedmap is cleared
too early.
After commit 5ef7e26bdb ("migration/multifd: solve zero page causing
multiple page faults"), multifd started using the rb->receivedmap
bitmap, which belongs to ram.c and is initialized and *freed* from the
ram SaveVMHandlers.
Multifd threads are live until migration_incoming_state_destroy(),
which is called after qemu_loadvm_state_cleanup(), leading to a crash
when accessing rb->receivedmap.
    process_incoming_migration_co()        ...
      qemu_loadvm_state()                  multifd_nocomp_recv()
        qemu_loadvm_state_cleanup()          ramblock_recv_bitmap_set_offset()
          rb->receivedmap = NULL               set_bit_atomic(..., rb->receivedmap)
      ...
      migration_incoming_state_destroy()
        multifd_recv_cleanup()
          multifd_recv_terminate_threads(NULL)
Move the loadvm cleanup into migration_incoming_state_destroy(), after
multifd_recv_cleanup() to ensure multifd threads have already exited
when rb->receivedmap is cleared.
Adjust the postcopy listen thread comment to indicate that we still
want to skip the cpu synchronization.
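A sketch of the resulting ordering (details elided):

    void migration_incoming_state_destroy(void)
    {
        /* ... */

        /* Multifd recv threads must be gone before the loadvm cleanup
         * frees rb->receivedmap, or they may still dereference it. */
        multifd_recv_cleanup();

        /* ... */

        qemu_loadvm_state_cleanup();

        /* ... */
    }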
CC: qemu-stable@nongnu.org
Fixes: 5ef7e26bdb ("migration/multifd: solve zero page causing multiple page faults")
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240917185802.15619-3-farosas@suse.de
[peterx: added comment in migration_incoming_state_destroy()]
Signed-off-by: Peter Xu <peterx@redhat.com>
This patch adds a migration state on src called "postcopy-recover-setup".
The new state describes the intermediate step starting from when the
src QEMU receives a postcopy recovery request, until the migration channels
are properly established, but before the recovery process takes place.
The request came from Libvirt, which currently relies on the migration
state events to detect migration state changes. That works for most of the
migration process, except for postcopy recovery failures at the beginning.
Currently postcopy recovery only has two major states:
- postcopy-paused: this is the state that both sides of QEMU will be in
  for a long time, as long as the migration channel is interrupted.
- postcopy-recover: this is the state where both sides of QEMU handshake
  with each other, preparing for a continuation of the postcopy that was
  interrupted.
The issue here is that when the recovery port is invalid, the src QEMU will
take the URI/channels, notice that the ports are not valid, and silently stay
in the postcopy-paused state, with no event sent to Libvirt. In this case,
the only thing Libvirt can do is poll the migration status at a proper
interval, which is less than optimal.
Considering that this is the only case where Libvirt won't get a
notification from QEMU on such events, let's add postcopy-recover-setup
state to mimic what we have with the "setup" state of a newly initialized
migration, describing the phase of connection establishment.
With that, postcopy recovery now has two paths to go, and either path
guarantees that an event is generated. The events will look like this
during a recovery process on the src QEMU:
- Initially, when the recovery is initiated on src, QEMU will go from
  "postcopy-paused" -> "postcopy-recover-setup". Old QEMUs don't have
  this event.
- Then, depending on whether the channel re-establishment succeeds:
  - On success, src QEMU moves from "postcopy-recover-setup" to
    "postcopy-recover". Old QEMUs also have this event.
  - On failure, src QEMU moves from "postcopy-recover-setup" back to
    "postcopy-paused". Old QEMUs don't have this event.
This guarantees that Libvirt will always receive a proper notification
for the recovery process.
One thing to mention is that this new state is only needed on the src QEMU,
not on both sides. On the dest QEMU, the state machine doesn't change, hence
the events don't change either. It's done like so because the dest QEMU may
not have an explicit point where setup starts. E.g., it can happen that the
dest QEMU doesn't use the migrate-recover command to set a new URI/channel,
but the old URI/channels are reused in the recovery, in which case the old
ports simply work again after the network routes are fixed up.
Add a new helper postcopy_is_paused() to detect whether postcopy is still
paused, taking RECOVER_SETUP into account too. When using it on both
src/dst, a slight change is made altogether to always wait for the
semaphore before checking the status, because on both sides a sem_post()
will be required for a recovery.
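A sketch of the new helper, assuming MigrationStatus enum values matching
the state names above:

    static bool postcopy_is_paused(MigrationStatus status)
    {
        return status == MIGRATION_STATUS_POSTCOPY_PAUSED ||
               status == MIGRATION_STATUS_POSTCOPY_RECOVER_SETUP;
    }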
Cc: Jiri Denemark <jdenemar@redhat.com>
Cc: Prasad Pandit <ppandit@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Buglink: https://issues.redhat.com/browse/RHEL-38485
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Destination QEMU can set up incoming ports for two purposes: either a fresh
new incoming migration, in which case QEMU will switch to SETUP for channel
establishment, or a paused postcopy migration, in which case QEMU will stay
in POSTCOPY_PAUSED until kicking off the RECOVER phase.
Until now the state machine worked on the dest node for the latter only
because migrate_set_state() implicitly becomes a no-op if the current-state
check fails. That wasn't clear at all.
Clean it up by providing a helper, migration_incoming_state_setup(), that
does proper checks on the current status. Postcopy-paused is now checked
explicitly, and we can bail out for unknown states.
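A sketch of the helper's shape; the error message and exact signature are
assumptions:

    static bool migration_incoming_state_setup(MigrationIncomingState *mis,
                                               Error **errp)
    {
        MigrationStatus current = mis->state;

        switch (current) {
        case MIGRATION_STATUS_NONE:
            /* Fresh incoming migration: move to SETUP explicitly. */
            migrate_set_state(&mis->state, current, MIGRATION_STATUS_SETUP);
            return true;
        case MIGRATION_STATUS_POSTCOPY_PAUSED:
            /* Paused postcopy: stay here until the RECOVER phase. */
            return true;
        default:
            error_setg(errp, "Illegal migration incoming state: %s",
                       MigrationStatus_str(current));
            return false;
        }
    }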
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
QEMU uses "int" in most cases even when it stores a MigrationStatus. I don't
know why, so let's try to do it right and see what blows up...
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
The postcopy thread names on dest QEMU are slightly confusing, and partly
I'll need to blame myself for 36f62f11e4 ("migration: Postcopy preemption
preparation on channel creation"). E.g., "fault-fast" reads like a fast
version of "fault-default", but it's actually the fast version of
"postcopy/listen".
Taking this chance, rename all the migration threads with proper rules.
Considering we only have 15 chars usable, prefix all threads with "mig/",
meanwhile identify src/dst threads properly this time. So now most thread
names will look like "mig/DIR/xxx", where DIR will be "src"/"dst", except
the bg-snapshot thread which doesn't have a direction.
For multifd threads, make them "mig/{src|dst}/{send|recv}_%d".
We used to have "live_migration" thread for a very long time, now it's
called "mig/src/main". We may hope to have "mig/dst/main" soon but not
yet.
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Zhijian Li (Fujitsu) <lizhijian@fujitsu.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
When multifd is used along with mapped-ram, we can take advantage of a
filesystem that supports the O_DIRECT flag and perform direct I/O in
the multifd threads. This brings a significant performance improvement
because direct-io writes bypass the page cache, which would otherwise
be thrashed by the multifd data, which is unlikely to be needed again
in a short period of time.
To be able to use a multifd channel opened with O_DIRECT, we must
ensure that a certain alignment is used. Filesystems usually require a
block-size alignment for direct I/O. The way to achieve this is by
enabling the mapped-ram feature, which already aligns its I/O properly
(see MAPPED_RAM_FILE_OFFSET_ALIGNMENT at ram.c).
By setting O_DIRECT on the multifd channels, all writes to the same
file descriptor need to be aligned as well, even the ones that come
from outside multifd, such as the QEMUFile I/O from the main migration
code. This makes it impossible to use the same file descriptor for the
QEMUFile and for the multifd channels. The various flags and metadata
written by the main migration code will always be unaligned by virtue
of their small size. To work around this issue, we'll require a second
file descriptor to be used exclusively for direct I/O.
The second file descriptor can be obtained by QEMU by re-opening the
migration file (already possible), or by being provided by the user or
management application (support to be added in future patches).
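Plain POSIX illustration of the constraint (not QEMU code): with O_DIRECT,
the buffer address, length, and file offset all need block-size alignment,
which the small QEMUFile metadata writes cannot satisfy:

    #include <assert.h>
    #include <stdint.h>
    #include <unistd.h>

    #define DIO_ALIGN 4096   /* typical block-size alignment requirement */

    static ssize_t dio_pwrite(int dio_fd, const void *buf, size_t len,
                              off_t off)
    {
        /* O_DIRECT typically fails unaligned I/O with EINVAL. */
        assert((uintptr_t)buf % DIO_ALIGN == 0);
        assert(len % DIO_ALIGN == 0);
        assert(off % DIO_ALIGN == 0);
        return pwrite(dio_fd, buf, len, off);
    }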
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Merge tag 'migration-20240522-pull-request' of https://gitlab.com/farosas/qemu into staging
Migration pull request
- Li Zhijian's COLO minor fixes
- Marc-André's virtio-gpu fix
- Fiona's virtio-net USO fix
- A couple of migration-test fixes from Thomas
# -----BEGIN PGP SIGNATURE-----
#
# iQJEBAABCAAuFiEEqhtIsKIjJqWkw2TPx5jcdBvsMZ0FAmZObggQHGZhcm9zYXNA
# c3VzZS5kZQAKCRDHmNx0G+wxnWE8D/49RGE+g29qyk9aKx3lU8mSq+ZzmX5GncBt
# 5+Mx5qoHDsBCQTE+dQpEVIoeMJ2HIbgbOML4qsnp6Hw/4/TWkfwC/R6+ZmHBevRk
# fVLkVh2JMHVg8Tq+0FO1X1QnMU03uJ7EAuWdDa8HqlJ5dQY/K3gDaku8oQBXk96X
# 13pChSbMob76tdb+wiwbdEakabigH7XfrPdI6lzI8MCGTIcPKc/UKTFYuoj/OsNx
# raqy+uBtvKtfHxiaYnIgHIPNAF/1f4tP3iAOcPoZWIMXWxFkE8+ANDJAbWo6xIcL
# DGg/wEzZO/OnXLjOhjvLBUHK/fx4wQ5bsqA09BVxoRyBGblkXr+bcwBLYjgiEqzT
# aniPiAx5W/Db+T7HqZPIWesFYj3cmcwvYUTrx/RPMdC0epG+ZczDMtescHdZbxvt
# Pjs3nFeCLhyYcVhlTI72eXRCxdd/26+r6/OmrBC2+GaZrybM61TvNo+3XvO0Pfhi
# UmwF2EN27XmSMelLvH/MnflUVgBHKDs3CCQzDlxreHq2jMVR0SL7LU5wMJJ58Iok
# M3u74izQM25bwYxiASH+4iRn0puH1mOwgOx28W0uiQfZY/678/lCnwa1Tul15BRE
# fIQZJhyIGzhSpwLqEXmdXdlLQs1isqIgpd/mzKgZ285nLr7kz+4gxCUqiXgVbrl7
# P45Dym1u4g==
# =DDrh
# -----END PGP SIGNATURE-----
# gpg: Signature made Wed 22 May 2024 03:13:28 PM PDT
# gpg: using RSA key AA1B48B0A22326A5A4C364CFC798DC741BEC319D
# gpg: issuer "farosas@suse.de"
# gpg: Good signature from "Fabiano Rosas <farosas@suse.de>" [unknown]
# gpg: aka "Fabiano Almeida Rosas <fabiano.rosas@suse.com>" [unknown]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: AA1B 48B0 A223 26A5 A4C3 64CF C798 DC74 1BEC 319D
* tag 'migration-20240522-pull-request' of https://gitlab.com/farosas/qemu:
tests/qtest/migration-test: Fix the check for a successful run of analyze-migration.py
tests/qtest/migration-test: Run some basic tests on s390x and ppc64 with TCG, too
hw/core/machine: move compatibility flags for VirtIO-net USO to machine 8.1
virtio-gpu: fix v2 migration
migration: fix a typo
migration: add "exists" info to load-state-field trace
migration/colo: Tidy up bql_unlock() around bdrv_activate_all()
migration/colo: make colo_incoming_co() return void
migration/colo: Minor fix for colo error message
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Currently it always returns 0, so there is no need to check the return value
at all.
In addition, enter the colo coroutine only if migration_incoming_colo_enabled()
is true.
Once the destination side enters the COLO* state, the COLO process will
take over the remaining steps until COLO exits.
Cc: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Zhang Chen <chen.zhang@intel.com>
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
[fixed mangled author email address]
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Use it to update the current error of the migration stream if
available, and if not, simply print out the error. The next changes will
update it with an error to report.
Reviewed-by: Avihai Horon <avihaih@nvidia.com>
Acked-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>
The 'compress' migration capability enables the old compression code
which has shown issues over the years and is thought to be less stable
and tested than the more recent multifd-based compression. The old
compression code has been deprecated in 8.2 and now it's time to remove
it.
Deprecation commit 864128df46 ("migration: Deprecate old compression
method").
Acked-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
The block migration has been considered obsolete since QEMU 8.2 in
favor of the more flexible storage migration provided by the
blockdev-mirror driver. Two releases have passed so now it's time to
remove it.
Deprecation commit 66db46ca83 ("migration: Deprecate block
migration").
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
The block migration is considered obsolete and has been deprecated in
8.2. Remove the migrate command option that enables it. This only
affects the QMP and HMP commands; the feature can still be accessed by
setting the migration 'block' capability. The whole feature will be
removed in a future patch.
Deprecation commit 8846b5bfca ("migration: migrate 'blk' command
option is deprecated.").
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
The block incremental option for block migration has been deprecated
in 8.2 in favor of using the block-mirror feature. Remove it now.
Deprecation commit 40101f320d ("migration: migrate 'inc' command
option is deprecated.").
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
The 'skipped' field of the MigrationStats struct has been deprecated
in 8.1. Time to remove it.
Deprecation commit 7b24d32634 ("migration: skipped field is really
obsolete.").
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Now we do set the MIGRATION_FAILED state, but we don't give the
orchestrator a chance to query the migration state and get the error.
Let's make it possible for QMP-based orchestrators to get an error,
just like with outgoing migration.
For hmp_migrate_incoming(), let's enable the new behavior: HMP is not
an ABI, it's mostly intended for use by developers, and it makes sense
not to stop the process.
For x-exit-preconfig, let's keep the old behavior:
- it's called from init(), so here we want to keep the current behavior
  by default
- it exits on error by itself as well
So, if we want to change the behavior of x-exit-preconfig, that should
be another patch.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Acked-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Unify error reporting in the function. This simplifies the following
commit, which will add a not-exit-on-error behavior variant to the function.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
It's a bad idea to leave the critical section with the error object freed
but s->error still set; this theoretically may lead to a use-after-free
crash. Let's avoid it.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Call migration_incoming_state_destroy() instead of doing only part of it.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Address the comment added in commit 4629ed1e98
("qerror: Finally unused, clean up"), from 2015:
/*
* These macros will go away, please don't use
* in new code, and do not add new ones!
*/
Mechanical transformation using sed, manually
removing the definition in include/qapi/qmp/qerror.h.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-ID: <20240312141343.3168265-10-armbru@redhat.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
[Straightforward conflict with commit aeaafb1e59 (migration: export
migration_is_running) resolved]
This prepares the ground for the changes coming next, which add an Error **
argument to the .save_setup() handler. Callers of qemu_savevm_state_setup()
now handle the error and fail earlier, setting the migration state from
MIGRATION_STATUS_SETUP to MIGRATION_STATUS_FAILED.
In qemu_savevm_state(), move the cleanup to preserve the error
reported by .save_setup() handlers.
Since the previous behavior was to ignore errors at this step of
migration, this change should be examined closely to check that
cleanups are still done correctly.
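A sketch of the caller-side handling (simplified; the exact signature of
qemu_savevm_state_setup() after this change is assumed here):

    Error *local_err = NULL;

    if (qemu_savevm_state_setup(s->to_dst_file, &local_err) != 0) {
        migrate_set_error(s, local_err);
        error_report_err(local_err);
        migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
                          MIGRATION_STATUS_FAILED);
        return;   /* bail out of the setup path */
    }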
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/20240320064911.545001-7-clg@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
There are several places where postcopy_start() fails without setting
errp. This can cause a NULL pointer dereference, since in case of error
the caller of postcopy_start() copies/prints the error set in errp.
Fix it by setting errp in all of postcopy_start()'s error paths.
Cc: qemu-stable <qemu-stable@nongnu.org>
Fixes: 908927db28 ("migration: Update error description whenever migration fails")
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/20240328140252.16756-3-avihaih@nvidia.com
Signed-off-by: Peter Xu <peterx@redhat.com>
After commit 9425ef3f99 ("migration: Use migrate_has_error() in
close_return_path_on_source()"), close_return_path_on_source() assumes
that the migration error is set if an error occurs during migration.
This may not be true if migration fails in migration_completion(). For
example, if qemu_savevm_state_complete_precopy() fails, the migration
error will not be set.
This, in turn, will cause a migration hang bug similar to the one that
was fixed by commit 22b04245f0 ("migration: Join the return path
thread before releasing to_dst_file"), as shutdown() will not be issued
for the return-path channel.
Fix it by ensuring the migration error is set in case of error in
migration_completion().
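A sketch of the idea only (not the literal patch): on the failure path of
migration_completion(), make sure a migration error is recorded so the
return-path channel gets shut down:

    /* Hypothetical helper for illustration. */
    static void migration_completion_set_error(MigrationState *s)
    {
        Error *local_err = NULL;

        if (!migrate_has_error(s)) {
            error_setg(&local_err, "Error in migration completion");
            migrate_set_error(s, local_err);
            error_free(local_err);
        }
    }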
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Fixes: 9425ef3f99 ("migration: Use migrate_has_error() in close_return_path_on_source()")
Acked-by: Cédric Le Goater <clg@redhat.com>
Link: https://lore.kernel.org/r/20240328140252.16756-2-avihaih@nvidia.com
Signed-off-by: Peter Xu <peterx@redhat.com>
With the current code base I can observe an extremely high sync count during
precopy, as long as one enables postcopy-ram=on before switchover to
postcopy.
To provide some context on when QEMU decides to do a full sync: it checks
must_precopy (which implies "data must be sent during the precopy phase"),
and as long as it is lower than the threshold size we calculated (out of
bandwidth and expected downtime) QEMU will kick off the slow/exact sync.
However, when postcopy is enabled (even if still during the precopy phase),
RAM only reports all pages as can_postcopy and reports must_precopy==0. Then
"must_precopy <= threshold_size" almost always triggers and enforces a slow
sync for every call to migration_iteration_run() when postcopy is enabled,
even if not used. That is insane.
It turns out this was a regression introduced by the earlier refactoring in
8.0, as reported by Nina [1]:
(a) c8df4a7aef ("migration: Split save_live_pending() into state_pending_*")
Then a workaround patch was applied at the end of that release (8.0-rc4) to
fix it:
(b) 28ef5339c3 ("migration: fix ram_state_pending_exact()")
However, that "workaround" was overlooked during the cleanup in the 9.0
release, in this commit:
(c) b0504edd40 ("migration: Drop unnecessary check in ram's pending_exact()")
Then the issue was re-exposed, as reported by Nina [1].
The problem with (b) is that it only fixed the case for RAM, rather than
all the rest of the iterators. Here a slow sync should only be required if
all dirty data (precopy+postcopy) is less than the threshold_size that QEMU
calculated. It is even debatable whether a sync is needed after switching to
postcopy. Currently ram_state_pending_exact() is mostly a no-op once
switched to postcopy, and that logic seems to apply to all the rest of the
iterators too, as syncing the dirty bitmap during postcopy doesn't make much
sense. However, let's leave such a change for later, as we're in the rc
phase.
So rather than reusing commit (b), this patch provides the complete fix for
all iterators. While at it, clean up the lines around it a little bit.
[1] https://gitlab.com/qemu-project/qemu/-/issues/1565
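A sketch of the fixed decision in migration_iteration_run() (simplified):
the slow/exact sync should only happen when the total estimate, precopy
plus postcopy, already fits under the threshold:

    qemu_savevm_state_pending_estimate(&must_precopy, &can_postcopy);
    pending_size = must_precopy + can_postcopy;

    if (pending_size < s->threshold_size) {
        /* Might be able to converge: refresh with the exact (slow) sync. */
        qemu_savevm_state_pending_exact(&must_precopy, &can_postcopy);
        pending_size = must_precopy + can_postcopy;
    }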
Reported-by: Nina Schoetterl-Glausch <nsg@linux.ibm.com>
Fixes: b0504edd40 ("migration: Drop unnecessary check in ram's pending_exact()")
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240320214453.584374-1-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
This reverts commit decdc76772 in full
and also the relevant migration-tests from
7a09f09283.
After the addition of the new QAPI-based migration address API in 8.2
we've been converting an "fd:" URI into a SocketAddress, missing the
fact that the "fd:" syntax could also be used for a plain file instead
of a socket. This is a problem because the SocketAddress is part of
the API, so we're effectively asking users to create a "socket"
channel to pass in a plain file.
The easiest way to fix this situation is to deprecate the usage of
both SocketAddress and "fd:" when used with a plain file for
migration. Since this has been possible since 8.2, we can wait until
9.1 to deprecate it.
For 9.0, however, we should avoid adding further support to migration
to a plain file using the old "fd:" syntax or the new SocketAddress
API, and instead require the usage of either the old-style "file:" URI
or the FileMigrationArgs::filename field of the new API with the
"/dev/fdset/NN" syntax, both of which are already supported.
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240319210941.1907-1-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
When doing migration using the fd: URI, QEMU will fetch the file
descriptor passed in via the monitor at
fd_start_outgoing|incoming_migration(), which means the checks at
migration_channels_and_transport_compatible() happen too soon and we
don't know at that point whether the FD refers to a plain file or a
socket.
For this reason, we've been allowing a migration channel of type
SOCKET_ADDRESS_TYPE_FD to pass the initial verifications in scenarios
where the socket migration is not supported, such as with fd + multifd.
The commit decdc76772 ("migration/multifd: Add mapped-ram support to
fd: URI") was supposed to add a second check prior to starting
migration to make sure a socket fd is not passed instead of a file fd,
but failed to do so.
Add the missing verification and update the comment explaining this
situation, which is currently incorrect.
Fixes: decdc76772 ("migration/multifd: Add mapped-ram support to fd: URI")
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/20240315032040.7974-2-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
Delete the MigrationState parameter from migration_is_setup_or_active
and move it to the public API in misc.h.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Link: https://lore.kernel.org/r/1710179338-294359-3-git-send-email-steven.sistare@oracle.com
Signed-off-by: Peter Xu <peterx@redhat.com>
If we receive a file descriptor that points to a regular file, there's
nothing stopping us from doing multifd migration with mapped-ram to
that file.
Enable the fd: URI to work with multifd + mapped-ram.
Note that the fds passed into multifd are duplicated because we want
to avoid cross-thread effects when doing cleanup (i.e. close(fd)). The
original fd doesn't need to be duplicated because monitor_get_fd()
transfers ownership to the caller.
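A sketch of the duplication step; error handling is abbreviated:

    /* Each multifd channel gets its own duplicate of the monitor fd, so a
     * close() in one thread can't yank the descriptor from the others. */
    int channel_fd = dup(fd);
    if (channel_fd < 0) {
        error_setg_errno(errp, errno, "Failed to duplicate migration fd");
        return;
    }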
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/20240229153017.2221-23-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
The new mapped-ram stream format uses a file transport and puts RAM
pages in the migration file at their respective offsets; this can be
done in parallel by using the pwritev system call, which takes iovecs
and an offset.
Add support for enabling the new format along with multifd to make use
of the threading and page handling already in place.
This requires multifd to stop sending headers and leave the stream
format to the mapped-ram code. When it comes time to write the data, we
need to call a version of qio_channel_write that can take an offset.
Usage on HMP is:
(qemu) stop
(qemu) migrate_set_capability multifd on
(qemu) migrate_set_capability mapped-ram on
(qemu) migrate_set_parameter max-bandwidth 0
(qemu) migrate_set_parameter multifd-channels 8
(qemu) migrate file:migfile
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240229153017.2221-21-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
On the receiving side we don't need to differentiate between main
channel and threads, so whichever channel is defined first gets to be
the main one. And since there are no packets, use the atomic channel
count to index into the params array.
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240229153017.2221-19-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
The mapped-ram migration format needs a channel that supports seeking
to be able to write each page to an arbitrary offset in the migration
stream.
Reviewed-by: "Daniel P. Berrangé" <berrange@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240229153017.2221-9-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
Add a new migration capability 'mapped-ram'.
The core of the feature is to ensure that RAM pages are mapped
directly to offsets in the resulting migration file instead of being
streamed at arbitrary points.
The reasons why we'd want such behavior are:
- The resulting file will have a bounded size, since pages which are
dirtied multiple times will always go to a fixed location in the
file, rather than constantly being added to a sequential
stream. This eliminates cases where a VM with, say, 1G of RAM can
result in a migration file that's 10s of GBs, provided that the
workload constantly redirties memory.
- It paves the way to implement O_DIRECT-enabled save/restore of the
migration stream, since the pages are guaranteed to be written at aligned
offsets.
- It allows the usage of multifd so we can write RAM pages to the
migration file in parallel.
For now, enabling the capability has no effect. The next couple of
patches implement the core functionality.
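An illustrative sketch of the fixed-offset idea (names are assumptions;
the real layout lives in the mapped-ram code in ram.c):

    static inline uint64_t mapped_ram_page_offset(uint64_t block_file_offset,
                                                  uint64_t page_index,
                                                  uint64_t page_size)
    {
        /* Every page owns one fixed slot, so a re-dirtied page overwrites
         * the same bytes instead of growing the stream. */
        return block_file_offset + page_index * page_size;
    }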
Acked-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240229153017.2221-8-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>