The global dirty log synchronization is used when KVM and dirty ring
are enabled. There is a particularity for ARM64 where the backup
bitmap is used to track dirty pages in non-running-vcpu situations.
It means the dirty ring works with the combination of ring buffer
and backup bitmap. The dirty bits in the backup bitmap needs to
collected in the last stage of live migration.
In order to identify the last stage of live migration and pass it
down, an extra parameter is added to the relevant functions and
callbacks. This last stage indicator isn't used until the dirty
ring is enabled in the subsequent patches.
No functional change intended.
Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Tested-by: Zhenyu Zhang <zhenyzha@redhat.com>
Message-Id: <20230509022122.20888-2-gshan@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
All types used are forward-declared in "qemu/typedefs.h".
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20230405160454.97436-2-philmd@linaro.org>
[thuth: Add hw/core/cpu.h to migration/dirtyrate.c to fix compile failure]
Signed-off-by: Thomas Huth <thuth@redhat.com>
Function is already quite long.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20230508130909.65420-7-quintela@redhat.com>
Change all the functions that use it. It was already passed as
uint64_t.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20230508130909.65420-6-quintela@redhat.com>
It is really size_t. Everything else uses uint64_t, so move this to
uint64_t as well. A size can't be negative anyways.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20230508130909.65420-5-quintela@redhat.com>
That the implementation does the check every 100 milliseconds is an
implementation detail that shouldn't be seen on the interfaz.
Notice that all callers of qemu_file_set_rate_limit() used the
division or pass 0, so this change is a NOP.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20230508130909.65420-4-quintela@redhat.com>
And it is the best way to not have rate_limit.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20230508130909.65420-2-quintela@redhat.com>
After the previous two patches, there is nothing else that is target
specific.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20230511141208.17779-6-quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20230511141208.17779-5-quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20230511141208.17779-4-quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20230511141208.17779-3-quintela@redhat.com>
This significantly reduces overhead of dirty page
rate calculation in sampling mode.
Tested using 32GiB VM on E5-2690 CPU.
With CRC32:
total_pages=8388608 sampled_pages=16384 millis=71
With xxHash:
total_pages=8388608 sampled_pages=16384 millis=14
Signed-off-by: Andrei Gudkov <gudkov.andrei@huawei.com>
Message-Id: <cd115a89fc81d5f2eeb4ea7d57a98b84f794f340.1682598010.git.gudkov.andrei@huawei.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Expose qemu_cpu_list_lock globally so that we can use
WITH_QEMU_LOCK_GUARD and QEMU_LOCK_GUARD to simplify a few code paths
now and in future.
Signed-off-by: Jamie Iles <quic_jiles@quicinc.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20230427020925.51003-2-quic_jiles@quicinc.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
We generally require same set of capabilities on source and target.
Let's require x-colo capability to use COLO on target.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Lukas Straub <lukasstraub2@web.de>
Reviewed-by: Zhang Chen <chen.zhang@intel.com>
Message-Id: <20230428194928.1426370-11-vsementsov@yandex-team.ru>
Signed-off-by: Juan Quintela <quintela@redhat.com>
COLO is not listed as running state in migrate_is_running(), so, it's
theoretically possible to disable colo capability in COLO state and the
unexpected error in migration_iteration_finish() is reachable.
Let's disallow that in qmp_migrate_set_capabilities. Than the error
becomes absolutely unreachable: we can get into COLO state only with
enabled capability and can't disable it while we are in COLO state. So
substitute the error by simple assertion.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Message-Id: <20230428194928.1426370-10-vsementsov@yandex-team.ru>
Signed-off-by: Juan Quintela <quintela@redhat.com>
have_colo_incoming_thread variable is unused. colo_incoming_thread can
be local.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Zhang Chen <chen.zhang@intel.com>
Message-Id: <20230428194928.1426370-6-vsementsov@yandex-team.ru>
Signed-off-by: Juan Quintela <quintela@redhat.com>
We don't allow to use x-colo capability when replication is not
configured. So, no reason to build COLO when replication is disabled,
it's unusable in this case.
Note also that the check in migrate_caps_check() is not the only
restriction: some functions in migration/colo.c will just abort if
called with not defined CONFIG_REPLICATION, for example:
migration_iteration_finish()
case MIGRATION_STATUS_COLO:
migrate_start_colo_process()
colo_process_checkpoint()
abort()
It could probably make sense to have possibility to enable COLO without
REPLICATION, but this requires deeper audit of colo & replication code,
which may be done later if needed.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Acked-by: Dr. David Alan Gilbert <dave@treblig.org>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-Id: <20230428194928.1426370-4-vsementsov@yandex-team.ru>
Signed-off-by: Juan Quintela <quintela@redhat.com>
colo_checkpoint_notify() is mostly used in colo.c. Outside we use it
once when x-checkpoint-delay migration parameter is set. So, let's
simplify the external API to only that function - notify COLO that
parameter was set. This make external API more robust and hides
implementation details from external callers. Also this helps us to
make COLO module optional in further patch (i.e. we are going to add
possibility not build the COLO module).
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Zhang Chen <chen.zhang@intel.com>
Message-Id: <20230428194928.1426370-3-vsementsov@yandex-team.ru>
Signed-off-by: Juan Quintela <quintela@redhat.com>
This will be used in the next commits to add colo support to multifd.
Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-Id: <88135197411df1a71d7832962b39abf60faf0021.1683572883.git.lukasstraub2@web.de>
Signed-off-by: Juan Quintela <quintela@redhat.com>
This is not required, colo_flush_ram_cache does not run concurrently
with the multifd threads since the cache is only flushed after
everything has been received. But it makes me more comfortable.
This will be used in the next commits to add colo support to multifd.
Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-Id: <35cb23ba854151d38a31e3a5c8a1020e4283cb4a.1683572883.git.lukasstraub2@web.de>
Signed-off-by: Juan Quintela <quintela@redhat.com>
The overhead of the mutex in non-multifd mode is negligible,
because in that case its just the single thread taking the mutex.
This will be used in the next commits to add colo support to multifd.
Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-Id: <22d83cb428f37929563155531bfb69fd8953cc61.1683572883.git.lukasstraub2@web.de>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Commit fe904ea824 added a fail_inactivate label, which tries to
reactivate disks on the source after a failure while s->state ==
MIGRATION_STATUS_ACTIVE, but didn't actually use the label if
qemu_savevm_state_complete_precopy() failed. This failure to
reactivate is also present in commit 6039dd5b1c (also covering the new
s->state == MIGRATION_STATUS_DEVICE state) and 403d18ae (ensuring
s->block_inactive is set more reliably).
Consolidate the two labels back into one - no matter HOW migration is
failed, if there is any chance we can reach vm_start() after having
attempted inactivation, it is essential that we have tried to restart
disks before then. This also makes the cleanup more like
migrate_fd_cancel().
Suggested-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20230502205212.134680-1-eblake@redhat.com>
Acked-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This fixes compress with colo.
Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Make ram-compress.c target independent.
Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Before this series, "nothing to send" was handled by the file buffer
being empty. Now it is tracked via param->result.
Assert that the file buffer state matches the result.
Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
No functional changes intended.
Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Make compression interfaces take send_queued_data() as an argument.
Remove save_page_use_compression() from flush_compressed_data().
This removes the last ram.c dependency from the core compress code.
Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
This makes the core compress code more independend from ram.c.
Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
save_page_header() accesses several global variables, so calling it
from multiple threads is pretty ugly.
Instead, call save_page_header() before writing out the compressed
data from the compress buffer to the migration stream.
This also makes the core compress code more independend from ram.c.
Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
And take the param->mutex lock for the whole section to ensure
thread-safety.
Now, it is explicitly clear if there is no queued data to send.
Before, this was handled by param->file stream being empty and thus
qemu_put_qemu_file() not sending anything.
This will be used in the next commits to move save_page_header()
out of compress code.
Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Instead introduce a extra parameter to trigger the compress thread.
Now, when the compress thread is done, we know what RAMBlock and
offset it did compress.
This will be used in the next commits to move save_page_header()
out of compress code.
Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
This will be used in the next commits to move save_page_header()
out of compress code.
Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Message-Id: <20230504113841.23130-9-quintela@redhat.com>
Change all the functions that use it. It was already passed as
uint64_t.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20230504113841.23130-8-quintela@redhat.com>
The first thing that we do after setting the shutdown value is set the
error as -EIO if there is not a previous error.
So this value is redundant. Just remove it and use
qemu_file_get_error() in the places that it was tested.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Message-Id: <20230504113841.23130-7-quintela@redhat.com>
After calling qemu_file_shutdown() we set the error as -EIO if there
is no another previous error, so no need to check it here.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20230504113841.23130-6-quintela@redhat.com>
So delta_bytes can only be greater or equal to zero. Never negative.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20230504113841.23130-3-quintela@redhat.com>
So make everything that uses it uint64_t no int64_t.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20230504113841.23130-2-quintela@redhat.com>
It makes no sense first try to see if there is an rdma error and then
do nothing on postcopy stage. Change it so we check we are in
postcopy before doing anything.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20230504114443.23891-6-quintela@redhat.com>
This could only happen if the source sent
RAM_SAVE_FLAG_HOOK (i.e. rdma) and destination don't have CONFIG_RDMA.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20230504114443.23891-5-quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20230504114443.23891-4-quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20230504114443.23891-3-quintela@redhat.com>
Fixes this commit, clearly a bad merge after a rebase or similar, it
should have been its own case since that point.
commit 5b0e9dd46f
Author: Peter Lieven <pl@kamp.de>
Date: Tue Jun 24 11:32:36 2014 +0200
migration: catch unknown flag combinations in ram_load
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20230504114443.23891-2-quintela@redhat.com>
Otherwise it is confusing with the function xbzrle_enabled().
Suggested-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-Id: <20230504115323.24407-1-quintela@redhat.com>
I forgot to move it when I rename it from duplicated_pages.
Message-Id: <20230504103357.22130-3-quintela@redhat.com>
Reviewed-by: David Edmondson <david.edmondson@oracle.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-Id: <20230504103357.22130-2-quintela@redhat.com>
Reviewed-by: David Edmondson <david.edmondson@oracle.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
We can calculate it from the QEMUFile like the caller.
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20230503131847.11603-6-quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
It is valid that params->has_block_bitmap_mapping is true and
params->block_bitmap_mapping is NULL. So we can't use the trick of
having a single function.
Move to two functions one for each value and the tests are fixed.
Fixes: b804b35b1c
migration: Create migrate_block_bitmap_mapping() function
Reported-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Message-Id: <20230503181036.14890-1-quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
It is not needed since we moved the accessor for tls properties to
options.c.
Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
It is not needed since we moved the accessor for tls properties to
options.c.
Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Now that we have atomic counters, we can do it on the place that we
need it, no need to do it inside ram.c.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Lukas Straub <lukasstraub2@web.de>
It is lousely based on MigrationStats, but that name is taken, so this
is the best one that I came with.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Lukas Straub <lukasstraub2@web.de>
---
If you have any good suggestion for the name, I am all ears.
migration_stats is just too long, and it is going to have more than
ram counters in the near future.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Lukas Straub <lukasstraub2@web.de>
There is already include/qemu/stats.h, so stats.h was a bad idea.
We want this file to not depend on anything else, we will move all the
migration counters/stats to this struct.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Lukas Straub <lukasstraub2@web.de>
Use the attribute, which is supported by clang, instead of
the #pragma, which is not supported and, for some reason,
also not detected by the meson probe, so we fail by -Werror.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-Id: <20230501210555.289806-1-richard.henderson@linaro.org>
As we set its value, it needs to be operated with atomics.
We rename it from remaining to better reflect its meaning.
Statistics always return the real reamaining bytes. This was used to
store how much pages where dirty on the previous generation, so we can
calculate the expected downtime as: dirty_bytes_last_sync /
current_bandwith.
If we use the actual remaining bytes, we would see a very small value
at the end of the iteration.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
I am open to use ram_bytes_remaining() in its only use and be more
"optimistic" about the downtime.
Don't use __nocheck() functions.
Use stat64_get() now that it exists.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Peter Xu <peterx@redhat.com>
---
Don't use __nocheck() variants
Use stat64_get()
We need to add a new flag to mean to flush at that point.
Notice that we still flush at the end of setup and at the end of
complete stages.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Acked-by: Peter Xu <peterx@redhat.com>
---
Add missing qemu_fflush(), now it passes all tests always.
In the previous version, the check that changes the default value to
false got lost in some rebase. Get it back.
We only need to do that on the ram_save_iterate() call on sending and
on destination when we get a RAM_SAVE_FLAG_EOS.
In setup() and complete() we need to synch in both new and old cases,
so don't add a check there.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Acked-by: Peter Xu <peterx@redhat.com>
---
Remove the wrappers that we take out on patch 5.
We used to flush all channels at the end of each RAM section
sent. That is not needed, so preparing to only flush after a full
iteration through all the RAM.
Default value of the property is false. But we return "true" in
migrate_multifd_flush_after_each_section() until we implement the code
in following patches.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Acked-by: Peter Xu <peterx@redhat.com>
---
Rename each-iteration to after-each-section
Rename multifd-sync-after-each-section to
multifd-flush-after-each-section
Move to machine-8.0 (peter)
Notice that we changed the test of ->has_block_bitmap_mapping
for the test that block_bitmap_mapping is not NULL.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
---
Make it return const (vladimir)
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
---
Moved the type to const char * (vladimir)
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
---
Moved the type to const char * (vladimir)
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
---
Moved the type to const char * (vladimir)
This makes the function more regular with everything else.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Once there, make it more regular and remove the need for
MigrationState parameter.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
We don't wait in the sem when we are doing a sync_main. Make it wait
there. To make things clearer, we mark the channel ready at the
begining of the thread loop.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
For VMS_ARRAY typed vmsd fields, also dump the number of entries in the
array in -vmstate-dump.
Without such information, vmstate static checker can report false negatives
of incompatible vmsd on VMS_ARRAY typed fields, when the src/dst do not
have the same type of array defined. It's because in the checker we only
check against size of fields within a VMSD field.
One example: e1000e used to have a field defined as a boolean array with 5
entries, then removed it and replaced it with UNUSED (in 31e3f318c8):
- VMSTATE_BOOL_ARRAY(core.eitr_intr_pending, E1000EState,
- E1000E_MSIX_VEC_NUM),
+ VMSTATE_UNUSED(E1000E_MSIX_VEC_NUM),
It's a legal replacement but vmstate static checker is not happy with it,
because it checks only against the "size" field between the two
fields (here one is BOOL_ARRAY, the other is UNUSED):
For BOOL_ARRAY:
{
"field": "core.eitr_intr_pending",
"version_id": 0,
"field_exists": false,
"size": 1
},
For UNUSED:
{
"field": "unused",
"version_id": 0,
"field_exists": false,
"size": 5
},
It's not the script to blame because there's just not enough information
dumped to show the total size of the entry for an array. Add it.
Note that this will not break old vmstate checker because the field will
just be ignored.
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Instead of print it to STDERR, bring the error upwards so that it can be
reported via QMP responses.
E.g.:
{ "execute": "migrate-set-capabilities" ,
"arguments": { "capabilities":
[ { "capability": "postcopy-ram", "state": true } ] } }
{ "error":
{ "class": "GenericError",
"desc": "Postcopy is not supported: Host backend files need to be TMPFS
or HUGETLBFS only" } }
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Once there, rename it to migrate_tls() and make it return bool for
consistency.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
---
Fix typos found by fabiano
Since the introduction of multifd, it's possible to perform a multifd
migration and finish it using postcopy.
A bug introduced by yank (fixed on cfc3bcf373) was previously preventing
a successful use of this migration scenario, and now thing should be
working on most scenarios.
But since there is not enough testing/support nor any reported users for
this scenario, we should disable this combination before it may cause any
problems for users.
Suggested-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Leonardo Bras <leobras@redhat.com>
Acked-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
To be consistent with every other parameter, rename to
migrate_block_incremental().
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
---
Fixed missing space after comma (fabiano)
Once that we are there, we rename the function to migrate_return_path()
to be consistent with all other capabilities.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Once that we are there, we rename the function to migrate_block()
to be consistent with all other capabilities.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Once that we are there, we rename the function to migrate_xbzrle()
to be consistent with all other capabilities.
We change the type to return bool also for consistency.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Once that we are there, we rename the function to
migrate_zero_copy_send() to be consistent with all other capabilities.
We can remove the CONFIG_LINUX guard. We already check that we can't
setup this capability in migrate_caps_check().
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Once that we are there, we rename the function to migrate_multifd()
to be consistent with all other capabilities.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Once that we are there, we rename the function to migrate_events()
to be consistent with all other capabilities.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Once that we are there, we rename the function to migrate_compress()
to be consistent with all other capabilities.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Once that we are there, we rename the function to migrate_colo() to be
consistent with all other capabilities.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
We move there all capabilities helpers from migration.c.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
Following David advise:
- looked through the history, capabilities are newer than 2012, so we
can remove that bit of the header.
- This part is posterior to Anthony.
Original Author is Orit. Once there,
I put myself. Peter Xu also did quite a bit of work here.
Anyone else wants/needs to be there? I didn't search too hard
because nobody asked before to be added.
What do you think?
And remove the convoluted use of qmp_migrate_set_capabilities() to
enable disable MIGRATION_CAPABILITY_BLOCK.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
It has nothing to do with migration, except for the "migrate" in the
name of the command. Move it with the rest of the ui commands.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
It is only used there, so we can make it static.
Once there, remove spice.h that it is not used.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
---
fix David Edmonson ui/qemu-spice.h unintended removal
No need to declare a temporary variable.
Suggested-by: Juan Quintela <quintela@redhat.com>
Fixes: 1df36e8c6289 ("migration: Handle block device inactivation failures better")
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
We used to pass the old capabilities array and the new
capabilities as a list.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
It is clear from the context what that means, and such a long name
with the extra long names of the capabilities make very difficilut to
stay inside the 80 columns limit.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Postcopy requires the memory support userfaultfd to work. Right now we
check it but it's a bit too late (when switching to postcopy migration).
Do that early right at enabling of postcopy.
Note that this is still only a best effort because ramblocks can be
dynamically created. We can add check in hostmem creations and fail if
postcopy enabled, but maybe that's too aggressive.
Still, we have chance to fail the most obvious where we know there's an
existing unsupported ramblock.
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Consider what happens when performing a migration between two host
machines connected to an NFS server serving multiple block devices to
the guest, when the NFS server becomes unavailable. The migration
attempts to inactivate all block devices on the source (a necessary
step before the destination can take over); but if the NFS server is
non-responsive, the attempt to inactivate can itself fail. When that
happens, the destination fails to get the migrated guest (good,
because the source wasn't able to flush everything properly):
(qemu) qemu-kvm: load of migration failed: Input/output error
at which point, our only hope for the guest is for the source to take
back control. With the current code base, the host outputs a message, but then appears to resume:
(qemu) qemu-kvm: qemu_savevm_state_complete_precopy_non_iterable: bdrv_inactivate_all() failed (-1)
(src qemu)info status
VM status: running
but a second migration attempt now asserts:
(src qemu) qemu-kvm: ../block.c:6738: int bdrv_inactivate_recurse(BlockDriverState *): Assertion `!(bs->open_flags & BDRV_O_INACTIVE)' failed.
Whether the guest is recoverable on the source after the first failure
is debatable, but what we do not want is to have qemu itself fail due
to an assertion. It looks like the problem is as follows:
In migration.c:migration_completion(), the source sets 'inactivate' to
true (since COLO is not enabled), then tries
savevm.c:qemu_savevm_state_complete_precopy() with a request to
inactivate block devices. In turn, this calls
block.c:bdrv_inactivate_all(), which fails when flushing runs up
against the non-responsive NFS server. With savevm failing, we are
now left in a state where some, but not all, of the block devices have
been inactivated; but migration_completion() then jumps to 'fail'
rather than 'fail_invalidate' and skips an attempt to reclaim those
those disks by calling bdrv_activate_all(). Even if we do attempt to
reclaim disks, we aren't taking note of failure there, either.
Thus, we have reached a state where the migration engine has forgotten
all state about whether a block device is inactive, because we did not
set s->block_inactive in enough places; so migration allows the source
to reach vm_start() and resume execution, violating the block layer
invariant that the guest CPUs should not be restarted while a device
is inactive. Note that the code in migration.c:migrate_fd_cancel()
will also try to reactivate all block devices if s->block_inactive was
set, but because we failed to set that flag after the first failure,
the source assumes it has reclaimed all devices, even though it still
has remaining inactivated devices and does not try again. Normally,
qmp_cont() will also try to reactivate all disks (or correctly fail if
the disks are not reclaimable because NFS is not yet back up), but the
auto-resumption of the source after a migration failure does not go
through qmp_cont(). And because we have left the block layer in an
inconsistent state with devices still inactivated, the later migration
attempt is hitting the assertion failure.
Since it is important to not resume the source with inactive disks,
this patch marks s->block_inactive before attempting inactivation,
rather than after succeeding, in order to prevent any vm_start() until
it has successfully reactivated all devices.
See also https://bugzilla.redhat.com/show_bug.cgi?id=2058982
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Acked-by: Lukas Straub <lukasstraub2@web.de>
Tested-by: Lukas Straub <lukasstraub2@web.de>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Rest of counters that refer to pages has a _pages suffix.
And historically, this showed the number of full pages transferred.
The name "normal" refered to the fact that they were sent without any
optimization (compression, xbzrle, zero_page, ...).
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Rest of counters that refer to pages has a _pages suffix.
And historically, this showed the number of pages composed of the same
character, here comes the name "duplicated". But since years ago, it
refers to the number of zero_pages.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
In the spirit of:
commit 394d323bc3451e4d07f13341cb8817fac8dfbadd
Author: Peter Xu <peterx@redhat.com>
Date: Tue Oct 11 17:55:51 2022 -0400
migration: Use atomic ops properly for page accountings
Reviewed-by: David Edmondson <david.edmondson@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Using MgrationStats as type for ram_counters mean that we didn't have
to re-declare each value in another struct. The need of atomic
counters have make us to create MigrationAtomicStats for this atomic
counters.
Create RAMStats type which is a merge of MigrationStats and
MigrationAtomicStats removing unused members.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
---
Fix typos found by David Edmondson
It does not even pair with a qatomic_mb_set(), so it is clearer to use
load-acquire in this case; they are synonyms.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
There should be no paths from a coroutine_fn to aio_poll, however in
practice coroutine_mixed_fn will call aio_poll in the !qemu_in_coroutine()
path. By marking mixed functions, we can track accurately the call paths
that execute entirely in coroutine context, and find more missing
coroutine_fn markers. This results in more accurate checks that
coroutine code does not end up blocking.
If the marking were extended transitively to all functions that call
these ones, static analysis could be done much more efficiently.
However, this is a start and makes it possible to use vrc's path-based
searches to find potential bugs where coroutine_fns call blocking functions.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
I removed that bit on commit:
commit c8df4a7aef
Author: Juan Quintela <quintela@redhat.com>
Date: Mon Oct 3 02:00:03 2022 +0200
migration: Split save_live_pending() into state_pending_*
Fixes: c8df4a7aef
Suggested-by: Nina Schoetterl-Glausch <nsg@linux.ibm.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Since ec6f3ab9, migration with compress enabled was broken, because
the compress threads use a dummy QEMUFile which just acts as a
buffer and that commit accidentally changed it to use the outgoing
migration channel instead.
Fix this by using the dummy file again in the compress threads.
Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
In 8.0 devel window we reworked preempt channel creation, so that there'll
be no race condition when the migration channel and preempt channel got
established in the wrong order in commit 5655aab079.
However no one noticed that the change will also be not compatible with
older qemus, majorly 7.1/7.2 versions where preempt mode started to be
supported.
Leverage the same pre-7.2 flag introduced in the previous patch to recover
the behavior hopefully before 8.0 releases, so we don't break migration
when we migrate from 8.0 to older qemu binaries.
Fixes: 5655aab079 ("migration: Postpone postcopy preempt channel to be after main")
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
postcopy_qemufile_src object should be owned by one thread, either the main
thread (e.g. when at the beginning, or at the end of migration), or by the
return path thread (when during a preempt enabled postcopy migration). If
that's not the case the access to the object might be racy.
postcopy_preempt_shutdown_file() can be potentially racy, because it's
called at the end phase of migration on the main thread, however during
which the return path thread hasn't yet been recycled; the recycle happens
in await_return_path_close_on_source() which is after this point.
It means, logically it's posslbe the main thread and the return path thread
are both operating on the same qemufile. While I don't think qemufile is
thread safe at all.
postcopy_preempt_shutdown_file() used to be needed because that's where we
send EOS to dest so that dest can safely shutdown the preempt thread.
To avoid the possible race, remove this only place that a race can happen.
Instead we figure out another way to safely close the preempt thread on
dest.
The core idea during postcopy on deciding "when to stop" is that dest will
send a postcopy SHUT message to src, telling src that all data is there.
Hence to shut the dest preempt thread maybe better to do it directly on
dest node.
This patch proposed such a way that we change postcopy_prio_thread_created
into PreemptThreadStatus, so that we kick the preempt thread on dest qemu
by a sequence of:
mis->preempt_thread_status = PREEMPT_THREAD_QUIT;
qemu_file_shutdown(mis->postcopy_qemufile_dst);
While here shutdown() is probably so far the easiest way to kick preempt
thread from a blocked qemu_get_be64(). Then it reads preempt_thread_status
to make sure it's not a network failure but a willingness to quit the
thread.
We could have avoided that extra status but just rely on migration status.
The problem is postcopy_ram_incoming_cleanup() is just called early enough
so we're still during POSTCOPY_ACTIVE no matter what.. So just make it
simple to have the status introduced.
One flag x-preempt-pre-7-2 is added to keep old pre-7.2 behaviors of
postcopy preempt.
Fixes: 9358982744 ("migration: Send requested page directly in rp-return thread")
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Uses of blk_nb_sectors must check whether the result is negative.
Otherwise, underflow can happen. Fortunately, alloc_aio_bitmap()
and bmds_aio_inflight() both have an alternative way to retrieve the
number of sectors in the file.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20230407153303.391121-6-pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This had been pulled in via qemu/plugin.h from hw/core/cpu.h,
but that will be removed.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20230310195252.210956-5-richard.henderson@linaro.org>
[AJB: add various additional cases shown by CI]
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20230315174331.2959-15-alex.bennee@linaro.org>
Reviewed-by: Emilio Cota <cota@braap.org>
Include CONFIG_DEVICES so that populate_vfio_info is instantiated for
CONFIG_VFIO. Without it, the 'info migrate' command never returns
info about vfio.
Fixes: 43bd0bf30f ("migration: Move populate_vfio_info() into a separate file")
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
The p->flags could be updated via the send_prepare callback, e.g. OR-ed
with MULTIFD_FLAG_ZLIB via zlib_send_prepare. Assign p->flags to the
local "flags" before the send_prepare callback could only get partial of
p->flags. Fix it by moving the assignment of p->flags to the local flags
after the callback, so that the correct flags can be traced.
Fixes: ab7cbb0b9a ("multifd: Make no compression operations into its own structure")
Signed-off-by: Wei Wang <wei.w.wang@intel.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
It's no longer needed since commit
44bcfd45e9 ("migration/rdma: destination: create the return patch after the first accept")
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
xbzrle_encode_buffer_avx512() checks for overflows too scarcely in its
outer loop, causing out-of-bounds writes:
$ ../configure --target-list=aarch64-softmmu --enable-sanitizers --enable-avx512bw
$ make tests/unit/test-xbzrle && ./tests/unit/test-xbzrle
==5518==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x62100000b100 at pc 0x561109a7714d bp 0x7ffed712a440 sp 0x7ffed712a430
WRITE of size 1 at 0x62100000b100 thread T0
#0 0x561109a7714c in uleb128_encode_small ../util/cutils.c:831
#1 0x561109b67f6a in xbzrle_encode_buffer_avx512 ../migration/xbzrle.c:275
#2 0x5611099a7428 in test_encode_decode_overflow ../tests/unit/test-xbzrle.c:153
#3 0x7fb2fb65a58d (/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x7a58d)
#4 0x7fb2fb65a333 (/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x7a333)
#5 0x7fb2fb65aa79 in g_test_run_suite (/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x7aa79)
#6 0x7fb2fb65aa94 in g_test_run (/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x7aa94)
#7 0x5611099a3a23 in main ../tests/unit/test-xbzrle.c:218
#8 0x7fb2fa78c082 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x24082)
#9 0x5611099a608d in _start (/qemu/build/tests/unit/test-xbzrle+0x28408d)
0x62100000b100 is located 0 bytes to the right of 4096-byte region [0x62100000a100,0x62100000b100)
allocated by thread T0 here:
#0 0x7fb2fb823a06 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cc:153
#1 0x7fb2fb637ef0 in g_malloc0 (/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x57ef0)
Fix that by performing the overflow check in the inner loop, instead.
Signed-off-by: Matheus Tavares Bernardino <quic_mathbern@quicinc.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
__builtin_ctzll() produces undefined results when the argument is 0.
This can be seen through test-xbzrle, which produces the following
warning:
../migration/xbzrle.c:265: runtime error: passing zero to ctz(), which is not a valid argument
Replace __builtin_ctzll() with our ctz64() wrapper which properly
handles 0.
Signed-off-by: Matheus Tavares Bernardino <quic_mathbern@quicinc.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
The RDMA code has return-path handling code, but it's only enabled
if postcopy is enabled; if the 'return-path' migration capability
is enabled, the return path is NOT setup but the core migration
code still tries to use it and breaks.
Enable the RDMA return path if either postcopy or the return-path
capability is enabled.
bz: https://bugzilla.redhat.com/show_bug.cgi?id=2063615
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
QEMU main thread will wait until dest preempt channel established during
processing the LISTEN command (within the whole postcopy PACKAGED data), by
waiting on the semaphore postcopy_qemufile_dst_done.
That's racy, because it's possible that the dest QEMU main thread hasn't
yet accept()ed the new connection when processing the LISTEN event. The
sem_wait() will yield the main thread without being able to run anything
else including the accept() of the new socket, which can cause deadlock
within the main thread.
To avoid the race, move the "wait channel" from main thread to the preempt
thread right at the start.
Reported-by: Peter Maydell <peter.maydell@linaro.org>
Fixes: 5655aab079 ("migration: Postpone postcopy preempt channel to be after main")
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
* Use cmd instead of /bin/sh on Windows.
* Try to auto-detect cmd.exe's path, but default to a hard-coded path.
Note that this will require that gspawn-win[32|64]-helper.exe and
gspawn-win[32|64]-helper-console.exe are included in the Windows binary
distributions (cc: Stefan Weil).
Signed-off-by: "John Berberian, Jr" <jeb.study@gmail.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
The QERR_ macros are leftovers from the days of "rich" error objects.
We've been trying to reduce their remaining use.
Get rid of a use of QERR_FEATURE_DISABLED, and improve the somewhat
imprecise error message
(qemu) x_colo_lost_heartbeat
Error: The feature 'colo' is not enabled
to
Error: VM is not in COLO mode
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20230207075115.1525-12-armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20230207075115.1525-2-armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Konstantin Kostiuk <kkostiuk@redhat.com>
Once that res_compatible is removed, they don't make sense anymore.
We remove the _only preffix. And to make things clearer we rename
them to must_precopy and can_postcopy.
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Nothing assigns to it after previous commit.
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Signed-off-by: Juan Quintela <quintela@redhat.com>
So remove last assignation of res_compatible.
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Finish the conversion from commit fe80c0241d
("migration: using trace_ to replace DPRINTF").
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Add new function qemu_file_get_to_fd() that allows reading data from
QEMUFile and writing it straight into a given fd.
This will be used later in VFIO migration code.
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
0x80 is RAM_SAVE_FLAG_HOOK, it is in qemu-file now.
Bigger usable flag is 0x200, noticing that.
We can reuse RAM_SAVe_FLAG_FULL.
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Currently running migration_incoming_state_destroy() without first running
multifd_load_cleanup() will cause a yank error:
qemu-system-x86_64: ../util/yank.c:107: yank_unregister_instance:
Assertion `QLIST_EMPTY(&entry->yankfns)' failed.
(core dumped)
The above error happens in the target host, when multifd is being used
for precopy, and then postcopy is triggered and the migration finishes.
This will crash the VM in the target host.
To avoid that, move multifd_load_cleanup() inside
migration_incoming_state_destroy(), so that the load cleanup becomes part
of the incoming state destroying process.
Running multifd_load_cleanup() twice can become an issue, though, but the
only scenario it could be ran twice is on process_incoming_migration_bh().
So removing this extra call is necessary.
On the other hand, this multifd_load_cleanup() call happens way before the
migration_incoming_state_destroy() and having this happening before
dirty_bitmap_mig_before_vm_start() and vm_start() may be a need.
So introduce a new function multifd_load_shutdown() that will mainly stop
all multifd threads and close their QIOChannels. Then use this function
instead of multifd_load_cleanup() to make sure nothing else is received
before dirty_bitmap_mig_before_vm_start().
Fixes: b5eea99ec2 ("migration: Add yank feature")
Reported-by: Li Xiaohui <xiaohli@redhat.com>
Signed-off-by: Leonardo Bras <leobras@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Current approach will only join threads that are still running.
For the threads not joined, resources or private memory are always kept in
the process space and never reclaimed before process end, and this risks
serious memory leaks.
This should usually not represent a big problem, since multifd migration
is usually just ran at most a few times, and after it succeeds there is
not much to be done before exiting the process.
Yet still, it should not hurt performance to join all of them.
Fixes: b5eea99ec2 ("migration: Add yank feature")
Reported-by: Li Xiaohui <xiaohli@redhat.com>
Signed-off-by: Leonardo Bras <leobras@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Before assigning "p->quit = true" for every multifd channel,
multifd_load_cleanup() will call multifd_recv_terminate_threads() which
already does the same assignment, while protected by a mutex.
So there is no point doing the same assignment again.
Fixes: b5eea99ec2 ("migration: Add yank feature")
Reported-by: Li Xiaohui <xiaohli@redhat.com>
Signed-off-by: Leonardo Bras <leobras@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Since it's introduction in commit f986c3d256 ("migration: Create multifd
migration threads"), multifd_load_cleanup() never returned any value
different than 0, neither set up any error on errp.
Even though, on process_incoming_migration_bh() an if clause uses it's
return value to decide on setting autostart = false, which will never
happen.
In order to simplify the codebase, change multifd_load_cleanup() signature
to 'void multifd_load_cleanup(void)', and for every usage remove error
handling or decision made based on return value != 0.
Fixes: b5eea99ec2 ("migration: Add yank feature")
Reported-by: Li Xiaohui <xiaohli@redhat.com>
Signed-off-by: Leonardo Bras <leobras@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Postcopy with preempt-mode enabled needs two channels to communicate. The
order of channel establishment is not guaranteed. It can happen that the
dest QEMU got the preempt channel connection request before the main
channel is established, then the migration may make no progress even during
precopy due to the wrong order.
To fix it, create the preempt channel only if we know the main channel is
established.
For a general postcopy migration, we delay it until postcopy_start(),
that's where we already went through some part of precopy on the main
channel. To make sure dest QEMU has already established the channel, we
wait until we got the first PONG received. That's something we do at the
start of precopy when postcopy enabled so it's guaranteed to happen sooner
or later.
For a postcopy recovery, we delay it to qemu_savevm_state_resume_prepare()
where we'll have round trips of data on bitmap synchronizations, which
means the main channel must have been established.
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
This is mostly useless, but useful for us to know whether the main channel
is correctly established without changing the migration protocol.
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Since we just dropped the only case where postcopy_preempt_setup() can
return an error, it doesn't need a retval anymore because it never fails.
Move the preempt check to the caller, preparing it to be used elsewhere to
do nothing but as simple as kicking the async connection.
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
The whole idea of multi-channel checks was not properly done, IMHO.
Currently we check multi-channel in a lot of places, but actually that's
not needed because we only need to check it right after we get the URI and
that should be it.
If the URI check succeeded, we should never need to check it again because
we must have it. If it check fails, we should fail immediately on either
the qmp_migrate or qmp_migrate_incoming, instead of failingg it later after
the connection established.
Neither should we fail any set capabiliities like what we used to do here:
5ad15e8614 ("migration: allow enabling mutilfd for specific protocol only", 2021-10-19)
Because logically the URI will only be set later after the capability is
set, so it doesn't make a lot of sense to check the URI type when setting
the capability, because we're checking the cap with an old URI passed in,
and that may not even be the URI we're going to use later.
This patch mostly reverted all such checks for before, dropping the
variable migrate_allow_multi_channels and helpers. Instead, add a common
helper to check URI for multi-channels for either qmp_migrate and
qmp_migrate_incoming and that should do all the proper checks. The failure
will only trigger with the "migrate" or "migrate_incoming" command, or when
user specified "-incoming xxx" where "xxx" is not "defer".
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
This commit is the same with [PATCH v6 1/2], and provides avx512 support for xbzrle_encode_buffer
function to accelerate xbzrle encoding speed. Runtime check of avx512
support and benchmark for this feature are added. Compared with C
version of xbzrle_encode_buffer function, avx512 version can achieve
50%-70% performance improvement on benchmarking. In addition, if dirty
data is randomly located in 4K page, the avx512 version can achieve
almost 140% performance gain.
Signed-off-by: ling xu <ling1.xu@intel.com>
Co-authored-by: Zhou Zhao <zhou.zhao@intel.com>
Co-authored-by: Jun Jin <jun.i.jin@intel.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
I called the helper function from the wrong top level function.
This code was introduced in:
commit c8df4a7aef
Author: Juan Quintela <quintela@redhat.com>
Date: Mon Oct 3 02:00:03 2022 +0200
migration: Split save_live_pending() into state_pending_*
We split the function into to:
- state_pending_estimate: We estimate the remaining state size without
stopping the machine.
- state pending_exact: We calculate the exact amount of remaining
state.
Thanks to Avihai Horon <avihaih@nvidia.com> for finding it.
Fixes:c8df4a7aeffcb46020f610526eea621fa5b0cd47
When we introduced that patch, we enden calling
state_pending_estimate() helper from qemu_savevm_statepending_exact()
and
state_pending_exact() helper from qemu_savevm_statepending_estimate()
This patch fixes it.
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
We are going to create a new function for multifd latest in the series.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
We are recalculating ram size continously, when we know that it don't
change during migration. Create a field in RAMState to track it.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
It is just a big if in the middle of the function, and we need two
functions anways.
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Juan Quintela <quintela@redhat.com>
---
Reindent to make Phillipe happy (and CODING_STYLE)
We used to return two bools, just return a single int with the
following meaning:
old return / again / new return
false false PAGE_ALL_CLEAN
false true PAGE_TRY_AGAIN
true true PAGE_DIRTY_FOUND /* We don't care about again at all */
Signed-off-by: Juan Quintela <quintela@redhat.com>
We will need later that find_dirty_block() return errors, so
simplify the loop.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Cleanup multifd_channel_connect
Signed-off-by: Li Zhang <lizhang@suse.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
I introduced spurious files on my tree during a rebase:
commit ebfc578715
Author: Zhenzhong Duan <zhenzhong.duan@intel.com>
Date: Mon Oct 17 15:53:51 2022 +0800
multifd: Fix flush of zero copy page send request
Make IO channel flush call after the inflight request has been drained
in multifd thread, or else we may missed to flush the inflight request.
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
To make things worse, it appears like Zhenzhong is the one to blame.
for(int i=0; i < 1000000; i++) {
printf("I will not do rebases when I am tired\n");
}
Sorry, Juan.
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Tracked down with the help of scripts/clean-includes.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-Id: <20230202133830.2152150-21-armbru@redhat.com>
To support query migration thread infomation, save and delete
thread(live_migration and multifdsend) information at thread
creation and finish.
Signed-off-by: Jiang Jiacheng <jiangjiacheng@huawei.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Introduce interface query-migrationthreads. The interface is used
to query information about migration threads and returns with
migration thread's name and its id.
Introduce threadinfo.c to manage threads with migration.
Signed-off-by: Jiang Jiacheng <jiangjiacheng@huawei.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Make IO channel flush call after the inflight request has been drained
in multifd thread, or else we may missed to flush the inflight request.
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
In multifd_queue_page() MultiFDPages_t.block is checked twice.
Between the two checks, MultiFDPages_t.block may be reset to NULL
by multifd thread. This lead to the 2nd check always true then a
redundant page submitted to multifd thread again.
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Current logic assumes that channel connections on the destination side are
always established in the same order as the source and the first one will
always be the main channel followed by the multifid or post-copy
preemption channel. This may not be always true, as even if a channel has a
connection established on the source side it can be in the pending state on
the destination side and a newer connection can be established first.
Basically causing out of order mapping of channels on the destination side.
Currently, all channels except post-copy preempt send a magic number, this
patch uses that magic number to decide the type of channel. This logic is
applicable only for precopy(multifd) live migration, as mentioned, the
post-copy preempt channel does not send any magic number. Also, tls live
migrations already does tls handshake before creating other channels, so
this issue is not possible with tls, hence this logic is avoided for tls
live migrations. This patch uses read peek to check the magic number of
channels so that current data/control stream management remains
un-effected.
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Suggested-by: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: manish.mishra <manish.mishra@nutanix.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
MSG_PEEK peeks at the channel, The data is treated as unread and
the next read shall still return this data. This support is
currently added only for socket class. Extra parameter 'flags'
is added to io_readv calls to pass extra read flags like MSG_PEEK.
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Suggested-by: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: manish.mishra <manish.mishra@nutanix.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
The value of "Sample Pages" is confusing in mode other than page-sampling.
See below:
(qemu) calc_dirty_rate -b 10 520
(qemu) info dirty_rate
Status: measuring
Start Time: 11646834 (ms)
Sample Pages: 520 (per GB)
Period: 10 (sec)
Mode: dirty-bitmap
Dirty rate: (not ready)
(qemu) info dirty_rate
Status: measured
Start Time: 11646834 (ms)
Sample Pages: 0 (per GB)
Period: 10 (sec)
Mode: dirty-bitmap
Dirty rate: 2 (MB/s)
While it's totally useless in dirty-ring and dirty-bitmap mode, fix to
show it only in page-sampling mode.
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Perform a check on vmsd structures during test runs in the hope
of catching any missing terminators and other simple screwups.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
We fairly regularly forget VMSTATE_END_OF_LIST markers off descriptions;
given that the current check is only for ->name being NULL, sometimes
we get unlucky and the code apparently works and no one spots the error.
Explicitly add a flag, VMS_END that should be set, and assert it is
set during the traversal.
Note: This can't go in until we update the copy of vmstate.h in slirp.
Suggested-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
upon errors. As the documentation in include/io/channel.h states, only
-1 and QIO_CHANNEL_ERR_BLOCK should be returned upon error. Other
values have the potential to confuse the call sites.
error_setg is used rather than error_setg_errno, because there are
certain code paths where -1 (as a non-errno) is propagated up (e.g.
starting from qemu_rdma_block_for_wrid or qemu_rdma_post_recv_control)
all the way to qio_channel_rdma_{readv,writev}.
Similar to a216ec85b7 ("migration/channel-block: fix return value for
qio_channel_block_{readv,writev}").
Suggested-by: Zhang Chen <chen.zhang@intel.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
The downtime should be displayed during postcopy phase because the
switchover phase is done. OTOH it's weird to show "expected downtime"
which can confuse what does that mean if the switchover has already
happened anyway.
This is a slight ABI change on QMP, but I assume it shouldn't affect
anyone.
Reviewed-by: Leonardo Bras <leobras@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Let's factor out this check, to be used in virtio-mem context next.
While at it, fix a spelling error in a related comment.
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>S
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
For virtio-mem, we want to have the plugged/unplugged state of memory
blocks available before migrating any actual RAM content, and perform
sanity checks before touching anything on the destination. This
information is immutable on the migration source while migration is active,
We want to use this information for proper preallocation support with
migration: currently, we don't preallocate memory on the migration target,
and especially with hugetlb, we can easily run out of hugetlb pages during
RAM migration and will crash (SIGBUS) instead of catching this gracefully
via preallocation.
Migrating device state via a VMSD before we start iterating is currently
impossible: the only approach that would be possible is avoiding a VMSD
and migrating state manually during save_setup(), to be restored during
load_state().
Let's allow for migrating device state via a VMSD early, during the
setup phase in qemu_savevm_state_setup(). To keep it simple, we
indicate applicable VMSD's using an "early_setup" flag.
Note that only very selected devices (i.e., ones seriously messing with
RAM setup) are supposed to make use of such early state migration.
While at it, also use a bool for the "unmigratable" member.
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>S
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
... and store it in the migration state. This is a preparation for
storing selected vmds's already in qemu_savevm_state_setup().
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Let's move more code into vmstate_save(), reducing code duplication and
preparing for reuse of vmstate_save() in qemu_savevm_state_setup(). We
have to move vmstate_save() to make the compiler happy.
We'll now also trace from qemu_save_device_state(), triggering the same
tracepoints as previously called from
qemu_savevm_state_complete_precopy_non_iterable() only. Note that
qemu_save_device_state() ignores iterable device state, such as RAM,
and consequently doesn't trigger some other trace points (e.g.,
trace_savevm_state_setup()).
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
ram_block_populate_read() already optimizes for RamDiscardManager.
However, ram_write_tracking_start() will still try protecting discarded
memory ranges.
Let's optimize, because discarded ranges don't map any pages and
(1) For anonymous memory, trying to protect using uffd-wp without a mapped
page is ignored by the kernel and consequently a NOP.
(2) For shared/file-backed memory, we will fill present page tables in the
range with PTE markers. However, we will even allocate page tables
just to fill them with unnecessary PTE markers and effectively
waste memory.
So let's exclude these ranges, just like ram_block_populate_read()
already does.
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
ram_mig_ram_block_resized() will abort migration (including background
snapshots) when resizing a RAMBlock. ram_block_populate_read() will only
populate RAM up to used_length, so at least for anonymous memory
protecting everything between used_length and max_length won't
actually be protected and is just a NOP.
So let's only protect everything up to used_length.
Note: it still makes sense to register uffd-wp for max_length, such
that RAM_UF_WRITEPROTECT is independent of a changing used_length.
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
When unregistering uffd-wp, older kernels before commit f369b07c86143
("mm/uffd:reset write protection when unregister with wp-mode") won't
clear the uffd-wp PTE bit. When re-registering uffd-wp, the previous
uffd-wp PTE bits would trigger again. With above commit, the kernel will
clear the uffd-wp PTE bits when unregistering itself.
Consequently, we'll clear the uffd-wp PTE bits now twice -- whereby we
don't care about clearing them at all: a new background snapshot will
re-register uffd-wp and re-protect all memory either way.
So let's skip the manual clearing of uffd-wp. If ever relevant, we
could clear conditionally in uffd_unregister_memory() -- we just need a
way to figure out more recent kernels.
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
If something goes wrong during uffd_change_protection(), we would miss
to unregister uffd-wp and not release our reference. Fix it by
performing the uffd_change_protection(true) last.
Note that a uffd_change_protection(false) on the recovery path without a
prior uffd_change_protection(false) is fine.
Fixes: 278e2f551a ("migration: support UFFD write fault processing in ram_save_iterate()")
Cc: qemu-stable@nongnu.org
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Unfortunately, commit f7b9dcfbcf broke populate_read_range(): the loop
end condition is very wrong, resulting in that function not populating the
full range. Lets' fix that.
Fixes: f7b9dcfbcf ("migration/ram: Factor out populating pages readable in ram_block_populate_pages()")
Cc: qemu-stable@nongnu.org
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Add a helper to create the uffd handle.
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Until previous commit, save_live_pending() was used for ram. Now with
the split into state_pending_estimate() and state_pending_exact() it
is not needed anymore, so remove them.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
We split the function into to:
- state_pending_estimate: We estimate the remaining state size without
stopping the machine.
- state pending_exact: We calculate the exact amount of remaining
state.
The only "device" that implements different functions for _estimate()
and _exact() is ram.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Commit d9e474ea56 overlooked the case where the target psize is even larger
than the host psize. One example is Alpha has 8K page size and migration
will start to crash the source QEMU when running Alpha migration on x86.
Fix it by detecting that case and set host start/end just to cover the
single page to be migrated.
This will slightly optimize the common case where host psize equals to
guest psize so we don't even need to do the roundups, but that's trivial.
Cc: qemu-stable@nongnu.org
Reported-by: Thomas Huth <thuth@redhat.com>
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1456
Fixes: d9e474ea56 ("migration: Teach PSS about host page")
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
This moves the command from MAINTAINERS sections "Human Monitor (HMP)"
and "QMP" to "Migration".
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20230124121946.1139465-19-armbru@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
This moves these commands from MAINTAINERS sections "Human
Monitor (HMP)" and "QMP" to "Migration".
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20230124121946.1139465-18-armbru@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
We have two inclusion loops:
block/block.h
-> block/block-global-state.h
-> block/block-common.h
-> block/blockjob.h
-> block/block.h
block/block.h
-> block/block-io.h
-> block/block-common.h
-> block/blockjob.h
-> block/block.h
I believe these go back to Emanuele's reorganization of the block API,
merged a few months ago in commit d7e2fe4aac.
Fortunately, breaking them is merely a matter of deleting unnecessary
includes from headers, and adding them back in places where they are
now missing.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20221221133551.3967339-2-armbru@redhat.com>
Hi
This are the patches that I had to drop form the last PULL request because they werent fixes:
- AVX2 is dropped, intel posted a fix, I have to redo it
- Fix for out of order channels is out
Daniel nacked it and I need to redo it
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEGJn/jt6/WMzuA0uC9IfvGFhy1yMFAmOa6xUACgkQ9IfvGFhy
1yP13BAAj4GdlWCqgvv98qIf9dY5WjvrbzL+8qdUvt7VIsDgh18amjlBmvvBngmd
tssPHqLTqs6CXYxo4PBwKsvhA1qBCg9Fr+RtMTJG4FoumFdeO/l4tcXs99Ww5o9p
OnrMAshTRHMRapvvX0vIiR0dGUPXs6KOz2JLNX1oF5ZY1yqskLxp9x3ydL7iw2oN
GikRUfd4bG8drvhrKl6WPZOMKt0fVRH/2j0TqKPtl/hh/F4Ie6AUSI7McYMwOeXx
xUhFcm2PKY5US6uYhZpKo7envCmuxreZSAH/eRrlu5uNCCOKaZ9uWYwACMJGpfrB
SqY5dCTDpfFoaOloFEOYDfWOwoCJl5u9vNwRK1ArSVCfjczq50itswFTQ3A/hyd2
1noMv60XcR3An3mUydQ3j/C+hfE3KVXdFPImOKjPrn8zU6f2Dfug3ALXiHi1xyov
ZdpcZjCEhdSruYxIdlIKfzlYLy8R1G4mSFrBV3NuMrywlM2fWQgyCUAYwzRwQrJw
oBiedgpNP/MCM4NPQKLpvz/sci6nxkrGV8QX44zg0LdViXkpCU5ZiaoPXQcbiQCC
Xkkah3GLbVt6788qKja2U9ccdofAe5yUbjo6XYxdbXC7y9mSyvBS9FCHvWr4HY/8
TUavGrcjKqQ31WxiyWw5CEi/hqNftFUNtWmEzZuAjRwM2cw89sU=
=zGNB
-----END PGP SIGNATURE-----
Merge tag 'next-8.0-pull-request' of https://gitlab.com/juan.quintela/qemu into staging
Migration patches for 8.0
Hi
This are the patches that I had to drop form the last PULL request because they werent fixes:
- AVX2 is dropped, intel posted a fix, I have to redo it
- Fix for out of order channels is out
Daniel nacked it and I need to redo it
# gpg: Signature made Thu 15 Dec 2022 09:38:29 GMT
# gpg: using RSA key 1899FF8EDEBF58CCEE034B82F487EF185872D723
# gpg: Good signature from "Juan Quintela <quintela@redhat.com>" [full]
# gpg: aka "Juan Quintela <quintela@trasno.org>" [full]
# Primary key fingerprint: 1899 FF8E DEBF 58CC EE03 4B82 F487 EF18 5872 D723
* tag 'next-8.0-pull-request' of https://gitlab.com/juan.quintela/qemu:
migration: Drop rs->f
migration: Remove old preempt code around state maintainance
migration: Send requested page directly in rp-return thread
migration: Move last_sent_block into PageSearchStatus
migration: Make PageSearchStatus part of RAMState
migration: Add pss_init()
migration: Introduce pss_channel
migration: Teach PSS about host page
migration: Use atomic ops properly for page accountings
migration: Yield bitmap_mutex properly when sending/sleeping
migration: Remove RAMState.f references in compression code
migration: Trivial cleanup save_page_header() on same block check
migration: Cleanup xbzrle zero page cache update logic
migration: Add postcopy_preempt_active()
migration: Take bitmap mutex when completing ram migration
migration: Export ram_release_page()
migration: Export ram_transferred_ram()
multifd: Create page_count fields into both MultiFD{Recv,Send}Params
multifd: Create page_size fields into both MultiFD{Recv,Send}Params
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEENUvIs9frKmtoZ05fOHC0AOuRhlMFAmOZ6lYSHGFybWJydUBy
ZWRoYXQuY29tAAoJEDhwtADrkYZT6VEQAKynjWh3AIZ4/qOgrVqsP0oRspevLmfH
BbuGoldjYpEE7RbwuCaZalZ7iy7TcSySxnPfUDVsFHd7NWffJVjwKHifGC0D/Ez0
+Ggyb1CBebN+mS7t+BNFUHdMM+wxFIlHwg4f4aTFbn2o0HKgj2a8tcNzNRonZbfa
xURnvbD4G4u0VZEc3Jak+x193xbOJFsuuWq0BZnDuNk+XqjyW2RwfpXLPJVk+82a
4uy/YgYuqXUqBeULwcJj+shBL4SXR9GyajTFMS64przSUle0ADUmXkPtaS2agV7e
Pym/UQuAcxvNyw34fJsiMZxx6rZI9YU30jQUMRLoYcPRR/Q/aiPeiiHtiD6Kaid7
IfOeH/EArXaQRFpD89xj4YcaTnRLQOEj0NXgXvAbQf6eD8JYyao/S/0lCsPZEoA2
nibLqEQ25ncDNXoSomuwtfjVff3w68lODFbhwqfA0gf3cPtCgVZ6xQ8P/McNY6K6
wqFHXMWTDHk1LOCTucjYz1z2TGzTnSG4iWi5Yt6FSxAc958AO+v5ALn/1pcYun+E
azM/MF0AInKj2aJCT530zT0tpCs/Jo07YKC8k6ubi77S0ZdmGS1XLeXkRXfk1+yI
OhuUgiVlSTHxD69DagT2vbnx1mDMM9X+OBIMvEi5nwvD9A/ghaCgkDeGFvbA1ud0
t0mxPBZJ+tiZ
=JJjG
-----END PGP SIGNATURE-----
Merge tag 'pull-misc-2022-12-14' of https://repo.or.cz/qemu/armbru into staging
Miscellaneous patches for 2022-12-14
# gpg: Signature made Wed 14 Dec 2022 15:23:02 GMT
# gpg: using RSA key 354BC8B3D7EB2A6B68674E5F3870B400EB918653
# gpg: issuer "armbru@redhat.com"
# gpg: Good signature from "Markus Armbruster <armbru@redhat.com>" [full]
# gpg: aka "Markus Armbruster <armbru@pond.sub.org>" [full]
# Primary key fingerprint: 354B C8B3 D7EB 2A6B 6867 4E5F 3870 B400 EB91 8653
* tag 'pull-misc-2022-12-14' of https://repo.or.cz/qemu/armbru:
ppc4xx_sdram: Simplify sdram_ddr_size() to return
block/vmdk: Simplify vmdk_co_create() to return directly
cleanup: Tweak and re-run return_directly.cocci
io: Tidy up fat-fingered parameter name
qapi: Use returned bool to check for failure (again)
sockets: Use ERRP_GUARD() where obviously appropriate
qemu-config: Use ERRP_GUARD() where obviously appropriate
qemu-config: Make config_parse_qdict() return bool
monitor: Use ERRP_GUARD() in monitor_init()
monitor: Simplify monitor_fd_param()'s error handling
error: Move ERRP_GUARD() to the beginning of the function
error: Drop a few superfluous ERRP_GUARD()
error: Drop some obviously superfluous error_propagate()
Drop more useless casts from void * to pointer
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Now with rs->pss we can already cache channels in pss->pss_channels. That
pss_channel contains more infromation than rs->f because it's per-channel.
So rs->f could be replaced by rss->pss[RAM_CHANNEL_PRECOPY].pss_channel,
while rs->f itself is a bit vague now.
Note that vanilla postcopy still send pages via pss[RAM_CHANNEL_PRECOPY],
that's slightly confusing but it reflects the reality.
Then, after the replacement we can safely drop rs->f.
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
With the new code to send pages in rp-return thread, there's little help to
keep lots of the old code on maintaining the preempt state in migration
thread, because the new way should always be faster..
Then if we'll always send pages in the rp-return thread anyway, we don't
need those logic to maintain preempt state anymore because now we serialize
things using the mutex directly instead of using those fields.
It's very unfortunate to have those code for a short period, but that's
still one intermediate step that we noticed the next bottleneck on the
migration thread. Now what we can do best is to drop unnecessary code as
long as the new code is stable to reduce the burden. It's actually a good
thing because the new "sending page in rp-return thread" model is (IMHO)
even cleaner and with better performance.
Remove the old code that was responsible for maintaining preempt states, at
the meantime also remove x-postcopy-preempt-break-huge parameter because
with concurrent sender threads we don't really need to break-huge anymore.
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
With all the facilities ready, send the requested page directly in the
rp-return thread rather than queuing it in the request queue, if and only
if postcopy preempt is enabled. It can achieve so because it uses separate
channel for sending urgent pages. The only shared data is bitmap and it's
protected by the bitmap_mutex.
Note that since we're moving the ownership of the urgent channel from the
migration thread to rp thread it also means the rp thread is responsible
for managing the qemufile, e.g. properly close it when pausing migration
happens. For this, let migration_release_from_dst_file to cover shutdown
of the urgent channel too, renaming it as migration_release_dst_files() to
better show what it does.
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Since we use PageSearchStatus to represent a channel, it makes perfect
sense to keep last_sent_block (aka, leverage RAM_SAVE_FLAG_CONTINUE) to be
per-channel rather than global because each channel can be sending
different pages on ramblocks.
Hence move it from RAMState into PageSearchStatus.
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
We used to allocate PSS structure on the stack for precopy when sending
pages. Make it static, so as to describe per-channel ram migration status.
Here we declared RAM_CHANNEL_MAX instances, preparing for postcopy to use
it, even though this patch has not yet to start using the 2nd instance.
This should not have any functional change per se, but it already starts to
export PSS information via the RAMState, so that e.g. one PSS channel can
start to reference the other PSS channel.
Always protect PSS access using the same RAMState.bitmap_mutex. We already
do so, so no code change needed, just some comment update. Maybe we should
consider renaming bitmap_mutex some day as it's going to be a more commonly
and big mutex we use for ram states, but just leave it for later.
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Helper to init PSS structures.
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Introduce pss_channel for PageSearchStatus, define it as "the migration
channel to be used to transfer this host page".
We used to have rs->f, which is a mirror to MigrationState.to_dst_file.
After postcopy preempt initial version, rs->f can be dynamically changed
depending on which channel we want to use.
But that later work still doesn't grant full concurrency of sending pages
in e.g. different threads, because rs->f can either be the PRECOPY channel
or POSTCOPY channel. This needs to be per-thread too.
PageSearchStatus is actually a good piece of struct which we can leverage
if we want to have multiple threads sending pages. Sending a single guest
page may not make sense, so we make the granule to be "host page", and in
the PSS structure we allow specify a QEMUFile* to migrate a specific host
page. Then we open the possibility to specify different channels in
different threads with different PSS structures.
The PSS prefix can be slightly misleading here because e.g. for the
upcoming usage of postcopy channel/thread it's not "searching" (or,
scanning) at all but sending the explicit page that was requested. However
since PSS existed for some years keep it as-is until someone complains.
This patch mostly (simply) replace rs->f with pss->pss_channel only. No
functional change intended for this patch yet. But it does prepare to
finally drop rs->f, and make ram_save_guest_page() thread safe.
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Migration code has a lot to do with host pages. Teaching PSS core about
the idea of host page helps a lot and makes the code clean. Meanwhile,
this prepares for the future changes that can leverage the new PSS helpers
that this patch introduces to send host page in another thread.
Three more fields are introduced for this:
(1) host_page_sending: this is set to true when QEMU is sending a host
page, false otherwise.
(2) host_page_{start|end}: these point to the start/end of host page
we're sending, and it's only valid when host_page_sending==true.
For example, when we look up the next dirty page on the ramblock, with
host_page_sending==true, we'll not try to look for anything beyond the
current host page boundary. This can be slightly efficient than current
code because currently we'll set pss->page to next dirty bit (which can be
over current host page boundary) and reset it to host page boundary if we
found it goes beyond that.
With above, we can easily make migration_bitmap_find_dirty() self contained
by updating pss->page properly. rs* parameter is removed because it's not
even used in old code.
When sending a host page, we should use the pss helpers like this:
- pss_host_page_prepare(pss): called before sending host page
- pss_within_range(pss): whether we're still working on the cur host page?
- pss_host_page_finish(pss): called after sending a host page
Then we can use ram_save_target_page() to save one small page.
Currently ram_save_host_page() is still the only user. If there'll be
another function to send host page (e.g. in return path thread) in the
future, it should follow the same style.
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
To prepare for thread-safety on page accountings, at least below counters
need to be accessed only atomically, they are:
ram_counters.transferred
ram_counters.duplicate
ram_counters.normal
ram_counters.postcopy_bytes
There are a lot of other counters but they won't be accessed outside
migration thread, then they're still safe to be accessed without atomic
ops.
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Don't take the bitmap mutex when sending pages, or when being throttled by
migration_rate_limit() (which is a bit tricky to call it here in ram code,
but seems still helpful).
It prepares for the possibility of concurrently sending pages in >1 threads
using the function ram_save_host_page() because all threads may need the
bitmap_mutex to operate on bitmaps, so that either sendmsg() or any kind of
qemu_sem_wait() blocking for one thread will not block the other from
progressing.
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Removing referencing to RAMState.f in compress_page_with_multi_thread() and
flush_compressed_data().
Compression code by default isn't compatible with having >1 channels (or it
won't currently know which channel to flush the compressed data), so to
make it simple we always flush on the default to_dst_file port until
someone wants to add >1 ports support, as rs->f right now can really
change (after postcopy preempt is introduced).
There should be no functional change at all after patch applied, since as
long as rs->f referenced in compression code, it must be to_dst_file.
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
The 2nd check on RAM_SAVE_FLAG_CONTINUE is a bit redundant. Use a boolean
to be clearer.
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
The major change is to replace "!save_page_use_compression()" with
"xbzrle_enabled" to make it clear.
Reasonings:
(1) When compression enabled, "!save_page_use_compression()" is exactly the
same as checking "xbzrle_enabled".
(2) When compression disabled, "!save_page_use_compression()" always return
true. We used to try calling the xbzrle code, but after this change we
won't, and we shouldn't need to.
Since at it, drop the xbzrle_enabled check in xbzrle_cache_zero_page()
because with this change it's not needed anymore.
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Add the helper to show that postcopy preempt enabled, meanwhile active.
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Any call to ram_find_and_save_block() needs to take the bitmap mutex. We
used to not take it for most of ram_save_complete() because we thought
we're the only one left using the bitmap, but it's not true after the
preempt full patchset applied, since the return path can be taking it too.
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: David Edmondson <david.edmondson@oracle.com>
Reviewed-by: Leonardo Bras <leobras@redhat.com>
We were recalculating it left and right. We plan to change that
values on next patches.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Leonardo Bras <leobras@redhat.com>
We were calling qemu_target_page_size() left and right.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Leonardo Bras <leobras@redhat.com>
The has_FOO for pointer-valued FOO are redundant, except for arrays.
They are also a nuisance to work with. Recent commit "qapi: Start to
elide redundant has_FOO in generated C" provided the means to elide
them step by step. This is the step for qapi/migration.json.
Said commit explains the transformation in more detail. The invariant
violations mentioned there do not occur here.
Cc: Juan Quintela <quintela@redhat.com>
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20221104160712.3005652-17-armbru@redhat.com>
Tweak the semantic patch to drop redundant parenthesis around the
return expression.
Coccinelle drops a comment in hw/rdma/vmw/pvrdma_cmd.c; restored
manually.
Coccinelle messes up vmdk_co_create(), not sure why. Change dropped,
will be done manually in the next commit.
Line breaks in target/avr/cpu.h and hw/rdma/vmw/pvrdma_cmd.c tidied up
manually.
Whitespace in tools/virtiofsd/fuse_lowlevel.c tidied up manually.
checkpatch.pl complains "return of an errno should typically be -ve"
two times for hw/9pfs/9p-synth.c. Preexisting, the patch merely makes
it visible to checkpatch.pl.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20221122134917.1217307-2-armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
And it appears that what is wrong is the code. During bulk stage we
need to make sure that some block is dirty, but no games with
max_size at all.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Multifd thread model does not work for compression, explicitly disable it.
Note that previuosly even we can enable both of them, nothing will go
wrong, because the compression code has higher priority so multifd feature
will just be ignored. Now we'll fail even earlier at config time so the
user should be aware of the consequence better.
Note that there can be a slight chance of breaking existing users, but
let's assume they're not majority and not serious users, or they should
have found that multifd is not working already.
With that, we can safely drop the check in ram_save_target_page() for using
multifd, because when multifd=on then compression=off, then the removed
check on save_page_use_compression() will also always return false too.
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
The preempt mode requires the capability to assign channel for each of the
page, while the compression logic will currently assign pages to different
compress thread/local-channel so potentially they're incompatible.
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
In qemu_file_shutdown(), there's a possible race if with current order of
operation. There're two major things to do:
(1) Do real shutdown() (e.g. shutdown() syscall on socket)
(2) Update qemufile's last_error
We must do (2) before (1) otherwise there can be a race condition like:
page receiver other thread
------------- ------------
qemu_get_buffer()
do shutdown()
returns 0 (buffer all zero)
(meanwhile we didn't check this retcode)
try to detect IO error
last_error==NULL, IO okay
install ALL-ZERO page
set last_error
--> guest crash!
To fix this, we can also check retval of qemu_get_buffer(), but not all
APIs can be properly checked and ultimately we still need to go back to
qemu_file_get_error(). E.g. qemu_get_byte() doesn't return error.
Maybe some day a rework of qemufile API is really needed, but for now keep
using qemu_file_get_error() and fix it by not allowing that race condition
to happen. Here shutdown() is indeed special because the last_error was
emulated. For real -EIO errors it'll always be set when e.g. sendmsg()
error triggers so we won't miss those ones, only shutdown() is a bit tricky
here.
Cc: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
When starting ram saving procedure (especially at the completion phase),
always set last_seen_block to non-NULL to make sure we can always correctly
detect the case where "we've migrated all the dirty pages".
Then we'll guarantee both last_seen_block and pss.block will be valid
always before the loop starts.
See the comment in the code for some details.
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Move flushing code from multifd_send_sync_main() to a new helper, and call
it in multifd_send_sync_main().
Signed-off-by: Leonardo Bras <leobras@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
in the error case. The documentation in include/io/channel.h states
that -1 or QIO_CHANNEL_ERR_BLOCK should be returned upon error. Simply
passing along the return value from the bdrv-functions has the
potential to confuse the call sides. Non-blocking mode is not
implemented currently, so -1 it is.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Snapshot loading only expects to call deterministic handlers, not
non-deterministic ones. So introduce a way of registering handlers that
won't be called when reseting for snapshots.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Message-id: 20221025004327.568476-2-Jason@zx2c4.com
[PMM: updated json doc comment with Markus' text; fixed
checkpatch style nit]
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Callers of coroutine_fn must be coroutine_fn themselves, or the call
must be within "if (qemu_in_coroutine())". Apply coroutine_fn to
functions where this holds.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Alberto Faria <afaria@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220922084924.201610-26-pbonzini@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
g_new(T, n) is neater than g_malloc(sizeof(T) * n). It's also safer,
for two reasons. One, it catches multiplication overflowing size_t.
Two, it returns T * rather than void *, which lets the compiler catch
more type errors.
This commit only touches allocations with size arguments of the form
sizeof(T).
Patch created mechanically with:
$ spatch --in-place --sp-file scripts/coccinelle/use-g_new-etc.cocci \
--macro-file scripts/cocci-macro-file.h FILES...
The previous iteration was commit a95942b50c.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20220923084254.4173111-1-armbru@redhat.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
When we use BLK_MIG_BLOCK_SIZE in expressions like
block_mig_state.submitted * BLK_MIG_BLOCK_SIZE, this multiplication
is done as 32 bits, because both operands are 32 bits. Coverity
complains about possible overflows because we then accumulate that
into a 64 bit variable.
Define BLK_MIG_BLOCK_SIZE as unsigned long long using the ULL suffix.
The only two current uses of it with this problem are both in
block_save_pending(), so we could just cast to uint64_t there, but
using the ULL suffix is simpler and ensures that we don't
accidentally introduce new variants of the same issue in future.
Resolves: Coverity CID 1487136, 1487175
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20220721115207.729615-3-peter.maydell@linaro.org>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Coverity complains that when we use the return value from
migrate_multifd_compression() as an array index:
multifd_recv_state->ops = multifd_ops[migrate_multifd_compression()];
that this might overrun the array (which is declared to have size
MULTIFD_COMPRESSION__MAX). This is because the function return type
is MultiFDCompression, which is an autogenerated enum. The code
generator includes the "one greater than the maximum possible value"
MULTIFD_COMPRESSION__MAX in the enum, even though this is not
actually a valid value for the enum, and this makes Coverity think
that migrate_multifd_compression() could return that __MAX value and
index off the end of the array.
Suppress the Coverity error by asserting that the value we're going
to return is within range.
Resolves: Coverity CID 1487239, 1487254
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20220721115207.729615-2-peter.maydell@linaro.org>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
This reverts commit cfd66f30fb.
The simplification of unqueue_page() introduced a bug that sometimes
breaks migration on s390x hosts.
The problem is not fully understood yet, but since we are already in
the freeze for QEMU 7.1 and we need something working there, let's
revert this patch for the upcoming release. The optimization can be
redone later again in a proper way if necessary.
Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2099934
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20220802061949.331576-1-thuth@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Some of params->has_* = true are missing in migration_instance_init, this
causes migrate_params_check() to skip some tests, allowing some
unsupported scenarios.
Fix this by adding all missing params->has_* = true in
migration_instance_init().
Fixes: 69ef1f36b0 ("migration: define 'tls-creds' and 'tls-hostname' migration parameters")
Fixes: 1d58872a91 ("migration: do not wait for free thread")
Fixes: d2f1d29b95 ("migration: add support for a "tls-authz" migration parameter")
Signed-off-by: Leonardo Bras <leobras@redhat.com>
Message-Id: <20220726010235.342927-1-leobras@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Migration with zero-copy-send currently has it's limitations, as it can't
be used with TLS nor any kind of compression. In such scenarios, it should
output errors during parameter / capability setting.
But currently there are some ways of setting this not-supported scenarios
without printing the error message:
!) For 'compression' capability, it works by enabling it together with
zero-copy-send. This happens because the validity test for zero-copy uses
the helper unction migrate_use_compression(), which check for compression
presence in s->enabled_capabilities[MIGRATION_CAPABILITY_COMPRESS].
The point here is: the validity test happens before the capability gets
enabled. If all of them get enabled together, this test will not return
error.
In order to fix that, replace migrate_use_compression() by directly testing
the cap_list parameter migrate_caps_check().
2) For features enabled by parameters such as TLS & 'multifd_compression',
there was also a possibility of setting non-supported scenarios: setting
zero-copy-send first, then setting the unsupported parameter.
In order to fix that, also add a check for parameters conflicting with
zero-copy-send on migrate_params_check().
3) XBZRLE is also a compression capability, so it makes sense to also add
it to the list of capabilities which are not supported with zero-copy-send.
Fixes: 1abaec9a1b ("migration: Change zero_copy_send from migration parameter to migration capability")
Signed-off-by: Leonardo Bras <leobras@redhat.com>
Message-Id: <20220719122345.253713-1-leobras@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reorder the structures so we can know if the fields are:
- Read only
- Their own locking (i.e. sems)
- Protected by 'mutex'
- Only for the multifd channel
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-Id: <20220531104318.7494-2-quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
dgilbert: Typo fixes from Chen Zhang
Some errors, like the lack of Scatter-Gather support by the network
interface(NETIF_F_SG) may cause sendmsg(...,MSG_ZEROCOPY) to fail on using
zero-copy, which causes it to fall back to the default copying mechanism.
After each full dirty-bitmap scan there should be a zero-copy flush
happening, which checks for errors each of the previous calls to
sendmsg(...,MSG_ZEROCOPY). If all of them failed to use zero-copy, then
increment dirty_sync_missed_zero_copy migration stat to let the user know
about it.
Signed-off-by: Leonardo Bras <leobras@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Acked-by: Peter Xu <peterx@redhat.com>
Message-Id: <20220711211112.18951-4-leobras@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Leonardo Bras <leobras@redhat.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Acked-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20220711211112.18951-3-leobras@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
The code calls qio_channel_read() in a loop when it reports
QIO_CHANNEL_ERR_BLOCK. This code is reported when errno==EAGAIN.
As such the later block of code will always hit the 'errno != EAGAIN'
condition, making the final 'else' unreachable.
Fixes: Coverity CID 1490203
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20220627135318.156121-1-berrange@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
With preemption mode on, when we see a postcopy request that was requesting
for exactly the page that we have preempted before (so we've partially sent
the page already via PRECOPY channel and it got preempted by another
postcopy request), currently we drop the request so that after all the
other postcopy requests are serviced then we'll go back to precopy stream
and start to handle that.
We dropped the request because we can't send it via postcopy channel since
the precopy channel already contains partial of the data, and we can only
send a huge page via one channel as a whole. We can't split a huge page
into two channels.
That's a very corner case and that works, but there's a change on the order
of postcopy requests that we handle since we're postponing this (unlucky)
postcopy request to be later than the other queued postcopy requests. The
problem is there's a possibility that when the guest was very busy, the
postcopy queue can be always non-empty, it means this dropped request will
never be handled until the end of postcopy migration. So, there's a chance
that there's one dest QEMU vcpu thread waiting for a page fault for an
extremely long time just because it's unluckily accessing the specific page
that was preempted before.
The worst case time it needs can be as long as the whole postcopy migration
procedure. It's extremely unlikely to happen, but when it happens it's not
good.
The root cause of this problem is because we treat pss->postcopy_requested
variable as with two meanings bound together, as the variable shows:
1. Whether this page request is urgent, and,
2. Which channel we should use for this page request.
With the old code, when we set postcopy_requested it means either both (1)
and (2) are true, or both (1) and (2) are false. We can never have (1)
and (2) to have different values.
However it doesn't necessarily need to be like that. It's very legal that
there's one request that has (1) very high urgency, but (2) we'd like to
use the precopy channel. Just like the corner case we were discussing
above.
To differenciate the two meanings better, introduce a new field called
postcopy_target_channel, showing which channel we should use for this page
request, so as to cover the old meaning (2) only. Then we leave the
postcopy_requested variable to stand only for meaning (1), which is the
urgency of this page request.
With this change, we can easily boost priority of a preempted precopy page
as long as we know that page is also requested as a postcopy page. So with
the new approach in get_queued_page() instead of dropping that request, we
send it right away with the precopy channel so we get back the ordering of
the page faults just like how they're requested on dest.
Reported-by: Manish Mishra <manish.mishra@nutanix.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Manish Mishra <manish.mishra@nutanix.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20220707185520.27583-1-peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
This patch is based on the async preempt channel creation. It continues
wiring up the new channel with TLS handshake to destionation when enabled.
Note that only the src QEMU needs such operation; the dest QEMU does not
need any change for TLS support due to the fact that all channels are
established synchronously there, so all the TLS magic is already properly
handled by migration_tls_channel_process_incoming().
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20220707185518.27529-1-peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
It's useful for specifying tls credentials all in the cmdline (along with
the -object tls-creds-*), especially for debugging purpose.
The trick here is we must remember to not free these fields again in the
finalize() function of migration object, otherwise it'll cause double-free.
The thing is when destroying an object, we'll first destroy the properties
that bound to the object, then the object itself. To be explicit, when
destroy the object in object_finalize() we have such sequence of
operations:
object_property_del_all(obj);
object_deinit(obj, ti);
So after this change the two fields are properly released already even
before reaching the finalize() function but in object_property_del_all(),
hence we don't need to free them anymore in finalize() or it's double-free.
This also fixes a trivial memory leak for tls-authz as we forgot to free it
before this patch.
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20220707185515.27475-1-peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Add migrate_channel_requires_tls() to detect whether the specific channel
requires TLS, leveraging the recently introduced migrate_use_tls(). No
functional change intended.
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20220707185513.27421-1-peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Add a property field that can conditionally disable the "break sending huge
page" behavior in postcopy preemption. By default it's enabled.
It should only be used for debugging purposes, and we should never remove
the "x-" prefix.
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Manish Mishra <manish.mishra@nutanix.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20220707185511.27366-1-peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
This patch allows the postcopy preempt channel to be created
asynchronously. The benefit is that when the connection is slow, we won't
take the BQL (and potentially block all things like QMP) for a long time
without releasing.
A function postcopy_preempt_wait_channel() is introduced, allowing the
migration thread to be able to wait on the channel creation. The channel
is always created by the main thread, in which we'll kick a new semaphore
to tell the migration thread that the channel has created.
We'll need to wait for the new channel in two places: (1) when there's a
new postcopy migration that is starting, or (2) when there's a postcopy
migration to resume.
For the start of migration, we don't need to wait for this channel until
when we want to start postcopy, aka, postcopy_start(). We'll fail the
migration if we found that the channel creation failed (which should
probably not happen at all in 99% of the cases, because the main channel is
using the same network topology).
For a postcopy recovery, we'll need to wait in postcopy_pause(). In that
case if the channel creation failed, we can't fail the migration or we'll
crash the VM, instead we keep in PAUSED state, waiting for yet another
recovery.
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Manish Mishra <manish.mishra@nutanix.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20220707185509.27311-1-peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
To allow postcopy recovery, the ram fast load (preempt-only) dest QEMU thread
needs similar handling on fault tolerance. When ram_load_postcopy() fails,
instead of stopping the thread it halts with a semaphore, preparing to be
kicked again when recovery is detected.
A mutex is introduced to make sure there's no concurrent operation upon the
socket. To make it simple, the fast ram load thread will take the mutex during
its whole procedure, and only release it if it's paused. The fast-path socket
will be properly released by the main loading thread safely when there's
network failures during postcopy with that mutex held.
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20220707185506.27257-1-peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
This patch enables postcopy-preempt feature.
It contains two major changes to the migration logic:
(1) Postcopy requests are now sent via a different socket from precopy
background migration stream, so as to be isolated from very high page
request delays.
(2) For huge page enabled hosts: when there's postcopy requests, they can now
intercept a partial sending of huge host pages on src QEMU.
After this patch, we'll live migrate a VM with two channels for postcopy: (1)
PRECOPY channel, which is the default channel that transfers background pages;
and (2) POSTCOPY channel, which only transfers requested pages.
There's no strict rule of which channel to use, e.g., if a requested page is
already being transferred on precopy channel, then we will keep using the same
precopy channel to transfer the page even if it's explicitly requested. In 99%
of the cases we'll prioritize the channels so we send requested page via the
postcopy channel as long as possible.
On the source QEMU, when we found a postcopy request, we'll interrupt the
PRECOPY channel sending process and quickly switch to the POSTCOPY channel.
After we serviced all the high priority postcopy pages, we'll switch back to
PRECOPY channel so that we'll continue to send the interrupted huge page again.
There's no new thread introduced on src QEMU.
On the destination QEMU, one new thread is introduced to receive page data from
the postcopy specific socket (done in the preparation patch).
This patch has a side effect: after sending postcopy pages, previously we'll
assume the guest will access follow up pages so we'll keep sending from there.
Now it's changed. Instead of going on with a postcopy requested page, we'll go
back and continue sending the precopy huge page (which can be intercepted by a
postcopy request so the huge page can be sent partially before).
Whether that's a problem is debatable, because "assuming the guest will
continue to access the next page" may not really suite when huge pages are
used, especially if the huge page is large (e.g. 1GB pages). So that locality
hint is much meaningless if huge pages are used.
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20220707185504.27203-1-peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Create a new socket for postcopy to be prepared to send postcopy requested
pages via this specific channel, so as to not get blocked by precopy pages.
A new thread is also created on dest qemu to receive data from this new channel
based on the ram_load_postcopy() routine.
The ram_load_postcopy(POSTCOPY) branch and the thread has not started to
function, and that'll be done in follow up patches.
Cleanup the new sockets on both src/dst QEMUs, meanwhile look after the new
thread too to make sure it'll be recycled properly.
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20220707185502.27149-1-peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
dgilbert: With Peter's fix to quieten compiler warning on
start_migration
Firstly, postcopy already preempts precopy due to the fact that we do
unqueue_page() first before looking into dirty bits.
However that's not enough, e.g., when there're host huge page enabled, when
sending a precopy huge page, a postcopy request needs to wait until the whole
huge page that is sending to finish. That could introduce quite some delay,
the bigger the huge page is the larger delay it'll bring.
This patch adds a new capability to allow postcopy requests to preempt existing
precopy page during sending a huge page, so that postcopy requests can be
serviced even faster.
Meanwhile to send it even faster, bypass the precopy stream by providing a
standalone postcopy socket for sending requested pages.
Since the new behavior will not be compatible with the old behavior, this will
not be the default, it's enabled only when the new capability is set on both
src/dst QEMUs.
This patch only adds the capability itself, the logic will be added in follow
up patches.
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20220707185342.26794-2-peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>