Commit Graph

159 Commits

Author SHA1 Message Date
Leonardo Bras
abb6295b3a migration: Add zero-copy-send parameter for QMP/HMP for Linux
Add property that allows zero-copy migration of memory pages
on the sending side, and also includes a helper function
migrate_use_zero_copy_send() to check if it's enabled.

No code is introduced to actually do the migration, but it allow
future implementations to enable/disable this feature.

On non-Linux builds this parameter is compiled-out.

Signed-off-by: Leonardo Bras <leobras@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20220513062836.965425-5-leobras@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2022-05-16 13:56:24 +01:00
Peter Xu
08401c0426 migration: Allow migrate-recover to run multiple times
Previously migration didn't have an easy way to cleanup the listening
transport, migrate recovery only allows to execute once.  That's done with a
trick flag in postcopy_recover_triggered.

Now the facility is already there.

Drop postcopy_recover_triggered and instead allows a new migrate-recover to
release the previous listener transport.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20220331150857.74406-8-peterx@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2022-04-21 19:36:46 +01:00
Peter Xu
f444eeda71 migration: Move migrate_allow_multifd and helpers into migration.c
This variable, along with its helpers, is used to detect whether multiple
channel will be supported for migration.  In follow up patches, there'll be
other capability that requires multi-channels.  Hence move it outside multifd
specific code and make it public.  Meanwhile rename it from "multifd" to
"multi_channels" to show its real meaning.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20220331150857.74406-5-peterx@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2022-04-21 19:36:46 +01:00
Peter Xu
e031149c78 migration: Add migration_incoming_transport_cleanup()
Add a helper to cleanup the transport listener.

When do it, we should also null-ify the cleanup hook and the data, then it's
even safe to call it multiple times.

Move the socket_address_list cleanup altogether, because that's a mirror of the
listener channels and only for the purpose of query-migrate.  Hence when
someone wants to cleanup the listener transport, it should also want to cleanup
the socket list too, always.

No functional change intended.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20220301083925.33483-15-peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2022-03-02 18:20:45 +00:00
Peter Xu
755e8d7cb6 migration: Move static var in ram_block_from_stream() into global
Static variable is very unfriendly to threading of ram_block_from_stream().
Move it into MigrationIncomingState.

Make the incoming state pointer to be passed over to ram_block_from_stream() on
both caller sites.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20220301083925.33483-8-peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2022-03-02 18:20:45 +00:00
Peter Xu
095c12a4a2 migration: Add postcopy_thread_create()
Postcopy create threads. A common manner is we init a sem and use it to sync
with the thread.  Namely, we have fault_thread_sem and listen_thread_sem and
they're only used for this.

Make it a shared infrastructure so it's easier to create yet another thread.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20220301083925.33483-7-peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2022-03-02 18:20:45 +00:00
Peter Xu
77dadc3f83 migration: Introduce postcopy channels on dest node
Postcopy handles huge pages in a special way that currently we can only have
one "channel" to transfer the page.

It's because when we install pages using UFFDIO_COPY, we need to have the whole
huge page ready, it also means we need to have a temp huge page when trying to
receive the whole content of the page.

Currently all maintainance around this tmp page is global: firstly we'll
allocate a temp huge page, then we maintain its status mostly within
ram_load_postcopy().

To enable multiple channels for postcopy, the first thing we need to do is to
prepare N temp huge pages as caching, one for each channel.

Meanwhile we need to maintain the tmp huge page status per-channel too.

To give some example, some local variables maintained in ram_load_postcopy()
are listed; they are responsible for maintaining temp huge page status:

  - all_zero:     this keeps whether this huge page contains all zeros
  - target_pages: this counts how many target pages have been copied
  - host_page:    this keeps the host ptr for the page to install

Move all these fields to be together with the temp huge pages to form a new
structure called PostcopyTmpPage.  Then for each (future) postcopy channel, we
need one structure to keep the state around.

For vanilla postcopy, obviously there's only one channel.  It contains both
precopy and postcopy pages.

This patch teaches the dest migration node to start realize the possible number
of postcopy channels by introducing the "postcopy_channels" variable.  Its
value is calculated when setup postcopy on dest node (during POSTCOPY_LISTEN
phase).

Vanilla postcopy will have channels=1, but when postcopy-preempt capability is
enabled (in the future), we will boost it to 2 because even during partial
sending of a precopy huge page we still want to preempt it and start sending
the postcopy requested page right away (so we start to keep two temp huge
pages; more if we want to enable multifd).  In this patch there's a TODO marked
for that; so far the channels is always set to 1.

We need to send one "host huge page" on one channel only and we cannot split
them, because otherwise the data upon the same huge page can locate on more
than one channel so we need more complicated logic to manage.  One temp host
huge page for each channel will be enough for us for now.

Postcopy will still always use the index=0 huge page even after this patch.
However it prepares for the latter patches where it can start to use multiple
channels (which needs src intervention, because only src knows which channel we
should use).

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20220301083925.33483-5-peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
  dgilbert: Fixed up long line
2022-03-02 18:19:31 +00:00
Laurent Vivier
458fecca80 migration: provide an error message to migration_cancel()
This avoids to call migrate_get_current() in the caller function
whereas migration_cancel() already needs the pointer to the current
migration state.

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2021-11-03 09:38:53 +01:00
Peter Xu
43044ac0ee migration: Make from_dst_file accesses thread-safe
Accessing from_dst_file is potentially racy in current code base like below:

  if (s->from_dst_file)
    do_something(s->from_dst_file);

Because from_dst_file can be reset right after the check in another
thread (rp_thread).  One example is migrate_fd_cancel().

Use the same qemu_file_lock to protect it too, just like to_dst_file.

When it's safe to access without lock, comment it.

There's one special reference in migration_thread() that can be replaced by
the newly introduced rp_thread_created flag.

Reported-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Lukas Straub <lukasstraub2@web.de>
Message-Id: <20210722175841.938739-3-peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
  with Peter's fixup
2021-07-26 12:44:46 +01:00
Peter Xu
53021ea165 migration: Fix missing join() of rp_thread
It's possible that the migration thread skip the join() of the rp_thread in
below race and crash on src right at finishing migration:

       migration_thread                     rp_thread
       ----------------                     ---------
    migration_completion()
                                        (before rp_thread quits)
                                        from_dst_file=NULL
                                        [thread got scheduled out]
      s->rp_state.from_dst_file==NULL
        (skip join() of rp_thread)
    migrate_fd_cleanup()
      qemu_fclose(s->to_dst_file)
      yank_unregister_instance()
        assert(yank_find_entry())  <------- crash

It could mostly happen with postcopy, but that shouldn't be required, e.g., I
think it could also trigger with MIGRATION_CAPABILITY_RETURN_PATH set.

It's suspected that above race could be the root cause of a recent (but rare)
migration-test break reported by either Dave or PMM:

https://lore.kernel.org/qemu-devel/YPamXAHwan%2FPPXLf@work-vm/

The issue is: from_dst_file is reset in the rp_thread, so if the thread reset
it to NULL fast enough then the migration thread will assume there's no
rp_thread at all.

This could potentially cause more severe issue (e.g. crash) after the yank code.

Fix it by using a boolean to keep "whether we've created rp_thread".

Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20210722175841.938739-2-peterx@redhat.com>
Reviewed-by: Lukas Straub <lukasstraub2@web.de>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-07-26 12:44:26 +01:00
Dr. David Alan Gilbert
1df6ddb43b migration: Add cleanup hook for inwards migration
Add a cleanup hook for incoming migration that gets called
at the end as a way for a transport to allow cleanup.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20210421112834.107651-4-dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-06-08 19:36:18 +01:00
Peter Maydell
9b1e81d1c2 * Replace YAML anchors by extends in the gitlab-CI yaml files
* Many small qtest fixes (e.g. to fix issues discovered by Coverity)
 * Poison more config switches in common code
 * Fix the failing Travis-CI and Cirrus-CI tasks
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCAAvFiEEJ7iIR+7gJQEY8+q5LtnXdP5wLbUFAmCeXFMRHHRodXRoQHJl
 ZGhhdC5jb20ACgkQLtnXdP5wLbUV7g//VRxN74v2pO6zSsSNavYTkMa3jZE1Io3l
 w9lsOr3DM0HIMhyEMwSJsiygj0s8TNanUtFBHfDfo1k1gmZYd5FiM1sB7ZB/sB/S
 9Q9wjmeV2NMG7pcr9zLmiJaEM2LGfGbso44/m0c+gpSyxzg2TeH7sF+38AUZI9/R
 J6/gOzIs4WWdSV4f8kp+YPPQCyOtxZsfxDuk1z1fKsBgXE5iEBUqYyyZUm4gYJkN
 4ALmqc/NVeLszt08qkPHxwXQc488nCJP31psxx6MQ7toKfCAamT07Pp3oi70cakC
 y+HVfsIHc7SxaZyj8sbJCU3LJHwDd8N82ydJUrv216qDffb38fNb+m6y9mcYy2MH
 C25uGe9mucTuTP1WIttC6nCKg/MCgi4PWIzqEhkeKj3TxpTUJJP+BUIfSubV38Gc
 T+XUCNkWWW8sTeRiyE3m9pEZ+gz1MFubaIr/Owephch1SjRYn4zUwJFHHi3I4PY4
 7XjEo8y2M9tKckW3pCfYi+aIDzC1DcRtzvXUGtdemX5xVjTAGlKkFLWl/brdCn2U
 y+JrwrL8cQ3Fy7bXzmK6m9mmhcIW6PcSOBE7RnSESyKzqM5VBSBgnjMqnHcP3FrV
 FYzihTCDcFKy3ELUe3Kbc1TgG/07stQs9xInGXN7ZFsIwadfHgoNVQ6JpOqq1oQa
 fvmLPBhq+a4=
 =QZuG
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/thuth-gitlab/tags/pull-request-2021-05-14' into staging

* Replace YAML anchors by extends in the gitlab-CI yaml files
* Many small qtest fixes (e.g. to fix issues discovered by Coverity)
* Poison more config switches in common code
* Fix the failing Travis-CI and Cirrus-CI tasks

# gpg: Signature made Fri 14 May 2021 12:17:39 BST
# gpg:                using RSA key 27B88847EEE0250118F3EAB92ED9D774FE702DB5
# gpg:                issuer "thuth@redhat.com"
# gpg: Good signature from "Thomas Huth <th.huth@gmx.de>" [full]
# gpg:                 aka "Thomas Huth <thuth@redhat.com>" [full]
# gpg:                 aka "Thomas Huth <huth@tuxfamily.org>" [full]
# gpg:                 aka "Thomas Huth <th.huth@posteo.de>" [unknown]
# Primary key fingerprint: 27B8 8847 EEE0 2501 18F3  EAB9 2ED9 D774 FE70 2DB5

* remotes/thuth-gitlab/tags/pull-request-2021-05-14:
  cirrus.yml: Fix the MSYS2 task
  pc-bios/s390-ccw: Fix inline assembly for older versions of Clang
  tests/qtest/migration-test: Use g_autofree to avoid leaks on error paths
  configure: Poison all current target-specific #defines
  migration: Move populate_vfio_info() into a separate file
  include/sysemu: Poison all accelerator CONFIG switches in common code
  tests: Avoid side effects inside g_assert() arguments
  tests/qtest/rtc-test: Remove pointless NULL check
  tests/qtest/tpm-util.c: Free memory with correct free function
  tests/migration-test: Fix "true" vs true
  tests/qtest/npcm7xx_pwm-test.c: Avoid g_assert_true() for non-test assertions
  tests/qtest/ahci-test.c: Calculate iso_size with 64-bit arithmetic
  util/compatfd.c: Replaced a malloc call with g_malloc.
  libqtest: refuse QTEST_QEMU_BINARY=qemu-kvm
  docs/devel/qgraph: add troubleshooting information
  libqos/qgraph: fix "UNAVAILBLE" typo
  gitlab-ci: Replace YAML anchors by extends (native_test_job)
  gitlab-ci: Replace YAML anchors by extends (native_build_job)
  gitlab-ci: Replace YAML anchors by extends (container_job)
  tests/docker/dockerfiles: Add ccache to containers where it was missing

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-05-14 19:33:23 +01:00
Thomas Huth
43bd0bf30f migration: Move populate_vfio_info() into a separate file
The CONFIG_VFIO switch only works in target specific code. Since
migration/migration.c is common code, the #ifdef does not have
the intended behavior here. Move the related code to a separate
file now which gets compiled via specific_ss instead.

Fixes: 3710586caa ("qapi: Add VFIO devices migration stats in Migration stats")
Message-Id: <20210414112004.943383-3-thuth@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
2021-05-14 12:31:51 +02:00
David Hildenbrand
c7c0e72408 migration/ram: Handle RAM block resizes during precopy
Resizing while migrating is dangerous and does not work as expected.
The whole migration code works on the usable_length of ram blocks and does
not expect this to change at random points in time.

In the case of precopy, the ram block size must not change on the source,
after syncing the RAM block list in ram_save_setup(), so as long as the
guest is still running on the source.

Resizing can be trigger *after* (but not during) a reset in
ACPI code by the guest
- hw/arm/virt-acpi-build.c:acpi_ram_update()
- hw/i386/acpi-build.c:acpi_ram_update()

Use the ram block notifier to get notified about resizes. Let's simply
cancel migration and indicate the reason. We'll continue running on the
source. No harm done.

Update the documentation. Postcopy will be handled separately.

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20210429112708.12291-5-david@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
  Manual merge
2021-05-13 18:21:13 +01:00
Markus Armbruster
8b9407a09f migration: Clean up signed vs. unsigned XBZRLE cache-size
73af8dd8d7 "migration: Make xbzrle_cache_size a migration
parameter" (v2.11.0) made the new parameter unsigned (QAPI type
'size', uint64_t in C).  It neglected to update existing code, which
continues to use int64_t.

migrate_xbzrle_cache_size() returns the new parameter.  Adjust its
return type.

QMP query-migrate-cache-size returns migrate_xbzrle_cache_size().
Adjust its return type.

migrate-set-parameters passes the new parameter to
xbzrle_cache_resize().  Adjust its parameter type.

xbzrle_cache_resize() passes it on to cache_init().  Adjust its
parameter type.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20210202141734.2488076-3-armbru@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-02-08 11:19:51 +00:00
Andrey Gruzdev
8518278a6a migration: implementation of background snapshot thread
Introducing implementation of 'background' snapshot thread
which in overall follows the logic of precopy migration
while internally utilizes completely different mechanism
to 'freeze' vmstate at the start of snapshot creation.

This mechanism is based on userfault_fd with wr-protection
support and is Linux-specific.

Signed-off-by: Andrey Gruzdev <andrey.gruzdev@virtuozzo.com>
Acked-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20210129101407.103458-5-andrey.gruzdev@virtuozzo.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-02-08 11:19:51 +00:00
Andrey Gruzdev
6e8c25b4c6 migration: introduce 'background-snapshot' migration capability
Add new capability to 'qapi/migration.json' schema.
Update migrate_caps_check() to validate enabled capability set
against introduced one. Perform checks for required kernel features
and compatibility with guest memory backends.

Signed-off-by: Andrey Gruzdev <andrey.gruzdev@virtuozzo.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20210129101407.103458-2-andrey.gruzdev@virtuozzo.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-02-08 11:19:51 +00:00
Peter Xu
8f8bfffcf1 migration: Maintain postcopy faulted addresses
Maintain a list of faulted addresses on the destination host for which we're
waiting on.  This is implemented using a GTree rather than a real list to make
sure even there're plenty of vCPUs/threads that are faulting, the lookup will
still be fast with O(log(N)) (because we'll do that after placing each page).
It should bring a slight overhead, but ideally that shouldn't be a big problem
simply because in most cases the requested page list will be short.

Actually we did similar things for postcopy blocktime measurements.  This patch
didn't use that simply because:

  (1) blocktime measurement is towards vcpu threads only, but here we need to
      record all faulted addresses, including main thread and external
      thread (like, DPDK via vhost-user).

  (2) blocktime measurement will require UFFD_FEATURE_THREAD_ID, but here we
      don't want to add that extra dependency on the kernel version since not
      necessary.  E.g., we don't need to know which thread faulted on which
      page, we also don't care about multiple threads faulting on the same
      page.  But we only care about what addresses are faulted so waiting for a
      page copying from src.

  (3) blocktime measurement is not enabled by default.  However we need this by
      default especially for postcopy recover.

Another thing to mention is that this patch introduced a new mutex to serialize
the receivedmap and the page_requested tree, however that serialization does
not cover other procedures like UFFDIO_COPY.

Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20201021212721.440373-4-peterx@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-10-26 16:15:04 +00:00
Peter Xu
7a267fc49b migration: Introduce migrate_send_rp_message_req_pages()
This is another layer wrapper for sending a page request to the source VM.  The
new migrate_send_rp_message_req_pages() will be used elsewhere in coming
patches.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20201021212721.440373-3-peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-10-26 16:15:04 +00:00
Bihong Yu
f16aee44b4 migration: Open brace '{' following struct go on the same line
Signed-off-by: Bihong Yu <yubihong@huawei.com>
Reviewed-by: Chuan Zheng <zhengchuan@huawei.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <1603163448-27122-5-git-send-email-yubihong@huawei.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-10-26 16:15:04 +00:00
Chuan Zheng
d8053e73fb migration/tls: save hostname into MigrationState
hostname is need in multifd-tls, save hostname into MigrationState.

Signed-off-by: Chuan Zheng <zhengchuan@huawei.com>
Signed-off-by: Yan Jin <jinyan12@huawei.com>
Message-Id: <1600139042-104593-2-git-send-email-zhengchuan@huawei.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-09-25 12:45:58 +01:00
Peter Xu
2e2bce167e migration: Rework migrate_send_rp_req_pages() function
We duplicated the logic of maintaining the last_rb variable at both callers of
this function.  Pass *rb pointer into the function so that we can avoid
duplicating the logic.  Also, when we have the rb pointer, it's also easier to
remove the original 2nd & 4th parameters, because both of them (name of the
ramblock when needed, or the page size) can be fetched from the ramblock
pointer too.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20200908203022.341615-3-peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-09-25 11:11:01 +01:00
Eduardo Habkost
8110fa1d94 Use DECLARE_*CHECKER* macros
Generated using:

 $ ./scripts/codeconverter/converter.py -i \
   --pattern=TypeCheckMacro $(git grep -l '' -- '*.[ch]')

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-Id: <20200831210740.126168-12-ehabkost@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-Id: <20200831210740.126168-13-ehabkost@redhat.com>
Message-Id: <20200831210740.126168-14-ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2020-09-09 09:27:09 -04:00
Eduardo Habkost
db1015e92e Move QOM typedefs and add missing includes
Some typedefs and macros are defined after the type check macros.
This makes it difficult to automatically replace their
definitions with OBJECT_DECLARE_TYPE.

Patch generated using:

 $ ./scripts/codeconverter/converter.py -i \
   --pattern=QOMStructTypedefSplit $(git grep -l '' -- '*.[ch]')

which will split "typdef struct { ... } TypedefName"
declarations.

Followed by:

 $ ./scripts/codeconverter/converter.py -i --pattern=MoveSymbols \
    $(git grep -l '' -- '*.[ch]')

which will:
- move the typedefs and #defines above the type check macros
- add missing #include "qom/object.h" lines if necessary

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-Id: <20200831210740.126168-9-ehabkost@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-Id: <20200831210740.126168-10-ehabkost@redhat.com>
Message-Id: <20200831210740.126168-11-ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2020-09-09 09:26:43 -04:00
Eduardo Habkost
6c725351c3 migration: Rename class type checking macros
Rename the macros to make them consistent with the MIGRATION_OBJ
macro name.

This will make future conversion to OBJECT_DECLARE* easier.

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Tested-By: Roman Bolshakov <r.bolshakov@yadro.com>
Message-Id: <20200825192110.3528606-51-ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2020-08-27 14:04:55 -04:00
Max Reitz
31e4c354b3 migration: Add block-bitmap-mapping parameter
This migration parameter allows mapping block node names and bitmap
names to aliases for the purpose of block dirty bitmap migration.

This way, management tools can use different node and bitmap names on
the source and destination and pass the mapping of how bitmaps are to be
transferred to qemu (on the source, the destination, or even both with
arbitrary aliases in the migration stream).

While touching this code, fix a bug where bitmap names longer than 255
bytes would fail an assertion in qemu_put_counted_string().

Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200820150725.68687-2-mreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2020-08-21 08:56:09 -05:00
Vladimir Sementsov-Ogievskiy
1499ab0969 migration/block-dirty-bitmap: cancel migration on shutdown
If target is turned off prior to postcopy finished, target crashes
because busy bitmaps are found at shutdown.
Canceling incoming migration helps, as it removes all unfinished (and
therefore busy) bitmaps.

Similarly on source we crash in bdrv_close_all which asserts that all
bdrv states are removed, because bdrv states involved into dirty bitmap
migration are referenced by it. So, we need to cancel outgoing
migration as well.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Message-Id: <20200727194236.19551-17-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2020-07-27 15:41:34 -05:00
Vladimir Sementsov-Ogievskiy
d0cccbd118 migration/block-dirty-bitmap: move mutex init to dirty_bitmap_mig_init
No reasons to keep two public init functions.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20200727194236.19551-11-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2020-07-27 15:39:59 -05:00
Lukas Straub
bb70b66ed7 migration/colo.c: Use event instead of semaphore
If multiple packets miscompare in a short timeframe, the semaphore
value will be increased multiple times. This causes multiple
checkpoints even if one would be sufficient.

Fix this by using a event instead of a semaphore for triggering
checkpoints. Now, checkpoint requests will be ignored until the
checkpoint event is sent to colo-compare (which releases the
miscompared packets).

Benchmark results (iperf3):
Client-to-server tcp:
without patch: ~66 Mbit/s
with patch: ~61 Mbit/s
Server-to-client tcp:
without patch: ~702 Kbit/s
with patch: ~16 Mbit/s

Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Message-Id: <fd601ba1beb524aada54ba66e87ebfc12cf4574b.1589193382.git.lukasstraub2@web.de>
Reviewed-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-06-01 18:44:27 +01:00
Juan Quintela
6a9ad15420 multifd: Add multifd-zstd-level parameter
This parameter specifies the zstd compression level. The next patch
will put it to use.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
2020-02-28 09:25:28 +01:00
Juan Quintela
9004db48c0 multifd: Add multifd-zlib-level parameter
This parameter specifies the zlib compression level. The next patch
will put it to use.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-02-28 09:24:43 +01:00
Juan Quintela
ab7cbb0b9a multifd: Make no compression operations into its own structure
It will be used later.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

---

No comp value needs to be zero.
2020-02-28 09:24:43 +01:00
Juan Quintela
b673eab4e2 multifd: Make multifd_load_setup() get an Error parameter
We need to change the full chain to pass the Error parameter.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-01-29 11:28:59 +01:00
Juan Quintela
392d87e213 migration: Create migration_is_running()
This function returns true if we are in the middle of a migration.
It is like migration_is_setup_or_active() with CANCELLING and COLO.
Adapt all callers that are needed.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-01-29 11:28:59 +01:00
Dr. David Alan Gilbert
97e1e06780 migration: Rate limit inside host pages
When using hugepages, rate limiting is necessary within each huge
page, since a 1G huge page can take a significant time to send, so
you end up with bursty behaviour.

Fixes: 4c011c37ec ("postcopy: Send whole huge pages")
Reported-by: Lin Ma <LMa@suse.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2020-01-20 09:10:22 +01:00
Jens Freimann
c7e0acd5a3 migration: add new migration state wait-unplug
This patch adds a new migration state called wait-unplug.  It is entered
after the SETUP state if failover devices are present. It will transition
into ACTIVE once all devices were succesfully unplugged from the guest.

So if a guest doesn't respond or takes long to honor the unplug request
the user will see the migration state 'wait-unplug'.

In the migration thread we query failover devices if they're are still
pending the guest unplug. When all are unplugged the migration
continues. If one device won't unplug migration will stay in wait_unplug
state.

Signed-off-by: Jens Freimann <jfreimann@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20191029114905.6856-9-jfreimann@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2019-10-29 18:55:26 -04:00
Yury Kotov
b9d68df62a migration: Add validate-uuid capability
This capability realizes simple source validation by UUID.
It's useful for live migration between hosts.

Signed-off-by: Yury Kotov <yury-kotov@yandex-team.ru>
Message-Id: <20190903162246.18524-2-yury-kotov@yandex-team.ru>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2019-09-12 11:19:23 +01:00
Peter Maydell
95a9457fd4 Header cleanup patches for 2019-08-13
-----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEENUvIs9frKmtoZ05fOHC0AOuRhlMFAl1WleASHGFybWJydUBy
 ZWRoYXQuY29tAAoJEDhwtADrkYZTBBYQALQLzIYb2Zux95bAxoJdhqNuEOGLfxeu
 gx0i0roPe6SBleHozUK+gf7kVYyw7he58n2dZURGqrpqktgZOFcea2a6Dq1rnVw6
 JMJ2Oy7V326bHwJT0Np9rW4n+FHsMQZoAUEHjl9EeGCZfO/zy2aSWPsD8mbcbm0g
 hUW5Jr4+cpm28BCL8I+2HhWFazB6G2IPAF9oEXmNsOM6J1Ho8WGrTAjASe0Il5Yi
 m2B4QWG+4uz77WYnkttnssm41K1S95HYyaKluIVyNwTnsPTN303V/sUj+wdRaooL
 k1O6WqaavGhal7QeRqy+vCpF8m6qLq7NaYCzSCOrrkkuC8TAnpVn7Xmi9qI+vb6O
 kGBpDWhq5wOnphsEhnFvhPZgD+WZo3mwTgW4h0d3UhB6orOTPTMvWKEwFJ1j/O6/
 gntV61o542c9gpZjS133221HRmNjteHF/5/TFzmX/G50sgivJn+WOP87naM2aBAz
 8MW5HatTox+qQqYD4VMUIVnVkguxHDVhFRBunYu0HvZZ1Rud+Lc6Xzi6H4jDlZ81
 vtOmAlMU3dbp97gNvJrAVqV4JIL3puOWbu0MMaQWoG53Kcdfu46LIr57TTg3dw61
 R9e7HSOQjYILChoodwELlyeAsVeZo3IzX9vPX8aw7MoHvneyTUNqtha/rHsLEwsb
 97G19dydGEC6
 =eSUz
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/armbru/tags/pull-include-2019-08-13-v2' into staging

Header cleanup patches for 2019-08-13

# gpg: Signature made Fri 16 Aug 2019 12:39:12 BST
# gpg:                using RSA key 354BC8B3D7EB2A6B68674E5F3870B400EB918653
# gpg:                issuer "armbru@redhat.com"
# gpg: Good signature from "Markus Armbruster <armbru@redhat.com>" [full]
# gpg:                 aka "Markus Armbruster <armbru@pond.sub.org>" [full]
# Primary key fingerprint: 354B C8B3 D7EB 2A6B 6867  4E5F 3870 B400 EB91 8653

* remotes/armbru/tags/pull-include-2019-08-13-v2: (29 commits)
  sysemu: Split sysemu/runstate.h off sysemu/sysemu.h
  sysemu: Move the VMChangeStateEntry typedef to qemu/typedefs.h
  Include sysemu/sysemu.h a lot less
  Clean up inclusion of sysemu/sysemu.h
  numa: Move remaining NUMA declarations from sysemu.h to numa.h
  Include sysemu/hostmem.h less
  numa: Don't include hw/boards.h into sysemu/numa.h
  Include hw/boards.h a bit less
  Include hw/qdev-properties.h less
  Include qemu/main-loop.h less
  Include qemu/queue.h slightly less
  Include hw/hw.h exactly where needed
  Include qom/object.h slightly less
  Include exec/memory.h slightly less
  Include migration/vmstate.h less
  migration: Move the VMStateDescription typedef to typedefs.h
  Clean up inclusion of exec/cpu-common.h
  Include hw/irq.h a lot less
  typedefs: Separate incomplete types and function types
  ide: Include hw/ide/internal a bit less outside hw/ide/
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2019-08-16 14:53:43 +01:00
Markus Armbruster
a27bd6c779 Include hw/qdev-properties.h less
In my "build everything" tree, changing hw/qdev-properties.h triggers
a recompile of some 2700 out of 6600 objects (not counting tests and
objects that don't depend on qemu/osdep.h).

Many places including hw/qdev-properties.h (directly or via hw/qdev.h)
actually need only hw/qdev-core.h.  Include hw/qdev-core.h there
instead.

hw/qdev.h is actually pointless: all it does is include hw/qdev-core.h
and hw/qdev-properties.h, which in turn includes hw/qdev-core.h.
Replace the remaining uses of hw/qdev.h by hw/qdev-properties.h.

While there, delete a few superfluous inclusions of hw/qdev-core.h.

Touching hw/qdev-properties.h now recompiles some 1200 objects.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Daniel P. Berrangé" <berrange@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20190812052359.30071-22-armbru@redhat.com>
2019-08-16 13:31:53 +02:00
Markus Armbruster
d484205210 Include exec/memory.h slightly less
Drop unnecessary inclusions from headers.  Downgrade a few more to
exec/hwaddr.h.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20190812052359.30071-17-armbru@redhat.com>
2019-08-16 13:31:52 +02:00
Markus Armbruster
6a0acfff99 Clean up inclusion of exec/cpu-common.h
migration/qemu-file.h neglects to include it even though it needs
ram_addr_t.  Fix that.  Drop a few superfluous inclusions elsewhere.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20190812052359.30071-14-armbru@redhat.com>
2019-08-16 13:31:52 +02:00
Wei Yang
14adf288d3 migration: remove unused field bytes_xfer
MigrationState->bytes_xfer is only set to 0 in migrate_init().

Remove this unnecessary field.

Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
Message-Id: <20190402003106.17614-1-richardw.yang@linux.intel.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2019-08-14 17:33:14 +01:00
Peter Xu
002cad6b16 migration: Split log_clear() into smaller chunks
Currently we are doing log_clear() right after log_sync() which mostly
keeps the old behavior when log_clear() was still part of log_sync().

This patch tries to further optimize the migration log_clear() code
path to split huge log_clear()s into smaller chunks.

We do this by spliting the whole guest memory region into memory
chunks, whose size is decided by MigrationState.clear_bitmap_shift (an
example will be given below).  With that, we don't do the dirty bitmap
clear operation on the remote node (e.g., KVM) when we fetch the dirty
bitmap, instead we explicitly clear the dirty bitmap for the memory
chunk for each of the first time we send a page in that chunk.

Here comes an example.

Assuming the guest has 64G memory, then before this patch the KVM
ioctl KVM_CLEAR_DIRTY_LOG will be a single one covering 64G memory.
If after the patch, let's assume when the clear bitmap shift is 18,
then the memory chunk size on x86_64 will be 1UL<<18 * 4K = 1GB.  Then
instead of sending a big 64G ioctl, we'll send 64 small ioctls, each
of the ioctl will cover 1G of the guest memory.  For each of the 64
small ioctls, we'll only send if any of the page in that small chunk
was going to be sent right away.

Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20190603065056.25211-12-peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2019-07-15 15:39:03 +02:00
Markus Armbruster
a8d2532645 Include qemu-common.h exactly where needed
No header includes qemu-common.h after this commit, as prescribed by
qemu-common.h's file comment.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20190523143508.25387-5-armbru@redhat.com>
[Rebased with conflicts resolved automatically, except for
include/hw/arm/xlnx-zynqmp.h hw/arm/nrf51_soc.c hw/arm/msf2-soc.c
block/qcow2-refcount.c block/qcow2-cluster.c block/qcow2-cache.c
target/arm/cpu.h target/lm32/cpu.h target/m68k/cpu.h target/mips/cpu.h
target/moxie/cpu.h target/nios2/cpu.h target/openrisc/cpu.h
target/riscv/cpu.h target/tilegx/cpu.h target/tricore/cpu.h
target/unicore32/cpu.h target/xtensa/cpu.h; bsd-user/main.c and
net/tap-bsd.c fixed up]
2019-06-12 13:20:20 +02:00
Wei Yang
15d2d64cf5 migration: remove not used field xfer_limit
MigrationState->xfer_limit is only set to 0 in migrate_init().

Remove this unnecessary field.

Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
Message-Id: <20190326055726.10539-1-richardw.yang@linux.intel.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2019-05-14 17:33:35 +01:00
Markus Armbruster
811f865271 Revert "migration: move only_migratable to MigrationState"
This reverts commit 3df663e575.
This reverts commit b605c47b57.

Command line option --only-migratable is for disallowing any
configuration that can block migration.

Initially, --only-migratable set global variable @only_migratable.

Commit 3df663e575 "migration: move only_migratable to MigrationState"
replaced it by MigrationState member @only_migratable.  That was a
mistake.

First, it doesn't make sense on the design level.  MigrationState
captures the state of an individual migration, but --only-migratable
isn't a property of an individual migration, it's a restriction on
QEMU configuration.  With fault tolerance, we could have several
migrations at once.  --only-migratable would certainly protect all of
them.  Storing it in MigrationState feels inappropriate.

Second, it contributes to a dependency cycle that manifests itself as
a bug now.

Putting @only_migratable into MigrationState means its available only
after migration_object_init().

We can't set it before migration_object_init(), so we delay setting it
with a global property (this is fixup commit b605c47b57 "migration:
fix handling for --only-migratable").

We can't get it before migration_object_init(), so anything that uses
it can only run afterwards.

Since migrate_add_blocker() needs to obey --only-migratable, any code
adding migration blockers can run only afterwards.  This contributes
to the following dependency cycle:

* configure_blockdev() must run before machine_set_property()
  so machine properties can refer to block backends

* machine_set_property() before configure_accelerator()
  so machine properties like kvm-irqchip get applied

* configure_accelerator() before migration_object_init()
  so that Xen's accelerator compat properties get applied.

* migration_object_init() before configure_blockdev()
  so configure_blockdev() can add migration blockers

The cycle was closed when recent commit cda4aa9a5a "Create block
backends before setting machine properties" added the first
dependency, and satisfied it by violating the last one.  Broke block
backends that add migration blockers.

Moving @only_migratable into MigrationState was a mistake.  Revert it.

This doesn't quite break the "migration_object_init() before
configure_blockdev() dependency, since migrate_add_blocker() still has
another dependency on migration_object_init().  To be addressed the
next commit.

Note that the reverted commit made -only-migratable sugar for -global
migration.only-migratable=on below the hood.  Documentation has only
ever mentioned -only-migratable.  This commit removes the arcane &
undocumented alternative to -only-migratable again.  Nobody should be
using it.

Conflicts:
	include/migration/misc.h
	migration/migration.c
	migration/migration.h
	vl.c

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20190401090827.20793-3-armbru@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
2019-04-02 13:38:05 +02:00
Juan Quintela
efd1a1d640 multifd: Drop x-multifd-page-count parameter
Libvirt don't want to expose (and explain it).  From now on we measure
the number of packages in bytes instead of pages, so it is the same
independently of architecture.  We choose the page size of x86.
Notice that in the following patch we make this variable.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2019-03-25 18:13:41 +01:00
Juan Quintela
9aca82ba31 migration: Create socket-address parameter
It will be used to store the uri parameters. We want this only for
tcp, so we don't set it for other uris.  We need it to know what port
is migration running.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
  dgilbert: Removed DummyStruct as suggested by Eric & Markus

--
2019-03-06 10:49:17 +00:00
Yury Kotov
fbd162e629 migration: Add an ability to ignore shared RAM blocks
If ignore-shared capability is set then skip shared RAMBlocks during the
RAM migration.
Also, move qemu_ram_foreach_migratable_block (and rename) to the
migration code, because it requires access to the migration capabilities.

Signed-off-by: Yury Kotov <yury-kotov@yandex-team.ru>
Message-Id: <20190215174548.2630-4-yury-kotov@yandex-team.ru>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2019-03-06 10:49:17 +00:00
Yury Kotov
18269069c3 migration: Introduce ignore-shared capability
We want to use local migration to update QEMU for running guests.
In this case we don't need to migrate shared (file backed) RAM.
So, add a capability to ignore such blocks during live migration.

Signed-off-by: Yury Kotov <yury-kotov@yandex-team.ru>
Message-Id: <20190215174548.2630-3-yury-kotov@yandex-team.ru>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2019-03-06 10:49:17 +00:00
Dr. David Alan Gilbert
7659505c16 migration: Switch to using announce timer
Switch the announcements to using the new announce timer.
Move the code that does it to announce.c rather than savevm
because it really has nothing to do with the actual migration.

Migration starts the announce from bh's and so they're all
in the main thread/bql, and so there's never any racing with
the timers themselves.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2019-03-05 11:27:41 +08:00
Xiao Guangrong
aecbfe9c64 migration: introduce pages-per-second
It introduces a new statistic, pages-per-second, as bandwidth or mbps is
not enough to measure the performance of posting pages out as we have
compression, xbzrle, which can significantly reduce the amount of the
data size, instead, pages-per-second is the one we want

Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com>
Message-Id: <20190111063732.10484-2-xiaoguangrong@tencent.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
  With typo's Eric spotted fixed
2019-01-23 15:51:47 +00:00
Fei Li
49ed0d24a4 migration: fix the multifd code when receiving less channels
In our current code, when multifd is used during migration, if there
is an error before the destination receives all new channels, the
source keeps running, however the destination does not exit but keeps
waiting until the source is killed deliberately.

Fix this by dumping the specific error and let users decide whether
to quit from the destination side when failing to receive packet via
some channel. And update the comment for multifd_recv_new_channel().

Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Fei Li <fli@suse.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Message-Id: <20190113140849.38339-3-lifei1214@126.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2019-01-23 15:02:07 +00:00
Jia Lina
3d63da16fb migration: avoid segmentfault when take a snapshot of a VM which being migrated
During an active background migration, snapshot will trigger a
segmentfault. As snapshot clears the "current_migration" struct
and updates "to_dst_file" before it finds out that there is a
migration task, Migration accesses the null pointer in
"current_migration" struct and qemu crashes eventually.

Signed-off-by: Jia Lina <jialina01@baidu.com>
Signed-off-by: Chai Wen <chaiwen@baidu.com>
Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
Message-Id: <20181026083620.10172-1-jialina01@baidu.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2018-10-31 09:38:59 +00:00
Xiao Guangrong
1d58872a91 migration: do not wait for free thread
Instead of putting the main thread to sleep state to wait for
free compression thread, we can directly post it out as normal
page that reduces the latency and uses CPUs more efficiently

A parameter, compress-wait-thread, is introduced, it can be
enabled if the user really wants the old behavior

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2018-08-22 12:34:11 +02:00
Li Qiang
4cbc9c7ffd migrate/cpu-throttle: Add max-cpu-throttle migration parameter
Currently, the default maximum CPU throttle for migration is
99(CPU_THROTTLE_PCT_MAX). This is too big and can make a remarkable
performance effect for the guest. We see a lot of packets latency
exceed 500ms when the CPU_THROTTLE_PCT_MAX reached. This patch set
adds a new max-cpu-throttle parameter to limit the CPU throttle.

Signed-off-by: Li Qiang <liq3ea@gmail.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2018-08-22 11:42:34 +02:00
Dr. David Alan Gilbert
ad767bed5a migration: Wake rate limiting for urgent requests
Rate limiting sleeps the migration thread for a while when it runs
out of bandwidth; but sometimes we want to wake up to get on with
something more urgent (like a postcopy request).  Here we use
a semaphore with a timedwait instead of a simple sleep; Incrementing
the sempahore will wake it up sooner.  Anything that consumes
these urgent events must decrement the sempahore.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20180613102642.23995-3-dgilbert@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2018-06-15 14:40:56 +01:00
Dr. David Alan Gilbert
343f632c70 migration: Poison ramblock loops in migration
The migration code should be using the
  RAMBLOCK_FOREACH_MIGRATABLE and qemu_ram_foreach_block_migratable
not the all-block versions;  poison them so that we can't accidentally
use them.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20180605162545.80778-3-dgilbert@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2018-06-15 14:40:56 +01:00
Xiao Guangrong
f548222c24 migration: introduce decompress-error-check
QEMU 3.0 enables strict check for compression & decompression to
make the migration more robust, that depends on the source to fix
the internal design which triggers the unexpected error conditions

To make it work for migrating old version QEMU to 2.13 QEMU, we
introduce this parameter to disable the error check on the
destination which is the default behavior of the machine type
which is older than 2.13, alternately, the strict check can be
enabled explicitly as followings:
      -M pc-q35-2.11 -global migration.decompress-error-check=true

Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2018-06-04 05:46:15 +02:00
Peter Xu
62df066fff migration: introduce lock for to_dst_file
Let's introduce a lock for that QEMUFile since we are going to operate
on it in multiple threads.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20180502104740.12123-23-peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2018-05-15 22:12:41 +02:00
Peter Xu
02affd41b1 qmp/migration: new command migrate-recover
The first allow-oob=true command.  It's used on destination side when
the postcopy migration is paused and ready for a recovery.  After
execution, a new migration channel will be established for postcopy to
continue.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20180502104740.12123-21-peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
---
s/2.12/2.13/
2018-05-15 22:11:45 +02:00
Peter Xu
edd090c728 migration: synchronize dirty bitmap for resume
This patch implements the first part of core RAM resume logic for
postcopy. ram_resume_prepare() is provided for the work.

When the migration is interrupted by network failure, the dirty bitmap
on the source side will be meaningless, because even the dirty bit is
cleared, it is still possible that the sent page was lost along the way
to destination. Here instead of continue the migration with the old
dirty bitmap on source, we ask the destination side to send back its
received bitmap, then invert it to be our initial dirty bitmap.

The source side send thread will issue the MIG_CMD_RECV_BITMAP requests,
once per ramblock, to ask for the received bitmap. On destination side,
MIG_RP_MSG_RECV_BITMAP will be issued, along with the requested bitmap.
Data will be received on the return-path thread of source, and the main
migration thread will be notified when all the ramblock bitmaps are
synchronized.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20180502104740.12123-17-peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2018-05-15 20:56:57 +02:00
Peter Xu
13955b89ce migration: new message MIG_RP_MSG_RESUME_ACK
Creating new message to reply for MIG_CMD_POSTCOPY_RESUME. One uint32_t
is used as payload to let the source know whether destination is ready
to continue the migration.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20180502104740.12123-15-peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2018-05-15 20:56:53 +02:00
Peter Xu
a335debb35 migration: new message MIG_RP_MSG_RECV_BITMAP
Introducing new return path message MIG_RP_MSG_RECV_BITMAP to send
received bitmap of ramblock back to source.

This is the reply message of MIG_CMD_RECV_BITMAP, it contains not only
the header (including the ramblock name), and it was appended with the
whole ramblock received bitmap on the destination side.

When the source receives such a reply message (MIG_RP_MSG_RECV_BITMAP),
it parses it, convert it to the dirty bitmap by inverting the bits.

One thing to mention is that, when we send the recv bitmap, we are doing
these things in extra:

- converting the bitmap to little endian, to support when hosts are
  using different endianess on src/dst.

- do proper alignment for 8 bytes, to support when hosts are using
  different word size (32/64 bits) on src/dst.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20180502104740.12123-13-peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2018-05-15 20:56:51 +02:00
Peter Xu
3a7804c306 migration: allow fault thread to pause
Allows the fault thread to stop handling page faults temporarily. When
network failure happened (and if we expect a recovery afterwards), we
should not allow the fault thread to continue sending things to source,
instead, it should halt for a while until the connection is rebuilt.

When the dest main thread noticed the failure, it kicks the fault thread
to switch to pause state.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20180502104740.12123-7-peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2018-05-15 20:24:27 +02:00
Peter Xu
14b1742eaa migration: allow src return path to pause
Let the thread pause for network issues.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20180502104740.12123-6-peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2018-05-15 20:24:27 +02:00
Peter Xu
b411b844fb migration: allow dst vm pause on postcopy
When there is IO error on the incoming channel (e.g., network down),
instead of bailing out immediately, we allow the dst vm to switch to the
new POSTCOPY_PAUSE state. Currently it is still simple - it waits the
new semaphore, until someone poke it for another attempt.

One note is that here on ram loading thread we cannot detect the
POSTCOPY_ACTIVE state, but we need to detect the more specific
POSTCOPY_INCOMING_RUNNING state, to make sure we have already loaded all
the device states.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20180502104740.12123-5-peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2018-05-15 20:24:27 +02:00
Peter Xu
b23c2ade25 migration: implement "postcopy-pause" src logic
Now when network down for postcopy, the source side will not fail the
migration. Instead we convert the status into this new paused state, and
we will try to wait for a rescue in the future.

If a recovery is detected, migration_thread() will reset its local
variables to prepare for that.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20180502104740.12123-4-peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2018-05-15 20:24:27 +02:00
Juan Quintela
36c2f8be2c migration: Delay start of migration main routines
We need to make sure that we have started all the multifd threads.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2018-05-15 20:24:27 +02:00
Alexey Perevalov
65ace06045 migration: add postcopy total blocktime into query-migrate
Postcopy total blocktime is available on destination side only.
But query-migrate was possible only for source. This patch
adds ability to call query-migrate on destination.
To be able to see postcopy blocktime, need to request postcopy-blocktime
capability.

The query-migrate command will show following sample result:
{"return":
    "postcopy-vcpu-blocktime": [115, 100],
    "status": "completed",
    "postcopy-blocktime": 100
}}

postcopy_vcpu_blocktime contains list, where the first item is the first
vCPU in QEMU.

This patch has a drawback, it combines states of incoming and
outgoing migration. Ongoing migration state will overwrite incoming
state. Looks like better to separate query-migrate for incoming and
outgoing migration or add parameter to indicate type of migration.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-Id: <1521742647-25550-7-git-send-email-a.perevalov@samsung.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2018-04-25 18:02:17 +01:00
Alexey Perevalov
2a4c42f18c migration: add postcopy blocktime ctx into MigrationIncomingState
This patch adds request to kernel space for UFFD_FEATURE_THREAD_ID, in
case this feature is provided by kernel.

PostcopyBlocktimeContext is encapsulated inside postcopy-ram.c,
due to it being a postcopy-only feature.
Also it defines PostcopyBlocktimeContext's instance live time.
Information from PostcopyBlocktimeContext instance will be provided
much after postcopy migration end, instance of PostcopyBlocktimeContext
will live till QEMU exit, but part of it (vcpu_addr,
page_fault_vcpu_time) used only during calculation, will be released
when postcopy ended or failed.

To enable postcopy blocktime calculation on destination, need to
request proper compatibility (Patch for documentation will be at the
tail of the patch set).

As an example following command enable that capability, assume QEMU was
started with
-chardev socket,id=charmonitor,path=/var/lib/migrate-vm-monitor.sock
option to control it

[root@host]#printf "{\"execute\" : \"qmp_capabilities\"}\r\n \
{\"execute\": \"migrate-set-capabilities\" , \"arguments\":   {
\"capabilities\": [ { \"capability\": \"postcopy-blocktime\", \"state\":
true } ] } }" | nc -U /var/lib/migrate-vm-monitor.sock

Or just with HMP
(qemu) migrate_set_capability postcopy-blocktime on

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-Id: <1521742647-25550-3-git-send-email-a.perevalov@samsung.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2018-04-25 18:02:12 +01:00
Alexey Perevalov
f22f928ec9 migration: introduce postcopy-blocktime capability
Right now it could be used on destination side to
enable vCPU blocktime calculation for postcopy live migration.
vCPU blocktime - it's time since vCPU thread was put into
interruptible sleep, till memory page was copied and thread awake.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-Id: <1521742647-25550-2-git-send-email-a.perevalov@samsung.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2018-04-25 18:02:12 +01:00
Peter Maydell
ed627b2ad3 virtio,vhost,pci,pc: features, cleanups
SRAT tables for DIMM devices
 new virtio net flags for speed/duplex
 post-copy migration support in vhost
 cleanups in pci
 
 Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJasR1rAAoJECgfDbjSjVRpOocH/R9A3g/TkpGjmLzJBrrX1NGO
 I/iq0ttHjqg4OBIChA4BHHjXwYUMs7XQn26B3efrk1otLAJhuqntZIIo3uU0WraA
 5J+4DT46ogs5rZWNzDCZ0zAkSaATDA6h9Nfh7TvPc9Q2WpcIT0cTa/jOtrxRc9Vq
 32hbUKtJSpNxRjwbZvk6YV21HtWo3Tktdaj9IeTQTN0/gfMyOMdgxta3+bymicbJ
 FuF9ybHcpXvrEctHhXHIL4/YVGEH/4shagZ4JVzv1dVdLeHLZtPomdf7+oc0+07m
 Qs+yV0HeRS5Zxt7w5blGLC4zDXczT/bUx8oln0Tz5MV7RR/+C2HwMOHC69gfpSc=
 =vomK
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging

virtio,vhost,pci,pc: features, cleanups

SRAT tables for DIMM devices
new virtio net flags for speed/duplex
post-copy migration support in vhost
cleanups in pci

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

# gpg: Signature made Tue 20 Mar 2018 14:40:43 GMT
# gpg:                using RSA key 281F0DB8D28D5469
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>"
# gpg:                 aka "Michael S. Tsirkin <mst@redhat.com>"
# Primary key fingerprint: 0270 606B 6F3C DF3D 0B17  0970 C350 3912 AFBE 8E67
#      Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA  8A0D 281F 0DB8 D28D 5469

* remotes/mst/tags/for_upstream: (51 commits)
  postcopy shared docs
  libvhost-user: Claim support for postcopy
  postcopy: Allow shared memory
  vhost: Huge page align and merge
  vhost+postcopy: Wire up POSTCOPY_END notify
  vhost-user: Add VHOST_USER_POSTCOPY_END message
  libvhost-user: mprotect & madvises for postcopy
  vhost+postcopy: Call wakeups
  vhost+postcopy: Add vhost waker
  postcopy: postcopy_notify_shared_wake
  postcopy: helper for waking shared
  vhost+postcopy: Resolve client address
  postcopy-ram: add a stub for postcopy_request_shared_page
  vhost+postcopy: Helper to send requests to source for shared pages
  vhost+postcopy: Stash RAMBlock and offset
  vhost+postcopy: Send address back to qemu
  libvhost-user+postcopy: Register new regions with the ufd
  migration/ram: ramblock_recv_bitmap_test_byte_offset
  postcopy+vhost-user: Split set_mem_table for postcopy
  vhost+postcopy: Transmit 'listen' to slave
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

# Conflicts:
#	scripts/update-linux-headers.sh
2018-03-20 15:48:34 +00:00
Dr. David Alan Gilbert
096bf4c852 vhost+postcopy: Helper to send requests to source for shared pages
Provide a helper to be used by shared waker functions to request
shared pages from the source.
The last_rb pointer is moved into the incoming state since this
helper can update it as well as the main fault thread function.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-03-20 05:03:29 +02:00
Dr. David Alan Gilbert
00fa4fc85b postcopy: Allow registering of fd handler
Allow other userfaultfd's to be registered into the fault thread
so that handlers for shared memory can get responses.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-03-20 05:03:28 +02:00
Vladimir Sementsov-Ogievskiy
b35ebdf076 migration: add postcopy migration of dirty bitmaps
Postcopy migration of dirty bitmaps. Only named dirty bitmaps are migrated.

If destination qemu is already containing a dirty bitmap with the same name
as a migrated bitmap (for the same node), then, if their granularities are
the same the migration will be done, otherwise the error will be generated.

If destination qemu doesn't contain such bitmap it will be created.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-id: 20180313180320.339796-12-vsementsov@virtuozzo.com
[Changed '+' to '*' as per list discussion. --js]
Signed-off-by: John Snow <jsnow@redhat.com>
2018-03-13 17:06:09 -04:00
Vladimir Sementsov-Ogievskiy
55efc8c2ff qapi: add dirty-bitmaps migration capability
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 20180313180320.339796-7-vsementsov@virtuozzo.com
2018-03-13 17:05:45 -04:00
Markus Armbruster
9af2398977 Include less of the generated modular QAPI headers
In my "build everything" tree, a change to the types in
qapi-schema.json triggers a recompile of about 4800 out of 5100
objects.

The previous commit split up qmp-commands.h, qmp-event.h, qmp-visit.h,
qapi-types.h.  Each of these headers still includes all its shards.
Reduce compile time by including just the shards we actually need.

To illustrate the benefits: adding a type to qapi/migration.json now
recompiles some 2300 instead of 4800 objects.  The next commit will
improve it further.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20180211093607.27351-24-armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
[eblake: rebase to master]
Signed-off-by: Eric Blake <eblake@redhat.com>
2018-03-02 13:45:50 -06:00
Peter Xu
3e0c8050eb migration: pass MigrationState to migrate_init()
Let the callers take the object, then pass it to migrate_init().

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20180208103132.28452-12-peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2018-02-14 10:37:09 +00:00
Peter Xu
d6208e35e4 migration: allow send_rq to fail
We will not allow failures to happen when sending data from destination
to source via the return path. However it is possible that there can be
errors along the way.  This patch allows the migrate_send_rp_message()
to return error when it happens, and further extended it to
migrate_send_rp_req_pages().

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20180208103132.28452-9-peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2018-02-14 10:36:52 +00:00
Peter Xu
64f615fe34 migration: reuse mis->userfault_quit_fd
It was only used for quitting the page fault thread before. Let it be
something more useful - now we can use it to notify a "wake" for the
page fault thread (for any reason), and it only means "quit" if the
fault_thread_quit is set.

Since we changed what it does, renaming it to userfault_event_fd.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20180208103132.28452-3-peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2018-02-14 10:35:45 +00:00
Markus Armbruster
522ece32d2 Drop superfluous includes of qapi-types.h and test-qapi-types.h
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20180201111846.21846-4-armbru@redhat.com>
2018-02-09 05:05:11 +01:00
Dr. David Alan Gilbert
cce8040bb0 migration: Allow migrate_fd_connect to take an Error *
Allow whatever is performing the connection to pass migrate_fd_connect
an error to indicate there was a problem during connection, an allow
us to clean up.

The caller must free the error.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2018-02-06 10:55:12 +00:00
Peter Maydell
ee86981bda migration: Revert postcopy-blocktime commit set
This reverts commits
ca6011c migration: add postcopy total blocktime into query-migrate
5f32dc8 migration: add blocktime calculation into migration-test
2f7dae9 migration: postcopy_blocktime documentation
3be98be migration: calculate vCPU blocktime on dst side
01a87f0 migration: add postcopy blocktime ctx into MigrationIncomingState
31bf06a migration: introduce postcopy-blocktime capability

as they don't build on ppc32 due to trying to do atomic accesses
on types that are larger than the host pointer type.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-01-23 10:08:05 +00:00
Peter Xu
b15df1ae50 migration: cleanup stats update into function
We have quite a few lines in migration_thread() that calculates some
statistics for the migration interations.  Isolate it into a single
function to improve readability.

Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2018-01-15 12:48:10 +01:00
Peter Xu
64909f9740 migration: introduce downtime_start
Introduce MigrationState.downtime_start to replace the local variable
"start_time" in migration_thread to avoid passing things around.

Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2018-01-15 12:48:09 +01:00
Peter Xu
7287cbd46e migration: move vm_old_running into global state
Firstly, it was passed around.  Let's just move it into MigrationState
just like many other variables as state of migration, renaming it to
vm_was_running.

One thing to mention is that for postcopy, we actually don't need this
knowledge at all since postcopy can't resume a VM even if it fails (we
can see that from the old code too: when we try to resume we also check
against "entered_postcopy" variable).  So further we do this:

- in postcopy_start(), we don't update vm_old_running since useless
- in migration_thread(), we don't need to check entered_postcopy when
  resume, since it's only used for precopy.

Comment this out too for that variable definition.

Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2018-01-15 12:48:08 +01:00
Peter Xu
4af246a34e migration: split use of MigrationState.total_time
It was used either to:

1. store initial timestamp of migration start, and
2. store total time used by last migration

Let's provide two parameters for each of them.  Mix use of the two is
slightly misleading.

Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2018-01-15 12:48:08 +01:00
Alexey Perevalov
ca6011c232 migration: add postcopy total blocktime into query-migrate
Postcopy total blocktime is available on destination side only.
But query-migrate was possible only for source. This patch
adds ability to call query-migrate on destination.
To be able to see postcopy blocktime, need to request postcopy-blocktime
capability.

The query-migrate command will show following sample result:
{"return":
    "postcopy-vcpu-blocktime": [115, 100],
    "status": "completed",
    "postcopy-blocktime": 100
}}

postcopy_vcpu_blocktime contains list, where the first item is the first
vCPU in QEMU.

This patch has a drawback, it combines states of incoming and
outgoing migration. Ongoing migration state will overwrite incoming
state. Looks like better to separate query-migrate for incoming and
outgoing migration or add parameter to indicate type of migration.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2018-01-15 12:48:04 +01:00
Alexey Perevalov
01a87f0bd3 migration: add postcopy blocktime ctx into MigrationIncomingState
This patch adds request to kernel space for UFFD_FEATURE_THREAD_ID, in
case this feature is provided by kernel.

PostcopyBlocktimeContext is encapsulated inside postcopy-ram.c,
due to it being a postcopy-only feature.
Also it defines PostcopyBlocktimeContext's instance live time.
Information from PostcopyBlocktimeContext instance will be provided
much after postcopy migration end, instance of PostcopyBlocktimeContext
will live till QEMU exit, but part of it (vcpu_addr,
page_fault_vcpu_time) used only during calculation, will be released
when postcopy ended or failed.

To enable postcopy blocktime calculation on destination, need to
request proper compatibility (Patch for documentation will be at the
tail of the patch set).

As an example following command enable that capability, assume QEMU was
started with
-chardev socket,id=charmonitor,path=/var/lib/migrate-vm-monitor.sock
option to control it

[root@host]#printf "{\"execute\" : \"qmp_capabilities\"}\r\n \
{\"execute\": \"migrate-set-capabilities\" , \"arguments\":   {
\"capabilities\": [ { \"capability\": \"postcopy-blocktime\", \"state\":
true } ] } }" | nc -U /var/lib/migrate-vm-monitor.sock

Or just with HMP
(qemu) migrate_set_capability postcopy-blocktime on

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2018-01-15 12:48:00 +01:00
Alexey Perevalov
31bf06a9d6 migration: introduce postcopy-blocktime capability
Right now it could be used on destination side to
enable vCPU blocktime calculation for postcopy live migration.
vCPU blocktime - it's time since vCPU thread was put into
interruptible sleep, till memory page was copied and thread awake.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2018-01-15 12:47:59 +01:00
Juan Quintela
73af8dd8d7 migration: Make xbzrle_cache_size a migration parameter
Right now it is a variable in MigrationState instead of a
MigrationParameter.  The change allows to set it as the rest of the
Migration parameters, from the command line, with
query_migration_paramters, set_migrate_parameters, etc.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2017-10-29 14:06:15 +01:00
Juan Quintela
87db1a7d89 migration: Improve migration thread error handling
We now report errors also when we finish migration, not only on info
migrate.  We plan to use this error from several places, and we want
the first error to happen to win, so we add an mutex to order it.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2017-10-23 18:03:43 +02:00
Dr. David Alan Gilbert
e91d8951d5 migration: Wait for semaphore before completing migration
Wait for a semaphore before completing the migration,
if the previously added capability was enabled.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2017-10-23 18:03:29 +02:00
Dr. David Alan Gilbert
93fbd0314e migration: Add 'pause-before-switchover' capability
When 'pause-before-switchover' is enabled, the outgoing migration
will pause before invalidating the block devices and serializing
the device state.
At this point the management layer gets the chance to clean up any
device jobs or other device users before the migration completes.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2017-10-23 18:03:27 +02:00
Vladimir Sementsov-Ogievskiy
58110f0acb migration: split common postcopy out of ram postcopy
Split common postcopy staff from ram postcopy staff.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2017-09-22 14:11:27 +02:00
Juan Quintela
0fb86605ea migration: Create x-multifd-page-count parameter
Indicates how many pages we are going to send in each batch to a multifd
thread.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>

--

Be consistent with defaults and documentation
Use new DEFINE_PROP_*
Rename x-multifd-group to x-multifd-page-count
2017-09-22 14:11:21 +02:00
Juan Quintela
4075fb1ca4 migration: Create x-multifd-channels parameter
Indicates the number of channels that we will create.  By default we
create 2 channels.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>

--

Catch inconsistent defaults (eric).
Improve comment stating that number of threads is the same than number
of sockets
Use new DEFIN_PROP_*
Rename x-multifd-threads to x-multifd-threads
2017-09-22 14:11:21 +02:00
Juan Quintela
30126bbf1f migration: Add multifd capability
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>

--

Use new DEFINE_PROP
2017-09-22 14:11:20 +02:00
Juan Quintela
428d89084c migration: Create migration_has_all_channels
This function allows us to decide when to close the listener socket.
For now, we only need one connection.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
2017-09-22 14:11:19 +02:00