The following sequence may cause the VM abort during migration:
1. RUN_STATE_RUNNING,MIGRATION_STATUS_ACTIVE
2. before call migration_completion(), we send migrate_cancel
QMP command, the state machine is changed to:
RUN_STATE_RUNNING,MIGRATION_STATUS_CANCELLING
3. call migration_completion(), and the state machine is
switch to: RUN_STATE_RUNNING,MIGRATION_STATUS_COMPLETED
4. call migration_iteration_finish(), because the migration
status is COMPLETED, so it will try to set the runstate
to POSTMIGRATE, but RUNNING-->POSTMIGRATE is an invalid
transition, so abort().
The migration_completion() should not change the migration state
to COMPLETED if it is already changed to CANCELLING.
Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com>
Message-Id: <20201105091726.148-1-longpeng2@huawei.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
When creating new tls client, the tioc->master will be referenced which results in socket
leaking after multifd_save_cleanup if we cancel migration.
Fix it by do object_unref() after tls client creation.
Suggested-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Chuan Zheng <zhengchuan@huawei.com>
Message-Id: <1605104763-118687-1-git-send-email-zhengchuan@huawei.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Remove redundant blank line which is left by Commit 662770af7c,
also take this opportunity to remove redundant includes in dirtyrate.c.
Signed-off-by: Chuan Zheng <zhengchuan@huawei.com>
Message-Id: <1604030281-112946-1-git-send-email-zhengchuan@huawei.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
After the WITH_QEMU_LOCK_GUARD macro is added, the compiler cannot identify
that the statements in the macro must be executed. As a result, some variables
assignment statements in the macro may be considered as unexecuted by the compiler.
When the -Wmaybe-uninitialized capability is enabled on GCC9,the compiler showed warning:
migration/migration.c: In function ‘migrate_send_rp_req_pages’:
migration/migration.c:384:8: warning: ‘received’ may be used uninitialized in this function [-Wmaybe-uninitialized]
384 | if (received) {
| ^
Add a default value for 'received' to prevented the warning.
Reported-by: Euler Robot <euler.robot@huawei.com>
Signed-off-by: Chen Qun <kuhn.chenqun@huawei.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20201111142203.2359370-6-kuhn.chenqun@huawei.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
The qemu main loop could hang up forever when we enable TLS+Multifd.
The Src multifd_send_0 invokes tls handshake, it sends hello to sever
and wait response.
However, the Dst main qemu loop has been waiting recvmsg() for multifd_recv_1.
Both of Src and Dst main qemu loop are blocking and waiting for reponse which
results in hanging up forever.
Src: (multifd_send_0) Dst: (multifd_recv_1)
multifd_channel_connect migration_channel_process_incoming
multifd_tls_channel_connect migration_tls_channel_process_incoming
multifd_tls_channel_connect qio_channel_tls_handshake_task
qio_channel_tls_handshake gnutls_handshake
qio_channel_tls_handshake_task ...
qcrypto_tls_session_handshake ...
gnutls_handshake ...
... ...
recvmsg (Blocking I/O waiting for response) recvmsg (Blocking I/O waiting for response)
Fix this by offloadinig handshake work to a background thread.
Reported-by: Yan Jin <jinyan12@huawei.com>
Suggested-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Chuan Zheng <zhengchuan@huawei.com>
Message-Id: <1604643893-8223-1-git-send-email-zhengchuan@huawei.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
The '%u' conversion specifier is for decimal notation.
When prefixing a format with '0x', we want the hexadecimal
specifier ('%x').
Inspired-by: Dov Murik <dovmurik@linux.vnet.ibm.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20201103112558.2554390-5-philmd@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
we should set ram_bulk_stage to false after ram_state_init,
otherwise the bitmap will be unused in migration_bitmap_find_dirty.
all pages in ram cache will be flushed to the ram of secondary guest
for each checkpoint.
Signed-off-by: Lei Rao <lei.rao@intel.com>
Signed-off-by: Derek Su <dereksu@qnap.com>
Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Reviewed-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Reviewed-by: Zhang Chen <chen.zhang@intel.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
The new migrate_send_rp_req_pages_pending() call should greatly improve
destination responsiveness because it will resync faulted address after
postcopy recovery. However it is also the 1st place to initiate the page
request from the main thread.
One thing is overlooked on that migrate_send_rp_message_req_pages() is not
designed to be thread-safe. So if we wake the fault thread before syncing all
the faulted pages in the main thread, it means they can race.
Postpone the wake up operation after the sync of faulted addresses.
Fixes: 0c26781c09 ("migration: Sync requested pages after postcopy recovery")
Tested-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20201102153010.11979-3-peterx@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
When postcopy recover happens, we need to reset last_rb after each return of
postcopy_pause_fault_thread() because that means we just got the postcopy
migration continued.
Unify this reset to the place right before we want to kick the fault thread
again, when we get the command MIG_CMD_POSTCOPY_RESUME from source.
This is actually more than that - because the main thread on destination will
now be able to call migrate_send_rp_req_pages_pending() too, so the fault
thread is not the only user of last_rb now. Move the reset earlier will allow
the first call to migrate_send_rp_req_pages_pending() to use the reset value
even if called from the main thread.
(NOTE: this is not a real fix to 0c26781c09 mentioned below, however it is just
a mark that when picking up 0c26781c09 we'd better have this one too; the real
fix will come later)
Fixes: 0c26781c09 ("migration: Sync requested pages after postcopy recovery")
Tested-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20201102153010.11979-2-peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Added amount of bytes transferred to the VM at destination by all VFIO
devices
Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Another go at Peter's postcopy fixes
Cleanups from Bihong Yu and Peter Maydell.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEERfXHG0oMt/uXep+pBRYzHrxb/ecFAl+W9n8ACgkQBRYzHrxb
/ef2uRAAqWTFLXuBF8+evEd1mMq2SM3ZYTuc7QKTY3MzAH6J/OMvJbZ112itqWOb
iZ5NuuWH4PvzOhlR/PNNf1Yv3hTfv36HinG+OCh6s+6aqVx9yHOAfdBgmJIdYAeg
Sk1jx43dvCyN2FwPs31ir3L6mwsrtfkRsS+2FeyrvRoEl4WE9mOoypCft3vdd9Dw
zZea0Pw7vIs454D4n1vpJiQtq6B4eSAlQKpTLfQbglpTm4MgqLERzGvpT6hbQXJR
eQyTOqRe08viIOZ+oN0B/+RVO6T9jc4Y1bEl2NSak1v4Tf7NNfDkFpLAjFm07V/1
tIhL/NOOsHdzfHQtrZpzKQgwaceb1N5qo0PfxD6/tRf9HlXY54iw6yY75+5c5Y89
UK8VSIYKnM2yXeVDLShxixIr3A1Z+zA41XydDwaLZczjeV7+nwrAXAjO8a+j6Dox
zj4IyN2g5elEOmarC8qkvbDZ+TVvA2tookhWVwoz+D8ChYkcRDKP9eoYomfRwg+e
NKRFuLBkyVPb0eEhyOV6HqJbMfTLpHneTM94v6HGz8tiK8IlMZfTTnC2Mr5gTXuS
/cgOVhsY7+l+pKpxpGJmU3aUCYRk1CuK6MhXgjYEFMh5Siba8s0ZPZVaEm/BUyO1
rD+tVup87xMiJq3xnmLX+opblYE9G+b67hH1KuPc5vZXiSwuTkQ=
=OL0Q
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/dgilbert/tags/pull-migration-20201026a' into staging
migration pull: 2020-10-26
Another go at Peter's postcopy fixes
Cleanups from Bihong Yu and Peter Maydell.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
# gpg: Signature made Mon 26 Oct 2020 16:17:03 GMT
# gpg: using RSA key 45F5C71B4A0CB7FB977A9FA90516331EBC5BFDE7
# gpg: Good signature from "Dr. David Alan Gilbert (RH2) <dgilbert@redhat.com>" [full]
# Primary key fingerprint: 45F5 C71B 4A0C B7FB 977A 9FA9 0516 331E BC5B FDE7
* remotes/dgilbert/tags/pull-migration-20201026a:
migration-test: Only hide error if !QTEST_LOG
migration/postcopy: Release fd before going into 'postcopy-pause'
migration: Sync requested pages after postcopy recovery
migration: Maintain postcopy faulted addresses
migration: Introduce migrate_send_rp_message_req_pages()
migration: Pass incoming state into qemu_ufd_copy_ioctl()
migration: using trace_ to replace DPRINTF
migration: Delete redundant spaces
migration: Open brace '{' following function declarations go on the next line
migration: Do not initialise statics and globals to 0 or NULL
migration: Add braces {} for if statement
migration: Open brace '{' following struct go on the same line
migration: Add spaces around operator
migration: Don't use '#' flag of printf format
migration: Do not use C99 // comments
migration: Drop unused VMSTATE_FLOAT64 support
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Logically below race could trigger with the old code:
test program migration thread
------------ ----------------
wait_until('postcopy-pause')
postcopy_pause()
set_state('postcopy-pause')
do_postcopy_recover()
arm s->to_dst_file with new fd
release s->to_dst_file [1]
Here [1] could have released the just-installed recoverying channel. Then the
migration could hang without really resuming.
Instead, it should be very safe to release the fd before setting the state into
'postcopy-pause', because there's no reason for any other thread to touch it
during 'postcopy-active'.
Dave reported a very rare postcopy recovery hang that the migration-test
program waited for the migration to complete in migrate_postcopy_complete().
We do suspect it's the same thing that we're gonna fix here. Hard to tell.
However since we've noticed this, fix this irrelevant of the hang report.
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20201021212721.440373-6-peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
We synchronize the requested pages right after a postcopy recovery happens.
This helps to synchronize the prioritized pages on source so that the faulted
threads can be served faster.
Reported-by: Xiaohui Li <xiaohli@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20201021212721.440373-5-peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Maintain a list of faulted addresses on the destination host for which we're
waiting on. This is implemented using a GTree rather than a real list to make
sure even there're plenty of vCPUs/threads that are faulting, the lookup will
still be fast with O(log(N)) (because we'll do that after placing each page).
It should bring a slight overhead, but ideally that shouldn't be a big problem
simply because in most cases the requested page list will be short.
Actually we did similar things for postcopy blocktime measurements. This patch
didn't use that simply because:
(1) blocktime measurement is towards vcpu threads only, but here we need to
record all faulted addresses, including main thread and external
thread (like, DPDK via vhost-user).
(2) blocktime measurement will require UFFD_FEATURE_THREAD_ID, but here we
don't want to add that extra dependency on the kernel version since not
necessary. E.g., we don't need to know which thread faulted on which
page, we also don't care about multiple threads faulting on the same
page. But we only care about what addresses are faulted so waiting for a
page copying from src.
(3) blocktime measurement is not enabled by default. However we need this by
default especially for postcopy recover.
Another thing to mention is that this patch introduced a new mutex to serialize
the receivedmap and the page_requested tree, however that serialization does
not cover other procedures like UFFDIO_COPY.
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20201021212721.440373-4-peterx@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
This is another layer wrapper for sending a page request to the source VM. The
new migrate_send_rp_message_req_pages() will be used elsewhere in coming
patches.
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20201021212721.440373-3-peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
It'll be used in follow up patches to access more fields out of it. Meanwhile
fetch the userfaultfd inside the function.
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20201021212721.440373-2-peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Bihong Yu <yubihong@huawei.com>
Message-Id: <1603179176-5360-1-git-send-email-yubihong@huawei.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Bihong Yu <yubihong@huawei.com>
Reviewed-by: Chuan Zheng <zhengchuan@huawei.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <1603163448-27122-9-git-send-email-yubihong@huawei.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Bihong Yu <yubihong@huawei.com>
Reviewed-by: Chuan Zheng <zhengchuan@huawei.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <1603163448-27122-8-git-send-email-yubihong@huawei.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Bihong Yu <yubihong@huawei.com>
Reviewed-by: Chuan Zheng <zhengchuan@huawei.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <1603163448-27122-7-git-send-email-yubihong@huawei.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Bihong Yu <yubihong@huawei.com>
Reviewed-by: Chuan Zheng <zhengchuan@huawei.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <1603163448-27122-6-git-send-email-yubihong@huawei.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Bihong Yu <yubihong@huawei.com>
Reviewed-by: Chuan Zheng <zhengchuan@huawei.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <1603163448-27122-5-git-send-email-yubihong@huawei.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Bihong Yu <yubihong@huawei.com>
Reviewed-by: Chuan Zheng <zhengchuan@huawei.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <1603163448-27122-4-git-send-email-yubihong@huawei.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Bihong Yu <yubihong@huawei.com>
Reviewed-by: Chuan Zheng <zhengchuan@huawei.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-Id: <1603163448-27122-3-git-send-email-yubihong@huawei.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Bihong Yu <yubihong@huawei.com>
Reviewed-by: Chuan Zheng <zhengchuan@huawei.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-Id: <1603163448-27122-2-git-send-email-yubihong@huawei.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Commit ef96e3ae96 in January 2019 removed the last user of the
VMSTATE_FLOAT64* macros. These were used by targets which defined
their floating point register file as an array of 'float64'.
We used to try to maintain a stricter distinction between
'float64' (a type for holding an integer representing an IEEE float)
and 'uint64_t', including having a debug option for 'float64' being
a struct and supposedly mandatory macros for converting between
float64 and uint64_t. We no longer think that's a usefully
strong distinction to draw and we allow ourselves to freely
assume that float64 really is just a 64-bit integer type, so
for new targets we would simply recommend use of the uint64_t type
for a floating point register file. The float64 type remains
as a useful way of documenting in the type signature of helper
functions and the like that they expect to receive an IEEE float
from the TCG generated code rather than an arbitrary integer.
Since the VMSTATE_FLOAT64* macros have no remaining users and
we don't recommend new code uses them, delete them.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20201022120830.5938-1-peter.maydell@linaro.org>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
A default value is provided for the variable 'bitmap_name' to avoid
a compiler warning.
The compiler showed the warning:
migration/block-dirty-bitmap.c:1090:13: warning: ‘bitmap_name’
may be used uninitialized in this function [-Wmaybe-uninitialized]
g_strlcpy(s->bitmap_name, bitmap_name, sizeof(s->bitmap_name));
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Reported-by: Euler Robot <euler.robot@huawei.com>
Signed-off-by: Chen Qun <kuhn.chenqun@huawei.com>
Message-Id: <20201014114430.1898684-1-kuhn.chenqun@huawei.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
[eblake: commit message grammar tweaks]
Signed-off-by: Eric Blake <eblake@redhat.com>
sectors_per_chunk is a 64 bit integer, but the calculation is done in 32
bits, leading to an overflow for coarse bitmap granularities.
If that results in the value 0, it leads to a hang where no progress is
made but send_bitmap_bits is constantly called with nr_sectors being 0.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
Message-Id: <20201021144456.1072-1-s.reiter@proxmox.com>
Fixes: b35ebdf07 migration: add postcopy migration of dirty bitmaps
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[eblake: Use correct type for 8ULL, use () to avoid overflow]
Signed-off-by: Eric Blake <eblake@redhat.com>
Restricting xen-set-global-dirty-log and xen-load-devices-state
commands migration.json pulls slightly less QAPI-generated code
into user-mode and tools.
Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20201012121536.3381997-6-philmd@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Be consistent creating all the libraries in the main meson.build file.
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20201006125602.2311423-6-philmd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Make dirty_rate field optional, present dirty rate only when querying
the rate has completed.
The qmp results is shown as follow:
@unstarted:
{"return":{"status":"unstarted","start-time":0,"calc-time":0},"id":"libvirt-12"}
@measuring:
{"return":{"status":"measuring","start-time":102931,"calc-time":1},"id":"libvirt-85"}
@measured:
{"return":{"status":"measured","dirty-rate":4,"start-time":150146,"calc-time":1},"id":"libvirt-15"}
Signed-off-by: Chuan Zheng <zhengchuan@huawei.com>
Reviewed-by: David Edmondson <david.edmondson@oracle.com>
Message-Id: <1601350938-128320-3-git-send-email-zhengchuan@huawei.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Querying could include both the start-time and the calc-time while at the measuring
state, allowing a caller to determine when they should expect to come back looking
for a result.
Signed-off-by: Chuan Zheng <zhengchuan@huawei.com>
Message-Id: <1601350938-128320-2-git-send-email-zhengchuan@huawei.com>
Reviewed-by: David Edmondson <david.edmondson@oracle.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
cur_mon really needs to be coroutine-local as soon as we move monitor
command handlers to coroutines and let them yield. As a first step, just
remove all direct accesses to cur_mon so that we can implement this in
the getter function later.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20201005155855.256490-4-kwolf@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Non-empty record/replay queue prevents saving and loading the VM state,
because it includes pending bottom halves and block coroutines.
But when the new VM state is loaded, we don't have to preserve the consistency
of the current state anymore. Therefore this patch just flushes the queue
allowing the coroutines to finish and removes checking for empty rr queue
for load_snapshot function.
Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgalyuk@ispras.ru>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <160174521762.12451.15752448887893855757.stgit@pasha-ThinkPad-X280>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Saving icount as a parameters of the snapshot allows navigation between
them in the execution replay scenario.
This information can be used for finding a specific snapshot for proceeding
the recorded execution to the specific moment of the time.
E.g., 'reverse step' action (introduced in one of the following patches)
needs to load the nearest snapshot which is prior to the current moment
of time.
This patch also updates snapshot test which verifies qemu monitor output.
Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgalyuk@ispras.ru>
Acked-by: Markus Armbruster <armbru@redhat.com>
Acked-by: Kevin Wolf <kwolf@redhat.com>
--
v4 changes:
- squashed format update with test output update
v7 changes:
- introduced the spaces between the fields in snapshot info output
- updated the test to match new field widths
Message-Id: <160174518865.12451.14327573383978752463.stgit@pasha-ThinkPad-X280>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
GCC 9.3.0 on Ubuntu complains:
In file included from /usr/include/string.h:495,
from /home/travis/build/huth/qemu/include/qemu/osdep.h:87,
from ../migration/global_state.c:13:
In function ‘strncpy’,
inlined from ‘global_state_store_running’ at ../migration/global_state.c:47:5:
/usr/include/x86_64-linux-gnu/bits/string_fortified.h:106:10: error:
‘__builtin_strncpy’ specified bound 100 equals destination size [-Werror=stringop-truncation]
106 | return __builtin___strncpy_chk (__dest, __src, __len, __bos (__dest));
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
... but we apparently really want to do a strncpy here - the size is already
checked with the assert() statement right in front of it. To silence the
warning, simply replace it with our strpadcpy() function.
Suggested-by: Philippe Mathieu-Daudé <philmd@redhat.com> (two years ago)
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200918103430.297167-4-thuth@redhat.com>
Message-Id: <20200925154027.12672-5-alex.bennee@linaro.org>
One patch in the last pull broke test 068 which
does a pair of vmload's.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEERfXHG0oMt/uXep+pBRYzHrxb/ecFAl9x/vUACgkQBRYzHrxb
/edt6Q/9E7P3CTZWy2JHKcHPY4rV4NfJEP/bY947Z7nd2+jaCx7ObLOGtWpuzd+i
+4NsuuI54DeYaK6HNZPC84KTxBkF8P70lh4EUkzTdn8sDrrYDcmDx6JmdxnAwGQ7
GNS5Uzi/BUVJlw+IkKxj/nIlZBjc3EUr7uYboqIKKArun/4z/hflndYtzB7lLxig
DkVBTbtMoE0LhRt6tx9s3GzqC0vZPpISZF/mPEOptlsx09oWqfzilyenrCfB+icE
ZuCzKeVkpWer10sWj2xbaI+Ep9wb5Q81qvtIoIcqRspbGz4XENwqHAuDWcpnniEL
nlhjunziLIzAbHiis/FnLUUzPs/Q/Q2eRZc3MziSoU7AKHZ/2MCdYRph427tR8Q5
xE3V8UHKjAGnigO88InAp2fS8Z///p+c49Xx4zJaXAQTGzhbTni5ng0kcakYWbp+
SNpL8Yto38OfqdIe8cBy+At5DeXfkBsy50vIvSHKcG01DRt2Ty8G0Ai23AImTNgJ
zt+Q6y09TuRF07Ao48wMVYW4XRcXmW8eVZzWFHPBBG65AfeC6zGCqzX2VmPlV2fm
qGjv8zINnDlaFX9j/z3Z4zdf66LQcMpD6IFB+POlcyEG22MtHSxfXQDGNKSNplSr
v1ae9wwl5wXRAXSrvIlQhyw/VVYOdD3ZGmgGnXnzLikp+h3NrSY=
=eW0Z
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/dgilbert/tags/pull-migration-20200928a' into staging
Migration: Revert one patch for 068 fix
One patch in the last pull broke test 068 which
does a pair of vmload's.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
# gpg: Signature made Mon 28 Sep 2020 16:19:17 BST
# gpg: using RSA key 45F5C71B4A0CB7FB977A9FA90516331EBC5BFDE7
# gpg: Good signature from "Dr. David Alan Gilbert (RH2) <dgilbert@redhat.com>" [full]
# Primary key fingerprint: 45F5 C71B 4A0C B7FB 977A 9FA9 0516 331E BC5B FDE7
* remotes/dgilbert/tags/pull-migration-20200928a:
Revert "migration: Properly destroy variables on incoming side"
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
add trace points for multifd-tls for debug.
Signed-off-by: Chuan Zheng <zhengchuan@huawei.com>
Signed-off-by: Yan Jin <jinyan12@huawei.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <1600139042-104593-7-git-send-email-zhengchuan@huawei.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Similar like migration main thread, we need to do handshake
for each multifd thread.
Signed-off-by: Chuan Zheng <zhengchuan@huawei.com>
Signed-off-by: Yan Jin <jinyan12@huawei.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <1600139042-104593-6-git-send-email-zhengchuan@huawei.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
multifd channel cleanup is need if multifd handshake failed,
let's extract it.
Signed-off-by: Chuan Zheng <zhengchuan@huawei.com>
Signed-off-by: Yan Jin <jinyan12@huawei.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <1600139042-104593-5-git-send-email-zhengchuan@huawei.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Since multifd creation is async with migration_channel_connect, we should
pass the hostname from MigrationState to MultiFDSendParams.
Signed-off-by: Chuan Zheng <zhengchuan@huawei.com>
Signed-off-by: Yan Jin <jinyan12@huawei.com>
Message-Id: <1600139042-104593-4-git-send-email-zhengchuan@huawei.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
migration_tls_client_create will be used in multifd-tls, let's
extract it.
Signed-off-by: Chuan Zheng <zhengchuan@huawei.com>
Signed-off-by: Yan Jin <jinyan12@huawei.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <1600139042-104593-3-git-send-email-zhengchuan@huawei.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
hostname is need in multifd-tls, save hostname into MigrationState.
Signed-off-by: Chuan Zheng <zhengchuan@huawei.com>
Signed-off-by: Yan Jin <jinyan12@huawei.com>
Message-Id: <1600139042-104593-2-git-send-email-zhengchuan@huawei.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
max-bandwidth is set by default to 32 MiB/s (256 Mib/s)
since 2008 (5bb7910af0).
Most of the CPUs can dirty memory faster than that now,
and this is clearly a problem with POWER where the page
size is 64 kiB and not 4 KiB.
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Message-Id: <20200921144957.979989-1-lvivier@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
When running the xen-save-devices-state QMP command, if the filename
already exists it will be truncated before dumping the devices' state
into it.
Signed-off-by: Dov Murik <dovmurik@linux.vnet.ibm.com>
Message-Id: <20200921094830.114028-1-dovmurik@linux.vnet.ibm.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Add trace_calls to make it easier to debug
Signed-off-by: Chuan Zheng <zhengchuan@huawei.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: David Edmondson <david.edmondson@oracle.com>
Message-Id: <1600237327-33618-13-git-send-email-zhengchuan@huawei.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Implement qmp_cal_dirty_rate()/qmp_get_dirty_rate() function which could be called
Signed-off-by: Chuan Zheng <zhengchuan@huawei.com>
Message-Id: <1600237327-33618-12-git-send-email-zhengchuan@huawei.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
atomic function fixup
Wording fixup in migration.json based on Eric's review
Implement calculate_dirtyrate() function.
Signed-off-by: Chuan Zheng <zhengchuan@huawei.com>
Signed-off-by: YanYing Zhuang <ann.zhuangyanying@huawei.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Li Qiang <liq3ea@gmail.com>
Message-Id: <1600237327-33618-11-git-send-email-zhengchuan@huawei.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Implement is_sample_period_valid() to check if the sample period is vaild and
do set_sample_page_period() to sleep specific time between sample actions.
Signed-off-by: Chuan Zheng <zhengchuan@huawei.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: David Edmondson <david.edmondson@oracle.com>
Reviewed-by: Li Qiang <liq3ea@gmail.com>
Message-Id: <1600237327-33618-10-git-send-email-zhengchuan@huawei.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
In order to sample real RAM, skip ramblock with size below MIN_RAMBLOCK_SIZE
which is set as 128M.
Signed-off-by: Chuan Zheng <zhengchuan@huawei.com>
Reviewed-by: David Edmondson <david.edmondson@oracle.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Li Qiang <liq3ea@gmail.com>
Message-Id: <1600237327-33618-9-git-send-email-zhengchuan@huawei.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Compare page hash results for recorded sampled page.
Signed-off-by: Chuan Zheng <zhengchuan@huawei.com>
Signed-off-by: YanYing Zhuang <ann.zhuangyanying@huawei.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Li Qiang <liq3ea@gmail.com>
Message-Id: <1600237327-33618-8-git-send-email-zhengchuan@huawei.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Record hash results for each sampled page, crc32 is taken to calculate
hash results for each sampled length in TARGET_PAGE_SIZE.
Signed-off-by: Chuan Zheng <zhengchuan@huawei.com>
Signed-off-by: YanYing Zhuang <ann.zhuangyanying@huawei.com>
Reviewed-by: David Edmondson <david.edmondson@oracle.com>
Reviewed-by: Li Qiang <liq3ea@gmail.com>
Message-Id: <1600237327-33618-7-git-send-email-zhengchuan@huawei.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
RAMBLOCK_FOREACH_MIGRATABLE is need in dirtyrate measure,
move the existing definition up into migration/ram.h
Signed-off-by: Chuan Zheng <zhengchuan@huawei.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: David Edmondson <david.edmondson@oracle.com>
Reviewed-by: Li Qiang <liq3ea@gmail.com>
Message-Id: <1600237327-33618-6-git-send-email-zhengchuan@huawei.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Add dirtyrate statistics functions to record/update dirtyrate info.
Signed-off-by: Chuan Zheng <zhengchuan@huawei.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Li Qiang <liq3ea@gmail.com>
Message-Id: <1600237327-33618-5-git-send-email-zhengchuan@huawei.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Add RamblockDirtyInfo to store sampled page info of each ramblock.
Signed-off-by: Chuan Zheng <zhengchuan@huawei.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: David Edmondson <david.edmondson@oracle.com>
Reviewed-by: Li Qiang <liq3ea@gmail.com>
Message-Id: <1600237327-33618-4-git-send-email-zhengchuan@huawei.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
add DirtyRateStatus to denote calculating status.
Signed-off-by: Chuan Zheng <zhengchuan@huawei.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Li Qiang <liq3ea@gmail.com>
Message-Id: <1600237327-33618-3-git-send-email-zhengchuan@huawei.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
atomic name fixup
Add get_dirtyrate_thread() functions to setup query-dirtyrate
framework.
Signed-off-by: Chuan Zheng <zhengchuan@huawei.com>
Signed-off-by: YanYing Zhuang <ann.zhuangyanying@huawei.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: David Edmondson <david.edmondson@oracle.com>
Reviewed-by: Li Qiang <liq3ea@gmail.com>
Message-Id: <1600237327-33618-2-git-send-email-zhengchuan@huawei.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
We duplicated the logic of maintaining the last_rb variable at both callers of
this function. Pass *rb pointer into the function so that we can avoid
duplicating the logic. Also, when we have the rb pointer, it's also easier to
remove the original 2nd & 4th parameters, because both of them (name of the
ramblock when needed, or the page size) can be fetched from the ramblock
pointer too.
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20200908203022.341615-3-peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
In migration_incoming_state_destroy(), we've got a few variables that aren't
destroyed properly, namely:
main_thread_load_event
postcopy_pause_sem_dst
postcopy_pause_sem_fault
rp_mutex
Destroy them properly.
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20200908203022.341615-2-peterx@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
clang's C11 atomic_fetch_*() functions only take a C11 atomic type
pointer argument. QEMU uses direct types (int, etc) and this causes a
compiler error when a QEMU code calls these functions in a source file
that also included <stdatomic.h> via a system header file:
$ CC=clang CXX=clang++ ./configure ... && make
../util/async.c:79:17: error: address argument to atomic operation must be a pointer to _Atomic type ('unsigned int *' invalid)
Avoid using atomic_*() names in QEMU's atomic.h since that namespace is
used by <stdatomic.h>. Prefix QEMU's APIs with 'q' so that atomic.h
and <stdatomic.h> can co-exist. I checked /usr/include on my machine and
searched GitHub for existing "qatomic_" users but there seem to be none.
This patch was generated using:
$ git grep -h -o '\<atomic\(64\)\?_[a-z0-9_]\+' include/qemu/atomic.h | \
sort -u >/tmp/changed_identifiers
$ for identifier in $(</tmp/changed_identifiers); do
sed -i "s%\<$identifier\>%q$identifier%g" \
$(git grep -I -l "\<$identifier\>")
done
I manually fixed line-wrap issues and misaligned rST tables.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20200923105646.47864-1-stefanha@redhat.com>
I found that there are many spelling errors in the comments of qemu,
so I used the spellcheck tool to check the spelling errors
and finally found some spelling errors in the migration folder.
Signed-off-by: zhaolichang <zhaolichang@huawei.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20200917075029.313-3-zhaolichang@huawei.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
- qemu-img create: Fail gracefully when backing file is an empty string
- Fixes related to filter block nodes ("Deal with filters" series)
- block/nvme: Various cleanups required to use multiple queues
- block/nvme: Use NvmeBar structure from "block/nvme.h"
- file-win32: Fix "locking" option
- iotests: Allow running from different directory
-----BEGIN PGP SIGNATURE-----
iQJFBAABCAAvFiEE3D3rFZqa+V09dFb+fwmycsiPL9YFAl9Z7bcRHGt3b2xmQHJl
ZGhhdC5jb20ACgkQfwmycsiPL9ZS6w//bos+A0RfRRF0YFWkIBLQWxqzKcGvMJ8W
XWv3mFzd47UaDgRYwVnCC3CR6bLYEINISngZ3geA4jI1+w7AtYKDOO0HN32dUg+D
ZrNMn02701CA6qkmpxJ+yjsrl9ltR3jYe0me4Wr39Pvdexa2pl/e+M4Vas6FhkYL
ghAwNThypscGCrFjAlz3ru2Sc/K+sPWrGoqkzr+SWvsm9wy4vb8aLxr8Yy50x/zc
CqALS9SQ/YA93BCVi9CzPkVyV3ioA0kg/y38WvLtAQ9GZ3m/ekMro3WvdYsRsFCN
LGXsuwFig+U7Kd7lJrCS9TLnlTJstNGqPq9jEoV5cThPvGknFfMvVOzRmmP7tzqT
YRcPRy39z44OoLKa3kyg3aF38BTxt+9gPqBnivKMr9j9EecMvPsXXHRvF+lP+LsP
j753Ih561hX6FurcjX8pc9GOM2cQA0GjlyL77UTTAmLZyFXP/8e55oQbBuYTylc/
Xlvmc/T+yEGiEGTnK+FxgDAiUaxbCCM9cDVStJjTvsIq43dwXb48g1onDsGZ5eDf
j9lmAD6TJxHNOB5ErNsDPODf4/D1wJ9t9WVF8UZp9ArfPHRdxMzT7Q4LvetaDmVl
+hQC9cgTq8Qd8LwSqbKEYua4L6iGbmLAT7/N6htq5L1eVLg76/tLg/tKSwh/vKAY
yzPmyHaVK84=
=gaaW
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging
Block layer patches:
- qemu-img create: Fail gracefully when backing file is an empty string
- Fixes related to filter block nodes ("Deal with filters" series)
- block/nvme: Various cleanups required to use multiple queues
- block/nvme: Use NvmeBar structure from "block/nvme.h"
- file-win32: Fix "locking" option
- iotests: Allow running from different directory
# gpg: Signature made Thu 10 Sep 2020 10:11:19 BST
# gpg: using RSA key DC3DEB159A9AF95D3D7456FE7F09B272C88F2FD6
# gpg: issuer "kwolf@redhat.com"
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full]
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6
* remotes/kevin/tags/for-upstream: (65 commits)
block/qcow2-cluster: Add missing "fallthrough" annotation
block/nvme: Pair doorbell registers
block/nvme: Use generic NvmeBar structure
block/nvme: Group controller registers in NVMeRegs structure
file-win32: Fix "locking" option
iotests: Allow running from different directory
iotests: Test committing to overridden backing
iotests: Add test for commit in sub directory
iotests: Add filter mirror test cases
iotests: Add filter commit test cases
iotests: Let complete_and_wait() work with commit
iotests: Test that qcow2's data-file is flushed
block: Leave BDS.backing_{file,format} constant
block: Inline bdrv_co_block_status_from_*()
blockdev: Fix active commit choice
block: Drop backing_bs()
qemu-img: Use child access functions
nbd: Use CAF when looking for dirty bitmap
commit: Deal with filters
backup: Deal with filters
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Some trace points are attributed to the wrong source file. Happens
when we neglect to update trace-events for code motion, or add events
in the wrong place, or misspell the file name.
Clean up with help of scripts/cleanup-trace-events.pl. Funnies
requiring manual post-processing:
* accel/tcg/cputlb.c trace points are in trace-events.
* block.c and blockdev.c trace points are in block/trace-events.
* hw/block/nvme.c uses the preprocessor to hide its trace point use
from cleanup-trace-events.pl.
* hw/tpm/tpm_spapr.c uses pseudo trace point tpm_spapr_show_buffer to
guard debug code.
* include/hw/xen/xen_common.h trace points are in hw/xen/trace-events.
* linux-user/trace-events abbreviates a tedious list of filenames to
*/signal.c.
* net/colo-compare and net/filter-rewriter.c use pseudo trace points
colo_compare_miscompare and colo_filter_rewriter_debug to guard
debug code.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20200806141334.3646302-5-armbru@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Tracked down with the help of scripts/cleanup-trace-events.pl.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-id: 20200806141334.3646302-4-armbru@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Some typedefs and macros are defined after the type check macros.
This makes it difficult to automatically replace their
definitions with OBJECT_DECLARE_TYPE.
Patch generated using:
$ ./scripts/codeconverter/converter.py -i \
--pattern=QOMStructTypedefSplit $(git grep -l '' -- '*.[ch]')
which will split "typdef struct { ... } TypedefName"
declarations.
Followed by:
$ ./scripts/codeconverter/converter.py -i --pattern=MoveSymbols \
$(git grep -l '' -- '*.[ch]')
which will:
- move the typedefs and #defines above the type check macros
- add missing #include "qom/object.h" lines if necessary
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-Id: <20200831210740.126168-9-ehabkost@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-Id: <20200831210740.126168-10-ehabkost@redhat.com>
Message-Id: <20200831210740.126168-11-ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Places that use patterns like
if (bs->drv->is_filter && bs->file) {
... something about bs->file->bs ...
}
should be
BlockDriverState *filtered = bdrv_filter_bs(bs);
if (filtered) {
... something about @filtered ...
}
instead.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Currently migration_tls_get_creds() adds the reference of creds
but there was no place to unref it. So the OBJECT(creds) will
never be freed and result in memory leak.
The leak stack:
Direct leak of 104 byte(s) in 1 object(s) allocated from:
#0 0xffffa88bd20b in __interceptor_malloc (/usr/lib64/libasan.so.4+0xd320b)
#1 0xffffa7f0cb1b in g_malloc (/usr/lib64/libglib-2.0.so.0+0x58b1b)
#2 0x14b58cb in object_new_with_type qom/object.c:634
#3 0x14b597b in object_new qom/object.c:645
#4 0x14c0e4f in user_creatable_add_type qom/object_interfaces.c:59
#5 0x141c78b in qmp_object_add qom/qom-qmp-cmds.c:312
#6 0x140e513 in qmp_marshal_object_add qapi/qapi-commands-qom.c:279
#7 0x176ba97 in do_qmp_dispatch qapi/qmp-dispatch.c:165
#8 0x176bee7 in qmp_dispatch qapi/qmp-dispatch.c:208
#9 0x136e337 in monitor_qmp_dispatch monitor/qmp.c:150
#10 0x136eae3 in monitor_qmp_bh_dispatcher monitor/qmp.c:239
#11 0x1852e93 in aio_bh_call util/async.c:89
#12 0x18531b7 in aio_bh_poll util/async.c:117
#13 0x18616bf in aio_dispatch util/aio-posix.c:459
#14 0x1853f37 in aio_ctx_dispatch util/async.c:268
#15 0xffffa7f06a7b in g_main_context_dispatch (/usr/lib64/libglib-2.0.so.0+0x52a7b)
Since we're fine to use the borrowed reference when using the creds,
so just remove the object_ref() in migration_tls_get_creds().
Signed-off-by: Zhenyu Ye <yezhenyu2@huawei.com>
Message-Id: <20200722033228.71-1-yezhenyu2@huawei.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
With blockdev, a BlockDriverState may not have a device name,
so using a node name is required as an alternative.
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20200827111606.1408275-2-berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
The vsock channel is more widely use in some new features, for example,
the Nitro/Enclave. It can also be used as the migration channel.
Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com>
Message-Id: <20200806074030.174-3-longpeng2@huawei.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Currently, the only difference of tcp channel and unix channel in
migration/socket.c is the way to build SocketAddress, but socket_parse()
can handle these two types, so use it to instead of tcp_build_address()
and unix_build_address().
The socket-type channel can be further unified based on the up, this
would be helpful for us to add other socket-type channels.
Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com>
Message-Id: <20200806074030.174-2-longpeng2@huawei.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Rename the macros to make them consistent with the MIGRATION_OBJ
macro name.
This will make future conversion to OBJECT_DECLARE* easier.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Tested-By: Roman Bolshakov <r.bolshakov@yadro.com>
Message-Id: <20200825192110.3528606-51-ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
This migration parameter allows mapping block node names and bitmap
names to aliases for the purpose of block dirty bitmap migration.
This way, management tools can use different node and bitmap names on
the source and destination and pass the mapping of how bitmaps are to be
transferred to qemu (on the source, the destination, or even both with
arbitrary aliases in the migration stream).
While touching this code, fix a bug where bitmap names longer than 255
bytes would fail an assertion in qemu_put_counted_string().
Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200820150725.68687-2-mreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Meson doesn't enjoy the same flexibility we have with Make in choosing
the include path. In particular the tracing headers are using
$(build_root)/$(<D).
In order to keep the include directives unchanged,
the simplest solution is to generate headers with patterns like
"trace/trace-audio.h" and place forwarding headers in the source tree
such that for example "audio/trace.h" includes "trace/trace-audio.h".
This patch is too ugly to be applied to the Makefiles now. It's only
a way to separate the changes to the tracing header files from the
Meson rewrite of the tracing logic.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Noticed while reviewing the file for newer patches.
Fixes: b35ebdf076
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200727203206.134996-1-eblake@redhat.com>
First, if only bitmaps postcopy is enabled (and not ram postcopy)
postcopy_pause_incoming crashes on an assertion
assert(mis->to_src_file).
And anyway, bitmaps postcopy is not prepared to be somehow recovered.
The original idea instead is that if bitmaps postcopy failed, we just
lose some bitmaps, which is not critical. So, on failure we just need
to remove unfinished bitmaps and guest should continue execution on
destination.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200727194236.19551-18-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
If target is turned off prior to postcopy finished, target crashes
because busy bitmaps are found at shutdown.
Canceling incoming migration helps, as it removes all unfinished (and
therefore busy) bitmaps.
Similarly on source we crash in bdrv_close_all which asserts that all
bdrv states are removed, because bdrv states involved into dirty bitmap
migration are referenced by it. So, we need to cancel outgoing
migration as well.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Message-Id: <20200727194236.19551-17-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Bitmaps data is not critical, and we should not fail the migration (or
use postcopy recovering) because of dirty-bitmaps migration failure.
Instead we should just lose unfinished bitmaps.
Still we have to report io stream violation errors, as they affect the
whole migration stream.
While touching this, tighten code that was previously blindly calling
malloc on a size read from the migration stream, as a corrupted stream
(perhaps from a malicious user) should not be able to convince us to
allocate an inordinate amount of memory.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20200727194236.19551-16-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[eblake: typo fixes, enhance commit message]
Signed-off-by: Eric Blake <eblake@redhat.com>
Keep bitmap state for disabled bitmaps too. Keep the state until the
end of the process. It's needed for the following commit to implement
bitmap postcopy canceling.
To clean-up the new list the following logic is used:
We need two events to consider bitmap migration finished:
1. chunk with DIRTY_BITMAP_MIG_FLAG_COMPLETE flag should be received
2. dirty_bitmap_mig_before_vm_start should be called
These two events may come in any order, so we understand which one is
last, and on the last of them we remove bitmap migration state from the
list.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Message-Id: <20200727194236.19551-15-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
bdrv_enable_dirty_bitmap_locked() call does nothing, as if we are in
postcopy, bitmap successor must be enabled, and reclaim operation will
enable the bitmap.
So, actually we need just call _reclaim_ in both if branches, and
making differences only to add an assertion seems not really good. The
logic becomes simple: on load complete we do reclaim and that's all.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Message-Id: <20200727194236.19551-14-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
finish_lock is bad name, as lock used not only on process end.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Message-Id: <20200727194236.19551-13-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Move all state variables into one global struct. Reduce global
variable usage, utilizing opaque pointer where possible.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Message-Id: <20200727194236.19551-12-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
No reasons to keep two public init functions.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20200727194236.19551-11-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Rename dirty_bitmap_mig_cleanup to dirty_bitmap_do_save_cleanup, to
stress that it is on save part.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200727194236.19551-10-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Rename types to be symmetrical for load/save part and shorter.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200727194236.19551-9-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Using the _locked version of bdrv_enable_dirty_bitmap to bypass locking
is wrong as we do not already own the mutex. Moreover, the adjacent
call to bdrv_dirty_bitmap_enable_successor grabs the mutex.
Fixes: 58f72b965e9e1q
Cc: qemu-stable@nongnu.org # v3.0
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200727194236.19551-8-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
We shouldn't fail when finding an unnamed bitmap in a unnamed node or
node with auto-generated node name, as bitmap migration ignores such
bitmaps in the first place.
Fixes: 82640edb88
Fixes: 4ff5cc121b
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20200626130658.76498-1-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[eblake: commit message grammar tweaks]
Signed-off-by: Eric Blake <eblake@redhat.com>