Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20231221031652.119827-67-richard.henderson@linaro.org>
Allow the array of pointers to itself be const.
Propagate this through the copies of this field.
Tested-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20231221031652.119827-2-richard.henderson@linaro.org>
vcpu_dirty_stat_collect() has an unused parameter so remove it.
Signed-off-by: Wafer <wafer@jaguarmicro.com>
Reviewed-by: Hyman Huang <yong.huang@smartx.com>
Message-Id: <20231204012230.4123-1-wafer@jaguarmicro.com>
This is the big patch that removes
aio_context_acquire()/aio_context_release() from the block layer and
affected block layer users.
There isn't a clean way to split this patch and the reviewers are likely
the same group of people, so I decided to do it in one patch.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Message-ID: <20231205182011.1976568-7-stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
migrate_uri_parse() allocates memory to 'channel' if the user
opts for old syntax - uri, which is leaked because there is no
code for freeing 'channel'.
So, free channel to avoid memory leak in case where 'channels'
is empty and uri parsing is required.
Fixes: 5994024f ("migration: Implement MigrateChannelList to qmp migration flow")
Signed-off-by: Het Gala <het.gala@nutanix.com>
Suggested-by: Markus Armbruster <armbru@redhat.com>
Tested-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Link: https://lore.kernel.org/r/20231129204301.131228-1-het.gala@nutanix.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Since socket_parse() will allocate memory for 'saddr',and its value
will pass to 'addr' that allocated by migrate_uri_parse(),
then 'saddr' will no longer used,need to free.
But due to 'saddr->u' is shallow copying the contents of the union,
the members of this union containing allocated strings,and will be used after that.
So just free 'saddr' itself without doing a deep free on the contents of the SocketAddress.
Fixes: 72a8192e22 ("migration: convert migration 'uri' into 'MigrateAddress'")
Signed-off-by: Zongmin Zhou<zhouzongmin@kylinos.cn>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231120031428.908295-1-zhouzongmin@kylinos.cn>
This is being shadowed but the assignments at
multifd_channel_connect() and multifd_tls_channel_connect() .
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Message-ID: <20231110200241.20679-2-farosas@suse.de>
Signed-off-by: Juan Quintela <quintela@redhat.com>
This adds GRAPH_RDLOCK annotations to declare that callers of
bdrv_filter_bs() need to hold a reader lock for the graph because
it calls bdrv_filter_child(), which accesses bs->file/backing.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20231027155333.420094-4-kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* Support for non 64b IOVA space
* Introduction of a PCIIOMMUOps callback structure to ease future
extensions
* Fix for a buffer overrun when writing the VF token
* PPC cleanups preparing ground for IOMMUFD support
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEoPZlSPBIlev+awtgUaNDx8/77KEFAmVI+bIACgkQUaNDx8/7
7KHW4g/9FmgX0k2Elm1BAul3slJtuBT8/iHKfK19rhXICxhxS5xBWJA8FmosTWAT
91YqQJhOHARxLd9VROfv8Fq8sAo+Ys8bP3PTXh5satjY5gR9YtmMSVqvsAVLn7lv
a/0xp7wPJt2UeKzvRNUqFXNr7yHPwxFxbJbmmAJbNte8p+TfE2qvojbJnu7BjJbg
sTtS/vFWNJwtuNYTkMRoiZaUKEoEZ8LnslOqKUjgeO59g4i3Dq8e2JCmHANPFWUK
cWmr7AqcXgXEnLSDWTtfN53bjcSCYkFVb4WV4Wv1/7hUF5jQ4UR0l3B64xWe0M3/
Prak3bWOM/o7JwLBsgaWPngXA9V0WFBTXVF4x5qTwhuR1sSV8MxUvTKxI+qqiEzA
FjU89oSZ+zXId/hEUuTL6vn1Th8/6mwD0L9ORchNOQUKzCjBzI4MVPB09nM3AdPC
LGThlufsZktdoU2KjMHpc+gMIXQYsxkgvm07K5iZTZ5eJ4tV5KB0aPvTZppGUxe1
YY9og9F3hxjDHQtEuSY2rzBQI7nrUpd1ZI5ut/3ZgDWkqD6aGRtMme4n4GsGsYb2
Ht9+d2RL9S8uPUh+7rV8K/N3+vXgXRaEYTuAScKtflEbA7YnZA5nUdMng8x0kMTQ
Y73XCd4UGWDfSSZsgaIHGkM/MRIHgmlrfcwPkWqWW9vF+92O6Hw=
=/Du0
-----END PGP SIGNATURE-----
Merge tag 'pull-vfio-20231106' of https://github.com/legoater/qemu into staging
vfio queue:
* Support for non 64b IOVA space
* Introduction of a PCIIOMMUOps callback structure to ease future
extensions
* Fix for a buffer overrun when writing the VF token
* PPC cleanups preparing ground for IOMMUFD support
# -----BEGIN PGP SIGNATURE-----
#
# iQIzBAABCAAdFiEEoPZlSPBIlev+awtgUaNDx8/77KEFAmVI+bIACgkQUaNDx8/7
# 7KHW4g/9FmgX0k2Elm1BAul3slJtuBT8/iHKfK19rhXICxhxS5xBWJA8FmosTWAT
# 91YqQJhOHARxLd9VROfv8Fq8sAo+Ys8bP3PTXh5satjY5gR9YtmMSVqvsAVLn7lv
# a/0xp7wPJt2UeKzvRNUqFXNr7yHPwxFxbJbmmAJbNte8p+TfE2qvojbJnu7BjJbg
# sTtS/vFWNJwtuNYTkMRoiZaUKEoEZ8LnslOqKUjgeO59g4i3Dq8e2JCmHANPFWUK
# cWmr7AqcXgXEnLSDWTtfN53bjcSCYkFVb4WV4Wv1/7hUF5jQ4UR0l3B64xWe0M3/
# Prak3bWOM/o7JwLBsgaWPngXA9V0WFBTXVF4x5qTwhuR1sSV8MxUvTKxI+qqiEzA
# FjU89oSZ+zXId/hEUuTL6vn1Th8/6mwD0L9ORchNOQUKzCjBzI4MVPB09nM3AdPC
# LGThlufsZktdoU2KjMHpc+gMIXQYsxkgvm07K5iZTZ5eJ4tV5KB0aPvTZppGUxe1
# YY9og9F3hxjDHQtEuSY2rzBQI7nrUpd1ZI5ut/3ZgDWkqD6aGRtMme4n4GsGsYb2
# Ht9+d2RL9S8uPUh+7rV8K/N3+vXgXRaEYTuAScKtflEbA7YnZA5nUdMng8x0kMTQ
# Y73XCd4UGWDfSSZsgaIHGkM/MRIHgmlrfcwPkWqWW9vF+92O6Hw=
# =/Du0
# -----END PGP SIGNATURE-----
# gpg: Signature made Mon 06 Nov 2023 22:35:30 HKT
# gpg: using RSA key A0F66548F04895EBFE6B0B6051A343C7CFFBECA1
# gpg: Good signature from "Cédric Le Goater <clg@redhat.com>" [unknown]
# gpg: aka "Cédric Le Goater <clg@kaod.org>" [unknown]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: A0F6 6548 F048 95EB FE6B 0B60 51A3 43C7 CFFB ECA1
* tag 'pull-vfio-20231106' of https://github.com/legoater/qemu: (22 commits)
vfio/common: Move vfio_host_win_add/del into spapr.c
vfio/spapr: Make vfio_spapr_create/remove_window static
vfio/container: Move spapr specific init/deinit into spapr.c
vfio/container: Move vfio_container_add/del_section_window into spapr.c
vfio/container: Move IBM EEH related functions into spapr_pci_vfio.c
util/uuid: Define UUID_STR_LEN from UUID_NONE string
util/uuid: Remove UUID_FMT_LEN
vfio/pci: Fix buffer overrun when writing the VF token
util/uuid: Add UUID_STR_LEN definition
hw/pci: modify pci_setup_iommu() to set PCIIOMMUOps
test: Add some tests for range and resv-mem helpers
virtio-iommu: Consolidate host reserved regions and property set ones
virtio-iommu: Implement set_iova_ranges() callback
virtio-iommu: Record whether a probe request has been issued
range: Introduce range_inverse_array()
virtio-iommu: Introduce per IOMMUDevice reserved regions
util/reserved-region: Add new ReservedRegion helpers
range: Make range_compare() public
virtio-iommu: Rename reserved_regions into prop_resv_regions
vfio: Collect container iova range info
...
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
We were not unlocking bitmap mutex on the error case. To fix it
forever change to enclose the code with WITH_QEMU_LOCK_GUARD().
Coverity CID 1523750.
Fixes: a2326705e5 ("migration: Stop migration immediately in RDMA error paths")
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231103074245.55166-1-quintela@redhat.com>
qemu_uuid_unparse() includes a trailing NUL when writing the uuid
string and the buffer size should be UUID_FMT_LEN + 1 bytes. Add a
define for this size and use it where required.
Cc: Fam Zheng <fam@euphon.net>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: "Denis V. Lunev" <den@openvz.org>
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Integrate MigrateChannelList with all transport backends
(socket, exec and rdma) for both src and dest migration
endpoints for hmp migration.
Suggested-by: Aravind Retnakaran <aravind.retnakaran@nutanix.com>
Signed-off-by: Het Gala <het.gala@nutanix.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Message-ID: <20231023182053.8711-14-farosas@suse.de>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Integrate MigrateChannelList with all transport backends
(socket, exec and rdma) for both src and dest migration
endpoints for qmp migration.
For current series, limit the size of MigrateChannelList
to single element (single interface) as runtime check.
Suggested-by: Aravind Retnakaran <aravind.retnakaran@nutanix.com>
Signed-off-by: Het Gala <het.gala@nutanix.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231023182053.8711-13-farosas@suse.de>
migration_channels_and_uri_compatible() check for transport mechanism
suitable for multifd migration gets executed when the caller calls old
uri syntax. It needs it to be run when using the modern MigrateChannel
QAPI syntax too.
After URI -> 'MigrateChannel' :
migration_channels_and_uri_compatible() ->
migration_channels_and_transport_compatible() passes object as argument
and check for valid transport mechanism.
Suggested-by: Aravind Retnakaran <aravind.retnakaran@nutanix.com>
Signed-off-by: Het Gala <het.gala@nutanix.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231023182053.8711-12-farosas@suse.de>
MigrateChannelList allows to connect accross multiple interfaces.
Add MigrateChannelList struct as argument to migration QAPIs.
We plan to include multiple channels in future, to connnect
multiple interfaces. Hence, we choose 'MigrateChannelList'
as the new argument over 'MigrateChannel' to make migration
QAPIs future proof.
Suggested-by: Aravind Retnakaran <aravind.retnakaran@nutanix.com>
Signed-off-by: Het Gala <het.gala@nutanix.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231023182053.8711-10-farosas@suse.de>
Convert the file: URI to accept a FileMigrationArgs to be compatible
with the new migration QAPI.
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231023182053.8711-9-farosas@suse.de>
Exec transport backend for 'migrate'/'migrate-incoming' QAPIs accept
new wire protocol of MigrateAddress struct.
It is achived by parsing 'uri' string and storing migration parameters
required for exec connection into strList struct.
Suggested-by: Aravind Retnakaran <aravind.retnakaran@nutanix.com>
Signed-off-by: Het Gala <het.gala@nutanix.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231023182053.8711-8-farosas@suse.de>
RDMA based transport backend for 'migrate'/'migrate-incoming' QAPIs
accept new wire protocol of MigrateAddress struct.
It is achived by parsing 'uri' string and storing migration parameters
required for RDMA connection into well defined InetSocketAddress struct.
Suggested-by: Aravind Retnakaran <aravind.retnakaran@nutanix.com>
Signed-off-by: Het Gala <het.gala@nutanix.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231023182053.8711-7-farosas@suse.de>
Socket transport backend for 'migrate'/'migrate-incoming' QAPIs accept
new wire protocol of MigrateAddress struct.
It is achived by parsing 'uri' string and storing migration parameters
required for socket connection into well defined SocketAddress struct.
Suggested-by: Aravind Retnakaran <aravind.retnakaran@nutanix.com>
Signed-off-by: Het Gala <het.gala@nutanix.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231023182053.8711-6-farosas@suse.de>
This patch parses 'migrate' and 'migrate-incoming' QAPI's 'uri'
string containing migration connection related information
and stores them inside well defined 'MigrateAddress' struct.
Fabiano fixed for "file" transport.
Suggested-by: Aravind Retnakaran <aravind.retnakaran@nutanix.com>
Signed-off-by: Het Gala <het.gala@nutanix.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231023182053.8711-4-farosas@suse.de>
Message-ID: <20231023182053.8711-5-farosas@suse.de>
Now we have a Error** passed into the return path thread stack, which is
even clearer than an int retval. Change ram_dirty_bitmap_reload() and the
callers to use a bool instead to replace errnos.
Suggested-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231017202633.296756-5-peterx@redhat.com>
Normally the postcopy recover phase should only exist for a super short
period, that's the duration when QEMU is trying to recover from an
interrupted postcopy migration, during which handshake will be carried out
for continuing the procedure with state changes from PAUSED -> RECOVER ->
POSTCOPY_ACTIVE again.
Here RECOVER phase should be super small, that happens right after the
admin specified a new but working network link for QEMU to reconnect to
dest QEMU.
However there can still be case where the channel is broken in this small
RECOVER window.
If it happens, with current code there's no way the src QEMU can got kicked
out of RECOVER stage. No way either to retry the recover in another channel
when established.
This patch allows the RECOVER phase to fail itself too - we're mostly
ready, just some small things missing, e.g. properly kick the main
migration thread out when sleeping on rp_sem when we found that we're at
RECOVER stage. When this happens, it fails the RECOVER itself, and
rollback to PAUSED stage. Then the user can retry another round of
recovery.
To make it even stronger, teach QMP command migrate-pause to explicitly
kick src/dst QEMU out when needed, so even if for some reason the migration
thread didn't got kicked out already by a failing rethrn-path thread, the
admin can also kick it out.
This will be an super, super corner case, but still try to cover that.
One can try to test this with two proxy channels for migration:
(a) socat unix-listen:/tmp/src.sock,reuseaddr,fork tcp:localhost:10000
(b) socat tcp-listen:10000,reuseaddr,fork unix:/tmp/dst.sock
So the migration channel will be:
(a) (b)
src -> /tmp/src.sock -> tcp:10000 -> /tmp/dst.sock -> dst
Then to make QEMU hang at RECOVER stage, one can do below:
(1) stop the postcopy using QMP command postcopy-pause
(2) kill the 2nd proxy (b)
(3) try to recover the postcopy using /tmp/src.sock on src
(4) src QEMU will go into RECOVER stage but won't be able to continue
from there, because the channel is actually broken at (b)
Before this patch, step (4) will make src QEMU stuck in RECOVER stage,
without a way to kick the QEMU out or continue the postcopy again. After
this patch, (4) will quickly fail qemu and bounce back to PAUSED stage.
Admin can also kick QEMU from (4) into PAUSED when needed using
migrate-pause when needed.
After bouncing back to PAUSED stage, one can recover again.
Reported-by: Xiaohui Li <xiaohli@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2111332
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231017202633.296756-3-peterx@redhat.com>
rp_state.error was a boolean used to show error happened in return path
thread. That's not only duplicating error reporting (migrate_set_error),
but also not good enough in that we only do error_report() and set it to
true, we never can keep a history of the exact error and show it in
query-migrate.
To make this better, a few things done:
- Use error_setg() rather than error_report() across the whole lifecycle
of return path thread, keeping the error in an Error*.
- With above, no need to have mark_source_rp_bad(), remove it, alongside
with rp_state.error itself.
- Use migrate_set_error() to apply that captured error to the global
migration object when error occured in this thread.
- Do the same when detected qemufile error in source return path
We need to re-export qemu_file_get_error_obj() to do the last one.
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231017202633.296756-2-peterx@redhat.com>
Extend the blocker interface so that a blocker can be registered for
one or more migration modes. The existing interfaces register a
blocker for all modes, and the new interfaces take a varargs list
of modes.
Internally, maintain a separate blocker list per mode. The same Error
object may be added to multiple lists. When a block is deleted, it is
removed from every list, and the Error is freed.
No functional change until a new mode is added.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <1698263069-406971-3-git-send-email-steven.sistare@oracle.com>
Create a mode migration parameter that can be used to select alternate
migration algorithms. The default mode is normal, representing the
current migration algorithm, and does not need to be explicitly set.
No functional change until a new mode is added, except that the mode is
shown by the 'info migrate' command.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <1698263069-406971-2-git-send-email-steven.sistare@oracle.com>
This patch is inspired by Joao Martin's patch here:
https://lore.kernel.org/r/20230926161841.98464-1-joao.m.martins@oracle.com
Add tracepoints for major downtime checkpoints on both src and dst. They
share the same tracepoint with a string showing its stage.
Besides the checkpoints in the previous patch, this patch also added
destination checkpoints.
On src, we have these checkpoints added:
- src-downtime-start: right before vm stops on src
- src-vm-stopped: after vm is fully stopped
- src-iterable-saved: after all iterables saved (END sections)
- src-non-iterable-saved: after all non-iterable saved (FULL sections)
- src-downtime-stop: migration fully completed
On dst, we have these checkpoints added:
- dst-precopy-loadvm-completes: after loadvm all done for precopy
- dst-precopy-bh-*: record BH steps to resume VM for precopy
- dst-postcopy-bh-*: record BH steps to resume VM for postcopy
On dst side, we don't have a good way to trace total time consumed by
iterable or non-iterable for now. We can mark it by 1st time receiving a
FULL / END section, but rather than that let's just rely on the other
tracepoints added for vmstates to back up the information.
With this patch, one can enable "vmstate_downtime*" tracepoints and it'll
enable all tracepoints for downtime measurements necessary.
Drop loadvm_postcopy_handle_run_bh() tracepoint alongside, because they
service the same purpose, which was only for postcopy. We then have
unified prefix for all downtime relevant tracepoints.
Co-developed-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231030163346.765724-6-peterx@redhat.com>
Provide a helper for non-COLO use case of migration to stop a VM. This
prepares for adding some downtime relevant tracepoints to migration, where
they may or may not apply to COLO.
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231030163346.765724-5-peterx@redhat.com>
We have a bunch of savevm_section* tracepoints, they're good to analyze
migration stream, but not always suitable if someone would like to analyze
the migration downtime. Two major problems:
- savevm_section* tracepoints are dumping all sections, we only care
about the sections that contribute to the downtime
- They don't have an identifier to show the type of sections, so no way
to filter downtime information either easily.
We can add type into the tracepoints, but instead of doing so, this patch
kept them untouched, instead of adding a bunch of downtime specific
tracepoints, so one can enable "vmstate_downtime*" tracepoints and get a
full picture of how the downtime is distributed across iterative and
non-iterative vmstate save/load.
Note that here both save() and load() need to be traced, because both of
them may contribute to the downtime. The contribution is not a simple "add
them together", though: consider when the src is doing a save() of device1
while the dest can be load()ing for device2, so they can happen
concurrently.
Tracking both sides make sense because device load() and save() can be
imbalanced, one device can save() super fast, but load() super slow, vice
versa. We can't figure that out without tracing both.
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231030163346.765724-4-peterx@redhat.com>
Unify the three users on recording downtimes with the same pair of helpers.
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231030163346.765724-3-peterx@redhat.com>
Postcopy calculates its downtime separately. It always sets
MigrationState.downtime properly, but not MigrationState.downtime_start.
Make postcopy do the same as other modes on properly recording the
timestamp when the VM is going to be stopped. Drop the temporary variable
in postcopy_start() along the way.
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231030163346.765724-2-peterx@redhat.com>
Before finally register one SaveStateEntry, we detect for duplicated
entries. This could be helpful to notify us asap instead of get
silent migration failures which could be hard to diagnose.
For example, this patch will generate a message like this (if without
previous fixes on x2apic) as long as we wants to boot a VM instance
with "-smp 200,maxcpus=288,sockets=2,cores=72,threads=2" and QEMU will
bail out even before VM starts:
savevm_state_handler_insert: Detected duplicate SaveStateEntry: id=apic, instance_id=0x0
Suggested-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231020090731.28701-10-quintela@redhat.com>
Current code does:
- register pre_2_10_vmstate_dummy_icp with "icp/server" and instance
dependinfg on cpu number
- for newer machines, it register vmstate_icp with "icp/server" name
and instance 0
- now it unregisters "icp/server" for the 1st instance.
This is wrong at many levels:
- we shouldn't have two VMSTATEDescriptions with the same name
- In case this is the only solution that we can came with, it needs to
be:
* register pre_2_10_vmstate_dummy_icp
* unregister pre_2_10_vmstate_dummy_icp
* register real vmstate_icp
Created vmstate_replace_hack_for_ppc() with warnings left and right
that it is a hack.
CC: Cedric Le Goater <clg@kaod.org>
CC: Daniel Henrique Barboza <danielhb413@gmail.com>
CC: David Gibson <david@gibson.dropbear.id.au>
CC: Greg Kurz <groug@kaod.org>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231020090731.28701-8-quintela@redhat.com>
This let us simplify code of this shape.
qemu_fflush(f);
int ret = qemu_file_get_error(f);
if (ret) {
return ret;
}
into:
int ret = qemu_fflush(f);
if (ret) {
return ret;
}
I updated all callers where there is any error check.
qemu_fclose() don't need to check for f->last_error because
qemu_fflush() returns it at the beggining of the function.
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231025091117.6342-13-quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
After last commit, it is a write only variable.
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231025091117.6342-12-quintela@redhat.com>
There are only two differnces with the old value:
- the amount of QEMUFile that hasn't yet been flushed. It can be
discussed what is more exact, the new or the old one.
- the amount of transferred bytes that we forgot to account for (the
newer is better, i.e. exact).
Notice that this two values are used to:
a - present to the user
b - calculate the rate_limit
So a few KB here and there is not going to make a difference.
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231025091117.6342-11-quintela@redhat.com>
If we pass a NULL error is the same that returning directly the value.
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231025091117.6342-10-quintela@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231025091117.6342-9-quintela@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231025091117.6342-8-quintela@redhat.com>
qemu_file_transferred() don't exist anymore, so we can reuse the name.
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231025091117.6342-7-quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
We only use migration_transferred_bytes() to calculate the rate_limit,
for that we don't need to flush whatever is on the qemu_file buffer.
Remember that the buffer is really small (normal case is 32K if we use
iov's can be 64 * TARGET_PAGE_SIZE), so this is not relevant to
calculations.
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231025091117.6342-5-quintela@redhat.com>
This way we can read it from any thread.
I checked that it gives the same value as the current one. We never
use two qemu_files at the same time.
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231025091117.6342-3-quintela@redhat.com>
We only call qemu_file_transferred_* on the sending side. Remove the
increment at qemu_file_fill_buffer() and add asserts to
qemu_file_transferred* functions.
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231025091117.6342-2-quintela@redhat.com>
In multiple places, RDMA errors are handled in a strange way, where it only
sets qemu_file_set_error() but not stop the migration immediately.
It's not obvious what will happen later if there is already an error. Make
all such failures stop migration immediately.
Cc: Zhijian Li (Fujitsu) <lizhijian@fujitsu.com>
Cc: Markus Armbruster <armbru@redhat.com>
Cc: Juan Quintela <quintela@redhat.com>
Cc: Fabiano Rosas <farosas@suse.de>
Reported-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231024163933.516546-1-peterx@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231018115513.2163-6-quintela@redhat.com>
It is obsolete. It is better to use driver-mirror with NBD instead.
CC: Kevin Wolf <kwolf@redhat.com>
CC: Eric Blake <eblake@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Hanna Czenczek <hreitz@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231018115513.2163-5-quintela@redhat.com>
Use blocked-mirror with NBD instead.
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231018115513.2163-4-quintela@redhat.com>
Use blockdev-mirror with NBD instead.
Reviewed-by: Thomas Huth <thuth@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231018115513.2163-3-quintela@redhat.com>
commit 13cde50889 ("vmstate: Return error in case of error") sets
QemuFile error to stop reading from it and report to the caller (checked
by unit tests). We should do the same on subsection loading error.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231024084043.2926316-8-marcandre.lureau@redhat.com>
The function is used on save at this point. The following commits will
use it on load.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231024084043.2926316-5-marcandre.lureau@redhat.com>
Rename the variable here to avoid that it shadows a variable from
the beginning of the function scope. With this change the code now
successfully compiles with -Wshadow=local.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231024092220.55305-1-thuth@redhat.com>
We are moving to have all functions exported from ram-compress.c to
start with compress_.
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231019110724.15324-12-quintela@redhat.com>
As we export it, rename it compress_flush_data().
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231019110724.15324-10-quintela@redhat.com>
This function is only used for compression. So we rename it as
compress_send_queued_data(). We put it on ram-compress.h because we
are moving it later to ram-compress.c.
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231019110724.15324-9-quintela@redhat.com>
So we can move more compression_counters stuff to ram-compress.c.
Create compression_counters struct to add the stuff that was on
MigrationState.
Reviewed-by: Lukas Straub <lukasstraub2@web.de>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231019110724.15324-8-quintela@redhat.com>
And now we can simplify save_compress_page().
Reviewed-by: Lukas Straub <lukasstraub2@web.de>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231019110724.15324-7-quintela@redhat.com>
Move the goto to a while true.
Reviewed-by: Lukas Straub <lukasstraub2@web.de>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231019110724.15324-6-quintela@redhat.com>
After previous patch, we disable the posiblity that we use compression
together with xbzrle. So we can use directly migrate_compress().
Once there, now we don't need the rs parameter, so remove it.
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231019110724.15324-4-quintela@redhat.com>
Now that we know it only handles zero, we can remove the ch parameter.
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231019085259.13307-3-quintela@redhat.com>
We don't allow non zero compressed pages since:
commit 3edcd7e6eb
Author: Peter Lieven <pl@kamp.de>
Date: Tue Mar 26 10:58:35 2013 +0100
migration: search for zero instead of dup pages
RDMA case is a bit more complicated, but they don't handle it since:
commit a1febc4950
Author: Richard Henderson <rth@twiddle.net>
Date: Mon Aug 29 11:46:14 2016 -0700
cutils: Export only buffer_is_zero
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231019085259.13307-2-quintela@redhat.com>
We don't need to check p->quit in the multifd_send_thread() because it
is shadowed by the 'exiting' flag. Ever since that flag was added
p->quit became obsolete as a way to stop the thread.
Since p->quit is set at multifd_send_terminate_threads() under the
p->mutex lock, the thread will only see it once it loops, so 'exiting'
will always be seen first.
Note that setting p->quit at multifd_send_terminate_threads() still
makes sense because we need a way to inform multifd_send_pages() that
the channel has stopped.
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231012140651.13122-3-farosas@suse.de>
Pass the callback function to add_migration_state_change_notifier so
that migration can initialize the notifier on add and clear it on
delete, which simplifies the call sites. Shorten the function names
so the extra arg can be added more legibly. Hide the global notifier
list in a new function migration_call_notifiers, and make it externally
visible so future live update code can call it.
No functional change.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Tested-by: Michael Galaxy <mgalaxy@akamai.com>
Reviewed-by: Michael Galaxy <mgalaxy@akamai.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <1686148954-250144-1-git-send-email-steven.sistare@oracle.com>
It's possible that some errors can be overwritten with success retval later
on, and then ignored. Always capture all errors and report.
Reported by Coverity 1522861, but actually I spot one more in the same
function.
Fixes: CID 1522861
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231017203855.298260-1-peterx@redhat.com>
Modify migrate_add_blocker and migrate_del_blocker to take an Error **
reason. This allows migration to own the Error object, so that if
an error occurs in migrate_add_blocker, migration code can free the Error
and clear the client handle, simplifying client code. It also simplifies
the migrate_del_blocker call site.
In addition, this is a pre-requisite for a proposed future patch that would
add a mode argument to migration requests to support live update, and
maintain a list of blockers for each mode. A blocker may apply to a single
mode or to multiple modes, and passing Error** will allow one Error object
to be registered for multiple modes.
No functional change.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Tested-by: Michael Galaxy <mgalaxy@akamai.com>
Reviewed-by: Michael Galaxy <mgalaxy@akamai.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <1697634216-84215-1-git-send-email-steven.sistare@oracle.com>
The new line was only printed when command options were used. When we
used migration parameters and capabilities, it wasn't.
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231017172307.22858-2-quintela@redhat.com>
It is used everywhere else in C. Once there, make sure that we don't
use the index outside of the for declaring the variable there.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Lukas Straub <lukasstraub2@web.de>
Message-ID: <20230613145757.10131-15-quintela@redhat.com>
Doing a break to do another break is just confused. Just call return
when we know we want to return.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Lukas Straub <lukasstraub2@web.de>
Message-ID: <20230613145757.10131-14-quintela@redhat.com>
So we don't have to access compression_counters from outside
ram-compress.c.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Lukas Straub <lukasstraub2@web.de>
Message-ID: <20230613145757.10131-7-quintela@redhat.com>
So give an error instead of just ignoring the other methods.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Lukas Straub <lukasstraub2@web.de>
Message-ID: <20230613145757.10131-4-quintela@redhat.com>
The function is currently called from two sites, one always gives it a
NULL Error and the other always gives it a non-NULL Error.
In the non-NULL case, all it does it trace the error and return. One
of the callers already have tracing, add a tracepoint to the other and
stop passing the error into the function.
Cc: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231012134343.23757-4-farosas@suse.de>
The preferred usage of the Error type is to always set both the return
code and the error when a failure happens. As all code called from the
send thread follows this pattern, we'll always have the return code
and the error set at the same time.
Aside from the convention, in this piece of code this must be the
case, otherwise the if (ret != 0) would be exiting the thread without
calling multifd_send_terminate_threads() which is incorrect.
Unify both paths to make it clear that both are taken when there's an
error.
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231012134343.23757-3-farosas@suse.de>
We're about to enable support for other transports in multifd, so
remove direct references to sockets.
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231012134343.23757-2-farosas@suse.de>
We don't need to do this in two pieces. One single function makes it
easier to grasp, specially since it removes the indirection on the
return value handling.
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231011184604.32364-7-farosas@suse.de>
It makes a bit more sense to have the zero page handling of xbzrle
right where we save the zero page.
Also invert the exit condition to remove one level of indentation
which makes the next patch easier to grasp.
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231011184604.32364-6-farosas@suse.de>
We don't need the QEMUFile when we're already passing the
PageSearchStatus.
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231011184604.32364-5-farosas@suse.de>
'rs' is not used in that function. It's a leftover from commit
9360447d34 ("ram: Use MigrationStats for statistics").
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231011184604.32364-4-farosas@suse.de>
Extract the ramblock parsing code into a routine that operates on the
sequence of headers from the stream and another the parses the
individual ramblock. This makes ram_load_precopy() easier to
comprehend.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231011184604.32364-3-farosas@suse.de>
Sometimes multifd sends just sync packet with no pages
(normal_num is 0). In this case the old value is being
preserved and being accounted for while only packet_len
is being transferred.
Reset it to 0 after sending and accounting for.
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231011184358.97349-5-elena.ufimtseva@oracle.com>
Previous commit cbec7eb768
"migration/multifd: Compute transferred bytes correctly"
removed accounting for packet_len in non-rdma
case, but the next_packet_size only accounts for pages, not for
the header packet (normal_pages * PAGE_SIZE) that is being sent
as iov[0]. The packet_len part should be added to account for
the size of MultiFDPacket and the array of the offsets.
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231011184358.97349-4-elena.ufimtseva@oracle.com>
In migration rate limiting atomic operations are used
to read the rate limit variables and transferred bytes and
they are expensive. Check first if rate_limit_max is equal
to RATE_LIMIT_DISABLED and return false immediately if so.
Note that with this patch we will also will stop flushing
by not calling qemu_fflush() from migration_transferred_bytes()
if the migration rate is not exceeded.
This should be fine since migration thread calls in the loop
migration_update_counters from migration_rate_limit() that
calls the migration_transferred_bytes() and flushes there.
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231011184358.97349-2-elena.ufimtseva@oracle.com>
Change code that is:
int ret;
...
ret = foo();
if (ret[ < 0]?) {
to:
if (foo()[ < 0]) {
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231011203527.9061-14-quintela@redhat.com>
Declare all variables that are only used inside a for loop inside the
for statement.
This makes clear that they are not used outside of the for loop.
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231011203527.9061-13-quintela@redhat.com>
Once there, all the uses are local to the for, so declare the variable
inside the for statement.
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231011203527.9061-12-quintela@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231011203527.9061-11-quintela@redhat.com>
Functions are long enough even without this.
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231011203527.9061-10-quintela@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231011203527.9061-9-quintela@redhat.com>
The only user was rdma, and its use is gone.
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231011203527.9061-8-quintela@redhat.com>
The only user of ram_control_save_page() and save_page() hook was
rdma. Just move the function to rdma.c, rename it to
rdma_control_save_page().
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231011203527.9061-7-quintela@redhat.com>
There is only one flag called with: RAM_CONTROL_BLOCK_REG.
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231011203527.9061-6-quintela@redhat.com>
Instead of going through ram_control_load_hook(), call
qemu_rdma_registration_handle() directly.
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231011203527.9061-5-quintela@redhat.com>
Once there:
- Remove unused data parameter
- unfold it in its callers
- change all callers to call qemu_rdma_registration_stop()
- We need to call QIO_CHANNEL_RDMA() after we check for migrate_rdma()
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231011203527.9061-4-quintela@redhat.com>
Once there:
- Remove unused data parameter
- unfold it in its callers.
- change all callers to call qemu_rdma_registration_start()
- We need to call QIO_CHANNEL_RDMA() after we check for migrate_rdma()
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231011203527.9061-3-quintela@redhat.com>
Helper to say if we are doing a migration over rdma.
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231011203527.9061-2-quintela@redhat.com>
RDMA was having trouble because
migrate_multifd_flush_after_each_section() can only be true or false,
but we don't want to send any flush when we are not in multifd
migration.
CC: Fabiano Rosas <farosas@suse.de
Fixes: 294e5a4034 ("multifd: Only flush once each full round of memory")
Reported-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231011205548.10571-2-quintela@redhat.com>
This is intended to be a semantic revert of commit 9b09503752
("migration: run setup callbacks out of big lock"). There have been so
many changes since that commit (e.g. a new setup callback
dirty_bitmap_save_setup() that also needs to be adapted now), it's
easier to do the revert manually.
For snapshots, the bdrv_writev_vmstate() function is used during setup
(in QIOChannelBlock backing the QEMUFile), but not holding the BQL
while calling it could lead to an assertion failure. To understand
how, first note the following:
1. Generated coroutine wrappers for block layer functions spawn the
coroutine and use AIO_WAIT_WHILE()/aio_poll() to wait for it.
2. If the host OS switches threads at an inconvenient time, it can
happen that a bottom half scheduled for the main thread's AioContext
is executed as part of a vCPU thread's aio_poll().
An example leading to the assertion failure is as follows:
main thread:
1. A snapshot-save QMP command gets issued.
2. snapshot_save_job_bh() is scheduled.
vCPU thread:
3. aio_poll() for the main thread's AioContext is called (e.g. when
the guest writes to a pflash device, as part of blk_pwrite which is a
generated coroutine wrapper).
4. snapshot_save_job_bh() is executed as part of aio_poll().
3. qemu_savevm_state() is called.
4. qemu_mutex_unlock_iothread() is called. Now
qemu_get_current_aio_context() returns 0x0.
5. bdrv_writev_vmstate() is executed during the usual savevm setup
via qemu_fflush(). But this function is a generated coroutine wrapper,
so it uses AIO_WAIT_WHILE. There, the assertion
assert(qemu_get_current_aio_context() == qemu_get_aio_context());
will fail.
To fix it, ensure that the BQL is held during setup. While it would
only be needed for snapshots, adapting migration too avoids additional
logic for conditional locking/unlocking in the setup callbacks.
Writing the header could (in theory) also trigger qemu_fflush() and
thus bdrv_writev_vmstate(), so the locked section also covers the
qemu_savevm_state_header() call, even for migration for consistency.
The section around multifd_send_sync_main() needs to be unlocked to
avoid a deadlock. In particular, the multifd_save_setup() function calls
socket_send_channel_create() using multifd_new_send_channel_async() as a
callback and then waits for the callback to signal via the
channels_ready semaphore. The connection happens via
qio_task_run_in_thread(), but the callback is only executed via
qio_task_thread_result() which is scheduled for the main event loop.
Without unlocking the section, the main thread would never get to
process the task result and the callback meaning there would be no
signal via the channels_ready semaphore.
The comment in ram_init_bitmaps() was introduced by 4987783400
("migration: fix incorrect memory_global_dirty_log_start outside BQL")
and is removed, because it referred to the qemu_mutex_lock_iothread()
call.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231013105839.415989-1-f.ebner@proxmox.com>
Make the migration json writer part of MigrationState struct, allowing
the 'configuration' object be serialized to json.
This will facilitate the parsing of the 'configuration' object in the
next patch that fixes analyze-migration.py for arm.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231009184326.15777-2-farosas@suse.de>
qemu_ram_block_from_host() may return NULL, which will be dereferenced w/o
check. Usualy return value is checked for this function.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Signed-off-by: Dmitry Frolov <frolov@swemel.ru>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231010104851.802947-1-frolov@swemel.ru>
Migration bandwidth is a very important value to live migration. It's
because it's one of the major factors that we'll make decision on when to
switchover to destination in a precopy process.
This value is currently estimated by QEMU during the whole live migration
process by monitoring how fast we were sending the data. This can be the
most accurate bandwidth if in the ideal world, where we're always feeding
unlimited data to the migration channel, and then it'll be limited to the
bandwidth that is available.
However in reality it may be very different, e.g., over a 10Gbps network we
can see query-migrate showing migration bandwidth of only a few tens of
MB/s just because there are plenty of other things the migration thread
might be doing. For example, the migration thread can be busy scanning
zero pages, or it can be fetching dirty bitmap from other external dirty
sources (like vhost or KVM). It means we may not be pushing data as much
as possible to migration channel, so the bandwidth estimated from "how many
data we sent in the channel" can be dramatically inaccurate sometimes.
With that, the decision to switchover will be affected, by assuming that we
may not be able to switchover at all with such a low bandwidth, but in
reality we can.
The migration may not even converge at all with the downtime specified,
with that wrong estimation of bandwidth, keeping iterations forever with a
low estimation of bandwidth.
The issue is QEMU itself may not be able to avoid those uncertainties on
measuing the real "available migration bandwidth". At least not something
I can think of so far.
One way to fix this is when the user is fully aware of the available
bandwidth, then we can allow the user to help providing an accurate value.
For example, if the user has a dedicated channel of 10Gbps for migration
for this specific VM, the user can specify this bandwidth so QEMU can
always do the calculation based on this fact, trusting the user as long as
specified. It may not be the exact bandwidth when switching over (in which
case qemu will push migration data as fast as possible), but much better
than QEMU trying to wildly guess, especially when very wrong.
A new parameter "avail-switchover-bandwidth" is introduced just for this.
So when the user specified this parameter, instead of trusting the
estimated value from QEMU itself (based on the QEMUFile send speed), it
trusts the user more by using this value to decide when to switchover,
assuming that we'll have such bandwidth available then.
Note that specifying this value will not throttle the bandwidth for
switchover yet, so QEMU will always use the full bandwidth possible for
sending switchover data, assuming that should always be the most important
way to use the network at that time.
This can resolve issues like "unconvergence migration" which is caused by
hilarious low "migration bandwidth" detected for whatever reason.
Reported-by: Zhiyi Guo <zhguo@redhat.com>
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231010221922.40638-1-peterx@redhat.com>
Current migration_completion function is a bit long. Refactor the long
implementation into different subfunctions:
- migration_completion_precopy: completion code related to precopy
- migration_completion_postcopy: completion code related to postcopy
Rename await_return_path_close_on_source to
close_return_path_on_source: It is renamed to match with
open_return_path_on_source.
This improves readability and is easier for future updates (e.g. add new
subfunctions when completion code related to new features are needed). No
functional changes intended.
Signed-off-by: Wei Wang <wei.w.wang@intel.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Isaku Yamahata <isaku.yamahata@intel.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230804093053.5037-1-wei.w.wang@intel.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
This adds GRAPH_RDLOCK annotations to declare that callers of
bdrv_first_blk() and bdrv_is_root_node() need to hold a reader lock
for the graph. These functions are the only functions in block-backend.c
that access the parent list of a node.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20230929145157.45443-5-kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
It's just a simple wrapper for rp_sem on either wait() or kick(), make it
even clearer on how it is used. Prepared to be used even for other things.
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-ID: <20231004220240.167175-8-peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Instead of only relying on the count of rp_sem, make the counter be part of
RAMState so it can be used in both threads to synchronize on the process.
rp_sem will be further reused in follow up patches, as a way to kick the
main thread, e.g., on recovery failures.
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231004220240.167175-7-peterx@redhat.com>
There're a lot of cases where we only have an errno set in last_error but
without a detailed error description. When this happens, try to generate
an error contains the errno as a descriptive error.
This will be helpful in cases where one relies on the Error*. E.g.,
migration state only caches Error* in MigrationState.error. With this,
we'll display correct error messages in e.g. query-migrate when the error
was only set by qemu_file_set_error().
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231004220240.167175-6-peterx@redhat.com>
Introduce a helper to detect whether MigrationState.error is set for
whatever reason.
This is preparation work for any thread (e.g. source return path thread) to
setup errors in an unified way to MigrationState, rather than relying on
its own way to set errors (mark_source_rp_bad()).
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231004220240.167175-3-peterx@redhat.com>
Display it as long as being set, irrelevant of FAILED status. E.g., it may
also be applicable to PAUSED stage of postcopy, to provide hint on what has
gone wrong.
The error_mutex seems to be overlooked when referencing the error, add it
to be very safe.
This will change QAPI behavior by showing up error message outside !FAILED
status, but it's intended and doesn't expect to break anyone.
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2018404
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231004220240.167175-2-peterx@redhat.com>
qemu_rdma_dump_id() dumps RDMA device details to stdout.
rdma_start_outgoing_migration() calls it via qemu_rdma_source_init()
and qemu_rdma_resolve_host() to show source device details.
rdma_start_incoming_migration() arranges its call via
rdma_accept_incoming_migration() and qemu_rdma_accept() to show
destination device details.
Two issues:
1. rdma_start_outgoing_migration() can run in HMP context. The
information should arguably go the monitor, not stdout.
2. ibv_query_port() failure is reported as error. Its callers remain
unaware of this failure (qemu_rdma_dump_id() can't fail), so
reporting this to the user as an error is problematic.
Fixable, but the device detail dump is noise, except when
troubleshooting. Tracing is a better fit. Similar function
qemu_rdma_dump_id() was converted to tracing in commit
733252deb8 (Tracify migration/rdma.c).
Convert qemu_rdma_dump_id(), too.
While there, touch up qemu_rdma_dump_gid()'s outdated comment.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-54-armbru@redhat.com>
error_report() obeys -msg, reports the current error location if any,
and reports to the current monitor if any. Reporting to stderr
directly with fprintf() or perror() is wrong, because it loses all
this.
Fix the offenders. Bonus: resolves a FIXME about problematic use of
errno.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-53-armbru@redhat.com>
Functions that use an Error **errp parameter to return errors should
not also report them to the user, because reporting is the caller's
job. When the caller does, the error is reported twice. When it
doesn't (because it recovered from the error), there is no error to
report, i.e. the report is bogus.
qemu_rdma_source_init(), qemu_rdma_connect(),
rdma_start_incoming_migration(), and rdma_start_outgoing_migration()
violate this principle: they call error_report() via
qemu_rdma_cleanup().
Moreover, qemu_rdma_cleanup() can't fail. It is called on error
paths, and QIOChannel close and finalization. Are the conditions it
reports really errors? I doubt it.
Downgrade qemu_rdma_cleanup()'s errors to warnings.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-52-armbru@redhat.com>
Functions that use an Error **errp parameter to return errors should
not also report them to the user, because reporting is the caller's
job. When the caller does, the error is reported twice. When it
doesn't (because it recovered from the error), there is no error to
report, i.e. the report is bogus.
qemu_rdma_write_one() violates this principle: it reports errors to
stderr via qemu_rdma_register_and_get_keys(). I elected not to
investigate how callers handle the error, i.e. precise impact is not
known.
Clean this up: silence qemu_rdma_register_and_get_keys(). I believe
the caller's error reports suffice. If they don't, we need to convert
to Error instead.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-51-armbru@redhat.com>
Functions that use an Error **errp parameter to return errors should
not also report them to the user, because reporting is the caller's
job. When the caller does, the error is reported twice. When it
doesn't (because it recovered from the error), there is no error to
report, i.e. the report is bogus.
qemu_rdma_post_send_control(), qemu_rdma_exchange_get_response(), and
qemu_rdma_write_one() violate this principle: they call
error_report(), fprintf(stderr, ...), and perror() via
qemu_rdma_block_for_wrid(), qemu_rdma_poll(), and
qemu_rdma_wait_comp_channel(). I elected not to investigate how
callers handle the error, i.e. precise impact is not known.
Clean this up by dropping the error reporting from qemu_rdma_poll(),
qemu_rdma_wait_comp_channel(), and qemu_rdma_block_for_wrid(). I
believe the callers' error reports suffice. If they don't, we need to
convert to Error instead.
Bonus: resolves a FIXME about problematic use of errno.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-50-armbru@redhat.com>
When qemu_rdma_wait_comp_channel() receives an event from the
completion channel, it reports an error "receive cm event while wait
comp channel,cm event is T", where T is the numeric event type.
However, the function fails only when T is a disconnect or device
removal. Events other than these two are not actually an error, and
reporting them as an error is wrong. If we need to report them to the
user, we should use something else, and what to use depends on why we
need to report them to the user.
For now, report this error only when the function actually fails.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-49-armbru@redhat.com>
Functions that use an Error **errp parameter to return errors should
not also report them to the user, because reporting is the caller's
job. When the caller does, the error is reported twice. When it
doesn't (because it recovered from the error), there is no error to
report, i.e. the report is bogus.
qemu_rdma_source_init() and qemu_rdma_accept() violate this principle:
they call error_report() via qemu_rdma_reg_control(). I elected not
to investigate how callers handle the error, i.e. precise impact is
not known.
Clean this up by dropping the error reporting from
qemu_rdma_reg_control(). I believe the callers' error reports
suffice. If they don't, we need to convert to Error instead.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-48-armbru@redhat.com>
Functions that use an Error **errp parameter to return errors should
not also report them to the user, because reporting is the caller's
job. When the caller does, the error is reported twice. When it
doesn't (because it recovered from the error), there is no error to
report, i.e. the report is bogus.
qemu_rdma_connect() violates this principle: it calls error_report()
and perror(). I elected not to investigate how callers handle the
error, i.e. precise impact is not known.
Clean this up: replace perror() by changing error_setg() to
error_setg_errno(), and drop error_report(). I believe the callers'
error reports suffice then. If they don't, we need to convert to
Error instead.
Bonus: resolves a FIXME about problematic use of errno.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-47-armbru@redhat.com>
Functions that use an Error **errp parameter to return errors should
not also report them to the user, because reporting is the caller's
job. When the caller does, the error is reported twice. When it
doesn't (because it recovered from the error), there is no error to
report, i.e. the report is bogus.
qemu_rdma_resolve_host() violates this principle: it calls
error_report().
Clean this up: drop error_report().
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-46-armbru@redhat.com>
Functions that use an Error **errp parameter to return errors should
not also report them to the user, because reporting is the caller's
job. When the caller does, the error is reported twice. When it
doesn't (because it recovered from the error), there is no error to
report, i.e. the report is bogus.
qemu_rdma_source_init() violates this principle: it calls
error_report() via qemu_rdma_alloc_pd_cq(). I elected not to
investigate how callers handle the error, i.e. precise impact is not
known.
Clean this up by converting qemu_rdma_alloc_pd_cq() to Error.
The conversion loses a piece of advice on one of two failure paths:
Your mlock() limits may be too low. Please check $ ulimit -a # and search for 'ulimit -l' in the output
Not worth retaining.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-45-armbru@redhat.com>
Just for symmetry with qemu_rdma_post_send_control(). Error messages
lose detail I consider of no use to users.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-44-armbru@redhat.com>
Functions that use an Error **errp parameter to return errors should
not also report them to the user, because reporting is the caller's
job. When the caller does, the error is reported twice. When it
doesn't (because it recovered from the error), there is no error to
report, i.e. the report is bogus.
qemu_rdma_exchange_send() violates this principle: it calls
error_report() via qemu_rdma_post_send_control(). I elected not to
investigate how callers handle the error, i.e. precise impact is not
known.
Clean this up by converting qemu_rdma_post_send_control() to Error.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-43-armbru@redhat.com>
Just for consistency with qemu_rdma_write_one() and
qemu_rdma_write_flush(), and for slightly simpler code.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-42-armbru@redhat.com>
Functions that use an Error **errp parameter to return errors should
not also report them to the user, because reporting is the caller's
job. When the caller does, the error is reported twice. When it
doesn't (because it recovered from the error), there is no error to
report, i.e. the report is bogus.
qemu_rdma_write_flush() violates this principle: it calls
error_report() via qemu_rdma_write_one(). I elected not to
investigate how callers handle the error, i.e. precise impact is not
known.
Clean this up by converting qemu_rdma_write_one() to Error. Bonus:
resolves a FIXME about problematic use of errno.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-41-armbru@redhat.com>
Functions that use an Error **errp parameter to return errors should
not also report them to the user, because reporting is the caller's
job. When the caller does, the error is reported twice. When it
doesn't (because it recovered from the error), there is no error to
report, i.e. the report is bogus.
qio_channel_rdma_writev() violates this principle: it calls
error_report() via qemu_rdma_write_flush(). I elected not to
investigate how callers handle the error, i.e. precise impact is not
known.
Clean this up by converting qemu_rdma_write_flush() to Error.
Necessitates setting an error when qemu_rdma_write_one() failed.
Since this error will go away later in this series, simply use "FIXME
temporary error message" there.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-40-armbru@redhat.com>
Functions that use an Error **errp parameter to return errors should
not also report them to the user, because reporting is the caller's
job. When the caller does, the error is reported twice. When it
doesn't (because it recovered from the error), there is no error to
report, i.e. the report is bogus.
qemu_rdma_exchange_send() violates this principle: it calls
error_report() via callback qemu_rdma_reg_whole_ram_blocks(). I
elected not to investigate how callers handle the error, i.e. precise
impact is not known.
Clean this up by converting the callback to Error.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-39-armbru@redhat.com>
Functions that use an Error **errp parameter to return errors should
not also report them to the user, because reporting is the caller's
job. When the caller does, the error is reported twice. When it
doesn't (because it recovered from the error), there is no error to
report, i.e. the report is bogus.
qemu_rdma_exchange_send() and qemu_rdma_exchange_recv() violate this
principle: they call error_report() via
qemu_rdma_exchange_get_response(). I elected not to investigate how
callers handle the error, i.e. precise impact is not known.
Clean this up by converting qemu_rdma_exchange_get_response() to
Error.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-38-armbru@redhat.com>
Functions that use an Error **errp parameter to return errors should
not also report them to the user, because reporting is the caller's
job. When the caller does, the error is reported twice. When it
doesn't (because it recovered from the error), there is no error to
report, i.e. the report is bogus.
qio_channel_rdma_writev() violates this principle: it calls
error_report() via qemu_rdma_exchange_send(). I elected not to
investigate how callers handle the error, i.e. precise impact is not
known.
Clean this up by converting qemu_rdma_exchange_send() to Error.
Necessitates setting an error when qemu_rdma_post_recv_control(),
callback(), or qemu_rdma_exchange_get_response() failed. Since these
errors will go away later in this series, simply use "FIXME temporary
error message" there.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-37-armbru@redhat.com>
Functions that use an Error **errp parameter to return errors should
not also report them to the user, because reporting is the caller's
job. When the caller does, the error is reported twice. When it
doesn't (because it recovered from the error), there is no error to
report, i.e. the report is bogus.
qio_channel_rdma_readv() violates this principle: it calls
error_report() via qemu_rdma_exchange_recv(). I elected not to
investigate how callers handle the error, i.e. precise impact is not
known.
Clean this up by converting qemu_rdma_exchange_recv() to Error.
Necessitates setting an error when qemu_rdma_exchange_get_response()
failed. Since this error will go away later in this series, simply
use "FIXME temporary error message" there.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-36-armbru@redhat.com>
These guards are all redundant now.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-35-armbru@redhat.com>
qemu_rdma_resolve_host() and qemu_rdma_dest_init() iterate over
addresses to find one that works, holding onto the first Error from
qemu_rdma_broken_ipv6_kernel() for use when no address works. Issues:
1. If @errp was &error_abort or &error_fatal, we'd terminate instead
of trying the next address. Can't actually happen, since no caller
passes these arguments.
2. When @errp is a pointer to a variable containing NULL, and
qemu_rdma_broken_ipv6_kernel() fails, the variable no longer
contains NULL. Subsequent iterations pass it again, violating
Error usage rules. Dangerous, as setting an error would then trip
error_setv()'s assertion. Works only because
qemu_rdma_broken_ipv6_kernel() and the code following the loops
carefully avoids setting a second error.
3. If qemu_rdma_broken_ipv6_kernel() fails, and then a later iteration
finds a working address, @errp still holds the first error from
qemu_rdma_broken_ipv6_kernel(). If we then run into another error,
we report the qemu_rdma_broken_ipv6_kernel() failure instead.
4. If we don't run into another error, we leak the Error object.
Use a local error variable, and propagate to @errp. This fixes 3. and
also cleans up 1 and partly 2.
Free this error when we have a working address. This fixes 4.
Pass the local error variable to qemu_rdma_broken_ipv6_kernel() only
until it fails. Pass null on any later iterations. This cleans up
the remainder of 2.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-34-armbru@redhat.com>
ERROR() has become "error_setg() unless an error has been set
already". Hiding the conditional in the macro is in the way of
further work. Replace the macro uses by their expansion, and delete
the macro.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-33-armbru@redhat.com>
Functions that use an Error **errp parameter to return errors should
not also report them to the user, because reporting is the caller's
job. When the caller does, the error is reported twice. When it
doesn't (because it recovered from the error), there is no error to
report, i.e. the report is bogus.
Macro ERROR() violates this principle. Delete the error_report()
there.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Tested-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-32-armbru@redhat.com>
When migration capability @rdma-pin-all is true, but the server cannot
honor it, qemu_rdma_connect() calls macro ERROR(), then returns
success.
ERROR() sets an error. Since qemu_rdma_connect() returns success, its
caller rdma_start_outgoing_migration() duly assumes @errp is still
clear. The Error object leaks.
ERROR() additionally reports the situation to the user as an error:
RDMA ERROR: Server cannot support pinning all memory. Will register memory dynamically.
Is this an error or not? It actually isn't; we disable @rdma-pin-all
and carry on. "Correcting" the user's configuration decisions that
way feels problematic, but that's a topic for another day.
Replace ERROR() by warn_report(). This plugs the memory leak, and
emits a clearer message to the user.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-31-armbru@redhat.com>
When a function returns 0 on success, negative value on error,
checking for non-zero suffices, but checking for negative is clearer.
So do that.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-30-armbru@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-29-armbru@redhat.com>
All we do with the value of RDMAContext member @error_state is test
whether it's zero. Change to bool and rename to @errored.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-28-armbru@redhat.com>
This is just to make the error value more obvious. Callers don't
mind.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-27-armbru@redhat.com>
Several functions return negative errno codes on failure. Callers
check for specific codes exactly never. For some of the functions,
callers couldn't check even if they wanted to, because the functions
also return negative values that aren't errno codes, leaving readers
confused on what the function actually returns.
Clean up and simplify: return -1 instead of negative errno code.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-26-armbru@redhat.com>
rdma_getaddrinfo() returns 0 on success. On error, it returns one of
the EAI_ error codes like getaddrinfo() does, or -1 with errno set.
This is broken by design: POSIX implicitly specifies the EAI_ error
codes to be non-zero, no more. They could clash with -1. Nothing we
can do about this design flaw.
Both callers of rdma_getaddrinfo() only recognize negative values as
error. Works only because systems elect to make the EAI_ error codes
negative.
Best not to rely on that: change the callers to treat any non-zero
value as failure. Also change them to return -1 instead of the value
received from getaddrinfo() on failure, to avoid positive error
values.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-25-armbru@redhat.com>
The QEMUFileHooks methods don't come with a written contract. Digging
through the code calling them, we find:
* save_page():
Negative values RAM_SAVE_CONTROL_DELAYED and
RAM_SAVE_CONTROL_NOT_SUPP are special. Any other negative value is
an unspecified error.
qemu_rdma_save_page() returns -EIO or rdma->error_state on error. I
believe the latter is always negative. Nothing stops either of them
to clash with the special values, though. Feels unlikely, but fix
it anyway to return only the special values and -1.
* before_ram_iterate(), after_ram_iterate():
Negative value means error. qemu_rdma_registration_start() and
qemu_rdma_registration_stop() comply as far as I can tell. Make
them comply *obviously*, by returning -1 on error.
* hook_ram_load:
Negative value means error. rdma_load_hook() already returns -1 on
error. Leave it alone.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-24-armbru@redhat.com>
qemu_rdma_data_init() neglects to set an Error when it fails because
@host_port is null. Fortunately, no caller passes null, so this is
merely a latent bug. Drop the flawed code handling null argument.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-23-armbru@redhat.com>
qemu_get_cm_event_timeout() neglects to set an error when it fails
because rdma_get_cm_event() fails. Harmless, as its caller
qemu_rdma_connect() substitutes a generic error then. Fix it anyway.
qemu_rdma_connect() also sets the generic error when its own call of
rdma_get_cm_event() fails. Make the error handling more obvious: set
a specific error right after rdma_get_cm_event() fails. Delete the
generic error.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-22-armbru@redhat.com>
qemu_rdma_resolve_host() and qemu_rdma_dest_init() try addresses until
they find on that works. If none works, they return the first Error
set by qemu_rdma_broken_ipv6_kernel(), or else return a generic one.
qemu_rdma_broken_ipv6_kernel() neglects to set an Error when
ibv_open_device() fails. If a later address fails differently, we use
that Error instead, or else the generic one. Harmless enough, but
needs fixing all the same.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-21-armbru@redhat.com>
Hiding return statements in macros is a bad idea. Use a function
instead, and open code the return part.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-20-armbru@redhat.com>
QIOChannelClass methods qio_channel_rdma_readv() and
qio_channel_rdma_writev() violate their method contract when
rdma->error_state is non-zero:
1. They return whatever is in rdma->error_state then. Only -1 will be
fine. -2 will be misinterpreted as "would block". Anything less
than -2 isn't defined in the contract. A positive value would be
misinterpreted as success, but I believe that's not actually
possible.
2. They neglect to set an error then. If something up the call stack
dereferences the error when failure is returned, it will crash. If
it ignores the return value and checks the error instead, it will
miss the error.
Crap like this happens when return statements hide in macros,
especially when their uses are far away from the definition.
I elected not to investigate how callers are impacted.
Expand the two bad macro uses, so we can set an error and return -1.
The next commit will then get rid of the macro altogether.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-19-armbru@redhat.com>
Several error messages include numeric error codes returned by failed
functions:
* ibv_poll_cq() returns an unspecified negative value. Useless.
* rdma_accept and rdma_get_cm_event() return -1. Useless.
* qemu_rdma_poll() returns either -1 or an unspecified negative
value. Useless.
* qemu_rdma_block_for_wrid(), qemu_rdma_write_flush(),
qemu_rdma_exchange_send(), qemu_rdma_exchange_recv(),
qemu_rdma_write() return a negative value that may or may not be an
errno value. While reporting human-readable errno
information (which a number is not) can be useful, reporting an
error code that may or may not be an errno value is useless.
Drop these error codes from the error messages.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-18-armbru@redhat.com>
We use errno after calling Libibverbs functions that are not
documented to set errno (manual page does not mention errno), or where
the documentation is unclear ("returns [...] the value of errno on
failure"). While this could be read as "sets errno and returns it",
a glance at the source code[*] kills that hope:
static inline int ibv_post_send(struct ibv_qp *qp, struct ibv_send_wr *wr,
struct ibv_send_wr **bad_wr)
{
return qp->context->ops.post_send(qp, wr, bad_wr);
}
The callback can be
static int mana_post_send(struct ibv_qp *ibqp, struct ibv_send_wr *wr,
struct ibv_send_wr **bad)
{
/* This version of driver supports RAW QP only.
* Posting WR is done directly in the application.
*/
return EOPNOTSUPP;
}
Neither of them touches errno.
One of these errno uses is easy to fix, so do that now. Several more
will go away later in the series; add temporary FIXME commments.
Three will remain; add TODO comments. TODO, not FIXME, because the
bug might be in Libibverbs documentation.
[*] https://github.com/linux-rdma/rdma-core.git
commit 55fa316b4b18f258d8ac1ceb4aa5a7a35b094dcf
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-17-armbru@redhat.com>
@error_reported and @received_error are flags. The latter is even
assigned bool true. Change them from int to bool.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-16-armbru@redhat.com>
qemu_rdma_buffer_mergeable() is semantically a predicate. It returns
int 0 or 1. Return bool instead, and fix the function name's
spelling.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-15-armbru@redhat.com>
qemu_rdma_search_ram_block() can't fail. Return void, and drop the
unreachable error handling.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-14-armbru@redhat.com>
rdma_add_block() can't fail. Return void, and drop the unreachable
error handling.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-13-armbru@redhat.com>
When all we do with an Error we receive into a local variable is
propagating to somewhere else, we can just as well receive it there
right away.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-12-armbru@redhat.com>
include/qapi/error.h demands:
* - Functions that use Error to report errors have an Error **errp
* parameter. It should be the last parameter, except for functions
* taking variable arguments.
qemu_rdma_connect() does not conform. Clean it up.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-11-armbru@redhat.com>
qemu_rdma_accept() returns 0 in some cases even when it didn't
complete its job due to errors. Impact is not obvious. I figure the
caller will soon fail again with a misleading error message.
Fix it to return -1 on any failure.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-10-armbru@redhat.com>
qemu_rdma_exchange_get_response() compares int parameter @expecting
with uint32_t head->type. Actual arguments are non-negative
enumeration constants, RDMAControlHeader uint32_t member type, or
qemu_rdma_exchange_recv() int parameter expecting. Actual arguments
for the latter are non-negative enumeration constants. Change both
parameters to uint32_t.
In qio_channel_rdma_readv(), loop control variable @i is ssize_t, and
counts from 0 up to @niov, which is size_t. Change @i to size_t.
While there, make qio_channel_rdma_readv() and
qio_channel_rdma_writev() more consistent: change the former's @done
to ssize_t, and delete the latter's useless initialization of @len.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-8-armbru@redhat.com>
qio_channel_rdma_readv() assigns the size_t value of qemu_rdma_fill()
to an int variable before it adds it to @done / subtracts it from
@want, both size_t. Truncation when qemu_rdma_fill() copies more than
INT_MAX bytes. Seems vanishingly unlikely, but needs fixing all the
same.
Fixes: 6ddd2d76ca (migration: convert RDMA to use QIOChannel interface)
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-7-armbru@redhat.com>
We use int instead of uint64_t in a few places. Change them to
uint64_t.
This cleans up a comparison of signed qemu_rdma_block_for_wrid()
parameter @wrid_requested with unsigned @wr_id. Harmless, because the
actual arguments are non-negative enumeration constants.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-6-armbru@redhat.com>
wrid_desc[] uses 4001 pointers to map four integer values to strings.
print_wrid() accesses wrid_desc[] out of bounds when passed a negative
argument. It returns null for values 2..1999 and 2001..3999.
qemu_rdma_poll() and qemu_rdma_block_for_wrid() print wrid_desc[wr_id]
and passes print_wrid(wr_id) to tracepoints. Could conceivably crash
trying to format a null string. I believe access out of bounds is not
possible.
Not worth cleaning up. Dumb down to show just numeric wr_id.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-5-armbru@redhat.com>
rdma_delete_block() always returns 0, which its only caller ignores.
Return void instead.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-4-armbru@redhat.com>
qemu_rdma_data_init() return type is void *. It actually returns
RDMAContext *, and all its callers assign the value to an
RDMAContext *. Unclean.
Return RDMAContext * instead.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-3-armbru@redhat.com>
qemu_rdma_poll()'s return type is uint64_t, even though it returns 0,
-1, or @ret, which is int. Its callers assign the return value to int
variables, then check whether it's negative. Unclean.
Return int instead.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-2-armbru@redhat.com>
There's a bug on dest that if a double fault triggered on dest qemu (a
network issue during postcopy-recover), we won't set PAUSED correctly
because we assumed we always came from ACTIVE.
Fix that by always overwriting the state to PAUSE.
We could also check for these two states, but maybe it's an overkill. We
did the same on the src QEMU to unconditionally switch to PAUSE anyway.
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231004220240.167175-10-peterx@redhat.com>
We are sending a migration event of MIGRATION_STATUS_SETUP at
qemu_start_incoming_migration but never actually setting the state.
This creates a window between qmp_migrate_incoming and
process_incoming_migration_co where the migration status is still
MIGRATION_STATUS_NONE. Calling query-migrate during this time will
return an empty response even though the incoming migration command
has already been issued.
Commit 7cf1fe6d68 ("migration: Add migration events on target side")
has added support to the 'events' capability to the incoming part of
migration, but chose to send the SETUP event without setting the
state. I'm assuming this was a mistake.
This introduces a change in behavior, any QMP client waiting for the
SETUP event will hang, unless it has previously enabled the 'events'
capability. Having the capability enabled is sufficient to continue to
receive the event.
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230712190742.22294-5-farosas@suse.de>
QEMU will crash if anyone tries to set tls-authz (which is a type
StrOrNull) with 'null' value. Fix it in the easy way by converting it to
qstring just like the other two tls parameters.
Cc: qemu-stable@nongnu.org # v4.0+
Fixes: d2f1d29b95 ("migration: add support for a "tls-authz" migration parameter")
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230905162335.235619-2-peterx@redhat.com>
Currently query-dirty-rate uses QEMU_CLOCK_REALTIME as
the source for start-time field. This translates to
clock_gettime(CLOCK_MONOTONIC), i.e. number of seconds
since host boot. This is not very useful. The only
reasonable use case of start-time I can imagine is to
check whether previously completed measurements are
too old or not. But this makes sense only if start-time
is reported as host wall-clock time.
This patch replaces source of start-time from
QEMU_CLOCK_REALTIME to QEMU_CLOCK_HOST.
Signed-off-by: Andrei Gudkov <gudkov.andrei@huawei.com>
Reviewed-by: Hyman Huang <yong.huang@smartx.com>
Message-Id: <399861531e3b24a1ecea2ba453fb2c3d129fb03a.1693905328.git.gudkov.andrei@huawei.com>
Signed-off-by: Hyman Huang <yong.huang@smartx.com>
For both save/load we actually share the logic on deciding whether a field
should exist. Merge the checks into a helper and use it for both save and
load. When doing so, add documentations and reformat the code to make it
much easier to read.
The real benefit here (besides code cleanups) is we add a trace-point for
this; this is a known spot where we can easily break migration
compatibilities between binaries, and this trace point will be critical for
us to identify such issues.
For example, this will be handy when debugging things like:
https://gitlab.com/qemu-project/qemu/-/issues/932
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230906204722.514474-1-peterx@redhat.com>
Allow an offset option to be specified as part of the file URI, in
the form "file:filename,offset=offset", where offset accepts the common
size suffixes, or the 0x prefix, but not both. Migration data is written
to and read from the file starting at offset. If unspecified, it defaults
to 0.
This is needed by libvirt to store its own data at the head of the file.
Suggested-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <1694182931-61390-3-git-send-email-steven.sistare@oracle.com>
Extend the migration URI to support file:<filename>. This can be used for
any migration scenario that does not require a reverse path. It can be
used as an alternative to 'exec:cat > file' in minimized containers that
do not contain /bin/sh, and it is easier to use than the fd:<fdname> URI.
It can be used in HMP commands, and as a qemu command-line parameter.
For best performance, guest ram should be shared and x-ignore-shared
should be true, so guest pages are not written to the file, in which case
the guest may remain running. If ram is not so configured, then the user
is advised to stop the guest first. Otherwise, a busy guest may re-dirty
the same page, causing it to be appended to the file multiple times,
and the file may grow unboundedly. That issue is being addressed in the
"fixed-ram" patch series.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Tested-by: Michael Galaxy <mgalaxy@akamai.com>
Reviewed-by: Michael Galaxy <mgalaxy@akamai.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <1694182931-61390-2-git-send-email-steven.sistare@oracle.com>
Previously, we got a confusion error that complains
the RDMAControlHeader.repeat:
qemu-system-x86_64: rdma: Too many requests in this message (3638950032).Bailing.
Actually, it's caused by an unexpected RDMAControlHeader.type.
After this patch, error will become:
qemu-system-x86_64: Unknown control message QEMU FILE
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230926100103.201564-2-lizhijian@fujitsu.com>
A few code paths exist in the source code,where a migration is
marked as failed via MIGRATION_STATUS_FAILED, but the failure happens
outside of migration.c
In such cases, an error_report() call is made, however the current
MigrationState is never updated with the error description, and hence
clients like libvirt never know the actual reason for the failure.
This patch covers such cases outside of migration.c and updates the
error description at the appropriate places.
Acked-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Tejus GK <tejus.gk@nutanix.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231003065538.244752-3-tejus.gk@nutanix.com>
Currently, a few code paths exist in the function vmstate_save_state_v,
which ultimately leads to a migration failure. However, an update in the
current MigrationState for the error description is never done.
vmstate.c somehow doesn't seem to allow the use of migrate_set_error due
to some dependencies for unit tests. Hence, this patch introduces a new
function vmstate_save_state_with_err, which will eventually propagate
the error message to savevm.c where a migrate_set_error call can be
eventually done.
Acked-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Tejus GK <tejus.gk@nutanix.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231003065538.244752-2-tejus.gk@nutanix.com>
When we sent a page through QEMUFile hooks (RDMA) there are three
posiblities:
- We are not using RDMA. return RAM_SAVE_CONTROL_DELAYED and
control_save_page() returns false to let anything else to proceed.
- There is one error but we are using RDMA. Then we return a negative
value, control_save_page() needs to return true.
- Everything goes well and RDMA start the sent of the page
asynchronously. It returns RAM_SAVE_CONTROL_DELAYED and we need to
return 1 for ram_save_page_legacy.
Clear?
I know, I know, the interface is as bad as it gets. I think that now
it is a bit clearer, but this needs to be done some other way.
Reviewed-by: Leonardo Bras <leobras@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-Id: <20230515195709.63843-16-quintela@redhat.com>
After this change, nothing abuses QEMUFile to account for data
transferrefd during migration.
Reviewed-by: Leonardo Bras <leobras@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-Id: <20230515195709.63843-15-quintela@redhat.com>
RDMA protocol is completely asynchronous, so in qemu_rdma_save_page()
they "invent" that a byte has been transferred. And then they call
qemu_file_credit_transfer() and ram_transferred_add() with that byte.
Just remove that calls as nothing has been sent.
Reviewed-by: Leonardo Bras <leobras@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-Id: <20230515195709.63843-14-quintela@redhat.com>
Reviewed-by: Leonardo Bras <leobras@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-Id: <20230515195709.63843-13-quintela@redhat.com>
Remove the one in control_save_page().
Reviewed-by: Leonardo Bras <leobras@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-Id: <20230515195709.63843-12-quintela@redhat.com>
Just create a variable for it, the same way that multifd does. This
way it is safe to use for other thread, etc, etc.
Reviewed-by: Leonardo Bras <leobras@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-Id: <20230515195709.63843-11-quintela@redhat.com>
We only care about the amount of bytes transferred. Flushing is done
by the system somewhere else.
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230530183941.7223-4-quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
In the function qmp_migrate(), yank_unregister_instance() gets called
twice which isn't required. Hence, refactoring it so that it gets called
during the local_error cleanup.
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Acked-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Tejus GK <tejus.gk@nutanix.com>
Message-ID: <20230621130940.178659-3-tejus.gk@nutanix.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Local variables shadowing other local variables or parameters make the
code needlessly hard to understand. Tracked down with -Wshadow=local.
Clean up: delete inner declarations when they are actually redundant,
else rename variables.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Message-ID: <20230921121312.1301864-3-armbru@redhat.com>
qemu_rdma_save_page() reports polling error with error_report(), then
succeeds anyway. This is because the variable holding the polling
status *shadows* the variable the function returns. The latter
remains zero.
Broken since day one, and duplicated more recently.
Fixes: 2da776db48 (rdma: core logic)
Fixes: b390afd8c5 (migration/rdma: Fix out of order wrid)
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Message-ID: <20230921121312.1301864-2-armbru@redhat.com>
Now that the return path thread is allowed to finish during a paused
migration, we can move the cleanup of the QEMUFiles to the main
migration thread.
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20230918172822.19052-9-farosas@suse.de>
Replace the return path retry logic with finishing and restarting the
thread. This fixes a race when resuming the migration that leads to a
segfault.
Currently when doing postcopy we consider that an IO error on the
return path file could be due to a network intermittency. We then keep
the thread alive but have it do cleanup of the 'from_dst_file' and
wait on the 'postcopy_pause_rp' semaphore. When the user issues a
migrate resume, a new return path is opened and the thread is allowed
to continue.
There's a race condition in the above mechanism. It is possible for
the new return path file to be setup *before* the cleanup code in the
return path thread has had a chance to run, leading to the *new* file
being closed and the pointer set to NULL. When the thread is released
after the resume, it tries to dereference 'from_dst_file' and crashes:
Thread 7 "return path" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffd1dbf700 (LWP 9611)]
0x00005555560e4893 in qemu_file_get_error_obj (f=0x0, errp=0x0) at ../migration/qemu-file.c:154
154 return f->last_error;
(gdb) bt
#0 0x00005555560e4893 in qemu_file_get_error_obj (f=0x0, errp=0x0) at ../migration/qemu-file.c:154
#1 0x00005555560e4983 in qemu_file_get_error (f=0x0) at ../migration/qemu-file.c:206
#2 0x0000555555b9a1df in source_return_path_thread (opaque=0x555556e06000) at ../migration/migration.c:1876
#3 0x000055555602e14f in qemu_thread_start (args=0x55555782e780) at ../util/qemu-thread-posix.c:541
#4 0x00007ffff38d76ea in start_thread (arg=0x7fffd1dbf700) at pthread_create.c:477
#5 0x00007ffff35efa6f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
Here's the race (important bit is open_return_path happening before
migration_release_dst_files):
migration | qmp | return path
--------------------------+-----------------------------+---------------------------------
qmp_migrate_pause()
shutdown(ms->to_dst_file)
f->last_error = -EIO
migrate_detect_error()
postcopy_pause()
set_state(PAUSED)
wait(postcopy_pause_sem)
qmp_migrate(resume)
migrate_fd_connect()
resume = state == PAUSED
open_return_path <-- TOO SOON!
set_state(RECOVER)
post(postcopy_pause_sem)
(incoming closes to_src_file)
res = qemu_file_get_error(rp)
migration_release_dst_files()
ms->rp_state.from_dst_file = NULL
post(postcopy_pause_rp_sem)
postcopy_pause_return_path_thread()
wait(postcopy_pause_rp_sem)
rp = ms->rp_state.from_dst_file
goto retry
qemu_file_get_error(rp)
SIGSEGV
-------------------------------------------------------------------------------------------
We can keep the retry logic without having the thread alive and
waiting. The only piece of data used by it is the 'from_dst_file' and
it is only allowed to proceed after a migrate resume is issued and the
semaphore released at migrate_fd_connect().
Move the retry logic to outside the thread by waiting for the thread
to finish before pausing the migration.
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20230918172822.19052-8-farosas@suse.de>
We'll start calling the await_return_path_close_on_source() function
from other parts of the code, so move all of the related checks and
tracepoints into it.
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20230918172822.19052-7-farosas@suse.de>
This file is owned by the return path thread which is already doing
cleanup.
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20230918172822.19052-6-farosas@suse.de>
It's not safe to call qemu_file_shutdown() on the to_dst_file without
first checking for the file's presence under the lock. The cleanup of
this file happens at postcopy_pause() and migrate_fd_cleanup() which
are not necessarily running in the same thread as migrate_fd_cancel().
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20230918172822.19052-5-farosas@suse.de>