migration/docs: Update postcopy recover session for SETUP phase

Firstly, the "Paused" state was added in the wrong place before. The state
machine section was describing PostcopyState, rather than MigrationStatus.
Drop the Paused state descriptions.

Then, in the postcopy recovery section, add more detail on the
MigrationStatus state machine to each step, and add the new RECOVER_SETUP
phase.

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Peter Xu <peterx@redhat.com>
[fix typo s/reconnects/reconnect]
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Peter Xu 2024-06-19 18:30:41 -04:00 committed by Fabiano Rosas
parent 4146b77ec7
commit 21e89f7ad5

@@ -99,17 +99,6 @@ ADVISE->DISCARD->LISTEN->RUNNING->END
 (although it can't do the cleanup it would do as it
 finishes a normal migration).
-- Paused
-Postcopy can run into a paused state (normally on both sides when
-happens), where all threads will be temporarily halted mostly due to
-network errors. When reaching paused state, migration will make sure
-the qemu binary on both sides maintain the data without corrupting
-the VM. To continue the migration, the admin needs to fix the
-migration channel using the QMP command 'migrate-recover' on the
-destination node, then resume the migration using QMP command 'migrate'
-again on source node, with resume=true flag set.
 - End
 The listen thread can now quit, and perform the cleanup of migration
@@ -221,7 +210,8 @@ paused postcopy migration.
 The recovery phase normally contains a few steps:
-- When network issue occurs, both QEMU will go into PAUSED state
+- When network issue occurs, both QEMU will go into **POSTCOPY_PAUSED**
+migration state.
 - When the network is recovered (or a new network is provided), the admin
 can setup the new channel for migration using QMP command
@@ -229,9 +219,20 @@ The recovery phase normally contains a few steps:
 - On source host, the admin can continue the interrupted postcopy
 migration using QMP command 'migrate' with resume=true flag set.
+Source QEMU will go into **POSTCOPY_RECOVER_SETUP** state trying to
+re-establish the channels.
-- After the connection is re-established, QEMU will continue the postcopy
-migration on both sides.
+- When both sides of QEMU successfully reconnect using a new or fixed up
+channel, they will go into **POSTCOPY_RECOVER** state, some handshake
+procedure will be needed to properly synchronize the VM states between
+the two QEMUs to continue the postcopy migration. For example, there
+can be pages sent right during the window when the network is
+interrupted, then the handshake will guarantee pages lost in-flight
+will be resent again.
+- After a proper handshake synchronization, QEMU will continue the
+postcopy migration on both sides and go back to **POSTCOPY_ACTIVE**
+state. Postcopy migration will continue.
 During a paused postcopy migration, the VM can logically still continue
 running, and it will not be impacted from any page access to pages that
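
The source-side recovery flow documented by this patch can be sketched as a small transition table, together with the two QMP commands the admin issues. This is only an illustration, not QEMU code: the event names (`network-failure`, `migrate-resume`, `channels-established`, `handshake-done`) and the helper functions are hypothetical, only the happy path from the list above is modelled, and the status strings assume the lowercase, hyphenated QAPI spelling of MigrationStatus. The example URI is a placeholder.

```python
import json

# Happy-path MigrationStatus transitions on the source during postcopy
# recovery. Event names are hypothetical labels chosen for this sketch.
SOURCE_RECOVERY_FLOW = {
    ("postcopy-active", "network-failure"): "postcopy-paused",
    ("postcopy-paused", "migrate-resume"): "postcopy-recover-setup",
    ("postcopy-recover-setup", "channels-established"): "postcopy-recover",
    ("postcopy-recover", "handshake-done"): "postcopy-active",
}


def step(status, event):
    """Return the next status, or raise ValueError if the pair is not modelled."""
    try:
        return SOURCE_RECOVERY_FLOW[(status, event)]
    except KeyError:
        raise ValueError(f"no transition from {status!r} on {event!r}") from None


def migrate_recover_cmd(uri):
    """QMP message for the destination: provide a new channel to listen on."""
    return {"execute": "migrate-recover", "arguments": {"uri": uri}}


def migrate_resume_cmd(uri):
    """QMP message for the source: resume the paused postcopy migration."""
    return {"execute": "migrate", "arguments": {"uri": uri, "resume": True}}


if __name__ == "__main__":
    # Placeholder address; replace with the real destination channel.
    uri = "tcp:192.0.2.10:4444"
    print(json.dumps(migrate_recover_cmd(uri)))  # send to destination first
    print(json.dumps(migrate_resume_cmd(uri)))   # then send to source

    status = "postcopy-active"
    for event in ("network-failure", "migrate-resume",
                  "channels-established", "handshake-done"):
        status = step(status, event)
        print(event, "->", status)
```

Keeping the table keyed on (status, event) pairs makes any transition that the documentation does not describe fail loudly instead of being silently accepted.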