This grand cleanup drops all reset and vmsave/load related
synchronization points in favor of four(!) generic hooks:
- cpu_synchronize_all_states in qemu_savevm_state_complete
(initial sync from kernel before vmsave)
- cpu_synchronize_all_post_init in qemu_loadvm_state
(writeback after vmload)
- cpu_synchronize_all_post_init in main after machine init
- cpu_synchronize_all_post_reset in qemu_system_reset
(writeback after system reset)
These writeback points + the existing one of VCPU exec after
cpu_synchronize_state map on three levels of writeback:
- KVM_PUT_RUNTIME_STATE (during runtime, other VCPUs continue to run)
- KVM_PUT_RESET_STATE (on synchronous system reset, all VCPUs stopped)
- KVM_PUT_FULL_STATE (on init or vmload, all VCPUs stopped as well)
This level is passed to the arch-specific VCPU state writing function
that will decide which concrete substates need to be written. That way,
no writer of load, save or reset functions that interact with in-kernel
KVM states will ever have to worry about synchronization again. That
also means that a lot of reasons for races, segfaults and deadlocks are
eliminated.
cpu_synchronize_state remains untouched, just as Anthony suggested. We
continue to need it before reading or writing of VCPU states that are
also tracked by in-kernel KVM subsystems.
Consequently, this patch removes many cpu_synchronize_state calls that
are now redundant, just like remaining explicit register syncs.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
savevm without id or tag segfaults in:
(gdb) bt
#0 0x00007f600a83bf8a in __strcmp_sse42 () from /lib64/libc.so.6
#1 0x00000000004745b6 in bdrv_snapshot_find (bs=<value optimized out>,
sn_info=0x7fff996be280, name=0x0) at savevm.c:1631
#2 0x0000000000475c80 in del_existing_snapshots (name=<value optimized out>,
mon=<value optimized out>) at savevm.c:1654
#3 do_savevm (name=<value optimized out>, mon=<value optimized out>)
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
CC savevm.o
cc1: warnings being treated as errors
savevm.c: In function 'file_put_buffer':
savevm.c:342: error: ignoring return value of 'fwrite', declared with attribute warn_unused_result
make: *** [savevm.o] Error 1
Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
The effect of this patch with current block migration is that its stage
2, ie. the first full walk-through of the block devices will be
performed completely before RAM migration starts. This ensures that
continuously changing RAM pages are not re-synchronized all the time
while block migration is not completed.
Future versions of block migration which will respect the specified
downtime will generate a different pattern: After RAM migration has
started as well, block migration may also continue to inject dirty
blocks into the RAM stream once it detects that the number of pending
blocks would extend the downtime unacceptably.
Note that all this relies on the current registration order: block
before RAM migration.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
In order to allow proper progress reporting to the monitor that
initiated the migration, forward the monitor reference through the
migration layer down to SaveLiveStateHandler.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Introduce qemu_savevm_state_cancel and inject a stage -1 to cancel a
live migration. This gives the involved subsystems a chance to clean up
dynamically allocated resources. Namely, the block migration layer can
now free its device descriptors and pending blocks.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
When the size that we want to transmit is in another field, but in an
unit different that bytes
Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Support for buffer that are pointed by a pointer (i.e. not embedded)
where the size that we want to use is a field in the state.
We also need a new place to store where to start in the middle of the
buffer, as now it is a pointer, not the offset of the 1st field.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Seeking on vmstate save/load does not work if the underlying file is a
stream. We could try to make all QEMUFile* forward-seek-aware, but first
attempts in this direction indicated that it's saner to convert the few
qemu_fseek-on-vmstates users to plain reads/writes.
This fixes various subtle vmstate corruptions where unused fields were
involved.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
This patch introduces block migration called during live migration. Block
are being copied to the destination in an async way. First the code will
transfer the whole disk and then transfer all dirty blocks accumulted during
the migration.
Still need to improve transition from the iterative phase of migration to the
end phase. For now transition will take place when all blocks transfered once,
all the dirty blocks will be transfered during the end phase (guest is
suspended).
Changes from v4:
- Global variabels moved to a global state structure allocated dynamically.
- Minor coding style issues.
- Poll block.c for tracking of dirty blocks instead of manage it here.
Signed-off-by: Liran Schour <lirans@il.ibm.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
When creating a snapshot we can run into the situation that the first disk
doesn't have a snapshot, but the second one does have one with the same name as
the new snapshot.
In this case, qemu doesn't recognize that there is a snapshot to be
overwritten, so it starts to save the new snapshot and errors out later when it
tries to snapshot the second image. With this patch, snapshots on secondary
images are overwritten just like on the first image.
Reported-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
commit b04c4134d6
broke incoming migration. After talking with Gleb, code was intended
to be the way is in this fix. This fixes migration here.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Use qemu_send_packet_raw to send gratuitous arp. This will ensure that
vnet header is handled properly.
Also, avoid sending the gratuitous packet to the guest. There doesn't
appear to be any reason for doing that and the code will currently just
crash if the NIC is not associated with a vlan.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Looks like these are just artifacts of vl.c being split up.
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
It allows to have 'things' in savevm format not backed in the device state
Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Currently, after a migration qemu sends a broadcast packet to update
switches' MAC->port mappings.
Unfortunately, it picks a random (constant) ethertype and crosses its
fingers that no one else is using it.
This patch causes it to send a RARP packet instead. RARP was chosen for
2 reasons. One, it is always harmless, and will continue to be so even
as new ethertypes are allocated. Two, it is what VMware ESX sends, so
people who write filtering rules for switches already know about it.
I also changed the code to send SELF_ANNOUNCE_ROUNDS packets, instead of
SELF_ANNOUNCE_ROUNDS + 1, and added a simple backoff scheme.
Signed-off-by: Nolan Leake <nolan <at> sigbus.net>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
In a later patch, we introduce pre_save() and post_save() functions.
The whole point of that operation is to change things in the state.
Without this patch, we have to remove the const qualifier in each
use with a cast
Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Problem: Our file sys-queue.h is a copy of the BSD file, but there are
some additions and it's not entirely compatible. Because of that, there have
been conflicts with system headers on BSD systems. Some hacks have been
introduced in the commits 15cc923584,
f40d753718,
96555a96d7 and
3990d09adf but the fixes were fragile.
Solution: Avoid the conflict entirely by renaming the functions and the
file. Revert the previous hacks.
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
This naming was used in kvm tree, and is easier to remember
Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
vmsd alone is not enugh, because we can have several structs saved with the same description (vmsd).
Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
In previosu series I remove v2 support for RAM (that was the version that was
supported when SaveVM v3 appeared). Now we can't load RAM for any image saved in SaveVM v2, we can as well remove SaveVM v2 entirely.
Note: That SaveVM RAM was at v2 when General SaveVM support went from v2 to v3 makes talking about versions confusing at least
Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
This commit ports command handlers that receive one argument to use
the new monitor's dictionary.
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
We can't check the version in a substruct, it is not stored anywhere
Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
We read the saved value and check that it is less or equal than the one
stored in the structure.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
This patch adds support for static sized buffer and typecheks that the buffer is right.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
This patch add supports for variable sized arrays whose size is
another field of the state.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
We read the saved value and check that it is the same that the one
is stored in the structure.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
This patch adds support for saving one VMStateDescription from other
VMStateDescription.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
This patch adds support for saving arrays inside the struct
Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
This patch adds support for saving pointers to values
Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
This patch introduces VMState infrastructure, to convert the save/load
functions of devices to a table approach. This new approach has the
following advantages:
- it is type-safe
- you can't have load/save functions out of sync
- will allows us to have new interesting commands, like dump <device>, that
shows all its internal state.
- Just now, the only added type is arrays, but we can add structures.
- Uses old load_state() function for loading old state.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>