This grand cleanup drops all reset and vmsave/load related
synchronization points in favor of four(!) generic hooks:
- cpu_synchronize_all_states in qemu_savevm_state_complete
(initial sync from kernel before vmsave)
- cpu_synchronize_all_post_init in qemu_loadvm_state
(writeback after vmload)
- cpu_synchronize_all_post_init in main after machine init
- cpu_synchronize_all_post_reset in qemu_system_reset
(writeback after system reset)
These writeback points + the existing one of VCPU exec after
cpu_synchronize_state map on three levels of writeback:
- KVM_PUT_RUNTIME_STATE (during runtime, other VCPUs continue to run)
- KVM_PUT_RESET_STATE (on synchronous system reset, all VCPUs stopped)
- KVM_PUT_FULL_STATE (on init or vmload, all VCPUs stopped as well)
This level is passed to the arch-specific VCPU state writing function
that will decide which concrete substates need to be written. That way,
no writer of load, save or reset functions that interact with in-kernel
KVM states will ever have to worry about synchronization again. That
also means that a lot of reasons for races, segfaults and deadlocks are
eliminated.
cpu_synchronize_state remains untouched, just as Anthony suggested. We
continue to need it before reading or writing of VCPU states that are
also tracked by in-kernel KVM subsystems.
Consequently, this patch removes many cpu_synchronize_state calls that
are now redundant, just like remaining explicit register syncs.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
savevm without id or tag segfaults in:
(gdb) bt
#0 0x00007f600a83bf8a in __strcmp_sse42 () from /lib64/libc.so.6
#1 0x00000000004745b6 in bdrv_snapshot_find (bs=<value optimized out>,
sn_info=0x7fff996be280, name=0x0) at savevm.c:1631
#2 0x0000000000475c80 in del_existing_snapshots (name=<value optimized out>,
mon=<value optimized out>) at savevm.c:1654
#3 do_savevm (name=<value optimized out>, mon=<value optimized out>)
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
CC savevm.o
cc1: warnings being treated as errors
savevm.c: In function 'file_put_buffer':
savevm.c:342: error: ignoring return value of 'fwrite', declared with attribute warn_unused_result
make: *** [savevm.o] Error 1
Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
The effect of this patch with current block migration is that its stage
2, ie. the first full walk-through of the block devices will be
performed completely before RAM migration starts. This ensures that
continuously changing RAM pages are not re-synchronized all the time
while block migration is not completed.
Future versions of block migration which will respect the specified
downtime will generate a different pattern: After RAM migration has
started as well, block migration may also continue to inject dirty
blocks into the RAM stream once it detects that the number of pending
blocks would extend the downtime unacceptably.
Note that all this relies on the current registration order: block
before RAM migration.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
In order to allow proper progress reporting to the monitor that
initiated the migration, forward the monitor reference through the
migration layer down to SaveLiveStateHandler.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Introduce qemu_savevm_state_cancel and inject a stage -1 to cancel a
live migration. This gives the involved subsystems a chance to clean up
dynamically allocated resources. Namely, the block migration layer can
now free its device descriptors and pending blocks.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
When the size that we want to transmit is in another field, but in an
unit different that bytes
Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Support for buffer that are pointed by a pointer (i.e. not embedded)
where the size that we want to use is a field in the state.
We also need a new place to store where to start in the middle of the
buffer, as now it is a pointer, not the offset of the 1st field.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Seeking on vmstate save/load does not work if the underlying file is a
stream. We could try to make all QEMUFile* forward-seek-aware, but first
attempts in this direction indicated that it's saner to convert the few
qemu_fseek-on-vmstates users to plain reads/writes.
This fixes various subtle vmstate corruptions where unused fields were
involved.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
This patch introduces block migration called during live migration. Block
are being copied to the destination in an async way. First the code will
transfer the whole disk and then transfer all dirty blocks accumulted during
the migration.
Still need to improve transition from the iterative phase of migration to the
end phase. For now transition will take place when all blocks transfered once,
all the dirty blocks will be transfered during the end phase (guest is
suspended).
Changes from v4:
- Global variabels moved to a global state structure allocated dynamically.
- Minor coding style issues.
- Poll block.c for tracking of dirty blocks instead of manage it here.
Signed-off-by: Liran Schour <lirans@il.ibm.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
When creating a snapshot we can run into the situation that the first disk
doesn't have a snapshot, but the second one does have one with the same name as
the new snapshot.
In this case, qemu doesn't recognize that there is a snapshot to be
overwritten, so it starts to save the new snapshot and errors out later when it
tries to snapshot the second image. With this patch, snapshots on secondary
images are overwritten just like on the first image.
Reported-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
commit b04c4134d6
broke incoming migration. After talking with Gleb, code was intended
to be the way is in this fix. This fixes migration here.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Use qemu_send_packet_raw to send gratuitous arp. This will ensure that
vnet header is handled properly.
Also, avoid sending the gratuitous packet to the guest. There doesn't
appear to be any reason for doing that and the code will currently just
crash if the NIC is not associated with a vlan.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Looks like these are just artifacts of vl.c being split up.
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
It allows to have 'things' in savevm format not backed in the device state
Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Currently, after a migration qemu sends a broadcast packet to update
switches' MAC->port mappings.
Unfortunately, it picks a random (constant) ethertype and crosses its
fingers that no one else is using it.
This patch causes it to send a RARP packet instead. RARP was chosen for
2 reasons. One, it is always harmless, and will continue to be so even
as new ethertypes are allocated. Two, it is what VMware ESX sends, so
people who write filtering rules for switches already know about it.
I also changed the code to send SELF_ANNOUNCE_ROUNDS packets, instead of
SELF_ANNOUNCE_ROUNDS + 1, and added a simple backoff scheme.
Signed-off-by: Nolan Leake <nolan <at> sigbus.net>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
In a later patch, we introduce pre_save() and post_save() functions.
The whole point of that operation is to change things in the state.
Without this patch, we have to remove the const qualifier in each
use with a cast
Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Problem: Our file sys-queue.h is a copy of the BSD file, but there are
some additions and it's not entirely compatible. Because of that, there have
been conflicts with system headers on BSD systems. Some hacks have been
introduced in the commits 15cc923584,
f40d753718,
96555a96d7 and
3990d09adf but the fixes were fragile.
Solution: Avoid the conflict entirely by renaming the functions and the
file. Revert the previous hacks.
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
This naming was used in kvm tree, and is easier to remember
Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
vmsd alone is not enugh, because we can have several structs saved with the same description (vmsd).
Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
In previosu series I remove v2 support for RAM (that was the version that was
supported when SaveVM v3 appeared). Now we can't load RAM for any image saved in SaveVM v2, we can as well remove SaveVM v2 entirely.
Note: That SaveVM RAM was at v2 when General SaveVM support went from v2 to v3 makes talking about versions confusing at least
Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
This commit ports command handlers that receive one argument to use
the new monitor's dictionary.
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
We can't check the version in a substruct, it is not stored anywhere
Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
We read the saved value and check that it is less or equal than the one
stored in the structure.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
This patch adds support for static sized buffer and typecheks that the buffer is right.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
This patch add supports for variable sized arrays whose size is
another field of the state.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
We read the saved value and check that it is the same that the one
is stored in the structure.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
This patch adds support for saving one VMStateDescription from other
VMStateDescription.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
This patch adds support for saving arrays inside the struct
Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
This patch adds support for saving pointers to values
Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
This patch introduces VMState infrastructure, to convert the save/load
functions of devices to a table approach. This new approach has the
following advantages:
- it is type-safe
- you can't have load/save functions out of sync
- will allows us to have new interesting commands, like dump <device>, that
shows all its internal state.
- Just now, the only added type is arrays, but we can add structures.
- Uses old load_state() function for loading old state.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
do_loadvm() is now called from the monitor.
load_vmstate() is called by do_loadvm() and when -loadvm command line is used.
Command line don't have to play games with vmstop()/vmstart()
Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
While reading Chris's code for fd migration I noticed the duplication
between QEMUFilePopen and QEMUFileStdio. This fixes it, and makes
qemu_fopen more similar qemu_popen.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
First step cleaning up the drives handling. This one does nothing but
removing drives_table[], still it became seriously big.
drive_get_index() is gone and is replaced by drives_get() which hands
out DriveInfo pointers instead of a table index. This needs adaption in
*tons* of places all over.
The drives are now maintained as linked list.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Hi,
Whoever wrote this migrate_set_speed function is totally stupid.
Any failed or completed migration keeps its state to allow probing of
migration data, but has no associated file anymore. It is, thus,
possible to crash qemu by calling migrate_set_speed after a migration
is finished (or failed, or cancelled), but before another one starts.
This patch fixes it.
Signed-off-by: Glauber Costa <glommer@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
The VM state offset is a concept internal to the image format. Replace
the old bdrv_{get,put}_buffer method that require an index into the
image file that is constructed from the VM state offset and an offset
into the vmstate with the bdrv_{load,save}_vmstate that just take an
offset into the VM state.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Sometimes, upon interrupt, fread returns with no data, and
the (incoming exec) migration fails.
Fix by retrying on such a case.
Signed-off-by: Uri Lublin <uril@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Work around buffer and ioctlsocket argument type signedness problems
Suppress a prototype which is unused on mingw32
Expand a macro to avoid warnings from some GCC versions
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
VLANClientState's fd_read() handler doesn't read from file
descriptors, it adds a buffer to the client's receive queue.
Re-name the handlers to make things a little less confusing.
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
All,
I've recently been playing around with migration via exec. Unfortunately,
when starting the incoming qemu process with "-incoming exec:cmd", it suffers
the same problem that -incoming tcp used to suffer; namely, that you can't
interact with the monitor until after the migration has happened. This causes
problems for libvirt usage of -incoming exec, since libvirt expects to be able
to access the monitor ahead of time. This fairly simple patch allows you to
access the monitor both before and after the migration has completed using exec.
(note: developed/tested with qemu-kvm, but applies perfectly fine to qemu)
Signed-off-by: Chris Lalancette <clalance@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
This patch converts the current callers of qemu_fopen_ops().
Signed-off-by: Glauber Costa <glommer@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Currently there's no way to unregister a savevm callback, so
e.g. if a NIC is hot-unplugged and a savevm is issued, we'll
segfault.
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@7148 c046a42c-6fe2-441c-8c8c-71466251a162
This is mainly for consistency, since we don't want
anything outside of savevm setting it explicitly. There
are current no users of that in qemu tree, but there
are potential candidates on kvm-userspace. And avi
is a nice guy, let's be nice with him.
Based on a patch by Yaniv Kamay
Signed-off-by: Glauber Costa <glommer@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@6998 c046a42c-6fe2-441c-8c8c-71466251a162
We now enforce that you cannot write beyond the end of a non-growable file.
qcow2 files are not growable but we rely on them being growable to do
savevm/loadvm. Temporarily allow them to be growable by introducing a new
API specifically for savevm read/write operations.
Reported-by: malc
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@6994 c046a42c-6fe2-441c-8c8c-71466251a162
We want to globally define WIN_LEAN_AND_MEAN and WINVER to particular values so
let's do it in OS_CFLAGS.
Then, we can pepper in windows.h includes where using #includes that require it.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@6783 c046a42c-6fe2-441c-8c8c-71466251a162
Refactor the monitor API and prepare it for decoupled terminals:
term_print functions are renamed to monitor_* and all monitor services
gain a new parameter (mon) that will once refer to the monitor instance
the output is supposed to appear on. However, the argument remains
unused for now. All monitor command callbacks are also extended by a mon
parameter so that command handlers are able to pass an appropriate
reference to monitor output services.
For the case that monitor outputs so far happen without clearly
identifiable context, the global variable cur_mon is introduced that
shall once provide a pointer either to the current active monitor (while
processing commands) or to the default one. On the mid or long term,
those use case will be obsoleted so that this variable can be removed
again.
Due to the broad usage of the monitor interface, this patch mostly deals
with converting users of the monitor API. A few of them are already
extended to pass 'mon' from the command handler further down to internal
functions that invoke monitor_printf.
At this chance, monitor-related prototypes are moved from console.h to
a new monitor.h. The same is done for the readline API.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@6711 c046a42c-6fe2-441c-8c8c-71466251a162
When creating a snapshot with multiple qcow2 disks attached, the current
behaviour is that qemu creates a disk snapshot on all of them and
chooses one to write the VM state to.
Despite having the state only in one image, loadvm tries to restore the
VM state from the middle of nowhere if you run qemu a second time with
only one of the other images attached. In the lucky case it will fail
because there simply is no state, but it also can happen that it loads
the state of a different snapshot (the one this new one is based upon).
The fix is to write a zero VM state size to the images which don't
contain the state, and check this in loadvm.
I agree that you probably have to provoke such things intentionally to
get in a state like this with qemu itself. However, with my second patch
that adds snapshot support to qemu-img it could become a reasonable use
case to have snapshots with and without VM states on the same image.
Signed-off-by: Kevin Wolf <kwolf@suse.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@5985 c046a42c-6fe2-441c-8c8c-71466251a162
This is pure code motion. The savevm code is all common code so we can build
it once and share the object with all executables.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@5700 c046a42c-6fe2-441c-8c8c-71466251a162