Record/replay functions are used for the deterministic replay of QEMU
execution. Execution recording writes a log of non-deterministic events, which
can later be used to replay the execution anywhere and an unlimited
number of times. Execution replaying reads the log and replays all
non-deterministic events, including external input, hardware clocks,
and interrupts.
Several parts of QEMU include function calls that record events to the log
and replay them from it.
Device models that receive non-deterministic input from external devices were
changed to write every external event into the execution log immediately.
For example, network packets are written into the log when they arrive at the
virtual network adapter.
All non-deterministic events come from these devices. To replay them,
we need to know the moments at which they occur. We specify
these moments by counting the number of instructions executed between
every pair of consecutive events.
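The idea of locating events by instruction counts can be illustrated with a
toy model. The following is a minimal sketch, not QEMU's log format: the
``EventLog`` class, its methods, and the payload strings are hypothetical names
chosen for illustration. Each entry stores the number of instructions executed
since the previous event, and the absolute positions can be rebuilt by summing
the deltas.

```python
from dataclasses import dataclass, field


@dataclass
class EventLog:
    """Toy event log (hypothetical, not QEMU's on-disk format)."""

    entries: list = field(default_factory=list)
    last_icount: int = 0

    def record(self, icount, payload):
        # Store how many instructions ran since the previous event,
        # rather than the absolute instruction count.
        self.entries.append((icount - self.last_icount, payload))
        self.last_icount = icount

    def positions(self):
        # Rebuild absolute instruction counts by summing the deltas.
        total, out = 0, []
        for delta, payload in self.entries:
            total += delta
            out.append((total, payload))
        return out


log = EventLog()
log.record(1000, "net-packet")   # event seen after 1000 instructions
log.record(1750, "key-press")    # 750 instructions after the first event
```

Here ``log.entries`` holds the pairs ``(1000, "net-packet")`` and
``(750, "key-press")``, which is all a replayer needs to place both events.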
Academic papers describing the deterministic replay implementation:
* `Deterministic Replay of System's Execution with Multi-target QEMU Simulator for Dynamic Analysis and Reverse Debugging <https://www.computer.org/csdl/proceedings/csmr/2012/4666/00/4666a553-abs.html>`_
* `Don't panic: reverse debugging of kernel drivers <https://dl.acm.org/citation.cfm?id=2786805.2803179>`_
Modifications to QEMU include:
* wrappers for clock and time functions to save their return values in the log
* saving different asynchronous events (e.g. system shutdown) into the log
* synchronization of bottom-half execution
* synchronization of the threads from the thread pool
* recording/replaying user input (mouse, keyboard, and microphone)
* adding internal checkpoints for CPU and I/O synchronization
* network filter for recording and replaying the packets
* block driver for making block layer deterministic
* serial port input record and replay
* recording of random numbers obtained from the external sources
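As an illustration of the first item in the list above, a clock wrapper can
save each return value while recording and hand the saved values back while
replaying. This is a minimal sketch under stated assumptions: ``ReplayClock``,
its ``mode`` argument, and the in-memory list used as a log are hypothetical
and do not correspond to QEMU's actual functions.

```python
import time


class ReplayClock:
    """Toy clock wrapper (hypothetical, not a QEMU API)."""

    def __init__(self, mode, log=None, source=time.monotonic_ns):
        if mode not in ("record", "replay"):
            raise ValueError("mode must be 'record' or 'replay'")
        self.mode = mode
        self.log = [] if log is None else log
        self.source = source
        self.pos = 0  # next log entry to return in replay mode

    def read(self):
        if self.mode == "record":
            # Query the real clock and save the value in the log.
            value = self.source()
            self.log.append(value)
            return value
        # In replay mode the host clock is never consulted; the guest
        # sees exactly the values that were recorded.
        value = self.log[self.pos]
        self.pos += 1
        return value


recorder = ReplayClock("record")
recorded = [recorder.read() for _ in range(3)]

replayer = ReplayClock("replay", log=list(recorder.log))
replayed = [replayer.read() for _ in range(3)]
```

After this runs, ``replayed`` equals ``recorded`` even though the host clock
has moved on, which is the property the wrappers are meant to provide.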
Instruction counting
--------------------
QEMU must run in icount mode to use the record/replay feature. icount was
designed to allow deterministic execution in the absence of external inputs
to the virtual machine. We also use icount to control the occurrence of
non-deterministic events. The number of instructions executed since the last event
is written to the log while recording the execution. In replay mode, we
can predict when to inject that event using the instruction counter.
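The replay side can be sketched with a toy execution loop. The sketch below
assumes a log of ``(delta, payload)`` pairs and a CPU that runs in blocks of at
most ``block_size`` instructions; the function name ``replay`` and both
parameters are hypothetical and only model the idea, they are not QEMU code.
The instruction budget of each block is capped so that execution stops exactly
where the next logged event has to be injected.

```python
def replay(log, block_size, total):
    """Return (icount, payload) for each event injected during replay.

    log        -- list of (delta, payload); delta is the number of
                  instructions executed since the previous event
    block_size -- maximum instructions run in one step (must be > 0)
    total      -- number of instructions to execute overall
    """
    icount = 0
    injected = []
    idx = 0
    until_event = log[0][0] if log else None

    while icount < total:
        budget = min(block_size, total - icount)
        if idx < len(log):
            # Never run past the point where the next event is due.
            budget = min(budget, until_event)

        icount += budget  # "execute" budget instructions

        if idx < len(log):
            until_event -= budget
            if until_event == 0:
                # The counter matches the logged position: inject now.
                injected.append((icount, log[idx][1]))
                idx += 1
                if idx < len(log):
                    until_event = log[idx][0]
    return injected


events = replay([(1000, "net-packet"), (750, "key-press")],
                block_size=400, total=2000)
```

With these inputs the two events are injected at instruction counts 1000 and
1750, independent of the block size chosen.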
Locking and thread synchronisation
----------------------------------
Previously, the synchronisation of the main thread and the vCPU thread
was ensured by holding the BQL. However, the trend has been to
reduce the time the BQL is held across the system, including under TCG
system emulation. As it is important that batches of events are kept
in sequence (e.g. expiring timers and checkpoints in the main thread
while instruction checkpoints are written by the vCPU thread), we need
another lock to keep things in lock-step. This role is now handled by
the replay_mutex_lock. It used to be held only for each event being
written, but now it is held for a whole execution period. This results
in a deterministic ping-pong between the two main threads.
As the BQL is now a finer-grained lock than the replay_lock, it is almost
certainly a bug, and a source of deadlocks, to take the
replay_mutex_lock while the BQL is held. This is enforced by an assert.
While the unlocks are usually in the reverse order, this is not
necessary; you can drop the replay_lock while holding the BQL, without
doing a more complicated unlock_iothread/replay_unlock/lock_iothread
sequence.
Checkpoints
-----------
Replaying the execution of a virtual machine is bound by sources of
non-determinism. These are inputs from clocks and peripheral devices,
and QEMU thread scheduling. Thread scheduling affects the processing of events
from timers, asynchronous input-output, and bottom halves.