
[RFC,v10,02/24] replay: global variables and function stubs

Message ID 20150227130958.11912.56622.stgit@PASHA-ISP
State New

Commit Message

Pavel Dovgalyuk Feb. 27, 2015, 1:09 p.m. UTC
This patch adds global variables, defines, function declarations,
and function stubs for deterministic VM replay used by external modules.

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>

Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
---
 Makefile.target      |    1 
 docs/replay.txt      |  161 ++++++++++++++++++++++++++++++++++++++++++++++++++
 qapi-schema.json     |   18 ++++++
 replay/Makefile.objs |    1 
 replay/replay.c      |   14 ++++
 replay/replay.h      |   19 ++++++
 stubs/Makefile.objs  |    1 
 stubs/replay.c       |    3 +
 8 files changed, 218 insertions(+), 0 deletions(-)
 create mode 100755 docs/replay.txt
 create mode 100755 replay/Makefile.objs
 create mode 100755 replay/replay.c
 create mode 100755 replay/replay.h
 create mode 100755 stubs/replay.c

Comments

Eric Blake March 13, 2015, 7:07 p.m. UTC | #1
On 02/27/2015 06:09 AM, Pavel Dovgalyuk wrote:
> This patch adds global variables, defines, functions declarations,

s/functions/function/

> and function stubs for deterministic VM replay used by external modules.
> 
> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
> Reviewed-by: Eric Blake <eblake@redhat.com>
> 
> Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
> ---

> +++ b/docs/replay.txt
> @@ -0,0 +1,161 @@
> +Record/replay
> +-------------

It might be nice to include an explicit copyright and license statement
at the top of this file (without one, you have inherited the top-level
default of GPLv2+).

> +
> +Record/replay functions are used for the reverse execution and deterministic 
> +replay of qemu execution. This implementation of deterministic replay can 
> +be used for deterministic debugging of guest code through gdb remote

s/through/through a/

> +interface.
> +
> +Execution recording writes non-deterministic events log, which can be later 

s/writes/writes a/

> +used for replaying the execution anywhere and for unlimited number of times. 
> +It also supports checkpointing for faster rewinding during reverse debugging. 
> +Execution replaying reads the log and replays all non-deterministic events 
> +including external input, hardware clocks, and interrupts.
> +
> +Deterministic replay has the following features:
> + * Deterministically replays whole system execution and all contents of 
> +   the memory, state of the hadrware devices, clocks, and screen of the VM.

s/hadrware/hardware/

> + * Writes execution log into the file for latter replaying for multiple times 

s/latter/later/

> +   on different machines.
> + * Supports i386, x86_64, and ARM hardware platforms.
> + * Performs deterministic replay of all operations with keyboard and mouse
> +   input devices.
> +
> +Usage of the record/replay:
> + * First, record the execution, by adding the following string to the command line:

s/string/arguments/ (it is not 1 string argument with 3 embedded spaces,
but four separate arguments)

> +   '-icount shift=7,rr=record,rrfile=replay.bin -net none'. 
> +   Block devices' images are not actually changed in the recording mode, 
> +   because all of the changes are written to the temporary overlay file.
> + * Then you can replay it for the multiple times by using another command

s/for the multiple times//

> +   line option: '-icount shift=7,rr=replay,rrfile=replay.bin -net none'
> + * '-net none' option should also be specified if network replay patches
> +   are not applied.
> +
> +Paper with short description of deterministic replay implementation:
> +http://www.computer.org/csdl/proceedings/csmr/2012/4666/00/4666a553-abs.html
> +
> +Modifications of qemu include:
> + * wrappers for clock and time functions to save their return values in the log
> + * saving different asynchronous events (e.g. system shutdown) into the log
> + * synchronization of the bottom halves execution
> + * synchronization of the threads from thread pool
> + * recording/replaying user input (mouse and keyboard)
> + * adding internal checkpoints for cpu and io synchronization
> +
> +Non-deterministic events
> +------------------------
> +
> +Our record/replay system is based on saving and replaying non-deterministic 
> +events (e.g. keyboard input) and simulating deterministic ones (e.g. reading 
> +from HDD or memory of the VM). Saving only non-deterministic events makes 
> +log file smaller, simulation faster, and allows using reverse debugging even 
> +for realtime applications. 
> +
> +The following non-deterministic data from peripheral devices is saved into 
> +the log: mouse and keyboard input, network packets, audio controller input, 
> +USB packets, serial port input, and hardware clocks (they are non-deterministic 
> +too, because their values are taken from the host machine). Inputs from 
> +simulated hardware, memory of VM, software interrupts, and execution of 
> +instructions are not saved into the log, because they are deterministic and 
> +can be replayed by simulating the behavior of virtual machine starting from 
> +initial state.
> +
> +We had to solve three tasks to implement deterministic replay: recording 
> +non-deterministic events, replaying non-deterministic events, and checking 
> +that there is no divergence between record and replay modes.
> +
> +We changed several parts of QEMU to make event log recording and replaying.
> +Devices' models that have non-deterministic input from external devices were 
> +changed to write every external event into the execution log immediately. 
> +E.g. network packets are written into the log when they arrive into the virtual 
> +network adapter.
> +
> +All non-deterministic events are coming from these devices. But to 
> +replay them we need to know at which moments they occur. We specify 
> +these moments by counting the number of instructions executed between 
> +every pair of consecutive events.
> +
> +Instructions counting

s/Instructions/Instruction/

> +---------------------
> +
> +QEMU should work in icount mode to use record/replay feature. icount was
> +designed to allow deterministic execution in absense of external inputs

s/absense/absence/

> +of the virtual machine. We also use icount to control the occurence of the

s/occurence/occurrence/

> +non-deterministic events. Number of instruction passed from the last event

s/Number/The number/
s/instruction passed/instructions elapsed/

> +is written to the log while recording the execution. In replay mode we
> +can predict when to inject that event using the instructions counter.

s/instructions/instruction/

> +
> +Timers
> +------
> +
> +Timers are used to execute callbacks from different subsystems of QEMU
> +at the specified moments of time. There are several kinds of timers: 
> + * Real time clock. Based on host time and used only for callbacks that 
> +   do not change the virtual machine state. For this reason real time
> +   clock and timers does not affect deterministic replay at all;

s/;/./

> + * Virtual clock. These timers run only during the emulation. In icount
> +   mode virtual clock value is calculated using executed instructions counter.
> +   That is why it is completely deterministic and does not have to be recorded;
> + * Host clock. This clock is used by device models that simulate real time
> +   sources (e.g. real time clock chip). Host clock is the one of the sources
> +   of non-determinism. Host clock read operations should be logged to
> +   make the execution deterministic.
> + * Real time clock for icount. This clock is similar to real time clock but
> +   it is used only for increasing virtual clock while virtual machine is
> +   sleeping. Due to its nature it is also non-deterministic as the host clock
> +   and has to be logged too.
> +
> +Checkpoints
> +-----------
> +
> +Replaying of the execution of virtual machine is bound by sources of
> +non-determinism. These are inputs from clock and peripheral devices,
> +and QEMU thread scheduling. Thread scheduling affect on processing events
> +from timers, asynchronous input-output, and bottom halves.
> +
> +Invocations of timers are coupled with clock reads and changing the state 
> +of the virtual machine. Reads produce non-deterministic data taken from
> +host clock. And VM state changes should preserve their order. Their relative
> +order in replay mode must replicate the order of callbacks in record mode.
> +To preserve this order we use checkpoints. When specific clock is processed

s/When/When a/

> +in record mode we save to the log special ''checkpoint`` event.

s/``/''/ or even s/''\(.*\)``/"\1"/

> +Checkpoints here do not refer to virtual machine snapshots. They are just
> +record/replay events used for synchronization.
> +
> +QEMU in replay mode will try to invoke timers processing in random moment 
> +of time. That's why we do not process group of timers until the checkpoint

s/group/a group/

> +event will be read from the log. Such an event allows synchronizing CPU 
> +execution and timer events.
> +
> +Another checkpoints application in record/replay is instructions counting

s/instructions/instruction/

> +while the virtual machine is idle. This function (qemu_clock_warp) is called
> +from the wait loop. It changes virtual machine state and must be deterministic
> +then. That is why we added checkpoint to this function to prevent its 
> +operation in replay mode when it does not correspond to record mode.
> +
> +Bottom halves
> +-------------
> +
> +Disk I/O events are completely deterministic in our model, because 
> +in both record and replay modes we start virtual machine from the same 
> +disk state. But callbacks that virtual disk controller uses for reading and
> +writing the disk may occur at different moments of time in record and replay
> +modes.
> +
> +Reading and writing requests are created by CPU thread of QEMU. Later these
> +requests proceed to block layer which creates ''bottom halves``. Bottom

Another instance of your odd quoting style.  Make it consistent with the
solution you chose above.

> +halves consist of callback and its parameters. They are processed when
> +main loop locks the global mutex. These locking are not synchronized with

s/locking/locks/

> +replaying process because main loop also processes the events that do not
> +affect the virtual machine state (like user interation with monitor).

s/interation/interaction/

> +
> +That is why we had to implement saving and replaying bottom halves callbacks
> +synchronously to the CPU execution. When the callback is about to execute
> +it is added to the queue in the replay module. This queue is written to the
> +log when its callbacks are executed. In replay mode callbacks do not processed

s/do/are/

> +until the corresponding event is read from the events log file.
> +
> +Sometimes block layer uses asynchronous callbacks for its internal purposes 

s/block/the block/

> +(like reading or writing VM snapshots or disk image cluster tables). In this
> +case bottom halves are not marked as ''replayable`` and do not saved 

No trailing whitespace.  Another instance of odd quoting style.

> +into the log.

Patch

diff --git a/Makefile.target b/Makefile.target
index 58c6ae1..cd939c1 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -84,6 +84,7 @@  all: $(PROGS) stap
 # cpu emulator library
 obj-y = exec.o translate-all.o cpu-exec.o
 obj-y += tcg/tcg.o tcg/tcg-op.o tcg/optimize.o
+obj-y += replay/
 obj-$(CONFIG_TCG_INTERPRETER) += tci.o
 obj-$(CONFIG_TCG_INTERPRETER) += disas/tci.o
 obj-y += fpu/softfloat.o
diff --git a/docs/replay.txt b/docs/replay.txt
new file mode 100755
index 0000000..084ca65
--- /dev/null
+++ b/docs/replay.txt
@@ -0,0 +1,161 @@ 
+Record/replay
+-------------
+
+Record/replay functions are used for the reverse execution and deterministic
+replay of QEMU execution. This implementation of deterministic replay can
+be used for deterministic debugging of guest code through a gdb remote
+interface.
+
+Execution recording writes a log of non-deterministic events, which can later
+be used for replaying the execution anywhere and any number of times.
+It also supports checkpointing for faster rewinding during reverse debugging.
+Execution replaying reads the log and replays all non-deterministic events,
+including external input, hardware clocks, and interrupts.
+
+Deterministic replay has the following features:
+ * Deterministically replays the whole system execution and all contents of
+   the memory, state of the hardware devices, clocks, and screen of the VM.
+ * Writes the execution log into a file for later replaying, multiple times
+   and on different machines.
+ * Supports i386, x86_64, and ARM hardware platforms.
+ * Performs deterministic replay of all operations with keyboard and mouse
+   input devices.
+
+Usage of record/replay:
+ * First, record the execution by adding the following arguments to the command line:
+   '-icount shift=7,rr=record,rrfile=replay.bin -net none'.
+   Block device images are not actually changed in recording mode,
+   because all of the changes are written to a temporary overlay file.
+ * Then you can replay it by using another set of command
+   line options: '-icount shift=7,rr=replay,rrfile=replay.bin -net none'
+ * The '-net none' option should also be specified if network replay patches
+   are not applied.
+
+A paper with a short description of the deterministic replay implementation:
+http://www.computer.org/csdl/proceedings/csmr/2012/4666/00/4666a553-abs.html
+
+Modifications of QEMU include:
+ * wrappers for clock and time functions to save their return values in the log
+ * saving different asynchronous events (e.g. system shutdown) into the log
+ * synchronization of bottom half execution
+ * synchronization of the threads from the thread pool
+ * recording/replaying user input (mouse and keyboard)
+ * adding internal checkpoints for CPU and I/O synchronization
+
+Non-deterministic events
+------------------------
+
+Our record/replay system is based on saving and replaying non-deterministic
+events (e.g. keyboard input) and simulating deterministic ones (e.g. reading
+from HDD or memory of the VM). Saving only non-deterministic events makes the
+log file smaller and simulation faster, and allows using reverse debugging even
+for realtime applications.
+
+The following non-deterministic data from peripheral devices is saved into
+the log: mouse and keyboard input, network packets, audio controller input,
+USB packets, serial port input, and hardware clocks (they are non-deterministic
+too, because their values are taken from the host machine). Inputs from
+simulated hardware, memory of the VM, software interrupts, and execution of
+instructions are not saved into the log, because they are deterministic and
+can be replayed by simulating the behavior of the virtual machine starting from
+its initial state.
+
+We had to solve three tasks to implement deterministic replay: recording 
+non-deterministic events, replaying non-deterministic events, and checking 
+that there is no divergence between record and replay modes.
+
+We changed several parts of QEMU to implement event log recording and replaying.
+Device models that receive non-deterministic input from external devices were
+changed to write every external event into the execution log immediately.
+For example, network packets are written into the log when they arrive at the
+virtual network adapter.
+
+All non-deterministic events come from these devices. But to
+replay them we need to know at which moments they occur. We specify 
+these moments by counting the number of instructions executed between 
+every pair of consecutive events.
+
+Instruction counting
+--------------------
+
+QEMU has to run in icount mode to use the record/replay feature. icount was
+designed to allow deterministic execution in the absence of external inputs
+of the virtual machine. We also use icount to control the occurrence of
+non-deterministic events. The number of instructions elapsed since the last
+event is written to the log while recording the execution. In replay mode we
+can predict when to inject that event using the instruction counter.
+
+Timers
+------
+
+Timers are used to execute callbacks from different subsystems of QEMU
+at the specified moments of time. There are several kinds of timers:
+ * Real time clock. Based on host time and used only for callbacks that
+   do not change the virtual machine state. For this reason the real time
+   clock and its timers do not affect deterministic replay at all.
+ * Virtual clock. These timers run only during emulation. In icount
+   mode the virtual clock value is calculated from the instruction counter.
+   That is why it is completely deterministic and does not have to be recorded.
+ * Host clock. This clock is used by device models that simulate real time
+   sources (e.g. a real time clock chip). The host clock is one of the sources
+   of non-determinism. Host clock read operations should be logged to
+   make the execution deterministic.
+ * Real time clock for icount. This clock is similar to the real time clock,
+   but it is used only for advancing the virtual clock while the virtual machine
+   is sleeping. By its nature it is non-deterministic, like the host clock,
+   and has to be logged too.
+
+Checkpoints
+-----------
+
+Replaying the execution of the virtual machine is bound by sources of
+non-determinism. These are inputs from clocks and peripheral devices,
+and QEMU thread scheduling. Thread scheduling affects the processing of events
+from timers, asynchronous input-output, and bottom halves.
+
+Invocations of timers are coupled with clock reads and changes to the state
+of the virtual machine. Reads produce non-deterministic data taken from the
+host clock, and VM state changes must preserve their order: their relative
+order in replay mode must replicate the order of callbacks in record mode.
+To preserve this order we use checkpoints. When a specific clock is processed
+in record mode, we save a special "checkpoint" event to the log.
+Checkpoints here do not refer to virtual machine snapshots. They are just
+record/replay events used for synchronization.
+
+In replay mode QEMU may try to process timers at an arbitrary moment
+of time. That's why we do not process a group of timers until the checkpoint
+event is read from the log. Such an event allows synchronizing CPU
+execution and timer events.
+
+Another application of checkpoints in record/replay is instruction counting
+while the virtual machine is idle. The function doing this (qemu_clock_warp) is
+called from the wait loop. It changes the virtual machine state and therefore
+must be deterministic. That is why we added a checkpoint to this function to
+prevent its operation in replay mode when it does not correspond to record mode.
+
+Bottom halves
+-------------
+
+Disk I/O events are completely deterministic in our model, because
+in both record and replay modes we start the virtual machine from the same
+disk state. But the callbacks that the virtual disk controller uses for reading
+and writing the disk may occur at different moments of time in record and
+replay modes.
+
+Reading and writing requests are created by the CPU thread of QEMU. Later these
+requests proceed to the block layer, which creates "bottom halves". Bottom
+halves consist of a callback and its parameters. They are processed when the
+main loop locks the global mutex. These locks are not synchronized with the
+replaying process, because the main loop also processes events that do not
+affect the virtual machine state (like user interaction with the monitor).
+
+That is why we had to implement saving and replaying bottom half callbacks
+synchronously with CPU execution. When a callback is about to execute,
+it is added to the queue in the replay module. This queue is written to the
+log when its callbacks are executed. In replay mode callbacks are not processed
+until the corresponding event is read from the event log file.
+
+Sometimes the block layer uses asynchronous callbacks for its internal purposes
+(like reading or writing VM snapshots or disk image cluster tables). In this
+case bottom halves are not marked as "replayable" and are not saved
+into the log.
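The "Non-deterministic events" and "Instruction counting" sections of docs/replay.txt above describe logging each external event together with the number of instructions executed since the previous event, then injecting it in replay mode when the counter reaches that point. The C sketch below illustrates only that idea in isolation. It is not the log format or API of this patch series: ToyEvent, ToyLog and the toy_* functions are hypothetical, and a plain uint64_t stands in for QEMU's icount.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical log record: NOT the on-disk format used by the replay series. */
typedef struct {
    uint64_t icount_delta; /* instructions executed since the previous event */
    int      payload;      /* stand-in for event data, e.g. a key code */
} ToyEvent;

#define TOY_LOG_MAX 16

typedef struct {
    ToyEvent events[TOY_LOG_MAX];
    size_t   count;       /* number of recorded events */
    size_t   next;        /* index of the next event to replay */
    uint64_t last_icount; /* instruction count at the previous event */
} ToyLog;

/* Record mode: an external event arrived after `icount` instructions. */
static void toy_record_event(ToyLog *ilog, uint64_t icount, int payload)
{
    assert(ilog->count < TOY_LOG_MAX);
    assert(icount >= ilog->last_icount);
    ilog->events[ilog->count].icount_delta = icount - ilog->last_icount;
    ilog->events[ilog->count].payload = payload;
    ilog->count++;
    ilog->last_icount = icount;
}

/* Switch the same log from recording to replaying. */
static void toy_replay_start(ToyLog *ilog)
{
    ilog->next = 0;
    ilog->last_icount = 0;
}

/* Replay mode: called with the current instruction count. Returns 1 and
 * stores the payload if the next logged event is due now, otherwise 0. */
static int toy_replay_poll(ToyLog *ilog, uint64_t icount, int *payload)
{
    if (ilog->next >= ilog->count) {
        return 0;
    }
    const ToyEvent *ev = &ilog->events[ilog->next];
    if (icount - ilog->last_icount != ev->icount_delta) {
        return 0;
    }
    *payload = ev->payload;
    ilog->last_icount = icount;
    ilog->next++;
    return 1;
}
```

In record mode a device model would call toy_record_event() as input arrives; in replay mode the execution loop would poll after each instruction and keep polling until nothing more is due, because several events may share the same instruction count (a delta of zero).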
diff --git a/qapi-schema.json b/qapi-schema.json
index e16f8eb..ec6200a 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -3606,3 +3606,21 @@ 
 # Since: 2.1
 ##
 { 'command': 'rtc-reset-reinjection' }
+
+##
+# ReplayMode:
+#
+# Mode of the replay subsystem.
+#
+# @none: normal execution mode. Neither replay nor record is enabled.
+#
+# @record: record mode. All non-deterministic data is written into the
+#          replay log.
+#
+# @play: replay mode. Non-deterministic data required for system execution
+#        is read from the log.
+#
+# Since: 2.3
+##
+{ 'enum': 'ReplayMode',
+  'data': [ 'none', 'record', 'play' ] }
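The ReplayMode enum added to qapi-schema.json is turned into C by QEMU's QAPI code generator and reaches replay/replay.h through qapi-types.h. The hand-written approximation below assumes the generator's usual naming (REPLAY_MODE_NONE, REPLAY_MODE_RECORD, REPLAY_MODE_PLAY plus a REPLAY_MODE_MAX sentinel); the exact generated code may differ between QEMU versions. It also shows why the definition in stubs/replay.c needs no initializer: a file-scope object without one is zero-initialized, which is REPLAY_MODE_NONE. ReplayMode_names and replay_mode_parse() are hypothetical helpers, not generated code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Approximation of the generated enum (assumption: real output may differ). */
typedef enum ReplayMode {
    REPLAY_MODE_NONE = 0,
    REPLAY_MODE_RECORD = 1,
    REPLAY_MODE_PLAY = 2,
    REPLAY_MODE_MAX = 3,
} ReplayMode;

/* Hypothetical name table mirroring the schema's 'data' list. */
static const char *const ReplayMode_names[REPLAY_MODE_MAX] = {
    [REPLAY_MODE_NONE] = "none",
    [REPLAY_MODE_RECORD] = "record",
    [REPLAY_MODE_PLAY] = "play",
};

/* Same shape as the stub: no initializer, so it starts as 0 == REPLAY_MODE_NONE. */
ReplayMode replay_mode;

/* Hypothetical helper: map a schema name to a ReplayMode.
 * Returns 0 on success, -1 if the name is unknown. */
static int replay_mode_parse(const char *name, ReplayMode *out)
{
    assert(name != NULL && out != NULL);
    for (int i = 0; i < REPLAY_MODE_MAX; i++) {
        if (strcmp(ReplayMode_names[i], name) == 0) {
            *out = (ReplayMode)i;
            return 0;
        }
    }
    return -1;
}
```

Note that the schema value for replaying is spelled "play", so a lookup of "replay" against these names fails.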
diff --git a/replay/Makefile.objs b/replay/Makefile.objs
new file mode 100755
index 0000000..7ea860f
--- /dev/null
+++ b/replay/Makefile.objs
@@ -0,0 +1 @@ 
+obj-y += replay.o
diff --git a/replay/replay.c b/replay/replay.c
new file mode 100755
index 0000000..5ce066f
--- /dev/null
+++ b/replay/replay.c
@@ -0,0 +1,14 @@ 
+/*
+ * replay.c
+ *
+ * Copyright (c) 2010-2015 Institute for System Programming
+ *                         of the Russian Academy of Sciences.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "replay.h"
+
+ReplayMode replay_mode = REPLAY_MODE_NONE;
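The "Timers" section of docs/replay.txt says host clock reads have to be logged. Below is a minimal sketch, assuming a mode switch like the replay_mode global defined in replay/replay.c, of how a clock-read wrapper could behave in each mode. toy_read_host_clock(), ClockLog and fake_host_clock() are hypothetical; the enum is redeclared locally only to keep the example self-contained, and the real clock wrappers are not part of this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-in for the QAPI-generated enum. */
typedef enum { REPLAY_MODE_NONE, REPLAY_MODE_RECORD, REPLAY_MODE_PLAY } ReplayMode;

#define CLOCK_LOG_MAX 32

typedef struct {
    int64_t values[CLOCK_LOG_MAX];
    size_t  count; /* values written in record mode */
    size_t  pos;   /* next value to return in play mode */
} ClockLog;

/* Hypothetical wrapper around a host clock read; `read_host` stands in
 * for the real host clock source. */
static int64_t toy_read_host_clock(ReplayMode mode, ClockLog *clk_log,
                                   int64_t (*read_host)(void))
{
    switch (mode) {
    case REPLAY_MODE_RECORD: {
        /* Read the real clock and remember what the guest saw. */
        int64_t v = read_host();
        assert(clk_log->count < CLOCK_LOG_MAX);
        clk_log->values[clk_log->count++] = v;
        return v;
    }
    case REPLAY_MODE_PLAY:
        /* Ignore the host: return the value observed during recording. */
        assert(clk_log->pos < clk_log->count);
        return clk_log->values[clk_log->pos++];
    case REPLAY_MODE_NONE:
    default:
        return read_host();
    }
}

/* Stand-in host clock for illustration: advances by 7 on every read. */
static int64_t fake_host_now = 1000;
static int64_t fake_host_clock(void)
{
    fake_host_now += 7;
    return fake_host_now;
}
```

Because play mode never calls read_host(), the guest observes the recorded values regardless of how fast or slow the host runs during replay.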
diff --git a/replay/replay.h b/replay/replay.h
new file mode 100755
index 0000000..d6b73c3
--- /dev/null
+++ b/replay/replay.h
@@ -0,0 +1,19 @@ 
+#ifndef REPLAY_H
+#define REPLAY_H
+
+/*
+ * replay.h
+ *
+ * Copyright (c) 2010-2015 Institute for System Programming
+ *                         of the Russian Academy of Sciences.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qapi-types.h"
+
+extern ReplayMode replay_mode;
+
+#endif
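The "Checkpoints" section of docs/replay.txt describes holding back a group of timers in replay mode until a checkpoint event is read from the log. A minimal sketch of that rule follows, assuming the log is a simple array of event kinds; EventLog, ToyEventKind and toy_checkpoint() are hypothetical names and are not declared by replay/replay.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Local stand-in for the QAPI-generated enum. */
typedef enum { REPLAY_MODE_NONE, REPLAY_MODE_RECORD, REPLAY_MODE_PLAY } ReplayMode;

/* Hypothetical event kinds: a checkpoint, or some other logged input. */
typedef enum { EV_CHECKPOINT, EV_INPUT } ToyEventKind;

#define EV_LOG_MAX 32

typedef struct {
    ToyEventKind kinds[EV_LOG_MAX];
    size_t count; /* events written */
    size_t pos;   /* next event to read in play mode */
} EventLog;

/* Hypothetical checkpoint: returns true if the caller may run its
 * group of timers now. */
static bool toy_checkpoint(ReplayMode mode, EventLog *elog)
{
    switch (mode) {
    case REPLAY_MODE_RECORD:
        /* Note in the log that timers were processed at this point. */
        assert(elog->count < EV_LOG_MAX);
        elog->kinds[elog->count++] = EV_CHECKPOINT;
        return true;
    case REPLAY_MODE_PLAY:
        if (elog->pos < elog->count && elog->kinds[elog->pos] == EV_CHECKPOINT) {
            elog->pos++; /* consume the checkpoint */
            return true;
        }
        return false;    /* recording had no checkpoint here: timers wait */
    case REPLAY_MODE_NONE:
    default:
        return true;
    }
}
```

A caller would run its timers only when toy_checkpoint() returns true; in replay mode a false result means other logged events must be replayed first, which keeps timer callbacks in their recorded position relative to CPU execution.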
diff --git a/stubs/Makefile.objs b/stubs/Makefile.objs
index 5e347d0..45a6c71 100644
--- a/stubs/Makefile.objs
+++ b/stubs/Makefile.objs
@@ -27,6 +27,7 @@  stub-obj-y += notify-event.o
 stub-obj-y += pci-drive-hot-add.o
 stub-obj-$(CONFIG_SPICE) += qemu-chr-open-spice.o
 stub-obj-y += qtest.o
+stub-obj-y += replay.o
 stub-obj-y += reset.o
 stub-obj-y += runstate-check.o
 stub-obj-y += set-fd-handler.o
diff --git a/stubs/replay.c b/stubs/replay.c
new file mode 100755
index 0000000..563c777
--- /dev/null
+++ b/stubs/replay.c
@@ -0,0 +1,3 @@ 
+#include "replay/replay.h"
+
+ReplayMode replay_mode;
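The "Bottom halves" section of docs/replay.txt explains that bottom half callbacks are logged when they run in record mode and held back in replay mode until the matching log event is read. The sketch below models this with a fixed-size queue. It assumes each bottom half gets an identifier in creation order and that creation order is itself deterministic (the requests come from the CPU thread); ToyBH, ToyReplayQueue, toy_bh_new(), toy_bh_try_run() and the trace callback are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-in for the QAPI-generated enum. */
typedef enum { REPLAY_MODE_NONE, REPLAY_MODE_RECORD, REPLAY_MODE_PLAY } ReplayMode;

typedef void (*BHFunc)(void *opaque);

typedef struct {
    unsigned id;      /* assigned in creation order (assumed deterministic) */
    BHFunc   cb;      /* callback ... */
    void    *opaque;  /* ... and its parameter */
    int      pending; /* 1 until the callback has run */
} ToyBH;

#define BH_MAX 16

typedef struct {
    ToyBH    bh[BH_MAX];
    size_t   nbh;
    unsigned log[BH_MAX]; /* bottom half ids in execution order */
    size_t   log_len, log_pos;
} ToyReplayQueue;

static ToyBH *toy_bh_new(ToyReplayQueue *q, BHFunc cb, void *opaque)
{
    assert(q->nbh < BH_MAX);
    ToyBH *b = &q->bh[q->nbh];
    b->id = (unsigned)q->nbh;
    b->cb = cb;
    b->opaque = opaque;
    b->pending = 1;
    q->nbh++;
    return b;
}

/* Called when the main loop wants to run `b`.
 * Returns 1 if the callback ran, 0 if it was deferred. */
static int toy_bh_try_run(ReplayMode mode, ToyReplayQueue *q, ToyBH *b)
{
    if (!b->pending) {
        return 0;
    }
    if (mode == REPLAY_MODE_PLAY) {
        if (q->log_pos >= q->log_len || q->log[q->log_pos] != b->id) {
            return 0; /* not this callback's turn in the recording yet */
        }
        q->log_pos++;
    } else if (mode == REPLAY_MODE_RECORD) {
        assert(q->log_len < BH_MAX);
        q->log[q->log_len++] = b->id; /* remember the execution order */
    }
    b->pending = 0;
    b->cb(b->opaque);
    return 1;
}

/* Example callback: appends *(int *)opaque to a trace. */
static int toy_trace[BH_MAX];
static size_t toy_trace_len;
static void toy_trace_cb(void *opaque)
{
    assert(toy_trace_len < BH_MAX);
    toy_trace[toy_trace_len++] = *(int *)opaque;
}
```

The identifier is what ties a replayed callback to its recorded position: the main loop may offer callbacks in any order, and in replay mode the ones that are not yet due simply stay pending.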