
[v7,19/19] replay: document development rules

Message ID 20181010133522.24538.48800.stgit@pasha-VirtualBox
State New
Series Fixing record/replay and adding reverse debugging

Commit Message

Pavel Dovgalyuk Oct. 10, 2018, 1:35 p.m. UTC
This patch introduces docs/devel/replay.txt, which describes the rules
that should be followed to make virtual devices usable in record/replay mode.

Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgauk@ispras.ru>
---
 docs/devel/replay.txt |   45 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 45 insertions(+)
 create mode 100644 docs/devel/replay.txt

Comments

Artem Pisarenko Oct. 11, 2018, 3:08 p.m. UTC | #1
Great! I fully support such rules. But I would suggest even more generic
rules that prevent breaking determinism in a wider sense, at least where
such breakage is trivial to avoid.

Currently I'm working on a modification that extends the conditions under
which guest execution is kept deterministic beyond such a narrow set as
"only rtc clock=vm, no serial devices, no network, no external communication
interfaces at all, etc...". I'm also dealing with bugs (features?) that
prevent the advertised determinism even under such restricted conditions. I
found that my work involves efforts very similar to Pavel's, and this is
extra hard work. I feel like I'm fighting hundreds of maintainers and
contributors whose efforts go in the opposite direction. It starts from the
QEMU core, which encourages asynchronous processing (aio, threads, virtio
ioeventfd, etc.), and ends with particular modules or hardware models that
carelessly use all kinds of non-blocking calls, callbacks, and even an
inappropriate QEMUClockType. Nobody cares about synchronization with the vCPU
thread, except Pavel, me and possibly one or two more people in the whole
world. I can understand why. The main reason is that it might greatly degrade
emulation performance, which could be avoided only by introducing very high
complexity. So my words shouldn't be treated as any kind of criticism. I
perfectly understand that it's a complex issue.

The key difference of record/replay is that it solves the problem by hooking
the calls of every source of asynchrony at a low level and simply replaying
them. In other words, it deals with end effects, whereas the non-record/replay
use case doesn't allow such a solution and has to eliminate the sources of
undesired asynchrony by design. As such, record/replay is strongly immune to
violations of 'generic determinism' rules, and even to hidden and tricky bugs
in any module that affects guest state. That's why the development rules
Pavel imposes are so lenient (relative to the generic ones I would like to
exist).
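The record-then-replay idea described above can be illustrated with a toy model in C. This is not QEMU code: ReplayLog, run_record, run_replay and the other names are hypothetical, the "guest" is a single accumulator, and a linear congruential generator merely stands in for host timing noise. In record mode each asynchronous event is logged together with the guest instruction count at which it was delivered; in replay mode the event source is ignored and the logged events are injected when the guest reaches the same instruction counts.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of record/replay; all names are hypothetical, not QEMU APIs. */
#define MAX_EVENTS 16

typedef struct {
    uint64_t icount;  /* guest instruction count when the event was delivered */
    int      value;   /* payload of the asynchronous event */
} ReplayEvent;

typedef struct {
    ReplayEvent events[MAX_EVENTS];
    int count;        /* events recorded so far */
    int next;         /* next event to deliver during replay */
} ReplayLog;

typedef struct {
    uint64_t icount;  /* instructions executed so far */
    uint64_t acc;     /* some guest-visible state (wraps on overflow) */
} Guest;

static void guest_step(Guest *g)
{
    g->acc = g->acc * 3 + 1;  /* one deterministic "instruction" */
    g->icount++;
}

static void guest_event(Guest *g, int value)
{
    g->acc += (uint64_t)value;  /* the asynchronous event changes guest state */
}

static void log_append(ReplayLog *rlog, uint64_t icount, int value)
{
    assert(rlog->count < MAX_EVENTS);  /* the toy log has a fixed capacity */
    rlog->events[rlog->count].icount = icount;
    rlog->events[rlog->count].value = value;
    rlog->count++;
}

/* Record: the event source fires at host-dependent moments (modelled by an
 * LCG); every delivery is logged with the current instruction count. */
static void run_record(Guest *g, ReplayLog *rlog, uint64_t steps, uint32_t seed)
{
    for (uint64_t i = 0; i < steps; i++) {
        seed = seed * 1103515245u + 12345u;
        if ((seed >> 16) % 4 == 0 && rlog->count < MAX_EVENTS) {
            int value = (int)((seed >> 8) % 100);
            log_append(rlog, g->icount, value);
            guest_event(g, value);
        }
        guest_step(g);
    }
}

/* Replay: the live event source is not consulted; logged events are injected
 * exactly when the guest reaches the recorded instruction count. */
static void run_replay(Guest *g, ReplayLog *rlog, uint64_t steps)
{
    rlog->next = 0;
    for (uint64_t i = 0; i < steps; i++) {
        while (rlog->next < rlog->count &&
               rlog->events[rlog->next].icount == g->icount) {
            guest_event(g, rlog->events[rlog->next].value);
            rlog->next++;
        }
        guest_step(g);
    }
}
```

Because both runs apply the same events at the same instruction counts, the replayed guest ends in the same state as the recorded one, whatever the event source would have done during the second run. This is the "end effects" property that makes record/replay tolerant of asynchrony elsewhere.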

Anyway, it's just a generic idea for discussion. I know it needs to be more
specific. But if nobody expresses interest, I see no reason to continue.

P.S. A trivial example of how QEMU could extend the conditions for
deterministic execution: chardev would write to the backend using blocking
calls, thus making deterministic execution possible in use cases where the
guest has only one serial port, which outputs data to the console, and has no
interaction with the user. At least it would give the user an option to
choose between better performance and determinism.
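A minimal POSIX sketch of the "blocking write to the backend" idea, assuming a backend that is a plain blocking file descriptor. write_all_blocking is a hypothetical helper for illustration, not QEMU's chardev code: it does not return until the whole buffer has been handed to the kernel, retrying on short writes and EINTR, so the caller cannot race ahead of its output.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper (not a QEMU API): write the whole buffer to a
 * blocking fd, looping over short writes and retrying on EINTR.
 * Returns len on success, -1 on error with errno set by write(). */
static ssize_t write_all_blocking(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    size_t done = 0;

    assert(buf != NULL || len == 0);
    while (done < len) {
        ssize_t n = write(fd, p + done, len - done);
        if (n < 0) {
            if (errno == EINTR) {
                continue;   /* interrupted by a signal before writing: retry */
            }
            return -1;      /* real error (or EAGAIN on a non-blocking fd) */
        }
        done += (size_t)n;  /* short write: continue with the remainder */
    }
    return (ssize_t)done;
}
```

The trade-off is the one mentioned above: the calling thread stalls while the backend is slow, which costs performance but keeps the point at which output completes tied to guest execution.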


On Wed, Oct 10, 2018 at 19:32, Pavel Dovgalyuk <Pavel.Dovgaluk@ispras.ru> wrote:


Best regards,
  Artem Pisarenko

Patch

diff --git a/docs/devel/replay.txt b/docs/devel/replay.txt
new file mode 100644
index 0000000..61dac1b
--- /dev/null
+++ b/docs/devel/replay.txt
@@ -0,0 +1,45 @@ 
+The record/replay mechanism, which can be enabled through icount mode, expects
+the virtual devices to satisfy the following requirements.
+
+The main idea behind this document is that everything that affects
+the guest state during execution in icount mode should be deterministic.
+
+Timers
+======
+
+All virtual devices should use the virtual clock for timers that change the
+guest state. The virtual clock is deterministic, therefore such timers are
+deterministic too.
+
+Virtual devices can also use the realtime clock for events that do not change
+the guest state directly. When the clock ticking should depend on VM execution
+speed, use the virtual ext clock. It is not deterministic, but its speed
+depends on the guest execution. This clock is used by virtual devices (e.g.,
+the slirp routing device) that lie outside the replayed guest.
+
+Bottom halves
+=============
+
+Bottom-half callbacks that affect the guest state should be invoked through
+the replay_bh_schedule_event or replay_bh_schedule_oneshot_event functions.
+Their invocations are saved in record mode and synchronized with the existing
+log in replay mode.
+
+Saving/restoring the VM state
+=============================
+
+All fields in the device state structure (including virtual timers)
+should be restored by loadvm to the same values they had before savevm.
+
+Avoid accessing other devices' state, because the order of saving/restoring
+is not defined. This means that you should not call functions like
+'update_irq' in a post_load callback. Save everything explicitly to avoid
+dependencies that may make restoring the VM state non-deterministic.
+
+Stopping the VM
+===============
+
+Stopping the guest should not interfere with its state (with the exception
+of network connections, which could be broken by remote timeouts). The user
+can stop the VM at any moment of replay. Restarting the VM after such a stop
+should not break the replay through an unnecessary guest state change.
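A minimal sketch of the save/restore rule from the patch, using a hypothetical toy device rather than QEMU's VMState machinery: ToyDevState, toydev_save, toydev_load and irq_line_set are all made up for illustration. The derived interrupt level and the virtual-clock timer deadline are saved explicitly, and the restore path copies them back without calling into the (modelled) interrupt controller, so the result does not depend on the order in which devices are restored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy device; not QEMU's VMState API. */
typedef struct {
    uint32_t status;          /* pending interrupt sources */
    uint32_t mask;            /* enabled interrupt sources */
    bool     irq_level;       /* derived: current level of the output IRQ line */
    int64_t  timer_expire_ns; /* virtual-clock deadline, -1 when disarmed */
} ToyDevState;

/* Snapshot written at savevm time: every field, including derived ones. */
typedef struct {
    uint32_t status;
    uint32_t mask;
    bool     irq_level;
    int64_t  timer_expire_ns;
} ToyDevSnapshot;

/* Stand-in for another device (the interrupt controller); counts writes. */
static int irq_line_writes;

static void irq_line_set(bool level)
{
    (void)level;
    irq_line_writes++;   /* a real machine would modify another device here */
}

/* Runtime path only: the guest changed status/mask, so drive the line. */
static void toydev_update_irq(ToyDevState *s)
{
    s->irq_level = (s->status & s->mask) != 0;
    irq_line_set(s->irq_level);
}

static void toydev_save(const ToyDevState *s, ToyDevSnapshot *snap)
{
    assert(s && snap);
    snap->status = s->status;
    snap->mask = s->mask;
    snap->irq_level = s->irq_level;
    snap->timer_expire_ns = s->timer_expire_ns;
}

/* Restore path (the post_load analogue): copy saved values back as-is. It
 * deliberately does not call toydev_update_irq(), because the interrupt
 * controller may not have been restored yet when this runs. */
static void toydev_load(ToyDevState *s, const ToyDevSnapshot *snap)
{
    assert(s && snap);
    s->status = snap->status;
    s->mask = snap->mask;
    s->irq_level = snap->irq_level;
    s->timer_expire_ns = snap->timer_expire_ns;
}
```

Saving irq_level even though it can be recomputed is the point: recomputing it at load time would require touching the interrupt controller, whose own state may still be stale.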