mbox series

[v3,0/4] Introduce attributes for timers subsystem and remove QEMU_CLOCK_VIRTUAL_EXT clock type

Message ID cover.1539860473.git.artem.k.pisarenko@gmail.com
Headers show
Series Introduce attributes for timers subsystem and remove QEMU_CLOCK_VIRTUAL_EXT clock type | expand

Message

Artem Pisarenko Oct. 18, 2018, 11:04 a.m. UTC
Recent patches from series [PATCH v6] "Fixing record/replay and adding reverse debugging" introduced new clock type QEMU_CLOCK_VIRTUAL_EXT and replaced virtual timers in some external subsystems with it.
This resulted in small change to existing behavior, which I consider to be unacceptable.
Processing of virtual timers, belonging to new clock type, was kicked off to main loop, which made them asynchronous with vCPU thread and, in icount mode, with whole guest execution. This breaks expected determinism in non-record/replay icount mode of emulation where these "external subsystems" are isolated from host (i.e. they external only to guest core, not to emulation environment).

Example for slirp ("user" backend for network device):
User runs qemu in icount mode with rtc clock=vm without any external communication interfaces but with "-netdev user,restrict=on". It expects deterministic execution, because network services are emulated inside qemu and isolated from host. There are no reasons to get reply from DHCP server with different delay or something like that.

These series of patches revert those commits and reimplement their modifications in a better way.

Current implementation of timers/clock processing is confusing (at least for me) because of exceptions from design concept behind them, which already introduced by icount mode (which adds QEMU_CLOCK_VIRTUAL_RT). Adding QEMU_CLOCK_VIRTUAL_EXT just made things even more complicated. I consider these "alternative" virtual clocks to be some kind of hacks being convinient only to authors of relevant qemu features. Lets don't touch fundamental clock types and keep them orthogonal to special cases of timers handling.

As far as I understand, original intention of author was just to make optimization in replay log by avoiding storing extra events which don't change guest state directly. Indeed, for example, ipv6 icmp timer in slirp does things which external to guest core and ends with sending network packet to guest, but record/replay will anyway catch event, corresponding to packet reception in guest network frontend, and store it to replay log, so there are no need in making checkpoint for corresponding clock when that timer fires.
If so, then we just need to skip checkpoints for clock values, when only these specific timers are run. It is individual timers which are specific, not clock.
Adding some kind of attribute/flag/property to individual timer allows any special qemu feature (such as record/replay) to inspect it and handle as needed. This design achieves less dependencies, more transparency and has more intuitive and clear sense. For record/replay feature it's achieved with 'EXTERNAL' attribute. The only drawback is that it required to add one extra mutex unlock/lock pair for virtual clock type in timerlist_run_timers() function (see patch 3), but it's being added only when qemu runs in record/replay mode.

Finally, this patch series optimizes checkpointing for all other clocks it applies to.


v3 changes:
 - fixed failed build in last patch
 - attributes has been refactored and restricted to be used only within qemu-timer (as suggested by Stefan Hajnoczi and Paolo Bonzini)

v2 changes:
 - timers/aio interface refactored and improved, addition to couroutines removed (as Paolo Bonzini suggested)
 - fixed wrong reimplementation in qemu-timer.c caused race conditions
 - added bonus patch which optimizes checkpointing for other clocks

P.S. I've tried to test record/replay with slirp, but in replay mode qemu stucks at guest linux boot after "Configuring network interfaces..." message, where DHCP communication takes place. It's broken in a same way both in master and master with reverted commits being fixed.

P.S.2. I wasn't able to test these patches on purely clean master branch because of bugs https://bugs.launchpad.net/qemu/+bug/1790460 and https://bugs.launchpad.net/qemu/+bug/1795369, which workarounded by several not-accepted (yet?) modifications.


Artem Pisarenko (4):
  Revert some patches from recent [PATCH v6] "Fixing record/replay and
    adding reverse debugging"
  Introduce attributes to qemu timer subsystem
  Restores record/replay behavior related to special virtual clock
    processing for timers used in external subsystems.
  Optimize record/replay checkpointing for all clocks it applies to

 include/block/aio.h       |  59 ++++++++++++++++++---
 include/qemu/timer.h      | 128 +++++++++++++++++++++++-----------------------
 slirp/ip6_icmp.c          |   9 ++--
 tests/ptimer-test-stubs.c |  13 +++--
 ui/input.c                |   9 ++--
 util/qemu-timer.c         |  95 +++++++++++++++++++++++-----------
 6 files changed, 201 insertions(+), 112 deletions(-)

Comments

Artem Pisarenko Oct. 18, 2018, noon UTC | #1
Sorry, I forgot to fix long lines in commit messages. Do I need to submit
v4 ?
Paolo Bonzini Oct. 18, 2018, 12:06 p.m. UTC | #2
On 18/10/2018 14:00, Artem Pisarenko wrote:
> Sorry, I forgot to fix long lines in commit messages. Do I need to
> submit v4 ?
> 

No need, don't worry.

Paolo
Pavel Dovgalyuk Jan. 10, 2019, 1:30 p.m. UTC | #3
Hi,

It seems, that this approach is not always correct.
Now timerlist_deadline_ns uses all virtual timers for deadline calculation (including external
ones).
qemu_start_warp_timer uses the deadline for setting warp timer (which should be deterministic).
Therefore warp timer may become non-deterministic and replay may behave differently compared to
recording phase.

We have to rollback these or improve somehow to avoid non-determinism.

Pavel Dovgalyuk

> -----Original Message-----
> From: Artem Pisarenko [mailto:artem.k.pisarenko@gmail.com]
> Sent: Thursday, October 18, 2018 2:04 PM
> To: qemu-devel@nongnu.org
> Cc: Pavel Dovgalyuk; Paolo Bonzini; Artem Pisarenko
> Subject: [PATCH v3 0/4] Introduce attributes for timers subsystem and remove
> QEMU_CLOCK_VIRTUAL_EXT clock type
> 
> Recent patches from series [PATCH v6] "Fixing record/replay and adding reverse debugging"
> introduced new clock type QEMU_CLOCK_VIRTUAL_EXT and replaced virtual timers in some external
> subsystems with it.
> This resulted in small change to existing behavior, which I consider to be unacceptable.
> Processing of virtual timers, belonging to new clock type, was kicked off to main loop, which
> made them asynchronous with vCPU thread and, in icount mode, with whole guest execution. This
> breaks expected determinism in non-record/replay icount mode of emulation where these
> "external subsystems" are isolated from host (i.e. they external only to guest core, not to
> emulation environment).
> 
> Example for slirp ("user" backend for network device):
> User runs qemu in icount mode with rtc clock=vm without any external communication interfaces
> but with "-netdev user,restrict=on". It expects deterministic execution, because network
> services are emulated inside qemu and isolated from host. There are no reasons to get reply
> from DHCP server with different delay or something like that.
> 
> These series of patches revert those commits and reimplement their modifications in a better
> way.
> 
> Current implementation of timers/clock processing is confusing (at least for me) because of
> exceptions from design concept behind them, which already introduced by icount mode (which
> adds QEMU_CLOCK_VIRTUAL_RT). Adding QEMU_CLOCK_VIRTUAL_EXT just made things even more
> complicated. I consider these "alternative" virtual clocks to be some kind of hacks being
> convinient only to authors of relevant qemu features. Lets don't touch fundamental clock types
> and keep them orthogonal to special cases of timers handling.
> 
> As far as I understand, original intention of author was just to make optimization in replay
> log by avoiding storing extra events which don't change guest state directly. Indeed, for
> example, ipv6 icmp timer in slirp does things which external to guest core and ends with
> sending network packet to guest, but record/replay will anyway catch event, corresponding to
> packet reception in guest network frontend, and store it to replay log, so there are no need
> in making checkpoint for corresponding clock when that timer fires.
> If so, then we just need to skip checkpoints for clock values, when only these specific timers
> are run. It is individual timers which are specific, not clock.
> Adding some kind of attribute/flag/property to individual timer allows any special qemu
> feature (such as record/replay) to inspect it and handle as needed. This design achieves less
> dependencies, more transparency and has more intuitive and clear sense. For record/replay
> feature it's achieved with 'EXTERNAL' attribute. The only drawback is that it required to add
> one extra mutex unlock/lock pair for virtual clock type in timerlist_run_timers() function
> (see patch 3), but it's being added only when qemu runs in record/replay mode.
> 
> Finally, this patch series optimizes checkpointing for all other clocks it applies to.
> 
> 
> v3 changes:
>  - fixed failed build in last patch
>  - attributes has been refactored and restricted to be used only within qemu-timer (as
> suggested by Stefan Hajnoczi and Paolo Bonzini)
> 
> v2 changes:
>  - timers/aio interface refactored and improved, addition to couroutines removed (as Paolo
> Bonzini suggested)
>  - fixed wrong reimplementation in qemu-timer.c caused race conditions
>  - added bonus patch which optimizes checkpointing for other clocks
> 
> P.S. I've tried to test record/replay with slirp, but in replay mode qemu stucks at guest
> linux boot after "Configuring network interfaces..." message, where DHCP communication takes
> place. It's broken in a same way both in master and master with reverted commits being fixed.
> 
> P.S.2. I wasn't able to test these patches on purely clean master branch because of bugs
> https://bugs.launchpad.net/qemu/+bug/1790460 and https://bugs.launchpad.net/qemu/+bug/1795369,
> which workarounded by several not-accepted (yet?) modifications.
> 
> 
> Artem Pisarenko (4):
>   Revert some patches from recent [PATCH v6] "Fixing record/replay and
>     adding reverse debugging"
>   Introduce attributes to qemu timer subsystem
>   Restores record/replay behavior related to special virtual clock
>     processing for timers used in external subsystems.
>   Optimize record/replay checkpointing for all clocks it applies to
> 
>  include/block/aio.h       |  59 ++++++++++++++++++---
>  include/qemu/timer.h      | 128 +++++++++++++++++++++++-----------------------
>  slirp/ip6_icmp.c          |   9 ++--
>  tests/ptimer-test-stubs.c |  13 +++--
>  ui/input.c                |   9 ++--
>  util/qemu-timer.c         |  95 +++++++++++++++++++++++-----------
>  6 files changed, 201 insertions(+), 112 deletions(-)
> 
> --
> 2.7.4
Paolo Bonzini Jan. 11, 2019, 10:25 a.m. UTC | #4
On 10/01/19 14:30, Pavel Dovgalyuk wrote:
> Now timerlist_deadline_ns uses all virtual timers for deadline calculation (including external
> ones).
> qemu_start_warp_timer uses the deadline for setting warp timer (which should be deterministic).

Can you introduce a variant of qemu_clock_deadline_ns_all that ignores
external timers, and use it in qemu_start_warp_timer?

timerlist_deadline_ns can be made static, so adding a bool argument to
it is not a big issue for API cleanliness.

In fact, qemu_clock_deadline_ns_all is only ever used with
QEMU_CLOCK_VIRTUAL, so you could also change it to

uint64_t virtual_clock_deadline_ns(bool include_external);

Paolo
Artem Pisarenko Jan. 11, 2019, 1 p.m. UTC | #5
> It seems, that this approach is not always correct.
> Now timerlist_deadline_ns uses all virtual timers for deadline calculation (including external
> ones).
> qemu_start_warp_timer uses the deadline for setting warp timer (which should be deterministic).
> Therefore warp timer may become non-deterministic and replay may behave differently compared to
> recording phase.
>
> We have to rollback these or improve somehow to avoid non-determinism.

I dont understand how this approach would even introduce
non-determinism. I'm not sure about aspects of timers subsystem you
mentioned, assuming that maybe we missed something when tried to
optimize. But this has nothing to do with determinism as long as we
treat all virtual clock timers as deterministic, regardless of
EXTERNAL attribute being set or not. They are intended to be used as
such by design, aren't? This attribute was introduced purely to avoid
extra events in log.

Artem
Pavel Dovgalyuk Jan. 14, 2019, 5:59 a.m. UTC | #6
> From: Paolo Bonzini [mailto:pbonzini@redhat.com]
> On 10/01/19 14:30, Pavel Dovgalyuk wrote:
> > Now timerlist_deadline_ns uses all virtual timers for deadline calculation (including
> external
> > ones).
> > qemu_start_warp_timer uses the deadline for setting warp timer (which should be
> deterministic).
> 
> Can you introduce a variant of qemu_clock_deadline_ns_all that ignores
> external timers, and use it in qemu_start_warp_timer?

Ok.

> timerlist_deadline_ns can be made static, so adding a bool argument to
> it is not a big issue for API cleanliness.
> 
> In fact, qemu_clock_deadline_ns_all is only ever used with
> QEMU_CLOCK_VIRTUAL, so you could also change it to
> 
> uint64_t virtual_clock_deadline_ns(bool include_external);

I started making changes and it seems that this parameter should never be true,
because this function is called either from cpus.c or from the tests.

Pavel Dovgalyuk
Pavel Dovgalyuk Jan. 16, 2019, 6:16 a.m. UTC | #7
> From: Artem Pisarenko [mailto:artem.k.pisarenko@gmail.com]
> > It seems, that this approach is not always correct.
> > Now timerlist_deadline_ns uses all virtual timers for deadline calculation (including
> external
> > ones).
> > qemu_start_warp_timer uses the deadline for setting warp timer (which should be
> deterministic).
> > Therefore warp timer may become non-deterministic and replay may behave differently compared
> to
> > recording phase.
> >
> > We have to rollback these or improve somehow to avoid non-determinism.
> 
> I dont understand how this approach would even introduce
> non-determinism. I'm not sure about aspects of timers subsystem you
> mentioned, assuming that maybe we missed something when tried to
> optimize. But this has nothing to do with determinism as long as we
> treat all virtual clock timers as deterministic, regardless of
> EXTERNAL attribute being set or not. They are intended to be used as
> such by design, aren't? This attribute was introduced purely to avoid
> extra events in log.

Right, but deadline calculation and icount warp events (that are written into the log too)
depend on the active virtual timers. Therefore external ones may affect icount warp
operation sequence.

Pavel Dovgalyuk
Artem Pisarenko Jan. 16, 2019, 8:35 a.m. UTC | #8
ср, 16 янв. 2019 г. в 12:15, Pavel Dovgalyuk <dovgaluk@ispras.ru>:
>
> > From: Artem Pisarenko [mailto:artem.k.pisarenko@gmail.com]
> > > It seems, that this approach is not always correct.
> > > Now timerlist_deadline_ns uses all virtual timers for deadline calculation (including
> > external
> > > ones).
> > > qemu_start_warp_timer uses the deadline for setting warp timer (which should be
> > deterministic).
> > > Therefore warp timer may become non-deterministic and replay may behave differently compared
> > to
> > > recording phase.
> > >
> > > We have to rollback these or improve somehow to avoid non-determinism.
> >
> > I dont understand how this approach would even introduce
> > non-determinism. I'm not sure about aspects of timers subsystem you
> > mentioned, assuming that maybe we missed something when tried to
> > optimize. But this has nothing to do with determinism as long as we
> > treat all virtual clock timers as deterministic, regardless of
> > EXTERNAL attribute being set or not. They are intended to be used as
> > such by design, aren't? This attribute was introduced purely to avoid
> > extra events in log.
>
> Right, but deadline calculation and icount warp events (that are written into the log too)
> depend on the active virtual timers. Therefore external ones may affect icount warp
> operation sequence.
>
> Pavel Dovgalyuk
>

Sorry, but I still don't understand the source of non-determinism.
From what you said I may understand that replay module actually has
some additional non-trivial (for me) dependencies on EXTERNAL
attribute other than one, introduced in 'timerlist_run_timers()' (i.e.
it expects deadline calculations somewhere else to match decisions
made in this function, or something like that). I would agree that
these patch series might introduce some incorrect behavior. But it
must be identical in all emulations running in same conditions,
because deadline, warp timer and their effects are all dependent only
on virtual timers, and, therefore, are deterministic too. Of course,
it needs to be fixed. Did you find solution ?
Pavel Dovgalyuk Jan. 17, 2019, 7:42 a.m. UTC | #9
> From: Artem Pisarenko [mailto:artem.k.pisarenko@gmail.com]
> ср, 16 янв. 2019 г. в 12:15, Pavel Dovgalyuk <dovgaluk@ispras.ru>:
> >
> > > From: Artem Pisarenko [mailto:artem.k.pisarenko@gmail.com]
> > > > It seems, that this approach is not always correct.
> > > > Now timerlist_deadline_ns uses all virtual timers for deadline calculation (including
> > > external
> > > > ones).
> > > > qemu_start_warp_timer uses the deadline for setting warp timer (which should be
> > > deterministic).
> > > > Therefore warp timer may become non-deterministic and replay may behave differently
> compared
> > > to
> > > > recording phase.
> > > >
> > > > We have to rollback these or improve somehow to avoid non-determinism.
> > >
> > > I dont understand how this approach would even introduce
> > > non-determinism. I'm not sure about aspects of timers subsystem you
> > > mentioned, assuming that maybe we missed something when tried to
> > > optimize. But this has nothing to do with determinism as long as we
> > > treat all virtual clock timers as deterministic, regardless of
> > > EXTERNAL attribute being set or not. They are intended to be used as
> > > such by design, aren't? This attribute was introduced purely to avoid
> > > extra events in log.
> >
> > Right, but deadline calculation and icount warp events (that are written into the log too)
> > depend on the active virtual timers. Therefore external ones may affect icount warp
> > operation sequence.
> >
> > Pavel Dovgalyuk
> >
> 
> Sorry, but I still don't understand the source of non-determinism.

The source is the external timers, that depend on "outer world", e.g., slirp.
These timers are not recorded/replayed, but may affect the icount warping
(which should remain deterministic).

> From what you said I may understand that replay module actually has
> some additional non-trivial (for me) dependencies on EXTERNAL
> attribute other than one, introduced in 'timerlist_run_timers()' (i.e.
> it expects deadline calculations somewhere else to match decisions
> made in this function, or something like that). I would agree that
> these patch series might introduce some incorrect behavior. But it
> must be identical in all emulations running in same conditions,
> because deadline, warp timer and their effects are all dependent only
> on virtual timers, and, therefore, are deterministic too. Of course,
> it needs to be fixed. Did you find solution ?

Please check the new version of the series.
Patch 22

Pavel Dovgalyuk
Artem Pisarenko Jan. 17, 2019, 1:19 p.m. UTC | #10
> The source is the external timers, that depend on "outer world", e.g., slirp.
> These timers are not recorded/replayed, but may affect the icount warping
> (which should remain deterministic).

I thought we agreed earlier in this discussion that external timers
are deterministic by design, because they are subset of virtual clock
timers. Regarding slirp module, although it interfaces "outer world"
(except when 'restrict' option is on), it still uses IPv6 timer
deterministically. You may check it yourself. Otherwise it shouldn't
use timer of 'virtual clock' type for this purpose and this needs to
be fixed.
So I still don't understand. And I'm not asking you to explain. I'm
just hinting you that your fix may finally happen to be incorrect in
case if it based on wrong assertion I'm talking about.

> Please check the new version of the series.

Sorry, I'm too busy right now for deep analysis. I've just tried to
briefly understand issue you pointed and provide minimal feedback in
order to prevent possible wrong decisions. I'll return to this work
later.