diff mbox series

[v7,17/19] replay: add BH oneshot event for block layer

Message ID 20181010133511.24538.70006.stgit@pasha-VirtualBox
State New
Headers show
Series Fixing record/replay and adding reverse debugging | expand

Commit Message

Pavel Dovgalyuk Oct. 10, 2018, 1:35 p.m. UTC
Replay is capable of recording normal BH events, but sometimes
there are single use callbacks scheduled with aio_bh_schedule_oneshot
function. This patch enables recording and replaying such callbacks.
Block layer uses these events for calling the completion function.
Replaying these calls makes the execution deterministic.

Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgaluk@ispras.ru>

--

v6:
 - moved stub function to the separate file for fixing linux-user build
---
 block/block-backend.c    |    3 ++-
 include/sysemu/replay.h  |    3 +++
 replay/replay-events.c   |   16 ++++++++++++++++
 replay/replay-internal.h |    1 +
 replay/replay.c          |    2 +-
 stubs/Makefile.objs      |    1 +
 stubs/replay-user.c      |    9 +++++++++
 7 files changed, 33 insertions(+), 2 deletions(-)
 create mode 100644 stubs/replay-user.c

Comments

Kevin Wolf Nov. 28, 2018, 4:01 p.m. UTC | #1
Am 10.10.2018 um 15:35 hat Pavel Dovgalyuk geschrieben:
> Replay is capable of recording normal BH events, but sometimes
> there are single use callbacks scheduled with aio_bh_schedule_oneshot
> function. This patch enables recording and replaying such callbacks.
> Block layer uses these events for calling the completion function.
> Replaying these calls makes the execution deterministic.
> 
> Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgaluk@ispras.ru>

What are the rules for using aio_bh_schedule_oneshot() vs.
replay_bh_schedule_oneshot_event()? You are modifying one place, but if
I grep for aio_bh_schedule_oneshot(), I find many other places that
use it. Why is this one place special?

> --
> 
> v6:
>  - moved stub function to the separate file for fixing linux-user build
> ---
>  block/block-backend.c    |    3 ++-
>  include/sysemu/replay.h  |    3 +++
>  replay/replay-events.c   |   16 ++++++++++++++++
>  replay/replay-internal.h |    1 +
>  replay/replay.c          |    2 +-
>  stubs/Makefile.objs      |    1 +
>  stubs/replay-user.c      |    9 +++++++++
>  7 files changed, 33 insertions(+), 2 deletions(-)
>  create mode 100644 stubs/replay-user.c
> 
> diff --git a/block/block-backend.c b/block/block-backend.c
> index dc0cd57..04f5554 100644
> --- a/block/block-backend.c
> +++ b/block/block-backend.c
> @@ -17,6 +17,7 @@
>  #include "block/throttle-groups.h"
>  #include "sysemu/blockdev.h"
>  #include "sysemu/sysemu.h"
> +#include "sysemu/replay.h"
>  #include "qapi/error.h"
>  #include "qapi/qapi-events-block.h"
>  #include "qemu/id.h"
> @@ -1379,7 +1380,7 @@ static BlockAIOCB *blk_aio_prwv(BlockBackend *blk, int64_t offset, int bytes,
>  
>      acb->has_returned = true;
>      if (acb->rwco.ret != NOT_DONE) {
> -        aio_bh_schedule_oneshot(blk_get_aio_context(blk),
> +        replay_bh_schedule_oneshot_event(blk_get_aio_context(blk),
>                                  blk_aio_complete_bh, acb);

Indentation is off.

Kevin
Pavel Dovgalyuk Nov. 30, 2018, 8:21 a.m. UTC | #2
> From: Kevin Wolf [mailto:kwolf@redhat.com]
> Am 10.10.2018 um 15:35 hat Pavel Dovgalyuk geschrieben:
> > Replay is capable of recording normal BH events, but sometimes
> > there are single use callbacks scheduled with aio_bh_schedule_oneshot
> > function. This patch enables recording and replaying such callbacks.
> > Block layer uses these events for calling the completion function.
> > Replaying these calls makes the execution deterministic.
> >
> > Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgaluk@ispras.ru>
> 
> What are the rules for using aio_bh_schedule_oneshot() vs.
> replay_bh_schedule_oneshot_event()? You are modifying one place, but if
> I grep for aio_bh_schedule_oneshot(), I find many other places that
> use it. Why is this one place special?

This one is special, because it caused a record/replay bug.
I can't validate all other places, because I don't understand block layer
good enough.

That's why our current strategy is fixing replay, when it is breaks.
It's too hard to create the test for verifying the modification. And for the current
patch there was the bug which was fixed.

The rule is the following: when the event is asynchronous and its finalization
affects the guest state, then this event should be processed by the record/replay queue.

> 
> > --
> >
> > v6:
> >  - moved stub function to the separate file for fixing linux-user build
> > ---
> >  block/block-backend.c    |    3 ++-
> >  include/sysemu/replay.h  |    3 +++
> >  replay/replay-events.c   |   16 ++++++++++++++++
> >  replay/replay-internal.h |    1 +
> >  replay/replay.c          |    2 +-
> >  stubs/Makefile.objs      |    1 +
> >  stubs/replay-user.c      |    9 +++++++++
> >  7 files changed, 33 insertions(+), 2 deletions(-)
> >  create mode 100644 stubs/replay-user.c
> >
> > diff --git a/block/block-backend.c b/block/block-backend.c
> > index dc0cd57..04f5554 100644
> > --- a/block/block-backend.c
> > +++ b/block/block-backend.c
> > @@ -17,6 +17,7 @@
> >  #include "block/throttle-groups.h"
> >  #include "sysemu/blockdev.h"
> >  #include "sysemu/sysemu.h"
> > +#include "sysemu/replay.h"
> >  #include "qapi/error.h"
> >  #include "qapi/qapi-events-block.h"
> >  #include "qemu/id.h"
> > @@ -1379,7 +1380,7 @@ static BlockAIOCB *blk_aio_prwv(BlockBackend *blk, int64_t offset, int
> bytes,
> >
> >      acb->has_returned = true;
> >      if (acb->rwco.ret != NOT_DONE) {
> > -        aio_bh_schedule_oneshot(blk_get_aio_context(blk),
> > +        replay_bh_schedule_oneshot_event(blk_get_aio_context(blk),
> >                                  blk_aio_complete_bh, acb);
> 
> Indentation is off.

Thanks.

Pavel Dovgalyuk
Kevin Wolf Nov. 30, 2018, 11:18 a.m. UTC | #3
Am 30.11.2018 um 09:21 hat Pavel Dovgalyuk geschrieben:
> > From: Kevin Wolf [mailto:kwolf@redhat.com]
> > Am 10.10.2018 um 15:35 hat Pavel Dovgalyuk geschrieben:
> > > Replay is capable of recording normal BH events, but sometimes
> > > there are single use callbacks scheduled with aio_bh_schedule_oneshot
> > > function. This patch enables recording and replaying such callbacks.
> > > Block layer uses these events for calling the completion function.
> > > Replaying these calls makes the execution deterministic.
> > >
> > > Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgaluk@ispras.ru>
> > 
> > What are the rules for using aio_bh_schedule_oneshot() vs.
> > replay_bh_schedule_oneshot_event()? You are modifying one place, but if
> > I grep for aio_bh_schedule_oneshot(), I find many other places that
> > use it. Why is this one place special?
> 
> This one is special, because it caused a record/replay bug.  I can't
> validate all other places, because I don't understand block layer good
> enough.
>
> That's why our current strategy is fixing replay, when it is breaks.
> It's too hard to create the test for verifying the modification. And
> for the current patch there was the bug which was fixed.

So nobody really understands the code, but we're just fixing symptoms
whenever something crashes, without ever thinking about the design? And
then the code will organically grow to maybe approximate what we wanted
it to do?

Honestly, that's not the way to build reliable and maintainable
software.

> The rule is the following: when the event is asynchronous and its
> finalization affects the guest state, then this event should be
> processed by the record/replay queue.

Are there any BHs that can never affect the guest state?

Maybe you should really intercept this inside qemu_bh_schedule() and
aio_bh_schedule_oneshot() to catch all instances. This would look more
likely to address the root cause rather than twenty different special
cases and forgetting the other hundred instances.

But why do you even need to record and replay BHs? Aren't they already
fully deterministic if the code that schedules them is deterministic? Is
this hinting at problems in a different part of the code, so that the
caller isn't deterministic even though we expect it to be?

Kevin
Pavel Dovgalyuk Nov. 30, 2018, 11:26 a.m. UTC | #4
> From: Kevin Wolf [mailto:kwolf@redhat.com]
> Am 30.11.2018 um 09:21 hat Pavel Dovgalyuk geschrieben:
> > > From: Kevin Wolf [mailto:kwolf@redhat.com]
> > > Am 10.10.2018 um 15:35 hat Pavel Dovgalyuk geschrieben:
> > > > Replay is capable of recording normal BH events, but sometimes
> > > > there are single use callbacks scheduled with aio_bh_schedule_oneshot
> > > > function. This patch enables recording and replaying such callbacks.
> > > > Block layer uses these events for calling the completion function.
> > > > Replaying these calls makes the execution deterministic.
> > > >
> > > > Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgaluk@ispras.ru>
> > >
> > > What are the rules for using aio_bh_schedule_oneshot() vs.
> > > replay_bh_schedule_oneshot_event()? You are modifying one place, but if
> > > I grep for aio_bh_schedule_oneshot(), I find many other places that
> > > use it. Why is this one place special?
> >
> > This one is special, because it caused a record/replay bug.  I can't
> > validate all other places, because I don't understand block layer good
> > enough.
> >
> > That's why our current strategy is fixing replay, when it is breaks.
> > It's too hard to create the test for verifying the modification. And
> > for the current patch there was the bug which was fixed.
> 
> So nobody really understands the code, but we're just fixing symptoms
> whenever something crashes, without ever thinking about the design? And
> then the code will organically grow to maybe approximate what we wanted
> it to do?
> 
> Honestly, that's not the way to build reliable and maintainable
> software.
> 
> > The rule is the following: when the event is asynchronous and its
> > finalization affects the guest state, then this event should be
> > processed by the record/replay queue.
> 
> Are there any BHs that can never affect the guest state?

For example, qemu_bh_schedule(qmp_dispatcher_bh) in handle_qmp_command.
thread-pool.c also uses BH callbacks, but they are for creating threads,
and not for modifying the replayed guest state.

> Maybe you should really intercept this inside qemu_bh_schedule() and
> aio_bh_schedule_oneshot() to catch all instances. This would look more
> likely to address the root cause rather than twenty different special
> cases and forgetting the other hundred instances.
> 
> But why do you even need to record and replay BHs? Aren't they already
> fully deterministic if the code that schedules them is deterministic? Is
> this hinting at problems in a different part of the code, so that the
> caller isn't deterministic even though we expect it to be?

BHs may be invoked at random moments, that are not related to the executed
instructions. As we replay CPU, memory, and peripheral device state, we
must replay all invocations all BHs that affect them.

Replay must be done precisely, with instruction-level granularity.
That is why we record the moment of BH invocation for matching it in
the replay phase.

Pavel Dovgalyuk
diff mbox series

Patch

diff --git a/block/block-backend.c b/block/block-backend.c
index dc0cd57..04f5554 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -17,6 +17,7 @@ 
 #include "block/throttle-groups.h"
 #include "sysemu/blockdev.h"
 #include "sysemu/sysemu.h"
+#include "sysemu/replay.h"
 #include "qapi/error.h"
 #include "qapi/qapi-events-block.h"
 #include "qemu/id.h"
@@ -1379,7 +1380,7 @@  static BlockAIOCB *blk_aio_prwv(BlockBackend *blk, int64_t offset, int bytes,
 
     acb->has_returned = true;
     if (acb->rwco.ret != NOT_DONE) {
-        aio_bh_schedule_oneshot(blk_get_aio_context(blk),
+        replay_bh_schedule_oneshot_event(blk_get_aio_context(blk),
                                 blk_aio_complete_bh, acb);
     }
 
diff --git a/include/sysemu/replay.h b/include/sysemu/replay.h
index 05cb40a..5c71858 100644
--- a/include/sysemu/replay.h
+++ b/include/sysemu/replay.h
@@ -157,6 +157,9 @@  bool replay_events_enabled(void);
 void replay_flush_events(void);
 /*! Adds bottom half event to the queue */
 void replay_bh_schedule_event(QEMUBH *bh);
+/* Adds oneshot bottom half event to the queue */
+void replay_bh_schedule_oneshot_event(AioContext *ctx,
+    QEMUBHFunc *cb, void *opaque);
 /*! Adds input event to the queue */
 void replay_input_event(QemuConsole *src, InputEvent *evt);
 /*! Adds input sync event to the queue */
diff --git a/replay/replay-events.c b/replay/replay-events.c
index 0964a82..83a7d81 100644
--- a/replay/replay-events.c
+++ b/replay/replay-events.c
@@ -37,6 +37,9 @@  static void replay_run_event(Event *event)
     case REPLAY_ASYNC_EVENT_BH:
         aio_bh_call(event->opaque);
         break;
+    case REPLAY_ASYNC_EVENT_BH_ONESHOT:
+        ((QEMUBHFunc*)event->opaque)(event->opaque2);
+        break;
     case REPLAY_ASYNC_EVENT_INPUT:
         qemu_input_event_send_impl(NULL, (InputEvent *)event->opaque);
         qapi_free_InputEvent((InputEvent *)event->opaque);
@@ -132,6 +135,17 @@  void replay_bh_schedule_event(QEMUBH *bh)
     }
 }
 
+void replay_bh_schedule_oneshot_event(AioContext *ctx,
+    QEMUBHFunc *cb, void *opaque)
+{
+    if (events_enabled) {
+        uint64_t id = replay_get_current_step();
+        replay_add_event(REPLAY_ASYNC_EVENT_BH_ONESHOT, cb, opaque, id);
+    } else {
+        aio_bh_schedule_oneshot(ctx, cb, opaque);
+    }
+}
+
 void replay_add_input_event(struct InputEvent *event)
 {
     replay_add_event(REPLAY_ASYNC_EVENT_INPUT, event, NULL, 0);
@@ -162,6 +176,7 @@  static void replay_save_event(Event *event, int checkpoint)
         /* save event-specific data */
         switch (event->event_kind) {
         case REPLAY_ASYNC_EVENT_BH:
+        case REPLAY_ASYNC_EVENT_BH_ONESHOT:
             replay_put_qword(event->id);
             break;
         case REPLAY_ASYNC_EVENT_INPUT:
@@ -216,6 +231,7 @@  static Event *replay_read_event(int checkpoint)
     /* Events that has not to be in the queue */
     switch (replay_state.read_event_kind) {
     case REPLAY_ASYNC_EVENT_BH:
+    case REPLAY_ASYNC_EVENT_BH_ONESHOT:
         if (replay_state.read_event_id == -1) {
             replay_state.read_event_id = replay_get_qword();
         }
diff --git a/replay/replay-internal.h b/replay/replay-internal.h
index e37b201..af2b941 100644
--- a/replay/replay-internal.h
+++ b/replay/replay-internal.h
@@ -51,6 +51,7 @@  enum ReplayEvents {
 
 enum ReplayAsyncEventKind {
     REPLAY_ASYNC_EVENT_BH,
+    REPLAY_ASYNC_EVENT_BH_ONESHOT,
     REPLAY_ASYNC_EVENT_INPUT,
     REPLAY_ASYNC_EVENT_INPUT_SYNC,
     REPLAY_ASYNC_EVENT_CHAR_READ,
diff --git a/replay/replay.c b/replay/replay.c
index c6d5de9..93d2573 100644
--- a/replay/replay.c
+++ b/replay/replay.c
@@ -22,7 +22,7 @@ 
 
 /* Current version of the replay mechanism.
    Increase it when file format changes. */
-#define REPLAY_VERSION              0xe02007
+#define REPLAY_VERSION              0xe02008
 /* Size of replay log header */
 #define HEADER_SIZE                 (sizeof(uint32_t) + sizeof(uint64_t))
 
diff --git a/stubs/Makefile.objs b/stubs/Makefile.objs
index 53d3f32..2acbac3 100644
--- a/stubs/Makefile.objs
+++ b/stubs/Makefile.objs
@@ -24,6 +24,7 @@  stub-obj-y += monitor.o
 stub-obj-y += notify-event.o
 stub-obj-y += qtest.o
 stub-obj-y += replay.o
+stub-obj-y += replay-user.o
 stub-obj-y += runstate-check.o
 stub-obj-y += set-fd-handler.o
 stub-obj-y += slirp.o
diff --git a/stubs/replay-user.c b/stubs/replay-user.c
new file mode 100644
index 0000000..2ad9e27
--- /dev/null
+++ b/stubs/replay-user.c
@@ -0,0 +1,9 @@ 
+#include "qemu/osdep.h"
+#include "sysemu/replay.h"
+#include "sysemu/sysemu.h"
+
+void replay_bh_schedule_oneshot_event(AioContext *ctx,
+    QEMUBHFunc *cb, void *opaque)
+{
+    aio_bh_schedule_oneshot(ctx, cb, opaque);
+}