[RFC,00/17] reverse debugging

Message ID 20180425124533.17182.53165.stgit@pasha-VirtualBox

Message

Pavel Dovgalyuk April 25, 2018, 12:45 p.m. UTC
The GDB remote protocol supports reverse debugging of targets.
It includes 'reverse step' and 'reverse continue' operations.
The first one finds the previous step of the execution,
and the second one is intended to stop at the last breakpoint that
would happen when the program is executed normally.

Reverse debugging is possible in the replay mode, when at least
one snapshot was created at the record or replay phase.
QEMU can use these snapshots for travelling back in time with GDB.

Running the execution in replay mode allows using GDB reverse debugging
commands:
 - reverse-stepi (or rsi): Steps one instruction to the past.
   QEMU loads one of the prior snapshots and proceeds forward to the
   desired instruction. When that step is reached, execution stops.
 - reverse-continue (or rc): Runs execution "backwards".
   QEMU tries to find a breakpoint or watchpoint by loading a prior
   snapshot and replaying the execution. Then QEMU loads the snapshot
   again and replays to the latest breakpoint. When there are no
   breakpoints in the examined section of the execution, QEMU finds one
   more snapshot and tries again. After the first snapshot is processed,
   execution stops at this snapshot.
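
As a rough illustration of the snapshot search described above (not the code from this series), the following shell sketch models snapshots and breakpoint hits as plain instruction counts. The function names and the toy data model are assumptions made up for this example; the series itself works on real VM snapshots.

```shell
#!/bin/sh
# Toy model: snapshots and breakpoint hits are plain instruction counts.
# All names here are hypothetical and exist only for this illustration.

# Print the largest snapshot step that is <= target; fail if there is none.
closest_snapshot() (
  target=$1; shift
  best=""
  for s in "$@"; do
    if [ "$s" -le "$target" ] && { [ -z "$best" ] || [ "$s" -gt "$best" ]; }; then
      best=$s
    fi
  done
  [ -n "$best" ] && echo "$best"
)

# reverse-stepi: load the closest snapshot at or before (current - 1),
# then replay forward until that step is reached.
reverse_stepi() (
  current=$1; shift
  target=$((current - 1))
  snap=$(closest_snapshot "$target" "$@") || {
    echo "no snapshot at or before step $target" >&2
    exit 1
  }
  echo "load snapshot at step $snap, replay $((target - snap)) steps forward, stop at step $target"
)

# reverse-continue: examine the section between the closest snapshot and the
# current step; if no breakpoint was hit there, retry with the snapshot before
# it. If even the first snapshot yields nothing, stop at that snapshot.
# Arguments: current step, "snapshot steps", "breakpoint hit steps".
reverse_continue() (
  upper=$1 snaps=$2 hits=$3
  stop=""
  # $snaps and $hits are intentionally unquoted so that they split into words.
  while snap=$(closest_snapshot "$((upper - 1))" $snaps); do
    last=""
    for h in $hits; do
      if [ "$h" -ge "$snap" ] && [ "$h" -lt "$upper" ] &&
         { [ -z "$last" ] || [ "$h" -gt "$last" ]; }; then
        last=$h
      fi
    done
    if [ -n "$last" ]; then
      echo "$last"
      exit 0
    fi
    stop=$snap
    upper=$snap
  done
  [ -n "$stop" ] && echo "$stop"
)
```

For example, `reverse_continue 300 "0 100 250" "40 120"` first examines steps 250..299, finds no hit, falls back to the snapshot at step 100 and reports the hit at step 120.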

The set of patches includes the following modifications:
 - gdbstub update for reverse debugging support
 - functions that automatically perform reverse step and reverse
   continue operations
 - hmp/qmp commands for manipulating the replay process
 - improvement of the snapshotting for saving the execution step
   in the snapshot parameters
 - other record/replay fixes

The patches are available in the repository:
https://github.com/ispras/qemu/tree/rr-180207

---

Pavel Dovgalyuk (17):
      block: implement bdrv_snapshot_goto for blkreplay
      replay: disable default snapshot for record/replay
      replay: update docs for record/replay with block devices
      replay: don't drain/flush bdrv queue while RR is working
      replay: finish record/replay before closing the disks
      migration: introduce icount field for snapshots
      qcow2: introduce icount field for snapshots
      replay: introduce info hmp/qmp command
      replay: introduce breakpoint at the specified step
      replay: implement replay_seek command to proceed to the desired step
      replay: flush events when exitting
      timer: remove replay clock probe in deadline calculation
      replay: refine replay-time module
      translator: fix breakpoint processing
      replay: flush rr queue before loading the vmstate
      gdbstub: add reverse step support in replay mode
      gdbstub: add reverse continue support in replay mode


 accel/tcg/translator.c    |    8 +
 block/blkreplay.c         |    8 +
 block/io.c                |   22 +++
 block/qapi.c              |   11 +-
 block/qcow2-snapshot.c    |    9 +
 block/qcow2.h             |    2 
 blockdev.c                |    3 
 cpus.c                    |   19 ++-
 docs/replay.txt           |   12 +-
 exec.c                    |    6 +
 gdbstub.c                 |   50 +++++++-
 hmp-commands-info.hx      |   14 ++
 hmp-commands.hx           |   30 +++++
 hmp.h                     |    3 
 include/block/snapshot.h  |    1 
 include/sysemu/replay.h   |   18 +++
 migration/savevm.c        |   11 +-
 qapi/block-core.json      |    5 +
 qapi/block.json           |    3 
 qapi/misc.json            |   69 +++++++++++
 replay/Makefile.objs      |    3 
 replay/replay-debugging.c |  286 +++++++++++++++++++++++++++++++++++++++++++++
 replay/replay-events.c    |   14 --
 replay/replay-internal.h  |   10 +-
 replay/replay-time.c      |   27 ++--
 replay/replay.c           |   22 +++
 stubs/replay.c            |   10 ++
 util/qemu-timer.c         |   11 --
 vl.c                      |   11 +-
 29 files changed, 625 insertions(+), 73 deletions(-)
 create mode 100644 replay/replay-debugging.c

Comments

Pavel Dovgalyuk April 25, 2018, 12:48 p.m. UTC | #1
> From: Pavel Dovgalyuk [mailto:Pavel.Dovgaluk@ispras.ru]
> The patches are available in the repository:
> https://github.com/ispras/qemu/tree/rr-180207

This should be https://github.com/ispras/qemu/tree/rr-180425


Pavel Dovgalyuk
Ciro Santilli April 26, 2018, 12:21 p.m. UTC | #2
On Wed, Apr 25, 2018 at 1:45 PM, Pavel Dovgalyuk
<Pavel.Dovgaluk@ispras.ru> wrote:
> GDB remote protocol supports reverse debugging of the targets.
> It includes 'reverse step' and 'reverse continue' operations.
> The first one finds the previous step of the execution,
> and the second one is intended to stop at the last breakpoint that
> would happen when the program is executed normally.
>
> Reverse debugging is possible in the replay mode, when at least
> one snapshot was created at the record or replay phase.
> QEMU can use these snapshots for travelling back in time with GDB.
>

Hi Pavel,

1)

Can you provide more details on how to run the reverse debugging? In
particular how to take the checkpoint?

My test setup is described in detail at:
https://github.com/cirosantilli/qemu-test/tree/8127452e5685ed233dc7357a1fe34b7a2d173480
command "x86_64/reverse-debug".

Here are the actual commands:

#!/usr/bin/env bash
set -eu
dir="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)/.."
cmd="\
time \
./x86_64-softmmu/qemu-system-x86_64 \
-M pc \
-append 'root=/dev/sda console=ttyS0 nokaslr printk.time=y - lkmc_eval=\"/rand_check.out;/sbin/ifup -a;wget -S google.com;/poweroff.out;\"' \
-kernel '${dir}/out/x86_64/buildroot/images/bzImage' \
-nographic \
-serial mon:stdio \
-monitor telnet::45454,server,nowait \
\
-drive file='${dir}/out/x86_64/buildroot/images/rootfs.ext2.qcow2,if=none,id=img-direct,format=qcow2,snapshot' \
-drive driver=blkreplay,if=none,image=img-direct,id=img-blkreplay \
-device ide-hd,drive=img-blkreplay \
\
-netdev user,id=net1 \
-device rtl8139,netdev=net1 \
-object filter-replay,id=replay,netdev=net1 \
"
cmd="${cmd} $@"
echo "$cmd"
eval "$cmd -icount 'shift=7,rr=record,rrfile=replay.bin'"
eval "$cmd -icount 'shift=7,rr=replay,rrfile=replay.bin' -S -s"

Then I take a snapshot right at the beginning of the execution:

telnet localhost 45454
savevm a

And on another shell:

/data/git/linux-kernel-module-cheat/out/x86_64/buildroot/host/usr/bin/x86_64-linux-gdb \
-q \
-ex 'file vmlinux' \
-ex 'target remote localhost:1234' \
-ex 'break start_kernel' \
-ex 'continue'

But now if I try on GDB:

next
next
next
reverse-continue

hoping to go back to start_kernel, but nothing happens.

Same behavior if I take the snapshot after reaching start_kernel instead.

2)

I wonder if it would be possible to expose checkpoint taking through
GDB, for example via:
https://sourceware.org/gdb/onlinedocs/gdb/Checkpoint_002fRestart.html

Or some other more convenient checkpoint generation method, e.g.
automatically take checkpoints every N instructions.

Pavel Dovgalyuk April 26, 2018, 12:34 p.m. UTC | #3
> From: Ciro Santilli [mailto:ciro.santilli@gmail.com]
> On Wed, Apr 25, 2018 at 1:45 PM, Pavel Dovgalyuk
> <Pavel.Dovgaluk@ispras.ru> wrote:
> > GDB remote protocol supports reverse debugging of the targets.
> > It includes 'reverse step' and 'reverse continue' operations.
> > The first one finds the previous step of the execution,
> > and the second one is intended to stop at the last breakpoint that
> > would happen when the program is executed normally.
> >
> > Reverse debugging is possible in the replay mode, when at least
> > one snapshot was created at the record or replay phase.
> > QEMU can use these snapshots for travelling back in time with GDB.
> >
> 
> Hi Pavel,
> 
> 1)
> 
> Can you provide more details on how to run the reverse debugging? In
> particular how to take the checkpoint?

There is some information in docs/replay.txt, but I guess I can give some more.

> 
> My test setup is described in detail at:
> https://github.com/cirosantilli/qemu-test/tree/8127452e5685ed233dc7357a1fe34b7a2d173480
> command "x86_64/reverse-debug".
> 
> Here are the actual commands:
> 
> #!/usr/bin/env bash
> set -eu
> dir="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)/.."
> cmd="\
> time \
> ./x86_64-softmmu/qemu-system-x86_64 \
> -M pc \
> -append 'root=/dev/sda console=ttyS0 nokaslr printk.time=y -
> lkmc_eval=\"/rand_check.out;/sbin/ifup -a;wget -S
> google.com;/poweroff.out;\"' \
> -kernel '${dir}/out/x86_64/buildroot/images/bzImage' \
> -nographic \
> -serial mon:stdio \
> -monitor telnet::45454,server,nowait \
> \
> -drive file='${dir}/out/x86_64/buildroot/images/rootfs.ext2.qcow2,if=none,id=img-direct,format=qcow2,snapshot'

The main thing for reverse debugging is snapshotting.
Therefore you should use an image that does not rely on a temporary overlay file (the snapshot option).
I'm using the following command line for recording:

rm ./images/xp.ovl
# Create an overlay to avoid modifying the original image.
./bin/qemu-img create -f qcow2 -b xp.qcow2 ./images/xp.ovl
# -global apic-common.vapic=off is a workaround for XP; I am not sure whether
#   it is still needed for the current version.
# The first -drive uses the newly created overlay instead of the original image.
# rrsnapshot creates the snapshot at the start.
./bin/qemu-system-i386 \
 -global apic-common.vapic=off \
 -icount shift=7,rr=record,rrfile=xp.replay,rrsnapshot=init \
 -drive file=./images/xp.ovl,if=none,id=img-direct \
 -drive driver=blkreplay,if=none,image=img-direct,id=img-replay \
 -device ide-hd,drive=img-replay -net none -m 256M -monitor stdio

While recording, I can create more snapshots with savevm.
The command line for replaying differs only in the "rr" option (rr=replay); there, rrsnapshot loads the initial snapshot.
Any of the previously created snapshots may be specified.
You can also create new snapshots while replaying.
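
To make the "differs only in the rr option" point concrete, here is a minimal sketch that prints both command lines from one template. The function name rr_cmdline and its default arguments are hypothetical; the QEMU options are the ones from the command line above, with the XP-specific -global workaround left out.

```shell
#!/bin/sh
# Print a QEMU command line for the given rr mode ("record" or "replay").
# rr_cmdline is a hypothetical helper for this example; it only prints the
# command and does not start QEMU.
rr_cmdline() (
  mode=$1
  overlay=${2:-./images/xp.ovl}
  rrfile=${3:-xp.replay}
  case $mode in
    record|replay) ;;
    *) echo "rr mode must be 'record' or 'replay'" >&2; exit 1 ;;
  esac
  # No temporary overlay (snapshot option) on the drive, so that VM
  # snapshots stay in the image between the record and replay runs.
  echo ./bin/qemu-system-i386 \
    -icount "shift=7,rr=${mode},rrfile=${rrfile},rrsnapshot=init" \
    -drive "file=${overlay},if=none,id=img-direct" \
    -drive driver=blkreplay,if=none,image=img-direct,id=img-replay \
    -device ide-hd,drive=img-replay \
    -net none -m 256M -monitor stdio
)

rr_cmdline record
rr_cmdline replay
```

The two printed lines are identical except for rr=record versus rr=replay.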


> \
> -drive driver=blkreplay,if=none,image=img-direct,id=img-blkreplay \
> -device ide-hd,drive=img-blkreplay \
> \
> -netdev user,id=net1 \
> -device rtl8139,netdev=net1 \
> -object filter-replay,id=replay,netdev=net1 \
> "
> cmd="${cmd} $@"
> echo "$cmd"
> eval "$cmd -icount 'shift=7,rr=record,rrfile=replay.bin'"
> eval "$cmd -icount 'shift=7,rr=replay,rrfile=replay.bin' -S -s"
> 
> Then I take a snapshot right at the beginning of the execution:
> 
> telnet 45454
> savevm a
> 
> And on another shell:
> 
> /data/git/linux-kernel-module-cheat/out/x86_64/buildroot/host/usr/bin/x86_64-linux-gdb
> \
> -q \
> -ex 'file vmlinux' \
> -ex 'target remote localhost:1234' \
> -ex 'break start_kernel' \
> -ex 'continue' \
> 
> But now if I try on GDB:
> 
> next
> next
> next
> reverse-continue
> 
> hoping to go back to start_kernel, but nothing happens.

Yes, because your snapshot is missing: it was actually created in the temporary overlay.
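
A quick way to catch this before recording could look like the sketch below. It only inspects a -drive option string for the snapshot flag; the helper name is hypothetical, and it does not cover other ways of enabling a temporary overlay.

```shell
#!/bin/sh
# Succeed if a comma-separated -drive option string enables the temporary
# overlay, i.e. contains "snapshot" or "snapshot=on".
# drive_uses_temporary_overlay is a hypothetical helper for this example.
drive_uses_temporary_overlay() (
  IFS=,
  set -f    # do not glob-expand option values while splitting
  for item in $1; do
    case $item in
      snapshot|snapshot=on) exit 0 ;;
    esac
  done
  exit 1
)

opts='file=rootfs.ext2.qcow2,if=none,id=img-direct,format=qcow2,snapshot'
if drive_uses_temporary_overlay "$opts"; then
  echo "warning: savevm snapshots would land in a temporary overlay" >&2
fi
```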

> Same behavior if I take the snapshot after reaching start_kernel instead.
> 
> 2)
> 
> I wonder if it would be possible to expose checkpoint taking through
> GDB example via:
> https://sourceware.org/gdb/onlinedocs/gdb/Checkpoint_002fRestart.html

We'll check this out.

> Or some other more convenient checkpoint generation method, e.g.
> automatically take checkpoints every N instructions.

We implemented 'taking snapshots every N seconds', but I'd prefer to submit
it later, after the main idea is approved.
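
Until then, something similar can be approximated from outside QEMU by feeding savevm commands to the monitor at a fixed interval. The sketch below is not the implementation mentioned above; the function name and the snapshot naming scheme are assumptions, and it only generates the monitor commands.

```shell
#!/bin/sh
# Emit "savevm <prefix><i>" lines, one every <interval> seconds, <count> times.
# emit_savevm_commands is a hypothetical helper for this example.
emit_savevm_commands() (
  interval=$1 count=$2 prefix=${3:-auto}
  i=0
  while [ "$i" -lt "$count" ]; do
    sleep "$interval"
    echo "savevm ${prefix}${i}"
    i=$((i + 1))
  done
)
```

How the lines reach the monitor is left open. With the telnet monitor from the earlier script, piping the output into a TCP client connected to port 45454 is one option, but that part is an assumption and untested here.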

Pavel Dovgalyuk
Ciro Santilli April 28, 2018, 8:14 a.m. UTC | #4
Forgetting about debugging, I believe there is a deadlock in the replay
at 63d426dfa4fbfac3d50cda3f553cd975de2b85ea, but it is rare.

I have only reproduced it on ARM so far, and I haven't checked pre-patch.

The setup is https://github.com/cirosantilli/qemu-test/tree/6a3497f0d84e7c86ef80f7322e24e8a149b93214
with images-ab21ef58deed8536bc159c2afd680a4fabd68510.zip

Then try to run it several times with:

i=0; while true; do date; echo $i; ../qemu-test/arm/rr; i=$(($i+1)); done
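
Since the hang is rare, a variant of that loop with a time limit can find the failing iteration unattended. This is only a sketch: it assumes the GNU coreutils timeout command (exit status 124 when the limit is hit), and find_hang is a made-up name.

```shell
#!/bin/sh
# Run a command up to <max_runs> times, each under a time limit.
# Print the zero-based index of the first run that timed out and succeed;
# fail if no run timed out. find_hang is a hypothetical helper.
find_hang() (
  limit=$1 max_runs=$2
  shift 2
  i=0
  while [ "$i" -lt "$max_runs" ]; do
    status=0
    # GNU timeout exits with status 124 when the command ran out of time.
    timeout "$limit" "$@" >/dev/null 2>&1 || status=$?
    if [ "$status" -eq 124 ]; then
      echo "$i"
      exit 0
    fi
    i=$((i + 1))
  done
  exit 1
)
```

A call such as `find_hang 600 100 ../qemu-test/arm/rr` would report which run got stuck, with the limit and run count picked arbitrarily here. Note that timeout kills the stuck process, so this is useful for measuring how often the hang occurs, not for attaching GDB afterwards.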

I think the deadlock can happen in a few different places, but the
most common is when the kernel is doing disk-related work. The last
messages before getting stuck are:

[   11.530325] ALSA device list:
[   11.531451]   No soundcards found.

and what would follow on a normal replay would be:

[   11.551904] EXT4-fs (vda): couldn't mount as ext3 due to feature
incompatibilities
[   11.619238] EXT4-fs (vda): mounted filesystem without journal. Opts: (null)

I then attach GDB with:

gdb -q ./arm-softmmu/qemu-system-arm `pgrep qemu`

and then:

>>> thread apply all bt

Thread 5 (Thread 0x7f59c6efb700 (LWP 22096)):
#0  0x00007f59e7aa9072 in futex_wait_cancelable (private=<optimized
out>, expected=0, futex_word=0x55a8e99801d8) at
../sysdeps/unix/sysv/linux/futex-internal.h:88
#1  0x00007f59e7aa9072 in __pthread_cond_wait_common (abstime=0x0,
mutex=0x55a8e89cbf40 <qemu_global_mutex>, cond=0x55a8e99801b0) at
pthread_cond_wait.c:502
#2  0x00007f59e7aa9072 in __pthread_cond_wait (cond=0x55a8e99801b0,
mutex=0x55a8e89cbf40 <qemu_global_mutex>) at pthread_cond_wait.c:655
#3  0x000055a8e7f4f178 in qemu_cond_wait_impl (cond=0x55a8e99801b0,
mutex=0x55a8e89cbf40 <qemu_global_mutex>, file=0x55a8e80b10a8
"/home/ciro/git/qemu/cpus.c", line=1175) at
util/qemu-thread-posix.c:164
#4  0x000055a8e7999965 in qemu_tcg_rr_wait_io_event
(cpu=0x55a8e986b330) at /home/ciro/git/qemu/cpus.c:1175
#5  0x000055a8e799a1f5 in qemu_tcg_rr_cpu_thread_fn
(arg=0x55a8e986b330) at /home/ciro/git/qemu/cpus.c:1502
#6  0x00007f59e7aa27fc in start_thread (arg=0x7f59c6efb700) at
pthread_create.c:465
#7  0x00007f59e77cfb5f in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 4 (Thread 0x7f59c76fc700 (LWP 22095)):
#0  0x00007f59e77c3a4b in __GI_ppoll (fds=0x7f59b8000b10, nfds=1,
timeout=<optimized out>, sigmask=0x0) at
../sysdeps/unix/sysv/linux/ppoll.c:39
#1  0x000055a8e7f4a02e in qemu_poll_ns (fds=0x7f59b8000b10, nfds=1,
timeout=-1) at util/qemu-timer.c:322
#2  0x000055a8e7f4cb5e in aio_poll (ctx=0x55a8e978eab0, blocking=true)
at util/aio-posix.c:629
#3  0x000055a8e7b5f084 in iothread_run (opaque=0x55a8e970c710) at
iothread.c:64
#4  0x00007f59e7aa27fc in start_thread (arg=0x7f59c76fc700) at
pthread_create.c:465
#5  0x00007f59e77cfb5f in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 3 (Thread 0x7f59ced65700 (LWP 22093)):
#0  0x00007f59e77c9a49 in syscall () at
../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
#1  0x00007f59e88456ef in g_cond_wait () at
/lib/x86_64-linux-gnu/libglib-2.0.so.0
#2  0x000055a8e7f43157 in wait_for_trace_records_available () at
trace/simple.c:150
#3  0x000055a8e7f431b8 in writeout_thread (opaque=0x0) at
trace/simple.c:169
#4  0x00007f59e8827645 in  () at
/lib/x86_64-linux-gnu/libglib-2.0.so.0
#5  0x00007f59e7aa27fc in start_thread (arg=0x7f59ced65700) at
pthread_create.c:465
#6  0x00007f59e77cfb5f in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 2 (Thread 0x7f59cf566700 (LWP 22092)):
#0  0x00007f59e77c9a49 in syscall () at
../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
#1  0x000055a8e7f4f5d8 in qemu_futex_wait (f=0x55a8e8e48418
<rcu_call_ready_event>, val=4294967295) at
/home/ciro/git/qemu/include/qemu/futex.h:29
#2  0x000055a8e7f4f79f in qemu_event_wait (ev=0x55a8e8e48418
<rcu_call_ready_event>) at util/qemu-thread-posix.c:445
#3  0x000055a8e7f67d2d in call_rcu_thread (opaque=0x0) at
util/rcu.c:261
#4  0x00007f59e7aa27fc in start_thread (arg=0x7f59cf566700) at
pthread_create.c:465
#5  0x00007f59e77cfb5f in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 1 (Thread 0x7f59ecf03280 (LWP 22091)):
#0  0x00007f59e77c3a4b in __GI_ppoll (fds=0x55a8e9860aa0, nfds=5,
timeout=<optimized out>, sigmask=0x0) at
../sysdeps/unix/sysv/linux/ppoll.c:39
#1  0x000055a8e7f4a0c4 in qemu_poll_ns (fds=0x55a8e9860aa0, nfds=5,
timeout=1000000000) at util/qemu-timer.c:334
#2  0x000055a8e7f4b176 in os_host_main_loop_wait (timeout=1000000000)
at util/main-loop.c:258
#3  0x000055a8e7f4b241 in main_loop_wait (nonblocking=0) at
util/main-loop.c:522
#4  0x000055a8e7b66fed in main_loop () at vl.c:1943
#5  0x000055a8e7b6ead4 in main (argc=24, argv=0x7fff6fe0f328,
envp=0x7fff6fe0f3f0) at vl.c:4740

Ciro Santilli April 28, 2018, 8:17 a.m. UTC | #5
On Sat, Apr 28, 2018 at 9:12 AM, Pavel Dovgalyuk <dovgaluk@ispras.ru> wrote:
>> From: Ciro Santilli [mailto:ciro.santilli@gmail.com]
>> On Thu, Apr 26, 2018 at 1:34 PM, Pavel Dovgalyuk <dovgaluk@ispras.ru> wrote:
>> >> From: Ciro Santilli [mailto:ciro.santilli@gmail.com]
>> >> On Wed, Apr 25, 2018 at 1:45 PM, Pavel Dovgalyuk
>> >> <Pavel.Dovgaluk@ispras.ru> wrote:
>> >> > GDB remote protocol supports reverse debugging of the targets.
>> >> > It includes 'reverse step' and 'reverse continue' operations.
>> >> > The first one finds the previous step of the execution,
>> >> > and the second one is intended to stop at the last breakpoint that
>> >> > would happen when the program is executed normally.
>> >> >
>> >> > Reverse debugging is possible in the replay mode, when at least
>> >> > one snapshot was created at the record or replay phase.
>> >> > QEMU can use these snapshots for travelling back in time with GDB.
>> >> >
>> >>
>> >> Hi Pavel,
>> >>
>> >> 1)
>> >>
>> >> Can you provide more details on how to run the reverse debugging? In
>> >> particular how to take the checkpoint?
>> >
>> > There is some information in docs/replay.txt, but I guess, that I can give some more.
>> >
>> >>
>> >> My test setup is described in detail at:
>> >> https://github.com/cirosantilli/qemu-test/tree/8127452e5685ed233dc7357a1fe34b7a2d173480
>> >> command "x86_64/reverse-debug".
>> >>
>> >> Here are the actual commands:
>> >>
>> >> #!/usr/bin/env bash
>> >> set -eu
>> >> dir="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)/.."
>> >> cmd="\
>> >> time \
>> >> ./x86_64-softmmu/qemu-system-x86_64 \
>> >> -M pc \
>> >> -append 'root=/dev/sda console=ttyS0 nokaslr printk.time=y -
>> >> lkmc_eval=\"/rand_check.out;/sbin/ifup -a;wget -S
>> >> google.com;/poweroff.out;\"' \
>> >> -kernel '${dir}/out/x86_64/buildroot/images/bzImage' \
>> >> -nographic \
>> >> -serial mon:stdio \
>> >> -monitor telnet::45454,server,nowait \
>> >> \
>> >> -drive file='${dir}/out/x86_64/buildroot/images/rootfs.ext2.qcow2,if=none,id=img-direct,format=qcow2,snapshot'
>> >
>> > The main thing for reverse debugging is snapshotting.
>> > Therefore you should have an image that does not use temporary overlay file (snapshot option).
>> > I'm using the following command line for record:
>> >
>> > rm ./images/xp.ovl
>> > # create overlay to avoid modifying the original image
>> > ./bin/qemu-img create -f qcow2 -b xp.qcow2 ./images/xp.ovl
>> > ./bin/qemu-system-i386 \
>> > # This is workaround for XP. I wonder is it needed for the current version or not.
>> >  -global apic-common.vapic=off \
>> > # using newly created overlay instead of the original image
>> > # rrsnapshot creates the snapshot at the start
>> >  -icount shift=7,rr=record,rrfile=xp.replay,rrsnapshot=init -drive file=./images/xp.ovl,if=none,id=img-direct \
>> >  -drive driver=blkreplay,if=none,image=img-direct,id=img-replay -device ide-hd,drive=img-replay -net none -m 256M -monitor stdio
>> >
>> > While recording I can create some snapshots with savevm.
>> > Command line for replaying differs only in "rr" option. rrsnapshot there loads the initial snapshot.
>> > Any of the previously created snapshots may be specified.
>> > You can also create new snapshots while replaying.
>> >
>>
>> How is the snapshot to be used chosen? Does this patch make it try to
>> smartly use the snapshot that is closest to the target break?
>
> Yes, it selects the closest snapshot.
>
>> Does rrsnapshot select which snapshot will be used, or does it just create a
>> snapshot at the start of the recording?
>
> rrsnapshot creates a snapshot at record and loads it at start.
> It is required, because the disk image is modified by the execution,
> when 'snapshot' option is omitted.
>
>> I have modified my commands to remove snapshot from -drive, and add
>> rrsnapshot=init to -icount and the following works:
>>
>> b start_kernel
>> n
>> n
>> n
>> b
>> n
>> n
>> rc
>
> Great!
>
>> However, if after the "b start_kernel" I make a new snapshot on telnet
>> with "savevm a" to try and make the restore faster, then
>> reverse-continue fails.
>
> That's strange. What did it say?
>

Nothing, it just stayed on the same line.

>> Also, if I do "loadvm a" after "savevm a" while the debugger is
>> attached at start_kernel, the monitor just hangs. Is there a way to
>> restore snapshots while debugging of replay is going on?
>
> Never tried to do this.
> I'll check this out.
>
>
> Pavel Dovgalyuk
>
Ciro Santilli April 28, 2018, 9:38 a.m. UTC | #6
On Sat, Apr 28, 2018 at 10:27 AM, Pavel Dovgalyuk <dovgaluk@ispras.ru> wrote:
>
>
>> -----Original Message-----
>> From: Ciro Santilli [mailto:ciro.santilli@gmail.com]
>> Sent: Saturday, April 28, 2018 11:13 AM
>> To: Pavel Dovgalyuk
>> Subject: Re: [RFC PATCH 00/17] reverse debugging
>>
>> Forgetting about debugging, I believe there is a deadlock in the replay
>> at 63d426dfa4fbfac3d50cda3f553cd975de2b85ea, but it is rare.
>>
>> I have only reproduced it on ARM so far, and I haven't checked pre-patch.
>>
>> The setup is https://github.com/cirosantilli/qemu-test/tree/6a3497f0d84e7c86ef80f7322e24e8a149b93214
>> with images-ab21ef58deed8536bc159c2afd680a4fabd68510.zip
>>
>> Then try to run it several times with:
>>
>> i=0; while true; do date; echo $i; ../qemu-test/arm/rr; i=$(($i+1)); done
>>
>> I think the deadlock can happen in a few different places, but the
>> most common is when the kernel is doing disk related stuff, the last
>> messages before getting stuck are:
>
> It usually happens when there are some bugs in the implementation of the virtual devices.
> Our customers mostly emulate x86-based systems, therefore most of
> the ARM hardware is untested.
>

Hi Pete, do you know anything about this? Traces at:
http://lists.nongnu.org/archive/html/qemu-devel/2018-04/msg05218.html
command at: https://github.com/cirosantilli/qemu-test/blob/6a3497f0d84e7c86ef80f7322e24e8a149b93214/arm/rr

@Pavel: I recommend always replying to both me and to qemu-devel to
preserve a better history of our talk on the tracker.

>> [   11.530325] ALSA device list:
>> [   11.531451]   No soundcards found.
>>
>> and what would follow on a normal replay would be:
>>
>> [   11.551904] EXT4-fs (vda): couldn't mount as ext3 due to feature
>> incompatibilities
>> [   11.619238] EXT4-fs (vda): mounted filesystem without journal. Opts: (null)
>>
>> I then attach GDB with:
>>
>> gdb -q ./arm-softmmu/qemu-system-arm `pgrep qemu`
>
>
>
>
> Pavel Dovgalyuk
>
Ciro Santilli Aug. 10, 2018, 3:41 p.m. UTC | #7
On Thu, Apr 26, 2018 at 1:34 PM, Pavel Dovgalyuk <dovgaluk@ispras.ru> wrote:

> > From: Ciro Santilli [mailto:ciro.santilli@gmail.com]
> > On Wed, Apr 25, 2018 at 1:45 PM, Pavel Dovgalyuk
> > <Pavel.Dovgaluk@ispras.ru> wrote:
> > > GDB remote protocol supports reverse debugging of the targets.
> > > It includes 'reverse step' and 'reverse continue' operations.
> > > The first one finds the previous step of the execution,
> > > and the second one is intended to stop at the last breakpoint that
> > > would happen when the program is executed normally.
> > >
> > > Reverse debugging is possible in the replay mode, when at least
> > > one snapshot was created at the record or replay phase.
> > > QEMU can use these snapshots for travelling back in time with GDB.
> > >
> >
> > Hi Pavel,
> >
> > 1)
> >
> > Can you provide more details on how to run the reverse debugging? In
> > particular how to take the checkpoint?
>
> There is some information in docs/replay.txt, but I guess I can give
> some more.
>
> >
> > My test setup is described in detail at:
> > https://github.com/cirosantilli/qemu-test/tree/8127452e5685ed233dc7357a1fe34b7a2d173480
> > command "x86_64/reverse-debug".
> >
> > Here are the actual commands:
> >
> > #!/usr/bin/env bash
> > set -eu
> > dir="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)/.."
> > cmd="\
> > time \
> > ./x86_64-softmmu/qemu-system-x86_64 \
> > -M pc \
> > -append 'root=/dev/sda console=ttyS0 nokaslr printk.time=y - lkmc_eval=\"/rand_check.out;/sbin/ifup -a;wget -S google.com;/poweroff.out;\"' \
> > -kernel '${dir}/out/x86_64/buildroot/images/bzImage' \
> > -nographic \
> > -serial mon:stdio \
> > -monitor telnet::45454,server,nowait \
> > \
> > -drive file='${dir}/out/x86_64/buildroot/images/rootfs.ext2.qcow2,if=none,id=img-direct,format=qcow2,snapshot'
>
> The main thing for reverse debugging is snapshotting.
> Therefore you should have an image that does not use a temporary overlay
> file (the snapshot option).
> I'm using the following command line for record:
>
> rm ./images/xp.ovl
> # create overlay to avoid modifying the original image
> ./bin/qemu-img create -f qcow2 -b xp.qcow2 ./images/xp.ovl
> # -global apic-common.vapic=off is a workaround for XP. I wonder whether
> # it is still needed for the current version or not.
> # The first -drive uses the newly created overlay instead of the original image.
> # rrsnapshot creates the snapshot at the start.
> ./bin/qemu-system-i386 \
>  -global apic-common.vapic=off \
>  -icount shift=7,rr=record,rrfile=xp.replay,rrsnapshot=init \
>  -drive file=./images/xp.ovl,if=none,id=img-direct \
>  -drive driver=blkreplay,if=none,image=img-direct,id=img-replay \
>  -device ide-hd,drive=img-replay \
>  -net none -m 256M -monitor stdio
>
> While recording I can create some snapshots with savevm.
> Command line for replaying differs only in "rr" option. rrsnapshot there
> loads the initial snapshot.
> Any of the previously created snapshots may be specified.
> You can also create new snapshots while replaying.
>
>
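To make the "differs only in the rr option" point concrete, here is a minimal sketch. The helper name `rr_cmd` is hypothetical, the paths and IDs are simply copied from the command quoted above, and the function only prints the command line rather than running QEMU:

```shell
#!/bin/sh
# Hypothetical helper: print the QEMU command line for a given rr mode.
# $1 is "record" or "replay"; $2 is the snapshot name (defaults to "init").
# All file names and drive IDs are taken from the quoted command; adjust
# them to your own setup. Nothing is executed here.
rr_cmd() {
  mode="$1"
  snap="${2:-init}"
  printf '%s ' \
    ./bin/qemu-system-i386 \
    -global apic-common.vapic=off \
    -icount "shift=7,rr=${mode},rrfile=xp.replay,rrsnapshot=${snap}" \
    -drive file=./images/xp.ovl,if=none,id=img-direct \
    -drive driver=blkreplay,if=none,image=img-direct,id=img-replay \
    -device ide-hd,drive=img-replay \
    -net none -m 256M -monitor stdio
  printf '\n'
}

rr_cmd record   # recording run, snapshot "init" created at the start
rr_cmd replay   # replay run, snapshot "init" loaded at the start
```

Passing a second argument, e.g. `rr_cmd replay mysnap`, would select another previously created snapshot; `mysnap` is only a placeholder name.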
> > \
> > -drive driver=blkreplay,if=none,image=img-direct,id=img-blkreplay \
> > -device ide-hd,drive=img-blkreplay \
> > \
> > -netdev user,id=net1 \
> > -device rtl8139,netdev=net1 \
> > -object filter-replay,id=replay,netdev=net1 \
> > "
> > cmd="${cmd} $@"
> > echo "$cmd"
> > eval "$cmd -icount 'shift=7,rr=record,rrfile=replay.bin'"
> > eval "$cmd -icount 'shift=7,rr=replay,rrfile=replay.bin' -S -s"
> >
> > Then I take a snapshot right at the beginning of the execution:
> >
> > telnet 45454
> > savevm a
> >
> > And on another shell:
> >
> > /data/git/linux-kernel-module-cheat/out/x86_64/buildroot/host/usr/bin/x86_64-linux-gdb \
> > -q \
> > -ex 'file vmlinux' \
> > -ex 'target remote localhost:1234' \
> > -ex 'break start_kernel' \
> > -ex 'continue' \
> >
> > But now if I try on GDB:
> >
> > next
> > next
> > next
> > reverse-continue
> >
> > hoping to go back to start_kernel, but nothing happens.
>
> Yes, because you are missing your snapshot, which was actually created in
> the temporary overlay.
>
> > Same behavior if I take the snapshot after reaching start_kernel instead.
> >
> > 2)
> >
> > I wonder if it would be possible to expose checkpoint taking through
> > GDB, for example via:
> > https://sourceware.org/gdb/onlinedocs/gdb/Checkpoint_002fRestart.html
>
> We'll check this out.
>

Actually, this is not needed: I have now learnt that you can send QEMU
monitor commands from GDB with `monitor <qemu-monitor-command>`, e.g.
`monitor savevm a`.
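
As a sketch of how that fits into the GDB invocation from my earlier message: the helper name `gdb_cmd` is hypothetical, plain `gdb` stands in for the cross GDB binary, and the function only prints the command line instead of starting GDB:

```shell
#!/bin/sh
# Hypothetical helper: print a GDB command line that connects to QEMU's
# gdbstub (started with -s, i.e. localhost:1234), runs to start_kernel and
# then takes a QEMU snapshot named $1 through GDB's "monitor" command.
gdb_cmd() {
  snap="$1"
  printf '%s\n' "gdb -q -ex 'file vmlinux' -ex 'target remote localhost:1234' -ex 'break start_kernel' -ex 'continue' -ex 'monitor savevm ${snap}'"
}

gdb_cmd a
```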


>
> > Or some other more convenient checkpoint generation method, e.g.
> > automatically take checkpoints every N instructions.
>
> We implemented 'taking snapshots every N seconds', but I'd prefer to
> submit it later, after the main idea is approved.
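
In the meantime, something similar can presumably be approximated from outside QEMU by feeding `savevm` commands to the monitor at fixed intervals. This is a rough sketch and not the implementation mentioned above: `periodic_savevm` and the `auto<i>` snapshot names are hypothetical, and piping the output into the telnet monitor with `nc` is an assumption about the setup.

```shell
#!/bin/sh
# Hypothetical helper: emit one "savevm auto<i>" monitor command every
# $1 seconds, $2 times in total. It only writes the commands to stdout;
# delivering them to the QEMU monitor is left to the caller.
periodic_savevm() {
  interval="$1"
  count="$2"
  i=0
  while [ "$i" -lt "$count" ]; do
    printf 'savevm auto%s\n' "$i"
    i=$((i + 1))
    sleep "$interval"
  done
}

# Assumed usage with the monitor on telnet::45454 from the script above:
#   periodic_savevm 10 5 | nc localhost 45454
periodic_savevm 0 3   # demo: print three commands without waiting
```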
>
> > > Running the execution in replay mode allows using GDB reverse debugging
> > > commands:
> > >  - reverse-stepi (or rsi): Steps one instruction to the past.
> > >    QEMU loads one of the prior snapshots and proceeds forward to the
> > >    desired instruction. When that step is reached, execution stops.
> > >  - reverse-continue (or rc): Runs execution "backwards".
> > >    QEMU tries to find a breakpoint or watchpoint by loading a prior
> > >    snapshot and replaying the execution. Then QEMU loads the snapshot
> > >    again and replays to the latest breakpoint. When there are no
> > >    breakpoints in the examined section of the execution, QEMU finds one
> > >    more snapshot and tries again. After the first snapshot is processed,
> > >    execution stops at this snapshot.
> > >
> > > The set of patches includes the following modifications:
> > >  - gdbstub update for reverse debugging support
> > >  - functions that automatically perform reverse step and reverse
> > >    continue operations
> > >  - hmp/qmp commands for manipulating the replay process
> > >  - improvement of the snapshotting for saving the execution step
> > >    in the snapshot parameters
> > >  - other record/replay fixes
> > >
> > > The patches are available in the repository:
> > > https://github.com/ispras/qemu/tree/rr-180207
>
> Pavel Dovgalyuk
>
>