[v5,00/24] Fixing record/replay and adding reverse debugging

Message ID 20180725121311.12867.21729.stgit@pasha-VirtualBox

Pavel Dovgalyuk July 25, 2018, 12:13 p.m. UTC
The GDB remote protocol supports reverse debugging of targets.
It includes 'reverse step' and 'reverse continue' operations.
The first one returns to the previous step of the execution,
and the second one stops at the most recent breakpoint that
would have been hit during normal forward execution.

Reverse debugging is possible in the replay mode, when at least
one snapshot was created at the record or replay phase.
QEMU can use these snapshots for travelling back in time with GDB.

Running the execution in replay mode allows using GDB reverse debugging
commands:
 - reverse-stepi (or rsi): Steps one instruction to the past.
   QEMU loads one of the prior snapshots and replays forward to the
   desired instruction. When that step is reached, execution stops.
 - reverse-continue (or rc): Runs execution "backwards".
   QEMU searches for a breakpoint or watchpoint by loading a prior
   snapshot and replaying the execution. Then QEMU loads the snapshot
   again and replays to the latest breakpoint. When there are no
   breakpoints in the examined section of the execution, QEMU finds one
   more snapshot and tries again. Once the earliest snapshot has been
   processed without a hit, execution stops at that snapshot.
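
The two operations described above can be sketched as a toy model. This is
not the code from replay/replay-debugging.c; it only assumes that snapshots
and breakpoint hits can be represented as instruction counts, and the
function names reverse_step and reverse_continue are hypothetical.

```python
from bisect import bisect_left, bisect_right


def reverse_step(snapshots, current):
    """Toy model of reverse-stepi.

    Returns (snapshot_to_load, target_step): the latest snapshot at or
    before the target, and the step to replay forward to.
    """
    if current <= 0:
        raise ValueError("already at the start of the execution")
    snaps = sorted(snapshots)
    target = current - 1
    idx = bisect_right(snaps, target) - 1
    if idx < 0:
        raise ValueError("no snapshot at or before the target step")
    return snaps[idx], target


def reverse_continue(snapshots, breakpoint_hits, current):
    """Toy model of reverse-continue.

    Walks the snapshots from newest to oldest. For each one, looks at the
    section [snapshot, window_end) and returns the latest step in it where
    a breakpoint would fire. If no section contains a hit, returns the
    earliest snapshot.
    """
    snaps = sorted(s for s in snapshots if s < current)
    if not snaps:
        raise ValueError("reverse debugging needs at least one prior snapshot")
    hits = sorted(breakpoint_hits)
    window_end = current
    for snap in reversed(snaps):
        lo = bisect_left(hits, snap)        # first hit at or after snapshot
        hi = bisect_left(hits, window_end)  # first hit at or after window end
        if hi > lo:
            return hits[hi - 1]             # latest hit inside the section
        window_end = snap                   # nothing here, try an older one
    return snaps[0]


if __name__ == "__main__":
    print(reverse_step([0, 100, 200], 150))
    print(reverse_continue([0, 100, 200], [40, 150, 160], 180))
```

The model makes the snapshot requirement visible: without any snapshot
before the current step there is nothing to load, so neither operation can
move backwards.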

The set of patches includes the following modifications:
 - fixes for record/replay problems caused by QEMU core changes
 - gdbstub update for reverse debugging support
 - functions that automatically perform reverse step and reverse
   continue operations
 - hmp/qmp commands for manipulating the replay process
 - snapshotting improvements that save the execution step
   in the snapshot parameters
 - other record/replay fixes

The patches are available in the repository:
https://github.com/ispras/qemu/tree/rr-180725

v5 changes:
 - multiple fixes for record/replay bugs that appeared after the QEMU core update
 - changed reverse debugging to 'since 3.1'

v4 changes:
 - changed 'since 2.13' to 'since 3.0' in json (as suggested by Eric Blake)

v3 changes:
 - Fixed PS/2 bug with save/load vm, which caused failures of the replay.
 - Rebased to the new code base.
 - Minor fixes.

v2 changes:
 - documented reverse debugging
 - fixed start vmstate loading in record mode
 - documented qcow2 changes (as suggested by Eric Blake)
 - made icount SnapshotInfo field optional (as suggested by Eric Blake)
 - renamed qmp commands (as suggested by Eric Blake)
 - minor changes

---

Pavel Dovgalyuk (24):
      block: implement bdrv_snapshot_goto for blkreplay
      replay: disable default snapshot for record/replay
      replay: update docs for record/replay with block devices
      replay: don't drain/flush bdrv queue while RR is working
      replay: finish record/replay before closing the disks
      qcow2: introduce icount field for snapshots
      migration: introduce icount field for snapshots
      replay: introduce info hmp/qmp command
      replay: introduce breakpoint at the specified step
      replay: implement replay-seek command to proceed to the desired step
      replay: flush events when exiting
      timer: remove replay clock probe in deadline calculation
      replay: refine replay-time module
      translator: fix breakpoint processing
      replay: flush rr queue before loading the vmstate
      gdbstub: add reverse step support in replay mode
      gdbstub: add reverse continue support in replay mode
      replay: describe reverse debugging in docs/replay.txt
      replay: allow loading any snapshots before recording
      ps2: prevent changing irq state on save and load
      replay: wake up vCPU when replaying
      replay: replay BH for IDE trim operation
      replay: add BH oneshot event for block layer
      slirp: fix ipv6 timers


 accel/tcg/translator.c    |    9 +
 block/blkreplay.c         |    8 +
 block/block-backend.c     |    3 
 block/io.c                |   22 +++
 block/qapi.c              |   17 ++-
 block/qcow2-snapshot.c    |    9 +
 block/qcow2.h             |    2 
 blockdev.c                |   10 ++
 cpus.c                    |   50 +++++---
 docs/interop/qcow2.txt    |    4 +
 docs/replay.txt           |   45 +++++++
 exec.c                    |    6 +
 gdbstub.c                 |   50 +++++++-
 hmp-commands-info.hx      |   14 ++
 hmp-commands.hx           |   30 +++++
 hmp.h                     |    3 
 hw/ide/core.c             |    3 
 hw/input/ps2.c            |    8 +
 include/block/snapshot.h  |    1 
 include/sysemu/replay.h   |   24 ++++
 migration/savevm.c        |   15 +-
 qapi/block-core.json      |    5 +
 qapi/block.json           |    3 
 qapi/misc.json            |   68 +++++++++++
 replay/Makefile.objs      |    3 
 replay/replay-debugging.c |  287 +++++++++++++++++++++++++++++++++++++++++++++
 replay/replay-events.c    |   30 +++--
 replay/replay-internal.h  |   11 +-
 replay/replay-snapshot.c  |   17 ++-
 replay/replay-time.c      |   27 ++--
 replay/replay.c           |   36 +++++-
 slirp/ip6_icmp.c          |    6 -
 stubs/replay.c            |   16 +++
 util/qemu-timer.c         |   11 --
 vl.c                      |   18 ++-
 35 files changed, 772 insertions(+), 99 deletions(-)
 create mode 100644 replay/replay-debugging.c

Comments

no-reply@patchew.org July 25, 2018, 2:15 p.m. UTC | #1
Hi,

This series seems to have some coding style problems. See output below for
more information:

Type: series
Message-id: 20180725121311.12867.21729.stgit@pasha-VirtualBox
Subject: [Qemu-devel] [PATCH v5 00/24] Fixing record/replay and adding reverse debugging

=== TEST SCRIPT BEGIN ===
#!/bin/bash

BASE=base
n=1
total=$(git log --oneline $BASE.. | wc -l)
failed=0

git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram

commits="$(git log --format=%H --reverse $BASE..)"
for c in $commits; do
    echo "Checking PATCH $n/$total: $(git log -n 1 --format=%s $c)..."
    if ! git show $c --format=email | ./scripts/checkpatch.pl --mailback -; then
        failed=1
        echo
    fi
    n=$((n+1))
done

exit $failed
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
Switched to a new branch 'test'
b933359fcb slirp: fix ipv6 timers
08f5dcb6f9 replay: add BH oneshot event for block layer
1d795aa6f9 replay: replay BH for IDE trim operation
22a0a68431 replay: wake up vCPU when replaying
8a458d20ac ps2: prevent changing irq state on save and load
c076be246d replay: allow loading any snapshots before recording
10622de164 replay: describe reverse debugging in docs/replay.txt
a9f8b005f0 gdbstub: add reverse continue support in replay mode
c1b2f4385e gdbstub: add reverse step support in replay mode
9bd4685704 replay: flush rr queue before loading the vmstate
cffb0d860d translator: fix breakpoint processing
05e4bd25b6 replay: refine replay-time module
ed42025371 timer: remove replay clock probe in deadline calculation
36f5132987 replay: flush events when exiting
ee8c956c92 replay: implement replay-seek command to proceed to the desired step
9aade36782 replay: introduce breakpoint at the specified step
36fb64416b replay: introduce info hmp/qmp command
84f414e5bf migration: introduce icount field for snapshots
67e35a07df qcow2: introduce icount field for snapshots
74d11dda54 replay: finish record/replay before closing the disks
eea5cde9f5 replay: don't drain/flush bdrv queue while RR is working
a96d8d5e35 replay: update docs for record/replay with block devices
e673d40571 replay: disable default snapshot for record/replay
1ea43d85a7 block: implement bdrv_snapshot_goto for blkreplay

=== OUTPUT BEGIN ===
Checking PATCH 1/24: block: implement bdrv_snapshot_goto for blkreplay...
Checking PATCH 2/24: replay: disable default snapshot for record/replay...
Checking PATCH 3/24: replay: update docs for record/replay with block devices...
Checking PATCH 4/24: replay: don't drain/flush bdrv queue while RR is working...
Checking PATCH 5/24: replay: finish record/replay before closing the disks...
Checking PATCH 6/24: qcow2: introduce icount field for snapshots...
Checking PATCH 7/24: migration: introduce icount field for snapshots...
Checking PATCH 8/24: replay: introduce info hmp/qmp command...
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#114: 
new file mode 100644

total: 0 errors, 1 warnings, 132 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
Checking PATCH 9/24: replay: introduce breakpoint at the specified step...
Checking PATCH 10/24: replay: implement replay-seek command to proceed to the desired step...
Checking PATCH 11/24: replay: flush events when exiting...
Checking PATCH 12/24: timer: remove replay clock probe in deadline calculation...
WARNING: line over 80 characters
#37: FILE: util/qemu-timer.c:584:
+                                            timerlist_deadline_ns(tlg->tl[type]));

total: 0 errors, 1 warnings, 19 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
Checking PATCH 13/24: replay: refine replay-time module...
Checking PATCH 14/24: translator: fix breakpoint processing...
Checking PATCH 15/24: replay: flush rr queue before loading the vmstate...
Checking PATCH 16/24: gdbstub: add reverse step support in replay mode...
Checking PATCH 17/24: gdbstub: add reverse continue support in replay mode...
Checking PATCH 18/24: replay: describe reverse debugging in docs/replay.txt...
Checking PATCH 19/24: replay: allow loading any snapshots before recording...
Checking PATCH 20/24: ps2: prevent changing irq state on save and load...
Checking PATCH 21/24: replay: wake up vCPU when replaying...
Checking PATCH 22/24: replay: replay BH for IDE trim operation...
Checking PATCH 23/24: replay: add BH oneshot event for block layer...
ERROR: "(foo*)" should be "(foo *)"
#59: FILE: replay/replay-events.c:41:
+        ((QEMUBHFunc*)event->opaque)(event->opaque2);

ERROR: space required after that ',' (ctx:VxV)
#69: FILE: replay/replay-events.c:139:
+    QEMUBHFunc *cb,void *opaque)
                   ^

ERROR: space required after that ',' (ctx:VxV)
#133: FILE: stubs/replay.c:95:
+    QEMUBHFunc *cb,void *opaque)
                   ^

total: 3 errors, 0 warnings, 88 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 24/24: slirp: fix ipv6 timers...
WARNING: line over 80 characters
#31: FILE: slirp/ip6_icmp.c:30:
+    slirp->ra_timer = timer_new_ms(QEMU_CLOCK_REALTIME, ra_timer_handler, slirp);

total: 0 errors, 1 warnings, 19 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
=== OUTPUT END ===

Test command exited with code: 1


---
Email generated automatically by Patchew [http://patchew.org/].
Please send your feedback to patchew-devel@redhat.com
Ciro Santilli Aug. 7, 2018, 11:13 p.m. UTC | #2
OK, finally got some time to try it out, I'm using
c42634d8e3428cfa60672c3ba89cabefc720cde9 from rr-180725.

Replay works well as far as I can tell, so I moved to the reverse debugging:

/home/ciro/bak/git/linux-kernel-module-cheat/out/x86_64/buildroot/build/host-qemu-custom.rr/x86_64-softmmu/qemu-system-x86_64 \
-M pc \
-append 'root=/dev/sda nopat console_msg_format=syslog nokaslr norandmaps printk.devkmsg=on printk.time=y console=ttyS0 - lkmc_eval_base64="L3JhbmRfY2hlY2sub3V0Oy9wb3dlcm9mZi5vdXQ7"' \
-kernel '/home/ciro/bak/git/linux-kernel-module-cheat/out/x86_64/buildroot/build/linux-custom.default/arch/x86/boot/bzImage' \
-m '256M' \
-monitor 'telnet::45454,server,nowait' \
-nographic \
-serial mon:stdio \
-smp '1' \
\
-drive 'file=/home/ciro/bak/git/linux-kernel-module-cheat/out/x86_64/buildroot/images/rootfs.ext2.qcow2,format=qcow2,if=none,id=img-direct' \
-drive driver=blkreplay,if=none,image=img-direct,id=img-blkreplay \
-device ide-hd,drive=img-blkreplay \
\
-object filter-replay,id=replay,netdev=net0 \
-device rtl8139,netdev=net0 \
-netdev 'user,hostfwd=tcp::45455-:45455,hostfwd=tcp::45456-:22,id=net0' \
\
-icount 'shift=7,rr=record,rrfile=/home/ciro/bak/git/linux-kernel-module-cheat/out/x86_64/qemu/0/rrfile' \

and replay with:

-icount 'shift=7,rr=replay,rrfile=/home/ciro/bak/git/linux-kernel-module-cheat/out/x86_64/qemu/0/rrfile' \
-gdb 'tcp::45457' \
-S \

Then, I do

/home/ciro/bak/git/linux-kernel-module-cheat/out/x86_64/buildroot/host/usr/bin/x86_64-linux-gdb \
 -q \
-ex 'add-auto-load-safe-path /home/ciro/bak/git/linux-kernel-module-cheat' \
-ex 'file vmlinux' \
-ex 'target remote localhost:45457' \
-ex 'break start_kernel' \
  -ex continue \
-ex 'lx-symbols ../kernel_module-1.0/' \

Then in GDB:

n
n
n
n
reverse-continue

expecting it to return me to start_kernel, but instead it left me in the
same place that I'm at.

I also tried to manually checkpoint from qemu monitor at the very start,
but it didn't change anything.

bzImage at:
https://github.com/cirosantilli/linux-kernel-module-cheat/releases/download/sha-19f4d00f9b13aa67369e32ec7cd3518950c6f30e/bzImage
and docs at:
https://github.com/cirosantilli/linux-kernel-module-cheat/tree/19f4d00f9b13aa67369e32ec7cd3518950c6f30e#qemu-record-and-replay


Pavel Dovgalyuk Sept. 12, 2018, 8:14 a.m. UTC | #3
Hi, Ciro!

I found several issues in your command lines.

Ciro Santilli wrote 2018-08-08 02:13:
> OK, finally got some time to try it out, I'm using
> c42634d8e3428cfa60672c3ba89cabefc720cde9 from rr-180725.
> 
> Replay works well as far as I can tell, so I moved to the reverse
> debugging:
> 
> /home/ciro/bak/git/linux-kernel-module-cheat/out/x86_64/buildroot/build/host-qemu-custom.rr/x86_64-softmmu/qemu-system-x86_64
> \
> -M pc \
> -append 'root=/dev/sda nopat console_msg_format=syslog nokaslr
> norandmaps printk.devkmsg=on printk.time=y console=ttyS0 -
> lkmc_eval_base64="L3JhbmRfY2hlY2sub3V0Oy9wb3dlcm9mZi5vdXQ7"' \
> -kernel
> '/home/ciro/bak/git/linux-kernel-module-cheat/out/x86_64/buildroot/build/linux-custom.default/arch/x86/boot/bzImage'
> \
> -m '256M' \
> -monitor 'telnet::45454,server,nowait' \
> -nographic \
> -serial mon:stdio \
> -smp '1' \
> \
> -drive
> 'file=/home/ciro/bak/git/linux-kernel-module-cheat/out/x86_64/buildroot/images/rootfs.ext2.qcow2,format=qcow2,if=none,id=img-direct'

You'll probably need an overlay if you want this file to be left
unchanged by the VM.

Can you also provide this file for testing? I found only bzImage.

> \
> -drive driver=blkreplay,if=none,image=img-direct,id=img-blkreplay \
> -device ide-hd,drive=img-blkreplay \
> \
> -object filter-replay,id=replay,netdev=net0 \
> -device rtl8139,netdev=net0 \
> -netdev
> 'user,hostfwd=tcp::45455-:45455,hostfwd=tcp::45456-:22,id=net0' \
> \
> -icount
> 'shift=7,rr=record,rrfile=/home/ciro/bak/git/linux-kernel-module-cheat/out/x86_64/qemu/0/rrfile'

You need to specify the rrsnapshot=<name> option to create the initial
VM snapshot.
This option creates a snapshot at record and loads it at replay. GDB can
also use this snapshot for reverse execution.

> \
> 
> and replay with:
> 
> -icount
> 'shift=7,rr=replay,rrfile=/home/ciro/bak/git/linux-kernel-module-cheat/out/x86_64/qemu/0/rrfile'
> \
> -gdb 'tcp::45457' \
> -S \
> 
> Then, I do
> 
> /home/ciro/bak/git/linux-kernel-module-cheat/out/x86_64/buildroot/host/usr/bin/x86_64-linux-gdb
> \
>  -q \
> -ex 'add-auto-load-safe-path
> /home/ciro/bak/git/linux-kernel-module-cheat' \
> -ex 'file vmlinux' \
> -ex 'target remote localhost:45457' \
> -ex 'break start_kernel' \
>   -ex continue \
> -ex 'lx-symbols ../kernel_module-1.0/' \
> 
> Then in GDB:
> 
> n
> n
> n
> n
> reverse-continue
> 
> expecting it to return me to start_kernel, but instead it left me in
> the same place that I'm at.

Right, because there were no checkpoints. The initial one must be
created during the recording phase.
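
As a rough illustration of the fix, the helper below assembles matching
-icount option strings for the record and replay runs. The helper name
icount_option, the short rrfile path and the snapshot name "init" are
placeholders chosen for this sketch; only the shift, rr, rrfile and
rrsnapshot keys come from the command lines and advice above.

```python
def icount_option(mode, rrfile, rrsnapshot=None, shift=7):
    """Build the value passed to QEMU's -icount option.

    mode is "record" or "replay". When rrsnapshot is given, it is appended
    so that the same snapshot name is used in both runs.
    """
    if mode not in ("record", "replay"):
        raise ValueError("mode must be 'record' or 'replay'")
    parts = [f"shift={shift}", f"rr={mode}", f"rrfile={rrfile}"]
    if rrsnapshot is not None:
        parts.append(f"rrsnapshot={rrsnapshot}")
    return ",".join(parts)


# Placeholder values: "rrfile" and "init" are illustrative only.
record_args = ["-icount", icount_option("record", "rrfile", "init")]
replay_args = [
    "-icount", icount_option("replay", "rrfile", "init"),
    "-gdb", "tcp::45457",
    "-S",
]

if __name__ == "__main__":
    print(" ".join(record_args))
    print(" ".join(replay_args))
```

Keeping the snapshot name in one place avoids a mismatch between the
record and replay runs.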



Pavel Dovgalyuk