[v2,for-8.2,00/19] ppc: record-replay enablement and fixes

Message ID 20230808042001.411094-1-npiggin@gmail.com

Message

Nicholas Piggin Aug. 8, 2023, 4:19 a.m. UTC
The patches in this series have been seen a few times in various
iterations.

There are two main pieces: some assorted small fixes and tests for
record-replay, plus a large set of decrementer fixes. I merged these
into one series rather than send decrementer fixes alone first, because
record-replay has been very good at uncovering timer problems, so it's
good to have those test cases in at the same time IMO.

Some of the fixes we might take to stable, but it is unclear which.
Decrementer fixes were a bit of a tangle so maybe we just leave those
alone since they work okay.

The decrementer is still not emulated perfectly. Underflow from -ve
to +ve is not implemented, for one. I started doing that but it's not
trivial, so better to stop here for now.
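
For readers unfamiliar with the signed decrementer value referred to
above (and in the "Sign-extend large decrementer to 64-bits" patch), here
is a minimal sketch of sign-extending an N-bit register value to 64 bits.
It is not the code from the series: the function name and the nr_bits
parameter are hypothetical, and the 56-bit width used in the usage note
is only an example value.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Sign-extend the low nr_bits of value to a signed 64-bit integer.
 * Hypothetical helper for illustration only; nr_bits stands in for the
 * implemented width of the large decrementer.
 */
static inline int64_t dec_sign_extend(uint64_t value, unsigned nr_bits)
{
    assert(nr_bits >= 1 && nr_bits <= 64);

    if (nr_bits == 64) {
        return (int64_t)value;
    }

    uint64_t mask = (UINT64_C(1) << nr_bits) - 1;
    uint64_t sign = UINT64_C(1) << (nr_bits - 1);

    /* Discard anything above the implemented width. */
    value &= mask;

    /* XOR-and-subtract: clears the sign bit, then subtracts its weight. */
    return (int64_t)((value ^ sign) - sign);
}
```

With a 56-bit width, an all-ones register value reads back as -1, while
small positive values are unchanged.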

For record-replay, pseries is now quite solid with rr. Surely some
issues to iron out but it is becoming usable.

powernv record-replay has some known problems migrating edge-triggered
decrementer, and edge-triggered msgsnd. Also it seems to get stuck in
xive init somewhere when replaying from checkpoint, so there is probably
some state in xive not being reset. But at least it runs the avocado
test and seems close to working, so I've added that test case so we
don't go backwards (ha!).

Other machine types might not be too far off if there is interest. I
found it quite difficult to find these problems though: reverse
debugging will sometimes just lock up, stop at the wrong location, or
abort with the wrong event, and it is difficult to understand what went
wrong. In the worst case I had to basically bisect the replay of the
trace to find the minimum length of replay that hit the problem, which
would sometimes land near a mtDEC or timer interrupt or similar.
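
The bisection described above can be sketched as a plain binary search
over the replay length. This is only an illustration under assumptions,
not tooling from the series: it assumes the failure is monotonic (once a
replay of some length hits the problem, every longer replay does too),
and the callback type, function names and the stand-in predicate are all
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical callback: replay nr_steps of the trace, report failure. */
typedef bool (*replay_fails_fn)(uint64_t nr_steps, void *opaque);

/*
 * Return the smallest replay length in (good, bad] that fails.
 * Preconditions: a replay of length 'good' passes, one of length 'bad'
 * fails, and failures are monotonic in the replay length.
 */
static inline uint64_t bisect_min_failing_replay(uint64_t good, uint64_t bad,
                                                 replay_fails_fn fails,
                                                 void *opaque)
{
    assert(good < bad);

    while (bad - good > 1) {
        uint64_t mid = good + (bad - good) / 2;

        if (fails(mid, opaque)) {
            bad = mid;      /* problem already present at mid */
        } else {
            good = mid;     /* problem appears somewhere after mid */
        }
    }
    return bad;
}

/* Stand-in predicate for demonstration: fails from *opaque onwards. */
static inline bool fails_at_or_after(uint64_t nr_steps, void *opaque)
{
    return nr_steps >= *(uint64_t *)opaque;
}
```

In practice the predicate would launch a replay limited to nr_steps and
check whether it locks up or aborts, which is the expensive part; the
search itself needs only about log2(bad - good) replays.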

Thanks,
Nick

Nicholas Piggin (19):
  ppc/vhyp: reset exception state when handling vhyp hcall
  ppc/vof: Fix missed fields in VOF cleanup
  hw/ppc/ppc.c: Tidy over-long lines
  hw/ppc: Introduce functions for conversion between timebase and
    nanoseconds
  host-utils: Add muldiv64_round_up
  hw/ppc: Round up the decrementer interval when converting to ns
  hw/ppc: Avoid decrementer rounding errors
  target/ppc: Sign-extend large decrementer to 64-bits
  hw/ppc: Always store the decrementer value
  target/ppc: Migrate DECR SPR
  hw/ppc: Reset timebase facilities on machine reset
  hw/ppc: Read time only once to perform decrementer write
  target/ppc: Fix CPU reservation migration for record-replay
  target/ppc: Fix timebase reset with record-replay
  spapr: Fix machine reset deadlock from replay-record
  spapr: Fix record-replay machine reset consuming too many events
  tests/avocado: boot ppc64 pseries replay-record test to Linux VFS
    mount
  tests/avocado: reverse-debugging cope with re-executing breakpoints
  tests/avocado: ppc64 reverse debugging tests for pseries and powernv

 hw/ppc/mac_oldworld.c              |   1 +
 hw/ppc/pegasos2.c                  |   1 +
 hw/ppc/pnv_core.c                  |   2 +
 hw/ppc/ppc.c                       | 236 +++++++++++++++++++----------
 hw/ppc/prep.c                      |   1 +
 hw/ppc/spapr.c                     |  32 +++-
 hw/ppc/spapr_cpu_core.c            |   2 +
 hw/ppc/vof.c                       |   2 +
 include/hw/ppc/ppc.h               |   3 +-
 include/hw/ppc/spapr.h             |   2 +
 include/qemu/host-utils.h          |  21 ++-
 target/ppc/compat.c                |  19 +++
 target/ppc/cpu.h                   |   3 +
 target/ppc/excp_helper.c           |   3 +
 target/ppc/machine.c               |  40 ++++-
 target/ppc/translate.c             |   4 +
 tests/avocado/replay_kernel.py     |   3 +-
 tests/avocado/reverse_debugging.py |  54 ++++++-
 18 files changed, 330 insertions(+), 99 deletions(-)
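
As an illustration of the rounding that the "host-utils: Add
muldiv64_round_up" and "hw/ppc: Round up the decrementer interval when
converting to ns" patches are concerned with, here is a minimal sketch
of a round-up multiply-divide used to convert timebase ticks to
nanoseconds. It is not the implementation from the series: the function
names are hypothetical, it relies on the GCC/Clang unsigned __int128
extension for the intermediate product, and the 512 MHz frequency in the
usage note is just an example value. Rounding up presumably matters
because a truncated interval could let the timer fire slightly before
the decrementer has actually expired.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC_SKETCH UINT32_C(1000000000)

/*
 * Compute ceil(a * b / c) without overflowing the intermediate product.
 * Hypothetical helper; uses the unsigned __int128 compiler extension.
 * The caller must ensure the result fits in 64 bits.
 */
static inline uint64_t muldiv_round_up_sketch(uint64_t a, uint32_t b,
                                              uint32_t c)
{
    assert(c != 0);

    unsigned __int128 prod = (unsigned __int128)a * b;

    return (uint64_t)((prod + c - 1) / c);
}

/* Convert timebase ticks to nanoseconds, rounding any fraction up. */
static inline uint64_t tb_ticks_to_ns_round_up(uint64_t ticks,
                                               uint32_t tb_freq_hz)
{
    return muldiv_round_up_sketch(ticks, NSEC_PER_SEC_SKETCH, tb_freq_hz);
}
```

At 512 MHz one tick is 1.953125 ns, so a single tick converts to 2 ns
rather than the 1 ns a truncating division would give, while 512 ticks
convert to exactly 1000 ns.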

Comments

Cédric Le Goater Aug. 29, 2023, 4:43 p.m. UTC | #1
Hello,

On 8/8/23 06:19, Nicholas Piggin wrote:
> [...]

I am preparing a PR with this series. It is time to take a look at it if you
haven't already!

Thanks,

C.