[v3,00/10] Improve RISC-V Perf support using SBI PMU and sscofpmf extension

Message ID 20210910192757.2309100-1-atish.patra@wdc.com

Message

Atish Patra Sept. 10, 2021, 7:27 p.m. UTC
This series adds improved perf support for RISC-V based systems using the
SBI PMU extension[1] and the Sscofpmf extension[2]. The SBI PMU extension allows
the kernel to program the counters for different events and to start/stop counters,
while the Sscofpmf extension provides the counter overflow interrupt and privilege
mode filtering. A hardware platform can leverage the SBI PMU extension without
the Sscofpmf extension if it supports mcountinhibit and mcounteren. However,
the reverse is not true. With both of these extensions enabled, a platform can
take advantage of both event counting and sampling using the perf tool.
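
The dependency described above can be written down as a small sketch. The
struct and function names below are hypothetical and exist only for
illustration; they are not taken from the patches.

```c
#include <stdbool.h>

/* Hypothetical capability flags; names are illustrative only. */
struct pmu_platform {
	bool sbi_pmu;        /* firmware implements the SBI PMU extension */
	bool sscofpmf;       /* hardware implements Sscofpmf */
	bool mcountinhibit;  /* hardware implements mcountinhibit */
	bool mcounteren;     /* hardware implements mcounteren */
};

/* Counting (perf stat style) needs SBI PMU plus mcountinhibit/mcounteren. */
static inline bool pmu_can_count(const struct pmu_platform *p)
{
	return p->sbi_pmu && p->mcountinhibit && p->mcounteren;
}

/* Sampling (perf record style) additionally needs Sscofpmf overflow interrupts. */
static inline bool pmu_can_sample(const struct pmu_platform *p)
{
	return pmu_can_count(p) && p->sscofpmf;
}
```

A platform with Sscofpmf but without the SBI PMU extension gets neither
capability in this model, which matches "the reverse is not true".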

This series introduces a platform perf driver in place of the existing
arch-specific implementation. The new perf implementation adopts a modular
approach where most of the generic event handling is done in the core library,
while individual PMUs only need to implement the features specific to
the PMU. This is easily extensible, and any future RISC-V PMU implementation
can leverage it. Currently, the SBI PMU driver and the legacy PMU driver are
implemented as part of this series.
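
To make the library/driver split concrete, here is a minimal userspace sketch
of the idea. All names (pmu_backend_ops, pmu_event, pmu_core_update, the fake
backend) are hypothetical and are not the structures used in the patches; the
delta arithmetic assumes a full 64-bit counter.

```c
#include <stdint.h>

/* Hypothetical per-PMU callbacks; the real structure in the series may differ. */
struct pmu_backend_ops {
	void     (*start)(int ctr);
	void     (*stop)(int ctr);
	uint64_t (*read)(int ctr);
};

/* Generic bookkeeping kept by the core library for one event. */
struct pmu_event {
	int      ctr;    /* counter index assigned to the event */
	uint64_t prev;   /* last raw counter value seen */
	uint64_t count;  /* accumulated event count */
};

/* Core helper: fold the counter delta into the event total. */
static inline void pmu_core_update(const struct pmu_backend_ops *ops,
				   struct pmu_event *ev)
{
	uint64_t now = ops->read(ev->ctr);

	ev->count += now - ev->prev;  /* unsigned math tolerates 64-bit wraparound */
	ev->prev = now;
}

/* Core helper: stop the counter if the backend can, then take a final reading. */
static inline void pmu_core_stop(const struct pmu_backend_ops *ops,
				 struct pmu_event *ev)
{
	if (ops->stop)  /* a legacy-style backend may not be able to stop counters */
		ops->stop(ev->ctr);
	pmu_core_update(ops, ev);
}

/* Fake backend for illustration: a counter that is just a variable. */
static uint64_t fake_counter;

static inline uint64_t fake_read(int ctr)
{
	(void)ctr;
	return fake_counter;
}
```

The core helpers never touch hardware directly, so a backend only has to fill
in the callbacks it actually supports.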

The legacy driver reimplements the existing minimal perf support under a new
config option to maintain backward compatibility. This implementation only allows
monitoring of the always-running cycle/instruction counters. Moreover, they
cannot be started or stopped. In general, this is very limited and not very useful.
That's why I am not very keen to carry the support into the new driver.
However, I don't want to break perf for any existing hardware platforms.
If nobody really uses perf currently, I will be happy to drop PATCH 4.

This series has been tested in Qemu on both RV64 & RV32. The Qemu[5] & OpenSBI[3]
patches are required to test it. The Qemu changes are not backward compatible.
That means you cannot use perf anymore on older Qemu versions with the latest
OpenSBI and/or kernel. However, a newer kernel will just use the legacy PMU driver if
an old OpenSBI is detected or the hardware doesn't implement mcountinhibit.
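
The fallback rule in the last sentence amounts to the following check. This
is only a sketch; the enum and function names are made up for illustration
and the real probing code in the series will look different.

```c
#include <stdbool.h>

/* Hypothetical identifiers for the two drivers in this series. */
enum pmu_driver {
	PMU_DRIVER_LEGACY,
	PMU_DRIVER_SBI,
};

/*
 * Pick the SBI PMU driver only when the firmware offers the SBI PMU
 * extension and the hardware implements mcountinhibit; otherwise fall
 * back to the legacy driver.
 */
static inline enum pmu_driver pmu_pick_driver(bool sbi_pmu_available,
					      bool has_mcountinhibit)
{
	if (sbi_pmu_available && has_mcountinhibit)
		return PMU_DRIVER_SBI;
	return PMU_DRIVER_LEGACY;
}
```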

Here is the output of perf stat/report while running hackbench with the OpenSBI & Linux
kernel patches applied [3].

Perf stat:
=========

[root@fedora-riscv riscv]# perf stat -e r8000000000000005 -e r8000000000000007
-e r8000000000000006 -e r0000000000020002 -e r0000000000020004 -e branch-misses
-e cache-misses -e dTLB-load-misses -e dTLB-store-misses -e iTLB-load-misses
-e cycles -e instructions ./hackbench -pipe 15 process
Running with 15*40 (== 600) tasks.
Time: 6.578

 Performance counter stats for './hackbench -pipe 15 process':

             6,491      r8000000000000005      (52.59%) --> SBI_PMU_FW_SET_TIMER
            20,433      r8000000000000007      (60.74%) --> SBI_PMU_FW_IPI_RECVD
            21,271      r8000000000000006      (68.71%) --> SBI_PMU_FW_IPI_SENT
                 0      r0000000000020002      (76.55%)
     <not counted>      r0000000000020004      (0.00%)
     <not counted>      branch-misses          (0.00%)
     <not counted>      cache-misses           (0.00%)
        57,537,853      dTLB-load-misses       (9.49%)
         2,821,147      dTLB-store-misses      (18.64%)
        52,928,130      iTLB-load-misses       (27.53%)
    89,521,791,110      cycles                 (36.08%)
    90,678,132,464      instructions #    1.01  insn per cycle (44.44%)

       6.975908032 seconds time elapsed

       3.130950000 seconds user
      24.353310000 seconds sys
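
For readers wondering how the r8000000000000005-style raw codes above relate
to the firmware event names, here is a small decoding sketch. It assumes that
bit 63 of the raw config marks a firmware event and that the low 16 bits
carry the event code; both are inferred from the annotated output above
rather than quoted from the driver. Only the three codes annotated in that
output are named.

```c
#include <stdint.h>
#include <stddef.h>

#define RAW_FW_EVENT_FLAG (1ULL << 63)  /* assumed: top bit marks a firmware event */
#define RAW_FW_CODE_MASK  0xFFFFULL     /* assumed: low 16 bits hold the event code */

/* Return the firmware event code, or -1 if this is not a firmware event. */
static inline int raw_fw_event_code(uint64_t config)
{
	if (!(config & RAW_FW_EVENT_FLAG))
		return -1;
	return (int)(config & RAW_FW_CODE_MASK);
}

/* Map the codes annotated in the perf stat output to their names. */
static inline const char *raw_fw_event_name(uint64_t config)
{
	switch (raw_fw_event_code(config)) {
	case -1: return NULL;  /* not a firmware event */
	case 5:  return "SBI_PMU_FW_SET_TIMER";
	case 6:  return "SBI_PMU_FW_IPI_SENT";
	case 7:  return "SBI_PMU_FW_IPI_RECVD";
	default: return "unknown firmware event";
	}
}
```

With these assumptions, r0000000000020002 and r0000000000020004 are not
firmware events, since bit 63 is clear.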

The patches can also be found in the github[4].

Perf record:
============
[root@fedora-riscv riscv]# perf record -e cycles -e instructions -e \
dTLB-load-misses -e dTLB-store-misses -c 1000 ./hackbench -pipe 15 process 15
Running with 15*40 (== 600) tasks.
Time: 1.238
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.106 MB perf.data (1020 samples) ]

[root@fedora-riscv riscv]# perf report
Available samples
372 cycles
372 instructions
262 dTLB-load-misses
14 dTLB-store-misses
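
As a quick sanity check on these numbers: with -c 1000 each sample stands for
roughly 1000 occurrences of its event, and the per-event sample counts should
add up to the total reported by perf record. The helper names below are
hypothetical and the sketch is only arithmetic, not part of the patches.

```c
#include <stdint.h>
#include <stddef.h>

/* Approximate event count represented by a number of samples at a fixed period. */
static inline uint64_t estimated_events(uint64_t samples, uint64_t period)
{
	return samples * period;
}

/* Sum per-event sample counts, e.g. to compare with the perf record total. */
static inline uint64_t total_samples(const uint64_t *samples, size_t n)
{
	uint64_t sum = 0;

	for (size_t i = 0; i < n; i++)
		sum += samples[i];
	return sum;
}
```

For the run above, 372 + 372 + 262 + 14 gives the 1020 samples reported, and
372 cycle samples at a period of 1000 correspond to about 372,000 cycles.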

[1] https://github.com/riscv-non-isa/riscv-sbi-doc/blob/master/riscv-sbi.adoc
[2] https://drive.google.com/file/d/171j4jFjIkKdj5LWcExphq4xG_2sihbfd/edit
[3] https://github.com/atishp04/opensbi/tree/pmu_sscofpmf 
[4] https://github.com/atishp04/linux/tree/riscv_pmu_v3
[5] https://github.com/atishp04/qemu/tree/riscv_pmu_v2

Changes from v2->v3:
1. Added interrupt overflow support.
2. Cleaned up legacy driver initialization.
3. Supports perf record now.
4. Added the DT binding and maintainers file.
5. Changed cpu hotplug notifier to be multi-state.
6. OpenSBI doesn't disable cycle/instret counter during boot. Update the
   perf code to disable all the counters during boot.

Changes from v1->v2:
1. Implemented the latest SBI PMU extension specification.
2. The core platform driver was changed to operate as a library while only
   sbi based PMU is built as a driver. The legacy one is just a fallback if
   SBI PMU extension is not available.

Atish Patra (10):
RISC-V: Remove the current perf implementation
RISC-V: Add CSR encodings for all HPMCOUNTERS
RISC-V: Add a perf core library for pmu drivers
RISC-V: Add a simple platform driver for RISC-V legacy perf
RISC-V: Add RISC-V SBI PMU extension definitions
dt-binding: pmu: Add RISC-V PMU DT bindings
RISC-V: Add perf platform driver based on SBI PMU extension
RISC-V: Add interrupt support for perf
Documentation: riscv: Remove the old documentation
MAINTAINERS: Add entry for RISC-V PMU drivers

.../devicetree/bindings/perf/riscv,pmu.yaml   |  51 ++
Documentation/riscv/pmu.rst                   | 255 ------
MAINTAINERS                                   |  10 +
arch/riscv/Kconfig                            |  13 -
arch/riscv/include/asm/csr.h                  |  66 +-
arch/riscv/include/asm/perf_event.h           |  72 --
arch/riscv/include/asm/sbi.h                  |  97 +++
arch/riscv/kernel/Makefile                    |   1 -
arch/riscv/kernel/perf_event.c                | 485 ------------
drivers/perf/Kconfig                          |  25 +
drivers/perf/Makefile                         |   5 +
drivers/perf/riscv_pmu.c                      | 331 ++++++++
drivers/perf/riscv_pmu_legacy.c               | 143 ++++
drivers/perf/riscv_pmu_sbi.c                  | 731 ++++++++++++++++++
include/linux/cpuhotplug.h                    |   1 +
include/linux/perf/riscv_pmu.h                |  69 ++
16 files changed, 1528 insertions(+), 827 deletions(-)
create mode 100644 Documentation/devicetree/bindings/perf/riscv,pmu.yaml
delete mode 100644 Documentation/riscv/pmu.rst
delete mode 100644 arch/riscv/kernel/perf_event.c
create mode 100644 drivers/perf/riscv_pmu.c
create mode 100644 drivers/perf/riscv_pmu_legacy.c
create mode 100644 drivers/perf/riscv_pmu_sbi.c
create mode 100644 include/linux/perf/riscv_pmu.h

--
2.31.1

Comments

Nikita Shubin Sept. 22, 2021, 5:25 a.m. UTC | #1
On Wed, 22 Sep 2021 08:11:44 +0300
nikita.shubin@maquefel.me wrote:

> Hello Atish.
>  
> 10.09.2021, 22:28, "Atish Patra" <atish.patra@wdc.com>:
> 
> > The current perf implementation in RISC-V is not very useful as it
> > can not count any events other than cycle/instructions. Moreover,
> > perf record can not be used or the events can not be started or
> > stopped.
> > 
> > Remove the implementation now for a better platform driver in future
> > that will implement most of the missing functionality.
> Your v3 version still breaks the build with CONFIG_BPF_SYSCALL
> enabled:
> kernel/events/core.c: In function 'bpf_overflow_handler':
> kernel/events/core.c:9914:18: error: assignment to 'bpf_user_pt_regs_t *' {aka 'struct user_regs_struct *'} from incompatible pointer type 'struct pt_regs *' [-Werror=incompatible-pointer-types]
>  9914 |         ctx.regs = perf_arch_bpf_user_pt_regs(regs);
>       |                  ^

Sorry for the noise.

Everything compiles fine - I was mistaken and applied an older version again.

> Yours,
> Nikita Shubin.
>
Palmer Dabbelt Oct. 4, 2021, 6:20 p.m. UTC | #2
On Fri, 10 Sep 2021 12:27:47 PDT (-0700), Atish Patra wrote:
> This series adds improved perf support for RISC-V based system using
> SBI PMU extension[1] and Sscofpmf extension[2]. The SBI PMU extension allows

Last we talked the SBI-0.3 stuff was in an uncertain state and I'm not 
sure we ever got to a point of agreement there.  I've decided to just 
stop worrying about the state of extensions, so if you guys want the 
SBI-0.3 stuff merged then just go say it's frozen and that'll be good 
enough for me.

Nikita Shubin Oct. 5, 2021, 9:17 a.m. UTC | #3
On Fri, 10 Sep 2021 12:27:47 -0700
Atish Patra <atish.patra@wdc.com> wrote:

Hello Atish,

> Perf stat:
> =========
> 
> [root@fedora-riscv riscv]# perf stat -e r8000000000000005 -e r8000000000000007
> -e r8000000000000006 -e r0000000000020002 -e r0000000000020004 -e branch-misses
> -e cache-misses -e dTLB-load-misses -e dTLB-store-misses -e iTLB-load-misses
> -e cycles -e instructions ./hackbench -pipe 15 process
> Running with 15*40 (== 600) tasks.
> Time: 6.578
>
>  Performance counter stats for './hackbench -pipe 15 process':
>
>              6,491      r8000000000000005      (52.59%) --> SBI_PMU_FW_SET_TIMER
>             20,433      r8000000000000007      (60.74%) --> SBI_PMU_FW_IPI_RECVD
>             21,271      r8000000000000006      (68.71%) --> SBI_PMU_FW_IPI_SENT
>                  0      r0000000000020002      (76.55%)
>      <not counted>      r0000000000020004      (0.00%)
>      <not counted>      branch-misses          (0.00%)
>      <not counted>      cache-misses           (0.00%)
>         57,537,853      dTLB-load-misses       (9.49%)
>          2,821,147      dTLB-store-misses      (18.64%)
>         52,928,130      iTLB-load-misses       (27.53%)
>     89,521,791,110      cycles                 (36.08%)
>     90,678,132,464      instructions #    1.01  insn per cycle (44.44%)
> 
>        6.975908032 seconds time elapsed
> 
>        3.130950000 seconds user
>       24.353310000 seconds sys
> 

Tested your patch series with qemu and got results as expected:

perf stat -e r8000000000000005 -e r8000000000000007 \
-e r8000000000000006 -e r0000000000020002 -e r0000000000020004 -e branch-misses \
-e cache-misses -e dTLB-load-misses -e dTLB-store-misses -e iTLB-load-misses \
-e cycles -e instructions ./hackbench -pipe 15 process

Running with 15*40 (== 600) tasks.
Time: 20.027

 Performance counter stats for './hackbench -pipe 15 process':

              4896      r8000000000000005      (53.34%)
                 0      r8000000000000007      (61.20%)
                 0      r8000000000000006      (68.88%)
                 0      r0000000000020002      (76.53%)
     <not counted>      r0000000000020004      (0.00%)
     <not counted>      branch-misses          (0.00%)
     <not counted>      cache-misses           (0.00%)
          48414917      dTLB-load-misses       (9.87%)
           2427413      dTLB-store-misses      (19.43%)
          46958092      iTLB-load-misses       (28.58%)
       69245163600      cycles                 (37.09%)
       70334279943      instructions #    1.02  insn per cycle (45.24%)

      20.895871900 seconds time elapsed

       2.724942000 seconds user
      18.126277000 seconds sys

perf top/record also works.

Tested-by: Nikita Shubin <n.shubin@yadro.com>

Yours,
Nikita Shubin
Atish Patra Oct. 6, 2021, 6:27 p.m. UTC | #4
On Mon, Oct 4, 2021 at 11:21 AM Palmer Dabbelt <palmer@dabbelt.com> wrote:
>
> On Fri, 10 Sep 2021 12:27:47 PDT (-0700), Atish Patra wrote:
> > This series adds improved perf support for RISC-V based system using
> > SBI PMU extension[1] and Sscofpmf extension[2]. The SBI PMU extension allows
>
> Last we talked the SBI-0.3 stuff was in an uncertain state and I'm not
> sure we ever got to a point of agreement there.  I've decided to just
> stop worrying about the state of extensions, so if you guys want the
> SBI-0.3 stuff merged then just go say it's frozen and that'll be good
> enough for me.
>

Now, there is a freeze/ratification process[1] and DoD policy[2] in
place for non-ISA specifications.
Thus, we are following the process to get it approved by the TSC.

[1] https://docs.google.com/document/d/1KkTmcPuor3DipS2JIUeR3DJhIN5i51bj3VF-L7QQShw/edit#heading=h.hym1mtuc89he
[2] https://docs.google.com/document/d/1A4gZlXXFT_a9QGzlc9g2RHg853e6ZDB59coANEYN3Aw/edit

There will be another release v0.3.1 which will be approved by the TSC
and will be called frozen officially.
There are no functional changes between v0.3 and v0.3.1. It is created
just to follow the newly defined process.
I will update the mailing list as soon as it is submitted for approval.

--
Regards,
Atish
Atish Patra Oct. 6, 2021, 6:28 p.m. UTC | #5
On Tue, Oct 5, 2021 at 2:18 AM Nikita Shubin <nikita.shubin@maquefel.me> wrote:
>
> On Fri, 10 Sep 2021 12:27:47 -0700
> Atish Patra <atish.patra@wdc.com> wrote:
>
> Hello Atish,
>
> Tested your patch series with qemu and got results as expected:
>
> perf stat -e r8000000000000005 -e r8000000000000007 \
> -e r8000000000000006 -e r0000000000020002 -e r0000000000020004 -e branch-misses \
> -e cache-misses -e dTLB-load-misses -e dTLB-store-misses -e iTLB-load-misses \
> -e cycles -e instructions ./hackbench -pipe 15 process
>
> Running with 15*40 (== 600) tasks.
> Time: 20.027
>
>  Performance counter stats for './hackbench -pipe 15 process':
>
>               4896      r8000000000000005      (53.34%)
>                  0      r8000000000000007      (61.20%)
>                  0      r8000000000000006      (68.88%)
>                  0      r0000000000020002      (76.53%)
>      <not counted>      r0000000000020004      (0.00%)
>      <not counted>      branch-misses          (0.00%)
>      <not counted>      cache-misses           (0.00%)
>           48414917      dTLB-load-misses       (9.87%)
>            2427413      dTLB-store-misses      (19.43%)
>           46958092      iTLB-load-misses       (28.58%)
>        69245163600      cycles                 (37.09%)
>        70334279943      instructions #    1.02  insn per cycle (45.24%)
>
>       20.895871900 seconds time elapsed
>
>        2.724942000 seconds user
>       18.126277000 seconds sys
>
> perf top/record also works.
>
> Tested-by: Nikita Shubin <n.shubin@yadro.com>
>

Thanks for the testing.

> Yours,
> Nikita Shubin