mbox series

[v4,00/11] Improve RISC-V Perf support using SBI PMU and sscofpmf extension

Message ID 20211025195350.242914-1-atish.patra@wdc.com
Headers show
Series Improve RISC-V Perf support using SBI PMU and sscofpmf extension | expand

Message

Atish Patra Oct. 25, 2021, 7:53 p.m. UTC
This series adds improved perf support for RISC-V based system using
SBI PMU extension[1] and Sscofpmf extension[2]. The SBI PMU extension allows
the kernel to program the counters for different events and start/stop counters
while the sscofpmf extension allows the counter overflow interrupt and privilege
mode filtering. An hardware platform can leverage SBI PMU extension without
the sscofpmf extension if it supports mcountinhibit and mcounteren. However,
the reverse is not true. With both of these extension enabled, a platform can
take advantage of all both event counting and sampling using perf tool. 

This series introduces a platform perf driver instead of a existing arch
specific implementation. The new perf implementation has adopted a modular
approach where most of the generic event handling is done in the core library
while individual PMUs need to only implement necessary features specific to
the PMU. This is easily extensible and any future RISC-V PMU implementation
can leverage this. Currently, SBI PMU driver & legacy PMU driver are implemented
as a part of this series.

The legacy driver tries to reimplement the existing minimal perf under a new
config to maintain backward compatibility. This implementation only allows
monitoring of always running cycle/instruction counters. Moreover, they can
not be started or stopped. In general, this is very limited and not very useful.
That's why, I am not very keen to carry the support into the new driver.
However, I don't want to break perf for any existing hardware platforms.
If everybody agrees that we don't need legacy perf implementation for older
implementation, I will be happy to drop PATCH 4.

This series has been tested in Qemu (RV64 & RV32) and HiFive Unmatched.
Qemu[5] & OpenSBI [3] patches are required to test it on Qemu and a dt patch
required in U-Boot[6] for HiFive Unmatched. Qemu changes are not
backward compatible. That means, you can not use perf anymore on older Qemu
versions with latest OpenSBI and/or Kernel. However, newer kernel will
just use legacy pmu driver if old OpenSBI is detected.

The U-Boot patch is just an example that encodes few of the events defined
in fu740 documentation [7] in the DT. We can update the DT to include all the
events defined if required.

Here is an output of perf stat/report while running perf benchmark with OpenSBI, 
Linux kernel and U-Boot patches applied.

HiFive Unmatched:
=================
perf stat -e cycles -e instructions -e L1-icache-load-misses -e branches -e branch-misses \
-e r0000000000000200 -e r0000000000000400 \
-e r0000000000000800 perf bench sched messaging -g 25 -l 15

# Running 'sched/messaging' benchmark:
# 20 sender and receiver processes per group
# 25 groups == 1000 processes run

     Total time: 0.826 [sec]

 Performance counter stats for 'perf bench sched messaging -g 25 -l 15':

        3426710073      cycles                (65.92%)
        1348772808      instructions          #0.39  insn per cycle  (75.44%)
                 0      L1-icache-load-misses (72.28%)
         201133996      branches              (67.88%)
          44663584      branch-misses         #22.21% of all branches (35.01%)
         248194747      r0000000000000200     (41.94%) --> Integer load instruction retired
         156879950      r0000000000000400     (43.58%) --> Integer store instruction retired
           6988678      r0000000000000800     (47.91%) --> Atomic memory operation retired

       1.931335000 seconds time elapsed

       1.100415000 seconds user
       3.755176000 seconds sys


QEMU:
=========
Perf stat:
=========

[root@fedora-riscv riscv]# perf stat -e r8000000000000005 -e r8000000000000007 \
-e r8000000000000006 -e r0000000000020002 -e r0000000000020004 -e branch-misses \
-e cache-misses -e dTLB-load-misses -e dTLB-store-misses -e iTLB-load-misses \
-e cycles -e instructions perf bench sched messaging -g 15 -l 10 \
Running with 15*40 (== 600) tasks.
Time: 6.578

 Performance counter stats for './hackbench -pipe 15 process':

             1,794      r8000000000000005      (52.59%) --> SBI_PMU_FW_SET_TIMER
             2,859      r8000000000000007      (60.74%) --> SBI_PMU_FW_IPI_RECVD
             4,205      r8000000000000006      (68.71%) --> SBI_PMU_FW_IPI_SENT
                 0      r0000000000020002      (81.69%)
     <not counted>      r0000000000020004      (0.00%)
     <not counted>      branch-misses          (0.00%)
     <not counted>      cache-misses           (0.00%)
         7,878,328      dTLB-load-misses       (15.60%)
           680,270      dTLB-store-misses      (28.45%)
         8,287,931      iTLB-load-misses       (39.24%)
    20,008,506,675      cycles                 (48.60%)
    21,484,427,932      instructions   # 1.07  insn per cycle (56.60%)

       1.681344735 seconds time elapsed

       0.614460000 seconds user
       8.313254000 seconds sys


Perf record:
============
[root@fedora-riscv riscv]# perf record -e cycles -e instructions \
-e dTLB-load-misses -e dTLB-store-misses -e iTLB-load-misses -c 10000 \
perf bench sched messaging -g 15 -l 10
# Running 'sched/messaging' benchmark:
# 20 sender and receiver processes per group
# 15 groups == 600 processes run

     Total time: 1.261 [sec]
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.101 MB perf.data (845 samples) ]

[root@fedora-riscv riscv]# perf report
Available samples                                                               
407 cycles                                                                     _
407 instructions                                                               _
18 dTLB-load-misses                                                            _
2 dTLB-store-misses                                                            _
11 iTLB-load-misses                                                            _
                           
[1] https://github.com/riscv-non-isa/riscv-sbi-doc/blob/master/riscv-sbi.adoc
[2] https://drive.google.com/file/d/171j4jFjIkKdj5LWcExphq4xG_2sihbfd/edit
[3] https://github.com/atishp04/opensbi/tree/pmu_sscofpmf_v2 
[4] https://github.com/atishp04/linux/tree/riscv_pmu_v4
[5] https://github.com/atishp04/qemu/tree/riscv_pmu_v3
[6] https://github.com/atishp04/u-boot/tree/hifive_unmatched_dt_pmu
[7] https://sifive.cdn.prismic.io/sifive/de1491e5-077c-461d-9605-e8a0ce57337d_fu740-c000-manual-v1p3.pdf

Changes from v3->v4:
1. Do not proceed overflow handler if event doesn't set for sampling.
2. overflow status register is only read after counters are stopped.
3. Added the PMU DT node for HiFive Unmatched.

Changes from v2->v3:
1. Added interrupt overflow support.
2. Cleaned up legacy driver initialization.
3. Supports perf record now.
4. Added the DT binding and maintainers file.
5. Changed cpu hotplug notifier to be multi-state.
6. OpenSBI doesn't disable cycle/instret counter during boot. Update the
   perf code to disable all the counter during the boot.

Changes from v1->v2
1. Implemented the latest SBI PMU extension specification.
2. The core platform driver was changed to operate as a library while only
   sbi based PMU is built as a driver. The legacy one is just a fallback if
   SBI PMU extension is not available.

Atish Patra (11):
RISC-V: Remove the current perf implementation
RISC-V: Add CSR encodings for all HPMCOUNTERS
RISC-V: Add a perf core library for pmu drivers
RISC-V: Add a simple platform driver for RISC-V legacy perf
RISC-V: Add RISC-V SBI PMU extension definitions
dt-binding: pmu: Add RISC-V PMU DT bindings
RISC-V: Add perf platform driver based on SBI PMU extension
RISC-V: Add interrupt support for perf
Documentation: riscv: Remove the old documentation
riscv: dts: fu740: Add pmu node
MAINTAINERS: Add entry for RISC-V PMU drivers

.../devicetree/bindings/perf/riscv,pmu.yaml   |  51 ++
Documentation/riscv/pmu.rst                   | 255 ------
MAINTAINERS                                   |  10 +
arch/riscv/Kconfig                            |  13 -
arch/riscv/boot/dts/sifive/fu740-c000.dtsi    |   3 +
arch/riscv/include/asm/csr.h                  |  66 +-
arch/riscv/include/asm/perf_event.h           |  72 --
arch/riscv/include/asm/sbi.h                  |  97 +++
arch/riscv/kernel/Makefile                    |   1 -
arch/riscv/kernel/perf_event.c                | 485 ------------
drivers/perf/Kconfig                          |  25 +
drivers/perf/Makefile                         |   5 +
drivers/perf/riscv_pmu.c                      | 331 ++++++++
drivers/perf/riscv_pmu_legacy.c               | 143 ++++
drivers/perf/riscv_pmu_sbi.c                  | 732 ++++++++++++++++++
include/linux/cpuhotplug.h                    |   1 +
include/linux/perf/riscv_pmu.h                |  69 ++
17 files changed, 1532 insertions(+), 827 deletions(-)
create mode 100644 Documentation/devicetree/bindings/perf/riscv,pmu.yaml
delete mode 100644 Documentation/riscv/pmu.rst
delete mode 100644 arch/riscv/kernel/perf_event.c
create mode 100644 drivers/perf/riscv_pmu.c
create mode 100644 drivers/perf/riscv_pmu_legacy.c
create mode 100644 drivers/perf/riscv_pmu_sbi.c
create mode 100644 include/linux/perf/riscv_pmu.h

--
2.31.1

Comments

Jessica Clarke Oct. 28, 2021, 8:48 p.m. UTC | #1
On Mon, Oct 25, 2021 at 12:53:49PM -0700, Atish Patra wrote:
> HiFive unmatched supports HPMCounters but does not implement mcountinhibit
> or sscof extension. Thus, perf monitoring can be used on the unmatched
> board without sampling.
> 
> Add the PMU node with compatible string so that Linux perf driver can
> utilize this to enable PMU.
> 
> Signed-off-by: Atish Patra <atish.patra@wdc.com>
> ---
>  arch/riscv/boot/dts/sifive/fu740-c000.dtsi | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/arch/riscv/boot/dts/sifive/fu740-c000.dtsi b/arch/riscv/boot/dts/sifive/fu740-c000.dtsi
> index abbb960f90a0..b35b96b58820 100644
> --- a/arch/riscv/boot/dts/sifive/fu740-c000.dtsi
> +++ b/arch/riscv/boot/dts/sifive/fu740-c000.dtsi
> @@ -140,6 +140,9 @@ soc {
>  		#size-cells = <2>;
>  		compatible = "simple-bus";
>  		ranges;
> +		pmu {
> +			compatible = "riscv,pmu";
> +		};

This is a property of the user-replaceable firmware, not a property of
the hardware, so having this in the device tree under /soc, let alone
hard-coded in Linux, is utterly wrong. Why can this not just be probed
like any other SBI interface? The "Probe SBI extension" interface is
precisely for this kind of thing.

Jess
Atish Patra Oct. 28, 2021, 11:37 p.m. UTC | #2
On Thu, Oct 28, 2021 at 1:49 PM Jessica Clarke <jrtc27@jrtc27.com> wrote:
>
> On Mon, Oct 25, 2021 at 12:53:49PM -0700, Atish Patra wrote:
> > HiFive unmatched supports HPMCounters but does not implement mcountinhibit
> > or sscof extension. Thus, perf monitoring can be used on the unmatched
> > board without sampling.
> >
> > Add the PMU node with compatible string so that Linux perf driver can
> > utilize this to enable PMU.
> >
> > Signed-off-by: Atish Patra <atish.patra@wdc.com>
> > ---
> >  arch/riscv/boot/dts/sifive/fu740-c000.dtsi | 3 +++
> >  1 file changed, 3 insertions(+)
> >
> > diff --git a/arch/riscv/boot/dts/sifive/fu740-c000.dtsi b/arch/riscv/boot/dts/sifive/fu740-c000.dtsi
> > index abbb960f90a0..b35b96b58820 100644
> > --- a/arch/riscv/boot/dts/sifive/fu740-c000.dtsi
> > +++ b/arch/riscv/boot/dts/sifive/fu740-c000.dtsi
> > @@ -140,6 +140,9 @@ soc {
> >               #size-cells = <2>;
> >               compatible = "simple-bus";
> >               ranges;
> > +             pmu {
> > +                     compatible = "riscv,pmu";
> > +             };
>
> This is a property of the user-replaceable firmware, not a property of
> the hardware,

It's a property of hardware that indicates that the hardware supports PMU.
Additionally, the counter overflow interrupt number needs to be
defined through the DT as well
so that a clean platform driver can be implemented.


so having this in the device tree under /soc, let alone
> hard-coded in Linux, is utterly wrong. Why can this not just be probed
> like any other SBI interface? The "Probe SBI extension" interface is
> precisely for this kind of thing.
>
SBI extension is anyways probed to verify if the firmware has PMU
extension or not.
However, adding the DT property allows different platforms (with or
without sscof extension)
to use the same code path.

> Jess
>
> _______________________________________________
> linux-riscv mailing list
> linux-riscv@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv
Jessica Clarke Oct. 29, 2021, 12:07 a.m. UTC | #3
On 29 Oct 2021, at 00:37, Atish Patra <atishp@atishpatra.org> wrote:
> 
> On Thu, Oct 28, 2021 at 1:49 PM Jessica Clarke <jrtc27@jrtc27.com> wrote:
>> 
>> On Mon, Oct 25, 2021 at 12:53:49PM -0700, Atish Patra wrote:
>>> HiFive unmatched supports HPMCounters but does not implement mcountinhibit
>>> or sscof extension. Thus, perf monitoring can be used on the unmatched
>>> board without sampling.
>>> 
>>> Add the PMU node with compatible string so that Linux perf driver can
>>> utilize this to enable PMU.
>>> 
>>> Signed-off-by: Atish Patra <atish.patra@wdc.com>
>>> ---
>>> arch/riscv/boot/dts/sifive/fu740-c000.dtsi | 3 +++
>>> 1 file changed, 3 insertions(+)
>>> 
>>> diff --git a/arch/riscv/boot/dts/sifive/fu740-c000.dtsi b/arch/riscv/boot/dts/sifive/fu740-c000.dtsi
>>> index abbb960f90a0..b35b96b58820 100644
>>> --- a/arch/riscv/boot/dts/sifive/fu740-c000.dtsi
>>> +++ b/arch/riscv/boot/dts/sifive/fu740-c000.dtsi
>>> @@ -140,6 +140,9 @@ soc {
>>>              #size-cells = <2>;
>>>              compatible = "simple-bus";
>>>              ranges;
>>> +             pmu {
>>> +                     compatible = "riscv,pmu";
>>> +             };
>> 
>> This is a property of the user-replaceable firmware, not a property of
>> the hardware,
> 
> It's a property of hardware that indicates that the hardware supports PMU.

All RISC-V hardware provides the CSRs, they’re part of the privileged
spec and not marked optional. How many aren’t hard-wired to zero is up
to the implementation. But even then you can’t know from the hardware
alone what is supported; the firmware has to enable S-mode (and
U-mode)’s ability to read them, so you can’t assume anything in a
static device tree hard-coded in Linux about what firmware has done.
Since you currently have to query the firmware to determine what’s
available to you anyway, I see no benefit from having a node in the
device tree that tells you your firmware *might* have counters you can
use.

> Additionally, the counter overflow interrupt number needs to be
> defined through the DT as well
> so that a clean platform driver can be implemented.

The interrupt number is specified as 13 by the Sscofmpf spec.
But that’s not relevant here, the FU740 predates and doesn’t implement
Sscofmpf, meaning there is no interrupt to even define here. And as I
said on the other patch, don’t conflate “SBI PMU firmware interface is
supported” and “Sscofmpf is implemented in the hardware”; the former
should be discovered by talking to firmware, and the latter should be
discovered like any other extension (however that ends up happening).

>> so having this in the device tree under /soc, let alone
>> hard-coded in Linux, is utterly wrong. Why can this not just be probed
>> like any other SBI interface? The "Probe SBI extension" interface is
>> precisely for this kind of thing.
>> 
> SBI extension is anyways probed to verify if the firmware has PMU
> extension or not.
> However, adding the DT property allows different platforms (with or
> without sscof extension)
> to use the same code path.

You don’t need a device tree for that; that same code path can just be
“use the existing standard firmware interface”. That also has the
benefit that it’s not tied to device tree and so works identically for
ACPI, rather than needing an ACPI version of it.

I see nothing here that can’t be discovered through pre-existing means.
If it can be discovered without use of the device tree then it does not
belong in the device tree; the device tree is purely for things that
cannot otherwise be discovered.

Jess
Atish Patra Oct. 29, 2021, 6:05 a.m. UTC | #4
On Thu, Oct 28, 2021 at 5:07 PM Jessica Clarke <jrtc27@jrtc27.com> wrote:
>
> On 29 Oct 2021, at 00:37, Atish Patra <atishp@atishpatra.org> wrote:
> >
> > On Thu, Oct 28, 2021 at 1:49 PM Jessica Clarke <jrtc27@jrtc27.com> wrote:
> >>
> >> On Mon, Oct 25, 2021 at 12:53:49PM -0700, Atish Patra wrote:
> >>> HiFive unmatched supports HPMCounters but does not implement mcountinhibit
> >>> or sscof extension. Thus, perf monitoring can be used on the unmatched
> >>> board without sampling.
> >>>
> >>> Add the PMU node with compatible string so that Linux perf driver can
> >>> utilize this to enable PMU.
> >>>
> >>> Signed-off-by: Atish Patra <atish.patra@wdc.com>
> >>> ---
> >>> arch/riscv/boot/dts/sifive/fu740-c000.dtsi | 3 +++
> >>> 1 file changed, 3 insertions(+)
> >>>
> >>> diff --git a/arch/riscv/boot/dts/sifive/fu740-c000.dtsi b/arch/riscv/boot/dts/sifive/fu740-c000.dtsi
> >>> index abbb960f90a0..b35b96b58820 100644
> >>> --- a/arch/riscv/boot/dts/sifive/fu740-c000.dtsi
> >>> +++ b/arch/riscv/boot/dts/sifive/fu740-c000.dtsi
> >>> @@ -140,6 +140,9 @@ soc {
> >>>              #size-cells = <2>;
> >>>              compatible = "simple-bus";
> >>>              ranges;
> >>> +             pmu {
> >>> +                     compatible = "riscv,pmu";
> >>> +             };
> >>
> >> This is a property of the user-replaceable firmware, not a property of
> >> the hardware,
> >
> > It's a property of hardware that indicates that the hardware supports PMU.
>
> All RISC-V hardware provides the CSRs, they’re part of the privileged
> spec and not marked optional. How many aren’t hard-wired to zero is up
> to the implementation. But even then you can’t know from the hardware
> alone what is supported; the firmware has to enable S-mode (and
> U-mode)’s ability to read them, so you can’t assume anything in a
> static device tree hard-coded in Linux about what firmware has done.
> Since you currently have to query the firmware to determine what’s
> available to you anyway, I see no benefit from having a node in the
> device tree that tells you your firmware *might* have counters you can
> use.
>
> > Additionally, the counter overflow interrupt number needs to be
> > defined through the DT as well
> > so that a clean platform driver can be implemented.
>
> The interrupt number is specified as 13 by the Sscofmpf spec.
> But that’s not relevant here, the FU740 predates and doesn’t implement
> Sscofmpf, meaning there is no interrupt to even define here. And as I
> said on the other patch, don’t conflate “SBI PMU firmware interface is
> supported” and “Sscofmpf is implemented in the hardware”; the former
> should be discovered by talking to firmware, and the latter should be
> discovered like any other extension (however that ends up happening).

Presence of sscof extension can be discovered through general extension
discovery mechanism (probably a separate DT node..that's a separate discussion).

However, the interrupt number discovery has to be through DT so the
platform driver
can probe the DT to figure out that.

>
> >> so having this in the device tree under /soc, let alone
> >> hard-coded in Linux, is utterly wrong. Why can this not just be probed
> >> like any other SBI interface? The "Probe SBI extension" interface is
> >> precisely for this kind of thing.
> >>
> > SBI extension is anyways probed to verify if the firmware has PMU
> > extension or not.
> > However, adding the DT property allows different platforms (with or
> > without sscof extension)
> > to use the same code path.
>
> You don’t need a device tree for that; that same code path can just be
> “use the existing standard firmware interface”. That also has the
> benefit that it’s not tied to device tree and so works identically for
> ACPI, rather than needing an ACPI version of it.
>

I don't disagree with that argument. However, we need a DT node for
interrupt number as explained in the above.
A DT based platform driver allows us to provide a unified code path
which can handle both kinds of platforms described below.

1. Platforms without sscof extension
2. Platforms with sscof extension that requires a DT node for interrupt number

Otherwise, the driver has to do the following things in order.

1. Probe PMU extension
2. first check if sscof extension is present in the special RISC-V ISA
extension DT node (which is yet to finalize)
3. If sscof extension is present then register for a DT based platform driver.
4. Otherwise, register a simple platform driver.

I am not completely opposed to doing that if there is a strong
technical issue with the current approach.

> I see nothing here that can’t be discovered through pre-existing means.
> If it can be discovered without use of the device tree then it does not
> belong in the device tree; the device tree is purely for things that
> cannot otherwise be discovered.
>
> Jess
>
Jessica Clarke Oct. 29, 2021, 12:25 p.m. UTC | #5
On 29 Oct 2021, at 07:05, Atish Patra <atishp@atishpatra.org> wrote:
> 
> On Thu, Oct 28, 2021 at 5:07 PM Jessica Clarke <jrtc27@jrtc27.com> wrote:
>> 
>> On 29 Oct 2021, at 00:37, Atish Patra <atishp@atishpatra.org> wrote:
>>> 
>>> On Thu, Oct 28, 2021 at 1:49 PM Jessica Clarke <jrtc27@jrtc27.com> wrote:
>>>> 
>>>> On Mon, Oct 25, 2021 at 12:53:49PM -0700, Atish Patra wrote:
>>>>> HiFive unmatched supports HPMCounters but does not implement mcountinhibit
>>>>> or sscof extension. Thus, perf monitoring can be used on the unmatched
>>>>> board without sampling.
>>>>> 
>>>>> Add the PMU node with compatible string so that Linux perf driver can
>>>>> utilize this to enable PMU.
>>>>> 
>>>>> Signed-off-by: Atish Patra <atish.patra@wdc.com>
>>>>> ---
>>>>> arch/riscv/boot/dts/sifive/fu740-c000.dtsi | 3 +++
>>>>> 1 file changed, 3 insertions(+)
>>>>> 
>>>>> diff --git a/arch/riscv/boot/dts/sifive/fu740-c000.dtsi b/arch/riscv/boot/dts/sifive/fu740-c000.dtsi
>>>>> index abbb960f90a0..b35b96b58820 100644
>>>>> --- a/arch/riscv/boot/dts/sifive/fu740-c000.dtsi
>>>>> +++ b/arch/riscv/boot/dts/sifive/fu740-c000.dtsi
>>>>> @@ -140,6 +140,9 @@ soc {
>>>>>             #size-cells = <2>;
>>>>>             compatible = "simple-bus";
>>>>>             ranges;
>>>>> +             pmu {
>>>>> +                     compatible = "riscv,pmu";
>>>>> +             };
>>>> 
>>>> This is a property of the user-replaceable firmware, not a property of
>>>> the hardware,
>>> 
>>> It's a property of hardware that indicates that the hardware supports PMU.
>> 
>> All RISC-V hardware provides the CSRs, they’re part of the privileged
>> spec and not marked optional. How many aren’t hard-wired to zero is up
>> to the implementation. But even then you can’t know from the hardware
>> alone what is supported; the firmware has to enable S-mode (and
>> U-mode)’s ability to read them, so you can’t assume anything in a
>> static device tree hard-coded in Linux about what firmware has done.
>> Since you currently have to query the firmware to determine what’s
>> available to you anyway, I see no benefit from having a node in the
>> device tree that tells you your firmware *might* have counters you can
>> use.
>> 
>>> Additionally, the counter overflow interrupt number needs to be
>>> defined through the DT as well
>>> so that a clean platform driver can be implemented.
>> 
>> The interrupt number is specified as 13 by the Sscofmpf spec.
>> But that’s not relevant here, the FU740 predates and doesn’t implement
>> Sscofmpf, meaning there is no interrupt to even define here. And as I
>> said on the other patch, don’t conflate “SBI PMU firmware interface is
>> supported” and “Sscofmpf is implemented in the hardware”; the former
>> should be discovered by talking to firmware, and the latter should be
>> discovered like any other extension (however that ends up happening).
> 
> Presence of sscof extension can be discovered through general extension
> discovery mechanism (probably a separate DT node..that's a separate discussion).
> 
> However, the interrupt number discovery has to be through DT so the
> platform driver
> can probe the DT to figure out that.

No, you’re not reading what I said. It’s specified to be 13 in the
Sscofmpf spec, there is zero need to encode information in the device
tree that is already mandated by a specification. We don’t put that
supervisor software interrupts are 1 nor that supervisor timer
interrupts are 5 in the device tree, so we also don’t need to put that
supervisor counter overflow interrupts are 13 in it. We *do* currently
put machine timer interrupt information from the CLINT in the device
tree, and both supervisor and machine external interrupt information
from the PLIC, but that is not to tell you what’s already specified
(that they’re interrupts 7, 11 and 9 respectively), it’s to tell you
which order the per-hart registers are in in the CLINT, and which order
the contexts are in in the PLIC. If it were up to me that would’ve been
expressed a different way as it’s an ugly encoding, rather redundant
and not the nicest to decode in software, but it’s too late for that.
Though with the ACLINT, APLIC and other AIA parts on the horizon I hope
we can get saner bindings for those that don’t repeat those mistakes.
But the point is, if it’s specified by the spec, it doesn’t need to go
in the device tree, the device tree is for telling you all the things
you don’t, and can’t, already know.

>>>> so having this in the device tree under /soc, let alone
>>>> hard-coded in Linux, is utterly wrong. Why can this not just be probed
>>>> like any other SBI interface? The "Probe SBI extension" interface is
>>>> precisely for this kind of thing.
>>>> 
>>> SBI extension is anyways probed to verify if the firmware has PMU
>>> extension or not.
>>> However, adding the DT property allows different platforms (with or
>>> without sscof extension)
>>> to use the same code path.
>> 
>> You don’t need a device tree for that; that same code path can just be
>> “use the existing standard firmware interface”. That also has the
>> benefit that it’s not tied to device tree and so works identically for
>> ACPI, rather than needing an ACPI version of it.
>> 
> 
> I don't disagree with that argument. However, we need a DT node for
> interrupt number as explained in the above.
> A DT based platform driver allows us to provide a unified code path
> which can handle both kinds of platforms described below.
> 
> 1. Platforms without sscof extension
> 2. Platforms with sscof extension that requires a DT node for interrupt number
> 
> Otherwise, the driver has to do the following things in order.
> 
> 1. Probe PMU extension
> 2. first check if sscof extension is present in the special RISC-V ISA
> extension DT node (which is yet to finalize)
> 3. If sscof extension is present then register for a DT based platform driver.
> 4. Otherwise, register a simple platform driver.
> 
> I am not completely opposed to doing that if there is a strong
> technical issue with the current approach.

Nope, it’s:

1. Probe PMU SBI extension
2. Register a driver
3. If Sscofmpf is present, register for interrupt 13

Or perhaps with 2 and 3 swapped.

You’re making this far too complicated by being fixated on needing
device tree in there somewhere. You don’t need it at all except
possibly to tell you that Sscofmpf is supported, which should be done
the same as any other supervisor extension like Svnapot or Svpbmt.

Jess

>> I see nothing here that can’t be discovered through pre-existing means.
>> If it can be discovered without use of the device tree then it does not
>> belong in the device tree; the device tree is purely for things that
>> cannot otherwise be discovered.
>> 
>> Jess
>> 
> 
> 
> -- 
> Regards,
> Atish
Palmer Dabbelt Dec. 14, 2021, 1:51 a.m. UTC | #6
On Mon, 25 Oct 2021 12:53:39 PDT (-0700), Atish Patra wrote:
> This series adds improved perf support for RISC-V based system using
> SBI PMU extension[1] and Sscofpmf extension[2]. The SBI PMU extension allows
> the kernel to program the counters for different events and start/stop counters
> while the sscofpmf extension allows the counter overflow interrupt and privilege
> mode filtering. An hardware platform can leverage SBI PMU extension without
> the sscofpmf extension if it supports mcountinhibit and mcounteren. However,
> the reverse is not true. With both of these extension enabled, a platform can
> take advantage of all both event counting and sampling using perf tool.
>
> This series introduces a platform perf driver instead of a existing arch
> specific implementation. The new perf implementation has adopted a modular
> approach where most of the generic event handling is done in the core library
> while individual PMUs need to only implement necessary features specific to
> the PMU. This is easily extensible and any future RISC-V PMU implementation
> can leverage this. Currently, SBI PMU driver & legacy PMU driver are implemented
> as a part of this series.
>
> The legacy driver tries to reimplement the existing minimal perf under a new
> config to maintain backward compatibility. This implementation only allows
> monitoring of always running cycle/instruction counters. Moreover, they can
> not be started or stopped. In general, this is very limited and not very useful.
> That's why, I am not very keen to carry the support into the new driver.
> However, I don't want to break perf for any existing hardware platforms.
> If everybody agrees that we don't need legacy perf implementation for older
> implementation, I will be happy to drop PATCH 4.

IMO we should keep it for a bit, so we have a transition period.  These 
extensions are pretty new so we won't be able to count on everyone 
having them yet, this way we'll avoid breaking users.

This generally looks good, but I don't see any Acks from the perf 
maintainers.  I'm happy to take this through the RISC-V tree, but I'd 
generally like to have at least an ack as perf isn't really my 
subsystem.  MAINTAINERS seems to indicate that's Will and Mark, they're 
not To'd so maybe they just missed this?

I fixed a few trivial checkpatch warnings, updated Atish's email 
address, and put this on palmer/riscv-pmu.  Happy to hear any comments, 
if nobody says anything then I'll just put that on riscv/for-next 
whenever I get back to my own email.

> This series has been tested in Qemu (RV64 & RV32) and HiFive Unmatched.
> Qemu[5] & OpenSBI [3] patches are required to test it on Qemu and a dt patch
> required in U-Boot[6] for HiFive Unmatched. Qemu changes are not
> backward compatible. That means, you can not use perf anymore on older Qemu
> versions with latest OpenSBI and/or Kernel. However, newer kernel will
> just use legacy pmu driver if old OpenSBI is detected.
>
> The U-Boot patch is just an example that encodes few of the events defined
> in fu740 documentation [7] in the DT. We can update the DT to include all the
> events defined if required.
>
> Here is an output of perf stat/report while running perf benchmark with OpenSBI,
> Linux kernel and U-Boot patches applied.
>
> HiFive Unmatched:
> =================
> perf stat -e cycles -e instructions -e L1-icache-load-misses -e branches -e branch-misses \
> -e r0000000000000200 -e r0000000000000400 \
> -e r0000000000000800 perf bench sched messaging -g 25 -l 15
>
> # Running 'sched/messaging' benchmark:
> # 20 sender and receiver processes per group
> # 25 groups == 1000 processes run
>
>      Total time: 0.826 [sec]
>
>  Performance counter stats for 'perf bench sched messaging -g 25 -l 15':
>
>         3426710073      cycles                (65.92%)
>         1348772808      instructions          #0.39  insn per cycle  (75.44%)
>                  0      L1-icache-load-misses (72.28%)
>          201133996      branches              (67.88%)
>           44663584      branch-misses         #22.21% of all branches (35.01%)
>          248194747      r0000000000000200     (41.94%) --> Integer load instruction retired
>          156879950      r0000000000000400     (43.58%) --> Integer store instruction retired
>            6988678      r0000000000000800     (47.91%) --> Atomic memory operation retired
>
>        1.931335000 seconds time elapsed
>
>        1.100415000 seconds user
>        3.755176000 seconds sys
>
>
> QEMU:
> =========
> Perf stat:
> =========
>
> [root@fedora-riscv riscv]# perf stat -e r8000000000000005 -e r8000000000000007 \
> -e r8000000000000006 -e r0000000000020002 -e r0000000000020004 -e branch-misses \
> -e cache-misses -e dTLB-load-misses -e dTLB-store-misses -e iTLB-load-misses \
> -e cycles -e instructions perf bench sched messaging -g 15 -l 10 \
> Running with 15*40 (== 600) tasks.
> Time: 6.578
>
>  Performance counter stats for './hackbench -pipe 15 process':
>
>              1,794      r8000000000000005      (52.59%) --> SBI_PMU_FW_SET_TIMER
>              2,859      r8000000000000007      (60.74%) --> SBI_PMU_FW_IPI_RECVD
>              4,205      r8000000000000006      (68.71%) --> SBI_PMU_FW_IPI_SENT
>                  0      r0000000000020002      (81.69%)
>      <not counted>      r0000000000020004      (0.00%)
>      <not counted>      branch-misses          (0.00%)
>      <not counted>      cache-misses           (0.00%)
>          7,878,328      dTLB-load-misses       (15.60%)
>            680,270      dTLB-store-misses      (28.45%)
>          8,287,931      iTLB-load-misses       (39.24%)
>     20,008,506,675      cycles                 (48.60%)
>     21,484,427,932      instructions   # 1.07  insn per cycle (56.60%)
>
>        1.681344735 seconds time elapsed
>
>        0.614460000 seconds user
>        8.313254000 seconds sys
>
>
> Perf record:
> ============
> [root@fedora-riscv riscv]# perf record -e cycles -e instructions \
> -e dTLB-load-misses -e dTLB-store-misses -e iTLB-load-misses -c 10000 \
> perf bench sched messaging -g 15 -l 10
> # Running 'sched/messaging' benchmark:
> # 20 sender and receiver processes per group
> # 15 groups == 600 processes run
>
>      Total time: 1.261 [sec]
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.101 MB perf.data (845 samples) ]
>
> [root@fedora-riscv riscv]# perf report
> Available samples
> 407 cycles                                                                     _
> 407 instructions                                                               _
> 18 dTLB-load-misses                                                            _
> 2 dTLB-store-misses                                                            _
> 11 iTLB-load-misses                                                            _
>
> [1] https://github.com/riscv-non-isa/riscv-sbi-doc/blob/master/riscv-sbi.adoc
> [2] https://drive.google.com/file/d/171j4jFjIkKdj5LWcExphq4xG_2sihbfd/edit
> [3] https://github.com/atishp04/opensbi/tree/pmu_sscofpmf_v2
> [4] https://github.com/atishp04/linux/tree/riscv_pmu_v4
> [5] https://github.com/atishp04/qemu/tree/riscv_pmu_v3
> [6] https://github.com/atishp04/u-boot/tree/hifive_unmatched_dt_pmu
> [7] https://sifive.cdn.prismic.io/sifive/de1491e5-077c-461d-9605-e8a0ce57337d_fu740-c000-manual-v1p3.pdf
>
> Changes from v3->v4:
> 1. Do not proceed overflow handler if event doesn't set for sampling.
> 2. overflow status register is only read after counters are stopped.
> 3. Added the PMU DT node for HiFive Unmatched.
>
> Changes from v2->v3:
> 1. Added interrupt overflow support.
> 2. Cleaned up legacy driver initialization.
> 3. Supports perf record now.
> 4. Added the DT binding and maintainers file.
> 5. Changed cpu hotplug notifier to be multi-state.
> 6. OpenSBI doesn't disable cycle/instret counter during boot. Update the
>    perf code to disable all the counter during the boot.
>
> Changes from v1->v2
> 1. Implemented the latest SBI PMU extension specification.
> 2. The core platform driver was changed to operate as a library while only
>    sbi based PMU is built as a driver. The legacy one is just a fallback if
>    SBI PMU extension is not available.
>
> Atish Patra (11):
> RISC-V: Remove the current perf implementation
> RISC-V: Add CSR encodings for all HPMCOUNTERS
> RISC-V: Add a perf core library for pmu drivers
> RISC-V: Add a simple platform driver for RISC-V legacy perf
> RISC-V: Add RISC-V SBI PMU extension definitions
> dt-binding: pmu: Add RISC-V PMU DT bindings
> RISC-V: Add perf platform driver based on SBI PMU extension
> RISC-V: Add interrupt support for perf
> Documentation: riscv: Remove the old documentation
> riscv: dts: fu740: Add pmu node
> MAINTAINERS: Add entry for RISC-V PMU drivers
>
> .../devicetree/bindings/perf/riscv,pmu.yaml   |  51 ++
> Documentation/riscv/pmu.rst                   | 255 ------
> MAINTAINERS                                   |  10 +
> arch/riscv/Kconfig                            |  13 -
> arch/riscv/boot/dts/sifive/fu740-c000.dtsi    |   3 +
> arch/riscv/include/asm/csr.h                  |  66 +-
> arch/riscv/include/asm/perf_event.h           |  72 --
> arch/riscv/include/asm/sbi.h                  |  97 +++
> arch/riscv/kernel/Makefile                    |   1 -
> arch/riscv/kernel/perf_event.c                | 485 ------------
> drivers/perf/Kconfig                          |  25 +
> drivers/perf/Makefile                         |   5 +
> drivers/perf/riscv_pmu.c                      | 331 ++++++++
> drivers/perf/riscv_pmu_legacy.c               | 143 ++++
> drivers/perf/riscv_pmu_sbi.c                  | 732 ++++++++++++++++++
> include/linux/cpuhotplug.h                    |   1 +
> include/linux/perf/riscv_pmu.h                |  69 ++
> 17 files changed, 1532 insertions(+), 827 deletions(-)
> create mode 100644 Documentation/devicetree/bindings/perf/riscv,pmu.yaml
> delete mode 100644 Documentation/riscv/pmu.rst
> delete mode 100644 arch/riscv/kernel/perf_event.c
> create mode 100644 drivers/perf/riscv_pmu.c
> create mode 100644 drivers/perf/riscv_pmu_legacy.c
> create mode 100644 drivers/perf/riscv_pmu_sbi.c
> create mode 100644 include/linux/perf/riscv_pmu.h
Atish Patra Dec. 14, 2021, 3:16 a.m. UTC | #7
On Mon, Dec 13, 2021 at 5:52 PM Palmer Dabbelt <palmer@dabbelt.com> wrote:
>
> On Mon, 25 Oct 2021 12:53:39 PDT (-0700), Atish Patra wrote:
> > This series adds improved perf support for RISC-V based system using
> > SBI PMU extension[1] and Sscofpmf extension[2]. The SBI PMU extension allows
> > the kernel to program the counters for different events and start/stop counters
> > while the sscofpmf extension allows the counter overflow interrupt and privilege
> > mode filtering. An hardware platform can leverage SBI PMU extension without
> > the sscofpmf extension if it supports mcountinhibit and mcounteren. However,
> > the reverse is not true. With both of these extension enabled, a platform can
> > take advantage of all both event counting and sampling using perf tool.
> >
> > This series introduces a platform perf driver instead of a existing arch
> > specific implementation. The new perf implementation has adopted a modular
> > approach where most of the generic event handling is done in the core library
> > while individual PMUs need to only implement necessary features specific to
> > the PMU. This is easily extensible and any future RISC-V PMU implementation
> > can leverage this. Currently, SBI PMU driver & legacy PMU driver are implemented
> > as a part of this series.
> >
> > The legacy driver tries to reimplement the existing minimal perf under a new
> > config to maintain backward compatibility. This implementation only allows
> > monitoring of always running cycle/instruction counters. Moreover, they can
> > not be started or stopped. In general, this is very limited and not very useful.
> > That's why, I am not very keen to carry the support into the new driver.
> > However, I don't want to break perf for any existing hardware platforms.
> > If everybody agrees that we don't need legacy perf implementation for older
> > implementation, I will be happy to drop PATCH 4.
>
> IMO we should keep it for a bit, so we have a transition period.  These
> extensions are pretty new so we won't be able to count on everyone
> having them yet, this way we'll avoid breaking users.
>

Sure.

> This generally looks good, but I don't see any Acks from the perf
> maintainers.  I'm happy to take this through the RISC-V tree, but I'd
> generally like to have at least an ack as perf isn't really my
> subsystem.  MAINTAINERS seems to indicate that's Will and Mark, they're
> not To'd so maybe they just missed this?
>
> I fixed a few trivial checkpatch warnings, updated Atish's email

Thanks!

> address, and put this on palmer/riscv-pmu.  Happy to hear any comments,

There was a comment from Jessica about needing a DT node in the kernel.
Here is the relevant discussion.

https://patchwork.kernel.org/project/linux-riscv/patch/20211025195350.242914-11-atish.patra@wdc.com/

To summarize, this series uses a PMU DT node to do the following things.

1. The PMU driver is DT probe based platform driver.
2. It uses the DT to discover the interrupt number
3. overflow will not be supported if the interrupt is not present in
the PMU DT node.

Jessica suggested that we should get rid of the DT node in the kernel because
1. The counter overflow interrupt is a fixed local interrupt number (0x13)
2. Overflow support should be enabled based on sscofpmf ISA extension
rather than interrupt property in
a DT node.

We did not have any discussion on the ISA extension discovery process
earlier. Thus, I did not add #2 in the current series.
Recently, it was briefly discussed and I am working on an
implementation based on that discussion.
Thus, I can address all the suggestions from Jessica in the next version.

I am okay with both approaches. Do you have any preference/suggestions
between the two approaches ?

> if nobody says anything then I'll just put that on riscv/for-next
> whenever I get back to my own email.
>

You may want to wait a little bit based on how the above discussion folds.

> > This series has been tested in Qemu (RV64 & RV32) and HiFive Unmatched.
> > Qemu[5] & OpenSBI [3] patches are required to test it on Qemu and a dt patch
> > required in U-Boot[6] for HiFive Unmatched. Qemu changes are not
> > backward compatible. That means, you can not use perf anymore on older Qemu
> > versions with latest OpenSBI and/or Kernel. However, newer kernel will
> > just use legacy pmu driver if old OpenSBI is detected.
> >
> > The U-Boot patch is just an example that encodes few of the events defined
> > in fu740 documentation [7] in the DT. We can update the DT to include all the
> > events defined if required.
> >
> > Here is an output of perf stat/report while running perf benchmark with OpenSBI,
> > Linux kernel and U-Boot patches applied.
> >
> > HiFive Unmatched:
> > =================
> > perf stat -e cycles -e instructions -e L1-icache-load-misses -e branches -e branch-misses \
> > -e r0000000000000200 -e r0000000000000400 \
> > -e r0000000000000800 perf bench sched messaging -g 25 -l 15
> >
> > # Running 'sched/messaging' benchmark:
> > # 20 sender and receiver processes per group
> > # 25 groups == 1000 processes run
> >
> >      Total time: 0.826 [sec]
> >
> >  Performance counter stats for 'perf bench sched messaging -g 25 -l 15':
> >
> >         3426710073      cycles                (65.92%)
> >         1348772808      instructions          #0.39  insn per cycle  (75.44%)
> >                  0      L1-icache-load-misses (72.28%)
> >          201133996      branches              (67.88%)
> >           44663584      branch-misses         #22.21% of all branches (35.01%)
> >          248194747      r0000000000000200     (41.94%) --> Integer load instruction retired
> >          156879950      r0000000000000400     (43.58%) --> Integer store instruction retired
> >            6988678      r0000000000000800     (47.91%) --> Atomic memory operation retired
> >
> >        1.931335000 seconds time elapsed
> >
> >        1.100415000 seconds user
> >        3.755176000 seconds sys
> >
> >
> > QEMU:
> > =========
> > Perf stat:
> > =========
> >
> > [root@fedora-riscv riscv]# perf stat -e r8000000000000005 -e r8000000000000007 \
> > -e r8000000000000006 -e r0000000000020002 -e r0000000000020004 -e branch-misses \
> > -e cache-misses -e dTLB-load-misses -e dTLB-store-misses -e iTLB-load-misses \
> > -e cycles -e instructions perf bench sched messaging -g 15 -l 10 \
> > Running with 15*40 (== 600) tasks.
> > Time: 6.578
> >
> >  Performance counter stats for './hackbench -pipe 15 process':
> >
> >              1,794      r8000000000000005      (52.59%) --> SBI_PMU_FW_SET_TIMER
> >              2,859      r8000000000000007      (60.74%) --> SBI_PMU_FW_IPI_RECVD
> >              4,205      r8000000000000006      (68.71%) --> SBI_PMU_FW_IPI_SENT
> >                  0      r0000000000020002      (81.69%)
> >      <not counted>      r0000000000020004      (0.00%)
> >      <not counted>      branch-misses          (0.00%)
> >      <not counted>      cache-misses           (0.00%)
> >          7,878,328      dTLB-load-misses       (15.60%)
> >            680,270      dTLB-store-misses      (28.45%)
> >          8,287,931      iTLB-load-misses       (39.24%)
> >     20,008,506,675      cycles                 (48.60%)
> >     21,484,427,932      instructions   # 1.07  insn per cycle (56.60%)
> >
> >        1.681344735 seconds time elapsed
> >
> >        0.614460000 seconds user
> >        8.313254000 seconds sys
> >
> >
> > Perf record:
> > ============
> > [root@fedora-riscv riscv]# perf record -e cycles -e instructions \
> > -e dTLB-load-misses -e dTLB-store-misses -e iTLB-load-misses -c 10000 \
> > perf bench sched messaging -g 15 -l 10
> > # Running 'sched/messaging' benchmark:
> > # 20 sender and receiver processes per group
> > # 15 groups == 600 processes run
> >
> >      Total time: 1.261 [sec]
> > [ perf record: Woken up 1 times to write data ]
> > [ perf record: Captured and wrote 0.101 MB perf.data (845 samples) ]
> >
> > [root@fedora-riscv riscv]# perf report
> > Available samples
> > 407 cycles                                                                     _
> > 407 instructions                                                               _
> > 18 dTLB-load-misses                                                            _
> > 2 dTLB-store-misses                                                            _
> > 11 iTLB-load-misses                                                            _
> >
> > [1] https://github.com/riscv-non-isa/riscv-sbi-doc/blob/master/riscv-sbi.adoc
> > [2] https://drive.google.com/file/d/171j4jFjIkKdj5LWcExphq4xG_2sihbfd/edit
> > [3] https://github.com/atishp04/opensbi/tree/pmu_sscofpmf_v2
> > [4] https://github.com/atishp04/linux/tree/riscv_pmu_v4
> > [5] https://github.com/atishp04/qemu/tree/riscv_pmu_v3
> > [6] https://github.com/atishp04/u-boot/tree/hifive_unmatched_dt_pmu
> > [7] https://sifive.cdn.prismic.io/sifive/de1491e5-077c-461d-9605-e8a0ce57337d_fu740-c000-manual-v1p3.pdf
> >
> > Changes from v3->v4:
> > 1. Do not proceed overflow handler if event doesn't set for sampling.
> > 2. overflow status register is only read after counters are stopped.
> > 3. Added the PMU DT node for HiFive Unmatched.
> >
> > Changes from v2->v3:
> > 1. Added interrupt overflow support.
> > 2. Cleaned up legacy driver initialization.
> > 3. Supports perf record now.
> > 4. Added the DT binding and maintainers file.
> > 5. Changed cpu hotplug notifier to be multi-state.
> > 6. OpenSBI doesn't disable cycle/instret counter during boot. Update the
> >    perf code to disable all the counter during the boot.
> >
> > Changes from v1->v2
> > 1. Implemented the latest SBI PMU extension specification.
> > 2. The core platform driver was changed to operate as a library while only
> >    sbi based PMU is built as a driver. The legacy one is just a fallback if
> >    SBI PMU extension is not available.
> >
> > Atish Patra (11):
> > RISC-V: Remove the current perf implementation
> > RISC-V: Add CSR encodings for all HPMCOUNTERS
> > RISC-V: Add a perf core library for pmu drivers
> > RISC-V: Add a simple platform driver for RISC-V legacy perf
> > RISC-V: Add RISC-V SBI PMU extension definitions
> > dt-binding: pmu: Add RISC-V PMU DT bindings
> > RISC-V: Add perf platform driver based on SBI PMU extension
> > RISC-V: Add interrupt support for perf
> > Documentation: riscv: Remove the old documentation
> > riscv: dts: fu740: Add pmu node
> > MAINTAINERS: Add entry for RISC-V PMU drivers
> >
> > .../devicetree/bindings/perf/riscv,pmu.yaml   |  51 ++
> > Documentation/riscv/pmu.rst                   | 255 ------
> > MAINTAINERS                                   |  10 +
> > arch/riscv/Kconfig                            |  13 -
> > arch/riscv/boot/dts/sifive/fu740-c000.dtsi    |   3 +
> > arch/riscv/include/asm/csr.h                  |  66 +-
> > arch/riscv/include/asm/perf_event.h           |  72 --
> > arch/riscv/include/asm/sbi.h                  |  97 +++
> > arch/riscv/kernel/Makefile                    |   1 -
> > arch/riscv/kernel/perf_event.c                | 485 ------------
> > drivers/perf/Kconfig                          |  25 +
> > drivers/perf/Makefile                         |   5 +
> > drivers/perf/riscv_pmu.c                      | 331 ++++++++
> > drivers/perf/riscv_pmu_legacy.c               | 143 ++++
> > drivers/perf/riscv_pmu_sbi.c                  | 732 ++++++++++++++++++
> > include/linux/cpuhotplug.h                    |   1 +
> > include/linux/perf/riscv_pmu.h                |  69 ++
> > 17 files changed, 1532 insertions(+), 827 deletions(-)
> > create mode 100644 Documentation/devicetree/bindings/perf/riscv,pmu.yaml
> > delete mode 100644 Documentation/riscv/pmu.rst
> > delete mode 100644 arch/riscv/kernel/perf_event.c
> > create mode 100644 drivers/perf/riscv_pmu.c
> > create mode 100644 drivers/perf/riscv_pmu_legacy.c
> > create mode 100644 drivers/perf/riscv_pmu_sbi.c
> > create mode 100644 include/linux/perf/riscv_pmu.h
>
> _______________________________________________
> linux-riscv mailing list
> linux-riscv@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv
Nikita Shubin Dec. 14, 2021, 9:14 a.m. UTC | #8
Hello Atish!

I get linker error if CONFIG_RISCV_PMU_LEGACY is not set:

riscv64-unknown-linux-gnu-ld: drivers/perf/riscv_pmu_sbi.o: in function
`pmu_sbi_device_probe': linux/drivers/perf/riscv_pmu_sbi.c:688:
undefined reference to `riscv_pmu_legacy_init'

It looks like you need some guards or a dummy function for
riscv_pmu_legacy_init.

On Mon, 25 Oct 2021 12:53:39 -0700
Atish Patra <atish.patra@wdc.com> wrote:

> This series adds improved perf support for RISC-V based system using
> SBI PMU extension[1] and Sscofpmf extension[2]. The SBI PMU extension
> allows the kernel to program the counters for different events and
> start/stop counters while the sscofpmf extension allows the counter
> overflow interrupt and privilege mode filtering. An hardware platform
> can leverage SBI PMU extension without the sscofpmf extension if it
> supports mcountinhibit and mcounteren. However, the reverse is not
> true. With both of these extension enabled, a platform can take
> advantage of all both event counting and sampling using perf tool. 
> 
> This series introduces a platform perf driver instead of a existing
> arch specific implementation. The new perf implementation has adopted
> a modular approach where most of the generic event handling is done
> in the core library while individual PMUs need to only implement
> necessary features specific to the PMU. This is easily extensible and
> any future RISC-V PMU implementation can leverage this. Currently,
> SBI PMU driver & legacy PMU driver are implemented as a part of this
> series.
> 
> The legacy driver tries to reimplement the existing minimal perf
> under a new config to maintain backward compatibility. This
> implementation only allows monitoring of always running
> cycle/instruction counters. Moreover, they can not be started or
> stopped. In general, this is very limited and not very useful. That's
> why, I am not very keen to carry the support into the new driver.
> However, I don't want to break perf for any existing hardware
> platforms. If everybody agrees that we don't need legacy perf
> implementation for older implementation, I will be happy to drop
> PATCH 4.
> 
> This series has been tested in Qemu (RV64 & RV32) and HiFive
> Unmatched. Qemu[5] & OpenSBI [3] patches are required to test it on
> Qemu and a dt patch required in U-Boot[6] for HiFive Unmatched. Qemu
> changes are not backward compatible. That means, you can not use perf
> anymore on older Qemu versions with latest OpenSBI and/or Kernel.
> However, newer kernel will just use legacy pmu driver if old OpenSBI
> is detected.
> 
> The U-Boot patch is just an example that encodes few of the events
> defined in fu740 documentation [7] in the DT. We can update the DT to
> include all the events defined if required.
> 
> Here is an output of perf stat/report while running perf benchmark
> with OpenSBI, Linux kernel and U-Boot patches applied.
> 
> HiFive Unmatched:
> =================
> perf stat -e cycles -e instructions -e L1-icache-load-misses -e
> branches -e branch-misses \ -e r0000000000000200 -e r0000000000000400
> \ -e r0000000000000800 perf bench sched messaging -g 25 -l 15
> 
> # Running 'sched/messaging' benchmark:
> # 20 sender and receiver processes per group
> # 25 groups == 1000 processes run
> 
>      Total time: 0.826 [sec]
> 
>  Performance counter stats for 'perf bench sched messaging -g 25 -l
> 15':
> 
>         3426710073      cycles                (65.92%)
>         1348772808      instructions          #0.39  insn per cycle
> (75.44%) 0      L1-icache-load-misses (72.28%)
>          201133996      branches              (67.88%)
>           44663584      branch-misses         #22.21% of all branches
> (35.01%) 248194747      r0000000000000200     (41.94%) --> Integer
> load instruction retired 156879950      r0000000000000400
> (43.58%) --> Integer store instruction retired 6988678
> r0000000000000800     (47.91%) --> Atomic memory operation retired
> 
>        1.931335000 seconds time elapsed
> 
>        1.100415000 seconds user
>        3.755176000 seconds sys
> 
> 
> QEMU:
> =========
> Perf stat:
> =========
> 
> [root@fedora-riscv riscv]# perf stat -e r8000000000000005 -e
> r8000000000000007 \ -e r8000000000000006 -e r0000000000020002 -e
> r0000000000020004 -e branch-misses \ -e cache-misses -e
> dTLB-load-misses -e dTLB-store-misses -e iTLB-load-misses \ -e cycles
> -e instructions perf bench sched messaging -g 15 -l 10 \ Running with
> 15*40 (== 600) tasks. Time: 6.578
> 
>  Performance counter stats for './hackbench -pipe 15 process':
> 
>              1,794      r8000000000000005      (52.59%) -->
> SBI_PMU_FW_SET_TIMER 2,859      r8000000000000007      (60.74%) -->
> SBI_PMU_FW_IPI_RECVD 4,205      r8000000000000006      (68.71%) -->
> SBI_PMU_FW_IPI_SENT 0      r0000000000020002      (81.69%)
>      <not counted>      r0000000000020004      (0.00%)
>      <not counted>      branch-misses          (0.00%)
>      <not counted>      cache-misses           (0.00%)
>          7,878,328      dTLB-load-misses       (15.60%)
>            680,270      dTLB-store-misses      (28.45%)
>          8,287,931      iTLB-load-misses       (39.24%)
>     20,008,506,675      cycles                 (48.60%)
>     21,484,427,932      instructions   # 1.07  insn per cycle (56.60%)
> 
>        1.681344735 seconds time elapsed
> 
>        0.614460000 seconds user
>        8.313254000 seconds sys
> 
> 
> Perf record:
> ============
> [root@fedora-riscv riscv]# perf record -e cycles -e instructions \
> -e dTLB-load-misses -e dTLB-store-misses -e iTLB-load-misses -c 10000
> \ perf bench sched messaging -g 15 -l 10
> # Running 'sched/messaging' benchmark:
> # 20 sender and receiver processes per group
> # 15 groups == 600 processes run
> 
>      Total time: 1.261 [sec]
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.101 MB perf.data (845 samples) ]
> 
> [root@fedora-riscv riscv]# perf report
> Available samples
> 407 cycles
>          _ 407 instructions
>                     _ 18 dTLB-load-misses
>                                _ 2 dTLB-store-misses
>                                           _ 11 iTLB-load-misses
>                                                      _ 
> [1]
> https://github.com/riscv-non-isa/riscv-sbi-doc/blob/master/riscv-sbi.adoc
> [2]
> https://drive.google.com/file/d/171j4jFjIkKdj5LWcExphq4xG_2sihbfd/edit
> [3] https://github.com/atishp04/opensbi/tree/pmu_sscofpmf_v2 [4]
> https://github.com/atishp04/linux/tree/riscv_pmu_v4 [5]
> https://github.com/atishp04/qemu/tree/riscv_pmu_v3 [6]
> https://github.com/atishp04/u-boot/tree/hifive_unmatched_dt_pmu [7]
> https://sifive.cdn.prismic.io/sifive/de1491e5-077c-461d-9605-e8a0ce57337d_fu740-c000-manual-v1p3.pdf
> 
> Changes from v3->v4:
> 1. Do not proceed overflow handler if event doesn't set for sampling.
> 2. overflow status register is only read after counters are stopped.
> 3. Added the PMU DT node for HiFive Unmatched.
> 
> Changes from v2->v3:
> 1. Added interrupt overflow support.
> 2. Cleaned up legacy driver initialization.
> 3. Supports perf record now.
> 4. Added the DT binding and maintainers file.
> 5. Changed cpu hotplug notifier to be multi-state.
> 6. OpenSBI doesn't disable cycle/instret counter during boot. Update
> the perf code to disable all the counter during the boot.
> 
> Changes from v1->v2
> 1. Implemented the latest SBI PMU extension specification.
> 2. The core platform driver was changed to operate as a library while
> only sbi based PMU is built as a driver. The legacy one is just a
> fallback if SBI PMU extension is not available.
> 
> Atish Patra (11):
> RISC-V: Remove the current perf implementation
> RISC-V: Add CSR encodings for all HPMCOUNTERS
> RISC-V: Add a perf core library for pmu drivers
> RISC-V: Add a simple platform driver for RISC-V legacy perf
> RISC-V: Add RISC-V SBI PMU extension definitions
> dt-binding: pmu: Add RISC-V PMU DT bindings
> RISC-V: Add perf platform driver based on SBI PMU extension
> RISC-V: Add interrupt support for perf
> Documentation: riscv: Remove the old documentation
> riscv: dts: fu740: Add pmu node
> MAINTAINERS: Add entry for RISC-V PMU drivers
> 
> .../devicetree/bindings/perf/riscv,pmu.yaml   |  51 ++
> Documentation/riscv/pmu.rst                   | 255 ------
> MAINTAINERS                                   |  10 +
> arch/riscv/Kconfig                            |  13 -
> arch/riscv/boot/dts/sifive/fu740-c000.dtsi    |   3 +
> arch/riscv/include/asm/csr.h                  |  66 +-
> arch/riscv/include/asm/perf_event.h           |  72 --
> arch/riscv/include/asm/sbi.h                  |  97 +++
> arch/riscv/kernel/Makefile                    |   1 -
> arch/riscv/kernel/perf_event.c                | 485 ------------
> drivers/perf/Kconfig                          |  25 +
> drivers/perf/Makefile                         |   5 +
> drivers/perf/riscv_pmu.c                      | 331 ++++++++
> drivers/perf/riscv_pmu_legacy.c               | 143 ++++
> drivers/perf/riscv_pmu_sbi.c                  | 732 ++++++++++++++++++
> include/linux/cpuhotplug.h                    |   1 +
> include/linux/perf/riscv_pmu.h                |  69 ++
> 17 files changed, 1532 insertions(+), 827 deletions(-)
> create mode 100644
> Documentation/devicetree/bindings/perf/riscv,pmu.yaml delete mode
> 100644 Documentation/riscv/pmu.rst delete mode 100644
> arch/riscv/kernel/perf_event.c create mode 100644
> drivers/perf/riscv_pmu.c create mode 100644
> drivers/perf/riscv_pmu_legacy.c create mode 100644
> drivers/perf/riscv_pmu_sbi.c create mode 100644
> include/linux/perf/riscv_pmu.h
> 
> --
> 2.31.1
> 
> 
> _______________________________________________
> linux-riscv mailing list
> linux-riscv@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv
Will Deacon Dec. 14, 2021, 6:09 p.m. UTC | #9
On Mon, Dec 13, 2021 at 05:51:28PM -0800, Palmer Dabbelt wrote:
> On Mon, 25 Oct 2021 12:53:39 PDT (-0700), Atish Patra wrote:
> > This series adds improved perf support for RISC-V based system using
> > SBI PMU extension[1] and Sscofpmf extension[2]. The SBI PMU extension allows
> > the kernel to program the counters for different events and start/stop counters
> > while the sscofpmf extension allows the counter overflow interrupt and privilege
> > mode filtering. An hardware platform can leverage SBI PMU extension without
> > the sscofpmf extension if it supports mcountinhibit and mcounteren. However,
> > the reverse is not true. With both of these extension enabled, a platform can
> > take advantage of all both event counting and sampling using perf tool.
> > 
> > This series introduces a platform perf driver instead of a existing arch
> > specific implementation. The new perf implementation has adopted a modular
> > approach where most of the generic event handling is done in the core library
> > while individual PMUs need to only implement necessary features specific to
> > the PMU. This is easily extensible and any future RISC-V PMU implementation
> > can leverage this. Currently, SBI PMU driver & legacy PMU driver are implemented
> > as a part of this series.
> > 
> > The legacy driver tries to reimplement the existing minimal perf under a new
> > config to maintain backward compatibility. This implementation only allows
> > monitoring of always running cycle/instruction counters. Moreover, they can
> > not be started or stopped. In general, this is very limited and not very useful.
> > That's why, I am not very keen to carry the support into the new driver.
> > However, I don't want to break perf for any existing hardware platforms.
> > If everybody agrees that we don't need legacy perf implementation for older
> > implementation, I will be happy to drop PATCH 4.
> 
> IMO we should keep it for a bit, so we have a transition period.  These
> extensions are pretty new so we won't be able to count on everyone having
> them yet, this way we'll avoid breaking users.
> 
> This generally looks good, but I don't see any Acks from the perf
> maintainers.  I'm happy to take this through the RISC-V tree, but I'd
> generally like to have at least an ack as perf isn't really my subsystem.
> MAINTAINERS seems to indicate that's Will and Mark, they're not To'd so
> maybe they just missed this?
> 
> I fixed a few trivial checkpatch warnings, updated Atish's email address,
> and put this on palmer/riscv-pmu.  Happy to hear any comments, if nobody
> says anything then I'll just put that on riscv/for-next whenever I get back
> to my own email.

Fine by me! Most (all?) of the other drivers under drivers/perf/ are for
arm64, so I'm more than happy for you to handle the riscv one yourself.
If I end up with something that touches all of the drivers then we can
use a shared branch or something.

Will
Atish Patra Dec. 14, 2021, 6:29 p.m. UTC | #10
On Tue, Dec 14, 2021 at 1:14 AM Nikita Shubin <nikita.shubin@maquefel.me> wrote:
>
> Hello Atish!
>
> I get linker error if CONFIG_RISCV_PMU_LEGACY is not set:
>
> riscv64-unknown-linux-gnu-ld: drivers/perf/riscv_pmu_sbi.o: in function
> `pmu_sbi_device_probe': linux/drivers/perf/riscv_pmu_sbi.c:688:
> undefined reference to `riscv_pmu_legacy_init'
>
> It looks like you need some guards or a dummy function for
> riscv_pmu_legacy_init.

Ahh yes. Thanks for catching that. I will fix it in the next version.

>
> On Mon, 25 Oct 2021 12:53:39 -0700
> Atish Patra <atish.patra@wdc.com> wrote:
>
> > This series adds improved perf support for RISC-V based system using
> > SBI PMU extension[1] and Sscofpmf extension[2]. The SBI PMU extension
> > allows the kernel to program the counters for different events and
> > start/stop counters while the sscofpmf extension allows the counter
> > overflow interrupt and privilege mode filtering. An hardware platform
> > can leverage SBI PMU extension without the sscofpmf extension if it
> > supports mcountinhibit and mcounteren. However, the reverse is not
> > true. With both of these extension enabled, a platform can take
> > advantage of all both event counting and sampling using perf tool.
> >
> > This series introduces a platform perf driver instead of a existing
> > arch specific implementation. The new perf implementation has adopted
> > a modular approach where most of the generic event handling is done
> > in the core library while individual PMUs need to only implement
> > necessary features specific to the PMU. This is easily extensible and
> > any future RISC-V PMU implementation can leverage this. Currently,
> > SBI PMU driver & legacy PMU driver are implemented as a part of this
> > series.
> >
> > The legacy driver tries to reimplement the existing minimal perf
> > under a new config to maintain backward compatibility. This
> > implementation only allows monitoring of always running
> > cycle/instruction counters. Moreover, they can not be started or
> > stopped. In general, this is very limited and not very useful. That's
> > why, I am not very keen to carry the support into the new driver.
> > However, I don't want to break perf for any existing hardware
> > platforms. If everybody agrees that we don't need legacy perf
> > implementation for older implementation, I will be happy to drop
> > PATCH 4.
> >
> > This series has been tested in Qemu (RV64 & RV32) and HiFive
> > Unmatched. Qemu[5] & OpenSBI [3] patches are required to test it on
> > Qemu and a dt patch required in U-Boot[6] for HiFive Unmatched. Qemu
> > changes are not backward compatible. That means, you can not use perf
> > anymore on older Qemu versions with latest OpenSBI and/or Kernel.
> > However, newer kernel will just use legacy pmu driver if old OpenSBI
> > is detected.
> >
> > The U-Boot patch is just an example that encodes few of the events
> > defined in fu740 documentation [7] in the DT. We can update the DT to
> > include all the events defined if required.
> >
> > Here is an output of perf stat/report while running perf benchmark
> > with OpenSBI, Linux kernel and U-Boot patches applied.
> >
> > HiFive Unmatched:
> > =================
> > perf stat -e cycles -e instructions -e L1-icache-load-misses -e
> > branches -e branch-misses \ -e r0000000000000200 -e r0000000000000400
> > \ -e r0000000000000800 perf bench sched messaging -g 25 -l 15
> >
> > # Running 'sched/messaging' benchmark:
> > # 20 sender and receiver processes per group
> > # 25 groups == 1000 processes run
> >
> >      Total time: 0.826 [sec]
> >
> >  Performance counter stats for 'perf bench sched messaging -g 25 -l
> > 15':
> >
> >         3426710073      cycles                (65.92%)
> >         1348772808      instructions          #0.39  insn per cycle
> > (75.44%) 0      L1-icache-load-misses (72.28%)
> >          201133996      branches              (67.88%)
> >           44663584      branch-misses         #22.21% of all branches
> > (35.01%) 248194747      r0000000000000200     (41.94%) --> Integer
> > load instruction retired 156879950      r0000000000000400
> > (43.58%) --> Integer store instruction retired 6988678
> > r0000000000000800     (47.91%) --> Atomic memory operation retired
> >
> >        1.931335000 seconds time elapsed
> >
> >        1.100415000 seconds user
> >        3.755176000 seconds sys
> >
> >
> > QEMU:
> > =========
> > Perf stat:
> > =========
> >
> > [root@fedora-riscv riscv]# perf stat -e r8000000000000005 -e
> > r8000000000000007 \ -e r8000000000000006 -e r0000000000020002 -e
> > r0000000000020004 -e branch-misses \ -e cache-misses -e
> > dTLB-load-misses -e dTLB-store-misses -e iTLB-load-misses \ -e cycles
> > -e instructions perf bench sched messaging -g 15 -l 10 \ Running with
> > 15*40 (== 600) tasks. Time: 6.578
> >
> >  Performance counter stats for './hackbench -pipe 15 process':
> >
> >              1,794      r8000000000000005      (52.59%) -->
> > SBI_PMU_FW_SET_TIMER 2,859      r8000000000000007      (60.74%) -->
> > SBI_PMU_FW_IPI_RECVD 4,205      r8000000000000006      (68.71%) -->
> > SBI_PMU_FW_IPI_SENT 0      r0000000000020002      (81.69%)
> >      <not counted>      r0000000000020004      (0.00%)
> >      <not counted>      branch-misses          (0.00%)
> >      <not counted>      cache-misses           (0.00%)
> >          7,878,328      dTLB-load-misses       (15.60%)
> >            680,270      dTLB-store-misses      (28.45%)
> >          8,287,931      iTLB-load-misses       (39.24%)
> >     20,008,506,675      cycles                 (48.60%)
> >     21,484,427,932      instructions   # 1.07  insn per cycle (56.60%)
> >
> >        1.681344735 seconds time elapsed
> >
> >        0.614460000 seconds user
> >        8.313254000 seconds sys
> >
> >
> > Perf record:
> > ============
> > [root@fedora-riscv riscv]# perf record -e cycles -e instructions \
> > -e dTLB-load-misses -e dTLB-store-misses -e iTLB-load-misses -c 10000
> > \ perf bench sched messaging -g 15 -l 10
> > # Running 'sched/messaging' benchmark:
> > # 20 sender and receiver processes per group
> > # 15 groups == 600 processes run
> >
> >      Total time: 1.261 [sec]
> > [ perf record: Woken up 1 times to write data ]
> > [ perf record: Captured and wrote 0.101 MB perf.data (845 samples) ]
> >
> > [root@fedora-riscv riscv]# perf report
> > Available samples
> > 407 cycles
> >          _ 407 instructions
> >                     _ 18 dTLB-load-misses
> >                                _ 2 dTLB-store-misses
> >                                           _ 11 iTLB-load-misses
> >                                                      _
> > [1]
> > https://github.com/riscv-non-isa/riscv-sbi-doc/blob/master/riscv-sbi.adoc
> > [2]
> > https://drive.google.com/file/d/171j4jFjIkKdj5LWcExphq4xG_2sihbfd/edit
> > [3] https://github.com/atishp04/opensbi/tree/pmu_sscofpmf_v2 [4]
> > https://github.com/atishp04/linux/tree/riscv_pmu_v4 [5]
> > https://github.com/atishp04/qemu/tree/riscv_pmu_v3 [6]
> > https://github.com/atishp04/u-boot/tree/hifive_unmatched_dt_pmu [7]
> > https://sifive.cdn.prismic.io/sifive/de1491e5-077c-461d-9605-e8a0ce57337d_fu740-c000-manual-v1p3.pdf
> >
> > Changes from v3->v4:
> > 1. Do not proceed overflow handler if event doesn't set for sampling.
> > 2. overflow status register is only read after counters are stopped.
> > 3. Added the PMU DT node for HiFive Unmatched.
> >
> > Changes from v2->v3:
> > 1. Added interrupt overflow support.
> > 2. Cleaned up legacy driver initialization.
> > 3. Supports perf record now.
> > 4. Added the DT binding and maintainers file.
> > 5. Changed cpu hotplug notifier to be multi-state.
> > 6. OpenSBI doesn't disable cycle/instret counter during boot. Update
> > the perf code to disable all the counter during the boot.
> >
> > Changes from v1->v2
> > 1. Implemented the latest SBI PMU extension specification.
> > 2. The core platform driver was changed to operate as a library while
> > only sbi based PMU is built as a driver. The legacy one is just a
> > fallback if SBI PMU extension is not available.
> >
> > Atish Patra (11):
> > RISC-V: Remove the current perf implementation
> > RISC-V: Add CSR encodings for all HPMCOUNTERS
> > RISC-V: Add a perf core library for pmu drivers
> > RISC-V: Add a simple platform driver for RISC-V legacy perf
> > RISC-V: Add RISC-V SBI PMU extension definitions
> > dt-binding: pmu: Add RISC-V PMU DT bindings
> > RISC-V: Add perf platform driver based on SBI PMU extension
> > RISC-V: Add interrupt support for perf
> > Documentation: riscv: Remove the old documentation
> > riscv: dts: fu740: Add pmu node
> > MAINTAINERS: Add entry for RISC-V PMU drivers
> >
> > .../devicetree/bindings/perf/riscv,pmu.yaml   |  51 ++
> > Documentation/riscv/pmu.rst                   | 255 ------
> > MAINTAINERS                                   |  10 +
> > arch/riscv/Kconfig                            |  13 -
> > arch/riscv/boot/dts/sifive/fu740-c000.dtsi    |   3 +
> > arch/riscv/include/asm/csr.h                  |  66 +-
> > arch/riscv/include/asm/perf_event.h           |  72 --
> > arch/riscv/include/asm/sbi.h                  |  97 +++
> > arch/riscv/kernel/Makefile                    |   1 -
> > arch/riscv/kernel/perf_event.c                | 485 ------------
> > drivers/perf/Kconfig                          |  25 +
> > drivers/perf/Makefile                         |   5 +
> > drivers/perf/riscv_pmu.c                      | 331 ++++++++
> > drivers/perf/riscv_pmu_legacy.c               | 143 ++++
> > drivers/perf/riscv_pmu_sbi.c                  | 732 ++++++++++++++++++
> > include/linux/cpuhotplug.h                    |   1 +
> > include/linux/perf/riscv_pmu.h                |  69 ++
> > 17 files changed, 1532 insertions(+), 827 deletions(-)
> > create mode 100644
> > Documentation/devicetree/bindings/perf/riscv,pmu.yaml delete mode
> > 100644 Documentation/riscv/pmu.rst delete mode 100644
> > arch/riscv/kernel/perf_event.c create mode 100644
> > drivers/perf/riscv_pmu.c create mode 100644
> > drivers/perf/riscv_pmu_legacy.c create mode 100644
> > drivers/perf/riscv_pmu_sbi.c create mode 100644
> > include/linux/perf/riscv_pmu.h
> >
> > --
> > 2.31.1
> >
> >
> > _______________________________________________
> > linux-riscv mailing list
> > linux-riscv@lists.infradead.org
> > http://lists.infradead.org/mailman/listinfo/linux-riscv
>
>
> _______________________________________________
> linux-riscv mailing list
> linux-riscv@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv
Nikita Shubin Dec. 15, 2021, 8:02 a.m. UTC | #11
Hello Atish!

On Mon, 25 Oct 2021 12:53:44 -0700
Atish Patra <atish.patra@wdc.com> wrote:

> This patch adds all the definitions defined by the SBI PMU extension.
> 
> Signed-off-by: Atish Patra <atish.patra@wdc.com>
> ---
>  arch/riscv/include/asm/sbi.h | 97
> ++++++++++++++++++++++++++++++++++++ 1 file changed, 97 insertions(+)
> 
> diff --git a/arch/riscv/include/asm/sbi.h
> b/arch/riscv/include/asm/sbi.h index 0d42693cb65e..7a14ca06ba8f 100644
> --- a/arch/riscv/include/asm/sbi.h
> +++ b/arch/riscv/include/asm/sbi.h
> @@ -27,6 +27,7 @@ enum sbi_ext_id {
>  	SBI_EXT_IPI = 0x735049,
>  	SBI_EXT_RFENCE = 0x52464E43,
>  	SBI_EXT_HSM = 0x48534D,
> +	SBI_EXT_PMU = 0x504D55,
>  };
>  
>  enum sbi_ext_base_fid {
> @@ -70,6 +71,99 @@ enum sbi_hsm_hart_status {
>  	SBI_HSM_HART_STATUS_STOP_PENDING,
>  };
>  
> +
> +enum sbi_ext_pmu_fid {
> +	SBI_EXT_PMU_NUM_COUNTERS = 0,
> +	SBI_EXT_PMU_COUNTER_GET_INFO,
> +	SBI_EXT_PMU_COUNTER_CFG_MATCH,
> +	SBI_EXT_PMU_COUNTER_START,
> +	SBI_EXT_PMU_COUNTER_STOP,
> +	SBI_EXT_PMU_COUNTER_FW_READ,
> +};
> +
> +#define RISCV_PMU_RAW_EVENT_MASK GENMASK_ULL(55, 0)
> +#define RISCV_PMU_RAW_EVENT_IDX 0x20000
> +
> +/** General pmu event codes specified in SBI PMU extension */
> +enum sbi_pmu_hw_generic_events_t {
> +	SBI_PMU_HW_NO_EVENT			= 0,
> +	SBI_PMU_HW_CPU_CYCLES			= 1,
> +	SBI_PMU_HW_INSTRUCTIONS			= 2,
> +	SBI_PMU_HW_CACHE_REFERENCES		= 3,
> +	SBI_PMU_HW_CACHE_MISSES			= 4,
> +	SBI_PMU_HW_BRANCH_INSTRUCTIONS		= 5,
> +	SBI_PMU_HW_BRANCH_MISSES		= 6,
> +	SBI_PMU_HW_BUS_CYCLES			= 7,
> +	SBI_PMU_HW_STALLED_CYCLES_FRONTEND	= 8,
> +	SBI_PMU_HW_STALLED_CYCLES_BACKEND	= 9,
> +	SBI_PMU_HW_REF_CPU_CYCLES		= 10,
> +
> +	SBI_PMU_HW_GENERAL_MAX,
> +};
> +
> +/**
> + * Special "firmware" events provided by the firmware, even if the
> hardware
> + * does not support performance events. These events are encoded as
> a raw
> + * event type in Linux kernel perf framework.
> + */
> +enum sbi_pmu_fw_generic_events_t {
> +	SBI_PMU_FW_MISALIGNED_LOAD	= 0,
> +	SBI_PMU_FW_MISALIGNED_STORE	= 1,
> +	SBI_PMU_FW_ACCESS_LOAD		= 2,
> +	SBI_PMU_FW_ACCESS_STORE		= 3,
> +	SBI_PMU_FW_ILLEGAL_INSN		= 4,
> +	SBI_PMU_FW_SET_TIMER		= 5,
> +	SBI_PMU_FW_IPI_SENT		= 6,
> +	SBI_PMU_FW_IPI_RECVD		= 7,
> +	SBI_PMU_FW_FENCE_I_SENT		= 8,
> +	SBI_PMU_FW_FENCE_I_RECVD	= 9,
> +	SBI_PMU_FW_SFENCE_VMA_SENT	= 10,
> +	SBI_PMU_FW_SFENCE_VMA_RCVD	= 11,
> +	SBI_PMU_FW_SFENCE_VMA_ASID_SENT	= 12,
> +	SBI_PMU_FW_SFENCE_VMA_ASID_RCVD	= 13,
> +
> +	SBI_PMU_FW_HFENCE_GVMA_SENT	= 14,
> +	SBI_PMU_FW_HFENCE_GVMA_RCVD	= 15,
> +	SBI_PMU_FW_HFENCE_GVMA_VMID_SENT = 16,
> +	SBI_PMU_FW_HFENCE_GVMA_VMID_RCVD = 17,
> +
> +	SBI_PMU_FW_HFENCE_VVMA_SENT	= 18,
> +	SBI_PMU_FW_HFENCE_VVMA_RCVD	= 19,
> +	SBI_PMU_FW_HFENCE_VVMA_ASID_SENT = 20,
> +	SBI_PMU_FW_HFENCE_VVMA_ASID_RCVD = 21,
> +	SBI_PMU_FW_MAX,
> +};
> +
> +/* SBI PMU event types */
> +enum sbi_pmu_event_type {
> +	SBI_PMU_EVENT_TYPE_HW = 0x0,
> +	SBI_PMU_EVENT_TYPE_CACHE = 0x1,
> +	SBI_PMU_EVENT_TYPE_RAW = 0x2,
> +	SBI_PMU_EVENT_TYPE_FW = 0xf,
> +};
> +
> +/* SBI PMU event types */
> +enum sbi_pmu_ctr_type {
> +	SBI_PMU_CTR_TYPE_HW = 0x0,
> +	SBI_PMU_CTR_TYPE_FW,
> +};
> +
> +/* Flags defined for config matching function */
> +#define SBI_PMU_CFG_FLAG_SKIP_MATCH	(1 << 0)
> +#define SBI_PMU_CFG_FLAG_CLEAR_VALUE	(1 << 1)
> +#define SBI_PMU_CFG_FLAG_AUTO_START	(1 << 2)
> +#define SBI_PMU_CFG_FLAG_SET_MINH	(1 << 3)
> +#define SBI_PMU_CFG_FLAG_SET_SINH	(1 << 4)
> +#define SBI_PMU_CFG_FLAG_SET_UINH	(1 << 5)
> +#define SBI_PMU_CFG_FLAG_SET_VSINH	(1 << 6)
> +#define SBI_PMU_CFG_FLAG_SET_VUINH	(1 << 7)

It looks like you have a typo here, the defines in OpenSBI are
different:

#define SBI_PMU_CFG_FLAG_SET_VUINH	(1 << 3)
#define SBI_PMU_CFG_FLAG_SET_VSINH	(1 << 4)
#define SBI_PMU_CFG_FLAG_SET_UINH	(1 << 5)
#define SBI_PMU_CFG_FLAG_SET_SINH	(1 << 6)
#define SBI_PMU_CFG_FLAG_SET_MINH	(1 << 7)


> +
> +/* Flags defined for counter start function */
> +#define SBI_PMU_START_FLAG_SET_INIT_VALUE (1 << 0)
> +
> +/* Flags defined for counter stop function */
> +#define SBI_PMU_STOP_FLAG_RESET (1 << 0)
> +
>  #define SBI_SPEC_VERSION_DEFAULT	0x1
>  #define SBI_SPEC_VERSION_MAJOR_SHIFT	24
>  #define SBI_SPEC_VERSION_MAJOR_MASK	0x7f
> @@ -82,6 +176,9 @@ enum sbi_hsm_hart_status {
>  #define SBI_ERR_INVALID_PARAM	-3
>  #define SBI_ERR_DENIED		-4
>  #define SBI_ERR_INVALID_ADDRESS	-5
> +#define SBI_ERR_ALREADY_AVAILABLE -6
> +#define SBI_ERR_ALREADY_STARTED -7
> +#define SBI_ERR_ALREADY_STOPPED -8
>  
>  extern unsigned long sbi_spec_version;
>  struct sbiret {
Atish Patra Dec. 15, 2021, 4:03 p.m. UTC | #12
On Wed, Dec 15, 2021 at 12:10 AM Nikita Shubin
<nikita.shubin@maquefel.me> wrote:
>
> Hello Atish!
>
> On Mon, 25 Oct 2021 12:53:44 -0700
> Atish Patra <atish.patra@wdc.com> wrote:
>
> > This patch adds all the definitions defined by the SBI PMU extension.
> >
> > Signed-off-by: Atish Patra <atish.patra@wdc.com>
> > ---
> >  arch/riscv/include/asm/sbi.h | 97
> > ++++++++++++++++++++++++++++++++++++ 1 file changed, 97 insertions(+)
> >
> > diff --git a/arch/riscv/include/asm/sbi.h
> > b/arch/riscv/include/asm/sbi.h index 0d42693cb65e..7a14ca06ba8f 100644
> > --- a/arch/riscv/include/asm/sbi.h
> > +++ b/arch/riscv/include/asm/sbi.h
> > @@ -27,6 +27,7 @@ enum sbi_ext_id {
> >       SBI_EXT_IPI = 0x735049,
> >       SBI_EXT_RFENCE = 0x52464E43,
> >       SBI_EXT_HSM = 0x48534D,
> > +     SBI_EXT_PMU = 0x504D55,
> >  };
> >
> >  enum sbi_ext_base_fid {
> > @@ -70,6 +71,99 @@ enum sbi_hsm_hart_status {
> >       SBI_HSM_HART_STATUS_STOP_PENDING,
> >  };
> >
> > +
> > +enum sbi_ext_pmu_fid {
> > +     SBI_EXT_PMU_NUM_COUNTERS = 0,
> > +     SBI_EXT_PMU_COUNTER_GET_INFO,
> > +     SBI_EXT_PMU_COUNTER_CFG_MATCH,
> > +     SBI_EXT_PMU_COUNTER_START,
> > +     SBI_EXT_PMU_COUNTER_STOP,
> > +     SBI_EXT_PMU_COUNTER_FW_READ,
> > +};
> > +
> > +#define RISCV_PMU_RAW_EVENT_MASK GENMASK_ULL(55, 0)
> > +#define RISCV_PMU_RAW_EVENT_IDX 0x20000
> > +
> > +/** General pmu event codes specified in SBI PMU extension */
> > +enum sbi_pmu_hw_generic_events_t {
> > +     SBI_PMU_HW_NO_EVENT                     = 0,
> > +     SBI_PMU_HW_CPU_CYCLES                   = 1,
> > +     SBI_PMU_HW_INSTRUCTIONS                 = 2,
> > +     SBI_PMU_HW_CACHE_REFERENCES             = 3,
> > +     SBI_PMU_HW_CACHE_MISSES                 = 4,
> > +     SBI_PMU_HW_BRANCH_INSTRUCTIONS          = 5,
> > +     SBI_PMU_HW_BRANCH_MISSES                = 6,
> > +     SBI_PMU_HW_BUS_CYCLES                   = 7,
> > +     SBI_PMU_HW_STALLED_CYCLES_FRONTEND      = 8,
> > +     SBI_PMU_HW_STALLED_CYCLES_BACKEND       = 9,
> > +     SBI_PMU_HW_REF_CPU_CYCLES               = 10,
> > +
> > +     SBI_PMU_HW_GENERAL_MAX,
> > +};
> > +
> > +/**
> > + * Special "firmware" events provided by the firmware, even if the
> > hardware
> > + * does not support performance events. These events are encoded as
> > a raw
> > + * event type in Linux kernel perf framework.
> > + */
> > +enum sbi_pmu_fw_generic_events_t {
> > +     SBI_PMU_FW_MISALIGNED_LOAD      = 0,
> > +     SBI_PMU_FW_MISALIGNED_STORE     = 1,
> > +     SBI_PMU_FW_ACCESS_LOAD          = 2,
> > +     SBI_PMU_FW_ACCESS_STORE         = 3,
> > +     SBI_PMU_FW_ILLEGAL_INSN         = 4,
> > +     SBI_PMU_FW_SET_TIMER            = 5,
> > +     SBI_PMU_FW_IPI_SENT             = 6,
> > +     SBI_PMU_FW_IPI_RECVD            = 7,
> > +     SBI_PMU_FW_FENCE_I_SENT         = 8,
> > +     SBI_PMU_FW_FENCE_I_RECVD        = 9,
> > +     SBI_PMU_FW_SFENCE_VMA_SENT      = 10,
> > +     SBI_PMU_FW_SFENCE_VMA_RCVD      = 11,
> > +     SBI_PMU_FW_SFENCE_VMA_ASID_SENT = 12,
> > +     SBI_PMU_FW_SFENCE_VMA_ASID_RCVD = 13,
> > +
> > +     SBI_PMU_FW_HFENCE_GVMA_SENT     = 14,
> > +     SBI_PMU_FW_HFENCE_GVMA_RCVD     = 15,
> > +     SBI_PMU_FW_HFENCE_GVMA_VMID_SENT = 16,
> > +     SBI_PMU_FW_HFENCE_GVMA_VMID_RCVD = 17,
> > +
> > +     SBI_PMU_FW_HFENCE_VVMA_SENT     = 18,
> > +     SBI_PMU_FW_HFENCE_VVMA_RCVD     = 19,
> > +     SBI_PMU_FW_HFENCE_VVMA_ASID_SENT = 20,
> > +     SBI_PMU_FW_HFENCE_VVMA_ASID_RCVD = 21,
> > +     SBI_PMU_FW_MAX,
> > +};
> > +
> > +/* SBI PMU event types */
> > +enum sbi_pmu_event_type {
> > +     SBI_PMU_EVENT_TYPE_HW = 0x0,
> > +     SBI_PMU_EVENT_TYPE_CACHE = 0x1,
> > +     SBI_PMU_EVENT_TYPE_RAW = 0x2,
> > +     SBI_PMU_EVENT_TYPE_FW = 0xf,
> > +};
> > +
> > +/* SBI PMU event types */
> > +enum sbi_pmu_ctr_type {
> > +     SBI_PMU_CTR_TYPE_HW = 0x0,
> > +     SBI_PMU_CTR_TYPE_FW,
> > +};
> > +
> > +/* Flags defined for config matching function */
> > +#define SBI_PMU_CFG_FLAG_SKIP_MATCH  (1 << 0)
> > +#define SBI_PMU_CFG_FLAG_CLEAR_VALUE (1 << 1)
> > +#define SBI_PMU_CFG_FLAG_AUTO_START  (1 << 2)
> > +#define SBI_PMU_CFG_FLAG_SET_MINH    (1 << 3)
> > +#define SBI_PMU_CFG_FLAG_SET_SINH    (1 << 4)
> > +#define SBI_PMU_CFG_FLAG_SET_UINH    (1 << 5)
> > +#define SBI_PMU_CFG_FLAG_SET_VSINH   (1 << 6)
> > +#define SBI_PMU_CFG_FLAG_SET_VUINH   (1 << 7)
>
> It looks like you have a typo here, the defines in OpenSBI are
> different:
>
> #define SBI_PMU_CFG_FLAG_SET_VUINH      (1 << 3)
> #define SBI_PMU_CFG_FLAG_SET_VSINH      (1 << 4)
> #define SBI_PMU_CFG_FLAG_SET_UINH       (1 << 5)
> #define SBI_PMU_CFG_FLAG_SET_SINH       (1 << 6)
> #define SBI_PMU_CFG_FLAG_SET_MINH       (1 << 7)
>
>

Thanks for catching that. OpenSBI has the correct one as per the
specification[1].
I will fix it in the next version.

[1] https://github.com/riscv-software-src/opensbi/blob/master/include/sbi/sbi_ecall_interface.h

> > +
> > +/* Flags defined for counter start function */
> > +#define SBI_PMU_START_FLAG_SET_INIT_VALUE (1 << 0)
> > +
> > +/* Flags defined for counter stop function */
> > +#define SBI_PMU_STOP_FLAG_RESET (1 << 0)
> > +
> >  #define SBI_SPEC_VERSION_DEFAULT     0x1
> >  #define SBI_SPEC_VERSION_MAJOR_SHIFT 24
> >  #define SBI_SPEC_VERSION_MAJOR_MASK  0x7f
> > @@ -82,6 +176,9 @@ enum sbi_hsm_hart_status {
> >  #define SBI_ERR_INVALID_PARAM        -3
> >  #define SBI_ERR_DENIED               -4
> >  #define SBI_ERR_INVALID_ADDRESS      -5
> > +#define SBI_ERR_ALREADY_AVAILABLE -6
> > +#define SBI_ERR_ALREADY_STARTED -7
> > +#define SBI_ERR_ALREADY_STOPPED -8
> >
> >  extern unsigned long sbi_spec_version;
> >  struct sbiret {
>
>
> _______________________________________________
> linux-riscv mailing list
> linux-riscv@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv