[1/2] dt-bindings: edac: Add DT bindings for Kryo EDAC
diff mbox series

Message ID 0101016ed57a3259-eee09e9e-e99a-40f1-ab1c-63e58a42615c-000000@us-west-2.amazonses.com
State Changes Requested
Headers show
Series
  • Add EDAC support for Kryo CPU core caches
Related show

Checks

Context Check Description
robh/dt-meta-schema success
robh/checkpatch success

Commit Message

Sai Prakash Ranjan Dec. 5, 2019, 9:53 a.m. UTC
This adds DT bindings for Kryo EDAC implemented with RAS
extensions on KRYO{3,4}XX CPU cores for reporting of cache
errors.

Signed-off-by: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org>
---
 .../bindings/edac/qcom-kryo-edac.yaml         | 67 +++++++++++++++++++
 1 file changed, 67 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/edac/qcom-kryo-edac.yaml

Comments

Rob Herring Dec. 18, 2019, 11:37 p.m. UTC | #1
On Thu, Dec 05, 2019 at 09:53:05AM +0000, Sai Prakash Ranjan wrote:
> This adds DT bindings for Kryo EDAC implemented with RAS
> extensions on KRYO{3,4}XX CPU cores for reporting of cache
> errors.
> 
> Signed-off-by: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org>
> ---
>  .../bindings/edac/qcom-kryo-edac.yaml         | 67 +++++++++++++++++++
>  1 file changed, 67 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/edac/qcom-kryo-edac.yaml
> 
> diff --git a/Documentation/devicetree/bindings/edac/qcom-kryo-edac.yaml b/Documentation/devicetree/bindings/edac/qcom-kryo-edac.yaml
> new file mode 100644
> index 000000000000..1a39429a73b4
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/edac/qcom-kryo-edac.yaml
> @@ -0,0 +1,67 @@
> +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
> +%YAML 1.2
> +---
> +$id: http://devicetree.org/schemas/edac/qcom-kryo-edac.yaml#
> +$schema: http://devicetree.org/meta-schemas/core.yaml#
> +
> +title: Kryo Error Detection and Correction(EDAC)
> +
> +maintainers:
> +  - Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org>
> +
> +description: |
> +  Kryo EDAC is defined to describe on-chip error detection and correction
> +  for the Kryo CPU cores which implement RAS extensions. It will report
> +  all Single Bit Errors and Double Bit Errors found in L1/L2 caches in
> +  in two registers ERXSTATUS_EL1 and ERXMISC0_EL1. L3-SCU cache errors
> +  are reported in ERR1STATUS and ERR1MISC0 registers.
> +    ERXSTATUS_EL1 - Selected Error Record Primary Status Register, EL1
> +    ERXMISC0_EL1 - Selected Error Record Miscellaneous Register 0, EL1
> +    ERR1STATUS - Error Record Primary Status Register
> +    ERR1MISC0 - Error Record Miscellaneous Register 0
> +  Current implementation of Kryo ECC(Error Correcting Code) mechanism is
> +  based on interrupts.
> +
> +properties:
> +  compatible:
> +    enum:
> +      - qcom,kryo-edac
> +
> +  interrupts:
> +    minItems: 1
> +    maxItems: 4
> +    items:
> +      - description: l1-l2 cache faultirq interrupt
> +      - description: l1-l2 cache errirq interrupt
> +      - description: l3-scu cache errirq interrupt
> +      - description: l3-scu cache faultirq interrupt
> +
> +  interrupt-names:
> +    minItems: 1
> +    maxItems: 4

You are saying only these combinations are valid:

l1-l2-faultirq

l1-l2-faultirq
l1-l2-errirq

l1-l2-faultirq
l1-l2-errirq
l3-scu-errirq

l1-l2-faultirq
l1-l2-errirq
l3-scu-errirq
l3-scu-faultirq

Is that your intent?

> +    items:
> +      - const: l1-l2-faultirq
> +      - const: l1-l2-errirq
> +      - const: l3-scu-errirq
> +      - const: l3-scu-faultirq
> +
> +required:
> +  - compatible
> +  - interrupts
> +  - interrupt-names
> +
> +examples:
> +  - |
> +    #include <dt-bindings/interrupt-controller/arm-gic.h>
> +
> +    kryo_edac {
> +      compatible = "qcom,kryo-edac";
> +      interrupts = <GIC_PPI 6 IRQ_TYPE_LEVEL_HIGH>,
> +                   <GIC_PPI 7 IRQ_TYPE_LEVEL_HIGH>,
> +                   <GIC_SPI 34 IRQ_TYPE_LEVEL_HIGH>,
> +                   <GIC_SPI 35 IRQ_TYPE_LEVEL_HIGH>;
> +      interrupt-names = "l1-l2-faultirq",
> +                        "l1-l2-errirq",
> +                        "l3-scu-errirq",
> +                        "l3-scu-faultirq";
> +    };
> -- 
> QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
> of Code Aurora Forum, hosted by The Linux Foundation
>
Sai Prakash Ranjan Dec. 19, 2019, 6:50 a.m. UTC | #2
Hi Rob,

On 2019-12-19 05:07, Rob Herring wrote:
> On Thu, Dec 05, 2019 at 09:53:05AM +0000, Sai Prakash Ranjan wrote:
>> This adds DT bindings for Kryo EDAC implemented with RAS
>> extensions on KRYO{3,4}XX CPU cores for reporting of cache
>> errors.
>> 
>> Signed-off-by: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org>
>> ---
>>  .../bindings/edac/qcom-kryo-edac.yaml         | 67 
>> +++++++++++++++++++
>>  1 file changed, 67 insertions(+)
>>  create mode 100644 
>> Documentation/devicetree/bindings/edac/qcom-kryo-edac.yaml
>> 
>> diff --git 
>> a/Documentation/devicetree/bindings/edac/qcom-kryo-edac.yaml 
>> b/Documentation/devicetree/bindings/edac/qcom-kryo-edac.yaml
>> new file mode 100644
>> index 000000000000..1a39429a73b4
>> --- /dev/null
>> +++ b/Documentation/devicetree/bindings/edac/qcom-kryo-edac.yaml
>> @@ -0,0 +1,67 @@
>> +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
>> +%YAML 1.2
>> +---
>> +$id: http://devicetree.org/schemas/edac/qcom-kryo-edac.yaml#
>> +$schema: http://devicetree.org/meta-schemas/core.yaml#
>> +
>> +title: Kryo Error Detection and Correction(EDAC)
>> +
>> +maintainers:
>> +  - Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org>
>> +
>> +description: |
>> +  Kryo EDAC is defined to describe on-chip error detection and 
>> correction
>> +  for the Kryo CPU cores which implement RAS extensions. It will 
>> report
>> +  all Single Bit Errors and Double Bit Errors found in L1/L2 caches 
>> in
>> +  in two registers ERXSTATUS_EL1 and ERXMISC0_EL1. L3-SCU cache 
>> errors
>> +  are reported in ERR1STATUS and ERR1MISC0 registers.
>> +    ERXSTATUS_EL1 - Selected Error Record Primary Status Register, 
>> EL1
>> +    ERXMISC0_EL1 - Selected Error Record Miscellaneous Register 0, 
>> EL1
>> +    ERR1STATUS - Error Record Primary Status Register
>> +    ERR1MISC0 - Error Record Miscellaneous Register 0
>> +  Current implementation of Kryo ECC(Error Correcting Code) mechanism 
>> is
>> +  based on interrupts.
>> +
>> +properties:
>> +  compatible:
>> +    enum:
>> +      - qcom,kryo-edac
>> +
>> +  interrupts:
>> +    minItems: 1
>> +    maxItems: 4
>> +    items:
>> +      - description: l1-l2 cache faultirq interrupt
>> +      - description: l1-l2 cache errirq interrupt
>> +      - description: l3-scu cache errirq interrupt
>> +      - description: l3-scu cache faultirq interrupt
>> +
>> +  interrupt-names:
>> +    minItems: 1
>> +    maxItems: 4
> 
> You are saying only these combinations are valid:
> 
> l1-l2-faultirq
> 
> l1-l2-faultirq
> l1-l2-errirq
> 
> l1-l2-faultirq
> l1-l2-errirq
> l3-scu-errirq
> 
> l1-l2-faultirq
> l1-l2-errirq
> l3-scu-errirq
> l3-scu-faultirq
> 
> Is that your intent?
> 

No, I want any combination of interrupts to be valid with atleast one 
interrupt as mandatory.
I thought specifying minItems as 1 and maxItems as 4 will take care of 
this,  am I doing something wrong?

Thanks,
Sai
Rob Herring Dec. 19, 2019, 1:58 p.m. UTC | #3
On Thu, Dec 19, 2019 at 12:50 AM Sai Prakash Ranjan
<saiprakash.ranjan@codeaurora.org> wrote:
>
> Hi Rob,
>
> On 2019-12-19 05:07, Rob Herring wrote:
> > On Thu, Dec 05, 2019 at 09:53:05AM +0000, Sai Prakash Ranjan wrote:
> >> This adds DT bindings for Kryo EDAC implemented with RAS
> >> extensions on KRYO{3,4}XX CPU cores for reporting of cache
> >> errors.
> >>
> >> Signed-off-by: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org>
> >> ---
> >>  .../bindings/edac/qcom-kryo-edac.yaml         | 67
> >> +++++++++++++++++++
> >>  1 file changed, 67 insertions(+)
> >>  create mode 100644
> >> Documentation/devicetree/bindings/edac/qcom-kryo-edac.yaml
> >>
> >> diff --git
> >> a/Documentation/devicetree/bindings/edac/qcom-kryo-edac.yaml
> >> b/Documentation/devicetree/bindings/edac/qcom-kryo-edac.yaml
> >> new file mode 100644
> >> index 000000000000..1a39429a73b4
> >> --- /dev/null
> >> +++ b/Documentation/devicetree/bindings/edac/qcom-kryo-edac.yaml
> >> @@ -0,0 +1,67 @@
> >> +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
> >> +%YAML 1.2
> >> +---
> >> +$id: http://devicetree.org/schemas/edac/qcom-kryo-edac.yaml#
> >> +$schema: http://devicetree.org/meta-schemas/core.yaml#
> >> +
> >> +title: Kryo Error Detection and Correction(EDAC)
> >> +
> >> +maintainers:
> >> +  - Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org>
> >> +
> >> +description: |
> >> +  Kryo EDAC is defined to describe on-chip error detection and
> >> correction
> >> +  for the Kryo CPU cores which implement RAS extensions. It will
> >> report
> >> +  all Single Bit Errors and Double Bit Errors found in L1/L2 caches
> >> in
> >> +  in two registers ERXSTATUS_EL1 and ERXMISC0_EL1. L3-SCU cache
> >> errors
> >> +  are reported in ERR1STATUS and ERR1MISC0 registers.
> >> +    ERXSTATUS_EL1 - Selected Error Record Primary Status Register,
> >> EL1
> >> +    ERXMISC0_EL1 - Selected Error Record Miscellaneous Register 0,
> >> EL1
> >> +    ERR1STATUS - Error Record Primary Status Register
> >> +    ERR1MISC0 - Error Record Miscellaneous Register 0
> >> +  Current implementation of Kryo ECC(Error Correcting Code) mechanism
> >> is
> >> +  based on interrupts.
> >> +
> >> +properties:
> >> +  compatible:
> >> +    enum:
> >> +      - qcom,kryo-edac
> >> +
> >> +  interrupts:
> >> +    minItems: 1
> >> +    maxItems: 4
> >> +    items:
> >> +      - description: l1-l2 cache faultirq interrupt
> >> +      - description: l1-l2 cache errirq interrupt
> >> +      - description: l3-scu cache errirq interrupt
> >> +      - description: l3-scu cache faultirq interrupt
> >> +
> >> +  interrupt-names:
> >> +    minItems: 1
> >> +    maxItems: 4
> >
> > You are saying only these combinations are valid:
> >
> > l1-l2-faultirq
> >
> > l1-l2-faultirq
> > l1-l2-errirq
> >
> > l1-l2-faultirq
> > l1-l2-errirq
> > l3-scu-errirq
> >
> > l1-l2-faultirq
> > l1-l2-errirq
> > l3-scu-errirq
> > l3-scu-faultirq
> >
> > Is that your intent?
> >
>
> No, I want any combination of interrupts to be valid with atleast one
> interrupt as mandatory.
> I thought specifying minItems as 1 and maxItems as 4 will take care of
> this,  am I doing something wrong?

Interrupts (really all properties) have a defined order in DT and an
'items' list defines both the order and index. You'll need to use
oneOf and list out the possibilities. Stick to ones you actually need.

Rob
Sai Prakash Ranjan Dec. 19, 2019, 2:48 p.m. UTC | #4
On 2019-12-19 19:28, Rob Herring wrote:
>> > Is that your intent?
>> >
>> 
>> No, I want any combination of interrupts to be valid with atleast one
>> interrupt as mandatory.
>> I thought specifying minItems as 1 and maxItems as 4 will take care of
>> this,  am I doing something wrong?
> 
> Interrupts (really all properties) have a defined order in DT and an
> 'items' list defines both the order and index. You'll need to use
> oneOf and list out the possibilities. Stick to ones you actually need.
> 

Thanks, I will make the change in the next spin.

-Sai
James Morse Jan. 15, 2020, 6:48 p.m. UTC | #5
Hi Sai,

(CC: +Tyler)

On 05/12/2019 09:53, Sai Prakash Ranjan wrote:
> This adds DT bindings for Kryo EDAC implemented with RAS
> extensions on KRYO{3,4}XX CPU cores for reporting of cache
> errors.

KRYO{3,4}XX isn't the only SoC with the RAS extensions. The DT needs to convey the range
of ways this armv8 RAS extensions stuff can be wired up.

The folk who look after the ACPI specs have made a start:
https://static.docs.arm.com/den0085/a/DEN0085_RAS_ACPI_1.0_BETA_1.pdf

(I suspect that isn't the latest version, I'll try and find out)

I'd like the ACPI table and DT to convey the same information so that we don't need to
convert or infer things in the driver. If something is missing, we should get it added!


> diff --git a/Documentation/devicetree/bindings/edac/qcom-kryo-edac.yaml b/Documentation/devicetree/bindings/edac/qcom-kryo-edac.yaml
> new file mode 100644
> index 000000000000..1a39429a73b4
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/edac/qcom-kryo-edac.yaml
> @@ -0,0 +1,67 @@
> +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
> +%YAML 1.2
> +---
> +$id: http://devicetree.org/schemas/edac/qcom-kryo-edac.yaml#
> +$schema: http://devicetree.org/meta-schemas/core.yaml#
> +
> +title: Kryo Error Detection and Correction(EDAC)
> +
> +maintainers:
> +  - Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org>
> +
> +description: |
> +  Kryo EDAC is defined to describe on-chip error detection and correction
> +  for the Kryo CPU cores which implement RAS extensions.

Please don't make this Kryo specific, otherwise this binding becomes an extra thing we
need to support with a 'v8.2 RAS' driver.

What I'd like is a single 'armv82_ras' edac driver that handles faults and errors reported
by interrupts, and interacts with the arch code's handling of 'external aborts'. This
should work for all platforms using v8.2 RAS and later.


> +  It will report
> +  all Single Bit Errors and Double Bit Errors found in L1/L2 caches in
> +  in two registers ERXSTATUS_EL1 and ERXMISC0_EL1. L3-SCU cache errors
> +  are reported in ERR1STATUS and ERR1MISC0 registers.
> +    ERXSTATUS_EL1 - Selected Error Record Primary Status Register, EL1
> +    ERXMISC0_EL1 - Selected Error Record Miscellaneous Register 0, EL1
> +    ERR1STATUS - Error Record Primary Status Register
> +    ERR1MISC0 - Error Record Miscellaneous Register 0
> +  Current implementation of Kryo ECC(Error Correcting Code) mechanism is
> +  based on interrupts.

Your SoC picked the system registers as the interface to these component's registers.
The binding would need to specify which index the 'l1-l2' records start at, and how many
there are. The same for the 'l3-scu'. You can't hard code these, they are different on
other platforms.

There is also an MMIO interface which needs a base address, along with the index and
ranges. (which may be different). The same component may use both the system register and
the MMIO interface.

This stuff is likely to vary on big/little systems, so you need a way of describing which
CPUs the settings refer to. This probably isn't something the ACPI tables capture as ACPI
machines are typically homogenous.


Thanks,

James
Sai Prakash Ranjan Jan. 24, 2020, 2:21 p.m. UTC | #6
Hi James,

On 2020-01-16 00:18, James Morse wrote:
> Hi Sai,
> 
> (CC: +Tyler)
> 
> On 05/12/2019 09:53, Sai Prakash Ranjan wrote:
>> This adds DT bindings for Kryo EDAC implemented with RAS
>> extensions on KRYO{3,4}XX CPU cores for reporting of cache
>> errors.
> 
> KRYO{3,4}XX isn't the only SoC with the RAS extensions. The DT needs
> to convey the range
> of ways this armv8 RAS extensions stuff can be wired up.
> 

Right, but I was going for Kryo specific implementation and hence the 
binding as such.

> The folk who look after the ACPI specs have made a start:
> https://static.docs.arm.com/den0085/a/DEN0085_RAS_ACPI_1.0_BETA_1.pdf
> 
> (I suspect that isn't the latest version, I'll try and find out)
> 

That would be helpful, thanks.

> I'd like the ACPI table and DT to convey the same information so that
> we don't need to
> convert or infer things in the driver. If something is missing, we
> should get it added!
> 

Sure, I think it is decided now that kernel first RAS implementation 
will be generic.

> 
>> diff --git 
>> a/Documentation/devicetree/bindings/edac/qcom-kryo-edac.yaml 
>> b/Documentation/devicetree/bindings/edac/qcom-kryo-edac.yaml
>> new file mode 100644
>> index 000000000000..1a39429a73b4
>> --- /dev/null
>> +++ b/Documentation/devicetree/bindings/edac/qcom-kryo-edac.yaml
>> @@ -0,0 +1,67 @@
>> +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
>> +%YAML 1.2
>> +---
>> +$id: http://devicetree.org/schemas/edac/qcom-kryo-edac.yaml#
>> +$schema: http://devicetree.org/meta-schemas/core.yaml#
>> +
>> +title: Kryo Error Detection and Correction(EDAC)
>> +
>> +maintainers:
>> +  - Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org>
>> +
>> +description: |
>> +  Kryo EDAC is defined to describe on-chip error detection and 
>> correction
>> +  for the Kryo CPU cores which implement RAS extensions.
> 
> Please don't make this Kryo specific, otherwise this binding becomes
> an extra thing we
> need to support with a 'v8.2 RAS' driver.
> 
> What I'd like is a single 'armv82_ras' edac driver that handles faults
> and errors reported
> by interrupts, and interacts with the arch code's handling of
> 'external aborts'. This
> should work for all platforms using v8.2 RAS and later.
> 
> 

Ok sure.

>> +  It will report
>> +  all Single Bit Errors and Double Bit Errors found in L1/L2 caches 
>> in
>> +  in two registers ERXSTATUS_EL1 and ERXMISC0_EL1. L3-SCU cache 
>> errors
>> +  are reported in ERR1STATUS and ERR1MISC0 registers.
>> +    ERXSTATUS_EL1 - Selected Error Record Primary Status Register, 
>> EL1
>> +    ERXMISC0_EL1 - Selected Error Record Miscellaneous Register 0, 
>> EL1
>> +    ERR1STATUS - Error Record Primary Status Register
>> +    ERR1MISC0 - Error Record Miscellaneous Register 0
>> +  Current implementation of Kryo ECC(Error Correcting Code) mechanism 
>> is
>> +  based on interrupts.
> 
> Your SoC picked the system registers as the interface to these
> component's registers.
> The binding would need to specify which index the 'l1-l2' records
> start at, and how many
> there are. The same for the 'l3-scu'. You can't hard code these, they
> are different on
> other platforms.
> 

Ok will keep this in mind for the next version.

> There is also an MMIO interface which needs a base address, along with
> the index and
> ranges. (which may be different). The same component may use both the
> system register and
> the MMIO interface.
> 

I have some doubts here, Where do I get this info? Will this be 
implementation specific?

> This stuff is likely to vary on big/little systems, so you need a way
> of describing which
> CPUs the settings refer to. This probably isn't something the ACPI
> tables capture as ACPI
> machines are typically homogenous.
> 

Our SoCs are based on big.LITTLE arch, so this will be needed.

Thanks,
Sai

Patch
diff mbox series

diff --git a/Documentation/devicetree/bindings/edac/qcom-kryo-edac.yaml b/Documentation/devicetree/bindings/edac/qcom-kryo-edac.yaml
new file mode 100644
index 000000000000..1a39429a73b4
--- /dev/null
+++ b/Documentation/devicetree/bindings/edac/qcom-kryo-edac.yaml
@@ -0,0 +1,67 @@ 
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/edac/qcom-kryo-edac.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Kryo Error Detection and Correction(EDAC)
+
+maintainers:
+  - Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org>
+
+description: |
+  Kryo EDAC is defined to describe on-chip error detection and correction
+  for the Kryo CPU cores which implement RAS extensions. It will report
+  all Single Bit Errors and Double Bit Errors found in L1/L2 caches in
+  in two registers ERXSTATUS_EL1 and ERXMISC0_EL1. L3-SCU cache errors
+  are reported in ERR1STATUS and ERR1MISC0 registers.
+    ERXSTATUS_EL1 - Selected Error Record Primary Status Register, EL1
+    ERXMISC0_EL1 - Selected Error Record Miscellaneous Register 0, EL1
+    ERR1STATUS - Error Record Primary Status Register
+    ERR1MISC0 - Error Record Miscellaneous Register 0
+  Current implementation of Kryo ECC(Error Correcting Code) mechanism is
+  based on interrupts.
+
+properties:
+  compatible:
+    enum:
+      - qcom,kryo-edac
+
+  interrupts:
+    minItems: 1
+    maxItems: 4
+    items:
+      - description: l1-l2 cache faultirq interrupt
+      - description: l1-l2 cache errirq interrupt
+      - description: l3-scu cache errirq interrupt
+      - description: l3-scu cache faultirq interrupt
+
+  interrupt-names:
+    minItems: 1
+    maxItems: 4
+    items:
+      - const: l1-l2-faultirq
+      - const: l1-l2-errirq
+      - const: l3-scu-errirq
+      - const: l3-scu-faultirq
+
+required:
+  - compatible
+  - interrupts
+  - interrupt-names
+
+examples:
+  - |
+    #include <dt-bindings/interrupt-controller/arm-gic.h>
+
+    kryo_edac {
+      compatible = "qcom,kryo-edac";
+      interrupts = <GIC_PPI 6 IRQ_TYPE_LEVEL_HIGH>,
+                   <GIC_PPI 7 IRQ_TYPE_LEVEL_HIGH>,
+                   <GIC_SPI 34 IRQ_TYPE_LEVEL_HIGH>,
+                   <GIC_SPI 35 IRQ_TYPE_LEVEL_HIGH>;
+      interrupt-names = "l1-l2-faultirq",
+                        "l1-l2-errirq",
+                        "l3-scu-errirq",
+                        "l3-scu-faultirq";
+    };