diff mbox

[v10,3/5] Documentation: Add documentation for the APM X-Gene SoC EDAC DTS binding

Message ID 1431991481-25684-4-git-send-email-lho@apm.com
State Superseded, archived
Headers show

Commit Message

Loc Ho May 18, 2015, 11:24 p.m. UTC
This patch adds documentation for the APM X-Gene SoC EDAC DTS binding.

Signed-off-by: Loc Ho <lho@apm.com>
---
 .../devicetree/bindings/edac/apm-xgene-edac.txt    |   74 ++++++++++++++++++++
 1 files changed, 74 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/edac/apm-xgene-edac.txt

Comments

Arnd Bergmann May 22, 2015, 8:02 a.m. UTC | #1
On Monday 18 May 2015 17:24:39 Loc Ho wrote:
> This patch adds documentation for the APM X-Gene SoC EDAC DTS binding.
> 
> Signed-off-by: Loc Ho <lho@apm.com>

This is starting to look pretty good. One final comment:

> +Example:
> +	csw: csw@7e200000 {
> +		compatible = "syscon";
> +		reg = <0x0 0x7e200000 0x0 0x1000>;
> +	};
> +
> +	mcba: mcba@7e700000 {
> +		compatible = "syscon";
> +		reg = <0x0 0x7e700000 0x0 0x1000>;
> +	};
> +
> +	mcbb: mcbb@7e720000 {
> +		compatible = "syscon";
> +		reg = <0x0 0x7e720000 0x0 0x1000>;
> +	};
> +
> +	efuse: efuse@1054a000 {
> +		compatible = "syscon";
> +		reg = <0x0 0x1054a000 0x0 0x20>;
> +	};

I think it would be helpful to assign a proper compatible string (in
addition o syscon, not replacing it) for each of these, just in case
we ever need to add a proper driver for one of them, or in case a
future chip uses an unmodified edac block with a slightly different
set of fuses or other registers that are referenced here.

	Arnd
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Arnd Bergmann May 22, 2015, 8:23 a.m. UTC | #2
On Monday 18 May 2015 17:24:40 Loc Ho wrote:
> This patch adds support for the APM X-Gene SoC EDAC driver.
> 
> Signed-off-by: Loc Ho <lho@apm.com>
> ---
>  drivers/edac/Kconfig      |    7 +
>  drivers/edac/Makefile     |    1 +
>  drivers/edac/xgene_edac.c | 1313 +++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 1321 insertions(+), 0 deletions(-)
>  create mode 100644 drivers/edac/xgene_edac.c

Looks reasonable overall, good job. Just a few details that are left:

> +
> +static int xgene_edac_pcp_rd(struct xgene_edac *edac, u32 reg, u32 *val);
> +static int xgene_edac_pcp_clrbits(struct xgene_edac *edac, u32 reg,
> +				  u32 bits_mask);
> +static int xgene_edac_pcp_setbits(struct xgene_edac *edac, u32 reg,
> +				  u32 bits_mask);
> +static int xgene_edac_csw_rd(struct xgene_edac *edac, u32 reg, u32 *val);
> +static int xgene_edac_mcba_rd(struct xgene_edac *edac, u32 reg, u32 *val);
> +static int xgene_edac_mcbb_rd(struct xgene_edac *edac, u32 reg, u32 *val);
> +static int xgene_edac_efuse_rd(struct xgene_edac *edac, u32 reg, u32 *val);

It's better to avoid any forward declaration for local functions, and instead
reorder the file to move the definition before the first use.

> +static int edac_mc_idx;
> +static int edac_mc_active_mask;
> +static int edac_mc_registered_mask;
> +static DEFINE_MUTEX(xgene_edac_mc_lock);

It would also be best to avoid global variables, but it seems that at
least the edac_mc_idx is needed to work with the EDAC subsystem.
Maybe Boris has an idea for how to avoid it.

> +#ifdef CONFIG_EDAC_DEBUG
> +static ssize_t xgene_edac_mc_err_inject_write(struct file *file,
> +					      const char __user *data,
> +					      size_t count, loff_t *ppos)
> +{
> +	struct mem_ctl_info *mci = file->private_data;
> +	struct xgene_edac_mc_ctx *ctx = mci->pvt_info;
> +	int i;
> +
> +	for (i = 0; i < MCU_MAX_RANK; i++) {
> +		writel(MCU_ESRR_MULTUCERR_MASK | MCU_ESRR_BACKUCERR_MASK |
> +		       MCU_ESRR_DEMANDUCERR_MASK | MCU_ESRR_CERR_MASK,
> +		       ctx->mcu_csr + MCUESRRA0 + i * MCU_RANK_STRIDE);
> +	}
> +	return count;
> +}
> +
> +static const struct file_operations xgene_edac_mc_debug_inject_fops = {
> +	.open = simple_open,
> +	.write = xgene_edac_mc_err_inject_write,
> +	.llseek = generic_file_llseek,
> +};
> +
> +static void xgene_edac_mc_create_debugfs_node(struct mem_ctl_info *mci)
> +{
> +	if (!mci->debugfs)
> +		return;
> +
> +	debugfs_create_file("inject_ctrl", S_IWUSR, mci->debugfs, mci,
> +			    &xgene_edac_mc_debug_inject_fops);
> +}
> +#else
> +static void xgene_edac_mc_create_debugfs_node(struct mem_ctl_info *mci)
> +{
> +}
> +#endif

Here, it would be better style to avoid the #ifdef. If you just use

static void xgene_edac_mc_create_debugfs_node(struct mem_ctl_info *mci)
{
	if (! IS_ENABLED(CONFIG_EDAC_DEBUG) || !mci->debugfs)
		return;

the compiler will drop all the debug code when that option is not
set, but will still check the code for correctness, and emit
any warnings that one might want to see.


> +static int xgene_edac_csw_rd(struct xgene_edac *edac, u32 reg, u32 *val)
> +{
> +	if (edac == NULL)
> +		return -EINVAL;
> +
> +	return regmap_read(edac->csw_map, reg, val);
> +}
> +
> +static int xgene_edac_mcba_rd(struct xgene_edac *edac, u32 reg, u32 *val)
> +{
> +	if (edac == NULL)
> +		return -EINVAL;
> +
> +	return regmap_read(edac->mcba_map, reg, val);
> +}
> +
> +static int xgene_edac_mcbb_rd(struct xgene_edac *edac, u32 reg, u32 *val)
> +{
> +	if (edac == NULL)
> +		return -EINVAL;
> +
> +	return regmap_read(edac->mcbb_map, reg, val);
> +}
> +
> +static int xgene_edac_efuse_rd(struct xgene_edac *edac, u32 reg, u32 *val)
> +{
> +	if (edac == NULL)
> +		return -EINVAL;
> +
> +	return regmap_read(edac->efuse_map, reg, val);
> +}

I would drop the NULL check, and just refuse to probe the
device if any of the regmaps are unavailable. You list the
properties as 'required' in DT, so if any of the pointers
are NULL here, that is really a bug somewhere (kernel or DT),
not a user error, so -EINVAL is not appropriate.

Also, these all have very few callers, so just open-code the
regmap_read() instead of having wrappers.

> +
> +static int xgene_edac_pcp_setbits(struct xgene_edac *edac, u32 reg,
> +				  u32 bits_mask)
> +{
> +	u32 val;
> +
> +	if (edac == NULL)
> +		return -EINVAL;
> +
> +	spin_lock(&edac->lock);
> +	val = readl(edac->pcp_csr + reg);
> +	val |= bits_mask;
> +	writel(val, edac->pcp_csr + reg);
> +	spin_unlock(&edac->lock);
> +	return 0;
> +}

If you cache the interrupt masks in the xgene_edac structure, you
can avoid the read-modify-write functions for updating the
register in place.

	Arnd
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Borislav Petkov May 22, 2015, 8:46 a.m. UTC | #3
On Fri, May 22, 2015 at 10:23:11AM +0200, Arnd Bergmann wrote:
> > +static int edac_mc_idx;
> > +static int edac_mc_active_mask;
> > +static int edac_mc_registered_mask;
> > +static DEFINE_MUTEX(xgene_edac_mc_lock);
> 
> It would also be best to avoid global variables, but it seems that at
> least the edac_mc_idx is needed to work with the EDAC subsystem.
> Maybe Boris has an idea for how to avoid it.

Right, so AFAICT this is used in

	edac_mc_alloc(edac_mc_idx++

and basically we're supplying the memory controller numbers in the order
we're calling xgene_edac_mc_add(). Perhaps we need to make the memory
controller numbering explicit with DT...
Arnd Bergmann May 22, 2015, 8:55 a.m. UTC | #4
On Friday 22 May 2015 10:46:25 Borislav Petkov wrote:
> On Fri, May 22, 2015 at 10:23:11AM +0200, Arnd Bergmann wrote:
> > > +static int edac_mc_idx;
> > > +static int edac_mc_active_mask;
> > > +static int edac_mc_registered_mask;
> > > +static DEFINE_MUTEX(xgene_edac_mc_lock);
> > 
> > It would also be best to avoid global variables, but it seems that at
> > least the edac_mc_idx is needed to work with the EDAC subsystem.
> > Maybe Boris has an idea for how to avoid it.
> 
> Right, so AFAICT this is used in
> 
>         edac_mc_alloc(edac_mc_idx++
> 
> and basically we're supplying the memory controller numbers in the order
> we're calling xgene_edac_mc_add(). Perhaps we need to make the memory
> controller numbering explicit with DT...

I see now that the number is also used in xgene_edac_mc_is_active()
as an index for the MCBADDRMR register, which seems very fragile,
as it relies on the DT nodes getting seen in the same order by the
driver that the hardware blocks are used in.

Better add an explicit DT property for the index that is defined
to be the same as MCBADDRMR bits, e.g. 

                       edacmc@7e800000 {
                             compatible = "apm,xgene-edac-mc";
                             reg = <0x0 0x7e800000 0x0 0x1000>;
			      memory-controller = <0>;
                       };

or use a reference to the normal memory controller DT node
outside of the EDAC, and put a number in that one.

Is there a strict 1:1 relation between the DIMM slot numbering
on the board and the memory controller number within the EDAC
registers?

	Arnd
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Arnd Bergmann May 22, 2015, 8:59 a.m. UTC | #5
On Monday 18 May 2015 17:24:40 Loc Ho wrote:
> +       tmp_ctx.mcu_id = ((res.start >> 16) & 0xF) / 4;
> 

Just found this line: you are effectively hardcoding the physical
register layout of the chip in the driver here, to get the index
of the memory controller. This is a very bad idea as it prevents
you from reusing this driver for another chip that puts the
same registers in a different place, don't do that.

	Arnd
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Loc Ho May 22, 2015, 6:25 p.m. UTC | #6
Hi,

>> > +static int edac_mc_idx;
>> > +static int edac_mc_active_mask;
>> > +static int edac_mc_registered_mask;
>> > +static DEFINE_MUTEX(xgene_edac_mc_lock);
>>
>> It would also be best to avoid global variables, but it seems that at
>> least the edac_mc_idx is needed to work with the EDAC subsystem.
>> Maybe Boris has an idea for how to avoid it.
>
> Right, so AFAICT this is used in
>
>         edac_mc_alloc(edac_mc_idx++
>
> and basically we're supplying the memory controller numbers in the order
> we're calling xgene_edac_mc_add(). Perhaps we need to make the memory
> controller numbering explicit with DT...
>

Let me move them all into the top level driver given that the MC is
now a subnode.

-Loc
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Loc Ho May 22, 2015, 6:28 p.m. UTC | #7
Hi,

>> > > +static int edac_mc_idx;
>> > > +static int edac_mc_active_mask;
>> > > +static int edac_mc_registered_mask;
>> > > +static DEFINE_MUTEX(xgene_edac_mc_lock);
>> >
>> > It would also be best to avoid global variables, but it seems that at
>> > least the edac_mc_idx is needed to work with the EDAC subsystem.
>> > Maybe Boris has an idea for how to avoid it.
>>
>> Right, so AFAICT this is used in
>>
>>         edac_mc_alloc(edac_mc_idx++
>>
>> and basically we're supplying the memory controller numbers in the order
>> we're calling xgene_edac_mc_add(). Perhaps we need to make the memory
>> controller numbering explicit with DT...
>
> I see now that the number is also used in xgene_edac_mc_is_active()
> as an index for the MCBADDRMR register, which seems very fragile,
> as it relies on the DT nodes getting seen in the same order by the
> driver that the hardware blocks are used in.
>
> Better add an explicit DT property for the index that is defined
> to be the same as MCBADDRMR bits, e.g.
>
>                        edacmc@7e800000 {
>                              compatible = "apm,xgene-edac-mc";
>                              reg = <0x0 0x7e800000 0x0 0x1000>;
>                               memory-controller = <0>;
>                        };
>
> or use a reference to the normal memory controller DT node
> outside of the EDAC, and put a number in that one.
>
> Is there a strict 1:1 relation between the DIMM slot numbering
> on the board and the memory controller number within the EDAC
> registers?

Yes... The status active bit position is strictly tie to an
controller. Like the suggestion with memory-controller property.

-Loc
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/Documentation/devicetree/bindings/edac/apm-xgene-edac.txt b/Documentation/devicetree/bindings/edac/apm-xgene-edac.txt
new file mode 100644
index 0000000..d8f2782
--- /dev/null
+++ b/Documentation/devicetree/bindings/edac/apm-xgene-edac.txt
@@ -0,0 +1,74 @@ 
+* APM X-Gene SoC EDAC node
+
+EDAC node is defined to describe on-chip error detection and correction.
+The follow error types are supported:
+
+  memory controller	- Memory controller
+  PMD (L1/L2)		- Processor module unit (PMD) L1/L2 cache
+
+The following section describes the EDAC DT node binding.
+
+Required properties:
+- compatible		: Shall be "apm,xgene-edac".
+- regmap-csw		: Regmap of the CPU switch fabric (CSW) resource.
+- regmap-mcba		: Regmap of the MCB-A (memory bridge) resource.
+- regmap-mcbb		: Regmap of the MCB-B (memory bridge) resource.
+- regmap-efuse		: Regmap of the PMD efuse resource.
+- reg			: First resource shall be the CPU bus (PCP) resource.
+- interrupts            : Interrupt-specifier for MCU, PMD, L3, or SoC error
+			  IRQ(s).
+
+Required properties for memory controller subnode:
+- compatible		: Shall be "apm,xgene-edac-mc".
+- reg			: First resource shall be the memory controller unit
+                          (MCU) resource.
+
+Required properties for PMD subnode:
+- compatible		: Shall be "apm,xgene-edac-pmd".
+- reg			: First resource shall be the PMD resource.
+
+Example:
+	csw: csw@7e200000 {
+		compatible = "syscon";
+		reg = <0x0 0x7e200000 0x0 0x1000>;
+	};
+
+	mcba: mcba@7e700000 {
+		compatible = "syscon";
+		reg = <0x0 0x7e700000 0x0 0x1000>;
+	};
+
+	mcbb: mcbb@7e720000 {
+		compatible = "syscon";
+		reg = <0x0 0x7e720000 0x0 0x1000>;
+	};
+
+	efuse: efuse@1054a000 {
+		compatible = "syscon";
+		reg = <0x0 0x1054a000 0x0 0x20>;
+	};
+
+	edac@78800000 {
+		compatible = "apm,xgene-edac";
+		#address-cells = <2>;
+		#size-cells = <2>;
+		ranges;
+		regmap-csw = <&csw>;
+		regmap-mcba = <&mcba>;
+		regmap-mcbb = <&mcbb>;
+		regmap-efuse = <&efuse>;
+		reg = <0x0 0x78800000 0x0 0x100>;
+		interrupts = <0x0 0x20 0x4>,
+			     <0x0 0x21 0x4>,
+			     <0x0 0x27 0x4>;
+
+		edacmc@7e800000 {
+			compatible = "apm,xgene-edac-mc";
+			reg = <0x0 0x7e800000 0x0 0x1000>;
+		};
+
+		edacpmd@7c000000 {
+			compatible = "apm,xgene-edac-pmd";
+			reg = <0x0 0x7c000000 0x0 0x200000>;
+		};
+	};