
mtd: rawnand: mxc: Move the ECC engine initialization to the right place

Message ID 20201016074942.29650-1-miquel.raynal@bootlin.com
State Superseded
Series mtd: rawnand: mxc: Move the ECC engine initialization to the right place

Commit Message

Miquel Raynal Oct. 16, 2020, 7:49 a.m. UTC
No ECC initialization should happen during the host controller probe.

Indeed, we need the probe to call nand_scan() in order to:
- identify the device, its capabilities and constraints (nand_scan_ident())
- configure the ECC engine accordingly (->attach_chip())
- scan its content and prepare the core (nand_scan_tail())

Moving these lines to mxcnd_attach_chip() fixes a regression caused by
a previous commit supposed to clarify these steps.

Fixes: TODO
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
---

Hi Han,

Could you please give this patch a shot? It is supposed to fix
the LS1043A issue we have seen in robot reports over the last weeks.

I kept the Fixes: tag empty because I need the original patch to be
merged into Linus' tree first, but if it fixes the issue I will merge it
at -rc1 or -rc2.

Thanks,
Miquèl

 drivers/mtd/nand/raw/mxc_nand.c | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)
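The ordering argument in the commit message can be illustrated with a small userspace model. This is a sketch under stated assumptions: `mock_chip`, `mock_nand_scan()`, `mock_dt_init()`, `mock_attach_chip()` and the two probe variants are hypothetical stand-ins, not kernel APIs. It assumes, as the `rawnand_dt_init()` context lines in the debug patch later in this thread suggest, that DT parsing inside nand_scan() copies the user configuration into `chip->ecc`. Under that assumption an ECC step size set in probe is overwritten before ->attach_chip() runs, while the same assignment made in ->attach_chip() survives.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical, heavily simplified model of the raw NAND probe flow.
 * None of these names are kernel APIs; they only mirror the ordering
 * described in the commit message.
 */
struct mock_ecc {
	int engine_type;  /* ECC engine in use */
	int size;         /* ECC step size in bytes */
};

struct mock_user_conf {
	int engine_type;  /* what DT requested, 0 if nothing */
	int step_size;    /* what DT requested, 0 if nothing */
};

struct mock_chip {
	struct mock_ecc ecc;
	struct mock_user_conf user_conf;
	int (*attach_chip)(struct mock_chip *chip);
	int scanned;
};

/* Assumed early DT parsing: the user configuration overwrites chip->ecc. */
static void mock_dt_init(struct mock_chip *chip)
{
	chip->ecc.engine_type = chip->user_conf.engine_type;
	chip->ecc.size = chip->user_conf.step_size;
}

/* Simplified nand_scan(): identify, let the driver set up ECC, finish. */
static int mock_nand_scan(struct mock_chip *chip)
{
	assert(chip != NULL);

	mock_dt_init(chip);                         /* identification phase */
	if (chip->attach_chip) {
		int ret = chip->attach_chip(chip);  /* ->attach_chip() */
		if (ret)
			return ret;
	}
	if (chip->ecc.size == 0)
		return -1;      /* the tail step needs a non-zero step size */
	chip->scanned = 1;      /* stands in for nand_scan_tail() */
	return 0;
}

/* Driver hook, run once the chip is identified (as in the patch). */
static int mock_attach_chip(struct mock_chip *chip)
{
	chip->ecc.size = 512;
	return 0;
}

/* Old pattern: ECC set in probe, then lost during nand_scan(). */
static int probe_sets_ecc_early(struct mock_chip *chip)
{
	chip->ecc.size = 512;
	chip->attach_chip = NULL;  /* no ECC setup in the hook here */
	return mock_nand_scan(chip);
}

/* Patched pattern: probe only wires the hook and calls nand_scan(). */
static int probe_defers_ecc_to_attach(struct mock_chip *chip)
{
	chip->attach_chip = mock_attach_chip;
	return mock_nand_scan(chip);
}
```

With a zero-initialized `struct mock_chip`, `probe_sets_ecc_early()` fails because the step size is back to 0 after the DT step (in the real driver a zero step size reportedly led to a division by zero), whereas `probe_defers_ecc_to_attach()` succeeds with a 512-byte step.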

Comments

Fabio Estevam Oct. 16, 2020, 11:32 a.m. UTC | #1
Hi Miquel,

On Fri, Oct 16, 2020 at 4:49 AM Miquel Raynal <miquel.raynal@bootlin.com> wrote:
>
> No ECC initialization should happen during the host controller probe.
>
> Indeed, we need the probe to call nand_scan() in order to:
> - identify the device, its capabilities and constraints (nand_scan_ident())
> - configure the ECC engine accordingly (->attach_chip())
> - scan its content and prepare the core (nand_scan_tail())
>
> Moving these lines to mxcnd_attach_chip() fixes a regression caused by
> a previous commit supposed to clarify these steps.
>
> Fixes: TODO
> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
> ---
>
> Hi Han,
>
> Could you please give this patch a shot? It is supposed to fix
> the LS1043A issue we have seen in robot reports over the last weeks.

Thanks for the mxc_nand fix!

The LS1043A uses a different NAND controller and its driver is
drivers/mtd/nand/raw/fsl_ifc_nand.c

Thanks
Fabio Estevam Oct. 16, 2020, 11:45 a.m. UTC | #2
Hi Miquel and Han,

On Fri, Oct 16, 2020 at 8:32 AM Fabio Estevam <festevam@gmail.com> wrote:

> The LS1043A uses a different NAND controller and its driver is
> drivers/mtd/nand/raw/fsl_ifc_nand.c

Should we follow the same idea here and move the ECC initialization to
fsl_ifc_attach_chip()?

Does this patch help?
https://pastebin.com/raw/xwHKXFmu
Miquel Raynal Oct. 16, 2020, 12:05 p.m. UTC | #3
Hi Fabio,

Fabio Estevam <festevam@gmail.com> wrote on Fri, 16 Oct 2020 08:45:08
-0300:

> Hi Miquel and Han,
> 
> On Fri, Oct 16, 2020 at 8:32 AM Fabio Estevam <festevam@gmail.com> wrote:
> 
> > The LS1043A uses a different NAND controller and its driver is
> > drivers/mtd/nand/raw/fsl_ifc_nand.c  
> 
> Should we follow the same idea here and move the ECC initialization to
> fsl_ifc_attach_chip()?
> 
> Does this patch help?
> https://pastebin.com/raw/xwHKXFmu

Definitely, yes!

I guess I will have to fix all the drivers doing part of the ECC
initialization in their probe function with the same logic. Hopefully
there are not so many...

Thanks,
Miquèl
Fabio Estevam Oct. 16, 2020, 12:11 p.m. UTC | #4
Hi Miquel,

On Fri, Oct 16, 2020 at 9:05 AM Miquel Raynal <miquel.raynal@bootlin.com> wrote:

> > Does this patch help?
> > https://pastebin.com/raw/xwHKXFmu
>
> Definitely, yes!
>
> I guess I will have to fix all the drivers doing part of the ECC
> initialization in their probe function with the same logic. Hopefully
> there are not so many...

Ok, let me send a formal patch for the ifc driver then.

I will also audit the other drivers.

Thanks,

Fabio Estevam
Sascha Hauer Oct. 16, 2020, 1:53 p.m. UTC | #5
Hi Miquel,

On Fri, Oct 16, 2020 at 09:49:42AM +0200, Miquel Raynal wrote:
> No ECC initialization should happen during the host controller probe.
> 
> Indeed, we need the probe to call nand_scan() in order to:
> - identify the device, its capabilities and constraints (nand_scan_ident())
> - configure the ECC engine accordingly (->attach_chip())
> - scan its content and prepare the core (nand_scan_tail())
> 
> Moving these lines to mxcnd_attach_chip() fixes a regression caused by
> a previous commit supposed to clarify these steps.
> 
> Fixes: TODO
> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
> ---
> 
> Hi Han,
> 
> Could you please give this patch a shot? It is supposed to fix
> the LS1043A issue we have seen in robot reports over the last weeks.

With this there's no longer a division by zero in the kernel, but NAND
now fails with the following. I can also confirm that "mtd: rawnand:
Use the ECC framework user input parsing bits" in linux-next breaks it;
without that commit the driver runs well.

Sascha

nand: device found, Manufacturer ID: 0x20, Chip ID: 0xa1
nand: ST Micro NAND01GR3B2CZA6
nand: 128 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB size: 64
------------[ cut here ]------------
WARNING: CPU: 0 PID: 1 at drivers/mtd/nand/raw/mxc_nand.c:1390 mxc_nand_command+0xc4/0x26c
Unimplemented command (cmd=0)
Modules linked in:
CPU: 0 PID: 1 Comm: swapper Not tainted 5.9.0-rc2-00020-g2e00450a5852 #20
Hardware name: Freescale i.MX27 (Device Tree Support)
[<c00106a4>] (unwind_backtrace) from [<c000dd08>] (show_stack+0x10/0x18)
[<c000dd08>] (show_stack) from [<c037950c>] (dump_stack+0x20/0x2c)
[<c037950c>] (dump_stack) from [<c001c68c>] (__warn+0xb8/0xec)
[<c001c68c>] (__warn) from [<c001ca54>] (warn_slowpath_fmt+0x90/0xbc)
[<c001ca54>] (warn_slowpath_fmt) from [<c0494cb8>] (mxc_nand_command+0xc4/0x26c)
[<c0494cb8>] (mxc_nand_command) from [<c0489f50>] (nand_read_page_op+0x28c/0x300)
[<c0489f50>] (nand_read_page_op) from [<c048a0d4>] (nand_read_page_raw+0x2c/0x6c)
[<c048a0d4>] (nand_read_page_raw) from [<c0486b68>] (nand_read_page_swecc+0x38/0x11c)
[<c0486b68>] (nand_read_page_swecc) from [<c0487f5c>] (nand_read_oob+0x238/0x730)
[<c0487f5c>] (nand_read_oob) from [<c047650c>] (mtd_read_oob+0x84/0x14c)
[<c047650c>] (mtd_read_oob) from [<c048eb7c>] (scan_read+0xd0/0x138)
[<c048eb7c>] (scan_read) from [<c048fe18>] (search_bbt+0x254/0x2cc)
[<c048fe18>] (search_bbt) from [<c0490068>] (nand_create_bbt+0x1d8/0x6ec)
[<c0490068>] (nand_create_bbt) from [<c048d20c>] (nand_scan_with_ids+0x10fc/0x164c)
[<c048d20c>] (nand_scan_with_ids) from [<c0495dd8>] (mxcnd_probe+0x2bc/0x3b4)
[<c0495dd8>] (mxcnd_probe) from [<c041c2b4>] (platform_drv_probe+0x4c/0xa0)
[<c041c2b4>] (platform_drv_probe) from [<c0419e18>] (really_probe+0x1e8/0x3d0)
[<c0419e18>] (really_probe) from [<c041a170>] (driver_probe_device+0x54/0xb0)
[<c041a170>] (driver_probe_device) from [<c041a384>] (device_driver_attach+0x5c/0x64)
[<c041a384>] (device_driver_attach) from [<c041a3e8>] (__driver_attach+0x5c/0xcc)
[<c041a3e8>] (__driver_attach) from [<c0417f30>] (bus_for_each_dev+0x78/0xc4)
[<c0417f30>] (bus_for_each_dev) from [<c0419734>] (driver_attach+0x18/0x24)
[<c0419734>] (driver_attach) from [<c04191b4>] (bus_add_driver+0x178/0x1d8)
[<c04191b4>] (bus_add_driver) from [<c041af74>] (driver_register+0x74/0x114)
[<c041af74>] (driver_register) from [<c041c20c>] (__platform_driver_register+0x30/0x48)
[<c041c20c>] (__platform_driver_register) from [<c090ecb8>] (mxcnd_driver_init+0x10/0x1c)
[<c090ecb8>] (mxcnd_driver_init) from [<c000a1dc>] (do_one_initcall+0x50/0x278)
[<c000a1dc>] (do_one_initcall) from [<c08ebffc>] (kernel_init_freeable+0x13c/0x1c0)
[<c08ebffc>] (kernel_init_freeable) from [<c070e150>] (kernel_init+0x8/0xf8)
[<c070e150>] (kernel_init) from [<c0008560>] (ret_from_fork+0x14/0x34)
Exception stack(0xc783dfb0 to 0xc783dff8)
dfa0:                                     00000000 00000000 00000000 00000000
dfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
dfe0: 00000000 00000000 00000000 00000000 00000013 00000000
---[ end trace 0f9701f8ecf348de ]---
__nand_correct_data: uncorrectable ECC error
__nand_correct_data: uncorrectable ECC error
__nand_correct_data: uncorrectable ECC error
__nand_correct_data: uncorrectable ECC error
__nand_correct_data: uncorrectable ECC error
__nand_correct_data: uncorrectable ECC error
__nand_correct_data: uncorrectable ECC error
__nand_correct_data: uncorrectable ECC error
__nand_correct_data: uncorrectable ECC error
__nand_correct_data: uncorrectable ECC error
__nand_correct_data: uncorrectable ECC error
__nand_correct_data: uncorrectable ECC error
Bad block table not found for chip 0
__nand_correct_data: uncorrectable ECC error
__nand_correct_data: uncorrectable ECC error
__nand_correct_data: uncorrectable ECC error
__nand_correct_data: uncorrectable ECC error
__nand_correct_data: uncorrectable ECC error
__nand_correct_data: uncorrectable ECC error
__nand_correct_data: uncorrectable ECC error
__nand_correct_data: uncorrectable ECC error
__nand_correct_data: uncorrectable ECC error
__nand_correct_data: uncorrectable ECC error
__nand_correct_data: uncorrectable ECC error
__nand_correct_data: uncorrectable ECC error
Miquel Raynal Oct. 16, 2020, 5:01 p.m. UTC | #6
Hi Sascha,

Sascha Hauer <s.hauer@pengutronix.de> wrote on Fri, 16 Oct 2020
15:53:51 +0200:

> Hi Miquel,
> 
> On Fri, Oct 16, 2020 at 09:49:42AM +0200, Miquel Raynal wrote:
> > [...]
> 
> With this there's no longer a division by zero in the kernel, but NAND
> now fails with the following. I can also confirm that "mtd: rawnand:
> Use the ECC framework user input parsing bits" in linux-next breaks it;
> without that commit the driver runs well.
> 
> Sascha
> 
> [...]
> [<c0494cb8>] (mxc_nand_command) from [<c0489f50>] (nand_read_page_op+0x28c/0x300)
> [<c0489f50>] (nand_read_page_op) from [<c048a0d4>] (nand_read_page_raw+0x2c/0x6c)
> [<c048a0d4>] (nand_read_page_raw) from [<c0486b68>] (nand_read_page_swecc+0x38/0x11c)
> [<c0486b68>] (nand_read_page_swecc) from [<c0487f5c>] (nand_read_oob+0x238/0x730)

Software ECC is used; is that expected?

Can you trace rawnand_dt_init() to see what happens in the engine_type
choice?

Thanks,
Miquèl
Fabio Estevam Oct. 16, 2020, 5:37 p.m. UTC | #7
Hi Miquel,

On Fri, Oct 16, 2020 at 2:01 PM Miquel Raynal <miquel.raynal@bootlin.com> wrote:

> Software ECC is used; is that expected?
>
> Can you trace rawnand_dt_init() to see what happens in the engine_type
> choice?

I managed to resurrect an old imx27-pdk board here and reproduced the
same behavior as Sascha reported.

engine_type looks good. ecc.size is still 0.

I am using this debug patch:

--- a/drivers/mtd/nand/raw/nand_base.c
+++ b/drivers/mtd/nand/raw/nand_base.c
@@ -5036,6 +5036,12 @@ static int rawnand_dt_init(struct nand_chip *chip)
        chip->ecc.strength = nand->ecc.user_conf.strength;
        chip->ecc.size = nand->ecc.user_conf.step_size;

+
+       pr_info("********** chip->ecc.engine_type is %d\n",
+               chip->ecc.engine_type);
+       pr_info("********** chip->ecc.strength is %d\n", chip->ecc.strength);
+       pr_info("********** chip->ecc.size is %d\n", chip->ecc.size);
+       pr_info("********** chip->ecc.algo is %d\n", chip->ecc.algo);
+
        return 0;
 }

and I get:

nand: ********** chip->ecc.engine_type is 3
nand: ********** chip->ecc.strength is 0
nand: ********** chip->ecc.size is 0
nand: ********** chip->ecc.algo is 0
nand: device found, Manufacturer ID: 0xec, Chip ID: 0xaa
nand: Samsung NAND 256MiB 1,8V 8-bit
nand: 256 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB size: 64
------------[ cut here ]------------
WARNING: CPU: 0 PID: 1 at drivers/mtd/nand/raw/mxc_nand.c:1391 mxc_nand_command+0x22c/0x280
Unimplemented command (cmd=0)
Modules linked in:

If I take today's linux-next and revert commit d7157ff49a ("mtd: rawnand:
Use the ECC framework user input parsing bits"), then the driver probes
fine and prints:

nand: ******** ecc.engine_type = 3
nand: ******** chip->ecc.strength = 0
nand: ******** chip->ecc.size = 512

Thanks
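The debug output above prints the ECC fields as raw integers. The sketch below decodes them, assuming the member order of `enum nand_ecc_engine_type` and `enum nand_ecc_algo` in include/linux/mtd/nand.h around v5.10. The enums are local copies with renamed members, and the helper names `ecc_engine_type_name()` and `ecc_algo_name()` are hypothetical, so check the ordering against your tree. Under that assumption, engine_type 3 is on-host ECC and algo 0 is "unknown", which matches "engine_type looks good" and leaves the zero `ecc.size` as the odd value out.

```c
#include <assert.h>  /* assert() when checking the mapping */
#include <string.h>  /* strcmp() when comparing names */

/*
 * Local copies of the ECC enums. The member order is assumed to match
 * enum nand_ecc_engine_type and enum nand_ecc_algo in
 * include/linux/mtd/nand.h around v5.10; verify against your tree.
 */
enum ecc_engine_type {
	ECC_ENGINE_TYPE_INVALID,  /* 0 */
	ECC_ENGINE_TYPE_NONE,     /* 1 */
	ECC_ENGINE_TYPE_SOFT,     /* 2 */
	ECC_ENGINE_TYPE_ON_HOST,  /* 3 */
	ECC_ENGINE_TYPE_ON_DIE,   /* 4 */
	ECC_ENGINE_TYPE_COUNT
};

enum ecc_algo {
	ECC_ALGO_UNKNOWN,  /* 0 */
	ECC_ALGO_HAMMING,  /* 1 */
	ECC_ALGO_BCH,      /* 2 */
	ECC_ALGO_RS,       /* 3 */
	ECC_ALGO_COUNT
};

static const char *const engine_names[ECC_ENGINE_TYPE_COUNT] = {
	[ECC_ENGINE_TYPE_INVALID] = "invalid",
	[ECC_ENGINE_TYPE_NONE]    = "none",
	[ECC_ENGINE_TYPE_SOFT]    = "soft",
	[ECC_ENGINE_TYPE_ON_HOST] = "on-host",
	[ECC_ENGINE_TYPE_ON_DIE]  = "on-die",
};

static const char *const algo_names[ECC_ALGO_COUNT] = {
	[ECC_ALGO_UNKNOWN] = "unknown",
	[ECC_ALGO_HAMMING] = "hamming",
	[ECC_ALGO_BCH]     = "bch",
	[ECC_ALGO_RS]      = "rs",
};

/* Map a raw engine_type value printed with %d back to a readable name. */
static const char *ecc_engine_type_name(int v)
{
	if (v < 0 || v >= ECC_ENGINE_TYPE_COUNT)
		return "out-of-range";
	return engine_names[v];
}

/* Map a raw algo value printed with %d back to a readable name. */
static const char *ecc_algo_name(int v)
{
	if (v < 0 || v >= ECC_ALGO_COUNT)
		return "out-of-range";
	return algo_names[v];
}
```

For example, `ecc_engine_type_name(3)` returns "on-host" and `ecc_algo_name(0)` returns "unknown".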
Fabio Estevam Oct. 16, 2020, 7:18 p.m. UTC | #8
Hi Miquel,

On Fri, Oct 16, 2020 at 2:01 PM Miquel Raynal <miquel.raynal@bootlin.com> wrote:

> Software ECC is used; is that expected?

I noticed the issue with your patch: host->pdata.hw_ecc cannot be
accessed within attach() because pdata is not initialized at this point.

Instead of using pdata, we can retrieve the "nand-ecc-mode" string
from the device tree.

I will submit the correct fix.

Thanks
Miquel Raynal Oct. 16, 2020, 9:05 p.m. UTC | #9
Hi Fabio,

Fabio Estevam <festevam@gmail.com> wrote on Fri, 16 Oct 2020 16:18:15
-0300:

> Hi Miquel,
> 
> On Fri, Oct 16, 2020 at 2:01 PM Miquel Raynal <miquel.raynal@bootlin.com> wrote:
> 
> > Software ECC is used; is that expected?
> 
> I noticed the issue with your patch: host->pdata.hw_ecc cannot be
> accessed within attach() because pdata is not initialized at this point.

Nice catch! But I don't get why host->pdata.hw_ecc would not be
accessible from the attach hook. host->pdata is populated in the probe
function, way before nand_scan(), where ->attach() is called. So for me
host->pdata.hw_ecc should be accessible from ->attach().

> Instead of using pdata, we can retrieve the "nand-ecc-mode" string
> from the device tree.

Please don't do that! The DT parsing should be centralized in the core.

However, if you don't need this pdata entry you can get rid of it
entirely. In theory, if the user set the nand-ecc-mode property, then
chip->ecc.engine_type should already be set to the appropriate value
when entering ->attach(). Can you please check its value? It should
have been updated by rawnand_dt_init().

Thanks,
Miquèl
Fabio Estevam Oct. 16, 2020, 9:32 p.m. UTC | #10
Hi Miquel,

On Fri, Oct 16, 2020 at 6:06 PM Miquel Raynal <miquel.raynal@bootlin.com> wrote:

> Nice catch! But I don't get why host->pdata.hw_ecc would not be
> accessible from the attach hook. host->pdata is populated in the probe
> function, way before nand_scan(), where ->attach() is called. So for me
> host->pdata.hw_ecc should be accessible from ->attach().

Yes, now I understand it. pdata is only populated for non-DT platforms.

As of 5.10-rc1 the non-DT i.MX users are gone, so we can get rid of pdata
in a separate patch.

> > Instead of using pdata, we can retrieve the "nand-ecc-mode" string
> > from the device tree.
>
> Please don't do that! The DT parsing should be centralized in the core.
>
> However, if you don't need this pdata entry you can get rid of it
> entirely. In theory, if the user set the nand-ecc-mode property, then
> chip->ecc.engine_type should already be set to the appropriate value
> when entering ->attach(). Can you please check its value? It should
> have been updated by rawnand_dt_init().

You are right. I have sent a v2, which lets the core determine the engine type.

Thanks,

Fabio Estevam
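The failure mode and the direction agreed on in this exchange can be modeled as follows. This is a hypothetical sketch, not Fabio's v2: `mock_host`, `mock_probe()`, `attach_v1()` and `attach_core_driven()` are invented names. It assumes the host structure starts zeroed and that `pdata` is only copied when board-file platform data exists, so on DT platforms `hw_ecc` stays false. `attach_v1()` mirrors the original patch and forces software ECC even when the core already resolved an on-host engine. `attach_core_driven()` keeps the core's choice and only falls back to a driver default when nothing was configured; that fallback rule is an illustrative assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the ECC engine type (order mirrors the kernel
 * enum only for readability). */
enum mock_engine {
	ENGINE_INVALID,   /* nothing configured */
	ENGINE_NONE,
	ENGINE_SOFT,
	ENGINE_ON_HOST,
	ENGINE_ON_DIE,
};

/* Board-file platform data; only hw_ecc matters here. */
struct mock_pdata {
	bool hw_ecc;
};

struct mock_host {
	struct mock_pdata pdata;       /* zeroed unless a board file provides it */
	enum mock_engine engine_type;  /* stands in for chip->ecc.engine_type */
};

/*
 * Probe plus the early part of nand_scan(): the host starts zeroed, pdata
 * is copied only when board data exists, and the core stores the engine
 * type it parsed (ENGINE_INVALID when nothing was configured).
 */
static void mock_probe(struct mock_host *host,
		       const struct mock_pdata *board_pdata,
		       enum mock_engine core_engine)
{
	assert(host != NULL);
	*host = (struct mock_host){ .engine_type = core_engine };
	if (board_pdata)
		host->pdata = *board_pdata;
}

/* First version of the patch: pdata overrides whatever the core chose. */
static void attach_v1(struct mock_host *host)
{
	host->engine_type = host->pdata.hw_ecc ? ENGINE_ON_HOST : ENGINE_SOFT;
}

/* Core-driven variant: keep the core's choice; the fallback for an
 * unconfigured chip is an illustrative assumption. */
static void attach_core_driven(struct mock_host *host)
{
	if (host->engine_type == ENGINE_INVALID)
		host->engine_type = host->pdata.hw_ecc ? ENGINE_ON_HOST : ENGINE_SOFT;
}
```

On a DT board where the core selected an on-host engine, `attach_v1()` ends up with `ENGINE_SOFT`, consistent with the nand_read_page_swecc frames in Sascha's trace, while `attach_core_driven()` keeps `ENGINE_ON_HOST`.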
Miquel Raynal Oct. 17, 2020, 6:17 p.m. UTC | #11
Hi Fabio,

Fabio Estevam <festevam@gmail.com> wrote on Fri, 16 Oct 2020 18:32:33
-0300:

> Hi Miquel,
> 
> On Fri, Oct 16, 2020 at 6:06 PM Miquel Raynal <miquel.raynal@bootlin.com> wrote:
> 
> > Nice catch! But I don't get why host->pdata.hw_ecc would not be
> > accessible from the attach hook. host->pdata is populated in the probe
> > function, way before nand_scan(), where ->attach() is called. So for me
> > host->pdata.hw_ecc should be accessible from ->attach().  
> 
> Yes, now I understand it. pdata is only populated for non-DT platforms.
> 
> As of 5.10-rc1 the non-DT i.MX users are gone, so we can get rid of pdata
> in a separate patch.

Nice!

> 
> > > Instead of using pdata, we can retrieve the "nand-ecc-mode" string
> > > from the device tree.  
> >
> > Please don't do that! The DT parsing should be centralized in the core.
> >
> > However, if you don't need this pdata entry you can get rid of it
> > entirely. In theory, if the user set the nand-ecc-mode property, then
> > chip->ecc.engine_type should already be set to the appropriate value
> > when entering ->attach(). Can you please check its value? It should
> > have been updated by rawnand_dt_init().  
> 
> You are right. I have sent a v2, which lets the core determine the engine type.

The patch looks good to me!

Thanks,
Miquèl

Patch

diff --git a/drivers/mtd/nand/raw/mxc_nand.c b/drivers/mtd/nand/raw/mxc_nand.c
index d4200eb2ad32..6ba96f343a3d 100644
--- a/drivers/mtd/nand/raw/mxc_nand.c
+++ b/drivers/mtd/nand/raw/mxc_nand.c
@@ -1681,6 +1681,18 @@  static int mxcnd_attach_chip(struct nand_chip *chip)
 	struct mxc_nand_host *host = nand_get_controller_data(chip);
 	struct device *dev = mtd->dev.parent;
 
+	chip->ecc.bytes = host->devtype_data->eccbytes;
+	host->eccsize = host->devtype_data->eccsize;
+	chip->ecc.size = 512;
+	mtd_set_ooblayout(mtd, host->devtype_data->ooblayout);
+
+	if (host->pdata.hw_ecc) {
+		chip->ecc.engine_type = NAND_ECC_ENGINE_TYPE_ON_HOST;
+	} else {
+		chip->ecc.engine_type = NAND_ECC_ENGINE_TYPE_SOFT;
+		chip->ecc.algo = NAND_ECC_ALGO_HAMMING;
+	}
+
 	switch (chip->ecc.engine_type) {
 	case NAND_ECC_ENGINE_TYPE_ON_HOST:
 		chip->ecc.read_page = mxc_nand_read_page;
@@ -1836,19 +1848,7 @@  static int mxcnd_probe(struct platform_device *pdev)
 	if (host->devtype_data->axi_offset)
 		host->regs_axi = host->base + host->devtype_data->axi_offset;
 
-	this->ecc.bytes = host->devtype_data->eccbytes;
-	host->eccsize = host->devtype_data->eccsize;
-
 	this->legacy.select_chip = host->devtype_data->select_chip;
-	this->ecc.size = 512;
-	mtd_set_ooblayout(mtd, host->devtype_data->ooblayout);
-
-	if (host->pdata.hw_ecc) {
-		this->ecc.engine_type = NAND_ECC_ENGINE_TYPE_ON_HOST;
-	} else {
-		this->ecc.engine_type = NAND_ECC_ENGINE_TYPE_SOFT;
-		this->ecc.algo = NAND_ECC_ALGO_HAMMING;
-	}
 
 	/* NAND bus width determines access functions used by upper layer */
 	if (host->pdata.width == 2)