Message ID | 20201016074942.29650-1-miquel.raynal@bootlin.com |
---|---|
State | Superseded |
Series | mtd: rawnand: mxc: Move the ECC engine initialization to the right place |
Hi Miquel,

On Fri, Oct 16, 2020 at 4:49 AM Miquel Raynal <miquel.raynal@bootlin.com> wrote:
>
> No ECC initialization should happen during the host controller probe.
>
> Indeed, we need the probe to call nand_scan() in order to:
> - identify the device, its capabilities and constraints (nand_scan_ident())
> - configure the ECC engine accordingly (->attach_chip())
> - scan its content and prepare the core (nand_scan_tail())
>
> Moving these lines to mxcnd_attach_chip() fixes a regression caused by
> a previous commit supposed to clarify these steps.
>
> Fixes: TODO
> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
> ---
>
> Hi Han,
>
> Could you please give this patch a shot? It is supposed to fix
> the LS1043A issue we have seen in robots reports the last weeks.

Thanks for the mxc_nand fix!

The LS1043A uses a different NAND controller and its driver is
drivers/mtd/nand/raw/fsl_ifc_nand.c

Thanks
Hi Miquel and Han,

On Fri, Oct 16, 2020 at 8:32 AM Fabio Estevam <festevam@gmail.com> wrote:

> The LS1043A uses a different NAND controller and its driver is
> drivers/mtd/nand/raw/fsl_ifc_nand.c

Should we follow the same idea here and move the ECC initialization to
fsl_ifc_attach_chip()?

Does this patch help?
https://pastebin.com/raw/xwHKXFmu
Hi Fabio,

Fabio Estevam <festevam@gmail.com> wrote on Fri, 16 Oct 2020 08:45:08 -0300:

> Hi Miquel and Han,
>
> On Fri, Oct 16, 2020 at 8:32 AM Fabio Estevam <festevam@gmail.com> wrote:
>
> > The LS1043A uses a different NAND controller and its driver is
> > drivers/mtd/nand/raw/fsl_ifc_nand.c
>
> Should we follow the same idea here and move the ECC initialization to
> fsl_ifc_attach_chip()?
>
> Does this patch help?
> https://pastebin.com/raw/xwHKXFmu

Definitely, yes!

I guess I will have to fix all the drivers doing part of the ECC
initialization in their probe function with the same logic. Hopefully
there are not so many...

Thanks,
Miquèl
Hi Miquel,

On Fri, Oct 16, 2020 at 9:05 AM Miquel Raynal <miquel.raynal@bootlin.com> wrote:

> > Does this patch help?
> > https://pastebin.com/raw/xwHKXFmu
>
> Definitely, yes!
>
> I guess I will have to fix all the drivers doing part of the ECC
> initialization in their probe function with the same logic. Hopefully
> there are not so many...

Ok, let me send a formal patch for the ifc driver then.

I will also audit the other drivers.

Thanks,

Fabio Estevam
Hi Miquel,

On Fri, Oct 16, 2020 at 09:49:42AM +0200, Miquel Raynal wrote:
> No ECC initialization should happen during the host controller probe.
>
> Indeed, we need the probe to call nand_scan() in order to:
> - identify the device, its capabilities and constraints (nand_scan_ident())
> - configure the ECC engine accordingly (->attach_chip())
> - scan its content and prepare the core (nand_scan_tail())
>
> Moving these lines to mxcnd_attach_chip() fixes a regression caused by
> a previous commit supposed to clarify these steps.
>
> Fixes: TODO
> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
> ---
>
> Hi Han,
>
> Could you please give this patch a shot? It is supposed to fix
> the LS1043A issue we have seen in robots reports the last weeks.

With this there's no longer a division by zero in the kernel, but NAND
now fails with the following. Also I can confirm that "mtd: rawnand:
Use the ECC framework user input parsing bits" in next breaks it, without
this patch the driver runs well.

Sascha

nand: device found, Manufacturer ID: 0x20, Chip ID: 0xa1
nand: ST Micro NAND01GR3B2CZA6
nand: 128 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB size: 64
------------[ cut here ]------------
WARNING: CPU: 0 PID: 1 at drivers/mtd/nand/raw/mxc_nand.c:1390 mxc_nand_command+0xc4/0x26c
Unimplemented command (cmd=0)
Modules linked in:
CPU: 0 PID: 1 Comm: swapper Not tainted 5.9.0-rc2-00020-g2e00450a5852 #20
Hardware name: Freescale i.MX27 (Device Tree Support)
[<c00106a4>] (unwind_backtrace) from [<c000dd08>] (show_stack+0x10/0x18)
[<c000dd08>] (show_stack) from [<c037950c>] (dump_stack+0x20/0x2c)
[<c037950c>] (dump_stack) from [<c001c68c>] (__warn+0xb8/0xec)
[<c001c68c>] (__warn) from [<c001ca54>] (warn_slowpath_fmt+0x90/0xbc)
[<c001ca54>] (warn_slowpath_fmt) from [<c0494cb8>] (mxc_nand_command+0xc4/0x26c)
[<c0494cb8>] (mxc_nand_command) from [<c0489f50>] (nand_read_page_op+0x28c/0x300)
[<c0489f50>] (nand_read_page_op) from [<c048a0d4>] (nand_read_page_raw+0x2c/0x6c)
[<c048a0d4>] (nand_read_page_raw) from [<c0486b68>] (nand_read_page_swecc+0x38/0x11c)
[<c0486b68>] (nand_read_page_swecc) from [<c0487f5c>] (nand_read_oob+0x238/0x730)
[<c0487f5c>] (nand_read_oob) from [<c047650c>] (mtd_read_oob+0x84/0x14c)
[<c047650c>] (mtd_read_oob) from [<c048eb7c>] (scan_read+0xd0/0x138)
[<c048eb7c>] (scan_read) from [<c048fe18>] (search_bbt+0x254/0x2cc)
[<c048fe18>] (search_bbt) from [<c0490068>] (nand_create_bbt+0x1d8/0x6ec)
[<c0490068>] (nand_create_bbt) from [<c048d20c>] (nand_scan_with_ids+0x10fc/0x164c)
[<c048d20c>] (nand_scan_with_ids) from [<c0495dd8>] (mxcnd_probe+0x2bc/0x3b4)
[<c0495dd8>] (mxcnd_probe) from [<c041c2b4>] (platform_drv_probe+0x4c/0xa0)
[<c041c2b4>] (platform_drv_probe) from [<c0419e18>] (really_probe+0x1e8/0x3d0)
[<c0419e18>] (really_probe) from [<c041a170>] (driver_probe_device+0x54/0xb0)
[<c041a170>] (driver_probe_device) from [<c041a384>] (device_driver_attach+0x5c/0x64)
[<c041a384>] (device_driver_attach) from [<c041a3e8>] (__driver_attach+0x5c/0xcc)
[<c041a3e8>] (__driver_attach) from [<c0417f30>] (bus_for_each_dev+0x78/0xc4)
[<c0417f30>] (bus_for_each_dev) from [<c0419734>] (driver_attach+0x18/0x24)
[<c0419734>] (driver_attach) from [<c04191b4>] (bus_add_driver+0x178/0x1d8)
[<c04191b4>] (bus_add_driver) from [<c041af74>] (driver_register+0x74/0x114)
[<c041af74>] (driver_register) from [<c041c20c>] (__platform_driver_register+0x30/0x48)
[<c041c20c>] (__platform_driver_register) from [<c090ecb8>] (mxcnd_driver_init+0x10/0x1c)
[<c090ecb8>] (mxcnd_driver_init) from [<c000a1dc>] (do_one_initcall+0x50/0x278)
[<c000a1dc>] (do_one_initcall) from [<c08ebffc>] (kernel_init_freeable+0x13c/0x1c0)
[<c08ebffc>] (kernel_init_freeable) from [<c070e150>] (kernel_init+0x8/0xf8)
[<c070e150>] (kernel_init) from [<c0008560>] (ret_from_fork+0x14/0x34)
Exception stack(0xc783dfb0 to 0xc783dff8)
dfa0:                                     00000000 00000000 00000000 00000000
dfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
dfe0: 00000000 00000000 00000000 00000000 00000013 00000000
---[ end trace 0f9701f8ecf348de ]---
__nand_correct_data: uncorrectable ECC error
__nand_correct_data: uncorrectable ECC error
__nand_correct_data: uncorrectable ECC error
__nand_correct_data: uncorrectable ECC error
__nand_correct_data: uncorrectable ECC error
__nand_correct_data: uncorrectable ECC error
__nand_correct_data: uncorrectable ECC error
__nand_correct_data: uncorrectable ECC error
__nand_correct_data: uncorrectable ECC error
__nand_correct_data: uncorrectable ECC error
__nand_correct_data: uncorrectable ECC error
__nand_correct_data: uncorrectable ECC error
Bad block table not found for chip 0
__nand_correct_data: uncorrectable ECC error
__nand_correct_data: uncorrectable ECC error
__nand_correct_data: uncorrectable ECC error
__nand_correct_data: uncorrectable ECC error
__nand_correct_data: uncorrectable ECC error
__nand_correct_data: uncorrectable ECC error
__nand_correct_data: uncorrectable ECC error
__nand_correct_data: uncorrectable ECC error
__nand_correct_data: uncorrectable ECC error
__nand_correct_data: uncorrectable ECC error
__nand_correct_data: uncorrectable ECC error
__nand_correct_data: uncorrectable ECC error
Hi Sascha,

Sascha Hauer <s.hauer@pengutronix.de> wrote on Fri, 16 Oct 2020 15:53:51 +0200:

> Hi Miquel,
>
> On Fri, Oct 16, 2020 at 09:49:42AM +0200, Miquel Raynal wrote:
> > No ECC initialization should happen during the host controller probe.
> >
> > Indeed, we need the probe to call nand_scan() in order to:
> > - identify the device, its capabilities and constraints (nand_scan_ident())
> > - configure the ECC engine accordingly (->attach_chip())
> > - scan its content and prepare the core (nand_scan_tail())
> >
> > Moving these lines to mxcnd_attach_chip() fixes a regression caused by
> > a previous commit supposed to clarify these steps.
> >
> > Fixes: TODO
> > Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
> > ---
> >
> > Hi Han,
> >
> > Could you please give this patch a shot? It is supposed to fix
> > the LS1043A issue we have seen in robots reports the last weeks.
>
> With this there's no longer a division by zero in the kernel, but NAND
> now fails with the following. Also I can confirm that "mtd: rawnand:
> Use the ECC framework user input parsing bits" in next breaks it, without
> this patch the driver runs well.
>
> Sascha
>
> nand: device found, Manufacturer ID: 0x20, Chip ID: 0xa1
> nand: ST Micro NAND01GR3B2CZA6
> nand: 128 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB size: 64
> ------------[ cut here ]------------
> WARNING: CPU: 0 PID: 1 at drivers/mtd/nand/raw/mxc_nand.c:1390 mxc_nand_command+0xc4/0x26c
> Unimplemented command (cmd=0)
> Modules linked in:
> CPU: 0 PID: 1 Comm: swapper Not tainted 5.9.0-rc2-00020-g2e00450a5852 #20
> Hardware name: Freescale i.MX27 (Device Tree Support)
> [<c00106a4>] (unwind_backtrace) from [<c000dd08>] (show_stack+0x10/0x18)
> [<c000dd08>] (show_stack) from [<c037950c>] (dump_stack+0x20/0x2c)
> [<c037950c>] (dump_stack) from [<c001c68c>] (__warn+0xb8/0xec)
> [<c001c68c>] (__warn) from [<c001ca54>] (warn_slowpath_fmt+0x90/0xbc)
> [<c001ca54>] (warn_slowpath_fmt) from [<c0494cb8>] (mxc_nand_command+0xc4/0x26c)
> [<c0494cb8>] (mxc_nand_command) from [<c0489f50>] (nand_read_page_op+0x28c/0x300)
> [<c0489f50>] (nand_read_page_op) from [<c048a0d4>] (nand_read_page_raw+0x2c/0x6c)
> [<c048a0d4>] (nand_read_page_raw) from [<c0486b68>] (nand_read_page_swecc+0x38/0x11c)
> [<c0486b68>] (nand_read_page_swecc) from [<c0487f5c>] (nand_read_oob+0x238/0x730)

Software ECC is used, is it expected?

Can you trace rawnand_dt_init() to see what happen in the engine_type
choice?

Thanks,
Miquèl
Hi Miquel,

On Fri, Oct 16, 2020 at 2:01 PM Miquel Raynal <miquel.raynal@bootlin.com> wrote:

> Software ECC is used, is it expected?
>
> Can you trace rawnand_dt_init() to see what happen in the engine_type
> choice?

I managed to resurrect an old imx27-pdk board here and reproduced the
same behavior as Sascha reported.

engine_type looks good. ecc.size is still 0.

I am using this debug patch:

--- a/drivers/mtd/nand/raw/nand_base.c
+++ b/drivers/mtd/nand/raw/nand_base.c
@@ -5036,6 +5036,12 @@ static int rawnand_dt_init(struct nand_chip *chip)
 
 	chip->ecc.strength = nand->ecc.user_conf.strength;
 	chip->ecc.size = nand->ecc.user_conf.step_size;
+
+	pr_info("********** chip->ecc.engine_type is %d\n", chip->ecc.engine_type);
+	pr_info("********** chip->ecc.strength is %d\n", chip->ecc.strength);
+	pr_info("********** chip->ecc.size is %d\n", chip->ecc.size);
+	pr_info("********** chip->ecc.algo is %d\n", chip->ecc.algo);
+
 
 	return 0;
 }

and I get:

nand: ********** chip->ecc.engine_type is 3
nand: ********** chip->ecc.strength is 0
nand: ********** chip->ecc.size is 0
nand: ********** chip->ecc.algo is 0
nand: device found, Manufacturer ID: 0xec, Chip ID: 0xaa
nand: Samsung NAND 256MiB 1,8V 8-bit
nand: 256 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB size: 64
------------[ cut here ]------------
WARNING: CPU: 0 PID: 1 at drivers/mtd/nand/raw/mxc_nand.c:1391 mxc_nand_command+0x22c/0x280
Unimplemented command (cmd=0)
Modules linked in:

If I checkout commit d7157ff49a ("mtd: rawnand: Use the ECC framework
user input parsing bits") in today's linux-next and revert it, then the
driver probes fine and prints:

nand: ******** ecc.engine_type = 3
nand: ******** chip->ecc.strength = 0
nand: ******** chip->ecc.size = 512

Thanks
Hi Miquel,
On Fri, Oct 16, 2020 at 2:01 PM Miquel Raynal <miquel.raynal@bootlin.com> wrote:
> Software ECC is used, is it expected?
I noticed the issue with your patch: host->pdata.hw_ecc cannot be
called within attach() because pdata is not initialized at this point.
Instead of using pdata, we can retrieve the "nand-ecc-mode" string
from the device tree.
I will submit the correct fix.
Thanks
Hi Fabio,

Fabio Estevam <festevam@gmail.com> wrote on Fri, 16 Oct 2020 16:18:15 -0300:

> Hi Miquel,
>
> On Fri, Oct 16, 2020 at 2:01 PM Miquel Raynal <miquel.raynal@bootlin.com> wrote:
>
> > Software ECC is used, is it expected?
>
> I noticed the issue with your patch: host->pdata.hw_ecc cannot be
> called within attach() because pdata is not initialized at this point.

Nice catch! But I don't get why host->pdata.hw_ecc would not be
accessible from the attach hook. host->pdata is populated in the probe
function, way before nand_scan(), where ->attach() is called. So for me
host->pdata.hw_ecc should be accessible from ->attach().

> Instead of using pdata, we can retrieve the "nand-ecc-mode" string
> from the device tree.

Please don't do that! The DT parsing should be centralized in the core.

However, if you don't need this pdata entry you can get rid of it
entirely. In theory, if the user set the nand-ecc-mode property, then
chip->ecc.engine_type should already be set to the appropriate value
when entering ->attach(). Can you please check its value? It should
have been updated by rawnand_dt_init().

Thanks,
Miquèl
Hi Miquel,

On Fri, Oct 16, 2020 at 6:06 PM Miquel Raynal <miquel.raynal@bootlin.com> wrote:

> Nice catch! But I don't get why host->pdata.hw_ecc would not be
> accessible from the attach hook. host->pdata is populated in the probe
> function, way before nand_scan(), where ->attach() is called. So for me
> host->pdata.hw_ecc should be accessible from ->attach().

Yes, now I understand it. pdata is only populated for non-dt platforms.

On 5.10-rc1 the non-dt imx users are gone, so we can get rid of pdata
on a separate patch.

> > Instead of using pdata, we can retrieve the "nand-ecc-mode" string
> > from the device tree.
>
> Please don't do that! The DT parsing should be centralized in the core.
>
> However, if you don't need this pdata entry you can get rid of it
> entirely. In theory, if the user set the nand-ecc-mode property, then
> chip->ecc.engine_type should already be set to the appropriate value
> when entering ->attach(). Can you please check its value? It should
> have been updated by rawnand_dt_init().

You are right. I have sent a v2, which lets the core determine the engine type.

Thanks,

Fabio Estevam
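Editorial sketch, not part of the thread: the exchange above suggests why software ECC ended up selected on a device tree platform. If pdata is left empty, a test on pdata.hw_ecc takes the software branch and overrides whatever the core had parsed. The user-space C below only illustrates that reasoning. All mock_* names are hypothetical stand-ins, not kernel APIs, and the "v2" function reflects the idea stated in the thread (let the core decide), not the actual v2 patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; names only echo the ones used in the thread. */
enum mock_engine_type {
	MOCK_ENGINE_INVALID,
	MOCK_ENGINE_SOFT,
	MOCK_ENGINE_ON_HOST,
};

struct mock_pdata {
	int hw_ecc;	/* stays 0 when no platform data is provided */
};

struct mock_chip {
	enum mock_engine_type engine_type;	/* assumed set by the core before attach */
	int uses_hw_read_page;			/* 1 if the controller read path was chosen */
};

/* Pick the read path from the engine type, as the switch in the attach hook does. */
static int mock_select_read_page(struct mock_chip *chip)
{
	assert(chip != NULL);

	switch (chip->engine_type) {
	case MOCK_ENGINE_ON_HOST:
		chip->uses_hw_read_page = 1;
		return 0;
	case MOCK_ENGINE_SOFT:
		chip->uses_hw_read_page = 0;
		return 0;
	default:
		return -1;	/* no usable engine type */
	}
}

/* First approach: attach overrides the engine type from platform data. */
int mock_attach_v1(struct mock_chip *chip, const struct mock_pdata *pdata)
{
	if (pdata->hw_ecc)
		chip->engine_type = MOCK_ENGINE_ON_HOST;
	else
		chip->engine_type = MOCK_ENGINE_SOFT;

	return mock_select_read_page(chip);
}

/* Idea behind v2: keep the engine type the core already determined. */
int mock_attach_v2(struct mock_chip *chip)
{
	return mock_select_read_page(chip);
}
```

With an all-zero mock_pdata, mock_attach_v1() turns a chip that entered with MOCK_ENGINE_ON_HOST into a software-ECC one, while mock_attach_v2() keeps the value it was given.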
Hi Fabio,

Fabio Estevam <festevam@gmail.com> wrote on Fri, 16 Oct 2020 18:32:33 -0300:

> Hi Miquel,
>
> On Fri, Oct 16, 2020 at 6:06 PM Miquel Raynal <miquel.raynal@bootlin.com> wrote:
>
> > Nice catch! But I don't get why host->pdata.hw_ecc would not be
> > accessible from the attach hook. host->pdata is populated in the probe
> > function, way before nand_scan(), where ->attach() is called. So for me
> > host->pdata.hw_ecc should be accessible from ->attach().
>
> Yes, now I understand it. pdata is only populated for non-dt platforms.
>
> On 5.10-rc1 the non-dt imx users are gone, so we can get rid of pdata
> on a separate patch.

Nice!

> > > Instead of using pdata, we can retrieve the "nand-ecc-mode" string
> > > from the device tree.
> >
> > Please don't do that! The DT parsing should be centralized in the core.
> >
> > However, if you don't need this pdata entry you can get rid of it
> > entirely. In theory, if the user set the nand-ecc-mode property, then
> > chip->ecc.engine_type should already be set to the appropriate value
> > when entering ->attach(). Can you please check its value? It should
> > have been updated by rawnand_dt_init().
>
> You are right. I have sent a v2, which lets the core determine the engine type.

The patch looks good to me!

Thanks,
Miquèl
diff --git a/drivers/mtd/nand/raw/mxc_nand.c b/drivers/mtd/nand/raw/mxc_nand.c
index d4200eb2ad32..6ba96f343a3d 100644
--- a/drivers/mtd/nand/raw/mxc_nand.c
+++ b/drivers/mtd/nand/raw/mxc_nand.c
@@ -1681,6 +1681,18 @@ static int mxcnd_attach_chip(struct nand_chip *chip)
 	struct mxc_nand_host *host = nand_get_controller_data(chip);
 	struct device *dev = mtd->dev.parent;
 
+	chip->ecc.bytes = host->devtype_data->eccbytes;
+	host->eccsize = host->devtype_data->eccsize;
+	chip->ecc.size = 512;
+	mtd_set_ooblayout(mtd, host->devtype_data->ooblayout);
+
+	if (host->pdata.hw_ecc) {
+		chip->ecc.engine_type = NAND_ECC_ENGINE_TYPE_ON_HOST;
+	} else {
+		chip->ecc.engine_type = NAND_ECC_ENGINE_TYPE_SOFT;
+		chip->ecc.algo = NAND_ECC_ALGO_HAMMING;
+	}
+
 	switch (chip->ecc.engine_type) {
 	case NAND_ECC_ENGINE_TYPE_ON_HOST:
 		chip->ecc.read_page = mxc_nand_read_page;
@@ -1836,19 +1848,7 @@ static int mxcnd_probe(struct platform_device *pdev)
 	if (host->devtype_data->axi_offset)
 		host->regs_axi = host->base + host->devtype_data->axi_offset;
 
-	this->ecc.bytes = host->devtype_data->eccbytes;
-	host->eccsize = host->devtype_data->eccsize;
-
 	this->legacy.select_chip = host->devtype_data->select_chip;
-	this->ecc.size = 512;
-	mtd_set_ooblayout(mtd, host->devtype_data->ooblayout);
-
-	if (host->pdata.hw_ecc) {
-		this->ecc.engine_type = NAND_ECC_ENGINE_TYPE_ON_HOST;
-	} else {
-		this->ecc.engine_type = NAND_ECC_ENGINE_TYPE_SOFT;
-		this->ecc.algo = NAND_ECC_ALGO_HAMMING;
-	}
 
 	/* NAND bus width determines access functions used by upper layer */
 	if (host->pdata.width == 2)
No ECC initialization should happen during the host controller probe.

Indeed, we need the probe to call nand_scan() in order to:
- identify the device, its capabilities and constraints (nand_scan_ident())
- configure the ECC engine accordingly (->attach_chip())
- scan its content and prepare the core (nand_scan_tail())

Moving these lines to mxcnd_attach_chip() fixes a regression caused by
a previous commit supposed to clarify these steps.

Fixes: TODO
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
---

Hi Han,

Could you please give this patch a shot? It is supposed to fix
the LS1043A issue we have seen in robots reports the last weeks.

I kept the Fixes: tag empty because I need the original patch to be
merged in Linus' tree first but if it fixes the issue I will merge it
at -rc1 or -rc2.

Thanks,
Miquèl

 drivers/mtd/nand/raw/mxc_nand.c | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)
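Editorial sketch, not part of the patch: the three-step ordering listed in the commit message can be modelled in a few lines of user-space C. Every mock_* name is a hypothetical stand-in for the kernel function it is commented with, the page size and step size are simply the values seen in this thread, and the assumption that the final step derives a step count from page size and ECC step size is an illustration rather than a description of the real core code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct mock_chip {
	int page_size;	/* known only after identification */
	int ecc_size;	/* ECC step size, configured by the driver */
	int ecc_steps;	/* derived by the core in the final step */
	int (*attach_chip)(struct mock_chip *chip);
};

/* Step 1: identify the device (stands in for nand_scan_ident()). */
static int mock_scan_ident(struct mock_chip *chip)
{
	chip->page_size = 2048;	/* page size reported in the logs above */
	return 0;
}

/* Step 2: driver hook (stands in for ->attach_chip()); ECC setup lives here. */
static int mock_attach_chip(struct mock_chip *chip)
{
	if (chip->page_size == 0)
		return -1;	/* called before identification */

	chip->ecc_size = 512;
	return 0;
}

/* Step 3: core finalization (stands in for nand_scan_tail()). */
static int mock_scan_tail(struct mock_chip *chip)
{
	if (chip->ecc_size == 0)
		return -1;	/* refuse instead of dividing by zero */

	chip->ecc_steps = chip->page_size / chip->ecc_size;
	return 0;
}

/* Runs the three steps in the order given in the commit message. */
int mock_nand_scan(struct mock_chip *chip)
{
	int ret;

	assert(chip != NULL);

	ret = mock_scan_ident(chip);
	if (ret)
		return ret;

	if (chip->attach_chip) {
		ret = chip->attach_chip(chip);
		if (ret)
			return ret;
	}

	return mock_scan_tail(chip);
}

/* Probe only registers the hook; it does no ECC initialization itself. */
int mock_probe(struct mock_chip *chip)
{
	chip->attach_chip = mock_attach_chip;
	return mock_nand_scan(chip);
}
```

In this model a zeroed struct mock_chip passed to mock_probe() comes back with ecc_size set by the attach hook and ecc_steps derived afterwards, whereas calling mock_nand_scan() without a hook fails in the last step because no step size was ever configured.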