Message ID | 20190129153700.18717-1-martink@posteo.de |
---|---|
State | Superseded |
Delegated to: | Boris Brezillon |
Series | mtd: rawnand: gpmi: fix MX28 bus master lockup problem |
Hi Martin,

Martin Kepplinger <martink@posteo.de> wrote on Tue, 29 Jan 2019 16:37:00 +0100:

> From: Martin Kepplinger <martin.kepplinger@ginzinger.com>
>
> Disable BCH soft reset according to MX23 erratum #2847 ("BCH soft
> reset may cause bus master lock up") for MX28 too. It has the same
> problem.
>
> Observed problem: once per 100,000+ MX28 reboots NAND read failed on
> DMA timeout errors:
> [ 1.770823] UBI: attaching mtd3 to ubi0
> [ 2.768088] gpmi_nand: DMA timeout, last DMA :1
> [ 3.958087] gpmi_nand: BCH timeout, last DMA :1
> [ 4.156033] gpmi_nand: Error in ECC-based read: -110
> [ 4.161136] UBI warning: ubi_io_read: error -110 while reading 64
> bytes from PEB 0:0, read only 0 bytes, retry
> [ 4.171283] step 1 error
> [ 4.173846] gpmi_nand: Chip: 0, Error -1
>
> Without BCH soft reset we successfully executed 1,000,000 MX28 reboots.
>
> I have a quote from NXP regarding this problem, from July 18th 2016:
>
> "As the i.MX23 and i.MX28 are of the same generation, they share many
> characteristics. Unfortunately, also the erratas may be shared.
> In case of the documented erratas and the workarounds, you can also
> apply the workaround solution of one device on the other one. This have
> been reported, but I’m afraid that there are not an estimated date for
> updating the Errata documents.
> Please accept our apologies for any inconveniences this may cause."
>
> Signed-off-by: Manfred Schlaegl <manfred.schlaegl@ginzinger.com>
> Signed-off-by: Martin Kepplinger <martin.kepplinger@ginzinger.com>
> ---
>
> This is something we have in our tree for years and is running on
> production systems and I hope that this helps others too.
>
> thanks
> martin

Thanks for sharing this.

On my side the patch looks ok, maybe an Acked-by from Han would be great?

Reviewed-by: Miquel Raynal <miquel.raynal@bootlin.com>

Boris, do you want to take this? If you don't have more fixes to send I
can queue it to nand/next.

Thanks,
Miquèl

On Tue, 5 Feb 2019 13:59:46 +0100 Miquel Raynal <miquel.raynal@bootlin.com> wrote:

> Hi Martin,
>
> Martin Kepplinger <martink@posteo.de> wrote on Tue, 29 Jan 2019
> 16:37:00 +0100:
>
> > From: Martin Kepplinger <martin.kepplinger@ginzinger.com>
> >
> > Disable BCH soft reset according to MX23 erratum #2847 ("BCH soft
> > reset may cause bus master lock up") for MX28 too. It has the same
> > problem.
> >
> > Observed problem: once per 100,000+ MX28 reboots NAND read failed on
> > DMA timeout errors:
> > [ 1.770823] UBI: attaching mtd3 to ubi0
> > [ 2.768088] gpmi_nand: DMA timeout, last DMA :1
> > [ 3.958087] gpmi_nand: BCH timeout, last DMA :1
> > [ 4.156033] gpmi_nand: Error in ECC-based read: -110
> > [ 4.161136] UBI warning: ubi_io_read: error -110 while reading 64
> > bytes from PEB 0:0, read only 0 bytes, retry
> > [ 4.171283] step 1 error
> > [ 4.173846] gpmi_nand: Chip: 0, Error -1
> >
> > Without BCH soft reset we successfully executed 1,000,000 MX28
> > reboots.
> >
> > I have a quote from NXP regarding this problem, from July 18th 2016:
> >
> > "As the i.MX23 and i.MX28 are of the same generation, they share
> > many characteristics. Unfortunately, also the erratas may be shared.
> > In case of the documented erratas and the workarounds, you can also
> > apply the workaround solution of one device on the other one. This
> > have been reported, but I’m afraid that there are not an estimated
> > date for updating the Errata documents.
> > Please accept our apologies for any inconveniences this may cause."
> >
> > Signed-off-by: Manfred Schlaegl <manfred.schlaegl@ginzinger.com>
> > Signed-off-by: Martin Kepplinger <martin.kepplinger@ginzinger.com>
> > ---
> >
> > This is something we have in our tree for years and is running on
> > production systems and I hope that this helps others too.
> >
> > thanks
> > martin
>
> Thanks for sharing this.
>
> On my side the patch looks ok, maybe an Acked-by from Han would be
> great?
>
> Reviewed-by: Miquel Raynal <miquel.raynal@bootlin.com>
>
> Boris, do you want to take this? If you don't have more fixes to send
> I can queue it to nand/next.

I'll take it.

On Tue, Feb 5, 2019 at 11:00 AM Miquel Raynal <miquel.raynal@bootlin.com> wrote:

>
> Hi Martin,
>
> Martin Kepplinger <martink@posteo.de> wrote on Tue, 29 Jan 2019
> 16:37:00 +0100:
>
> > From: Martin Kepplinger <martin.kepplinger@ginzinger.com>
> >
> > Disable BCH soft reset according to MX23 erratum #2847 ("BCH soft
> > reset may cause bus master lock up") for MX28 too. It has the same
> > problem.
> >
> > Observed problem: once per 100,000+ MX28 reboots NAND read failed on
> > DMA timeout errors:
> > [ 1.770823] UBI: attaching mtd3 to ubi0
> > [ 2.768088] gpmi_nand: DMA timeout, last DMA :1
> > [ 3.958087] gpmi_nand: BCH timeout, last DMA :1
> > [ 4.156033] gpmi_nand: Error in ECC-based read: -110
> > [ 4.161136] UBI warning: ubi_io_read: error -110 while reading 64
> > bytes from PEB 0:0, read only 0 bytes, retry
> > [ 4.171283] step 1 error
> > [ 4.173846] gpmi_nand: Chip: 0, Error -1
> >
> > Without BCH soft reset we successfully executed 1,000,000 MX28 reboots.
> >
> > I have a quote from NXP regarding this problem, from July 18th 2016:
> >
> > "As the i.MX23 and i.MX28 are of the same generation, they share many
> > characteristics. Unfortunately, also the erratas may be shared.
> > In case of the documented erratas and the workarounds, you can also
> > apply the workaround solution of one device on the other one. This have
> > been reported, but I’m afraid that there are not an estimated date for
> > updating the Errata documents.
> > Please accept our apologies for any inconveniences this may cause."
> >
> > Signed-off-by: Manfred Schlaegl <manfred.schlaegl@ginzinger.com>
> > Signed-off-by: Martin Kepplinger <martin.kepplinger@ginzinger.com>

Does this also need a Fixes tag so that it can be backported to stable?

Thanks

On 05.02.19 16:22, Fabio Estevam wrote:

> On Tue, Feb 5, 2019 at 11:00 AM Miquel Raynal <miquel.raynal@bootlin.com> wrote:
>>
>> Hi Martin,
>>
>> Martin Kepplinger <martink@posteo.de> wrote on Tue, 29 Jan 2019
>> 16:37:00 +0100:
>>
>>> From: Martin Kepplinger <martin.kepplinger@ginzinger.com>
>>>
>>> Disable BCH soft reset according to MX23 erratum #2847 ("BCH soft
>>> reset may cause bus master lock up") for MX28 too. It has the same
>>> problem.
>>>
>>> Observed problem: once per 100,000+ MX28 reboots NAND read failed on
>>> DMA timeout errors:
>>> [ 1.770823] UBI: attaching mtd3 to ubi0
>>> [ 2.768088] gpmi_nand: DMA timeout, last DMA :1
>>> [ 3.958087] gpmi_nand: BCH timeout, last DMA :1
>>> [ 4.156033] gpmi_nand: Error in ECC-based read: -110
>>> [ 4.161136] UBI warning: ubi_io_read: error -110 while reading 64
>>> bytes from PEB 0:0, read only 0 bytes, retry
>>> [ 4.171283] step 1 error
>>> [ 4.173846] gpmi_nand: Chip: 0, Error -1
>>>
>>> Without BCH soft reset we successfully executed 1,000,000 MX28 reboots.
>>>
>>> I have a quote from NXP regarding this problem, from July 18th 2016:
>>>
>>> "As the i.MX23 and i.MX28 are of the same generation, they share many
>>> characteristics. Unfortunately, also the erratas may be shared.
>>> In case of the documented erratas and the workarounds, you can also
>>> apply the workaround solution of one device on the other one. This have
>>> been reported, but I’m afraid that there are not an estimated date for
>>> updating the Errata documents.
>>> Please accept our apologies for any inconveniences this may cause."
>>>
>>> Signed-off-by: Manfred Schlaegl <manfred.schlaegl@ginzinger.com>
>>> Signed-off-by: Martin Kepplinger <martin.kepplinger@ginzinger.com>
>
> Does this also need a Fixes tag so that it can be backported to stable?
>

Actually I rebased this from our 4.14 stable tree, so yes, I just forgot
about that and I guess it would be

Fixes: 6f2a6a52560a ("mtd: nand: gpmi: reset BCH earlier, too, to avoid NAND startup problems")

Do you want me to add this and CC stable?

thanks
martin

Hi Martin,

On Tue, Feb 5, 2019 at 1:42 PM Martin Kepplinger <martin.kepplinger@ginzinger.com> wrote:

> Actually I rebased this from our 4.14 stable tree, so yes, I just forgot
> about that and I guess it would be
>
> Fixes: 6f2a6a52560a ("mtd: nand: gpmi: reset BCH earlier, too, to avoid
> NAND startup problems")
>
> Do you want me to add this and CC stable?

That would be nice.

You can also add:

Reviewed-by: Fabio Estevam <festevam@gmail.com>

Thanks

diff --git a/drivers/mtd/nand/raw/gpmi-nand/gpmi-lib.c b/drivers/mtd/nand/raw/gpmi-nand/gpmi-lib.c
index bd4cfac6b5aa..a4768df5083f 100644
--- a/drivers/mtd/nand/raw/gpmi-nand/gpmi-lib.c
+++ b/drivers/mtd/nand/raw/gpmi-nand/gpmi-lib.c
@@ -155,9 +155,10 @@ int gpmi_init(struct gpmi_nand_data *this)
 
 	/*
 	 * Reset BCH here, too. We got failures otherwise :(
-	 * See later BCH reset for explanation of MX23 handling
+	 * See later BCH reset for explanation of MX23 and MX28 handling
 	 */
-	ret = gpmi_reset_block(r->bch_regs, GPMI_IS_MX23(this));
+	ret = gpmi_reset_block(r->bch_regs,
+			       GPMI_IS_MX23(this) || GPMI_IS_MX28(this));
 	if (ret)
 		goto err_out;
 
@@ -263,12 +264,10 @@ int bch_set_geometry(struct gpmi_nand_data *this)
 	/*
 	 * Due to erratum #2847 of the MX23, the BCH cannot be soft reset on this
 	 * chip, otherwise it will lock up. So we skip resetting BCH on the MX23.
-	 * On the other hand, the MX28 needs the reset, because one case has been
-	 * seen where the BCH produced ECC errors constantly after 10000
-	 * consecutive reboots. The latter case has not been seen on the MX23
-	 * yet, still we don't know if it could happen there as well.
+	 * and MX28.
 	 */
-	ret = gpmi_reset_block(r->bch_regs, GPMI_IS_MX23(this));
+	ret = gpmi_reset_block(r->bch_regs,
+			       GPMI_IS_MX23(this) || GPMI_IS_MX28(this));
 	if (ret)
 		goto err_out;
 
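For readers without the driver source at hand, the effect of the changed second argument can be sketched in a small standalone model. This is only an illustration under assumptions, not the driver code: `struct fake_block`, `enum soc_type`, `skip_bch_soft_reset()`, `reset_block()` and its `enable_only` parameter are hypothetical stand-ins, and the assumption is that a true second argument to `gpmi_reset_block()` makes it enable the block without soft resetting it, as the comments in the diff suggest.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the driver's chip identification macros. */
enum soc_type { SOC_MX23, SOC_MX28, SOC_OTHER };

/* Hypothetical model of a block's control bits, not the real register layout. */
struct fake_block {
	bool sftrst;      /* soft-reset bit */
	bool clkgate;     /* clock-gate bit */
	int soft_resets;  /* number of soft resets performed on this block */
};

/* Chips on which this patch skips the BCH soft reset (erratum #2847). */
static bool skip_bch_soft_reset(enum soc_type soc)
{
	return soc == SOC_MX23 || soc == SOC_MX28;
}

/*
 * Assumed behaviour: with enable_only set, the block is only taken out of
 * reset and ungated; otherwise it is additionally soft reset once.
 */
static int reset_block(struct fake_block *blk, bool enable_only)
{
	assert(blk != NULL);

	/* Always bring the block out of reset and ungate its clock. */
	blk->sftrst = false;
	blk->clkgate = false;
	if (enable_only)
		return 0;

	/* Full soft reset: assert the reset bit, then release it again. */
	blk->sftrst = true;
	blk->soft_resets++;
	blk->sftrst = false;
	blk->clkgate = false;
	return 0;
}

/* Mirrors the call shape in the diff: the chip check selects enable-only. */
static int bch_reset(struct fake_block *bch, enum soc_type soc)
{
	return reset_block(bch, skip_bch_soft_reset(soc));
}
```

In this model, calling `bch_reset()` with `SOC_MX23` or `SOC_MX28` leaves `soft_resets` untouched, while any other chip type increments it, which is the behavioural change the two hunks make.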