Message ID | 1325233646-3343-1-git-send-email-b32955@freescale.com |
---|---|
State | New, archived |
Headers | show |
> In MX28, if we do not reset the BCH module. The BCH module may > becomes unstable when the board reboots for several thousands times. > This bug has been catched in customer's production. > > The patch adds some comments, and fixes it now. > > Also change gpmi_reset_block() to static. > > Signed-off-by: Huang Shijie <b32955@freescale.com> > --- > drivers/mtd/nand/gpmi-nand/gpmi-lib.c | 10 +++++++--- > 1 files changed, 7 insertions(+), 3 deletions(-) > > diff --git a/drivers/mtd/nand/gpmi-nand/gpmi-lib.c > b/drivers/mtd/nand/gpmi-nand/gpmi-lib.c index de4db76..1573e99 100644 > --- a/drivers/mtd/nand/gpmi-nand/gpmi-lib.c > +++ b/drivers/mtd/nand/gpmi-nand/gpmi-lib.c > @@ -69,7 +69,8 @@ static int clear_poll_bit(void __iomem *addr, u32 mask) > * [1] enable the module. > * [2] reset the module. > * > - * In most of the cases, it's ok. But there is a hardware bug in the BCH > block. + * In most of the cases, it's ok. > + * But in MX23, there is a hardware bug in the BCH block. > * If you try to soft reset the BCH block, it becomes unusable until > * the next hard reset. This case occurs in the NAND boot mode. When the > board * boots by NAND, the ROM of the chip will initialize the BCH blocks > itself. @@ -78,8 +79,10 @@ static int clear_poll_bit(void __iomem *addr, > u32 mask) * > * To avoid this bug, just add a new parameter `just_enable` for > * the mxs_reset_block(), and rewrite it here. > + * > + * The bug has been fixed in the following chips, such as MX28. > */ > -int gpmi_reset_block(void __iomem *reset_addr, bool just_enable) > +static int gpmi_reset_block(void __iomem *reset_addr, bool just_enable) > { > int ret; > int timeout = 0x400; > @@ -206,7 +209,8 @@ int bch_set_geometry(struct gpmi_nand_data *this) > if (ret) > goto err_out; > > - ret = gpmi_reset_block(r->bch_regs, true); > + /* The bug only exits in mx23, the following chips fix it. */ > + ret = gpmi_reset_block(r->bch_regs, GPMI_IS_MX23(this)); > if (ret) > goto err_out; Looks sane, though I can't test right now. Acked-by: Marek Vasut <marek.vasut@gmail.com>
On Fri, Dec 30, 2011 at 04:27:26PM +0800, Huang Shijie wrote: > In MX28, if we do not reset the BCH module. The BCH module may > becomes unstable when the board reboots for several thousands times. Do you have more details when and why this happens? What happens on MX23 then? > Signed-off-by: Huang Shijie <b32955@freescale.com> Is this Bug 2847 from the errata? Should be mentioned in the commit message and comments of the patch if so. Regards, Wolfram
Hi, > On Fri, Dec 30, 2011 at 04:27:26PM +0800, Huang Shijie wrote: >> In MX28, if we do not reset the BCH module. The BCH module may >> becomes unstable when the board reboots for several thousands times. > Do you have more details when and why this happens? What happens on MX23 then? In one customer's 3G router which uses the MX28: [0] NAND boot mode, mount the UBIFS in the NAND partition. [1] We used the gpmi_reset_block(r->bch_regs, true) to init the BCH module. [2] The board will automatically reboot again after it booted by NAND boot mode. [3] After reboot more then thousands times(cost nearly one day), the BCH mode became UNSTABLE, the data read out was not right, so the system could not mount the UBIFS. After we use gpmi_reset_block(r->bch_regs, false) to init the BCH module, the bug never happens. The gpmi_reset_block() was coded to avoid the NAND boot bug in MX23. So MX23 does not have the bug. >> Signed-off-by: Huang Shijie <b32955@freescale.com> > Is this Bug 2847 from the errata? Should be mentioned in the commit message and Yes, this is the bug 2847 from the mx23's errata. I will add some comments to it. Best Regards Huang Shijie > comments of the patch if so. > > Regards, > > Wolfram >
> Hi, > > > On Fri, Dec 30, 2011 at 04:27:26PM +0800, Huang Shijie wrote: > >> In MX28, if we do not reset the BCH module. The BCH module may > >> becomes unstable when the board reboots for several thousands times. > > > > Do you have more details when and why this happens? What happens on MX23 > > then? > > In one customer's 3G router which uses the MX28: > [0] NAND boot mode, mount the UBIFS in the NAND partition. > [1] We used the gpmi_reset_block(r->bch_regs, true) to init the BCH module. > [2] The board will automatically reboot again after it booted by NAND > boot mode. > [3] After reboot more then thousands times(cost nearly one day), the BCH > mode became UNSTABLE, > the data read out was not right, so the system could not mount the UBIFS. > > After we use gpmi_reset_block(r->bch_regs, false) to init the BCH > module, the bug never happens. > > > > The gpmi_reset_block() was coded to avoid the NAND boot bug in MX23. So > MX23 does not have the bug. > > >> Signed-off-by: Huang Shijie <b32955@freescale.com> > > > > Is this Bug 2847 from the errata? Should be mentioned in the commit > > message and > > Yes, this is the bug 2847 from the mx23's errata. > > I will add some comments to it. > > Best Regards > Huang Shijie > > > comments of the patch if so. > > > > Regards, > > > > Wolfram Ah, so you confirmed it's a problem with the silicon. Good
hi, >> Hi, >> >>> On Fri, Dec 30, 2011 at 04:27:26PM +0800, Huang Shijie wrote: >>>> In MX28, if we do not reset the BCH module. The BCH module may >>>> becomes unstable when the board reboots for several thousands times. >>> Do you have more details when and why this happens? What happens on MX23 >>> then? >> In one customer's 3G router which uses the MX28: >> [0] NAND boot mode, mount the UBIFS in the NAND partition. >> [1] We used the gpmi_reset_block(r->bch_regs, true) to init the BCH module. >> [2] The board will automatically reboot again after it booted by NAND >> boot mode. >> [3] After reboot more then thousands times(cost nearly one day), the BCH >> mode became UNSTABLE, >> the data read out was not right, so the system could not mount the UBIFS. >> >> After we use gpmi_reset_block(r->bch_regs, false) to init the BCH >> module, the bug never happens. >> >> >> >> The gpmi_reset_block() was coded to avoid the NAND boot bug in MX23. So >> MX23 does not have the bug. >> >>>> Signed-off-by: Huang Shijie<b32955@freescale.com> >>> Is this Bug 2847 from the errata? Should be mentioned in the commit >>> message and >> Yes, this is the bug 2847 from the mx23's errata. >> >> I will add some comments to it. >> >> Best Regards >> Huang Shijie >> >>> comments of the patch if so. >>> >>> Regards, >>> >>> Wolfram > Ah, so you confirmed it's a problem with the silicon. Good > yes, the bug in mx23 is really a hardware bug. BR Huang Shijie
> hi, > > >> Hi, > >> > >>> On Fri, Dec 30, 2011 at 04:27:26PM +0800, Huang Shijie wrote: > >>>> In MX28, if we do not reset the BCH module. The BCH module may > >>>> becomes unstable when the board reboots for several thousands times. > >>> > >>> Do you have more details when and why this happens? What happens on > >>> MX23 then? > >> > >> In one customer's 3G router which uses the MX28: > >> [0] NAND boot mode, mount the UBIFS in the NAND partition. > >> [1] We used the gpmi_reset_block(r->bch_regs, true) to init the BCH > >> module. [2] The board will automatically reboot again after it booted > >> by NAND boot mode. > >> [3] After reboot more then thousands times(cost nearly one day), the BCH > >> mode became UNSTABLE, > >> the data read out was not right, so the system could not mount the > >> UBIFS. > >> > >> After we use gpmi_reset_block(r->bch_regs, false) to init the BCH > >> module, the bug never happens. > >> > >> > >> > >> The gpmi_reset_block() was coded to avoid the NAND boot bug in MX23. So > >> MX23 does not have the bug. > >> > >>>> Signed-off-by: Huang Shijie<b32955@freescale.com> > >>> > >>> Is this Bug 2847 from the errata? Should be mentioned in the commit > >>> message and > >> > >> Yes, this is the bug 2847 from the mx23's errata. > >> > >> I will add some comments to it. > >> > >> Best Regards > >> Huang Shijie > >> > >>> comments of the patch if so. > >>> > >>> Regards, > >>> > >>> Wolfram > > > > Ah, so you confirmed it's a problem with the silicon. Good > > yes, the bug in mx23 is really a hardware bug. > > BR > Huang Shijie All right. Well ... this is probably the last email your way until next year so: I wish you a successful 2012, thanks for all your awesome work and I hope we'll be able to continue working (not only) on MXS well next year too! M
Hi, On Sat, Dec 31, 2011 at 10:23:50AM +0800, Huang Shijie wrote: > > On Fri, Dec 30, 2011 at 04:27:26PM +0800, Huang Shijie wrote: > >> In MX28, if we do not reset the BCH module. The BCH module may > >> becomes unstable when the board reboots for several thousands times. > > Do you have more details when and why this happens? What happens on MX23 then? > In one customer's 3G router which uses the MX28: > [0] NAND boot mode, mount the UBIFS in the NAND partition. > [1] We used the gpmi_reset_block(r->bch_regs, true) to init the BCH module. > [2] The board will automatically reboot again after it booted by NAND > boot mode. > [3] After reboot more then thousands times(cost nearly one day), the BCH > mode became UNSTABLE, > the data read out was not right, so the system could not mount the UBIFS. > > After we use gpmi_reset_block(r->bch_regs, false) to init the BCH > module, the bug never happens. Yes, I undestood that. I was curious _why_ this happens, for example, because of a bug in the rom-code or such. This would then lead to the question if this can be fixed by some other means. Because currently it looks like this to me: If you would do the above scenario of booting 10000 times on a MX23, then BCH could enter the same unstable state (can it?). Since we cannot reset BCH due to bug 2847, we have a serious problem, because NAND won't work until the next power-cycle? I am curious if my assumptions are true and we have a serious problem on the MX23. > Yes, this is the bug 2847 from the mx23's errata. OK, thanks. Kind regards, Wolfram
On Sat, Dec 31, 2011 at 10:43 AM, Wolfram Sang <w.sang@pengutronix.de> wrote: > Hi, > > On Sat, Dec 31, 2011 at 10:23:50AM +0800, Huang Shijie wrote: > >> > On Fri, Dec 30, 2011 at 04:27:26PM +0800, Huang Shijie wrote: >> >> In MX28, if we do not reset the BCH module. The BCH module may >> >> becomes unstable when the board reboots for several thousands times. >> > Do you have more details when and why this happens? What happens on MX23 then? >> In one customer's 3G router which uses the MX28: >> [0] NAND boot mode, mount the UBIFS in the NAND partition. >> [1] We used the gpmi_reset_block(r->bch_regs, true) to init the BCH module. >> [2] The board will automatically reboot again after it booted by NAND >> boot mode. >> [3] After reboot more then thousands times(cost nearly one day), the BCH >> mode became UNSTABLE, >> the data read out was not right, so the system could not mount the UBIFS. >> >> After we use gpmi_reset_block(r->bch_regs, false) to init the BCH >> module, the bug never happens. > > Yes, I undestood that. I was curious _why_ this happens, for example, because > of a bug in the rom-code or such. This would then lead to the question if this This happens because we do NOT follow the right init procedure. The IC guy told me. If the BCH was not initialized correctly, it can not guarantees the data is right. This is why the mx28 failed. > can be fixed by some other means. Because currently it looks like this to me: > If you would do the above scenario of booting 10000 times on a MX23, then BCH > could enter the same unstable state (can it?). Since we cannot reset BCH due to For mx23, it should not reset the BCH, this is the right init procedure for mx23. But I did not ever boot 10000 times on mx23. I am not sure if it can fail or not. > bug 2847, we have a serious problem, because NAND won't work until the next > power-cycle? I am curious if my assumptions are true and we have a serious > problem on the MX23. You can test it if you have time. :) Happy new year. Huang Shijie
Hi, > This happens because we do NOT follow the right init procedure. The IC > guy told me. > If the BCH was not initialized correctly, it can not guarantees the > data is right. > This is why the mx28 failed. ... > For mx23, it should not reset the BCH, this is the right init > procedure for mx23. Okay, this was the information I was looking for. So, the MX23 cannot be reset due to bug #2847. But it also does not NEED to be reset, which is unlike the MX28 which needs the reset. Correct? Before that, it was only clear that MX23 cannot be reset. It was not clear (at least to me) if it still could then run into the same problems as the MX28 after 10000+x boot cycles. That would be a really bad situation: being in need of the reset which we can't do due to 2847. > But I did not ever boot 10000 times on mx23. I am not sure if it can > fail or not. Hmm, can't your IC guy tell you? I was hoping he knows if the same problem which happens on the MX28 can also happen on the MX23? I don't want to make your life unnecessarily hard, but I'd really like to know if I need to warn customers when using MX23 and NAND (and if so, we have to update the comment in the code). > > bug 2847, we have a serious problem, because NAND won't work until the next > > power-cycle? I am curious if my assumptions are true and we have a serious > > problem on the MX23. > You can test it if you have time. :) Testing is guessing in this case, the problem could arise after <n+1> cycles if I tested <n>. Actually knowing the situation would be more helpful, I'd think. If this is not possible to find out, we'd need to add this as a comment, too, so people experiencing problems have a pointer which can save them *a lot of* time. Kind regards, Wolfram
On Sun, Jan 1, 2012 at 5:33 PM, Wolfram Sang <w.sang@pengutronix.de> wrote: > Hi, > >> This happens because we do NOT follow the right init procedure. The IC >> guy told me. >> If the BCH was not initialized correctly, it can not guarantees the >> data is right. >> This is why the mx28 failed. > ... >> For mx23, it should not reset the BCH, this is the right init >> procedure for mx23. > > Okay, this was the information I was looking for. So, the MX23 cannot be reset > due to bug #2847. But it also does not NEED to be reset, which is unlike the > MX28 which needs the reset. Correct? Before that, it was only clear that MX23 yes. i think so. > cannot be reset. It was not clear (at least to me) if it still could then run > into the same problems as the MX28 after 10000+x boot cycles. That would be a > really bad situation: being in need of the reset which we can't do due to 2847. I will do the MX23 test, when i am free. It really costs lot of time :( The bug only occurs in the software reboot. If you do hardware reboot(power from off to on). it does not exit. I am wonder if it's really has the same scenario in the real life. Who will soft reboot for 10000 continously? > >> But I did not ever boot 10000 times on mx23. I am not sure if it can >> fail or not. > > Hmm, can't your IC guy tell you? I was hoping he knows if the same problem which Our IC guy did not do such tough test as our customers did. > happens on the MX28 can also happen on the MX23? I don't want to make your life > unnecessarily hard, but I'd really like to know if I need to warn customers > when using MX23 and NAND (and if so, we have to update the comment in the code). > >> > bug 2847, we have a serious problem, because NAND won't work until the next >> > power-cycle? I am curious if my assumptions are true and we have a serious >> > problem on the MX23. >> You can test it if you have time. :) > > Testing is guessing in this case, the problem could arise after <n+1> cycles if > I tested <n>. Actually knowing the situation would be more helpful, I'd think. > If this is not possible to find out, we'd need to add this as a comment, too, > so people experiencing problems have a pointer which can save them *a lot of* yes, i agree. add it in next version. Best Regards Huang Shijie > time. > > Kind regards, > > Wolfram > > -- > Pengutronix e.K. | Wolfram Sang | > Industrial Linux Solutions | http://www.pengutronix.de/ |
Hi Huang,
> I will do the MX23 test, when i am free. It really costs lot of time :(
My point is like this: I wondered if you really found out _why_ the MX28
has this problem, e.g. "There is a case where the rom code does <x>, and
when the NAND then does <y> the BCH goes to state <z> which requires a
soft reset of the ip-core." If you had been at that point, it should be
quite easy to check if this can happen on MX23, too, I think.
If you don't know the cause of the problem in that detail because the
problem is hardly reproducable, well, that can happen, too. I fully
understand that. I don't want to enforce the time-consuming test on you
(would be interesting, though), but then at least the comments need
proper updates, so people are informed. From my understanding of the
situation, it could look like this:
===
Due to Errata #2847 of the MX23, the BCH cannot be soft reset on this
chip, otherwise it will lock up. So we skip resetting BCH on the MX23.
On the other hand, the MX28 needs the reset, because one case has been
seen where the BCH produced ECC errors constantly after 10000
consecutive reboots. The latter case has not been seen on the MX23 yet,
still we don't if it could happen there as well.
===
What do you think?
Thanks,
Wolfram
hi, > Due to Errata #2847 of the MX23, the BCH cannot be soft reset on this > chip, otherwise it will lock up. So we skip resetting BCH on the MX23. > On the other hand, the MX28 needs the reset, because one case has been > seen where the BCH produced ECC errors constantly after 10000 > consecutive reboots. The latter case has not been seen on the MX23 yet, > still we don't if it could happen there as well. > > === > > What do you think? > ok, thanks. I will add this comment to the patch. Huang Shijie
diff --git a/drivers/mtd/nand/gpmi-nand/gpmi-lib.c b/drivers/mtd/nand/gpmi-nand/gpmi-lib.c index de4db76..1573e99 100644 --- a/drivers/mtd/nand/gpmi-nand/gpmi-lib.c +++ b/drivers/mtd/nand/gpmi-nand/gpmi-lib.c @@ -69,7 +69,8 @@ static int clear_poll_bit(void __iomem *addr, u32 mask) * [1] enable the module. * [2] reset the module. * - * In most of the cases, it's ok. But there is a hardware bug in the BCH block. + * In most of the cases, it's ok. + * But in MX23, there is a hardware bug in the BCH block. * If you try to soft reset the BCH block, it becomes unusable until * the next hard reset. This case occurs in the NAND boot mode. When the board * boots by NAND, the ROM of the chip will initialize the BCH blocks itself. @@ -78,8 +79,10 @@ static int clear_poll_bit(void __iomem *addr, u32 mask) * * To avoid this bug, just add a new parameter `just_enable` for * the mxs_reset_block(), and rewrite it here. + * + * The bug has been fixed in the following chips, such as MX28. */ -int gpmi_reset_block(void __iomem *reset_addr, bool just_enable) +static int gpmi_reset_block(void __iomem *reset_addr, bool just_enable) { int ret; int timeout = 0x400; @@ -206,7 +209,8 @@ int bch_set_geometry(struct gpmi_nand_data *this) if (ret) goto err_out; - ret = gpmi_reset_block(r->bch_regs, true); + /* The bug only exits in mx23, the following chips fix it. */ + ret = gpmi_reset_block(r->bch_regs, GPMI_IS_MX23(this)); if (ret) goto err_out;
In MX28, if we do not reset the BCH module. The BCH module may becomes unstable when the board reboots for several thousands times. This bug has been catched in customer's production. The patch adds some comments, and fixes it now. Also change gpmi_reset_block() to static. Signed-off-by: Huang Shijie <b32955@freescale.com> --- drivers/mtd/nand/gpmi-nand/gpmi-lib.c | 10 +++++++--- 1 files changed, 7 insertions(+), 3 deletions(-)