diff mbox

mtd: nand: gpmi: reset BCH earlier, too, to avoid NAND startup problems

Message ID 1354740362-20412-1-git-send-email-w.sang@pengutronix.de
State Accepted
Commit 6f2a6a52560ad8d85710aabd92b7a3239b3a6b07
Headers show

Commit Message

Wolfram Sang Dec. 5, 2012, 8:46 p.m. UTC
It could happen (1 out of 100 times) that NAND did not start up
correctly after warm rebooting, so the kernel could not find the UBI or
DMA timed out due to a stalled BCH. When resetting BCH together with
GPMI, the issue could not be observed anymore (after 10000+ reboots). We
probably need the consistent state already before sending any command to
NAND, even when no ECC is needed. I chose to keep the extra reset for
BCH when changing the flash layout to be on the safe side.

Signed-off-by: Wolfram Sang <w.sang@pengutronix.de>
Cc: Huang Shijie <b32955@freescale.com>
Cc: Artem Bityutskiy <dedekind1@gmail.com>
Cc: David Woodhouse <david.woodhouse@intel.com>
---

Would be really great to have in 3.7. Also a stable-candidate IMO.

 drivers/mtd/nand/gpmi-nand/gpmi-lib.c |    9 +++++++++
 1 file changed, 9 insertions(+)

Comments

Huang Shijie Dec. 6, 2012, 2:16 a.m. UTC | #1
于 2012年12月06日 04:46, Wolfram Sang 写道:
> correctly after warm rebooting, so the kernel could not find the UBI or
which Soc do you meet this issue? the mx23 or mx28?

It's bad news to me. I ever thought the BCH-reset-issue is gone.
I ever tested many times in mx28(> 10000 times) with freescale's uboot.

I guest you are not use the freescale's uboot. The uboot also will reset
the BCH/GPMI.


thanks
Huang Shijie
Wolfram Sang Dec. 6, 2012, 9:52 a.m. UTC | #2
On Thu, Dec 06, 2012 at 10:16:08AM +0800, Huang Shijie wrote:
> 于 2012年12月06日 04:46, Wolfram Sang 写道:
> > correctly after warm rebooting, so the kernel could not find the UBI or
> which Soc do you meet this issue? the mx23 or mx28?

MX28.

> It's bad news to me. I ever thought the BCH-reset-issue is gone.
> I ever tested many times in mx28(> 10000 times) with freescale's uboot.

Did you power-cycle between each test or reboot? I still need to test if
this really makes a difference, but I think the issue shows more often
when only rebooting.

> I guest you are not use the freescale's uboot. The uboot also will reset
> the BCH/GPMI.

I just checked its sources. It also has this flaw. GPMI and BCH need to
be reset at the same time, i.e. before first commands are sent to the
NAND (although they don't need ECC).

I use barebox, but that isn't of importance here. If I don't fix
barebox, I see NAND issues in barebox or the kernel. If I fix barebox, I
still see issues in the kernel, but not in barebox anymore.

The setup needs to be done properly to ensure a consistent state at the
beginning, especially when somebody used the NAND before (ROM code,
bootloader). Fixing the bootloader alone is not enough.

Thanks,

   Wolfram
Huang Shijie Dec. 6, 2012, 2:58 p.m. UTC | #3
On Thu, Dec 6, 2012 at 4:52 AM, Wolfram Sang <w.sang@pengutronix.de> wrote:
> On Thu, Dec 06, 2012 at 10:16:08AM +0800, Huang Shijie wrote:
>> 于 2012年12月06日 04:46, Wolfram Sang 写道:
>> > correctly after warm rebooting, so the kernel could not find the UBI or
>> which Soc do you meet this issue? the mx23 or mx28?
>
> MX28.
>
>> It's bad news to me. I ever thought the BCH-reset-issue is gone.
>> I ever tested many times in mx28(> 10000 times) with freescale's uboot.
>
> Did you power-cycle between each test or reboot? I still need to test if
What's the meaning of "power-cycle"?
I only tested with the soft reset, the power is never shut down.


> this really makes a difference, but I think the issue shows more often
> when only rebooting.
>
>> I guest you are not use the freescale's uboot. The uboot also will reset
>> the BCH/GPMI.
>
> I just checked its sources. It also has this flaw. GPMI and BCH need to
> be reset at the same time, i.e. before first commands are sent to the
> NAND (although they don't need ECC).
>

With your patch, we have resetted the BCH twice in the driver. Could
we only reset BCH one time?
Do you ever remove another reset-bch code, and test it?
Frankly speaking, it's strange to reset the BCH twice, and it makes no sense. :(
I am afraid that we have to add three-reset-bch in the future, if we
can not find the root cause.
If you can remove another reset-bch code, i will ack the patch.


> I use barebox, but that isn't of importance here. If I don't fix
> barebox, I see NAND issues in barebox or the kernel. If I fix barebox, I
> still see issues in the kernel, but not in barebox anymore.
>
I ever met the same case with the freescale's BSP code.
I fixed the issue in uboot, and fixed the driver too.



> The setup needs to be done properly to ensure a consistent state at the
> beginning, especially when somebody used the NAND before (ROM code,
At the beginning? it's really interesting.
I want to test it myself.

could you wait for some time?

Best Regards
Huang Shijie


> bootloader). Fixing the bootloader alone is not enough.
>
> Thanks,
>
>    Wolfram
>
> --
> Pengutronix e.K.                           | Wolfram Sang                |
> Industrial Linux Solutions                 | http://www.pengutronix.de/  |
>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.10 (GNU/Linux)
>
> iEYEARECAAYFAlDAas0ACgkQD27XaX1/VRs8iQCgrqb5nADGL5Dyi+csfSdSLWSR
> /mUAoJN91gz+1QiqVUsNdHtwX9jpAdZC
> =nMk3
> -----END PGP SIGNATURE-----
>
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
>
Wolfram Sang Dec. 6, 2012, 4:58 p.m. UTC | #4
> > Did you power-cycle between each test or reboot? I still need to test if
> What's the meaning of "power-cycle"?

power off - power on

> I only tested with the soft reset, the power is never shut down.

OK.

> With your patch, we have resetted the BCH twice in the driver. Could
> we only reset BCH one time?

You tell me, please :) Is it needed when changing the layout? I don't
have any setup to test that. Unless I can test that, I prefer to not
remove it. I am a bit anxious, because the reason for the stalled BCH is
not fully understood and all I can reliably say is that resetting twice
does not hurt.

> Do you ever remove another reset-bch code, and test it?

It will probably work, but we don't cover the case of changing the
layout?

> Frankly speaking, it's strange to reset the BCH twice, and it makes no sense. :(

The stalled BCH makes no sense either, currently.

> I am afraid that we have to add three-reset-bch in the future, if we
> can not find the root cause.

Well, ask the IC guys if something can go wrong when the BCH has been
active and the GPMI gets reset and issues NAND commands (without needing
ECC).

> If you can remove another reset-bch code, i will ack the patch.

Removing code that late in the cycle without proof sounds dangerous to
me.

> > The setup needs to be done properly to ensure a consistent state at the
> > beginning, especially when somebody used the NAND before (ROM code,
> At the beginning? it's really interesting.
> I want to test it myself.
> 
> could you wait for some time?

What is some time? :)

Thanks,

   Wolfram
Huang Shijie Dec. 11, 2012, 2:01 a.m. UTC | #5
于 2012年12月06日 17:52, Wolfram Sang 写道:
> The setup needs to be done properly to ensure a consistent state at the
> beginning, especially when somebody used the NAND before (ROM code,
> bootloader). Fixing the bootloader alone is not enough.
I tried to make some time to test it.
But i am too busy this week.
Anyway, let this patch get merged.

Acked-by: Huang Shijie <b32955@freescale.com>


thanks
Huang Shijie
Artem Bityutskiy Dec. 12, 2012, 3:11 p.m. UTC | #6
On Wed, 2012-12-05 at 21:46 +0100, Wolfram Sang wrote:
> It could happen (1 out of 100 times) that NAND did not start up
> correctly after warm rebooting, so the kernel could not find the UBI or
> DMA timed out due to a stalled BCH. When resetting BCH together with
> GPMI, the issue could not be observed anymore (after 10000+ reboots). We
> probably need the consistent state already before sending any command to
> NAND, even when no ECC is needed. I chose to keep the extra reset for
> BCH when changing the flash layout to be on the safe side.
> 
> Signed-off-by: Wolfram Sang <w.sang@pengutronix.de>
> Cc: Huang Shijie <b32955@freescale.com>
> Cc: Artem Bityutskiy <dedekind1@gmail.com>
> Cc: David Woodhouse <david.woodhouse@intel.com>
> ---
> 
> Would be really great to have in 3.7. Also a stable-candidate IMO.

Pushed to l2-mtd.git, thanks

If you believe this is suitable for stable, please let me know.
Wolfram Sang Dec. 12, 2012, 4:01 p.m. UTC | #7
On Wed, Dec 12, 2012 at 05:11:43PM +0200, Artem Bityutskiy wrote:
> On Wed, 2012-12-05 at 21:46 +0100, Wolfram Sang wrote:
> > It could happen (1 out of 100 times) that NAND did not start up
> > correctly after warm rebooting, so the kernel could not find the UBI or
> > DMA timed out due to a stalled BCH. When resetting BCH together with
> > GPMI, the issue could not be observed anymore (after 10000+ reboots). We
> > probably need the consistent state already before sending any command to
> > NAND, even when no ECC is needed. I chose to keep the extra reset for
> > BCH when changing the flash layout to be on the safe side.
> > 
> > Signed-off-by: Wolfram Sang <w.sang@pengutronix.de>
> > Cc: Huang Shijie <b32955@freescale.com>
> > Cc: Artem Bityutskiy <dedekind1@gmail.com>
> > Cc: David Woodhouse <david.woodhouse@intel.com>
> > ---
> > 
> > Would be really great to have in 3.7. Also a stable-candidate IMO.
> 
> Pushed to l2-mtd.git, thanks
> 
> If you believe this is suitable for stable, please let me know.

I strongly believe that.
Artem Bityutskiy Dec. 13, 2012, 11:41 a.m. UTC | #8
On Wed, 2012-12-12 at 17:01 +0100, Wolfram Sang wrote:

> > If you believe this is suitable for stable, please let me know.
> 
> I strongly believe that.

OK, added

Cc: stable@vger.kernel.org
diff mbox

Patch

diff --git a/drivers/mtd/nand/gpmi-nand/gpmi-lib.c b/drivers/mtd/nand/gpmi-nand/gpmi-lib.c
index 3502acc..84f0526 100644
--- a/drivers/mtd/nand/gpmi-nand/gpmi-lib.c
+++ b/drivers/mtd/nand/gpmi-nand/gpmi-lib.c
@@ -166,6 +166,15 @@  int gpmi_init(struct gpmi_nand_data *this)
 	if (ret)
 		goto err_out;
 
+	/*
+	 * Reset BCH here, too. We got failures otherwise :(
+	 * See later BCH reset for explanation of MX23 handling
+	 */
+	ret = gpmi_reset_block(r->bch_regs, GPMI_IS_MX23(this));
+	if (ret)
+		goto err_out;
+
+
 	/* Choose NAND mode. */
 	writel(BM_GPMI_CTRL1_GPMI_MODE, r->gpmi_regs + HW_GPMI_CTRL1_CLR);