Patchwork Numonyx NOR and chip->mutex bug?

login
register
mail settings
Submitter Stefan Bigler
Date Feb. 7, 2011, 5:08 p.m.
Message ID <4D502727.5010208@keymile.com>
Download mbox | patch
Permalink /patch/82120/
State New
Headers show

Comments

Stefan Bigler - Feb. 7, 2011, 5:08 p.m.
Hi Mike

I attached a patch for you adding some printk, with those already the 
creation of the first ubivolume goes wrong.
Is this also the case in your setup?

Regards Stefan

Subject: [PATCH 2/2] MTD: cfi_cmdset_0001 driver add tracing

This tracing force in my case a race condition. When I create
a ubivolume.

Signed-off-by: Stefan Bigler <stefan.bigler@keymile.com>
---
  drivers/mtd/chips/cfi_cmdset_0001.c |   15 +++++++++++++++
  1 files changed, 15 insertions(+), 0 deletions(-)

           * appears to make some (28F320) flash devices switch to
@@ -1012,6 +1016,10 @@ static void put_chip(struct map_info *map, struct 
flchip *chip, unsigned long ad
             do. */
          map_write(map, CMD(0xd0), adr);
          map_write(map, CMD(0x70), adr);
+
+        printk("[%10u][%04ld] erase resumed 2b        adr=0x%08lx\n",
+               jiffies_to_usecs(jiffies)/1000, sys_gettid(), adr);
+
          chip->oldstate = FL_READY;
          chip->state = FL_ERASING;
          break;
@@ -1884,6 +1892,10 @@ static int __xipram do_erase_oneblock(struct 
map_info *map, struct flchip *chip,
   retry:
      mutex_lock(&chip->mutex);
      ret = get_chip(map, chip, adr, FL_ERASING);
+
+    printk("[%10u][%04ld] do_erase_oneblock start adr=0x%08lx len=0x%x\n",
+           jiffies_to_usecs(jiffies)/1000, sys_gettid(), adr, len);
+
      if (ret) {
          mutex_unlock(&chip->mutex);
          return ret;
@@ -1953,6 +1965,9 @@ static int __xipram do_erase_oneblock(struct 
map_info *map, struct flchip *chip,

      xip_enable(map, chip, adr);
   out:    put_chip(map, chip, adr);
+    printk("[%10u][%04ld] do_erase_oneblock end   adr=0x%08lx len=0x%x 
\n",
+        jiffies_to_usecs(jiffies)/1000, sys_gettid(), adr, len);
+
      mutex_unlock(&chip->mutex);
      return ret;
  }
Michael Cashwell - Feb. 7, 2011, 7:04 p.m.
On Feb 7, 2011, at 12:08 PM, Stefan Bigler wrote:

> Hi Mike
> 
> I attached a patch for you adding some printk, with those already the creation of the first ubivolume goes wrong.
> Is this also the case in your setup?

Greetings and thanks,

Unfortunately, as I've seen before, adding printks tends to make my problem go away, particularly right after the 0xd0 erase resume command. That's the same place that I also found calls like udelay(20) and (void) map_read() of status also hide the failure. (On a lark, I just tried doing that print just before the resume command but it made no difference.)

To avoid masking the failure I think I need to collect information into a static buffer and printk the buffered data only on the error path. That way, the timing of the code hopefully won't change enough to hide the problem. That's more complicated to do, of course.

More as I find it.

-Mike

> Subject: [PATCH 2/2] MTD: cfi_cmdset_0001 driver add tracing
> 
> This tracing force in my case a race condition. When I create
> a ubivolume.
> 
> Signed-off-by: Stefan Bigler <stefan.bigler@keymile.com>
> ---
> drivers/mtd/chips/cfi_cmdset_0001.c |   15 +++++++++++++++
> 1 files changed, 15 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/mtd/chips/cfi_cmdset_0001.c b/drivers/mtd/chips/cfi_cmdset_0001.c
> index d3b2cd3..ea59726 100644
> --- a/drivers/mtd/chips/cfi_cmdset_0001.c
> +++ b/drivers/mtd/chips/cfi_cmdset_0001.c
> @@ -37,6 +37,8 @@
> #include <linux/mtd/compatmac.h>
> #include <linux/mtd/cfi.h>
> 
> +#include <linux/syscalls.h>
> +
> /* #define CMDSET0001_DISABLE_ERASE_SUSPEND_ON_WRITE */
> /* #define CMDSET0001_DISABLE_WRITE_SUSPEND */
> 
> @@ -799,6 +801,8 @@ static int chip_ready (struct map_info *map, struct flchip *chip, unsigned long
> 
>         /* Erase suspend */
>         map_write(map, CMD(0xB0), adr);
> +        printk("[%10u][%04ld] erase suspend 1         adr=0x%08lx\n",
> +               jiffies_to_usecs(jiffies)/1000, sys_gettid(), adr);
> 
>         /* If the flash has finished erasing, then 'erase suspend'
>          * appears to make some (28F320) flash devices switch to
> @@ -1012,6 +1016,10 @@ static void put_chip(struct map_info *map, struct flchip *chip, unsigned long ad
>            do. */
>         map_write(map, CMD(0xd0), adr);
>         map_write(map, CMD(0x70), adr);
> +
> +        printk("[%10u][%04ld] erase resumed 2b        adr=0x%08lx\n",
> +               jiffies_to_usecs(jiffies)/1000, sys_gettid(), adr);
> +
>         chip->oldstate = FL_READY;
>         chip->state = FL_ERASING;
>         break;
> @@ -1884,6 +1892,10 @@ static int __xipram do_erase_oneblock(struct map_info *map, struct flchip *chip,
>  retry:
>     mutex_lock(&chip->mutex);
>     ret = get_chip(map, chip, adr, FL_ERASING);
> +
> +    printk("[%10u][%04ld] do_erase_oneblock start adr=0x%08lx len=0x%x\n",
> +           jiffies_to_usecs(jiffies)/1000, sys_gettid(), adr, len);
> +
>     if (ret) {
>         mutex_unlock(&chip->mutex);
>         return ret;
> @@ -1953,6 +1965,9 @@ static int __xipram do_erase_oneblock(struct map_info *map, struct flchip *chip,
> 
>     xip_enable(map, chip, adr);
>  out:    put_chip(map, chip, adr);
> +    printk("[%10u][%04ld] do_erase_oneblock end   adr=0x%08lx len=0x%x \n",
> +        jiffies_to_usecs(jiffies)/1000, sys_gettid(), adr, len);
> +
>     mutex_unlock(&chip->mutex);
>     return ret;
> }
> -- 
> 1.7.0.5
> 
> Am 02/07/2011 05:46 PM, schrieb Michael Cashwell:
>> On Feb 7, 2011, at 11:22 AM, Joakim Tjernlund wrote:
>> 
>>> Michael Cashwell<mboards@prograde.net>  wrote on 2011/02/07 16:46:22:
>>> 
>>>> My current workaround from my problem is to do a throw-away status read "(void) map_read(map, addr);" after that 0x50, 0xd0, 0x70 sequence. Since no one else is seeing my problem I expect it's some issue with my specific batch of chips. Ugh.
>>> 
>>> Possibly, or an accident waiting to happen to the rest of us.
>> 
>> That does worry me too.
>> 
>>> The map_read will probably force some HW completion. Perhaps some sync() op. will do the same? Just to nail it down.
>> 
>> Once your patch is handled I will continue to try to fully explain my issue. I'm not giving up yet.
>> 
>>> BTW, do you have CONFIG_MTD_COMPLEX_MAPPINGS=y ? I do
>> 
>> Interesting. I have used that on a different platform where some of the high-order address lines were GPIOs.
>> 
>> But no, I'm not using that in this case. Just a simple linear mapping of the whole part.
>> 
>> -Mike
>> 
> ______________________________________________________
> Linux MTD discussion mailing list
> http://lists.infradead.org/mailman/listinfo/linux-mtd/
Michael Cashwell - Feb. 9, 2011, 7:52 p.m.
On Feb 7, 2011, at 2:04 PM, Michael Cashwell wrote:

> To avoid masking the failure I think I need to collect information into a static buffer and printk the buffered data only on the error path. That way, the timing of the code hopefully won't change enough to hide the problem. That's more complicated to do, of course.
> 
> More as I find it.

I added a simple buffer to log the last N map_writes() in cfi_cmdset_0001.c. As expected, I had to be very selective about what and how much to log and the logging code itself could take enough on the hot path to make my failure vanish. But I have been able to distill down the results.

It doesn't look like a threading/locking/mutex problem after all. It looks like a straight timing issue on these chips. Here's my reduced log:

Dumping flash write history:
... ongoing write buffer data ...
...cfi_cmdset_0001.c line 1793 writes 216c  <     last data word
...cfi_cmdset_0001.c line 1808 writes 00d0  <---  write confirm
...cfi_cmdset_0001.c line 1051 writes 0050  +	  presumably the buffer write finished and called put_chip.
...cfi_cmdset_0001.c line 1052 writes 00d0  + <-- put_chip(), oldstate FL_ERASING -> erase resume.
...cfi_cmdset_0001.c line 1053 writes 0070  +     == 20µs delay or discard-read of status register here hides problem ===
...cfi_cmdset_0001.c line 832 writes 00b0   <---| erase suspend.
...cfi_cmdset_0001.c line 839 writes 0070   <---| presumably in get_chip/chip_ready for FL_WRITING at start of
...cfi_cmdset_0001.c line 1743 writes 00e8  <---  buffer write
...cfi_cmdset_0001.c line 1771 writes 01ff  <---  number of words
...cfi_cmdset_0001.c line 1793 writes 0001  <     data word 1
... 510 words of data ...
...cfi_cmdset_0001.c line 1793 writes 5c0c  <     data word 512
...cfi_cmdset_0001.c line 1808 writes 00d0  <---  write confirm
...cfi_cmdset_0001.c line 1829 writes 0050  +     write error path
...cfi_cmdset_0001.c line 1830 writes 0070  +     write error has been sensed
NOR Flash: buffer write error (block locked) status 0xffff (priv = (null), chip = c7cf2ef8)
UBI error: ubi_io_write: error -30 while writing 1960 bytes to PEB 30:57800, written 568 bytes
UBI warning: ubi_eba_write_leb: failed to write data to PEB 30
UBI warning: ubi_ro_mode: switch to read-only mode
UBIFS error (pid 404): ubifs_wbuf_write_nolock: cannot write 1956 bytes to LEB 17:57664, error -30
UBIFS warning (pid 404): ubifs_ro_mode: switched to read-only mode, error -30

So when a buffered write fails, it's when the previous write suspended an erase. But more specifically, it seems that if the suspended erase is resumed and then immediately suspended again things go off the rails. This is between entries for lines 1053 and 832 above.

The chip seems to malfunction if an erase-resume is followed "too soon" by an erase-suspend.

I'd like to do things like sys_gettid() or perhaps get_timeofday() but I find that if I log more than 2 words per map_write() the failure is masked. Still working on that.

The only other thing I've done is ensure that following an erase-suspend for (;;) loop that map_word_bitset(map, status, CMD(0x40)) is true. If not then the erase that was being suspended completed instead (which can happen according to the data sheet). That assertion has not failed in my testing so that's not my problem.

I'm still digging but it's really looking like a straight timing problem with this batch of chips. At what point should I declare defeat and propose my simple fix even though it's just for these chips? The file has several instances of similar "Parts XYZ act funny without this even though it should be unnecessary" workarounds.

-Mike
Joakim Tjernlund - Feb. 9, 2011, 8:13 p.m.
linux-mtd-bounces@lists.infradead.org wrote on 2011/02/09 20:52:23:

> From: Michael Cashwell <mboards@prograde.net>
> To: Michael Cashwell <mboards@prograde.net>
> Cc: linux-mtd@lists.infradead.org, Holger brunck <holger.brunck@keymile.com>, stefan.bigler@keymile.com, Joakim Tjernlund <joakim.tjernlund@transmode.se>
> Date: 2011/02/09 20:56
> Subject: Re: Numonyx NOR and chip->mutex bug?
> Sent by: linux-mtd-bounces@lists.infradead.org
>
> On Feb 7, 2011, at 2:04 PM, Michael Cashwell wrote:
>
> > To avoid masking the failure I think I need to collect information into a static buffer and printk the buffered data only on the error path. That way, the timing of the code hopefully won't change enough to hide the problem. That's more complicated to do, of course.
> >
> > More as I find it.
>
> I added a simple buffer to log the last N map_writes() in cfi_cmdset_0001.c. As expected, I had to be very selective about what and how much to log and the logging code itself could take enough on the hot path to make my failure vanish. But I have been able to distill down the results.
>
> It doesn't look like a threading/locking/mutex problem after all. It looks like a straight timing issue on these chips. Here's my reduced log:
>
> Dumping flash write history:
> ... ongoing write buffer data ...
> ...cfi_cmdset_0001.c line 1793 writes 216c  <     last data word
> ...cfi_cmdset_0001.c line 1808 writes 00d0  <---  write confirm
> ...cfi_cmdset_0001.c line 1051 writes 0050  +     presumably the buffer write finished and called put_chip.
> ...cfi_cmdset_0001.c line 1052 writes 00d0  + <-- put_chip(), oldstate FL_ERASING -> erase resume.
> ...cfi_cmdset_0001.c line 1053 writes 0070  +     == 20µs delay or discard-read of status register here hides problem ===
> ...cfi_cmdset_0001.c line 832 writes 00b0   <---| erase suspend.
> ...cfi_cmdset_0001.c line 839 writes 0070   <---| presumably in get_chip/chip_ready for FL_WRITING at start of
> ...cfi_cmdset_0001.c line 1743 writes 00e8  <---  buffer write
> ...cfi_cmdset_0001.c line 1771 writes 01ff  <---  number of words
> ...cfi_cmdset_0001.c line 1793 writes 0001  <     data word 1
> ... 510 words of data ...
> ...cfi_cmdset_0001.c line 1793 writes 5c0c  <     data word 512
> ...cfi_cmdset_0001.c line 1808 writes 00d0  <---  write confirm
> ...cfi_cmdset_0001.c line 1829 writes 0050  +     write error path
> ...cfi_cmdset_0001.c line 1830 writes 0070  +     write error has been sensed
> NOR Flash: buffer write error (block locked) status 0xffff (priv = (null), chip = c7cf2ef8)
> UBI error: ubi_io_write: error -30 while writing 1960 bytes to PEB 30:57800, written 568 bytes
> UBI warning: ubi_eba_write_leb: failed to write data to PEB 30
> UBI warning: ubi_ro_mode: switch to read-only mode
> UBIFS error (pid 404): ubifs_wbuf_write_nolock: cannot write 1956 bytes to LEB 17:57664, error -30
> UBIFS warning (pid 404): ubifs_ro_mode: switched to read-only mode, error -30
>
> So when a buffered write fails, it's when the previous write suspended an erase. But more specifically, it seems that if the suspended erase is resumed and then immediately suspended again things go off the rails. This is between entries for lines 1053 and 832 above.
>
> The chip seems to malfunction if an erase-resume is followed "too soon" by an erase-suspend.
>
> I'd like to do things like sys_gettid() or perhaps get_timeofday() but I find that if I log more than 2 words per map_write() the failure is masked. Still working on that.
>
> The only other thing I've done is ensure that following an erase-suspend for (;;) loop that map_word_bitset(map, status, CMD(0x40)) is true. If not then the erase that was being suspended completed instead (which can happen according to the data sheet). That assertion has not failed in my testing
so that's not my problem.
>
> I'm still digging but it's really looking like a straight timing problem with this batch of chips. At what point should I declare defeat and propose my simple fix even though it's just for these chips? The file has several instances of similar "Parts XYZ act funny without this even though it
should be unnecessary" workarounds.
>
> -Mike

hmm, this sounds similar(from http://www.numonyx.com/Documents/Specification%20Updates/SU-309045_P30.pdf)

5. W602: erase suspend resume operation
Problem: P30 product may fail to erase in intensive erase/suspend/resume environments. This is
        due to an internal firmware issue that is exhibited in certain applications that require at
       least 3000 to 4000 erase/suspend/resume cycles during the erase of a single block.
Implication: Customer may see erase failure (SR reports “A0”) during a background erase. This
                does not damage the device in any way, but data in the block may be disturbed from its
               original state.
Workaround: If such an erase failure occurs, customer should retry the erase with an increase to the
Tres (W602) spec from 500uS to 1mS. If the device still fails, continue to increase Tres
in increments of 500uS up to a maximum Tres of 2.5mS. Once the failing block passes,
subsequent blocks should revert back to original erase algorithm and timing for Tres
(500uS typical).
Status:
June 2008
309045-09
This erratum does not apply to material marked with an FPO code dated x806xxxx or
later.
Michael Cashwell - Feb. 9, 2011, 9:59 p.m.
On Feb 9, 2011, at 3:13 PM, Joakim Tjernlund wrote:

> hmm, this sounds similar(from http://www.numonyx.com/Documents/Specification%20Updates/SU-309045_P30.pdf)
> 
> 5. W602: erase suspend resume operation
> Problem: P30 product may fail to erase in intensive erase/suspend/resume environments. This is
>        due to an internal firmware issue that is exhibited in certain applications that require at
>       least 3000 to 4000 erase/suspend/resume cycles during the erase of a single block.
> Implication: Customer may see erase failure (SR reports “A0”) during a background erase. This
>                does not damage the device in any way, but data in the block may be disturbed from its
>               original state.
> Workaround: If such an erase failure occurs, customer should retry the erase with an increase to the
> Tres (W602) spec from 500uS to 1mS. If the device still fails, continue to increase Tres
> in increments of 500uS up to a maximum Tres of 2.5mS. Once the failing block passes,
> subsequent blocks should revert back to original erase algorithm and timing for Tres
> (500uS typical).
> Status:
> June 2008
> 309045-09
> This erratum does not apply to material marked with an FPO code dated x806xxxx or
> later.

Interesting. Thanks for the URL. I'll explore it. I have a 2009 errata sheet but had not seen the 2008 one. (I'm wondering if that's because your's is for a batch of the older 130nm parts which are no longer made. Just guessing.)

It doesn't match exactly, since I'm not seeing erase failures. I'm seeing buffered write failures where, after the 0xd0 Write Confirm command I get back nonsense status. (See the 0xffff reported in my log. I've also seen many values other than 0xffff.)

So the the program operation (0xe8) begins OK, the count and data are written, the confirm command is written but then the status read reads as junk. I'm thinking about adding status-read logs to my buffer if I can do it without disturbing the timing.

The reason that status is nonsense in the status register is only in the lower 8 bits. If it's in read-status mode (which it should be) we should never see any set bits in the upper byte. Because I do see such bits I've suspected from the start that somehow the part is exiting the status-read mode and returning array-read mode when it should not. (My very first post mentions this suspicion.)

One more log point I think I may check is to count and report the number of suspend/resumes for each erase. It would be interesting to see if some sort of spike in that count correlates with my failures.

More as I find it.

-Mike
Anders Grafström - Feb. 10, 2011, 1:21 p.m.
On 2011-02-09 22:59, Michael Cashwell wrote:
> On Feb 9, 2011, at 3:13 PM, Joakim Tjernlund wrote:
> 
>> hmm, this sounds similar(from http://www.numonyx.com/Documents/Specification%20Updates/SU-309045_P30.pdf)
>>
>> 5. W602: erase suspend resume operation
>> Problem: P30 product may fail to erase in intensive erase/suspend/resume environments. This is
>>        due to an internal firmware issue that is exhibited in certain applications that require at
>>       least 3000 to 4000 erase/suspend/resume cycles during the erase of a single block.
>> Implication: Customer may see erase failure (SR reports “A0”) during a background erase. This
>>                does not damage the device in any way, but data in the block may be disturbed from its
>>               original state.

I've seen this issue with Intel 28F640J5 chips as well.

There's an old thread on it.
http://lists.infradead.org/pipermail/linux-mtd/2008-April/021266.html

A delay was suggested similar to the one you're experimenting with I think.
http://lists.infradead.org/pipermail/linux-mtd/2008-April/021436.html

/Anders
Joakim Tjernlund - Feb. 10, 2011, 2:04 p.m.
Anders Grafström <grfstrm@users.sourceforge.net> wrote on 2011/02/10 14:21:36:
>
> On 2011-02-09 22:59, Michael Cashwell wrote:
> > On Feb 9, 2011, at 3:13 PM, Joakim Tjernlund wrote:
> >
> >> hmm, this sounds similar(from http://www.numonyx.com/Documents/Specification%20Updates/SU-309045_P30.pdf)
> >>
> >> 5. W602: erase suspend resume operation
> >> Problem: P30 product may fail to erase in intensive erase/suspend/resume environments. This is
> >>        due to an internal firmware issue that is exhibited in certain applications that require at
> >>       least 3000 to 4000 erase/suspend/resume cycles during the erase of a single block.
> >> Implication: Customer may see erase failure (SR reports “A0”) during a background erase. This
> >>                does not damage the device in any way, but data in the block may be disturbed from its
> >>               original state.
>
> I've seen this issue with Intel 28F640J5 chips as well.
>
> There's an old thread on it.
> http://lists.infradead.org/pipermail/linux-mtd/2008-April/021266.html
>
> A delay was suggested similar to the one you're experimenting with I think.
> http://lists.infradead.org/pipermail/linux-mtd/2008-April/021436.html

Oh, I had forgotten about this thread :)

I agree with the latency theory, for Numonyx there is this:

W602 is the typical time between an initial block erase or erase resume command and the a subsequent erase suspend
command. Violating the specification repeatedly during any particular block erase may cause erase failures.

W602 is defined to 500us

Adding a delay after resume will do it but is a bit crude. Possibly one
could add a timestamp at resume/initial erase and a check in suspend that enough time has passed
before suspending again.

How does that sound?

 Jocke
Joakim Tjernlund - Feb. 10, 2011, 2:53 p.m.
>
> On Feb 9, 2011, at 3:13 PM, Joakim Tjernlund wrote:
>
> > hmm, this sounds similar(from http://www.numonyx.com/Documents/Specification%20Updates/SU-309045_P30.pdf)
> >
> > 5. W602: erase suspend resume operation
> > Problem: P30 product may fail to erase in intensive erase/suspend/resume environments. This is
> >        due to an internal firmware issue that is exhibited in certain applications that require at
> >       least 3000 to 4000 erase/suspend/resume cycles during the erase of a single block.
> > Implication: Customer may see erase failure (SR reports “A0”) during a background erase. This
> >                does not damage the device in any way, but data in the block may be disturbed from its
> >               original state.
> > Workaround: If such an erase failure occurs, customer should retry the erase with an increase to the
> > Tres (W602) spec from 500uS to 1mS. If the device still fails, continue to increase Tres
> > in increments of 500uS up to a maximum Tres of 2.5mS. Once the failing block passes,
> > subsequent blocks should revert back to original erase algorithm and timing for Tres
> > (500uS typical).
> > Status:
> > June 2008
> > 309045-09
> > This erratum does not apply to material marked with an FPO code dated x806xxxx or
> > later.
>
> Interesting. Thanks for the URL. I'll explore it. I have a 2009 errata sheet but had not seen the 2008 one. (I'm wondering if that's because your's is for a batch of the older 130nm parts which are no longer made. Just guessing.)
>
> It doesn't match exactly, since I'm not seeing erase failures. I'm seeing buffered write failures where, after the 0xd0 Write Confirm command I get back nonsense status. (See the 0xffff reported in my log. I've also seen many values other than 0xffff.)
>
> So the the program operation (0xe8) begins OK, the count and data are written, the confirm command is written but then the status read reads as junk. I'm thinking about adding status-read logs to my buffer if I can do it without disturbing the timing.
>
> The reason that status is nonsense in the status register is only in the lower 8 bits. If it's in read-status mode (which it should be) we should never see any set bits in the upper byte. Because I do see such bits I've suspected from the start that somehow the part is exiting the status-read mode
and returning array-read mode when it should not. (My very first post mentions this suspicion.)
>
> One more log point I think I may check is to count and report the number of suspend/resumes for each erase. It would be interesting to see if some sort of spike in that count correlates with my failures.
>
> More as I find it.

Mike, what if you move the udelay to just before doing erase suspend? Perhaps
you need to increase it a bit.

 Jocke
Joakim Tjernlund - Feb. 10, 2011, 3:04 p.m.
>
> Anders Grafström <grfstrm@users.sourceforge.net> wrote on 2011/02/10 14:21:36:
> >
> > On 2011-02-09 22:59, Michael Cashwell wrote:
> > > On Feb 9, 2011, at 3:13 PM, Joakim Tjernlund wrote:
> > >
> > >> hmm, this sounds similar(from http://www.numonyx.com/Documents/Specification%20Updates/SU-309045_P30.pdf)
> > >>
> > >> 5. W602: erase suspend resume operation
> > >> Problem: P30 product may fail to erase in intensive erase/suspend/resume environments. This is
> > >>        due to an internal firmware issue that is exhibited in certain applications that require at
> > >>       least 3000 to 4000 erase/suspend/resume cycles during the erase of a single block.
> > >> Implication: Customer may see erase failure (SR reports “A0”) during a background erase. This
> > >>                does not damage the device in any way, but data in the block may be disturbed from its
> > >>               original state.
> >
> > I've seen this issue with Intel 28F640J5 chips as well.
> >
> > There's an old thread on it.
> > http://lists.infradead.org/pipermail/linux-mtd/2008-April/021266.html
> >
> > A delay was suggested similar to the one you're experimenting with I think.
> > http://lists.infradead.org/pipermail/linux-mtd/2008-April/021436.html
>
> Oh, I had forgotten about this thread :)
>
> I agree with the latency theory, for Numonyx there is this:
>
> W602 is the typical time between an initial block erase or erase resume command and the a subsequent erase suspend
> command. Violating the specification repeatedly during any particular block erase may cause erase failures.
>
> W602 is defined to 500us
>
> Adding a delay after resume will do it but is a bit crude. Possibly one
> could add a timestamp at resume/initial erase and a check in suspend that enough time has passed
> before suspending again.
>
> How does that sound?

A simpler impl. would be a suspend counter. When no. of suspends > 1000, do usleep(500)
or usleep(500) every 100-500 suspends.

 Jocke
Michael Cashwell - Feb. 10, 2011, 3:43 p.m.
On Feb 10, 2011, at 9:04 AM, Joakim Tjernlund wrote:

> Anders Grafström <grfstrm@users.sourceforge.net> wrote on 2011/02/10 14:21:36:
> 
>> I've seen this issue with Intel 28F640J5 chips as well.
>> 
>> There's an old thread on it.
>> http://lists.infradead.org/pipermail/linux-mtd/2008-April/021266.html
>> 
>> A delay was suggested similar to the one you're experimenting with I think.
>> http://lists.infradead.org/pipermail/linux-mtd/2008-April/021436.html
> 
> Oh, I had forgotten about this thread :)
> 
> I agree with the latency theory, for Numonyx there is this:
> 
> W602 is the typical time between an initial block erase or erase resume command and the a subsequent erase suspend
> command. Violating the specification repeatedly during any particular block erase may cause erase failures.

Interesting. I saw the W602 value but it also talks about "erase failures" which are not precisely what I'm seeing. I see failed buffered write. But it's sounding like the issue anyway.

I've just instrumented a suspend count and the erase that was active when a failed write occurred was suspended 29 times. That's not really very high so it fits with the idea that the issue is not the number of suspend/resumes but their timing.

> W602 is defined to 500us
> 
> Adding a delay after resume will do it but is a bit crude. Possibly one could add a timestamp at resume/initial erase and a check in suspend that enough time has passed before suspending again.
> 
> How does that sound?


I like this idea but how do we deal with the lack of precision? 500µs "typical" and violating this timing "repeatedly"? Yuk. Without hard numbers coding it will be tricky. And what about this all being Numonyx-specific? Do we need to key off of the manufacturer ID or does it apply to all chips handled by this cmd set?

Lastly, what's the general kernel API for µs resolution time? I recall having issues in the past when high-resolution timers is not enabled in the kernel config.

-Mike
Joakim Tjernlund - Feb. 10, 2011, 3:51 p.m.
Michael Cashwell <mboards@prograde.net> wrote on 2011/02/10 16:43:40:
>
> On Feb 10, 2011, at 9:04 AM, Joakim Tjernlund wrote:
>
> > Anders Grafström <grfstrm@users.sourceforge.net> wrote on 2011/02/10 14:21:36:
> >
> >> I've seen this issue with Intel 28F640J5 chips as well.
> >>
> >> There's an old thread on it.
> >> http://lists.infradead.org/pipermail/linux-mtd/2008-April/021266.html
> >>
> >> A delay was suggested similar to the one you're experimenting with I think.
> >> http://lists.infradead.org/pipermail/linux-mtd/2008-April/021436.html
> >
> > Oh, I had forgotten about this thread :)
> >
> > I agree with the latency theory, for Numonyx there is this:
> >
> > W602 is the typical time between an initial block erase or erase resume command and the a subsequent erase suspend
> > command. Violating the specification repeatedly during any particular block erase may cause erase failures.
>
> Interesting. I saw the W602 value but it also talks about "erase failures" which are not precisely what I'm seeing. I see failed buffered write. But it's sounding like the issue anyway.
>
> I've just instrumented a suspend count and the erase that was active when a failed write occurred was suspended 29 times. That's not really very high so it fits with the idea that the issue is not the number of suspend/resumes but their timing.

29 sounds very low, it may be some additional issue you see.

>
> > W602 is defined to 500us
> >
> > Adding a delay after resume will do it but is a bit crude. Possibly one could add a timestamp at resume/initial erase and a check in suspend that enough time has passed before suspending again.
> >
> > How does that sound?
>
>
> I like this idea but how do we deal with the lack of precision? 500µs "typical" and violating this timing "repeatedly"? Yuk. Without hard numbers coding it will be tricky. And what about this all being Numonyx-specific? Do we need to key off of the manufacturer ID or does it apply to all chips
handled by this cmd set?
>
> Lastly, what's the general kernel API for µs resolution time? I recall having issues in the past when high-resolution timers is not enabled in the kernel config.

I don't know, I have seen drivers(mpc52xx_spi.c) use get_tbl() though.
Michael Cashwell - Feb. 10, 2011, 3:59 p.m.
On Feb 10, 2011, at 10:04 AM, Joakim Tjernlund wrote:

> A simpler impl. would be a suspend counter. When no. of suspends > 1000, do usleep(500) or usleep(500) every 100-500 suspends.

Our messages crossed each other. I just saw this problem with 29 suspends.

I wonder if THAT is what's different between these chips and the older 130nm parts. And none of that addresses whether or not this change is just an anomaly in this batch or if we would continue to get them.

I'm going to run my test on the older 130nm (2003 vintage) parts just for grins.

-Mike
Joakim Tjernlund - Feb. 10, 2011, 4:05 p.m.
Michael Cashwell <mboards@prograde.net> wrote on 2011/02/10 16:59:50:
>
> > A simpler impl. would be a suspend counter. When no. of suspends > 1000, do usleep(500) or usleep(500) every 100-500 suspends.
>
> Our messages crossed each other. I just saw this problem with 29 suspends.
>
> I wonder if THAT is what's different between these chips and the older 130nm parts. And none of that addresses whether or not this change is just an anomaly in this batch or if we would continue to get them.
>
> I'm going to run my test on the older 130nm (2003 vintage) parts just for grins.

Here is another idea, don't resume between every write_buffer etc. If less
than 500 us has passed, just continue with the next write_buffer. That
would be much more efficient.
Michael Cashwell - Feb. 10, 2011, 4:41 p.m.
On Feb 10, 2011, at 11:05 AM, Joakim Tjernlund wrote:

> Michael Cashwell <mboards@prograde.net> wrote on 2011/02/10 16:59:50:
>> 
>>> A simpler impl. would be a suspend counter. When no. of suspends > 1000, do usleep(500) or usleep(500) every 100-500 suspends.
>> 
>> Our messages crossed each other. I just saw this problem with 29 suspends.
>> 
>> I wonder if THAT is what's different between these chips and the older 130nm parts. And none of that addresses whether or not this change is just an anomaly in this batch or if we would continue to get them.
>> 
>> I'm going to run my test on the older 130nm (2003 vintage) parts just for grins.
> 
> Here is another idea, don't resume between every write_buffer etc. If less than 500 us has passed, just continue with the next write_buffer. That would be much more efficient.

Wouldn't that extend the wrong time. That would ensure the erase stays *suspended* for at least 500us. But doesn't W602 specify that an erase should be allowed *to run* for 500us before suspension?

It seems the other way around.

-Mike
Michael Cashwell - Feb. 10, 2011, 4:43 p.m.
On Feb 10, 2011, at 10:59 AM, Michael Cashwell wrote:

> I'm going to run my test on the older 130nm (2003 vintage) parts just for grins.

Yup. That yielded the following:

Older 130nm parts:
run 1:
NOR Flash: block erase suspended and resumed 851 times.
NOR Flash: block erase suspended and resumed 490 times.
NOR Flash: block erase suspended and resumed 16598 times.
NOR Flash: block erase suspended and resumed 16 times.

run 2:
NOR Flash: block erase suspended and resumed 864 times.
NOR Flash: block erase suspended and resumed 16741 times.
NOR Flash: block erase suspended and resumed 9223 times.
NOR Flash: block erase suspended and resumed 16 times.

and suffered no failures.

Troublesome 65nm parts:
run 1:
NOR Flash: block erase suspended and resumed 29 times.
buffered write fails.

run 2:
NOR Flash: block erase suspended and resumed 89 times.
NOR Flash: block erase suspended and resumed 15 times.
buffered write fails.

So this does seem to be the characteristic that has changed.

-Mike
Joakim Tjernlund - Feb. 10, 2011, 4:46 p.m.
Michael Cashwell <mboards@prograde.net> wrote on 2011/02/10 17:41:23:
>
>
> On Feb 10, 2011, at 11:05 AM, Joakim Tjernlund wrote:
>
> > Michael Cashwell <mboards@prograde.net> wrote on 2011/02/10 16:59:50:
> >>
> >>> A simpler impl. would be a suspend counter. When no. of suspends > 1000, do usleep(500) or usleep(500) every 100-500 suspends.
> >>
> >> Our messages crossed each other. I just saw this problem with 29 suspends.
> >>
> >> I wonder if THAT is what's different between these chips and the older 130nm parts. And none of that addresses whether or not this change is just an anomaly in this batch or if we would continue to get them.
> >>
> >> I'm going to run my test on the older 130nm (2003 vintage) parts just for grins.
> >
> > Here is another idea, don't resume between every write_buffer etc. If less than 500 us has passed, just continue with the next write_buffer. That would be much more efficient.
>
> Wouldn't that extend the wrong time. That would ensure the erase stays *suspended* for at least 500us. But doesn't W602 specify that an erase should be allowed *to run* for 500us before suspension?
>
> It seems the other way around.

eeh, yes :(

 Jocke
Joakim Tjernlund - Feb. 10, 2011, 5:02 p.m.
> > > Here is another idea, don't resume between every write_buffer etc. If less than 500 us has passed, just continue with the next write_buffer. That would be much more efficient.
> >
> > Wouldn't that extend the wrong time. That would ensure the erase stays *suspended* for at least 500us. But doesn't W602 specify that an erase should be allowed *to run* for 500us before suspension?
> >
> > It seems the other way around.
>
> eeh, yes :(

And this reminds me that if the spec is to be trusted, the delay should be just before
erase suspend, otherwise you miss the time between the initial erase and the first suspend.

 Jocke
Michael Cashwell - Feb. 10, 2011, 5:10 p.m.
On Feb 10, 2011, at 12:02 PM, Joakim Tjernlund wrote:

> And this reminds me that if the spec is to be trusted, the delay should be just before erase suspend, otherwise you miss the time between the initial erase and the first suspend.

Probably true to be completely sure. I bet the need for "repeated violations" is why I've been able to make it work by delaying after. But I agree.

I have an idea for an efficient fix. But it's tricky to not impact writing speed. Once it get the idea worked out I'll post for comment. I have some other commitments today, so honestly it'll likely be next week at this point.

-Mike
Joakim Tjernlund - Feb. 10, 2011, 5:20 p.m.
Michael Cashwell <mboards@prograde.net> wrote on 2011/02/10 18:10:18:
>
> On Feb 10, 2011, at 12:02 PM, Joakim Tjernlund wrote:
>
> > And this reminds me that if the spec is to be trusted, the delay should be just before erase suspend, otherwise you miss the time between the initial erase and the first suspend.
>
> Probably true to be completely sure. I bet the need for "repeated violations" is why I've been able to make it work by delaying after. But I agree.

Yes, but you also indicated that a throw away status read made the problem go away?
Can you move that status read to just before suspend and get the same results?

>
> I have an idea for an efficient fix. But it's tricky to not impact writing speed. Once it get the idea worked out I'll post for comment. I have some other commitments today, so honestly it'll likely be next week at this point.
>

Nice, looking forward to that.
Joakim Tjernlund - Feb. 10, 2011, 5:47 p.m.
>
> Michael Cashwell <mboards@prograde.net> wrote on 2011/02/10 18:10:18:
> >
> > On Feb 10, 2011, at 12:02 PM, Joakim Tjernlund wrote:
> >
> > > And this reminds me that if the spec is to be trusted, the delay should be just before erase suspend, otherwise you miss the time between the initial erase and the first suspend.
> >
> > Probably true to be completely sure. I bet the need for "repeated violations" is why I've been able to make it work by delaying after. But I agree.
>
> Yes, but you also indicated that a throw away status read made the problem go away?
> Can you move that status read to just before suspend and get the same results?

hmm, Figure 31: Program Suspend/Resume Flowchart and
Figure 36: Erase Suspend/Resume Flowchart actually specifies
a read status(0x70) before suspend(0xb0)
Anders Grafström - Feb. 10, 2011, 5:54 p.m.
On 2011-02-10 17:43, Michael Cashwell wrote:
> On Feb 10, 2011, at 10:59 AM, Michael Cashwell wrote:
> Troublesome 65nm parts:
> run 1:
> NOR Flash: block erase suspended and resumed 29 times.
> buffered write fails.

Buffered write fails with SR[4] set?

Have you configured with CONFIG_MTD_UBI_DEBUG_PARANOID
so that UBI does the all 0xff check after erase?

I'm wondering if the blocks being written to are in a properly erased state.

/Anders
Joakim Tjernlund - Feb. 10, 2011, 6:26 p.m.
>
> >
> > Michael Cashwell <mboards@prograde.net> wrote on 2011/02/10 18:10:18:
> > >
> > > On Feb 10, 2011, at 12:02 PM, Joakim Tjernlund wrote:
> > >
> > > > And this reminds me that if the spec is to be trusted, the delay should be just before erase suspend, otherwise you miss the time between the initial erase and the first suspend.
> > >
> > > Probably true to be completely sure. I bet the need for "repeated violations" is why I've been able to make it work by delaying after. But I agree.
> >
> > Yes, but you also indicated that a throw away status read made the problem go away?
> > Can you move that status read to just before suspend and get the same results?
>
> hmm, Figure 31: Program Suspend/Resume Flowchart and
> Figure 36: Erase Suspend/Resume Flowchart actually specifies
> a read status(0x70) before suspend(0xb0)

Furthermore, the read status has to be to the same "partition" and you have
to end the resume with a read status too
 "If the suspended partition was placed in Read Array mode or a Program Loop"

Mike, I think some of this supports your findings.
I guess we need to save the block address which are being erased
to fulfil "same partition"

         Jocke
Michael Cashwell - Feb. 11, 2011, 3:05 p.m.
On Feb 10, 2011, at 12:54 PM, Anders Grafström wrote:

> On 2011-02-10 17:43, Michael Cashwell wrote:
>> On Feb 10, 2011, at 10:59 AM, Michael Cashwell wrote:
>> Troublesome 65nm parts:
>> run 1:
>> NOR Flash: block erase suspended and resumed 29 times.
>> buffered write fails.
> 
> Buffered write fails with SR[4] set?

The error messages variously report block locked and Vpp errors but they are not legitimate. I changed them to actually print the status word and it's often 0xffff or other values with set bits in the upper byte. That tells me the chip is not properly in status-read mode at that point.

So I think the answer to your question is no, not reliably with SR[4] set because it's reading array data instead of status.

> Have you configured with CONFIG_MTD_UBI_DEBUG_PARANOID so that UBI does the all 0xff check after erase?

I've used some of the debugging modes at the top of cfi_cmdset_0001.c but I had not used the config option. I will do that and report back.

> I'm wondering if the blocks being written to are in a properly erased state.

Yes, it's a possibility. However, I've never seen an erase report any error.

More next week.

-Mike
Joakim Tjernlund - Feb. 11, 2011, 5 p.m.
Anders Grafström <grfstrm@users.sourceforge.net> wrote on 2011/02/10 18:54:31:
>
> On 2011-02-10 17:43, Michael Cashwell wrote:
> > On Feb 10, 2011, at 10:59 AM, Michael Cashwell wrote:
> > Troublesome 65nm parts:
> > run 1:
> > NOR Flash: block erase suspended and resumed 29 times.
> > buffered write fails.
>
> Buffered write fails with SR[4] set?
>
> Have you configured with CONFIG_MTD_UBI_DEBUG_PARANOID
> so that UBI does the all 0xff check after erase?
>
> I'm wondering if the blocks being written to are in a properly erased state.

Looking at the write buffer flow chart for Numonyx I noted that it
specifies that while waiting for the write buffer one have to retry the
WB cmd each time, a la:

	for (;;) {
		map_write(map, CMD(0xe8), cmd_adr);
		status = map_read(map, cmd_adr);
		if (map_word_andequal(map, status, CMD(0x80), CMD(0x80)))
			break;
		if (timeout) abort write
	}

What does other specs say?

 Jocke
Michael Cashwell - Feb. 11, 2011, 6:03 p.m.
On Feb 10, 2011, at 1:26 PM, Joakim Tjernlund wrote:

>> hmm, Figure 31: Program Suspend/Resume Flowchart and
>> Figure 36: Erase Suspend/Resume Flowchart actually specifies a read status(0x70) before suspend(0xb0)

I see the Erase Suspend/Resume Flowchart as figure 40, not 36. The sheet info is: 320002-10 Mar 2010. What's yours?

> Furthermore, the read status has to be to the same "partition" and you have to end the resume with a read status too "If the suspended partition was placed in Read Array mode or a Program Loop"

Mine says NOT to do that. You can program or read from any OTHER partition, but trying to do that to the same one that was being erased is not valid.

My Erase Suspend/Resume Flowchart says to do a read-status (0x70) before the 0xb0 suspend. (Interesting that we weren't doing that!) The the address for both must be the same, but can be "any device address."

I don't see where they say the resume address must match the address of the erase.

-Mike
Joakim Tjernlund - Feb. 12, 2011, 10:47 a.m.
Michael Cashwell <mboards@prograde.net> wrote on 2011/02/11 19:03:00:
>
> On Feb 10, 2011, at 1:26 PM, Joakim Tjernlund wrote:
>
> >> hmm, Figure 31: Program Suspend/Resume Flowchart and
> >> Figure 36: Erase Suspend/Resume Flowchart actually specifies a read status(0x70) before suspend(0xb0)
>
> I see the Erase Suspend/Resume Flowchart as figure 40, not 36. The sheet info is: 320002-10 Mar 2010. What's yours?

  Apr 2010
  Order Number: 208042-05
from
  http://www.numonyx.com/en-US/MemoryProducts/NOR/Pages/P30P33Documents.aspx
>
> > Furthermore, the read status has to be to the same "partition" and you have to end the resume with a read status too "If the suspended partition was placed in Read Array mode or a Program Loop"
>
> Mine says NOT to do that. You can program or read from any OTHER partition, but trying to do that to the same one that was being erased is not valid.

Partition != erase block, not sure how though.

>
> My Erase Suspend/Resume Flowchart says to do a read-status (0x70) before the 0xb0 suspend. (Interesting that we weren't doing that!) The the address for both must be the same, but can be "any device address."
>
> I don't see where they say the resume address must match the address of the erase.

It says Same Partition which will map to Any Address if you only got one partition.

BTW, when we went from 130 to 65 nm parts, bus timing changed to slower and we
had to adjust our chip select logic to match. The 65 nm parts needed some extra
time for the read access. Have you adjusted yours?

 Jocke
Michael Cashwell - Feb. 14, 2011, 3:44 p.m.
On Feb 10, 2011, at 12:20 PM, Joakim Tjernlund wrote:

> Michael Cashwell <mboards@prograde.net> wrote on 2011/02/10 18:10:18:
>> On Feb 10, 2011, at 12:02 PM, Joakim Tjernlund wrote:
>> 
>>> And this reminds me that if the spec is to be trusted, the delay should be just before erase suspend, otherwise you miss the time between the initial erase and the first suspend.
>> 
>> Probably true to be completely sure. I bet the need for "repeated violations" is why I've been able to make it work by delaying after. But I agree.
> 
> Yes, but you also indicated that a throw away status read made the problem go away?
> Can you move that status read to just before suspend and get the same results?

Well, sort of. The place where I've been doing the udelay or throw-away status read is when resuming, not suspending.

But interestingly, doing it right before the resume works just as well as right after.

-Mike
Michael Cashwell - Feb. 14, 2011, 3:59 p.m.
To verify things are as expected when doing the erase suspend I added some status checks. That section of chip_read() shown next. I wanted to be sure that once the for (;;) loop ends that we were properly in a suspended erase state with no unexpected status bits.

	case FL_ERASING:
...
		/* Erase suspend */
		map_write(map, CMD(0x70), adr);
		map_write(map, CMD(0xB0), adr);
...
		map_write(map, CMD(0x70), adr);
		chip->oldstate = FL_ERASING;
		chip->state = FL_ERASE_SUSPENDING;
		chip->erase_suspended = 1;
		for (;;) {
			status = map_read(map, adr);
			if (map_word_andequal(map, status, status_OK, status_OK))
			        break;
...
		}
		if (!map_word_bitsset(map, status, CMD(0x40)))
			printk(KERN_ERR "%s: Erase-suspend completed instead.\n", map->name);

		if (map_word_bitsset(map, status, CMD(0x3f))) {
			printk(KERN_ERR "%s: Erase-suspend had unexpected status bits %lx.\n", map->name, status.x[0]);
			BUG();
		}

		chip->state = FL_STATUS;
		return 0;

During UBI writing via rsync things failed:

NOR Flash: Erase-suspend had unexpected status bits c4.
kernel BUG at drivers/mtd/chips/cfi_cmdset_0001.c:874!
Unable to handle kernel NULL pointer dereference at virtual address 00000000
pgd = c7ee4000
*pte=0000000025
Internal error: Oops: 817 [#1] PREEMPT
last sysfs file: /sys/class/ubi/ubi4/ubi4_0/name
Modules linked in: sc16is7uart gpiopps gpiodriver
CPU: 0    Not tainted  (2.6.35.7 #22)
PC is at __bug+0x1c/0x28
LR is at __bug+0x18/0x28
pc : [<c0026f40>]    lr : [<c0026f3c>]    psr: 60000013
sp : c7ee1b78  ip : 00000000  fp : 00000188
r10: c7cf2f10  r9 : ffffda32  r8 : 017a3000
r7 : c7cf2ef8  r6 : c0349730  r5 : 000000c4  r4 : c7c857ac
r3 : 00000000  r2 : 00020000  r1 : 00002040  r0 : 0000003d
Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
Control: 0000397f  Table: a7ee4000  DAC: 00000015
Process rsync (pid: 399, stack limit = 0xc7ee0278)
Stack: (0xc7ee1b78 to 0xc7ee2000)
...
[<c0026f40>] (__bug+0x1c/0x28) from [<c01a5980>] (chip_ready+0x20c/0x36c)
[<c01a5980>] (chip_ready+0x20c/0x36c) from [<c01a5f48>] (get_chip+0x8c/0x224)
[<c01a5f48>] (get_chip+0x8c/0x224) from [<c01a6648>] (cfi_intelext_writev+0x1a4/0x600)
[<c01a6648>] (cfi_intelext_writev+0x1a4/0x600) from [<c01a6ad4>] (cfi_intelext_write_buffers+0x30/0x38)
[<c01a6ad4>] (cfi_intelext_write_buffers+0x30/0x38) from [<c019f5c4>] (part_write+0x8c/0xa0)
[<c019f5c4>] (part_write+0x8c/0xa0) from [<c01b0ddc>] (ubi_io_write+0x4c/0xa4)
[<c01b0ddc>] (ubi_io_write+0x4c/0xa4) from [<c01af22c>] (ubi_eba_write_leb+0x7c/0x770)
[<c01af22c>] (ubi_eba_write_leb+0x7c/0x770) from [<c01ae328>] (ubi_leb_write+0xf4/0xf8)
[<c01ae328>] (ubi_leb_write+0xf4/0xf8) from [<c0134020>] (ubifs_wbuf_write_nolock+0x230/0x324)
[<c0134020>] (ubifs_wbuf_write_nolock+0x230/0x324) from [<c0129704>] (write_head.clone.12.clone.14+0x40/0x60)
[<c0129704>] (write_head.clone.12.clone.14+0x40/0x60) from [<c01299c0>] (ubifs_jnl_update+0x29c/0x4f4)
[<c01299c0>] (ubifs_jnl_update+0x29c/0x4f4) from [<c012ea04>] (ubifs_create+0x114/0x1b8)
[<c012ea04>] (ubifs_create+0x114/0x1b8) from [<c009683c>] (vfs_create+0x84/0x88)
[<c009683c>] (vfs_create+0x84/0x88) from [<c00997d0>] (do_last.clone.55+0x55c/0x63c)
[<c00997d0>] (do_last.clone.55+0x55c/0x63c) from [<c0099a40>] (do_filp_open+0x190/0x4e4)
[<c0099a40>] (do_filp_open+0x190/0x4e4) from [<c008d4c0>] (do_sys_open+0x5c/0xa8)
[<c008d4c0>] (do_sys_open+0x5c/0xa8) from [<c0023f40>] (ret_fast_syscall+0x0/0x2c)
Code: e59f0010 e1a01003 eb098786 e3a03000 (e5833000) 

The c4 status above is strange. chip->status was FL_ERASING but after the suspend we had both ESS and PSS status set. That means that both and erase and a program have been suspended.

Huh? How do we ever get PSS set? I thought that without xip the code would not suspend writes. So I'm quite confused by this.

It's also interesting that do_write_buffer() is not shown in the stack dump. I assume it's being inlined.

Still looking...

-Mike
Joakim Tjernlund - Feb. 24, 2011, 10:50 a.m.
Anders Grafström <grfstrm@users.sourceforge.net> wrote on 2011/02/10 14:21:36:
>
> On 2011-02-09 22:59, Michael Cashwell wrote:
> > On Feb 9, 2011, at 3:13 PM, Joakim Tjernlund wrote:
> >
> >> hmm, this sounds similar(from http://www.numonyx.com/Documents/Specification%20Updates/SU-309045_P30.pdf)
> >>
> >> 5. W602: erase suspend resume operation
> >> Problem: P30 product may fail to erase in intensive erase/suspend/resume environments. This is
> >>        due to an internal firmware issue that is exhibited in certain applications that require at
> >>       least 3000 to 4000 erase/suspend/resume cycles during the erase of a single block.
> >> Implication: Customer may see erase failure (SR reports “A0”) during a background erase. This
> >>                does not damage the device in any way, but data in the block may be disturbed from its
> >>               original state.
>
> I've seen this issue with Intel 28F640J5 chips as well.
>
> There's an old thread on it.
> http://lists.infradead.org/pipermail/linux-mtd/2008-April/021266.html
>
> A delay was suggested similar to the one you're experimenting with I think.
> http://lists.infradead.org/pipermail/linux-mtd/2008-April/021436.html

Been a while but it just occurred to me that you might see something similar as
Michael.

This program of yours:

static void erase_suspend(unsigned int loops)
{
	volatile unsigned int *flash = (unsigned int *)0x41040000;
	unsigned int i;
	unsigned int suspends = 0;
	unsigned int words = 2;
	unsigned int timeout;

	/* Make sure there's something to erase */
	for (i = 0; i < 0x10000; i++) {
		if (flash[i] == 0xffffffff) {
			flash[i] = 0x00400040;
			flash[i] = 0;
			while ((flash[i] & 0x00800080) != 0x00800080);
		}
	}

	/* Clear status */
	flash[0] = 0x00500050;

	/* Erase */
	flash[0] = 0x00200020;
	flash[0] = 0x00d000d0;

	while (1) {
		/* Suspend */
		flash[0] = 0x00b000b0;
		flash[0] = 0x00700070;
		timeout = 100000;
		while ((flash[0] & 0x00800080) != 0x00800080 && --timeout);
		if (!timeout)
			printf("suspend timeout\n");
		suspends++;

		/* Short delay */
		for (i = 0; i < 1000; i++);

		/* Resume */
		flash[0] = 0x00d000d0;
		flash[0] = 0x00700070;

		/* Erase done ? */
		if ((flash[0] & 0x00800080) == 0x00800080) {
			printf("\nstatus %08x\n", flash[0]);
			break;
		}

		/* Short delay */
		for (i = 0; i < loops; i++);
	}

	/* Read array */
	flash[0] = 0x00ff00ff;

	/* Show the first 2 failed words */
	for (i = 0; i < 0x10000; i++) {
		if (flash[i] != 0xffffffff) {
			printf("%08x\n", flash[i]);
			if (--words == 0)
				break;
		}
	}

	printf("%d suspends\n", suspends);
}

void test_flash_erase_suspend(void)
{
	unsigned int loops;

	for (loops = 6000; loops > 2000; loops -= 500) {
		printf("\n%d loops", loops);
		erase_suspend(loops);
	}
}


It produced this output at one occasion:

6000 loops
status 00800080
ffff0000
ffff0000
4783 suspends
---------------------
See the status 00800080 ?
Looking at the code this should not be possible, right?
looking closer though it becomes clearer:

if ((flash[0] & 0x00800080) == 0x00800080) {
	printf("\nstatus %08x\n", flash[0]);
	break;
}
There are two status reads here, one where you test and one where you
print. I suspect that if you add a dummy status read like Michael need to do:
stat = flash[0]; /* dummy, will contain garbage sometimes */
stat = flash[0];
if ((stat & 0x00800080) == 0x00800080) {
	printf("\nstatus %08x\n", stat);
	break;
}
You will see an improvement.

Michael, have you made any progress? Can you run Anders program too?

 Jocke
Joakim Tjernlund - Feb. 24, 2011, 11:36 a.m.
>
> Anders Grafström <grfstrm@users.sourceforge.net> wrote on 2011/02/10 14:21:36:
> >
> > On 2011-02-09 22:59, Michael Cashwell wrote:
> > > On Feb 9, 2011, at 3:13 PM, Joakim Tjernlund wrote:
> > >
> > >> hmm, this sounds similar(from http://www.numonyx.com/Documents/Specification%20Updates/SU-309045_P30.pdf)
> > >>
> > >> 5. W602: erase suspend resume operation
> > >> Problem: P30 product may fail to erase in intensive erase/suspend/resume environments. This is
> > >>        due to an internal firmware issue that is exhibited in certain applications that require at
> > >>       least 3000 to 4000 erase/suspend/resume cycles during the erase of a single block.
> > >> Implication: Customer may see erase failure (SR reports “A0”) during a background erase. This
> > >>                does not damage the device in any way, but data in the block may be disturbed from its
> > >>               original state.
> >
> > I've seen this issue with Intel 28F640J5 chips as well.
> >
> > There's an old thread on it.
> > http://lists.infradead.org/pipermail/linux-mtd/2008-April/021266.html
> >
> > A delay was suggested similar to the one you're experimenting with I think.
> > http://lists.infradead.org/pipermail/linux-mtd/2008-April/021436.html
>
> Been a while but it just occurred to me that you might see something similar as
> Michael.
>
> This program of yours:
>
> static void erase_suspend(unsigned int loops)
> {
>    volatile unsigned int *flash = (unsigned int *)0x41040000;
>    unsigned int i;
>    unsigned int suspends = 0;
>    unsigned int words = 2;
>    unsigned int timeout;
>
>    /* Make sure there's something to erase */
>    for (i = 0; i < 0x10000; i++) {
>       if (flash[i] == 0xffffffff) {
>          flash[i] = 0x00400040;
>          flash[i] = 0;
>          while ((flash[i] & 0x00800080) != 0x00800080);
>       }
>    }
>
>    /* Clear status */
>    flash[0] = 0x00500050;
>
>    /* Erase */
>    flash[0] = 0x00200020;
>    flash[0] = 0x00d000d0;
>
>    while (1) {
>       /* Suspend */
>       flash[0] = 0x00b000b0;
>       flash[0] = 0x00700070;
>       timeout = 100000;
>       while ((flash[0] & 0x00800080) != 0x00800080 && --timeout);
>       if (!timeout)
>          printf("suspend timeout\n");
>       suspends++;
>
>       /* Short delay */
>       for (i = 0; i < 1000; i++);
>
>       /* Resume */
>       flash[0] = 0x00d000d0;
>       flash[0] = 0x00700070;
>
>       /* Erase done ? */
>       if ((flash[0] & 0x00800080) == 0x00800080) {
>          printf("\nstatus %08x\n", flash[0]);
>          break;
>       }
>
>       /* Short delay */
>       for (i = 0; i < loops; i++);
>    }
>
>    /* Read array */
>    flash[0] = 0x00ff00ff;
>
>    /* Show the first 2 failed words */
>    for (i = 0; i < 0x10000; i++) {
>       if (flash[i] != 0xffffffff) {
>          printf("%08x\n", flash[i]);
>          if (--words == 0)
>             break;
>       }
>    }
>
>    printf("%d suspends\n", suspends);
> }
>
> void test_flash_erase_suspend(void)
> {
>    unsigned int loops;
>
>    for (loops = 6000; loops > 2000; loops -= 500) {
>       printf("\n%d loops", loops);
>       erase_suspend(loops);
>    }
> }
>
>
> It produced this output at one occasion:
>
> 6000 loops
> status 00800080
> ffff0000
> ffff0000
> 4783 suspends
> ---------------------
> See the status 00800080 ?
> Looking at the code this should not be possible, right?

ehh, looking at my own post I see that I misread == as !=


> looking closer though it becomes clearer:
>
> if ((flash[0] & 0x00800080) == 0x00800080) {
>    printf("\nstatus %08x\n", flash[0]);
>    break;
> }
> There are two status reads here, one where you test and one where you
> print. I suspect that if you add a dummy status read like Michael need to do:
> stat = flash[0]; /* dummy, will contain garbage sometimes */
> stat = flash[0];
> if ((stat & 0x00800080) == 0x00800080) {
>    printf("\nstatus %08x\n", stat);
>    break;
> }
> You will see an improvement.
>
> Michael, have you made any progress? Can you run Anders program too?
>
>  Jocke
>
>
> ______________________________________________________
> Linux MTD discussion mailing list
> http://lists.infradead.org/mailman/listinfo/linux-mtd/
Michael Cashwell - Feb. 24, 2011, 2:28 p.m.
Anders Grafström <grfstrm@users.sourceforge.net> wrote on 2011/02/10 14:21:36:

> It produced this output at one occasion:
> 
> 6000 loops
> status 00800080
> ffff0000
> ffff0000
> 4783 suspends
> ---------------------
> See the status 00800080 ?
> Looking at the code this should not be possible, right?


It's similar but not identical to my issue. I was seeing a buffered programming operation (done while an erase was suspended) appear to fail because the status read looking for program completion would return garbage. Your case is only erase, not programming.

But yours looks similar to mine in that your resumed erase completes (00800080) and then the next several status read returns garbage. Interestingly, the low chip returns correct status (WSM busy, nothing suspended, no errors, etc.) but the high chip returns FFFF. That was often what mine did during buffer program.

I also sometimes would see other values, not always FFFF. But with bits set in the upper byte whatever was being read was clearly NOT status. Since FFFF is correct for erased array data I assumed the part had dropped out of status-read mode but I've not been able to determine why.

Also, on my parts there is a loose restriction on the minimum time between starting or resuming an erase and the next suspend of it. It's 500µs for my part and they say that "repeated violations" of that time can cause erase failures. The hard part is that the 500µs is only a typical value and no hard number is given for how many violations are needed to cause a failure. I suspect it varies from part to part.

Your code suspends immediately after starting the erase, so that would be a violation on my part. And depending on the value of loops you might be violating it often.

On Feb 24, 2011, at 6:36 AM, Joakim Tjernlund wrote:

> Michael, have you made any progress?

Sadly, management was breathing down my neck to get it working and I've been pulled completely off onto a different project. I do still want to find the real answer but currently the people who control the purse strings have other ideas. There is another board in the pipeline here that might have this part on it. When that arrives on my desk I'll see if I can pick this up again.

> Can you run Anders program too?

Not as written. My board only has one flash chip where Anders' clearly has two chips in parallel. It wouldn't be too hard to change it to 1-wide but I'm overloaded on my other project at present.

-Mike

Patch

diff --git a/drivers/mtd/chips/cfi_cmdset_0001.c 
b/drivers/mtd/chips/cfi_cmdset_0001.c
index d3b2cd3..ea59726 100644
--- a/drivers/mtd/chips/cfi_cmdset_0001.c
+++ b/drivers/mtd/chips/cfi_cmdset_0001.c
@@ -37,6 +37,8 @@ 
  #include <linux/mtd/compatmac.h>
  #include <linux/mtd/cfi.h>

+#include <linux/syscalls.h>
+
  /* #define CMDSET0001_DISABLE_ERASE_SUSPEND_ON_WRITE */
  /* #define CMDSET0001_DISABLE_WRITE_SUSPEND */

@@ -799,6 +801,8 @@  static int chip_ready (struct map_info *map, struct 
flchip *chip, unsigned long

          /* Erase suspend */
          map_write(map, CMD(0xB0), adr);
+        printk("[%10u][%04ld] erase suspend 1         adr=0x%08lx\n",
+               jiffies_to_usecs(jiffies)/1000, sys_gettid(), adr);

          /* If the flash has finished erasing, then 'erase suspend'