diff mbox

mtd: cfi_cmdset_0001 - fixup for PC28F512P33TFA

Message ID 1396287151.4857.50.camel@mars
State Changes Requested
Headers show

Commit Message

Christoph Fritz March 31, 2014, 5:32 p.m. UTC
Add a fixup (quirk) for Micron NOR flash PC28F512P33TFA: This patch
disables option 'suspend erase' due to "internal timing issues on
some revisions of the P3x" which is pointed out by Micron TechNote-PDF
"Adapting the Linux Kernel for Micron P30, P33, and J3 Flash Memory" [1].

Without this patch, errors occur during heavy flash usage mostly in minus
degree temperature environments.

It's very likely that other P3x chips suffer from the same issue.

[1]: https://www.micron.com/~/media/Documents/Products/Technical%20Note/NOR%20Flash/tn1206_p30_p33_j3_linux.pdf

Signed-off-by: Christoph Fritz <chf.fritz@googlemail.com>
Tested-by: Marius Achilles <M.Achilles@phytec.de>
---
 drivers/mtd/chips/cfi_cmdset_0001.c |   13 +++++++++++++
 1 file changed, 13 insertions(+)

Comments

Joakim Tjernlund March 31, 2014, 8:31 p.m. UTC | #1
"linux-mtd" <linux-mtd-bounces@lists.infradead.org> wrote on 2014/03/31 
19:32:31:
> 
> Add a fixup (quirk) for Micron NOR flash PC28F512P33TFA: This patch
> disables option 'suspend erase' due to "internal timing issues on
> some revisions of the P3x" which is pointed out by Micron TechNote-PDF
> "Adapting the Linux Kernel for Micron P30, P33, and J3 Flash Memory" 
[1].
> 
> Without this patch, errors occur during heavy flash usage mostly in 
minus
> degree temperature environments.
> 
> It's very likely that other P3x chips suffer from the same issue.
> 
> [1]: 
https://www.micron.com/~/media/Documents/Products/Technical%20Note/NOR%20Flash/tn1206_p30_p33_j3_linux.pdf

> 
> Signed-off-by: Christoph Fritz <chf.fritz@googlemail.com>
> Tested-by: Marius Achilles <M.Achilles@phytec.de>

No way, disabling erase suspend will cause severe latencies for read/write 
ops.

The document mentions other ways to get around this problem, I suggest you 
explore
these first.
Minus degrees is not a common environment either so perhaps any fix should 
be selectable?
 
 Jocke
Christoph Fritz March 31, 2014, 10:04 p.m. UTC | #2
On Mon, 2014-03-31 at 22:31 +0200, Joakim Tjernlund wrote:
> No way, disabling erase suspend will cause severe latencies for read/write 
> ops.

Here on a mx35 it's not really severe.

> The document mentions other ways to get around this problem, I suggest you 
> explore
> these first.

Already tried the “0xFF” dummy write cycle suggestion, should have
mentioned that.

> Minus degrees is not a common environment either so perhaps any fix should 
> be selectable?

The NOR is specified for down to -40°C.

 Thanks
  -- Christoph
Joakim Tjernlund April 1, 2014, 8:29 a.m. UTC | #3
Christoph Fritz <chf.fritz@googlemail.com> wrote on 2014/04/01 00:04:20:
> 

> On Mon, 2014-03-31 at 22:31 +0200, Joakim Tjernlund wrote:

> > No way, disabling erase suspend will cause severe latencies for 

read/write 
> > ops.

> 

> Here on a mx35 it's not really severe.


There has been complaints in the past with no erase suspend.
You get delays considering that an erase takes c.a 1 sek so
once you get into a state where GC is erasing lots of blocks and other
progs wants read/write you will notice.

> 

> > The document mentions other ways to get around this problem, I suggest 

you 
> > explore

> > these first.

> 

> Already tried the “0xFF” dummy write cycle suggestion, should have

> mentioned that.


OK, but one more workaround(the udelay part) is required, right?

Also, the doc. is written in 2011 and only mentions some revisions of P3x 
has this
problem. Are these problems still present in current flash parts?

> 

> > Minus degrees is not a common environment either so perhaps any fix 

should 
> > be selectable?

> 

> The NOR is specified for down to -40°C.


That is a lot I guess but not everyone runs the equipment at these 
temperatures.
Joakim Tjernlund April 1, 2014, 11:15 a.m. UTC | #4
Joakim Tjernlund/Transmode wrote on 2014/04/01 10:29:43:
> 

> Christoph Fritz <chf.fritz@googlemail.com> wrote on 2014/04/01 00:04:20:

> > 

> > On Mon, 2014-03-31 at 22:31 +0200, Joakim Tjernlund wrote:

> > > No way, disabling erase suspend will cause severe latencies for 

read/write 
> > > ops.

> > 

> > Here on a mx35 it's not really severe.

> 

> There has been complaints in the past with no erase suspend.

> You get delays considering that an erase takes c.a 1 sek so

> once you get into a state where GC is erasing lots of blocks and other

> progs wants read/write you will notice.

> 

> > 

> > > The document mentions other ways to get around this problem, I 

suggest you 
> > > explore

> > > these first.

> > 

> > Already tried the “0xFF” dummy write cycle suggestion, should have

> > mentioned that.

> 

> OK, but one more workaround(the udelay part) is required, right?


There something odd with the patches in the mentioned doc.
1) "Resolving ERASE SUSPEND Hangups" touches the same code as
2) "ERASE SUSPEND Following ERASE RESUME"

1) says to add write(0xff) and 2) adds a udelay

What should it be, both? in what order?

I suspect that any write(0xff) can be there unconditionally, provided that 
the cmd0001 spec allows it.
The delay could be a quirk default to 0

 Jocke

 Jocke
Christoph Fritz April 1, 2014, 3:09 p.m. UTC | #5
Hi Joakim,

 thanks for your input, please see my further comments below.

On Tue, 2014-04-01 at 13:15 +0200, Joakim Tjernlund wrote:
> > There has been complaints in the past with no erase suspend.
> > You get delays considering that an erase takes c.a 1 sek so
> > once you get into a state where GC is erasing lots of blocks and other
> > progs wants read/write you will notice.

Yes it's not a fine solution, it's a quirk for a hardware with issues.
But here for me it's better to get delays than having to suffer from
flash errors.

> > > > The document mentions other ways to get around this problem, I 
> suggest you 
> > > > explore
> > > > these first.
> > > 
> > > Already tried the “0xFF” dummy write cycle suggestion, should have
> > > mentioned that.
> > 
> > OK, but one more workaround(the udelay part) is required, right?
> 
> There something odd with the patches in the mentioned doc.
> 1) "Resolving ERASE SUSPEND Hangups" touches the same code as
> 2) "ERASE SUSPEND Following ERASE RESUME"
> 
> 1) says to add write(0xff) and 2) adds a udelay
> 
> What should it be, both? in what order?
> 
> I suspect that any write(0xff) can be there unconditionally, provided that 
> the cmd0001 spec allows it.
> The delay could be a quirk default to 0

I already tried all sorts of combinations of various delays and 0xff
writes, which can be used wasteful without impact. And yes, errors are
happening less -- but are still happening (at for example -2°C starting
udev or a filesys-bench program like bonnie++).

Sure, it's possible that I missed the holy right
delay-0xff-quirk-combination to get this NOR flash reliable working for
its specified temperature range. But until nobody has found it, I would
prefer to stick to my posted quirk and just disable suspend erase.

Maybe someone from Micron can help?

 Thanks
  -- Christoph
Joakim Tjernlund April 1, 2014, 7:13 p.m. UTC | #6
Christoph Fritz <chf.fritz@googlemail.com> wrote on 2014/04/01 17:09:51:
> 

> Hi Joakim,

> 

>  thanks for your input, please see my further comments below.

> 

> On Tue, 2014-04-01 at 13:15 +0200, Joakim Tjernlund wrote:

> > > There has been complaints in the past with no erase suspend.

> > > You get delays considering that an erase takes c.a 1 sek so

> > > once you get into a state where GC is erasing lots of blocks and 

other
> > > progs wants read/write you will notice.

> 

> Yes it's not a fine solution, it's a quirk for a hardware with issues.

> But here for me it's better to get delays than having to suffer from

> flash errors.

> 

> > > > > The document mentions other ways to get around this problem, I 

> > suggest you 

> > > > > explore

> > > > > these first.

> > > > 

> > > > Already tried the “0xFF” dummy write cycle suggestion, should have

> > > > mentioned that.

> > > 

> > > OK, but one more workaround(the udelay part) is required, right?

> > 

> > There something odd with the patches in the mentioned doc.

> > 1) "Resolving ERASE SUSPEND Hangups" touches the same code as

> > 2) "ERASE SUSPEND Following ERASE RESUME"

> > 

> > 1) says to add write(0xff) and 2) adds a udelay

> > 

> > What should it be, both? in what order?

> > 

> > I suspect that any write(0xff) can be there unconditionally, provided 

that 
> > the cmd0001 spec allows it.

> > The delay could be a quirk default to 0

> 

> I already tried all sorts of combinations of various delays and 0xff

> writes, which can be used wasteful without impact. And yes, errors are

> happening less -- but are still happening (at for example -2°C starting

> udev or a filesys-bench program like bonnie++).


Amazing Micron still sells these defect chips

> 

> Sure, it's possible that I missed the holy right

> delay-0xff-quirk-combination to get this NOR flash reliable working for

> its specified temperature range. But until nobody has found it, I would

> prefer to stick to my posted quirk and just disable suspend erase.


Problem is, if you add that quirk no-one will be able to remove it later 
so we
better make sure there isn't a less intrusive solution.

hmm, have you checked you bus timing? Perhaps you need to relax it 
somewhat.

> 

> Maybe someone from Micron can help?


Yes, you should contact them.

 Thanks
         Jocke
Christoph Fritz April 1, 2014, 11:10 p.m. UTC | #7
On Tue, 2014-04-01 at 21:13 +0200, Joakim Tjernlund wrote:
> Christoph Fritz <chf.fritz@googlemail.com> wrote on 2014/04/01 17:09:51:
> > I already tried all sorts of combinations of various delays and 0xff
> > writes, which can be used wasteful without impact. And yes, errors are
> > happening less -- but are still happening (at for example -2°C starting
> > udev or a filesys-bench program like bonnie++).
> 
> Amazing Micron still sells these defect chips

Yeah, and it was pain to do all these kinds of tests to get it reliably
working for a real world application where minus degrees can happen. For
me it would be no problem to keep this quirk in my private queue, but
involving a bigger audience seems to be right considering the facts.

> > Sure, it's possible that I missed the holy right
> > delay-0xff-quirk-combination to get this NOR flash reliable working for
> > its specified temperature range. But until nobody has found it, I would
> > prefer to stick to my posted quirk and just disable suspend erase.
> 
> Problem is, if you add that quirk no-one will be able to remove it later 
> so we
> better make sure there isn't a less intrusive solution.

What about a comment or printk which makes that clear? I anyway don't
think that there are interested users of this specific NOR flash left
using a current kernel -- but who knows, right.

> hmm, have you checked you bus timing? Perhaps you need to relax it 
> somewhat.

Sure, I rechecked the WEIM interface (imx35 bus) settings more than
once. I also did testings with most relaxed settings (highest delays
possible). But still, flash errors do occur.

I should mention that a PC28F256P33BFE (a similar 32 MB NOR flash) works
without any quirk in all my testing environments just reliable as it
should be.
 
> > Maybe someone from Micron can help?
> 
> Yes, you should contact them.

Is there anybody out there knowing a contact without the need of going
through a business-ticket-support-hell ?

Thanks
  -- Christoph
Joakim Tjernlund April 2, 2014, 7:25 a.m. UTC | #8
Christoph Fritz <chf.fritz@googlemail.com> wrote on 2014/04/02 01:10:30:
> 
> On Tue, 2014-04-01 at 21:13 +0200, Joakim Tjernlund wrote:
> > Christoph Fritz <chf.fritz@googlemail.com> wrote on 2014/04/01 
17:09:51:
> > > I already tried all sorts of combinations of various delays and 0xff
> > > writes, which can be used wasteful without impact. And yes, errors 
are
> > > happening less -- but are still happening (at for example -2°C 
starting
> > > udev or a filesys-bench program like bonnie++).
> > 
> > Amazing Micron still sells these defect chips
> 
> Yeah, and it was pain to do all these kinds of tests to get it reliably
> working for a real world application where minus degrees can happen. For
> me it would be no problem to keep this quirk in my private queue, but
> involving a bigger audience seems to be right considering the facts.
> 
> > > Sure, it's possible that I missed the holy right
> > > delay-0xff-quirk-combination to get this NOR flash reliable working 
for
> > > its specified temperature range. But until nobody has found it, I 
would
> > > prefer to stick to my posted quirk and just disable suspend erase.
> > 
> > Problem is, if you add that quirk no-one will be able to remove it 
later 
> > so we
> > better make sure there isn't a less intrusive solution.
> 
> What about a comment or printk which makes that clear? I anyway don't
> think that there are interested users of this specific NOR flash left
> using a current kernel -- but who knows, right.

Yes and a new commit msg which indicates you have tried all workarounds 
suggested
by Micron to no avail.

> 
> > hmm, have you checked you bus timing? Perhaps you need to relax it 
> > somewhat.
> 
> Sure, I rechecked the WEIM interface (imx35 bus) settings more than
> once. I also did testings with most relaxed settings (highest delays
> possible). But still, flash errors do occur.

Well, I am out of ideas and it seems you are too.
If not Micron comes back with a solution I guess there is no choice

> 
> I should mention that a PC28F256P33BFE (a similar 32 MB NOR flash) works
> without any quirk in all my testing environments just reliable as it
> should be.
> 
> > > Maybe someone from Micron can help?
> > 
> > Yes, you should contact them.
> 
> Is there anybody out there knowing a contact without the need of going
> through a business-ticket-support-hell ?

Sadly no.

 Jocke
Christian Riesch April 2, 2014, 9:04 a.m. UTC | #9
Hi,

--On April 02, 2014 09:25 +0200 Joakim Tjernlund 
<joakim.tjernlund@transmode.se> wrote:

> Christoph Fritz <chf.fritz@googlemail.com> wrote on 2014/04/02 01:10:30:
>>
>> On Tue, 2014-04-01 at 21:13 +0200, Joakim Tjernlund wrote:
>> > Christoph Fritz <chf.fritz@googlemail.com> wrote on 2014/04/01
> 17:09:51:
>> > > I already tried all sorts of combinations of various delays and 0xff
>> > > writes, which can be used wasteful without impact. And yes, errors
> are

There are similar issues with the Micron M29EW that uses the 
cfi_cmdset_0002.c driver. See TN-13-07 [1]. The fixes required by this 
technical note were added to the driver in commit 
420962884379bd434a7f643d0936281b2ab4b30c. Since the fixes are pretty 
similar to those in TN-12-06, I think you should have a look that this 
commit, in case you didn't already.

Which delays did you try? Did you try the maximum delay of 500us? For the 
M29EW, 20us worked fine at room temperature, but at -40°C 500us much 
larger delays were required, therefore we set it to 500us finally.

[1] 
http://www.micron.com/-/media/Documents/Products/Technical%20Note/NOR%20Flash/tn1307_patching_linux_kernel_for_m29.pdf

>> > > happening less -- but are still happening (at for example -2°C
> starting
>> > > udev or a filesys-bench program like bonnie++).
>> >
>> > Amazing Micron still sells these defect chips
>>
>> Yeah, and it was pain to do all these kinds of tests to get it reliably
>> working for a real world application where minus degrees can happen. For
>> me it would be no problem to keep this quirk in my private queue, but
>> involving a bigger audience seems to be right considering the facts.
>>
>> > > Sure, it's possible that I missed the holy right
>> > > delay-0xff-quirk-combination to get this NOR flash reliable working
> for
>> > > its specified temperature range. But until nobody has found it, I
> would

After reading TN-12-06 I guess the correct order should be

1) 0xff
2) the existing 0xd0/0x70
3) wait 500us

right? Could you send a patch of the combination you tried (and that did 
not work) for discussion?

[...]

>> > > Maybe someone from Micron can help?
>> >
>> > Yes, you should contact them.
>>
>> Is there anybody out there knowing a contact without the need of going
>> through a business-ticket-support-hell ?

Christoph, see private mail.

Regards, Christian
Christoph Fritz April 2, 2014, 9:18 a.m. UTC | #10
On Wed, 2014-04-02 at 10:06 +0200, Andrea Adami wrote:
> Have you tried disabling only "Suspend erase on write" ?
> 
> cfip->SuspendCmdSupport &= ~1;
> 
> That did the trick for other stubborn flash chips.

I'll recheck to be sure.

 Thanks
  -- Christoph
diff mbox

Patch

diff --git a/drivers/mtd/chips/cfi_cmdset_0001.c b/drivers/mtd/chips/cfi_cmdset_0001.c
index 7751443..eb951b1 100644
--- a/drivers/mtd/chips/cfi_cmdset_0001.c
+++ b/drivers/mtd/chips/cfi_cmdset_0001.c
@@ -279,6 +279,18 @@  static void fixup_use_write_buffers(struct mtd_info *mtd)
 	}
 }
 
+static void fixup_micron_pc28f512p33(struct mtd_info *mtd)
+{
+	struct map_info *map = mtd->priv;
+	struct cfi_private *cfi = map->fldrv_priv;
+	struct cfi_pri_intelext *cfip = cfi->cmdset_priv;
+
+	if (cfip->FeatureSupport & 2) {
+		printk(KERN_INFO "Disable Suspend Erase\n");
+		cfip->FeatureSupport &= ~2;
+	}
+}
+
 /*
  * Some chips power-up with all sectors locked by default.
  */
@@ -309,6 +321,7 @@  static struct cfi_fixup cfi_fixup_table[] = {
 #endif
 	{ CFI_MFR_ST, 0x00ba, /* M28W320CT */ fixup_st_m28w320ct },
 	{ CFI_MFR_ST, 0x00bb, /* M28W320CB */ fixup_st_m28w320cb },
+	{ CFI_MFR_INTEL, 0x8964, /* PC28F512P33 */ fixup_micron_pc28f512p33 },
 	{ CFI_MFR_INTEL, CFI_ID_ANY, fixup_unlock_powerup_lock },
 	{ 0, 0, NULL }
 };