Message ID | 1396287151.4857.50.camel@mars |
---|---|
State | Changes Requested |
Headers | show |
"linux-mtd" <linux-mtd-bounces@lists.infradead.org> wrote on 2014/03/31 19:32:31: > > Add a fixup (quirk) for Micron NOR flash PC28F512P33TFA: This patch > disables option 'suspend erase' due to "internal timing issues on > some revisions of the P3x" which is pointed out by Micron TechNote-PDF > "Adapting the Linux Kernel for Micron P30, P33, and J3 Flash Memory" [1]. > > Without this patch, errors occur during heavy flash usage mostly in minus > degree temperature environments. > > It's very likely that other P3x chips suffer from the same issue. > > [1]: https://www.micron.com/~/media/Documents/Products/Technical%20Note/NOR%20Flash/tn1206_p30_p33_j3_linux.pdf > > Signed-off-by: Christoph Fritz <chf.fritz@googlemail.com> > Tested-by: Marius Achilles <M.Achilles@phytec.de> No way, disabling erase suspend will cause severe latencies for read/write ops. The document mentions other ways to get around this problem, I suggest you explore these first. Minus degrees is not a common environment either so perhaps any fix should be selectable? Jocke
On Mon, 2014-03-31 at 22:31 +0200, Joakim Tjernlund wrote: > No way, disabling erase suspend will cause severe latencies for read/write > ops. Here on a mx35 it's not really severe. > The document mentions other ways to get around this problem, I suggest you > explore > these first. Already tried the “0xFF” dummy write cycle suggestion, should have mentioned that. > Minus degrees is not a common environment either so perhaps any fix should > be selectable? The NOR is specified for down to -40°C. Thanks -- Christoph
Christoph Fritz <chf.fritz@googlemail.com> wrote on 2014/04/01 00:04:20: > > On Mon, 2014-03-31 at 22:31 +0200, Joakim Tjernlund wrote: > > No way, disabling erase suspend will cause severe latencies for read/write > > ops. > > Here on a mx35 it's not really severe. There has been complaints in the past with no erase suspend. You get delays considering that an erase takes c.a 1 sek so once you get into a state where GC is erasing lots of blocks and other progs wants read/write you will notice. > > > The document mentions other ways to get around this problem, I suggest you > > explore > > these first. > > Already tried the “0xFF” dummy write cycle suggestion, should have > mentioned that. OK, but one more workaround(the udelay part) is required, right? Also, the doc. is written in 2011 and only mentions some revisions of P3x has this problem. Are these problems still present in current flash parts? > > > Minus degrees is not a common environment either so perhaps any fix should > > be selectable? > > The NOR is specified for down to -40°C. That is a lot I guess but not everyone runs the equipment at these temperatures.
Joakim Tjernlund/Transmode wrote on 2014/04/01 10:29:43: > > Christoph Fritz <chf.fritz@googlemail.com> wrote on 2014/04/01 00:04:20: > > > > On Mon, 2014-03-31 at 22:31 +0200, Joakim Tjernlund wrote: > > > No way, disabling erase suspend will cause severe latencies for read/write > > > ops. > > > > Here on a mx35 it's not really severe. > > There has been complaints in the past with no erase suspend. > You get delays considering that an erase takes c.a 1 sek so > once you get into a state where GC is erasing lots of blocks and other > progs wants read/write you will notice. > > > > > > The document mentions other ways to get around this problem, I suggest you > > > explore > > > these first. > > > > Already tried the “0xFF” dummy write cycle suggestion, should have > > mentioned that. > > OK, but one more workaround(the udelay part) is required, right? There something odd with the patches in the mentioned doc. 1) "Resolving ERASE SUSPEND Hangups" touches the same code as 2) "ERASE SUSPEND Following ERASE RESUME" 1) says to add write(0xff) and 2) adds a udelay What should it be, both? in what order? I suspect that any write(0xff) can be there unconditionally, provided that the cmd0001 spec allows it. The delay could be a quirk default to 0 Jocke Jocke
Hi Joakim, thanks for your input, please see my further comments below. On Tue, 2014-04-01 at 13:15 +0200, Joakim Tjernlund wrote: > > There has been complaints in the past with no erase suspend. > > You get delays considering that an erase takes c.a 1 sek so > > once you get into a state where GC is erasing lots of blocks and other > > progs wants read/write you will notice. Yes it's not a fine solution, it's a quirk for a hardware with issues. But here for me it's better to get delays than having to suffer from flash errors. > > > > The document mentions other ways to get around this problem, I > suggest you > > > > explore > > > > these first. > > > > > > Already tried the “0xFF” dummy write cycle suggestion, should have > > > mentioned that. > > > > OK, but one more workaround(the udelay part) is required, right? > > There something odd with the patches in the mentioned doc. > 1) "Resolving ERASE SUSPEND Hangups" touches the same code as > 2) "ERASE SUSPEND Following ERASE RESUME" > > 1) says to add write(0xff) and 2) adds a udelay > > What should it be, both? in what order? > > I suspect that any write(0xff) can be there unconditionally, provided that > the cmd0001 spec allows it. > The delay could be a quirk default to 0 I already tried all sorts of combinations of various delays and 0xff writes, which can be used wasteful without impact. And yes, errors are happening less -- but are still happening (at for example -2°C starting udev or a filesys-bench program like bonnie++). Sure, it's possible that I missed the holy right delay-0xff-quirk-combination to get this NOR flash reliable working for its specified temperature range. But until nobody has found it, I would prefer to stick to my posted quirk and just disable suspend erase. Maybe someone from Micron can help? Thanks -- Christoph
Christoph Fritz <chf.fritz@googlemail.com> wrote on 2014/04/01 17:09:51: > > Hi Joakim, > > thanks for your input, please see my further comments below. > > On Tue, 2014-04-01 at 13:15 +0200, Joakim Tjernlund wrote: > > > There has been complaints in the past with no erase suspend. > > > You get delays considering that an erase takes c.a 1 sek so > > > once you get into a state where GC is erasing lots of blocks and other > > > progs wants read/write you will notice. > > Yes it's not a fine solution, it's a quirk for a hardware with issues. > But here for me it's better to get delays than having to suffer from > flash errors. > > > > > > The document mentions other ways to get around this problem, I > > suggest you > > > > > explore > > > > > these first. > > > > > > > > Already tried the “0xFF” dummy write cycle suggestion, should have > > > > mentioned that. > > > > > > OK, but one more workaround(the udelay part) is required, right? > > > > There something odd with the patches in the mentioned doc. > > 1) "Resolving ERASE SUSPEND Hangups" touches the same code as > > 2) "ERASE SUSPEND Following ERASE RESUME" > > > > 1) says to add write(0xff) and 2) adds a udelay > > > > What should it be, both? in what order? > > > > I suspect that any write(0xff) can be there unconditionally, provided that > > the cmd0001 spec allows it. > > The delay could be a quirk default to 0 > > I already tried all sorts of combinations of various delays and 0xff > writes, which can be used wasteful without impact. And yes, errors are > happening less -- but are still happening (at for example -2°C starting > udev or a filesys-bench program like bonnie++). Amazing Micron still sells these defect chips > > Sure, it's possible that I missed the holy right > delay-0xff-quirk-combination to get this NOR flash reliable working for > its specified temperature range. But until nobody has found it, I would > prefer to stick to my posted quirk and just disable suspend erase. Problem is, if you add that quirk no-one will be able to remove it later so we better make sure there isn't a less intrusive solution. hmm, have you checked you bus timing? Perhaps you need to relax it somewhat. > > Maybe someone from Micron can help? Yes, you should contact them. Thanks Jocke
On Tue, 2014-04-01 at 21:13 +0200, Joakim Tjernlund wrote: > Christoph Fritz <chf.fritz@googlemail.com> wrote on 2014/04/01 17:09:51: > > I already tried all sorts of combinations of various delays and 0xff > > writes, which can be used wasteful without impact. And yes, errors are > > happening less -- but are still happening (at for example -2°C starting > > udev or a filesys-bench program like bonnie++). > > Amazing Micron still sells these defect chips Yeah, and it was pain to do all these kinds of tests to get it reliably working for a real world application where minus degrees can happen. For me it would be no problem to keep this quirk in my private queue, but involving a bigger audience seems to be right considering the facts. > > Sure, it's possible that I missed the holy right > > delay-0xff-quirk-combination to get this NOR flash reliable working for > > its specified temperature range. But until nobody has found it, I would > > prefer to stick to my posted quirk and just disable suspend erase. > > Problem is, if you add that quirk no-one will be able to remove it later > so we > better make sure there isn't a less intrusive solution. What about a comment or printk which makes that clear? I anyway don't think that there are interested users of this specific NOR flash left using a current kernel -- but who knows, right. > hmm, have you checked you bus timing? Perhaps you need to relax it > somewhat. Sure, I rechecked the WEIM interface (imx35 bus) settings more than once. I also did testings with most relaxed settings (highest delays possible). But still, flash errors do occur. I should mention that a PC28F256P33BFE (a similar 32 MB NOR flash) works without any quirk in all my testing environments just reliable as it should be. > > Maybe someone from Micron can help? > > Yes, you should contact them. Is there anybody out there knowing a contact without the need of going through a business-ticket-support-hell ? Thanks -- Christoph
Christoph Fritz <chf.fritz@googlemail.com> wrote on 2014/04/02 01:10:30: > > On Tue, 2014-04-01 at 21:13 +0200, Joakim Tjernlund wrote: > > Christoph Fritz <chf.fritz@googlemail.com> wrote on 2014/04/01 17:09:51: > > > I already tried all sorts of combinations of various delays and 0xff > > > writes, which can be used wasteful without impact. And yes, errors are > > > happening less -- but are still happening (at for example -2°C starting > > > udev or a filesys-bench program like bonnie++). > > > > Amazing Micron still sells these defect chips > > Yeah, and it was pain to do all these kinds of tests to get it reliably > working for a real world application where minus degrees can happen. For > me it would be no problem to keep this quirk in my private queue, but > involving a bigger audience seems to be right considering the facts. > > > > Sure, it's possible that I missed the holy right > > > delay-0xff-quirk-combination to get this NOR flash reliable working for > > > its specified temperature range. But until nobody has found it, I would > > > prefer to stick to my posted quirk and just disable suspend erase. > > > > Problem is, if you add that quirk no-one will be able to remove it later > > so we > > better make sure there isn't a less intrusive solution. > > What about a comment or printk which makes that clear? I anyway don't > think that there are interested users of this specific NOR flash left > using a current kernel -- but who knows, right. Yes and a new commit msg which indicates you have tried all workarounds suggested by Micron to no avail. > > > hmm, have you checked you bus timing? Perhaps you need to relax it > > somewhat. > > Sure, I rechecked the WEIM interface (imx35 bus) settings more than > once. I also did testings with most relaxed settings (highest delays > possible). But still, flash errors do occur. Well, I am out of ideas and it seems you are too. If not Micron comes back with a solution I guess there is no choice > > I should mention that a PC28F256P33BFE (a similar 32 MB NOR flash) works > without any quirk in all my testing environments just reliable as it > should be. > > > > Maybe someone from Micron can help? > > > > Yes, you should contact them. > > Is there anybody out there knowing a contact without the need of going > through a business-ticket-support-hell ? Sadly no. Jocke
Hi, --On April 02, 2014 09:25 +0200 Joakim Tjernlund <joakim.tjernlund@transmode.se> wrote: > Christoph Fritz <chf.fritz@googlemail.com> wrote on 2014/04/02 01:10:30: >> >> On Tue, 2014-04-01 at 21:13 +0200, Joakim Tjernlund wrote: >> > Christoph Fritz <chf.fritz@googlemail.com> wrote on 2014/04/01 > 17:09:51: >> > > I already tried all sorts of combinations of various delays and 0xff >> > > writes, which can be used wasteful without impact. And yes, errors > are There are similar issues with the Micron M29EW that uses the cfi_cmdset_0002.c driver. See TN-13-07 [1]. The fixes required by this technical note were added to the driver in commit 420962884379bd434a7f643d0936281b2ab4b30c. Since the fixes are pretty similar to those in TN-12-06, I think you should have a look that this commit, in case you didn't already. Which delays did you try? Did you try the maximum delay of 500us? For the M29EW, 20us worked fine at room temperature, but at -40°C 500us much larger delays were required, therefore we set it to 500us finally. [1] http://www.micron.com/-/media/Documents/Products/Technical%20Note/NOR%20Flash/tn1307_patching_linux_kernel_for_m29.pdf >> > > happening less -- but are still happening (at for example -2°C > starting >> > > udev or a filesys-bench program like bonnie++). >> > >> > Amazing Micron still sells these defect chips >> >> Yeah, and it was pain to do all these kinds of tests to get it reliably >> working for a real world application where minus degrees can happen. For >> me it would be no problem to keep this quirk in my private queue, but >> involving a bigger audience seems to be right considering the facts. >> >> > > Sure, it's possible that I missed the holy right >> > > delay-0xff-quirk-combination to get this NOR flash reliable working > for >> > > its specified temperature range. But until nobody has found it, I > would After reading TN-12-06 I guess the correct order should be 1) 0xff 2) the existing 0xd0/0x70 3) wait 500us right? Could you send a patch of the combination you tried (and that did not work) for discussion? [...] >> > > Maybe someone from Micron can help? >> > >> > Yes, you should contact them. >> >> Is there anybody out there knowing a contact without the need of going >> through a business-ticket-support-hell ? Christoph, see private mail. Regards, Christian
On Wed, 2014-04-02 at 10:06 +0200, Andrea Adami wrote: > Have you tried disabling only "Suspend erase on write" ? > > cfip->SuspendCmdSupport &= ~1; > > That did the trick for other stubborn flash chips. I'll recheck to be sure. Thanks -- Christoph
diff --git a/drivers/mtd/chips/cfi_cmdset_0001.c b/drivers/mtd/chips/cfi_cmdset_0001.c index 7751443..eb951b1 100644 --- a/drivers/mtd/chips/cfi_cmdset_0001.c +++ b/drivers/mtd/chips/cfi_cmdset_0001.c @@ -279,6 +279,18 @@ static void fixup_use_write_buffers(struct mtd_info *mtd) } } +static void fixup_micron_pc28f512p33(struct mtd_info *mtd) +{ + struct map_info *map = mtd->priv; + struct cfi_private *cfi = map->fldrv_priv; + struct cfi_pri_intelext *cfip = cfi->cmdset_priv; + + if (cfip->FeatureSupport & 2) { + printk(KERN_INFO "Disable Suspend Erase\n"); + cfip->FeatureSupport &= ~2; + } +} + /* * Some chips power-up with all sectors locked by default. */ @@ -309,6 +321,7 @@ static struct cfi_fixup cfi_fixup_table[] = { #endif { CFI_MFR_ST, 0x00ba, /* M28W320CT */ fixup_st_m28w320ct }, { CFI_MFR_ST, 0x00bb, /* M28W320CB */ fixup_st_m28w320cb }, + { CFI_MFR_INTEL, 0x8964, /* PC28F512P33 */ fixup_micron_pc28f512p33 }, { CFI_MFR_INTEL, CFI_ID_ANY, fixup_unlock_powerup_lock }, { 0, 0, NULL } };