[v4,0/5] Micron SLC NAND filling block

Message ID 20200518135943.11749-1-huobean@gmail.com

Bean Huo May 18, 2020, 1:59 p.m. UTC
From: Bean Huo <beanhuo@micron.com>

Hi,

On some legacy planar 2D Micron NAND devices, a block erase operation can
occasionally complete and return a pass status even though the flash block
has not been completely erased. In very rare cases, subsequent operations on
such a block can result in subtle failures or corruption. These cases are
extremely rare, but they should nevertheless be handled. This patchset
addresses this potential issue.

After submitting patch V1 [1] and V2 [2], we stopped updating the series
because we were stuck on how to avoid the power-loss issue when a power cut
hits the block filling. In v1 and v2, to avoid this issue, we always damaged
page 0 and page 1, based on the assumption that the NAND filesystem is UBIFS.
Such FS-specific code is unacceptable in the MTD layer, and it cannot cover
all NAND-based filesystems. Based on the current discussion, it seems that
rewriting all of the first 15 pages, starting from page 0, is a satisfactory
solution.

Meanwhile, I borrowed one idea from Miquel Raynal's patchset [3], which keeps
a record of programmed pages. Based on that record, in most cases we don't
need to read every page to see whether the block being erased is a partially
programmed block.
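
A minimal sketch of the record-keeping idea described above, not the code
from patch 5/5: one bit per page is kept for the first pages of a block, so
the erase path can usually tell from memory whether the block is only
partially programmed. The names block_record, record_program, block_is_filled
and pages_to_fill, as well as the 16-page window, are assumptions made for
illustration; only the writtenp name is borrowed from the changelog.

```c
#include <stdbool.h>
#include <stdint.h>

#define FILL_PAGES 16u /* assumption: only the first 16 pages are tracked */

/* Hypothetical per-block record: bit n set means page n was programmed. */
struct block_record {
    uint16_t writtenp;
};

/* Record a page program; pages beyond the tracked window are ignored. */
static void record_program(struct block_record *rec, unsigned int page)
{
    if (page < FILL_PAGES)
        rec->writtenp |= (uint16_t)(1u << page);
}

/* True when every tracked page has been programmed since the last erase. */
static bool block_is_filled(const struct block_record *rec)
{
    return rec->writtenp == UINT16_MAX;
}

/* Hypothetical pre-erase step: how many tracked pages still need filling. */
static unsigned int pages_to_fill(const struct block_record *rec)
{
    unsigned int missing = 0;

    for (unsigned int page = 0; page < FILL_PAGES; page++)
        if (!(rec->writtenp & (1u << page)))
            missing++;
    return missing;
}
```

Such a record would live in RAM only, so after a power cut it is presumably
empty and the pages have to be read back instead, which would explain why the
record helps in most cases rather than in all of them.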

Changelog:

v3 - v4:
    1. In patch 4/5, directly use ecc.strength to judge whether the page is
       an empty page, rather than max_bitflips < mtd->bitflip_threshold
    2. In patch 5/5, for the power-loss case, after the next boot many pages
       will be programmed at addresses above page 15. If first_p is still
       used as the GENMASK() bitmask starting position, writtenp will always
       be 0. Fix it by starting the bitmask at bit position 0.

v2 - v3:
    1. Rebase the patches onto the latest MTD git tree
    2. Add a record that keeps track of the programmed pages in the first 16
       pages
    3. Change from programming odd pages and damaging page 0 and page 1 to
       programming all of the first 15 pages
    4. Address issues that exist in V2.

v1 - v2:
    1. Rebased V1 to latest Linux kernel.
    2. Add erase preparation function pointer in nand_manufacturer_ops.
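
Item 1 of the v3 - v4 changes compares bitflips against ecc.strength to decide
whether a page is empty. The following is a rough sketch of that kind of
check, assuming the whole buffer is covered by a single ECC step; it is not
the nand_check_is_erased_page() from patch 4/5, and count_zero_bits and
page_looks_erased are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Count bits read as 0; a fully erased NAND page reads back as all 0xFF. */
static unsigned int count_zero_bits(const uint8_t *buf, size_t len)
{
    unsigned int zeros = 0;

    for (size_t i = 0; i < len; i++) {
        uint8_t flipped = (uint8_t)~buf[i];

        /* Clear the lowest set bit until none are left. */
        while (flipped) {
            flipped &= (uint8_t)(flipped - 1);
            zeros++;
        }
    }
    return zeros;
}

/*
 * Treat the page as erased when the number of 0 bits does not exceed the
 * number of bitflips the ECC could correct (ecc_strength).
 */
static bool page_looks_erased(const uint8_t *buf, size_t len,
                              unsigned int ecc_strength)
{
    return count_zero_bits(buf, len) <= ecc_strength;
}
```

With a strength of 4, a buffer of 0xFF bytes containing two flipped bits would
still count as erased, while a buffer with one byte programmed to 0x00 would
not.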


[1] https://www.spinics.net/lists/linux-mtd/msg04112.html
[2] https://www.spinics.net/lists/linux-mtd/msg04450.html
[3] https://www.spinics.net/lists/linux-mtd/msg13083.html


Bean Huo (5):
  mtd: rawnand: group all NAND specific ops into new nand_chip_ops
  mtd: rawnand: Add {pre,post}_erase hooks in nand_chip_ops
  mtd: rawnand: Add write_oob hook in nand_chip_ops
  mtd: rawnand: Introduce a new function nand_check_is_erased_page()
  mtd: rawnand: micron: Micron SLC NAND filling block

 drivers/mtd/nand/raw/internals.h     |   3 +-
 drivers/mtd/nand/raw/nand_base.c     |  88 +++++++++++++++++++----
 drivers/mtd/nand/raw/nand_hynix.c    |   2 +-
 drivers/mtd/nand/raw/nand_macronix.c |  10 +--
 drivers/mtd/nand/raw/nand_micron.c   | 104 ++++++++++++++++++++++++++-
 include/linux/mtd/rawnand.h          |  40 +++++++----
 6 files changed, 212 insertions(+), 35 deletions(-)

Miquel Raynal May 18, 2020, 3:22 p.m. UTC | #1
Hi Bean,

Bean Huo <huobean@gmail.com> wrote on Mon, 18 May 2020 15:59:38 +0200:

> After submitting patch V1 [1] and V2 [2], we stopped updating the series
> because we were stuck on how to avoid the power-loss issue when a power cut
> hits the block filling. In v1 and v2, to avoid this issue, we always damaged
> page 0 and page 1, based on the assumption that the NAND filesystem is UBIFS.
> Such FS-specific code is unacceptable in the MTD layer, and it cannot cover
> all NAND-based filesystems. Based on the current discussion, it seems that
> rewriting all of the first 15 pages, starting from page 0, is a satisfactory
> solution.

We have a layering problem now. Maybe we should just have an MTD
internal variable like min_written_pages_before_erase that the Micron
driver could set to a !0 value.

Then, the handling could be done by the user (UBI/UBIFS, JFFS2, MTD
user if exported).
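
A rough sketch of what this proposal could look like, under stated
assumptions: struct flash_dev is a hypothetical stand-in rather than the real
struct mtd_info, min_written_pages_before_erase is the name suggested above,
and pages_needed_before_erase is an invented helper that an MTD user might
call in its own erase path.

```c
#include <stdint.h>

/* Hypothetical device description; not the real struct mtd_info. */
struct flash_dev {
    uint32_t pages_per_block;
    /* 0 means the chip has no such constraint. */
    uint32_t min_written_pages_before_erase;
};

/*
 * Hypothetical helper for an MTD user (UBI, JFFS2, ...): given how many
 * pages it has already programmed contiguously from page 0, return how
 * many more pages it must program before erasing the block.
 */
static uint32_t pages_needed_before_erase(const struct flash_dev *dev,
                                          uint32_t written_pages)
{
    uint32_t min = dev->min_written_pages_before_erase;

    /* Never ask for more pages than the block holds. */
    if (min > dev->pages_per_block)
        min = dev->pages_per_block;

    return written_pages >= min ? 0 : min - written_pages;
}
```

In this arrangement the chip driver only publishes the constraint, and each
user decides how to pad the block in a way that keeps its own power-cut
guarantees.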

> Bean Huo (5):
>   mtd: rawnand: group all NAND specific ops into new nand_chip_ops
>   mtd: rawnand: Add {pre,post}_erase hooks in nand_chip_ops
>   mtd: rawnand: Add write_oob hook in nand_chip_ops
>   mtd: rawnand: Introduce a new function nand_check_is_erased_page()
>   mtd: rawnand: micron: Micron SLC NAND filling block

When you take my patches in your series, especially when not touching
them at all, you should keep my Authorship and SoB first, then add your
SoB.


Thanks,
Miquèl

Bean Huo May 19, 2020, 9:04 a.m. UTC | #2
Hi Miquel,

On Mon, 2020-05-18 at 17:22 +0200, Miquel Raynal wrote:
> Hi Bean,
> 
> We have a layering problem now. Maybe we should just have an MTD
> internal variable like min_written_pages_before_erase that the Micron
> driver could set to a !0 value.
> 
> Then, the handling could be done by the user (UBI/UBIFS, JFFS2, MTD
> user if exported).
> 

This is the NAND's own problem. Unless there is a significant advantage, I
don't think it is a good solution to push the problem up to the FS layer.
Also, the FS erase path doesn't have the programmed-pages counter, so we
would have to repeat the same approach there as we did in the MTD layer.

> When you take my patches in your series, especially when not touching
> them at all, you should keep my Authorship and SoB first, then add
> your SoB.
> 

Sorry, my fault. I thought adding your Signed-off-by in 3/5 was
sufficient. Do you mean I should add your Signed-off-by in 5/5 as well?
I will do that in the next version.

Thanks, Miquel.

BTW, could you please help review the rest of the code?

Bean


> 
> Thanks,
> Miquèl

Miquel Raynal May 19, 2020, 9:08 a.m. UTC | #3
Hi Bean,

Bean Huo <huobean@gmail.com> wrote on Tue, 19 May 2020 11:04:15 +0200:

> Hi Miquel,
> 
> On Mon, 2020-05-18 at 17:22 +0200, Miquel Raynal wrote:
> > 
> > We have a layering problem now. Maybe we should just have an MTD
> > internal variable like min_written_pages_before_erase that the Micron
> > driver could set to a !0 value.
> > 
> > Then, the handling could be done by the user (UBI/UBIFS, JFFS2, MTD
> > user if exported).
> >   
> 
> This is the NAND's own problem. Unless there is a significant advantage, I
> don't think it is a good solution to push the problem up to the FS layer.
> Also, the FS erase path doesn't have the programmed-pages counter, so we
> would have to repeat the same approach there as we did in the MTD layer.

The problem is that if the filesystem is not aware, it breaks the
"power cut safe" assertion.

There is a problem with JFFS2 and a problem with UBIFS because of that.
We can certainly keep a default implementation like this one for other
users though.

> > When you take my patches in your series, especially when not touching
> > them at all, you should keep my Authorship and SoB first, then add
> > your SoB.
>
> Sorry, my fault. I thought adding your Signed-off-by in 3/5 was
> sufficient. Do you mean I should add your Signed-off-by in 5/5 as well?
> I will do that in the next version.

You should keep my Authorship and SoB for both patches + add your SoB
after mine.


Thanks,
Miquèl