diff mbox series

mtd: rawnand: nand_bbt: Skip bad blocks when searching for the BBT in NAND

Message ID 20210625123821.207458-1-s.riedmueller@phytec.de
State Accepted
Headers show
Series mtd: rawnand: nand_bbt: Skip bad blocks when searching for the BBT in NAND | expand

Commit Message

Stefan Riedmüller June 25, 2021, 12:38 p.m. UTC
The blocks containing the bad block table can become bad as well. So
make sure to skip any blocks that are marked bad when searching for the
bad block table.

Otherwise in very rare cases where two BBT blocks wear out it might
happen that an obsolete BBT is used instead of a newer available
version.

This only applies to drivers which make use of a bad block marker in flash.
Other drivers won't be able to identify bad BBT blocks and thus can't skip
these.

Signed-off-by: Stefan Riedmueller <s.riedmueller@phytec.de>
---

Hi,

this is the second approach of this patch. The first one [1] unfortunately lead
to boot failures on i.MX 27 boards [2] since the i.MX 27 driver uses the bad
block marker position for the bad block table marker which lead to falsely
identifying all BBT blocks as bad.

This new patch now skips the check for bad BBT blocks if the BBT marker
position in OOB overlaps with the bad block marker position or if a driver
can't use bad block markers in flash at all (NAND_BBT_NO_OOB_BBM or
NAND_NO_BBM_QUIRK are set). This hopefully makes sure we don't break drivers
which cannot check for bad BBT blocks due to the limitations mentioned before.

I was only able to test this patch on a phyCORE-i.MX 6 and a phyCARD-i.MX 27.
Therfore would really appreciate more people testing this to make sure I have
not missed another use case where the bad block marker position in OOB is used
in a different way than for the BBM.

Regards,
Stefan

[1] https://lore.kernel.org/linux-mtd/20210325102337.481172-1-s.riedmueller@phytec.de/
[2] https://lore.kernel.org/linux-mtd/CAOMZO5DufVR=+EzCa1-MPUc+ZefZVTXb5Kgu3Wxms7cxw9GmGg@mail.gmail.com/

 drivers/mtd/nand/raw/nand_bbt.c | 34 +++++++++++++++++++++++++++++++++
 1 file changed, 34 insertions(+)

Comments

Miquel Raynal July 6, 2021, 4:13 p.m. UTC | #1
Hi Stefan,

Stefan Riedmueller <s.riedmueller@phytec.de> wrote on Fri, 25 Jun 2021
14:38:21 +0200:

> The blocks containing the bad block table can become bad as well. So
> make sure to skip any blocks that are marked bad when searching for the
> bad block table.
> 
> Otherwise in very rare cases where two BBT blocks wear out it might
> happen that an obsolete BBT is used instead of a newer available
> version.
> 
> This only applies to drivers which make use of a bad block marker in flash.
> Other drivers won't be able to identify bad BBT blocks and thus can't skip
> these.
> 
> Signed-off-by: Stefan Riedmueller <s.riedmueller@phytec.de>

Besides the alignment of the helper parameters (nitpick) the patch
looks good to me. If we can get someone to test it before the merge
window closes it's perfect. Otherwise I'll apply it and we'll let robots
do the job.

> ---
> 
> Hi,
> 
> this is the second approach of this patch. The first one [1] unfortunately lead
> to boot failures on i.MX 27 boards [2] since the i.MX 27 driver uses the bad
> block marker position for the bad block table marker which lead to falsely
> identifying all BBT blocks as bad.
> 
> This new patch now skips the check for bad BBT blocks if the BBT marker
> position in OOB overlaps with the bad block marker position or if a driver
> can't use bad block markers in flash at all (NAND_BBT_NO_OOB_BBM or
> NAND_NO_BBM_QUIRK are set). This hopefully makes sure we don't break drivers
> which cannot check for bad BBT blocks due to the limitations mentioned before.
> 
> I was only able to test this patch on a phyCORE-i.MX 6 and a phyCARD-i.MX 27.
> Therfore would really appreciate more people testing this to make sure I have
> not missed another use case where the bad block marker position in OOB is used
> in a different way than for the BBM.
> 
> Regards,
> Stefan
> 
> [1] https://lore.kernel.org/linux-mtd/20210325102337.481172-1-s.riedmueller@phytec.de/
> [2] https://lore.kernel.org/linux-mtd/CAOMZO5DufVR=+EzCa1-MPUc+ZefZVTXb5Kgu3Wxms7cxw9GmGg@mail.gmail.com/
> 
>  drivers/mtd/nand/raw/nand_bbt.c | 34 +++++++++++++++++++++++++++++++++
>  1 file changed, 34 insertions(+)
> 
> diff --git a/drivers/mtd/nand/raw/nand_bbt.c b/drivers/mtd/nand/raw/nand_bbt.c
> index dced32a126d9..2a30714350ee 100644
> --- a/drivers/mtd/nand/raw/nand_bbt.c
> +++ b/drivers/mtd/nand/raw/nand_bbt.c
> @@ -447,6 +447,36 @@ static int scan_block_fast(struct nand_chip *this, struct nand_bbt_descr *bd,
>  	return 0;
>  }
>  
> +/* Check if a potential BBT block is marked as bad */
> +static int bbt_block_checkbad(struct nand_chip *this,
> +				      struct nand_bbt_descr *td,
> +				      loff_t offs, uint8_t *buf)
> +{
> +	struct nand_bbt_descr *bd = this->badblock_pattern;
> +
> +	/*
> +	 * No need to check for a bad BBT block if the BBM area overlaps with
> +	 * the bad block table marker area in OOB since writing a BBM here
> +	 * invalidates the bad block table marker anyway.
> +	 */
> +	if (!(td->options & NAND_BBT_NO_OOB) &&
> +	    td->offs >= bd->offs && td->offs < bd->offs + bd->len)
> +		return 0;
> +
> +	/*
> +	 * There is no point in checking for a bad block marker if writing
> +	 * such marker is not supported
> +	 */
> +	if (this->bbt_options & NAND_BBT_NO_OOB_BBM ||
> +	    this->options & NAND_NO_BBM_QUIRK)
> +		return 0;
> +
> +	if (scan_block_fast(this, bd, offs, buf) > 0)
> +		return 1;
> +
> +	return 0;
> +}
> +
>  /**
>   * create_bbt - [GENERIC] Create a bad block table by scanning the device
>   * @this: NAND chip object
> @@ -560,6 +590,10 @@ static int search_bbt(struct nand_chip *this, uint8_t *buf,
>  			int actblock = startblock + dir * block;
>  			loff_t offs = (loff_t)actblock << this->bbt_erase_shift;
>  
> +			/* Check if block is marked bad */
> +			if (bbt_block_checkbad(this, td, offs, buf))
> +				continue;
> +
>  			/* Read first page */
>  			scan_read(this, buf, offs, mtd->writesize, td);
>  			if (!check_pattern(buf, scanlen, mtd->writesize, td)) {

Thanks,
Miquèl
Alexander Dahl July 7, 2021, 9:18 a.m. UTC | #2
Hei hei,

Am Dienstag, 6. Juli 2021, 18:13:08 CEST schrieb Miquel Raynal:
> Hi Stefan,
> 
> Stefan Riedmueller <s.riedmueller@phytec.de> wrote on Fri, 25 Jun 2021
> 
> 14:38:21 +0200:
> > The blocks containing the bad block table can become bad as well. So
> > make sure to skip any blocks that are marked bad when searching for the
> > bad block table.
> > 
> > Otherwise in very rare cases where two BBT blocks wear out it might
> > happen that an obsolete BBT is used instead of a newer available
> > version.
> > 
> > This only applies to drivers which make use of a bad block marker in
> > flash.
> > Other drivers won't be able to identify bad BBT blocks and thus can't skip
> > these.
> > 
> > Signed-off-by: Stefan Riedmueller <s.riedmueller@phytec.de>
> 
> Besides the alignment of the helper parameters (nitpick) the patch
> looks good to me. If we can get someone to test it before the merge
> window closes it's perfect. Otherwise I'll apply it and we'll let robots
> do the job.

Added the patch on top of v5.10.21 and booted a SAMA5D27 based board, from the 
boot log:

nand: device found, Manufacturer ID: 0x01, Chip ID: 0xda
nand: AMD/Spansion S34ML02G1
nand: 256 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB size: 64
Bad block table found at page 131008, version 0xFF
Bad block table found at page 130944, version 0xFF
6 cmdlinepart partitions found on MTD device atmel_nand
Creating 6 MTD partitions on "atmel_nand":
0x000000000000-0x000000040000 : "bootstrap"
0x000000040000-0x000000100000 : "uboot"
0x000000100000-0x000000140000 : "env1"
0x000000140000-0x000000180000 : "env2"
0x000000180000-0x000000200000 : "reserved"
0x000000200000-0x000010000000 : "UBI"
NET: Registered protocol family 17
ubi0: attaching mtd5
random: fast init done
ubi0: scanning is finished
ubi0: attached mtd5 (name "UBI", size 254 MiB)
ubi0: PEB size: 131072 bytes (128 KiB), LEB size: 126976 bytes
ubi0: min./max. I/O unit sizes: 2048/2048, sub-page size 2048
ubi0: VID header offset: 2048 (aligned 2048), data offset: 4096
ubi0: good PEBs: 2028, bad PEBs: 4, corrupted PEBs: 0
ubi0: user volume: 4, internal volumes: 1, max. volumes count: 128
ubi0: max/mean erase counter: 4/1, WL threshold: 4096, image sequence number: 
1600812189
ubi0: available PEBs: 0, total reserved PEBs: 2028, PEBs reserved for bad PEB 
handling: 36
ubi0: background thread "ubi_bgt0d" started, PID 85

No suspicious other messages. 

Not sure if that device would be affected anyways. No bad blocks are known on 
this flash, device behaves as usual.

HTH & Greets
Alex

> 
> > ---
> > 
> > Hi,
> > 
> > this is the second approach of this patch. The first one [1] unfortunately
> > lead to boot failures on i.MX 27 boards [2] since the i.MX 27 driver uses
> > the bad block marker position for the bad block table marker which lead
> > to falsely identifying all BBT blocks as bad.
> > 
> > This new patch now skips the check for bad BBT blocks if the BBT marker
> > position in OOB overlaps with the bad block marker position or if a driver
> > can't use bad block markers in flash at all (NAND_BBT_NO_OOB_BBM or
> > NAND_NO_BBM_QUIRK are set). This hopefully makes sure we don't break
> > drivers which cannot check for bad BBT blocks due to the limitations
> > mentioned before.
> > 
> > I was only able to test this patch on a phyCORE-i.MX 6 and a phyCARD-i.MX
> > 27. Therfore would really appreciate more people testing this to make
> > sure I have not missed another use case where the bad block marker
> > position in OOB is used in a different way than for the BBM.
> > 
> > Regards,
> > Stefan
> > 
> > [1]
> > https://lore.kernel.org/linux-mtd/20210325102337.481172-1-s.riedmueller@p
> > hytec.de/ [2]
> > https://lore.kernel.org/linux-mtd/CAOMZO5DufVR=+EzCa1-MPUc+ZefZVTXb5Kgu3W
> > xms7cxw9GmGg@mail.gmail.com/> 
> >  drivers/mtd/nand/raw/nand_bbt.c | 34 +++++++++++++++++++++++++++++++++
> >  1 file changed, 34 insertions(+)
> > 
> > diff --git a/drivers/mtd/nand/raw/nand_bbt.c
> > b/drivers/mtd/nand/raw/nand_bbt.c index dced32a126d9..2a30714350ee 100644
> > --- a/drivers/mtd/nand/raw/nand_bbt.c
> > +++ b/drivers/mtd/nand/raw/nand_bbt.c
> > @@ -447,6 +447,36 @@ static int scan_block_fast(struct nand_chip *this,
> > struct nand_bbt_descr *bd,> 
> >  	return 0;
> >  
> >  }
> > 
> > +/* Check if a potential BBT block is marked as bad */
> > +static int bbt_block_checkbad(struct nand_chip *this,
> > +				      struct nand_bbt_descr *td,
> > +				      loff_t offs, uint8_t *buf)
> > +{
> > +	struct nand_bbt_descr *bd = this->badblock_pattern;
> > +
> > +	/*
> > +	 * No need to check for a bad BBT block if the BBM area overlaps with
> > +	 * the bad block table marker area in OOB since writing a BBM here
> > +	 * invalidates the bad block table marker anyway.
> > +	 */
> > +	if (!(td->options & NAND_BBT_NO_OOB) &&
> > +	    td->offs >= bd->offs && td->offs < bd->offs + bd->len)
> > +		return 0;
> > +
> > +	/*
> > +	 * There is no point in checking for a bad block marker if writing
> > +	 * such marker is not supported
> > +	 */
> > +	if (this->bbt_options & NAND_BBT_NO_OOB_BBM ||
> > +	    this->options & NAND_NO_BBM_QUIRK)
> > +		return 0;
> > +
> > +	if (scan_block_fast(this, bd, offs, buf) > 0)
> > +		return 1;
> > +
> > +	return 0;
> > +}
> > +
> > 
> >  /**
> >  
> >   * create_bbt - [GENERIC] Create a bad block table by scanning the device
> >   * @this: NAND chip object
> > 
> > @@ -560,6 +590,10 @@ static int search_bbt(struct nand_chip *this, uint8_t
> > *buf,> 
> >  			int actblock = startblock + dir * block;
> >  			loff_t offs = (loff_t)actblock << this->bbt_erase_shift;
> > 
> > +			/* Check if block is marked bad */
> > +			if (bbt_block_checkbad(this, td, offs, buf))
> > +				continue;
> > +
> > 
> >  			/* Read first page */
> >  			scan_read(this, buf, offs, mtd->writesize, td);
> >  			if (!check_pattern(buf, scanlen, mtd->writesize, td)) {
> 
> Thanks,
> Miquèl
> 
> ______________________________________________________
> Linux MTD discussion mailing list
> http://lists.infradead.org/mailman/listinfo/linux-mtd/
Stefan Riedmüller July 8, 2021, 8:42 a.m. UTC | #3
Hi Alexander,

On Wed, 2021-07-07 at 11:18 +0200, Alexander Dahl wrote:
> Hei hei,
> 
> Am Dienstag, 6. Juli 2021, 18:13:08 CEST schrieb Miquel Raynal:
> > Hi Stefan,
> > 
> > Stefan Riedmueller <s.riedmueller@phytec.de> wrote on Fri, 25 Jun 2021
> > 
> > 14:38:21 +0200:
> > > The blocks containing the bad block table can become bad as well. So
> > > make sure to skip any blocks that are marked bad when searching for the
> > > bad block table.
> > > 
> > > Otherwise in very rare cases where two BBT blocks wear out it might
> > > happen that an obsolete BBT is used instead of a newer available
> > > version.
> > > 
> > > This only applies to drivers which make use of a bad block marker in
> > > flash.
> > > Other drivers won't be able to identify bad BBT blocks and thus can't
> > > skip
> > > these.
> > > 
> > > Signed-off-by: Stefan Riedmueller <s.riedmueller@phytec.de>
> > 
> > Besides the alignment of the helper parameters (nitpick) the patch
> > looks good to me. If we can get someone to test it before the merge
> > window closes it's perfect. Otherwise I'll apply it and we'll let robots
> > do the job.
> 
> Added the patch on top of v5.10.21 and booted a SAMA5D27 based board, from
> the 
> boot log:
> 
> nand: device found, Manufacturer ID: 0x01, Chip ID: 0xda
> nand: AMD/Spansion S34ML02G1
> nand: 256 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB size: 64
> Bad block table found at page 131008, version 0xFF
> Bad block table found at page 130944, version 0xFF
> 6 cmdlinepart partitions found on MTD device atmel_nand
> Creating 6 MTD partitions on "atmel_nand":
> 0x000000000000-0x000000040000 : "bootstrap"
> 0x000000040000-0x000000100000 : "uboot"
> 0x000000100000-0x000000140000 : "env1"
> 0x000000140000-0x000000180000 : "env2"
> 0x000000180000-0x000000200000 : "reserved"
> 0x000000200000-0x000010000000 : "UBI"
> NET: Registered protocol family 17
> ubi0: attaching mtd5
> random: fast init done
> ubi0: scanning is finished
> ubi0: attached mtd5 (name "UBI", size 254 MiB)
> ubi0: PEB size: 131072 bytes (128 KiB), LEB size: 126976 bytes
> ubi0: min./max. I/O unit sizes: 2048/2048, sub-page size 2048
> ubi0: VID header offset: 2048 (aligned 2048), data offset: 4096
> ubi0: good PEBs: 2028, bad PEBs: 4, corrupted PEBs: 0
> ubi0: user volume: 4, internal volumes: 1, max. volumes count: 128
> ubi0: max/mean erase counter: 4/1, WL threshold: 4096, image sequence
> number: 
> 1600812189
> ubi0: available PEBs: 0, total reserved PEBs: 2028, PEBs reserved for bad
> PEB 
> handling: 36
> ubi0: background thread "ubi_bgt0d" started, PID 85
> 
> No suspicious other messages. 
> 
> Not sure if that device would be affected anyways. No bad blocks are known
> on 
> this flash, device behaves as usual.

Thanks for testing! :)

Regards,
Stefan

> 
> HTH & Greets
> Alex
> 
> > > ---
> > > 
> > > Hi,
> > > 
> > > this is the second approach of this patch. The first one [1]
> > > unfortunately
> > > lead to boot failures on i.MX 27 boards [2] since the i.MX 27 driver
> > > uses
> > > the bad block marker position for the bad block table marker which lead
> > > to falsely identifying all BBT blocks as bad.
> > > 
> > > This new patch now skips the check for bad BBT blocks if the BBT marker
> > > position in OOB overlaps with the bad block marker position or if a
> > > driver
> > > can't use bad block markers in flash at all (NAND_BBT_NO_OOB_BBM or
> > > NAND_NO_BBM_QUIRK are set). This hopefully makes sure we don't break
> > > drivers which cannot check for bad BBT blocks due to the limitations
> > > mentioned before.
> > > 
> > > I was only able to test this patch on a phyCORE-i.MX 6 and a phyCARD-
> > > i.MX
> > > 27. Therfore would really appreciate more people testing this to make
> > > sure I have not missed another use case where the bad block marker
> > > position in OOB is used in a different way than for the BBM.
> > > 
> > > Regards,
> > > Stefan
> > > 
> > > [1]
> > > https://lore.kernel.org/linux-mtd/20210325102337.481172-1-s.riedmueller@p
> > > hytec.de/ [2]
> > > https://lore.kernel.org/linux-mtd/CAOMZO5DufVR=+EzCa1-MPUc+ZefZVTXb5Kgu3W
> > > xms7cxw9GmGg@mail.gmail.com/> 
> > >  drivers/mtd/nand/raw/nand_bbt.c | 34 +++++++++++++++++++++++++++++++++
> > >  1 file changed, 34 insertions(+)
> > > 
> > > diff --git a/drivers/mtd/nand/raw/nand_bbt.c
> > > b/drivers/mtd/nand/raw/nand_bbt.c index dced32a126d9..2a30714350ee
> > > 100644
> > > --- a/drivers/mtd/nand/raw/nand_bbt.c
> > > +++ b/drivers/mtd/nand/raw/nand_bbt.c
> > > @@ -447,6 +447,36 @@ static int scan_block_fast(struct nand_chip *this,
> > > struct nand_bbt_descr *bd,> 
> > >  	return 0;
> > >  
> > >  }
> > > 
> > > +/* Check if a potential BBT block is marked as bad */
> > > +static int bbt_block_checkbad(struct nand_chip *this,
> > > +				      struct nand_bbt_descr *td,
> > > +				      loff_t offs, uint8_t *buf)
> > > +{
> > > +	struct nand_bbt_descr *bd = this->badblock_pattern;
> > > +
> > > +	/*
> > > +	 * No need to check for a bad BBT block if the BBM area overlaps with
> > > +	 * the bad block table marker area in OOB since writing a BBM here
> > > +	 * invalidates the bad block table marker anyway.
> > > +	 */
> > > +	if (!(td->options & NAND_BBT_NO_OOB) &&
> > > +	    td->offs >= bd->offs && td->offs < bd->offs + bd->len)
> > > +		return 0;
> > > +
> > > +	/*
> > > +	 * There is no point in checking for a bad block marker if writing
> > > +	 * such marker is not supported
> > > +	 */
> > > +	if (this->bbt_options & NAND_BBT_NO_OOB_BBM ||
> > > +	    this->options & NAND_NO_BBM_QUIRK)
> > > +		return 0;
> > > +
> > > +	if (scan_block_fast(this, bd, offs, buf) > 0)
> > > +		return 1;
> > > +
> > > +	return 0;
> > > +}
> > > +
> > > 
> > >  /**
> > >  
> > >   * create_bbt - [GENERIC] Create a bad block table by scanning the
> > > device
> > >   * @this: NAND chip object
> > > 
> > > @@ -560,6 +590,10 @@ static int search_bbt(struct nand_chip *this,
> > > uint8_t
> > > *buf,> 
> > >  			int actblock = startblock + dir * block;
> > >  			loff_t offs = (loff_t)actblock << this-
> > > >bbt_erase_shift;
> > > 
> > > +			/* Check if block is marked bad */
> > > +			if (bbt_block_checkbad(this, td, offs, buf))
> > > +				continue;
> > > +
> > > 
> > >  			/* Read first page */
> > >  			scan_read(this, buf, offs, mtd->writesize, td);
> > >  			if (!check_pattern(buf, scanlen, mtd->writesize, td))
> > > {
> > 
> > Thanks,
> > Miquèl
> > 
> > ______________________________________________________
> > Linux MTD discussion mailing list
> > http://lists.infradead.org/mailman/listinfo/linux-mtd/
> 
> 
>
Miquel Raynal July 15, 2021, 11:08 p.m. UTC | #4
On Fri, 2021-06-25 at 12:38:21 UTC, Stefan Riedmueller wrote:
> The blocks containing the bad block table can become bad as well. So
> make sure to skip any blocks that are marked bad when searching for the
> bad block table.
> 
> Otherwise in very rare cases where two BBT blocks wear out it might
> happen that an obsolete BBT is used instead of a newer available
> version.
> 
> This only applies to drivers which make use of a bad block marker in flash.
> Other drivers won't be able to identify bad BBT blocks and thus can't skip
> these.
> 
> Signed-off-by: Stefan Riedmueller <s.riedmueller@phytec.de>

Applied to https://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux.git nand/next, thanks.

Miquel
diff mbox series

Patch

diff --git a/drivers/mtd/nand/raw/nand_bbt.c b/drivers/mtd/nand/raw/nand_bbt.c
index dced32a126d9..2a30714350ee 100644
--- a/drivers/mtd/nand/raw/nand_bbt.c
+++ b/drivers/mtd/nand/raw/nand_bbt.c
@@ -447,6 +447,36 @@  static int scan_block_fast(struct nand_chip *this, struct nand_bbt_descr *bd,
 	return 0;
 }
 
+/* Check if a potential BBT block is marked as bad */
+static int bbt_block_checkbad(struct nand_chip *this,
+				      struct nand_bbt_descr *td,
+				      loff_t offs, uint8_t *buf)
+{
+	struct nand_bbt_descr *bd = this->badblock_pattern;
+
+	/*
+	 * No need to check for a bad BBT block if the BBM area overlaps with
+	 * the bad block table marker area in OOB since writing a BBM here
+	 * invalidates the bad block table marker anyway.
+	 */
+	if (!(td->options & NAND_BBT_NO_OOB) &&
+	    td->offs >= bd->offs && td->offs < bd->offs + bd->len)
+		return 0;
+
+	/*
+	 * There is no point in checking for a bad block marker if writing
+	 * such marker is not supported
+	 */
+	if (this->bbt_options & NAND_BBT_NO_OOB_BBM ||
+	    this->options & NAND_NO_BBM_QUIRK)
+		return 0;
+
+	if (scan_block_fast(this, bd, offs, buf) > 0)
+		return 1;
+
+	return 0;
+}
+
 /**
  * create_bbt - [GENERIC] Create a bad block table by scanning the device
  * @this: NAND chip object
@@ -560,6 +590,10 @@  static int search_bbt(struct nand_chip *this, uint8_t *buf,
 			int actblock = startblock + dir * block;
 			loff_t offs = (loff_t)actblock << this->bbt_erase_shift;
 
+			/* Check if block is marked bad */
+			if (bbt_block_checkbad(this, td, offs, buf))
+				continue;
+
 			/* Read first page */
 			scan_read(this, buf, offs, mtd->writesize, td);
 			if (!check_pattern(buf, scanlen, mtd->writesize, td)) {