Patchwork [v2] mtd: nand: write bad block marker even with BBT

login
register
mail settings
Submitter Brian Norris
Date Dec. 19, 2011, 10:03 p.m.
Message ID <1324332231-30884-1-git-send-email-computersforpeace@gmail.com>
Download mbox | patch
Permalink /patch/132324/
State New
Headers show

Comments

Brian Norris - Dec. 19, 2011, 10:03 p.m.
Currently, the flash-based BBT implementation writes bad block data only
to its flash-based table and not to the OOB marker area. Then, as new
bad blocks are marked over time, the OOB markers become out of date and
the flash-based table becomes the only source of current bad block
information. This can be a problem when:

 * bootloader cannot read the flash-based BBT format
 * BBT is corrupted and the flash must be rescanned for bad
   blocks; we want to remember bad blocks that were marked from Linux

In an attempt to keep the bad block markers in sync with the flash-based
BBT, this patch changes the default so that we write bad block markers
to the proper OOB area on each block in addition to flash-based BBT.

Theoretically, the bad block table and the OOB markers can still get out
of sync if the system experiences a power cut between writing the BBT to
flash and writing the OOB marker to a newly-marked bad block. However,
this is a relatively unlikely event, as new bad blocks shouldn't appear
frequently.

Note that this is a change from the previous default flash-based BBT
behavior. If any contributors rely on the old behavior, they are welcome
to introduce an option flag for it.

Adapted from code by Matthieu Castet.

Signed-off-by: Brian Norris <computersforpeace@gmail.com>
---
v2: Explain potential power cut issues and remove option for retaining
    old behavior. I CC'd various MTD contributors; speak up if the new
    default is unacceptable!

drivers/mtd/nand/nand_base.c |   59 +++++++++++++++++++++++------------------
 1 files changed, 33 insertions(+), 26 deletions(-)
Brian Norris - Dec. 19, 2011, 10:16 p.m.
Sorry for two e-mails; apparently several addresses I reached were no
longer valid. I'm removing Randy Dunlap and Sukumar Ghorai from the CC
list, and I updated Barry Song and Adrian Hunter's addresses.

Hopefully this can prevent other people from receiving too many
bounces :) Patch description copied below.

On Mon, Dec 19, 2011 at 2:03 PM, Brian Norris
<computersforpeace@gmail.com> wrote:
> Currently, the flash-based BBT implementation writes bad block data only
> to its flash-based table and not to the OOB marker area. Then, as new
> bad blocks are marked over time, the OOB markers become out of date and
> the flash-based table becomes the only source of current bad block
> information. This can be a problem when:
>
>  * bootloader cannot read the flash-based BBT format
>  * BBT is corrupted and the flash must be rescanned for bad
>   blocks; we want to remember bad blocks that were marked from Linux
>
> In an attempt to keep the bad block markers in sync with the flash-based
> BBT, this patch changes the default so that we write bad block markers
> to the proper OOB area on each block in addition to flash-based BBT.
>
> Theoretically, the bad block table and the OOB markers can still get out
> of sync if the system experiences a power cut between writing the BBT to
> flash and writing the OOB marker to a newly-marked bad block. However,
> this is a relatively unlikely event, as new bad blocks shouldn't appear
> frequently.
>
> Note that this is a change from the previous default flash-based BBT
> behavior. If any contributors rely on the old behavior, they are welcome
> to introduce an option flag for it.
>
> Adapted from code by Matthieu Castet.
>
> Signed-off-by: Brian Norris <computersforpeace@gmail.com>

Brian
Sebastian Siewior - Dec. 20, 2011, 8:49 a.m.
On 12/19/2011 11:03 PM, Brian Norris wrote:
> Currently, the flash-based BBT implementation writes bad block data only
> to its flash-based table and not to the OOB marker area. Then, as new
> bad blocks are marked over time, the OOB markers become out of date and
> the flash-based table becomes the only source of current bad block
> information. This can be a problem when:
>
>   * bootloader cannot read the flash-based BBT format
>   * BBT is corrupted and the flash must be rescanned for bad
>     blocks; we want to remember bad blocks that were marked from Linux
>
> In an attempt to keep the bad block markers in sync with the flash-based
> BBT, this patch changes the default so that we write bad block markers
> to the proper OOB area on each block in addition to flash-based BBT.
>
> Theoretically, the bad block table and the OOB markers can still get out
> of sync if the system experiences a power cut between writing the BBT to
> flash and writing the OOB marker to a newly-marked bad block. However,
> this is a relatively unlikely event, as new bad blocks shouldn't appear
> frequently.
>

The marker and BBT may get out of sync. You should use either the one 
_or_ the other but not both. Why use both anyway? I use BBT because I
have no room left in OOB or updating a single area in OOB is too
complicated / hardly possible.
In the former case, reading that area again and interpreting it as
something as it was once is a bad thing.

Why is the BBT corrupted? You have two tables. If one is broken you
have still the other. If you lose one block which was about to be
marked bad you find soon enough.

Sebastian
Brian Norris - Dec. 20, 2011, 6:17 p.m.
Hi Sebastian,

On Tue, Dec 20, 2011 at 12:49 AM, Sebastian Andrzej Siewior
<bigeasy@linutronix.de> wrote:
> On 12/19/2011 11:03 PM, Brian Norris wrote:
>>
>> Currently, the flash-based BBT implementation writes bad block data only
>> to its flash-based table and not to the OOB marker area. Then, as new
>> bad blocks are marked over time, the OOB markers become out of date and
>> the flash-based table becomes the only source of current bad block
>> information. This can be a problem when:
>>
>>  * bootloader cannot read the flash-based BBT format
>>  * BBT is corrupted and the flash must be rescanned for bad
>>    blocks; we want to remember bad blocks that were marked from Linux
>>
>> In an attempt to keep the bad block markers in sync with the flash-based
>> BBT, this patch changes the default so that we write bad block markers
>> to the proper OOB area on each block in addition to flash-based BBT.
>>
>> Theoretically, the bad block table and the OOB markers can still get out
>> of sync if the system experiences a power cut between writing the BBT to
>> flash and writing the OOB marker to a newly-marked bad block. However,
>> this is a relatively unlikely event, as new bad blocks shouldn't appear
>> frequently.
>>
>
> The marker and BBT may get out of sync. You should use either the one _or_
> the other but not both. Why use both anyway?

Two reasons for using OOB area were stated above. There are cases where:
* bootloader cannot read the flash-based BBT format [... but it still
may need to recognize bad blocks]
* BBT is corrupted and the flash must be rescanned for bad blocks; we
want to remember bad blocks that were marked from Linux

In the above cases, one might still want to use flash-based BBT for
boot-time performance.

> I use BBT because I
> have no room left in OOB or updating a single area in OOB is too
> complicated / hardly possible.
> In the former case, reading that area again and interpreting it as
> something as it was once is a bad thing.

Well, then that is one case where you want an option to avoid writing
to markers back to OOB. We may need an option flag to enforce the old
behavior.

However, your use case is not the only use case. I use flash-based BBT
because of the boot-time performance improvement when I don't have to
rescan the whole flash. I believe there are others who use flash-based
BBT who want the ability to fall back to OOB bad block markers.

> Why is the BBT corrupted? You have two tables. If one is broken you
> have still the other. If you lose one block which was about to be
> marked bad you find soon enough.

There are certainly cases where the table could be erased/overwritten
(and therefore "corrupted"), like by a bootloader that doesn't
recognize the BBT, by a software bug, etc. Also, a table entry that
was written once and then read back over a lifetime might wear out and
start to have errors. Recent patches have tried to solve this with
"scrubbing" and, if necessary, rescanning the flash for bad blocks to
rebuild the table.

Also, mirroring is not a requirement for flash-based BBT. A driver
could provide a custom, non-mirrored configuration.

Brian
Artem Bityutskiy - Dec. 20, 2011, 8:49 p.m.
On Tue, 2011-12-20 at 10:17 -0800, Brian Norris wrote:
> > I use BBT because I
> > have no room left in OOB or updating a single area in OOB is too
> > complicated / hardly possible.
> > In the former case, reading that area again and interpreting it as
> > something as it was once is a bad thing.
> 
> Well, then that is one case where you want an option to avoid writing
> to markers back to OOB. We may need an option flag to enforce the old
> behavior. 

I guess there should be some way of detecting this case (no space in
OOB) and just not writing.

Artem.
Brian Norris - Dec. 22, 2011, 1:15 a.m.
On Tue, Dec 20, 2011 at 12:49 PM, Artem Bityutskiy <dedekind1@gmail.com> wrote:
> On Tue, 2011-12-20 at 10:17 -0800, Brian Norris wrote:
>> > I use BBT because I
>> > have no room left in OOB or updating a single area in OOB is too
>> > complicated / hardly possible.
>> > In the former case, reading that area again and interpreting it as
>> > something as it was once is a bad thing.
>>
>> Well, then that is one case where you want an option to avoid writing
>> to markers back to OOB. We may need an option flag to enforce the old
>> behavior.
>
> I guess there should be some way of detecting this case (no space in
> OOB) and just not writing.

Any suggestions on how to detect no space in OOB? ECC layout structure
is not necessarily reliable, as its fields are designed for
determining what's available apart from BBM and ECC (for userspace or
filesystem data).

My idea: instead of adding more layout info about free space for
writing BBM, we can add a flag to avoid writing BBM to OOB
(NAND_NO_WRITE_OOB?). Would this suffice for you, Sebastian? It would
effectively be a reversal of my v1 patch; we write to both the
flash-BBT and the OOB BBM by default and provide an option that
prevents writing to OOB.

If we use NAND_NO_WRITE_OOB, are there other reasonable situations in
which we should prevent OOB writes according to this flag?

Another question: I noticed that when writing bad block markers to
OOB, we simply overwrite without erasing first. This is a problem on
MLC, which aren't designed to be written twice; they usually can write
some of the bits to 0, but not all of them. Is it reasonable to change
nand_default_block_markbad() to always have it erase the block before
writing? We can just ignore errors (if this is a really bad block that
can't be erased properly). This would be subject of a second patch of
course.

Brian
Shmulik Ladkani - Dec. 23, 2011, 8:13 a.m.
On Wed, 21 Dec 2011 17:15:29 -0800 Brian Norris <computersforpeace@gmail.com> wrote:
> Another question: I noticed that when writing bad block markers to
> OOB, we simply overwrite without erasing first. This is a problem on
> MLC, which aren't designed to be written twice; they usually can write
> some of the bits to 0, but not all of them. Is it reasonable to change
> nand_default_block_markbad() to always have it erase the block before
> writing?

Not an MLC expert, but could it be that for these MLC chips,
NAND_BBT_USE_FLASH is usually set, hence the complexities invloved in
re-programming the OOB are simply avoided?

If so, it stregthens your NAND_BBT_WRITE_BBM/NAND_NO_WRITE_OOB approach,
which goes hand-in-hand with Sebastian's scenario where there's no room
in the OOB.
Brian Norris - Jan. 6, 2012, 3:10 a.m.
On Fri, Dec 23, 2011 at 12:13 AM, Shmulik Ladkani
<shmulik.ladkani@gmail.com> wrote:
> On Wed, 21 Dec 2011 17:15:29 -0800 Brian Norris <computersforpeace@gmail.com> wrote:
>> Another question: I noticed that when writing bad block markers to
>> OOB, we simply overwrite without erasing first. This is a problem on
>> MLC, which aren't designed to be written twice; they usually can write
>> some of the bits to 0, but not all of them. Is it reasonable to change
>> nand_default_block_markbad() to always have it erase the block before
>> writing?
>
> Not an MLC expert, but could it be that for these MLC chips,
> NAND_BBT_USE_FLASH is usually set, hence the complexities invloved in
> re-programming the OOB are simply avoided?

I can't comment on others' use of MLC + NAND_BBT_USE_FLASH, but I use
NAND_BBT_USE_FLASH for all NAND.

NAND_BBT_USE_FLASH only applies to the bad block *table* location -
that it will be stored in flash. This does not exclude the possibility
of still writing/recognizing bad block markers in OOB. Thus, I do not
understand your how "the complexities invloved in re-programming the
OOB are simply avoided". In fact, I feel that (when possible) it's
best to write to *both* a flash-based BBT and the OOB region of the
bad block, so I do not plan to avoid the complexities of reprogramming
the OOB. Please correct me if I'm missing your point.

> If so, it stregthens your NAND_BBT_WRITE_BBM/NAND_NO_WRITE_OOB approach,
> which goes hand-in-hand with Sebastian's scenario where there's no room
> in the OOB.

I will proceed with v2 adding a NAND_NO_WRITE_OOB flag to provide an
option for cases like Sebastian's, where there is no writeable OOB
space whatsoever.

Perhaps the paragraph beginning with "Another question:" should be
relegated to a separate patch/thread to avoid confusion between two
related but distinct issues. I will try to write a patch soon that
implements my suggestion and tag it onto a v2 patch series.

Brian

Patch

diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c
index 35b4565..dfa017e 100644
--- a/drivers/mtd/nand/nand_base.c
+++ b/drivers/mtd/nand/nand_base.c
@@ -393,6 +393,7 @@  static int nand_default_block_markbad(struct mtd_info *mtd, loff_t ofs)
 	struct nand_chip *chip = mtd->priv;
 	uint8_t buf[2] = { 0, 0 };
 	int block, ret, i = 0;
+	struct mtd_oob_ops ops;
 
 	if (chip->bbt_options & NAND_BBT_SCANLASTPAGE)
 		ofs += mtd->erasesize - mtd->writesize;
@@ -403,34 +404,40 @@  static int nand_default_block_markbad(struct mtd_info *mtd, loff_t ofs)
 		chip->bbt[block >> 2] |= 0x01 << ((block & 0x03) << 1);
 
 	/* Do we have a flash based bad block table? */
-	if (chip->bbt_options & NAND_BBT_USE_FLASH)
+	if (chip->bbt_options & NAND_BBT_USE_FLASH) {
 		ret = nand_update_bbt(mtd, ofs);
-	else {
-		struct mtd_oob_ops ops;
+		if (ret)
+			return ret;
+	}
 
-		nand_get_device(chip, mtd, FL_WRITING);
+	/*
+	 * Write bad block marker to OOB
+	 * Note that a flash-based BBT (when used) can become out of sync with
+	 * OOB markers if a power cut occurs here. See:
+	 *   http://lists.infradead.org/pipermail/linux-mtd/2011-December/038851.html
+	 */
 
-		/*
-		 * Write to first two pages if necessary. If we write to more
-		 * than one location, the first error encountered quits the
-		 * procedure. We write two bytes per location, so we dont have
-		 * to mess with 16 bit access.
-		 */
-		ops.len = ops.ooblen = 2;
-		ops.datbuf = NULL;
-		ops.oobbuf = buf;
-		ops.ooboffs = chip->badblockpos & ~0x01;
-		ops.mode = MTD_OPS_PLACE_OOB;
-		do {
-			ret = nand_do_write_oob(mtd, ofs, &ops);
-
-			i++;
-			ofs += mtd->writesize;
-		} while (!ret && (chip->bbt_options & NAND_BBT_SCAN2NDPAGE) &&
-				i < 2);
+	nand_get_device(chip, mtd, FL_WRITING);
+
+	/*
+	 * Write to first two pages if necessary. If we write to more than one
+	 * location, the first error encountered quits the procedure. We write
+	 * two bytes per location, so we dont have to mess with 16 bit access.
+	 */
+	ops.len = ops.ooblen = 2;
+	ops.datbuf = NULL;
+	ops.oobbuf = buf;
+	ops.ooboffs = chip->badblockpos & ~0x01;
+	ops.mode = MTD_OPS_PLACE_OOB;
+	do {
+		ret = nand_do_write_oob(mtd, ofs, &ops);
+
+		i++;
+		ofs += mtd->writesize;
+	} while (!ret && (chip->bbt_options & NAND_BBT_SCAN2NDPAGE) && i < 2);
+
+	nand_release_device(mtd);
 
-		nand_release_device(mtd);
-	}
 	if (!ret)
 		mtd->ecc_stats.badblocks++;