Message ID | 1324332231-30884-1-git-send-email-computersforpeace@gmail.com |
---|---|
State | New, archived |
Headers | show |
Sorry for two e-mails; apparently several addresses I reached were no longer valid. I'm removing Randy Dunlap and Sukumar Ghorai from the CC list, and I updated Barry Song and Adrian Hunter's addresses. Hopefully this can prevent other people from receiving too many bounces :) Patch description copied below. On Mon, Dec 19, 2011 at 2:03 PM, Brian Norris <computersforpeace@gmail.com> wrote: > Currently, the flash-based BBT implementation writes bad block data only > to its flash-based table and not to the OOB marker area. Then, as new > bad blocks are marked over time, the OOB markers become out of date and > the flash-based table becomes the only source of current bad block > information. This can be a problem when: > > * bootloader cannot read the flash-based BBT format > * BBT is corrupted and the flash must be rescanned for bad > blocks; we want to remember bad blocks that were marked from Linux > > In an attempt to keep the bad block markers in sync with the flash-based > BBT, this patch changes the default so that we write bad block markers > to the proper OOB area on each block in addition to flash-based BBT. > > Theoretically, the bad block table and the OOB markers can still get out > of sync if the system experiences a power cut between writing the BBT to > flash and writing the OOB marker to a newly-marked bad block. However, > this is a relatively unlikely event, as new bad blocks shouldn't appear > frequently. > > Note that this is a change from the previous default flash-based BBT > behavior. If any contributors rely on the old behavior, they are welcome > to introduce an option flag for it. > > Adapted from code by Matthieu Castet. > > Signed-off-by: Brian Norris <computersforpeace@gmail.com> Brian
On 12/19/2011 11:03 PM, Brian Norris wrote: > Currently, the flash-based BBT implementation writes bad block data only > to its flash-based table and not to the OOB marker area. Then, as new > bad blocks are marked over time, the OOB markers become out of date and > the flash-based table becomes the only source of current bad block > information. This can be a problem when: > > * bootloader cannot read the flash-based BBT format > * BBT is corrupted and the flash must be rescanned for bad > blocks; we want to remember bad blocks that were marked from Linux > > In an attempt to keep the bad block markers in sync with the flash-based > BBT, this patch changes the default so that we write bad block markers > to the proper OOB area on each block in addition to flash-based BBT. > > Theoretically, the bad block table and the OOB markers can still get out > of sync if the system experiences a power cut between writing the BBT to > flash and writing the OOB marker to a newly-marked bad block. However, > this is a relatively unlikely event, as new bad blocks shouldn't appear > frequently. > The marker and BBT may get out of sync. You should use either the one _or_ the other but not both. Why use both anyway? I use BBT because I have no room left in OOB or updating a single area in OOB is too complicated / hardly possible. In the former case, reading that area again and interpreting it as something as it was once is a bad thing. Why is the BBT corrupted? You have two tables. If one is broken you have still the other. If you lose one block which was about to be marked bad you find soon enough. Sebastian
Hi Sebastian, On Tue, Dec 20, 2011 at 12:49 AM, Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote: > On 12/19/2011 11:03 PM, Brian Norris wrote: >> >> Currently, the flash-based BBT implementation writes bad block data only >> to its flash-based table and not to the OOB marker area. Then, as new >> bad blocks are marked over time, the OOB markers become out of date and >> the flash-based table becomes the only source of current bad block >> information. This can be a problem when: >> >> * bootloader cannot read the flash-based BBT format >> * BBT is corrupted and the flash must be rescanned for bad >> blocks; we want to remember bad blocks that were marked from Linux >> >> In an attempt to keep the bad block markers in sync with the flash-based >> BBT, this patch changes the default so that we write bad block markers >> to the proper OOB area on each block in addition to flash-based BBT. >> >> Theoretically, the bad block table and the OOB markers can still get out >> of sync if the system experiences a power cut between writing the BBT to >> flash and writing the OOB marker to a newly-marked bad block. However, >> this is a relatively unlikely event, as new bad blocks shouldn't appear >> frequently. >> > > The marker and BBT may get out of sync. You should use either the one _or_ > the other but not both. Why use both anyway? Two reasons for using OOB area were stated above. There are cases where: * bootloader cannot read the flash-based BBT format [... but it still may need to recognize bad blocks] * BBT is corrupted and the flash must be rescanned for bad blocks; we want to remember bad blocks that were marked from Linux In the above cases, one might still want to use flash-based BBT for boot-time performance. > I use BBT because I > have no room left in OOB or updating a single area in OOB is too > complicated / hardly possible. > In the former case, reading that area again and interpreting it as > something as it was once is a bad thing. Well, then that is one case where you want an option to avoid writing to markers back to OOB. We may need an option flag to enforce the old behavior. However, your use case is not the only use case. I use flash-based BBT because of the boot-time performance improvement when I don't have to rescan the whole flash. I believe there are others who use flash-based BBT who want the ability to fall back to OOB bad block markers. > Why is the BBT corrupted? You have two tables. If one is broken you > have still the other. If you lose one block which was about to be > marked bad you find soon enough. There are certainly cases where the table could be erased/overwritten (and therefore "corrupted"), like by a bootloader that doesn't recognize the BBT, by a software bug, etc. Also, a table entry that was written once and then read back over a lifetime might wear out and start to have errors. Recent patches have tried to solve this with "scrubbing" and, if necessary, rescanning the flash for bad blocks to rebuild the table. Also, mirroring is not a requirement for flash-based BBT. A driver could provide a custom, non-mirrored configuration. Brian
On Tue, 2011-12-20 at 10:17 -0800, Brian Norris wrote: > > I use BBT because I > > have no room left in OOB or updating a single area in OOB is too > > complicated / hardly possible. > > In the former case, reading that area again and interpreting it as > > something as it was once is a bad thing. > > Well, then that is one case where you want an option to avoid writing > to markers back to OOB. We may need an option flag to enforce the old > behavior. I guess there should be some way of detecting this case (no space in OOB) and just not writing. Artem.
On Tue, Dec 20, 2011 at 12:49 PM, Artem Bityutskiy <dedekind1@gmail.com> wrote: > On Tue, 2011-12-20 at 10:17 -0800, Brian Norris wrote: >> > I use BBT because I >> > have no room left in OOB or updating a single area in OOB is too >> > complicated / hardly possible. >> > In the former case, reading that area again and interpreting it as >> > something as it was once is a bad thing. >> >> Well, then that is one case where you want an option to avoid writing >> to markers back to OOB. We may need an option flag to enforce the old >> behavior. > > I guess there should be some way of detecting this case (no space in > OOB) and just not writing. Any suggestions on how to detect no space in OOB? ECC layout structure is not necessarily reliable, as its fields are designed for determining what's available apart from BBM and ECC (for userspace or filesystem data). My idea: instead of adding more layout info about free space for writing BBM, we can add a flag to avoid writing BBM to OOB (NAND_NO_WRITE_OOB?). Would this suffice for you, Sebastian? It would effectively be a reversal of my v1 patch; we write to both the flash-BBT and the OOB BBM by default and provide an option that prevents writing to OOB. If we use NAND_NO_WRITE_OOB, are there other reasonable situations in which we should prevent OOB writes according to this flag? Another question: I noticed that when writing bad block markers to OOB, we simply overwrite without erasing first. This is a problem on MLC, which aren't designed to be written twice; they usually can write some of the bits to 0, but not all of them. Is it reasonable to change nand_default_block_markbad() to always have it erase the block before writing? We can just ignore errors (if this is a really bad block that can't be erased properly). This would be subject of a second patch of course. Brian
On Wed, 21 Dec 2011 17:15:29 -0800 Brian Norris <computersforpeace@gmail.com> wrote: > Another question: I noticed that when writing bad block markers to > OOB, we simply overwrite without erasing first. This is a problem on > MLC, which aren't designed to be written twice; they usually can write > some of the bits to 0, but not all of them. Is it reasonable to change > nand_default_block_markbad() to always have it erase the block before > writing? Not an MLC expert, but could it be that for these MLC chips, NAND_BBT_USE_FLASH is usually set, hence the complexities invloved in re-programming the OOB are simply avoided? If so, it stregthens your NAND_BBT_WRITE_BBM/NAND_NO_WRITE_OOB approach, which goes hand-in-hand with Sebastian's scenario where there's no room in the OOB.
On Fri, Dec 23, 2011 at 12:13 AM, Shmulik Ladkani <shmulik.ladkani@gmail.com> wrote: > On Wed, 21 Dec 2011 17:15:29 -0800 Brian Norris <computersforpeace@gmail.com> wrote: >> Another question: I noticed that when writing bad block markers to >> OOB, we simply overwrite without erasing first. This is a problem on >> MLC, which aren't designed to be written twice; they usually can write >> some of the bits to 0, but not all of them. Is it reasonable to change >> nand_default_block_markbad() to always have it erase the block before >> writing? > > Not an MLC expert, but could it be that for these MLC chips, > NAND_BBT_USE_FLASH is usually set, hence the complexities invloved in > re-programming the OOB are simply avoided? I can't comment on others' use of MLC + NAND_BBT_USE_FLASH, but I use NAND_BBT_USE_FLASH for all NAND. NAND_BBT_USE_FLASH only applies to the bad block *table* location - that it will be stored in flash. This does not exclude the possibility of still writing/recognizing bad block markers in OOB. Thus, I do not understand your how "the complexities invloved in re-programming the OOB are simply avoided". In fact, I feel that (when possible) it's best to write to *both* a flash-based BBT and the OOB region of the bad block, so I do not plan to avoid the complexities of reprogramming the OOB. Please correct me if I'm missing your point. > If so, it stregthens your NAND_BBT_WRITE_BBM/NAND_NO_WRITE_OOB approach, > which goes hand-in-hand with Sebastian's scenario where there's no room > in the OOB. I will proceed with v2 adding a NAND_NO_WRITE_OOB flag to provide an option for cases like Sebastian's, where there is no writeable OOB space whatsoever. Perhaps the paragraph beginning with "Another question:" should be relegated to a separate patch/thread to avoid confusion between two related but distinct issues. I will try to write a patch soon that implements my suggestion and tag it onto a v2 patch series. Brian
diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c index 35b4565..dfa017e 100644 --- a/drivers/mtd/nand/nand_base.c +++ b/drivers/mtd/nand/nand_base.c @@ -393,6 +393,7 @@ static int nand_default_block_markbad(struct mtd_info *mtd, loff_t ofs) struct nand_chip *chip = mtd->priv; uint8_t buf[2] = { 0, 0 }; int block, ret, i = 0; + struct mtd_oob_ops ops; if (chip->bbt_options & NAND_BBT_SCANLASTPAGE) ofs += mtd->erasesize - mtd->writesize; @@ -403,34 +404,40 @@ static int nand_default_block_markbad(struct mtd_info *mtd, loff_t ofs) chip->bbt[block >> 2] |= 0x01 << ((block & 0x03) << 1); /* Do we have a flash based bad block table? */ - if (chip->bbt_options & NAND_BBT_USE_FLASH) + if (chip->bbt_options & NAND_BBT_USE_FLASH) { ret = nand_update_bbt(mtd, ofs); - else { - struct mtd_oob_ops ops; + if (ret) + return ret; + } - nand_get_device(chip, mtd, FL_WRITING); + /* + * Write bad block marker to OOB + * Note that a flash-based BBT (when used) can become out of sync with + * OOB markers if a power cut occurs here. See: + * http://lists.infradead.org/pipermail/linux-mtd/2011-December/038851.html + */ - /* - * Write to first two pages if necessary. If we write to more - * than one location, the first error encountered quits the - * procedure. We write two bytes per location, so we dont have - * to mess with 16 bit access. - */ - ops.len = ops.ooblen = 2; - ops.datbuf = NULL; - ops.oobbuf = buf; - ops.ooboffs = chip->badblockpos & ~0x01; - ops.mode = MTD_OPS_PLACE_OOB; - do { - ret = nand_do_write_oob(mtd, ofs, &ops); - - i++; - ofs += mtd->writesize; - } while (!ret && (chip->bbt_options & NAND_BBT_SCAN2NDPAGE) && - i < 2); + nand_get_device(chip, mtd, FL_WRITING); + + /* + * Write to first two pages if necessary. If we write to more than one + * location, the first error encountered quits the procedure. We write + * two bytes per location, so we dont have to mess with 16 bit access. + */ + ops.len = ops.ooblen = 2; + ops.datbuf = NULL; + ops.oobbuf = buf; + ops.ooboffs = chip->badblockpos & ~0x01; + ops.mode = MTD_OPS_PLACE_OOB; + do { + ret = nand_do_write_oob(mtd, ofs, &ops); + + i++; + ofs += mtd->writesize; + } while (!ret && (chip->bbt_options & NAND_BBT_SCAN2NDPAGE) && i < 2); + + nand_release_device(mtd); - nand_release_device(mtd); - } if (!ret) mtd->ecc_stats.badblocks++;
Currently, the flash-based BBT implementation writes bad block data only to its flash-based table and not to the OOB marker area. Then, as new bad blocks are marked over time, the OOB markers become out of date and the flash-based table becomes the only source of current bad block information. This can be a problem when: * bootloader cannot read the flash-based BBT format * BBT is corrupted and the flash must be rescanned for bad blocks; we want to remember bad blocks that were marked from Linux In an attempt to keep the bad block markers in sync with the flash-based BBT, this patch changes the default so that we write bad block markers to the proper OOB area on each block in addition to flash-based BBT. Theoretically, the bad block table and the OOB markers can still get out of sync if the system experiences a power cut between writing the BBT to flash and writing the OOB marker to a newly-marked bad block. However, this is a relatively unlikely event, as new bad blocks shouldn't appear frequently. Note that this is a change from the previous default flash-based BBT behavior. If any contributors rely on the old behavior, they are welcome to introduce an option flag for it. Adapted from code by Matthieu Castet. Signed-off-by: Brian Norris <computersforpeace@gmail.com> --- v2: Explain potential power cut issues and remove option for retaining old behavior. I CC'd various MTD contributors; speak up if the new default is unacceptable! drivers/mtd/nand/nand_base.c | 59 +++++++++++++++++++++++------------------ 1 files changed, 33 insertions(+), 26 deletions(-)