mtd: Add config option to skip scanning bad blocks in partitions

Message ID 1421206498-28826-1-git-send-email-dehrenberg@chromium.org
State Rejected

Commit Message

Dan Ehrenberg Jan. 14, 2015, 3:34 a.m. UTC
Each partition on a device with bad blocks iterates over its erase
blocks to find an accurate count of bad blocks. The iteration can take
some time, especially on devices with many erase blocks which are
configured to use the OOB area to detect if the block is bad. This
patch makes a config option, MTD_PART_BAD_BLOCK_COUNT, which can
be turned off to skip the initial scan and forgo an accurate count.

Signed-off-by: Dan Ehrenberg <dehrenberg@chromium.org>
---
 drivers/mtd/Kconfig   | 9 +++++++++
 drivers/mtd/mtdpart.c | 2 ++
 2 files changed, 11 insertions(+)

Comments

Ricard Wanderlof Jan. 14, 2015, 7:47 a.m. UTC | #1
On Wed, 14 Jan 2015, Dan Ehrenberg wrote:

> Each partition on a device with bad blocks iterates over its erase
> blocks to find an accurate count of bad blocks. The iteration can take
> some time, especially on devices with many erase blocks which are
> configured to use the OOB area to detect if the block is bad. This
> patch makes a config option, MTD_PART_BAD_BLOCK_COUNT, which can
> be turned off to skip the initial scan and forgo an accurate count.

How much time would you actually save? Sure, the number of blocks in a 
modern flash is in the thousands, but one only has to read the OOB of the 
first page to find the bad block marker, so scanning through the whole 
flash does not take long.

/Ricard
Dan Ehrenberg Jan. 14, 2015, 6:08 p.m. UTC | #2
To initialize the partitions, it takes me 3.5s with this config option
turned on and 0.2s with the config option turned off. So a large
proportion of the NAND startup time (not counting UBI coming up)
comes from this bad block scan. It takes about 1.87s to scan over the
device, and I have my partitioning set up so there's an extra
partition covering the whole device in addition to the individual
partitions.

Dan

Richard Weinberger Jan. 14, 2015, 6:35 p.m. UTC | #3
On Wed, Jan 14, 2015 at 7:08 PM, Daniel Ehrenberg <dehrenberg@google.com> wrote:
> To initialize the partitions, it takes me 3.5s with this config option
> turned on and .2s with the config option turned off. So a large
> proportion of the NAND startup time (not counting ubi coming up)
> comes from this bad block scan. It takes about 1.87s to scan over the
> device, and I have my partitioning set up so there's an extra
> partition covering the whole device in addition to the individual
> partitions.

3.5 seconds?! Wow.
Brian Norris Jan. 14, 2015, 7:19 p.m. UTC | #4
On Wed, Jan 14, 2015 at 10:08:51AM -0800, Daniel Ehrenberg wrote:
> To initialize the partitions, it takes me 3.5s with this config option
> turned on and .2s with the config option turned off. So a large
> proportion of the NAND startup time (not counting ubi coming up)
> comes from this bad block scan. It takes about 1.87s to scan over the
> device, and I have my partitioning set up so there's an extra
> partition covering the whole device in addition to the individual
> partitions.

Why don't you have BBT enabled? Even if it's just in-memory (not
on-flash), you should only have to scan the entire flash once (not once
for each partition), after which every subsequent lookup should only be
a memory access. That'd cut your time in half already, I expect. But if
you use the on-flash BBT (NAND_BBT_USE_FLASH), then you should only have
to scan the flash once per flash lifetime, after which the bad block
information will be stored in just a few pages, and should take only a
millisecond or two to read.
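
A minimal userspace sketch of the in-memory idea described above, not
the kernel's nand_bbt code. All names (sim_flash, sim_bbt, NUM_BLOCKS,
and the helper functions) are hypothetical, and the "flash" is just an
array with a read counter so the cost of each approach can be compared.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NUM_BLOCKS 1024u          /* hypothetical device size, in erase blocks */

/* Simulated flash: true means the block carries a bad block marker. */
struct sim_flash {
    bool marker[NUM_BLOCKS];
    unsigned long oob_reads;      /* how many times the "flash" was touched */
};

/* In-memory bad block table: one bit per erase block. */
struct sim_bbt {
    uint8_t bits[NUM_BLOCKS / 8];
};

/* Slow path: read the marker from the (simulated) OOB area. */
bool flash_block_isbad(struct sim_flash *f, size_t block)
{
    assert(block < NUM_BLOCKS);
    f->oob_reads++;
    return f->marker[block];
}

/* Scan the whole device once and remember the result in RAM. */
void bbt_build(struct sim_bbt *t, struct sim_flash *f)
{
    memset(t->bits, 0, sizeof(t->bits));
    for (size_t b = 0; b < NUM_BLOCKS; b++)
        if (flash_block_isbad(f, b))
            t->bits[b / 8] |= (uint8_t)(1u << (b % 8));
}

/* Fast path: a lookup is a memory access only. */
bool bbt_block_isbad(const struct sim_bbt *t, size_t block)
{
    assert(block < NUM_BLOCKS);
    return (t->bits[block / 8] >> (block % 8)) & 1u;
}

/* Count bad blocks in [first, first + count) by reading the flash. */
unsigned count_bad_uncached(struct sim_flash *f, size_t first, size_t count)
{
    unsigned bad = 0;
    for (size_t b = first; b < first + count; b++)
        bad += flash_block_isbad(f, b);
    return bad;
}

/* Same count, served from the in-memory table. */
unsigned count_bad_cached(const struct sim_bbt *t, size_t first, size_t count)
{
    unsigned bad = 0;
    for (size_t b = first; b < first + count; b++)
        bad += bbt_block_isbad(t, b);
    return bad;
}
```

With a layout like the one described earlier in the thread (individual
partitions plus one partition covering the whole device), the uncached
counts touch every block twice, while bbt_build() touches each block
once and the per-partition counts then cost no flash reads at all.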

(There are other reasons for using on-flash BBT, but I won't detail them
here.)

I also recall the bad block scanning process will read not just the
spare area, but actually the full 1st page. So that might cause a little
unnecessary slowness.

All in all, I think it's highly unlikely that your patch is necessary.
There are much better ways to get you a faster boot time.

Brian
Dan Ehrenberg Jan. 15, 2015, 5:28 p.m. UTC | #5
On Wed, Jan 14, 2015 at 11:19 AM, Brian Norris
<computersforpeace@gmail.com> wrote:
> On Wed, Jan 14, 2015 at 10:08:51AM -0800, Daniel Ehrenberg wrote:
>> To initialize the partitions, it takes me 3.5s with this config option
>> turned on and .2s with the config option turned off. So a large
>> proportion of the NAND startup time (not counting ubi coming up)
>> comes from this bad block scan. It takes about 1.87s to scan over the
>> device, and I have my partitioning set up so there's an extra
>> partition covering the whole device in addition to the individual
>> partitions.
>
> Why don't you have BBT enabled? Even if it's just in-memory (not
> on-flash), you should only have to scan the entire flash once (not once
> for each partition), after which every subsequent lookup should only be
> a memory access. That'd cut your time in half already, I expect. But if
> you use the on-flash BBT (NAND_BBT_USE_FLASH), then you should only have
> to scan the flash once per flash lifetime, after which the bad block
> information will be stored in just a few pages, and should take only a
> millisecond or two to read.
>
> (There are other reasons for using on-flash BBT, but I won't detail them
> here.)
>
> I also recall the bad block scanning process will read not just the
> spare area, but actually the full 1st page. So that might cause a little
> unnecessary slowness.
>
> All in all, I think it's highly unlikely that your patch is necessary.
> There are much better ways to get you a faster boot time.
>
> Brian

Thanks for the suggestions. I'm new and I definitely want to do things
the right way, if there's a better way that doesn't need all these
modifications.

I'm using UBI, so I thought an on-device BBT is redundant and would be
ignored by UBI. Also, I couldn't figure out how I'm supposed to
enable it--the out-of-tree driver I'm using isn't really based on
nand_base.c; correct me if I'm wrong, but it looks like that's a
requirement for using the nand_bbt code. Cutting my time in half
wouldn't be all that great since that's still almost 2 seconds of
useless work.

Where can I read about the other advantages of on-flash BBT when using
UBI? Maybe I can use this technique for other devices I'm working on
where nand_bbt could be used.

Thanks,
Dan
Ricard Wanderlof Jan. 16, 2015, 9:31 a.m. UTC | #6
On Thu, 15 Jan 2015, Daniel Ehrenberg wrote:

> I'm using UBI, so I thought an on-device BBT is redundant and would be
> ignored by UBI. Also, I couldn't figure out how I'm supposed to
> enable it--the out-of-tree driver I'm using isn't really based on
> nand_base.c; correct me if I'm wrong, but it looks like that's a
> requirement for using the nand_bbt code. Cutting my time in half
> wouldn't be all that great since that's still almost 2 seconds of
> useless work.

What kind of driver are you using? I always thought UBI essentially needed 
an mtd device, meaning that if you're using a NAND flash, nand_base.c 
comes in to play, along with a board (hardware) specific driver of course.

The bad block handling mechanism is basically hidden away inside mtd, 
there are mtd API calls to determine if a block is bad or not, and mtd 
handles that according to the configuration for the specific driver, 
either by reading from the flash every time, or using a BBT.

> Where can I read about the other advantages of on-flash BBT when using
> UBI? Maybe I can use this technique for other devices I'm working on
> where nand_bbt could be used.

There are no specific UBI-related advantages when using a BBT 
that I know of.

One turning point for me was the realization that at least for some NAND 
chips, it is out of spec to read bad blocks except once, to determine 
which ones are bad by looking at the bad block markers in the individual 
blocks. Hence, using a BBT is a way of fulfilling that requirement. In 
practice, though, the flashes I've come in contact with (SLC up to say 2 
Gbit) don't seem to have any problems in this regard.

Another advantage in a development environment, especially when working 
with upgrade software, is that if something goes wrong that causes blocks 
to be marked bad spuriously, if not using a BBT the new bad blocks are 
marked as such in the same way as factory bad blocks, making it impossible 
to separate the two and undo the bad block markings and revert to the 
factory condition. When using a BBT, new bad blocks are marked in the BBT 
only, so by erasing the BBT and rescanning the bad block markers in the 
OOBs of the blocks the flash can be reset to a known state.
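
A simplified model of the distinction described above, assuming the
behaviour is exactly as stated in this message (new marks go to the BBT
only when a BBT is in use). The names sim_nand, sim_markbad_oob,
sim_markbad_bbt and sim_bbt_rescan are hypothetical and do not
correspond to any kernel API; real drivers may behave differently
depending on their configuration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define NUM_BLOCKS 64u   /* hypothetical, tiny device */

struct sim_nand {
    bool oob_marker[NUM_BLOCKS]; /* marker stored in each block's OOB */
    bool bbt[NUM_BLOCKS];        /* separate bad block table */
};

/* Build (or rebuild) the table from the OOB markers, discarding
 * anything that was recorded in the table only. */
void sim_bbt_rescan(struct sim_nand *n)
{
    memcpy(n->bbt, n->oob_marker, sizeof(n->bbt));
}

/* No table in use: a new bad block is recorded in the OOB, where it
 * looks exactly like a factory mark. */
void sim_markbad_oob(struct sim_nand *n, size_t block)
{
    assert(block < NUM_BLOCKS);
    n->oob_marker[block] = true;
}

/* Table in use: a new bad block is recorded in the table only, so the
 * OOB markers still reflect the factory state. */
void sim_markbad_bbt(struct sim_nand *n, size_t block)
{
    assert(block < NUM_BLOCKS);
    n->bbt[block] = true;
}

/* Count the entries set in either map. */
unsigned sim_count(const bool *map)
{
    unsigned bad = 0;
    for (size_t b = 0; b < NUM_BLOCKS; b++)
        bad += map[b];
    return bad;
}
```

In this model, spurious marks made through sim_markbad_bbt() vanish
after sim_bbt_rescan(), whereas marks made through sim_markbad_oob()
survive the rescan because they can no longer be told apart from
factory marks.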

/Ricard
Dan Ehrenberg Jan. 16, 2015, 6:08 p.m. UTC | #7
On Fri, Jan 16, 2015 at 1:31 AM, Ricard Wanderlof
<ricard.wanderlof@axis.com> wrote:
> What kind of driver are you using? I always thought UBI essentially needed
> an mtd device, meaning that if you're using a NAND flash, nand_base.c
> comes in to play, along with a board (hardware) specific driver of course.
>
> The bad block handling mechanism is basically hidden away inside mtd,
> there are mtd API calls to determine if a block is bad or not, and mtd
> handles that according to the configuration for the specific driver,
> either by reading from the flash every time, or using a BBT.

I'm currently using an out-of-tree NAND driver which is not based on
nand_base.c.
>
>> Where can I read about the other advantages of on-flash BBT when using
>> UBI? Maybe I can use this technique for other devices I'm working on
>> where nand_bbt could be used.
>
> There are no specific UBI-related advantages when using a BBT
> that I know of.
>
> One turning point for me was the realization that at least for some NAND
> chips, it is out of spec to read bad blocks except once, to determine
> which ones are bad by looking at the bad block markers in the individual
> blocks. Hence, using a BBT is a way of fulfilling that requirement. In
> practice, though, the flashes I've come in contact with (SLC up to say 2
> Gbit) don't seem to have any problems in this regard.

Does UBI not satisfy this requirement itself by its own bad block management?
>
> Another advantage in a development environment, especially when working
> with upgrade software, is that if something goes wrong that causes blocks
> to be marked bad spuriously, if not using a BBT the new bad blocks are
> marked as such in the same way as factory bad blocks, making it impossible
> to separate the two and undo the bad block markings and revert to the
> factory condition. When using a BBT, new bad blocks are marked in the BBT
> only, so by erasing the BBT and rescanning the bad block markers in the
> OOBs of the blocks the flash can be reset to a known state.
>
> /Ricard

But when I'm using UBI, will the bad block table ever be read by
anyone other than this startup scan to count the number of bad blocks
in the partition?

Dan
Ricard Wanderlof Jan. 16, 2015, 6:53 p.m. UTC | #8
On Fri, 16 Jan 2015, Daniel Ehrenberg wrote:

> I'm currently using an out-of-tree NAND driver which is not based on
> nand_base.c.

Are you and Erik Ekman on this list working on the same thing? He is also 
using an out-of-tree NAND driver.

> > One turning point for me was the realization that at least for some NAND
> > chips, it is out of spec to read bad blocks except once, to determine
> > which ones are bad by looking at the bad block markers in the individual
> > blocks. Hence, using a BBT is a way of fulfilling that requirement. In
> > practice, though, the flashes I've come in contact with (SLC up to say 2
> > Gbit) don't seem to have any problems in this regard.
> 
> Does UBI not satisfy this requirement itself by its own bad block management?

UBI does not have any bad block management itself. It is bad block aware, 
in that it knows the underlying media may have bad blocks, but it 
basically just ignores them and relies on the mtd layer to keep track of 
them.

> > with upgrade software, is that if something goes wrong that causes blocks
> > to be marked bad spuriously, if not using a BBT the new bad blocks are
> > marked as such in the same way as factory bad blocks, making it impossible
> > to separate the two and undo the bad block markings and revert to the
> > factory condition. When using a BBT, new bad blocks are marked in the BBT
> > only, so by erasing the BBT and rescanning the bad block markers in the
> > OOBs of the blocks the flash can be reset to a known state.
> >
> But when I'm using UBI, will the bad block table ever be read by
> anyone other than this startup scan to count the number of bad blocks
> in the partition?

When I said 'once' I meant once during the lifetime of the flash, not once 
per boot.

The principle is that bad blocks should basically never even be read after 
the flash has been written: since they are bad, doing so could 
potentially cause all sorts of ill effects on the data on the flash chip. 
Therefore it is acceptable to read them once when the flash chip is empty, 
but after that bad blocks should be managed using a separate table so that 
the bad blocks are never touched again.

In my personal, but rather limited, experience (1-2 Gb SLC flash), reading 
bad blocks doesn't seem to cause any problems. As I mentioned in another 
post, blocks seem to be either 'hard' bad, in which case they seem to be 
completely disengaged from the rest of the chip, or 'soft' bad, in which 
case they are just worn out past the point where they can remember data 
but not otherwise damaged. This is mostly speculation based on empirical 
findings, as I have not consulted any flash manufacturer on the subject.

The conclusion as far as I'm concerned is to use a BBT in production 
code so that the flash is only ever read in its entirety (i.e. with 
bad blocks) once, in order to generate the BBT. In our R&D department, 
especially in departments where they just want a unit to work and aren't 
testing or developing flash code, we can be more lenient and have even 
erased factory-marked bad blocks in some cases (when possible). In the 
unlikely event that a memory chip starts behaving erratically (which 
hasn't happened yet) we'd just replace it. In the field such a situation 
would be harder to diagnose so we want to avoid that as far as possible.

/Ricard
Dan Ehrenberg Jan. 16, 2015, 10:14 p.m. UTC | #9
On Fri, Jan 16, 2015 at 10:53 AM, Ricard Wanderlof
<ricard.wanderlof@axis.com> wrote:
>
> On Fri, 16 Jan 2015, Daniel Ehrenberg wrote:
>
> Are you and Erik Ekman on this list working on the same thing? He is also
> using an out-of-tree NAND driver.

No, I'm not working with him.

> The conclusion as far as I'm concerned is to use a BBT in production
> code so that the flash is only ever read in its entirety (i.e. with
> bad blocks) once, in order to generate the BBT. In our R&D department,
> especially in departments where they just want a unit to work and aren't
> testing or developing flash code, we can be more lenient and have even
> erased factory-marked bad blocks in some cases (when possible). In the
> unlikely event that a memory chip starts behaving erratically (which
> hasn't happened yet) we'd just replace it. In the field such a situation
> would be harder to diagnose so we want to avoid that as far as possible.

Thanks for this explanation. I'll work on adding a bad block table to
how I'm using NAND. I don't know what got into my head to think that
UBI was going to do all this for me. I guess the original patch at the
head of this thread isn't so useful for that use case.

Dan

Patch

diff --git a/drivers/mtd/Kconfig b/drivers/mtd/Kconfig
index 94b8210..cc5678f 100644
--- a/drivers/mtd/Kconfig
+++ b/drivers/mtd/Kconfig
@@ -309,6 +309,15 @@  config MTD_SWAP
 	  The driver provides wear leveling by storing erase counter into the
 	  OOB.
 
+config MTD_PART_BAD_BLOCK_COUNT
+	bool "Accurate bad block count for partitions"
+	default y
+	help
+	  Maintain an accurate count of bad blocks within the partition.
+	  With this option turned off, the bad block count might be
+	  inaccurate, avoiding a scan in partition initialization in
+	  order to improve boot performance on some systems.
+
 source "drivers/mtd/chips/Kconfig"
 
 source "drivers/mtd/maps/Kconfig"
diff --git a/drivers/mtd/mtdpart.c b/drivers/mtd/mtdpart.c
index a3e3a7d..aba0d6c 100644
--- a/drivers/mtd/mtdpart.c
+++ b/drivers/mtd/mtdpart.c
@@ -531,6 +531,7 @@  static struct mtd_part *allocate_partition(struct mtd_info *master,
 	slave->mtd.ecc_strength = master->ecc_strength;
 	slave->mtd.bitflip_threshold = master->bitflip_threshold;
 
+#ifdef CONFIG_MTD_PART_BAD_BLOCK_COUNT
 	if (master->_block_isbad) {
 		uint64_t offs = 0;
 
@@ -542,6 +543,7 @@  static struct mtd_part *allocate_partition(struct mtd_info *master,
 			offs += slave->mtd.erasesize;
 		}
 	}
+#endif
 
 out_register:
 	return slave;
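
For readers without the kernel tree at hand, the following is a
userspace sketch of what the guarded region does: walk the partition
one erase block at a time and count the blocks the master reports as
bad. It is an illustration under stated assumptions, not the mtdpart.c
code itself. SIM_PART_BAD_BLOCK_COUNT stands in for
CONFIG_MTD_PART_BAD_BLOCK_COUNT, and sim_master, sim_part,
sim_map_isbad and sim_part_init are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for CONFIG_MTD_PART_BAD_BLOCK_COUNT; build with
 * -DSIM_PART_BAD_BLOCK_COUNT=0 to model the option being off. */
#ifndef SIM_PART_BAD_BLOCK_COUNT
#define SIM_PART_BAD_BLOCK_COUNT 1
#endif

struct sim_master {
    uint64_t erasesize;
    /* Optional hook; NULL models a device without bad block support. */
    bool (*block_isbad)(const struct sim_master *m, uint64_t offs);
    const bool *bad_map;          /* one entry per erase block */
};

struct sim_part {
    uint64_t offset;              /* start of the partition on the master */
    uint64_t size;                /* partition length in bytes */
    unsigned badblocks;           /* result of the initial scan */
};

/* Example hook: look the block up in a plain array. */
bool sim_map_isbad(const struct sim_master *m, uint64_t offs)
{
    return m->bad_map[offs / m->erasesize];
}

/* Set up a partition and, if enabled, count its bad blocks. */
void sim_part_init(struct sim_part *p, const struct sim_master *m,
                   uint64_t offset, uint64_t size)
{
    p->offset = offset;
    p->size = size;
    p->badblocks = 0;

#if SIM_PART_BAD_BLOCK_COUNT
    if (m->block_isbad) {
        /* One query per erase block in the partition. */
        for (uint64_t offs = 0; offs < size; offs += m->erasesize)
            if (m->block_isbad(m, offs + offset))
                p->badblocks++;
    }
#else
    (void)m;                      /* scan skipped: count stays at zero */
#endif
}
```

With the switch off, initialization cost no longer depends on the
number of erase blocks, at the price of badblocks staying at zero
regardless of the actual state of the flash.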