diff mbox

UBI: add minimal amount of reserved erase blocks in Kconfig

Message ID 1340636918-7505-1-git-send-email-richard.genoud@gmail.com
State New, archived
Headers show

Commit Message

Richard Genoud June 25, 2012, 3:08 p.m. UTC
On NAND flash devices, UBI reserves some physical erase blocks (PEB) for
bad block handling.
Today, the number of reserved PEB can only be set as a percentage of the
total number of PEB in each UBI volume.
For example, for a NAND flash with 128KiB PEB, 2 MTD partition of 20MiB
(mtd0) and 100MiB (mtd1) and 2% reserved PEB:
- the UBI volume on mtd0 will have 2 PEB reserved
- the UBI volume on mtd1 will have 16 PEB reserved

The problem with this behaviour is that NAND flash manufacturers give a
minimum number of valid block (NVB) during the endurance life of the
device.
E.G.:
Parameter             Symbol    Min    Max    Unit      Notes
--------------------------------------------------------------
Valid block number     NVB     1004    1024   Blocks     1
Note:
1. Invalid blocks are block that contain one or more bad bits beyond
ECC. The device may contain bad blocks upon shipment. Additional bad
blocks may develop over time; however, the total number of available
blocks will not drop below NVB during the endurance life of the device.

From this number we can deduce the maximum number of bad PEB that a
device will contain during its endurance life :
A 128MiB NAND flash (1024 PEB) will not have less than 20 bad blocks
during the flash endurance life.

BUT, the manufacturer doesn't tell where those bad block will appear. He
doesn't say either if they will be equally disposed on the whole device
(and I'm pretty sure they won't).
So, according to the datasheets, we should reserve the maximum number of
bad PEB for each UBI volume.
(Worst case scenario: 20 bad blocks appears on the smallest UBI volume.)

=> The actual parameter MTD_UBI_BEB_RESERVE is not enough to cover this
scenario. We need to set a minimal number of reserved PEB for a UBI
volume.

That's what this patch introduce.

Signed-off-by: Richard Genoud <richard.genoud@gmail.com>
---
 drivers/mtd/ubi/Kconfig |   18 ++++++++++++++++++
 drivers/mtd/ubi/misc.c  |    5 +++--
 drivers/mtd/ubi/ubi.h   |    3 ---
 3 files changed, 21 insertions(+), 5 deletions(-)

Comments

Artem Bityutskiy June 28, 2012, 2:39 p.m. UTC | #1
Hi, first of all, this 1% is does is not good enough for modern devices.
It is just a default I picked out of the thin air.

What we really need is to make it possible to specify beb_rsvd_level at
attach time. I believe it is not difficult to implement:

1. Add a parameter to ubiattach
2. Extend the attach ioctl and add beb_rsvd_level there. We have 12
unused bytes in 'struct ubi_attach_req'.
3. Extend the "mtd" module parameter and allow to specify beb_rsvd_level
there as well.

That's it - you can specify the needed numbers for your flash.

 to make this per-MTD device
On Mon, 2012-06-25 at 17:08 +0200, Richard Genoud wrote:
> The problem with this behaviour is that NAND flash manufacturers give a
> minimum number of valid block (NVB) during the endurance life of the
> device.
> E.G.:
> Parameter             Symbol    Min    Max    Unit      Notes
> --------------------------------------------------------------
> Valid block number     NVB     1004    1024   Blocks     1
> Note:
> 1. Invalid blocks are block that contain one or more bad bits beyond
> ECC. The device may contain bad blocks upon shipment. Additional bad
> blocks may develop over time; however, the total number of available
> blocks will not drop below NVB during the endurance life of the device.

OK, in this discussion is is easier to talk about maximum bad blocks
(MBB) rather than minimum good blocks, though. So basically, the vendor
guarantees that there will be at most MBB bad blocks.

> From this number we can deduce the maximum number of bad PEB that a
> device will contain during its endurance life :
> A 128MiB NAND flash (1024 PEB) will not have less than 20 bad blocks
> during the flash endurance life.

OK.

> BUT, the manufacturer doesn't tell where those bad block will appear. He
> doesn't say either if they will be equally disposed on the whole device
> (and I'm pretty sure they won't).

Right, then if you implement what I suggested, you'll pass
"beb_rsvd_level=20" when attaching both partitions.

> So, according to the datasheets, we should reserve the maximum number of
> bad PEB for each UBI volume.
> (Worst case scenario: 20 bad blocks appears on the smallest UBI volume.)

Err, not smallest UBI volume, but smallest MTD partition. We reserve
PEBs per-MTD device.

> => The actual parameter MTD_UBI_BEB_RESERVE is not enough to cover this
> scenario. We need to set a minimal number of reserved PEB for a UBI
> volume.

Frankly, I do not understand this logic :-) And your patch looks wrong -
it touches the "auto-format" code which you may consider more like a
"debugging" feature and should not rely on this in production.
Richard Genoud June 28, 2012, 4:07 p.m. UTC | #2
2012/6/28 Artem Bityutskiy <dedekind1@gmail.com>:
> Hi, first of all, this 1% is does is not good enough for modern devices.
> It is just a default I picked out of the thin air.
Hi !
Agreed, it seems that 2% of the whole flash (at least for SLC device)
is more realistic.
>
> What we really need is to make it possible to specify beb_rsvd_level at
> attach time. I believe it is not difficult to implement:
>
> 1. Add a parameter to ubiattach
> 2. Extend the attach ioctl and add beb_rsvd_level there. We have 12
> unused bytes in 'struct ubi_attach_req'.
> 3. Extend the "mtd" module parameter and allow to specify beb_rsvd_level
> there as well.
>
> That's it - you can specify the needed numbers for your flash.
>
>  to make this per-MTD device
All right, with this approach, we can have different flash devices
with different MBB. Got it.

> On Mon, 2012-06-25 at 17:08 +0200, Richard Genoud wrote:
>> The problem with this behaviour is that NAND flash manufacturers give a
>> minimum number of valid block (NVB) during the endurance life of the
>> device.
>> E.G.:
>> Parameter             Symbol    Min    Max    Unit      Notes
>> --------------------------------------------------------------
>> Valid block number     NVB     1004    1024   Blocks     1
>> Note:
>> 1. Invalid blocks are block that contain one or more bad bits beyond
>> ECC. The device may contain bad blocks upon shipment. Additional bad
>> blocks may develop over time; however, the total number of available
>> blocks will not drop below NVB during the endurance life of the device.
>
> OK, in this discussion is is easier to talk about maximum bad blocks
> (MBB) rather than minimum good blocks, though. So basically, the vendor
> guarantees that there will be at most MBB bad blocks.
>
>> From this number we can deduce the maximum number of bad PEB that a
>> device will contain during its endurance life :
>> A 128MiB NAND flash (1024 PEB) will not have less than 20 bad blocks
>> during the flash endurance life.
>
> OK.
>
>> BUT, the manufacturer doesn't tell where those bad block will appear. He
>> doesn't say either if they will be equally disposed on the whole device
>> (and I'm pretty sure they won't).
>
> Right, then if you implement what I suggested, you'll pass
> "beb_rsvd_level=20" when attaching both partitions.
>
>> So, according to the datasheets, we should reserve the maximum number of
>> bad PEB for each UBI volume.
>> (Worst case scenario: 20 bad blocks appears on the smallest UBI volume.)
>
> Err, not smallest UBI volume, but smallest MTD partition. We reserve
> PEBs per-MTD device.
Yes, you're right (I usually use only one ubi volume per mdt device,
that's why I mixed up things)

>> => The actual parameter MTD_UBI_BEB_RESERVE is not enough to cover this
>> scenario. We need to set a minimal number of reserved PEB for a UBI
>> volume.
>
> Frankly, I do not understand this logic :-) And your patch looks wrong -
> it touches the "auto-format" code which you may consider more like a
> "debugging" feature and should not rely on this in production.
Sorry, but I don't understand what you mean by the "auto-format" code.
The other thing I don't understand is that the patch touches only the
ubi_calculate_reserved() function which is the only place where
beb_rsvd_level is set.
And with your approach, as far as we'll get beb_rsvd_level value from
ioctl, I'll have also to touch this function...
(or maybe what's wrong with this patch is that beb_rsvd_level can be
less that 2, in this case I should have set a minimum range of 2 in
Kconfig)

Thanks for your comments.

Best regards,
Richard.

> --
> Best Regards,
> Artem Bityutskiy
Artem Bityutskiy June 28, 2012, 4:22 p.m. UTC | #3
On Thu, 2012-06-28 at 18:07 +0200, Richard Genoud wrote:
> Agreed, it seems that 2% of the whole flash (at least for SLC device)
> is more realistic.

Agree, feel free to send a separate patch for this.

> > Frankly, I do not understand this logic :-) And your patch looks wrong -
> > it touches the "auto-format" code which you may consider more like a
> > "debugging" feature and should not rely on this in production.
> Sorry, but I don't understand what you mean by the "auto-format" code.

Yeah, right, this comment was incorrect, sorry.
Shmulik Ladkani June 28, 2012, 5:53 p.m. UTC | #4
On Mon, 25 Jun 2012 17:08:38 +0200 Richard Genoud <richard.genoud@gmail.com> wrote:
> diff --git a/drivers/mtd/ubi/misc.c b/drivers/mtd/ubi/misc.c
> index f6a7d7a..c2c6db0 100644
> --- a/drivers/mtd/ubi/misc.c
> +++ b/drivers/mtd/ubi/misc.c
> @@ -100,8 +100,9 @@ void ubi_calculate_reserved(struct ubi_device *ubi)
>  {
>  	ubi->beb_rsvd_level = ubi->good_peb_count/100;
>  	ubi->beb_rsvd_level *= CONFIG_MTD_UBI_BEB_RESERVE;

BTW Artem, I've always had an unresolved issue with this original
calculation... maybe you can shed some light here.

Why 'beb_rsvd_level' is set to RESERVE percent of the 'good_peb_count'?

As the device gets worn, number of 'good_peb_count' will lessen over
time - and as such, the 'beb_rsvd_level' may vary (lessen) over time.

I'd expect a fixed number of 'beb_rsvd_level' PEBs for a given mtd
partition, or more correctly, as Richard suggests, the *sum* of bad PEBs
plus the beb reserved PEBs should be constant for a partition - as I
do not expect more than a known constant of blocks to go bad during
device's (and thus, partition's) lifetime.

Regards,
Shmulik
Richard Genoud June 29, 2012, 7:17 a.m. UTC | #5
2012/6/28 Artem Bityutskiy <dedekind1@gmail.com>:
> On Thu, 2012-06-28 at 18:07 +0200, Richard Genoud wrote:
>> Agreed, it seems that 2% of the whole flash (at least for SLC device)
>> is more realistic.
>
> Agree, feel free to send a separate patch for this.
Done !
>
>> > Frankly, I do not understand this logic :-) And your patch looks wrong -
>> > it touches the "auto-format" code which you may consider more like a
>> > "debugging" feature and should not rely on this in production.
>> Sorry, but I don't understand what you mean by the "auto-format" code.
>
> Yeah, right, this comment was incorrect, sorry.
>

I was thinking that instead of giving to ubiattach the MBB, we could
give it the MBB percentage (maximum bad blocks percentage of the whole
flash device).
From this % and the whole flash size, we get the MBB number, and set
beb_rsvd_level for each MTD part.
It will be easier for userspace, as we won't have to set a different
value for different flash size. The default 2% value will (almost)
always be correct.
We can even get rid of the CONFIG_MTD_UBI_BEB_RESERVE option.

BTW, the real killer feature would be that the flash gives its NVB or
MBB value in response to the READ_ID command, but unfortunately that's
not the case...
Artem Bityutskiy June 29, 2012, 12:47 p.m. UTC | #6
On Thu, 2012-06-28 at 20:53 +0300, Shmulik Ladkani wrote:
> On Mon, 25 Jun 2012 17:08:38 +0200 Richard Genoud <richard.genoud@gmail.com> wrote:
> > diff --git a/drivers/mtd/ubi/misc.c b/drivers/mtd/ubi/misc.c
> > index f6a7d7a..c2c6db0 100644
> > --- a/drivers/mtd/ubi/misc.c
> > +++ b/drivers/mtd/ubi/misc.c
> > @@ -100,8 +100,9 @@ void ubi_calculate_reserved(struct ubi_device *ubi)
> >  {
> >  	ubi->beb_rsvd_level = ubi->good_peb_count/100;
> >  	ubi->beb_rsvd_level *= CONFIG_MTD_UBI_BEB_RESERVE;
> 
> BTW Artem, I've always had an unresolved issue with this original
> calculation... maybe you can shed some light here.
> 
> Why 'beb_rsvd_level' is set to RESERVE percent of the 'good_peb_count'?

I thought it was a good idea I guess :-)

> As the device gets worn, number of 'good_peb_count' will lessen over
> time - and as such, the 'beb_rsvd_level' may vary (lessen) over time.

Yeah, probably you are right.

> I'd expect a fixed number of 'beb_rsvd_level' PEBs for a given mtd
> partition, or more correctly, as Richard suggests, the *sum* of bad PEBs
> plus the beb reserved PEBs should be constant for a partition - as I
> do not expect more than a known constant of blocks to go bad during
> device's (and thus, partition's) lifetime.

Those days we did not have this "vendor-guaranteed max. bad blocks
count" thing and I thought that UBI would try to always maintain a pool
of reserved PEBs.

Would you send a patch?

Note, one thing: if I already marked max. possible amount of PEBs as
bad, I just do not reserve any more. But if I have a plenty of available
PEBs anyway (my volumes are smaller than they could be, or I shrink one
of them), and another PEB goes bad, I should just mark it as bad. I may
print a warning, but should not panic.
Artem Bityutskiy June 29, 2012, 2:10 p.m. UTC | #7
On Fri, 2012-06-29 at 09:17 +0200, Richard Genoud wrote:
> 2012/6/28 Artem Bityutskiy <dedekind1@gmail.com>:
> > On Thu, 2012-06-28 at 18:07 +0200, Richard Genoud wrote:
> >> Agreed, it seems that 2% of the whole flash (at least for SLC device)
> >> is more realistic.
> >
> > Agree, feel free to send a separate patch for this.
> Done !
> >
> >> > Frankly, I do not understand this logic :-) And your patch looks wrong -
> >> > it touches the "auto-format" code which you may consider more like a
> >> > "debugging" feature and should not rely on this in production.
> >> Sorry, but I don't understand what you mean by the "auto-format" code.
> >
> > Yeah, right, this comment was incorrect, sorry.
> >
> 
> I was thinking that instead of giving to ubiattach the MBB, we could
> give it the MBB percentage (maximum bad blocks percentage of the whole
> flash device).
> From this % and the whole flash size, we get the MBB number, and set
> beb_rsvd_level for each MTD part.

Well, I thought that it may be not flexible enough for some people,
because you cannot give 1.5%, since flaoting-point arithmetic in the
kernel is not used.

> It will be easier for userspace, as we won't have to set a different
> value for different flash size. The default 2% value will (almost)
> always be correct.
> We can even get rid of the CONFIG_MTD_UBI_BEB_RESERVE option.

Yes, this option would be killed.

> BTW, the real killer feature would be that the flash gives its NVB or
> MBB value in response to the READ_ID command, but unfortunately that's
> not the case...

Sure, you can also implement this. Add the corresponding field to
'struct mtd_info', 0 will mean "not known". UBI could pick it.
Richard Genoud June 29, 2012, 2:57 p.m. UTC | #8
2012/6/29 Artem Bityutskiy <dedekind1@gmail.com>:
>> I was thinking that instead of giving to ubiattach the MBB, we could
>> give it the MBB percentage (maximum bad blocks percentage of the whole
>> flash device).
>> From this % and the whole flash size, we get the MBB number, and set
>> beb_rsvd_level for each MTD part.
>
> Well, I thought that it may be not flexible enough for some people,
> because you cannot give 1.5%, since flaoting-point arithmetic in the
> kernel is not used.
So there's 2 ways to bypass that :
- use per-1024 instead of percent => the default value would be 20.
- let ubiattach do the conversion from MBB% to MBB
I like the 1st one because it gives a number close to what we have in
the datasheets.
What do you think ?
Artem Bityutskiy June 29, 2012, 3:07 p.m. UTC | #9
On Fri, 2012-06-29 at 16:57 +0200, Richard Genoud wrote:
> 2012/6/29 Artem Bityutskiy <dedekind1@gmail.com>:
> >> I was thinking that instead of giving to ubiattach the MBB, we could
> >> give it the MBB percentage (maximum bad blocks percentage of the whole
> >> flash device).
> >> From this % and the whole flash size, we get the MBB number, and set
> >> beb_rsvd_level for each MTD part.
> >
> > Well, I thought that it may be not flexible enough for some people,
> > because you cannot give 1.5%, since flaoting-point arithmetic in the
> > kernel is not used.
> So there's 2 ways to bypass that :
> - use per-1024 instead of percent => the default value would be 20.
> - let ubiattach do the conversion from MBB% to MBB
> I like the 1st one because it gives a number close to what we have in
> the datasheets.
> What do you think ?

Yeah, probably, centi-percents would be OK.
Shmulik Ladkani June 30, 2012, 8:43 p.m. UTC | #10
Hi,

On Fri, 29 Jun 2012 15:47:44 +0300 Artem Bityutskiy <dedekind1@gmail.com> wrote:
> > I'd expect a fixed number of 'beb_rsvd_level' PEBs for a given mtd
> > partition, or more correctly, as Richard suggests, the *sum* of bad PEBs
> > plus the beb reserved PEBs should be constant for a partition - as I
> > do not expect more than a known constant of blocks to go bad during
> > device's (and thus, partition's) lifetime.
> 
> Those days we did not have this "vendor-guaranteed max. bad blocks
> count" thing and I thought that UBI would try to always maintain a pool
> of reserved PEBs.
> 
> Would you send a patch?

Ok. I'll try to fiddle with UBI beb reserved arithmetics.

I thought to make it a gradual change.

First, change the semantics of CONFIG_MTD_UBI_BEB_RESERVE to be the
percent of total number of eraseblocks (instead of total number of
_good_ eraseblocks). And 'reserve' counts for both existing bad PEBs
and those reserved for future bad PEB handling.
Note this would still be % of the blocks in the mtd partition (and as
such, it is very loosely related to the MBB of the device, if at all).

Then, Richard may introduce the MBB parameter to ubiattach, and later
may kill CONFIG_MTD_UBI_BEB_RESERVE (if no longer needed). The
calculations will be according to the new parameter.

What do you say?

On a side note, the new ubiattach parameter should not be called MBB,
but rather a generic "reserved" (or "% reserved").

This is since the MBB is a property of the mtd nand device.
But the ubi user may issue, for the partition attached, a value other
than the device's MBB, according to his storage needs and
risks/securities he is willing to take.

Regards,
Shmulik
Richard Genoud July 2, 2012, 6:15 a.m. UTC | #11
2012/6/30 Shmulik Ladkani <shmulik.ladkani@gmail.com>:
> Hi,
Hi,
> I thought to make it a gradual change.
>
> First, change the semantics of CONFIG_MTD_UBI_BEB_RESERVE to be the
> percent of total number of eraseblocks (instead of total number of
> _good_ eraseblocks). And 'reserve' counts for both existing bad PEBs
> and those reserved for future bad PEB handling.
> Note this would still be % of the blocks in the mtd partition (and as
> such, it is very loosely related to the MBB of the device, if at all).
>
> Then, Richard may introduce the MBB parameter to ubiattach, and later
> may kill CONFIG_MTD_UBI_BEB_RESERVE (if no longer needed). The
> calculations will be according to the new parameter.
>
> What do you say?
It's ok for me.

> On a side note, the new ubiattach parameter should not be called MBB,
> but rather a generic "reserved" (or "% reserved").
>
> This is since the MBB is a property of the mtd nand device.
> But the ubi user may issue, for the partition attached, a value other
> than the device's MBB, according to his storage needs and
> risks/securities he is willing to take.
>
> Regards,
> Shmulik
Artem Bityutskiy July 3, 2012, 10:46 a.m. UTC | #12
On Sat, 2012-06-30 at 23:43 +0300, Shmulik Ladkani wrote:
> First, change the semantics of CONFIG_MTD_UBI_BEB_RESERVE to be the
> percent of total number of eraseblocks (instead of total number of
> _good_ eraseblocks). And 'reserve' counts for both existing bad PEBs
> and those reserved for future bad PEB handling.
> Note this would still be % of the blocks in the mtd partition (and as
> such, it is very loosely related to the MBB of the device, if at all).
> 
> Then, Richard may introduce the MBB parameter to ubiattach, and later
> may kill CONFIG_MTD_UBI_BEB_RESERVE (if no longer needed). The
> calculations will be according to the new parameter.
> 
> What do you say?
> 
> On a side note, the new ubiattach parameter should not be called MBB,
> but rather a generic "reserved" (or "% reserved").
> 
> This is since the MBB is a property of the mtd nand device.
> But the ubi user may issue, for the partition attached, a value other
> than the device's MBB, according to his storage needs and
> risks/securities he is willing to take.

Sounds good, thank you!
diff mbox

Patch

diff --git a/drivers/mtd/ubi/Kconfig b/drivers/mtd/ubi/Kconfig
index 738ee8d..6770b96 100644
--- a/drivers/mtd/ubi/Kconfig
+++ b/drivers/mtd/ubi/Kconfig
@@ -27,6 +27,24 @@  config MTD_UBI_WL_THRESHOLD
 	  life-cycle less than 10000, the threshold should be lessened (e.g.,
 	  to 128 or 256, although it does not have to be power of 2).
 
+config MTD_UBI_BEB_MIN_RESERVE
+	int "Minimal number of reserved eraseblocks for bad eraseblocks handling"
+	default 2
+	range 0 2147483647
+	depends on MTD_UBI
+	help
+	  If the MTD device admits of bad eraseblocks (e.g. NAND flash), UBI
+	  reserves some amount of physical eraseblocks to handle new bad
+	  eraseblocks. For example, if a flash physical eraseblock becomes bad,
+	  UBI uses these reserved physical eraseblocks to relocate the bad one.
+	  This option specifies the minimal amount of eraseblocks that will be
+	  reserved for bad eraseblock handling.
+	  If the device has less or as many eraseblocks as this value, the
+	  percentage value (MTD_UBI_BEB_RESERVE) is used.
+	  If the underlying flash does not admit of bad eraseblocks (e.g. NOR
+	  flash), this value is ignored and nothing is reserved.
+	  Leave the default value if unsure.
+
 config MTD_UBI_BEB_RESERVE
 	int "Percentage of reserved eraseblocks for bad eraseblocks handling"
 	default 1
diff --git a/drivers/mtd/ubi/misc.c b/drivers/mtd/ubi/misc.c
index f6a7d7a..c2c6db0 100644
--- a/drivers/mtd/ubi/misc.c
+++ b/drivers/mtd/ubi/misc.c
@@ -100,8 +100,9 @@  void ubi_calculate_reserved(struct ubi_device *ubi)
 {
 	ubi->beb_rsvd_level = ubi->good_peb_count/100;
 	ubi->beb_rsvd_level *= CONFIG_MTD_UBI_BEB_RESERVE;
-	if (ubi->beb_rsvd_level < MIN_RESEVED_PEBS)
-		ubi->beb_rsvd_level = MIN_RESEVED_PEBS;
+	if ((ubi->beb_rsvd_level < CONFIG_MTD_UBI_BEB_MIN_RESERVE) &&
+	    (ubi->good_peb_count > CONFIG_MTD_UBI_BEB_MIN_RESERVE))
+		ubi->beb_rsvd_level = CONFIG_MTD_UBI_BEB_MIN_RESERVE;
 }
 
 /**
diff --git a/drivers/mtd/ubi/ubi.h b/drivers/mtd/ubi/ubi.h
index a1a81c9..d321c74 100644
--- a/drivers/mtd/ubi/ubi.h
+++ b/drivers/mtd/ubi/ubi.h
@@ -59,9 +59,6 @@ 
 #define ubi_err(fmt, ...) printk(KERN_ERR "UBI error: %s: " fmt "\n", \
 				 __func__, ##__VA_ARGS__)
 
-/* Lowest number PEBs reserved for bad PEB handling */
-#define MIN_RESEVED_PEBS 2
-
 /* Background thread name pattern */
 #define UBI_BGT_NAME_PATTERN "ubi_bgt%dd"