Patchwork nandwrite: add --nobad to write bad blocks

Submitter Mike Frysinger
Date Sept. 12, 2010, 3:51 a.m.
Message ID <1284263480-31573-1-git-send-email-vapier@gentoo.org>
Permalink /patch/64539/
State Superseded, archived

Comments

Mike Frysinger - Sept. 12, 2010, 3:51 a.m.
Sometimes writing to bad blocks is useful, like when the block isn't actually
bad but the OOB layout isn't what the kernel is expecting or is otherwise
screwed up.  The --nobad option allows just that.

Signed-off-by: Mike Frysinger <vapier@gentoo.org>
---
 nandwrite.c |   10 +++++++++-
 1 files changed, 9 insertions(+), 1 deletions(-)
Artem Bityutskiy - Sept. 12, 2010, 4:27 p.m.
On Sat, 2010-09-11 at 23:51 -0400, Mike Frysinger wrote:
> Sometimes writing to bad blocks is useful, like when the block isn't actually
> bad but the OOB layout isn't what the kernel is expecting or is otherwise
> screwed up.  The --nobad option allows just that.
> 
> Signed-off-by: Mike Frysinger <vapier@gentoo.org>

How useful is this? I think instead you should implement the force flag
we discussed and deal with 'otherwise screwed up' eraseblocks with
flash_erase. I am afraid it is too dangerous to introduce this option.
Mike Frysinger - Sept. 12, 2010, 7:05 p.m.
On Sun, Sep 12, 2010 at 12:27, Artem Bityutskiy wrote:
> On Sat, 2010-09-11 at 23:51 -0400, Mike Frysinger wrote:
>> Sometimes writing to bad blocks is useful, like when the block isn't actually
>> bad but the OOB layout isn't what the kernel is expecting or is otherwise
>> screwed up.  The --nobad option allows just that.
>
> How useful is this? I think instead you should implement the force flag
> we discussed and deal with 'otherwise screwed up' eraseblocks with
> flash_erase. I am afraid it is too dangerous to introduce this option.

i dont see how this is any more dangerous than adding an option to
force erasing of bad blocks ?  why should we be over protective of the
system ?  this is what got us into the existing rut of unrecoverable
blocks.

i find it useful during development to write out the content of pages
regardless of the bad blocks and then read them back.  and for
recovering systems manually without having to resort to local access
to the bootloader.
-mike
Artem Bityutskiy - Sept. 13, 2010, 6:23 a.m.
On Sun, 2010-09-12 at 15:05 -0400, Mike Frysinger wrote:
> On Sun, Sep 12, 2010 at 12:27, Artem Bityutskiy wrote:
> > On Sat, 2010-09-11 at 23:51 -0400, Mike Frysinger wrote:
> >> Sometimes writing to bad blocks is useful, like when the block isn't actually
> >> bad but the OOB layout isn't what the kernel is expecting or is otherwise
> >> screwed up.  The --nobad option allows just that.
> >
> > How useful is this? I think instead you should implement the force flag
> > we discussed and deal with 'otherwise screwed up' eraseblocks with
> > flash_erase. I am afraid it is too dangerous to introduce this option.
> 
> i dont see how this is any more dangerous than adding an option to
> force erasing of bad blocks ?  why should we be over protective of the
> system ?

Because it has been like this for a long time and people are accustomed to the
fact that if a block is marked as bad, nothing will happen to it.

Besides, if I misuse options and lose a really bad block, it is very
difficult to find it again.

>   this is what got us into the existing rut of unrecoverable
> blocks.

> 
> i find it useful during development to write out the content of pages
> regardless of the bad blocks and then read them back.

But if a block is marked as bad, its current contents are not
necessarily all 0xFF and it is not necessarily ready to be written. You
have to erase it first.

So my point was - please, first provide the means to erase them.

>   and for
> recovering systems manually without having to resort to local access
> to the bootloader.
> -mike
Mike Frysinger - Sept. 14, 2010, 1:21 a.m.
On Mon, Sep 13, 2010 at 02:23, Artem Bityutskiy wrote:
> On Sun, 2010-09-12 at 15:05 -0400, Mike Frysinger wrote:
>> On Sun, Sep 12, 2010 at 12:27, Artem Bityutskiy wrote:
>> > On Sat, 2010-09-11 at 23:51 -0400, Mike Frysinger wrote:
>> >> Sometimes writing to bad blocks is useful, like when the block isn't actually
>> >> bad but the OOB layout isn't what the kernel is expecting or is otherwise
>> >> screwed up.  The --nobad option allows just that.
>> >
>> > How useful is this? I think instead you should implement the force flag
>> > we discussed and deal with 'otherwise screwed up' eraseblocks with
>> > flash_erase. I am afraid it is too dangerous to introduce this option.
>>
>> i dont see how this is any more dangerous than adding an option to
>> force erasing of bad blocks ?  why should we be over protective of the
>> system ?
>
> Because it has been like this for a long time and people are accustomed to the
> fact that if a block is marked as bad, nothing will happen to it.
>
> Besides, if I misuse options and lose a really bad block, it is very
> difficult to find it again.

... i dont see what any of this has to do with my proposal.  the
default behavior is unchanged, and i dont think we need to be coddling
people to not use new options.  if you dont want to "lose" things,
then dont use the new option.

>> i find it useful during development to write out the content of pages
>> regardless of the bad blocks and then read them back.
>
> But if a block is marked as bad, its current contents are not
> necessarily all 0xFF and it is not necessarily ready to be written. You
> have to erase it first.
>
> So my point was - please, first provide the means to erase them.

so once we have the ability to erase them, you'll merge this ?

not that i see why we're restricting this behavior in the first place
... policy is for the end user to determine ... this is putting
artificial limits for no real reason that i can see.
-mike
Artem Bityutskiy - Sept. 14, 2010, 5:26 a.m.
On Mon, 2010-09-13 at 21:21 -0400, Mike Frysinger wrote:
> so once we have the ability to erase them, you'll merge this ?

Let's see, why not. How about what I suggest below.

> not that i see why we're restricting this behavior in the first place

First of all, a bad eraseblock is an eraseblock which should not be used,
because it is bad. Writing to a bad eraseblock should not be allowed.

Your use-case is that an eraseblock is marked as bad incorrectly, by
mistake or due to some other problem.

So what you need is an operation which makes it good again and lets you
use it.

Once you add such an operation, you will mark the needed eraseblocks as good
again (effectively, erase them and unmark them in the in-RAM and/or on-flash
BBT). After this you will not need the --nobad option, because the
eraseblock is already good.

> ... policy is for the end user to determine ... this is putting
> artificial limits for no real reason that i can see.

But the problem is not artificial limits. The problem is that I do not
think your option is usable at all, because, as I explained, a bad
eraseblock is not necessarily writable; its contents and state are
unpredictable. It may contain unstable bits, for example. You really
need to erase it before writing.
Mike Frysinger - Sept. 22, 2010, 4:23 a.m.
On Tue, Sep 14, 2010 at 01:26, Artem Bityutskiy wrote:
> On Mon, 2010-09-13 at 21:21 -0400, Mike Frysinger wrote:
>> ... policy is for the end user to determine ... this is putting
>> artificial limits for no real reason that i can see.
>
> But the problem is not artificial limits. The problem is that I do not
> think your option is usable at all, because, as I explained, a bad
> eraseblock is not necessarily writable; its contents and state are
> unpredictable. It may contain unstable bits, for example. You really
> need to erase it before writing.

it is completely artificial.  as you said yourself, it is "not
necessarily writable".  that means i should be able to tell the
hardware "do XXX" and let the hardware do it.  instead, i'm stuck with
userspace utils that say "no, you cant do that".  except the only
thing telling me i cant do that is the userspace utils.  as clearly
demonstrated, adding this option lets me do what i want -- write to
pages and change their contents.

the fact that i might not get the same data back as what i wrote is
completely irrelevant.  i can already do this to good pages without
erasing them first.  by your logic, nandwrite should also be
artificially aborting with "oh, you need to erase these blocks first".
 but it isnt

the fact that normally you want to skip badblocks is also irrelevant.
that's why it is an option the user specifically needs to enable
themselves.  i dont care if the default policy is "dont write to bad
blocks".  the default policy has no bearing here.
-mike
Artem Bityutskiy - Sept. 22, 2010, 7:12 a.m.
On Wed, 2010-09-22 at 00:23 -0400, Mike Frysinger wrote:
> On Tue, Sep 14, 2010 at 01:26, Artem Bityutskiy wrote:
> > On Mon, 2010-09-13 at 21:21 -0400, Mike Frysinger wrote:
> >> ... policy is for the end user to determine ... this is putting
> >> artificial limits for no real reason that i can see.
> >
> > But the problem is not artificial limits. The problem is that I do not
> > think your option is usable at all, because, as I explained, a bad
> > eraseblock is not necessarily writable; its contents and state are
> > unpredictable. It may contain unstable bits, for example. You really
> > need to erase it before writing.
> 
> it is completely artificial.  as you said yourself, it is "not
> necessarily writable".  that means i should be able to tell the
> hardware "do XXX" and let the hardware do it.  instead, i'm stuck with
> userspace utils that say "no, you cant do that".  except the only
> thing telling me i cant do that is the userspace utils.  as clearly
> demonstrated, adding this option lets me do what i want -- write to
> pages and change their contents.
> 
> the fact that i might not get the same data back as what i wrote is
> completely irrelevant.  i can already do this to good pages without
> erasing them first.  by your logic, nandwrite should also be
> artificially aborting with "oh, you need to erase these blocks first".
>  but it isnt
> 
> the fact that normally you want to skip badblocks is also irrelevant.
> that's why it is an option the user specifically needs to enable
> themselves.  i dont care if the default policy is "dont write to bad
> blocks".  the default policy has no bearing here.
> -mike

You are pushing for this quite aggressively, so I guess you really need
this :-) And you sound convincing, but give me some time to think about
this please.
Mike Frysinger - Sept. 22, 2010, 7:34 a.m.
On Wed, Sep 22, 2010 at 03:12, Artem Bityutskiy wrote:
> On Wed, 2010-09-22 at 00:23 -0400, Mike Frysinger wrote:
>> On Tue, Sep 14, 2010 at 01:26, Artem Bityutskiy wrote:
>> > On Mon, 2010-09-13 at 21:21 -0400, Mike Frysinger wrote:
>> >> ... policy is for the end user to determine ... this is putting
>> >> artificial limits for no real reason that i can see.
>> >
>> > But the problem is not artificial limits. The problem is that I do not
>> > think your option is usable at all, because, as I explained, a bad
>> > eraseblock is not necessarily writable; its contents and state are
>> > unpredictable. It may contain unstable bits, for example. You really
>> > need to erase it before writing.
>>
>> it is completely artificial.  as you said yourself, it is "not
>> necessarily writable".  that means i should be able to tell the
>> hardware "do XXX" and let the hardware do it.  instead, i'm stuck with
>> userspace utils that say "no, you cant do that".  except the only
>> thing telling me i cant do that is the userspace utils.  as clearly
>> demonstrated, adding this option lets me do what i want -- write to
>> pages and change their contents.
>>
>> the fact that i might not get the same data back as what i wrote is
>> completely irrelevant.  i can already do this to good pages without
>> erasing them first.  by your logic, nandwrite should also be
>> artificially aborting with "oh, you need to erase these blocks first".
>>  but it isnt
>>
>> the fact that normally you want to skip badblocks is also irrelevant.
>> that's why it is an option the user specifically needs to enable
>> themselves.  i dont care if the default policy is "dont write to bad
>> blocks".  the default policy has no bearing here.
>
> You are pushing for this quite aggressively, so I guess you really need
> this :-) And you sound convincing

i'm open to logic, but i cant figure out your side.  all i can see is
"it's been this way" and "we shouldnt write bad blocks".  but both
sound like policies that the end user should have control over rather
than the userspace utils always enforcing.  so if you feel i've missed
something, please highlight it.

> but give me some time to think about this please.

np
-mike
Jon Povey - Sept. 23, 2010, 1:48 a.m.
linux-mtd-bounces@lists.infradead.org wrote:
> it is completely artificial.  as you said yourself, it is "not
> necessarily writable".  that means i should be able to tell the
> hardware "do XXX" and let the hardware do it.  instead, i'm stuck with
> userspace utils that say "no, you cant do that".

As Artem is thinking about it, I'll just chip in with a "me too".

By all means the default should protect you from doing potentially
stupid and dangerous things. But these are low level utilities for
people who know what they are doing, having a "safeties off" option
makes a lot of sense.

--
Jon Povey
jon.povey@racelogic.co.uk

Racelogic is a limited company registered in England. Registered number 2743719 .
Registered Office Unit 10, Swan Business Centre, Osier Way, Buckingham, Bucks, MK18 1TB .

The information contained in this electronic mail transmission is intended by Racelogic Ltd for the use of the named individual or entity to which it is directed and may contain information that is confidential or privileged. If you have received this electronic mail transmission in error, please delete it from your system without copying or forwarding it, and notify the sender of the error by reply email so that the sender's address records can be corrected. The views expressed by the sender of this communication do not necessarily represent those of Racelogic Ltd. Please note that Racelogic reserves the right to monitor e-mail communications passing through its network
Artem Bityutskiy - Sept. 23, 2010, 10:38 a.m.
On Sat, 2010-09-11 at 23:51 -0400, Mike Frysinger wrote:
> Sometimes writing to bad blocks is useful, like when the block isn't actually
> bad but the OOB layout isn't what the kernel is expecting or is otherwise
> screwed up.  The --nobad option allows just that.
> 
> Signed-off-by: Mike Frysinger <vapier@gentoo.org>
> ---
>  nandwrite.c |   10 +++++++++-
>  1 files changed, 9 insertions(+), 1 deletions(-)

Let's merge this. But other people were unhappy about the --nobad name and
suggested --noskipbad, and this was done for nanddump. Let's use a
consistent name for nandwrite; would you please tweak the patch and
re-send?
Iwo Mergler - Sept. 29, 2010, 3:35 a.m.
Mike Frysinger wrote:
> i'm open to logic, but i cant figure out your side.  all i can see is
> "it's been this way" and "we shouldnt write bad blocks".  but both
> sound like policies that the end user should have control over rather
> than the userspace utils always enforcing.  so if you feel i've missed
> something, please highlight it.

Hi Mike,

while I agree with the philosophy to allow users to shoot
themselves in the foot if they really like that sort of thing,
the "don't touch bad blocks" rule makes sense and it's not
necessarily obvious.

Flash memory is only 'digital' in an idealised sense. Like
every other 'digital' device, the real thing is analogue.

The read value of a flash memory cell depends on the amount
of charge stored in an insulated transistor gate when compared
to a threshold voltage derived from the power supply. The
comparison is also temperature dependent.

In other words, a bit being read correctly depends on voltage
and temperature during both writing and reading.

Bad cells can be caused by impurities in the gate insulation
(slow discharge), incorrect insulation thickness, marginal
transistors, etc. The degree of 'badness' also depends on
voltage and temperature.

The upshot of this is that blocks are marked as bad by the
manufacturer based on corner case testing. Like erasing at
high temperature / low voltage and reading back at low
temperature / high voltage. The manufacturer can also
access pads on the naked die that are not connected to pins
during packaging.

This means that the manufacturer can catch marginal cases,
where a cell 'works', but will flip a bit within a month.

Depending on the damage, it may even require special tricks
to mark a block as bad.

Either way, as a user of the device, you may not be able to
tell if a block is bad, or fully erase a bad block, or even
reliably mark it as bad again.

Thus the rule about not touching bad blocks. It's the only
way to make sure that you don't end up with a batch of products
that will die on the shelf, after you successfully tested them.


Best regards,

Iwo
Jon Povey - Sept. 29, 2010, 6:56 a.m.
linux-mtd-bounces@lists.infradead.org wrote:
> Mike Frysinger wrote:
>> i'm open to logic, but i cant figure out your side.  all i can see is
>> "it's been this way" and "we shouldnt write bad blocks".  but both
>> sound like policies that the end user should have control over rather
>> than the userspace utils always enforcing.  so if you feel i've
>> missed something, please highlight it.
>
> Hi Mike,
>
> while I agree with the philosophy to allow users to shoot
> themselves in the foot if they really like that sort of thing,
> the "don't touch bad blocks" rule makes sense and it's not
> necessarily obvious.

[manufacturer-marked bad blocks]

> Thus the rule about not touching bad blocks. It's the only
> way to make sure that you don't end up with a batch of products
> that will die on the shelf, after you successfully tested them.

I can't speak for Mike, but I have run into a few situations where
blocks were marked bad incorrectly. For example, the ROM bootloader of
a chip I work with requires a different OOB layout to that specified by
the NAND flash manufacturer, so to write the second-stage bootloader
you must write data into what is, as far as the NAND maker is concerned,
the OOB area. Then when u-boot or Linux loads, it finds no BBT, scans
the bootloader, finds non-FF OOB and marks it bad.

These blocks are NOT really bad, and as engineers who know what we are
doing, we need to be able to rewrite these for firmware upgrades and
so on.

There are also situations of upgrading to new firmware that uses a
different OOB layout (yes it's perverted, and yes I have been there),
maybe we know a block is bad but it's full of FF, we are going to
reboot and rebuild the BBT, we want to write a bad block marker in
the new OOB - we need to be able to write the bad block.

imo, the mtd-utils are low-level and specialist enough that allowing
a --yes-i-really-know-what-i-am-doing kind of flag makes sense.
Perhaps that support could even be an #ifdef, disabled by default.

Yes, someone using it has to really know what they are doing, including
being aware of manufacturer-marked bad blocks.

Either way, I for one need to do these kind of low-level hacks, and if
mtd kernel or utils have policy hardwired into them, happily I have
the source and a compiler and can fix that locally.

--
Jon Povey
jon.povey@racelogic.co.uk

Artem Bityutskiy - Sept. 29, 2010, 12:59 p.m.
On Wed, 2010-09-29 at 13:35 +1000, Iwo Mergler wrote:
> The upshot of this is that blocks are marked as bad by the
> manufacturer based on corner case testing. Like erasing at
> high temperature / low voltage and reading back at low
> temperature / high voltage. The manufacturer can also
> access pads on the naked die that are not connected to pins
> during packaging.

This is only about letting you write to bad eraseblocks, not about erasing
them. I assumed this cannot be used to mark bad eraseblocks as good
again. The kernel does not have any protection against writing to bad
eraseblocks, so I do not see why Mike cannot have this option.

But speaking about erasing bad eraseblocks, which we also discussed, I
think you have a good point - we should probably distinguish between
factory-marked bad eraseblocks and user-marked ones. AFAIR, they even had
different markers, either in the OOB or in the BBT; I do not really remember.

We probably can allow erasing user-marked bad eraseblocks and unmarking
them. But we probably should not allow erasing factory-marked bad
eraseblocks. But again, I'm not sure whether that is possible; I have not
really thought about it.

But I think this particular patch from Mike is OK.
Mike Frysinger - Sept. 29, 2010, 1:25 p.m.
On Wed, Sep 29, 2010 at 02:56, Jon Povey wrote:
> linux-mtd-bounces@lists.infradead.org wrote:
>> Mike Frysinger wrote:
>>> i'm open to logic, but i cant figure out your side.  all i can see is
>>> "it's been this way" and "we shouldnt write bad blocks".  but both
>>> sound like policies that the end user should have control over rather
>>> than the userspace utils always enforcing.  so if you feel i've
>>> missed something, please highlight it.
>>
>> while I agree with the philosophy to allow users to shoot
>> themselves in the foot if they really like that sort of thing,
>> the "don't touch bad blocks" rule makes sense and it's not
>> necessarily obvious.
>
> [manufacturer-marked bad blocks]
>
>> Thus the rule about not touching bad blocks. It's the only
>> way to make sure that you don't end up with a batch of products
>> that will die on the shelf, after you successfully tested them.
>
> I can't speak for Mike, but I have run into a few situations where
> blocks were marked bad incorrectly. For example, the ROM bootloader of
> a chip I work with requires a different OOB layout to that specified by
> the NAND flash manufacturer, so to write the second-stage bootloader
> you must write data into what is, as far as the NAND maker is concerned,
> the OOB area. Then when u-boot or Linux loads, it finds no BBT, scans
> the bootloader, finds non-FF OOB and marks it bad.

this is actually the same situation i am dealing with (i doubt we're
using the same processor though), and as i described in my original
posting: "like when the block isn't actually bad but the OOB layout
isn't what the kernel is expecting or is otherwise screwed up".
-mike
Mike Frysinger - Sept. 29, 2010, 1:28 p.m.
On Wed, Sep 29, 2010 at 08:59, Artem Bityutskiy wrote:
> On Wed, 2010-09-29 at 13:35 +1000, Iwo Mergler wrote:
>> The upshot of this is that blocks are marked as bad by the
>> manufacturer based on corner case testing. Like erasing at
>> high temperature / low voltage and reading back at low
>> temperature / high voltage. The manufacturer can also
>> access pads on the naked die that are not connected to pins
>> during packaging.
>
> This is only about letting you write to bad eraseblocks, not about erasing
> them. I assumed this cannot be used to mark bad eraseblocks as good
> again. The kernel does not have any protection against writing to bad
> eraseblocks, so I do not see why Mike cannot have this option.

correct on all points :)

> But speaking about erasing bad eraseblocks, which we also discussed, I
> think you have a good point - we should probably distinguish between
> factory-marked bad eraseblocks and user-marked ones. AFAIR, they even had
> different markers, either in the OOB or in the BBT; I do not really remember.
>
> We probably can allow erasing user-marked bad eraseblocks and unmarking
> them. But we probably should not allow erasing factory-marked bad
> eraseblocks. But again, I'm not sure whether that is possible; I have not
> really thought about it.

i think you're correct here too.  i havent seen any indication that
factory-marked bad blocks are distinguished in any way from
user-marked bad blocks.  either in the software BBT or in the OOB
layout.
-mike
Iwo Mergler - Sept. 29, 2010, 11:44 p.m.
Jon Povey wrote:
> I can't speak for Mike, but I have run into a few situations where
> blocks were marked bad incorrectly. For example, the ROM bootloader of
> a chip I work with requires a different OOB layout to that specified by
> the NAND flash manufacturer, so to write the second-stage bootloader
> you must write data into what is, as far as the NAND maker is concerned,
> the OOB area. Then when u-boot or Linux loads, it finds no BBT, scans
> the bootloader, finds non-FF OOB and marks it bad.
> 
> These blocks are NOT really bad, and as engineers who know what we are
> doing, we need to be able to rewrite these for firmware upgrades and
> so on.

No problem with that. Anything *you* have marked bad is fair game.

Although I would always use an in-flash BBT for this kind of situation.
As you say, there are bootloaders, filesystems and hardware ECC that
will only work if you re-purpose the bad block marker region in the
OOB.

In such a case, just do the bad block scan once, at production time,
and write the BBT. Then change everything else to use that BBT. And
you leave the BBT alone during reflash.

> There are also situations of upgrading to new firmware that uses a
> different OOB layout (yes it's perverted, and yes I have been there),
> maybe we know a block is bad but it's full of FF, we are going to
> reboot and rebuild the BBT, we want to write a bad block marker in
> the new OOB - we need to be able to write the bad block.

You can't guarantee that that will always succeed. It's unlikely,
but you can end up with a marker that verifies correctly and then
fades, long before the datasheet's 10 years of data retention are over.

It's a judgement call, of course. It somewhat depends whether
you're making a toy or a medical implant. Warranty return or
lawsuit. :-)

In the case you describe, I would use the old BBT to create the
new one directly, without attempting to write to the bad blocks.

> imo, the mtd-utils are low-level and specialist enough that allowing
> a --yes-i-really-know-what-i-am-doing kind of flag makes sense.
> Perhaps that support could even be an #ifdef, disabled by default.
> 
> Yes, someone using it has to really know what they are doing, including
> being aware of manufacturer-marked bad blocks.
> 
> Either way, I for one need to do these kind of low-level hacks, and if
> mtd kernel or utils have policy hardwired into them, happily I have
> the source and a compiler and can fix that locally.

I agree. I only wrote what I did to maybe contribute to the
documentation. There seemed to be the sentiment that the tools
only skipped bad blocks because of some silly tradition.

I'm all in favour of adding the feature.


Best regards,

Iwo
Jon Povey - Sept. 30, 2010, 3:23 a.m.
Iwo Mergler wrote:
> Jon Povey wrote:
> Although I would always use an in-flash BBT for this kind of
> situation. As you say, there are bootloaders, filesystems and
> hardware ECC that will only work if you re-purpose the bad block
> marker region in the OOB.
>
> In such a case, just do the bad block scan once, at production time,
> and write the BBT. Then change everything else to use that BBT. And
> you leave the BBT alone during reflash.

I agree this is a nice way to do it, if it's practical.

> In the case you describe, I would use the old BBT to create the
> new one directly, without attempting to write to the bad blocks.

That's what I did, but note mtd won't allow you to rewrite the BBTs,
they are hardwired to be off-limits in the kernel. So, you have to
do something like write a new bootloader that can translate it on
reboot. Happily that has been OK for me, but it's not ideal, and one
may be in a position where that is difficult (restricted code size for
bootloader and such)

The inability to rewrite the BBT from linux is a separate issue though.

>> Either way, I for one need to do these kind of low-level hacks, and
>> if mtd kernel or utils have policy hardwired into them, happily I
>> have the source and a compiler and can fix that locally.
>
> I agree. I only wrote what I did to maybe contribute to the
> documentation. There seemed to be the sentiment that the tools
> only skipped bad blocks because of some silly tradition.
>
> I'm all in favour of adding the feature.

Ah, I thought you were against it.
Jolly good.

--
Jon Povey
jon.povey@racelogic.co.uk

Jon Povey - Sept. 30, 2010, 3:51 a.m.
linux-mtd-bounces@lists.infradead.org wrote:
> On Wed, Sep 29, 2010 at 08:59, Artem Bityutskiy wrote:
>> But speaking about erasing bad eraseblocks, which we also discussed,
>> I think you have a good point - we should probably distinguish
>> between factory-marked bad eraseblocks and user-marked ones. AFAIR,
>> they even had different markers, either in the OOB or in the BBT; I
>> do not really remember.

> i think you're correct here too.  i havent seen any indication that
> factory-marked bad blocks are distinguished in any way from
> user-marked bad blocks.  either in the software BBT or in the OOB
> layout. -mike

I remember reading somewhere that the BBT is designed to distinguish
them, e.g. here:
http://wiki.laptop.org/go/NAND_Flash_Bad_Block_Table#Bad_Block_Table_Format

I haven't looked in the code to see if that's actually used, though.


--
Jon Povey
jon.povey@racelogic.co.uk


Patch

diff --git a/nandwrite.c b/nandwrite.c
index 1b4ca3d..bbe38b9 100644
--- a/nandwrite.c
+++ b/nandwrite.c
@@ -79,6 +79,7 @@  static void display_help (void)
 "                          device\n"
 "  -m, --markbad           Mark blocks bad if write fails\n"
 "  -n, --noecc             Write without ecc\n"
+"  -N, --nobad             Write without bad block skipping\n"
 "  -o, --oob               Image contains oob data\n"
 "  -r, --raw               Image contains the raw oob data dumped by nanddump\n"
 "  -s addr, --start=addr   Set start address (default is 0)\n"
@@ -118,6 +119,7 @@  static bool		forcejffs2 = false;
 static bool		forceyaffs = false;
 static bool		forcelegacy = false;
 static bool		noecc = false;
+static bool		nobad = false;
 static bool		pad = false;
 static int		blockalign = 1; /*default to using 16K block size */
 
@@ -127,7 +129,7 @@  static void process_options (int argc, char * const argv[])
 
 	for (;;) {
 		int option_index = 0;
-		static const char *short_options = "ab:fjmnopqrs:y";
+		static const char *short_options = "ab:fjmnNopqrs:y";
 		static const struct option long_options[] = {
 			{"help", no_argument, 0, 0},
 			{"version", no_argument, 0, 0},
@@ -137,6 +139,7 @@  static void process_options (int argc, char * const argv[])
 			{"jffs2", no_argument, 0, 'j'},
 			{"markbad", no_argument, 0, 'm'},
 			{"noecc", no_argument, 0, 'n'},
+			{"nobad", no_argument, 0, 'N'},
 			{"oob", no_argument, 0, 'o'},
 			{"pad", no_argument, 0, 'p'},
 			{"quiet", no_argument, 0, 'q'},
@@ -181,6 +184,9 @@  static void process_options (int argc, char * const argv[])
 			case 'n':
 				noecc = true;
 				break;
+			case 'N':
+				nobad = true;
+				break;
 			case 'm':
 				markbad = true;
 				break;
@@ -487,6 +493,8 @@  int main(int argc, char * const argv[])
 						 blockstart / meminfo.erasesize, blockstart);
 
 			/* Check all the blocks in an erase block for bad blocks */
+			if (nobad)
+				continue;
 			do {
 				if ((ret = ioctl(fd, MEMGETBADBLOCK, &offs)) < 0) {
 					perror("ioctl(MEMGETBADBLOCK)");