diff mbox

[v3] mtd: nand: add option to erase NAND blocks even if detected as bad.

Message ID 20170512073946.812-1-mrugiero@gmail.com
State Rejected
Delegated to: Boris Brezillon
Headers show

Commit Message

Mario Rugiero May 12, 2017, 7:39 a.m. UTC
Some chips used under a custom vendor driver can get their blocks
incorrectly detected as bad blocks, out of incompatibilities
between such drivers and MTD drivers.
When there are too many misdetected bad blocks, the device becomes
unusable because a bad block table can't be allocated, aside from
all the legitimately good blocks which become unusable under these
conditions.
This adds a build option to workaround the issue by enabling the
user to free up space regardless of what the driver thinks about
the blocks. It still warns about it, since that's potentially
dangerous.

Example usage: recovering NAND chips on sunxi devices, as explained
here: http://linux-sunxi.org/Mainline_NAND_Howto#Known_issues

v3: Warn when erasing a bad block, as suggested by Andrea Scian.

v2: Fix a typo in the config description.

Signed-off-by: Mario J. Rugiero <mrugiero@gmail.com>
---
 drivers/mtd/nand/Kconfig     | 9 +++++++++
 drivers/mtd/nand/nand_base.c | 2 ++
 2 files changed, 11 insertions(+)

Comments

Boris Brezillon May 15, 2017, 8:21 a.m. UTC | #1
+MTD maintainers

Hi Mario,

Can you please Cc NAND/MTD maintainers next time.

On Fri, 12 May 2017 04:39:46 -0300
"Mario J. Rugiero" <mrugiero@gmail.com> wrote:

> Some chips used under a custom vendor driver can get their blocks
> incorrectly detected as bad blocks, out of incompatibilities
> between such drivers and MTD drivers.
> When there are too many misdetected bad blocks, the device becomes
> unusable because a bad block table can't be allocated, aside from
> all the legitimately good blocks which become unusable under these
> conditions.
> This adds a build option to workaround the issue by enabling the
> user to free up space regardless of what the driver thinks about
> the blocks. It still warns about it, since that's potentially
> dangerous.
> 
> Example usage: recovering NAND chips on sunxi devices, as explained
> here: http://linux-sunxi.org/Mainline_NAND_Howto#Known_issues
> 
> v3: Warn when erasing a bad block, as suggested by Andrea Scian.
> 
> v2: Fix a typo in the config description.

For your future submissions, please put the changelog after the --- so
that it's dropped when applying the patch.

> 
> Signed-off-by: Mario J. Rugiero <mrugiero@gmail.com>
> ---

Here:

v3: ...

v2: ...

>  drivers/mtd/nand/Kconfig     | 9 +++++++++
>  drivers/mtd/nand/nand_base.c | 2 ++
>  2 files changed, 11 insertions(+)
> 
> diff --git a/drivers/mtd/nand/Kconfig b/drivers/mtd/nand/Kconfig
> index c3029528063b..e0a32a34b6bf 100644
> --- a/drivers/mtd/nand/Kconfig
> +++ b/drivers/mtd/nand/Kconfig
> @@ -36,6 +36,15 @@ config MTD_NAND_ECC_BCH
>  	  ECC codes. They are used with NAND devices requiring more than 1 bit
>  	  of error correction.
>  
> +config MTD_NAND_ERASE_BADBLOCKS
> +	bool "Enable erasing of bad blocks (DANGEROUS)"
> +	default n
> +	help
> +	  This enables support for attempting to erase bad blocks.
> +	  It is needed to workaround too many badblocks issue on chips used
> +	  under custom, incompatible vendor drivers.
> +	  Say N if unsure.
> +
>  config MTD_SM_COMMON
>  	tristate
>  	default n
> diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c
> index d474378ed810..282410813a9c 100644
> --- a/drivers/mtd/nand/nand_base.c
> +++ b/drivers/mtd/nand/nand_base.c
> @@ -3211,8 +3211,10 @@ int nand_erase_nand(struct mtd_info *mtd, struct erase_info *instr,
>  					chip->page_shift, allowbbt)) {
>  			pr_warn("%s: attempt to erase a bad block at page 0x%08x\n",
>  				    __func__, page);
> +			#ifndef CONFIG_MTD_NAND_ERASE_BADBLOCKS

#ifndef statements should be at the beginning of the line. But anyway,
I think we all agree that always forcing bad block erasure is a bad
idea. If we accept to support this feature, this should be done through
a per-NAND-chip debugfs entry, with fine grained selection of the block
that we're allowing to be forcibly erased.

Here is a suggestion:

echo all > /sys/kernel/debug/nand/<nand-chip-name>/erase-bad-blocks
echo none > /sys/kernel/debug/nand/<nand-chip-name>/erase-bad-blocks
echo X,Y-Z,... > /sys/kernel/debug/nand/<nand-chip-name>/erase-bad-blocks

where X is an eraseblock number, and X-Y is a range of eraseblocks.

BTW, maybe we should create and expose a top-level mtd debugfs directory
to avoid exposing MTD related things at the root of the debugfs FS.

Something like:

/sys/kernel/debug/mtd/<dev-type>/...

so for the NAND related bits it would be

/sys/kernel/debug/mtd/nand/...

MTD maintainers, any opinion on this?
Richard Weinberger May 15, 2017, 9:23 a.m. UTC | #2
Boris,

Am 15.05.2017 um 10:21 schrieb Boris Brezillon:
 > #ifndef statements should be at the beginning of the line. But anyway,
> I think we all agree that always forcing bad block erasure is a bad
> idea. If we accept to support this feature, this should be done through
> a per-NAND-chip debugfs entry, with fine grained selection of the block
> that we're allowing to be forcibly erased.
> 
> Here is a suggestion:
> 
> echo all > /sys/kernel/debug/nand/<nand-chip-name>/erase-bad-blocks
> echo none > /sys/kernel/debug/nand/<nand-chip-name>/erase-bad-blocks
> echo X,Y-Z,... > /sys/kernel/debug/nand/<nand-chip-name>/erase-bad-blocks
> 
> where X is an eraseblock number, and X-Y is a range of eraseblocks.

Will a write to that file trigger the erase or just allow it?

IMHO we can keept it simple such as:
echo y > /sys/kernel/debug/nand/<nand-chip-name>/allow-bad-block-erase

Then a user can erase whatever he wants...

> BTW, maybe we should create and expose a top-level mtd debugfs directory
> to avoid exposing MTD related things at the root of the debugfs FS.
> 
> Something like:
> 
> /sys/kernel/debug/mtd/<dev-type>/...
> 
> so for the NAND related bits it would be
> 
> /sys/kernel/debug/mtd/nand/...

Yes, this makes sense to me.

And since this is debugfs we can do what we want.

Thanks,
//richard
Boris Brezillon May 15, 2017, 9:41 a.m. UTC | #3
On Mon, 15 May 2017 11:23:55 +0200
Richard Weinberger <richard@nod.at> wrote:

> Boris,
> 
> Am 15.05.2017 um 10:21 schrieb Boris Brezillon:
>  > #ifndef statements should be at the beginning of the line. But anyway,
> > I think we all agree that always forcing bad block erasure is a bad
> > idea. If we accept to support this feature, this should be done through
> > a per-NAND-chip debugfs entry, with fine grained selection of the block
> > that we're allowing to be forcibly erased.
> > 
> > Here is a suggestion:
> > 
> > echo all > /sys/kernel/debug/nand/<nand-chip-name>/erase-bad-blocks
> > echo none > /sys/kernel/debug/nand/<nand-chip-name>/erase-bad-blocks
> > echo X,Y-Z,... > /sys/kernel/debug/nand/<nand-chip-name>/erase-bad-blocks
> > 
> > where X is an eraseblock number, and X-Y is a range of eraseblocks.  
> 
> Will a write to that file trigger the erase or just allow it?

Just allows it.

> 
> IMHO we can keept it simple such as:
> echo y > /sys/kernel/debug/nand/<nand-chip-name>/allow-bad-block-erase
> 
> Then a user can erase whatever he wants...

Well, I was proposing to do that because several users can use the same
NAND chip in parallel, and allowing to forcibly erase a bad block at
the NAND chip level can be dangerous in this case.
Here is a real example: 2 users are accessing 2 different partitions,
one wants to force bad block erasure on partition 1, while the other
just wants to normally erase partition 0. With the global
"allow/disallow bad block erasure" approach, you're just likely to
erase bad blocks in partition 0 as well.

> 
> > BTW, maybe we should create and expose a top-level mtd debugfs directory
> > to avoid exposing MTD related things at the root of the debugfs FS.
> > 
> > Something like:
> > 
> > /sys/kernel/debug/mtd/<dev-type>/...
> > 
> > so for the NAND related bits it would be
> > 
> > /sys/kernel/debug/mtd/nand/...  
> 
> Yes, this makes sense to me.
> 
> And since this is debugfs we can do what we want.

Yep.
Richard Weinberger May 15, 2017, 10:10 a.m. UTC | #4
Boris,

Am 15.05.2017 um 11:41 schrieb Boris Brezillon:
>> IMHO we can keept it simple such as:
>> echo y > /sys/kernel/debug/nand/<nand-chip-name>/allow-bad-block-erase
>>
>> Then a user can erase whatever he wants...
> 
> Well, I was proposing to do that because several users can use the same
> NAND chip in parallel, and allowing to forcibly erase a bad block at
> the NAND chip level can be dangerous in this case.
> Here is a real example: 2 users are accessing 2 different partitions,
> one wants to force bad block erasure on partition 1, while the other
> just wants to normally erase partition 0. With the global
> "allow/disallow bad block erasure" approach, you're just likely to
> erase bad blocks in partition 0 as well.

Hmm, I'm not sure whether it makes sense to be smart in this case.
If somebody needs to fixup his bad blocks, like Mario described, there
are no other users and such an action should only be done when you
really know what you are doing.
This is a debug knob and not a knob that should be used in production
or while other parts of the chip are in use.

Thanks,
//richard
Boris Brezillon May 15, 2017, 11:05 a.m. UTC | #5
On Mon, 15 May 2017 12:10:58 +0200
Richard Weinberger <richard@nod.at> wrote:

> Boris,
> 
> Am 15.05.2017 um 11:41 schrieb Boris Brezillon:
> >> IMHO we can keept it simple such as:
> >> echo y > /sys/kernel/debug/nand/<nand-chip-name>/allow-bad-block-erase
> >>
> >> Then a user can erase whatever he wants...  
> > 
> > Well, I was proposing to do that because several users can use the same
> > NAND chip in parallel, and allowing to forcibly erase a bad block at
> > the NAND chip level can be dangerous in this case.
> > Here is a real example: 2 users are accessing 2 different partitions,
> > one wants to force bad block erasure on partition 1, while the other
> > just wants to normally erase partition 0. With the global
> > "allow/disallow bad block erasure" approach, you're just likely to
> > erase bad blocks in partition 0 as well.  
> 
> Hmm, I'm not sure whether it makes sense to be smart in this case.
> If somebody needs to fixup his bad blocks, like Mario described, there
> are no other users and such an action should only be done when you
> really know what you are doing.
> This is a debug knob and not a knob that should be used in production
> or while other parts of the chip are in use.

Fair enough. Let's keep it simple then.
Mario Rugiero May 15, 2017, 1:16 p.m. UTC | #6
Hi all,

I took note of all of your feedback. I'll be waiting until wednesday
for more feedback, and then provide two new patchsets.
Patchset 1 will focus on creating the debugfs entries for mtd.
Patchset 2 will add the ability to toggle force erasing to the debugfs.
Does that sound right to you?

Regards,
Mario.
Boris Brezillon May 15, 2017, 1:20 p.m. UTC | #7
On Mon, 15 May 2017 10:16:51 -0300
Mario Rugiero <mrugiero@gmail.com> wrote:

> Hi all,
> 
> I took note of all of your feedback. I'll be waiting until wednesday
> for more feedback, and then provide two new patchsets.
> Patchset 1 will focus on creating the debugfs entries for mtd.
> Patchset 2 will add the ability to toggle force erasing to the debugfs.

Keep in mind that this should be done on a per device basis (one
debugfs dir per NAND device created under /sys/kernel/debug/mtd/nand/).

> Does that sound right to you?

Yep.

Thanks,

Boris
diff mbox

Patch

diff --git a/drivers/mtd/nand/Kconfig b/drivers/mtd/nand/Kconfig
index c3029528063b..e0a32a34b6bf 100644
--- a/drivers/mtd/nand/Kconfig
+++ b/drivers/mtd/nand/Kconfig
@@ -36,6 +36,15 @@  config MTD_NAND_ECC_BCH
 	  ECC codes. They are used with NAND devices requiring more than 1 bit
 	  of error correction.
 
+config MTD_NAND_ERASE_BADBLOCKS
+	bool "Enable erasing of bad blocks (DANGEROUS)"
+	default n
+	help
+	  This enables support for attempting to erase bad blocks.
+	  It is needed to workaround too many badblocks issue on chips used
+	  under custom, incompatible vendor drivers.
+	  Say N if unsure.
+
 config MTD_SM_COMMON
 	tristate
 	default n
diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c
index d474378ed810..282410813a9c 100644
--- a/drivers/mtd/nand/nand_base.c
+++ b/drivers/mtd/nand/nand_base.c
@@ -3211,8 +3211,10 @@  int nand_erase_nand(struct mtd_info *mtd, struct erase_info *instr,
 					chip->page_shift, allowbbt)) {
 			pr_warn("%s: attempt to erase a bad block at page 0x%08x\n",
 				    __func__, page);
+			#ifndef CONFIG_MTD_NAND_ERASE_BADBLOCKS
 			instr->state = MTD_ERASE_FAILED;
 			goto erase_exit;
+			#endif
 		}
 
 		/*