Patchwork [v2,07/16] mtd/docg3: add OOB layout to mtdinfo

login
register
mail settings
Submitter Robert Jarzmik
Date Nov. 10, 2011, 8:05 a.m.
Message ID <1320912342-30147-8-git-send-email-robert.jarzmik@free.fr>
Download mbox | patch
Permalink /patch/124807/
State Accepted
Commit 732b63bd8c70bf8fbc50d3d3cd56c748edb8cfac
Headers show

Comments

Robert Jarzmik - Nov. 10, 2011, 8:05 a.m.
Add OOB layout description for docg3, so that userspace can
use this information to setup the data for write_oob().

Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr>

---
Since V1: added ECC oobavail field for OOB auto placement
---
 drivers/mtd/devices/docg3.c |   15 +++++++++++++++
 1 files changed, 15 insertions(+), 0 deletions(-)
Mike Dunn - Nov. 12, 2011, 7:39 p.m.
On 11/10/2011 12:05 AM, Robert Jarzmik wrote:
> +static struct nand_ecclayout docg3_oobinfo = {
> +	.eccbytes = 8,
> +	.eccpos = {7, 8, 9, 10, 11, 12, 13, 14},
> +	.oobfree = {{0, 7}, {15, 1} },
> +	.oobavail = 8,
> +};
> +

Just FYI, per Ivan's suggestion, I changed this to use the last oob byte as a
"page programmed" flag for the purpose of detecting bit flips when reading a
blank page.  Maybe something to keep in mind.  You can have a look at the latest
G4 driver patch to see exactly how I use it.

Mike
Robert Jarzmik - Nov. 13, 2011, 10:18 a.m.
Mike Dunn <mikedunn@newsguy.com> writes:

> On 11/10/2011 12:05 AM, Robert Jarzmik wrote:
>> +static struct nand_ecclayout docg3_oobinfo = {
>> +	.eccbytes = 8,
>> +	.eccpos = {7, 8, 9, 10, 11, 12, 13, 14},
>> +	.oobfree = {{0, 7}, {15, 1} },
>> +	.oobavail = 8,
>> +};
>> +

I took a different approach.  I check an internal docg3 register to see if the
page was written. Or I could had have checked the Hamming code, as I don't think
it can be 0xff whatever the pagesize 7 bytes values.

The reason behind is that the Hamming code is Ham(64, 57), ie. Ham(2^6,
2^6-6-1). The means the 6 bits are enough to cover all codewords possibilities,
and 0xff is not one of them.

So unless a bitflip in Hamming code, 0xff in it means blank page. And I think
the ECC engine is even smarter, with the ECCCONF1_PAGE_IS_WRITTEN.

>
> Just FYI, per Ivan's suggestion, I changed this to use the last oob byte as a
> "page programmed" flag for the purpose of detecting bit flips when reading a
> blank page.  Maybe something to keep in mind.  You can have a look at the latest
> G4 driver patch to see exactly how I use it.
I personally think this should be provided by the MTD API. A function
is_page_blank(ofs) could tell if the page was written or not. Now if the
function is NULL, the upper layer (UBIFS, ...) could decide _by itself_ to
assign a free OOB byte to that meaning. But IMHO it's not the drivers duty to
take these decisions to restrict the OOB, let it be done at an upper level.

Cheers.
Artem Bityutskiy - Nov. 13, 2011, 12:53 p.m.
On Sun, 2011-11-13 at 11:18 +0100, Robert Jarzmik wrote:
> I personally think this should be provided by the MTD API. A function
> is_page_blank(ofs) could tell if the page was written or not. Now if the
> function is NULL, the upper layer (UBIFS, ...) could decide _by itself_ to
> assign a free OOB byte to that meaning. But IMHO it's not the drivers duty to
> take these decisions to restrict the OOB, let it be done at an upper level.

Probably it is a good idea to introduce an mtd_is_page_blank. But it
should either work for all flashes or not introduced at all. I do not
think upper layers should use OOB at all. And this interface should also
work for NOR flash. Probably it should just fall-back to comparing the
data with 0xFF if the driver cannot offer anything special.

Artem.
David Woodhouse - Nov. 13, 2011, 1:03 p.m.
On Sun, 2011-11-13 at 14:53 +0200, Artem Bityutskiy wrote:
> On Sun, 2011-11-13 at 11:18 +0100, Robert Jarzmik wrote:
> > I personally think this should be provided by the MTD API. A function
> > is_page_blank(ofs) could tell if the page was written or not. Now if the
> > function is NULL, the upper layer (UBIFS, ...) could decide _by itself_ to
> > assign a free OOB byte to that meaning. But IMHO it's not the drivers duty to
> > take these decisions to restrict the OOB, let it be done at an upper level.
> 
> Probably it is a good idea to introduce an mtd_is_page_blank. But it
> should either work for all flashes or not introduced at all. I do not
> think upper layers should use OOB at all. And this interface should also
> work for NOR flash. Probably it should just fall-back to comparing the
> data with 0xFF if the driver cannot offer anything special. 

We *tried* comparing with 0xFF in JFFS2 and found that it wasn't good
enough (it could be *mostly* erased before a powerfail but not
completely, so as soon as you start to program it you get a lot of
bitflips). Hence the cleanmarkers.

The only way you can treat a page as erased is if you *know* it was
successfully erased. So I'd be reluctant to introduce a function that
encourages people to draw inferences they shouldn't.
Artem Bityutskiy - Nov. 13, 2011, 1:35 p.m.
On Sun, 2011-11-13 at 13:03 +0000, David Woodhouse wrote:
> On Sun, 2011-11-13 at 14:53 +0200, Artem Bityutskiy wrote:
> > On Sun, 2011-11-13 at 11:18 +0100, Robert Jarzmik wrote:
> > > I personally think this should be provided by the MTD API. A function
> > > is_page_blank(ofs) could tell if the page was written or not. Now if the
> > > function is NULL, the upper layer (UBIFS, ...) could decide _by itself_ to
> > > assign a free OOB byte to that meaning. But IMHO it's not the drivers duty to
> > > take these decisions to restrict the OOB, let it be done at an upper level.
> > 
> > Probably it is a good idea to introduce an mtd_is_page_blank. But it
> > should either work for all flashes or not introduced at all. I do not
> > think upper layers should use OOB at all. And this interface should also
> > work for NOR flash. Probably it should just fall-back to comparing the
> > data with 0xFF if the driver cannot offer anything special. 
> 
> We *tried* comparing with 0xFF in JFFS2 and found that it wasn't good
> enough (it could be *mostly* erased before a powerfail but not
> completely, so as soon as you start to program it you get a lot of
> bitflips). Hence the cleanmarkers.
> 
> The only way you can treat a page as erased is if you *know* it was
> successfully erased. So I'd be reluctant to introduce a function that
> encourages people to draw inferences they shouldn't.

That's a bit different. Nowadays no one tries to compare to 0xFFs to
check if an eraseblock is erased or not :-)

This is about the situation when you have an eraseblock with data, and
you want to find where it ends, e.g., you want to find the first blank
NAND page. Both JFFS2 and UBIFS search for 0xFFs. It worked for many
years, but modern NANDs have bit-flips even in empty space, so reading a
never written NAND page may return mostly 0xFFs, but with few bits set
to 0. Modern NANDs have strong ECC which can handle 4 bit errors and
above. According to Ivan, manufacturers say it is expected and OK.

No make SW like JFFS2 and UBIFS work, modern drivers need to
either correct bit-flips on blank pages automatically, or be able to
quickly distinguish between blank and non-blank pages and memset the
buffer with 0xFFs for the former case.

What Robert says is that we probably need an is_page_blank() and let the
driver implement it optimally, or make MTD fall-back to 0xFFs
comparison.

This is my understanding.

Artem.
Robert Jarzmik - Nov. 13, 2011, 4:38 p.m.
Artem Bityutskiy <dedekind1@gmail.com> writes:

> What Robert says is that we probably need an is_page_blank() and let the
> driver implement it optimally, or make MTD fall-back to 0xFFs
> comparison.
>
> This is my understanding.
And that's exactly my point.

And while we're discussing MTD API, I'd like to add another thing I was thinking
of, from a conversation Mike and Ivan had.

They discussed how UBIFS is "intolerant" to bitflips, and marks a block as
"unusable" if one bitflip occured, even if the ECC can fix much more.

What I was thinking is that the MTD oob information which exposes how many ECC
are available should expose as well how many bitflips can be fixed (for example
4 bitflips can be fixed, 5 bitflips can be detected). Then, the read_oob()
function could return back 0 if read was successful, -Exxxx on error, or a
positive number N if N bitflips were fixed.
With this information, upper level could decide from (read_oob() return and
ecc.fixable_bitflips) if a block should be marked as unusable (worn out) or not.

I'd like some feedback on this idea as well.

Cheers.

--
Robert
Mike Dunn - Nov. 13, 2011, 7:55 p.m.
On 11/13/2011 08:38 AM, Robert Jarzmik wrote:
>
> And while we're discussing MTD API, I'd like to add another thing I was thinking
> of, from a conversation Mike and Ivan had.
>
> They discussed how UBIFS is "intolerant" to bitflips, and marks a block as
> "unusable" if one bitflip occured, even if the ECC can fix much more.
>
> What I was thinking is that the MTD oob information which exposes how many ECC
> are available should expose as well how many bitflips can be fixed (for example
> 4 bitflips can be fixed, 5 bitflips can be detected). 


You're referring to struct nand_ecclayout?  I wouldn't think that would be the
appropriate place; it just describes the layout, not operational details.


> Then, the read_oob()
> function could return back 0 if read was successful, -Exxxx on error, or a
> positive number N if N bitflips were fixed.
> With this information, upper level could decide from (read_oob() return and
> ecc.fixable_bitflips) if a block should be marked as unusable (worn out) or not.


Something along these lines was suggested by Artem a few days ago:

http://lists.infradead.org/pipermail/linux-mtd/2011-November/038376.html

I'm looking into implementing this.  Currently the drivers return -EUCLEAN from
read() and read_oob() if *any* bit error corrections were made, and this is the
information used by ubi to determine whether to scrub, and also whether to mark
a block as "bad" after running the PEB torture test.  To summarize Artem's
suggestion in my own words... The drivers expose the ecc strength to the mtd
subsystem (as Robert also suggests).  Mtd assigns a "scrublevel" to the device,
settable by the user through sysfs, with ecc strength as the default.  Read
operations no longer go directly to the driver (as they currently do for
unpartitioned devices) but are handled in mtd.  The driver returns the corrected
error count to mtd, which makes the determination of whether to return -EUCLEAN
or 0, based on the number of corrected errors and the scrublevel.

An objection might be that mtd should not be setting policy.  It's also a fairly
sizeable modification.  The alternative would be to implement a mechanism to
return the corrected error count to the higher layer (e.g., ubi) for each read
operation.  This would be even more work, requiring modifications to mtd and ubi.

I'd like to work to resolve this either way, as ubi and ubifs are the killer
apps for these new drivers.

Thanks,
Mike
Artem Bityutskiy - Nov. 13, 2011, 8:27 p.m.
On Sun, 2011-11-13 at 11:55 -0800, Mike Dunn wrote:
> An objection might be that mtd should not be setting policy.  It's also a fairly
> sizeable modification.  The alternative would be to implement a mechanism to
> return the corrected error count to the higher layer (e.g., ubi) for each read
> operation.  This would be even more work, requiring modifications to mtd and ubi.

Yeah, probably just returning the ECC correction count is cleaner
design. Probably we can add another argument to the mtd read function
and if the return code is -EUCLEAN (correctable bit-flips happened), it
would contain the highest ECC correction count encountered while reading
this region of the flash. So the SW which does not care, will not
require any changes.

I am not sure if you'll need to mtd interfaces from mtd->func(...) to
mtd_func(mtd, ...) for this or not, though.

Artem.
Mike Dunn - Nov. 14, 2011, 12:58 a.m.
On 11/13/2011 02:18 AM, Robert Jarzmik wrote:
> I took a different approach.  I check an internal docg3 register to see if the
> page was written. Or I could had have checked the Hamming code, as I don't think
> it can be 0xff whatever the pagesize 7 bytes values.
>
> The reason behind is that the Hamming code is Ham(64, 57), ie. Ham(2^6,
> 2^6-6-1). The means the 6 bits are enough to cover all codewords possibilities,
> and 0xff is not one of them.
>
> So unless a bitflip in Hamming code, 0xff in it means blank page.


Clever!


>  And I think
> the ECC engine is even smarter, with the ECCCONF1_PAGE_IS_WRITTEN.


I forgot about this.  If the hardware can indeed tell you, I guess this is the
best way.  Sounds like I'll be updating my blank page detection.

Mike
Artem Bityutskiy - Nov. 14, 2011, 5:38 p.m.
On Mon, 2011-11-14 at 10:08 -0800, Mike Dunn wrote:
> This would be better than the cumulative error count over the entire block,
> because the highest count on any one page is more significant, I think.

Yeah, although in the previous proposal I also assumed something like
that, not "cumulative".

Just a side note - take my suggestions with a grain of salt - I do not
actively work on MTD any longer so may mislead you :-)

> >  So the SW which does not care, will not
> > require any changes.
> >
> > I am not sure if you'll need to mtd interfaces from mtd->func(...) to
> > mtd_func(mtd, ...) for this or not, though.
> 
> 
> I don't (yet) see why I would need to. 
> 
> Just adding the argument to mtd->read(), mtd->read_oob(),  would be a simple
> change, but large in scope, affecting all users of the mtd interface.  Any
> advice on how to proceed?

Add the argument without implementing its support, amend all users and
make them compile.

>   Should it be one big patchset, with individual
> patches for changes to mtd, nand, one_nand, mtdchar, each driver, ... ?
>   If it
> is not all merged at once, the build will be broken for the unpatched
> components.  Or is that acceptable, and the patches can be submitted piecemeal,
> starting with, say, mtd, nand, nandsim, mtdram, mtdchar?  Or should we
> temporarily create a branch from l2-mtd until we're satisfiled that this is all
> stable?

We can create a branch regardless, if you find this useful.

I guess one big patch should be OK. If it causes issues we can later
think how to split it.
Mike Dunn - Nov. 14, 2011, 6:08 p.m.
On 11/13/2011 12:27 PM, Artem Bityutskiy wrote:
> On Sun, 2011-11-13 at 11:55 -0800, Mike Dunn wrote:
>> An objection might be that mtd should not be setting policy.  It's also a fairly
>> sizeable modification.  The alternative would be to implement a mechanism to
>> return the corrected error count to the higher layer (e.g., ubi) for each read
>> operation.  This would be even more work, requiring modifications to mtd and ubi.
> Yeah, probably just returning the ECC correction count is cleaner
> design. Probably we can add another argument to the mtd read function
> and if the return code is -EUCLEAN (correctable bit-flips happened), it
> would contain the highest ECC correction count encountered while reading
> this region of the flash.


This would be better than the cumulative error count over the entire block,
because the highest count on any one page is more significant, I think.


>  So the SW which does not care, will not
> require any changes.
>
> I am not sure if you'll need to mtd interfaces from mtd->func(...) to
> mtd_func(mtd, ...) for this or not, though.


I don't (yet) see why I would need to. 

Just adding the argument to mtd->read(), mtd->read_oob(),  would be a simple
change, but large in scope, affecting all users of the mtd interface.  Any
advice on how to proceed?  Should it be one big patchset, with individual
patches for changes to mtd, nand, one_nand, mtdchar, each driver, ... ?  If it
is not all merged at once, the build will be broken for the unpatched
components.  Or is that acceptable, and the patches can be submitted piecemeal,
starting with, say, mtd, nand, nandsim, mtdram, mtdchar?  Or should we
temporarily create a branch from l2-mtd until we're satisfiled that this is all
stable?

Thanks,
Mike

> Artem.
>
>
> ______________________________________________________
> Linux MTD discussion mailing list
> http://lists.infradead.org/mailman/listinfo/linux-mtd/
>
>

Patch

diff --git a/drivers/mtd/devices/docg3.c b/drivers/mtd/devices/docg3.c
index 6eca7f6..27c4fea 100644
--- a/drivers/mtd/devices/docg3.c
+++ b/drivers/mtd/devices/docg3.c
@@ -63,6 +63,20 @@ 
  *
  */
 
+/**
+ * struct docg3_oobinfo - DiskOnChip G3 OOB layout
+ * @eccbytes: 8 bytes are used (1 for Hamming ECC, 7 for BCH ECC)
+ * @eccpos: ecc positions (byte 7 is Hamming ECC, byte 8-14 are BCH ECC)
+ * @oobfree: free pageinfo bytes (byte 0 until byte 6, byte 15
+ * @oobavail: 8 available bytes remaining after ECC toll
+ */
+static struct nand_ecclayout docg3_oobinfo = {
+	.eccbytes = 8,
+	.eccpos = {7, 8, 9, 10, 11, 12, 13, 14},
+	.oobfree = {{0, 7}, {15, 1} },
+	.oobavail = 8,
+};
+
 static inline u8 doc_readb(struct docg3 *docg3, u16 reg)
 {
 	u8 val = readb(docg3->base + reg);
@@ -973,6 +987,7 @@  static void __init doc_set_driver_info(int chip_id, struct mtd_info *mtd)
 	mtd->write_oob = NULL;
 	mtd->sync = NULL;
 	mtd->block_isbad = doc_block_isbad;
+	mtd->ecclayout = &docg3_oobinfo;
 }
 
 /**