diff mbox

[v2] nand: nand_base: Always initialise oob_poi before writing OOB data

Message ID E14CAA9D00C5044EACE0C59FC2A1C7F5059A8DF591@FRMRSSXCHMBSC1.dc-m.alcatel-lucent.com
State New, archived
Headers show

Commit Message

THOMSON, Adam (Adam) May 25, 2011, 10:25 a.m. UTC
In nand_do_write_ops() code it is possible for a caller to provide
ops.oobbuf populated and ops.mode == MTD_OOB_AUTO, which currently
means that the chip->oob_poi buffer isn't initialised to all 0xFF.
The nand_fill_oob() method then carries out the task of copying
the provided OOB data to oob_poi, but with MTD_OOB_AUTO it skips
areas marked as unavailable by the layout struct, including the
bad block marker bytes.

An example of this causing issues is when the last OOB data read
was from the start of a bad block where the markers are not 0xFF,
and the caller wishes to write new OOB data at the beginning of
another block. In this scenario the caller would provide OOB data,
but nand_fill_oob() would skip the bad block marker bytes in
oob_poi before copying the OOB data provided by the caller.
This means that when the OOB data is written back to NAND,
the block is inadvertently marked as bad without the caller knowing.
This has been witnessed when using YAFFS2 where tags are stored
in the OOB.

This patch changes the code so that oob_poi is always initialised
to 0xFF to make sure no left over data is inadvertently written
back to OOB data.

Signed-off-by: Adam Thomson <adam.thomson@alcatel-lucent.com>
---
 drivers/mtd/nand/nand_base.c |    8 +++++---
 1 files changed, 5 insertions(+), 3 deletions(-)

Comments

Artem Bityutskiy May 26, 2011, 8:01 a.m. UTC | #1
On Wed, 2011-05-25 at 12:25 +0200, THOMSON, Adam (Adam) wrote:
> In nand_do_write_ops() code it is possible for a caller to provide
> ops.oobbuf populated and ops.mode == MTD_OOB_AUTO, which currently
> means that the chip->oob_poi buffer isn't initialised to all 0xFF.
> The nand_fill_oob() method then carries out the task of copying
> the provided OOB data to oob_poi, but with MTD_OOB_AUTO it skips
> areas marked as unavailable by the layout struct, including the
> bad block marker bytes.
> 
> An example of this causing issues is when the last OOB data read
> was from the start of a bad block where the markers are not 0xFF,
> and the caller wishes to write new OOB data at the beginning of
> another block. In this scenario the caller would provide OOB data,
> but nand_fill_oob() would skip the bad block marker bytes in
> oob_poi before copying the OOB data provided by the caller.
> This means that when the OOB data is written back to NAND,
> the block is inadvertently marked as bad without the caller knowing.
> This has been witnessed when using YAFFS2 where tags are stored
> in the OOB.
> 
> This patch changes the code so that oob_poi is always initialised
> to 0xFF to make sure no left over data is inadvertently written
> back to OOB data.
> 
> Signed-off-by: Adam Thomson <adam.thomson@alcatel-lucent.com>

Hi Adam,

thanks a lot for the patch. We definitely need to fix this issue. Could
we please have the following things done before we merge a fix for this.

1. We need to send this to the -stable tree, so this needs a

Cc: stable@kernel.org

You say this is broken since 2.6.31, are you 100% sure? Can you point
the specific commit which broke this? Or you just guess that this is
2.6.31?

This is important because we'd need to add

Cc: stable@kernel.org [2.6.31+] in that case.

But AFAICS, the breakage happened after commit

commit 7dcdcbef5d2b60d1db68fd2c07351f7afd8ad376
Author: David Woodhouse <dwmw2@infradead.org>
Date:   Sat Oct 21 17:09:53 2006 +0100

    [MTD] NAND: Combined oob buffer so it's contiguous with data

which was released in 2.6.20.



2. I think the right place fir this memset is 'nand_fill_oob()'. But in
this case the first memset from 'nand_do_write_oob()' has to be removed.



3. And finally, I think we should clean-up the following piece of code
   in 'nand_do_write_oob()':

        memset(chip->oob_poi, 0xff, mtd->oobsize);
        nand_fill_oob(chip, ops->oobbuf, ops->ooblen, ops);
        status = chip->ecc.write_oob(mtd, chip, page & chip->pagemask);
        memset(chip->oob_poi, 0xff, mtd->oobsize);

This should not be part of your fix and should not go to stable. But I
think this should be a condition for your fix to be accepted.
Unfortunately, so few people love mtd and it is full of crap which
should be cleaned-up, and the only thing to do this is to force people
who want just a bug-fix to also do a bit of cleaning up. Sorry, nothing
personal.

Would you please be kind enough and look into this (if what I say makes
technical sense, of course)? I think the second memset should not exist
at all. And instead, all read side code has to memset the buffer for
itself.

Most probably the memset can just be removed, but I'm not 100% sure and
this needs a more careful look.

How does this sound to you?

Thanks!
THOMSON, Adam (Adam) May 26, 2011, 3:15 p.m. UTC | #2
Artem Bityutskiy Wrote:

> 1. We need to send this to the -stable tree, so this needs a
> 
> Cc: stable@kernel.org
> 
> You say this is broken since 2.6.31, are you 100% sure? Can 
> you point the specific commit which broke this? Or you just 
> guess that this is 2.6.31?
>
> This is important because we'd need to add
> 
> Cc: stable@kernel.org [2.6.31+] in that case.
> 
> But AFAICS, the breakage happened after commit
> 
> commit 7dcdcbef5d2b60d1db68fd2c07351f7afd8ad376
> Author: David Woodhouse <dwmw2@infradead.org>
> Date:   Sat Oct 21 17:09:53 2006 +0100
> 
>     [MTD] NAND: Combined oob buffer so it's contiguous with data
> 
> which was released in 2.6.20.
> 

The reason I mentioned 2.6.31 was because it was the earliest kernel I
had been using when I witnessed this issue. Having looked at the commit
you mentioned, I have to agree that's where the problem first appeared.
Do you want me to add the CC with [2.6.20+] (assume that should be part
of the patch text in the mail)?

> 
> 
> 2. I think the right place fir this memset is 
> 'nand_fill_oob()'. But in this case the first memset from 
> 'nand_do_write_oob()' has to be removed.
> 

Yes, that makes sense. Did consider that afterwards. Will update
Accordingly.

> 
> 
> 3. And finally, I think we should clean-up the following piece of code
>    in 'nand_do_write_oob()':
> 
>         memset(chip->oob_poi, 0xff, mtd->oobsize);
>         nand_fill_oob(chip, ops->oobbuf, ops->ooblen, ops);
>         status = chip->ecc.write_oob(mtd, chip, page & 
> chip->pagemask);
>         memset(chip->oob_poi, 0xff, mtd->oobsize);
> 
> This should not be part of your fix and should not go to 
> stable. But I think this should be a condition for your fix 
> to be accepted.
> Unfortunately, so few people love mtd and it is full of crap 
> which should be cleaned-up, and the only thing to do this is 
> to force people who want just a bug-fix to also do a bit of 
> cleaning up. Sorry, nothing personal.
>

No that's fine. Shouldn't be too much to do. Do you want both
the stable patch and the proper fix submitted around the same
time, or are you happy to get the initial fix in first, and follow up
with the more complete tidying of that code? Also am guessing
the complete patch should be based on the latest and greatest
Kernel (2.6.39)?

> 
> Would you please be kind enough and look into this (if what I 
> say makes technical sense, of course)? I think the second 
> memset should not exist at all. And instead, all read side 
> code has to memset the buffer for itself.
> 
> Most probably the memset can just be removed, but I'm not 
> 100% sure and this needs a more careful look.

Having looked briefly at the read side code in nand_base.c,
it does look like it should be enough to remove the second
memset and leave the read side code as is, but will examine
it more thoroughly before I post a patch.

Thanks

Adam
Artem Bityutskiy May 27, 2011, 9:30 a.m. UTC | #3
On Thu, 2011-05-26 at 17:15 +0200, THOMSON, Adam (Adam) wrote:
> The reason I mentioned 2.6.31 was because it was the earliest kernel I
> had been using when I witnessed this issue. Having looked at the commit
> you mentioned, I have to agree that's where the problem first appeared.
> Do you want me to add the CC with [2.6.20+] (assume that should be part
> of the patch text in the mail)?

Yes, please, add the CC to -stable.

> > 2. I think the right place fir this memset is 
> > 'nand_fill_oob()'. But in this case the first memset from 
> > 'nand_do_write_oob()' has to be removed.
> > 
> 
> Yes, that makes sense. Did consider that afterwards. Will update
> Accordingly.

Thanks!

> No that's fine. Shouldn't be too much to do. Do you want both
> the stable patch and the proper fix submitted around the same
> time,

Yes, please.

>  or are you happy to get the initial fix in first, and follow up
> with the more complete tidying of that code?

Please, send all together. It is anyway too late to merge it to 2.6.40,
so there is no rush.

>  Also am guessing
> the complete patch should be based on the latest and greatest
> Kernel (2.6.39)?

Yes, of course, although the latest kernel is this one:
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git

But it is OK to _test_ it an older kernel, though. 

> Having looked briefly at the read side code in nand_base.c,
> it does look like it should be enough to remove the second
> memset and leave the read side code as is, but will examine
> it more thoroughly before I post a patch.

Thanks a lot. When you send a patch, please, also explicitly tell
whether you tested it on some HW or only compile-tested.

Thanks!
Artem Bityutskiy June 7, 2011, 11:06 a.m. UTC | #4
On Thu, 2011-05-26 at 17:15 +0200, THOMSON, Adam (Adam) wrote:
> Having looked briefly at the read side code in nand_base.c,
> it does look like it should be enough to remove the second
> memset and leave the read side code as is, but will examine
> it more thoroughly before I post a patch.

Any news? :-)
THOMSON, Adam (Adam) June 7, 2011, 1 p.m. UTC | #5
> -----Original Message-----
> From: Artem Bityutskiy [mailto:dedekind1@gmail.com] 
> Sent: 07 June 2011 12:07
> To: THOMSON, Adam (Adam)
> Cc: linux-mtd@lists.infradead.org
> Subject: RE: [PATCH v2] nand: nand_base: Always initialise 
> oob_poi before writing OOB data
> 
> On Thu, 2011-05-26 at 17:15 +0200, THOMSON, Adam (Adam) wrote:
> > Having looked briefly at the read side code in nand_base.c, it does 
> > look like it should be enough to remove the second memset and leave 
> > the read side code as is, but will examine it more 
> thoroughly before I 
> > post a patch.
> 
> Any news? :-)

Apologies. Have been sidetracked with other work. Have put changes together but want to check them over just so I'm happy, in particular verifying them on some hardware to make sure all still works as expected. Hopefully should have something in the next week for you. :o)

Thanks

Adam

> 
> --
> Best Regards,
> Artem Bityutskiy (Артём Битюцкий)
> 
>
Artem Bityutskiy June 7, 2011, 1 p.m. UTC | #6
On Tue, 2011-06-07 at 15:00 +0200, THOMSON, Adam (Adam) wrote:
> Apologies. Have been sidetracked with other work. Have put changes
> together but want to check them over just so I'm happy, in particular
> verifying them on some hardware to make sure all still works as
> expected. Hopefully should have something in the next week for
> you. :o)

OK, thanks, although next week I'll be busy and sidetracked :-)
diff mbox

Patch

diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c
index 8c21b89..6bc7212 100644
--- a/drivers/mtd/nand/nand_base.c
+++ b/drivers/mtd/nand/nand_base.c
@@ -1843,9 +1843,11 @@  static int nand_do_write_ops(struct mtd_info *mtd, loff_t to,
 	    (chip->pagebuf << chip->page_shift) < (to + ops->len))
 		chip->pagebuf = -1;
 
-	/* If we're not given explicit OOB data, let it be 0xFF */
-	if (likely(!oob))
-		memset(chip->oob_poi, 0xff, mtd->oobsize);
+	/*
+	 * Initialise to all 0xFF, to avoid the possibility of
+	 * left over OOB data from a previous OOB read.
+	 */
+	memset(chip->oob_poi, 0xff, mtd->oobsize);
 
 	while(1) {
 		int bytes = mtd->writesize;