Message ID | E14CAA9D00C5044EACE0C59FC2A1C7F5059A8DF591@FRMRSSXCHMBSC1.dc-m.alcatel-lucent.com |
---|---|
State | New, archived |
Headers | show |
On Wed, 2011-05-25 at 12:25 +0200, THOMSON, Adam (Adam) wrote: > In nand_do_write_ops() code it is possible for a caller to provide > ops.oobbuf populated and ops.mode == MTD_OOB_AUTO, which currently > means that the chip->oob_poi buffer isn't initialised to all 0xFF. > The nand_fill_oob() method then carries out the task of copying > the provided OOB data to oob_poi, but with MTD_OOB_AUTO it skips > areas marked as unavailable by the layout struct, including the > bad block marker bytes. > > An example of this causing issues is when the last OOB data read > was from the start of a bad block where the markers are not 0xFF, > and the caller wishes to write new OOB data at the beginning of > another block. In this scenario the caller would provide OOB data, > but nand_fill_oob() would skip the bad block marker bytes in > oob_poi before copying the OOB data provided by the caller. > This means that when the OOB data is written back to NAND, > the block is inadvertently marked as bad without the caller knowing. > This has been witnessed when using YAFFS2 where tags are stored > in the OOB. > > This patch changes the code so that oob_poi is always initialised > to 0xFF to make sure no left over data is inadvertently written > back to OOB data. > > Signed-off-by: Adam Thomson <adam.thomson@alcatel-lucent.com> Hi Adam, thanks a lot for the patch. We definitely need to fix this issue. Could we please have the following things done before we merge a fix for this. 1. We need to send this to the -stable tree, so this needs a Cc: stable@kernel.org You say this is broken since 2.6.31, are you 100% sure? Can you point the specific commit which broke this? Or you just guess that this is 2.6.31? This is important because we'd need to add Cc: stable@kernel.org [2.6.31+] in that case. But AFAICS, the breakage happened after commit commit 7dcdcbef5d2b60d1db68fd2c07351f7afd8ad376 Author: David Woodhouse <dwmw2@infradead.org> Date: Sat Oct 21 17:09:53 2006 +0100 [MTD] NAND: Combined oob buffer so it's contiguous with data which was released in 2.6.20. 2. I think the right place fir this memset is 'nand_fill_oob()'. But in this case the first memset from 'nand_do_write_oob()' has to be removed. 3. And finally, I think we should clean-up the following piece of code in 'nand_do_write_oob()': memset(chip->oob_poi, 0xff, mtd->oobsize); nand_fill_oob(chip, ops->oobbuf, ops->ooblen, ops); status = chip->ecc.write_oob(mtd, chip, page & chip->pagemask); memset(chip->oob_poi, 0xff, mtd->oobsize); This should not be part of your fix and should not go to stable. But I think this should be a condition for your fix to be accepted. Unfortunately, so few people love mtd and it is full of crap which should be cleaned-up, and the only thing to do this is to force people who want just a bug-fix to also do a bit of cleaning up. Sorry, nothing personal. Would you please be kind enough and look into this (if what I say makes technical sense, of course)? I think the second memset should not exist at all. And instead, all read side code has to memset the buffer for itself. Most probably the memset can just be removed, but I'm not 100% sure and this needs a more careful look. How does this sound to you? Thanks!
Artem Bityutskiy Wrote: > 1. We need to send this to the -stable tree, so this needs a > > Cc: stable@kernel.org > > You say this is broken since 2.6.31, are you 100% sure? Can > you point the specific commit which broke this? Or you just > guess that this is 2.6.31? > > This is important because we'd need to add > > Cc: stable@kernel.org [2.6.31+] in that case. > > But AFAICS, the breakage happened after commit > > commit 7dcdcbef5d2b60d1db68fd2c07351f7afd8ad376 > Author: David Woodhouse <dwmw2@infradead.org> > Date: Sat Oct 21 17:09:53 2006 +0100 > > [MTD] NAND: Combined oob buffer so it's contiguous with data > > which was released in 2.6.20. > The reason I mentioned 2.6.31 was because it was the earliest kernel I had been using when I witnessed this issue. Having looked at the commit you mentioned, I have to agree that's where the problem first appeared. Do you want me to add the CC with [2.6.20+] (assume that should be part of the patch text in the mail)? > > > 2. I think the right place fir this memset is > 'nand_fill_oob()'. But in this case the first memset from > 'nand_do_write_oob()' has to be removed. > Yes, that makes sense. Did consider that afterwards. Will update Accordingly. > > > 3. And finally, I think we should clean-up the following piece of code > in 'nand_do_write_oob()': > > memset(chip->oob_poi, 0xff, mtd->oobsize); > nand_fill_oob(chip, ops->oobbuf, ops->ooblen, ops); > status = chip->ecc.write_oob(mtd, chip, page & > chip->pagemask); > memset(chip->oob_poi, 0xff, mtd->oobsize); > > This should not be part of your fix and should not go to > stable. But I think this should be a condition for your fix > to be accepted. > Unfortunately, so few people love mtd and it is full of crap > which should be cleaned-up, and the only thing to do this is > to force people who want just a bug-fix to also do a bit of > cleaning up. Sorry, nothing personal. > No that's fine. Shouldn't be too much to do. Do you want both the stable patch and the proper fix submitted around the same time, or are you happy to get the initial fix in first, and follow up with the more complete tidying of that code? Also am guessing the complete patch should be based on the latest and greatest Kernel (2.6.39)? > > Would you please be kind enough and look into this (if what I > say makes technical sense, of course)? I think the second > memset should not exist at all. And instead, all read side > code has to memset the buffer for itself. > > Most probably the memset can just be removed, but I'm not > 100% sure and this needs a more careful look. Having looked briefly at the read side code in nand_base.c, it does look like it should be enough to remove the second memset and leave the read side code as is, but will examine it more thoroughly before I post a patch. Thanks Adam
On Thu, 2011-05-26 at 17:15 +0200, THOMSON, Adam (Adam) wrote: > The reason I mentioned 2.6.31 was because it was the earliest kernel I > had been using when I witnessed this issue. Having looked at the commit > you mentioned, I have to agree that's where the problem first appeared. > Do you want me to add the CC with [2.6.20+] (assume that should be part > of the patch text in the mail)? Yes, please, add the CC to -stable. > > 2. I think the right place fir this memset is > > 'nand_fill_oob()'. But in this case the first memset from > > 'nand_do_write_oob()' has to be removed. > > > > Yes, that makes sense. Did consider that afterwards. Will update > Accordingly. Thanks! > No that's fine. Shouldn't be too much to do. Do you want both > the stable patch and the proper fix submitted around the same > time, Yes, please. > or are you happy to get the initial fix in first, and follow up > with the more complete tidying of that code? Please, send all together. It is anyway too late to merge it to 2.6.40, so there is no rush. > Also am guessing > the complete patch should be based on the latest and greatest > Kernel (2.6.39)? Yes, of course, although the latest kernel is this one: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git But it is OK to _test_ it an older kernel, though. > Having looked briefly at the read side code in nand_base.c, > it does look like it should be enough to remove the second > memset and leave the read side code as is, but will examine > it more thoroughly before I post a patch. Thanks a lot. When you send a patch, please, also explicitly tell whether you tested it on some HW or only compile-tested. Thanks!
On Thu, 2011-05-26 at 17:15 +0200, THOMSON, Adam (Adam) wrote: > Having looked briefly at the read side code in nand_base.c, > it does look like it should be enough to remove the second > memset and leave the read side code as is, but will examine > it more thoroughly before I post a patch. Any news? :-)
> -----Original Message----- > From: Artem Bityutskiy [mailto:dedekind1@gmail.com] > Sent: 07 June 2011 12:07 > To: THOMSON, Adam (Adam) > Cc: linux-mtd@lists.infradead.org > Subject: RE: [PATCH v2] nand: nand_base: Always initialise > oob_poi before writing OOB data > > On Thu, 2011-05-26 at 17:15 +0200, THOMSON, Adam (Adam) wrote: > > Having looked briefly at the read side code in nand_base.c, it does > > look like it should be enough to remove the second memset and leave > > the read side code as is, but will examine it more > thoroughly before I > > post a patch. > > Any news? :-) Apologies. Have been sidetracked with other work. Have put changes together but want to check them over just so I'm happy, in particular verifying them on some hardware to make sure all still works as expected. Hopefully should have something in the next week for you. :o) Thanks Adam > > -- > Best Regards, > Artem Bityutskiy (Артём Битюцкий) > >
On Tue, 2011-06-07 at 15:00 +0200, THOMSON, Adam (Adam) wrote: > Apologies. Have been sidetracked with other work. Have put changes > together but want to check them over just so I'm happy, in particular > verifying them on some hardware to make sure all still works as > expected. Hopefully should have something in the next week for > you. :o) OK, thanks, although next week I'll be busy and sidetracked :-)
diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c index 8c21b89..6bc7212 100644 --- a/drivers/mtd/nand/nand_base.c +++ b/drivers/mtd/nand/nand_base.c @@ -1843,9 +1843,11 @@ static int nand_do_write_ops(struct mtd_info *mtd, loff_t to, (chip->pagebuf << chip->page_shift) < (to + ops->len)) chip->pagebuf = -1; - /* If we're not given explicit OOB data, let it be 0xFF */ - if (likely(!oob)) - memset(chip->oob_poi, 0xff, mtd->oobsize); + /* + * Initialise to all 0xFF, to avoid the possibility of + * left over OOB data from a previous OOB read. + */ + memset(chip->oob_poi, 0xff, mtd->oobsize); while(1) { int bytes = mtd->writesize;
In nand_do_write_ops() code it is possible for a caller to provide ops.oobbuf populated and ops.mode == MTD_OOB_AUTO, which currently means that the chip->oob_poi buffer isn't initialised to all 0xFF. The nand_fill_oob() method then carries out the task of copying the provided OOB data to oob_poi, but with MTD_OOB_AUTO it skips areas marked as unavailable by the layout struct, including the bad block marker bytes. An example of this causing issues is when the last OOB data read was from the start of a bad block where the markers are not 0xFF, and the caller wishes to write new OOB data at the beginning of another block. In this scenario the caller would provide OOB data, but nand_fill_oob() would skip the bad block marker bytes in oob_poi before copying the OOB data provided by the caller. This means that when the OOB data is written back to NAND, the block is inadvertently marked as bad without the caller knowing. This has been witnessed when using YAFFS2 where tags are stored in the OOB. This patch changes the code so that oob_poi is always initialised to 0xFF to make sure no left over data is inadvertently written back to OOB data. Signed-off-by: Adam Thomson <adam.thomson@alcatel-lucent.com> --- drivers/mtd/nand/nand_base.c | 8 +++++--- 1 files changed, 5 insertions(+), 3 deletions(-)