Patchwork Bug on atmel_nand HW ECC : OOB info not correctly written

login
register
mail settings
Submitter Richard Genoud
Date Oct. 12, 2008, 6:42 a.m.
Message ID <1223793748.7110.13.camel@ubuntu>
Download mbox | patch
Permalink /patch/4049/
State Accepted
Commit 3fc2389847a84d06263c13a3b6dfe1f1d6eea935
Headers show

Comments

Richard Genoud - Oct. 12, 2008, 6:42 a.m.
The functions that write the OOB info (on hardware ECC only) use the HW_SYNDROME method.
This is not correct : the start position is "pos = eccsize + chunk" and should be eccsize.
So, the standard (nand_write_oob_std) function should be used.
This patch corrects this by using NAND_ECC_HW instead of NAND_ECC_HW_SYNDROME.

This has only been tested on small pages nand flash.
(if anyone can test it on large pages that would be great).

kernel version : 2.6.27-rc2 (current git mtd-2.6)

Signed-off-by: Richard Genoud <richard.genoud@gmail.com>

---
David Woodhouse - Oct. 13, 2008, 2:01 p.m.
On Sun, 2008-10-12 at 08:42 +0200, Richard Genoud wrote:
> The functions that write the OOB info (on hardware ECC only) use the
> HW_SYNDROME method.
> This is not correct : the start position is "pos = eccsize + chunk"
> and should be eccsize.
> So, the standard (nand_write_oob_std) function should be used.
> This patch corrects this by using NAND_ECC_HW instead of
> NAND_ECC_HW_SYNDROME.
> 
> This has only been tested on small pages nand flash.
> (if anyone can test it on large pages that would be great).
> 
> kernel version : 2.6.27-rc2 (current git mtd-2.6)
> 
> Signed-off-by: Richard Genoud <richard.genoud@gmail.com>

Håvard?
Haavard Skinnemoen - Oct. 13, 2008, 2:13 p.m.
David Woodhouse <dwmw2@infradead.org> wrote:
> On Sun, 2008-10-12 at 08:42 +0200, Richard Genoud wrote:
> > The functions that write the OOB info (on hardware ECC only) use the
> > HW_SYNDROME method.
> > This is not correct : the start position is "pos = eccsize + chunk"
> > and should be eccsize.
> > So, the standard (nand_write_oob_std) function should be used.
> > This patch corrects this by using NAND_ECC_HW instead of
> > NAND_ECC_HW_SYNDROME.
> > 
> > This has only been tested on small pages nand flash.
> > (if anyone can test it on large pages that would be great).
> > 
> > kernel version : 2.6.27-rc2 (current git mtd-2.6)
> > 
> > Signed-off-by: Richard Genoud <richard.genoud@gmail.com>  
> 
> Håvard?

I have no clue, I'm afraid. What's the difference between NAND_ECC_HW
and NAND_ECC_HW_SYNDROME?

I do think I have a large page NAND flash available for testing though,
so I can do that. Will any potential problems be easy to spot?

In other words: What are the consequences of this bug?

Haavard
Richard Genoud - Oct. 13, 2008, 3:44 p.m.
2008/10/13 Haavard Skinnemoen <haavard.skinnemoen@atmel.com>:
> I have no clue, I'm afraid. What's the difference between NAND_ECC_HW
> and NAND_ECC_HW_SYNDROME?
>
> I do think I have a large page NAND flash available for testing though,
> so I can do that. Will any potential problems be easy to spot?
>
> In other words: What are the consequences of this bug?
>
> Haavard
>

The NAND_ECC_HW_SYNDROME is normally used with "non standard" ECC
controllers that need to have a complete access on the OOB area ( the
OOB pointer is given in the xxx_nand_calculate function whereas in
NAND_ECC_HW, only an ecc code pointer is given).
The read and write mechanisms are also a "little bit" more complicated
in syndrome mode, they admit ecc post-pad and pre-pad data which are
not used in our case.

The reported bug was from a small pages user :
mount -tyaffs2 /dev/mtdblock0 /mnt/part
touch /mnt/part/foo
rm /mnt/part/foo
umount /mnt/part
mount -tyaffs2 /dev/mtdblock0 /mnt/part
... and foo is still there.

the code in cause was in atmel_nand_write_oob_512, the writing start
position was pos = eccsize + chunk; instead of pos = eccsize; (eccsize
= size of a page). The OOB info was written with an offset of 4 bytes
(=chunk) and so Yaffs didn't manage to read its deleted file marker.
With this patch, the deleted files are removed.

There's the same mechanism for large pages (the at91sam9263-ek has large pages).
Today, nand_write_oob_syndrome is used and we've got pos = steps *
(eccsize + chunk); with steps = 1.
There's no reason to have an offset of 4 bytes (chunk).

I'm not sure that there will be the same bug with yaffs and large
pages, but there will certairnly be some problems.

maybe there's a way with the mtdtools to call the read/write_oob_ functions.
In this case, it would be easy to see the bug.

(I would have do it myself, but I have no board left...)

richard.
Richard Genoud - Nov. 21, 2008, 9:50 a.m.
2008/10/13 Richard Genoud <richard.genoud@gmail.com>:
> 2008/10/13 Haavard Skinnemoen <haavard.skinnemoen@atmel.com>:
>> I have no clue, I'm afraid. What's the difference between NAND_ECC_HW
>> and NAND_ECC_HW_SYNDROME?
>>
>> I do think I have a large page NAND flash available for testing though,
>> so I can do that. Will any potential problems be easy to spot?
>>
>> In other words: What are the consequences of this bug?
>>
>> Haavard
>>

hi !
Have you some feed-back on this patch ?
Justin tested it successfully with a Samsung K9F2G08U0M-PCB0 256 MiB
with 2048 Byte page size
and 128 KiB Erase Block size on an at91sam9260-ek.
Haavard Skinnemoen - Dec. 18, 2008, 12:29 p.m.
Richard Genoud wrote:
> Have you some feed-back on this patch ?
> Justin tested it successfully with a Samsung K9F2G08U0M-PCB0 256 MiB
> with 2048 Byte page size
> and 128 KiB Erase Block size on an at91sam9260-ek.

Sorry about not responding earlier, but FWIW, I have successfully
booted a avr32 board after this patch was merged, and I can't see any
obvious problems. So I think it works fine.

The NAND chip I have is also a Samsung one:

NAND device: Manufacturer ID: 0xec, Chip ID: 0xda (Samsung NAND 256MiB 3,3V 8-bit)

with 2K page size.

Haavard
Richard Genoud - Dec. 18, 2008, 2:03 p.m.
2008/12/18, Haavard Skinnemoen <haavard.skinnemoen@atmel.com>:
> Sorry about not responding earlier, but FWIW, I have successfully
> booted a avr32 board after this patch was merged, and I can't see any
> obvious problems. So I think it works fine.
>
> The NAND chip I have is also a Samsung one:
>
> NAND device: Manufacturer ID: 0xec, Chip ID: 0xda (Samsung NAND 256MiB 3,3V
> 8-bit)
>
> with 2K page size.
>
> Haavard
>

Ok, thanks for the testing !

Richard.

Patch

diff --git a/drivers/mtd/nand/atmel_nand.c b/drivers/mtd/nand/atmel_nand.c
index 3387e0d..c98c157 100644
--- a/drivers/mtd/nand/atmel_nand.c
+++ b/drivers/mtd/nand/atmel_nand.c
@@ -174,48 +174,6 @@  static void atmel_write_buf16(struct mtd_info *mtd, const u8 *buf, int len)
 }
 
 /*
- * write oob for small pages
- */
-static int atmel_nand_write_oob_512(struct mtd_info *mtd,
-		struct nand_chip *chip, int page)
-{
-	int chunk = chip->ecc.bytes + chip->ecc.prepad + chip->ecc.postpad;
-	int eccsize = chip->ecc.size, length = mtd->oobsize;
-	int len, pos, status = 0;
-	const uint8_t *bufpoi = chip->oob_poi;
-
-	pos = eccsize + chunk;
-
-	chip->cmdfunc(mtd, NAND_CMD_SEQIN, pos, page);
-	len = min_t(int, length, chunk);
-	chip->write_buf(mtd, bufpoi, len);
-	bufpoi += len;
-	length -= len;
-	if (length > 0)
-		chip->write_buf(mtd, bufpoi, length);
-
-	chip->cmdfunc(mtd, NAND_CMD_PAGEPROG, -1, -1);
-	status = chip->waitfunc(mtd, chip);
-
-	return status & NAND_STATUS_FAIL ? -EIO : 0;
-
-}
-
-/*
- * read oob for small pages
- */
-static int atmel_nand_read_oob_512(struct mtd_info *mtd,
-		struct nand_chip *chip,	int page, int sndcmd)
-{
-	if (sndcmd) {
-		chip->cmdfunc(mtd, NAND_CMD_READOOB, 0, page);
-		sndcmd = 0;
-	}
-	chip->read_buf(mtd, chip->oob_poi, mtd->oobsize);
-	return sndcmd;
-}
-
-/*
  * Calculate HW ECC
  *
  * function called after a write
@@ -235,14 +193,14 @@  static int atmel_nand_calculate(struct mtd_info *mtd,
 	/* get the first 2 ECC bytes */
 	ecc_value = ecc_readl(host->ecc, PR);
 
-	ecc_code[eccpos[0]] = ecc_value & 0xFF;
-	ecc_code[eccpos[1]] = (ecc_value >> 8) & 0xFF;
+	ecc_code[0] = ecc_value & 0xFF;
+	ecc_code[1] = (ecc_value >> 8) & 0xFF;
 
 	/* get the last 2 ECC bytes */
 	ecc_value = ecc_readl(host->ecc, NPR) & ATMEL_ECC_NPARITY;
 
-	ecc_code[eccpos[2]] = ecc_value & 0xFF;
-	ecc_code[eccpos[3]] = (ecc_value >> 8) & 0xFF;
+	ecc_code[2] = ecc_value & 0xFF;
+	ecc_code[3] = (ecc_value >> 8) & 0xFF;
 
 	return 0;
 }
@@ -476,14 +434,12 @@  static int __init atmel_nand_probe(struct platform_device *pdev)
 			res = -EIO;
 			goto err_ecc_ioremap;
 		}
-		nand_chip->ecc.mode = NAND_ECC_HW_SYNDROME;
+		nand_chip->ecc.mode = NAND_ECC_HW;
 		nand_chip->ecc.calculate = atmel_nand_calculate;
 		nand_chip->ecc.correct = atmel_nand_correct;
 		nand_chip->ecc.hwctl = atmel_nand_hwctl;
 		nand_chip->ecc.read_page = atmel_nand_read_page;
 		nand_chip->ecc.bytes = 4;
-		nand_chip->ecc.prepad = 0;
-		nand_chip->ecc.postpad = 0;
 	}
 
 	nand_chip->chip_delay = 20;		/* 20us command delay time */
@@ -514,7 +470,7 @@  static int __init atmel_nand_probe(struct platform_device *pdev)
 		goto err_scan_ident;
 	}
 
-	if (nand_chip->ecc.mode == NAND_ECC_HW_SYNDROME) {
+	if (nand_chip->ecc.mode == NAND_ECC_HW) {
 		/* ECC is calculated for the whole page (1 step) */
 		nand_chip->ecc.size = mtd->writesize;
 
@@ -522,8 +478,6 @@  static int __init atmel_nand_probe(struct platform_device *pdev)
 		switch (mtd->writesize) {
 		case 512:
 			nand_chip->ecc.layout = &atmel_oobinfo_small;
-			nand_chip->ecc.read_oob = atmel_nand_read_oob_512;
-			nand_chip->ecc.write_oob = atmel_nand_write_oob_512;
 			ecc_writel(host->ecc, MR, ATMEL_ECC_PAGESIZE_528);
 			break;
 		case 1024: