Message ID | 518397C60809E147AF5323E0420B992E3E934BF1@DBDE01.ent.ti.com |
---|---|
State | New, archived |
On Tue, May 08, 2012 at 01:33:06PM +0100, Philip, Avinash wrote:
> Hi,
>
> We are having an 8-bit NAND part (MT29F2G08ABAEAWP from Micron) connected to GPMC
> Module (General purpose memory controller) from TI.

Hi,
How is ecc performed ?
Using NAND internal ecc ? or with GPMC 1-bit Hamming ? 4-bit/8-bit BCH ?
Which version of omap2 driver are you using ?
Is OOB also ECC-protected ?

> We have been seeing mtd_oobtest failure on a partition size of 248 MB. Most of
> the time, test case 2 of mtd_oobtest is failing. On debugging further it seems
> that bit flip is happening on the test case 2 in OOB area. It is observed that
> the failure locations are consistent.

If you are able to reproduce failures, then you should be able to tell which bits in OOB are failing, by adding a few debugging lines in the code.

> To verify further we had tried writing zeros to OOB area and read it back.
> This test is passing and confirms that all OOB bits (that are programmable)
> are not bad.

It does not confirm anything, bits can fail by remaining stuck at 0.

BR,
--
Ivan
On Tue, May 08, 2012 at 18:53:54, Ivan Djelic wrote:
> On Tue, May 08, 2012 at 01:33:06PM +0100, Philip, Avinash wrote:
> > Hi,
> >
> > We are having an 8-bit NAND part (MT29F2G08ABAEAWP from Micron)
> > connected to GPMC Module (General purpose memory controller) from TI.
>
> Hi,
> How is ecc performed ?
> Using NAND internal ecc ? or with GPMC 1-bit Hamming ? 4-bit/8-bit BCH ?
> Which version of omap2 driver are you using ?
> Is OOB also ECC-protected ?

Hardware ECC is performing.
4-bit BCH ECC scheme is used.
I am using omap2 driver in Linux 3.2.0 Kernel. Don't know omap2 driver version.
No, OOB is not ECC protected.

>
> > We have been seeing mtd_oobtest failure on a partition size of 248 MB.
> > Most of the time, test case 2 of mtd_oobtest is failing. On debugging
> > further it seems that bit flip is happening on the test case 2 in OOB
> > area. It is observed that the failure locations are consistent.
>
> If you are able to reproduce failures, then you should be able to tell which bits in OOB are failing, by adding a few debugging lines in the code.
>

I add debugs and found bit flips from 1 to 0. The location of bit flips might vary on boards. But on the same board it is consistent.

> > To verify further we had tried writing zeros to OOB area and read it back.
> > This test is passing and confirms that all OOB bits (that are
> > programmable) are not bad.
>
> It does not confirm anything, bits can fail by remaining stuck at 0.
>

As bit flip is from 1 to 0, you are right. But same experiment with 0x55, test is passing.

> BR,
> --
> Ivan
>
On Tue, May 08, 2012 at 20:39:46, Philip, Avinash wrote:
> On Tue, May 08, 2012 at 18:53:54, Ivan Djelic wrote:
> > On Tue, May 08, 2012 at 01:33:06PM +0100, Philip, Avinash wrote:
> > > Hi,
> > >
> > > We are having an 8-bit NAND part (MT29F2G08ABAEAWP from Micron)
> > > connected to GPMC Module (General purpose memory controller) from TI.
> >
> > Hi,
> > How is ecc performed ?
> > Using NAND internal ecc ? or with GPMC 1-bit Hamming ? 4-bit/8-bit BCH ?
> > Which version of omap2 driver are you using ?
> > Is OOB also ECC-protected ?
>
> Hardware ECC is performing.
> 4-bit BCH ECC scheme is used.

One correction: the 8-bit BCH ECC scheme is used.

> I am using omap2 driver in Linux 3.2.0 Kernel. Don't know omap2 driver version.
> No, OOB is not ECC protected.
>
> >
> > > We have been seeing mtd_oobtest failure on a partition size of 248 MB.
> > > Most of the time, test case 2 of mtd_oobtest is failing. On
> > > debugging further it seems that bit flip is happening on the test
> > > case 2 in OOB area. It is observed that the failure locations are consistent.
> >
> > If you are able to reproduce failures, then you should be able to tell which bits in OOB are failing, by adding a few debugging lines in the code.
> >
>
> I add debugs and found bit flips from 1 to 0. The location of bit flips might vary on boards. But on the same board it is consistent.
>
> > > To verify further we had tried writing zeros to OOB area and read it back.
> > > This test is passing and confirms that all OOB bits (that are
> > > programmable) are not bad.
> >
> > It does not confirm anything, bits can fail by remaining stuck at 0.
> >
>
> As bit flip is from 1 to 0, you are right. But same experiment with 0x55, test is passing.
>
> > BR,
> > --
> > Ivan
> >
>
>
> ______________________________________________________
> Linux MTD discussion mailing list
> http://lists.infradead.org/mailman/listinfo/linux-mtd/
>
On Tue, May 08, 2012 at 04:09:46PM +0100, Philip, Avinash wrote:
> On Tue, May 08, 2012 at 18:53:54, Ivan Djelic wrote:
> > On Tue, May 08, 2012 at 01:33:06PM +0100, Philip, Avinash wrote:
> > > Hi,
> > >
> > > We are having an 8-bit NAND part (MT29F2G08ABAEAWP from Micron)
> > > connected to GPMC Module (General purpose memory controller) from TI.
> >
> > Hi,
> > How is ecc performed ?
> > Using NAND internal ecc ? or with GPMC 1-bit Hamming ? 4-bit/8-bit BCH ?
> > Which version of omap2 driver are you using ?
> > Is OOB also ECC-protected ?
>
> Hardware ECC is performing.
> 4-bit BCH ECC scheme is used.
> I am using omap2 driver in Linux 3.2.0 Kernel. Don't know omap2 driver version.

You are probably using a patched kernel, since 3.2.0 does not have GPMC BCH support ?!
What is your ecc layout ? Does it expose oobfree regions ?

> No, OOB is not ECC protected.

Well, in that case, isn't it normal that mtd_oobtest should fail if there happens to be a single bitflip in the available OOB area ? (of size mtd->ecclayout->oobavail for each page)

BR,
--
Ivan
On Tue, 8 May 2012, Philip, Avinash wrote:
> I add debugs and found bit flips from 1 to 0. The location of bit flips might vary on
> boards. But on the same board it is consistent.
>
>>> To verify further we had tried writing zeros to OOB area and read it back.
>>> This test is passing and confirms that all OOB bits (that are
>>> programmable) are not bad.
>>
>> It does not confirm anything, bits can fail by remaining stuck at 0.
>>
>
> As bit flip is from 1 to 0, you are right. But same experiment with 0x55,
> test is passing.

Are you sure it's not the bits that happen to be 0 in 0x55 that are the ones suspected of flipping?

/Ricard
On Wed, May 09, 2012 at 00:15:16, Ivan Djelic wrote:
> On Tue, May 08, 2012 at 04:09:46PM +0100, Philip, Avinash wrote:
> > On Tue, May 08, 2012 at 18:53:54, Ivan Djelic wrote:
> > > On Tue, May 08, 2012 at 01:33:06PM +0100, Philip, Avinash wrote:
> > > > Hi,
> > > >
> > > > We are having an 8-bit NAND part (MT29F2G08ABAEAWP from Micron)
> > > > connected to GPMC Module (General purpose memory controller) from TI.
> > >
> > > Hi,
> > > How is ecc performed ?
> > > Using NAND internal ecc ? or with GPMC 1-bit Hamming ? 4-bit/8-bit BCH ?
> > > Which version of omap2 driver are you using ?
> > > Is OOB also ECC-protected ?
> >
> > Hardware ECC is performing.
> > 4-bit BCH ECC scheme is used.
> > I am using omap2 driver in Linux 3.2.0 Kernel. Don't know omap2 driver version.
>
> You are probably using a patched kernel, since 3.2.0 does not have GPMC BCH support ?!
> What is your ecc layout ? Does it expose oobfree regions ?
>

Yes, we had using patched kernel. OOB free region is exposed.

ECC layout will be as follows.

0-1 -> BAD block marking
2-57 -> ECC byte position, ( 14 bytes for 512 byte)
58-63 -> oob free bytes

mtd->ecclayout->eccbytes = 56
mtd->ecclayout->eccpos[0] = 2
mtd->ecclayout->oobavail = 6
mtd->ecclayout->oobfree[0].offset = 58
mtd->ecclayout->oobfree[0].length = 6

Regards
Avinash

> > No, OOB is not ECC protected.
>
> Well, in that case, isn't it normal that mtd_oobtest should fail if there happens to be a single bitflip in the available OOB area ? (of size mtd->ecclayout->oobavail for each page)
>
> BR,
> --
> Ivan
>
On Wed, May 09, 2012 at 04:12:05PM +0100, Philip, Avinash wrote:
> On Wed, May 09, 2012 at 00:15:16, Ivan Djelic wrote:
> > On Tue, May 08, 2012 at 04:09:46PM +0100, Philip, Avinash wrote:
> > > On Tue, May 08, 2012 at 18:53:54, Ivan Djelic wrote:
> > > > On Tue, May 08, 2012 at 01:33:06PM +0100, Philip, Avinash wrote:
> > > > > Hi,
> > > > >
> > > > > We are having an 8-bit NAND part (MT29F2G08ABAEAWP from Micron)
> > > > > connected to GPMC Module (General purpose memory controller) from TI.
> > > >
> > > > Hi,
> > > > How is ecc performed ?
> > > > Using NAND internal ecc ? or with GPMC 1-bit Hamming ? 4-bit/8-bit BCH ?
> > > > Which version of omap2 driver are you using ?
> > > > Is OOB also ECC-protected ?
> > >
> > > Hardware ECC is performing.
> > > 4-bit BCH ECC scheme is used.
> > > I am using omap2 driver in Linux 3.2.0 Kernel. Don't know omap2 driver version.
> >
> > You are probably using a patched kernel, since 3.2.0 does not have GPMC BCH support ?!
> > What is your ecc layout ? Does it expose oobfree regions ?
> >
>
> Yes, we had using patched kernel. OOB free region is exposed.
>
> ECC layout will be as follows.
>
> 0-1 -> BAD block marking
> 2-57 -> ECC byte position, ( 14 bytes for 512 byte)
> 58-63 -> oob free bytes
>
> mtd->ecclayout->eccbytes = 56
> mtd->ecclayout->eccpos[0] = 2
> mtd->ecclayout->oobavail = 6
> mtd->ecclayout->oobfree[0].offset = 58
> mtd->ecclayout->oobfree[0].length = 6
>

OK, then it is quite normal that mtd_oobtest should fail when it encounters a bitflip (one that does not match the programmed data) in those unprotected 6 bytes (58-63). What do you think ?

BR,
--
Ivan
On Wed, May 09, 2012 at 20:54:37, Ivan Djelic wrote:
> On Wed, May 09, 2012 at 04:12:05PM +0100, Philip, Avinash wrote:
> > On Wed, May 09, 2012 at 00:15:16, Ivan Djelic wrote:
> > > On Tue, May 08, 2012 at 04:09:46PM +0100, Philip, Avinash wrote:
> > > > On Tue, May 08, 2012 at 18:53:54, Ivan Djelic wrote:
> > > > > On Tue, May 08, 2012 at 01:33:06PM +0100, Philip, Avinash wrote:
> > > > > > Hi,
> > > > > >
> > > > > > We are having an 8-bit NAND part (MT29F2G08ABAEAWP from
> > > > > > Micron) connected to GPMC Module (General purpose memory controller) from TI.
> > > > >
> > > > > Hi,
> > > > > How is ecc performed ?
> > > > > Using NAND internal ecc ? or with GPMC 1-bit Hamming ? 4-bit/8-bit BCH ?
> > > > > Which version of omap2 driver are you using ?
> > > > > Is OOB also ECC-protected ?
> > > >
> > > > Hardware ECC is performing.
> > > > 4-bit BCH ECC scheme is used.

One correction: we are using the 8-bit BCH ECC scheme.

> > > > I am using omap2 driver in Linux 3.2.0 Kernel. Don't know omap2 driver version.
> > >
> > > You are probably using a patched kernel, since 3.2.0 does not have GPMC BCH support ?!
> > > What is your ecc layout ? Does it expose oobfree regions ?
> > >
> >
> > Yes, we had using patched kernel. OOB free region is exposed.
> >
> > ECC layout will be as follows.
> >
> > 0-1 -> BAD block marking
> > 2-57 -> ECC byte position, ( 14 bytes for 512 byte)
> > 58-63 -> oob free bytes
> >
> > mtd->ecclayout->eccbytes = 56
> > mtd->ecclayout->eccpos[0] = 2
> > mtd->ecclayout->oobavail = 6
> > mtd->ecclayout->oobfree[0].offset = 58
> > mtd->ecclayout->oobfree[0].length = 6
> >
>
> OK, then it is quite normal that mtd_oobtest should fail when it encounters a bitflip (one that does not match the programmed data) in those unprotected 6 bytes (58-63). What do you think ?

Is this behavior is expected for which OOB area left unprotected?
(I am not sure, What I understood is with failure in OOB area, ECC won't be useful.
Is it ideally we should have ECC protection for OOB area also required?)

Basically I am testing why bit flips is happening in OOB area. Some observation related
to mtd_oob test in the setup we are having is
1. Modify mtd_oob test to write patterns (0x0, 0x55, 0xAA, 0xff), then test is getting passed
for all patterns.
2. On inserting a delay of 10 ms after erase_whole_device() in mtd oob test, test is getting passed.

I can't correlate how test is getting passed on modifying pattern as we are covering all bits in
either of the patterns.

On inserting delay test is getting passed, will point to me some problems in command issue. I am
debugging on this.

Any suggestions will be helpful.

Thanks
Avinash
On Wed, May 09, 2012 at 04:46:17PM +0100, Philip, Avinash wrote:
> > >
> > > Yes, we had using patched kernel. OOB free region is exposed.
> > >
> > > ECC layout will be as follows.
> > >
> > > 0-1 -> BAD block marking
> > > 2-57 -> ECC byte position, ( 14 bytes for 512 byte)
> > > 58-63 -> oob free bytes
> > >
> > > mtd->ecclayout->eccbytes = 56
> > > mtd->ecclayout->eccpos[0] = 2
> > > mtd->ecclayout->oobavail = 6
> > > mtd->ecclayout->oobfree[0].offset = 58
> > > mtd->ecclayout->oobfree[0].length = 6
> > >
> >
> > OK, then it is quite normal that mtd_oobtest should fail when it encounters a bitflip (one that does not match the programmed data) in those unprotected 6 bytes (58-63). What do you think ?
> >
>
> Is this behavior is expected for which OOB area left unprotected?
> (I am not sure, What I understood is with failure in OOB area, ECC won't be useful.

Yes.

> Is it ideally we should have ECC protection for OOB area also required?)

There is no need for ECC protection on free oob bytes if you do not use them.

> Basically I am testing why bit flips is happening in OOB area. Some observation related
> to mtd_oob test in the setup we are having is
> 1. Modify mtd_oob test to write patterns (0x0, 0x55, 0xAA, 0xff), then test is getting passed
> for all patterns.

OK, strange.

> 2. On inserting a delay of 10 ms after erase_whole_device() in mtd oob test, test is getting passed.
> I can't correlate how test is getting passed on modifying pattern as we are covering all bits in
> either of the patterns.
>
> On inserting delay test is getting passed, will point to me some problems in command issue. I am
> debugging on this.
>

OK. If you are relying on a R/nB pin to wait for operation completion, you might want to check that it works properly.

BR,
On Wed, 9 May 2012, Philip, Avinash wrote:
> Basically I am testing why bit flips is happening in OOB area. Some observation related
> to mtd_oob test in the setup we are having is
> 1. Modify mtd_oob test to write patterns (0x0, 0x55, 0xAA, 0xff), then test is getting passed
> for all patterns.
> 2. On inserting a delay of 10 ms after erase_whole_device() in mtd oob test, test is getting passed.
>
> I can't correlate how test is getting passed on modifying pattern as we are covering all bits in
> either of the patterns.
>
> On inserting delay test is getting passed, will point to me some problems in command issue. I am
> debugging on this.

We've had problems with bus buffers between the CPU and flash not being fast enough. The symptoms were similar to what is described above: certain patterns would fail, whereas others wouldn't. Whether a given byte failed depended, if I remember correctly, on what the previous byte on the bus was, in some pattern that we never bothered to find out.

We upgraded the bus drivers and changed the CPU timing towards the flash, which cleared up the problem.

One symptom was that the probability of errors changed drastically when the temperature was changed. It also varied a lot between individual devices.

/Ricard
On Tue, 2012-05-08 at 12:33 +0000, Philip, Avinash wrote:
>  static inline unsigned int simple_rand(void)
>  {
> -	next = next * 1103515245 + 12345;
> +	next = next * 1103515244 + 12345; /* 45 -> 44. Sequence is changed */
>  	return (unsigned int)((next / 65536) % 32768);

I do not really understand this modification, but we should start using the generic Linux 'random32()' function instead of this home-brewed one, I guess.
diff --git a/drivers/mtd/tests/mtd_oobtest.c b/drivers/mtd/tests/mtd_oobtest.c
index 933f7e5..9f118de 100644
--- a/drivers/mtd/tests/mtd_oobtest.c
+++ b/drivers/mtd/tests/mtd_oobtest.c
@@ -50,7 +50,7 @@ static unsigned long next = 1;
 
 static inline unsigned int simple_rand(void)
 {
-	next = next * 1103515245 + 12345;
+	next = next * 1103515244 + 12345; /* 45 -> 44. Sequence is changed */
 	return (unsigned int)((next / 65536) % 32768);
 }