Message ID | 0bc8cec13bcf5b9cbea9cd3345815e4a@agner.ch |
---|---|
State | New |
>>>> On 17 Sep 2014, stefan@agner.ch wrote:
>>> Yes, we are using Macronix SLC NAND.

>>>> On 17 Sep 2014, stefan@agner.ch wrote:
>>> This is a new device, but it's one out of several dozen. The device
>>> had two factory-marked bad pages. These four pages would then make 6 bad
>>> pages. I would say that your guess is probably the case at hand
>>> (should be considered bad, but were marked by factory).

On 10 Dec 2014, stefan@agner.ch wrote:

> What I currently did is just accept strength / 2 bits. This is not a
> clean solution since it will also count the ECC bits, but it works for
> now:
>
> --- a/drivers/mtd/nand/fsl_nfc.c
> +++ b/drivers/mtd/nand/fsl_nfc.c
> @@ -524,7 +524,7 @@ static int nfc_correct_data(struct mtd_info *mtd, u_char *dat,
>  	flip = count_written_bits(dat, nfc->chip.ecc.size, ecc_count);
>
>  	/* ECC failed. */
> -	if (flip > ecc_count)
> +	if (flip > ecc_count && flip > (nfc->chip.ecc.strength / 2))
>  		return -1;
>
>  	/* Erased page. */
>
> I think we are facing multiple issues here. One might be a general
> software/hardware issue (not bit-flip related). I had this issue
> again on a different module with 3.18-rc5 (without the "fix"
> above). The kernel output looks like this:
>
> [snip]
>
> Interestingly, this error happens on every second PEB (every 128
> pages, while the erase block size is 64 pages), and it is always the
> second page. On that device, this is completely reproducible, e.g. I
> can erase everything and flash it again, and the same happens.
>
> I dumped the block in question:
>
> Page 00240800 dump:
> ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> ....
> ff ff ff ff ff ff ff ff f7 ff ff ff ff ff ff ff
> ....
> ff ff ff ff ff ff ff ff ff ff fb ff ff ff ff ff
> ....
> ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff f7
> ....
>
> I also printed flip and ecc_count; the values for all those pages
> are: flip 3, ecc_count 2.
>
> Now the interesting part: when I erase the block and dump that page
> again, it is completely empty! No flips, no ecc_count anymore! UBI
> attach writes something into the first page, hence it looks like this
> write into the first page influences the values of the second
> page... I verified this behavior using U-Boot and the Linux kernel.
>
> I dug a bit deeper and wrote just zeros into the first page. In
> the second page some bits are flipped. However, writing into the
> second page does not influence the third page, but a bit in the first
> page is flipped. And the third page influences the fourth page. It
> looks like the pages behave in pairs... Any idea what kind of issue
> we are facing here?

Hmm. It sounds like MLC flash, but you say you have SLC. Could it be
that some bus signalling is marginal? Could you reduce the clocks a
bit on this device and see if the behaviour changes? I am pretty sure
that stuck-at-zero errors will stay that way.

I would love to get back to this controller code to fix some issues
you noted and bring the changes into the U-Boot review.
Unfortunately, I keep getting stuck with legacy hw issues.

fwiw,
Bill.
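Stefan's dump shows a page that is 0xff except for three bytes (0xf7, 0xfb, 0xf7), each of which has exactly one 0 bit, which is consistent with the reported flip count of 3. Because an erased NAND page reads back as all 0xff, counting 0 bits is a simple way to measure how far a supposedly erased chunk has drifted. The sketch below is a hypothetical `count_zero_bits()` helper, not the driver's `count_written_bits()`, whose exact semantics are not shown in the thread.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: count the bits that read as 0 in a buffer.
 * An erased NAND page reads back as all 0xff, so every 0 bit in a
 * page that should be erased is a flipped bit.
 */
static unsigned int count_zero_bits(const uint8_t *buf, size_t len)
{
	unsigned int zeros = 0;

	for (size_t i = 0; i < len; i++) {
		uint8_t inv = (uint8_t)~buf[i];	/* 1 wherever the flash bit is 0 */

		while (inv) {
			inv &= (uint8_t)(inv - 1);	/* clear lowest set bit */
			zeros++;
		}
	}
	return zeros;
}
```

Filling a 2048-byte buffer with 0xff and placing 0xf7, 0xfb and 0xf7 at arbitrary (illustrative) offsets yields a count of 3, matching the flip value in the report.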
On 28 Feb 2015, stefan@agner.ch wrote:

> The flash chip mentioned above requires 8-bit error correction per 512
> byte block, hence I increased the ECC to the maximum available level
> (60-byte ECC, see page below). One thing which is not very nice: in
> order to fit the 60-byte ECC into the 64-byte OOB, I had to shorten
> the BBT pattern and set it at the very beginning of the page. This
> works fine; however, this basically also marks the page as factory
> bad, and I'm not sure if this is OK. Otherwise, we could also use a
> BBT pattern of length 1 (used by cafe_nand.c too).

I guess that is a DT option? I wouldn't be an expert on this, so
submitting it to linux-mtd is good.

I am also not sure if the HW ECC will work with 'sub-pages'. I think a
colleague of yours at Toradex submitted a patch to U-Boot. I am pretty
sure that it could work with software ECC, but maybe disabling it is
easiest.

> What do you think? I would like to respin the NFC patch, with my
> U-Boot changes and this change included...

Please go ahead. Markus M <marb@ixxat.de> is also using the fsl_nfc
driver on a Freescale MPC5125 board, so it is probably good to copy your
patches to him. At least he can test on a BE platform.

People also complained about JFFS and this version of the driver. I
didn't investigate that.

Thanks,
Bill Pringlemeir.
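The 60-byte figure can be sanity-checked with the usual binary BCH sizing rule: a code over GF(2^m) that corrects t bits needs at most m·t parity bits, where m is the smallest value with 2^m - 1 >= data bits + m·t. The sketch assumes a 2048-byte page (the common size for a 64-byte OOB) protected by one code, so 8 bits per 512 bytes becomes t = 32 per page. The thread does not say which code the NFC actually implements, and `bch_parity_bytes()` is a hypothetical helper.

```c
/*
 * Hypothetical helper: upper bound on BCH parity bytes for a chunk of
 * data_bytes that must correct t bit errors. Finds the smallest field
 * degree m with 2^m - 1 >= data_bits + m * t, then rounds m * t bits
 * up to whole bytes.
 */
static unsigned int bch_parity_bytes(unsigned int data_bytes, unsigned int t)
{
	unsigned int data_bits = data_bytes * 8;
	unsigned int m = 1;

	while (((1u << m) - 1) < data_bits + m * t)
		m++;

	return (m * t + 7) / 8;
}
```

Under these assumptions, `bch_parity_bytes(2048, 32)` gives m = 15 and exactly 60 bytes, leaving 64 - 60 = 4 OOB bytes, which is the small space left for the shortened BBT pattern. For comparison, a separate 8-bit code per 512-byte block would need 13 bytes each.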
On 03/02/2015 08:05 AM, Bill Pringlemeir wrote:
> On 28 Feb 2015, stefan@agner.ch wrote:
[snip]
> People also complained about JFFS and this version of the driver. I
> didn't investigate that.

I also noticed a problem with JFFS2 and the driver, where fs changes
were lost after reboot. Didn't investigate, switched to UBIFS.
On 2015-03-02 22:39, Aaron Brice wrote:
> On 03/02/2015 08:05 AM, Bill Pringlemeir wrote:
[snip]
>> People also complained about JFFS and this version of the driver. I
>> didn't investigate that.
>
> I also noticed a problem with JFFS2 and the driver, where fs changes
> were lost after reboot. Didn't investigate, switched to UBIFS.

Did you happen to use HW ECC? The controller seems to use byte 19
onwards to store the ECC bytes, which is also where JFFS2 stores its
metadata when using a NAND chip with 64-byte OOB (see
http://www.linux-mtd.infradead.org/doc/nand.html).

However, I think UBI/UBIFS is a good choice anyway.

--
Stefan
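Stefan's explanation comes down to two byte ranges in the 64-byte OOB colliding: the controller's ECC area starting at byte 19, and the area JFFS2 uses for its OOB metadata. The sketch below assumes the ECC area runs from byte 19 to the end of the OOB (45 bytes); the JFFS2 offsets are not given in the thread, so the other ranges are hypothetical. `struct oob_region` is an illustrative type, loosely modelled on the offset/length pairs the MTD layer uses to describe free OOB bytes.

```c
#include <stdbool.h>

/* Illustrative type: a byte range [offset, offset + length) in the OOB. */
struct oob_region {
	unsigned int offset;
	unsigned int length;
};

/* True if the two half-open byte ranges share at least one byte. */
static bool oob_regions_overlap(struct oob_region a, struct oob_region b)
{
	return a.offset < b.offset + b.length &&
	       b.offset < a.offset + a.length;
}
```

With the ECC area taken as `{ 19, 45 }`, any filesystem region that reaches byte 19 or beyond overlaps it, so each HW ECC write would clobber that metadata (or vice versa), while a region confined to bytes 2..18 (`{ 2, 17 }`) would not.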
```diff
--- a/drivers/mtd/nand/fsl_nfc.c
+++ b/drivers/mtd/nand/fsl_nfc.c
@@ -524,7 +524,7 @@ static int nfc_correct_data(struct mtd_info *mtd, u_char *dat,
 	flip = count_written_bits(dat, nfc->chip.ecc.size, ecc_count);
 
 	/* ECC failed. */
-	if (flip > ecc_count)
+	if (flip > ecc_count && flip > (nfc->chip.ecc.strength / 2))
 		return -1;
 
 	/* Erased page. */
```
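The only behavioural change in the patch is the "ECC failed" condition. The hypothetical functions below isolate that condition so the two versions can be compared; `flip`, `ecc_count` and `strength` are named after the values in the hunk, but their exact meaning inside the driver is assumed, and the strength values used are illustrative because the thread does not say which strength was configured.

```c
#include <stdbool.h>

/* Original condition: any flip count above ecc_count is an ECC failure. */
static bool ecc_failed_original(unsigned int flip, unsigned int ecc_count)
{
	return flip > ecc_count;
}

/*
 * Patched condition: report a failure only if flip exceeds both
 * ecc_count and half the configured ECC strength (integer division).
 */
static bool ecc_failed_patched(unsigned int flip, unsigned int ecc_count,
			       unsigned int strength)
{
	return flip > ecc_count && flip > strength / 2;
}
```

With the reported values (flip 3, ecc_count 2), the original check fails, while the patched check passes for any strength of 6 or more and still fails for a small strength such as 4. As Stefan notes, the flip count may also include ECC bits, so the strength / 2 threshold is a workaround rather than an exact bound.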