Patchwork [v1,1/5] mtd: nand: omap: optimized chip->ecc.correct() for H/W ECC schemes

login
register
mail settings
Submitter pekon gupta
Date July 15, 2013, 2:55 p.m.
Message ID <1373900152-26885-2-git-send-email-pekon@ti.com>
Download mbox | patch
Permalink /patch/259070/
State New
Headers show

Comments

pekon gupta - July 15, 2013, 2:55 p.m.
chip->ecc.correct() is used for detecting and correcting bit-flips during read
operations. In omap2-nand driver this is done usingt following functions:

- omap_correct_data(): for H/W based HAM1_ECC schemes
	(Un-Touched in current patch)

- omap_elm_correct_data(): for H/W based BCHx_ECC scheme
	Current implementation of this function is not scalable for newer ECC
	schemes because:
	- It depends on a specific byte-position in OOB area (reserved as 0x00)
	  to differentiates between programmed-pages and erased-pages.
	  This reserved byte-position cannot be accomodated in all ECC schemes.
	- Current code is not scalable for future ECC schemes due to tweaks for
	  BCH4_ECC and BCH8_ECC at multiple places.
	- It checks for bit-flips in Erased-pages using check_erased_page().
	  This is over-work, as sanity of Erased-page can be verified by just
	  comparing them to a pre-defined ECC-syndrome for all_0xFF data.

	This patch optimizes omap_elm_correct_data() in following ways:
	(1) Removes dependency on specific reserved-byte (0x00) in OOB area,
	    instead Erased-page is identified by matching calc_ecc with a
	    pre-defined ECC syndrome of all(0xFF) data
	(2) merges common code for BCH4_ECC and BCH8_ECC for scalability.
	(3) handles incorrect elm_error_location beyond data+oob buffer.
	(4) removes check_erased_page(): Bit-flips in erased-page are handled
	    in same way as for programmed-page

Signed-off-by: Pekon Gupta <pekon@ti.com>
---
 drivers/mtd/nand/omap2.c | 249 ++++++++++++++---------------------------------
 1 file changed, 75 insertions(+), 174 deletions(-)
avinash philip - July 16, 2013, 6:46 a.m.
Hi Pekon,

On Mon, Jul 15, 2013 at 8:25 PM, Pekon Gupta <pekon@ti.com> wrote:
> chip->ecc.correct() is used for detecting and correcting bit-flips during read
> operations. In omap2-nand driver this is done usingt following functions:
>
> - omap_correct_data(): for H/W based HAM1_ECC schemes
>         (Un-Touched in current patch)
>
> - omap_elm_correct_data(): for H/W based BCHx_ECC scheme
>         Current implementation of this function is not scalable for newer ECC
>         schemes because:
>         - It depends on a specific byte-position in OOB area (reserved as 0x00)
>           to differentiates between programmed-pages and erased-pages.
>           This reserved byte-position cannot be accomodated in all ECC schemes.
>         - Current code is not scalable for future ECC schemes due to tweaks for
>           BCH4_ECC and BCH8_ECC at multiple places.
>         - It checks for bit-flips in Erased-pages using check_erased_page().
>           This is over-work, as sanity of Erased-page can be verified by just
>           comparing them to a pre-defined ECC-syndrome for all_0xFF data.

This is correct if the erased page read back didn't have any bit flips.
If erased page result in bit flips, then this method will end up in
un-correctable
error. Can you please confirm you taken care erased page bit flips in this
modification?

I think below discussion would help.

https://lkml.org/lkml/2012/11/22/590

Thanks
Avinash
pekon gupta - July 16, 2013, 7:47 a.m.
> 
> Hi Pekon,
> 
> On Mon, Jul 15, 2013 at 8:25 PM, Pekon Gupta <pekon@ti.com> wrote:
> > chip->ecc.correct() is used for detecting and correcting bit-flips during read
> > operations. In omap2-nand driver this is done usingt following functions:
> >
> > - omap_correct_data(): for H/W based HAM1_ECC schemes
> >         (Un-Touched in current patch)
> >
> > - omap_elm_correct_data(): for H/W based BCHx_ECC scheme
> >         Current implementation of this function is not scalable for newer ECC
> >         schemes because:
> >         - It depends on a specific byte-position in OOB area (reserved as 0x00)
> >           to differentiates between programmed-pages and erased-pages.
> >           This reserved byte-position cannot be accomodated in all ECC
> schemes.
> >         - Current code is not scalable for future ECC schemes due to tweaks for
> >           BCH4_ECC and BCH8_ECC at multiple places.
> >         - It checks for bit-flips in Erased-pages using check_erased_page().
> >           This is over-work, as sanity of Erased-page can be verified by just
> >           comparing them to a pre-defined ECC-syndrome for all_0xFF data.
> 
> This is correct if the erased page read back didn't have any bit flips.
> If erased page result in bit flips, then this method will end up in
> un-correctable
> error. Can you please confirm you taken care erased page bit flips in this
> modification?
> 
> I think below discussion would help.
> 
> https://lkml.org/lkml/2012/11/22/590
> 

Yes, that is correct.. Please refer to latest discussion on this topic.
http://permalink.gmane.org/gmane.linux.drivers.mtd/46821

So, if a erased-page has any bit-flips, nand_read() would return it as
 EBADMSG (uncorrectable errors). I understand this is an over-cautious
 approach, but there are reasons for this:
(1) I din't wanted to burden the Read data-path with extra checks for 
finding and correcting bit-flips on erased-pages.

(2) Most File-Systems, if find un-correctable errors in erased-page 
would first torture_peb() by iteratively erasing and  checking for all(0xFF).
 So if the bit-flips were temporary they would go-away. And block would
 not be marked bad.

(3) Using erased-page with 'correctable bit-flips' for storing data makes
data on flash more vulnerable to early failures. Suppose using BCH8 
ECC scheme which has capability of correcting 8-bits per ecc.size. 
if my File-System uses an erased-page which already had 3-bit-flips 
before even it was written. So this leaves me with only 8-3 = 5 bit-flips
possibility before my data is rendered un-correctable and corrupt.
So, why not to use a erased-page which is not having any bit-flips at all ?
(And frequency of having bit-flips in erased-page would increase as
device ages)

I think from a user's point-of-view it would be better to get..
(a) more tolerance of bit-flips  + better read performance
v/s
(b) saving some blocks from getting re-erased.

Comments ?

with regards, pekon

> Thanks
> Avinash

Patch

diff --git a/drivers/mtd/nand/omap2.c b/drivers/mtd/nand/omap2.c
index e9ef743..5321bdcc 100644
--- a/drivers/mtd/nand/omap2.c
+++ b/drivers/mtd/nand/omap2.c
@@ -176,6 +176,7 @@  struct omap_nand_info {
 	int				gpmc_cs;
 	unsigned long			phys_base;
 	unsigned long			mem_size;
+	enum omap_ecc			ecc_opt;
 	struct completion		comp;
 	struct dma_chan			*dma;
 	int				gpmc_irq_fifo;
@@ -1307,219 +1308,118 @@  static int omap3_calculate_ecc_bch(struct mtd_info *mtd, const u_char *dat,
 }
 
 /**
- * erased_sector_bitflips - count bit flips
- * @data:	data sector buffer
- * @oob:	oob buffer
- * @info:	omap_nand_info
- *
- * Check the bit flips in erased page falls below correctable level.
- * If falls below, report the page as erased with correctable bit
- * flip, else report as uncorrectable page.
- */
-static int erased_sector_bitflips(u_char *data, u_char *oob,
-		struct omap_nand_info *info)
-{
-	int flip_bits = 0, i;
-
-	for (i = 0; i < info->nand.ecc.size; i++) {
-		flip_bits += hweight8(~data[i]);
-		if (flip_bits > info->nand.ecc.strength)
-			return 0;
-	}
-
-	for (i = 0; i < info->nand.ecc.bytes - 1; i++) {
-		flip_bits += hweight8(~oob[i]);
-		if (flip_bits > info->nand.ecc.strength)
-			return 0;
-	}
-
-	/*
-	 * Bit flips falls in correctable level.
-	 * Fill data area with 0xFF
-	 */
-	if (flip_bits) {
-		memset(data, 0xFF, info->nand.ecc.size);
-		memset(oob, 0xFF, info->nand.ecc.bytes);
-	}
-
-	return flip_bits;
-}
-
-/**
  * omap_elm_correct_data - corrects page data area in case error reported
  * @mtd:	MTD device structure
  * @data:	page data
  * @read_ecc:	ecc read from nand flash
- * @calc_ecc:	ecc read from HW ECC registers
- *
- * Calculated ecc vector reported as zero in case of non-error pages.
- * In case of error/erased pages non-zero error vector is reported.
- * In case of non-zero ecc vector, check read_ecc at fixed offset
- * (x = 13/7 in case of BCH8/4 == 0) to find page programmed or not.
- * To handle bit flips in this data, count the number of 0's in
- * read_ecc[x] and check if it greater than 4. If it is less, it is
- * programmed page, else erased page.
- *
- * 1. If page is erased, check with standard ecc vector (ecc vector
- * for erased page to find any bit flip). If check fails, bit flip
- * is present in erased page. Count the bit flips in erased page and
- * if it falls under correctable level, report page with 0xFF and
- * update the correctable bit information.
- * 2. If error is reported on programmed page, update elm error
- * vector and correct the page with ELM error correction routine.
- *
+ * @calc_ecc:	ecc calculated after reading Data and OOB regions from flash
+ * As calc_ecc is calculated over both main & oob, so calc_ecc would be
+ * non-zero only in following cases:
+ * - bit-flips in data or oob region
+ * - erase page, where no ECC is written in OOB area
+ *   However, erased_pages can be differentiated from corrupted pages
+ *   by comparing the calculated ECC with pre-defined syndrome ECC_of_ALL(0xFF)
+ *   Bit-flips in erased-pages would also be caught by comparing, calc_ecc
+ *   with ECC_of_ALL(0xFF)
  */
 static int omap_elm_correct_data(struct mtd_info *mtd, u_char *data,
 				u_char *read_ecc, u_char *calc_ecc)
 {
 	struct omap_nand_info *info = container_of(mtd, struct omap_nand_info,
 			mtd);
-	int eccsteps = info->nand.ecc.steps;
-	int i , j, stat = 0;
-	int eccsize, eccflag, ecc_vector_size;
+	enum omap_ecc ecc_opt = info->ecc_opt;
+	struct nand_chip *chip = mtd->priv;
+	int eccsteps = chip->ecc.steps;
+	int eccsize  = chip->ecc.size;
+	int eccbytes = chip->ecc.bytes;
+	int i , j, stat = 0, ret = 0, flag_read_ecc;
 	struct elm_errorvec err_vec[ERROR_VECTOR_MAX];
-	u_char *ecc_vec = calc_ecc;
-	u_char *spare_ecc = read_ecc;
-	u_char *erased_ecc_vec;
-	enum bch_ecc type;
+	u_char *ecc;
 	bool is_error_reported = false;
+	u32 bit_pos, byte_pos, error_max, pos;
 
 	/* Initialize elm error vector to zero */
 	memset(err_vec, 0, sizeof(err_vec));
 
-	if (info->nand.ecc.strength == BCH8_MAX_ERROR) {
-		type = BCH8_ECC;
-		erased_ecc_vec = bch8_vector;
-	} else {
-		type = BCH4_ECC;
-		erased_ecc_vec = bch4_vector;
-	}
-
-	ecc_vector_size = info->nand.ecc.bytes;
-
-	/*
-	 * Remove extra byte padding for BCH8 RBL
-	 * compatibility and erased page handling
-	 */
-	eccsize = ecc_vector_size - 1;
-
 	for (i = 0; i < eccsteps ; i++) {
-		eccflag = 0;	/* initialize eccflag */
-
-		/*
-		 * Check any error reported,
-		 * In case of error, non zero ecc reported.
-		 */
-
-		for (j = 0; (j < eccsize); j++) {
-			if (calc_ecc[j] != 0) {
-				eccflag = 1; /* non zero ecc, error present */
+		flag_read_ecc = 0;
+		ecc = calc_ecc + (i * eccbytes);
+		/* check calc_ecc */
+		for (j = 0; j < eccbytes; j++) {
+			if (*(ecc + j) != 0x00) {
+				flag_read_ecc = 1;
 				break;
 			}
 		}
-
-		if (eccflag == 1) {
-			/*
-			 * Set threshold to minimum of 4, half of ecc.strength/2
-			 * to allow max bit flip in byte to 4
-			 */
-			unsigned int threshold = min_t(unsigned int, 4,
-					info->nand.ecc.strength / 2);
-
-			/*
-			 * Check data area is programmed by counting
-			 * number of 0's at fixed offset in spare area.
-			 * Checking count of 0's against threshold.
-			 * In case programmed page expects at least threshold
-			 * zeros in byte.
-			 * If zeros are less than threshold for programmed page/
-			 * zeros are more than threshold erased page, either
-			 * case page reported as uncorrectable.
-			 */
-			if (hweight8(~read_ecc[eccsize]) >= threshold) {
-				/*
-				 * Update elm error vector as
-				 * data area is programmed
-				 */
-				err_vec[i].error_reported = true;
-				is_error_reported = true;
-			} else {
-				/* Error reported in erased page */
-				int bitflip_count;
-				u_char *buf = &data[info->nand.ecc.size * i];
-
-				if (memcmp(calc_ecc, erased_ecc_vec, eccsize)) {
-					bitflip_count = erased_sector_bitflips(
-							buf, read_ecc, info);
-
-					if (bitflip_count)
-						stat += bitflip_count;
-					else
-						return -EINVAL;
-				}
+		/* check if its a erased-page */
+		if (flag_read_ecc) {
+			switch (ecc_opt) {
+			case OMAP_ECC_BCH8_CODE_HW:
+				if (memcmp(ecc, bch8_vector, eccbytes))
+					err_vec[i].error_reported = true;
+				break;
+			case OMAP_ECC_BCH4_CODE_HW:
+				if (memcmp(ecc, bch4_vector, eccbytes))
+					err_vec[i].error_reported = true;
+				break;
+			default:
+				pr_err("%s: invalid configuration",
+								 DRIVER_NAME);
+				return -EINVAL;
 			}
 		}
-
-		/* Update the ecc vector */
-		calc_ecc += ecc_vector_size;
-		read_ecc += ecc_vector_size;
+		/* page definitely has bit-flips */
+		if (err_vec[i].error_reported)
+			is_error_reported = true;
 	}
 
-	/* Check if any error reported */
 	if (!is_error_reported)
 		return 0;
+	/* detect bit-flips using ELM module */
+	elm_decode_bch_error_page(info->elm_dev, calc_ecc, err_vec);
 
-	/* Decode BCH error using ELM module */
-	elm_decode_bch_error_page(info->elm_dev, ecc_vec, err_vec);
-
+	/* correct bit-flip */
 	for (i = 0; i < eccsteps; i++) {
-		if (err_vec[i].error_reported) {
+		if (err_vec[i].error_uncorrectable) {
+			ret = -EBADMSG;
+		} else if (err_vec[i].error_reported) {
 			for (j = 0; j < err_vec[i].error_count; j++) {
-				u32 bit_pos, byte_pos, error_max, pos;
-
-				if (type == BCH8_ECC)
-					error_max = BCH8_ECC_MAX;
-				else
-					error_max = BCH4_ECC_MAX;
-
-				if (info->nand.ecc.strength == BCH8_MAX_ERROR)
-					pos = err_vec[i].error_loc[j];
-				else
+				switch (ecc_opt) {
+				case OMAP_ECC_BCH4_CODE_HW:
+					error_max = SECTOR_BYTES +
+							(eccbytes - 1);
 					/* Add 4 to take care 4 bit padding */
 					pos = err_vec[i].error_loc[j] +
-						BCH4_BIT_PAD;
-
-				/* Calculate bit position of error */
+							BCH4_BIT_PAD;
+					break;
+				case OMAP_ECC_BCH8_CODE_HW:
+					error_max = SECTOR_BYTES +
+							(eccbytes - 1);
+					pos = err_vec[i].error_loc[j];
+					break;
+				default:
+					return -EINVAL;
+				}
+				/* Calculate bit & byte bit-flip position */
 				bit_pos = pos % 8;
-
-				/* Calculate byte position of error */
-				byte_pos = (error_max - pos - 1) / 8;
-
-				if (pos < error_max) {
-					if (byte_pos < 512)
-						data[byte_pos] ^= 1 << bit_pos;
-					else
-						spare_ecc[byte_pos - 512] ^=
+				byte_pos = error_max - (pos / 8) - 1;
+				if (byte_pos < SECTOR_BYTES)
+					data[byte_pos] ^= 1 << bit_pos;
+				else if (byte_pos < error_max)
+					read_ecc[byte_pos - SECTOR_BYTES] ^=
 							1 << bit_pos;
-				}
-				/* else, not interested to correct ecc */
+				else
+					ret = -EBADMSG;
 			}
 		}
-
 		/* Update number of correctable errors */
 		stat += err_vec[i].error_count;
-
 		/* Update page data with sector size */
-		data += info->nand.ecc.size;
-		spare_ecc += ecc_vector_size;
+		data	 += eccsize;
+		read_ecc += eccbytes;
 	}
 
-	for (i = 0; i < eccsteps; i++)
-		/* Return error if uncorrectable error present */
-		if (err_vec[i].error_uncorrectable)
-			return -EINVAL;
-
-	return stat;
+	return (ret < 0) ? ret : stat;
 }
 
 /**
@@ -1677,13 +1577,14 @@  static int omap_nand_probe(struct platform_device *pdev)
 	info->pdev		= pdev;
 	info->gpmc_cs		= pdata->cs;
 	info->reg		= pdata->reg;
+	info->ecc_opt		= pdata->ecc_opt;
 
 	info->nand.options	= NAND_BUSWIDTH_AUTO;
 	info->nand.options	|= NAND_SKIP_BBTSCAN;
+	info->nand.ecc.priv	= NULL;
 #ifdef CONFIG_MTD_NAND_OMAP_BCH
 	info->of_node		= pdata->of_node;
 #endif
-	info->nand.ecc.priv			= NULL;
 
 	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
 	if (res == NULL) {