diff mbox

MTD-NAND: Changes to read_page APIs to support NAND_ECC_HW_SYNDROME mode on TI DaVinci DM355

Message ID 1238601784-9563-1-git-send-email-nsnehaprabha@ti.com
State New, archived
Headers show

Commit Message

nsnehaprabha@ti.com April 1, 2009, 4:03 p.m. UTC
From: Sneha Narnakaje <nsnehaprabha@ti.com>

The NAND controller on TI DaVinci DM355 supports NAND devices with large page size (2K and 4K), but the HW ECC is handled for every 512byte read/write chunks. The current HW_SYNDROME read_page/write_page APIs in the NAND core (nand_base) use the "infix OOB" scheme. The core APIs overwrite NAND manufacturer's bad block meta data, thus complicating the jobs of non-Linux NAND programmers (end equipment manufacturering). These APIs also imply ECC protection for the prepad bytes, causing nand raw_write operations to fail.
So the TI DaVinci NAND driver overrides these APIs for DaVinci DM355. The read_page APIs required "page" to be passed for reading the whole OOB area first and then data in chunks of 512bytes. The ECC status is checked after reading every 512byte chunk and passing the ECC syndrome.
Since there is a change to read_page/read_page_raw API definitions, other NAND drivers overriding these APIs have also been changed (to address compilation problems). 

This patch also changes the way oobavail is calculated using the oobfree[].length and the way oobfree[] is used. The oobfree[] is defined as an array of MTD_MAX_OOBFREE_ENTRIES size (8). If we initialize all 8 entries, then the current for loop which has a chance to go beyond the array limit thus causing incorrect oobavail value. The NAND core and other NAND drivers do not require upto 8 oobfree entries, but the TI DaVinci NAND driver requires upto 8 oobfree entries to support upto 4K page size. The check for MTD_MAX_OOBFREE_ENTREIES is already there in the drivers/mtd/onenand/onenand_base.c, which is based on drivers/mtd/nand/nand_base.c. There is also a check for ecc.steps to make the oobfree[] backward compatible with 2K page size. The max oob size and page size have been adjusted for the 4K+128 page size. Supporting up to 4K page with NAND_ECC_HW_SYNDROME does not require changes to include/mtd/mtd-abi.h (breaks user space IOCTL interface).

Signed-off-by: Sandeep Paulraj <s-paulraj@ti.com>
Signed-off-by: Sneha Narnakaje <nsnehaprabha@ti.com>
---
 drivers/mtd/nand/atmel_nand.c    |    2 +-
 drivers/mtd/nand/cafe_nand.c     |    2 +-
 drivers/mtd/nand/fsl_elbc_nand.c |    3 +-
 drivers/mtd/nand/nand_base.c     |   45 ++++++++++++++++++++++----------------
 drivers/mtd/nand/sh_flctl.c      |    2 +-
 include/linux/mtd/nand.h         |    8 +++---
 6 files changed, 35 insertions(+), 27 deletions(-)

Comments

David Brownell April 5, 2009, 10:56 p.m. UTC | #1
On Wednesday 01 April 2009, nsnehaprabha@ti.com wrote:
> From: Sneha Narnakaje <nsnehaprabha@ti.com>
> 
> The NAND controller on TI DaVinci DM355 supports NAND devices
> with large page size (2K and 4K), but the HW ECC is handled
> for every 512byte read/write chunks. The current HW_SYNDROME
> read_page/write_page APIs in the NAND core (nand_base) use the
> "infix OOB" scheme.

Makes me wonder if there should be a new NAND_ECC_HW_* mode,
which doesn't use "infix OOB" ... since that's the problem.

Given bugs recently uncovered in that area, it seems that
these DaVinci chips may be the first to use ECC_HW_SYNDROME
in multi-chunk mode (and thus "infix OOB" in its full glory,
rather than single-chunk subset which matches traditional
layout with OOB at the end).


> The core APIs overwrite NAND manufacturer's bad block meta
> data, thus complicating the jobs of non-Linux NAND programmers
> (end equipment manufacturering). These APIs also imply ECC
> protection for the prepad bytes, causing nand raw_write
> operations to fail.

All of those seem like reasons to avoid NAND_ECC_HW_SYNDROME
unless the ECC chunksize matches the page size ... that is,
to only use it when "infix OOB" won't apply.

I particularly dislike clobbering the bad block data, since
once that's done it can't easily be recovered.  Preferable
to leave those markers in place, so that the chip can still
be re-initialized cleanly.


> So the TI DaVinci NAND driver overrides these APIs for
> DaVinci DM355.

The same 4-bit ECC engine is used on most other DaVinci DM3xx
chips; OMAP-L1xx chips, and their technical relatives like
the DA8xx family, of course.


> The read_page APIs required "page" to be passed for reading
> the whole OOB area first and then data in chunks of 512bytes.
> The ECC status is checked after reading every 512byte chunk
> and passing the ECC syndrome.

If this were a new NAND_ECC_HW_* flavor, all that logic
would reside in the NAND core so that other chips could
easily reuse it.  (The interface changes would still be
needed.)

- Dave


> Since there is a change to read_page/read_page_raw API
> definitions, other NAND drivers overriding these APIs have
> also been changed (to address compilation problems).   
> 
> This patch also changes the way oobavail is calculated using
> the oobfree[].length and the way oobfree[] is used. The
> oobfree[] is defined as an array of MTD_MAX_OOBFREE_ENTRIES
> size (8). If we initialize all 8 entries, then the current
> for loop which has a chance to go beyond the array limit thus
> causing incorrect oobavail value. The NAND core and other NAND
> drivers do not require upto 8 oobfree entries, but the TI
> DaVinci NAND driver requires upto 8 oobfree entries to support
> upto 4K page size. The check for MTD_MAX_OOBFREE_ENTREIES is
> already there in the drivers/mtd/onenand/onenand_base.c, which
> is based on drivers/mtd/nand/nand_base.c. There is also a check
> for ecc.steps to make the oobfree[] backward compatible with 2K
> page size. The max oob size and page size have been adjusted for
> the 4K+128 page size. Supporting up to 4K page with
> NAND_ECC_HW_SYNDROME does not require changes to
> include/mtd/mtd-abi.h (breaks user space IOCTL interface).               
> 
> Signed-off-by: Sandeep Paulraj <s-paulraj@ti.com>
> Signed-off-by: Sneha Narnakaje <nsnehaprabha@ti.com>
> ---
>  drivers/mtd/nand/atmel_nand.c    |    2 +-
>  drivers/mtd/nand/cafe_nand.c     |    2 +-
>  drivers/mtd/nand/fsl_elbc_nand.c |    3 +-
>  drivers/mtd/nand/nand_base.c     |   45 ++++++++++++++++++++++----------------
>  drivers/mtd/nand/sh_flctl.c      |    2 +-
>  include/linux/mtd/nand.h         |    8 +++---
>  6 files changed, 35 insertions(+), 27 deletions(-)
> 
> diff --git a/drivers/mtd/nand/atmel_nand.c b/drivers/mtd/nand/atmel_nand.c
> index 47a33ce..b63eddb 100644
> --- a/drivers/mtd/nand/atmel_nand.c
> +++ b/drivers/mtd/nand/atmel_nand.c
> @@ -214,7 +214,7 @@ static int atmel_nand_calculate(struct mtd_info *mtd,
>   * buf:        buffer to store read data
>   */
>  static int atmel_nand_read_page(struct mtd_info *mtd,
> -		struct nand_chip *chip, uint8_t *buf)
> +		struct nand_chip *chip, uint8_t *buf, int page)
>  {
>  	int eccsize = chip->ecc.size;
>  	int eccbytes = chip->ecc.bytes;
> diff --git a/drivers/mtd/nand/cafe_nand.c b/drivers/mtd/nand/cafe_nand.c
> index 22a6b2e..8ea1623 100644
> --- a/drivers/mtd/nand/cafe_nand.c
> +++ b/drivers/mtd/nand/cafe_nand.c
> @@ -381,7 +381,7 @@ static int cafe_nand_read_oob(struct mtd_info *mtd, struct nand_chip *chip,
>   * we need a special oob layout and handling.
>   */
>  static int cafe_nand_read_page(struct mtd_info *mtd, struct nand_chip *chip,
> -			       uint8_t *buf)
> +			       uint8_t *buf, int page)
>  {
>  	struct cafe_priv *cafe = mtd->priv;
>  
> diff --git a/drivers/mtd/nand/fsl_elbc_nand.c b/drivers/mtd/nand/fsl_elbc_nand.c
> index 1f6eb25..ddd37d2 100644
> --- a/drivers/mtd/nand/fsl_elbc_nand.c
> +++ b/drivers/mtd/nand/fsl_elbc_nand.c
> @@ -739,7 +739,8 @@ static int fsl_elbc_chip_init_tail(struct mtd_info *mtd)
>  
>  static int fsl_elbc_read_page(struct mtd_info *mtd,
>                                struct nand_chip *chip,
> -                              uint8_t *buf)
> +			      uint8_t *buf,
> +			      int page)
>  {
>  	fsl_elbc_read_buf(mtd, buf, mtd->writesize);
>  	fsl_elbc_read_buf(mtd, chip->oob_poi, mtd->oobsize);
> diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c
> index 599185c..d7f6650 100644
> --- a/drivers/mtd/nand/nand_base.c
> +++ b/drivers/mtd/nand/nand_base.c
> @@ -750,7 +750,7 @@ static int nand_wait(struct mtd_info *mtd, struct nand_chip *chip)
>   * @buf:	buffer to store read data
>   */
>  static int nand_read_page_raw(struct mtd_info *mtd, struct nand_chip *chip,
> -			      uint8_t *buf)
> +			      uint8_t *buf, int page)
>  {
>  	chip->read_buf(mtd, buf, mtd->writesize);
>  	chip->read_buf(mtd, chip->oob_poi, mtd->oobsize);
> @@ -764,7 +764,7 @@ static int nand_read_page_raw(struct mtd_info *mtd, struct nand_chip *chip,
>   * @buf:	buffer to store read data
>   */
>  static int nand_read_page_swecc(struct mtd_info *mtd, struct nand_chip *chip,
> -				uint8_t *buf)
> +				uint8_t *buf, int page)
>  {
>  	int i, eccsize = chip->ecc.size;
>  	int eccbytes = chip->ecc.bytes;
> @@ -774,7 +774,7 @@ static int nand_read_page_swecc(struct mtd_info *mtd, struct nand_chip *chip,
>  	uint8_t *ecc_code = chip->buffers->ecccode;
>  	uint32_t *eccpos = chip->ecc.layout->eccpos;
>  
> -	chip->ecc.read_page_raw(mtd, chip, buf);
> +	chip->ecc.read_page_raw(mtd, chip, buf, page);
>  
>  	for (i = 0; eccsteps; eccsteps--, i += eccbytes, p += eccsize)
>  		chip->ecc.calculate(mtd, p, &ecc_calc[i]);
> @@ -887,7 +887,7 @@ static int nand_read_subpage(struct mtd_info *mtd, struct nand_chip *chip, uint3
>   * Not for syndrome calculating ecc controllers which need a special oob layout
>   */
>  static int nand_read_page_hwecc(struct mtd_info *mtd, struct nand_chip *chip,
> -				uint8_t *buf)
> +				uint8_t *buf, int page)
>  {
>  	int i, eccsize = chip->ecc.size;
>  	int eccbytes = chip->ecc.bytes;
> @@ -932,7 +932,7 @@ static int nand_read_page_hwecc(struct mtd_info *mtd, struct nand_chip *chip,
>   * we need a special oob layout and handling.
>   */
>  static int nand_read_page_syndrome(struct mtd_info *mtd, struct nand_chip *chip,
> -				   uint8_t *buf)
> +				   uint8_t *buf, int page)
>  {
>  	int i, eccsize = chip->ecc.size;
>  	int eccbytes = chip->ecc.bytes;
> @@ -997,8 +997,10 @@ static uint8_t *nand_transfer_oob(struct nand_chip *chip, uint8_t *oob,
>  		struct nand_oobfree *free = chip->ecc.layout->oobfree;
>  		uint32_t boffs = 0, roffs = ops->ooboffs;
>  		size_t bytes = 0;
> +		unsigned int i;
>  
> -		for(; free->length && len; free++, len -= bytes) {
> +		for (i = 0; i < chip->ecc.steps && i < MTD_MAX_OOBFREE_ENTRIES
> +		     && free->length && len; i++, free++, len -= bytes) {
>  			/* Read request not from offset 0 ? */
>  			if (unlikely(roffs)) {
>  				if (roffs >= free->length) {
> @@ -1074,11 +1076,13 @@ static int nand_do_read_ops(struct mtd_info *mtd, loff_t from,
>  
>  			/* Now read the page into the buffer */
>  			if (unlikely(ops->mode == MTD_OOB_RAW))
> -				ret = chip->ecc.read_page_raw(mtd, chip, bufpoi);
> +				ret = chip->ecc.read_page_raw(mtd, chip,
> +							      bufpoi, page);
>  			else if (!aligned && NAND_SUBPAGE_READ(chip) && !oob)
>  				ret = chip->ecc.read_subpage(mtd, chip, col, bytes, bufpoi);
>  			else
> -				ret = chip->ecc.read_page(mtd, chip, bufpoi);
> +				ret = chip->ecc.read_page(mtd, chip, bufpoi,
> +							  page);
>  			if (ret < 0)
>  				break;
>  
> @@ -1666,8 +1670,10 @@ static uint8_t *nand_fill_oob(struct nand_chip *chip, uint8_t *oob,
>  		struct nand_oobfree *free = chip->ecc.layout->oobfree;
>  		uint32_t boffs = 0, woffs = ops->ooboffs;
>  		size_t bytes = 0;
> +		unsigned int i;
>  
> -		for(; free->length && len; free++, len -= bytes) {
> +		for (i = 0; i < chip->ecc.steps && i < MTD_MAX_OOBFREE_ENTRIES
> +		     && free->length && len; i++, free++, len -= bytes) {
>  			/* Write request not from offset 0 ? */
>  			if (unlikely(woffs)) {
>  				if (woffs >= free->length) {
> @@ -2651,16 +2657,6 @@ int nand_scan_tail(struct mtd_info *mtd)
>  	}
>  
>  	/*
> -	 * The number of bytes available for a client to place data into
> -	 * the out of band area
> -	 */
> -	chip->ecc.layout->oobavail = 0;
> -	for (i = 0; chip->ecc.layout->oobfree[i].length; i++)
> -		chip->ecc.layout->oobavail +=
> -			chip->ecc.layout->oobfree[i].length;
> -	mtd->oobavail = chip->ecc.layout->oobavail;
> -
> -	/*
>  	 * Set the number of read / write steps for one page depending on ECC
>  	 * mode
>  	 */
> @@ -2672,6 +2668,17 @@ int nand_scan_tail(struct mtd_info *mtd)
>  	chip->ecc.total = chip->ecc.steps * chip->ecc.bytes;
>  
>  	/*
> +	 * The number of bytes available for a client to place data into
> +	 * the out of band area
> +	 */
> +	chip->ecc.layout->oobavail = 0;
> +	for (i = 0; i < chip->ecc.steps && i < MTD_MAX_OOBFREE_ENTRIES &&
> +	     chip->ecc.layout->oobfree[i].length; i++)
> +		chip->ecc.layout->oobavail +=
> +			chip->ecc.layout->oobfree[i].length;
> +	mtd->oobavail = chip->ecc.layout->oobavail;
> +
> +	/*
>  	 * Allow subpage writes up to ecc.steps. Not possible for MLC
>  	 * FLASH.
>  	 */
> diff --git a/drivers/mtd/nand/sh_flctl.c b/drivers/mtd/nand/sh_flctl.c
> index 821acb0..77e74a3 100644
> --- a/drivers/mtd/nand/sh_flctl.c
> +++ b/drivers/mtd/nand/sh_flctl.c
> @@ -324,7 +324,7 @@ static void set_cmd_regs(struct mtd_info *mtd, uint32_t cmd, uint32_t flcmcdr_va
>  }
>  
>  static int flctl_read_page_hwecc(struct mtd_info *mtd, struct nand_chip *chip,
> -				uint8_t *buf)
> +				uint8_t *buf, int page)
>  {
>  	int i, eccsize = chip->ecc.size;
>  	int eccbytes = chip->ecc.bytes;
> diff --git a/include/linux/mtd/nand.h b/include/linux/mtd/nand.h
> index 27fb694..63ed0eb 100644
> --- a/include/linux/mtd/nand.h
> +++ b/include/linux/mtd/nand.h
> @@ -43,8 +43,8 @@ extern void nand_wait_ready(struct mtd_info *mtd);
>   * is supported now. If you add a chip with bigger oobsize/page
>   * adjust this accordingly.
>   */
> -#define NAND_MAX_OOBSIZE	64
> -#define NAND_MAX_PAGESIZE	2048
> +#define NAND_MAX_OOBSIZE	128
> +#define NAND_MAX_PAGESIZE	4096
>  
>  /*
>   * Constants for hardware specific CLE/ALE/NCE function
> @@ -278,13 +278,13 @@ struct nand_ecc_ctrl {
>  					   uint8_t *calc_ecc);
>  	int			(*read_page_raw)(struct mtd_info *mtd,
>  						 struct nand_chip *chip,
> -						 uint8_t *buf);
> +						 uint8_t *buf, int page);
>  	void			(*write_page_raw)(struct mtd_info *mtd,
>  						  struct nand_chip *chip,
>  						  const uint8_t *buf);
>  	int			(*read_page)(struct mtd_info *mtd,
>  					     struct nand_chip *chip,
> -					     uint8_t *buf);
> +					     uint8_t *buf, int page);
>  	int			(*read_subpage)(struct mtd_info *mtd,
>  					     struct nand_chip *chip,
>  					     uint32_t offs, uint32_t len,
> -- 
> 1.6.0.4
>
Thomas Gleixner April 6, 2009, 10:59 p.m. UTC | #2
David, Sneha,

I'm answering both mails in one go.

On Sun, 5 Apr 2009, David Brownell wrote:

> On Wednesday 01 April 2009, nsnehaprabha@ti.com wrote:
> > From: Sneha Narnakaje <nsnehaprabha@ti.com>
> > 
> > The NAND controller on TI DaVinci DM355 supports NAND devices
> > with large page size (2K and 4K), but the HW ECC is handled
> > for every 512byte read/write chunks. The current HW_SYNDROME
> > read_page/write_page APIs in the NAND core (nand_base) use the
> > "infix OOB" scheme.

For a damned good reason. If you want to use HW_SYNDROME mode then you
better have the syndrome right after the data.

data - ecc - data - ecc ...

That's how HW_SYNDROME generators work. It's not about the write side,
It's about the read side. You read n bytes of data and m btes of ecc
in _ONE_ go and the hardware will tell you whether there is a bit flip
or not. You can even do full hardware error correction this way. And
it has been done.
 
See also: http://linux-mtd.infradead.org/tech/mtdnand/x111.html#AEN140

> Makes me wonder if there should be a new NAND_ECC_HW_* mode,
> which doesn't use "infix OOB" ... since that's the problem.
> 
> Given bugs recently uncovered in that area, it seems that
> these DaVinci chips may be the first to use ECC_HW_SYNDROME
> in multi-chunk mode (and thus "infix OOB" in its full glory,
> rather than single-chunk subset which matches traditional
> layout with OOB at the end).

Sorry, HW_SYNDROME was used in multi chunk mode from the very
beginning. See above.
 
If your device does not do that or you do not want to utilize it for
whatever reasons then just use the NAND_ECC_HW mode which lets you
place your ECC data where ever you want.

> > The core APIs overwrite NAND manufacturer's bad block meta
> > data, thus complicating the jobs of non-Linux NAND programmers
> > (end equipment manufacturering). These APIs also imply ECC
> > protection for the prepad bytes, causing nand raw_write
> > operations to fail.
> 
> All of those seem like reasons to avoid NAND_ECC_HW_SYNDROME
> unless the ECC chunksize matches the page size ... that is,
> to only use it when "infix OOB" won't apply.

Wrong, see above
 
> I particularly dislike clobbering the bad block data, since
> once that's done it can't easily be recovered.  Preferable
> to leave those markers in place, so that the chip can still
> be re-initialized cleanly.

And how do you utilize the "data-ecc-data-ecc..." scheme _WITHOUT_
clobbering the bad block marker bytes ??? That's why we have the bad
block table mechanism. And this is not a Linux kernel hackers oddity,
just have a look at DiskOnChip devices which do exactly the same. It's
dictated by the hardware and there is nothing you can do about it -
aside of falling back to software ECC and giving up the HW
acceleration.

This patch is utter nonsense. The whole functionality is already
there. Just use the correct mode: NAND_ECC_HW. Place your ECC data
where you want or where your commercial counterparts want them to show
up. Using NAND_ECC_HW_SYNDROME for your case is just plain wrong.

Thanks,

	tglx
Thomas Gleixner April 6, 2009, 11:07 p.m. UTC | #3
Sneha,

On Wed, 1 Apr 2009, nsnehaprabha@ti.com wrote:

Would you please care to CC the still caring but admittedly lazy and
distracted maintainer on such patches ?

> From: Sneha Narnakaje <nsnehaprabha@ti.com>
> 
> The NAND controller on TI DaVinci DM355 supports NAND devices with large page size (2K and 4K), but the HW ECC is handled for every 512byte read/write chunks. The current HW_SYNDROME read_page/write_page APIs in the NAND core (nand_base) use the "infix OOB" scheme. The core APIs overwrite NAND manufacturer's bad block meta data, thus complicating the jobs of non-Linux NAND programmers (end equipment manufacturering). These APIs also imply ECC protection for the prepad bytes, causing nand raw_write operations to fail.

Please use proper line breaks. 523 characters do not fit into a
terminal.

Thanks,

	tglx
David Brownell April 7, 2009, 1:49 a.m. UTC | #4
On Monday 06 April 2009, Thomas Gleixner wrote:
> David, Sneha,
> 
> I'm answering both mails in one go.
> 
> On Sun, 5 Apr 2009, David Brownell wrote:
> 
> > On Wednesday 01 April 2009, nsnehaprabha@ti.com wrote:
> > > From: Sneha Narnakaje <nsnehaprabha@ti.com>
> > > 
> > > The NAND controller on TI DaVinci DM355 supports NAND devices
> > > with large page size (2K and 4K), but the HW ECC is handled
> > > for every 512byte read/write chunks. The current HW_SYNDROME
> > > read_page/write_page APIs in the NAND core (nand_base) use the
> > > "infix OOB" scheme.
> 
> For a damned good reason. If you want to use HW_SYNDROME mode then you
> better have the syndrome right after the data.
> 
> data - ecc - data - ecc ...
> 
> That's how HW_SYNDROME generators work. It's not about the write side,
> It's about the read side. You read n bytes of data and m btes of ecc
> in _ONE_ go

Nit:  the current code issues up to four read_buf() calls there, not
one; always at least two:  data, prepad, ecc, postpad (pads optional).
The write side is relevant since it's got to issue matching writes.

Which causes another little issue:  the pre-pad may be ECC-protected,
but parts of the MTD stack don't entirely seem prepared for that.  As
in, it looks like some code assumes writing prepad won't affect ECC.
They may write it ... causing up to 8*6 = 48 bits of ECC error in this
case, more than can be corrected.  (I think that would be tools, not
normal kernel code.)


> and the hardware will tell you whether there is a bit flip 
> or not. You can even do full hardware error correction this way. And
> it has been done.
>  
> See also: http://linux-mtd.infradead.org/tech/mtdnand/x111.html#AEN140

That information is partially obsolete.  Those NAND_ECC_HWx_y
constants are gone, the method names changed too.  And I think
the current code tolerates BCH and other ECC schemes not just
the Reed-Solomon subset.

FWIW, this hardware would be NAND_ECC_HW10_518; not one of those
now-obsolete constants.  Yes; 518, == 512 + 6, so the data and
"prepad" can all be ECC-protected.

And of course, the point of this patch is to support the read
side *without* forcing that "choice" of "infix OOB".


> > Makes me wonder if there should be a new NAND_ECC_HW_* mode,
> > which doesn't use "infix OOB" ... since that's the problem.
> > 
> > Given bugs recently uncovered in that area, it seems that
> > these DaVinci chips may be the first to use ECC_HW_SYNDROME
> > in multi-chunk mode (and thus "infix OOB" in its full glory,
> > rather than single-chunk subset which matches traditional
> > layout with OOB at the end).
> 
> Sorry, HW_SYNDROME was used in multi chunk mode from the very
> beginning. See above.

That's hard to believe; 52ff49df7fab18e56fa31b43143c4150c2547836
merged today.  Lack of that pretty much trashed a 2 GByte NAND
chip I have ... the standard raw page r/w calls ignored multi chunk
layout issues, so the BBT detection lost horribly.  Trashed the
existing BBT table -- EVERY TIME IT BOOTED.  Data corruption bugs
like that, and "code in use", do not mix at all convincingly.

The main current in-tree HW_SYNDROME driver is cafe_nand ... which
looks to force single-chunk mode.

Maybe bugs crept in over time.  The other two drivers using that
code (referenced in the URL above) are from the days when large
page chips weren't routine.  Diskonchip is now obsolete (and the
docs I've found refer to 512 byte pages, single-chunk territory);
and that "AND" flash (not NAND!) stuff is at best rare.  Hence my
assumption that multi-chunk mode has never actually been used, even
if the code has been in place.


> If your device does not do that or you do not want to utilize it for
> whatever reasons then just use the NAND_ECC_HW mode which lets you
> place your ECC data where ever you want.

So you basically agree with my comment that this shouldn't use
HW_SYNDROME code paths unless it uses the "infix OOB" scheme.


SEPARATE ISSUE:  when is using that "infix OOB" appropriate?

> > > The core APIs overwrite NAND manufacturer's bad block meta
> > > data, thus complicating the jobs of non-Linux NAND programmers
> > > (end equipment manufacturering). These APIs also imply ECC
> > > protection for the prepad bytes, causing nand raw_write
> > > operations to fail.
> > 
> > All of those seem like reasons to avoid NAND_ECC_HW_SYNDROME
> > unless the ECC chunksize matches the page size ... that is,
> > to only use it when "infix OOB" won't apply.
> 
> Wrong, see above

Hardly.  It's clearly arguable; see my comments above, and even
yours (!) about not wanting to use it for "whatever reasons".
Like the reasons quoted right there... there are indeed downsides
to this "infix OOB" approach, which could make it worth avoiding.


> > I particularly dislike clobbering the bad block data, since
> > once that's done it can't easily be recovered.  Preferable
> > to leave those markers in place, so that the chip can still
> > be re-initialized cleanly.
> 
> And how do you utilize the "data-ecc-data-ecc..." scheme _WITHOUT_
> clobbering the bad block marker bytes ???

You don't.  You'd use a different scheme.


> That's why we have the bad block table mechanism.

Sure, when it works.  Of course it's _nice_ to be able to
recreate such a table too, and treat a table-in-flash
as just a boot accelerator.

Maybe "nice" matters only during development; maybe not.
I was able to manually re-mark 256 MBytes as "block not
actually bad", after hitting that BBT-corrupting bug above.
(The other blocks ... I won't worry about for now.)

Now I think other folk won't see that *same* failure.
So maybe the backup BBT will suffice to recover from
such things in the future, in conjunction with system
software which disallows (re)creation of BBT-in-flash
from both Linux and the boot loader.


> And this is not a Linux kernel hackers oddity, 
> just have a look at DiskOnChip devices which do exactly the same. It's
> dictated by the hardware and there is nothing you can do about it -
> aside of falling back to software ECC and giving up the HW
> acceleration.

Or ... using patches like Sneha provided.

Or ... by using hardware like OMAP3, which seems to have
hardware buffers for the ECC syndrome stuff.  Support for
"normal" OOB layout (vs infix) from the hardware's BCH
syndrome calculation/correction module, if I read right.

The "infix OOB" is a software policy.  Maybe it's natural
to choose that with some hardware.  But it's not dictated
hardware; as shown by Sneha's patches.


> This patch is utter nonsense. The whole functionality is already
> there. Just use the correct mode: NAND_ECC_HW. Place your ECC data
> where you want or where your commercial counterparts want them to show
> up. Using NAND_ECC_HW_SYNDROME for your case is just plain wrong.

Well, ISTR that you said otherwise above ... yeah, quoting you (above):

> > If your device does not do that or you do not want to utilize it for
> > whatever reasons then just use the NAND_ECC_HW mode which lets you
> > place your ECC data where ever you want.

And that's exactly what these patches do:  enable just such choices.
The $SUBJECT patch would be needed to use NAND_ECC_HW with this 4-bit
ECC hardware, and not be forced to "choose" the infix-OOB policy.


How strongly you believe in supporting those choices is yet another
question.  But you should at least be arguing from real facts, not
from mis-apprehensions.

- Dave
nsnehaprabha@ti.com April 7, 2009, 3:07 p.m. UTC | #5
Thomas,

> -----Original Message-----
> From: Thomas Gleixner [mailto:tglx@linutronix.de]
> Sent: Monday, April 06, 2009 7:08 PM
> To: Narnakaje, Snehaprabha
> Cc: linux-mtd@lists.infradead.org; linux-kernel@vger.kernel.org; davinci-
> linux-open-source@linux.davincidsp.com; Paulraj, Sandeep
> Subject: Re: [PATCH] MTD-NAND: Changes to read_page APIs to support
> NAND_ECC_HW_SYNDROME mode on TI DaVinci DM355
> 
> Sneha,
> 
> On Wed, 1 Apr 2009, nsnehaprabha@ti.com wrote:
> 
> Would you please care to CC the still caring but admittedly lazy and
> distracted maintainer on such patches ?

Who would that be?

Is the list below correct?

Steven J. Hill (sjhill@realitydiluted.com)
Thomas Gleixner (tglx@linuxtronix.de)
David Woodhouse (dwmw2@infradead.org)

> 
> > From: Sneha Narnakaje <nsnehaprabha@ti.com>
> >
> > The NAND controller on TI DaVinci DM355 supports NAND devices with large
> page size (2K and 4K), but the HW ECC is handled for every 512byte
> read/write chunks. The current HW_SYNDROME read_page/write_page APIs in
> the NAND core (nand_base) use the "infix OOB" scheme. The core APIs
> overwrite NAND manufacturer's bad block meta data, thus complicating the
> jobs of non-Linux NAND programmers (end equipment manufacturering). These
> APIs also imply ECC protection for the prepad bytes, causing nand
> raw_write operations to fail.
> 
> Please use proper line breaks. 523 characters do not fit into a
> terminal.

OK.

Thanks
Sneha
 
> 
> Thanks,
> 
> 	tglx
> 
>
nsnehaprabha@ti.com April 7, 2009, 4:02 p.m. UTC | #6
Thomas, David,

> -----Original Message-----
> From: David Brownell [mailto:david-b@pacbell.net]
> Sent: Monday, April 06, 2009 9:49 PM
> To: Thomas Gleixner
> Cc: Narnakaje, Snehaprabha; davinci-linux-open-
> source@linux.davincidsp.com; linux-mtd@lists.infradead.org; linux-
> kernel@vger.kernel.org
> Subject: Re: [PATCH] MTD-NAND: Changes to read_page APIs to support
> NAND_ECC_HW_SYNDROME mode on TI DaVinci DM355
> 
> On Monday 06 April 2009, Thomas Gleixner wrote:
> > David, Sneha,
> >
> > I'm answering both mails in one go.
> >
> > On Sun, 5 Apr 2009, David Brownell wrote:
> >
> > > On Wednesday 01 April 2009, nsnehaprabha@ti.com wrote:
> > > > From: Sneha Narnakaje <nsnehaprabha@ti.com>
> > > >
> > > > The NAND controller on TI DaVinci DM355 supports NAND devices
> > > > with large page size (2K and 4K), but the HW ECC is handled
> > > > for every 512byte read/write chunks. The current HW_SYNDROME
> > > > read_page/write_page APIs in the NAND core (nand_base) use the
> > > > "infix OOB" scheme.
> >
> > For a damned good reason. If you want to use HW_SYNDROME mode then you
> > better have the syndrome right after the data.
> >
> > data - ecc - data - ecc ...
> >
> > That's how HW_SYNDROME generators work. It's not about the write side,
> > It's about the read side. You read n bytes of data and m btes of ecc
> > in _ONE_ go
> 
> Nit:  the current code issues up to four read_buf() calls there, not
> one; always at least two:  data, prepad, ecc, postpad (pads optional).
> The write side is relevant since it's got to issue matching writes.
> 
> Which causes another little issue:  the pre-pad may be ECC-protected,
> but parts of the MTD stack don't entirely seem prepared for that.  As
> in, it looks like some code assumes writing prepad won't affect ECC.
> They may write it ... causing up to 8*6 = 48 bits of ECC error in this
> case, more than can be corrected.  (I think that would be tools, not
> normal kernel code.)
> 
> 
> > and the hardware will tell you whether there is a bit flip
> > or not. You can even do full hardware error correction this way. And
> > it has been done.
> >
> > See also: http://linux-mtd.infradead.org/tech/mtdnand/x111.html#AEN140
> 
> That information is partially obsolete.  Those NAND_ECC_HWx_y
> constants are gone, the method names changed too.  And I think
> the current code tolerates BCH and other ECC schemes not just
> the Reed-Solomon subset.
> 
> FWIW, this hardware would be NAND_ECC_HW10_518; not one of those
> now-obsolete constants.  Yes; 518, == 512 + 6, so the data and
> "prepad" can all be ECC-protected.
> 
> And of course, the point of this patch is to support the read
> side *without* forcing that "choice" of "infix OOB".
> 
> 
> > > Makes me wonder if there should be a new NAND_ECC_HW_* mode,
> > > which doesn't use "infix OOB" ... since that's the problem.
> > >
> > > Given bugs recently uncovered in that area, it seems that
> > > these DaVinci chips may be the first to use ECC_HW_SYNDROME
> > > in multi-chunk mode (and thus "infix OOB" in its full glory,
> > > rather than single-chunk subset which matches traditional
> > > layout with OOB at the end).
> >
> > Sorry, HW_SYNDROME was used in multi chunk mode from the very
> > beginning. See above.
> 
> That's hard to believe; 52ff49df7fab18e56fa31b43143c4150c2547836
> merged today.  Lack of that pretty much trashed a 2 GByte NAND
> chip I have ... the standard raw page r/w calls ignored multi chunk
> layout issues, so the BBT detection lost horribly.  Trashed the
> existing BBT table -- EVERY TIME IT BOOTED.  Data corruption bugs
> like that, and "code in use", do not mix at all convincingly.
> 
> The main current in-tree HW_SYNDROME driver is cafe_nand ... which
> looks to force single-chunk mode.
> 
> Maybe bugs crept in over time.  The other two drivers using that
> code (referenced in the URL above) are from the days when large
> page chips weren't routine.  Diskonchip is now obsolete (and the
> docs I've found refer to 512 byte pages, single-chunk territory);
> and that "AND" flash (not NAND!) stuff is at best rare.  Hence my
> assumption that multi-chunk mode has never actually been used, even
> if the code has been in place.
> 
> 
> > If your device does not do that or you do not want to utilize it for
> > whatever reasons then just use the NAND_ECC_HW mode which lets you
> > place your ECC data where ever you want.
> 
> So you basically agree with my comment that this shouldn't use
> HW_SYNDROME code paths unless it uses the "infix OOB" scheme.
> 
> 
> SEPARATE ISSUE:  when is using that "infix OOB" appropriate?
> 
> > > > The core APIs overwrite NAND manufacturer's bad block meta
> > > > data, thus complicating the jobs of non-Linux NAND programmers
> > > > (end equipment manufacturering). These APIs also imply ECC
> > > > protection for the prepad bytes, causing nand raw_write
> > > > operations to fail.
> > >
> > > All of those seem like reasons to avoid NAND_ECC_HW_SYNDROME
> > > unless the ECC chunksize matches the page size ... that is,
> > > to only use it when "infix OOB" won't apply.
> >
> > Wrong, see above
> 
> Hardly.  It's clearly arguable; see my comments above, and even
> yours (!) about not wanting to use it for "whatever reasons".
> Like the reasons quoted right there... there are indeed downsides
> to this "infix OOB" approach, which could make it worth avoiding.
> 
> 
> > > I particularly dislike clobbering the bad block data, since
> > > once that's done it can't easily be recovered.  Preferable
> > > to leave those markers in place, so that the chip can still
> > > be re-initialized cleanly.
> >
> > And how do you utilize the "data-ecc-data-ecc..." scheme _WITHOUT_
> > clobbering the bad block marker bytes ???
> 
> You don't.  You'd use a different scheme.
> 
> 
> > That's why we have the bad block table mechanism.
> 
> Sure, when it works.  Of course it's _nice_ to be able to
> recreate such a table too, and treat a table-in-flash
> as just a boot accelerator.
> 
> Maybe "nice" matters only during development; maybe not.
> I was able to manually re-mark 256 MBytes as "block not
> actually bad", after hitting that BBT-corrupting bug above.
> (The other blocks ... I won't worry about for now.)
> 
> Now I think other folk won't see that *same* failure.
> So maybe the backup BBT will suffice to recover from
> such things in the future, in conjunction with system
> software which disallows (re)creation of BBT-in-flash
> from both Linux and the boot loader.
> 
> 
> > And this is not a Linux kernel hackers oddity,
> > just have a look at DiskOnChip devices which do exactly the same. It's
> > dictated by the hardware and there is nothing you can do about it -
> > aside of falling back to software ECC and giving up the HW
> > acceleration.
> 
> Or ... using patches like Sneha provided.
> 
> Or ... by using hardware like OMAP3, which seems to have
> hardware buffers for the ECC syndrome stuff.  Support for
> "normal" OOB layout (vs infix) from the hardware's BCH
> syndrome calculation/correction module, if I read right.
> 
> The "infix OOB" is a software policy.  Maybe it's natural
> to choose that with some hardware.  But it's not dictated
> hardware; as shown by Sneha's patches.
> 
> 
> > This patch is utter nonsense. The whole functionality is already
> > there. Just use the correct mode: NAND_ECC_HW. Place your ECC data
> > where you want or where your commercial counterparts want them to show
> > up. Using NAND_ECC_HW_SYNDROME for your case is just plain wrong.
> 
> Well, ISTR that you said otherwise above ... yeah, quoting you (above):
> 
> > > If your device does not do that or you do not want to utilize it for
> > > whatever reasons then just use the NAND_ECC_HW mode which lets you
> > > place your ECC data where ever you want.
> 
> And that's exactly what these patches do:  enable just such choices.
> The $SUBJECT patch would be needed to use NAND_ECC_HW with this 4-bit
> ECC hardware, and not be forced to "choose" the infix-OOB policy.

I had looked read_page/write_page APIs of the NAND_ECC_HW mode to see if we can use this mode for DaVinci DM355 device.

The read_page handler for NAND_ECC_HW mode reads the data and ECC from hardware for each chunk. It then reads the OOB and extracts ecc code from it, before using the ECC from hardware and ecc code from OOB for data correction.

On DaVinci DM355 device, we need to pass the ecc code/syndrome extracted from OOB area, in order to read the HW ECC for each data chunk. This is where we differ from NAND_ECC_HW mode.

The read_page/write_page APIs for NAND_ECC_HW_SYNDROME have the other problem that David mentioned - overwriting NAND manufacturers bad block meta data, when used with large page NAND chips. These APIs do not handle the "eccsteps" properly. The OOB is read/written after every data chunk, thus you actually have overwritten the factory bad block information, when these APIs handle the last data chunk. OOB should be read/written after the entire data (a page) is read/written.

Thanks
Sneha

> 
> 
> How strongly you believe in supporting those choices is yet another
> question.  But you should at least be arguing from real facts, not
> from mis-apprehensions.
> 
> - Dave
>
Thomas Gleixner April 7, 2009, 4:45 p.m. UTC | #7
On Tue, 7 Apr 2009, Narnakaje, Snehaprabha wrote:
> > And that's exactly what these patches do:  enable just such choices.
> > The $SUBJECT patch would be needed to use NAND_ECC_HW with this 4-bit
> > ECC hardware, and not be forced to "choose" the infix-OOB policy.
>
> I had looked read_page/write_page APIs of the NAND_ECC_HW mode to
> see if we can use this mode for DaVinci DM355 device.
> 
> The read_page handler for NAND_ECC_HW mode reads the data and ECC
> from hardware for each chunk. It then reads the OOB and extracts ecc
> code from it, before using the ECC from hardware and ecc code from
> OOB for data correction.
>
> On DaVinci DM355 device, we need to pass the ecc code/syndrome
> extracted from OOB area, in order to read the HW ECC for each data
> chunk. This is where we differ from NAND_ECC_HW mode.

I have to admit that I'm slightly confused. Is the below description
correct?

On write you generate the ECC via hardware and store it in the OOB
area, right ?

On read you read the oob data first and extract the ECC. Then you feed
the extracted ECC into the HW generator and read the data. Is this
correct ?

> The read_page/write_page APIs for NAND_ECC_HW_SYNDROME have the
> other problem that David mentioned - overwriting NAND manufacturers
> bad block meta data, when used with large page NAND chips. These
> APIs do not handle the "eccsteps" properly. The OOB is read/written
> after every data chunk, thus you actually have overwritten the
> factory bad block information, when these APIs handle the last data
> chunk. OOB should be read/written after the entire data (a page) is
> read/written.

Again, that is _how_ the NAND_ECC_HW_SYNDROME functions work. And if
your hardware needs a completely different mode, then we need to
analyse what's the best solution for it.

Right now the provided information is way to diffuse to do that.

You provided a patch without explaining what your hardware needs and
showing how the actual user of the modified API looks like and how the
functionality is implemented.

Thanks,

	tglx
nsnehaprabha@ti.com April 7, 2009, 6:34 p.m. UTC | #8
Thomas,

> -----Original Message-----
> From: Thomas Gleixner [mailto:tglx@linutronix.de]
> Sent: Tuesday, April 07, 2009 12:46 PM
> To: Narnakaje, Snehaprabha
> Cc: David Brownell; davinci-linux-open-source@linux.davincidsp.com; linux-
> mtd@lists.infradead.org; linux-kernel@vger.kernel.org
> Subject: RE: [PATCH] MTD-NAND: Changes to read_page APIs to support
> NAND_ECC_HW_SYNDROME mode on TI DaVinci DM355
> 
> On Tue, 7 Apr 2009, Narnakaje, Snehaprabha wrote:
> > > And that's exactly what these patches do:  enable just such choices.
> > > The $SUBJECT patch would be needed to use NAND_ECC_HW with this 4-bit
> > > ECC hardware, and not be forced to "choose" the infix-OOB policy.
> >
> > I had looked read_page/write_page APIs of the NAND_ECC_HW mode to
> > see if we can use this mode for DaVinci DM355 device.
> >
> > The read_page handler for NAND_ECC_HW mode reads the data and ECC
> > from hardware for each chunk. It then reads the OOB and extracts ecc
> > code from it, before using the ECC from hardware and ecc code from
> > OOB for data correction.
> >
> > On DaVinci DM355 device, we need to pass the ecc code/syndrome
> > extracted from OOB area, in order to read the HW ECC for each data
> > chunk. This is where we differ from NAND_ECC_HW mode.
> 
> I have to admit that I'm slightly confused. Is the below description
> correct?
> 
> On write you generate the ECC via hardware and store it in the OOB
> area, right ?

Yes. For a large page (e.g. 2K+64), ECC should be stored in the 64bytes OOB region. The NAND controller on DaVinci DM355 device is capable of generating HW ECC (4-bit) for every 512bytes. This means, we have 4 eccsteps. The ECC should be generated by the HW for every 512byte chunk write and stored temporarily in a buffer until all 4 eccsteps have been tried. The ECC from the temporary buffer is then written to the OOB area.

> 
> On read you read the oob data first and extract the ECC. Then you feed
> the extracted ECC into the HW generator and read the data. Is this
> correct ?

Almost, read OOB, read data and then feed the extracted ECC into HW generator. Read the ECC status to see any correction required on the read data. Again, reading data, feeding the extracted ECC into HW generator and the correction will have to be repeated for # of times - eccsteps.

> 
> > The read_page/write_page APIs for NAND_ECC_HW_SYNDROME have the
> > other problem that David mentioned - overwriting NAND manufacturers
> > bad block meta data, when used with large page NAND chips. These
> > APIs do not handle the "eccsteps" properly. The OOB is read/written
> > after every data chunk, thus you actually have overwritten the
> > factory bad block information, when these APIs handle the last data
> > chunk. OOB should be read/written after the entire data (a page) is
> > read/written.
> 
> Again, that is _how_ the NAND_ECC_HW_SYNDROME functions work. And if
> your hardware needs a completely different mode, then we need to
> analyse what's the best solution for it.

Based on the description above, NAND_ECC_HW_SYNDROME fitted well, with the exception of overriding the read_page/write_page APIs (to fix the bugs mentioned above). I have not sent the actual NAND driver to the linux-mtd yet, since the initial version (supported only 1-bit HW ECC) submitted by David Brownell was still in review (no comments and not approved yet). While coming up with read_page API in the DaVinci NAND driver, we realized the need to pass "page" parameter. The read_page API in the DaVinci NAND driver required to call chip->cmdfunc to adjust the read location (page vs. OOB). The "page" parameter has to be passed to the chip->cmdfunc.
This patch was a "preparation" patch for the DaVinci NAND driver 4-bit ECC support. It added the "page" parameter to the "read_page" and "read_page_raw" handlers, fixed few other bugs.

> 
> Right now the provided information is way to diffuse to do that.
> 
> You provided a patch without explaining what your hardware needs and
> showing how the actual user of the modified API looks like and how the
> functionality is implemented.
> 
> Thanks,
> 
> 	tglx
Thomas Gleixner April 7, 2009, 7:48 p.m. UTC | #9
On Tue, 7 Apr 2009, Narnakaje, Snehaprabha wrote:
> > > On DaVinci DM355 device, we need to pass the ecc code/syndrome
> > > extracted from OOB area, in order to read the HW ECC for each data
> > > chunk. This is where we differ from NAND_ECC_HW mode.
> > 
> > I have to admit that I'm slightly confused. Is the below description
> > correct?
> > 
> > On write you generate the ECC via hardware and store it in the OOB
> > area, right ?

Your mailer still does not insert line breaks at around 78 chars.
 
> Yes. For a large page (e.g. 2K+64), ECC should be stored in the
> 64bytes OOB region. The NAND controller on DaVinci DM355 device is
> capable of generating HW ECC (4-bit) for every 512bytes. This means,
> we have 4 eccsteps. The ECC should be generated by the HW for every
> 512byte chunk write and stored temporarily in a buffer until all 4
> eccsteps have been tried. The ECC from the temporary buffer is then
> written to the OOB area.

That's exactly what NAND_ECC_HW does.

> > 
> > On read you read the oob data first and extract the ECC. Then you feed
> > the extracted ECC into the HW generator and read the data. Is this
> > correct ?
 
> Almost, read OOB, read data and then feed the extracted ECC into HW
> generator. Read the ECC status to see any correction required on the
> read data. Again, reading data, feeding the extracted ECC into HW
> generator and the correction will have to be repeated for # of times
> - eccsteps.

Ok.

> > 
> > > The read_page/write_page APIs for NAND_ECC_HW_SYNDROME have the
> > > other problem that David mentioned - overwriting NAND manufacturers
> > > bad block meta data, when used with large page NAND chips. These
> > > APIs do not handle the "eccsteps" properly. The OOB is read/written
> > > after every data chunk, thus you actually have overwritten the
> > > factory bad block information, when these APIs handle the last data
> > > chunk. OOB should be read/written after the entire data (a page) is
> > > read/written.
> > 
> > Again, that is _how_ the NAND_ECC_HW_SYNDROME functions work. And if
> > your hardware needs a completely different mode, then we need to
> > analyse what's the best solution for it.
 
> Based on the description above, NAND_ECC_HW_SYNDROME fitted well,

No, it does not fit well.

> with the exception of overriding the read_page/write_page APIs (to
> fix the bugs mentioned above). I have not sent the actual NAND

Those are not bugs. You abused that interface and now you claim it has
bugs. It does _not_. You introduced the bugs by using a function for
something it was never designed for.

> driver to the linux-mtd yet, since the initial version (supported
> only 1-bit HW ECC) submitted by David Brownell was still in review
> (no comments and not approved yet). While coming up with read_page
> API in the DaVinci NAND driver, we realized the need to pass "page"
> parameter. The read_page API in the DaVinci NAND driver required to
> call chip->cmdfunc to adjust the read location (page vs. OOB). The
> "page" parameter has to be passed to the chip->cmdfunc.

This is the wrong approach. 

What you need is a NAND_ECC_HW_OOB_FIRST mode, which uses the
NAND_ECC_HW write function and implements a separate read function
which reads the oob and then reads the data chunks.

Thanks,

	tglx
David Brownell April 7, 2009, 10:44 p.m. UTC | #10
On Tuesday 07 April 2009, Narnakaje, Snehaprabha wrote:
> 
> > > > If your device does not do that or you do not want to utilize it for
> > > > whatever reasons then just use the NAND_ECC_HW mode which lets you
> > > > place your ECC data where ever you want.
> > 
> > And that's exactly what these patches do:  enable just such choices.
> > The $SUBJECT patch would be needed to use NAND_ECC_HW with this 4-bit
> > ECC hardware, and not be forced to "choose" the infix-OOB policy.
> 
> I had looked read_page/write_page APIs of the NAND_ECC_HW mode to see
> if we can use this mode for DaVinci DM355 device. 
> 
> The read_page handler for NAND_ECC_HW mode reads the data and ECC from
> hardware for each chunk. It then reads the OOB and extracts ecc code
> from it, before using the ECC from hardware and ecc code from OOB for
> data correction.   

That's pretty much what the RFC patches I sent about 2 months ago
were doing.  One "RFC" aspect was very much whether there wasn't
a better option than "infix OOB".  Overview of the patches and test
results:

  http://linux.omap.com/pipermail/davinci-linux-open-source/2009-February/010977.html

The patches won't quite apply to the version of the DaVinci NAND
driver now in mainline, but they've had only minor changes since then:

  http://linux.omap.com/pipermail/davinci-linux-open-source/2009-February/010979.html
	Adds basic 4-bit ECC support to the DaVinci NAND driver using
	HW_SYNDROME, but limited to single ECC step (small page NAND).

  http://linux.omap.com/pipermail/davinci-linux-open-source/2009-February/010976.html
	Now in mainline ... fixes NAND-core data corruption caused
	by using HW_SYNDROME with multi-step ECC.

  http://linux.omap.com/pipermail/davinci-linux-open-source/2009-February/010978.html
	Updates that 4-bit ECC support to work with multi-step ECC,
	but *strongly* restricted to something else writing the BBT.

Since then I haven't put any more work into sorting out where the
u-boot failure and oob/subpage test failures came from.  That stuff
was not tested with 4KB pages either ... there's at least one bug
in the NAND core code that Sneha pointed out.

(Thomas, the discussion thread for the second of those patches has
more background if you're interested and have the time.)


> The read_page/write_page APIs for NAND_ECC_HW_SYNDROME have the other
> problem that David mentioned - overwriting NAND manufacturers bad block
> meta data, when used with large page NAND chips.

The third patch above goes to some length to prevent Linux from
doing anything other than taking a BBT from flash that something
else has written.  So long as nothing trashes that BBT, and there's
a nice way to get it written in the first place (u-boot need not
apply, it'll have the same issues as Linux), things might act
OK -- once someone resolves the failures I observed.  Still seems
undesirably (and needlessly) fragile to me.

And I rather suspect that fragility is the reason the ROM boot
loader in newer chip revisions from TI is avoiding "infix OOB".
Plus likely a bit of feedback from their development partners
who package end-product NAND programming solutions ... I'm just
guessing, but it sure seems to me like there are many reasons
to avoid that "infix OOB" fragility.

- dave
David Brownell April 7, 2009, 10:46 p.m. UTC | #11
On Tuesday 07 April 2009, Narnakaje, Snehaprabha wrote:
> 
>  I have not sent the actual NAND driver to the linux-mtd yet, since
>  the initial version (supported only 1-bit HW ECC) submitted by David
>  Brownell was still in review (no comments and not approved yet).

It merged to mainline yesterday; should verify on DM6446 EVM.

The funky bits are fortunately limited to the 4-bit ECC support,
which are natural to review/evaluate as deltas on top of that core.
It was easy to tell they would bring out some confusion, which is
a good sign of something that's worth reviewing separately.

And, given the patches I referenced previously ... even those
should only be an issue for large page chips.  (And no DaVinci
board that uses large page chips has hit mainline yet.  Lots of
stuff in the queue for, one hopes, 2.6.31 though.)

- Dave
David Brownell April 7, 2009, 11:08 p.m. UTC | #12
On Tuesday 07 April 2009, Thomas Gleixner wrote:
> What you need is a NAND_ECC_HW_OOB_FIRST mode, which uses the
> NAND_ECC_HW write function and implements a separate read function
> which reads the oob and then reads the data chunks.

That sounds about right, and it should address the issues
I raised earlier in this thread.

Though I might name it differently.  Reason being that the
"read OOB first" bit is only needed when there's more than
one ecc.step ... small page chips won't need it.  Maybe it
would suffice to map that mode to ECC_HW in those cases.

- Dave
nsnehaprabha@ti.com April 9, 2009, 1:41 p.m. UTC | #13
Thomas,

> -----Original Message-----
> From: Thomas Gleixner [mailto:tglx@linutronix.de]
> Sent: Tuesday, April 07, 2009 3:48 PM
> To: Narnakaje, Snehaprabha
> Cc: David Brownell; davinci-linux-open-source@linux.davincidsp.com; linux-
> mtd@lists.infradead.org; linux-kernel@vger.kernel.org
> Subject: RE: [PATCH] MTD-NAND: Changes to read_page APIs to support
> NAND_ECC_HW_SYNDROME mode on TI DaVinci DM355
> 
> On Tue, 7 Apr 2009, Narnakaje, Snehaprabha wrote:
> > > > On DaVinci DM355 device, we need to pass the ecc code/syndrome
> > > > extracted from OOB area, in order to read the HW ECC for each data
> > > > chunk. This is where we differ from NAND_ECC_HW mode.
> > >
> > > I have to admit that I'm slightly confused. Is the below description
> > > correct?
> > >
> > > On write you generate the ECC via hardware and store it in the OOB
> > > area, right ?
> 
> Your mailer still does not insert line breaks at around 78 chars.
> 
> > Yes. For a large page (e.g. 2K+64), ECC should be stored in the
> > 64bytes OOB region. The NAND controller on DaVinci DM355 device is
> > capable of generating HW ECC (4-bit) for every 512bytes. This means,
> > we have 4 eccsteps. The ECC should be generated by the HW for every
> > 512byte chunk write and stored temporarily in a buffer until all 4
> > eccsteps have been tried. The ECC from the temporary buffer is then
> > written to the OOB area.
> 
> That's exactly what NAND_ECC_HW does.
> 
> > >
> > > On read you read the oob data first and extract the ECC. Then you feed
> > > the extracted ECC into the HW generator and read the data. Is this
> > > correct ?
> 
> > Almost, read OOB, read data and then feed the extracted ECC into HW
> > generator. Read the ECC status to see any correction required on the
> > read data. Again, reading data, feeding the extracted ECC into HW
> > generator and the correction will have to be repeated for # of times
> > - eccsteps.
> 
> Ok.
> 
> > >
> > > > The read_page/write_page APIs for NAND_ECC_HW_SYNDROME have the
> > > > other problem that David mentioned - overwriting NAND manufacturers
> > > > bad block meta data, when used with large page NAND chips. These
> > > > APIs do not handle the "eccsteps" properly. The OOB is read/written
> > > > after every data chunk, thus you actually have overwritten the
> > > > factory bad block information, when these APIs handle the last data
> > > > chunk. OOB should be read/written after the entire data (a page) is
> > > > read/written.
> > >
> > > Again, that is _how_ the NAND_ECC_HW_SYNDROME functions work. And if
> > > your hardware needs a completely different mode, then we need to
> > > analyse what's the best solution for it.
> 
> > Based on the description above, NAND_ECC_HW_SYNDROME fitted well,
> 
> No, it does not fit well.
> 
> > with the exception of overriding the read_page/write_page APIs (to
> > fix the bugs mentioned above). I have not sent the actual NAND
> 
> Those are not bugs. You abused that interface and now you claim it has
> bugs. It does _not_. You introduced the bugs by using a function for
> something it was never designed for.
> 
> > driver to the linux-mtd yet, since the initial version (supported
> > only 1-bit HW ECC) submitted by David Brownell was still in review
> > (no comments and not approved yet). While coming up with read_page
> > API in the DaVinci NAND driver, we realized the need to pass "page"
> > parameter. The read_page API in the DaVinci NAND driver required to
> > call chip->cmdfunc to adjust the read location (page vs. OOB). The
> > "page" parameter has to be passed to the chip->cmdfunc.
> 
> This is the wrong approach.
> 
> What you need is a NAND_ECC_HW_OOB_FIRST mode, which uses the
> NAND_ECC_HW write function and implements a separate read function
> which reads the oob and then reads the data chunks.

OK, I will come up with the new ECC mode with the separate read_page handler.

However, I am still not clear on how I can avoid the change to read_page handler definition to pass page. This new read_page handler does require it, if it has to call the cmdfunc to adjust the read pointer to OOB area. Basically, I am referring to the code snippet below -

...
/* Read the OOB area first */
chip->cmdfunc(mtd, NAND_CMD_READOOB, 0, page);
chip->read_buf(mtd, oob, mtd->oobsize);
chip->cmdfunc(mtd, NAND_CMD_READ0, 0, page);
...

The function that calls read_page handler has already called the cmdfunc to adjust the read pointer to the beginning of the page. The read_page handler for the new ECC mode will have to re-adjust the read pointer to read the OOB first.

Thanks
Sneha

> 
> Thanks,
> 
> 	tglx
David Brownell April 14, 2009, 5:36 p.m. UTC | #14
On Tuesday 07 April 2009, David Brownell wrote:
> > 
> >  I have not sent the actual NAND driver to the linux-mtd yet, since
> >  the initial version (supported only 1-bit HW ECC) submitted by David
> >  Brownell was still in review (no comments and not approved yet).
> 
> It merged to mainline yesterday; should verify on DM6446 EVM.

For the record:  it seems to have merged OK, and the resulting
driver passed all the mtd tests I threw at it.  Current mainline
Linux and U-Boot interop through that small-page NAND just fine.

- Dave
diff mbox

Patch

diff --git a/drivers/mtd/nand/atmel_nand.c b/drivers/mtd/nand/atmel_nand.c
index 47a33ce..b63eddb 100644
--- a/drivers/mtd/nand/atmel_nand.c
+++ b/drivers/mtd/nand/atmel_nand.c
@@ -214,7 +214,7 @@  static int atmel_nand_calculate(struct mtd_info *mtd,
  * buf:        buffer to store read data
  */
 static int atmel_nand_read_page(struct mtd_info *mtd,
-		struct nand_chip *chip, uint8_t *buf)
+		struct nand_chip *chip, uint8_t *buf, int page)
 {
 	int eccsize = chip->ecc.size;
 	int eccbytes = chip->ecc.bytes;
diff --git a/drivers/mtd/nand/cafe_nand.c b/drivers/mtd/nand/cafe_nand.c
index 22a6b2e..8ea1623 100644
--- a/drivers/mtd/nand/cafe_nand.c
+++ b/drivers/mtd/nand/cafe_nand.c
@@ -381,7 +381,7 @@  static int cafe_nand_read_oob(struct mtd_info *mtd, struct nand_chip *chip,
  * we need a special oob layout and handling.
  */
 static int cafe_nand_read_page(struct mtd_info *mtd, struct nand_chip *chip,
-			       uint8_t *buf)
+			       uint8_t *buf, int page)
 {
 	struct cafe_priv *cafe = mtd->priv;
 
diff --git a/drivers/mtd/nand/fsl_elbc_nand.c b/drivers/mtd/nand/fsl_elbc_nand.c
index 1f6eb25..ddd37d2 100644
--- a/drivers/mtd/nand/fsl_elbc_nand.c
+++ b/drivers/mtd/nand/fsl_elbc_nand.c
@@ -739,7 +739,8 @@  static int fsl_elbc_chip_init_tail(struct mtd_info *mtd)
 
 static int fsl_elbc_read_page(struct mtd_info *mtd,
                               struct nand_chip *chip,
-                              uint8_t *buf)
+			      uint8_t *buf,
+			      int page)
 {
 	fsl_elbc_read_buf(mtd, buf, mtd->writesize);
 	fsl_elbc_read_buf(mtd, chip->oob_poi, mtd->oobsize);
diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c
index 599185c..d7f6650 100644
--- a/drivers/mtd/nand/nand_base.c
+++ b/drivers/mtd/nand/nand_base.c
@@ -750,7 +750,7 @@  static int nand_wait(struct mtd_info *mtd, struct nand_chip *chip)
  * @buf:	buffer to store read data
  */
 static int nand_read_page_raw(struct mtd_info *mtd, struct nand_chip *chip,
-			      uint8_t *buf)
+			      uint8_t *buf, int page)
 {
 	chip->read_buf(mtd, buf, mtd->writesize);
 	chip->read_buf(mtd, chip->oob_poi, mtd->oobsize);
@@ -764,7 +764,7 @@  static int nand_read_page_raw(struct mtd_info *mtd, struct nand_chip *chip,
  * @buf:	buffer to store read data
  */
 static int nand_read_page_swecc(struct mtd_info *mtd, struct nand_chip *chip,
-				uint8_t *buf)
+				uint8_t *buf, int page)
 {
 	int i, eccsize = chip->ecc.size;
 	int eccbytes = chip->ecc.bytes;
@@ -774,7 +774,7 @@  static int nand_read_page_swecc(struct mtd_info *mtd, struct nand_chip *chip,
 	uint8_t *ecc_code = chip->buffers->ecccode;
 	uint32_t *eccpos = chip->ecc.layout->eccpos;
 
-	chip->ecc.read_page_raw(mtd, chip, buf);
+	chip->ecc.read_page_raw(mtd, chip, buf, page);
 
 	for (i = 0; eccsteps; eccsteps--, i += eccbytes, p += eccsize)
 		chip->ecc.calculate(mtd, p, &ecc_calc[i]);
@@ -887,7 +887,7 @@  static int nand_read_subpage(struct mtd_info *mtd, struct nand_chip *chip, uint3
  * Not for syndrome calculating ecc controllers which need a special oob layout
  */
 static int nand_read_page_hwecc(struct mtd_info *mtd, struct nand_chip *chip,
-				uint8_t *buf)
+				uint8_t *buf, int page)
 {
 	int i, eccsize = chip->ecc.size;
 	int eccbytes = chip->ecc.bytes;
@@ -932,7 +932,7 @@  static int nand_read_page_hwecc(struct mtd_info *mtd, struct nand_chip *chip,
  * we need a special oob layout and handling.
  */
 static int nand_read_page_syndrome(struct mtd_info *mtd, struct nand_chip *chip,
-				   uint8_t *buf)
+				   uint8_t *buf, int page)
 {
 	int i, eccsize = chip->ecc.size;
 	int eccbytes = chip->ecc.bytes;
@@ -997,8 +997,10 @@  static uint8_t *nand_transfer_oob(struct nand_chip *chip, uint8_t *oob,
 		struct nand_oobfree *free = chip->ecc.layout->oobfree;
 		uint32_t boffs = 0, roffs = ops->ooboffs;
 		size_t bytes = 0;
+		unsigned int i;
 
-		for(; free->length && len; free++, len -= bytes) {
+		for (i = 0; i < chip->ecc.steps && i < MTD_MAX_OOBFREE_ENTRIES
+		     && free->length && len; i++, free++, len -= bytes) {
 			/* Read request not from offset 0 ? */
 			if (unlikely(roffs)) {
 				if (roffs >= free->length) {
@@ -1074,11 +1076,13 @@  static int nand_do_read_ops(struct mtd_info *mtd, loff_t from,
 
 			/* Now read the page into the buffer */
 			if (unlikely(ops->mode == MTD_OOB_RAW))
-				ret = chip->ecc.read_page_raw(mtd, chip, bufpoi);
+				ret = chip->ecc.read_page_raw(mtd, chip,
+							      bufpoi, page);
 			else if (!aligned && NAND_SUBPAGE_READ(chip) && !oob)
 				ret = chip->ecc.read_subpage(mtd, chip, col, bytes, bufpoi);
 			else
-				ret = chip->ecc.read_page(mtd, chip, bufpoi);
+				ret = chip->ecc.read_page(mtd, chip, bufpoi,
+							  page);
 			if (ret < 0)
 				break;
 
@@ -1666,8 +1670,10 @@  static uint8_t *nand_fill_oob(struct nand_chip *chip, uint8_t *oob,
 		struct nand_oobfree *free = chip->ecc.layout->oobfree;
 		uint32_t boffs = 0, woffs = ops->ooboffs;
 		size_t bytes = 0;
+		unsigned int i;
 
-		for(; free->length && len; free++, len -= bytes) {
+		for (i = 0; i < chip->ecc.steps && i < MTD_MAX_OOBFREE_ENTRIES
+		     && free->length && len; i++, free++, len -= bytes) {
 			/* Write request not from offset 0 ? */
 			if (unlikely(woffs)) {
 				if (woffs >= free->length) {
@@ -2651,16 +2657,6 @@  int nand_scan_tail(struct mtd_info *mtd)
 	}
 
 	/*
-	 * The number of bytes available for a client to place data into
-	 * the out of band area
-	 */
-	chip->ecc.layout->oobavail = 0;
-	for (i = 0; chip->ecc.layout->oobfree[i].length; i++)
-		chip->ecc.layout->oobavail +=
-			chip->ecc.layout->oobfree[i].length;
-	mtd->oobavail = chip->ecc.layout->oobavail;
-
-	/*
 	 * Set the number of read / write steps for one page depending on ECC
 	 * mode
 	 */
@@ -2672,6 +2668,17 @@  int nand_scan_tail(struct mtd_info *mtd)
 	chip->ecc.total = chip->ecc.steps * chip->ecc.bytes;
 
 	/*
+	 * The number of bytes available for a client to place data into
+	 * the out of band area
+	 */
+	chip->ecc.layout->oobavail = 0;
+	for (i = 0; i < chip->ecc.steps && i < MTD_MAX_OOBFREE_ENTRIES &&
+	     chip->ecc.layout->oobfree[i].length; i++)
+		chip->ecc.layout->oobavail +=
+			chip->ecc.layout->oobfree[i].length;
+	mtd->oobavail = chip->ecc.layout->oobavail;
+
+	/*
 	 * Allow subpage writes up to ecc.steps. Not possible for MLC
 	 * FLASH.
 	 */
diff --git a/drivers/mtd/nand/sh_flctl.c b/drivers/mtd/nand/sh_flctl.c
index 821acb0..77e74a3 100644
--- a/drivers/mtd/nand/sh_flctl.c
+++ b/drivers/mtd/nand/sh_flctl.c
@@ -324,7 +324,7 @@  static void set_cmd_regs(struct mtd_info *mtd, uint32_t cmd, uint32_t flcmcdr_va
 }
 
 static int flctl_read_page_hwecc(struct mtd_info *mtd, struct nand_chip *chip,
-				uint8_t *buf)
+				uint8_t *buf, int page)
 {
 	int i, eccsize = chip->ecc.size;
 	int eccbytes = chip->ecc.bytes;
diff --git a/include/linux/mtd/nand.h b/include/linux/mtd/nand.h
index 27fb694..63ed0eb 100644
--- a/include/linux/mtd/nand.h
+++ b/include/linux/mtd/nand.h
@@ -43,8 +43,8 @@  extern void nand_wait_ready(struct mtd_info *mtd);
  * is supported now. If you add a chip with bigger oobsize/page
  * adjust this accordingly.
  */
-#define NAND_MAX_OOBSIZE	64
-#define NAND_MAX_PAGESIZE	2048
+#define NAND_MAX_OOBSIZE	128
+#define NAND_MAX_PAGESIZE	4096
 
 /*
  * Constants for hardware specific CLE/ALE/NCE function
@@ -278,13 +278,13 @@  struct nand_ecc_ctrl {
 					   uint8_t *calc_ecc);
 	int			(*read_page_raw)(struct mtd_info *mtd,
 						 struct nand_chip *chip,
-						 uint8_t *buf);
+						 uint8_t *buf, int page);
 	void			(*write_page_raw)(struct mtd_info *mtd,
 						  struct nand_chip *chip,
 						  const uint8_t *buf);
 	int			(*read_page)(struct mtd_info *mtd,
 					     struct nand_chip *chip,
-					     uint8_t *buf);
+					     uint8_t *buf, int page);
 	int			(*read_subpage)(struct mtd_info *mtd,
 					     struct nand_chip *chip,
 					     uint32_t offs, uint32_t len,