diff mbox

[v2,3/4] mtd: nand: support Micron READ RETRY

Message ID 1386619753-27613-3-git-send-email-computersforpeace@gmail.com
State Superseded, archived
Headers show

Commit Message

Brian Norris Dec. 9, 2013, 8:09 p.m. UTC
MLC NAND can experience a large number of bitflips (beyond the
recommended correctability capacity) due to drifts in the voltage
threshold (Vt). These bitflips can cause ECC errors to occur well within
the expected lifetime of the flash. To account for this, some
manufacturers provide a mechanism for shifting the Vt threshold after a
corrupted read. Micron provides the necessary information via the ONFI
vendor-specific parameter block (to indicate how many read-retry modes
are available) and the ONFI {GET,SET}_FEATURES commands with a
vendor-specific feature address (to support reading/switching the
current read-retry mode).

The recommmended sequence is as follows:

  1. Perform PAGE_READ operation
  2. If no ECC error, we are done
  3. Run SET_FEATURES with feature address 89h, mode 1
  4. Retry PAGE_READ operation
  5. If ECC error and there are remaining supported modes, increment the
     mode and return to step 3. Otherwise, this is a true ECC error.
  6. Run SET_FEATURES with feature address 89h, mode 0, to return to the
     default state.

Tested on Micron MT29F32G08CBADA, suppors 8 read-retry modes.

Signed-off-by: Brian Norris <computersforpeace@gmail.com>
---
v1 -> v2: fix a logic error when incrementing retry_mode, which caused -EINVAL
          failures on flash that didn't need READ RETRY

One thing I'm not sure about: do all relevant (i.e., ONFI-capable) NAND drivers
support SET_FEATURES properly? If not, then it's possible that this could break
nand_do_read_ops for such drivers. Not sure what the best method of handling
that would be.

 drivers/mtd/nand/nand_base.c | 71 +++++++++++++++++++++++++++++++++++++++++---
 include/linux/mtd/nand.h     |  6 ++++
 2 files changed, 73 insertions(+), 4 deletions(-)

Comments

Huang Shijie Dec. 11, 2013, 1:54 p.m. UTC | #1
On Mon, Dec 09, 2013 at 12:09:12PM -0800, Brian Norris wrote:
> + * nand_set_read_retry - [INTERN] Set the READ RETRY mode
> + * @mtd: MTD device structure
> + * @retry_mode: the retry mode to use
> + *
> + * Some vendors supply a special command to shift the Vt threshold, to be used
> + * when there are too many bitflips in a page (i.e., ECC error). After setting
> + * a new threshold, the host should retry reading the page.
> + */
> +static int nand_set_read_retry(struct mtd_info *mtd, int retry_mode)
> +{
> +	struct nand_chip *chip = mtd->priv;
> +	uint8_t feature[ONFI_SUBFEATURE_PARAM_LEN] = {retry_mode};
> +
This can cause a DMA warning.

> +	if (retry_mode >= chip->read_retries)
> +		return -EINVAL;
> +
> +	if (chip->onfi_params.jedec_id == NAND_MFR_MICRON)
> +		return chip->onfi_set_features(mtd, chip,
> +				ONFI_FEATURE_ADDR_READ_RETRY, feature);
I suggest to add a hook such as for nand_chip{}:
	chip->read_retry(..)

Different nand chips fill different hook.

For Micron, fill it with micron_read_retry();
for Toshiba, fill it with a toshiab_read_retry();
For Hynix, fill it with hynix_read_retry().

I am wondar if we should add a file for the read-retry in the
drivers/mtd/nand folder.


thanks
Huang Shijie
Brian Norris Dec. 11, 2013, 7:03 p.m. UTC | #2
On Wed, Dec 11, 2013 at 09:54:54PM +0800, Huang Shijie wrote:
> On Mon, Dec 09, 2013 at 12:09:12PM -0800, Brian Norris wrote:
> > + * nand_set_read_retry - [INTERN] Set the READ RETRY mode
> > + * @mtd: MTD device structure
> > + * @retry_mode: the retry mode to use
> > + *
> > + * Some vendors supply a special command to shift the Vt threshold, to be used
> > + * when there are too many bitflips in a page (i.e., ECC error). After setting
> > + * a new threshold, the host should retry reading the page.
> > + */
> > +static int nand_set_read_retry(struct mtd_info *mtd, int retry_mode)
> > +{
> > +	struct nand_chip *chip = mtd->priv;
> > +	uint8_t feature[ONFI_SUBFEATURE_PARAM_LEN] = {retry_mode};
> > +
> This can cause a DMA warning.

...on GPMI NAND, but not on most (any?) other drivers. Why does GPMI try
to use DMA on *every* operation? That doesn't even make sense for a 4 or
5 byte transfer. Plus, we don't give a guarantee that buffers will be
DMA-able in MTD (UBI uses vmalloc() buffers, for instance), so I'm sure
you'll hit problems other places. I can fix this one (use
chip->buffers->databuf instead?) but I think GPMI is a bad citizen in
this regard, given that you have no guarantee. You need to fix MTD
systematically if you expect this guarantee.

(For one, nand_default_block_markbad() uses stack-allocated buffers; but
I see that GPMI overrides this callback.)

> > +	if (retry_mode >= chip->read_retries)
> > +		return -EINVAL;
> > +
> > +	if (chip->onfi_params.jedec_id == NAND_MFR_MICRON)
> > +		return chip->onfi_set_features(mtd, chip,
> > +				ONFI_FEATURE_ADDR_READ_RETRY, feature);
> I suggest to add a hook such as for nand_chip{}:
> 	chip->read_retry(..)
> 
> Different nand chips fill different hook.

OK, that can make sense.

> For Micron, fill it with micron_read_retry();
> for Toshiba, fill it with a toshiab_read_retry();
> For Hynix, fill it with hynix_read_retry().

Do you have info on Toshiba/Hynix read retry? I'd be interested to know
what their "features" look like. Do they have a similar set of supported
retry modes, where we have to cycle through each mode and retry? I'm
interested in whether the nand_chip.read_retries field is useful for the
others.

Also, what would a 'read_retry()' callback look like? What are its
parameters? Right now, it's just the struct mtd_info + an incrementing
integer, from 0 to chip->read_retries-1.

> I am wondar if we should add a file for the read-retry in the
> drivers/mtd/nand folder.

TBH, I'm not really very happy to be opening the read-retry can of
worms. From what I hear, read-retry is just the first step down a path
of vendor-specific detail, which ends with you re-implementing the FTL
that these vendors develop for their SSDs.

But anyway, no I don't think read-retry warrants its own file, at least
not yet. I'd need to understand where it's heading before we decide that
it needs this kind of separation.

I do think, though, that it may be worth moving some more of the
vendor-specific stuff (like the identification code,
nand_get_flash_type(), + read-retry callbacks) into a new file. Maybe
nand_vendors.c? nand_ident.c? But I think it needs a bit of cleanup
first; untangling the auto-buswidth stuff that Pekon, Ezequiel and I
have been dealing with might be a start.

Brian
Ezequiel Garcia Dec. 11, 2013, 8:31 p.m. UTC | #3
On Wed, Dec 11, 2013 at 11:03:13AM -0800, Brian Norris wrote:
> On Wed, Dec 11, 2013 at 09:54:54PM +0800, Huang Shijie wrote:
> > On Mon, Dec 09, 2013 at 12:09:12PM -0800, Brian Norris wrote:
> > > + * nand_set_read_retry - [INTERN] Set the READ RETRY mode
> > > + * @mtd: MTD device structure
> > > + * @retry_mode: the retry mode to use
> > > + *
> > > + * Some vendors supply a special command to shift the Vt threshold, to be used
> > > + * when there are too many bitflips in a page (i.e., ECC error). After setting
> > > + * a new threshold, the host should retry reading the page.
> > > + */
> > > +static int nand_set_read_retry(struct mtd_info *mtd, int retry_mode)
> > > +{
> > > +	struct nand_chip *chip = mtd->priv;
> > > +	uint8_t feature[ONFI_SUBFEATURE_PARAM_LEN] = {retry_mode};
> > > +
> > This can cause a DMA warning.
> 
> ...on GPMI NAND, but not on most (any?) other drivers. Why does GPMI try
> to use DMA on *every* operation? That doesn't even make sense for a 4 or
> 5 byte transfer. Plus, we don't give a guarantee that buffers will be
> DMA-able in MTD (UBI uses vmalloc() buffers, for instance), so I'm sure
> you'll hit problems other places. I can fix this one (use
> chip->buffers->databuf instead?) but I think GPMI is a bad citizen in
> this regard, given that you have no guarantee. You need to fix MTD
> systematically if you expect this guarantee.
> 
> (For one, nand_default_block_markbad() uses stack-allocated buffers; but
> I see that GPMI overrides this callback.)
> 

Isn't completely bloated to setup a DMA operation just to transfer a
bunch of bytes? Wouldn't that seriously hurt performance?

Or maybe it doesn't matter because those commands are scarcely
invoked...
Huang Shijie Dec. 12, 2013, 3:47 a.m. UTC | #4
On Wed, Dec 11, 2013 at 11:03:13AM -0800, Brian Norris wrote:
> On Wed, Dec 11, 2013 at 09:54:54PM +0800, Huang Shijie wrote:
> > On Mon, Dec 09, 2013 at 12:09:12PM -0800, Brian Norris wrote:
> > > + * nand_set_read_retry - [INTERN] Set the READ RETRY mode
> > > + * @mtd: MTD device structure
> > > + * @retry_mode: the retry mode to use
> > > + *
> > > + * Some vendors supply a special command to shift the Vt threshold, to be used
> > > + * when there are too many bitflips in a page (i.e., ECC error). After setting
> > > + * a new threshold, the host should retry reading the page.
> > > + */
> > > +static int nand_set_read_retry(struct mtd_info *mtd, int retry_mode)
> > > +{
> > > +	struct nand_chip *chip = mtd->priv;
> > > +	uint8_t feature[ONFI_SUBFEATURE_PARAM_LEN] = {retry_mode};
> > > +
> > This can cause a DMA warning.
> 
> ...on GPMI NAND, but not on most (any?) other drivers. Why does GPMI try
> to use DMA on *every* operation? That doesn't even make sense for a 4 or
> 5 byte transfer. Plus, we don't give a guarantee that buffers will be
> DMA-able in MTD (UBI uses vmalloc() buffers, for instance), so I'm sure
> you'll hit problems other places. I can fix this one (use
> chip->buffers->databuf instead?) but I think GPMI is a bad citizen in
> this regard, given that you have no guarantee. You need to fix MTD
> systematically if you expect this guarantee.
You do not need to change this code, if you like it.

The DMA warning only occurs when we enable the CONFIG_DMA_API_DEBUG.


> 
> (For one, nand_default_block_markbad() uses stack-allocated buffers; but
> I see that GPMI overrides this callback.)
> 
> > > +	if (retry_mode >= chip->read_retries)
> > > +		return -EINVAL;
> > > +
> > > +	if (chip->onfi_params.jedec_id == NAND_MFR_MICRON)
> > > +		return chip->onfi_set_features(mtd, chip,
> > > +				ONFI_FEATURE_ADDR_READ_RETRY, feature);
> > I suggest to add a hook such as for nand_chip{}:
> > 	chip->read_retry(..)
> > 
> > Different nand chips fill different hook.
> 
> OK, that can make sense.
> 
> > For Micron, fill it with micron_read_retry();
> > for Toshiba, fill it with a toshiab_read_retry();
> > For Hynix, fill it with hynix_read_retry().
> 
> Do you have info on Toshiba/Hynix read retry? I'd be interested to know
> what their "features" look like. Do they have a similar set of supported
> retry modes, where we have to cycle through each mode and retry? I'm
> interested in whether the nand_chip.read_retries field is useful for the
> others.

i will send you the documents when i back office.
Different nand chips use different read-retry methods. A headache to us.

The common idea between the read-retry methods is:
  "we have to cycle through each mode and retry"

So the chip.read_retries is used by others too.

I ever implemented the Micron's read-retry in my local code.
I planned to implement others, but i was intterrupted by high priority
jobs.

> 
> Also, what would a 'read_retry()' callback look like? What are its
> parameters? Right now, it's just the struct mtd_info + an incrementing
> integer, from 0 to chip->read_retries-1.

I think the parameters are good now.

> 
> > I am wondar if we should add a file for the read-retry in the
> > drivers/mtd/nand folder.
> 
> TBH, I'm not really very happy to be opening the read-retry can of
> worms. From what I hear, read-retry is just the first step down a path

it is ok to me if you do not like to create a new file, that is just a
suggestion.


After the nand chip use <19nm, afaik, all the NAND chips needs the
read-retry. We have to implement the read-retry for all of them.
The nightmare is coming.

thanks
Huang Shijie
Huang Shijie Dec. 12, 2013, 3:49 a.m. UTC | #5
On Wed, Dec 11, 2013 at 05:31:12PM -0300, Ezequiel Garcia wrote:
> Isn't completely bloated to setup a DMA operation just to transfer a
> bunch of bytes? Wouldn't that seriously hurt performance?
> 
> Or maybe it doesn't matter because those commands are scarcely
> invoked...
As you said, those commands are scarcely invoked.

thanks
Huang Shijie
Huang Shijie Dec. 17, 2013, 4:23 a.m. UTC | #6
On Mon, Dec 16, 2013 at 08:43:42PM -0800, Brian Norris wrote:
> > 
> > The DMA warning only occurs when we enable the CONFIG_DMA_API_DEBUG.
> 
> But the warning has a significant purpose: it is *not* legal to perform
> DMA on stack memory, and no driver should do this. Either nand_base or
> gpmi-nand need to be fixed here.
> 
> But I think it's the GPMI driver is the one that needs fixing, not this
> patch. Are there any scenarios where it makes sense for gpmi-nand to use
> DMA for its read_buf/write_buf? Shouldn't the only DMA-necessary
> operations come through the ecc.write_page and ecc.read_page callbacks?
> If I'm correct, then gpmi-nand should not be using DMA at all in
> read_buf/write_buf.
Currently, all the operations, including the read_buf/write_buf, use
the DMA.

Is there any ABI tells us we should not use the DMA for read_buf/write_buf? :)


> 
> > Different nand chips use different read-retry methods. A headache to us.
> 
> Any chance we can push these vendors to implement things through a
> standard, like JEDEC? I believe there's a JEDEC parameter page standard
we do not have such influence to push these vendors.
they will not listen to us. :(

> now, though I haven't looked at it closely. Micron's support is nice and
> simple because it properly uses the ONFI vendor block. Other vendors
> should take note...
yes. Micron is good example.


thanks
Huang Shijie
Brian Norris Dec. 17, 2013, 4:43 a.m. UTC | #7
On Thu, Dec 12, 2013 at 11:47:52AM +0800, Huang Shijie wrote:
> On Wed, Dec 11, 2013 at 11:03:13AM -0800, Brian Norris wrote:
> > On Wed, Dec 11, 2013 at 09:54:54PM +0800, Huang Shijie wrote:
> > > On Mon, Dec 09, 2013 at 12:09:12PM -0800, Brian Norris wrote:
> > > > +	uint8_t feature[ONFI_SUBFEATURE_PARAM_LEN] = {retry_mode};
> > > > +
> > > This can cause a DMA warning.
> > 
> > ...on GPMI NAND, but not on most (any?) other drivers. Why does GPMI try
> > to use DMA on *every* operation? That doesn't even make sense for a 4 or
> > 5 byte transfer. Plus, we don't give a guarantee that buffers will be
> > DMA-able in MTD (UBI uses vmalloc() buffers, for instance), so I'm sure
> > you'll hit problems other places. I can fix this one (use
> > chip->buffers->databuf instead?) but I think GPMI is a bad citizen in
> > this regard, given that you have no guarantee. You need to fix MTD
> > systematically if you expect this guarantee.
> You do not need to change this code, if you like it.
> 
> The DMA warning only occurs when we enable the CONFIG_DMA_API_DEBUG.

But the warning has a significant purpose: it is *not* legal to perform
DMA on stack memory, and no driver should do this. Either nand_base or
gpmi-nand need to be fixed here.

But I think it's the GPMI driver is the one that needs fixing, not this
patch. Are there any scenarios where it makes sense for gpmi-nand to use
DMA for its read_buf/write_buf? Shouldn't the only DMA-necessary
operations come through the ecc.write_page and ecc.read_page callbacks?
If I'm correct, then gpmi-nand should not be using DMA at all in
read_buf/write_buf.

> Different nand chips use different read-retry methods. A headache to us.

Any chance we can push these vendors to implement things through a
standard, like JEDEC? I believe there's a JEDEC parameter page standard
now, though I haven't looked at it closely. Micron's support is nice and
simple because it properly uses the ONFI vendor block. Other vendors
should take note...

> The common idea between the read-retry methods is:
>   "we have to cycle through each mode and retry"
> 
> So the chip.read_retries is used by others too.

Good to know.

> After the nand chip use <19nm, afaik, all the NAND chips needs the
> read-retry. We have to implement the read-retry for all of them.
> The nightmare is coming.

*cue the dark organ music*

Sigh. I might have to abandon MTD for eMMC some day ;)

Brian
Brian Norris Dec. 17, 2013, 5:06 a.m. UTC | #8
On Tue, Dec 17, 2013 at 12:23:08PM +0800, Huang Shijie wrote:
> On Mon, Dec 16, 2013 at 08:43:42PM -0800, Brian Norris wrote:
> > > 
> > > The DMA warning only occurs when we enable the CONFIG_DMA_API_DEBUG.
> > 
> > But the warning has a significant purpose: it is *not* legal to perform
> > DMA on stack memory, and no driver should do this. Either nand_base or
> > gpmi-nand need to be fixed here.
> > 
> > But I think it's the GPMI driver is the one that needs fixing, not this
> > patch. Are there any scenarios where it makes sense for gpmi-nand to use
> > DMA for its read_buf/write_buf? Shouldn't the only DMA-necessary
> > operations come through the ecc.write_page and ecc.read_page callbacks?
> > If I'm correct, then gpmi-nand should not be using DMA at all in
> > read_buf/write_buf.
> Currently, all the operations, including the read_buf/write_buf, use
> the DMA.
> 
> Is there any ABI tells us we should not use the DMA for read_buf/write_buf? :)

No... but this is not about ABI, it's about designing a reasonable API
and abstraction between users (NAND drivers) and the helper code
(nand_base). MTD never had a good design for DMA-able operations; but I
think that it's at least an approachable problem if we provide
limitations on when and where we might expect to use DMA.

Particularly, I think it is quite reasonable to restrict DMA to
mtd_read()/mtd_write()-like operations -- so this trickles down to
read_page/write_page, but not to device configuration operations like
ONFI set_features/get_features/read parameter tables, etc.

Do you disagree? Do you see some need to perform DMA on all operations?

> > > Different nand chips use different read-retry methods. A headache to us.
> > 
> > Any chance we can push these vendors to implement things through a
> > standard, like JEDEC? I believe there's a JEDEC parameter page standard
> we do not have such influence to push these vendors.
> they will not listen to us. :(

We can try! Strength in numbers! BTW, last time I brought up ONFI w/ a
Toshiba representative, I think I offended them :) But at least I had
their ear for a moment.

Anyway, it may be more reasonable to suggest a JEDEC standard, as I
think they felt JEDEC was more vendor-independent, and they were at
least open to entertaining the idea.

Brian
Huang Shijie Dec. 17, 2013, 5:11 a.m. UTC | #9
On Mon, Dec 16, 2013 at 09:06:13PM -0800, Brian Norris wrote:
> On Tue, Dec 17, 2013 at 12:23:08PM +0800, Huang Shijie wrote:
> > On Mon, Dec 16, 2013 at 08:43:42PM -0800, Brian Norris wrote:
> > > > 
> > > > The DMA warning only occurs when we enable the CONFIG_DMA_API_DEBUG.
> Particularly, I think it is quite reasonable to restrict DMA to
> mtd_read()/mtd_write()-like operations -- so this trickles down to
> read_page/write_page, but not to device configuration operations like
> ONFI set_features/get_features/read parameter tables, etc.
> 
> Do you disagree? Do you see some need to perform DMA on all operations?

The gpmi is much coupled with the MXS-DMA engine. the gpmi can uses the
mxs-dma to set the gpmi registers.

Use the DMA makes the GPMI code very simple, it really makes the code very
_ugly_ if we do not use the DMA for read_buf/write_buf.

If you insist that it is okay to use the local array in the nand_base.c,
I can use kmalloc/memcpy in the gpmi's read_buf/write_buf hooks. In such a 
way, we can make the gpmi looks more normal.

sorry, I really think it is no need to force the gpmi do not use the DMA
for read_buf/write_buf.

> 
> > > > Different nand chips use different read-retry methods. A headache to us.
> > > 
> > > Any chance we can push these vendors to implement things through a
> > > standard, like JEDEC? I believe there's a JEDEC parameter page standard
> > we do not have such influence to push these vendors.
> > they will not listen to us. :(
> 
> We can try! Strength in numbers! BTW, last time I brought up ONFI w/ a
> Toshiba representative, I think I offended them :) But at least I had
> their ear for a moment.
> 
look good to me. :)

I will suggest their FAE/AE when i meet them next time.

thanks
Huang Shijie
Brian Norris Dec. 17, 2013, 7:01 a.m. UTC | #10
On Tue, Dec 17, 2013 at 01:11:25PM +0800, Huang Shijie wrote:
> On Mon, Dec 16, 2013 at 09:06:13PM -0800, Brian Norris wrote:
> > On Tue, Dec 17, 2013 at 12:23:08PM +0800, Huang Shijie wrote:
> > > On Mon, Dec 16, 2013 at 08:43:42PM -0800, Brian Norris wrote:
> > > > > 
> > > > > The DMA warning only occurs when we enable the CONFIG_DMA_API_DEBUG.
> > Particularly, I think it is quite reasonable to restrict DMA to
> > mtd_read()/mtd_write()-like operations -- so this trickles down to
> > read_page/write_page, but not to device configuration operations like
> > ONFI set_features/get_features/read parameter tables, etc.
> > 
> > Do you disagree? Do you see some need to perform DMA on all operations?
> 
> The gpmi is much coupled with the MXS-DMA engine. the gpmi can uses the
> mxs-dma to set the gpmi registers.
> 
> Use the DMA makes the GPMI code very simple, it really makes the code very
> _ugly_ if we do not use the DMA for read_buf/write_buf.

It's not just about ugliness, it's about maintainability. I can clean up
a few cases here and there that don't conform to your driver's needs now
(notice I pointed out another instance of this in Uwe's patch), but
unless we establish better guidelines for NAND, this is fragile.

As a diversion, let's look at atmel_nand for an example. It uses DMA as
well, but it dodges some of your problems using a heuristic, so that it
only uses DMA for larger-than-OOB-size transactions (see
atmel_read_buf). Additionally, it has a hack-y check for highmem, to try
to confirm that the region it's trying to map is, in fact, DMA-able.

So, atmel_nand does not have an ideal solution, but it at least attempts
to solve parts of this problem. GPMI, on the other hand, ignores it
entirely.

> If you insist that it is okay to use the local array in the nand_base.c,
> I can use kmalloc/memcpy in the gpmi's read_buf/write_buf hooks. In such a 
> way, we can make the gpmi looks more normal.

I think a bounce buffer is necessary, if you really want to use DMA all
the time; some transactions are just not worth initializing DMA-able
buffers in the high-level code. But even better, can GPMI easily utilize
some sort of PIO mode for small/slow-path transactions like this, or is
it inextricably tied to the MXS-DMA engine?

Brian
Huang Shijie Dec. 17, 2013, 7:12 a.m. UTC | #11
On Mon, Dec 16, 2013 at 11:01:03PM -0800, Brian Norris wrote:
> On Tue, Dec 17, 2013 at 01:11:25PM +0800, Huang Shijie wrote:
> > On Mon, Dec 16, 2013 at 09:06:13PM -0800, Brian Norris wrote:
> > > On Tue, Dec 17, 2013 at 12:23:08PM +0800, Huang Shijie wrote:
> > > > On Mon, Dec 16, 2013 at 08:43:42PM -0800, Brian Norris wrote:
> > > > > > 
> > > > > > The DMA warning only occurs when we enable the CONFIG_DMA_API_DEBUG.
> > > Particularly, I think it is quite reasonable to restrict DMA to
> > > mtd_read()/mtd_write()-like operations -- so this trickles down to
> > > read_page/write_page, but not to device configuration operations like
> > > ONFI set_features/get_features/read parameter tables, etc.
> > > 
> > > Do you disagree? Do you see some need to perform DMA on all operations?
> > 
> > The gpmi is much coupled with the MXS-DMA engine. the gpmi can uses the
> > mxs-dma to set the gpmi registers.
> > 
> > Use the DMA makes the GPMI code very simple, it really makes the code very
> > _ugly_ if we do not use the DMA for read_buf/write_buf.
> 
> It's not just about ugliness, it's about maintainability. I can clean up
> a few cases here and there that don't conform to your driver's needs now
> (notice I pointed out another instance of this in Uwe's patch), but
> unless we establish better guidelines for NAND, this is fragile.
> 
> As a diversion, let's look at atmel_nand for an example. It uses DMA as
> well, but it dodges some of your problems using a heuristic, so that it
> only uses DMA for larger-than-OOB-size transactions (see
> atmel_read_buf). Additionally, it has a hack-y check for highmem, to try
> to confirm that the region it's trying to map is, in fact, DMA-able.
> 
> So, atmel_nand does not have an ideal solution, but it at least attempts
> to solve parts of this problem. GPMI, on the other hand, ignores it
> entirely.

ok.

i will fix it by a new patch to add more checks for the read_buf/write_buf.


> 
> > If you insist that it is okay to use the local array in the nand_base.c,
> > I can use kmalloc/memcpy in the gpmi's read_buf/write_buf hooks. In such a 
> > way, we can make the gpmi looks more normal.
> 
> I think a bounce buffer is necessary, if you really want to use DMA all
> the time; some transactions are just not worth initializing DMA-able
> buffers in the high-level code. But even better, can GPMI easily utilize
> some sort of PIO mode for small/slow-path transactions like this, or is
yes, the gpmi can utilize the PIO mode for the transactions.


> it inextricably tied to the MXS-DMA engine?

You can think the mxs-dma is designed for GPMI.


thanks
Huang Shijie
diff mbox

Patch

diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c
index e85b07f4293d..f7ac2bfb3445 100644
--- a/drivers/mtd/nand/nand_base.c
+++ b/drivers/mtd/nand/nand_base.c
@@ -1410,6 +1410,30 @@  static uint8_t *nand_transfer_oob(struct nand_chip *chip, uint8_t *oob,
 }
 
 /**
+ * nand_set_read_retry - [INTERN] Set the READ RETRY mode
+ * @mtd: MTD device structure
+ * @retry_mode: the retry mode to use
+ *
+ * Some vendors supply a special command to shift the Vt threshold, to be used
+ * when there are too many bitflips in a page (i.e., ECC error). After setting
+ * a new threshold, the host should retry reading the page.
+ */
+static int nand_set_read_retry(struct mtd_info *mtd, int retry_mode)
+{
+	struct nand_chip *chip = mtd->priv;
+	uint8_t feature[ONFI_SUBFEATURE_PARAM_LEN] = {retry_mode};
+
+	if (retry_mode >= chip->read_retries)
+		return -EINVAL;
+
+	if (chip->onfi_params.jedec_id == NAND_MFR_MICRON)
+		return chip->onfi_set_features(mtd, chip,
+				ONFI_FEATURE_ADDR_READ_RETRY, feature);
+
+	return -EOPNOTSUPP;
+}
+
+/**
  * nand_do_read_ops - [INTERN] Read data with ECC
  * @mtd: MTD device structure
  * @from: offset to read from
@@ -1431,6 +1455,7 @@  static int nand_do_read_ops(struct mtd_info *mtd, loff_t from,
 	uint8_t *bufpoi, *oob, *buf;
 	unsigned int max_bitflips = 0;
 
+	int retry_mode = 0;
 	bool ecc_fail = false;
 
 	chipnr = (int)(from >> chip->chip_shift);
@@ -1494,8 +1519,6 @@  static int nand_do_read_ops(struct mtd_info *mtd, loff_t from,
 				memcpy(buf, chip->buffers->databuf + col, bytes);
 			}
 
-			buf += bytes;
-
 			if (unlikely(oob)) {
 				int toread = min(oobreadlen, max_oobsize);
 
@@ -1514,8 +1537,27 @@  static int nand_do_read_ops(struct mtd_info *mtd, loff_t from,
 					nand_wait_ready(mtd);
 			}
 
-			if (mtd->ecc_stats.failed - ecc_failures)
-				ecc_fail = true;
+			if (mtd->ecc_stats.failed - ecc_failures) {
+				if (retry_mode + 1 < chip->read_retries) {
+					retry_mode++;
+					pr_debug("ECC error; performing READ RETRY %d\n",
+							retry_mode);
+
+					ret = nand_set_read_retry(mtd,
+							retry_mode);
+					if (ret < 0)
+						break;
+
+					/* Reset failures */
+					mtd->ecc_stats.failed = ecc_failures;
+					continue;
+				} else {
+					/* No more retry modes; real failure */
+					ecc_fail = true;
+				}
+			}
+
+			buf += bytes;
 		} else {
 			memcpy(buf, chip->buffers->databuf + col, bytes);
 			buf += bytes;
@@ -1525,6 +1567,14 @@  static int nand_do_read_ops(struct mtd_info *mtd, loff_t from,
 
 		readlen -= bytes;
 
+		/* Reset to retry mode 0 */
+		if (retry_mode) {
+			ret = nand_set_read_retry(mtd, 0);
+			if (ret < 0)
+				break;
+			retry_mode = 0;
+		}
+
 		if (!readlen)
 			break;
 
@@ -2932,6 +2982,16 @@  ext_out:
 }
 
 /*
+ * Configure chip properties from Micron vendor-specific ONFI table
+ */
+static void nand_onfi_detect_micron(struct nand_chip *chip,
+		struct nand_onfi_params *p)
+{
+	struct nand_onfi_vendor_micron *micron = (void *)p->vendor;
+	chip->read_retries = micron->read_retry_options;
+}
+
+/*
  * Check if the NAND chip is ONFI compliant, returns 1 if it is, 0 otherwise.
  */
 static int nand_flash_detect_onfi(struct mtd_info *mtd, struct nand_chip *chip,
@@ -3037,6 +3097,9 @@  static int nand_flash_detect_onfi(struct mtd_info *mtd, struct nand_chip *chip,
 		pr_warn("Could not retrieve ONFI ECC requirements\n");
 	}
 
+	if (p->jedec_id == NAND_MFR_MICRON)
+		nand_onfi_detect_micron(chip, p);
+
 	return 1;
 }
 
diff --git a/include/linux/mtd/nand.h b/include/linux/mtd/nand.h
index 029fe5948dc4..6e579e90955e 100644
--- a/include/linux/mtd/nand.h
+++ b/include/linux/mtd/nand.h
@@ -219,6 +219,9 @@  struct nand_chip;
 /* ONFI feature address */
 #define ONFI_FEATURE_ADDR_TIMING_MODE	0x1
 
+/* Vendor-specific feature address (Micron) */
+#define ONFI_FEATURE_ADDR_READ_RETRY	0x89
+
 /* ONFI subfeature parameters length */
 #define ONFI_SUBFEATURE_PARAM_LEN	4
 
@@ -518,6 +521,7 @@  struct nand_buffers {
  *			non 0 if ONFI supported.
  * @onfi_params:	[INTERN] holds the ONFI page parameter when ONFI is
  *			supported, 0 otherwise.
+ * @read_retries:	[INTERN] the number of read retry modes supported
  * @onfi_set_features:	[REPLACEABLE] set the features for ONFI nand
  * @onfi_get_features:	[REPLACEABLE] get the features for ONFI nand
  * @bbt:		[INTERN] bad block table pointer
@@ -589,6 +593,8 @@  struct nand_chip {
 	int onfi_version;
 	struct nand_onfi_params	onfi_params;
 
+	int read_retries;
+
 	flstate_t state;
 
 	uint8_t *oob_poi;