[1/3] mtd: nand: Add on-die ECC support

Message ID	1427292151-3835-2-git-send-email-richard@nod.at
State	Rejected
Headers	show Return-Path: <linux-mtd-bounces+incoming=patchwork.ozlabs.org@lists.infradead.org> From: Richard Weinberger <richard@nod.at> To: linux-mtd@lists.infradead.org Subject: [PATCH 1/3] mtd: nand: Add on-die ECC support Date: Wed, 25 Mar 2015 15:02:29 +0100 Message-Id: <1427292151-3835-2-git-send-email-richard@nod.at> In-Reply-To: <1427292151-3835-1-git-send-email-richard@nod.at> References: <1427292151-3835-1-git-send-email-richard@nod.at> summary: Content analysis details: (-0.0 points) pts rule name description ---- ---------------------- -------------------------------------------------- -0.0 SPF_PASS SPF: sender matches SPF record Cc: Richard Weinberger <richard@nod.at>, computersforpeace@gmail.com, dwmw2@infradead.org, linux-kernel@vger.kernel.org Precedence: list MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Sender: "linux-mtd" <linux-mtd-bounces@lists.infradead.org> Errors-To: linux-mtd-bounces+incoming=patchwork.ozlabs.org@lists.infradead.org

Richard Weinberger March 25, 2015, 2:02 p.m. UTC

Some Micron NAND chips offer an on-die ECC (AKA internal ECC)
feature. It is useful when the host-platform does not offer
multi-bit ECC and software ECC is not feasible.

Based on original work by David Mosberger <davidm@egauge.net>

Signed-off-by: Richard Weinberger <richard@nod.at>
---
 drivers/mtd/nand/nand_base.c   |  66 +++++++++-
 drivers/mtd/nand/nand_ondie.c  | 266 +++++++++++++++++++++++++++++++++++++++++
 include/linux/mtd/nand.h       |   6 +
 include/linux/mtd/nand_ondie.h |  40 +++++++
 4 files changed, 374 insertions(+), 4 deletions(-)
 create mode 100644 drivers/mtd/nand/nand_ondie.c
 create mode 100644 include/linux/mtd/nand_ondie.h

Paul Bolle March 25, 2015, 8:39 p.m. UTC | #1

Probably not really the feedback you're hoping for at this moment, this
being an RFC, but I spotted two nits anyway.

On Wed, 2015-03-25 at 15:02 +0100, Richard Weinberger wrote:
> --- /dev/null
> +++ b/drivers/mtd/nand/nand_ondie.c

This file makes git am whine: "new blank line at EOF".

> +#include <linux/types.h>
> +#include <linux/kernel.h>
> +#include <linux/module.h>

Is this include needed? The code can only be built in. This files also
builds silently without that include. I used the rather hacky
    make EXTRA_CFLAGS="-DCONFIG_MTD_NAND_ECC_ON_DIE" drivers/mtd/nand/nand_ondie.o

to test the build.

> +#include <linux/slab.h>
> +#include <linux/bitops.h>
> +#include <linux/mtd/mtd.h>
> +#include <linux/mtd/nand.h>
> +#include <linux/mtd/nand_ondie.h>


Paul Bolle

Ben Shelton April 27, 2015, 9:35 p.m. UTC | #2

Hi Richard,

On 03/25, Richard Weinberger wrote:
> Some Micron NAND chips offer an on-die ECC (AKA internal ECC)
> feature. It is useful when the host-platform does not offer
> multi-bit ECC and software ECC is not feasible.
> 
> Based on original work by David Mosberger <davidm@egauge.net>
> 
> Signed-off-by: Richard Weinberger <richard@nod.at>
> ---

[...]

> +
> +static int check_read_status_on_die(struct mtd_info *mtd, int page)
> +{
> +	struct nand_chip *chip = mtd->priv;
> +	int max_bitflips = 0;
> +	uint8_t status;
> +
> +	/* Check ECC status of page just transferred into NAND's page buffer */
> +	chip->cmdfunc(mtd, NAND_CMD_STATUS, -1, -1);
> +	status = chip->read_byte(mtd);
> +
> +	/* Switch back to data reading */
> +	chip->cmd_ctrl(mtd, NAND_CMD_READ0, NAND_NCE | NAND_CLE | NAND_CTRL_CHANGE);
> +	chip->cmd_ctrl(mtd, NAND_CMD_NONE, NAND_NCE | NAND_CTRL_CHANGE);

I tested this against the latest version of the PL353 NAND driver that Punnaiah
has been working to upstream (copying her on this message).  With a few changes
to that driver, I got it most of the way through initialization with on-die ECC
enabled, but it segfaults here with a null pointer dereference because the
PL353 driver does not implement chip->cmd_ctrl.  Instead, it implements a
custom override of cmd->cmdfunc that does not call cmd_ctrl.  Looking through
the other in-tree NAND drivers, it looks like most of them do implement
cmd_ctrl, but quite a few of them do not (e.g. au1550nd, denali, docg4).

What do you think would be the best way to handle this?  It seems like this gap
could be bridged from either side -- either the PL353 driver could implement
cmd_ctrl, at least as a stub version that provides the expected behavior in
this case; or the on-die framework could break this out into a callback
function with a default implementation that the driver could override to
perform this behavior in the manner of its choosing.

> +
> +	if (status & NAND_STATUS_FAIL) {
> +		/* Page has invalid ECC */
> +		mtd->ecc_stats.failed++;
> +	} else if (status & NAND_STATUS_REWRITE) {
> +		/*
> +		 * Micron on-die ECC doesn't report the number of
> +		 * bitflips, so we have to count them ourself to see
> +		 * if the error rate is too high.  This is
> +		 * particularly important for pages with stuck bits.
> +		 */
> +		max_bitflips = check_for_bitflips(mtd, page);
> +	}
> +	return max_bitflips;
> +}
> +

[...]

> +#else /* !CONFIG_MTD_NAND_ECC_ON_DIE */
> +static inline int mtd_nand_has_on_die(void) { return 0; }
> +static inline int nand_read_subpage_on_die(struct mtd_info *mtd,
> +					   struct nand_chip *chip,
> +					   uint32_t data_offs,
> +					   uint32_t readlen,
> +					   uint8_t *bufpoi, int page)
> +{
> +	BUG();

When I build this without CONFIG_MTD_NAND_ECC_ON_DIE enabled, I get the
following warning here:

In file included from drivers/mtd/nand/nand_base.c:46:0:
include/linux/mtd/nand_ondie.h: In function 'nand_read_subpage_on_die':
include/linux/mtd/nand_ondie.h:28:1: warning: no return statement in function returning non-void [-Wreturn-type]
include/linux/mtd/nand_ondie.h: In function 'nand_read_page_on_die':
include/linux/mtd/nand_ondie.h:34:1: warning: no return statement in function returning non-void [-Wreturn-type]

Perhaps return an error code here, even though you'll never get past the BUG()?

> +}
> +
> +static inline int nand_read_page_on_die(struct mtd_info *mtd, struct nand_chip *chip,
> +					uint8_t *buf, int oob_required, int page)
> +{
> +	BUG();

Same here.

> +}

Thanks,
Ben

Richard Weinberger April 27, 2015, 10:19 p.m. UTC | #3

Am 27.04.2015 um 23:35 schrieb Ben Shelton:
> I tested this against the latest version of the PL353 NAND driver that Punnaiah
> has been working to upstream (copying her on this message).  With a few changes
> to that driver, I got it most of the way through initialization with on-die ECC
> enabled, but it segfaults here with a null pointer dereference because the
> PL353 driver does not implement chip->cmd_ctrl.  Instead, it implements a
> custom override of cmd->cmdfunc that does not call cmd_ctrl.  Looking through
> the other in-tree NAND drivers, it looks like most of them do implement
> cmd_ctrl, but quite a few of them do not (e.g. au1550nd, denali, docg4).
> 
> What do you think would be the best way to handle this?  It seems like this gap
> could be bridged from either side -- either the PL353 driver could implement
> cmd_ctrl, at least as a stub version that provides the expected behavior in
> this case; or the on-die framework could break this out into a callback
> function with a default implementation that the driver could override to
> perform this behavior in the manner of its choosing.

Oh, I thought every driver has to implement that function. ;-\
But you're right there is a corner case.

What we could do is just using chip->cmdfunc() with a custom NAND command.
i.e. chip->cmdfunc(mtd, NAND_CMD_READMODE, -1, -1);

Gerhard Sittig tried to introduce such a command some time ago:
http://lists.infradead.org/pipermail/linux-mtd/2014-April/053115.html

Maybe Brian can bring some light into that too...

> When I build this without CONFIG_MTD_NAND_ECC_ON_DIE enabled, I get the
> following warning here:
> 
> In file included from drivers/mtd/nand/nand_base.c:46:0:
> include/linux/mtd/nand_ondie.h: In function 'nand_read_subpage_on_die':
> include/linux/mtd/nand_ondie.h:28:1: warning: no return statement in function returning non-void [-Wreturn-type]
> include/linux/mtd/nand_ondie.h: In function 'nand_read_page_on_die':
> include/linux/mtd/nand_ondie.h:34:1: warning: no return statement in function returning non-void [-Wreturn-type]
> 
> Perhaps return an error code here, even though you'll never get past the BUG()?

What gcc is this?
gcc 4.8 here does not warn, I thought it is smart enough that this function does never
return. Can it be that your .config has CONFIG_BUG=n?
Anyway, this functions clearly needs a return statement. :)

Thanks,
//richard

Ben Shelton April 27, 2015, 10:36 p.m. UTC | #4

On 04/28, Richard Weinberger wrote:
> Am 27.04.2015 um 23:35 schrieb Ben Shelton:
> > I tested this against the latest version of the PL353 NAND driver that Punnaiah
> > has been working to upstream (copying her on this message).  With a few changes
> > to that driver, I got it most of the way through initialization with on-die ECC
> > enabled, but it segfaults here with a null pointer dereference because the
> > PL353 driver does not implement chip->cmd_ctrl.  Instead, it implements a
> > custom override of cmd->cmdfunc that does not call cmd_ctrl.  Looking through
> > the other in-tree NAND drivers, it looks like most of them do implement
> > cmd_ctrl, but quite a few of them do not (e.g. au1550nd, denali, docg4).
> > 
> > What do you think would be the best way to handle this?  It seems like this gap
> > could be bridged from either side -- either the PL353 driver could implement
> > cmd_ctrl, at least as a stub version that provides the expected behavior in
> > this case; or the on-die framework could break this out into a callback
> > function with a default implementation that the driver could override to
> > perform this behavior in the manner of its choosing.
> 
> Oh, I thought every driver has to implement that function. ;-\
> But you're right there is a corner case.
> 
> What we could do is just using chip->cmdfunc() with a custom NAND command.
> i.e. chip->cmdfunc(mtd, NAND_CMD_READMODE, -1, -1);
> 
> Gerhard Sittig tried to introduce such a command some time ago:
> http://lists.infradead.org/pipermail/linux-mtd/2014-April/053115.html

That sounds reasonable to me.  That's similar to how we're checking the
NAND status after reads in our current out-of-tree PL353 driver.  We
added the extra command:

+ /*
+  * READ0 command only, for checking read status. Note that the real command
+  * here is 0x00, but we can't differentiate between READ0 where we need to
+  * send a READSTART after the address bytes, or a READ0 by itself, after
+  * a read status command to check the on-die ECC status. The high bit is
+  * written into the unused end_cmd field, so we don't need to mask it off.
+  */
+#define NAND_CMD_READ0_ONLY 0x100

and then added it to the struct pl353_nand_command_format of the driver:

 static const struct pl353_nand_command_format pl353_nand_commands[] = {
        {NAND_CMD_READ0, NAND_CMD_READSTART, 5, PL353_NAND_CMD_PHASE},
+       {NAND_CMD_READ0_ONLY, NAND_CMD_NONE, 0, NAND_CMD_NONE},
        {NAND_CMD_RNDOUT, NAND_CMD_RNDOUTSTART, 2, PL353_NAND_CMD_PHASE},
        {NAND_CMD_READID, NAND_CMD_NONE, 1, NAND_CMD_NONE},
        {NAND_CMD_STATUS, NAND_CMD_NONE, 0, NAND_CMD_NONE},

> 
> Maybe Brian can bring some light into that too...
> 
> > When I build this without CONFIG_MTD_NAND_ECC_ON_DIE enabled, I get the
> > following warning here:
> > 
> > In file included from drivers/mtd/nand/nand_base.c:46:0:
> > include/linux/mtd/nand_ondie.h: In function 'nand_read_subpage_on_die':
> > include/linux/mtd/nand_ondie.h:28:1: warning: no return statement in function returning non-void [-Wreturn-type]
> > include/linux/mtd/nand_ondie.h: In function 'nand_read_page_on_die':
> > include/linux/mtd/nand_ondie.h:34:1: warning: no return statement in function returning non-void [-Wreturn-type]
> > 
> > Perhaps return an error code here, even though you'll never get past the BUG()?
> 
> What gcc is this?
> gcc 4.8 here does not warn, I thought it is smart enough that this function does never
> return. Can it be that your .config has CONFIG_BUG=n?
> Anyway, this functions clearly needs a return statement. :)

gcc 4.7.2, and you are correct that I had CONFIG_BUG off.  :)

Thanks,
Ben

> 
> Thanks,
> //richard

Richard Weinberger April 27, 2015, 10:42 p.m. UTC | #5

Am 28.04.2015 um 00:36 schrieb Ben Shelton:
> On 04/28, Richard Weinberger wrote:
>> Am 27.04.2015 um 23:35 schrieb Ben Shelton:
>>> I tested this against the latest version of the PL353 NAND driver that Punnaiah
>>> has been working to upstream (copying her on this message).  With a few changes
>>> to that driver, I got it most of the way through initialization with on-die ECC
>>> enabled, but it segfaults here with a null pointer dereference because the
>>> PL353 driver does not implement chip->cmd_ctrl.  Instead, it implements a
>>> custom override of cmd->cmdfunc that does not call cmd_ctrl.  Looking through
>>> the other in-tree NAND drivers, it looks like most of them do implement
>>> cmd_ctrl, but quite a few of them do not (e.g. au1550nd, denali, docg4).
>>>
>>> What do you think would be the best way to handle this?  It seems like this gap
>>> could be bridged from either side -- either the PL353 driver could implement
>>> cmd_ctrl, at least as a stub version that provides the expected behavior in
>>> this case; or the on-die framework could break this out into a callback
>>> function with a default implementation that the driver could override to
>>> perform this behavior in the manner of its choosing.
>>
>> Oh, I thought every driver has to implement that function. ;-\
>> But you're right there is a corner case.
>>
>> What we could do is just using chip->cmdfunc() with a custom NAND command.
>> i.e. chip->cmdfunc(mtd, NAND_CMD_READMODE, -1, -1);
>>
>> Gerhard Sittig tried to introduce such a command some time ago:
>> http://lists.infradead.org/pipermail/linux-mtd/2014-April/053115.html
> 
> That sounds reasonable to me.  That's similar to how we're checking the
> NAND status after reads in our current out-of-tree PL353 driver.  We
> added the extra command:
> 
> + /*
> +  * READ0 command only, for checking read status. Note that the real command
> +  * here is 0x00, but we can't differentiate between READ0 where we need to
> +  * send a READSTART after the address bytes, or a READ0 by itself, after
> +  * a read status command to check the on-die ECC status. The high bit is
> +  * written into the unused end_cmd field, so we don't need to mask it off.
> +  */
> +#define NAND_CMD_READ0_ONLY 0x100
> 
> and then added it to the struct pl353_nand_command_format of the driver:
> 
>  static const struct pl353_nand_command_format pl353_nand_commands[] = {
>         {NAND_CMD_READ0, NAND_CMD_READSTART, 5, PL353_NAND_CMD_PHASE},
> +       {NAND_CMD_READ0_ONLY, NAND_CMD_NONE, 0, NAND_CMD_NONE},
>         {NAND_CMD_RNDOUT, NAND_CMD_RNDOUTSTART, 2, PL353_NAND_CMD_PHASE},
>         {NAND_CMD_READID, NAND_CMD_NONE, 1, NAND_CMD_NONE},
>         {NAND_CMD_STATUS, NAND_CMD_NONE, 0, NAND_CMD_NONE},

Yep. All you need to do in check_read_status_on_die() is switching back to reading
mode.

>>
>> Maybe Brian can bring some light into that too...
>>
>>> When I build this without CONFIG_MTD_NAND_ECC_ON_DIE enabled, I get the
>>> following warning here:
>>>
>>> In file included from drivers/mtd/nand/nand_base.c:46:0:
>>> include/linux/mtd/nand_ondie.h: In function 'nand_read_subpage_on_die':
>>> include/linux/mtd/nand_ondie.h:28:1: warning: no return statement in function returning non-void [-Wreturn-type]
>>> include/linux/mtd/nand_ondie.h: In function 'nand_read_page_on_die':
>>> include/linux/mtd/nand_ondie.h:34:1: warning: no return statement in function returning non-void [-Wreturn-type]
>>>
>>> Perhaps return an error code here, even though you'll never get past the BUG()?
>>
>> What gcc is this?
>> gcc 4.8 here does not warn, I thought it is smart enough that this function does never
>> return. Can it be that your .config has CONFIG_BUG=n?
>> Anyway, this functions clearly needs a return statement. :)
> 
> gcc 4.7.2, and you are correct that I had CONFIG_BUG off.  :)

Yeah, just noticed that BUG() with CONFIG_BUG=n does not have
a nonreturn attribute. So, gcc cannot know...

Thanks,
//richard

Brian Norris April 27, 2015, 10:53 p.m. UTC | #6

On Tue, Apr 28, 2015 at 12:42:18AM +0200, Richard Weinberger wrote:
> Am 28.04.2015 um 00:36 schrieb Ben Shelton:
> >>> When I build this without CONFIG_MTD_NAND_ECC_ON_DIE enabled, I get the
> >>> following warning here:
> >>>
> >>> In file included from drivers/mtd/nand/nand_base.c:46:0:
> >>> include/linux/mtd/nand_ondie.h: In function 'nand_read_subpage_on_die':
> >>> include/linux/mtd/nand_ondie.h:28:1: warning: no return statement in function returning non-void [-Wreturn-type]
> >>> include/linux/mtd/nand_ondie.h: In function 'nand_read_page_on_die':
> >>> include/linux/mtd/nand_ondie.h:34:1: warning: no return statement in function returning non-void [-Wreturn-type]
> >>>
> >>> Perhaps return an error code here, even though you'll never get past the BUG()?
> >>
> >> What gcc is this?
> >> gcc 4.8 here does not warn, I thought it is smart enough that this function does never
> >> return. Can it be that your .config has CONFIG_BUG=n?
> >> Anyway, this functions clearly needs a return statement. :)
> > 
> > gcc 4.7.2, and you are correct that I had CONFIG_BUG off.  :)
> 
> Yeah, just noticed that BUG() with CONFIG_BUG=n does not have
> a nonreturn attribute. So, gcc cannot know...

But it's an obvious infinite loop... all of my toolchains (4.2, 4.5,
4.6, 4.8) are able to compile this without complaining (gcc -Wall):

int test() { do { } while (1); }

Not sure if 4.7 is unique.

But hey, others are doing this, so why not join the crowd. e.g.:

include/asm-generic/pgtable.h-567-#ifndef __HAVE_ARCH_PMD_WRITE
include/asm-generic/pgtable.h-568-static inline int pmd_write(pmd_t pmd)
include/asm-generic/pgtable.h-569-{
include/asm-generic/pgtable.h:570:      BUG();
include/asm-generic/pgtable.h-571-      return 0;
include/asm-generic/pgtable.h-572-}
include/asm-generic/pgtable.h-573-#endif /* __HAVE_ARCH_PMD_WRITE */

Richard Weinberger April 27, 2015, 10:57 p.m. UTC | #7

Am 28.04.2015 um 00:53 schrieb Brian Norris:
> On Tue, Apr 28, 2015 at 12:42:18AM +0200, Richard Weinberger wrote:
>> Am 28.04.2015 um 00:36 schrieb Ben Shelton:
>>>>> When I build this without CONFIG_MTD_NAND_ECC_ON_DIE enabled, I get the
>>>>> following warning here:
>>>>>
>>>>> In file included from drivers/mtd/nand/nand_base.c:46:0:
>>>>> include/linux/mtd/nand_ondie.h: In function 'nand_read_subpage_on_die':
>>>>> include/linux/mtd/nand_ondie.h:28:1: warning: no return statement in function returning non-void [-Wreturn-type]
>>>>> include/linux/mtd/nand_ondie.h: In function 'nand_read_page_on_die':
>>>>> include/linux/mtd/nand_ondie.h:34:1: warning: no return statement in function returning non-void [-Wreturn-type]
>>>>>
>>>>> Perhaps return an error code here, even though you'll never get past the BUG()?
>>>>
>>>> What gcc is this?
>>>> gcc 4.8 here does not warn, I thought it is smart enough that this function does never
>>>> return. Can it be that your .config has CONFIG_BUG=n?
>>>> Anyway, this functions clearly needs a return statement. :)
>>>
>>> gcc 4.7.2, and you are correct that I had CONFIG_BUG off.  :)
>>
>> Yeah, just noticed that BUG() with CONFIG_BUG=n does not have
>> a nonreturn attribute. So, gcc cannot know...
> 
> But it's an obvious infinite loop... all of my toolchains (4.2, 4.5,
> 4.6, 4.8) are able to compile this without complaining (gcc -Wall):
> 
> int test() { do { } while (1); }

Not here. gcc 4.8 warns on that.
As soon I add __attribute__ ((noreturn)) it does not longer complain.

Thanks,
//richard

Brian Norris April 27, 2015, 11:10 p.m. UTC | #8

On Mon, Apr 27, 2015 at 3:57 PM, Richard Weinberger <richard@nod.at> wrote:
> Am 28.04.2015 um 00:53 schrieb Brian Norris:
>> On Tue, Apr 28, 2015 at 12:42:18AM +0200, Richard Weinberger wrote:
>>> Am 28.04.2015 um 00:36 schrieb Ben Shelton:
>>>>>> When I build this without CONFIG_MTD_NAND_ECC_ON_DIE enabled, I get the
>>>>>> following warning here:
>>>>>>
>>>>>> In file included from drivers/mtd/nand/nand_base.c:46:0:
>>>>>> include/linux/mtd/nand_ondie.h: In function 'nand_read_subpage_on_die':
>>>>>> include/linux/mtd/nand_ondie.h:28:1: warning: no return statement in function returning non-void [-Wreturn-type]
>>>>>> include/linux/mtd/nand_ondie.h: In function 'nand_read_page_on_die':
>>>>>> include/linux/mtd/nand_ondie.h:34:1: warning: no return statement in function returning non-void [-Wreturn-type]
>>>>>>
>>>>>> Perhaps return an error code here, even though you'll never get past the BUG()?
>>>>>
>>>>> What gcc is this?
>>>>> gcc 4.8 here does not warn, I thought it is smart enough that this function does never
>>>>> return. Can it be that your .config has CONFIG_BUG=n?
>>>>> Anyway, this functions clearly needs a return statement. :)
>>>>
>>>> gcc 4.7.2, and you are correct that I had CONFIG_BUG off.  :)
>>>
>>> Yeah, just noticed that BUG() with CONFIG_BUG=n does not have
>>> a nonreturn attribute. So, gcc cannot know...
>>
>> But it's an obvious infinite loop... all of my toolchains (4.2, 4.5,
>> 4.6, 4.8) are able to compile this without complaining (gcc -Wall):
>>
>> int test() { do { } while (1); }
>
> Not here. gcc 4.8 warns on that.
> As soon I add __attribute__ ((noreturn)) it does not longer complain.

Huh? Maybe I have a crazy modified gcc.

$ gcc --version
gcc (Ubuntu 4.8.2-19ubuntu1) 4.8.2
Copyright (C) 2013 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ gcc -Wall -Wextra -c a.c
$ cat a.c
int test() { do {} while (1); }

But:

$ gcc -Wall -Wextra -c b.c
b.c: In function ‘test’:
b.c:1:1: warning: control reaches end of non-void function [-Wreturn-type]
 int test() { }
 ^
$ cat b.c
int test() { }

Richard Weinberger April 27, 2015, 11:15 p.m. UTC | #9

Am 28.04.2015 um 01:10 schrieb Brian Norris:
> On Mon, Apr 27, 2015 at 3:57 PM, Richard Weinberger <richard@nod.at> wrote:
>> Am 28.04.2015 um 00:53 schrieb Brian Norris:
>>> On Tue, Apr 28, 2015 at 12:42:18AM +0200, Richard Weinberger wrote:
>>>> Am 28.04.2015 um 00:36 schrieb Ben Shelton:
>>>>>>> When I build this without CONFIG_MTD_NAND_ECC_ON_DIE enabled, I get the
>>>>>>> following warning here:
>>>>>>>
>>>>>>> In file included from drivers/mtd/nand/nand_base.c:46:0:
>>>>>>> include/linux/mtd/nand_ondie.h: In function 'nand_read_subpage_on_die':
>>>>>>> include/linux/mtd/nand_ondie.h:28:1: warning: no return statement in function returning non-void [-Wreturn-type]
>>>>>>> include/linux/mtd/nand_ondie.h: In function 'nand_read_page_on_die':
>>>>>>> include/linux/mtd/nand_ondie.h:34:1: warning: no return statement in function returning non-void [-Wreturn-type]
>>>>>>>
>>>>>>> Perhaps return an error code here, even though you'll never get past the BUG()?
>>>>>>
>>>>>> What gcc is this?
>>>>>> gcc 4.8 here does not warn, I thought it is smart enough that this function does never
>>>>>> return. Can it be that your .config has CONFIG_BUG=n?
>>>>>> Anyway, this functions clearly needs a return statement. :)
>>>>>
>>>>> gcc 4.7.2, and you are correct that I had CONFIG_BUG off.  :)
>>>>
>>>> Yeah, just noticed that BUG() with CONFIG_BUG=n does not have
>>>> a nonreturn attribute. So, gcc cannot know...
>>>
>>> But it's an obvious infinite loop... all of my toolchains (4.2, 4.5,
>>> 4.6, 4.8) are able to compile this without complaining (gcc -Wall):
>>>
>>> int test() { do { } while (1); }
>>
>> Not here. gcc 4.8 warns on that.
>> As soon I add __attribute__ ((noreturn)) it does not longer complain.
> 
> Huh? Maybe I have a crazy modified gcc.
> 
> $ gcc --version
> gcc (Ubuntu 4.8.2-19ubuntu1) 4.8.2
> Copyright (C) 2013 Free Software Foundation, Inc.
> This is free software; see the source for copying conditions.  There is NO
> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
> 
> $ gcc -Wall -Wextra -c a.c
> $ cat a.c
> int test() { do {} while (1); }

Make test static and gcc will warn.

Thanks,
//richard

Brian Norris April 27, 2015, 11:19 p.m. UTC | #10

On Mon, Apr 27, 2015 at 4:15 PM, Richard Weinberger <richard@nod.at> wrote:
> Am 28.04.2015 um 01:10 schrieb Brian Norris:
>> On Mon, Apr 27, 2015 at 3:57 PM, Richard Weinberger <richard@nod.at> wrote:
>>> Am 28.04.2015 um 00:53 schrieb Brian Norris:
>>>> On Tue, Apr 28, 2015 at 12:42:18AM +0200, Richard Weinberger wrote:
>>>>> Am 28.04.2015 um 00:36 schrieb Ben Shelton:
>>>>>>>> When I build this without CONFIG_MTD_NAND_ECC_ON_DIE enabled, I get the
>>>>>>>> following warning here:
>>>>>>>>
>>>>>>>> In file included from drivers/mtd/nand/nand_base.c:46:0:
>>>>>>>> include/linux/mtd/nand_ondie.h: In function 'nand_read_subpage_on_die':
>>>>>>>> include/linux/mtd/nand_ondie.h:28:1: warning: no return statement in function returning non-void [-Wreturn-type]
>>>>>>>> include/linux/mtd/nand_ondie.h: In function 'nand_read_page_on_die':
>>>>>>>> include/linux/mtd/nand_ondie.h:34:1: warning: no return statement in function returning non-void [-Wreturn-type]
>>>>>>>>
>>>>>>>> Perhaps return an error code here, even though you'll never get past the BUG()?
>>>>>>>
>>>>>>> What gcc is this?
>>>>>>> gcc 4.8 here does not warn, I thought it is smart enough that this function does never
>>>>>>> return. Can it be that your .config has CONFIG_BUG=n?
>>>>>>> Anyway, this functions clearly needs a return statement. :)
>>>>>>
>>>>>> gcc 4.7.2, and you are correct that I had CONFIG_BUG off.  :)
>>>>>
>>>>> Yeah, just noticed that BUG() with CONFIG_BUG=n does not have
>>>>> a nonreturn attribute. So, gcc cannot know...
>>>>
>>>> But it's an obvious infinite loop... all of my toolchains (4.2, 4.5,
>>>> 4.6, 4.8) are able to compile this without complaining (gcc -Wall):
>>>>
>>>> int test() { do { } while (1); }
>>>
>>> Not here. gcc 4.8 warns on that.
>>> As soon I add __attribute__ ((noreturn)) it does not longer complain.
>>
>> Huh? Maybe I have a crazy modified gcc.
>>
>> $ gcc --version
>> gcc (Ubuntu 4.8.2-19ubuntu1) 4.8.2
>> Copyright (C) 2013 Free Software Foundation, Inc.
>> This is free software; see the source for copying conditions.  There is NO
>> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
>>
>> $ gcc -Wall -Wextra -c a.c
>> $ cat a.c
>> int test() { do {} while (1); }
>
> Make test static and gcc will warn.

Hmm. That's a strange distinction for gcc to make. Maybe because of
the potential for inlining? Still seems odd.

Brian Norris April 27, 2015, 11:23 p.m. UTC | #11

On Tue, Apr 28, 2015 at 12:19:16AM +0200, Richard Weinberger wrote:
> Am 27.04.2015 um 23:35 schrieb Ben Shelton:
> > I tested this against the latest version of the PL353 NAND driver that Punnaiah
> > has been working to upstream (copying her on this message).  With a few changes
> > to that driver, I got it most of the way through initialization with on-die ECC
> > enabled, but it segfaults here with a null pointer dereference because the
> > PL353 driver does not implement chip->cmd_ctrl.  Instead, it implements a
> > custom override of cmd->cmdfunc that does not call cmd_ctrl.  Looking through
> > the other in-tree NAND drivers, it looks like most of them do implement
> > cmd_ctrl, but quite a few of them do not (e.g. au1550nd, denali, docg4).
> > 
> > What do you think would be the best way to handle this?  It seems like this gap
> > could be bridged from either side -- either the PL353 driver could implement
> > cmd_ctrl, at least as a stub version that provides the expected behavior in
> > this case; or the on-die framework could break this out into a callback
> > function with a default implementation that the driver could override to
> > perform this behavior in the manner of its choosing.
> 
> Oh, I thought every driver has to implement that function. ;-\

Nope.

> But you're right there is a corner case.

And it's not the only one! Right now, there's no guarantee even that
read_buf() returns raw data, unmodified by the SoC's controller. Plenty
of drivers actually have HW-enabled ECC turned on by default, and so
they override the chip->ecc.read_page() (and sometimes
chip->ecc.read_page_raw() functions, if we're lucky) with something
that pokes the appropriate hardware instead. I expect anything
comprehensive here is probably going to have to utilize
chip->ecc.read_page_raw(), at least if it's provided by the hardware
driver.

> What we could do is just using chip->cmdfunc() with a custom NAND command.
> i.e. chip->cmdfunc(mtd, NAND_CMD_READMODE, -1, -1);
> 
> Gerhard Sittig tried to introduce such a command some time ago:
> http://lists.infradead.org/pipermail/linux-mtd/2014-April/053115.html

Yikes! Please no! It's bad enough to have a ton of drivers doing
switch/case on a bunch of real, somewhat well-known opcodes, but to add
new fake ones? I'd rather not. We're inflicting ourselves with a
kernel-internal version of ioctl(). What's the justification, again? I
don't really remember the context of Gerhard's previous patch.

Brian

punnaiah choudary kalluri April 28, 2015, 2:48 a.m. UTC | #12

On Tue, Apr 28, 2015 at 4:53 AM, Brian Norris
<computersforpeace@gmail.com> wrote:
> On Tue, Apr 28, 2015 at 12:19:16AM +0200, Richard Weinberger wrote:
>> Am 27.04.2015 um 23:35 schrieb Ben Shelton:
>> > I tested this against the latest version of the PL353 NAND driver that Punnaiah
>> > has been working to upstream (copying her on this message).  With a few changes
>> > to that driver, I got it most of the way through initialization with on-die ECC
>> > enabled, but it segfaults here with a null pointer dereference because the
>> > PL353 driver does not implement chip->cmd_ctrl.  Instead, it implements a
>> > custom override of cmd->cmdfunc that does not call cmd_ctrl.  Looking through
>> > the other in-tree NAND drivers, it looks like most of them do implement
>> > cmd_ctrl, but quite a few of them do not (e.g. au1550nd, denali, docg4).
>> >
>> > What do you think would be the best way to handle this?  It seems like this gap
>> > could be bridged from either side -- either the PL353 driver could implement
>> > cmd_ctrl, at least as a stub version that provides the expected behavior in
>> > this case; or the on-die framework could break this out into a callback
>> > function with a default implementation that the driver could override to
>> > perform this behavior in the manner of its choosing.
>>
>> Oh, I thought every driver has to implement that function. ;-\
>
> Nope.
>
>> But you're right there is a corner case.
>
> And it's not the only one! Right now, there's no guarantee even that
> read_buf() returns raw data, unmodified by the SoC's controller. Plenty
> of drivers actually have HW-enabled ECC turned on by default, and so
> they override the chip->ecc.read_page() (and sometimes
> chip->ecc.read_page_raw() functions, if we're lucky) with something
> that pokes the appropriate hardware instead. I expect anything
> comprehensive here is probably going to have to utilize
> chip->ecc.read_page_raw(), at least if it's provided by the hardware
> driver.

Yes, overriding the chip->ecc.read_page_raw would solve this. Agree that
read_buf need not be returning raw data always including my new driver for
arasan nand flash controller.

http://lkml.iu.edu/hypermail/linux/kernel/1504.2/00313.html


Regards,
Punnaiah
>
>> What we could do is just using chip->cmdfunc() with a custom NAND command.
>> i.e. chip->cmdfunc(mtd, NAND_CMD_READMODE, -1, -1);
>>
>> Gerhard Sittig tried to introduce such a command some time ago:
>> http://lists.infradead.org/pipermail/linux-mtd/2014-April/053115.html
>
> Yikes! Please no! It's bad enough to have a ton of drivers doing
> switch/case on a bunch of real, somewhat well-known opcodes, but to add
> new fake ones? I'd rather not. We're inflicting ourselves with a
> kernel-internal version of ioctl(). What's the justification, again? I
> don't really remember the context of Gerhard's previous patch.
>
> Brian
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

punnaiah choudary kalluri April 28, 2015, 3:15 a.m. UTC | #13

On Wed, Mar 25, 2015 at 7:32 PM, Richard Weinberger <richard@nod.at> wrote:
> Some Micron NAND chips offer an on-die ECC (AKA internal ECC)
> feature. It is useful when the host-platform does not offer
> multi-bit ECC and software ECC is not feasible.
>
> Based on original work by David Mosberger <davidm@egauge.net>
>
> Signed-off-by: Richard Weinberger <richard@nod.at>
> ---
>  drivers/mtd/nand/nand_base.c   |  66 +++++++++-
>  drivers/mtd/nand/nand_ondie.c  | 266 +++++++++++++++++++++++++++++++++++++++++
>  include/linux/mtd/nand.h       |   6 +
>  include/linux/mtd/nand_ondie.h |  40 +++++++
>  4 files changed, 374 insertions(+), 4 deletions(-)
>  create mode 100644 drivers/mtd/nand/nand_ondie.c
>  create mode 100644 include/linux/mtd/nand_ondie.h
>
> diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c
> index df7eb4f..92e7ed7 100644
> --- a/drivers/mtd/nand/nand_base.c
> +++ b/drivers/mtd/nand/nand_base.c
> @@ -43,6 +43,7 @@
>  #include <linux/mtd/nand.h>
>  #include <linux/mtd/nand_ecc.h>
>  #include <linux/mtd/nand_bch.h>
> +#include <linux/mtd/nand_ondie.h>
>  #include <linux/interrupt.h>
>  #include <linux/bitops.h>
>  #include <linux/leds.h>
> @@ -79,6 +80,20 @@ static struct nand_ecclayout nand_oob_64 = {
>                  .length = 38} }
>  };
>
> +static struct nand_ecclayout nand_oob_64_on_die = {
> +       .eccbytes = 32,
> +       .eccpos = {
> +                   8,  9, 10, 11, 12, 13, 14, 15,
> +                  24, 25, 26, 27, 28, 29, 30, 31,
> +                  40, 41, 42, 43, 44, 45, 46, 47,
> +                  56, 57, 58, 59, 60, 61, 62, 63},
> +       .oobfree = {
> +               {.offset =  4,   .length = 4},
> +               {.offset = 20,   .length = 4},
> +               {.offset = 36,   .length = 4},
> +               {.offset = 52,   .length = 4} }
> +};
> +
>  static struct nand_ecclayout nand_oob_128 = {
>         .eccbytes = 48,
>         .eccpos = {
> @@ -3115,9 +3130,10 @@ static int nand_setup_read_retry_micron(struct mtd_info *mtd, int retry_mode)
>  /*
>   * Configure chip properties from Micron vendor-specific ONFI table
>   */
> -static void nand_onfi_detect_micron(struct nand_chip *chip,
> -               struct nand_onfi_params *p)
> +static void nand_onfi_detect_micron(struct mtd_info *mtd,
> +                                   struct nand_onfi_params *p)
>  {
> +       struct nand_chip *chip = mtd->priv;
>         struct nand_onfi_vendor_micron *micron = (void *)p->vendor;
>
>         if (le16_to_cpu(p->vendor_revision) < 1)
> @@ -3125,6 +3141,8 @@ static void nand_onfi_detect_micron(struct nand_chip *chip,
>
>         chip->read_retries = micron->read_retry_options;
>         chip->setup_read_retry = nand_setup_read_retry_micron;
> +
> +       nand_onfi_detect_on_die_ecc(mtd);
>  }
>
>  /*
> @@ -3226,7 +3244,7 @@ static int nand_flash_detect_onfi(struct mtd_info *mtd, struct nand_chip *chip,
>         }
>
>         if (p->jedec_id == NAND_MFR_MICRON)
> -               nand_onfi_detect_micron(chip, p);
> +               nand_onfi_detect_micron(mtd, p);
>
>         return 1;
>  }
> @@ -4056,6 +4074,40 @@ int nand_scan_tail(struct mtd_info *mtd)
>                 }
>                 break;
>
> +       case NAND_ECC_HW_ON_DIE:
> +               if (!mtd_nand_has_on_die()) {
> +                       pr_warn("CONFIG_MTD_NAND_ECC_ON_DIE not enabled\n");
> +                       BUG();
> +               }

Cant we get rid of this CONFIG option. lets say during the nand scan
phase it self,
ondie ecc feature status will be known from onfi parameter page based
on the product
and vendor id of micron parts. Now lets the driver choose which ecc it
want. Like legacy
controllers(pl353 smc controller ..) which doesn't have much ecc
coverage will use
ondie ecc feature and controllers that have better ecc coverage can
use controller ecc.
so, the driver can choose which ecc they want to use and pass it to
the scan_tail function.
Since ondie ecc has limitations on spare area usage and the ecc layout
is different to what
the default linux kernel layout, people may prefer to use controller
ecc if that controller has
better ecc coverage. But any case, i think we can avoid this config option.


Regards,,
Punnaiah
> +               /*
> +                * nand_bbt.c by default puts the BBT marker at
> +                * offset 8 in OOB, which is used for ECC (see
> +                * nand_oob_64_on_die).
> +                * Fixed by not using OOB for BBT marker.
> +                */
> +               chip->bbt_options |= NAND_BBT_NO_OOB;
> +               ecc->layout = &nand_oob_64_on_die;
> +               ecc->read_page = nand_read_page_on_die;
> +               ecc->read_subpage = nand_read_subpage_on_die;
> +               ecc->write_page = nand_write_page_raw;
> +               ecc->read_oob = nand_read_oob_std;
> +               ecc->read_page_raw = nand_read_page_raw;
> +               ecc->write_page_raw = nand_write_page_raw;
> +               ecc->write_oob = nand_write_oob_std;
> +               ecc->size = 512;
> +               ecc->bytes = 8;
> +               ecc->strength = 4;
> +               ecc->priv = nand_on_die_init(mtd);
> +               if (!ecc->priv) {
> +                       pr_warn("On-Die ECC initialization failed!\n");
> +                       BUG();
> +               }
> +               if (nand_setup_on_die_ecc_micron(mtd, 1) != 0) {
> +                       pr_warn("Unable to enable on-die ECC!\n");
> +                       BUG();
> +               }
> +               break;
> +
>         case NAND_ECC_NONE:
>                 pr_warn("NAND_ECC_NONE selected by board driver. This is not recommended!\n");
>                 ecc->read_page = nand_read_page_raw;
> @@ -4127,10 +4179,14 @@ int nand_scan_tail(struct mtd_info *mtd)
>         /* Invalidate the pagebuffer reference */
>         chip->pagebuf = -1;
>
> -       /* Large page NAND with SOFT_ECC should support subpage reads */
> +       /*
> +        * Large page NAND with SOFT_ECC or on-die ECC should support
> +        * subpage reads.
> +        */
>         switch (ecc->mode) {
>         case NAND_ECC_SOFT:
>         case NAND_ECC_SOFT_BCH:
> +       case NAND_ECC_HW_ON_DIE:
>                 if (chip->page_shift > 9)
>                         chip->options |= NAND_SUBPAGE_READ;
>                 break;
> @@ -4232,6 +4288,8 @@ void nand_release(struct mtd_info *mtd)
>
>         if (chip->ecc.mode == NAND_ECC_SOFT_BCH)
>                 nand_bch_free((struct nand_bch_control *)chip->ecc.priv);
> +       else if (chip->ecc.mode == NAND_ECC_HW_ON_DIE)
> +               nand_on_die_destroy(chip->ecc.priv);
>
>         mtd_device_unregister(mtd);
>
> diff --git a/drivers/mtd/nand/nand_ondie.c b/drivers/mtd/nand/nand_ondie.c
> new file mode 100644
> index 0000000..4147ef5
> --- /dev/null
> +++ b/drivers/mtd/nand/nand_ondie.c
> @@ -0,0 +1,266 @@
> +#include <linux/types.h>
> +#include <linux/kernel.h>
> +#include <linux/module.h>
> +#include <linux/slab.h>
> +#include <linux/bitops.h>
> +#include <linux/mtd/mtd.h>
> +#include <linux/mtd/nand.h>
> +#include <linux/mtd/nand_ondie.h>
> +
> +/**
> + * struct nand_on_die_control - private NAND on-die ECC control structure
> + *
> + * As on-die ECC does not report the number of changed bits we have to
> + * count them on our own. If on-die ECC reports flipped bits we re-read the
> + * whole page into @chkbuf and then again with ECC disabled into @rawbuf.
> + * By comparing these buffers the number of changed bits can be identified.
> + *
> + * @chkbuf: Buffer to store a whole page
> + * @rawbuf: Buffer to store a whole page with ECC disabled
> + */
> +struct nand_on_die_control {
> +       uint8_t *chkbuf;
> +       uint8_t *rawbuf;
> +};
> +
> +static int check_for_bitflips(struct mtd_info *mtd, int page)
> +{
> +       int flips = 0, max_bitflips = 0, i, j, read_size;
> +       uint8_t *chkbuf, *rawbuf, *chkoob, *rawoob;
> +       struct nand_chip *chip = mtd->priv;
> +       struct nand_on_die_control *on_die_ctrl = chip->ecc.priv;
> +       uint32_t *eccpos;
> +
> +       chkbuf = on_die_ctrl->chkbuf;
> +       rawbuf = on_die_ctrl->rawbuf;
> +       read_size = mtd->writesize + mtd->oobsize;
> +
> +       /* Read entire page w/OOB area with on-die ECC on */
> +       chip->read_buf(mtd, chkbuf, read_size);
> +
> +       /* Re-read page with on-die ECC off */
> +       nand_setup_on_die_ecc_micron(mtd, 0);
> +       chip->cmdfunc(mtd, NAND_CMD_READ0, 0x00, page);
> +       chip->read_buf(mtd, rawbuf, read_size);
> +       nand_setup_on_die_ecc_micron(mtd, 1);
> +
> +       chkoob = chkbuf + mtd->writesize;
> +       rawoob = rawbuf + mtd->writesize;
> +       eccpos = chip->ecc.layout->eccpos;
> +       for (i = 0; i < chip->ecc.steps; ++i) {
> +               /* Count bit flips in the actual data area */
> +               flips = 0;
> +               for (j = 0; j < chip->ecc.size; ++j)
> +                       flips += hweight8(chkbuf[j] ^ rawbuf[j]);
> +               /* Count bit flips in the ECC bytes */
> +               for (j = 0; j < chip->ecc.bytes; ++j) {
> +                       flips += hweight8(chkoob[*eccpos] ^ rawoob[*eccpos]);
> +                       ++eccpos;
> +               }
> +               if (flips > 0)
> +                       mtd->ecc_stats.corrected += flips;
> +               max_bitflips = max_t(int, max_bitflips, flips);
> +               chkbuf += chip->ecc.size;
> +               rawbuf += chip->ecc.size;
> +       }
> +
> +       /* Re-issue the READ command for the actual data read that follows.  */
> +       chip->cmdfunc(mtd, NAND_CMD_READ0, 0x00, page);
> +
> +       return max_bitflips;
> +}
> +
> +static int check_read_status_on_die(struct mtd_info *mtd, int page)
> +{
> +       struct nand_chip *chip = mtd->priv;
> +       int max_bitflips = 0;
> +       uint8_t status;
> +
> +       /* Check ECC status of page just transferred into NAND's page buffer */
> +       chip->cmdfunc(mtd, NAND_CMD_STATUS, -1, -1);
> +       status = chip->read_byte(mtd);
> +
> +       /* Switch back to data reading */
> +       chip->cmd_ctrl(mtd, NAND_CMD_READ0, NAND_NCE | NAND_CLE | NAND_CTRL_CHANGE);
> +       chip->cmd_ctrl(mtd, NAND_CMD_NONE, NAND_NCE | NAND_CTRL_CHANGE);
> +
> +       if (status & NAND_STATUS_FAIL) {
> +               /* Page has invalid ECC */
> +               mtd->ecc_stats.failed++;
> +       } else if (status & NAND_STATUS_REWRITE) {
> +               /*
> +                * Micron on-die ECC doesn't report the number of
> +                * bitflips, so we have to count them ourself to see
> +                * if the error rate is too high.  This is
> +                * particularly important for pages with stuck bits.
> +                */
> +               max_bitflips = check_for_bitflips(mtd, page);
> +       }
> +       return max_bitflips;
> +}
> +
> +
> +/**
> + * nand_read_page_on_die - [INTERN] read raw page data without ecc
> + * @mtd: mtd info structure
> + * @chip: nand chip info structure
> + * @buf: buffer to store read data
> + * @oob_required: caller requires OOB data read to chip->oob_poi
> + * @page: page number to read
> + */
> +int nand_read_page_on_die(struct mtd_info *mtd, struct nand_chip *chip,
> +                                uint8_t *buf, int oob_required, int page)
> +{
> +       uint32_t failed = mtd->ecc_stats.failed;
> +       int ret;
> +
> +       ret = check_read_status_on_die(mtd, page);
> +       if (ret < 0 || mtd->ecc_stats.failed != failed)
> +               return ret;
> +
> +       chip->read_buf(mtd, buf, mtd->writesize);
> +       if (oob_required)
> +               chip->read_buf(mtd, chip->oob_poi, mtd->oobsize);
> +       return ret;
> +}
> +
> +/**
> + * nand_read_subpage - [INTERN] on-die-ECC based sub-page read function
> + * @mtd: mtd info structure
> + * @chip: nand chip info structure
> + * @data_offs: offset of requested data within the page
> + * @readlen: data length
> + * @bufpoi: buffer to store read data
> + * @page: page number to read
> + */
> +int nand_read_subpage_on_die(struct mtd_info *mtd,
> +                                   struct nand_chip *chip,
> +                                   uint32_t data_offs, uint32_t readlen,
> +                                   uint8_t *bufpoi, int page)
> +{
> +       int start_step, end_step, num_steps, ret;
> +       int data_col_addr;
> +       int datafrag_len;
> +       uint32_t failed;
> +       uint8_t *p;
> +
> +       /* Column address within the page aligned to ECC size */
> +       start_step = data_offs / chip->ecc.size;
> +       end_step = (data_offs + readlen - 1) / chip->ecc.size;
> +       num_steps = end_step - start_step + 1;
> +
> +       /* Data size aligned to ECC ecc.size */
> +       datafrag_len = num_steps * chip->ecc.size;
> +       data_col_addr = start_step * chip->ecc.size;
> +       p = bufpoi + data_col_addr;
> +
> +       failed = mtd->ecc_stats.failed;
> +
> +       ret = check_read_status_on_die(mtd, page);
> +       if (ret < 0 || mtd->ecc_stats.failed != failed) {
> +               memset(p, 0, datafrag_len);
> +               return ret;
> +       }
> +
> +       /* If we read not a page aligned data */
> +       if (data_col_addr != 0)
> +               chip->cmdfunc(mtd, NAND_CMD_RNDOUT, data_col_addr, -1);
> +
> +       chip->read_buf(mtd, p, datafrag_len);
> +
> +       return ret;
> +}
> +
> +/**
> + * nand_on_die_init - Initialize on-die ECC private data
> + *
> + * @mtd: mtd info structure
> + *
> + * Returns NULL in case of an error.
> + */
> +struct nand_on_die_control *nand_on_die_init(struct mtd_info *mtd)
> +{
> +       struct nand_on_die_control *ctrl = kzalloc(sizeof(*ctrl), GFP_KERNEL);
> +
> +       if (!ctrl)
> +               goto out;
> +
> +       ctrl->chkbuf = kzalloc(mtd->writesize + mtd->oobsize, GFP_KERNEL);
> +       ctrl->rawbuf = kzalloc(mtd->writesize + mtd->oobsize, GFP_KERNEL);
> +
> +       if (!ctrl->chkbuf || !ctrl->rawbuf) {
> +               kfree(ctrl->chkbuf);
> +               kfree(ctrl->rawbuf);
> +               kfree(ctrl);
> +               ctrl = NULL;
> +       }
> +
> +out:
> +       return ctrl;
> +}
> +
> +/**
> + * nand_on_die_destroy - Destroy on-die ECC private data
> + *
> + * @ctrl: on-die ECC private data
> + */
> +void nand_on_die_destroy(struct nand_on_die_control *ctrl)
> +{
> +       kfree(ctrl->chkbuf);
> +       kfree(ctrl->rawbuf);
> +       kfree(ctrl);
> +}
> +
> +static int __nand_setup_on_die_ecc_micron(struct mtd_info *mtd, int enable)
> +{
> +       struct nand_chip *chip = mtd->priv;
> +       uint8_t feature[ONFI_SUBFEATURE_PARAM_LEN] = {0};
> +
> +       feature[0] = enable ? ONFI_FEATURE_ENABLE_ON_DIE_ECC : 0;
> +
> +       return chip->onfi_set_features(mtd, chip,
> +                    ONFI_FEATURE_ADDR_ARRAY_OP_MODE, feature);
> +}
> +
> +/**
> + * nand_setup_on_die_ecc_micron - Enable/Disable on-die ECC mode
> + *
> + * @mtd: mtd info structure
> + * @enable: enables on-die ECC if non-zero
> + */
> +int nand_setup_on_die_ecc_micron(struct mtd_info *mtd, int enable)
> +{
> +       int ret;
> +       struct nand_chip *chip = mtd->priv;
> +
> +       if (chip->ecc.mode != NAND_ECC_HW_ON_DIE)
> +               return 0;
> +
> +       ret = __nand_setup_on_die_ecc_micron(mtd, enable);
> +       if (ret)
> +               pr_err("Unable to %sable on-die ECC!\n", enable ? "en" : "dis");
> +       return ret;
> +}
> +
> +/**
> + * nand_onfi_detect_on_die_ecc - Detect and handle on-die ECC
> + *
> + * @mtd: mtd info structure
> + */
> +void nand_onfi_detect_on_die_ecc(struct mtd_info *mtd)
> +{
> +       u8 features[ONFI_SUBFEATURE_PARAM_LEN];
> +       struct nand_chip *chip = mtd->priv;
> +
> +       if (chip->onfi_get_features(mtd, chip, ONFI_FEATURE_ADDR_ARRAY_OP_MODE,
> +                                   features) < 0)
> +               return;
> +
> +       if (features[0] & ONFI_FEATURE_ENABLE_ON_DIE_ECC &&
> +               chip->ecc.mode != NAND_ECC_HW_ON_DIE) {
> +
> +               pr_info("Requested ECC method is not on-die, disabling on-die ECC\n");
> +               __nand_setup_on_die_ecc_micron(mtd, 0);
> +       }
> +}
> +
> diff --git a/include/linux/mtd/nand.h b/include/linux/mtd/nand.h
> index 3d4ea7e..5216704 100644
> --- a/include/linux/mtd/nand.h
> +++ b/include/linux/mtd/nand.h
> @@ -101,6 +101,7 @@ extern int nand_unlock(struct mtd_info *mtd, loff_t ofs, uint64_t len);
>  /* Status bits */
>  #define NAND_STATUS_FAIL       0x01
>  #define NAND_STATUS_FAIL_N1    0x02
> +#define NAND_STATUS_REWRITE    0x08
>  #define NAND_STATUS_TRUE_READY 0x20
>  #define NAND_STATUS_READY      0x40
>  #define NAND_STATUS_WP         0x80
> @@ -115,6 +116,7 @@ typedef enum {
>         NAND_ECC_HW_SYNDROME,
>         NAND_ECC_HW_OOB_FIRST,
>         NAND_ECC_SOFT_BCH,
> +       NAND_ECC_HW_ON_DIE,
>  } nand_ecc_modes_t;
>
>  /*
> @@ -219,6 +221,10 @@ struct nand_chip;
>  /* Vendor-specific feature address (Micron) */
>  #define ONFI_FEATURE_ADDR_READ_RETRY   0x89
>
> +/* Vendor-specific array operation mode (Micron) */
> +#define ONFI_FEATURE_ADDR_ARRAY_OP_MODE        0x90
> +#define ONFI_FEATURE_ENABLE_ON_DIE_ECC 0x08
> +
>  /* ONFI subfeature parameters length */
>  #define ONFI_SUBFEATURE_PARAM_LEN      4
>
> diff --git a/include/linux/mtd/nand_ondie.h b/include/linux/mtd/nand_ondie.h
> new file mode 100644
> index 0000000..b2ec2da
> --- /dev/null
> +++ b/include/linux/mtd/nand_ondie.h
> @@ -0,0 +1,40 @@
> +#ifndef __MTD_NAND_ONDIE_H__
> +#define __MTD_NAND_ONDIE_H__
> +
> +struct nand_on_die_control;
> +
> +#if defined(CONFIG_MTD_NAND_ECC_ON_DIE)
> +static inline int mtd_nand_has_on_die(void) { return 1; }
> +int nand_read_subpage_on_die(struct mtd_info *mtd,
> +                                   struct nand_chip *chip,
> +                                   uint32_t data_offs, uint32_t readlen,
> +                                   uint8_t *bufpoi, int page);
> +
> +int nand_read_page_on_die(struct mtd_info *mtd, struct nand_chip *chip,
> +                         uint8_t *buf, int oob_required, int page);
> +void nand_onfi_detect_on_die_ecc(struct mtd_info *mtd);
> +int nand_setup_on_die_ecc_micron(struct mtd_info *mtd, int enable);
> +struct nand_on_die_control *nand_on_die_init(struct mtd_info *mtd);
> +void nand_on_die_destroy(struct nand_on_die_control *ctrl);
> +#else /* !CONFIG_MTD_NAND_ECC_ON_DIE */
> +static inline int mtd_nand_has_on_die(void) { return 0; }
> +static inline int nand_read_subpage_on_die(struct mtd_info *mtd,
> +                                          struct nand_chip *chip,
> +                                          uint32_t data_offs,
> +                                          uint32_t readlen,
> +                                          uint8_t *bufpoi, int page)
> +{
> +       BUG();
> +}
> +
> +static inline int nand_read_page_on_die(struct mtd_info *mtd, struct nand_chip *chip,
> +                                       uint8_t *buf, int oob_required, int page)
> +{
> +       BUG();
> +}
> +static inline void nand_onfi_detect_on_die_ecc(struct mtd_info *mtd) { }
> +static inline int nand_setup_on_die_ecc_micron(struct mtd_info *mtd, int enable) { return -EINVAL; }
> +static inline struct nand_on_die_control *nand_on_die_init(struct mtd_info *mtd) { return NULL; }
> +static inline void nand_on_die_destroy(struct nand_on_die_control *ctrl) { }
> +#endif /* CONFIG_MTD_NAND_ECC_ON_DIE */
> +#endif /* __MTD_NAND_ONDIE_H__ */
> --
> 2.3.4
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

Brian Norris April 28, 2015, 3:22 a.m. UTC | #14

On Tue, Apr 28, 2015 at 08:18:12AM +0530, punnaiah choudary kalluri wrote:
> On Tue, Apr 28, 2015 at 4:53 AM, Brian Norris
> <computersforpeace@gmail.com> wrote:
> > On Tue, Apr 28, 2015 at 12:19:16AM +0200, Richard Weinberger wrote:
> >> Oh, I thought every driver has to implement that function. ;-\
> >
> > Nope.
> >
> >> But you're right there is a corner case.
> >
> > And it's not the only one! Right now, there's no guarantee even that
> > read_buf() returns raw data, unmodified by the SoC's controller. Plenty
> > of drivers actually have HW-enabled ECC turned on by default, and so
> > they override the chip->ecc.read_page() (and sometimes
> > chip->ecc.read_page_raw() functions, if we're lucky) with something
> > that pokes the appropriate hardware instead. I expect anything
> > comprehensive here is probably going to have to utilize
> > chip->ecc.read_page_raw(), at least if it's provided by the hardware
> > driver.
> 
> Yes, overriding the chip->ecc.read_page_raw would solve this.

I'm actually suggesting that (in this patch set, for on-die ECC
support), maybe we *shouldn't* override chip->ecc.read_page_raw() and
leave that to be defined by the driver, and then on-die ECC support
should be added in a way that just calls chip->ecc.read_page_raw(). This
should work for any driver that already properly supports the raw
callbacks.

> Agree that
> read_buf need not be returning raw data always including my new driver for
> arasan nand flash controller.

I agree with that. At the moment, chip->read_buf() really has very
driver-specific meaning. Not sure if that's really a good thing, but
it's the way things are...

> http://lkml.iu.edu/hypermail/linux/kernel/1504.2/00313.html

In the half a minute I just spent looking at this (I may review it
properly later), I noted a few things:

1. you don't implement ecc.read_page_raw(); this means we'll probably
have trouble supporting on-die ECC with your driver, among other things

2. your patch is all white-space mangled. Please use your favorite
search engine to figure out how to get that right. git-send-email is
your friend.

Thanks,
Brian

punnaiah choudary kalluri April 28, 2015, 3:44 a.m. UTC | #15

On Tue, Apr 28, 2015 at 8:52 AM, Brian Norris
<computersforpeace@gmail.com> wrote:
> On Tue, Apr 28, 2015 at 08:18:12AM +0530, punnaiah choudary kalluri wrote:
>> On Tue, Apr 28, 2015 at 4:53 AM, Brian Norris
>> <computersforpeace@gmail.com> wrote:
>> > On Tue, Apr 28, 2015 at 12:19:16AM +0200, Richard Weinberger wrote:
>> >> Oh, I thought every driver has to implement that function. ;-\
>> >
>> > Nope.
>> >
>> >> But you're right there is a corner case.
>> >
>> > And it's not the only one! Right now, there's no guarantee even that
>> > read_buf() returns raw data, unmodified by the SoC's controller. Plenty
>> > of drivers actually have HW-enabled ECC turned on by default, and so
>> > they override the chip->ecc.read_page() (and sometimes
>> > chip->ecc.read_page_raw() functions, if we're lucky) with something
>> > that pokes the appropriate hardware instead. I expect anything
>> > comprehensive here is probably going to have to utilize
>> > chip->ecc.read_page_raw(), at least if it's provided by the hardware
>> > driver.
>>
>> Yes, overriding the chip->ecc.read_page_raw would solve this.
>
> I'm actually suggesting that (in this patch set, for on-die ECC
> support), maybe we *shouldn't* override chip->ecc.read_page_raw() and
> leave that to be defined by the driver, and then on-die ECC support
> should be added in a way that just calls chip->ecc.read_page_raw(). This
> should work for any driver that already properly supports the raw
> callbacks.

Ok. Understood.

>
>> Agree that
>> read_buf need not be returning raw data always including my new driver for
>> arasan nand flash controller.
>
> I agree with that. At the moment, chip->read_buf() really has very
> driver-specific meaning. Not sure if that's really a good thing, but
> it's the way things are...
>
>> http://lkml.iu.edu/hypermail/linux/kernel/1504.2/00313.html
>
> In the half a minute I just spent looking at this (I may review it
> properly later), I noted a few things:
>
> 1. you don't implement ecc.read_page_raw(); this means we'll probably
> have trouble supporting on-die ECC with your driver, among other things

On-die ECC is optional as long as the controller has better ecc coverage.
The arasan controller supports up to 24 bit ecc. There is no point to use
on-die ECC and will always use hw ecc even for On-die ecc devices.
This version of driver will not have the support for ecc.read_page_raw but
I will add based on the need in future.


>
> 2. your patch is all white-space mangled. Please use your favorite
> search engine to figure out how to get that right. git-send-email is
> your friend.

Oh sorry. Looks that was the web link issue. Here is the new one.
 https://lkml.org/lkml/2015/4/16/311

Also request your time for reviewing this driver.

Thanks,
Punnaiah

>
> Thanks,
> Brian
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

Josh Cartwright April 28, 2015, 2:03 p.m. UTC | #16

On Tue, Apr 28, 2015 at 09:14:26AM +0530, punnaiah choudary kalluri wrote:
> On Tue, Apr 28, 2015 at 8:52 AM, Brian Norris <computersforpeace@gmail.com> wrote:
> > On Tue, Apr 28, 2015 at 08:18:12AM +0530, punnaiah choudary kalluri wrote:
> >> On Tue, Apr 28, 2015 at 4:53 AM, Brian Norris <computersforpeace@gmail.com> wrote:
[..]
> >> Agree that read_buf need not be returning raw data always including
> >> my new driver for arasan nand flash controller.
> >
> > I agree with that. At the moment, chip->read_buf() really has very
> > driver-specific meaning. Not sure if that's really a good thing, but
> > it's the way things are...
> >
> >> http://lkml.iu.edu/hypermail/linux/kernel/1504.2/00313.html
> >
> > In the half a minute I just spent looking at this (I may review it
> > properly later), I noted a few things:
> >
> > 1. you don't implement ecc.read_page_raw(); this means we'll probably
> > have trouble supporting on-die ECC with your driver, among other things
> 
> On-die ECC is optional as long as the controller has better ecc coverage.
> The arasan controller supports up to 24 bit ecc. There is no point to use
> on-die ECC and will always use hw ecc even for On-die ecc devices.

Maybe this is true of the controller instantiated in the Zynq MPSoC
chips, but the Zynq 7000 series SMC is documented to support only 1-bit
ECC.

(Which is why we're looking at your pl353 driver and this set).

  Josh

punnaiah choudary kalluri April 28, 2015, 4:19 p.m. UTC | #17

On Tue, Apr 28, 2015 at 7:33 PM, Josh Cartwright <joshc@ni.com> wrote:
> On Tue, Apr 28, 2015 at 09:14:26AM +0530, punnaiah choudary kalluri wrote:
>> On Tue, Apr 28, 2015 at 8:52 AM, Brian Norris <computersforpeace@gmail.com> wrote:
>> > On Tue, Apr 28, 2015 at 08:18:12AM +0530, punnaiah choudary kalluri wrote:
>> >> On Tue, Apr 28, 2015 at 4:53 AM, Brian Norris <computersforpeace@gmail.com> wrote:
> [..]
>> >> Agree that read_buf need not be returning raw data always including
>> >> my new driver for arasan nand flash controller.
>> >
>> > I agree with that. At the moment, chip->read_buf() really has very
>> > driver-specific meaning. Not sure if that's really a good thing, but
>> > it's the way things are...
>> >
>> >> http://lkml.iu.edu/hypermail/linux/kernel/1504.2/00313.html
>> >
>> > In the half a minute I just spent looking at this (I may review it
>> > properly later), I noted a few things:
>> >
>> > 1. you don't implement ecc.read_page_raw(); this means we'll probably
>> > have trouble supporting on-die ECC with your driver, among other things
>>
>> On-die ECC is optional as long as the controller has better ecc coverage.
>> The arasan controller supports up to 24 bit ecc. There is no point to use
>> on-die ECC and will always use hw ecc even for On-die ecc devices.
>
> Maybe this is true of the controller instantiated in the Zynq MPSoC
> chips, but the Zynq 7000 series SMC is documented to support only 1-bit
> ECC.

True. SMC controller in zynq supports only 1 -bit ECC. One can use
on-die ECC devices
and turn of the controller ECC for better ECC coverage. I am also
exploring the options
to use these patches with the pl353 driver in a  generic way and
acceptable to all.

Regards,
Punnaiah
>
> (Which is why we're looking at your pl353 driver and this set).
>
>   Josh

Ben Shelton May 8, 2015, 9:26 p.m. UTC | #18

On 04/27, Brian Norris wrote:
> On Tue, Apr 28, 2015 at 08:18:12AM +0530, punnaiah choudary kalluri wrote:
> > On Tue, Apr 28, 2015 at 4:53 AM, Brian Norris
> > <computersforpeace@gmail.com> wrote:
> > > On Tue, Apr 28, 2015 at 12:19:16AM +0200, Richard Weinberger wrote:
> > >> Oh, I thought every driver has to implement that function. ;-\
> > >
> > > Nope.
> > >
> > >> But you're right there is a corner case.
> > >
> > > And it's not the only one! Right now, there's no guarantee even that
> > > read_buf() returns raw data, unmodified by the SoC's controller. Plenty
> > > of drivers actually have HW-enabled ECC turned on by default, and so
> > > they override the chip->ecc.read_page() (and sometimes
> > > chip->ecc.read_page_raw() functions, if we're lucky) with something
> > > that pokes the appropriate hardware instead. I expect anything
> > > comprehensive here is probably going to have to utilize
> > > chip->ecc.read_page_raw(), at least if it's provided by the hardware
> > > driver.
> > 
> > Yes, overriding the chip->ecc.read_page_raw would solve this.
> 
> I'm actually suggesting that (in this patch set, for on-die ECC
> support), maybe we *shouldn't* override chip->ecc.read_page_raw() and
> leave that to be defined by the driver, and then on-die ECC support
> should be added in a way that just calls chip->ecc.read_page_raw(). This
> should work for any driver that already properly supports the raw
> callbacks.
> 

Hi Richard et al,

I'm guessing it's probably too late for the on-die ECC stuff to land in
4.2 at this point.  Is there anything I can do to help this along
(testing, etc.)?

Thanks,
Ben

Brian Norris May 8, 2015, 9:39 p.m. UTC | #19

On Fri, May 08, 2015 at 04:26:32PM -0500, Ben Shelton wrote:
> On 04/27, Brian Norris wrote:
> > On Tue, Apr 28, 2015 at 08:18:12AM +0530, punnaiah choudary kalluri wrote:
> > > On Tue, Apr 28, 2015 at 4:53 AM, Brian Norris
> > > <computersforpeace@gmail.com> wrote:
> > > > On Tue, Apr 28, 2015 at 12:19:16AM +0200, Richard Weinberger wrote:
> > > >> Oh, I thought every driver has to implement that function. ;-\
> > > >
> > > > Nope.
> > > >
> > > >> But you're right there is a corner case.
> > > >
> > > > And it's not the only one! Right now, there's no guarantee even that
> > > > read_buf() returns raw data, unmodified by the SoC's controller. Plenty
> > > > of drivers actually have HW-enabled ECC turned on by default, and so
> > > > they override the chip->ecc.read_page() (and sometimes
> > > > chip->ecc.read_page_raw() functions, if we're lucky) with something
> > > > that pokes the appropriate hardware instead. I expect anything
> > > > comprehensive here is probably going to have to utilize
> > > > chip->ecc.read_page_raw(), at least if it's provided by the hardware
> > > > driver.
> > > 
> > > Yes, overriding the chip->ecc.read_page_raw would solve this.
> > 
> > I'm actually suggesting that (in this patch set, for on-die ECC
> > support), maybe we *shouldn't* override chip->ecc.read_page_raw() and
> > leave that to be defined by the driver, and then on-die ECC support
> > should be added in a way that just calls chip->ecc.read_page_raw(). This
> > should work for any driver that already properly supports the raw
> > callbacks.
> > 
> 
> Hi Richard et al,
> 
> I'm guessing it's probably too late for the on-die ECC stuff to land in
> 4.2 at this point.

Not technically. We've got several weeks (approx 5 to 6?) before 4.1 is
released. 4.2 material should be getting finalized by a week or so
before the merge window (i.e., 4 to 5 weeks from now).

> Is there anything I can do to help this along
> (testing, etc.)?

This is going to need to get rewritten. I'm not sure if Richard is going
to tackle this again, as he hasn't responded to the points I brought up.
(Note that Richard is not the first to have tried to implement this,
without initial success.)

Brian

Richard Weinberger May 8, 2015, 9:43 p.m. UTC | #20

Am 08.05.2015 um 23:39 schrieb Brian Norris:
> On Fri, May 08, 2015 at 04:26:32PM -0500, Ben Shelton wrote:
>> On 04/27, Brian Norris wrote:
>>> On Tue, Apr 28, 2015 at 08:18:12AM +0530, punnaiah choudary kalluri wrote:
>>>> On Tue, Apr 28, 2015 at 4:53 AM, Brian Norris
>>>> <computersforpeace@gmail.com> wrote:
>>>>> On Tue, Apr 28, 2015 at 12:19:16AM +0200, Richard Weinberger wrote:
>>>>>> Oh, I thought every driver has to implement that function. ;-\
>>>>>
>>>>> Nope.
>>>>>
>>>>>> But you're right there is a corner case.
>>>>>
>>>>> And it's not the only one! Right now, there's no guarantee even that
>>>>> read_buf() returns raw data, unmodified by the SoC's controller. Plenty
>>>>> of drivers actually have HW-enabled ECC turned on by default, and so
>>>>> they override the chip->ecc.read_page() (and sometimes
>>>>> chip->ecc.read_page_raw() functions, if we're lucky) with something
>>>>> that pokes the appropriate hardware instead. I expect anything
>>>>> comprehensive here is probably going to have to utilize
>>>>> chip->ecc.read_page_raw(), at least if it's provided by the hardware
>>>>> driver.
>>>>
>>>> Yes, overriding the chip->ecc.read_page_raw would solve this.
>>>
>>> I'm actually suggesting that (in this patch set, for on-die ECC
>>> support), maybe we *shouldn't* override chip->ecc.read_page_raw() and
>>> leave that to be defined by the driver, and then on-die ECC support
>>> should be added in a way that just calls chip->ecc.read_page_raw(). This
>>> should work for any driver that already properly supports the raw
>>> callbacks.
>>>
>>
>> Hi Richard et al,
>>
>> I'm guessing it's probably too late for the on-die ECC stuff to land in
>> 4.2 at this point.
> 
> Not technically. We've got several weeks (approx 5 to 6?) before 4.1 is
> released. 4.2 material should be getting finalized by a week or so
> before the merge window (i.e., 4 to 5 weeks from now).
> 
>> Is there anything I can do to help this along
>> (testing, etc.)?
> 
> This is going to need to get rewritten. I'm not sure if Richard is going
> to tackle this again, as he hasn't responded to the points I brought up.
> (Note that Richard is not the first to have tried to implement this,
> without initial success.)

I'm definitely willing to take the challenge.
But as I'm currently very busy with non-MTD stuff I had no time
to address your comments.

Thanks,
//richard

[1/3] mtd: nand: Add on-die ECC support

Commit Message

Comments

Patch