diff mbox series

[2/3] mtd: rawnand: micron: Disable ECC earlier in the read path

Message ID 20180703122009.29914-3-boris.brezillon@bootlin.com
State Accepted
Delegated to: Miquel Raynal
Headers show
Series mtd: rawnand: micron: Get the actual number of bitflips | expand

Commit Message

Boris Brezillon July 3, 2018, 12:20 p.m. UTC
We are about to support extracting the real number of bitflips for
4-bits ECC when WRITE_RECOMMEND is returned. This requires re-reading
the page in raw mode to compare it to the corrected version, and this
logic will be placed in micron_nand_on_die_ecc_status_4().

Moving the micron_nand_on_die_ecc_setup() will allow us to disable
ECC only once.

As a result, we have to rework the exit path and add an error path
where the ECC is disabled.

Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
---
 drivers/mtd/nand/raw/nand_micron.c | 25 ++++++++++++++++---------
 1 file changed, 16 insertions(+), 9 deletions(-)

Comments

Chris Packham July 16, 2018, 9 a.m. UTC | #1
Hi Boris,

On 04/07/18 00:20, Boris Brezillon wrote:
> We are about to support extracting the real number of bitflips for
> 4-bits ECC when WRITE_RECOMMEND is returned. This requires re-reading
> the page in raw mode to compare it to the corrected version, and this
> logic will be placed in micron_nand_on_die_ecc_status_4().
> 
> Moving the micron_nand_on_die_ecc_setup() will allow us to disable
> ECC only once.
> 
> As a result, we have to rework the exit path and add an error path
> where the ECC is disabled.
> 
> Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>

As I said on the other thread this appears to cause a problem for me on 
the MT29F1G08ABAFAWP-ITE setup I have. I notice we're not able to find 
the BBT, not sure if that is symptom or cause. If I revert this I can 
successfully mount and access data on the chip.

Here's some output from dmesg:

nand: device found, Manufacturer ID: 0x2c, Chip ID: 0xd1
nand: Micron MT29F1G08ABAGAWP
nand: 128 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB size: 128
nand: timing mode 5 not acknowledged by the NAND chip
Bad block table not found for chip 0
Bad block table not found for chip 0
Scanning device for bad blocks
random: fast init done
Bad block table written to 0x000007fe0000, version 0x01
Bad block table written to 0x000007fc0000, version 0x01
3 fixed-partitions partitions found on MTD device pxa3xx_nand-0
Creating 3 MTD partitions on "pxa3xx_nand-0":
0x000000000000-0x000007000000 : "user"
0x000007000000-0x000007800000 : "errlog"
mtdoops: Attached to MTD device 1
mtdoops: ready 0, 1
0x000007800000-0x000008000000 : "nand-bbt"

And from my attempt to mount:

[root@linuxbox ~]# umount /flash && ubiattach -p /dev/mtd0 && mount -t 
ubifs ubi
0:user -o sync /flash
ubi0: attaching mtd0
ubi0: scanning is finished
ubi0 error: ubi_read_volume_table: the layout volume was not found
ubi0 error: ubi_attach_mtd_dev: failed to attach mtd0, error -22
ubiattach: error!: cannot attach "/dev/mtd0"
            error 22 (Invalid argument)
[root@linuxbox ~]#


> ---
>   drivers/mtd/nand/raw/nand_micron.c | 25 ++++++++++++++++---------
>   1 file changed, 16 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/mtd/nand/raw/nand_micron.c b/drivers/mtd/nand/raw/nand_micron.c
> index 63ac98a36ed7..b9cbaf125a98 100644
> --- a/drivers/mtd/nand/raw/nand_micron.c
> +++ b/drivers/mtd/nand/raw/nand_micron.c
> @@ -197,30 +197,37 @@ micron_nand_read_page_on_die_ecc(struct mtd_info *mtd, struct nand_chip *chip,
>   
>   	ret = nand_read_page_op(chip, page, 0, NULL, 0);
>   	if (ret)
> -		goto out;
> +		goto err_disable_ecc;
>   
>   	ret = nand_status_op(chip, &status);
>   	if (ret)
> -		goto out;
> +		goto err_disable_ecc;
>   
>   	ret = nand_exit_status_op(chip);
>   	if (ret)
> -		goto out;
> +		goto err_disable_ecc;
>   
> -	if (chip->ecc.strength == 4)
> -		max_bitflips = micron_nand_on_die_ecc_status_4(chip, status);
> -	else
> -		max_bitflips = micron_nand_on_die_ecc_status_8(chip, status);
> +	micron_nand_on_die_ecc_setup(chip, false);
>   
>   	ret = nand_read_data_op(chip, buf, mtd->writesize, false);
>   	if (!ret && oob_required)
>   		ret = nand_read_data_op(chip, chip->oob_poi, mtd->oobsize,
>   					false);
>   
> -out:
> +	if (ret)
> +		return ret;
> +
> +	if (chip->ecc.strength == 4)
> +		max_bitflips = micron_nand_on_die_ecc_status_4(chip, status);
> +	else
> +		max_bitflips = micron_nand_on_die_ecc_status_8(chip, status);
> +
> +	return max_bitflips;
> +
> +err_disable_ecc:
>   	micron_nand_on_die_ecc_setup(chip, false);
>   
> -	return ret ? ret : max_bitflips;
> +	return ret;
>   }
>   
>   static int
>
Boris Brezillon July 16, 2018, 4:10 p.m. UTC | #2
On Mon, 16 Jul 2018 09:00:59 +0000
Chris Packham <Chris.Packham@alliedtelesis.co.nz> wrote:

> Hi Boris,
> 
> On 04/07/18 00:20, Boris Brezillon wrote:
> > We are about to support extracting the real number of bitflips for
> > 4-bits ECC when WRITE_RECOMMEND is returned. This requires re-reading
> > the page in raw mode to compare it to the corrected version, and this
> > logic will be placed in micron_nand_on_die_ecc_status_4().
> > 
> > Moving the micron_nand_on_die_ecc_setup() will allow us to disable
> > ECC only once.
> > 
> > As a result, we have to rework the exit path and add an error path
> > where the ECC is disabled.
> > 
> > Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>  
> 
> As I said on the other thread this appears to cause a problem for me on 
> the MT29F1G08ABAFAWP-ITE setup I have. I notice we're not able to find 
> the BBT, not sure if that is symptom or cause. If I revert this I can 
> successfully mount and access data on the chip.
> 
> Here's some output from dmesg:
> 
> nand: device found, Manufacturer ID: 0x2c, Chip ID: 0xd1
> nand: Micron MT29F1G08ABAGAWP
> nand: 128 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB size: 128
> nand: timing mode 5 not acknowledged by the NAND chip
> Bad block table not found for chip 0
> Bad block table not found for chip 0
> Scanning device for bad blocks
> random: fast init done
> Bad block table written to 0x000007fe0000, version 0x01
> Bad block table written to 0x000007fc0000, version 0x01
> 3 fixed-partitions partitions found on MTD device pxa3xx_nand-0
> Creating 3 MTD partitions on "pxa3xx_nand-0":
> 0x000000000000-0x000007000000 : "user"
> 0x000007000000-0x000007800000 : "errlog"
> mtdoops: Attached to MTD device 1
> mtdoops: ready 0, 1
> 0x000007800000-0x000008000000 : "nand-bbt"
> 
> And from my attempt to mount:
> 
> [root@linuxbox ~]# umount /flash && ubiattach -p /dev/mtd0 && mount -t 
> ubifs ubi
> 0:user -o sync /flash
> ubi0: attaching mtd0
> ubi0: scanning is finished
> ubi0 error: ubi_read_volume_table: the layout volume was not found
> ubi0 error: ubi_attach_mtd_dev: failed to attach mtd0, error -22
> ubiattach: error!: cannot attach "/dev/mtd0"
>             error 22 (Invalid argument)
> [root@linuxbox ~]#
> 
> 
> > ---
> >   drivers/mtd/nand/raw/nand_micron.c | 25 ++++++++++++++++---------
> >   1 file changed, 16 insertions(+), 9 deletions(-)
> > 
> > diff --git a/drivers/mtd/nand/raw/nand_micron.c b/drivers/mtd/nand/raw/nand_micron.c
> > index 63ac98a36ed7..b9cbaf125a98 100644
> > --- a/drivers/mtd/nand/raw/nand_micron.c
> > +++ b/drivers/mtd/nand/raw/nand_micron.c
> > @@ -197,30 +197,37 @@ micron_nand_read_page_on_die_ecc(struct mtd_info *mtd, struct nand_chip *chip,
> >   
> >   	ret = nand_read_page_op(chip, page, 0, NULL, 0);
> >   	if (ret)
> > -		goto out;
> > +		goto err_disable_ecc;
> >   
> >   	ret = nand_status_op(chip, &status);
> >   	if (ret)
> > -		goto out;
> > +		goto err_disable_ecc;
> >   
> >   	ret = nand_exit_status_op(chip);
> >   	if (ret)
> > -		goto out;
> > +		goto err_disable_ecc;
> >   
> > -	if (chip->ecc.strength == 4)
> > -		max_bitflips = micron_nand_on_die_ecc_status_4(chip, status);
> > -	else
> > -		max_bitflips = micron_nand_on_die_ecc_status_8(chip, status);
> > +	micron_nand_on_die_ecc_setup(chip, false);

Hm, can you try to move the micron_nand_on_die_ecc_setup(chip, false)
call just before nand_exit_status_op()?

> >   
> >   	ret = nand_read_data_op(chip, buf, mtd->writesize, false);
> >   	if (!ret && oob_required)
> >   		ret = nand_read_data_op(chip, chip->oob_poi, mtd->oobsize,
> >   					false);
> >   
> > -out:
> > +	if (ret)
> > +		return ret;
> > +
> > +	if (chip->ecc.strength == 4)
> > +		max_bitflips = micron_nand_on_die_ecc_status_4(chip, status);
> > +	else
> > +		max_bitflips = micron_nand_on_die_ecc_status_8(chip, status);
> > +
> > +	return max_bitflips;
> > +
> > +err_disable_ecc:
> >   	micron_nand_on_die_ecc_setup(chip, false);
> >   
> > -	return ret ? ret : max_bitflips;
> > +	return ret;
> >   }
> >   
> >   static int
> >   
>
Boris Brezillon July 16, 2018, 8:55 p.m. UTC | #3
On Mon, 16 Jul 2018 18:10:57 +0200
Boris Brezillon <boris.brezillon@bootlin.com> wrote:

> On Mon, 16 Jul 2018 09:00:59 +0000
> Chris Packham <Chris.Packham@alliedtelesis.co.nz> wrote:
> 
> > Hi Boris,
> > 
> > On 04/07/18 00:20, Boris Brezillon wrote:  
> > > We are about to support extracting the real number of bitflips for
> > > 4-bits ECC when WRITE_RECOMMEND is returned. This requires re-reading
> > > the page in raw mode to compare it to the corrected version, and this
> > > logic will be placed in micron_nand_on_die_ecc_status_4().
> > > 
> > > Moving the micron_nand_on_die_ecc_setup() will allow us to disable
> > > ECC only once.
> > > 
> > > As a result, we have to rework the exit path and add an error path
> > > where the ECC is disabled.
> > > 
> > > Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>    
> > 
> > As I said on the other thread this appears to cause a problem for me on 
> > the MT29F1G08ABAFAWP-ITE setup I have. I notice we're not able to find 
> > the BBT, not sure if that is symptom or cause.

It's most likely the symptom, not the cause.

> > > 
> > > diff --git a/drivers/mtd/nand/raw/nand_micron.c b/drivers/mtd/nand/raw/nand_micron.c
> > > index 63ac98a36ed7..b9cbaf125a98 100644
> > > --- a/drivers/mtd/nand/raw/nand_micron.c
> > > +++ b/drivers/mtd/nand/raw/nand_micron.c
> > > @@ -197,30 +197,37 @@ micron_nand_read_page_on_die_ecc(struct mtd_info *mtd, struct nand_chip *chip,
> > >   
> > >   	ret = nand_read_page_op(chip, page, 0, NULL, 0);
> > >   	if (ret)
> > > -		goto out;
> > > +		goto err_disable_ecc;
> > >   
> > >   	ret = nand_status_op(chip, &status);
> > >   	if (ret)
> > > -		goto out;
> > > +		goto err_disable_ecc;
> > >   
> > >   	ret = nand_exit_status_op(chip);
> > >   	if (ret)
> > > -		goto out;
> > > +		goto err_disable_ecc;
> > >   
> > > -	if (chip->ecc.strength == 4)
> > > -		max_bitflips = micron_nand_on_die_ecc_status_4(chip, status);
> > > -	else
> > > -		max_bitflips = micron_nand_on_die_ecc_status_8(chip, status);
> > > +	micron_nand_on_die_ecc_setup(chip, false);  
> 
> Hm, can you try to move the micron_nand_on_die_ecc_setup(chip, false)
> call just before nand_exit_status_op()?
> 

Just pushed a branch fixing that [1]. Can you test it? If it works,
I'll ask Miquel to drop the initial set of patches and instead pick the
fixed ones so that we don't break bisectibility.

[1]https://github.com/bbrezillon/linux-0day/commits/nand/next
Chris Packham July 16, 2018, 10:42 p.m. UTC | #4
On 17/07/18 08:55, Boris Brezillon wrote:
> On Mon, 16 Jul 2018 18:10:57 +0200
> Boris Brezillon <boris.brezillon@bootlin.com> wrote:
> 
>> On Mon, 16 Jul 2018 09:00:59 +0000
>> Chris Packham <Chris.Packham@alliedtelesis.co.nz> wrote:
>>
>>> Hi Boris,
>>>
>>> On 04/07/18 00:20, Boris Brezillon wrote:
>>>> We are about to support extracting the real number of bitflips for
>>>> 4-bits ECC when WRITE_RECOMMEND is returned. This requires re-reading
>>>> the page in raw mode to compare it to the corrected version, and this
>>>> logic will be placed in micron_nand_on_die_ecc_status_4().
>>>>
>>>> Moving the micron_nand_on_die_ecc_setup() will allow us to disable
>>>> ECC only once.
>>>>
>>>> As a result, we have to rework the exit path and add an error path
>>>> where the ECC is disabled.
>>>>
>>>> Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
>>>
>>> As I said on the other thread this appears to cause a problem for me on
>>> the MT29F1G08ABAFAWP-ITE setup I have. I notice we're not able to find
>>> the BBT, not sure if that is symptom or cause.
> 
> It's most likely the symptom, not the cause.
> 
>>>>
>>>> diff --git a/drivers/mtd/nand/raw/nand_micron.c b/drivers/mtd/nand/raw/nand_micron.c
>>>> index 63ac98a36ed7..b9cbaf125a98 100644
>>>> --- a/drivers/mtd/nand/raw/nand_micron.c
>>>> +++ b/drivers/mtd/nand/raw/nand_micron.c
>>>> @@ -197,30 +197,37 @@ micron_nand_read_page_on_die_ecc(struct mtd_info *mtd, struct nand_chip *chip,
>>>>    
>>>>    	ret = nand_read_page_op(chip, page, 0, NULL, 0);
>>>>    	if (ret)
>>>> -		goto out;
>>>> +		goto err_disable_ecc;
>>>>    
>>>>    	ret = nand_status_op(chip, &status);
>>>>    	if (ret)
>>>> -		goto out;
>>>> +		goto err_disable_ecc;
>>>>    
>>>>    	ret = nand_exit_status_op(chip);
>>>>    	if (ret)
>>>> -		goto out;
>>>> +		goto err_disable_ecc;
>>>>    
>>>> -	if (chip->ecc.strength == 4)
>>>> -		max_bitflips = micron_nand_on_die_ecc_status_4(chip, status);
>>>> -	else
>>>> -		max_bitflips = micron_nand_on_die_ecc_status_8(chip, status);
>>>> +	micron_nand_on_die_ecc_setup(chip, false);
>>
>> Hm, can you try to move the micron_nand_on_die_ecc_setup(chip, false)
>> call just before nand_exit_status_op()?
>>
> 
> Just pushed a branch fixing that [1]. Can you test it? If it works,
> I'll ask Miquel to drop the initial set of patches and instead pick the
> fixed ones so that we don't break bisectibility.
> 
> [1]https://github.com/bbrezillon/linux-0day/commits/nand/next
> 

Still appears to have the same problem.

I'm guessing that since you can't actually disable ecc on this chip 
calling micron_nand_on_die_ecc_setup(chip, false); before reading the 
oob data interferes with it somehow (if I call it after there is no 
problem).

We could add code to qualify the attempt to disable ecc early based on 
it being optional/mandatory or just stick with it being disabled late.
Boris Brezillon July 17, 2018, 11:41 a.m. UTC | #5
On Mon, 16 Jul 2018 22:42:54 +0000
Chris Packham <Chris.Packham@alliedtelesis.co.nz> wrote:

> On 17/07/18 08:55, Boris Brezillon wrote:
> > On Mon, 16 Jul 2018 18:10:57 +0200
> > Boris Brezillon <boris.brezillon@bootlin.com> wrote:
> >   
> >> On Mon, 16 Jul 2018 09:00:59 +0000
> >> Chris Packham <Chris.Packham@alliedtelesis.co.nz> wrote:
> >>  
> >>> Hi Boris,
> >>>
> >>> On 04/07/18 00:20, Boris Brezillon wrote:  
> >>>> We are about to support extracting the real number of bitflips for
> >>>> 4-bits ECC when WRITE_RECOMMEND is returned. This requires re-reading
> >>>> the page in raw mode to compare it to the corrected version, and this
> >>>> logic will be placed in micron_nand_on_die_ecc_status_4().
> >>>>
> >>>> Moving the micron_nand_on_die_ecc_setup() will allow us to disable
> >>>> ECC only once.
> >>>>
> >>>> As a result, we have to rework the exit path and add an error path
> >>>> where the ECC is disabled.
> >>>>
> >>>> Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>  
> >>>
> >>> As I said on the other thread this appears to cause a problem for me on
> >>> the MT29F1G08ABAFAWP-ITE setup I have. I notice we're not able to find
> >>> the BBT, not sure if that is symptom or cause.  
> > 
> > It's most likely the symptom, not the cause.
> >   
> >>>>
> >>>> diff --git a/drivers/mtd/nand/raw/nand_micron.c b/drivers/mtd/nand/raw/nand_micron.c
> >>>> index 63ac98a36ed7..b9cbaf125a98 100644
> >>>> --- a/drivers/mtd/nand/raw/nand_micron.c
> >>>> +++ b/drivers/mtd/nand/raw/nand_micron.c
> >>>> @@ -197,30 +197,37 @@ micron_nand_read_page_on_die_ecc(struct mtd_info *mtd, struct nand_chip *chip,
> >>>>    
> >>>>    	ret = nand_read_page_op(chip, page, 0, NULL, 0);
> >>>>    	if (ret)
> >>>> -		goto out;
> >>>> +		goto err_disable_ecc;
> >>>>    
> >>>>    	ret = nand_status_op(chip, &status);
> >>>>    	if (ret)
> >>>> -		goto out;
> >>>> +		goto err_disable_ecc;
> >>>>    
> >>>>    	ret = nand_exit_status_op(chip);
> >>>>    	if (ret)
> >>>> -		goto out;
> >>>> +		goto err_disable_ecc;
> >>>>    
> >>>> -	if (chip->ecc.strength == 4)
> >>>> -		max_bitflips = micron_nand_on_die_ecc_status_4(chip, status);
> >>>> -	else
> >>>> -		max_bitflips = micron_nand_on_die_ecc_status_8(chip, status);
> >>>> +	micron_nand_on_die_ecc_setup(chip, false);  
> >>
> >> Hm, can you try to move the micron_nand_on_die_ecc_setup(chip, false)
> >> call just before nand_exit_status_op()?
> >>  
> > 
> > Just pushed a branch fixing that [1]. Can you test it? If it works,
> > I'll ask Miquel to drop the initial set of patches and instead pick the
> > fixed ones so that we don't break bisectibility.
> > 
> > [1]https://github.com/bbrezillon/linux-0day/commits/nand/next
> >   
> 
> Still appears to have the same problem.
> 
> I'm guessing that since you can't actually disable ecc on this chip 
> calling micron_nand_on_die_ecc_setup(chip, false); before reading the 
> oob data interferes with it somehow (if I call it after there is no 
> problem).
> 
> We could add code to qualify the attempt to disable ecc early based on 
> it being optional/mandatory or just stick with it being disabled late.

Okay. I gave up on disabling ECC earlier and instead implemented what
you suggested. Can you test
https://github.com/bbrezillon/linux-0day/commits/nand/next again and
let me know if it works for you?

Thanks,

Boris
Chris Packham July 17, 2018, 9:27 p.m. UTC | #6
Hi Boris,

On 17/07/18 23:42, Boris Brezillon wrote:
> On Mon, 16 Jul 2018 22:42:54 +0000
> Chris Packham <Chris.Packham@alliedtelesis.co.nz> wrote:
> 
>> On 17/07/18 08:55, Boris Brezillon wrote:
>>> On Mon, 16 Jul 2018 18:10:57 +0200
>>> Boris Brezillon <boris.brezillon@bootlin.com> wrote:
>>>    
>>>> On Mon, 16 Jul 2018 09:00:59 +0000
>>>> Chris Packham <Chris.Packham@alliedtelesis.co.nz> wrote:
>>>>   
>>>>> Hi Boris,
>>>>>
>>>>> On 04/07/18 00:20, Boris Brezillon wrote:
>>>>>> We are about to support extracting the real number of bitflips for
>>>>>> 4-bits ECC when WRITE_RECOMMEND is returned. This requires re-reading
>>>>>> the page in raw mode to compare it to the corrected version, and this
>>>>>> logic will be placed in micron_nand_on_die_ecc_status_4().
>>>>>>
>>>>>> Moving the micron_nand_on_die_ecc_setup() will allow us to disable
>>>>>> ECC only once.
>>>>>>
>>>>>> As a result, we have to rework the exit path and add an error path
>>>>>> where the ECC is disabled.
>>>>>>
>>>>>> Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
>>>>>
>>>>> As I said on the other thread this appears to cause a problem for me on
>>>>> the MT29F1G08ABAFAWP-ITE setup I have. I notice we're not able to find
>>>>> the BBT, not sure if that is symptom or cause.
>>>
>>> It's most likely the symptom, not the cause.
>>>    
>>>>>>
>>>>>> diff --git a/drivers/mtd/nand/raw/nand_micron.c b/drivers/mtd/nand/raw/nand_micron.c
>>>>>> index 63ac98a36ed7..b9cbaf125a98 100644
>>>>>> --- a/drivers/mtd/nand/raw/nand_micron.c
>>>>>> +++ b/drivers/mtd/nand/raw/nand_micron.c
>>>>>> @@ -197,30 +197,37 @@ micron_nand_read_page_on_die_ecc(struct mtd_info *mtd, struct nand_chip *chip,
>>>>>>     
>>>>>>     	ret = nand_read_page_op(chip, page, 0, NULL, 0);
>>>>>>     	if (ret)
>>>>>> -		goto out;
>>>>>> +		goto err_disable_ecc;
>>>>>>     
>>>>>>     	ret = nand_status_op(chip, &status);
>>>>>>     	if (ret)
>>>>>> -		goto out;
>>>>>> +		goto err_disable_ecc;
>>>>>>     
>>>>>>     	ret = nand_exit_status_op(chip);
>>>>>>     	if (ret)
>>>>>> -		goto out;
>>>>>> +		goto err_disable_ecc;
>>>>>>     
>>>>>> -	if (chip->ecc.strength == 4)
>>>>>> -		max_bitflips = micron_nand_on_die_ecc_status_4(chip, status);
>>>>>> -	else
>>>>>> -		max_bitflips = micron_nand_on_die_ecc_status_8(chip, status);
>>>>>> +	micron_nand_on_die_ecc_setup(chip, false);
>>>>
>>>> Hm, can you try to move the micron_nand_on_die_ecc_setup(chip, false)
>>>> call just before nand_exit_status_op()?
>>>>   
>>>
>>> Just pushed a branch fixing that [1]. Can you test it? If it works,
>>> I'll ask Miquel to drop the initial set of patches and instead pick the
>>> fixed ones so that we don't break bisectibility.
>>>
>>> [1]https://github.com/bbrezillon/linux-0day/commits/nand/next
>>>    
>>
>> Still appears to have the same problem.
>>
>> I'm guessing that since you can't actually disable ecc on this chip
>> calling micron_nand_on_die_ecc_setup(chip, false); before reading the
>> oob data interferes with it somehow (if I call it after there is no
>> problem).
>>
>> We could add code to qualify the attempt to disable ecc early based on
>> it being optional/mandatory or just stick with it being disabled late.
> 
> Okay. I gave up on disabling ECC earlier and instead implemented what
> you suggested. Can you test
> https://github.com/bbrezillon/linux-0day/commits/nand/next again and
> let me know if it works for you?

Yes that works for me. Feel free to add

Tested-by: Chris Packham <chris.packham@alliedtelesis.co.nz>

to

88240e244026 mtd: rawnand: micron: Make ECC activation stateful
6ae7f205cec0 mtd: rawnand: micron: Avoid enabling/disabling ECC when it 
can't be disabled
Boris Brezillon July 17, 2018, 9:31 p.m. UTC | #7
On Tue, 17 Jul 2018 21:27:26 +0000
Chris Packham <Chris.Packham@alliedtelesis.co.nz> wrote:

> Hi Boris,
> 
> On 17/07/18 23:42, Boris Brezillon wrote:
> > On Mon, 16 Jul 2018 22:42:54 +0000
> > Chris Packham <Chris.Packham@alliedtelesis.co.nz> wrote:
> >   
> >> On 17/07/18 08:55, Boris Brezillon wrote:  
> >>> On Mon, 16 Jul 2018 18:10:57 +0200
> >>> Boris Brezillon <boris.brezillon@bootlin.com> wrote:
> >>>      
> >>>> On Mon, 16 Jul 2018 09:00:59 +0000
> >>>> Chris Packham <Chris.Packham@alliedtelesis.co.nz> wrote:
> >>>>     
> >>>>> Hi Boris,
> >>>>>
> >>>>> On 04/07/18 00:20, Boris Brezillon wrote:  
> >>>>>> We are about to support extracting the real number of bitflips for
> >>>>>> 4-bits ECC when WRITE_RECOMMEND is returned. This requires re-reading
> >>>>>> the page in raw mode to compare it to the corrected version, and this
> >>>>>> logic will be placed in micron_nand_on_die_ecc_status_4().
> >>>>>>
> >>>>>> Moving the micron_nand_on_die_ecc_setup() will allow us to disable
> >>>>>> ECC only once.
> >>>>>>
> >>>>>> As a result, we have to rework the exit path and add an error path
> >>>>>> where the ECC is disabled.
> >>>>>>
> >>>>>> Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>  
> >>>>>
> >>>>> As I said on the other thread this appears to cause a problem for me on
> >>>>> the MT29F1G08ABAFAWP-ITE setup I have. I notice we're not able to find
> >>>>> the BBT, not sure if that is symptom or cause.  
> >>>
> >>> It's most likely the symptom, not the cause.
> >>>      
> >>>>>>
> >>>>>> diff --git a/drivers/mtd/nand/raw/nand_micron.c b/drivers/mtd/nand/raw/nand_micron.c
> >>>>>> index 63ac98a36ed7..b9cbaf125a98 100644
> >>>>>> --- a/drivers/mtd/nand/raw/nand_micron.c
> >>>>>> +++ b/drivers/mtd/nand/raw/nand_micron.c
> >>>>>> @@ -197,30 +197,37 @@ micron_nand_read_page_on_die_ecc(struct mtd_info *mtd, struct nand_chip *chip,
> >>>>>>     
> >>>>>>     	ret = nand_read_page_op(chip, page, 0, NULL, 0);
> >>>>>>     	if (ret)
> >>>>>> -		goto out;
> >>>>>> +		goto err_disable_ecc;
> >>>>>>     
> >>>>>>     	ret = nand_status_op(chip, &status);
> >>>>>>     	if (ret)
> >>>>>> -		goto out;
> >>>>>> +		goto err_disable_ecc;
> >>>>>>     
> >>>>>>     	ret = nand_exit_status_op(chip);
> >>>>>>     	if (ret)
> >>>>>> -		goto out;
> >>>>>> +		goto err_disable_ecc;
> >>>>>>     
> >>>>>> -	if (chip->ecc.strength == 4)
> >>>>>> -		max_bitflips = micron_nand_on_die_ecc_status_4(chip, status);
> >>>>>> -	else
> >>>>>> -		max_bitflips = micron_nand_on_die_ecc_status_8(chip, status);
> >>>>>> +	micron_nand_on_die_ecc_setup(chip, false);  
> >>>>
> >>>> Hm, can you try to move the micron_nand_on_die_ecc_setup(chip, false)
> >>>> call just before nand_exit_status_op()?
> >>>>     
> >>>
> >>> Just pushed a branch fixing that [1]. Can you test it? If it works,
> >>> I'll ask Miquel to drop the initial set of patches and instead pick the
> >>> fixed ones so that we don't break bisectibility.
> >>>
> >>> [1]https://github.com/bbrezillon/linux-0day/commits/nand/next
> >>>      
> >>
> >> Still appears to have the same problem.
> >>
> >> I'm guessing that since you can't actually disable ecc on this chip
> >> calling micron_nand_on_die_ecc_setup(chip, false); before reading the
> >> oob data interferes with it somehow (if I call it after there is no
> >> problem).
> >>
> >> We could add code to qualify the attempt to disable ecc early based on
> >> it being optional/mandatory or just stick with it being disabled late.  
> > 
> > Okay. I gave up on disabling ECC earlier and instead implemented what
> > you suggested. Can you test
> > https://github.com/bbrezillon/linux-0day/commits/nand/next again and
> > let me know if it works for you?  
> 
> Yes that works for me. Feel free to add
> 
> Tested-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
> 
> to
> 
> 88240e244026 mtd: rawnand: micron: Make ECC activation stateful
> 6ae7f205cec0 mtd: rawnand: micron: Avoid enabling/disabling ECC when it 
> can't be disabled

Okay, cool.

Miquel, can you drop:

683ca8a70738 mtd: rawnand: micron: Get the actual number of bitflips
d73e5d29aa29 mtd: rawnand: micron: Disable ECC earlier in the read path
db2d37331813 mtd: rawnand: micron: Stop passing mtd to ecc_status() funcs
51f3b3970a8c mtd: rawnand: micron: detect forced on-die ECC
386fc2609d15 mtd: rawnand: micron: support 8/512 on-die ECC

I'll send a new patch series soon.

Thanks,

Boris
Miquel Raynal July 18, 2018, 7:25 a.m. UTC | #8
Hi Boris,

Boris Brezillon <boris.brezillon@bootlin.com> wrote on Tue, 17 Jul 2018
23:31:43 +0200:

> On Tue, 17 Jul 2018 21:27:26 +0000
> Chris Packham <Chris.Packham@alliedtelesis.co.nz> wrote:
> 
> > Hi Boris,
> > 
> > On 17/07/18 23:42, Boris Brezillon wrote:  
> > > On Mon, 16 Jul 2018 22:42:54 +0000
> > > Chris Packham <Chris.Packham@alliedtelesis.co.nz> wrote:
> > >     
> > >> On 17/07/18 08:55, Boris Brezillon wrote:    
> > >>> On Mon, 16 Jul 2018 18:10:57 +0200
> > >>> Boris Brezillon <boris.brezillon@bootlin.com> wrote:
> > >>>        
> > >>>> On Mon, 16 Jul 2018 09:00:59 +0000
> > >>>> Chris Packham <Chris.Packham@alliedtelesis.co.nz> wrote:
> > >>>>       
> > >>>>> Hi Boris,
> > >>>>>
> > >>>>> On 04/07/18 00:20, Boris Brezillon wrote:    
> > >>>>>> We are about to support extracting the real number of bitflips for
> > >>>>>> 4-bits ECC when WRITE_RECOMMEND is returned. This requires re-reading
> > >>>>>> the page in raw mode to compare it to the corrected version, and this
> > >>>>>> logic will be placed in micron_nand_on_die_ecc_status_4().
> > >>>>>>
> > >>>>>> Moving the micron_nand_on_die_ecc_setup() will allow us to disable
> > >>>>>> ECC only once.
> > >>>>>>
> > >>>>>> As a result, we have to rework the exit path and add an error path
> > >>>>>> where the ECC is disabled.
> > >>>>>>
> > >>>>>> Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>    
> > >>>>>
> > >>>>> As I said on the other thread this appears to cause a problem for me on
> > >>>>> the MT29F1G08ABAFAWP-ITE setup I have. I notice we're not able to find
> > >>>>> the BBT, not sure if that is symptom or cause.    
> > >>>
> > >>> It's most likely the symptom, not the cause.
> > >>>        
> > >>>>>>
> > >>>>>> diff --git a/drivers/mtd/nand/raw/nand_micron.c b/drivers/mtd/nand/raw/nand_micron.c
> > >>>>>> index 63ac98a36ed7..b9cbaf125a98 100644
> > >>>>>> --- a/drivers/mtd/nand/raw/nand_micron.c
> > >>>>>> +++ b/drivers/mtd/nand/raw/nand_micron.c
> > >>>>>> @@ -197,30 +197,37 @@ micron_nand_read_page_on_die_ecc(struct mtd_info *mtd, struct nand_chip *chip,
> > >>>>>>     
> > >>>>>>     	ret = nand_read_page_op(chip, page, 0, NULL, 0);
> > >>>>>>     	if (ret)
> > >>>>>> -		goto out;
> > >>>>>> +		goto err_disable_ecc;
> > >>>>>>     
> > >>>>>>     	ret = nand_status_op(chip, &status);
> > >>>>>>     	if (ret)
> > >>>>>> -		goto out;
> > >>>>>> +		goto err_disable_ecc;
> > >>>>>>     
> > >>>>>>     	ret = nand_exit_status_op(chip);
> > >>>>>>     	if (ret)
> > >>>>>> -		goto out;
> > >>>>>> +		goto err_disable_ecc;
> > >>>>>>     
> > >>>>>> -	if (chip->ecc.strength == 4)
> > >>>>>> -		max_bitflips = micron_nand_on_die_ecc_status_4(chip, status);
> > >>>>>> -	else
> > >>>>>> -		max_bitflips = micron_nand_on_die_ecc_status_8(chip, status);
> > >>>>>> +	micron_nand_on_die_ecc_setup(chip, false);    
> > >>>>
> > >>>> Hm, can you try to move the micron_nand_on_die_ecc_setup(chip, false)
> > >>>> call just before nand_exit_status_op()?
> > >>>>       
> > >>>
> > >>> Just pushed a branch fixing that [1]. Can you test it? If it works,
> > >>> I'll ask Miquel to drop the initial set of patches and instead pick the
> > >>> fixed ones so that we don't break bisectibility.
> > >>>
> > >>> [1]https://github.com/bbrezillon/linux-0day/commits/nand/next
> > >>>        
> > >>
> > >> Still appears to have the same problem.
> > >>
> > >> I'm guessing that since you can't actually disable ecc on this chip
> > >> calling micron_nand_on_die_ecc_setup(chip, false); before reading the
> > >> oob data interferes with it somehow (if I call it after there is no
> > >> problem).
> > >>
> > >> We could add code to qualify the attempt to disable ecc early based on
> > >> it being optional/mandatory or just stick with it being disabled late.    
> > > 
> > > Okay. I gave up on disabling ECC earlier and instead implemented what
> > > you suggested. Can you test
> > > https://github.com/bbrezillon/linux-0day/commits/nand/next again and
> > > let me know if it works for you?    
> > 
> > Yes that works for me. Feel free to add
> > 
> > Tested-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
> > 
> > to
> > 
> > 88240e244026 mtd: rawnand: micron: Make ECC activation stateful
> > 6ae7f205cec0 mtd: rawnand: micron: Avoid enabling/disabling ECC when it 
> > can't be disabled  
> 
> Okay, cool.
> 
> Miquel, can you drop:
> 
> 683ca8a70738 mtd: rawnand: micron: Get the actual number of bitflips
> d73e5d29aa29 mtd: rawnand: micron: Disable ECC earlier in the read path
> db2d37331813 mtd: rawnand: micron: Stop passing mtd to ecc_status() funcs
> 51f3b3970a8c mtd: rawnand: micron: detect forced on-die ECC
> 386fc2609d15 mtd: rawnand: micron: support 8/512 on-die ECC
> 
> I'll send a new patch series soon.

Above patches dropped, thank you both for working on (and fixing) this.

I'll wait for the new series.

Thanks,
Miquèl
diff mbox series

Patch

diff --git a/drivers/mtd/nand/raw/nand_micron.c b/drivers/mtd/nand/raw/nand_micron.c
index 63ac98a36ed7..b9cbaf125a98 100644
--- a/drivers/mtd/nand/raw/nand_micron.c
+++ b/drivers/mtd/nand/raw/nand_micron.c
@@ -197,30 +197,37 @@  micron_nand_read_page_on_die_ecc(struct mtd_info *mtd, struct nand_chip *chip,
 
 	ret = nand_read_page_op(chip, page, 0, NULL, 0);
 	if (ret)
-		goto out;
+		goto err_disable_ecc;
 
 	ret = nand_status_op(chip, &status);
 	if (ret)
-		goto out;
+		goto err_disable_ecc;
 
 	ret = nand_exit_status_op(chip);
 	if (ret)
-		goto out;
+		goto err_disable_ecc;
 
-	if (chip->ecc.strength == 4)
-		max_bitflips = micron_nand_on_die_ecc_status_4(chip, status);
-	else
-		max_bitflips = micron_nand_on_die_ecc_status_8(chip, status);
+	micron_nand_on_die_ecc_setup(chip, false);
 
 	ret = nand_read_data_op(chip, buf, mtd->writesize, false);
 	if (!ret && oob_required)
 		ret = nand_read_data_op(chip, chip->oob_poi, mtd->oobsize,
 					false);
 
-out:
+	if (ret)
+		return ret;
+
+	if (chip->ecc.strength == 4)
+		max_bitflips = micron_nand_on_die_ecc_status_4(chip, status);
+	else
+		max_bitflips = micron_nand_on_die_ecc_status_8(chip, status);
+
+	return max_bitflips;
+
+err_disable_ecc:
 	micron_nand_on_die_ecc_setup(chip, false);
 
-	return ret ? ret : max_bitflips;
+	return ret;
 }
 
 static int