diff mbox series

mtd: rawnand: marvell: check for RDY bits after enabling the IRQ

Message ID 20180926212353.13399-1-daniel@zonque.org
State Superseded
Series mtd: rawnand: marvell: check for RDY bits after enabling the IRQ

Commit Message

Daniel Mack Sept. 26, 2018, 9:23 p.m. UTC
At least on PXA3xx platforms, enabling RDY interrupts in the NDCR register
will only cause the IRQ to latch when the RDY lanes are changing, and not
in case they are already asserted.

This means that if the controller finished the command in flight before
marvell_nfc_wait_op() is called, that function will wait for a change in
the bit that can't ever happen as it is already set.

To mitigate this race, check for the RDY bits after the IRQ was enabled,
and only sleep on the condition if the controller isn't ready yet.

This fixes a bug that was observed with a NAND chip that holds a UBIFS
partition on which file system stress tests were executed. When
marvell_nfc_wait_op() reports an error, UBI/UBIFS will eventually mount
the filesystem read-only, reporting lots of warnings along the way.

Fixes: 02f26ecf8c77 ("mtd: nand: add reworked Marvell NAND controller driver")
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Mack <daniel@zonque.org>
---
 drivers/mtd/nand/raw/marvell_nand.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)
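
The race described in the commit message can be illustrated with a small
user-space model. This is only a sketch under stated assumptions: struct
fake_nfc and every function below are hypothetical stand-ins, not driver
code, and the model assumes the interrupt latches only on a 0 -> 1
transition of the ready bit while the interrupt is unmasked.

```c
#include <stdbool.h>

/* Hypothetical model, not the real driver state. */
struct fake_nfc {
	bool rdy;          /* level of the ready status bit */
	bool irq_enabled;  /* ready interrupt unmasked */
	bool irq_latched;  /* an edge was seen while unmasked */
};

/* The operation finishes: the ready bit goes high. The interrupt is
 * latched only if this is a transition seen while unmasked. */
static void fake_nfc_finish_op(struct fake_nfc *nfc)
{
	if (!nfc->rdy && nfc->irq_enabled)
		nfc->irq_latched = true;
	nfc->rdy = true;
}

/* Pre-patch shape: unmask, then rely on the latched edge alone.
 * Returns true if the waiter would be woken, false if it would time out. */
static bool wait_edge_only(struct fake_nfc *nfc, bool finishes_while_waiting)
{
	bool woken;

	nfc->irq_enabled = true;
	if (finishes_while_waiting)
		fake_nfc_finish_op(nfc);
	woken = nfc->irq_latched;
	nfc->irq_enabled = false;
	return woken;
}

/* Patched shape: unmask, read the ready level, and only wait for the
 * edge if the level is not already set. */
static bool wait_with_level_check(struct fake_nfc *nfc, bool finishes_while_waiting)
{
	bool ready;

	nfc->irq_enabled = true;
	ready = nfc->rdy;	/* status read after unmasking */
	if (!ready) {
		if (finishes_while_waiting)
			fake_nfc_finish_op(nfc);
		ready = nfc->irq_latched;
	}
	nfc->irq_enabled = false;
	return ready;
}
```

In this model, calling fake_nfc_finish_op() before wait_edge_only() makes
the wait fail, because no edge is ever latched, while
wait_with_level_check() succeeds in the same situation.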

Comments

Chris Packham Sept. 26, 2018, 11:33 p.m. UTC | #1
Hi Daniel,

On 27/09/18 09:24, Daniel Mack wrote:
> At least on PXA3xx platforms, enabling RDY interrupts in the NDCR register
> will only cause the IRQ to latch when the RDY lanes are changing, and not
> in case they are already asserted.
> 
> This means that if the controller finished the command in flight before
> marvell_nfc_wait_op() is called, that function will wait for a change in
> the bit that can't ever happen as it is already set.
> 
> To mitigate this race, check for the RDY bits after the IRQ was enabled,
> and only sleep on the condition if the controller isn't ready yet.
> 
> This fixes a bug that was observed with a NAND chip that holds a UBIFS
> partition on which file system stress tests were executed. When
> marvell_nfc_wait_op() reports an error, UBI/UBIFS will eventually mount
> the filesystem read-only, reporting lots of warnings along the way.
> 
> Fixes: 02f26ecf8c77 mtd: nand: add reworked Marvell NAND controller driver
> Cc: stable@vger.kernel.org
> Signed-off-by: Daniel Mack <daniel@zonque.org>
> ---
>   drivers/mtd/nand/raw/marvell_nand.c | 14 +++++++++++---
>   1 file changed, 11 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/mtd/nand/raw/marvell_nand.c b/drivers/mtd/nand/raw/marvell_nand.c
> index 666f34b58dec..e96ec7b9a152 100644
> --- a/drivers/mtd/nand/raw/marvell_nand.c
> +++ b/drivers/mtd/nand/raw/marvell_nand.c
> @@ -613,7 +613,8 @@ static int marvell_nfc_wait_cmdd(struct nand_chip *chip)
>   static int marvell_nfc_wait_op(struct nand_chip *chip, unsigned int timeout_ms)
>   {
>   	struct marvell_nfc *nfc = to_marvell_nfc(chip->controller);
> -	int ret;
> +	int ret = -EALREADY;

Won't this cause us to hit the if (!ret) below if we don't end up 
calling wait_for_completion_timeout()? In fact I did just hit something 
like this on my Armada-385 based board

marvell-nfc f10d0000.nand-controller: Timeout waiting for RB signal
ubi0 warning: do_sync_erase: error -5 while erasing PEB 755, retry

> +	u32 st;
>   
>   	/* Timeout is expressed in ms */
>   	if (!timeout_ms)
> @@ -622,8 +623,15 @@ static int marvell_nfc_wait_op(struct nand_chip *chip, unsigned int timeout_ms)
>   	init_completion(&nfc->complete);
>   
>   	marvell_nfc_enable_int(nfc, NDCR_RDYM);
> -	ret = wait_for_completion_timeout(&nfc->complete,
> -					  msecs_to_jiffies(timeout_ms));
> +
> +	/*
> +	 * Check if the NDSR_RDY bits have already been set before the
> +	 * interrupt was enabled.
> +	 */
> +	st = readl_relaxed(nfc->regs + NDSR);
> +	if (!(st & (NDSR_RDY(0) | NDSR_RDY(1))))
> +		ret = wait_for_completion_timeout(&nfc->complete,
> +						  msecs_to_jiffies(timeout_ms));
>   	marvell_nfc_disable_int(nfc, NDCR_RDYM);
>   	marvell_nfc_clear_int(nfc, NDSR_RDY(0) | NDSR_RDY(1));
>   	if (!ret) {
>
Chris Packham Sept. 26, 2018, 11:42 p.m. UTC | #2
On 27/09/18 11:33, Chris Packham wrote:
> Hi Daniel,
> 
> On 27/09/18 09:24, Daniel Mack wrote:
>> At least on PXA3xx platforms, enabling RDY interrupts in the NDCR register
>> will only cause the IRQ to latch when the RDY lanes are changing, and not
>> in case they are already asserted.
>>
>> This means that if the controller finished the command in flight before
>> marvell_nfc_wait_op() is called, that function will wait for a change in
>> the bit that can't ever happen as it is already set.
>>
>> To mitigate this race, check for the RDY bits after the IRQ was enabled,
>> and only sleep on the condition if the controller isn't ready yet.
>>
>> This fixes a bug that was observed with a NAND chip that holds a UBIFS
>> partition on which file system stress tests were executed. When
>> marvell_nfc_wait_op() reports an error, UBI/UBIFS will eventually mount
>> the filesystem read-only, reporting lots of warnings along the way.
>>
>> Fixes: 02f26ecf8c77 mtd: nand: add reworked Marvell NAND controller driver
>> Cc: stable@vger.kernel.org
>> Signed-off-by: Daniel Mack <daniel@zonque.org>
>> ---
>>    drivers/mtd/nand/raw/marvell_nand.c | 14 +++++++++++---
>>    1 file changed, 11 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/mtd/nand/raw/marvell_nand.c b/drivers/mtd/nand/raw/marvell_nand.c
>> index 666f34b58dec..e96ec7b9a152 100644
>> --- a/drivers/mtd/nand/raw/marvell_nand.c
>> +++ b/drivers/mtd/nand/raw/marvell_nand.c
>> @@ -613,7 +613,8 @@ static int marvell_nfc_wait_cmdd(struct nand_chip *chip)
>>    static int marvell_nfc_wait_op(struct nand_chip *chip, unsigned int timeout_ms)
>>    {
>>    	struct marvell_nfc *nfc = to_marvell_nfc(chip->controller);
>> -	int ret;
>> +	int ret = -EALREADY;
> 
> Won't this cause us to hit the if (!ret) below if we don't end up
> calling wait_for_completion_timeout()?

Nope boolean logic fail.

> In fact I did just hit something
> like this on my Armada-385 based board
> 
> marvell-nfc f10d0000.nand-controller: Timeout waiting for RB signal
> ubi0 warning: do_sync_erase: error -5 while erasing PEB 755, retry
>

But this still might be something else, it doesn't happen regularly. For 
what it's worth

Tested-by: Chris Packham <chris.packham@alliedtelesis.co.nz>

>> +	u32 st;
>>    
>>    	/* Timeout is expressed in ms */
>>    	if (!timeout_ms)
>> @@ -622,8 +623,15 @@ static int marvell_nfc_wait_op(struct nand_chip *chip, unsigned int timeout_ms)
>>    	init_completion(&nfc->complete);
>>    
>>    	marvell_nfc_enable_int(nfc, NDCR_RDYM);
>> -	ret = wait_for_completion_timeout(&nfc->complete,
>> -					  msecs_to_jiffies(timeout_ms));
>> +
>> +	/*
>> +	 * Check if the NDSR_RDY bits have already been set before the
>> +	 * interrupt was enabled.
>> +	 */
>> +	st = readl_relaxed(nfc->regs + NDSR);
>> +	if (!(st & (NDSR_RDY(0) | NDSR_RDY(1))))
>> +		ret = wait_for_completion_timeout(&nfc->complete,
>> +						  msecs_to_jiffies(timeout_ms));
>>    	marvell_nfc_disable_int(nfc, NDCR_RDYM);
>>    	marvell_nfc_clear_int(nfc, NDSR_RDY(0) | NDSR_RDY(1));
>>    	if (!ret) {
>>
> 
>
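
The self-correction in this reply can be checked in isolation. The sketch
below is a user-space illustration only; treated_as_timeout() is a
hypothetical helper that mirrors the `if (!ret)` test at the end of the
hunk, and it assumes EALREADY comes from the C library's <errno.h>.

```c
#include <errno.h>
#include <stdbool.h>

/* Mirrors the `if (!ret)` check: only ret == 0 is treated as a timeout.
 * Any non-zero value, including a negative errno-style initializer,
 * skips the timeout branch. */
static bool treated_as_timeout(int ret)
{
	return !ret;
}
```

With ret initialized to -EALREADY and the wait skipped, ret stays non-zero,
so the timeout branch is not taken; it is taken only when the wait actually
runs and returns 0.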
Boris Brezillon Sept. 27, 2018, 6:20 a.m. UTC | #3
Hi Daniel,

On Wed, 26 Sep 2018 23:23:53 +0200
Daniel Mack <daniel@zonque.org> wrote:

> At least on PXA3xx platforms, enabling RDY interrupts in the NDCR register
> will only cause the IRQ to latch when the RDY lanes are changing, and not
> in case they are already asserted.
> 
> This means that if the controller finished the command in flight before
> marvell_nfc_wait_op() is called, that function will wait for a change in
> the bit that can't ever happen as it is already set.
> 
> To mitigate this race, check for the RDY bits after the IRQ was enabled,
> and only sleep on the condition if the controller isn't ready yet.
> 
> This fixes a bug that was observed with a NAND chip that holds a UBIFS
> partition on which file system stress tests were executed. When
> marvell_nfc_wait_op() reports an error, UBI/UBIFS will eventually mount
> the filesystem read-only, reporting lots of warnings along the way.
> 
> Fixes: 02f26ecf8c77 mtd: nand: add reworked Marvell NAND controller driver
> Cc: stable@vger.kernel.org
> Signed-off-by: Daniel Mack <daniel@zonque.org>
> ---
>  drivers/mtd/nand/raw/marvell_nand.c | 14 +++++++++++---
>  1 file changed, 11 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/mtd/nand/raw/marvell_nand.c b/drivers/mtd/nand/raw/marvell_nand.c
> index 666f34b58dec..e96ec7b9a152 100644
> --- a/drivers/mtd/nand/raw/marvell_nand.c
> +++ b/drivers/mtd/nand/raw/marvell_nand.c
> @@ -613,7 +613,8 @@ static int marvell_nfc_wait_cmdd(struct nand_chip *chip)
>  static int marvell_nfc_wait_op(struct nand_chip *chip, unsigned int timeout_ms)
>  {
>  	struct marvell_nfc *nfc = to_marvell_nfc(chip->controller);
> -	int ret;
> +	int ret = -EALREADY;
> +	u32 st;
>  
>  	/* Timeout is expressed in ms */
>  	if (!timeout_ms)
> @@ -622,8 +623,15 @@ static int marvell_nfc_wait_op(struct nand_chip *chip, unsigned int timeout_ms)
>  	init_completion(&nfc->complete);
>  
>  	marvell_nfc_enable_int(nfc, NDCR_RDYM);
> -	ret = wait_for_completion_timeout(&nfc->complete,
> -					  msecs_to_jiffies(timeout_ms));
> +
> +	/*
> +	 * Check if the NDSR_RDY bits have already been set before the
> +	 * interrupt was enabled.
> +	 */
> +	st = readl_relaxed(nfc->regs + NDSR);
> +	if (!(st & (NDSR_RDY(0) | NDSR_RDY(1))))
> +		ret = wait_for_completion_timeout(&nfc->complete,
> +						  msecs_to_jiffies(timeout_ms));

Or you can just do:

	st = readl_relaxed(nfc->regs + NDSR);
	if (st & (NDSR_RDY(0) | NDSR_RDY(1)))
		complete(&nfc->complete);

	ret = wait_for_completion_timeout(&nfc->complete,
					  msecs_to_jiffies(timeout_ms));
	...

Of course, it's less efficient than your version, but I find it clearer
than the -EALREADY approach.

>  	marvell_nfc_disable_int(nfc, NDCR_RDYM);
>  	marvell_nfc_clear_int(nfc, NDSR_RDY(0) | NDSR_RDY(1));
>  	if (!ret) {
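
The alternative suggested in this reply (signal the completion yourself if
the status already shows ready, then wait unconditionally) can be sketched
with a single-threaded model of a completion. Everything named model_* is
hypothetical and exists only for illustration; the model keeps just a
`done` counter and assumes nothing else can signal during the wait. The
-ETIMEDOUT return value is likewise an assumption made for the sketch.

```c
#include <errno.h>

/* Hypothetical single-threaded stand-in for a completion: only the
 * counter is modelled, there is no sleeping and no second context. */
struct model_completion {
	unsigned int done;
};

static void model_init_completion(struct model_completion *c)
{
	c->done = 0;
}

static void model_complete(struct model_completion *c)
{
	c->done++;
}

/* Same return convention as wait_for_completion_timeout():
 * 0 means timed out, non-zero means the completion was signalled. */
static unsigned long model_wait_timeout(struct model_completion *c,
					unsigned long timeout)
{
	if (!c->done)
		return 0;	/* nobody else can signal in this model */
	c->done--;
	return timeout ? timeout : 1;
}

/* Shape of the suggested flow: pre-signal when a ready bit is already
 * set, then always go through the wait. */
static int model_wait_op(unsigned int status, unsigned int rdy_mask,
			 unsigned long timeout)
{
	struct model_completion c;

	model_init_completion(&c);
	if (status & rdy_mask)
		model_complete(&c);

	if (!model_wait_timeout(&c, timeout))
		return -ETIMEDOUT;	/* illustrative error code */
	return 0;
}
```

Because the completion is signalled before the wait, the wait returns
immediately with a non-zero value, and no special initial value for ret is
needed.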
Daniel Mack Sept. 27, 2018, 7:13 a.m. UTC | #4
Hi,

On 27/9/2018 1:42 AM, Chris Packham wrote:
> On 27/09/18 11:33, Chris Packham wrote:
>> On 27/09/18 09:24, Daniel Mack wrote:

>>> diff --git a/drivers/mtd/nand/raw/marvell_nand.c b/drivers/mtd/nand/raw/marvell_nand.c
>>> index 666f34b58dec..e96ec7b9a152 100644
>>> --- a/drivers/mtd/nand/raw/marvell_nand.c
>>> +++ b/drivers/mtd/nand/raw/marvell_nand.c
>>> @@ -613,7 +613,8 @@ static int marvell_nfc_wait_cmdd(struct nand_chip *chip)
>>>     static int marvell_nfc_wait_op(struct nand_chip *chip, unsigned int timeout_ms)
>>>     {
>>>     	struct marvell_nfc *nfc = to_marvell_nfc(chip->controller);
>>> -	int ret;
>>> +	int ret = -EALREADY;
>>
>> Won't this cause us to hit the if (!ret) below if we don't end up
>> calling wait_for_completion_timeout()?
> 
> Nope boolean logic fail.

Yup :)

>> In fact I did just hit something
>> like this on my Armada-385 based board
>>
>> marvell-nfc f10d0000.nand-controller: Timeout waiting for RB signal
>> ubi0 warning: do_sync_erase: error -5 while erasing PEB 755, retry
>>
> 
> But this still might be something else, it doesn't happen regularly. For
> what it's worth
> 
> Tested-by: Chris Packham <chris.packham@alliedtelesis.co.nz>

Thanks. Did you see the issue that I described without that patch?


Cheers,
Daniel
Daniel Mack Sept. 27, 2018, 7:14 a.m. UTC | #5
Hi Boris,

On 27/9/2018 8:20 AM, Boris Brezillon wrote:
> On Wed, 26 Sep 2018 23:23:53 +0200
> Daniel Mack <daniel@zonque.org> wrote:

>>   	marvell_nfc_enable_int(nfc, NDCR_RDYM);
>> -	ret = wait_for_completion_timeout(&nfc->complete,
>> -					  msecs_to_jiffies(timeout_ms));
>> +
>> +	/*
>> +	 * Check if the NDSR_RDY bits have already been set before the
>> +	 * interrupt was enabled.
>> +	 */
>> +	st = readl_relaxed(nfc->regs + NDSR);
>> +	if (!(st & (NDSR_RDY(0) | NDSR_RDY(1))))
>> +		ret = wait_for_completion_timeout(&nfc->complete,
>> +						  msecs_to_jiffies(timeout_ms));
> 
> Or you can just do:
> 
> 	st = readl_relaxed(nfc->regs + NDSR);
> 	if (st & (NDSR_RDY(0) | NDSR_RDY(1)))
> 		complete(&nfc->complete);
> 
> 	ret = wait_for_completion_timeout(&nfc->complete,
> 					  msecs_to_jiffies(timeout_ms));
> 	...
> 

Ah, yes. Thanks.

> Of course, it's less efficient than your version, but I find it clearer
> than the -EALREADY approach.

Well, this issue usually takes minutes to trigger, so efficiency is not 
a good argument :)

I'll resend a v2.


Thanks,
Daniel
Chris Packham Sept. 27, 2018, 8:36 p.m. UTC | #6
On 27/09/18 19:13, Daniel Mack wrote:
> Hi,
> 
> On 27/9/2018 1:42 AM, Chris Packham wrote:
>> On 27/09/18 11:33, Chris Packham wrote:
>>> On 27/09/18 09:24, Daniel Mack wrote:
> 
>>>> diff --git a/drivers/mtd/nand/raw/marvell_nand.c b/drivers/mtd/nand/raw/marvell_nand.c
>>>> index 666f34b58dec..e96ec7b9a152 100644
>>>> --- a/drivers/mtd/nand/raw/marvell_nand.c
>>>> +++ b/drivers/mtd/nand/raw/marvell_nand.c
>>>> @@ -613,7 +613,8 @@ static int marvell_nfc_wait_cmdd(struct nand_chip *chip)
>>>>      static int marvell_nfc_wait_op(struct nand_chip *chip, unsigned int timeout_ms)
>>>>      {
>>>>      	struct marvell_nfc *nfc = to_marvell_nfc(chip->controller);
>>>> -	int ret;
>>>> +	int ret = -EALREADY;
>>>
>>> Won't this cause us to hit the if (!ret) below if we don't end up
>>> calling wait_for_completion_timeout()?
>>
>> Nope boolean logic fail.
> 
> Yup :)
> 
>>> In fact I did just hit something
>>> like this on my Armada-385 based board
>>>
>>> marvell-nfc f10d0000.nand-controller: Timeout waiting for RB signal
>>> ubi0 warning: do_sync_erase: error -5 while erasing PEB 755, retry
>>>
>>
>> But this still might be something else, it doesn't happen regularly. For
>> what it's worth
>>
>> Tested-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
> 
> Thanks. Did you see the issue that I described without that patch?
> 

Yes. I've just tried a test without your patch and I see the same thing. 
I wonder if my R/B configuration is correct.
Chris Packham Sept. 27, 2018, 9:50 p.m. UTC | #7
On 28/09/18 08:36, Chris Packham wrote:
> On 27/09/18 19:13, Daniel Mack wrote:
>> Hi,
>>
>> On 27/9/2018 1:42 AM, Chris Packham wrote:
>>> On 27/09/18 11:33, Chris Packham wrote:
>>>> On 27/09/18 09:24, Daniel Mack wrote:
>>
>>>>> diff --git a/drivers/mtd/nand/raw/marvell_nand.c b/drivers/mtd/nand/raw/marvell_nand.c
>>>>> index 666f34b58dec..e96ec7b9a152 100644
>>>>> --- a/drivers/mtd/nand/raw/marvell_nand.c
>>>>> +++ b/drivers/mtd/nand/raw/marvell_nand.c
>>>>> @@ -613,7 +613,8 @@ static int marvell_nfc_wait_cmdd(struct nand_chip *chip)
>>>>>       static int marvell_nfc_wait_op(struct nand_chip *chip, unsigned int timeout_ms)
>>>>>       {
>>>>>       	struct marvell_nfc *nfc = to_marvell_nfc(chip->controller);
>>>>> -	int ret;
>>>>> +	int ret = -EALREADY;
>>>>
>>>> Won't this cause us to hit the if (!ret) below if we don't end up
>>>> calling wait_for_completion_timeout()?
>>>
>>> Nope boolean logic fail.
>>
>> Yup :)
>>
>>>> In fact I did just hit something
>>>> like this on my Armada-385 based board
>>>>
>>>> marvell-nfc f10d0000.nand-controller: Timeout waiting for RB signal
>>>> ubi0 warning: do_sync_erase: error -5 while erasing PEB 755, retry
>>>>
>>>
>>> But this still might be something else, it doesn't happen regularly. For
>>> what it's worth
>>>
>>> Tested-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
>>
>> Thanks. Did you see the issue that I described without that patch?
>>
> 
> Yes. I've just tried a test without your patch and I see the same thing.
> I wonder if my R/B configuration is correct.
> 

Sure enough, I was missing the pinctrl configuration, so the R/B wasn't
quite configured correctly.

Patch

diff --git a/drivers/mtd/nand/raw/marvell_nand.c b/drivers/mtd/nand/raw/marvell_nand.c
index 666f34b58dec..e96ec7b9a152 100644
--- a/drivers/mtd/nand/raw/marvell_nand.c
+++ b/drivers/mtd/nand/raw/marvell_nand.c
@@ -613,7 +613,8 @@ static int marvell_nfc_wait_cmdd(struct nand_chip *chip)
 static int marvell_nfc_wait_op(struct nand_chip *chip, unsigned int timeout_ms)
 {
 	struct marvell_nfc *nfc = to_marvell_nfc(chip->controller);
-	int ret;
+	int ret = -EALREADY;
+	u32 st;
 
 	/* Timeout is expressed in ms */
 	if (!timeout_ms)
@@ -622,8 +623,15 @@ static int marvell_nfc_wait_op(struct nand_chip *chip, unsigned int timeout_ms)
 	init_completion(&nfc->complete);
 
 	marvell_nfc_enable_int(nfc, NDCR_RDYM);
-	ret = wait_for_completion_timeout(&nfc->complete,
-					  msecs_to_jiffies(timeout_ms));
+
+	/*
+	 * Check if the NDSR_RDY bits have already been set before the
+	 * interrupt was enabled.
+	 */
+	st = readl_relaxed(nfc->regs + NDSR);
+	if (!(st & (NDSR_RDY(0) | NDSR_RDY(1))))
+		ret = wait_for_completion_timeout(&nfc->complete,
+						  msecs_to_jiffies(timeout_ms));
 	marvell_nfc_disable_int(nfc, NDCR_RDYM);
 	marvell_nfc_clear_int(nfc, NDSR_RDY(0) | NDSR_RDY(1));
 	if (!ret) {