diff mbox

[v7] mtd: nand: increase ready wait timeout and report timeouts

Message ID 20160225231432.GN21465@google.com
State Accepted
Commit 9ebfdf5b18493f338237ef9861a555c2f79b0c17
Headers show

Commit Message

Brian Norris Feb. 25, 2016, 11:14 p.m. UTC
On Thu, Feb 25, 2016 at 11:54:25PM +0100, Richard Weinberger wrote:
> On Tue, Oct 6, 2015 at 3:52 PM, Harvey Hunt <harvey.hunt@imgtec.com> wrote:
> > From: Alex Smith <alex.smith@imgtec.com>

[...]

> > --- a/drivers/mtd/nand/nand_base.c
> > +++ b/drivers/mtd/nand/nand_base.c
> > @@ -543,23 +543,32 @@ static void panic_nand_wait_ready(struct mtd_info *mtd, unsigned long timeo)
> >         }
> >  }
> >
> > -/* Wait for the ready pin, after a command. The timeout is caught later. */
> > +/**
> > + * nand_wait_ready - [GENERIC] Wait for the ready pin after commands.
> > + * @mtd: MTD device structure
> > + *
> > + * Wait for the ready pin after a command, and warn if a timeout occurs.
> > + */
> >  void nand_wait_ready(struct mtd_info *mtd)
> >  {
> >         struct nand_chip *chip = mtd->priv;
> > -       unsigned long timeo = jiffies + msecs_to_jiffies(20);
> > +       unsigned long timeo = 400;
> >
> > -       /* 400ms timeout */
> >         if (in_interrupt() || oops_in_progress)
> > -               return panic_nand_wait_ready(mtd, 400);
> > +               return panic_nand_wait_ready(mtd, timeo);
> >
> >         led_trigger_event(nand_led_trigger, LED_FULL);
> >         /* Wait until command is processed or timeout occurs */
> > +       timeo = jiffies + msecs_to_jiffies(timeo);
> >         do {
> >                 if (chip->dev_ready(mtd))
> > -                       break;
> > -               touch_softlockup_watchdog();
> > +                       goto out;
> > +               cond_resched();
> >         } while (time_before(jiffies, timeo));
> > +
> > +       pr_warn_ratelimited(
> > +               "timeout while waiting for chip to become ready\n");
> > +out:
> 
> Sorry for exhuming an already merged patch but Boris and I ran into
> spurious chip timeouts
> and hunted the issue down to this change.
> If the system is under heavy load the cond_resched() will swap in
> other threads and the
> time_before() calculation will trigger and a wrong chip timeout is reported.
> 
> It is also not clear to us why the cond_resched() is needed at all.
> Can you please elaborate?

I can't speak for the "why" precisely. It seemed reasonable to avoid a
(potentially) 400 ms busy loop though, in the presence of other
potential work.

Regardless, this timeout loop is wrong. Shouldn't it have something like
the following?

Comments

Boris Brezillon Feb. 25, 2016, 11:23 p.m. UTC | #1
On Thu, 25 Feb 2016 15:14:32 -0800
Brian Norris <computersforpeace@gmail.com> wrote:

> On Thu, Feb 25, 2016 at 11:54:25PM +0100, Richard Weinberger wrote:
> > On Tue, Oct 6, 2015 at 3:52 PM, Harvey Hunt <harvey.hunt@imgtec.com> wrote:
> > > From: Alex Smith <alex.smith@imgtec.com>
> 
> [...]
> 
> > > --- a/drivers/mtd/nand/nand_base.c
> > > +++ b/drivers/mtd/nand/nand_base.c
> > > @@ -543,23 +543,32 @@ static void panic_nand_wait_ready(struct mtd_info *mtd, unsigned long timeo)
> > >         }
> > >  }
> > >
> > > -/* Wait for the ready pin, after a command. The timeout is caught later. */
> > > +/**
> > > + * nand_wait_ready - [GENERIC] Wait for the ready pin after commands.
> > > + * @mtd: MTD device structure
> > > + *
> > > + * Wait for the ready pin after a command, and warn if a timeout occurs.
> > > + */
> > >  void nand_wait_ready(struct mtd_info *mtd)
> > >  {
> > >         struct nand_chip *chip = mtd->priv;
> > > -       unsigned long timeo = jiffies + msecs_to_jiffies(20);
> > > +       unsigned long timeo = 400;
> > >
> > > -       /* 400ms timeout */
> > >         if (in_interrupt() || oops_in_progress)
> > > -               return panic_nand_wait_ready(mtd, 400);
> > > +               return panic_nand_wait_ready(mtd, timeo);
> > >
> > >         led_trigger_event(nand_led_trigger, LED_FULL);
> > >         /* Wait until command is processed or timeout occurs */
> > > +       timeo = jiffies + msecs_to_jiffies(timeo);
> > >         do {
> > >                 if (chip->dev_ready(mtd))
> > > -                       break;
> > > -               touch_softlockup_watchdog();
> > > +                       goto out;
> > > +               cond_resched();
> > >         } while (time_before(jiffies, timeo));
> > > +
> > > +       pr_warn_ratelimited(
> > > +               "timeout while waiting for chip to become ready\n");
> > > +out:
> > 
> > Sorry for exhuming an already merged patch but Boris and I ran into
> > spurious chip timeouts
> > and hunted the issue down to this change.
> > If the system is under heavy load the cond_resched() will swap in
> > other threads and the
> > time_before() calculation will trigger and a wrong chip timeout is reported.
> > 
> > It is also not clear to us why the cond_resched() is needed at all.
> > Can you please elaborate?
> 
> I can't speak for the "why" precisely. It seemed reasonable to avoid a
> (potentially) 400 ms busy loop though, in the presence of other
> potential work.
> 
> Regardless, this timeout loop is wrong. Shouldn't it have something like
> the following?
> 
> diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c
> index f2c8ff398d6c..596a9b0503da 100644
> --- a/drivers/mtd/nand/nand_base.c
> +++ b/drivers/mtd/nand/nand_base.c
> @@ -566,8 +566,8 @@ void nand_wait_ready(struct mtd_info *mtd)
>  		cond_resched();
>  	} while (time_before(jiffies, timeo));
>  
> -	pr_warn_ratelimited(
> -		"timeout while waiting for chip to become ready\n");
> +	if (!chip->dev_ready(mtd))
> +		pr_warn_ratelimited("timeout while waiting for chip to become ready\n");
>  out:
>  	led_trigger_event(nand_led_trigger, LED_OFF);
>  }

Looks good to me.

If you post the patch, you can add

Reviewed-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Richard Weinberger Feb. 25, 2016, 11:27 p.m. UTC | #2
Am 26.02.2016 um 00:23 schrieb Boris Brezillon:
>> Regardless, this timeout loop is wrong. Shouldn't it have something like
>> the following?
>>
>> diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c
>> index f2c8ff398d6c..596a9b0503da 100644
>> --- a/drivers/mtd/nand/nand_base.c
>> +++ b/drivers/mtd/nand/nand_base.c
>> @@ -566,8 +566,8 @@ void nand_wait_ready(struct mtd_info *mtd)
>>  		cond_resched();
>>  	} while (time_before(jiffies, timeo));
>>  
>> -	pr_warn_ratelimited(
>> -		"timeout while waiting for chip to become ready\n");
>> +	if (!chip->dev_ready(mtd))
>> +		pr_warn_ratelimited("timeout while waiting for chip to become ready\n");
>>  out:
>>  	led_trigger_event(nand_led_trigger, LED_OFF);
>>  }
> 
> Looks good to me.
> 
> If you post the patch, you can add
> 
> Reviewed-by: Boris Brezillon <boris.brezillon@free-electrons.com>

Same here.

Reviewed-by: Richard Weinberger <richard@nod.at>

Thanks,
//richard
Harvey Hunt Feb. 26, 2016, 1:45 p.m. UTC | #3
Hi,

On 25/02/16 23:27, Richard Weinberger wrote:
> Am 26.02.2016 um 00:23 schrieb Boris Brezillon:
>>> Regardless, this timeout loop is wrong. Shouldn't it have something like
>>> the following?
>>>
>>> diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c
>>> index f2c8ff398d6c..596a9b0503da 100644
>>> --- a/drivers/mtd/nand/nand_base.c
>>> +++ b/drivers/mtd/nand/nand_base.c
>>> @@ -566,8 +566,8 @@ void nand_wait_ready(struct mtd_info *mtd)
>>>   		cond_resched();
>>>   	} while (time_before(jiffies, timeo));
>>>
>>> -	pr_warn_ratelimited(
>>> -		"timeout while waiting for chip to become ready\n");
>>> +	if (!chip->dev_ready(mtd))
>>> +		pr_warn_ratelimited("timeout while waiting for chip to become ready\n");
>>>   out:
>>>   	led_trigger_event(nand_led_trigger, LED_OFF);
>>>   }
>>
>> Looks good to me.
>>
>> If you post the patch, you can add
>>
>> Reviewed-by: Boris Brezillon <boris.brezillon@free-electrons.com>
>
> Same here.
>
> Reviewed-by: Richard Weinberger <richard@nod.at>
>
> Thanks,
> //richard
>

-cc IMG list (I left it in my gitconfig when I originally sent this 
patch...).

Thanks for debugging and fixing this - proposed patch looks good to me:

Reviewed-by: Harvey Hunt <harvey.hunt@imgtec.com>

Thanks,

Harvey
diff mbox

Patch

diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c
index f2c8ff398d6c..596a9b0503da 100644
--- a/drivers/mtd/nand/nand_base.c
+++ b/drivers/mtd/nand/nand_base.c
@@ -566,8 +566,8 @@  void nand_wait_ready(struct mtd_info *mtd)
 		cond_resched();
 	} while (time_before(jiffies, timeo));
 
-	pr_warn_ratelimited(
-		"timeout while waiting for chip to become ready\n");
+	if (!chip->dev_ready(mtd))
+		pr_warn_ratelimited("timeout while waiting for chip to become ready\n");
 out:
 	led_trigger_event(nand_led_trigger, LED_OFF);
 }