diff mbox

[2/6] nand_wait : warn if the nand is busy on exit

Message ID 1309105616-3609-2-git-send-email-matthieu.castet@parrot.com
State Accepted
Commit f251b8dfdd0721255ea11751cdc282834e43b74e
Headers show

Commit Message

Matthieu CASTET June 26, 2011, 4:26 p.m. UTC
This patch allow to detect buggy driver/hardware with
bad RnB (dev_ready) management.
This check cost nothing and could help to detect bugs.

Signed-off-by: Matthieu CASTET <matthieu.castet@parrot.com>
---
 drivers/mtd/nand/nand_base.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

Comments

Artem Bityutskiy June 28, 2011, 7:57 a.m. UTC | #1
On Sun, 2011-06-26 at 18:26 +0200, Matthieu CASTET wrote:
> This patch allow to detect buggy driver/hardware with
> bad RnB (dev_ready) management.
> This check cost nothing and could help to detect bugs.
> 
> Signed-off-by: Matthieu CASTET <matthieu.castet@parrot.com>
> ---
>  drivers/mtd/nand/nand_base.c |    2 ++
>  1 files changed, 2 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c
> index a3c7fd3..095dfea 100644
> --- a/drivers/mtd/nand/nand_base.c
> +++ b/drivers/mtd/nand/nand_base.c
> @@ -885,6 +885,8 @@ static int nand_wait(struct mtd_info *mtd, struct nand_chip *chip)
>  	led_trigger_event(nand_led_trigger, LED_OFF);
>  
>  	status = (int)chip->read_byte(mtd);
> +	/* This can happen if in case of timeout or buggy dev_ready */
> +	WARN_ON(!(status & NAND_STATUS_READY));
>  	return status;

This seem to completely miss the chip->dev_ready != NULL case, e.g.,
piece of code above is like this

                while (time_before(jiffies, timeo)) {
                        if (chip->dev_ready) {
                                if (chip->dev_ready(mtd))
                                        break;
                        } else {
                                if (chip->read_byte(mtd) & NAND_STATUS_READY)
                                        break;
                        }
                        cond_resched();
                }
Matthieu CASTET June 28, 2011, 3:03 p.m. UTC | #2
Artem Bityutskiy a écrit :
> On Sun, 2011-06-26 at 18:26 +0200, Matthieu CASTET wrote:
>> This patch allow to detect buggy driver/hardware with
>> bad RnB (dev_ready) management.
>> This check cost nothing and could help to detect bugs.
>>
>> Signed-off-by: Matthieu CASTET <matthieu.castet@parrot.com>
>> ---
>>  drivers/mtd/nand/nand_base.c |    2 ++
>>  1 files changed, 2 insertions(+), 0 deletions(-)
>>
>> diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c
>> index a3c7fd3..095dfea 100644
>> --- a/drivers/mtd/nand/nand_base.c
>> +++ b/drivers/mtd/nand/nand_base.c
>> @@ -885,6 +885,8 @@ static int nand_wait(struct mtd_info *mtd, struct nand_chip *chip)
>>  	led_trigger_event(nand_led_trigger, LED_OFF);
>>  
>>  	status = (int)chip->read_byte(mtd);
>> +	/* This can happen if in case of timeout or buggy dev_ready */
>> +	WARN_ON(!(status & NAND_STATUS_READY));
>>  	return status;
> 
> This seem to completely miss the chip->dev_ready != NULL case, e.g.,
> piece of code above is like this
> 
>                 while (time_before(jiffies, timeo)) {
>                         if (chip->dev_ready) {
>                                 if (chip->dev_ready(mtd))
>                                         break;
>                         } else {
>                                 if (chip->read_byte(mtd) & NAND_STATUS_READY)
>                                         break;
>                         }
>                         cond_resched();
>                 }
> 
Sorry, I don't understand what you mean.

We don't care what's done in the loop (chip->dev_ready != NULL or, not). We only
check when the loop exit, that READY bit is set in the status.


Matthieu
Artem Bityutskiy June 28, 2011, 7:05 p.m. UTC | #3
On Tue, 2011-06-28 at 17:03 +0200, Matthieu CASTET wrote:
> Artem Bityutskiy a écrit :
> > On Sun, 2011-06-26 at 18:26 +0200, Matthieu CASTET wrote:
> >> This patch allow to detect buggy driver/hardware with
> >> bad RnB (dev_ready) management.
> >> This check cost nothing and could help to detect bugs.
> >>
> >> Signed-off-by: Matthieu CASTET <matthieu.castet@parrot.com>
> >> ---
> >>  drivers/mtd/nand/nand_base.c |    2 ++
> >>  1 files changed, 2 insertions(+), 0 deletions(-)
> >>
> >> diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c
> >> index a3c7fd3..095dfea 100644
> >> --- a/drivers/mtd/nand/nand_base.c
> >> +++ b/drivers/mtd/nand/nand_base.c
> >> @@ -885,6 +885,8 @@ static int nand_wait(struct mtd_info *mtd, struct nand_chip *chip)
> >>  	led_trigger_event(nand_led_trigger, LED_OFF);
> >>  
> >>  	status = (int)chip->read_byte(mtd);
> >> +	/* This can happen if in case of timeout or buggy dev_ready */
> >> +	WARN_ON(!(status & NAND_STATUS_READY));
> >>  	return status;
> > 
> > This seem to completely miss the chip->dev_ready != NULL case, e.g.,
> > piece of code above is like this
> > 
> >                 while (time_before(jiffies, timeo)) {
> >                         if (chip->dev_ready) {
> >                                 if (chip->dev_ready(mtd))
> >                                         break;
> >                         } else {
> >                                 if (chip->read_byte(mtd) & NAND_STATUS_READY)
> >                                         break;
> >                         }
> >                         cond_resched();
> >                 }
> > 
> Sorry, I don't understand what you mean.
> 
> We don't care what's done in the loop (chip->dev_ready != NULL or, not). We only
> check when the loop exit, that READY bit is set in the status.

Well, the logic is suspicious.

1. For NAND with chip->dev_ready != NULL, why NAND_STATUS_READY should
be set? We do not check for this in the loop.

2. For NAND with chip->dev_ready != NULL, if NAND_STATUS_READY has to be
set at the end, why wouldn't we drop this chip_ready part completely? We
could just loop while NAND_STATUS_READY is not set.

Isn't this strange?
Ivan Djelic June 29, 2011, 1:59 p.m. UTC | #4
On Tue, Jun 28, 2011 at 08:05:14PM +0100, Artem Bityutskiy wrote:
> On Tue, 2011-06-28 at 17:03 +0200, Matthieu CASTET wrote:
> > Artem Bityutskiy a écrit :
> > > On Sun, 2011-06-26 at 18:26 +0200, Matthieu CASTET wrote:
> > >> This patch allow to detect buggy driver/hardware with
> > >> bad RnB (dev_ready) management.
> > >> This check cost nothing and could help to detect bugs.
> > >>
> > >> Signed-off-by: Matthieu CASTET <matthieu.castet@parrot.com>
> > >> ---
> > >>  drivers/mtd/nand/nand_base.c |    2 ++
> > >>  1 files changed, 2 insertions(+), 0 deletions(-)
> > >>
> > >> diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c
> > >> index a3c7fd3..095dfea 100644
> > >> --- a/drivers/mtd/nand/nand_base.c
> > >> +++ b/drivers/mtd/nand/nand_base.c
> > >> @@ -885,6 +885,8 @@ static int nand_wait(struct mtd_info *mtd, struct nand_chip *chip)
> > >>  	led_trigger_event(nand_led_trigger, LED_OFF);
> > >>  
> > >>  	status = (int)chip->read_byte(mtd);
> > >> +	/* This can happen if in case of timeout or buggy dev_ready */
> > >> +	WARN_ON(!(status & NAND_STATUS_READY));
> > >>  	return status;
> > > 
> > > This seem to completely miss the chip->dev_ready != NULL case, e.g.,
> > > piece of code above is like this
> > > 
> > >                 while (time_before(jiffies, timeo)) {
> > >                         if (chip->dev_ready) {
> > >                                 if (chip->dev_ready(mtd))
> > >                                         break;
> > >                         } else {
> > >                                 if (chip->read_byte(mtd) & NAND_STATUS_READY)
> > >                                         break;
> > >                         }
> > >                         cond_resched();
> > >                 }
> > > 
> > Sorry, I don't understand what you mean.
> > 
> > We don't care what's done in the loop (chip->dev_ready != NULL or, not). We only
> > check when the loop exit, that READY bit is set in the status.
> 
> Well, the logic is suspicious.
> 
> 1. For NAND with chip->dev_ready != NULL, why NAND_STATUS_READY should
> be set? We do not check for this in the loop.
> 
> 2. For NAND with chip->dev_ready != NULL, if NAND_STATUS_READY has to be
> set at the end, why wouldn't we drop this chip_ready part completely? We
> could just loop while NAND_STATUS_READY is not set.
> 
> Isn't this strange?

Not really. There are 2 methods to wait for an erase/program command completion:

1. Wait until nand RnB pin goes high (that's what chip->dev_ready usually does)
2. Poll the device: send a status (0x70) command and read status byte in a loop
   until bit NAND_STATUS_READY is set

In all cases, you should send a status command after completion, to check if
the operation was successful. And if the operation completed, the status should
have bit NAND_STATUS_READY set.

Method 1 is optimal, you can often use an interrupt to signal completion, and
no cpu cycles are lost in a polling loop. But flaky hardware (bad pull-ups) or
bad gpio configuration can make it unreliable.

Method 2 is the safest, it always works, but it is less efficient.

Here, Matthieu wants to detect cases where method 1 is unreliable or a timeout
occurs (e.g. somebody forgot to put a nand device on the board :).
Both conditions are not expected on working hardware.

Another option (that I used on other systems) is to systematically perform
method 1 first (if chip->dev_ready != NULL), _then_ method 2.
If method 1 works as expected, then the polling loop in method 2 finishes
immediately because nand status already has bit NAND_STATUS_READY set.

--
Best Regards,

Ivan
Artem Bityutskiy June 30, 2011, 12:36 p.m. UTC | #5
On Wed, 2011-06-29 at 15:59 +0200, Ivan Djelic wrote:
> On Tue, Jun 28, 2011 at 08:05:14PM +0100, Artem Bityutskiy wrote:
> > On Tue, 2011-06-28 at 17:03 +0200, Matthieu CASTET wrote:
> > > Artem Bityutskiy a écrit :
> > > > On Sun, 2011-06-26 at 18:26 +0200, Matthieu CASTET wrote:
> > > >> This patch allow to detect buggy driver/hardware with
> > > >> bad RnB (dev_ready) management.
> > > >> This check cost nothing and could help to detect bugs.
> > > >>
> > > >> Signed-off-by: Matthieu CASTET <matthieu.castet@parrot.com>
> > > >> ---
> > > >>  drivers/mtd/nand/nand_base.c |    2 ++
> > > >>  1 files changed, 2 insertions(+), 0 deletions(-)
> > > >>
> > > >> diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c
> > > >> index a3c7fd3..095dfea 100644
> > > >> --- a/drivers/mtd/nand/nand_base.c
> > > >> +++ b/drivers/mtd/nand/nand_base.c
> > > >> @@ -885,6 +885,8 @@ static int nand_wait(struct mtd_info *mtd, struct nand_chip *chip)
> > > >>  	led_trigger_event(nand_led_trigger, LED_OFF);
> > > >>  
> > > >>  	status = (int)chip->read_byte(mtd);
> > > >> +	/* This can happen if in case of timeout or buggy dev_ready */
> > > >> +	WARN_ON(!(status & NAND_STATUS_READY));
> > > >>  	return status;
> > > > 
> > > > This seem to completely miss the chip->dev_ready != NULL case, e.g.,
> > > > piece of code above is like this
> > > > 
> > > >                 while (time_before(jiffies, timeo)) {
> > > >                         if (chip->dev_ready) {
> > > >                                 if (chip->dev_ready(mtd))
> > > >                                         break;
> > > >                         } else {
> > > >                                 if (chip->read_byte(mtd) & NAND_STATUS_READY)
> > > >                                         break;
> > > >                         }
> > > >                         cond_resched();
> > > >                 }
> > > > 
> > > Sorry, I don't understand what you mean.
> > > 
> > > We don't care what's done in the loop (chip->dev_ready != NULL or, not). We only
> > > check when the loop exit, that READY bit is set in the status.
> > 
> > Well, the logic is suspicious.
> > 
> > 1. For NAND with chip->dev_ready != NULL, why NAND_STATUS_READY should
> > be set? We do not check for this in the loop.
> > 
> > 2. For NAND with chip->dev_ready != NULL, if NAND_STATUS_READY has to be
> > set at the end, why wouldn't we drop this chip_ready part completely? We
> > could just loop while NAND_STATUS_READY is not set.
> > 
> > Isn't this strange?
> 
> Not really. There are 2 methods to wait for an erase/program command completion:
> 
> 1. Wait until nand RnB pin goes high (that's what chip->dev_ready usually does)
> 2. Poll the device: send a status (0x70) command and read status byte in a loop
>    until bit NAND_STATUS_READY is set
> 
> In all cases, you should send a status command after completion, to check if
> the operation was successful. And if the operation completed, the status should
> have bit NAND_STATUS_READY set.
> 
> Method 1 is optimal, you can often use an interrupt to signal completion, and
> no cpu cycles are lost in a polling loop. But flaky hardware (bad pull-ups) or
> bad gpio configuration can make it unreliable.
> 
> Method 2 is the safest, it always works, but it is less efficient.

Thanks, this is basically the reply I waited for - the explanation how
it works. You guys should not assume that I know everything - I just do
my best to keep MTD subsystem working and make people improve it :-) And
sometimes I just have no time to dig things and ask people to educate
me, to save my time :-)

> Here, Matthieu wants to detect cases where method 1 is unreliable or a timeout
> occurs (e.g. somebody forgot to put a nand device on the board :).
> Both conditions are not expected on working hardware.

Fair enough, thanks!

However, the code is not easy to follow, and this assumption which you
explained - "if dev_ready() returns truth, the status command has to be
sent anyway and the READY bit has to be there anyway" - it is subtle,
and makes the nand_base.c less readable, and more error prone.

And there are tons of things like this. And what the community do in
these cases - it forces people who just want to do a simple thing to
also do general clean-ups, at least some reasonable amount of it.

E.g., recently people cleaned-up the partitions stuff, and this started
with our refusal to take a simple path which touched the MTD_PARTITIONS
macro.

So, what I suggested to Matthieu, although in a vague way, is to look
how this subtle dev_ready things could be cleaned-up.

I said that we probably may:

1. Introduce something like default_dev_ready() which falls-back to the
status polling unless the driver provides it's own.
2. Look to all the if (chip->dev_ready) do A else do B things and try to
remove them.

To put it differently, I am trying to encourage you guys to clean-up the
code a bit in this dev_ready/status area before changing this area.

Any ideas? :-)
diff mbox

Patch

diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c
index a3c7fd3..095dfea 100644
--- a/drivers/mtd/nand/nand_base.c
+++ b/drivers/mtd/nand/nand_base.c
@@ -885,6 +885,8 @@  static int nand_wait(struct mtd_info *mtd, struct nand_chip *chip)
 	led_trigger_event(nand_led_trigger, LED_OFF);
 
 	status = (int)chip->read_byte(mtd);
+	/* This can happen if in case of timeout or buggy dev_ready */
+	WARN_ON(!(status & NAND_STATUS_READY));
 	return status;
 }