diff mbox

[2/2] mtd: nand: use hrtimer to measure timeout in nand_wait{_ready, }

Message ID 1337589758-8775-3-git-send-email-johan.gunnarsson@axis.com
State Superseded, archived
Headers show

Commit Message

Johan Gunnarsson May 21, 2012, 8:42 a.m. UTC
Previous, jiffy-based, implementation had three issues:

1. For low HZ values (<100 and >=50) the timeout would be 1 jiffy,
   which can cause premature timeouts if a timer interrupt happens
   right after the "timeo" value is assigned.
2. For very low HZ values (<50) the timeout is 0 jiffies, which also
   cause premature timeouts.
3. The jiffies counter is not reliable on multicore systems when
   interrupts are disabled for long times (a couple of timer interrupt
   periods). For example with excessive printk and serial output in
   interrupt context. The jiffies counting stops during the period
   with disabled interrupts and recovers by counting up all lost jiffy
   increments very, very fast when interrupts are later enabled. This,
   together with bad luck, can cause a third type of premature
   timeouts.

All three issues can cause good blocks being marked bad.

Signed-off-by: Johan Gunnarsson <johan.gunnarsson@axis.com>
---
 drivers/mtd/nand/nand_base.c |   36 ++++++++++++++++++++++++++++++------
 1 files changed, 30 insertions(+), 6 deletions(-)

Comments

Artem Bityutskiy May 22, 2012, 7:53 a.m. UTC | #1
On Mon, 2012-05-21 at 10:42 +0200, Johan Gunnarsson wrote:
> @@ -533,19 +541,29 @@ static void panic_nand_wait_ready(struct mtd_info *mtd, unsigned long timeo)
>  void nand_wait_ready(struct mtd_info *mtd)
>  {
>  	struct nand_chip *chip = mtd->priv;
> -	unsigned long timeo = jiffies + 2;
> +	struct hrtimer timer;
>  
>  	/* 400ms timeout */
>  	if (in_interrupt() || oops_in_progress)
>  		return panic_nand_wait_ready(mtd, 400);
>  
>  	led_trigger_event(nand_led_trigger, LED_FULL);
> +
> +	/* Arm timeout timer for 20ms timeout */
> +	hrtimer_init(&timer, CLOCK_REALTIME, HRTIMER_MODE_REL);
> +	timer.function = nand_wait_timeout_callback;
> +	hrtimer_start(&timer, ns_to_ktime(20 * 1000 * 1000),
> +	              HRTIMER_MODE_REL);
> +
>  	/* Wait until command is processed or timeout occurs */
>  	do {
>  		if (chip->dev_ready(mtd))
>  			break;
>  		touch_softlockup_watchdog();
> -	} while (time_before(jiffies, timeo));

Si this function waits for the chip, but if we cannot access the "ready"
pin - it waits for 400msecs? Where this number 400 comes from? Why not
500? How many chips we have which do not provide "dev_ready()" function?
Can we simplify this function and assume everyone has "dev_ready()" and
add BUG_ON(!chip->dev_ready) - could you make a small research ? Or can
we do like this:

if (!dev_ready()) {
	msleep(400);
	return
}

do {
} while

Why should we iterate if "dev_ready" is NULL?

My point is that this stuff ancient horror which needs love and cleaning
up. Your does not make it less scary. I'd prefer to first clean the
whole thing up, and then fix it.

I'm CCing Brian who spent a lot of time digging nand_base.c recently, he
has probably got some ideas.
Johan Gunnarsson May 22, 2012, 8:52 a.m. UTC | #2
> -----Original Message-----
> From: Artem Bityutskiy [mailto:dedekind1@gmail.com]
> Sent: den 22 maj 2012 09:53
> To: Johan Gunnarsson; Brian Norris
> Cc: linux-mtd@lists.infradead.org; Jesper Nilsson
> Subject: Re: [PATCH 2/2] mtd: nand: use hrtimer to measure timeout in
> nand_wait{_ready, }
> 
> On Mon, 2012-05-21 at 10:42 +0200, Johan Gunnarsson wrote:
> > @@ -533,19 +541,29 @@ static void panic_nand_wait_ready(struct
> mtd_info *mtd, unsigned long timeo)
> >  void nand_wait_ready(struct mtd_info *mtd)
> >  {
> >  	struct nand_chip *chip = mtd->priv;
> > -	unsigned long timeo = jiffies + 2;
> > +	struct hrtimer timer;
> >
> >  	/* 400ms timeout */
> >  	if (in_interrupt() || oops_in_progress)
> >  		return panic_nand_wait_ready(mtd, 400);
> >
> >  	led_trigger_event(nand_led_trigger, LED_FULL);
> > +
> > +	/* Arm timeout timer for 20ms timeout */
> > +	hrtimer_init(&timer, CLOCK_REALTIME, HRTIMER_MODE_REL);
> > +	timer.function = nand_wait_timeout_callback;
> > +	hrtimer_start(&timer, ns_to_ktime(20 * 1000 * 1000),
> > +	              HRTIMER_MODE_REL);
> > +
> >  	/* Wait until command is processed or timeout occurs */
> >  	do {
> >  		if (chip->dev_ready(mtd))
> >  			break;
> >  		touch_softlockup_watchdog();
> > -	} while (time_before(jiffies, timeo));
> 
> Si this function waits for the chip, but if we cannot access the
> "ready"
> pin - it waits for 400msecs? Where this number 400 comes from? Why not
> 500? How many chips we have which do not provide "dev_ready()"
> function?

Not an expert on the MTD framework -- but I believe the 400 comes from the comment above nand_wait(): "Wait for command done. This applies to erase and program only. Erase can take up to 400ms and program up to 20ms according to general NAND and SmartMedia specs."

> Can we simplify this function and assume everyone has "dev_ready()" and
> add BUG_ON(!chip->dev_ready) - could you make a small research ? Or can

The drivers/mtd/nand/h1910.c driver sets dev_ready to NULL on init. The others seem to have dev_ready implementations.

> we do like this:
> 
> if (!dev_ready()) {
> 	msleep(400);
> 	return
> }
> 
> do {
> } while
> 
> Why should we iterate if "dev_ready" is NULL?

Actually, it's worse than that. If dev_ready is NULL (as in the h1910.c case) we'll crash the kernel, right?

> 
> My point is that this stuff ancient horror which needs love and
> cleaning
> up. Your does not make it less scary. I'd prefer to first clean the
> whole thing up, and then fix it.
> 
> I'm CCing Brian who spent a lot of time digging nand_base.c recently,
> he
> has probably got some ideas.
> 
> --
> Best Regards,
> Artem Bityutskiy
Artem Bityutskiy May 22, 2012, 10:25 a.m. UTC | #3
On Tue, 2012-05-22 at 10:52 +0200, Johan Gunnarsson wrote:
> Not an expert on the MTD framework -- but I believe the 400 comes from
> the comment above nand_wait(): "Wait for command done. This applies to
> erase and program only. Erase can take up to 400ms and program up to
> 20ms according to general NAND and SmartMedia specs."

Most probably this legacy can be killed along with the IPAQ h1910 driver
you pointed. But could you please start with cleaning these functions up
and turning the "!chip->dev_ready()" case into a simple msleep() or
mdelay() for the panic case?

Then I may get some ideas how to get rid of the jiffies without using
hrtimer.
Johan Gunnarsson May 22, 2012, 2:24 p.m. UTC | #4
> -----Original Message-----
> From: Artem Bityutskiy [mailto:dedekind1@gmail.com]
> Sent: den 22 maj 2012 12:25
> To: Johan Gunnarsson
> Cc: Brian Norris; linux-mtd@lists.infradead.org; Jesper Nilsson
> Subject: RE: [PATCH 2/2] mtd: nand: use hrtimer to measure timeout in
> nand_wait{_ready, }
> 
> On Tue, 2012-05-22 at 10:52 +0200, Johan Gunnarsson wrote:
> > Not an expert on the MTD framework -- but I believe the 400 comes
> from
> > the comment above nand_wait(): "Wait for command done. This applies
> to
> > erase and program only. Erase can take up to 400ms and program up to
> > 20ms according to general NAND and SmartMedia specs."
> 
> Most probably this legacy can be killed along with the IPAQ h1910
> driver you pointed. But could you please start with cleaning these
> functions up and turning the "!chip->dev_ready()" case into a simple
> msleep() or
> mdelay() for the panic case?
> 

I don't understand what clean-up is needed here. Can you please elaborate?

The case you mentioned earlier doesn't really make sense to me:

if (!dev_ready()) {
	msleep(400);
	return
}

Making a single call to dev_ready() and giving up if FALSE is wrong IMHO, since we very much want to poll dev_ready() and return as soon as it turns TRUE (or timeout.) Assuming you actually mean "if(!dev_ready)) {" also doesn't make sense since checks like that exists on all path leading to nand_wait_ready().
Ivan Djelic May 22, 2012, 5:10 p.m. UTC | #5
On Tue, May 22, 2012 at 09:52:22AM +0100, Johan Gunnarsson wrote:
> > -----Original Message-----
> > From: Artem Bityutskiy [mailto:dedekind1@gmail.com]
> > Sent: den 22 maj 2012 09:53
> > To: Johan Gunnarsson; Brian Norris
> > Cc: linux-mtd@lists.infradead.org; Jesper Nilsson
> > Subject: Re: [PATCH 2/2] mtd: nand: use hrtimer to measure timeout in
> > nand_wait{_ready, }

(...)

> > Si this function waits for the chip, but if we cannot access the
> > "ready"
> > pin - it waits for 400msecs? Where this number 400 comes from? Why not
> > 500? How many chips we have which do not provide "dev_ready()"
> > function?
> 
> Not an expert on the MTD framework -- but I believe the 400 comes from the comment above nand_wait(): "Wait for command done. This applies to erase and program only. Erase can take up to 400ms and program up to 20ms according to general NAND and SmartMedia specs."
> 

Hi Johan and Artem,

Current NAND devices require approximately 1-3 ms per block erase, and 100-200 us per page program operation.
But I think this timeout is only useful to detect and recover from broken hardware: IMHO there is no point in
trying to "optimize" the timeout delay to 20ms or 400ms depending on which operation is being done (program or erase).

Why not simplify and just use a single 1s (1000ms) timeout ?

Johan,
with such a timeout value, your bug would manifest itself only if interrupts are blocked for nearly a second: would that be acceptable ?

> > Can we simplify this function and assume everyone has "dev_ready()" and
> > add BUG_ON(!chip->dev_ready) - could you make a small research ? Or can
> 
> The drivers/mtd/nand/h1910.c driver sets dev_ready to NULL on init. The others seem to have dev_ready implementations.
> 
> > we do like this:
> > 
> > if (!dev_ready()) {
> > 	msleep(400);
> > 	return
> > }
> > 
> > do {
> > } while
> > 
> > Why should we iterate if "dev_ready" is NULL?

nand_wait() polls the device until it is ready or a timeout has elapsed.
More details here: http://lists.infradead.org/pipermail/linux-mtd/2011-June/036795.html


> Actually, it's worse than that. If dev_ready is NULL (as in the h1910.c case) we'll crash the kernel, right?
> 
> > 
> > My point is that this stuff ancient horror which needs love and
> > cleaning
> > up. Your does not make it less scary. I'd prefer to first clean the
> > whole thing up, and then fix it.

I agree it needs some cleaning; here are a few things I have noticed:

* panic_nand_wait() and panic_nand_wait_ready() are nearly identical

* Some patterns are repeated:
	/* Apply delay or wait for ready/busy pin */
	if (!chip->dev_ready)
		udelay(chip->chip_delay);
	else
		nand_wait_ready(mtd);

* the udelay() fallback should probably be phased out if only one driver (h1910) depends on it because it does not define chip->dev_ready() ?

* nand_wait() and nand_wait_ready() use different strategies to wait:
  - nand_wait_ready = busy loop
  - nand_wait       = busy loop with cond_resched call
  Presumably, this is because nand_wait_ready() should not wait for more than ~25 us (typically),
  so it is more efficient to avoid the cond_resched() call ?

* We could separate dev_ready() and STATUS polling inside nand_wait(), or even better, directly call nand_wait_ready() from nand_wait() ?
  The only difference between the 2 functions (besides cond_resched) is that nand_wait_ready() does not send any STATUS command: as a consequence,
  it does not interfere with the device mode (polling the device with STATUS reads while waiting during a read operation requires
  re-sending a READ0 command at the end to return to read mode).

I would suggest the following approach:

1. Use a single timeout value (1s for instance).

2. Rewrite and simplify nand_wait() as follows:
   a) loop until dev_ready() returns true OR timeout
   b) send a STATUS command
   c) loop until status is ready OR timeout, and return NAND status

The idea is that in most cases, dev_ready() will be defined and functional, and the loop in c) will finish immediately.
If dev_ready() is not available or defective, step c) still ensures we properly wait.
Step a) could be replaced with a call to a modified version of nand_wait_ready() with optional cond_resched-uling.

What do you think ? Any suggestions ?
I'm willing to submit patches if we reach a consensus on this,

Best regards,
--
Ivan
Artem Bityutskiy May 22, 2012, 6:21 p.m. UTC | #6
On Tue, 2012-05-22 at 19:10 +0200, Ivan Djelic wrote:
> Current NAND devices require approximately 1-3 ms per block erase, and 100-200 us per page program operation.
> But I think this timeout is only useful to detect and recover from broken hardware: IMHO there is no point in
> trying to "optimize" the timeout delay to 20ms or 400ms depending on which operation is being done (program or erase).
> 
> Why not simplify and just use a single 1s (1000ms) timeout ?


I have many things on my plate now so my answers are are sloppy - I
spent 1 minute or less looking at the code. But I thought that this
timeout is important for the the case when 'chip->device_ready' is NULL.
This is why I wanted to get rid of that case, and then the timeout would
only become about detecting the forever loops. And this is exactly what
I meant by saying that I have further idea on improving that without
hrtimer - just have one single 1 second timeout. We indeed do not have
to be precise for the errors detection.

Oops, SIGBABY.
Brian Norris May 23, 2012, 6:39 a.m. UTC | #7
Hi,

On Tue, May 22, 2012 at 10:10 AM, Ivan Djelic <ivan.djelic@parrot.com> wrote:
> On Tue, May 22, 2012 at 09:52:22AM +0100, Johan Gunnarsson wrote:
>> > -----Original Message-----
>> > From: Artem Bityutskiy [mailto:dedekind1@gmail.com]
>> > Sent: den 22 maj 2012 09:53
>> > To: Johan Gunnarsson; Brian Norris
>> > Cc: linux-mtd@lists.infradead.org; Jesper Nilsson
>> > Subject: Re: [PATCH 2/2] mtd: nand: use hrtimer to measure timeout in
>> > nand_wait{_ready, }
>
> (...)
>
>> > Si this function waits for the chip, but if we cannot access the
>> > "ready"
>> > pin - it waits for 400msecs? Where this number 400 comes from? Why not
>> > 500? How many chips we have which do not provide "dev_ready()"
>> > function?
>
> Current NAND devices require approximately 1-3 ms per block erase, and 100-200 us per page program operation.
> But I think this timeout is only useful to detect and recover from broken hardware: IMHO there is no point in
> trying to "optimize" the timeout delay to 20ms or 400ms depending on which operation is being done (program or erase).
>
> Why not simplify and just use a single 1s (1000ms) timeout ?
>
>> > Can we simplify this function and assume everyone has "dev_ready()" and
>> > add BUG_ON(!chip->dev_ready) - could you make a small research ? Or can
>>
>> The drivers/mtd/nand/h1910.c driver sets dev_ready to NULL on init. The others seem to have dev_ready implementations.

Not true: there are several drivers which do not have dev_ready, but
this is also because they are controller-based, so they provide their
own cmdfunc as well, mostly avoiding the nand_wait_ready stuff.
(Drivers: denali.c, docg4.c, my out-of-tree driver, ...)

Note: another use of nand_wait_ready is in nand_do_read_ops and
nand_do_read_oob, under the following condition:
    if (!(chip->options & NAND_NO_READRDY)) {
Isn't NAND_NO_READRDY no longer needed, since I killed autoincrement?
(i.e., we *always* "do not support autoincrement", as its comment says
in nand.h) So if we kill this option, we can kill the only use of
nand_wait_ready outside of cmdfunc(); this brings us closer to being
able to using BUG_ON(!chip->dev_ready). I'll send a patch for this,
and you guys can comment.

>> Actually, it's worse than that. If dev_ready is NULL (as in the h1910.c case) we'll crash the kernel, right?

No, I don't think so. It looks like nand_wait_ready is never called
when dev_ready is NULL.

>> >
>> > My point is that this stuff ancient horror which needs love and
>> > cleaning
>> > up. Your does not make it less scary. I'd prefer to first clean the
>> > whole thing up, and then fix it.
>
> I agree it needs some cleaning; here are a few things I have noticed:

Even though I'm mostly not a user of this code, I agree :)

> * Some patterns are repeated:
>        /* Apply delay or wait for ready/busy pin */
>        if (!chip->dev_ready)
>                udelay(chip->chip_delay);
>        else
>                nand_wait_ready(mtd);

I'm going to kill this pattern, I think.

> * the udelay() fallback should probably be phased out if only one driver (h1910) depends on it because it does not define chip->dev_ready() ?

Clarification: only one driver does leaves both chip->dev_ready and
chip->cmdfunc NULL, right?

> What do you think ? Any suggestions ?
> I'm willing to submit patches if we reach a consensus on this,

Feel free to send a patch; it's probably easier to talk more
concretely about a patch that implements your (seemingly reasonable)
suggestions, and then provide comments based on concrete ideas.

Brian
Johan Gunnarsson May 23, 2012, 8:14 a.m. UTC | #8
> Why not simplify and just use a single 1s (1000ms) timeout ?
> 
> Johan,
> with such a timeout value, your bug would manifest itself only if
> interrupts are blocked for nearly a second: would that be acceptable ?

That indeed would be acceptable from my side.
Ivan Djelic May 23, 2012, 8:36 a.m. UTC | #9
On Wed, May 23, 2012 at 07:39:51AM +0100, Brian Norris wrote:
(...)
> >> > Can we simplify this function and assume everyone has "dev_ready()" and
> >> > add BUG_ON(!chip->dev_ready) - could you make a small research ? Or can
> >>
> >> The drivers/mtd/nand/h1910.c driver sets dev_ready to NULL on init. The others seem to have dev_ready implementations.
> 
> Not true: there are several drivers which do not have dev_ready, but
> this is also because they are controller-based, so they provide their
> own cmdfunc as well, mostly avoiding the nand_wait_ready stuff.
> (Drivers: denali.c, docg4.c, my out-of-tree driver, ...)
> 
> Note: another use of nand_wait_ready is in nand_do_read_ops and
> nand_do_read_oob, under the following condition:
>     if (!(chip->options & NAND_NO_READRDY)) {
> Isn't NAND_NO_READRDY no longer needed, since I killed autoincrement?
> (i.e., we *always* "do not support autoincrement", as its comment says
> in nand.h) So if we kill this option, we can kill the only use of
> nand_wait_ready outside of cmdfunc(); this brings us closer to being
> able to using BUG_ON(!chip->dev_ready). I'll send a patch for this,
> and you guys can comment.

Agreed.
 
> >> Actually, it's worse than that. If dev_ready is NULL (as in the h1910.c case) we'll crash the kernel, right?
> 
> No, I don't think so. It looks like nand_wait_ready is never called
> when dev_ready is NULL.

True, but since nand_wait_ready() is a public function, it could be called from a driver that has (dev_ready == NULL), resulting in a crash.
So all drivers must make sure they define dev_ready if they are going to call nand_wait_ready() (like cafe_nand.c).

(...)
> 
> > * the udelay() fallback should probably be phased out if only one driver (h1910) depends on it because it does not define chip->dev_ready() ?
> 
> Clarification: only one driver does leaves both chip->dev_ready and
> chip->cmdfunc NULL, right?

Yes, exactly. I meant that we could get rid of the udelay in the default chip->cmdfunc implementations (nand_command and nand_command_lp).
In other words, if chip->cmdfunc == NULL, we require chip->dev_ready != NULL. And move the existing udelay in an implementation of chip->dev_ready for the h1910 driver.

> Feel free to send a patch; it's probably easier to talk more
> concretely about a patch that implements your (seemingly reasonable)
> suggestions, and then provide comments based on concrete ideas.

Will do,
Thanks,

--
Ivan
diff mbox

Patch

diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c
index b927e64..0db920b 100644
--- a/drivers/mtd/nand/nand_base.c
+++ b/drivers/mtd/nand/nand_base.c
@@ -47,6 +47,7 @@ 
 #include <linux/bitops.h>
 #include <linux/leds.h>
 #include <linux/io.h>
+#include <linux/hrtimer.h>
 #include <linux/mtd/partitions.h>
 
 /* Define default oob placement schemes for large and small page devices */
@@ -99,6 +100,13 @@  static int nand_get_device(struct nand_chip *chip, struct mtd_info *mtd,
 static int nand_do_write_oob(struct mtd_info *mtd, loff_t to,
 			     struct mtd_oob_ops *ops);
 
+/**
+ * NOP nand_wait timeout callback.
+ */
+static enum hrtimer_restart nand_wait_timeout_callback(struct hrtimer* timer) {
+	return HRTIMER_NORESTART;
+}
+
 /*
  * For devices which display every fart in the system on a separate LED. Is
  * compiled away when LED support is disabled.
@@ -533,19 +541,29 @@  static void panic_nand_wait_ready(struct mtd_info *mtd, unsigned long timeo)
 void nand_wait_ready(struct mtd_info *mtd)
 {
 	struct nand_chip *chip = mtd->priv;
-	unsigned long timeo = jiffies + 2;
+	struct hrtimer timer;
 
 	/* 400ms timeout */
 	if (in_interrupt() || oops_in_progress)
 		return panic_nand_wait_ready(mtd, 400);
 
 	led_trigger_event(nand_led_trigger, LED_FULL);
+
+	/* Arm timeout timer for 20ms timeout */
+	hrtimer_init(&timer, CLOCK_REALTIME, HRTIMER_MODE_REL);
+	timer.function = nand_wait_timeout_callback;
+	hrtimer_start(&timer, ns_to_ktime(20 * 1000 * 1000),
+	              HRTIMER_MODE_REL);
+
 	/* Wait until command is processed or timeout occurs */
 	do {
 		if (chip->dev_ready(mtd))
 			break;
 		touch_softlockup_watchdog();
-	} while (time_before(jiffies, timeo));
+	} while (ktime_to_ns(hrtimer_get_expires(&timer)) > 0);
+
+	hrtimer_cancel(&timer);
+
 	led_trigger_event(nand_led_trigger, LED_OFF);
 }
 EXPORT_SYMBOL_GPL(nand_wait_ready);
@@ -868,7 +886,6 @@  static void panic_nand_wait(struct mtd_info *mtd, struct nand_chip *chip,
 static int nand_wait(struct mtd_info *mtd, struct nand_chip *chip)
 {
 	unsigned long timeout_ms;
-	unsigned long timeo = jiffies;
 	int status, state = chip->state;
 
 	if (state == FL_ERASING)
@@ -876,8 +893,6 @@  static int nand_wait(struct mtd_info *mtd, struct nand_chip *chip)
 	else
 		timeout_ms = 20;
 
-	timeo += (HZ * timeout_ms) / 1000;
-
 	led_trigger_event(nand_led_trigger, LED_FULL);
 
 	/*
@@ -894,7 +909,14 @@  static int nand_wait(struct mtd_info *mtd, struct nand_chip *chip)
 	if (in_interrupt() || oops_in_progress)
 		panic_nand_wait(mtd, chip, timeout_ms);
 	else {
-		while (time_before(jiffies, timeo)) {
+		struct hrtimer timer;
+
+		hrtimer_init(&timer, CLOCK_REALTIME, HRTIMER_MODE_REL);
+		timer.function = nand_wait_timeout_callback;
+		hrtimer_start(&timer, ns_to_ktime(timeout_ms * 1000 * 1000),
+		              HRTIMER_MODE_REL);
+
+		while(ktime_to_ns(hrtimer_get_expires(&timer)) > 0) {
 			if (chip->dev_ready) {
 				if (chip->dev_ready(mtd))
 					break;
@@ -904,6 +926,8 @@  static int nand_wait(struct mtd_info *mtd, struct nand_chip *chip)
 			}
 			cond_resched();
 		}
+
+		hrtimer_cancel(&timer);
 	}
 	led_trigger_event(nand_led_trigger, LED_OFF);