Patchwork [MTD] Adjust the NOR CFI flash timeouts to round better

login
register
mail settings
Submitter Jason Gunthorpe
Date Oct. 5, 2012, 6:32 p.m.
Message ID <20121005183202.GA11385@obsidianresearch.com>
Download mbox | patch
Permalink /patch/189563/
State New
Headers show

Comments

Jason Gunthorpe - Oct. 5, 2012, 6:32 p.m.
Using jiffies for timeout could have significant rounding, ie if we
start at jiffie 1.9 and timeout at 2 then we may only get .1 jiffies
of timeout, which is not nearly enough. With a Hz of 250 I've seen
random bogus timeouts from this code.

Add a +1 to the HZ calculations to ensure a guaranteed minimum
timeout period.

This also increases the write_buffer timeout to 1.5us since I have
flashes in use that have a datasheet max of over 1 us.

Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
---
 drivers/mtd/chips/cfi_cmdset_0002.c |   13 ++++++++-----
 1 files changed, 8 insertions(+), 5 deletions(-)
Artem Bityutskiy - Oct. 15, 2012, 1:42 p.m.
On Fri, 2012-10-05 at 12:32 -0600, Jason Gunthorpe wrote:
> -	/* see comments in do_write_oneword() regarding uWriteTimeo. */
> -	unsigned long uWriteTimeout = ( HZ / 1000 ) + 1;
> +	/* see comments in do_write_oneword() regarding uWriteTimeo.
> +	   Note: write_buffer commands take longer so we use a higher
> +	   time. The AMD 29LV256M for instance has a datasheet max
> +	   of 1.2ms for page and 600us for byte */
> +	unsigned long uWriteTimeout = (HZ / 1500) + 2;

What does HZ / 1500 mean? HZ is amount of timer interrupts per second,
which may be 1/100, or 1/1000, or 1/300, depending on the system.

The CFI code seems to assume that HZ is 1/1000, which is wrong. I think
the entire timeout strategy in this file should be cleaned up. You
should use 'jiffies_to_msecs()' / 'msecs_to_jiffies()' helpers instead
of using 'HZ'. And time-outs should be specified in msecs, not in
jiffies. There should be no things like 'HZ/1500'.

Or do I misunderstand something?
David Woodhouse - Oct. 15, 2012, 4:09 p.m.
On Mon, 2012-10-15 at 16:42 +0300, Artem Bityutskiy wrote:
> On Fri, 2012-10-05 at 12:32 -0600, Jason Gunthorpe wrote:
> > -	/* see comments in do_write_oneword() regarding uWriteTimeo. */
> > -	unsigned long uWriteTimeout = ( HZ / 1000 ) + 1;
> > +	/* see comments in do_write_oneword() regarding uWriteTimeo.
> > +	   Note: write_buffer commands take longer so we use a higher
> > +	   time. The AMD 29LV256M for instance has a datasheet max
> > +	   of 1.2ms for page and 600us for byte */
> > +	unsigned long uWriteTimeout = (HZ / 1500) + 2;
> 
> What does HZ / 1500 mean? HZ is amount of timer interrupts per second,
> which may be 1/100, or 1/1000, or 1/300, depending on the system.

If HZ is the amount of timer interrupts per second, then HZ/1500 is the
amount of timer interrupts in 1/1500 of a second, aka 666.7µs.¹

So it's not *entirely* bogus, but it should certainly be cleaned up to
use the msecs_to_jiffies() helpers... or preferably not use jiffies at
all, perhaps.
Jason Gunthorpe - Oct. 15, 2012, 4:57 p.m.
On Mon, Oct 15, 2012 at 09:09:40AM -0700, David Woodhouse wrote:

> So it's not *entirely* bogus, but it should certainly be cleaned up to
> use the msecs_to_jiffies() helpers... or preferably not use jiffies at
> all, perhaps.

No doubt this is part of why I've seen failures on my boards, the
timeout is too short by half and rounded wrong :)

A quick grep shows many cases in drivers/mtd that use division of HZ,
what would you like to see done here?

Jason

?? That's '??s', not 'us', for next time the patch is submitted. Welcome
  to the 21st century.

Hmm.. I may need a new terminal and/or mail reader ;)
David Woodhouse - Oct. 15, 2012, 5:31 p.m.
On Mon, 2012-10-15 at 10:57 -0600, Jason Gunthorpe wrote:
> On Mon, Oct 15, 2012 at 09:09:40AM -0700, David Woodhouse wrote:
> 
> > So it's not *entirely* bogus, but it should certainly be cleaned up to
> > use the msecs_to_jiffies() helpers... or preferably not use jiffies at
> > all, perhaps.
> 
> No doubt this is part of why I've seen failures on my boards, the
> timeout is too short by half and rounded wrong :)

Yeah, the rounding by +2 is confusing. I'm sure it's wrong, but at least
I can kind of see where it came from.

> A quick grep shows many cases in drivers/mtd that use division of HZ,
> what would you like to see done here?

I'd like someone with infinite amounts of free time to go through and
clean them all up :)

> ?? That's '??s', not 'us', for next time the patch is submitted. Welcome
>   to the 21st century.
> 
> Hmm.. I may need a new terminal and/or mail reader ;)

As I said... welcome to the 21st century. Seriously though, what are you
using that isn't capable of even *preserving* UTF-8 when you reply to
it? That's really quite broken, and has been considered so for a decade
or more already.
Jason Gunthorpe - Oct. 15, 2012, 5:49 p.m.
On Mon, Oct 15, 2012 at 10:31:41AM -0700, David Woodhouse wrote:

> > > So it's not *entirely* bogus, but it should certainly be cleaned up to
> > > use the msecs_to_jiffies() helpers... or preferably not use jiffies at
> > > all, perhaps.
> > 
> > No doubt this is part of why I've seen failures on my boards, the
> > timeout is too short by half and rounded wrong :)
> 
> Yeah, the rounding by +2 is confusing. I'm sure it's wrong, but at least
> I can kind of see where it came from.

The rounding by +2? I think even if you used msecs_to_jiffies you'd
need to add a +1 to the result to make the timeout algorithm work
right..

Is there a better way to do these timeouts?
 
> > A quick grep shows many cases in drivers/mtd that use division of HZ,
> > what would you like to see done here?
> 
> I'd like someone with infinite amounts of free time to go through and
> clean them all up :)

lol! Would you accept msecs_to_jiffies conversion for the cfi files?

> > ?? That's '??s', not 'us', for next time the patch is submitted. Welcome
> >   to the 21st century.
> > 
> > Hmm.. I may need a new terminal and/or mail reader ;)
> 
> As I said... welcome to the 21st century. Seriously though, what are you
> using that isn't capable of even *preserving* UTF-8 when you reply to
> it? That's really quite broken, and has been considered so for a decade
> or more already.

I deliberately made it not preserve UTF-8 for this posting, it is just
rxvt, nobody ever put unicode into it.

Jason
David Woodhouse - Oct. 15, 2012, 6:32 p.m.
On Mon, 2012-10-15 at 11:49 -0600, Jason Gunthorpe wrote:
> The rounding by +2? I think even if you used msecs_to_jiffies you'd
> need to add a +1 to the result to make the timeout algorithm work
> right..

Yes, HZ/1500+1 I could understand. That's the number of jiffies that
give you '666µs or more'. The +2 is odd though.

> > I'd like someone with infinite amounts of free time to go through and
> > clean them all up :)
> 
> lol! Would you accept msecs_to_jiffies conversion for the cfi files?

Absolutely. Although I'd like to take a little step back and take a more
thoughtful view of how we *should* be handing these timeouts, rather
than a simple janitor-style conversion of the existing code.

> I deliberately made it not preserve UTF-8 for this posting, it is just
> rxvt, nobody ever put unicode into it.

∃ rxvt-unicode
Artem Bityutskiy - Oct. 15, 2012, 7:28 p.m.
On Mon, 2012-10-15 at 09:09 -0700, David Woodhouse wrote:
> On Mon, 2012-10-15 at 16:42 +0300, Artem Bityutskiy wrote:
> > On Fri, 2012-10-05 at 12:32 -0600, Jason Gunthorpe wrote:
> > > -	/* see comments in do_write_oneword() regarding uWriteTimeo. */
> > > -	unsigned long uWriteTimeout = ( HZ / 1000 ) + 1;
> > > +	/* see comments in do_write_oneword() regarding uWriteTimeo.
> > > +	   Note: write_buffer commands take longer so we use a higher
> > > +	   time. The AMD 29LV256M for instance has a datasheet max
> > > +	   of 1.2ms for page and 600us for byte */
> > > +	unsigned long uWriteTimeout = (HZ / 1500) + 2;
> > 
> > What does HZ / 1500 mean? HZ is amount of timer interrupts per second,
> > which may be 1/100, or 1/1000, or 1/300, depending on the system.
> 
> If HZ is the amount of timer interrupts per second, then HZ/1500 is the
> amount of timer interrupts in 1/1500 of a second, aka 666.7µs.¹

First of all, yes, my e-mail was not very clear, I guess I just wanted
to express that it is not very readable. Also, HZ / 1500 would be 0 for
HZ < 1500, since we are using integer arithmetic.
David Woodhouse - Oct. 15, 2012, 8 p.m.
On Mon, 2012-10-15 at 22:28 +0300, Artem Bityutskiy wrote:
>  Also, HZ / 1500 would be 0 for HZ < 1500, since we are using integer
> arithmetic.

Hence the +1. One of them, at least.

Patch

diff --git a/drivers/mtd/chips/cfi_cmdset_0002.c b/drivers/mtd/chips/cfi_cmdset_0002.c
index 22d0493..46a1dde 100644
--- a/drivers/mtd/chips/cfi_cmdset_0002.c
+++ b/drivers/mtd/chips/cfi_cmdset_0002.c
@@ -1150,10 +1150,10 @@  static int __xipram do_write_oneword(struct map_info *map, struct flchip *chip,
 	 * use the maximum timeout value given by the chip at probe time
 	 * instead.  Unfortunately, struct flchip does have a field for
 	 * maximum timeout, only for typical which can be far too short
-	 * depending of the conditions.	 The ' + 1' is to avoid having a
-	 * timeout of 0 jiffies if HZ is smaller than 1000.
+	 * depending of the conditions.	 The ' + 2' is to ensure we get
+	 * *at least* 1000us of timeout.
 	 */
-	unsigned long uWriteTimeout = ( HZ / 1000 ) + 1;
+	unsigned long uWriteTimeout = (HZ / 1000) + 2;
 	int ret = 0;
 	map_word oldd;
 	int retry_cnt = 0;
@@ -1382,8 +1382,11 @@  static int __xipram do_write_buffer(struct map_info *map, struct flchip *chip,
 {
 	struct cfi_private *cfi = map->fldrv_priv;
 	unsigned long timeo = jiffies + HZ;
-	/* see comments in do_write_oneword() regarding uWriteTimeo. */
-	unsigned long uWriteTimeout = ( HZ / 1000 ) + 1;
+	/* see comments in do_write_oneword() regarding uWriteTimeo.
+	   Note: write_buffer commands take longer so we use a higher
+	   time. The AMD 29LV256M for instance has a datasheet max
+	   of 1.2ms for page and 600us for byte */
+	unsigned long uWriteTimeout = (HZ / 1500) + 2;
 	int ret = -EIO;
 	unsigned long cmd_adr;
 	int z, words;