diff mbox

[10/10] mtd: spi-nor: aspeed: optimize read mode

Message ID 1491497808-25487-11-git-send-email-clg@kaod.org
State Rejected
Delegated to: Cyrille Pitchen
Headers show

Commit Message

Cédric Le Goater April 6, 2017, 4:56 p.m. UTC
This is only for SPI controllers as U-Boot should have done it already
for the FMC controller using DMAs.

The algo is based on the one found in the OpenPOWER pflash tool. It
first reads a golden buffer at low speed and then performs reads with
different clocks and delay cycles settings to find the fastest
configuration for the chip.

It can be deactivated at boot time with the kernel parameter :

	aspeed_smc.optimize_read=0

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 drivers/mtd/spi-nor/aspeed-smc.c | 191 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 191 insertions(+)

Comments

Marek Vasut April 6, 2017, 7:28 p.m. UTC | #1
On 04/06/2017 06:56 PM, Cédric Le Goater wrote:
> This is only for SPI controllers as U-Boot should have done it already
> for the FMC controller using DMAs.
> 
> The algo is based on the one found in the OpenPOWER pflash tool. It
> first reads a golden buffer at low speed and then performs reads with
> different clocks and delay cycles settings to find the fastest
> configuration for the chip.
> 
> It can be deactivated at boot time with the kernel parameter :

Something tells me this is likely gonna be pretty flaky and might lead
to unreliable configurations. The hardware manufacturer should be able
to determine the limits of the hardware and those should be put into DT
at manufacturing time IMO.
Cédric Le Goater April 11, 2017, 8:13 a.m. UTC | #2
Hello Marek,

On 04/06/2017 09:28 PM, Marek Vasut wrote:
> On 04/06/2017 06:56 PM, Cédric Le Goater wrote:
>> This is only for SPI controllers as U-Boot should have done it already
>> for the FMC controller using DMAs.
>>
>> The algo is based on the one found in the OpenPOWER pflash tool. It
>> first reads a golden buffer at low speed and then performs reads with
>> different clocks and delay cycles settings to find the fastest
>> configuration for the chip.
>>
>> It can be deactivated at boot time with the kernel parameter :
> 
> Something tells me this is likely gonna be pretty flaky and might lead
> to unreliable configurations. The hardware manufacturer should be able
> to determine the limits of the hardware and those should be put into DT
> at manufacturing time IMO.

It is not a common method but we have been using it on many OpenPOWER
platforms (P8) using the AST2400 (palmetto, habanero, firestone, 
garrison, etc) and also on the newer P9 platforms using the AST2500
(romulus, zaius, witherspoon). So I would say that experience 'proves' 
that it works. We didn't invent the SPI Timing Calibration algo, it is
described in the specs of the manufacturer.

Also, all these platforms use sockets to hold the different flash 
modules of the system. We don't know in advance what we will find 
and so it makes difficult to put any timing in the DT.    

And the values of the "data input delay cycle" (REG94) depend on the 
chip model and on the length of the wires of the board. If we were to 
hard-code such values in the DT, we would need to put only safe and slow 
settings known to work in all configurations. which is a bit what we 
do in the current driver using really slow ones.

Here is a proposal. We could activate this algo using a property such
as :
 
	"speed-mode" = "freq" or "auto-adjust"

How's that ? 

Thanks,

C.
Benjamin Herrenschmidt April 11, 2017, 8:23 a.m. UTC | #3
On Tue, 2017-04-11 at 10:13 +0200, Cédric Le Goater wrote:
> 
> > Something tells me this is likely gonna be pretty flaky and might lead
> > to unreliable configurations. The hardware manufacturer should be able
> > to determine the limits of the hardware and those should be put into DT
> > at manufacturing time IMO.
> 
> It is not a common method but we have been using it on many OpenPOWER
> platforms (P8) using the AST2400 (palmetto, habanero, firestone, 
> garrison, etc) and also on the newer P9 platforms using the AST2500
> (romulus, zaius, witherspoon). So I would say that experience 'proves' 
> that it works. We didn't invent the SPI Timing Calibration algo, it is
> described in the specs of the manufacturer.

 ... Partially ;-) I wrote the original pflash so I can take the blame
for that one but yes, it's proven actually more reliable than any
attempt at hard wiring the settings for a given board/chip combination.

> Also, all these platforms use sockets to hold the different flash 
> modules of the system. We don't know in advance what we will find 
> and so it makes difficult to put any timing in the DT.    

Correct. Plus random revisions changing the trace length etc... the
problem isn't so much the SPI frequency which can theorically be
derived from the detected chip, as long as we stay within reasonable
limits. There seem to be sub-cycle read delays that need to be inserted
for reads to be reliable. The controller allow to manipulate those
delays rather precisely but the "right" value is hard to guess and
depend on a given combination of system trace length, flash chip, flash
chip revision even etc...

The algorithm we use goes through several passes over ranges of timings
and delays and picks a "safe" middle ground, ie, picks a combination
that is good *and* have both adjacent combinations good.

It also makes sure that there's a good enough distribution of 0's and
1's in the flash.

If any of the above fails, it falls back to slow and safe timings.

I suggested to Cedric the stuff he proposed below, which is to make it
a manufacturer choice to use the auto calibration or not via the DT.

However ...

> And the values of the "data input delay cycle" (REG94) depend on the 
> chip model and on the length of the wires of the board. If we were to 
> hard-code such values in the DT, we would need to put only safe and slow 
> settings known to work in all configurations. which is a bit what we 
> do in the current driver using really slow ones.
> 
> Here is a proposal. We could activate this algo using a property such
> as :
>  
> 	"speed-mode" = "freq" or "auto-adjust"

You need more than just "freq".

At the moment we use a command freq which is somewhat safe and a
different read freq. There's also a write freq which could be different
based on the flash chip. It also depends on the command mode used (fast
read, dual IO, ...).

The problem is that you need to also provide a way to specify that gate
delay correction (REG94) in addition too. My original implementation in
pflash was forced to slow commands right down (which sucks for write
and erase) because it didn't apply that correction factor to the reads
of the status register and thus was getting occasional garbage there
which can be fatal.

So I would provide a way to specify the "normal+write+erase command"
and the "read command" timing separately, and not as a frequency but as
a precise set of divider and REG94 since the exact frequency obtained
from the divider rather than the "target" frequency is what matters.

> How's that ? 
> 
> Thanks,
> 
> C.
Marek Vasut April 14, 2017, 4:18 p.m. UTC | #4
On 04/11/2017 10:13 AM, Cédric Le Goater wrote:
> Hello Marek,
> 
> On 04/06/2017 09:28 PM, Marek Vasut wrote:
>> On 04/06/2017 06:56 PM, Cédric Le Goater wrote:
>>> This is only for SPI controllers as U-Boot should have done it already
>>> for the FMC controller using DMAs.
>>>
>>> The algo is based on the one found in the OpenPOWER pflash tool. It
>>> first reads a golden buffer at low speed and then performs reads with
>>> different clocks and delay cycles settings to find the fastest
>>> configuration for the chip.
>>>
>>> It can be deactivated at boot time with the kernel parameter :
>>
>> Something tells me this is likely gonna be pretty flaky and might lead
>> to unreliable configurations. The hardware manufacturer should be able
>> to determine the limits of the hardware and those should be put into DT
>> at manufacturing time IMO.
> 
> It is not a common method but we have been using it on many OpenPOWER
> platforms (P8) using the AST2400 (palmetto, habanero, firestone, 
> garrison, etc) and also on the newer P9 platforms using the AST2500
> (romulus, zaius, witherspoon). So I would say that experience 'proves' 
> that it works. We didn't invent the SPI Timing Calibration algo, it is
> described in the specs of the manufacturer.
> 
> Also, all these platforms use sockets to hold the different flash 
> modules of the system. We don't know in advance what we will find 
> and so it makes difficult to put any timing in the DT.    
> 
> And the values of the "data input delay cycle" (REG94) depend on the 
> chip model and on the length of the wires of the board. If we were to 
> hard-code such values in the DT, we would need to put only safe and slow 
> settings known to work in all configurations. which is a bit what we 
> do in the current driver using really slow ones.

Aha, thanks for clarifying.

> Here is a proposal. We could activate this algo using a property such
> as :
>  
> 	"speed-mode" = "freq" or "auto-adjust"
> 
> How's that ? 

Thinking about it a bit more, I think having a module parameter is the
right approach here ... or maybe compile-time switch. This shouldn't be
in DT as it's not HW property.

> Thanks,
> 
> C.
>
Benjamin Herrenschmidt April 14, 2017, 9:40 p.m. UTC | #5
On Fri, 2017-04-14 at 18:18 +0200, Marek Vasut wrote:
> > Here is a proposal. We could activate this algo using a property such
> > as :
> >   
> >        "speed-mode" = "freq" or "auto-adjust"
> > 
> > How's that ? 
> 
> Thinking about it a bit more, I think having a module parameter is the
> right approach here ... or maybe compile-time switch. This shouldn't be
> in DT as it's not HW property.

Strong disagreement here :-)

DT is not *strictly* HW properties. Never was despite what some
fanatics around might say :-) Its also platform properties and can
include policies.

We put things like UART speeds in there, MAC addresses, etc... it makes
sense to put calibration info and in this case, request to perform SW
calibration.

Module parameters are crap. They are a major pain to use, they are in
practice only good for tweaking/experimenting.

Ben.
Marek Vasut April 14, 2017, 9:51 p.m. UTC | #6
On 04/14/2017 11:40 PM, Benjamin Herrenschmidt wrote:
> On Fri, 2017-04-14 at 18:18 +0200, Marek Vasut wrote:
>>> Here is a proposal. We could activate this algo using a property such
>>> as :
>>>
>>>        "speed-mode" = "freq" or "auto-adjust"
>>>
>>> How's that ?
>>
>> Thinking about it a bit more, I think having a module parameter is the
>> right approach here ... or maybe compile-time switch. This shouldn't be
>> in DT as it's not HW property.
>
> Strong disagreement here :-)
>
> DT is not *strictly* HW properties. Never was despite what some
> fanatics around might say :-) Its also platform properties and can
> include policies.

/me grabs popcorn ... :-)

> We put things like UART speeds in there, MAC addresses, etc...

UART speeds or UART max/allowed speeds ? That's basically HW property
since flaky HW might not allow all sorts of UART speed options. MAC
address is a HW property.

> it makes
> sense to put calibration info and in this case, request to perform SW
> calibration.

That's a hard question and I don't have the right answer to this.

> Module parameters are crap. They are a major pain to use, they are in
> practice only good for tweaking/experimenting.

That's correct, but then turning the calibration off would probably be
only used in such experimental setups or during HW bringup (if at all).
Based on the discussion thus far, my impression is that the preferred
and mostly used default is calibration enabled.

> Ben.
>
Benjamin Herrenschmidt April 14, 2017, 10:11 p.m. UTC | #7
On Fri, 2017-04-14 at 23:51 +0200, Marek Vasut wrote:
> On 04/14/2017 11:40 PM, Benjamin Herrenschmidt wrote:
> > 
> > Strong disagreement here :-)
> > 
> > DT is not *strictly* HW properties. Never was despite what some
> > fanatics around might say :-) Its also platform properties and can
> > include policies.
> 
> /me grabs popcorn ... :-)

Be my guest ! I only invented the bloody thing in the first place after
all ;-) (Well, the FDT format rather, and its use in Linux, the DT
itself dates back from Open Firmware).

> > We put things like UART speeds in there, MAC addresses, etc...
> 
> UART speeds or UART max/allowed speeds ? That's basically HW property
> since flaky HW might not allow all sorts of UART speed options. MAC
> address is a HW property.

Both. The point is that there is no hard lines. Every time people come
up with hard lines, we end up with inflexible horror shows that fail to
solve practical issues on the field.

There is no good reason to forbid passing such a simple policy argument
that way. None. Other than ideological that is.

> > it makes
> > sense to put calibration info and in this case, request to perform
> > SW
> > calibration.
> 
> That's a hard question and I don't have the right answer to this.

I do, and it's fine :-)

> > Module parameters are crap. They are a major pain to use, they are
> > in
> > practice only good for tweaking/experimenting.
> 
> That's correct, but then turning the calibration off would probably
> be only used in such experimental setups or during HW bringup (if at
> all).
> Based on the discussion thus far, my impression is that thepreferred
> and mostly used default is calibration enabled.

Probably yes. So we could reverse the problem and say that we have
the calibration enabled by default, and an optional device-tree
property to force a fixed speed.

That becomes akin to what we do with Ethernet PHYs for example :)

Cheers,
Ben.

> 
>
Marek Vasut April 14, 2017, 10:25 p.m. UTC | #8
On 04/15/2017 12:11 AM, Benjamin Herrenschmidt wrote:
> On Fri, 2017-04-14 at 23:51 +0200, Marek Vasut wrote:
>> On 04/14/2017 11:40 PM, Benjamin Herrenschmidt wrote:
>>>
>>> Strong disagreement here :-)
>>>
>>> DT is not *strictly* HW properties. Never was despite what some
>>> fanatics around might say :-) Its also platform properties and can
>>> include policies.
>>
>> /me grabs popcorn ... :-)
> 
> Be my guest ! I only invented the bloody thing in the first place after
> all ;-) (Well, the FDT format rather, and its use in Linux, the DT
> itself dates back from Open Firmware).

:-)

>>> We put things like UART speeds in there, MAC addresses, etc...
>>
>> UART speeds or UART max/allowed speeds ? That's basically HW property
>> since flaky HW might not allow all sorts of UART speed options. MAC
>> address is a HW property.
> 
> Both. The point is that there is no hard lines. Every time people come
> up with hard lines, we end up with inflexible horror shows that fail to
> solve practical issues on the field.

True

> There is no good reason to forbid passing such a simple policy argument
> that way. None. Other than ideological that is.
>
>>> it makes
>>> sense to put calibration info and in this case, request to perform
>>> SW
>>> calibration.
>>
>> That's a hard question and I don't have the right answer to this.
> 
> I do, and it's fine :-)
> 
>>> Module parameters are crap. They are a major pain to use, they are
>>> in
>>> practice only good for tweaking/experimenting.
>>
>> That's correct, but then turning the calibration off would probably
>> be only used in such experimental setups or during HW bringup (if at
>> all).
>> Based on the discussion thus far, my impression is that thepreferred
>> and mostly used default is calibration enabled.
> 
> Probably yes. So we could reverse the problem and say that we have
> the calibration enabled by default, and an optional device-tree
> property to force a fixed speed.

That sounds like a good option, yes. And if it's just forcing fixed
speed, that's awesome, but I think there are more values that might need
to be passed ... I think that's what Cedric can answer.

> That becomes akin to what we do with Ethernet PHYs for example :)

Even better, I recall seeing some speed DT props in the SPI binding
docs, so we already have those in place.

> Cheers,
> Ben.
> 
>>
>>
Benjamin Herrenschmidt April 14, 2017, 10:42 p.m. UTC | #9
On Sat, 2017-04-15 at 00:25 +0200, Marek Vasut wrote:
> That sounds like a good option, yes. And if it's just forcing fixed
> speed, that's awesome, but I think there are more values that might need
> to be passed ... I think that's what Cedric can answer.

Yes, fixed and delay. Not a huge deal.

Also a module parameter makes it hard to specify different settings for
different instances of the device/flash. IE, there can be multiple
flash controllers and each of them can have multiple chip selects.

The DT is the only sane way for that.

> > That becomes akin to what we do with Ethernet PHYs for example :)
> 
> Even better, I recall seeing some speed DT props in the SPI binding
> docs, so we already have those in place.

Right though the gate delay is rather IP block specific but that's
not a huge issue.

Cheers,
Ben.
Marek Vasut April 16, 2017, 6:46 p.m. UTC | #10
On 04/15/2017 12:42 AM, Benjamin Herrenschmidt wrote:
> On Sat, 2017-04-15 at 00:25 +0200, Marek Vasut wrote:
>> That sounds like a good option, yes. And if it's just forcing fixed
>> speed, that's awesome, but I think there are more values that might need
>> to be passed ... I think that's what Cedric can answer.
> 
> Yes, fixed and delay. Not a huge deal.

OK, that works.

> Also a module parameter makes it hard to specify different settings for
> different instances of the device/flash. IE, there can be multiple
> flash controllers and each of them can have multiple chip selects.
> 
> The DT is the only sane way for that.

:-)

>>> That becomes akin to what we do with Ethernet PHYs for example :)
>>
>> Even better, I recall seeing some speed DT props in the SPI binding
>> docs, so we already have those in place.
> 
> Right though the gate delay is rather IP block specific but that's
> not a huge issue.

Yep.
Cédric Le Goater April 18, 2017, 3:46 p.m. UTC | #11
On 04/15/2017 12:42 AM, Benjamin Herrenschmidt wrote:
> On Sat, 2017-04-15 at 00:25 +0200, Marek Vasut wrote:
>> That sounds like a good option, yes. And if it's just forcing fixed
>> speed, that's awesome, but I think there are more values that might need
>> to be passed ... I think that's what Cedric can answer.
> 
> Yes, fixed and delay. Not a huge deal.

Do we really need to specify the delays in the DT ? As we loop on 
the different delay settings, if a setting is bogus, we can just 
pick the following HCLK divider. No ? I don't think it is worth 
adding black magic properties like this in the DT. We never had to 
tune it manually AFAICT and anyway, we can always add that later 
if needed. 

So, that would leave two properties :

  - safe divider for "normal/write/erase" commands
  - speedy divider for "read" commands

If the property is not present, we would keep the low HW default, 
which is /16. 

If there is such a property, the divider value would be considered 
as a max. The range should be 1..5. Let's introduce "literal" values 
like "min" and "max" for 1 and 5 ?  

Do we care for the other HCLK settings < 5 for which we can not set
any delay ?  

> Also a module parameter makes it hard to specify different settings for
> different instances of the device/flash. IE, there can be multiple
> flash controllers and each of them can have multiple chip selects.
> 
> The DT is the only sane way for that.

Yes. Can we keep the global module parameter as a global chicken 
switch to deactivate the algo ? It could be useful, some chips tend
to freak out when hammered on a bit too much. 

Thanks,

C.


>>> That becomes akin to what we do with Ethernet PHYs for example :)
>>
>> Even better, I recall seeing some speed DT props in the SPI binding
>> docs, so we already have those in place.
> 
> Right though the gate delay is rather IP block specific but that's
> not a huge issue.
> 
> Cheers,
> Ben.
>
Marek Vasut April 18, 2017, 4:51 p.m. UTC | #12
On 04/18/2017 05:46 PM, Cédric Le Goater wrote:
> On 04/15/2017 12:42 AM, Benjamin Herrenschmidt wrote:
>> On Sat, 2017-04-15 at 00:25 +0200, Marek Vasut wrote:
>>> That sounds like a good option, yes. And if it's just forcing fixed
>>> speed, that's awesome, but I think there are more values that might need
>>> to be passed ... I think that's what Cedric can answer.
>>
>> Yes, fixed and delay. Not a huge deal.
> 
> Do we really need to specify the delays in the DT ? As we loop on 
> the different delay settings, if a setting is bogus, we can just 
> pick the following HCLK divider. No ? I don't think it is worth 
> adding black magic properties like this in the DT. We never had to 
> tune it manually AFAICT and anyway, we can always add that later 
> if needed. 
> 
> So, that would leave two properties :
> 
>   - safe divider for "normal/write/erase" commands
>   - speedy divider for "read" commands
> 
> If the property is not present, we would keep the low HW default, 
> which is /16. 
> 
> If there is such a property, the divider value would be considered 
> as a max. The range should be 1..5. Let's introduce "literal" values 
> like "min" and "max" for 1 and 5 ?  

Can we use the spi max frequency which is a standard property and be
done with it ?

> Do we care for the other HCLK settings < 5 for which we can not set
> any delay ?  

I cannot tell, you're the expert :)

>> Also a module parameter makes it hard to specify different settings for
>> different instances of the device/flash. IE, there can be multiple
>> flash controllers and each of them can have multiple chip selects.
>>
>> The DT is the only sane way for that.
> 
> Yes. Can we keep the global module parameter as a global chicken 
> switch to deactivate the algo ? It could be useful, some chips tend
> to freak out when hammered on a bit too much. 

Can we also use the spi-max-frequency here somehow ?

> Thanks,
> 
> C.
> 
> 
>>>> That becomes akin to what we do with Ethernet PHYs for example :)
>>>
>>> Even better, I recall seeing some speed DT props in the SPI binding
>>> docs, so we already have those in place.
>>
>> Right though the gate delay is rather IP block specific but that's
>> not a huge issue.
>>
>> Cheers,
>> Ben.
>>
>
Benjamin Herrenschmidt April 18, 2017, 10:41 p.m. UTC | #13
On Tue, 2017-04-18 at 17:46 +0200, Cédric Le Goater wrote:
> On 04/15/2017 12:42 AM, Benjamin Herrenschmidt wrote:
> > On Sat, 2017-04-15 at 00:25 +0200, Marek Vasut wrote:
> > > That sounds like a good option, yes. And if it's just forcing
> > > fixed
> > > speed, that's awesome, but I think there are more values that
> > > might need
> > > to be passed ... I think that's what Cedric can answer.
> > 
> > Yes, fixed and delay. Not a huge deal.
> 
> Do we really need to specify the delays in the DT ? As we loop on 
> the different delay settings, if a setting is bogus, we can just 
> pick the following HCLK divider. No ? I don't think it is worth 
> adding black magic properties like this in the DT. We never had to 
> tune it manually AFAICT and anyway, we can always add that later 
> if needed. 
> 
> So, that would leave two properties :
> 
>   - safe divider for "normal/write/erase" commands
>   - speedy divider for "read" commands

When the properties are present, we don't calibrate. So we need the
delay being specified. There's no "black magic" about it, it's about
specifying the appropriate configuration of the controller for the
given motherboard layout/flashchip/speed.

> If the property is not present, we would keep the low HW default, 
> which is /16. 

No misunderstood. The default is to calibrate. The properties allow you
to specify a fixed speed/delay combination in case the system has been
fully calibrated in the factory.

> If there is such a property, the divider value would be considered 
> as a max. The range should be 1..5. Let's introduce "literal" values 
> like "min" and "max" for 1 and 5 ?  
> 
> Do we care for the other HCLK settings < 5 for which we can not set
> any delay ?  

> > Also a module parameter makes it hard to specify different settings
> > for
> > different instances of the device/flash. IE, there can be multiple
> > flash controllers and each of them can have multiple chip selects.
> > 
> > The DT is the only sane way for that.
> 
> Yes. Can we keep the global module parameter as a global chicken 
> switch to deactivate the algo ? It could be useful, some chips tend
> to freak out when hammered on a bit too much. 

Yes, that's useful for diagnosis when things go wrong.

Cheers,
Ben.

> Thanks,
> 
> C.
> 
> 
> > > > That becomes akin to what we do with Ethernet PHYs for example
> > > > :)
> > > 
> > > Even better, I recall seeing some speed DT props in the SPI
> > > binding
> > > docs, so we already have those in place.
> > 
> > Right though the gate delay is rather IP block specific but that's
> > not a huge issue.
> > 
> > Cheers,
> > Ben.
> >
diff mbox

Patch

diff --git a/drivers/mtd/spi-nor/aspeed-smc.c b/drivers/mtd/spi-nor/aspeed-smc.c
index 1b398303f039..875b029198fc 100644
--- a/drivers/mtd/spi-nor/aspeed-smc.c
+++ b/drivers/mtd/spi-nor/aspeed-smc.c
@@ -22,6 +22,7 @@ 
 #include <linux/mtd/spi-nor.h>
 #include <linux/of.h>
 #include <linux/of_platform.h>
+#include <linux/slab.h>
 #include <linux/sysfs.h>
 
 #define DEVICE_NAME	"aspeed-smc"
@@ -43,13 +44,17 @@  struct aspeed_smc_info {
 	bool hastype;		/* flash type field exists in config reg */
 	u8 we0;			/* shift for write enable bit for CE0 */
 	u8 ctl0;		/* offset in regs of ctl for CE0 */
+	u8 timing;		/* offset in regs of timing */
 	bool has_dma;
 
 	void (*set_4b)(struct aspeed_smc_chip *chip);
+	int (*optimize_read)(struct aspeed_smc_chip *chip, u32 max_freq);
 };
 
 static void aspeed_smc_chip_set_4b_spi_2400(struct aspeed_smc_chip *chip);
 static void aspeed_smc_chip_set_4b(struct aspeed_smc_chip *chip);
+static int aspeed_smc_optimize_read(struct aspeed_smc_chip *chip,
+				     u32 max_freq);
 
 static const struct aspeed_smc_info fmc_2400_info = {
 	.maxsize = 256 * 1024 * 1024,
@@ -57,6 +62,7 @@  static const struct aspeed_smc_info fmc_2400_info = {
 	.hastype = true,
 	.we0 = 16,
 	.ctl0 = 0x10,
+	.timing = 0x94,
 	.has_dma = true,
 	.set_4b = aspeed_smc_chip_set_4b,
 };
@@ -67,8 +73,10 @@  static const struct aspeed_smc_info spi_2400_info = {
 	.hastype = false,
 	.we0 = 0,
 	.ctl0 = 0x04,
+	.timing = 0x94,
 	.has_dma = false,
 	.set_4b = aspeed_smc_chip_set_4b_spi_2400,
+	.optimize_read = aspeed_smc_optimize_read,
 };
 
 static const struct aspeed_smc_info fmc_2500_info = {
@@ -77,6 +85,7 @@  static const struct aspeed_smc_info fmc_2500_info = {
 	.hastype = true,
 	.we0 = 16,
 	.ctl0 = 0x10,
+	.timing = 0x94,
 	.has_dma = true,
 	.set_4b = aspeed_smc_chip_set_4b,
 };
@@ -87,8 +96,10 @@  static const struct aspeed_smc_info spi_2500_info = {
 	.hastype = false,
 	.we0 = 16,
 	.ctl0 = 0x10,
+	.timing = 0x94,
 	.has_dma = false,
 	.set_4b = aspeed_smc_chip_set_4b,
+	.optimize_read = aspeed_smc_optimize_read,
 };
 
 enum aspeed_smc_ctl_reg_value {
@@ -249,6 +260,12 @@  module_param(min_dma_size, uint, 0644);
 static unsigned int dma_timeout = 200;
 module_param(dma_timeout, uint, 0644);
 
+/*
+ * Switch to turn off read optimisation if needed
+ */
+static bool optimize_read = true;
+module_param(optimize_read, bool, 0644);
+
 static void aspeed_smc_dma_done(struct aspeed_smc_controller *controller)
 {
 	writel(0, controller->regs + INTERRUPT_STATUS_REG);
@@ -865,6 +882,174 @@  static int aspeed_smc_chip_setup_init(struct aspeed_smc_chip *chip,
 	return 0;
 }
 
+
+#define CALIBRATE_BUF_SIZE 16384
+
+static bool aspeed_smc_check_reads(struct aspeed_smc_chip *chip,
+				  const u8 *golden_buf, u8 *test_buf)
+{
+	int i;
+
+	for (i = 0; i < 10; i++) {
+		aspeed_smc_read_from_ahb(test_buf, chip->ahb_base,
+					 CALIBRATE_BUF_SIZE);
+		if (memcmp(test_buf, golden_buf, CALIBRATE_BUF_SIZE) != 0)
+			return false;
+	}
+	return true;
+}
+
+static int aspeed_smc_calibrate_reads(struct aspeed_smc_chip *chip, u32 hdiv,
+				      const u8 *golden_buf, u8 *test_buf)
+{
+	struct aspeed_smc_controller *controller = chip->controller;
+	const struct aspeed_smc_info *info = controller->info;
+	int i;
+	int good_pass = -1, pass_count = 0;
+	u32 shift = (hdiv - 1) << 2;
+	u32 mask = ~(0xfu << shift);
+	u32 fread_timing_val = 0;
+
+#define FREAD_TPASS(i)	(((i) / 2) | (((i) & 1) ? 0 : 8))
+
+	/* Try HCLK delay 0..5, each one with/without delay and look for a
+	 * good pair.
+	 */
+	for (i = 0; i < 12; i++) {
+		bool pass;
+
+		fread_timing_val &= mask;
+		fread_timing_val |= FREAD_TPASS(i) << shift;
+
+		writel(fread_timing_val, controller->regs + info->timing);
+		pass = aspeed_smc_check_reads(chip, golden_buf, test_buf);
+		dev_dbg(chip->nor.dev,
+			"  * [%08x] %d HCLK delay, %dns DI delay : %s",
+			fread_timing_val, i/2, (i & 1) ? 0 : 4,
+			pass ? "PASS" : "FAIL");
+		if (pass) {
+			pass_count++;
+			if (pass_count == 3) {
+				good_pass = i - 1;
+				break;
+			}
+		} else
+			pass_count = 0;
+	}
+
+	/* No good setting for this frequency */
+	if (good_pass < 0)
+		return -1;
+
+	/* We have at least one pass of margin, let's use first pass */
+	fread_timing_val &= mask;
+	fread_timing_val |= FREAD_TPASS(good_pass) << shift;
+	writel(fread_timing_val, controller->regs + info->timing);
+	dev_dbg(chip->nor.dev, " * -> good is pass %d [0x%08x]",
+		good_pass, fread_timing_val);
+	return 0;
+}
+
+static bool aspeed_smc_check_calib_data(const u8 *test_buf, u32 size)
+{
+	const u32 *tb32 = (const u32 *) test_buf;
+	u32 i, cnt = 0;
+
+	/* We check if we have enough words that are neither all 0
+	 * nor all 1's so the calibration can be considered valid.
+	 *
+	 * I use an arbitrary threshold for now of 64
+	 */
+	size >>= 2;
+	for (i = 0; i < size; i++) {
+		if (tb32[i] != 0 && tb32[i] != 0xffffffff)
+			cnt++;
+	}
+	return cnt >= 64;
+}
+
+static const uint32_t aspeed_smc_hclk_divs[] = {
+	0xf, /* HCLK */
+	0x7, /* HCLK/2 */
+	0xe, /* HCLK/3 */
+	0x6, /* HCLK/4 */
+	0xd, /* HCLK/5 */
+};
+#define ASPEED_SMC_HCLK_DIV(i) (aspeed_smc_hclk_divs[(i) - 1] << 8)
+
+static int aspeed_smc_optimize_read(struct aspeed_smc_chip *chip,
+				     u32 max_freq)
+{
+	u8 *golden_buf, *test_buf;
+	int i, rc, best_div = -1;
+	u32 save_read_val = chip->ctl_val[smc_read];
+	u32 ahb_freq = clk_get_rate(chip->controller->ahb_clk);
+
+	dev_dbg(chip->nor.dev, "AHB frequency: %d MHz", ahb_freq / 1000000);
+
+	test_buf = kmalloc(CALIBRATE_BUF_SIZE * 2, GFP_KERNEL);
+	golden_buf = test_buf + CALIBRATE_BUF_SIZE;
+
+	/* We start with the dumbest setting (keep 4Byte bit) and read
+	 * some data
+	 */
+	chip->ctl_val[smc_read] = (chip->ctl_val[smc_read] & 0x2000) |
+		(0x00 << 28) | /* Single bit */
+		(0x00 << 24) | /* CE# max */
+		(0x03 << 16) | /* use normal reads */
+		(0x00 <<  8) | /* HCLK/16 */
+		(0x00 <<  6) | /* no dummy cycle */
+		(0x00);        /* normal read */
+
+	writel(chip->ctl_val[smc_read], chip->ctl);
+
+	aspeed_smc_read_from_ahb(golden_buf, chip->ahb_base,
+				 CALIBRATE_BUF_SIZE);
+
+	/* Establish our read mode with freq field set to 0 (HCLK/16) */
+	chip->ctl_val[smc_read] = save_read_val & 0xfffff0ff;
+
+	/* Check if calibration data is suitable */
+	if (!aspeed_smc_check_calib_data(golden_buf, CALIBRATE_BUF_SIZE)) {
+		dev_info(chip->nor.dev,
+			 "Calibration area too uniform, using low speed");
+		writel(chip->ctl_val[smc_read], chip->ctl);
+		kfree(test_buf);
+		return 0;
+	}
+
+	/* Now we iterate the HCLK dividers until we find our breaking point */
+	for (i = ARRAY_SIZE(aspeed_smc_hclk_divs); i > 0; i--) {
+		u32 tv, freq;
+
+		/* Compare timing to max */
+		freq = ahb_freq / i;
+		if (freq >= max_freq)
+			continue;
+
+		/* Set the timing */
+		tv = chip->ctl_val[smc_read] | ASPEED_SMC_HCLK_DIV(i);
+		writel(tv, chip->ctl);
+		dev_dbg(chip->nor.dev, "Trying HCLK/%d...", i);
+		rc = aspeed_smc_calibrate_reads(chip, i, golden_buf, test_buf);
+		if (rc == 0)
+			best_div = i;
+	}
+	kfree(test_buf);
+
+	/* Nothing found ? */
+	if (best_div < 0)
+		dev_warn(chip->nor.dev, "No good frequency, using dumb slow");
+	else {
+		dev_dbg(chip->nor.dev, "Found good read timings at HCLK/%d",
+			best_div);
+		chip->ctl_val[smc_read] |= ASPEED_SMC_HCLK_DIV(best_div);
+	}
+
+	writel(chip->ctl_val[smc_read], chip->ctl);
+	return 0;
+}
+
 static int aspeed_smc_chip_setup_finish(struct aspeed_smc_chip *chip)
 {
 	struct aspeed_smc_controller *controller = chip->controller;
@@ -918,6 +1103,12 @@  static int aspeed_smc_chip_setup_finish(struct aspeed_smc_chip *chip)
 
 	dev_dbg(controller->dev, "read control register: %08x\n",
 		chip->ctl_val[smc_read]);
+
+	/*
+	 * TODO: get max freq from chip
+	 */
+	if (optimize_read && info->optimize_read)
+		info->optimize_read(chip, 104000000);
 	return 0;
 }