net: davinci_emac: Add pre_open, post_stop platform callbacks

Message ID 20120502234718.GA5432@animalcreek.com
State Changes Requested, archived
Delegated to: David Miller

Commit Message

Mark Greer May 2, 2012, 11:47 p.m. UTC
From: "Mark A. Greer" <mgreer@animalcreek.com>

The davinci EMAC driver has been incorporated into the am35x
family of SoCs, which is OMAP-based.  The incorporation is
incomplete in that the EMAC cannot unblock the [ARM] core if
it's blocked on a 'wfi' instruction.  This is an issue with
the cpu_idle code because it has the core execute a 'wfi'
instruction.

To work around this issue, add platform data callbacks which
are called at the beginning of the open routine and at the
end of the stop routine of the davinci_emac driver.  The
callbacks allow the platform code to issue disable_hlt() and
enable_hlt() calls appropriately.  Calling disable_hlt()
prevents cpu_idle from issuing the 'wfi' instruction.

It is not sufficient to simply call disable_hlt() when
there is an EMAC present because it could be present but
not actually used, in which case we do want the 'wfi' to
be executed.

Signed-off-by: Mark A. Greer <mgreer@animalcreek.com>
---

I know adding platform_data callbacks is frowned upon
and I really don't want to add them but I don't see
any other way to accomplish what needs to be accomplished.

Any suggestions?

Thanks, Mark.

 drivers/net/ethernet/ti/davinci_emac.c |   14 ++++++++++++++
 include/linux/davinci_emac.h           |    2 ++
 2 files changed, 16 insertions(+)
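
For illustration, the platform side of these hooks might look like the
sketch below.  It is only a userspace model under stated assumptions:
the am35x_emac_* names are hypothetical, struct net_device is left
opaque, and disable_hlt()/enable_hlt() are replaced by counter-based
stand-ins so the file builds outside the kernel.

```c
/*
 * Stand-ins for the ARM disable_hlt()/enable_hlt() calls named in the
 * commit message.  Assumption: they behave like a counter that the idle
 * loop checks before executing 'wfi'.
 */
static int hlt_counter;

static void disable_hlt(void) { hlt_counter++; }
static void enable_hlt(void)  { hlt_counter--; }

/* Nonzero when the idle loop would be allowed to execute 'wfi'. */
static int wfi_allowed(void) { return hlt_counter == 0; }

struct net_device;	/* opaque here; the callbacks only receive a pointer */

/* Hypothetical board-file callbacks with the signatures the patch adds. */
static void am35x_emac_pre_open(struct net_device *ndev)
{
	(void)ndev;
	disable_hlt();	/* EMAC about to be used: keep the core out of 'wfi' */
}

static void am35x_emac_post_stop(struct net_device *ndev)
{
	(void)ndev;
	enable_hlt();	/* EMAC no longer in use: 'wfi' is safe again */
}
```

Because the stand-ins count rather than toggle, every pre_open call has
to be paired with a post_stop call, which is why the patch also calls
post_stop on the rollback path of the open routine.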

Comments

Bedia, Vaibhav May 3, 2012, 10:44 a.m. UTC | #1
On Thu, May 03, 2012 at 05:17:18, Mark A. Greer wrote:
> From: "Mark A. Greer" <mgreer@animalcreek.com>
> 
> The davinci EMAC driver has been incorporated into the am35x
> family of SoCs, which is OMAP-based.  The incorporation is
> incomplete in that the EMAC cannot unblock the [ARM] core if
> it's blocked on a 'wfi' instruction.  This is an issue with
> the cpu_idle code because it has the core execute a 'wfi'
> instruction.
> 
> To work around this issue, add platform data callbacks which
> are called at the beginning of the open routine and at the
> end of the stop routine of the davinci_emac driver.  The
> callbacks allow the platform code to issue disable_hlt() and
> enable_hlt() calls appropriately.  Calling disable_hlt()
> prevents cpu_idle from issuing the 'wfi' instruction.
> 
> It is not sufficient to simply call disable_hlt() when
> there is an EMAC present because it could be present but
> not actually used, in which case we do want the 'wfi' to
> be executed.
> 

Are you trying to say that if ARM executes _just_ wfi and _absolutely
nothing else_ is done in the OMAP PM code, EMAC stops working?

However, if this is indeed the case, then probably a better solution would be
to invoke disable_hlt() from the board file when EMAC support is compiled in.

Regards,
Vaibhav
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Kevin Hilman May 3, 2012, 2:22 p.m. UTC | #2
"Bedia, Vaibhav" <vaibhav.bedia@ti.com> writes:

> On Thu, May 03, 2012 at 05:17:18, Mark A. Greer wrote:
>> From: "Mark A. Greer" <mgreer@animalcreek.com>
>> 
>> The davinci EMAC driver has been incorporated into the am35x
>> family of SoCs, which is OMAP-based.  The incorporation is
>> incomplete in that the EMAC cannot unblock the [ARM] core if
>> it's blocked on a 'wfi' instruction.  This is an issue with
>> the cpu_idle code because it has the core execute a 'wfi'
>> instruction.
>> 
>> To work around this issue, add platform data callbacks which
>> are called at the beginning of the open routine and at the
>> end of the stop routine of the davinci_emac driver.  The
>> callbacks allow the platform code to issue disable_hlt() and
>> enable_hlt() calls appropriately.  Calling disable_hlt()
>> prevents cpu_idle from issuing the 'wfi' instruction.
>> 
>> It is not sufficient to simply call disable_hlt() when
>> there is an EMAC present because it could be present but
>> not actually used, in which case we do want the 'wfi' to
>> be executed.
>> 
>
> Are you trying to say that if ARM executes _just_ wfi and _absolutely
> nothing else_ is done in the OMAP PM code, EMAC stops working?
>
> However, if this is indeed the case, then probably a better solution would be
> to invoke disable_hlt() from the board file when EMAC support is compiled in.

No.  As Mark stated in the changelog, doing that will prevent any
low-power states even if the EMAC is not in use.  IMO, it is best
to only prevent WFI when absolutely needed.

Kevin

Mark Greer May 3, 2012, 4:09 p.m. UTC | #3
On Thu, May 03, 2012 at 10:44:44AM +0000, Bedia, Vaibhav wrote:
> On Thu, May 03, 2012 at 05:17:18, Mark A. Greer wrote:
> > From: "Mark A. Greer" <mgreer@animalcreek.com>
> > 
> > The davinci EMAC driver has been incorporated into the am35x
> > family of SoCs, which is OMAP-based.  The incorporation is
> > incomplete in that the EMAC cannot unblock the [ARM] core if
> > it's blocked on a 'wfi' instruction.  This is an issue with
> > the cpu_idle code because it has the core execute a 'wfi'
> > instruction.
> > 
> > To work around this issue, add platform data callbacks which
> > are called at the beginning of the open routine and at the
> > end of the stop routine of the davinci_emac driver.  The
> > callbacks allow the platform code to issue disable_hlt() and
> > enable_hlt() calls appropriately.  Calling disable_hlt()
> > prevents cpu_idle from issuing the 'wfi' instruction.
> > 
> > It is not sufficient to simply call disable_hlt() when
> > there is an EMAC present because it could be present but
> > not actually used, in which case we do want the 'wfi' to
> > be executed.
> > 
> 
> Are you trying to say that if ARM executes _just_ wfi and _absolutely
> nothing else_ is done in the OMAP PM code, EMAC stops working?

No, I'm saying the EMAC can't wake the core from the wfi so if nothing
else happens in the system, it's effectively hung.  If something else
does happen in the system (e.g., a timer expires), then the system is
extremely slow because it's only waking up when a timer (or something
else, but not net traffic) wakes it up.  This is very apparent
when using an nfs-mounted rootfs. It doesn't hang but it's extremely
slow because occasionally something else wakes up the core but it
spends most of its time stuck in the wfi when it should be handling
net/nfs traffic.

> However, if this is indeed the case, then probably a better solution would be
> to invoke disable_hlt() from the board file when EMAC support is compiled in.

Kevin addressed this one.  Thanks Kevin.

Mark
Bedia, Vaibhav May 3, 2012, 6:21 p.m. UTC | #4
On Thu, May 03, 2012 at 21:39:18, Mark A. Greer wrote:
> On Thu, May 03, 2012 at 10:44:44AM +0000, Bedia, Vaibhav wrote:
> > On Thu, May 03, 2012 at 05:17:18, Mark A. Greer wrote:
> > > From: "Mark A. Greer" <mgreer@animalcreek.com>
> > > 
> > > The davinci EMAC driver has been incorporated into the am35x
> > > family of SoCs, which is OMAP-based.  The incorporation is
> > > incomplete in that the EMAC cannot unblock the [ARM] core if
> > > it's blocked on a 'wfi' instruction.  This is an issue with
> > > the cpu_idle code because it has the core execute a 'wfi'
> > > instruction.
> > > 
> > > To work around this issue, add platform data callbacks which
> > > are called at the beginning of the open routine and at the
> > > end of the stop routine of the davinci_emac driver.  The
> > > callbacks allow the platform code to issue disable_hlt() and
> > > enable_hlt() calls appropriately.  Calling disable_hlt()
> > > prevents cpu_idle from issuing the 'wfi' instruction.
> > > 
> > > It is not sufficient to simply call disable_hlt() when
> > > there is an EMAC present because it could be present but
> > > not actually used, in which case we do want the 'wfi' to
> > > be executed.
> > > 
> > 
> > Are you trying to say that if ARM executes _just_ wfi and _absolutely
> > nothing else_ is done in the OMAP PM code, EMAC stops working?
> 
> No, I'm saying the EMAC can't wake the core from the wfi so if nothing
> else happens in the system, it's effectively hung.  If something else
> does happen in the system (e.g., a timer expires), then the system is
> extremely slow because it's only waking up when a timer (or something
> else, but not net traffic) wakes it up.  This is very apparent
> when using an nfs-mounted rootfs. It doesn't hang but it's extremely
> slow because occasionally something else wakes up the core but it
> spends most of its time stuck in the wfi when it should be handling
> net/nfs traffic.
> 

So, if I understood this correctly, it's effectively like blocking a low power
state transition (here wfi execution) when EMAC is active?

Mark Greer May 3, 2012, 6:46 p.m. UTC | #5
On Thu, May 03, 2012 at 06:21:27PM +0000, Bedia, Vaibhav wrote:
> On Thu, May 03, 2012 at 21:39:18, Mark A. Greer wrote:
> > On Thu, May 03, 2012 at 10:44:44AM +0000, Bedia, Vaibhav wrote:
> > > On Thu, May 03, 2012 at 05:17:18, Mark A. Greer wrote:
> > > > From: "Mark A. Greer" <mgreer@animalcreek.com>
> > > > 
> > > > The davinci EMAC driver has been incorporated into the am35x
> > > > family of SoCs, which is OMAP-based.  The incorporation is
> > > > incomplete in that the EMAC cannot unblock the [ARM] core if
> > > > it's blocked on a 'wfi' instruction.  This is an issue with
> > > > the cpu_idle code because it has the core execute a 'wfi'
> > > > instruction.
> > > > 
> > > > To work around this issue, add platform data callbacks which
> > > > are called at the beginning of the open routine and at the
> > > > end of the stop routine of the davinci_emac driver.  The
> > > > callbacks allow the platform code to issue disable_hlt() and
> > > > enable_hlt() calls appropriately.  Calling disable_hlt()
> > > > prevents cpu_idle from issuing the 'wfi' instruction.
> > > > 
> > > > It is not sufficient to simply call disable_hlt() when
> > > > there is an EMAC present because it could be present but
> > > > not actually used, in which case we do want the 'wfi' to
> > > > be executed.
> > > > 
> > > 
> > > Are you trying to say that if ARM executes _just_ wfi and _absolutely
> > > nothing else_ is done in the OMAP PM code, EMAC stops working?
> > 
> > No, I'm saying the EMAC can't wake the core from the wfi so if nothing
> > else happens in the system, it's effectively hung.  If something else
> > does happen in the system (e.g., a timer expires), then the system is
> > extremely slow because it's only waking up when a timer (or something
> > else, but not net traffic) wakes it up.  This is very apparent
> > when using an nfs-mounted rootfs. It doesn't hang but it's extremely
> > slow because occasionally something else wakes up the core but it
> > spends most of its time stuck in the wfi when it should be handling
> > net/nfs traffic.
> > 
> 
> So, if I understood this correctly, it's effectively like blocking a low power
> state transition (here wfi execution) when EMAC is active?

Assuming "it" is my patch, correct.

Mark
Bedia, Vaibhav May 3, 2012, 7:25 p.m. UTC | #6
On Fri, May 04, 2012 at 00:16:32, Mark A. Greer wrote:
[...]
> > 
> > So, if I understood this correctly, it's effectively like blocking a low power
> > state transition (here wfi execution) when EMAC is active?
> 
> Assuming "it" is my patch, correct.
> 

Recently I was thinking about how to get certain drivers to disallow some or all
low power states and to me this also seems to fall in a similar category.

One of the suggestions that I got was to check if the 'wakeup' entry associated with
the device under sysfs could be leveraged for this. The PM code could maintain
a whitelist (or blacklist) of devices and it decides the low power state to enter
based on the 'wakeup' entries associated with these devices. In this particular case,
maybe the driver could simply set this entry to non-wakeup capable when necessary and
then let the PM code take care of skipping the wfi execution.

Thoughts/brickbats welcome :)

Regards,
Vaibhav
Ben Hutchings May 3, 2012, 8:22 p.m. UTC | #7
On Thu, 2012-05-03 at 19:25 +0000, Bedia, Vaibhav wrote:
> On Fri, May 04, 2012 at 00:16:32, Mark A. Greer wrote:
> [...]
> > > 
> > > So, if I understood this correctly, it's effectively like blocking a low power
> > > state transition (here wfi execution) when EMAC is active?
> > 
> > Assuming "it" is my patch, correct.
> > 
> 
> Recently I was thinking about how to get certain drivers to disallow some or all
> low power states and to me this also seems to fall in a similar category.
> 
> One of the suggestions that I got was to check if the 'wakeup' entry associated with
> the device under sysfs could be leveraged for this. The PM code could maintain
> a whitelist (or blacklist) of devices and it decides the low power state to enter
> based on the 'wakeup' entries associated with these devices. In this particular case,
> maybe the driver could simply set this entry to non-wakeup capable when necessary and
> then let the PM code take care of skipping the wfi execution.
> 
> Thoughts/brickbats welcome :)

You can maybe (ab)use the pm_qos mechanism for this.

Ben.
Kevin Hilman May 3, 2012, 9:32 p.m. UTC | #8
Ben Hutchings <bhutchings@solarflare.com> writes:

> On Thu, 2012-05-03 at 19:25 +0000, Bedia, Vaibhav wrote:
>> On Fri, May 04, 2012 at 00:16:32, Mark A. Greer wrote:
>> [...]
>> > > 
>> > > So, if I understood this correctly, it's effectively like blocking a low power
>> > > state transition (here wfi execution) when EMAC is active?
>> > 
>> > Assuming "it" is my patch, correct.
>> > 
>> 
>> Recently I was thinking about how to get certain drivers to disallow some or all
>> low power states and to me this also seems to fall in a similar category.
>> 
>> One of the suggestions that I got was to check if the 'wakeup' entry associated with
>> the device under sysfs could be leveraged for this. The PM code could maintain
>> a whitelist (or blacklist) of devices and it decides the low power state to enter
>> based on the 'wakeup' entries associated with these devices. In this particular case,
>> maybe the driver could simply set this entry to non-wakeup capable when necessary and
>> then let the PM code take care of skipping the wfi execution.
>> 
>> Thoughts/brickbats welcome :)
>
> You can maybe (ab)use the pm_qos mechanism for this.

I thought of using this too, but it doesn't actually solve the problem:

Using PM QoS, you can avoid hitting the deeper idle states by setting a
very low wakeup latency.  However, on ARM platforms, even the shallowest
idle states use the WFI instruction, and the EMAC would still not be
able to wake the system from WFI.  A possibility would be to define the
shallowest idle state to be one that doesn't call WFI and just does
cpu_relax().  However, that would only work for CPUidle since PM QoS
constraints are only checked by CPUidle.  So, a non-CPUidle kernel would
still have this bug. :(

Ultimately, this is just broken HW.  This network HW was bolted onto an
existing SoC without consideration for wakeup capabilities.  The result
is that any use of this device with networking has to completely disable
SoC power management.

Kevin
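
To make the PM QoS idea above concrete, here is a minimal sketch of
latency-limited idle-state selection where the shallowest state polls
instead of executing WFI.  The state table, its latency numbers, and
select_state() are all hypothetical; this is not the CPUidle governor
code, which also weighs other factors.

```c
#include <stddef.h>

/* Hypothetical idle-state table; the latencies are illustrative only. */
struct idle_state {
	const char *name;
	unsigned int exit_latency_us;
	int uses_wfi;
};

static const struct idle_state states[] = {
	{ "poll",        0, 0 },	/* would spin in cpu_relax(), no WFI */
	{ "wfi",         2, 1 },
	{ "retention", 300, 1 },
};

/* Pick the deepest state whose exit latency fits the QoS latency limit. */
static size_t select_state(unsigned int latency_limit_us)
{
	size_t i, best = 0;

	for (i = 0; i < sizeof(states) / sizeof(states[0]); i++)
		if (states[i].exit_latency_us <= latency_limit_us)
			best = i;

	return best;
}
```

With a latency limit of 0 only the polling state qualifies, so WFI is
never reached.  As noted above, this only helps when CPUidle is the
code choosing the idle state.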
Bedia, Vaibhav May 4, 2012, 1:55 p.m. UTC | #9
Hi Kevin,

On Fri, May 04, 2012 at 03:02:16, Hilman, Kevin wrote:
> Ben Hutchings <bhutchings@solarflare.com> writes:
> 
> > On Thu, 2012-05-03 at 19:25 +0000, Bedia, Vaibhav wrote:
> >> On Fri, May 04, 2012 at 00:16:32, Mark A. Greer wrote:
> >> [...]
> >> > > 
> >> > > So, if I understood this correctly, it's effectively like blocking a low power
> >> > > state transition (here wfi execution) when EMAC is active?
> >> > 
> >> > Assuming "it" is my patch, correct.
> >> > 
> >> 
> >> Recently I was thinking about how to get certain drivers to disallow some or all
> >> low power states and to me this also seems to fall in a similar category.
> >> 
> >> One of the suggestions that I got was to check if the 'wakeup' entry associated with
> >> the device under sysfs could be leveraged for this. The PM code could maintain
> >> a whitelist (or blacklist) of devices and it decides the low power state to enter
> >> based on the 'wakeup' entries associated with these devices. In this particular case,
> >> maybe the driver could simply set this entry to non-wakeup capable when necessary and
> >> then let the PM code take care of skipping the wfi execution.
> >> 
> >> Thoughts/brickbats welcome :)
> >
> > You can maybe (ab)use the pm_qos mechanism for this.
> 
> I thought of using this too, but it doesn't actually solve the problem:
> 
> Using PM QoS, you can avoid hitting the deeper idle states by setting a
> very low wakeup latency.  However, on ARM platforms, even the shallowest
> idle states use the WFI instruction, and the EMAC would still not be
> able to wake the system from WFI.  A possibility would be to define the
> shallowest idle state to be one that doesn't call WFI and just does
> cpu_relax().  However, that would only work for CPUidle since PM QoS
> constraints are only checked by CPUidle.  So, a non-CPUidle kernel would
> still have this bug. :(
> 
> Ultimately, this is just broken HW.  This network HW was bolted onto an
> existing SoC without consideration for wakeup capabilities.  The result
> is that any use of this device with networking has to completely disable
> SoC power management.
> 

I was checking internally with some folks on the issue being addressed
in this patch and unfortunately no one seems to be aware of this :(
Mark mentioned nfs mounted rootfs being slow but in my limited testing I
didn't observe this on an AM3517 board. I am yet to go through the PSP code
to be fully sure that wfi instruction is indeed being executed but I wanted
to check if I need to do something specific to reproduce this at my end.

Irrespective of the above problem being present in the h/w, I feel the approach
of adding platform callbacks for blocking deeper idle states will create problems
when this is required for multiple peripherals. I agree that the default behavior
should be to support the deepest idle state based on the peripherals being used but
IMO the user should have the flexibility to change this behavior if he wishes
to do so. 

I don't know whether the usage of the 'wakeup' entries for giving this
control to users qualifies as an abuse of the infrastructure. If it does, perhaps
there should be some other mechanism for letting users control the system behavior.

Regards,
Vaibhav
Kevin Hilman May 4, 2012, 2:31 p.m. UTC | #10
+Sekhar

"Bedia, Vaibhav" <vaibhav.bedia@ti.com> writes:

> Hi Kevin,
>
> On Fri, May 04, 2012 at 03:02:16, Hilman, Kevin wrote:
>> Ben Hutchings <bhutchings@solarflare.com> writes:
>> 
>> > On Thu, 2012-05-03 at 19:25 +0000, Bedia, Vaibhav wrote:
>> >> On Fri, May 04, 2012 at 00:16:32, Mark A. Greer wrote:
>> >> [...]
>> >> > > 
>> >> > > So, if I understood this correctly, it's effectively like blocking a low power
>> >> > > state transition (here wfi execution) when EMAC is active?
>> >> > 
>> >> > Assuming "it" is my patch, correct.
>> >> > 
>> >> 
>> >> Recently I was thinking about how to get certain drivers to disallow some or all
>> >> low power states and to me this also seems to fall in a similar category.
>> >> 
>> >> One of the suggestions that I got was to check if the 'wakeup' entry associated with
>> >> the device under sysfs could be leveraged for this. The PM code could maintain
>> >> a whitelist (or blacklist) of devices and it decides the low power state to enter
>> >> based on the 'wakeup' entries associated with these devices. In this particular case,
>> >> maybe the driver could simply set this entry to non-wakeup capable when necessary and
>> >> then let the PM code take care of skipping the wfi execution.
>> >> 
>> >> Thoughts/brickbats welcome :)
>> >
>> > You can maybe (ab)use the pm_qos mechanism for this.
>> 
>> I thought of using this too, but it doesn't actually solve the problem:
>> 
>> Using PM QoS, you can avoid hitting the deeper idle states by setting a
>> very low wakeup latency.  However, on ARM platforms, even the shallowest
>> idle states use the WFI instruction, and the EMAC would still not be
>> able to wake the system from WFI.  A possibility would be to define the
>> shallowest idle state to be one that doesn't call WFI and just does
>> cpu_relax().  However, that would only work for CPUidle since PM QoS
>> constraints are only checked by CPUidle.  So, a non-CPUidle kernel would
>> still have this bug. :(
>> 
>> Ultimately, this is just broken HW.  This network HW was bolted onto an
>> existing SoC without consideration for wakeup capabilities.  The result
>> is that any use of this device with networking has to completely disable
>> SoC power management.
>> 
>
> I was checking internally with some folks on the issue being addressed
> in this patch and unfortunately no one seems to be aware of this :(

Do you mean they are not aware that the EMAC cannot wake up the SoC, or
they are not aware that having a device that cannot wake up the SoC has
such an impact on Linux?

> Mark mentioned nfs mounted rootfs being slow but in my limited testing I
> didn't observe this on an AM3517 board. I am yet to go through the PSP code
> to be fully sure that wfi instruction is indeed being executed but I wanted
> to check if I need to do something specific to reproduce this at my end.

Based on my discussion with Mark, I suspect that the kernel you're using
is simply not going idle.

> Irrespective of the above problem being present in the h/w, I feel the approach
> of adding platform callbacks for blocking deeper idle states will create problems
> when this is required for multiple peripherals. 

I agree.  If we have to do this for multiple peripherals, the current
approach will become unwieldy.

> I agree that the default behavior should be to support the deepest
> idle state based on the peripherals being used but IMO the user should
> have the flexibility to change this behavior if he wishes to do so.

Well, we always have the option of booting with 'nohlt' on the
commandline.

Since nobody seems to have thought about idle power management in the HW
design, maybe we shouldn't break our backs to hack around the
HW brokenness.

Personally, I'm perfectly OK leaving the default behavior of
sluggish/unresponsive devices that are not wakeup capable.  The only fix
is to not sleep, and that can be accomplished on the cmdline using
nohlt (at the expense of some energy savings.)

> I don't know whether the usage of the 'wakeup' entries for giving this
> control to users qualifies as an abuse of the infrastructure. 

It does.

> If it does, perhaps there should be some other mechanism for letting
> users control the system behavior.

Come to think of it, the right solution here is probably to use runtime
PM.  We could then add some custom hooks for davinci_emac in the
device code to use enable_hlt/disable_hlt based on activity.

In order to do that though, the davinci_emac driver needs to be runtime
PM converted.

Kevin
Mark Greer May 4, 2012, 4:35 p.m. UTC | #11
On Fri, May 04, 2012 at 01:55:58PM +0000, Bedia, Vaibhav wrote:

Hi Vaibhav.

> Hi Kevin,
> 
> On Fri, May 04, 2012 at 03:02:16, Hilman, Kevin wrote:
> > Ben Hutchings <bhutchings@solarflare.com> writes:
> > 
> > > On Thu, 2012-05-03 at 19:25 +0000, Bedia, Vaibhav wrote:
> > >> On Fri, May 04, 2012 at 00:16:32, Mark A. Greer wrote:
> > >> [...]
> > >> > > 
> > >> > > So, if I understood this correctly, it's effectively like blocking a low power
> > >> > > state transition (here wfi execution) when EMAC is active?
> > >> > 
> > >> > Assuming "it" is my patch, correct.
> > >> > 
> > >> 
> > >> Recently I was thinking about how to get certain drivers to disallow some or all
> > >> low power states and to me this also seems to fall in a similar category.
> > >> 
> > >> One of the suggestions that I got was to check if the 'wakeup' entry associated with
> > >> the device under sysfs could be leveraged for this. The PM code could maintain
> > >> a whitelist (or blacklist) of devices and it decides the low power state to enter
> > >> based on the 'wakeup' entries associated with these devices. In this particular case,
> > >> maybe the driver could simply set this entry to non-wakeup capable when necessary and
> > >> then let the PM code take care of skipping the wfi execution.
> > >> 
> > >> Thoughts/brickbats welcome :)
> > >
> > > You can maybe (ab)use the pm_qos mechanism for this.
> > 
> > I thought of using this too, but it doesn't actually solve the problem:
> > 
> > Using PM QoS, you can avoid hitting the deeper idle states by setting a
> > very low wakeup latency.  However, on ARM platforms, even the shallowest
> > idle states use the WFI instruction, and the EMAC would still not be
> > able to wake the system from WFI.  A possibility would be to define the
> > shallowest idle state to be one that doesn't call WFI and just does
> > cpu_relax().  However, that would only work for CPUidle since PM QoS
> > constraints are only checked by CPUidle.  So, a non-CPUidle kernel would
> > still have this bug. :(
> > 
> > Ultimately, this is just broken HW.  This network HW was bolted onto an
> > existing SoC without consideration for wakeup capabilities.  The result
> > is that any use of this device with networking has to completely disable
> > SoC power management.
> > 
> 
> I was checking internally with some folks on the issue being addressed
> in this patch and unfortunately no one seems to be aware of this :(

This is from the TI hardware engineer that I talked to after spending many
hours trying to get the EMAC to wake up the system.  It was a private
conversation so I won't share his name/email here.  If you want to contact
him, please reach me privately.

"No, AM35x can't be waken up from CPGMAC. If customer need to wake AM35x
 up from Ethernet, a wake up interrupt signal from Ethernet phy should be
 connected to one of wakeup capable GPIO pins."

> Mark mentioned nfs mounted rootfs being slow but in my limited testing I
> didn't observe this on an AM3517 board. I am yet to go through the PSP code
> to be fully sure that wfi instruction is indeed being executed but I wanted
> to check if I need to do something specific to reproduce this at my end.

When you go through the PSP code, look for the definition & use of
omap3_can_sleep().  That routine returns '0' when either cpu_is_omap3505()
or cpu_is_omap3517() returns true (among other conditions).  You will see
that it's used in omap3_pm_idle() to exit early so pm_idle never executes
the wfi.

I expect that you don't have CONFIG_CPU_IDLE enabled, so cpuidle has no
opportunity to execute a wfi.  If it is enabled, omap3_can_sleep() is
used in omap3_idle_bm_check() which is used in omap3_enter_idle_bm()
so the wfi won't be executed when omap3_enter_idle_bm() is called.
omap3_enter_idle() isn't called (in my testing--the code is very
different from current k.o.) so it doesn't execute the wfi either.

Therefore, you don't see an issue when running PSP code.
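
The early-exit behavior described above can be modeled roughly as
follows.  This is a simplified userspace model and not the PSP source:
soc_is_am35x stands in for cpu_is_omap3505() || cpu_is_omap3517(), the
"other conditions" are omitted, and wfi_count merely records where the
real code would execute the wfi.

```c
/* Simplified model of the PSP idle path; not the actual PSP source. */
static int soc_is_am35x;	/* stand-in for cpu_is_omap3505() || cpu_is_omap3517() */
static int wfi_count;		/* counts how often 'wfi' would have been executed */

static int omap3_can_sleep(void)
{
	if (soc_is_am35x)
		return 0;	/* AM35x parts are never allowed to sleep */
	return 1;		/* other conditions omitted in this model */
}

static void omap3_pm_idle(void)
{
	if (!omap3_can_sleep())
		return;		/* early exit: the 'wfi' below is never reached */

	wfi_count++;		/* stands in for executing 'wfi' */
}
```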

> Irrespective of the above problem being present in the h/w, I feel the approach
> of adding platform callbacks for blocking deeper idle states will create problems
> when this is required for multiple peripherals. I agree that the default behavior
> should be to support the deepest idle state based on the peripherals being used but
> IMO the user should have the flexibility to change this behavior if he wishes
> to do so. 

I agree but hopefully this doesn't become common.  The real issue is a
missing hardware feature that--again, hopefully--won't become common.

Mark
Mark Greer May 4, 2012, 6:29 p.m. UTC | #12
On Fri, May 04, 2012 at 07:31:30AM -0700, Kevin Hilman wrote:
> "Bedia, Vaibhav" <vaibhav.bedia@ti.com> writes:

Hi Kevin.

> > If it does, perhaps there should be some other mechanism for letting
> > users control the system behavior.
> 
> Come to think of it, the right solution here is probably to use runtime
> PM.  We could then add some custom hooks for davinci_emac in the
> device code to use enable_hlt/disable_hlt based on activity.

That was my first thought, actually, but that only works if it's
okay for the driver to call enable_hlt/disable_hlt directly (i.e.,
have runtime_suspend() call enable_hlt() and runtime_resume() call
disable_hlt()).  However, I assumed it would _not_ be acceptable for
the driver to issue those calls directly.  It's a platform-specific
issue that we shouldn't be polluting the driver with and there are
currently no drivers that call them under the drivers directory.

If it's not okay to call enable_hlt/disable_hlt directly, then we still
need callback hooks to the platform code (i.e., some version of this
patch).

> In order to do that though, the davinci_emac driver needs to be runtime
> PM converted.

We probably should pm_runtime-ize the driver either way but we need
to resolve the question of whether it's okay for the driver to call
enable_hlt/disable_hlt directly or not.  If it is okay, we call them
in runtime_suspend/resume.  If it isn't okay, then we still need 
platform callback hooks.

Mark
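
As a rough sketch of the second option discussed above (runtime PM
callbacks that reach the platform code through hooks rather than
calling enable_hlt/disable_hlt themselves), consider the userspace
model below.  Everything in it is an assumption made for illustration:
struct emac_pm_hooks, the board_* functions, and the callback
signatures are hypothetical, and real runtime PM callbacks take a
struct device pointer.

```c
/* Stand-in for the idle-loop counter behind disable_hlt()/enable_hlt(). */
static int hlt_counter;

static void board_block_idle(void) { hlt_counter++; }	/* models disable_hlt() */
static void board_allow_idle(void) { hlt_counter--; }	/* models enable_hlt() */

/* Hypothetical hooks that platform code would hand to the driver. */
struct emac_pm_hooks {
	void (*block_idle)(void);
	void (*allow_idle)(void);
};

static const struct emac_pm_hooks am35x_hooks = {
	.block_idle = board_block_idle,
	.allow_idle = board_allow_idle,
};

/* Simplified signature; the driver never names enable_hlt/disable_hlt. */
static int emac_runtime_resume(const struct emac_pm_hooks *hooks)
{
	if (hooks && hooks->block_idle)
		hooks->block_idle();	/* device active: keep the core out of 'wfi' */
	return 0;
}

static int emac_runtime_suspend(const struct emac_pm_hooks *hooks)
{
	if (hooks && hooks->allow_idle)
		hooks->allow_idle();	/* device idle: 'wfi' is allowed again */
	return 0;
}
```

Platforms whose EMAC can wake the core would simply pass no hooks, so
the driver stays free of platform-specific calls either way.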

Patch

diff --git a/drivers/net/ethernet/ti/davinci_emac.c b/drivers/net/ethernet/ti/davinci_emac.c
index 174a334..141a888 100644
--- a/drivers/net/ethernet/ti/davinci_emac.c
+++ b/drivers/net/ethernet/ti/davinci_emac.c
@@ -344,6 +344,8 @@  struct emac_priv {
 	/*platform specific members*/
 	void (*int_enable) (void);
 	void (*int_disable) (void);
+	void (*pre_open) (struct net_device *ndev);
+	void (*post_stop) (struct net_device *ndev);
 };
 
 /* clock frequency for EMAC */
@@ -1534,6 +1536,9 @@  static int emac_dev_open(struct net_device *ndev)
 	int k = 0;
 	struct emac_priv *priv = netdev_priv(ndev);
 
+	if (priv->pre_open)
+		priv->pre_open(ndev);
+
 	netif_carrier_off(ndev);
 	for (cnt = 0; cnt < ETH_ALEN; cnt++)
 		ndev->dev_addr[cnt] = priv->mac_addr[cnt];
@@ -1644,6 +1649,10 @@  rollback:
 		res = platform_get_resource(priv->pdev, IORESOURCE_IRQ, k-1);
 		m = res->end;
 	}
+
+	if (priv->post_stop)
+		priv->post_stop(ndev);
+
 	return -EBUSY;
 }
 
@@ -1686,6 +1695,9 @@  static int emac_dev_stop(struct net_device *ndev)
 	if (netif_msg_drv(priv))
 		dev_notice(emac_dev, "DaVinci EMAC: %s stopped\n", ndev->name);
 
+	if (priv->post_stop)
+		priv->post_stop(ndev);
+
 	return 0;
 }
 
@@ -1817,6 +1829,8 @@  static int __devinit davinci_emac_probe(struct platform_device *pdev)
 	priv->version = pdata->version;
 	priv->int_enable = pdata->interrupt_enable;
 	priv->int_disable = pdata->interrupt_disable;
+	priv->pre_open = pdata->pre_open;
+	priv->post_stop = pdata->post_stop;
 
 	priv->coal_intvl = 0;
 	priv->bus_freq_mhz = (u32)(emac_bus_frequency / 1000000);
diff --git a/include/linux/davinci_emac.h b/include/linux/davinci_emac.h
index 5428885..b61e6de 100644
--- a/include/linux/davinci_emac.h
+++ b/include/linux/davinci_emac.h
@@ -39,6 +39,8 @@  struct emac_platform_data {
 	bool no_bd_ram;
 	void (*interrupt_enable) (void);
 	void (*interrupt_disable) (void);
+	void (*pre_open) (struct net_device *ndev);
+	void (*post_stop) (struct net_device *ndev);
 };
 
 enum {