[v3,0/8] PM / ACPI / i2c: Deploy runtime PM centric path for system sleep

Message ID	1504018610-10822-1-git-send-email-ulf.hansson@linaro.org
From: Ulf Hansson <ulf.hansson@linaro.org>
To: Wolfram Sang <wsa@the-dreams.de>, "Rafael J . Wysocki" <rjw@rjwysocki.net>, Len Brown <lenb@kernel.org>, linux-acpi@vger.kernel.org, linux-pm@vger.kernel.org
Cc: Kevin Hilman <khilman@kernel.org>, Jarkko Nikula <jarkko.nikula@linux.intel.com>, Andy Shevchenko <andriy.shevchenko@linux.intel.com>, Mika Westerberg <mika.westerberg@linux.intel.com>, Jisheng Zhang <jszhang@marvell.com>, John Stultz <john.stultz@linaro.org>, Guodong Xu <guodong.xu@linaro.org>, Sumit Semwal <sumit.semwal@linaro.org>, Haojian Zhuang <haojian.zhuang@linaro.org>, Johannes Stezenbach <js@sig21.net>, linux-arm-kernel@lists.infradead.org, linux-i2c@vger.kernel.org, Ulf Hansson <ulf.hansson@linaro.org>
Subject: [PATCH v3 0/8] PM / ACPI / i2c: Deploy runtime PM centric path for system sleep
Date: Tue, 29 Aug 2017 16:56:42 +0200

Ulf Hansson Aug. 29, 2017, 2:56 p.m. UTC

The i2c designware platform driver, drivers/i2c/busses/i2c-designware-platdrv.c,
isn't well optimized for system sleep.

What makes this driver particularly interesting is that it's a cross-SoC
driver, which means that sometimes there is an ACPI PM domain attached to the
i2c device and sometimes not. The driver is used on both x86 and ARM.

In principle, to optimize the system sleep support in the i2c driver, this
series enables the proven runtime PM centric path for it. However, to do that,
the ACPI PM domain also has to collaborate and understand this behaviour.
For earlier versions, Rafael also pointed out that the PM core needs to be
involved.

Therefore, patch 1 to patch 6 make the needed changes to the PM core and the
ACPI PM domain. In patch 7 and patch 8, the i2c driver gets optimized and is
converted to the runtime PM centric path for system sleep.

It shall be noted that the behaviour of the ACPI PM domain should remain intact,
still benefiting from the direct_complete path during system sleep, except for
those drivers that use the runtime PM centric path.
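
For readers less familiar with the direct_complete path mentioned above, the
decision it makes can be modelled roughly as below. This is a simplified
userspace sketch and not the PM core's code: struct model_dev and the model_*
functions are made-up names, and only the single-device case is covered.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the kernel's struct device. */
struct model_dev {
	bool runtime_suspended;	/* runtime PM status when system suspend starts */
	int prepare_ret;	/* value returned by the ->prepare() callback */
	bool direct_complete;	/* set by the model PM core */
	int suspend_calls;	/* how many times ->suspend() was invoked */
};

/* Prepare phase: a positive return value requests direct_complete. */
static void model_prepare(struct model_dev *dev)
{
	dev->direct_complete = dev->prepare_ret > 0;
}

/* Suspend phase: leave a runtime suspended device alone if requested. */
static void model_suspend(struct model_dev *dev)
{
	if (dev->direct_complete && dev->runtime_suspended)
		return;	/* callbacks skipped, device stays runtime suspended */

	dev->direct_complete = false;
	dev->suspend_calls++;
}

/* Runs both phases and returns the number of ->suspend() invocations. */
int model_system_suspend(bool runtime_suspended, int prepare_ret)
{
	struct model_dev dev = {
		.runtime_suspended = runtime_suspended,
		.prepare_ret = prepare_ret,
	};

	model_prepare(&dev);
	model_suspend(&dev);
	return dev.suspend_calls;
}
```

In this model the callback is skipped only when the device is already runtime
suspended and ->prepare() asked for it; in every other case the regular
system sleep callbacks run.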

This series has been tested on an ARM64 Hikey board, which doesn't have the
i2c device attached to the ACPI PM domain. This means that the ACPI changes
need to be tested on some relevant Intel SoCs. It's greatly appreciated
if someone could help out with this, and so are review comments, of course.

Some news in v3:
	- The fix for the i2c driver [1] is now present in Linus' tree from tag
	v4.13-rc7, and in Rafael's tree as well.
	- To simplify for testers, I have published a branch [3] based upon
	Rafael's pm tree and linux-next branch.
	- Rephrased some parts of the cover letter to clarify the intent of this
	series.
	- Addressed review comments from v2.

Some news in v2:
	- The v1 contained a fix for the i2c driver. This has been sent
	separately [1] and picked up for fixes by Wolfram for v4.13-rcs. However,
	the fix has not yet reached Linus' tree. The changes to the i2c driver
	are based upon that change.
	- To simplify for testers, I have published a branch [2] based upon
	Rafael's pm tree and linux-next branch, which also includes the above
	patch.
	- Rephrased the coverletter to clarify the intent of this series.
	- Addressed review comments from v1.

[1]
http://patchwork.ozlabs.org/patch/799803/

[2]
git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc.git acpi_pm_i2c_rpm_path_v2

[3]
git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc.git acpi_pm_i2c_rpm_path_v3

Kind regards
Ulf Hansson


Ulf Hansson (8):
  PM / Sleep: Make the runtime PM centric path known to the PM core
  PM / ACPI: Restore acpi_subsys_complete()
  PM / Sleep: Remove pm_complete_with_resume_check()
  PM / ACPI: Split code validating need for runtime resume in
    ->prepare()
  PM / ACPI: Split acpi_lpss_suspend_late|resume_early()
  PM / ACPI: Enable the runtime PM centric approach for system sleep
  i2c: designware: Don't resume device in the ->complete() callback
  i2c: designware: Deploy the runtime PM centric path for system sleep

 drivers/acpi/acpi_lpss.c                    |  79 ++++++++++++++------
 drivers/acpi/device_pm.c                    | 111 ++++++++++++++++++++++------
 drivers/base/power/generic_ops.c            |  23 ------
 drivers/base/power/main.c                   |  49 ++++++++++--
 drivers/base/power/runtime.c                |   1 +
 drivers/i2c/busses/i2c-designware-platdrv.c |  34 ++-------
 include/linux/pm.h                          |   8 +-
 7 files changed, 204 insertions(+), 101 deletions(-)

Rafael J. Wysocki Aug. 29, 2017, 8:19 p.m. UTC | #1

On Tuesday, August 29, 2017 4:56:42 PM CEST Ulf Hansson wrote:
> The i2c designware platform driver, drivers/i2c/busses/i2c-designware-platdrv.c,
> isn't well optimized for system sleep.
> 
> What makes this driver particularly interesting is because it's a cross-SoC
> driver, which sometimes means there is an ACPI PM domain attached to the i2c
> device and sometimes not. The driver is being used on both x86 and ARM.
> 
> In principle, to optimize the system sleep support in i2c driver, this series
> enables the proven runtime PM centric path for the i2c driver. However, to do
> that the ACPI PM domain also have to collaborate and understand this behaviour.
> From earlier versions, Rafael has also pointed out that also the PM core needs
> to be involved.

Earlier today I realized that drivers pointing their ->suspend_late and
->resume_early callbacks, respectively, to pm_runtime_force_suspend() and
pm_runtime_force_resume(), are fundamentally incompatible with any bus type
doing nontrivial PM and with almost any nontrivial PM domains, for two reasons.

First, it basically requires the bus type or PM domain to expect that its
->runtime_suspend callback may or may not be indirectly invoked from its
own ->suspend_late callback, depending on the driver (and analogously
for ->runtime_resume and ->resume_early), which is insane.

Second, it is a layering violation, because it makes the driver effectively
override the upper layer's decisions about what code to run.

That's why I'm afraid that we've reached a dead end here. :-(

Thanks,
Rafael

Ulf Hansson Aug. 30, 2017, 9:57 a.m. UTC | #2

On 29 August 2017 at 22:19, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> On Tuesday, August 29, 2017 4:56:42 PM CEST Ulf Hansson wrote:
>> The i2c designware platform driver, drivers/i2c/busses/i2c-designware-platdrv.c,
>> isn't well optimized for system sleep.
>>
>> What makes this driver particularly interesting is because it's a cross-SoC
>> driver, which sometimes means there is an ACPI PM domain attached to the i2c
>> device and sometimes not. The driver is being used on both x86 and ARM.
>>
>> In principle, to optimize the system sleep support in i2c driver, this series
>> enables the proven runtime PM centric path for the i2c driver. However, to do
>> that the ACPI PM domain also have to collaborate and understand this behaviour.
>> From earlier versions, Rafael has also pointed out that also the PM core needs
>> to be involved.
>
> Earlier today I realized that drivers pointing their ->suspend_late and
> ->resume_early callbacks, respectively, to pm_runtime_force_suspend() and
> pm_runtime_force_resume(), are fundamentally incompatible with any bus type
> doing nontrivial PM and with almost any nontrivial PM domains, for two reasons.
>
> First, it basically requires the bus type or PM domain to expect that its
> ->runtime_suspend callback may or may not be indirectly invoked from its
> own ->suspend_late callback, depending on the driver (and analogously
> for ->runtime_resume and ->resume early), which is insane.
>
> Second, it is a layering violation, because it makes the driver effectively
> override the upper layer's decisions about what code to run.

You are right that more complex bus types and PM domains need to play
along. So that is what I am trying to implement for the ACPI PM domain
in this series.

The generic PM domain is simple in this regard. There is only a
minor adaptation for the ->runtime_suspend|resume() callbacks, which
avoids validating dev_pm_qos constraints during system sleep. Nothing
special is needed in the ->suspend_late|noirq callbacks, etc.

For most other simple bus types, like the platform bus, spi, i2c and
amba, no particular adaptations are needed at all. Instead those just
trust the drivers to do the right thing.

Before we had the direct_complete path, using the
pm_runtime_force_suspend|resume() helpers was the only good way for
these kinds of drivers to deal with system sleep in an optimized
manner when runtime PM was also enabled for their devices. Now this
method has become widely deployed, whether you like it or not.

Besides the slightly better optimizations you get when using
pm_runtime_force_suspend|resume(), compared to the direct_complete
path, I think it's also worth considering how easy it becomes for
drivers to deploy system sleep support. In many cases, only two lines
of code are needed to add system sleep support in a driver.
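
To illustrate the "two lines" point, here is a minimal userspace sketch of how
a driver's PM ops table ends up looking. All types and both helpers below are
mock stand-ins written for this example (they share names with the kernel's
struct dev_pm_ops and pm_runtime_force_suspend|resume(), but are heavily
simplified), and the foo_* driver is hypothetical.

```c
#include <stddef.h>

/* Mock stand-ins for kernel types; not the real definitions. */
struct device;

struct dev_pm_ops {
	int (*suspend_late)(struct device *dev);
	int (*resume_early)(struct device *dev);
	int (*runtime_suspend)(struct device *dev);
	int (*runtime_resume)(struct device *dev);
};

struct device {
	const struct dev_pm_ops *pm;
	int powered;	/* 1 while the model hardware is powered */
};

/* Mock helpers: forward the system sleep request to the runtime PM callbacks. */
int pm_runtime_force_suspend(struct device *dev)
{
	return dev->powered ? dev->pm->runtime_suspend(dev) : 0;
}

int pm_runtime_force_resume(struct device *dev)
{
	return dev->powered ? 0 : dev->pm->runtime_resume(dev);
}

/* The hypothetical driver only implements its runtime PM callbacks ... */
static int foo_runtime_suspend(struct device *dev)
{
	dev->powered = 0;
	return 0;
}

static int foo_runtime_resume(struct device *dev)
{
	dev->powered = 1;
	return 0;
}

/* ... and reuses them for system sleep through two helper assignments. */
const struct dev_pm_ops foo_pm_ops = {
	.suspend_late = pm_runtime_force_suspend,	/* line one */
	.resume_early = pm_runtime_force_resume,	/* line two */
	.runtime_suspend = foo_runtime_suspend,
	.runtime_resume = foo_runtime_resume,
};
```

In a real driver the two assignments are typically made through the
SET_LATE_SYSTEM_SLEEP_PM_OPS() or SET_SYSTEM_SLEEP_PM_OPS() macros, next to
SET_RUNTIME_PM_OPS().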

Now, some complex code always needs to be implemented somewhere. When
using the runtime PM centric path, that code consists of the
pm_runtime_force_suspend|resume() helpers themselves, plus some
adaptations in buses/PM domains in cases where those need special
care.

My point is, the runtime PM centric path allows us to keep the
complex part of the code in a few centralized places, instead of
having it spread/duplicated across drivers.

So yes, you could consider it insane, but to me and many others, it
seems to work out quite well.

Yeah, and the layering violation is undoubtedly the most controversial
part of the runtime PM centric path - I agree with that! The
direct_complete path doesn't have this, as you implemented it. :-)

On the other hand, one could consider that these upper layers, in many
cases, need to play along with the behavior of the driver anyway. So,
I guess it depends on how one sees it.

>
> That's why I'm afraid that we've reached a dead end here. :-(

That's sad news. I was really hoping I could find a way to move this
forward. Don't you have any other ideas on how I can adjust the series
to make you happy?

>
> Thanks,
> Rafael
>

Kind regards
Uffe

Rafael J. Wysocki Aug. 31, 2017, 12:17 a.m. UTC | #3

Disclaimer: I'm falling asleep, so I probably shouldn't reply to email right
now, but tomorrow I may not be able to get to email at all, so I'll try anyway.

On Wednesday, August 30, 2017 11:57:28 AM CEST Ulf Hansson wrote:
> On 29 August 2017 at 22:19, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> > On Tuesday, August 29, 2017 4:56:42 PM CEST Ulf Hansson wrote:
> >> The i2c designware platform driver, drivers/i2c/busses/i2c-designware-platdrv.c,
> >> isn't well optimized for system sleep.
> >>
> >> What makes this driver particularly interesting is because it's a cross-SoC
> >> driver, which sometimes means there is an ACPI PM domain attached to the i2c
> >> device and sometimes not. The driver is being used on both x86 and ARM.
> >>
> >> In principle, to optimize the system sleep support in i2c driver, this series
> >> enables the proven runtime PM centric path for the i2c driver. However, to do
> >> that the ACPI PM domain also have to collaborate and understand this behaviour.
> >> From earlier versions, Rafael has also pointed out that also the PM core needs
> >> to be involved.
> >
> > Earlier today I realized that drivers pointing their ->suspend_late and
> > ->resume_early callbacks, respectively, to pm_runtime_force_suspend() and
> > pm_runtime_force_resume(), are fundamentally incompatible with any bus type
> > doing nontrivial PM and with almost any nontrivial PM domains, for two reasons.
> >
> > First, it basically requires the bus type or PM domain to expect that its
> > ->runtime_suspend callback may or may not be indirectly invoked from its
> > own ->suspend_late callback, depending on the driver (and analogously
> > for ->runtime_resume and ->resume early), which is insane.
> >
> > Second, it is a layering violation, because it makes the driver effectively
> > override the upper layer's decisions about what code to run.
> 
> You are right that for more complex bus types and PM domains, those
> needs to play along. So that is what I am trying to implement for the
> ACPI PM domain in this series.

Well, "play along" is a bit of an understatement here.

They would need to turn into a horrible mess and that's not going to happen.

> The generic PM domain, is simple in this regards. There is only a
> minor adaptation for the ->runtime_suspend|resume() callbacks, which
> avoids validating dev_pm_qos constraints during system sleep. Nothing
> special is needed in ->suspend_late|noirq callbacks, etc.
> 
> For most other simple bus types, like the platform bus, spi, i2c,
> amba, no particular adoptions is needed at all. Instead those just
> trust the drivers to do the right thing.

They are the trivial ones.

> Before we had the direct_complete path, using the
> pm_runtime_force_suspend|resume() helpers, was the only good way for
> these kind of drivers, to in an optimized manner, deal with system
> sleep when runtime PM also was enabled for their devices. Now this
> method has become widely deployed, unfortunate whether you like it or
> not.

So can you please remind me why the _force_ wrappers are needed?

In particular, why can't drivers arrange their callbacks the way I did that
in https://patchwork.kernel.org/patch/9928583/ ?

> Besides the slightly better optimizations you get when using
> pm_runtime_force_suspend|resume(), comparing to the direct_complete
> path - I think it's also worth to consider, how easy it becomes for
> drivers to deploy system sleep support. In many cases, only two lines
> of code is needed to add system sleep support in a driver.

You are doing a wrong comparison here IMO.  You essentially are comparing two
bandaids with each other and arguing that one of them somehow is better.

What about doing something which is not a bandaid instead?

> Now, some complex code always needs to be implemented somewhere. When
> using the runtime PM centric path, that code consist of the
> pm_runtime_force_suspend|resume() helpers itself - and some
> adaptations in buses/PM domains in cases when those needs special
> care.
> 
> My point is, the runtime PM centric path, allows us to keep the
> complex part of the code at a few centralized places, instead of
> having it spread/duplicated into drivers.
> 
> So yes, you could consider it insane, but to me and many others, it
> seems to work out quite well.

Because it only has been used with trivial middle layer code so far
and I'm quite disappointed that you don't seem to see a problem here. :-/

I mean something like

PM core => bus type / PM domain ->suspend_late => driver ->suspend_late

is far more straightforward than

PM core => bus type / PM domain ->suspend_late => driver ->suspend_late =>
	bus type / PM domain ->runtime_suspend => driver ->runtime_suspend

with the bus type / PM domain having to figure out somehow at the
->suspend_late time whether or not its ->runtime_suspend is going to be invoked
in the middle of it.
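
The two chains above can be reproduced with a small userspace model that
records which callback runs when. This is only a sketch under assumptions: the
function names, the trace buffer and the use_force_helper switch are invented
for the example, and the "domain" stands for any bus type / PM domain.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical trace buffer; each callback appends its name. */
static char trace[256];

static void note(const char *step)
{
	if (trace[0])
		strncat(trace, " => ", sizeof(trace) - strlen(trace) - 1);
	strncat(trace, step, sizeof(trace) - strlen(trace) - 1);
}

static void driver_runtime_suspend(void)
{
	note("driver ->runtime_suspend");
}

/* Middle layer runtime PM callback: wraps the driver's one. */
static void domain_runtime_suspend(void)
{
	note("domain ->runtime_suspend");
	driver_runtime_suspend();
}

/* Driver ->suspend_late: either plain, or re-entering the middle layer. */
static void driver_suspend_late(bool use_force_helper)
{
	note("driver ->suspend_late");
	if (use_force_helper)
		domain_runtime_suspend();	/* models pm_runtime_force_suspend() */
}

/* Middle layer ->suspend_late: cannot tell what the driver will do next. */
static void domain_suspend_late(bool use_force_helper)
{
	note("domain ->suspend_late");
	driver_suspend_late(use_force_helper);
}

/* Entry point standing in for the PM core; returns the recorded chain. */
const char *pm_core_suspend_late(bool use_force_helper)
{
	trace[0] = '\0';
	note("PM core");
	domain_suspend_late(use_force_helper);
	return trace;
}
```

With use_force_helper set, the domain's ->runtime_suspend shows up nested
inside its own ->suspend_late, which is the re-entry described above.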

Apart from this just being aesthetically disgusting to me, which admittedly is
a matter of personal opinion, it makes debugging new driver code harder (if it
happens to not work) and reviewing it almost impossible, because now you need
to take all of the tangling between callbacks into account and sometimes not
just for one bus type / PM domain.

> Yeah, and the laying violation is undoubtedly the most controversial
> part of the runtime PM centric path - I agree to that! The
> direct_complete path don't have this, as you implemented it. :-)
> 
> On the other hand, one could consider that these upper layers, in many
> cases anyway needs to play along with the behavior of the driver. So,
> I guess it depends on how one see it.
> 
> >
> > That's why I'm afraid that we've reached a dead end here. :-(
> 
> That's said news. Is was really hoping I could find a way to move this
> forward. You don't have any other ideas on how I can adjust the series
> to make you happy?

So to be precise, patches [2-3/8] are basically fine by me.  Patch [4/8]
sort of works too, but I'd do the splitting slightly differently and I don't
see much value in it alone.

The rest of the ACPI changes is mostly not acceptable to me, mostly because
of what is done to the PM domain's ->runtime_suspend/resume and
->suspend_late/->resume_early callbacks.

I guess the only way that could be made to work for me would be by not using
_force_suspend/resume() at all, but that would defeat the point, right?

I don't like the flag either, but that might be worked out.

Also, when I looked at _force_suspend/resume() again, I got concerned.
There is stuff in there that shouldn't be necessary in a driver's
->suspend_late/->resume_early and some things in there just made me
scratch my head.

Thanks,
Rafael

Ulf Hansson Sept. 1, 2017, 10:42 a.m. UTC | #4

On 31 August 2017 at 02:17, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> Disclaimer: I'm falling asleep, so I probably shouldn't reply to email right
> now, but tomorrow I may not be able to get to email at all, so I'll try anyway.
>
> On Wednesday, August 30, 2017 11:57:28 AM CEST Ulf Hansson wrote:
>> On 29 August 2017 at 22:19, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>> > On Tuesday, August 29, 2017 4:56:42 PM CEST Ulf Hansson wrote:
>> >> The i2c designware platform driver, drivers/i2c/busses/i2c-designware-platdrv.c,
>> >> isn't well optimized for system sleep.
>> >>
>> >> What makes this driver particularly interesting is because it's a cross-SoC
>> >> driver, which sometimes means there is an ACPI PM domain attached to the i2c
>> >> device and sometimes not. The driver is being used on both x86 and ARM.
>> >>
>> >> In principle, to optimize the system sleep support in i2c driver, this series
>> >> enables the proven runtime PM centric path for the i2c driver. However, to do
>> >> that the ACPI PM domain also have to collaborate and understand this behaviour.
>> >> From earlier versions, Rafael has also pointed out that also the PM core needs
>> >> to be involved.
>> >
>> > Earlier today I realized that drivers pointing their ->suspend_late and
>> > ->resume_early callbacks, respectively, to pm_runtime_force_suspend() and
>> > pm_runtime_force_resume(), are fundamentally incompatible with any bus type
>> > doing nontrivial PM and with almost any nontrivial PM domains, for two reasons.
>> >
>> > First, it basically requires the bus type or PM domain to expect that its
>> > ->runtime_suspend callback may or may not be indirectly invoked from its
>> > own ->suspend_late callback, depending on the driver (and analogously
>> > for ->runtime_resume and ->resume early), which is insane.
>> >
>> > Second, it is a layering violation, because it makes the driver effectively
>> > override the upper layer's decisions about what code to run.
>>
>> You are right that for more complex bus types and PM domains, those
>> needs to play along. So that is what I am trying to implement for the
>> ACPI PM domain in this series.
>
> Well, "play along" is a bit of an understatement here.
>
> They would need to turn into horrible mess and that's not going to happen.

I absolutely agree, there must be no mess whatsoever!

But, I don't want to give up yet, I still believe I can make this
series into a nice couple of changes for the ACPI PM domain.
Especially if you continue giving me your guidance.

>
>> The generic PM domain, is simple in this regards. There is only a
>> minor adaptation for the ->runtime_suspend|resume() callbacks, which
>> avoids validating dev_pm_qos constraints during system sleep. Nothing
>> special is needed in ->suspend_late|noirq callbacks, etc.
>>
>> For most other simple bus types, like the platform bus, spi, i2c,
>> amba, no particular adoptions is needed at all. Instead those just
>> trust the drivers to do the right thing.
>
> They are the trivial ones.

Yes.

However, the platform bus is also very commonly used in the kernel. I
think that's an important thing to consider.

>
>> Before we had the direct_complete path, using the
>> pm_runtime_force_suspend|resume() helpers, was the only good way for
>> these kind of drivers, to in an optimized manner, deal with system
>> sleep when runtime PM also was enabled for their devices. Now this
>> method has become widely deployed, unfortunate whether you like it or
>> not.
>
> So can you please remind me why the _force_ wrappers are needed?

See below.

>
> In particular, why can't drivers arrange their callbacks the way I did that
> in https://patchwork.kernel.org/patch/9928583/ ?

I was preparing a reply to that patch, but let me summarize that here instead.

Let me be clear, the patch is an improvement of the behavior of the
driver and it addresses the issues you point out in the change log.
Re-using the runtime PM callbacks for system sleep is nice as it
avoids open coding, which is of course also one of the reasons for using
pm_runtime_force_suspend|resume().

Still there are a couple of things I am worried about in this patch.
*)
To be able to re-use the same callbacks for system sleep and runtime
PM, some boilerplate code is added to the driver, to cope with the
different conditions inside the callbacks. That pattern would be
repeated in many drivers dealing with similar issues.

**)
The ->resume_early() callback powers on the device, in case it was
runtime resumed when the ->suspend_late() callback was invoked. That
is in many cases completely unnecessary, causing us to waste power and
increase system resume time, for absolutely no reason. However, I
understand the patch didn't try to address this, but to really fix
this, there has to be an even closer collaboration between runtime PM
and the system sleep callbacks.

So, to remind you why the pm_runtime_force_suspend|resume() helpers are
preferred: it's because both of the above two things are taken
care of.

>
>> Besides the slightly better optimizations you get when using
>> pm_runtime_force_suspend|resume(), comparing to the direct_complete
>> path - I think it's also worth to consider, how easy it becomes for
>> drivers to deploy system sleep support. In many cases, only two lines
>> of code is needed to add system sleep support in a driver.
>
> You are doing a wrong comparison here IMO.  You essentially are comparing two
> bandaids with each other and arguing that one of them somehow is better.

I just wanted to compare against something...

>
> What about doing something which is not a bandaid instead?

I don't have a problem working on something new, but I am not sure
what that should be.

Unless you re-consider moving forward in some form, with the current
suggested approach for the ACPI PM domain, can you give me some
pointers on what you have in mind?

To remind you of my current view, the direct_complete path is useful
for PM domains like the ACPI PM domain, as it impacts all its devices.
Using pm_runtime_force_suspend|resume() offers the next step to
achieve a fully optimized behavior of a device during system sleep, as
has already been proven by now. It would be great to have both options
supported by the ACPI PM domain.

Another related thing that is causing lots of problems during system
sleep of devices, but not related to optimizations, is to have the
correct order of how to suspend/resume the devices. We have talked
about this, but it's a separate problem and it's rather a deployment
issue than having to implement something entirely new (we have the
supplier/consumer links you invented for this).

>
>> Now, some complex code always needs to be implemented somewhere. When
>> using the runtime PM centric path, that code consist of the
>> pm_runtime_force_suspend|resume() helpers itself - and some
>> adaptations in buses/PM domains in cases when those needs special
>> care.
>>
>> My point is, the runtime PM centric path, allows us to keep the
>> complex part of the code at a few centralized places, instead of
>> having it spread/duplicated into drivers.
>>
>> So yes, you could consider it insane, but to me and many others, it
>> seems to work out quite well.
>
> Because it only has been used with trivial middle layer code so far
> and I'm quite disappointed that you don't seem to see a problem here. :-/
>
> I mean something like
>
> PM core => bus type / PM domain ->suspend_late => driver ->suspend_late
>
> is far more straightforward than
>
> PM core => bus type / PM domain ->suspend_late => driver ->suspend_late =>
>         bus type / PM domain ->runtime_suspend => driver ->runtime_suspend
>
> with the bus type / PM domain having to figure out somehow at the
> ->suspend_late time whether or not its ->runtume_suspend is going to be invoked
> in the middle of it.
>
> Apart from this just being aesthetically disgusting to me, which admittedly is
> a matter of personal opinion, it makes debugging new driver code harder (if it
> happens to not work) and reviewing it almost impossible, because now you need
> to take all of the tangling between callbacks into accont and sometimes not
> just for one bus type / PM domain.

I am wondering whether you may be overlooking some of the
internals of runtime PM. Or maybe not? :-)

I mean, the whole thing is built upon the fact that anyone can call
runtime PM functions to runtime resume/suspend a device. Doing that
makes the hierarchy of the runtime PM callbacks get walked and invoked,
of course properly managed by the runtime PM core.

My point is that the runtime PM core still controls this behavior,
even when the pm_runtime_force_suspend|resume() helpers are being
invoked. The only difference is that it allows runtime PM for the
device to be disabled, and still correctly invoke the callbacks. That
is what it is all about.
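
As a rough illustration of that behavior, and of point **) earlier in this
mail, here is a simplified userspace model of the two helpers. It is a sketch
under assumptions, not the kernel implementation: struct model_dev and the
model_force_* names are invented, parent handling and error paths are left
out, and the usage-count threshold is simplified.

```c
#include <stdbool.h>

/* Hypothetical, simplified model of a device's runtime PM state. */
struct model_dev {
	bool rpm_enabled;	/* runtime PM core may act on the device */
	bool suspended;		/* runtime PM status */
	int usage_count;	/* users that need the device active */
	int suspend_cb_calls;	/* times ->runtime_suspend ran */
	int resume_cb_calls;	/* times ->runtime_resume ran */
};

/* Suspend helper model: disable runtime PM, then suspend only if needed. */
void model_force_suspend(struct model_dev *dev)
{
	dev->rpm_enabled = false;
	if (dev->suspended)
		return;		/* already runtime suspended: nothing to do */

	dev->suspend_cb_calls++;
	dev->suspended = true;
}

/* Resume helper model: power up only if someone needs the device. */
void model_force_resume(struct model_dev *dev)
{
	if (dev->suspended && dev->usage_count > 0) {
		dev->resume_cb_calls++;
		dev->suspended = false;
	}
	dev->rpm_enabled = true;	/* hand control back to runtime PM */
}
```

A device that was active but unused when system suspend started stays
suspended after system resume in this model, and is only powered up later by
a regular runtime resume request.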

>
>> Yeah, and the laying violation is undoubtedly the most controversial
>> part of the runtime PM centric path - I agree to that! The
>> direct_complete path don't have this, as you implemented it. :-)
>>
>> On the other hand, one could consider that these upper layers, in many
>> cases anyway needs to play along with the behavior of the driver. So,
>> I guess it depends on how one see it.
>>
>> >
>> > That's why I'm afraid that we've reached a dead end here. :-(
>>
>> That's said news. Is was really hoping I could find a way to move this
>> forward. You don't have any other ideas on how I can adjust the series
>> to make you happy?
>
> So to be precise, patches [2-3/8] are basically fine by me.  Patch [4/8]
> sort of works too, but I'd do the splitting slightly differently and I don't
> see much value in it alone.
>
> The rest of the ACPI changes is mostly not acceptable to me, mostly because
> of what is done to the PM domain's ->runtime_suspend/resume and
> ->suspend_late/->resume_early callbacks.
>
> I guess the only way that could be made work for me would be by not using
> _force_suspend/resume() at all, but that would defeat the point, right?

Yes, it would.

>
> I don't like the flag too, but that might be worked out.

Yeah, I am open to any suggestion.

>
> Also, when I looked at _force_suspend/resume() again, I got concerned.
> There is stuff in there that shouldn't be necessary in a driver's
> ->late_suspend/->early_resume and some things in there just made me
> scratch my head.

Yes, there is some complexity in there, I will be happy to answer any
specific question about it.

The main thing is, that it tries to conform to the regular rules set
by the runtime PM core when runtime PM is enabled for the device - and
then apply those to the device when runtime PM has been disabled for
it.

Again, thanks for being patient and reviewing!

Kind regards
Uffe

Rafael J. Wysocki Sept. 4, 2017, 12:17 a.m. UTC | #5

On Friday, September 1, 2017 12:42:35 PM CEST Ulf Hansson wrote:
> On 31 August 2017 at 02:17, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> > Disclaimer: I'm falling asleep, so I probably shouldn't reply to email right
> > now, but tomorrow I may not be able to get to email at all, so I'll try anyway.
> >
> > On Wednesday, August 30, 2017 11:57:28 AM CEST Ulf Hansson wrote:
> >> On 29 August 2017 at 22:19, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> >> > On Tuesday, August 29, 2017 4:56:42 PM CEST Ulf Hansson wrote:
> >> >> The i2c designware platform driver, drivers/i2c/busses/i2c-designware-platdrv.c,
> >> >> isn't well optimized for system sleep.
> >> >>
> >> >> What makes this driver particularly interesting is because it's a cross-SoC
> >> >> driver, which sometimes means there is an ACPI PM domain attached to the i2c
> >> >> device and sometimes not. The driver is being used on both x86 and ARM.
> >> >>
> >> >> In principle, to optimize the system sleep support in i2c driver, this series
> >> >> enables the proven runtime PM centric path for the i2c driver. However, to do
> >> >> that the ACPI PM domain also have to collaborate and understand this behaviour.
> >> >> From earlier versions, Rafael has also pointed out that also the PM core needs
> >> >> to be involved.
> >> >
> >> > Earlier today I realized that drivers pointing their ->suspend_late and
> >> > ->resume_early callbacks, respectively, to pm_runtime_force_suspend() and
> >> > pm_runtime_force_resume(), are fundamentally incompatible with any bus type
> >> > doing nontrivial PM and with almost any nontrivial PM domains, for two reasons.
> >> >
> >> > First, it basically requires the bus type or PM domain to expect that its
> >> > ->runtime_suspend callback may or may not be indirectly invoked from its
> >> > own ->suspend_late callback, depending on the driver (and analogously
> >> > for ->runtime_resume and ->resume early), which is insane.
> >> >
> >> > Second, it is a layering violation, because it makes the driver effectively
> >> > override the upper layer's decisions about what code to run.
> >>
> >> You are right that for more complex bus types and PM domains, those
> >> needs to play along. So that is what I am trying to implement for the
> >> ACPI PM domain in this series.
> >
> > Well, "play along" is a bit of an understatement here.
> >
> > They would need to turn into horrible mess and that's not going to happen.
> 
> I absolutely agree, there must be no mess what so ever!
> 
> But, I don't want to give up yet, I still believe I can make this
> series into a nice couple of changes for the ACPI PM domain.

Well, as far as I'm concerned, this is not going to get any further.

> Especially if you continue giving me your guidance.
> 
> >
> >> The generic PM domain, is simple in this regards. There is only a
> >> minor adaptation for the ->runtime_suspend|resume() callbacks, which
> >> avoids validating dev_pm_qos constraints during system sleep. Nothing
> >> special is needed in ->suspend_late|noirq callbacks, etc.
> >>
> >> For most other simple bus types, like the platform bus, spi, i2c,
> >> amba, no particular adoptions is needed at all. Instead those just
> >> trust the drivers to do the right thing.
> >
> > They are the trivial ones.
> 
> Yes.
> 
> However, the platform bus is also very commonly used in kernel. I
> think that's an important thing to consider.

Well, so is ACPI, and PCI.

And the platform bus doesn't do any kind of PM handling by itself,
whereas the above do.  Therefore trying to look at the platform bus as
an example to follow by them is rather not useful IMO.

> >
> >> Before we had the direct_complete path, using the
> >> pm_runtime_force_suspend|resume() helpers was the only good way for
> >> these kinds of drivers to deal, in an optimized manner, with system
> >> sleep when runtime PM was also enabled for their devices. Now this
> >> method has become widely deployed, whether you like it or not.
> >
> > So can you please remind me why the _force_ wrappers are needed?
> 
> See below.
> 
> >
> > In particular, why can't drivers arrange their callbacks the way I did that
> > in https://patchwork.kernel.org/patch/9928583/ ?
> 
> I was preparing a reply to that patch, but let me summarize that here instead.
> 
> Let me be clear, the patch is an improvement of the behavior of the
> driver and it addresses the issues you point out in the change log.
> Re-using the runtime PM callbacks for system sleep is nice, as it
> avoids open coding, which is of course also one of the reasons for using
> pm_runtime_force_suspend|resume().
> 
> Still there are a couple of things I am worried about in this patch.
> *)
> To be able to re-use the same callbacks for system sleep and runtime
> PM, some boilerplate code is added to the driver to cope with the
> different conditions inside the callbacks. That pattern would be
> repeated in many drivers dealing with similar issues.

I'm not worried about that as long as there are good examples and
documented best practices.

There aren't any right now, which is a problem, but that certainly is
fixable.

> **)
> The ->resume_early() callback powers on the device, in case it was
> runtime resumed when the ->suspend_late() callback was invoked. That
> is in many cases completely unnecessary, causing us to waste power and
> increase system resume time, for absolutely no reason. However, I
> understand the patch didn't try to address this, but to really fix
> this, there has to be an even closer collaboration between runtime PM
> and the system sleep callbacks.

I don't quite agree and here's why.

If a device was not runtime-suspended right before system suspend, then quite
likely it was in use then.  Therefore it is quite likely to be resumed
immediately after system resume anyway.

Now, if that's just one device, it probably doesn't matter, but if there are
more devices like that, they will be resumed after system resume when they
are accessed and quite likely they will be accessed one-by-one rather than in
parallel with each other, so the latencies related to that will add up.  In
that case it is better to resume them upfront during system resume as they will
be resumed in parallel with each other then.  And that also is *way* simpler.

This means that the benefit of not resuming devices during system
resume is not quite so obvious and the whole point above is highly
questionable.
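
As a rough illustration of the latency argument above, here is a minimal user-space sketch in C (not kernel code). The function names and the latencies in the note below are hypothetical, and it assumes that devices resumed during system resume overlap completely, which is the best case.

```c
#include <stddef.h>

/* Total latency when devices are resumed one-by-one, each on first access. */
unsigned int serial_resume_ms(const unsigned int *lat_ms, size_t n)
{
	unsigned int total = 0;

	for (size_t i = 0; i < n; i++)
		total += lat_ms[i];
	return total;
}

/*
 * Latency when all devices are resumed upfront in parallel, assuming the
 * resumes overlap completely: the slowest device dominates.
 */
unsigned int parallel_resume_ms(const unsigned int *lat_ms, size_t n)
{
	unsigned int worst = 0;

	for (size_t i = 0; i < n; i++)
		if (lat_ms[i] > worst)
			worst = lat_ms[i];
	return worst;
}
```

For three devices with hypothetical resume latencies of 30, 50 and 20 ms, on-demand resume costs 100 ms in total if they are accessed one after another, while resuming them upfront costs 50 ms under the full-overlap assumption.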

> 
> So, to remind you why the pm_runtime_force_suspend|resume() helpers are
> preferred: both of the above two things are taken care of.

And that is why there is this stuff about parents and usage counters, right?

I'm not liking it at all.

> 
> >
> >> Besides the slightly better optimizations you get when using
> >> pm_runtime_force_suspend|resume(), compared to the direct_complete
> >> path - I think it's also worth considering how easy it becomes for
> >> drivers to deploy system sleep support. In many cases, only two lines
> >> of code are needed to add system sleep support in a driver.
> >
> > You are doing a wrong comparison here IMO.  You essentially are comparing two
> > bandaids with each other and arguing that one of them somehow is better.
> 
> I just wanted to compare against something...
> 
> >
> > What about doing something which is not a bandaid instead?
> 
> I don't have a problem working on something new, but I am not sure
> what that should be.
> 
> Unless you re-consider moving forward in some form, with the current
> suggested approach for the ACPI PM domain, can you give me some
> pointers on what you have in mind?

Yes.

Do what was intended from the start and make drivers re-use runtime
PM callbacks for ->suspend_late and ->resume_early.

First, that can be done.

Second, it is *conceptually* much more straightforward than things like
_force_suspend/resume().

Next, the drivers have full control on what they do in that case and
can be made to work with any middle-layer core easily enough.

No layering violations, no insane callback chains.
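
To make that concrete, here is a minimal user-space sketch in C of a driver whose ->suspend_late and ->resume_early re-use its own runtime PM callbacks directly. It is not taken from the patch referenced above: struct foo_dev, its fields and all foo_* names are hypothetical, and the rpm_suspended flag merely stands in for what a real driver could read with pm_runtime_status_suspended().

```c
#include <stdbool.h>

/* Hypothetical driver state; not taken from any real driver. */
struct foo_dev {
	bool powered;		/* hardware currently powered on */
	bool rpm_suspended;	/* stand-in for the runtime PM status */
	bool skip_resume;	/* device was already suspended at suspend_late */
};

/* Runtime PM callbacks: the only code that touches the hardware. */
int foo_runtime_suspend(struct foo_dev *fd)
{
	fd->powered = false;
	return 0;
}

int foo_runtime_resume(struct foo_dev *fd)
{
	fd->powered = true;
	return 0;
}

/* System sleep callbacks call the driver's own runtime PM callbacks. */
int foo_suspend_late(struct foo_dev *fd)
{
	/* Remember whether there is anything to undo at resume time. */
	fd->skip_resume = fd->rpm_suspended;
	if (fd->skip_resume)
		return 0;
	return foo_runtime_suspend(fd);
}

int foo_resume_early(struct foo_dev *fd)
{
	if (fd->skip_resume)
		return 0;
	return foo_runtime_resume(fd);
}
```

In this sketch the middle layer only ever sees its own ->suspend_late and ->resume_early being called, and the driver decides by itself whether its hardware needs to be touched.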

> To remind you of my current view, the direct_complete path is useful
> for PM domains, like the ACPI PM domain as it impacts all its devices.
> Using pm_runtime_force_suspend|resume() offers the next steps to

I completely disagree at this point.

So to be clear, the invocation of middle-layer callbacks instead of
*driver* callbacks in pm_runtime_force_suspend|resume() is a grave mistake.
They would have been almost possible to work with had they just invoked
driver callbacks.

OTOH I'm starting to think that direct_complete is only theoretically useful
and may not be actually set very often in practice, whereas it adds significant
complexity to the code, so I'm not sure about it any more.

> achieve a fully optimized behavior of a device during system sleep, as
> has already been proven by now. It would be great to have both options
> supported by the ACPI PM domain.

No.

> Another related thing that is causing lots of problems during system
> sleep of devices, but not related to optimizations, is to have the
> correct order of how to suspend/resume the devices. We have talked
> about this, but it's a separate problem and it's rather a deployment
> issue than having to implement something entirely new (we have the
> supplier/consumer links you invented for this).

Yes, that is a separate issue.

> >
> >> Now, some complex code always needs to be implemented somewhere. When
> >> using the runtime PM centric path, that code consists of the
> >> pm_runtime_force_suspend|resume() helpers themselves - and some
> >> adaptations in buses/PM domains in cases when those need special
> >> care.
> >>
> >> My point is, the runtime PM centric path, allows us to keep the
> >> complex part of the code at a few centralized places, instead of
> >> having it spread/duplicated into drivers.
> >>
> >> So yes, you could consider it insane, but to me and many others, it
> >> seems to work out quite well.
> >
> > Because it only has been used with trivial middle layer code so far
> > and I'm quite disappointed that you don't seem to see a problem here. :-/
> >
> > I mean something like
> >
> > PM core => bus type / PM domain ->suspend_late => driver ->suspend_late
> >
> > is far more straightforward than
> >
> > PM core => bus type / PM domain ->suspend_late => driver ->suspend_late =>
> >         bus type / PM domain ->runtime_suspend => driver ->runtime_suspend
> >
> > with the bus type / PM domain having to figure out somehow at the
> > ->suspend_late time whether or not its ->runtime_suspend is going to be invoked
> > in the middle of it.
> >
> > Apart from this just being aesthetically disgusting to me, which admittedly is
> > a matter of personal opinion, it makes debugging new driver code harder (if it
> > happens to not work) and reviewing it almost impossible, because now you need
> > to take all of the tangling between callbacks into account and sometimes not
> > just for one bus type / PM domain.
> 
> I am wondering that perhaps you may be overlooking some of the
> internals of runtime PM. Or maybe not? :-)
> 
> I mean, the whole thing is built upon the fact that anyone can call runtime PM
> functions to runtime resume/suspend a device.

Well, right in general, except that _force_suspend/resume() invoke
*callbacks* and *not* runtime PM functions.

> Doing that, makes the
> hierarchy of the runtime PM callbacks being walked and invoked, of
> course properly managed by the runtime PM core.
> 
> My point is that, the runtime PM core still controls this behavior,
> even when the pm_runtime_force_suspend|resume() helpers are being
> invoked. The only difference is that it allows runtime PM for the
> device to be disabled, and still correctly invoke the callbacks. That
> is what it is all about.

So why is it even useful to call ->runtime_suspend from a middle layer
in pm_runtime_force_suspend(), for example?

> >
> >> Yeah, and the layering violation is undoubtedly the most controversial
> >> part of the runtime PM centric path - I agree to that! The
> >> direct_complete path doesn't have this, as you implemented it. :-)
> >>
> >> On the other hand, one could consider that these upper layers, in many
> >> cases, need to play along with the behavior of the driver anyway. So,
> >> I guess it depends on how one sees it.
> >>
> >> >
> >> > That's why I'm afraid that we've reached a dead end here. :-(
> >>
> >> That's sad news. I was really hoping I could find a way to move this
> >> forward. You don't have any other ideas on how I can adjust the series
> >> to make you happy?
> >
> > So to be precise, patches [2-3/8] are basically fine by me.  Patch [4/8]
> > sort of works too, but I'd do the splitting slightly differently and I don't
> > see much value in it alone.
> >
> > The rest of the ACPI changes is mostly not acceptable to me, mostly because
> > of what is done to the PM domain's ->runtime_suspend/resume and
> > ->suspend_late/->resume_early callbacks.
> >
> > I guess the only way that could be made work for me would be by not using
> > _force_suspend/resume() at all, but that would defeat the point, right?
> 
> Yes, it would.
> 
> >
> > I don't like the flag too, but that might be worked out.
> 
> Yeah, I am open to any suggestion.
> 
> >
> > Also, when I looked at _force_suspend/resume() again, I got concerned.
> > There is stuff in there that shouldn't be necessary in a driver's
> > ->late_suspend/->early_resume and some things in there just made me
> > scratch my head.
> 
> Yes, there is some complexity in there, I will be happy to answer any
> specific question about it.

OK

Of course they require runtime PM to be enabled by drivers using them as
their callbacks, but I suppose that you realize that.

Why disable/re-enable runtime PM in there in the first place?  That should
have been done by the core when these functions are intended to be called.

Second, why use RPM_GET_CALLBACK in there?

Next, how is the parent actually runtime-resumed by pm_runtime_force_resume()
which the comment in pm_runtime_force_suspend() talks about?

> 
> The main thing is, that it tries to conform to the regular rules set
> by the runtime PM core when runtime PM is enabled for the device - and
> then apply those to the device when runtime PM has been disabled for
> it.

Sorry, I'm not sure what this means really ...

> Again, thanks for being patient and reviewing!

Well, no problem.

Thanks,
Rafael

Lukas Wunner Sept. 4, 2017, 5:46 a.m. UTC | #6

On Mon, Sep 04, 2017 at 02:17:21AM +0200, Rafael J. Wysocki wrote:
> OTOH I'm starting to think that direct_complete is only theoretically
> useful and may not be actually set very often in practice, whereas it
> adds significant complexity to the code, so I'm not sure about it any
> more.

That makes me come out of the woodwork as a direct_complete fan:

Runtime resuming a discrete GPU on a modern dual GPU laptop takes about
1.5 sec, runtime resuming Thunderbolt controllers more than 2 sec.
A discrete GPU easily consumes 10W, a Thunderbolt controller 2W.

So not having direct_complete would noticeably delay system suspend and
resume as well as reduce battery life.

Lukas

Rafael J. Wysocki Sept. 4, 2017, 10:04 a.m. UTC | #7

On Monday, September 4, 2017 7:46:37 AM CEST Lukas Wunner wrote:
> On Mon, Sep 04, 2017 at 02:17:21AM +0200, Rafael J. Wysocki wrote:
> > OTOH I'm starting to think that direct_complete is only theoretically
> > useful and may not be actually set very often in practice, whereas it
> > adds significant complexity to the code, so I'm not sure about it any
> > more.
> 
> That makes me come out of the woodwork as a direct_complete fan:
> 
> Runtime resuming a discrete GPU on a modern dual GPU laptop takes about
> 1.5 sec, runtime resuming Thunderbolt controllers more than 2 sec.
> A discrete GPU easily consumes 10W, a Thunderbolt controller 2W.
> 
> So not having direct_complete would noticeably delay system suspend and
> resume as well as reduce battery life.

Well, that's a good reason for having it. :-)

Thanks,
Rafael

Ulf Hansson Sept. 4, 2017, 12:55 p.m. UTC | #8

[...]

>> > So can you please remind me why the _force_ wrappers are needed?
>>
>> See below.
>>
>> >
>> > In particular, why can't drivers arrange their callbacks the way I did that
>> > in https://patchwork.kernel.org/patch/9928583/ ?
>>
>> I was preparing a reply to that patch, but let me summarize that here instead.
>>
>> Let me be clear, the patch is an improvement of the behavior of the
>> driver and it addresses the issues you point out in the change log.
>> Re-using the runtime PM callbacks for system sleep is nice, as it
>> avoids open coding, which is of course also one of the reasons for using
>> pm_runtime_force_suspend|resume().
>>
>> Still there are a couple of things I am worried about in this patch.
>> *)
>> To be able to re-use the same callbacks for system sleep and runtime
>> PM, some boilerplate code is added to the driver, as to cope with the
>> different conditions inside the callbacks. That pattern would become
>> repeated to many drivers dealing with similar issues.
>
> I'm not worried about that as long as there are good examples and
> documented best practices.
>
> There aren't any right now, which is a problem, but that certainly is
> fixable.
>
>> **)
>> The ->resume_early() callback powers on the device, in case it was
>> runtime resumed when the ->suspend_late() callback was invoked. That
>> is in many cases completely unnecessary, causing us to waste power and
>> increase system resume time, for absolutely no reason. However, I
>> understand the patch didn't try to address this, but to really fix
>> this, there has to be an even closer collaboration between runtime PM
>> and the system sleep callbacks.
>
> I don't quite agree and here's why.
>
> If a device was not runtime-suspended right before system suspend, then quite
> likely it was in use then.  Therefore it is quite likely to be resumed
> immediately after system resume anyway.

Unfortunately, always making that assumption leads to non-optimized
behavior of system sleep. I think we can do better than that!

Let me give you a concrete example, where the above assumption would
lead to a non-optimized behavior.

To put an MMC card into a low power state during system suspend
(covering eMMC, SD, SDIO), the mmc core needs to send a couple of
commands over the MMC interface to the card, to conform with the
(e)MMC/SD/SDIO spec. To do this, the mmc driver for the mmc controller
must runtime resume its device, to be able to send the commands
over the interface.

Now, when the system resumes, there is absolutely no reason to runtime
resume the device for the MMC controller, just because it was runtime
resumed during system suspend. Instead that is better to be postponed
to when the MMC card is really needed and thus via runtime PM instead.

This scenario shouldn't be specific to only MMC controllers/cards, but
should apply to any external devices/controllers that need some
special treatment to be put into a low power state during system
suspend, particularly when those external devices may be left in
that low power state until they are really needed. A couple of cases
I know of come to mind: WiFi chips, persistent storage devices,
etc. There should be plenty.

Another common case, is when a subsystem core layer flushes a request
queue during system suspend, which may cause a controller device to be
runtime resumed. Making the assumption that, because flushing the
queue was done during system suspend, we must also power up the
controller during system resume, again would lead to a non-optimized
behavior.

>
> Now, if that's just one device, it probably doesn't matter, but if there are
> more devices like that, they will be resumed after system resume when they
> are accessed and quite likely they will be accessed one-by-one rather than in
> parallel with each other, so the latencies related to that will add up.  In
> that case it is better to resume them upfront during system resume as they will
> be resumed in parallel with each other then.  And that also is *way* simpler.
>
> This means that the benefit from avoiding to resume devices during system
> resume is not quite so obvious and the whole point above is highly
> questionable.

I hope my reasoning above explains why I think it shouldn't be
considered as questionable.

If you like, I can also provide some real data/logs - showing you
what's happening.

>
>>
>> So, to remind you why the pm_runtime_force_suspend|resume() helpers are
>> preferred: both of the above two things are taken care of.
>
> And that is why there is this stuff about parents and usage counters, right?

Correct. Perhaps this commit tells you a little more.

commit 1d9174fbc55ec99ccbfcafa3de2528ef78a849aa
Author: Ulf Hansson <ulf.hansson@linaro.org>
Date:   Thu Oct 13 16:58:54 2016 +0200

    PM / Runtime: Defer resuming of the device in pm_runtime_force_resume()

[...]

>> >
>> > PM core => bus type / PM domain ->suspend_late => driver ->suspend_late
>> >
>> > is far more straightforward than
>> >
>> > PM core => bus type / PM domain ->suspend_late => driver ->suspend_late =>
>> >         bus type / PM domain ->runtime_suspend => driver ->runtime_suspend
>> >
>> > with the bus type / PM domain having to figure out somehow at the
>> > ->suspend_late time whether or not its ->runtime_suspend is going to be invoked
>> > in the middle of it.
>> >
>> > Apart from this just being aesthetically disgusting to me, which admittedly is
>> > a matter of personal opinion, it makes debugging new driver code harder (if it
>> > happens to not work) and reviewing it almost impossible, because now you need
>> > to take all of the tangling between callbacks into account and sometimes not
>> > just for one bus type / PM domain.
>>
>> I am wondering that perhaps you may be overlooking some of the
>> internals of runtime PM. Or maybe not? :-)
>>
>> I mean, the whole thing is built upon the fact that anyone can call runtime PM
>> functions to runtime resume/suspend a device.
>
> Well, right in general, except that _force_suspend/resume() invoke
> *callbacks* and *not* runtime PM functions.

I am considering pm_runtime_force_suspend|resume() being a part of the
runtime PM API, except that those may be called only during system
sleep.

Compare a call to pm_runtime_resume(): this may trigger rpm_resume()
to invoke the callbacks. To me, the difference is that the conditions
looked at in rpm_resume(), when runtime PM is enabled, become
different for system sleep when runtime PM is disabled - and that is
taken care of in pm_runtime_force_suspend|resume().

>
>> Doing that, makes the
>> hierarchy of the runtime PM callbacks being walked and invoked, of
>> course properly managed by the runtime PM core.
>>
>> My point is that, the runtime PM core still controls this behavior,
>> even when the pm_runtime_force_suspend|resume() helpers are being
>> invoked. The only difference is that it allows runtime PM for the
>> device to be disabled, and still correctly invoked the callbacks. That
>> is what it is all about.
>
> So why is it even useful to call ->runtime_suspend from a middle layer
> in pm_runtime_force_suspend(), for example?

Perhaps I don't understand the question correctly.

Anyway, the answer I can think of is probably the same reason
why the runtime PM core invokes it when it runs rpm_suspend() for
a device. My point is, we want similar behavior.

[...]

>> >
>> > Also, when I looked at _force_suspend/resume() again, I got concerned.
>> > There is stuff in there that shouldn't be necessary in a driver's
>> > ->late_suspend/->early_resume and some things in there just made me
>> > scratch my head.
>>
>> Yes, there is some complexity in there, I will be happy to answer any
>> specific question about it.
>
> OK
>
> Of course they require runtime PM to be enabled by drivers using them as
> their callbacks, but I suppose that you realize that.
>
> Why disable/re-enable runtime PM in there in the first place?  That should
> have been done by the core when these functions are intended to be called.

The reason is that we didn't want to restrict them to be used only
in ->suspend_late() and ->resume_early(), but also allow them for ->suspend()
and ->resume(), which is when runtime PM still is enabled.

>
> Second, why to use RPM_GET_CALLBACK in there?

To follow the same rules/hierarchy, as being done in rpm_suspend|resume().

>
> Next, how is the parent actually runtime-resumed by pm_runtime_force_resume()
> which the comment in pm_runtime_force_suspend() talks about?

I think the relevant use case here is when a parent and a child both
have subsystems/drivers using pm_runtime_force_suspend|resume(). If
that isn't the case, we expect that the parent is always resumed
during system resume. It's a somewhat fragile approach, so perhaps we
should deal with it, even if the whole thing is used as opt-in.

Anyway, let's focus on the case which I think is most relevant to your question:

A couple of conditions to start with.
*) The PM core system suspends a child prior to its parent, which leads to
pm_runtime_force_suspend() being called for the child first.
**) The PM core system resumes a parent before its child, thus
pm_runtime_force_resume() is called for the parent first.

In case a child doesn't need to be resumed when
pm_runtime_force_resume() is called for it, its parent likely doesn't either.
However, to control that, during system suspend
pm_runtime_force_suspend() increases the usage counter for the parent,
to indicate whether it needs to be resumed when
pm_runtime_force_resume() is called for it.

Finally, when the child becomes resumed in pm_runtime_force_resume(),
pm_runtime_set_active() is called for it. This verifies that the
parent also has been resumed properly.
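
The counter handling described above can be sketched as a small user-space model in C. This is not the kernel implementation: struct model_dev and the model_* functions are hypothetical, the callbacks are reduced to comments, the pm_runtime_set_active() check on the parent is not modeled, and the "in use" threshold of 1 is an assumption that the PM core itself holds one usage count reference for the duration of system sleep.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for struct device; all names are hypothetical. */
struct model_dev {
	struct model_dev *parent;
	int usage_count;	/* runtime PM usage counter */
	bool suspended;		/* runtime PM status */
};

/*
 * Model assumption: one reference is held by the PM core during the whole
 * system sleep transition, so "in use" means usage_count > 1.
 */
static bool model_in_use(const struct model_dev *dev)
{
	return dev->usage_count > 1;
}

void model_force_suspend(struct model_dev *dev)
{
	if (dev->suspended)
		return;		/* already runtime suspended, nothing to do */
	/* ... the runtime suspend callback would run here ... */
	if (dev->parent && model_in_use(dev))
		dev->parent->usage_count++;	/* parent must be resumed too */
	dev->suspended = true;
}

/* Returns true if the device was resumed, false if that was deferred. */
bool model_force_resume(struct model_dev *dev)
{
	if (!dev->suspended || !model_in_use(dev))
		return false;	/* leave the resume to runtime PM */
	if (dev->parent)
		dev->parent->usage_count--;	/* drop the suspend-time reference */
	/* ... the runtime resume callback would run here ... */
	dev->suspended = false;
	return true;
}
```

With a child that was in use at suspend time, the parent's counter is raised, so the parent is resumed first and the child after it. With an idle child, both resumes are deferred.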

[...]

Kind regards
Uffe

Rafael J. Wysocki Sept. 6, 2017, 12:52 a.m. UTC | #9

On Monday, September 4, 2017 2:55:37 PM CEST Ulf Hansson wrote:
> [...]
> 
> >> > So can you please remind me why the _force_ wrappers are needed?
> >>
> >> See below.
> >>
> >> >
> >> > In particular, why can't drivers arrange their callbacks the way I did that
> >> > in https://patchwork.kernel.org/patch/9928583/ ?
> >>
> >> I was preparing a reply to that patch, but let me summarize that here instead.
> >>
> >> Let me be clear, the patch is an improvement of the behavior of the
> >> driver and it addresses the issues you point out in the change log.
> >> Re-using the runtime PM callbacks for system sleep is nice, as it
> >> avoids open coding, which is of course also one of the reasons for using
> >> pm_runtime_force_suspend|resume().
> >>
> >> Still there are a couple of things I am worried about in this patch.
> >> *)
> >> To be able to re-use the same callbacks for system sleep and runtime
> >> PM, some boilerplate code is added to the driver, as to cope with the
> >> different conditions inside the callbacks. That pattern would become
> >> repeated to many drivers dealing with similar issues.
> >
> > I'm not worried about that as long as there are good examples and
> > documented best practices.
> >
> > There aren't any right now, which is a problem, but that certainly is
> > fixable.
> >
> >> **)
> >> The ->resume_early() callback powers on the device, in case it was
> >> runtime resumed when the ->suspend_late() callback was invoked. That
> >> is in many cases completely unnecessary, causing us to waste power and
> >> increase system resume time, for absolutely no reason. However, I
> >> understand the patch didn't try to address this, but to really fix
> >> this, there has to be an even closer collaboration between runtime PM
> >> and the system sleep callbacks.
> >
> > I don't quite agree and here's why.
> >
> > If a device was not runtime-suspended right before system suspend, then quite
> > likely it was in use then.  Therefore it is quite likely to be resumed
> > immediately after system resume anyway.
> 
> Unfortunately, always making that assumption leads to non-optimized
> behavior of system sleep. I think we can do better than that!
> 
> Let me give you a concrete example, where the above assumption would
> lead to a non-optimized behavior.
> 
> To put an MMC card into low power state during system suspend
> (covering eMMC, SD, SDIO) the mmc core needs to send a couple of
> commands over the MMC interface to the card, as to conform with the
> (e)MMC/SD/SDIO spec. To do this, the mmc driver for the mmc controller
> must runtime resume its device, as to be able to send the commands
> over the interface.
> 
> Now, when the system resumes, there is absolutely no reason to runtime
> resume the device for the MMC controller, just because it was runtime
> resumed during system suspend. Instead that is better to be postponed
> to when the MMC card is really needed and thus via runtime PM instead.

Yes, in this particular case it makes more sense to defer the resume of
the device, but there also are cases in which doing that leads to
suboptimal behavior.

> This scenario shouldn't be specific to only MMC controllers/cards, but
> should apply to any external devices/controllers that needs some
> special treatment to be put into low power state during system
> suspend. Particularly also when those external devices may be left in
> that low power state until those are really needed. A couple of cases
> I know of pops up in my head, WiFi chips, persistent storage devices,
> etc. There should be plenty.
> 
> Another common case, is when a subsystem core layer flushes a request
> queue during system suspend, which may cause a controller device to be
> runtime resumed. Making the assumption that, because flushing the
> queue was done during system suspend, we must also power up the
> controller during system resume, again would lead to a non-optimized
> behavior.

I understand that.

However, from a driver perspective, the most straightforward thing to do
is to restore the previous state of the device during system resume,
because that guarantees correctness.  Anything else is tricky and needs to
be done with extra care.  Drivers *must* know what they are doing when
they are doing such things.

> >
> > Now, if that's just one device, it probably doesn't matter, but if there are
> > more devices like that, they will be resumed after system resume when they
> > are accessed and quite likely they will be accessed one-by-one rather than in
> > parallel with each other, so the latencies related to that will add up.  In
> > that case it is better to resume them upfront during system resume as they will
> > be resumed in parallel with each other then.  And that also is *way* simpler.
> >
> > This means that the benefit from avoiding to resume devices during system
> > resume is not quite so obvious and the whole point above is highly
> > questionable.
> 
> I hope my reasoning above explains why I think it shouldn't be
> considered as questionable.
> 
> If you like, I can also provide some real data/logs - showing you
> what's happening.
> 

That's not necessary, this behavior can be useful and there are arguments for
doing it in *some* cases, but all of this argumentation applies to devices
that aren't going to be used right after system resume.  If they *are* going
to be used then, it very well may be better to resume them as part of
system resume instead of deferring that.

The tricky part is that at the point the resume callbacks run it is not known
whether or not the device is going to be accessed shortly and the decision made
either way may be suboptimal.

[Note: I know that people mostly care about seeing the screen on, but in fact
they should *also* care about the touch panel being ready to respond to
touches, for example.  If it isn't ready and the system suspends again
after a while because of that, the experience is somewhat less than fantastic.]

> >>
> >> So, to remind you why the pm_runtime_force_suspend|resume() helpers are
> >> preferred: both of the above two things are taken care of.
> >
> > And that is why there is this stuff about parents and usage counters, right?
> 
> Correct. Perhaps this commit tells you a little more.
> 
> commit 1d9174fbc55ec99ccbfcafa3de2528ef78a849aa
> Author: Ulf Hansson <ulf.hansson@linaro.org>
> Date:   Thu Oct 13 16:58:54 2016 +0200
> 
>     PM / Runtime: Defer resuming of the device in pm_runtime_force_resume()
> 
> [...]
> 
> >> >
> >> > PM core => bus type / PM domain ->suspend_late => driver ->suspend_late
> >> >
> >> > is far more straightforward than
> >> >
> >> > PM core => bus type / PM domain ->suspend_late => driver ->suspend_late =>
> >> >         bus type / PM domain ->runtime_suspend => driver ->runtime_suspend
> >> >
> >> > with the bus type / PM domain having to figure out somehow at the
> >> > ->suspend_late time whether or not its ->runtime_suspend is going to be invoked
> >> > in the middle of it.
> >> >
> >> > Apart from this just being aesthetically disgusting to me, which admittedly is
> >> > a matter of personal opinion, it makes debugging new driver code harder (if it
> >> > happens to not work) and reviewing it almost impossible, because now you need
> >> > to take all of the tangling between callbacks into account and sometimes not
> >> > just for one bus type / PM domain.
> >>
> >> I am wondering that perhaps you may be overlooking some of the
> >> internals of runtime PM. Or maybe not? :-)
> >>
> >> I mean, the whole thing is built upon the fact that anyone can call runtime PM
> >> functions to runtime resume/suspend a device.
> >
> > Well, right in general, except that _force_suspend/resume() invoke
> > *callbacks* and *not* runtime PM functions.
> 
> I am considering pm_runtime_force_suspend|resume() being a part of the
> runtime PM API, except that those may be called only during system
> sleep.
> 
> Comparing a call to pm_runtime_resume(); this may trigger rpm_resume()
> to invoke the callbacks. To me, the difference is that the conditions
> looked at in rpm_resume(), when runtime PM is enabled, becomes
> different for system sleep when runtime PM is disabled - and that is
> taken care of in pm_runtime_force_suspend|resume().

So actually invoking runtime PM from a *driver* ->suspend callback for the
same device it was called for is fishy at best and may be a bug.  I'm not
sure why I had been thinking that it might have been fine at all.  It isn't.

The reason is that runtime PM *potentially* involves invoking middle
layer callbacks and they generally may look like

->runtime_resume:
	(1) do A
	(2) call driver ->runtime_resume
	(3) do B

Now, a middle layer ->suspend callback generally may look like this:

->suspend:
	(1) do C
	(2) call driver ->suspend
	(3) do D

and if you stick the middle layer ->runtime_resume invocation into the
driver ->suspend (which effectively is what running runtime PM in there means),
you get something like

do C
...
do A
call driver ->runtime_resume
do B
...
do D

and there's no guarantee whatever that "do C" can go before "do A" and
"do B" can go before "do D".  That depends on how the middle layer is designed
and there may be good reasons for how it works.
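
The interleaving can be reproduced with a tiny user-space model in C (not kernel code). All function names are hypothetical, and the steps A to D are just labels appended to a trace string rather than real work.

```c
#include <string.h>

static char trace[64];

/* Append one step label to the trace, never overflowing the buffer. */
static void step(const char *s)
{
	strncat(trace, s, sizeof(trace) - strlen(trace) - 1);
}

/* Middle layer ->runtime_resume: do A, call the driver callback, do B. */
static void mid_runtime_resume(void)
{
	step("A ");
	step("drv_rr ");	/* driver ->runtime_resume */
	step("B ");
}

/* Driver ->suspend that runs runtime PM for its own device. */
static void drv_suspend(void)
{
	mid_runtime_resume();
}

/* Middle layer ->suspend: do C, call the driver callback, do D. */
const char *mid_suspend(void)
{
	trace[0] = '\0';
	step("C ");
	drv_suspend();
	step("D");
	return trace;
}
```

Calling mid_suspend() yields the trace "C A drv_rr B D": the middle layer's A and B end up nested between its own C and D, an ordering it never chose.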

> >
> >> Doing that makes the
> >> hierarchy of the runtime PM callbacks get walked and invoked, of
> >> course properly managed by the runtime PM core.
> >>
> >> My point is that the runtime PM core still controls this behavior,
> >> even when the pm_runtime_force_suspend|resume() helpers are being
> >> invoked. The only difference is that it allows runtime PM for the
> >> device to be disabled, and still correctly invokes the callbacks. That
> >> is what it is all about.
> >
> > So why is it even useful to call ->runtime_suspend from a middle layer
> > in pm_runtime_force_suspend(), for example?
> 
> Perhaps I don't understand the question correctly.
> 
> Anyway, the answer I can think of is probably the same reason
> why the runtime PM core invokes it, when it runs rpm_suspend() for
> a device. My point is, we want similar behavior.

Not really.  The context is different, so why expect the behavior to be
the same?

> [...]
> 
> >> >
> >> > Also, when I looked at _force_suspend/resume() again, I got concerned.
> >> > There is stuff in there that shouldn't be necessary in a driver's
> >> > ->late_suspend/->early_resume and some things in there just made me
> >> > scratch my head.
> >>
> >> Yes, there is some complexity in there, I will be happy to answer any
> >> specific question about it.
> >
> > OK
> >
> > Of course they require runtime PM to be enabled by drivers using them as
> > their callbacks, but I suppose that you realize that.
> >
> > Why disable/re-enable runtime PM in there in the first place?  That should
> > have been done by the core when these functions are intended to be called.
> 
> The reason is that we didn't want to restrict them to be used only
> in ->suspend_late() and ->resume_early(), but also for ->suspend() and
> ->resume(), which is when runtime PM still is enabled.

Well, that means disabling runtime PM for some devices earlier which isn't
particularly consistent overall.

> >
> > Second, why use RPM_GET_CALLBACK in there?
> 
> To follow the same rules/hierarchy, as being done in rpm_suspend|resume().

No, you don't use the same hierarchy, which is the key point of my objection.

You run *already* in the context of a middle layer PM callback, so by very
definition it is *not* the same situation as running runtime PM elsewhere.

This is the second or maybe even the third time I have repeated this point
and I'm not going to do so again.

> >
> > Next, how is the parent actually runtime-resumed by pm_runtime_force_resume()
> > which the comment in pm_runtime_force_suspend() talks about?
> 
> I think the relevant use case here is when a parent and a child, both
> have subsystems/drivers using pm_runtime_force_suspend|resume(). If
> that isn't the case, we expect that the parent is always resumed
> during system resume.

Why?

> It's a somewhat fragile approach, so perhaps we
> should deal with it, even if the whole thing is used as opt-in.
> 
> Anyway, let's focus on the case which I think is most relevant to your question:
> 
> A couple of conditions to start with.
> *) The PM core system suspends a child prior to a parent, which leads to
> pm_runtime_force_suspend() being called for the child first.
> **) The PM core system resumes a parent before a child, thus
> pm_runtime_force_resume() is called for the parent first.
> 
> If a child doesn't need to be resumed when
> pm_runtime_force_resume() is called for it, its parent likely doesn't either.
> However, to control that, in system suspend
> pm_runtime_force_suspend() increases the usage counter for the parent,
> so as to indicate whether it needs to be resumed when
> pm_runtime_force_resume() is called for it.

OK, I see.

Why is usage_count > 1 used as the condition to trigger this behavior?

Thanks,
Rafael

Rafael J. Wysocki Sept. 6, 2017, 10:46 a.m. UTC | #10

On Wednesday, September 6, 2017 2:52:59 AM CEST Rafael J. Wysocki wrote:
> On Monday, September 4, 2017 2:55:37 PM CEST Ulf Hansson wrote:
> > [...]

I guess I can wrap it up, because all of the points seem to have been stated
and repeating them would not be useful.

My summary of the discussion is as follows.

It only is valid to use pm_runtime_force_suspend/resume() as *driver*
callbacks for system suspend/resume if both the driver itself and all of
the middle layers it has to work with carry out the same sequence of
operations in order to suspend the device both in runtime PM and for
system sleep (and analogously for resuming).  [The middle layers need
to meet additional conditions, but that's less relevant.]

Unfortunately, for the ACPI PM domain and the PCI bus type the situation is
different, because they generally need to do different things to suspend
devices for system sleep than they do for runtime PM (which mostly is
related to the handling of ACPI-defined sleep states and device/system
wakeup, but not limited to that).  This clearly means that drivers needing
to work with the ACPI PM domain and PCI drivers cannot use
pm_runtime_force_suspend/resume() as their PM callbacks for system
suspend/resume (quite fundamentally).

[Note that for i2c-designware-platdrv the situation is even more complicated,
because on some platforms it has to work with the ACPI PM domain (or the
ACPI LPSS driver), on some platforms its parent is a PCI device and on
some other platforms there's none of them.]

However, for drivers that need to work with the ACPI PM domain and
PCI drivers the differences in the device handling between runtime PM and
system suspend/resume are *very* often (even though not always) covered
entirely by the middle layer code.  Then, the driver itself actually
always carries out the same sequence of operations in order to suspend
the device (or to resume it, analogously).  The driver then can re-use
its runtime PM callbacks for system suspend/resume (but at the driver
level only) and it would be good to make that easy (or easier) for these
drivers somehow.

Thanks,
Rafael

Ulf Hansson Sept. 6, 2017, 1:54 p.m. UTC | #11

[...]

>
>> >
>> > Now, if that's just one device, it probably doesn't matter, but if there are
>> > more devices like that, they will be resumed after system suspend when they
>> > are accessed and quite likely they will be accessed one-by-one rather than in
>> > parallel with each other, so the latencies related to that will add up.  In
>> > that case it is better to resume them upfront during system resume as they will
>> > be resumed in parallel with each other then.  And that also is *way* simpler.
>> >
>> > This means that the benefit from not resuming devices during system
>> > resume is not quite so obvious and the whole point above is highly
>> > questionable.
>>
>> I hope my reasoning above explains why I think it shouldn't be
>> considered as questionable.
>>
>> If you like, I can also provide some real data/logs - showing you
>> what's happening.
>>
>
> That's not necessary, this behavior can be useful and there are arguments for
> doing it in *some* cases, but all of this argumentation applies to devices
> that aren't going to be used right after system resume.  If they *are* going
> to be used then, it very well may be better to resume them as part of
> system resume instead of deferring that.
>
> The tricky part is that at the point the resume callbacks run it is not known
> whether or not the device is going to be accessed shortly and the decision made
> either way may be suboptimal.

You have a point and it seems like this is what everything boils down
to, apart from your dislike of how
pm_runtime_force_suspend|resume() is being used by drivers.

To clarify, let me bring up yet another typical scenario, observed
often in cases when pm_runtime_force_suspend|resume is not used.

During system resume the device gets resumed, then shortly after the
system resume sequence has completed, it becomes runtime suspended,
because pm_runtime_put() is called in device_complete(). Then, soon
after the system has resumed, the device becomes runtime resumed
again, which is because there is a request for it to really be used.

This means we end up resuming the device, then suspending it and
resuming it again, all within a very short time frame. I guess this is
also one of those tricky cases you refer to above, because one just
can't know how long after the system has resumed it takes for the device
to be requested to be used again, thus we end up runtime suspending
the device in-between.

To me, having spent a lot of time in the world of embedded battery-driven
devices, this behavior isn't good enough, because it increases system
resume time and may waste some power. Apologies if you find me
repeating myself.

Anyway, this leads to my final question, do you want this behavior to
be better addressed by the ACPI PM domain, if it can be solved nicely,
or are you fine with how it works today?

>
> [Note: I know that people mostly care about seeing the screen on, but in fact
> they should *also* care about the touch panel being ready to respond to
> touches, for example.  If it isn't ready and the system suspends again
> after a while because of that, the experience is somewhat less than fantastic.]
>

Yep!

[...]

>> Compare a call to pm_runtime_resume(): this may trigger rpm_resume()
>> to invoke the callbacks. To me, the difference is that the conditions
>> looked at in rpm_resume(), when runtime PM is enabled, become
>> different for system sleep when runtime PM is disabled - and that is
>> taken care of in pm_runtime_force_suspend|resume().
>
> So actually invoking runtime PM from a *driver* ->suspend callback for the
> same device it was called for is fishy at best and may be a bug.  I'm not
> sure why I had been thinking that it might have been fine at all.  It isn't.

Huh, now you lost me. :-)

>
> The reason is that runtime PM *potentially* involves invoking middle
> layer callbacks and they generally may look like
>
> ->runtime_resume:
>         (1) do A
>         (2) call driver ->runtime_resume
>         (3) do B
>
> Now, a middle layer ->suspend callback generally may look like this:
>
> ->suspend:
>         (1) do C
>         (2) call driver ->suspend
>         (3) do D
>
> and if you stick the middle layer ->runtime_resume invocation into the
> driver ->suspend (which effectively is what running runtime PM in there means),
> you get something like
>
> do C
> ...
> do A
> call driver ->runtime_resume
> do B
> ...
> do D
>
> and there's no guarantee whatever that "do C" can go before "do A" and
> "do B" can go before "do D".  That depends on how the middle layer is designed
> and there may be good reasons for how it works.

For ARM SoCs, not using the ACPI PM domain, many drivers need to be
able to use runtime PM during system suspend, simply because the PM
domain/middle layer has no knowledge of what the driver needs to do to
put its device into a low power state during system suspend.

For many of the simple cases, the PM domain/middle layer acts
transparently to this, which means leaving what needs to be done to the
driver (platform, spi, i2c, amba, genpd etc).

I understand there may be some cases where the situation becomes more
complex and interaction between the driver and the PM domain/middle
layer is required, as seems to be the case for the ACPI PM domain and PCI,
but is that a reason to turn the world upside down for everybody else?

Or perhaps I don't understand what you are suggesting here.

[...]

>
>> I think the relevant use case here is when a parent and a child, both
>> have subsystems/drivers using pm_runtime_force_suspend|resume(). If
>> that isn't the case, we expect that the parent is always resumed
>> during system resume.
>
> Why?

Because the child may rely on that for it to be resumed.

Moreover, the expectation is that the parent likely doesn't support
runtime PM, or that it does not yet support the optimized method of using
pm_runtime_force_suspend|resume() during system sleep, and will thus
always resume its device during system resume.

>
>> It's a somewhat fragile approach, so perhaps we
>> should deal with it, even if the whole thing is used as opt-in.
>>
>> Anyway, let's focus on the case which I think is most relevant to your question:
>>
>> A couple of conditions to start with.
>> *) The PM core system suspends a child prior to a parent, which leads to
>> pm_runtime_force_suspend() being called for the child first.
>> **) The PM core system resumes a parent before a child, thus
>> pm_runtime_force_resume() is called for the parent first.
>>
>> If a child doesn't need to be resumed when
>> pm_runtime_force_resume() is called for it, its parent likely doesn't either.
>> However, to control that, in system suspend
>> pm_runtime_force_suspend() increases the usage counter for the parent,
>> so as to indicate whether it needs to be resumed when
>> pm_runtime_force_resume() is called for it.
>
> OK, I see.
>
> Why is usage_count > 1 used as the condition to trigger this behavior?

It takes into account that the PM core increases the usage count in
device_prepare(), though not because it needs the device to be
operational.
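
As a rough illustration of that rule, here is a simplified userspace model.
It is a sketch under stated assumptions, not the kernel implementation:
struct toy_dev and all toy_*() functions are hypothetical, and the model
keeps only the usage counter, the parent pointer and a suspended flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the runtime PM state of a device. */
struct toy_dev {
	struct toy_dev *parent;
	int usage_count;
	bool suspended;
};

/* Models the PM core taking its own reference in device_prepare(). */
void toy_prepare(struct toy_dev *dev)
{
	dev->usage_count++;
}

/* Models the parent-pinning step done at system suspend. */
void toy_force_suspend(struct toy_dev *dev)
{
	if (dev->suspended)
		return;
	/* > 1: someone besides the PM core still holds a reference. */
	if (dev->parent && dev->usage_count > 1)
		dev->parent->usage_count++;
	dev->suspended = true;
}

/* Returns true if the device was actually resumed at system resume. */
bool toy_force_resume(struct toy_dev *dev)
{
	if (!dev->suspended)
		return false;
	if (dev->usage_count <= 1)
		return false;	/* stays suspended until runtime PM needs it */
	if (dev->parent) {
		assert(dev->parent->usage_count > 0);
		dev->parent->usage_count--;	/* drop the reference taken at suspend */
	}
	dev->suspended = false;
	return true;
}
```

In this model a child that was in use at suspend time (usage count 2 after
toy_prepare()) pins its parent, so both are resumed during system resume,
whereas a child whose only reference is the PM core's leaves both suspended.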

Kind regards
Uffe

Ulf Hansson Sept. 6, 2017, 1:59 p.m. UTC | #12

On 6 September 2017 at 12:46, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> On Wednesday, September 6, 2017 2:52:59 AM CEST Rafael J. Wysocki wrote:
>> On Monday, September 4, 2017 2:55:37 PM CEST Ulf Hansson wrote:
>> > [...]
>
> I guess I can wrap it up, because all of the points seem to have been stated
> and repeating them would not be useful.
>
> My summary of the discussion is as follows.
>
> It only is valid to use pm_runtime_force_suspend/resume() as *driver*
> callbacks for system suspend/resume if both the driver itself and all of
> the middle layers it has to work with carry out the same sequence of
> operations in order to suspend the device both in runtime PM and for
> system sleep (and analogously for resuming).  [The middle layers need
> to meet additional conditions, but that's less relevant.]
>
> Unfortunately, for the ACPI PM domain and the PCI bus type the situation is
> different, because they generally need to do different things to suspend
> devices for system sleep than they do for runtime PM (which mostly is
> related to the handling of ACPI-defined sleep states and device/system
> wakeup, but not limited to that).  This clearly means that drivers needing
> to work with the ACPI PM domain and PCI drivers cannot use
> pm_runtime_force_suspend/resume() as their PM callbacks for system
> suspend/resume (quite fundamentally).
>
> [Note that for i2c-designware-platdrv the situation is even more complicated,
> because on some platforms it has to work with the ACPI PM domain (or the
> ACPI LPSS driver), on some platforms its parent is a PCI device and on
> some other platforms there's none of them.]

That is also what makes it really interesting. I am guessing we will
be seeing more of these cases sooner or later.

To make it even more complex, I guess we can expect cases where
genpd is mixed with the ACPI PM domain.

>
> However, for drivers that need to work with the ACPI PM domain and
> PCI drivers the differences in the device handling between runtime PM and
> system suspend/resume are *very* often (even though not always) covered
> entirely by the middle layer code.  Then, the driver itself actually
> always carries out the same sequence of operations in order to suspend
> the device (or to resume it, analogously).  The driver then can re-use
> its runtime PM callbacks for system suspend/resume (but at the driver
> level only) and it would be good to make that easy (or easier) for these
> drivers somehow.

This is a very nice summary so far, thanks for putting it together.

Kind regards
Uffe

Rafael J. Wysocki Sept. 6, 2017, 9:39 p.m. UTC | #13

On Wednesday, September 6, 2017 3:59:16 PM CEST Ulf Hansson wrote:
> On 6 September 2017 at 12:46, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> > On Wednesday, September 6, 2017 2:52:59 AM CEST Rafael J. Wysocki wrote:
> >> On Monday, September 4, 2017 2:55:37 PM CEST Ulf Hansson wrote:
> >> > [...]
> >
> > I guess I can wrap it up, because all of the points seem to have been stated
> > and repeating them would not be useful.
> >
> > My summary of the discussion is as follows.
> >
> > It only is valid to use pm_runtime_force_suspend/resume() as *driver*
> > callbacks for system suspend/resume if both the driver itself and all of
> > the middle layers it has to work with carry out the same sequence of
> > operations in order to suspend the device both in runtime PM and for
> > system sleep (and analogously for resuming).  [The middle layers need
> > to meet additional conditions, but that's less relevant.]
> >
> > Unfortunately, for the ACPI PM domain and the PCI bus type the situation is
> > different, because they generally need to do different things to suspend
> > devices for system sleep than they do for runtime PM (which mostly is
> > related to the handling of ACPI-defined sleep states and device/system
> > wakeup, but not limited to that).  This clearly means that drivers needing
> > to work with the ACPI PM domain and PCI drivers cannot use
> > pm_runtime_force_suspend/resume() as their PM callbacks for system
> > suspend/resume (quite fundamentally).
> >
> > [Note that for i2c-designware-platdrv the situation is even more complicated,
> > because on some platforms it has to work with the ACPI PM domain (or the
> > ACPI LPSS driver), on some platforms its parent is a PCI device and on
> > some other platforms there's none of them.]
> 
> That is also what makes it really interesting. I am guessing we will
> be seeing more of these cases sooner or later.
> 
> To make it even more complex, I guess we can expect cases where
> genpd is mixed with the ACPI PM domain.
> 
> >
> > However, for drivers that need to work with the ACPI PM domain and
> > PCI drivers the differences in the device handling between runtime PM and
> > system suspend/resume are *very* often (even though not always) covered
> > entirely by the middle layer code.  Then, the driver itself actually
> > always carries out the same sequence of operations in order to suspend
> > the device (or to resume it, analogously).  The driver then can re-use
> > its runtime PM callbacks for system suspend/resume (but at the driver
> > level only) and it would be good to make that easy (or easier) for these
> > drivers somehow.
> 
> This is a very nice summary so far, thanks for putting it together.

No problem.

I actually have an idea on how to move forward, but let me start a new thread
for discussing that.

Thanks,
Rafael
