diff mbox series

[v3,1/6] PM / core: Add LEAVE_SUSPENDED driver flag

Message ID 5556673.MXo6XDN1f4@aspire.rjw.lan
State Not Applicable
Headers show
Series PM / sleep: Driver flags for system suspend/resume (part 2) | expand

Commit Message

Rafael J. Wysocki Nov. 12, 2017, 12:37 a.m. UTC
From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Define and document a new driver flag, DPM_FLAG_LEAVE_SUSPENDED, to
instruct the PM core and middle-layer (bus type, PM domain, etc.)
code that it is desirable to leave the device in runtime suspend
after system-wide transitions to the working state (for example,
the device may be slow to resume and it may be better to avoid
resuming it right away).

Generally, the middle-layer code involved in the handling of the
device is expected to indicate to the PM core whether or not the
device may be left in suspend with the help of the device's
power.may_skip_resume status bit.  That has to happen in the "noirq"
phase of the preceding system suspend (or analogous) transition.
The middle layer is then responsible for handling the device as
appropriate in its "noirq" resume callback which is executed
regardless of whether or not the device may be left suspended, but
the other resume callbacks (except for ->complete) will be skipped
automatically by the core if the device really can be left in
suspend.

The additional power.must_resume status bit introduced for the
implementation of this mechanisn is used internally by the PM core
to track the requirement to resume the device (which may depend on
its children etc).

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---

v2 -> v3: Take dev->power.usage_count when updating power.must_resume in
          __device_suspend_noirq().

---
 Documentation/driver-api/pm/devices.rst |   24 ++++++++++-
 drivers/base/power/main.c               |   66 +++++++++++++++++++++++++++++---
 drivers/base/power/runtime.c            |    9 ++--
 include/linux/pm.h                      |   14 +++++-
 include/linux/pm_runtime.h              |    9 ++--
 5 files changed, 104 insertions(+), 18 deletions(-)

Comments

Ulf Hansson Nov. 16, 2017, 3:10 p.m. UTC | #1
On 12 November 2017 at 01:37, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> Define and document a new driver flag, DPM_FLAG_LEAVE_SUSPENDED, to
> instruct the PM core and middle-layer (bus type, PM domain, etc.)
> code that it is desirable to leave the device in runtime suspend
> after system-wide transitions to the working state (for example,
> the device may be slow to resume and it may be better to avoid
> resuming it right away).
>
> Generally, the middle-layer code involved in the handling of the
> device is expected to indicate to the PM core whether or not the
> device may be left in suspend with the help of the device's
> power.may_skip_resume status bit.  That has to happen in the "noirq"
> phase of the preceding system suspend (or analogous) transition.
> The middle layer is then responsible for handling the device as
> appropriate in its "noirq" resume callback which is executed
> regardless of whether or not the device may be left suspended, but
> the other resume callbacks (except for ->complete) will be skipped
> automatically by the core if the device really can be left in
> suspend.
>
> The additional power.must_resume status bit introduced for the
> implementation of this mechanisn is used internally by the PM core
> to track the requirement to resume the device (which may depend on
> its children etc).
>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> ---
>
> v2 -> v3: Take dev->power.usage_count when updating power.must_resume in
>           __device_suspend_noirq().
>
> ---
>  Documentation/driver-api/pm/devices.rst |   24 ++++++++++-
>  drivers/base/power/main.c               |   66 +++++++++++++++++++++++++++++---
>  drivers/base/power/runtime.c            |    9 ++--
>  include/linux/pm.h                      |   14 +++++-
>  include/linux/pm_runtime.h              |    9 ++--
>  5 files changed, 104 insertions(+), 18 deletions(-)
>
> Index: linux-pm/include/linux/pm.h
> ===================================================================
> --- linux-pm.orig/include/linux/pm.h
> +++ linux-pm/include/linux/pm.h
> @@ -559,6 +559,7 @@ struct pm_subsys_data {
>   * NEVER_SKIP: Do not skip system suspend/resume callbacks for the device.
>   * SMART_PREPARE: Check the return value of the driver's ->prepare callback.
>   * SMART_SUSPEND: No need to resume the device from runtime suspend.
> + * LEAVE_SUSPENDED: Avoid resuming the device during system resume if possible.
>   *
>   * Setting SMART_PREPARE instructs bus types and PM domains which may want
>   * system suspend/resume callbacks to be skipped for the device to return 0 from
> @@ -572,10 +573,14 @@ struct pm_subsys_data {
>   * necessary from the driver's perspective.  It also may cause them to skip
>   * invocations of the ->suspend_late and ->suspend_noirq callbacks provided by
>   * the driver if they decide to leave the device in runtime suspend.
> + *
> + * Setting LEAVE_SUSPENDED informs the PM core and middle-layer code that the
> + * driver prefers the device to be left in runtime suspend after system resume.
>   */

Question: Can LEAVE_SUSPENDED and NEVER_SKIP be valid combination? I
guess not!? Should we validate for wrong combinations?

[...]

>  /**
>   * __device_suspend_noirq - Execute a "noirq suspend" callback for given device.
>   * @dev: Device to handle.
> @@ -1127,10 +1161,28 @@ static int __device_suspend_noirq(struct
>         }
>
>         error = dpm_run_callback(callback, dev, state, info);
> -       if (!error)
> -               dev->power.is_noirq_suspended = true;
> -       else
> +       if (error) {
>                 async_error = error;
> +               goto Complete;
> +       }
> +
> +       dev->power.is_noirq_suspended = true;
> +
> +       if (dev_pm_test_driver_flags(dev, DPM_FLAG_LEAVE_SUSPENDED)) {
> +               /*
> +                * The only safe strategy here is to require that if the device
> +                * may not be left in suspend, resume callbacks must be invoked
> +                * for it.
> +                */
> +               dev->power.must_resume = dev->power.must_resume ||
> +                                       !dev->power.may_skip_resume ||
> +                                       atomic_read(&dev->power.usage_count);

dev->power.usage_count is always > 0 at this point, meaning that
dev->power.must_resume always becomes true. :-)

You should rather use "atomic_read(&dev->power.usage_count) > 1".

> +       } else {
> +               dev->power.must_resume = true;
> +       }
> +
> +       if (dev->power.must_resume)
> +               dpm_superior_set_must_resume(dev);
>
>  Complete:
>         complete_all(&dev->power.completion);
> @@ -1487,6 +1539,9 @@ static int __device_suspend(struct devic
>                 dev->power.direct_complete = false;
>         }
>
> +       dev->power.may_skip_resume = false;
> +       dev->power.must_resume = false;
> +

First, these assignment could be bypassed if the direct_complete path
is used. Perhaps it's more robust to reset these flags already in
device_prepare().

Second, have you considered setting the default value of
dev->power.may_skip_resume to true? That would means the subsystem
instead need to implement an opt-out method. I am thinking that it may
not be an issue, since we anyway at this point, don't have drivers
using the LEAVE_SUSPENDED flag.

[...]

> +However, it may be desirable to leave some devices in runtime suspend after
> +system transitions to the working state and device drivers can use the
> +``DPM_FLAG_LEAVE_SUSPENDED`` flag to indicate to the PM core (and middle-layer
> +code) that this is the case.  Whether or not the devices will actually be left
> +in suspend may depend on their state before the given system suspend-resume
> +cycle and on the type of the system transition under way.  In particular,
> +devices are not left suspended if that transition is a restore from hibernation,
> +as device states are not guaranteed to be reflected by the information stored in
> +the hibernation image in that case.
> +
> +The middle-layer code involved in the handling of the device has to indicate to
> +the PM core if the device may be left in suspend with the help of its
> +:c:member:`power.may_skip_resume` status bit.  That has to happen in the "noirq"
> +phase of the preceding system-wide suspend (or analogous) transition.  The

Does it have to be managed in the "noirq" phase? Wouldn't be perfectly
okay do this in the suspend and suspend_late phases as well?

> +middle layer is then responsible for handling the device as appropriate in its
> +"noirq" resume callback, which is executed regardless of whether or not the
> +device may be left suspended, but the other resume callbacks (except for
> +``->complete``) will be skipped automatically by the PM core if the device
> +really can be left in suspend.
>

Kind regards
Uffe
Rafael J. Wysocki Nov. 16, 2017, 11:07 p.m. UTC | #2
On Thursday, November 16, 2017 4:10:16 PM CET Ulf Hansson wrote:
> On 12 November 2017 at 01:37, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >
> > Define and document a new driver flag, DPM_FLAG_LEAVE_SUSPENDED, to
> > instruct the PM core and middle-layer (bus type, PM domain, etc.)
> > code that it is desirable to leave the device in runtime suspend
> > after system-wide transitions to the working state (for example,
> > the device may be slow to resume and it may be better to avoid
> > resuming it right away).
> >
> > Generally, the middle-layer code involved in the handling of the
> > device is expected to indicate to the PM core whether or not the
> > device may be left in suspend with the help of the device's
> > power.may_skip_resume status bit.  That has to happen in the "noirq"
> > phase of the preceding system suspend (or analogous) transition.
> > The middle layer is then responsible for handling the device as
> > appropriate in its "noirq" resume callback which is executed
> > regardless of whether or not the device may be left suspended, but
> > the other resume callbacks (except for ->complete) will be skipped
> > automatically by the core if the device really can be left in
> > suspend.
> >
> > The additional power.must_resume status bit introduced for the
> > implementation of this mechanisn is used internally by the PM core
> > to track the requirement to resume the device (which may depend on
> > its children etc).
> >
> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > ---
> >
> > v2 -> v3: Take dev->power.usage_count when updating power.must_resume in
> >           __device_suspend_noirq().
> >
> > ---
> >  Documentation/driver-api/pm/devices.rst |   24 ++++++++++-
> >  drivers/base/power/main.c               |   66 +++++++++++++++++++++++++++++---
> >  drivers/base/power/runtime.c            |    9 ++--
> >  include/linux/pm.h                      |   14 +++++-
> >  include/linux/pm_runtime.h              |    9 ++--
> >  5 files changed, 104 insertions(+), 18 deletions(-)
> >
> > Index: linux-pm/include/linux/pm.h
> > ===================================================================
> > --- linux-pm.orig/include/linux/pm.h
> > +++ linux-pm/include/linux/pm.h
> > @@ -559,6 +559,7 @@ struct pm_subsys_data {
> >   * NEVER_SKIP: Do not skip system suspend/resume callbacks for the device.
> >   * SMART_PREPARE: Check the return value of the driver's ->prepare callback.
> >   * SMART_SUSPEND: No need to resume the device from runtime suspend.
> > + * LEAVE_SUSPENDED: Avoid resuming the device during system resume if possible.
> >   *
> >   * Setting SMART_PREPARE instructs bus types and PM domains which may want
> >   * system suspend/resume callbacks to be skipped for the device to return 0 from
> > @@ -572,10 +573,14 @@ struct pm_subsys_data {
> >   * necessary from the driver's perspective.  It also may cause them to skip
> >   * invocations of the ->suspend_late and ->suspend_noirq callbacks provided by
> >   * the driver if they decide to leave the device in runtime suspend.
> > + *
> > + * Setting LEAVE_SUSPENDED informs the PM core and middle-layer code that the
> > + * driver prefers the device to be left in runtime suspend after system resume.
> >   */
> 
> Question: Can LEAVE_SUSPENDED and NEVER_SKIP be valid combination? I
> guess not!? Should we validate for wrong combinations?

Why not?  There's no real overlap between them.

> 
> [...]
> 
> >  /**
> >   * __device_suspend_noirq - Execute a "noirq suspend" callback for given device.
> >   * @dev: Device to handle.
> > @@ -1127,10 +1161,28 @@ static int __device_suspend_noirq(struct
> >         }
> >
> >         error = dpm_run_callback(callback, dev, state, info);
> > -       if (!error)
> > -               dev->power.is_noirq_suspended = true;
> > -       else
> > +       if (error) {
> >                 async_error = error;
> > +               goto Complete;
> > +       }
> > +
> > +       dev->power.is_noirq_suspended = true;
> > +
> > +       if (dev_pm_test_driver_flags(dev, DPM_FLAG_LEAVE_SUSPENDED)) {
> > +               /*
> > +                * The only safe strategy here is to require that if the device
> > +                * may not be left in suspend, resume callbacks must be invoked
> > +                * for it.
> > +                */
> > +               dev->power.must_resume = dev->power.must_resume ||
> > +                                       !dev->power.may_skip_resume ||
> > +                                       atomic_read(&dev->power.usage_count);
> 
> dev->power.usage_count is always > 0 at this point, meaning that
> dev->power.must_resume always becomes true. :-)
> 
> You should rather use "atomic_read(&dev->power.usage_count) > 1".

Right, thanks.  I tend to forget about that.

> > +       } else {
> > +               dev->power.must_resume = true;
> > +       }
> > +
> > +       if (dev->power.must_resume)
> > +               dpm_superior_set_must_resume(dev);
> >
> >  Complete:
> >         complete_all(&dev->power.completion);
> > @@ -1487,6 +1539,9 @@ static int __device_suspend(struct devic
> >                 dev->power.direct_complete = false;
> >         }
> >
> > +       dev->power.may_skip_resume = false;
> > +       dev->power.must_resume = false;
> > +
> 
> First, these assignment could be bypassed if the direct_complete path
> is used. Perhaps it's more robust to reset these flags already in
> device_prepare().

In the direct-complete case may_skip_resume doesn't matter.

must_resume should be set to "false", however, so that parents of
direct-complete devices may be left in suspend (in case they don't
fall under direct-complete themselves), so good catch.

But it is sufficient to do that before the power.direct_complete check above. :-)

> Second, have you considered setting the default value of
> dev->power.may_skip_resume to true?

Yes.

> That would means the subsystem
> instead need to implement an opt-out method. I am thinking that it may
> not be an issue, since we anyway at this point, don't have drivers
> using the LEAVE_SUSPENDED flag.

Opt-out doesn't work because of the need to invoke the "noirq" callbacks.

> [...]
> 
> > +However, it may be desirable to leave some devices in runtime suspend after
> > +system transitions to the working state and device drivers can use the
> > +``DPM_FLAG_LEAVE_SUSPENDED`` flag to indicate to the PM core (and middle-layer
> > +code) that this is the case.  Whether or not the devices will actually be left
> > +in suspend may depend on their state before the given system suspend-resume
> > +cycle and on the type of the system transition under way.  In particular,
> > +devices are not left suspended if that transition is a restore from hibernation,
> > +as device states are not guaranteed to be reflected by the information stored in
> > +the hibernation image in that case.
> > +
> > +The middle-layer code involved in the handling of the device has to indicate to
> > +the PM core if the device may be left in suspend with the help of its
> > +:c:member:`power.may_skip_resume` status bit.  That has to happen in the "noirq"
> > +phase of the preceding system-wide suspend (or analogous) transition.  The
> 
> Does it have to be managed in the "noirq" phase? Wouldn't be perfectly
> okay do this in the suspend and suspend_late phases as well?

The wording is slightly misleading I think.

In fact technically may_skip_resume may be set earlier, but the core checks it
in the "noirq" phase only anyway.

Thanks,
Rafael
Ulf Hansson Nov. 17, 2017, 6:11 a.m. UTC | #3
[...]

>> > +++ linux-pm/include/linux/pm.h
>> > @@ -559,6 +559,7 @@ struct pm_subsys_data {
>> >   * NEVER_SKIP: Do not skip system suspend/resume callbacks for the device.
>> >   * SMART_PREPARE: Check the return value of the driver's ->prepare callback.
>> >   * SMART_SUSPEND: No need to resume the device from runtime suspend.
>> > + * LEAVE_SUSPENDED: Avoid resuming the device during system resume if possible.
>> >   *
>> >   * Setting SMART_PREPARE instructs bus types and PM domains which may want
>> >   * system suspend/resume callbacks to be skipped for the device to return 0 from
>> > @@ -572,10 +573,14 @@ struct pm_subsys_data {
>> >   * necessary from the driver's perspective.  It also may cause them to skip
>> >   * invocations of the ->suspend_late and ->suspend_noirq callbacks provided by
>> >   * the driver if they decide to leave the device in runtime suspend.
>> > + *
>> > + * Setting LEAVE_SUSPENDED informs the PM core and middle-layer code that the
>> > + * driver prefers the device to be left in runtime suspend after system resume.
>> >   */
>>
>> Question: Can LEAVE_SUSPENDED and NEVER_SKIP be valid combination? I
>> guess not!? Should we validate for wrong combinations?
>
> Why not?  There's no real overlap between them.

Except that NEVER_SKIP, documentation wise, tells you that your
suspend and resume callbacks will never be skipped. :-)

[...]

>> Second, have you considered setting the default value of
>> dev->power.may_skip_resume to true?
>
> Yes.
>
>> That would means the subsystem
>> instead need to implement an opt-out method. I am thinking that it may
>> not be an issue, since we anyway at this point, don't have drivers
>> using the LEAVE_SUSPENDED flag.
>
> Opt-out doesn't work because of the need to invoke the "noirq" callbacks.

I am not sure I follow that.

Whatever needs to be fixed on the subsystem level, that could be done
before the driver starts using the LEAVE_SUSPENDED flag. No?

>
>> [...]
>>
>> > +However, it may be desirable to leave some devices in runtime suspend after
>> > +system transitions to the working state and device drivers can use the
>> > +``DPM_FLAG_LEAVE_SUSPENDED`` flag to indicate to the PM core (and middle-layer
>> > +code) that this is the case.  Whether or not the devices will actually be left
>> > +in suspend may depend on their state before the given system suspend-resume
>> > +cycle and on the type of the system transition under way.  In particular,
>> > +devices are not left suspended if that transition is a restore from hibernation,
>> > +as device states are not guaranteed to be reflected by the information stored in
>> > +the hibernation image in that case.
>> > +
>> > +The middle-layer code involved in the handling of the device has to indicate to
>> > +the PM core if the device may be left in suspend with the help of its
>> > +:c:member:`power.may_skip_resume` status bit.  That has to happen in the "noirq"
>> > +phase of the preceding system-wide suspend (or analogous) transition.  The
>>
>> Does it have to be managed in the "noirq" phase? Wouldn't be perfectly
>> okay do this in the suspend and suspend_late phases as well?
>
> The wording is slightly misleading I think.
>
> In fact technically may_skip_resume may be set earlier, but the core checks it
> in the "noirq" phase only anyway.

Yeah, okay.

Kind regards
Uffe
Rafael J. Wysocki Nov. 17, 2017, 12:45 p.m. UTC | #4
On Fri, Nov 17, 2017 at 12:07 AM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> On Thursday, November 16, 2017 4:10:16 PM CET Ulf Hansson wrote:
>> On 12 November 2017 at 01:37, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>> >
>> > Define and document a new driver flag, DPM_FLAG_LEAVE_SUSPENDED, to
>> > instruct the PM core and middle-layer (bus type, PM domain, etc.)
>> > code that it is desirable to leave the device in runtime suspend
>> > after system-wide transitions to the working state (for example,
>> > the device may be slow to resume and it may be better to avoid
>> > resuming it right away).
>> >
>> > Generally, the middle-layer code involved in the handling of the
>> > device is expected to indicate to the PM core whether or not the
>> > device may be left in suspend with the help of the device's
>> > power.may_skip_resume status bit.  That has to happen in the "noirq"
>> > phase of the preceding system suspend (or analogous) transition.
>> > The middle layer is then responsible for handling the device as
>> > appropriate in its "noirq" resume callback which is executed
>> > regardless of whether or not the device may be left suspended, but
>> > the other resume callbacks (except for ->complete) will be skipped
>> > automatically by the core if the device really can be left in
>> > suspend.
>> >
>> > The additional power.must_resume status bit introduced for the
>> > implementation of this mechanisn is used internally by the PM core
>> > to track the requirement to resume the device (which may depend on
>> > its children etc).
>> >
>> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>> > Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>> > ---
>> >
>> > v2 -> v3: Take dev->power.usage_count when updating power.must_resume in
>> >           __device_suspend_noirq().
>> >
>> > ---

[...]

>> > +       } else {
>> > +               dev->power.must_resume = true;
>> > +       }
>> > +
>> > +       if (dev->power.must_resume)
>> > +               dpm_superior_set_must_resume(dev);
>> >
>> >  Complete:
>> >         complete_all(&dev->power.completion);
>> > @@ -1487,6 +1539,9 @@ static int __device_suspend(struct devic
>> >                 dev->power.direct_complete = false;
>> >         }
>> >
>> > +       dev->power.may_skip_resume = false;
>> > +       dev->power.must_resume = false;
>> > +
>>
>> First, these assignment could be bypassed if the direct_complete path
>> is used. Perhaps it's more robust to reset these flags already in
>> device_prepare().
>
> In the direct-complete case may_skip_resume doesn't matter.
>
> must_resume should be set to "false", however, so that parents of
> direct-complete devices may be left in suspend (in case they don't
> fall under direct-complete themselves), so good catch.

Actually, not really.

must_resume for parents/suppliers is not updated if the device has
direct_complete set and the device's own must_resume doesn't matter
then.

So this part is good as is AFAICS.

Thanks,
Rafael
Rafael J. Wysocki Nov. 17, 2017, 1:18 p.m. UTC | #5
On Fri, Nov 17, 2017 at 7:11 AM, Ulf Hansson <ulf.hansson@linaro.org> wrote:
> [...]
>
>>> > +++ linux-pm/include/linux/pm.h
>>> > @@ -559,6 +559,7 @@ struct pm_subsys_data {
>>> >   * NEVER_SKIP: Do not skip system suspend/resume callbacks for the device.
>>> >   * SMART_PREPARE: Check the return value of the driver's ->prepare callback.
>>> >   * SMART_SUSPEND: No need to resume the device from runtime suspend.
>>> > + * LEAVE_SUSPENDED: Avoid resuming the device during system resume if possible.
>>> >   *
>>> >   * Setting SMART_PREPARE instructs bus types and PM domains which may want
>>> >   * system suspend/resume callbacks to be skipped for the device to return 0 from
>>> > @@ -572,10 +573,14 @@ struct pm_subsys_data {
>>> >   * necessary from the driver's perspective.  It also may cause them to skip
>>> >   * invocations of the ->suspend_late and ->suspend_noirq callbacks provided by
>>> >   * the driver if they decide to leave the device in runtime suspend.
>>> > + *
>>> > + * Setting LEAVE_SUSPENDED informs the PM core and middle-layer code that the
>>> > + * driver prefers the device to be left in runtime suspend after system resume.
>>> >   */
>>>
>>> Question: Can LEAVE_SUSPENDED and NEVER_SKIP be valid combination? I
>>> guess not!? Should we validate for wrong combinations?
>>
>> Why not?  There's no real overlap between them.
>
> Except that NEVER_SKIP, documentation wise, tells you that your
> suspend and resume callbacks will never be skipped. :-)

You mean the comment in pm.h I suppose?  Yes, it isn't precise enough.

The proper documentation in devices.rst is less ambiguous, though. :-)

> [...]
>
>>> Second, have you considered setting the default value of
>>> dev->power.may_skip_resume to true?
>>
>> Yes.
>>
>>> That would means the subsystem
>>> instead need to implement an opt-out method. I am thinking that it may
>>> not be an issue, since we anyway at this point, don't have drivers
>>> using the LEAVE_SUSPENDED flag.
>>
>> Opt-out doesn't work because of the need to invoke the "noirq" callbacks.
>
> I am not sure I follow that.
>
> Whatever needs to be fixed on the subsystem level, that could be done
> before the driver starts using the LEAVE_SUSPENDED flag. No?

That requires a bit of explanation, sorry for being overly concise.

The core calls ->resume_noirq from the middle layer regardless of
whether or not the device will be left suspended, so the
->resume_noirq cannot do arbitrary things to it.  Setting
may_skip_resume by the middle layer tells the core that the middle
layer is ready for that and is going to cooperate.  If may_skip_resume
had been set by default, that piece of information would have been
missing.

Thanks,
Rafael
Ulf Hansson Nov. 17, 2017, 1:49 p.m. UTC | #6
[...]

>>
>>>> Second, have you considered setting the default value of
>>>> dev->power.may_skip_resume to true?
>>>
>>> Yes.
>>>
>>>> That would means the subsystem
>>>> instead need to implement an opt-out method. I am thinking that it may
>>>> not be an issue, since we anyway at this point, don't have drivers
>>>> using the LEAVE_SUSPENDED flag.
>>>
>>> Opt-out doesn't work because of the need to invoke the "noirq" callbacks.
>>
>> I am not sure I follow that.
>>
>> Whatever needs to be fixed on the subsystem level, that could be done
>> before the driver starts using the LEAVE_SUSPENDED flag. No?
>
> That requires a bit of explanation, sorry for being overly concise.
>
> The core calls ->resume_noirq from the middle layer regardless of
> whether or not the device will be left suspended, so the
> ->resume_noirq cannot do arbitrary things to it.  Setting
> may_skip_resume by the middle layer tells the core that the middle
> layer is ready for that and is going to cooperate.  If may_skip_resume
> had been set by default, that piece of information would have been
> missing.

Huh, I still don't get that. Sorry.

If the "may_skip_resume" is default set to true by the PM core,
wouldn't that just mean that the middle-layer needs to implement an
opt-out method, rather than opt-in. In principle to opt-out the
middle-layer needs to set may_skip_resume to false in suspend_noirq
phase, no?

Then we only need to make sure drivers don't starts use
LEAVE_SUSPENDED, before we make sure the middle layers is adopted. But
that should not be a problem.

The benefit would be that those middle layers that can cope with
LEAVE_SUSPENDED as of today don't need to change.

Kind regards
Uffe
Rafael J. Wysocki Nov. 17, 2017, 2:31 p.m. UTC | #7
On Fri, Nov 17, 2017 at 2:49 PM, Ulf Hansson <ulf.hansson@linaro.org> wrote:
> [...]
>
>>>
>>>>> Second, have you considered setting the default value of
>>>>> dev->power.may_skip_resume to true?
>>>>
>>>> Yes.
>>>>
>>>>> That would means the subsystem
>>>>> instead need to implement an opt-out method. I am thinking that it may
>>>>> not be an issue, since we anyway at this point, don't have drivers
>>>>> using the LEAVE_SUSPENDED flag.
>>>>
>>>> Opt-out doesn't work because of the need to invoke the "noirq" callbacks.
>>>
>>> I am not sure I follow that.
>>>
>>> Whatever needs to be fixed on the subsystem level, that could be done
>>> before the driver starts using the LEAVE_SUSPENDED flag. No?
>>
>> That requires a bit of explanation, sorry for being overly concise.
>>
>> The core calls ->resume_noirq from the middle layer regardless of
>> whether or not the device will be left suspended, so the
>> ->resume_noirq cannot do arbitrary things to it.  Setting
>> may_skip_resume by the middle layer tells the core that the middle
>> layer is ready for that and is going to cooperate.  If may_skip_resume
>> had been set by default, that piece of information would have been
>> missing.
>
> Huh, I still don't get that. Sorry.
>
> If the "may_skip_resume" is default set to true by the PM core,
> wouldn't that just mean that the middle-layer needs to implement an
> opt-out method, rather than opt-in. In principle to opt-out the
> middle-layer needs to set may_skip_resume to false in suspend_noirq
> phase, no?

Yes, but if the middle-layer doesn't clear it, that may mean two
things.  First, the middle layer is ready and so on.  Good.  Second,
the middle layer is not aware of the whole thing.  Not good.  The core
cannot tell.

In the opt-in case, however, all is clear. :-)

> Then we only need to make sure drivers don't starts use
> LEAVE_SUSPENDED, before we make sure the middle layers is adopted. But
> that should not be a problem.
>
> The benefit would be that those middle layers that can cope with
> LEAVE_SUSPENDED as of today don't need to change.

I'm not sure if that's the case.

The middle layer has to evaluate dev_pm_may_skip_resume() in
->resume_noirq() to check if the device can be left in suspend, as it
cannot determine that in ->suspend_noirq() yet.

Thanks,
Rafael
Ulf Hansson Nov. 17, 2017, 3:57 p.m. UTC | #8
On 17 November 2017 at 15:31, Rafael J. Wysocki <rafael@kernel.org> wrote:
> On Fri, Nov 17, 2017 at 2:49 PM, Ulf Hansson <ulf.hansson@linaro.org> wrote:
>> [...]
>>
>>>>
>>>>>> Second, have you considered setting the default value of
>>>>>> dev->power.may_skip_resume to true?
>>>>>
>>>>> Yes.
>>>>>
>>>>>> That would means the subsystem
>>>>>> instead need to implement an opt-out method. I am thinking that it may
>>>>>> not be an issue, since we anyway at this point, don't have drivers
>>>>>> using the LEAVE_SUSPENDED flag.
>>>>>
>>>>> Opt-out doesn't work because of the need to invoke the "noirq" callbacks.
>>>>
>>>> I am not sure I follow that.
>>>>
>>>> Whatever needs to be fixed on the subsystem level, that could be done
>>>> before the driver starts using the LEAVE_SUSPENDED flag. No?
>>>
>>> That requires a bit of explanation, sorry for being overly concise.
>>>
>>> The core calls ->resume_noirq from the middle layer regardless of
>>> whether or not the device will be left suspended, so the
>>> ->resume_noirq cannot do arbitrary things to it.  Setting
>>> may_skip_resume by the middle layer tells the core that the middle
>>> layer is ready for that and is going to cooperate.  If may_skip_resume
>>> had been set by default, that piece of information would have been
>>> missing.
>>
>> Huh, I still don't get that. Sorry.
>>
>> If the "may_skip_resume" is default set to true by the PM core,
>> wouldn't that just mean that the middle-layer needs to implement an
>> opt-out method, rather than opt-in. In principle to opt-out the
>> middle-layer needs to set may_skip_resume to false in suspend_noirq
>> phase, no?
>
> Yes, but if the middle-layer doesn't clear it, that may mean two
> things.  First, the middle layer is ready and so on.  Good.  Second,
> the middle layer is not aware of the whole thing.  Not good.  The core
> cannot tell.
>
> In the opt-in case, however, all is clear. :-)

Okay.

>
>> Then we only need to make sure drivers don't starts use
>> LEAVE_SUSPENDED, before we make sure the middle layers is adopted. But
>> that should not be a problem.
>>
>> The benefit would be that those middle layers that can cope with
>> LEAVE_SUSPENDED as of today don't need to change.
>
> I'm not sure if that's the case.
>
> The middle layer has to evaluate dev_pm_may_skip_resume() in
> ->resume_noirq() to check if the device can be left in suspend, as it
> cannot determine that in ->suspend_noirq() yet.

Right. Okay, let's stick with the chosen method.

Kind regards
Uffe
diff mbox series

Patch

Index: linux-pm/include/linux/pm.h
===================================================================
--- linux-pm.orig/include/linux/pm.h
+++ linux-pm/include/linux/pm.h
@@ -559,6 +559,7 @@  struct pm_subsys_data {
  * NEVER_SKIP: Do not skip system suspend/resume callbacks for the device.
  * SMART_PREPARE: Check the return value of the driver's ->prepare callback.
  * SMART_SUSPEND: No need to resume the device from runtime suspend.
+ * LEAVE_SUSPENDED: Avoid resuming the device during system resume if possible.
  *
  * Setting SMART_PREPARE instructs bus types and PM domains which may want
  * system suspend/resume callbacks to be skipped for the device to return 0 from
@@ -572,10 +573,14 @@  struct pm_subsys_data {
  * necessary from the driver's perspective.  It also may cause them to skip
  * invocations of the ->suspend_late and ->suspend_noirq callbacks provided by
  * the driver if they decide to leave the device in runtime suspend.
+ *
+ * Setting LEAVE_SUSPENDED informs the PM core and middle-layer code that the
+ * driver prefers the device to be left in runtime suspend after system resume.
  */
-#define DPM_FLAG_NEVER_SKIP	BIT(0)
-#define DPM_FLAG_SMART_PREPARE	BIT(1)
-#define DPM_FLAG_SMART_SUSPEND	BIT(2)
+#define DPM_FLAG_NEVER_SKIP		BIT(0)
+#define DPM_FLAG_SMART_PREPARE		BIT(1)
+#define DPM_FLAG_SMART_SUSPEND		BIT(2)
+#define DPM_FLAG_LEAVE_SUSPENDED	BIT(3)
 
 struct dev_pm_info {
 	pm_message_t		power_state;
@@ -597,6 +602,8 @@  struct dev_pm_info {
 	bool			wakeup_path:1;
 	bool			syscore:1;
 	bool			no_pm_callbacks:1;	/* Owned by the PM core */
+	unsigned int		must_resume:1;	/* Owned by the PM core */
+	unsigned int		may_skip_resume:1;	/* Set by subsystems */
 #else
 	unsigned int		should_wakeup:1;
 #endif
@@ -765,6 +772,7 @@  extern int pm_generic_poweroff_late(stru
 extern int pm_generic_poweroff(struct device *dev);
 extern void pm_generic_complete(struct device *dev);
 
+extern bool dev_pm_may_skip_resume(struct device *dev);
 extern bool dev_pm_smart_suspend_and_suspended(struct device *dev);
 
 #else /* !CONFIG_PM_SLEEP */
Index: linux-pm/drivers/base/power/main.c
===================================================================
--- linux-pm.orig/drivers/base/power/main.c
+++ linux-pm/drivers/base/power/main.c
@@ -528,6 +528,18 @@  static void dpm_watchdog_clear(struct dp
 /*------------------------- Resume routines -------------------------*/
 
 /**
+ * dev_pm_may_skip_resume - System-wide device resume optimization check.
+ * @dev: Target device.
+ *
+ * Checks whether or not the device may be left in suspend after a system-wide
+ * transition to the working state.
+ */
+bool dev_pm_may_skip_resume(struct device *dev)
+{
+	return !dev->power.must_resume && pm_transition.event != PM_EVENT_RESTORE;
+}
+
+/**
  * device_resume_noirq - Execute a "noirq resume" callback for given device.
  * @dev: Device to handle.
  * @state: PM transition of the system being carried out.
@@ -575,6 +587,12 @@  static int device_resume_noirq(struct de
 	error = dpm_run_callback(callback, dev, state, info);
 	dev->power.is_noirq_suspended = false;
 
+	if (dev_pm_may_skip_resume(dev)) {
+		pm_runtime_set_suspended(dev);
+		dev->power.is_late_suspended = false;
+		dev->power.is_suspended = false;
+	}
+
  Out:
 	complete_all(&dev->power.completion);
 	TRACE_RESUME(error);
@@ -1076,6 +1094,22 @@  static pm_message_t resume_event(pm_mess
 	return PMSG_ON;
 }
 
+static void dpm_superior_set_must_resume(struct device *dev)
+{
+	struct device_link *link;
+	int idx;
+
+	if (dev->parent)
+		dev->parent->power.must_resume = true;
+
+	idx = device_links_read_lock();
+
+	list_for_each_entry_rcu(link, &dev->links.suppliers, c_node)
+		link->supplier->power.must_resume = true;
+
+	device_links_read_unlock(idx);
+}
+
 /**
  * __device_suspend_noirq - Execute a "noirq suspend" callback for given device.
  * @dev: Device to handle.
@@ -1127,10 +1161,28 @@  static int __device_suspend_noirq(struct
 	}
 
 	error = dpm_run_callback(callback, dev, state, info);
-	if (!error)
-		dev->power.is_noirq_suspended = true;
-	else
+	if (error) {
 		async_error = error;
+		goto Complete;
+	}
+
+	dev->power.is_noirq_suspended = true;
+
+	if (dev_pm_test_driver_flags(dev, DPM_FLAG_LEAVE_SUSPENDED)) {
+		/*
+		 * The only safe strategy here is to require that if the device
+		 * may not be left in suspend, resume callbacks must be invoked
+		 * for it.
+		 */
+		dev->power.must_resume = dev->power.must_resume ||
+					!dev->power.may_skip_resume ||
+					atomic_read(&dev->power.usage_count);
+	} else {
+		dev->power.must_resume = true;
+	}
+
+	if (dev->power.must_resume)
+		dpm_superior_set_must_resume(dev);
 
 Complete:
 	complete_all(&dev->power.completion);
@@ -1487,6 +1539,9 @@  static int __device_suspend(struct devic
 		dev->power.direct_complete = false;
 	}
 
+	dev->power.may_skip_resume = false;
+	dev->power.must_resume = false;
+
 	dpm_watchdog_set(&wd, dev);
 	device_lock(dev);
 
@@ -1652,8 +1707,9 @@  static int device_prepare(struct device
 	if (dev->power.syscore)
 		return 0;
 
-	WARN_ON(dev_pm_test_driver_flags(dev, DPM_FLAG_SMART_SUSPEND) &&
-		!pm_runtime_enabled(dev));
+	WARN_ON(!pm_runtime_enabled(dev) &&
+		dev_pm_test_driver_flags(dev, DPM_FLAG_SMART_SUSPEND |
+					      DPM_FLAG_LEAVE_SUSPENDED));
 
 	/*
 	 * If a device's parent goes into runtime suspend at the wrong time,
Index: linux-pm/Documentation/driver-api/pm/devices.rst
===================================================================
--- linux-pm.orig/Documentation/driver-api/pm/devices.rst
+++ linux-pm/Documentation/driver-api/pm/devices.rst
@@ -788,6 +788,26 @@  must reflect the "active" status for run
 
 During system-wide resume from a sleep state it's easiest to put devices into
 the full-power state, as explained in :file:`Documentation/power/runtime_pm.txt`.
-Refer to that document for more information regarding this particular issue as
+[Refer to that document for more information regarding this particular issue as
 well as for information on the device runtime power management framework in
-general.
+general.]
+
+However, it may be desirable to leave some devices in runtime suspend after
+system transitions to the working state and device drivers can use the
+``DPM_FLAG_LEAVE_SUSPENDED`` flag to indicate to the PM core (and middle-layer
+code) that this is the case.  Whether or not the devices will actually be left
+in suspend may depend on their state before the given system suspend-resume
+cycle and on the type of the system transition under way.  In particular,
+devices are not left suspended if that transition is a restore from hibernation,
+as device states are not guaranteed to be reflected by the information stored in
+the hibernation image in that case.
+
+The middle-layer code involved in the handling of the device has to indicate to
+the PM core if the device may be left in suspend with the help of its
+:c:member:`power.may_skip_resume` status bit.  That has to happen in the "noirq"
+phase of the preceding system-wide suspend (or analogous) transition.  The
+middle layer is then responsible for handling the device as appropriate in its
+"noirq" resume callback, which is executed regardless of whether or not the
+device may be left suspended, but the other resume callbacks (except for
+``->complete``) will be skipped automatically by the PM core if the device
+really can be left in suspend.