Patchwork Do we need asynchronous pm_runtime_get()? (was: Re: bisected regression ...)

login
register
mail settings
Submitter Rafael J. Wysocki
Date Aug. 6, 2012, 8:30 p.m.
Message ID <201208062230.58553.rjw@sisk.pl>
Download mbox | patch
Permalink /patch/175446/
State Not Applicable
Headers show

Comments

Rafael J. Wysocki - Aug. 6, 2012, 8:30 p.m.
On Monday, August 06, 2012, Alan Stern wrote:
> On Sun, 5 Aug 2012, Rafael J. Wysocki wrote:
[...]
> 
> > What about the appended experimental patch?
> 
> For now, I think it would be best to start with a single func argument.  
> If it turns out that anyone really needs to have two separate
> arguments, the single-argument form can be reimplemented on top of the
> two-argument form easily enough.

All right.

> > @@ -484,6 +484,15 @@ static int rpm_suspend(struct device *de
> >  	goto out;
> >  }
> >  
> > +void rpm_queue_up_resume(struct device *dev)
> > +{
> > +	dev->power.request = RPM_REQ_RESUME;
> > +	if (!dev->power.request_pending) {
> > +		dev->power.request_pending = true;
> > +		queue_work(pm_wq, &dev->power.work);
> > +	}
> > +}
> > +
> >  /**
> >   * rpm_resume - Carry out runtime resume of given device.
> >   * @dev: Device to resume.
> > @@ -524,7 +533,9 @@ static int rpm_resume(struct device *dev
> >  	 * rather than cancelling it now only to restart it again in the near
> >  	 * future.
> >  	 */
> > -	dev->power.request = RPM_REQ_NONE;
> > +	if (dev->power.request != RPM_REQ_RESUME || !dev->power.func)
> > +		dev->power.request = RPM_REQ_NONE;
> > +
> >  	if (!dev->power.timer_autosuspends)
> >  		pm_runtime_deactivate_timer(dev);
> >  
> > @@ -533,6 +544,12 @@ static int rpm_resume(struct device *dev
> >  		goto out;
> >  	}
> >  
> > +	if (dev->power.func && (rpmflags & RPM_ASYNC)) {
> > +		rpm_queue_up_resume(dev);
> > +		retval = 0;
> > +		goto out;
> > +	}
> > +
> >  	if (dev->power.runtime_status == RPM_RESUMING
> >  	    || dev->power.runtime_status == RPM_SUSPENDING) {
> >  		DEFINE_WAIT(wait);
> 
> All those changes (and some of the following ones) are symptoms of a
> basic mistake in this approach.

Every time you say something like this (i.e. liks someone who knows better)
I kind of feel like being under attach, which I hope is not your intention.
Never mind. :-)

Those changes are there for different reasons rather unrelated to the way
func() is being called, so let me explain.

First, rpm_queue_up_resume() is introduced, because the code it contains will
have to be called in two different places.  At least I don't see how to avoid
that without increasing the code complexity too much.

Second, the following change in rpm_resume()

-	dev->power.request = RPM_REQ_NONE;
+	if (dev->power.request != RPM_REQ_RESUME || !dev->power.func)
+		dev->power.request = RPM_REQ_NONE;

is made to prevent rpm_resume() from canceling the execution of func() queued
up by an earlier pm_runtime_get_and_call().  I believe it is necessary.

Finally, this change in rpm_resume():
 
+	if (dev->power.func && (rpmflags & RPM_ASYNC)) {
+		rpm_queue_up_resume(dev);
+		retval = 0;
+		goto out;
+	}

is not strictly necessary if pm_runtime_get_and_call() is modified to run
rpm_queue_up_resume() directly, like in the new version of the patch which is
appended.  This reduces the number of checks overall, so perhaps it's better.

> The idea of this new feature is to
> call "func" as soon as we know the device is at full power, no matter
> how it got there.

Yes, it is so.

> That means we should call it near the end of
> rpm_resume() (just before the rpm_idle() call), not from within
> pm_runtime_work().
> 
> Doing it this way will be more efficient and (I think) will remove
> some races.

Except that func() shouldn't be executed under dev->power.lock, which makes it
rather difficult to call it from rpm_resume().  Or at least it seems so.

Moreover, it should be called after we've changed the status to RPM_ACTIVE
_and_ dropped the lock.

Besides, I'd like to know what races you're referring to (perhaps you're seeing
some more of them than I am).

> > @@ -877,6 +903,48 @@ int __pm_runtime_resume(struct device *d
> >  }
> >  EXPORT_SYMBOL_GPL(__pm_runtime_resume);
> >  
> > +int pm_runtime_get_and_call(struct device *dev, void (*func)(struct device *),
> > +			     void (*func_async)(struct device *))
> > +{
> > +	unsigned long flags;
> > +	int ret;
> > +
> > +	atomic_inc(&dev->power.usage_count);
> > +
> > +	spin_lock_irqsave(&dev->power.lock, flags);
> > +
> > +	ret = dev->power.runtime_error;
> > +	if (ret) {
> > +		func = NULL;
> > +		goto out;
> > +	}
> > +
> > +	if (dev->power.runtime_status != RPM_ACTIVE
> > +	    && dev->power.disable_depth == 0)
> > +		func = NULL;
> > +
> > +	if (!func && func_async) {
> > +		if (dev->power.func) {
> > +			ret = -EAGAIN;
> > +			goto out;
> > +		} else {
> > +			dev->power.func = func_async;
> > +		}
> > +	}
> 
> The logic here is kind of hard to follow.  It will be simpler when
> there's only one "func":
> 
> 	If the status is RPM_ACTIVE or disable_depth > 0 then
> 	call "func" directly.
> 
> 	Otherwise save "func" in dev.power and do an async resume.

Yeah.  But do not run func() under power.lock, right?

> Some more things:
> 
> In __pm_runtime_set_status(), if power.func is set then I think we
> should call it if the new status is ACTIVE.

__pm_runtime_set_status() only works if power.disable_depth > 0, in
which case func() will be called synchronously.  Now, if someone does
pm_runtime_get_and_call() immediately followed by pm_runtime_disable()
(perhaps from a different thread), then __pm_runtime_disable() will
resume the device, so it's better to call func() from there if set,
I think.  And clear power.func afterwards.

> Likwise at the end of rpm_suspend(), if the suspend fails.

The only way by which power.func can be different from NULL at that point
is when the work item queued up by pm_runtime_get_and_call() runs after
rpm_suspend() has changed the status to RPM_SUSPENDING.  However in that
case the rpm_resume(dev, 0) in pm_runtime_work() will wait for the
suspend to complete (or fail) and func() will be run when it returns.

> There should be an invariant: Whenever the status is RPM_ACTIVE,
> power.func must be NULL.

Well, maybe so, but I don't see why right now.

> pm_runtime_barrier() should always clear power.func, even if the 
> rpm_resume() call fails.

Wouldn't it be better to wait for func() to be run?

> The documentation should explain that in some ways, "func" is like a
> workqueue callback routine: Subsystems and drivers have to be careful
> to make sure that it can't be invoked after the device is unbound.  A
> safe way to do this is to call pm_runtime_barrier() from within the
> driver's .remove() routine, after making sure that
> pm_runtime_get_and_call() won't be invoked any more.

Sure.

The appended patch doesn't contain any changes in __pm_runtime_disable()
or pm_runtime_barrier(), but you are right that they are necessary.

Thanks,
Rafael


---
 drivers/base/power/runtime.c |   98 ++++++++++++++++++++++++++++++++++++++-----
 include/linux/pm.h           |    1 
 include/linux/pm_runtime.h   |   11 ++++
 3 files changed, 99 insertions(+), 11 deletions(-)


--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rafael J. Wysocki - Aug. 7, 2012, 12:28 p.m.
On Monday, August 06, 2012, Rafael J. Wysocki wrote:
> On Monday, August 06, 2012, Alan Stern wrote:
> > On Sun, 5 Aug 2012, Rafael J. Wysocki wrote:
> [...]
> > 
> > > What about the appended experimental patch?
> > 
> > For now, I think it would be best to start with a single func argument.  
> > If it turns out that anyone really needs to have two separate
> > arguments, the single-argument form can be reimplemented on top of the
> > two-argument form easily enough.
> 
> All right.
> 
> > > @@ -484,6 +484,15 @@ static int rpm_suspend(struct device *de
> > >  	goto out;
> > >  }
> > >  
> > > +void rpm_queue_up_resume(struct device *dev)
> > > +{
> > > +	dev->power.request = RPM_REQ_RESUME;
> > > +	if (!dev->power.request_pending) {
> > > +		dev->power.request_pending = true;
> > > +		queue_work(pm_wq, &dev->power.work);
> > > +	}
> > > +}
> > > +
> > >  /**
> > >   * rpm_resume - Carry out runtime resume of given device.
> > >   * @dev: Device to resume.
> > > @@ -524,7 +533,9 @@ static int rpm_resume(struct device *dev
> > >  	 * rather than cancelling it now only to restart it again in the near
> > >  	 * future.
> > >  	 */
> > > -	dev->power.request = RPM_REQ_NONE;
> > > +	if (dev->power.request != RPM_REQ_RESUME || !dev->power.func)
> > > +		dev->power.request = RPM_REQ_NONE;
> > > +
> > >  	if (!dev->power.timer_autosuspends)
> > >  		pm_runtime_deactivate_timer(dev);
> > >  
> > > @@ -533,6 +544,12 @@ static int rpm_resume(struct device *dev
> > >  		goto out;
> > >  	}
> > >  
> > > +	if (dev->power.func && (rpmflags & RPM_ASYNC)) {
> > > +		rpm_queue_up_resume(dev);
> > > +		retval = 0;
> > > +		goto out;
> > > +	}
> > > +
> > >  	if (dev->power.runtime_status == RPM_RESUMING
> > >  	    || dev->power.runtime_status == RPM_SUSPENDING) {
> > >  		DEFINE_WAIT(wait);
> > 
> > All those changes (and some of the following ones) are symptoms of a
> > basic mistake in this approach.
> 
> Every time you say something like this (i.e. liks someone who knows better)

s/liks/like/

> I kind of feel like being under attach, which I hope is not your intention.

s/attach/attack/

Two typos in one sentence, I guess it could have been worse ...

> Never mind. :-)
> 
> Those changes are there for different reasons rather unrelated to the way
> func() is being called, so let me explain.
> 
> First, rpm_queue_up_resume() is introduced, because the code it contains will
> have to be called in two different places.  At least I don't see how to avoid
> that without increasing the code complexity too much.
> 
> Second, the following change in rpm_resume()
> 
> -	dev->power.request = RPM_REQ_NONE;
> +	if (dev->power.request != RPM_REQ_RESUME || !dev->power.func)
> +		dev->power.request = RPM_REQ_NONE;
> 
> is made to prevent rpm_resume() from canceling the execution of func() queued
> up by an earlier pm_runtime_get_and_call().  I believe it is necessary.
> 
> Finally, this change in rpm_resume():
>  
> +	if (dev->power.func && (rpmflags & RPM_ASYNC)) {
> +		rpm_queue_up_resume(dev);
> +		retval = 0;
> +		goto out;
> +	}
> 
> is not strictly necessary if pm_runtime_get_and_call() is modified to run
> rpm_queue_up_resume() directly, like in the new version of the patch which is
> appended.  This reduces the number of checks overall, so perhaps it's better.
> 
> > The idea of this new feature is to
> > call "func" as soon as we know the device is at full power, no matter
> > how it got there.
> 
> Yes, it is so.
> 
> > That means we should call it near the end of
> > rpm_resume() (just before the rpm_idle() call), not from within
> > pm_runtime_work().
> > 
> > Doing it this way will be more efficient and (I think) will remove
> > some races.
> 
> Except that func() shouldn't be executed under dev->power.lock, which makes it
> rather difficult to call it from rpm_resume().  Or at least it seems so.
> 
> Moreover, it should be called after we've changed the status to RPM_ACTIVE
> _and_ dropped the lock.

So we could drop the lock right before returning, execute func() and acquire
the lock once again, but then func() might be executed by any thread that
happened to resume the device.  In that case the caller of
pm_runtime_get_and_call() would have to worry about locks that such threads
might acquire and it would have to make sure that func() didn't try to acquire
them too.  That may not be a big deal, but if func() is executed by
pm_runtime_work(), that issue simply goes away.

Then, however, there's another issue: what should happen if
pm_runtime_get_and_call() finds that it cannot execute func() right away,
so it queues up resume and the execution of it, in the meantime some other
thread resumes the device synchronously and pm_runtime_get_and_call() is
run again.  I think in that case func() should be executed synchronously
and the one waiting for execution should be canceled.  The alternative
would be to return -EAGAIN from pm_runtime_get_and_call() and expect the
caller to cope with that, which isn't too attractive.

This actually is analogous to the case when pm_runtime_get_and_call()
sees that power.func is not NULL.  In my experimental patches it returned
-EAGAIN in that case, but perhaps it's better to replace the existing
power.func with the new one.  Then, by doing pm_runtime_get_and_call(dev, NULL)
we can ensure that either the previous func() has run already or it will never
run, which may be useful.

Well, I guess I need to prepare a new experimantal patch. :-)

I'll send it in a new thread I think.

Thanks,
Rafael
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alan Stern - Aug. 7, 2012, 5:15 p.m.
On Tue, 7 Aug 2012, Rafael J. Wysocki wrote:

> > > All those changes (and some of the following ones) are symptoms of a
> > > basic mistake in this approach.
> > 
> > Every time you say something like this (i.e. liks someone who knows better)
> 
> s/liks/like/
> 
> > I kind of feel like being under attach, which I hope is not your intention.
> 
> s/attach/attack/

Sorry; you're right.  It's all too easy to get very arrogant in email 
messages.  I'll try not to attack so strongly in the future.

> > > The idea of this new feature is to
> > > call "func" as soon as we know the device is at full power, no matter
> > > how it got there.
> > 
> > Yes, it is so.

Incidentally, that sentence is the justification for the invariance
condition mentioned later.  power.func should be called as soon as we
know the device is at full power; therefore when the status changes to
RPM_ACTIVE it should be called and then cleared (if it was set), and it
should never get set while the status is RPM_ACTIVE.  Therefore it
should never be true that power.func is set _and_ the status is
RPM_ACTIVE.

> > > That means we should call it near the end of
> > > rpm_resume() (just before the rpm_idle() call), not from within
> > > pm_runtime_work().
> > > 
> > > Doing it this way will be more efficient and (I think) will remove
> > > some races.
> > 
> > Except that func() shouldn't be executed under dev->power.lock, which makes it
> > rather difficult to call it from rpm_resume().  Or at least it seems so.
> > 
> > Moreover, it should be called after we've changed the status to RPM_ACTIVE
> > _and_ dropped the lock.
> 
> So we could drop the lock right before returning, execute func() and acquire
> the lock once again,

Yes; that's what I had in mind.  We already do something similar when 
calling pm_runtime_put(parent).

>  but then func() might be executed by any thread that
> happened to resume the device.  In that case the caller of
> pm_runtime_get_and_call() would have to worry about locks that such threads
> might acquire and it would have to make sure that func() didn't try to acquire
> them too.  That may not be a big deal, but if func() is executed by
> pm_runtime_work(), that issue simply goes away.

But then you have to worry about races between pm_runtime_resume() and
the workqueue.  If the device is resumed by some other thread, it
could be suspended again before "func" is called.

> Then, however, there's another issue: what should happen if
> pm_runtime_get_and_call() finds that it cannot execute func() right away,
> so it queues up resume and the execution of it, in the meantime some other
> thread resumes the device synchronously and pm_runtime_get_and_call() is
> run again.  I think in that case func() should be executed synchronously
> and the one waiting for execution should be canceled.  The alternative
> would be to return -EAGAIN from pm_runtime_get_and_call() and expect the
> caller to cope with that, which isn't too attractive.
> 
> This actually is analogous to the case when pm_runtime_get_and_call()
> sees that power.func is not NULL.  In my experimental patches it returned
> -EAGAIN in that case, but perhaps it's better to replace the existing
> power.func with the new one.  Then, by doing pm_runtime_get_and_call(dev, NULL)
> we can ensure that either the previous func() has run already or it will never
> run, which may be useful.

A good point.  I agree that pm_runtime_get_and_call() should always 
overwrite the existing power.func value.

There are a couple of other issues remaining.

What's the best approach when disable_count > 0?  My feeling is that we
should still rely on power.runtime_status as the best approximation to
the device's state, so we shouldn't call "func" directly unless the
status is already RPM_ACTIVE.  If the status is something else, we
can't queue an async resume request.  So we just set power.func and
return.  Eventually the driver will either call pm_runtime_set_active()
or pm_runtime_enable() followed by pm_runtime_resume(), at which time
we would call power.func.

Also, what should happen when power.runtime_error is set?  The same as
when disable_depth > 0?

You mentioned that pm_runtime_disable() does a resume if there's a 
pending resume request.  I had forgotten about this.  It worries me, 
because subsystems use code sequences like this:

	pm_runtime_disable(dev);
	pm_runtime_set_active(dev);
	pm_runtime_enable(dev);

in their system resume routines (in fact, we advise them to do so in
the Documentation file).  Now, it is unlikely for a resume request to
be pending during system sleep, but it doesn't seem to be impossible.  
When there is such a pending request, the pm_runtime_disable() call
will try to do a runtime resume at a time when the device has just been
restored to full power.  That's not good.

Probably this pattern occurs in few enough places that we could go
through and fix them all.  But how?  Should there be a new function:
pm_adjust_runtime_status_after_system_resume()?

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rafael J. Wysocki - Aug. 7, 2012, 9:31 p.m.
On Tuesday, August 07, 2012, Alan Stern wrote:
> On Tue, 7 Aug 2012, Rafael J. Wysocki wrote:
> 
> > > > All those changes (and some of the following ones) are symptoms of a
> > > > basic mistake in this approach.
> > > 
> > > Every time you say something like this (i.e. liks someone who knows better)
> > 
> > s/liks/like/
> > 
> > > I kind of feel like being under attach, which I hope is not your intention.
> > 
> > s/attach/attack/
> 
> Sorry; you're right.  It's all too easy to get very arrogant in email 
> messages.  I'll try not to attack so strongly in the future.

Thanks!

> > > > The idea of this new feature is to
> > > > call "func" as soon as we know the device is at full power, no matter
> > > > how it got there.
> > > 
> > > Yes, it is so.
> 
> Incidentally, that sentence is the justification for the invariance
> condition mentioned later.

:-)

> power.func should be called as soon as we
> know the device is at full power; therefore when the status changes to
> RPM_ACTIVE it should be called and then cleared (if it was set), and it
> should never get set while the status is RPM_ACTIVE.  Therefore it
> should never be true that power.func is set _and_ the status is
> RPM_ACTIVE.

I guess with the patch I've just sent:

http://marc.info/?l=linux-pm&m=134437366811066&w=4

it's almost the case, except when a synchronous resume happens before the work
item scheduled by __pm_runtime_get_and_call() is run.  However, I don't think
it is a problem in that case, because the device won't be suspended before the
execution of that work item starts (rpm_check_suspend_allowed() will see that
power.request_pending is set and that power.request is RPM_REQ_RESUME, so it
will return -EAGAIN).

> > > > That means we should call it near the end of
> > > > rpm_resume() (just before the rpm_idle() call), not from within
> > > > pm_runtime_work().
> > > > 
> > > > Doing it this way will be more efficient and (I think) will remove
> > > > some races.
> > > 
> > > Except that func() shouldn't be executed under dev->power.lock, which makes it
> > > rather difficult to call it from rpm_resume().  Or at least it seems so.
> > > 
> > > Moreover, it should be called after we've changed the status to RPM_ACTIVE
> > > _and_ dropped the lock.
> > 
> > So we could drop the lock right before returning, execute func() and acquire
> > the lock once again,
> 
> Yes; that's what I had in mind.  We already do something similar when 
> calling pm_runtime_put(parent).

Yes, we do.  However, I still don't think it's really safe to call func()
from rpm_resume(), because it may be run synchronously from a context
quite unrelated to the caller of __pm_runtime_get_and_call() (for example,
from the pm_runtime_barrier() in __device_suspend()).

> >  but then func() might be executed by any thread that
> > happened to resume the device.  In that case the caller of
> > pm_runtime_get_and_call() would have to worry about locks that such threads
> > might acquire and it would have to make sure that func() didn't try to acquire
> > them too.  That may not be a big deal, but if func() is executed by
> > pm_runtime_work(), that issue simply goes away.
> 
> But then you have to worry about races between pm_runtime_resume() and
> the workqueue.  If the device is resumed by some other thread, it
> could be suspended again before "func" is called.

No, it can't, if the device's usage count is incremented before dropping
power.lock after rpm_resume(dev, 0) has returned.

> > Then, however, there's another issue: what should happen if
> > pm_runtime_get_and_call() finds that it cannot execute func() right away,
> > so it queues up resume and the execution of it, in the meantime some other
> > thread resumes the device synchronously and pm_runtime_get_and_call() is
> > run again.  I think in that case func() should be executed synchronously
> > and the one waiting for execution should be canceled.  The alternative
> > would be to return -EAGAIN from pm_runtime_get_and_call() and expect the
> > caller to cope with that, which isn't too attractive.
> > 
> > This actually is analogous to the case when pm_runtime_get_and_call()
> > sees that power.func is not NULL.  In my experimental patches it returned
> > -EAGAIN in that case, but perhaps it's better to replace the existing
> > power.func with the new one.  Then, by doing pm_runtime_get_and_call(dev, NULL)
> > we can ensure that either the previous func() has run already or it will never
> > run, which may be useful.
> 
> A good point.  I agree that pm_runtime_get_and_call() should always 
> overwrite the existing power.func value.
> 
> There are a couple of other issues remaining.
> 
> What's the best approach when disable_count > 0?  My feeling is that we
> should still rely on power.runtime_status as the best approximation to
> the device's state, so we shouldn't call "func" directly unless the
> status is already RPM_ACTIVE.

Well, that's one possibility.  In that case, though, the caller may want
to run func() regardless of whether or not runtime PM is enabled for the given
device and that would require some serious trickery.  For this reason, in
the newest patch (http://marc.info/?l=linux-pm&m=134437366811066&w=4) the
caller can choose what to do. 

> If the status is something else, we
> can't queue an async resume request.  So we just set power.func and
> return.  Eventually the driver will either call pm_runtime_set_active()
> or pm_runtime_enable() followed by pm_runtime_resume(), at which time
> we would call power.func.
> 
> Also, what should happen when power.runtime_error is set?  The same as
> when disable_depth > 0?

I think so.

> You mentioned that pm_runtime_disable() does a resume if there's a 
> pending resume request.  I had forgotten about this.  It worries me, 
> because subsystems use code sequences like this:
> 
> 	pm_runtime_disable(dev);
> 	pm_runtime_set_active(dev);
> 	pm_runtime_enable(dev);
> 
> in their system resume routines (in fact, we advise them to do so in
> the Documentation file).  Now, it is unlikely for a resume request to
> be pending during system sleep, but it doesn't seem to be impossible.  
> When there is such a pending request, the pm_runtime_disable() call
> will try to do a runtime resume at a time when the device has just been
> restored to full power.  That's not good.

Well, they should do __pm_runtime_disable(dev, false), then.

> Probably this pattern occurs in few enough places that we could go
> through and fix them all.  But how?  Should there be a new function:
> pm_adjust_runtime_status_after_system_resume()?

I think the above would suffice.

Thanks,
Rafael
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch

Index: linux/drivers/base/power/runtime.c
===================================================================
--- linux.orig/drivers/base/power/runtime.c
+++ linux/drivers/base/power/runtime.c
@@ -484,6 +484,15 @@  static int rpm_suspend(struct device *de
 	goto out;
 }
 
+void rpm_queue_up_resume(struct device *dev)
+{
+	dev->power.request = RPM_REQ_RESUME;
+	if (!dev->power.request_pending) {
+		dev->power.request_pending = true;
+		queue_work(pm_wq, &dev->power.work);
+	}
+}
+
 /**
  * rpm_resume - Carry out runtime resume of given device.
  * @dev: Device to resume.
@@ -519,12 +528,18 @@  static int rpm_resume(struct device *dev
 		goto out;
 
 	/*
-	 * Other scheduled or pending requests need to be canceled.  Small
-	 * optimization: If an autosuspend timer is running, leave it running
-	 * rather than cancelling it now only to restart it again in the near
-	 * future.
+	 * Other scheduled or pending requests need to be canceled.  If the
+	 * execution of a function is queued up along with a resume request,
+	 * do not cancel it.
+	 */
+	if (dev->power.request != RPM_REQ_RESUME || !dev->power.func)
+		dev->power.request = RPM_REQ_NONE;
+
+	/*
+	 * Small optimization: If an autosuspend timer is running, leave it
+	 * running rather than cancelling it now only to restart it again in the
+	 * near future.
 	 */
-	dev->power.request = RPM_REQ_NONE;
 	if (!dev->power.timer_autosuspends)
 		pm_runtime_deactivate_timer(dev);
 
@@ -591,11 +606,7 @@  static int rpm_resume(struct device *dev
 
 	/* Carry out an asynchronous or a synchronous resume. */
 	if (rpmflags & RPM_ASYNC) {
-		dev->power.request = RPM_REQ_RESUME;
-		if (!dev->power.request_pending) {
-			dev->power.request_pending = true;
-			queue_work(pm_wq, &dev->power.work);
-		}
+		rpm_queue_up_resume(dev);
 		retval = 0;
 		goto out;
 	}
@@ -691,6 +702,7 @@  static int rpm_resume(struct device *dev
 static void pm_runtime_work(struct work_struct *work)
 {
 	struct device *dev = container_of(work, struct device, power.work);
+	void (*func)(struct device *) = NULL;
 	enum rpm_request req;
 
 	spin_lock_irq(&dev->power.lock);
@@ -715,12 +727,24 @@  static void pm_runtime_work(struct work_
 		rpm_suspend(dev, RPM_NOWAIT | RPM_AUTO);
 		break;
 	case RPM_REQ_RESUME:
-		rpm_resume(dev, RPM_NOWAIT);
+		func = dev->power.func;
+		if (func) {
+			dev->power.func = NULL;
+			pm_runtime_get_noresume(dev);
+			rpm_resume(dev, 0);
+		} else {
+			rpm_resume(dev, RPM_NOWAIT);
+		}
 		break;
 	}
 
  out:
 	spin_unlock_irq(&dev->power.lock);
+
+	if (func) {
+		func(dev);
+		pm_runtime_put(dev);
+	}
 }
 
 /**
@@ -877,6 +901,58 @@  int __pm_runtime_resume(struct device *d
 }
 EXPORT_SYMBOL_GPL(__pm_runtime_resume);
 
+int pm_runtime_get_and_call(struct device *dev, void (*func)(struct device *))
+{
+	unsigned long flags;
+	bool sync = false;
+	int ret;
+
+	atomic_inc(&dev->power.usage_count);
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	ret = dev->power.runtime_error;
+	if (ret)
+		goto out;
+
+	sync = dev->power.runtime_status == RPM_ACTIVE
+		|| dev->power.disable_depth > 0;
+
+	if (!sync) {
+		if (dev->power.func) {
+			ret = -EAGAIN;
+			goto out;
+		} else {
+			dev->power.func = func;
+		}
+	}
+
+	/*
+	 * The approach here is the same as in rpm_suspend(): autosuspend timers
+	 * will be activated shortly anyway, so it is pointless to cancel them
+	 * now.
+	 */
+	if (!dev->power.timer_autosuspends)
+		pm_runtime_deactivate_timer(dev);
+
+	if (sync)
+		dev->power.request = RPM_REQ_NONE;
+	else
+		rpm_queue_up_resume(dev);
+
+ out:
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+
+	if (sync) {
+		if (func)
+			func(dev);
+
+		return 1;
+	}
+
+	return ret;
+}
+
 /**
  * __pm_runtime_set_status - Set runtime PM status of a device.
  * @dev: Device to handle.
Index: linux/include/linux/pm.h
===================================================================
--- linux.orig/include/linux/pm.h
+++ linux/include/linux/pm.h
@@ -547,6 +547,7 @@  struct dev_pm_info {
 	unsigned long		suspended_jiffies;
 	unsigned long		accounting_timestamp;
 	struct dev_pm_qos_request *pq_req;
+	void			(*func)(struct device *);
 #endif
 	struct pm_subsys_data	*subsys_data;  /* Owned by the subsystem. */
 	struct pm_qos_constraints *constraints;
Index: linux/include/linux/pm_runtime.h
===================================================================
--- linux.orig/include/linux/pm_runtime.h
+++ linux/include/linux/pm_runtime.h
@@ -47,6 +47,8 @@  extern void pm_runtime_set_autosuspend_d
 extern unsigned long pm_runtime_autosuspend_expiration(struct device *dev);
 extern void pm_runtime_update_max_time_suspended(struct device *dev,
 						 s64 delta_ns);
+extern int pm_runtime_get_and_call(struct device *dev,
+				   void (*func)(struct device *));
 
 static inline bool pm_children_suspended(struct device *dev)
 {
@@ -150,6 +152,15 @@  static inline void pm_runtime_set_autosu
 static inline unsigned long pm_runtime_autosuspend_expiration(
 				struct device *dev) { return 0; }
 
+static inline int pm_runtime_get_and_call(struct device *dev,
+					  void (*func)(struct device *))
+{
+	if (func)
+		func(dev);
+
+	return 1;
+}
+
 #endif /* !CONFIG_PM_RUNTIME */
 
 static inline int pm_runtime_idle(struct device *dev)