[1/2] scsi: sd: set ready_to_power_off for scsi disk

Message ID 1347522049-1836-2-git-send-email-aaron.lu@intel.com
State Not Applicable
Delegated to: David Miller

Commit Message

Aaron Lu Sept. 13, 2012, 7:40 a.m. UTC
The ready_to_power_off flag is used to give indication to ATA layer
if this device's power can be removed when runtime suspended.

This flag is determined by individual SCSI driver like sr, sd.

This flag is introduced to support zero power ODD. When ODD
is runtime suspended, it may not be OK to remove its power.

But for disk, it is always OK to be powered off, so set this flag.

Signed-off-by: Aaron Lu <aaron.lu@intel.com>
---
 drivers/scsi/sd.c | 1 +
 1 file changed, 1 insertion(+)
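
The one-line change to drivers/scsi/sd.c is not reproduced on this page. As a rough userspace illustration of the idea in the commit message, and not the actual kernel code, the flag could be modelled as below. Apart from ready_to_power_off, every name here is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a SCSI device. Only the flag name
 * ready_to_power_off comes from the commit message. */
struct model_sdev {
	bool ready_to_power_off;  /* set by the upper-level driver (sd, sr) */
	bool runtime_suspended;   /* tracked by runtime PM in a real kernel */
};

/* Disk driver model: the commit message claims a disk may always be
 * powered off, so the flag is set unconditionally. */
static void model_sd_probe(struct model_sdev *sdev)
{
	assert(sdev != NULL);
	sdev->ready_to_power_off = true;
}

/* ODD driver model: the decision is made case by case. The input
 * ok_to_power_off is an assumed parameter for illustration. */
static void model_sr_update(struct model_sdev *sdev, bool ok_to_power_off)
{
	assert(sdev != NULL);
	sdev->ready_to_power_off = ok_to_power_off;
}

/* ATA-layer model: power is removed only from a runtime suspended
 * device whose driver has allowed it. */
static bool model_can_remove_power(const struct model_sdev *sdev)
{
	return sdev->runtime_suspended && sdev->ready_to_power_off;
}
```

In this model a suspended disk always reports that its power may be removed, while a suspended ODD reports it only when its driver agrees.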

Comments

James Bottomley Sept. 13, 2012, 8:14 a.m. UTC | #1
On Thu, 2012-09-13 at 15:40 +0800, Aaron Lu wrote:
> The ready_to_power_off flag is used to give indication to ATA layer
> if this device's power can be removed when runtime suspended.
> 
> This flag is determined by individual SCSI driver like sr, sd.
> 
> This flag is introduced to support zero power ODD. When ODD
> is runtime suspended, it may not be OK to remove its power.
> 
> But for disk, it is always OK to be powered off, so set this flag.

It is? I may have missed this, but where do you flush the cache of write
back cache devices you're about to power off?

James


--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Aaron Lu Sept. 13, 2012, 8:23 a.m. UTC | #2
On 09/13/2012 04:14 PM, James Bottomley wrote:
> On Thu, 2012-09-13 at 15:40 +0800, Aaron Lu wrote:
>> The ready_to_power_off flag is used to give indication to ATA layer
>> if this device's power can be removed when runtime suspended.
>>
>> This flag is determined by individual SCSI driver like sr, sd.
>>
>> This flag is introduced to support zero power ODD. When ODD
>> is runtime suspended, it may not be OK to remove its power.
>>
>> But for disk, it is always OK to be powered off, so set this flag.
> 
> It is? I may have missed this, but where do you flush the cache of write
> back cache devices you're about to power off?

I suppose that is handled in sd_suspend callback, the power off happens
after a device is runtime suspended.

Thanks,
Aaron
James Bottomley Sept. 13, 2012, 8:37 a.m. UTC | #3
On Thu, 2012-09-13 at 16:23 +0800, Aaron Lu wrote:
> On 09/13/2012 04:14 PM, James Bottomley wrote:
> > On Thu, 2012-09-13 at 15:40 +0800, Aaron Lu wrote:
> >> The ready_to_power_off flag is used to give indication to ATA layer
> >> if this device's power can be removed when runtime suspended.
> >>
> >> This flag is determined by individual SCSI driver like sr, sd.
> >>
> >> This flag is introduced to support zero power ODD. When ODD
> >> is runtime suspended, it may not be OK to remove its power.
> >>
> >> But for disk, it is always OK to be powered off, so set this flag.
> > 
> > It is? I may have missed this, but where do you flush the cache of write
> > back cache devices you're about to power off?
> 
> I suppose that is handled in sd_suspend callback, the power off happens
> after a device is runtime suspended.

Well that would mean something is wrong somewhere:  For runtime power
management using idle timers and forced standby, there's no need to
flush the cache (if the drive goes into standby on its own as a result
of an idle timeout, the cache will never flush).  The cache needs to
flush before we power off the device: that's before the system goes into
S3, or now before you power it off at runtime.  Flushing the cache on
runtime transitions to standby will likely cause performance problems
since that happens quite often.

James


Aaron Lu Sept. 13, 2012, 8:49 a.m. UTC | #4
On 09/13/2012 04:37 PM, James Bottomley wrote:
> On Thu, 2012-09-13 at 16:23 +0800, Aaron Lu wrote:
>> On 09/13/2012 04:14 PM, James Bottomley wrote:
>>> On Thu, 2012-09-13 at 15:40 +0800, Aaron Lu wrote:
>>>> The ready_to_power_off flag is used to give indication to ATA layer
>>>> if this device's power can be removed when runtime suspended.
>>>>
>>>> This flag is determined by individual SCSI driver like sr, sd.
>>>>
>>>> This flag is introduced to support zero power ODD. When ODD
>>>> is runtime suspended, it may not be OK to remove its power.
>>>>
>>>> But for disk, it is always OK to be powered off, so set this flag.
>>>
>>> It is? I may have missed this, but where do you flush the cache of write
>>> back cache devices you're about to power off?
>>
>> I suppose that is handled in sd_suspend callback, the power off happens
>> after a device is runtime suspended.
> 
> Well that would mean something is wrong somewhere:  For runtime power
> management using idle timers and forced standby, there's no need to

The current mechanism for scsi disk runtime pm is based on open/close.
If some process has this block device open, it will be in the active
state; only when all open sessions have exited will it enter the runtime
suspend state.

> flush the cache (if the drive goes into standby on its own as a result
> of an idle timeout, the cache will never flush).  The cache needs to
> flush before we power off the device: that's before the system goes into
> S3, or now before you power it off at runtime.  Flushing the cache on
> runtime transitions to standby will likely cause performance problems
> since that happens quite often.

As explained above, it doesn't happen that often. In particular, a user
who has only one disk will have it mounted, so it can never enter the
runtime suspend state.

Thanks,
Aaron
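
The open/close-based scheme described in this message can be sketched as a simple usage counter. This is a minimal userspace model under stated assumptions, not the sd driver's code; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a disk whose runtime PM state follows opens. */
struct disk_model {
	int open_count;          /* current openers; a mount counts as one */
	bool runtime_suspended;  /* true once the last opener has gone away */
};

/* An open resumes the disk if needed and keeps it in the active state. */
static void disk_open(struct disk_model *d)
{
	d->open_count++;
	d->runtime_suspended = false;
}

/* Only the last close lets the disk enter the runtime suspend state. */
static void disk_close(struct disk_model *d)
{
	assert(d->open_count > 0);
	d->open_count--;
	if (d->open_count == 0)
		d->runtime_suspended = true;
}
```

With this model, a disk that holds a mounted filesystem never reaches an open count of zero, which is why a single-disk system would never see it suspend.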
James Bottomley Sept. 13, 2012, 8:56 a.m. UTC | #5
On Thu, 2012-09-13 at 16:49 +0800, Aaron Lu wrote:
> On 09/13/2012 04:37 PM, James Bottomley wrote:
> > On Thu, 2012-09-13 at 16:23 +0800, Aaron Lu wrote:
> >> On 09/13/2012 04:14 PM, James Bottomley wrote:
> >>> On Thu, 2012-09-13 at 15:40 +0800, Aaron Lu wrote:
> >>>> The ready_to_power_off flag is used to give indication to ATA layer
> >>>> if this device's power can be removed when runtime suspended.
> >>>>
> >>>> This flag is determined by individual SCSI driver like sr, sd.
> >>>>
> >>>> This flag is introduced to support zero power ODD. When ODD
> >>>> is runtime suspended, it may not be OK to remove its power.
> >>>>
> >>>> But for disk, it is always OK to be powered off, so set this flag.
> >>>
> >>> It is? I may have missed this, but where do you flush the cache of write
> >>> back cache devices you're about to power off?
> >>
> >> I suppose that is handled in sd_suspend callback, the power off happens
> >> after a device is runtime suspended.
> > 
> > Well that would mean something is wrong somewhere:  For runtime power
> > management using idle timers and forced standby, there's no need to
> 
> The current mechanism for scsi disk runtime pm is based on open/close.
> If some process has this block device open, it will be in the active
> state; only when all open sessions have exited will it enter the runtime
> suspend state.

A mounted disk is open for the period of the mount.  I thought the use
case for runtime PM was the laptop one but most laptops have a single
device to use as root, so if you never use runtime PM on an open device,
you never use it on 99% of our target systems ... doesn't that make the
feature a bit useless?

> > flush the cache (if the drive goes into standby on its own as a result
> > of an idle timeout, the cache will never flush).  The cache needs to
> > flush before we power off the device: that's before the system goes into
> > S3, or now before you power it off at runtime.  Flushing the cache on
> > runtime transitions to standby will likely cause performance problems
> > since that happens quite often.
> 
> As explained above, it doesn't happen that often. In particular, a user
> who has only one disk will have it mounted, so it can never enter the
> runtime suspend state.

So what's the target audience for the feature.  If it isn't laptops or
standard desktops, is it the enterprise?

James

Aaron Lu Sept. 13, 2012, 9:07 a.m. UTC | #6
On 09/13/2012 04:56 PM, James Bottomley wrote:
> On Thu, 2012-09-13 at 16:49 +0800, Aaron Lu wrote:
>> On 09/13/2012 04:37 PM, James Bottomley wrote:
>>> On Thu, 2012-09-13 at 16:23 +0800, Aaron Lu wrote:
>>>> On 09/13/2012 04:14 PM, James Bottomley wrote:
>>>>> On Thu, 2012-09-13 at 15:40 +0800, Aaron Lu wrote:
>>>>>> The ready_to_power_off flag is used to give indication to ATA layer
>>>>>> if this device's power can be removed when runtime suspended.
>>>>>>
>>>>>> This flag is determined by individual SCSI driver like sr, sd.
>>>>>>
>>>>>> This flag is introduced to support zero power ODD. When ODD
>>>>>> is runtime suspended, it may not be OK to remove its power.
>>>>>>
>>>>>> But for disk, it is always OK to be powered off, so set this flag.
>>>>>
>>>>> It is? I may have missed this, but where do you flush the cache of write
>>>>> back cache devices you're about to power off?
>>>>
>>>> I suppose that is handled in sd_suspend callback, the power off happens
>>>> after a device is runtime suspended.
>>>
>>> Well that would mean something is wrong somewhere:  For runtime power
>>> management using idle timers and forced standby, there's no need to
>>
>> The current mechanism for scsi disk runtime pm is based on open/close.
>> If some process has this block device open, it will be in the active
>> state; only when all open sessions have exited will it enter the runtime
>> suspend state.
> 
> A mounted disk is open for the period of the mount.  I thought the use
> case for runtime PM was the laptop one but most laptops have a single
> device to use as root, so if you never use runtime PM on an open device,
> you never use it on 99% of our target systems ... doesn't that make the
> feature a bit useless?

I agree, but it may be helpful in some cases.

> 
>>> flush the cache (if the drive goes into standby on its own as a result
>>> of an idle timeout, the cache will never flush).  The cache needs to
>>> flush before we power off the device: that's before the system goes into
>>> S3, or now before you power it off at runtime.  Flushing the cache on
>>> runtime transitions to standby will likely cause performance problems
>>> since that happens quite often.
>>
>> As explained above, it doesn't happen that often. In particular, a user
>> who has only one disk will have it mounted, so it can never enter the
>> runtime suspend state.
> 
> So what's the target audience for the feature.  If it isn't laptops or
> standard desktops, is it the enterprise?

To make this feature useful for normal laptop users, a better mechanism
for scsi disk runtime pm is needed. Alan Stern and Lin Ming have been
working on this, and I'll see if I can make that patch work later.

So I think this is basically 2 things, one is the runtime suspend of the
disk, another is when it is runtime suspended, how to remove its power.
I'm currently doing the latter one, which is simpler, so I want to do it
first :-)

And there may be some cases where this is helpful, e.g. if a user has 2
or more disks attached and is only using one of them, or some other
corner cases that I don't know of.

Considering that the effort to implement this feature is pretty small,
and it shouldn't cause trouble for existing systems, I think this may be
worth it.

Thanks,
Aaron
James Bottomley Sept. 13, 2012, 9:26 a.m. UTC | #7
On Thu, 2012-09-13 at 17:07 +0800, Aaron Lu wrote:
> On 09/13/2012 04:56 PM, James Bottomley wrote:
> > So what's the target audience for the feature.  If it isn't laptops or
> > standard desktops, is it the enterprise?
> 
> To make this feature useful for normal laptop users, a better mechanism
> for scsi disk runtime pm is needed. Alan Stern and Lin Ming have been
> working on this, and I'll see if I can make that patch work later.
> 
> So I think this is basically 2 things, one is the runtime suspend of the
> disk, another is when it is runtime suspended, how to remove its power.
> I'm currently doing the latter one, which is simpler, so I want to do it
> first :-)

Well, I don't like the way the interaction of the patches is going.
You're the one proposing powering down the device outside of the
standards defined transitions, so you need to be responsible for the
actions that necessitates, including synchronizing the cache.  The specs
(SPC-4) say that cache management is explicitly unnecessary for the
standard SCSI power states (Active, Idle, Standby and Stopped), so
someone at some point is going to read that and remove the unnecessary
cache sync in the code.  When that happens, you'll start getting data
loss.

James

Oliver Neukum Sept. 13, 2012, 10:16 a.m. UTC | #8
On Thursday 13 September 2012 10:26:44 James Bottomley wrote:
> On Thu, 2012-09-13 at 17:07 +0800, Aaron Lu wrote:

> > So I think this is basically 2 things, one is the runtime suspend of the
> > disk, another is when it is runtime suspended, how to remove its power.
> > I'm currently doing the latter one, which is simpler, so I want to do it
> > first :-)
> 
> Well, I don't like the way the interaction of the patches is going.
> You're the one proposing powering down the device outside of the
> standards defined transitions, so you need to be responsible for the
> actions that necessitates, including synchronizing the cache.  The specs
> (SPC-4) say that cache management is explicitly unnecessary for the
> standard SCSI power states (Active, Idle, Standby and Stopped), so
> someone at some point is going to read that and remove the unnecessary
> cache sync in the code.  When that happens, you'll start getting data
> loss.

The cache is handled identically in sd_suspend() and sd_shutdown().
In fact sd_shutdown() will skip handling it if the device has already been
suspended, so the assumption is built into the code and has been so
for a long time.

Though it wouldn't hurt to add a comment that says that the system going
to S3 or S4 will cut power to a lot of disks so that the cache needs to be synced
even if the spec says we need not. Runtime PM doesn't much alter the
situation.

	Regards
		Oliver
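
The coupling between suspend and shutdown described in this message can be modelled in a few lines. This is a hedged userspace sketch, not the code in drivers/scsi/sd.c; all names are hypothetical and the flush is reduced to a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical disk model: counts how many cache flushes were issued. */
struct model_disk {
	bool suspended;
	int flushes;
};

/* Stand-in for issuing a cache flush to the device. */
static void model_sync_cache(struct model_disk *d)
{
	d->flushes++;
}

/* Suspend path: flush, then mark the device suspended. */
static void model_suspend(struct model_disk *d)
{
	assert(d != NULL);
	if (d->suspended)
		return;
	model_sync_cache(d);
	d->suspended = true;
}

/* Shutdown path: skip the flush if suspend has already done it. */
static void model_shutdown(struct model_disk *d)
{
	assert(d != NULL);
	if (d->suspended)
		return;
	model_sync_cache(d);
}
```

The model makes the dependency visible: if the flush were ever dropped from the suspend path, a suspended disk would reach shutdown with no flush at all, which is the data-loss scenario raised earlier in the thread.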

James Bottomley Sept. 13, 2012, 10:51 a.m. UTC | #9
On Thu, 2012-09-13 at 12:16 +0200, Oliver Neukum wrote:
> On Thursday 13 September 2012 10:26:44 James Bottomley wrote:
> > On Thu, 2012-09-13 at 17:07 +0800, Aaron Lu wrote:
> 
> > > So I think this is basically 2 things, one is the runtime suspend of the
> > > disk, another is when it is runtime suspended, how to remove its power.
> > > I'm currently doing the latter one, which is simpler, so I want to do it
> > > first :-)
> > 
> > Well, I don't like the way the interaction of the patches is going.
> > You're the one proposing powering down the device outside of the
> > standards defined transitions, so you need to be responsible for the
> > actions that necessitates, including synchronizing the cache.  The specs
> > (SPC-4) say that cache management is explicitly unnecessary for the
> > standard SCSI power states (Active, Idle, Standby and Stopped), so
> > someone at some point is going to read that and remove the unnecessary
> > cache sync in the code.  When that happens, you'll start getting data
> > loss.
> 
> The cache is handled identically in sd_suspend() and sd_shutdown().
> In fact sd_shutdown() will skip handling it if the device has already been
> suspended, so the assumption is built into the code and has been so
> for a long time.
> 
> Though it wouldn't hurt to add a comment that says that the system going
> to S3 or S4 will cut power to a lot of disks so that the cache needs to be synced
> even if the spec says we need not. Runtime PM doesn't much alter the
> situation.

I think you're confusing two things.  Sleep states (S3 and S4) aren't
spec'd in SCSI, so we have to take care of everything (including the
cache before power off) because they're done invisibly to the disk.  The
same tends to go for link power management, which was previously our
only form of runtime PM, but which doesn't actually affect the disk at
all and, of course, ACPI power off of devices (ZPDD).

Disk runtime power states are defined in the standard and so we rely on
the standard taking care of the cache.  I suspect the most efficient use
may be via the power management mode page, which does everything
automatically on timers (you just get to set the timer interval, plus
some transports *may* require an initialising command which we already
have some provision for) than doing it all ourselves from block.

James


Oliver Neukum Sept. 13, 2012, 12:34 p.m. UTC | #10
On Thursday 13 September 2012 11:51:07 James Bottomley wrote:
> On Thu, 2012-09-13 at 12:16 +0200, Oliver Neukum wrote:
> > On Thursday 13 September 2012 10:26:44 James Bottomley wrote:
> > > On Thu, 2012-09-13 at 17:07 +0800, Aaron Lu wrote:
> > 
> > > > So I think this is basically 2 things, one is the runtime suspend of the
> > > > disk, another is when it is runtime suspended, how to remove its power.
> > > > I'm currently doing the latter one, which is simpler, so I want to do it
> > > > first :-)
> > > 
> > > Well, I don't like the way the interaction of the patches is going.
> > > You're the one proposing powering down the device outside of the
> > > standards defined transitions, so you need to be responsible for the
> > > actions that necessitates, including synchronizing the cache.  The specs
> > > (SPC-4) say that cache management is explicitly unnecessary for the
> > > standard SCSI power states (Active, Idle, Standby and Stopped), so
> > > someone at some point is going to read that and remove the unnecessary
> > > cache sync in the code.  When that happens, you'll start getting data
> > > loss.
> > 
> > The cache is handled identically in sd_suspend() and sd_shutdown().
> > In fact sd_shutdown() will skip handling it if the device has already been
> > suspended, so the assumption is built into the code and has been so
> > for a long time.
> > 
> > Though it wouldn't hurt to add a comment that says that the system going
> > > to S3 or S4 will cut power to a lot of disks so that the cache needs to be synced
> > even if the spec says we need not. Runtime PM doesn't much alter the
> > situation.
> 
> I think you're confusing two things.  Sleep states (S3 and S4) aren't
> spec'd in SCSI, so we have to take care of everything (including the
> cache before power off) because they're done invisibly to the disk.  The

Yes, but this confusion is necessary. The driver core is supposed to
be generic and knows strictly speaking only suspended and active.
It is a driver's job to do what needs to be done and translate this
into the appropriate device states.

> same tends to go for link power management, which was previously our
> only form of runtime PM, but which doesn't actually affect the disk at
> all and, of course, ACPI power off of devices (ZPDD).

The latter however does cut power to the drive. So the driver should do
what it does when other operations that affect power are done.

> Disk runtime power states are defined in the standard and so we rely on
> the standard taking care of the cache.  I suspect the most efficient use
> may be via the power management mode page, which does everything
> automatically on timers (you just get to set the timer interval, plus
> some transports *may* require an initialising command which we already
> have some provision for) than doing it all ourselves from block.

Well, yes, but we need to support modes of power management that cut off
power to the disk in any case, so what does it matter if we also do it for
runtime PM?

Are you concerned about layering?

	Regards
		Oliver

Alan Stern Sept. 13, 2012, 4:24 p.m. UTC | #11
On Thu, 13 Sep 2012, Oliver Neukum wrote:

> > > > Well, I don't like the way the interaction of the patches is going.
> > > > You're the one proposing powering down the device outside of the
> > > > standards defined transitions, so you need to be responsible for the
> > > > actions that necessitates, including synchronizing the cache.  The specs
> > > > (SPC-4) say that cache management is explicitly unnecessary for the
> > > > standard SCSI power states (Active, Idle, Standby and Stopped), so
> > > > someone at some point is going to read that and remove the unnecessary
> > > > cache sync in the code.  When that happens, you'll start getting data
> > > > loss.
> > > 
> > > The cache is handled identically in sd_suspend() and sd_shutdown().
> > > In fact sd_shutdown() will skip handling it if the device has already been
> > > suspended, so the assumption is built into the code and has been so
> > > for a long time.
> > > 
> > > Though it wouldn't hurt to add a comment that says that the system going
> > > > to S3 or S4 will cut power to a lot of disks so that the cache needs to be synced
> > > even if the spec says we need not. Runtime PM doesn't much alter the
> > > situation.
> > 
> > I think you're confusing two things.  Sleep states (S3 and S4) aren't
> > spec'd in SCSI, so we have to take care of everything (including the
> > cache before power off) because they're done invisibly to the disk.  The
> 
> Yes, but this confusion is necessary. The driver core is supposed to
> be generic and knows strictly speaking only suspended and active.
> It is a driver's job to do what needs to be done and translate this
> into the appropriate device states.

Currently the sd driver's suspend routine is not very sophisticated.  
It needs to become smarter about the differences between system
suspend, runtime suspend, and power off.

> > same tends to go for link power management, which was previously our
> > only form of runtime PM, but which doesn't actually affect the disk at
> > all and, of course, ACPI power off of devices (ZPDD).
> 
> The latter however does cut power to the drive. So the driver should do
> what it does when other operations that affect power are done.
> 
> > Disk runtime power states are defined in the standard and so we rely on
> > the standard taking care of the cache.  I suspect the most efficient use
> > may be via the power management mode page, which does everything
> > automatically on timers (you just get to set the timer interval, plus
> > some transports *may* require an initialising command which we already
> > have some provision for) than doing it all ourselves from block.
> 
> Well, yes, but we need to support modes of power management that cut off
> power to the disk in any case, so what does it matter if we also do it for
> runtime PM?
> 
> Are you concerned about layering?

It sounds like James is partly concerned about efficiency.  If Lin
Ming's patches are merged then we will be doing runtime suspend
relatively often, not just when the device file is closed.  The
sd_suspend routine should know when SYNCHRONIZE CACHE is needed and
when it can be skipped.

From what I gather of this discussion, we can avoid flushing the cache 
during (1) a runtime suspend provided (2) the drive isn't going to be 
powered down.  If either (1) or (2) doesn't hold then the cache needs 
to be synchronized.

The problem with relying on the internal timers and the power
management mode page is that the transitions take place automatically
and the host system doesn't know about them.  We _want_ to know about
them so that the higher layers of the device tree can go to low power
when the disk does.

On the other hand, perhaps sd_suspend/sd_resume could use the mode page
by telling it to go into or out of Stopped mode immediately.

Alan Stern
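
The rule summarised in this message (skip the flush only for a runtime suspend in which the drive keeps its power) can be written as a small predicate. This is a sketch of the decision only, with hypothetical names; it does not show how sd_suspend would learn either input.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical classification of why the suspend routine was called. */
enum suspend_kind {
	SUSPEND_RUNTIME,    /* runtime PM transition */
	SUSPEND_SYSTEM,     /* system sleep such as S3 or S4 */
	SUSPEND_POWER_OFF   /* shutdown */
};

/* Returns true when the cache must be synchronized first.
 * The flush may be skipped only if both conditions hold:
 * (1) this is a runtime suspend, and
 * (2) the drive is not going to be powered down. */
static bool need_sync_cache(enum suspend_kind kind, bool will_cut_power)
{
	assert(kind == SUSPEND_RUNTIME || kind == SUSPEND_SYSTEM ||
	       kind == SUSPEND_POWER_OFF);
	if (kind == SUSPEND_RUNTIME && !will_cut_power)
		return false;
	return true;
}
```

Under this predicate a runtime suspend that removes power, as the patch series proposes, still requires the flush.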

Oliver Neukum Sept. 13, 2012, 8:18 p.m. UTC | #12
On Thursday 13 September 2012 12:24:46 Alan Stern wrote:
> On Thu, 13 Sep 2012, Oliver Neukum wrote:

> > Yes, but this confusion is necessary. The driver core is supposed to
> > be generic and knows strictly speaking only suspended and active.
> > It is a driver's job to do what needs to be done and translate this
> > into the appropriate device states.
> 
> Currently the sd driver's suspend routine is not very sophisticated.  
> It needs to become smarter about the differences between system
> suspend, runtime suspend, and power off.

In what way?

> > Well, yes, but we need to support modes of power management that cut off
> > power to the disk in any case, so what does it matter if we also do it for
> > runtime PM?
> > 
> > Are you concerned about layering?
> 
> It sounds like James is partly concerned about efficiency.  If Lin
> Ming's patches are merged then we will be doing runtime suspend
> relatively often, not just when the device file is closed.  The
> sd_suspend routine should know when SYNCHRONIZE CACHE is needed and
> when it can be skipped.

How? This depends on the hardware?

> From what I gather of this discussion, we can avoid flushing the cache 
> during (1) a runtime suspend provided (2) the drive isn't going to be 
> powered down.  If either (1) or (2) doesn't hold then the cache needs 
> to be synchronized.

This is true, but how is it relevant?

> The problem with relying on the internal timers and the power
> management mode page is that the transitions take place automatically
> and the host system doesn't know about them.  We _want_ to know about
> them so that the higher layers of the device tree can go to low power
> when the disk does.

Why would you want that to correlate? The operation of the controller
and the driver is independent of the state.
And what would it tell us, as the driver knows about all IO anyway?

	Regards
		Oliver


Alan Stern Sept. 13, 2012, 8:46 p.m. UTC | #13
On Thu, 13 Sep 2012, Oliver Neukum wrote:

> On Thursday 13 September 2012 12:24:46 Alan Stern wrote:
> > On Thu, 13 Sep 2012, Oliver Neukum wrote:
> 
> > > Yes, but this confusion is necessary. The driver core is supposed to
> > > be generic and knows strictly speaking only suspended and active.
> > > It is a driver's job to do what needs to be done and translate this
> > > into the appropriate device states.
> > 
> > Currently the sd driver's suspend routine is not very sophisticated.  
> > It needs to become smarter about the differences between system
> > suspend, runtime suspend, and power off.
> 
> In what way?

sd_suspend should know whether or not to issue the SYNCHRONIZE CACHE 
command.

> > It sounds like James is partly concerned about efficiency.  If Lin
> > Ming's patches are merged then we will be doing runtime suspend
> > relatively often, not just when the device file is closed.  The
> > sd_suspend routine should know when SYNCHRONIZE CACHE is needed and
> > when it can be skipped.
> 
> How? This depends on the hardware?

It depends partly on the hardware, partly on the type of suspend, and 
partly on the flag settings in sysfs.

> > From what I gather of this discussion, we can avoid flushing the cache 
> > during (1) a runtime suspend provided (2) the drive isn't going to be 
> > powered down.  If either (1) or (2) doesn't hold then the cache needs 
> > to be synchronized.
> 
> This is true, but how is it relevant?

This, or something like it, is the algorithm sd_suspend should use for 
determining whether or not to issue SYNCHRONIZE CACHE.

> > The problem with relying on the internal timers and the power
> > management mode page is that the transitions take place automatically
> > and the host system doesn't know about them.  We _want_ to know about
> > them so that the higher layers of the device tree can go to low power
> > when the disk does.
> 
> Why would you want that to correlate? The operation of the controller
> and the driver is independent of the state.

That's the problem -- I would like them not to be so independent.  The
reason stated above: If we know when the controller puts the drive in a
low-power state then we can tell the higher layers of the device tree
to go to low power at those times.

> And what would it tell us, as the driver knows about all IO anyway?

But the driver doesn't know when the controller has spun down the disk.  
That's something else sd_suspend has to worry about.

Alan Stern

Aaron Lu Sept. 14, 2012, 5:20 a.m. UTC | #14
On Thu, Sep 13, 2012 at 10:26:44AM +0100, James Bottomley wrote:
> On Thu, 2012-09-13 at 17:07 +0800, Aaron Lu wrote:
> > So I think this is basically 2 things, one is the runtime suspend of the
> > disk, another is when it is runtime suspended, how to remove its power.
> > I'm currently doing the latter one, which is simpler, so I want to do it
> > first :-)
> 
> Well, I don't like the way the interaction of the patches is going.
> You're the one proposing powering down the device outside of the
> standards defined transitions, so you need to be responsible for the
> actions that necessitates, including synchronizing the cache. The specs

OK, I'll update the code.

> (SPC-4) say that cache management is explicitly unnecessary for the
> standard SCSI power states (Active, Idle, Standby and Stopped), so

Just read the SPC-4 spec, in section 5.12.3, it has words like this:

Logical units that contain cache memory shall write all cached data to
the medium for the logical unit (e.g., as a logical unit would do in
response to a SYNCHRONIZE CACHE command as described in SBC-3) prior to
entering into any power condition that prevents accessing the
media (e.g., before a hard drive stops its spindle motor during a change
to the standby power condition).

So it looks like the cache needs to be synced before the device enters
the standby/stopped power condition. Or am I missing something?

> someone at some point is going to read that and remove the unnecessary
> cache sync in the code.  When that happens, you'll start getting data
> loss.

Indeed, I'll make sure the cache gets synced when we are about to power
off the device. Thanks for the reminder.

-Aaron
Aaron Lu Sept. 14, 2012, 6:57 a.m. UTC | #15
On Thu, Sep 13, 2012 at 12:24:46PM -0400, Alan Stern wrote:
> > > Disk runtime power states are defined in the standard and so we rely on
> > > the standard taking care of the cache.  I suspect the most efficient use
> > > may be via the power management mode page, which does everything
> > > automatically on timers (you just get to set the timer interval, plus
> > > some transports *may* require an initialising command which we already
> > > have some provision for) than doing it all ourselves from block.
> > 
> > Well, yes, but we need to support modes of power management that cut off
> > power to the disk in any case, so what does it matter if we also do it for
> > runtime PM?
> > 
> > Are you concerned about layering?
> 
> It sounds like James is partly concerned about efficiency.  If Lin
> Ming's patches are merged then we will be doing runtime suspend
> relatively often, not just when the device file is closed.  The
> sd_suspend routine should know when SYNCHRONIZE CACHE is needed and
> when it can be skipped.
> 
> From what I gather of this discussion, we can avoid flushing the cache 
> during (1) a runtime suspend provided (2) the drive isn't going to be 
> powered down.  If either (1) or (2) doesn't hold then the cache needs 
> to be synchronized.

Agree.
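
The rule quoted above reduces to a small predicate. A minimal sketch,
assuming a hypothetical struct suspend_ctx that carries the two facts the
rule needs; nothing here is an existing kernel interface, and how
sd_suspend would actually learn whether power is about to be removed is
exactly what this thread is still discussing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of one suspend call; not a kernel structure. */
struct suspend_ctx {
	bool runtime;		/* true: runtime suspend, false: system sleep */
	bool will_power_off;	/* true: power to the drive may be removed */
};

/*
 * Return true when the host has to issue SYNCHRONIZE CACHE itself.
 * The flush may be skipped only when (1) this is a runtime suspend
 * and (2) the drive is not going to be powered down.
 */
static bool need_sync_cache(const struct suspend_ctx *ctx)
{
	assert(ctx != NULL);
	return !(ctx->runtime && !ctx->will_power_off);
}
```

Only one of the four combinations (runtime suspend, power kept) skips the
flush; a system sleep always flushes, whatever happens to the power.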

> 
> The problem with relying on the internal timers and the power
> management mode page is that the transitions take place automatically
> and the host system doesn't know about them.  We _want_ to know about
> them so that the higher layers of the device tree can go to low power
> when the disk does.

Looks like it's not easy to know when the device entered a low power
state. Constantly polling with request sense doesn't seem to be a good
idea.

Without that knowledge, upper layer devices can't enter the runtime
suspend state and the device's power can't be cut.

> 
> On the other hand, perhaps sd_suspend/sd_resume could use the mode page
> by telling it to go into or out of Stopped mode immediately.

BTW, is it necessary to issue the stop command before we cut its power
either due to runtime power off or system entering S3/S4/S5?

Thanks,
Aaron
James Bottomley Sept. 14, 2012, 8:15 a.m. UTC | #16
On Thu, 2012-09-13 at 12:24 -0400, Alan Stern wrote:
> On Thu, 13 Sep 2012, Oliver Neukum wrote:
> > > Disk runtime power states are defined in the standard and so we rely on
> > > the standard taking care of the cache.  I suspect the most efficient use
> > > may be via the power management mode page, which does everything
> > > automatically on timers (you just get to set the timer interval, plus
> > > some transports *may* require an initialising command which we already
> > > have some provision for) than doing it all ourselves from block.
> > 
> > Well, yes, but we need to support modes of power management that cut off
> > power to the disk in any case, so what does it matter if we also do it for
> > runtime PM?
> > 
> > Are you concerned about layering?
> 
> It sounds like James is partly concerned about efficiency.

Sort of, but my main worry is correctness: I don't want a runtime suspend
path that requires a cache flush to depend on a flush that lives in a path
which doesn't require one, because efficiency dictates that at some time
or other the unnecessary flush will get removed (and then we'll start
corrupting data).

>   If Lin
> Ming's patches are merged then we will be doing runtime suspend
> relatively often, not just when the device file is closed.  The
> sd_suspend routine should know when SYNCHRONIZE CACHE is needed and
> when it can be skipped.

Keeping the flush in sd_suspend and making sure we know when to use it
would be fine by me as well ... I just need all the independent runtime
suspend patch authors to agree on this scheme.

> From what I gather of this discussion, we can avoid flushing the cache
> during (1) a runtime suspend provided (2) the drive isn't going to be 
> powered down.  If either (1) or (2) doesn't hold then the cache needs 
> to be synchronized.
> 
> The problem with relying on the internal timers and the power
> management mode page is that the transitions take place automatically
> and the host system doesn't know about them.  We _want_ to know about
> them so that the higher layers of the device tree can go to low power
> when the disk does.

Sigh ... the standards guys didn't help there then, since SPC-4
specifically says there will be no notifications.

> On the other hand, perhaps sd_suspend/sd_resume could use the mode page
> by telling it to go into or out of Stopped mode immediately.

That's perfectly legal.  Even if you use timer based power state
management afforded by the mode page you can still preempt the timer
with an explicit go into this power state command.
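
For reference, the explicit command here is START STOP UNIT with the
POWER CONDITION field set. Below is a minimal user-space sketch of
building that 6-byte CDB. The opcode (0x1b), the IMMED bit in byte 1, and
the POWER CONDITION nibble and START bit in byte 4 follow the SBC-3 CDB
layout; the helper name build_ssu_cdb() and the enum names are
hypothetical, and the sketch leaves out LOEJ and the POWER CONDITION
MODIFIER field.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define START_STOP_UNIT_OPCODE	0x1b
#define SSU_CDB_LEN		6

/* POWER CONDITION field values (upper nibble of CDB byte 4). */
enum ssu_power_condition {
	SSU_PC_START_VALID	= 0x0,	/* honour the START bit instead */
	SSU_PC_ACTIVE		= 0x1,
	SSU_PC_IDLE		= 0x2,
	SSU_PC_STANDBY		= 0x3,
};

/*
 * Fill in a START STOP UNIT CDB.  The START bit is only meaningful
 * when the power condition is START_VALID: START=0 then requests the
 * stopped state and START=1 the active state.
 */
static void build_ssu_cdb(uint8_t cdb[SSU_CDB_LEN],
			  enum ssu_power_condition pc, int start, int immed)
{
	assert(cdb != NULL);

	memset(cdb, 0, SSU_CDB_LEN);
	cdb[0] = START_STOP_UNIT_OPCODE;
	cdb[1] = immed ? 0x01 : 0x00;		/* IMMED: return early */
	cdb[4] = (uint8_t)((pc & 0x0f) << 4);	/* POWER CONDITION */
	if (pc == SSU_PC_START_VALID && start)
		cdb[4] |= 0x01;			/* START */
}
```

Asking for Standby yields byte 4 = 0x30, and asking for the stopped state
(START_VALID with START=0) yields byte 4 = 0x00; either request takes
effect immediately, whatever timers the mode page has set.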

James


James Bottomley Sept. 14, 2012, 8:17 a.m. UTC | #17
On Fri, 2012-09-14 at 13:20 +0800, Aaron Lu wrote:
> On Thu, Sep 13, 2012 at 10:26:44AM +0100, James Bottomley wrote:
> > On Thu, 2012-09-13 at 17:07 +0800, Aaron Lu wrote:
> > > So I think this is basically 2 things, one is the runtime suspend of the
> > > disk, another is when it is runtime suspended, how to remove its power.
> > > I'm currently doing the latter one, which is simpler, so I want to do it
> > > first :-)
> > 
> > Well, I don't like the way the interaction of the patches is going.
> > You're the one proposing powering down the device outside of the
> > standards defined transitions, so you need to be responsible for the
> > actions that necessitates, including synchronizing the cache. The specs
> 
> OK, I'll update the code.
> 
> > (SPC-4) say that cache management is explicitly unnecessary for the
> > standard SCSI power states (Active, Idle, Standby and Stopped), so
> 
> Just read the SPC-4 spec, in section 5.12.3, it has words like this:
> 
> Logical units that contain cache memory shall write all cached data to
> the medium for the logical unit(e.g., as a logical unit would do in
> response to a SYNCHRONIZE CACHE command as described SBC-3) prior to
> entering into any power condition that prevents accessing the
> media(e.g., before a hard drive stops its spindle motor during a change
> to the standby power condition).
> 
> So it looks like the cache needs to be synced before the device enters the
> standby/stopped power condition. Or am I missing something?

Um, no it says the device shall do the sync on its own (as though it
received a sync cache).  That section says the device shall be
responsible for cache management in the power states.

> > someone at some point is going to read that and remove the unnecessary
> > cache sync in the code.  When that happens, you'll start getting data
> > loss.
> 
> Indeed, I'll make sure the cache gets synced when we are about to power
> off the device. Thanks for the reminder.

Great, thanks.

James


Aaron Lu Sept. 14, 2012, 8:48 a.m. UTC | #18
On 09/14/2012 04:17 PM, James Bottomley wrote:
>> Just read the SPC-4 spec, in section 5.12.3, it has words like this:
>>
>> Logical units that contain cache memory shall write all cached data to
>> the medium for the logical unit(e.g., as a logical unit would do in
>> response to a SYNCHRONIZE CACHE command as described SBC-3) prior to
>> entering into any power condition that prevents accessing the
>> media(e.g., before a hard drive stops its spindle motor during a change
>> to the standby power condition).
>>
>> So it looks like the cache needs to be synced before the device enters the
>> standby/stopped power condition. Or am I missing something?
> 
> Um, no it says the device shall do the sync on its own (as though it
> received a sync cache).  That section says the device shall be
> responsible for cache management in the power states.

Oh, I thought it was the host software's responsibility, thanks for the
explanation.

So if we program the device to let it enter standby/stopped power
condition with the start_stop_unit command, do we need to sync the
cache?

Thanks,
Aaron
James Bottomley Sept. 14, 2012, 10:26 a.m. UTC | #19
On Fri, 2012-09-14 at 16:48 +0800, Aaron Lu wrote:
> On 09/14/2012 04:17 PM, James Bottomley wrote:
> >> Just read the SPC-4 spec, in section 5.12.3, it has words like this:
> >>
> >> Logical units that contain cache memory shall write all cached data to
> >> the medium for the logical unit(e.g., as a logical unit would do in
> >> response to a SYNCHRONIZE CACHE command as described SBC-3) prior to
> >> entering into any power condition that prevents accessing the
> >> media(e.g., before a hard drive stops its spindle motor during a change
> >> to the standby power condition).
> >>
> >> So it looks like the cache needs to be synced before the device enters the
> >> standby/stopped power condition. Or am I missing something?
> > 
> > Um, no it says the device shall do the sync on its own (as though it
> > received a sync cache).  That section says the device shall be
> > responsible for cache management in the power states.
> 
> Oh, I thought it was the host software's responsibility, thanks for the
> explanation.
> 
> So if we program the device to let it enter standby/stopped power
> condition with the start_stop_unit command, do we need to sync the
> cache?

No, that's what the spec says.  The device must manage the cache in both
the forced (start stop unit) and timed (power control mode page) cases.

The reason is the spec doesn't define what idle and standby actually
mean (just that they're "lower" power states).  So the device
implementers get to choose if they stop the platter or power off the
motor.  The spec just means that if they do anything that causes danger
to data in the cache, they have to deal with it themselves.

James



Patch

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 4df73e5..de786cf 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -2638,6 +2638,7 @@  static void sd_probe_async(void *data, async_cookie_t cookie)
 
 	sd_printk(KERN_NOTICE, sdkp, "Attached SCSI %sdisk\n",
 		  sdp->removable ? "removable " : "");
+	sdp->ready_to_power_off = 1;
 	scsi_autopm_put_device(sdp);
 	put_device(&sdkp->dev);
 }