Patchwork [v9,06/10] ata: zpodd: check zero power ready status

login
register
mail settings
Submitter Aaron Lu
Date Nov. 9, 2012, 6:51 a.m.
Message ID <1352443922-13734-7-git-send-email-aaron.lu@intel.com>
Download mbox | patch
Permalink /patch/197955/
State Not Applicable
Delegated to: David Miller
Headers show

Comments

Aaron Lu - Nov. 9, 2012, 6:51 a.m.
Per the Mount Fuji spec, the ODD is considered zero power ready when:
- For slot type ODD, no media inside;
- For tray type ODD, no media inside and tray closed.

The information can be retrieved by either the returned information of
command GET_EVENT_STATUS_NOTIFICATION(the command is used to poll for
media event) or sense code.

In this implementation, the zero power ready status is determined by
the following factors:
1 polled media status byte, and this info is recorded in status_ready
  field of zpodd structure;
2 sense code by issuing a TEST_UNIT_READY command after status_ready
  is found to be true.

The information provided by the media status byte is not accurate, it is
possible that after a new disc is just inserted, the status byte still
returns media not present. So this information can not be used as the
final deciding factor. But since SCSI ODD driver sr will always poll the
ODD every 2 seconds, this information is readily available without any
much cost. So it is used as a hint: when we find zero power ready status
in the media status byte, we will see if it is really the case with the
sense code method. This way, we can avoid sending too many
TEST_UNIT_READY commands to the ODD.

When we first sensed the ODD in the zero power ready state, the
timestamp will be recoreded. And after ODD stayed in this state for some
pre-defined period, the ODD is considered as power off ready and the
zp_ready flag will be set. The zp_ready flag serves as the deciding
factor other code will use to see if power off is OK for the ODD. The
Mount Fuji spec suggests a delay should be used here, to avoid the case
user ejects the ODD and then instantly inserts a new one again, so that
we can avoid a power transition. And some ODDs may be slow to place its
head to the home position after disc is ejected, so a delay here is
generally a good idea.

The zero power ready status check is performed in the ata port's runtime
suspend code path, when port is not frozen yet, as we need to issue some
IOs to the ODD.

Signed-off-by: Aaron Lu <aaron.lu@intel.com>
---
 drivers/ata/libata-acpi.c  |   8 +++-
 drivers/ata/libata-scsi.c  |   4 ++
 drivers/ata/libata-zpodd.c | 116 +++++++++++++++++++++++++++++++++++++++++++++
 drivers/ata/libata.h       |   4 ++
 4 files changed, 131 insertions(+), 1 deletion(-)
Tejun Heo - Nov. 12, 2012, 7:13 p.m.
Hello,

On Fri, Nov 09, 2012 at 02:51:58PM +0800, Aaron Lu wrote:
> @@ -784,7 +784,13 @@ static int ata_acpi_push_id(struct ata_device *dev)
>   */
>  int ata_acpi_on_suspend(struct ata_port *ap)
>  {
> -	/* nada */
> +	struct ata_device *dev;
> +
> +	ata_for_each_dev(dev, &ap->link, ENABLED) {
> +		if (zpodd_dev_enabled(dev))
> +			zpodd_check_zpready(dev);
> +	}
> +

Why is it running off ata_acpi_on_suspend() instead of hooking
directly into EH suspend routine?

> diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
> index e3bda07..6f235b9 100644
> --- a/drivers/ata/libata-scsi.c
> +++ b/drivers/ata/libata-scsi.c
> @@ -2665,6 +2665,10 @@ static void atapi_qc_complete(struct ata_queued_cmd *qc)
>  			ata_scsi_rbuf_put(cmd, true, &flags);
>  		}
>  
> +		if (zpodd_dev_enabled(qc->dev) &&
> +				scsicmd[0] == GET_EVENT_STATUS_NOTIFICATION)
> +			zpodd_snoop_status(qc->dev, cmd);
> +

Brief comment explaining what's going on here wouldn't hurt.

> +#define POWEROFF_DELAY  (30 * 1000)     /* 30 seconds for power off delay */
> +
>  struct zpodd {
>  	bool slot:1;
>  	bool drawer:1;
>  	bool from_notify:1;	/* resumed as a result of acpi notification */
> +	bool status_ready:1;	/* ready status derived from media event poll,
> +				   it is not accurate, but serves as a hint */
> +	bool zp_ready:1;	/* zero power ready state */
> +
> +	unsigned long last_ready; /* last zero power ready timestamp */
>  
>  	struct ata_device *dev;
>  };

How are accesses to the bit fields synchronized?

> +/*
> + * Snoop the result of GET_STATUS_NOTIFICATION_EVENT, the media
> + * status byte has information on media present/door closed.
> + *
> + * This information serves only as a hint, as it is not accurate.
> + * The sense code method will be used when deciding if the ODD is
> + * really zero power ready.
> + */

It would be great if you can make the above a proper dockbook function
comment.  Also, as the snooping for hint thing isn't too obvious it
would be great if the comment contains more info which is explained in
the commit message.

> +void zpodd_snoop_status(struct ata_device *dev, struct scsi_cmnd *scmd)
> +{
> +	bool ready;
> +	char buf[8];
> +	struct event_header *eh = (void *)buf;
> +	struct media_event_desc *med = (void *)(buf + 4);
> +	struct sg_table *table = &scmd->sdb.table;
> +	struct zpodd *zpodd = dev->private_data;

Don't people usually put variables definitions w/ assignments above
the ones without?

> +/*
> + * Check ODD's zero power ready status.
> + *
> + * This function is called during ATA port's suspend path,
> + * when the port is not frozen yet, so that we can still make
> + * some IO to the ODD to decide if it is zero power ready.
> + *
> + * The ODD is regarded as zero power ready when it is in zero
> + * power ready state for some time(defined by POWEROFF_DELAY).
> + */
> +void zpodd_check_zpready(struct ata_device *dev)
> +{
> +	bool zp_ready;
> +	unsigned long expires;
> +	struct zpodd *zpodd = dev->private_data;
> +
> +	if (!zpodd->status_ready) {
> +		zpodd->last_ready = 0;
> +		return;
> +	}
> +
> +	if (!zpodd->last_ready) {
> +		zp_ready = zpready(dev);
> +		if (zp_ready)
> +			zpodd->last_ready = jiffies;
> +		return;
> +	}
> +
> +	expires = zpodd->last_ready + msecs_to_jiffies(POWEROFF_DELAY);
> +	if (time_before(jiffies, expires))
> +		return;
> +
> +	zpodd->zp_ready = zpready(dev);
> +	if (!zpodd->zp_ready)
> +		zpodd->last_ready = 0;
> +}

Hmmm... so, the "full" check only happens when autopm kicks in, right?
Is it really worth avoiding an extra TUR on autopm events?  That's not
really a hot path.  It seems a bit over-engineered to me.

Thanks.
Aaron Lu - Nov. 13, 2012, 1:20 p.m.
On Mon, Nov 12, 2012 at 11:13:03AM -0800, Tejun Heo wrote:
> Hello,
> 
> On Fri, Nov 09, 2012 at 02:51:58PM +0800, Aaron Lu wrote:
> > +/*
> > + * Check ODD's zero power ready status.
> > + *
> > + * This function is called during ATA port's suspend path,
> > + * when the port is not frozen yet, so that we can still make
> > + * some IO to the ODD to decide if it is zero power ready.
> > + *
> > + * The ODD is regarded as zero power ready when it is in zero
> > + * power ready state for some time(defined by POWEROFF_DELAY).
> > + */
> > +void zpodd_check_zpready(struct ata_device *dev)
> > +{
> > +	bool zp_ready;
> > +	unsigned long expires;
> > +	struct zpodd *zpodd = dev->private_data;
> > +
> > +	if (!zpodd->status_ready) {
> > +		zpodd->last_ready = 0;
> > +		return;
> > +	}
> > +
> > +	if (!zpodd->last_ready) {
> > +		zp_ready = zpready(dev);
> > +		if (zp_ready)
> > +			zpodd->last_ready = jiffies;
> > +		return;
> > +	}
> > +
> > +	expires = zpodd->last_ready + msecs_to_jiffies(POWEROFF_DELAY);
> > +	if (time_before(jiffies, expires))
> > +		return;
> > +
> > +	zpodd->zp_ready = zpready(dev);
> > +	if (!zpodd->zp_ready)
> > +		zpodd->last_ready = 0;
> > +}
> 
> Hmmm... so, the "full" check only happens when autopm kicks in, right?

Yes, and unless the ODD is powered off, autopm can kick in every 2
seconds(together with the events poll).

> Is it really worth avoiding an extra TUR on autopm events?  That's not
> really a hot path.  It seems a bit over-engineered to me.

I'm not sure...But if issuing TUR every 2 seconds under some condition
is acceptable, I'll be happily removing the snoop thing and use TUR only.

Thanks,
Aaron

--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Aaron Lu - Nov. 14, 2012, 2:18 a.m.
On 11/13/2012 03:13 AM, Tejun Heo wrote:
> Hello,
> 
> On Fri, Nov 09, 2012 at 02:51:58PM +0800, Aaron Lu wrote:
>> +#define POWEROFF_DELAY  (30 * 1000)     /* 30 seconds for power off delay */
>> +
>>  struct zpodd {
>>  	bool slot:1;
>>  	bool drawer:1;
>>  	bool from_notify:1;	/* resumed as a result of acpi notification */
>> +	bool status_ready:1;	/* ready status derived from media event poll,
>> +				   it is not accurate, but serves as a hint */
>> +	bool zp_ready:1;	/* zero power ready state */
>> +
>> +	unsigned long last_ready; /* last zero power ready timestamp */
>>  
>>  	struct ata_device *dev;
>>  };
> 
> How are accesses to the bit fields synchronized?

They are synchronized by PM core.
PM core ensures that no two suspend or resume callback run concurrently.
And when ODD is executing a command, it is in active state, no PM
callback will run.

> 
>> +/*
>> + * Snoop the result of GET_STATUS_NOTIFICATION_EVENT, the media
>> + * status byte has information on media present/door closed.
>> + *
>> + * This information serves only as a hint, as it is not accurate.
>> + * The sense code method will be used when deciding if the ODD is
>> + * really zero power ready.
>> + */
> 
> It would be great if you can make the above a proper dockbook function
> comment.  Also, as the snooping for hint thing isn't too obvious it
> would be great if the comment contains more info which is explained in
> the commit message.

OK.

> 
>> +/*
>> + * Check ODD's zero power ready status.
>> + *
>> + * This function is called during ATA port's suspend path,
>> + * when the port is not frozen yet, so that we can still make
>> + * some IO to the ODD to decide if it is zero power ready.
>> + *
>> + * The ODD is regarded as zero power ready when it is in zero
>> + * power ready state for some time(defined by POWEROFF_DELAY).
>> + */
>> +void zpodd_check_zpready(struct ata_device *dev)
>> +{
>> +	bool zp_ready;
>> +	unsigned long expires;
>> +	struct zpodd *zpodd = dev->private_data;
>> +
>> +	if (!zpodd->status_ready) {
>> +		zpodd->last_ready = 0;
>> +		return;
>> +	}
>> +
>> +	if (!zpodd->last_ready) {
>> +		zp_ready = zpready(dev);
>> +		if (zp_ready)
>> +			zpodd->last_ready = jiffies;
>> +		return;
>> +	}
>> +
>> +	expires = zpodd->last_ready + msecs_to_jiffies(POWEROFF_DELAY);
>> +	if (time_before(jiffies, expires))
>> +		return;
>> +
>> +	zpodd->zp_ready = zpready(dev);
>> +	if (!zpodd->zp_ready)
>> +		zpodd->last_ready = 0;
>> +}
> 
> Hmmm... so, the "full" check only happens when autopm kicks in, right?
> Is it really worth avoiding an extra TUR on autopm events?  That's not
> really a hot path.  It seems a bit over-engineered to me.

A little more information about this:
When there is disc inside and no program mounted the drive, the ODD will
be runtime suspended/resumed every 2 seconds along with the event poll.

I'm not sure if the above situation can happen often. Normal desktop
environment should automatically mount the ODD once they got uevent, and
for console users, they will probably manually mount the drive after
they have inserted a disc. The way I did it this way is to deal with the
worst possible case. But if this is deemed as not necessary, I think I
can remove the snoop hint thing and use TUR directly.


Thanks,
Aaron

--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Tejun Heo - Nov. 18, 2012, 3 p.m.
Hello, Aaron.

On Wed, Nov 14, 2012 at 10:18:23AM +0800, Aaron Lu wrote:
> On 11/13/2012 03:13 AM, Tejun Heo wrote:
> > Hello,
> > 
> > On Fri, Nov 09, 2012 at 02:51:58PM +0800, Aaron Lu wrote:
> >> +#define POWEROFF_DELAY  (30 * 1000)     /* 30 seconds for power off delay */
> >> +
> >>  struct zpodd {
> >>  	bool slot:1;
> >>  	bool drawer:1;
> >>  	bool from_notify:1;	/* resumed as a result of acpi notification */
> >> +	bool status_ready:1;	/* ready status derived from media event poll,
> >> +				   it is not accurate, but serves as a hint */
> >> +	bool zp_ready:1;	/* zero power ready state */
> >> +
> >> +	unsigned long last_ready; /* last zero power ready timestamp */
> >>  
> >>  	struct ata_device *dev;
> >>  };
> > 
> > How are accesses to the bit fields synchronized?
> 
> They are synchronized by PM core.
> PM core ensures that no two suspend or resume callback run concurrently.
> And when ODD is executing a command, it is in active state, no PM
> callback will run.

Care to add short comment for that?  Flag and bitfield updates aren't
atomic to each other, so I find it usually helpful to clearly state
the synchronization rules for them.  More so if locking is inherited
from upprer layer and not immediately obvious.

> > Hmmm... so, the "full" check only happens when autopm kicks in, right?
> > Is it really worth avoiding an extra TUR on autopm events?  That's not
> > really a hot path.  It seems a bit over-engineered to me.
> 
> A little more information about this:
> When there is disc inside and no program mounted the drive, the ODD will
> be runtime suspended/resumed every 2 seconds along with the event poll.

Is that a desirable behavior?  I haven't been following autopm and am
a bit fuzzy about how autopm works and what it does.

If there isn't any user of the device autopm kicks in.  If zpodd is
enabled and there's no media, the device goes off power.  If the user
initiates an event which may change media status, the driver is
notified via acpi and autopm backs out restoring power to the device.
Am I understanding it correctly?

What I'm confused about is what autopm does for devices w/o zpodd.
What happens then?  Is it gonna leave power on for the device and,
say, go on to suspend the controller?  But, how would that work for,
say, future devices with async notification for media events?

Also, if autopm is enabled, an optical device would go in and out of
suspend every two seconds?

> I'm not sure if the above situation can happen often. Normal desktop
> environment should automatically mount the ODD once they got uevent, and
> for console users, they will probably manually mount the drive after
> they have inserted a disc. The way I did it this way is to deal with the
> worst possible case. But if this is deemed as not necessary, I think I
> can remove the snoop hint thing and use TUR directly.

The problem with issuing TUR regularly is that some ODDs lock up after
getting hit by frequent TURs.  That's the reason why sr event check
routine is being careful with TUR and only issue
GET_EVENT_STATUS_NOTIFICATION.  Windows does about the same thing and
some vendors somehow screwed up TUR.

That said, I can't say the snooping is pretty.  It's a rather nasty
thing to do.  So, libata now wants information from the event polling
in block layer, but reaching for block_device from ata_devices is
nasty too.  Hmmm... but aren't you already doing that to block polling
on a powered down device?

Thanks.
Aaron Lu - Nov. 19, 2012, 3:09 a.m.
On 11/18/2012 11:00 PM, Tejun Heo wrote:
> Hello, Aaron.
Hi,

> 
> On Wed, Nov 14, 2012 at 10:18:23AM +0800, Aaron Lu wrote:
>> On 11/13/2012 03:13 AM, Tejun Heo wrote:
>>> Hello,
>>>
>>> On Fri, Nov 09, 2012 at 02:51:58PM +0800, Aaron Lu wrote:
>>>> +#define POWEROFF_DELAY  (30 * 1000)     /* 30 seconds for power off delay */
>>>> +
>>>>  struct zpodd {
>>>>  	bool slot:1;
>>>>  	bool drawer:1;
>>>>  	bool from_notify:1;	/* resumed as a result of acpi notification */
>>>> +	bool status_ready:1;	/* ready status derived from media event poll,
>>>> +				   it is not accurate, but serves as a hint */
>>>> +	bool zp_ready:1;	/* zero power ready state */
>>>> +
>>>> +	unsigned long last_ready; /* last zero power ready timestamp */
>>>>  
>>>>  	struct ata_device *dev;
>>>>  };
>>>
>>> How are accesses to the bit fields synchronized?
>>
>> They are synchronized by PM core.
>> PM core ensures that no two suspend or resume callback run concurrently.
>> And when ODD is executing a command, it is in active state, no PM
>> callback will run.
> 
> Care to add short comment for that?  Flag and bitfield updates aren't
> atomic to each other, so I find it usually helpful to clearly state
> the synchronization rules for them.  More so if locking is inherited
> from upprer layer and not immediately obvious.

OK.

> 
>>> Hmmm... so, the "full" check only happens when autopm kicks in, right?
>>> Is it really worth avoiding an extra TUR on autopm events?  That's not
>>> really a hot path.  It seems a bit over-engineered to me.
>>
>> A little more information about this:
>> When there is disc inside and no program mounted the drive, the ODD will
>> be runtime suspended/resumed every 2 seconds along with the event poll.
> 
> Is that a desirable behavior?  I haven't been following autopm and am
> a bit fuzzy about how autopm works and what it does.
> 
> If there isn't any user of the device autopm kicks in.  If zpodd is
> enabled and there's no media, the device goes off power.  If the user
> initiates an event which may change media status, the driver is
> notified via acpi and autopm backs out restoring power to the device.
> Am I understanding it correctly?

Yes.

> 
> What I'm confused about is what autopm does for devices w/o zpodd.
> What happens then?  Is it gonna leave power on for the device and,
> say, go on to suspend the controller?  But, how would that work for,
> say, future devices with async notification for media events?

Maybe we shouldn't allow autopm for such devices?

> 
> Also, if autopm is enabled, an optical device would go in and out of
> suspend every two seconds?
> 
>> I'm not sure if the above situation can happen often. Normal desktop
>> environment should automatically mount the ODD once they got uevent, and
>> for console users, they will probably manually mount the drive after
>> they have inserted a disc. The way I did it this way is to deal with the
>> worst possible case. But if this is deemed as not necessary, I think I
>> can remove the snoop hint thing and use TUR directly.
> 
> The problem with issuing TUR regularly is that some ODDs lock up after
> getting hit by frequent TURs.  That's the reason why sr event check
> routine is being careful with TUR and only issue
> GET_EVENT_STATUS_NOTIFICATION.  Windows does about the same thing and
> some vendors somehow screwed up TUR.
> 
> That said, I can't say the snooping is pretty.  It's a rather nasty
> thing to do.  So, libata now wants information from the event polling
> in block layer, but reaching for block_device from ata_devices is
> nasty too.  Hmmm... but aren't you already doing that to block polling
> on a powered down device?

I was feeling brain damaged by this for some time :-)

Basically, only ATA layer is aware of the power off thing, and sr knows
nothing about this(or it is not supposed to know this, at least, this is
what SCSI people think) and once powered off, I do not want the poll to
disturb the device, so I need to block the poll. I can't come up with
another way to achieve this except this nasty way.

James suggests me to keep the poll, but emulate the command. The problem
with this is, the autopm for resume will kick in on each poll, so I'll
need to decide if power up the ODD for this time's resume is needed in
port's runtime resume callback. This made things complex and it also put
too much logic in the resume callback, which is not desired. And even if
I keep the ODD in powered off state by emulating this poll command, its
ancestor devices will still be resumed, and I may need to do some trick
in their resume callback to avoid needless power/state transitions. This
doesn't feel like an elegant way to solve this either.

So yes, I'm still using this _nasty_ way to make the ODD stay in powered
off state as long as possible. But if there is other elegant ways to
solve this, I would appreciate it and happily using it. Personally, I
hope we can make sr aware of ZPODD, that would make the pain gone.

Thanks,
Aaron

--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Tejun Heo - Nov. 19, 2012, 2:56 p.m.
Hey, Aaron.

On Mon, Nov 19, 2012 at 11:09:40AM +0800, Aaron Lu wrote:
> > What I'm confused about is what autopm does for devices w/o zpodd.
> > What happens then?  Is it gonna leave power on for the device and,
> > say, go on to suspend the controller?  But, how would that work for,
> > say, future devices with async notification for media events?
> 
> Maybe we shouldn't allow autopm for such devices?

Yeah, maybe.  It would be nice to be able to automatically power off
disks and ports which aren't being used tho.

> > That said, I can't say the snooping is pretty.  It's a rather nasty
> > thing to do.  So, libata now wants information from the event polling
> > in block layer, but reaching for block_device from ata_devices is
> > nasty too.  Hmmm... but aren't you already doing that to block polling
> > on a powered down device?
> 
> I was feeling brain damaged by this for some time :-)
> 
> Basically, only ATA layer is aware of the power off thing, and sr knows
> nothing about this(or it is not supposed to know this, at least, this is
> what SCSI people think) and once powered off, I do not want the poll to
> disturb the device, so I need to block the poll. I can't come up with
> another way to achieve this except this nasty way.
> 
> James suggests me to keep the poll, but emulate the command. The problem
> with this is, the autopm for resume will kick in on each poll, so I'll
> need to decide if power up the ODD for this time's resume is needed in
> port's runtime resume callback. This made things complex and it also put
> too much logic in the resume callback, which is not desired. And even if
> I keep the ODD in powered off state by emulating this poll command, its
> ancestor devices will still be resumed, and I may need to do some trick
> in their resume callback to avoid needless power/state transitions. This
> doesn't feel like an elegant way to solve this either.
> 
> So yes, I'm still using this _nasty_ way to make the ODD stay in powered
> off state as long as possible. But if there is other elegant ways to
> solve this, I would appreciate it and happily using it. Personally, I
> hope we can make sr aware of ZPODD, that would make the pain gone.

I really think we need a way for (auto)pm and event polling to talk to
each other so that autopm can tell event poll to sod off while pm is
in effect.  Trying to solve this from inside libata doesn't seem
right.  The problem, again, seems to be figuring out which hardware
device maps to which block device.  Hmmm... Any good ideas?

Thanks.
James Bottomley - Nov. 19, 2012, 3:06 p.m.
On Mon, 2012-11-19 at 06:56 -0800, Tejun Heo wrote:
> Hey, Aaron.
> 
> On Mon, Nov 19, 2012 at 11:09:40AM +0800, Aaron Lu wrote:
> > > What I'm confused about is what autopm does for devices w/o zpodd.
> > > What happens then?  Is it gonna leave power on for the device and,
> > > say, go on to suspend the controller?  But, how would that work for,
> > > say, future devices with async notification for media events?
> > 
> > Maybe we shouldn't allow autopm for such devices?
> 
> Yeah, maybe.  It would be nice to be able to automatically power off
> disks and ports which aren't being used tho.
> 
> > > That said, I can't say the snooping is pretty.  It's a rather nasty
> > > thing to do.  So, libata now wants information from the event polling
> > > in block layer, but reaching for block_device from ata_devices is
> > > nasty too.  Hmmm... but aren't you already doing that to block polling
> > > on a powered down device?
> > 
> > I was feeling brain damaged by this for some time :-)
> > 
> > Basically, only ATA layer is aware of the power off thing, and sr knows
> > nothing about this(or it is not supposed to know this, at least, this is
> > what SCSI people think) and once powered off, I do not want the poll to
> > disturb the device, so I need to block the poll. I can't come up with
> > another way to achieve this except this nasty way.
> > 
> > James suggests me to keep the poll, but emulate the command. The problem
> > with this is, the autopm for resume will kick in on each poll, so I'll
> > need to decide if power up the ODD for this time's resume is needed in
> > port's runtime resume callback. This made things complex and it also put
> > too much logic in the resume callback, which is not desired. And even if
> > I keep the ODD in powered off state by emulating this poll command, its
> > ancestor devices will still be resumed, and I may need to do some trick
> > in their resume callback to avoid needless power/state transitions. This
> > doesn't feel like an elegant way to solve this either.
> > 
> > So yes, I'm still using this _nasty_ way to make the ODD stay in powered
> > off state as long as possible. But if there is other elegant ways to
> > solve this, I would appreciate it and happily using it. Personally, I
> > hope we can make sr aware of ZPODD, that would make the pain gone.
> 
> I really think we need a way for (auto)pm and event polling to talk to
> each other so that autopm can tell event poll to sod off while pm is
> in effect.  Trying to solve this from inside libata doesn't seem
> right.  The problem, again, seems to be figuring out which hardware
> device maps to which block device.  Hmmm... Any good ideas?

I've asked the PM people several times about this, because it's a real
problem for almost everything:  PM needs some type of top to bottom
stack view, which the layering isolation we have within storage really
doesn't cope with well.  No real suggestion has been forthcoming.

The reason I think it should be emulated (in the acpi layer, not libata,
but as long as it's not in SCSI, I'm not so fussed where it ends up) is
because ZPODD is the software equivalent of ZPREADY, which will be done
in hardware and will be effectively invisible to autopm in the same way
that SCSI (and ATA) power management is mostly invisible.  If we
currently do ZPREADY emulation in the low layer (i.e. ZPODD has exact
ZPREADY emulation), we won't care (except for flipping the sofware bit)
whether the device support ZPODD or ZPREADY and it will all just
work(tm).  The industry expectation is that ZPODD is just a transition
state between current power management and ZPREADY.

James


--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Aaron Lu - Nov. 20, 2012, 6 a.m.
On 11/19/2012 10:56 PM, Tejun Heo wrote:
> Hey, Aaron.
> 
> On Mon, Nov 19, 2012 at 11:09:40AM +0800, Aaron Lu wrote:
>>> What I'm confused about is what autopm does for devices w/o zpodd.
>>> What happens then?  Is it gonna leave power on for the device and,
>>> say, go on to suspend the controller?  But, how would that work for,
>>> say, future devices with async notification for media events?
>>
>> Maybe we shouldn't allow autopm for such devices?
> 
> Yeah, maybe.  It would be nice to be able to automatically power off
> disks and ports which aren't being used tho.

Yes, we can do this.
I'm just saying, if an ODD is using async notification, we probably
shouldn't enable autopm for it at the moment.

> 
>>> That said, I can't say the snooping is pretty.  It's a rather nasty
>>> thing to do.  So, libata now wants information from the event polling
>>> in block layer, but reaching for block_device from ata_devices is
>>> nasty too.  Hmmm... but aren't you already doing that to block polling
>>> on a powered down device?
>>
>> I was feeling brain damaged by this for some time :-)
>>
>> Basically, only ATA layer is aware of the power off thing, and sr knows
>> nothing about this(or it is not supposed to know this, at least, this is
>> what SCSI people think) and once powered off, I do not want the poll to
>> disturb the device, so I need to block the poll. I can't come up with
>> another way to achieve this except this nasty way.
>>
>> James suggests me to keep the poll, but emulate the command. The problem
>> with this is, the autopm for resume will kick in on each poll, so I'll
>> need to decide if power up the ODD for this time's resume is needed in
>> port's runtime resume callback. This made things complex and it also put
>> too much logic in the resume callback, which is not desired. And even if
>> I keep the ODD in powered off state by emulating this poll command, its
>> ancestor devices will still be resumed, and I may need to do some trick
>> in their resume callback to avoid needless power/state transitions. This
>> doesn't feel like an elegant way to solve this either.
>>
>> So yes, I'm still using this _nasty_ way to make the ODD stay in powered
>> off state as long as possible. But if there is other elegant ways to
>> solve this, I would appreciate it and happily using it. Personally, I
>> hope we can make sr aware of ZPODD, that would make the pain gone.
> 
> I really think we need a way for (auto)pm and event polling to talk to
> each other so that autopm can tell event poll to sod off while pm is
> in effect.  Trying to solve this from inside libata doesn't seem
> right.  The problem, again, seems to be figuring out which hardware
> device maps to which block device.  Hmmm... Any good ideas?

A possible way of doing this is using pm qos.

We currently have 2 pm qos flags, NO_POWER_OFF and REMOTE_WAKEUP, and we
can add another one: NO_POLL, use it like the following:
1 Set the NO_POLL pm qos flag when the underlying driver thinks it is no
  longer necessary. In the ZPODD's case, it should be set when the
  device is to be powered off;
2 Clear it when poll is necessary again. In the ZPODD's case, when power
  is re-gained, this flag will be cleared.
3 In the disk_events_workfn, check if this flag is set, if so, simply
  return.

The disk->driverfs_dev can be used to host the pm qos flag, ATA layer
can access it through ata_device->sdev->sdev_gendev.

Is this OK?

Thanks,
Aaron

--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Aaron Lu - Nov. 20, 2012, 8:59 a.m.
On 11/20/2012 02:00 PM, Aaron Lu wrote:
> On 11/19/2012 10:56 PM, Tejun Heo wrote:
>> I really think we need a way for (auto)pm and event polling to talk to
>> each other so that autopm can tell event poll to sod off while pm is
>> in effect.  Trying to solve this from inside libata doesn't seem
>> right.  The problem, again, seems to be figuring out which hardware
>> device maps to which block device.  Hmmm... Any good ideas?
> 
> A possible way of doing this is using pm qos.
> 
> We currently have 2 pm qos flags, NO_POWER_OFF and REMOTE_WAKEUP, and we
> can add another one: NO_POLL, use it like the following:
> 1 Set the NO_POLL pm qos flag when the underlying driver thinks it is no
>   longer necessary. In the ZPODD's case, it should be set when the
>   device is to be powered off;
> 2 Clear it when poll is necessary again. In the ZPODD's case, when power
>   is re-gained, this flag will be cleared.


> 3 In the disk_events_workfn, check if this flag is set, if so, simply
>   return.

It should be, skip calling disk->fops->check_events, but still queue the
work for next time's poll.

-Aaron

> 
> The disk->driverfs_dev can be used to host the pm qos flag, ATA layer
> can access it through ata_device->sdev->sdev_gendev.
> 
> Is this OK?
> 
> Thanks,
> Aaron
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rafael J. Wysocki - Nov. 26, 2012, 12:33 a.m.
On Monday, November 19, 2012 03:06:51 PM James Bottomley wrote:
> On Mon, 2012-11-19 at 06:56 -0800, Tejun Heo wrote:
> > Hey, Aaron.
> > 
> > On Mon, Nov 19, 2012 at 11:09:40AM +0800, Aaron Lu wrote:
> > > > What I'm confused about is what autopm does for devices w/o zpodd.
> > > > What happens then?  Is it gonna leave power on for the device and,
> > > > say, go on to suspend the controller?  But, how would that work for,
> > > > say, future devices with async notification for media events?
> > > 
> > > Maybe we shouldn't allow autopm for such devices?
> > 
> > Yeah, maybe.  It would be nice to be able to automatically power off
> > disks and ports which aren't being used tho.
> > 
> > > > That said, I can't say the snooping is pretty.  It's a rather nasty
> > > > thing to do.  So, libata now wants information from the event polling
> > > > in block layer, but reaching for block_device from ata_devices is
> > > > nasty too.  Hmmm... but aren't you already doing that to block polling
> > > > on a powered down device?
> > > 
> > > I was feeling brain damaged by this for some time :-)
> > > 
> > > Basically, only ATA layer is aware of the power off thing, and sr knows
> > > nothing about this(or it is not supposed to know this, at least, this is
> > > what SCSI people think) and once powered off, I do not want the poll to
> > > disturb the device, so I need to block the poll. I can't come up with
> > > another way to achieve this except this nasty way.
> > > 
> > > James suggests me to keep the poll, but emulate the command. The problem
> > > with this is, the autopm for resume will kick in on each poll, so I'll
> > > need to decide if power up the ODD for this time's resume is needed in
> > > port's runtime resume callback. This made things complex and it also put
> > > too much logic in the resume callback, which is not desired. And even if
> > > I keep the ODD in powered off state by emulating this poll command, its
> > > ancestor devices will still be resumed, and I may need to do some trick
> > > in their resume callback to avoid needless power/state transitions. This
> > > doesn't feel like an elegant way to solve this either.
> > > 
> > > So yes, I'm still using this _nasty_ way to make the ODD stay in powered
> > > off state as long as possible. But if there is other elegant ways to
> > > solve this, I would appreciate it and happily using it. Personally, I
> > > hope we can make sr aware of ZPODD, that would make the pain gone.
> > 
> > I really think we need a way for (auto)pm and event polling to talk to
> > each other so that autopm can tell event poll to sod off while pm is
> > in effect.  Trying to solve this from inside libata doesn't seem
> > right.  The problem, again, seems to be figuring out which hardware
> > device maps to which block device.  Hmmm... Any good ideas?
> 
> I've asked the PM people several times about this, because it's a real
> problem for almost everything:  PM needs some type of top to bottom
> stack view, which the layering isolation we have within storage really
> doesn't cope with well.  No real suggestion has been forthcoming.

Actually, I think that the particular case in question is really special
and the fact that there's the pollig loop that user space is involved in
doesn't make things more stratightforward.

And PM really doesn't need to see things top to bottom, but the polling
needs to know what happens in the PM land.  We need to be able to tell it
"from now on tell user space that there are no events here".  The question
is where to put that information so that it's accessible to all parts of the
stack involved.

> The reason I think it should be emulated (in the acpi layer, not libata,
> but as long as it's not in SCSI, I'm not so fussed where it ends up) is
> because ZPODD is the software equivalent of ZPREADY, which will be done
> in hardware and will be effectively invisible to autopm in the same way
> that SCSI (and ATA) power management is mostly invisible.  If we
> currently do ZPREADY emulation in the low layer (i.e. ZPODD has exact
> ZPREADY emulation), we won't care (except for flipping the sofware bit)
> whether the device support ZPODD or ZPREADY and it will all just
> work(tm).  The industry expectation is that ZPODD is just a transition
> state between current power management and ZPREADY.

Well, if you poll a ZPREADY-capable drive, it will go off and on in cycles
transparently, but still it won't save as much energy as it can.  We'll need
to do something about the polling in that case too, it seems.

Thanks,
Rafael
Aaron Lu - Nov. 26, 2012, 12:45 a.m.
On 11/26/2012 08:33 AM, Rafael J. Wysocki wrote:
> On Monday, November 19, 2012 03:06:51 PM James Bottomley wrote:
>> On Mon, 2012-11-19 at 06:56 -0800, Tejun Heo wrote:
>>> Hey, Aaron.
>>>
>>> On Mon, Nov 19, 2012 at 11:09:40AM +0800, Aaron Lu wrote:
>>>>> What I'm confused about is what autopm does for devices w/o zpodd.
>>>>> What happens then?  Is it gonna leave power on for the device and,
>>>>> say, go on to suspend the controller?  But, how would that work for,
>>>>> say, future devices with async notification for media events?
>>>>
>>>> Maybe we shouldn't allow autopm for such devices?
>>>
>>> Yeah, maybe.  It would be nice to be able to automatically power off
>>> disks and ports which aren't being used tho.
>>>
>>>>> That said, I can't say the snooping is pretty.  It's a rather nasty
>>>>> thing to do.  So, libata now wants information from the event polling
>>>>> in block layer, but reaching for block_device from ata_devices is
>>>>> nasty too.  Hmmm... but aren't you already doing that to block polling
>>>>> on a powered down device?
>>>>
>>>> I was feeling brain damaged by this for some time :-)
>>>>
>>>> Basically, only ATA layer is aware of the power off thing, and sr knows
>>>> nothing about this(or it is not supposed to know this, at least, this is
>>>> what SCSI people think) and once powered off, I do not want the poll to
>>>> disturb the device, so I need to block the poll. I can't come up with
>>>> another way to achieve this except this nasty way.
>>>>
>>>> James suggests me to keep the poll, but emulate the command. The problem
>>>> with this is, the autopm for resume will kick in on each poll, so I'll
>>>> need to decide if power up the ODD for this time's resume is needed in
>>>> port's runtime resume callback. This made things complex and it also put
>>>> too much logic in the resume callback, which is not desired. And even if
>>>> I keep the ODD in powered off state by emulating this poll command, its
>>>> ancestor devices will still be resumed, and I may need to do some trick
>>>> in their resume callback to avoid needless power/state transitions. This
>>>> doesn't feel like an elegant way to solve this either.
>>>>
>>>> So yes, I'm still using this _nasty_ way to make the ODD stay in powered
>>>> off state as long as possible. But if there is other elegant ways to
>>>> solve this, I would appreciate it and happily using it. Personally, I
>>>> hope we can make sr aware of ZPODD, that would make the pain gone.
>>>
>>> I really think we need a way for (auto)pm and event polling to talk to
>>> each other so that autopm can tell event poll to sod off while pm is
>>> in effect.  Trying to solve this from inside libata doesn't seem
>>> right.  The problem, again, seems to be figuring out which hardware
>>> device maps to which block device.  Hmmm... Any good ideas?
>>
>> I've asked the PM people several times about this, because it's a real
>> problem for almost everything:  PM needs some type of top to bottom
>> stack view, which the layering isolation we have within storage really
>> doesn't cope with well.  No real suggestion has been forthcoming.
> 
> Actually, I think that the particular case in question is really special
> and the fact that there's the pollig loop that user space is involved in
> doesn't make things more stratightforward.
> 
> And PM really doesn't need to see things top to bottom, but the polling
> needs to know what happens in the PM land.  We need to be able to tell it
> "from now on tell user space that there are no events here".  The question

Actually, in newer kernels(2.6.38 later) and user space tools(udisks
version 1.0.3 on), this is no longer the case.

udisks now no longer poll for event change, and in kernel polling has
been used to notify user space about event change with uevent. So the
polling thing can be entirely controlled in kernel space.

And please take a look at the v10 patch I've sent where I tried to solve
this by introducing PM_QOS_NO_POLL flag, see if that makes sense to you.

Thanks,
Aaron

> is where to put that information so that it's accessible to all parts of the
> stack involved.
> 
>> The reason I think it should be emulated (in the acpi layer, not libata,
>> but as long as it's not in SCSI, I'm not so fussed where it ends up) is
>> because ZPODD is the software equivalent of ZPREADY, which will be done
>> in hardware and will be effectively invisible to autopm in the same way
>> that SCSI (and ATA) power management is mostly invisible.  If we
>> currently do ZPREADY emulation in the low layer (i.e. ZPODD has exact
>> ZPREADY emulation), we won't care (except for flipping the sofware bit)
>> whether the device support ZPODD or ZPREADY and it will all just
>> work(tm).  The industry expectation is that ZPODD is just a transition
>> state between current power management and ZPREADY.
> 
> Well, if you poll a ZPREADY-capable drive, it will go off and on in cycles
> transparently, but still it won't save as much energy as it can.  We'll need
> to do something about the polling in that case too, it seems.
> 
> Thanks,
> Rafael
> 
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Aaron Lu - Nov. 26, 2012, 12:48 a.m.
On 11/26/2012 08:50 AM, Rafael J. Wysocki wrote:
> On Tuesday, November 20, 2012 04:59:57 PM Aaron Lu wrote:
>> On 11/20/2012 02:00 PM, Aaron Lu wrote:
>>> On 11/19/2012 10:56 PM, Tejun Heo wrote:
>>>> I really think we need a way for (auto)pm and event polling to talk to
>>>> each other so that autopm can tell event poll to sod off while pm is
>>>> in effect.  Trying to solve this from inside libata doesn't seem
>>>> right.  The problem, again, seems to be figuring out which hardware
>>>> device maps to which block device.  Hmmm... Any good ideas?
>>>
>>> A possible way of doing this is using pm qos.
>>>
>>> We currently have 2 pm qos flags, NO_POWER_OFF and REMOTE_WAKEUP, and we
>>> can add another one: NO_POLL, use it like the following:
>>> 1 Set the NO_POLL pm qos flag when the underlying driver thinks it is no
>>>   longer necessary. In the ZPODD's case, it should be set when the
>>>   device is to be powered off;
>>> 2 Clear it when poll is necessary again. In the ZPODD's case, when power
>>>   is re-gained, this flag will be cleared.
>>
>>
>>> 3 In the disk_events_workfn, check if this flag is set, if so, simply
>>>   return.
>>
>> It should be, skip calling disk->fops->check_events, but still queue the
>> work for next time's poll.
>>
>> -Aaron
>>
>>>
>>> The disk->driverfs_dev can be used to host the pm qos flag, ATA layer
>>> can access it through ata_device->sdev->sdev_gendev.
>>>
>>> Is this OK?
> 
> No, I don't think so.  PM QoS is about telling the layer that will put the
> device into low-power states what states are to be taken into consideration.
> In this case, however, we need to tell someone else that the device has been
> turned off.  Clearly, we need a way to do that, but not through PM QoS.
> 
> Did you consider using pm_runtime_suspended() to check the device status?

The problem is, a device can be in runtime suspended state while still
needs to be polled...

Thanks,
Aaron

> 
> Rafael
> 
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rafael J. Wysocki - Nov. 26, 2012, 12:50 a.m.
On Tuesday, November 20, 2012 04:59:57 PM Aaron Lu wrote:
> On 11/20/2012 02:00 PM, Aaron Lu wrote:
> > On 11/19/2012 10:56 PM, Tejun Heo wrote:
> >> I really think we need a way for (auto)pm and event polling to talk to
> >> each other so that autopm can tell event poll to sod off while pm is
> >> in effect.  Trying to solve this from inside libata doesn't seem
> >> right.  The problem, again, seems to be figuring out which hardware
> >> device maps to which block device.  Hmmm... Any good ideas?
> > 
> > A possible way of doing this is using pm qos.
> > 
> > We currently have 2 pm qos flags, NO_POWER_OFF and REMOTE_WAKEUP, and we
> > can add another one: NO_POLL, use it like the following:
> > 1 Set the NO_POLL pm qos flag when the underlying driver thinks it is no
> >   longer necessary. In the ZPODD's case, it should be set when the
> >   device is to be powered off;
> > 2 Clear it when poll is necessary again. In the ZPODD's case, when power
> >   is re-gained, this flag will be cleared.
> 
> 
> > 3 In the disk_events_workfn, check if this flag is set, if so, simply
> >   return.
> 
> It should be, skip calling disk->fops->check_events, but still queue the
> work for next time's poll.
> 
> -Aaron
> 
> > 
> > The disk->driverfs_dev can be used to host the pm qos flag, ATA layer
> > can access it through ata_device->sdev->sdev_gendev.
> > 
> > Is this OK?

No, I don't think so.  PM QoS is about telling the layer that will put the
device into low-power states what states are to be taken into consideration.
In this case, however, we need to tell someone else that the device has been
turned off.  Clearly, we need a way to do that, but not through PM QoS.

Did you consider using pm_runtime_suspended() to check the device status?

Rafael
Rafael J. Wysocki - Nov. 26, 2012, 1:03 a.m.
On Monday, November 26, 2012 08:48:51 AM Aaron Lu wrote:
> On 11/26/2012 08:50 AM, Rafael J. Wysocki wrote:
> > On Tuesday, November 20, 2012 04:59:57 PM Aaron Lu wrote:
> >> On 11/20/2012 02:00 PM, Aaron Lu wrote:
> >>> On 11/19/2012 10:56 PM, Tejun Heo wrote:
> >>>> I really think we need a way for (auto)pm and event polling to talk to
> >>>> each other so that autopm can tell event poll to sod off while pm is
> >>>> in effect.  Trying to solve this from inside libata doesn't seem
> >>>> right.  The problem, again, seems to be figuring out which hardware
> >>>> device maps to which block device.  Hmmm... Any good ideas?
> >>>
> >>> A possible way of doing this is using pm qos.
> >>>
> >>> We currently have 2 pm qos flags, NO_POWER_OFF and REMOTE_WAKEUP, and we
> >>> can add another one: NO_POLL, use it like the following:
> >>> 1 Set the NO_POLL pm qos flag when the underlying driver thinks it is no
> >>>   longer necessary. In the ZPODD's case, it should be set when the
> >>>   device is to be powered off;
> >>> 2 Clear it when poll is necessary again. In the ZPODD's case, when power
> >>>   is re-gained, this flag will be cleared.
> >>
> >>
> >>> 3 In the disk_events_workfn, check if this flag is set, if so, simply
> >>>   return.
> >>
> >> It should be, skip calling disk->fops->check_events, but still queue the
> >> work for next time's poll.
> >>
> >> -Aaron
> >>
> >>>
> >>> The disk->driverfs_dev can be used to host the pm qos flag, ATA layer
> >>> can access it through ata_device->sdev->sdev_gendev.
> >>>
> >>> Is this OK?
> > 
> > No, I don't think so.  PM QoS is about telling the layer that will put the
> > device into low-power states what states are to be taken into consideration.
> > In this case, however, we need to tell someone else that the device has been
> > turned off.  Clearly, we need a way to do that, but not through PM QoS.
> > 
> > Did you consider using pm_runtime_suspended() to check the device status?
> 
> The problem is, a device can be in runtime suspended state while still
> needs to be polled...

Well, maybe this is the problem, then?  Why does it need to be polled when
suspended?

Rafael
Aaron Lu - Nov. 26, 2012, 1:05 a.m.
On 11/26/2012 09:03 AM, Rafael J. Wysocki wrote:
> On Monday, November 26, 2012 08:48:51 AM Aaron Lu wrote:
>> On 11/26/2012 08:50 AM, Rafael J. Wysocki wrote:
>>> On Tuesday, November 20, 2012 04:59:57 PM Aaron Lu wrote:
>>>> On 11/20/2012 02:00 PM, Aaron Lu wrote:
>>>>> On 11/19/2012 10:56 PM, Tejun Heo wrote:
>>>>>> I really think we need a way for (auto)pm and event polling to talk to
>>>>>> each other so that autopm can tell event poll to sod off while pm is
>>>>>> in effect.  Trying to solve this from inside libata doesn't seem
>>>>>> right.  The problem, again, seems to be figuring out which hardware
>>>>>> device maps to which block device.  Hmmm... Any good ideas?
>>>>>
>>>>> A possible way of doing this is using pm qos.
>>>>>
>>>>> We currently have 2 pm qos flags, NO_POWER_OFF and REMOTE_WAKEUP, and we
>>>>> can add another one: NO_POLL, use it like the following:
>>>>> 1 Set the NO_POLL pm qos flag when the underlying driver thinks it is no
>>>>>   longer necessary. In the ZPODD's case, it should be set when the
>>>>>   device is to be powered off;
>>>>> 2 Clear it when poll is necessary again. In the ZPODD's case, when power
>>>>>   is re-gained, this flag will be cleared.
>>>>
>>>>
>>>>> 3 In the disk_events_workfn, check if this flag is set, if so, simply
>>>>>   return.
>>>>
>>>> It should be, skip calling disk->fops->check_events, but still queue the
>>>> work for next time's poll.
>>>>
>>>> -Aaron
>>>>
>>>>>
>>>>> The disk->driverfs_dev can be used to host the pm qos flag, ATA layer
>>>>> can access it through ata_device->sdev->sdev_gendev.
>>>>>
>>>>> Is this OK?
>>>
>>> No, I don't think so.  PM QoS is about telling the layer that will put the
>>> device into low-power states what states are to be taken into consideration.
>>> In this case, however, we need to tell someone else that the device has been
>>> turned off.  Clearly, we need a way to do that, but not through PM QoS.
>>>
>>> Did you consider using pm_runtime_suspended() to check the device status?
>>
>> The problem is, a device can be in runtime suspended state while still
>> needs to be polled...
> 
> Well, maybe this is the problem, then?  Why does it need to be polled when
> suspended?

For ODDs, poll is not necessary only when ZP capable ODD is powered off.
For other ODDs, poll still needs to go on.

ZP capable ODDs:
  - runtime suspended, power remained(due to NO_POWER_OFF qos flag)
    poll is needed
  -- runtime suspended, power removed
    poll is not needed

Non ZP capable ODDs:
  -- runtime suspended, power remained (power will never be removed)
    poll is needed

If we do not poll for the powered on case, we will lose media change
event.

Thanks,
Aaron

> 
> Rafael
> 
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Aaron Lu - Nov. 26, 2012, 1:09 a.m.
On 11/26/2012 09:11 AM, Rafael J. Wysocki wrote:
> On Monday, November 26, 2012 09:05:58 AM Aaron Lu wrote:
>> On 11/26/2012 09:03 AM, Rafael J. Wysocki wrote:
>>> On Monday, November 26, 2012 08:48:51 AM Aaron Lu wrote:
>>>> On 11/26/2012 08:50 AM, Rafael J. Wysocki wrote:
>>>>> On Tuesday, November 20, 2012 04:59:57 PM Aaron Lu wrote:
>>>>>> On 11/20/2012 02:00 PM, Aaron Lu wrote:
>>>>>>> On 11/19/2012 10:56 PM, Tejun Heo wrote:
>>>>>>>> I really think we need a way for (auto)pm and event polling to talk to
>>>>>>>> each other so that autopm can tell event poll to sod off while pm is
>>>>>>>> in effect.  Trying to solve this from inside libata doesn't seem
>>>>>>>> right.  The problem, again, seems to be figuring out which hardware
>>>>>>>> device maps to which block device.  Hmmm... Any good ideas?
>>>>>>>
>>>>>>> A possible way of doing this is using pm qos.
>>>>>>>
>>>>>>> We currently have 2 pm qos flags, NO_POWER_OFF and REMOTE_WAKEUP, and we
>>>>>>> can add another one: NO_POLL, use it like the following:
>>>>>>> 1 Set the NO_POLL pm qos flag when the underlying driver thinks it is no
>>>>>>>   longer necessary. In the ZPODD's case, it should be set when the
>>>>>>>   device is to be powered off;
>>>>>>> 2 Clear it when poll is necessary again. In the ZPODD's case, when power
>>>>>>>   is re-gained, this flag will be cleared.
>>>>>>
>>>>>>
>>>>>>> 3 In the disk_events_workfn, check if this flag is set, if so, simply
>>>>>>>   return.
>>>>>>
>>>>>> It should be, skip calling disk->fops->check_events, but still queue the
>>>>>> work for next time's poll.
>>>>>>
>>>>>> -Aaron
>>>>>>
>>>>>>>
>>>>>>> The disk->driverfs_dev can be used to host the pm qos flag, ATA layer
>>>>>>> can access it through ata_device->sdev->sdev_gendev.
>>>>>>>
>>>>>>> Is this OK?
>>>>>
>>>>> No, I don't think so.  PM QoS is about telling the layer that will put the
>>>>> device into low-power states what states are to be taken into consideration.
>>>>> In this case, however, we need to tell someone else that the device has been
>>>>> turned off.  Clearly, we need a way to do that, but not through PM QoS.
>>>>>
>>>>> Did you consider using pm_runtime_suspended() to check the device status?
>>>>
>>>> The problem is, a device can be in runtime suspended state while still
>>>> needs to be polled...
>>>
>>> Well, maybe this is the problem, then?  Why does it need to be polled when
>>> suspended?
>>
>> For ODDs, poll is not necessary only when ZP capable ODD is powered off.
>> For other ODDs, poll still needs to go on.
>>
>> ZP capable ODDs:
>>   - runtime suspended, power remained(due to NO_POWER_OFF qos flag)
>>     poll is needed
>>   -- runtime suspended, power removed
>>     poll is not needed
>>
>> Non ZP capable ODDs:
>>   -- runtime suspended, power remained (power will never be removed)
>>     poll is needed
>>
>> If we do not poll for the powered on case, we will lose media change
>> event.
> 
> But the media change event should change the status from suspended to active,
> shouldn't it?

The media change event is derived from the poll, if we stop the poll, how
can we know the event in the first place?

Thanks,
Aaron

> 
> Rafael
> 
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rafael J. Wysocki - Nov. 26, 2012, 1:11 a.m.
On Monday, November 26, 2012 09:05:58 AM Aaron Lu wrote:
> On 11/26/2012 09:03 AM, Rafael J. Wysocki wrote:
> > On Monday, November 26, 2012 08:48:51 AM Aaron Lu wrote:
> >> On 11/26/2012 08:50 AM, Rafael J. Wysocki wrote:
> >>> On Tuesday, November 20, 2012 04:59:57 PM Aaron Lu wrote:
> >>>> On 11/20/2012 02:00 PM, Aaron Lu wrote:
> >>>>> On 11/19/2012 10:56 PM, Tejun Heo wrote:
> >>>>>> I really think we need a way for (auto)pm and event polling to talk to
> >>>>>> each other so that autopm can tell event poll to sod off while pm is
> >>>>>> in effect.  Trying to solve this from inside libata doesn't seem
> >>>>>> right.  The problem, again, seems to be figuring out which hardware
> >>>>>> device maps to which block device.  Hmmm... Any good ideas?
> >>>>>
> >>>>> A possible way of doing this is using pm qos.
> >>>>>
> >>>>> We currently have 2 pm qos flags, NO_POWER_OFF and REMOTE_WAKEUP, and we
> >>>>> can add another one: NO_POLL, use it like the following:
> >>>>> 1 Set the NO_POLL pm qos flag when the underlying driver thinks it is no
> >>>>>   longer necessary. In the ZPODD's case, it should be set when the
> >>>>>   device is to be powered off;
> >>>>> 2 Clear it when poll is necessary again. In the ZPODD's case, when power
> >>>>>   is re-gained, this flag will be cleared.
> >>>>
> >>>>
> >>>>> 3 In the disk_events_workfn, check if this flag is set, if so, simply
> >>>>>   return.
> >>>>
> >>>> It should be, skip calling disk->fops->check_events, but still queue the
> >>>> work for next time's poll.
> >>>>
> >>>> -Aaron
> >>>>
> >>>>>
> >>>>> The disk->driverfs_dev can be used to host the pm qos flag, ATA layer
> >>>>> can access it through ata_device->sdev->sdev_gendev.
> >>>>>
> >>>>> Is this OK?
> >>>
> >>> No, I don't think so.  PM QoS is about telling the layer that will put the
> >>> device into low-power states what states are to be taken into consideration.
> >>> In this case, however, we need to tell someone else that the device has been
> >>> turned off.  Clearly, we need a way to do that, but not through PM QoS.
> >>>
> >>> Did you consider using pm_runtime_suspended() to check the device status?
> >>
> >> The problem is, a device can be in runtime suspended state while still
> >> needs to be polled...
> > 
> > Well, maybe this is the problem, then?  Why does it need to be polled when
> > suspended?
> 
> For ODDs, poll is not necessary only when ZP capable ODD is powered off.
> For other ODDs, poll still needs to go on.
> 
> ZP capable ODDs:
>   - runtime suspended, power remained(due to NO_POWER_OFF qos flag)
>     poll is needed
>   -- runtime suspended, power removed
>     poll is not needed
> 
> Non ZP capable ODDs:
>   -- runtime suspended, power remained (power will never be removed)
>     poll is needed
> 
> If we do not poll for the powered on case, we will lose media change
> event.

But the media change event should change the status from suspended to active,
shouldn't it?

Rafael
Aaron Lu - Nov. 26, 2012, 1:17 a.m.
On 11/26/2012 09:11 AM, Rafael J. Wysocki wrote:
> On Monday, November 26, 2012 09:05:58 AM Aaron Lu wrote:
>> On 11/26/2012 09:03 AM, Rafael J. Wysocki wrote:
>>> On Monday, November 26, 2012 08:48:51 AM Aaron Lu wrote:
>>>> On 11/26/2012 08:50 AM, Rafael J. Wysocki wrote:
>>>>> On Tuesday, November 20, 2012 04:59:57 PM Aaron Lu wrote:
>>>>>> On 11/20/2012 02:00 PM, Aaron Lu wrote:
>>>>>>> On 11/19/2012 10:56 PM, Tejun Heo wrote:
>>>>>>>> I really think we need a way for (auto)pm and event polling to talk to
>>>>>>>> each other so that autopm can tell event poll to sod off while pm is
>>>>>>>> in effect.  Trying to solve this from inside libata doesn't seem
>>>>>>>> right.  The problem, again, seems to be figuring out which hardware
>>>>>>>> device maps to which block device.  Hmmm... Any good ideas?
>>>>>>>
>>>>>>> A possible way of doing this is using pm qos.
>>>>>>>
>>>>>>> We currently have 2 pm qos flags, NO_POWER_OFF and REMOTE_WAKEUP, and we
>>>>>>> can add another one: NO_POLL, use it like the following:
>>>>>>> 1 Set the NO_POLL pm qos flag when the underlying driver thinks it is no
>>>>>>>   longer necessary. In the ZPODD's case, it should be set when the
>>>>>>>   device is to be powered off;
>>>>>>> 2 Clear it when poll is necessary again. In the ZPODD's case, when power
>>>>>>>   is re-gained, this flag will be cleared.
>>>>>>
>>>>>>
>>>>>>> 3 In the disk_events_workfn, check if this flag is set, if so, simply
>>>>>>>   return.
>>>>>>
>>>>>> It should be, skip calling disk->fops->check_events, but still queue the
>>>>>> work for next time's poll.
>>>>>>
>>>>>> -Aaron
>>>>>>
>>>>>>>
>>>>>>> The disk->driverfs_dev can be used to host the pm qos flag, ATA layer
>>>>>>> can access it through ata_device->sdev->sdev_gendev.
>>>>>>>
>>>>>>> Is this OK?
>>>>>
>>>>> No, I don't think so.  PM QoS is about telling the layer that will put the
>>>>> device into low-power states what states are to be taken into consideration.
>>>>> In this case, however, we need to tell someone else that the device has been
>>>>> turned off.  Clearly, we need a way to do that, but not through PM QoS.
>>>>>
>>>>> Did you consider using pm_runtime_suspended() to check the device status?
>>>>
>>>> The problem is, a device can be in runtime suspended state while still
>>>> needs to be polled...
>>>
>>> Well, maybe this is the problem, then?  Why does it need to be polled when
>>> suspended?
>>
>> For ODDs, poll is not necessary only when ZP capable ODD is powered off.
>> For other ODDs, poll still needs to go on.
>>
>> ZP capable ODDs:
>>   - runtime suspended, power remained(due to NO_POWER_OFF qos flag)
>>     poll is needed
>>   -- runtime suspended, power removed
>>     poll is not needed
>>
>> Non ZP capable ODDs:
>>   -- runtime suspended, power remained (power will never be removed)
>>     poll is needed
>>
>> If we do not poll for the powered on case, we will lose media change
>> event.
> 
> But the media change event should change the status from suspended to active,
> shouldn't it?

We need a way PM code can tell block layer that it is not necessary to
poll the device now, and runtime suspended is not enough, so we need
another way.

Thanks,
Aaron

> 
> Rafael
> 
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Aaron Lu - Nov. 26, 2012, 1:22 a.m.
On 11/26/2012 09:22 AM, Rafael J. Wysocki wrote:
> On Monday, November 26, 2012 09:09:36 AM Aaron Lu wrote:
>> On 11/26/2012 09:11 AM, Rafael J. Wysocki wrote:
>>> On Monday, November 26, 2012 09:05:58 AM Aaron Lu wrote:
>>>> On 11/26/2012 09:03 AM, Rafael J. Wysocki wrote:
>>>>> On Monday, November 26, 2012 08:48:51 AM Aaron Lu wrote:
>>>>>> On 11/26/2012 08:50 AM, Rafael J. Wysocki wrote:
>>>>>>> On Tuesday, November 20, 2012 04:59:57 PM Aaron Lu wrote:
>>>>>>>> On 11/20/2012 02:00 PM, Aaron Lu wrote:
>>>>>>>>> On 11/19/2012 10:56 PM, Tejun Heo wrote:
>>>>>>>>>> I really think we need a way for (auto)pm and event polling to talk to
>>>>>>>>>> each other so that autopm can tell event poll to sod off while pm is
>>>>>>>>>> in effect.  Trying to solve this from inside libata doesn't seem
>>>>>>>>>> right.  The problem, again, seems to be figuring out which hardware
>>>>>>>>>> device maps to which block device.  Hmmm... Any good ideas?
>>>>>>>>>
>>>>>>>>> A possible way of doing this is using pm qos.
>>>>>>>>>
>>>>>>>>> We currently have 2 pm qos flags, NO_POWER_OFF and REMOTE_WAKEUP, and we
>>>>>>>>> can add another one: NO_POLL, use it like the following:
>>>>>>>>> 1 Set the NO_POLL pm qos flag when the underlying driver thinks it is no
>>>>>>>>>   longer necessary. In the ZPODD's case, it should be set when the
>>>>>>>>>   device is to be powered off;
>>>>>>>>> 2 Clear it when poll is necessary again. In the ZPODD's case, when power
>>>>>>>>>   is re-gained, this flag will be cleared.
>>>>>>>>
>>>>>>>>
>>>>>>>>> 3 In the disk_events_workfn, check if this flag is set, if so, simply
>>>>>>>>>   return.
>>>>>>>>
>>>>>>>> It should be, skip calling disk->fops->check_events, but still queue the
>>>>>>>> work for next time's poll.
>>>>>>>>
>>>>>>>> -Aaron
>>>>>>>>
>>>>>>>>>
>>>>>>>>> The disk->driverfs_dev can be used to host the pm qos flag, ATA layer
>>>>>>>>> can access it through ata_device->sdev->sdev_gendev.
>>>>>>>>>
>>>>>>>>> Is this OK?
>>>>>>>
>>>>>>> No, I don't think so.  PM QoS is about telling the layer that will put the
>>>>>>> device into low-power states what states are to be taken into consideration.
>>>>>>> In this case, however, we need to tell someone else that the device has been
>>>>>>> turned off.  Clearly, we need a way to do that, but not through PM QoS.
>>>>>>>
>>>>>>> Did you consider using pm_runtime_suspended() to check the device status?
>>>>>>
>>>>>> The problem is, a device can be in runtime suspended state while still
>>>>>> needs to be polled...
>>>>>
>>>>> Well, maybe this is the problem, then?  Why does it need to be polled when
>>>>> suspended?
>>>>
>>>> For ODDs, poll is not necessary only when ZP capable ODD is powered off.
>>>> For other ODDs, poll still needs to go on.
>>>>
>>>> ZP capable ODDs:
>>>>   - runtime suspended, power remained(due to NO_POWER_OFF qos flag)
>>>>     poll is needed
>>>>   -- runtime suspended, power removed
>>>>     poll is not needed
>>>>
>>>> Non ZP capable ODDs:
>>>>   -- runtime suspended, power remained (power will never be removed)
>>>>     poll is needed
>>>>
>>>> If we do not poll for the powered on case, we will lose media change
>>>> event.
>>>
>>> But the media change event should change the status from suspended to active,
>>> shouldn't it?
>>
>> The media change event is derived from the poll, if we stop the poll, how
>> can we know the event in the first place?
> 
> OK, so what you're trying to say is that if the device is not turned off
> and the user opens the tray and inserts a media in there, we won't know that
> that happened without polling.  Is that correct?

Yes.

> 
> If so, can you please remind me why we want to pretend that the device is
> suspended if we want to poll it?

We do not pretend the device is suspended, it is :-)

Even though we want to poll it, we are not polling it all the time, we
still have the poll interval, where the device and its ancestor devices
can enter runtime suspended state.

The timing to idle the device is decided by SCSI sr driver, and why it
is done this way is discussed here:
http://thread.gmane.org/gmane.linux.acpi.devel/55243/focus=52703

Thanks,
Aaron

> 
> Rafael
> 
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rafael J. Wysocki - Nov. 26, 2012, 1:22 a.m.
On Monday, November 26, 2012 09:09:36 AM Aaron Lu wrote:
> On 11/26/2012 09:11 AM, Rafael J. Wysocki wrote:
> > On Monday, November 26, 2012 09:05:58 AM Aaron Lu wrote:
> >> On 11/26/2012 09:03 AM, Rafael J. Wysocki wrote:
> >>> On Monday, November 26, 2012 08:48:51 AM Aaron Lu wrote:
> >>>> On 11/26/2012 08:50 AM, Rafael J. Wysocki wrote:
> >>>>> On Tuesday, November 20, 2012 04:59:57 PM Aaron Lu wrote:
> >>>>>> On 11/20/2012 02:00 PM, Aaron Lu wrote:
> >>>>>>> On 11/19/2012 10:56 PM, Tejun Heo wrote:
> >>>>>>>> I really think we need a way for (auto)pm and event polling to talk to
> >>>>>>>> each other so that autopm can tell event poll to sod off while pm is
> >>>>>>>> in effect.  Trying to solve this from inside libata doesn't seem
> >>>>>>>> right.  The problem, again, seems to be figuring out which hardware
> >>>>>>>> device maps to which block device.  Hmmm... Any good ideas?
> >>>>>>>
> >>>>>>> A possible way of doing this is using pm qos.
> >>>>>>>
> >>>>>>> We currently have 2 pm qos flags, NO_POWER_OFF and REMOTE_WAKEUP, and we
> >>>>>>> can add another one: NO_POLL, use it like the following:
> >>>>>>> 1 Set the NO_POLL pm qos flag when the underlying driver thinks it is no
> >>>>>>>   longer necessary. In the ZPODD's case, it should be set when the
> >>>>>>>   device is to be powered off;
> >>>>>>> 2 Clear it when poll is necessary again. In the ZPODD's case, when power
> >>>>>>>   is re-gained, this flag will be cleared.
> >>>>>>
> >>>>>>
> >>>>>>> 3 In the disk_events_workfn, check if this flag is set, if so, simply
> >>>>>>>   return.
> >>>>>>
> >>>>>> It should be, skip calling disk->fops->check_events, but still queue the
> >>>>>> work for next time's poll.
> >>>>>>
> >>>>>> -Aaron
> >>>>>>
> >>>>>>>
> >>>>>>> The disk->driverfs_dev can be used to host the pm qos flag, ATA layer
> >>>>>>> can access it through ata_device->sdev->sdev_gendev.
> >>>>>>>
> >>>>>>> Is this OK?
> >>>>>
> >>>>> No, I don't think so.  PM QoS is about telling the layer that will put the
> >>>>> device into low-power states what states are to be taken into consideration.
> >>>>> In this case, however, we need to tell someone else that the device has been
> >>>>> turned off.  Clearly, we need a way to do that, but not through PM QoS.
> >>>>>
> >>>>> Did you consider using pm_runtime_suspended() to check the device status?
> >>>>
> >>>> The problem is, a device can be in runtime suspended state while still
> >>>> needs to be polled...
> >>>
> >>> Well, maybe this is the problem, then?  Why does it need to be polled when
> >>> suspended?
> >>
> >> For ODDs, poll is not necessary only when ZP capable ODD is powered off.
> >> For other ODDs, poll still needs to go on.
> >>
> >> ZP capable ODDs:
> >>   - runtime suspended, power remained(due to NO_POWER_OFF qos flag)
> >>     poll is needed
> >>   -- runtime suspended, power removed
> >>     poll is not needed
> >>
> >> Non ZP capable ODDs:
> >>   -- runtime suspended, power remained (power will never be removed)
> >>     poll is needed
> >>
> >> If we do not poll for the powered on case, we will lose media change
> >> event.
> > 
> > But the media change event should change the status from suspended to active,
> > shouldn't it?
> 
> The media change event is derived from the poll, if we stop the poll, how
> can we know the event in the first place?

OK, so what you're trying to say is that if the device is not turned off
and the user opens the tray and inserts a media in there, we won't know that
that happened without polling.  Is that correct?

If so, can you please remind me why we want to pretend that the device is
suspended if we want to poll it?

Rafael
James Bottomley - Nov. 26, 2012, 5:03 a.m.
On Mon, 2012-11-26 at 01:33 +0100, Rafael J. Wysocki wrote:
> On Monday, November 19, 2012 03:06:51 PM James Bottomley wrote:
> > > I really think we need a way for (auto)pm and event polling to talk to
> > > each other so that autopm can tell event poll to sod off while pm is
> > > in effect.  Trying to solve this from inside libata doesn't seem
> > > right.  The problem, again, seems to be figuring out which hardware
> > > device maps to which block device.  Hmmm... Any good ideas?
> > 
> > I've asked the PM people several times about this, because it's a real
> > problem for almost everything:  PM needs some type of top to bottom
> > stack view, which the layering isolation we have within storage really
> > doesn't cope with well.  No real suggestion has been forthcoming.
> 
> Actually, I think that the particular case in question is really special
> and the fact that there's the pollig loop that user space is involved in
> doesn't make things more stratightforward.
> 
> And PM really doesn't need to see things top to bottom, but the polling
> needs to know what happens in the PM land.  We need to be able to tell it
> "from now on tell user space that there are no events here".  The question
> is where to put that information so that it's accessible to all parts of the
> stack involved.

Right, open to suggestions ...

> > The reason I think it should be emulated (in the acpi layer, not libata,
> > but as long as it's not in SCSI, I'm not so fussed where it ends up) is
> > because ZPODD is the software equivalent of ZPREADY, which will be done
> > in hardware and will be effectively invisible to autopm in the same way
> > that SCSI (and ATA) power management is mostly invisible.  If we
> > currently do ZPREADY emulation in the low layer (i.e. ZPODD has exact
> > ZPREADY emulation), we won't care (except for flipping the sofware bit)
> > whether the device support ZPODD or ZPREADY and it will all just
> > work(tm).  The industry expectation is that ZPODD is just a transition
> > state between current power management and ZPREADY.
> 
> Well, if you poll a ZPREADY-capable drive, it will go off and on in cycles
> transparently, but still it won't save as much energy as it can.  We'll need
> to do something about the polling in that case too, it seems.

No: with ZPREADY, the device effectively lies to the poll.  The Spec
says that when it powers off the mechanical pieces, it must reply from
firmware to a certain preset emulations of SCSI commands and not wake
from power off.  These commands include TEST UNIT READY and a few
others, so the device will happily reply to polls while being off (it
replies with the original state before power was lost).  When you issue
actual medium access commands, or manually insert or remove media it
will wake up.

That's why I think ZPODD should emulate this behaviour.

James


--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Aaron Lu - Nov. 26, 2012, 5:09 a.m.
On 11/26/2012 01:03 PM, James Bottomley wrote:
> On Mon, 2012-11-26 at 01:33 +0100, Rafael J. Wysocki wrote:
>> On Monday, November 19, 2012 03:06:51 PM James Bottomley wrote:
>>>> I really think we need a way for (auto)pm and event polling to talk to
>>>> each other so that autopm can tell event poll to sod off while pm is
>>>> in effect.  Trying to solve this from inside libata doesn't seem
>>>> right.  The problem, again, seems to be figuring out which hardware
>>>> device maps to which block device.  Hmmm... Any good ideas?
>>>
>>> I've asked the PM people several times about this, because it's a real
>>> problem for almost everything:  PM needs some type of top to bottom
>>> stack view, which the layering isolation we have within storage really
>>> doesn't cope with well.  No real suggestion has been forthcoming.
>>
>> Actually, I think that the particular case in question is really special
>> and the fact that there's the pollig loop that user space is involved in
>> doesn't make things more stratightforward.
>>
>> And PM really doesn't need to see things top to bottom, but the polling
>> needs to know what happens in the PM land.  We need to be able to tell it
>> "from now on tell user space that there are no events here".  The question
>> is where to put that information so that it's accessible to all parts of the
>> stack involved.
> 
> Right, open to suggestions ...
> 
>>> The reason I think it should be emulated (in the acpi layer, not libata,
>>> but as long as it's not in SCSI, I'm not so fussed where it ends up) is
>>> because ZPODD is the software equivalent of ZPREADY, which will be done
>>> in hardware and will be effectively invisible to autopm in the same way
>>> that SCSI (and ATA) power management is mostly invisible.  If we
>>> currently do ZPREADY emulation in the low layer (i.e. ZPODD has exact
>>> ZPREADY emulation), we won't care (except for flipping the sofware bit)
>>> whether the device support ZPODD or ZPREADY and it will all just
>>> work(tm).  The industry expectation is that ZPODD is just a transition
>>> state between current power management and ZPREADY.
>>
>> Well, if you poll a ZPREADY-capable drive, it will go off and on in cycles
>> transparently, but still it won't save as much energy as it can.  We'll need
>> to do something about the polling in that case too, it seems.
> 
> No: with ZPREADY, the device effectively lies to the poll.  The Spec
> says that when it powers off the mechanical pieces, it must reply from
> firmware to a certain preset emulations of SCSI commands and not wake
> from power off.  These commands include TEST UNIT READY and a few
> others, so the device will happily reply to polls while being off (it
> replies with the original state before power was lost).  When you issue
> actual medium access commands, or manually insert or remove media it
> will wake up.
> 
> That's why I think ZPODD should emulate this behaviour.

I suppose you are refering to section 15.3.5?

I don't think the SPEC says what the host software _must_ do, it's
informative. And I agree that when possible, we should emulate the
command without powering up the ODD, but if we can eliminate the noise,
wouldn't that be better?

-Aaron

> 
> James
> 
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
James Bottomley - Nov. 26, 2012, 7:32 a.m.
On Mon, 2012-11-26 at 13:09 +0800, Aaron Lu wrote:
> On 11/26/2012 01:03 PM, James Bottomley wrote:
> > On Mon, 2012-11-26 at 01:33 +0100, Rafael J. Wysocki wrote:
> >> On Monday, November 19, 2012 03:06:51 PM James Bottomley wrote:
> >>>> I really think we need a way for (auto)pm and event polling to talk to
> >>>> each other so that autopm can tell event poll to sod off while pm is
> >>>> in effect.  Trying to solve this from inside libata doesn't seem
> >>>> right.  The problem, again, seems to be figuring out which hardware
> >>>> device maps to which block device.  Hmmm... Any good ideas?
> >>>
> >>> I've asked the PM people several times about this, because it's a real
> >>> problem for almost everything:  PM needs some type of top to bottom
> >>> stack view, which the layering isolation we have within storage really
> >>> doesn't cope with well.  No real suggestion has been forthcoming.
> >>
> >> Actually, I think that the particular case in question is really special
> >> and the fact that there's the pollig loop that user space is involved in
> >> doesn't make things more stratightforward.
> >>
> >> And PM really doesn't need to see things top to bottom, but the polling
> >> needs to know what happens in the PM land.  We need to be able to tell it
> >> "from now on tell user space that there are no events here".  The question
> >> is where to put that information so that it's accessible to all parts of the
> >> stack involved.
> > 
> > Right, open to suggestions ...
> > 
> >>> The reason I think it should be emulated (in the acpi layer, not libata,
> >>> but as long as it's not in SCSI, I'm not so fussed where it ends up) is
> >>> because ZPODD is the software equivalent of ZPREADY, which will be done
> >>> in hardware and will be effectively invisible to autopm in the same way
> >>> that SCSI (and ATA) power management is mostly invisible.  If we
> >>> currently do ZPREADY emulation in the low layer (i.e. ZPODD has exact
> >>> ZPREADY emulation), we won't care (except for flipping the sofware bit)
> >>> whether the device support ZPODD or ZPREADY and it will all just
> >>> work(tm).  The industry expectation is that ZPODD is just a transition
> >>> state between current power management and ZPREADY.
> >>
> >> Well, if you poll a ZPREADY-capable drive, it will go off and on in cycles
> >> transparently, but still it won't save as much energy as it can.  We'll need
> >> to do something about the polling in that case too, it seems.
> > 
> > No: with ZPREADY, the device effectively lies to the poll.  The Spec
> > says that when it powers off the mechanical pieces, it must reply from
> > firmware to a certain preset emulations of SCSI commands and not wake
> > from power off.  These commands include TEST UNIT READY and a few
> > others, so the device will happily reply to polls while being off (it
> > replies with the original state before power was lost).  When you issue
> > actual medium access commands, or manually insert or remove media it
> > will wake up.
> > 
> > That's why I think ZPODD should emulate this behaviour.
> 
> I suppose you are refering to section 15.3.5?

That's the recommendation for emulating ZPREADY in ZPODD devices, yes.

> I don't think the SPEC says what the host software _must_ do, it's
> informative. And I agree that when possible, we should emulate the
> command without powering up the ODD, but if we can eliminate the noise,
> wouldn't that be better?

The way I look at it is we currently have no real power management for
actual SCSI devices, we rely on the internal timers of the device to
effect power management for us.  ZPREADY fits right into this scheme (as
I think it was designed to) so it seems odd to me that we would
introduce a software PM based scheme for ZPODD, which is an interim spec
until everything supports ZPREADY, and then go back to doing nothing for
ZPREADY.

I'm amenable to a proposal that we *shouldn't* be doing nothing for SCSI
PM and perhaps it should be plumbed into software PM, but it looks odd
to me to have sofware PM for ZPODD but not for at least ZPREADY and
probably for SCSI PM as well.

If we elect to do nothing about ZPREADY and SCSI PM, then I think ZPODD
should be implemented in a way that emulates ZPREADY (i.e. as section
15.3.5 recommends).

James


--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Aaron Lu - Nov. 26, 2012, 8:27 a.m.
On 11/26/2012 03:32 PM, James Bottomley wrote:
> On Mon, 2012-11-26 at 13:09 +0800, Aaron Lu wrote:
>> On 11/26/2012 01:03 PM, James Bottomley wrote:
>>> On Mon, 2012-11-26 at 01:33 +0100, Rafael J. Wysocki wrote:
>>>> On Monday, November 19, 2012 03:06:51 PM James Bottomley wrote:
>>>>>> I really think we need a way for (auto)pm and event polling to talk to
>>>>>> each other so that autopm can tell event poll to sod off while pm is
>>>>>> in effect.  Trying to solve this from inside libata doesn't seem
>>>>>> right.  The problem, again, seems to be figuring out which hardware
>>>>>> device maps to which block device.  Hmmm... Any good ideas?
>>>>>
>>>>> I've asked the PM people several times about this, because it's a real
>>>>> problem for almost everything:  PM needs some type of top to bottom
>>>>> stack view, which the layering isolation we have within storage really
>>>>> doesn't cope with well.  No real suggestion has been forthcoming.
>>>>
>>>> Actually, I think that the particular case in question is really special
>>>> and the fact that there's the pollig loop that user space is involved in
>>>> doesn't make things more stratightforward.
>>>>
>>>> And PM really doesn't need to see things top to bottom, but the polling
>>>> needs to know what happens in the PM land.  We need to be able to tell it
>>>> "from now on tell user space that there are no events here".  The question
>>>> is where to put that information so that it's accessible to all parts of the
>>>> stack involved.
>>>
>>> Right, open to suggestions ...
>>>
>>>>> The reason I think it should be emulated (in the acpi layer, not libata,
>>>>> but as long as it's not in SCSI, I'm not so fussed where it ends up) is
>>>>> because ZPODD is the software equivalent of ZPREADY, which will be done
>>>>> in hardware and will be effectively invisible to autopm in the same way
>>>>> that SCSI (and ATA) power management is mostly invisible.  If we
>>>>> currently do ZPREADY emulation in the low layer (i.e. ZPODD has exact
>>>>> ZPREADY emulation), we won't care (except for flipping the sofware bit)
>>>>> whether the device support ZPODD or ZPREADY and it will all just
>>>>> work(tm).  The industry expectation is that ZPODD is just a transition
>>>>> state between current power management and ZPREADY.
>>>>
>>>> Well, if you poll a ZPREADY-capable drive, it will go off and on in cycles
>>>> transparently, but still it won't save as much energy as it can.  We'll need
>>>> to do something about the polling in that case too, it seems.
>>>
>>> No: with ZPREADY, the device effectively lies to the poll.  The Spec
>>> says that when it powers off the mechanical pieces, it must reply from
>>> firmware to a certain preset emulations of SCSI commands and not wake
>>> from power off.  These commands include TEST UNIT READY and a few
>>> others, so the device will happily reply to polls while being off (it
>>> replies with the original state before power was lost).  When you issue
>>> actual medium access commands, or manually insert or remove media it
>>> will wake up.
>>>
>>> That's why I think ZPODD should emulate this behaviour.
>>
>> I suppose you are refering to section 15.3.5?
> 
> That's the recommendation for emulating ZPREADY in ZPODD devices, yes.
> 
>> I don't think the SPEC says what the host software _must_ do, it's
>> informative. And I agree that when possible, we should emulate the
>> command without powering up the ODD, but if we can eliminate the noise,
>> wouldn't that be better?
> 
> The way I look at it is we currently have no real power management for
> actual SCSI devices, we rely on the internal timers of the device to
> effect power management for us.  ZPREADY fits right into this scheme (as
> I think it was designed to) so it seems odd to me that we would
> introduce a software PM based scheme for ZPODD, which is an interim spec
> until everything supports ZPREADY, and then go back to doing nothing for
> ZPREADY.
> 
> I'm amenable to a proposal that we *shouldn't* be doing nothing for SCSI
> PM and perhaps it should be plumbed into software PM, but it looks odd
> to me to have sofware PM for ZPODD but not for at least ZPREADY and
> probably for SCSI PM as well.

Well, ZPREADY is not a power state that we can program the ODD to
enter(figure 234 and table 323 of the SPEC), it servers more like an
information provided by ODD to host so that host does not need to do TUR
and then examine the sense code to decide if zero power ready status is
satisfied but simply query ODD if its current power state is ZPREADY.
So it's not that we program the device to go into ZPREADY power state
and the ODD's power will be omitted.

The benefit of a ZPREADY capable ODD is that, when we need to decide if
the ODD is in a zero power ready status, we can simply query the ODD by
issuing a GET_EVENT_STATUS_NOTIFICATION and check the returned power
class events, if it is in ZPREADY power state, then we can omit the
power. To support ZPREADY, we just need some change to the zpready
funtion, which currently uses sense code to check ZP ready status.

So this is my understanding of ZPREADY, and I don't see it as a total
different thing with ZPODD, it just changes the way how host senses the
zero power ready status. But if I was wrong, please kindly let me know,
thanks.

-Aaron

> 
> If we elect to do nothing about ZPREADY and SCSI PM, then I think ZPODD
> should be implemented in a way that emulates ZPREADY (i.e. as section
> 15.3.5 recommends).
> 
> James
> 
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
James Bottomley - Nov. 26, 2012, 1:17 p.m.
On Mon, 2012-11-26 at 16:27 +0800, Aaron Lu wrote:
> Well, ZPREADY is not a power state that we can program the ODD to
> enter(figure 234 and table 323 of the SPEC), it servers more like an
> information provided by ODD to host so that host does not need to do TUR
> and then examine the sense code to decide if zero power ready status is
> satisfied but simply query ODD if its current power state is ZPREADY.
> So it's not that we program the device to go into ZPREADY power state
> and the ODD's power will be omitted.
> 
> The benefit of a ZPREADY capable ODD is that, when we need to decide if
> the ODD is in a zero power ready status, we can simply query the ODD by
> issuing a GET_EVENT_STATUS_NOTIFICATION and check the returned power
> class events, if it is in ZPREADY power state, then we can omit the
> power. To support ZPREADY, we just need some change to the zpready
> funtion, which currently uses sense code to check ZP ready status.
> 
> So this is my understanding of ZPREADY, and I don't see it as a total
> different thing with ZPODD, it just changes the way how host senses the
> zero power ready status. But if I was wrong, please kindly let me know,
> thanks.

My understanding is that a ZPREADY device may be capable of internal
power down, meaning it doesn't necessarily need the host to omit the
power.  It depends what the difference is between Sleep and Off is
(they're deliberately left as implementation defined in the standard, Ch
16, but the conditions of sleep are pretty onerous, so it sounds like
most of the mechanics are powered down).

However, if you want to work it similarly to ZPODD, then the timeouts
automatically transitions to ZPREADY, the device issues an event, we
trap the event at the low level and omit power.

I'm also curious about driving sleep from autopm, since mode page timers
don't control the sleep transition.

James


--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alan Stern - Nov. 26, 2012, 4:21 p.m.
On Mon, 26 Nov 2012, James Bottomley wrote:

> I'm also curious about driving sleep from autopm, since mode page timers
> don't control the sleep transition.

Is it feasible to do this the other way around?  That is, to drive 
runtime suspend by noticing when the device decides to put itself into 
a low-power state?

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
James Bottomley - Nov. 26, 2012, 7:15 p.m.
On Mon, 2012-11-26 at 11:21 -0500, Alan Stern wrote:
> On Mon, 26 Nov 2012, James Bottomley wrote:
> 
> > I'm also curious about driving sleep from autopm, since mode page timers
> > don't control the sleep transition.
> 
> Is it feasible to do this the other way around?  That is, to drive 
> runtime suspend by noticing when the device decides to put itself into 
> a low-power state?

Well, yes and no.  The spec is annoyingly (and deliberately) vague about
what the states actually mean.  The two states that the devices go into
because of timers is idle and standby.  The supposedly deeper low power
state of sleep can only be reached by sending a command to the device.
The design is that software (or HBA drivers) don't really need to notice
idle and standby; the device automatically manages transitions between
them depending on command activity.  For sleep, we do have to care (if
it actually makes some meaningful difference).

James


--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Aaron Lu - Nov. 27, 2012, 1:41 a.m.
On 11/26/2012 09:17 PM, James Bottomley wrote:
> On Mon, 2012-11-26 at 16:27 +0800, Aaron Lu wrote:
>> Well, ZPREADY is not a power state that we can program the ODD to
>> enter(figure 234 and table 323 of the SPEC), it servers more like an
>> information provided by ODD to host so that host does not need to do TUR
>> and then examine the sense code to decide if zero power ready status is
>> satisfied but simply query ODD if its current power state is ZPREADY.
>> So it's not that we program the device to go into ZPREADY power state
>> and the ODD's power will be omitted.
>>
>> The benefit of a ZPREADY capable ODD is that, when we need to decide if
>> the ODD is in a zero power ready status, we can simply query the ODD by
>> issuing a GET_EVENT_STATUS_NOTIFICATION and check the returned power
>> class events, if it is in ZPREADY power state, then we can omit the
>> power. To support ZPREADY, we just need some change to the zpready
>> funtion, which currently uses sense code to check ZP ready status.
>>
>> So this is my understanding of ZPREADY, and I don't see it as a total
>> different thing with ZPODD, it just changes the way how host senses the
>> zero power ready status. But if I was wrong, please kindly let me know,
>> thanks.
> 
> My understanding is that a ZPREADY device may be capable of internal
> power down, meaning it doesn't necessarily need the host to omit the
> power.  It depends what the difference is between Sleep and Off is
> (they're deliberately left as implementation defined in the standard, Ch
> 16, but the conditions of sleep are pretty onerous, so it sounds like
> most of the mechanics are powered down).

I Agree that when the ODD is put to Sleep state, it may power down most
of the mechanics, good for power saving. The problem is, we have the 2
seconds poll, and since Sleep state can not process any command, we will
need to bring the ODD out of Sleep state every 2 seconds, is this
feasible? Please note that leaving Sleep state needs full initialization
of the ODD.

ZPODD system(ODD+platform) solves this problem with ACPI, when the ODD
is powered off and any event that may induce a media change event will
generate an ACPI interrupt, so we can stop the poll(though in whatever
way is still in discussion).

So I suppose we need to find a proper way to implement Sleep.

> 
> However, if you want to work it similarly to ZPODD, then the timeouts
> automatically transitions to ZPREADY, the device issues an event, we
> trap the event at the low level and omit power.

Yeah, I can do this. Except that I don't quite understand how the device
issues the event to host, by interrupt? My understanding is that, it
will issue this event to itself...and host still needs to use command
GET_EVENT_STATUS_NOTIFICATION to fetch the event, much the same way like
the media related events it emits.

> 
> I'm also curious about driving sleep from autopm, since mode page timers
> don't control the sleep transition.

I see. But we will need to work out a sensible way to put the ODD into
that power state, if at all possible.

Thanks,
Aaron

--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rafael J. Wysocki - Nov. 28, 2012, 12:51 a.m.
On Monday, November 26, 2012 09:03:11 AM James Bottomley wrote:
> On Mon, 2012-11-26 at 01:33 +0100, Rafael J. Wysocki wrote:
> > On Monday, November 19, 2012 03:06:51 PM James Bottomley wrote:
> > > > I really think we need a way for (auto)pm and event polling to talk to
> > > > each other so that autopm can tell event poll to sod off while pm is
> > > > in effect.  Trying to solve this from inside libata doesn't seem
> > > > right.  The problem, again, seems to be figuring out which hardware
> > > > device maps to which block device.  Hmmm... Any good ideas?
> > > 
> > > I've asked the PM people several times about this, because it's a real
> > > problem for almost everything:  PM needs some type of top to bottom
> > > stack view, which the layering isolation we have within storage really
> > > doesn't cope with well.  No real suggestion has been forthcoming.
> > 
> > Actually, I think that the particular case in question is really special
> > and the fact that there's the pollig loop that user space is involved in
> > doesn't make things more stratightforward.
> > 
> > And PM really doesn't need to see things top to bottom, but the polling
> > needs to know what happens in the PM land.  We need to be able to tell it
> > "from now on tell user space that there are no events here".  The question
> > is where to put that information so that it's accessible to all parts of the
> > stack involved.
> 
> Right, open to suggestions ...

Having considered that a bit more I'm now thinking that in fact the power state
the device is in at the moment doesn't really matter, so the polling code need
not really know what PM is doing.  What it needs to know is that the device
will generate a hardware event when something interesting happens, so it is not
necessary to poll it.

In this particular case it is related to power management (apparently, hardware
events will only be generated after the device has been put into ACPI D3cold,
or so Aaron seems to be claiming), but it need not be in general, at least in
principle.

It looks like we need an "event driven" flag that the can be set in the lower
layers and read by the upper layers.  I suppose this means it would need to be
in struct device, but not necessarily in the PM-specific part of it.

Thanks,
Rafael
Tejun Heo - Nov. 28, 2012, 1:39 a.m.
Hey, Rafael.

On Wed, Nov 28, 2012 at 01:51:00AM +0100, Rafael J. Wysocki wrote:
> Having considered that a bit more I'm now thinking that in fact the power state
> the device is in at the moment doesn't really matter, so the polling code need
> not really know what PM is doing.  What it needs to know is that the device
> will generate a hardware event when something interesting happens, so it is not
> necessary to poll it.
> 
> In this particular case it is related to power management (apparently, hardware
> events will only be generated after the device has been put into ACPI D3cold,
> or so Aaron seems to be claiming), but it need not be in general, at least in
> principle.
> 
> It looks like we need an "event driven" flag that the can be set in the lower
> layers and read by the upper layers.  I suppose this means it would need to be
> in struct device, but not necessarily in the PM-specific part of it.

We already have that.  That's what gendisk->async_events is for (as
opposed to gendisk->events).  If all events are async_events, block
won't poll for events, but I'm not sure that's the golden bullet.

* None implements async_events yet and an API is missing -
  disk_check_events() - which is trivial to add, but it's the same
  story.  We'll need a mechanism to shoot up notification from libata
  to block layer.  It's admittedly easier to justify routing through
  SCSI tho.  So, we're mostly shifting the problem.  Given that async
  events is nice to have, so it isn't a bad idea.

* Still dunno much about zpodd but IIUC the notification from
  zero-power is via ACPI.  To advertise that the device doesn't need
  polling, it should also be able to do async notification while
  powered up, which isn't covered by zpodd but ATA async notification.
  So, ummm... that's another obstacle.  If zpodd requires the device
  to support ATA async notification, it might not be too bad tho.

Thanks.
Aaron Lu - Nov. 28, 2012, 2:24 a.m.
On 11/28/2012 09:39 AM, Tejun Heo wrote:
> Hey, Rafael.
> 
> On Wed, Nov 28, 2012 at 01:51:00AM +0100, Rafael J. Wysocki wrote:
>> Having considered that a bit more I'm now thinking that in fact the power state
>> the device is in at the moment doesn't really matter, so the polling code need
>> not really know what PM is doing.  What it needs to know is that the device
>> will generate a hardware event when something interesting happens, so it is not
>> necessary to poll it.
>>
>> In this particular case it is related to power management (apparently, hardware
>> events will only be generated after the device has been put into ACPI D3cold,
>> or so Aaron seems to be claiming), but it need not be in general, at least in
>> principle.
>>
>> It looks like we need an "event driven" flag that the can be set in the lower
>> layers and read by the upper layers.  I suppose this means it would need to be
>> in struct device, but not necessarily in the PM-specific part of it.
> 
> We already have that.  That's what gendisk->async_events is for (as
> opposed to gendisk->events).  If all events are async_events, block
> won't poll for events, but I'm not sure that's the golden bullet.

Right. For ZPODD, the problem is, during power up time, it needs
gendisk->events to be set to get poll; and after powered off, it will
need to clear the gendisk->events field so that the poll work will stop.
I'm not sure if the gendisk->async_events should be set here, as the
interrupt only says that user pressed the eject button for the tray type
ODD but he may not insert any disc. The whole point of the interrupt is
to re-power the ODD, it is not designed to give notification of media
change. This is my understanding of ZPODD.

> 
> * None implements async_events yet and an API is missing -

That's what I'm confused when reading the sata async support code, the
SCSI sr driver will unconditionally set gendisk->events, so how the sata
async can benefit? But this is another story.

>   disk_check_events() - which is trivial to add, but it's the same
>   story.  We'll need a mechanism to shoot up notification from libata
>   to block layer.  It's admittedly easier to justify routing through
>   SCSI tho.  So, we're mostly shifting the problem.  Given that async
>   events is nice to have, so it isn't a bad idea.
> 
> * Still dunno much about zpodd but IIUC the notification from
>   zero-power is via ACPI.  To advertise that the device doesn't need

Yes, when powered off, if users press the eject button of a tray type
ODD or inserts a disc of the slot type ODD, the SATA ODD will generate a
DEVICE ATTENTION signal, which will cause ACPI to emit an interrupt.

On my test system, when I insert a disc into the slot type ODD to wake
it up, it will continue to generate DEVICE ATTENTION signal, and thus,
ACPI interrupts are coming all the time if I do not disable the ACPI GPE
that controls the interrupt. I guess the behaviour is device by device,
and the SPEC only states what ODD needs to do when in powered off state.
And I don't think a tray type ODD will generate DEVICE ATTENTION signal
when its eject button is pressed after powered up, but Jeff Wu from AMD
may be able to test this behaviour as he has some tray type ODDs if this
is meaningful to know.

>   polling, it should also be able to do async notification while
>   powered up, which isn't covered by zpodd but ATA async notification.

Agree. For powered up media change notification, that's SATA async
notification's job.

>   So, ummm... that's another obstacle.  If zpodd requires the device
>   to support ATA async notification, it might not be too bad tho.

This doesn't seem to be the case, since ZPODD is for how ODD to get
notified when it is powered off, so there is no words stating if the ODD
should also support sata async notification.

Thanks,
Aaron

--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
James Bottomley - Nov. 28, 2012, 8:56 a.m.
On Tue, 2012-11-27 at 17:39 -0800, Tejun Heo wrote:
> Hey, Rafael.
> 
> On Wed, Nov 28, 2012 at 01:51:00AM +0100, Rafael J. Wysocki wrote:
> > Having considered that a bit more I'm now thinking that in fact the power state
> > the device is in at the moment doesn't really matter, so the polling code need
> > not really know what PM is doing.  What it needs to know is that the device
> > will generate a hardware event when something interesting happens, so it is not
> > necessary to poll it.
> > 
> > In this particular case it is related to power management (apparently, hardware
> > events will only be generated after the device has been put into ACPI D3cold,
> > or so Aaron seems to be claiming), but it need not be in general, at least in
> > principle.
> > 
> > It looks like we need an "event driven" flag that the can be set in the lower
> > layers and read by the upper layers.  I suppose this means it would need to be
> > in struct device, but not necessarily in the PM-specific part of it.
> 
> We already have that.  That's what gendisk->async_events is for (as
> opposed to gendisk->events).  If all events are async_events, block
> won't poll for events, but I'm not sure that's the golden bullet.
> 
> * None implements async_events yet and an API is missing -
>   disk_check_events() - which is trivial to add, but it's the same
>   story.  We'll need a mechanism to shoot up notification from libata
>   to block layer.  It's admittedly easier to justify routing through
>   SCSI tho.  So, we're mostly shifting the problem.  Given that async
>   events is nice to have, so it isn't a bad idea.

Could we drive it in the polling code?  As in, if we set a flag to say
we're event driven and please don't bother us, we could just respond to
the poll with the last known state (this would probably have to be in
SCSI somewhere since most polls are Test Unit Readys).  That way ZPODD
sets this flag when the device powers down and unsets it when it powers
up.

James

> * Still dunno much about zpodd but IIUC the notification from
>   zero-power is via ACPI.  To advertise that the device doesn't need
>   polling, it should also be able to do async notification while
>   powered up, which isn't covered by zpodd but ATA async notification.
>   So, ummm... that's another obstacle.  If zpodd requires the device
>   to support ATA async notification, it might not be too bad tho.



--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Aaron Lu - Nov. 30, 2012, 8:55 a.m.
On 11/28/2012 09:39 AM, Tejun Heo wrote:
> Hey, Rafael.
> 
> On Wed, Nov 28, 2012 at 01:51:00AM +0100, Rafael J. Wysocki wrote:
>> Having considered that a bit more I'm now thinking that in fact the power state
>> the device is in at the moment doesn't really matter, so the polling code need
>> not really know what PM is doing.  What it needs to know is that the device
>> will generate a hardware event when something interesting happens, so it is not
>> necessary to poll it.
>>
>> In this particular case it is related to power management (apparently, hardware
>> events will only be generated after the device has been put into ACPI D3cold,
>> or so Aaron seems to be claiming), but it need not be in general, at least in
>> principle.
>>
>> It looks like we need an "event driven" flag that the can be set in the lower
>> layers and read by the upper layers.  I suppose this means it would need to be
>> in struct device, but not necessarily in the PM-specific part of it.
> 
> We already have that.  That's what gendisk->async_events is for (as
> opposed to gendisk->events).  If all events are async_events, block
> won't poll for events, but I'm not sure that's the golden bullet.
> 
> * None implements async_events yet and an API is missing -
>   disk_check_events() - which is trivial to add, but it's the same
>   story.  We'll need a mechanism to shoot up notification from libata
>   to block layer.  It's admittedly easier to justify routing through

I don't see a way to do this, since libata has no chance of accessing
the gendisk pointer. Or we can add a new field to struct device,
something like no_poll, but I don't think it is the right thing to do,
as not all devices are block ones.

Any other suggestions/ideas please?

Thanks,
Aaron

>   SCSI tho.  So, we're mostly shifting the problem.  Given that async
>   events is nice to have, so it isn't a bad idea.
> 
> * Still dunno much about zpodd but IIUC the notification from
>   zero-power is via ACPI.  To advertise that the device doesn't need
>   polling, it should also be able to do async notification while
>   powered up, which isn't covered by zpodd but ATA async notification.
>   So, ummm... that's another obstacle.  If zpodd requires the device
>   to support ATA async notification, it might not be too bad tho.
> 
> Thanks.
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rafael J. Wysocki - Nov. 30, 2012, 11:15 a.m.
On Friday, November 30, 2012 04:55:56 PM Aaron Lu wrote:
> On 11/28/2012 09:39 AM, Tejun Heo wrote:
> > Hey, Rafael.
> > 
> > On Wed, Nov 28, 2012 at 01:51:00AM +0100, Rafael J. Wysocki wrote:
> >> Having considered that a bit more I'm now thinking that in fact the power state
> >> the device is in at the moment doesn't really matter, so the polling code need
> >> not really know what PM is doing.  What it needs to know is that the device
> >> will generate a hardware event when something interesting happens, so it is not
> >> necessary to poll it.
> >>
> >> In this particular case it is related to power management (apparently, hardware
> >> events will only be generated after the device has been put into ACPI D3cold,
> >> or so Aaron seems to be claiming), but it need not be in general, at least in
> >> principle.
> >>
> >> It looks like we need an "event driven" flag that the can be set in the lower
> >> layers and read by the upper layers.  I suppose this means it would need to be
> >> in struct device, but not necessarily in the PM-specific part of it.
> > 
> > We already have that.  That's what gendisk->async_events is for (as
> > opposed to gendisk->events).  If all events are async_events, block
> > won't poll for events, but I'm not sure that's the golden bullet.
> > 
> > * None implements async_events yet and an API is missing -
> >   disk_check_events() - which is trivial to add, but it's the same
> >   story.  We'll need a mechanism to shoot up notification from libata
> >   to block layer.  It's admittedly easier to justify routing through
> 
> I don't see a way to do this, since libata has no chance of accessing
> the gendisk pointer. Or we can add a new field to struct device,
> something like no_poll, but I don't think it is the right thing to do,
> as not all devices are block ones.
> 
> Any other suggestions/ideas please?

I think you can follow the James' suggestion:

http://www.spinics.net/lists/linux-acpi/msg40257.html

and see how that goes.

Thanks,
Rafael

Patch

diff --git a/drivers/ata/libata-acpi.c b/drivers/ata/libata-acpi.c
index 6b6819c..13ee178 100644
--- a/drivers/ata/libata-acpi.c
+++ b/drivers/ata/libata-acpi.c
@@ -784,7 +784,13 @@  static int ata_acpi_push_id(struct ata_device *dev)
  */
 int ata_acpi_on_suspend(struct ata_port *ap)
 {
-	/* nada */
+	struct ata_device *dev;
+
+	ata_for_each_dev(dev, &ap->link, ENABLED) {
+		if (zpodd_dev_enabled(dev))
+			zpodd_check_zpready(dev);
+	}
+
 	return 0;
 }
 
diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index e3bda07..6f235b9 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -2665,6 +2665,10 @@  static void atapi_qc_complete(struct ata_queued_cmd *qc)
 			ata_scsi_rbuf_put(cmd, true, &flags);
 		}
 
+		if (zpodd_dev_enabled(qc->dev) &&
+				scsicmd[0] == GET_EVENT_STATUS_NOTIFICATION)
+			zpodd_snoop_status(qc->dev, cmd);
+
 		cmd->result = SAM_STAT_GOOD;
 	}
 
diff --git a/drivers/ata/libata-zpodd.c b/drivers/ata/libata-zpodd.c
index ba8c985..533a39e 100644
--- a/drivers/ata/libata-zpodd.c
+++ b/drivers/ata/libata-zpodd.c
@@ -2,13 +2,21 @@ 
 #include <linux/cdrom.h>
 #include <linux/pm_runtime.h>
 #include <scsi/scsi_device.h>
+#include <scsi/scsi_cmnd.h>
 
 #include "libata.h"
 
+#define POWEROFF_DELAY  (30 * 1000)     /* 30 seconds for power off delay */
+
 struct zpodd {
 	bool slot:1;
 	bool drawer:1;
 	bool from_notify:1;	/* resumed as a result of acpi notification */
+	bool status_ready:1;	/* ready status derived from media event poll,
+				   it is not accurate, but serves as a hint */
+	bool zp_ready:1;	/* zero power ready state */
+
+	unsigned long last_ready; /* last zero power ready timestamp */
 
 	struct ata_device *dev;
 };
@@ -93,6 +101,114 @@  static bool device_can_poweroff(struct ata_device *ata_dev)
 		return false;
 }
 
+/*
+ * Snoop the result of GET_STATUS_NOTIFICATION_EVENT, the media
+ * status byte has information on media present/door closed.
+ *
+ * This information serves only as a hint, as it is not accurate.
+ * The sense code method will be used when deciding if the ODD is
+ * really zero power ready.
+ */
+void zpodd_snoop_status(struct ata_device *dev, struct scsi_cmnd *scmd)
+{
+	bool ready;
+	char buf[8];
+	struct event_header *eh = (void *)buf;
+	struct media_event_desc *med = (void *)(buf + 4);
+	struct sg_table *table = &scmd->sdb.table;
+	struct zpodd *zpodd = dev->private_data;
+
+	if (sg_copy_to_buffer(table->sgl, table->nents, buf, 8) != 8)
+		return;
+
+	if (be16_to_cpu(eh->data_len) < sizeof(*med))
+		return;
+
+	if (eh->nea || eh->notification_class != 0x4)
+		return;
+
+	if (zpodd->slot)
+		ready = !med->media_present;
+	else
+		ready = !(med->media_present || med->door_open);
+
+	zpodd->status_ready = ready;
+}
+
+/* Test if ODD is zero power ready by sense code */
+static bool zpready(struct ata_device *dev)
+{
+	u8 sense_key, *sense_buf;
+	unsigned int ret, asc, ascq, add_len;
+	struct zpodd *zpodd = dev->private_data;
+
+	ret = atapi_eh_tur(dev, &sense_key);
+
+	if (!ret || sense_key != NOT_READY)
+		return false;
+
+	sense_buf = dev->link->ap->sector_buf;
+	ret = atapi_eh_request_sense(dev, sense_buf, sense_key);
+	if (ret)
+		return false;
+
+	/* sense valid */
+	if ((sense_buf[0] & 0x7f) != 0x70)
+		return false;
+
+	add_len = sense_buf[7];
+	/* has asc and ascq */
+	if (add_len < 6)
+		return false;
+
+	asc = sense_buf[12];
+	ascq = sense_buf[13];
+
+	if (zpodd->slot)
+		/* no media inside */
+		return asc == 0x3a;
+	else
+		/* no media inside and door closed */
+		return asc == 0x3a && ascq == 0x01;
+}
+
+/*
+ * Check ODD's zero power ready status.
+ *
+ * This function is called during ATA port's suspend path,
+ * when the port is not frozen yet, so that we can still make
+ * some IO to the ODD to decide if it is zero power ready.
+ *
+ * The ODD is regarded as zero power ready when it is in zero
+ * power ready state for some time(defined by POWEROFF_DELAY).
+ */
+void zpodd_check_zpready(struct ata_device *dev)
+{
+	bool zp_ready;
+	unsigned long expires;
+	struct zpodd *zpodd = dev->private_data;
+
+	if (!zpodd->status_ready) {
+		zpodd->last_ready = 0;
+		return;
+	}
+
+	if (!zpodd->last_ready) {
+		zp_ready = zpready(dev);
+		if (zp_ready)
+			zpodd->last_ready = jiffies;
+		return;
+	}
+
+	expires = zpodd->last_ready + msecs_to_jiffies(POWEROFF_DELAY);
+	if (time_before(jiffies, expires))
+		return;
+
+	zpodd->zp_ready = zpready(dev);
+	if (!zpodd->zp_ready)
+		zpodd->last_ready = 0;
+}
+
 static void zpodd_wake_dev(acpi_handle handle, u32 event, void *context)
 {
 	struct ata_device *ata_dev = context;
diff --git a/drivers/ata/libata.h b/drivers/ata/libata.h
index 5d68210..2b46703 100644
--- a/drivers/ata/libata.h
+++ b/drivers/ata/libata.h
@@ -241,10 +241,14 @@  static inline bool zpodd_dev_enabled(struct ata_device *dev)
 	else
 		return false;
 }
+void zpodd_snoop_status(struct ata_device *dev, struct scsi_cmnd *scmd);
+void zpodd_check_zpready(struct ata_device *dev);
 #else /* CONFIG_SATA_ZPODD */
 static inline void zpodd_init(struct ata_device *dev) {}
 static inline void zpodd_deinit(struct ata_device *dev) {}
 static inline bool zpodd_dev_enabled(struct ata_device *dev) { return false; }
+static inline void zpodd_snoop_status(struct ata_device *dev, struct scsi_cmnd *scmd) {}
+static inline void zpodd_check_zpready(struct ata_device *dev) {}
 #endif /* CONFIG_SATA_ZPODD */
 
 /* libata-atapi.c */