Patchwork libata: Add ALPM power state accounting to the AHCI driver

login
register
mail settings
Submitter Arjan van de Ven
Date Nov. 14, 2009, 3:24 a.m.
Message ID <20091113192429.4dfc9c39@infradead.org>
Download mbox | patch
Permalink /patch/38421/
State Not Applicable
Delegated to: David Miller
Headers show

Comments

Arjan van de Ven - Nov. 14, 2009, 3:24 a.m.
From f62ff8c98080b4a9e66f82f793145b863b4e183a Mon Sep 17 00:00:00 2001
From: Arjan van de Ven <arjan@linux.intel.com>
Date: Fri, 13 Nov 2009 16:54:37 -0800
Subject: [PATCH] libata: Add ALPM power state accounting to the AHCI driver

PowerTOP wants to show the user how effective the ALPM link
power management is for the user. ALPM is worth around 0.5W on a quiet
link; PowerTOP wants to be able to find and expose cases where the "quiet link" 
isn't actually quiet.

This patch adds state accounting functionality to the AHCI driver for
PowerTOP to use.
The parts of the patch are
1) the sysfs logic of exposing the stats for each state in sysfs
2) the basic accounting logic that gets update on link change interrupts
   (or when the user accesses the info from sysfs)
3) a "accounting enable" flag; in order to get the accounting to work,
   the driver needs to get phyrdy interrupts on link status changes.
   Normally and currently this is disabled by the driver when ALPM is
   on (to reduce overhead); when PowerTOP is running this will need
   to be on to get usable statistics... hence the sysfs tunable.

The PowerTOP output currently looks like this:

Recent SATA AHCI link activity statistics
Active	Partial	Slumber	Device name
  0.5%	 99.5%	  0.0%	host0

(work to resolve "host0" to a more human readable name is in progress in PowerTOP)

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
 drivers/ata/ahci.c |  175 +++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 files changed, 173 insertions(+), 2 deletions(-)
Tejun Heo - Nov. 15, 2009, 8:05 a.m.
11/14/2009 12:24 PM, Arjan van de Ven wrote:
> From f62ff8c98080b4a9e66f82f793145b863b4e183a Mon Sep 17 00:00:00 2001
> From: Arjan van de Ven <arjan@linux.intel.com>
> Date: Fri, 13 Nov 2009 16:54:37 -0800
> Subject: [PATCH] libata: Add ALPM power state accounting to the AHCI driver
> 
> PowerTOP wants to show the user how effective the ALPM link
> power management is for the user. ALPM is worth around 0.5W on a quiet
> link; PowerTOP wants to be able to find and expose cases where the "quiet link" 
> isn't actually quiet.
> 
> This patch adds state accounting functionality to the AHCI driver for
> PowerTOP to use.
> The parts of the patch are
> 1) the sysfs logic of exposing the stats for each state in sysfs
> 2) the basic accounting logic that gets update on link change interrupts
>    (or when the user accesses the info from sysfs)
> 3) a "accounting enable" flag; in order to get the accounting to work,
>    the driver needs to get phyrdy interrupts on link status changes.
>    Normally and currently this is disabled by the driver when ALPM is
>    on (to reduce overhead); when PowerTOP is running this will need
>    to be on to get usable statistics... hence the sysfs tunable.
> 
> The PowerTOP output currently looks like this:
> 
> Recent SATA AHCI link activity statistics
> Active	Partial	Slumber	Device name
>   0.5%	 99.5%	  0.0%	host0
> 
> (work to resolve "host0" to a more human readable name is in progress in PowerTOP)

Most of logic seems to belong to libata generic part rather than in
ahci itself.  Wouldn't it be better to implement the thing as generic
libata feature which ahci uses?

Thanks.
Jeff Garzik - Nov. 15, 2009, 8:27 a.m.
On 11/15/2009 03:05 AM, Tejun Heo wrote:
> 11/14/2009 12:24 PM, Arjan van de Ven wrote:
>>  From f62ff8c98080b4a9e66f82f793145b863b4e183a Mon Sep 17 00:00:00 2001
>> From: Arjan van de Ven<arjan@linux.intel.com>
>> Date: Fri, 13 Nov 2009 16:54:37 -0800
>> Subject: [PATCH] libata: Add ALPM power state accounting to the AHCI driver
>>
>> PowerTOP wants to show the user how effective the ALPM link
>> power management is for the user. ALPM is worth around 0.5W on a quiet
>> link; PowerTOP wants to be able to find and expose cases where the "quiet link"
>> isn't actually quiet.
>>
>> This patch adds state accounting functionality to the AHCI driver for
>> PowerTOP to use.
>> The parts of the patch are
>> 1) the sysfs logic of exposing the stats for each state in sysfs
>> 2) the basic accounting logic that gets update on link change interrupts
>>     (or when the user accesses the info from sysfs)
>> 3) a "accounting enable" flag; in order to get the accounting to work,
>>     the driver needs to get phyrdy interrupts on link status changes.
>>     Normally and currently this is disabled by the driver when ALPM is
>>     on (to reduce overhead); when PowerTOP is running this will need
>>     to be on to get usable statistics... hence the sysfs tunable.
>>
>> The PowerTOP output currently looks like this:
>>
>> Recent SATA AHCI link activity statistics
>> Active	Partial	Slumber	Device name
>>    0.5%	 99.5%	  0.0%	host0
>>
>> (work to resolve "host0" to a more human readable name is in progress in PowerTOP)
>
> Most of logic seems to belong to libata generic part rather than in
> ahci itself.  Wouldn't it be better to implement the thing as generic
> libata feature which ahci uses?

I had pretty much the same comment, on IRC.  The accounting variables 
should probably live in struct ata_link, or a subsidiary (struct 
ata_link_accounting ?).

What little there is of driver-specific behavior can be handled from 
existing callbacks (->enable_pm) or by creating a driver-specific 
function that calls a generic function (eg. ahci_alpm_set_accounting 
could call ata_alpm_set_accounting, before twiddling AHCI's PORT_IRQ_MASK).

	Jeff


--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Arjan van de Ven - Nov. 15, 2009, 5 p.m.
On Sun, 15 Nov 2009 17:05:51 +0900
Tejun Heo <tj@kernel.org> wrote:

> Most of logic seems to belong to libata generic part rather than in
> ahci itself.  Wouldn't it be better to implement the thing as generic
> libata feature which ahci uses?

maybe. AHCI is the only one using it then (2 years ago there was
discussion about having others do alpm as well but.. well nothing has
happened).

Doing "most" in libata means that at best you end up with stuff no
longer local and static, and at worst that you get a bunch of
new function pointers via method tables going around... for
something only one driver is using (with no sight of other users).

If there were going to be more users... I'd absolutely agree with you.
But for only 1 user... I tend to prefer simplicity ;)
Arjan van de Ven - Nov. 15, 2009, 5:46 p.m.
On Sun, 15 Nov 2009 03:27:33 -0500
Jeff Garzik <jgarzik@pobox.com> wrote:

> 
> What little there is of driver-specific behavior can be handled from 
> existing callbacks (->enable_pm) or by creating a driver-specific 
> function that calls a generic function (eg. ahci_alpm_set_accounting 
> could call ata_alpm_set_accounting, before twiddling AHCI's
> PORT_IRQ_MASK).

the whole concept of needing that accounting flag is AHCI specific;
if ever any of the other chip drivers goes to do ALPM, it'll be using
explicit software control, which doesn't need this kind of enable flag;
The only reason there is an enable flag is that the hw based accounting
is resulting in extra interrupts that you don't want except when you
want the accounting (read: powertop is running)
Tejun Heo - Nov. 15, 2009, 6:06 p.m.
Arjan van de Ven wrote:
> On Sun, 15 Nov 2009 03:27:33 -0500
> Jeff Garzik <jgarzik@pobox.com> wrote:
> 
>> What little there is of driver-specific behavior can be handled from 
>> existing callbacks (->enable_pm) or by creating a driver-specific 
>> function that calls a generic function (eg. ahci_alpm_set_accounting 
>> could call ata_alpm_set_accounting, before twiddling AHCI's
>> PORT_IRQ_MASK).
> 
> the whole concept of needing that accounting flag is AHCI specific;
> if ever any of the other chip drivers goes to do ALPM, it'll be using
> explicit software control, which doesn't need this kind of enable flag;
> The only reason there is an enable flag is that the hw based accounting
> is resulting in extra interrupts that you don't want except when you
> want the accounting (read: powertop is running)

There's DIPM.
Arjan van de Ven - Nov. 15, 2009, 6:23 p.m.
On Mon, 16 Nov 2009 03:06:32 +0900
Tejun Heo <tj@kernel.org> wrote:

> There's DIPM.

where in the code is this?
And just to be clear: DIPM is done by the hardware so that it needs
sampling/etc features, rather than just straightforward (free) software
accounting when you do software controller transitions ?
Tejun Heo - Nov. 15, 2009, 6:26 p.m.
Arjan van de Ven wrote:
> On Mon, 16 Nov 2009 03:06:32 +0900
> Tejun Heo <tj@kernel.org> wrote:
> 
>> There's DIPM.
> 
> where in the code is this?

In ata_dev_set_dipm()?

> And just to be clear: DIPM is done by the hardware so that it needs
> sampling/etc features, rather than just straightforward (free) software
> accounting when you do software controller transitions ?

Enabling DIPM means that the drive will be in charge of transitioning
the link into powersave modes, so if sampling SStatus is necessary to
determine how much time the link spends in powersave mode.

Thanks.
Arjan van de Ven - Nov. 15, 2009, 6:33 p.m.
On Mon, 16 Nov 2009 03:26:19 +0900
Tejun Heo <tj@kernel.org> wrote:

> Arjan van de Ven wrote:
> > On Mon, 16 Nov 2009 03:06:32 +0900
> > Tejun Heo <tj@kernel.org> wrote:
> > 
> >> There's DIPM.
> > 
> > where in the code is this?
> 
> In ata_dev_set_dipm()?
> 
> > And just to be clear: DIPM is done by the hardware so that it needs
> > sampling/etc features, rather than just straightforward (free)
> > software accounting when you do software controller transitions ?
> 
> Enabling DIPM means that the drive will be in charge of transitioning
> the link into powersave modes, so if sampling SStatus is necessary to
> determine how much time the link spends in powersave mode.

do we get a link interrupt when the state changes?
(with ALPM we do.. optionally.. which is where the flag comes in)
Tejun Heo - Nov. 16, 2009, 1:51 a.m.
11/16/2009 03:33 AM, Arjan van de Ven wrote:
>> Enabling DIPM means that the drive will be in charge of transitioning
>> the link into powersave modes, so if sampling SStatus is necessary to
>> determine how much time the link spends in powersave mode.
> 
> do we get a link interrupt when the state changes?
> (with ALPM we do.. optionally.. which is where the flag comes in)

It will depend on the controller I suppose but we can always poll.

Thanks.
Arjan van de Ven - Nov. 16, 2009, 2 a.m.
On Mon, 16 Nov 2009 10:51:48 +0900
Tejun Heo <tj@kernel.org> wrote:

> 11/16/2009 03:33 AM, Arjan van de Ven wrote:
> >> Enabling DIPM means that the drive will be in charge of
> >> transitioning the link into powersave modes, so if sampling
> >> SStatus is necessary to determine how much time the link spends in
> >> powersave mode.
> > 
> > do we get a link interrupt when the state changes?
> > (with ALPM we do.. optionally.. which is where the flag comes in)
> 
> It will depend on the controller I suppose but we can always poll.
> 

I appreciate the possibility but at this point... it's only that.
To be honest, nothing in the last 2 or 3 years has moved forward in
this area that I can see, even DIPM seems to be AHCI only based on a
grep.

If a polling controller comes forward it's going to need a whole bunch
of infrastructure; a sysfs flag to control the polling and its frequency
being the least of the technical problems.

Moving the base accounting logic to libata wasn't too painful (see the
patch); moving this very driver specific logic gets a much more
intertwigned relationship that at this point in time is just overkill.
If someone comes around with a driver that also need it, can we just
please resolve it at that time?
Tejun Heo - Nov. 16, 2009, 2:15 a.m.
Hello,

11/16/2009 11:00 AM, Arjan van de Ven wrote:
> I appreciate the possibility but at this point... it's only that.
> To be honest, nothing in the last 2 or 3 years has moved forward in
> this area that I can see, even DIPM seems to be AHCI only based on a
> grep.

I'm fairly sure some controllers will happily generate phy event
interrupts on power state transitions.

> If a polling controller comes forward it's going to need a whole bunch
> of infrastructure; a sysfs flag to control the polling and its frequency
> being the least of the technical problems.
> 
> Moving the base accounting logic to libata wasn't too painful (see the
> patch); moving this very driver specific logic gets a much more
> intertwigned relationship that at this point in time is just overkill.
> If someone comes around with a driver that also need it, can we just
> please resolve it at that time?

I don't know.  On PM front, it's true that ahci is way more important
than others.  Implementing things just for ahci has been happening for
some time now and in the long run it's just not healthy.  It lowers
maintainability and may even hinder generic implementation.  In this
case, for example, you're associating link power management with the
host port, not the link.  Your specific case may be fine but what
about DIPM on devices attached via PMP?  Internal implementation is
one thing but you're adding externally visible attribute to the host
part.  This one thing might be fine but we can't allow things like
this to pile up.

Thanks.
Arjan van de Ven - Nov. 16, 2009, 5:55 a.m.
On Mon, 16 Nov 2009 11:15:44 +0900
Tejun Heo <tj@kernel.org> wrote:

> > Moving the base accounting logic to libata wasn't too painful (see
> > the patch); moving this very driver specific logic gets a much more
> > intertwigned relationship that at this point in time is just
> > overkill. If someone comes around with a driver that also need it,
> > can we just please resolve it at that time?
> 
> I don't know.  On PM front, it's true that ahci is way more important
> than others.  Implementing things just for ahci has been happening for
> some time now and in the long run it's just not healthy.  It lowers
> maintainability and may even hinder generic implementation.  In this
> case, for example, you're associating link power management with the
> host port, not the link.  Your specific case may be fine but what
> about DIPM on devices attached via PMP?  Internal implementation is
> one thing but you're adding externally visible attribute to the host
> part.  This one thing might be fine but we can't allow things like
> this to pile up.

sigh. so I moved all the generic logic generic, and left the ahci
specific code specific to ahci. I put the logic there where it was
easy to implement, and there where the other link power management
controls are (in sysfs). If that's not good enough, I'm out of my skills
in the libata world to be honest, and would like to ask you to implement
that instead. let me know what sysfs looks like and I'll adjust
powertop to it....
Tejun Heo - Nov. 16, 2009, 6:14 a.m.
Hello,

Arjan van de Ven wrote:
> sigh. so I moved all the generic logic generic, and left the ahci
> specific code specific to ahci. I put the logic there where it was
> easy to implement, and there where the other link power management
> controls are (in sysfs). If that's not good enough, I'm out of my skills
> in the libata world to be honest, and would like to ask you to implement
> that instead. let me know what sysfs looks like and I'll adjust
> powertop to it....

The reason why we have sysfs attributes which should have been at link
layer at host was that it originally was for ahci alpm which is host
specific feature which got extended to something somewhat generic.
Now another pm feature which should belong to link is added to host
following the precedence.

Then again, it's also true that nobody really cares about ATA PM
features enough during past couple of years so I really don't want to
prevent the feature you're trying to add.  It would be best if there's
someone who would pick it up and implement proper infrastructure but
well that doesn't seem to be happening anytime soon.

So, I don't know.  That's the concern I have but I don't want to nack
your change either.  One thing is at least make those functions take
ata_link isntead of ata_port as there's nothing port specific about
those.  Jeff, what do you think?

Thanks.
Jeff Garzik - Nov. 16, 2009, 8:13 a.m.
On 11/16/2009 01:14 AM, Tejun Heo wrote:
> Hello,
>
> Arjan van de Ven wrote:
>> sigh. so I moved all the generic logic generic, and left the ahci
>> specific code specific to ahci. I put the logic there where it was
>> easy to implement, and there where the other link power management
>> controls are (in sysfs). If that's not good enough, I'm out of my skills
>> in the libata world to be honest, and would like to ask you to implement
>> that instead. let me know what sysfs looks like and I'll adjust
>> powertop to it....
>
> The reason why we have sysfs attributes which should have been at link
> layer at host was that it originally was for ahci alpm which is host
> specific feature which got extended to something somewhat generic.
> Now another pm feature which should belong to link is added to host
> following the precedence.
>
> Then again, it's also true that nobody really cares about ATA PM
> features enough during past couple of years so I really don't want to
> prevent the feature you're trying to add.  It would be best if there's
> someone who would pick it up and implement proper infrastructure but
> well that doesn't seem to be happening anytime soon.
>
> So, I don't know.  That's the concern I have but I don't want to nack
> your change either.  One thing is at least make those functions take
> ata_link isntead of ata_port as there's nothing port specific about
> those.  Jeff, what do you think?

Well,

- these are link-level features
- libata lacks a link-level sysfs API
- we need a link-level sysfs API (ata_transport, anyone?)

The ugly alternative has always been to hack in something at the host level.

In term of internal data structures, the v3 patch putting the stats into 
struct ata_link is definitely the right thing to do.  I would also put 
the accounting_enabled variable in there, as non-AHCI implementations, 
polling or not, seem highly likely to need that.

	Jeff



--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Arjan van de Ven - Nov. 16, 2009, 2:43 p.m.
On Mon, 16 Nov 2009 03:13:48 -0500
Jeff Garzik <jgarzik@pobox.com> wrote:

> On 11/16/2009 01:14 AM, Tejun Heo wrote:
> > Hello,
> >
> > Arjan van de Ven wrote:
> >> sigh. so I moved all the generic logic generic, and left the ahci
> >> specific code specific to ahci. I put the logic there where it was
> >> easy to implement, and there where the other link power management
> >> controls are (in sysfs). If that's not good enough, I'm out of my
> >> skills in the libata world to be honest, and would like to ask you
> >> to implement that instead. let me know what sysfs looks like and
> >> I'll adjust powertop to it....
> >
> > The reason why we have sysfs attributes which should have been at
> > link layer at host was that it originally was for ahci alpm which
> > is host specific feature which got extended to something somewhat
> > generic. Now another pm feature which should belong to link is
> > added to host following the precedence.
> >
> > Then again, it's also true that nobody really cares about ATA PM
> > features enough during past couple of years so I really don't want
> > to prevent the feature you're trying to add.  It would be best if
> > there's someone who would pick it up and implement proper
> > infrastructure but well that doesn't seem to be happening anytime
> > soon.
> >
> > So, I don't know.  That's the concern I have but I don't want to
> > nack your change either.  One thing is at least make those
> > functions take ata_link isntead of ata_port as there's nothing port
> > specific about those.  Jeff, what do you think?
> 
> Well,
> 
> - these are link-level features
> - libata lacks a link-level sysfs API
> - we need a link-level sysfs API (ata_transport, anyone?)
> 
> The ugly alternative has always been to hack in something at the host
> level.

is there a hardware way to ask for the link status via a link level
thing? I thought the sata_scr_read() was by definition a host thing
Tejun Heo - Nov. 16, 2009, 2:59 p.m.
Arjan van de Ven wrote:
> is there a hardware way to ask for the link status via a link level
> thing? I thought the sata_scr_read() was by definition a host thing

sata_scr_read() will do the right thing given the link parameter
although it needs to be called from EH context for PMP links.  For
now, just making the functions take @link param and passing it around
should do.

Thanks.
Arjan van de Ven - Nov. 16, 2009, 3:21 p.m.
On Mon, 16 Nov 2009 23:59:39 +0900
Tejun Heo <tj@kernel.org> wrote:

> Arjan van de Ven wrote:
> > is there a hardware way to ask for the link status via a link level
> > thing? I thought the sata_scr_read() was by definition a host thing
> 
> sata_scr_read() will do the right thing given the link parameter
> although it needs to be called from EH context for PMP links.  For
> now, just making the functions take @link param and passing it around
> should do.
> 

I'm sorry it just does not make sense to me anymore.
if this moves to be a link level thing, then the statistics also need
to be kept link level, and thus exported link level, and I don't think
that exists.

if that ever comes into existence it's easy to change the whole thing
at that time (and I'll then change PowerTOP to match). But doing
some half "oh and if you call it on THIS kind of object it'll explode"
does not sound sensible.

I think I'm waaaay out of my league in terms of understanding the
libata structure in the things you are suggesting to be honest, and I
do not feel I understand enough of the subtleties in this area.
Tejun Heo - Nov. 16, 2009, 3:35 p.m.
Hello, Arjan.

Arjan van de Ven wrote:
> I think I'm waaaay out of my league in terms of understanding the
> libata structure in the things you are suggesting to be honest, and I
> do not feel I understand enough of the subtleties in this area.

SCR access is a bit subtle because it requires issuing commands if the
link is behind PMP and internal commands currently assume EH context.
This is true for all link functions, so I'm suggesting the stats
helpers to follow the same convention.  This will also make slave link
configuration work properly (controllers which present two SATA links
as master/slave of the same port but still provide access to separate
SCR registers).  You know, it's a link function, make it take a link
as all other stuff is designed that way.

Thanks.
Tejun Heo - Nov. 16, 2009, 3:40 p.m.
Tejun Heo wrote:
> Hello, Arjan.
> 
> Arjan van de Ven wrote:
>> I think I'm waaaay out of my league in terms of understanding the
>> libata structure in the things you are suggesting to be honest, and I
>> do not feel I understand enough of the subtleties in this area.
> 
> SCR access is a bit subtle because it requires issuing commands if the
> link is behind PMP and internal commands currently assume EH context.
> This is true for all link functions, so I'm suggesting the stats
> helpers to follow the same convention.  This will also make slave link
> configuration work properly (controllers which present two SATA links
> as master/slave of the same port but still provide access to separate
> SCR registers).  You know, it's a link function, make it take a link
> as all other stuff is designed that way.

Oooh... right, I missed that part about exporting to sysfs.  It won't
show up properly for slave links anyway.  Sorry to keep nagging but
it's quite uncomfortable for me to see the code go in as-is.  ALPM was
one thing as it was host specific feature anyway but this is a clearly
link level thing being implemented at the port layer.  Then again, I
agree there isn't any easy solution at this time.  So, I don't object
to the current implementation.

Argh...........

Thanks.
Alan Cox - Nov. 16, 2009, 3:57 p.m.
> SCR access is a bit subtle because it requires issuing commands if the
> link is behind PMP and internal commands currently assume EH context.

As do chunks of some of the drivers.

> This is true for all link functions, so I'm suggesting the stats
> helpers to follow the same convention.  This will also make slave link
> configuration work properly (controllers which present two SATA links
> as master/slave of the same port but still provide access to separate
> SCR registers).  You know, it's a link function, make it take a link
> as all other stuff is designed that way.

I don't see how you can do per link reporting on AHCI.

Sure maybe some devices should expose it per link, but it's so expensive
and it'll get really ugly fast.

I'd say it has to go in at host level for now. If some super whizzo
future silicon exposes it per link without being beaten around the head
by the software stack then it's easy enough to add per link data and a
generic helper which provides a summary set of numbers at the host level
for hardware that can report link data.

That gives us a useful API now and the ability to fix it in the future,
not that I anticipate anyone ever needing to. AHCI (or AHCI-ish) seems to
be what everyone is working to nowdays with little obvious pressure to
move anywhere else.

Alan
--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Tejun Heo - Nov. 16, 2009, 4:25 p.m.
Alan Cox wrote:
>> This is true for all link functions, so I'm suggesting the stats
>> helpers to follow the same convention.  This will also make slave link
>> configuration work properly (controllers which present two SATA links
>> as master/slave of the same port but still provide access to separate
>> SCR registers).  You know, it's a link function, make it take a link
>> as all other stuff is designed that way.
> 
> I don't see how you can do per link reporting on AHCI.
> 
> Sure maybe some devices should expose it per link, but it's so expensive
> and it'll get really ugly fast.

Not for ahci but ata_piix, sis and via already do it.

> I'd say it has to go in at host level for now. If some super whizzo
> future silicon exposes it per link without being beaten around the head
> by the software stack then it's easy enough to add per link data and a
> generic helper which provides a summary set of numbers at the host level
> for hardware that can report link data.
> 
> That gives us a useful API now and the ability to fix it in the future,
> not that I anticipate anyone ever needing to. AHCI (or AHCI-ish) seems to
> be what everyone is working to nowdays with little obvious pressure to
> move anywhere else.

Yeap, agreed.  It just is so ugly (not Arjan's fault).

Thanks.
Jeff Garzik - Nov. 16, 2009, 9:21 p.m.
On 11/16/2009 09:43 AM, Arjan van de Ven wrote:
> is there a hardware way to ask for the link status via a link level
> thing? I thought the sata_scr_read() was by definition a host thing

It is by definition a link-level thing, which is why sata_scr_read() 
takes struct ata_link * as its first argument.

Think about it...  a port multiplier has downstream links, each with 
their own link status, and link-related statistics.

	Jeff


--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jeff Garzik - Nov. 16, 2009, 9:25 p.m.
On 11/16/2009 10:21 AM, Arjan van de Ven wrote:
> On Mon, 16 Nov 2009 23:59:39 +0900
> Tejun Heo<tj@kernel.org>  wrote:
>
>> Arjan van de Ven wrote:
>>> is there a hardware way to ask for the link status via a link level
>>> thing? I thought the sata_scr_read() was by definition a host thing
>>
>> sata_scr_read() will do the right thing given the link parameter
>> although it needs to be called from EH context for PMP links.  For
>> now, just making the functions take @link param and passing it around
>> should do.
>>
>
> I'm sorry it just does not make sense to me anymore.
> if this moves to be a link level thing, then the statistics also need
> to be kept link level, and thus exported link level, and I don't think
> that exists.

It is already link-level in hardware (ie. in reality), in the sata_scr_* 
interface, and in the accounting stats you [quite correctly] added to 
struct ata_link in your patch.

The only thing -not- at link level is the userland interface, which 
instead uses host-level granularity.

	Jeff


--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Arjan van de Ven - Nov. 17, 2009, 5:25 a.m.
On Mon, 16 Nov 2009 16:21:26 -0500
Jeff Garzik <jgarzik@pobox.com> wrote:

> On 11/16/2009 09:43 AM, Arjan van de Ven wrote:
> > is there a hardware way to ask for the link status via a link level
> > thing? I thought the sata_scr_read() was by definition a host thing
> 
> It is by definition a link-level thing, which is why sata_scr_read() 
> takes struct ata_link * as its first argument.
> 
> Think about it...  a port multiplier has downstream links, each with 
> their own link status, and link-related statistics.

but also think about it... we can neither call this function with one
of those as argument (since the function is called from irq context,
and not the EH context) nor represent the result.

The former would make it an unsafe API, while right now it's a safe API.

Patch

diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c
index a3241a1..448d684 100644
--- a/drivers/ata/ahci.c
+++ b/drivers/ata/ahci.c
@@ -72,6 +72,21 @@  MODULE_PARM_DESC(ignore_sss, "Ignore staggered spinup flag (0=don't ignore, 1=ig
 static int ahci_enable_alpm(struct ata_port *ap,
 		enum link_pm policy);
 static void ahci_disable_alpm(struct ata_port *ap);
+
+static ssize_t ahci_alpm_show_active(struct device *dev,
+				  struct device_attribute *attr, char *buf);
+static ssize_t ahci_alpm_show_slumber(struct device *dev,
+				  struct device_attribute *attr, char *buf);
+static ssize_t ahci_alpm_show_partial(struct device *dev,
+				  struct device_attribute *attr, char *buf);
+
+static ssize_t ahci_alpm_show_accounting(struct device *dev,
+				  struct device_attribute *attr, char *buf);
+
+static ssize_t ahci_alpm_set_accounting(struct device *dev,
+				  struct device_attribute *attr,
+				  const char *buf, size_t count);
+
 static ssize_t ahci_led_show(struct ata_port *ap, char *buf);
 static ssize_t ahci_led_store(struct ata_port *ap, const char *buf,
 			      size_t size);
@@ -289,6 +304,13 @@  struct ahci_host_priv {
 	u32 			em_loc; /* enclosure management location */
 };
 
+enum ahci_port_states {
+	AHCI_PORT_NOLINK = 0,
+	AHCI_PORT_ACTIVE = 1,
+	AHCI_PORT_PARTIAL = 2,
+	AHCI_PORT_SLUMBER = 3
+};
+
 struct ahci_port_priv {
 	struct ata_link		*active_link;
 	struct ahci_cmd_hdr	*cmd_slot;
@@ -304,6 +326,14 @@  struct ahci_port_priv {
 	u32 			intr_mask;	/* interrupts to enable */
 	/* enclosure management info per PM slot */
 	struct ahci_em_priv	em_priv[EM_MAX_SLOTS];
+
+	/* ALPM accounting state and stats */
+	unsigned int 		accounting_active:1;
+	u64			active_jiffies;
+	u64			partial_jiffies;
+	u64			slumber_jiffies;
+	int			previous_state;
+	int			previous_jiffies;
 };
 
 static int ahci_scr_read(struct ata_link *link, unsigned int sc_reg, u32 *val);
@@ -359,6 +389,12 @@  DEVICE_ATTR(ahci_host_cap2, S_IRUGO, ahci_show_host_cap2, NULL);
 DEVICE_ATTR(ahci_host_version, S_IRUGO, ahci_show_host_version, NULL);
 DEVICE_ATTR(ahci_port_cmd, S_IRUGO, ahci_show_port_cmd, NULL);
 
+DEVICE_ATTR(ahci_alpm_active, S_IRUGO, ahci_alpm_show_active, NULL);
+DEVICE_ATTR(ahci_alpm_partial, S_IRUGO, ahci_alpm_show_partial, NULL);
+DEVICE_ATTR(ahci_alpm_slumber, S_IRUGO, ahci_alpm_show_slumber, NULL);
+DEVICE_ATTR(ahci_alpm_accounting, S_IRUGO | S_IWUSR,
+		ahci_alpm_show_accounting, ahci_alpm_set_accounting);
+
 static struct device_attribute *ahci_shost_attrs[] = {
 	&dev_attr_link_power_management_policy,
 	&dev_attr_em_message_type,
@@ -367,6 +403,10 @@  static struct device_attribute *ahci_shost_attrs[] = {
 	&dev_attr_ahci_host_cap2,
 	&dev_attr_ahci_host_version,
 	&dev_attr_ahci_port_cmd,
+	&dev_attr_ahci_alpm_active,
+	&dev_attr_ahci_alpm_partial,
+	&dev_attr_ahci_alpm_slumber,
+	&dev_attr_ahci_alpm_accounting,
 	NULL
 };
 
@@ -1165,9 +1205,14 @@  static int ahci_enable_alpm(struct ata_port *ap,
  	 * getting woken up due to spurious phy ready interrupts
 	 * TBD - Hot plug should be done via polling now, is
 	 * that even supported?
+	 *
+	 * However, when accounting_active is set, we do want
+	 * the interrupts for accounting purposes.
  	 */
-	pp->intr_mask &= ~PORT_IRQ_PHYRDY;
-	writel(pp->intr_mask, port_mmio + PORT_IRQ_MASK);
+	if (!pp->accounting_active) {
+		pp->intr_mask &= ~PORT_IRQ_PHYRDY;
+		writel(pp->intr_mask, port_mmio + PORT_IRQ_MASK);
+	}
 
 	/*
  	 * Set a flag to indicate that we should ignore all PhyRdy
@@ -2157,6 +2202,131 @@  static void ahci_error_intr(struct ata_port *ap, u32 irq_stat)
 		ata_port_abort(ap);
 }
 
+static int get_current_alpm_state(struct ata_port *ap)
+{
+	u32 status = 0;
+
+	ahci_scr_read(&ap->link, SCR_STATUS, &status);
+
+	/* link status is in bits 11-8 */
+	status = status >> 8;
+	status = status & 0x7;
+
+	if (status == 6)
+		return AHCI_PORT_SLUMBER;
+	if (status == 2)
+		return AHCI_PORT_PARTIAL;
+	if (status == 1)
+		return AHCI_PORT_ACTIVE;
+	return AHCI_PORT_NOLINK;
+}
+
+static void account_alpm_stats(struct ata_port *ap)
+{
+	struct ahci_port_priv *pp = ap->private_data;
+
+	int new_state;
+	u64 new_jiffies, jiffies_delta;
+
+	new_state = get_current_alpm_state(ap);
+	new_jiffies = jiffies;
+
+	jiffies_delta = new_jiffies - pp->previous_jiffies;
+
+	switch (pp->previous_state) {
+	case AHCI_PORT_NOLINK:
+		pp->active_jiffies = 0;
+		pp->partial_jiffies = 0;
+		pp->slumber_jiffies = 0;
+		break;
+	case AHCI_PORT_ACTIVE:
+		pp->active_jiffies += jiffies_delta;
+		break;
+	case AHCI_PORT_PARTIAL:
+		pp->partial_jiffies += jiffies_delta;
+		break;
+	case AHCI_PORT_SLUMBER:
+		pp->slumber_jiffies += jiffies_delta;
+		break;
+	default:
+		break;
+	}
+	pp->previous_state = new_state;
+	pp->previous_jiffies = new_jiffies;
+}
+
+static ssize_t ahci_alpm_show_active(struct device *dev,
+				   struct device_attribute *attr, char *buf)
+{
+	struct Scsi_Host *shost = class_to_shost(dev);
+	struct ata_port *ap = ata_shost_to_port(shost);
+	struct ahci_port_priv *pp = ap->private_data;
+
+	account_alpm_stats(ap);
+
+	return sprintf(buf, "%u\n", jiffies_to_msecs(pp->active_jiffies));
+}
+
+static ssize_t ahci_alpm_show_partial(struct device *dev,
+				   struct device_attribute *attr, char *buf)
+{
+	struct Scsi_Host *shost = class_to_shost(dev);
+	struct ata_port *ap = ata_shost_to_port(shost);
+	struct ahci_port_priv *pp = ap->private_data;
+
+	account_alpm_stats(ap);
+
+	return sprintf(buf, "%u\n", jiffies_to_msecs(pp->partial_jiffies));
+}
+
+static ssize_t ahci_alpm_show_slumber(struct device *dev,
+				   struct device_attribute *attr, char *buf)
+{
+	struct Scsi_Host *shost = class_to_shost(dev);
+	struct ata_port *ap = ata_shost_to_port(shost);
+	struct ahci_port_priv *pp = ap->private_data;
+
+	account_alpm_stats(ap);
+
+	return sprintf(buf, "%u\n", jiffies_to_msecs(pp->slumber_jiffies));
+}
+
+
+static ssize_t ahci_alpm_show_accounting(struct device *dev,
+				  struct device_attribute *attr, char *buf)
+{
+	struct Scsi_Host *shost = class_to_shost(dev);
+	struct ata_port *ap = ata_shost_to_port(shost);
+	struct ahci_port_priv *pp = ap->private_data;
+
+	return sprintf(buf, "%u\n", pp->accounting_active);
+}
+
+static ssize_t ahci_alpm_set_accounting(struct device *dev,
+				  struct device_attribute *attr,
+				  const char *buf, size_t count)
+{
+	unsigned long flags;
+	struct Scsi_Host *shost = class_to_shost(dev);
+	struct ata_port *ap = ata_shost_to_port(shost);
+	struct ahci_port_priv *pp = ap->private_data;
+	void __iomem *port_mmio = ahci_port_base(ap);
+
+	if (buf[0] == '0')
+		pp->accounting_active = 0;
+	if (buf[0] == '1')
+		pp->accounting_active = 1;
+
+	/* we need to enable the PHYRDY interrupt when we want accounting */
+	if (pp->accounting_active) {
+		spin_lock_irqsave(ap->lock, flags);
+		pp->intr_mask |= PORT_IRQ_PHYRDY;
+		writel(pp->intr_mask, port_mmio + PORT_IRQ_MASK);
+		spin_unlock_irqrestore(ap->lock, flags);
+	}
+	return count;
+}
+
 static void ahci_port_intr(struct ata_port *ap)
 {
 	void __iomem *port_mmio = ahci_port_base(ap);
@@ -2182,6 +2352,7 @@  static void ahci_port_intr(struct ata_port *ap)
 	if ((hpriv->flags & AHCI_HFLAG_NO_HOTPLUG) &&
 		(status & PORT_IRQ_PHYRDY)) {
 		status &= ~PORT_IRQ_PHYRDY;
+		account_alpm_stats(ap);
 		ahci_scr_write(&ap->link, SCR_ERROR, ((1 << 16) | (1 << 18)));
 	}