diff mbox

[1/2] PM / Domains: Introduce domain-performance-state binding

Message ID cce951c7159aa235792959ce27546be3aaa554ec.1479459752.git.viresh.kumar@linaro.org
State Superseded, archived
Headers show

Commit Message

Viresh Kumar Nov. 18, 2016, 9:23 a.m. UTC
Some platforms have the capability to configure the performance state of
their Power Domains. The performance levels are represented by positive
integer values, a lower value represents lower performance state.

The power-domains until now were only concentrating on the idle state
management of the device and this needs to change in order to reuse the
infrastructure of power domains for active state management.

This patch introduces a new optional property for the consumers of the
power-domains: domain-performance-state.

If the consumers don't need the capability of switching to different
domain performance states at runtime, then they can simply define their
required domain performance state in their node directly. Otherwise the
consumers can define their requirements with help of other
infrastructure, for example the OPP table.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 Documentation/devicetree/bindings/power/power_domain.txt | 6 ++++++
 1 file changed, 6 insertions(+)

Comments

Rob Herring (Arm) Nov. 21, 2016, 3:07 p.m. UTC | #1
On Fri, Nov 18, 2016 at 02:53:12PM +0530, Viresh Kumar wrote:
> Some platforms have the capability to configure the performance state of
> their Power Domains. The performance levels are represented by positive
> integer values, a lower value represents lower performance state.
> 
> The power-domains until now were only concentrating on the idle state
> management of the device and this needs to change in order to reuse the
> infrastructure of power domains for active state management.
> 
> This patch introduces a new optional property for the consumers of the
> power-domains: domain-performance-state.
> 
> If the consumers don't need the capability of switching to different
> domain performance states at runtime, then they can simply define their
> required domain performance state in their node directly. Otherwise the
> consumers can define their requirements with help of other
> infrastructure, for example the OPP table.
> 
> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
> ---
>  Documentation/devicetree/bindings/power/power_domain.txt | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/Documentation/devicetree/bindings/power/power_domain.txt b/Documentation/devicetree/bindings/power/power_domain.txt
> index e1650364b296..db42eacf8b5c 100644
> --- a/Documentation/devicetree/bindings/power/power_domain.txt
> +++ b/Documentation/devicetree/bindings/power/power_domain.txt
> @@ -106,6 +106,12 @@ domain provided by the 'parent' power controller.
>   - power-domains : A phandle and PM domain specifier as defined by bindings of
>                     the power controller specified by phandle.
>  
> +Optional properties:
> +- domain-performance-state: A positive integer value representing the minimum
> +  performance level (of the parent domain) required by the consumer for its
> +  working. The integer value '1' represents the lowest performance level and the
> +  highest value represents the highest performance level.

How does one come up with the range of values? It seems like you are 
just making up numbers. Couldn't the domain performance level be an OPP 
in the sense that it is a collection of clock frequencies and voltage 
settings?

Rob
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Viresh Kumar Nov. 22, 2016, 3:17 a.m. UTC | #2
On 21-11-16, 09:07, Rob Herring wrote:
> On Fri, Nov 18, 2016 at 02:53:12PM +0530, Viresh Kumar wrote:
> > Some platforms have the capability to configure the performance state of
> > their Power Domains. The performance levels are represented by positive
> > integer values, a lower value represents lower performance state.
> > 
> > The power-domains until now were only concentrating on the idle state
> > management of the device and this needs to change in order to reuse the
> > infrastructure of power domains for active state management.
> > 
> > This patch introduces a new optional property for the consumers of the
> > power-domains: domain-performance-state.
> > 
> > If the consumers don't need the capability of switching to different
> > domain performance states at runtime, then they can simply define their
> > required domain performance state in their node directly. Otherwise the
> > consumers can define their requirements with help of other
> > infrastructure, for example the OPP table.
> > 
> > Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
> > ---
> >  Documentation/devicetree/bindings/power/power_domain.txt | 6 ++++++
> >  1 file changed, 6 insertions(+)
> > 
> > diff --git a/Documentation/devicetree/bindings/power/power_domain.txt b/Documentation/devicetree/bindings/power/power_domain.txt
> > index e1650364b296..db42eacf8b5c 100644
> > --- a/Documentation/devicetree/bindings/power/power_domain.txt
> > +++ b/Documentation/devicetree/bindings/power/power_domain.txt
> > @@ -106,6 +106,12 @@ domain provided by the 'parent' power controller.
> >   - power-domains : A phandle and PM domain specifier as defined by bindings of
> >                     the power controller specified by phandle.
> >  
> > +Optional properties:
> > +- domain-performance-state: A positive integer value representing the minimum
> > +  performance level (of the parent domain) required by the consumer for its
> > +  working. The integer value '1' represents the lowest performance level and the
> > +  highest value represents the highest performance level.
> 
> How does one come up with the range of values?

Why would we need a range here? The value here represents the minimum 'state'
and the assumption is that everything above that level would be fine. So the
range is automatically: domain-performance-state -> MAX.

> It seems like you are 
> just making up numbers. Couldn't the domain performance level be an OPP 
> in the sense that it is a collection of clock frequencies and voltage 
> settings?

The clock is going to be handled by the device itself (at least for the case we
have today) and the performance-state lies with the power-domain which is
configured separately. If the performance level includes both clk and voltage,
then why would we need to show the clock rates in the DT ? Wouldn't a
performance level be enough in such cases?
Kevin Hilman Nov. 22, 2016, 6:12 p.m. UTC | #3
Viresh Kumar <viresh.kumar@linaro.org> writes:

> On 21-11-16, 09:07, Rob Herring wrote:
>> On Fri, Nov 18, 2016 at 02:53:12PM +0530, Viresh Kumar wrote:
>> > Some platforms have the capability to configure the performance state of
>> > their Power Domains. The performance levels are represented by positive
>> > integer values, a lower value represents lower performance state.
>> > 
>> > The power-domains until now were only concentrating on the idle state
>> > management of the device and this needs to change in order to reuse the
>> > infrastructure of power domains for active state management.
>> > 
>> > This patch introduces a new optional property for the consumers of the
>> > power-domains: domain-performance-state.
>> > 
>> > If the consumers don't need the capability of switching to different
>> > domain performance states at runtime, then they can simply define their
>> > required domain performance state in their node directly. Otherwise the
>> > consumers can define their requirements with help of other
>> > infrastructure, for example the OPP table.
>> > 
>> > Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
>> > ---
>> >  Documentation/devicetree/bindings/power/power_domain.txt | 6 ++++++
>> >  1 file changed, 6 insertions(+)
>> > 
>> > diff --git a/Documentation/devicetree/bindings/power/power_domain.txt b/Documentation/devicetree/bindings/power/power_domain.txt
>> > index e1650364b296..db42eacf8b5c 100644
>> > --- a/Documentation/devicetree/bindings/power/power_domain.txt
>> > +++ b/Documentation/devicetree/bindings/power/power_domain.txt
>> > @@ -106,6 +106,12 @@ domain provided by the 'parent' power controller.
>> >   - power-domains : A phandle and PM domain specifier as defined by bindings of
>> >                     the power controller specified by phandle.
>> >  
>> > +Optional properties:
>> > +- domain-performance-state: A positive integer value representing the minimum
>> > +  performance level (of the parent domain) required by the consumer for its
>> > +  working. The integer value '1' represents the lowest performance level and the
>> > +  highest value represents the highest performance level.
>> 
>> How does one come up with the range of values?
>
> Why would we need a range here? The value here represents the minimum 'state'
> and the assumption is that everything above that level would be fine. So the
> range is automatically: domain-performance-state -> MAX.
>
>> It seems like you are 
>> just making up numbers. Couldn't the domain performance level be an OPP 
>> in the sense that it is a collection of clock frequencies and voltage 
>> settings?
>
> The clock is going to be handled by the device itself (at least for the case we
> have today) and the performance-state lies with the power-domain which is
> configured separately. If the performance level includes both clk and voltage,
> then why would we need to show the clock rates in the DT ? Wouldn't a
> performance level be enough in such cases?

I think the question is: what does the performance-level of a domain
actually mean?  Or, what are the units?

Depending on the SoC, there's probably a few things this could mean.  It
might mean is that an underlying bus/interconnect can be configured to
guarantee a specific bandwidth or throughput.  That in turn might mean
that that bus/interconnect might have to be set at a specific
frequency/voltage.

In your case, IIUC, you're just passing some magic value to some
firmware running on a micro-controller, but under the hood that uC is
probably configuring a frequency/voltage someplace.

So, if we're going to have a generic DT binding for this, it needs to be
something that's useful on platforms that are not using magic numbers
managed by a uC as well.

Kevin








--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Vincent Guittot Nov. 22, 2016, 6:34 p.m. UTC | #4
On 22 November 2016 at 19:12, Kevin Hilman <khilman@baylibre.com> wrote:
> Viresh Kumar <viresh.kumar@linaro.org> writes:
>
>> On 21-11-16, 09:07, Rob Herring wrote:
>>> On Fri, Nov 18, 2016 at 02:53:12PM +0530, Viresh Kumar wrote:
>>> > Some platforms have the capability to configure the performance state of
>>> > their Power Domains. The performance levels are represented by positive
>>> > integer values, a lower value represents lower performance state.
>>> >
>>> > The power-domains until now were only concentrating on the idle state
>>> > management of the device and this needs to change in order to reuse the
>>> > infrastructure of power domains for active state management.
>>> >
>>> > This patch introduces a new optional property for the consumers of the
>>> > power-domains: domain-performance-state.
>>> >
>>> > If the consumers don't need the capability of switching to different
>>> > domain performance states at runtime, then they can simply define their
>>> > required domain performance state in their node directly. Otherwise the
>>> > consumers can define their requirements with help of other
>>> > infrastructure, for example the OPP table.
>>> >
>>> > Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
>>> > ---
>>> >  Documentation/devicetree/bindings/power/power_domain.txt | 6 ++++++
>>> >  1 file changed, 6 insertions(+)
>>> >
>>> > diff --git a/Documentation/devicetree/bindings/power/power_domain.txt b/Documentation/devicetree/bindings/power/power_domain.txt
>>> > index e1650364b296..db42eacf8b5c 100644
>>> > --- a/Documentation/devicetree/bindings/power/power_domain.txt
>>> > +++ b/Documentation/devicetree/bindings/power/power_domain.txt
>>> > @@ -106,6 +106,12 @@ domain provided by the 'parent' power controller.
>>> >   - power-domains : A phandle and PM domain specifier as defined by bindings of
>>> >                     the power controller specified by phandle.
>>> >
>>> > +Optional properties:
>>> > +- domain-performance-state: A positive integer value representing the minimum
>>> > +  performance level (of the parent domain) required by the consumer for its
>>> > +  working. The integer value '1' represents the lowest performance level and the
>>> > +  highest value represents the highest performance level.
>>>
>>> How does one come up with the range of values?
>>
>> Why would we need a range here? The value here represents the minimum 'state'
>> and the assumption is that everything above that level would be fine. So the
>> range is automatically: domain-performance-state -> MAX.
>>
>>> It seems like you are
>>> just making up numbers. Couldn't the domain performance level be an OPP
>>> in the sense that it is a collection of clock frequencies and voltage
>>> settings?
>>
>> The clock is going to be handled by the device itself (at least for the case we
>> have today) and the performance-state lies with the power-domain which is
>> configured separately. If the performance level includes both clk and voltage,
>> then why would we need to show the clock rates in the DT ? Wouldn't a
>> performance level be enough in such cases?
>
> I think the question is: what does the performance-level of a domain
> actually mean?  Or, what are the units?
>
> Depending on the SoC, there's probably a few things this could mean.  It
> might mean is that an underlying bus/interconnect can be configured to
> guarantee a specific bandwidth or throughput.  That in turn might mean
> that that bus/interconnect might have to be set at a specific
> frequency/voltage.
>
> In your case, IIUC, you're just passing some magic value to some
> firmware running on a micro-controller, but under the hood that uC is
> probably configuring a frequency/voltage someplace.

In the case described by Viresh, it's only about setting the voltage
of a power domain that is shared between different devices. these
devices wants to run at different frequency (set by the devices) but
we have to select a Volateg value that will match with the constraint
of all devices (in this case the highest voltage)

>
> So, if we're going to have a generic DT binding for this, it needs to be
> something that's useful on platforms that are not using magic numbers
> managed by a uC as well.
>
> Kevin
>
>
>
>
>
>
>
>
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Viresh Kumar Nov. 23, 2016, 3:22 a.m. UTC | #5
Thanks for explaining on my behalf Vincent :)

On 22-11-16, 19:34, Vincent Guittot wrote:
> On 22 November 2016 at 19:12, Kevin Hilman <khilman@baylibre.com> wrote:
> > I think the question is: what does the performance-level of a domain
> > actually mean?  Or, what are the units?

There is no unit. If we have units like Hz and volts etc, then we can actually
use the existing clk/regulator frameworks straight away.

The whole problem here is that the regulator (and maybe the clock on a different
platform) for a power domain are hidden from the kernel and handled by a black
box (An M3 core in my case). All we can ask is for a performance state, a simple
positive integer value.

> > Depending on the SoC, there's probably a few things this could mean.  It
> > might mean is that an underlying bus/interconnect can be configured to
> > guarantee a specific bandwidth or throughput.

We are talking in terms of power domains here and so if the bus/interconnect has
a power domain for itself, then yes we can very much have that situation. But if
the kernel have the capability of configuring clk and voltages directly, then we
don't need this new infrastructure at all.

> > That in turn might mean
> > that that bus/interconnect might have to be set at a specific
> > frequency/voltage.
> >
> > In your case, IIUC, you're just passing some magic value to some
> > firmware running on a micro-controller, but under the hood that uC is
> > probably configuring a frequency/voltage someplace.
> 
> In the case described by Viresh, it's only about setting the voltage
> of a power domain that is shared between different devices. these
> devices wants to run at different frequency (set by the devices) but
> we have to select a Volateg value that will match with the constraint
> of all devices (in this case the highest voltage)

That's right.

> > So, if we're going to have a generic DT binding for this, it needs to be
> > something that's useful on platforms that are not using magic numbers
> > managed by a uC as well.

What suggestions do you have for this and I am not sure what all cases we want
to solve by this ?
Kevin Hilman Nov. 23, 2016, 3:51 p.m. UTC | #6
Vincent Guittot <vincent.guittot@linaro.org> writes:

> On 22 November 2016 at 19:12, Kevin Hilman <khilman@baylibre.com> wrote:
>> Viresh Kumar <viresh.kumar@linaro.org> writes:
>>
>>> On 21-11-16, 09:07, Rob Herring wrote:
>>>> On Fri, Nov 18, 2016 at 02:53:12PM +0530, Viresh Kumar wrote:
>>>> > Some platforms have the capability to configure the performance state of
>>>> > their Power Domains. The performance levels are represented by positive
>>>> > integer values, a lower value represents lower performance state.
>>>> >
>>>> > The power-domains until now were only concentrating on the idle state
>>>> > management of the device and this needs to change in order to reuse the
>>>> > infrastructure of power domains for active state management.
>>>> >
>>>> > This patch introduces a new optional property for the consumers of the
>>>> > power-domains: domain-performance-state.
>>>> >
>>>> > If the consumers don't need the capability of switching to different
>>>> > domain performance states at runtime, then they can simply define their
>>>> > required domain performance state in their node directly. Otherwise the
>>>> > consumers can define their requirements with help of other
>>>> > infrastructure, for example the OPP table.
>>>> >
>>>> > Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
>>>> > ---
>>>> >  Documentation/devicetree/bindings/power/power_domain.txt | 6 ++++++
>>>> >  1 file changed, 6 insertions(+)
>>>> >
>>>> > diff --git a/Documentation/devicetree/bindings/power/power_domain.txt b/Documentation/devicetree/bindings/power/power_domain.txt
>>>> > index e1650364b296..db42eacf8b5c 100644
>>>> > --- a/Documentation/devicetree/bindings/power/power_domain.txt
>>>> > +++ b/Documentation/devicetree/bindings/power/power_domain.txt
>>>> > @@ -106,6 +106,12 @@ domain provided by the 'parent' power controller.
>>>> >   - power-domains : A phandle and PM domain specifier as defined by bindings of
>>>> >                     the power controller specified by phandle.
>>>> >
>>>> > +Optional properties:
>>>> > +- domain-performance-state: A positive integer value representing the minimum
>>>> > +  performance level (of the parent domain) required by the consumer for its
>>>> > +  working. The integer value '1' represents the lowest performance level and the
>>>> > +  highest value represents the highest performance level.
>>>>
>>>> How does one come up with the range of values?
>>>
>>> Why would we need a range here? The value here represents the minimum 'state'
>>> and the assumption is that everything above that level would be fine. So the
>>> range is automatically: domain-performance-state -> MAX.
>>>
>>>> It seems like you are
>>>> just making up numbers. Couldn't the domain performance level be an OPP
>>>> in the sense that it is a collection of clock frequencies and voltage
>>>> settings?
>>>
>>> The clock is going to be handled by the device itself (at least for the case we
>>> have today) and the performance-state lies with the power-domain which is
>>> configured separately. If the performance level includes both clk and voltage,
>>> then why would we need to show the clock rates in the DT ? Wouldn't a
>>> performance level be enough in such cases?
>>
>> I think the question is: what does the performance-level of a domain
>> actually mean?  Or, what are the units?
>>
>> Depending on the SoC, there's probably a few things this could mean.  It
>> might mean is that an underlying bus/interconnect can be configured to
>> guarantee a specific bandwidth or throughput.  That in turn might mean
>> that that bus/interconnect might have to be set at a specific
>> frequency/voltage.
>>
>> In your case, IIUC, you're just passing some magic value to some
>> firmware running on a micro-controller, but under the hood that uC is
>> probably configuring a frequency/voltage someplace.
>
> In the case described by Viresh, it's only about setting the voltage
> of a power domain that is shared between different devices. these
> devices wants to run at different frequency (set by the devices) but
> we have to select a Volateg value that will match with the constraint
> of all devices (in this case the highest voltage)

Then, at least for this use case, we're talking about voltage, not some
unspecified units.

But that makes me wonder, this performance state sounds like something
that is changing dynamically at runtime, so why do you want to describe
this statically in DT?

This sounds to me like the job of the genpd.  When any device in the
domain does its pm_runtime_get(), the domain could check the device
frequency and see if it needs to change the domain voltage in order for
that device to operate at that frequency.  When the device goes away
(using pm_runtime_put()) the domain can check again if it could lower
the voltage and still meet the requirements of the remaining devices.

Kevin



--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Vincent Guittot Nov. 23, 2016, 3:55 p.m. UTC | #7
On 23 November 2016 at 16:51, Kevin Hilman <khilman@baylibre.com> wrote:
> Vincent Guittot <vincent.guittot@linaro.org> writes:
>
>> On 22 November 2016 at 19:12, Kevin Hilman <khilman@baylibre.com> wrote:
>>> Viresh Kumar <viresh.kumar@linaro.org> writes:
>>>
>>>> On 21-11-16, 09:07, Rob Herring wrote:
>>>>> On Fri, Nov 18, 2016 at 02:53:12PM +0530, Viresh Kumar wrote:
>>>>> > Some platforms have the capability to configure the performance state of
>>>>> > their Power Domains. The performance levels are represented by positive
>>>>> > integer values, a lower value represents lower performance state.
>>>>> >
>>>>> > The power-domains until now were only concentrating on the idle state
>>>>> > management of the device and this needs to change in order to reuse the
>>>>> > infrastructure of power domains for active state management.
>>>>> >
>>>>> > This patch introduces a new optional property for the consumers of the
>>>>> > power-domains: domain-performance-state.
>>>>> >
>>>>> > If the consumers don't need the capability of switching to different
>>>>> > domain performance states at runtime, then they can simply define their
>>>>> > required domain performance state in their node directly. Otherwise the
>>>>> > consumers can define their requirements with help of other
>>>>> > infrastructure, for example the OPP table.
>>>>> >
>>>>> > Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
>>>>> > ---
>>>>> >  Documentation/devicetree/bindings/power/power_domain.txt | 6 ++++++
>>>>> >  1 file changed, 6 insertions(+)
>>>>> >
>>>>> > diff --git a/Documentation/devicetree/bindings/power/power_domain.txt b/Documentation/devicetree/bindings/power/power_domain.txt
>>>>> > index e1650364b296..db42eacf8b5c 100644
>>>>> > --- a/Documentation/devicetree/bindings/power/power_domain.txt
>>>>> > +++ b/Documentation/devicetree/bindings/power/power_domain.txt
>>>>> > @@ -106,6 +106,12 @@ domain provided by the 'parent' power controller.
>>>>> >   - power-domains : A phandle and PM domain specifier as defined by bindings of
>>>>> >                     the power controller specified by phandle.
>>>>> >
>>>>> > +Optional properties:
>>>>> > +- domain-performance-state: A positive integer value representing the minimum
>>>>> > +  performance level (of the parent domain) required by the consumer for its
>>>>> > +  working. The integer value '1' represents the lowest performance level and the
>>>>> > +  highest value represents the highest performance level.
>>>>>
>>>>> How does one come up with the range of values?
>>>>
>>>> Why would we need a range here? The value here represents the minimum 'state'
>>>> and the assumption is that everything above that level would be fine. So the
>>>> range is automatically: domain-performance-state -> MAX.
>>>>
>>>>> It seems like you are
>>>>> just making up numbers. Couldn't the domain performance level be an OPP
>>>>> in the sense that it is a collection of clock frequencies and voltage
>>>>> settings?
>>>>
>>>> The clock is going to be handled by the device itself (at least for the case we
>>>> have today) and the performance-state lies with the power-domain which is
>>>> configured separately. If the performance level includes both clk and voltage,
>>>> then why would we need to show the clock rates in the DT ? Wouldn't a
>>>> performance level be enough in such cases?
>>>
>>> I think the question is: what does the performance-level of a domain
>>> actually mean?  Or, what are the units?
>>>
>>> Depending on the SoC, there's probably a few things this could mean.  It
>>> might mean is that an underlying bus/interconnect can be configured to
>>> guarantee a specific bandwidth or throughput.  That in turn might mean
>>> that that bus/interconnect might have to be set at a specific
>>> frequency/voltage.
>>>
>>> In your case, IIUC, you're just passing some magic value to some
>>> firmware running on a micro-controller, but under the hood that uC is
>>> probably configuring a frequency/voltage someplace.
>>
>> In the case described by Viresh, it's only about setting the voltage
>> of a power domain that is shared between different devices. these
>> devices wants to run at different frequency (set by the devices) but
>> we have to select a Volateg value that will match with the constraint
>> of all devices (in this case the highest voltage)
>
> Then, at least for this use case, we're talking about voltage, not some
> unspecified units.
>
> But that makes me wonder, this performance state sounds like something
> that is changing dynamically at runtime, so why do you want to describe
> this statically in DT?
>
> This sounds to me like the job of the genpd.  When any device in the
> domain does its pm_runtime_get(), the domain could check the device
> frequency and see if it needs to change the domain voltage in order for
> that device to operate at that frequency.  When the device goes away
> (using pm_runtime_put()) the domain can check again if it could lower
> the voltage and still meet the requirements of the remaining devices.

That's only part of the job. The device can change its frequency and
as a result ask for a new voltage index while it is already running

Vincent

>
> Kevin
>
>
>
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Kevin Hilman Nov. 23, 2016, 10:30 p.m. UTC | #8
Vincent Guittot <vincent.guittot@linaro.org> writes:

> On 23 November 2016 at 16:51, Kevin Hilman <khilman@baylibre.com> wrote:
>> Vincent Guittot <vincent.guittot@linaro.org> writes:
>>
>>> On 22 November 2016 at 19:12, Kevin Hilman <khilman@baylibre.com> wrote:
>>>> Viresh Kumar <viresh.kumar@linaro.org> writes:
>>>>
>>>>> On 21-11-16, 09:07, Rob Herring wrote:
>>>>>> On Fri, Nov 18, 2016 at 02:53:12PM +0530, Viresh Kumar wrote:
>>>>>> > Some platforms have the capability to configure the performance state of
>>>>>> > their Power Domains. The performance levels are represented by positive
>>>>>> > integer values, a lower value represents lower performance state.
>>>>>> >
>>>>>> > The power-domains until now were only concentrating on the idle state
>>>>>> > management of the device and this needs to change in order to reuse the
>>>>>> > infrastructure of power domains for active state management.
>>>>>> >
>>>>>> > This patch introduces a new optional property for the consumers of the
>>>>>> > power-domains: domain-performance-state.
>>>>>> >
>>>>>> > If the consumers don't need the capability of switching to different
>>>>>> > domain performance states at runtime, then they can simply define their
>>>>>> > required domain performance state in their node directly. Otherwise the
>>>>>> > consumers can define their requirements with help of other
>>>>>> > infrastructure, for example the OPP table.
>>>>>> >
>>>>>> > Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
>>>>>> > ---
>>>>>> >  Documentation/devicetree/bindings/power/power_domain.txt | 6 ++++++
>>>>>> >  1 file changed, 6 insertions(+)
>>>>>> >
>>>>>> > diff --git a/Documentation/devicetree/bindings/power/power_domain.txt b/Documentation/devicetree/bindings/power/power_domain.txt
>>>>>> > index e1650364b296..db42eacf8b5c 100644
>>>>>> > --- a/Documentation/devicetree/bindings/power/power_domain.txt
>>>>>> > +++ b/Documentation/devicetree/bindings/power/power_domain.txt
>>>>>> > @@ -106,6 +106,12 @@ domain provided by the 'parent' power controller.
>>>>>> >   - power-domains : A phandle and PM domain specifier as defined by bindings of
>>>>>> >                     the power controller specified by phandle.
>>>>>> >
>>>>>> > +Optional properties:
>>>>>> > +- domain-performance-state: A positive integer value representing the minimum
>>>>>> > +  performance level (of the parent domain) required by the consumer for its
>>>>>> > +  working. The integer value '1' represents the lowest performance level and the
>>>>>> > +  highest value represents the highest performance level.
>>>>>>
>>>>>> How does one come up with the range of values?
>>>>>
>>>>> Why would we need a range here? The value here represents the minimum 'state'
>>>>> and the assumption is that everything above that level would be fine. So the
>>>>> range is automatically: domain-performance-state -> MAX.
>>>>>
>>>>>> It seems like you are
>>>>>> just making up numbers. Couldn't the domain performance level be an OPP
>>>>>> in the sense that it is a collection of clock frequencies and voltage
>>>>>> settings?
>>>>>
>>>>> The clock is going to be handled by the device itself (at least for the case we
>>>>> have today) and the performance-state lies with the power-domain which is
>>>>> configured separately. If the performance level includes both clk and voltage,
>>>>> then why would we need to show the clock rates in the DT ? Wouldn't a
>>>>> performance level be enough in such cases?
>>>>
>>>> I think the question is: what does the performance-level of a domain
>>>> actually mean?  Or, what are the units?
>>>>
>>>> Depending on the SoC, there's probably a few things this could mean.  It
>>>> might mean is that an underlying bus/interconnect can be configured to
>>>> guarantee a specific bandwidth or throughput.  That in turn might mean
>>>> that that bus/interconnect might have to be set at a specific
>>>> frequency/voltage.
>>>>
>>>> In your case, IIUC, you're just passing some magic value to some
>>>> firmware running on a micro-controller, but under the hood that uC is
>>>> probably configuring a frequency/voltage someplace.
>>>
>>> In the case described by Viresh, it's only about setting the voltage
>>> of a power domain that is shared between different devices. these
>>> devices wants to run at different frequency (set by the devices) but
>>> we have to select a Volateg value that will match with the constraint
>>> of all devices (in this case the highest voltage)
>>
>> Then, at least for this use case, we're talking about voltage, not some
>> unspecified units.
>>
>> But that makes me wonder, this performance state sounds like something
>> that is changing dynamically at runtime, so why do you want to describe
>> this statically in DT?
>>
>> This sounds to me like the job of the genpd.  When any device in the
>> domain does its pm_runtime_get(), the domain could check the device
>> frequency and see if it needs to change the domain voltage in order for
>> that device to operate at that frequency.  When the device goes away
>> (using pm_runtime_put()) the domain can check again if it could lower
>> the voltage and still meet the requirements of the remaining devices.
>
> That's only part of the job. The device can change its frequency and
> as a result ask for a new voltage index while it is already running

That's fine.  Use clock notifiers, or better use QoS (with notifiers) so
that the genpd knows when any of those change.

Kevin
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Stephen Boyd Nov. 24, 2016, 2:03 a.m. UTC | #9
On 11/23, Kevin Hilman wrote:
> Vincent Guittot <vincent.guittot@linaro.org> writes:
> 
> > On 23 November 2016 at 16:51, Kevin Hilman <khilman@baylibre.com> wrote:
> >> Vincent Guittot <vincent.guittot@linaro.org> writes:
> >>
> >>> On 22 November 2016 at 19:12, Kevin Hilman <khilman@baylibre.com> wrote:
> >>>> Viresh Kumar <viresh.kumar@linaro.org> writes:
> >>>>
> >>>>> On 21-11-16, 09:07, Rob Herring wrote:
> >>>>>> On Fri, Nov 18, 2016 at 02:53:12PM +0530, Viresh Kumar wrote:
> >>>>>> > Some platforms have the capability to configure the performance state of
> >>>>>> > their Power Domains. The performance levels are represented by positive
> >>>>>> > integer values, a lower value represents lower performance state.
> >>>>>> >
> >>>>>> > The power-domains until now were only concentrating on the idle state
> >>>>>> > management of the device and this needs to change in order to reuse the
> >>>>>> > infrastructure of power domains for active state management.
> >>>>>> >
> >>>>>> > This patch introduces a new optional property for the consumers of the
> >>>>>> > power-domains: domain-performance-state.
> >>>>>> >
> >>>>>> > If the consumers don't need the capability of switching to different
> >>>>>> > domain performance states at runtime, then they can simply define their
> >>>>>> > required domain performance state in their node directly. Otherwise the
> >>>>>> > consumers can define their requirements with help of other
> >>>>>> > infrastructure, for example the OPP table.
> >>>>>> >
> >>>>>> > Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
> >>>>>> > ---
> >>>>>> >  Documentation/devicetree/bindings/power/power_domain.txt | 6 ++++++
> >>>>>> >  1 file changed, 6 insertions(+)
> >>>>>> >
> >>>>>> > diff --git a/Documentation/devicetree/bindings/power/power_domain.txt b/Documentation/devicetree/bindings/power/power_domain.txt
> >>>>>> > index e1650364b296..db42eacf8b5c 100644
> >>>>>> > --- a/Documentation/devicetree/bindings/power/power_domain.txt
> >>>>>> > +++ b/Documentation/devicetree/bindings/power/power_domain.txt
> >>>>>> > @@ -106,6 +106,12 @@ domain provided by the 'parent' power controller.
> >>>>>> >   - power-domains : A phandle and PM domain specifier as defined by bindings of
> >>>>>> >                     the power controller specified by phandle.
> >>>>>> >
> >>>>>> > +Optional properties:
> >>>>>> > +- domain-performance-state: A positive integer value representing the minimum
> >>>>>> > +  performance level (of the parent domain) required by the consumer for its
> >>>>>> > +  working. The integer value '1' represents the lowest performance level and the
> >>>>>> > +  highest value represents the highest performance level.
> >>>>>>
> >>>>>> How does one come up with the range of values?
> >>>>>
> >>>>> Why would we need a range here? The value here represents the minimum 'state'
> >>>>> and the assumption is that everything above that level would be fine. So the
> >>>>> range is automatically: domain-performance-state -> MAX.
> >>>>>
> >>>>>> It seems like you are
> >>>>>> just making up numbers. Couldn't the domain performance level be an OPP
> >>>>>> in the sense that it is a collection of clock frequencies and voltage
> >>>>>> settings?
> >>>>>
> >>>>> The clock is going to be handled by the device itself (at least for the case we
> >>>>> have today) and the performance-state lies with the power-domain which is
> >>>>> configured separately. If the performance level includes both clk and voltage,
> >>>>> then why would we need to show the clock rates in the DT ? Wouldn't a
> >>>>> performance level be enough in such cases?
> >>>>
> >>>> I think the question is: what does the performance-level of a domain
> >>>> actually mean?  Or, what are the units?
> >>>>
> >>>> Depending on the SoC, there's probably a few things this could mean.  It
> >>>> might mean is that an underlying bus/interconnect can be configured to
> >>>> guarantee a specific bandwidth or throughput.  That in turn might mean
> >>>> that that bus/interconnect might have to be set at a specific
> >>>> frequency/voltage.
> >>>>
> >>>> In your case, IIUC, you're just passing some magic value to some
> >>>> firmware running on a micro-controller, but under the hood that uC is
> >>>> probably configuring a frequency/voltage someplace.
> >>>
> >>> In the case described by Viresh, it's only about setting the voltage
> >>> of a power domain that is shared between different devices. these
> >>> devices wants to run at different frequency (set by the devices) but
> >>> we have to select a Volateg value that will match with the constraint
> >>> of all devices (in this case the highest voltage)
> >>
> >> Then, at least for this use case, we're talking about voltage, not some
> >> unspecified units.

In some cases we actually know the voltage of the domain and
would want to put some voltage mapping in DT. For example, level
1 is voltage 2V and level 2 is voltage 2.5V. In other cases we
don't know the voltage, all we know is the voltage "corner" which
is a number from 0 to N that is translated into a voltage by the
firmware but is otherwise unknown what that is outside of the
firmware. In this case we've lost the units, but otherwise we're
still interested in requesting some 'level' that the domain be
operating in.

> >>
> >> But that makes me wonder, this performance state sounds like something
> >> that is changing dynamically at runtime, so why do you want to describe
> >> this statically in DT?
> >>
> >> This sounds to me like the job of the genpd.  When any device in the
> >> domain does its pm_runtime_get(), the domain could check the device
> >> frequency and see if it needs to change the domain voltage in order for
> >> that device to operate at that frequency.

How do we check the device frequency? Does the domain need to
know about the clocks for all devices that are in the domain and
what clocks in there are contributing to the voltage requirement?

In out of tree solutions we've 'bucketized' the requirements of
the devices into an array sized to the number of levels of the
voltage domain. When a device requires a new level, we increment
the new level and decrement the old level and then look for the
largest non-zero index in the array. This is the inverse design
of iterating over all devices in the domain to see what frequency
they're running at to determine the voltage requirement. I guess
using PM QoS would be similar here to do the aggregation and then
tell the domain to go to that level.

> >> When the device goes away
> >> (using pm_runtime_put()) the domain can check again if it could lower
> >> the voltage and still meet the requirements of the remaining devices.
> >
> > That's only part of the job. The device can change its frequency and
> > as a result ask for a new voltage index while it is already running
> 
> That's fine.  Use clock notifiers, or better use QoS (with notifiers) so
> that the genpd knows when any of those change.
> 

>From my perspective clock notifiers are going to be ugly. At the
point we notify that a rate has changed we're deep in the clk
framework holding the prepare mutex and we're calling it from an
SRCU callback. If those callbacks need to turn on an i2c clk to
communicate with some PMIC to change voltages we're in a world of
pain due to our locking scheme. Maybe that's solvable with a
different clk locking scheme though so I may be overly concerned
here and everything will work out. Also, we don't have any
notification that a clock is turned on or off right now, which
sounds like we're going to assume is the case when a device gets
pm_runtime_put().
Viresh Kumar Nov. 24, 2016, 4:40 a.m. UTC | #10
On 23-11-16, 18:03, Stephen Boyd wrote:
> On 11/23, Kevin Hilman wrote:
> > Vincent Guittot <vincent.guittot@linaro.org> writes:
> > > On 23 November 2016 at 16:51, Kevin Hilman <khilman@baylibre.com> wrote:

> > >> Then, at least for this use case, we're talking about voltage, not some
> > >> unspecified units.
> 
> In some cases we actually know the voltage of the domain and
> would want to put some voltage mapping in DT. For example, level
> 1 is voltage 2V and level 2 is voltage 2.5V.

But even in these cases we wouldn't be using the voltage values within the
kernel as we will be giving only a performance state to the M3 core, right?

> In other cases we
> don't know the voltage, all we know is the voltage "corner" which
> is a number from 0 to N that is translated into a voltage by the
> firmware but is otherwise unknown what that is outside of the
> firmware. In this case we've lost the units, but otherwise we're
> still interested in requesting some 'level' that the domain be
> operating in.

> > >> But that makes me wonder, this performance state sounds like something
> > >> that is changing dynamically at runtime, so why do you want to describe
> > >> this statically in DT?

Each frequency a device can operate in has the requirement of minimum
performance state of the domain and so we need these values in the DT.

> > >>
> > >> This sounds to me like the job of the genpd.  When any device in the
> > >> domain does its pm_runtime_get(), the domain could check the device
> > >> frequency and see if it needs to change the domain voltage in order for
> > >> that device to operate at that frequency.

Also note that the performance index may be required to be changed before
updating the frequency in case we are increasing the frequency which needs a
higher performance index to be set.

> How do we check the device frequency? Does the domain need to
> know about the clocks for all devices that are in the domain and
> what clocks in there are contributing to the voltage requirement?
> 
> In out of tree solutions we've 'bucketized' the requirements of
> the devices into an array sized to the number of levels of the
> voltage domain. When a device requires a new level, we increment
> the new level and decrement the old level and then look for the
> largest non-zero index in the array.

For such a design we need to know the index-size in advance and I am not sure if
we should get anything like that from the DT.

> This is the inverse design
> of iterating over all devices in the domain to see what frequency
> they're running at to determine the voltage requirement. I guess
> using PM QoS would be similar here to do the aggregation and then
> tell the domain to go to that level.
> 
> > >> When the device goes away
> > >> (using pm_runtime_put()) the domain can check again if it could lower
> > >> the voltage and still meet the requirements of the remaining devices.

This will be done nevertheless.

> > >
> > > That's only part of the job. The device can change its frequency and
> > > as a result ask for a new voltage index while it is already running
> > 
> > That's fine.  Use clock notifiers, or better use QoS (with notifiers) so
> > that the genpd knows when any of those change.

Yes genpd will be handling it all but it will surely need to know the
performance index for each individual clock rate we support.

The way I have written the code for now is this with another QOS request type
DEV_PM_QOS_PERFORMANCE:

+static int _generic_set_opp_pd(...)
+{
+
	...

+       /* Scaling up? Scale voltage before frequency */
+       if (freq > old_freq)
+               dev_pm_qos_update_request(req, perf);
+
+       clk_set_rate(...);
+
+       if (freq < old_freq)
+               dev_pm_qos_update_request(req, perf);
+
+       return 0;
+}

And genpd is registering its notifier for DEV_PM_QOS_PERFORMANCE request type
where it accumulates requests from all the devices and selects the highest one.
Stephen Boyd Nov. 28, 2016, 6:27 p.m. UTC | #11
On 11/23/2016 08:40 PM, Viresh Kumar wrote:
> On 23-11-16, 18:03, Stephen Boyd wrote:
>> On 11/23, Kevin Hilman wrote:
>>> Vincent Guittot <vincent.guittot@linaro.org> writes:
>>>> On 23 November 2016 at 16:51, Kevin Hilman <khilman@baylibre.com> wrote:
>>>>> Then, at least for this use case, we're talking about voltage, not some
>>>>> unspecified units.
>> In some cases we actually know the voltage of the domain and
>> would want to put some voltage mapping in DT. For example, level
>> 1 is voltage 2V and level 2 is voltage 2.5V.
> But even in these cases we wouldn't be using the voltage values within the
> kernel as we will be giving only a performance state to the M3 core, right?

Nope. In these cases we need to set a certain voltage and we do that by
requesting it via the M3 core.
Viresh Kumar Nov. 29, 2016, 6:57 a.m. UTC | #12
On 28-11-16, 10:27, Stephen Boyd wrote:
> On 11/23/2016 08:40 PM, Viresh Kumar wrote:
> > But even in these cases we wouldn't be using the voltage values within the
> > kernel as we will be giving only a performance state to the M3 core, right?
> 
> Nope. In these cases we need to set a certain voltage and we do that by
> requesting it via the M3 core.

Don't we need something like this then ?

	parent: power-controller@12340000 {
		compatible = "foo,power-controller";
		reg = <0x12340000 0x1000>;
		#power-domain-cells = <0>;
		domain-performance-states = <&perf_state0>;
	};

	perf_state0: performance_states {
		pstate1: pstate@1 {
			index = <1>;
			/* Optional */
			microvolt = <970000 975000 985000>;
		};
		pstate2: pstate@2 {
			index = <2>;
			/* Optional */
			microvolt = <970000 975000 985000>;
		};
		pstate3: pstate@3 {
			index = <3>;
			/* Optional */
			microvolt = <970000 975000 985000>;
		};
	}

	cpus {
		cpu@0 {
			...
			power-domain = <&parent>;
			operating-points-v2 = <&cpu0_opp_table>;
		};
	};

	cpu0_opp_table: opp_table0 {
		compatible = "operating-points-v2";
		opp-shared;

		opp@1000000000 {
			opp-hz = /bits/ 64 <1000000000>;
			domain-performance-state = <&pstate1>;
		};
		opp@1100000000 {
			opp-hz = /bits/ 64 <1100000000>;
			domain-performance-state = <&pstate2>;
		};
		opp@1200000000 {
			opp-hz = /bits/ 64 <1200000000>;
			domain-performance-state = <&pstate3>;
		};
	};
Stephen Boyd Nov. 30, 2016, 1:08 a.m. UTC | #13
On 11/29, Viresh Kumar wrote:
> On 28-11-16, 10:27, Stephen Boyd wrote:
> > On 11/23/2016 08:40 PM, Viresh Kumar wrote:
> > > But even in these cases we wouldn't be using the voltage values within the
> > > kernel as we will be giving only a performance state to the M3 core, right?
> > 
> > Nope. In these cases we need to set a certain voltage and we do that by
> > requesting it via the M3 core.
> 
> Don't we need something like this then ?

Perhaps. One question is if we consider a shared regulator as a
regulator in the kernel, or if we want to hide the regulator
behind some other API that aggregates the users of the voltage. I
don't see how to draw the line clearly between a regulator and a
power domain that modifies a regulator underneath. It seems like
everything that's using a regulator on the SoC could be using a
power domain instead and then we could be aggregating the voltage
requirements outside of the regulator APIs.

The only other way I can think of doing it is by having the
voltages in the OPP tables for each device. That gets sort of
messy though because all the devices calling
regulator_set_voltage() have to set the min voltage to be their
required voltage and the max to be the global max voltage on the
system. Otherwise a higher voltage may not be used while it may
be required. Of course, we could encode that as the last value in
the triplet and everything works.

> 
> 	parent: power-controller@12340000 {
> 		compatible = "foo,power-controller";
> 		reg = <0x12340000 0x1000>;
> 		#power-domain-cells = <0>;
> 		domain-performance-states = <&perf_state0>;
> 	};
> 
> 	perf_state0: performance_states {
> 		pstate1: pstate@1 {
> 			index = <1>;
> 			/* Optional */
> 			microvolt = <970000 975000 985000>;
> 		};
> 		pstate2: pstate@2 {
> 			index = <2>;
> 			/* Optional */
> 			microvolt = <970000 975000 985000>;
> 		};
> 		pstate3: pstate@3 {
> 			index = <3>;
> 			/* Optional */
> 			microvolt = <970000 975000 985000>;
> 		};
> 	}
> 
> 	cpus {
> 		cpu@0 {
> 			...
> 			power-domain = <&parent>;
> 			operating-points-v2 = <&cpu0_opp_table>;
> 		};
> 	};
> 
> 	cpu0_opp_table: opp_table0 {
> 		compatible = "operating-points-v2";
> 		opp-shared;
> 
> 		opp@1000000000 {
> 			opp-hz = /bits/ 64 <1000000000>;
> 			domain-performance-state = <&pstate1>;

What do we do if the device is part of multiple domains? For
example it may be part of two power domains for different pieces
of the silicon within one device, and we may want to
independently control those domains depending on the clock
frequency.

> 		};
> 		opp@1100000000 {
> 			opp-hz = /bits/ 64 <1100000000>;
> 			domain-performance-state = <&pstate2>;
> 		};
> 		opp@1200000000 {
> 			opp-hz = /bits/ 64 <1200000000>;
> 			domain-performance-state = <&pstate3>;
> 		};
Viresh Kumar Dec. 2, 2016, 10:47 a.m. UTC | #14
On 29-11-16, 17:08, Stephen Boyd wrote:
> Perhaps. One question is if we consider a shared regulator as a
> regulator in the kernel, or if we want to hide the regulator
> behind some other API that aggregates the users of the voltage. I
> don't see how to draw the line clearly between a regulator and a

Neither do I :)

> power domain that modifies a regulator underneath. It seems like
> everything that's using a regulator on the SoC could be using a
> power domain instead and then we could be aggregating the voltage
> requirements outside of the regulator APIs.

If I am not wrong the regulator API chooses an intersection instead of the max
requested voltage and so it may not fit very well in the use-case we are trying
to solve.

> The only other way I can think of doing it is by having the
> voltages in the OPP tables for each device. That gets sort of
> messy though because all the devices calling
> regulator_set_voltage() have to set the min voltage to be their
> required voltage and the max to be the global max voltage on the
> system. Otherwise a higher voltage may not be used while it may
> be required. Of course, we could encode that as the last value in
> the triplet and everything works.

Where will we get the performance levels in this case ?

> What do we do if the device is part of multiple domains? For
> example it may be part of two power domains for different pieces
> of the silicon within one device, and we may want to
> independently control those domains depending on the clock
> frequency.

I thought the best way to handle would be to add a virtual domain for such a
device. That domain shall be responsible for changing its parent based on the
performance level selected, and at that point only the recalculations for both
the parents should happen to select the new best performance level.

@Rob / Kevin: Do you have any inputs on the things we are discussing here? I
want to involve you guys as early as possible, or we will come back to this
again :(
diff mbox

Patch

diff --git a/Documentation/devicetree/bindings/power/power_domain.txt b/Documentation/devicetree/bindings/power/power_domain.txt
index e1650364b296..db42eacf8b5c 100644
--- a/Documentation/devicetree/bindings/power/power_domain.txt
+++ b/Documentation/devicetree/bindings/power/power_domain.txt
@@ -106,6 +106,12 @@  domain provided by the 'parent' power controller.
  - power-domains : A phandle and PM domain specifier as defined by bindings of
                    the power controller specified by phandle.
 
+Optional properties:
+- domain-performance-state: A positive integer value representing the minimum
+  performance level (of the parent domain) required by the consumer for its
+  working. The integer value '1' represents the lowest performance level and the
+  highest value represents the highest performance level.
+
 Example:
 
 	leaky-device@12350000 {