[RFC,2/2] Documentation: arm: define DT C-states bindings

Message ID 1386001205-11978-3-git-send-email-lorenzo.pieralisi@arm.com
State Superseded, archived
Commit Message

Lorenzo Pieralisi Dec. 2, 2013, 4:20 p.m. UTC
ARM based platforms implement a variety of power management schemes that
allow processors to enter at run-time low-power states, aka C-states
in ACPI jargon. The parameters defining these C-states vary on a per-platform
basis forcing the OS to hardcode the state parameters in platform
specific static tables whose size grows as the number of platforms supported
in the kernel increases and hampers device drivers standardization.

Therefore, this patch aims at standardizing C-state device tree bindings for
ARM platforms. Bindings define C-state parameters inclusive of entry methods
and state latencies, to allow operating systems to retrieve the
configuration entries from the device tree and initialize the related
power management drivers, paving the way for common code in the kernel
to deal with power states and removing the need for static data in current
and previous kernel versions.

Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
---
 Documentation/devicetree/bindings/arm/c-states.txt | 830 +++++++++++++++++++++
 1 file changed, 830 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/arm/c-states.txt

Comments

Kumar Gala Dec. 2, 2013, 6:08 p.m. UTC | #1
On Dec 2, 2013, at 10:20 AM, Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> wrote:

> ARM based platforms implement a variety of power management schemes that
> allow processors to enter at run-time low-power states, aka C-states
> in ACPI jargon. The parameters defining these C-states vary on a per-platform
> basis forcing the OS to hardcode the state parameters in platform
> specific static tables whose size grows as the number of platforms supported
> in the kernel increases and hampers device drivers standardization.
> 
> Therefore, this patch aims at standardizing C-state device tree bindings for
> ARM platforms. Bindings define C-state parameters inclusive of entry methods
> and state latencies, to allow operating systems to retrieve the
> configuration entries from the device tree and initialize the related
> power management drivers, paving the way for common code in the kernel
> to deal with power states and removing the need for static data in current
> and previous kernel versions.

Where is this spec’d today in the kernel?

> 
> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
> ---
> Documentation/devicetree/bindings/arm/c-states.txt | 830 +++++++++++++++++++++
> 1 file changed, 830 insertions(+)
> create mode 100644 Documentation/devicetree/bindings/arm/c-states.txt
> 
> diff --git a/Documentation/devicetree/bindings/arm/c-states.txt b/Documentation/devicetree/bindings/arm/c-states.txt
> new file mode 100644
> index 0000000..f568417
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/arm/c-states.txt
> @@ -0,0 +1,830 @@
> +==========================================
> +ARM C-states binding description
> +==========================================
> +
> +==========================================
> +1 - Introduction
> +==========================================
> +
> +ARM systems contain HW capable of managing power consumption dynamically,
> +where cores can be put in different low-power states (ranging from simple
> +wfi to power gating) according to OSPM policies. Borrowing concepts
> +from the ACPI specification[1], the CPU states representing the range of
> +dynamic states that a processor can enter at run-time, aka C-state, can be
> +specified through device tree bindings representing the parameters required to
> +enter/exit specific C-states on a given processor.
> +
> +The state an ARM CPU can be put into is loosely identified by one of the
> +following operating modes:
> +
> +- Running:
> +	 # Processor core is executing instructions
> +
> +- Wait for Interrupt:
> +	# An ARM processor enters wait for interrupt (WFI) low power
> +	  state by executing a wfi instruction. When a processor enters
> +	  wfi state it disables most of the clocks while keeping the processor
> +	  powered up. This state is standard on all ARM processors and it is
> +	  defined as C1 in the remainder of this document.
> +
> +- Dormant:
> +	# Dormant mode is entered by executing wfi instructions and by sending
> +	  platform specific commands to the platform power controller (coupled
> +	  with processor specific SW/HW control sequences).
> +	  In dormant mode, most of the processor control and debug logic is
> +	  powered up but cache RAM can be put in retention state, providing
> +	  additional power savings.
> +
> +- Sleep:
> +	# Sleep mode is entered by executing the wfi instruction and by sending
> +	  platform specific commands to the platform power controller (coupled
> +	  with processor specific SW/HW control sequences). In sleep mode, a
> +	  processor and its caches are shutdown, the entire processor state is
> +	  lost.
> +
> +Building on top of the previous processor modes, ARM platforms implement power
> +management schemes that allow an OS PM implementation to put the processor in
> +different CPU states (C-states). C-states parameters (eg latency) are
> +platform specific and need to be characterized with bindings that provide the
> +required information to OSPM code so that it can build the required tables and
> +use them at runtime.
> +
> +The device tree binding definition for ARM C-states is the subject of this
> +document.
> +
> +===========================================
> +2 - cpu-power-states node
> +===========================================
> +
> +ARM processor C-states are defined within the cpu-power-states node, which is
> +a direct child of the cpus node and provides a container where the processor
> +states, defined as device tree nodes, are listed.
> +
> +- cpu-power-states node
> +
> +	Usage: Optional - On ARM systems, is a container of processor C-state
> +			  nodes. If the system does not provide CPU power
> +			  management capabilities or the processor just
> +			  supports WFI (C1 state) a cpu-power-states node is
> +			  not required.
> +
> +	Description: cpu-power-states node is a container node, where its
> +		     subnodes describe the CPU low-power C-states.
> +
> +	Node name must be "cpu-power-states".
> +
> +	The cpu-power-states node's parent node must be cpus node.
> +
> +	The cpu-power-states node's child nodes can be:
> +
> +	- one or more state nodes
> +
> +	The cpu-power-states node must contain the following properties:
> +
> +	- compatible
> +		Value type: <stringlist>
> +		Usage: Required
> +		Definition: Must be "arm,cpu-power-states".
> +
> +	- #address-cells
> +		Usage: Required
> +		Value type: <u32>
> +		Definition: must be set to 1.
> +
> +	- #size-cells
> +		Usage: Required
> +		Value type: <u32>
> +		Definition: must be set to 0.
> +
> +	Any other configuration is considered invalid.
> +
> +The nodes describing the C-states (state) can only be defined within the
> +cpu-power-states node.
> +
> +Any other configuration is considered invalid and therefore must be ignored.
> +
> +===========================================
> +2 - state node
> +===========================================
> +
> +A state node represents a C-state description and must be defined as follows:
> +
> +- state node
> +
> +	Description: must be child of the cpu-power-states node.
> +
> +	The state node name must be "state", with unit address provided by the
> +	"reg" property following standard DT requirements[4].
> +
> +	A state node defines the following properties:
> +
> +	- reg
> +		Usage: Required
> +		Value type: <u32>
> +		Definition: Standard device tree property [4] used for
> +			    enumeration purposes.

I’m not sure what purpose reg is really serving here.

> +
> +	- index
> +		Usage: Required
> +		Value type: <u32>
> +		Definition: It represents C-state index, starting from 2 (index
> +			    0 represents the processor state "running" and
> +			    index 1 represents processor mode "WFI"; indexes 0
> +			    and 1 are standard ARM states that need not be
> +			    described).

Any reason not to call it "c-state-index"?

> +
> +	- entry-method
> +		Value type: <stringlist>
> +		Usage: Required
> +		Definition: Describes the method by which a CPU enters the
> +			    C-state. This property is required and must be one
> +			    of:
> +
> +			    - "psci"
> +			      ARM Standard firmware interface
> +
> +			    - "[vendor],[method]"
> +			      An implementation dependent string with
> +			      format "vendor,method", where vendor is a string
> +			      denoting the name of the manufacturer and
> +			      method is a string specifying the mechanism
> +			      used to enter the C-state.
> +
> +	- psci-power-state
> +		Usage: Required if entry-method property value is set to
> +		       "psci".
> +		Value type: <u32>
> +		Definition: power_state parameter to pass to the PSCI
> +			    suspend call to enter the C-state.
> +
> +	- latency
> +		Usage: Required
> +		Value type: <u32>
> +		Definition: Worst case latency in microseconds required to
> +			    enter and exit the C-state.
> +
> +	- min-residency
> +		Usage: Required
> +		Value type: <u32>
> +		Definition: Time in microseconds required for the CPU to be in
> +			    the C-state to make up for the dynamic power
> +			    consumed to enter/exit the C-state in order to
> +			    break even in terms of power consumption compared
> +			    to C1 state (wfi).
> +			    This parameter depends on the operating conditions
> +			    (operating point, cache state) and must assume
> +			    worst case scenario.
> +
> +	- cpus
> +		Usage: Optional
> +		Value type: <phandle>
> +		Definition: If defined, the phandle points to a node in the
> +			    cpu-map[2] representing all CPUs on which C-state
> +			    is valid. If not present or system is UP, the
> +			    C-state has to be considered valid for all CPUs in
> +			    the system.
> +
> +	- affinity
> +		Usage: Optional
> +		Value type: <phandle>
> +		Definition: If defined, phandle points to a node in the
> +			    cpu-map[2] that represents all CPUs that are
> +			    affected (ie share) by the C-state and have to
> +			    be coordinated on C-state entry/exit. If not
> +			    present or system is UP, the C-state is local to
> +			    a CPU and need no coordination (ie it is a CPU
> +			    state, that does not require coordination with
> +			    other CPUs). If present, the affinity property
> +			    must contain a phandle to a cpu-map node that
> +			    represents a subset, possibly inclusive of the
> +			    CPUs described through the cpus property.
> +
> +	- power-depth
> +		Usage: Required
> +		Value type: <u32>
> +		Definition: Integer value, starting from 2 (value 0 meaning
> +			    running and value 1 representing power depth of
> +			    wfi (C1)), that defines the level of depth of a
> +			    power state.
> +			    The system denotes power states with different
> +			    depths, an increasing value meaning less power
> +			    consumption and might involve powering down more
> +			    components.  Devices that are affected by
> +			    C-states entry must define the maximum power
> +			    depth supported in their respective device tree
> +			    bindings so that OSPM can take decision on how
> +			    to handle the device in question when the C-state
> +			    is entered. All devices (per-CPU or external) with
> +			    a power depth lower than the one defined in the
> +			    C-state entry stop operating when the C-state
> +			    is entered and action is required by OSPM to
> +			    guarantee their logic and memory content is saved
> +			    and restored to guarantee proper functioning.

How is this different from the c-state index?

> +
> +	- cache-level-lost:
> +		Usage: Required if "entry-method" differs from "psci".
> +		Value type: <u32>
> +		Definition: An integer value representing the uppermost cache
> +			    level (inclusive) that is lost upon state entry.
> +			    This property requires the definition of cache
> +			    nodes as specified in [3]. Cache levels that are
> +			    shared between processors, according to [3], should
> +			    coordinate cache cleaning and invalidation to
> +			    maximize performance (ie a shared cache level
> +			    must be cleaned only if all CPUs sharing the
> +			    cache entered the state). If missing, cache
> +			    state has to be considered retained.
> +
> +	- processor-state-retained:
> +		Usage: See definition
> +		Value type: <none>
> +		Definition: if present CPU processor logic is retained on
> +			    power down, otherwise it is lost.
> +
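
The examples in the next section all use the "psci" entry method. As a
minimal sketch of the other case described above, a state node with a
vendor-specific entry method might look like the fragment below. The
"vendor,method" string is the placeholder format from the binding text, not a
real method name, and all numeric values are made up for illustration.

```dts
state@0 {
	reg = <0>;
	index = <2>;
	/* placeholder string in the "vendor,method" format */
	entry-method = "vendor,method";
	/* no psci-power-state: only required when entry-method is "psci" */
	latency = <400>;	/* hypothetical value, microseconds */
	min-residency = <300>;	/* hypothetical value, microseconds */
	power-depth = <2>;
	/* required here because entry-method differs from "psci" */
	cache-level-lost = <1>;
	/* processor-state-retained is absent, so processor logic is lost */
};
```
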
> +
> +===========================================
> +3 - Examples
> +===========================================
> +
> +Example 1 (ARM 64-bit, 16-cpu system, two clusters of clusters):
> +
> +cpus {
> +	#size-cells = <0>;
> +	#address-cells = <2>;
> +
> +	cpu-map {
> +		CLUSTER0: cluster0 {
> +			CLUSTER2: cluster0 {
> +				core0 {
> +					thread0 {
> +						cpu = <&CPU0>;
> +					};
> +					thread1 {
> +						cpu = <&CPU1>;
> +					};
> +				};
> +
> +				core1 {
> +					thread0 {
> +						cpu = <&CPU2>;
> +					};
> +					thread1 {
> +						cpu = <&CPU3>;
> +					};
> +				};
> +			};
> +
> +			CLUSTER3: cluster1 {
> +				core0 {
> +					thread0 {
> +						cpu = <&CPU4>;
> +					};
> +					thread1 {
> +						cpu = <&CPU5>;
> +					};
> +				};
> +
> +				core1 {
> +					thread0 {
> +						cpu = <&CPU6>;
> +					};
> +					thread1 {
> +						cpu = <&CPU7>;
> +					};
> +				};
> +			};
> +		};
> +
> +		CLUSTER1: cluster1 {
> +			CLUSTER4: cluster0 {
> +				core0 {
> +					thread0 {
> +						cpu = <&CPU8>;
> +					};
> +					thread1 {
> +						cpu = <&CPU9>;
> +					};
> +				};
> +				core1 {
> +					thread0 {
> +						cpu = <&CPU10>;
> +					};
> +					thread1 {
> +						cpu = <&CPU11>;
> +					};
> +				};
> +			};
> +
> +			CLUSTER5: cluster1 {
> +				core0 {
> +					thread0 {
> +						cpu = <&CPU12>;
> +					};
> +					thread1 {
> +						cpu = <&CPU13>;
> +					};
> +				};
> +				core1 {
> +					thread0 {
> +						cpu = <&CPU14>;
> +					};
> +					thread1 {
> +						cpu = <&CPU15>;
> +					};
> +				};
> +			};
> +		};
> +	};
> +
> +	cpu-power-states {
> +		compatible = "arm,cpu-power-states";
> +		#size-cells = <0>;
> +		#address-cells = <1>;
> +
> +		state@0 {
> +			reg = <0>;
> +			index = <2>;
> +			entry-method = "psci";
> +			psci-power-state = <0x1010000>;
> +			latency = <400>;
> +			min-residency = <300>;
> +			power-depth = <2>;
> +			cache-level-lost = <1>;
> +			cpus = <&CLUSTER0>;
> +		};
> +
> +		state@1 {
> +			reg = <1>;
> +			index = <2>;
> +			entry-method = "psci";
> +			psci-power-state = <0x1010000>;
> +			latency = <400>;
> +			min-residency = <500>;
> +			power-depth = <2>;
> +			cache-level-lost = <1>;
> +			cpus = <&CLUSTER1>;
> +		};
> +
> +		state@2 {
> +			reg = <2>;
> +			index = <3>;
> +			entry-method = "psci";
> +			psci-power-state = <0x3010000>;
> +			latency = <1000>;
> +			power-depth = <4>;
> +			cache-level-lost = <2>;
> +			cpus = <&CLUSTER0>;
> +			affinity = <&CLUSTER0>;
> +		};
> +
> +		state@3 {
> +			reg = <3>;
> +			index = <3>;
> +			entry-method = "psci";
> +			latency = <4500>;
> +			min-residency = <6500>;
> +			psci-power-state = <0x3010000>;
> +			power-depth = <4>;
> +			cache-level-lost = <2>;
> +			cpus = <&CLUSTER1>;
> +			affinity = <&CLUSTER1>;
> +		};
> +	};
> +
> +	CPU0: cpu@0 {
> +		device_type = "cpu";
> +		compatible = "arm,cortex-a57";
> +		reg = <0x0 0x0>;
> +		enable-method = "psci";
> +		next-cache-level = <&L1_0>;
> +		L1_0: l1-cache {
> +			compatible = "cache";
> +			cache-level = <1>;
> +			next-cache-level = <&L2_0>;
> +		};
> +		L2_0: l2-cache {
> +			compatible = "cache";
> +			cache-level = <2>;
> +		};
> +	};
> +
> +	CPU1: cpu@1 {
> +		device_type = "cpu";
> +		compatible = "arm,cortex-a57";
> +		reg = <0x0 0x1>;
> +		enable-method = "psci";
> +		next-cache-level = <&L1_1>;
> +		L1_1: l1-cache {
> +			compatible = "cache";
> +			cache-level = <1>;
> +			next-cache-level = <&L2_0>;
> +		};
> +	};
> +
> +	CPU2: cpu@100 {
> +		device_type = "cpu";
> +		compatible = "arm,cortex-a57";
> +		reg = <0x0 0x100>;
> +		enable-method = "psci";
> +		next-cache-level = <&L1_2>;
> +		L1_2: l1-cache {
> +			compatible = "cache";
> +			cache-level = <1>;
> +			next-cache-level = <&L2_0>;
> +		};
> +	};
> +
> +	CPU3: cpu@101 {
> +		device_type = "cpu";
> +		compatible = "arm,cortex-a57";
> +		reg = <0x0 0x101>;
> +		enable-method = "psci";
> +		next-cache-level = <&L1_3>;
> +		L1_3: l1-cache {
> +			compatible = "cache";
> +			cache-level = <1>;
> +			next-cache-level = <&L2_0>;
> +		};
> +	};
> +
> +	CPU4: cpu@10000 {
> +		device_type = "cpu";
> +		compatible = "arm,cortex-a57";
> +		reg = <0x0 0x10000>;
> +		enable-method = "psci";
> +		next-cache-level = <&L1_4>;
> +		L1_4: l1-cache {
> +			compatible = "cache";
> +			cache-level = <1>;
> +			next-cache-level = <&L2_0>;
> +		};
> +	};
> +
> +	CPU5: cpu@10001 {
> +		device_type = "cpu";
> +		compatible = "arm,cortex-a57";
> +		reg = <0x0 0x10001>;
> +		enable-method = "psci";
> +		next-cache-level = <&L1_5>;
> +		L1_5: l1-cache {
> +			compatible = "cache";
> +			cache-level = <1>;
> +			next-cache-level = <&L2_0>;
> +		};
> +	};
> +
> +	CPU6: cpu@10100 {
> +		device_type = "cpu";
> +		compatible = "arm,cortex-a57";
> +		reg = <0x0 0x10100>;
> +		enable-method = "psci";
> +		next-cache-level = <&L1_6>;
> +		L1_6: l1-cache {
> +			compatible = "cache";
> +			cache-level = <1>;
> +			next-cache-level = <&L2_0>;
> +		};
> +	};
> +
> +	CPU7: cpu@10101 {
> +		device_type = "cpu";
> +		compatible = "arm,cortex-a57";
> +		reg = <0x0 0x10101>;
> +		enable-method = "psci";
> +		next-cache-level = <&L1_7>;
> +		L1_7: l1-cache {
> +			compatible = "cache";
> +			cache-level = <1>;
> +			next-cache-level = <&L2_0>;
> +		};
> +	};
> +
> +	CPU8: cpu@100000000 {
> +		device_type = "cpu";
> +		compatible = "arm,cortex-a53";
> +		reg = <0x1 0x0>;
> +		enable-method = "psci";
> +		next-cache-level = <&L1_8>;
> +		L1_8: l1-cache {
> +			compatible = "cache";
> +			cache-level = <1>;
> +			next-cache-level = <&L2_1>;
> +		};
> +		L2_1: l2-cache {
> +			compatible = "cache";
> +			cache-level = <2>;
> +		};
> +	};
> +
> +	CPU9: cpu@100000001 {
> +		device_type = "cpu";
> +		compatible = "arm,cortex-a53";
> +		reg = <0x1 0x1>;
> +		enable-method = "psci";
> +		next-cache-level = <&L1_9>;
> +		L1_9: l1-cache {
> +			compatible = "cache";
> +			cache-level = <1>;
> +			next-cache-level = <&L2_1>;
> +		};
> +	};
> +
> +	CPU10: cpu@100000100 {
> +		device_type = "cpu";
> +		compatible = "arm,cortex-a53";
> +		reg = <0x1 0x100>;
> +		enable-method = "psci";
> +		next-cache-level = <&L1_10>;
> +		L1_10: l1-cache {
> +			compatible = "cache";
> +			cache-level = <1>;
> +			next-cache-level = <&L2_1>;
> +		};
> +	};
> +
> +	CPU11: cpu@100000101 {
> +		device_type = "cpu";
> +		compatible = "arm,cortex-a53";
> +		reg = <0x1 0x101>;
> +		enable-method = "psci";
> +		next-cache-level = <&L1_11>;
> +		L1_11: l1-cache {
> +			compatible = "cache";
> +			cache-level = <1>;
> +			next-cache-level = <&L2_1>;
> +		};
> +	};
> +
> +	CPU12: cpu@100010000 {
> +		device_type = "cpu";
> +		compatible = "arm,cortex-a53";
> +		reg = <0x1 0x10000>;
> +		enable-method = "psci";
> +		next-cache-level = <&L1_12>;
> +		L1_12: l1-cache {
> +			compatible = "cache";
> +			cache-level = <1>;
> +			next-cache-level = <&L2_1>;
> +		};
> +	};
> +
> +	CPU13: cpu@100010001 {
> +		device_type = "cpu";
> +		compatible = "arm,cortex-a53";
> +		reg = <0x1 0x10001>;
> +		enable-method = "psci";
> +		next-cache-level = <&L1_13>;
> +		L1_13: l1-cache {
> +			compatible = "cache";
> +			cache-level = <1>;
> +			next-cache-level = <&L2_1>;
> +		};
> +	};
> +
> +	CPU14: cpu@100010100 {
> +		device_type = "cpu";
> +		compatible = "arm,cortex-a53";
> +		reg = <0x1 0x10100>;
> +		enable-method = "psci";
> +		next-cache-level = <&L1_14>;
> +		L1_14: l1-cache {
> +			compatible = "cache";
> +			cache-level = <1>;
> +			next-cache-level = <&L2_1>;
> +		};
> +	};
> +
> +	CPU15: cpu@100010101 {
> +		device_type = "cpu";
> +		compatible = "arm,cortex-a53";
> +		reg = <0x1 0x10101>;
> +		enable-method = "psci";
> +		next-cache-level = <&L1_15>;
> +		L1_15: l1-cache {
> +			compatible = "cache";
> +			cache-level = <1>;
> +			next-cache-level = <&L2_1>;
> +		};
> +	};
> +};
> +
> +Example 2 (ARM 32-bit, 8-cpu system, two clusters):
> +
> +cpus {
> +	#size-cells = <0>;
> +	#address-cells = <1>;
> +
> +	cpu-map {
> +		CLUSTER0: cluster0 {
> +			core0 {
> +				thread0 {
> +					cpu = <&CPU0>;
> +				};
> +				thread1 {
> +					cpu = <&CPU1>;
> +				};
> +			};
> +
> +			core1 {
> +				thread0 {
> +					cpu = <&CPU2>;
> +				};
> +				thread1 {
> +					cpu = <&CPU3>;
> +				};
> +			};
> +		};
> +
> +		CLUSTER1: cluster1 {
> +			core0 {
> +				thread0 {
> +					cpu = <&CPU4>;
> +				};
> +				thread1 {
> +					cpu = <&CPU5>;
> +				};
> +			};
> +
> +			core1 {
> +				thread0 {
> +					cpu = <&CPU6>;
> +				};
> +				thread1 {
> +					cpu = <&CPU7>;
> +				};
> +			};
> +		};
> +	};
> +
> +	cpu-power-states {
> +		compatible = "arm,cpu-power-states";
> +		#size-cells = <0>;
> +		#address-cells = <1>;
> +
> +		state@0 {
> +			reg = <0>;
> +			index = <2>;
> +			entry-method = "psci";
> +			psci-power-state = <0x1010000>;
> +			latency = <400>;
> +			min-residency = <300>;
> +			power-depth = <2>;
> +			cpus = <&CLUSTER0>;
> +		};
> +
> +		state@1 {
> +			reg = <1>;
> +			index = <2>;
> +			entry-method = "psci";
> +			psci-power-state = <0x1010000>;
> +			latency = <400>;
> +			min-residency = <500>;
> +			power-depth = <2>;
> +			cpus = <&CLUSTER1>;
> +		};
> +
> +		state@2 {
> +			reg = <2>;
> +			index = <3>;
> +			entry-method = "psci";
> +			psci-power-state = <0x2010000>;
> +			latency = <3000>;
> +			min-residency = <3000>;
> +			cache-level-lost = <2>;
> +			power-depth = <3>;
> +			cpus = <&CLUSTER0>;
> +			affinity = <&CLUSTER0>;
> +		};
> +
> +		state@3 {
> +			reg = <3>;
> +			index = <3>;
> +			entry-method = "psci";
> +			psci-power-state = <0x2010000>;
> +			latency = <4000>;
> +			min-residency = <5000>;
> +			cache-level-lost = <2>;
> +			power-depth = <3>;
> +			cpus = <&CLUSTER1>;
> +			affinity = <&CLUSTER1>;
> +		};
> +	};
> +
> +	CPU0: cpu@0 {
> +		device_type = "cpu";
> +		compatible = "arm,cortex-a15";
> +		reg = <0x0>;
> +		next-cache-level = <&L1_0>;
> +		L1_0: l1-cache {
> +			compatible = "cache";
> +			cache-level = <1>;
> +			next-cache-level = <&L2_0>;
> +		};
> +		L2_0: l2-cache {
> +			compatible = "cache";
> +			cache-level = <2>;
> +		};
> +	};
> +
> +	CPU1: cpu@1 {
> +		device_type = "cpu";
> +		compatible = "arm,cortex-a15";
> +		reg = <0x1>;
> +		next-cache-level = <&L1_1>;
> +		L1_1: l1-cache {
> +			compatible = "cache";
> +			cache-level = <1>;
> +			next-cache-level = <&L2_0>;
> +		};
> +	};
> +
> +	CPU2: cpu@2 {
> +		device_type = "cpu";
> +		compatible = "arm,cortex-a15";
> +		reg = <0x2>;
> +		next-cache-level = <&L1_2>;
> +		L1_2: l1-cache {
> +			compatible = "cache";
> +			cache-level = <1>;
> +			next-cache-level = <&L2_0>;
> +		};
> +	};
> +
> +	CPU3: cpu@3 {
> +		device_type = "cpu";
> +		compatible = "arm,cortex-a15";
> +		reg = <0x3>;
> +		next-cache-level = <&L1_3>;
> +		L1_3: l1-cache {
> +			compatible = "cache";
> +			cache-level = <1>;
> +			next-cache-level = <&L2_0>;
> +		};
> +	};
> +
> +	CPU4: cpu@100 {
> +		device_type = "cpu";
> +		compatible = "arm,cortex-a7";
> +		reg = <0x100>;
> +		next-cache-level = <&L1_4>;
> +		L1_4: l1-cache {
> +			compatible = "cache";
> +			cache-level = <1>;
> +			next-cache-level = <&L2_1>;
> +		};
> +		L2_1: l2-cache {
> +			compatible = "cache";
> +			cache-level = <2>;
> +		};
> +	};
> +
> +	CPU5: cpu@101 {
> +		device_type = "cpu";
> +		compatible = "arm,cortex-a7";
> +		reg = <0x101>;
> +		next-cache-level = <&L1_5>;
> +		L1_5: l1-cache {
> +			compatible = "cache";
> +			cache-level = <1>;
> +			next-cache-level = <&L2_1>;
> +		};
> +	};
> +
> +	CPU6: cpu@102 {
> +		device_type = "cpu";
> +		compatible = "arm,cortex-a7";
> +		reg = <0x102>;
> +		next-cache-level = <&L1_6>;
> +		L1_6: l1-cache {
> +			compatible = "cache";
> +			cache-level = <1>;
> +			next-cache-level = <&L2_1>;
> +		};
> +	};
> +
> +	CPU7: cpu@103 {
> +		device_type = "cpu";
> +		compatible = "arm,cortex-a7";
> +		reg = <0x103>;
> +		next-cache-level = <&L1_7>;
> +		L1_7: l1-cache {
> +			compatible = "cache";
> +			cache-level = <1>;
> +			next-cache-level = <&L2_1>;
> +		};
> +	};
> +};
> +
> +===========================================
> +4 - References
> +===========================================
> +
> +[1] ACPI v5.0 specification
> +    http://www.acpi.info/spec50.htm
> +
> +[2] ARM Linux kernel documentation - topology bindings
> +    Documentation/devicetree/bindings/arm/topology.txt
> +
> +[3] ARM Linux kernel documentation - cache bindings
> +    Documentation/devicetree/bindings/arm/cache.txt
> +
> +[4] ePAPR standard
> +    https://www.power.org/documentation/epapr-version-1-1/
> -- 
> 1.8.4
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe devicetree" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
Lorenzo Pieralisi Dec. 3, 2013, 10:40 a.m. UTC | #2
On Mon, Dec 02, 2013 at 06:08:16PM +0000, Kumar Gala wrote:
> 
> On Dec 2, 2013, at 10:20 AM, Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> wrote:
> 
> > ARM based platforms implement a variety of power management schemes that
> > allow processors to enter at run-time low-power states, aka C-states
> > in ACPI jargon. The parameters defining these C-states vary on a per-platform
> > basis forcing the OS to hardcode the state parameters in platform
> > specific static tables whose size grows as the number of platforms supported
> > in the kernel increases and hampers device drivers standardization.
> >
> > Therefore, this patch aims at standardizing C-state device tree bindings for
> > ARM platforms. Bindings define C-state parameters inclusive of entry methods
> > and state latencies, to allow operating systems to retrieve the
> > configuration entries from the device tree and initialize the related
> > power management drivers, paving the way for common code in the kernel
> > to deal with power states and removing the need for static data in current
> > and previous kernel versions.
> 
> Where is this spec'd today in the kernel?

How can it be in the kernel given that these bindings have just been posted?
I started coding the layer managing the C-states in the kernel, even
though I would rather avoid writing it only to restart from scratch if these
bindings are scrapped. Bindings should not depend on kernel code, it
is the other way around, right?

[...]

> > +===========================================
> > +2 - state node
> > +===========================================
> > +
> > +A state node represents a C-state description and must be defined as follows:
> > +
> > +- state node
> > +
> > +     Description: must be child of the cpu-power-states node.
> > +
> > +     The state node name must be "state", with unit address provided by the
> > +     "reg" property following standard DT requirements[4].
> > +
> > +     A state node defines the following properties:
> > +
> > +     - reg
> > +             Usage: Required
> > +             Value type: <u32>
> > +             Definition: Standard device tree property [4] used for
> > +                         enumeration purposes.
> 
> I'm not sure what purpose reg is really serving here.

Enumeration, that's it.
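
To make the enumeration role concrete, here is a minimal sketch (node contents
trimmed, values illustrative) of how "reg" supplies the unit address that
distinguishes otherwise identically named "state" nodes:

```dts
cpu-power-states {
	compatible = "arm,cpu-power-states";
	#address-cells = <1>;	/* each child's reg is a single cell */
	#size-cells = <0>;	/* reg carries no size */

	state@0 {		/* unit address 0 matches reg = <0> */
		reg = <0>;
		/* remaining required properties omitted for brevity */
	};

	state@1 {		/* unit address 1 matches reg = <1> */
		reg = <1>;
		/* remaining required properties omitted for brevity */
	};
};
```

Without distinct unit addresses, two sibling nodes both named "state" would
collide.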

> > +
> > +     - index
> > +             Usage: Required
> > +             Value type: <u32>
> > +             Definition: It represents C-state index, starting from 2 (index
> > +                         0 represents the processor state "running" and
> > +                         index 1 represents processor mode "WFI"; indexes 0
> > +                         and 1 are standard ARM states that need not be
> > +                         described).
> 
> Any reason not to call it "c-state-index"?

Well, they are called "state" nodes, so I called it "index".

I really do not care, can change it if we think we should call states
c-states.

> > +
> > +     - entry-method
> > +             Value type: <stringlist>
> > +             Usage: Required
> > +             Definition: Describes the method by which a CPU enters the
> > +                         C-state. This property is required and must be one
> > +                         of:
> > +
> > +                         - "psci"
> > +                           ARM Standard firmware interface
> > +
> > +                         - "[vendor],[method]"
> > +                           An implementation dependent string with
> > +                           format "vendor,method", where vendor is a string
> > +                           denoting the name of the manufacturer and
> > +                           method is a string specifying the mechanism
> > +                           used to enter the C-state.
> > +
> > +     - psci-power-state
> > +             Usage: Required if entry-method property value is set to
> > +                    "psci".
> > +             Value type: <u32>
> > +             Definition: power_state parameter to pass to the PSCI
> > +                         suspend call to enter the C-state.
> > +
> > +     - latency
> > +             Usage: Required
> > +             Value type: <u32>
> > +             Definition: Worst case latency in microseconds required to
> > +                         enter and exit the C-state.
> > +
> > +     - min-residency
> > +             Usage: Required
> > +             Value type: <u32>
> > +             Definition: Time in microseconds required for the CPU to be in
> > +                         the C-state to make up for the dynamic power
> > +                         consumed to enter/exit the C-state in order to
> > +                         break even in terms of power consumption compared
> > +                         to C1 state (wfi).
> > +                         This parameter depends on the operating conditions
> > +                         (operating point, cache state) and must assume
> > +                         worst case scenario.
> > +
> > +     - cpus
> > +             Usage: Optional
> > +             Value type: <phandle>
> > +             Definition: If defined, the phandle points to a node in the
> > +                         cpu-map[2] representing all CPUs on which C-state
> > +                         is valid. If not present or system is UP, the
> > +                         C-state has to be considered valid for all CPUs in
> > +                         the system.
> > +
> > +     - affinity
> > +             Usage: Optional
> > +             Value type: <phandle>
> > +             Definition: If defined, phandle points to a node in the
> > +                         cpu-map[2] that represents all CPUs that are
> > +                         affected (ie share) by the C-state and have to
> > +                         be coordinated on C-state entry/exit. If not
> > +                         present or system is UP, the C-state is local to
> > +                         a CPU and need no coordination (ie it is a CPU
> > +                         state, that does not require coordination with
> > +                         other CPUs). If present, the affinity property
> > +                         must contain a phandle to a cpu-map node that
> > +                         represents a subset, possibly inclusive of the
> > +                         CPUs described through the cpus property.
> > +
> > +     - power-depth
> > +             Usage: Required
> > +             Value type: <u32>
> > +             Definition: Integer value, starting from 2 (value 0 meaning
> > +                         running and value 1 representing power depth of
> > +                         wfi (C1)), that defines the level of depth of a
> > +                         power state.
> > +                         The system denotes power states with different
> > +                         depths, an increasing value meaning less power
> > +                         consumption and might involve powering down more
> > +                         components.  Devices that are affected by
> > +                         C-states entry must define the maximum power
> > +                         depth supported in their respective device tree
> > +                         bindings so that OSPM can take decision on how
> > +                         to handle the device in question when the C-state
> > +                         is entered. All devices (per-CPU or external) with
> > +                         a power depth lower than the one defined in the
> > +                         C-state entry stop operating when the C-state
> > +                         is entered and action is required by OSPM to
> > +                         guarantee their logic and memory content is saved
> > +                         restored to guarantee proper functioning.
> 
> How is this different from the c-state index?

The idea, and I am not sure it is worthwhile, is to represent a value
that is unique in the system. There can be multiple C2 states, for
example (two clusters in two different power domains), with the same
index and different power depths.

Devices attached to power domains can check the power depth to detect
whether the CPU must be prevented from entering the C-state. Conversely,
the power depth makes it possible to tell whether a device's state must
be saved and restored.

I should have added an example but there is already lots of stuff to
discuss for bindings as they are IMHO.
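
For illustration only (this is not part of the posted patch), the
save/restore check described above could be sketched in C roughly as
follows. The struct and function names are hypothetical stand-ins, and
the comparison simply follows the wording of the power-depth definition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; these names exist neither in the kernel nor in the patch. */
struct cstate_desc {
	uint32_t index;		/* "index" property, 2 and above */
	uint32_t power_depth;	/* "power-depth" property, 2 and above */
};

struct dev_desc {
	uint32_t max_power_depth;	/* deepest depth at which the device keeps operating */
};

/*
 * Per the binding text, a device whose power depth is lower than the
 * C-state's power depth stops operating when the state is entered, so
 * its context has to be saved before entry and restored on exit.
 */
static bool dev_needs_save_restore(const struct dev_desc *dev,
				   const struct cstate_desc *state)
{
	assert(state->power_depth >= 2);	/* 0 = running, 1 = wfi (C1) */
	return dev->max_power_depth < state->power_depth;
}
```

With two C2 states that share index 2 but have power depths 2 and 3, a
device whose maximum supported depth is 2 needs save/restore only for
the second one, which is the case the index alone cannot express.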

Lorenzo

--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Daniel Lezcano Dec. 3, 2013, 11:52 a.m. UTC | #3
On 12/02/2013 05:20 PM, Lorenzo Pieralisi wrote:
> ARM based platforms implement a variety of power management schemes that
> allow processors to enter at run-time low-power states, aka C-states
> in ACPI jargon. The parameters defining these C-states vary on a per-platform
> basis forcing the OS to hardcode the state parameters in platform
> specific static tables whose size grows as the number of platforms supported
> in the kernel increases and hampers device drivers standardization.
>
> Therefore, this patch aims at standardizing C-state device tree bindings for
> ARM platforms. Bindings define C-state parameters inclusive of entry methods
> and state latencies, to allow operating systems to retrieve the
> configuration entries from the device tree and initialize the related
> power management drivers, paving the way for common code in the kernel
> to deal with power states and removing the need for static data in current
> and previous kernel versions.
>
> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>

Hi Lorenzo,

thanks for the detailed description.

Just a couple of typos below.

[ ... ]

> +- cpu-power-states node
> +
> +	Usage: Optional - On ARM systems, is a container of processor C-state
> +			  nodes. If the system does not provide CPU power
> +			  management capabilities or the processor just
> +			  supports WFI (C1 state) a cpu-power-states node is
> +			  not required.
> +
> +	Description: cpu-power-states node is a container node, where its
> +		     subnodes describe the CPU low-power C-states.
> +
> +	Node name must be "cpu-power-states".
> +
> +	The cpu-power-states node's parent node must be cpus node.
> +
> +	The cpu-power-states node's child nodes can be:
> +
> +	- one or more state nodes
> +
> +	The cpu-power-states node must contain the following properties:
> +
> +	- compatible
> +		Value type: <stringlist>
> +		Usage: Required

Swap the two fields above (Usage first, then Value type) to be
consistent with the definitions below.

[ ... ]

> +	- power-depth
> +		Usage: Required
> +		Value type: <u32>
> +		Definition: Integer value, starting from 2 (value 0 meaning
> +			    running and value 1 representing power depth of
> +			    wfi (C1)), that defines the level of depth of a
> +			    power state.
> +			    The system denotes power states with different
> +			    depths, an increasing value meaning less power
> +			    consumption and might involve powering down more
> +			    components.  Devices that are affected by

                                     ^^^ extra space

[ ... ]

Thanks
   -- Daniel
Dave Martin Dec. 4, 2013, 3:20 p.m. UTC | #4
On Mon, Dec 02, 2013 at 04:20:05PM +0000, Lorenzo Pieralisi wrote:
> ARM based platforms implement a variety of power management schemes that
> allow processors to enter at run-time low-power states, aka C-states
> in ACPI jargon. The parameters defining these C-states vary on a per-platform
> basis forcing the OS to hardcode the state parameters in platform
> specific static tables whose size grows as the number of platforms supported
> in the kernel increases and hampers device drivers standardization.
> 
> Therefore, this patch aims at standardizing C-state device tree bindings for
> ARM platforms. Bindings define C-state parameters inclusive of entry methods
> and state latencies, to allow operating systems to retrieve the
> configuration entries from the device tree and initialize the related
> power management drivers, paving the way for common code in the kernel
> to deal with power states and removing the need for static data in current
> and previous kernel versions.
> 
> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
> +

[...]

> +	- cpus

Because cpus is really a topology subtree, it might be good to have
different names.

To avoid confusion with Mark's affinity properties, maybe a different
word would be preferable instead of "affinity".

Maybe "topology" instead of "cpus", and "affected-topology" instead of
"affinity"?


cpufreq also has its concepts of "related" and "affected" cpus, which
try to describe something similar (in all honesty, I always struggle
to remember which is which ... but if we were consistent with it, that
might help).

> +		Usage: Optional
> +		Value type: <phandle>
> +		Definition: If defined, the phandle points to a node in the
> +			    cpu-map[2] representing all CPUs on which C-state
> +			    is valid. If not present or system is UP, the
> +			    C-state has to be considered valid for all CPUs in
> +			    the system.
> +
> +	- affinity
> +		Usage: Optional
> +		Value type: <phandle>
> +		Definition: If defined, phandle points to a node in the
> +			    cpu-map[2] that represents all CPUs that are
> +			    affected (ie share) by the C-state and have to
> +			    be coordinated on C-state entry/exit. If not
> +			    present or system is UP, the C-state is local to
> +			    a CPU and need no coordination (ie it is a CPU
> +			    state, that does not require coordination with
> +			    other CPUs). If present, the affinity property
> +			    must contain a phandle to a cpu-map node that
> +			    represents a subset, possibly inclusive of the
> +			    CPUs described through the cpus property.

Can you elaborate on how cpus and affinity might be different?

The statement about "having to be coordinated" also feels a bit vague,
though I'm not sure how much we can usefully say here.

If we describe power domains more explicitly it might help with this,
because that could bring some description of what needs to be
coordinated.

> +
> +	- power-depth
> +		Usage: Required
> +		Value type: <u32>
> +		Definition: Integer value, starting from 2 (value 0 meaning
> +			    running and value 1 representing power depth of
> +			    wfi (C1)), that defines the level of depth of a
> +			    power state.
> +			    The system denotes power states with different
> +			    depths, an increasing value meaning less power
> +			    consumption and might involve powering down more
> +			    components.  Devices that are affected by
> +			    C-states entry must define the maximum power
> +			    depth supported in their respective device tree
> +			    bindings so that OSPM can take decision on how
> +			    to handle the device in question when the C-state
> +			    is entered. All devices (per-CPU or external) with
> +			    a power depth lower than the one defined in the
> +			    C-state entry stop operating when the C-state
> +			    is entered and action is required by OSPM to
> +			    guarantee their logic and memory content is saved
> +			    restored to guarantee proper functioning.

Any reason to use numbers instead of strings?

Strings make the DT more readable ... we would presumably only have to
parse this information once, so it shouldn't be an overhead, unless there
are hundreds of C-state nodes.

[...]

Cheers
---Dave
Kumar Gala Dec. 4, 2013, 3:36 p.m. UTC | #5
On Dec 3, 2013, at 4:40 AM, Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> wrote:

> On Mon, Dec 02, 2013 at 06:08:16PM +0000, Kumar Gala wrote:
>> 
>> On Dec 2, 2013, at 10:20 AM, Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> wrote:
>> 
>>> ARM based platforms implement a variety of power management schemes that
>>> allow processors to enter at run-time low-power states, aka C-states
>>> in ACPI jargon. The parameters defining these C-states vary on a per-platform
>>> basis forcing the OS to hardcode the state parameters in platform
>>> specific static tables whose size grows as the number of platforms supported
>>> in the kernel increases and hampers device drivers standardization.
>>> 
>>> Therefore, this patch aims at standardizing C-state device tree bindings for
>>> ARM platforms. Bindings define C-state parameters inclusive of entry methods
>>> and state latencies, to allow operating systems to retrieve the
>>> configuration entries from the device tree and initialize the related
>>> power management drivers, paving the way for common code in the kernel
>>> to deal with power states and removing the need for static data in current
>>> and previous kernel versions.
>> 
>> Where is this spec'd today in the kernel?
> 
> How can it be in the kernel given that these bindings have just been posted ?
> I started coding the layer managing the C-states in the kernel, even
> though I would avoid writing it and then restart from scratch if these
> bindings are scrapped. Bindings should not depend on kernel code, it
> is the other way around, right ?

I was guessing that there is existing code in the kernel that uses some platform data structures.  I was wondering what that code looked like today.

> 
> [...]
> 
>>> +===========================================
>>> +2 - state node
>>> +===========================================
>>> +
>>> +A state node represents a C-state description and must be defined as follows:
>>> +
>>> +- state node
>>> +
>>> +     Description: must be child of the cpu-power-states node.
>>> +
>>> +     The state node name must be "state", with unit address provided by the
>>> +     "reg" property following standard DT requirements[4].
>>> +
>>> +     A state node defines the following properties:
>>> +
>>> +     - reg
>>> +             Usage: Required
>>> +             Value type: <u32>
>>> +             Definition: Standard device tree property [4] used for
>>> +                         enumeration purposes.
>> 
>> I'm not sure what purpose reg is really serving here.
> 
> Enumeration, that's it.
> 
>>> +
>>> +     - index
>>> +             Usage: Required
>>> +             Value type: <u32>
>>> +             Definition: It represents C-state index, starting from 2 (index
>>> +                         0 represents the processor state "running" and
>>> +                         index 1 represents processor mode "WFI"; indexes 0
>>> +                         and 1 are standard ARM states that need not be
>>> +                         described).
>> 
>> any reason not to call it "c-state-index"?
> 
> Well, they are called "state" nodes, so I called it "index".
> 
> I really do not care, can change it if we think we should call states
> c-states.
> 
>>> +
>>> +     - entry-method
>>> +             Value type: <stringlist>
>>> +             Usage: Required
>>> +             Definition: Describes the method by which a CPU enters the
>>> +                         C-state. This property is required and must be one
>>> +                         of:
>>> +
>>> +                         - "psci"
>>> +                           ARM Standard firmware interface
>>> +
>>> +                         - "[vendor],[method]"
>>> +                           An implementation dependent string with
>>> +                           format "vendor,method", where vendor is a string
>>> +                           denoting the name of the manufacturer and
>>> +                           method is a string specifying the mechanism
>>> +                           used to enter the C-state.
>>> +
>>> +     - psci-power-state
>>> +             Usage: Required if entry-method property value is set to
>>> +                    "psci".
>>> +             Value type: <u32>
>>> +             Definition: power_state parameter to pass to the PSCI
>>> +                         suspend call to enter the C-state.
>>> +
>>> +     - latency
>>> +             Usage: Required
>>> +             Value type: <u32>
>>> +             Definition: Worst case latency in microseconds required to
>>> +                         enter and exit the C-state.
>>> +
>>> +     - min-residency
>>> +             Usage: Required
>>> +             Value type: <u32>
>>> +             Definition: Time in microseconds required for the CPU to be in
>>> +                         the C-state to make up for the dynamic power
>>> +                         consumed to enter/exit the C-state in order to
>>> +                         break even in terms of power consumption compared
>>> +                         to C1 state (wfi).
>>> +                         This parameter depends on the operating conditions
>>> +                         (operating point, cache state) and must assume
>>> +                         worst case scenario.
>>> +
>>> +     - cpus
>>> +             Usage: Optional
>>> +             Value type: <phandle>
>>> +             Definition: If defined, the phandle points to a node in the
>>> +                         cpu-map[2] representing all CPUs on which C-state
>>> +                         is valid. If not present or system is UP, the
>>> +                         C-state has to be considered valid for all CPUs in
>>> +                         the system.
>>> +
>>> +     - affinity
>>> +             Usage: Optional
>>> +             Value type: <phandle>
>>> +             Definition: If defined, phandle points to a node in the
>>> +                         cpu-map[2] that represents all CPUs that are
>>> +                         affected (ie share) by the C-state and have to
>>> +                         be coordinated on C-state entry/exit. If not
>>> +                         present or system is UP, the C-state is local to
>>> +                         a CPU and need no coordination (ie it is a CPU
>>> +                         state, that does not require coordination with
>>> +                         other CPUs). If present, the affinity property
>>> +                         must contain a phandle to a cpu-map node that
>>> +                         represents a subset, possibly inclusive of the
>>> +                         CPUs described through the cpus property.
>>> +
>>> +     - power-depth
>>> +             Usage: Required
>>> +             Value type: <u32>
>>> +             Definition: Integer value, starting from 2 (value 0 meaning
>>> +                         running and value 1 representing power depth of
>>> +                         wfi (C1)), that defines the level of depth of a
>>> +                         power state.
>>> +                         The system denotes power states with different
>>> +                         depths, an increasing value meaning less power
>>> +                         consumption and might involve powering down more
>>> +                         components.  Devices that are affected by
>>> +                         C-states entry must define the maximum power
>>> +                         depth supported in their respective device tree
>>> +                         bindings so that OSPM can take decision on how
>>> +                         to handle the device in question when the C-state
>>> +                         is entered. All devices (per-CPU or external) with
>>> +                         a power depth lower than the one defined in the
>>> +                         C-state entry stop operating when the C-state
>>> +                         is entered and action is required by OSPM to
>>> +                         guarantee their logic and memory content is saved
>>> +                         restored to guarantee proper functioning.
>> 
>> How is this different from the c-state index?
> 
> The idea, not sure it is worthwhile, is to represent a unique value in
> the system. There can be multiple eg C2 states (two clusters in two
> different power domains) with same index and different power depths.
> 
> Devices attached to power domains can check the power depth to detect if
> the CPU must be prevented from entering the C-state or not, and on the
> other hand, power depth allows to understand if a device state must be
> saved and restored.
> 
> I should have added an example but there is already lots of stuff to
> discuss for bindings as they are IMHO.
> 
> Lorenzo
> 
> 
Lorenzo Pieralisi Dec. 4, 2013, 4:31 p.m. UTC | #6
On Wed, Dec 04, 2013 at 03:36:08PM +0000, Kumar Gala wrote:
> 
> On Dec 3, 2013, at 4:40 AM, Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> wrote:
> 
> > On Mon, Dec 02, 2013 at 06:08:16PM +0000, Kumar Gala wrote:
> >> 
> >> On Dec 2, 2013, at 10:20 AM, Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> wrote:
> >> 
> >>> ARM based platforms implement a variety of power management schemes that
> >>> allow processors to enter at run-time low-power states, aka C-states
> >>> in ACPI jargon. The parameters defining these C-states vary on a per-platform
> >>> basis forcing the OS to hardcode the state parameters in platform
> >>> specific static tables whose size grows as the number of platforms supported
> >>> in the kernel increases and hampers device drivers standardization.
> >>> 
> >>> Therefore, this patch aims at standardizing C-state device tree bindings for
> >>> ARM platforms. Bindings define C-state parameters inclusive of entry methods
> >>> and state latencies, to allow operating systems to retrieve the
> >>> configuration entries from the device tree and initialize the related
> >>> power management drivers, paving the way for common code in the kernel
> >>> to deal with power states and removing the need for static data in current
> >>> and previous kernel versions.
> >> 
> >> Where is this spec'd today in the kernel?
> > 
> > How can it be in the kernel given that these bindings have just been posted ?
> > I started coding the layer managing the C-states in the kernel, even
> > though I would avoid writing it and then restart from scratch if these
> > bindings are scrapped. Bindings should not depend on kernel code, it
> > is the other way around, right ?
> 
> I was guessing that there is existing code in the kernel that uses some platform data structures.  I was wondering what that code looked like today.

The C-state tables (struct cpuidle_driver.states) in the drivers under
drivers/cpuidle are examples of static data that would disappear. But
there is more to it. Most of the information added by these bindings is
implicit in the kernel today (cache levels to flush, peripheral state to
save/restore). That works today (though it is not optimized in some
cases), but it will not work tomorrow, given that the complexity of
systems is on the rise.
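
As a rough sketch of the difference (standalone C, not kernel code):
the first table is the kind of hardcoded per-platform data meant above,
with made-up values, while the helper fills the same kind of table from
values that would come from the "latency" and "min-residency"
properties. All types and names here are hypothetical and are
deliberately not the kernel's struct cpuidle_state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for one idle state entry (NOT struct cpuidle_state). */
struct idle_state {
	uint32_t latency_us;		/* worst case enter + exit latency */
	uint32_t min_residency_us;	/* break-even time compared to wfi */
};

/* Today: parameters hardcoded in a platform-specific static table (made-up values). */
static const struct idle_state board_states[] = {
	{ .latency_us = 300, .min_residency_us = 1000 },
};

/* Hypothetical parsed form of one DT state node. */
struct dt_state {
	uint32_t latency;	/* "latency" property, microseconds */
	uint32_t min_residency;	/* "min-residency" property, microseconds */
};

/* Fill a state table from DT-provided values; returns the number of states copied. */
static size_t states_from_dt(const struct dt_state *dt, size_t n,
			     struct idle_state *out, size_t max)
{
	size_t i;

	assert(dt != NULL && out != NULL);
	for (i = 0; i < n && i < max; i++) {
		out[i].latency_us = dt[i].latency;
		out[i].min_residency_us = dt[i].min_residency;
	}
	return i;
}
```

The point of the sketch is only that the second form needs no
per-platform table in the kernel sources; how the properties are
actually read from the device tree is left out.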

Lorenzo

Lorenzo Pieralisi Dec. 4, 2013, 5:06 p.m. UTC | #7
On Wed, Dec 04, 2013 at 03:20:10PM +0000, Dave Martin wrote:
> On Mon, Dec 02, 2013 at 04:20:05PM +0000, Lorenzo Pieralisi wrote:
> > ARM based platforms implement a variety of power management schemes that
> > allow processors to enter at run-time low-power states, aka C-states
> > in ACPI jargon. The parameters defining these C-states vary on a per-platform
> > basis forcing the OS to hardcode the state parameters in platform
> > specific static tables whose size grows as the number of platforms supported
> > in the kernel increases and hampers device drivers standardization.
> > 
> > Therefore, this patch aims at standardizing C-state device tree bindings for
> > ARM platforms. Bindings define C-state parameters inclusive of entry methods
> > and state latencies, to allow operating systems to retrieve the
> > configuration entries from the device tree and initialize the related
> > power management drivers, paving the way for common code in the kernel
> > to deal with power states and removing the need for static data in current
> > and previous kernel versions.
> > 
> > Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
> > +
> 
> [...]
> 
> > +	- cpus
> 
> Because cpus is really a topology subtree, it might be good to have
> different names.
> 
> To avoid confusion with Mark's affinity properties, maybe a different
> word would be preferable instead of "affinity".
> 
> Maybe "topology" instead of "cpus", and "affected-topology" instead of
> "affinity"?
> 
> 
> cpufreq also has its concepts of "related" and "affected" cpus, which
> tries to describe something similar (in all honesty, I always struggle
> to remember which is which ... but if we were consistent with it, that
> might help).

Yes, I will try to come up with something clearer.

> > +		Usage: Optional
> > +		Value type: <phandle>
> > +		Definition: If defined, the phandle points to a node in the
> > +			    cpu-map[2] representing all CPUs on which C-state
> > +			    is valid. If not present or system is UP, the
> > +			    C-state has to be considered valid for all CPUs in
> > +			    the system.
> > +
> > +	- affinity
> > +		Usage: Optional
> > +		Value type: <phandle>
> > +		Definition: If defined, phandle points to a node in the
> > +			    cpu-map[2] that represents all CPUs that are
> > +			    affected (ie share) by the C-state and have to
> > +			    be coordinated on C-state entry/exit. If not
> > +			    present or system is UP, the C-state is local to
> > +			    a CPU and need no coordination (ie it is a CPU
> > +			    state, that does not require coordination with
> > +			    other CPUs). If present, the affinity property
> > +			    must contain a phandle to a cpu-map node that
> > +			    represents a subset, possibly inclusive of the
> > +			    CPUs described through the cpus property.
> 
> Can you elaborate on how cpus and affinity might be different?

I was referring to:

- cpus -> processor type (eg valid on A15 or A7, or different implementations
  of the same processor in the same chip)
- affinity -> power domain (a subset of the cpus that require coordination)

Nowadays the distinction does not make much sense: I can hardly see a
power state valid on eg A15 clusters [cpus] where just a subset of their
cpus needs to be coordinated [affinity]. It might happen if other levels
of cache are added, or if you have multiple clusters of the same CPU
type with different power state capabilities.

I think this deserves more attention, and adding power domain
information can probably remove this mumbo jumbo; there is a scary level
of duplicated information in there.
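
To make the cpus/affinity split a bit more concrete, here is a minimal
sketch in standalone C, under assumptions of mine: the cpu-map phandles
are modelled as already-resolved bitmasks (one bit per CPU), and
"coordinated" is read as "every CPU in the affinity mask must be idle
before the state is entered", which is only one possible reading. All
names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: one bit per CPU, up to 32 CPUs. */
struct cstate_scope {
	uint32_t cpus;		/* CPUs on which the state is valid ("cpus") */
	uint32_t affinity;	/* CPUs to coordinate ("affinity"); 0 = CPU-local state */
};

/* The binding requires affinity to describe a subset of cpus (possibly all of them). */
static bool scope_is_consistent(const struct cstate_scope *s)
{
	return (s->affinity & ~s->cpus) == 0;
}

/*
 * Assumed coordination rule: a shared state may be entered only once all
 * CPUs in the affinity mask are idle. idle_mask is expected to already
 * include the requesting CPU.
 */
static bool can_enter(const struct cstate_scope *s, unsigned int cpu,
		      uint32_t idle_mask)
{
	uint32_t self;

	assert(cpu < 32);
	self = UINT32_C(1) << cpu;

	if (!(s->cpus & self))
		return false;	/* state is not valid on this CPU */
	if (s->affinity == 0)
		return true;	/* CPU-local state, nothing to coordinate */
	return (idle_mask & s->affinity) == s->affinity;
}
```

If power domains were described explicitly, the affinity mask would
presumably fall out of the domain membership instead of being a separate
property.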

> The statement about "having to be coordinated" also feels a bit vague,
> though I'm not sure how much we can usefully say here.
> 
> If we describe power domains more explicitly it might help with this,
> because that could bring some description of what needs to be
> coordinated.

Yes, see above.

> > +	- power-depth
> > +		Usage: Required
> > +		Value type: <u32>
> > +		Definition: Integer value, starting from 2 (value 0 meaning
> > +			    running and value 1 representing power depth of
> > +			    wfi (C1)), that defines the level of depth of a
> > +			    power state.
> > +			    The system denotes power states with different
> > +			    depths, an increasing value meaning less power
> > +			    consumption and might involve powering down more
> > +			    components.  Devices that are affected by
> > +			    C-states entry must define the maximum power
> > +			    depth supported in their respective device tree
> > +			    bindings so that OSPM can take decision on how
> > +			    to handle the device in question when the C-state
> > +			    is entered. All devices (per-CPU or external) with
> > +			    a power depth lower than the one defined in the
> > +			    C-state entry stop operating when the C-state
> > +			    is entered and action is required by OSPM to
> > +			    guarantee their logic and memory content is saved
> > +			    restored to guarantee proper functioning.
> 
> Any reason to use numbers instead of strings?
> 
> Strings make the DT more readable ... we would presumably only have to
> parse this information once, so it shouldn't be an overhead, unless there
> are hundreds of C-state nodes.

Yes, but it is supposed to be a unique identifier in the entire system.
Ok, we can create a list of strings denoting power depths; as long as
they are "standard" that is fine by me. I still think a number would be
easier to use, although honestly I think it is better to use power
domains and get rid of this property altogether.

Lorenzo

Vincent Guittot Dec. 6, 2013, 2:54 p.m. UTC | #8
On 4 December 2013 18:06, Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> wrote:
> On Wed, Dec 04, 2013 at 03:20:10PM +0000, Dave Martin wrote:
>> On Mon, Dec 02, 2013 at 04:20:05PM +0000, Lorenzo Pieralisi wrote:
>> > ARM based platforms implement a variety of power management schemes that
>> > allow processors to enter at run-time low-power states, aka C-states
>> > in ACPI jargon. The parameters defining these C-states vary on a per-platform
>> > basis forcing the OS to hardcode the state parameters in platform
>> > specific static tables whose size grows as the number of platforms supported
>> > in the kernel increases and hampers device drivers standardization.
>> >
>> > Therefore, this patch aims at standardizing C-state device tree bindings for
>> > ARM platforms. Bindings define C-state parameters inclusive of entry methods
>> > and state latencies, to allow operating systems to retrieve the
>> > configuration entries from the device tree and initialize the related
>> > power management drivers, paving the way for common code in the kernel
>> > to deal with power states and removing the need for static data in current
>> > and previous kernel versions.
>> >
>> > Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
>> > +
>>
>> [...]
>>
>> > +   - cpus
>>
>> Because cpus is really a topology subtree, it might be good to have
>> different names.
>>
>> To avoid confusion with Mark's affinity properties, maybe a different
>> word would be preferable instead of "affinity".
>>
>> Maybe "topology" instead of "cpus", and "affected-topology" instead of
>> "affinity"?
>>
>>
>> cpufreq also has its concepts of "related" and "affected" cpus, which
>> tries to describe something similar (in all honesty, I always struggle
>> to remember which is which ... but if we were consistent with it, that
>> might help).
>
> Yes, I will try to come up with something clearer.
>
>> > +           Usage: Optional
>> > +           Value type: <phandle>
>> > +           Definition: If defined, the phandle points to a node in the
>> > +                       cpu-map[2] representing all CPUs on which C-state
>> > +                       is valid. If not present or system is UP, the
>> > +                       C-state has to be considered valid for all CPUs in
>> > +                       the system.
>> > +
>> > +   - affinity
>> > +           Usage: Optional
>> > +           Value type: <phandle>
>> > +           Definition: If defined, phandle points to a node in the
>> > +                       cpu-map[2] that represents all CPUs that are
>> > +                       affected (ie share) by the C-state and have to
>> > +                       be coordinated on C-state entry/exit. If not
>> > +                       present or system is UP, the C-state is local to
>> > +                       a CPU and need no coordination (ie it is a CPU
>> > +                       state, that does not require coordination with
>> > +                       other CPUs). If present, the affinity property
>> > +                       must contain a phandle to a cpu-map node that
>> > +                       represents a subset, possibly inclusive of the
>> > +                       CPUs described through the cpus property.
>>
>> Can you elaborate on how cpus and affinity might be different?
>
> I was referring to:
>
> - cpus -> processor type (eg valid on A15 or A7, or different implementations
>   of the same processor in the same chip)
> - affinity -> power domain (a subset of the cpus that require coordination)
>
> Nowadays the distinction does not make much sense (I hardly see a power
> state valid on eg A15 clusters [cpus], where just a subset of its cpus need to
> be coordinated [affinity] - might be if other levels of caches are added or
> if you have multiple clusters of the same CPU type with different power
> states capabilities).

Hi Lorenzo

It is not only linked to the cache. You can be sure that HW designers
will group a subset of the cores of a cluster under the same power gate.
That should probably be described through power domain information, as
you say below.

>
> I think this deserves more attention, and probably adding power domain
> information can remove this mumbo jumbo, there is a scary level of
> duplicated information in there.
>
>> The statement about "having to be coordinated" also feels a bit vague,
>> though I'm not sure how much we can usefully say here.
>>
>> If we describe power domains more explicitly it might help with this,
>> because that could bring some description of what needs to be
>> coordinated.
>
> Yes, see above.
>
>> > +   - power-depth
>> > +           Usage: Required
>> > +           Value type: <u32>
>> > +           Definition: Integer value, starting from 2 (value 0 meaning
>> > +                       running and value 1 representing power depth of
>> > +                       wfi (C1)), that defines the level of depth of a
>> > +                       power state.
>> > +                       The system denotes power states with different
>> > +                       depths, an increasing value meaning less power
>> > +                       consumption and might involve powering down more
>> > +                       components.  Devices that are affected by
>> > +                       C-states entry must define the maximum power
>> > +                       depth supported in their respective device tree
>> > +                       bindings so that OSPM can take decision on how
>> > +                       to handle the device in question when the C-state
>> > +                       is entered. All devices (per-CPU or external) with
>> > +                       a power depth lower than the one defined in the
>> > +                       C-state entry stop operating when the C-state
>> > +                       is entered and action is required by OSPM to
>> > +                       guarantee their logic and memory content is saved
>> > +                       and restored to guarantee proper functioning.
>>
>> Any reason to use numbers instead of strings?
>>
>> Strings make the DT more readable ... we would presumably only have to
>> parse this information once, so it shouldn't be an overhead, unless there
>> are hundreds of C-state nodes.
>
> Yes, but it is supposed to be a unique identifier in the entire system.
> Ok, we can create a list of strings denoting power depths, as long as
> they are "standard" fine by me, but I think that a number would be
> easier to use, even though honestly I think it is better to use power
> domains and get rid of this property altogether.
>
> Lorenzo
>
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Antti P Miettinen Dec. 10, 2013, 6:31 a.m. UTC | #9
Hi Lorenzo,

Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> writes:
> +	- latency
> +		Usage: Required
> +		Value type: <u32>
> +		Definition: Worst case latency in microseconds required to
> +			    enter and exit the C-state.
> +
> +	- min-residency
> +		Usage: Required
> +		Value type: <u32>
> +		Definition: Time in microseconds required for the CPU to be in
> +			    the C-state to make up for the dynamic power
> +			    consumed to enter/exit the C-state in order to
> +			    break even in terms of power consumption compared
> +			    to C1 state (wfi).
> +			    This parameter depends on the operating conditions
> +			    (operating point, cache state) and must assume
> +			    worst case scenario.

I have a concern with these. I know it is not the fault of this patch as
these parameters are what current cpuidle governor/driver interface
uses, but..

Power state entry/exit latencies can vary quite a lot. Especially CPU
and memory frequencies affect them as can e.g. PMIC properties. Also
power level during entry/exit depends on clocks and voltages. Also the
power level of a sleep state can be context dependent (clocks and
voltages). This means that the minimum residency for energy break
even also varies. Defining a minimum residency against C1 is a bit
arbitrary. There is no guarantee that the break even order of idle
states remains constant over device context changes.
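
To make the dependency concrete, here is a minimal sketch of the break-even arithmetic. The function name, its parameters and the numbers in the note below are hypothetical illustrations, not measurements and not properties of the proposed binding.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Break-even residency of a deep idle state measured against a shallower
 * one (for instance wfi). All three inputs are hypothetical platform
 * measurements; none of them is a property of the proposed binding.
 *
 *   e_trans_uj   extra energy spent entering and exiting the deep state (uJ)
 *   p_shallow_mw power drawn while sitting in the shallow state (mW)
 *   p_deep_mw    power drawn while sitting in the deep state (mW)
 *
 * The deep state pays off once (p_shallow - p_deep) * t >= e_trans.
 * Since 1 mW * 1 us = 1 nJ, t_us = e_trans_uj * 1000 / (p_shallow - p_deep).
 */
static uint64_t break_even_us(uint64_t e_trans_uj, uint32_t p_shallow_mw,
			      uint32_t p_deep_mw)
{
	uint64_t delta_mw;

	/* A deep state that saves no power never breaks even. */
	assert(p_shallow_mw > p_deep_mw);
	delta_mw = (uint64_t)p_shallow_mw - p_deep_mw;

	/* Round up so the result is never optimistic. */
	return (e_trans_uj * 1000 + delta_mw - 1) / delta_mw;
}
```

With the same assumed 150 uJ transition cost, break_even_us(150, 100, 50) yields 3000 us, whereas break_even_us(150, 40, 25) yields 10000 us: a lower operating point shrinks the power gap and stretches the break-even time, which is why a single worst-case min-residency is a coarse description.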

I have not really properly thought through this but here's an idea.. how
about an alternative interface between governor and driver? The cpuidle
core would provide the expected wakeup time and currently enforced
minimum latency to the driver and the driver would make the decision
about the state to choose.

	--Antti
Lorenzo Pieralisi Dec. 10, 2013, 1:27 p.m. UTC | #10
On Tue, Dec 10, 2013 at 06:31:56AM +0000, Antti Miettinen wrote:
> Hi Lorenzo,
> 
> Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> writes:
> > +	- latency
> > +		Usage: Required
> > +		Value type: <u32>
> > +		Definition: Worst case latency in microseconds required to
> > +			    enter and exit the C-state.
> > +
> > +	- min-residency
> > +		Usage: Required
> > +		Value type: <u32>
> > +		Definition: Time in microseconds required for the CPU to be in
> > +			    the C-state to make up for the dynamic power
> > +			    consumed to enter/exit the C-state in order to
> > +			    break even in terms of power consumption compared
> > +			    to C1 state (wfi).
> > +			    This parameter depends on the operating conditions
> > +			    (operating point, cache state) and must assume
> > +			    worst case scenario.
> 
> I have a concern with these. I know it is not the fault of this patch as
> these parameters are what current cpuidle governor/driver interface
> uses, but..

Concern is shared, that's why these bindings hit the list as early as
possible. Just to mention that, I wanted to keep them as OS agnostic
as I could, and I think min-residency might make sense (I mentioned
it has to cater for the worst case, which depends on a number of
run-time states as you describe below).

> Power state entry/exit latencies can vary quite a lot. Especially CPU
> and memory frequencies affect them as can e.g. PMIC properties. Also
> power level during entry/exit depends on clocks and voltages. Also the
> power level of a sleep state can be context dependent (clocks and
> voltages). This means that the minimum residency for energy break
> even also varies. Defining a minimum residency against C1 is a bit
> arbitrary. There is no guarantee that the break even order of idle
> states remains constant over device context changes.

Agree 100%.

> I have not really properly thought through this but here's an idea.. how
> about an alternative interface between governor and driver? The cpuidle
> core would provide the expected wakeup time and currently enforced
> minimum latency to the driver and the driver would make the decision
> about the state to choose.

I do not think we should think about how the kernel uses this data.
We should strive to make DT data representative of HW C-states and
that's very complex, as you mentioned (it depends at what granularity
we want these bits of info).

When we are happy with the bindings we can then code the kernel accordingly.

Please let me know how you would like to have these bindings extended
(eg adding operating points), getting feedback is the main reason why
I posted them in the first place.

Thank you,
Lorenzo

Antti P Miettinen Dec. 10, 2013, 10:04 p.m. UTC | #11
Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> writes:
> I do not think we should think about how the kernel uses this data.
> We should strive to make DT data representative of HW C-states and
> that's very complex, as you mentioned (it depends at what granularity
> we want these bits of info).
>
> When we are happy with the bindings we can then code the kernel accordingly.
>
> Please let me know how you would like to have these bindings extended
> (eg adding operating points), getting feedback is the main reason why
> I posted them in the first place.

Hmm.. I'd like to challenge that a bit. I guess we are not defining DT
bindings just for the joy of modelling the hardware? We should care
whether kernel needs the data and have some idea of how the data will be
used.

As you say, modelling C state details is not trivial. It might be
possible to construct an approximate formula for e.g. entry/exit latency
that takes CPU frequency, memory frequency and PMIC ramp rates as
input. Also, in principle we could estimate power based on clocks,
voltages, temperature etc. As we probably do not want to put function
definitions to DT, the DT would contain e.g. coefficients for functions
that would need to be platform neutral.

Is this what you'd like to see? There has been some research in
estimating power without actually measuring it, e.g. the google
powertutor people have written some papers about this. The latencies
could be measured to some extent with instrumentation in the kernel and
the measurement results could be used to tune some parameters.

Or would you rather have tables, which specify latencies and power
levels and the tables would be indexed with frequencies and voltages?
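
A minimal sketch of what the table option could look like on the consumer side. The struct, the field names and the lookup policy are all assumptions made for illustration; nothing here is defined by the proposed binding or by an existing kernel interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One row of a hypothetical per-state table, keyed by CPU frequency. */
struct state_opp_params {
	uint32_t freq_khz;
	uint32_t latency_us;
	uint32_t min_residency_us;
};

/*
 * Rows are sorted by ascending frequency. Return the row with the highest
 * frequency not above cur_khz, or the first row if cur_khz is below them
 * all. Rounding the frequency down is the conservative choice only under
 * the assumption that latency and residency shrink as frequency rises.
 */
static const struct state_opp_params *
lookup_params(const struct state_opp_params *tbl, size_t n, uint32_t cur_khz)
{
	const struct state_opp_params *row;
	size_t i;

	assert(tbl && n > 0);
	row = &tbl[0];
	for (i = 1; i < n && tbl[i].freq_khz <= cur_khz; i++)
		row = &tbl[i];
	return row;
}
```

Indexing by voltage as well would turn this into a two-key lookup; the single key is kept here only to keep the sketch short.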

Anyway, I would really like to see the option of having the state choice
in the driver. One possible way to achieve this would be to allow for
the driver to export an optional "choose" method. If that exists the
governor would offload the decision to the driver.
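
As a rough sketch of such a driver-side decision: the core hands over the expected sleep length and the latency limit currently in force, and the driver returns a state index. The struct, the function name and its signature are hypothetical and are not the existing cpuidle interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical driver-private view of an idle state. */
struct idle_state {
	uint32_t exit_latency_us;
	uint32_t min_residency_us;
};

/*
 * States are ordered shallowest first; index 0 stands for wfi and is
 * always allowed. Return the deepest state whose exit latency fits the
 * enforced limit and whose break-even residency fits the expected sleep.
 * A real driver could refresh the two fields from the current clocks
 * and voltages before calling this.
 */
static size_t choose_state(const struct idle_state *states, size_t n,
			   uint32_t expected_sleep_us,
			   uint32_t latency_limit_us)
{
	size_t best = 0;
	size_t i;

	assert(states && n > 0);
	for (i = 1; i < n; i++) {
		if (states[i].exit_latency_us > latency_limit_us)
			continue;
		if (states[i].min_residency_us > expected_sleep_us)
			continue;
		best = i;
	}
	return best;
}
```

Taking the deepest state that qualifies assumes that deeper always means lower power once break-even is reached; a driver that knows this ordering changes with context could rank the candidates by estimated energy instead.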

	--Antti
Lorenzo Pieralisi Dec. 16, 2013, 12:11 p.m. UTC | #12
On Tue, Dec 10, 2013 at 10:04:27PM +0000, Antti Miettinen wrote:
> Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> writes:
> > I do not think we should think about how the kernel uses this data.
> > We should strive to make DT data representative of HW C-states and
> > that's very complex, as you mentioned (it depends at what granularity
> > we want these bits of info).
> >
> > When we are happy with the bindings we can then code the kernel accordingly.
> >
> > Please let me know how you would like to have these bindings extended
> > (eg adding operating points), getting feedback is the main reason why
> > I posted them in the first place.
> 
> Hmm.. I'd like to challenge that a bit. I guess we are not defining DT
> bindings just for the joy of modelling the hardware? We should care
> whether kernel needs the data and have some idea of how the data will be
> used.

I agree; all I am saying is that DT bindings must not contain anything
Linux kernel specific, ie adding parameters that are purely SW concepts
(eg menu governor target_residency).

> As you say, modelling C state details is not trivial. It might be
> possible to construct an approximate formula for e.g. entry/exit latency
> that takes CPU frequency, memory frequency and PMIC ramp rates as
> input. Also, in principle we could estimate power based on clocks,
> voltages, temperature etc. As we probably do not want to put function
> definitions to DT, the DT would contain e.g. coefficients for functions
> that would need to be platform neutral.

I do not think we should model anything in DT, we should define what
a C-state entry/exit implies in HW. The kernel can model the behaviour
depending on the parameters provided by the DT data.

> Is this what you'd like to see? There has been some research in
> estimating power without actually measuring it, e.g. the google
> powertutor people have written some papers about this. The latencies
> could be measured to some extent with instrumentation in the kernel and
> the measurement results could be used to tune some parameters.
> 
> Or would you rather have tables, which specify latencies and power
> levels and the tables would be indexed with frequencies and voltages?

The latter. I did not add operating points info in v1 because I thought
it might have been too much, but I think it is something we should
consider for the final version.

> Anyway, I would really like to see the option of having the state choice
> in the driver. One possible way to achieve this would be to allow for
> the driver to export an optional "choose" method. If that exists the
> governor would offload the decision to the driver.

That's a separate discussion. CPUidle backends can already demote
C-states depending on HW states (pending IRQs, state of caches).
This also has loads of dependencies (what piece of code is in charge of
making the final decision ? Kernel ? FW (ie PSCI) ?).

As I mentioned, I think the state choice discussion is a parallel
track altogether. Let's define what bits of info are required in the DT
first, with an eye on how the kernel can make use of them, then we
can focus on changing the kernel (actually idle interfaces changes are
already under way owing to scheduler discussions) to make best usage of
them.

Thanks,
Lorenzo


Patch

diff --git a/Documentation/devicetree/bindings/arm/c-states.txt b/Documentation/devicetree/bindings/arm/c-states.txt
new file mode 100644
index 0000000..f568417
--- /dev/null
+++ b/Documentation/devicetree/bindings/arm/c-states.txt
@@ -0,0 +1,830 @@ 
+==========================================
+ARM C-states binding description
+==========================================
+
+==========================================
+1 - Introduction
+==========================================
+
+ARM systems contain HW capable of managing power consumption dynamically,
+where cores can be put in different low-power states (ranging from simple
+wfi to power gating) according to OSPM policies. Borrowing concepts
+from the ACPI specification[1], the CPU states representing the range of
+dynamic states that a processor can enter at run-time, aka C-state, can be
+specified through device tree bindings representing the parameters required to
+enter/exit specific C-states on a given processor.
+
+The state an ARM CPU can be put into is loosely identified by one of the
+following operating modes:
+
+- Running:
+	 # Processor core is executing instructions
+
+- Wait for Interrupt:
+	# An ARM processor enters wait for interrupt (WFI) low power
+	  state by executing a wfi instruction. When a processor enters
+	  wfi state it disables most of the clocks while keeping the processor
+	  powered up. This state is standard on all ARM processors and it is
+	  defined as C1 in the remainder of this document.
+
+- Dormant:
+	# Dormant mode is entered by executing wfi instructions and by sending
+	  platform specific commands to the platform power controller (coupled
+	  with processor specific SW/HW control sequences).
+	  In dormant mode, most of the processor control and debug logic is
+	  powered down but cache RAM can be put in retention state, providing
+	  additional power savings.
+
+- Sleep:
+	# Sleep mode is entered by executing the wfi instruction and by sending
+	  platform specific commands to the platform power controller (coupled
+	  with processor specific SW/HW control sequences). In sleep mode, a
+	  processor and its caches are shut down and the entire processor
+	  state is lost.
+
+Building on top of the previous processor modes, ARM platforms implement power
+management schemes that allow an OS PM implementation to put the processor in
+different CPU states (C-states). C-states parameters (eg latency) are
+platform specific and need to be characterized with bindings that provide the
+required information to OSPM code so that it can build the required tables and
+use them at runtime.
+
+The device tree binding definition for ARM C-states is the subject of this
+document.
+
+===========================================
+2 - cpu-power-states node
+===========================================
+
+ARM processor C-states are defined within the cpu-power-states node, which is
+a direct child of the cpus node and provides a container where the processor
+states, defined as device tree nodes, are listed.
+
+- cpu-power-states node
+
+	Usage: Optional - On ARM systems, is a container of processor C-state
+			  nodes. If the system does not provide CPU power
+			  management capabilities or the processor just
+			  supports WFI (C1 state) a cpu-power-states node is
+			  not required.
+
+	Description: cpu-power-states node is a container node, where its
+		     subnodes describe the CPU low-power C-states.
+
+	Node name must be "cpu-power-states".
+
+	The cpu-power-states node's parent node must be cpus node.
+
+	The cpu-power-states node's child nodes can be:
+
+	- one or more state nodes
+
+	The cpu-power-states node must contain the following properties:
+
+	- compatible
+		Value type: <stringlist>
+		Usage: Required
+		Definition: Must be "arm,cpu-power-states".
+
+	- #address-cells
+		Usage: Required
+		Value type: <u32>
+		Definition: must be set to 1.
+
+	- #size-cells
+		Usage: Required
+		Value type: <u32>
+		Definition: must be set to 0.
+
+	Any other configuration is considered invalid.
+
+The nodes describing the C-states (state) can only be defined within the
+cpu-power-states node.
+
+Any other configuration is considered invalid and therefore must be ignored.
+
+===========================================
+3 - state node
+===========================================
+
+A state node represents a C-state description and must be defined as follows:
+
+- state node
+
+	Description: must be child of the cpu-power-states node.
+
+	The state node name must be "state", with unit address provided by the
+	"reg" property following standard DT requirements[4].
+
+	A state node defines the following properties:
+
+	- reg
+		Usage: Required
+		Value type: <u32>
+		Definition: Standard device tree property [4] used for
+			    enumeration purposes.
+
+	- index
+		Usage: Required
+		Value type: <u32>
+		Definition: It represents C-state index, starting from 2 (index
+			    0 represents the processor state "running" and
+			    index 1 represents processor mode "WFI"; indexes 0
+			    and 1 are standard ARM states that need not be
+			    described).
+
+	- entry-method
+		Value type: <stringlist>
+		Usage: Required
+		Definition: Describes the method by which a CPU enters the
+			    C-state. This property is required and must be one
+			    of:
+
+			    - "psci"
+			      ARM Standard firmware interface
+
+			    - "[vendor],[method]"
+			      An implementation dependent string with
+			      format "vendor,method", where vendor is a string
+			      denoting the name of the manufacturer and
+			      method is a string specifying the mechanism
+			      used to enter the C-state.
+
+	- psci-power-state
+		Usage: Required if entry-method property value is set to
+		       "psci".
+		Value type: <u32>
+		Definition: power_state parameter to pass to the PSCI
+			    suspend call to enter the C-state.
+
+	- latency
+		Usage: Required
+		Value type: <u32>
+		Definition: Worst case latency in microseconds required to
+			    enter and exit the C-state.
+
+	- min-residency
+		Usage: Required
+		Value type: <u32>
+		Definition: Time in microseconds required for the CPU to be in
+			    the C-state to make up for the dynamic power
+			    consumed to enter/exit the C-state in order to
+			    break even in terms of power consumption compared
+			    to C1 state (wfi).
+			    This parameter depends on the operating conditions
+			    (operating point, cache state) and must assume
+			    worst case scenario.
+
+	- cpus
+		Usage: Optional
+		Value type: <phandle>
+		Definition: If defined, the phandle points to a node in the
+			    cpu-map[2] representing all CPUs on which C-state
+			    is valid. If not present or system is UP, the
+			    C-state has to be considered valid for all CPUs in
+			    the system.
+
+	- affinity
+		Usage: Optional
+		Value type: <phandle>
+		Definition: If defined, phandle points to a node in the
+			    cpu-map[2] that represents all CPUs that are
+			    affected by (ie share) the C-state and have to
+			    be coordinated on C-state entry/exit. If not
+			    present or system is UP, the C-state is local to
+			    a CPU and needs no coordination (ie it is a CPU
+			    state, that does not require coordination with
+			    other CPUs). If present, the affinity property
+			    must contain a phandle to a cpu-map node that
+			    represents a subset, possibly inclusive of the
+			    CPUs described through the cpus property.
+
+	- power-depth
+		Usage: Required
+		Value type: <u32>
+		Definition: Integer value, starting from 2 (value 0 meaning
+			    running and value 1 representing power depth of
+			    wfi (C1)), that defines the level of depth of a
+			    power state.
+			    The system denotes power states with different
+			    depths, an increasing value meaning lower power
+			    consumption and possibly more components being
+			    powered down.  Devices that are affected by
+			    C-state entry must define the maximum power
+			    depth supported in their respective device tree
+			    bindings so that OSPM can decide how to handle
+			    the device in question when the C-state
+			    is entered. All devices (per-CPU or external) with
+			    a power depth lower than the one defined in the
+			    C-state entry stop operating when the C-state
+			    is entered and action is required by OSPM to
+			    guarantee their logic and memory content is saved
+			    and restored to guarantee proper functioning.
+
+	- cache-level-lost
+		Usage: Required if "entry-method" differs from "psci".
+		Value type: <u32>
+		Definition: An integer value representing the uppermost cache
+			    level (inclusive) that is lost upon state entry.
+			    This property requires the definition of cache
+			    nodes as specified in [3]. Cache levels that are
+			    shared between processors, according to [3], should
+			    coordinate cache cleaning and invalidation to
+			    maximize performance (ie a shared cache level
+			    must be cleaned only if all CPUs sharing the
+			    cache entered the state). If missing, cache
+			    state has to be considered retained.
+
+	- processor-state-retained
+		Usage: See definition
+		Value type: <none>
+		Definition: if present CPU processor logic is retained on
+			    power down, otherwise it is lost.
+
+
+===========================================
+4 - Examples
+===========================================
+
+Example 1 (ARM 64-bit, 16-cpu system, two clusters of clusters):
+
+cpus {
+	#size-cells = <0>;
+	#address-cells = <2>;
+
+	cpu-map {
+		CLUSTER0: cluster0 {
+			CLUSTER2: cluster0 {
+				core0 {
+					thread0 {
+						cpu = <&CPU0>;
+					};
+					thread1 {
+						cpu = <&CPU1>;
+					};
+				};
+
+				core1 {
+					thread0 {
+						cpu = <&CPU2>;
+					};
+					thread1 {
+						cpu = <&CPU3>;
+					};
+				};
+			};
+
+			CLUSTER3: cluster1 {
+				core0 {
+					thread0 {
+						cpu = <&CPU4>;
+					};
+					thread1 {
+						cpu = <&CPU5>;
+					};
+				};
+
+				core1 {
+					thread0 {
+						cpu = <&CPU6>;
+					};
+					thread1 {
+						cpu = <&CPU7>;
+					};
+				};
+			};
+		};
+
+		CLUSTER1: cluster1 {
+			CLUSTER4: cluster0 {
+				core0 {
+					thread0 {
+						cpu = <&CPU8>;
+					};
+					thread1 {
+						cpu = <&CPU9>;
+					};
+				};
+				core1 {
+					thread0 {
+						cpu = <&CPU10>;
+					};
+					thread1 {
+						cpu = <&CPU11>;
+					};
+				};
+			};
+
+			CLUSTER5: cluster1 {
+				core0 {
+					thread0 {
+						cpu = <&CPU12>;
+					};
+					thread1 {
+						cpu = <&CPU13>;
+					};
+				};
+				core1 {
+					thread0 {
+						cpu = <&CPU14>;
+					};
+					thread1 {
+						cpu = <&CPU15>;
+					};
+				};
+			};
+		};
+	};
+
+	cpu-power-states {
+		compatible = "arm,cpu-power-states";
+		#size-cells = <0>;
+		#address-cells = <1>;
+
+		state@0 {
+			reg = <0>;
+			index = <2>;
+			entry-method = "psci";
+			psci-power-state = <0x1010000>;
+			latency = <400>;
+			min-residency = <300>;
+			power-depth = <2>;
+			cache-level-lost = <1>;
+			cpus = <&CLUSTER0>;
+		};
+
+		state@1 {
+			reg = <1>;
+			index = <2>;
+			entry-method = "psci";
+			psci-power-state = <0x1010000>;
+			latency = <400>;
+			min-residency = <500>;
+			power-depth = <2>;
+			cache-level-lost = <1>;
+			cpus = <&CLUSTER1>;
+		};
+
+		state@2 {
+			reg = <2>;
+			index = <3>;
+			entry-method = "psci";
+			psci-power-state = <0x3010000>;
+			latency = <1000>;
+			power-depth = <4>;
+			cache-level-lost = <2>;
+			cpus = <&CLUSTER0>;
+			affinity = <&CLUSTER0>;
+		};
+
+		state@3 {
+			reg = <3>;
+			index = <3>;
+			entry-method = "psci";
+			latency = <4500>;
+			min-residency = <6500>;
+			psci-power-state = <0x3010000>;
+			power-depth = <4>;
+			cache-level-lost = <2>;
+			cpus = <&CLUSTER1>;
+			affinity = <&CLUSTER1>;
+		};
+	};
+
+	CPU0: cpu@0 {
+		device_type = "cpu";
+		compatible = "arm,cortex-a57";
+		reg = <0x0 0x0>;
+		enable-method = "psci";
+		next-cache-level = <&L1_0>;
+		L1_0: l1-cache {
+			compatible = "cache";
+			cache-level = <1>;
+			next-cache-level = <&L2_0>;
+		};
+		L2_0: l2-cache {
+			compatible = "cache";
+			cache-level = <2>;
+		};
+	};
+
+	CPU1: cpu@1 {
+		device_type = "cpu";
+		compatible = "arm,cortex-a57";
+		reg = <0x0 0x1>;
+		enable-method = "psci";
+		next-cache-level = <&L1_1>;
+		L1_1: l1-cache {
+			compatible = "cache";
+			cache-level = <1>;
+			next-cache-level = <&L2_0>;
+		};
+	};
+
+	CPU2: cpu@100 {
+		device_type = "cpu";
+		compatible = "arm,cortex-a57";
+		reg = <0x0 0x100>;
+		enable-method = "psci";
+		next-cache-level = <&L1_2>;
+		L1_2: l1-cache {
+			compatible = "cache";
+			cache-level = <1>;
+			next-cache-level = <&L2_0>;
+		};
+	};
+
+	CPU3: cpu@101 {
+		device_type = "cpu";
+		compatible = "arm,cortex-a57";
+		reg = <0x0 0x101>;
+		enable-method = "psci";
+		next-cache-level = <&L1_3>;
+		L1_3: l1-cache {
+			compatible = "cache";
+			cache-level = <1>;
+			next-cache-level = <&L2_0>;
+		};
+	};
+
+	CPU4: cpu@10000 {
+		device_type = "cpu";
+		compatible = "arm,cortex-a57";
+		reg = <0x0 0x10000>;
+		enable-method = "psci";
+		next-cache-level = <&L1_4>;
+		L1_4: l1-cache {
+			compatible = "cache";
+			cache-level = <1>;
+			next-cache-level = <&L2_0>;
+		};
+	};
+
+	CPU5: cpu@10001 {
+		device_type = "cpu";
+		compatible = "arm,cortex-a57";
+		reg = <0x0 0x10001>;
+		enable-method = "psci";
+		next-cache-level = <&L1_5>;
+		L1_5: l1-cache {
+			compatible = "cache";
+			cache-level = <1>;
+			next-cache-level = <&L2_0>;
+		};
+	};
+
+	CPU6: cpu@10100 {
+		device_type = "cpu";
+		compatible = "arm,cortex-a57";
+		reg = <0x0 0x10100>;
+		enable-method = "psci";
+		next-cache-level = <&L1_6>;
+		L1_6: l1-cache {
+			compatible = "cache";
+			cache-level = <1>;
+			next-cache-level = <&L2_0>;
+		};
+	};
+
+	CPU7: cpu@10101 {
+		device_type = "cpu";
+		compatible = "arm,cortex-a57";
+		reg = <0x0 0x10101>;
+		enable-method = "psci";
+		next-cache-level = <&L1_7>;
+		L1_7: l1-cache {
+			compatible = "cache";
+			cache-level = <1>;
+			next-cache-level = <&L2_0>;
+		};
+	};
+
+	CPU8: cpu@100000000 {
+		device_type = "cpu";
+		compatible = "arm,cortex-a53";
+		reg = <0x1 0x0>;
+		enable-method = "psci";
+		next-cache-level = <&L1_8>;
+		L1_8: l1-cache {
+			compatible = "cache";
+			cache-level = <1>;
+			next-cache-level = <&L2_1>;
+		};
+		L2_1: l2-cache {
+			compatible = "cache";
+			cache-level = <2>;
+		};
+	};
+
+	CPU9: cpu@100000001 {
+		device_type = "cpu";
+		compatible = "arm,cortex-a53";
+		reg = <0x1 0x1>;
+		enable-method = "psci";
+		next-cache-level = <&L1_9>;
+		L1_9: l1-cache {
+			compatible = "cache";
+			cache-level = <1>;
+			next-cache-level = <&L2_1>;
+		};
+	};
+
+	CPU10: cpu@100000100 {
+		device_type = "cpu";
+		compatible = "arm,cortex-a53";
+		reg = <0x1 0x100>;
+		enable-method = "psci";
+		next-cache-level = <&L1_10>;
+		L1_10: l1-cache {
+			compatible = "cache";
+			cache-level = <1>;
+			next-cache-level = <&L2_1>;
+		};
+	};
+
+	CPU11: cpu@100000101 {
+		device_type = "cpu";
+		compatible = "arm,cortex-a53";
+		reg = <0x1 0x101>;
+		enable-method = "psci";
+		next-cache-level = <&L1_11>;
+		L1_11: l1-cache {
+			compatible = "cache";
+			cache-level = <1>;
+			next-cache-level = <&L2_1>;
+		};
+	};
+
+	CPU12: cpu@100010000 {
+		device_type = "cpu";
+		compatible = "arm,cortex-a53";
+		reg = <0x1 0x10000>;
+		enable-method = "psci";
+		next-cache-level = <&L1_12>;
+		L1_12: l1-cache {
+			compatible = "cache";
+			cache-level = <1>;
+			next-cache-level = <&L2_1>;
+		};
+	};
+
+	CPU13: cpu@100010001 {
+		device_type = "cpu";
+		compatible = "arm,cortex-a53";
+		reg = <0x1 0x10001>;
+		enable-method = "psci";
+		next-cache-level = <&L1_13>;
+		L1_13: l1-cache {
+			compatible = "cache";
+			cache-level = <1>;
+			next-cache-level = <&L2_1>;
+		};
+	};
+
+	CPU14: cpu@100010100 {
+		device_type = "cpu";
+		compatible = "arm,cortex-a53";
+		reg = <0x1 0x10100>;
+		enable-method = "psci";
+		next-cache-level = <&L1_14>;
+		L1_14: l1-cache {
+			compatible = "cache";
+			cache-level = <1>;
+			next-cache-level = <&L2_1>;
+		};
+	};
+
+	CPU15: cpu@100010101 {
+		device_type = "cpu";
+		compatible = "arm,cortex-a53";
+		reg = <0x1 0x10101>;
+		enable-method = "psci";
+		next-cache-level = <&L1_15>;
+		L1_15: l1-cache {
+			compatible = "cache";
+			cache-level = <1>;
+			next-cache-level = <&L2_1>;
+		};
+	};
+};
+
+Example 2 (ARM 32-bit, 8-cpu system, two clusters):
+
+cpus {
+	#size-cells = <0>;
+	#address-cells = <1>;
+
+	cpu-map {
+		CLUSTER0: cluster0 {
+			core0 {
+				thread0 {
+					cpu = <&CPU0>;
+				};
+				thread1 {
+					cpu = <&CPU1>;
+				};
+			};
+
+			core1 {
+				thread0 {
+					cpu = <&CPU2>;
+				};
+				thread1 {
+					cpu = <&CPU3>;
+				};
+			};
+		};
+
+		CLUSTER1: cluster1 {
+			core0 {
+				thread0 {
+					cpu = <&CPU4>;
+				};
+				thread1 {
+					cpu = <&CPU5>;
+				};
+			};
+
+			core1 {
+				thread0 {
+					cpu = <&CPU6>;
+				};
+				thread1 {
+					cpu = <&CPU7>;
+				};
+			};
+		};
+	};
+
+	cpu-power-states {
+		compatible = "arm,cpu-power-states";
+		#size-cells = <0>;
+		#address-cells = <1>;
+
+		state@0 {
+			reg = <0>;
+			index = <2>;
+			entry-method = "psci";
+			psci-power-state = <0x1010000>;
+			latency = <400>;
+			min-residency = <300>;
+			power-depth = <2>;
+			cpus = <&CLUSTER0>;
+		};
+
+		state@1 {
+			reg = <1>;
+			index = <2>;
+			entry-method = "psci";
+			psci-power-state = <0x1010000>;
+			latency = <400>;
+			min-residency = <500>;
+			power-depth = <2>;
+			cpus = <&CLUSTER1>;
+		};
+
+		state@2 {
+			reg = <2>;
+			index = <3>;
+			entry-method = "psci";
+			psci-power-state = <0x2010000>;
+			latency = <3000>;
+			min-residency = <3000>;
+			cache-level-lost = <2>;
+			power-depth = <3>;
+			cpus = <&CLUSTER0>;
+			affinity = <&CLUSTER0>;
+		};
+
+		state@3 {
+			reg = <3>;
+			index = <3>;
+			entry-method = "psci";
+			psci-power-state = <0x2010000>;
+			latency = <4000>;
+			min-residency = <5000>;
+			cache-level-lost = <2>;
+			power-depth = <3>;
+			cpus = <&CLUSTER1>;
+			affinity = <&CLUSTER1>;
+		};
+	};
+
+	CPU0: cpu@0 {
+		device_type = "cpu";
+		compatible = "arm,cortex-a15";
+		reg = <0x0>;
+		next-cache-level = <&L1_0>;
+		L1_0: l1-cache {
+			compatible = "cache";
+			cache-level = <1>;
+		};
+		L2_0: l2-cache {
+			compatible = "cache";
+			cache-level = <2>;
+		};
+	};
+
+	CPU1: cpu@1 {
+		device_type = "cpu";
+		compatible = "arm,cortex-a15";
+		reg = <0x1>;
+		next-cache-level = <&L1_1>;
+		L1_1: l1-cache {
+			compatible = "cache";
+			cache-level = <1>;
+			next-cache-level = <&L2_0>;
+		};
+	};
+
+	CPU2: cpu@2 {
+		device_type = "cpu";
+		compatible = "arm,cortex-a15";
+		reg = <0x2>;
+		next-cache-level = <&L1_2>;
+		L1_2: l1-cache {
+			compatible = "cache";
+			cache-level = <1>;
+			next-cache-level = <&L2_0>;
+		};
+	};
+
+	CPU3: cpu@3 {
+		device_type = "cpu";
+		compatible = "arm,cortex-a15";
+		reg = <0x3>;
+		next-cache-level = <&L1_3>;
+		L1_3: l1-cache {
+			compatible = "cache";
+			cache-level = <1>;
+			next-cache-level = <&L2_0>;
+		};
+	};
+
+	CPU4: cpu@100 {
+		device_type = "cpu";
+		compatible = "arm,cortex-a7";
+		reg = <0x100>;
+		next-cache-level = <&L1_4>;
+		L1_4: l1-cache {
+			compatible = "cache";
+			cache-level = <1>;
+		};
+		L2_1: l2-cache {
+			compatible = "cache";
+			cache-level = <2>;
+		};
+	};
+
+	CPU5: cpu@101 {
+		device_type = "cpu";
+		compatible = "arm,cortex-a7";
+		reg = <0x101>;
+		next-cache-level = <&L1_5>;
+		L1_5: l1-cache {
+			compatible = "cache";
+			cache-level = <1>;
+			next-cache-level = <&L2_1>;
+		};
+	};
+
+	CPU6: cpu@102 {
+		device_type = "cpu";
+		compatible = "arm,cortex-a7";
+		reg = <0x102>;
+		next-cache-level = <&L1_6>;
+		L1_6: l1-cache {
+			compatible = "cache";
+			cache-level = <1>;
+			next-cache-level = <&L2_1>;
+		};
+	};
+
+	CPU7: cpu@103 {
+		device_type = "cpu";
+		compatible = "arm,cortex-a7";
+		reg = <0x103>;
+		next-cache-level = <&L1_7>;
+		L1_7: l1-cache {
+			compatible = "cache";
+			cache-level = <1>;
+			next-cache-level = <&L2_1>;
+		};
+	};
+};
+
+===========================================
+5 - References
+===========================================
+
+[1] ACPI v5.0 specification
+    http://www.acpi.info/spec50.htm
+
+[2] ARM Linux kernel documentation - topology bindings
+    Documentation/devicetree/bindings/arm/topology.txt
+
+[3] ARM Linux kernel documentation - cache bindings
+    Documentation/devicetree/bindings/arm/cache.txt
+
+[4] ePAPR standard
+    https://www.power.org/documentation/epapr-version-1-1/