diff mbox series

[RFC,1/1] SLW : Add new idle device-tree format

Message ID 20190809105052.21958-2-huntbag@linux.vnet.ibm.com
State RFC
Headers show
Series New device-tree format | expand

Checks

Context Check Description
snowpatch_ozlabs/apply_patch success Successfully applied on branch master (0e1db80c70477d89a73c7f2a1a7e19c7d8292c5f)
snowpatch_ozlabs/snowpatch_job_snowpatch-skiboot success Test snowpatch/job/snowpatch-skiboot on branch master
snowpatch_ozlabs/snowpatch_job_snowpatch-skiboot-dco success Signed-off-by present

Commit Message

Abhishek Goel Aug. 9, 2019, 10:50 a.m. UTC
Add stop states under ibm,idle-states in addition to the current array
based device tree properties.

New device tree format adds a compatible flag which has version
corresponding to every state, so that only kernel which has the capability
to handle the version of stop state will enable it. Drawback of the array
based dt node is that versioning of idle states is not possible.

Older kernel will still see stop0 and stop0_lite in older format and we
will deprecate it after some time.

Consider a case that stop4 has a bug. We take the following steps to
mitigate the problem.

1) Change compatible string for stop4 in OPAL to "stop4,v2" from
"stop4,v1", i.e. basicallly bump up the previous version and ship the
new firmware.
2) The kernel will ignore stop4 as it won't be able to recognize this
new version. Kernel will also ignore all the deeper states because its
possible that a cpu have requested for a deeper state but was never able
to enter into it. But we will still have shallower states that are there
before stop 4. This, thus prevents from completely disabling stop states.

Linux kernel can now look at the version string and decide if it has the
ability to handle that idle state. Henceforth, if kernel does not know
about a version, it will skip that state and all the deeper state.

Once when the workaround are implemented into the kernel, we can bump up
the known version in kernel for that state, so that support can be
enabled once again in kernel.

New Device-tree :

Final output
       power-mgt {
            ...
         ibm,enabled-stop-levels = <0xec000000>;
         ibm,cpu-idle-state-psscr-mask = <0x0 0x3003ff 0x0 0x3003ff>;
         ibm,cpu-idle-state-latencies-ns = <0x3e8 0x7d0>;
         ibm,cpu-idle-state-psscr = <0x0 0x330 0x0 0x300330>;
         ibm,cpu-idle-state-flags = <0x100000 0x101000>;
         ibm,cpu-idle-state-residency-ns = <0x2710 0x4e20>;
         ibm,idle-states {
                     stop4 {
                         flags = <0x207000>;
                         compatible = "stop4,v1",
                         psscr-mask = <0x0 0x3003ff>;
                         latency-ns = <0x186a0>;
                         residency-ns = <0x989680>;
                         psscr = <0x0 0x300374>;
                  };
                    ...
                    stop11 {
                         ...
                         compatible = "stop11,v1",
                         ...
                  };
             };

Signed-off-by: Abhishek Goel <huntbag@linux.vnet.ibm.com>
---
 hw/slw.c           | 146 +++++++++++++++++++++++++++++++++++----------
 include/opal-api.h |   7 +++
 2 files changed, 123 insertions(+), 30 deletions(-)

Comments

Oliver O'Halloran Nov. 18, 2019, 1:05 a.m. UTC | #1
On Fri, Aug 9, 2019 at 8:55 PM Abhishek Goel <huntbag@linux.vnet.ibm.com> wrote:
>
> Add stop states under ibm,idle-states in addition to the current array
> based device tree properties.
Hi Ahbishek,

Sorry, about the late response. I got bogged down on other things for
a whle, but I'm trying to clear out the patchwork backlog now.

> New device tree format adds a compatible flag which has version
> corresponding to every state, so that only kernel which has the capability
> to handle the version of stop state will enable it. Drawback of the array
> based dt node is that versioning of idle states is not possible.
>
> Older kernel will still see stop0 and stop0_lite in older format and we
> will deprecate it after some time.
>
> Consider a case that stop4 has a bug. We take the following steps to
> mitigate the problem.
>
> 1) Change compatible string for stop4 in OPAL to "stop4,v2" from
> "stop4,v1", i.e. basicallly bump up the previous version and ship the
> new firmware.
> 2) The kernel will ignore stop4 as it won't be able to recognize this
> new version. Kernel will also ignore all the deeper states because its
> possible that a cpu have requested for a deeper state but was never able
> to enter into it. But we will still have shallower states that are there
> before stop 4. This, thus prevents from completely disabling stop states.
>
> Linux kernel can now look at the version string and decide if it has the
> ability to handle that idle state. Henceforth, if kernel does not know
> about a version, it will skip that state and all the deeper state.

How does the kernel know if one state node deeper than another? What
I'm really wondering is whether we need a reg for the individual
states e.g.:

stop@11 {
 reg= <11>
 compatible="stop11-v1"
 ...
}
stop@4 {
 reg = <4>;
 compatible="stop4-v1";
 ...
}

> Once when the workaround are implemented into the kernel, we can bump up
> the known version in kernel for that state, so that support can be
> enabled once again in kernel.
>
> New Device-tree :
>
> Final output
>        power-mgt {
>             ...
>          ibm,enabled-stop-levels = <0xec000000>;
>          ibm,cpu-idle-state-psscr-mask = <0x0 0x3003ff 0x0 0x3003ff>;
>          ibm,cpu-idle-state-latencies-ns = <0x3e8 0x7d0>;
>          ibm,cpu-idle-state-psscr = <0x0 0x330 0x0 0x300330>;
>          ibm,cpu-idle-state-flags = <0x100000 0x101000>;
>          ibm,cpu-idle-state-residency-ns = <0x2710 0x4e20>;
>          ibm,idle-states {
>                      stop4 {
>                          flags = <0x207000>;
>                          compatible = "stop4,v1",
>                          psscr-mask = <0x0 0x3003ff>;
>                          latency-ns = <0x186a0>;
>                          residency-ns = <0x989680>;
>                          psscr = <0x0 0x300374>;
>                   };
>                     ...
>                     stop11 {
>                          ...
>                          compatible = "stop11,v1",
>                          ...
>                   };
>              };
>
> Signed-off-by: Abhishek Goel <huntbag@linux.vnet.ibm.com>
> ---
>  hw/slw.c           | 146 +++++++++++++++++++++++++++++++++++----------
>  include/opal-api.h |   7 +++
>  2 files changed, 123 insertions(+), 30 deletions(-)
>
> diff --git a/hw/slw.c b/hw/slw.c
> index c872b630..1c857d5a 100644
> --- a/hw/slw.c
> +++ b/hw/slw.c
> @@ -370,9 +370,15 @@ static bool idle_prepare_core(struct proc_chip *chip, struct cpu_thread *c)
>  }
>
>  /* Define device-tree fields */
> -#define MAX_NAME_LEN   16
> +#define MAX_NAME_LEN           16
> +#define MAX_VERSION_LEN                25
> +#define MAX_DESCRIPTION_LEN    60
>  struct cpu_idle_states {
>         char name[MAX_NAME_LEN];
> +       /* Adding compatibility version for evey stop state */
Pointless comment.

> +       char compat_version[MAX_VERSION_LEN];
> +       /* Adding description for every stop state */
Same here.

> +       char desc[MAX_DESCRIPTION_LEN];
>         u32 latency_ns;
>         u32 residency_ns;
>         /*
> @@ -381,7 +387,9 @@ struct cpu_idle_states {
>          */
>         u64 pm_ctrl_reg_val;
>         u64 pm_ctrl_reg_mask;
> -       u32 flags;
> +       u64 flags; /* changing flags from u32 to u64 */
and here

> +       /* older kernel with newer firmware will use these states */
> +       u32 legacy_state;
That comment should probably be saying something like "legacy states
go into the legacy STOP state arrays" or something. We try not to
assume too much about the host kernel.

>  };
>
>  static struct cpu_idle_states nap_only_cpu_idle_states[] = {
> @@ -461,13 +469,20 @@ static struct cpu_idle_states power9_cpu_idle_states[] = {
>         {
>                 .name = "stop0_lite", /* Enter stop0 with no state loss */
>                 .latency_ns = 1000,
> +               .compat_version = "stop0_lite,v1",
> +               .desc = "thread level stop instruction dispatch",
> +               .legacy_state = 1,
>                 .residency_ns = 10000,
> +               /* flags will now also have bits to identify if a state is
> +                * of cpuidle or hotplug type
> +                */
>                 .flags = 0*OPAL_PM_DEC_STOP \
>                        | 0*OPAL_PM_TIMEBASE_STOP  \
>                        | 0*OPAL_PM_LOSE_USER_CONTEXT \
>                        | 0*OPAL_PM_LOSE_HYP_CONTEXT \
>                        | 0*OPAL_PM_LOSE_FULL_CONTEXT \
> -                      | 1*OPAL_PM_STOP_INST_FAST,
> +                      | 1*OPAL_PM_STOP_INST_FAST \
> +                      | 1*OPAL_PM_STOP_CPUIDLE,
>                 .pm_ctrl_reg_val = OPAL_PM_PSSCR_RL(0) \
>                                  | OPAL_PM_PSSCR_MTL(3) \
>                                  | OPAL_PM_PSSCR_TR(3),
> @@ -475,13 +490,17 @@ static struct cpu_idle_states power9_cpu_idle_states[] = {
>         {
>                 .name = "stop0",
>                 .latency_ns = 2000,
> +               .compat_version = "stop0,v1",
> +               .desc = "low latency SMT switching",
>                 .residency_ns = 20000,
> +               .legacy_state = 1,
>                 .flags = 0*OPAL_PM_DEC_STOP \
>                        | 0*OPAL_PM_TIMEBASE_STOP  \
>                        | 1*OPAL_PM_LOSE_USER_CONTEXT \
>                        | 0*OPAL_PM_LOSE_HYP_CONTEXT \
>                        | 0*OPAL_PM_LOSE_FULL_CONTEXT \
> -                      | 1*OPAL_PM_STOP_INST_FAST,
> +                      | 1*OPAL_PM_STOP_INST_FAST \
> +                      | 1*OPAL_PM_STOP_CPUIDLE,
>                 .pm_ctrl_reg_val = OPAL_PM_PSSCR_RL(0) \
>                                  | OPAL_PM_PSSCR_MTL(3) \
>                                  | OPAL_PM_PSSCR_TR(3) \
> @@ -493,14 +512,18 @@ static struct cpu_idle_states power9_cpu_idle_states[] = {
>
>         {
>                 .name = "stop1",
> +               .compat_version = "stop1,v1",
> +               .desc = "SMT switching with basic clock gating",
>                 .latency_ns = 5000,
>                 .residency_ns = 50000,
> +               .legacy_state = 0,
>                 .flags = 0*OPAL_PM_DEC_STOP \
>                        | 0*OPAL_PM_TIMEBASE_STOP  \
>                        | 1*OPAL_PM_LOSE_USER_CONTEXT \
>                        | 0*OPAL_PM_LOSE_HYP_CONTEXT \
>                        | 0*OPAL_PM_LOSE_FULL_CONTEXT \
> -                      | 1*OPAL_PM_STOP_INST_FAST,
> +                      | 1*OPAL_PM_STOP_INST_FAST \
> +                      | 1*OPAL_PM_STOP_CPUIDLE,
>                 .pm_ctrl_reg_val = OPAL_PM_PSSCR_RL(1) \
>                                  | OPAL_PM_PSSCR_MTL(3) \
>                                  | OPAL_PM_PSSCR_TR(3) \
> @@ -514,14 +537,18 @@ static struct cpu_idle_states power9_cpu_idle_states[] = {
>
>         {
>                 .name = "stop2",
> +               .compat_version = "stop2,v1",
> +               .desc = "core level clock gating",
>                 .latency_ns = 10000,
>                 .residency_ns = 100000,
> +               .legacy_state = 0,
>                 .flags = 0*OPAL_PM_DEC_STOP \
>                        | 0*OPAL_PM_TIMEBASE_STOP  \
>                        | 1*OPAL_PM_LOSE_USER_CONTEXT \
>                        | 0*OPAL_PM_LOSE_HYP_CONTEXT \
>                        | 0*OPAL_PM_LOSE_FULL_CONTEXT \
> -                      | 1*OPAL_PM_STOP_INST_FAST,
> +                      | 1*OPAL_PM_STOP_INST_FAST \
> +                      | 1*OPAL_PM_STOP_CPUIDLE,
>                 .pm_ctrl_reg_val = OPAL_PM_PSSCR_RL(2) \
>                                  | OPAL_PM_PSSCR_MTL(3) \
>                                  | OPAL_PM_PSSCR_TR(3) \
> @@ -530,14 +557,18 @@ static struct cpu_idle_states power9_cpu_idle_states[] = {
>                 .pm_ctrl_reg_mask = OPAL_PM_PSSCR_MASK },
>         {
>                 .name = "stop4",
> +               .compat_version = "stop4,v1",
> +               .desc = "core level power gating",
>                 .latency_ns = 100000,
>                 .residency_ns = 10000000,
> +               .legacy_state = 0,
>                 .flags = 0*OPAL_PM_DEC_STOP \
>                        | 0*OPAL_PM_TIMEBASE_STOP  \
>                        | 1*OPAL_PM_LOSE_USER_CONTEXT \
>                        | 1*OPAL_PM_LOSE_HYP_CONTEXT \
>                        | 1*OPAL_PM_LOSE_FULL_CONTEXT \
> -                      | 1*OPAL_PM_STOP_INST_DEEP,
> +                      | 1*OPAL_PM_STOP_INST_DEEP \
> +                      | 1*OPAL_PM_STOP_CPUIDLE,
>                 .pm_ctrl_reg_val = OPAL_PM_PSSCR_RL(4) \
>                                  | OPAL_PM_PSSCR_MTL(7) \
>                                  | OPAL_PM_PSSCR_TR(3) \
> @@ -546,14 +577,18 @@ static struct cpu_idle_states power9_cpu_idle_states[] = {
>                 .pm_ctrl_reg_mask = OPAL_PM_PSSCR_MASK },
>         {
>                 .name = "stop5",
> +               .compat_version = "stop5,v1",
> +               .desc = "core level power gating and turbo boost benefit",
>                 .latency_ns = 200000,
>                 .residency_ns = 20000000,
> +               .legacy_state = 0,
>                 .flags = 0*OPAL_PM_DEC_STOP \
>                        | 0*OPAL_PM_TIMEBASE_STOP  \
>                        | 1*OPAL_PM_LOSE_USER_CONTEXT \
>                        | 1*OPAL_PM_LOSE_HYP_CONTEXT \
>                        | 1*OPAL_PM_LOSE_FULL_CONTEXT \
> -                      | 1*OPAL_PM_STOP_INST_DEEP,
> +                      | 1*OPAL_PM_STOP_INST_DEEP \
> +                      | 1*OPAL_PM_STOP_CPUIDLE,
>                 .pm_ctrl_reg_val = OPAL_PM_PSSCR_RL(5) \
>                                  | OPAL_PM_PSSCR_MTL(7) \
>                                  | OPAL_PM_PSSCR_TR(3) \
> @@ -563,14 +598,18 @@ static struct cpu_idle_states power9_cpu_idle_states[] = {
>
>         {
>                 .name = "stop8",
> +               .compat_version = "stop8,v1",
> +               .desc = "core power gating and L2 clock gating",
>                 .latency_ns = 2000000,
>                 .residency_ns = 20000000,
> +               .legacy_state = 0,
>                 .flags = 1*OPAL_PM_DEC_STOP \
>                        | 1*OPAL_PM_TIMEBASE_STOP  \
>                        | 1*OPAL_PM_LOSE_USER_CONTEXT \
>                        | 1*OPAL_PM_LOSE_HYP_CONTEXT \
>                        | 1*OPAL_PM_LOSE_FULL_CONTEXT \
> -                      | 1*OPAL_PM_STOP_INST_DEEP,
> +                      | 1*OPAL_PM_STOP_INST_DEEP \
> +                      | 1*OPAL_PM_STOP_CPUIDLE,
>                 .pm_ctrl_reg_val = OPAL_PM_PSSCR_RL(8) \
>                                  | OPAL_PM_PSSCR_MTL(11) \
>                                  | OPAL_PM_PSSCR_TR(3) \
> @@ -580,14 +619,18 @@ static struct cpu_idle_states power9_cpu_idle_states[] = {
>
>         {
>                 .name = "stop11",
> +               .compat_version = "stop11,v1",
> +               .desc = "quad level core and cache power gating",
>                 .latency_ns = 10000000,
>                 .residency_ns = 100000000,
> +               .legacy_state = 0,
>                 .flags = 1*OPAL_PM_DEC_STOP \
>                        | 1*OPAL_PM_TIMEBASE_STOP  \
>                        | 1*OPAL_PM_LOSE_USER_CONTEXT \
>                        | 1*OPAL_PM_LOSE_HYP_CONTEXT \
>                        | 1*OPAL_PM_LOSE_FULL_CONTEXT \
> -                      | 1*OPAL_PM_STOP_INST_DEEP,
> +                      | 1*OPAL_PM_STOP_INST_DEEP \
> +                      | 1*OPAL_PM_STOP_CPUHOTPLUG,
>                 .pm_ctrl_reg_val = OPAL_PM_PSSCR_RL(11) \
>                                  | OPAL_PM_PSSCR_MTL(11) \
>                                  | OPAL_PM_PSSCR_TR(3) \
> @@ -917,34 +960,77 @@ void add_cpu_idle_state_properties(void)
>                                 }
>                         }
>                 }

> +               if (states[i].legacy_state) {
> +                       prlog(PR_NOTICE, "SLW: Enabling: %s\n", states[i].name);
>
> -               prlog(PR_INFO, "SLW: Enabling: %s\n", states[i].name);
> +                       /*
> +                        * If a state is supported add each of its property
> +                        * to its corresponding property buffer.
> +                        */
> +                       strncpy(name_buf, states[i].name, MAX_NAME_LEN);
> +                       name_buf = name_buf + strlen(states[i].name) + 1;
>
> -               /*
> -                * If a state is supported add each of its property
> -                * to its corresponding property buffer.
> -                */
> -               strncpy(name_buf, states[i].name, MAX_NAME_LEN);
> -               name_buf = name_buf + strlen(states[i].name) + 1;
> +                       *latency_ns_buf = cpu_to_fdt32(states[i].latency_ns);
> +                       latency_ns_buf++;
>
> -               *latency_ns_buf = cpu_to_fdt32(states[i].latency_ns);
> -               latency_ns_buf++;
> +                       *residency_ns_buf = cpu_to_fdt32(states[i].residency_ns);
> +                       residency_ns_buf++;
>
> -               *residency_ns_buf = cpu_to_fdt32(states[i].residency_ns);
> -               residency_ns_buf++;
> +                       *flags_buf = cpu_to_fdt32(states[i].flags);
> +                       flags_buf++;
>
> -               *flags_buf = cpu_to_fdt32(states[i].flags);
> -               flags_buf++;
> +                       *pm_ctrl_reg_val_buf = cpu_to_fdt64(states[i].pm_ctrl_reg_val);
> +                       pm_ctrl_reg_val_buf++;
>
> -               *pm_ctrl_reg_val_buf = cpu_to_fdt64(states[i].pm_ctrl_reg_val);
> -               pm_ctrl_reg_val_buf++;
> +                       *pm_ctrl_reg_mask_buf = cpu_to_fdt64(states[i].pm_ctrl_reg_mask);
> +                       pm_ctrl_reg_mask_buf++;
>
> -               *pm_ctrl_reg_mask_buf = cpu_to_fdt64(states[i].pm_ctrl_reg_mask);
> -               pm_ctrl_reg_mask_buf++;
> +                       /* Increment buffer length trackers */
> +                       name_buf_len += strlen(states[i].name) + 1;
> +                       num_supported_idle_states++;
>
> -               /* Increment buffer length trackers */
> -               name_buf_len += strlen(states[i].name) + 1;
> -               num_supported_idle_states++;

> +               } else {
> +                       struct dt_node *parent_node, *dt_state_node;
> +                       char version_buf[MAX_VERSION_LEN];
> +                       u32 *lat_buf, *res_buf;
> +                       u64 *flg_buf, *psscr_buf, *psscr_mask_buf;
> +
> +                       lat_buf = malloc(sizeof(u32));
> +                       res_buf = malloc(sizeof(u32));
> +                       flg_buf = malloc(sizeof(u64));
> +                       psscr_buf = malloc(sizeof(u64));
> +                       psscr_mask_buf = malloc(sizeof(u64));
> +                       prlog(PR_NOTICE, "SLW: Enabling: %s in new format\n", states[i].name);
> +
> +                       parent_node = dt_new_check(power_mgt,  "ibm,idle-states");
> +                       if (!parent_node)
> +                               prlog(PR_ERR, "Error creating ibm,idle-states\n");
> +
> +                       dt_state_node = dt_new_check(parent_node, states[i].name);
> +                       if (!dt_state_node)
> +                               prlog(PR_ERR, "Error creating %s\n", states[i].name);
> +
> +                       strncpy(version_buf, states[i].compat_version,
> +                                       strlen(states[i].compat_version)+1);
> +                       dt_add_property_strings(dt_state_node, "compatible", version_buf);
> +
> +                       *lat_buf = cpu_to_fdt32(states[i].latency_ns);
> +                       dt_add_property(dt_state_node, "latency-ns", lat_buf, sizeof(u32));
> +                       *res_buf = cpu_to_fdt32(states[i].residency_ns);
> +                       dt_add_property(dt_state_node, "residency-ns", res_buf, sizeof(u32));
> +                       *flg_buf = cpu_to_fdt64(states[i].flags);
> +                       dt_add_property(dt_state_node, "flags", flg_buf, sizeof(u64));
> +                       *psscr_buf = cpu_to_fdt64(states[i].pm_ctrl_reg_val);
> +                       dt_add_property(dt_state_node, "psscr", psscr_buf, sizeof(u64));
> +                       *psscr_mask_buf = cpu_to_fdt64(states[i].pm_ctrl_reg_mask);
> +                       dt_add_property(dt_state_node, "psscr-mask", psscr_mask_buf, sizeof(u64));
> +                       free(res_buf);
> +                       free(lat_buf);
> +                       free(flg_buf);
> +                       free(psscr_buf);
> +                       free(psscr_mask_buf);

All the malloc(), free() and manual endian swapping goes away if you
use dt_add_property_u64() and friends. The existing code uses
dt_add_property() directly because there isn't a better way to handle
dynamicly sized arrays, but you need to. Anyway, I doing this:

for_each_state() {
  if (states[i].legacy_state) {
     /* old blah */
  } else {
    /* new blah */
  }
}

Seems bad to me. We only want the legacy arrays to be used if the
kernel doesn't understand the new format, so the new format should
contain all the same information as the old one i.e.

for_each_state() {
  if (states[i].legacy_state) {
     /* old blah */
  }
}

for_each_state() {
   /* new blah*/
}

So if you understand the new format then you only need to parse the
new format and the old one can be ignored.

> +
> +               }
>
>         }
>
> diff --git a/include/opal-api.h b/include/opal-api.h
> index 0b0ae196..f54f3862 100644
> --- a/include/opal-api.h
> +++ b/include/opal-api.h
> @@ -292,6 +292,13 @@
>  #define OPAL_PM_STOP_INST_FAST         0x00100000
>  #define OPAL_PM_STOP_INST_DEEP         0x00200000
>
> +/*
> + * Flags for stop states to distinguish between cpuidle and
> + * cpuoffline type of states.
> + */
> +#define OPAL_PM_STOP_CPUIDLE           0x01000000
> +#define OPAL_PM_STOP_CPUHOTPLUG                0x02000000

Seems a bit arbitrary and I don't think it's firmware's job to decide
if a state is "cpu hotplug" or not. IIRC Raptor are interested in
using STOP11 to support a hibernate mode so there are uses for it
besides removing a CPU.

> +
>  #ifndef __ASSEMBLY__
>
>  #include <stdbool.h>
> --
> 2.17.1
>
> _______________________________________________
> Skiboot mailing list
> Skiboot@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/skiboot
Abhishek Goel Nov. 21, 2019, 11:54 a.m. UTC | #2
Hi Oliver,

Thanks for the review.

We(My team) have had a deep design discussion on the kernel patchset

corresponding to this patch with Nicholas Piggin. And we will most probably

not go with the current design proposed in this skiboot patch and kernel

patch set. I will post patches on top of little endian skiboot patches 
of Nicholas.

This is the kernel patch set : https://www.lkml.org/lkml/2019/8/23/77

-- Abhishek


On 11/18/2019 06:35 AM, Oliver O'Halloran wrote:
> On Fri, Aug 9, 2019 at 8:55 PM Abhishek Goel <huntbag@linux.vnet.ibm.com> wrote:
>> Add stop states under ibm,idle-states in addition to the current array
>> based device tree properties.
> Hi Ahbishek,
>
> Sorry, about the late response. I got bogged down on other things for
> a whle, but I'm trying to clear out the patchwork backlog now.
>
>> New device tree format adds a compatible flag which has version
>> corresponding to every state, so that only kernel which has the capability
>> to handle the version of stop state will enable it. Drawback of the array
>> based dt node is that versioning of idle states is not possible.
>>
>> Older kernel will still see stop0 and stop0_lite in older format and we
>> will deprecate it after some time.
>>
>> Consider a case that stop4 has a bug. We take the following steps to
>> mitigate the problem.
>>
>> 1) Change compatible string for stop4 in OPAL to "stop4,v2" from
>> "stop4,v1", i.e. basicallly bump up the previous version and ship the
>> new firmware.
>> 2) The kernel will ignore stop4 as it won't be able to recognize this
>> new version. Kernel will also ignore all the deeper states because its
>> possible that a cpu have requested for a deeper state but was never able
>> to enter into it. But we will still have shallower states that are there
>> before stop 4. This, thus prevents from completely disabling stop states.
>>
>> Linux kernel can now look at the version string and decide if it has the
>> ability to handle that idle state. Henceforth, if kernel does not know
>> about a version, it will skip that state and all the deeper state.
> How does the kernel know if one state node deeper than another? What
> I'm really wondering is whether we need a reg for the individual
> states e.g.:
>
> stop@11 {
>   reg= <11>
>   compatible="stop11-v1"
>   ...
> }
> stop@4 {
>   reg = <4>;
>   compatible="stop4-v1";
>   ...
> }
>
>> Once when the workaround are implemented into the kernel, we can bump up
>> the known version in kernel for that state, so that support can be
>> enabled once again in kernel.
>>
>> New Device-tree :
>>
>> Final output
>>         power-mgt {
>>              ...
>>           ibm,enabled-stop-levels = <0xec000000>;
>>           ibm,cpu-idle-state-psscr-mask = <0x0 0x3003ff 0x0 0x3003ff>;
>>           ibm,cpu-idle-state-latencies-ns = <0x3e8 0x7d0>;
>>           ibm,cpu-idle-state-psscr = <0x0 0x330 0x0 0x300330>;
>>           ibm,cpu-idle-state-flags = <0x100000 0x101000>;
>>           ibm,cpu-idle-state-residency-ns = <0x2710 0x4e20>;
>>           ibm,idle-states {
>>                       stop4 {
>>                           flags = <0x207000>;
>>                           compatible = "stop4,v1",
>>                           psscr-mask = <0x0 0x3003ff>;
>>                           latency-ns = <0x186a0>;
>>                           residency-ns = <0x989680>;
>>                           psscr = <0x0 0x300374>;
>>                    };
>>                      ...
>>                      stop11 {
>>                           ...
>>                           compatible = "stop11,v1",
>>                           ...
>>                    };
>>               };
>>
>> Signed-off-by: Abhishek Goel <huntbag@linux.vnet.ibm.com>
>> ---
>>   hw/slw.c           | 146 +++++++++++++++++++++++++++++++++++----------
>>   include/opal-api.h |   7 +++
>>   2 files changed, 123 insertions(+), 30 deletions(-)
>>
>> diff --git a/hw/slw.c b/hw/slw.c
>> index c872b630..1c857d5a 100644
>> --- a/hw/slw.c
>> +++ b/hw/slw.c
>> @@ -370,9 +370,15 @@ static bool idle_prepare_core(struct proc_chip *chip, struct cpu_thread *c)
>>   }
>>
>>   /* Define device-tree fields */
>> -#define MAX_NAME_LEN   16
>> +#define MAX_NAME_LEN           16
>> +#define MAX_VERSION_LEN                25
>> +#define MAX_DESCRIPTION_LEN    60
>>   struct cpu_idle_states {
>>          char name[MAX_NAME_LEN];
>> +       /* Adding compatibility version for evey stop state */
> Pointless comment.
>
>> +       char compat_version[MAX_VERSION_LEN];
>> +       /* Adding description for every stop state */
> Same here.
>
>> +       char desc[MAX_DESCRIPTION_LEN];
>>          u32 latency_ns;
>>          u32 residency_ns;
>>          /*
>> @@ -381,7 +387,9 @@ struct cpu_idle_states {
>>           */
>>          u64 pm_ctrl_reg_val;
>>          u64 pm_ctrl_reg_mask;
>> -       u32 flags;
>> +       u64 flags; /* changing flags from u32 to u64 */
> and here
>
>> +       /* older kernel with newer firmware will use these states */
>> +       u32 legacy_state;
> That comment should probably be saying something like "legacy states
> go into the legacy STOP state arrays" or something. We try not to
> assume too much about the host kernel.
>
>>   };
>>
>>   static struct cpu_idle_states nap_only_cpu_idle_states[] = {
>> @@ -461,13 +469,20 @@ static struct cpu_idle_states power9_cpu_idle_states[] = {
>>          {
>>                  .name = "stop0_lite", /* Enter stop0 with no state loss */
>>                  .latency_ns = 1000,
>> +               .compat_version = "stop0_lite,v1",
>> +               .desc = "thread level stop instruction dispatch",
>> +               .legacy_state = 1,
>>                  .residency_ns = 10000,
>> +               /* flags will now also have bits to identify if a state is
>> +                * of cpuidle or hotplug type
>> +                */
>>                  .flags = 0*OPAL_PM_DEC_STOP \
>>                         | 0*OPAL_PM_TIMEBASE_STOP  \
>>                         | 0*OPAL_PM_LOSE_USER_CONTEXT \
>>                         | 0*OPAL_PM_LOSE_HYP_CONTEXT \
>>                         | 0*OPAL_PM_LOSE_FULL_CONTEXT \
>> -                      | 1*OPAL_PM_STOP_INST_FAST,
>> +                      | 1*OPAL_PM_STOP_INST_FAST \
>> +                      | 1*OPAL_PM_STOP_CPUIDLE,
>>                  .pm_ctrl_reg_val = OPAL_PM_PSSCR_RL(0) \
>>                                   | OPAL_PM_PSSCR_MTL(3) \
>>                                   | OPAL_PM_PSSCR_TR(3),
>> @@ -475,13 +490,17 @@ static struct cpu_idle_states power9_cpu_idle_states[] = {
>>          {
>>                  .name = "stop0",
>>                  .latency_ns = 2000,
>> +               .compat_version = "stop0,v1",
>> +               .desc = "low latency SMT switching",
>>                  .residency_ns = 20000,
>> +               .legacy_state = 1,
>>                  .flags = 0*OPAL_PM_DEC_STOP \
>>                         | 0*OPAL_PM_TIMEBASE_STOP  \
>>                         | 1*OPAL_PM_LOSE_USER_CONTEXT \
>>                         | 0*OPAL_PM_LOSE_HYP_CONTEXT \
>>                         | 0*OPAL_PM_LOSE_FULL_CONTEXT \
>> -                      | 1*OPAL_PM_STOP_INST_FAST,
>> +                      | 1*OPAL_PM_STOP_INST_FAST \
>> +                      | 1*OPAL_PM_STOP_CPUIDLE,
>>                  .pm_ctrl_reg_val = OPAL_PM_PSSCR_RL(0) \
>>                                   | OPAL_PM_PSSCR_MTL(3) \
>>                                   | OPAL_PM_PSSCR_TR(3) \
>> @@ -493,14 +512,18 @@ static struct cpu_idle_states power9_cpu_idle_states[] = {
>>
>>          {
>>                  .name = "stop1",
>> +               .compat_version = "stop1,v1",
>> +               .desc = "SMT switching with basic clock gating",
>>                  .latency_ns = 5000,
>>                  .residency_ns = 50000,
>> +               .legacy_state = 0,
>>                  .flags = 0*OPAL_PM_DEC_STOP \
>>                         | 0*OPAL_PM_TIMEBASE_STOP  \
>>                         | 1*OPAL_PM_LOSE_USER_CONTEXT \
>>                         | 0*OPAL_PM_LOSE_HYP_CONTEXT \
>>                         | 0*OPAL_PM_LOSE_FULL_CONTEXT \
>> -                      | 1*OPAL_PM_STOP_INST_FAST,
>> +                      | 1*OPAL_PM_STOP_INST_FAST \
>> +                      | 1*OPAL_PM_STOP_CPUIDLE,
>>                  .pm_ctrl_reg_val = OPAL_PM_PSSCR_RL(1) \
>>                                   | OPAL_PM_PSSCR_MTL(3) \
>>                                   | OPAL_PM_PSSCR_TR(3) \
>> @@ -514,14 +537,18 @@ static struct cpu_idle_states power9_cpu_idle_states[] = {
>>
>>          {
>>                  .name = "stop2",
>> +               .compat_version = "stop2,v1",
>> +               .desc = "core level clock gating",
>>                  .latency_ns = 10000,
>>                  .residency_ns = 100000,
>> +               .legacy_state = 0,
>>                  .flags = 0*OPAL_PM_DEC_STOP \
>>                         | 0*OPAL_PM_TIMEBASE_STOP  \
>>                         | 1*OPAL_PM_LOSE_USER_CONTEXT \
>>                         | 0*OPAL_PM_LOSE_HYP_CONTEXT \
>>                         | 0*OPAL_PM_LOSE_FULL_CONTEXT \
>> -                      | 1*OPAL_PM_STOP_INST_FAST,
>> +                      | 1*OPAL_PM_STOP_INST_FAST \
>> +                      | 1*OPAL_PM_STOP_CPUIDLE,
>>                  .pm_ctrl_reg_val = OPAL_PM_PSSCR_RL(2) \
>>                                   | OPAL_PM_PSSCR_MTL(3) \
>>                                   | OPAL_PM_PSSCR_TR(3) \
>> @@ -530,14 +557,18 @@ static struct cpu_idle_states power9_cpu_idle_states[] = {
>>                  .pm_ctrl_reg_mask = OPAL_PM_PSSCR_MASK },
>>          {
>>                  .name = "stop4",
>> +               .compat_version = "stop4,v1",
>> +               .desc = "core level power gating",
>>                  .latency_ns = 100000,
>>                  .residency_ns = 10000000,
>> +               .legacy_state = 0,
>>                  .flags = 0*OPAL_PM_DEC_STOP \
>>                         | 0*OPAL_PM_TIMEBASE_STOP  \
>>                         | 1*OPAL_PM_LOSE_USER_CONTEXT \
>>                         | 1*OPAL_PM_LOSE_HYP_CONTEXT \
>>                         | 1*OPAL_PM_LOSE_FULL_CONTEXT \
>> -                      | 1*OPAL_PM_STOP_INST_DEEP,
>> +                      | 1*OPAL_PM_STOP_INST_DEEP \
>> +                      | 1*OPAL_PM_STOP_CPUIDLE,
>>                  .pm_ctrl_reg_val = OPAL_PM_PSSCR_RL(4) \
>>                                   | OPAL_PM_PSSCR_MTL(7) \
>>                                   | OPAL_PM_PSSCR_TR(3) \
>> @@ -546,14 +577,18 @@ static struct cpu_idle_states power9_cpu_idle_states[] = {
>>                  .pm_ctrl_reg_mask = OPAL_PM_PSSCR_MASK },
>>          {
>>                  .name = "stop5",
>> +               .compat_version = "stop5,v1",
>> +               .desc = "core level power gating and turbo boost benefit",
>>                  .latency_ns = 200000,
>>                  .residency_ns = 20000000,
>> +               .legacy_state = 0,
>>                  .flags = 0*OPAL_PM_DEC_STOP \
>>                         | 0*OPAL_PM_TIMEBASE_STOP  \
>>                         | 1*OPAL_PM_LOSE_USER_CONTEXT \
>>                         | 1*OPAL_PM_LOSE_HYP_CONTEXT \
>>                         | 1*OPAL_PM_LOSE_FULL_CONTEXT \
>> -                      | 1*OPAL_PM_STOP_INST_DEEP,
>> +                      | 1*OPAL_PM_STOP_INST_DEEP \
>> +                      | 1*OPAL_PM_STOP_CPUIDLE,
>>                  .pm_ctrl_reg_val = OPAL_PM_PSSCR_RL(5) \
>>                                   | OPAL_PM_PSSCR_MTL(7) \
>>                                   | OPAL_PM_PSSCR_TR(3) \
>> @@ -563,14 +598,18 @@ static struct cpu_idle_states power9_cpu_idle_states[] = {
>>
>>          {
>>                  .name = "stop8",
>> +               .compat_version = "stop8,v1",
>> +               .desc = "core power gating and L2 clock gating",
>>                  .latency_ns = 2000000,
>>                  .residency_ns = 20000000,
>> +               .legacy_state = 0,
>>                  .flags = 1*OPAL_PM_DEC_STOP \
>>                         | 1*OPAL_PM_TIMEBASE_STOP  \
>>                         | 1*OPAL_PM_LOSE_USER_CONTEXT \
>>                         | 1*OPAL_PM_LOSE_HYP_CONTEXT \
>>                         | 1*OPAL_PM_LOSE_FULL_CONTEXT \
>> -                      | 1*OPAL_PM_STOP_INST_DEEP,
>> +                      | 1*OPAL_PM_STOP_INST_DEEP \
>> +                      | 1*OPAL_PM_STOP_CPUIDLE,
>>                  .pm_ctrl_reg_val = OPAL_PM_PSSCR_RL(8) \
>>                                   | OPAL_PM_PSSCR_MTL(11) \
>>                                   | OPAL_PM_PSSCR_TR(3) \
>> @@ -580,14 +619,18 @@ static struct cpu_idle_states power9_cpu_idle_states[] = {
>>
>>          {
>>                  .name = "stop11",
>> +               .compat_version = "stop11,v1",
>> +               .desc = "quad level core and cache power gating",
>>                  .latency_ns = 10000000,
>>                  .residency_ns = 100000000,
>> +               .legacy_state = 0,
>>                  .flags = 1*OPAL_PM_DEC_STOP \
>>                         | 1*OPAL_PM_TIMEBASE_STOP  \
>>                         | 1*OPAL_PM_LOSE_USER_CONTEXT \
>>                         | 1*OPAL_PM_LOSE_HYP_CONTEXT \
>>                         | 1*OPAL_PM_LOSE_FULL_CONTEXT \
>> -                      | 1*OPAL_PM_STOP_INST_DEEP,
>> +                      | 1*OPAL_PM_STOP_INST_DEEP \
>> +                      | 1*OPAL_PM_STOP_CPUHOTPLUG,
>>                  .pm_ctrl_reg_val = OPAL_PM_PSSCR_RL(11) \
>>                                   | OPAL_PM_PSSCR_MTL(11) \
>>                                   | OPAL_PM_PSSCR_TR(3) \
>> @@ -917,34 +960,77 @@ void add_cpu_idle_state_properties(void)
>>                                  }
>>                          }
>>                  }
>> +               if (states[i].legacy_state) {
>> +                       prlog(PR_NOTICE, "SLW: Enabling: %s\n", states[i].name);
>>
>> -               prlog(PR_INFO, "SLW: Enabling: %s\n", states[i].name);
>> +                       /*
>> +                        * If a state is supported add each of its property
>> +                        * to its corresponding property buffer.
>> +                        */
>> +                       strncpy(name_buf, states[i].name, MAX_NAME_LEN);
>> +                       name_buf = name_buf + strlen(states[i].name) + 1;
>>
>> -               /*
>> -                * If a state is supported add each of its property
>> -                * to its corresponding property buffer.
>> -                */
>> -               strncpy(name_buf, states[i].name, MAX_NAME_LEN);
>> -               name_buf = name_buf + strlen(states[i].name) + 1;
>> +                       *latency_ns_buf = cpu_to_fdt32(states[i].latency_ns);
>> +                       latency_ns_buf++;
>>
>> -               *latency_ns_buf = cpu_to_fdt32(states[i].latency_ns);
>> -               latency_ns_buf++;
>> +                       *residency_ns_buf = cpu_to_fdt32(states[i].residency_ns);
>> +                       residency_ns_buf++;
>>
>> -               *residency_ns_buf = cpu_to_fdt32(states[i].residency_ns);
>> -               residency_ns_buf++;
>> +                       *flags_buf = cpu_to_fdt32(states[i].flags);
>> +                       flags_buf++;
>>
>> -               *flags_buf = cpu_to_fdt32(states[i].flags);
>> -               flags_buf++;
>> +                       *pm_ctrl_reg_val_buf = cpu_to_fdt64(states[i].pm_ctrl_reg_val);
>> +                       pm_ctrl_reg_val_buf++;
>>
>> -               *pm_ctrl_reg_val_buf = cpu_to_fdt64(states[i].pm_ctrl_reg_val);
>> -               pm_ctrl_reg_val_buf++;
>> +                       *pm_ctrl_reg_mask_buf = cpu_to_fdt64(states[i].pm_ctrl_reg_mask);
>> +                       pm_ctrl_reg_mask_buf++;
>>
>> -               *pm_ctrl_reg_mask_buf = cpu_to_fdt64(states[i].pm_ctrl_reg_mask);
>> -               pm_ctrl_reg_mask_buf++;
>> +                       /* Increment buffer length trackers */
>> +                       name_buf_len += strlen(states[i].name) + 1;
>> +                       num_supported_idle_states++;
>>
>> -               /* Increment buffer length trackers */
>> -               name_buf_len += strlen(states[i].name) + 1;
>> -               num_supported_idle_states++;
>> +               } else {
>> +                       struct dt_node *parent_node, *dt_state_node;
>> +                       char version_buf[MAX_VERSION_LEN];
>> +                       u32 *lat_buf, *res_buf;
>> +                       u64 *flg_buf, *psscr_buf, *psscr_mask_buf;
>> +
>> +                       lat_buf = malloc(sizeof(u32));
>> +                       res_buf = malloc(sizeof(u32));
>> +                       flg_buf = malloc(sizeof(u64));
>> +                       psscr_buf = malloc(sizeof(u64));
>> +                       psscr_mask_buf = malloc(sizeof(u64));
>> +                       prlog(PR_NOTICE, "SLW: Enabling: %s in new format\n", states[i].name);
>> +
>> +                       parent_node = dt_new_check(power_mgt,  "ibm,idle-states");
>> +                       if (!parent_node)
>> +                               prlog(PR_ERR, "Error creating ibm,idle-states\n");
>> +
>> +                       dt_state_node = dt_new_check(parent_node, states[i].name);
>> +                       if (!dt_state_node)
>> +                               prlog(PR_ERR, "Error creating %s\n", states[i].name);
>> +
>> +                       strncpy(version_buf, states[i].compat_version,
>> +                                       strlen(states[i].compat_version)+1);
>> +                       dt_add_property_strings(dt_state_node, "compatible", version_buf);
>> +
>> +                       *lat_buf = cpu_to_fdt32(states[i].latency_ns);
>> +                       dt_add_property(dt_state_node, "latency-ns", lat_buf, sizeof(u32));
>> +                       *res_buf = cpu_to_fdt32(states[i].residency_ns);
>> +                       dt_add_property(dt_state_node, "residency-ns", res_buf, sizeof(u32));
>> +                       *flg_buf = cpu_to_fdt64(states[i].flags);
>> +                       dt_add_property(dt_state_node, "flags", flg_buf, sizeof(u64));
>> +                       *psscr_buf = cpu_to_fdt64(states[i].pm_ctrl_reg_val);
>> +                       dt_add_property(dt_state_node, "psscr", psscr_buf, sizeof(u64));
>> +                       *psscr_mask_buf = cpu_to_fdt64(states[i].pm_ctrl_reg_mask);
>> +                       dt_add_property(dt_state_node, "psscr-mask", psscr_mask_buf, sizeof(u64));
>> +                       free(res_buf);
>> +                       free(lat_buf);
>> +                       free(flg_buf);
>> +                       free(psscr_buf);
>> +                       free(psscr_mask_buf);
> All the malloc(), free() and manual endian swapping goes away if you
> use dt_add_property_u64() and friends. The existing code uses
> dt_add_property() directly because there isn't a better way to handle
> dynamicly sized arrays, but you need to. Anyway, I doing this:
>
> for_each_state() {
>    if (states[i].legacy_state) {
>       /* old blah */
>    } else {
>      /* new blah */
>    }
> }
>
> Seems bad to me. We only want the legacy arrays to be used if the
> kernel doesn't understand the new format, so the new format should
> contain all the same information as the old one i.e.
>
> for_each_state() {
>    if (states[i].legacy_state) {
>       /* old blah */
>    }
> }
>
> for_each_state() {
>     /* new blah*/
> }
>
> So if you understand the new format then you only need to parse the
> new format and the old one can be ignored.
>
>> +
>> +               }
>>
>>          }
>>
>> diff --git a/include/opal-api.h b/include/opal-api.h
>> index 0b0ae196..f54f3862 100644
>> --- a/include/opal-api.h
>> +++ b/include/opal-api.h
>> @@ -292,6 +292,13 @@
>>   #define OPAL_PM_STOP_INST_FAST         0x00100000
>>   #define OPAL_PM_STOP_INST_DEEP         0x00200000
>>
>> +/*
>> + * Flags for stop states to distinguish between cpuidle and
>> + * cpuoffline type of states.
>> + */
>> +#define OPAL_PM_STOP_CPUIDLE           0x01000000
>> +#define OPAL_PM_STOP_CPUHOTPLUG                0x02000000
> Seems a bit arbitrary and I don't think it's firmware's job to decide
> if a state is "cpu hotplug" or not. IIRC Raptor are interested in
> using STOP11 to support a hibernate mode so there are uses for it
> besides removing a CPU.
>
>> +
>>   #ifndef __ASSEMBLY__
>>
>>   #include <stdbool.h>
>> --
>> 2.17.1
>>
>> _______________________________________________
>> Skiboot mailing list
>> Skiboot@lists.ozlabs.org
>> https://lists.ozlabs.org/listinfo/skiboot
Klaus Heinrich Kiwi Nov. 25, 2019, 1:03 p.m. UTC | #3
On 11/17/2019 10:05 PM, Oliver O'Halloran wrote:
> On Fri, Aug 9, 2019 at 8:55 PM Abhishek Goel <huntbag@linux.vnet.ibm.com> wrote:
>> Add stop states under ibm,idle-states in addition to the current array
>> based device tree properties.
I noted that Abhishek answered that a complete redesign will take place, 
but I couldn't help to ask...
> How does the kernel know if one state node deeper than another? What
> I'm really wondering is whether we need a reg for the individual
> states e.g.:
>
> stop@11 {
>   reg= <11>
>   compatible="stop11-v1"
>   ...
> }
> stop@4 {
>   reg = <4>;
>   compatible="stop4-v1";
>   ...
> }

Your original patch apparently implies that stop states are naturally 
hierarchical (i.e., can only support stop4 if we support stop1, 2, 3). 
If that's the case, perhaps we should also have an hierarchical 
device-tree representation?

  -Klaus
Stewart Smith Nov. 25, 2019, 5:08 p.m. UTC | #4
> On 25 Nov 2019, at 05:04, Klaus Heinrich Kiwi <klaus@linux.vnet.ibm.com> wrote:
> 
> On 11/17/2019 10:05 PM, Oliver O'Halloran wrote:
>>> On Fri, Aug 9, 2019 at 8:55 PM Abhishek Goel <huntbag@linux.vnet.ibm.com> wrote:
>>> Add stop states under ibm,idle-states in addition to the current array
>>> based device tree properties.
> I noted that Abhishek answered that a complete redesign will take place, but I couldn't help to ask...
>> How does the kernel know if one state node deeper than another? What
>> I'm really wondering is whether we need a reg for the individual
>> states e.g.:
>> 
>> stop@11 {
>>  reg= <11>
>>  compatible="stop11-v1"
>>  ...
>> }
>> stop@4 {
>>  reg = <4>;
>>  compatible="stop4-v1";
>>  ...
>> }
> 
> Your original patch apparently implies that stop states are naturally hierarchical (i.e., can only support stop4 if we support stop1, 2, 3). If that's the case, perhaps we should also have an hierarchical device-tree representation?

Kind of. The higher the stop number the higher the saving and latency (ie the deeper the stop), but there’s no reason that supported numbers are linear. Supporting only 0lite, 1, and 5 is possible.
Abhishek Goel Nov. 29, 2019, 5:25 a.m. UTC | #5
Hi,


On 11/25/2019 10:38 PM, Stewart Smith wrote:
>> On 25 Nov 2019, at 05:04, Klaus Heinrich Kiwi <klaus@linux.vnet.ibm.com> wrote:
>>
>> On 11/17/2019 10:05 PM, Oliver O'Halloran wrote:
>>>> On Fri, Aug 9, 2019 at 8:55 PM Abhishek Goel <huntbag@linux.vnet.ibm.com> wrote:
>>>> Add stop states under ibm,idle-states in addition to the current array
>>>> based device tree properties.
>> I noted that Abhishek answered that a complete redesign will take place, but I couldn't help to ask...
>>> How does the kernel know if one state node deeper than another? What
>>> I'm really wondering is whether we need a reg for the individual
>>> states e.g.:
>>>
>>> stop@11 {
>>>   reg= <11>
>>>   compatible="stop11-v1"
>>>   ...
>>> }
>>> stop@4 {
>>>   reg = <4>;
>>>   compatible="stop4-v1";
>>>   ...
>>> }
>> Your original patch apparently implies that stop states are naturally hierarchical (i.e., can only support stop4 if we support stop1, 2, 3). If that's the case, perhaps we should also have an hierarchical device-tree representation?
> Kind of. The higher the stop number the higher the saving and latency (ie the deeper the stop), but there’s no reason that supported numbers are linear. Supporting only 0lite, 1, and 5 is possible.
> _______________________________________________
> Skiboot mailing list
> Skiboot@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/skiboot
Let's say that only stop0, 1 and 5 are enabled state and other states 
have been disabled
and not supported because they have some bug. In case a cpu request 
stop5 and enters
stop, it is possible that it wakes up from a shallower state such as 
stop 2 which is a buggy
state. To prevent this, we must maintain linearity among the supported 
state.

--Abhishek
diff mbox series

Patch

diff --git a/hw/slw.c b/hw/slw.c
index c872b630..1c857d5a 100644
--- a/hw/slw.c
+++ b/hw/slw.c
@@ -370,9 +370,15 @@  static bool idle_prepare_core(struct proc_chip *chip, struct cpu_thread *c)
 }
 
 /* Define device-tree fields */
-#define MAX_NAME_LEN	16
+#define MAX_NAME_LEN		16
+#define MAX_VERSION_LEN		25
+#define MAX_DESCRIPTION_LEN	60
 struct cpu_idle_states {
 	char name[MAX_NAME_LEN];
+	/* Adding compatibility version for evey stop state */
+	char compat_version[MAX_VERSION_LEN];
+	/* Adding description for every stop state */
+	char desc[MAX_DESCRIPTION_LEN];
 	u32 latency_ns;
 	u32 residency_ns;
 	/*
@@ -381,7 +387,9 @@  struct cpu_idle_states {
 	 */
 	u64 pm_ctrl_reg_val;
 	u64 pm_ctrl_reg_mask;
-	u32 flags;
+	u64 flags; /* changing flags from u32 to u64 */
+	/* older kernel with newer firmware will use these states */
+	u32 legacy_state;
 };
 
 static struct cpu_idle_states nap_only_cpu_idle_states[] = {
@@ -461,13 +469,20 @@  static struct cpu_idle_states power9_cpu_idle_states[] = {
 	{
 		.name = "stop0_lite", /* Enter stop0 with no state loss */
 		.latency_ns = 1000,
+		.compat_version = "stop0_lite,v1",
+		.desc = "thread level stop instruction dispatch",
+		.legacy_state = 1,
 		.residency_ns = 10000,
+		/* flags will now also have bits to identify if a state is
+		 * of cpuidle or hotplug type
+		 */
 		.flags = 0*OPAL_PM_DEC_STOP \
 		       | 0*OPAL_PM_TIMEBASE_STOP  \
 		       | 0*OPAL_PM_LOSE_USER_CONTEXT \
 		       | 0*OPAL_PM_LOSE_HYP_CONTEXT \
 		       | 0*OPAL_PM_LOSE_FULL_CONTEXT \
-		       | 1*OPAL_PM_STOP_INST_FAST,
+		       | 1*OPAL_PM_STOP_INST_FAST \
+		       | 1*OPAL_PM_STOP_CPUIDLE,
 		.pm_ctrl_reg_val = OPAL_PM_PSSCR_RL(0) \
 				 | OPAL_PM_PSSCR_MTL(3) \
 				 | OPAL_PM_PSSCR_TR(3),
@@ -475,13 +490,17 @@  static struct cpu_idle_states power9_cpu_idle_states[] = {
 	{
 		.name = "stop0",
 		.latency_ns = 2000,
+		.compat_version = "stop0,v1",
+		.desc = "low latency SMT switching",
 		.residency_ns = 20000,
+		.legacy_state = 1,
 		.flags = 0*OPAL_PM_DEC_STOP \
 		       | 0*OPAL_PM_TIMEBASE_STOP  \
 		       | 1*OPAL_PM_LOSE_USER_CONTEXT \
 		       | 0*OPAL_PM_LOSE_HYP_CONTEXT \
 		       | 0*OPAL_PM_LOSE_FULL_CONTEXT \
-		       | 1*OPAL_PM_STOP_INST_FAST,
+		       | 1*OPAL_PM_STOP_INST_FAST \
+		       | 1*OPAL_PM_STOP_CPUIDLE,
 		.pm_ctrl_reg_val = OPAL_PM_PSSCR_RL(0) \
 				 | OPAL_PM_PSSCR_MTL(3) \
 				 | OPAL_PM_PSSCR_TR(3) \
@@ -493,14 +512,18 @@  static struct cpu_idle_states power9_cpu_idle_states[] = {
 
 	{
 		.name = "stop1",
+		.compat_version = "stop1,v1",
+		.desc = "SMT switching with basic clock gating",
 		.latency_ns = 5000,
 		.residency_ns = 50000,
+		.legacy_state = 0,
 		.flags = 0*OPAL_PM_DEC_STOP \
 		       | 0*OPAL_PM_TIMEBASE_STOP  \
 		       | 1*OPAL_PM_LOSE_USER_CONTEXT \
 		       | 0*OPAL_PM_LOSE_HYP_CONTEXT \
 		       | 0*OPAL_PM_LOSE_FULL_CONTEXT \
-		       | 1*OPAL_PM_STOP_INST_FAST,
+		       | 1*OPAL_PM_STOP_INST_FAST \
+		       | 1*OPAL_PM_STOP_CPUIDLE,
 		.pm_ctrl_reg_val = OPAL_PM_PSSCR_RL(1) \
 				 | OPAL_PM_PSSCR_MTL(3) \
 				 | OPAL_PM_PSSCR_TR(3) \
@@ -514,14 +537,18 @@  static struct cpu_idle_states power9_cpu_idle_states[] = {
 
 	{
 		.name = "stop2",
+		.compat_version = "stop2,v1",
+		.desc = "core level clock gating",
 		.latency_ns = 10000,
 		.residency_ns = 100000,
+		.legacy_state = 0,
 		.flags = 0*OPAL_PM_DEC_STOP \
 		       | 0*OPAL_PM_TIMEBASE_STOP  \
 		       | 1*OPAL_PM_LOSE_USER_CONTEXT \
 		       | 0*OPAL_PM_LOSE_HYP_CONTEXT \
 		       | 0*OPAL_PM_LOSE_FULL_CONTEXT \
-		       | 1*OPAL_PM_STOP_INST_FAST,
+		       | 1*OPAL_PM_STOP_INST_FAST \
+		       | 1*OPAL_PM_STOP_CPUIDLE,
 		.pm_ctrl_reg_val = OPAL_PM_PSSCR_RL(2) \
 				 | OPAL_PM_PSSCR_MTL(3) \
 				 | OPAL_PM_PSSCR_TR(3) \
@@ -530,14 +557,18 @@  static struct cpu_idle_states power9_cpu_idle_states[] = {
 		.pm_ctrl_reg_mask = OPAL_PM_PSSCR_MASK },
 	{
 		.name = "stop4",
+		.compat_version = "stop4,v1",
+		.desc = "core level power gating",
 		.latency_ns = 100000,
 		.residency_ns = 10000000,
+		.legacy_state = 0,
 		.flags = 0*OPAL_PM_DEC_STOP \
 		       | 0*OPAL_PM_TIMEBASE_STOP  \
 		       | 1*OPAL_PM_LOSE_USER_CONTEXT \
 		       | 1*OPAL_PM_LOSE_HYP_CONTEXT \
 		       | 1*OPAL_PM_LOSE_FULL_CONTEXT \
-		       | 1*OPAL_PM_STOP_INST_DEEP,
+		       | 1*OPAL_PM_STOP_INST_DEEP \
+		       | 1*OPAL_PM_STOP_CPUIDLE,
 		.pm_ctrl_reg_val = OPAL_PM_PSSCR_RL(4) \
 				 | OPAL_PM_PSSCR_MTL(7) \
 				 | OPAL_PM_PSSCR_TR(3) \
@@ -546,14 +577,18 @@  static struct cpu_idle_states power9_cpu_idle_states[] = {
 		.pm_ctrl_reg_mask = OPAL_PM_PSSCR_MASK },
 	{
 		.name = "stop5",
+		.compat_version = "stop5,v1",
+		.desc = "core level power gating and turbo boost benefit",
 		.latency_ns = 200000,
 		.residency_ns = 20000000,
+		.legacy_state = 0,
 		.flags = 0*OPAL_PM_DEC_STOP \
 		       | 0*OPAL_PM_TIMEBASE_STOP  \
 		       | 1*OPAL_PM_LOSE_USER_CONTEXT \
 		       | 1*OPAL_PM_LOSE_HYP_CONTEXT \
 		       | 1*OPAL_PM_LOSE_FULL_CONTEXT \
-		       | 1*OPAL_PM_STOP_INST_DEEP,
+		       | 1*OPAL_PM_STOP_INST_DEEP \
+		       | 1*OPAL_PM_STOP_CPUIDLE,
 		.pm_ctrl_reg_val = OPAL_PM_PSSCR_RL(5) \
 				 | OPAL_PM_PSSCR_MTL(7) \
 				 | OPAL_PM_PSSCR_TR(3) \
@@ -563,14 +598,18 @@  static struct cpu_idle_states power9_cpu_idle_states[] = {
 
 	{
 		.name = "stop8",
+		.compat_version = "stop8,v1",
+		.desc = "core power gating and L2 clock gating",
 		.latency_ns = 2000000,
 		.residency_ns = 20000000,
+		.legacy_state = 0,
 		.flags = 1*OPAL_PM_DEC_STOP \
 		       | 1*OPAL_PM_TIMEBASE_STOP  \
 		       | 1*OPAL_PM_LOSE_USER_CONTEXT \
 		       | 1*OPAL_PM_LOSE_HYP_CONTEXT \
 		       | 1*OPAL_PM_LOSE_FULL_CONTEXT \
-		       | 1*OPAL_PM_STOP_INST_DEEP,
+		       | 1*OPAL_PM_STOP_INST_DEEP \
+		       | 1*OPAL_PM_STOP_CPUIDLE,
 		.pm_ctrl_reg_val = OPAL_PM_PSSCR_RL(8) \
 				 | OPAL_PM_PSSCR_MTL(11) \
 				 | OPAL_PM_PSSCR_TR(3) \
@@ -580,14 +619,18 @@  static struct cpu_idle_states power9_cpu_idle_states[] = {
 
 	{
 		.name = "stop11",
+		.compat_version = "stop11,v1",
+		.desc = "quad level core and cache power gating",
 		.latency_ns = 10000000,
 		.residency_ns = 100000000,
+		.legacy_state = 0,
 		.flags = 1*OPAL_PM_DEC_STOP \
 		       | 1*OPAL_PM_TIMEBASE_STOP  \
 		       | 1*OPAL_PM_LOSE_USER_CONTEXT \
 		       | 1*OPAL_PM_LOSE_HYP_CONTEXT \
 		       | 1*OPAL_PM_LOSE_FULL_CONTEXT \
-		       | 1*OPAL_PM_STOP_INST_DEEP,
+		       | 1*OPAL_PM_STOP_INST_DEEP \
+		       | 1*OPAL_PM_STOP_CPUHOTPLUG,
 		.pm_ctrl_reg_val = OPAL_PM_PSSCR_RL(11) \
 				 | OPAL_PM_PSSCR_MTL(11) \
 				 | OPAL_PM_PSSCR_TR(3) \
@@ -917,34 +960,77 @@  void add_cpu_idle_state_properties(void)
 				}
 			}
 		}
+		if (states[i].legacy_state) {
+			prlog(PR_NOTICE, "SLW: Enabling: %s\n", states[i].name);
 
-		prlog(PR_INFO, "SLW: Enabling: %s\n", states[i].name);
+			/*
+			 * If a state is supported add each of its property
+			 * to its corresponding property buffer.
+			 */
+			strncpy(name_buf, states[i].name, MAX_NAME_LEN);
+			name_buf = name_buf + strlen(states[i].name) + 1;
 
-		/*
-		 * If a state is supported add each of its property
-		 * to its corresponding property buffer.
-		 */
-		strncpy(name_buf, states[i].name, MAX_NAME_LEN);
-		name_buf = name_buf + strlen(states[i].name) + 1;
+			*latency_ns_buf = cpu_to_fdt32(states[i].latency_ns);
+			latency_ns_buf++;
 
-		*latency_ns_buf = cpu_to_fdt32(states[i].latency_ns);
-		latency_ns_buf++;
+			*residency_ns_buf = cpu_to_fdt32(states[i].residency_ns);
+			residency_ns_buf++;
 
-		*residency_ns_buf = cpu_to_fdt32(states[i].residency_ns);
-		residency_ns_buf++;
+			*flags_buf = cpu_to_fdt32(states[i].flags);
+			flags_buf++;
 
-		*flags_buf = cpu_to_fdt32(states[i].flags);
-		flags_buf++;
+			*pm_ctrl_reg_val_buf = cpu_to_fdt64(states[i].pm_ctrl_reg_val);
+			pm_ctrl_reg_val_buf++;
 
-		*pm_ctrl_reg_val_buf = cpu_to_fdt64(states[i].pm_ctrl_reg_val);
-		pm_ctrl_reg_val_buf++;
+			*pm_ctrl_reg_mask_buf = cpu_to_fdt64(states[i].pm_ctrl_reg_mask);
+			pm_ctrl_reg_mask_buf++;
 
-		*pm_ctrl_reg_mask_buf = cpu_to_fdt64(states[i].pm_ctrl_reg_mask);
-		pm_ctrl_reg_mask_buf++;
+			/* Increment buffer length trackers */
+			name_buf_len += strlen(states[i].name) + 1;
+			num_supported_idle_states++;
 
-		/* Increment buffer length trackers */
-		name_buf_len += strlen(states[i].name) + 1;
-		num_supported_idle_states++;
+		} else {
+			struct dt_node *parent_node, *dt_state_node;
+			char version_buf[MAX_VERSION_LEN];
+			u32 *lat_buf, *res_buf;
+			u64 *flg_buf, *psscr_buf, *psscr_mask_buf;
+
+			lat_buf = malloc(sizeof(u32));
+			res_buf = malloc(sizeof(u32));
+			flg_buf = malloc(sizeof(u64));
+			psscr_buf = malloc(sizeof(u64));
+			psscr_mask_buf = malloc(sizeof(u64));
+			prlog(PR_NOTICE, "SLW: Enabling: %s in new format\n", states[i].name);
+
+			parent_node = dt_new_check(power_mgt,  "ibm,idle-states");
+			if (!parent_node)
+				prlog(PR_ERR, "Error creating ibm,idle-states\n");
+
+			dt_state_node = dt_new_check(parent_node, states[i].name);
+			if (!dt_state_node)
+				prlog(PR_ERR, "Error creating %s\n", states[i].name);
+
+			strncpy(version_buf, states[i].compat_version,
+					strlen(states[i].compat_version)+1);
+			dt_add_property_strings(dt_state_node, "compatible", version_buf);
+
+			*lat_buf = cpu_to_fdt32(states[i].latency_ns);
+			dt_add_property(dt_state_node, "latency-ns", lat_buf, sizeof(u32));
+			*res_buf = cpu_to_fdt32(states[i].residency_ns);
+			dt_add_property(dt_state_node, "residency-ns", res_buf, sizeof(u32));
+			*flg_buf = cpu_to_fdt64(states[i].flags);
+			dt_add_property(dt_state_node, "flags", flg_buf, sizeof(u64));
+			*psscr_buf = cpu_to_fdt64(states[i].pm_ctrl_reg_val);
+			dt_add_property(dt_state_node, "psscr", psscr_buf, sizeof(u64));
+			*psscr_mask_buf = cpu_to_fdt64(states[i].pm_ctrl_reg_mask);
+			dt_add_property(dt_state_node, "psscr-mask", psscr_mask_buf, sizeof(u64));
+			free(res_buf);
+			free(lat_buf);
+			free(flg_buf);
+			free(psscr_buf);
+			free(psscr_mask_buf);
+
+		}
 
 	}
 
diff --git a/include/opal-api.h b/include/opal-api.h
index 0b0ae196..f54f3862 100644
--- a/include/opal-api.h
+++ b/include/opal-api.h
@@ -292,6 +292,13 @@ 
 #define OPAL_PM_STOP_INST_FAST		0x00100000
 #define OPAL_PM_STOP_INST_DEEP		0x00200000
 
+/*
+ * Flags for stop states to distinguish between cpuidle and
+ * cpuoffline type of states.
+ */
+#define OPAL_PM_STOP_CPUIDLE		0x01000000
+#define OPAL_PM_STOP_CPUHOTPLUG		0x02000000
+
 #ifndef __ASSEMBLY__
 
 #include <stdbool.h>