diff mbox series

powerpc/pseries: Add pool idle time at LPAR boot

Message ID 20240405101340.149171-1-sshegde@linux.ibm.com (mailing list archive)
State Superseded
Headers show
Series powerpc/pseries: Add pool idle time at LPAR boot | expand

Checks

Context Check Description
snowpatch_ozlabs/github-powerpc_selftests success Successfully ran 8 jobs.
snowpatch_ozlabs/github-powerpc_ppctests success Successfully ran 8 jobs.
snowpatch_ozlabs/github-powerpc_kernel_qemu success Successfully ran 23 jobs.
snowpatch_ozlabs/github-powerpc_clang success Successfully ran 6 jobs.
snowpatch_ozlabs/github-powerpc_sparse success Successfully ran 4 jobs.

Commit Message

Shrikanth Hegde April 5, 2024, 10:13 a.m. UTC
When there are no options specified for lparstat, it is expected to
give reports since LPAR(Logical Partition) boot. App is an indicator
for available processor pool in an Shared Processor LPAR(SPLPAR). App is
derived using pool_idle_time which is obtained using H_PIC call.

The interval based reports show correct App value while since boot
report shows very high App values. This happens because in that case app
is obtained by dividing pool idle time by LPAR uptime. Since pool idle
time is reported by the PowerVM hypervisor since its boot, it need not
align with LPAR boot. This leads to large App values.

To fix that export boot pool idle time in lparcfg and powerpc-utils will
use this info to derive App as below for since boot reports.

App = (pool idle time - boot pool idle time) / (uptime * timebase)

Results:: Observe app values.
====================== Shared LPAR ================================
lparstat
System Configuration
type=Shared mode=Uncapped smt=8 lcpu=12 mem=15573440 kB cpus=37 ent=12.00

reboot
stress-ng --cpu=$(nproc) -t 600
sleep 600
So in this case app is expected to close to 37-6=31.

====== 6.9-rc1 and lparstat 1.3.10  =============
%user  %sys %wait    %idle    physc %entc lbusy   app  vcsw phint
----- ----- -----    -----    ----- ----- ----- ----- ----- -----
47.48  0.01  0.00    52.51     0.00  0.00 47.49 69099.72 541547    21

=== With this patch and powerpc-utils patch to do the above equation ===
%user  %sys %wait    %idle    physc %entc lbusy   app  vcsw phint
----- ----- -----    -----    ----- ----- ----- ----- ----- -----
47.48  0.01  0.00    52.51     5.73 47.75 47.49 31.21 541753    21
=====================================================================

Note: physc, purr/idle purr being inaccurate is being handled in a
separate patch in powerpc-utils tree.

Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com>
---
Note:

This patch needs to merged first in the kernel for the powerpc-utils
patches to work. powerpc-utils patches will be posted to its mailing
list and link would be found in the reply to this patch if available.

arch/powerpc/platforms/pseries/lparcfg.c | 7 +++++++
 1 file changed, 7 insertions(+)

--
2.39.3

Comments

Shrikanth Hegde April 5, 2024, 11:14 a.m. UTC | #1
On 4/5/24 3:43 PM, Shrikanth Hegde wrote:
> When there are no options specified for lparstat, it is expected to
> give reports since LPAR(Logical Partition) boot. App is an indicator
> for available processor pool in an Shared Processor LPAR(SPLPAR). App is
> derived using pool_idle_time which is obtained using H_PIC call.
> 
powerpc-utils link: https://groups.google.com/g/powerpc-utils-devel/c/WZFxrm2aCzU
Nathan Lynch April 5, 2024, 12:49 p.m. UTC | #2
Shrikanth Hegde <sshegde@linux.ibm.com> writes:
> When there are no options specified for lparstat, it is expected to
> give reports since LPAR(Logical Partition) boot. App is an indicator
> for available processor pool in an Shared Processor LPAR(SPLPAR). App is
> derived using pool_idle_time which is obtained using H_PIC call.

If "App" is short for "Available Procesoor Pool" then it should be
written "APP" and the it should be introduced and defined more clearly
than this.


> The interval based reports show correct App value while since boot
> report shows very high App values. This happens because in that case app
> is obtained by dividing pool idle time by LPAR uptime. Since pool idle
> time is reported by the PowerVM hypervisor since its boot, it need not
> align with LPAR boot. This leads to large App values.
>
> To fix that export boot pool idle time in lparcfg and powerpc-utils will
> use this info to derive App as below for since boot reports.
>
> App = (pool idle time - boot pool idle time) / (uptime * timebase)
>
> Results:: Observe app values.
> ====================== Shared LPAR ================================
> lparstat
> System Configuration
> type=Shared mode=Uncapped smt=8 lcpu=12 mem=15573440 kB cpus=37 ent=12.00
>
> reboot
> stress-ng --cpu=$(nproc) -t 600
> sleep 600
> So in this case app is expected to close to 37-6=31.
>
> ====== 6.9-rc1 and lparstat 1.3.10  =============
> %user  %sys %wait    %idle    physc %entc lbusy   app  vcsw phint
> ----- ----- -----    -----    ----- ----- ----- ----- ----- -----
> 47.48  0.01  0.00    52.51     0.00  0.00 47.49 69099.72 541547    21
>
> === With this patch and powerpc-utils patch to do the above equation ===
> %user  %sys %wait    %idle    physc %entc lbusy   app  vcsw phint
> ----- ----- -----    -----    ----- ----- ----- ----- ----- -----
> 47.48  0.01  0.00    52.51     5.73 47.75 47.49 31.21 541753    21
> =====================================================================
>
> Note: physc, purr/idle purr being inaccurate is being handled in a
> separate patch in powerpc-utils tree.
>
> Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com>
> ---
> Note:
>
> This patch needs to merged first in the kernel for the powerpc-utils
> patches to work. powerpc-utils patches will be posted to its mailing
> list and link would be found in the reply to this patch if available.
>
> arch/powerpc/platforms/pseries/lparcfg.c | 7 +++++++
>  1 file changed, 7 insertions(+)
>
> diff --git a/arch/powerpc/platforms/pseries/lparcfg.c b/arch/powerpc/platforms/pseries/lparcfg.c
> index f73c4d1c26af..8df4e7c529d7 100644
> --- a/arch/powerpc/platforms/pseries/lparcfg.c
> +++ b/arch/powerpc/platforms/pseries/lparcfg.c
> @@ -184,6 +184,8 @@ static unsigned h_pic(unsigned long *pool_idle_time,
>  	return rc;
>  }
>
> +unsigned long boot_pool_idle_time;

Should be static, and u64. Better to use explicitly sized types for data
at the kernel-hypervisor boundary.

> +
>  /*
>   * parse_ppp_data
>   * Parse out the data returned from h_get_ppp and h_pic
> @@ -218,6 +220,7 @@ static void parse_ppp_data(struct seq_file *m)
>  		h_pic(&pool_idle_time, &pool_procs);
>  		seq_printf(m, "pool_idle_time=%ld\n", pool_idle_time);
>  		seq_printf(m, "pool_num_procs=%ld\n", pool_procs);
> +		seq_printf(m, "boot_pool_idle_time=%ld\n", boot_pool_idle_time);

If boot_pool_idle_time is unsigned then the format string should be %ul
or similar, not %ld.

>  	}
>
>  	seq_printf(m, "unallocated_capacity_weight=%d\n",
> @@ -792,6 +795,7 @@ static const struct proc_ops lparcfg_proc_ops = {
>  static int __init lparcfg_init(void)
>  {
>  	umode_t mode = 0444;
> +	unsigned long num_procs;
>
>  	/* Allow writing if we have FW_FEATURE_SPLPAR */
>  	if (firmware_has_feature(FW_FEATURE_SPLPAR))
> @@ -801,6 +805,9 @@ static int __init lparcfg_init(void)
>  		printk(KERN_ERR "Failed to create powerpc/lparcfg\n");
>  		return -EIO;
>  	}
> +
> +	h_pic(&boot_pool_idle_time, &num_procs);

h_pic() can fail, leaving the out parameters uninitialized.

> +
>  	return 0;
>  }
>  machine_device_initcall(pseries, lparcfg_init);
> --
> 2.39.3
Shrikanth Hegde April 5, 2024, 5:02 p.m. UTC | #3
On 4/5/24 6:19 PM, Nathan Lynch wrote:
> Shrikanth Hegde <sshegde@linux.ibm.com> writes:

Hi Nathan, Thanks for reviewing this.

>> When there are no options specified for lparstat, it is expected to
>> give reports since LPAR(Logical Partition) boot. App is an indicator
>> for available processor pool in an Shared Processor LPAR(SPLPAR). App is
>> derived using pool_idle_time which is obtained using H_PIC call.
> 
> If "App" is short for "Available Procesoor Pool" then it should be
> written "APP" and the it should be introduced and defined more clearly
> than this.
> 

Ok.I reworded it for v2. 

yes APP is Available Processor Pool. 

> 
>> The interval based reports show correct App value while since boot
>> report shows very high App values. This happens because in that case app
>> is obtained by dividing pool idle time by LPAR uptime. Since pool idle
>> time is reported by the PowerVM hypervisor since its boot, it need not
>> align with LPAR boot. This leads to large App values.
>>
>> To fix that export boot pool idle time in lparcfg and powerpc-utils will
>> use this info to derive App as below for since boot reports.
>>
>> App = (pool idle time - boot pool idle time) / (uptime * timebase)
>>
>> Results:: Observe app values.
>> ====================== Shared LPAR ================================
>> lparstat
>> System Configuration
>> type=Shared mode=Uncapped smt=8 lcpu=12 mem=15573440 kB cpus=37 ent=12.00
>>
>> reboot
>> stress-ng --cpu=$(nproc) -t 600
>> sleep 600
>> So in this case app is expected to close to 37-6=31.
>>
>> ====== 6.9-rc1 and lparstat 1.3.10  =============
>> %user  %sys %wait    %idle    physc %entc lbusy   app  vcsw phint
>> ----- ----- -----    -----    ----- ----- ----- ----- ----- -----
>> 47.48  0.01  0.00    52.51     0.00  0.00 47.49 69099.72 541547    21
>>
>> === With this patch and powerpc-utils patch to do the above equation ===
>> %user  %sys %wait    %idle    physc %entc lbusy   app  vcsw phint
>> ----- ----- -----    -----    ----- ----- ----- ----- ----- -----
>> 47.48  0.01  0.00    52.51     5.73 47.75 47.49 31.21 541753    21
>> =====================================================================
>>
>> Note: physc, purr/idle purr being inaccurate is being handled in a
>> separate patch in powerpc-utils tree.
>>
>> Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com>
>> ---
>> Note:
>>
>> This patch needs to merged first in the kernel for the powerpc-utils
>> patches to work. powerpc-utils patches will be posted to its mailing
>> list and link would be found in the reply to this patch if available.
>>
>> arch/powerpc/platforms/pseries/lparcfg.c | 7 +++++++
>>  1 file changed, 7 insertions(+)
>>
>> diff --git a/arch/powerpc/platforms/pseries/lparcfg.c b/arch/powerpc/platforms/pseries/lparcfg.c
>> index f73c4d1c26af..8df4e7c529d7 100644
>> --- a/arch/powerpc/platforms/pseries/lparcfg.c
>> +++ b/arch/powerpc/platforms/pseries/lparcfg.c
>> @@ -184,6 +184,8 @@ static unsigned h_pic(unsigned long *pool_idle_time,
>>  	return rc;
>>  }
>>
>> +unsigned long boot_pool_idle_time;
> 
> Should be static, and u64. Better to use explicitly sized types for data
> at the kernel-hypervisor boundary.

Current usage of h_pic doesn't follow this either. Are you suggesting we change that 
as well? Or is this applicable to only boot_pool_idle_time?

For example in parse_ppp_data: 

        if (lppaca_shared_proc()) {
                unsigned long pool_idle_time, pool_procs;

		h_pic(&pool_idle_time, &pool_procs);
                seq_printf(m, "pool_idle_time=%ld\n", pool_idle_time);
                seq_printf(m, "pool_num_procs=%ld\n", pool_procs);

> 
>> +
>>  /*
>>   * parse_ppp_data
>>   * Parse out the data returned from h_get_ppp and h_pic
>> @@ -218,6 +220,7 @@ static void parse_ppp_data(struct seq_file *m)
>>  		h_pic(&pool_idle_time, &pool_procs);
>>  		seq_printf(m, "pool_idle_time=%ld\n", pool_idle_time);
>>  		seq_printf(m, "pool_num_procs=%ld\n", pool_procs);
>> +		seq_printf(m, "boot_pool_idle_time=%ld\n", boot_pool_idle_time);
> 
> If boot_pool_idle_time is unsigned then the format string should be %ul
> or similar, not %ld.
> 
>>  	}
>>
>>  	seq_printf(m, "unallocated_capacity_weight=%d\n",
>> @@ -792,6 +795,7 @@ static const struct proc_ops lparcfg_proc_ops = {
>>  static int __init lparcfg_init(void)
>>  {
>>  	umode_t mode = 0444;
>> +	unsigned long num_procs;
>>
>>  	/* Allow writing if we have FW_FEATURE_SPLPAR */
>>  	if (firmware_has_feature(FW_FEATURE_SPLPAR))
>> @@ -801,6 +805,9 @@ static int __init lparcfg_init(void)
>>  		printk(KERN_ERR "Failed to create powerpc/lparcfg\n");
>>  		return -EIO;
>>  	}
>> +
>> +	h_pic(&boot_pool_idle_time, &num_procs);
> 
> h_pic() can fail, leaving the out parameters uninitialized.

Naveen pointed to me this a while ago, but I forgot. 

Currently h_pic return value is not checked at all, either at boor or at runtime. 
When it fails, should we re-try or just print a kernel debug? What would be expected 
behavior? because if it fails, it would anyway result in wrong values of app even 
if the variables are initialized to 0.

> 
>> +
>>  	return 0;
>>  }
>>  machine_device_initcall(pseries, lparcfg_init);
>> --
>> 2.39.3
Nathan Lynch April 5, 2024, 8:03 p.m. UTC | #4
Shrikanth Hegde <sshegde@linux.ibm.com> writes:
> On 4/5/24 6:19 PM, Nathan Lynch wrote:
>> Shrikanth Hegde <sshegde@linux.ibm.com> writes:
>
> Hi Nathan, Thanks for reviewing this.
>
>>> When there are no options specified for lparstat, it is expected to
>>> give reports since LPAR(Logical Partition) boot. App is an indicator
>>> for available processor pool in an Shared Processor LPAR(SPLPAR). App is
>>> derived using pool_idle_time which is obtained using H_PIC call.
>> 
>> If "App" is short for "Available Procesoor Pool" then it should be
>> written "APP" and the it should be introduced and defined more clearly
>> than this.
>> 
>
> Ok.I reworded it for v2. 
>
> yes APP is Available Processor Pool. 
>
>> 
>>> The interval based reports show correct App value while since boot
>>> report shows very high App values. This happens because in that case app
>>> is obtained by dividing pool idle time by LPAR uptime. Since pool idle
>>> time is reported by the PowerVM hypervisor since its boot, it need not
>>> align with LPAR boot. This leads to large App values.
>>>
>>> To fix that export boot pool idle time in lparcfg and powerpc-utils will
>>> use this info to derive App as below for since boot reports.
>>>
>>> App = (pool idle time - boot pool idle time) / (uptime * timebase)
>>>
>>> Results:: Observe app values.
>>> ====================== Shared LPAR ================================
>>> lparstat
>>> System Configuration
>>> type=Shared mode=Uncapped smt=8 lcpu=12 mem=15573440 kB cpus=37 ent=12.00
>>>
>>> reboot
>>> stress-ng --cpu=$(nproc) -t 600
>>> sleep 600
>>> So in this case app is expected to close to 37-6=31.
>>>
>>> ====== 6.9-rc1 and lparstat 1.3.10  =============
>>> %user  %sys %wait    %idle    physc %entc lbusy   app  vcsw phint
>>> ----- ----- -----    -----    ----- ----- ----- ----- ----- -----
>>> 47.48  0.01  0.00    52.51     0.00  0.00 47.49 69099.72 541547    21
>>>
>>> === With this patch and powerpc-utils patch to do the above equation ===
>>> %user  %sys %wait    %idle    physc %entc lbusy   app  vcsw phint
>>> ----- ----- -----    -----    ----- ----- ----- ----- ----- -----
>>> 47.48  0.01  0.00    52.51     5.73 47.75 47.49 31.21 541753    21
>>> =====================================================================
>>>
>>> Note: physc, purr/idle purr being inaccurate is being handled in a
>>> separate patch in powerpc-utils tree.
>>>
>>> Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com>
>>> ---
>>> Note:
>>>
>>> This patch needs to merged first in the kernel for the powerpc-utils
>>> patches to work. powerpc-utils patches will be posted to its mailing
>>> list and link would be found in the reply to this patch if available.
>>>
>>> arch/powerpc/platforms/pseries/lparcfg.c | 7 +++++++
>>>  1 file changed, 7 insertions(+)
>>>
>>> diff --git a/arch/powerpc/platforms/pseries/lparcfg.c b/arch/powerpc/platforms/pseries/lparcfg.c
>>> index f73c4d1c26af..8df4e7c529d7 100644
>>> --- a/arch/powerpc/platforms/pseries/lparcfg.c
>>> +++ b/arch/powerpc/platforms/pseries/lparcfg.c
>>> @@ -184,6 +184,8 @@ static unsigned h_pic(unsigned long *pool_idle_time,
>>>  	return rc;
>>>  }
>>>
>>> +unsigned long boot_pool_idle_time;
>> 
>> Should be static, and u64. Better to use explicitly sized types for data
>> at the kernel-hypervisor boundary.
>
> Current usage of h_pic doesn't follow this either. Are you suggesting we change that 
> as well?

Yes pretty much. h_pic() as currently written and used has some
problems:

  static unsigned h_pic(unsigned long *pool_idle_time,
                        unsigned long *num_procs)
  {
          unsigned long rc;
          unsigned long retbuf[PLPAR_HCALL_BUFSIZE];
  
          rc = plpar_hcall(H_PIC, retbuf);
  
          *pool_idle_time = retbuf[0];
          *num_procs = retbuf[1];
  
          return rc;
  }

* Coerces the return value of plpar_hcall() to unsigned -- hcall
  errors are negative.
* Assigns *pool_idle_time and *num_procs using uninitialized
  data when H_PIC is unsuccessful.
* Assigns the outparams unconditionally; would be nicer if it allowed
  callers to pass NULL so they don't have to provide dummy inputs that
  aren't even used, as in your change.
* Should follow Linux -errno return value convention in the absence
  of a need for the specific hcall status in its callers.

>>> @@ -801,6 +805,9 @@ static int __init lparcfg_init(void)
>>>  		printk(KERN_ERR "Failed to create powerpc/lparcfg\n");
>>>  		return -EIO;
>>>  	}
>>> +
>>> +	h_pic(&boot_pool_idle_time, &num_procs);
>> 
>> h_pic() can fail, leaving the out parameters uninitialized.
>
> Naveen pointed to me this a while ago, but I forgot. 
>
> Currently h_pic return value is not checked at all, either at boor or at runtime. 
> When it fails, should we re-try or just print a kernel debug? What would be expected 
> behavior? because if it fails, it would anyway result in wrong values of app even 
> if the variables are initialized to 0.

There's nothing in the spec for H_PIC that suggests retrying on failure.
I'm not really in favor of printing a message about it either; that
practice tends to result in log noise in non-PowerVM guests.

When H_PIC doesn't work the corresponding lines should not be emitted in
/proc/powerpc/lparcfg IMO.
diff mbox series

Patch

diff --git a/arch/powerpc/platforms/pseries/lparcfg.c b/arch/powerpc/platforms/pseries/lparcfg.c
index f73c4d1c26af..8df4e7c529d7 100644
--- a/arch/powerpc/platforms/pseries/lparcfg.c
+++ b/arch/powerpc/platforms/pseries/lparcfg.c
@@ -184,6 +184,8 @@  static unsigned h_pic(unsigned long *pool_idle_time,
 	return rc;
 }

+unsigned long boot_pool_idle_time;
+
 /*
  * parse_ppp_data
  * Parse out the data returned from h_get_ppp and h_pic
@@ -218,6 +220,7 @@  static void parse_ppp_data(struct seq_file *m)
 		h_pic(&pool_idle_time, &pool_procs);
 		seq_printf(m, "pool_idle_time=%ld\n", pool_idle_time);
 		seq_printf(m, "pool_num_procs=%ld\n", pool_procs);
+		seq_printf(m, "boot_pool_idle_time=%ld\n", boot_pool_idle_time);
 	}

 	seq_printf(m, "unallocated_capacity_weight=%d\n",
@@ -792,6 +795,7 @@  static const struct proc_ops lparcfg_proc_ops = {
 static int __init lparcfg_init(void)
 {
 	umode_t mode = 0444;
+	unsigned long num_procs;

 	/* Allow writing if we have FW_FEATURE_SPLPAR */
 	if (firmware_has_feature(FW_FEATURE_SPLPAR))
@@ -801,6 +805,9 @@  static int __init lparcfg_init(void)
 		printk(KERN_ERR "Failed to create powerpc/lparcfg\n");
 		return -EIO;
 	}
+
+	h_pic(&boot_pool_idle_time, &num_procs);
+
 	return 0;
 }
 machine_device_initcall(pseries, lparcfg_init);