Message ID | 20240405101340.149171-1-sshegde@linux.ibm.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | powerpc/pseries: Add pool idle time at LPAR boot | expand |
Context | Check | Description |
---|---|---|
snowpatch_ozlabs/github-powerpc_selftests | success | Successfully ran 8 jobs. |
snowpatch_ozlabs/github-powerpc_ppctests | success | Successfully ran 8 jobs. |
snowpatch_ozlabs/github-powerpc_kernel_qemu | success | Successfully ran 23 jobs. |
snowpatch_ozlabs/github-powerpc_clang | success | Successfully ran 6 jobs. |
snowpatch_ozlabs/github-powerpc_sparse | success | Successfully ran 4 jobs. |
On 4/5/24 3:43 PM, Shrikanth Hegde wrote: > When there are no options specified for lparstat, it is expected to > give reports since LPAR(Logical Partition) boot. App is an indicator > for available processor pool in an Shared Processor LPAR(SPLPAR). App is > derived using pool_idle_time which is obtained using H_PIC call. > powerpc-utils link: https://groups.google.com/g/powerpc-utils-devel/c/WZFxrm2aCzU
Shrikanth Hegde <sshegde@linux.ibm.com> writes: > When there are no options specified for lparstat, it is expected to > give reports since LPAR(Logical Partition) boot. App is an indicator > for available processor pool in an Shared Processor LPAR(SPLPAR). App is > derived using pool_idle_time which is obtained using H_PIC call. If "App" is short for "Available Procesoor Pool" then it should be written "APP" and the it should be introduced and defined more clearly than this. > The interval based reports show correct App value while since boot > report shows very high App values. This happens because in that case app > is obtained by dividing pool idle time by LPAR uptime. Since pool idle > time is reported by the PowerVM hypervisor since its boot, it need not > align with LPAR boot. This leads to large App values. > > To fix that export boot pool idle time in lparcfg and powerpc-utils will > use this info to derive App as below for since boot reports. > > App = (pool idle time - boot pool idle time) / (uptime * timebase) > > Results:: Observe app values. > ====================== Shared LPAR ================================ > lparstat > System Configuration > type=Shared mode=Uncapped smt=8 lcpu=12 mem=15573440 kB cpus=37 ent=12.00 > > reboot > stress-ng --cpu=$(nproc) -t 600 > sleep 600 > So in this case app is expected to close to 37-6=31. > > ====== 6.9-rc1 and lparstat 1.3.10 ============= > %user %sys %wait %idle physc %entc lbusy app vcsw phint > ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- > 47.48 0.01 0.00 52.51 0.00 0.00 47.49 69099.72 541547 21 > > === With this patch and powerpc-utils patch to do the above equation === > %user %sys %wait %idle physc %entc lbusy app vcsw phint > ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- > 47.48 0.01 0.00 52.51 5.73 47.75 47.49 31.21 541753 21 > ===================================================================== > > Note: physc, purr/idle purr being inaccurate is being handled in a > separate patch in powerpc-utils tree. > > Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com> > --- > Note: > > This patch needs to merged first in the kernel for the powerpc-utils > patches to work. powerpc-utils patches will be posted to its mailing > list and link would be found in the reply to this patch if available. > > arch/powerpc/platforms/pseries/lparcfg.c | 7 +++++++ > 1 file changed, 7 insertions(+) > > diff --git a/arch/powerpc/platforms/pseries/lparcfg.c b/arch/powerpc/platforms/pseries/lparcfg.c > index f73c4d1c26af..8df4e7c529d7 100644 > --- a/arch/powerpc/platforms/pseries/lparcfg.c > +++ b/arch/powerpc/platforms/pseries/lparcfg.c > @@ -184,6 +184,8 @@ static unsigned h_pic(unsigned long *pool_idle_time, > return rc; > } > > +unsigned long boot_pool_idle_time; Should be static, and u64. Better to use explicitly sized types for data at the kernel-hypervisor boundary. > + > /* > * parse_ppp_data > * Parse out the data returned from h_get_ppp and h_pic > @@ -218,6 +220,7 @@ static void parse_ppp_data(struct seq_file *m) > h_pic(&pool_idle_time, &pool_procs); > seq_printf(m, "pool_idle_time=%ld\n", pool_idle_time); > seq_printf(m, "pool_num_procs=%ld\n", pool_procs); > + seq_printf(m, "boot_pool_idle_time=%ld\n", boot_pool_idle_time); If boot_pool_idle_time is unsigned then the format string should be %ul or similar, not %ld. > } > > seq_printf(m, "unallocated_capacity_weight=%d\n", > @@ -792,6 +795,7 @@ static const struct proc_ops lparcfg_proc_ops = { > static int __init lparcfg_init(void) > { > umode_t mode = 0444; > + unsigned long num_procs; > > /* Allow writing if we have FW_FEATURE_SPLPAR */ > if (firmware_has_feature(FW_FEATURE_SPLPAR)) > @@ -801,6 +805,9 @@ static int __init lparcfg_init(void) > printk(KERN_ERR "Failed to create powerpc/lparcfg\n"); > return -EIO; > } > + > + h_pic(&boot_pool_idle_time, &num_procs); h_pic() can fail, leaving the out parameters uninitialized. > + > return 0; > } > machine_device_initcall(pseries, lparcfg_init); > -- > 2.39.3
On 4/5/24 6:19 PM, Nathan Lynch wrote: > Shrikanth Hegde <sshegde@linux.ibm.com> writes: Hi Nathan, Thanks for reviewing this. >> When there are no options specified for lparstat, it is expected to >> give reports since LPAR(Logical Partition) boot. App is an indicator >> for available processor pool in an Shared Processor LPAR(SPLPAR). App is >> derived using pool_idle_time which is obtained using H_PIC call. > > If "App" is short for "Available Procesoor Pool" then it should be > written "APP" and the it should be introduced and defined more clearly > than this. > Ok.I reworded it for v2. yes APP is Available Processor Pool. > >> The interval based reports show correct App value while since boot >> report shows very high App values. This happens because in that case app >> is obtained by dividing pool idle time by LPAR uptime. Since pool idle >> time is reported by the PowerVM hypervisor since its boot, it need not >> align with LPAR boot. This leads to large App values. >> >> To fix that export boot pool idle time in lparcfg and powerpc-utils will >> use this info to derive App as below for since boot reports. >> >> App = (pool idle time - boot pool idle time) / (uptime * timebase) >> >> Results:: Observe app values. >> ====================== Shared LPAR ================================ >> lparstat >> System Configuration >> type=Shared mode=Uncapped smt=8 lcpu=12 mem=15573440 kB cpus=37 ent=12.00 >> >> reboot >> stress-ng --cpu=$(nproc) -t 600 >> sleep 600 >> So in this case app is expected to close to 37-6=31. >> >> ====== 6.9-rc1 and lparstat 1.3.10 ============= >> %user %sys %wait %idle physc %entc lbusy app vcsw phint >> ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- >> 47.48 0.01 0.00 52.51 0.00 0.00 47.49 69099.72 541547 21 >> >> === With this patch and powerpc-utils patch to do the above equation === >> %user %sys %wait %idle physc %entc lbusy app vcsw phint >> ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- >> 47.48 0.01 0.00 52.51 5.73 47.75 47.49 31.21 541753 21 >> ===================================================================== >> >> Note: physc, purr/idle purr being inaccurate is being handled in a >> separate patch in powerpc-utils tree. >> >> Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com> >> --- >> Note: >> >> This patch needs to merged first in the kernel for the powerpc-utils >> patches to work. powerpc-utils patches will be posted to its mailing >> list and link would be found in the reply to this patch if available. >> >> arch/powerpc/platforms/pseries/lparcfg.c | 7 +++++++ >> 1 file changed, 7 insertions(+) >> >> diff --git a/arch/powerpc/platforms/pseries/lparcfg.c b/arch/powerpc/platforms/pseries/lparcfg.c >> index f73c4d1c26af..8df4e7c529d7 100644 >> --- a/arch/powerpc/platforms/pseries/lparcfg.c >> +++ b/arch/powerpc/platforms/pseries/lparcfg.c >> @@ -184,6 +184,8 @@ static unsigned h_pic(unsigned long *pool_idle_time, >> return rc; >> } >> >> +unsigned long boot_pool_idle_time; > > Should be static, and u64. Better to use explicitly sized types for data > at the kernel-hypervisor boundary. Current usage of h_pic doesn't follow this either. Are you suggesting we change that as well? Or is this applicable to only boot_pool_idle_time? For example in parse_ppp_data: if (lppaca_shared_proc()) { unsigned long pool_idle_time, pool_procs; h_pic(&pool_idle_time, &pool_procs); seq_printf(m, "pool_idle_time=%ld\n", pool_idle_time); seq_printf(m, "pool_num_procs=%ld\n", pool_procs); > >> + >> /* >> * parse_ppp_data >> * Parse out the data returned from h_get_ppp and h_pic >> @@ -218,6 +220,7 @@ static void parse_ppp_data(struct seq_file *m) >> h_pic(&pool_idle_time, &pool_procs); >> seq_printf(m, "pool_idle_time=%ld\n", pool_idle_time); >> seq_printf(m, "pool_num_procs=%ld\n", pool_procs); >> + seq_printf(m, "boot_pool_idle_time=%ld\n", boot_pool_idle_time); > > If boot_pool_idle_time is unsigned then the format string should be %ul > or similar, not %ld. > >> } >> >> seq_printf(m, "unallocated_capacity_weight=%d\n", >> @@ -792,6 +795,7 @@ static const struct proc_ops lparcfg_proc_ops = { >> static int __init lparcfg_init(void) >> { >> umode_t mode = 0444; >> + unsigned long num_procs; >> >> /* Allow writing if we have FW_FEATURE_SPLPAR */ >> if (firmware_has_feature(FW_FEATURE_SPLPAR)) >> @@ -801,6 +805,9 @@ static int __init lparcfg_init(void) >> printk(KERN_ERR "Failed to create powerpc/lparcfg\n"); >> return -EIO; >> } >> + >> + h_pic(&boot_pool_idle_time, &num_procs); > > h_pic() can fail, leaving the out parameters uninitialized. Naveen pointed to me this a while ago, but I forgot. Currently h_pic return value is not checked at all, either at boor or at runtime. When it fails, should we re-try or just print a kernel debug? What would be expected behavior? because if it fails, it would anyway result in wrong values of app even if the variables are initialized to 0. > >> + >> return 0; >> } >> machine_device_initcall(pseries, lparcfg_init); >> -- >> 2.39.3
Shrikanth Hegde <sshegde@linux.ibm.com> writes: > On 4/5/24 6:19 PM, Nathan Lynch wrote: >> Shrikanth Hegde <sshegde@linux.ibm.com> writes: > > Hi Nathan, Thanks for reviewing this. > >>> When there are no options specified for lparstat, it is expected to >>> give reports since LPAR(Logical Partition) boot. App is an indicator >>> for available processor pool in an Shared Processor LPAR(SPLPAR). App is >>> derived using pool_idle_time which is obtained using H_PIC call. >> >> If "App" is short for "Available Procesoor Pool" then it should be >> written "APP" and the it should be introduced and defined more clearly >> than this. >> > > Ok.I reworded it for v2. > > yes APP is Available Processor Pool. > >> >>> The interval based reports show correct App value while since boot >>> report shows very high App values. This happens because in that case app >>> is obtained by dividing pool idle time by LPAR uptime. Since pool idle >>> time is reported by the PowerVM hypervisor since its boot, it need not >>> align with LPAR boot. This leads to large App values. >>> >>> To fix that export boot pool idle time in lparcfg and powerpc-utils will >>> use this info to derive App as below for since boot reports. >>> >>> App = (pool idle time - boot pool idle time) / (uptime * timebase) >>> >>> Results:: Observe app values. >>> ====================== Shared LPAR ================================ >>> lparstat >>> System Configuration >>> type=Shared mode=Uncapped smt=8 lcpu=12 mem=15573440 kB cpus=37 ent=12.00 >>> >>> reboot >>> stress-ng --cpu=$(nproc) -t 600 >>> sleep 600 >>> So in this case app is expected to close to 37-6=31. >>> >>> ====== 6.9-rc1 and lparstat 1.3.10 ============= >>> %user %sys %wait %idle physc %entc lbusy app vcsw phint >>> ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- >>> 47.48 0.01 0.00 52.51 0.00 0.00 47.49 69099.72 541547 21 >>> >>> === With this patch and powerpc-utils patch to do the above equation === >>> %user %sys %wait %idle physc %entc lbusy app vcsw phint >>> ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- >>> 47.48 0.01 0.00 52.51 5.73 47.75 47.49 31.21 541753 21 >>> ===================================================================== >>> >>> Note: physc, purr/idle purr being inaccurate is being handled in a >>> separate patch in powerpc-utils tree. >>> >>> Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com> >>> --- >>> Note: >>> >>> This patch needs to merged first in the kernel for the powerpc-utils >>> patches to work. powerpc-utils patches will be posted to its mailing >>> list and link would be found in the reply to this patch if available. >>> >>> arch/powerpc/platforms/pseries/lparcfg.c | 7 +++++++ >>> 1 file changed, 7 insertions(+) >>> >>> diff --git a/arch/powerpc/platforms/pseries/lparcfg.c b/arch/powerpc/platforms/pseries/lparcfg.c >>> index f73c4d1c26af..8df4e7c529d7 100644 >>> --- a/arch/powerpc/platforms/pseries/lparcfg.c >>> +++ b/arch/powerpc/platforms/pseries/lparcfg.c >>> @@ -184,6 +184,8 @@ static unsigned h_pic(unsigned long *pool_idle_time, >>> return rc; >>> } >>> >>> +unsigned long boot_pool_idle_time; >> >> Should be static, and u64. Better to use explicitly sized types for data >> at the kernel-hypervisor boundary. > > Current usage of h_pic doesn't follow this either. Are you suggesting we change that > as well? Yes pretty much. h_pic() as currently written and used has some problems: static unsigned h_pic(unsigned long *pool_idle_time, unsigned long *num_procs) { unsigned long rc; unsigned long retbuf[PLPAR_HCALL_BUFSIZE]; rc = plpar_hcall(H_PIC, retbuf); *pool_idle_time = retbuf[0]; *num_procs = retbuf[1]; return rc; } * Coerces the return value of plpar_hcall() to unsigned -- hcall errors are negative. * Assigns *pool_idle_time and *num_procs using uninitialized data when H_PIC is unsuccessful. * Assigns the outparams unconditionally; would be nicer if it allowed callers to pass NULL so they don't have to provide dummy inputs that aren't even used, as in your change. * Should follow Linux -errno return value convention in the absence of a need for the specific hcall status in its callers. >>> @@ -801,6 +805,9 @@ static int __init lparcfg_init(void) >>> printk(KERN_ERR "Failed to create powerpc/lparcfg\n"); >>> return -EIO; >>> } >>> + >>> + h_pic(&boot_pool_idle_time, &num_procs); >> >> h_pic() can fail, leaving the out parameters uninitialized. > > Naveen pointed to me this a while ago, but I forgot. > > Currently h_pic return value is not checked at all, either at boor or at runtime. > When it fails, should we re-try or just print a kernel debug? What would be expected > behavior? because if it fails, it would anyway result in wrong values of app even > if the variables are initialized to 0. There's nothing in the spec for H_PIC that suggests retrying on failure. I'm not really in favor of printing a message about it either; that practice tends to result in log noise in non-PowerVM guests. When H_PIC doesn't work the corresponding lines should not be emitted in /proc/powerpc/lparcfg IMO.
diff --git a/arch/powerpc/platforms/pseries/lparcfg.c b/arch/powerpc/platforms/pseries/lparcfg.c index f73c4d1c26af..8df4e7c529d7 100644 --- a/arch/powerpc/platforms/pseries/lparcfg.c +++ b/arch/powerpc/platforms/pseries/lparcfg.c @@ -184,6 +184,8 @@ static unsigned h_pic(unsigned long *pool_idle_time, return rc; } +unsigned long boot_pool_idle_time; + /* * parse_ppp_data * Parse out the data returned from h_get_ppp and h_pic @@ -218,6 +220,7 @@ static void parse_ppp_data(struct seq_file *m) h_pic(&pool_idle_time, &pool_procs); seq_printf(m, "pool_idle_time=%ld\n", pool_idle_time); seq_printf(m, "pool_num_procs=%ld\n", pool_procs); + seq_printf(m, "boot_pool_idle_time=%ld\n", boot_pool_idle_time); } seq_printf(m, "unallocated_capacity_weight=%d\n", @@ -792,6 +795,7 @@ static const struct proc_ops lparcfg_proc_ops = { static int __init lparcfg_init(void) { umode_t mode = 0444; + unsigned long num_procs; /* Allow writing if we have FW_FEATURE_SPLPAR */ if (firmware_has_feature(FW_FEATURE_SPLPAR)) @@ -801,6 +805,9 @@ static int __init lparcfg_init(void) printk(KERN_ERR "Failed to create powerpc/lparcfg\n"); return -EIO; } + + h_pic(&boot_pool_idle_time, &num_procs); + return 0; } machine_device_initcall(pseries, lparcfg_init);
When there are no options specified for lparstat, it is expected to give reports since LPAR(Logical Partition) boot. App is an indicator for available processor pool in an Shared Processor LPAR(SPLPAR). App is derived using pool_idle_time which is obtained using H_PIC call. The interval based reports show correct App value while since boot report shows very high App values. This happens because in that case app is obtained by dividing pool idle time by LPAR uptime. Since pool idle time is reported by the PowerVM hypervisor since its boot, it need not align with LPAR boot. This leads to large App values. To fix that export boot pool idle time in lparcfg and powerpc-utils will use this info to derive App as below for since boot reports. App = (pool idle time - boot pool idle time) / (uptime * timebase) Results:: Observe app values. ====================== Shared LPAR ================================ lparstat System Configuration type=Shared mode=Uncapped smt=8 lcpu=12 mem=15573440 kB cpus=37 ent=12.00 reboot stress-ng --cpu=$(nproc) -t 600 sleep 600 So in this case app is expected to close to 37-6=31. ====== 6.9-rc1 and lparstat 1.3.10 ============= %user %sys %wait %idle physc %entc lbusy app vcsw phint ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- 47.48 0.01 0.00 52.51 0.00 0.00 47.49 69099.72 541547 21 === With this patch and powerpc-utils patch to do the above equation === %user %sys %wait %idle physc %entc lbusy app vcsw phint ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- 47.48 0.01 0.00 52.51 5.73 47.75 47.49 31.21 541753 21 ===================================================================== Note: physc, purr/idle purr being inaccurate is being handled in a separate patch in powerpc-utils tree. Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com> --- Note: This patch needs to merged first in the kernel for the powerpc-utils patches to work. powerpc-utils patches will be posted to its mailing list and link would be found in the reply to this patch if available. arch/powerpc/platforms/pseries/lparcfg.c | 7 +++++++ 1 file changed, 7 insertions(+) -- 2.39.3