diff mbox series

fast-boot: occ: Re-parse the pstate table during fast-boot

Message ID 1517554952-23789-1-git-send-email-shilpa.bhat@linux.vnet.ibm.com
State Accepted
Headers show
Series fast-boot: occ: Re-parse the pstate table during fast-boot | expand

Commit Message

Shilpasri G Bhat Feb. 2, 2018, 7:02 a.m. UTC
OCC shares the frequency list to host by copying the pstate table to
main memory in HOMER. This table is parsed during boot to create
device-tree properties for frequency and pstate IDs. OCC can update
the pstate table to present a new set of frequencies to the host. But
host will remain oblivious to these changes unless it is re-inited
with the updated device-tree CPU frequency properties. So this patch
allows to re-parse the pstate table and update the device-tree
properties during fast-reboot.

OCC updates the pstate table when asked to do so using pstate-table
bias command. And this is mainly used by WOF team for
characterization purposes.

Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
Tested-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
---
 core/init.c |  3 ++-
 hw/occ.c    | 23 +++++++++++++++++++++--
 2 files changed, 23 insertions(+), 3 deletions(-)

Comments

Nicholas Piggin Feb. 3, 2018, 6:24 a.m. UTC | #1
On Fri,  2 Feb 2018 12:32:32 +0530
Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com> wrote:

> OCC shares the frequency list to host by copying the pstate table to
> main memory in HOMER. This table is parsed during boot to create
> device-tree properties for frequency and pstate IDs. OCC can update
> the pstate table to present a new set of frequencies to the host. But
> host will remain oblivious to these changes unless it is re-inited
> with the updated device-tree CPU frequency properties. So this patch
> allows to re-parse the pstate table and update the device-tree
> properties during fast-reboot.
> 
> OCC updates the pstate table when asked to do so using pstate-table
> bias command. And this is mainly used by WOF team for
> characterization purposes.

Would this ever be used in production, I'm guessing not? I don't
think that's a bad thing as such -- designing for test is always
good. Perhaps a comment though to explain why you're re-parsing
it.

Without knowing much about OCC, I'll guess you're doing this so
you can update the OCC at runtime without having to update firmware
before each IPL?

I guess we should always keep in mind fast reboot should match IPL
as closely as possible and any undetected deviations are a pretty
serious flaw. (e.g., you mess up your OCC state and want to return
to normal, you would reboot).

I'm just wondering, should this be under an nvram option?

Thanks,
Nick
Vaidyanathan Srinivasan Feb. 3, 2018, 6:57 a.m. UTC | #2
* Nicholas Piggin <npiggin@gmail.com> [2018-02-03 16:24:25]:

> 
> On Fri,  2 Feb 2018 12:32:32 +0530
> Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com> wrote:
> 
> > OCC shares the frequency list to host by copying the pstate table to
> > main memory in HOMER. This table is parsed during boot to create
> > device-tree properties for frequency and pstate IDs. OCC can update
> > the pstate table to present a new set of frequencies to the host. But
> > host will remain oblivious to these changes unless it is re-inited
> > with the updated device-tree CPU frequency properties. So this patch
> > allows to re-parse the pstate table and update the device-tree
> > properties during fast-reboot.
> > 
> > OCC updates the pstate table when asked to do so using pstate-table
> > bias command. And this is mainly used by WOF team for
> > characterization purposes.
> 
> Would this ever be used in production, I'm guessing not? I don't
> think that's a bad thing as such -- designing for test is always
> good. Perhaps a comment though to explain why you're re-parsing
> it.

Never say never :)

At this time this facility is to enabling tooling to set OCC parameter
at runtime and test the system without encoding all parameters and
building a PNOR and dependent components.

This enables a very efficient workflow with just a OCC reboot and
fast-reboot on OPAL+Linux and we are back in about 2 minutes.  This
can be leveraged in automation/CI also to test various parameters.

> Without knowing much about OCC, I'll guess you're doing this so
> you can update the OCC at runtime without having to update firmware
> before each IPL?

Yep, you got the use case right.  These OCC and PState parameters and
tunings can be tested and later rolled up into the firmware.
 
> I guess we should always keep in mind fast reboot should match IPL
> as closely as possible and any undetected deviations are a pretty
> serious flaw. (e.g., you mess up your OCC state and want to return
> to normal, you would reboot).

We expect PState table to change for these tests.  If something goes
wrong, OCC will crash or pull system to safe mode.  No major change in
system configuration like cpu, memory, IO.  If something really bad
happens, we will hang/checkstop and we will have to re-ipl to recover.

If there was no OCC PState changes, then parsing it again should get
us exact state compared to Power-ON and hence no risk.

> I'm just wondering, should this be under an nvram option?

The risk actually depends on what we ask OCC to do and hence not
a major config change/risk for OPAL.  I would like to leave it as
default for fast-reboot.  We add slight time factor to rebuild the
relevant device tree.  Given that fast-reboot itself is experimental,
this is a acceptable risk and overhead.

If we hit new error/fail scenarios in future, we can add settings.
I would like to roll this into a single knob for fast-reboot like
"safe", "risky" or something that can help us choose what we want to
do in reinit path.  We need not call out OCC reinit as explicit nvram
option at this time.

--Vaidy
Nicholas Piggin Feb. 3, 2018, 7:59 a.m. UTC | #3
On Sat, 3 Feb 2018 12:27:02 +0530
Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> wrote:

> * Nicholas Piggin <npiggin@gmail.com> [2018-02-03 16:24:25]:
> 
> > 
> > On Fri,  2 Feb 2018 12:32:32 +0530
> > Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com> wrote:
> >   
> > > OCC shares the frequency list to host by copying the pstate table to
> > > main memory in HOMER. This table is parsed during boot to create
> > > device-tree properties for frequency and pstate IDs. OCC can update
> > > the pstate table to present a new set of frequencies to the host. But
> > > host will remain oblivious to these changes unless it is re-inited
> > > with the updated device-tree CPU frequency properties. So this patch
> > > allows to re-parse the pstate table and update the device-tree
> > > properties during fast-reboot.
> > > 
> > > OCC updates the pstate table when asked to do so using pstate-table
> > > bias command. And this is mainly used by WOF team for
> > > characterization purposes.  
> > 
> > Would this ever be used in production, I'm guessing not? I don't
> > think that's a bad thing as such -- designing for test is always
> > good. Perhaps a comment though to explain why you're re-parsing
> > it.  
> 
> Never say never :)
> 
> At this time this facility is to enabling tooling to set OCC parameter
> at runtime and test the system without encoding all parameters and
> building a PNOR and dependent components.
> 
> This enables a very efficient workflow with just a OCC reboot and
> fast-reboot on OPAL+Linux and we are back in about 2 minutes.  This
> can be leveraged in automation/CI also to test various parameters.
> 
> > Without knowing much about OCC, I'll guess you're doing this so
> > you can update the OCC at runtime without having to update firmware
> > before each IPL?  
> 
> Yep, you got the use case right.  These OCC and PState parameters and
> tunings can be tested and later rolled up into the firmware.
>  
> > I guess we should always keep in mind fast reboot should match IPL
> > as closely as possible and any undetected deviations are a pretty
> > serious flaw. (e.g., you mess up your OCC state and want to return
> > to normal, you would reboot).  
> 
> We expect PState table to change for these tests.  If something goes
> wrong, OCC will crash or pull system to safe mode.

Is there a way to mark this and disable fast reboot if that happens.

I don't mean to single out this one patch. I just think it's a good
time to think about exactly how we're going to deal with fast reboot,
and how it would be used effectively if we enable it for end users.

We need to be very conservative and be sure to mark any possible errors
or inconsistencies in hardware or firmware for full IPL.

Other modes for debugging and testing like you're doing here are fine
too. No problem if we gate them with an nvram parameter.

>  No major change in
> system configuration like cpu, memory, IO.  If something really bad
> happens, we will hang/checkstop and we will have to re-ipl to recover.
> 
> If there was no OCC PState changes, then parsing it again should get
> us exact state compared to Power-ON and hence no risk.
> 
> > I'm just wondering, should this be under an nvram option?  
> 
> The risk actually depends on what we ask OCC to do and hence not
> a major config change/risk for OPAL.  I would like to leave it as
> default for fast-reboot.  We add slight time factor to rebuild the
> relevant device tree.  Given that fast-reboot itself is experimental,
> this is a acceptable risk and overhead.
> 
> If we hit new error/fail scenarios in future, we can add settings.
> I would like to roll this into a single knob for fast-reboot like
> "safe", "risky" or something that can help us choose what we want to
> do in reinit path.  We need not call out OCC reinit as explicit nvram
> option at this time.

Thanks for the background. The patch is fine in principle from me, I'm
not a fast reboot or OCC expert so my ack would not mean much :) But we
should be thinking about a coherent way to manage fast reboot behaviour.

Thanks,
Nick
Stewart Smith Feb. 8, 2018, 4:44 a.m. UTC | #4
Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> writes:
>> I'm just wondering, should this be under an nvram option?
>
> The risk actually depends on what we ask OCC to do and hence not
> a major config change/risk for OPAL.  I would like to leave it as
> default for fast-reboot.  We add slight time factor to rebuild the
> relevant device tree.  Given that fast-reboot itself is experimental,
> this is a acceptable risk and overhead.
>
> If we hit new error/fail scenarios in future, we can add settings.
> I would like to roll this into a single knob for fast-reboot like
> "safe", "risky" or something that can help us choose what we want to
> do in reinit path.  We need not call out OCC reinit as explicit nvram
> option at this time.

It's a good question what we should do there... in a secure boot world,
we're probably better off as in that case you shouldn't be able to go
and poke things into the OCCs to do bad things.

I wonder if we could get an OCC command to "reset back to what you'd be
on IPL" or something?
ppaidipe Feb. 8, 2018, 5:15 a.m. UTC | #5
On 2018-02-08 10:14, Stewart Smith wrote:
> Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> writes:
>>> I'm just wondering, should this be under an nvram option?
>> 
>> The risk actually depends on what we ask OCC to do and hence not
>> a major config change/risk for OPAL.  I would like to leave it as
>> default for fast-reboot.  We add slight time factor to rebuild the
>> relevant device tree.  Given that fast-reboot itself is experimental,
>> this is a acceptable risk and overhead.
>> 
>> If we hit new error/fail scenarios in future, we can add settings.
>> I would like to roll this into a single knob for fast-reboot like
>> "safe", "risky" or something that can help us choose what we want to
>> do in reinit path.  We need not call out OCC reinit as explicit nvram
>> option at this time.
> 
> It's a good question what we should do there... in a secure boot world,
> we're probably better off as in that case you shouldn't be able to go
> and poke things into the OCCs to do bad things.
> 
> I wonder if we could get an OCC command to "reset back to what you'd be
> on IPL" or something?

I think this is one of the example failure case where occ is still in
disabled state after fast reboot. As shilpa mentioned we should be able
to clear max_occ_reset_count and do the occ re-init.
https://github.com/open-power/skiboot/issues/49
Stewart Smith Feb. 9, 2018, 5:09 a.m. UTC | #6
ppaidipe <ppaidipe@linux.vnet.ibm.com> writes:
> On 2018-02-08 10:14, Stewart Smith wrote:
>> Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> writes:
>>>> I'm just wondering, should this be under an nvram option?
>>> 
>>> The risk actually depends on what we ask OCC to do and hence not
>>> a major config change/risk for OPAL.  I would like to leave it as
>>> default for fast-reboot.  We add slight time factor to rebuild the
>>> relevant device tree.  Given that fast-reboot itself is experimental,
>>> this is a acceptable risk and overhead.
>>> 
>>> If we hit new error/fail scenarios in future, we can add settings.
>>> I would like to roll this into a single knob for fast-reboot like
>>> "safe", "risky" or something that can help us choose what we want to
>>> do in reinit path.  We need not call out OCC reinit as explicit nvram
>>> option at this time.
>> 
>> It's a good question what we should do there... in a secure boot world,
>> we're probably better off as in that case you shouldn't be able to go
>> and poke things into the OCCs to do bad things.
>> 
>> I wonder if we could get an OCC command to "reset back to what you'd be
>> on IPL" or something?
>
> I think this is one of the example failure case where occ is still in
> disabled state after fast reboot. As shilpa mentioned we should be able
> to clear max_occ_reset_count and do the occ re-init.
> https://github.com/open-power/skiboot/issues/49

Yeah, we should probably go down the route of doing that (perhaps we
should do that no matter what? I'm not sure, we'll need to talk to the
OCC team to work out what's the best/correct thing to do).

I've merged the patch to master as of
85a1de35cbe47618e9d4104880f182121806af4d as it seems to be something
decent at least for the time being... although I'm not convinced we
shouldn't either disable it or go through a larger re-init cycle with
the OCC....
diff mbox series

Patch

diff --git a/core/init.c b/core/init.c
index ec9f329..92c79af 100644
--- a/core/init.c
+++ b/core/init.c
@@ -486,6 +486,8 @@  void __noreturn load_and_boot_kernel(bool is_reboot)
 
 	ipmi_set_fw_progress_sensor(IPMI_FW_OS_BOOT);
 
+	occ_pstates_init();
+
 	if (!is_reboot) {
 		/* We wait for the nvram read to complete here so we can
 		 * grab stuff from there such as the kernel arguments
@@ -499,7 +501,6 @@  void __noreturn load_and_boot_kernel(bool is_reboot)
 		 * OCC takes few secs to boot.  Call this as late as
 		 * as possible to avoid delay.
 		 */
-		occ_pstates_init();
 		occ_sensors_init();
 
 	} else {
diff --git a/hw/occ.c b/hw/occ.c
index f3f1231..fb7e683 100644
--- a/hw/occ.c
+++ b/hw/occ.c
@@ -1556,8 +1556,24 @@  void occ_pstates_init(void)
 	if (proc_gen < proc_gen_p8)
 		return;
 	/* Handle fast reboots */
-	if (occ_pstates_initialized)
-		return;
+	if (occ_pstates_initialized) {
+		struct dt_node *power_mgt;
+		int i;
+		const char *props[] = {
+				"ibm,pstate-core-max",
+				"ibm,pstate-frequencies-mhz",
+				"ibm,pstate-ids",
+				"ibm,pstate-max",
+				"ibm,pstate-min",
+				"ibm,pstate-nominal",
+				"ibm,pstate-turbo",
+				"ibm,pstate-ultra-turbo",
+				};
+
+		power_mgt = dt_find_by_path(dt_root, "/ibm,opal/power-mgt");
+		for (i = 0; i < ARRAY_SIZE(props); i++)
+			dt_check_del_prop(power_mgt, props[i]);
+	}
 
 	switch (proc_gen) {
 	case proc_gen_p8:
@@ -1605,6 +1621,9 @@  void occ_pstates_init(void)
 				cpu_pstates_prepare_core(chip, c, pstate_nom);
 	}
 
+	if (occ_pstates_initialized)
+		return;
+
 	/* Add opal_poller to poll OCC throttle status of each chip */
 	for_each_chip(chip)
 		chip->throttle = 0;