diff mbox series

Run pollers in time_wait() when not booting

Message ID 20181030071347.18398-1-stewart@linux.ibm.com
State Accepted
Headers show
Series Run pollers in time_wait() when not booting | expand

Checks

Context Check Description
snowpatch_ozlabs/apply_patch success master/apply_patch Successfully applied
snowpatch_ozlabs/make_check success Test make_check on branch master
snowpatch_ozlabs/apply_patch success master/apply_patch Successfully applied
snowpatch_ozlabs/make_check fail Test make_check on branch master
snowpatch_ozlabs/apply_patch success master/apply_patch Successfully applied
snowpatch_ozlabs/make_check fail Test make_check on branch master
snowpatch_ozlabs/apply_patch success master/apply_patch Successfully applied
snowpatch_ozlabs/make_check success Test make_check on branch master

Commit Message

Stewart Smith Oct. 30, 2018, 7:13 a.m. UTC
This only bit us hard with hioma in one scenario.

Our OPAL API has been OPAL_POLL_EVENTS may be needed to make forward
progress on ongoing operations, and the internal to skiboot API has been
that time_wait() of a suitable time will run pollers (on at least one
CPU) to help ensure forward progress can be made.

In a perfect world, interrupts are used but they may a) be disabled, or
b) the thing we're doing can't use interrupts because computers are
generally terrible.

Back in 3db397ea5892a (circa 2015), we changed skiboot so that we'd run
pollers only on the boot CPU, and not if we held any locks. This was to
reduce the chance of programming code that could deadlock, as well as to
ensure that we didn't just thrash all the cachelines for running pollers
all over a large system during boot, or hard spin on the same locks on
all secondary CPUs.

The problem arises if the OS we're booting makes an OPAL call early on,
with interrupts disabled, that requires a poller to run to make forward
progress. An example of this would be OPAL_WRITE_NVRAM early in Linux
boot (where Linux sets up the partitions it wants) - something that
occurs iff we've had to reformat NVRAM this boot (i.e. first boot or
corrupted NVRAM).

The hiomap implementation should arguably *not* rely on synchronous IPMI
messages, but this is a future improvement (as was for mbox before it).
The mbox-flash code solved this problem by spinning on check_timers().

More generically though, the approach of running the pollers when no
longer booting means we behave more in line with what the API is meant
to be, rather than have this odd case of "time_wait() for a condition
that could also be tripped by an interrupt works fine unless the OS is
up and running but hasn't set interrupts up yet".

Fixes: 529bdca0bc546a7ae3ecbd2c3134b7260072d8b0
Fixes: 3db397ea5892a8b348cf412739996731884561b3
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
---
 core/timebase.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Andrew Jeffery Oct. 30, 2018, 11:15 a.m. UTC | #1
On Tue, 30 Oct 2018, at 17:43, Stewart Smith wrote:
> This only bit us hard with hioma in one scenario.

Missing a 'p' in 'hiomap'

> 
> Our OPAL API has been OPAL_POLL_EVENTS may be needed to make forward
> progress on ongoing operations, and the internal to skiboot API has been
> that time_wait() of a suitable time will run pollers (on at least one
> CPU) to help ensure forward progress can be made.
> 
> In a perfect world, interrupts are used but they may a) be disabled, or
> b) the thing we're doing can't use interrupts because computers are
> generally terrible.
> 
> Back in 3db397ea5892a (circa 2015), we changed skiboot so that we'd run
> pollers only on the boot CPU, and not if we held any locks. This was to
> reduce the chance of programming code that could deadlock, as well as to
> ensure that we didn't just thrash all the cachelines for running pollers
> all over a large system during boot, or hard spin on the same locks on
> all secondary CPUs.
> 
> The problem arises if the OS we're booting makes an OPAL call early on,
> with interrupts disabled, that requires a poller to run to make forward
> progress. An example of this would be OPAL_WRITE_NVRAM early in Linux
> boot (where Linux sets up the partitions it wants) - something that
> occurs iff we've had to reformat NVRAM this boot (i.e. first boot or
> corrupted NVRAM).
> 
> The hiomap implementation should arguably *not* rely on synchronous IPMI
> messages, but this is a future improvement (as was for mbox before it).
> The mbox-flash code solved this problem by spinning on check_timers().
> 
> More generically though, the approach of running the pollers when no
> longer booting means we behave more in line with what the API is meant
> to be, rather than have this odd case of "time_wait() for a condition
> that could also be tripped by an interrupt works fine unless the OS is
> up and running but hasn't set interrupts up yet".

Nice description.

> 
> Fixes: 529bdca0bc546a7ae3ecbd2c3134b7260072d8b0
> Fixes: 3db397ea5892a8b348cf412739996731884561b3
> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
> ---
>  core/timebase.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/core/timebase.c b/core/timebase.c
> index 24e72959f542..f9f1a55b8dbb 100644
> --- a/core/timebase.c
> +++ b/core/timebase.c
> @@ -57,7 +57,7 @@ void time_wait(unsigned long duration)
>  		return;
>  	}
>  
> -	if (c != boot_cpu)
> +	if (c != boot_cpu && opal_booting())
>  		time_wait_nopoll(duration);
>  	else
>  		time_wait_poll(duration);
> -- 
> 2.17.2
> 

Applying this to master and building gave me:

core/timebase.c: In function ‘time_wait’:
core/timebase.c:60:23: error: implicit declaration of function ‘opal_booting’ [-Werror=implicit-function-declaration]
  if (c != boot_cpu && opal_booting())
                       ^~~~~~~~~~~~

But it passes snowpatch according to patchwork :thinking_face:

I had to do this:

diff --git a/core/timebase.c b/core/timebase.c
index 24e72959f542..830ca53d2442 100644
--- a/core/timebase.c
+++ b/core/timebase.c
@@ -19,6 +19,7 @@
 #include <opal.h>
 #include <cpu.h>
 #include <chip.h>
+#include <debug_descriptor.h>
 
 unsigned long tb_hz = 512000000;
 
@@ -57,7 +58,7 @@ void time_wait(unsigned long duration)
                return;
        }
 
-       if (c != boot_cpu)
+       if (c != boot_cpu && opal_booting())
                time_wait_nopoll(duration);
        else
                time_wait_poll(duration);

Anyway, with the additional include the build looks like it rids ozrom2 of the stuck thread problem, which it was hitting consistently prior to applying it.

Thanks for chasing that down.

Andrew
Andrew Donnellan Oct. 31, 2018, 4:21 a.m. UTC | #2
+ ruscur

On 30/10/18 10:15 pm, Andrew Jeffery wrote:
> But it passes snowpatch according to patchwork :thinking_face:

https://openpower.xyz/job/snowpatch/job/snowpatch-skiboot/192//artifact/skiboot/build.log/*view*/

...
	[CC]  platforms/qemu/qemu.o
	[CC]  platforms/astbmc/vesnin.o
	[CC]  platforms/mambo/mambo.o
	[CC]  platforms/astbmc/romulus.o
	[CC]  platforms/astbmc/p9dsu.o
	[CC]  platforms/mambo/console.o
/var/lib/jenkins-slave/workspace/snowpatch/snowpatch-skiboot/skiboot/Makefile.rules:56: 
recipe for target 'core/timebase.o' failed

"success"

hmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm...............
Stewart Smith Oct. 31, 2018, 4:25 a.m. UTC | #3
Andrew Jeffery <andrew@aj.id.au> writes:
> On Tue, 30 Oct 2018, at 17:43, Stewart Smith wrote:
>> This only bit us hard with hioma in one scenario.
>
> Missing a 'p' in 'hiomap'

Cheers, fixed.

>> Our OPAL API has been OPAL_POLL_EVENTS may be needed to make forward
>> progress on ongoing operations, and the internal to skiboot API has been
>> that time_wait() of a suitable time will run pollers (on at least one
>> CPU) to help ensure forward progress can be made.
>> 
>> In a perfect world, interrupts are used but they may a) be disabled, or
>> b) the thing we're doing can't use interrupts because computers are
>> generally terrible.
>> 
>> Back in 3db397ea5892a (circa 2015), we changed skiboot so that we'd run
>> pollers only on the boot CPU, and not if we held any locks. This was to
>> reduce the chance of programming code that could deadlock, as well as to
>> ensure that we didn't just thrash all the cachelines for running pollers
>> all over a large system during boot, or hard spin on the same locks on
>> all secondary CPUs.
>> 
>> The problem arises if the OS we're booting makes an OPAL call early on,
>> with interrupts disabled, that requires a poller to run to make forward
>> progress. An example of this would be OPAL_WRITE_NVRAM early in Linux
>> boot (where Linux sets up the partitions it wants) - something that
>> occurs iff we've had to reformat NVRAM this boot (i.e. first boot or
>> corrupted NVRAM).
>> 
>> The hiomap implementation should arguably *not* rely on synchronous IPMI
>> messages, but this is a future improvement (as was for mbox before it).
>> The mbox-flash code solved this problem by spinning on check_timers().
>> 
>> More generically though, the approach of running the pollers when no
>> longer booting means we behave more in line with what the API is meant
>> to be, rather than have this odd case of "time_wait() for a condition
>> that could also be tripped by an interrupt works fine unless the OS is
>> up and running but hasn't set interrupts up yet".
>
> Nice description.
>
>> 
>> Fixes: 529bdca0bc546a7ae3ecbd2c3134b7260072d8b0
>> Fixes: 3db397ea5892a8b348cf412739996731884561b3
>> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
>> ---
>>  core/timebase.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>> 
>> diff --git a/core/timebase.c b/core/timebase.c
>> index 24e72959f542..f9f1a55b8dbb 100644
>> --- a/core/timebase.c
>> +++ b/core/timebase.c
>> @@ -57,7 +57,7 @@ void time_wait(unsigned long duration)
>>  		return;
>>  	}
>>  
>> -	if (c != boot_cpu)
>> +	if (c != boot_cpu && opal_booting())
>>  		time_wait_nopoll(duration);
>>  	else
>>  		time_wait_poll(duration);
>> -- 
>> 2.17.2
>> 
>
> Applying this to master and building gave me:
>
> core/timebase.c: In function ‘time_wait’:
> core/timebase.c:60:23: error: implicit declaration of function ‘opal_booting’ [-Werror=implicit-function-declaration]
>   if (c != boot_cpu && opal_booting())
>                        ^~~~~~~~~~~~
>
> But it passes snowpatch according to patchwork :thinking_face:

That'll be because of b1dee4a43dc3b10e89355964c8cd1f6e26447837 which
split the debug_descriptor routines into their own include file. The
patch will work on 6.0.x okay, but not master.

Teaches me for just building on 6.0.x :)

> diff --git a/core/timebase.c b/core/timebase.c
> index 24e72959f542..830ca53d2442 100644
> --- a/core/timebase.c
> +++ b/core/timebase.c
> @@ -19,6 +19,7 @@
>  #include <opal.h>
>  #include <cpu.h>
>  #include <chip.h>
> +#include <debug_descriptor.h>
>  
>  unsigned long tb_hz = 512000000;
>  
> @@ -57,7 +58,7 @@ void time_wait(unsigned long duration)
>                 return;
>         }
>  
> -       if (c != boot_cpu)
> +       if (c != boot_cpu && opal_booting())
>                 time_wait_nopoll(duration);
>         else
>                 time_wait_poll(duration);
>
> Anyway, with the additional include the build looks like it rids ozrom2 of the stuck thread problem, which it was hitting consistently prior to applying it.

Excellent.
Andrew Donnellan Oct. 31, 2018, 4:43 a.m. UTC | #4
On 31/10/18 3:21 pm, Andrew Donnellan wrote:
> + ruscur
> 
> On 30/10/18 10:15 pm, Andrew Jeffery wrote:
>> But it passes snowpatch according to patchwork :thinking_face:
> 
> https://openpower.xyz/job/snowpatch/job/snowpatch-skiboot/192//artifact/skiboot/build.log/*view*/ 
> 
> 
> ...
>      [CC]  platforms/qemu/qemu.o
>      [CC]  platforms/astbmc/vesnin.o
>      [CC]  platforms/mambo/mambo.o
>      [CC]  platforms/astbmc/romulus.o
>      [CC]  platforms/astbmc/p9dsu.o
>      [CC]  platforms/mambo/console.o
> /var/lib/jenkins-slave/workspace/snowpatch/snowpatch-skiboot/skiboot/Makefile.rules:56: 
> recipe for target 'core/timebase.o' failed
> 
> "success"
> 
> hmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm...............

Thanks to Russell for fixing this - as it turns out, our job had been 
broken for a while and somehow we just haven't noticed :/ Now it's 
failing correctly.
Russell Currey Oct. 31, 2018, 4:45 a.m. UTC | #5
On Wed, 2018-10-31 at 15:21 +1100, Andrew Donnellan wrote:
> + ruscur
> 
> On 30/10/18 10:15 pm, Andrew Jeffery wrote:
> > But it passes snowpatch according to patchwork :thinking_face:
> 
> https://openpower.xyz/job/snowpatch/job/snowpatch-skiboot/192//artifact/skiboot/build.log/*view*/
> 
> ...
> 	[CC]  platforms/qemu/qemu.o
> 	[CC]  platforms/astbmc/vesnin.o
> 	[CC]  platforms/mambo/mambo.o
> 	[CC]  platforms/astbmc/romulus.o
> 	[CC]  platforms/astbmc/p9dsu.o
> 	[CC]  platforms/mambo/console.o
> /var/lib/jenkins-slave/workspace/snowpatch/snowpatch-
> skiboot/skiboot/Makefile.rules:56: 
> recipe for target 'core/timebase.o' failed
> 
> "success"
> 
> hmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm...............
> 

Gentlemen, if you observe what snowpatch reports you'll clearly see
it's boned:

http://patchwork.ozlabs.org/patch/990670/

and by that I mean I added "|| exit 1" to the "docker run" command and
ran the tests again.

This is clearly docker's fault and not mine.
Stewart Smith Oct. 31, 2018, 6 a.m. UTC | #6
Russell Currey <ruscur@russell.cc> writes:
> On Wed, 2018-10-31 at 15:21 +1100, Andrew Donnellan wrote:
>> + ruscur
>> 
>> On 30/10/18 10:15 pm, Andrew Jeffery wrote:
>> > But it passes snowpatch according to patchwork :thinking_face:
>> 
>> https://openpower.xyz/job/snowpatch/job/snowpatch-skiboot/192//artifact/skiboot/build.log/*view*/
>> 
>> ...
>> 	[CC]  platforms/qemu/qemu.o
>> 	[CC]  platforms/astbmc/vesnin.o
>> 	[CC]  platforms/mambo/mambo.o
>> 	[CC]  platforms/astbmc/romulus.o
>> 	[CC]  platforms/astbmc/p9dsu.o
>> 	[CC]  platforms/mambo/console.o
>> /var/lib/jenkins-slave/workspace/snowpatch/snowpatch-
>> skiboot/skiboot/Makefile.rules:56: 
>> recipe for target 'core/timebase.o' failed
>> 
>> "success"
>> 
>> hmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm...............
>> 
>
> Gentlemen, if you observe what snowpatch reports you'll clearly see
> it's boned:
>
> http://patchwork.ozlabs.org/patch/990670/
>
> and by that I mean I added "|| exit 1" to the "docker run" command and
> ran the tests again.
>
> This is clearly docker's fault and not mine.

Yes. I too blame Docker. Especially for all my bugs in firmware. Or misc
other life problems.

Seriously though, thanks for fixing it.
Stewart Smith Oct. 31, 2018, 6 a.m. UTC | #7
Stewart Smith <stewart@linux.ibm.com> writes:
> This only bit us hard with hioma in one scenario.
>
> Our OPAL API has been OPAL_POLL_EVENTS may be needed to make forward
> progress on ongoing operations, and the internal to skiboot API has been
> that time_wait() of a suitable time will run pollers (on at least one
> CPU) to help ensure forward progress can be made.
>
> In a perfect world, interrupts are used but they may a) be disabled, or
> b) the thing we're doing can't use interrupts because computers are
> generally terrible.
>
> Back in 3db397ea5892a (circa 2015), we changed skiboot so that we'd run
> pollers only on the boot CPU, and not if we held any locks. This was to
> reduce the chance of programming code that could deadlock, as well as to
> ensure that we didn't just thrash all the cachelines for running pollers
> all over a large system during boot, or hard spin on the same locks on
> all secondary CPUs.
>
> The problem arises if the OS we're booting makes an OPAL call early on,
> with interrupts disabled, that requires a poller to run to make forward
> progress. An example of this would be OPAL_WRITE_NVRAM early in Linux
> boot (where Linux sets up the partitions it wants) - something that
> occurs iff we've had to reformat NVRAM this boot (i.e. first boot or
> corrupted NVRAM).
>
> The hiomap implementation should arguably *not* rely on synchronous IPMI
> messages, but this is a future improvement (as was for mbox before it).
> The mbox-flash code solved this problem by spinning on check_timers().
>
> More generically though, the approach of running the pollers when no
> longer booting means we behave more in line with what the API is meant
> to be, rather than have this odd case of "time_wait() for a condition
> that could also be tripped by an interrupt works fine unless the OS is
> up and running but hasn't set interrupts up yet".
>
> Fixes: 529bdca0bc546a7ae3ecbd2c3134b7260072d8b0
> Fixes: 3db397ea5892a8b348cf412739996731884561b3
> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
> ---
>  core/timebase.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

FIxed up Andrew's comments and merged to master as of d10862e40cd8ef36fe49b298f3c615ac73dd3c4e
diff mbox series

Patch

diff --git a/core/timebase.c b/core/timebase.c
index 24e72959f542..f9f1a55b8dbb 100644
--- a/core/timebase.c
+++ b/core/timebase.c
@@ -57,7 +57,7 @@  void time_wait(unsigned long duration)
 		return;
 	}
 
-	if (c != boot_cpu)
+	if (c != boot_cpu && opal_booting())
 		time_wait_nopoll(duration);
 	else
 		time_wait_poll(duration);