diff mbox

Ensure we run pollers in cpu_wait_job()

Message ID 1444175322-21733-1-git-send-email-stewart@linux.vnet.ibm.com
State Accepted
Headers show

Commit Message

Stewart Smith Oct. 6, 2015, 11:48 p.m. UTC
In root causing a bug on AST BMC Alistair found that pollers weren't
being run for around 3800ms.

This was due to a wonderful accident that's probably about a year or more
old where:

In cpu_wait_job we have:
        unsigned long ticks = usecs_to_tb(5);
	...
	time_wait(ticks);

While in time_wait(), deciding on if to run pollers:
        unsigned long period = msecs_to_tb(5);
	...
	if (remaining >= period) {

Obviously, this means we never run pollers. Not ideal.

This patch ensures we run pollers every 5ms in cpu_wait_job() as well
as displaying how long we waited for a job if that wait was >1second.

Reported-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
---
 core/cpu.c | 9 +++++++++
 1 file changed, 9 insertions(+)

Comments

Patrick Williams Oct. 7, 2015, 11:22 a.m. UTC | #1
On Wed, Oct 07, 2015 at 10:48:42AM +1100, Stewart Smith wrote:
>  	unsigned long ticks = usecs_to_tb(5);
> +	unsigned long period = msecs_to_tb(5);
>  
> +		time_waited+=ticks;
> +		if (time_waited % period == 0)
> +			opal_run_pollers();

This feels incorrect to me.  If there is a rounding error in your
msecs_to_tb vs usecs_to_tb there is no certainty that LCM(period,ticks)
is period.  In that case your still not going to run opal_run_pollers as
often as you would like.

My suggestion:

    if (time_waited >= period) {
        time_waited -= period;
        opal_run_pollers();
    }
Patrick Williams Oct. 7, 2015, 7:37 p.m. UTC | #2
On Wed, Oct 07, 2015 at 06:22:44AM -0500, Patrick Williams wrote:
> On Wed, Oct 07, 2015 at 10:48:42AM +1100, Stewart Smith wrote:
> >  	unsigned long ticks = usecs_to_tb(5);
> > +	unsigned long period = msecs_to_tb(5);
> >  
> > +		time_waited+=ticks;
> > +		if (time_waited % period == 0)
> > +			opal_run_pollers();
> 
> This feels incorrect to me.  If there is a rounding error in your
> msecs_to_tb vs usecs_to_tb there is no certainty that LCM(period,ticks)
> is period.  In that case your still not going to run opal_run_pollers as
> often as you would like.

I wrote this wrong.  What I meant was that there is no certainty that:
        period % ticks == 0

Issue and suggestion below still holds.

> My suggestion:
> 
>     if (time_waited >= period) {
>         time_waited -= period;
>         opal_run_pollers();
>     }
> 
> -- 
> Patrick Williams



> _______________________________________________
> Skiboot mailing list
> Skiboot@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/skiboot
Stewart Smith Oct. 7, 2015, 10:59 p.m. UTC | #3
Patrick Williams <patrick@stwcx.xyz> writes:
> On Wed, Oct 07, 2015 at 06:22:44AM -0500, Patrick Williams wrote:
>> On Wed, Oct 07, 2015 at 10:48:42AM +1100, Stewart Smith wrote:
>> >  	unsigned long ticks = usecs_to_tb(5);
>> > +	unsigned long period = msecs_to_tb(5);
>> >  
>> > +		time_waited+=ticks;
>> > +		if (time_waited % period == 0)
>> > +			opal_run_pollers();
>> 
>> This feels incorrect to me.  If there is a rounding error in your
>> msecs_to_tb vs usecs_to_tb there is no certainty that LCM(period,ticks)
>> is period.  In that case your still not going to run opal_run_pollers as
>> often as you would like.
>
> I wrote this wrong.  What I meant was that there is no certainty that:
>         period % ticks == 0

S, I went and wrote a unit test of course (which was lacking for all the
tb functions anyway), and then went "well, everything looks good" until
I remembered reading the riveting Chapter 6 of Book II of the PowerISA
and the bits about the variation of incrementing the timebase over
specified frequency and then went "darn... you're right, this isn't
about math at all!"

Except that here we're okay, because we're not using the timebase
register at all, we're using the computed ticks off a constant tb_hz
value of 512mhz and if time_wait(ticks) takes longer/shorter than what
we ask it doesn't actually matter, we increment time_waited by the exact
number of ticks we *meant* to wait for and will end up hitting
time_waited % period == 0 when we expect to (or at least close enough
to) - as it's just integer math with pretty nice round numbers.
Patrick Williams Oct. 7, 2015, 11:40 p.m. UTC | #4
On Thu, Oct 08, 2015 at 09:59:09AM +1100, Stewart Smith wrote:
> Patrick Williams <patrick@stwcx.xyz> writes:
> > On Wed, Oct 07, 2015 at 06:22:44AM -0500, Patrick Williams wrote:
> >> On Wed, Oct 07, 2015 at 10:48:42AM +1100, Stewart Smith wrote:
> >> >  	unsigned long ticks = usecs_to_tb(5);
> >> > +	unsigned long period = msecs_to_tb(5);
> >> >  
> >> > +		time_waited+=ticks;
> >> > +		if (time_waited % period == 0)
> >> > +			opal_run_pollers();
> >> 
> >> This feels incorrect to me.  If there is a rounding error in your
> >> msecs_to_tb vs usecs_to_tb there is no certainty that LCM(period,ticks)
> >> is period.  In that case your still not going to run opal_run_pollers as
> >> often as you would like.
> >
> > I wrote this wrong.  What I meant was that there is no certainty that:
> >         period % ticks == 0
> 
> S, I went and wrote a unit test of course (which was lacking for all the
> tb functions anyway), and then went "well, everything looks good" until
> I remembered reading the riveting Chapter 6 of Book II of the PowerISA
> and the bits about the variation of incrementing the timebase over
> specified frequency and then went "darn... you're right, this isn't
> about math at all!"
> 
> Except that here we're okay, because we're not using the timebase
> register at all, we're using the computed ticks off a constant tb_hz
> value of 512mhz and if time_wait(ticks) takes longer/shorter than what
> we ask it doesn't actually matter, we increment time_waited by the exact
> number of ticks we *meant* to wait for and will end up hitting
> time_waited % period == 0 when we expect to (or at least close enough
> to) - as it's just integer math with pretty nice round numbers.
> 

Since tb_hz is currently defined to 512000000, (msecs_to_tb(1) %
usecs_to_tb(1) == 0), so you don't have any problem.  If tb_hz is not a
nice multiple of 100000, then you would have a problem.

Example:
    tb_hz = 520093696; // 0x1F000000
    usecs_to_tb(5) == 2600
    msecs_to_tb(5) == 2600470

In that case (time_waited % period == 0) will never be true.

If you're confident the tb_hz will always be 512000000, then I guess it
doesn't matter, but I know Hostboot runs at a time when the tb is
actually variable.
Stewart Smith Oct. 8, 2015, 1:12 a.m. UTC | #5
Patrick Williams <patrick@stwcx.xyz> writes:
> On Thu, Oct 08, 2015 at 09:59:09AM +1100, Stewart Smith wrote:
>> Patrick Williams <patrick@stwcx.xyz> writes:
>> > On Wed, Oct 07, 2015 at 06:22:44AM -0500, Patrick Williams wrote:
>> >> On Wed, Oct 07, 2015 at 10:48:42AM +1100, Stewart Smith wrote:
>> >> >  	unsigned long ticks = usecs_to_tb(5);
>> >> > +	unsigned long period = msecs_to_tb(5);
>> >> >  
>> >> > +		time_waited+=ticks;
>> >> > +		if (time_waited % period == 0)
>> >> > +			opal_run_pollers();
>> >> 
>> >> This feels incorrect to me.  If there is a rounding error in your
>> >> msecs_to_tb vs usecs_to_tb there is no certainty that LCM(period,ticks)
>> >> is period.  In that case your still not going to run opal_run_pollers as
>> >> often as you would like.
>> >
>> > I wrote this wrong.  What I meant was that there is no certainty that:
>> >         period % ticks == 0
>> 
>> S, I went and wrote a unit test of course (which was lacking for all the
>> tb functions anyway), and then went "well, everything looks good" until
>> I remembered reading the riveting Chapter 6 of Book II of the PowerISA
>> and the bits about the variation of incrementing the timebase over
>> specified frequency and then went "darn... you're right, this isn't
>> about math at all!"
>> 
>> Except that here we're okay, because we're not using the timebase
>> register at all, we're using the computed ticks off a constant tb_hz
>> value of 512mhz and if time_wait(ticks) takes longer/shorter than what
>> we ask it doesn't actually matter, we increment time_waited by the exact
>> number of ticks we *meant* to wait for and will end up hitting
>> time_waited % period == 0 when we expect to (or at least close enough
>> to) - as it's just integer math with pretty nice round numbers.
>> 
>
> Since tb_hz is currently defined to 512000000, (msecs_to_tb(1) %
> usecs_to_tb(1) == 0), so you don't have any problem.  If tb_hz is not a
> nice multiple of 100000, then you would have a problem.
>
> Example:
>     tb_hz = 520093696; // 0x1F000000
>     usecs_to_tb(5) == 2600
>     msecs_to_tb(5) == 2600470
>
> In that case (time_waited % period == 0) will never be true.
>
> If you're confident the tb_hz will always be 512000000, then I guess it
> doesn't matter, but I know Hostboot runs at a time when the tb is
> actually variable.

At least for the tb_hz that's used in the math, pretty confident (at
least for the moment) that we'll be doing the math on a clean 512mhz
value.

I've put an explicit assert in a unit test with a comment about it to
hopefully catch it if we ever go and read what the timebase hz
*actually* is - because, as you point out, things will break.
diff mbox

Patch

diff --git a/core/cpu.c b/core/cpu.c
index 8b0f861..d905ebe 100644
--- a/core/cpu.c
+++ b/core/cpu.c
@@ -129,17 +129,26 @@  bool cpu_poll_job(struct cpu_job *job)
 void cpu_wait_job(struct cpu_job *job, bool free_it)
 {
 	unsigned long ticks = usecs_to_tb(5);
+	unsigned long period = msecs_to_tb(5);
+	unsigned long time_waited = 0;
 
 	if (!job)
 		return;
 
 	while(!job->complete) {
 		time_wait(ticks);
+		time_waited+=ticks;
+		if (time_waited % period == 0)
+			opal_run_pollers();
 		lwsync();
 	}
 	lwsync();
 	smt_medium();
 
+	if (time_waited > msecs_to_tb(1000))
+		prlog(PR_DEBUG, "cpu_wait_job(%s) for %d\n",
+		      job->name, tb_to_msecs(time_waited));
+
 	if (free_it)
 		free(job);
 }