Patchwork [natty/maverick] SRU: sched, x86: Avoid unnecessary overflow in sched_clock

Submitter Chris J Arges
Date Jan. 11, 2012, 9:13 p.m.
Message ID <4F0DFB5E.7030007@canonical.com>
Permalink /patch/135501/
State New

Comments

Chris J Arges - Jan. 11, 2012, 9:13 p.m.
SRU Justification:

BugLink: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/805341

Impact: Machines with uptimes of 200+ days will sometimes have soft lockups.

Fix: This was fixed upstream and has already been backported to Lucid.

Testcase: Run your computer for 200+ days and verify it doesn't lock up.
Seth Forshee - Jan. 12, 2012, 8:50 a.m.
On Wed, Jan 11, 2012 at 03:13:02PM -0600, Chris J Arges wrote:
> SRU Justification:
> 
> BugLink: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/805341
> 
> Impact: Machines with uptimes of 200+ days will sometimes have soft lockups.
> 
> Fix: This was fixed upstream and has already been backported to Lucid.
> 
> Testcase: Run your computer for 200+ days and verify it doesn't lock up.

Clean cherry-pick. The math looks correct.

Acked-by: Seth Forshee <seth.forshee@canonical.com>
Tim Gardner - Jan. 12, 2012, 10:08 a.m.
On 01/11/2012 10:13 PM, Chris J Arges wrote:
> SRU Justification:
>
> BugLink: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/805341
>
> Impact: Machines with uptimes of 200+ days will sometimes have soft lockups.
>
> Fix: This was fixed upstream and has already been backported to Lucid.
>
> Testcase: Run your computer for 200+ days and verify it doesn't lock up.

Patch

From a9747c4dc863ab62553dd65c4159b4b8e749042f Mon Sep 17 00:00:00 2001
From: Salman Qazi <sqazi@google.com>
Date: Wed, 11 Jan 2012 12:29:48 -0600
Subject: [PATCH] sched, x86: Avoid unnecessary overflow in sched_clock

(Added the missing signed-off-by line)

In hundreds of days, the __cycles_2_ns calculation in sched_clock
has an overflow.  cyc * per_cpu(cyc2ns, cpu) exceeds 64 bits, causing
the final value to become zero.  We can solve this without losing
any precision.

We can decompose TSC into quotient and remainder of division by the
scale factor, and then use this to convert TSC into nanoseconds.

Signed-off-by: Salman Qazi <sqazi@google.com>
Acked-by: John Stultz <johnstul@us.ibm.com>
Reviewed-by: Paul Turner <pjt@google.com>
Cc: stable@kernel.org
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20111115221121.7262.88871.stgit@dungbeetle.mtv.corp.google.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
(cherry picked from commit 4cecf6d401a01d054afc1e5f605bcbfe553cb9b9)

Signed-off-by: Chris J Arges <chris.j.arges@canonical.com>
BugLink: http://bugs.launchpad.net/bugs/805341
---
 arch/x86/include/asm/timer.h |   23 ++++++++++++++++++++++-
 1 files changed, 22 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/timer.h b/arch/x86/include/asm/timer.h
index fa7b917..431793e 100644
--- a/arch/x86/include/asm/timer.h
+++ b/arch/x86/include/asm/timer.h
@@ -32,6 +32,22 @@ extern int no_timer_check;
  *  (mathieu.desnoyers@polymtl.ca)
  *
  *			-johnstul@us.ibm.com "math is hard, lets go shopping!"
+ *
+ * In:
+ *
+ * ns = cycles * cyc2ns_scale / SC
+ *
+ * Although we may still have enough bits to store the value of ns,
+ * in some cases, we may not have enough bits to store cycles * cyc2ns_scale,
+ * leading to an incorrect result.
+ *
+ * To avoid this, we can decompose 'cycles' into quotient and remainder
+ * of division by SC.  Then,
+ *
+ * ns = (quot * SC + rem) * cyc2ns_scale / SC
+ *    = quot * cyc2ns_scale + (rem * cyc2ns_scale) / SC
+ *
+ *			- sqazi@google.com
  */
 
 DECLARE_PER_CPU(unsigned long, cyc2ns);
@@ -41,9 +57,14 @@ DECLARE_PER_CPU(unsigned long long, cyc2ns_offset);
 
 static inline unsigned long long __cycles_2_ns(unsigned long long cyc)
 {
+	unsigned long long quot;
+	unsigned long long rem;
 	int cpu = smp_processor_id();
 	unsigned long long ns = per_cpu(cyc2ns_offset, cpu);
-	ns += cyc * per_cpu(cyc2ns, cpu) >> CYC2NS_SCALE_FACTOR;
+	quot = (cyc >> CYC2NS_SCALE_FACTOR);
+	rem = cyc & ((1ULL << CYC2NS_SCALE_FACTOR) - 1);
+	ns += quot * per_cpu(cyc2ns, cpu) +
+		((rem * per_cpu(cyc2ns, cpu)) >> CYC2NS_SCALE_FACTOR);
 	return ns;
 }
 
-- 
1.7.5.4