diff mbox

[V4] POWER: perf_event: Skip updating kernel counters if register value shrinks

Message ID 1302891150-8121-1-git-send-email-emunson@mgebm.net (mailing list archive)
State Accepted, archived
Commit 86c74ab317c1ef4d37325e0d7ca8a01a796b0bd7
Headers show

Commit Message

Eric B Munson April 15, 2011, 6:12 p.m. UTC
Because of speculative event roll back, it is possible for some event coutners
to decrease between reads on POWER7.  This causes a problem with the way that
counters are updated.  Delta calues are calculated in a 64 bit value and the
top 32 bits are masked.  If the register value has decreased, this leaves us
with a very large positive value added to the kernel counters.  This patch
protects against this by skipping the update if the delta would be negative.
This can lead to a lack of precision in the coutner values, but from my testing
the value is typcially fewer than 10 samples at a time.

Signed-off-by: Eric B Munson <emunson@mgebm.net>
Cc: stable@kernel.org
---
Changes from V3:
 Fix delta checking so that only roll backs are discarded
Changes from V2:
 Create a helper that should handle counter roll back as well as registers that
might be allowed to roll over
Changes from V1:
 Updated patch leader
 Added stable CC
 Use an s32 to hold delta values and discard any values that are less than 0

 arch/powerpc/kernel/perf_event.c |   37 ++++++++++++++++++++++++++++++-------
 1 files changed, 30 insertions(+), 7 deletions(-)

Comments

David Laight April 27, 2011, 7:40 a.m. UTC | #1
I keep telling Eric that the code below is incorrect
modulo arithimetic...

> +static u64 check_and_compute_delta(u64 prev, u64 val)
> +{
> +	u64 delta = (val - prev) & 0xfffffffful;
> +
> +	/*
> +	 * POWER7 can roll back counter values, if the new value is
smaller
> +	 * than the previous value it will cause the delta and the
counter to
> +	 * have bogus values unless we rolled a counter over.  If a
coutner is
> +	 * rolled back, it will be smaller, but within 256, which is the
maximum
> +	 * number of events to rollback at once.  If we dectect a
rollback
> +	 * return 0.  This can lead to a small lack of precision in the
> +	 * counters.
> +	 */
> +	if (prev > val && (prev - val) < 256)
> +		delta = 0;
> +
> +	return delta;

The code should detect rollback by looking at the value of 'delta'
otherwise there are horrid end effects near 2^32-1.

For instance:
	u32 delta = val - prev;
	return delta & 0x80000000 ? 0 : delta;


   David
Eric B Munson April 27, 2011, 12:19 p.m. UTC | #2
On Wed, 27 Apr 2011, David Laight wrote:

> I keep telling Eric that the code below is incorrect
> modulo arithimetic...

But it isn't, and it doesn't have trouble with 2^32 - 1.  Here is one done by
hand:

Counter is at 0xffffffff and is rolled over to 0x101 (258 counted items so
that we miss the test for rollback).

 0x00000000ffffffff  (Remember counters are 32 bit, but we store them in 64)
-0x0000000000000101
=0xffffffff00000102

After the mask we have 0x00000000000102, the actual difference between the
counters.

> 
> > +static u64 check_and_compute_delta(u64 prev, u64 val)
> > +{
> > +	u64 delta = (val - prev) & 0xfffffffful;
> > +
> > +	/*
> > +	 * POWER7 can roll back counter values, if the new value is
> smaller
> > +	 * than the previous value it will cause the delta and the
> counter to
> > +	 * have bogus values unless we rolled a counter over.  If a
> coutner is
> > +	 * rolled back, it will be smaller, but within 256, which is the
> maximum
> > +	 * number of events to rollback at once.  If we dectect a
> rollback
> > +	 * return 0.  This can lead to a small lack of precision in the
> > +	 * counters.
> > +	 */
> > +	if (prev > val && (prev - val) < 256)
> > +		delta = 0;
> > +
> > +	return delta;
> 
> The code should detect rollback by looking at the value of 'delta'
> otherwise there are horrid end effects near 2^32-1.
> 
> For instance:
> 	u32 delta = val - prev;
> 	return delta & 0x80000000 ? 0 : delta;
> 
> 
>    David
> 
> 
> 
>
David Laight April 27, 2011, 12:34 p.m. UTC | #3
> But it isn't, and it doesn't have trouble with 2^32 - 1.  

what about:
prev = 0x00000001
val  = 0xffffffff

	David
David Laight April 27, 2011, 12:42 p.m. UTC | #4
> +	if (prev > val && (prev - val) < 256)
> +		delta = 0;
> +
> +	return delta;

Also, 'prev' is a true 64bit value, but 'val' is only ever 32bits.
So once the 64bit 'prev' exceeds 2^32+256 both 'prev > val'
and 'prev - val' are true regardless of the value of 'val'.
This will lead to jumps in the value ....

	David
Eric B Munson April 27, 2011, 12:59 p.m. UTC | #5
On Wed, 27 Apr 2011, David Laight wrote:

> > But it isn't, and it doesn't have trouble with 2^32 - 1.  
> 
> what about:
> prev = 0x00000001
> val  = 0xffffffff

Result is 0xfffffffe and we are fine.
> 
> 	David
> 
>
David Laight April 27, 2011, 1:04 p.m. UTC | #6
> On Wed, 27 Apr 2011, David Laight wrote:
> 
> > > But it isn't, and it doesn't have trouble with 2^32 - 1.  
> > 
> > what about:
> > prev = 0x00000001
> > val  = 0xffffffff
> 
> Result is 0xfffffffe and we are fine.

'delta' will be 0xfffffffe, but you need the function to
return zero - since the underlying counter has decremented
by 2. But 'prev > val' is false so the counter will
be increased by 2^32-2.

	David
Eric B Munson April 27, 2011, 1:08 p.m. UTC | #7
On Wed, 27 Apr 2011, David Laight wrote:

>  
> > +	if (prev > val && (prev - val) < 256)
> > +		delta = 0;
> > +
> > +	return delta;
> 
> Also, 'prev' is a true 64bit value, but 'val' is only ever 32bits.
> So once the 64bit 'prev' exceeds 2^32+256 both 'prev > val'
> and 'prev - val' are true regardless of the value of 'val'.
> This will lead to jumps in the value ....

prev and val are both 64 bit variables holding 32 bit numbers, we do not
accumulate in either, they are both replaced by values directly from the
registers.  So prev > val will not always be true.

Eric
David Laight April 27, 2011, 1:13 p.m. UTC | #8
> prev and val are both 64 bit variables holding 32 bit numbers, we do
not
> accumulate in either, they are both replaced by values directly from
the
> registers.
> So prev > val will not always be true.

The code seems to be:
    prev = local64_read(&event->hw.prev_count);
    val = read_pmc(event->hw.idx);
    delta = check_and_compute_delta(prev, val);
    local64_add(delta, &event->count);
Which looks very much like 'prev' being a 64bit counter generated
from the 32bit pmc register.

	David
Eric B Munson April 27, 2011, 1:20 p.m. UTC | #9
On Wed, 27 Apr 2011, David Laight wrote:

>  
> > prev and val are both 64 bit variables holding 32 bit numbers, we do
> not
> > accumulate in either, they are both replaced by values directly from
> the
> > registers.
> > So prev > val will not always be true.
> 
> The code seems to be:
>     prev = local64_read(&event->hw.prev_count);
>     val = read_pmc(event->hw.idx);
>     delta = check_and_compute_delta(prev, val);
>     local64_add(delta, &event->count);
> Which looks very much like 'prev' being a 64bit counter generated
> from the 32bit pmc register.
> 

Which implies that it will only ever be 32 bits wide, just stored in 64.
Benjamin Herrenschmidt April 27, 2011, 1:26 p.m. UTC | #10
On Wed, 2011-04-27 at 08:40 +0100, David Laight wrote:
> I keep telling Eric that the code below is incorrect
> modulo arithimetic...

His previous versions were wrong yes. This one should be limping along
afaik. But I tend to agree, testing delta is the way to go.

Eric idea was to not test the sign bit to allow for bigger increments of
the counter, since the rollback is limited to 255... But I agree it's
non obvious.

I've put the patch in for now, we can always followup with something
better.

Cheers,
Ben.

> > +static u64 check_and_compute_delta(u64 prev, u64 val)
> > +{
> > +	u64 delta = (val - prev) & 0xfffffffful;
> > +
> > +	/*
> > +	 * POWER7 can roll back counter values, if the new value is
> smaller
> > +	 * than the previous value it will cause the delta and the
> counter to
> > +	 * have bogus values unless we rolled a counter over.  If a
> coutner is
> > +	 * rolled back, it will be smaller, but within 256, which is the
> maximum
> > +	 * number of events to rollback at once.  If we dectect a
> rollback
> > +	 * return 0.  This can lead to a small lack of precision in the
> > +	 * counters.
> > +	 */
> > +	if (prev > val && (prev - val) < 256)
> > +		delta = 0;
> > +
> > +	return delta;
> 
> The code should detect rollback by looking at the value of 'delta'
> otherwise there are horrid end effects near 2^32-1.
> 
> For instance:
> 	u32 delta = val - prev;
> 	return delta & 0x80000000 ? 0 : delta;
> 
> 
>    David
> 
> 
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
diff mbox

Patch

diff --git a/arch/powerpc/kernel/perf_event.c b/arch/powerpc/kernel/perf_event.c
index c4063b7..822f630 100644
--- a/arch/powerpc/kernel/perf_event.c
+++ b/arch/powerpc/kernel/perf_event.c
@@ -398,6 +398,25 @@  static int check_excludes(struct perf_event **ctrs, unsigned int cflags[],
 	return 0;
 }
 
+static u64 check_and_compute_delta(u64 prev, u64 val)
+{
+	u64 delta = (val - prev) & 0xfffffffful;
+
+	/*
+	 * POWER7 can roll back counter values, if the new value is smaller
+	 * than the previous value it will cause the delta and the counter to
+	 * have bogus values unless we rolled a counter over.  If a coutner is
+	 * rolled back, it will be smaller, but within 256, which is the maximum
+	 * number of events to rollback at once.  If we dectect a rollback
+	 * return 0.  This can lead to a small lack of precision in the
+	 * counters.
+	 */
+	if (prev > val && (prev - val) < 256)
+		delta = 0;
+
+	return delta;
+}
+
 static void power_pmu_read(struct perf_event *event)
 {
 	s64 val, delta, prev;
@@ -416,10 +435,11 @@  static void power_pmu_read(struct perf_event *event)
 		prev = local64_read(&event->hw.prev_count);
 		barrier();
 		val = read_pmc(event->hw.idx);
+		delta = check_and_compute_delta(prev, val);
+		if (!delta)
+			return;
 	} while (local64_cmpxchg(&event->hw.prev_count, prev, val) != prev);
 
-	/* The counters are only 32 bits wide */
-	delta = (val - prev) & 0xfffffffful;
 	local64_add(delta, &event->count);
 	local64_sub(delta, &event->hw.period_left);
 }
@@ -449,8 +469,9 @@  static void freeze_limited_counters(struct cpu_hw_events *cpuhw,
 		val = (event->hw.idx == 5) ? pmc5 : pmc6;
 		prev = local64_read(&event->hw.prev_count);
 		event->hw.idx = 0;
-		delta = (val - prev) & 0xfffffffful;
-		local64_add(delta, &event->count);
+		delta = check_and_compute_delta(prev, val);
+		if (delta)
+			local64_add(delta, &event->count);
 	}
 }
 
@@ -458,14 +479,16 @@  static void thaw_limited_counters(struct cpu_hw_events *cpuhw,
 				  unsigned long pmc5, unsigned long pmc6)
 {
 	struct perf_event *event;
-	u64 val;
+	u64 val, prev;
 	int i;
 
 	for (i = 0; i < cpuhw->n_limited; ++i) {
 		event = cpuhw->limited_counter[i];
 		event->hw.idx = cpuhw->limited_hwidx[i];
 		val = (event->hw.idx == 5) ? pmc5 : pmc6;
-		local64_set(&event->hw.prev_count, val);
+		prev = local64_read(&event->hw.prev_count);
+		if (check_and_compute_delta(prev, val))
+			local64_set(&event->hw.prev_count, val);
 		perf_event_update_userpage(event);
 	}
 }
@@ -1197,7 +1220,7 @@  static void record_and_restart(struct perf_event *event, unsigned long val,
 
 	/* we don't have to worry about interrupts here */
 	prev = local64_read(&event->hw.prev_count);
-	delta = (val - prev) & 0xfffffffful;
+	delta = check_and_compute_delta(prev, val);
 	local64_add(delta, &event->count);
 
 	/*