
POWER: perf_event: Skip updating kernel counters if register value shrinks

Message ID: 1301059689-4556-1-git-send-email-emunson@mgebm.net (mailing list archive)
State: Changes Requested

Commit Message

Eric B Munson March 25, 2011, 1:28 p.m. UTC
It is possible on POWER7 for some perf events to have values decrease.  This
causes a problem with the way the kernel counters are updated.  Deltas are
computed and then stored in a 64-bit value while the registers are 32 bits
wide, so if the new value is smaller than the previous value, the delta is a
very large positive value.  As a workaround, this patch skips updating the
kernel counter when the new value is smaller than the previous one.  This can
lead to a lack of precision in the counter values, but from my testing the
error is typically fewer than 10 samples at a time.

Signed-off-by: Eric B Munson <emunson@mgebm.net>
---
 arch/powerpc/kernel/perf_event.c |   26 +++++++++++++++++++++-----
 1 files changed, 21 insertions(+), 5 deletions(-)

Comments

Benjamin Herrenschmidt March 29, 2011, 6:03 a.m. UTC | #1
On Fri, 2011-03-25 at 09:28 -0400, Eric B Munson wrote:
> It is possible on POWER7 for some perf events to have values decrease.  This
> causes a problem with the way the kernel counters are updated.  Deltas are
> computed and then stored in a 64-bit value while the registers are 32 bits
> wide, so if the new value is smaller than the previous value, the delta is a
> very large positive value.  As a workaround, this patch skips updating the
> kernel counter when the new value is smaller than the previous one.  This can
> lead to a lack of precision in the counter values, but from my testing the
> error is typically fewer than 10 samples at a time.

Unfortunately the patch isn't 100% correct I believe:

I think you don't deal with the rollover of the counters. The new value
could be smaller than the previous one simply because the counter just
rolled over.

In cases like this:

> @@ -449,8 +458,10 @@ static void freeze_limited_counters(struct cpu_hw_events *cpuhw,
>  		val = (event->hw.idx == 5) ? pmc5 : pmc6;
>  		prev = local64_read(&event->hw.prev_count);
>  		event->hw.idx = 0;
> -		delta = (val - prev) & 0xfffffffful;
> -		local64_add(delta, &event->count);
> +		if (val >= prev) {
> +			delta = (val - prev) & 0xfffffffful;
> +			local64_add(delta, &event->count);
> +		}
>  	}
>  }

I wonder if it isn't easier to just define delta to be a s32, get rid
of the mask and test if delta is positive, something like:

		delta =  val - prev;
		if (delta > 0)
			local64_add(delta, &event->count);

Wouldn't that be simpler ? Or do I miss a reason why it wouldn't work ?

Cheers,
Ben.
Eric B Munson March 29, 2011, 2:25 p.m. UTC | #2
On Tue, 29 Mar 2011, Benjamin Herrenschmidt wrote:

> On Fri, 2011-03-25 at 09:28 -0400, Eric B Munson wrote:
> > It is possible on POWER7 for some perf events to have values decrease.  This
> > causes a problem with the way the kernel counters are updated.  Deltas are
> > computed and then stored in a 64-bit value while the registers are 32 bits
> > wide, so if the new value is smaller than the previous value, the delta is a
> > very large positive value.  As a workaround, this patch skips updating the
> > kernel counter when the new value is smaller than the previous one.  This can
> > lead to a lack of precision in the counter values, but from my testing the
> > error is typically fewer than 10 samples at a time.
> 
> Unfortunately the patch isn't 100% correct I believe:
> 
> I think you don't deal with the rollover of the counters. The new value
> could be smaller than the previous one simply because the counter just
> rolled over.
> 
> In cases like this:
> 
> > @@ -449,8 +458,10 @@ static void freeze_limited_counters(struct cpu_hw_events *cpuhw,
> >  		val = (event->hw.idx == 5) ? pmc5 : pmc6;
> >  		prev = local64_read(&event->hw.prev_count);
> >  		event->hw.idx = 0;
> > -		delta = (val - prev) & 0xfffffffful;
> > -		local64_add(delta, &event->count);
> > +		if (val >= prev) {
> > +			delta = (val - prev) & 0xfffffffful;
> > +			local64_add(delta, &event->count);
> > +		}
> >  	}
> >  }
> 
> I wonder if it isn't easier to just define delta to be a s32, get rid
> of the mask and test if delta is positive, something like:
> 
> 		delta =  val - prev;
> 		if (delta > 0)
> 			local64_add(delta, &event->count);
> 
> Wouldn't that be simpler ? Or do I miss a reason why it wouldn't work ?

Here I made the assumption that the hardware would never remove more events in
a speculative roll back than it had added.  This is not a situation I
encountered in my limited testing, so I didn't think underflow was possible.  I
will send out a V2 using the signed 32-bit delta and remember to CC stable
this time.

Eric
Benjamin Herrenschmidt March 29, 2011, 9:12 p.m. UTC | #3
On Tue, 2011-03-29 at 10:25 -0400, Eric B Munson wrote:
> Here I made the assumption that the hardware would never remove more events in
> a speculative roll back than it had added.  This is not a situation I
> encountered in my limited testing, so I didn't think underflow was possible.  I
> will send out a V2 using the signed 32-bit delta and remember to CC stable
> this time.

I'm not thinking about underflow but rollover... or that isn't possible
with those counters ? IE. They don't wrap back to 0 after hitting
ffffffff ?

Cheers,
Ben.
Eric B Munson March 30, 2011, 6:36 p.m. UTC | #4
On Wed, 30 Mar 2011, Benjamin Herrenschmidt wrote:

> On Tue, 2011-03-29 at 10:25 -0400, Eric B Munson wrote:
> > Here I made the assumption that the hardware would never remove more events in
> > a speculative roll back than it had added.  This is not a situation I
> > encountered in my limited testing, so I didn't think underflow was possible.  I
> > will send out a V2 using the signed 32-bit delta and remember to CC stable
> > this time.
> 
> I'm not thinking about underflow but rollover... or that isn't possible
> with those counters ? IE. They don't wrap back to 0 after hitting
> ffffffff ?
> 

They do roll over to 0 after ffffffff, but I thought that case was already
covered by the perf_event_interrupt.  Are you concerned that we will reset a
counter and speculative roll back will underflow that counter?
Benjamin Herrenschmidt March 31, 2011, 6:04 a.m. UTC | #5
On Wed, 2011-03-30 at 14:36 -0400, Eric B Munson wrote:
> On Wed, 30 Mar 2011, Benjamin Herrenschmidt wrote:
> 
> > On Tue, 2011-03-29 at 10:25 -0400, Eric B Munson wrote:
> > > Here I made the assumption that the hardware would never remove more events in
> > > a speculative roll back than it had added.  This is not a situation I
> > > encountered in my limited testing, so I didn't think underflow was possible.  I
> > > will send out a V2 using the signed 32-bit delta and remember to CC stable
> > > this time.
> > 
> > I'm not thinking about underflow but rollover... or that isn't possible
> > with those counters ? IE. They don't wrap back to 0 after hitting
> > ffffffff ?
> > 
> 
> They do roll over to 0 after ffffffff, but I thought that case was already
> covered by the perf_event_interrupt.  Are you concerned that we will reset a
> counter and speculative roll back will underflow that counter?

No, but take this part of the patch:

> --- a/arch/powerpc/kernel/perf_event.c
> +++ b/arch/powerpc/kernel/perf_event.c
> @@ -416,6 +416,15 @@ static void power_pmu_read(struct perf_event *event)
>  		prev = local64_read(&event->hw.prev_count);
>  		barrier();
>  		val = read_pmc(event->hw.idx);
> +		/*
> +		 * POWER7 can roll back counter values, if the new value is
> +		 * smaller than the previous value it will cause the delta
> +		 * and the counter to have bogus values.  If this is the
> +		 * case skip updating anything until the counter grows again.
> +		 * This can lead to a small lack of precision in the counters.
> +		 */
> +		if (val < prev)
> +			return;
>  	} while (local64_cmpxchg(&event->hw.prev_count, prev, val) != prev);

Doesn't that mean that power_pmu_read() can only ever increase the value of
the perf_event and so will essentially -stop- once the counter rolls over ?

Similar comments every where you do this type of comparison.

Cheers,
Ben.
Eric B Munson March 31, 2011, 4:14 p.m. UTC | #6
On Thu, 31 Mar 2011, Benjamin Herrenschmidt wrote:

> On Wed, 2011-03-30 at 14:36 -0400, Eric B Munson wrote:
> > On Wed, 30 Mar 2011, Benjamin Herrenschmidt wrote:
> > 
> > > On Tue, 2011-03-29 at 10:25 -0400, Eric B Munson wrote:
> > > > Here I made the assumption that the hardware would never remove more events in
> > > > a speculative roll back than it had added.  This is not a situation I
> > > > encountered in my limited testing, so I didn't think underflow was possible.  I
> > > > will send out a V2 using the signed 32-bit delta and remember to CC stable
> > > > this time.
> > > 
> > > I'm not thinking about underflow but rollover... or that isn't possible
> > > with those counters ? IE. They don't wrap back to 0 after hitting
> > > ffffffff ?
> > > 
> > 
> > They do roll over to 0 after ffffffff, but I thought that case was already
> > covered by the perf_event_interrupt.  Are you concerned that we will reset a
> > counter and speculative roll back will underflow that counter?
> 
> No, but take this part of the patch:
> 
> > --- a/arch/powerpc/kernel/perf_event.c
> > +++ b/arch/powerpc/kernel/perf_event.c
> > @@ -416,6 +416,15 @@ static void power_pmu_read(struct perf_event *event)
> >  		prev = local64_read(&event->hw.prev_count);
> >  		barrier();
> >  		val = read_pmc(event->hw.idx);
> > +		/*
> > +		 * POWER7 can roll back counter values, if the new value is
> > +		 * smaller than the previous value it will cause the delta
> > +		 * and the counter to have bogus values.  If this is the
> > +		 * case skip updating anything until the counter grows again.
> > +		 * This can lead to a small lack of precision in the counters.
> > +		 */
> > +		if (val < prev)
> > +			return;
> >  	} while (local64_cmpxchg(&event->hw.prev_count, prev, val) != prev);
> 
> Doesn't that mean that power_pmu_read() can only ever increase the value of
> the perf_event and so will essentially -stop- once the counter rolls over ?
> 
> Similar comments every where you do this type of comparison.
> 

Sorry for being so dense on this, but I think that when a counter overflows,
both the register value and the previous value are reset, so we should continue
seeing new event counts after the overflow interrupt handler puts the counter
back into a sane state.  What am I not seeing?

Eric
Eric B Munson April 6, 2011, 9:27 p.m. UTC | #7
On Thu, 31 Mar 2011, Benjamin Herrenschmidt wrote:

> On Wed, 2011-03-30 at 14:36 -0400, Eric B Munson wrote:
> > On Wed, 30 Mar 2011, Benjamin Herrenschmidt wrote:
> > 
> > > On Tue, 2011-03-29 at 10:25 -0400, Eric B Munson wrote:
> > > > Here I made the assumption that the hardware would never remove more events in
> > > > a speculative roll back than it had added.  This is not a situation I
> > > > encountered in my limited testing, so I didn't think underflow was possible.  I
> > > > will send out a V2 using the signed 32-bit delta and remember to CC stable
> > > > this time.
> > > 
> > > I'm not thinking about underflow but rollover... or that isn't possible
> > > with those counters ? IE. They don't wrap back to 0 after hitting
> > > ffffffff ?
> > > 
> > 
> > They do roll over to 0 after ffffffff, but I thought that case was already
> > covered by the perf_event_interrupt.  Are you concerned that we will reset a
> > counter and speculative roll back will underflow that counter?
> 
> No, but take this part of the patch:
> 
> > --- a/arch/powerpc/kernel/perf_event.c
> > +++ b/arch/powerpc/kernel/perf_event.c
> > @@ -416,6 +416,15 @@ static void power_pmu_read(struct perf_event *event)
> >  		prev = local64_read(&event->hw.prev_count);
> >  		barrier();
> >  		val = read_pmc(event->hw.idx);
> > +		/*
> > +		 * POWER7 can roll back counter values, if the new value is
> > +		 * smaller than the previous value it will cause the delta
> > +		 * and the counter to have bogus values.  If this is the
> > +		 * case skip updating anything until the counter grows again.
> > +		 * This can lead to a small lack of precision in the counters.
> > +		 */
> > +		if (val < prev)
> > +			return;
> >  	} while (local64_cmpxchg(&event->hw.prev_count, prev, val) != prev);
> 
> Doesn't that mean that power_pmu_read() can only ever increase the value of
> the perf_event and so will essentially -stop- once the counter rolls over ?
> 
> Similar comments every where you do this type of comparison.
> 
> Cheers,
> Ben.

Sorry for the nag, but am I missing something about the way the register and
the previous values are reset in the overflow interrupt handler?

Thanks,
Eric
Benjamin Herrenschmidt April 7, 2011, 4:22 a.m. UTC | #8
> > Doesn't that mean that power_pmu_read() can only ever increase the value of
> > the perf_event and so will essentially -stop- once the counter rolls over ?
> > 
> > Similar comments every where you do this type of comparison.
> > 
> > Cheers,
> > Ben.
> 
> Sorry for the nag, but am I missing something about the way the register and
> the previous values are reset in the overflow interrupt handler?

Well, not all counters get interrupts right ? Some counters are just
free running... I'm not sure when that power_pmu_read() function is
actually used by the core, I'm not that familiar with perf, but I'd say
better safe than sorry. When comparing counter values, do it in a way
that is generally safe vs. wraparounds. Perhaps add a helper for that.

Cheers,
Ben.
Eric B Munson April 7, 2011, 4:16 p.m. UTC | #9
On Thu, 07 Apr 2011, Benjamin Herrenschmidt wrote:

> 
> > > Doesn't that mean that power_pmu_read() can only ever increase the value of
> > > the perf_event and so will essentially -stop- once the counter rolls over ?
> > > 
> > > Similar comments every where you do this type of comparison.
> > > 
> > > Cheers,
> > > Ben.
> > 
> > Sorry for the nag, but am I missing something about the way the register and
> > the previous values are reset in the overflow interrupt handler?
> 
> Well, not all counters get interrupts right ? Some counters are just
> free running... I'm not sure when that power_pmu_read() function is
> actually used by the core, I'm not that familiar with perf, but I'd say
> better safe than sorry. When comparing counter values, do it in a way
> that is generally safe vs. wraparounds. Perhaps add a helper for that.
> 
> Cheers,
> Ben.

I am honestly not sure; I was under the assumption that all counters would
generate an interrupt if they overflowed.  I do not have the hardware docs to
prove this, so I will send out a V3 that (I think/hope) addresses your concerns
momentarily.

Eric

Patch

diff --git a/arch/powerpc/kernel/perf_event.c b/arch/powerpc/kernel/perf_event.c
index 97e0ae4..6752dc1 100644
--- a/arch/powerpc/kernel/perf_event.c
+++ b/arch/powerpc/kernel/perf_event.c
@@ -416,6 +416,15 @@  static void power_pmu_read(struct perf_event *event)
 		prev = local64_read(&event->hw.prev_count);
 		barrier();
 		val = read_pmc(event->hw.idx);
+		/*
+		 * POWER7 can roll back counter values, if the new value is
+		 * smaller than the previous value it will cause the delta
+		 * and the counter to have bogus values.  If this is the
+		 * case skip updating anything until the counter grows again.
+		 * This can lead to a small lack of precision in the counters.
+		 */
+		if (val < prev)
+			return;
 	} while (local64_cmpxchg(&event->hw.prev_count, prev, val) != prev);
 
 	/* The counters are only 32 bits wide */
@@ -449,8 +458,10 @@  static void freeze_limited_counters(struct cpu_hw_events *cpuhw,
 		val = (event->hw.idx == 5) ? pmc5 : pmc6;
 		prev = local64_read(&event->hw.prev_count);
 		event->hw.idx = 0;
-		delta = (val - prev) & 0xfffffffful;
-		local64_add(delta, &event->count);
+		if (val >= prev) {
+			delta = (val - prev) & 0xfffffffful;
+			local64_add(delta, &event->count);
+		}
 	}
 }
 
@@ -458,14 +469,16 @@  static void thaw_limited_counters(struct cpu_hw_events *cpuhw,
 				  unsigned long pmc5, unsigned long pmc6)
 {
 	struct perf_event *event;
-	u64 val;
+	u64 val, prev;
 	int i;
 
 	for (i = 0; i < cpuhw->n_limited; ++i) {
 		event = cpuhw->limited_counter[i];
 		event->hw.idx = cpuhw->limited_hwidx[i];
 		val = (event->hw.idx == 5) ? pmc5 : pmc6;
-		local64_set(&event->hw.prev_count, val);
+		prev = local64_read(&event->hw.prev_count);
+		if (val > prev)
+			local64_set(&event->hw.prev_count, val);
 		perf_event_update_userpage(event);
 	}
 }
@@ -1197,7 +1210,10 @@  static void record_and_restart(struct perf_event *event, unsigned long val,
 
 	/* we don't have to worry about interrupts here */
 	prev = local64_read(&event->hw.prev_count);
-	delta = (val - prev) & 0xfffffffful;
+	if (val < prev)
+		delta = 0;
+	else
+		delta = (val - prev) & 0xfffffffful;
 	local64_add(delta, &event->count);
 
 	/*