Message ID: 1301059689-4556-1-git-send-email-emunson@mgebm.net (mailing list archive)
State: Changes Requested
On Fri, 2011-03-25 at 09:28 -0400, Eric B Munson wrote:

> It is possible on POWER7 for some perf events to have values decrease. This
> causes a problem with the way the kernel counters are updated. Deltas are
> computed and then stored in a 64 bit value while the registers are 32 bits
> wide, so if the new value is smaller than the previous value, the delta is
> a very large positive value. As a workaround this patch skips updating the
> kernel counter when the new value is smaller than the previous. This can
> lead to a lack of precision in the counter values, but from my testing the
> value is typically fewer than 10 samples at a time.

Unfortunately the patch isn't 100% correct, I believe: I think you don't
deal with the rollover of the counters. The new value could be smaller
than the previous one simply because the counter just rolled over.

In cases like this:

> @@ -449,8 +458,10 @@ static void freeze_limited_counters(struct cpu_hw_events *cpuhw,
>  		val = (event->hw.idx == 5) ? pmc5 : pmc6;
>  		prev = local64_read(&event->hw.prev_count);
>  		event->hw.idx = 0;
> -		delta = (val - prev) & 0xfffffffful;
> -		local64_add(delta, &event->count);
> +		if (val >= prev) {
> +			delta = (val - prev) & 0xfffffffful;
> +			local64_add(delta, &event->count);
> +		}
>  	}
>  }

I wonder if it isn't easier to just define delta to be an s32, get rid
of the mask, and test whether delta is positive, something like:

	delta = val - prev;
	if (delta > 0)
		local64_add(delta, &event->count);

Wouldn't that be simpler? Or am I missing a reason why it wouldn't work?

Cheers,
Ben.
On Tue, 29 Mar 2011, Benjamin Herrenschmidt wrote:

> On Fri, 2011-03-25 at 09:28 -0400, Eric B Munson wrote:
> > It is possible on POWER7 for some perf events to have values decrease. This
> > causes a problem with the way the kernel counters are updated. Deltas are
> > computed and then stored in a 64 bit value while the registers are 32 bits
> > wide, so if the new value is smaller than the previous value, the delta is
> > a very large positive value. As a workaround this patch skips updating the
> > kernel counter when the new value is smaller than the previous. This can
> > lead to a lack of precision in the counter values, but from my testing the
> > value is typically fewer than 10 samples at a time.
>
> Unfortunately the patch isn't 100% correct, I believe:
>
> I think you don't deal with the rollover of the counters. The new value
> could be smaller than the previous one simply because the counter just
> rolled over.
>
> In cases like this:
>
> > @@ -449,8 +458,10 @@ static void freeze_limited_counters(struct cpu_hw_events *cpuhw,
> >  		val = (event->hw.idx == 5) ? pmc5 : pmc6;
> >  		prev = local64_read(&event->hw.prev_count);
> >  		event->hw.idx = 0;
> > -		delta = (val - prev) & 0xfffffffful;
> > -		local64_add(delta, &event->count);
> > +		if (val >= prev) {
> > +			delta = (val - prev) & 0xfffffffful;
> > +			local64_add(delta, &event->count);
> > +		}
> >  	}
> >  }
>
> I wonder if it isn't easier to just define delta to be an s32, get rid
> of the mask, and test whether delta is positive, something like:
>
> 	delta = val - prev;
> 	if (delta > 0)
> 		local64_add(delta, &event->count);
>
> Wouldn't that be simpler? Or am I missing a reason why it wouldn't work?

Here I made the assumption that the hardware would never remove more events
in a speculative roll-back than it had added. This is not a situation I
encountered in my limited testing, so I didn't think underflow was possible.
I will send out a V2 using the signed 32-bit delta and remember to CC stable
this time.

Eric
On Tue, 2011-03-29 at 10:25 -0400, Eric B Munson wrote:

> Here I made the assumption that the hardware would never remove more events
> in a speculative roll-back than it had added. This is not a situation I
> encountered in my limited testing, so I didn't think underflow was possible.
> I will send out a V2 using the signed 32-bit delta and remember to CC stable
> this time.

I'm not thinking about underflow but rollover... or isn't that possible
with those counters? I.e., don't they wrap back to 0 after hitting
ffffffff?

Cheers,
Ben.
On Wed, 30 Mar 2011, Benjamin Herrenschmidt wrote:

> On Tue, 2011-03-29 at 10:25 -0400, Eric B Munson wrote:
> > Here I made the assumption that the hardware would never remove more events
> > in a speculative roll-back than it had added. This is not a situation I
> > encountered in my limited testing, so I didn't think underflow was possible.
> > I will send out a V2 using the signed 32-bit delta and remember to CC stable
> > this time.
>
> I'm not thinking about underflow but rollover... or isn't that possible
> with those counters? I.e., don't they wrap back to 0 after hitting
> ffffffff?

They do roll over to 0 after ffffffff, but I thought that case was already
covered by perf_event_interrupt. Are you concerned that we will reset a
counter and a speculative roll-back will underflow that counter?
On Wed, 2011-03-30 at 14:36 -0400, Eric B Munson wrote:

> On Wed, 30 Mar 2011, Benjamin Herrenschmidt wrote:
>
> > I'm not thinking about underflow but rollover... or isn't that possible
> > with those counters? I.e., don't they wrap back to 0 after hitting
> > ffffffff?
>
> They do roll over to 0 after ffffffff, but I thought that case was already
> covered by perf_event_interrupt. Are you concerned that we will reset a
> counter and a speculative roll-back will underflow that counter?

No, but take this part of the patch:

> --- a/arch/powerpc/kernel/perf_event.c
> +++ b/arch/powerpc/kernel/perf_event.c
> @@ -416,6 +416,15 @@ static void power_pmu_read(struct perf_event *event)
>  		prev = local64_read(&event->hw.prev_count);
>  		barrier();
>  		val = read_pmc(event->hw.idx);
> +		/*
> +		 * POWER7 can roll back counter values, if the new value is
> +		 * smaller than the previous value it will cause the delta
> +		 * and the counter to have bogus values. If this is the
> +		 * case skip updating anything until the counter grows again.
> +		 * This can lead to a small lack of precision in the counters.
> +		 */
> +		if (val < prev)
> +			return;
>  	} while (local64_cmpxchg(&event->hw.prev_count, prev, val) != prev);

Doesn't that mean that power_pmu_read() can only ever increase the value of
the perf_event and so will essentially -stop- once the counter rolls over?

Similar comments everywhere you do this type of comparison.

Cheers,
Ben.
On Thu, 31 Mar 2011, Benjamin Herrenschmidt wrote:

> No, but take this part of the patch:
>
> > --- a/arch/powerpc/kernel/perf_event.c
> > +++ b/arch/powerpc/kernel/perf_event.c
> > @@ -416,6 +416,15 @@ static void power_pmu_read(struct perf_event *event)
> >  		prev = local64_read(&event->hw.prev_count);
> >  		barrier();
> >  		val = read_pmc(event->hw.idx);
> > +		/*
> > +		 * POWER7 can roll back counter values, if the new value is
> > +		 * smaller than the previous value it will cause the delta
> > +		 * and the counter to have bogus values. If this is the
> > +		 * case skip updating anything until the counter grows again.
> > +		 * This can lead to a small lack of precision in the counters.
> > +		 */
> > +		if (val < prev)
> > +			return;
> >  	} while (local64_cmpxchg(&event->hw.prev_count, prev, val) != prev);
>
> Doesn't that mean that power_pmu_read() can only ever increase the value of
> the perf_event and so will essentially -stop- once the counter rolls over?
>
> Similar comments everywhere you do this type of comparison.

Sorry for being so dense on this, but I think that when a counter overflows,
both the register value and the previous value are reset, so we should
continue seeing new event counts after the overflow interrupt handler puts
the counter back into a sane state. What am I not seeing?

Eric
On Thu, 31 Mar 2011, Benjamin Herrenschmidt wrote:

> No, but take this part of the patch:
>
> > @@ -416,6 +416,15 @@ static void power_pmu_read(struct perf_event *event)
> >  		prev = local64_read(&event->hw.prev_count);
> >  		barrier();
> >  		val = read_pmc(event->hw.idx);
> > +		if (val < prev)
> > +			return;
> >  	} while (local64_cmpxchg(&event->hw.prev_count, prev, val) != prev);
>
> Doesn't that mean that power_pmu_read() can only ever increase the value of
> the perf_event and so will essentially -stop- once the counter rolls over?
>
> Similar comments everywhere you do this type of comparison.
>
> Cheers,
> Ben.

Sorry for the nag, but am I missing something about the way the register and
the previous values are reset in the overflow interrupt handler?

Thanks,
Eric
> > Doesn't that mean that power_pmu_read() can only ever increase the value of
> > the perf_event and so will essentially -stop- once the counter rolls over?
> >
> > Similar comments everywhere you do this type of comparison.
> >
> > Cheers,
> > Ben.
>
> Sorry for the nag, but am I missing something about the way the register and
> the previous values are reset in the overflow interrupt handler?

Well, not all counters get interrupts, right? Some counters are just
free-running... I'm not sure when that power_pmu_read() function is actually
used by the core, I'm not that familiar with perf, but I'd say better safe
than sorry. When comparing counter values, do it in a way that is generally
safe vs. wraparounds. Eventually do a helper for that.

Cheers,
Ben.
On Thu, 07 Apr 2011, Benjamin Herrenschmidt wrote:

> > > Doesn't that mean that power_pmu_read() can only ever increase the value of
> > > the perf_event and so will essentially -stop- once the counter rolls over?
> > >
> > > Similar comments everywhere you do this type of comparison.
> > >
> > > Cheers,
> > > Ben.
> >
> > Sorry for the nag, but am I missing something about the way the register and
> > the previous values are reset in the overflow interrupt handler?
>
> Well, not all counters get interrupts, right? Some counters are just
> free-running... I'm not sure when that power_pmu_read() function is actually
> used by the core, I'm not that familiar with perf, but I'd say better safe
> than sorry. When comparing counter values, do it in a way that is generally
> safe vs. wraparounds. Eventually do a helper for that.
>
> Cheers,
> Ben.

I am honestly not sure; I was under the assumption that all counters would
generate an interrupt if they overflowed. I do not have the hardware docs to
prove this, so I will have a V3 out momentarily that (I think/hope) addresses
your concerns.

Eric
diff --git a/arch/powerpc/kernel/perf_event.c b/arch/powerpc/kernel/perf_event.c
index 97e0ae4..6752dc1 100644
--- a/arch/powerpc/kernel/perf_event.c
+++ b/arch/powerpc/kernel/perf_event.c
@@ -416,6 +416,15 @@ static void power_pmu_read(struct perf_event *event)
 		prev = local64_read(&event->hw.prev_count);
 		barrier();
 		val = read_pmc(event->hw.idx);
+		/*
+		 * POWER7 can roll back counter values, if the new value is
+		 * smaller than the previous value it will cause the delta
+		 * and the counter to have bogus values. If this is the
+		 * case skip updating anything until the counter grows again.
+		 * This can lead to a small lack of precision in the counters.
+		 */
+		if (val < prev)
+			return;
 	} while (local64_cmpxchg(&event->hw.prev_count, prev, val) != prev);
 
 	/* The counters are only 32 bits wide */
@@ -449,8 +458,10 @@ static void freeze_limited_counters(struct cpu_hw_events *cpuhw,
 		val = (event->hw.idx == 5) ? pmc5 : pmc6;
 		prev = local64_read(&event->hw.prev_count);
 		event->hw.idx = 0;
-		delta = (val - prev) & 0xfffffffful;
-		local64_add(delta, &event->count);
+		if (val >= prev) {
+			delta = (val - prev) & 0xfffffffful;
+			local64_add(delta, &event->count);
+		}
 	}
 }
 
@@ -458,14 +469,16 @@ static void thaw_limited_counters(struct cpu_hw_events *cpuhw,
 				  unsigned long pmc5, unsigned long pmc6)
 {
 	struct perf_event *event;
-	u64 val;
+	u64 val, prev;
 	int i;
 
 	for (i = 0; i < cpuhw->n_limited; ++i) {
 		event = cpuhw->limited_counter[i];
 		event->hw.idx = cpuhw->limited_hwidx[i];
 		val = (event->hw.idx == 5) ? pmc5 : pmc6;
-		local64_set(&event->hw.prev_count, val);
+		prev = local64_read(&event->hw.prev_count);
+		if (val > prev)
+			local64_set(&event->hw.prev_count, val);
 		perf_event_update_userpage(event);
 	}
 }
 
@@ -1197,7 +1210,10 @@ static void record_and_restart(struct perf_event *event, unsigned long val,
 
 	/* we don't have to worry about interrupts here */
 	prev = local64_read(&event->hw.prev_count);
-	delta = (val - prev) & 0xfffffffful;
+	if (val < prev)
+		delta = 0;
+	else
+		delta = (val - prev) & 0xfffffffful;
 	local64_add(delta, &event->count);
 
 	/*
It is possible on POWER7 for some perf events to have values decrease. This
causes a problem with the way the kernel counters are updated. Deltas are
computed and then stored in a 64 bit value while the registers are 32 bits
wide, so if the new value is smaller than the previous value, the delta is a
very large positive value. As a workaround this patch skips updating the
kernel counter when the new value is smaller than the previous. This can
lead to a lack of precision in the counter values, but from my testing the
value is typically fewer than 10 samples at a time.

Signed-off-by: Eric B Munson <emunson@mgebm.net>
---
 arch/powerpc/kernel/perf_event.c |   26 +++++++++++++++++++++-----
 1 files changed, 21 insertions(+), 5 deletions(-)