diff mbox

powerpc/perf: hw breakpoints return ENOSPC

Message ID 28857.1345091034@neuling.org (mailing list archive)
State Not Applicable
Headers show

Commit Message

Michael Neuling Aug. 16, 2012, 4:23 a.m. UTC
Hi,

I've been trying to get hardware breakpoints with perf to work on POWER7
but I'm getting the following:

  % perf record -e mem:0x10000000 true

    Error: sys_perf_event_open() syscall returned with 28 (No space left on device).  /bin/dmesg may provide additional information.

    Fatal: No CONFIG_PERF_EVENTS=y kernel support configured?

  true: Terminated

(FWIW adding -a and it works fine)

Debugging it seems that __reserve_bp_slot() is returning ENOSPC because
it thinks there are no free breakpoint slots on this CPU.

I have a 2 CPUs, so perf userspace is doing two perf_event_open syscalls
to add a counter to each CPU [1].  The first syscall succeeds but the
second is failing.

On this second syscall, fetch_bp_busy_slots() sets slots.pinned to be 1,
despite there being no breakpoint on this CPU.  This is because the call
the task_bp_pinned, checks all CPUs, rather than just the current CPU.
POWER7 only has one hardware breakpoint per CPU (ie. HBP_NUM=1), so we
return ENOSPC.

The following patch fixes this by checking the associated CPU for each
breakpoint in task_bp_pinned.  I'm not familiar with this code, so it's
provided as a reference to the above issue.

Mikey

1. not sure why it doesn't just do one syscall and specify all CPUs, but
that's another issue.  Using two syscalls should work.

Comments

Peter Zijlstra Aug. 16, 2012, 7:40 a.m. UTC | #1
On Thu, 2012-08-16 at 14:23 +1000, Michael Neuling wrote:
> 
> On this second syscall, fetch_bp_busy_slots() sets slots.pinned to be 1,
> despite there being no breakpoint on this CPU.  This is because the call
> the task_bp_pinned, checks all CPUs, rather than just the current CPU.
> POWER7 only has one hardware breakpoint per CPU (ie. HBP_NUM=1), so we
> return ENOSPC.

I think this comes from the ptrace legacy, we register a breakpoint on
all cpus because when we migrate a task it cannot fail to migrate the
breakpoint.

Its one of the things I hate most about the hwbp stuff as it relates to
perf.

Frederic knows more...
Michael Neuling Aug. 16, 2012, 11:17 a.m. UTC | #2
Peter,

> > On this second syscall, fetch_bp_busy_slots() sets slots.pinned to be 1,
> > despite there being no breakpoint on this CPU.  This is because the call
> > the task_bp_pinned, checks all CPUs, rather than just the current CPU.
> > POWER7 only has one hardware breakpoint per CPU (ie. HBP_NUM=1), so we
> > return ENOSPC.
> 
> I think this comes from the ptrace legacy, we register a breakpoint on
> all cpus because when we migrate a task it cannot fail to migrate the
> breakpoint.
> 
> Its one of the things I hate most about the hwbp stuff as it relates to
> perf.
> 
> Frederic knows more...

Maybe I should wait for Frederic to respond but I'm not sure I
understand what you're saying.

I can see how using ptrace hw breakpoints and perf hw breakpoints at the
same time could be a problem, but I'm not sure how this would stop it.

Are you saying that we need to keep at least 1 slot free at all times,
so that we can use it for ptrace?

Is "perf record -e mem:0x10000000 true" ever going to be able to work on
POWER7 with only one hw breakpoint resource per CPU?  

Thanks,
Mikey
Peter Zijlstra Aug. 16, 2012, 11:44 a.m. UTC | #3
On Thu, 2012-08-16 at 21:17 +1000, Michael Neuling wrote:
> Peter,
> 
> > > On this second syscall, fetch_bp_busy_slots() sets slots.pinned to be 1,
> > > despite there being no breakpoint on this CPU.  This is because the call
> > > the task_bp_pinned, checks all CPUs, rather than just the current CPU.
> > > POWER7 only has one hardware breakpoint per CPU (ie. HBP_NUM=1), so we
> > > return ENOSPC.
> > 
> > I think this comes from the ptrace legacy, we register a breakpoint on
> > all cpus because when we migrate a task it cannot fail to migrate the
> > breakpoint.
> > 
> > Its one of the things I hate most about the hwbp stuff as it relates to
> > perf.
> > 
> > Frederic knows more...
> 
> Maybe I should wait for Frederic to respond but I'm not sure I
> understand what you're saying.
> 
> I can see how using ptrace hw breakpoints and perf hw breakpoints at the
> same time could be a problem, but I'm not sure how this would stop it.

ptrace uses perf for hwbp support so we're stuck with all kinds of
stupid ptrace constraints.. or somesuch.

> Are you saying that we need to keep at least 1 slot free at all times,
> so that we can use it for ptrace?

No, I'm saying perf-hwbp is weird because of ptrace, maybe the ptrace
weirdness shouldn't live in perf-hwpb but in the ptrace-perf glue
however..

> Is "perf record -e mem:0x10000000 true" ever going to be able to work on
> POWER7 with only one hw breakpoint resource per CPU?  

I think it should work... but I'm fairly sure it currently doesn't
because of how things are done. 'perf record -ie mem:0x100... true'
might just work.

I always forget all the ptrace details but I am forever annoyed at the
mess that is perf-hwbp.. Frederic is there really nothing we can do
about this?

The fact that ptrace hwbp semantics are different per architecture
doesn't help of course.
Michael Ellerman Aug. 16, 2012, 2:02 p.m. UTC | #4
On Thu, 2012-08-16 at 13:44 +0200, Peter Zijlstra wrote:
> On Thu, 2012-08-16 at 21:17 +1000, Michael Neuling wrote:
> > Peter,
> > 
> > > > On this second syscall, fetch_bp_busy_slots() sets slots.pinned to be 1,
> > > > despite there being no breakpoint on this CPU.  This is because the call
> > > > the task_bp_pinned, checks all CPUs, rather than just the current CPU.
> > > > POWER7 only has one hardware breakpoint per CPU (ie. HBP_NUM=1), so we
> > > > return ENOSPC.
> > > 
> > > I think this comes from the ptrace legacy, we register a breakpoint on
> > > all cpus because when we migrate a task it cannot fail to migrate the
> > > breakpoint.
> > > 
> > > Its one of the things I hate most about the hwbp stuff as it relates to
> > > perf.
> > > 
> > > Frederic knows more...
> > 
> > Maybe I should wait for Frederic to respond but I'm not sure I
> > understand what you're saying.
> > 
> > I can see how using ptrace hw breakpoints and perf hw breakpoints at the
> > same time could be a problem, but I'm not sure how this would stop it.
> 
> ptrace uses perf for hwbp support so we're stuck with all kinds of
> stupid ptrace constraints.. or somesuch.
> 
> > Are you saying that we need to keep at least 1 slot free at all times,
> > so that we can use it for ptrace?
> 
> No, I'm saying perf-hwbp is weird because of ptrace, maybe the ptrace
> weirdness shouldn't live in perf-hwpb but in the ptrace-perf glue
> however..

But how else would it work, even if ptrace wasn't in the picture?

You do want to guarantee that the task will always be subject to the
breakpoint, even if it moves cpus. So is there any way to guarantee that
other than reserving a breakpoint slot on every cpu ahead of time?

Or can a hwbp event go into error state if it can't be installed on the
new cpu, like a pinned event does? I can't see any code that does that.

cheers
Peter Zijlstra Aug. 16, 2012, 2:15 p.m. UTC | #5
On Fri, 2012-08-17 at 00:02 +1000, Michael Ellerman wrote:
> You do want to guarantee that the task will always be subject to the
> breakpoint, even if it moves cpus. So is there any way to guarantee that
> other than reserving a breakpoint slot on every cpu ahead of time? 

That's not how regular perf works.. regular perf can overload hw
resources at will and stuff is strictly per-cpu.

So the regular perf record has perf_event_attr::inherit enabled by
default, this will result in it creating a per-task-per-cpu event for
each cpu and this will succeed because there's no strict reservation to
avoid/detect starvation against perf_event_attr::pinned events.

For regular (!pinned) events, we'll RR the created events on the
available hardware resources.

HWBP does things completely different and reserves a slot over all CPUs
for everything, thus stuff completely falls apart.
Michael Neuling Aug. 16, 2012, 11:34 p.m. UTC | #6
> > > > On this second syscall, fetch_bp_busy_slots() sets slots.pinned to be 1,
> > > > despite there being no breakpoint on this CPU.  This is because the call
> > > > the task_bp_pinned, checks all CPUs, rather than just the current CPU.
> > > > POWER7 only has one hardware breakpoint per CPU (ie. HBP_NUM=1), so we
> > > > return ENOSPC.
> > > 
> > > I think this comes from the ptrace legacy, we register a breakpoint on
> > > all cpus because when we migrate a task it cannot fail to migrate the
> > > breakpoint.
> > > 
> > > Its one of the things I hate most about the hwbp stuff as it relates to
> > > perf.
> > > 
> > > Frederic knows more...
> > 
> > Maybe I should wait for Frederic to respond but I'm not sure I
> > understand what you're saying.
> > 
> > I can see how using ptrace hw breakpoints and perf hw breakpoints at the
> > same time could be a problem, but I'm not sure how this would stop it.
> 
> ptrace uses perf for hwbp support so we're stuck with all kinds of
> stupid ptrace constraints.. or somesuch.

OK

> > Are you saying that we need to keep at least 1 slot free at all times,
> > so that we can use it for ptrace?
> 
> No, I'm saying perf-hwbp is weird because of ptrace, maybe the ptrace
> weirdness shouldn't live in perf-hwpb but in the ptrace-perf glue
> however..

OK.

> > Is "perf record -e mem:0x10000000 true" ever going to be able to work on
> > POWER7 with only one hw breakpoint resource per CPU?  
> 
> I think it should work... but I'm fairly sure it currently doesn't
> because of how things are done. 'perf record -ie mem:0x100... true'
> might just work.

Adding -i doesn't help. 

Mikey
Michael Ellerman Aug. 17, 2012, 1:20 a.m. UTC | #7
On Thu, 2012-08-16 at 16:15 +0200, Peter Zijlstra wrote:
> On Fri, 2012-08-17 at 00:02 +1000, Michael Ellerman wrote:
> > You do want to guarantee that the task will always be subject to the
> > breakpoint, even if it moves cpus. So is there any way to guarantee that
> > other than reserving a breakpoint slot on every cpu ahead of time? 
> 
> That's not how regular perf works.. regular perf can overload hw
> resources at will and stuff is strictly per-cpu.
..
> For regular (!pinned) events, we'll RR the created events on the
> available hardware resources.

Yeah I know, but that isn't really the semantics you want for a
breakpoint. You don't want to sometimes have the breakpoint active and
sometimes not, it needs to be active at all times when the task is
running.

At the very least you want it to behave like a pinned event, ie. if it
can't be scheduled you get notified and can tell the user.

> HWBP does things completely different and reserves a slot over all CPUs
> for everything, thus stuff completely falls apart.

So it would seem :)

I guess my point was that reserving a slot on each cpu seems like a
reasonable way of guaranteeing that wherever the task goes we will be
able to install the breakpoint.

But obviously we need some way to make it play nice with perf.

cheers
Frédéric Weisbecker Aug. 17, 2012, 4:15 p.m. UTC | #8
On Thu, Aug 16, 2012 at 02:23:54PM +1000, Michael Neuling wrote:
> Hi,
> 
> I've been trying to get hardware breakpoints with perf to work on POWER7
> but I'm getting the following:
> 
>   % perf record -e mem:0x10000000 true
> 
>     Error: sys_perf_event_open() syscall returned with 28 (No space left on device).  /bin/dmesg may provide additional information.
> 
>     Fatal: No CONFIG_PERF_EVENTS=y kernel support configured?
> 
>   true: Terminated
> 
> (FWIW adding -a and it works fine)
> 
> Debugging it seems that __reserve_bp_slot() is returning ENOSPC because
> it thinks there are no free breakpoint slots on this CPU.
> 
> I have a 2 CPUs, so perf userspace is doing two perf_event_open syscalls
> to add a counter to each CPU [1].  The first syscall succeeds but the
> second is failing.
> 
> On this second syscall, fetch_bp_busy_slots() sets slots.pinned to be 1,
> despite there being no breakpoint on this CPU.  This is because the call
> the task_bp_pinned, checks all CPUs, rather than just the current CPU.
> POWER7 only has one hardware breakpoint per CPU (ie. HBP_NUM=1), so we
> return ENOSPC.
> 
> The following patch fixes this by checking the associated CPU for each
> breakpoint in task_bp_pinned.  I'm not familiar with this code, so it's
> provided as a reference to the above issue.
> 
> Mikey
> 
> 1. not sure why it doesn't just do one syscall and specify all CPUs, but
> that's another issue.  Using two syscalls should work.

This patch seems to make sense. I'll try it and run some tests.
Can I have your Signed-off-by ?

Thanks.

> 
> diff --git a/kernel/events/hw_breakpoint.c b/kernel/events/hw_breakpoint.c
> index bb38c4d..e092daa 100644
> --- a/kernel/events/hw_breakpoint.c
> +++ b/kernel/events/hw_breakpoint.c
> @@ -111,14 +111,16 @@ static unsigned int max_task_bp_pinned(int cpu, enum bp_type_idx type)
>   * Count the number of breakpoints of the same type and same task.
>   * The given event must be not on the list.
>   */
> -static int task_bp_pinned(struct perf_event *bp, enum bp_type_idx type)
> +static int task_bp_pinned(int cpu, struct perf_event *bp, enum bp_type_idx type)
>  {
>  	struct task_struct *tsk = bp->hw.bp_target;
>  	struct perf_event *iter;
>  	int count = 0;
>  
>  	list_for_each_entry(iter, &bp_task_head, hw.bp_list) {
> -		if (iter->hw.bp_target == tsk && find_slot_idx(iter) == type)
> +		if (iter->hw.bp_target == tsk &&
> +		    find_slot_idx(iter) == type &&
> +		    cpu == iter->cpu)
>  			count += hw_breakpoint_weight(iter);
>  	}
>  
> @@ -141,7 +143,7 @@ fetch_bp_busy_slots(struct bp_busy_slots *slots, struct perf_event *bp,
>  		if (!tsk)
>  			slots->pinned += max_task_bp_pinned(cpu, type);
>  		else
> -			slots->pinned += task_bp_pinned(bp, type);
> +			slots->pinned += task_bp_pinned(cpu, bp, type);
>  		slots->flexible = per_cpu(nr_bp_flexible[type], cpu);
>  
>  		return;
> @@ -154,7 +156,7 @@ fetch_bp_busy_slots(struct bp_busy_slots *slots, struct perf_event *bp,
>  		if (!tsk)
>  			nr += max_task_bp_pinned(cpu, type);
>  		else
> -			nr += task_bp_pinned(bp, type);
> +			nr += task_bp_pinned(cpu, bp, type);
>  
>  		if (nr > slots->pinned)
>  			slots->pinned = nr;
> @@ -188,7 +190,7 @@ static void toggle_bp_task_slot(struct perf_event *bp, int cpu, bool enable,
>  	int old_idx = 0;
>  	int idx = 0;
>  
> -	old_count = task_bp_pinned(bp, type);
> +	old_count = task_bp_pinned(cpu, bp, type);
>  	old_idx = old_count - 1;
>  	idx = old_idx + weight;
>
Michael Neuling Aug. 17, 2012, 9:58 p.m. UTC | #9
Frederic Weisbecker <fweisbec@gmail.com> wrote:

> On Thu, Aug 16, 2012 at 02:23:54PM +1000, Michael Neuling wrote:
> > Hi,
> > 
> > I've been trying to get hardware breakpoints with perf to work on POWER7
> > but I'm getting the following:
> > 
> >   % perf record -e mem:0x10000000 true
> > 
> >     Error: sys_perf_event_open() syscall returned with 28 (No space left on device).  /bin/dmesg may provide additional information.
> > 
> >     Fatal: No CONFIG_PERF_EVENTS=y kernel support configured?
> > 
> >   true: Terminated
> > 
> > (FWIW adding -a and it works fine)
> > 
> > Debugging it seems that __reserve_bp_slot() is returning ENOSPC because
> > it thinks there are no free breakpoint slots on this CPU.
> > 
> > I have a 2 CPUs, so perf userspace is doing two perf_event_open syscalls
> > to add a counter to each CPU [1].  The first syscall succeeds but the
> > second is failing.
> > 
> > On this second syscall, fetch_bp_busy_slots() sets slots.pinned to be 1,
> > despite there being no breakpoint on this CPU.  This is because the call
> > the task_bp_pinned, checks all CPUs, rather than just the current CPU.
> > POWER7 only has one hardware breakpoint per CPU (ie. HBP_NUM=1), so we
> > return ENOSPC.
> > 
> > The following patch fixes this by checking the associated CPU for each
> > breakpoint in task_bp_pinned.  I'm not familiar with this code, so it's
> > provided as a reference to the above issue.
> > 
> > Mikey
> > 
> > 1. not sure why it doesn't just do one syscall and specify all CPUs, but
> > that's another issue.  Using two syscalls should work.
> 
> This patch seems to make sense. I'll try it and run some tests.
> Can I have your Signed-off-by ?

Of course...

Signed-off-by: Michael Neuling <mikey@neuling.org>

> 
> Thanks.
> 
> > 
> > diff --git a/kernel/events/hw_breakpoint.c b/kernel/events/hw_breakpoint.c
> > index bb38c4d..e092daa 100644
> > --- a/kernel/events/hw_breakpoint.c
> > +++ b/kernel/events/hw_breakpoint.c
> > @@ -111,14 +111,16 @@ static unsigned int max_task_bp_pinned(int cpu, enum bp_type_idx type)
> >   * Count the number of breakpoints of the same type and same task.
> >   * The given event must be not on the list.
> >   */
> > -static int task_bp_pinned(struct perf_event *bp, enum bp_type_idx type)
> > +static int task_bp_pinned(int cpu, struct perf_event *bp, enum bp_type_idx type)
> >  {
> >  	struct task_struct *tsk = bp->hw.bp_target;
> >  	struct perf_event *iter;
> >  	int count = 0;
> >  
> >  	list_for_each_entry(iter, &bp_task_head, hw.bp_list) {
> > -		if (iter->hw.bp_target == tsk && find_slot_idx(iter) == type)
> > +		if (iter->hw.bp_target == tsk &&
> > +		    find_slot_idx(iter) == type &&
> > +		    cpu == iter->cpu)
> >  			count += hw_breakpoint_weight(iter);
> >  	}
> >  
> > @@ -141,7 +143,7 @@ fetch_bp_busy_slots(struct bp_busy_slots *slots, struct perf_event *bp,
> >  		if (!tsk)
> >  			slots->pinned += max_task_bp_pinned(cpu, type);
> >  		else
> > -			slots->pinned += task_bp_pinned(bp, type);
> > +			slots->pinned += task_bp_pinned(cpu, bp, type);
> >  		slots->flexible = per_cpu(nr_bp_flexible[type], cpu);
> >  
> >  		return;
> > @@ -154,7 +156,7 @@ fetch_bp_busy_slots(struct bp_busy_slots *slots, struct perf_event *bp,
> >  		if (!tsk)
> >  			nr += max_task_bp_pinned(cpu, type);
> >  		else
> > -			nr += task_bp_pinned(bp, type);
> > +			nr += task_bp_pinned(cpu, bp, type);
> >  
> >  		if (nr > slots->pinned)
> >  			slots->pinned = nr;
> > @@ -188,7 +190,7 @@ static void toggle_bp_task_slot(struct perf_event *bp, int cpu, bool enable,
> >  	int old_idx = 0;
> >  	int idx = 0;
> >  
> > -	old_count = task_bp_pinned(bp, type);
> > +	old_count = task_bp_pinned(cpu, bp, type);
> >  	old_idx = old_count - 1;
> >  	idx = old_idx + weight;
> >  
>
Michael Neuling Sept. 25, 2012, 7:10 a.m. UTC | #10
Michael Neuling <mikey@neuling.org> wrote:

> Frederic Weisbecker <fweisbec@gmail.com> wrote:
> 
> > On Thu, Aug 16, 2012 at 02:23:54PM +1000, Michael Neuling wrote:
> > > Hi,
> > > 
> > > I've been trying to get hardware breakpoints with perf to work on POWER7
> > > but I'm getting the following:
> > > 
> > >   % perf record -e mem:0x10000000 true
> > > 
> > >     Error: sys_perf_event_open() syscall returned with 28 (No space left on device).  /bin/dmesg may provide additional information.
> > > 
> > >     Fatal: No CONFIG_PERF_EVENTS=y kernel support configured?
> > > 
> > >   true: Terminated
> > > 
> > > (FWIW adding -a and it works fine)
> > > 
> > > Debugging it seems that __reserve_bp_slot() is returning ENOSPC because
> > > it thinks there are no free breakpoint slots on this CPU.
> > > 
> > > I have a 2 CPUs, so perf userspace is doing two perf_event_open syscalls
> > > to add a counter to each CPU [1].  The first syscall succeeds but the
> > > second is failing.
> > > 
> > > On this second syscall, fetch_bp_busy_slots() sets slots.pinned to be 1,
> > > despite there being no breakpoint on this CPU.  This is because the call
> > > the task_bp_pinned, checks all CPUs, rather than just the current CPU.
> > > POWER7 only has one hardware breakpoint per CPU (ie. HBP_NUM=1), so we
> > > return ENOSPC.
> > > 
> > > The following patch fixes this by checking the associated CPU for each
> > > breakpoint in task_bp_pinned.  I'm not familiar with this code, so it's
> > > provided as a reference to the above issue.
> > > 
> > > Mikey
> > > 
> > > 1. not sure why it doesn't just do one syscall and specify all CPUs, but
> > > that's another issue.  Using two syscalls should work.
> > 
> > This patch seems to make sense. I'll try it and run some tests.
> > Can I have your Signed-off-by ?

Frederic,

Did you ever get to testing or integrating this patch?

Mikey

> Of course...
> 
> Signed-off-by: Michael Neuling <mikey@neuling.org>
jovi zhang Sept. 27, 2012, 1:02 a.m. UTC | #11
On Thu, Aug 16, 2012 at 12:23 PM, Michael Neuling <mikey@neuling.org> wrote:
> Hi,
>
> I've been trying to get hardware breakpoints with perf to work on POWER7
> but I'm getting the following:
>
>   % perf record -e mem:0x10000000 true
>
>     Error: sys_perf_event_open() syscall returned with 28 (No space left on device).  /bin/dmesg may provide additional information.
>
>     Fatal: No CONFIG_PERF_EVENTS=y kernel support configured?
>
>   true: Terminated
>
> (FWIW adding -a and it works fine)
>
> Debugging it seems that __reserve_bp_slot() is returning ENOSPC because
> it thinks there are no free breakpoint slots on this CPU.
>
> I have a 2 CPUs, so perf userspace is doing two perf_event_open syscalls
> to add a counter to each CPU [1].  The first syscall succeeds but the
> second is failing.
>
> On this second syscall, fetch_bp_busy_slots() sets slots.pinned to be 1,
> despite there being no breakpoint on this CPU.  This is because the call
> the task_bp_pinned, checks all CPUs, rather than just the current CPU.
> POWER7 only has one hardware breakpoint per CPU (ie. HBP_NUM=1), so we
> return ENOSPC.
>
> The following patch fixes this by checking the associated CPU for each
> breakpoint in task_bp_pinned.  I'm not familiar with this code, so it's
> provided as a reference to the above issue.
>
> Mikey
>
> 1. not sure why it doesn't just do one syscall and specify all CPUs, but
> that's another issue.  Using two syscalls should work.
>
This problem let me recall what I reported several months ago.

https://lkml.org/lkml/2012/6/27/631

At that time, I thought it is caused by uses_mmap field in record sub
command which
added by commit d1cb9f(perf target: Add uses_mmap field).

In that testcase, it's fine to use stat sub command, but failed with
record sub command.

As Namhyung metioned in that thread, [perf record xxx] use
per-task-per-cpu for fix
scalability issues.
Frédéric Weisbecker Sept. 27, 2012, 3:23 p.m. UTC | #12
2012/9/25 Michael Neuling <mikey@neuling.org>:
> Michael Neuling <mikey@neuling.org> wrote:
>
>> Frederic Weisbecker <fweisbec@gmail.com> wrote:
>>
>> > On Thu, Aug 16, 2012 at 02:23:54PM +1000, Michael Neuling wrote:
>> > > Hi,
>> > >
>> > > I've been trying to get hardware breakpoints with perf to work on POWER7
>> > > but I'm getting the following:
>> > >
>> > >   % perf record -e mem:0x10000000 true
>> > >
>> > >     Error: sys_perf_event_open() syscall returned with 28 (No space left on device).  /bin/dmesg may provide additional information.
>> > >
>> > >     Fatal: No CONFIG_PERF_EVENTS=y kernel support configured?
>> > >
>> > >   true: Terminated
>> > >
>> > > (FWIW adding -a and it works fine)
>> > >
>> > > Debugging it seems that __reserve_bp_slot() is returning ENOSPC because
>> > > it thinks there are no free breakpoint slots on this CPU.
>> > >
>> > > I have a 2 CPUs, so perf userspace is doing two perf_event_open syscalls
>> > > to add a counter to each CPU [1].  The first syscall succeeds but the
>> > > second is failing.
>> > >
>> > > On this second syscall, fetch_bp_busy_slots() sets slots.pinned to be 1,
>> > > despite there being no breakpoint on this CPU.  This is because the call
>> > > the task_bp_pinned, checks all CPUs, rather than just the current CPU.
>> > > POWER7 only has one hardware breakpoint per CPU (ie. HBP_NUM=1), so we
>> > > return ENOSPC.
>> > >
>> > > The following patch fixes this by checking the associated CPU for each
>> > > breakpoint in task_bp_pinned.  I'm not familiar with this code, so it's
>> > > provided as a reference to the above issue.
>> > >
>> > > Mikey
>> > >
>> > > 1. not sure why it doesn't just do one syscall and specify all CPUs, but
>> > > that's another issue.  Using two syscalls should work.
>> >
>> > This patch seems to make sense. I'll try it and run some tests.
>> > Can I have your Signed-off-by ?
>
> Frederic,
>
> Did you ever get to testing or integrating this patch?
>
> Mikey

Sorry, I forgot this in my mailbox. I'm going to look at this in the
next few days.
Feel free to harass me by email or IRC if I don't give news on this soon.
diff mbox

Patch

diff --git a/kernel/events/hw_breakpoint.c b/kernel/events/hw_breakpoint.c
index bb38c4d..e092daa 100644
--- a/kernel/events/hw_breakpoint.c
+++ b/kernel/events/hw_breakpoint.c
@@ -111,14 +111,16 @@  static unsigned int max_task_bp_pinned(int cpu, enum bp_type_idx type)
  * Count the number of breakpoints of the same type and same task.
  * The given event must be not on the list.
  */
-static int task_bp_pinned(struct perf_event *bp, enum bp_type_idx type)
+static int task_bp_pinned(int cpu, struct perf_event *bp, enum bp_type_idx type)
 {
 	struct task_struct *tsk = bp->hw.bp_target;
 	struct perf_event *iter;
 	int count = 0;
 
 	list_for_each_entry(iter, &bp_task_head, hw.bp_list) {
-		if (iter->hw.bp_target == tsk && find_slot_idx(iter) == type)
+		if (iter->hw.bp_target == tsk &&
+		    find_slot_idx(iter) == type &&
+		    cpu == iter->cpu)
 			count += hw_breakpoint_weight(iter);
 	}
 
@@ -141,7 +143,7 @@  fetch_bp_busy_slots(struct bp_busy_slots *slots, struct perf_event *bp,
 		if (!tsk)
 			slots->pinned += max_task_bp_pinned(cpu, type);
 		else
-			slots->pinned += task_bp_pinned(bp, type);
+			slots->pinned += task_bp_pinned(cpu, bp, type);
 		slots->flexible = per_cpu(nr_bp_flexible[type], cpu);
 
 		return;
@@ -154,7 +156,7 @@  fetch_bp_busy_slots(struct bp_busy_slots *slots, struct perf_event *bp,
 		if (!tsk)
 			nr += max_task_bp_pinned(cpu, type);
 		else
-			nr += task_bp_pinned(bp, type);
+			nr += task_bp_pinned(cpu, bp, type);
 
 		if (nr > slots->pinned)
 			slots->pinned = nr;
@@ -188,7 +190,7 @@  static void toggle_bp_task_slot(struct perf_event *bp, int cpu, bool enable,
 	int old_idx = 0;
 	int idx = 0;
 
-	old_count = task_bp_pinned(bp, type);
+	old_count = task_bp_pinned(cpu, bp, type);
 	old_idx = old_count - 1;
 	idx = old_idx + weight;