Patchwork [v4,2/2] powerpc: Uprobes port to powerpc

login
register
mail settings
Submitter Ananth N Mavinakayanahalli
Date Aug. 22, 2012, 8:27 a.m.
Message ID <20120822082708.GB29216@in.ibm.com>
Download mbox | patch
Permalink /patch/179260/
State Not Applicable
Headers show

Comments

Ananth N Mavinakayanahalli - Aug. 22, 2012, 8:27 a.m.
From: Ananth N Mavinakayanahalli <ananth@in.ibm.com>

This is the port of uprobes to powerpc. Usage is similar to x86.

[root@xxxx ~]# ./bin/perf probe -x /lib64/libc.so.6 malloc
Added new event:
  probe_libc:malloc    (on 0xb4860)

You can now use it in all perf tools, such as:

	perf record -e probe_libc:malloc -aR sleep 1

[root@xxxx ~]# ./bin/perf record -e probe_libc:malloc -aR sleep 20
[ perf record: Woken up 22 times to write data ]
[ perf record: Captured and wrote 5.843 MB perf.data (~255302 samples) ]
[root@xxxx ~]# ./bin/perf report --stdio
...

# Samples: 83K of event 'probe_libc:malloc'
# Event count (approx.): 83484
#
# Overhead       Command  Shared Object      Symbol
# ........  ............  .............  ..........
#
    69.05%           tar  libc-2.12.so   [.] malloc
    28.57%            rm  libc-2.12.so   [.] malloc
     1.32%  avahi-daemon  libc-2.12.so   [.] malloc
     0.58%          bash  libc-2.12.so   [.] malloc
     0.28%          sshd  libc-2.12.so   [.] malloc
     0.08%    irqbalance  libc-2.12.so   [.] malloc
     0.05%         bzip2  libc-2.12.so   [.] malloc
     0.04%         sleep  libc-2.12.so   [.] malloc
     0.03%    multipathd  libc-2.12.so   [.] malloc
     0.01%      sendmail  libc-2.12.so   [.] malloc
     0.01%     automount  libc-2.12.so   [.] malloc

The trap_nr addition patch is a prereq.

A few minor changes can be expected down the line based on the
discussions currently happening on lkml around the single-stepping
infrastructure code (http://marc.info/?l=linux-kernel&m=134545967710242&w=2).

Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
---

Tested on POWER6; I don't see anything here that should stop it from
working on a ppc32; since I don't have access to a ppc32 machine, it
would be good if somoene could verify that part.

V4:
Enhance arch_analyze_uprobe_insn() to reject uprobes on trap variants.

V3:
Added abort_xol() logic for powerpc, using thread_struct.trap_nr to
determine if the stepped instruction caused an exception.

V2:
a. arch_uprobe_analyze_insn() now gets unsigned long addr.
b. Verified that mtmsr[d] and rfi[d] are handled correctly by
   emulate_step() (no changes to this patch).

 arch/powerpc/Kconfig                   |    3 
 arch/powerpc/include/asm/thread_info.h |    4 
 arch/powerpc/include/asm/uprobes.h     |   58 ++++++++++
 arch/powerpc/kernel/Makefile           |    1 
 arch/powerpc/kernel/signal.c           |    6 +
 arch/powerpc/kernel/uprobes.c          |  180 +++++++++++++++++++++++++++++++++
 6 files changed, 251 insertions(+), 1 deletion(-)
Oleg Nesterov - Aug. 22, 2012, 3:55 p.m.
On 08/22, Ananth N Mavinakayanahalli wrote:
>
> +int arch_uprobe_analyze_insn(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long addr)
> +{
> +	unsigned int insn;
> +
> +	if (addr & 0x03)
> +		return -EINVAL;
> +
> +	memcpy(&insn, auprobe->insn, MAX_UINSN_BYTES);
> +	if (is_trap(insn))
> +		return -ENOTSUPP;

This addresses the only concern I had, thanks.

Once again, I am in no position to review the powerpc code, but
since I made some noise before: I think the patch is fine.

Oleg.
Michael Ellerman - Aug. 23, 2012, 4:28 a.m.
On Wed, 2012-08-22 at 13:57 +0530, Ananth N Mavinakayanahalli wrote:
> From: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
> 
> This is the port of uprobes to powerpc. Usage is similar to x86.

Hi Ananth,

Excuse my ignorance of uprobes, some comments inline ...


> [root@xxxx ~]# ./bin/perf probe -x /lib64/libc.so.6 malloc
> Added new event:
>   probe_libc:malloc    (on 0xb4860)
> 
> You can now use it in all perf tools, such as:
> 
> 	perf record -e probe_libc:malloc -aR sleep 1

Is there a test suite for any of this?


> Index: linux-tip-16aug/arch/powerpc/include/asm/uprobes.h
> ===================================================================
> --- /dev/null
> +++ linux-tip-16aug/arch/powerpc/include/asm/uprobes.h
> @@ -0,0 +1,58 @@
> +#ifndef _ASM_UPROBES_H
> +#define _ASM_UPROBES_H
> +/*
> + * User-space Probes (UProbes) for powerpc
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
> + *
> + * Copyright (C) IBM Corporation, 2007-2012

The lawyers say we shouldn't use (C).

Is it really copyright IBM 2007-2012? Or is that because you copied
another header?


> +typedef unsigned int uprobe_opcode_t;

I'd prefer u32.

It would be nice if someone could consolidate this with kprobe_opcode_t.


> +#define MAX_UINSN_BYTES			4
> +#define UPROBE_XOL_SLOT_BYTES		(MAX_UINSN_BYTES)
> +
> +#define UPROBE_SWBP_INSN		0x7fe00008

This is just "trap" ?

> +#define UPROBE_SWBP_INSN_SIZE		4 /* swbp insn size in bytes */
> +
> +#define IS_TW(instr)		(((instr) & 0xfc0007fe) == 0x7c000008)
> +#define IS_TD(instr)		(((instr) & 0xfc0007fe) == 0x7c000088)
> +#define IS_TDI(instr)		(((instr) & 0xfc000000) == 0x08000000)
> +#define IS_TWI(instr)		(((instr) & 0xfc000000) == 0x0c000000)
> +
> +#define is_trap(instr)	(IS_TW(instr) || IS_TD(instr) || \
> +			IS_TWI(instr) || IS_TDI(instr))

These seem to be duplicated in kprobes.h, can we consolidate them.

> +struct arch_uprobe {
> +	u8	insn[MAX_UINSN_BYTES];
> +};

Why not uprobe_opcode_t insn ?



> Index: linux-tip-16aug/arch/powerpc/kernel/signal.c
> ===================================================================
> --- linux-tip-16aug.orig/arch/powerpc/kernel/signal.c
> +++ linux-tip-16aug/arch/powerpc/kernel/signal.c
> @@ -11,6 +11,7 @@
>  
>  #include <linux/tracehook.h>
>  #include <linux/signal.h>
> +#include <linux/uprobes.h>
>  #include <linux/key.h>
>  #include <asm/hw_breakpoint.h>
>  #include <asm/uaccess.h>
> @@ -157,6 +158,11 @@ static int do_signal(struct pt_regs *reg
>  
>  void do_notify_resume(struct pt_regs *regs, unsigned long thread_info_flags)
>  {
> +	if (thread_info_flags & _TIF_UPROBE) {
> +		clear_thread_flag(TIF_UPROBE);
> +		uprobe_notify_resume(regs);
> +	}

Presumably this ordering is crucial, ie. uprobes before signals.

>  	if (thread_info_flags & _TIF_SIGPENDING)
>  		do_signal(regs);
>  
> Index: linux-tip-16aug/arch/powerpc/kernel/uprobes.c
> ===================================================================
> --- /dev/null
> +++ linux-tip-16aug/arch/powerpc/kernel/uprobes.c
> @@ -0,0 +1,180 @@
> +/*
> + * User-space Probes (UProbes) for powerpc
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
> + *
> + * Copyright (C) IBM Corporation, 2007-2012
> + *
> + * Adapted from the x86 port by Ananth N Mavinakayanahalli <ananth@in.ibm.com>
> + */
> +#include <linux/kernel.h>
> +#include <linux/sched.h>
> +#include <linux/ptrace.h>
> +#include <linux/uprobes.h>
> +#include <linux/uaccess.h>
> +#include <linux/kdebug.h>
> +
> +#include <asm/sstep.h>
> +
> +#define UPROBE_TRAP_NR	UINT_MAX

In the comments below you talk about -1 a few times, but you actually
mean UINT_MAX.


> +/**
> + * arch_uprobe_analyze_insn

Analyze what about the instruction?

> + * @mm: the probed address space.
> + * @arch_uprobe: the probepoint information.
> + * @addr: vaddr to probe.
> + * Return 0 on success or a -ve number on error.
> + */
> +int arch_uprobe_analyze_insn(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long addr)
> +{
> +	unsigned int insn;
> +
> +	if (addr & 0x03)
> +		return -EINVAL;
> +
> +	memcpy(&insn, auprobe->insn, MAX_UINSN_BYTES);

We shouldn't need to use memcpy, we know it's a u32.

> +	if (is_trap(insn))
> +		return -ENOTSUPP;

A comment saying why we can't handle this would be nice.

> +	return 0;
> +}


I am probably missing something, but why do we need to execute out of
line?

> +/*
> + * arch_uprobe_pre_xol - prepare to execute out of line.
> + * @auprobe: the probepoint information.
> + * @regs: reflects the saved user state of current task.
> + */
> +int arch_uprobe_pre_xol(struct arch_uprobe *auprobe, struct pt_regs *regs)
> +{
> +	struct arch_uprobe_task *autask = &current->utask->autask;
> +
> +	autask->saved_trap_nr = current->thread.trap_nr;
> +	current->thread.trap_nr = UPROBE_TRAP_NR;
> +	regs->nip = current->utask->xol_vaddr;
> +	return 0;
> +}
> +
> +/**
> + * uprobe_get_swbp_addr - compute address of swbp given post-swbp regs
> + * @regs: Reflects the saved state of the task after it has hit a breakpoint
> + * instruction.
> + * Return the address of the breakpoint instruction.
> + */
> +unsigned long uprobe_get_swbp_addr(struct pt_regs *regs)
> +{
> +	return instruction_pointer(regs);
> +}

This seems like it would be better in asm/uprobes.h as a static inline,
but that's not your fault.

> +/*
> + * If xol insn itself traps and generates a signal (SIGILL/SIGSEGV/etc),
> + * then detect the case where a singlestepped instruction jumps back to its
> + * own address. It is assumed that anything like do_page_fault/do_trap/etc
> + * sets thread.trap_nr != -1.
> + *
> + * arch_uprobe_pre_xol/arch_uprobe_post_xol save/restore thread.trap_nr,
> + * arch_uprobe_xol_was_trapped() simply checks that ->trap_nr is not equal to
> + * UPROBE_TRAP_NR == -1 set by arch_uprobe_pre_xol().
> + */
> +bool arch_uprobe_xol_was_trapped(struct task_struct *t)
> +{
> +	if (t->thread.trap_nr != UPROBE_TRAP_NR)
> +		return true;
> +
> +	return false;
> +}
> +
> +/*
> + * Called after single-stepping. To avoid the SMP problems that can
> + * occur when we temporarily put back the original opcode to
> + * single-step, we single-stepped a copy of the instruction.
> + *
> + * This function prepares to resume execution after the single-step.
> + */
> +int arch_uprobe_post_xol(struct arch_uprobe *auprobe, struct pt_regs *regs)
> +{
> +	struct uprobe_task *utask = current->utask;
> +
> +	WARN_ON_ONCE(current->thread.trap_nr != UPROBE_TRAP_NR);
> +
> +	current->thread.trap_nr = utask->autask.saved_trap_nr;
> +
> +	/*
> +	 * On powerpc, except for loads and stores, most instructions
> +	 * including ones that alter code flow (branches, calls, returns)
> +	 * are emulated in the kernel. We get here only if the emulation
> +	 * support doesn't exist and have to fix-up the next instruction
> +	 * to be executed.
> +	 */
> +	regs->nip = utask->vaddr + MAX_UINSN_BYTES;
> +	return 0;
> +}
> +
> +/* callback routine for handling exceptions. */
> +int arch_uprobe_exception_notify(struct notifier_block *self, unsigned long val, void *data)
> +{
> +	struct die_args *args = data;
> +	struct pt_regs *regs = args->regs;
> +
> +	/* We are only interested in userspace traps */
> +	if (regs && !user_mode(regs))
> +		return NOTIFY_DONE;

Do we ever get here with a NULL regs?

> +	switch (val) {
> +	case DIE_BPT:
> +		if (uprobe_pre_sstep_notifier(regs))
> +			return NOTIFY_STOP;
> +		break;
> +	case DIE_SSTEP:
> +		if (uprobe_post_sstep_notifier(regs))
> +			return NOTIFY_STOP;
> +	default:
> +		break;
> +	}
> +	return NOTIFY_DONE;
> +}
> +
> +/*
> + * This function gets called when XOL instruction either gets trapped or
> + * the thread has a fatal signal, so reset the instruction pointer to its
> + * probed address.
> + */
> +void arch_uprobe_abort_xol(struct arch_uprobe *auprobe, struct pt_regs *regs)
> +{
> +	struct uprobe_task *utask = current->utask;
> +
> +	current->thread.trap_nr = utask->autask.saved_trap_nr;
> +	instruction_pointer_set(regs, utask->vaddr);
> +}
> +
> +/*
> + * See if the instruction can be emulated.
> + * Returns true if instruction was emulated, false otherwise.
> + */
> +bool arch_uprobe_skip_sstep(struct arch_uprobe *auprobe, struct pt_regs *regs)
> +{
> +	int ret;
> +	unsigned int insn;
> +
> +	memcpy(&insn, auprobe->insn, MAX_UINSN_BYTES);

Why memcpy?

> +
> +	/*
> +	 * emulate_step() returns 1 if the insn was successfully emulated.
> +	 * For all other cases, we need to single-step in hardware.
> +	 */
> +	ret = emulate_step(regs, insn);
> +	if (ret > 0)
> +		return true;

This actually emulates the instruction, ie. the contents of regs are
changed based on the instruction.

That seems to differ vs x86, where arch_uprobe_skip_sstep() just checks
the instruction and returns true/false. Is that because on x86 they are
only returning true for nops? ie. there is no emulation to be done?

It's a little surprising that can_skip_sstep() actually emulates the
instruction, but again that's not your fault.

cheers
Srikar Dronamraju - Aug. 23, 2012, 5:32 a.m.
> 
> These seem to be duplicated in kprobes.h, can we consolidate them.
> 
> > +struct arch_uprobe {
> > +	u8	insn[MAX_UINSN_BYTES];
> > +};
> 
> Why not uprobe_opcode_t insn ?
> 

insn is updated/accessed in the arch independent code. Size of
uprobe_opcode_t could be different for different archs. uprobe_opcode_t
represents the size of the smallest breakpoint instruction for an arch.

Hence u8 works out the best. I know we could still use uprobe_opcode_t
and achieve the same. In which case, we would have to interpret
MAX_UINSN_BYTES differently. Do you see any advantages of using
uprobe_opcode_t instead of u8 across archs?

> 
> >  void do_notify_resume(struct pt_regs *regs, unsigned long thread_info_flags)
> >  {
> > +	if (thread_info_flags & _TIF_UPROBE) {
> > +		clear_thread_flag(TIF_UPROBE);
> > +		uprobe_notify_resume(regs);
> > +	}
> 
> Presumably this ordering is crucial, ie. uprobes before signals.
> 

Yes, we would want uprobes to have a say before do_signal can take a
look.

> 
> 
> I am probably missing something, but why do we need to execute out of
> line?
> 


The other alternative to execute out of line is the current inline
singlestepping. In inline singlestepping, we would have to guard other
tasks from executing the same instruction. 

Executing out of line solves this problem.

> > +/*
> > + * arch_uprobe_pre_xol - prepare to execute out of line.
> > + * @auprobe: the probepoint information.
> > + * @regs: reflects the saved user state of current task.
> > + */
> > +int arch_uprobe_pre_xol(struct arch_uprobe *auprobe, struct pt_regs *regs)
> > +{
> > +	struct arch_uprobe_task *autask = &current->utask->autask;
> > +
> > +	autask->saved_trap_nr = current->thread.trap_nr;
> > +	current->thread.trap_nr = UPROBE_TRAP_NR;
> > +	regs->nip = current->utask->xol_vaddr;
> > +	return 0;
> > +}
> > +
> > +/**
> > + * uprobe_get_swbp_addr - compute address of swbp given post-swbp regs
> > + * @regs: Reflects the saved state of the task after it has hit a breakpoint
> > + * instruction.
> > + * Return the address of the breakpoint instruction.
> > + */
> > +unsigned long uprobe_get_swbp_addr(struct pt_regs *regs)
> > +{
> > +	return instruction_pointer(regs);
> > +}
> 
> This seems like it would be better in asm/uprobes.h as a static inline,
> but that's not your fault.
> 

We want archs to override uprobe_get_swbp_addr() if the default
uprobe_get_swbp_addr() doesnt suite for a particular arch. Do you have
better ways to achieve this. Initially we were using arch specific
callbacks from which we moved to using weak functions based on reviews.

> > +	/*
> > +	 * emulate_step() returns 1 if the insn was successfully emulated.
> > +	 * For all other cases, we need to single-step in hardware.
> > +	 */
> > +	ret = emulate_step(regs, insn);
> > +	if (ret > 0)
> > +		return true;
> 
> This actually emulates the instruction, ie. the contents of regs are
> changed based on the instruction.
> 
> That seems to differ vs x86, where arch_uprobe_skip_sstep() just checks
> the instruction and returns true/false. Is that because on x86 they are
> only returning true for nops? ie. there is no emulation to be done?
> 

Yes, In x86, we currently support skip for nop instructions, Once we add
code for skipping other instructions in x86, we could enhance them to
skip them too.

> It's a little surprising that can_skip_sstep() actually emulates the
> instruction, but again that's not your fault.
> 

Are you saying its doing more than what the name suggests or to the fact
that can_skip_step should ideally be called just once?
Ananth N Mavinakayanahalli - Aug. 23, 2012, 5:58 a.m.
On Thu, Aug 23, 2012 at 02:28:20PM +1000, Michael Ellerman wrote:
> On Wed, 2012-08-22 at 13:57 +0530, Ananth N Mavinakayanahalli wrote:
> > From: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
> > 
> > This is the port of uprobes to powerpc. Usage is similar to x86.
> 
> Hi Ananth,
> 
> Excuse my ignorance of uprobes, some comments inline ...

Thanks for the review Michael!

> > [root@xxxx ~]# ./bin/perf probe -x /lib64/libc.so.6 malloc
> > Added new event:
> >   probe_libc:malloc    (on 0xb4860)
> > 
> > You can now use it in all perf tools, such as:
> > 
> > 	perf record -e probe_libc:malloc -aR sleep 1
> 
> Is there a test suite for any of this?

We don't have a formal testsuite yet, but the usual way of testing it is
to run kernbench while registering/unregistering a bunch of probes
periodically.

...

> > + * You should have received a copy of the GNU General Public License
> > + * along with this program; if not, write to the Free Software
> > + * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
> > + *
> > + * Copyright (C) IBM Corporation, 2007-2012
> 
> The lawyers say we shouldn't use (C).

Will remove that.

> Is it really copyright IBM 2007-2012? Or is that because you copied
> another header?

The later. This is adapted from the x86 version.

> > +typedef unsigned int uprobe_opcode_t;
> 
> I'd prefer u32.

OK. Will change.

> It would be nice if someone could consolidate this with kprobe_opcode_t.

Thats on the TODO after the uprobes code stabilizes further.

I am wondering which file would be appropriate? We could either
consolidate a bunch of these into asm/kdebug.h or asm/ptrace.h. Any
preference/suggestion?

> > +#define MAX_UINSN_BYTES			4
> > +#define UPROBE_XOL_SLOT_BYTES		(MAX_UINSN_BYTES)
> > +
> > +#define UPROBE_SWBP_INSN		0x7fe00008
> 
> This is just "trap" ?

Yes. But since its referred to in arch agnostic code too, we'd have to alias
it thus.

> > +#define UPROBE_SWBP_INSN_SIZE		4 /* swbp insn size in bytes */
> > +
> > +#define IS_TW(instr)		(((instr) & 0xfc0007fe) == 0x7c000008)
> > +#define IS_TD(instr)		(((instr) & 0xfc0007fe) == 0x7c000088)
> > +#define IS_TDI(instr)		(((instr) & 0xfc000000) == 0x08000000)
> > +#define IS_TWI(instr)		(((instr) & 0xfc000000) == 0x0c000000)
> > +
> > +#define is_trap(instr)	(IS_TW(instr) || IS_TD(instr) || \
> > +			IS_TWI(instr) || IS_TDI(instr))
> 
> These seem to be duplicated in kprobes.h, can we consolidate them.

Yes, similar to the opcode_t types above.

> > +struct arch_uprobe {
> > +	u8	insn[MAX_UINSN_BYTES];
> > +};
> 
> Why not uprobe_opcode_t insn ?

I had a similar discussion with Srikar while doing the port, but he has
reasons for this...

...

> >  void do_notify_resume(struct pt_regs *regs, unsigned long thread_info_flags)
> >  {
> > +	if (thread_info_flags & _TIF_UPROBE) {
> > +		clear_thread_flag(TIF_UPROBE);
> > +		uprobe_notify_resume(regs);
> > +	}
> 
> Presumably this ordering is crucial, ie. uprobes before signals.

Correct!

...

> > +#define UPROBE_TRAP_NR	UINT_MAX
> 
> In the comments below you talk about -1 a few times, but you actually
> mean UINT_MAX.

Correct. I will fix those references.

> > +/**
> > + * arch_uprobe_analyze_insn
> 
> Analyze what about the instruction?

Depends on the architecture. On x86, we need to verify if the address is
at an instruction boundary, and if the instruction can be probed at all.
On powerpc, we have an easier time. We just validate the address is
aligned at instruction boundary and flag if the instruction at the
address is a trap variant.

> > + * @mm: the probed address space.
> > + * @arch_uprobe: the probepoint information.
> > + * @addr: vaddr to probe.
> > + * Return 0 on success or a -ve number on error.
> > + */
> > +int arch_uprobe_analyze_insn(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long addr)
> > +{
> > +	unsigned int insn;
> > +
> > +	if (addr & 0x03)
> > +		return -EINVAL;
> > +
> > +	memcpy(&insn, auprobe->insn, MAX_UINSN_BYTES);
> 
> We shouldn't need to use memcpy, we know it's a u32.

OK. Right now, its u8 insn[4], so I did this to be 'correct'. But I
agree we can just do an assignment.

> > +	if (is_trap(insn))
> > +		return -ENOTSUPP;
> 
> A comment saying why we can't handle this would be nice.

Will add.

> > +	return 0;
> > +}
> 
> 
> I am probably missing something, but why do we need to execute out of
> line?

I think Srikar answered that.

...

> > +/* callback routine for handling exceptions. */
> > +int arch_uprobe_exception_notify(struct notifier_block *self, unsigned long val, void *data)
> > +{
> > +	struct die_args *args = data;
> > +	struct pt_regs *regs = args->regs;
> > +
> > +	/* We are only interested in userspace traps */
> > +	if (regs && !user_mode(regs))
> > +		return NOTIFY_DONE;
> 
> Do we ever get here with a NULL regs?

I don't think so. Its just a paranoid check. Do you prefer it to be
removed?

...

> > + * See if the instruction can be emulated.
> > + * Returns true if instruction was emulated, false otherwise.
> > + */
> > +bool arch_uprobe_skip_sstep(struct arch_uprobe *auprobe, struct pt_regs *regs)
> > +{
> > +	int ret;
> > +	unsigned int insn;
> > +
> > +	memcpy(&insn, auprobe->insn, MAX_UINSN_BYTES);
> 
> Why memcpy?

Same as above - u8 insn[4].

> > +
> > +	/*
> > +	 * emulate_step() returns 1 if the insn was successfully emulated.
> > +	 * For all other cases, we need to single-step in hardware.
> > +	 */
> > +	ret = emulate_step(regs, insn);
> > +	if (ret > 0)
> > +		return true;
> 
> This actually emulates the instruction, ie. the contents of regs are
> changed based on the instruction.
> 
> That seems to differ vs x86, where arch_uprobe_skip_sstep() just checks
> the instruction and returns true/false. Is that because on x86 they are
> only returning true for nops? ie. there is no emulation to be done?
> 
> It's a little surprising that can_skip_sstep() actually emulates the
> instruction, but again that's not your fault.

We initially had a uprobe_emulate_step() callback which over many review
cycles has morphed into this.

Ananth
Oleg Nesterov - Aug. 23, 2012, 9:02 a.m.
On 08/23, Benjamin Herrenschmidt wrote:
>
> On Thu, 2012-08-23 at 11:02 +0530, Srikar Dronamraju wrote:
> > >
> >
> > insn is updated/accessed in the arch independent code. Size of
> > uprobe_opcode_t could be different for different archs.
> > uprobe_opcode_t
> > represents the size of the smallest breakpoint instruction for an
> > arch.
> >
> > Hence u8 works out the best. I know we could still use uprobe_opcode_t
> > and achieve the same. In which case, we would have to interpret
> > MAX_UINSN_BYTES differently. Do you see any advantages of using
> > uprobe_opcode_t instead of u8 across archs?
>
> But don't you actively rely on the fact that on powerpc, unlike x86, you
> -can- atomically replace an instruction with a single 32-bit store ?

I must have missed something...

But powerpc does not replace an instruction, the arch independent code
does this and it assumes that uprobe->arch.insn is u8[MAX_UINSN_BYTES].

Perhaps you meant that on powerpc it is "safe" to replace the insn
even if this can race with some CPU executing this code? But uprobes
has to replace the original page anyway, we should not write to
->vm_file.

I agree that memcpy() in arch_uprobe_analyze_insn() and
arch_uprobe_skip_sstep() looks a bit strange. May be powerpc can do

	struct arch_uprobe {
		union {
			u8		insn[MAX_UINSN_BYTES];
			u32		ainsn;
		};
	};

and use auprobe->ainsn directly, I dunno.

But probably I misunderstood you.

Oleg.
Benjamin Herrenschmidt - Aug. 23, 2012, 10:06 a.m.
On Thu, 2012-08-23 at 11:02 +0530, Srikar Dronamraju wrote:
> > 
> 
> insn is updated/accessed in the arch independent code. Size of
> uprobe_opcode_t could be different for different archs.
> uprobe_opcode_t
> represents the size of the smallest breakpoint instruction for an
> arch.
> 
> Hence u8 works out the best. I know we could still use uprobe_opcode_t
> and achieve the same. In which case, we would have to interpret
> MAX_UINSN_BYTES differently. Do you see any advantages of using
> uprobe_opcode_t instead of u8 across archs?

But don't you actively rely on the fact that on powerpc, unlike x86, you
-can- atomically replace an instruction with a single 32-bit store ?

If you don't you should consider it, and that makes defining this as a
u8 array non-sensical (as is using memcpy)

Ben.
Srikar Dronamraju - Aug. 23, 2012, 4:02 p.m.
* Oleg Nesterov <oleg@redhat.com> [2012-08-23 11:02:09]:

> On 08/23, Benjamin Herrenschmidt wrote:
> >
> > On Thu, 2012-08-23 at 11:02 +0530, Srikar Dronamraju wrote:
> > > >
> > >
> > > insn is updated/accessed in the arch independent code. Size of
> > > uprobe_opcode_t could be different for different archs.
> > > uprobe_opcode_t
> > > represents the size of the smallest breakpoint instruction for an
> > > arch.
> > >
> > > Hence u8 works out the best. I know we could still use uprobe_opcode_t
> > > and achieve the same. In which case, we would have to interpret
> > > MAX_UINSN_BYTES differently. Do you see any advantages of using
> > > uprobe_opcode_t instead of u8 across archs?
> >
> > But don't you actively rely on the fact that on powerpc, unlike x86, you
> > -can- atomically replace an instruction with a single 32-bit store ?
> 
> I must have missed something...
> 
> But powerpc does not replace an instruction, the arch independent code
> does this and it assumes that uprobe->arch.insn is u8[MAX_UINSN_BYTES].
> 
> Perhaps you meant that on powerpc it is "safe" to replace the insn
> even if this can race with some CPU executing this code? But uprobes
> has to replace the original page anyway, we should not write to
> ->vm_file.

I think Ben is referring to the fact that because we use an array we
endup using memcpy to copy the original instruction from the ->vm_file.

> 
> I agree that memcpy() in arch_uprobe_analyze_insn() and
> arch_uprobe_skip_sstep() looks a bit strange. May be powerpc can do
> 
> 	struct arch_uprobe {
> 		union {
> 			u8		insn[MAX_UINSN_BYTES];
> 			u32		ainsn;
> 		};
> 	};
> 
> and use auprobe->ainsn directly, I dunno.

I think this should work.

Ben  would this suffice?
Srikar Dronamraju - Aug. 23, 2012, 4:17 p.m.
* Benjamin Herrenschmidt <benh@kernel.crashing.org> [2012-08-23 20:06:18]:

> On Thu, 2012-08-23 at 11:02 +0530, Srikar Dronamraju wrote:
> > > 
> > 
> > insn is updated/accessed in the arch independent code. Size of
> > uprobe_opcode_t could be different for different archs.
> > uprobe_opcode_t
> > represents the size of the smallest breakpoint instruction for an
> > arch.
> > 
> > Hence u8 works out the best. I know we could still use uprobe_opcode_t
> > and achieve the same. In which case, we would have to interpret
> > MAX_UINSN_BYTES differently. Do you see any advantages of using
> > uprobe_opcode_t instead of u8 across archs?
> 
> But don't you actively rely on the fact that on powerpc, unlike x86, you
> -can- atomically replace an instruction with a single 32-bit store ?
> 

We are not doing a replace here, we are only copying from the ->vm_file
for the largest size instruction possible for that instruction. For
powerpc, this is easy because of fixed size instructions.  

On other archs, at this point, we dont even know the length of the
underlying instruction.

Now there are 3 ways to handle this:
1. use arch independent copy_insn() (current.) (handles if the
instruction spreads across multiple pages on non fixed instruction
archs). 

2. make the copy_insn() arch specific, that would mean every arch will
have to do read_mapping_page etc.

3. have a arch specific hook in arch independent copy_insn code that
either does a memcpy for non fixed instruction archs or does an
assignment in archs like powerpc.

I think you are suggesting option 3.
But instead of adding another call that does the arch specific stuff, we
are probably be better of doing a memcpy. Right?

For all powerpc references to insn we could refer to it as u32 as
suggested by Oleg.
Benjamin Herrenschmidt - Aug. 23, 2012, 9:57 p.m.
On Thu, 2012-08-23 at 21:47 +0530, Srikar Dronamraju wrote:
> * Benjamin Herrenschmidt <benh@kernel.crashing.org> [2012-08-23 20:06:18]:
> 
> > On Thu, 2012-08-23 at 11:02 +0530, Srikar Dronamraju wrote:
> > > > 
> > > 
> > > insn is updated/accessed in the arch independent code. Size of
> > > uprobe_opcode_t could be different for different archs.
> > > uprobe_opcode_t
> > > represents the size of the smallest breakpoint instruction for an
> > > arch.
> > > 
> > > Hence u8 works out the best. I know we could still use uprobe_opcode_t
> > > and achieve the same. In which case, we would have to interpret
> > > MAX_UINSN_BYTES differently. Do you see any advantages of using
> > > uprobe_opcode_t instead of u8 across archs?
> > 
> > But don't you actively rely on the fact that on powerpc, unlike x86, you
> > -can- atomically replace an instruction with a single 32-bit store ?
> > 
> 
> We are not doing a replace here, we are only copying from the ->vm_file
> for the largest size instruction possible for that instruction. For
> powerpc, this is easy because of fixed size instructions.  
> 
> On other archs, at this point, we dont even know the length of the
> underlying instruction.
> 
> Now there are 3 ways to handle this:
> 1. use arch independent copy_insn() (current.) (handles if the
> instruction spreads across multiple pages on non fixed instruction
> archs). 
> 
> 2. make the copy_insn() arch specific, that would mean every arch will
> have to do read_mapping_page etc.
> 
> 3. have a arch specific hook in arch independent copy_insn code that
> either does a memcpy for non fixed instruction archs or does an
> assignment in archs like powerpc.
>
> I think you are suggesting option 3.
> But instead of adding another call that does the arch specific stuff, we
> are probably be better of doing a memcpy. Right?
> 
> For all powerpc references to insn we could refer to it as u32 as
> suggested by Oleg.

Ok, doens't matter much either way, it's just odd and inefficient.

Cheers,
Ben.
Michael Ellerman - Aug. 24, 2012, 1:13 a.m.
On Thu, 2012-08-23 at 11:28 +0530, Ananth N Mavinakayanahalli wrote:
> On Thu, Aug 23, 2012 at 02:28:20PM +1000, Michael Ellerman wrote:
> > On Wed, 2012-08-22 at 13:57 +0530, Ananth N Mavinakayanahalli wrote:
> > > From: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
> > > 
> > > This is the port of uprobes to powerpc. Usage is similar to x86.
> > 
> > Hi Ananth,
> > 
> > Excuse my ignorance of uprobes, some comments inline ...
> 
> Thanks for the review Michael!
> 
> > > [root@xxxx ~]# ./bin/perf probe -x /lib64/libc.so.6 malloc
> > > Added new event:
> > >   probe_libc:malloc    (on 0xb4860)
> > > 
> > > You can now use it in all perf tools, such as:
> > > 
> > > 	perf record -e probe_libc:malloc -aR sleep 1
> > 
> > Is there a test suite for any of this?
> 
> We don't have a formal testsuite yet, but the usual way of testing it is
> to run kernbench while registering/unregistering a bunch of probes
> periodically.

OK. Someone should put that on their TODO list, otherwise in a year or
two it'll be broken and we won't notice :)


> > It would be nice if someone could consolidate this with kprobe_opcode_t.
> 
> Thats on the TODO after the uprobes code stabilizes further.
> 
> I am wondering which file would be appropriate? We could either
> consolidate a bunch of these into asm/kdebug.h or asm/ptrace.h. Any
> preference/suggestion?

Add a new one :)

> > > +#define MAX_UINSN_BYTES			4
> > > +#define UPROBE_XOL_SLOT_BYTES		(MAX_UINSN_BYTES)
> > > +
> > > +#define UPROBE_SWBP_INSN		0x7fe00008
> > 
> > This is just "trap" ?
> 
> Yes. But since its referred to in arch agnostic code too, we'd have to alias
> it thus.

Yep I was just checking, I think it's probably worth a comment.

> > > +#define UPROBE_SWBP_INSN_SIZE		4 /* swbp insn size in bytes */
> > > +
> > > +#define IS_TW(instr)		(((instr) & 0xfc0007fe) == 0x7c000008)
> > > +#define IS_TD(instr)		(((instr) & 0xfc0007fe) == 0x7c000088)
> > > +#define IS_TDI(instr)		(((instr) & 0xfc000000) == 0x08000000)
> > > +#define IS_TWI(instr)		(((instr) & 0xfc000000) == 0x0c000000)
> > > +
> > > +#define is_trap(instr)	(IS_TW(instr) || IS_TD(instr) || \
> > > +			IS_TWI(instr) || IS_TDI(instr))
> > 
> > These seem to be duplicated in kprobes.h, can we consolidate them.
> 
> Yes, similar to the opcode_t types above.

Hmm, OK. Any reason you can't include kprobes.h to get those?

> > > +struct arch_uprobe {
> > > +	u8	insn[MAX_UINSN_BYTES];
> > > +};
> > 
> > Why not uprobe_opcode_t insn ?
> 
> I had a similar discussion with Srikar while doing the port, but he has
> reasons for this...

OK, will argue with him :D

> > > +/**
> > > + * arch_uprobe_analyze_insn
> > 
> > Analyze what about the instruction?
> 

+ /*

> Depends on the architecture. On x86, we need to verify if the address is
> at an instruction boundary, and if the instruction can be probed at all.
> On powerpc, we have an easier time. We just validate the address is
> aligned at instruction boundary and flag if the instruction at the
> address is a trap variant.

+ */

:)


> > > + * @mm: the probed address space.
> > > + * @arch_uprobe: the probepoint information.
> > > + * @addr: vaddr to probe.
> > > + * Return 0 on success or a -ve number on error.
> > > + */
> > > +int arch_uprobe_analyze_insn(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long addr)
> > > +{
> > > +	unsigned int insn;
> > > +
> > > +	if (addr & 0x03)
> > > +		return -EINVAL;
> > > +
> > > +	memcpy(&insn, auprobe->insn, MAX_UINSN_BYTES);
> > 
> > We shouldn't need to use memcpy, we know it's a u32.
> 
> OK. Right now, its u8 insn[4], so I did this to be 'correct'. But I
> agree we can just do an assignment.

Yeah, at least in the arch code I think we should just use u32 and
assign directly.

> > > +/* callback routine for handling exceptions. */
> > > +int arch_uprobe_exception_notify(struct notifier_block *self, unsigned long val, void *data)
> > > +{
> > > +	struct die_args *args = data;
> > > +	struct pt_regs *regs = args->regs;
> > > +
> > > +	/* We are only interested in userspace traps */
> > > +	if (regs && !user_mode(regs))
> > > +		return NOTIFY_DONE;
> > 
> > Do we ever get here with a NULL regs?
> 
> I don't think so. Its just a paranoid check. Do you prefer it to be
> removed?

Yeah. A NULL regs here is a kernel bug, so I think it's actually
preferable to crash than silently return.

cheers
Michael Ellerman - Aug. 24, 2012, 1:33 a.m.
On Thu, 2012-08-23 at 11:02 +0530, Srikar Dronamraju wrote:
> > 
> > These seem to be duplicated in kprobes.h, can we consolidate them.
> > 
> > > +struct arch_uprobe {
> > > +	u8	insn[MAX_UINSN_BYTES];
> > > +};
> > 
> > Why not uprobe_opcode_t insn ?
> > 
> 
> insn is updated/accessed in the arch independent code. Size of
> uprobe_opcode_t could be different for different archs. uprobe_opcode_t
> represents the size of the smallest breakpoint instruction for an arch.
> 
> Hence u8 works out the best. I know we could still use uprobe_opcode_t
> and achieve the same. In which case, we would have to interpret
> MAX_UINSN_BYTES differently. Do you see any advantages of using
> uprobe_opcode_t instead of u8 across archs?

It's not a big deal, just a bit of a nit.

It just feels a bit messy to have uprobe_opcode_t and also expect the
arch code to use an array.

It seems like if we need uprobe_opcode_t then the generic code should
expect to use that everywhere.

Or, we should get ride of uprobe_opcode_t and use u8 (or void *).


> > >  void do_notify_resume(struct pt_regs *regs, unsigned long thread_info_flags)
> > >  {
> > > +	if (thread_info_flags & _TIF_UPROBE) {
> > > +		clear_thread_flag(TIF_UPROBE);
> > > +		uprobe_notify_resume(regs);
> > > +	}
> > 
> > Presumably this ordering is crucial, ie. uprobes before signals.
> > 
> 
> Yes, we would want uprobes to have a say before do_signal can take a
> look.

A comment would be good IMHO, so in 10 years we don't forget.

> > I am probably missing something, but why do we need to execute out of
> > line?
> 
> The other alternative to execute out of line is the current inline
> singlestepping. In inline singlestepping, we would have to guard other
> tasks from executing the same instruction. 
> 
> Executing out of line solves this problem.

Yeah ignore that, I was forgetting that the original trap is caused by a
patched instruction.

> > > +/**
> > > + * uprobe_get_swbp_addr - compute address of swbp given post-swbp regs
> > > + * @regs: Reflects the saved state of the task after it has hit a breakpoint
> > > + * instruction.
> > > + * Return the address of the breakpoint instruction.
> > > + */
> > > +unsigned long uprobe_get_swbp_addr(struct pt_regs *regs)
> > > +{
> > > +	return instruction_pointer(regs);
> > > +}
> > 
> > This seems like it would be better in asm/uprobes.h as a static inline,
> > but that's not your fault.
> > 
> 
> We want archs to override uprobe_get_swbp_addr() if the default
> uprobe_get_swbp_addr() doesnt suite for a particular arch. Do you have
> better ways to achieve this. Initially we were using arch specific
> callbacks from which we moved to using weak functions based on reviews.

OK, having a default that is weak makes sense if there is one behaviour
that is common across arches, but a few arches need to override it. A
static inline in the arch header is simpler if the behaviour is actually
different across all arches.

The current default implements the x86 behaviour, where the instruction
pointer has moved past the breakpoint instruction.

What does ARM do? ie. is this behaviour actually the right default?

> > > +	/*
> > > +	 * emulate_step() returns 1 if the insn was successfully emulated.
> > > +	 * For all other cases, we need to single-step in hardware.
> > > +	 */
> > > +	ret = emulate_step(regs, insn);
> > > +	if (ret > 0)
> > > +		return true;
> > 
> > This actually emulates the instruction, ie. the contents of regs are
> > changed based on the instruction.
> > 
> > That seems to differ vs x86, where arch_uprobe_skip_sstep() just checks
> > the instruction and returns true/false. Is that because on x86 they are
> > only returning true for nops? ie. there is no emulation to be done?
> > 
> 
> Yes, In x86, we currently support skip for nop instructions, Once we add
> code for skipping other instructions in x86, we could enhance them to
> skip them too.

(and emulating them right?)

> > It's a little surprising that can_skip_sstep() actually emulates the
> > instruction, but again that's not your fault.
> 
> Are you saying its doing more than what the name suggests or to the fact
> that can_skip_step should ideally be called just once?

Yeah just that the name suggests it is just a "checker" function, but it
actually emulates the instruction and changes the task state!

cheers
Benjamin Herrenschmidt - Aug. 24, 2012, 7:07 a.m.
On Fri, 2012-08-24 at 11:13 +1000, Michael Ellerman wrote:
> 
> Yeah. A NULL regs here is a kernel bug, so I think it's actually
> preferable to crash than silently return. 

Or best, if you think there's a remote chance that the bug might hit:

	if (WARN(!regs))
		return

Cheers,
Ben.
Ananth N Mavinakayanahalli - Aug. 24, 2012, 7:37 a.m.
On Fri, Aug 24, 2012 at 05:07:31PM +1000, Benjamin Herrenschmidt wrote:
> On Fri, 2012-08-24 at 11:13 +1000, Michael Ellerman wrote:
> > 
> > Yeah. A NULL regs here is a kernel bug, so I think it's actually
> > preferable to crash than silently return. 
> 
> Or best, if you think there's a remote chance that the bug might hit:
> 
> 	if (WARN(!regs))
> 		return

Incorporated this and other suggestions from Michael in v5 I just
posted.

Ananth

Patch

Index: linux-tip-16aug/arch/powerpc/include/asm/thread_info.h
===================================================================
--- linux-tip-16aug.orig/arch/powerpc/include/asm/thread_info.h
+++ linux-tip-16aug/arch/powerpc/include/asm/thread_info.h
@@ -102,6 +102,7 @@  static inline struct thread_info *curren
 #define TIF_RESTOREALL		11	/* Restore all regs (implies NOERROR) */
 #define TIF_NOERROR		12	/* Force successful syscall return */
 #define TIF_NOTIFY_RESUME	13	/* callback before returning to user */
+#define TIF_UPROBE		14	/* breakpointed or single-stepping */
 #define TIF_SYSCALL_TRACEPOINT	15	/* syscall tracepoint instrumentation */
 
 /* as above, but as bit values */
@@ -118,12 +119,13 @@  static inline struct thread_info *curren
 #define _TIF_RESTOREALL		(1<<TIF_RESTOREALL)
 #define _TIF_NOERROR		(1<<TIF_NOERROR)
 #define _TIF_NOTIFY_RESUME	(1<<TIF_NOTIFY_RESUME)
+#define _TIF_UPROBE		(1<<TIF_UPROBE)
 #define _TIF_SYSCALL_TRACEPOINT	(1<<TIF_SYSCALL_TRACEPOINT)
 #define _TIF_SYSCALL_T_OR_A	(_TIF_SYSCALL_TRACE | _TIF_SYSCALL_AUDIT | \
 				 _TIF_SECCOMP | _TIF_SYSCALL_TRACEPOINT)
 
 #define _TIF_USER_WORK_MASK	(_TIF_SIGPENDING | _TIF_NEED_RESCHED | \
-				 _TIF_NOTIFY_RESUME)
+				 _TIF_NOTIFY_RESUME | _TIF_UPROBE)
 #define _TIF_PERSYSCALL_MASK	(_TIF_RESTOREALL|_TIF_NOERROR)
 
 /* Bits in local_flags */
Index: linux-tip-16aug/arch/powerpc/include/asm/uprobes.h
===================================================================
--- /dev/null
+++ linux-tip-16aug/arch/powerpc/include/asm/uprobes.h
@@ -0,0 +1,58 @@ 
+#ifndef _ASM_UPROBES_H
+#define _ASM_UPROBES_H
+/*
+ * User-space Probes (UProbes) for powerpc
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright (C) IBM Corporation, 2007-2012
+ *
+ * Adapted from the x86 port by Ananth N Mavinakayanahalli <ananth@in.ibm.com>
+ */
+
+#include <linux/notifier.h>
+
+typedef unsigned int uprobe_opcode_t;
+
+#define MAX_UINSN_BYTES			4
+#define UPROBE_XOL_SLOT_BYTES		(MAX_UINSN_BYTES)
+
+#define UPROBE_SWBP_INSN		0x7fe00008
+#define UPROBE_SWBP_INSN_SIZE		4 /* swbp insn size in bytes */
+
+#define IS_TW(instr)		(((instr) & 0xfc0007fe) == 0x7c000008)
+#define IS_TD(instr)		(((instr) & 0xfc0007fe) == 0x7c000088)
+#define IS_TDI(instr)		(((instr) & 0xfc000000) == 0x08000000)
+#define IS_TWI(instr)		(((instr) & 0xfc000000) == 0x0c000000)
+
+#define is_trap(instr)	(IS_TW(instr) || IS_TD(instr) || \
+			IS_TWI(instr) || IS_TDI(instr))
+
+struct arch_uprobe {
+	u8	insn[MAX_UINSN_BYTES];
+};
+
+struct arch_uprobe_task {
+	unsigned long	saved_trap_nr;
+};
+
+extern int  arch_uprobe_analyze_insn(struct arch_uprobe *aup, struct mm_struct *mm, unsigned long addr);
+extern unsigned long uprobe_get_swbp_addr(struct pt_regs *regs);
+extern int  arch_uprobe_pre_xol(struct arch_uprobe *aup, struct pt_regs *regs);
+extern int  arch_uprobe_post_xol(struct arch_uprobe *aup, struct pt_regs *regs);
+extern bool arch_uprobe_xol_was_trapped(struct task_struct *tsk);
+extern int  arch_uprobe_exception_notify(struct notifier_block *self, unsigned long val, void *data);
+extern void arch_uprobe_abort_xol(struct arch_uprobe *aup, struct pt_regs *regs);
+#endif	/* _ASM_UPROBES_H */
Index: linux-tip-16aug/arch/powerpc/kernel/Makefile
===================================================================
--- linux-tip-16aug.orig/arch/powerpc/kernel/Makefile
+++ linux-tip-16aug/arch/powerpc/kernel/Makefile
@@ -96,6 +96,7 @@  obj-$(CONFIG_MODULES)		+= ppc_ksyms.o
 obj-$(CONFIG_BOOTX_TEXT)	+= btext.o
 obj-$(CONFIG_SMP)		+= smp.o
 obj-$(CONFIG_KPROBES)		+= kprobes.o
+obj-$(CONFIG_UPROBES)		+= uprobes.o
 obj-$(CONFIG_PPC_UDBG_16550)	+= legacy_serial.o udbg_16550.o
 obj-$(CONFIG_STACKTRACE)	+= stacktrace.o
 obj-$(CONFIG_SWIOTLB)		+= dma-swiotlb.o
Index: linux-tip-16aug/arch/powerpc/kernel/signal.c
===================================================================
--- linux-tip-16aug.orig/arch/powerpc/kernel/signal.c
+++ linux-tip-16aug/arch/powerpc/kernel/signal.c
@@ -11,6 +11,7 @@ 
 
 #include <linux/tracehook.h>
 #include <linux/signal.h>
+#include <linux/uprobes.h>
 #include <linux/key.h>
 #include <asm/hw_breakpoint.h>
 #include <asm/uaccess.h>
@@ -157,6 +158,11 @@  static int do_signal(struct pt_regs *reg
 
 void do_notify_resume(struct pt_regs *regs, unsigned long thread_info_flags)
 {
+	if (thread_info_flags & _TIF_UPROBE) {
+		clear_thread_flag(TIF_UPROBE);
+		uprobe_notify_resume(regs);
+	}
+
 	if (thread_info_flags & _TIF_SIGPENDING)
 		do_signal(regs);
 
Index: linux-tip-16aug/arch/powerpc/kernel/uprobes.c
===================================================================
--- /dev/null
+++ linux-tip-16aug/arch/powerpc/kernel/uprobes.c
@@ -0,0 +1,180 @@ 
+/*
+ * User-space Probes (UProbes) for powerpc
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright (C) IBM Corporation, 2007-2012
+ *
+ * Adapted from the x86 port by Ananth N Mavinakayanahalli <ananth@in.ibm.com>
+ */
+#include <linux/kernel.h>
+#include <linux/sched.h>
+#include <linux/ptrace.h>
+#include <linux/uprobes.h>
+#include <linux/uaccess.h>
+#include <linux/kdebug.h>
+
+#include <asm/sstep.h>
+
+#define UPROBE_TRAP_NR	UINT_MAX
+
+/**
+ * arch_uprobe_analyze_insn
+ * @mm: the probed address space.
+ * @arch_uprobe: the probepoint information.
+ * @addr: vaddr to probe.
+ * Return 0 on success or a -ve number on error.
+ */
+int arch_uprobe_analyze_insn(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long addr)
+{
+	unsigned int insn;
+
+	if (addr & 0x03)
+		return -EINVAL;
+
+	memcpy(&insn, auprobe->insn, MAX_UINSN_BYTES);
+	if (is_trap(insn))
+		return -ENOTSUPP;
+	return 0;
+}
+
+/*
+ * arch_uprobe_pre_xol - prepare to execute out of line.
+ * @auprobe: the probepoint information.
+ * @regs: reflects the saved user state of current task.
+ */
+int arch_uprobe_pre_xol(struct arch_uprobe *auprobe, struct pt_regs *regs)
+{
+	struct arch_uprobe_task *autask = &current->utask->autask;
+
+	autask->saved_trap_nr = current->thread.trap_nr;
+	current->thread.trap_nr = UPROBE_TRAP_NR;
+	regs->nip = current->utask->xol_vaddr;
+	return 0;
+}
+
+/**
+ * uprobe_get_swbp_addr - compute address of swbp given post-swbp regs
+ * @regs: Reflects the saved state of the task after it has hit a breakpoint
+ * instruction.
+ * Return the address of the breakpoint instruction.
+ */
+unsigned long uprobe_get_swbp_addr(struct pt_regs *regs)
+{
+	return instruction_pointer(regs);
+}
+
+/*
+ * If xol insn itself traps and generates a signal (SIGILL/SIGSEGV/etc),
+ * then detect the case where a singlestepped instruction jumps back to its
+ * own address. It is assumed that anything like do_page_fault/do_trap/etc
+ * sets thread.trap_nr != -1.
+ *
+ * arch_uprobe_pre_xol/arch_uprobe_post_xol save/restore thread.trap_nr,
+ * arch_uprobe_xol_was_trapped() simply checks that ->trap_nr is not equal to
+ * UPROBE_TRAP_NR == -1 set by arch_uprobe_pre_xol().
+ */
+bool arch_uprobe_xol_was_trapped(struct task_struct *t)
+{
+	if (t->thread.trap_nr != UPROBE_TRAP_NR)
+		return true;
+
+	return false;
+}
+
+/*
+ * Called after single-stepping. To avoid the SMP problems that can
+ * occur when we temporarily put back the original opcode to
+ * single-step, we single-stepped a copy of the instruction.
+ *
+ * This function prepares to resume execution after the single-step.
+ */
+int arch_uprobe_post_xol(struct arch_uprobe *auprobe, struct pt_regs *regs)
+{
+	struct uprobe_task *utask = current->utask;
+
+	WARN_ON_ONCE(current->thread.trap_nr != UPROBE_TRAP_NR);
+
+	current->thread.trap_nr = utask->autask.saved_trap_nr;
+
+	/*
+	 * On powerpc, except for loads and stores, most instructions
+	 * including ones that alter code flow (branches, calls, returns)
+	 * are emulated in the kernel. We get here only if the emulation
+	 * support doesn't exist and have to fix-up the next instruction
+	 * to be executed.
+	 */
+	regs->nip = utask->vaddr + MAX_UINSN_BYTES;
+	return 0;
+}
+
+/* callback routine for handling exceptions. */
+int arch_uprobe_exception_notify(struct notifier_block *self, unsigned long val, void *data)
+{
+	struct die_args *args = data;
+	struct pt_regs *regs = args->regs;
+
+	/* We are only interested in userspace traps */
+	if (regs && !user_mode(regs))
+		return NOTIFY_DONE;
+
+	switch (val) {
+	case DIE_BPT:
+		if (uprobe_pre_sstep_notifier(regs))
+			return NOTIFY_STOP;
+		break;
+	case DIE_SSTEP:
+		if (uprobe_post_sstep_notifier(regs))
+			return NOTIFY_STOP;
+	default:
+		break;
+	}
+	return NOTIFY_DONE;
+}
+
+/*
+ * This function gets called when XOL instruction either gets trapped or
+ * the thread has a fatal signal, so reset the instruction pointer to its
+ * probed address.
+ */
+void arch_uprobe_abort_xol(struct arch_uprobe *auprobe, struct pt_regs *regs)
+{
+	struct uprobe_task *utask = current->utask;
+
+	current->thread.trap_nr = utask->autask.saved_trap_nr;
+	instruction_pointer_set(regs, utask->vaddr);
+}
+
+/*
+ * See if the instruction can be emulated.
+ * Returns true if instruction was emulated, false otherwise.
+ */
+bool arch_uprobe_skip_sstep(struct arch_uprobe *auprobe, struct pt_regs *regs)
+{
+	int ret;
+	unsigned int insn;
+
+	memcpy(&insn, auprobe->insn, MAX_UINSN_BYTES);
+
+	/*
+	 * emulate_step() returns 1 if the insn was successfully emulated.
+	 * For all other cases, we need to single-step in hardware.
+	 */
+	ret = emulate_step(regs, insn);
+	if (ret > 0)
+		return true;
+
+	return false;
+}
Index: linux-tip-16aug/arch/powerpc/Kconfig
===================================================================
--- linux-tip-16aug.orig/arch/powerpc/Kconfig
+++ linux-tip-16aug/arch/powerpc/Kconfig
@@ -239,6 +239,9 @@  config PPC_OF_PLATFORM_PCI
 config ARCH_SUPPORTS_DEBUG_PAGEALLOC
 	def_bool y
 
+config ARCH_SUPPORTS_UPROBES
+	def_bool y
+
 config PPC_ADV_DEBUG_REGS
 	bool
 	depends on 40x || BOOKE