diff mbox

[BUG?] 3.0-rc4+ftrace+kprobe: set kprobe at instruction 'stwu' lead to system crash/freeze

Message ID BANLkTi=gx73d6-s-jqeke05SQYorpnzLVw@mail.gmail.com (mailing list archive)
State Superseded
Headers show

Commit Message

Yong Zhang June 24, 2011, 9:21 a.m. UTC
Hi,

When I use kprobe to do something, I found some wired thing.

When CONFIG_FUNCTION_TRACER is disabled:
(gdb) disassemble do_fork
Dump of assembler code for function do_fork:
   0xc0037390 <+0>:	mflr    r0
   0xc0037394 <+4>:	stwu    r1,-64(r1)
   0xc0037398 <+8>:	mfcr    r12
   0xc003739c <+12>:	stmw    r27,44(r1)

Then I:
modprobe kprobe_example func=do_fork offset=4
ls
Things works well.

But when CONFIG_FUNCTION_TRACER is enabled:
(gdb) disassemble do_fork
Dump of assembler code for function do_fork:
   0xc0040334 <+0>:	mflr    r0
   0xc0040338 <+4>:	stw     r0,4(r1)
   0xc004033c <+8>:	bl      0xc00109d4 <mcount>
   0xc0040340 <+12>:	stwu    r1,-80(r1)
   0xc0040344 <+16>:	mflr    r0
   0xc0040348 <+20>:	stw     r0,84(r1)
   0xc004034c <+24>:	mfcr    r12
Then I:
modprobe kprobe_example func=do_fork offset=12
ls
'ls' will never retrun. system freeze.

I'm using toolchain from:http://www.denx.de/wiki/ELDK-5/WebHome
powerpc-linux-gcc -v
Using built-in specs.
COLLECT_GCC=/opt/eldk-5.0/powerpc/sysroots/i686-oesdk-linux/usr/bin/powerpc-linux/powerpc-linux-gcc
COLLECT_LTO_WRAPPER=/opt/eldk-5.0/powerpc/sysroots/i686-oesdk-linux/usr/libexec/powerpc-linux/gcc/powerpc-linux/4.5.1/lto-wrapper
Target: powerpc-linux
Configured with:
/opt/poky/build/eldk-2011-05-20-5cde06e-powerpc/tmp/work/i686-nativesdk-oesdk-linux/gcc-cross-canadian-powerpc-4.5.1-r4/gcc-4.5.1/configure
--build=x86_64-linux --host=i686-oesdk-linux --target=powerpc-linux
--prefix=/opt/eldk-5.0/powerpc/sysroots/i686-oesdk-linux/usr
--exec_prefix=/opt/eldk-5.0/powerpc/sysroots/i686-oesdk-linux/usr
--bindir=/opt/eldk-5.0/powerpc/sysroots/i686-oesdk-linux/usr/bin/powerpc-linux
--sbindir=/opt/eldk-5.0/powerpc/sysroots/i686-oesdk-linux/usr/bin/powerpc-linux
--libexecdir=/opt/eldk-5.0/powerpc/sysroots/i686-oesdk-linux/usr/libexec/powerpc-linux
--datadir=/opt/eldk-5.0/powerpc/sysroots/i686-oesdk-linux/usr/share
--sysconfdir=/opt/eldk-5.0/powerpc/sysroots/i686-oesdk-linux/etc
--sharedstatedir=/opt/eldk-5.0/powerpc/sysroots/i686-oesdk-linux/com
--localstatedir=/opt/eldk-5.0/powerpc/sysroots/i686-oesdk-linux/var
--libdir=/opt/eldk-5.0/powerpc/sysroots/i686-oesdk-linux/usr/lib/powerpc-linux
--includedir=/opt/eldk-5.0/powerpc/sysroots/i686-oesdk-linux/usr/include
--oldincludedir=/opt/eldk-5.0/powerpc/sysroots/i686-oesdk-linux/usr/include
--infodir=/opt/eldk-5.0/powerpc/sysroots/i686-oesdk-linux/usr/share/info
--mandir=/opt/eldk-5.0/powerpc/sysroots/i686-oesdk-linux/usr/share/man
--with-libtool-sysroot=/opt/poky/build/eldk-2011-05-20-5cde06e-powerpc/tmp/sysroots/i686-oesdk-linux-nativesdk
--with-gnu-ld --enable-shared --enable-languages=c,c++
--enable-threads=posix --disable-multilib --enable-c99
--enable-long-long --enable-symvers=gnu --enable-libstdcxx-pch
--program-prefix=powerpc-linux- --enable-lto --enable-libssp
--disable-bootstrap --disable-libgomp --disable-libmudflap
--enable-cheaders=c_global
--with-local-prefix=/opt/eldk-5.0/powerpc/sysroots/powerpc-linux/usr
--with-gxx-include-dir=/usr/include/c++
--with-build-time-tools=/opt/poky/build/eldk-2011-05-20-5cde06e-powerpc/tmp/sysroots/x86_64-linux/usr/powerpc-linux/bin
--with-sysroot=/opt/eldk-5.0/powerpc/sysroots/powerpc-linux
--with-build-sysroot=/opt/poky/build/eldk-2011-05-20-5cde06e-powerpc/tmp/sysroots/powerpc
--disable-libunwind-exceptions --disable-libssp --disable-libgomp
--disable-libmudflap
--with-mpfr=/opt/poky/build/eldk-2011-05-20-5cde06e-powerpc/tmp/sysroots/i686-oesdk-linux-nativesdk
--with-mpc=/opt/poky/build/eldk-2011-05-20-5cde06e-powerpc/tmp/sysroots/i686-oesdk-linux-nativesdk
--enable-__cxa_atexit
Thread model: posix
gcc version 4.5.1 (GCC)


And kernel config is attached.

BTW, I have made a patch to make kprobe_example set breakpoint easily,
attached too.

Thanks,
Yong

Comments

Steven Rostedt June 24, 2011, 10:29 a.m. UTC | #1
On Fri, 2011-06-24 at 17:21 +0800, Yong Zhang wrote:
> Hi,
> 
> When I use kprobe to do something, I found some wired thing.
> 
> When CONFIG_FUNCTION_TRACER is disabled:
> (gdb) disassemble do_fork
> Dump of assembler code for function do_fork:
>    0xc0037390 <+0>:	mflr    r0
>    0xc0037394 <+4>:	stwu    r1,-64(r1)
>    0xc0037398 <+8>:	mfcr    r12
>    0xc003739c <+12>:	stmw    r27,44(r1)
> 
> Then I:
> modprobe kprobe_example func=do_fork offset=4
> ls
> Things works well.
> 
> But when CONFIG_FUNCTION_TRACER is enabled:
> (gdb) disassemble do_fork
> Dump of assembler code for function do_fork:
>    0xc0040334 <+0>:	mflr    r0
>    0xc0040338 <+4>:	stw     r0,4(r1)
>    0xc004033c <+8>:	bl      0xc00109d4 <mcount>
>    0xc0040340 <+12>:	stwu    r1,-80(r1)
>    0xc0040344 <+16>:	mflr    r0
>    0xc0040348 <+20>:	stw     r0,84(r1)
>    0xc004034c <+24>:	mfcr    r12
> Then I:
> modprobe kprobe_example func=do_fork offset=12
> ls
> 'ls' will never retrun. system freeze.

I'm not sure if x86 had a similar issue.

Masami, have any ideas to why this happened?

I don't have a PPC32 to test on, but I can try it out on my PPC64.

-- Steve

> 
> I'm using toolchain from:http://www.denx.de/wiki/ELDK-5/WebHome
> powerpc-linux-gcc -v
> Using built-in specs.
> COLLECT_GCC=/opt/eldk-5.0/powerpc/sysroots/i686-oesdk-linux/usr/bin/powerpc-linux/powerpc-linux-gcc
> COLLECT_LTO_WRAPPER=/opt/eldk-5.0/powerpc/sysroots/i686-oesdk-linux/usr/libexec/powerpc-linux/gcc/powerpc-linux/4.5.1/lto-wrapper
> Target: powerpc-linux
> Configured with:
> /opt/poky/build/eldk-2011-05-20-5cde06e-powerpc/tmp/work/i686-nativesdk-oesdk-linux/gcc-cross-canadian-powerpc-4.5.1-r4/gcc-4.5.1/configure
> --build=x86_64-linux --host=i686-oesdk-linux --target=powerpc-linux
> --prefix=/opt/eldk-5.0/powerpc/sysroots/i686-oesdk-linux/usr
> --exec_prefix=/opt/eldk-5.0/powerpc/sysroots/i686-oesdk-linux/usr
> --bindir=/opt/eldk-5.0/powerpc/sysroots/i686-oesdk-linux/usr/bin/powerpc-linux
> --sbindir=/opt/eldk-5.0/powerpc/sysroots/i686-oesdk-linux/usr/bin/powerpc-linux
> --libexecdir=/opt/eldk-5.0/powerpc/sysroots/i686-oesdk-linux/usr/libexec/powerpc-linux
> --datadir=/opt/eldk-5.0/powerpc/sysroots/i686-oesdk-linux/usr/share
> --sysconfdir=/opt/eldk-5.0/powerpc/sysroots/i686-oesdk-linux/etc
> --sharedstatedir=/opt/eldk-5.0/powerpc/sysroots/i686-oesdk-linux/com
> --localstatedir=/opt/eldk-5.0/powerpc/sysroots/i686-oesdk-linux/var
> --libdir=/opt/eldk-5.0/powerpc/sysroots/i686-oesdk-linux/usr/lib/powerpc-linux
> --includedir=/opt/eldk-5.0/powerpc/sysroots/i686-oesdk-linux/usr/include
> --oldincludedir=/opt/eldk-5.0/powerpc/sysroots/i686-oesdk-linux/usr/include
> --infodir=/opt/eldk-5.0/powerpc/sysroots/i686-oesdk-linux/usr/share/info
> --mandir=/opt/eldk-5.0/powerpc/sysroots/i686-oesdk-linux/usr/share/man
> --with-libtool-sysroot=/opt/poky/build/eldk-2011-05-20-5cde06e-powerpc/tmp/sysroots/i686-oesdk-linux-nativesdk
> --with-gnu-ld --enable-shared --enable-languages=c,c++
> --enable-threads=posix --disable-multilib --enable-c99
> --enable-long-long --enable-symvers=gnu --enable-libstdcxx-pch
> --program-prefix=powerpc-linux- --enable-lto --enable-libssp
> --disable-bootstrap --disable-libgomp --disable-libmudflap
> --enable-cheaders=c_global
> --with-local-prefix=/opt/eldk-5.0/powerpc/sysroots/powerpc-linux/usr
> --with-gxx-include-dir=/usr/include/c++
> --with-build-time-tools=/opt/poky/build/eldk-2011-05-20-5cde06e-powerpc/tmp/sysroots/x86_64-linux/usr/powerpc-linux/bin
> --with-sysroot=/opt/eldk-5.0/powerpc/sysroots/powerpc-linux
> --with-build-sysroot=/opt/poky/build/eldk-2011-05-20-5cde06e-powerpc/tmp/sysroots/powerpc
> --disable-libunwind-exceptions --disable-libssp --disable-libgomp
> --disable-libmudflap
> --with-mpfr=/opt/poky/build/eldk-2011-05-20-5cde06e-powerpc/tmp/sysroots/i686-oesdk-linux-nativesdk
> --with-mpc=/opt/poky/build/eldk-2011-05-20-5cde06e-powerpc/tmp/sysroots/i686-oesdk-linux-nativesdk
> --enable-__cxa_atexit
> Thread model: posix
> gcc version 4.5.1 (GCC)
> 
> 
> And kernel config is attached.
> 
> BTW, I have made a patch to make kprobe_example set breakpoint easily,
> attached too.
> 
> Thanks,
> Yong
>
Masami Hiramatsu June 26, 2011, 2:47 p.m. UTC | #2
(2011/06/24 19:29), Steven Rostedt wrote:
> On Fri, 2011-06-24 at 17:21 +0800, Yong Zhang wrote:
>> Hi,
>>
>> When I use kprobe to do something, I found some wired thing.
>>
>> When CONFIG_FUNCTION_TRACER is disabled:
>> (gdb) disassemble do_fork
>> Dump of assembler code for function do_fork:
>>    0xc0037390 <+0>:	mflr    r0
>>    0xc0037394 <+4>:	stwu    r1,-64(r1)
>>    0xc0037398 <+8>:	mfcr    r12
>>    0xc003739c <+12>:	stmw    r27,44(r1)
>>
>> Then I:
>> modprobe kprobe_example func=do_fork offset=4
>> ls
>> Things works well.
>>
>> But when CONFIG_FUNCTION_TRACER is enabled:
>> (gdb) disassemble do_fork
>> Dump of assembler code for function do_fork:
>>    0xc0040334 <+0>:	mflr    r0
>>    0xc0040338 <+4>:	stw     r0,4(r1)
>>    0xc004033c <+8>:	bl      0xc00109d4 <mcount>
>>    0xc0040340 <+12>:	stwu    r1,-80(r1)
>>    0xc0040344 <+16>:	mflr    r0
>>    0xc0040348 <+20>:	stw     r0,84(r1)
>>    0xc004034c <+24>:	mfcr    r12
>> Then I:
>> modprobe kprobe_example func=do_fork offset=12
>> ls
>> 'ls' will never retrun. system freeze.
> 
> I'm not sure if x86 had a similar issue.
> 
> Masami, have any ideas to why this happened?

No, I don't familiar with ppc implementation. I guess
that single-step resume code failed to emulate the
instruction, but it strongly depends on ppc arch.
Maybe IBM people may know what happened.

Ananth, Jim, would you have any ideas?

Thank you,

> 
> I don't have a PPC32 to test on, but I can try it out on my PPC64.
> 
> -- Steve
> 
>>
>> I'm using toolchain from:http://www.denx.de/wiki/ELDK-5/WebHome
>> powerpc-linux-gcc -v
>> Using built-in specs.
>> COLLECT_GCC=/opt/eldk-5.0/powerpc/sysroots/i686-oesdk-linux/usr/bin/powerpc-linux/powerpc-linux-gcc
>> COLLECT_LTO_WRAPPER=/opt/eldk-5.0/powerpc/sysroots/i686-oesdk-linux/usr/libexec/powerpc-linux/gcc/powerpc-linux/4.5.1/lto-wrapper
>> Target: powerpc-linux
>> Configured with:
>> /opt/poky/build/eldk-2011-05-20-5cde06e-powerpc/tmp/work/i686-nativesdk-oesdk-linux/gcc-cross-canadian-powerpc-4.5.1-r4/gcc-4.5.1/configure
>> --build=x86_64-linux --host=i686-oesdk-linux --target=powerpc-linux
>> --prefix=/opt/eldk-5.0/powerpc/sysroots/i686-oesdk-linux/usr
>> --exec_prefix=/opt/eldk-5.0/powerpc/sysroots/i686-oesdk-linux/usr
>> --bindir=/opt/eldk-5.0/powerpc/sysroots/i686-oesdk-linux/usr/bin/powerpc-linux
>> --sbindir=/opt/eldk-5.0/powerpc/sysroots/i686-oesdk-linux/usr/bin/powerpc-linux
>> --libexecdir=/opt/eldk-5.0/powerpc/sysroots/i686-oesdk-linux/usr/libexec/powerpc-linux
>> --datadir=/opt/eldk-5.0/powerpc/sysroots/i686-oesdk-linux/usr/share
>> --sysconfdir=/opt/eldk-5.0/powerpc/sysroots/i686-oesdk-linux/etc
>> --sharedstatedir=/opt/eldk-5.0/powerpc/sysroots/i686-oesdk-linux/com
>> --localstatedir=/opt/eldk-5.0/powerpc/sysroots/i686-oesdk-linux/var
>> --libdir=/opt/eldk-5.0/powerpc/sysroots/i686-oesdk-linux/usr/lib/powerpc-linux
>> --includedir=/opt/eldk-5.0/powerpc/sysroots/i686-oesdk-linux/usr/include
>> --oldincludedir=/opt/eldk-5.0/powerpc/sysroots/i686-oesdk-linux/usr/include
>> --infodir=/opt/eldk-5.0/powerpc/sysroots/i686-oesdk-linux/usr/share/info
>> --mandir=/opt/eldk-5.0/powerpc/sysroots/i686-oesdk-linux/usr/share/man
>> --with-libtool-sysroot=/opt/poky/build/eldk-2011-05-20-5cde06e-powerpc/tmp/sysroots/i686-oesdk-linux-nativesdk
>> --with-gnu-ld --enable-shared --enable-languages=c,c++
>> --enable-threads=posix --disable-multilib --enable-c99
>> --enable-long-long --enable-symvers=gnu --enable-libstdcxx-pch
>> --program-prefix=powerpc-linux- --enable-lto --enable-libssp
>> --disable-bootstrap --disable-libgomp --disable-libmudflap
>> --enable-cheaders=c_global
>> --with-local-prefix=/opt/eldk-5.0/powerpc/sysroots/powerpc-linux/usr
>> --with-gxx-include-dir=/usr/include/c++
>> --with-build-time-tools=/opt/poky/build/eldk-2011-05-20-5cde06e-powerpc/tmp/sysroots/x86_64-linux/usr/powerpc-linux/bin
>> --with-sysroot=/opt/eldk-5.0/powerpc/sysroots/powerpc-linux
>> --with-build-sysroot=/opt/poky/build/eldk-2011-05-20-5cde06e-powerpc/tmp/sysroots/powerpc
>> --disable-libunwind-exceptions --disable-libssp --disable-libgomp
>> --disable-libmudflap
>> --with-mpfr=/opt/poky/build/eldk-2011-05-20-5cde06e-powerpc/tmp/sysroots/i686-oesdk-linux-nativesdk
>> --with-mpc=/opt/poky/build/eldk-2011-05-20-5cde06e-powerpc/tmp/sysroots/i686-oesdk-linux-nativesdk
>> --enable-__cxa_atexit
>> Thread model: posix
>> gcc version 4.5.1 (GCC)
>>
>>
>> And kernel config is attached.
>>
>> BTW, I have made a patch to make kprobe_example set breakpoint easily,
>> attached too.
>>
>> Thanks,
>> Yong
>>
Ananth N Mavinakayanahalli June 27, 2011, 10:01 a.m. UTC | #3
On Sun, Jun 26, 2011 at 11:47:13PM +0900, Masami Hiramatsu wrote:
> (2011/06/24 19:29), Steven Rostedt wrote:
> > On Fri, 2011-06-24 at 17:21 +0800, Yong Zhang wrote:
> >> Hi,
> >>
> >> When I use kprobe to do something, I found some wired thing.
> >>
> >> When CONFIG_FUNCTION_TRACER is disabled:
> >> (gdb) disassemble do_fork
> >> Dump of assembler code for function do_fork:
> >>    0xc0037390 <+0>:	mflr    r0
> >>    0xc0037394 <+4>:	stwu    r1,-64(r1)
> >>    0xc0037398 <+8>:	mfcr    r12
> >>    0xc003739c <+12>:	stmw    r27,44(r1)
> >>
> >> Then I:
> >> modprobe kprobe_example func=do_fork offset=4
> >> ls
> >> Things works well.
> >>
> >> But when CONFIG_FUNCTION_TRACER is enabled:
> >> (gdb) disassemble do_fork
> >> Dump of assembler code for function do_fork:
> >>    0xc0040334 <+0>:	mflr    r0
> >>    0xc0040338 <+4>:	stw     r0,4(r1)
> >>    0xc004033c <+8>:	bl      0xc00109d4 <mcount>
> >>    0xc0040340 <+12>:	stwu    r1,-80(r1)
> >>    0xc0040344 <+16>:	mflr    r0
> >>    0xc0040348 <+20>:	stw     r0,84(r1)
> >>    0xc004034c <+24>:	mfcr    r12
> >> Then I:
> >> modprobe kprobe_example func=do_fork offset=12
> >> ls
> >> 'ls' will never retrun. system freeze.
> > 
> > I'm not sure if x86 had a similar issue.
> > 
> > Masami, have any ideas to why this happened?
> 
> No, I don't familiar with ppc implementation. I guess
> that single-step resume code failed to emulate the
> instruction, but it strongly depends on ppc arch.
> Maybe IBM people may know what happened.
> 
> Ananth, Jim, would you have any ideas?

On powerpc, we emulate sstep whenever possible. Only recently support to
emulate loads and stores got added. I don't have access to a powerpc box
today... but will try to recreate the problem ASAP and see what could be
happening in the presence of mcount.

Ananth
Ananth N Mavinakayanahalli June 28, 2011, 10:41 a.m. UTC | #4
On Mon, Jun 27, 2011 at 03:31:05PM +0530, Ananth N Mavinakayanahalli wrote:
> On Sun, Jun 26, 2011 at 11:47:13PM +0900, Masami Hiramatsu wrote:
> > (2011/06/24 19:29), Steven Rostedt wrote:
> > > On Fri, 2011-06-24 at 17:21 +0800, Yong Zhang wrote:
> > >> Hi,
> > >>
> > >> When I use kprobe to do something, I found some wired thing.
> > >>
> > >> When CONFIG_FUNCTION_TRACER is disabled:
> > >> (gdb) disassemble do_fork
> > >> Dump of assembler code for function do_fork:
> > >>    0xc0037390 <+0>:	mflr    r0
> > >>    0xc0037394 <+4>:	stwu    r1,-64(r1)
> > >>    0xc0037398 <+8>:	mfcr    r12
> > >>    0xc003739c <+12>:	stmw    r27,44(r1)
> > >>
> > >> Then I:
> > >> modprobe kprobe_example func=do_fork offset=4
> > >> ls
> > >> Things works well.
> > >>
> > >> But when CONFIG_FUNCTION_TRACER is enabled:
> > >> (gdb) disassemble do_fork
> > >> Dump of assembler code for function do_fork:
> > >>    0xc0040334 <+0>:	mflr    r0
> > >>    0xc0040338 <+4>:	stw     r0,4(r1)
> > >>    0xc004033c <+8>:	bl      0xc00109d4 <mcount>
> > >>    0xc0040340 <+12>:	stwu    r1,-80(r1)
> > >>    0xc0040344 <+16>:	mflr    r0
> > >>    0xc0040348 <+20>:	stw     r0,84(r1)
> > >>    0xc004034c <+24>:	mfcr    r12
> > >> Then I:
> > >> modprobe kprobe_example func=do_fork offset=12
> > >> ls
> > >> 'ls' will never retrun. system freeze.

My access to a 32bit powerpc box is very limited. Also, embedded powerpc
has had issues with gcc-4.6 while gcc-4.5 worked fine.

> > > I'm not sure if x86 had a similar issue.
> > > 
> > > Masami, have any ideas to why this happened?
> > 
> > No, I don't familiar with ppc implementation. I guess
> > that single-step resume code failed to emulate the
> > instruction, but it strongly depends on ppc arch.
> > Maybe IBM people may know what happened.
> > 
> > Ananth, Jim, would you have any ideas?
> 
> On powerpc, we emulate sstep whenever possible. Only recently support to
> emulate loads and stores got added. I don't have access to a powerpc box
> today... but will try to recreate the problem ASAP and see what could be
> happening in the presence of mcount.

I tried to recreate this problem on a 64-bit pSeries box without
success. Every one of the instructions in the stream at .do_fork are
emulated and work fine there -- no hangs/crashes with or without
function tracer.

Yong,
I am copying Kumar to see if he knows of any issues with 32-bit kprobes
(he wrote it) or with the function tracer, or with the toolchain itself.

You may want to check if, in the failure case, the instruction in
question is single-stepped or emulated (print out the value of
kprobe->ainsn.boostable in the post_handler) and see if you can find a
pattern to the failure.

Ananth
Steven Rostedt June 28, 2011, 1:15 p.m. UTC | #5
On Tue, 2011-06-28 at 16:11 +0530, Ananth N Mavinakayanahalli wrote:

> My access to a 32bit powerpc box is very limited. Also, embedded powerpc
> has had issues with gcc-4.6 while gcc-4.5 worked fine.

I'd work to debug this too, but I don't have access to a 32bit ppc
either. Although I've been told that people would send me one ;)

-- Steve
Yong Zhang June 29, 2011, 6:23 a.m. UTC | #6
On Mon, Jun 27, 2011 at 6:01 PM, Ananth N Mavinakayanahalli
<ananth@in.ibm.com> wrote:
> On Sun, Jun 26, 2011 at 11:47:13PM +0900, Masami Hiramatsu wrote:
>> (2011/06/24 19:29), Steven Rostedt wrote:
>> > On Fri, 2011-06-24 at 17:21 +0800, Yong Zhang wrote:
>> >> Hi,
>> >>
>> >> When I use kprobe to do something, I found some wired thing.
>> >>
>> >> When CONFIG_FUNCTION_TRACER is disabled:
>> >> (gdb) disassemble do_fork
>> >> Dump of assembler code for function do_fork:
>> >>    0xc0037390 <+0>:        mflr    r0
>> >>    0xc0037394 <+4>:        stwu    r1,-64(r1)
>> >>    0xc0037398 <+8>:        mfcr    r12
>> >>    0xc003739c <+12>:       stmw    r27,44(r1)
>> >>
>> >> Then I:
>> >> modprobe kprobe_example func=do_fork offset=4
>> >> ls
>> >> Things works well.
>> >>
>> >> But when CONFIG_FUNCTION_TRACER is enabled:
>> >> (gdb) disassemble do_fork
>> >> Dump of assembler code for function do_fork:
>> >>    0xc0040334 <+0>:        mflr    r0
>> >>    0xc0040338 <+4>:        stw     r0,4(r1)
>> >>    0xc004033c <+8>:        bl      0xc00109d4 <mcount>
>> >>    0xc0040340 <+12>:       stwu    r1,-80(r1)
>> >>    0xc0040344 <+16>:       mflr    r0
>> >>    0xc0040348 <+20>:       stw     r0,84(r1)
>> >>    0xc004034c <+24>:       mfcr    r12
>> >> Then I:
>> >> modprobe kprobe_example func=do_fork offset=12
>> >> ls
>> >> 'ls' will never retrun. system freeze.
>> >
>> > I'm not sure if x86 had a similar issue.
>> >
>> > Masami, have any ideas to why this happened?
>>
>> No, I don't familiar with ppc implementation. I guess
>> that single-step resume code failed to emulate the
>> instruction, but it strongly depends on ppc arch.
>> Maybe IBM people may know what happened.
>>
>> Ananth, Jim, would you have any ideas?
>
> On powerpc, we emulate sstep whenever possible. Only recently support to
> emulate loads and stores got added. I don't have access to a powerpc box
> today... but will try to recreate the problem ASAP and see what could be
> happening in the presence of mcount.

After taking more testing on it, it looks like the issue doesn't
depend on mcount
(AKA. CONFIG_FUNCTION_TRACER)

As I said in the first email, with eldk-5.0 CONFIG_FUNCTION_TRACER=n
will work well.

But when I'm using eldk-4.2[1], both will fail. But the funny thing is when I
set kprobe at several functions some works fine but some will fail. For example,
at this time do_fork() works well, but show_interrupt() will crash.

root@unknown:/root> insmod kprobe_example.ko func=show_interrupts
Planted kprobe at c009be18
root@unknown:/root> cat /proc/interrupts
pre_handler: p->addr = 0xc009be18, nip = 0xc009be18, msr = 0x29000
post_handler: p->addr = 0xc009be18, msr = 0x29000,boostable = 1
Oops: Exception in kernel mode, sig: 11 [#1]
PREEMPT MPC8536 DS
Modules linked in: kprobe_example
NIP: df159e74 LR: c0106f40 CTR: c009be18
REGS: df159d90 TRAP: 0700   Not tainted  (3.0.0-rc4-00001-ge8ffcca-dirty)
MSR: 00029000 <EE,ME,CE>  CR: 20202688  XER: 00000000
TASK = dfaa5340[613] 'cat' THREAD: df158000
GPR00: fffff000 df159e40 dfaa5340 df024a00 df159e78 00000000 df159f20 00000001
GPR08: c10060d0 c009be18 00029000 df159e70 00000000 1001ca74 1ffb5f00 100a01cc
GPR16: 00000000 00000000 00000000 00000000 df024a28 df159f20 00000000 dfbff080
GPR24: 10016000 00001000 df159f20 df159e78 dfbff080 df159e78 df024a00 df159e70
NIP [df159e74] 0xdf159e74
LR [c0106f40] seq_read+0x2a4/0x568
Call Trace:
[df159e40] [00029000] 0x29000 (unreliable)
[df159e74] [00000000]   (null)
Instruction dump:
XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
---[ end trace 60026bfc1fe79aed ]---
Segmentation fault

Thanks,
Yong

[1]: http://ftp.denx.de/pub/eldk/4.2/
Ananth N Mavinakayanahalli June 29, 2011, 6:46 a.m. UTC | #7
On Wed, Jun 29, 2011 at 02:23:28PM +0800, Yong Zhang wrote:
> On Mon, Jun 27, 2011 at 6:01 PM, Ananth N Mavinakayanahalli
> <ananth@in.ibm.com> wrote:
> > On Sun, Jun 26, 2011 at 11:47:13PM +0900, Masami Hiramatsu wrote:
> >> (2011/06/24 19:29), Steven Rostedt wrote:
> >> > On Fri, 2011-06-24 at 17:21 +0800, Yong Zhang wrote:
> >> >> Hi,
> >> >>
> >> >> When I use kprobe to do something, I found some wired thing.
> >> >>
> >> >> When CONFIG_FUNCTION_TRACER is disabled:
> >> >> (gdb) disassemble do_fork
> >> >> Dump of assembler code for function do_fork:
> >> >>    0xc0037390 <+0>:        mflr    r0
> >> >>    0xc0037394 <+4>:        stwu    r1,-64(r1)
> >> >>    0xc0037398 <+8>:        mfcr    r12
> >> >>    0xc003739c <+12>:       stmw    r27,44(r1)
> >> >>
> >> >> Then I:
> >> >> modprobe kprobe_example func=do_fork offset=4
> >> >> ls
> >> >> Things works well.
> >> >>
> >> >> But when CONFIG_FUNCTION_TRACER is enabled:
> >> >> (gdb) disassemble do_fork
> >> >> Dump of assembler code for function do_fork:
> >> >>    0xc0040334 <+0>:        mflr    r0
> >> >>    0xc0040338 <+4>:        stw     r0,4(r1)
> >> >>    0xc004033c <+8>:        bl      0xc00109d4 <mcount>
> >> >>    0xc0040340 <+12>:       stwu    r1,-80(r1)
> >> >>    0xc0040344 <+16>:       mflr    r0
> >> >>    0xc0040348 <+20>:       stw     r0,84(r1)
> >> >>    0xc004034c <+24>:       mfcr    r12
> >> >> Then I:
> >> >> modprobe kprobe_example func=do_fork offset=12
> >> >> ls
> >> >> 'ls' will never retrun. system freeze.
> >> >
> >> > I'm not sure if x86 had a similar issue.
> >> >
> >> > Masami, have any ideas to why this happened?
> >>
> >> No, I don't familiar with ppc implementation. I guess
> >> that single-step resume code failed to emulate the
> >> instruction, but it strongly depends on ppc arch.
> >> Maybe IBM people may know what happened.
> >>
> >> Ananth, Jim, would you have any ideas?
> >
> > On powerpc, we emulate sstep whenever possible. Only recently support to
> > emulate loads and stores got added. I don't have access to a powerpc box
> > today... but will try to recreate the problem ASAP and see what could be
> > happening in the presence of mcount.
> 
> After taking more testing on it, it looks like the issue doesn't
> depend on mcount
> (AKA. CONFIG_FUNCTION_TRACER)
> 
> As I said in the first email, with eldk-5.0 CONFIG_FUNCTION_TRACER=n
> will work well.
> 
> But when I'm using eldk-4.2[1], both will fail. But the funny thing is when I
> set kprobe at several functions some works fine but some will fail. For example,
> at this time do_fork() works well, but show_interrupt() will crash.

Certain functions are off limits for probing -- look for __kprobe
annotations in the kernel. Some such functions are arch specific, but
show_interrupts() would definitely not be one of them. It works fine on
my (64bit) test box.

At this time, I think your best bet is to work with the eldk folks to
narrow down the problem. Given the current set of data, I am inclined to
think it could be an eldk bug, not a kernel one.

Ananth
Yong Zhang June 30, 2011, 7:08 a.m. UTC | #8
On Wed, Jun 29, 2011 at 2:46 PM, Ananth N Mavinakayanahalli
<ananth@in.ibm.com> wrote:
>
> Certain functions are off limits for probing -- look for __kprobe

Yup.

> annotations in the kernel. Some such functions are arch specific, but
> show_interrupts() would definitely not be one of them. It works fine on
> my (64bit) test box.
>
> At this time, I think your best bet is to work with the eldk folks to
> narrow down the problem.

I'll give a try :)

> Given the current set of data, I am inclined to
> think it could be an eldk bug, not a kernel one.

Maybe, but the fact is if I don't use kprobe, things works
very well.

I'll be back if there is any update :)

Thanks,
Yong
Tiejun Chen July 1, 2011, 10:03 a.m. UTC | #9
Yong Zhang wrote:
> On Mon, Jun 27, 2011 at 6:01 PM, Ananth N Mavinakayanahalli
> <ananth@in.ibm.com> wrote:
>> On Sun, Jun 26, 2011 at 11:47:13PM +0900, Masami Hiramatsu wrote:
>>> (2011/06/24 19:29), Steven Rostedt wrote:
>>>> On Fri, 2011-06-24 at 17:21 +0800, Yong Zhang wrote:
>>>>> Hi,
>>>>>
>>>>> When I use kprobe to do something, I found some wired thing.
>>>>>
>>>>> When CONFIG_FUNCTION_TRACER is disabled:
>>>>> (gdb) disassemble do_fork
>>>>> Dump of assembler code for function do_fork:
>>>>>    0xc0037390 <+0>:        mflr    r0
>>>>>    0xc0037394 <+4>:        stwu    r1,-64(r1)
>>>>>    0xc0037398 <+8>:        mfcr    r12
>>>>>    0xc003739c <+12>:       stmw    r27,44(r1)
>>>>>
>>>>> Then I:
>>>>> modprobe kprobe_example func=do_fork offset=4
>>>>> ls
>>>>> Things works well.
>>>>>
>>>>> But when CONFIG_FUNCTION_TRACER is enabled:
>>>>> (gdb) disassemble do_fork
>>>>> Dump of assembler code for function do_fork:
>>>>>    0xc0040334 <+0>:        mflr    r0
>>>>>    0xc0040338 <+4>:        stw     r0,4(r1)
>>>>>    0xc004033c <+8>:        bl      0xc00109d4 <mcount>
>>>>>    0xc0040340 <+12>:       stwu    r1,-80(r1)
>>>>>    0xc0040344 <+16>:       mflr    r0
>>>>>    0xc0040348 <+20>:       stw     r0,84(r1)
>>>>>    0xc004034c <+24>:       mfcr    r12
>>>>> Then I:
>>>>> modprobe kprobe_example func=do_fork offset=12
>>>>> ls
>>>>> 'ls' will never retrun. system freeze.
>>>> I'm not sure if x86 had a similar issue.
>>>>
>>>> Masami, have any ideas to why this happened?
>>> No, I don't familiar with ppc implementation. I guess
>>> that single-step resume code failed to emulate the
>>> instruction, but it strongly depends on ppc arch.
>>> Maybe IBM people may know what happened.
>>>
>>> Ananth, Jim, would you have any ideas?
>> On powerpc, we emulate sstep whenever possible. Only recently support to
>> emulate loads and stores got added. I don't have access to a powerpc box
>> today... but will try to recreate the problem ASAP and see what could be
>> happening in the presence of mcount.
> 
> After taking more testing on it, it looks like the issue doesn't
> depend on mcount
> (AKA. CONFIG_FUNCTION_TRACER)
> 
> As I said in the first email, with eldk-5.0 CONFIG_FUNCTION_TRACER=n
> will work well.
> 
> But when I'm using eldk-4.2[1], both will fail. But the funny thing is when I
> set kprobe at several functions some works fine but some will fail. For example,
> at this time do_fork() works well, but show_interrupt() will crash.
> 
> root@unknown:/root> insmod kprobe_example.ko func=show_interrupts
> Planted kprobe at c009be18
> root@unknown:/root> cat /proc/interrupts
> pre_handler: p->addr = 0xc009be18, nip = 0xc009be18, msr = 0x29000
> post_handler: p->addr = 0xc009be18, msr = 0x29000,boostable = 1
> Oops: Exception in kernel mode, sig: 11 [#1]
> PREEMPT MPC8536 DS
> Modules linked in: kprobe_example
> NIP: df159e74 LR: c0106f40 CTR: c009be18
> REGS: df159d90 TRAP: 0700   Not tainted  (3.0.0-rc4-00001-ge8ffcca-dirty)
> MSR: 00029000 <EE,ME,CE>  CR: 20202688  XER: 00000000
> TASK = dfaa5340[613] 'cat' THREAD: df158000
> GPR00: fffff000 df159e40 dfaa5340 df024a00 df159e78 00000000 df159f20 00000001
> GPR08: c10060d0 c009be18 00029000 df159e70 00000000 1001ca74 1ffb5f00 100a01cc
> GPR16: 00000000 00000000 00000000 00000000 df024a28 df159f20 00000000 dfbff080
> GPR24: 10016000 00001000 df159f20 df159e78 dfbff080 df159e78 df024a00 df159e70
> NIP [df159e74] 0xdf159e74
> LR [c0106f40] seq_read+0x2a4/0x568
> Call Trace:
> [df159e40] [00029000] 0x29000 (unreliable)
> [df159e74] [00000000]   (null)
> Instruction dump:
> XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
> XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
> ---[ end trace 60026bfc1fe79aed ]---
> Segmentation fault

Maybe I can understand this problem.

When we kprobe these operations such as store-and-update-word for SP(r1),

stwu r1, -A(r1)

The program exception is triggered then PPC always allocate an exception frame
as shown as the follows:

old r1 --------
	 ...
         nip
         gpr[2]~gpr[31]
         gpr[1] <--------- old r1 is stored here.
	 gpr[0]
       -------- <-- pr_regs @offset 16 bytes
       padding
       STACK_FRAME_REGS_MARKER
       LR
       back chain
new r1 --------

Here emulate_step() is called to emulate 'stwu'. Actually this is equivalent to
1> update pr_regs->gpr[1] = mem(old r1 + (-A))
2> 'stw <old r1>, mem<(old r1 + (-A)) >

You should notice the stack based on new r1 would be covered with mem<old r1
+(-A)>. So after this, the kernel exit from post_krpobe, something would be
broken. This should depend on sizeof(-A).

For kprobe show_interrupts, you can see pregs->nip is re-written violently so
kernel issued.

But sometimes we may only re-write some violate registers the kernel still
alive. And so this is just why the kernel works well for some kprobed point
after you change some kernel options/toolchains.

If I'm correct its difficult to kprobe these stwu sp operation since the
sizeof(-A) is undermined for the kernel. So we have to implement in-depend
interrupt stack like PPC64.

Tiejun

> 
> Thanks,
> Yong
> 
> [1]: http://ftp.denx.de/pub/eldk/4.2/
>
Yong Zhang July 4, 2011, 2:23 a.m. UTC | #10
On Fri, Jul 1, 2011 at 6:03 PM, tiejun.chen <tiejun.chen@windriver.com> wrote:
>> root@unknown:/root> insmod kprobe_example.ko func=show_interrupts
>> Planted kprobe at c009be18
>> root@unknown:/root> cat /proc/interrupts
>> pre_handler: p->addr = 0xc009be18, nip = 0xc009be18, msr = 0x29000
>> post_handler: p->addr = 0xc009be18, msr = 0x29000,boostable = 1
>> Oops: Exception in kernel mode, sig: 11 [#1]
>> PREEMPT MPC8536 DS
>> Modules linked in: kprobe_example
>> NIP: df159e74 LR: c0106f40 CTR: c009be18
>> REGS: df159d90 TRAP: 0700   Not tainted  (3.0.0-rc4-00001-ge8ffcca-dirty)
>> MSR: 00029000 <EE,ME,CE>  CR: 20202688  XER: 00000000
>> TASK = dfaa5340[613] 'cat' THREAD: df158000
>> GPR00: fffff000 df159e40 dfaa5340 df024a00 df159e78 00000000 df159f20 00000001
>> GPR08: c10060d0 c009be18 00029000 df159e70 00000000 1001ca74 1ffb5f00 100a01cc
>> GPR16: 00000000 00000000 00000000 00000000 df024a28 df159f20 00000000 dfbff080
>> GPR24: 10016000 00001000 df159f20 df159e78 dfbff080 df159e78 df024a00 df159e70
>> NIP [df159e74] 0xdf159e74
>> LR [c0106f40] seq_read+0x2a4/0x568
>> Call Trace:
>> [df159e40] [00029000] 0x29000 (unreliable)
>> [df159e74] [00000000]   (null)
>> Instruction dump:
>> XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
>> XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
>> ---[ end trace 60026bfc1fe79aed ]---
>> Segmentation fault
>
> Maybe I can understand this problem.
>
> When we kprobe these operations such as store-and-update-word for SP(r1),
>
> stwu r1, -A(r1)
>
> The program exception is triggered then PPC always allocate an exception frame
> as shown as the follows:
>
> old r1 --------
>         ...
>         nip
>         gpr[2]~gpr[31]
>         gpr[1] <--------- old r1 is stored here.
>         gpr[0]
>       -------- <-- pr_regs @offset 16 bytes
>       padding
>       STACK_FRAME_REGS_MARKER
>       LR
>       back chain
> new r1 --------
>
> Here emulate_step() is called to emulate 'stwu'. Actually this is equivalent to
> 1> update pr_regs->gpr[1] = mem(old r1 + (-A))
> 2> 'stw <old r1>, mem<(old r1 + (-A)) >
>
> You should notice the stack based on new r1 would be covered with mem<old r1
> +(-A)>. So after this, the kernel exit from post_krpobe, something would be
> broken. This should depend on sizeof(-A).
>
> For kprobe show_interrupts, you can see pregs->nip is re-written violently so
> kernel issued.

Yeah, my debug also show this, so this is the root cause.
Thanks for your explanation.

>
> But sometimes we may only re-write some violate registers the kernel still
> alive. And so this is just why the kernel works well for some kprobed point
> after you change some kernel options/toolchains.
>
> If I'm correct its difficult to kprobe these stwu sp operation since the
> sizeof(-A) is undermined for the kernel. So we have to implement in-depend
> interrupt stack like PPC64.

Hmmm, a dedicated exception stack will smooth the concern IMHO,
Ben, Kuma?

Thanks,
Yong
Benjamin Herrenschmidt Nov. 30, 2011, 4:19 a.m. UTC | #11
On Fri, 2011-07-01 at 18:03 +0800, tiejun.chen wrote:
> 
> Here emulate_step() is called to emulate 'stwu'. Actually this is equivalent to
> 1> update pr_regs->gpr[1] = mem(old r1 + (-A))
> 2> 'stw <old r1>, mem<(old r1 + (-A)) >
> 
> You should notice the stack based on new r1 would be covered with mem<old r1
> +(-A)>. So after this, the kernel exit from post_krpobe, something would be
> broken. This should depend on sizeof(-A).
> 
> For kprobe show_interrupts, you can see pregs->nip is re-written violently so
> kernel issued.
> 
> But sometimes we may only re-write some violate registers the kernel still
> alive. And so this is just why the kernel works well for some kprobed point
> after you change some kernel options/toolchains.
> 
> If I'm correct its difficult to kprobe these stwu sp operation since the
> sizeof(-A) is undermined for the kernel. So we have to implement in-depend
> interrupt stack like PPC64.

So I've spent a lot of time trying to find a better way to fix that bug
and I think I might have finally found one :-)

 - When you try to emulate stwcx on the kernel stack (and only there),
don't actually perform the store. Set a TIF flag instead to indicate
special processing in the exception return path and store the address to
update somewhere either in a free slot of the stack frame itself of
somewhere in the thread struct (the former would be easier). You may as
well do some sanity checking on the value while at it to catch errors
early.

 - In the exception return code, we already test for various TIF flags
(*** see comment below, it's even buggy today for preempt ***), so we
add a test for that flag and go to do_work.

 - At the end of do_work, we check for this TIF flag. If it's not set or
any other flag is set, move on as usual. However, if it's the only flag
still set:

 - Copy the exception frame we're about to unwind to just -below- the
new r1 value we want to write to. Then perform the write, and change
r1 to point to that copy of the frame.

 - Branch to restore: which will unwind everything from that copy of
the frame, and eventually set r1 to GPR(1) in the copy which contains
the new value of r1.

This is the less intrusive approach and should work just fine, it's also
more robust than anything I've been able to think of and the approach
would work for 32 and 64-bit similarily.

(***) Above comment about a bug: If you look at entry_64.S version of
ret_from_except_lite you'll notice that in the !preempt case, after
we've checked MSR_PR we test for any TIF flag in _TIF_USER_WORK_MASK to
decide whether to go to do_work or not. However, in the preempt case, we
do a convoluted trick to test SIGPENDING only if PR was set and always
test NEED_RESCHED ... but we forget to test any other bit of
_TIF_USER_WORK_MASK !!! So that means that with preempt, we completely
fail to test for things like single step, syscall tracing, etc...

I think this should be fixed at the same time, by simplifying the code
by doing:

 - Test PR. If set, go to test_work_user, else continue (or the other
way around and call it test_work_kernel)

 - In test_work_user, always test for _TIF_USER_WORK_MASK to decide to
go to do_work, maybe call it do_user_work

 - In test_work_kernel, test for _TIF_KERNEL_WORK_MASK which is set to
our new flag along with NEED_RESCHED if preempt is enabled and branch to
do_kernel_work.

do_user_work is basically the same as today's user_work

do_kernel_work is basically the same as today preempt block with added
code to handle the new flag as described above.

Is anybody volunteering for fixing that ? I don't have the bandwidth
right now, but if nobody shows up I suppose I'll have to eventually deal
with it myself :-)

Cheers,
Ben.
Tiejun Chen Nov. 30, 2011, 11:06 a.m. UTC | #12
Benjamin Herrenschmidt wrote:
> On Fri, 2011-07-01 at 18:03 +0800, tiejun.chen wrote:
>> Here emulate_step() is called to emulate 'stwu'. Actually this is equivalent to
>> 1> update pr_regs->gpr[1] = mem(old r1 + (-A))
>> 2> 'stw <old r1>, mem<(old r1 + (-A)) >
>>
>> You should notice the stack based on new r1 would be covered with mem<old r1
>> +(-A)>. So after this, the kernel exit from post_krpobe, something would be
>> broken. This should depend on sizeof(-A).
>>
>> For kprobe show_interrupts, you can see pregs->nip is re-written violently so
>> kernel issued.
>>
>> But sometimes we may only re-write some violate registers the kernel still
>> alive. And so this is just why the kernel works well for some kprobed point
>> after you change some kernel options/toolchains.
>>
>> If I'm correct its difficult to kprobe these stwu sp operation since the
>> sizeof(-A) is undermined for the kernel. So we have to implement in-depend
>> interrupt stack like PPC64.
> 
> So I've spent a lot of time trying to find a better way to fix that bug
> and I think I might have finally found one :-)

I can understand what you mean in below since I remember you already	 clarified
this way previously.

> 
>  - When you try to emulate stwcx on the kernel stack (and only there),

I think it should be stwu/stdu.

> don't actually perform the store. Set a TIF flag instead to indicate
> special processing in the exception return path and store the address to
> update somewhere either in a free slot of the stack frame itself of
> somewhere in the thread struct (the former would be easier). You may as
> well do some sanity checking on the value while at it to catch errors
> early.
> 
>  - In the exception return code, we already test for various TIF flags
> (*** see comment below, it's even buggy today for preempt ***), so we
> add a test for that flag and go to do_work.
> 
>  - At the end of do_work, we check for this TIF flag. If it's not set or
> any other flag is set, move on as usual. However, if it's the only flag
> still set:
> 
>  - Copy the exception frame we're about to unwind to just -below- the
> new r1 value we want to write to. Then perform the write, and change
> r1 to point to that copy of the frame.
> 
>  - Branch to restore: which will unwind everything from that copy of
> the frame, and eventually set r1 to GPR(1) in the copy which contains
> the new value of r1.

We still can't restore this there. As you know this emulated store instruction
can touch any filed inside pt_regs. Sometimes nip would be involved for this
problematic location. And unfortunately, this is just that we meet currently. So
we have to go exc_exit_restart.

        .globl exc_exit_restart
exc_exit_restart:
        lwz     r11,_NIP(r1)
        lwz     r12,_MSR(r1)

I mean we have to do that real restore here. So I'm really not sure if its a
good way to add such a codes including check TIF/copy-get new r1/restore
operation here since this is so deep for the exception return code.

exc_exit_start:
        mtspr   SPRN_SRR0,r11
        mtspr   SPRN_SRR1,r12

> 
> This is the less intrusive approach and should work just fine, it's also
> more robust than anything I've been able to think of and the approach
> would work for 32 and 64-bit similarily.
> 
> (***) Above comment about a bug: If you look at entry_64.S version of
> ret_from_except_lite you'll notice that in the !preempt case, after
> we've checked MSR_PR we test for any TIF flag in _TIF_USER_WORK_MASK to
> decide whether to go to do_work or not. However, in the preempt case, we
> do a convoluted trick to test SIGPENDING only if PR was set and always
> test NEED_RESCHED ... but we forget to test any other bit of
> _TIF_USER_WORK_MASK !!! So that means that with preempt, we completely
> fail to test for things like single step, syscall tracing, etc...
> 

This is another problem we should address.

> I think this should be fixed at the same time, by simplifying the code
> by doing:
> 
>  - Test PR. If set, go to test_work_user, else continue (or the other
> way around and call it test_work_kernel)
> 
>  - In test_work_user, always test for _TIF_USER_WORK_MASK to decide to
> go to do_work, maybe call it do_user_work
> 
>  - In test_work_kernel, test for _TIF_KERNEL_WORK_MASK which is set to
> our new flag along with NEED_RESCHED if preempt is enabled and branch to
> do_kernel_work.
> 
> do_user_work is basically the same as today's user_work
> 
> do_kernel_work is basically the same as today preempt block with added
> code to handle the new flag as described above.
> 
> Is anybody volunteering for fixing that ? I don't have the bandwidth

I always use one specific kprobe stack to fix this for BOOKE and work well in my
local tree :) Do you remember my v3 patch? I think its possible to extend this
for all PPC variants.

Anyway, I'd like to be this volunteer with our last solution.

Tiejun

> right now, but if nobody shows up I suppose I'll have to eventually deal
> with it myself :-)
> 
> Cheers,
> Ben.
Benjamin Herrenschmidt Nov. 30, 2011, 9 p.m. UTC | #13
On Wed, 2011-11-30 at 19:06 +0800, tiejun.chen wrote:

> >  - Copy the exception frame we're about to unwind to just -below- the
> > new r1 value we want to write to. Then perform the write, and change
> > r1 to point to that copy of the frame.
> > 
> >  - Branch to restore: which will unwind everything from that copy of
> > the frame, and eventually set r1 to GPR(1) in the copy which contains
> > the new value of r1.
> 
> We still can't restore this there.

Yes, we can since we have copied the pt_regs down to -below- where we
are going to write to. That's the whole trick. We copy the pt_regs
somewhere "safe" and restore from there.

> I mean we have to do that real restore here. So I'm really not sure if its a
> good way to add such a codes including check TIF/copy-get new r1/restore
> operation here since this is so deep for the exception return code.

No. Re-read my explanation.

In the do_work case (so when things are still easy), we copy the pt_regs
of the return frame to a safe place (right below where we want to write
to typically), and change r1 to point to this "new" frame which we use
to restore from. Then we do the store which is now safe and go to an
unmodified "restore" exit path.

> > This is the less intrusive approach and should work just fine, it's also
> > more robust than anything I've been able to think of and the approach
> > would work for 32 and 64-bit similarily.
> > 
> > (***) Above comment about a bug: If you look at entry_64.S version of
> > ret_from_except_lite you'll notice that in the !preempt case, after
> > we've checked MSR_PR we test for any TIF flag in _TIF_USER_WORK_MASK to
> > decide whether to go to do_work or not. However, in the preempt case, we
> > do a convoluted trick to test SIGPENDING only if PR was set and always
> > test NEED_RESCHED ... but we forget to test any other bit of
> > _TIF_USER_WORK_MASK !!! So that means that with preempt, we completely
> > fail to test for things like single step, syscall tracing, etc...
> > 
> 
> This is another problem we should address.
> 
> > I think this should be fixed at the same time, by simplifying the code
> > by doing:
> > 
> >  - Test PR. If set, go to test_work_user, else continue (or the other
> > way around and call it test_work_kernel)
> > 
> >  - In test_work_user, always test for _TIF_USER_WORK_MASK to decide to
> > go to do_work, maybe call it do_user_work
> > 
> >  - In test_work_kernel, test for _TIF_KERNEL_WORK_MASK which is set to
> > our new flag along with NEED_RESCHED if preempt is enabled and branch to
> > do_kernel_work.
> > 
> > do_user_work is basically the same as today's user_work
> > 
> > do_kernel_work is basically the same as today preempt block with added
> > code to handle the new flag as described above.
> > 
> > Is anybody volunteering for fixing that ? I don't have the bandwidth
> 
> I always use one specific kprobe stack to fix this for BOOKE and work well in my
> local tree :) Do you remember my v3 patch? I think its possible to extend this
> for all PPC variants.
>
> Anyway, I'd like to be this volunteer with our last solution.

So the second problem I exposed, for which you just volunteered
(hint :-) is an orthogonal issue not related to kprobe or stacks which
happen to be something I discovered yesterday while looking at the code.

As for the solution to the emulation problem, re-read my explanation.
The whole trick is that we can avoid a separate stack (which I really
want to avoid) and we can avoid touching the low level restore code
path.

Cheers,
Ben.
Tiejun Chen Dec. 1, 2011, 10:44 a.m. UTC | #14
Benjamin Herrenschmidt wrote:
> On Wed, 2011-11-30 at 19:06 +0800, tiejun.chen wrote:
> 
>>>  - Copy the exception frame we're about to unwind to just -below- the
>>> new r1 value we want to write to. Then perform the write, and change
>>> r1 to point to that copy of the frame.
>>>
>>>  - Branch to restore: which will unwind everything from that copy of
>>> the frame, and eventually set r1 to GPR(1) in the copy which contains
>>> the new value of r1.
>> We still can't restore this there.
> 
> Yes, we can since we have copied the pt_regs down to -below- where we
> are going to write to. That's the whole trick. We copy the pt_regs
> somewhere "safe" and restore from there.
> 
>> I mean we have to do that real restore here. So I'm really not sure if its a
>> good way to add such a codes including check TIF/copy-get new r1/restore
>> operation here since this is so deep for the exception return code.
> 
> No. Re-read my explanation.
> 
> In the do_work case (so when things are still easy), we copy the pt_regs
> of the return frame to a safe place (right below where we want to write
> to typically), and change r1 to point to this "new" frame which we use
> to restore from. Then we do the store which is now safe and go to an
> unmodified "restore" exit path.

Do you mean we should push the original pt_regs (or that whole exception stack)
downwards the location the new r1 point? Then its safe to perform this real
emulated stw instruction. At last we will reroute r1 to that copied exception
frame to restore as normal. Right?

Here I suppose so, I implement this for PPC32 based on the above understanding.
I take a validation for kprobing do_fork()/show_interrupts(), now looks fine.
Tomorrow I will go PPC64, and hope its fine as well.

If everything is good I'll send these patches to linuxppc-dev next week.

Cheers
Tiejun

> 
>>> This is the less intrusive approach and should work just fine, it's also
>>> more robust than anything I've been able to think of and the approach
>>> would work for 32 and 64-bit similarily.
>>>
>>> (***) Above comment about a bug: If you look at entry_64.S version of
>>> ret_from_except_lite you'll notice that in the !preempt case, after
>>> we've checked MSR_PR we test for any TIF flag in _TIF_USER_WORK_MASK to
>>> decide whether to go to do_work or not. However, in the preempt case, we
>>> do a convoluted trick to test SIGPENDING only if PR was set and always
>>> test NEED_RESCHED ... but we forget to test any other bit of
>>> _TIF_USER_WORK_MASK !!! So that means that with preempt, we completely
>>> fail to test for things like single step, syscall tracing, etc...
>>>
>> This is another problem we should address.
>>
>>> I think this should be fixed at the same time, by simplifying the code
>>> by doing:
>>>
>>>  - Test PR. If set, go to test_work_user, else continue (or the other
>>> way around and call it test_work_kernel)
>>>
>>>  - In test_work_user, always test for _TIF_USER_WORK_MASK to decide to
>>> go to do_work, maybe call it do_user_work
>>>
>>>  - In test_work_kernel, test for _TIF_KERNEL_WORK_MASK which is set to
>>> our new flag along with NEED_RESCHED if preempt is enabled and branch to
>>> do_kernel_work.
>>>
>>> do_user_work is basically the same as today's user_work
>>>
>>> do_kernel_work is basically the same as today preempt block with added
>>> code to handle the new flag as described above.
>>>
>>> Is anybody volunteering for fixing that ? I don't have the bandwidth
>> I always use one specific kprobe stack to fix this for BOOKE and work well in my
>> local tree :) Do you remember my v3 patch? I think its possible to extend this
>> for all PPC variants.
>>
>> Anyway, I'd like to be this volunteer with our last solution.
> 
> So the second problem I exposed, for which you just volunteered
> (hint :-) is an orthogonal issue not related to kprobe or stacks which
> happen to be something I discovered yesterday while looking at the code.
> 
> As for the solution to the emulation problem, re-read my explanation.
> The whole trick is that we can avoid a separate stack (which I really
> want to avoid) and we can avoid touching the low level restore code
> path.
> 
> Cheers,
> Ben.
Benjamin Herrenschmidt Dec. 1, 2011, 9:37 p.m. UTC | #15
On Thu, 2011-12-01 at 18:44 +0800, tiejun.chen wrote:

> Do you mean we should push the original pt_regs (or that whole exception stack)
> downwards the location the new r1 point? Then its safe to perform this real
> emulated stw instruction. At last we will reroute r1 to that copied exception
> frame to restore as normal. Right?

Right. That way we don't have to modify the (sensitive) restore path, we
only hook around the do_work branch which is a lot easier and less
risky.

> Here I suppose so, I implement this for PPC32 based on the above understanding.
> I take a validation for kprobing do_fork()/show_interrupts(), now looks fine.
> Tomorrow I will go PPC64, and hope its fine as well.
> 
> If everything is good I'll send these patches to linuxppc-dev next week.

Excellent, thanks !

Cheers,
Ben.
diff mbox

Patch

diff --git a/samples/kprobes/kprobe_example.c b/samples/kprobes/kprobe_example.c
index ebf5e0c..f0ce4e6 100644
--- a/samples/kprobes/kprobe_example.c
+++ b/samples/kprobes/kprobe_example.c
@@ -13,11 +13,17 @@ 
 #include <linux/kernel.h>
 #include <linux/module.h>
 #include <linux/kprobes.h>
+#include <linux/limits.h>
+
+static char func_name[NAME_MAX] = "do_fork";
+module_param_string(func, func_name, NAME_MAX, S_IRUGO);
+MODULE_PARM_DESC(func, "Function to kprobe");
+
+static unsigned int offset;
+module_param(offset, uint, S_IRUGO);
 
 /* For each probe you need to allocate a kprobe structure */
-static struct kprobe kp = {
-	.symbol_name	= "do_fork",
-};
+static struct kprobe kp;
 
 /* kprobe pre_handler: called just before the probed instruction is executed */
 static int handler_pre(struct kprobe *p, struct pt_regs *regs)
@@ -76,6 +82,8 @@  static int handler_fault(struct kprobe *p, struct pt_regs *regs, int trapnr)
 static int __init kprobe_init(void)
 {
 	int ret;
+	kp.symbol_name = func_name;
+	kp.offset = offset;
 	kp.pre_handler = handler_pre;
 	kp.post_handler = handler_post;
 	kp.fault_handler = handler_fault;