Patchwork [google,gcc-4_7,integration] Build more of libstdc++ with frame pointers

login
register
mail settings
Submitter Paul Pluzhnikov
Date Feb. 27, 2013, 8:13 p.m.
Message ID <ye6qhakx7bxl.fsf@elbrus2.mtv.corp.google.com>
Download mbox | patch
Permalink /patch/223691/
State New
Headers show

Comments

Paul Pluzhnikov - Feb. 27, 2013, 8:13 p.m.
Greetings,

Google ref b/8187733

Build libstdc++-v3/src/c++11/debug.cc with -fno-omit-frame-pointer, so
frame-based unwinder can step through it.

Tested: bootstrap build and verified debug.cc is built with
-fno-omit-frame-pointer.

Ok for google/gcc-4_7 and google/integration?

Thanks,

--
Paul Pluzhnikov
Diego Novillo - Feb. 27, 2013, 8:14 p.m.
On Wed, Feb 27, 2013 at 3:13 PM, Paul Pluzhnikov <ppluzhnikov@google.com> wrote:

> Ok for google/gcc-4_7 and google/integration?

OK.


Diego.
Andrew Pinski - Feb. 27, 2013, 8:17 p.m.
On Wed, Feb 27, 2013 at 12:13 PM, Paul Pluzhnikov
<ppluzhnikov@google.com> wrote:
> Greetings,
>
> Google ref b/8187733
>
> Build libstdc++-v3/src/c++11/debug.cc with -fno-omit-frame-pointer, so
> frame-based unwinder can step through it.

Just an aside which program does not understand dwarf2 unwinding?
Perf is setup to understand it.  Valgrind is setup to understand it.
ASAN should be setup to understand it.

Thanks,
Andrew Pinski


>
> Tested: bootstrap build and verified debug.cc is built with
> -fno-omit-frame-pointer.
>
> Ok for google/gcc-4_7 and google/integration?
>
> Thanks,
>
> --
> Paul Pluzhnikov
>
>
> Index: libstdc++-v3/src/c++11/Makefile.am
> ===================================================================
> --- libstdc++-v3/src/c++11/Makefile.am  (revision 196316)
> +++ libstdc++-v3/src/c++11/Makefile.am  (working copy)
> @@ -124,3 +124,4 @@
>
>  # Google-specific pessimization
>  functexcept.lo_no_omit_frame_pointer = -fno-omit-frame-pointer
> +debug.lo_no_omit_frame_pointer = -fno-omit-frame-pointer
> Index: libstdc++-v3/src/c++11/Makefile.in
> ===================================================================
> --- libstdc++-v3/src/c++11/Makefile.in  (revision 196316)
> +++ libstdc++-v3/src/c++11/Makefile.in  (working copy)
> @@ -391,6 +391,7 @@
>
>  # Google-specific pessimization
>  functexcept.lo_no_omit_frame_pointer = -fno-omit-frame-pointer
> +debug.lo_no_omit_frame_pointer = -fno-omit-frame-pointer
>  all: all-am
>
>  .SUFFIXES:
>
>
Paul Pluzhnikov - Feb. 27, 2013, 8:50 p.m.
On Wed, Feb 27, 2013 at 12:17 PM, Andrew Pinski <pinskia@gmail.com> wrote:

> Just an aside which program does not understand dwarf2 unwinding?

All Google executables currently link in a frame-based unwinder.

This allows us to report crashes and record runtime statistics from the
executable itself, in ways that are convenient when dealing with thousands
of executables spread across millions of machines, and which are much
harder to achieve using external tools.

There is of course libunwind, but it turns out that it's very hard to
beat speed and simplicity of a frame-based unwinder (which matters when
you collect stack traces across live production services).

I hope this answers your question.
Jeff Law - Feb. 27, 2013, 8:53 p.m.
On 02/27/2013 01:50 PM, Paul Pluzhnikov wrote:
> On Wed, Feb 27, 2013 at 12:17 PM, Andrew Pinski <pinskia@gmail.com> wrote:
>
>> Just an aside which program does not understand dwarf2 unwinding?
>
> All Google executables currently link in a frame-based unwinder.
>
> This allows us to report crashes and record runtime statistics from the
> executable itself, in ways that are convenient when dealing with thousands
> of executables spread across millions of machines, and which are much
> harder to achieve using external tools.
>
> There is of course libunwind, but it turns out that it's very hard to
> beat speed and simplicity of a frame-based unwinder (which matters when
> you collect stack traces across live production services).
>
> I hope this answers your question.
Or imagine something like trying to get traces out of low level 
profilers and such.  There are situations where dwarf2 unwinders are 
just too damn bloated.

jeff
Andrew Pinski - Feb. 27, 2013, 9:46 p.m.
On Wed, Feb 27, 2013 at 12:53 PM, Jeff Law <law@redhat.com> wrote:
> On 02/27/2013 01:50 PM, Paul Pluzhnikov wrote:
>>
>> On Wed, Feb 27, 2013 at 12:17 PM, Andrew Pinski <pinskia@gmail.com> wrote:
>>
>>> Just an aside which program does not understand dwarf2 unwinding?
>>
>>
>> All Google executables currently link in a frame-based unwinder.
>>
>> This allows us to report crashes and record runtime statistics from the
>> executable itself, in ways that are convenient when dealing with thousands
>> of executables spread across millions of machines, and which are much
>> harder to achieve using external tools.
>>
>> There is of course libunwind, but it turns out that it's very hard to
>> beat speed and simplicity of a frame-based unwinder (which matters when
>> you collect stack traces across live production services).

libunwind is not needed since there is already a dwarf2 based unwinder
that is accessible in libgcc already.  I don't know why people still
promote libunwind when libgcc already has similar facilities.

>>
>> I hope this answers your question.
>
> Or imagine something like trying to get traces out of low level profilers
> and such.  There are situations where dwarf2 unwinders are just too damn
> bloated.

Perf handles this by saving off some of the stack space instead and
then post-process it.  This is why I said perf handles this case
already.  Now oprofile does not but oprofile is really going away.

Thanks,
Andrew

>
> jeff
Jeff Law - Feb. 27, 2013, 9:48 p.m.
On 02/27/2013 02:46 PM, Andrew Pinski wrote:
> Perf handles this by saving off some of the stack space instead and
> then post-process it.  This is why I said perf handles this case
> already.  Now oprofile does not but oprofile is really going away.
I'm well aware of how perf handles this, having had numerous discussions 
with Red Hat's kernel team about this issue.

jeff
Paul Pluzhnikov - Feb. 27, 2013, 10:01 p.m.
On Wed, Feb 27, 2013 at 1:46 PM, Andrew Pinski <pinskia@gmail.com> wrote:

> libunwind is not needed since there is already a dwarf2 based unwinder
> that is accessible in libgcc already.

Last I checked, libgcc dwarf handling code called malloc, and thus wasn't
suitable when you want to instrument malloc *itself* to collect stack
traces, nor for SIGPROF profiling.

Is libgcc dwarf code async-signal safe now?
If not, will it ever be?

> I don't know why people still
> promote libunwind when libgcc already has similar facilities.

Because these facilities are not (or at least were not in the recent past)
suitable for the unwinder requirements that people have?

Thanks,
Ian Taylor - Feb. 28, 2013, 1:52 a.m.
On Wed, Feb 27, 2013 at 2:01 PM, Paul Pluzhnikov <ppluzhnikov@google.com> wrote:
>
> Last I checked, libgcc dwarf handling code called malloc, and thus wasn't
> suitable when you want to instrument malloc *itself* to collect stack
> traces, nor for SIGPROF profiling.
>
> Is libgcc dwarf code async-signal safe now?
> If not, will it ever be?

I think that the libgcc unwinder only calls malloc if somebody calls
__register_frame_info.  And in ordinary circumstances nobody ever
calls that.  I don't think there is an malloc issue there.  Unless
I've forgotten something.

I thought a more serious issue is that the libgcc unwinder has no way
of dealing with a corrupt stack--it will simply crash.

Ian
Paul Pluzhnikov - Feb. 28, 2013, 4:23 p.m.
On Wed, Feb 27, 2013 at 5:52 PM, Ian Lance Taylor <iant@google.com> wrote:

> I think that the libgcc unwinder only calls malloc if somebody calls
> __register_frame_info.

So I looked. Indeed it doesn't call malloc in our environment, but
does call dl_iterate_phdr, which is just as deadly.

To be fair, libunwind does that as well, and we have to provide a
workaround for that.

Still, out-of-the-box libgcc unwinder is not suitable for our unwind
requirements.

> I thought a more serious issue is that the libgcc unwinder has no way
> of dealing with a corrupt stack--it will simply crash.

That too ...

Thanks,
Ian Taylor - Feb. 28, 2013, 5:28 p.m.
On Thu, Feb 28, 2013 at 8:23 AM, Paul Pluzhnikov <ppluzhnikov@google.com> wrote:
> On Wed, Feb 27, 2013 at 5:52 PM, Ian Lance Taylor <iant@google.com> wrote:
>
>> I think that the libgcc unwinder only calls malloc if somebody calls
>> __register_frame_info.
>
> So I looked. Indeed it doesn't call malloc in our environment, but
> does call dl_iterate_phdr, which is just as deadly.

Hmmm, I guess I can't remember why.  dl_iterate_phdr uses a recursive
lock so there shouldn't be any deadlock problem.  If the problem is
the dynamic lookup of the dl_iterate_phdr symbol for lazy PLT
evaluation, that could be finessed by calling it beforehand.

Ian
Paul Pluzhnikov - Feb. 28, 2013, 6:10 p.m.
On Thu, Feb 28, 2013 at 9:28 AM, Ian Lance Taylor <iant@google.com> wrote:

>> does call dl_iterate_phdr, which is just as deadly.
>
> Hmmm, I guess I can't remember why.

Let me refresh your memory. You've seen this deadlock before:

  Thread 1                    Thread 2
  dlopen                      malloc
   ... takes loader lock       ... takes malloc lock
   malloc                      _Unwind_Backtrace
   ... needs malloc lock         dl_iterate_phdr
       held by T2                ... needs loader lock held by T1



--
Paul Pluzhnikov
Ian Taylor - Feb. 28, 2013, 6:35 p.m.
On Thu, Feb 28, 2013 at 10:10 AM, Paul Pluzhnikov
<ppluzhnikov@google.com> wrote:
> On Thu, Feb 28, 2013 at 9:28 AM, Ian Lance Taylor <iant@google.com> wrote:
>
>>> does call dl_iterate_phdr, which is just as deadly.
>>
>> Hmmm, I guess I can't remember why.
>
> Let me refresh your memory. You've seen this deadlock before:
>
>   Thread 1                    Thread 2
>   dlopen                      malloc
>    ... takes loader lock       ... takes malloc lock
>    malloc                      _Unwind_Backtrace
>    ... needs malloc lock         dl_iterate_phdr
>        held by T2                ... needs loader lock held by T1

Oh yeah.  Thanks.

Ian
Cary Coutant - Feb. 28, 2013, 7:57 p.m.
>   Thread 1                    Thread 2
>   dlopen                      malloc
>    ... takes loader lock       ... takes malloc lock
>    malloc                      _Unwind_Backtrace
>    ... needs malloc lock         dl_iterate_phdr
>        held by T2                ... needs loader lock held by T1

Am I missing something, or wouldn't it be feasible when instrumenting
malloc to call _Unwind_Backtrace before obtaining the malloc lock (or
drop the malloc lock while calling _Unwind_Backtrace)? Similarly,
couldn't dlopen drop the loader lock while calling malloc?

-cary
Jakub Jelinek - Feb. 28, 2013, 7:59 p.m.
On Thu, Feb 28, 2013 at 11:57:48AM -0800, Cary Coutant wrote:
> Similarly, couldn't dlopen drop the loader lock while calling malloc?

It can't, but perhaps it could call some alternative malloc instead
(the simpler malloc version in ld.so or similar).

	Jakub
Paul Pluzhnikov - Feb. 28, 2013, 9:49 p.m.
On Thu, Feb 28, 2013 at 9:07 AM, Xinliang David Li <davidxl@google.com> wrote:

> Any insight about the relative performance of the two implementations?

We have a benchmark for the speed of unwinder. Here are results.

The number /1, /2, etc. is the number of levels in the stack trace.

Using frame-based unwinder:

Benchmark             Time(ns)    CPU(ns) Iterations
----------------------------------------------------
BM_GetStackTrace/1          29         29   24137931
BM_GetStackTrace/2          38         38   17948718
BM_GetStackTrace/3          43         44   16279070
BM_GetStackTrace/4          57         57   10000000
BM_GetStackTrace/5          62         62   10000000
BM_GetStackTrace/6          65         65   10000000
BM_GetStackTrace/7          65         64   10000000
BM_GetStackTrace/8          65         65   10000000
BM_GetStackTrace/9          65         65   10000000
BM_GetStackTrace/10         65         65   10000000


Using libgcc:

Benchmark             Time(ns)    CPU(ns) Iterations
----------------------------------------------------
BM_GetStackTrace/1        1543       1543     466667
BM_GetStackTrace/2        2042       2057     350000
BM_GetStackTrace/3        2378       2366     291667
BM_GetStackTrace/4        2754       2720     250000
BM_GetStackTrace/5        3212       3200     218750
BM_GetStackTrace/6        3655       3651     194444
BM_GetStackTrace/7        4039       4000     175000
BM_GetStackTrace/8        4009       4000     175000
BM_GetStackTrace/9        4002       4000     175000
BM_GetStackTrace/10       4017       4000     175000


Using libunwind:

Benchmark             Time(ns)    CPU(ns) Iterations
----------------------------------------------------
BM_GetStackTrace/1         109        108    6363636
BM_GetStackTrace/2         121        122    5833333
BM_GetStackTrace/3         130        130    5000000
BM_GetStackTrace/4         133        132    5000000
BM_GetStackTrace/5         148        148    4666667
BM_GetStackTrace/6         162        162    4375000
BM_GetStackTrace/7         174        175    4117647
BM_GetStackTrace/8         185        185    3888889
BM_GetStackTrace/9         188        187    3684211
BM_GetStackTrace/10        188        187    3684211

Conclusions:
- frame based unwinder is hard to beat (re-confirmed :-)
- libunwind is getting really close at 3x the overhead
- libgcc is nowhere close at 50x.
Paul Pluzhnikov - Feb. 28, 2013, 10:03 p.m.
On Thu, Feb 28, 2013 at 11:59 AM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Thu, Feb 28, 2013 at 11:57:48AM -0800, Cary Coutant wrote:
>> Similarly, couldn't dlopen drop the loader lock while calling malloc?
>
> It can't, but perhaps it could call some alternative malloc instead
> (the simpler malloc version in ld.so or similar).

ld-linux starts calling the simpler malloc in dl-minimal.c, then switches to
"real" libc.so.6 malloc later on.

This behavior causes a lot of pain to anyone who tries to interpose malloc
and use dlsym(RTLD_NEXT,...) or similar from the interposer.

Roland explained to me ~15 years ago why it is that way (it had something to
do with Hurd); an explanation I can't find at the moment.

Patch

Index: libstdc++-v3/src/c++11/Makefile.am
===================================================================
--- libstdc++-v3/src/c++11/Makefile.am	(revision 196316)
+++ libstdc++-v3/src/c++11/Makefile.am	(working copy)
@@ -124,3 +124,4 @@ 
 
 # Google-specific pessimization
 functexcept.lo_no_omit_frame_pointer = -fno-omit-frame-pointer
+debug.lo_no_omit_frame_pointer = -fno-omit-frame-pointer
Index: libstdc++-v3/src/c++11/Makefile.in
===================================================================
--- libstdc++-v3/src/c++11/Makefile.in	(revision 196316)
+++ libstdc++-v3/src/c++11/Makefile.in	(working copy)
@@ -391,6 +391,7 @@ 
 
 # Google-specific pessimization
 functexcept.lo_no_omit_frame_pointer = -fno-omit-frame-pointer
+debug.lo_no_omit_frame_pointer = -fno-omit-frame-pointer
 all: all-am
 
 .SUFFIXES: