diff mbox

[RFC] LD_DEBUG option to measure init time

Message ID "027601d0cac7$952818c0$bf784a40$@guseva"@samsung.com
State New
Headers show

Commit Message

Maria Guseva July 30, 2015, 12:59 p.m. UTC
Currently the LD_DEBUG=statistics provides the total time spent in dynamic
linker and details it with relocation and load objects time. But other
useful information may be the time spent in init and init_array routines.
It may help to analyze the bottlenecks of application startup time. 
Attached (draft) patch measures the time of each call to .init and
.init_array entries in dl-init() for every loaded dso and cumulative time of
init() call for main executable. For each init function called also its
address is printed.  
 
The output currently looks like this (run on x86-64):
     13825:     ./a.out => /home1/m.guseva/glibc/install/lib/libc.so.6:
spent 29988 cycles in .init (== 0x00000000000218c0)
     13825:     ./a.out => /home1/m.guseva/glibc/install/lib/libc.so.6:
spent 30132 cycles in .init_array[0] (== 0x0000000000021630)
     13825:     ./a.out =>
/home1/m.guseva/trunk/gcc-4.9.2/lib64/libgcc_s.so.1: spent 816 cycles in
.init (== 0x0000000000002720)
     13825:     ./a.out =>
/home1/m.guseva/trunk/gcc-4.9.2/lib64/libgcc_s.so.1: spent 10598 cycles in
.init_array[0] (== 0x0000000000002b40)
     13825:     ./a.out =>
/home1/m.guseva/trunk/gcc-4.9.2/lib64/libgcc_s.so.1: spent 324 cycles in
.init_array[1] (== 0x0000000000002f40)
     13825:     ./a.out => /home1/m.guseva/glibc/install/lib/libm.so.6:
spent 7828 cycles in .init (== 0x0000000000005590)
     13825:     ./a.out =>
/home1/m.guseva/trunk/gcc-4.9.2/lib64/libstdc++.so.6: spent 6238 cycles in
.init (== 0x000000000005ac18)
     13825:     ./a.out =>
/home1/m.guseva/trunk/gcc-4.9.2/lib64/libstdc++.so.6: spent 730 cycles in
.init_array[0] (== 0x000000000005dcb0)
     13825:     ./a.out =>
/home1/m.guseva/trunk/gcc-4.9.2/lib64/libstdc++.so.6: spent 59840 cycles in
.init_array[1] (== 0x000000000005d660)
     13825:     ./a.out =>
/home1/m.guseva/trunk/gcc-4.9.2/lib64/libstdc++.so.6: spent 170 cycles in
.init_array[2] (== 0x000000000005d680)
     13825:     ./a.out =>
/home1/m.guseva/trunk/gcc-4.9.2/lib64/libstdc++.so.6: spent 11126 cycles in
.init_array[3] (== 0x000000000005d6c0)
     13825:     ./a.out =>
/home1/m.guseva/trunk/gcc-4.9.2/lib64/libstdc++.so.6: spent 1036 cycles in
.init_array[4] (== 0x000000000005d7d0)
     13825:     ./a.out =>
/home1/m.guseva/trunk/gcc-4.9.2/lib64/libstdc++.so.6: spent 1070 cycles in
.init_array[5] (== 0x000000000005d890)
     13825:     ./a.out =>
/home1/m.guseva/trunk/gcc-4.9.2/lib64/libstdc++.so.6: spent 5474 cycles in
.init_array[6] (== 0x000000000005d950)
     13825:     ./a.out =>
/home1/m.guseva/trunk/gcc-4.9.2/lib64/libstdc++.so.6: spent 900 cycles in
.init_array[7] (== 0x000000000005db80)
     13825:     ./a.out => /home1/m.guseva/Test/libsample.so: spent 418
cycles in .init (== 0x00000000000005f0)
     13825:     ./a.out => /home1/m.guseva/Test/libsample.so: spent 510
cycles in .init_array[0] (== 0x0000000000000750)
     13825:     ./a.out => /home1/m.guseva/Test/libsample.so: spent
3400396160 cycles in .init_array[1] (== 0x0000000000000780)
     13825:     ./a.out => /home1/m.guseva/Test/libsample.so: spent
3400263040 cycles in .init_array[2] (== 0x00000000000007d3)
     13825:     ./a.out: spent 28160 cycles in init (== 0x00000000004006e0)

Would something like this be useful in mainline Glibc? If attitude is
generally positive I can work on more polished implementation.

I'm also thinking about having such feature for platforms without HP timing
available, like arm. Some generic time measurements may be used e.g. via
clock_gettime syscall.

	 


Regards,
Maria

Comments

Carlos O'Donell July 31, 2015, 3:53 a.m. UTC | #1
On 07/30/2015 08:59 AM, Maria Guseva wrote:
> Currently the LD_DEBUG=statistics provides the total time spent in dynamic
> linker and details it with relocation and load objects time. But other
> useful information may be the time spent in init and init_array routines.
> It may help to analyze the bottlenecks of application startup time. 
> Attached (draft) patch measures the time of each call to .init and
> .init_array entries in dl-init() for every loaded dso and cumulative time of
> init() call for main executable. For each init function called also its
> address is printed.  

Agreed.

> Would something like this be useful in mainline Glibc? If attitude is
> generally positive I can work on more polished implementation.

I think the idea is a very good one. It's certainly useful information for
application developers to have and it can be fed back into other tools.

The implementation needs polish. For example it should not do any timing
unless DL_DEBUG_INITTIME is set.

What about time for finalizers? Same problem. Use a more generic flag
e.g. DL_DEBUG_CTORDTOR_TIME to cover all cases.

Should LD_DEBUG=all enable this? Probably not.

Note that you need a patch to linux kernel man pages project to describe
this new feature too.

> I'm also thinking about having such feature for platforms without HP timing
> available, like arm. Some generic time measurements may be used e.g. via
> clock_gettime syscall.

Yes please. That would be a requirement of acceptance, it should work on
all machines.

Cheers,
Carlos.
Andrew Pinski July 31, 2015, 3:58 a.m. UTC | #2
> On Jul 30, 2015, at 8:53 PM, Carlos O'Donell <carlos@redhat.com> wrote:
> 
>> On 07/30/2015 08:59 AM, Maria Guseva wrote:
>> Currently the LD_DEBUG=statistics provides the total time spent in dynamic
>> linker and details it with relocation and load objects time. But other
>> useful information may be the time spent in init and init_array routines.
>> It may help to analyze the bottlenecks of application startup time. 
>> Attached (draft) patch measures the time of each call to .init and
>> .init_array entries in dl-init() for every loaded dso and cumulative time of
>> init() call for main executable. For each init function called also its
>> address is printed.  
> 
> Agreed.
> 
>> Would something like this be useful in mainline Glibc? If attitude is
>> generally positive I can work on more polished implementation.
> 
> I think the idea is a very good one. It's certainly useful information for
> application developers to have and it can be fed back into other tools.
> 
> The implementation needs polish. For example it should not do any timing
> unless DL_DEBUG_INITTIME is set.
> 
> What about time for finalizers? Same problem. Use a more generic flag
> e.g. DL_DEBUG_CTORDTOR_TIME to cover all cases.
> 
> Should LD_DEBUG=all enable this? Probably not.
> 
> Note that you need a patch to linux kernel man pages project to describe
> this new feature too.
> 
>> I'm also thinking about having such feature for platforms without HP timing
>> available, like arm. Some generic time measurements may be used e.g. via
>> clock_gettime syscall.
> 
> Yes please. That would be a requirement of acceptance, it should work on
> all machines.


Note aarch64 really should be using the virtual timer which is optional part of armv8 but not an optional part of the server base.  It is sad that arm is blocking a glibc patch due to it being an optional feature of v8 but is required for servers. 

Thanks,
Andrew

> 
> Cheers,
> Carlos.
Carlos O'Donell July 31, 2015, 4:08 a.m. UTC | #3
On 07/30/2015 11:58 PM, pinskia@gmail.com wrote:
> Note aarch64 really should be using the virtual timer which is
> optional part of armv8 but not an optional part of the server base.
> It is sad that arm is blocking a glibc patch due to it being an
> optional feature of v8 but is required for servers.

ARM doesn't block this patch at all. Such machines that don't implement
hptiming could fall back to more expensive interfaces like gettimeofday
(vdso) or clock_gettime?

We really should be doing something better for AArhc64, but why not
ask ARM to fix this via Linaro? :-)

c.
Andrew Pinski July 31, 2015, 4:13 a.m. UTC | #4
> On Jul 30, 2015, at 9:08 PM, Carlos O'Donell <carlos@redhat.com> wrote:
> 
>> On 07/30/2015 11:58 PM, pinskia@gmail.com wrote:
>> Note aarch64 really should be using the virtual timer which is
>> optional part of armv8 but not an optional part of the server base.
>> It is sad that arm is blocking a glibc patch due to it being an
>> optional feature of v8 but is required for servers.
> 
> ARM doesn't block this patch at all. Such machines that don't implement
> hptiming could fall back to more expensive interfaces like gettimeofday
> (vdso) or clock_gettime?

I am saying they are blocking the patch which adds hptiming. They reverted it being an option feature. There is another thread about this already. 

Thanks,
Andrew

> 
> We really should be doing something better for AArhc64, but why not
> ask ARM to fix this via Linaro? :-)
> 
> c.
Carlos O'Donell July 31, 2015, 4:28 a.m. UTC | #5
On 07/31/2015 12:13 AM, pinskia@gmail.com wrote:
>> ARM doesn't block this patch at all. Such machines that don't
>> implement hptiming could fall back to more expensive interfaces
>> like gettimeofday (vdso) or clock_gettime?
> 
> I am saying they are blocking the patch which adds hptiming. They
> reverted it being an option feature. There is another thread about
> this already.

Thanks for clarifying.

We should resurrect that discussion.

c.
diff mbox

Patch

diff --git a/csu/libc-start.c b/csu/libc-start.c
index 138418a..b2333ca 100644
--- a/csu/libc-start.c
+++ b/csu/libc-start.c
@@ -21,6 +21,7 @@ 
 #include <unistd.h>
 #include <ldsodefs.h>
 #include <exit-thread.h>
+#include "hp-timing.h"

 extern void __libc_init_first (int argc, char **argv, char **envp);
 #ifndef SHARED
@@ -242,7 +243,29 @@  LIBC_START_MAIN (int (*main) (int, char **, char **
MAIN_AUXVEC_DECL),
     GLRO(dl_debug_printf) ("\ninitialize program: %s\n\n", argv[0]);
 #endif
   if (init)
-    (*init) (argc, argv, __environ MAIN_AUXVEC_PARAM);
+    {
+#if defined SHARED && ! defined HP_TIMING_NONAVAIL
+      hp_timing_t init_time;
+      hp_timing_t start_time;
+      hp_timing_t end_time;
+      HP_TIMING_NOW (start_time);
+#endif
+
+      (*init) (argc, argv, __environ MAIN_AUXVEC_PARAM);
+
+#if defined SHARED && ! defined HP_TIMING_NONAVAIL
+      HP_TIMING_NOW (end_time);
+      HP_TIMING_DIFF (init_time, start_time, end_time);
+      if (__glibc_unlikely (GLRO (dl_debug_mask) & DL_DEBUG_INITTIME))
+       {
+         char buf[200];
+         HP_TIMING_PRINT (buf, sizeof (buf), init_time);
+         GLRO (dl_debug_printf) ("%s: spent %s in init (== 0x%0*lx)\n",
+                                 argv[0], buf, (int) sizeof (void *) * 2,
+                                 (unsigned long int)init);
+       }
+#endif
+    }

 #ifdef SHARED
   /* Auditing checkpoint: we have a new object.  */
diff --git a/elf/dl-init.c b/elf/dl-init.c
index 598df3b..5c8976a 100644
--- a/elf/dl-init.c
+++ b/elf/dl-init.c
@@ -18,6 +18,7 @@ 

 #include <stddef.h>
 #include <ldsodefs.h>
+#include "hp-timing.h"


 /* Type of the initializer.  */
@@ -33,6 +34,11 @@  extern int _dl_starting_up_internal attribute_hidden;
 static void
 call_init (struct link_map *l, int argc, char **argv, char **env)
 {
+#ifndef HP_TIMING_NONAVAIL
+  hp_timing_t init_time;
+  hp_timing_t start_time;
+  hp_timing_t end_time;
+#endif
   if (l->l_init_called)
     /* This object is all done.  */
     return;
@@ -61,7 +67,27 @@  call_init (struct link_map *l, int argc, char **argv,
char **env)
      - the others in the DT_INIT_ARRAY.
   */
   if (l->l_info[DT_INIT] != NULL)
-    DL_CALL_DT_INIT(l, l->l_addr + l->l_info[DT_INIT]->d_un.d_ptr, argc,
argv, env);
+    {
+#ifndef HP_TIMING_NONAVAIL
+      HP_TIMING_NOW (start_time);
+#endif
+      DL_CALL_DT_INIT (l, l->l_addr + l->l_info[DT_INIT]->d_un.d_ptr, argc,
+                      argv, env);
+#ifndef HP_TIMING_NONAVAIL
+      HP_TIMING_NOW (end_time);
+      HP_TIMING_DIFF (init_time, start_time, end_time);
+
+      if (__glibc_unlikely (GLRO (dl_debug_mask) & DL_DEBUG_INITTIME))
+       {
+         char buf[200];
+         HP_TIMING_PRINT (buf, sizeof (buf), init_time);
+         _dl_debug_printf ("%s => %s: spent %s in .init (== 0x%0*lx)\n",
+                           argv[0], DSO_FILENAME (l->l_name), buf,
+                           (int) sizeof (void *) * 2,
+                           (unsigned long
int)(l->l_info[DT_INIT]->d_un.d_ptr));
+       }
+#endif
+    }

   /* Next see whether there is an array with initialization functions.  */
   ElfW(Dyn) *init_array = l->l_info[DT_INIT_ARRAY];
@@ -75,7 +101,26 @@  call_init (struct link_map *l, int argc, char **argv,
char **env)

       addrs = (ElfW(Addr) *) (init_array->d_un.d_ptr + l->l_addr);
       for (j = 0; j < jm; ++j)
-       ((init_t) addrs[j]) (argc, argv, env);
+       {
+#ifndef HP_TIMING_NONAVAIL
+         HP_TIMING_NOW (start_time);
+#endif
+         ((init_t) addrs[j]) (argc, argv, env);
+#ifndef HP_TIMING_NONAVAIL
+         HP_TIMING_NOW (end_time);
+         HP_TIMING_DIFF (init_time, start_time, end_time);
+
+         if (__glibc_unlikely (GLRO (dl_debug_mask) & DL_DEBUG_INITTIME))
+           {
+             char buf[200];
+             HP_TIMING_PRINT (buf, sizeof (buf), init_time);
+             _dl_debug_printf
+               ("%s => %s: spent %s in .init_array[%u] (== 0x%0*lx)\n",
argv[0],
+                DSO_FILENAME (l->l_name), buf, j, (int) sizeof (void *) *
2,
+                (unsigned long int)(addrs[j] - l->l_addr));
+           }
+#endif
+       }
     }
 }

diff --git a/elf/rtld.c b/elf/rtld.c
index d5cace8..95e75a7 100644
--- a/elf/rtld.c
+++ b/elf/rtld.c
@@ -2241,6 +2241,8 @@  process_dl_debug (const char *dl_debug)
        DL_DEBUG_STATISTICS },
       { LEN_AND_STR ("unused"), "determined unused DSOs",
        DL_DEBUG_UNUSED },
+      { LEN_AND_STR ("init_time"), "display init time",
+    DL_DEBUG_INITTIME },
       { LEN_AND_STR ("help"), "display this help message and exit",
        DL_DEBUG_HELP },
     };
diff --git a/sysdeps/generic/ldsodefs.h b/sysdeps/generic/ldsodefs.h
index e01df84..26e7186 100644
--- a/sysdeps/generic/ldsodefs.h
+++ b/sysdeps/generic/ldsodefs.h
@@ -452,6 +452,7 @@  struct rtld_global_ro
 /* These two are used only internally.  */
 #define DL_DEBUG_HELP       (1 << 10)
 #define DL_DEBUG_PRELINK    (1 << 11)
+#define DL_DEBUG_INITTIME   (1 << 12)

   /* OS version.  */
   EXTERN unsigned int _dl_osversion;