Patchwork [1/3] Add native ELF and LTO support in collect2

Submitter Andi Kleen
Date Oct. 16, 2010, 3:09 p.m.
Message ID <1287241747-2496-2-git-send-email-andi@firstfloor.org>
Permalink /patch/68037/
State New

Comments

Andi Kleen - Oct. 16, 2010, 3:09 p.m.
From: Andi Kleen <ak@linux.intel.com>

Change collect2 to read the symbol table directly on ELF systems
using libelf. Also add support for the LTO symbol table.
This way collect2 can resolve symbols in an object file
that only has LTO information.

The LTO parser is closely patterned after the code
in the lto-plugin.
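
As a rough illustration of the record layout the LTO parser walks (a NUL-terminated name, a NUL-terminated comdat key, then one kind byte, one visibility byte, an 8-byte size and a 4-byte slot), here is a minimal standalone sketch. It is not the code in the patch: the names sk_collect_defined, SK_TAIL_LEN, SK_TRUNCATED and the SK_* enum are hypothetical, the enum only mirrors the first LDPK_* values from plugin-api.h, and the bounds checks are an addition that the patch itself does not have.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Sketch values mirroring the first LDPK_* constants in plugin-api.h.  */
enum { SK_DEF, SK_WEAKDEF, SK_UNDEF, SK_WEAKUNDEF, SK_COMMON };

/* Fixed-size tail of one entry: kind (1), visibility (1), size (8),
   slot (4).  Hypothetical name, layout taken from the patch.  */
#define SK_TAIL_LEN 14

/* Returned when the buffer ends in the middle of an entry.  */
#define SK_TRUNCATED ((size_t) -1)

/* Walk the LEN bytes at BUF.  Store a pointer to the name of every
   defined symbol in NAMES (at most MAX_NAMES of them) and the number
   stored in *N_DEFINED.  Return the number of entries seen.  */
static size_t
sk_collect_defined (const unsigned char *buf, size_t len,
                    const char **names, size_t max_names,
                    size_t *n_defined)
{
  const unsigned char *p = buf;
  const unsigned char *end = buf + len;
  size_t entries = 0;

  assert (buf != NULL && names != NULL && n_defined != NULL);
  *n_defined = 0;

  while (p < end)
    {
      const char *name = (const char *) p;
      const unsigned char *nul;
      int field;

      /* Skip two NUL-terminated strings: symbol name, then comdat key.  */
      for (field = 0; field < 2; field++)
        {
          nul = memchr (p, 0, (size_t) (end - p));
          if (nul == NULL)
            return SK_TRUNCATED;
          p = nul + 1;
        }

      if ((size_t) (end - p) < SK_TAIL_LEN)
        return SK_TRUNCATED;

      /* p[0] is the kind byte; undefined references are skipped.  */
      if (p[0] != SK_UNDEF && p[0] != SK_WEAKUNDEF
          && *n_defined < max_names)
        names[(*n_defined)++] = name;

      p += SK_TAIL_LEN;
      entries++;
    }
  return entries;
}
```

In collect2 the buffer would come from elf_getdata on the LTO symtab section, and each collected name would be passed on to handle_pass.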

v2: Addressed review feedback. Enabled on all ELF systems now.

gcc/

2010-10-07  Andi Kleen  <ak@linux.intel.com>

	* collect2.c: Add ifdefs for OBJECT_FORMAT_ELF.
	(main): Move use_plugin to top level.
	(scan_prog_file): Move switch statement to ..
	(handle_pass): Separate function here.
	(LTO_SYMTAB_NAME, scan_lto_symtab, scan_elf_symtab, is_ar,
	 scan_prog_file): Add.
	* config.in: Regenerate.
	* config/i386/linux.h (OBJECT_FORMAT_ELF): Define.
	* configure: Regenerate.
	* configure.ac: Check for libelf.h and gelf.h. Adjust
	libelf test.
---
 gcc/collect2.c   |  309 +++++++++++++++++++++++++++++++++++++++++++++---------
 gcc/config.in    |   12 ++
 gcc/configure    |   30 +++++-
 gcc/configure.ac |    8 +-
 4 files changed, 305 insertions(+), 54 deletions(-)
Diego Novillo - Nov. 2, 2010, 11:04 a.m.
On 10-10-16 11:09 , Andi Kleen wrote:

>
> 	* collect2.c: Add ifdefs for OBJECT_FORMAT_ELF.
> 	(main): Move use_plugin to top level.
> 	(scan_prog_file): Move switch statement to ..
> 	(handle_pass): Separate function here.
> 	(LTO_SYMTAB_NAME, scan_lto_symtab, scan_elf_symtab, is_ar,
> 	 scan_prog_file): Add.
> 	* config.in: Regenerate.
> 	* config/i386/linux.h (OBJECT_FORMAT_ELF): Define.
> 	* configure: Regenerate.
> 	* configure.ac: Check for libelf.h and gelf.h. Adjust
> 	libelf test.

Hm, why not just force the use of a linker that has plugin support?


Diego.
Richard Guenther - Nov. 2, 2010, 11:06 a.m.
On Tue, Nov 2, 2010 at 12:04 PM, Diego Novillo <dnovillo@google.com> wrote:
> On 10-10-16 11:09 , Andi Kleen wrote:
>
>>
>>        * collect2.c: Add ifdefs for OBJECT_FORMAT_ELF.
>>        (main): Move use_plugin to top level.
>>        (scan_prog_file): Move switch statement to ..
>>        (handle_pass): Separate function here.
>>        (LTO_SYMTAB_NAME, scan_lto_symtab, scan_elf_symtab, is_ar,
>>         scan_prog_file): Add.
>>        * config.in: Regenerate.
>>        * config/i386/linux.h (OBJECT_FORMAT_ELF): Define.
>>        * configure: Regenerate.
>>        * configure.ac: Check for libelf.h and gelf.h. Adjust
>>        libelf test.
>
> Hm, why not just force the use of a linker that has plugin support?

Or instead use the new facility Ian added to libiberty (or use it once
that is committed).

Richard.

>
> Diego.
>
Andi Kleen - Nov. 2, 2010, 11:27 a.m.
On Tue, Nov 02, 2010 at 12:06:37PM +0100, Richard Guenther wrote:
> On Tue, Nov 2, 2010 at 12:04 PM, Diego Novillo <dnovillo@google.com> wrote:
> > On 10-10-16 11:09 , Andi Kleen wrote:
> >
> >>
> >>        * collect2.c: Add ifdefs for OBJECT_FORMAT_ELF.
> >>        (main): Move use_plugin to top level.
> >>        (scan_prog_file): Move switch statement to ..
> >>        (handle_pass): Separate function here.
> >>        (LTO_SYMTAB_NAME, scan_lto_symtab, scan_elf_symtab, is_ar,
> >>         scan_prog_file): Add.
> >>        * config.in: Regenerate.
> >>        * config/i386/linux.h (OBJECT_FORMAT_ELF): Define.
> >>        * configure: Regenerate.
> >>        * configure.ac: Check for libelf.h and gelf.h. Adjust
> >>        libelf test.
> >
> > Hm, why not just force the use of a linker that has plugin support?
> 
> Or instead use the new facility Ian added to libiberty (or use it once
> that is committed).

I can do that, shouldn't be too hard to convert it.
Should I wait for that?

-andi
Andi Kleen - Nov. 2, 2010, 11:41 a.m.
> Hm, why not just force the use of a linker that has plugin support?

I mainly did it so that the constructors etc. would still be
generated by collect2. But maybe that could be done entirely
by the linker now. I must admit I don't understand all
the implications of such a change, so I preferred to keep
the old mode and just make it work with LTO symbol tables.

There are also two additional uses of this infrastructure
(which are not in this patchkit yet):

- Detect leftover non-LTO code from an earlier ld -r, use objcopy to copy
it to a new object file, and include that in the link. This can easily
be done with slim LTO.
The main use is handling .S files in an LTO build that uses ld -r.
Without it that code would disappear during LTO.
I need that for a project I'm interested in. This was the main
reason I implemented slim LTO in the first place.

- Automatic detection of LTO, so that the final link still works
if a Makefile forgets to add the LTO options.
This is actually needed for a full GCC slim-LTO bootstrap right
now because libiberty doesn't set the correct stage2/3 link flags.
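
A minimal sketch of how such detection could look, assuming the section names of an object have already been collected into an array: it uses the same prefix comparison against the LTO symtab section name that the patch applies per section. The helper names is_lto_symtab_section and object_has_lto, and the macro LTO_SYMTAB_PREFIX, are hypothetical and are not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Same string as LTO_SYMTAB_NAME in the patch; compared as a prefix.  */
#define LTO_SYMTAB_PREFIX ".gnu.lto_.symtab"

/* Return true if NAME looks like an LTO symbol table section.
   A NULL name (for example from a failed string lookup) is not LTO.  */
static bool
is_lto_symtab_section (const char *name)
{
  return name != NULL
         && strncmp (name, LTO_SYMTAB_PREFIX,
                     strlen (LTO_SYMTAB_PREFIX)) == 0;
}

/* Return true if any of the N section names marks the object as
   carrying LTO information.  */
static bool
object_has_lto (const char *const *section_names, size_t n)
{
  size_t i;

  assert (section_names != NULL || n == 0);
  for (i = 0; i < n; i++)
    if (is_lto_symtab_section (section_names[i]))
      return true;
  return false;
}
```

With libelf the names would come from elf_strptr on each section header; an object for which object_has_lto returns true would then be linked as if the LTO options had been given.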

Perhaps that's obsolete: if ld plugin support is universal,
the plugin could just be enabled unconditionally. I was a bit
wary of this before because it would mean using gold unconditionally,
and gold still seems to have trouble with some code.

-Andi
Dave Korn - Nov. 2, 2010, 9:32 p.m.
On 02/11/2010 11:27, Andi Kleen wrote:
> On Tue, Nov 02, 2010 at 12:06:37PM +0100, Richard Guenther wrote:
>> On Tue, Nov 2, 2010 at 12:04 PM, Diego Novillo <dnovillo@google.com> wrote:
>>> On 10-10-16 11:09 , Andi Kleen wrote:
>>>
>>>>        * collect2.c: Add ifdefs for OBJECT_FORMAT_ELF.
>>>>        (main): Move use_plugin to top level.
>>>>        (scan_prog_file): Move switch statement to ..
>>>>        (handle_pass): Separate function here.
>>>>        (LTO_SYMTAB_NAME, scan_lto_symtab, scan_elf_symtab, is_ar,
>>>>         scan_prog_file): Add.
>>>>        * config.in: Regenerate.
>>>>        * config/i386/linux.h (OBJECT_FORMAT_ELF): Define.
>>>>        * configure: Regenerate.
>>>>        * configure.ac: Check for libelf.h and gelf.h. Adjust
>>>>        libelf test.
>>> Hm, why not just force the use of a linker that has plugin support?
>> Or instead use the new facility Ian added to libiberty (or use it once
>> that is committed).
> 
> I can do that, shouldn't be too hard to convert it.
> Should I wait for that?

  It's gone in now.  If you want an example of how it can be used, take a look
at the patch I wrote to convert the lto-plugin.

http://gcc.gnu.org/ml/gcc/2010-10/msg00483.html

N.B. that there's a bug in that version: "free (data);" in two locations in
process_symtab() should in fact read "free (secdata);".  (Also it was written
before the objfile functions all got renamed, so I need to update it there too.)

    cheers,
      DaveK

Patch

diff --git a/gcc/collect2.c b/gcc/collect2.c
index a8cd232..b3a61ac 100644
--- a/gcc/collect2.c
+++ b/gcc/collect2.c
@@ -68,12 +68,19 @@  along with GCC; see the file COPYING3.  If not see
 #undef REAL_STRIP_FILE_NAME
 #endif
 
+#if  !defined (HAVE_LIBELF_H) || !defined (HAVE_GELF_H)  \
+  || !defined (HAVE_UNISTD_H) || !defined (HAVE_FCNTL_H)
+#undef OBJECT_FORMAT_ELF
+#else
+#undef REAL_NM_FILE_NAME
+#endif
+
 /* If we cannot use a special method, use the ordinary one:
    run nm to find what symbols are present.
    In a cross-compiler, this means you need a cross nm,
    but that is not quite as unpleasant as special headers.  */
 
-#if !defined (OBJECT_FORMAT_COFF)
+#if !defined (OBJECT_FORMAT_COFF) && !defined (OBJECT_FORMAT_ELF)
 #define OBJECT_FORMAT_NONE
 #endif
 
@@ -869,7 +876,7 @@  prefix_from_string (const char *p, struct path_prefix *pprefix)
   free (nstore);
 }
 
-#ifdef OBJECT_FORMAT_NONE
+#if defined (OBJECT_FORMAT_NONE) || defined (OBJECT_FORMAT_ELF)
 
 /* Add an entry for the object file NAME to object file list LIST.
    New entries are added at the end of the list. The original pointer
@@ -889,7 +896,7 @@  add_lto_object (struct lto_object_list *list, const char *name)
 
   list->last = n;
 }
-#endif /* OBJECT_FORMAT_NONE */
+#endif /* OBJECT_FORMAT_NONE || OBJECT_FORMAT_ELF */
 
 
 /* Perform a link-time recompilation and relink if any of the object
@@ -1070,6 +1077,8 @@  maybe_run_lto_and_relink (char **lto_ld_argv, char **object_lst,
     }
 }
 
+bool use_plugin = false;
+
 /* Main program.  */
 
 int
@@ -1132,7 +1141,6 @@  main (int argc, char **argv)
   const char **c_ptr;
   char **ld1_argv;
   const char **ld1;
-  bool use_plugin = false;
 
   /* The kinds of symbols we will have to consider when scanning the
      outcome of a first pass link.  This is ALL to start with, then might
@@ -2521,6 +2529,63 @@  write_aix_file (FILE *stream, struct id *list)
 }
 #endif
 
+#if defined (OBJECT_FORMAT_NONE) || defined (OBJECT_FORMAT_ELF)
+
+/* Classify symbol NAME and add it to the matching list, honoring FILTER.  */
+
+static void
+handle_pass (const char *name, scanpass which_pass, scanfilter filter,
+	     const char *prog_name)
+{
+  switch (is_ctor_dtor (name))
+    {
+    case SYM_CTOR:
+      if (! (filter & SCAN_CTOR))
+	break;
+      if (which_pass != PASS_LIB)
+	add_to_list (&constructors, name);
+      break;
+      
+    case SYM_DTOR:
+      if (! (filter & SCAN_DTOR))
+	break;
+      if (which_pass != PASS_LIB)
+	add_to_list (&destructors, name);
+      break;
+      
+    case SYM_INIT:
+      if (! (filter & SCAN_INIT))
+	break;
+      if (which_pass != PASS_LIB)
+	fatal ("init function found in object %s", prog_name);
+#ifndef LD_INIT_SWITCH
+      add_to_list (&constructors, name);
+#endif
+      break;
+      
+    case SYM_FINI:
+      if (! (filter & SCAN_FINI))
+	break;
+      if (which_pass != PASS_LIB)
+	fatal ("fini function found in object %s", prog_name);
+#ifndef LD_FINI_SWITCH
+      add_to_list (&destructors, name);
+#endif
+      break;
+      
+    case SYM_DWEH:
+      if (! (filter & SCAN_DWEH))
+	break;
+      if (which_pass != PASS_LIB)
+	add_to_list (&frame_tables, name);
+      break;
+
+    case SYM_REGULAR:
+      break;
+    }
+}
+#endif
+
 #ifdef OBJECT_FORMAT_NONE
 
 /* Check to make sure the file is an LTO object file.  */
@@ -2703,52 +2768,7 @@  scan_prog_file (const char *prog_name, scanpass which_pass,
 
 
       *end = '\0';
-      switch (is_ctor_dtor (name))
-	{
-	case SYM_CTOR:
-	  if (! (filter & SCAN_CTOR))
-	    break;
-	  if (which_pass != PASS_LIB)
-	    add_to_list (&constructors, name);
-	  break;
-
-	case SYM_DTOR:
-	  if (! (filter & SCAN_DTOR))
-	    break;
-	  if (which_pass != PASS_LIB)
-	    add_to_list (&destructors, name);
-	  break;
-
-	case SYM_INIT:
-	  if (! (filter & SCAN_INIT))
-	    break;
-	  if (which_pass != PASS_LIB)
-	    fatal ("init function found in object %s", prog_name);
-#ifndef LD_INIT_SWITCH
-	  add_to_list (&constructors, name);
-#endif
-	  break;
-
-	case SYM_FINI:
-	  if (! (filter & SCAN_FINI))
-	    break;
-	  if (which_pass != PASS_LIB)
-	    fatal ("fini function found in object %s", prog_name);
-#ifndef LD_FINI_SWITCH
-	  add_to_list (&destructors, name);
-#endif
-	  break;
-
-	case SYM_DWEH:
-	  if (! (filter & SCAN_DWEH))
-	    break;
-	  if (which_pass != PASS_LIB)
-	    add_to_list (&frame_tables, name);
-	  break;
-
-	default:		/* not a constructor or destructor */
-	  continue;
-	}
+      handle_pass (name, which_pass, filter, prog_name);
     }
 
   if (debug)
@@ -3218,3 +3238,192 @@  resolve_lib_name (const char *name)
   return (NULL);
 }
 #endif /* COLLECT_EXPORT_LIST */
+
+#ifdef OBJECT_FORMAT_ELF
+#include <libelf.h>
+#include <gelf.h>
+#include <plugin-api.h>
+
+#include <fcntl.h>
+#include <unistd.h>
+
+#define LTO_SYMTAB_NAME ".gnu.lto_.symtab"
+
+/* Scan an LTO symbol table section.  */
+
+static unsigned
+scan_lto_symtab (Elf_Data *tab, scanpass which_pass, 
+		 scanfilter filter, const char *prog_name)
+{
+  unsigned nsyms = 0;
+  char *p;
+
+  for (p = (char *)tab->d_buf; p < (char *)tab->d_buf + tab->d_size; ) 
+    {
+      const char *name;
+      int skip = 0;
+
+      /* Done in the same way as the lto-plugin. */
+
+      /* name */
+      name = p;
+      while (*p++)
+	;
+      /* comdat */
+      while (*p++)
+	;
+      /* kind */
+      if (*p == LDPK_UNDEF || *p == LDPK_WEAKUNDEF)
+	skip = 1;
+      p++;
+      /* visibility */
+      p++;
+      /* size */
+      p += 8;
+      /* slot */
+      p += 4;
+
+      if (!skip)
+	handle_pass (name, which_pass, filter, prog_name);
+      nsyms++;
+    }
+  return nsyms;
+}
+
+/* Scan an ELF symbol table.  */
+
+static unsigned
+scan_elf_symtab (Elf *elf, Elf_Data *data, GElf_Shdr *shdr,
+		 scanpass which_pass, scanfilter filter, const char *prog_name)
+{
+  unsigned i;
+  unsigned nsyms = shdr->sh_size / shdr->sh_entsize;
+  unsigned proc = 0;
+
+  for (i = 0; i < nsyms; i++) 
+    {
+      GElf_Sym sym;
+      const char *name;
+
+      gelf_getsym (data, i, &sym);
+
+      if (GELF_ST_TYPE (sym.st_info) >= STT_SECTION)
+	continue;
+      if (sym.st_shndx == SHN_UNDEF || sym.st_shndx >= SHN_LORESERVE)
+	continue;
+      name = elf_strptr (elf, shdr->sh_link, sym.st_name);
+      handle_pass (name, which_pass, filter, prog_name);
+      proc++;
+    }
+
+  return proc;
+}
+
+/* Is FD an ar file? */
+
+static int
+is_ar (int fd)
+{
+  char buf[8];
+  bool isar = false;
+ 
+  if (read (fd, buf, 8) == 8) 
+    isar = !memcmp (buf, "!<arch>\n", 8);
+  lseek (fd, 0, SEEK_SET);
+  return isar;
+}
+
+/* ELF version to scan the name list of the loaded program for
+   the symbols g++ uses for static constructors and destructors.
+   This also supports LTO symbol tables. */
+
+static void
+scan_prog_file (const char *prog_name, scanpass which_pass,
+		scanfilter filter)
+{
+  int fd;
+  GElf_Ehdr header;
+  Elf *elf;
+  Elf_Scn *section;
+  unsigned syms = 0;
+  unsigned sections = 0;
+  unsigned lto = 0;
+
+  if (which_pass == PASS_SECOND)
+    return;
+
+  if (debug)
+    fprintf (stderr, "Scanning file '%s'\n", prog_name);
+
+  elf_version (EV_CURRENT);
+  fd = open (prog_name, O_RDONLY);
+  if (fd < 0)
+    {
+      fprintf (stderr, "Cannot open %s\n", prog_name);
+      return;
+    }
+
+  /* It is difficult to figure out if an ar file needs LTO or not.
+     If we use the linker plugin and it is an ar file just handle
+     it like an LTO file unconditionally.  */
+  if (which_pass == PASS_LTOINFO && use_plugin && is_ar (fd)) 
+    {
+      if (debug)
+        fprintf (stderr, "Handling ar file %s as LTO\n", prog_name);
+      add_lto_object (&lto_objects, prog_name);
+      return;
+    }
+
+  elf = elf_begin (fd, ELF_C_READ, NULL);
+  if (elf == NULL)
+    {
+      fprintf (stderr, "Cannot run elf_begin on %s: %s\n", prog_name,
+                       elf_errmsg (0));
+      close (fd);
+      return;
+    }
+  if (!gelf_getehdr (elf, &header))
+    fatal ("Cannot find EHDR in %s: %s", prog_name, elf_errmsg (0));
+
+  section = NULL;
+  while ((section = elf_nextscn (elf, section)) != 0)
+    {
+      GElf_Shdr shdr_mem;
+      GElf_Shdr *shdr = gelf_getshdr (section, &shdr_mem);
+      char *name;
+      int islto;
+
+      sections++;
+
+      if (!shdr)
+        fatal ("Cannot read SHDR for section %u: %s", sections, elf_errmsg (0));
+
+      name = elf_strptr (elf, header.e_shstrndx, shdr->sh_name);
+      islto = !strncmp (name, LTO_SYMTAB_NAME, strlen (LTO_SYMTAB_NAME));
+      lto += islto;
+
+      if (which_pass == PASS_LTOINFO)
+	{
+	  if (!islto)
+	    continue;
+	  add_lto_object (&lto_objects, prog_name);
+	  break;
+	}
+
+      if (islto)
+	syms += scan_lto_symtab (elf_getdata (section, NULL), which_pass, 
+				 filter, prog_name);
+      else if (shdr->sh_type == SHT_SYMTAB && shdr->sh_entsize > 0)
+	syms += scan_elf_symtab (elf, elf_getdata (section, NULL), shdr,
+				 which_pass, filter, prog_name);
+    }
+
+  if (debug)
+    fprintf (stderr, "Scanned %u symbols, %u sections, %u LTO\n", syms, 
+                     sections, lto);
+
+  elf_end (elf);
+  close (fd);
+}
+
+#endif
diff --git a/gcc/config.in b/gcc/config.in
index 4576de0..d141e6a 100644
--- a/gcc/config.in
+++ b/gcc/config.in
@@ -1017,6 +1017,12 @@ 
 #endif
 
 
+/* Define to 1 if you have the <gelf.h> header file. */
+#ifndef USED_FOR_TARGET
+#undef HAVE_GELF_H
+#endif
+
+
 /* Define to 1 if you have the `getchar_unlocked' function. */
 #ifndef USED_FOR_TARGET
 #undef HAVE_GETCHAR_UNLOCKED
@@ -1215,6 +1221,12 @@ 
 #endif
 
 
+/* Define to 1 if you have the <libelf.h> header file. */
+#ifndef USED_FOR_TARGET
+#undef HAVE_LIBELF_H
+#endif
+
+
 /* Define to 1 if you have the <limits.h> header file. */
 #ifndef USED_FOR_TARGET
 #undef HAVE_LIMITS_H
diff --git a/gcc/configure b/gcc/configure
index bceffd6..4b30df4 100755
--- a/gcc/configure
+++ b/gcc/configure
@@ -7870,7 +7870,8 @@  fi
 for ac_header in limits.h stddef.h string.h strings.h stdlib.h time.h iconv.h \
 		 fcntl.h unistd.h sys/file.h sys/time.h sys/mman.h \
 		 sys/resource.h sys/param.h sys/times.h sys/stat.h \
-		 direct.h malloc.h langinfo.h ldfcn.h locale.h wchar.h
+		 direct.h malloc.h langinfo.h ldfcn.h locale.h wchar.h \
+                 libelf.h gelf.h
 do :
   as_ac_Header=`$as_echo "ac_cv_header_$ac_header" | $as_tr_sh`
 ac_fn_c_check_header_preproc "$LINENO" "$ac_header" "$as_ac_Header"
@@ -8209,7 +8210,7 @@  if test "${gcc_cv_collect2_libs+set}" = set; then :
   $as_echo_n "(cached) " >&6
 else
   save_LIBS="$LIBS"
-for libs in '' -lld -lmld \
+for libs in '' -lelf -lld -lmld \
 		'-L/usr/lib/cmplrs/cc2.11 -lmld' \
 		'-L/usr/lib/cmplrs/cc3.11 -lmld'
 do
@@ -8238,6 +8239,31 @@  fi
 rm -f core conftest.err conftest.$ac_objext \
     conftest$ac_exeext conftest.$ac_ext
 done
+LIBS=-lelf
+test -z "$gcc_cv_collect2_libs" &&
+cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+
+/* Override any GCC internal prototype to avoid an error.
+   Use char because int might match the return type of a GCC
+   builtin and then its argument prototype would still apply.  */
+#ifdef __cplusplus
+extern "C"
+#endif
+char elf_version ();
+int
+main ()
+{
+return elf_version ();
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_c_try_link "$LINENO"; then :
+  gcc_cv_collect2_libs=-lelf
+fi
+rm -f core conftest.err conftest.$ac_objext \
+    conftest$ac_exeext conftest.$ac_ext
 LIBS="$save_LIBS"
 test -z "$gcc_cv_collect2_libs" && gcc_cv_collect2_libs='none required'
 fi
diff --git a/gcc/configure.ac b/gcc/configure.ac
index 1300f82..a567619 100644
--- a/gcc/configure.ac
+++ b/gcc/configure.ac
@@ -893,7 +893,8 @@  AC_HEADER_SYS_WAIT
 AC_CHECK_HEADERS(limits.h stddef.h string.h strings.h stdlib.h time.h iconv.h \
 		 fcntl.h unistd.h sys/file.h sys/time.h sys/mman.h \
 		 sys/resource.h sys/param.h sys/times.h sys/stat.h \
-		 direct.h malloc.h langinfo.h ldfcn.h locale.h wchar.h)
+		 direct.h malloc.h langinfo.h ldfcn.h locale.h wchar.h \
+                 libelf.h gelf.h)
 
 # Check for thread headers.
 AC_CHECK_HEADER(thread.h, [have_thread_h=yes], [have_thread_h=])
@@ -912,7 +913,7 @@  AC_C_BIGENDIAN
 # We may need a special search path to get them linked.
 AC_CACHE_CHECK(for collect2 libraries, gcc_cv_collect2_libs,
 [save_LIBS="$LIBS"
-for libs in '' -lld -lmld \
+for libs in '' -lelf -lld -lmld \
 		'-L/usr/lib/cmplrs/cc2.11 -lmld' \
 		'-L/usr/lib/cmplrs/cc3.11 -lmld'
 do
@@ -920,6 +921,9 @@  do
 	AC_TRY_LINK_FUNC(ldopen,
 		[gcc_cv_collect2_libs="$libs"; break])
 done
+LIBS=-lelf
+test -z "$gcc_cv_collect2_libs" && 
+AC_TRY_LINK_FUNC(elf_version, [gcc_cv_collect2_libs=-lelf])
 LIBS="$save_LIBS"
 test -z "$gcc_cv_collect2_libs" && gcc_cv_collect2_libs='none required'])
 case $gcc_cv_collect2_libs in
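
For reference on the is_ar helper in the collect2.c hunk: the global header of an ar archive is the eight bytes "!<arch>" followed by a newline. The following is a minimal standalone sketch of that check, not the patch's code. It uses stdio rather than read/lseek so that it stays self-contained, it restores the previous stream position instead of always rewinding to offset 0, and the names stream_is_ar, AR_MAGIC and AR_MAGIC_LEN are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Global header of an ar archive: "!<arch>" plus a newline.  */
#define AR_MAGIC "!<arch>\n"
#define AR_MAGIC_LEN (sizeof AR_MAGIC - 1)

/* Return true if the seekable stream FP starts with the ar magic.
   The stream position is restored before returning.  */
static bool
stream_is_ar (FILE *fp)
{
  char buf[AR_MAGIC_LEN];
  long saved;
  bool isar = false;

  assert (fp != NULL);

  saved = ftell (fp);
  if (saved < 0 || fseek (fp, 0L, SEEK_SET) != 0)
    return false;

  /* A short read means the file cannot be an archive.  */
  if (fread (buf, 1, AR_MAGIC_LEN, fp) == AR_MAGIC_LEN)
    isar = memcmp (buf, AR_MAGIC, AR_MAGIC_LEN) == 0;

  fseek (fp, saved, SEEK_SET);
  return isar;
}
```

A file that is shorter than eight bytes, or whose eighth byte is anything other than a newline, is reported as not being an archive.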