[build] Use elfdump, readelf in make_sunver.pl

Message ID ydd7hg1tgnw.fsf@manam.CeBiTec.Uni-Bielefeld.DE
State New

Commit Message

Rainer Orth Nov. 25, 2010, 6:13 p.m. UTC
The use of nm in contrib/make_sunver.pl has caused problems without end,
both due to the different and sometimes hard to parse output formats of
GNU nm and Sun nm, and lately due to the inability to reliably detect
and ignore hidden symbols in the symbol table:

While trying to use CVS gas and gld in a Solaris 11 bootstrap, it could
happen that global hidden symbols in libgcj.so input objects were turned
global with default visibility and thus caused link failures later.
While Sun nm can show hidden symbols in its default (without -P)
output format, GNU nm cannot.

So I've finally chosen to bite the bullet and move away from nm here:
I'd like to avoid requiring users to install GNU binutils if possible,
so I'm using elfdump -s in the native case.  For the cross case, I chose
to use readelf -s instead since the objdump -t output is useless for
automatic consumption.
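
To illustrate the filtering this relies on, here is a minimal sketch of
classifying a single row of readelf -s -W output.  It is not the code from
the patch: the helper name exported_symbol and the sample rows are
hypothetical and hand-written, and only the column order (Num, Value, Size,
Type, Bind, Vis, Ndx, Name) is taken from the patch below.

```perl
use strict;
use warnings;

# Return the symbol name if the row describes a defined, non-local,
# non-hidden symbol; return undef for anything that should be skipped.
# (Hypothetical helper, for illustration only.)
sub exported_symbol {
    my ($line) = @_;
    # Columns: Num Value Size Type Bind Vis Ndx Name
    my (undef, undef, undef, undef, $bind, $vis, $ndx, $name) = split ' ', $line;
    return undef unless defined $name;    # header, blank or short line
    return undef if $bind eq 'LOCAL';     # local symbol
    return undef if $vis eq 'HIDDEN';     # hidden symbol
    return undef if $ndx eq 'UND';        # undefined symbol
    return $name;
}

# Hand-written rows in the general shape readelf prints (assumed data).
my @sample = (
    '     5: 0000000000001000    12 FUNC    GLOBAL DEFAULT   12 foo',
    '     6: 0000000000001010    12 FUNC    GLOBAL HIDDEN    12 bar',
    '     7: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND baz',
    '     8: 0000000000001020     4 OBJECT  LOCAL  DEFAULT   13 qux',
);

my @exported = grep { defined } map { exported_symbol($_) } @sample;
print join(',', @exported), "\n";
```

Only foo survives here: bar is hidden, baz is undefined and qux is local.
Unlike the patch, this sketch skips unexpected rows instead of dying on
them.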

The patch below implements this.  It allowed the bootstrap with CVS gas
and gld to complete successfully, and I've bootstrapped on Solaris 8 to
11 on SPARC and x86, both with Sun as and gas to make sure nothing
broke.  Then, I've compared the old version maps generated using nm
output and the new ones and found them to be identical with the
exception of the two libgcj ones where many hidden symbols are now
correctly omitted.  Finally, I've disabled the use of elfdump and
verified that readelf produced identical output.

This should hopefully put an end to all the problems and fragility
Solaris bootstraps have suffered from make_sunver.pl.

Installed.

	Rainer



2010-11-13  Rainer Orth  <ro@CeBiTec.Uni-Bielefeld.DE>

	* make_sunver.pl: Use elfdump -s to extract symbols if possible,
	readelf -s otherwise.

Comments

Ralf Wildenhues Nov. 27, 2010, 10:47 a.m. UTC | #1
Hi Rainer,

* Rainer Orth wrote on Thu, Nov 25, 2010 at 07:13:55PM CET:
> The use of nm in contrib/make_sunver.pl has caused problems without end,
> both due to the different and sometimes hard to parse output formats of
> GNU nm and Sun nm, and lately due to the inability to reliably detect
> and ignore hidden symbols in the symbol table:
> 
> While trying to use CVS gas and gld in a Solaris 11 bootstrap, it could
> happen that global hidden symbols in libgcj.so input objects were turned
> global with default visibility and thus cause link failures later.
> While Sun nm can show hidden symbols in its default (without -P)
> output format, GNU nm can not.
> 
> So I've finally chosen to bite the bullet and move away from nm here:
> I'd like to avoid requiring users to install GNU binutils if possible,
> so I'm using elfdump -s in the native case.  For the cross case, I chose
> to use readelf -s instead since the objdump -t output is useless for
> automatic consumption.

Bummer that nm causes such problems.  I wonder whether similar problems
are lurking in libtool already, given that it tries to make do with nm
(plus some per-system arguments) everywhere except on w32.  Do you have
a simple test case, or can you describe one that I could try to use?

> 2010-11-13  Rainer Orth  <ro@CeBiTec.Uni-Bielefeld.DE>
> 
> 	* make_sunver.pl: Use elfdump -s to extract symbols if possible,
> 	readelf -s otherwise.

Doesn't this need an update to lib{gfortran,...}/configure.ac and/or
gcc/doc/install.texi, testing for and/or documenting the requirement
for either elfdump or readelf?

> --- a/contrib/make_sunver.pl	Mon Nov 15 12:59:41 2010 +0100
> +++ b/contrib/make_sunver.pl	Fri Nov 19 15:33:21 2010 +0100
> @@ -12,8 +12,7 @@
>  # A comment with the original pattern and its type is left in the output
>  # file to make it easy to understand the matches.
>  #
> -# It expects a 'nm' with the POSIX '-P' option, but everyone has one of
> -# those, right?
> +# It uses elfdump when present (native), GNU readelf otherwise.
>  # It depends on the GNU version of c++filt, since it must understand the
>  # GNU mangling style.
>  
> @@ -46,35 +45,104 @@
>      }
>  }
>  
> -# The nm command to use.
> -my $nm = $ENV{'NM_FOR_TARGET'} || "nm";
> +# We need to detect and ignore hidden symbols.  Solaris nm can only detect
> +# this in the harder to parse default output format, and GNU nm not at all,
> +# so use elfdump -s in the native case and GNU readelf -s otherwise.
> +# GNU objdump -t cannot be used since it produces a variable number of
> +# columns.
>  
> -# Process each symbol.
> -open NM,$nm.' -P '.(join ' ',@OBJECTS).'|' or die $!;
> -while (<NM>) {
> -    my $i;
> -    chomp;
> +# The path to elfdump.
> +my $elfdump = "/usr/ccs/bin/elfdump";

> +if (-f $elfdump) {

Doesn't this break cross-compilation, potentially using a build tool for
a target file?

> +    open ELFDUMP,$elfdump.' -s '.(join ' ',@OBJECTS).'|' or die $!;
> +    my $skip_arsym = 0;

> +    while (<ELFDUMP>) {
> +	chomp;
[...]

Thanks,
Ralf

Rainer Orth Nov. 29, 2010, 11:43 a.m. UTC | #2
Hi Ralf,

> Bummer that nm makes such problems.  I wonder whether similar problems
> are lurking in libtool already, given that it tries to make do with nm
> (plus some per-system arguments) everywhere except on w32.  Do you have
> a simple test case, or can describe one that I could try to use?

It very much depends on how you parse nm output, and on whether you try
to match all lines or ignore the ones you don't recognize.  Problems were
e.g. caused by register symbols being included in Sun (or GNU, I don't
really remember) output on SPARC, which make_sunver.pl didn't know about.
The worst problem lately, though, was the fact that collect2 with -flto
tries to parse nm output, but simply ignores it if it isn't in GNU nm
format (e.g. the default Sun/System V nm output).
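
The difference between matching every line and silently ignoring unknown
ones can be sketched as follows.  This is an illustration only: the
function classify_nm_line and the sample rows are hypothetical, and the
regular expressions are borrowed from the nm -P handling that the patch
removes from make_sunver.pl.

```perl
use strict;
use warnings;

# Classify one line of POSIX-format (nm -P) output.
# Returns ['skip'], ['sym', $name] or ['unknown', $line].
# (Hypothetical helper, for illustration only.)
sub classify_nm_line {
    my ($line) = @_;
    return ['skip'] if $line =~ /^$/ or $line =~ /:$/;       # blank, object header
    return ['skip'] if $line =~ /^ /;                         # entry without a name
    return ['skip'] if $line =~ /^[^ ]+[ \t]+[Ua-z][ \t]+/;   # undefined or local
    return ['sym', $1] if $line =~ /^([^ ]+)[ \t]+[A-Z][ \t]+/;
    return ['unknown', $line];                                # not nm -P format
}

# Hand-written sample input (assumed data); the last row stands in for
# output in some other layout that the parser does not understand.
my @sample = (
    'libfoo.o:',
    'foo T 1000 c',
    'qux t 1020 4',
    'foo|1000|T|FUNC',
);

my (%sym_hash, @unknown);
for my $line (@sample) {
    my ($kind, $value) = @{ classify_nm_line($line) };
    $sym_hash{$value}++ if $kind eq 'sym';
    push @unknown, $value if $kind eq 'unknown';
}
printf "%d symbol(s), %d unrecognized line(s)\n",
    scalar(keys %sym_hash), scalar(@unknown);
```

A strict consumer would die as soon as @unknown is non-empty, while a
lenient one would carry on with whatever it managed to match; the second
behaviour is what hides format mismatches.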

>> 2010-11-13  Rainer Orth  <ro@CeBiTec.Uni-Bielefeld.DE>
>> 
>> 	* make_sunver.pl: Use elfdump -s to extract symbols if possible,
>> 	readelf -s otherwise.
>
> Doesn't this need an update to lib{gfortran,...}/configure.ac and/or
> gcc/doc/install.texi, either testing and/or documenting the requirement
> for either elfdump or readelf?

I don't think so: elfdump is in the base system, and I don't think we
specifically document requirements for ar, nm or ld.  readelf is only
needed for cross compilation, and you will always need a cross toolchain
in that case, so there's nothing new either.

>> +# The path to elfdump.
>> +my $elfdump = "/usr/ccs/bin/elfdump";
>
>> +if (-f $elfdump) {
>
> Doesn't this break cross-compilation, potentially using a build tool for
> a target file?

No, make_sunver.pl is only used for *-*-solaris2* targets, and
Solaris/x86 elfdump can deal with Solaris/SPARC objects just fine.
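
As a rough sketch of that selection logic (not the code from the patch,
which opens the pipes inline): the helper name symbol_dump_command is
hypothetical, while the elfdump path and the command-line options are the
ones used in the patch.

```perl
use strict;
use warnings;

# Build the command used to dump symbol tables: elfdump -s if the given
# elfdump binary exists, readelf -s -W otherwise.
# (Hypothetical helper, for illustration only.)
sub symbol_dump_command {
    my ($elfdump, @objects) = @_;
    if (-f $elfdump) {
        return ($elfdump, '-s', @objects);
    }
    return ('readelf', '-s', '-W', @objects);
}

# The result depends on whether the file exists on the build machine.
my @cmd = symbol_dump_command('/usr/ccs/bin/elfdump', 'a.o', 'b.o');
print join(' ', @cmd), "\n";
```

On a Solaris build machine this picks elfdump regardless of whether the
target is SPARC or x86; elsewhere it falls back to readelf.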

	Rainer

Patch

diff -r e93a180fd196 contrib/make_sunver.pl
--- a/contrib/make_sunver.pl	Mon Nov 15 12:59:41 2010 +0100
+++ b/contrib/make_sunver.pl	Fri Nov 19 15:33:21 2010 +0100
@@ -12,8 +12,7 @@ 
 # A comment with the original pattern and its type is left in the output
 # file to make it easy to understand the matches.
 #
-# It expects a 'nm' with the POSIX '-P' option, but everyone has one of
-# those, right?
+# It uses elfdump when present (native), GNU readelf otherwise.
 # It depends on the GNU version of c++filt, since it must understand the
 # GNU mangling style.
 
@@ -46,35 +45,104 @@ 
     }
 }
 
-# The nm command to use.
-my $nm = $ENV{'NM_FOR_TARGET'} || "nm";
+# We need to detect and ignore hidden symbols.  Solaris nm can only detect
+# this in the harder to parse default output format, and GNU nm not at all,
+# so use elfdump -s in the native case and GNU readelf -s otherwise.
+# GNU objdump -t cannot be used since it produces a variable number of
+# columns.
 
-# Process each symbol.
-open NM,$nm.' -P '.(join ' ',@OBJECTS).'|' or die $!;
-while (<NM>) {
-    my $i;
-    chomp;
+# The path to elfdump.
+my $elfdump = "/usr/ccs/bin/elfdump";
 
-    # nm prints out stuff at the start, ignore it.
-    next if (/^$/);
-    next if (/:$/);
-    # Ignore entries without symbol name.  Sun nm emits those for local, .bss
-    # or scratch register (SPARC only) symbols for example.
-    next if (/^ /);
-    # Ignore undefined and local symbols.
-    next if (/^[^ ]+[ \t]+[Ua-z][ \t]+/);
-    # Ignore objects without symbol table.  Message goes to stdout with Sun
-    # nm, while GNU nm emits the corresponding message to stderr.
-    next if (/.* - No symbol table data/);
+if (-f $elfdump) {
+    open ELFDUMP,$elfdump.' -s '.(join ' ',@OBJECTS).'|' or die $!;
+    my $skip_arsym = 0;
 
-    # $sym is the name of the symbol.
-    die "unknown nm output $_" if (! /^([^ ]+)[ \t]+[A-Z][ \t]+/);
-    my $sym = $1;
+    while (<ELFDUMP>) {
+	chomp;
 
-    # Remember symbol.
-    $sym_hash{$sym}++;
+	# Ignore empty lines.
+	if (/^$/) {
+	    # End of archive symbol table, stop skipping.
+	    $skip_arsym = 0 if $skip_arsym;
+	    next;
+	}
+
+	# Keep skipping until end of archive symbol table.
+	next if ($skip_arsym);
+
+	# Ignore object name header for individual objects and archives.
+	next if (/:$/);
+
+	# Ignore table header lines.
+	next if (/^Symbol Table Section:/);
+	next if (/index.*value.*size/);
+
+	# Start of archive symbol table: start skipping.
+	if (/^Symbol Table: \(archive/) {
+	    $skip_arsym = 1;
+	    next;
+	}
+
+	# Split table.
+	(undef, undef, undef, undef, $bind, $oth, undef, $shndx, $name) = split;
+
+	# Error out for unknown input.
+	die "unknown input line:\n$_" unless defined($bind);
+
+	# Ignore local symbols.
+	next if ($bind eq "LOCL");
+	# Ignore hidden symbols.
+	next if ($oth eq "H");
+	# Ignore undefined symbols.
+	next if ($shndx eq "UNDEF");
+	# Error out for unhandled cases.
+	if ($bind !~ /^(GLOB|WEAK)/ or $oth ne "D") {
+	    die "unhandled symbol:\n$_";
+	}
+
+	# Remember symbol.
+	$sym_hash{$name}++;
+    }
+    close ELFDUMP or die "$elfdump error";
+} else {
+    open READELF, 'readelf -s -W '.(join ' ',@OBJECTS).'|' or die $!;
+    # Process each symbol.
+    while (<READELF>) {
+	chomp;
+
+	# Ignore empty lines.
+	next if (/^$/);
+
+	# Ignore object name header.
+	next if (/^File: .*$/);
+
+	# Ignore table header lines.
+	next if (/^Symbol table.*contains.*:/);
+	next if (/Num:.*Value.*Size/);
+
+	# Split table.
+	(undef, undef, undef, undef, $bind, $vis, $ndx, $name) = split;
+
+	# Error out for unknown input.
+	die "unknown input line:\n$_" unless defined($bind);
+
+	# Ignore local symbols.
+	next if ($bind eq "LOCAL");
+	# Ignore hidden symbols.
+	next if ($vis eq "HIDDEN");
+	# Ignore undefined symbols.
+	next if ($ndx eq "UND");
+	# Error out for unhandled cases.
+	if ($bind !~ /^(GLOBAL|WEAK)/ or $vis ne "DEFAULT") {
+	    die "unhandled symbol:\n$_";
+	}
+
+	# Remember symbol.
+	$sym_hash{$name}++;
+    }
+    close READELF or die "readelf error";
 }
-close NM or die "nm error";
 
 ##########
 # The various types of glob patterns.