diff mbox

[RFC,2/9] support/scripts: add fix-rpath script to sanitize the rpath

Message ID 1488550733-3956-3-git-send-email-wg@grandegger.com
State Superseded
Headers show

Commit Message

Wolfgang Grandegger March 3, 2017, 2:18 p.m. UTC
From: Samuel Martin <s.martin49@gmail.com>

This commit introduces a fix-rpath script able to scan a tree,
detect ELF files, check their RPATH and fix it in a proper way.
The RPATH fixup is done by the patchelf utility using the option
"--make-rpath-relative".

Signed-off-by: Samuel Martin <s.martin49@gmail.com>
Signed-off-by: Wolfgang Grandegger <wg@grandegger.com>
---
 support/scripts/fix-rpath | 106 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 106 insertions(+)
 create mode 100755 support/scripts/fix-rpath

Comments

Thomas Petazzoni March 3, 2017, 2:48 p.m. UTC | #1
Hello,

On Fri,  3 Mar 2017 15:18:46 +0100, Wolfgang Grandegger wrote:
> From: Samuel Martin <s.martin49@gmail.com>
> 
> This commit introduces a fix-rpath script able to scan a tree,
> detect ELF files, check their RPATH and fix it in a proper way.
> The RPATH fixup is done by the patchelf utility using the option
> "--make-rpath-relative".
> 
> Signed-off-by: Samuel Martin <s.martin49@gmail.com>
> Signed-off-by: Wolfgang Grandegger <wg@grandegger.com>
> ---
>  support/scripts/fix-rpath | 106 ++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 106 insertions(+)
>  create mode 100755 support/scripts/fix-rpath

Now that patchelf is much smarter, I am wondering if we still really
need a utility script just to call patchelf. Have you tried bringing
this logic into the make logic?

Thanks,

Thomas
Wolfgang Grandegger March 3, 2017, 4:39 p.m. UTC | #2
Hello,

Am 03.03.2017 um 15:48 schrieb Thomas Petazzoni:
> Hello,
>
> On Fri,  3 Mar 2017 15:18:46 +0100, Wolfgang Grandegger wrote:
>> From: Samuel Martin <s.martin49@gmail.com>
>>
>> This commit introduces a fix-rpath script able to scan a tree,
>> detect ELF files, check their RPATH and fix it in a proper way.
>> The RPATH fixup is done by the patchelf utility using the option
>> "--make-rpath-relative".
>>
>> Signed-off-by: Samuel Martin <s.martin49@gmail.com>
>> Signed-off-by: Wolfgang Grandegger <wg@grandegger.com>
>> ---
>>  support/scripts/fix-rpath | 106 ++++++++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 106 insertions(+)
>>  create mode 100755 support/scripts/fix-rpath
>
> Now that patchelf is much smarter, I am wondering if we still really
> need a utility script just to call patchelf. Have you tried bringing
> this logic into the make logic?

Well, I think three magic "find" oneliners could do the job. I will try 
and see.

Wolfgang.
Arnout Vandecappelle March 4, 2017, 11:39 a.m. UTC | #3
Hi Wolfgang,

 I realize that this patch is going to be dropped because it is going to be done
directly from the makefiles, but I have some hopefully useful comments still.

 First of all: make sure the original author is explicitly in Cc. git send-email
normally does that automatically so I'm surprised Samuel isn't there.

On 03-03-17 15:18, Wolfgang Grandegger wrote:
> From: Samuel Martin <s.martin49@gmail.com>
> 
> This commit introduces a fix-rpath script able to scan a tree,
> detect ELF files, check their RPATH and fix it in a proper way.
> The RPATH fixup is done by the patchelf utility using the option
> "--make-rpath-relative".
> 
> Signed-off-by: Samuel Martin <s.martin49@gmail.com>
> Signed-off-by: Wolfgang Grandegger <wg@grandegger.com>
> ---
>  support/scripts/fix-rpath | 106 ++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 106 insertions(+)
>  create mode 100755 support/scripts/fix-rpath
> 
> diff --git a/support/scripts/fix-rpath b/support/scripts/fix-rpath
> new file mode 100755
> index 0000000..bde2c17
> --- /dev/null
> +++ b/support/scripts/fix-rpath
> @@ -0,0 +1,106 @@
> +#!/usr/bin/env bash
> +
> +# Copyright (C) 2016 Samuel Martin <s.martin49@gmail.com>
> +#
> +# This program is free software; you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License as published by
> +# the Free Software Foundation; either version 2 of the License, or
> +# (at your option) any later version.
> +#
> +# This program is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> +# General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program; if not, write to the Free Software
> +# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
> +
> +#set -e
> +#set -v

 Don't include debugging cruft in the final patch.

> +
> +usage() {
> +  cat <<EOF >&2
> +Usage:  ${0} TREE_KIND TREE_ROOT
> +
> +Description:
> +
> +        This script scans a tree and sanitize ELF files' RPATH found in there.
> +
> +        Sanitization behaves the same whatever the kindd of the processed tree, but
> +        the resulting RPATH differs.
> +
> +        Sanitization action:
> +        - remove RPATH pointing outside of the tree
> +        - for RPATH pointing in the tree:
> +          - if they point to standard location (/lib, /usr/lib): remove them
> +          - otherwise: make them relative using \$ORIGIN
> +
> +        For the target tree:
> +        - scan the whole tree for sanitization
> +
> +        For the staging tree :
> +        - scan the whole tree for sanitization
> +
> +        For the host tree:
> +        - skip the staging tree for sanitization
> +        - add \$HOST_DIR/{lib,usr/lib} to RPATH (as relative pathes)
> +
> +Arguments:
> +
> +        TREE_KIND   Kind of tree to be processed.
> +                    Allowed values: host, target, staging
> +
> +        TREE_ROOT   Path to the root of the tree to be scaned
> +
> +Environment:
> +
> +        PATCHELF        patchelf program to use
> +                        (default: patchelf)
> +EOF
> +}
> +
> +: ${PATCHELF:=patchelf}
> +
> +main() {
> +    local tree="${1}"
> +    local basedir="$(readlink -f "${2}")"
> +
> +    local find_args=( "${basedir}" )
> +    local sanitize_extra_args=( )
> +
> +    case "${tree}" in
> +        host)
> +            # do not process the sysroot (only contains target binaries)
> +            find_args+=( "-name" "sysroot" "-prune" "-o" )

 I believe we should specify a full path here. If there is any other directory
in the tree that happens to be called 'sysroot' for whatever reason, it
shouldn't be pruned. When this is done directly from make, it's pretty easy:
-path $(STAGING_DIR) -prune


> +
> +            # do not process the external toolchain installation directory to
> +            # to avoid breaking it.
> +            find_args+=( "-path" "*/opt/ext-toolchain" "-prune" "-o" )

 Same here: $(TOOLCHAIN_EXTERNAL_DOWNLOAD_INSTALL_DIR) which has the full path.


 Regards,
 Arnout

> +
> +            ;;
> +        staging|target)
> +            # discard ${hostdir}/lib and ${hostdir}/usr/lib
> +            sanitize_extra_args+=( "--no-standard-lib-dirs" )
> +
> +            ;;
> +        *)
> +            usage
> +            exit 1
> +            ;;
> +    esac
> +
> +    find_args+=( "-type" "f" "-print" )
> +
> +    while read file ; do
> +        # some files are not writable
> +        chmod u+w $file
> +        # call patchelf for each regular file; error will silently be ignored.
> +        ${PATCHELF} --debug --make-rpath-relative "${basedir}" ${sanitize_extra_args[@]} "${file}" >> /dev/null 2>&1
> +    done < <(find ${find_args[@]})
> +
> +    # ignore errors
> +    return 0
> +}
> +
> +main ${@}
>
Wolfgang Grandegger March 4, 2017, 3:29 p.m. UTC | #4
Hello Arnout,

Am 04.03.2017 um 12:39 schrieb Arnout Vandecappelle:
>  Hi Wolfgang,
>
>  I realize that this patch is going to be dropped because it is going to be done
> directly from the makefiles, but I have some hopefully useful comments still.

If the logic is not too complex.

>
>  First of all: make sure the original author is explicitly in Cc. git send-email
> normally does that automatically so I'm surprised Samuel isn't there.

I think the mailing-list is stripping the CC. On the mail I got, Samuel 
is on CC. I did use "git send-email" with an "-cc" to Samuel.

> On 03-03-17 15:18, Wolfgang Grandegger wrote:
>> From: Samuel Martin <s.martin49@gmail.com>
>>
>> This commit introduces a fix-rpath script able to scan a tree,
>> detect ELF files, check their RPATH and fix it in a proper way.
>> The RPATH fixup is done by the patchelf utility using the option
>> "--make-rpath-relative".
>>
>> Signed-off-by: Samuel Martin <s.martin49@gmail.com>
>> Signed-off-by: Wolfgang Grandegger <wg@grandegger.com>
>> ---
>>  support/scripts/fix-rpath | 106 ++++++++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 106 insertions(+)
>>  create mode 100755 support/scripts/fix-rpath
>>
>> diff --git a/support/scripts/fix-rpath b/support/scripts/fix-rpath
>> new file mode 100755
>> index 0000000..bde2c17
>> --- /dev/null
>> +++ b/support/scripts/fix-rpath
>> @@ -0,0 +1,106 @@
>> +#!/usr/bin/env bash
>> +
>> +# Copyright (C) 2016 Samuel Martin <s.martin49@gmail.com>
>> +#
>> +# This program is free software; you can redistribute it and/or modify
>> +# it under the terms of the GNU General Public License as published by
>> +# the Free Software Foundation; either version 2 of the License, or
>> +# (at your option) any later version.
>> +#
>> +# This program is distributed in the hope that it will be useful,
>> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
>> +# General Public License for more details.
>> +#
>> +# You should have received a copy of the GNU General Public License
>> +# along with this program; if not, write to the Free Software
>> +# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
>> +
>> +#set -e
>> +#set -v
>
>  Don't include debugging cruft in the final patch.

I agree!

>> +
>> +usage() {
>> +  cat <<EOF >&2
>> +Usage:  ${0} TREE_KIND TREE_ROOT
>> +
>> +Description:
>> +
>> +        This script scans a tree and sanitize ELF files' RPATH found in there.
>> +
>> +        Sanitization behaves the same whatever the kindd of the processed tree, but
>> +        the resulting RPATH differs.
>> +
>> +        Sanitization action:
>> +        - remove RPATH pointing outside of the tree
>> +        - for RPATH pointing in the tree:
>> +          - if they point to standard location (/lib, /usr/lib): remove them
>> +          - otherwise: make them relative using \$ORIGIN
>> +
>> +        For the target tree:
>> +        - scan the whole tree for sanitization
>> +
>> +        For the staging tree :
>> +        - scan the whole tree for sanitization
>> +
>> +        For the host tree:
>> +        - skip the staging tree for sanitization
>> +        - add \$HOST_DIR/{lib,usr/lib} to RPATH (as relative pathes)
>> +
>> +Arguments:
>> +
>> +        TREE_KIND   Kind of tree to be processed.
>> +                    Allowed values: host, target, staging
>> +
>> +        TREE_ROOT   Path to the root of the tree to be scaned
>> +
>> +Environment:
>> +
>> +        PATCHELF        patchelf program to use
>> +                        (default: patchelf)
>> +EOF
>> +}
>> +
>> +: ${PATCHELF:=patchelf}
>> +
>> +main() {
>> +    local tree="${1}"
>> +    local basedir="$(readlink -f "${2}")"
>> +
>> +    local find_args=( "${basedir}" )
>> +    local sanitize_extra_args=( )
>> +
>> +    case "${tree}" in
>> +        host)
>> +            # do not process the sysroot (only contains target binaries)
>> +            find_args+=( "-name" "sysroot" "-prune" "-o" )
>
>  I believe we should specify a full path here. If there is any other directory
> in the tree that happens to be called 'sysroot' for whatever reason, it
> shouldn't be pruned. When this is done directly from make, it's pretty easy:
> -path $(STAGING_DIR) -prune

OK.

>
>> +
>> +            # do not process the external toolchain installation directory to
>> +            # to avoid breaking it.
>> +            find_args+=( "-path" "*/opt/ext-toolchain" "-prune" "-o" )
>
>  Same here: $(TOOLCHAIN_EXTERNAL_DOWNLOAD_INSTALL_DIR) which has the full path.

OK.

>
>  Regards,
>  Arnout
>
>> +
>> +            ;;
>> +        staging|target)
>> +            # discard ${hostdir}/lib and ${hostdir}/usr/lib
>> +            sanitize_extra_args+=( "--no-standard-lib-dirs" )
>> +
>> +            ;;
>> +        *)
>> +            usage
>> +            exit 1
>> +            ;;
>> +    esac
>> +
>> +    find_args+=( "-type" "f" "-print" )
>> +
>> +    while read file ; do
>> +        # some files are not writable
>> +        chmod u+w $file

This is just a hack because some elf files are not writable. To make all 
regular file writable is wrong. We may also want to reset the original 
permission. It needs more logic, unfortunately.

>> +        # call patchelf for each regular file; error will silently be ignored.
>> +        ${PATCHELF} --debug --make-rpath-relative "${basedir}" ${sanitize_extra_args[@]} "${file}" >> /dev/null 2>&1
>> +    done < <(find ${find_args[@]})
>> +
>> +    # ignore errors

Simply ignoring errors here is fast, but it also does not allow to print 
warnings.

Wolfgang.
Wolfgang Grandegger March 6, 2017, 9:07 a.m. UTC | #5
Hello,

Am 03.03.2017 um 17:39 schrieb Wolfgang Grandegger:
> Hello,
> 
> Am 03.03.2017 um 15:48 schrieb Thomas Petazzoni:
>> Hello,
>>
>> On Fri,  3 Mar 2017 15:18:46 +0100, Wolfgang Grandegger wrote:
>>> From: Samuel Martin <s.martin49@gmail.com>
>>>
>>> This commit introduces a fix-rpath script able to scan a tree,
>>> detect ELF files, check their RPATH and fix it in a proper way.
>>> The RPATH fixup is done by the patchelf utility using the option
>>> "--make-rpath-relative".
>>>
>>> Signed-off-by: Samuel Martin <s.martin49@gmail.com>
>>> Signed-off-by: Wolfgang Grandegger <wg@grandegger.com>
>>> ---
>>>  support/scripts/fix-rpath | 106
>>> ++++++++++++++++++++++++++++++++++++++++++++++
>>>  1 file changed, 106 insertions(+)
>>>  create mode 100755 support/scripts/fix-rpath
>>
>> Now that patchelf is much smarter, I am wondering if we still really
>> need a utility script just to call patchelf. Have you tried bringing
>> this logic into the make logic?
> 
> Well, I think three magic "find" oneliners could do the job. I will try
> and see.

Here are my current oneliners in the Makefile. It mainly shows what needs
to be done. Unfortunately, the scanning for ELF files may take rather
long:

 >>>   Sanitizing RPATH in target and staging directory
 time find /opt/bdo/x86_host/target -type f -print0 | xargs -0 -I {} sh -c \
	"if patchelf --print-rpath {} >/dev/null 2>&1; then chmod +w {}; \
	/opt/bdo/x86_host/host/usr/bin/patchelf --make-rpath-relative /opt/bdo/x86_host/target \
	--no-standard-lib-dirs {}; fi"

  real	0m9.644s
  user	0m0.032s
  sys	0m0.360s

  time find /opt/bdo/x86_host/host/usr/x86_64-buildroot-linux-gnu/sysroot -type f -print0 | xargs -0 -I {} sh -c \
	"if patchelf --print-rpath {} >/dev/null 2>&1; then chmod +w {}; \
	/opt/bdo/x86_host/host/usr/bin/patchelf --make-rpath-relative /opt/bdo/x86_host/host/usr/x86_64-buildroot-linux-gnu/sysroot \
	--no-standard-lib-dirs {}; fi"

  real	0m46.433s
  user	0m0.240s
  sys	0m1.980s

  >>>   Rendering the SDK relocatable
  cp /opt/bdo/x86_host/host/usr/bin/patchelf /opt/bdo/x86_host/host/usr/bin/patchelf.__copy__
  time find /opt/bdo/x86_host/host -path /opt/bdo/x86_host/host/usr/x86_64-buildroot-linux-gnu/sysroot -prune -o \
	-path /opt/bdo/x86_host/host/usr/bin/patchelf -prune -o -type f -print0 | xargs -0 -I {} sh -c \
	"if patchelf --print-rpath {} >/dev/null 2>&1; then chmod +w {}; \
	/opt/bdo/x86_host/host/usr/bin/patchelf --make-rpath-relative /opt/bdo/x86_host/host {}; \
	fi"
  mv /opt/bdo/dcu_host/host/usr/bin/patchelf.__copy__ /opt/bdo/dcu_host/host/usr/bin/patchelf

  real	0m23.154s
  user	0m0.144s
  sys	0m1.124s


Using "file" to test if it's an ELF files is much slower. Using
"patchelf --print-rpath {} >/dev/null 2>&1" is much smarter and allows
to run the path sanitation without ignoring errors.

Because some ELF files are not writeable, we need a chmod first. For the
host tree, we also need special handling for patchelf to work around the
"file in use" issue.

"xargs" could be speedup using "-P NUM". Would that be an option?
We could also introduce BR2_TOOLCHAIN_RELOCATABLE. Any other idea to
speed up the scripts above?

Wolfgang.
Wolfgang Grandegger March 7, 2017, 9:13 a.m. UTC | #6
Am 07.03.2017 um 00:49 schrieb Arnout Vandecappelle:
> 
> 
> On 06-03-17 10:07, Wolfgang Grandegger wrote:
>> Hello,
>>
>> Am 03.03.2017 um 17:39 schrieb Wolfgang Grandegger:
>>> Hello,
>>>
>>> Am 03.03.2017 um 15:48 schrieb Thomas Petazzoni:
>>>> Hello,
>>>>
>>>> On Fri,  3 Mar 2017 15:18:46 +0100, Wolfgang Grandegger wrote:
>>>>> From: Samuel Martin <s.martin49@gmail.com>
>>>>>
>>>>> This commit introduces a fix-rpath script able to scan a tree,
>>>>> detect ELF files, check their RPATH and fix it in a proper way.
>>>>> The RPATH fixup is done by the patchelf utility using the option
>>>>> "--make-rpath-relative".
>>>>>
>>>>> Signed-off-by: Samuel Martin <s.martin49@gmail.com>
>>>>> Signed-off-by: Wolfgang Grandegger <wg@grandegger.com>
>>>>> ---
>>>>>  support/scripts/fix-rpath | 106
>>>>> ++++++++++++++++++++++++++++++++++++++++++++++
>>>>>  1 file changed, 106 insertions(+)
>>>>>  create mode 100755 support/scripts/fix-rpath
>>>>
>>>> Now that patchelf is much smarter, I am wondering if we still really
>>>> need a utility script just to call patchelf. Have you tried bringing
>>>> this logic into the make logic?
>>>
>>> Well, I think three magic "find" oneliners could do the job. I will try
>>> and see.
>>
>> Here are my current oneliners in the Makefile. It mainly shows what needs
>> to be done. 
> 
>  I think they're sufficiently complicated to warrant offloading to a script
> after all. Especially since you need the same structure three times, so it would
> have to be factored into a macro, which makes the makefile more complicated.
> Alternatively, you could keep the find command in make, and use -exec
> support/scripts/fix-rpath '{}' ';'  - this avoids all the tricky xargs stuff.

+1

>> Unfortunately, the scanning for ELF files may take rather
>> long:
>>
>>  >>>   Sanitizing RPATH in target and staging directory
>>  time find /opt/bdo/x86_host/target -type f -print0 | xargs -0 -I {} sh -c \
>> 	"if patchelf --print-rpath {} >/dev/null 2>&1; then chmod +w {}; \
> 
>  You can actually include this in the find command itself, it's almost twice as
> fast:
> 
> find /opt/bdo/x86_host/target -type f \
> 	-exec patchelf --print-rpath '{}' ';' \
> 	-exec  support/scripts/fix-rpath '{}' ';'

OK, it' two times faster, indeed

>> 	/opt/bdo/x86_host/host/usr/bin/patchelf --make-rpath-relative /opt/bdo/x86_host/target \
>> 	--no-standard-lib-dirs {}; fi"
>>
>>   real	0m9.644s
> 
>  Can you quantify which bit is taking the time here? Is it the find itself, or
> the patchelf --print-rpath, or the final patchelf?

The initial ELF file testing takes most of the time. The processing of the ELF
file itself doesn't matter much:

  $ find target -type f | wc -l
  3902
  $ find host/usr/x86_64-buildroot-linux-gnu/sysroot -type f | wc -l
  21413
  $ find host -path host/usr/x86_64-buildroot-linux-gnu/sysroot -prune -o -type f | wc -l
  10861
 
>  Also, how does the time compare to the rest of the finalize step and creating a
> tarball?

The rest of "target-finalize" takes half a second and the tar:

$ time tar cjf /tmp/target.tar.bz2 target
real	0m21.742s

But it depends on the compression, of course.

>>   user	0m0.032s
>>   sys	0m0.360s
>>
>>   time find /opt/bdo/x86_host/host/usr/x86_64-buildroot-linux-gnu/sysroot -type f -print0 | xargs -0 -I {} sh -c \
> 
>  We could optimise this a little by eliminating directories that certainly don't
> contain interesting files, like /usr/include. Or alternatively, explicitly
> select only "\( -path lib -o -path bin -o -path sbin \)".


  $ find host/usr/x86_64-buildroot-linux-gnu/sysroot/usr/include/ -type f | wc -l
  6946
  $ find host/usr/x86_64-buildroot-linux-gnu/sysroot/usr/share/ -type f | wc -l
  9273

But there is one ELF file in ".../usr/share".

>  However, now I think of it: why do we do this for staging? Binaries in staging
> are never executed... Is it just to eliminate all references to HOST_DIR from
> the binaries?

Good question! So far I followed Samuels proposal (now on CC).

>> 	"if patchelf --print-rpath {} >/dev/null 2>&1; then chmod +w {}; \
>> 	/opt/bdo/x86_host/host/usr/bin/patchelf --make-rpath-relative /opt/bdo/x86_host/host/usr/x86_64-buildroot-linux-gnu/sysroot \
>> 	--no-standard-lib-dirs {}; fi"
>>
>>   real	0m46.433s
>>   user	0m0.240s
>>   sys	0m1.980s
>>
>>   >>>   Rendering the SDK relocatable
>>   cp /opt/bdo/x86_host/host/usr/bin/patchelf /opt/bdo/x86_host/host/usr/bin/patchelf.__copy__
> 
>  Why do you need this? To make sure patchelf itself is processed as well? Does
> it contain an invalid rpath? If yes, isn't it easier to patch the patchelf build
> system so it uses $ORIGIN already?

We cannot update the "patchelf" binary while it's in use. There no
need to touch it if it already uses a proper rpath, of course.
Currently it uses:

  $ readelf -d host/usr/bin/patchelf
  ...
  0x0000000000000001 (NEEDED)             Gemeinsame Bibliothek [libstdc++.so.6]
  0x0000000000000001 (NEEDED)             Gemeinsame Bibliothek [libgcc_s.so.1]
  0x0000000000000001 (NEEDED)             Gemeinsame Bibliothek [libc.so.6]
  0x000000000000000f (RPATH)              Bibliothek rpath: [/opt/bdo/dcu_host/host/usr/lib]

"patchelf --make-relative" will drop the rpath above, because the first
two needed libs are not in the listed rpath but in "host/usr/x86_64-buildroot-linux-gnu/lib64".
Running patchelf with "LD_DEBUG" tells me that it will take the libraries
from the host (/usr/lib). Just wondering if that's correct!?

> 
>>   time find /opt/bdo/x86_host/host -path /opt/bdo/x86_host/host/usr/x86_64-buildroot-linux-gnu/sysroot -prune -o \
>> 	-path /opt/bdo/x86_host/host/usr/bin/patchelf -prune -o -type f -print0 | xargs -0 -I {} sh -c \
>> 	"if patchelf --print-rpath {} >/dev/null 2>&1; then chmod +w {}; \
>> 	/opt/bdo/x86_host/host/usr/bin/patchelf --make-rpath-relative /opt/bdo/x86_host/host {}; \
>> 	fi"
>>   mv /opt/bdo/dcu_host/host/usr/bin/patchelf.__copy__ /opt/bdo/dcu_host/host/usr/bin/patchelf
>>
>>   real	0m23.154s
>>   user	0m0.144s
>>   sys	0m1.124s
>>
>>
>> Using "file" to test if it's an ELF files is much slower. Using
>> "patchelf --print-rpath {} >/dev/null 2>&1" is much smarter and allows
>> to run the path sanitation without ignoring errors.
> 
>  readelf -h is still a little faster, but it also matches files without rpath.

Yes, 38 vs. 44 seconds.

> To just check if it's an ELF file, it's actually enough to do "cmp -n 4" with
> another ELF file (e.g. patchelf itself). That gives even more false positives
> (e.g. object files). So if one of those is chosen, a further check with patchelf
> --print-rpath is still needed, or errors have to be ignored.

Yep.

>>
>> Because some ELF files are not writeable, we need a chmod first. 
> 
>  Shouldn't we restore the original permissions?

Maybe! There are actually two libraries not being writeable. Can't tell if
that's by purpose.

>> For the
>> host tree, we also need special handling for patchelf to work around the
>> "file in use" issue.
>>
>> "xargs" could be speedup using "-P NUM". Would that be an option?
> 
>  Yes, we could use PARALLEL_JOBS.

Using -P8 on my i7 quad-core laptop is almost 8 times faster.

Wolfgang.
Wolfgang Grandegger March 8, 2017, 9:25 a.m. UTC | #7
Am 07.03.2017 um 18:40 schrieb Arnout Vandecappelle:
>
>
> On 07-03-17 10:13, Wolfgang Grandegger wrote:
>> Am 07.03.2017 um 00:49 schrieb Arnout Vandecappelle:
>>>
>>>
>>> On 06-03-17 10:07, Wolfgang Grandegger wrote:
>>>> Hello,
>>>>
> [snip]
>>>>  >>>   Sanitizing RPATH in target and staging directory
>>>>  time find /opt/bdo/x86_host/target -type f -print0 | xargs -0 -I {} sh -c \
>>>> 	"if patchelf --print-rpath {} >/dev/null 2>&1; then chmod +w {}; \
>>>
>>>  You can actually include this in the find command itself, it's almost twice as
>>> fast:
>>>
>>> find /opt/bdo/x86_host/target -type f \
>>> 	-exec patchelf --print-rpath '{}' ';' \
>>> 	-exec  support/scripts/fix-rpath '{}' ';'
>>
>> OK, it' two times faster, indeed
>
>  Actually with such a construct, and assuming we don't bother with restoring the
> permissions, a helper shell script isn't needed because we can just chain the
> chmod and patchelf calls with further -exec calls. Of course with -exec, xargs
> -P is not possible so you have to evaluate a little what is the best approach.

"-P" is very effective... but that's fine tuning.

>>>> 	/opt/bdo/x86_host/host/usr/bin/patchelf --make-rpath-relative /opt/bdo/x86_host/target \
>>>> 	--no-standard-lib-dirs {}; fi"
>>>>
>>>>   real	0m9.644s
>>>
>>>  Can you quantify which bit is taking the time here? Is it the find itself, or
>>> the patchelf --print-rpath, or the final patchelf?
>>
>> The initial ELF file testing takes most of the time. The processing of the ELF
>> file itself doesn't matter much:
>>
>>   $ find target -type f | wc -l
>>   3902
>>   $ find host/usr/x86_64-buildroot-linux-gnu/sysroot -type f | wc -l
>>   21413
>>   $ find host -path host/usr/x86_64-buildroot-linux-gnu/sysroot -prune -o -type f | wc -l
>>   10861
>
>  What kind of config is this? Many packages? Or just an internal toolchain?

It's a config including QT5... also doubling the build time .

$ find sysroot/usr/lib/qt/ -type f| wc -l
3355
$ find sysroot/usr/include/qt5/ -type f| wc -l
4168

>>>  Also, how does the time compare to the rest of the finalize step and creating a
>>> tarball?
>>
>> The rest of "target-finalize" takes half a second and the tar:
>>
>> $ time tar cjf /tmp/target.tar.bz2 target
>> real	0m21.742s
>>
>> But it depends on the compression, of course.
>>
>>>>   user	0m0.032s
>>>>   sys	0m0.360s
>>>>
>>>>   time find /opt/bdo/x86_host/host/usr/x86_64-buildroot-linux-gnu/sysroot -type f -print0 | xargs -0 -I {} sh -c \
>>>
>>>  We could optimise this a little by eliminating directories that certainly don't
>>> contain interesting files, like /usr/include. Or alternatively, explicitly
>>> select only "\( -path lib -o -path bin -o -path sbin \)".
>>
>>
>>   $ find host/usr/x86_64-buildroot-linux-gnu/sysroot/usr/include/ -type f | wc -l
>>   6946
>>   $ find host/usr/x86_64-buildroot-linux-gnu/sysroot/usr/share/ -type f | wc -l
>>   9273
>>
>> But there is one ELF file in ".../usr/share".
>
>  Ah yes, I also found one:
> /usr/share/bash-completion/helpers/gst-completion-helper-1.0
>
>  So in that case we can't skip share. We certainly can skip include, I think,
> though the benefit is perhaps limited.
>
>  Of course, if the entire staging can be skipped it's even easier :-)

Yep.

>>>  However, now I think of it: why do we do this for staging? Binaries in staging
>>> are never executed... Is it just to eliminate all references to HOST_DIR from
>>> the binaries?
>>
>> Good question! So far I followed Samuels proposal (now on CC).
>>
>>>> 	"if patchelf --print-rpath {} >/dev/null 2>&1; then chmod +w {}; \
>>>> 	/opt/bdo/x86_host/host/usr/bin/patchelf --make-rpath-relative /opt/bdo/x86_host/host/usr/x86_64-buildroot-linux-gnu/sysroot \
>>>> 	--no-standard-lib-dirs {}; fi"
>>>>
>>>>   real	0m46.433s
>>>>   user	0m0.240s
>>>>   sys	0m1.980s
>>>>
>>>>   >>>   Rendering the SDK relocatable
>>>>   cp /opt/bdo/x86_host/host/usr/bin/patchelf /opt/bdo/x86_host/host/usr/bin/patchelf.__copy__
>>>
>>>  Why do you need this? To make sure patchelf itself is processed as well? Does
>>> it contain an invalid rpath? If yes, isn't it easier to patch the patchelf build
>>> system so it uses $ORIGIN already?
>>
>> We cannot update the "patchelf" binary while it's in use.
>
>  Yes, but that's avoided with the -name patchelf -prune bit.

Yep.

>> There no
>> need to touch it if it already uses a proper rpath, of course.
>> Currently it uses:
>>
>>   $ readelf -d host/usr/bin/patchelf
>>   ...
>>   0x0000000000000001 (NEEDED)             Gemeinsame Bibliothek [libstdc++.so.6]
>>   0x0000000000000001 (NEEDED)             Gemeinsame Bibliothek [libgcc_s.so.1]
>>   0x0000000000000001 (NEEDED)             Gemeinsame Bibliothek [libc.so.6]
>>   0x000000000000000f (RPATH)              Bibliothek rpath: [/opt/bdo/dcu_host/host/usr/lib]
>>
>> "patchelf --make-relative" will drop the rpath above, because the first
>> two needed libs are not in the listed rpath but in "host/usr/x86_64-buildroot-linux-gnu/lib64".
>
>  That's the staging dir - it should certainly NOT use anything from there.

No, the staging dir is "host/usr/x86_64-buildroot-linux-gnu/sysroot/".

>> Running patchelf with "LD_DEBUG" tells me that it will take the libraries
>> from the host (/usr/lib). Just wondering if that's correct!?
>
>  Yes it's correct, those 3 libraries are standard host libraries that can be
> found in the standard paths.

Hm, what are the libraries in 
"host/usr/x86_64-buildroot-linux-gnu/lib64" then good for? They work if 
I use "LD_LIBRARY_PATH" to run the executable.

>  So I've checked where this rpath comes from. Turns out it is added by
> Buildroot, through HOST_LDFLAGS. This is in fact needed to make sure that an
> executable that uses libraries from HOST_DIR works - see commit
> 4fdecac9d692b8d6f071ba6ad938b6ad68b675fd. So we can either:

Shouldn't "host/usr/x86_64-buildroot-linux-gnu/lib64" not treated in a 
similar way? Just wondering! I have attached the debug output of 
patchelf for the host tree.

> - keep this __copy__ manipulation in the patchelf step; or
> - override HOST_LDFLAGS in patchelf.mk.
>
>  I'm cool either way, so since you already have this workaround, just keep it.
> Perhaps a comment above it explaining why would be useful though.

OK.

>>>>   time find /opt/bdo/x86_host/host -path /opt/bdo/x86_host/host/usr/x86_64-buildroot-linux-gnu/sysroot -prune -o \
>>>> 	-path /opt/bdo/x86_host/host/usr/bin/patchelf -prune -o -type f -print0 | xargs -0 -I {} sh -c \
>>>> 	"if patchelf --print-rpath {} >/dev/null 2>&1; then chmod +w {}; \
>>>> 	/opt/bdo/x86_host/host/usr/bin/patchelf --make-rpath-relative /opt/bdo/x86_host/host {}; \
>>>> 	fi"
>>>>   mv /opt/bdo/dcu_host/host/usr/bin/patchelf.__copy__ /opt/bdo/dcu_host/host/usr/bin/patchelf
>>>>
>>>>   real	0m23.154s
>>>>   user	0m0.144s
>>>>   sys	0m1.124s
>>>>
>>>>
>>>> Using "file" to test if it's an ELF files is much slower. Using
>>>> "patchelf --print-rpath {} >/dev/null 2>&1" is much smarter and allows
>>>> to run the path sanitation without ignoring errors.
>>>
>>>  readelf -h is still a little faster, but it also matches files without rpath.
>>
>> Yes, 38 vs. 44 seconds.
>>
>>> To just check if it's an ELF file, it's actually enough to do "cmp -n 4" with
>>> another ELF file (e.g. patchelf itself). That gives even more false positives
>>> (e.g. object files). So if one of those is chosen, a further check with patchelf
>>> --print-rpath is still needed, or errors have to be ignored.
>>
>> Yep.
>
>  To be evaluated if the speedup from using cmp is worth the false positives.

I wrote a little C program just checking the first 4 bytes of the file. 
The saving is 19 vs. 22 seconds. With "readelf" I have similar results. 
patchelf is written in C++... maybe that's the reason why it's slower.

>>>> Because some ELF files are not writeable, we need a chmod first.
>>>
>>>  Shouldn't we restore the original permissions?
>>
>> Maybe! There are actually two libraries not being writeable. Can't tell if
>> that's by purpose.
>
>  Perhaps as an easier way to restore permissions, we could do it in patchelf
> itself: add something like --force that does a chmod if needed, similar to how
> some editors do it.

Good idea!

>  There are actually quite a few packages that install library read-only, e.g.
> Python and openssl. I guess it is on purpose, but I'm not sure if it is important.

I think restoring the permission is the correct solution.

Wolfgang.
Arnout Vandecappelle March 12, 2017, 8:53 p.m. UTC | #8
On 08-03-17 10:25, Wolfgang Grandegger wrote:
> Am 07.03.2017 um 18:40 schrieb Arnout Vandecappelle:
>>
>>
[snip]
>>  So in that case we can't skip share. We certainly can skip include, I think,
>> though the benefit is perhaps limited.
>>
>>  Of course, if the entire staging can be skipped it's even easier :-)
> 
> Yep.

 I propose that you start with the simple option, but don't do it for staging
for the time being. Until Samuel comes with an argument why staging is necessary :-)

[snip]
>>> "patchelf --make-relative" will drop the rpath above, because the first
>>> two needed libs are not in the listed rpath but in
>>> "host/usr/x86_64-buildroot-linux-gnu/lib64".
>>
>>  That's the staging dir - it should certainly NOT use anything from there.
> 
> No, the staging dir is "host/usr/x86_64-buildroot-linux-gnu/sysroot/".

 My bad. Still, it's a cross-directory. It contains libgcc and other stuff from
the cross-compiler. So it is NOT meant to be used by the host binaries.

 It's easier to see when you do actual cross-compilation, then you'll see the
binaries there are for the target, not the host.


> 
>>> Running patchelf with "LD_DEBUG" tells me that it will take the libraries
>>> from the host (/usr/lib). Just wondering if that's correct!?
>>
>>  Yes it's correct, those 3 libraries are standard host libraries that can be
>> found in the standard paths.
> 
> Hm, what are the libraries in "host/usr/x86_64-buildroot-linux-gnu/lib64" then
> good for? They work if I use "LD_LIBRARY_PATH" to run the executable.

 Since your target is x86_64, just like your host, running a target binary will
just work as long as it can find the libraries (i.e. as long as you set
LD_LIBRARY_PATH).

[snip]
>>  To be evaluated if the speedup from using cmp is worth the false positives.
> 
> I wrote a little C program just checking the first 4 bytes of the file. The
> saving is 19 vs. 22 seconds. With "readelf" I have similar results. 

 You mean 19 seconds for the C program vs 22 seconds for cmp? That's what I
would expect indeed because cmp basically does the same. In my measurements,
readelf still was significantly slower - I didn't write down the numbers but
from memory it was like 25%.

> patchelf is
> written in C++... maybe that's the reason why it's slower.

 Well, it's rather because patchelf reads the entire file into a std::vector,
while readelf will just seek to the bits that are requested.



 By the way, did you already post your patches to the patchelf list?

 Regards,
 Arnout


[snip]
Wolfgang Grandegger March 12, 2017, 9:10 p.m. UTC | #9
Am 12.03.2017 um 21:53 schrieb Arnout Vandecappelle:
>
>
> On 08-03-17 10:25, Wolfgang Grandegger wrote:
>> Am 07.03.2017 um 18:40 schrieb Arnout Vandecappelle:
>>>
>>>
> [snip]
>>>  So in that case we can't skip share. We certainly can skip include, I think,
>>> though the benefit is perhaps limited.
>>>
>>>  Of course, if the entire staging can be skipped it's even easier :-)
>>
>> Yep.
>
>  I propose that you start with the simple option, but don't do it for staging
> for the time being. Until Samuel comes with an argument why staging is necessary :-)
>
> [snip]
>>>> "patchelf --make-relative" will drop the rpath above, because the first
>>>> two needed libs are not in the listed rpath but in
>>>> "host/usr/x86_64-buildroot-linux-gnu/lib64".
>>>
>>>  That's the staging dir - it should certainly NOT use anything from there.
>>
>> No, the staging dir is "host/usr/x86_64-buildroot-linux-gnu/sysroot/".
>
>  My bad. Still, it's a cross-directory. It contains libgcc and other stuff from
> the cross-compiler. So it is NOT meant to be used by the host binaries.
>
>  It's easier to see when you do actual cross-compilation, then you'll see the
> binaries there are for the target, not the host.
>
>
>>
>>>> Running patchelf with "LD_DEBUG" tells me that it will take the libraries
>>>> from the host (/usr/lib). Just wondering if that's correct!?
>>>
>>>  Yes it's correct, those 3 libraries are standard host libraries that can be
>>> found in the standard paths.
>>
>> Hm, what are the libraries in "host/usr/x86_64-buildroot-linux-gnu/lib64" then
>> good for? They work if I use "LD_LIBRARY_PATH" to run the executable.
>
>  Since your target is x86_64, just like your host, running a target binary will
> just work as long as it can find the libraries (i.e. as long as you set
> LD_LIBRARY_PATH).
>
> [snip]
>>>  To be evaluated if the speedup from using cmp is worth the false positives.
>>
>> I wrote a little C program just checking the first 4 bytes of the file. The
>> saving is 19 vs. 22 seconds. With "readelf" I have similar results.
>
>  You mean 19 seconds for the C program vs 22 seconds for cmp? That's what I
> would expect indeed because cmp basically does the same. In my measurements,
> readelf still was significantly slower - I didn't write down the numbers but
> from memory it was like 25%.

No, 19 sec for the C program just checking the first 4 bytes. 22 with 
patchelf. And with readelf I think it was close to 19 sec as well. I 
will doublecheck, though.

>
>> patchelf is
>> written in C++... maybe that's the reason why it's slower.
>
>  Well, it's rather because patchelf reads the entire file into a std::vector,
> while readelf will just seek to the bits that are requested.
>
>
>
>  By the way, did you already post your patches to the patchelf list?

I'm currently preparing a new patch series. I did not find a mailing 
list for the patchelf project. Maybe I have to use the GIT hub interface.

Wolfgang.
Arnout Vandecappelle March 13, 2017, 5:08 p.m. UTC | #10
On 12-03-17 22:10, Wolfgang Grandegger wrote:
> Am 12.03.2017 um 21:53 schrieb Arnout Vandecappelle:
>>
>>
[snip]
>>  You mean 19 seconds for the C program vs 22 seconds for cmp? That's what I
>> would expect indeed because cmp basically does the same. In my measurements,
>> readelf still was significantly slower - I didn't write down the numbers but
>> from memory it was like 25%.
> 
> No, 19 sec for the C program just checking the first 4 bytes. 22 with patchelf.
> And with readelf I think it was close to 19 sec as well. I will doublecheck,
> though.

 I think I had a bigger difference, but anyway even 20% isn't really worth it,
probably.

> 
>>
>>> patchelf is
>>> written in C++... maybe that's the reason why it's slower.
>>
>>  Well, it's rather because patchelf reads the entire file into a std::vector,
>> while readelf will just seek to the bits that are requested.
>>
>>
>>
>>  By the way, did you already post your patches to the patchelf list?
> 
> I'm currently preparing a new patch series. I did not find a mailing list for
> the patchelf project. Maybe I have to use the GIT hub interface.

 Yes, github pull request is the way to go.

 Regards,
 Arnout
Wolfgang Grandegger March 14, 2017, 7:36 a.m. UTC | #11
Hello Arnout,

Am 13.03.2017 um 18:08 schrieb Arnout Vandecappelle:
>
>
> On 12-03-17 22:10, Wolfgang Grandegger wrote:
>> Am 12.03.2017 um 21:53 schrieb Arnout Vandecappelle:
>>>
>>>
> [snip]
>>>  You mean 19 seconds for the C program vs 22 seconds for cmp? That's what I
>>> would expect indeed because cmp basically does the same. In my measurements,
>>> readelf still was significantly slower - I didn't write down the numbers but
>>> from memory it was like 25%.
>>
>> No, 19 sec for the C program just checking the first 4 bytes. 22 with patchelf.
>> And with readelf I think it was close to 19 sec as well. I will doublecheck,
>> though.
>
>  I think I had a bigger difference, but anyway even 20% isn't really worth it,
> probably.

Well, I just learned that it's very difficult to do reproducible 
measurements with my laptop due to frequency scaling, boost, 
temperature, etc. :(.

>>
>>>
>>>> patchelf is
>>>> written in C++... maybe that's the reason why it's slower.
>>>
>>>  Well, it's rather because patchelf reads the entire file into a std::vector,
>>> while readelf will just seek to the bits that are requested.
>>>
>>>
>>>
>>>  By the way, did you already post your patches to the patchelf list?
>>
>> I'm currently preparing a new patch series. I did not find a mailing list for
>> the patchelf project. Maybe I have to use the GIT hub interface.
>
>  Yes, github pull request is the way to go.

OK, will do with a CC to this list. More soon...

Wolfgang.
diff mbox

Patch

diff --git a/support/scripts/fix-rpath b/support/scripts/fix-rpath
new file mode 100755
index 0000000..bde2c17
--- /dev/null
+++ b/support/scripts/fix-rpath
@@ -0,0 +1,106 @@ 
+#!/usr/bin/env bash
+
+# Copyright (C) 2016 Samuel Martin <s.martin49@gmail.com>
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+# General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write to the Free Software
+# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+
+#set -e
+#set -v
+
+usage() {
+  cat <<EOF >&2
+Usage:  ${0} TREE_KIND TREE_ROOT
+
+Description:
+
+        This script scans a tree and sanitize ELF files' RPATH found in there.
+
+        Sanitization behaves the same whatever the kindd of the processed tree, but
+        the resulting RPATH differs.
+
+        Sanitization action:
+        - remove RPATH pointing outside of the tree
+        - for RPATH pointing in the tree:
+          - if they point to standard location (/lib, /usr/lib): remove them
+          - otherwise: make them relative using \$ORIGIN
+
+        For the target tree:
+        - scan the whole tree for sanitization
+
+        For the staging tree :
+        - scan the whole tree for sanitization
+
+        For the host tree:
+        - skip the staging tree for sanitization
+        - add \$HOST_DIR/{lib,usr/lib} to RPATH (as relative pathes)
+
+Arguments:
+
+        TREE_KIND   Kind of tree to be processed.
+                    Allowed values: host, target, staging
+
+        TREE_ROOT   Path to the root of the tree to be scaned
+
+Environment:
+
+        PATCHELF        patchelf program to use
+                        (default: patchelf)
+EOF
+}
+
+: ${PATCHELF:=patchelf}
+
+main() {
+    local tree="${1}"
+    local basedir="$(readlink -f "${2}")"
+
+    local find_args=( "${basedir}" )
+    local sanitize_extra_args=( )
+
+    case "${tree}" in
+        host)
+            # do not process the sysroot (only contains target binaries)
+            find_args+=( "-name" "sysroot" "-prune" "-o" )
+
+            # do not process the external toolchain installation directory to
+            # to avoid breaking it.
+            find_args+=( "-path" "*/opt/ext-toolchain" "-prune" "-o" )
+
+            ;;
+        staging|target)
+            # discard ${hostdir}/lib and ${hostdir}/usr/lib
+            sanitize_extra_args+=( "--no-standard-lib-dirs" )
+
+            ;;
+        *)
+            usage
+            exit 1
+            ;;
+    esac
+
+    find_args+=( "-type" "f" "-print" )
+
+    while read file ; do
+        # some files are not writable
+        chmod u+w $file
+        # call patchelf for each regular file; error will silently be ignored.
+        ${PATCHELF} --debug --make-rpath-relative "${basedir}" ${sanitize_extra_args[@]} "${file}" >> /dev/null 2>&1
+    done < <(find ${find_args[@]})
+
+    # ignore errors
+    return 0
+}
+
+main ${@}