diff mbox

[v7,2/9] support/scripts: add fix-rpath script to sanitize the rpath

Message ID 75ca9a71-aaf5-dd21-f71f-4bcfc2880faa@grandegger.com
State Not Applicable
Headers show

Commit Message

Wolfgang Grandegger July 20, 2017, 9:03 a.m. UTC
Am 20.07.2017 um 10:15 schrieb Arnout Vandecappelle:
> 
> 
> On 20-07-17 09:31, Wolfgang Grandegger wrote:
>> Am 20.07.2017 um 01:02 schrieb Arnout Vandecappelle:
>>>
>>>
>>> On 05-07-17 18:53, Wolfgang Grandegger wrote:
> [snip]
>>>> +Environment:
>>>> +
>>>> +    PATCHELF    patchelf program to use
>>>> +        (default: HOST_DIR/bin/patchelf)
>>>
>>>    And also HOST_DIR, TARGET_DIR, STAGING_DIR, and
>>> TOOLCHAIN_EXTERNAL_DOWNLOAD_INSTALL_DIR.
>>
>> PATCHELF points to the patchelf utility! Or what do you mean here?
> 
>   I mean, also these environment variables are used by the script. Moreover, the
> _DIR variables are not optional.

Got it!

> [snip]
>>>> +    target)
>>>> +        rootdir="${TARGET_DIR}"
>>>> +        sanitize_extra_args+=( "--no-standard-lib-dirs" "--relative-to-file" )
>>>
>>>    I thought we decided that target and staging should NOT be relative-to-file,
>>> but should be absolute paths without rootdir? Or was this a problem for staging,
>>> because the linker would add the RUNPATH to the library search path and this
>>> would point to the system libs instead of target libs? According to the man page
>>> this should only be the case for a native linker, so not for a cross linker.
>>
>> Grrr, that's wrong, cut an paste error :(. Fixed!
> 
>   But staging stays the same as target, right?
> 
>   And did you add comments everywhere to explain why these options are chosen?

    case "${tree}" in
        host)
            ...
            # we always want $ORIGIN-based rpaths to make it relocatable.
            sanitize_extra_args+=( "--relative-to-file" )
            ;;

        staging)
            ...
            # should be like for the target tree below
            sanitize_extra_args+=( "--no-standard-lib-dirs" )
            ;;

        target)
            ...
            # we don't want $ORIGIN-based rpaths but absolute paths without rootdir.
            # we also want to remove rpaths pointing to /lib or /usr/lib. 
            sanitize_extra_args+=( "--no-standard-lib-dirs" )
            ;;

        ...
    esac


> 
>>
>>>    I don't really like having $ORIGIN-based rpaths in target, but it's no big
>>> deal.
>>>
>>>    Regardless, the reasoning behind the choice we make should be documented here
>>> in comments. So for host, add something to the effect that we always want
>>> $ORIGIN-based rpaths to make it relocatable. For staging, say that we want it to
>>> be the same as target. And for target, explain the choice you made.
>>>
>>>
>>>> +    while read file ; do
>>>> +    # check if it's an ELF file
>>>
>>>    Indentation is inconsistent: sometimes 4 spaces, sometimes 2 spaces (inside the
>>
>> That's due to 8 spaces replaced by one tab.
> 
>   Fix your editor not to do that.
> 
>>
>>> case, and in the condition below). And here, you're switching from spaces to
>>> tabs!
>>
>> I use the default indention from emacs "Shell-script[bash]" which is usually not
>> a bad choice. I'm going to replace all tabs with spaces.
> 
>   Since emacs, vim and kate all have compatible mode-settings lines, feel free to
> add a mode-setting line below the shebang of the script. Do test it with vim please.
> 
>>
>>>> +    if ${PATCHELF} --print-rpath "${file}" > /dev/null 2>&1; then
>>>
>>>    It might be good to check if the file actually has a non-empty rpath. When
>>> processing a second time, most rpaths are empty so they can be skipped.
>>
>> I already tried that. It will make the check slower, meaning it costs more that
>> calling patchelf a second time.
>>
>>>    Maybe even add another patch to patchelf to exit early and quietly with just 0
>>> or 1 depending on rpath presence.
>>
>> Optimization of the execution time needs more careful analysis. I have a
>> patchelf patch which just open and reads the patchelf header to check quickly if
>> it's an ELF file. But that does not help a lot. readelf is still faster and I
>> guess that's because it's written in C (and not C++).
> 
>   I have a few speedup suggestions for patchelf (but all should be in follow-up
> patches, not as part of patch 1/9).

As I said, the execution time comes form scanning for ELF files. The rest doesn't
matter a lot. The attached patch speeds up ELF file checking. As said, readelf
is still faster even with that patch.
 
> - Add a --has-rpath option that is quiet and just exits 0 or 1.
> - Instead of reading into a string, mmap the file. Since in most cases the file
> size isn't changed, that works *very* efficiently. Note that this needs a
> configure.ac check (mmap is not always available, e.g. not on nommu). With mmap,
> it's not needed to first read the header only, since it will just pull in a page
> when needed. It will be slower on NFS and some FUSE filesystems though.

OK, will try that when time permits.

> - Remove empty RUNPATH entries, so a second run doesn't even start.
> - Link patchelf statically, preferably with -flto as well. This will probably
> just speed things up with a few %, but it's also on the "empty" calls so could
> be significant.

Wolfgang.

Comments

Arnout Vandecappelle July 20, 2017, 1:45 p.m. UTC | #1
On 20-07-17 11:03, Wolfgang Grandegger wrote:
>     case "${tree}" in
>         host)
>             ...
>             # we always want $ORIGIN-based rpaths to make it relocatable.

 Perfect. Just please start the sentence with a capital (it ends with a .)

>             sanitize_extra_args+=( "--relative-to-file" )
>             ;;
> 
>         staging)
>             ...
>             # should be like for the target tree below

 s/like/the same as/

>             sanitize_extra_args+=( "--no-standard-lib-dirs" )
>             ;;
> 
>         target)
>             ...
>             # we don't want $ORIGIN-based rpaths but absolute paths without rootdir.
>             # we also want to remove rpaths pointing to /lib or /usr/lib. 

 Capitalization, otherwise perfect.

[snip]
> As I said, the execution time comes form scanning for ELF files. The rest doesn't
> matter a lot.

 Instead of the patch you attached, it might actually make more sense to create
a tiny C file that does this, and execute that as part of the find command. Or
you could just create support/scripts/elfmagic with those 4 bytes and use "cmp
-n 4 {} ${0%/*}/elfmagic".

 By the way, I believe filtering with -exec is much faster than doing it with
while/if. Unfortunately you can't use that with patchelf --print-rpath because
the output of patchelf would be mixed up with the find output.

 Regards,
 Arnout

> The attached patch speeds up ELF file checking. As said, readelf
> is still faster even with that patch.
[snip]
Wolfgang Grandegger July 20, 2017, 2:08 p.m. UTC | #2
Hello Arnout,

Am 20.07.2017 um 15:45 schrieb Arnout Vandecappelle:
> 
> 
> On 20-07-17 11:03, Wolfgang Grandegger wrote:
>>      case "${tree}" in
>>          host)
>>              ...
>>              # we always want $ORIGIN-based rpaths to make it relocatable.
> 
>   Perfect. Just please start the sentence with a capital (it ends with a .)

Well, I followed the style of the original script... just to be 
consistent... otherwise we will have an ugly mix.

> 
>>              sanitize_extra_args+=( "--relative-to-file" )
>>              ;;
>>
>>          staging)
>>              ...
>>              # should be like for the target tree below
> 
>   s/like/the same as/
> 
>>              sanitize_extra_args+=( "--no-standard-lib-dirs" )
>>              ;;
>>
>>          target)
>>              ...
>>              # we don't want $ORIGIN-based rpaths but absolute paths without rootdir.
>>              # we also want to remove rpaths pointing to /lib or /usr/lib.
> 
>   Capitalization, otherwise perfect.
> 
> [snip]
>> As I said, the execution time comes form scanning for ELF files. The rest doesn't
>> matter a lot.
> 
>   Instead of the patch you attached, it might actually make more sense to create
> a tiny C file that does this, and execute that as part of the find command. Or
> you could just create support/scripts/elfmagic with those 4 bytes and use "cmp
> -n 4 {} ${0%/*}/elfmagic".

Yes, I had that idea as well and I have already hacked something. The 
problem is that it does not check if it does have an rpath. "cmp" is not 
faster, I already tried!

>   By the way, I believe filtering with -exec is much faster than doing it with
> while/if. Unfortunately you can't use that with patchelf --print-rpath because
> the output of patchelf would be mixed up with the find output.

I already tried various other solutions.

Just preparing v8...

Wolfgang.
Arnout Vandecappelle July 20, 2017, 2:19 p.m. UTC | #3
On 20-07-17 16:08, Wolfgang Grandegger wrote:
> Hello Arnout,
> 
> Am 20.07.2017 um 15:45 schrieb Arnout Vandecappelle:
>>
>>
>> On 20-07-17 11:03, Wolfgang Grandegger wrote:
>>>      case "${tree}" in
>>>          host)
>>>              ...
>>>              # we always want $ORIGIN-based rpaths to make it relocatable.
>>
>>   Perfect. Just please start the sentence with a capital (it ends with a .)
> 
> Well, I followed the style of the original script... just to be consistent...
> otherwise we will have an ugly mix.

 There is no original script, only an original author :-)

 So indeed, fix it script-wide please.

[snip]
>>> As I said, the execution time comes form scanning for ELF files. The rest
>>> doesn't
>>> matter a lot.
>>
>>   Instead of the patch you attached, it might actually make more sense to create
>> a tiny C file that does this, and execute that as part of the find command. Or
>> you could just create support/scripts/elfmagic with those 4 bytes and use "cmp
>> -n 4 {} ${0%/*}/elfmagic".
> 
> Yes, I had that idea as well and I have already hacked something. The problem is
> that it does not check if it does have an rpath.

 Well, you could do that particular check in the find -exec and keep the
--print-rpath check in the while loop.


> "cmp" is not faster, I already
> tried!

 OK good. Might be a good idea to keep track of the things you already tried in
the cover letter.


 Regards,
 Arnout


> 
>>   By the way, I believe filtering with -exec is much faster than doing it with
>> while/if. Unfortunately you can't use that with patchelf --print-rpath because
>> the output of patchelf would be mixed up with the find output.
> 
> I already tried various other solutions.
> 
> Just preparing v8...
> 
> Wolfgang.
>
Wolfgang Grandegger July 20, 2017, 4:50 p.m. UTC | #4
Am 20.07.2017 um 16:19 schrieb Arnout Vandecappelle:
> 
> 
> On 20-07-17 16:08, Wolfgang Grandegger wrote:
>> Hello Arnout,
>>
>> Am 20.07.2017 um 15:45 schrieb Arnout Vandecappelle:
>>>
>>>
>>> On 20-07-17 11:03, Wolfgang Grandegger wrote:
>>>>       case "${tree}" in
>>>>           host)
>>>>               ...
>>>>               # we always want $ORIGIN-based rpaths to make it relocatable.
>>>
>>>    Perfect. Just please start the sentence with a capital (it ends with a .)
>>
>> Well, I followed the style of the original script... just to be consistent...
>> otherwise we will have an ugly mix.
> 
>   There is no original script, only an original author :-)

I have no problem to adapt the style of the original author(s) ;). It 
should just be consistent.

>   So indeed, fix it script-wide please.
> 
> [snip]
>>>> As I said, the execution time comes form scanning for ELF files. The rest
>>>> doesn't
>>>> matter a lot.
>>>
>>>    Instead of the patch you attached, it might actually make more sense to create
>>> a tiny C file that does this, and execute that as part of the find command. Or
>>> you could just create support/scripts/elfmagic with those 4 bytes and use "cmp
>>> -n 4 {} ${0%/*}/elfmagic".
>>
>> Yes, I had that idea as well and I have already hacked something. The problem is
>> that it does not check if it does have an rpath.
> 
>   Well, you could do that particular check in the find -exec and keep the
> --print-rpath check in the while loop.
> 
> 
>> "cmp" is not faster, I already
>> tried!
> 
>   OK good. Might be a good idea to keep track of the things you already tried in
> the cover letter.

Well, it's already a few month ago that I tried finding a good and fast 
solution. I tried a lot of things, be can't remember the details. My 
time working on this topic is limited! When I have more free time I will 
take care.

Wolfgang.
Arnout Vandecappelle July 20, 2017, 5:05 p.m. UTC | #5
On 20-07-17 18:50, Wolfgang Grandegger wrote:
> 
> 
> Am 20.07.2017 um 16:19 schrieb Arnout Vandecappelle:
[snip]
>>   OK good. Might be a good idea to keep track of the things you already tried in
>> the cover letter.
> 
> Well, it's already a few month ago that I tried finding a good and fast
> solution. I tried a lot of things, be can't remember the details. 

 That's exactly why I ask to keep track of it in the cover letter. :-)

 Since you already posted the series, no need to do it now. But when you post
patches that speed things up, please try to write down what you tested (at least
the things that you are sure of).

 Regards,
 Arnout

> My time
> working on this topic is limited! When I have more free time I will take care.
> 
> Wolfgang.
diff mbox

Patch

Index: patchelf-0.9/src/patchelf.cc
===================================================================
--- patchelf-0.9.orig/src/patchelf.cc	2017-07-04 19:38:55.744223853 +0200
+++ patchelf-0.9/src/patchelf.cc	2017-07-05 09:43:42.417493549 +0200
@@ -317,20 +317,30 @@ 
 }
 
 
-static void readFile(string fileName)
+static void readElfFile(string fileName)
 {
     struct stat st;
+    unsigned char elfmag[SELFMAG];
+
+    int fd = open(fileName.c_str(), O_RDONLY);
+    if (fd == -1) error("open");
+
+    if (read(fd, elfmag, SELFMAG) != SELFMAG) error("read");
+
+    if (memcmp(elfmag, ELFMAG, SELFMAG) != 0)
+        error("not an ELF executable");
+
     if (stat(fileName.c_str(), &st) != 0) error("stat");
+
     fileSize = st.st_size;
     maxSize = fileSize + 32 * 1024 * 1024;
-
     contents = (unsigned char *) malloc(fileSize + maxSize);
     if (!contents) abort();
+    memcpy(contents, elfmag, SELFMAG);
 
-    int fd = open(fileName.c_str(), O_RDONLY);
-    if (fd == -1) error("open");
-
-    if (read(fd, contents, fileSize) != fileSize) error("read");
+    /* Read the rest of the file */
+    if (read(fd, contents + SELFMAG, fileSize - SELFMAG)
+	!= fileSize - SELFMAG) error("read");
 
     close(fd);
 }
@@ -1609,15 +1619,11 @@ 
 
     debug("Kernel page size is %u bytes\n", getPageSize());
 
-    readFile(fileName);
-
+    readElfFile(fileName);
 
     /* Check the ELF header for basic validity. */
     if (fileSize < (off_t) sizeof(Elf32_Ehdr)) error("missing ELF header");
 
-    if (memcmp(contents, ELFMAG, SELFMAG) != 0)
-        error("not an ELF executable");
-
     if (contents[EI_CLASS] == ELFCLASS32 &&
         contents[EI_VERSION] == EV_CURRENT)
     {