diff mbox series

[10/19] support: rewrite check-bin-arch in python

Message ID 638ba4cafb8ab013365ae54a393b49f459bc6e74.1546898693.git.yann.morin.1998@free.fr
State Changes Requested
Headers show
Series [01/19] infra/pkg-generic: display MESSAGE before running PRE_HOOKS | expand

Commit Message

Yann E. MORIN Jan. 7, 2019, 10:05 p.m. UTC
The existing check-bin-arch script is written in shell, so it can't make
use of all the helpers we have in python, especially the parser for the
package-files lists.

Although this script is relatively clean as it is, it is not totally
fool-proof either, especially against weird filenames (e.g.
specially-crafted filenames with \n, or \b, etc...).

Also, shell scripts are often frowned upon but for the most mundane
processing, and this script is definitely not mundane.

Finally, shell scripts are slow, as all the processing the have to do is
more often than not done by spawning programs, and that is relatively
expensive.

Rewrite that script in python.

This allows to do cleaner processing and reuse the package-files list
parser.

There's however a drawback: the script grows substantially, in part
because of spawning the actual readelf call (there is no ELF parser in
the standard python library), and because we want to keep backward
compatible with old pythons that lack proper abstractions like
subprocess.DEVNULL et al.

Signed-off-by: "Yann E. MORIN" <yann.morin.1998@free.fr>
Cc: Thomas De Schampheleire <patrickdepinguin@gmail.com>
---
 support/scripts/check-bin-arch | 205 +++++++++++++++++++++++------------------
 1 file changed, 113 insertions(+), 92 deletions(-)

Comments

Thomas De Schampheleire Jan. 8, 2019, 2:56 p.m. UTC | #1
El lun., 7 ene. 2019 a las 23:05, Yann E. MORIN
(<yann.morin.1998@free.fr>) escribió:
>
> The existing check-bin-arch script is written in shell, so it can't make
> use of all the helpers we have in python, especially the parser for the
> package-files lists.
>
> Although this script is relatively clean as it is, it is not totally
> fool-proof either, especially against weird filenames (e.g.
> specially-crafted filenames with \n, or \b, etc...).
>
> Also, shell scripts are often frowned upon but for the most mundane
> processing, and this script is definitely not mundane.
>
> Finally, shell scripts are slow, as all the processing the have to do is
> more often than not done by spawning programs, and that is relatively
> expensive.
>
> Rewrite that script in python.
>
> This allows to do cleaner processing and reuse the package-files list
> parsuer.
>
> There's however a drawback: the script grows substantially, in part
> because of spawning the actual readelf call (there is no ELF parser in
> the standard python library), and because we want to keep backward
> compatible with old pythons that lack proper abstractions like
> subprocess.DEVNULL et al.
>
> Signed-off-by: "Yann E. MORIN" <yann.morin.1998@free.fr>
> Cc: Thomas De Schampheleire <patrickdepinguin@gmail.com>
> ---
>  support/scripts/check-bin-arch | 205 +++++++++++++++++++++++------------------
>  1 file changed, 113 insertions(+), 92 deletions(-)
>
> diff --git a/support/scripts/check-bin-arch b/support/scripts/check-bin-arch
> index 66b8d89932..d4902163e7 100755
> --- a/support/scripts/check-bin-arch
> +++ b/support/scripts/check-bin-arch
> @@ -1,92 +1,113 @@
> -#!/usr/bin/env bash
> -
> -# List of hardcoded paths that should be ignored, as they may
> -# contain binaries for an architecture different from the
> -# architecture of the target.
> -declare -a IGNORES=(
> -       # Skip firmware files, they could be ELF files for other
> -       # architectures
> -       "/lib/firmware"
> -       "/usr/lib/firmware"
> -
> -       # Skip kernel modules
> -       # When building a 32-bit userland on 64-bit architectures, the kernel
> -       # and its modules may still be 64-bit. To keep the basic
> -       # check-bin-arch logic simple, just skip this directory.
> -       "/lib/modules"
> -       "/usr/lib/modules"
> -
> -       # Skip files in /usr/share, several packages (qemu,
> -       # pru-software-support) legitimately install ELF binaries that
> -       # are not for the target architecture
> -       "/usr/share"
> -
> -       # Skip files in /lib/grub, since it is possible to have it
> -       # for a different architecture (e.g. i386 grub on x86_64).
> -       "/lib/grub"
> -)
> -
> -while getopts p:l:r:a:i: OPT ; do
> -       case "${OPT}" in
> -       p) package="${OPTARG}";;
> -       l) pkg_list="${OPTARG}";;
> -       r) readelf="${OPTARG}";;
> -       a) arch_name="${OPTARG}";;
> -       i)
> -               # Ensure we do have single '/' as separators,
> -               # and that we have a leading and a trailing one.
> -               pattern="$(sed -r -e 's:/+:/:g; s:^/*:/:; s:/*$:/:;' <<<"${OPTARG}")"
> -               IGNORES+=("${pattern}")
> -               ;;
> -       :) error "option '%s' expects a mandatory argument\n" "${OPTARG}";;
> -       \?) error "unknown option '%s'\n" "${OPTARG}";;
> -       esac
> -done
> -
> -if test -z "${package}" -o -z "${pkg_list}" -o -z "${readelf}" -o -z "${arch_name}" ; then
> -       echo "Usage: $0 -p <pkg> -l <pkg-file-list> -r <readelf> -a <arch name> [-i PATH ...]"
> -       exit 1
> -fi
> -
> -exitcode=0
> -
> -# Only split on new lines, for filenames-with-spaces
> -IFS="
> -"
> -
> -while read f; do
> -       for ignore in "${IGNORES[@]}"; do
> -               if [[ "${f}" =~ ^"${ignore}" ]]; then
> -                       continue 2
> -               fi
> -       done
> -
> -       # Skip symlinks. Some symlinks may have absolute paths as
> -       # target, pointing to host binaries while we're building.
> -       if [[ -L "${TARGET_DIR}/${f}" ]]; then
> -               continue
> -       fi
> -
> -       # Get architecture using readelf. We pipe through 'head -1' so
> -       # that when the file is a static library (.a), we only take
> -       # into account the architecture of the first object file.
> -       arch=$(LC_ALL=C ${readelf} -h "${TARGET_DIR}/${f}" 2>&1 | \
> -                      sed -r -e '/^  Machine: +(.+)/!d; s//\1/;' | head -1)
> -
> -       # If no architecture found, assume it was not an ELF file
> -       if test "${arch}" = "" ; then
> -               continue
> -       fi
> -
> -       # Architecture is correct
> -       if test "${arch}" = "${arch_name}" ; then
> -               continue
> -       fi
> -
> -       printf 'ERROR: architecture for "%s" is "%s", should be "%s"\n' \
> -              "${f}" "${arch}" "${arch_name}"
> -
> -       exitcode=1
> -done < <( sed -r -e "/^${package},\.(.+)$/!d; s//\1/;" ${pkg_list} )
> -
> -exit ${exitcode}
> +#!/usr/bin/env python
> +
> +import argparse
> +import os
> +import re
> +import subprocess
> +import sys
> +from brpkgutil import parse_pkg_file_list as parse_pkg_file_list
> +
> +
> +# List of hardcoded paths that should be ignored, as they may contain
> +# binaries for an architecture different from the architecture of the
> +# target
> +IGNORES = {
> +    # Skip firmware files
> +    # They could be ELF files for other architectures.
> +    '/lib/firmware',
> +    '/usr/lib/firmware',
> +
> +    # Skip kernel modules
> +    # When building a 32-bit userland on 64-bit architectures, the
> +    # kernel and its modules may still be 64-bit. To keep the basic
> +    # check-bin-arch logic simple, just skip these directories
> +    '/lib/modules',
> +    '/usr/lib/modules',
> +
> +    # Skip files in /usr/share
> +    # Several packages (qemu, pru-software-support) legitimately install
> +    # ELF binaries that are not for the target architecture
> +    '/usr/share',
> +
> +    # Skip files in /lib/grub
> +    # It is possible to have it for a different architecture (e.g. i386
> +    # grub on x86_64)
> +    '/lib/grub',
> +}
> +
> +
> +ERROR = 'ERROR: architecture for "%s" is "%s", should be "%s"\n'
> +
> +
> +def main():
> +    parser = argparse.ArgumentParser()
> +    parser.add_argument('--package', '-p', metavar='PACKAGE', required=True)
> +    parser.add_argument('--pkg-list', '-l', metavar='PKG_LIST', required=True)
> +    parser.add_argument('--readelf', '-r', metavar='READELF', required=True)
> +    parser.add_argument('--arch', '-a', metavar='ARCH', required=True)
> +    parser.add_argument('--ignore', '-i', metavar='PATH', action='append')

I think the above 'metavar' options are not necessary, as they are the
default value.

> +    args = parser.parse_args()
> +
> +    if args.ignore is not None:
> +        # Ensure we do have single '/' as separators, and that we have a
> +        # leading and a trailing one, then append to the global list.
> +        for pattern in args.ignore:
> +            IGNORES.add(re.sub('/+', '/', '/{0}/'.format(pattern)))
> +

Note that the global list itself does not have trailing slashes. This
seems inconsistent to me.

> +    ignores_re = set()
> +    for i in IGNORES:
> +        ignores_re.add(re.compile(i))
> +
> +    arch_re = re.compile('^  Machine: +(.+)')
> +
> +    target_dir = os.environ['TARGET_DIR']
> +
> +    exit_code = 0
> +    for record in parse_pkg_file_list(args.pkg_list):
> +        if record['pkg'] != args.package:
> +            continue
> +
> +        fpath = record['file']
> +
> +        ignored = False
> +        for i in ignores_re:
> +            if i.match(fpath):
> +                ignored = True
> +                break
> +        if ignored:
> +            continue

We can never get into this if, right?, because there is already a
break whenever ignored is set to True.

Note that I think that performance will be better with a list
comprehension instead of explicit for's. Something like (untested):

valid_records = [ record for record in parse_pkg_file_list(args.pkg_list)
            if record['pkg'] == args.package
            and not any(ignore_re.match(record['file']) for ignore_re
in ignores_re) ]
for record in valid_records:
    ...


You can normally swap the [ ] for ( ) to reduce memory consumption
(generator iso list comprehension, yielding one entry at a time).

You can argue about readability, but for me this is not less readable
than the expanded version.

/Thomas
Yann E. MORIN Jan. 8, 2019, 4:37 p.m. UTC | #2
On 2019-01-08 15:56 +0100, Thomas De Schampheleire spake thusly:
> El lun., 7 ene. 2019 a las 23:05, Yann E. MORIN
> (<yann.morin.1998@free.fr>) escribió:
[--SNIP--]
> > Rewrite that script in python.
[--SNIP--]
> > +def main():
> > +    parser = argparse.ArgumentParser()
> > +    parser.add_argument('--package', '-p', metavar='PACKAGE', required=True)
> > +    parser.add_argument('--pkg-list', '-l', metavar='PKG_LIST', required=True)
> > +    parser.add_argument('--readelf', '-r', metavar='READELF', required=True)
> > +    parser.add_argument('--arch', '-a', metavar='ARCH', required=True)
> > +    parser.add_argument('--ignore', '-i', metavar='PATH', action='append')
> I think the above 'metavar' options are not necessary, as they are the
> default value.

I'll check and drop if they indeed are not.

> > +    args = parser.parse_args()
> > +
> > +    if args.ignore is not None:
> > +        # Ensure we do have single '/' as separators, and that we have a
> > +        # leading and a trailing one, then append to the global list.
> > +        for pattern in args.ignore:
> > +            IGNORES.add(re.sub('/+', '/', '/{0}/'.format(pattern)))
> Note that the global list itself does not have trailing slashes. This
> seems inconsistent to me.

It is inconsistent, but I kept the existing behaviour intact as much as
possible, so the python script is a drop-in replacement for the shjell
script, with the same semantics.

> > +    ignores_re = set()
> > +    for i in IGNORES:
> > +        ignores_re.add(re.compile(i))
> > +
> > +    arch_re = re.compile('^  Machine: +(.+)')
> > +
> > +    target_dir = os.environ['TARGET_DIR']
> > +
> > +    exit_code = 0
> > +    for record in parse_pkg_file_list(args.pkg_list):
> > +        if record['pkg'] != args.package:
> > +            continue
> > +
> > +        fpath = record['file']
> > +
> > +        ignored = False
> > +        for i in ignores_re:
> > +            if i.match(fpath):
> > +                ignored = True
> > +                break
> > +        if ignored:
> > +            continue
> 
> We can never get into this if, right?, because there is already a
> break whenever ignored is set to True.

Yes we can, because the break only applied to the inner-most loop, which
is the one that iterates over the ignores regexps.

> Note that I think that performance will be better with a list
> comprehension instead of explicit for's. Something like (untested):
> 
> valid_records = [ record for record in parse_pkg_file_list(args.pkg_list)
>             if record['pkg'] == args.package
>             and not any(ignore_re.match(record['file']) for ignore_re
> in ignores_re) ]

Sorry, but this is totally illegible to me.

> for record in valid_records:
>     ...
> 
> 
> You can normally swap the [ ] for ( ) to reduce memory consumption
> (generator iso list comprehension, yielding one entry at a time).
> 
> You can argue about readability, but for me this is not less readable
> than the expanded version.

Well, for me your version is totally unredable, sorry... :-/

The one I wrote at least clearly describes what it is doing:

  - for each record in the list:

    - check if the package is the one we're looking at, if not continue to
      next record

    - for each regecp to ignore, check if the filename matches it, and
      if one is matched, memorizre the fact and break the loop (fast-path).

    - if the file is to be ignored, continue to the next record

and so on...

The size argument is moot IMO, because we do not have thousands of
regexps to ignore, and the parse_pkg_file_list() already yields its
results.

Regards,
Yann E. MORIN.
Thomas De Schampheleire Jan. 8, 2019, 5:22 p.m. UTC | #3
El mar., 8 ene. 2019 a las 17:37, Yann E. MORIN
(<yann.morin.1998@free.fr>) escribió:
>
> On 2019-01-08 15:56 +0100, Thomas De Schampheleire spake thusly:
> > El lun., 7 ene. 2019 a las 23:05, Yann E. MORIN
> > (<yann.morin.1998@free.fr>) escribió:
> [--SNIP--]
> > > Rewrite that script in python.
> [--SNIP--]
> > > +def main():
> > > +    parser = argparse.ArgumentParser()
> > > +    parser.add_argument('--package', '-p', metavar='PACKAGE', required=True)
> > > +    parser.add_argument('--pkg-list', '-l', metavar='PKG_LIST', required=True)
> > > +    parser.add_argument('--readelf', '-r', metavar='READELF', required=True)
> > > +    parser.add_argument('--arch', '-a', metavar='ARCH', required=True)
> > > +    parser.add_argument('--ignore', '-i', metavar='PATH', action='append')
> > I think the above 'metavar' options are not necessary, as they are the
> > default value.
>
> I'll check and drop if they indeed are not.
>
> > > +    args = parser.parse_args()
> > > +
> > > +    if args.ignore is not None:
> > > +        # Ensure we do have single '/' as separators, and that we have a
> > > +        # leading and a trailing one, then append to the global list.
> > > +        for pattern in args.ignore:
> > > +            IGNORES.add(re.sub('/+', '/', '/{0}/'.format(pattern)))
> > Note that the global list itself does not have trailing slashes. This
> > seems inconsistent to me.
>
> It is inconsistent, but I kept the existing behaviour intact as much as
> possible, so the python script is a drop-in replacement for the shjell
> script, with the same semantics.
>
> > > +    ignores_re = set()
> > > +    for i in IGNORES:
> > > +        ignores_re.add(re.compile(i))
> > > +
> > > +    arch_re = re.compile('^  Machine: +(.+)')
> > > +
> > > +    target_dir = os.environ['TARGET_DIR']
> > > +
> > > +    exit_code = 0
> > > +    for record in parse_pkg_file_list(args.pkg_list):
> > > +        if record['pkg'] != args.package:
> > > +            continue
> > > +
> > > +        fpath = record['file']
> > > +
> > > +        ignored = False
> > > +        for i in ignores_re:
> > > +            if i.match(fpath):
> > > +                ignored = True
> > > +                break
> > > +        if ignored:
> > > +            continue
> >
> > We can never get into this if, right?, because there is already a
> > break whenever ignored is set to True.
>
> Yes we can, because the break only applied to the inner-most loop, which
> is the one that iterates over the ignores regexps.

Ok, I see.

>
> > Note that I think that performance will be better with a list
> > comprehension instead of explicit for's. Something like (untested):
> >
> > valid_records = [ record for record in parse_pkg_file_list(args.pkg_list)
> >             if record['pkg'] == args.package
> >             and not any(ignore_re.match(record['file']) for ignore_re
> > in ignores_re) ]
>
> Sorry, but this is totally illegible to me.

:-D

Ok, I won't argue about the fact itself.
Just, as a reference, small decomposition:

list = [ f(x) for x in otherlist ]

is a 'list comprehension' and is a performant way to generate a list
without requiring a for loop.
f(x) can be any action on x, e.g. x.strip() or more complex
expressions involving x.

The 'for x in otherlist' can be restricted further with 'if' clauses.
In the code I gave above, the base list is the return value of
'parse_pkg_file_list(..)' and it is filtered by two clauses.
The first one:
    record['pkg'] == args.package
only preserves entries in the base list that match the package we are
interested in.
This is the equivalent of:
       if record['pkg'] != args.package:
           continue

and the second clause:
    not any(ignore_re.match(record['file']) for ignore_re in ignores_re)

Here, any(x) is a function that takes an iterable (x) and returns True
if any of the items in 'x' evaluate to True.
Similar to any(x) there is also all(x) which only returns True if
_all_ items in x evaluate to True.

The value passed to 'any' is '(ignore_re.match(record['file']) for
ignore_re in ignores_re)'
which itself is a type of list comprehension, except that it actually
is a generator comprehension. 'generator' is what you called an
'iterator' in another patch: a thing that 'yields' values one by one.
The difference does not really matter for this discussion, so the base
format is still:

f(x) for x in some_iterable

Here f(x) is ignore_re.match(..) and x is 'ignore_re' itself.

So the following code:
    valid_records = [ record for record in parse_pkg_file_list(args.pkg_list)
                      if record['pkg'] == args.package
                      and not any(ignore_re.match(record['file']) for
ignore_re in ignores_re) ]

naturally reads as:

valid_records  is the list of records returned by parse_pkg_file_list
for which the 'pkg' field is the one we want, and not any of our
ignore expression matches the 'file' field of the record.

I hope it makes things a bit more clear :)

/Thomas
Yann E. MORIN Jan. 8, 2019, 8:33 p.m. UTC | #4
Thomas DS, All,

On 2019-01-08 18:22 +0100, Thomas De Schampheleire spake thusly:
> El mar., 8 ene. 2019 a las 17:37, Yann E. MORIN
> (<yann.morin.1998@free.fr>) escribió:
[--SNIP--]
> > > Note that I think that performance will be better with a list
> > > comprehension instead of explicit for's. Something like (untested):
> > >
> > > valid_records = [ record for record in parse_pkg_file_list(args.pkg_list)
> > >             if record['pkg'] == args.package
> > >             and not any(ignore_re.match(record['file']) for ignore_re
> > > in ignores_re) ]
> >
> > Sorry, but this is totally illegible to me.
> 
> :-D
> 
> Ok, I won't argue about the fact itself.
> Just, as a reference, small decomposition:
> 
> list = [ f(x) for x in otherlist ]
> 
> is a 'list comprehension' and is a performant way to generate a list
> without requiring a for loop.
> f(x) can be any action on x, e.g. x.strip() or more complex
> expressions involving x.

Yeah, I know what it is [0], and I can actually decipher the code above
with quite a bit of mental processing. But that is it: it's deciphering,
not reading.

[0] I even use them: https://patchwork.ozlabs.org/patch/1021622/

[--SNIP--]
> Here, any(x) is a function that takes an iterable (x) and returns True
> if any of the items in 'x' evaluate to True.

Ah, I did not know about any() (although I did infer its behaviour from
its name!) Thanks! :-)

> Similar to any(x) there is also all(x) which only returns True if
> _all_ items in x evaluate to True.

And is there none() ? ;-) I guess it's "not all()"...

> The value passed to 'any' is '(ignore_re.match(record['file']) for
> ignore_re in ignores_re)'
> which itself is a type of list comprehension, except that it actually
> is a generator comprehension. 'generator' is what you called an
> 'iterator' in another patch: a thing that 'yields' values one by one.

Yep, generator, not iterator.

[--SNIP--]
> I hope it makes things a bit more clear :)

As I said, I can actually decipher it (well, once the indentation if
fixed anyway! ;-) ), but still it is less readable for me...

And since I would have hard a time justifying it, or would have a hard
time maintaining it in the future, I prefer writing (and thus keeping)
code that I can work with later.

If others find the comprehensions easier to comprehend (pun), then fine,
but I would be much less at ease with that.

Thanks! :-)

Regards,
Yann E. MORIN.
Thomas De Schampheleire Jan. 8, 2019, 8:46 p.m. UTC | #5
El mar., 8 ene. 2019 a las 21:33, Yann E. MORIN
(<yann.morin.1998@free.fr>) escribió:
>
> Thomas DS, All,
>
> On 2019-01-08 18:22 +0100, Thomas De Schampheleire spake thusly:
> > El mar., 8 ene. 2019 a las 17:37, Yann E. MORIN
> > (<yann.morin.1998@free.fr>) escribió:
> [--SNIP--]
> > > > Note that I think that performance will be better with a list
> > > > comprehension instead of explicit for's. Something like (untested):
> > > >
> > > > valid_records = [ record for record in parse_pkg_file_list(args.pkg_list)
> > > >             if record['pkg'] == args.package
> > > >             and not any(ignore_re.match(record['file']) for ignore_re
> > > > in ignores_re) ]
> > >
> > > Sorry, but this is totally illegible to me.
> >
> > :-D
> >
> > Ok, I won't argue about the fact itself.
> > Just, as a reference, small decomposition:
> >
> > list = [ f(x) for x in otherlist ]
> >
> > is a 'list comprehension' and is a performant way to generate a list
> > without requiring a for loop.
> > f(x) can be any action on x, e.g. x.strip() or more complex
> > expressions involving x.
>
> Yeah, I know what it is [0], and I can actually decipher the code above
> with quite a bit of mental processing. But that is it: it's deciphering,
> not reading.
>
> [0] I even use them: https://patchwork.ozlabs.org/patch/1021622/
>
> [--SNIP--]
> > Here, any(x) is a function that takes an iterable (x) and returns True
> > if any of the items in 'x' evaluate to True.
>
> Ah, I did not know about any() (although I did infer its behaviour from
> its name!) Thanks! :-)
>
> > Similar to any(x) there is also all(x) which only returns True if
> > _all_ items in x evaluate to True.
>
> And is there none() ? ;-) I guess it's "not all()"...

none() does not exist but if you intend what I understand by its name
then it is:
'not any()'

>
> > The value passed to 'any' is '(ignore_re.match(record['file']) for
> > ignore_re in ignores_re)'
> > which itself is a type of list comprehension, except that it actually
> > is a generator comprehension. 'generator' is what you called an
> > 'iterator' in another patch: a thing that 'yields' values one by one.
>
> Yep, generator, not iterator.
>
> [--SNIP--]
> > I hope it makes things a bit more clear :)
>
> As I said, I can actually decipher it (well, once the indentation if
> fixed anyway! ;-) ), but still it is less readable for me...
>
> And since I would have hard a time justifying it, or would have a hard
> time maintaining it in the future, I prefer writing (and thus keeping)
> code that I can work with later.
>
> If others find the comprehensions easier to comprehend (pun), then fine,
> but I would be much less at ease with that.

By all means, keep your original code.

At least by talking about it, we have learnt each other's views and
shared some knowledge :-)

Best regards,
Thomas
Yann E. MORIN Jan. 8, 2019, 9:16 p.m. UTC | #6
Thomas, All,

On 2019-01-08 21:46 +0100, Thomas De Schampheleire spake thusly:
> El mar., 8 ene. 2019 a las 21:33, Yann E. MORIN
> (<yann.morin.1998@free.fr>) escribió:
> > And is there none() ? ;-) I guess it's "not all()"...
> none() does not exist but if you intend what I understand by its name
> then it is:
> 'not any()'

Dang right. "not all()" would mean at least one is not True, i.e.
"not all(x)" ≡ "any(not x)"

[--SNIP--]
> > If others find the comprehensions easier to comprehend (pun), then fine,
> > but I would be much less at ease with that.
> 
> By all means, keep your original code.
> 
> At least by talking about it, we have learnt each other's views and
> shared some knowledge :-)

Yes, I guess with writing more python I'd get more used to reading it.
And I certainly did appreciate the walkthrough, thanks. :-)

BTW, do you intend to join after FOSDEM?

Regards,
Yann E. MORIN.
Thomas De Schampheleire Jan. 9, 2019, 2:47 p.m. UTC | #7
El mar., 8 ene. 2019 a las 22:16, Yann E. MORIN
(<yann.morin.1998@free.fr>) escribió:
>
> Thomas, All,
>
> On 2019-01-08 21:46 +0100, Thomas De Schampheleire spake thusly:
> > El mar., 8 ene. 2019 a las 21:33, Yann E. MORIN
> > (<yann.morin.1998@free.fr>) escribió:
> > > And is there none() ? ;-) I guess it's "not all()"...
> > none() does not exist but if you intend what I understand by its name
> > then it is:
> > 'not any()'
>
> Dang right. "not all()" would mean at least one is not True, i.e.
> "not all(x)" ≡ "any(not x)"
>
> [--SNIP--]
> > > If others find the comprehensions easier to comprehend (pun), then fine,
> > > but I would be much less at ease with that.
> >
> > By all means, keep your original code.
> >
> > At least by talking about it, we have learnt each other's views and
> > shared some knowledge :-)
>
> Yes, I guess with writing more python I'd get more used to reading it.
> And I certainly did appreciate the walkthrough, thanks. :-)
>
> BTW, do you intend to join after FOSDEM?

Normally yes, although I still have to request official permission.
Possibly not all three days though, to be confirmed...

/Thomas
diff mbox series

Patch

diff --git a/support/scripts/check-bin-arch b/support/scripts/check-bin-arch
index 66b8d89932..d4902163e7 100755
--- a/support/scripts/check-bin-arch
+++ b/support/scripts/check-bin-arch
@@ -1,92 +1,113 @@ 
-#!/usr/bin/env bash
-
-# List of hardcoded paths that should be ignored, as they may
-# contain binaries for an architecture different from the
-# architecture of the target.
-declare -a IGNORES=(
-	# Skip firmware files, they could be ELF files for other
-	# architectures
-	"/lib/firmware"
-	"/usr/lib/firmware"
-
-	# Skip kernel modules
-	# When building a 32-bit userland on 64-bit architectures, the kernel
-	# and its modules may still be 64-bit. To keep the basic
-	# check-bin-arch logic simple, just skip this directory.
-	"/lib/modules"
-	"/usr/lib/modules"
-
-	# Skip files in /usr/share, several packages (qemu,
-	# pru-software-support) legitimately install ELF binaries that
-	# are not for the target architecture
-	"/usr/share"
-
-	# Skip files in /lib/grub, since it is possible to have it
-	# for a different architecture (e.g. i386 grub on x86_64).
-	"/lib/grub"
-)
-
-while getopts p:l:r:a:i: OPT ; do
-	case "${OPT}" in
-	p) package="${OPTARG}";;
-	l) pkg_list="${OPTARG}";;
-	r) readelf="${OPTARG}";;
-	a) arch_name="${OPTARG}";;
-	i)
-		# Ensure we do have single '/' as separators,
-		# and that we have a leading and a trailing one.
-		pattern="$(sed -r -e 's:/+:/:g; s:^/*:/:; s:/*$:/:;' <<<"${OPTARG}")"
-		IGNORES+=("${pattern}")
-		;;
-	:) error "option '%s' expects a mandatory argument\n" "${OPTARG}";;
-	\?) error "unknown option '%s'\n" "${OPTARG}";;
-	esac
-done
-
-if test -z "${package}" -o -z "${pkg_list}" -o -z "${readelf}" -o -z "${arch_name}" ; then
-	echo "Usage: $0 -p <pkg> -l <pkg-file-list> -r <readelf> -a <arch name> [-i PATH ...]"
-	exit 1
-fi
-
-exitcode=0
-
-# Only split on new lines, for filenames-with-spaces
-IFS="
-"
-
-while read f; do
-	for ignore in "${IGNORES[@]}"; do
-		if [[ "${f}" =~ ^"${ignore}" ]]; then
-			continue 2
-		fi
-	done
-
-	# Skip symlinks. Some symlinks may have absolute paths as
-	# target, pointing to host binaries while we're building.
-	if [[ -L "${TARGET_DIR}/${f}" ]]; then
-		continue
-	fi
-
-	# Get architecture using readelf. We pipe through 'head -1' so
-	# that when the file is a static library (.a), we only take
-	# into account the architecture of the first object file.
-	arch=$(LC_ALL=C ${readelf} -h "${TARGET_DIR}/${f}" 2>&1 | \
-		       sed -r -e '/^  Machine: +(.+)/!d; s//\1/;' | head -1)
-
-	# If no architecture found, assume it was not an ELF file
-	if test "${arch}" = "" ; then
-		continue
-	fi
-
-	# Architecture is correct
-	if test "${arch}" = "${arch_name}" ; then
-		continue
-	fi
-
-	printf 'ERROR: architecture for "%s" is "%s", should be "%s"\n' \
-	       "${f}" "${arch}" "${arch_name}"
-
-	exitcode=1
-done < <( sed -r -e "/^${package},\.(.+)$/!d; s//\1/;" ${pkg_list} )
-
-exit ${exitcode}
+#!/usr/bin/env python
+
+import argparse
+import os
+import re
+import subprocess
+import sys
+from brpkgutil import parse_pkg_file_list as parse_pkg_file_list
+
+
+# List of hardcoded paths that should be ignored, as they may contain
+# binaries for an architecture different from the architecture of the
+# target
+IGNORES = {
+    # Skip firmware files
+    # They could be ELF files for other architectures.
+    '/lib/firmware',
+    '/usr/lib/firmware',
+
+    # Skip kernel modules
+    # When building a 32-bit userland on 64-bit architectures, the
+    # kernel and its modules may still be 64-bit. To keep the basic
+    # check-bin-arch logic simple, just skip these directories
+    '/lib/modules',
+    '/usr/lib/modules',
+
+    # Skip files in /usr/share
+    # Several packages (qemu, pru-software-support) legitimately install
+    # ELF binaries that are not for the target architecture
+    '/usr/share',
+
+    # Skip files in /lib/grub
+    # It is possible to have it for a different architecture (e.g. i386
+    # grub on x86_64)
+    '/lib/grub',
+}
+
+
+ERROR = 'ERROR: architecture for "%s" is "%s", should be "%s"\n'
+
+
+def main():
+    parser = argparse.ArgumentParser()
+    parser.add_argument('--package', '-p', metavar='PACKAGE', required=True)
+    parser.add_argument('--pkg-list', '-l', metavar='PKG_LIST', required=True)
+    parser.add_argument('--readelf', '-r', metavar='READELF', required=True)
+    parser.add_argument('--arch', '-a', metavar='ARCH', required=True)
+    parser.add_argument('--ignore', '-i', metavar='PATH', action='append')
+    args = parser.parse_args()
+
+    if args.ignore is not None:
+        # Ensure we do have single '/' as separators, and that we have a
+        # leading and a trailing one, then append to the global list.
+        for pattern in args.ignore:
+            IGNORES.add(re.sub('/+', '/', '/{0}/'.format(pattern)))
+
+    ignores_re = set()
+    for i in IGNORES:
+        ignores_re.add(re.compile(i))
+
+    arch_re = re.compile('^  Machine: +(.+)')
+
+    target_dir = os.environ['TARGET_DIR']
+
+    exit_code = 0
+    for record in parse_pkg_file_list(args.pkg_list):
+        if record['pkg'] != args.package:
+            continue
+
+        fpath = record['file']
+
+        ignored = False
+        for i in ignores_re:
+            if i.match(fpath):
+                ignored = True
+                break
+        if ignored:
+            continue
+
+        # Skip symlinks. Some symlinks may have absolute paths as
+        # target, pointing to host binaries while we're building.
+        if os.path.islink(os.path.join(target_dir, fpath)):
+            continue
+
+        cmd = [args.readelf, '-h', os.path.join(target_dir, fpath)]
+        try:
+            with open(os.devnull, 'wb') as devnull:
+                p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=devnull,
+                                     universal_newlines=True)
+                stdout = p.communicate()[0].split('\n')[:-1]
+                ret = p.returncode
+        except OSError:
+            ret = 1
+            stdout = list()
+        # If readelf returned in error, or returned nothing on stdout,
+        # then that was not an ELF file, or we may not yet have readelf
+        # (e.g. in skeleton, before toolchain)
+        if ret != 0 or len(stdout) == 0:
+            continue
+
+        for line in stdout:
+            if arch_re.match(line):
+                arch = arch_re.sub(r'\1', line)
+                if arch != args.arch:
+                    print(ERROR.format(fpath, arch, args.arch))
+                    exit_code = 1
+                break
+
+    return exit_code
+
+
+if __name__ == "__main__":
+    sys.exit(main())