Message ID | 638ba4cafb8ab013365ae54a393b49f459bc6e74.1546898693.git.yann.morin.1998@free.fr |
---|---|
State | Changes Requested |
Headers | show |
Series | [01/19] infra/pkg-generic: display MESSAGE before running PRE_HOOKS | expand |
El lun., 7 ene. 2019 a las 23:05, Yann E. MORIN (<yann.morin.1998@free.fr>) escribió: > > The existing check-bin-arch script is written in shell, so it can't make > use of all the helpers we have in python, especially the parser for the > package-files lists. > > Although this script is relatively clean as it is, it is not totally > fool-proof either, especially against weird filenames (e.g. > specially-crafted filenames with \n, or \b, etc...). > > Also, shell scripts are often frowned upon but for the most mundane > processing, and this script is definitely not mundane. > > Finally, shell scripts are slow, as all the processing the have to do is > more often than not done by spawning programs, and that is relatively > expensive. > > Rewrite that script in python. > > This allows to do cleaner processing and reuse the package-files list > parsuer. > > There's however a drawback: the script grows substantially, in part > because of spawning the actual readelf call (there is no ELF parser in > the standard python library), and because we want to keep backward > compatible with old pythons that lack proper abstractions like > subprocess.DEVNULL et al. > > Signed-off-by: "Yann E. MORIN" <yann.morin.1998@free.fr> > Cc: Thomas De Schampheleire <patrickdepinguin@gmail.com> > --- > support/scripts/check-bin-arch | 205 +++++++++++++++++++++++------------------ > 1 file changed, 113 insertions(+), 92 deletions(-) > > diff --git a/support/scripts/check-bin-arch b/support/scripts/check-bin-arch > index 66b8d89932..d4902163e7 100755 > --- a/support/scripts/check-bin-arch > +++ b/support/scripts/check-bin-arch > @@ -1,92 +1,113 @@ > -#!/usr/bin/env bash > - > -# List of hardcoded paths that should be ignored, as they may > -# contain binaries for an architecture different from the > -# architecture of the target. > -declare -a IGNORES=( > - # Skip firmware files, they could be ELF files for other > - # architectures > - "/lib/firmware" > - "/usr/lib/firmware" > - > - # Skip kernel modules > - # When building a 32-bit userland on 64-bit architectures, the kernel > - # and its modules may still be 64-bit. To keep the basic > - # check-bin-arch logic simple, just skip this directory. > - "/lib/modules" > - "/usr/lib/modules" > - > - # Skip files in /usr/share, several packages (qemu, > - # pru-software-support) legitimately install ELF binaries that > - # are not for the target architecture > - "/usr/share" > - > - # Skip files in /lib/grub, since it is possible to have it > - # for a different architecture (e.g. i386 grub on x86_64). > - "/lib/grub" > -) > - > -while getopts p:l:r:a:i: OPT ; do > - case "${OPT}" in > - p) package="${OPTARG}";; > - l) pkg_list="${OPTARG}";; > - r) readelf="${OPTARG}";; > - a) arch_name="${OPTARG}";; > - i) > - # Ensure we do have single '/' as separators, > - # and that we have a leading and a trailing one. > - pattern="$(sed -r -e 's:/+:/:g; s:^/*:/:; s:/*$:/:;' <<<"${OPTARG}")" > - IGNORES+=("${pattern}") > - ;; > - :) error "option '%s' expects a mandatory argument\n" "${OPTARG}";; > - \?) error "unknown option '%s'\n" "${OPTARG}";; > - esac > -done > - > -if test -z "${package}" -o -z "${pkg_list}" -o -z "${readelf}" -o -z "${arch_name}" ; then > - echo "Usage: $0 -p <pkg> -l <pkg-file-list> -r <readelf> -a <arch name> [-i PATH ...]" > - exit 1 > -fi > - > -exitcode=0 > - > -# Only split on new lines, for filenames-with-spaces > -IFS=" > -" > - > -while read f; do > - for ignore in "${IGNORES[@]}"; do > - if [[ "${f}" =~ ^"${ignore}" ]]; then > - continue 2 > - fi > - done > - > - # Skip symlinks. Some symlinks may have absolute paths as > - # target, pointing to host binaries while we're building. > - if [[ -L "${TARGET_DIR}/${f}" ]]; then > - continue > - fi > - > - # Get architecture using readelf. We pipe through 'head -1' so > - # that when the file is a static library (.a), we only take > - # into account the architecture of the first object file. > - arch=$(LC_ALL=C ${readelf} -h "${TARGET_DIR}/${f}" 2>&1 | \ > - sed -r -e '/^ Machine: +(.+)/!d; s//\1/;' | head -1) > - > - # If no architecture found, assume it was not an ELF file > - if test "${arch}" = "" ; then > - continue > - fi > - > - # Architecture is correct > - if test "${arch}" = "${arch_name}" ; then > - continue > - fi > - > - printf 'ERROR: architecture for "%s" is "%s", should be "%s"\n' \ > - "${f}" "${arch}" "${arch_name}" > - > - exitcode=1 > -done < <( sed -r -e "/^${package},\.(.+)$/!d; s//\1/;" ${pkg_list} ) > - > -exit ${exitcode} > +#!/usr/bin/env python > + > +import argparse > +import os > +import re > +import subprocess > +import sys > +from brpkgutil import parse_pkg_file_list as parse_pkg_file_list > + > + > +# List of hardcoded paths that should be ignored, as they may contain > +# binaries for an architecture different from the architecture of the > +# target > +IGNORES = { > + # Skip firmware files > + # They could be ELF files for other architectures. > + '/lib/firmware', > + '/usr/lib/firmware', > + > + # Skip kernel modules > + # When building a 32-bit userland on 64-bit architectures, the > + # kernel and its modules may still be 64-bit. To keep the basic > + # check-bin-arch logic simple, just skip these directories > + '/lib/modules', > + '/usr/lib/modules', > + > + # Skip files in /usr/share > + # Several packages (qemu, pru-software-support) legitimately install > + # ELF binaries that are not for the target architecture > + '/usr/share', > + > + # Skip files in /lib/grub > + # It is possible to have it for a different architecture (e.g. i386 > + # grub on x86_64) > + '/lib/grub', > +} > + > + > +ERROR = 'ERROR: architecture for "%s" is "%s", should be "%s"\n' > + > + > +def main(): > + parser = argparse.ArgumentParser() > + parser.add_argument('--package', '-p', metavar='PACKAGE', required=True) > + parser.add_argument('--pkg-list', '-l', metavar='PKG_LIST', required=True) > + parser.add_argument('--readelf', '-r', metavar='READELF', required=True) > + parser.add_argument('--arch', '-a', metavar='ARCH', required=True) > + parser.add_argument('--ignore', '-i', metavar='PATH', action='append') I think the above 'metavar' options are not necessary, as they are the default value. > + args = parser.parse_args() > + > + if args.ignore is not None: > + # Ensure we do have single '/' as separators, and that we have a > + # leading and a trailing one, then append to the global list. > + for pattern in args.ignore: > + IGNORES.add(re.sub('/+', '/', '/{0}/'.format(pattern))) > + Note that the global list itself does not have trailing slashes. This seems inconsistent to me. > + ignores_re = set() > + for i in IGNORES: > + ignores_re.add(re.compile(i)) > + > + arch_re = re.compile('^ Machine: +(.+)') > + > + target_dir = os.environ['TARGET_DIR'] > + > + exit_code = 0 > + for record in parse_pkg_file_list(args.pkg_list): > + if record['pkg'] != args.package: > + continue > + > + fpath = record['file'] > + > + ignored = False > + for i in ignores_re: > + if i.match(fpath): > + ignored = True > + break > + if ignored: > + continue We can never get into this if, right?, because there is already a break whenever ignored is set to True. Note that I think that performance will be better with a list comprehension instead of explicit for's. Something like (untested): valid_records = [ record for record in parse_pkg_file_list(args.pkg_list) if record['pkg'] == args.package and not any(ignore_re.match(record['file']) for ignore_re in ignores_re) ] for record in valid_records: ... You can normally swap the [ ] for ( ) to reduce memory consumption (generator iso list comprehension, yielding one entry at a time). You can argue about readability, but for me this is not less readable than the expanded version. /Thomas
On 2019-01-08 15:56 +0100, Thomas De Schampheleire spake thusly: > El lun., 7 ene. 2019 a las 23:05, Yann E. MORIN > (<yann.morin.1998@free.fr>) escribió: [--SNIP--] > > Rewrite that script in python. [--SNIP--] > > +def main(): > > + parser = argparse.ArgumentParser() > > + parser.add_argument('--package', '-p', metavar='PACKAGE', required=True) > > + parser.add_argument('--pkg-list', '-l', metavar='PKG_LIST', required=True) > > + parser.add_argument('--readelf', '-r', metavar='READELF', required=True) > > + parser.add_argument('--arch', '-a', metavar='ARCH', required=True) > > + parser.add_argument('--ignore', '-i', metavar='PATH', action='append') > I think the above 'metavar' options are not necessary, as they are the > default value. I'll check and drop if they indeed are not. > > + args = parser.parse_args() > > + > > + if args.ignore is not None: > > + # Ensure we do have single '/' as separators, and that we have a > > + # leading and a trailing one, then append to the global list. > > + for pattern in args.ignore: > > + IGNORES.add(re.sub('/+', '/', '/{0}/'.format(pattern))) > Note that the global list itself does not have trailing slashes. This > seems inconsistent to me. It is inconsistent, but I kept the existing behaviour intact as much as possible, so the python script is a drop-in replacement for the shjell script, with the same semantics. > > + ignores_re = set() > > + for i in IGNORES: > > + ignores_re.add(re.compile(i)) > > + > > + arch_re = re.compile('^ Machine: +(.+)') > > + > > + target_dir = os.environ['TARGET_DIR'] > > + > > + exit_code = 0 > > + for record in parse_pkg_file_list(args.pkg_list): > > + if record['pkg'] != args.package: > > + continue > > + > > + fpath = record['file'] > > + > > + ignored = False > > + for i in ignores_re: > > + if i.match(fpath): > > + ignored = True > > + break > > + if ignored: > > + continue > > We can never get into this if, right?, because there is already a > break whenever ignored is set to True. Yes we can, because the break only applied to the inner-most loop, which is the one that iterates over the ignores regexps. > Note that I think that performance will be better with a list > comprehension instead of explicit for's. Something like (untested): > > valid_records = [ record for record in parse_pkg_file_list(args.pkg_list) > if record['pkg'] == args.package > and not any(ignore_re.match(record['file']) for ignore_re > in ignores_re) ] Sorry, but this is totally illegible to me. > for record in valid_records: > ... > > > You can normally swap the [ ] for ( ) to reduce memory consumption > (generator iso list comprehension, yielding one entry at a time). > > You can argue about readability, but for me this is not less readable > than the expanded version. Well, for me your version is totally unredable, sorry... :-/ The one I wrote at least clearly describes what it is doing: - for each record in the list: - check if the package is the one we're looking at, if not continue to next record - for each regecp to ignore, check if the filename matches it, and if one is matched, memorizre the fact and break the loop (fast-path). - if the file is to be ignored, continue to the next record and so on... The size argument is moot IMO, because we do not have thousands of regexps to ignore, and the parse_pkg_file_list() already yields its results. Regards, Yann E. MORIN.
El mar., 8 ene. 2019 a las 17:37, Yann E. MORIN (<yann.morin.1998@free.fr>) escribió: > > On 2019-01-08 15:56 +0100, Thomas De Schampheleire spake thusly: > > El lun., 7 ene. 2019 a las 23:05, Yann E. MORIN > > (<yann.morin.1998@free.fr>) escribió: > [--SNIP--] > > > Rewrite that script in python. > [--SNIP--] > > > +def main(): > > > + parser = argparse.ArgumentParser() > > > + parser.add_argument('--package', '-p', metavar='PACKAGE', required=True) > > > + parser.add_argument('--pkg-list', '-l', metavar='PKG_LIST', required=True) > > > + parser.add_argument('--readelf', '-r', metavar='READELF', required=True) > > > + parser.add_argument('--arch', '-a', metavar='ARCH', required=True) > > > + parser.add_argument('--ignore', '-i', metavar='PATH', action='append') > > I think the above 'metavar' options are not necessary, as they are the > > default value. > > I'll check and drop if they indeed are not. > > > > + args = parser.parse_args() > > > + > > > + if args.ignore is not None: > > > + # Ensure we do have single '/' as separators, and that we have a > > > + # leading and a trailing one, then append to the global list. > > > + for pattern in args.ignore: > > > + IGNORES.add(re.sub('/+', '/', '/{0}/'.format(pattern))) > > Note that the global list itself does not have trailing slashes. This > > seems inconsistent to me. > > It is inconsistent, but I kept the existing behaviour intact as much as > possible, so the python script is a drop-in replacement for the shjell > script, with the same semantics. > > > > + ignores_re = set() > > > + for i in IGNORES: > > > + ignores_re.add(re.compile(i)) > > > + > > > + arch_re = re.compile('^ Machine: +(.+)') > > > + > > > + target_dir = os.environ['TARGET_DIR'] > > > + > > > + exit_code = 0 > > > + for record in parse_pkg_file_list(args.pkg_list): > > > + if record['pkg'] != args.package: > > > + continue > > > + > > > + fpath = record['file'] > > > + > > > + ignored = False > > > + for i in ignores_re: > > > + if i.match(fpath): > > > + ignored = True > > > + break > > > + if ignored: > > > + continue > > > > We can never get into this if, right?, because there is already a > > break whenever ignored is set to True. > > Yes we can, because the break only applied to the inner-most loop, which > is the one that iterates over the ignores regexps. Ok, I see. > > > Note that I think that performance will be better with a list > > comprehension instead of explicit for's. Something like (untested): > > > > valid_records = [ record for record in parse_pkg_file_list(args.pkg_list) > > if record['pkg'] == args.package > > and not any(ignore_re.match(record['file']) for ignore_re > > in ignores_re) ] > > Sorry, but this is totally illegible to me. :-D Ok, I won't argue about the fact itself. Just, as a reference, small decomposition: list = [ f(x) for x in otherlist ] is a 'list comprehension' and is a performant way to generate a list without requiring a for loop. f(x) can be any action on x, e.g. x.strip() or more complex expressions involving x. The 'for x in otherlist' can be restricted further with 'if' clauses. In the code I gave above, the base list is the return value of 'parse_pkg_file_list(..)' and it is filtered by two clauses. The first one: record['pkg'] == args.package only preserves entries in the base list that match the package we are interested in. This is the equivalent of: if record['pkg'] != args.package: continue and the second clause: not any(ignore_re.match(record['file']) for ignore_re in ignores_re) Here, any(x) is a function that takes an iterable (x) and returns True if any of the items in 'x' evaluate to True. Similar to any(x) there is also all(x) which only returns True if _all_ items in x evaluate to True. The value passed to 'any' is '(ignore_re.match(record['file']) for ignore_re in ignores_re)' which itself is a type of list comprehension, except that it actually is a generator comprehension. 'generator' is what you called an 'iterator' in another patch: a thing that 'yields' values one by one. The difference does not really matter for this discussion, so the base format is still: f(x) for x in some_iterable Here f(x) is ignore_re.match(..) and x is 'ignore_re' itself. So the following code: valid_records = [ record for record in parse_pkg_file_list(args.pkg_list) if record['pkg'] == args.package and not any(ignore_re.match(record['file']) for ignore_re in ignores_re) ] naturally reads as: valid_records is the list of records returned by parse_pkg_file_list for which the 'pkg' field is the one we want, and not any of our ignore expression matches the 'file' field of the record. I hope it makes things a bit more clear :) /Thomas
Thomas DS, All, On 2019-01-08 18:22 +0100, Thomas De Schampheleire spake thusly: > El mar., 8 ene. 2019 a las 17:37, Yann E. MORIN > (<yann.morin.1998@free.fr>) escribió: [--SNIP--] > > > Note that I think that performance will be better with a list > > > comprehension instead of explicit for's. Something like (untested): > > > > > > valid_records = [ record for record in parse_pkg_file_list(args.pkg_list) > > > if record['pkg'] == args.package > > > and not any(ignore_re.match(record['file']) for ignore_re > > > in ignores_re) ] > > > > Sorry, but this is totally illegible to me. > > :-D > > Ok, I won't argue about the fact itself. > Just, as a reference, small decomposition: > > list = [ f(x) for x in otherlist ] > > is a 'list comprehension' and is a performant way to generate a list > without requiring a for loop. > f(x) can be any action on x, e.g. x.strip() or more complex > expressions involving x. Yeah, I know what it is [0], and I can actually decipher the code above with quite a bit of mental processing. But that is it: it's deciphering, not reading. [0] I even use them: https://patchwork.ozlabs.org/patch/1021622/ [--SNIP--] > Here, any(x) is a function that takes an iterable (x) and returns True > if any of the items in 'x' evaluate to True. Ah, I did not know about any() (although I did infer its behaviour from its name!) Thanks! :-) > Similar to any(x) there is also all(x) which only returns True if > _all_ items in x evaluate to True. And is there none() ? ;-) I guess it's "not all()"... > The value passed to 'any' is '(ignore_re.match(record['file']) for > ignore_re in ignores_re)' > which itself is a type of list comprehension, except that it actually > is a generator comprehension. 'generator' is what you called an > 'iterator' in another patch: a thing that 'yields' values one by one. Yep, generator, not iterator. [--SNIP--] > I hope it makes things a bit more clear :) As I said, I can actually decipher it (well, once the indentation if fixed anyway! ;-) ), but still it is less readable for me... And since I would have hard a time justifying it, or would have a hard time maintaining it in the future, I prefer writing (and thus keeping) code that I can work with later. If others find the comprehensions easier to comprehend (pun), then fine, but I would be much less at ease with that. Thanks! :-) Regards, Yann E. MORIN.
El mar., 8 ene. 2019 a las 21:33, Yann E. MORIN (<yann.morin.1998@free.fr>) escribió: > > Thomas DS, All, > > On 2019-01-08 18:22 +0100, Thomas De Schampheleire spake thusly: > > El mar., 8 ene. 2019 a las 17:37, Yann E. MORIN > > (<yann.morin.1998@free.fr>) escribió: > [--SNIP--] > > > > Note that I think that performance will be better with a list > > > > comprehension instead of explicit for's. Something like (untested): > > > > > > > > valid_records = [ record for record in parse_pkg_file_list(args.pkg_list) > > > > if record['pkg'] == args.package > > > > and not any(ignore_re.match(record['file']) for ignore_re > > > > in ignores_re) ] > > > > > > Sorry, but this is totally illegible to me. > > > > :-D > > > > Ok, I won't argue about the fact itself. > > Just, as a reference, small decomposition: > > > > list = [ f(x) for x in otherlist ] > > > > is a 'list comprehension' and is a performant way to generate a list > > without requiring a for loop. > > f(x) can be any action on x, e.g. x.strip() or more complex > > expressions involving x. > > Yeah, I know what it is [0], and I can actually decipher the code above > with quite a bit of mental processing. But that is it: it's deciphering, > not reading. > > [0] I even use them: https://patchwork.ozlabs.org/patch/1021622/ > > [--SNIP--] > > Here, any(x) is a function that takes an iterable (x) and returns True > > if any of the items in 'x' evaluate to True. > > Ah, I did not know about any() (although I did infer its behaviour from > its name!) Thanks! :-) > > > Similar to any(x) there is also all(x) which only returns True if > > _all_ items in x evaluate to True. > > And is there none() ? ;-) I guess it's "not all()"... none() does not exist but if you intend what I understand by its name then it is: 'not any()' > > > The value passed to 'any' is '(ignore_re.match(record['file']) for > > ignore_re in ignores_re)' > > which itself is a type of list comprehension, except that it actually > > is a generator comprehension. 'generator' is what you called an > > 'iterator' in another patch: a thing that 'yields' values one by one. > > Yep, generator, not iterator. > > [--SNIP--] > > I hope it makes things a bit more clear :) > > As I said, I can actually decipher it (well, once the indentation if > fixed anyway! ;-) ), but still it is less readable for me... > > And since I would have hard a time justifying it, or would have a hard > time maintaining it in the future, I prefer writing (and thus keeping) > code that I can work with later. > > If others find the comprehensions easier to comprehend (pun), then fine, > but I would be much less at ease with that. By all means, keep your original code. At least by talking about it, we have learnt each other's views and shared some knowledge :-) Best regards, Thomas
Thomas, All, On 2019-01-08 21:46 +0100, Thomas De Schampheleire spake thusly: > El mar., 8 ene. 2019 a las 21:33, Yann E. MORIN > (<yann.morin.1998@free.fr>) escribió: > > And is there none() ? ;-) I guess it's "not all()"... > none() does not exist but if you intend what I understand by its name > then it is: > 'not any()' Dang right. "not all()" would mean at least one is not True, i.e. "not all(x)" ≡ "any(not x)" [--SNIP--] > > If others find the comprehensions easier to comprehend (pun), then fine, > > but I would be much less at ease with that. > > By all means, keep your original code. > > At least by talking about it, we have learnt each other's views and > shared some knowledge :-) Yes, I guess with writing more python I'd get more used to reading it. And I certainly did appreciate the walkthrough, thanks. :-) BTW, do you intend to join after FOSDEM? Regards, Yann E. MORIN.
El mar., 8 ene. 2019 a las 22:16, Yann E. MORIN (<yann.morin.1998@free.fr>) escribió: > > Thomas, All, > > On 2019-01-08 21:46 +0100, Thomas De Schampheleire spake thusly: > > El mar., 8 ene. 2019 a las 21:33, Yann E. MORIN > > (<yann.morin.1998@free.fr>) escribió: > > > And is there none() ? ;-) I guess it's "not all()"... > > none() does not exist but if you intend what I understand by its name > > then it is: > > 'not any()' > > Dang right. "not all()" would mean at least one is not True, i.e. > "not all(x)" ≡ "any(not x)" > > [--SNIP--] > > > If others find the comprehensions easier to comprehend (pun), then fine, > > > but I would be much less at ease with that. > > > > By all means, keep your original code. > > > > At least by talking about it, we have learnt each other's views and > > shared some knowledge :-) > > Yes, I guess with writing more python I'd get more used to reading it. > And I certainly did appreciate the walkthrough, thanks. :-) > > BTW, do you intend to join after FOSDEM? Normally yes, although I still have to request official permission. Possibly not all three days though, to be confirmed... /Thomas
diff --git a/support/scripts/check-bin-arch b/support/scripts/check-bin-arch index 66b8d89932..d4902163e7 100755 --- a/support/scripts/check-bin-arch +++ b/support/scripts/check-bin-arch @@ -1,92 +1,113 @@ -#!/usr/bin/env bash - -# List of hardcoded paths that should be ignored, as they may -# contain binaries for an architecture different from the -# architecture of the target. -declare -a IGNORES=( - # Skip firmware files, they could be ELF files for other - # architectures - "/lib/firmware" - "/usr/lib/firmware" - - # Skip kernel modules - # When building a 32-bit userland on 64-bit architectures, the kernel - # and its modules may still be 64-bit. To keep the basic - # check-bin-arch logic simple, just skip this directory. - "/lib/modules" - "/usr/lib/modules" - - # Skip files in /usr/share, several packages (qemu, - # pru-software-support) legitimately install ELF binaries that - # are not for the target architecture - "/usr/share" - - # Skip files in /lib/grub, since it is possible to have it - # for a different architecture (e.g. i386 grub on x86_64). - "/lib/grub" -) - -while getopts p:l:r:a:i: OPT ; do - case "${OPT}" in - p) package="${OPTARG}";; - l) pkg_list="${OPTARG}";; - r) readelf="${OPTARG}";; - a) arch_name="${OPTARG}";; - i) - # Ensure we do have single '/' as separators, - # and that we have a leading and a trailing one. - pattern="$(sed -r -e 's:/+:/:g; s:^/*:/:; s:/*$:/:;' <<<"${OPTARG}")" - IGNORES+=("${pattern}") - ;; - :) error "option '%s' expects a mandatory argument\n" "${OPTARG}";; - \?) error "unknown option '%s'\n" "${OPTARG}";; - esac -done - -if test -z "${package}" -o -z "${pkg_list}" -o -z "${readelf}" -o -z "${arch_name}" ; then - echo "Usage: $0 -p <pkg> -l <pkg-file-list> -r <readelf> -a <arch name> [-i PATH ...]" - exit 1 -fi - -exitcode=0 - -# Only split on new lines, for filenames-with-spaces -IFS=" -" - -while read f; do - for ignore in "${IGNORES[@]}"; do - if [[ "${f}" =~ ^"${ignore}" ]]; then - continue 2 - fi - done - - # Skip symlinks. Some symlinks may have absolute paths as - # target, pointing to host binaries while we're building. - if [[ -L "${TARGET_DIR}/${f}" ]]; then - continue - fi - - # Get architecture using readelf. We pipe through 'head -1' so - # that when the file is a static library (.a), we only take - # into account the architecture of the first object file. - arch=$(LC_ALL=C ${readelf} -h "${TARGET_DIR}/${f}" 2>&1 | \ - sed -r -e '/^ Machine: +(.+)/!d; s//\1/;' | head -1) - - # If no architecture found, assume it was not an ELF file - if test "${arch}" = "" ; then - continue - fi - - # Architecture is correct - if test "${arch}" = "${arch_name}" ; then - continue - fi - - printf 'ERROR: architecture for "%s" is "%s", should be "%s"\n' \ - "${f}" "${arch}" "${arch_name}" - - exitcode=1 -done < <( sed -r -e "/^${package},\.(.+)$/!d; s//\1/;" ${pkg_list} ) - -exit ${exitcode} +#!/usr/bin/env python + +import argparse +import os +import re +import subprocess +import sys +from brpkgutil import parse_pkg_file_list as parse_pkg_file_list + + +# List of hardcoded paths that should be ignored, as they may contain +# binaries for an architecture different from the architecture of the +# target +IGNORES = { + # Skip firmware files + # They could be ELF files for other architectures. + '/lib/firmware', + '/usr/lib/firmware', + + # Skip kernel modules + # When building a 32-bit userland on 64-bit architectures, the + # kernel and its modules may still be 64-bit. To keep the basic + # check-bin-arch logic simple, just skip these directories + '/lib/modules', + '/usr/lib/modules', + + # Skip files in /usr/share + # Several packages (qemu, pru-software-support) legitimately install + # ELF binaries that are not for the target architecture + '/usr/share', + + # Skip files in /lib/grub + # It is possible to have it for a different architecture (e.g. i386 + # grub on x86_64) + '/lib/grub', +} + + +ERROR = 'ERROR: architecture for "%s" is "%s", should be "%s"\n' + + +def main(): + parser = argparse.ArgumentParser() + parser.add_argument('--package', '-p', metavar='PACKAGE', required=True) + parser.add_argument('--pkg-list', '-l', metavar='PKG_LIST', required=True) + parser.add_argument('--readelf', '-r', metavar='READELF', required=True) + parser.add_argument('--arch', '-a', metavar='ARCH', required=True) + parser.add_argument('--ignore', '-i', metavar='PATH', action='append') + args = parser.parse_args() + + if args.ignore is not None: + # Ensure we do have single '/' as separators, and that we have a + # leading and a trailing one, then append to the global list. + for pattern in args.ignore: + IGNORES.add(re.sub('/+', '/', '/{0}/'.format(pattern))) + + ignores_re = set() + for i in IGNORES: + ignores_re.add(re.compile(i)) + + arch_re = re.compile('^ Machine: +(.+)') + + target_dir = os.environ['TARGET_DIR'] + + exit_code = 0 + for record in parse_pkg_file_list(args.pkg_list): + if record['pkg'] != args.package: + continue + + fpath = record['file'] + + ignored = False + for i in ignores_re: + if i.match(fpath): + ignored = True + break + if ignored: + continue + + # Skip symlinks. Some symlinks may have absolute paths as + # target, pointing to host binaries while we're building. + if os.path.islink(os.path.join(target_dir, fpath)): + continue + + cmd = [args.readelf, '-h', os.path.join(target_dir, fpath)] + try: + with open(os.devnull, 'wb') as devnull: + p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=devnull, + universal_newlines=True) + stdout = p.communicate()[0].split('\n')[:-1] + ret = p.returncode + except OSError: + ret = 1 + stdout = list() + # If readelf returned in error, or returned nothing on stdout, + # then that was not an ELF file, or we may not yet have readelf + # (e.g. in skeleton, before toolchain) + if ret != 0 or len(stdout) == 0: + continue + + for line in stdout: + if arch_re.match(line): + arch = arch_re.sub(r'\1', line) + if arch != args.arch: + print(ERROR.format(fpath, arch, args.arch)) + exit_code = 1 + break + + return exit_code + + +if __name__ == "__main__": + sys.exit(main())
The existing check-bin-arch script is written in shell, so it can't make use of all the helpers we have in python, especially the parser for the package-files lists. Although this script is relatively clean as it is, it is not totally fool-proof either, especially against weird filenames (e.g. specially-crafted filenames with \n, or \b, etc...). Also, shell scripts are often frowned upon but for the most mundane processing, and this script is definitely not mundane. Finally, shell scripts are slow, as all the processing the have to do is more often than not done by spawning programs, and that is relatively expensive. Rewrite that script in python. This allows to do cleaner processing and reuse the package-files list parser. There's however a drawback: the script grows substantially, in part because of spawning the actual readelf call (there is no ELF parser in the standard python library), and because we want to keep backward compatible with old pythons that lack proper abstractions like subprocess.DEVNULL et al. Signed-off-by: "Yann E. MORIN" <yann.morin.1998@free.fr> Cc: Thomas De Schampheleire <patrickdepinguin@gmail.com> --- support/scripts/check-bin-arch | 205 +++++++++++++++++++++++------------------ 1 file changed, 113 insertions(+), 92 deletions(-)