Message ID | 20220920064550.520645-1-thomas.petazzoni@bootlin.com |
---|---|
State | Changes Requested |
Headers | show |
Series | support/scripts/check-host-libs: add new check on host binaries/libs | expand |
Thomas, All, On 2022-09-20 08:45 +0200, Thomas Petazzoni spake thusly: > One frequent issue in Buildroot is that when building host libraries > or applications, the build system of the package detects some > libraries provided by the system, and happily links to them, without > Buildroot knowing. Sometimes this doesn't cause any problem, but > sometimes this causes issues, and we're regularly eliminating such > mis-detection by forcing those packages to not detect the system > libraries that have not been built by Buildroot. > > The new script check-host-libs added in this commit, which is executed > during the host-finalize step at the end of the build is an attempt at > detecting at least some of these situations. > > What it does is that at the end of the build, it verifies that all > binaries and libraries in $(HOST_DIR) only have shared library > dependencies on libraries that are in Buildroot $(HOST_DIR), to the > exception of the C library, for which we of course use the system C > library. > > For example, if the binary output/host/bin/plop is linked against > libpng, but libpng was not built and installed by Buildroot, the build > will now fail with: > > ERROR: in /home/thomas/projets/buildroot/output/host/bin/plop, libpng16.so.16 unknown > make: *** [Makefile:715: host-finalize] Error 1 I'm afraid this is going to be too big a hammer, at least if that's a hard error. Indeed, when the build environment is reproducible (like, it is a container image), and there are no package in Buildroot, and the host directory will not be distributed (i.e. one does not care about doing an SDK), then it is usually good-enough to use the system-installed libraries, rather than add a Buildroot package. > The script includes an allowlist of libraries provided by the C > library. It is potentially possible that this list might need to be > extended to cover all systems/distributions/C libraries, but only > wider testing of this script will help detect such cases. There are at least missing entries already: librt.so.1 libutil.so.1 libresolv.so.2 Also, there are error messages for some go stuff: ==> HOST_DIR/lib/go/src/debug/elf/testdata/go-relocation-test-gcc930-ranges-no-rela-x86-64 readelf: Error: no .dynamic section in the dynamic segment ==> HOST_DIR/lib/go/src/debug/elf/testdata/go-relocation-test-gcc930-ranges-with-rela-x86-64 readelf: Error: no .dynamic section in the dynamic segment Finally, this script takes more than a minute to run on our build, about a 10% increase from ~12min. This is not nice at all. $ find host/*bin host/lib* -type f |wc -l 16186 $ find host/*bin host/lib* -type f |(keep just libs + execs) |wc -l 339 That last command, though, is what takes time: there is a huge ton of go junk installed in HOST_DIR, and this takes ages to filter-out. So, although I understand the rationale, this should probably be opt-in. Or it should only be ran when doing an SDK? Regards, Yann E. MORIN. > It is worth mentioning that for now this script is executed only once > at the end of the build. This means that if a package A gets built, > detects and uses a system library libfoo and uses it, and then by > chance later Buildroot package B builds and installs libfoo into > HOST_DIR/lib, this script will believe that package A is correct, as > it finds libfoo in HOST_DIR/lib, even though while package A was being > built, the libfoo being detected was the system one. Detecting this > would require running check-host-libs at the end of each package > build, but that would imply re-checking over and over again all host > binaries/libraries, which could have a noticeable impact on the build > time. So for now, we simply check at the end of the build, which > should already help to detect a lot of interesting bogus situations. > > Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com> > --- > It would be very useful if a few people could apply this patch to > their local tree, run their usual build, and see how it behaves. This > way, I can get some feedback to address the most obvious issues before > it gets merged and starts causing build failures in the autobuilders. > --- > Makefile | 1 + > support/scripts/check-host-libs | 36 +++++++++++++++++++++++++++++++++ > 2 files changed, 37 insertions(+) > create mode 100755 support/scripts/check-host-libs > > diff --git a/Makefile b/Makefile > index ec7c034ac1..7ba8ccd535 100644 > --- a/Makefile > +++ b/Makefile > @@ -712,6 +712,7 @@ STAGING_DIR_FILES_LISTS = $(sort $(wildcard $(BUILD_DIR)/*/.files-list-staging.t > host-finalize: $(PACKAGES) $(HOST_DIR) $(HOST_DIR_SYMLINK) > @$(call MESSAGE,"Finalizing host directory") > $(call per-package-rsync,$(sort $(PACKAGES)),host,$(HOST_DIR)) > + ./support/scripts/check-host-libs $(HOST_DIR) > > .PHONY: staging-finalize > staging-finalize: $(STAGING_DIR_SYMLINK) > diff --git a/support/scripts/check-host-libs b/support/scripts/check-host-libs > new file mode 100755 > index 0000000000..ef307bb6dd > --- /dev/null > +++ b/support/scripts/check-host-libs > @@ -0,0 +1,36 @@ > +#!/bin/bash > + > +HOST_DIR=$1 > + > +if test -z "${HOST_DIR}" ; then > + echo "usage: check-host-libs HOST_DIR" > + exit 1 > +fi > + > +bailout="no" > + > +for f in $(find ${HOST_DIR}/*bin ${HOST_DIR}/lib* -type f); do > + mime=$(file -b --mime-type ${f}) > + if test "${mime}" != "application/x-sharedlib" -a \ > + "${mime}" != "application/x-executable" ; then > + continue > + fi > + for lib in $(LC_ALL=C readelf -d ${f} | grep NEEDED | sed 's,.*Shared library: \[\(.*\)\].*,\1,'); do > + case ${lib} in > + libc.so*|libm.so*|libstdc++.so*|libpthread.so*|libgcc_s.so*|libdl.so*|ld-*|libgomp.so*|libcrypt.so*|libcrypto.so*|libatomic.so*) > + continue > + ;; > + *) > + if test -e ${HOST_DIR}/lib/${lib} ; then > + continue > + fi > + echo "ERROR: in ${f}, ${lib} unknown" > + bailout="yes" > + ;; > + esac > + done > +done > + > +if test "${bailout}" = "yes" ; then > + exit 1 > +fi > -- > 2.37.2 > > _______________________________________________ > buildroot mailing list > buildroot@buildroot.org > https://lists.buildroot.org/mailman/listinfo/buildroot
Hello, Thanks for the quick feedback! On Tue, 20 Sep 2022 09:46:24 +0200 <yann.morin@orange.com> wrote: > I'm afraid this is going to be too big a hammer, at least if that's a > hard error. > > Indeed, when the build environment is reproducible (like, it is a > container image), and there are no package in Buildroot, and the host > directory will not be distributed (i.e. one does not care about doing an > SDK), then it is usually good-enough to use the system-installed > libraries, rather than add a Buildroot package. True. > > The script includes an allowlist of libraries provided by the C > > library. It is potentially possible that this list might need to be > > extended to cover all systems/distributions/C libraries, but only > > wider testing of this script will help detect such cases. > > There are at least missing entries already: > librt.so.1 > libutil.so.1 > libresolv.so.2 Thanks, I'll add them. > Also, there are error messages for some go stuff: > ==> HOST_DIR/lib/go/src/debug/elf/testdata/go-relocation-test-gcc930-ranges-no-rela-x86-64 > readelf: Error: no .dynamic section in the dynamic segment > > ==> HOST_DIR/lib/go/src/debug/elf/testdata/go-relocation-test-gcc930-ranges-with-rela-x86-64 > readelf: Error: no .dynamic section in the dynamic segment I guess this can be resolved by simplying sending the readelf error output to oblivion. > Finally, this script takes more than a minute to run on our build, about > a 10% increase from ~12min. This is not nice at all. > > $ find host/*bin host/lib* -type f |wc -l > 16186 > > $ find host/*bin host/lib* -type f |(keep just libs + execs) |wc -l > 339 > > That last command, though, is what takes time: there is a huge ton of go > junk installed in HOST_DIR, and this takes ages to filter-out. Wow, 1 extra minute, this seems a lot. However, I don't understand what you mean by "That last command". What exactly takes time? The fact that we run "file" on zillion of files to filter out files that are not executable/shared libraries? Or the readelf on the filtered files? > So, although I understand the rationale, this should probably be opt-in. > > Or it should only be ran when doing an SDK? I am fine with this being opt-in, but I would not tie it to the SDK, but rather to CI in the autobuilders. Indeed, while locally for your own projects where you might control the build environment (using containers, as you mentioned), in the general situation, Buildroot tries to not use system libraries other than the C library. So having this in the autobuilders would not impose the extra time on users, but would allow us to detect a number of undetected spurious host dependencies. So, it would be a Config.in option, disabled by default. Autobuilders would enable it, and users who are interested by the extra checks could also enable it. Thoughts? Thomas
Thomas, All, On 2022-09-20 10:35 +0200, Thomas Petazzoni spake thusly: > On Tue, 20 Sep 2022 09:46:24 +0200 > <yann.morin@orange.com> wrote: [--SNIP--] > > Finally, this script takes more than a minute to run on our build, about > > a 10% increase from ~12min. This is not nice at all. > > $ find host/*bin host/lib* -type f |(keep just libs + execs) |wc -l > > 339 > > That last command, though, is what takes time: there is a huge ton of go > > junk installed in HOST_DIR, and this takes ages to filter-out. > Wow, 1 extra minute, this seems a lot. However, I don't understand what > you mean by "That last command". What exactly takes time? The fact that > we run "file" on zillion of files to filter out files that are not > executable/shared libraries? Or the readelf on the filtered files? Sorry, I was not explicit enough. Yes, the filtering only, i.e. only running 'file' on all those 16k+ files takes more than a minute: $ date +%s; find host/*bin host/lib* -type f |while read f; do mime=$(file -b --mime-type ${f}) if test "${mime}" != "application/x-sharedlib" -a "${mime}" != "application/x-executable" ; then continue fi printf '%s\n' "${f}" done |wc -l; date +%s 1663659605 339 1663659681 I.e. 76 seconds just to identify the files to actually look at. > > So, although I understand the rationale, this should probably be opt-in. > > > > Or it should only be ran when doing an SDK? > > I am fine with this being opt-in, but I would not tie it to the SDK, > but rather to CI in the autobuilders. Indeed, while locally for your > own projects where you might control the build environment (using > containers, as you mentioned), in the general situation, Buildroot > tries to not use system libraries other than the C library. So having > this in the autobuilders would not impose the extra time on users, but > would allow us to detect a number of undetected spurious host > dependencies. > > So, it would be a Config.in option, disabled by default. Autobuilders > would enable it, and users who are interested by the extra checks could > also enable it. > > Thoughts? Yes, being opt-in is probably the best solution, in the "Build options" -> "Advanced" submenu. Or we can see at optimising it. Overall, I am not a fan of "if it's too slow, don't do it", but I prefer "if it's too slow, make it faster". So, we can start by spawning less processes, then use sed to filter out the result: find ${HOST_DIR}/*bin ${HOST_DIR}/lib* -type f -print 0 \ |xargs -0 -r file --mime-type \ |sed -r -e '/^(.+): application/(x-executable|x-sharedlib)$/!d; s//\1/' \ |while read f; do readelf blabla... done I'll try to find some free CPU cycles to look further into that soon... Regards, Yann E. MORIN.
Howdy, On Tue, 20 Sep 2022 11:11:43 +0200 <yann.morin@orange.com> wrote: > Sorry, I was not explicit enough. Yes, the filtering only, i.e. only > running 'file' on all those 16k+ files takes more than a minute: > > $ date +%s; find host/*bin host/lib* -type f |while read f; do > mime=$(file -b --mime-type ${f}) > if test "${mime}" != "application/x-sharedlib" -a > "${mime}" != "application/x-executable" ; then > continue > fi > printf '%s\n' "${f}" > done |wc -l; date +%s > > 1663659605 > 339 > 1663659681 > > I.e. 76 seconds just to identify the files to actually look at. ACK. > Or we can see at optimising it. Overall, I am not a fan of "if it's too > slow, don't do it", but I prefer "if it's too slow, make it faster". Sure :-) > So, we can start by spawning less processes, then use sed to filter out > the result: > > find ${HOST_DIR}/*bin ${HOST_DIR}/lib* -type f -print 0 \ > |xargs -0 -r file --mime-type \ > |sed -r -e '/^(.+): application/(x-executable|x-sharedlib)$/!d; s//\1/' \ > |while read f; do > readelf blabla... > done If it's really running "file" on a zillion files that is slow, I'm not sure how this can really improve the performance: it still runs "file" on zillion files. Also, I initially had some constructs with a ... | while read f, but the variables in the sub-shell where not visible outside, so the "bailout" variable didn't work. An alternative would be to try to do this in Python and use parallelization. Thomas
Thomas, All, On 2022-09-20 11:19 +0200, Thomas Petazzoni spake thusly: > On Tue, 20 Sep 2022 11:11:43 +0200 > <yann.morin@orange.com> wrote: [--SNIP--] > > So, we can start by spawning less processes, then use sed to filter out > > the result: > > > > find ${HOST_DIR}/*bin ${HOST_DIR}/lib* -type f -print 0 \ > > |xargs -0 -r file --mime-type \ > > |sed -r -e '/^(.+): application/(x-executable|x-sharedlib)$/!d; s//\1/' \ > > |while read f; do > > readelf blabla... > > done > > If it's really running "file" on a zillion files that is slow, I'm not > sure how this can really improve the performance: it still runs "file" > on zillion files. That's because in your script, 'file' reads in DB for every file, while the above only loads the DB for each set of a lot of files at once. So, using the above to just list he ELF files, takes just 30s instead of 76s with your construct. It's still a lot, but it is less than half the time already. > Also, I initially had some constructs with a ... | while read f, but > the variables in the sub-shell where not visible outside, so the > "bailout" variable didn't work. > > An alternative would be to try to do this in Python and use > parallelization. What is the policy on requiring a pyhon interpreter on the host for mandatory Buildroot infra? In my experience, doing things in python rather than in shell, does indeed speed up things quite substantially. Regards, Yann E. MORIN.
On Tue, 20 Sep 2022 11:40:51 +0200 <yann.morin@orange.com> wrote: > That's because in your script, 'file' reads in DB for every file, while > the above only loads the DB for each set of a lot of files at once. Ah, yes, indeed. Another possibility is to not use "file" and just look at the first 4 bytes of the files to identify ELF files, because that's really what matters. > What is the policy on requiring a pyhon interpreter on the host for > mandatory Buildroot infra? In my experience, doing things in python > rather than in shell, does indeed speed up things quite substantially. We have indeed dropped python on the host as a requirement for the build some time ago. Probably doesn't make sense to reintroduce this requirement "just" for this. Even though practically speaking, a lot of the tooling around Buildroot (pkg-stats, graphs, etc.) already relies on Python, and most people are very likely to have Python installed anyway. Thomas
On 20/09/2022 11:54, Thomas Petazzoni wrote: > On Tue, 20 Sep 2022 11:40:51 +0200 > <yann.morin@orange.com> wrote: > >> That's because in your script, 'file' reads in DB for every file, while >> the above only loads the DB for each set of a lot of files at once. > > Ah, yes, indeed. > > Another possibility is to not use "file" and just look at the first 4 > bytes of the files to identify ELF files, because that's really what > matters. I was thinking the same thing. date +%s find host/*bin host/lib* -type f |while read f; do mime=$(file -b --mime-type ${f}) if test "${mime}" != "application/x-sharedlib" -a "${mime}" != "application/x-executable" ; then continue fi printf '%s\n' "${f}" done | wc -l date +%s 1663702163 69 1663702230 date +%s find host/*bin host/lib* -type f |while read f; do if printf '\x7f\x45\x4c\x46' | cmp -s -n 4 - ${f}; then printf '%s\n' "${f}" fi done | wc -l date +%s 1663702312 111 1663702338 For comparison, just iterating over all files: date +%s find host/*bin host/lib* -type f |while read f; do : done | wc -l date +%s 1663702472 0 1663702473 So: - cmp of just 4 bytes still takes 26 seconds, but that's a big improvement over the original 67 seconds. Still it seems to be a bit long for just 13000 files - that's only 500 files per second... - cmp also finds object files (mime-type application/x-object), and Go happens to dump a lot of those in lib/go/debug. This may cause the subsequent readelf to take a lot of time again. >> What is the policy on requiring a pyhon interpreter on the host for >> mandatory Buildroot infra? In my experience, doing things in python >> rather than in shell, does indeed speed up things quite substantially. > > We have indeed dropped python on the host as a requirement for the > build some time ago. Probably doesn't make sense to reintroduce this > requirement "just" for this. Even though practically speaking, a lot of > the tooling around Buildroot (pkg-stats, graphs, etc.) already relies > on Python, and most people are very likely to have Python installed > anyway. I hate to say this, but we do still have perl as a dependency... If you want, I can ask my colleague Hugo to whip something up. But I don't expect you actually want that :-) No, seriously, what he writes is really readable. Still perl though. Regards, Arnout
Arnout, All, On 2022-09-20 21:48 +0200, Arnout Vandecappelle spake thusly: > On 20/09/2022 11:54, Thomas Petazzoni wrote: > >Another possibility is to not use "file" and just look at the first 4 > >bytes of the files to identify ELF files, because that's really what > >matters. > I was thinking the same thing. [--SNIP--] > date +%s > find host/*bin host/lib* -type f |while read f; do > if printf '\x7f\x45\x4c\x46' | cmp -s -n 4 - ${f}; then > printf '%s\n' "${f}" > fi > done | wc -l > date +%s > > 1663702312 > 111 > 1663702338 [--SNIP--] > - cmp of just 4 bytes still takes 26 seconds, but that's a big improvement > over the original 67 seconds. Still it seems to be a bit long for just 13000 > files - that's only 500 files per second... In my case, it takes 62s, i.e. roughly the same amount of time the one-by-one mimetype check does (76s). So, the many mimetype checks at once are much faster here (30s). Anyway, indeed, there is nothing in shell that can beat a little python script: $ cat foo #!/usr/bin/env python3 import os import sys def main(): for d in sys.argv: if os.path.islink(d): continue for dirpath, _, filenames in os.walk(d): for f in filenames: path = os.path.join(dirpath, f) if os.path.islink(path): continue with open(path, 'rb') as fd: blob = fd.read(4) if blob == b'\x7f\x45\x4c\x46': print(path) if __name__ == "__main__": main() $ date +%s; ./foo host/*bin host/lib* |wc -l; date +%s 1663743040 382 1663743040 Yes, less than one second. And it identifies exactly the same set of files the cmp shell snippet does (of course, both on a cache-hot dir). So, I would be happy that we add a python interpreter to the required dependencies. We can be conservative and require just 2.7 if we really want to be able to build on older distros. Nowadays, virtually everyone has a python interpreter on their machine... There still is a little issue, though, is how we can parse an ELF file in python. It is not entirely trivial, but I already did that in some other place (and the license is compatible with that of Buildroot): https://github.com/Orange-OpenSource/aa-scan3/blob/master/aa_scan3/plugins/elf.py#L63 And that proved to be much, *much* faster than doing it in shell (at least an order of magnitude). Only issue: it is not in the python stdlib... So, we could delegate to a python helper like the little snippet above, to search for ELF files, and keep using a shell script to to the rest. In my experience with Thomas' initial patch, what realyl took time was running the search, and the readelf part was mostly unnoticeable, so probably around a few seconds. I'll try to do some timing today. Regards, Yann E. MORIN. > - cmp also finds object files (mime-type application/x-object), and Go > happens to dump a lot of those in lib/go/debug. This may cause the > subsequent readelf to take a lot of time again. > > > >>What is the policy on requiring a pyhon interpreter on the host for > >>mandatory Buildroot infra? In my experience, doing things in python > >>rather than in shell, does indeed speed up things quite substantially. > > > >We have indeed dropped python on the host as a requirement for the > >build some time ago. Probably doesn't make sense to reintroduce this > >requirement "just" for this. Even though practically speaking, a lot of > >the tooling around Buildroot (pkg-stats, graphs, etc.) already relies > >on Python, and most people are very likely to have Python installed > >anyway. > > I hate to say this, but we do still have perl as a dependency... If you > want, I can ask my colleague Hugo to whip something up. But I don't expect > you actually want that :-) No, seriously, what he writes is really readable. > Still perl though. > > > Regards, > Arnout >
From: Arnout Vandecappelle > Sent: 20 September 2022 20:49 ... > date +%s > find host/*bin host/lib* -type f |while read f; do > if printf '\x7f\x45\x4c\x46' | cmp -s -n 4 - ${f}; then > printf '%s\n' "${f}" > fi > done | wc -l > date +%s > > 1663702312 > 111 > 1663702338 Think a bit further and you can remove the exec of cmp, eg: $ time find /usr/lib -type f | (ELF="$(printf '\x7f\x45\x4c\x46')"; while read f; do read l <"$f" && [ "${l#$ELF}" != "$l" ] && echo $f; done )| wc -l That takes a few seconds on my system. I gave up waiting for the one that runs 'cmp' to finish. David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
David, All, On 2022-09-21 08:23 +0000, David Laight spake thusly: > From: Arnout Vandecappelle > > Sent: 20 September 2022 20:49 > ... > > date +%s > > find host/*bin host/lib* -type f |while read f; do > > if printf '\x7f\x45\x4c\x46' | cmp -s -n 4 - ${f}; then > > printf '%s\n' "${f}" > > fi > > done | wc -l > > date +%s > > > > 1663702312 > > 111 > > 1663702338 > > Think a bit further and you can remove the exec of cmp, eg: > > $ time find /usr/lib -type f | (ELF="$(printf '\x7f\x45\x4c\x46')"; while read f; do read l <"$f" && [ "${l#$ELF}" != "$l" ] && echo $f; done )| wc -l Nice trick. That took ~3s on my host/ dir. And then, I think we can do simpler and more obvious (we know we have bash, so we can use the bashisms 'read -N'): read -N 4 magic <"${f}" if [ "${magic}" = "${ELF}" ]; then printf "%s\n" "${f}"; fi Now it's about ~1s, and that yields the same list as cmp yields. Regards, Yann E. MORIN.
Hello, On Wed, 21 Sep 2022 08:23:54 +0000 David Laight <David.Laight@ACULAB.COM> wrote: > Think a bit further and you can remove the exec of cmp, eg: > > $ time find /usr/lib -type f | (ELF="$(printf '\x7f\x45\x4c\x46')"; while read f; do read l <"$f" && [ "${l#$ELF}" != "$l" ] && echo $f; done )| wc -l Thanks for the input and proposal. Isn't read l <"$f" going to read each file entirely in memory? > That takes a few seconds on my system. On my system, your command took 19 seconds. When doing this kind of test, make sure to drop the contents of the page cache first by doing: echo 3 > /proc/sys/vm/drop_caches Before each experiment. On my machine, with the page cache contents dropped, it takes 19-20 seconds to run your command the first time. Subsequent runs only last 12 seconds. Thomas
Thomas, All, On 2022-09-21 11:05 +0200, Thomas Petazzoni spake thusly: > On Wed, 21 Sep 2022 08:23:54 +0000 > David Laight <David.Laight@ACULAB.COM> wrote: > > Think a bit further and you can remove the exec of cmp, eg: > > $ time find /usr/lib -type f | (ELF="$(printf '\x7f\x45\x4c\x46')"; while read f; do read l <"$f" && [ "${l#$ELF}" != "$l" ] && echo $f; done )| wc -l > Thanks for the input and proposal. Isn't read l <"$f" going to read > each file entirely in memory? It should read a single "line", i.e. until the first \n, which can still be quite a lot for binary files, yes. Hence my followup tweak that reads only 4 bytes. > > That takes a few seconds on my system. > On my system, your command took 19 seconds. When doing this kind of > test, make sure to drop the contents of the page cache first by doing: > > echo 3 > /proc/sys/vm/drop_caches > > Before each experiment. On my machine, with the page cache contents > dropped, it takes 19-20 seconds to run your command the first time. > Subsequent runs only last 12 seconds. But usually, that will run just after the rsyncs to aggregate the host dir, so everything will be cache-hot, which is what we care about... Regards, Yann E. MORIN.
From: Thomas Petazzoni > Sent: 21 September 2022 10:05 > > On Wed, 21 Sep 2022 08:23:54 +0000 > David Laight <David.Laight@ACULAB.COM> wrote: > > > Think a bit further and you can remove the exec of cmp, eg: > > > > $ time find /usr/lib -type f | (ELF="$(printf '\x7f\x45\x4c\x46')"; while read f; do read l <"$f" && > [ "${l#$ELF}" != "$l" ] && echo $f; done )| wc -l > > Thanks for the input and proposal. Isn't read l <"$f" going to read > each file entirely in memory? Probably until the first '\n' - which might be quite a lot! But usually the fork+exec is what kills shell script performance. (Avoiding cmp/expr on a SYSV shell took lateral thought.) > > That takes a few seconds on my system. > > On my system, your command took 19 seconds. When doing this kind of > test, make sure to drop the contents of the page cache first by doing: > > echo 3 > /proc/sys/vm/drop_caches > > Before each experiment. On my machine, with the page cache contents > dropped, it takes 19-20 seconds to run your command the first time. > Subsequent runs only last 12 seconds. I suspect that slows down the find as well. The file reads may be small compared to the directory traversal. In any case the 'read -N 4' will stop the shell reading more than one page. Which should get the whole thing about as fast as it is ever going to be. (Although bash isn't exactly the most spritely shell.) David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
Thomas, All, On 2022-09-20 08:45 +0200, Thomas Petazzoni spake thusly: > One frequent issue in Buildroot is that when building host libraries > or applications, the build system of the package detects some > libraries provided by the system, and happily links to them, without > Buildroot knowing. Sometimes this doesn't cause any problem, but > sometimes this causes issues, and we're regularly eliminating such > mis-detection by forcing those packages to not detect the system > libraries that have not been built by Buildroot. > > The new script check-host-libs added in this commit, which is executed > during the host-finalize step at the end of the build is an attempt at > detecting at least some of these situations. [--SNIP--] There was some feedback on that script: it is way too slow, but it can be made much faster with some shell tricks. So, I've marked it as changes-requested in patchwork, now. Regards, Yann E. MORIN.
diff --git a/Makefile b/Makefile index ec7c034ac1..7ba8ccd535 100644 --- a/Makefile +++ b/Makefile @@ -712,6 +712,7 @@ STAGING_DIR_FILES_LISTS = $(sort $(wildcard $(BUILD_DIR)/*/.files-list-staging.t host-finalize: $(PACKAGES) $(HOST_DIR) $(HOST_DIR_SYMLINK) @$(call MESSAGE,"Finalizing host directory") $(call per-package-rsync,$(sort $(PACKAGES)),host,$(HOST_DIR)) + ./support/scripts/check-host-libs $(HOST_DIR) .PHONY: staging-finalize staging-finalize: $(STAGING_DIR_SYMLINK) diff --git a/support/scripts/check-host-libs b/support/scripts/check-host-libs new file mode 100755 index 0000000000..ef307bb6dd --- /dev/null +++ b/support/scripts/check-host-libs @@ -0,0 +1,36 @@ +#!/bin/bash + +HOST_DIR=$1 + +if test -z "${HOST_DIR}" ; then + echo "usage: check-host-libs HOST_DIR" + exit 1 +fi + +bailout="no" + +for f in $(find ${HOST_DIR}/*bin ${HOST_DIR}/lib* -type f); do + mime=$(file -b --mime-type ${f}) + if test "${mime}" != "application/x-sharedlib" -a \ + "${mime}" != "application/x-executable" ; then + continue + fi + for lib in $(LC_ALL=C readelf -d ${f} | grep NEEDED | sed 's,.*Shared library: \[\(.*\)\].*,\1,'); do + case ${lib} in + libc.so*|libm.so*|libstdc++.so*|libpthread.so*|libgcc_s.so*|libdl.so*|ld-*|libgomp.so*|libcrypt.so*|libcrypto.so*|libatomic.so*) + continue + ;; + *) + if test -e ${HOST_DIR}/lib/${lib} ; then + continue + fi + echo "ERROR: in ${f}, ${lib} unknown" + bailout="yes" + ;; + esac + done +done + +if test "${bailout}" = "yes" ; then + exit 1 +fi
One frequent issue in Buildroot is that when building host libraries or applications, the build system of the package detects some libraries provided by the system, and happily links to them, without Buildroot knowing. Sometimes this doesn't cause any problem, but sometimes this causes issues, and we're regularly eliminating such mis-detection by forcing those packages to not detect the system libraries that have not been built by Buildroot. The new script check-host-libs added in this commit, which is executed during the host-finalize step at the end of the build is an attempt at detecting at least some of these situations. What it does is that at the end of the build, it verifies that all binaries and libraries in $(HOST_DIR) only have shared library dependencies on libraries that are in Buildroot $(HOST_DIR), to the exception of the C library, for which we of course use the system C library. For example, if the binary output/host/bin/plop is linked against libpng, but libpng was not built and installed by Buildroot, the build will now fail with: ERROR: in /home/thomas/projets/buildroot/output/host/bin/plop, libpng16.so.16 unknown make: *** [Makefile:715: host-finalize] Error 1 The script includes an allowlist of libraries provided by the C library. It is potentially possible that this list might need to be extended to cover all systems/distributions/C libraries, but only wider testing of this script will help detect such cases. It is worth mentioning that for now this script is executed only once at the end of the build. This means that if a package A gets built, detects and uses a system library libfoo and uses it, and then by chance later Buildroot package B builds and installs libfoo into HOST_DIR/lib, this script will believe that package A is correct, as it finds libfoo in HOST_DIR/lib, even though while package A was being built, the libfoo being detected was the system one. Detecting this would require running check-host-libs at the end of each package build, but that would imply re-checking over and over again all host binaries/libraries, which could have a noticeable impact on the build time. So for now, we simply check at the end of the build, which should already help to detect a lot of interesting bogus situations. Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com> --- It would be very useful if a few people could apply this patch to their local tree, run their usual build, and see how it behaves. This way, I can get some feedback to address the most obvious issues before it gets merged and starts causing build failures in the autobuilders. --- Makefile | 1 + support/scripts/check-host-libs | 36 +++++++++++++++++++++++++++++++++ 2 files changed, 37 insertions(+) create mode 100755 support/scripts/check-host-libs