Message ID | 91348c3073b0dcf57a8ea3227bae74c23a122713.1641663315.git.yann.morin.1998@free.fr |
---|---|
State | Changes Requested |
Headers | show |
Series | core: fixup PPD paths (branch yem/ppd-fixup-paths) | expand |
On Sat, 8 Jan 2022 18:35:41 +0100 "Yann E. MORIN" <yann.morin.1998@free.fr> wrote: > Some files contain hard-coded absolute paths that point to the host > and/or staging directories. > > With per-package directories (aka. PPD), these paths point to the PPD > of the package that created the files, when we want them to point to the > PPD of the package that uses them. > > Up until now, we had two hooks that attempted to fix those files: > > - a libtool-specific hook that searches for all .la files and seds > them with the proper PPD, > > - a python-specific hook that tweaks just the sysconfigdata and > removes the byte-compiled version of the sysconfigdata. > > But now, we also have a few other kinds of files for which we need to > fix the PPD: .cmake, .pc, or .pri files, and probably a bunch of others > as well. > > We solve this issue by just replacing any PPD in text files, with the > current package's PPD. > > This is very similar to, and inspired from what is done when relocating > the SDK. However, we can't use the existing relocate-sdk script, because > that needs to know the original location, which we do not have when we > aggregate the PPD (we could store it, but we can easily do without it). > > Signed-off-by: Yann E. MORIN <yann.morin.1998@free.fr> > Cc: Thomas Petazzoni <thomas.petazzoni@bootlin.com> > Cc: Arnout Vandecappelle (Essensium/Mind) <arnout@mind.be> > Cc: Adam Duskett <aduskett@gmail.com> > Cc: Louis-Paul CORDIER <lpdev@cordier.org> > Cc: Andreas Naumann <anaumann@ultratronik.de> > --- > package/pkg-generic.mk | 43 +++++++++++++++++++++--------------------- > 1 file changed, 21 insertions(+), 22 deletions(-) > [...] Acked-by: Herve Codina <herve.codina@bootlin.com> Best regards, Hervé
On 08/01/2022 18:35, Yann E. MORIN wrote: > Some files contain hard-coded absolute paths that point to the host > and/or staging directories. > > With per-package directories (aka. PPD), these paths point to the PPD > of the package that created the files, when we want them to point to the > PPD of the package that uses them. > > Up until now, we had two hooks that attempted to fix those files: > > - a libtool-specific hook that searches for all .la files and seds > them with the proper PPD, > > - a python-specific hook that tweaks just the sysconfigdata and > removes the byte-compiled version of the sysconfigdata. > > But now, we also have a few other kinds of files for which we need to > fix the PPD: .cmake, .pc, or .pri files, and probably a bunch of others > as well. > > We solve this issue by just replacing any PPD in text files, with the > current package's PPD. > > This is very similar to, and inspired from what is done when relocating > the SDK. However, we can't use the existing relocate-sdk script, because > that needs to know the original location, which we do not have when we > aggregate the PPD (we could store it, but we can easily do without it). Looking at the resulting code, it's so small that it's not worth factoring out. > > Signed-off-by: Yann E. MORIN <yann.morin.1998@free.fr> > Cc: Thomas Petazzoni <thomas.petazzoni@bootlin.com> > Cc: Arnout Vandecappelle (Essensium/Mind) <arnout@mind.be> > Cc: Adam Duskett <aduskett@gmail.com> > Cc: Louis-Paul CORDIER <lpdev@cordier.org> > Cc: Andreas Naumann <anaumann@ultratronik.de> > --- > package/pkg-generic.mk | 43 +++++++++++++++++++++--------------------- > 1 file changed, 21 insertions(+), 22 deletions(-) > > diff --git a/package/pkg-generic.mk b/package/pkg-generic.mk > index 6a5fe5507b..1022062bcf 100644 > --- a/package/pkg-generic.mk > +++ b/package/pkg-generic.mk > @@ -90,21 +90,24 @@ endif > ####################################### > # Helper functions > > -# Make sure .la files only reference the current per-package > -# directory. > - > -# $1: package name (lower case) > -# $2: staging directory of the package > ifeq ($(BR2_PER_PACKAGE_DIRECTORIES),y) > -define fixup-libtool-files > - $(Q)find $(2) \( -path '$(2)/lib*' -o -path '$(2)/usr/lib*' \) \ > - -name "*.la" -print0 | xargs -0 --no-run-if-empty \ > - $(SED) "s:$(PER_PACKAGE_DIR)/[^/]\+/:$(PER_PACKAGE_DIR)/$(1)/:g" > + > +# Ensure files like .la, .pc, .pri, .cmake, and so on, point to the > +# proper staging and host directories for the current package: find > +# all text files that contain the PPD root, and replace it with the > +# current package's PPD. > +define PPD_FIXUP_PATHS > + $(Q)grep -lrZ '$(PER_PACKAGE_DIR)/[^/]\+/' $(HOST_DIR) \ This will trawl to large binary files and potentially take a long time... > + |while read -d '' f; do \ > + file -b --mime-type "$${f}" | grep -q '^text/' || continue; \ ... just to be ignored here. More importantly though: if a file is a symlink, it's going to be hit twice. Worse, if it's a symlink to an absolute path (which most likely points *outside* of STAGING_DIR), we may end up sed'ing something on the host... I notice now that the same (theoretical) issue exists in relocate-sdk.sh. Obviously that script doesn't get thoroughly tested so it may very well be the wrong thing to do... Do you remember perhaps why we don't simply do find $(HOST_DIR) -type f -print0 \ ? I was going to say that we can skip the grep, but then we're back at the large (text) file thing. > + printf '%s\0' "$${f}"; \ Why not do the sed right here, like is done in relocate-sdk.sh? In fact, I'd keep the code as close as possible to relocate-sdk.sh to make later refactoring easier. > + done \ > + |xargs -0 --no-run-if-empty \ > + $(SED) 's:$(PER_PACKAGE_DIR)/[^/]\+/:$(PER_PACKAGE_DIR)/$($(PKG)_NAME)/:g' > endef > -endif > > -# Make sure python _sysconfigdata*.py files only reference the current > -# per-package directory. > +# Remove python's pre-compiled "sysconfigdata", as it may contain paths to > +# the original staging or host dirs. > # > # Can't use $(foreach d, $(HOST_DIR)/lib/python* $(STAGING_DIR)/usr/lib/python*, ...) > # because those directories may be created in the same recipe this macro will > @@ -113,19 +116,15 @@ endif > # fail. > # So we just use HOST_DIR as a starting point, and filter on the two directories > # of interest. > -ifeq ($(BR2_PER_PACKAGE_DIRECTORIES),y) > define FIXUP_PYTHON_SYSCONFIGDATA Maybe rename to REMOVE_PYTHON_SYSCONFIGATA_PYC Regards, Arnout > $(Q)find $(HOST_DIR) \ > \( -path '$(HOST_DIR)/lib/python*' \ > -o -path '$(STAGING_DIR)/usr/lib/python*' \ > \) \ > - \( \( -name "_sysconfigdata*.pyc" -delete \) \ > - -o \( -name "_sysconfigdata*.py" -print0 \) \ > - \) \ > - | xargs -0 --no-run-if-empty \ > - $(SED) 's:$(PER_PACKAGE_DIR)/[^/]\+/:$(PER_PACKAGE_DIR)/$($(PKG)_NAME)/:g' > + \( -name "_sysconfigdata*.pyc" -delete \) > endef > -endif > + > +endif # PPD > > # Functions to collect statistics about installed files > > @@ -278,8 +277,6 @@ $(BUILD_DIR)/%/.stamp_configured: > @$(call pkg_size_before,$(STAGING_DIR),-staging) > @$(call pkg_size_before,$(BINARIES_DIR),-images) > @$(call pkg_size_before,$(HOST_DIR),-host) > - $(call fixup-libtool-files,$(NAME),$(HOST_DIR)) > - $(call fixup-libtool-files,$(NAME),$(STAGING_DIR)) > $(foreach hook,$($(PKG)_POST_PREPARE_HOOKS),$(call $(hook))$(sep)) > $(foreach hook,$($(PKG)_PRE_CONFIGURE_HOOKS),$(call $(hook))$(sep)) > $($(PKG)_CONFIGURE_CMDS) > @@ -836,7 +833,9 @@ $(2)_EXTRACT_CMDS ?= \ > $$(TAR_OPTIONS) -) > > # pre/post-steps hooks > -$(2)_POST_PREPARE_HOOKS += FIXUP_PYTHON_SYSCONFIGDATA > +$(2)_POST_PREPARE_HOOKS += \ > + PPD_FIXUP_PATHS \ > + FIXUP_PYTHON_SYSCONFIGDATA > > ifeq ($$($(2)_TYPE),target) > ifneq ($$(HOST_$(2)_KCONFIG_VAR),) >
Arnout, All, On 2022-01-12 21:42 +0100, Arnout Vandecappelle spake thusly: > On 08/01/2022 18:35, Yann E. MORIN wrote: > >Some files contain hard-coded absolute paths that point to the host > >and/or staging directories. > > > >With per-package directories (aka. PPD), these paths point to the PPD > >of the package that created the files, when we want them to point to the > >PPD of the package that uses them. [--SNIP--] > >+define PPD_FIXUP_PATHS > >+ $(Q)grep -lrZ '$(PER_PACKAGE_DIR)/[^/]\+/' $(HOST_DIR) \ > This will trawl to large binary files and potentially take a long time... I too thought of that, and I even discussed it slightly with Peter K. here about what would be most efficient: grep-then-file, or file-then-grep? But if we add --binary-files=without-match to grep, then it will bail-out early on binary files, which would be more efficient. And then we still pass that through file to really get only text files. But a text-file with utf8 is really a binary file, so would grep ignore it? > >+ |while read -d '' f; do \ > >+ file -b --mime-type "$${f}" | grep -q '^text/' || continue; \ > ... just to be ignored here. Again, not sure, because text/plain but utf8-encoded? > More importantly though: if a file is a symlink, it's going to be hit > twice. No, I don't think so: $ man grep [..] -r, --recursive Read all files under each directory, recursively, following symbolic links only if they are on the command line. [...] And that seems to be the case: $ echo Pouet >foo $ ln -s foo bar $ grep -r Pouet . foo:Pouet So, seems OK to me? > Worse, if it's a symlink to an absolute path (which most likely > points *outside* of STAGING_DIR), we may end up sed'ing something on the > host... > I notice now that the same (theoretical) issue exists in relocate-sdk.sh. > Obviously that script doesn't get thoroughly tested so it may very well be > the wrong thing to do... > > Do you remember perhaps why we don't simply do > > find $(HOST_DIR) -type f -print0 \ Maybe we should. But grep -r was an elegant way to combine the recursive feature of find, and limit to the matching files at the same time... I can try to time things to see what's the fastest / less-slow... But given that the symlinks are not an issue in practice, so we care? > ? I was going to say that we can skip the grep, but then we're back at the > large (text) file thing. > > >+ printf '%s\0' "$${f}"; \ > > Why not do the sed right here, like is done in relocate-sdk.sh? In fact, > I'd keep the code as close as possible to relocate-sdk.sh to make later > refactoring easier. Using xarg allows to spawn only a few sed. printf is a shell built-in so it's basically free. [--SNIP--] > > define FIXUP_PYTHON_SYSCONFIGDATA > Maybe rename to REMOVE_PYTHON_SYSCONFIGATA_PYC OK Regards, Yann E. MORIN.
Arnout, All, On 2022-01-13 22:58 +0100, Yann E. MORIN spake thusly: > On 2022-01-12 21:42 +0100, Arnout Vandecappelle spake thusly: > > On 08/01/2022 18:35, Yann E. MORIN wrote: > > >Some files contain hard-coded absolute paths that point to the host > > >and/or staging directories. > > > > > >With per-package directories (aka. PPD), these paths point to the PPD > > >of the package that created the files, when we want them to point to the > > >PPD of the package that uses them. > [--SNIP--] > > >+define PPD_FIXUP_PATHS > > >+ $(Q)grep -lrZ '$(PER_PACKAGE_DIR)/[^/]\+/' $(HOST_DIR) \ > > This will trawl to large binary files and potentially take a long time... > > I too thought of that, and I even discussed it slightly with Peter K. > here about what would be most efficient: grep-then-file, or file-then-grep? > > But if we add --binary-files=without-match to grep, then it will > bail-out early on binary files, which would be more efficient. And then > we still pass that through file to really get only text files. > > But a text-file with utf8 is really a binary file, so would grep ignore > it? So I tried and created a simple file with arbitrary non-ASCII UTF-8 symbols (I just copy-pasted the chinese Wikipedia home page, arbitarily wrapped for "readability"): $ cat foo 英格兰足球超级联赛金手套奖是英格兰足球超级联赛的一个奖项, 每年颁发一次,颁给在当赛季英超联赛中完成最多次零封的守门员。 在足球中,零封指的是守门员在一整场比赛中保持不失球。2005年, 英超联赛第一次颁发金手套奖,切尔西的彼得·切赫获奖,切赫在 2004-05赛季完成了24场零封,至今仍是英超联赛的记录。切赫和乔· 哈特各赢得过四次金手套奖,并列获该奖次数最多的球员,切赫还是 至今唯一一位为两家俱乐部(切尔西、阿森纳)效力时都获得金手套 奖的球员。佩佩·雷纳则是第一个能够连续获得该奖的球员,他在2006 至2008年连续三次获得英超金手套。乔·哈特则在2011-2013年追平了 这一成就。 $ file -b --mime-type foo text/plain $ grep --binary-files=without-match 是英超 foo 2004-05赛季完成了24场零封,至今仍是英超联赛的记录。切赫和乔· So, a text file with utf-8 chars is still recognied as a text file, and grep will also not recognise it as a binary file. So we can tell grep to bail out early on binary files, to speed up the stuff. I'd still keep the 'file' test as a further safety net, though. So, grep does not follow symlinks, and we can can speed grep by instructing it to skip binary files. As such, I think all your issues have been addressed now. Regards, Yann E. MORIN.
diff --git a/package/pkg-generic.mk b/package/pkg-generic.mk index 6a5fe5507b..1022062bcf 100644 --- a/package/pkg-generic.mk +++ b/package/pkg-generic.mk @@ -90,21 +90,24 @@ endif ####################################### # Helper functions -# Make sure .la files only reference the current per-package -# directory. - -# $1: package name (lower case) -# $2: staging directory of the package ifeq ($(BR2_PER_PACKAGE_DIRECTORIES),y) -define fixup-libtool-files - $(Q)find $(2) \( -path '$(2)/lib*' -o -path '$(2)/usr/lib*' \) \ - -name "*.la" -print0 | xargs -0 --no-run-if-empty \ - $(SED) "s:$(PER_PACKAGE_DIR)/[^/]\+/:$(PER_PACKAGE_DIR)/$(1)/:g" + +# Ensure files like .la, .pc, .pri, .cmake, and so on, point to the +# proper staging and host directories for the current package: find +# all text files that contain the PPD root, and replace it with the +# current package's PPD. +define PPD_FIXUP_PATHS + $(Q)grep -lrZ '$(PER_PACKAGE_DIR)/[^/]\+/' $(HOST_DIR) \ + |while read -d '' f; do \ + file -b --mime-type "$${f}" | grep -q '^text/' || continue; \ + printf '%s\0' "$${f}"; \ + done \ + |xargs -0 --no-run-if-empty \ + $(SED) 's:$(PER_PACKAGE_DIR)/[^/]\+/:$(PER_PACKAGE_DIR)/$($(PKG)_NAME)/:g' endef -endif -# Make sure python _sysconfigdata*.py files only reference the current -# per-package directory. +# Remove python's pre-compiled "sysconfigdata", as it may contain paths to +# the original staging or host dirs. # # Can't use $(foreach d, $(HOST_DIR)/lib/python* $(STAGING_DIR)/usr/lib/python*, ...) # because those directories may be created in the same recipe this macro will @@ -113,19 +116,15 @@ endif # fail. # So we just use HOST_DIR as a starting point, and filter on the two directories # of interest. -ifeq ($(BR2_PER_PACKAGE_DIRECTORIES),y) define FIXUP_PYTHON_SYSCONFIGDATA $(Q)find $(HOST_DIR) \ \( -path '$(HOST_DIR)/lib/python*' \ -o -path '$(STAGING_DIR)/usr/lib/python*' \ \) \ - \( \( -name "_sysconfigdata*.pyc" -delete \) \ - -o \( -name "_sysconfigdata*.py" -print0 \) \ - \) \ - | xargs -0 --no-run-if-empty \ - $(SED) 's:$(PER_PACKAGE_DIR)/[^/]\+/:$(PER_PACKAGE_DIR)/$($(PKG)_NAME)/:g' + \( -name "_sysconfigdata*.pyc" -delete \) endef -endif + +endif # PPD # Functions to collect statistics about installed files @@ -278,8 +277,6 @@ $(BUILD_DIR)/%/.stamp_configured: @$(call pkg_size_before,$(STAGING_DIR),-staging) @$(call pkg_size_before,$(BINARIES_DIR),-images) @$(call pkg_size_before,$(HOST_DIR),-host) - $(call fixup-libtool-files,$(NAME),$(HOST_DIR)) - $(call fixup-libtool-files,$(NAME),$(STAGING_DIR)) $(foreach hook,$($(PKG)_POST_PREPARE_HOOKS),$(call $(hook))$(sep)) $(foreach hook,$($(PKG)_PRE_CONFIGURE_HOOKS),$(call $(hook))$(sep)) $($(PKG)_CONFIGURE_CMDS) @@ -836,7 +833,9 @@ $(2)_EXTRACT_CMDS ?= \ $$(TAR_OPTIONS) -) # pre/post-steps hooks -$(2)_POST_PREPARE_HOOKS += FIXUP_PYTHON_SYSCONFIGDATA +$(2)_POST_PREPARE_HOOKS += \ + PPD_FIXUP_PATHS \ + FIXUP_PYTHON_SYSCONFIGDATA ifeq ($$($(2)_TYPE),target) ifneq ($$(HOST_$(2)_KCONFIG_VAR),)
Some files contain hard-coded absolute paths that point to the host and/or staging directories. With per-package directories (aka. PPD), these paths point to the PPD of the package that created the files, when we want them to point to the PPD of the package that uses them. Up until now, we had two hooks that attempted to fix those files: - a libtool-specific hook that searches for all .la files and seds them with the proper PPD, - a python-specific hook that tweaks just the sysconfigdata and removes the byte-compiled version of the sysconfigdata. But now, we also have a few other kinds of files for which we need to fix the PPD: .cmake, .pc, or .pri files, and probably a bunch of others as well. We solve this issue by just replacing any PPD in text files, with the current package's PPD. This is very similar to, and inspired from what is done when relocating the SDK. However, we can't use the existing relocate-sdk script, because that needs to know the original location, which we do not have when we aggregate the PPD (we could store it, but we can easily do without it). Signed-off-by: Yann E. MORIN <yann.morin.1998@free.fr> Cc: Thomas Petazzoni <thomas.petazzoni@bootlin.com> Cc: Arnout Vandecappelle (Essensium/Mind) <arnout@mind.be> Cc: Adam Duskett <aduskett@gmail.com> Cc: Louis-Paul CORDIER <lpdev@cordier.org> Cc: Andreas Naumann <anaumann@ultratronik.de> --- package/pkg-generic.mk | 43 +++++++++++++++++++++--------------------- 1 file changed, 21 insertions(+), 22 deletions(-)