diff mbox series

[2/3] core/pkg-generic: fixup all PPD paths in a generic fashion

Message ID 91348c3073b0dcf57a8ea3227bae74c23a122713.1641663315.git.yann.morin.1998@free.fr
State Changes Requested
Headers show
Series core: fixup PPD paths (branch yem/ppd-fixup-paths) | expand

Commit Message

Yann E. MORIN Jan. 8, 2022, 5:35 p.m. UTC
Some files contain hard-coded absolute paths that point to the host
and/or staging directories.

With per-package directories (aka. PPD), these paths point to the PPD
of the package that created the files, when we want them to point to the
PPD of the package that uses them.

Up until now, we had two hooks that attempted to fix those files:

  - a libtool-specific hook that searches for all .la files and seds
    them with the proper PPD,

  - a python-specific hook that tweaks just the sysconfigdata and
    removes the byte-compiled version of the sysconfigdata.

But now, we also have a few other kinds of files for which we need to
fix the PPD: .cmake, .pc, or .pri files, and probably a bunch of others
as well.

We solve this issue by just replacing any PPD in text files, with the
current package's PPD.

This is very similar to, and inspired from what is done when relocating
the SDK. However, we can't use the existing relocate-sdk script, because
that needs to know the original location, which we do not have when we
aggregate the PPD (we could store it, but we can easily do without it).

Signed-off-by: Yann E. MORIN <yann.morin.1998@free.fr>
Cc: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
Cc: Arnout Vandecappelle (Essensium/Mind) <arnout@mind.be>
Cc: Adam Duskett <aduskett@gmail.com>
Cc: Louis-Paul CORDIER <lpdev@cordier.org>
Cc: Andreas Naumann <anaumann@ultratronik.de>
---
 package/pkg-generic.mk | 43 +++++++++++++++++++++---------------------
 1 file changed, 21 insertions(+), 22 deletions(-)

Comments

Herve Codina Jan. 9, 2022, 12:14 p.m. UTC | #1
On Sat,  8 Jan 2022 18:35:41 +0100
"Yann E. MORIN" <yann.morin.1998@free.fr> wrote:

> Some files contain hard-coded absolute paths that point to the host
> and/or staging directories.
> 
> With per-package directories (aka. PPD), these paths point to the PPD
> of the package that created the files, when we want them to point to the
> PPD of the package that uses them.
> 
> Up until now, we had two hooks that attempted to fix those files:
> 
>   - a libtool-specific hook that searches for all .la files and seds
>     them with the proper PPD,
> 
>   - a python-specific hook that tweaks just the sysconfigdata and
>     removes the byte-compiled version of the sysconfigdata.
> 
> But now, we also have a few other kinds of files for which we need to
> fix the PPD: .cmake, .pc, or .pri files, and probably a bunch of others
> as well.
> 
> We solve this issue by just replacing any PPD in text files, with the
> current package's PPD.
> 
> This is very similar to, and inspired from what is done when relocating
> the SDK. However, we can't use the existing relocate-sdk script, because
> that needs to know the original location, which we do not have when we
> aggregate the PPD (we could store it, but we can easily do without it).
> 
> Signed-off-by: Yann E. MORIN <yann.morin.1998@free.fr>
> Cc: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
> Cc: Arnout Vandecappelle (Essensium/Mind) <arnout@mind.be>
> Cc: Adam Duskett <aduskett@gmail.com>
> Cc: Louis-Paul CORDIER <lpdev@cordier.org>
> Cc: Andreas Naumann <anaumann@ultratronik.de>
> ---
>  package/pkg-generic.mk | 43 +++++++++++++++++++++---------------------
>  1 file changed, 21 insertions(+), 22 deletions(-)
> 

[...]

Acked-by: Herve Codina <herve.codina@bootlin.com>

Best regards,
Hervé
Arnout Vandecappelle Jan. 12, 2022, 8:42 p.m. UTC | #2
On 08/01/2022 18:35, Yann E. MORIN wrote:
> Some files contain hard-coded absolute paths that point to the host
> and/or staging directories.
> 
> With per-package directories (aka. PPD), these paths point to the PPD
> of the package that created the files, when we want them to point to the
> PPD of the package that uses them.
> 
> Up until now, we had two hooks that attempted to fix those files:
> 
>    - a libtool-specific hook that searches for all .la files and seds
>      them with the proper PPD,
> 
>    - a python-specific hook that tweaks just the sysconfigdata and
>      removes the byte-compiled version of the sysconfigdata.
> 
> But now, we also have a few other kinds of files for which we need to
> fix the PPD: .cmake, .pc, or .pri files, and probably a bunch of others
> as well.
> 
> We solve this issue by just replacing any PPD in text files, with the
> current package's PPD.
> 
> This is very similar to, and inspired from what is done when relocating
> the SDK. However, we can't use the existing relocate-sdk script, because
> that needs to know the original location, which we do not have when we
> aggregate the PPD (we could store it, but we can easily do without it).

  Looking at the resulting code, it's so small that it's not worth factoring out.

> 
> Signed-off-by: Yann E. MORIN <yann.morin.1998@free.fr>
> Cc: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
> Cc: Arnout Vandecappelle (Essensium/Mind) <arnout@mind.be>
> Cc: Adam Duskett <aduskett@gmail.com>
> Cc: Louis-Paul CORDIER <lpdev@cordier.org>
> Cc: Andreas Naumann <anaumann@ultratronik.de>
> ---
>   package/pkg-generic.mk | 43 +++++++++++++++++++++---------------------
>   1 file changed, 21 insertions(+), 22 deletions(-)
> 
> diff --git a/package/pkg-generic.mk b/package/pkg-generic.mk
> index 6a5fe5507b..1022062bcf 100644
> --- a/package/pkg-generic.mk
> +++ b/package/pkg-generic.mk
> @@ -90,21 +90,24 @@ endif
>   #######################################
>   # Helper functions
>   
> -# Make sure .la files only reference the current per-package
> -# directory.
> -
> -# $1: package name (lower case)
> -# $2: staging directory of the package
>   ifeq ($(BR2_PER_PACKAGE_DIRECTORIES),y)
> -define fixup-libtool-files
> -	$(Q)find $(2) \( -path '$(2)/lib*' -o -path '$(2)/usr/lib*' \) \
> -		-name "*.la" -print0 | xargs -0 --no-run-if-empty \
> -		$(SED) "s:$(PER_PACKAGE_DIR)/[^/]\+/:$(PER_PACKAGE_DIR)/$(1)/:g"
> +
> +# Ensure files like .la, .pc, .pri, .cmake, and so on, point to the
> +# proper staging and host directories for the current package: find
> +# all text files that contain the PPD root, and replace it with the
> +# current package's PPD.
> +define PPD_FIXUP_PATHS
> +	$(Q)grep -lrZ '$(PER_PACKAGE_DIR)/[^/]\+/' $(HOST_DIR) \

  This will trawl to large binary files and potentially take a long time...

> +	|while read -d '' f; do \
> +		file -b --mime-type "$${f}" | grep -q '^text/' || continue; \

... just to be ignored here.

  More importantly though: if a file is a symlink, it's going to be hit twice. 
Worse, if it's a symlink to an absolute path (which most likely points *outside* 
of STAGING_DIR), we may end up sed'ing something on the host...

  I notice now that the same (theoretical) issue exists in relocate-sdk.sh. 
Obviously that script doesn't get thoroughly tested so it may very well be the 
wrong thing to do...

  Do you remember perhaps why we don't simply do

find $(HOST_DIR) -type f -print0 \

? I was going to say that we can skip the grep, but then we're back at the large 
(text) file thing.

> +		printf '%s\0' "$${f}"; \

  Why not do the sed right here, like is done in relocate-sdk.sh? In fact, I'd 
keep the code as close as possible to relocate-sdk.sh to make later refactoring 
easier.

> +	done \
> +	|xargs -0 --no-run-if-empty \
> +		$(SED) 's:$(PER_PACKAGE_DIR)/[^/]\+/:$(PER_PACKAGE_DIR)/$($(PKG)_NAME)/:g'
>   endef
> -endif
>   
> -# Make sure python _sysconfigdata*.py files only reference the current
> -# per-package directory.
> +# Remove python's pre-compiled "sysconfigdata", as it may contain paths to
> +# the original staging or host dirs.
>   #
>   # Can't use $(foreach d, $(HOST_DIR)/lib/python* $(STAGING_DIR)/usr/lib/python*, ...)
>   # because those directories may be created in the same recipe this macro will
> @@ -113,19 +116,15 @@ endif
>   # fail.
>   # So we just use HOST_DIR as a starting point, and filter on the two directories
>   # of interest.
> -ifeq ($(BR2_PER_PACKAGE_DIRECTORIES),y)
>   define FIXUP_PYTHON_SYSCONFIGDATA

  Maybe rename to REMOVE_PYTHON_SYSCONFIGATA_PYC

  Regards,
  Arnout

>   	$(Q)find $(HOST_DIR) \
>   		\(    -path '$(HOST_DIR)/lib/python*' \
>   		   -o -path '$(STAGING_DIR)/usr/lib/python*' \
>   		\) \
> -		\(    \( -name "_sysconfigdata*.pyc" -delete \) \
> -		   -o \( -name "_sysconfigdata*.py" -print0 \) \
> -		\) \
> -	| xargs -0 --no-run-if-empty \
> -		$(SED) 's:$(PER_PACKAGE_DIR)/[^/]\+/:$(PER_PACKAGE_DIR)/$($(PKG)_NAME)/:g'
> +		\( -name "_sysconfigdata*.pyc" -delete \)
>   endef
> -endif
> +
> +endif  # PPD
>   
>   # Functions to collect statistics about installed files
>   
> @@ -278,8 +277,6 @@ $(BUILD_DIR)/%/.stamp_configured:
>   	@$(call pkg_size_before,$(STAGING_DIR),-staging)
>   	@$(call pkg_size_before,$(BINARIES_DIR),-images)
>   	@$(call pkg_size_before,$(HOST_DIR),-host)
> -	$(call fixup-libtool-files,$(NAME),$(HOST_DIR))
> -	$(call fixup-libtool-files,$(NAME),$(STAGING_DIR))
>   	$(foreach hook,$($(PKG)_POST_PREPARE_HOOKS),$(call $(hook))$(sep))
>   	$(foreach hook,$($(PKG)_PRE_CONFIGURE_HOOKS),$(call $(hook))$(sep))
>   	$($(PKG)_CONFIGURE_CMDS)
> @@ -836,7 +833,9 @@ $(2)_EXTRACT_CMDS ?= \
>   		$$(TAR_OPTIONS) -)
>   
>   # pre/post-steps hooks
> -$(2)_POST_PREPARE_HOOKS += FIXUP_PYTHON_SYSCONFIGDATA
> +$(2)_POST_PREPARE_HOOKS += \
> +	PPD_FIXUP_PATHS \
> +	FIXUP_PYTHON_SYSCONFIGDATA
>   
>   ifeq ($$($(2)_TYPE),target)
>   ifneq ($$(HOST_$(2)_KCONFIG_VAR),)
>
Yann E. MORIN Jan. 13, 2022, 9:58 p.m. UTC | #3
Arnout, All,

On 2022-01-12 21:42 +0100, Arnout Vandecappelle spake thusly:
> On 08/01/2022 18:35, Yann E. MORIN wrote:
> >Some files contain hard-coded absolute paths that point to the host
> >and/or staging directories.
> >
> >With per-package directories (aka. PPD), these paths point to the PPD
> >of the package that created the files, when we want them to point to the
> >PPD of the package that uses them.
[--SNIP--]
> >+define PPD_FIXUP_PATHS
> >+	$(Q)grep -lrZ '$(PER_PACKAGE_DIR)/[^/]\+/' $(HOST_DIR) \
>  This will trawl to large binary files and potentially take a long time...

I too thought of that, and I even discussed it slightly with Peter K.
here about what would be most efficient: grep-then-file, or file-then-grep?

But if we add --binary-files=without-match to grep, then it will
bail-out early on binary files, which would be more efficient. And then
we still pass that through file to really get only text files.

But a text-file with utf8 is really a binary file, so would grep ignore
it?

> >+	|while read -d '' f; do \
> >+		file -b --mime-type "$${f}" | grep -q '^text/' || continue; \
> ... just to be ignored here.

Again, not sure, because text/plain but utf8-encoded?

>  More importantly though: if a file is a symlink, it's going to be hit
> twice.

No, I don't think so:

    $ man grep
    [..]
    -r, --recursive
        Read all files under each directory, recursively, following
        symbolic links only if they are on the command line. [...]

And that seems to be the case:

    $ echo Pouet >foo
    $ ln -s foo bar
    $ grep -r Pouet .
    foo:Pouet

So, seems OK to me?

> Worse, if it's a symlink to an absolute path (which most likely
> points *outside* of STAGING_DIR), we may end up sed'ing something on the
> host...
>  I notice now that the same (theoretical) issue exists in relocate-sdk.sh.
> Obviously that script doesn't get thoroughly tested so it may very well be
> the wrong thing to do...
> 
>  Do you remember perhaps why we don't simply do
> 
> find $(HOST_DIR) -type f -print0 \

Maybe we should. But grep -r was an elegant way to combine the recursive
feature of find, and limit to the matching files at the same time...

I can try to time things to see what's the fastest / less-slow... But
given that the symlinks are not an issue in practice, so we care?

> ? I was going to say that we can skip the grep, but then we're back at the
> large (text) file thing.
> 
> >+		printf '%s\0' "$${f}"; \
> 
>  Why not do the sed right here, like is done in relocate-sdk.sh? In fact,
> I'd keep the code as close as possible to relocate-sdk.sh to make later
> refactoring easier.

Using xarg allows to spawn only a few sed. printf is a shell built-in so
it's basically free.

[--SNIP--]
> >  define FIXUP_PYTHON_SYSCONFIGDATA
>  Maybe rename to REMOVE_PYTHON_SYSCONFIGATA_PYC

OK

Regards,
Yann E. MORIN.
Yann E. MORIN Jan. 22, 2022, 10:50 a.m. UTC | #4
Arnout, All,

On 2022-01-13 22:58 +0100, Yann E. MORIN spake thusly:
> On 2022-01-12 21:42 +0100, Arnout Vandecappelle spake thusly:
> > On 08/01/2022 18:35, Yann E. MORIN wrote:
> > >Some files contain hard-coded absolute paths that point to the host
> > >and/or staging directories.
> > >
> > >With per-package directories (aka. PPD), these paths point to the PPD
> > >of the package that created the files, when we want them to point to the
> > >PPD of the package that uses them.
> [--SNIP--]
> > >+define PPD_FIXUP_PATHS
> > >+	$(Q)grep -lrZ '$(PER_PACKAGE_DIR)/[^/]\+/' $(HOST_DIR) \
> >  This will trawl to large binary files and potentially take a long time...
> 
> I too thought of that, and I even discussed it slightly with Peter K.
> here about what would be most efficient: grep-then-file, or file-then-grep?
> 
> But if we add --binary-files=without-match to grep, then it will
> bail-out early on binary files, which would be more efficient. And then
> we still pass that through file to really get only text files.
> 
> But a text-file with utf8 is really a binary file, so would grep ignore
> it?

So I tried and created a simple file with arbitrary non-ASCII UTF-8
symbols (I just copy-pasted the chinese Wikipedia home page, arbitarily
wrapped for "readability"):

    $ cat foo
    英格兰足球超级联赛金手套奖是英格兰足球超级联赛的一个奖项,
    每年颁发一次,颁给在当赛季英超联赛中完成最多次零封的守门员。
    在足球中,零封指的是守门员在一整场比赛中保持不失球。2005年,
    英超联赛第一次颁发金手套奖,切尔西的彼得·切赫获奖,切赫在
    2004-05赛季完成了24场零封,至今仍是英超联赛的记录。切赫和乔·
    哈特各赢得过四次金手套奖,并列获该奖次数最多的球员,切赫还是
    至今唯一一位为两家俱乐部(切尔西、阿森纳)效力时都获得金手套
    奖的球员。佩佩·雷纳则是第一个能够连续获得该奖的球员,他在2006
    至2008年连续三次获得英超金手套。乔·哈特则在2011-2013年追平了
    这一成就。

    $ file -b --mime-type foo
    text/plain

    $ grep --binary-files=without-match 是英超 foo
    2004-05赛季完成了24场零封,至今仍是英超联赛的记录。切赫和乔·

So, a text file with utf-8 chars is still recognied as a text file,
and grep will also not recognise it as a binary file.

So we can tell grep to bail out early on binary files, to speed up the
stuff. I'd still keep the 'file' test as a further safety net, though.

So, grep does not follow symlinks, and we can can speed grep by
instructing it to skip binary files. As such, I think all your issues
have been addressed now.

Regards,
Yann E. MORIN.
diff mbox series

Patch

diff --git a/package/pkg-generic.mk b/package/pkg-generic.mk
index 6a5fe5507b..1022062bcf 100644
--- a/package/pkg-generic.mk
+++ b/package/pkg-generic.mk
@@ -90,21 +90,24 @@  endif
 #######################################
 # Helper functions
 
-# Make sure .la files only reference the current per-package
-# directory.
-
-# $1: package name (lower case)
-# $2: staging directory of the package
 ifeq ($(BR2_PER_PACKAGE_DIRECTORIES),y)
-define fixup-libtool-files
-	$(Q)find $(2) \( -path '$(2)/lib*' -o -path '$(2)/usr/lib*' \) \
-		-name "*.la" -print0 | xargs -0 --no-run-if-empty \
-		$(SED) "s:$(PER_PACKAGE_DIR)/[^/]\+/:$(PER_PACKAGE_DIR)/$(1)/:g"
+
+# Ensure files like .la, .pc, .pri, .cmake, and so on, point to the
+# proper staging and host directories for the current package: find
+# all text files that contain the PPD root, and replace it with the
+# current package's PPD.
+define PPD_FIXUP_PATHS
+	$(Q)grep -lrZ '$(PER_PACKAGE_DIR)/[^/]\+/' $(HOST_DIR) \
+	|while read -d '' f; do \
+		file -b --mime-type "$${f}" | grep -q '^text/' || continue; \
+		printf '%s\0' "$${f}"; \
+	done \
+	|xargs -0 --no-run-if-empty \
+		$(SED) 's:$(PER_PACKAGE_DIR)/[^/]\+/:$(PER_PACKAGE_DIR)/$($(PKG)_NAME)/:g'
 endef
-endif
 
-# Make sure python _sysconfigdata*.py files only reference the current
-# per-package directory.
+# Remove python's pre-compiled "sysconfigdata", as it may contain paths to
+# the original staging or host dirs.
 #
 # Can't use $(foreach d, $(HOST_DIR)/lib/python* $(STAGING_DIR)/usr/lib/python*, ...)
 # because those directories may be created in the same recipe this macro will
@@ -113,19 +116,15 @@  endif
 # fail.
 # So we just use HOST_DIR as a starting point, and filter on the two directories
 # of interest.
-ifeq ($(BR2_PER_PACKAGE_DIRECTORIES),y)
 define FIXUP_PYTHON_SYSCONFIGDATA
 	$(Q)find $(HOST_DIR) \
 		\(    -path '$(HOST_DIR)/lib/python*' \
 		   -o -path '$(STAGING_DIR)/usr/lib/python*' \
 		\) \
-		\(    \( -name "_sysconfigdata*.pyc" -delete \) \
-		   -o \( -name "_sysconfigdata*.py" -print0 \) \
-		\) \
-	| xargs -0 --no-run-if-empty \
-		$(SED) 's:$(PER_PACKAGE_DIR)/[^/]\+/:$(PER_PACKAGE_DIR)/$($(PKG)_NAME)/:g'
+		\( -name "_sysconfigdata*.pyc" -delete \)
 endef
-endif
+
+endif  # PPD
 
 # Functions to collect statistics about installed files
 
@@ -278,8 +277,6 @@  $(BUILD_DIR)/%/.stamp_configured:
 	@$(call pkg_size_before,$(STAGING_DIR),-staging)
 	@$(call pkg_size_before,$(BINARIES_DIR),-images)
 	@$(call pkg_size_before,$(HOST_DIR),-host)
-	$(call fixup-libtool-files,$(NAME),$(HOST_DIR))
-	$(call fixup-libtool-files,$(NAME),$(STAGING_DIR))
 	$(foreach hook,$($(PKG)_POST_PREPARE_HOOKS),$(call $(hook))$(sep))
 	$(foreach hook,$($(PKG)_PRE_CONFIGURE_HOOKS),$(call $(hook))$(sep))
 	$($(PKG)_CONFIGURE_CMDS)
@@ -836,7 +833,9 @@  $(2)_EXTRACT_CMDS ?= \
 		$$(TAR_OPTIONS) -)
 
 # pre/post-steps hooks
-$(2)_POST_PREPARE_HOOKS += FIXUP_PYTHON_SYSCONFIGDATA
+$(2)_POST_PREPARE_HOOKS += \
+	PPD_FIXUP_PATHS \
+	FIXUP_PYTHON_SYSCONFIGDATA
 
 ifeq ($$($(2)_TYPE),target)
 ifneq ($$(HOST_$(2)_KCONFIG_VAR),)