diff mbox series

[01/15] package/pkg-generic.mk: detect files overwritten in TARGET_DIR and HOST_DIR

Message ID 20210621141130.48654-2-herve.codina@bootlin.com
State Changes Requested
Headers show
Series Overwritten file detection and fixes, one more step to TLP build | expand

Commit Message

Herve Codina June 21, 2021, 2:11 p.m. UTC
From: Thomas Petazzoni <thomas.petazzoni@bootlin.com>

Without per-package directory support, a package can happily overwrite
files installed by other packages. Indeed, because the build order
between packages is always guaranteed, Buildroot will always produce
the same output.

However, with per-package directory support, it is absolutely critical
that a given package does not overwrite files already installed by
another package, due to how the final aggregation is done to create
the complete target/, staging/ and host/ folders. Unfortunately, we
currently don't have anything in Buildroot that detects this
situation.

We used to have check-uniq-files, but it was dropped in commit
2496189a4207173e4cd5bbab90256f911175ee57.

This commit is a new implementation of such a detection, which is
based on calculating and verifying MD5 hashes of installed files: the
calculation is done at the beginning of the configure step, the
verification during the newly introduced "install" step that takes
place after all installation steps.

Since preventing file overwrites is really only needed when
per-package directory support is used, and due to this verification
having some overhead, it is only enabled when
BR2_PER_PACKAGE_DIRECTORIES=y. This additional verification cost is
however not too bad as on average, with per-package directory support,
the per-package target/ and host/ directories will contain less files
than with a build that doesn't use per-package directory support. This
helps a bit in mitigating the additional cost of this verification.

Note that we are not handling separately HOST_DIR and STAGING_DIR,
like we're doing with the pkg_size_{before,after} functions. Instead,
the verification on HOST_DIR walks down into the STAGING_DIR.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>

This commit is retreived from Thomas's work.
The first version was discussed
https://patchwork.ozlabs.org/project/buildroot/patch/20200430095249.782597-9-thomas.petazzoni@bootlin.com/
This new version was not already submitted by Thomas or I missed it.

Signed-off-by: Herve Codina <herve.codina@bootlin.com>
---
 package/pkg-generic.mk | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

Comments

Yann E. MORIN June 21, 2021, 9:31 p.m. UTC | #1
Hervé, All,

On 2021-06-21 16:11 +0200, Herve Codina spake thusly:
> From: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
> 
> Without per-package directory support, a package can happily overwrite
> files installed by other packages. Indeed, because the build order
> between packages is always guaranteed, Buildroot will always produce
> the same output.
> 
> However, with per-package directory support, it is absolutely critical
> that a given package does not overwrite files already installed by
> another package, due to how the final aggregation is done to create
> the complete target/, staging/ and host/ folders. Unfortunately, we
> currently don't have anything in Buildroot that detects this
> situation.
> 
> We used to have check-uniq-files, but it was dropped in commit
> 2496189a4207173e4cd5bbab90256f911175ee57.
> 
> This commit is a new implementation of such a detection, which is
> based on calculating and verifying MD5 hashes of installed files: the
> calculation is done at the beginning of the configure step, the
> verification during the newly introduced "install" step that takes
> place after all installation steps.
> 
> Since preventing file overwrites is really only needed when
> per-package directory support is used, and due to this verification
> having some overhead, it is only enabled when
> BR2_PER_PACKAGE_DIRECTORIES=y. This additional verification cost is
> however not too bad as on average, with per-package directory support,
> the per-package target/ and host/ directories will contain less files
> than with a build that doesn't use per-package directory support. This
> helps a bit in mitigating the additional cost of this verification.
> 
> Note that we are not handling separately HOST_DIR and STAGING_DIR,
> like we're doing with the pkg_size_{before,after} functions. Instead,
> the verification on HOST_DIR walks down into the STAGING_DIR.
> 
> Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>

From here...

> This commit is retreived from Thomas's work.
> The first version was discussed
> https://patchwork.ozlabs.org/project/buildroot/patch/20200430095249.782597-9-thomas.petazzoni@bootlin.com/
> This new version was not already submitted by Thomas or I missed it.

... to here: this should have been a post-commit note, after the ---
line...

> Signed-off-by: Herve Codina <herve.codina@bootlin.com>
> ---

... here.

Additionally, it would have been nice to summarise the changes made
between the original submission and htis new one, and how the previous
review was handled.

>  package/pkg-generic.mk | 23 +++++++++++++++++++++++
>  1 file changed, 23 insertions(+)
> 
> diff --git a/package/pkg-generic.mk b/package/pkg-generic.mk
> index 45589bcbb4..bb9ff4150a 100644
> --- a/package/pkg-generic.mk
> +++ b/package/pkg-generic.mk
> @@ -102,6 +102,25 @@ define fixup-libtool-files
>  endef
>  endif
>  
> +# Functions to detect overwritten files
> +
> +ifeq ($(BR2_PER_PACKAGE_DIRECTORIES),y)
> +# $(1): base directory to search in
> +# $(2): suffix of file (optional)
> +define pkg_detect_overwrite_before
> +	LC_ALL=C find $(1) -type f -print0 | xargs -0 -r md5sum > $($(PKG)_DIR)/.files$(2).md5;
> +endef
> +
> +# $(1): base directory to search in
> +# $(2): suffix of file (optional)
> +define pkg_detect_overwrite_after
> +	if test -s $($(PKG)_DIR)/.files$(2).md5 ; then \
> +		LC_ALL=C md5sum --quiet -c $($(PKG)_DIR)/.files$(2).md5 || \
> +		{ echo "ERROR: package $($(PKG)_NAME) has overwritten files installed by a previous package, aborting."; exit 1; } ; \
> +	fi
> +endef
> +endif

And now I see that only part of the problem is handled; it only tries
and detect files which content changes; it does not account for files
which type did change. I.e. a symlink that was onverted over to a file,
or the other way around.

So I think we could change the pkg_detect_overwrite_before macro thusly;

    LC_ALL=C find -L $(1) -type f -print0 |...

That should cover both cases, with just the esception that a symlink is
replaced with a file of the same content (or the other way around).
There's just a little quirk, though:

    find: File system loop detected; ‘host/usr’ is part of the same file
    system loop as ‘host’.

Meh, that's starting to be a bit hairy... We can just ignore it
explicitly, maybe? For merged-usr in target, we should not have that
problem, but there is still the option for a custom skeleton, where
people could provide it as well...

So, to summarise: this patch does not cover all cases, but it is still
acceptable, and brings in the necessary infra that we can later extend
should the need arises.

Regards,
Yann E. MORIN.

>  # Functions to collect statistics about installed files
>  
>  # $(1): base directory to search in
> @@ -235,6 +254,8 @@ $(BUILD_DIR)/%/.stamp_configured:
>  	@$(call pkg_size_before,$(TARGET_DIR))
>  	@$(call pkg_size_before,$(STAGING_DIR),-staging)
>  	@$(call pkg_size_before,$(HOST_DIR),-host)
> +	@$(call pkg_detect_overwrite_before,$(TARGET_DIR))
> +	@$(call pkg_detect_overwrite_before,$(HOST_DIR),-host)
>  	$(call fixup-libtool-files,$(NAME),$(STAGING_DIR))
>  	$(foreach hook,$($(PKG)_PRE_CONFIGURE_HOOKS),$(call $(hook))$(sep))
>  	$($(PKG)_CONFIGURE_CMDS)
> @@ -360,6 +381,8 @@ $(BUILD_DIR)/%/.stamp_installed:
>  	@$(call pkg_size_after,$(STAGING_DIR),-staging)
>  	@$(call pkg_size_after,$(HOST_DIR),-host)
>  	@$(call check_bin_arch)
> +	@$(call pkg_detect_overwrite_after,$(TARGET_DIR))
> +	@$(call pkg_detect_overwrite_after,$(HOST_DIR),-host)
>  	$(Q)touch $@
>  
>  # Remove package sources
> -- 
> 2.31.1
> 
> _______________________________________________
> buildroot mailing list
> buildroot@busybox.net
> http://lists.busybox.net/mailman/listinfo/buildroot
Herve Codina June 22, 2021, 7:40 a.m. UTC | #2
Hi,

On Mon, 21 Jun 2021 23:31:40 +0200
"Yann E. MORIN" <yann.morin.1998@free.fr> wrote:

[...]
> > Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>  
> 
> From here...
> 
> > This commit is retreived from Thomas's work.
> > The first version was discussed
> > https://patchwork.ozlabs.org/project/buildroot/patch/20200430095249.782597-9-thomas.petazzoni@bootlin.com/
> > This new version was not already submitted by Thomas or I missed it.  
> 
> ... to here: this should have been a post-commit note, after the ---
> line...
> 
> > Signed-off-by: Herve Codina <herve.codina@bootlin.com>
> > ---  
> 
> ... here.
> 
> Additionally, it would have been nice to summarise the changes made
> between the original submission and htis new one, and how the previous
> review was handled.

I moved the note after a '---' line and added:

   Compared to the first version, this patch has an improved commit message and
   generates the md5sum snapshot using
     'find -L $(1) -type f -print0 | xargs -0 -r md5sum > $($(PKG)_DIR)/.files$(2).md5;'
   instead of
     'cd $(1); LC_ALL=C find . -type f -exec md5sum {} \; > $($(PKG)_DIR)/.files$(2).md5'

> 
> >  package/pkg-generic.mk | 23 +++++++++++++++++++++++
> >  1 file changed, 23 insertions(+)
> > 
> > diff --git a/package/pkg-generic.mk b/package/pkg-generic.mk
> > index 45589bcbb4..bb9ff4150a 100644
> > --- a/package/pkg-generic.mk
> > +++ b/package/pkg-generic.mk
> > @@ -102,6 +102,25 @@ define fixup-libtool-files
> >  endef
> >  endif
> >  
> > +# Functions to detect overwritten files
> > +
> > +ifeq ($(BR2_PER_PACKAGE_DIRECTORIES),y)
> > +# $(1): base directory to search in
> > +# $(2): suffix of file (optional)
> > +define pkg_detect_overwrite_before
> > +	LC_ALL=C find $(1) -type f -print0 | xargs -0 -r md5sum > $($(PKG)_DIR)/.files$(2).md5;
> > +endef
> > +
> > +# $(1): base directory to search in
> > +# $(2): suffix of file (optional)
> > +define pkg_detect_overwrite_after
> > +	if test -s $($(PKG)_DIR)/.files$(2).md5 ; then \
> > +		LC_ALL=C md5sum --quiet -c $($(PKG)_DIR)/.files$(2).md5 || \
> > +		{ echo "ERROR: package $($(PKG)_NAME) has overwritten files installed by a previous package, aborting."; exit 1; } ; \
> > +	fi
> > +endef
> > +endif  
> 
> And now I see that only part of the problem is handled; it only tries
> and detect files which content changes; it does not account for files
> which type did change. I.e. a symlink that was onverted over to a file,
> or the other way around.
> 
> So I think we could change the pkg_detect_overwrite_before macro thusly;
> 
>     LC_ALL=C find -L $(1) -type f -print0 |...

I added '-L' option.

> 
> That should cover both cases, with just the esception that a symlink is
> replaced with a file of the same content (or the other way around).
> There's just a little quirk, though:
> 
>     find: File system loop detected; ‘host/usr’ is part of the same file
>     system loop as ‘host’.
> 
> Meh, that's starting to be a bit hairy... We can just ignore it
> explicitly, maybe? For merged-usr in target, we should not have that
> problem, but there is still the option for a custom skeleton, where
> people could provide it as well...
> 
> So, to summarise: this patch does not cover all cases, but it is still
> acceptable, and brings in the necessary infra that we can later extend
> should the need arises.
> 

Ok, v2 with the changes I mentioned will be sent.

Thanks for the review,
Hervé Codina
Thomas Petazzoni June 22, 2021, 9:30 a.m. UTC | #3
On Tue, 22 Jun 2021 09:40:52 +0200
Herve Codina <herve.codina@bootlin.com> wrote:

>    Compared to the first version, this patch has an improved commit message and
>    generates the md5sum snapshot using
>      'find -L $(1) -type f -print0 | xargs -0 -r md5sum > $($(PKG)_DIR)/.files$(2).md5;'

But is this better ? Due to not cd-ing into the directory, you will
have full absolute paths in the .md5 file, and adding LC_ALL=C is quite
customary too. Ditto for using | xargs instead of -exec.

>    instead of
>      'cd $(1); LC_ALL=C find . -type f -exec md5sum {} \; > $($(PKG)_DIR)/.files$(2).md5'

All in all this version, looked better. Any reason for the change ?

Thomas
Nicolas Cavallari June 22, 2021, 9:57 a.m. UTC | #4
On 22/06/2021 11:30, Thomas Petazzoni wrote:
> On Tue, 22 Jun 2021 09:40:52 +0200
> Herve Codina <herve.codina@bootlin.com> wrote:
> 
>>     Compared to the first version, this patch has an improved commit message and
>>     generates the md5sum snapshot using
>>       'find -L $(1) -type f -print0 | xargs -0 -r md5sum > $($(PKG)_DIR)/.files$(2).md5;'
> 
> But is this better ? Due to not cd-ing into the directory, you will
> have full absolute paths in the .md5 file, and adding LC_ALL=C is quite
> customary too. Ditto for using | xargs instead of -exec.

xargs will run only start one md5sum process if the command line size 
permits it, whereas "-exec md5sum {} ;" will start one md5sum per file.

An alternative is "-exec md5sum {} +", which is essentially xargs-like.
Yann E. MORIN June 22, 2021, 10:24 a.m. UTC | #5
Nicolas, All,

On 2021-06-22 11:57 +0200, Nicolas Cavallari spake thusly:
> On 22/06/2021 11:30, Thomas Petazzoni wrote:
> >On Tue, 22 Jun 2021 09:40:52 +0200
> >Herve Codina <herve.codina@bootlin.com> wrote:
> >
> >>    Compared to the first version, this patch has an improved commit message and
> >>    generates the md5sum snapshot using
> >>      'find -L $(1) -type f -print0 | xargs -0 -r md5sum > $($(PKG)_DIR)/.files$(2).md5;'
> >
> >But is this better ? Due to not cd-ing into the directory, you will
> >have full absolute paths in the .md5 file,

But is that even an issue? Those paths are only used for one per-package
directory; they do not follow files around into another package's ppd.

So, relative or absolute, we don't much care.

> >and adding LC_ALL=C is quite
> >customary too.

In this case, do we also even care or need it? I pretty sure the output
of find is not sorted at all; it most probably depend on the order of
dentries in the directory, and those have absolutely no ordering
guarantee...

Note that if you really want to use LC_ALL, passing it in front of 'find'
will not make LC_ALL available to anythin after the pipeline, so you'd
still have xargs (and md5sum) run without it...

> > Ditto for using | xargs instead of -exec.
> xargs will run only start one md5sum process if the command line size
> permits it, whereas "-exec md5sum {} ;" will start one md5sum per file.

Agreed. But I think this is exactly what Thomas suggested to use, right?

> An alternative is "-exec md5sum {} +", which is essentially xargs-like.

I once used that in the past, and it breaks on non-GNU find, because '+'
is a GNUism. So I think using xargs is still better.

Regards,
Yann E. MORIN.

> _______________________________________________
> buildroot mailing list
> buildroot@busybox.net
> http://lists.busybox.net/mailman/listinfo/buildroot
Herve Codina June 24, 2021, 2:09 p.m. UTC | #6
Hi,

On Tue, 22 Jun 2021 12:24:33 +0200
"Yann E. MORIN" <yann.morin.1998@free.fr> wrote:

> Nicolas, All,
> 
> On 2021-06-22 11:57 +0200, Nicolas Cavallari spake thusly:
> > On 22/06/2021 11:30, Thomas Petazzoni wrote:  
> > >On Tue, 22 Jun 2021 09:40:52 +0200
> > >Herve Codina <herve.codina@bootlin.com> wrote:
> > >  
> > >>    Compared to the first version, this patch has an improved commit message and
> > >>    generates the md5sum snapshot using
> > >>      'find -L $(1) -type f -print0 | xargs -0 -r md5sum > $($(PKG)_DIR)/.files$(2).md5;'  
> > >
> > >But is this better ? Due to not cd-ing into the directory, you will
> > >have full absolute paths in the .md5 file,  
> 
> But is that even an issue? Those paths are only used for one per-package
> directory; they do not follow files around into another package's ppd.
> 
> So, relative or absolute, we don't much care.
> 
> > >and adding LC_ALL=C is quite
> > >customary too.  
> 
> In this case, do we also even care or need it? I pretty sure the output
> of find is not sorted at all; it most probably depend on the order of
> dentries in the directory, and those have absolutely no ordering
> guarantee...
> 
> Note that if you really want to use LC_ALL, passing it in front of 'find'
> will not make LC_ALL available to anythin after the pipeline, so you'd
> still have xargs (and md5sum) run without it...
> 
> > > Ditto for using | xargs instead of -exec.  
> > xargs will run only start one md5sum process if the command line size
> > permits it, whereas "-exec md5sum {} ;" will start one md5sum per file.  
> 
> Agreed. But I think this is exactly what Thomas suggested to use, right?
> 
> > An alternative is "-exec md5sum {} +", which is essentially xargs-like.  
> 
> I once used that in the past, and it breaks on non-GNU find, because '+'
> is a GNUism. So I think using xargs is still better.
> 

Well, according to the different exchanges,

I think I can remove LC_ALL (we don't care about any order), and I keep
'... | xargs -0 -r md5sum'.
'cd' is not needed (we do not care about absolute or relative path).

So, the snapshot is taken using:
find -L $(1) -type f -print0 | xargs -0 -r md5sum > $($(PKG)_DIR)/.files$(2).md5;

and the detection is done using:
md5sum --quiet -c $($(PKG)_DIR)/.files$(2).md5

Is that ok for everyone ?


Regards,
Hervé
Yann E. MORIN June 24, 2021, 4:18 p.m. UTC | #7
Hervé, All,

On 2021-06-24 16:09 +0200, Herve Codina spake thusly:
> On Tue, 22 Jun 2021 12:24:33 +0200
> "Yann E. MORIN" <yann.morin.1998@free.fr> wrote:
[--SNIP--]
> Well, according to the different exchanges,
> 
> I think I can remove LC_ALL (we don't care about any order), and I keep
> '... | xargs -0 -r md5sum'.
> 'cd' is not needed (we do not care about absolute or relative path).
> 
> So, the snapshot is taken using:
> find -L $(1) -type f -print0 | xargs -0 -r md5sum > $($(PKG)_DIR)/.files$(2).md5;

I spoke about -L as a suggestion to follow and monitor symlinks.
However, as I said, that does not work, at least for host/, because it
will emit a warning about a directory loop. Also, there is no telling
what users will install i their target/ and we may have a similar
situation for target/.

This warning is not nice as it will happen for each and every packages.

So, drop -L. At the very least, we have the infrastructure in place, and
we can extend it later should we need additional details.

But now that I think about it, the entry-changed-to-omething-else case
is already covered by the existing instrumenttion, as part of the
pkg_size_before/after hooks (find's %y printf directive)

So, yes, just drop the -L.

> and the detection is done using:
> md5sum --quiet -c $($(PKG)_DIR)/.files$(2).md5
> 
> Is that ok for everyone ?

Except for -L, yes, this is a proper summary from my point if view.

Thanks! :-)

Regards,
Yann E. MORIN.
diff mbox series

Patch

diff --git a/package/pkg-generic.mk b/package/pkg-generic.mk
index 45589bcbb4..bb9ff4150a 100644
--- a/package/pkg-generic.mk
+++ b/package/pkg-generic.mk
@@ -102,6 +102,25 @@  define fixup-libtool-files
 endef
 endif
 
+# Functions to detect overwritten files
+
+ifeq ($(BR2_PER_PACKAGE_DIRECTORIES),y)
+# $(1): base directory to search in
+# $(2): suffix of file (optional)
+define pkg_detect_overwrite_before
+	LC_ALL=C find $(1) -type f -print0 | xargs -0 -r md5sum > $($(PKG)_DIR)/.files$(2).md5;
+endef
+
+# $(1): base directory to search in
+# $(2): suffix of file (optional)
+define pkg_detect_overwrite_after
+	if test -s $($(PKG)_DIR)/.files$(2).md5 ; then \
+		LC_ALL=C md5sum --quiet -c $($(PKG)_DIR)/.files$(2).md5 || \
+		{ echo "ERROR: package $($(PKG)_NAME) has overwritten files installed by a previous package, aborting."; exit 1; } ; \
+	fi
+endef
+endif
+
 # Functions to collect statistics about installed files
 
 # $(1): base directory to search in
@@ -235,6 +254,8 @@  $(BUILD_DIR)/%/.stamp_configured:
 	@$(call pkg_size_before,$(TARGET_DIR))
 	@$(call pkg_size_before,$(STAGING_DIR),-staging)
 	@$(call pkg_size_before,$(HOST_DIR),-host)
+	@$(call pkg_detect_overwrite_before,$(TARGET_DIR))
+	@$(call pkg_detect_overwrite_before,$(HOST_DIR),-host)
 	$(call fixup-libtool-files,$(NAME),$(STAGING_DIR))
 	$(foreach hook,$($(PKG)_PRE_CONFIGURE_HOOKS),$(call $(hook))$(sep))
 	$($(PKG)_CONFIGURE_CMDS)
@@ -360,6 +381,8 @@  $(BUILD_DIR)/%/.stamp_installed:
 	@$(call pkg_size_after,$(STAGING_DIR),-staging)
 	@$(call pkg_size_after,$(HOST_DIR),-host)
 	@$(call check_bin_arch)
+	@$(call pkg_detect_overwrite_after,$(TARGET_DIR))
+	@$(call pkg_detect_overwrite_after,$(HOST_DIR),-host)
 	$(Q)touch $@
 
 # Remove package sources