[06/19] infra/pkg-generic: only list files installed by the current package

Message ID 8c891d4245028f97585d0e55ab5962e6ea337659.1546898693.git.yann.morin.1998@free.fr
State Changes Requested
Headers show
Series
  • [01/19] infra/pkg-generic: display MESSAGE before running PRE_HOOKS
Related show

Commit Message

Yann E. MORIN Jan. 7, 2019, 10:05 p.m.
Since 7fb6e782542f (core/instrumentation: shave minutes off the build
time), the built stampfile is used as a reference to detect files
installed by a package.

However, this falls short during development, when a user may want to
re-install a built-early package without rebuilding it (i.e. make
foo-reinstall). In this case, the built stampfile is not touched, and is
still dated from way back when the package was first built. As such,
almost all files in target (or staging or host) are newer than that, and
so those files are all now accounted for that package, when in fact only
a minor subset may be accountable to it.

We fix that by limiting the search for files that have been actually
touched during the install step, now that we have the proper timestamp
for it.

Reported-by: Nicolas Cavallari <nicolas.cavallari@green-communications.fr>
Signed-off-by: "Yann E. MORIN" <yann.morin.1998@free.fr>
Cc: Nicolas Cavallari <nicolas.cavallari@green-communications.fr>
---
 package/pkg-generic.mk | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Thomas De Schampheleire Jan. 8, 2019, 1:07 p.m. | #1
El lun., 7 ene. 2019 a las 23:05, Yann E. MORIN
(<yann.morin.1998@free.fr>) escribió:
>
> Since 7fb6e782542f (core/instrumentation: shave minutes off the build
> time), the built stampfile is used as a reference to detect files
> installed by a package.
>
> However, this falls short during development, when a user may want to
> re-install a built-early package without rebuilding it (i.e. make
> foo-reinstall). In this case, the built stampfile is not touched, and is
> still dated from way back when the package was first built. As such,
> almost all files in target (or staging or host) are newer than that, and
> so those files are all now accounted for that package, when in fact only
> a minor subset may be accountable to it.
>
> We fix that by limiting the search for files that have been actually
> touched during the install step, now that we have the proper timestamp
> for it.
>
> Reported-by: Nicolas Cavallari <nicolas.cavallari@green-communications.fr>
> Signed-off-by: "Yann E. MORIN" <yann.morin.1998@free.fr>
> Cc: Nicolas Cavallari <nicolas.cavallari@green-communications.fr>
> ---
>  package/pkg-generic.mk | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/package/pkg-generic.mk b/package/pkg-generic.mk
> index 3de8a99675..42aebeb49d 100644
> --- a/package/pkg-generic.mk
> +++ b/package/pkg-generic.mk
> @@ -67,7 +67,7 @@ define step_pkg_size_inner
>         $(SED) '/^$(1),/d' $(BUILD_DIR)/packages-file-list$(3).txt
>         cd $(2); \
>         find . \( -type f -o -type l \) \
> -               -newer $($(PKG)_DIR)/.stamp_built \
> +               -newer $@_before \
>                 -exec printf '$(1),%s\n' {} + \
>                 >> $(BUILD_DIR)/packages-file-list$(3).txt
>  endef

While it is probably not important, just noting here that if package A
installs a file with a modification time in the future, then package B
installed after A would still get that future file in its list. In
fact, every package built after pkg A would then get that file into
'its' list.
Thomas De Schampheleire Jan. 8, 2019, 3:56 p.m. | #2
El mar., 8 ene. 2019 a las 14:07, Thomas De Schampheleire
(<patrickdepinguin@gmail.com>) escribió:
>
> El lun., 7 ene. 2019 a las 23:05, Yann E. MORIN
> (<yann.morin.1998@free.fr>) escribió:
> >
> > Since 7fb6e782542f (core/instrumentation: shave minutes off the build
> > time), the built stampfile is used as a reference to detect files
> > installed by a package.
> >
> > However, this falls short during development, when a user may want to
> > re-install a built-early package without rebuilding it (i.e. make
> > foo-reinstall). In this case, the built stampfile is not touched, and is
> > still dated from way back when the package was first built. As such,
> > almost all files in target (or staging or host) are newer than that, and
> > so those files are all now accounted for that package, when in fact only
> > a minor subset may be accountable to it.
> >
> > We fix that by limiting the search for files that have been actually
> > touched during the install step, now that we have the proper timestamp
> > for it.
> >
> > Reported-by: Nicolas Cavallari <nicolas.cavallari@green-communications.fr>
> > Signed-off-by: "Yann E. MORIN" <yann.morin.1998@free.fr>
> > Cc: Nicolas Cavallari <nicolas.cavallari@green-communications.fr>
> > ---
> >  package/pkg-generic.mk | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/package/pkg-generic.mk b/package/pkg-generic.mk
> > index 3de8a99675..42aebeb49d 100644
> > --- a/package/pkg-generic.mk
> > +++ b/package/pkg-generic.mk
> > @@ -67,7 +67,7 @@ define step_pkg_size_inner
> >         $(SED) '/^$(1),/d' $(BUILD_DIR)/packages-file-list$(3).txt
> >         cd $(2); \
> >         find . \( -type f -o -type l \) \
> > -               -newer $($(PKG)_DIR)/.stamp_built \
> > +               -newer $@_before \
> >                 -exec printf '$(1),%s\n' {} + \
> >                 >> $(BUILD_DIR)/packages-file-list$(3).txt
> >  endef
>
> While it is probably not important, just noting here that if package A
> installs a file with a modification time in the future, then package B
> installed after A would still get that future file in its list. In
> fact, every package built after pkg A would then get that file into
> 'its' list.

Possibly more realistically, a package installing a file with a fixed
timestamp in the past, perhaps a timestamp of the build time preserved
via 'cp -a'.

For this reason I retract the 'probably not important' part.
Yann E. MORIN Jan. 8, 2019, 4:08 p.m. | #3
Thomas DS, All,

Thanks for the review! :-)

On 2019-01-08 14:07 +0100, Thomas De Schampheleire spake thusly:
> El lun., 7 ene. 2019 a las 23:05, Yann E. MORIN
> (<yann.morin.1998@free.fr>) escribió:
snip
> > We fix that by limiting the search for files that have been actually
> > touched during the install step, now that we have the proper timestamp
> > for it.
[--SNIP--]
> > diff --git a/package/pkg-generic.mk b/package/pkg-generic.mk
> > index 3de8a99675..42aebeb49d 100644
> > --- a/package/pkg-generic.mk
> > +++ b/package/pkg-generic.mk
> > @@ -67,7 +67,7 @@ define step_pkg_size_inner
> >         $(SED) '/^$(1),/d' $(BUILD_DIR)/packages-file-list$(3).txt
> >         cd $(2); \
> >         find . \( -type f -o -type l \) \
> > -               -newer $($(PKG)_DIR)/.stamp_built \
> > +               -newer $@_before \
> While it is probably not important, just noting here that if package A
> installs a file with a modification time in the future, then package B
> installed after A would still get that future file in its list. In
> fact, every package built after pkg A would then get that file into
> 'its' list.

How is this different from the current situation?

Maybe in that case, it would make sense that we touch all files after a
package installation (limited to that package's files, of course)?

My position is that, if a package voluntarily install stuff in the
future, it is borked. I.e. either it uses an absolute time, in which
case that time will be in the past some time in the future, or it uses
a delta, in which case it is not reproducible.

Furthermore when we do an actual reproducible build, we actually touch
all files at the end of the build, so a package that relies on file's
timestamp is not reproducible.

Regards,
Yann E. MORIN.
Thomas De Schampheleire Jan. 8, 2019, 4:55 p.m. | #4
El mar., 8 ene. 2019 a las 17:08, Yann E. MORIN
(<yann.morin.1998@free.fr>) escribió:
>
> Thomas DS, All,
>
> Thanks for the review! :-)
>
> On 2019-01-08 14:07 +0100, Thomas De Schampheleire spake thusly:
> > El lun., 7 ene. 2019 a las 23:05, Yann E. MORIN
> > (<yann.morin.1998@free.fr>) escribió:
> snip
> > > We fix that by limiting the search for files that have been actually
> > > touched during the install step, now that we have the proper timestamp
> > > for it.
> [--SNIP--]
> > > diff --git a/package/pkg-generic.mk b/package/pkg-generic.mk
> > > index 3de8a99675..42aebeb49d 100644
> > > --- a/package/pkg-generic.mk
> > > +++ b/package/pkg-generic.mk
> > > @@ -67,7 +67,7 @@ define step_pkg_size_inner
> > >         $(SED) '/^$(1),/d' $(BUILD_DIR)/packages-file-list$(3).txt
> > >         cd $(2); \
> > >         find . \( -type f -o -type l \) \
> > > -               -newer $($(PKG)_DIR)/.stamp_built \
> > > +               -newer $@_before \
> > While it is probably not important, just noting here that if package A
> > installs a file with a modification time in the future, then package B
> > installed after A would still get that future file in its list. In
> > fact, every package built after pkg A would then get that file into
> > 'its' list.
>
> How is this different from the current situation?

Yes, it's the same as in the current situation, but not in 'my'
current situation where I'm still relying on md5sum :-)
Part of my interest in reviewing these changes is the hope that they
could remove the need for my revert of the 'shave off' commit that
switches to mtime instead of md5sum, and thus following upstream
Buildroot in this respect.

>
> Maybe in that case, it would make sense that we touch all files after a
> package installation (limited to that package's files, of course)?

But it could be that the package has good reasons to expect a specific
timestamp, or more probably a relative time order between certain
files.
I'm not sure if it is common, like I said I'd need to check in my own case.

>
> My position is that, if a package voluntarily install stuff in the
> future, it is borked. I.e. either it uses an absolute time, in which
> case that time will be in the past some time in the future, or it uses
> a delta, in which case it is not reproducible.

A time delta expected between two files is not irreproducible if both
files come from the same package.

>
> Furthermore when we do an actual reproducible build, we actually touch
> all files at the end of the build, so a package that relies on file's
> timestamp is not reproducible.

I was not aware of this. That changes things, of course.
Can you point me to the exact place in the code?

Thanks,
Thomas
Yann E. MORIN Jan. 8, 2019, 7:51 p.m. | #5
Thomas DS, All,

On 2019-01-08 16:56 +0100, Thomas De Schampheleire spake thusly:
> El mar., 8 ene. 2019 a las 14:07, Thomas De Schampheleire
> (<patrickdepinguin@gmail.com>) escribió:
[--SNIP--]
> > > -               -newer $($(PKG)_DIR)/.stamp_built \
> > > +               -newer $@_before \
> > While it is probably not important, just noting here that if package A
> > installs a file with a modification time in the future, then package B
> > installed after A would still get that future file in its list. In
> > fact, every package built after pkg A would then get that file into
> > 'its' list.
> Possibly more realistically, a package installing a file with a fixed
> timestamp in the past, perhaps a timestamp of the build time preserved
> via 'cp -a'.

Actually, we already have this very issue with our skeleton: its files
are copied and their timestamps are kept as-is, so the files are almost
certainly older than the built timestamp, and thus are not accounted for
againt the skeleton.

I already identified this problem a few days ago while working on this
series, and I had considered fixing it later, or the series would get
even bigger...

And yes, the problem already happens with the current situation.

If we wanted to be exhaustive, we'd also list files that are not already
in the list (and warn loudly). And since we introduce a new format, we
could more easily add a "metadata" field that specifies why the entry
was added (mtime, new-but-old, etc...)

Regards,
Yann E. MORIN.
Yann E. MORIN Jan. 8, 2019, 8:02 p.m. | #6
Thomas DS, All,

On 2019-01-08 17:55 +0100, Thomas De Schampheleire spake thusly:
> El mar., 8 ene. 2019 a las 17:08, Yann E. MORIN
> (<yann.morin.1998@free.fr>) escribió:
> > On 2019-01-08 14:07 +0100, Thomas De Schampheleire spake thusly:
> > > El lun., 7 ene. 2019 a las 23:05, Yann E. MORIN
> > > (<yann.morin.1998@free.fr>) escribió:
[--SNIP--]
> > > > -               -newer $($(PKG)_DIR)/.stamp_built \
> > > > +               -newer $@_before \
> > > While it is probably not important, just noting here that if package A
> > > installs a file with a modification time in the future, then package B
> > > installed after A would still get that future file in its list. In
> > > fact, every package built after pkg A would then get that file into
> > > 'its' list.
> > How is this different from the current situation?
> 
> Yes, it's the same as in the current situation, but not in 'my'
> current situation where I'm still relying on md5sum :-)
> Part of my interest in reviewing these changes is the hope that they
> could remove the need for my revert of the 'shave off' commit that
> switches to mtime instead of md5sum, and thus following upstream
> Buildroot in this respect.

Yes, and I thank you a lot for this thorough review! :-)

> > Maybe in that case, it would make sense that we touch all files after a
> > package installation (limited to that package's files, of course)?
> But it could be that the package has good reasons to expect a specific
> timestamp, or more probably a relative time order between certain
> files.
> I'm not sure if it is common, like I said I'd need to check in my own case.

The only case I'm aware of, is that systemd can detect the timestamp of
/usr and decide to run upgrade scripts if /usr is more recent than a
file in /etc, hence the reason we have:

    https://git.buildroot.org/buildroot/tree/Makefile#n786

But see below...

> > My position is that, if a package voluntarily install stuff in the
> > future, it is borked. I.e. either it uses an absolute time, in which
> > case that time will be in the past some time in the future, or it uses
> > a delta, in which case it is not reproducible.
> 
> A time delta expected between two files is not irreproducible if both
> files come from the same package.
> 
> >
> > Furthermore when we do an actual reproducible build, we actually touch
> > all files at the end of the build, so a package that relies on file's
> > timestamp is not reproducible.
> 
> I was not aware of this. That changes things, of course.
> Can you point me to the exact place in the code?

    https://git.buildroot.org/buildroot/tree/fs/common.mk#n39

And now I notice that the systemd stuff is borked in this case, because
/usr must really be the time of build, not the time of the commit date.

I now realise that there is a whole slew of incorrect assumptions we've
made about SOURCE_DATE_EPOCH in the context of Buildroot, but I shall
write about this in another thread.

Regards,
Yann E. MORIN.

Patch

diff --git a/package/pkg-generic.mk b/package/pkg-generic.mk
index 3de8a99675..42aebeb49d 100644
--- a/package/pkg-generic.mk
+++ b/package/pkg-generic.mk
@@ -67,7 +67,7 @@  define step_pkg_size_inner
 	$(SED) '/^$(1),/d' $(BUILD_DIR)/packages-file-list$(3).txt
 	cd $(2); \
 	find . \( -type f -o -type l \) \
-		-newer $($(PKG)_DIR)/.stamp_built \
+		-newer $@_before \
 		-exec printf '$(1),%s\n' {} + \
 		>> $(BUILD_DIR)/packages-file-list$(3).txt
 endef