[4/4,v2] support/dependencies: add a check for a suitable gzip

Message ID ba548d03f79279b7f02605006a709ffc630e8639.1542474939.git.yann.morin.1998@free.fr
State Accepted
Series [1/4,v2] core/download: drop the SSH command

Commit Message

Yann E. MORIN Nov. 17, 2018, 5:15 p.m. UTC
Recently, some hash mismatches have been reported, both by users and by
autobuilder failures, about tarballs generated from git repositories.

This turned out to be caused by users having the 'gzip' command somehow
aliased to 'pigz' (which stands for: parallel implementation of gzip,
which takes advantage of multi-processor systems to parallelise the
compression).

Unfortunately, the output of pigz differs from that of gzip (even though
pigz-compressed archives *are* valid gzip-compressed streams).

Add a dependency check that ensures that gzip is not pigz. If gzip turns
out to be pigz, define a conditional dependency on host-gzip, which is
used as a download dependency for packages that will generate compressed
files, i.e. those fetched with cvs, git, or svn.

Fixes:
    http://autobuild.buildroot.org/results/330/3308271fc641cadb59dbf1b5ee529a84f79e6d5c/

Signed-off-by: "Yann E. MORIN" <yann.morin.1998@free.fr>
Cc: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
Cc: Peter Korsgaard <peter@korsgaard.com>
Cc: Arnout Vandecappelle <arnout@mind.be>
Cc: Marcin Niestrój <m.niestroj@grinn-global.com>
Cc: Erico Nunes <nunes.erico@gmail.com>

---
Changes v1 -> v2:
  - don't fail, but define the conditional dependency  (Thomas)
---
 package/pkg-generic.mk                  |  4 +++-
 support/dependencies/check-host-gzip.mk |  3 +++
 support/dependencies/check-host-gzip.sh | 21 +++++++++++++++++++++
 3 files changed, 27 insertions(+), 1 deletion(-)
 create mode 100644 support/dependencies/check-host-gzip.mk
 create mode 100755 support/dependencies/check-host-gzip.sh

Comments

Matt Weber Nov. 17, 2018, 5:23 p.m. UTC | #1
Yann,

On Sat, Nov 17, 2018 at 11:16 AM Yann E. MORIN <yann.morin.1998@free.fr> wrote:
>
> Recently, some hash mismatches have been reported, both by users and by
> autobuilder failures, about tarballs generated from git repositories.
>
> This turned out to be caused by users having the 'gzip' command somehow
> aliased to 'pigz' (which stands for: parallel implementation of gzip,
> which takes advantage of multi-processor systems to parallelise the
> compression).
>
> Unfortunately, the output of pigz differs from that of gzip (even though
> pigz-compressed archives *are* valid gzip-compressed streams).
>
> Add a dependency check that ensures that gzip is not pigz. If gzip turns
> out to be pigz, define a conditional dependency on host-gzip, which is
> used as a download dependency for packages that will generate compressed
> files, i.e. those fetched with cvs, git, or svn.
>
> Fixes:
>     http://autobuild.buildroot.org/results/330/3308271fc641cadb59dbf1b5ee529a84f79e6d5c/
>
> Signed-off-by: "Yann E. MORIN" <yann.morin.1998@free.fr>
> Cc: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
> Cc: Peter Korsgaard <peter@korsgaard.com>
> Cc: Arnout Vandecappelle <arnout@mind.be>
> Cc: Marcin Niestrój <m.niestroj@grinn-global.com>
> Cc: Erico Nunes <nunes.erico@gmail.com>
>
> ---
> Changes v1 -> v2:
>   - don't fail, but define the conditional dependency  (Thomas)
> ---
>  package/pkg-generic.mk                  |  4 +++-
>  support/dependencies/check-host-gzip.mk |  3 +++
>  support/dependencies/check-host-gzip.sh | 21 +++++++++++++++++++++
>  3 files changed, 27 insertions(+), 1 deletion(-)
>  create mode 100644 support/dependencies/check-host-gzip.mk
>  create mode 100755 support/dependencies/check-host-gzip.sh
>
> diff --git a/package/pkg-generic.mk b/package/pkg-generic.mk
> index f34f46afc8..ef890981bb 100644
> --- a/package/pkg-generic.mk
> +++ b/package/pkg-generic.mk
> @@ -583,7 +583,9 @@ $(2)_DEPENDENCIES += host-skeleton
>  endif
>
>  ifneq ($$(filter cvs git svn,$$($(2)_SITE_METHOD)),)
> -$(2)_DOWNLOAD_DEPENDENCIES += $(BR2_TAR_HOST_DEPENDENCY)
> +$(2)_DOWNLOAD_DEPENDENCIES += \
> +       $(BR2_GZIP_HOST_DEPENDENCY) \
> +       $(BR2_TAR_HOST_DEPENDENCY)
>  endif
>
>  ifeq ($$(filter host-tar host-skeleton host-fakedate,$(1)),)
> diff --git a/support/dependencies/check-host-gzip.mk b/support/dependencies/check-host-gzip.mk
> new file mode 100644
> index 0000000000..bf9a369a7d
> --- /dev/null
> +++ b/support/dependencies/check-host-gzip.mk
> @@ -0,0 +1,3 @@
> +ifeq (,$(call suitable-host-package,gzip))
> +BR2_GZIP_HOST_DEPENDENCY = host-gzip
> +endif
> diff --git a/support/dependencies/check-host-gzip.sh b/support/dependencies/check-host-gzip.sh

(Not wanting to hijack the intent of this patch :-) )
As part of a reproducible build, why should we conditionally build
these dependencies instead of always building them?  Builds would then
become reproducible with the same cached dl folder of material
across a series of distro releases.  The best example I have is a product
that is under development for 2-3 years, where we may have a spread of
build machine distros (e.g. Ubuntu 14 -> 18 LTS).  We've recently
started to run into this as products stabilize, because of the Buildroot
concept of conditionally building these host dependencies: depending
on the machine, we may miss a source archive in our
collection of dl material at release time.  Thoughts?

> new file mode 100755
> index 0000000000..5f344c5f9b
> --- /dev/null
> +++ b/support/dependencies/check-host-gzip.sh
> @@ -0,0 +1,21 @@
> +#!/bin/sh
> +
> +candidate="$1" # ignored
> +
> +gzip="$(which gzip)"
> +if [ ! -x "${gzip}" ]; then
> +    # echo nothing: no suitable gzip found
> +    exit 1
> +fi
> +
> +# gzip displays its version string on stdout
> +# pigz displays its version string on stderr
> +version="$("${gzip}" --version 2>&1)"
> +case "${version}" in
> +  (*pigz*)
> +    # echo nothing: no suitable gzip found
> +    exit 1
> +    ;;
> +esac
> +
> +printf "%s" "${gzip}"
> --
> 2.14.1

Yann E. MORIN Nov. 18, 2018, 1:44 p.m. UTC | #2
Matthew, All,

On 2018-11-17 11:23 -0600, Matthew Weber spake thusly:
> On Sat, Nov 17, 2018 at 11:16 AM Yann E. MORIN <yann.morin.1998@free.fr> wrote:
[--SNIP--]
> > Add a dependency check that ensures that gzip is not pigz. If that is
> > the case, define a conditional dependency to host-gzip, that is used as
> > a download dependency for packages that will generate compressed files,
> > i.e. cvs, git, and svn.
[--SNIP--]
> (Not wanting to hijack the intent of this patch :-) )
> As part of a reproducible build, why should we conditionally build
> these dependencies instead of always building them?  Builds would then
> become reproducible with the same cached dl folder of material
> across a series of distro releases.  The best example I have is a product
> that is under development for 2-3 years, where we may have a spread of
> build machine distros (e.g. Ubuntu 14 -> 18 LTS).  We've recently
> started to run into this as products stabilize, because of the Buildroot
> concept of conditionally building these host dependencies: depending
> on the machine, we may miss a source archive in our
> collection of dl material at release time.  Thoughts?

So, there are two goals that contradict each other:

 1- we want reproducible builds,
 2- we want fast builds

For 1, it would mean that we should build as many tools as possible.
However, the more we build, the slower the build is.

For 2, we should rely as much as possible on distro-provided tools.
However, the more we rely on the host, the less reproducible we get.

gzip has been rock stable over the years. IIRC, I took one of the first
releases from way back 1993-or-so, and the latest one, 1.9; they were
generating the exact same output, 25 years apart! That, is stability.
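
One way to run this kind of comparison is to checksum the compressed stream that each gzip binary produces for the same input. The following is only a minimal sketch: gzip_checksum is a hypothetical helper, the sample input is arbitrary, and the -n and -6 options are chosen here for illustration (-n keeps the name and timestamp out of the gzip header, so the result depends only on the input and the compressor).

```shell
#!/bin/sh
# Illustrative only: gzip_checksum is a hypothetical helper, not Buildroot code.
# $1 is the gzip-compatible binary to test; the data to compress comes on stdin.
gzip_checksum() {
    # -n: do not store the original name and timestamp in the header
    # -6: fix the compression level explicitly
    "$1" -n -6 | cksum
}

# Compress the same input twice and compare the checksums of the results.
sum_a="$(printf 'hello buildroot\n' | gzip_checksum gzip)"
sum_b="$(printf 'hello buildroot\n' | gzip_checksum gzip)"

if [ "${sum_a}" = "${sum_b}" ]; then
    echo "identical"
else
    echo "different"
fi
```

To compare two implementations rather than two runs, pass the path of the second binary as the first argument of one of the calls.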

Given the goals of the gzip authors and maintainers, I don't expect them
to change anything in it anytime soon.

So, we really don't want to build it if the host provides it.

Now, we can't know what the future will be, and we can't predict which
other tool is going to change its behaviour such that we have to build
our own. So, when you update to a newer host, you'll also have to adapt,
even if that means adding a few new archives to your BR2_DL_DIR, yes.

If you want to be sure that, in the future, you'll be as reproducible as
possible, then do a chroot. Even now, having a chroot ensures that all
users/developers of your project have a known and reproducible devel
environment (no more "it builds for me" arguments!). You may even go
further, and mandate a VM, and even go as far as having HW spares for
the project lifetime (to run the VM on!).

As for Buildroot, I guess we're going to continue relying on the host
tools when they meet our expectations.

Regards,
Yann E. MORIN.

Matt Weber Nov. 18, 2018, 2:41 p.m. UTC | #3
Yann,

On Sun, Nov 18, 2018 at 7:44 AM Yann E. MORIN <yann.morin.1998@free.fr> wrote:
>
> Matthew, All,
>
> On 2018-11-17 11:23 -0600, Matthew Weber spake thusly:
> > On Sat, Nov 17, 2018 at 11:16 AM Yann E. MORIN <yann.morin.1998@free.fr> wrote:
> [--SNIP--]
> > > Add a dependency check that ensures that gzip is not pigz. If that is
> > > the case, define a conditional dependency to host-gzip, that is used as
> > > a download dependency for packages that will generate compressed files,
> > > i.e. cvs, git, and svn.
> [--SNIP--]
> > (Not wanting to hijack the intent of this patch :-) )
> > As part of a reproducible build, why should we conditionally build
> > these dependencies instead of always building them?  Builds would then
> > become reproducible with the same cached dl folder of material
> > across a series of distro releases.  The best example I have is a product
> > that is under development for 2-3 years, where we may have a spread of
> > build machine distros (e.g. Ubuntu 14 -> 18 LTS).  We've recently
> > started to run into this as products stabilize, because of the Buildroot
> > concept of conditionally building these host dependencies: depending
> > on the machine, we may miss a source archive in our
> > collection of dl material at release time.  Thoughts?
>
> So, there are two goals that contradict each other:
>
>  1- we want reproducible builds,
>  2- we want fast builds
>
> For 1, it would mean that we should build as many tools as possible.
> However, the more we build, the slower the build is.
>

I'm definitely not advocating for building all the tools and libraries
we use from the host distro packages.  The case I'm running into is
that when additional host dependency checks/builds are added to
Buildroot over time, the necessary set of cached dl archives changes
depending on the machine you execute on.  I do agree that using
a standard container or VM instance is the way to capture and define
that "consistent environment".  More often than not, I find that I
can't control the OS users use for a dev env (many devops teams,
timelines, "favorite OS", financial constraints, engineer opinions :-)
).

Use cases
1) We have a Sandbox environment which is engineered to create
consistent offline rebuilds from a given set of offline inputs.  This
sandbox environment can't change as often as the distro used for
day-to-day development, i.e. we need lots of projects to use the
consistent environment to get our money's worth out of the setup/doc
effort.  Normally we'd update the environment every ~4 yrs.  This
mismatch of distro/env versions results in us doing some additional
test builds in the sandbox and our day-to-day envs to identify the
conditional host pkg builds.
2) Corporate network/proxy and offline builds.  A user prepares to
take a set of files offline and collects their material on distro
14.x.y.z (when online), and then has the same distro but 14.x (offline),
which triggers a dependency build requiring another archive.

> For 2, we should rely as much as possible on distro-provided tools.
> However, the more we rely on the host, the less reproducible we get.
>
> gzip has been rock stable over the years. IIRC, I took one of the first
> releases from way back 1993-or-so, and the latest one, 1.9; they were
> generating the exact same output, 25 years apart! That, is stability.
>
> Given the goals of the gzip authors and maintainers, I don't expect them
> to change anything in it anytime soon.
>
> So, we really don't want to build it if the host provides it.
>

Agree.  What about adding the option that, only if the reproducible
option is enabled, we build all host tools we have a version
dependency on (i.e. all those we'd normally just conditionally build)?

> Now, we can't know what the future will be, and we can't predict which
> other tool is going to change its behaviour such that we have to build
> our own. So, when you update to a newer host, you'll also have to adapt,
> even if that means adding a few new archives to your BR2_DL_DIR, yes.
>

I'm actually worried about (and experiencing) the opposite: our distro
versions are newer during development, and we go back to an older OS for
release or CI.

> If you want to be sure that, in the future, you'll be as reproducible as
> possible, then do a chroot. Even now, having a chroot ensures that all
> users/developers of your project have a known and reproducible devel
> environment (no more "it builds for me" arguments!). You may even go
> further, and mandate a VM, and even go as far as having HW spares for
> the project lifetime (to run the VM on!).
>

Yeah, the hard part is that the $/time investment in those VM and dev
environments means (at least for my company) they don't change as
often, and we've found you always end up with a different/new one on
the next new project.  As a Linux team supporting our own env and a
series of dev configurations, we start to see some of these use cases
appear.  For instance, I currently have projects with dev envs close
to my Buildroot build machine distro version and a project on the
fringe of support.  Generally this spread of versions is OK, with our
projects only having a ~1-2 yr development cycle before feature
complete.  It does mean we get caught occasionally by things like the
conditional host dependencies.  Internally we'll carry a patch to make
this consistent, but I figured I'd bring it up and see if collectively
this would be a good upstream change.

Thanks for the feedback Yann!

Peter Korsgaard Nov. 25, 2018, 8:21 a.m. UTC | #4
>>>>> "Matthew" == Matthew Weber <matthew.weber@rockwellcollins.com> writes:

Hi,

 >> So, we really don't want to build it if the host provides it.

 > Agree.  What about adding the option that, only if the reproducible
 > option is enabled, we build all host tools we have a version
 > dependency on (i.e. all those we'd normally just conditionally build)?

I think there are a number of use cases where BR2_REPRODUCIBLE would be
interesting (e.g. we have discussed turning it on by default), but where
you do not want to pay the extra build time for building these host
utilities.

So I'm open to an option to force building all host dependencies, but it
should be keyed from a separate configuration option and not
BR2_REPRODUCIBLE.
Patch

diff --git a/package/pkg-generic.mk b/package/pkg-generic.mk
index f34f46afc8..ef890981bb 100644
--- a/package/pkg-generic.mk
+++ b/package/pkg-generic.mk
@@ -583,7 +583,9 @@  $(2)_DEPENDENCIES += host-skeleton
 endif
 
 ifneq ($$(filter cvs git svn,$$($(2)_SITE_METHOD)),)
-$(2)_DOWNLOAD_DEPENDENCIES += $(BR2_TAR_HOST_DEPENDENCY)
+$(2)_DOWNLOAD_DEPENDENCIES += \
+	$(BR2_GZIP_HOST_DEPENDENCY) \
+	$(BR2_TAR_HOST_DEPENDENCY)
 endif
 
 ifeq ($$(filter host-tar host-skeleton host-fakedate,$(1)),)
diff --git a/support/dependencies/check-host-gzip.mk b/support/dependencies/check-host-gzip.mk
new file mode 100644
index 0000000000..bf9a369a7d
--- /dev/null
+++ b/support/dependencies/check-host-gzip.mk
@@ -0,0 +1,3 @@ 
+ifeq (,$(call suitable-host-package,gzip))
+BR2_GZIP_HOST_DEPENDENCY = host-gzip
+endif
diff --git a/support/dependencies/check-host-gzip.sh b/support/dependencies/check-host-gzip.sh
new file mode 100755
index 0000000000..5f344c5f9b
--- /dev/null
+++ b/support/dependencies/check-host-gzip.sh
@@ -0,0 +1,21 @@ 
+#!/bin/sh
+
+candidate="$1" # ignored
+
+gzip="$(which gzip)"
+if [ ! -x "${gzip}" ]; then
+    # echo nothing: no suitable gzip found
+    exit 1
+fi
+
+# gzip displays its version string on stdout
+# pigz displays its version string on stderr
+version="$("${gzip}" --version 2>&1)"
+case "${version}" in
+  (*pigz*)
+    # echo nothing: no suitable gzip found
+    exit 1
+    ;;
+esac
+
+printf "%s" "${gzip}"
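
The pattern match at the heart of check-host-gzip.sh can be tried in isolation. The sketch below is illustrative only: classify_gzip_version is a hypothetical helper that is not part of the patch, and the two sample version strings are assumptions (real --version output may carry additional lines).

```shell
#!/bin/sh
# Illustrative only: classify_gzip_version is a hypothetical helper, not part
# of the patch. It applies the same "*pigz*" pattern as check-host-gzip.sh to
# a version string passed as $1.
classify_gzip_version() {
    case "$1" in
      (*pigz*) echo "unsuitable" ;;
      (*)      echo "suitable" ;;
    esac
}

# Sample version strings (assumed for the sake of the example)
classify_gzip_version "gzip 1.9"
classify_gzip_version "pigz 2.4"

# On a real host, feed it the merged stdout/stderr of the installed gzip,
# since gzip and pigz do not print their version on the same stream.
if command -v gzip >/dev/null 2>&1; then
    classify_gzip_version "$(gzip --version 2>&1)"
fi
```

Merging stderr into stdout before matching is what lets a single pattern cover both tools, whichever stream the version string arrives on.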