diff mbox series

[v2,05/12] support/download/go-post-process: implement Go vendoring support

Message ID 20201219153525.1361175-6-thomas.petazzoni@bootlin.com
State Changes Requested
Headers show
Series [v2,01/12] support/download/dl-wrapper: add concept of download post-processing | expand

Commit Message

Thomas Petazzoni Dec. 19, 2020, 3:35 p.m. UTC
This commit introduces the download post-process script
support/download/go-post-process, and hooks it into the Go package
infrastructure.

The -modcacherw flag is added to ensure that the Go cache is
read/write, and can be deleted properly upon "make clean".

The <pkg>_LICENSE variable of golang packages is expanded with ",
vendored dependencies licenses probably not listed" as currently for
all packages, the licenses of the vendored dependencies are not taken
into account.

The logic to generate go.mod when not available is moved to the
download post-process helper, as it must be done before "go mod
vendor" is executed. Also, "go mod init" is used instead of manually
crafting go.mod. This was suggested by Christian Stewart
<christian@paral.in>. The Go module name is passed down to
go-post-process using the BR_GOMOD environment variable.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
---
 package/pkg-golang.mk            | 20 ++++++++++--------
 support/download/go-post-process | 35 ++++++++++++++++++++++++++++++++
 2 files changed, 46 insertions(+), 9 deletions(-)
 create mode 100755 support/download/go-post-process

Comments

Christian Stewart July 29, 2021, 8:17 p.m. UTC | #1
Thomas, all,


On Sat, Dec 19, 2020 at 7:35 AM Thomas Petazzoni
<thomas.petazzoni@bootlin.com> wrote:
> The logic to generate go.mod when not available is moved to the
> download post-process helper, as it must be done before "go mod
> vendor" is executed. Also, "go mod init" is used instead of manually
> crafting go.mod. This was suggested by Christian Stewart
> <christian@paral.in>. The Go module name is passed down to
> go-post-process using the BR_GOMOD environment variable.

> +go mod vendor -v -modcacherw
> +popd > /dev/null
> +
> +repack ${base_name} ${output}

This works fine in most cases, however, this causes inconsistent hashes.

On my system, the hash is correct after re-pack. But on 2 of my users
machines, the hash ends up different.

So something specific to the host machine must be influencing the hash
of the repacked archive.

It's either something in "go mod vendor" that's not deterministic, or
maybe host Go is being used instead of the one compiled by Buildroot,
or perhaps it's not possible to always get deterministic hashes this
way.

Can we possibly switch to hashing the download /before/ re-packing it,
given that Go verifies the hashes in go.sum automatically as part of
the "go mod vendor" step?

Go mod vendor will fail if any of the dependencies are different, the
checksums are included in go.sum

Thanks & best regards,
Christian Stewart
Thomas Petazzoni July 29, 2021, 8:50 p.m. UTC | #2
Hello Christian,

On Thu, 29 Jul 2021 13:17:09 -0700
Christian Stewart <christian@paral.in> wrote:

> This works fine in most cases, however, this causes inconsistent hashes.
> 
> On my system, the hash is correct after re-pack. But on 2 of my users
> machines, the hash ends up different.
> 
> So something specific to the host machine must be influencing the hash
> of the repacked archive.
> 
> It's either something in "go mod vendor" that's not deterministic, or
> maybe host Go is being used instead of the one compiled by Buildroot,
> or perhaps it's not possible to always get deterministic hashes this
> way.

Meh :-/

> Can we possibly switch to hashing the download /before/ re-packing it,
> given that Go verifies the hashes in go.sum automatically as part of
> the "go mod vendor" step?

This is going to be difficult, because the whole principle of this
patch series is to integrate the vendoring within the download step,
i.e *before* the hash is verified. For example, the hash is verified
even if you're just extracting, and not downloading.

One possibility that we had discussed in the past, but that Yann didn't
like was to create one tarball for the upstream code, and another for
the vendored dependencies.

Based on the difficulties, I would really like to understand a little
bit why the archive is not deterministic. Could you compare the
contents of the download stuff between your different machines, and the
contents of the tarball, and see what changes?

> Go mod vendor will fail if any of the dependencies are different, the
> checksums are included in go.sum

Is the presence of go.sum mandatory?

Best regards,

Thomas
Vincent Fazio July 30, 2021, 1:18 p.m. UTC | #3
Christian, all,

On 7/29/21 3:50 PM, Thomas Petazzoni wrote:
> Hello Christian,
>
> On Thu, 29 Jul 2021 13:17:09 -0700
> Christian Stewart <christian@paral.in> wrote:
>
>> This works fine in most cases, however, this causes inconsistent hashes.
>>
>> On my system, the hash is correct after re-pack. But on 2 of my users
>> machines, the hash ends up different.
>>
>> So something specific to the host machine must be influencing the hash
>> of the repacked archive.
>>
>> It's either something in "go mod vendor" that's not deterministic, or
>> maybe host Go is being used instead of the one compiled by Buildroot,
>> or perhaps it's not possible to always get deterministic hashes this
>> way.
> Meh :-/
>
I think Yann already calls out what may be going on in this patch 
https://patchwork.ozlabs.org/project/buildroot/patch/20201219153525.1361175-5-thomas.petazzoni@bootlin.com/

The tarball format will need to follow the same or similar format we use 
for other source packs by using the PAX format and trimming some 
metadata being included in the archive. GNU format tarballs are not 
guaranteed to be consistent across versions.

See: 
https://patchwork.ozlabs.org/project/buildroot/patch/bcb281853f0da8cd970446f4afed093b317dcc82.1609239666.git.yann.morin.1998@free.fr/
Yann E. MORIN Aug. 1, 2021, 7:08 a.m. UTC | #4
Christian, All,

On 2021-07-31 21:59 -0700, Christian Stewart spake thusly:
> Vincent said:
> >I think Yann already calls out what may be going on in this patch
> >https://patchwork.ozlabs.org/project/buildroot/patch/20201219153525.1361175-5-thomas.petazzoni@bootlin.com/
> 
> I think this is probably correct, the .tar.gz needs to have
> deterministic formatting.

Can you try usign the tar helper in support/download/helpers instead?

> In this particular case the user was trying to build on Kali 2.
> 
> Doesn't buildroot compile host-gzip and host-tar for this? I'm
> wondering why it's not the same between host machines.

Since 2020.02, we are no longer building host-tar, unless for very old
versions (i.e. before 1.27), thanks to some tar trickery vy Vincent.
See:

    cbe95b1a455b support/download: add helper to generate a reproducible archive

We also only build host-gzip if the host gzip is pigz (or missing).

> On Thu, Jul 29, 2021 at 1:50 PM Thomas Petazzoni
> <thomas.petazzoni@bootlin.com> wrote:
> 
> > > This works fine in most cases, however, this causes inconsistent hashes.
> > Based on the difficulties, I would really like to understand a little
> > bit why the archive is not deterministic. Could you compare the
> > contents of the download stuff between your different machines, and the
> > contents of the tarball, and see what changes?
> The two versions are attached. The actual contents of the two are
> identical. But for some reason the formatting of
> embiggen-disk-bad.tar.gz is different:
> 
>   00000030: 8232 63de 4282 92dd acff fb79 6740 eae2
> 
> Starting at "2dd acff" - in "embiggen-disk-good" it's
> 
> 00000030: 8232 63de 4282 96dd acff fb79 6740 eae6

So, I zcat both archives, and the two tar are identical!

    $ sha1sum *.tar
    dbb8338ab4acaff1e232c67d3b46602d61114b53  embiggen-disk-bad.tar
    dbb8338ab4acaff1e232c67d3b46602d61114b53  embiggen-disk-good.tar

So I also hashed the archives you provided, and they too are identical:

    $ sha1sum *.tar.gz
    bd887a60c4ca60e55d062067132ccab7d85e7d95  embiggen-disk-bad.tar.gz
    bd887a60c4ca60e55d062067132ccab7d85e7d95  embiggen-disk-good.tar.gz

Did you mess up when sending the archives?

Regards,
Yann E. MORIN.
Yann E. MORIN Aug. 1, 2021, 9:14 a.m. UTC | #5
Christian, All,

On 2021-08-01 00:24 -0700, Christian Stewart spake thusly:
> On Sun, Aug 1, 2021 at 12:08 AM Yann E. MORIN <yann.morin.1998@free.fr> wrote:
> > So I also hashed the archives you provided, and they too are identical:
> >     $ sha1sum *.tar.gz
> >     bd887a60c4ca60e55d062067132ccab7d85e7d95  embiggen-disk-bad.tar.gz
> >     bd887a60c4ca60e55d062067132ccab7d85e7d95  embiggen-disk-good.tar.gz
> > Did you mess up when sending the archives?
> Not sure how that got mixed up, attached is the correct
> embiggen-disk-bad.tar.gz:

So, what are the differences?

First, looking at the compressed archive is useless. We really need to
look at the uncompressed archives to see what is going on.

    $ gunzip embiggen-disk-good.tar.gz embiggen-disk-bad.tar.gz

    $ hxedump -Cv embiggen-disk-good.tar >embiggen-disk-good.tar.hex
    $ hxedump -Cv embiggen-disk-bad.tar >embiggen-disk-bad.tar.hex

    $ diff -du embiggen-disk-good.tar.hex embiggen-disk-bad.tar.hex >embiggen-disk.diff

    $ diffstat <embiggen-disk.diff
     embiggen-disk-bad.tar.hex |   52    26    26     0 +++++++++++++-------------
     1 file changed, 26 insertions(+), 26 deletions(-)

So, there aren't so many differences, in the end. What's about them,
then? Let's see the first hunk (the others are very similar):

    @@ -4,10 +4,10 @@
     00000030  31 61 32 33 30 38 2f 43  4f 4e 54 52 49 42 55 54 |1a2308/CONTRIBUT|
     00000040  49 4e 47 2e 6d 64 00 00  00 00 00 00 00 00 00 00 |ING.md..........|
     00000050  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00 |................|
    -00000060  00 00 00 00 30 30 30 30  36 34 34 00 30 30 30 30 |....0000644.0000|
    +00000060  00 00 00 00 30 30 30 30  36 36 34 00 30 30 30 30 |....0000664.0000|
     00000070  30 30 30 00 30 30 30 30  30 30 30 00 30 30 30 30 |000.0000000.0000|
     00000080  30 30 30 31 33 34 37 00  31 33 36 31 36 36 36 32 |0001347.13616662|
    -00000090  36 31 33 00 30 32 30 34  30 33 00 20 30 00 00 00 |613.020403. 0...|
    +00000090  36 31 33 00 30 32 30 34  30 35 00 20 30 00 00 00 |613.020405. 0...|
     000000a0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00 |................|
     000000b0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00 |................|
     000000c0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00 |................|

So, the first obvious delta: the access mode. In the 'good' archive, it
is 0644 (rw-r--r--), while in the 'bad' archive, it is 0664 (rw-rw-r--).

But Buildroot is supposed to enforce the mode, see Makefile:

    70: REQ_UMASK = 0022
    ...
    77: ifneq ($(shell umask):$(CURDIR):$(O),$(REQ_UMASK):$(CANONICAL_CURDIR):$(CANONICAL_O))
    ...
    83: _all:
    84:     @umask $(REQ_UMASK) && \
    85:         $(MAKE) -C $(CANONICAL_CURDIR) --no-print-directory \
    86:             $(MAKECMDGOALS) $(EXTRAMAKEARGS)

So, how comes that the two have different modes?

The second change is a bit more curious. As per the GNU tar manual [0]:
offset 0x94 (148) is 'char chksum[8]'.

So we should expect an 8-byte field (still from [0]):

    The chksum field represents the simple sum of all bytes in the
    header block. Each 8-bit byte in the header is added to an unsigned
    integer, initialized to zero [...]

OK, so because the mode is different, the delta from the mode is carried
over to the chksum.

If we can find why the modes are different in your two cases, we can
probably solve the issue.

QED.

Still, I would prefer that we do use the tarball helper, because then we
would have a single location where we could solve this kind of
problems...

[0] https://www.gnu.org/software/tar/manual/html_node/Standard.html

Regards,
Yann E. MORIN.
Christian Stewart Sept. 19, 2021, 6:20 a.m. UTC | #6
Hi Yann, all,

Some users are still running into the issue that on their system the
"go mod vendor" process is producing different .tar.gz hashes than on
most other systems. Yann previously checked on this and found that the
access modes are wrong in the "bad" archive:

On Sun, Aug 1, 2021 at 2:14 AM Yann E. MORIN <yann.morin.1998@free.fr> wrote:
> So, what are the differences?
>
> First, looking at the compressed archive is useless. We really need to
> look at the uncompressed archives to see what is going on.
>
>     $ gunzip embiggen-disk-good.tar.gz embiggen-disk-bad.tar.gz

[snip]

> So, the first obvious delta: the access mode. In the 'good' archive, it
> is 0644 (rw-r--r--), while in the 'bad' archive, it is 0664 (rw-rw-r--).
>
> But Buildroot is supposed to enforce the mode, see Makefile:
>
>     70: REQ_UMASK = 0022
>     ...
>     77: ifneq ($(shell umask):$(CURDIR):$(O),$(REQ_UMASK):$(CANONICAL_CURDIR):$(CANONICAL_O))
>     ...
>     83: _all:
>     84:     @umask $(REQ_UMASK) && \
>     85:         $(MAKE) -C $(CANONICAL_CURDIR) --no-print-directory \
>     86:             $(MAKECMDGOALS) $(EXTRAMAKEARGS)

[snip]

> If we can find why the modes are different in your two cases, we can
> probably solve the issue.

Does anyone know a suitable fix for this? I recall there might also
have been some other places where this source of lack of determinism
when making .tar.gz was causing issues.

Thanks,
Christian Stewart
Christian Stewart Sept. 19, 2021, 6:42 a.m. UTC | #7
Hi all,


On Sat, Sep 18, 2021 at 11:20 PM Christian Stewart <christian@paral.in> wrote:
> Some users are still running into the issue that on their system the
> "go mod vendor" process is producing different .tar.gz hashes than on
> most other systems. Yann previously checked on this and found that the
> access modes are wrong in the "bad" archive:

> > So, the first obvious delta: the access mode. In the 'good' archive, it
> > is 0644 (rw-r--r--), while in the 'bad' archive, it is 0664 (rw-rw-r--).
> >
> > But Buildroot is supposed to enforce the mode, see Makefile:
> >
> >     70: REQ_UMASK = 0022
> >     ...
> >     77: ifneq ($(shell umask):$(CURDIR):$(O),$(REQ_UMASK):$(CANONICAL_CURDIR):$(CANONICAL_O))
> >     ...
> >     83: _all:
> >     84:     @umask $(REQ_UMASK) && \
> >     85:         $(MAKE) -C $(CANONICAL_CURDIR) --no-print-directory \
> >     86:             $(MAKECMDGOALS) $(EXTRAMAKEARGS)
>
> [snip]
>
> > If we can find why the modes are different in your two cases, we can
> > probably solve the issue.
>

I have adjusted the code from the "support for Go vendoring" patch,
which I now see was never merged into mainline (yet),

The following now uses the "mk_tar_gz" helper, as Yann suggests to do:

diff --git a/support/download/post-process-helpers
b/support/download/post-process-helpers
old mode 100644
new mode 100755
index bed8df2577..0517abcc24
--- a/support/download/post-process-helpers
+++ b/support/download/post-process-helpers
@@ -1,3 +1,5 @@
+# Source the mk_tar_gz helper.
+. "${0%/*}/helpers"

 unpack() {
         dest="$1"
@@ -8,23 +10,23 @@ unpack() {
 }

 repack() {
-        src="$1"
+        base_name="$1"
         tarball="$2"
-
-        # Generate the archive, sort with the C locale so that it is
reproducible.
-        find "$(basename ${src})" -not -type d -print0 >files.list
-        LC_ALL=C sort -z <files.list >files.list.sorted
+        wdir="$(pwd)"

         # let's use a fixed hardcoded date to be reproducible
         date="2020-02-06 01:02:03 +0000"

-        # Create GNU-format tarballs, since that's the format of the
tarballs on
-        # sources.buildroot.org and used in the *.hash files
-        tar cf new.tar --null --verbatim-files-from --numeric-owner
--format=gnu \
-            --owner=0 --group=0 --mtime="${date}" -T files.list.sorted
-        gzip -6 -n <new.tar >new.tar.gz
+        # use the helper to create a reproducible archive
+        # note: the current working dir contains base_name.
+        mk_tar_gz "${base_name}" "${base_name}" "${date}" "${wdir}/new.tar.gz"
+
+        # replace the old tarball
         mv "${tarball}" "${tarball}".old
         mv new.tar.gz "${tarball}"
         rm "${tarball}".old
         rm -rf ${src}
 }
+
+# Keep this line and the following as last lines in this file.
+# vim: ft=bash

I think this might fix the issue, if so, shall I re-submit the go
vendoring series with the adjustments?

Thanks,
Christian Stewart
diff mbox series

Patch

diff --git a/package/pkg-golang.mk b/package/pkg-golang.mk
index d07242310d..00a74a4d18 100644
--- a/package/pkg-golang.mk
+++ b/package/pkg-golang.mk
@@ -42,12 +42,13 @@  define inner-golang-package
 
 $(2)_BUILD_OPTS += \
 	-ldflags "$$($(2)_LDFLAGS)" \
+	-modcacherw \
 	-tags "$$($(2)_TAGS)" \
 	-trimpath \
 	-p $(PARALLEL_JOBS)
 
 # Target packages need the Go compiler on the host.
-$(2)_DEPENDENCIES += host-go
+$(2)_DOWNLOAD_DEPENDENCIES += host-go
 
 $(2)_BUILD_TARGETS ?= .
 
@@ -72,14 +73,15 @@  $(2)_SRC_SOFTWARE = $$(word 2,$$(subst /, ,$$(call notdomain,$$($(2)_SITE))))
 # If the go.mod file does not exist, one is written with this root path.
 $(2)_GOMOD ?= $$($(2)_SRC_DOMAIN)/$$($(2)_SRC_VENDOR)/$$($(2)_SRC_SOFTWARE)
 
-# Generate a go.mod file if it doesn't exist. Note: Go is configured
-# to use the "vendor" dir and not make network calls.
-define $(2)_GEN_GOMOD
-	if [ ! -f $$(@D)/go.mod ]; then \
-		printf "module $$($(2)_GOMOD)\n" > $$(@D)/go.mod; \
-	fi
-endef
-$(2)_POST_PATCH_HOOKS += $(2)_GEN_GOMOD
+$(2)_DOWNLOAD_POST_PROCESS = go
+$(2)_DL_ENV = \
+	$(HOST_GO_COMMON_ENV) \
+	GOPROXY=direct \
+	BR_GOMOD=$$($(2)_GOMOD)
+
+# Due to vendoring, it is pretty likely that not all licenses are
+# listed in <pkg>_LICENSE.
+$(2)_LICENSE += , vendored dependencies licenses probably not listed
 
 # Build step. Only define it if not already defined by the package .mk
 # file.
diff --git a/support/download/go-post-process b/support/download/go-post-process
new file mode 100755
index 0000000000..4ca3e9a710
--- /dev/null
+++ b/support/download/go-post-process
@@ -0,0 +1,35 @@ 
+#!/usr/bin/env bash
+
+set -e
+
+. $(dirname $0)/post-process-helpers
+
+# Parse our options
+while getopts "n:o:" OPT; do
+        case "${OPT}" in
+        o)  output="${OPTARG}";;
+        n)  base_name="${OPTARG}";;
+        :)  error "option '%s' expects a mandatory argument\n" "${OPTARG}";;
+        \?) error "unknown option '%s'\n" "${OPTARG}";;
+        esac
+done
+
+# Already vendored tarball, nothing to do
+if tar tf ${output} | grep -q "^[^/]*/vendor" ; then
+	exit 0
+fi
+
+unpack ${base_name} ${output}
+
+# Do the Go vendoring
+pushd ${base_name} > /dev/null
+
+# Generate go.mod if it doesn't exist
+if [ ! -f go.mod ] && [ -n "${BR_GOMOD}" ]; then
+	go mod init ${BR_GOMOD}
+fi
+
+go mod vendor -v -modcacherw
+popd > /dev/null
+
+repack ${base_name} ${output}