[11/22,v3] support/download: even more reproducible archives (until next time)

Message ID 4adbe48574f1868fa89446a101999cd5262f775f.1714858818.git.yann.morin.1998@free.fr
State Accepted
Series support/download: extend download features and reproducibility (branch yem/git-attributes-2)

Commit Message

Yann E. MORIN May 4, 2024, 9:40 p.m. UTC
Currently, when we generate archives, we rely on a few assumptions and
mechanisms to ensure reproducibility. So far, we mostly accounted for
the content (i.e. content, filenames, and path) of the files we
archived, and this is OK (git and svn should provide reproducible
content by design, and cargo and go vendoring are also supposed to
generate reproducible content).

However, tarballs do not only contain the content of the files; they
also carry some metadata about those files. Beyond filenames and paths,
which are already reproducible, there are the timestamp and the user
and group names and IDs. Those are also accounted for and made
reproducible.

The final touch (so far!) is that files have access rights (aka mode),
and those too are stored in tarballs. So far we accounted for those by
ensuring that Buildroot would always run under a known umask, thus
generating files with reproducible modes.

That falls short in one case that we did not envision, though: a shared
download directory, where extended attributes are set to provide a
default ACL that is permissive, to allow two or more users (with
different uid and gid) to all read and write to such a directory. This
is trivially achieved with something like:

    $ mkdir -p "${BR2_DL_DIR}"
    $ setfacl -m 'default:user::rwx' "${BR2_DL_DIR}"
    $ setfacl -m 'default:group::rwx' "${BR2_DL_DIR}"
    $ setfacl -m 'default:other::rwx' "${BR2_DL_DIR}"

This has the effect that:

  - files below BR2_DL_DIR are all set with user, group, and world read
    and write access,
  - files executable by the owner will also be group and world
    executable,
  - directories are user, group, and world readable, writable, and
    searchable.

This means that all the archives we generate from files in BR2_DL_DIR
will have modes that are different from those generated on other systems,
where only the traditional umask is used.
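
As an illustration (not part of the patch), here is a minimal sketch of
why the mode alone is enough to change the hash. It assumes GNU tar and
sha256sum; the hash_tree function and the tree/file names are made up
for the example, and it uses the ustar format for brevity where the
helper uses posix. The same content is archived twice with owner, group,
and mtime pinned, changing only the file mode in between:

```shell
#!/bin/sh
# Illustrative sketch only; hash_tree and the paths are hypothetical.
set -e
work="$(mktemp -d)"
mkdir "${work}/tree"
printf 'hello\n' >"${work}/tree/file"

# Archive the tree with owner, group and mtime pinned, print its sha256.
hash_tree() {
    tar -cf - --format=ustar \
        --numeric-owner --owner=0 --group=0 --mtime='@0' \
        -C "${work}" tree | sha256sum | cut -d' ' -f1
}

# Mode as left by a traditional 0022 umask.
chmod 0644 "${work}/tree/file"
strict="$(hash_tree)"

# Mode as left by a permissive default ACL.
chmod 0666 "${work}/tree/file"
permissive="$(hash_tree)"

echo "mode 0644: ${strict}"
echo "mode 0666: ${permissive}"
rm -rf "${work}"
```

The two hashes differ even though content, names, owner, and timestamp
are identical.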

There are various solutions to solve that issue:

  - detect the situation and abort: that's not nice, because users have
    a legitimate reason to want to share that directory,

  - find a solution for each affected download mechanism: git, svn, hg,
    cvs, bzr... and for each of the affected vendoring mechanisms: go
    and cargo [0]; this is not nice, because it means a lot of
    repetition, with the risk that they diverge over time (e.g. one is
    fixed for a newer issue, while the others are left out due to an
    oversight...)

  - find a single, common solution that works in all cases, whatever the
    download mechanism and/or vendoring: this is the best, because we
    can extend and fix it once and everything else benefits from it.

We obviously go for the third option.

The common solution is rather simple. When creating the tarball in
support/download/helpers, pass --mode='go=u,go-w' to tar, which sets the
group and other permissions to those of the user, but without write
permission.
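
A minimal sketch of the effect, under the same assumptions as the tar
invocation in the helper (GNU tar), plus sha256sum; the hash_tree
function and the file names are hypothetical, and the ustar format is
used only to keep the example short. A tree with umask-style modes and
the same tree with ACL-style permissive modes end up with the same hash
once --mode='go=u,go-w' is applied:

```shell
#!/bin/sh
# Illustrative sketch only; hash_tree and the paths are hypothetical.
set -e
work="$(mktemp -d)"
mkdir "${work}/tree"
printf 'hello\n' >"${work}/tree/data"
printf '#!/bin/sh\n' >"${work}/tree/run"

# Same pinned metadata as before, plus the mode normalisation.
hash_tree() {
    tar -cf - --format=ustar \
        --numeric-owner --owner=0 --group=0 --mtime='@0' \
        --mode='go=u,go-w' \
        -C "${work}" tree | sha256sum | cut -d' ' -f1
}

# Modes as a 0022 umask would leave them.
chmod 0755 "${work}/tree" "${work}/tree/run"
chmod 0644 "${work}/tree/data"
strict="$(hash_tree)"

# Modes as a permissive default ACL would leave them.
chmod 0777 "${work}/tree" "${work}/tree/run"
chmod 0666 "${work}/tree/data"
permissive="$(hash_tree)"

echo "umask-style modes: ${strict}"
echo "ACL-style modes:   ${permissive}"
rm -rf "${work}"
```

Both runs store 0755 for the directory and the executable and 0644 for
the data file, so the hashes match.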

This implies that we must bump the version-suffix for the download
backends [1] and for the vendoring post-processes. It also implies that
the hash may change, under the following circumstances:

- Symlinks normally have permissions 0777 (because symlink permissions
  are in fact meaningless). They will now have permission 0755 in the
  tarball.
- If the original tarball (for vendored go and cargo packages) contained
  files that are readable or executable by owner but not by group or
  other, they will now also be readable or executable, respectively, by
  group and other. This does not apply to write permission, because that
  was already handled by our 0022 umask (which makes files not writeable
  by group and other).
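
The two cases above can be observed with a short sketch (illustrative
only; it assumes GNU tar on Linux, where symlinks report mode 0777, and
the file names are made up). It lists the modes stored in the archive
without and with the new option:

```shell
#!/bin/sh
# Illustrative sketch only; the paths are hypothetical.
set -e
work="$(mktemp -d)"
mkdir "${work}/tree"
printf 'secret\n' >"${work}/tree/private"
chmod 0600 "${work}/tree/private"
ln -s private "${work}/tree/link"

# Print the mode column of each archive member on a single line.
modes() {
    tar -tvf - | awk '{print $1}' | paste -s -d ' ' -
}

before="$(tar -cf - -C "${work}/tree" private link | modes)"
after="$(tar -cf - --mode='go=u,go-w' -C "${work}/tree" private link | modes)"

echo "without --mode: ${before}"
echo "with --mode:    ${after}"
rm -rf "${work}"
```

Without the option the members are stored as -rw------- and lrwxrwxrwx;
with it they become -rw-r--r-- and lrwxr-xr-x, which is why the hashes
of such archives change.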

Because the hash may change, we need to update the BR_FMT_VERSION for
everything that creates tarballs. Go and cargo didn't have one up to
now, but the previous commit added the possibility to give one. The ones
for git and svn have to be updated. Since it is now possible to have a
suffix for both the VCS and the post-processing, change the suffix to
something more descriptive than "-brX", i.e. -git3 for git, -go1 for
golang, etc.

The hash updates and filename changes will be handled in a follow-up
commit.

[0] Note however that the vendoring is currently not done in a
sub-directory of BR2_DL_DIR, but the cargo and go caches are located
there. Files that get copied from there to the vendoring area would be
tainted too, so we want to address that situation as well.

[1] We currently do not have a CVS version suffix, because we do not
guarantee the reproducibility of CVS archives (we can't); for hg, we are
currently using hg's own archive tool, and presumably that does not have
the mode issue because it is not using the checked-out files. Still,
doing the mode fix in a single location will help extend those two
backends in the future (if that ever happens...).

Reported-by: Peter Korsgaard <peter@korsgaard.com>
Signed-off-by: Yann E. MORIN <yann.morin.1998@free.fr>
Signed-off-by: Arnout Vandecappelle <arnout@mind.be>
---
 package/pkg-download.mk  | 8 +++++---
 support/download/helpers | 2 +-
 2 files changed, 6 insertions(+), 4 deletions(-)

Patch

diff --git a/package/pkg-download.mk b/package/pkg-download.mk
index 9047e99f86..61e565b002 100644
--- a/package/pkg-download.mk
+++ b/package/pkg-download.mk
@@ -19,9 +19,11 @@  export SFTP := $(call qstrip,$(BR2_SFTP))
 export LOCALFILES := $(call qstrip,$(BR2_LOCALFILES))
 
 # Version of the format of the archives we generate in the corresponding
-# download backend:
-BR_FMT_VERSION_git = -br2
-BR_FMT_VERSION_svn = -br3
+# download backend and post-process:
+BR_FMT_VERSION_git = -git3
+BR_FMT_VERSION_svn = -svn4
+BR_FMT_VERSION_go = -go1
+BR_FMT_VERSION_cargo = -cargo1
 
 DL_WRAPPER = support/download/dl-wrapper
 
diff --git a/support/download/helpers b/support/download/helpers
index 90a7d6c1ec..823e4d2f91 100755
--- a/support/download/helpers
+++ b/support/download/helpers
@@ -61,7 +61,7 @@  mk_tar_gz() {
     # Create POSIX tarballs, since that's the format the most reproducible
     tar cf - --transform="s#^\./#${base_dir}/#S" \
              --numeric-owner --owner=0 --group=0 --mtime="${date}" \
-             --format=posix --pax-option="${pax_options}" \
+             --format=posix --pax-option="${pax_options}" --mode='go=u,go-w' \
              -T "${tmp}.sorted" >"${tmp}.tar"
 
     # Compress the archive