Message ID | 20201219153525.1361175-5-thomas.petazzoni@bootlin.com |
---|---|
State | Changes Requested |
Headers | show |
Series | [v2,01/12] support/download/dl-wrapper: add concept of download post-processing | expand |
Thomas, All, On 2020-12-19 16:35 +0100, Thomas Petazzoni spake thusly: > download post process scripts will often need to unpack the source > code tarball, do some operation, and then repack it. In order to help > with this, post-process-helpers provide an unpack() function and a > repack() function. You forgot to explain in the commit log why we need a hard-coded, fixed date. However, I find it ugly that we resort to such an arbitrary constant. At the very least, I'd like we take a memorable value, like 1970-01-01 00:00:00 +0000. However, I am afraid that tar would refuse to extract files to that date, whining about an "implausibly old time stamp" when extracting such files... But see below for a proposal... > Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com> > --- > support/download/post-process-helpers | 30 +++++++++++++++++++++++++++ > 1 file changed, 30 insertions(+) > create mode 100644 support/download/post-process-helpers > > diff --git a/support/download/post-process-helpers b/support/download/post-process-helpers > new file mode 100644 > index 0000000000..bed8df2577 > --- /dev/null > +++ b/support/download/post-process-helpers > @@ -0,0 +1,30 @@ > + > +unpack() { > + dest="$1" > + tarball="$2" This is a shell script, variables are global. But since we run it with bash, we can declare such variable as local: local dest="${1}" local tarball="${2}" > + mkdir ${dest} > + tar -C ${dest} --strip-components=1 -xf ${tarball} local one_file one_file="$(find "${dest}" -type f |head -n1)" touch -r "${one_file}" "${dest}.timestamp" > +} > + > +repack() { > + src="$1" > + tarball="$2" Ditto: make them 'local' (this can be fixed when applying). Also, I suppose that the 'dest' parameter of unpack is expected to be the 'src' parameter of repack, right? > + # Generate the archive, sort with the C locale so that it is reproducible. > + find "$(basename ${src})" -not -type d -print0 >files.list > + LC_ALL=C sort -z <files.list >files.list.sorted > + > + # let's use a fixed hardcoded date to be reproducible > + date="2020-02-06 01:02:03 +0000" Assuming that unpack's dest and repack's src are pointing to the same directory: date="$(stat -c "${src}.timestamp")" > + # Create GNU-format tarballs, since that's the format of the tarballs on > + # sources.buildroot.org and used in the *.hash files > + tar cf new.tar --null --verbatim-files-from --numeric-owner --format=gnu \ > + --owner=0 --group=0 --mtime="${date}" -T files.list.sorted This will have to be changed according to my pending series to change to the PAX format, whish is even more reproducible. > + gzip -6 -n <new.tar >new.tar.gz > + mv "${tarball}" "${tarball}".old > + mv new.tar.gz "${tarball}" > + rm "${tarball}".old > + rm -rf ${src} Quote all expanded variable, even "${src}". And also rm "${src}.timestamp", now. Regards, Yann E. MORIN. > +} > -- > 2.29.2 >
Thomas, All, A few mre tweaks I forgot about... On 2020-12-19 18:39 +0100, Yann E. MORIN spake thusly: > On 2020-12-19 16:35 +0100, Thomas Petazzoni spake thusly: [--SNIP--] > > + mkdir ${dest} Using 'mkdir -p' would ensure that parent direcotries are created as needed. > > one_file="$(find "${dest}" -type f |head -n1)" Of course, the output should be sorted in the C locale, to ensure the same file is always used as reference (and hence a better name for that file): local ref_file ref_file="$(find "${dest}" -type f |LC_ALL=C sort |head -n1)" touch -r "${ref_file}" "${dest}.timestamp" [--SNIP--] > > + # let's use a fixed hardcoded date to be reproducible > > + date="2020-02-06 01:02:03 +0000" > Assuming that unpack's dest and repack's src are pointing to the same > directory: > date="$(stat -c "${src}.timestamp")" Meh.. I forgot the actual format: date="$(stat -c '%y' "${src}.timestamp")" Regards, Yann E. MORIN.
Thomas, All, On 2020-12-19 16:35 +0100, Thomas Petazzoni spake thusly: > download post process scripts will often need to unpack the source > code tarball, do some operation, and then repack it. In order to help > with this, post-process-helpers provide an unpack() function and a > repack() function. Sorry to come back yet once more on that one, but I may have found quite an issue. Basically, your code does this: unpack() { tar -C ${dest} --strip-components=1 -xf ${tarball} } repack() { tar cf new.tar --null --verbatim-files-from --numeric-owner --format=gnu \ --owner=0 --group=0 --mtime="${date}" -T files.list.sorted gzip -6 -n <new.tar >new.tar.gz } So, it takes a tarball in whatever copression scheme and extract it, but it will alwyas create a gzip-compressed tarball. So, how would we hanfle a case where the upstream tarball is, say, a .tar.bz2 ? For example; FOO_SOURCE = foo=1.2.3.tar.bz2 FOO_SITE = http://foo.example.net/download FOO_SITE_METOD = wget $(eval $(go-package)) Besides the fact that the packager would see in DL_DIR a .tar.gz tarball different from the .tar.bz2 he coded, the most problematic issue wil be that we would have to list the .tar.gz inthe .hash file. This will be very confusing to packagers and reviewers... Furthermore, there is another discrepancy in the post-processing scripts. For example, the go post-processor does, basically: if tarball already vendored: exit leaving the tarball as-is unpack vendor repack as .tar.gz So some go tarballs will be left as they are upstream, while others will be re-encoded. This is going to be quite confusing... I think the post-processing script should always re-encode the tarball, so that at least we have a consistent behaviour across the board. This is pretty trivial to come up with, at the exepense of a bit of overhead for those tarballs that are already vendored, but the majority of packages will probably be git clones that need to be vendored, so it is a small price to pay. The big problem that's left, is how we handle the discrepancy between the tarball listed in _SOURCE and the one listed in the .hash file... I have to admit I do not have an easy solution for that for now... Especially since with the pending series about changing the way we assemble tarballs from git and svn: the tar format changes, and the compression changes... Regards, Yann E. MORIN.
diff --git a/support/download/post-process-helpers b/support/download/post-process-helpers new file mode 100644 index 0000000000..bed8df2577 --- /dev/null +++ b/support/download/post-process-helpers @@ -0,0 +1,30 @@ + +unpack() { + dest="$1" + tarball="$2" + + mkdir ${dest} + tar -C ${dest} --strip-components=1 -xf ${tarball} +} + +repack() { + src="$1" + tarball="$2" + + # Generate the archive, sort with the C locale so that it is reproducible. + find "$(basename ${src})" -not -type d -print0 >files.list + LC_ALL=C sort -z <files.list >files.list.sorted + + # let's use a fixed hardcoded date to be reproducible + date="2020-02-06 01:02:03 +0000" + + # Create GNU-format tarballs, since that's the format of the tarballs on + # sources.buildroot.org and used in the *.hash files + tar cf new.tar --null --verbatim-files-from --numeric-owner --format=gnu \ + --owner=0 --group=0 --mtime="${date}" -T files.list.sorted + gzip -6 -n <new.tar >new.tar.gz + mv "${tarball}" "${tarball}".old + mv new.tar.gz "${tarball}" + rm "${tarball}".old + rm -rf ${src} +}
download post process scripts will often need to unpack the source code tarball, do some operation, and then repack it. In order to help with this, post-process-helpers provide an unpack() function and a repack() function. Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com> --- support/download/post-process-helpers | 30 +++++++++++++++++++++++++++ 1 file changed, 30 insertions(+) create mode 100644 support/download/post-process-helpers