diff mbox series

support/download/hg: fix broken method

Message ID 20210427194544.15861-1-patrickdepinguin@gmail.com
State Accepted
Headers show
Series support/download/hg: fix broken method | expand

Commit Message

Thomas De Schampheleire April 27, 2021, 7:45 p.m. UTC
From: Thomas De Schampheleire <thomas.de_schampheleire@nokia.com>

Commit 54d3d94b6e3846447b5796ef8587b08b537cd348 broke the 'hg' download
method, in a similar way as it broke the 'git' download method (later fixed
with commit b70ce5665126246bd6b6bf804c6d9eea1fc599cf), by introducing extra
output on stdout in a case where the output is redirected.

In the case of 'hg', the 'hg archive' step uses shell redirection rather
than directly letting hg write the output file, since commit
76b51f90c0e393349dd0c71d7e6cf82fbc094282.

As a result, the extra print added by the _hg function is prepended to the
actual archive, causing an invalid archive.

Fix by using the _plain_hg function instead. The disadvantage is that the
command for 'hg archive' is no longer printed.

Signed-off-by: Thomas De Schampheleire <thomas.de_schampheleire@nokia.com>
---
 support/download/hg | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Yann E. MORIN April 28, 2021, 7:51 p.m. UTC | #1
Thomas, All,

On 2021-04-27 21:45 +0200, Thomas De Schampheleire spake thusly:
> From: Thomas De Schampheleire <thomas.de_schampheleire@nokia.com>
> 
> Commit 54d3d94b6e3846447b5796ef8587b08b537cd348 broke the 'hg' download
> method, in a similar way as it broke the 'git' download method (later fixed
> with commit b70ce5665126246bd6b6bf804c6d9eea1fc599cf), by introducing extra
> output on stdout in a case where the output is redirected.
> 
> In the case of 'hg', the 'hg archive' step uses shell redirection rather
> than directly letting hg write the output file, since commit
> 76b51f90c0e393349dd0c71d7e6cf82fbc094282.
> 
> As a result, the extra print added by the _hg function is prepended to the
> actual archive, causing an invalid archive.
> 
> Fix by using the _plain_hg function instead. The disadvantage is that the
> command for 'hg archive' is no longer printed.
> 
> Signed-off-by: Thomas De Schampheleire <thomas.de_schampheleire@nokia.com>

As discussed on IRC, it would be better if we could keep the verbosity
of the 'archive' step. But doing so would involve playing various
tricks with filedescriptors, and this is a bit too much for the gain.

An alternative would be to change the way the archives are generated, by
using a sequence not unlike what we were using for git, i.e. roughly:

    hg archive [opts] ${outputfile}.tmp
    gzip -c -n -9 <${outputfile}.tmp >${outputfile}

The 'hg archive' part is reproducible, and the generated (uncompressed)
tarball is identical to the previous archive after decompression.

The 'gzip' part itself is reproducible (thanks to the -n option, and to
reading from stdin), but calling gzip manually generates a compressed
archive that differs from the previous one. One difference is that the
timestamp was previously set to the commit date (thus was reproducible),
while 'gzip -n' sets it to 0 (zero, ie. 1970-01-01 00:00:00+0000, also
reproducible). The manual gzip also injects additional data (additional
headers?).

As such, even though the gzip call is reproducible, it generates archives
that are different from the previous ones, which would require us to
introduce the hg-specific archive format:
    BR_FMT_VERSION_hg = -br1

A third option would be to use the mk_tar_gz() helper, as is used for
git and svn.

This is generating reproducible archives (that's the whole point for
that helper to begin with), but those are again different from the
previous ones, which would also require that we introduce
BR_FMT_VERSION_hg.

All in all, we have only 4 packages that are downloaded from Hg:

  - canfestival: does not have a hash, but fails to extract, nad we have
    a few autobuild failures:
        http://autobuild.buildroot.org/results/3fd/3fd118fb87299099e47026161c1d8c948eb9a5c1/build-end.log
        http://autobuild.buildroot.org/results/bf4/bf42580e4c03015004975e03736d6d6a3a87b04d/build-end.log

  - dvb-apps: has a hash, thus fails to download from upstream, and
    falls back to downloading an old archive from s.b.o

  - opentyrian: does not have a hash, but upstream is dead, so falls
    back to downloading an old archive from s.b.o:
        hg clone --noupdate 'https://bitbucket.org/opentyrian/opentyrian' 'opentyrian-9c9f0ec3532b'
        abort: HTTP Error 404: Not Found

  - python-pygame: has a hash *and* upstream is dead, so fails to
    download from upstream, and falls back to downloading an old archive
    from s.b.o.
        hg clone --noupdate 'https://bitbucket.org/pygame/pygame' 'python-pygame-d61ea8eabd56'
        abort: HTTP Error 404: Not Found

So, the situation with Hg is dire. Basically no one is using Hg *and*
cares, but you. But as you said on IRC (quoting):
   "I don't care *enough* about the verbosity of the archive generator"

So...

Applied to master, thanks.

If you ever reconsider, feel free to poke me about the archive format
version stuff. ;-)

Regards,
Yann E. MORIN.

> ---
>  support/download/hg | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/support/download/hg b/support/download/hg
> index a7542e5c84..768a27e06f 100755
> --- a/support/download/hg
> +++ b/support/download/hg
> @@ -45,6 +45,6 @@ _plain_hg() {
>  
>  _hg clone ${quiet} "${@}" --noupdate "'${uri}'" "'${basename}'"
>  
> -_hg archive ${quiet} --repository "'${basename}'" --type tgz \
> +_plain_hg archive ${quiet} --repository "'${basename}'" --type tgz \
>              --prefix "'${basename}'" --rev "'${cset}'" \
>              - >"${output}"
> -- 
> 2.26.3
>
Thomas De Schampheleire May 3, 2021, 6:40 p.m. UTC | #2
Hi Yann,

El mié, 28 abr 2021 a las 21:51, Yann E. MORIN
(<yann.morin.1998@free.fr>) escribió:
>
> Thomas, All,
>
> On 2021-04-27 21:45 +0200, Thomas De Schampheleire spake thusly:
> > From: Thomas De Schampheleire <thomas.de_schampheleire@nokia.com>
> >
> > Commit 54d3d94b6e3846447b5796ef8587b08b537cd348 broke the 'hg' download
> > method, in a similar way as it broke the 'git' download method (later fixed
> > with commit b70ce5665126246bd6b6bf804c6d9eea1fc599cf), by introducing extra
> > output on stdout in a case where the output is redirected.
> >
> > In the case of 'hg', the 'hg archive' step uses shell redirection rather
> > than directly letting hg write the output file, since commit
> > 76b51f90c0e393349dd0c71d7e6cf82fbc094282.
> >
> > As a result, the extra print added by the _hg function is prepended to the
> > actual archive, causing an invalid archive.
> >
> > Fix by using the _plain_hg function instead. The disadvantage is that the
> > command for 'hg archive' is no longer printed.
> >
> > Signed-off-by: Thomas De Schampheleire <thomas.de_schampheleire@nokia.com>
>
> As discussed on IRC, it would be better if we could keep the verbosity
> of the 'archive' step. But doing so would involve playing various
> tricks with filedescriptors, and this is a bit too much for the gain.
>
> An alternative would be to change the way the archives are generated, by
> using a sequence not unlike what we were using for git, i.e. roughly:
>
>     hg archive [opts] ${outputfile}.tmp
>     gzip -c -n -9 <${outputfile}.tmp >${outputfile}
>
> The 'hg archive' part is reproducible, and the generated (uncompressed)
> tarball is identical to the previous archive after decompression.
>
> The 'gzip' part itself is reproducible (thanks to the -n option, and to
> reading from stdin), but calling gzip manually generates a compressed
> archive that differs from the previous one. One difference is that the
> timestamp was previously set to the commit date (thus was reproducible),
> while 'gzip -n' sets it to 0 (zero, ie. 1970-01-01 00:00:00+0000, also
> reproducible). The manual gzip also injects additional data (additional
> headers?).
>
> As such, even though the gzip call is reproducible, it generates archives
> that are different from the previous ones, which would require us to
> introduce the hg-specific archive format:
>     BR_FMT_VERSION_hg = -br1
>
> A third option would be to use the mk_tar_gz() helper, as is used for
> git and svn.
>
> This is generating reproducible archives (that's the whole point for
> that helper to begin with), but those are again different from the
> previous ones, which would also require that we introduce
> BR_FMT_VERSION_hg.

Thanks for this analysis!

>
> All in all, we have only 4 packages that are downloaded from Hg:
>
>   - canfestival: does not have a hash, but fails to extract, nad we have
>     a few autobuild failures:
>         http://autobuild.buildroot.org/results/3fd/3fd118fb87299099e47026161c1d8c948eb9a5c1/build-end.log
>         http://autobuild.buildroot.org/results/bf4/bf42580e4c03015004975e03736d6d6a3a87b04d/build-end.log
>
>   - dvb-apps: has a hash, thus fails to download from upstream, and
>     falls back to downloading an old archive from s.b.o
>
>   - opentyrian: does not have a hash, but upstream is dead, so falls
>     back to downloading an old archive from s.b.o:
>         hg clone --noupdate 'https://bitbucket.org/opentyrian/opentyrian' 'opentyrian-9c9f0ec3532b'
>         abort: HTTP Error 404: Not Found
>
>   - python-pygame: has a hash *and* upstream is dead, so fails to
>     download from upstream, and falls back to downloading an old archive
>     from s.b.o.
>         hg clone --noupdate 'https://bitbucket.org/pygame/pygame' 'python-pygame-d61ea8eabd56'
>         abort: HTTP Error 404: Not Found
>

Note that the cases where upstream still exists, the 'fails to
extract' is exactly the problem this patch is fixing. It fails to
extract because the 'hg archive' step is adding a textual print as
part of the .tar.gz file, which corrupts it.
So this leaves us with:
- canfestival: should work now

- dvb-apps: only problem is that the hash file does not match the
archive generated. It's not clear to me why.
From your analysis, even the original hg download helper should
generate reproducible archives, right? When I try it several times
tonight, I observe following retrieved hash:

ERROR: dvb-apps-3d43b280298c39a67d1d889e01e173f52c12da35.tar.gz has
wrong sha256 hash:
ERROR: expected:
926208b7e711b4bab1a909ff9bf4e6ae54acdd30a46f5d5bd700ecb088fe1f57
ERROR: got     :
71cab442377987425aa05770d6e5724bcbcdd3ea144e54c63883be8fcebf5644

If there is still a problem with reproducibility for the hg helper, we
are back to fixing that, with the -br1 suffix.

- opentyrian: upstream is dead due to Bitbucket no longer supporting
hg. There seems to be a new (git) upstream:
https://github.com/opentyrian/opentyrian . Perhaps **Julien
Boibessot** is interested in fixing this?

- python-pygame: likewise, new upstream:
https://github.com/pygame/pygame . Also something that Julien could
look at?


> So, the situation with Hg is dire. Basically no one is using Hg *and*
> cares, but you. But as you said on IRC (quoting):
>    "I don't care *enough* about the verbosity of the archive generator"
>
> So...
>
> Applied to master, thanks.
>
> If you ever reconsider, feel free to poke me about the archive format
> version stuff. ;-)

Yes, I think we would better handle it at some point, but now is not a
good time for me.

Thanks,
Thomas
Yann E. MORIN May 3, 2021, 7:29 p.m. UTC | #3
Thomas, All,

On 2021-05-03 20:40 +0200, Thomas De Schampheleire spake thusly:
> El mié, 28 abr 2021 a las 21:51, Yann E. MORIN
> (<yann.morin.1998@free.fr>) escribió:
> > On 2021-04-27 21:45 +0200, Thomas De Schampheleire spake thusly:
> > > Commit 54d3d94b6e3846447b5796ef8587b08b537cd348 broke the 'hg' download
[--SNIP--]
> > > Fix by using the _plain_hg function instead. The disadvantage is that the
> > > command for 'hg archive' is no longer printed.
[--SNIP--]
> > All in all, we have only 4 packages that are downloaded from Hg:
> >   - canfestival: does not have a hash, but fails to extract, nad we have
> >     a few autobuild failures:
> >         http://autobuild.buildroot.org/results/3fd/3fd118fb87299099e47026161c1d8c948eb9a5c1/build-end.log
> >         http://autobuild.buildroot.org/results/bf4/bf42580e4c03015004975e03736d6d6a3a87b04d/build-end.log
> >
> >   - dvb-apps: has a hash, thus fails to download from upstream, and
> >     falls back to downloading an old archive from s.b.o
> >
> >   - opentyrian: does not have a hash, but upstream is dead, so falls
> >     back to downloading an old archive from s.b.o:
> >         hg clone --noupdate 'https://bitbucket.org/opentyrian/opentyrian' 'opentyrian-9c9f0ec3532b'
> >         abort: HTTP Error 404: Not Found
> >
> >   - python-pygame: has a hash *and* upstream is dead, so fails to
> >     download from upstream, and falls back to downloading an old archive
> >     from s.b.o.
> >         hg clone --noupdate 'https://bitbucket.org/pygame/pygame' 'python-pygame-d61ea8eabd56'
> >         abort: HTTP Error 404: Not Found
> >
[--SNIP--]
> - dvb-apps: only problem is that the hash file does not match the
> archive generated. It's not clear to me why.
> From your analysis, even the original hg download helper should
> generate reproducible archives, right? When I try it several times
> tonight, I observe following retrieved hash:
> 
> ERROR: dvb-apps-3d43b280298c39a67d1d889e01e173f52c12da35.tar.gz has
> wrong sha256 hash:
> ERROR: expected:
> 926208b7e711b4bab1a909ff9bf4e6ae54acdd30a46f5d5bd700ecb088fe1f57
> ERROR: got     :
> 71cab442377987425aa05770d6e5724bcbcdd3ea144e54c63883be8fcebf5644

Weird, it works here...

    $ hg --version
    Mercurial Distributed SCM (version 5.3.1)

> If there is still a problem with reproducibility for the hg helper, we
> are back to fixing that, with the -br1 suffix.

So, if that's the case, then we should use our mk-tar_gz() helper, and
not rely on the archive generated from hg.

> - opentyrian: upstream is dead due to Bitbucket no longer supporting
> hg. There seems to be a new (git) upstream:
> https://github.com/opentyrian/opentyrian . Perhaps **Julien
> Boibessot** is interested in fixing this?

I've already sent a patch:
    https://patchwork.ozlabs.org/project/buildroot/patch/20210429195324.2463255-1-yann.morin.1998@free.fr/

> - python-pygame: likewise, new upstream:
> https://github.com/pygame/pygame . Also something that Julien could
> look at?

> > If you ever reconsider, feel free to poke me about the archive format
> > version stuff. ;-)
> Yes, I think we would better handle it at some point, but now is not a
> good time for me.

If we can't solve this reproducibility issue, then we'll have no choice
but to leap.

Regards,
Yann E. MORIN.
diff mbox series

Patch

diff --git a/support/download/hg b/support/download/hg
index a7542e5c84..768a27e06f 100755
--- a/support/download/hg
+++ b/support/download/hg
@@ -45,6 +45,6 @@  _plain_hg() {
 
 _hg clone ${quiet} "${@}" --noupdate "'${uri}'" "'${basename}'"
 
-_hg archive ${quiet} --repository "'${basename}'" --type tgz \
+_plain_hg archive ${quiet} --repository "'${basename}'" --type tgz \
             --prefix "'${basename}'" --rev "'${cset}'" \
             - >"${output}"