<PACKAGE>_SOURCE with <PACKAGE>_SITE_METHOD = git can result in a tar.gz with a mismatched file extension

Message ID 38529777-79a9-dfb5-ec42-b7c37e617281@gmail.com
State Not Applicable

Commit Message

Bagas Sanjaya Dec. 2, 2022, 2:02 p.m. UTC
Hello,

I noticed odd behavior when <PACKAGE>_SOURCE is set whereas <PACKAGE>
is downloaded via git (<PACKAGE>_SITE_METHOD = git).

For example, I'm trying to bump Git package to commit
c000d916380bb59db69c78546928eadd076b9c7d (v2.39.0-rc0). On the makefile
(package/git/git.mk), I bumped by:

---- >8 ----

(note: I fetch from my local Git project repository, hence file:// URI).

When I fetch the sources (make source), the generated tarball is actually
.tar.gz with $(GIT_VERSION) as the filename. This causes extracting the
tarball to fail. In this case, the tarball is decompressed first with
xzcat, and it complains due to unrecognized compressed format (gzip
instead of xz).
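
One way to confirm this kind of mismatch without trusting the file name is to look at the first bytes of the archive. The helper below is only a minimal sketch: `detect_compression` is a hypothetical name and is not part of Buildroot.

```sh
#!/bin/sh
# Hypothetical helper (not part of Buildroot): report the compression of a
# file from its magic bytes, ignoring the file extension.
detect_compression() {
    # Dump the first six bytes as one contiguous lowercase hex string.
    magic=$(od -An -tx1 -N6 "$1" | tr -d ' \n')
    case "$magic" in
        1f8b*)        echo gzip ;;     # gzip streams start with 1f 8b
        fd377a585a00) echo xz ;;       # xz streams start with fd 37 7a 58 5a 00
        *)            echo unknown ;;
    esac
}
```

Called on the tarball produced by "make source", this would be expected to print "gzip" for the case described above, even though the file name ends in .tar.xz.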

Thanks.

Comments

Thomas Petazzoni Dec. 3, 2022, 2:52 p.m. UTC | #1
Hello,

+Yann Morin in Cc.

On Fri, 2 Dec 2022 21:02:06 +0700
Bagas Sanjaya <bagasdotme@gmail.com> wrote:

> Hello,
> 
> I noticed odd behavior when <PACKAGE>_SOURCE is set whereas <PACKAGE>
> is downloaded via git (<PACKAGE>_SITE_METHOD = git).
> 
> For example, I'm trying to bump Git package to commit
> c000d916380bb59db69c78546928eadd076b9c7d (v2.39.0-rc0). On the makefile
> (package/git/git.mk), I bumped by:
> 
> ---- >8 ----  
> diff --git a/package/git/git.mk b/package/git/git.mk
> index dc587170e8..1990bf8e67 100644
> --- a/package/git/git.mk
> +++ b/package/git/git.mk
> @@ -4,9 +4,10 @@
>  #
>  ################################################################################
>  
> -GIT_VERSION = 2.31.4
> +GIT_VERSION = c000d916380bb59db69c78546928eadd076b9c7d
>  GIT_SOURCE = git-$(GIT_VERSION).tar.xz
> -GIT_SITE = $(BR2_KERNEL_MIRROR)/software/scm/git
> +GIT_SITE = file:///home/bagas/repo/git-scm
> +GIT_SITE_METHOD = git
>  GIT_LICENSE = GPL-2.0, LGPL-2.1+
>  GIT_LICENSE_FILES = COPYING LGPL-2.1
>  GIT_CPE_ID_VENDOR = git-scm
> 
> (note: I fetch from my local Git project repository, hence file:// URI).
> 
> When I fetch the sources (make source), the generated tarball is actually
> .tar.gz with $(GIT_VERSION) as the filename. This causes extracting the
> tarball to fail. In this case, the tarball is decompressed first with
> xzcat, and it complains due to unrecognized compressed format (gzip
> instead of xz).

I think you're right. We assume that _SOURCE is not defined when a
_SITE_METHOD using a VCS (git, cvs, svn, hg) is used.

I guess we have two options here:

 (1) Detect this situation, and error out, to prevent this situation
 from happening

 (2) Actually support overriding _SOURCE, but in this case, we should
 comply with the specified compression algorithm

Yann, thoughts? (1) seems easier to me, I don't know if the benefits of
(2) are really relevant.

Thomas

Yann E. MORIN Dec. 3, 2022, 4:05 p.m. UTC | #2
Thomas, Bagas, All,

On 2022-12-03 15:52 +0100, Thomas Petazzoni spake thusly:
> On Fri, 2 Dec 2022 21:02:06 +0700
> Bagas Sanjaya <bagasdotme@gmail.com> wrote:
> > Hello,
> > 
> > I noticed odd behavior when <PACKAGE>_SOURCE is set whereas <PACKAGE>
> > is downloaded via git (<PACKAGE>_SITE_METHOD = git).
> > 
> > For example, I'm trying to bump Git package to commit
> > c000d916380bb59db69c78546928eadd076b9c7d (v2.39.0-rc0). On the makefile
> > (package/git/git.mk), I bumped by:
> > 
> > ---- >8 ----  
> > diff --git a/package/git/git.mk b/package/git/git.mk
> > index dc587170e8..1990bf8e67 100644
> > --- a/package/git/git.mk
> > +++ b/package/git/git.mk
> > @@ -4,9 +4,10 @@
> >  #
> >  ################################################################################
> >  
> > -GIT_VERSION = 2.31.4
> > +GIT_VERSION = c000d916380bb59db69c78546928eadd076b9c7d
> >  GIT_SOURCE = git-$(GIT_VERSION).tar.xz
> > -GIT_SITE = $(BR2_KERNEL_MIRROR)/software/scm/git
> > +GIT_SITE = file:///home/bagas/repo/git-scm
> > +GIT_SITE_METHOD = git
> >  GIT_LICENSE = GPL-2.0, LGPL-2.1+
> >  GIT_LICENSE_FILES = COPYING LGPL-2.1
> >  GIT_CPE_ID_VENDOR = git-scm
> > 
> > (note: I fetch from my local Git project repository, hence file:// URI).
> > 
> > When I fetch the sources (make source), the generated tarball is actually
> > .tar.gz with $(GIT_VERSION) as the filename. This causes extracting the
> > tarball to fail. In this case, the tarball is decompressed first with
> > xzcat, and it complains due to unrecognized compressed format (gzip
> > instead of xz).
> 
> I think you're right. We assume that _SOURCE is not defined when a
> _SITE_METHOD using a VCS (git, cvs, svn, hg) is used.
> 
> I guess we have two options here:
> 
>  (1) Detect this situation, and error out, to prevent this situation
>  from happening
> 
>  (2) Actually support overriding _SOURCE, but in this case, we should
>  comply with the specified compression algorithm
> 
> Yann, thoughts? (1) seems easier to me, I don't know if the benefits of
> (2) are really relevant.

I think the only meaningful solution is to go for (1): detect and abort.
I too do not think letting packages each request a different compression
would be interesting...

If we believe that gzip is too slow/big, then we can think about
switching to another scheme globally, but that's orthogonal...

So, yes: detect and abort.

Regards,
Yann E. MORIN.

Thomas Petazzoni Dec. 3, 2022, 5:23 p.m. UTC | #3
Hello,

On Sat, 3 Dec 2022 17:05:06 +0100
"Yann E. MORIN" <yann.morin.1998@free.fr> wrote:

> > Yann, thoughts? (1) seems easier to me, I don't know if the benefits of
> > (2) are really relevant.  
> 
> I think the only meaningful solution is to go for (1): detect and abort.
> I too do not think letting packages each request a different compression
> would be interesting...
> 
> If we believe that gzip is too slow/big, then we can think about
> switching to another scheme globally, but that's orthogonal...

Actually, (2) would have an advantage: it would allow migrating from
one compression to another package by package, instead of requiring a
flag day where we switch all VCS-fetched packages to the new
compression.

Best regards,

Thomas

Yann E. MORIN Dec. 3, 2022, 6:14 p.m. UTC | #4
Thomas, All,

On 2022-12-03 18:23 +0100, Thomas Petazzoni spake thusly:
> On Sat, 3 Dec 2022 17:05:06 +0100
> "Yann E. MORIN" <yann.morin.1998@free.fr> wrote:
> > > Yann, thoughts? (1) seems easier to me, I don't know if the benefits of
> > > (2) are really relevant.  
> > I think the only meaningful solution is to go for (1): detect and abort.
> > I too do not think letting packages each request a different compression
> > would be interesting...
> > If we believe that gzip is too slow/big, then we can think about
> > switching to another scheme globally, but that's orthogonal...
> Actually, (2) would have an advantage: it would allow migrating from
> one compression to another package by package, instead of requiring a
> flag day where we switch all VCS-fetched packages to the new
> compression.

Sure, but we already need a flag-day when we actually change the way we
generate the archives from VCS checkouts:
    5b95a5dc27c0 support/download: change format of archives generated from git
    c043ecb20ce6 support/download: change format of archives generated from svn
    c92be85e3a29 support/download: make the svn backend more reproducible

Having a flag-day is not too bad, as we can quite easily script the hash
updates...

So, I still believe we should not allow _SOURCE when _SITE_METHOD is one
of our VCS backends.
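
The rule itself is simple enough to state as code. The sketch below is purely illustrative: the real check would presumably belong in Buildroot's make infrastructure rather than in a shell script, and `check_source_override` is a hypothetical name. The list of methods is the one named earlier in this thread (git, svn, cvs, hg).

```sh
#!/bin/sh
# Hypothetical illustration of the "detect and abort" rule. Not Buildroot
# code; the function name and the message wording are made up.

# check_source_override PKG SITE_METHOD SOURCE
# Returns 1, with a message on stderr, if SOURCE is set for a VCS method.
check_source_override() {
    pkg=$1
    method=$2
    src=$3
    case "$method" in
        git|svn|cvs|hg)
            if [ -n "$src" ]; then
                echo "${pkg}_SOURCE must not be set when ${pkg}_SITE_METHOD = $method" >&2
                return 1
            fi
            ;;
    esac
    return 0
}
```

For the git.mk change from the original report, `check_source_override GIT git 'git-$(GIT_VERSION).tar.xz'` would fail, while the same call with an empty third argument, or with a non-VCS method, would pass.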

Regards,
Yann E. MORIN.

Bagas Sanjaya Dec. 4, 2022, 12:31 p.m. UTC | #5
On 12/3/22 23:05, Yann E. MORIN wrote:
>> I think you're right. We assume that _SOURCE is not defined when a
>> _SITE_METHOD using a VCS (git, cvs, svn, hg) is used.
>>
>> I guess we have two options here:
>>
>>  (1) Detect this situation, and error out, to prevent this situation
>>  from happening
>>
>>  (2) Actually support overriding _SOURCE, but in this case, we should
>>  comply with the specified compression algorithm
>>
>> Yann, thoughts? (1) seems easier to me, I don't know if the benefits of
>> (2) are really relevant.
> 
> I think the only meaningful solution is to go for (1): detect and abort.
> I too do not think letting packages each request a different compression
> would be interesting...
> 
> If we believe that gzip is too slow/big, then we can think about
> switching to another scheme globally, but that's orthogonal...
> 
> So, yes: detect and abort.
> 

Actually I think we can just do ``tar xvf`` since tar will automatically
figure out the decompressor to use.

Yann E. MORIN Dec. 4, 2022, 2:05 p.m. UTC | #6
Bagas, All,

On 2022-12-04 19:31 +0700, Bagas Sanjaya spake thusly:
> On 12/3/22 23:05, Yann E. MORIN wrote:
> >> I guess we have two options here:
> >>  (1) Detect this situation, and error out, to prevent this situation
> >>  from happening
> >>  (2) Actually support overriding _SOURCE, but in this case, we should
> >>  comply with the specified compression algorithm
[--SNIP--]
> > So, yes: detect and abort.
> Actually I think we can just do ``tar xvf`` since tar will automatically
> figure out the decompressor to use.

The issue is not so much about decompressing as about compressing.

The download backends for VCS all expect to generate gzip-compressed
tarballs; see: support/download/git@231, support/download/svn@67,
support/download/cvs@70. For mercurial, it's even going deeper, as this
is ingrained in hg itself, and we rely on hg to generate the archive,
see support/download/hg@48..50 (it knows other compression formats, but
that may not align with the one we'd choose for the others).

The reason to request a compression other than gzip is to achieve
one or more of:
  - decreasing the size of the generated archives;
  - decreasing the time needed to compress the archives;
  - decreasing the time needed to decompress the archives.

Aiming for those goals I believe only really makes sense globally, not
on a per-package basis.

If we were to go for a per-package support, we would need to convert all
our download backends anyway, and as I said previously, I don't think it
would be too cumbersome to migrate all the packages in one go at the
same time:
  - no cvs-hosted package;
  - a single svn-hosted package for which we have a hash: libxmlrpc (the
    other four have no hash);
  - two hg-hosted packages: dvb-apps, python-pygmes (the other three
    have no hash);
  - 112 git-hosted packages, so a bit more than 100 with a hash, and
    updating them can be easily scripted (that's what I did when we
    introduced the -br1 suffix for git).
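
As a rough idea of the per-package step such a script would perform, the sketch below prints a hash line for a regenerated archive. It assumes the usual "sha256  <hash>  <filename>" layout of Buildroot .hash files; `hash_line` is a hypothetical helper, and this is not the script that was actually used.

```sh
#!/bin/sh
# Hypothetical helper: print a hash-file line for an archive, assuming the
# "sha256  <hash>  <filename>" layout of Buildroot .hash files.
hash_line() {
    # sha256sum prints "<hash>  <path>"; keep only the hash field.
    sum=$(sha256sum "$1" | cut -d' ' -f1)
    printf 'sha256  %s  %s\n' "$sum" "$(basename "$1")"
}
```

A full migration would loop over the affected packages, regenerate each archive, and replace the matching line in the package's .hash file with the output of this helper.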

Sure, we would also need to add that support to {go,cargo}-post-process,
but we do not have that many impacted packages either. However, for
those, I could well see a reason to have per-package support, which we
set aside when introducing the download post-process: if we have to
download an unvendored archive that is not already a tar.gz, then we'd
have no way to represent that. But we assumed that all the packages that
would require vendoring would come from github or gitlab, that they would
all be .tar.gz already, and that if we were to download a released
archive, it would already be vendored. This is turning out to be the case
in practice: we only use archives from github/gitlab, so the point is
mostly academic.

So I still think we do not need per-package compression support, and
that we should detect, and reject, the case where _SOURCE is set for a
VCS-based download.

Regards,
Yann E. MORIN.

Patch

diff --git a/package/git/git.mk b/package/git/git.mk
index dc587170e8..1990bf8e67 100644
--- a/package/git/git.mk
+++ b/package/git/git.mk
@@ -4,9 +4,10 @@ 
 #
 ################################################################################
 
-GIT_VERSION = 2.31.4
+GIT_VERSION = c000d916380bb59db69c78546928eadd076b9c7d
 GIT_SOURCE = git-$(GIT_VERSION).tar.xz
-GIT_SITE = $(BR2_KERNEL_MIRROR)/software/scm/git
+GIT_SITE = file:///home/bagas/repo/git-scm
+GIT_SITE_METHOD = git
 GIT_LICENSE = GPL-2.0, LGPL-2.1+
 GIT_LICENSE_FILES = COPYING LGPL-2.1
 GIT_CPE_ID_VENDOR = git-scm