support/download: fix git wrapper with submodules on older git versions

Message ID 20200524114718.21707-1-yann.morin.1998@free.fr
State Accepted
Series support/download: fix git wrapper with submodules on older git versions

Commit Message

Yann E. MORIN May 24, 2020, 11:47 a.m. UTC
Older versions of git store the absolute path of the submodules'
repository as stored in the super-project, e.g.:

    $ cat some-submodule/.git
    gitdir: /path/to/super-project/.git/modules/some-submodule

Obviously, this is not very reproducible.

More recent versions of git, however, store relative paths, which
de-facto makes it reproducible.

Fix older versions by replacing the absolute paths with relative ones.

Signed-off-by: Yann E. MORIN <yann.morin.1998@free.fr>
---
 support/download/git | 13 +++++++++++++
 1 file changed, 13 insertions(+)
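
For illustration, the path conversion the patch relies on can be looked at in
isolation: every component of the submodule path is turned into one "../", so
that the result climbs back up to the top of the super-project. This is only a
sketch, assuming GNU sed; the helper name to_relative_prefix is hypothetical
and not part of support/download/git, and it pipes the path into sed where the
patch uses a here-string.

```bash
#!/usr/bin/env bash

# Hypothetical helper: turn a submodule path (with a trailing slash) into
# the matching number of "../" components. Same sed expression as the patch:
# first collapse repeated slashes, then replace each "component/" with "../".
to_relative_prefix() {
    printf '%s\n' "$1" | sed -r -e 's,/+,/,g; s,[^/]+/,../,g'
}

to_relative_prefix "some-submodule/"    # prints ../
to_relative_prefix "libs//foo/bar/"     # prints ../../../
```

The trailing slash matters: without it, the last path component is not
followed by "/" and would be left untouched by the second substitution.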

Comments

Vincent Fazio May 25, 2020, 1:55 a.m. UTC | #1
Yann,

On 5/24/20 6:47 AM, Yann E. MORIN wrote:
> Older versions of git store the absolute path of the submodules'
> repository as stored in the super-project, e.g.:
>
>      $ cat some-submodule/.git
>      gitdir: /path/to/super-project/.git/modules/some-submodule
>
> Obviously, this is not very reproducible.
>
> More recent versions of git, however, store relative paths, which
> de-facto makes it reproducible.
>
> Fix older versions by replacing the absolute paths with relative ones.
>
> Signed-off-by: Yann E. MORIN <yann.morin.1998@free.fr>
> ---
>   support/download/git | 13 +++++++++++++
>   1 file changed, 13 insertions(+)
>
> diff --git a/support/download/git b/support/download/git
> index 075f665bbf..15d8c66e05 100755
> --- a/support/download/git
> +++ b/support/download/git
> @@ -176,6 +176,19 @@ date="$( _git log -1 --pretty=format:%cD )"
>   # There might be submodules, so fetch them.
>   if [ ${recurse} -eq 1 ]; then
>       _git submodule update --init --recursive
> +
> +    # Older versions of git will store the absolute path of the git tree
> +    # in the .git of submodules, while newer versions just use relative
> +    # paths. Detect and fix the older variants to use relative paths, so
> +    # that the archives are reproducible across a wider range of git
> +    # versions. However, we can't do that if git is too old and uses
> +    # full repositories for submodules.
> +    cmd='printf "%s\n" "${path}/"'
> +    for module_dir in $( _git submodule --quiet foreach "'${cmd}'" ); do
> +        [ -f "${module_dir}/.git" ] || continue
> +        relative_dir="$( sed -r -e 's,/+,/,g; s,[^/]+/,../,g' <<<"${module_dir}" )"
> +        sed -r -i -e "s:^gitdir\: $(pwd)/:gitdir\: "${relative_dir}":" "${module_dir}/.git"
> +    done
>   fi
>   
>   # Generate the archive, sort with the C locale so that it is reproducible.
Should we expand the `find` to ignore files named '.git' so that these 
don't get added to the tarball at all?

find . -not -type d \
        -and -not -path "./.git/*" -and -not -name ".git" >"${output}.list"

Seems like it'd be in-line with our current exclusion of the .git/ 
subfolder because that relative git reference wouldn't be valid after 
the tarball got unpacked anyway.
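
To make the difference concrete, here is a small sketch that runs both
variants of the file listing on a throw-away tree. It assumes the current
listing is the `find` above without the extra `-not -name ".git"` test; the
function names and the sample tree are made up for the example.

```bash
#!/usr/bin/env bash

# Assumed current behaviour: skip directories and the top-level .git/ only.
list_current() {
    find . -not -type d -and -not -path "./.git/*" | LC_ALL=C sort
}

# Proposed behaviour: additionally skip any file named .git.
list_proposed() {
    find . -not -type d -and -not -path "./.git/*" -and -not -name ".git" | LC_ALL=C sort
}

# Hypothetical sample tree: a super-project with one submodule whose .git
# is a plain file (the gitdir indirection).
tree="$(mktemp -d)"
(
    cd "${tree}" || exit 1
    mkdir -p .git some-submodule
    touch .git/config some-submodule/.git some-submodule/file.c
    echo "current:";  list_current
    echo "proposed:"; list_proposed
)
rm -rf "${tree}"
```

With this tree, the current listing contains ./some-submodule/.git and
./some-submodule/file.c, while the proposed one only contains
./some-submodule/file.c, so the two variants produce different archives.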
Yann E. MORIN May 25, 2020, 8:05 p.m. UTC | #2
Vincent, All,

On 2020-05-24 20:55 -0500, Vincent Fazio spake thusly:
> On 5/24/20 6:47 AM, Yann E. MORIN wrote:
>> Older versions of git store the absolute path of the submodules'
>> repository as stored in the super-project, e.g.:
>>
>>     $ cat some-submodule/.git
>>     gitdir: /path/to/super-project/.git/modules/some-submodule
>>
>> Obviously, this is not very reproducible.
>>
>> More recent versions of git, however, store relative paths, which
>> de-facto makes it reproducible.
>>
>> Fix older versions by replacing the absolute paths with relative ones.
>>
>> Signed-off-by: Yann E. MORIN [1]<yann.morin.1998@free.fr>
>> ---
>>  support/download/git | 13 +++++++++++++
>>  1 file changed, 13 insertions(+)
>>
>> diff --git a/support/download/git b/support/download/git
>> index 075f665bbf..15d8c66e05 100755
>> --- a/support/download/git
>> +++ b/support/download/git
>> @@ -176,6 +176,19 @@ date="$( _git log -1 --pretty=format:%cD )"
>>  # There might be submodules, so fetch them.
>>  if [ ${recurse} -eq 1 ]; then
>>      _git submodule update --init --recursive
>> +
>> +    # Older versions of git will store the absolute path of the git tree
>> +    # in the .git of submodules, while newer versions just use relative
>> +    # paths. Detect and fix the older variants to use relative paths, so
>> +    # that the archives are reproducible across a wider range of git
>> +    # versions. However, we can't do that if git is too old and uses
>> +    # full repositories for submodules.
>> +    cmd='printf "%s\n" "${path}/"'
>> +    for module_dir in $( _git submodule --quiet foreach "'${cmd}'" ); do
>> +        [ -f "${module_dir}/.git" ] || continue
>> +        relative_dir="$( sed -r -e 's,/+,/,g; s,[^/]+/,../,g' <<<"${module_dir}" )"
>> +        sed -r -i -e "s:^gitdir\: $(pwd)/:gitdir\: "${relative_dir}":" "${module_dir}/.git"
>> +    done
>>  fi
>>
>>  # Generate the archive, sort with the C locale so that it is reproducible.
> 
> Should we expand the `find` to ignore files named '.git' so that these don't get added to the tarball at all?
> 
> find . -not -type d \
>        -and -not -path "./.git/*" -and -not -name ".git" >"${output}.list"

We do not want to do that, because we want to reproduce the existing
tarballs. And those existing tarballs already contain the .git files.

Note however that, for people who have prehistoric git versions, git
submodules will be entire repositories of their own, i.e. the .git of
submodules is a directory with an actual repository, instead of a plain
file with a gitdir indirection. For those people, tarballs from git
archives are not reproducible but we don't care.

But for the case that concerns us, we don't want to drop the .git files.

Yeah, that might be an oversight from back when we introduced support
for submodules, but it's now too late...

Regards,
Yann E. MORIN.

> Seems like it'd be in-line with our current exclusion of the .git/ subfolder because that relative git reference wouldn't be valid
> after the tarball got unpacked anyway.
> 
> -- Vincent Fazio
> Embedded Software Engineer - Linux
> Extreme Engineering Solutions, Inc
> [2]http://www.xes-inc.com
> 
> Links:
> 1. mailto:yann.morin.1998@free.fr/
> 2. http://www.xes-inc.com/
Vincent Fazio May 25, 2020, 11:24 p.m. UTC | #3
Yann,

On 5/25/20 3:05 PM, Yann E. MORIN wrote:
> Vincent, All,
> 
> On 2020-05-24 20:55 -0500, Vincent Fazio spake thusly:
>> On 5/24/20 6:47 AM, Yann E. MORIN wrote:
>>> Older versions of git store the absolute path of the submodules'
>>> repository as stored in the super-project, e.g.:
>>>
>>>     $ cat some-submodule/.git
>>>     gitdir: /path/to/super-project/.git/modules/some-submodule
>>>
>>> Obviously, this is not very reproducible.
>>>
>>> More recent versions of git, however, store relative paths, which
>>> de-facto makes it reproducible.
>>>
>>> Fix older versions by replacing the absolute paths with relative ones.
>>>
>>> Signed-off-by: Yann E. MORIN [1]<yann.morin.1998@free.fr>
>>> ---
>>>  support/download/git | 13 +++++++++++++
>>>  1 file changed, 13 insertions(+)
>>>
>>> diff --git a/support/download/git b/support/download/git
>>> index 075f665bbf..15d8c66e05 100755
>>> --- a/support/download/git
>>> +++ b/support/download/git
>>> @@ -176,6 +176,19 @@ date="$( _git log -1 --pretty=format:%cD )"
>>>  # There might be submodules, so fetch them.
>>>  if [ ${recurse} -eq 1 ]; then
>>>      _git submodule update --init --recursive
>>> +
>>> +    # Older versions of git will store the absolute path of the git tree
>>> +    # in the .git of submodules, while newer versions just use relative
>>> +    # paths. Detect and fix the older variants to use relative paths, so
>>> +    # that the archives are reproducible across a wider range of git
>>> +    # versions. However, we can't do that if git is too old and uses
>>> +    # full repositories for submodules.
>>> +    cmd='printf "%s\n" "${path}/"'
>>> +    for module_dir in $( _git submodule --quiet foreach "'${cmd}'" ); do
>>> +        [ -f "${module_dir}/.git" ] || continue
>>> +        relative_dir="$( sed -r -e 's,/+,/,g; s,[^/]+/,../,g' <<<"${module_dir}" )"
>>> +        sed -r -i -e "s:^gitdir\: $(pwd)/:gitdir\: "${relative_dir}":" "${module_dir}/.git"
>>> +    done
>>>  fi
>>>
>>>  # Generate the archive, sort with the C locale so that it is reproducible.
>>
>> Should we expand the `find` to ignore files named '.git' so that these don't get added to the tarball at all?
>>
>> find . -not -type d \
>>         -and -not -path "./.git/*" -and -not -name ".git" >"${output}.list"
> 
> We do not want to do that, because we want to reproduce the existing
> tarballs. And those existing tarballs already contain the .git files.
> 

Gotcha. In the case that this is a one-off patch and may be ported, I 
completely agree. I was thinking of this in terms of the PAX patch 
series we've been discussing via IRC.

> Note however that, for people who have prehistoric git versions, git
> submodules will be entire repositories of their own, i.e. the .git of
> submodules is a directory with an actual repository, instead of a plain
> file with a gitdir indirection. For those people, tarballs from git
> archives are not reproducible but we don't care.
> 
> But for the case that concerns us, we don't want to drop the .git files.
> 
> Yeah, that might be an oversight from back when we introduced support
> for submodules, but it's now too late...
> 

However, when we introduce the new PAX tarball, I think we should 
consider dropping all directories and files named '.git'...

Sure it was an oversight before with the GNU tarballs, but we know it's 
a problem and I don't think we need to carry it forward. Or is there a 
detail I'm missing here as well?

> Regards,
> Yann E. MORIN.
> 
>> Seems like it'd be in-line with our current exclusion of the .git/ subfolder because that relative git reference wouldn't be valid
>> after the tarball got unpacked anyway.
>>
>> -- Vincent Fazio
>> Embedded Software Engineer - Linux
>> Extreme Engineering Solutions, Inc
>> [2]http://www.xes-inc.com
>>
>> Links:
>> 1. mailto:yann.morin.1998@free.fr/
>> 2. http://www.xes-inc.com/
>
Thomas Petazzoni May 29, 2020, 9:41 p.m. UTC | #4
On Sun, 24 May 2020 13:47:18 +0200
"Yann E. MORIN" <yann.morin.1998@free.fr> wrote:

> +    # Older versions of git will store the absolute path of the git tree
> +    # in the .git of submodules, while newer versions just use relative
> +    # paths. Detect and fix the older variants to use relative paths, so
> +    # that the archives are reproducible across a wider range of git
> +    # versions. However, we can't do that if git is too old and uses
> +    # full repositories for submodules.

If I understand correctly, there are three "eras":

 - Really old Git versions, where full repositories are used for
   submodules, where we can't do anything.

 - Old Git versions, that stored absolute paths.

 - Recent Git versions, that store relative paths.

Would it be possible to identify which versions we're talking about
here? I'm sure you've done that research, and I think it makes sense to
capture that, as we will certainly wonder what we mean by "older
versions", "old version", "new version".

What is new, old, or older today, will feel quite different 5 years
from now.

Thomas
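
The three eras listed above can at least be told apart by looking at what a
submodule's .git actually is, even without knowing the exact git versions.
The following is a rough sketch only: the function name and the labels it
prints are invented for the example, and it says nothing about which git
release introduced which layout.

```bash
#!/usr/bin/env bash

# Hypothetical helper: classify the .git entry of a submodule directory.
#   directory                     -> full repository (really old git)
#   file with an absolute gitdir  -> old git
#   file with a relative gitdir   -> recent git
classify_submodule_git() {
    local git_entry="$1/.git" target
    if [ -d "${git_entry}" ]; then
        echo "full-repository"
    elif [ -f "${git_entry}" ]; then
        # Extract the path that follows "gitdir: ", if any.
        target="$( sed -n -e 's/^gitdir: //p' "${git_entry}" )"
        case "${target}" in
            "") echo "unknown" ;;
            /*) echo "absolute-gitdir" ;;
            *)  echo "relative-gitdir" ;;
        esac
    else
        echo "no-git-entry"
    fi
}

# Usage, from the top of a checked-out super-project:
#   classify_submodule_git some-submodule
```

Only the "absolute-gitdir" case is the one the patch rewrites; the
"full-repository" case is the one that cannot be fixed.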
Yann E. MORIN May 30, 2020, 9:08 p.m. UTC | #5
Thomas, All,

On 2020-05-29 23:41 +0200, Thomas Petazzoni spake thusly:
> On Sun, 24 May 2020 13:47:18 +0200
> "Yann E. MORIN" <yann.morin.1998@free.fr> wrote:
> > +    # Older versions of git will store the absolute path of the git tree
> > +    # in the .git of submodules, while newer versions just use relative
> > +    # paths. Detect and fix the older variants to use relative paths, so
> > +    # that the archives are reproducible across a wider range of git
> > +    # versions. However, we can't do that if git is too old and uses
> > +    # full repositories for submodules.
> 
> If I understand correctly, there are three "eras":
> 
>  - Really old Git versions, where full repositories are used for
>    submodules, where we can't do anything.
> 
>  - Old Git versions, that stored absolute paths.
> 
>  - Recent Git versions, that store relative paths.

Spot-on.

> Would it be possible to identify which versions we're talking about
> here? I'm sure you've done that research, and I think it makes sense to
> capture that, as we will certainly wonder what we mean by "older
> versions", "old version", "new version".
> 
> What is new, old, or older today, will feel quite different 5 years
> from now.

Sorry, you presumed too much: I haven't dug into the nitty-gritty details of
when git transitioned from one era to another...

I just stumbled on this issue while working on the conversion of the
archives generated from a git tree, which had me scratching my head for
quite some time... I should have noted the conditions back then, true,
but I forgot...

Regards,
Yann E. MORIN.

> Thomas
> -- 
> Thomas Petazzoni, CTO, Bootlin
> Embedded Linux and Kernel engineering
> https://bootlin.com

Patch

diff --git a/support/download/git b/support/download/git
index 075f665bbf..15d8c66e05 100755
--- a/support/download/git
+++ b/support/download/git
@@ -176,6 +176,19 @@  date="$( _git log -1 --pretty=format:%cD )"
 # There might be submodules, so fetch them.
 if [ ${recurse} -eq 1 ]; then
     _git submodule update --init --recursive
+
+    # Older versions of git will store the absolute path of the git tree
+    # in the .git of submodules, while newer versions just use relative
+    # paths. Detect and fix the older variants to use relative paths, so
+    # that the archives are reproducible across a wider range of git
+    # versions. However, we can't do that if git is too old and uses
+    # full repositories for submodules.
+    cmd='printf "%s\n" "${path}/"'
+    for module_dir in $( _git submodule --quiet foreach "'${cmd}'" ); do
+        [ -f "${module_dir}/.git" ] || continue
+        relative_dir="$( sed -r -e 's,/+,/,g; s,[^/]+/,../,g' <<<"${module_dir}" )"
+        sed -r -i -e "s:^gitdir\: $(pwd)/:gitdir\: "${relative_dir}":" "${module_dir}/.git"
+    done
 fi
 
 # Generate the archive, sort with the C locale so that it is reproducible.
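
As a way to try the rewrite outside of Buildroot, the loop body above can be
exercised on a fake submodule in a temporary directory. This is a sketch under
a few assumptions: GNU sed (for -r and -i), a function name
(relativize_gitdir) and a sample layout that are made up for the example, the
path piped into sed instead of passed as a here-string, and ${relative_dir}
kept inside the double quotes. As in the patch, the output of pwd is used as
a regular expression, so unusual characters in the path are not handled.

```bash
#!/usr/bin/env bash

# Hypothetical wrapper around the two sed calls from the patch. Must be
# called from the top of the super-project, with a trailing slash on the
# submodule path.
relativize_gitdir() {
    local module_dir="$1" relative_dir
    # Only plain .git files (gitdir indirection) can be fixed.
    [ -f "${module_dir}/.git" ] || return 0
    relative_dir="$( printf '%s\n' "${module_dir}" \
                     | sed -r -e 's,/+,/,g; s,[^/]+/,../,g' )"
    sed -r -i -e "s:^gitdir\: $(pwd)/:gitdir\: ${relative_dir}:" "${module_dir}/.git"
}

# Demo on a fake super-project with an "old git" style absolute gitdir.
top="$(mktemp -d)"
(
    cd "${top}" || exit 1
    mkdir -p libs/some-submodule
    printf 'gitdir: %s/.git/modules/libs/some-submodule\n' "$(pwd)" \
        >libs/some-submodule/.git
    relativize_gitdir "libs/some-submodule/"
    cat libs/some-submodule/.git
    # prints: gitdir: ../../.git/modules/libs/some-submodule
)
rm -rf "${top}"
```

A .git file that already holds a relative gitdir does not match the
expression and is left unchanged, so running the function on a tree fetched
with a recent git is a no-op.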