diff mbox series

[RFC] download/git: ban branch references

Message ID 20190619151817.6331-1-john@metanate.com
State RFC
Headers show
Series [RFC] download/git: ban branch references | expand

Commit Message

John Keeping June 19, 2019, 3:18 p.m. UTC
As described in the manual, using a branch name as a version is not
supported.  However, nothing enforces this so it is easy to specify a
branch name either accidentally or because new developers have not read
through the manual.

For Git it is reasonably easy to catch most violations of this rule and
fail the fetch phase.  This isn't intended to be a comprehensive filter
(it can be bypassed with, for example, FOO_VERSION=origin/master), but
should catch accidental use of a branch version and prompt switching to
an immutable reference.

Signed-off-by: John Keeping <john@metanate.com>
---
 support/download/git | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

Comments

John Keeping June 19, 2019, 3:34 p.m. UTC | #1
On Wed, 19 Jun 2019 16:18:17 +0100
John Keeping <john@metanate.com> wrote:

> As described in the manual, using a branch name as a version is not
> supported.  However, nothing enforces this so it is easy to specify a
> branch name either accidentally or because new developers have not read
> through the manual.
> 
> For Git it is reasonably easy to catch most violations of this rule and
> fail the fetch phase.  This isn't intended to be a comprehensive filter
> (it can be bypassed with, for example, FOO_VERSION=origin/master), but
> should catch accidental use of a branch version and prompt switching to
> an immutable reference.
> 
> Signed-off-by: John Keeping <john@metanate.com>
> ---

Just after sending this, I realised that the patch below doesn't work
for versions specified as a SHA1.

When we have a SHA1 version, then the earlier call to:

	_git fetch origin "'${cset}:${cset}'"

creates a *branch* refs/heads/${cset} for the SHA1.  Git then prints a
warning when passing the SHA1 to rev-parse:

	Git normally never creates a ref that ends with 40 hex characters
	because it will be ignored when you just specify 40-hex. These refs
	may be created by mistake. For example,

	  git checkout -b $br $(git rev-parse ...)

	where "$br" is somehow empty and a 40-hex ref is created. Please
	examine these refs and maybe delete them. Turn this message off by
	running "git config advice.objectNameWarning false"

Maybe we need to skip that fetch if ${cset} matches [0-9a-fA-F]+ or skip
it if ${cset} doesn't contain '/' since I think all of the special refs
we're interested in there will contain at least one branch separator.

>  support/download/git | 19 +++++++++++++++++++
>  1 file changed, 19 insertions(+)
> 
> diff --git a/support/download/git b/support/download/git
> index 075f665bbf..3f26613e61 100755
> --- a/support/download/git
> +++ b/support/download/git
> @@ -134,6 +134,25 @@ if ! _git rev-parse --quiet --verify "'${cset}^{commit}'" >/dev/null 2>&1; then
>      exit 1
>  fi
>  
> +# Check that the specified version is not a branch. We expect a tag or
> +# raw commit hash, and accept some special refs as above. Using a branch
> +# is forbidden because these are mutable references.
> +case "${cset}" in
> +    refs/heads/*)
> +        printf >&2 "Refusing to use Git branch '%s'.\n" "${cset#refs/heads/}"
> +        exit 1
> +        ;;
> +    refs/*)
> +        : pass
> +        ;;
> +    *)
> +        if _git rev-parse --quiet --verify "refs/heads/${cset}" >/dev/null 2>&1; then
> +            printf >&2 "Refusing to use Git branch '%s'.\n" "${cset}"
> +            exit 1
> +        fi
> +        ;;
> +esac
> +
>  # The new cset we want to checkout might have different submodules, or
>  # have sub-dirs converted to/from a submodule. So we would need to
>  # deregister _current_ submodules before we checkout.
Yann E. MORIN June 20, 2019, 4:39 p.m. UTC | #2
John, All,

On 2019-06-19 16:34 +0100, John Keeping spake thusly:
> On Wed, 19 Jun 2019 16:18:17 +0100
> John Keeping <john@metanate.com> wrote:
> 
> > As described in the manual, using a branch name as a version is not
> > supported.  However, nothing enforces this so it is easy to specify a
> > branch name either accidentally or because new developers have not read
> > through the manual.
> > 
> > For Git it is reasonably easy to catch most violations of this rule and
> > fail the fetch phase.  This isn't intended to be a comprehensive filter
> > (it can be bypassed with, for example, FOO_VERSION=origin/master), but
> > should catch accidental use of a branch version and prompt switching to
> > an immutable reference.
> > 
> > Signed-off-by: John Keeping <john@metanate.com>
> > ---
> 
> Just after sending this, I realised that the patch below doesn't work
> for versions specified as a SHA1.

I was going to reply to your original pathc, but since you noticed the
issue on your own :-) here is some additional feedback.

The sha1-as-branch is becuase of the so-called special refs. I already
sent a patch some time ago, but did not get around to respinning it,
which was followed by a patch to drop local branches altogether now:
and finally a patch to detect and abort when the cset was a branch:

    https://git.buildroot.org/~ymorin/git/buildroot/log/?h=yem/dl-git-no-branch-3
    https://git.buildroot.org/~ymorin/git/buildroot/commit/?h=yem/dl-git-no-branch-3&id=6d82e95e6d2de17370734d9253ed11a81bb750f9
    https://git.buildroot.org/~ymorin/git/buildroot/commit/?h=yem/dl-git-no-branch-3&id=5788d8dcb46ccbb595a62568ee47bbce07563dd5
    https://git.buildroot.org/~ymorin/git/buildroot/commit/?h=yem/dl-git-no-branch-3&id=f5a962bcfe481ac3edba0d04ddc06369fa54e8de

So. if you're still interested in the topic, feel fre eto draw
inspiration from the above if you think they may be helpful.

There is some feedback on those somewhere on the list.

> When we have a SHA1 version, then the earlier call to:
> 
> 	_git fetch origin "'${cset}:${cset}'"
> 
> creates a *branch* refs/heads/${cset} for the SHA1.  Git then prints a
> warning when passing the SHA1 to rev-parse:
> 
> 	Git normally never creates a ref that ends with 40 hex characters
> 	because it will be ignored when you just specify 40-hex. These refs
> 	may be created by mistake. For example,
> 
> 	  git checkout -b $br $(git rev-parse ...)
> 
> 	where "$br" is somehow empty and a 40-hex ref is created. Please
> 	examine these refs and maybe delete them. Turn this message off by
> 	running "git config advice.objectNameWarning false"
> 
> Maybe we need to skip that fetch if ${cset} matches [0-9a-fA-F]+ or skip
> it if ${cset} doesn't contain '/' since I think all of the special refs
> we're interested in there will contain at least one branch separator.

I think it is much better to actually drop any local branch: since we do
not want to support branch by name, we don't need any local branch.

Thanks! :-)

Regards,
Yann E. MORIN.

> >  support/download/git | 19 +++++++++++++++++++
> >  1 file changed, 19 insertions(+)
> > 
> > diff --git a/support/download/git b/support/download/git
> > index 075f665bbf..3f26613e61 100755
> > --- a/support/download/git
> > +++ b/support/download/git
> > @@ -134,6 +134,25 @@ if ! _git rev-parse --quiet --verify "'${cset}^{commit}'" >/dev/null 2>&1; then
> >      exit 1
> >  fi
> >  
> > +# Check that the specified version is not a branch. We expect a tag or
> > +# raw commit hash, and accept some special refs as above. Using a branch
> > +# is forbidden because these are mutable references.
> > +case "${cset}" in
> > +    refs/heads/*)
> > +        printf >&2 "Refusing to use Git branch '%s'.\n" "${cset#refs/heads/}"
> > +        exit 1
> > +        ;;
> > +    refs/*)
> > +        : pass
> > +        ;;
> > +    *)
> > +        if _git rev-parse --quiet --verify "refs/heads/${cset}" >/dev/null 2>&1; then
> > +            printf >&2 "Refusing to use Git branch '%s'.\n" "${cset}"
> > +            exit 1
> > +        fi
> > +        ;;
> > +esac
> > +
> >  # The new cset we want to checkout might have different submodules, or
> >  # have sub-dirs converted to/from a submodule. So we would need to
> >  # deregister _current_ submodules before we checkout.
>
Joel Carlson June 20, 2019, 5:27 p.m. UTC | #3
On Wed, Jun 19, 2019 at 9:19 AM John Keeping <john@metanate.com> wrote:
>
> As described in the manual, using a branch name as a version is not
> supported.  However, nothing enforces this so it is easy to specify a
> branch name either accidentally or because new developers have not read
> through the manual.

At a more general discussion level, I have to admit I like having the
ability to use a branch name even if it isn't supported. During
development I find it easier to use a branch name and just delete the
cached version to force it to re-download when I know I have new
updates.  I can switch to a tag when I have a version I actually want
tagged and "final."  Otherwise I'd have to keep tagging changes and
updating what tag to grab in the config or package file to pull in new
changes.

Granted, it isn't a huge inconvenience to switch how I do things, and
I could see it being useful to "idiot-proof" for people who don't
realize the potential problems with using branch names.  But I at
least wanted to make my opinion known that I like it the way it is.

-Joel
John Keeping June 21, 2019, 12:36 p.m. UTC | #4
On Thu, 20 Jun 2019 11:27:02 -0600
Joel Carlson <JoelsonCarl@gmail.com> wrote:

> On Wed, Jun 19, 2019 at 9:19 AM John Keeping <john@metanate.com> wrote:
> >
> > As described in the manual, using a branch name as a version is not
> > supported.  However, nothing enforces this so it is easy to specify a
> > branch name either accidentally or because new developers have not read
> > through the manual.  
> 
> At a more general discussion level, I have to admit I like having the
> ability to use a branch name even if it isn't supported. During
> development I find it easier to use a branch name and just delete the
> cached version to force it to re-download when I know I have new
> updates.  I can switch to a tag when I have a version I actually want
> tagged and "final."  Otherwise I'd have to keep tagging changes and
> updating what tag to grab in the config or package file to pull in new
> changes.
> 
> Granted, it isn't a huge inconvenience to switch how I do things, and
> I could see it being useful to "idiot-proof" for people who don't
> realize the potential problems with using branch names.  But I at
> least wanted to make my opinion known that I like it the way it is.

I'm curious what advantages you see to this method compared to
_OVERRIDE_SRCDIR.  I find pointing to an external version of the source
to be more flexible and generally faster than pulling from Git every
time.


Regards,
John
John Keeping June 21, 2019, 4:36 p.m. UTC | #5
Hi Yann,

On Thu, 20 Jun 2019 18:39:46 +0200
"Yann E. MORIN" <yann.morin.1998@free.fr> wrote:

> On 2019-06-19 16:34 +0100, John Keeping spake thusly:
> > On Wed, 19 Jun 2019 16:18:17 +0100
> > John Keeping <john@metanate.com> wrote:
> >   
> > > As described in the manual, using a branch name as a version is not
> > > supported.  However, nothing enforces this so it is easy to specify a
> > > branch name either accidentally or because new developers have not read
> > > through the manual.
> > > 
> > > For Git it is reasonably easy to catch most violations of this rule and
> > > fail the fetch phase.  This isn't intended to be a comprehensive filter
> > > (it can be bypassed with, for example, FOO_VERSION=origin/master), but
> > > should catch accidental use of a branch version and prompt switching to
> > > an immutable reference.
> > > 
> > > Signed-off-by: John Keeping <john@metanate.com>
> > > ---  
> > 
> > Just after sending this, I realised that the patch below doesn't work
> > for versions specified as a SHA1.  
> 
> I was going to reply to your original pathc, but since you noticed the
> issue on your own :-) here is some additional feedback.
> 
> The sha1-as-branch is becuase of the so-called special refs. I already
> sent a patch some time ago, but did not get around to respinning it,
> which was followed by a patch to drop local branches altogether now:
> and finally a patch to detect and abort when the cset was a branch:
> 
>     https://git.buildroot.org/~ymorin/git/buildroot/log/?h=yem/dl-git-no-branch-3
>     https://git.buildroot.org/~ymorin/git/buildroot/commit/?h=yem/dl-git-no-branch-3&id=6d82e95e6d2de17370734d9253ed11a81bb750f9
>     https://git.buildroot.org/~ymorin/git/buildroot/commit/?h=yem/dl-git-no-branch-3&id=5788d8dcb46ccbb595a62568ee47bbce07563dd5
>     https://git.buildroot.org/~ymorin/git/buildroot/commit/?h=yem/dl-git-no-branch-3&id=f5a962bcfe481ac3edba0d04ddc06369fa54e8de
> 
> So. if you're still interested in the topic, feel fre eto draw
> inspiration from the above if you think they may be helpful.
> 
> There is some feedback on those somewhere on the list.

Thanks for these pointers, I found the related thread [1] and the
discussion there was really helpful.

What I propose is that we change our repository setup to essentially
mirror the remote (we can't clone with "--mirror" as that creates a bare
repo).  This means we change the fetch command to:

    git fetch origin --prune refs/*:refs/*

and remove the explicit fetch of $cset.

Since this fetches everything it allows the version to be the SHA-1 of a
special ref like a pull request or Gerrit changeset, which is a use case
Ricardo mentioned in the thread at [1].

We can then filter out non-tags with:

    case $(git rev-parse --symbolic-full-name "${cset}") in
        refs/tags/*
            : ok
            ;;
        refs/*
            printf >&2 "Refusing to use Git branch '%s'.\n" "${cset}"
            ;;
        # Anything else is not a ref, must be a raw hash which is ok.
    esac

What do you think?


Regards,
John


[1] http://buildroot-busybox.2317881.n4.nabble.com/PATCH-0-3-download-detect-and-refuse-git-branch-by-name-td200050.html
Yann E. MORIN June 22, 2019, 7:47 a.m. UTC | #6
John, All,

On 2019-06-21 17:36 +0100, John Keeping spake thusly:
> We can then filter out non-tags with:
>     case $(git rev-parse --symbolic-full-name "${cset}") in

Need for 2>/dev/null or there is a big warning printed (see below).

>         refs/tags/*
>             : ok
>             ;;
>         refs/*
>             printf >&2 "Refusing to use Git branch '%s'.\n" "${cset}"
>             ;;
>         # Anything else is not a ref, must be a raw hash which is ok.

That's wrong, because branches may pre-exist from git-clones that we do
currently.

For example, I have a local git clone of linux-firmware, which has:

    $ git branch
    * 1baa34868b2c0a004dc595b20678145e3fff83e7
      44d4fca9922a252a0bd81f6307bcc072a78da54a
      d87753369b82c5f362250c197d04a1e1ef5bf698

    $ git rev-parse --symbolic-full-name 1baa34868b2c0a004dc595b20678145e3fff83e7
    warning: refname '1baa34868b2c0a004dc595b20678145e3fff83e7' is ambiguous.
    Git normally never creates a ref that ends with 40 hex characters
    because it will be ignored when you just specify 40-hex. These refs
    may be created by mistake. For example,

      git checkout -b $br $(git rev-parse ...)

    where "$br" is somehow empty and a 40-hex ref is created. Please
    examine these refs and maybe delete them. Turn this message off by
    running "git config advice.objectNameWarning false"
    refs/heads/1baa34868b2c0a004dc595b20678145e3fff83e7

    $ git rev-parse --symbolic-full-name 1baa34868b2c0a004dc595b20678145e3fff83e7 2>/dev/null
    refs/heads/1baa34868b2c0a004dc595b20678145e3fff83e7

So, 1baa34868b2c0a004dc595b20678145e3fff83e7 is the sha1 we curently use
in Buildroot, the two others we used in the past.

So, your code snippet above woud break the build. We have a single
option, really, which is to try and remove the local branch before
checking if the cset is a branch.

So, my position is that the plan stays basically the same:

  - get rid of special refs. As I already explained, we already have a
    mechanism to catter for that use-case: override-srcdir;

  - remove local branches;

  - refuse branches.

Regards,
Yann E. MORIN.

>     esac
> 
> What do you think?
> 
> 
> Regards,
> John
> 
> 
> [1] http://buildroot-busybox.2317881.n4.nabble.com/PATCH-0-3-download-detect-and-refuse-git-branch-by-name-td200050.html
John Keeping June 24, 2019, 11:30 a.m. UTC | #7
Hi Yann,

On Sat, 22 Jun 2019 09:47:13 +0200
"Yann E. MORIN" <yann.morin.1998@free.fr> wrote:

> On 2019-06-21 17:36 +0100, John Keeping spake thusly:
> > We can then filter out non-tags with:
> >     case $(git rev-parse --symbolic-full-name "${cset}") in  
> 
> Need for 2>/dev/null or there is a big warning printed (see below).

Agreed, although see below for that specific case.

> >         refs/tags/*
> >             : ok
> >             ;;
> >         refs/*
> >             printf >&2 "Refusing to use Git branch '%s'.\n" "${cset}"
> >             ;;
> >         # Anything else is not a ref, must be a raw hash which is ok.  
> 
> That's wrong, because branches may pre-exist from git-clones that we do
> currently.
> 
> For example, I have a local git clone of linux-firmware, which has:
> 
>     $ git branch
>     * 1baa34868b2c0a004dc595b20678145e3fff83e7
>       44d4fca9922a252a0bd81f6307bcc072a78da54a
>       d87753369b82c5f362250c197d04a1e1ef5bf698
> 
>     $ git rev-parse --symbolic-full-name 1baa34868b2c0a004dc595b20678145e3fff83e7
>     warning: refname '1baa34868b2c0a004dc595b20678145e3fff83e7' is ambiguous.
>     Git normally never creates a ref that ends with 40 hex characters
>     because it will be ignored when you just specify 40-hex. These refs
>     may be created by mistake. For example,
> 
>       git checkout -b $br $(git rev-parse ...)
> 
>     where "$br" is somehow empty and a 40-hex ref is created. Please
>     examine these refs and maybe delete them. Turn this message off by
>     running "git config advice.objectNameWarning false"
>     refs/heads/1baa34868b2c0a004dc595b20678145e3fff83e7
> 
>     $ git rev-parse --symbolic-full-name 1baa34868b2c0a004dc595b20678145e3fff83e7 2>/dev/null
>     refs/heads/1baa34868b2c0a004dc595b20678145e3fff83e7
> 
> So, 1baa34868b2c0a004dc595b20678145e3fff83e7 is the sha1 we curently use
> in Buildroot, the two others we used in the past.
> 
> So, your code snippet above woud break the build. We have a single
> option, really, which is to try and remove the local branch before
> checking if the cset is a branch.

Note that the suggestion earlier in my message to change the fetch
mechanism avoids this problem by ensuring that these branches do not
exist.  It turns out to be slightly more complicated than just doing:

	git fetch origin --prune +refs/*:refs/*

but that can be made to work and means that we avoid the whole problem
of unexpected refs.

I'll send updated patches in reply to this message.


Regards,
John
diff mbox series

Patch

diff --git a/support/download/git b/support/download/git
index 075f665bbf..3f26613e61 100755
--- a/support/download/git
+++ b/support/download/git
@@ -134,6 +134,25 @@  if ! _git rev-parse --quiet --verify "'${cset}^{commit}'" >/dev/null 2>&1; then
     exit 1
 fi
 
+# Check that the specified version is not a branch. We expect a tag or
+# raw commit hash, and accept some special refs as above. Using a branch
+# is forbidden because these are mutable references.
+case "${cset}" in
+    refs/heads/*)
+        printf >&2 "Refusing to use Git branch '%s'.\n" "${cset#refs/heads/}"
+        exit 1
+        ;;
+    refs/*)
+        : pass
+        ;;
+    *)
+        if _git rev-parse --quiet --verify "refs/heads/${cset}" >/dev/null 2>&1; then
+            printf >&2 "Refusing to use Git branch '%s'.\n" "${cset}"
+            exit 1
+        fi
+        ;;
+esac
+
 # The new cset we want to checkout might have different submodules, or
 # have sub-dirs converted to/from a submodule. So we would need to
 # deregister _current_ submodules before we checkout.