
[RFC,3/3] download/git: unshallow when fetching all refs

Message ID 20180412092855.30621-4-ricardo.martincoski@gmail.com
State Rejected
Series fix some corner cases for download/git v1

Commit Message

Ricardo Martincoski April 12, 2018, 9:28 a.m. UTC
With any version of git, after the git cache was once populated using a
shallow fetch, a full fetch succeeds but doesn't download the commits
behind the reference that was fetched with --depth 1. In this case the
download fails like this:
 Fetching all references
 Could not fetch special ref 'sha1'; assuming it is not special.
 fatal: reference is not a tree: sha1

One scenario in buildroot development that would trigger this sequence
is, for example, when a package is bumped on the master branch, some
users create the git cache at that time, and then the bump is reverted.

Another scenario is maintaining a product release that uses one version
of buildroot on the same build farm that also runs a newer version of
buildroot.

Signed-off-by: Ricardo Martincoski <ricardo.martincoski@gmail.com>
Cc: "Yann E. MORIN" <yann.morin.1998@free.fr>
---
WARNING: this patch is at an early stage, really an RFC. I only tested
it over a short period of time.

I tested it by first creating the cache for dt-utils at its current
version, then changing the dt-utils version to the commit immediately
before it (ea1d08e26b73a4855165ed56043fa439ff9948cb). Without this
patch, the fetch of all refs succeeds but does not bring in the desired
commit, and the checkout fails.
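
The failing sequence can be sketched with plain git commands, outside of the
download infra (hypothetical URL, simplified refspecs; not the exact commands
the helper runs):

  # populate the cache with a shallow fetch of the current version
  git init dt-utils && cd dt-utils
  git remote add origin https://example.com/dt-utils.git  # hypothetical URL
  git fetch origin master --depth 1
  # later, the requested version is the commit right behind the shallow boundary
  git fetch origin "+refs/tags/*:refs/tags/*" "+refs/heads/*:refs/remotes/origin/*"
  git checkout ea1d08e26b73a4855165ed56043fa439ff9948cb
  # fails with "fatal: reference is not a tree" because the full fetch,
  # on its own, does not download history behind the shallow boundary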
---
 support/download/git | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

Comments

Thomas Petazzoni April 12, 2018, 11:28 a.m. UTC | #1
Hello,

Thanks a lot Ricardo for all this work on the download issues, much
appreciated. One question on this specific commit.

On Thu, 12 Apr 2018 06:28:55 -0300, Ricardo Martincoski wrote:
> With any version of git, after the git cache was once populated using a
> shallow fetch, a full fetch succeeds but doesn't download the commits
> behind the reference that was fetched with --depth 1. In this case the
> download fails like this:
>  Fetching all references
>  Could not fetch special ref 'sha1'; assuming it is not special.
>  fatal: reference is not a tree: sha1
> 
> One scenario in buildroot development that would trigger this sequence
> is, for example, when a package is bumped on the master branch, some
> users create the git cache at that time, and then the bump is reverted.
> 
> Another scenario is maintaining a product release that uses one version
> of buildroot on the same build farm that also runs a newer version of
> buildroot.

In the light of this, wouldn't it be simpler to stop doing shallow
clones at all ? A shallow clone did make a lot of sense back when we
didn't cache the Git repositories. But now that we are caching the
results, does it make sense to keep the complexity of the code to use
shallow clones ?

Note that this is really an open question, there are possibly some
convincing arguments for us to keep a shallow clone strategy.

Best regards,

Thomas
Yann E. MORIN April 12, 2018, 5:33 p.m. UTC | #2
Thomas, Ricardo, All,

On 2018-04-12 13:28 +0200, Thomas Petazzoni spake thusly:
> Thanks a lot Ricardo for all this work on the download issues, much
> appreciated.

Indeed, thanks!

> One question on this specific commit.
> 
> On Thu, 12 Apr 2018 06:28:55 -0300, Ricardo Martincoski wrote:
> > With any version of git, after the git cache was once populated using a
> > shallow fetch, a full fetch succeeds but doesn't download the commits
> > behind the reference that was fetched with --depth 1. In this case the
> > download fails like this:
> >  Fetching all references
> >  Could not fetch special ref 'sha1'; assuming it is not special.
> >  fatal: reference is not a tree: sha1
> > 
> > One scenario in buildroot development that would trigger this sequence
> > is, for example, when a package is bumped on the master branch, some
> > users create the git cache at that time, and then the bump is reverted.
> > 
> > Another scenario is maintaining a product release that uses one version
> > of buildroot on the same build farm that also runs a newer version of
> > buildroot.
> 
> In the light of this, wouldn't it be simpler to stop doing shallow
> clones at all ? A shallow clone did make a lot of sense back when we
> didn't cache the Git repositories. But now that we are caching the
> results, does it make sense to keep the complexity of the code to use
> shallow clones ?
> 
> Note that this is really an open question, there are possibly some
> convincing arguments for us to keep a shallow clone strategy.

Indeed, I don't think it makes sense anymore...

So, instead of fixing it, let's just do full clones now.

Regards,
Yann E. MORIN.
Ricardo Martincoski April 13, 2018, 6:32 p.m. UTC | #3
Hello,

On Thu, Apr 12, 2018 at 02:33 PM, Yann E. MORIN wrote:

> On 2018-04-12 13:28 +0200, Thomas Petazzoni spake thusly:

>> > One scenario in buildroot development that would trigger this sequence
>> > is, for example, when a package is bumped on the master branch, some
>> > users create the git cache at that time, and then the bump is reverted.
>> > 
>> > Another scenario is maintaining a product release that uses one version
>> > of buildroot on the same build farm that also runs a newer version of
>> > buildroot.
>> 
>> In the light of this, wouldn't it be simpler to stop doing shallow
>> clones at all ? A shallow clone did make a lot of sense back when we
>> didn't cache the Git repositories. But now that we are caching the
>> results, does it make sense to keep the complexity of the code to use
>> shallow clones ?
>> 
>> Note that this is really an open question, there are possibly some
>> convincing arguments for us to keep a shallow clone strategy.
> 
> Indeed, I don't think it makes sense anymore...
> 
> So, instead of fixing it, let's just do full clones now.

I agree it is less needed now than before.

The worst scenario (linux trees) is covered by:
- the use of tarballs from GitHub in the defconfigs, for newcomers who
  sometimes just want to use a defconfig, add a few packages and use the
  image on a new board;
- in the long run, the git cache feature.

But I think people will miss this feature. At least I will.

In the case of someone always using tags from a linux repo, this person
would never need to download the full repo if the shallow fetch remains.

When someone wants just to try a package, for example, a debug tool, which
maybe will never be used again, this person would need to do a full clone,
instead of potentially doing a shallow fetch.

The repo for some packages can be located on slow servers (not necessarily a
package in the tree, it can be on a br2-external).

And the least appealing argument: I was willing to recreate
http://patchwork.ozlabs.org/patch/702006/ on top of the new infra.


Regards,
Ricardo
Yann E. MORIN April 15, 2018, 12:08 p.m. UTC | #4
Ricardo, All,

On 2018-04-13 15:32 -0300, Ricardo Martincoski spake thusly:
> On Thu, Apr 12, 2018 at 02:33 PM, Yann E. MORIN wrote:
> > On 2018-04-12 13:28 +0200, Thomas Petazzoni spake thusly:
> 
> >> > One scenario in buildroot development that would trigger this sequence
> >> > is, for example, when a package is bumped on the master branch, some
> >> > users create the git cache at that time, and then the bump is reverted.
> >> > 
> >> > Another scenario is maintaining a product release that uses one version
> >> > of buildroot on the same build farm that also runs a newer version of
> >> > buildroot.
> >> 
> >> In the light of this, wouldn't it be simpler to stop doing shallow
> >> clones at all ? A shallow clone did make a lot of sense back when we
> >> didn't cache the Git repositories. But now that we are caching the
> >> results, does it make sense to keep the complexity of the code to use
> >> shallow clones ?
> >> 
> >> Note that this is really an open question, there are possibly some
> >> convincing arguments for us to keep a shallow clone strategy.
> > 
> > Indeed, I don't think it makes sense anymore...
> > 
> > So, instead of fixing it, let's just do full clones now.
> 
> I agree it is less needed now than before.

OK.

> The worst scenario (linux trees) is covered by:
> - the use of tarballs from GitHub in the defconfigs, for newcomers who
>   sometimes just want to use a defconfig, add a few packages and use the
>   image on a new board;
> - in the long run, the git cache feature.
> 
> But I think people will miss this feature. At least I will.

Except for the linux git tree and a very few other packages we get from
git, most git trees are reasonably sized, so the overhead of doing a
full clone is not much compared to a shallow clone.

And since it also makes our code much simpler and more maintainable,
that's still a win for me...

> In the case of someone always using tags from a linux repo, this person
> would never need to download the full repo if the shallow fetch remains.
> 
> When someone wants just to try a package, for example, a debug tool, which
> maybe will never be used again, this person would need to do a full clone,
> instead of potentially doing a shallow fetch.
> 
> The repo for some packages can be located on slow servers (not necessarily a
> package in the tree, it can be on a br2-external).
> 
> And the least appealing argument: I was willing to recreate
> http://patchwork.ozlabs.org/patch/702006/ on top of the new infra.

I'm still not convinced any more than I previously was. I.e. I still think
we should go with full clone.

Regards,
Yann E. MORIN.
Ricardo Martincoski April 16, 2018, 3:04 a.m. UTC | #5
Hello,

On Sun, Apr 15, 2018 at 09:08 AM, Yann E. MORIN wrote:

>> The worst scenario (linux trees) is covered by:
>> - the use of tarballs from GitHub in the defconfigs, for newcomers who
>>   sometimes just want to use a defconfig, add a few packages and use the
>>   image on a new board;
>> - in the long run, the git cache feature.
>> 
>> But I think people will miss this feature. At least I will.
> 
> Except for the linux git tree and a very few other packages we get from
> git, most git trees are reasonably sized, so the overhead of doing a
> full clone is not much compared to a shallow clone.
> 
> And since it also makes our code much simpler and more maintainable,

Much simpler code indeed, as I can see in your WIP series.

> that's still a win for me...

Unrelated question:
Should we add something about the git cache to 'Migrating from older Buildroot
versions'?
I am not sure, because it does not really require any action from the user. It
only changes the behavior, for the better in the long run.

An alternative is to reformat that e-mail that Thomas sent with a nice summary
of the feature and add it to 'Download infrastructure'.
If you don't find an easy way to add a user-friendly message for the rare case
where the download infra cannot recover the cache (after your WIP series), the
message
     If you are not sure how to solve this, remove the git cache.
could be added to this section of the manual.

>> In the case of someone always using tags from a linux repo, this person
>> would never need to download the full repo if the shallow fetch remains.
>> 
>> When someone wants just to try a package, for example, a debug tool, which
>> maybe will never be used again, this person would need to do a full clone,
>> instead of potentially doing a shallow fetch.
>> 
>> The repo for some packages can be located on slow servers (not necessarily a
>> package in the tree, it can be on a br2-external).
>> 
>> And the least appealing argument: I was willing to recreate
>> http://patchwork.ozlabs.org/patch/702006/ on top of the new infra.
> 
> I'm still not convinced any more than I previously was. I.e. I still think
> we should go with full clone.

OK. If someone else brings a more convincing argument in the future, we can
always re-add it, covering all its corner cases.
I will mark this patch as Rejected.

For my use case of sha1 from Gerrit, I will pursue different paths...
Maybe tuning the server side options to generate tarballs.
Maybe adding support to register download methods from a br2-external.

Regards,
Ricardo

Patch

diff --git a/support/download/git b/support/download/git
index 68b5d920a1..8216da0a71 100755
--- a/support/download/git
+++ b/support/download/git
@@ -76,7 +76,15 @@  if [ -n "$(_git ls-remote origin "'${cset}'" 2>&1)" ]; then
 fi
 if [ ${git_done} -eq 0 ]; then
     printf "Fetching all references\n"
-    _git fetch origin -u "'+refs/tags/*:refs/tags/*'" "'+refs/heads/*:refs/remotes/origin/*'"
+    # After a shallow fetch was once successful in this repo, the only way to
+    # ensure the commits behind its current depth are available is to unshallow.
+    # Git versions older than 2.15.0 do not have a command to know if a repo is
+    # shallow, so test for the actual file.
+    if [ -f "$(git rev-parse --git-dir)/shallow" ]; then
+        unshallow=--unshallow
+    fi
+    _git fetch origin -u ${unshallow} \
+               "'+refs/tags/*:refs/tags/*'" "'+refs/heads/*:refs/remotes/origin/*'"
 fi
 
 # Try to get the special refs exposed by some forges (pull-requests for
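
Side note on the shallow check in the hunk above: starting with git 2.15.0 the
same test can be expressed with a porcelain command instead of probing for the
.git/shallow file; a minimal sketch, not part of this patch (which stays
compatible with older git versions):

  # git >= 2.15.0 only: prints "true" when the repository is shallow
  if [ "$(git rev-parse --is-shallow-repository)" = "true" ]; then
      unshallow=--unshallow
  fi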