Message ID | 20190205202433.25292-1-patrickdepinguin@gmail.com |
---|---|
State | Changes Requested |
Headers | show |
Series | [TO-BE-TESTED] support/download/hg: implement repository cache | expand |
On 05/02/2019 21:24, Thomas De Schampheleire wrote: > From: Thomas De Schampheleire <thomas.de_schampheleire@nokia.com> > > Similar to the git download helper, implement a repository cache for > Mercurial repositories. > > The code is mostly guided by the implementation for git, with certain parts > copied almost verbatim. > > Signed-off-by: Thomas De Schampheleire <thomas.de_schampheleire@nokia.com> > --- > support/download/hg | 81 +++++++++++++++++++++++++++++++++++++++++++-- > 1 file changed, 78 insertions(+), 3 deletions(-) > > Note: this patch only got limited testing so far and needs to be tested further. > But I already wanted to send it out for feedback and for those of you that want > to help test it. It would be a good idea to do the same kind of copy and pasting on support/testing/tests/download/test_git and its helpers and include that as the first patches in the series. BTW, do you already have an idea of performance improvement? > diff --git a/support/download/hg b/support/download/hg > index efb515fca5..bb5cc87969 100755 > --- a/support/download/hg > +++ b/support/download/hg > @@ -1,7 +1,7 @@ > #!/usr/bin/env bash > > # We want to catch any unexpected failure, and exit immediately > -set -e > +set -E If you do this, the comment above is no longer valid... -e Exit immediately if a command exits with a non-zero status. -E If set, the ERR trap is inherited by shell functions. are not the same thing! However, since AFAICS the only function in this file is the trap handler itself, this seems to be a mistake, no? Ow, this really is inherited from git, introduced by commit b7efb43e86da96. Yann, care to explain? > > # Download helper for hg, to be called from the download wrapper script > # > @@ -10,11 +10,39 @@ set -e > # -o FILE Generate archive in FILE. > # -u URI Clone from repository at URI. > # -c CSET Use changeset (or revision) CSET. > +# -d DLDIR Download directory path. Hm, looks like this is missing in the git helper :-) Yann? > # -n NAME Use basename NAME. > # [snip] > + Do *not* work in that directory; your changes will eventually get > + lost. Do *not* even use it as a remote, or as the source for new This bit is actually rubbish: it's perfectly fine to clone it, as long as you don't push to it. Oh well. > + worktrees; your commits will eventually get lost. > +_EOF_ > + > +if ! [ -d "${hg_cache}/.hg" ]; then Is the presence of a .hg directory a 100% reliable of identifying it as a mercurial repo? > + # While it would be possible to create an empty repo and use pull > + # subsequently, we intentionally use clone the first time as it could be > + # faster than pull thanks to clonebundles, if enabled on the server. > + printf "Cloning repository from '${uri}' into '${hg_cache}'\n" > + _hg clone ${verbose} "${@}" --noupdate "'${uri}'" "'${hg_cache}'" > +else > + printf "Pulling repository from '${uri}' into '${hg_cache}'\n" > + _hg pull ${verbose} "${@}" --repository "'${hg_cache}'" "'${uri}'" > +fi Does hg always do a full mirror on clone/pull? Or doesn't it have the concept of special refs that git has? > + > +# Check that the changeset does exist. If it does not, re-cloning from > +# scratch won't help, so we don't want to trash the repository for a > +# missing commit. We just exit without going through the ERR trap. > +if ! _hg identify --repository "'${hg_cache}'" --rev "'${cset}'" >/dev/null 2>&1; then > + printf "Commit '%s' does not exist in this repository\n." "${cset}" > + exit 1 > +fi > + > +# Make sure that there is no working directory. This will clear any user > +# temptation to work in this directory. Good plan :-) Regards, Arnout > +_hg update --repository "'${hg_cache}'" null >/dev/null 2>&1 > > -_hg archive ${verbose} --repository "'${basename}'" --type tgz \ > +_hg archive ${verbose} --repository "'${hg_cache}'" --type tgz \ > --prefix "'${basename}'" --rev "'${cset}'" \ > - >"${output}" >
Arnout, All, On 2019-02-07 22:33 +0100, Arnout Vandecappelle spake thusly: > On 05/02/2019 21:24, Thomas De Schampheleire wrote: > > From: Thomas De Schampheleire <thomas.de_schampheleire@nokia.com> > > > > Similar to the git download helper, implement a repository cache for > > Mercurial repositories. > > > > The code is mostly guided by the implementation for git, with certain parts > > copied almost verbatim. > > > > Signed-off-by: Thomas De Schampheleire <thomas.de_schampheleire@nokia.com> > > --- > > index efb515fca5..bb5cc87969 100755 > > --- a/support/download/hg > > +++ b/support/download/hg > > @@ -1,7 +1,7 @@ > > #!/usr/bin/env bash > > > > # We want to catch any unexpected failure, and exit immediately > > -set -e > > +set -E > > If you do this, the comment above is no longer valid... > -e Exit immediately if a command exits with a non-zero status. > -E If set, the ERR trap is inherited by shell functions. > are not the same thing! > > However, since AFAICS the only function in this file is the trap handler > itself, this seems to be a mistake, no? > > Ow, this really is inherited from git, introduced by commit b7efb43e86da96. > Yann, care to explain? Yes, I care to explain. Even though I do write some crap most of the time, there are in fact two functions in the git backend. Like all backends, the actual tool is wrapped into a _git() function, which was introduced more than three years ago, by commit 3f2bdd070 (support/download: protect from custom commands with spaces in args), as a way to actually fix a bug actually reported by Thomas DS. Yes, this makes it that the set -E is needed, after all. > > # Download helper for hg, to be called from the download wrapper script > > # > > @@ -10,11 +10,39 @@ set -e > > # -o FILE Generate archive in FILE. > > # -u URI Clone from repository at URI. > > # -c CSET Use changeset (or revision) CSET. > > +# -d DLDIR Download directory path. > Hm, looks like this is missing in the git helper :-) Yann? As I said above: I do write a lot of crap. However, in this case, I'm not the culprit; see commit 6d938bcb (download: git: introduce cache feature). But yeah, the comment-help is not accurate. > > # -n NAME Use basename NAME. > > # > [snip] > > + Do *not* work in that directory; your changes will eventually get > > + lost. Do *not* even use it as a remote, or as the source for new > > This bit is actually rubbish: it's perfectly fine to clone it, as long as you > don't push to it. Oh well. Indeed, it is technically possible to clone from it. However, that means that the 'origin' would point to that, and people would be very easily tempted to push to it. As a consequence, they could loose their work. And this is really a cache for use by Buildroot, and we really want users not to rely on it. Hence, we really suggest they do not use it at all, not even as their 'origin'. [--SNIP--] > > +# Make sure that there is no working directory. This will clear any user > > +# temptation to work in this directory. > Good plan :-) Hmm.. I think someone once said there was not problem leaving cruft in BR2_DL_DIR, because it does not matter what's inside as it is a location private to Buildroot... (not directly in those words, but the message was subtly similar.) Regards, Yann E. MORIN. > Regards, > Arnout > > > +_hg update --repository "'${hg_cache}'" null >/dev/null 2>&1 > > > > -_hg archive ${verbose} --repository "'${basename}'" --type tgz \ > > +_hg archive ${verbose} --repository "'${hg_cache}'" --type tgz \ > > --prefix "'${basename}'" --rev "'${cset}'" \ > > - >"${output}" > >
On 08/02/2019 17:54, Yann E. MORIN wrote: > Arnout, All, > > On 2019-02-07 22:33 +0100, Arnout Vandecappelle spake thusly: >> On 05/02/2019 21:24, Thomas De Schampheleire wrote: >>> From: Thomas De Schampheleire <thomas.de_schampheleire@nokia.com> >>> >>> Similar to the git download helper, implement a repository cache for >>> Mercurial repositories. >>> >>> The code is mostly guided by the implementation for git, with certain parts >>> copied almost verbatim. >>> >>> Signed-off-by: Thomas De Schampheleire <thomas.de_schampheleire@nokia.com> >>> --- >>> index efb515fca5..bb5cc87969 100755 >>> --- a/support/download/hg >>> +++ b/support/download/hg >>> @@ -1,7 +1,7 @@ >>> #!/usr/bin/env bash >>> >>> # We want to catch any unexpected failure, and exit immediately >>> -set -e >>> +set -E >> >> If you do this, the comment above is no longer valid... >> -e Exit immediately if a command exits with a non-zero status. >> -E If set, the ERR trap is inherited by shell functions. >> are not the same thing! >> >> However, since AFAICS the only function in this file is the trap handler >> itself, this seems to be a mistake, no? >> >> Ow, this really is inherited from git, introduced by commit b7efb43e86da96. >> Yann, care to explain? > > Yes, I care to explain. Even though I do write some crap most of the > time, I never said that! > there are in fact two functions in the git backend. Oh yes, I missed the _git/_hg function. > > Like all backends, the actual tool is wrapped into a _git() function, > which was introduced more than three years ago, by commit 3f2bdd070 > (support/download: protect from custom commands with spaces in args), as > a way to actually fix a bug actually reported by Thomas DS. > > Yes, this makes it that the set -E is needed, after all. But then the comment is wrong, no? Because -E does not imply -e and without -e there is no immediate exit. Or am I missing something else? Regards, Arnout [snip]
Arnout, All, On 2019-02-08 20:27 +0100, Arnout Vandecappelle spake thusly: > On 08/02/2019 17:54, Yann E. MORIN wrote: > > On 2019-02-07 22:33 +0100, Arnout Vandecappelle spake thusly: > >> On 05/02/2019 21:24, Thomas De Schampheleire wrote: > >>> From: Thomas De Schampheleire <thomas.de_schampheleire@nokia.com> > >>> > >>> Similar to the git download helper, implement a repository cache for > >>> Mercurial repositories. > >>> > >>> The code is mostly guided by the implementation for git, with certain parts > >>> copied almost verbatim. > >>> > >>> Signed-off-by: Thomas De Schampheleire <thomas.de_schampheleire@nokia.com> > >>> --- > >>> index efb515fca5..bb5cc87969 100755 > >>> --- a/support/download/hg > >>> +++ b/support/download/hg > >>> @@ -1,7 +1,7 @@ > >>> #!/usr/bin/env bash > >>> > >>> # We want to catch any unexpected failure, and exit immediately > >>> -set -e > >>> +set -E > >> Ow, this really is inherited from git, introduced by commit b7efb43e86da96. > >> Yann, care to explain? > > > > Yes, I care to explain. Even though I do write some crap most of the > > time, > I never said that! I said it! :-) [--SNIP--] > > Like all backends, the actual tool is wrapped into a _git() function, > > which was introduced more than three years ago, by commit 3f2bdd070 > > (support/download: protect from custom commands with spaces in args), as > > a way to actually fix a bug actually reported by Thomas DS. > > > > Yes, this makes it that the set -E is needed, after all. > > But then the comment is wrong, no? Because -E does not imply -e and without -e > there is no immediate exit. Or am I missing something else? Yes, it is partly incorrect. But consider that, with the ERR trap, we _do_ catch unexpected failures, but we no longer _immediately_ exit anymore, indeed. We just postpone the exit to after the second tentative, in which case we do exit immediately from the ERR trap. However, I was thinking that maybe we could revisit this try-again code, and just exit on the first error and rely on the user to actually clean the git tree if it ever gets corrupted. We initially added that, in case the repository was uterly broken, because our backend itself was creating situations where it _could_ leave the repo in an inconsistent state, or the autobuilder code would do that (e.g. timeout-kill-9 the build right in the middle of a git action). However, the git backend was fixed to supposedly always leave the tree in a consistent state now, and the autobuilder code now timeouts from the last step, not from the build start, so we can assume that not git action would now be interrupted abruptely anymore... Yet, I am a bit contrived, as the git backend is partly my fault, and so I'm not entirely happy with it... Regards, Yann E. MORIN.
On 08/02/2019 20:54, Yann E. MORIN wrote: > Arnout, All, > > On 2019-02-08 20:27 +0100, Arnout Vandecappelle spake thusly: >> On 08/02/2019 17:54, Yann E. MORIN wrote: >>> On 2019-02-07 22:33 +0100, Arnout Vandecappelle spake thusly: >>>> On 05/02/2019 21:24, Thomas De Schampheleire wrote: >>>>> From: Thomas De Schampheleire <thomas.de_schampheleire@nokia.com> >>>>> >>>>> Similar to the git download helper, implement a repository cache for >>>>> Mercurial repositories. >>>>> >>>>> The code is mostly guided by the implementation for git, with certain parts >>>>> copied almost verbatim. >>>>> >>>>> Signed-off-by: Thomas De Schampheleire <thomas.de_schampheleire@nokia.com> >>>>> --- >>>>> index efb515fca5..bb5cc87969 100755 >>>>> --- a/support/download/hg >>>>> +++ b/support/download/hg >>>>> @@ -1,7 +1,7 @@ >>>>> #!/usr/bin/env bash >>>>> >>>>> # We want to catch any unexpected failure, and exit immediately >>>>> -set -e >>>>> +set -E >>>> Ow, this really is inherited from git, introduced by commit b7efb43e86da96. >>>> Yann, care to explain? >>> >>> Yes, I care to explain. Even though I do write some crap most of the >>> time, >> I never said that! > > I said it! :-) > > [--SNIP--] >>> Like all backends, the actual tool is wrapped into a _git() function, >>> which was introduced more than three years ago, by commit 3f2bdd070 >>> (support/download: protect from custom commands with spaces in args), as >>> a way to actually fix a bug actually reported by Thomas DS. >>> >>> Yes, this makes it that the set -E is needed, after all. >> >> But then the comment is wrong, no? Because -E does not imply -e and without -e >> there is no immediate exit. Or am I missing something else? > > Yes, it is partly incorrect. But consider that, with the ERR trap, we > _do_ catch unexpected failures, It seems I'm unable to interpret documentation correctly, because I read the trap help text as stating that trap ERR only works if -e is set... A SIGNAL_SPEC of ERR means to execute ARG each time a command's failure would cause the shell to exit when the -e option is enabled. > but we no longer _immediately_ exit > anymore, indeed. We just postpone the exit to after the second > tentative, in which case we do exit immediately from the ERR trap. > > However, I was thinking that maybe we could revisit this try-again code, > and just exit on the first error and rely on the user to actually clean > the git tree if it ever gets corrupted. > > We initially added that, in case the repository was uterly broken, > because our backend itself was creating situations where it _could_ > leave the repo in an inconsistent state, or the autobuilder code would > do that (e.g. timeout-kill-9 the build right in the middle of a git > action). Kill -9 of git should never leave the repo in an inconsistent state anyway. The only reasonable way to get an inconsistent repo state is randomly deleting files from it (and disk failure, of course). But doesn't the autobuilder do that, randomly delete files? Regards, Arnout > > However, the git backend was fixed to supposedly always leave the tree > in a consistent state now, and the autobuilder code now timeouts from > the last step, not from the build start, so we can assume that not git > action would now be interrupted abruptely anymore... > > Yet, I am a bit contrived, as the git backend is partly my fault, and so > I'm not entirely happy with it... > > Regards, > Yann E. MORIN. >
Arnout, All, On 2019-02-08 21:54 +0100, Arnout Vandecappelle spake thusly: > On 08/02/2019 20:54, Yann E. MORIN wrote: [--SNIP--] > > Yes, it is partly incorrect. But consider that, with the ERR trap, we > > _do_ catch unexpected failures, > > It seems I'm unable to interpret documentation correctly, because I read the > trap help text as stating that trap ERR only works if -e is set... > > A SIGNAL_SPEC > of ERR means to execute ARG each time a command's failure would cause the > shell to exit when the -e option is enabled. Yes, the sentence is ambiguous. But reading it carefully, they wrote "would cause", i.e.: "do A if B would cause C" does not require "B". And indeed: $ cat ess.sh #!/bin/bash set -E foo() { echo Trying...; false; echo Bummer; } foo trap '{ echo ERR; exit 42; }' ERR foo $ ./ess.sh Trying... Bummer Trying... ERR $ echo $? 42 > > but we no longer _immediately_ exit > > anymore, indeed. We just postpone the exit to after the second > > tentative, in which case we do exit immediately from the ERR trap. > > > > However, I was thinking that maybe we could revisit this try-again code, > > and just exit on the first error and rely on the user to actually clean > > the git tree if it ever gets corrupted. > > > > We initially added that, in case the repository was uterly broken, > > because our backend itself was creating situations where it _could_ > > leave the repo in an inconsistent state, or the autobuilder code would > > do that (e.g. timeout-kill-9 the build right in the middle of a git > > action). > > Kill -9 of git should never leave the repo in an inconsistent state anyway. The Well, not sure what caused it, in the end, but we did have random git failures, and not ncessarily caused by the autobuilders removing files in .git/ (but that was quite some tiem ago now, so I don't recall the details...) > only reasonable way to get an inconsistent repo state is randomly deleting files > from it (and disk failure, of course). But doesn't the autobuilder do that, > randomly delete files? It used to, but no longer does, descend into sub-directopries, so could remove files in BR2_DL_DIR/pkg/git/.git/ https://git.buildroot.org/buildroot-test/tree/scripts/autobuild-run#n291 https://git.buildroot.org/buildroot-test/commit/scripts/autobuild-run?id=cb1c4829b1d1ab660736811101fb6d988a8d14e7 Regards, Yann E. MORIN.
diff --git a/support/download/hg b/support/download/hg index efb515fca5..bb5cc87969 100755 --- a/support/download/hg +++ b/support/download/hg @@ -1,7 +1,7 @@ #!/usr/bin/env bash # We want to catch any unexpected failure, and exit immediately -set -e +set -E # Download helper for hg, to be called from the download wrapper script # @@ -10,11 +10,39 @@ set -e # -o FILE Generate archive in FILE. # -u URI Clone from repository at URI. # -c CSET Use changeset (or revision) CSET. +# -d DLDIR Download directory path. # -n NAME Use basename NAME. # # Environment: # HG : the hg command to call +# Save our path and options in case we need to call ourselves again +myname="${0}" +declare -a OPTS=("${@}") + +# This function is called when an error occurs. Its job is to attempt a clone +# from scratch (only once!) in case the hg tree is borked, or in case an +# unexpected and unsupported situation arises with uncommitted stuff (e.g. if +# the user manually mucked around in the hg cache). +# Note that this function would also be triggered by connection errors during +# the clone/pull. +_on_error() { + local ret=${?} + + printf "Detected a possibly corrupted hg cache.\n" >&2 + if ${BR_HG_BACKEND_FIRST_FAULT:-false}; then + printf "This is the second time in a row; bailing out\n" >&2 + exit ${ret} + fi + export BR_HG_BACKEND_FIRST_FAULT=true + + printf "Removing it and starting afresh.\n" >&2 + + rm -rf "${hg_cache}" + + exec "${myname}" "${OPTS[@]}" || exit ${ret} +} + verbose= while getopts "${BR_BACKEND_DL_GETOPTS}" OPT; do case "${OPT}" in @@ -22,6 +50,7 @@ while getopts "${BR_BACKEND_DL_GETOPTS}" OPT; do o) output="${OPTARG}";; u) uri="${OPTARG}";; c) cset="${OPTARG}";; + d) dl_dir="${OPTARG}";; n) basename="${OPTARG}";; :) printf "option '%s' expects a mandatory argument\n" "${OPTARG}"; exit 1;; \?) printf "unknown option '%s'\n" "${OPTARG}" >&2; exit 1;; @@ -36,8 +65,54 @@ _hg() { eval ${HG} "${@}" } -_hg clone ${verbose} "${@}" --noupdate "'${uri}'" "'${basename}'" +# Create and cd into the directory that will contain the local hg cache +hg_cache="${dl_dir}/hg" +mkdir -p "${hg_cache}" + +# Any error now should try to recover +trap _on_error ERR + +# Create a warning file, that the user should not use the hg cache. +# It's ours. Our precious. +cat <<-_EOF_ >"${dl_dir}/hg.readme" + IMPORTANT NOTE! + + The hg tree located in this directory is for the exclusive use + by Buildroot, which uses it as a local cache to reduce bandwidth + usage. + + Buildroot *will* trash any changes in that tree whenever it needs + to use it. Buildroot may even remove it in case it detects the + repository may have been damaged or corrupted. + + Do *not* work in that directory; your changes will eventually get + lost. Do *not* even use it as a remote, or as the source for new + worktrees; your commits will eventually get lost. +_EOF_ + +if ! [ -d "${hg_cache}/.hg" ]; then + # While it would be possible to create an empty repo and use pull + # subsequently, we intentionally use clone the first time as it could be + # faster than pull thanks to clonebundles, if enabled on the server. + printf "Cloning repository from '${uri}' into '${hg_cache}'\n" + _hg clone ${verbose} "${@}" --noupdate "'${uri}'" "'${hg_cache}'" +else + printf "Pulling repository from '${uri}' into '${hg_cache}'\n" + _hg pull ${verbose} "${@}" --repository "'${hg_cache}'" "'${uri}'" +fi + +# Check that the changeset does exist. If it does not, re-cloning from +# scratch won't help, so we don't want to trash the repository for a +# missing commit. We just exit without going through the ERR trap. +if ! _hg identify --repository "'${hg_cache}'" --rev "'${cset}'" >/dev/null 2>&1; then + printf "Commit '%s' does not exist in this repository\n." "${cset}" + exit 1 +fi + +# Make sure that there is no working directory. This will clear any user +# temptation to work in this directory. +_hg update --repository "'${hg_cache}'" null >/dev/null 2>&1 -_hg archive ${verbose} --repository "'${basename}'" --type tgz \ +_hg archive ${verbose} --repository "'${hg_cache}'" --type tgz \ --prefix "'${basename}'" --rev "'${cset}'" \ - >"${output}"