
[PATCHv5,1/9] support/download: reintroduce 'source-check' target

Message ID: 20190219103839.25409-1-patrickdepinguin@gmail.com
State: Rejected

Commit Message

Thomas De Schampheleire Feb. 19, 2019, 10:38 a.m. UTC
From: Thomas De Schampheleire <thomas.de_schampheleire@nokia.com>

Commit bf28a165d992681a85b43a886005e669b3cb579b removed the source-check
target to allow easier refactoring of the download infrastructure.
The removal was partly motivated by the fact that no one really used
this target.

However, it turns out that there _are_ good uses for it. In a work
environment where many people are submitting changes to a Buildroot
repository, one cannot always blindly trust every change. A typical case
of human error is bumping the version of a package but forgetting to
upload the tarball to the internal BR2_PRIMARY_SITE or (in the case of
a repository) forgetting to push the new changesets.
If a user cannot directly push to the Buildroot repository but needs to
queue their change via an automatic validation system, that validation
system can use 'make source-check' on the relevant defconfigs to protect
against such errors. A full download would also be possible but takes
longer and unnecessarily uses internal network bandwidth.

With that use case in mind, this commit reintroduces the 'source-check'
target, but integrated with the current dl-wrapper-based infrastructure.
The changes to the wrapper are minimal (not counting the additional
indentation). A new option '-C' means 'check only' and is passed to the
download backends. If the backend supports the option, no download
happens. If it does not, the backend actually performs a download as a
means of checking that the source exists (a form of graceful
degradation). In neither case, though, is hash checking performed (the
standard case involves no download and thus no file to compute hashes
on).

Subsequent commits will actually implement -C in the backends.
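
For illustration, a minimal sketch of what honoring -C could look like
in a wget-style backend (hypothetical; the real per-backend changes
come in the subsequent patches of this series):

    # Hypothetical backend excerpt: -C means check-only.
    checkonly=
    while getopts "${BR_BACKEND_DL_GETOPTS}" OPT; do
        case "${OPT}" in
        C)  checkonly=1;;
        o)  output="${OPTARG}";;
        u)  uri="${OPTARG}";;
        esac
    done
    shift $((OPTIND-1))

    if [ -n "${checkonly}" ]; then
        # 'wget --spider' issues a HEAD-style request and downloads
        # nothing; success only means "something" answered there.
        exec wget --spider -q "${uri}"
    fi
    exec wget -q -O "${output}" "${uri}"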

Signed-off-by: Thomas De Schampheleire <thomas.de_schampheleire@nokia.com>
---
 Makefile                    |   7 +++
 package/pkg-download.mk     |   6 +++
 package/pkg-generic.mk      |  14 ++++-
 support/download/dl-wrapper | 100 ++++++++++++++++++++----------------
 4 files changed, 82 insertions(+), 45 deletions(-)

Comments

Yann E. MORIN March 17, 2019, 2:32 p.m. UTC | #1
Thomas, All,

On 2019-02-19 11:38 +0100, Thomas De Schampheleire spake thusly:
> From: Thomas De Schampheleire <thomas.de_schampheleire@nokia.com>
> 
> Commit bf28a165d992681a85b43a886005e669b3cb579b removed the source-check
> target to allow easier refactoring of the download infrastructure.
> It was partly based on the fact that no-one really used this target.
> 
> However, it turns out that there _are_ good uses for it. In a work
> environment where many people are submitting changes to a Buildroot
> repository, one cannot always blindly trust every change. A typical case
> of human error is when bumping the version of a package but forgetting to
> upload the tarball to the internal BR2_PRIMARY_SITE or (in case of a
> repository) pushing the new changesets to the repository.
> If a user cannot directly push to the Buildroot repository but needs to
> queue their change via an automatic validation system, that validation
> system can use 'make source-check' on the relevant defconfigs to protect
> against such errors. A full download would also be possible but takes
> longer and unnecessarily uses internal network bandwidth.
> 
> With that use case in mind, this commit reintroduces the 'source-check'
> target, but embedded in the current situation with a dl-wrapper.  The
> changes to the wrapper are minimal (not considering additional indentation).
> A new option '-C' means 'check only' and will be passed to the download
> backends. If the backend supports the option, no download will happen. If it
> does not, then the backend will actually perform a download as a means of
> checking that the source exists (a form of graceful degradation). In neither
> case, though, hash checking is performed (as the standard case is without
> download thus without file to calculate hashes on).

So, that last part made me wonder: in the wget case, if the HEAD
succeeds, how does that guarantee that the resource actually
corresponds to what you expect?

What if upstream moved the tarball away, but the server now replies
with a garbage 404-that-is-not-a-404 page, so the HEAD succeeds?

Or what if the server does not accept HEAD requests?

Or, in the case of git, what if the tag still exists but has been
re-tagged?

In short, source-check does not guarantee that what exists "there" is
what you expect to get.
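
For illustration (hypothetical commands, not from this series): both
probes below can succeed even when the content is not what the hashes
expect, which is exactly the limitation described above:

    # HEAD request: only proves that "something" answers at that URL
    curl -fsIL -o /dev/null http://example.com/pkg-1.0.tar.gz
    # Tag existence: only proves the name resolves, not what it points to
    git ls-remote --exit-code --tags http://example.com/pkg.git v1.0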

As discussed with Thomas P., it all depends on the semantics we want to
give to source-check. But the semantics implemented by your series do
not seem that interesting in the end. Yes, they maybe make your
remote-worker-with-a-slow-connection setup a bit more bearable.

All it says is that, right at the time of source-check, there is
actually "something" named as you expect, but there is no guarantee
that a later 'make source' will be able to get that "something"
anyway.

So, I am still not entirely convinced by the usefulness of source-check.

Regards,
Yann E. MORIN.

Thomas Petazzoni March 17, 2019, 2:57 p.m. UTC | #2
Hello,

On Sun, 17 Mar 2019 15:32:15 +0100
"Yann E. MORIN" <yann.morin.1998@free.fr> wrote:

> As discussed with Thomas P., it all depends on the semantics we want to
> give to source-check. But the one implemented by your series does not
> seem that interesting in the end. Yes, it makes your remote-worker-with-
> a-slow-connection maybe a bit more bearable.
> 
> It also means that all it says is that, right at the time of source-check,
> here is actually "something" named as you expect, but there is no
> guarantee that a 'make source' later will be able to get that "something"
> anyway.
> 
> So, I am still not entirely convinced by the usefulness of source-check.

Just to add to Yann's explanations, here are the two aspects that make
the proposed semantics of "source-check" a bit weak:

 - It only says "something" was available at the specified download
   location, but not that this something matches the hash we have in
   the package hash file. Hence a successful "make source-check" does
   not guarantee a successful "make source".

 - It only says at the time of "make source-check" that "something" was
   available at the specified download location, but 2 minutes later
   the file could be removed upstream, the tag be dropped, etc. Hence a
   successful "make source-check" does not guarantee a successful "make
   source".

So, a "make source-check" does not provide you much guarantees. It only
tells you that at the time of the source-check, "something" was
available at the specified download location, but you don't know if
that "something" is correct and that it will still be around when
you'll do your build.

If we are willing to accept these very weak semantics of source-check,
then fair enough, but we (and our users) have to be aware of the very
limited guarantees that source-check provides.

What do you think?

Thomas
Arnout Vandecappelle March 27, 2019, 1:46 p.m. UTC | #3
On 17/03/2019 15:57, Thomas Petazzoni wrote:
> Hello,
> 
> On Sun, 17 Mar 2019 15:32:15 +0100
> "Yann E. MORIN" <yann.morin.1998@free.fr> wrote:
> 
>> As discussed with Thomas P., it all depends on the semantics we want to
>> give to source-check. But the one implemented by your series does not
>> seem that interesting in the end. Yes, it makes your remote-worker-with-
>> a-slow-connection maybe a bit more bearable.
>>
>> It also means that all it says is that, right at the time of source-check,
>> here is actually "something" named as you expect, but there is no
>> guarantee that a 'make source' later will be able to get that "something"
>> anyway.
>>
>> So, I am still not entirely convinced by the usefulness of source-check.
> 
> Just to add to Yann's explanations, here are the two aspects that makes
> the proposed semantic of "source-check" a bit weak:
> 
>  - It only says "something" was available at the specified download
>    location, but not that this something matches the hash we have in
>    the package hash file. Hence a successful "make source-check" does
>    not guarantee a successful "make source".
> 
>  - It only says at the time of "make source-check" that "something" was
>    available at the specified download location, but 2 minutes later
>    the file could be removed upstream, the tag be dropped, etc. Hence a
>    successful "make source-check" does not guarantee a successful "make
>    source".
> 
> So, a "make source-check" does not provide you much guarantees. It only
> tells you that at the time of the source-check, "something" was
> available at the specified download location, but you don't know if
> that "something" is correct and that it will still be around when
> you'll do your build.
> 
> If we are willing to accept this very weak semantic of source-check,
> then fair enough, but we (and our users) have to be aware of the very
> limited guarantees that source-check provides.

 For the use case of Thomas DS, these weak semantics are perfectly fine. What
it tries to protect against is the common mistake of adding or bumping a
package and forgetting to upload the source to PRIMARY_SITE. As explained by
Thomas DS, doing a full download is eventually still needed, but takes much
longer: source-check may take on the order of 1 minute while the full download
takes 10 minutes. That makes all the difference for immediate feedback to the
developer, and it doesn't significantly slow down a good pipeline.

 Indeed, it does not protect against uploading the wrong tarball with the
correct name. It also doesn't protect against forgetting to update the hash
file. It also doesn't protect against the package failing to build. It also
doesn't protect against bugs in the code which only become apparent at run time.
But all of these are checked later in CI, and the source check captures an
important (and likely) class of mistakes early on.

 It is true that this is a very narrow use case. However, I think it passes two
tests for acceptance:

* It does not make Buildroot (much) more complex.

* There is no way to do this with scripting outside of Buildroot.

 Regarding that second point: in fact there would be a way, if we instead had
a command to print everything that would be downloaded. That would be very
similar to patch 1/9, but would remove the need for the other patches in the
series. The advantage of this approach is that there could be other uses for
printing the list of sources. But that would be back to the drawing board for
Thomas DS, so I doubt he's enthusiastic about that option :-) Plus, we have no
actual example of an additional use case.

 Regards,
 Arnout
Thomas De Schampheleire March 27, 2019, 2:13 p.m. UTC | #4
On Wed, Mar 27, 2019 at 14:46, Arnout Vandecappelle
(<arnout@mind.be>) wrote:
>
>
>
> On 17/03/2019 15:57, Thomas Petazzoni wrote:
> > Hello,
> >
> > On Sun, 17 Mar 2019 15:32:15 +0100
> > "Yann E. MORIN" <yann.morin.1998@free.fr> wrote:
> >
> >> As discussed with Thomas P., it all depends on the semantics we want to
> >> give to source-check. But the one implemented by your series does not
> >> seem that interesting in the end. Yes, it makes your remote-worker-with-
> >> a-slow-connection maybe a bit more bearable.
> >>
> >> It also means that all it says is that, right at the time of source-check,
> >> here is actually "something" named as you expect, but there is no
> >> guarantee that a 'make source' later will be able to get that "something"
> >> anyway.
> >>
> >> So, I am still not entirely convinced by the usefulness of source-check.
> >
> > Just to add to Yann's explanations, here are the two aspects that makes
> > the proposed semantic of "source-check" a bit weak:
> >
> >  - It only says "something" was available at the specified download
> >    location, but not that this something matches the hash we have in
> >    the package hash file. Hence a successful "make source-check" does
> >    not guarantee a successful "make source".
> >
> >  - It only says at the time of "make source-check" that "something" was
> >    available at the specified download location, but 2 minutes later
> >    the file could be removed upstream, the tag be dropped, etc. Hence a
> >    successful "make source-check" does not guarantee a successful "make
> >    source".
> >
> > So, a "make source-check" does not provide you much guarantees. It only
> > tells you that at the time of the source-check, "something" was
> > available at the specified download location, but you don't know if
> > that "something" is correct and that it will still be around when
> > you'll do your build.
> >
> > If we are willing to accept this very weak semantic of source-check,
> > then fair enough, but we (and our users) have to be aware of the very
> > limited guarantees that source-check provides.
>
>  For the use case of Thomas DS, this weak semantic is perfectly fine. What it
> tries to protect against is the common mistake that you add or bump a package
> and forget to upload the source to PRIMARY_SITE. As explained by Thomas DS,
> doing a full download is eventually still needed, but takes much longer:
> source-check may take in the order of 1 minute while the full download takes 10
> minutes. That makes all the difference for immediate feedback to the developer,
> and it doesn't significantly slow down a good pipeline.
>
>  Indeed, it does not protect against uploading the wrong tarball with the
> correct name. It also doesn't protect against forgetting to update the hash
> file. It also doesn't protect against the package failing to build. It also
> doesn't protect against bugs in the code which become only apparent at run time.
> But all of these are checked later in CI, and the source check captures an
> important (and likely) class of mistakes early on.

Thanks for writing this out. It conveys my thoughts very well.

>
>  It is true that this is a very narrow use case. However, I think it passes two
> tests for acceptance:
>
> * It does not make Buildroot (much) more complex.
>
> * There is no way to do this with scripting outside of Buildroot.
>
>  Regarding that second point: in fact there would be a way, if we would instead
> have a command to print everything that would be downloaded. That would be very
> similar to patch 1/9, but would remove the need for the other patches in the
> series. The advantage of this approach is that there could be other uses for
> printing the list of sources. But that would be back to the drawing board for
> Thomas DS, so I doubt he's enthusiastic about that option :-) Plus, we have no
> actual example of an additional use case.

I'm not against exploring this, if you think it is better.

The question is what exactly would be printed, e.g. all possible URLs,
the main URL, or only the one via PRIMARY_SITE (which is in fact two
possible ones, either with or without the package directory).
And how would the 'site method' be described?

For my own use case, everything is expected on the primary site via
scp, except our own Mercurial repositories (which are checked via 'hg'
or with an optimization via a special scp URL).
I would thus need a list of tarball names to check via scp, and a list
of repositories+revisions.

One could envision an output format with one package per line, each
line listing the different possible URLs where the package could be
obtained, space-separated.
The external script could then have logic that checks that at least
one URL on each line can be accessed.
the dl-wrapper receives, i.e. with prefixes like git|git+ssh:// that
identify the method.

In the general case, the external script would need to duplicate some
logic that Buildroot already handles, i.e. how to process git, wget,
etc. URLs (see the sketch below). If I'm the only user of this
feature, that's fine with me.
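
For illustration, a rough sketch of what such an external checker
could look like, under the assumptions above (the line format, the
scheme prefixes, and the probing commands are all hypothetical):

    #!/bin/sh
    # Hypothetical checker for the format sketched above: one package
    # per line, candidate URLs space-separated, the method encoded in
    # the URL scheme. A line passes if at least one URL is reachable.
    while read -r urls; do
        ok=0
        for url in ${urls}; do
            case "${url}" in
            http://*|https://*)
                # HEAD request: proves reachability, not content
                curl -fsIL -o /dev/null "${url}" && ok=1 && break;;
            scp://*)
                # scp://user@host/path -> probe the file over ssh
                rest="${url#scp://}"
                host="${rest%%/*}"
                ssh "${host}" test -e "/${rest#*/}" && ok=1 && break;;
            git*|hg*)
                # VCS methods would need per-backend probing, e.g.
                # 'git ls-remote --exit-code' or 'hg identify -r REV';
                # omitted in this sketch.
                ;;
            esac
        done
        [ "${ok}" -eq 1 ] || { echo "unreachable: ${urls}" >&2; exit 1; }
    done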

Best regards,
Thomas


>
>  Regards,
>  Arnout
>
Thomas Petazzoni March 27, 2019, 4:35 p.m. UTC | #5
Hello,

Thanks a lot for sharing your feedback in this discussion, it's very
useful.

On Wed, 27 Mar 2019 14:46:32 +0100
Arnout Vandecappelle <arnout@mind.be> wrote:

>  For the use case of Thomas DS, this weak semantic is perfectly fine. What it
> tries to protect against is the common mistake that you add or bump a package
> and forget to upload the source to PRIMARY_SITE. As explained by Thomas DS,
> doing a full download is eventually still needed, but takes much longer:
> source-check may take in the order of 1 minute while the full download takes 10
> minutes. That makes all the difference for immediate feedback to the developer,
> and it doesn't significantly slow down a good pipeline.
> 
>  Indeed, it does not protect against uploading the wrong tarball with the
> correct name. It also doesn't protect against forgetting to update the hash
> file. It also doesn't protect against the package failing to build. It also
> doesn't protect against bugs in the code which become only apparent at run time.
> But all of these are checked later in CI, and the source check captures an
> important (and likely) class of mistakes early on.

Interesting perspective indeed :-)

>  It is true that this is a very narrow use case. However, I think it passes two
> tests for acceptance:
> 
> * It does not make Buildroot (much) more complex.
> 
> * There is no way to do this with scripting outside of Buildroot.

True.

>  Regarding that second point: in fact there would be a way, if we would instead
> have a command to print everything that would be downloaded. That would be very
> similar to patch 1/9, but would remove the need for the other patches in the
> series. The advantage of this approach is that there could be other uses for
> printing the list of sources. But that would be back to the drawing board for
> Thomas DS, so I doubt he's enthusiastic about that option :-) Plus, we have no
> actual example of an additional use case.

We already have "make external-deps" but it only prints the file names,
not the full URL.

Does Thomas really want to check the BR2_PRIMARY_SITE or the original
Mercurial repository? If he wants to check BR2_PRIMARY_SITE (which
contains only tarballs), then he could run "make external-deps" and
check that the tarballs are there. But I suppose that's not what Thomas
wants: he really wants to check the upstream Mercurial repository, no?

Best regards,

Thomas
Arnout Vandecappelle March 27, 2019, 5:25 p.m. UTC | #6
On 27/03/2019 17:35, Thomas Petazzoni wrote:
> Hello,
> 
> Thanks a lot for sharing your feedback in this discussion, it's very
> useful.
> 
> On Wed, 27 Mar 2019 14:46:32 +0100
> Arnout Vandecappelle <arnout@mind.be> wrote:
> 
>>  For the use case of Thomas DS, this weak semantic is perfectly fine. What it
>> tries to protect against is the common mistake that you add or bump a package
>> and forget to upload the source to PRIMARY_SITE. As explained by Thomas DS,
>> doing a full download is eventually still needed, but takes much longer:
>> source-check may take in the order of 1 minute while the full download takes 10
>> minutes. That makes all the difference for immediate feedback to the developer,
>> and it doesn't significantly slow down a good pipeline.
>>
>>  Indeed, it does not protect against uploading the wrong tarball with the
>> correct name. It also doesn't protect against forgetting to update the hash
>> file. It also doesn't protect against the package failing to build. It also
>> doesn't protect against bugs in the code which become only apparent at run time.
>> But all of these are checked later in CI, and the source check captures an
>> important (and likely) class of mistakes early on.
> 
> Interesting perspective indeed :-)
> 
>>  It is true that this is a very narrow use case. However, I think it passes two
>> tests for acceptance:
>>
>> * It does not make Buildroot (much) more complex.
>>
>> * There is no way to do this with scripting outside of Buildroot.
> 
> True.
> 
>>  Regarding that second point: in fact there would be a way, if we would instead
>> have a command to print everything that would be downloaded. That would be very
>> similar to patch 1/9, but would remove the need for the other patches in the
>> series. The advantage of this approach is that there could be other uses for
>> printing the list of sources. But that would be back to the drawing board for
>> Thomas DS, so I doubt he's enthusiastic about that option :-) Plus, we have no
>> actual example of an additional use case.
> 
> We already have "make external-deps" but it only prints the file names,
> not the full URL.

 True! So, we could extend external-deps to print the URL (perhaps similar to
how printvars uses RAW_VARS, so as not to break existing scripts using
external-deps). We could have variables to select primary, upstream, or
secondary.

 Regards,
 Arnout


Arnout Vandecappelle April 13, 2019, 3:24 p.m. UTC | #7
Hi Thomas DS,

On 27/03/2019 18:25, Arnout Vandecappelle wrote:
> 
> 
> On 27/03/2019 17:35, Thomas Petazzoni wrote:
[snip]
>> We already have "make external-deps" but it only prints the file names,
>> not the full URL.
> 
>  True! So, we could extend external-deps with printing the URL (similar to how
> printvars uses RAW_VARS maybe, to not break existing scripts using
> external-deps). We could have variables to ask for primary, upstream or secondary.

 With Yann's show-info series on its way in, can we consider this series as
Rejected?

 Thanks for the patches anyway, it was the push needed to get Yann to implement
the Right Thing (tm) :-)

 Regards,
 Arnout
Thomas De Schampheleire April 13, 2019, 3:51 p.m. UTC | #8
Hi Arnout,

On Sat, Apr 13, 2019, 17:24 Arnout Vandecappelle <arnout@mind.be> wrote:

>  Hi Thomas DS,
>
> On 27/03/2019 18:25, Arnout Vandecappelle wrote:
> >
> >
> > On 27/03/2019 17:35, Thomas Petazzoni wrote:
> [snip]
> >> We already have "make external-deps" but it only prints the file names,
> >> not the full URL.
> >
> >  True! So, we could extend external-deps with printing the URL (similar
> to how
> > printvars uses RAW_VARS maybe, to not break existing scripts using
> > external-deps). We could have variables to ask for primary, upstream or
> secondary.
>
>  With Yann's show-info series on its way in, can we consider this series as
> Rejected?
>
>  Thanks for the patches anyway, it was the push needed to get Yann to
> implement
> the Right Thing (tm) :-)
>

Yes, I agree to drop these patches and use show-info instead.

Thanks all for the discussion and alternative solution.
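
For reference, the show-info route could feed an external checker along
these lines (a rough sketch; the exact JSON layout of 'make show-info'
may vary across Buildroot versions, and the 'downloads'/'uris' fields
are assumed here):

    # Dump per-package info as JSON, then extract every candidate
    # download URI for external checking.
    make --no-print-directory show-info > info.json
    jq -r 'to_entries[] | .value.downloads[]?.uris[]?' info.json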

Best regards
Thomas

Patch

diff --git a/Makefile b/Makefile
index f736ecfb3e..de1ffa9d0e 100644
--- a/Makefile
+++ b/Makefile
@@ -131,6 +131,7 @@  noconfig_targets := menuconfig nconfig gconfig xconfig config oldconfig randconf
 # (default target is to build), or when MAKECMDGOALS contains
 # something else than one of the nobuild_targets.
 nobuild_targets := source %-source \
+	source-check %-source-check %-all-source-check \
 	legal-info %-legal-info external-deps _external-deps \
 	clean distclean help show-targets graph-depends \
 	%-graph-depends %-show-depends %-show-version \
@@ -818,6 +819,9 @@  target-post-image: $(TARGETS_ROOTFS) target-finalize staging-finalize
 .PHONY: source
 source: $(foreach p,$(PACKAGES),$(p)-all-source)
 
+.PHONY: source-check
+source-check: $(foreach p,$(PACKAGES),$(p)-all-source-check)
+
 .PHONY: _external-deps external-deps
 _external-deps: $(foreach p,$(PACKAGES),$(p)-all-external-deps)
 external-deps:
@@ -1103,6 +1107,8 @@  help:
 	@echo '  <pkg>-dirclean         - Remove <pkg> build directory'
 	@echo '  <pkg>-reconfigure      - Restart the build from the configure step'
 	@echo '  <pkg>-rebuild          - Restart the build from the build step'
+	@echo '  <pkg>-source-check     - Check package for valid download URLs'
+	@echo '  <pkg>-all-source-check - Check package and its dependencies for valid download URLs'
 	$(foreach p,$(HELP_PACKAGES), \
 		@echo $(sep) \
 		@echo '$($(p)_NAME):' $(sep) \
@@ -1122,6 +1128,7 @@  help:
 	@echo
 	@echo 'Miscellaneous:'
 	@echo '  source                 - download all sources needed for offline-build'
+	@echo '  source-check           - check selected packages for valid download URLs'
 	@echo '  external-deps          - list external packages used'
 	@echo '  legal-info             - generate info about license compliance'
 	@echo '  printvars              - dump all the internal variables'
diff --git a/package/pkg-download.mk b/package/pkg-download.mk
index 7cd87c38ff..5384fc2af4 100644
--- a/package/pkg-download.mk
+++ b/package/pkg-download.mk
@@ -70,6 +70,7 @@  export BR_NO_CHECK_HASH_FOR =
 # 3) BR2_BACKUP_SITE if enabled, unless BR2_PRIMARY_SITE_ONLY is set
 #
 # Argument 1 is the source location
+# Argument 2 (optionally) provides extra arguments to pass to DL_WRAPPER
 #
 ################################################################################
 
@@ -92,6 +93,7 @@  endif
 define DOWNLOAD
 	$(Q)mkdir -p $($(PKG)_DL_DIR)
 	$(Q)$(EXTRA_ENV) $(FLOCK) $(DL_WRAPPER) \
+		$(2) \
 		-c '$($(PKG)_DL_VERSION)' \
 		-d '$($(PKG)_DL_DIR)' \
 		-D '$(DL_DIR)' \
@@ -106,3 +108,7 @@  define DOWNLOAD
 		-- \
 		$($(PKG)_DL_OPTS)
 endef
+
+define SOURCE_CHECK
+	$(call DOWNLOAD,$(1),-C)
+endef
diff --git a/package/pkg-generic.mk b/package/pkg-generic.mk
index 6168b40e89..bc57956867 100644
--- a/package/pkg-generic.mk
+++ b/package/pkg-generic.mk
@@ -788,6 +788,10 @@  $(1)-legal-source:	$$($(2)_TARGET_ACTUAL_SOURCE)
 endif # actual sources != sources
 endif # actual sources != ""
 
+$(1)-source-check: PKG=$(2)
+$(1)-source-check:
+	$$(foreach p,$$($(2)_ALL_DOWNLOADS),$$(call SOURCE_CHECK,$$(p))$$(sep))
+
 $(1)-external-deps:
 	@for p in $$($(2)_SOURCE) $$($(2)_PATCH) $$($(2)_EXTRA_DOWNLOADS) ; do \
 		echo `basename $$$$p` ; \
@@ -812,6 +816,9 @@  $(1)-rsync:		$$($(2)_TARGET_RSYNC)
 $(1)-source:
 $(1)-legal-source:
 
+$(1)-source-check:
+	test -d $$($(2)_OVERRIDE_SRCDIR)
+
 $(1)-external-deps:
 	@echo "file://$$($(2)_OVERRIDE_SRCDIR)"
 endif
@@ -846,6 +853,9 @@  $(1)-graph-rdepends: graph-depends-requirements
 $(1)-all-source:	$(1)-source
 $(1)-all-source:	$$(foreach p,$$($(2)_FINAL_ALL_DEPENDENCIES),$$(p)-all-source)
 
+$(1)-all-source-check:	$(1)-source-check
+$(1)-all-source-check:	$$(foreach p,$$($(2)_FINAL_ALL_DEPENDENCIES),$$(p)-all-source-check)
+
 $(1)-all-external-deps:	$(1)-external-deps
 $(1)-all-external-deps:	$$(foreach p,$$($(2)_FINAL_ALL_DEPENDENCIES),$$(p)-all-external-deps)
 
@@ -1044,6 +1054,7 @@  DL_TOOLS_DEPENDENCIES += $$(call extractor-dependency,$$($(2)_SOURCE))
 	$(1)-all-external-deps \
 	$(1)-all-legal-info \
 	$(1)-all-source \
+	$(1)-all-source-check \
 	$(1)-build \
 	$(1)-clean-for-rebuild \
 	$(1)-clean-for-reconfigure \
@@ -1068,7 +1079,8 @@  DL_TOOLS_DEPENDENCIES += $$(call extractor-dependency,$$($(2)_SOURCE))
 	$(1)-rsync \
 	$(1)-show-depends \
 	$(1)-show-version \
-	$(1)-source
+	$(1)-source \
+	$(1)-source-check
 
 ifneq ($$($(2)_SOURCE),)
 ifeq ($$($(2)_SITE),)
diff --git a/support/download/dl-wrapper b/support/download/dl-wrapper
index 3315bd410e..920699e51d 100755
--- a/support/download/dl-wrapper
+++ b/support/download/dl-wrapper
@@ -17,7 +17,7 @@ 
 # We want to catch any unexpected failure, and exit immediately.
 set -e
 
-export BR_BACKEND_DL_GETOPTS=":hc:d:o:n:N:H:ru:qf:e"
+export BR_BACKEND_DL_GETOPTS=":hc:Cd:o:n:N:H:ru:qf:e"
 
 main() {
     local OPT OPTARG
@@ -25,9 +25,10 @@  main() {
     local -a uris
 
     # Parse our options; anything after '--' is for the backend
-    while getopts ":c:d:D:o:n:N:H:rf:u:q" OPT; do
+    while getopts ":c:Cd:D:o:n:N:H:rf:u:q" OPT; do
         case "${OPT}" in
         c)  cset="${OPTARG}";;
+        C)  checkonly=-C;;
         d)  dl_dir="${OPTARG}";;
         D)  old_dl_dir="${OPTARG}";;
         o)  output="${OPTARG}";;
@@ -46,38 +47,40 @@  main() {
     # Forget our options, and keep only those for the backend
     shift $((OPTIND-1))
 
-    if [ -z "${output}" ]; then
-        error "no output specified, use -o\n"
-    fi
+    if [ -z "${checkonly}" ]; then
+        if [ -z "${output}" ]; then
+            error "no output specified, use -o\n"
+        fi
 
-    # Legacy handling: check if the file already exists in the global
-    # download directory. If it does, hard-link it. If it turns out it
-    # was an incorrect download, we'd still check it below anyway.
-    # If we can neither link nor copy, fallback to doing a download.
-    # NOTE! This is not atomic, is subject to TOCTTOU, but the whole
-    # dl-wrapper runs under an flock, so we're safe.
-    if [ ! -e "${output}" -a -e "${old_dl_dir}/${filename}" ]; then
-        ln "${old_dl_dir}/${filename}" "${output}" || \
-        cp "${old_dl_dir}/${filename}" "${output}" || \
-        true
-    fi
+        # Legacy handling: check if the file already exists in the global
+        # download directory. If it does, hard-link it. If it turns out it
+        # was an incorrect download, we'd still check it below anyway.
+        # If we can neither link nor copy, fallback to doing a download.
+        # NOTE! This is not atomic, is subject to TOCTTOU, but the whole
+        # dl-wrapper runs under an flock, so we're safe.
+        if [ ! -e "${output}" -a -e "${old_dl_dir}/${filename}" ]; then
+            ln "${old_dl_dir}/${filename}" "${output}" || \
+            cp "${old_dl_dir}/${filename}" "${output}" || \
+            true
+        fi
 
-    # If the output file already exists and:
-    # - there's no .hash file: do not download it again and exit promptly
-    # - matches all its hashes: do not download it again and exit promptly
-    # - fails at least one of its hashes: force a re-download
-    # - there's no hash (but a .hash file): consider it a hard error
-    if [ -e "${output}" ]; then
-        if support/download/check-hash ${quiet} "${hfile}" "${output}" "${output##*/}"; then
-            exit 0
-        elif [ ${?} -ne 2 ]; then
-            # Do not remove the file, otherwise it might get re-downloaded
-            # from a later location (i.e. primary -> upstream -> mirror).
-            # Do not print a message, check-hash already did.
-            exit 1
+        # If the output file already exists and:
+        # - there's no .hash file: do not download it again and exit promptly
+        # - matches all its hashes: do not download it again and exit promptly
+        # - fails at least one of its hashes: force a re-download
+        # - there's no hash (but a .hash file): consider it a hard error
+        if [ -e "${output}" ]; then
+            if support/download/check-hash ${quiet} "${hfile}" "${output}" "${output##*/}"; then
+                exit 0
+            elif [ ${?} -ne 2 ]; then
+                # Do not remove the file, otherwise it might get re-downloaded
+                # from a later location (i.e. primary -> upstream -> mirror).
+                # Do not print a message, check-hash already did.
+                exit 1
+            fi
+            rm -f "${output}"
+            warn "Re-downloading '%s'...\n" "${output##*/}"
         fi
-        rm -f "${output}"
-        warn "Re-downloading '%s'...\n" "${output##*/}"
     fi
 
     # Look through all the uris that we were given to download the package
@@ -127,7 +130,7 @@  main() {
                 -f "${filename}" \
                 -u "${uri}" \
                 -o "${tmpf}" \
-                ${quiet} ${recurse} -- "${@}"
+                ${quiet} ${recurse} ${checkonly} -- "${@}"
         then
             # cd back to keep path coherence
             cd "${OLDPWD}"
@@ -138,19 +141,21 @@  main() {
         # cd back to free the temp-dir, so we can remove it later
         cd "${OLDPWD}"
 
-        # Check if the downloaded file is sane, and matches the stored hashes
-        # for that file
-        if support/download/check-hash ${quiet} "${hfile}" "${tmpf}" "${output##*/}"; then
-            rc=0
-        else
-            if [ ${?} -ne 3 ]; then
-                rm -rf "${tmpd}"
-                continue
+        if [ -z "${checkonly}" ]; then
+            # Check if the downloaded file is sane, and matches the stored hashes
+            # for that file
+            if support/download/check-hash ${quiet} "${hfile}" "${tmpf}" "${output##*/}"; then
+                rc=0
+            else
+                if [ ${?} -ne 3 ]; then
+                    rm -rf "${tmpd}"
+                    continue
+                fi
+
+                # the hash file exists and there was no hash to check the file
+                # against
+                rc=1
             fi
-
-            # the hash file exists and there was no hash to check the file
-            # against
-            rc=1
         fi
         download_and_check=1
         break
@@ -163,6 +168,13 @@  main() {
         exit 1
     fi
 
+    # If we only need to check the presence of sources, stop here.
+    # No need to handle output files.
+    if [ -n "${checkonly}" ]; then
+        rm -rf "${tmpd}"
+        exit 0
+    fi
+
     # tmp_output is in the same directory as the final output, so we can
     # later move it atomically.
     tmp_output="$(mktemp "${output}.XXXXXX")"