
[RFC,v1,1/1] package/pkg-golang: download deps to vendor tree if not present

Message ID 20200831062335.1105977-1-christian@paral.in
State RFC
Series [RFC,v1,1/1] package/pkg-golang: download deps to vendor tree if not present

Commit Message

Christian Stewart Aug. 31, 2020, 6:23 a.m. UTC
NOTE: This patch is an RFC and is not intended for merging in its current state.
It is a naive implementation of the "go mod vendor" download step as a
post-extract hook, for early testing and demonstration of the desired effect. I
don't yet know what a final implementation might look like.

Add a new hook to POST_EXTRACT_HOOKS for Go packages which will create the
"vendor" directory structure under $(@D)/vendor with Go package deps by running
the "go mod vendor" command.

This will download dependency sources and use $GOPATH/pkg as a caching
directory for lookups and downloads.

Go specifies commit hashes OR version tags in go.mod, and lists source code
checksums in go.sum. The Go module system has a robust security model for
preventing MITM attacks or changed Git tags on dependencies through this
checksumming and explicitly-specified versioning approach.
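
For illustration, with made-up module names and elided hashes, a go.mod
pins exact versions:

    module example.com/app

    go 1.14

    require (
        github.com/some/lib v1.2.3
        github.com/other/dep v0.4.5 // indirect
    )

and go.sum records two checksums per module, one for the source tree
and one for its go.mod file:

    github.com/some/lib v1.2.3 h1:<base64 hash>
    github.com/some/lib v1.2.3/go.mod h1:<base64 hash>

The go tool refuses to use a downloaded module that does not match its
go.sum entry.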

Reference: https://blog.golang.org/using-go-modules

Signed-off-by: Christian Stewart <christian@paral.in>
---
 package/pkg-golang.mk | 10 ++++++++++
 1 file changed, 10 insertions(+)

Comments

Yann E. MORIN Aug. 31, 2020, 7:08 a.m. UTC | #1
Christian, All,

On 2020-08-30 23:23 -0700, Christian Stewart spake thusly:
> NOTE: This patch is an RFC and is not intended for merging in its current state.
> It is a naive implementation of the "go mod vendor" download step as a
> post-extract hook, for early testing and demonstration of the desired effect. I
> don't yet know what a final implementation might look like.

Thanks, that's a good starting point.

My proposal was that we introduce per-package managers download
backends, which would typically do something like the following
(pseudo-code):

    #!/usr/bin/env bash
    # This file is support/download/go

    # Parse options:
    actual_site_method="$(get_option '--actual-site-method')"

    # Call actual download helper:
    "${0%/*}/${actual_site_method}" "${@}" -o "${temp_tarball}"

    # Populate the vendor:
    tar xf "${temp_tarball}" -C "${temp_directory}"
    cd "${temp_directory}/${package_name_version}"
    go mod vendor
    cd ..
    tar czf "${final_tarball}" "${package_name_version}"

(of course, the details would be a bit more complex, and would require
that we pass the actual site method via the download infra, but the idea
is there)

What's your opinion on this?

See also the following mails from Thomas, which contain copies of some
of the IRC discussions we had on the topic (about rust and cargo, but
that's the same topic):

    http://lists.busybox.net/pipermail/buildroot/2020-August/289895.html
    http://lists.busybox.net/pipermail/buildroot/2020-August/289894.html

> Add a new hook to POST_EXTRACT_HOOKS for Go packages which will create the
> "vendor" directory structure under $(@D)/vendor with Go package deps by running
> the "go mod vendor" command.
> 
> This will download dependency sources and use $GOPATH/pkg as a caching
> directory for lookups and downloads.

But that does the download at extract time, and we would like that we
still be able to do:

    $ make source
    # Unplug network
    $ make

Also, the hook is registered in the infra (we can't do otherwise), so it
means it would run after any hook registered by the package, while those
hooks may expect the package to be fully available (i.e. fully vendored).

> Go specifies commit hashes OR version tags in go.mod, and lists source code
> checksums in go.sum. The Go module system has a robust security model for
> preventing MITM attacks or changed Git tags on dependencies through this
> checksumming and explicitly-specified versioning approach.

This is good, because supposedly that will allow us to generate
reproducible archives, and have hashes for them (in foo.hash)
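
For reference, such a hash would then be the usual one-liner in the
package's .hash file (value elided here):

    # Locally computed
    sha256  <sha256 of the vendored tarball>  foo-1.0.tar.gz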

Regards,
Yann E. MORIN.

> Reference: https://blog.golang.org/using-go-modules
> 
> Signed-off-by: Christian Stewart <christian@paral.in>
> ---
>  package/pkg-golang.mk | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/package/pkg-golang.mk b/package/pkg-golang.mk
> index 2d80e99619..88eb89a68e 100644
> --- a/package/pkg-golang.mk
> +++ b/package/pkg-golang.mk
> @@ -98,6 +98,16 @@ endef
>  
>  $(2)_POST_EXTRACT_HOOKS += $(2)_APPLY_EXTRACT_GOMOD
>  
> +# WIP - download dependencies with the Go tool if vendor does not exist.
> +define $(2)_DOWNLOAD_GOMOD
> +	if [ ! -d $$(@D)/vendor ]; then \
> +		cd $$(@D); \
> +		go mod vendor; \
> +	fi
> +endef
> +
> +$(2)_POST_EXTRACT_HOOKS += $(2)_DOWNLOAD_GOMOD
> +
>  # Build step. Only define it if not already defined by the package .mk
>  # file.
>  ifndef $(2)_BUILD_CMDS
> -- 
> 2.28.0
>
Sam Voss Sept. 3, 2020, 10:52 a.m. UTC | #2
Yann, All,

On Mon, Aug 31, 2020 at 2:08 AM Yann E. MORIN <yann.morin.1998@free.fr> wrote:
>
> Christian, All,
>
> On 2020-08-30 23:23 -0700, Christian Stewart spake thusly:
> > NOTE: This patch is an RFC and is not intended for merging in its current state.
> > It is a naive implementation of the "go mod vendor" download step as a
> > post-extract hook, for early testing and demonstration of the desired effect. I
> > don't yet know what a final implementation might look like.
>
> Thanks, that's a good starting point.
>
> My proposal was that we introduce per-package managers download
> backends, which would typically do something like the following
> (pseudo-code):
>
>     #!/usr/bin/env bash
>     # This file is support/download/go
>
>     # Parse options:
>     actual_site_method="$(get_option '--actual-site-method')"
>
>     # Call actual download helper:
>     "${0%/*}/${actual_site_method}" "${@}" -o "${temp_tarball}"
>
>     # Populate the vendor:
>     tar xf "${temp_tarball}" -C "${temp_directory}"
>     cd "${temp_directory}/${package_name_version}"
>     go mod vendor
>     cd ..
>     tar czf "${final_tarball}" "${package_name_version}"
>
> (of course, the details would be a bit more complex, and would require
> that we pass the actual site method via the download infra, but the idea
> is there)
>
> What's your opinion on this?

As we spoke about plenty on the IRC, I'm sure you know my opinion on
this but I figure I mention it anyway: I believe we should split these
recursive-requirements from the base tar. This should allow
Proprietary applications and TPIP requirements to be captured while
maintaining separation between them.

Now, to your point about "well, what if the repository has proprietary
dependencies?". I think this is still a valid point, especially when
looking at the case you mentioned of "proprietary app with commingled
TPIP+proprietary requirements", but I believe we should be able to
handle this at a buildroot level fairly reasonably.

I took a look at `go mod`, it seems to share a similar mechanism with
cargo which allows us to pass local paths for dependencies. My
proposition is to put the responsibility of whomever is adding a
proprietary application, which has mixed dependencies, to instead
split any proprietary modules into selectable options in buildroot,
and use the standard depends mechanism. To enforce this, we could
investigate using license-coalescing options of the package managers
to find any proprietary dependencies and fail if they're found to be
pointing to upstream URLs (and would be captured) with an error
message clearly describing our intentions.
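
As a rough sketch of what the Go side of such a check could build on
(the commands below are stock; only the license classification itself
is hand-waved here):

    $ go list -m all                # the full transitive module list
    $ go mod vendor
    $ find vendor -name 'LICENSE*' -o -name 'COPYING*'

A wrapper could then map each vendored module to its license file, and
fail with a clear message when a proprietary dependency points at an
upstream URL.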

>
> See also the following mails from Thomas, which contain copies of some
> of the IRC discussions we had on the topic (about rust and cargo, but
> that's the same topic):
>
>     http://lists.busybox.net/pipermail/buildroot/2020-August/289895.html
>     http://lists.busybox.net/pipermail/buildroot/2020-August/289894.html
>
> > Add a new hook to POST_EXTRACT_HOOKS for Go packages which will create the
> > "vendor" directory structure under $(@D)/vendor with Go package deps by running
> > the "go mod vendor" command.
> >
> > This will download dependency sources and use $GOPATH/pkg as a caching
> > directory for lookups and downloads.
>
> But that does the download at extract time, and we would like that we
> still be able to do:
>
>     $ make source
>     # Unplug network
>     $ make

In my (so far unshared) patchset, my solution to do this agrees with
your suggestion above, by adding download-backend support for these
package managers. I leveraged the implementation suggested previously
by Patrick[1] to use the cargo-dl step to then create two tarballs:

<package>-<ver>.tar.gz   <- we're all familiar with
<package>-<ver>-vendor.tar.gz <- the TPIP portion

The reasons for splitting are shared above. I believe that patchset is
a good initial direction, and I think those interested in this
patchset who are unfamiliar with that one should give it a review.

Thanks,

Sam

1: http://patchwork.ozlabs.org/project/buildroot/list/?series=159771

>
> Also, the hook is registered in the infra (we can't do otherwise), so it
> means it would run after any hook registered by the package, while those
> hooks may expect the package to be fully available (i.e. fully vendored).
>
> > Go specifies commit hashes OR version tags in go.mod, and lists source code
> > checksums in go.sum. The Go module system has a robust security model for
> > preventing MITM attacks or changed Git tags on dependencies through this
> > checksumming and explicitly-specified versioning approach.
>
> This is good, because supposedly that will allow us to generate
> reproducible archives, and have hashes for them (in foo.hash)
>
> Regards,
> Yann E. MORIN.
>
> > Reference: https://blog.golang.org/using-go-modules
> >
> > Signed-off-by: Christian Stewart <christian@paral.in>
> > ---
> >  package/pkg-golang.mk | 10 ++++++++++
> >  1 file changed, 10 insertions(+)
> >
> > diff --git a/package/pkg-golang.mk b/package/pkg-golang.mk
> > index 2d80e99619..88eb89a68e 100644
> > --- a/package/pkg-golang.mk
> > +++ b/package/pkg-golang.mk
> > @@ -98,6 +98,16 @@ endef
> >
> >  $(2)_POST_EXTRACT_HOOKS += $(2)_APPLY_EXTRACT_GOMOD
> >
> > +# WIP - download dependencies with the Go tool if vendor does not exist.
> > +define $(2)_DOWNLOAD_GOMOD
> > +     if [ ! -d $$(@D)/vendor ]; then \
> > +             cd $$(@D); \
> > +             go mod vendor; \
> > +     fi
> > +endef
> > +
> > +$(2)_POST_EXTRACT_HOOKS += $(2)_DOWNLOAD_GOMOD
> > +
> >  # Build step. Only define it if not already defined by the package .mk
> >  # file.
> >  ifndef $(2)_BUILD_CMDS
> > --
> > 2.28.0
> >
>
> --
> .-----------------.--------------------.------------------.--------------------.
> |  Yann E. MORIN  | Real-Time Embedded | /"\ ASCII RIBBON | Erics' conspiracy: |
> | +33 662 376 056 | Software  Designer | \ / CAMPAIGN     |  ___               |
> | +33 561 099 427 `------------.-------:  X  AGAINST      |  \e/  There is no  |
> | http://ymorin.is-a-geek.org/ | _/*\_ | / \ HTML MAIL    |   v   conspiracy.  |
> '------------------------------^-------^------------------^--------------------'
Thomas Petazzoni Sept. 3, 2020, 11:57 a.m. UTC | #3
Hello,

On Thu, 3 Sep 2020 05:52:56 -0500
Sam Voss <sam.voss@gmail.com> wrote:

> > My proposal was that we introduce per-package managers download
> > backends, which would typically do something like the following
> > (pseudo-code):
> >
> >     #!/usr/bin/env bash
> >     # This file is support/download/go
> >
> >     # Parse options:
> >     actual_site_method="$(get_option '--actual-site-method')"
> >
> >     # Call actual download helper:
> >     "${0%/*}/${actual_site_method}" "${@}" -o "${temp_tarball}"
> >
> >     # Populate the vendor:
> >     tar xf "${temp_tarball}" -C "${temp_directory}"
> >     cd "${temp_directory}/${package_name_version}"
> >     go mod vendor
> >     cd ..
> >     tar czf "${final_tarball}" "${package_name_version}"
> >
> > (of course, the details would be a bit more complex, and would require
> > that we pass the actual site method via the download infra, but the idea
> > is there)
> >
> > What's your opinion on this?  
> 
> As we spoke about plenty on the IRC, I'm sure you know my opinion on
> this but I figure I mention it anyway: I believe we should split these
> recursive-requirements from the base tar. This should allow
> Proprietary applications and TPIP requirements to be captured while
> maintaining separation between them.

For those tuning in: TPIP stands for (I believe) "Third Party
Intellectual Property".

I think I agree with the idea of splitting into two tarballs the
package source code itself from the source code of all its dependencies.

> Now, to your point about "well, what if the repository has proprietary
> dependencies?".

That was indeed my question as well.

> I think this is still a valid point, especially when
> looking at the case you mentioned of "proprietary app with commingled
> TPIP+proprietary requirements", but I believe we should be able to
> handle this at a buildroot level fairly reasonably.
> 
> I took a look at `go mod`, it seems to share a similar mechanism with
> cargo which allows us to pass local paths for dependencies. My
> proposition is to put the responsibility of whomever is adding a
> proprietary application, which has mixed dependencies, to instead
> split any proprietary modules into selectable options in buildroot,
> and use the standard depends mechanism. To enforce this, we could
> investigate using license-coalescing options of the package managers
> to find any proprietary dependencies and fail if they're found to be
> pointing to upstream URLs (and would be captured) with an error
> message clearly describing our intentions.

This feels like a reasonable approach to me.

Question: for the dependencies, instead of having a single tarball for
all of them, would there be a way of having separate tarballs for each
dependency that gets pulled by "go mod" or "cargo" ? If so, then we
could perhaps be able to teach the package which parts of the package
are proprietary (including the proprietary dependencies) and which
parts are open-source and under which license.
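
For what it's worth, "go mod vendor" already records the per-dependency
boundaries in vendor/modules.txt, one stanza per module (with recent Go
versions; names made up):

    # github.com/some/lib v1.2.3
    ## explicit
    github.com/some/lib
    # github.com/other/dep v0.4.5
    github.com/other/dep

so a tool producing one tarball per dependency would at least know what
belongs to what.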

Note that as a first step, I am personally perfectly fine with what you
are proposing. The above question is really just a question, not a
request to implement something like that.

> In my (so far unshared) patchset, my solution to do this agrees with
> your suggestion above, by adding download-backend support for these
> package managers. I leveraged the implementation suggested previously
> by Patrick[1] to use the cargo-dl step to then create two tarballs:
> 
> <package>-<ver>.tar.gz   <- we're all familiar with
> <package>-<ver>-vendor.tar.gz <- the TPIP portion
> 
> The reasons for splitting are shared above. I believe that patchset is
> a good initial direction, and I think those interested in this
> patchset who are unfamiliar with that one should give it a review.

Could you submit your patch series ? Did you fix the issues that we
pointed out in the review of Patrick's series ?

I think we have a good opportunity to try to solve this problem for
both Cargo and Go at the same time, so that we at least see if there's
a reasonably similar solution that can be used.

Best regards,

Thomas
Sam Voss Sept. 3, 2020, 1:01 p.m. UTC | #4
Hey Thomas,

On Thu, Sep 3, 2020 at 7:00 AM Thomas Petazzoni
<thomas.petazzoni@bootlin.com> wrote:
>
> Hello,
>
> On Thu, 3 Sep 2020 05:52:56 -0500
> Sam Voss <sam.voss@gmail.com> wrote:
>
> > > My proposal was that we introduce per-package managers download
> > > backends, which would typically do something like the following
> > > (pseudo-code):
> > >
> > >     #!/usr/bin/env bash
> > >     # This file is support/download/go
> > >
> > >     # Parse options:
> > >     actual_site_method="$(get_option '--actual-site-method')"
> > >
> > >     # Call actual download helper:
> > >     "${0%/*}/${actual_site_method}" "${@}" -o "${temp_tarball}"
> > >
> > >     # Populate the vendor:
> > >     tar xf "${temp_tarball}" -C "${temp_directory}"
> > >     cd "${temp_directory}/${package_name_version}"
> > >     go mod vendor
> > >     cd ..
> > >     tar czf "${final_tarball}" "${package_name_version}"
> > >
> > > (of course, the details would be a bit more complex, and would require
> > > that we pass the actual site method via the download infra, but the idea
> > > is there)
> > >
> > > What's your opinion on this?
> >
> > As we spoke about plenty on the IRC, I'm sure you know my opinion on
> > this but I figure I mention it anyway: I believe we should split these
> > recursive-requirements from the base tar. This should allow
> > Proprietary applications and TPIP requirements to be captured while
> > maintaining separation between them.
>
> For those tuning in: TPIP stands for (I believe) "Third Party
> Intellectual Property".

Sorry, that is correct, I should have clarified. Thanks for mentioning it.

>
> I think I agree with the idea of splitting into two tarballs the
> package source code itself from the source code of all its dependencies.
>
> > Now, to your point about "well, what if the repository has proprietary
> > dependencies?".
>
> That was indeed my question as well.
>
> > I think this is still a valid point, especially when
> > looking at the case you mentioned of "proprietary app with commingled
> > TPIP+proprietary requirements", but I believe we should be able to
> > handle this at a buildroot level fairly reasonably.
> >
> > I took a look at `go mod`, it seems to share a similar mechanism with
> > cargo which allows us to pass local paths for dependencies. My
> > proposition is to put the responsibility of whomever is adding a
> > proprietary application, which has mixed dependencies, to instead
> > split any proprietary modules into selectable options in buildroot,
> > and use the standard depends mechanism. To enforce this, we could
> > investigate using license-coalescing options of the package managers
> > to find any proprietary dependencies and fail if they're found to be
> > pointing to upstream URLs (and would be captured) with an error
> > message clearly describing our intentions.
>
> This feels like a reasonable approach to me.
>
> Question: for the dependencies, instead of having a single tarball for
> all of them, would there be a way of having separate tarballs for each
> dependency that gets pulled by "go mod" or "cargo" ? If so, then we
> could perhaps be able to teach the package which parts of the package
> are proprietary (including the proprietary dependencies) and which
> parts are open-source and under which license.

I don't exactly know, but to your point I know it pulls them into
correctly-named folders, though with no version string attached:

VENDOR/ws2_32-sys/Cargo.toml

>
> Note that as a first step, I am personally perfectly fine with what you
> are proposing. The above question is really just a question, not a
> request to implement something like that.

Understood, I figured since I had a vendor tar I may as well take a
look at what it produced.

>
> > In my (so far unshared) patchset, my solution to do this agrees with
> > your suggestion above, by adding download-backend support for these
> > package managers. I leveraged the implementation suggested previously
> > by Patrick[1] to use the cargo-dl step to then create two tarballs:
> >
> > <package>-<ver>.tar.gz   <- we're all familiar with
> > <package>-<ver>-vendor.tar.gz <- the TPIP portion
> >
> > The reasons for splitting are shared above. I believe that patchset is
> > a good initial direction, and I think those interested in this
> > patchset who are unfamiliar with that one should give it a review.
>
> Could you submit your patch series ? Did you fix the issues that we
> pointed out in the review of Patrick's series ?

I did not change Patrick's series at all, built a new change-set on
top of it. I will have to go back and integrate them together, and I
can take a look at the issues with him in the meantime.

Thanks,

Sam
Yann E. MORIN Sept. 3, 2020, 1:28 p.m. UTC | #5
Sam, All,

On 2020-09-03 05:52 -0500, Sam Voss spake thusly:
> On Mon, Aug 31, 2020 at 2:08 AM Yann E. MORIN <yann.morin.1998@free.fr> wrote:
> > My proposal was that we introduce per-package managers download
> > backends, which would typically do something like the following
> > (pseudo-code):
[--SNIP--]
> As we spoke about plenty on the IRC, I'm sure you know my opinion on
> this but I figure I mention it anyway: I believe we should split these
> recursive-requirements from the base tar. This should allow
> Proprietary applications and TPIP requirements to be captured while
> maintaining separation between them.
>
> Now, to your point about "well, what if the repository has proprietary
> dependencies?". I think this is still a valid point, especially when
> looking at the case you mentioned of "proprietary app with commingled
> TPIP+proprietary requirements", but I believe we should be able to
> handle this at a buildroot level fairly reasonably.

And as I explained on IRC (which I am going to repeat here so that it is
archived in the list archives, or for all to see as well), I believe
this dichotomy between main and vendored archives is incorrect.

What I understand is that, for proprietary applications, you do not want
to provide the main tarball, but since those package managers make it so
easy to depend on external, FLOSS components, you want to be able to
provide that vendored FLOSS stuff to be compliant, without providing
the code for your proprietary blurb.

Consider for example that your company develops two proprietary
packages, foo and bar, which have the following dependencies, which will
ultimately be vendored in those packages (names totally made up;
licenses between brackets; CORP means proprietary package):

  - foo [CORP] depends on libssl [MIT]

  - bar [CORP] depends on foo and libhttp [MIT]

So if you have a separate archive for bar's vendoring, it will include
your foo proprietary package, and in the case of 'bar', the separation
of archives will not help you properly separate the "TPIP" that you would
have to distribute to be compliant.

And I don't think we could have a single, simple mechanism for that,
except by splitting each and every vendored dependencies at all levels
into their own archives.

And I doubt this would be something we would want to do.

> I took a look at `go mod`, it seems to share a similar mechanism with
> cargo which allows us to pass local paths for dependencies. My
> proposition is to put the responsibility of whomever is adding a
> proprietary application, which has mixed dependencies, to instead
> split any proprietary modules into selectable options in buildroot,
> and use the standard depends mechanism.

This is doomed to not work, because it relies on process. And since this
is not enforced by the tool (more on that below), people will not realise
that they have proprietary dependencies down to the n-th level.

> To enforce this, we could
> investigate using license-coalescing options of the package managers
> to find any proprietary dependencies and fail if they're found to be
> pointing to upstream URLs (and would be captured) with an error
> message clearly describing our intentions.

But that could also be a totally valid setup, where one would not care
about that situation, if one builds a device for their own use, and that
is never ever distributed; those people would not care about the
licensing of their dependency hell.

> 
> In my (so far unshared) patchset, my solution to do this agrees with
> your suggestion above, by adding download-backend support for these
> package managers. I leveraged the implementation suggested previously
> by Patrick[1] to use the cargo-dl step to then create two tarballs:
> 
> <package>-<ver>.tar.gz   <- we're all familiar with
> <package>-<ver>-vendor.tar.gz <- the TPIP portion
> 
> The reasons for splitting are shared above. I believe that patchset is
> a good initial direction, and I think those interested in this
> patchset who are unfamiliar with that one should give it a review.

Well, whether we go for one or two archives, I think posting that
patchset would anyway be a good first step: indeed, switching to the
other solution would be relatively trivial.

Regards,
Yann E. MORIN.
Thomas Petazzoni Sept. 3, 2020, 1:58 p.m. UTC | #6
Hello,

On Thu, 3 Sep 2020 08:01:27 -0500
Sam Voss <sam.voss@collins.com> wrote:

> > Question: for the dependencies, instead of having a single tarball for
> > all of them, would there be a way of having separate tarballs for each
> > dependency that gets pulled by "go mod" or "cargo" ? If so, then we
> > could perhaps be able to teach the package which parts of the package
> > are proprietary (including the proprietary dependencies) and which
> > parts are open-source and under which license.  
> 
> I don't exactly know, but to your point I know it pulls them into
> correctly-named folders, though with no version string attached:
> 
> VENDOR/ws2_32-sys/Cargo.toml

So potentially we could tarball independently each dependency, perhaps
in a tarball named:

<pkgname>-<pkgversion>-<dependencyname>.tar.gz

where <dependencyname> here would be "ws2_32-sys".

The question is then how we can associate the licenses to each
dependency. Again, I don't think it's worth tackling at this moment.

> > Could you submit your patch series ? Did you fix the issues that we
> > pointed out in the review of Patrick's series ?  
> 
> I did not change Patrick's series at all, built a new change-set on
> top of it. I will have to go back and integrate them together, and I
> can take a look at the issues with him in the meantime.

Yes, if you can re-integrate your improvements within the patches from
Patrick and submit an updated series, it would be great.

Thanks,

Thomas
Thomas Petazzoni Sept. 3, 2020, 2:02 p.m. UTC | #7
Hello,

On Thu, 3 Sep 2020 15:28:52 +0200
"Yann E. MORIN" <yann.morin.1998@free.fr> wrote:

> And as I explained on IRC (which I am going to repeat here so that it is
> archived in the list archives, or for all to see as well), I believe
> this dichotomy between main and vendored archives is incorrect.
> 
> What I understand is that, for proprietary applications, you do not want
> to provide the main tarball, but since those package managers make it so
> easy to depend on external, FLOSS components, you want to be able to
> provide that vendored FLOSS stuff to be compliant, without providing
> the code for your proprietary blurb.
> 
> Consider for example that your company develops two proprietary
> packages, foo and bar, which have the following dependencies, which will
> ultimately be vendored in those packages (names totally made up;
> licenses between brackets; CORP means proprietary package):
> 
>   - foo [CORP] depends on libssl [MIT]
> 
>   - bar [CORP] depends on foo and libhttp [MIT]

In this case, Sam's suggestion is that "foo" should be packaged as a
proper Buildroot package, and the "bar" package should use "foo" from
the "foo" Buildroot package, rather than using the language-specific
vendoring/module system to retrieve it.

In other words, Sam's suggestion is that if you have proprietary /
non-redistributable bits of code used as dependencies in your Go/Rust
application, you should go and create Buildroot packages for these bits.

> So if you have a separate archive for bar's vendoring, it will include
> your foo proprietary package, and in the case of 'bar', the separation
> of archives will not help you properly separate the "TPIP" that you would
> have to distribute to be compliant.

See above. It was also covered in Sam's proposal.

> > I took a look at `go mod`, it seems to share a similar mechanism with
> > cargo which allows us to pass local paths for dependencies. My
> > proposition is to put the responsibility of whomever is adding a
> > proprietary application, which has mixed dependencies, to instead
> > split any proprietary modules into selectable options in buildroot,
> > and use the standard depends mechanism.  
> 
> This is doomed to not work, because it relies on process. And since this
> is not enforced by the tool (more on that below), people will not realise
> that they have proprietary dependencies down to the n-th level.

What alternative do you suggest ?

> > To enforce this, we could
> > investigate using license-coalescing options of the package managers
> > to find any proprietary dependencies and fail if they're found to be
> > pointing to upstream URLs (and would be captured) with an error
> > message clearly describing our intentions.  
> 
> But that could also be a totally valid setup, where one would not care
> about that situation, if one builds a device for their own use, and that
> is never ever distributed; those people would not care about the
> licensing of their dependency hell.

Again, what do you suggest ?

I agree it's not ideal, but it is relatively reasonable, and in the
absence of other clear solution and proposal, we're likely to opt for a
"relatively reasonable" solution, even if not ideal.

Thomas
Yann E. MORIN Sept. 3, 2020, 3:12 p.m. UTC | #8
Thomas, All,

On 2020-09-03 16:02 +0200, Thomas Petazzoni spake thusly:
> On Thu, 3 Sep 2020 15:28:52 +0200
> "Yann E. MORIN" <yann.morin.1998@free.fr> wrote:
> > And as I explained on IRC (which I am going to repeat here so that it is
> > archived in the list archives, or for all to see as well), I believe
> > this dichotomy between main and vendored archives is incorrect.
> > 
> > What I understand is that, for proprietary applications, you do not want
> > to provide the main tarball, but since those package managers make it so
> > easy to depend on external, FLOSS components, you want to be able to
> > provide that vendored FLOSS stuff to be compliant, without providing
> > the code for your proprietary blurb.
> > 
> > Consider for example that your company develops two proprietary
> > packages, foo and bar, which have the following dependencies, which will
> > ultimately be vendored in those packages (names totally made up;
> > licenses between brackets; CORP means proprietary package):
> > 
> >   - foo [CORP] depends on libssl [MIT]
> > 
> >   - bar [CORP] depends on foo and libhttp [MIT]
> 
> In this case, Sam's suggestion is that "foo" should be packaged as a
> proper Buildroot package, and the "bar" package should use "foo" from
> the "foo" Buildroot package, rather than using the language-specific
> vendoring/module system to retrieve it.
> 
> In other words, Sam's suggestion is that if you have proprietary /
> non-redistributable bits of code used as dependencies in your Go/Rust
> application, you should go and create Buildroot packages for these bits.

But that only ever works if the bar package is only developed for use
by Buildroot, and is not shared with other projects in $BIG_CORP.

The situation is not uncommon that a package is developed in some
department (that has no idea what Buildroot is or even that it exists)
and re-used in another department, which uses Buildroot and will be
very sad that the package does not follow Buildroot packaging principles.

> > > I took a look at `go mod`, it seems to share a similar mechanism with
> > > cargo which allows us to pass local paths for dependencies. My
> > > proposition is to put the responsibility of whomever is adding a
> > > proprietary application, which has mixed dependencies, to instead
> > > split any proprietary modules into selectable options in buildroot,
> > > and use the standard depends mechanism.  
> > This is doomed to not work, because it relies on process. And since this
> > is not enforced by the tool (more on that below), people will not realise
> > that they have proprietary dependencies down to the n-th level.
> What alternative do you suggest ?

That we do not create separate archives, at all. Vendored stuff is like
any other bundled dependency: it is part of the source archive.

> > > To enforce this, we could
> > > investigate using license-coalescing options of the package managers
> > > to find any proprietary dependencies and fail if they're found to be
> > > pointing to upstream URLs (and would be captured) with an error
> > > message clearly describing our intentions.  
> > 
> > But that could also be a totally valid setup, where one would not care
> > about that situation, if one builds a device for their own use, and that
> > is never ever distributed; those people would not care about the
> > licensing of their dependency hell.
> 
> Again, what do you suggest ?
> 
> I agree it's not ideal, but it is relatively reasonable, and in the
> absence of other clear solution and proposal, we're likely to opt for a
> "relatively reasonable" solution, even if not ideal.

The best and simplest solution is to not create separate archives. That
is a reasonable solution.

At least as a first step.

If, and I say "if", the need arises and *all* cases are properly covered,
then we can further the infra to do the split (but I have a lot of doubts
about the relevance of that).

However, there are a few things I forgot about in my previous comments
(but the first two probably the same issue, just at two different
places):

  - at extract step, how do you know that you have to also extract the
    archive with the vendor stuff? Probably easy by looking if said
    archive exists. But then, if each dependency is stored in its own
    archive, how do you know which to extract? And then, if two deps at
    different levels have a dependency on another package, but at
    different versions?

  - when you generate the legal-info/ directory, how do you know what to
    put in there for that package? You are back to the problem above,
    plus you would also want to ignore those vendored deps that are not
    redistributable, although we have no way in Buildroot to describe
    that either....

  - why would we impose the policy that vendoring dependencies on other
    proprietary packages must use Buildroot packages (see the rationale
    I also gave, above, about code shared across multiple departments of
    a single company), but dependencies on FLOSS (or even other
    proprietary but redistributable packages) would be OK?

Regards,
Yann E. MORIN.
Thomas Petazzoni Sept. 3, 2020, 4:13 p.m. UTC | #9
On Thu, 3 Sep 2020 17:12:21 +0200
"Yann E. MORIN" <yann.morin.1998@free.fr> wrote:

> > In this case, Sam's suggestion is that "foo" should be packaged as a
> > proper Buildroot package, and the "bar" package should use "foo" from
> > the "foo" Buildroot package, rather than using the language-specific
> > vendoring/module system to retrieve it.
> > 
> > In other words, Sam's suggestion is that if you have proprietary /
> > non-redistributable bits of code used as dependencies in your Go/Rust
> > application, you should go and create Buildroot packages for these bits.  
> 
> But that only ever works if the bar package is only developed for use
> by Buildroot, and is not shared with other projects in $BIG_CORP.
>
> The situation is not uncommon that a package is developed in some
> department (that has no idea what Buildroot is or even that it exists)
> and re-used in another department, which uses Buildroot and will be
> very sad that the package does not follow Buildroot packaging principles.

What you mean is that "bar" (which needs "foo") would have to not use
the standard Go/Cargo vendoring model to describe its dependency on
"foo" so that the Go/Cargo vendoring tools don't fetch "foo". Correct ?

If yes, then indeed that's a fair point.

> > > > I took a look at `go mod`, it seems to share a similar mechanism with
> > > > cargo which allows us to pass local paths for dependencies. My
> > > > proposition is to put the responsibility of whomever is adding a
> > > > proprietary application, which has mixed dependencies, to instead
> > > > split any proprietary modules into selectable options in buildroot,
> > > > and use the standard depends mechanism.    
> > > This is doomed to not work, because it relies on process. And since this
> > > is not enforced by the tool (more on that below), people will not realise
> > > that they have proprietary dependencies down to the n-th level.  
> > What alternative do you suggest ?  
> 
> That we do not create separate archives, at all. Vendored stuff is like
> any other bundled dependency: it is part of the source archive.

But then what is your proposal to handle this mixture of
licensing/redistributability conditions between the core package source
and its dependencies ?

> If, and I say "if", the need arises and *all* cases are properly covered,
> then we can further the infra to do the split (but I have a lot of doubts
> about the relevance of that).
> 
> However, there are a few things I forgot about in my previous comments
> (but the first two probably the same issue, just at two different
> places):
> 
>   - at extract step, how do you know that you have to also extract the
>     archive with the vendor stuff? Probably easy by looking if said
>     archive exists. But then, if each dependency is stored in its own
>     archive, how do you know which to extract?

You would parse the go.mod file I suppose, but that doesn't give you
indirect dependencies. Perhaps some Go tool can help with that ? But
indeed, that's a good question.
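
It looks like the stock tooling may be enough: "go list -m all" prints
the main module plus every direct and indirect dependency at its
resolved version (names made up):

    $ go list -m all
    example.com/app
    github.com/some/lib v1.2.3
    github.com/other/dep v0.4.5

so the full closure is available without parsing go.mod by hand.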

Note that storing each dependency as a separate tarball was an idea I
have thrown on the table, not something that was proposed by Sam.

> And then, if two deps at
>     different levels have a dependency on another package, but at
>     different versions?

I am not following you on this one.

>   - when you generate the legal-info/ directory, how do you know what to
>     put in there for that package? You are back to the problem above,
>     plus you would also want to ignore those vendored deps that are not
>     redistributable, although we have no way in Buildroot to describe
>     that either....

Yes, we would certainly need to extend somehow the LICENSE /
LICENSE_FILES stuff to be able to distinguish which license / license
files applies to what.

>   - why would we impose the policy that vendoring dependencies on other
>     proprietary packages must use Buildroot packages (see the rationale
>     I also gave, above, about code shared across multiple departments of
>     a single company), but dependencies on FLOSS (or even other
>     proprietary but redistributable packages) would be OK?

Well, this I can answer: the reason why we don't create one Buildroot
package per Cargo or Go package is because there are zillions of such
FLOSS packages. So we take the lazy route and use the Cargo/Go
dependency management system. But since we have a problem with how to
/not/ redistribute the few Proprietary apps/libs, an idea was to
package them separately. So the reason is really: probably zillions of
FLOSS dependencies, while just a handful of internal/proprietary
dependencies.

Thomas
Christian Stewart Sept. 3, 2020, 6:51 p.m. UTC | #10
Hi Thomas, Sam,

Thanks much for looking into this, and I'm excited at the idea of
splitting vendor into a separate tarball.

In my mind the optimal approach would be to leverage the Go module
system to produce a tarball with associated hash for every Go module
that's imported by any of the selected build targets, and vendor
specifically those, excluding all else.

I can write the Go code to do this & prototype a proof of concept if
this is the way we are interested in going. It would be IMO
appropriate because every Go module is checksummed, will produce a
guaranteed identical output tarball that we can add to the hash file,
and the vendor tree would only contain the necessary packages for the
build targets flagged by the buildroot config.
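
For instance, "go mod download -json" already emits, per module, the
path of a zip in the module cache together with its go.sum checksum
(values below are made up; the fields are those of the stock go tool):

    $ go mod download -json
    {
        "Path": "github.com/some/lib",
        "Version": "v1.2.3",
        "Zip": "/home/user/go/pkg/mod/cache/download/github.com/some/lib/@v/v1.2.3.zip",
        "Sum": "h1:<base64 hash>",
        "GoModSum": "h1:<base64 hash>"
    }

so per-dependency archives with stable checksums are close to free.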

On Thu, Sep 3, 2020 at 4:57 AM Thomas Petazzoni
<thomas.petazzoni@bootlin.com> wrote:
> > I took a look at `go mod`, it seems to share a similar mechanism with
> > cargo which allows us to pass local paths for dependencies. My
> > proposition is to put the responsibility of whomever is adding a
> > proprietary application, which has mixed dependencies, to instead
> > split any proprietary modules into selectable options in buildroot,
> > and use the standard depends mechanism. To enforce this, we could
> > investigate using license-coalescing options of the package managers
> > to find any proprietary dependencies and fail if they're found to be
> > pointing to upstream URLs (and would be captured) with an error
> > message clearly describing our intentions.

This fits well with what I'm describing above. The Go code to produce
/ extract the tarballs for the dependency modules and the module
metadata could also be used to produce the license check and legal
output. In fact, we could even generate the upstream URLs from this,
and all other metadata desired (even authors information and a
dependency chart).

> Question: for the dependencies, instead of having a single tarball for
> all of them, would there be a way of having separate tarballs for each
> dependency that gets pulled by "go mod" or "cargo" ? If so, then we
> could perhaps be able to teach the package which parts of the package
> are proprietary (including the proprietary dependencies) and which
> parts are open-source and under which license.

See above.

> Note that as a first step, I am personally perfectly fine with what you
> are proposing. The above question is really just a question, not a
> request to implement something like that.

To me those two are the same, heh.

> I think we have a good opportunity to try to solve this problem for
> both Cargo and Go at the same time, so that we at least see if there's
> a reasonably similar solution that can be used.

Since our host-go infrastructure is now up to par, we can integrate
the tool I'm describing above as a host-go package with no external
dependencies since it would only use the Go stdlib.
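
A minimal sketch of the core of such a stdlib-only helper (name and
shape hypothetical): print the unique module@version pairs recorded in
a package's go.sum, i.e. the list of dependencies to fetch, archive and
hash individually:

    package main

    import (
        "bufio"
        "fmt"
        "os"
        "strings"
    )

    func main() {
        // Every go.sum line is: <module> <version>[/go.mod] <hash>
        f, err := os.Open("go.sum")
        if err != nil {
            fmt.Fprintln(os.Stderr, err)
            os.Exit(1)
        }
        defer f.Close()

        seen := make(map[string]bool)
        scanner := bufio.NewScanner(f)
        for scanner.Scan() {
            fields := strings.Fields(scanner.Text())
            if len(fields) != 3 {
                continue
            }
            // The source-tree and go.mod checksum lines collapse
            // into a single module@version entry.
            version := strings.TrimSuffix(fields[1], "/go.mod")
            key := fields[0] + "@" + version
            if !seen[key] {
                seen[key] = true
                fmt.Println(key)
            }
        }
        if err := scanner.Err(); err != nil {
            fmt.Fprintln(os.Stderr, err)
            os.Exit(1)
        }
    }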

Best regards,
Christian Stewart
Yann E. MORIN Sept. 3, 2020, 7:18 p.m. UTC | #11
Thomas, All,

On 2020-09-03 18:13 +0200, Thomas Petazzoni spake thusly:
> On Thu, 3 Sep 2020 17:12:21 +0200
> "Yann E. MORIN" <yann.morin.1998@free.fr> wrote:
[--SNIP--]
> > The situation is not uncommon that a package is developed in some
> > department (that has no idea what Buildroot is or even that it exists)
> > and re-used in another department, which uses Buildroot and will be
> > very sad that the package does not follow Buildroot packaging principles.
> 
> What you mean is that "bar" (which needs "foo") would have to not use
> the standard Go/Cargo vendoring model to describe its dependency on
> "foo" so that the Go/Cargo vendoring tools don't fetch "foo". Correct ?

Something the go/cargo/npm/composer/... folks would have to answer. And
if at all, I'd like that we get a consistent behaviour across all those
package managers that have vendoring of dependencies. Indeed, it would
be very disappointing to have one support it and not the others. I.e. if
we can't do separate tarballs for one of them, we should not do it for
the others...

[--SNIP--]
> > That we do not create separate archives, at all. Vendored stuff is like
> > any other bundled dependency: it is part of the source archive.
> But then what is your proposal to handle this mixture of
> licensing/redistributability conditions between the core package source
> and its dependencies ?

How do we handle it in other proprietary packages? We don't.

For example, let's say that Joe Developer Jr. in BIG CORP copies some
LGPL-licensed code into their proprietary project, as a bundled library.

This is totally legit: the proprietary parts are not tainted (I hate
that word, but have no better one for now) by the LGPL stuff. Yet, they
have to redistribute the LGPL stuff.

Buildroot currently offers no way for them to do that.

Vendoring is the same as bundling, except it happens at download time on
our side, not on the upstream developer's machine.

> > If, and I say "if", the need arises and *all* cases are properly covered,
> > then we can further the infra to do the split (but I have a lot of doubts
> > about the relevance of that).
> > 
> > However, there are a few things I forgot about in my previous comments
> > (but the first two probably the same issue, just at two different
> > places):
> > 
> >   - at extract step, how do you know that you have to also extract the
> >     archive with the vendor stuff? Probably easy by looking if said
> >     archive exists. But then, if each dependency is stored in its own
> >     archive, how do you know which to extract?
> 
> You would parse the go.mod file I suppose, but that doesn't give you
> indirect dependencies. Perhaps some Go tool can help with that ? But
> indeed, that's a good question.

And what about cargo? npm? php composer? Others? (what, there are
others? ;-] )

> Note that storing each dependency as a separate tarball was an idea I
> have thrown on the table, not something that was proposed by Sam.

Yes, but one or many, it does not change much...

> > And then, if two deps at
> >     different levels have a dependency on another package, but at
> >     different versions?
> I am not following you on this one.

  - foo vendors libssl version 42.27
  - foo vendors libhttp version 1.2.3
    - libhttp 1.2.3 vendors libssl 666.777

What version of libssl do we extract where?
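
At least for Go, "go mod graph" makes those edges visible, one
"consumer dependency@version" pair per line (reusing the made-up names
above, and glossing over Go's rule that different major versions get
distinct module paths):

    $ go mod graph
    foo libssl@v42.27.0
    foo libhttp@v1.2.3
    libhttp@v1.2.3 libssl@v666.777.0

and the go tool resolves such conflicts itself, by minimal version
selection, i.e. by picking the highest of the required versions. That
resolution is precisely the kind of logic in question here: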

I do not want to have to repeat the vendoring logic in Buildroot.

Also, I do not want that we have various level of vendoring support for
the various package managers.

> >   - when you generate the legal-info/ directory, how do you know what to
> >     put in there for that package? You are back to the problem above,
> >     plus you would also want to ignore those vendored deps that are not
> >     redistributable, although we have no way in Buildroot to describe
> >     that either....
> 
> Yes, we would certainly need to extend somehow the LICENSE /
> LICENSE_FILES stuff to be able to distinguish which license / license
> files applies to what.

As I understand it, in fact, the problem is just about legal-info.

People do not want to redistribute their proprietary packages and the
proprietary bits it vendors, but they want to redistribute the FLOSS
bits they vendor.

So, if we just concentrate on how we can help people do exactly that:
filter out the bits they do not want to redistribute?

One solution would be to have packages provide some legal-info hooks,
something like (e.g.: only keep files which names match the regexp):

    FOO_LEGAL_INFO_FILTER_REGEXP = ^vendor/FLOSS/

Or whatever, that would be applied at the time the legal-info is
generated.

Thoughts?

Note: I find that awful, but less so than the split archives. Also, it
is generic, so it can also be used by other packages like the hypothetical
Joe Developer Jr. case explained above.

> >   - why would we impose the policy that vendoring dependencies on other
> >     proprietary packages must use Buildroot packages (see the rationale
> >     I also gave, above, about code shared across multiple departments of
> >     a single company), but dependencies on FLOSS (or even other
> >     proprietary but redistributable packages) would be OK?
> 
> Well, this I can answer: the reason why we don't create one Buildroot
> package per Cargo or Go package is because there are zillions of such
> FLOSS packages. So we take the lazy route and use the Cargo/Go
> dependency management system. But since we have a problem with how to
> /not/ redistribute the few Proprietary apps/libs, an idea was to
> package them separately. So the reason is really: probably zillions of
> FLOSS dependencies, while just a handful of internal/proprietary
> dependencies.

Paint me unconvinced.

Regards,
Yann E. MORIN.
Christian Stewart Sept. 3, 2020, 7:40 p.m. UTC | #12
Yann,

On Thu, Sep 3, 2020 at 12:19 PM Yann E. MORIN <yann.morin.1998@free.fr> wrote:
> Something the go/cargo/npm/composer/... folkds would have to answer. And
> if at all, I;d like that we get a consistent behaviour across all those
> package managers that have vendoring of dependencies. Indeed, it would
> be very disapointing to have one support it and nto the others. I.e. if
> we can't do separate tarball for one of them, we should not do for the
> others...

I do not see in any way how this would not be possible; even for npm,
you can quite easily query the package dependency graph and set things
up properly.

> For example, let's say that Joe Developer Jr. in BIG CORP copies some
> LGPL-licensed code into their proprietary project, as a bundled library.
>
> This is totally legit: the proprietary parts are not tainted (I hate
> that word, but have no better one for now) by the LGPL stuff. Yet, they
> have to redistribute the LGPL stuff.
>
> Buildroot currently offers no way for them to do that.

In what world would a Buildroot package ever be added as an in-tree
package with a proprietary library *copied into the source code tree*
??

> Vendoring is the same as bundling, except it happens at download time on
> our side, not on the upstream developer's machine.

We're enforcing hash checks on these bundles. The format may not
always be the same across versions. Storing the source code before
it's extracted into a vendor tree is the only way to be sure that the
hashes won't change between iterations of the package manager. It's
also the only way to redistribute the source code packages for the
libraries independently from the proprietary part, maintaining the
hashes. It's the only way to deduplicate downloads of identical
package versions, to do LICENSE checks on dependencies, etc etc etc.

I don't see an alternative option here. Just running the package
manager and compressing the result is less convincing than writing
some code to properly understand each dependency.

> > >   - at extract step, how do you know that you have to also extract the
> > >     archive with the vendor stuff? Probably easy by looking if said
> > >     archive exists. But then, if each dependency is stored in its own
> > >     archive, how do you know which to extract?

 - Extract the main package
 - Check the package.json or go.mod or cargo or whatever
 - Extract the relevant stuff into a format the package manager understands
 - Run the package manager from the language to assemble the "vendor"
tree in the source dir (maybe same step).

> > You would parse the go.mod file I suppose, but that doesn't give you
> > indirect dependencies. Perhaps some Go tool can help with that ? But
> > indeed, that's a good question.

?? go.mod handles indirect dependencies.
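
(Pins that only exist transitively show up right in the top-level
go.mod with an "// indirect" marker, e.g. with made-up names:

    require (
        github.com/some/lib v1.2.3
        github.com/other/dep v0.4.5 // indirect
    )

and the complete closure is recorded in go.sum and printed by
"go list -m all".)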

> And what about cargo? npm? php composer? Others? (what, there are
> others? ;-] )

package.lock yarn.lock. Require it.

> > Note that storing each dependency as a separate tarball was an idea I
> > have thrown on the table, not something that was proposed by Sam.
>
> Yes, but one or many, it does not change much...

It changes a lot! As described above.

> > > And then, if two deps at
> > >     different levels have a dependency on another package, but at
> > >     different versions?
> > I am not following you on this one.
>
>   - foo vendors libssl version 42.27
>   - foo vendors libhttp version 1.2.3
>     - libhttp 1.2.3 vendors libssl 666.777
>
> What version of libssl do we extract where?
>
> I do not want to have to repeat the vendoring logic in Buildroot.

Why repeat it? Re-use it from the programming language! Not everything
has to be in bash.

> Also, I do not want that we have various level of vendoring support for
> the various package managers.

OK, so we implement it across the board, which language would not be
able to support this?

> > >   - when you generate the legal-info/ directory, how do you know what to
> > >     put in there for that package? You are back to the problem above,
> > >     plus you would also want to ignore those vendored deps that are not
> > >     redistributable, although we have no way in Buildroot to describe
> > >     that either....

Use the license field in the package.json or wherever the specifiers
exist, and if they aren't there, detect common LICENSE file names; if
you can't find anything, fail.

Go has a few very robust license detector packages (if desired).

> So, if we just concentrate on how we can help people do exactly that:
> filter out the bits they do not want to redistribute?
>
> One solution would be to have packages provide some legal-info hooks,
> something like (e.g.: only keep files which names match the regexp):
>
>     FOO_LEGAL_INFO_FILTER_REGEXP = ^vendor/FLOSS/
>
> Or whatever, that would be applied at the time the legal-info is
> generated.

How does this solve the problem? If I need to give the source tarballs
away for dependencies, and it's all mixed into one massive tarball,
you can't separate things out and keep the hashes the same.

I thought the requirement was that you would be able to send someone
the buildroot "dl" directory and be able to perform a build without
network fetches.

> Paint me unconvinced.

What's the alternative?

> Regards,
> Yann E. MORIN.


best regards,
Christian Stewart
Yann E. MORIN Sept. 3, 2020, 8:43 p.m. UTC | #13
Christian, All,

On 2020-09-03 12:40 -0700, Christian Stewart spake thusly:
> On Thu, Sep 3, 2020 at 12:19 PM Yann E. MORIN <yann.morin.1998@free.fr> wrote:
> > For example, let's say that Joe Developer Jr. in BIG CORP copies some
> > LGPL-licensed code into their proprietary project, as a bundled library.
> >
> > This is totally legit: the proprietary parts are not tainted (I hate
> > that word, but have not better for now) by the LGPL stuff. Yet, they
> > have to redistribute the LGPL stuff.
> >
> > Buildroot currently offers no way for them to do that.
> 
> In what world would a Buildroot package ever be added as an in-tree
> package with a proprietary library *copied into the source code tree*
> ??

I never talked about upstream Buildroot.

But consider that people would take Buildroot, put it in git in their
internal git server, and modify it by adding local packages to it. Or
they could also use a br2-external tree.

The packages at stake here are non-public packages that people write as
part of their day-time job in their company. It is totally possible that
someone believes it would be "easier" to have the source of a dependency
bundled in their proprietary package.

Which is *exactly* the case about a proprietary package vendoring a set
of external libraries.

> > Vendoring is the same as bundling, except it happens at download time on
> > our side, not on the upstream developer's machine.
> 
> We're enforcing hash checks on these bundles. The format may not
> always be the same across versions. Storing the source code before
> it's extracted into a vendor tree is the only way to be sure that the
> hashes won't change between iterations of the package manager.

If a package vendors unversioned dependencies, then indeed we can't add
hashes for that package, because two builds may get different stuff for
each such vendored dependency, like we don't add hashes for stuff we
know is not reproducible (bzr, cvs, hg for example).

> It's
> also the only way to redistribute the source code packages for the
> libraries independently from the proprietary part,

Except as I explained, it does not work in case the dependencies have
dependencies to other proprietary packages, at an arbitrary depth...

> maintaining the hashes.

With my proposal, it would not be: there would be a single archive, for
which we have hashes. Then when we call legal-info, the package filter
is applied to generate a new archive in legal-info, which only contains
the filtered files.

And in the output of legal-info, we do not store the hashes from the
infra, we calculate the hashes already:

    https://git.buildroot.org/buildroot/tree/Makefile#n870

... so we do not need to have hashes of download archives match the
legal-info archives.

> It's the only way to deduplicate downloads of identical
> package versions, to do LICENSE checks on dependencies, etc etc etc.

That would not de-duplicate, because the separate archives would end up
in $(FOO_DL_DIR), which is $(BR2_DL_DIR)/foo/ anyway.
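
I.e., with the hypothetical split, the download tree would look like:

    dl/foo/foo-1.0.tar.gz
    dl/foo/foo-1.0-vendor.tar.gz
    dl/bar/bar-2.0.tar.gz
    dl/bar/bar-2.0-vendor.tar.gz

and a dependency vendored by both foo and bar would still be downloaded
and stored once per consumer package.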

> I don't see an alternative option here. Just running the package
> manager and compressing the result is less convincing than writing
> some code to properly understand each dependency.

Just running the package manager and compressing the result is the
easiest and simplest solution, that will work reliably and consistently
across all cases.

> > > >   - at extract step, how do you know that you have to also extract the
> > > >     archive with the vendor stuff? Probably easy by looking if said
> > > >     archive exists. But then, if each dependency is stored in its own
> > > >     archive, how do you know which to extract?
> 
>  - Extract the main package
>  - Check the package.json or go.mod or cargo or whatever
>  - Extract the relevant stuff into a format the package manager understands

This is what I mean by "reinventing the logic of the package managers".
Because this one go.mod would refer to dependencies that may have their
own dependencies, so we'd have to look at the go.mod of each
dependency, recursively... Well, I think the top-level go.mod has all
the info, but what of the others?

>  - Run the package manager from the language to assemble the "vendor"
> tree in the source dir (maybe same step).

    go mod vendor

And that is all, I don't even need to look at go.mod and parse it to
know where to extract things; or even where to download them from.

> > > You would parse the go.mod file I suppose, but that doesn't give you
> > > indirect dependencies. Perhaps some Go tool can help with that ? But
> > > indeed, that's a good question.
> ?? go.mod handles indirect dependencies.
> > And what about cargo? npm? php composer? Others? (what, there are
> > others? ;-] )
> package.lock yarn.lock. Require it.

I guess package.lock is for npm. No idea what yarn is. Still, that's only
two out of at least three...

> > I do not want to have to repeat the vendoring logic in Buildroot.
> Why repeat it? Re-use it from the programming language! Not everything
> has to be in bash.

It's not about the language; it's about the logic.

> > Also, I do not want that we have various level of vendoring support for
> > the various package managers.
> OK, so we implement it across the board, which language would not be
> able to support this?

npm is notorious for having very bad behaviour wrt vendoring dependencies,
for example (in my limited suffering^Wexperience packaging npm stuff, I
have to admit)

> > > >   - when you generate the legal-info/ directory, how do you know what to
> > > >     put in there for that package? You are back to the problem above,
> > > >     plus you would also want to ignore those vendored deps that are not
> > > >     redistributable, although we have no way in Buildroot to describe
> > > >     that either....
> Use the license field in the package.json or wherever the specifiers
> exist, and if they aren't there, detect common LICENSE file names, if
> you can't find anything, fail.

How do we know whether such or such vendored dependency has to be
redistributed?

But is license "(C) BIG CORP" a redistributable license or not?

> Go has a few very robust license detector packages. (if desired).

It is not only about detecting the license (which is however a very
important step, indeed), but it is about deciding whether to
redistribute it or not.

If we assume that all vendored stuff is only FLOSS and can only be
FLOSS, then that is OK: we redistribute everything that is vendored.

But that is not the case: if a proprietary package vendors another
proprietary package, how do we know that we should not redistribute that
second package as well? Knowing the license name is *not* enough to
decide; only a human can tell.

> > So, if we just concentrate on how we can help people do exactly that:
> > filter out the bits they do not want to redistribute?
> >
> > One solution would be to have packages provide some legal-info hooks,
> > something like (e.g.: only keep files which names match the regexp):
> >
> >     FOO_LEGAL_INFO_FILTER_REGEXP = ^vendor/FLOSS/
> >
> > Or whatever, that would be applied at the time the legal-info is
> > generated.
> 
> How does this solve the problem? If I need to give the source tarballs
> away for dependencies, and it's all mixed into one massive tarball,
> you can't separate things out and keep the hashes the same.

It solves the problem that the legal-info/ directory only contains what
you accept to redistribute.

> I thought the requirement was that you would be able to send someone
> the buildroot "dl" directory and be able to perform a build without
> network fetches.

Wait, you are confusing the two: the content of dl/ which is used at
build time, and from which we extract the sources that are built, and
the content of legal-info/ which contains what you should provide to
be in compliance with license terms.

> > Paint me unconvinced.
> What's the alternative?

Please re-review my proposal: the content of dl/ would always contain
everything unmolested. It is only when calling 'make legal-info' that
the filtering would be applied, and a new archive would be generated with
only the filtered-in (or filtered-out) content. I.e. basically:

    $ make legal-info
        for pkg in PACKAGES:
            if pkg.FOO_LEGAL_INFO_FILTER_REGEXP is not set:
                copy dl/foo-version.tar.gz to legal-info/foo-version/foo-version.tar.gz
                continue
            extract dl/foo-version.tar.gz \
                into temp-dir/ \
                if file matches pkg.FOO_LEGAL_INFO_FILTER_REGEXP
            create legal-info/foo-version/foo-version.tar.gz \
                from temp-dir/

Regards,
Yann E. MORIN.
Christian Stewart Sept. 3, 2020, 9:47 p.m. UTC | #14
Hi Yann,

On Thu, Sep 3, 2020 at 1:44 PM Yann E. MORIN <yann.morin.1998@free.fr> wrote:
> On 2020-09-03 12:40 -0700, Christian Stewart spake thusly:
> > In what world would a Buildroot package ever be added as an in-tree
> > package with a proprietary library *copied into the source code tree*
> > ??
>
> I never talked about upstream Buildroot.
>
> But consider that people would take Buildroot, put it in git in their
> internal git server, and modify it by adding local packages to it. Or
> they could also use a br2-external tree.

People are going to take Buildroot and store a private copy of it on
their internal server, make modifications?

All of this on a GPLv2 licensed project? I thought this wasn't legal?

For br2-external packages you can relax the LICENSE requirement.

> The packages at stake here are non-public packages that people write as
> part of their day-time job in their company. It is totally possible that
> someone believes it would be "easier" to have the source of a dependency
> bundled in their proprietary package.

That's fine, but that would go into br2-external as you've said.

> Which is *exactly* the case about a proprietary package vendoring a set
> of external libraries.

If a proprietary package is importing some external libraries that may
be permissively licensed, even requiring redistribution in source
form, without the proprietary section - how do you redistribute those
dependencies separately?

> > We're enforcing hash checks on these bundles. The format may not
> > always be the same across versions. Storing the source code before
> > it's extracted into a vendor tree is the only way to be sure that the
> > hashes won't change between iterations of the package manager.
>
> If a package vendors unversioned dependencies, then indeed we can't add
> hashes for that package, because two builds may get different stuff for
> each such vendored dependency, like we don't add hashes for stuff we
> know is not reproducible (bzr, cvs, hg for example).

I don't understand what you're saying here. It should not be possible
to have the package manager bring in arbitrary dependencies at build
time. Buildroot builds are meant to produce the same output every
time, right?

> > It's
> > also the only way to redistribute the source code packages for the
> > libraries independently from the proprietary part,
>
> Except as I explained, it does not work in case the dependencies have
> dependencies to other proprietary packages, at an arbitrary depth...

I don't understand what you're saying here.

Package A (in buildroot) imports package B. Package B imports
proprietary package C.

Result: three tarballs, package-a-1.0.1.tar.gz,
package-b-0.0.5.tar.gz, package-c-fooversion.tar.gz.

> With my proposal, it would not be: there would be a single archive, for
> which we have hashes. Then when we call legal-info, the package filter
> is applied to generate a new archive in legal-info, which only contains
> the filtered files.

Yes this is simpler but it won't work in every case. The vendor tree
or the node_modules tree might have some minor things changed about it
which will break the hash. Node-modules also often contains symlinks
and binaries prepared for the particular host build system.

> And in the output of legal-info, we do not store the hashes from the
> infra, we calculate the hashes already:
>
>     https://git.buildroot.org/buildroot/tree/Makefile#n870
>
> ... so we do not need to have hashes of download archives match the
> legal-info archives.

I don't agree that legal is the only thing that matters here; you also
want to be sure that you'll have a Buildroot build that works every
time, without internet access, if you have a full "make download" pass
run in advance.

> > It's the only way to deduplicate downloads of identical
> > package versions, to do LICENSE checks on dependencies, etc etc etc.
>
> That would not de-duplicate, because the separate archives would end up
> in $(FOO_DL_DIR), which is $(BR2_DL_DIR)/foo/ anyway.

I don't understand what you're saying here. If I download package-c
dependency at 1.0.4 it will be under - for example -
$(BR2_DL_DIR)/go-modules/package-c/package-c-1.0.4.tar.gz. The
deduplication is for package dependencies with identical versions and
identical source URLs.


> Just running the package manager and compressing the result is the
> easiest and simplest solution, that will work reliably and consistently
> across all cases.

I don't agree: there are tons of cases where simply compressing the
result after running "npm install" or "go mod vendor" will not
necessarily work.

You're also going to need to download tons of dependencies for
features of the program that you may not even have enabled in your
Buildroot config.

> > > > >   - at extract step, how do you know that you have to also extract the
> > > > >     archive with the vendor stuff? Probably easy by looking if said
> > > > >     archive exists. But then, if each dependency is stored in its own
> > > > >     archive, how do you know which to extract?
> >
> >  - Extract the main package
> >  - Check the package.json or go.mod or cargo or whatever
> >  - Extract the relevant stuff into a format the package manager understands
>
> This is what I mean by "reinventing the logic of the package managers".
> Because this one go.mod would refer to dependencies that may have their
> own dependencies, so we'd have to look at the go.mod of each
> dependency, recursively... Well, I think the top-level go.mod has all
> the info, but what of the others?

This is already implemented as a library in Go. You don't have to
re-do it from scratch.

https://pkg.go.dev/golang.org/x/tools/go/packages?tab=doc

The top-level go.mod and go.sum have all information on transitive and
indirect dependencies.
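
To illustrate, here is a rough, untested sketch (not a worked-out
Buildroot integration) of how that library could enumerate every module
pinned by go.mod/go.sum:

    // listdeps.go: hedged sketch of enumerating a module's full
    // dependency graph via golang.org/x/tools/go/packages.
    package main

    import (
        "fmt"
        "log"

        "golang.org/x/tools/go/packages"
    )

    func main() {
        cfg := &packages.Config{
            // NeedModule fills in pkg.Module with the path/version
            // pinned by go.mod / go.sum.
            Mode: packages.NeedName | packages.NeedModule,
            Dir:  ".", // run inside the extracted package source tree
        }
        pkgs, err := packages.Load(cfg, "all") // "all": every transitive dep
        if err != nil {
            log.Fatal(err)
        }
        seen := map[string]bool{}
        for _, p := range pkgs {
            if p.Module == nil || seen[p.Module.Path] {
                continue // stdlib has no module; skip duplicates
            }
            seen[p.Module.Path] = true
            fmt.Printf("%s %s\n", p.Module.Path, p.Module.Version)
        }
    }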

> >  - Run the package manager from the language to assemble the "vendor"
> > tree in the source dir (maybe same step).
>
>     go mod vendor
>
> And that is all, I don't even need to look at go.mod and parse it to
> know where to extract things; or even where to download them from.

And how is this better than running a Go program which understands how
to download dependencies into the .tar.gz format that we expect, and
to fetch them back again from that format into the Go module cache,
and then the vendor/ tree?

> > > > You would parse the go.mod file I suppose, but that doesn't give you
> > > > indirect dependencies. Perhaps some Go tool can help with that ? But
> > > > indeed, that's a good question.
> > ?? go.mod handles indirect dependencies.
> > > And what about cargo? npm? php composer? Others? (what, there are
> > > others? ;-] )
> > package.lock yarn.lock. Require it.
>
> I guess package.lock is for npm. No idea what yarn is. Still, that's only
> two out of at least three...

Are you saying it's not possible to collect an index of indirect
dependencies with those?

> > > I do not want to have to repeat the vendoring logic in Buildroot.
> > Why repeat it? Re-use it from the programming language! Not everything
> > has to be in bash.
>
> It's not about the language; it's about the logic.

I don't understand what you mean.

> > > Also, I do not want that we have various level of vendoring support for
> > > the various package managers.
> > OK, so we implement it across the board, which language would not be
> > able to support this?
>
> npm is notorious for having very bad behaviour wrt vendoring dependencies,
> for example (in my limited suffering^Wexperience packaging npm stuff, I
> have to admit)

And this is exactly why compressing the node_modules is not enough.

> > > > >   - when you generate the legal-info/ directory, how do you know what to
> > > > >     put in there for that package? You are back to the problem above,
> > > > >     plus you would also want to ignore those vendored deps that are not
> > > > >     redistributable, although we have no way in Buildroot to describe
> > > > >     that either....
> > Use the license field in the package.json or wherever the specifiers
> > exist, and if they aren't there, detect common LICENSE file names, if
> > you can't find anything, fail.
>
> How do we know whether such or such vendored dependency has to be
> redistributed?
>
> But is license "(C) BIG CORP" a redistributable license or not?

If you run "make source" it collects source for everything to produce
the build, correct?

So, in this case we would collect everything needed for the build and
scan the LICENSE files. If the package is in the Buildroot tree, fail if
we don't recognize all the LICENSE files (allowing for manual override
of course); if it's in buildroot-ext, assume anything without a LICENSE
or with an unrecognized LICENSE is not redistributable and show a
warning.

You wouldn't put anything proprietary into Buildroot proper since it's
a GPLv2 project. It would be an extension package.

> > Go has a few very robust license detector packages. (if desired).
>
> It is not only about detecting the license (which is however a very
> important step, indeed), but it is about deciding whether to
> redistribute it or not.
>
> If we assume that all vendored stuff is only FLOSS and can only be
> FLOSS, then that is OK: we redistribute everything that is vendored.
>
> But that is not the case: if a proprietary package vendors another
> proprietary package, how do we know that we should not redistribute that
> second package as well? Knowing the license name is *not* enough to
> decide; only a human can tell.

OK, so you put a manual override to block things from being included
in the redistributable... I still don't see how compressing the entire
source tree after "npm install" or "go mod vendor" would address this;
in this case you're going to unconditionally include all proprietary
code into that package and redistribute it no matter what.

> > > So, if we just concentrate on how we can help people do exactly that:
> > > filter out the bits they do not want to redistribute?
> > >
> > > One solution would be to have packages provide some legal-info hooks,
> > > something like (e.g.: only keep files which names match the regexp):
> > >
> > >     FOO_LEGAL_INFO_FILTER_REGEXP = ^vendor/FLOSS/
> > >
> > > Or whatever, that would be applied at the time the legal-info is
> > > generated.
> >
> > How does this solve the problem? If I need to give the source tarballs
> > away for dependencies, and it's all mixed into one massive tarball,
> > you can't separate things out and keep the hashes the same.
>
> It solves the problem that the legal-info/ directory only contains what
> you accept to redistribute.

This makes sense. For Go this would probably be better as a regexp on
the package import path - for example, ^github.com/myprivateorg/ would
deny all of the myprivateorg packages from inclusion in legal-info.
This is quite similar to the GOPRIVATE variable in Go.
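
As a rough illustration of that filter idea (the deny pattern and the
module list below are made up for the example):

    // filter.go: hedged sketch of a GOPRIVATE-style deny filter,
    // deciding which vendored modules may go into legal-info.
    package main

    import (
        "fmt"
        "regexp"
    )

    func main() {
        // Hypothetical value of something like FOO_LEGAL_INFO_FILTER_REGEXP.
        deny := regexp.MustCompile(`^github\.com/myprivateorg/`)
        modules := []string{
            "github.com/myprivateorg/secretlib", // made-up private module
            "github.com/pkg/errors",             // public FLOSS module
        }
        for _, m := range modules {
            if deny.MatchString(m) {
                fmt.Println("excluded from legal-info:", m)
                continue
            }
            fmt.Println("redistribute:", m)
        }
    }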

> > I thought the requirement was that you would be able to send someone
> > the buildroot "dl" directory and be able to perform a build without
> > network fetches.
>
> Wait, you are confusing the two: the content of dl/ which is used at
> build time, and from which we extract the sources that are built, and
> the content of legal-info/ which contains what you should provide to
> be in compliance with license terms.

I agree, I think these are two things that are being mixed here.

> > > Paint me unconvinced.
> > What's the alternative?
>
> Please re-review my proposal: the content of dl/ would always contain
> everything unmolested. It is only when calling 'make legal-info' that
> the filtering would be applied, and a new archive would be generated with
> only the filtered-in (or filtered-out) content. I.e. basically:
>
>     $ make legal-info
>         for pkg in PACKAGES:
>             if pkg.FOO_LEGAL_INFO_FILTER_REGEXP is not set:
>                 copy dl/foo-version.tar.gz to legal-info/foo-version/foo-version.tar.gz
>                 continue
>             extract dl/foo-version.tar.gz \
>                 into temp-dir/ \
>                 if file matches pkg.FOO_LEGAL_INFO_FILTER_REGEXP
>             create legal-info/foo-version/foo-version.tar.gz \
>                 from temp-dir/

This handles legal-info but not "make download" or "make extract" or
"make source".

Probably best approach is some combination of the two?

Best regards,
Christian
Yann E. MORIN Sept. 4, 2020, 8:06 a.m. UTC | #15
Christian, All,

On 2020-09-03 14:47 -0700, Christian Stewart spake thusly:
> On Thu, Sep 3, 2020 at 1:44 PM Yann E. MORIN <yann.morin.1998@free.fr> wrote:
> > On 2020-09-03 12:40 -0700, Christian Stewart spake thusly:
[--SNIP--]
> People are going to take Buildroot and store a private copy of it on
> their internal server, make modifications?
> 
> All of this on a GPLv2 licensed project? I thought this wasn't legal?

Of course this is totally legit. You do it all the time when you git
clone to your local machine and do modifications, for example.

> For br2-external packages you can relax the LICENSE requirement.

But in the Buildroot infra, we do not treat packages from br2-external
differently from in-tree packages.

> > The packages at stake here are non-public packages that people write as
> > part of their day-time job in their company. It is totally possible that
> > someone believes it would be "easier" to have the source of a dependency
> > bundled in their proprietary package.
> 
> That's fine, but that would go into br2-external as you've said.

No.

> > Which is *exactly* the case about a proprietary package vendoring a set
> > of external libraries.
> 
> If proprietary package is importing some external libraries that may
> be permissively licensed, even requiring redistribution in source
> form, without the proprietary section - how do you redistribute those
> dependencies separately?

Redistribution is the role of legal-info, see below how I suggest this
is handled.

> > > We're enforcing hash checks on these bundles. The format may not
> > > always be the same across versions. Storing the source code before
> > > it's extracted into a vendor tree is the only way to be sure that the
> > > hashes won't change between iterations of the package manager.
> >
> > If a package vendors unversioned dependencies, then indeed we can't add
> > hashes for that package, because two builds may get different stuff for
> > each such vendored dependency, like we don't add hashes for stuff we
> > know is not reproducible (bzr, cvs, hg for example).
> 
> I don't understand what you're saying here. It should not be possible
> to have the package manager bring in arbitrary dependencies at build
> time. Buildroot builds are meant to produce the same output every
> time, right?

For example, dependencies in npm are loose, where you can say "I need
package bar at version 1.x". So at some point, the 'x' in '1.x' will
match the latest '1.1' and use that, but the next day, '1.2' might get
released, and the 'x' would match that. So if bar 1.2 brings in a new
dependency, or a new version of an existing dependency, two builds do not
provide the same output and are thus not reproducible.

See:
    https://docs.npmjs.com/files/package.json#dependencies
    https://docs.npmjs.com/misc/semver

> > > It's
> > > also the only way to redistribute the source code packages for the
> > > libraries independently from the proprietary part,
> >
> > Except as I explained, it does not work in case the dependencies have
> > dependencies to other proprietary packages, at an arbitrary depth...
> 
> I don't understand what you're saying here.
> 
> Package A (in buildroot) imports package B. Package B imports
> proprietary package C.

That is the other way around: the top-level package is proprietary, and
it imports FLOSS packages:

  - foo is proprietary
    - foo vendors bar
      - bar is proprietary
        - bar vendors buz, which is FLOSS (e.g. MIT, LGPL...)
          - buz vendors ni
            - ni is FLOSS...
    - foo vendors doh
      - doh is FLOSS...
        - doh vendors bla
          - bla is FLOSS...

> > With my proposal, it would not be: there would be a single archive, for
> > which we have hashes. Then when we call legal-info, the package filter
> > is applied to generate a new archive in legal-info, which only contains
> > the filterd files.
> 
> Yes this is simpler but it won't work in every case. The vendor tree
> or the node_modules tree might have some minor things changed about it
> which will break the hash.

Then if the package manager can not generate reproducible archives, we
can't have hashes for it, period.

> Node-modules also often contains symlinks
> and binaries prepared for the particular host build system.

But those are created at build time, not at download time, right? Well,
node is so awful, that I would not be surprised...

> > And in the output of legal-info, we do not store the hashes from the
> > infra, we calculate the hashes already:
> >
> >     https://git.buildroot.org/buildroot/tree/Makefile#n870
> >
> > ... so we do not need to have hashes of download archives match the
> > legal-info archives.
> 
> I don't agree that legal is the only thing that matters here; you also
> want to be sure that you'll have a Buildroot build that works every
> time, without internet access, if you have a full "make download" pass
> run in advance.

Exactly the role of the dl/ directory; and that would contain the
complete downloaded archives with all the vendored dependencies.

> > > It's the only way to deduplicate downloads of identical
> > > package versions, to do LICENSE checks on dependencies, etc etc etc.
> >
> > That would not de-duplicate, because the separate archives would end up
> > in $(FOO_DL_DIR), which is $(BR2_DL_DIR)/foo/ anyway.
> 
> I don't understand what you're saying here. If I download package-c
> dependency at 1.0.4 it will be under - for example -
> $(BR2_DL_DIR)/go-modules/package-c/package-c-1.0.4.tar.gz. The
> deduplication is for package dependencies with identical versions and
> identical source URLs.

Ah, because you store all the go vendored dependencies "away" from the
package.

I am not sure I like that, because it breaks the semantics of the dl/
directory: all the source needed by one package is in dl/foo/ and the
vendored dependencies *are* part of the package.

So I am absolutely not in favour of storing the go modules on the side.

> > Just running the package manager and compressing the result is the
> > easiest and simplest solution, that will work reliably and consistently
> > across all cases.
> 
> I don't agree: there are tons of cases where simply compressing the
> result after running "npm install" or "go mod vendor" will not
> necessarily work.

Why?

> You're also going to need to download tons of dependencies for
> features of the program that you may not even have enabled in your
> Buildroot config.

So, when you download the linux kernel, there are tons of code for tons
of features you don't need.

Really, vendoring is, to me, exactly like bundling, except that the
aggregation happens on the client side, at download time, rather than on
the developer's machine that pushes a pre-aggregated "repository".

> > > > > >   - at extract step, how do you know that you have to also extract the
> > > > > >     archive with the vendor stuff? Probably easy by looking if said
> > > > > >     archive exists. But then, if each dependency is stored in its own
> > > > > >     archive, how do you know which to extract?
> > >
> > >  - Extract the main package
> > >  - Check the package.json or go.mod or cargo or whatever
> > >  - Extract the relevant stuff into a format the package manager understands
> >
> > This is what I mean by "reinventing the logic of the package managers".
> > Because this one go.mod would refer to dependencies that may have their
> > own dependencies, so we'd have to look at the go.mod of each
> > dependency, recursively... Well, I think the top-level go.mod has all
> > the info, but what of the others?
> 
> This is already implemented as a library in Go. You don't have to
> re-do it from scratch.
> 
> https://pkg.go.dev/golang.org/x/tools/go/packages?tab=doc

But I don't want us to even have to deal with the internals of the go
module stuff at all, that's what I'm saying.

I don't want us to have to write a script (in whatever language) that has
to deal with the go module internals, to reproduce the logic.

go mod vendor is all that we should ever need to create the archives.

> The top-level go.mod and go.sum have all information on transitive and
> indirect dependencies.

Good for Go. Last I had to play with ^W^W suffer with npm stuff, that was
not the case IIRC: a package would only list its first-level
dependencies, and the install would recurse into that... And since the
versions of dependencies are floating, there is no way you can know what
you'd need ahead of time.

So again, I want that we have a consistent handling of all the package
managers, that they all behave the same.

> > >  - Run the package manager from the language to assemble the "vendor"
> > > tree in the source dir (maybe same step).
> >
> >     go mod vendor
> >
> > And that is all, I don't even need to look at go.mod and parse it to
> > know where to extract things; or even where to download them from.
> 
> And how is this better than running a Go program which understands how
> to download dependencies into the .tar.gz format that we expect, and
> to fetch them back again from that format into the Go module cache,
> and then the vendor/ tree?

Because writing a go program (or any other language) is duplicating the
logic of 'go mod vendor'.

> > I guess package.lock is for npm. No idea what yarn is. Still, that's only
> > two out of at least three...
> Are you saying it's not possible to collect an index of indirect
> dependencies with those?

IIRC, for NPM, no. Or not trivially, or not reproducibly.

> > > > I do not want to have to repeat the vendoring logic in Buildroot.
> > > Why repeat it? Re-use it from the programming language! Not everything
> > > has to be in bash.
> > It's not about the language; it's about the logic.
> I don't understand what you mean.

The logic of vendoring.

For example, this could be a logic:

    def fill_deps(dep_file):
        for dep in file.read(dep_file):
            download(dep)
            fill_deps(dep.dependencies)

    def main():
        fill_deps(load_main_deps())

I do not want we do that, because it will be a maintenance burden, and
is duplicating the logic the package managers are there to cover in the
first place!

> > How do we know whether such or such vendored dependency has to be
> > redistributed?
> >
> > But is license "(C) BIG CORP" a redistributable license or not?
> 
> If you run "make source" it collects source for everything to produce
> the build, correct?
> 
> So, in this case we would collect everything needed for the build and
> scan the LICENSE files. If the package is in the Buildroot tree, fail if
> we don't recognize all the LICENSE files (allowing for manual override
> of course); if it's in buildroot-ext, assume anything without a LICENSE
> or with an unrecognized LICENSE is not redistributable and show a
> warning.
> 
> You wouldn't put anything proprietary into Buildroot proper since it's
> a GPLv2 project. It would be a extension package.

We do have proprietary packages in Buildroot:

    boot/s500-bootloader/
    package/armbian-firmware/
    package/nvidia-driver/
    package/wilc1000-firmware/

And quite a few others...

IANAL and all disclaimers... And no, this is not a violation of the
GPLv2 at all: Buildroot is not a derived work of those, nor are those
a derived work of Buildroot.

(Note: I dropped the rest of the mail because I don't have time to reply
to it right now, and I am afraid I would anyway re-hash what I already
said...)

Regards,
Yann E. MORIN.
Christian Stewart Sept. 4, 2020, 4:07 p.m. UTC | #16
Hi Yann,

On Fri, Sep 4, 2020 at 1:06 AM Yann E. MORIN <yann.morin.1998@free.fr> wrote:
> On 2020-09-03 14:47 -0700, Christian Stewart spake thusly:
> > On Thu, Sep 3, 2020 at 1:44 PM Yann E. MORIN <yann.morin.1998@free.fr> wrote:
> > People are going to take Buildroot and store a private copy of it on
> > their internal server, make modifications?
>
> Of course this is totally legit. You do it all the time when you git
> clone to your local machine and do modifications, for example.

OK.

> Redistribution is the role of legal-info, see below how I suggest this
> is handled.

This isn't really my domain (legal-info) and what you've suggested -
picking the dependencies out of vendor/ - might work for this.

However, how do you know which version of the dependency is coming out
of vendor/ ? I suppose you're going to redistribute
my-package-vendor.tar.gz with anything proprietary excluded? So
ultimately you redistribute 15 copies of the same thing?

> > I don't understand what you're saying here. It should not be possible
> > to have the package manager bring in arbitrary dependencies at build
> > time. Buildroot builds are meant to produce the same output every
> > time, right?
>
> For example, dependencies in npm are loose, where you can say "I need
> package bar at version 1.x". So at some point, the 'x' in '1.x' will
> match the latest '1.1' and use that, but the next day, '1.2' might get
> released, and the 'x' would match that. So if bar 1.2 brings in a new
> dependency, or a new version of an existing dependency, two builds do not
> provide the same output and are thus not reproducible.

We're really going to add packages to Buildroot which will have fuzzy
dependencies and might bring in something different if npm is having a
bad day?

For third party packages, this makes sense - you wouldn't have a hash
on it. But for packages in the Buildroot tree you would probably
expect a hash + a lock file. The same goes for Go with go.mod and
go.sum - without go.sum you can't be sure it will be the same every
time, and should not have a hash there.

Even with package-lock.json, the node_modules will not necessarily
produce the same hash every time, particularly across different OS
versions. So, are you saying we're not going to put hashes on
/anything/ that uses npm?

Go Modules is designed around always downloading the exact same
dependencies, and our approach for that language can at least be built
around that assumption (that it's going to use the go.sum every time
to produce the same dependency output for most Buildroot in-tree
packages).
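
For reference, a small untested sketch of listing the exact module
versions pinned by a go.sum (each line there is "<module> <version>
<hash>", with "/go.mod" appended to the version on the lines that hash
only the go.mod file):

    // sumlist.go: hedged sketch that prints the module@version pairs
    // pinned by go.sum, which is what makes the dependency set fixed.
    package main

    import (
        "bufio"
        "fmt"
        "log"
        "os"
        "strings"
    )

    func main() {
        f, err := os.Open("go.sum")
        if err != nil {
            log.Fatal(err)
        }
        defer f.Close()
        seen := map[string]bool{}
        sc := bufio.NewScanner(f)
        for sc.Scan() {
            fields := strings.Fields(sc.Text())
            if len(fields) != 3 {
                continue // skip malformed or empty lines
            }
            mod, ver := fields[0], strings.TrimSuffix(fields[1], "/go.mod")
            if key := mod + "@" + ver; !seen[key] {
                seen[key] = true
                fmt.Println(key)
            }
        }
        if err := sc.Err(); err != nil {
            log.Fatal(err)
        }
    }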

> > > > It's
> > > > also the only way to redistribute the source code packages for the
> > > > libraries independently from the proprietary part,
> > >
> > > Except as I explained, it does not work in case the dependencies have
> > > dependencies to other proprietary packages, at an arbitrary depth...
> >
> > Package A (in buildroot) imports package B. Package B imports
> > proprietary package C.
>
> That is the other way around: the top-level package is proprietary, and
> it imports FLOSS packages:
>
>   - foo is proprietary
>     - foo vendors bar
>       - bar is proprietary
>         - bar vendors buz, which is FLOSS (e.g. MIT, LGPL...)
>           - buz vendors ni
>             - ni is FLOSS...
>     - foo vendors doh
>       - doh is FLOSS...
>         - doh vendors bla
>           - bla is FLOSS...

foo is proprietary - download to foo-version.tar.gz

foo selects bar - download to bar-version.tar.gz.

bar selects baz - download to baz-version.tar.gz

foo selects baz - we already have it in baz-version.tar.gz.

You can package them separately. Yes, you need to look into the first
archive to know what the dependencies are. But, you need to do this
with the vendoring anyway.

> > Yes this is simpler but it won't work in every case. The vendor tree
> > or the node_modules tree might have some minor things changed about it
> > which will break the hash.
>
> Then if the package manager can not generate reproducible archives, we
> can't have hashes for it, period.

No hashes for node_modules across all node_modules packages? :(

Buildroot packages that are actually merged into mainline, I would
expect to produce reproducible output and have hashes on their
downloaded source code files. It doesn't make sense to have a mainline
package which is fetching random stuff due to fuzzy semver specifiers.

For external or third party packages, those wouldn't have source code
hashes, and that makes sense.

For Go at least it will always be possible to make a hashed set of
source code archives. It's possible to download & compress each
dependency independently for Go as well, and analyze the licenses and
whatever else.

> > Node-modules also often contains symlinks
> > and binaries prepared for the particular host build system.
>
> But those are created at build time, not at download time, right? Well,
> node is so awful, that I would not be surprised...

No - download time - the post install hooks are run. For example,
electron might download + extract a tarball with Chromium.

> > I don't agree that legal is the only thing that matters here; you also
> > want to be sure that you'll have a Buildroot build that works every
> > time, without internet access, if you have a full "make download" pass
> > run in advance.
>
> Exactly the role of the dl/ directory; and that would contain the
> complete downloaded archives with all the vendored dependencies.

You want to have a download archive for every Buildroot package which
contains ALL the dependencies for the package, and make this the ONLY
way to do this?

Even when node_modules won't necessarily work on some machines without
running "npm install" again?

The storage space increase alone of storing the entire node_modules
for multiple packages across potentially many versions is a reason to
at least consider an alternative.

There's a way to split dependency source tarballs properly - even in
Node - and even if it was impossible in Node, the limitations of Node
shouldn't prevent us from doing this for a language that /can/ support
it like Go.

I understand that the goal is to keep things as simple as possible and
avoid adding any work for the maintainers, and indeed this makes
sense.

> > I don't understand what you're saying here. If I download package-c
> > dependency at 1.0.4 it will be under - for example -
> > $(BR2_DL_DIR)/go-modules/package-c/package-c-1.0.4.tar.gz. The
> > deduplication is for package dependencies with identical versions and
> > identical source URLs.
>
> Ah, because you store all the go vendored dependencies "away" from the
> package.
>
> I am not sure I like that, because it breaks the semantics of the dl/
> directory: all the source needed by one package is in dl/foo/ and the
> vendored dependencies *are* part of the package.

We aren't actually running "go mod vendor" or "npm install" and
storing the result in the tarballs (as far as I am aware) today.

> So I am absolutely not in favour of storing the go modules on the side.

Just to clarify, so I understand, the reasons not to are ?

 - Currently we can say that if you extract a .tar.gz from dl it will
be buildable
 - It's too hard to add code to manage dependencies
 - Some packages don't use locking and are getting different deps every time

And you're willing to take the downsides of mixing together all sorts
of proprietary and FLOSS code into these .tar.gz download files,
making it next to impossible to independently redistribute
dependencies from proprietary Go packages (for example?) and
increasing the size of the dl/ tree many times over?

> > You're also going to need to download tons of dependencies for
> > features of the program that you may not even have enabled in your
> > Buildroot config.
>
> So, when you download the linux kernel, there are tons of code for tons
> of features you don't need.
>
> Really, vendoring is, to me, exactly like bundling, except that the
> aggregation happens on the client side, at download time, rather than on
> the developer's machine that pushes a pre-aggregated "repository".

It's entirely avoidable to store duplicate copies of the dependencies
for Go programs many times over in vendor/ trees compressed into the
.tar.gz with the root project.

Linux being a large project with tons of code you don't build, doesn't
necessarily preclude making some alternative when possible.

> > > This is what I mean by "reinventing the logic of the package managers".
> > > Because this one go.mod would refer to dependencies that may have their
> > > own dependencies, so we'd have to look at the go.mod of each
> > > dependency, recursively... Well, I think the top-level go.mod has all
> > > the info, but what of the others?
> >
> > This is already implemented as a library in Go. You don't have to
> > re-do it from scratch.
> >
> > https://pkg.go.dev/golang.org/x/tools/go/packages?tab=doc
>
> But I don't want us to even have to deal with the internals of the go
> module stuff at all, that's what I'm saying.
>
> I don't want us to have to write a script (in whatever language) that has
> to deal with the go module internals, to reproduce the logic.

The packages and tools make it easy to do this without getting into
the internals, those are very high level APIs there.

I guess the only way to prove this particular point is to just write
the scripts as a RFC?

> > The top-level go.mod and go.sum have all information on transitive and
> > indirect dependencies.
>
> Good for Go. Last I had to play with ^W^W suffer with npm stuff, that was
> not the case IIRC: a package would only list its first-level
> dependencies, and the install would recurse into that... And since the
> versions of dependencies are floating, there is no way you can know what
> you'd need ahead of time.

I was not aware that you would ever add a package to Buildroot which
uses floating semver selections, and could get a different version
between "make" executions. Please remind me to never build or install
any of those packages :)

> Because writing a go program (or any other language) is duplicating the
> logic of 'go mod vendor'.

But we would still be running "go mod vendor" - the idea is to
pre-fill the Go modules cache from the Buildroot dl tree, and avoid
having tarballs in the dl tree that contain source code from multiple
projects simultaneously.
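
A very rough sketch of that flow (the paths and cache location below
are assumptions for illustration, not a proposed final layout; the
GOMODCACHE variable needs a recent Go toolchain):

    // vendorfill.go: hedged sketch of running "go mod vendor" with the
    // module cache pointed at a directory pre-filled from the dl tree.
    package main

    import (
        "log"
        "os"
        "os/exec"
    )

    func main() {
        cmd := exec.Command("go", "mod", "vendor")
        cmd.Dir = "/path/to/extracted/package" // hypothetical $(@D)
        cmd.Env = append(os.Environ(),
            // Hypothetical shared cache under the Buildroot dl tree.
            "GOMODCACHE=/path/to/dl/go-module-cache",
        )
        cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
        if err := cmd.Run(); err != nil {
            log.Fatal(err)
        }
    }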

The goal is also to avoid breaking package source code download hashes
after upgrading the tool, due to a change in the format of vendor/, to
reduce source code tarball sizes, to make it easier to separate out
proprietary and FLOSS components, and to ensure that the build is
reproducible.

I'll build & submit a RFC prototype so that it's clearer what I'm
actually suggesting here.

> > > I guess package.lock is for npm. No idea what yarn is. Still, that's only
> > > two out of at least three...
> > Are you saying it's not possible to collect an index of indirect
> > dependencies with those?
>
> IIRC, for NPM, no. Or not trivially, or not reproducibly.

You absolutely can collect a manifest of dependencies reproducibly and
easily with the package-lock.json.
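
For instance, a rough untested sketch of walking a v1-format
package-lock.json (where nested dependencies appear under a
"dependencies" object) to print the pinned versions:

    // locklist.go: hedged sketch of enumerating the dependency versions
    // pinned by an npm package-lock.json (lockfile v1 layout assumed).
    package main

    import (
        "encoding/json"
        "fmt"
        "log"
        "os"
    )

    type lockDep struct {
        Version      string             `json:"version"`
        Resolved     string             `json:"resolved"`
        Dependencies map[string]lockDep `json:"dependencies"` // nested deps
    }

    type lockFile struct {
        Dependencies map[string]lockDep `json:"dependencies"`
    }

    func walk(deps map[string]lockDep) {
        for name, d := range deps {
            fmt.Printf("%s@%s %s\n", name, d.Version, d.Resolved)
            walk(d.Dependencies) // recurse into nested dependency trees
        }
    }

    func main() {
        f, err := os.Open("package-lock.json")
        if err != nil {
            log.Fatal(err)
        }
        defer f.Close()
        var lf lockFile
        if err := json.NewDecoder(f).Decode(&lf); err != nil {
            log.Fatal(err)
        }
        walk(lf.Dependencies)
    }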

> > > > > I do not want to have to repeat the vendoring logic in Buildroot.
> > > > Why repeat it? Re-use it from the programming language! Not everything
> > > > has to be in bash.
> > > It's not about the language; it's about the logic.
> > I don't understand what you mean.
>
> The logic of vendoring.

All of which is implemented in these languages already in a format
that is at a very high level and will require little to no
"reinventing the wheel" from us. (at least, for Go)

> > You wouldn't put anything proprietary into Buildroot proper since it's
> > a GPLv2 project. It would be an extension package.
>
> We do have proprietary packages in Buildroot:
>
>     boot/s500-bootloader/
>     package/armbian-firmware/
>     package/nvidia-driver/
>     package/wilc1000-firmware/
>
> And quite a few others...

OK, I see what you mean by proprietary.

> (Note: I dropped the rest of the mail because I don't have time to reply
> to it right now, and I am afraid I would anyway re-hash what I already
> said...)

I'll get back to you all with an RFC patch for the Go approach. I can't
speak for node, but allowing Node to have unhashed fuzzy semver
specifiers in Buildroot is not something I can recommend, since it
seems almost like a security issue.

Best regards,
Christian Stewart
Sam Voss Sept. 4, 2020, 8:25 p.m. UTC | #17
Hey everybody,

Was a long end of the week, and I finally had a minute to catch back up.

On Fri, Sep 4, 2020 at 11:07 AM Christian Stewart <christian@paral.in> wrote:
>
[snip]
> > Redistribution is the role of legal-info, see below how I suggest this
> > is handled.
>
> This isn't really my domain (legal-info) and what you've suggested -
> picking the dependencies out of vendor/ - might work for this.
>
> However, how do you know which version of the dependency is coming out
> of vendor/ ? I suppose you're going to redistribute
> my-package-vendor.tar.gz with anything proprietary excluded? So
> ultimately you redistribute 15 copies of the same thing?

I think that is an unfortunate consequence, yes: we will have some
overlap of released things if two top-level packages share a similar
dep chain. I think for now that should not be the focus of the topic,
as it's an unfortunate side-effect, and disk space is cheap.
>
> > > I don't understand what you're saying here. It should not be possible
> > > to have the package manager bring in arbitrary dependencies at build
> > > time. Buildroot builds are meant to produce the same output every
> > > time, right?
> >
> > For example, dependencies in npm are loose, where you can say "I need
> > package bar at version 1.x". So at some point, the 'x' in '1.x' will
> > match the latest '1.1' and use that, but the next day, '1.2' might get
> > released, and the 'x' would match that. So if bar 1.2 brings in a new
> > dependency, or a new version of an existing dependency, two builds do not
> > provide the same output and are thus not reproducible.
>
> We're really going to add packages to Buildroot which will have fuzzy
> dependencies and might bring in something different if npm is having a
> bad day?
>
> For third party packages, this makes sense - you wouldn't have a hash
> on it. But for packages in the Buildroot tree you would probably
> expect a hash + a lock file. The same goes for Go with go.mod and
> go.sum - without go.sum you can't be sure it will be the same every
> time, and should not have a hash there.

Cargo+rust also uses a lock file, and the patchset proposed by Patrick
uses the "lock" flag to disallow buildroot from "upgrading" any
packages on the fly. It must use what is in the lock.

>
> Even with package-lock.json, the node_modules will not necessarily
> produce the same hash every time, particularly across different OS
> versions. So, are you saying we're not going to put hashes on
> /anything/ that uses npm?
>
> Go Modules is designed around always downloading the exact same
> dependencies, and our approach for that language can at least be built
> around that assumption (that it's going to use the go.sum every time
> to produce the same dependency output for most Buildroot in-tree
> packages).
>
> > > > > It's
> > > > > also the only way to redistribute the source code packages for the
> > > > > libraries independently from the proprietary part,
> > > >
> > > > Except as I explained, it does not work in case the dependencies have
> > > > dependencies to other proprietary packages, at an arbitrary depth...
> > >
> > > Package A (in buildroot) imports package B. Package B imports
> > > proprietary package C.

I'm not sure this makes sense in the general case. I'm not sure how
we would anticipate this actually happening, where a FLOSS package
grabs something proprietary.

And to that point, maybe proprietary isn't what we should necessarily
be directly describing. In my mind it should be considered as
non-openly distributable.

> >
> > That is the other way around: the top-level package is proprietary, and
> > it imports FLOSS packages:
> >
> >   - foo is proprietary
> >     - foo vendors bar
> >       - bar is proprietary
> >         - bar vendors buz, which is FLOSS (e.g. MIT, LGPL...)
> >           - buz vendors ni
> >             - ni is FLOSS...
> >     - foo vendors doh
> >       - doh is FLOSS...
> >         - doh vendors bla
> >           - bla is FLOSS...
>
> foo is proprietary - download to foo-version.tar.gz
>
> foo selects bar - download to bar-version.tar.gz.
>
> bar selects baz - download to baz-version.tar.gz
>
> foo selects baz - we already have it in baz-version.tar.gz.
>
> You can package them separately. Yes, you need to look into the first
> archive to know what the dependencies are. But, you need to do this
> with the vendoring anyway.

I don't think I'm following what you're getting at here - if you mean
'select' at a buildroot-level, I think it makes sense that both foo
and bar have vendoring which includes baz, because they both need it.

This goes back to what I said before: while it isn't perfect, I think
it is going to be a fact of our first iteration. The alternative is
making this so convoluted that nobody is going to take it on, leading
to everybody carrying patchsets internally, with an RFC thrown out to
the mailing list and wishing whoever wants to handle it "good luck".

>
> > > Yes this is simpler but it won't work in every case. The vendor tree
> > > or the node_modules tree might have some minor things changed about it
> > > which will break the hash.
> >
> > Then if the package manager can not generate reproducible archives, we
> > can't have hashes for it, period.
>
> No hashes for node_modules across all node_modules packages? :(

To my understanding, a lot of node is kind of "wild west" still; we
had to pull stuff in through the "install node packages" string this
year, and handle all vendoring ourselves.

>
> Buildroot packages that are actually merged into mainline, I would
> expect to produce reproducible output and have hashes on their
> downloaded source code files. It doesn't make sense to have a mainline
> package which is fetching random stuff due to fuzzy semver specifiers.
>
> For external or third party packages, those wouldn't have source code
> hashes, and that makes sense.
>
> For Go at least it will always be possible to make a hashed set of
> source code archives. It's possible to download & compress each
> dependency independently for Go as well, and analyze the licenses and
> whatever else.
>
> > > Node-modules also often contains symlinks
> > > and binaries prepared for the particular host build system.
> >
> > But those are created at build time, not at download time, right? Well,
> > node is so awful, that I would not be surprised...
>
> No - download time - the post install hooks are run. For example,
> electron might download + extract a tarball with Chromium.
>
> > > I don't agree that legal is the only thing that matters here; you also
> > > want to be sure that you'll have a Buildroot build that works every
> > > time, without internet access, if you have a full "make download" pass
> > > run in advance.
> >
> > Exactly the role of the dl/ directory; and that would contain the
> > complete downloaded archives with all the vendored dependencies.
>
> You want to have a download archive for every Buildroot package which
> contains ALL the dependencies for the package, and make this the ONLY
> way to do this?
>
> Even when node_modules won't necessarily work on some machines without
> running "npm install" again?
>
> The storage space increase alone of storing the entire node_modules
> for multiple packages across potentially many versions is a reason to
> at least consider an alternative.

What "reasonable" alternative do we have at this point?

>
> There's a way to split dependency source tarballs properly - even in
> Node - and even if it was impossible in Node, the limitations of Node
> shouldn't prevent us from doing this for a language that /can/ support
> it like Go.

Agree. NPM is notoriously worse than other modern package managers.

>
> I understand that the goal is to keep things as simple as possible and
> avoid adding any work for the maintainers, and indeed this makes
> sense.
>
> > > I don't understand what you're saying here. If I download package-c
> > > dependency at 1.0.4 it will be under - for example -
> > > $(BR2_DL_DIR)/go-modules/package-c/package-c-1.0.4.tar.gz. The
> > > deduplication is for package dependencies with identical versions and
> > > identical source URLs.
> >
> > Ah, because you store all the go vendored dependencies "away" from the
> > package.
> >
> > I am not sure I like that, because it breaks the semantics of the dl/
> > directory: all the source needed by one package is in dl/foo/ and the
> > vendored dependencies *are* part of the package.

Semantics are semantics though, and a deviation shouldn't be thrown
out just due to them.

[snip]

> The goal is also to avoid breaking package source code download hashes
> after upgrading the tool, due to a change in the format of vendor/, to
> reduce source code tarball sizes, to make it easier to separate out
> proprietary and FLOSS components, and to ensure that the build is
> reproducible.
>
> I'll build & submit a RFC prototype so that it's clearer what I'm
> actually suggesting here.

Please reach out to me; I think we should be on reasonably the same
footing on an RFC, and we're going to want to leverage this for both Go
and Rust.

>
> > > > I guess package.lock is for npm. No idea what yarn is. Still, that's only
> > > > two out of at least three...
> > > Are you saying it's not possible to collect an index of indirect
> > > dependencies with those?
> >
> > IIRC, for NPM, no. Or not trivially, or not reproducibly.
>
> You absolutely can collect a manifest of dependencies reproducibly and
> easily with the package-lock.json.
>
> > > > > > I do not want to have to repeat the vendoring logic in Buildroot.
> > > > > Why repeat it? Re-use it from the programming language! Not everything
> > > > > has to be in bash.
> > > > It's not about the language; it's about the logic.
> > > I don't understand what you mean.
> >
> > The logic of vendoring.
>
> All of which is implemented in these languages already in a format
> that is at a very high level and will require little to no
> "reinventing the wheel" from us. (at least, for Go)
>
> > > You wouldn't put anything proprietary into Buildroot proper since it's
> > > a GPLv2 project. It would be an extension package.
> >
> > We do have proprietary packages in Buildroot:
> >
> >     boot/s500-bootloader/
> >     package/armbian-firmware/
> >     package/nvidia-driver/
> >     package/wilc1000-firmware/
> >
> > And quite a few others...
>
> OK, I see what you mean by proprietary.

I actually meant to make this point earlier, but didn't. I think
"proprietary" isn't necessarily what I meant to stir up here. In my
case, and I believe others', "proprietary" means non-distributable. Some
proprietary code is distributable.

This is where we go back to my original comments: I think keeping
these packages out of the dep-chain should be the responsibility of
the package owners. Maybe we don't even error out when we catch it
happening, because to your point we may not care. Or add an option to
explicitly override.

Sorry for snipping a lot out, but as with Christian I replied to
everything that I felt wouldn't send us back in circles.

Thanks,

Sam
Christian Stewart Sept. 10, 2020, 10:33 p.m. UTC | #18
Yann,

On Fri, Sep 4, 2020 at 1:06 AM Yann E. MORIN <yann.morin.1998@free.fr> wrote:
> Ah, because you store all the go vendored dependencies "away" from the
> package.
>
> I am not sure I like that, because it breaks the semantics of the dl/
> directory: all the source needed by one package is in dl/foo/ and the
> vendored dependencies *are* part of the package.
>
> So I am absolutely not in favour of storing the go modules on the side.

I've thought about this quite a bit and I think you're probably right:
for a v1 implementation in Buildroot we most likely should do the
approach of "go mod vendor" just before compressing the source file
for "make source." It should work fine for Go (can't speak for node or
others) because the vendor/ tree is always consistent.

Best regards,
Christian
Arnout Vandecappelle Sept. 15, 2020, 7:10 p.m. UTC | #19
On 11/09/2020 00:33, Christian Stewart wrote:
> Yann,
> 
> On Fri, Sep 4, 2020 at 1:06 AM Yann E. MORIN <yann.morin.1998@free.fr> wrote:
>> Ah, because you store all the go vendored dependencies "away" of the
>> package.
>>
>> I am not sure I like that, because it breaks the semantics of the dl/
>> directory: all the csource needed by one package is in dl/foo/ and the
>> vendored dependencies *are* part of the package.
>>
>> So I am absolutely not in favour of storing the go modules on the side.
> 
> I've thought about this quite a bit and I think you're probably right:
> for a v1 implementation in Buildroot we most likely should do the
> approach of "go mod vendor" just before compressing the source file
> for "make source." It should work fine for Go (can't speak for node or
> others) because the vendor/ tree is always consistent.

 Now that apparently everyone is on the same page, I think it's good to keep
track of the conclusions, so I've created a wiki page [1].

 If there is anything that needs further discussion, please continue that here
in the list. However, if there is something I just didn't write down correctly
or clearly enough, simply update the wiki.

 For your convenience, I've copied the wiki page below.


== Problem statement and requirements

Language package managers make it possible to express dependencies and make sure
the correct versions of those dependencies are used.

For Buildroot, the problem is to make sure those dependencies are downloaded.
Sometimes the upstream package simply bundles all dependencies (this is called
"vendored"), but sometimes they only have them by reference.

Buildroot needs to make sure that:

* the dependencies are downloaded;
* the dependencies are extracted in a way that they're accessible by the
language's build system;
* everything is properly hashed so we are sure the correct tarballs were downloaded;
* it is possible to do "make source", unplug the network, and do a build;
* it is possible to use a BR2_PRIMARY_SITE and BR2_PRIMARY_SITE_ONLY;
* it is possible to use BR2_BACKUP_SITE;
* the information produced by "make legal-info" is correct.

An additional nice-to-have is to be able to split off the open source
dependencies from the non-redistributable dependencies.


== Possible solutions

1. Add a download helper that downloads the base source, runs the package
manager to vendor all dependencies, and creates a tarball of this bundle.
2. Add a download helper that downloads the base source, runs the package
manager to vendor all dependencies, and creates a separate tarball containing
only the dependencies.
3. Add a download helper that downloads the base source, runs the package
manager to vendor all dependencies, splits off the non-redistributable
dependencies, and creates a separate tarball for the open-source and the
non-redistributable dependencies.
4. Add a download helper that downloads the base source, runs the package
manager to get all dependencies, and creates a separate tarball for each
individual dependency.
5. Solution 1 + extra legal-info infrastructure that makes it possible to strip
off non-redistributable parts when generating the legal-info tarball.

Solution 1 is the simplest to implement. It is the approach taken by
http://patchwork.ozlabs.org/project/buildroot/list/?series=159771 for cargo.

Solution 2 is also easy to implement. However:

- the download infra has to "know" how to download the additional tarball (it
should not be downloaded when using the normal method, but it should be
downloaded for PRIMARY and BACKUP);
- the additional tarball has to be added to the hash file;
- the additional tarball has to be extracted as well.
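
As an illustration, the deps-only tarball could be produced along these lines
(a sketch only: the helper does not exist, and ${pkg}, ${version} and ${dl_dir}
are placeholders, not dl-wrapper variables):

    #!/usr/bin/env bash
    # Sketch: create a separate tarball holding only the vendored deps.
    set -e
    work="$(mktemp -d)"
    tar xf "${dl_dir}/${pkg}-${version}.tar.gz" -C "${work}"
    cd "${work}/${pkg}-${version}"
    go mod vendor          # downloads the dependencies into ./vendor
    tar czf "${dl_dir}/${pkg}-${version}-deps.tar.gz" vendor

Note that for the hash to be stable, the tarball would additionally have to be
created reproducibly (sorted entries, fixed mtimes), which the sketch omits.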

Solution 3 is like solution 2, but with everything doubled: two additional
tarballs to download, hash and extract instead of one.

Solution 4 is a lot more complicated to implement. It is probably doable to use
the language itself to wrap the package manager (which is typically available
as a library as well) to gather a list of all dependencies and their URLs. Each
of them then has to be downloaded and tarred separately. However, to support
PRIMARY and BACKUP, we'll need to maintain this list inside Buildroot as well.
In addition, we need a separate hash for each tarball.
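
For Go at least, gathering that list would not even need a wrapper written in
the language; the go tool can emit it directly (a sketch, to be run from the
extracted source tree):

    # Human-readable list of the main module and all its dependencies:
    go list -m all
    # Download every module into the module cache and print one JSON
    # object per module, including its version and go.sum checksum:
    go mod download -json | jq -r '"\(.Path) \(.Version) \(.Sum)"'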

Solution 5 is somewhat orthogonal to the rest. It has the nice advantage that it
can also be applied to cases where the bundling is done without using a language
package manager (e.g. for C/C++ code).
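
The stripping itself could be as simple as the following sketch (the
per-package exclusion list is hypothetical, not an existing Buildroot
mechanism):

    #!/usr/bin/env bash
    # Sketch: before packing the legal-info sources, remove the vendored
    # modules that the package marks as non-redistributable. The
    # legal-info-exclude.txt file is invented for illustration.
    set -e
    while read -r mod; do
        rm -rf "vendor/${mod}"
    done < "${pkgdir}/legal-info-exclude.txt"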


 Regards,
 Arnout

[1] https://elinux.org/Buildroot:Language_package_managers_and_dependencies
Sam Voss Sept. 15, 2020, 8:08 p.m. UTC | #20
Hey Arnout,

On Tue, Sep 15, 2020 at 2:23 PM Arnout Vandecappelle <arnout@mind.be> wrote:
>
> [snip]
>
>  Now that apparently everyone is on the same page, I think it's good to keep
> track of the conclusions, so I've created a wiki page [1].
>
>  If there is anything that needs further discussion, please continue that here
> in the list. However, if there is something I just didn't write down correctly
> or clearly enough, simply update the wiki.

Thanks for writing all of this down and creating a wiki page for it. I
think this helps clarify our options and gives us room for
discussion.

>
> [snip -- wiki page quoted in full above]
>
> Solution 2 is also easy to implement. However:
>
> - the download infra has to "know" how to download the additional tarball (it
> should not be downloaded when using the normal method, but it should be
> downloaded for PRIMARY and BACKUP);
> - the additional tarball has to be added to the hash file;
> - the additional tarball has to be extracted as well.

I sent an RFC last night with a very crude initial implementation of
this, as I wanted to leverage the atomic saves in
support/download/dl-wrapper.
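
For anyone not familiar with it: the atomic save is essentially the classic
write-to-temp-then-rename pattern. A simplified sketch, not the actual
dl-wrapper code:

    # Build the tarball under a temporary name and rename it into place
    # only once it is complete, so an interrupted run never leaves a
    # half-written file behind. create_tarball is a stand-in here.
    tmpf="$(mktemp "${output}.XXXXXX")"
    create_tarball "${tmpf}"
    mv "${tmpf}" "${output}"    # rename(2) is atomic within a filesystem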

After briefly chatting with Christian, we believe the same approach
could very easily be reused for Go by running `go mod vendor` instead
of `cargo vendor`.

http://lists.busybox.net/pipermail/buildroot/2020-September/292211.html
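
For Go, the vendoring step inside that helper would presumably boil down to
something like this (an untested sketch; ${dl_dir} is a placeholder for
wherever the wrapper keeps its cache):

    # Run from the extracted source directory, before re-packing it.
    export GOPATH="${dl_dir}/go-module-cache"   # cache ends up in $GOPATH/pkg/mod
    export GO111MODULE=on                       # force module mode
    go mod vendor                               # populate ./vendor from go.mod/go.sum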

Thanks,
diff mbox series

Patch

diff --git a/package/pkg-golang.mk b/package/pkg-golang.mk
index 2d80e99619..88eb89a68e 100644
--- a/package/pkg-golang.mk
+++ b/package/pkg-golang.mk
@@ -98,6 +98,16 @@  endef
 
 $(2)_POST_EXTRACT_HOOKS += $(2)_APPLY_EXTRACT_GOMOD
 
+# WIP - download dependencies with the Go tool if vendor does not exist.
+define $(2)_DOWNLOAD_GOMOD
+	if [ ! -d $$(@D)/vendor ]; then \
+		cd $$(@D); \
+		go mod vendor; \
+	fi
+endef
+
+$(2)_POST_EXTRACT_HOOKS += $(2)_DOWNLOAD_GOMOD
+
 # Build step. Only define it if not already defined by the package .mk
 # file.
 ifndef $(2)_BUILD_CMDS