diff mbox series

[v2,2/3] docs: define policy limiting the inclusion of generated files

Message ID 20240516162230.937047-3-berrange@redhat.com
State New
Headers show
Series docs: define policy forbidding use of "AI" / LLM code generators | expand

Commit Message

Daniel P. Berrangé May 16, 2024, 4:22 p.m. UTC
Files contributed to QEMU are generally expected to be provided in the
preferred format for manipulation. IOW, we generally don't expect to
have generated / compiled code included in the tree, rather, we expect
to run the code generator / compiler as part of the build process.

There are some obvious exceptions to this seen in our existing tree, the
biggest one being the inclusion of many binary firmware ROMs. A more
niche example is the inclusion of a generated eBPF program. Or the CI
dockerfiles which are mostly auto-generated. In these cases, however,
the preferred format source code is still required to be included,
alongside the generated output.

Tools which perform user defined algorithmic transformations on code are
not considered to be "code generators". ie, we permit use of coccinelle,
spell checkers, and sed/awk/etc to manipulate code. Such use of automated
manipulation should still be declared in the commit message.

One off generators which create a boilerplate file which the author then
fills in, are acceptable if their output has clear copyright and license
status. This could be where a contributor writes a throwaway python
script to automate creation of some mundane piece of code for example.

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
---
 docs/devel/code-provenance.rst | 55 ++++++++++++++++++++++++++++++++++
 1 file changed, 55 insertions(+)

Comments

Michael S. Tsirkin May 16, 2024, 5:04 p.m. UTC | #1
On Thu, May 16, 2024 at 05:22:29PM +0100, Daniel P. Berrangé wrote:
> Files contributed to QEMU are generally expected to be provided in the
> preferred format for manipulation. IOW, we generally don't expect to
> have generated / compiled code included in the tree, rather, we expect
> to run the code generator / compiler as part of the build process.
> 
> There are some obvious exceptions to this seen in our existing tree, the
> biggest one being the inclusion of many binary firmware ROMs. A more
> niche example is the inclusion of a generated eBPF program. Or the CI
> dockerfiles which are mostly auto-generated. In these cases, however,
> the preferred format source code is still required to be included,
> alongside the generated output.
> 
> Tools which perform user defined algorithmic transformations on code are
> not considered to be "code generators". ie, we permit use of coccinelle,
> spell checkers, and sed/awk/etc to manipulate code. Such use of automated
> manipulation should still be declared in the commit message.
> 
> One off generators which create a boilerplate file which the author then
> fills in, are acceptable if their output has clear copyright and license
> status. This could be where a contributor writes a throwaway python
> script to automate creation of some mundane piece of code for example.
> 
> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> ---
>  docs/devel/code-provenance.rst | 55 ++++++++++++++++++++++++++++++++++
>  1 file changed, 55 insertions(+)
> 
> diff --git a/docs/devel/code-provenance.rst b/docs/devel/code-provenance.rst
> index 7c42fae571..eabb3e7c08 100644
> --- a/docs/devel/code-provenance.rst
> +++ b/docs/devel/code-provenance.rst
> @@ -210,3 +210,58 @@ mailing list.
>  It is also recommended to attempt to contact the original author to let them
>  know you are interested in taking over their work, in case they still intended
>  to return to the work, or had any suggestions about the best way to continue.
> +
> +Inclusion of generated files
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +Files in patches contributed to QEMU are generally expected to be provided
> +only in the preferred format for making modifications. The implication of
> +this is that the output of code generators or compilers is usually not
> +appropriate to contribute to QEMU.
> +
> +For reasons of practicality there are some exceptions to this rule, where
> +generated code is permitted, provided it is also accompanied by the
> +corresponding preferred source format. This is done where it is impractical
> +to expect those building QEMU to run the code generation or compilation
> +process. A non-exhustive list of examples is:
> +
> + * Images: where an bitmap image is created from a vector file it is common
> +   to include the rendered bitmaps at desired resolution(s), since subtle
> +   changes in the rasterization process / tools may affect quality. The
> +   original vector file is expected to accompany any generated bitmaps.
> +
> + * Firmware: QEMU includes pre-compiled binary ROMs for a variety of guest
> +   firmwares. When such binary ROMs are contributed, the corresponding source
> +   must also be provided, either directly, or through a git submodule link.
> +
> + * Dockerfiles: the majority of the dockerfiles are automatically generated
> +   from a canonical list of build dependencies maintained in tree, together
> +   with the libvirt-ci git submodule link. The generated dockerfiles are
> +   included in tree because it is desirable to be able to directly build
> +   container images from a clean git checkout.
> +
> + * EBPF: QEMU includes some generated EBPF machine code, since the required
> +   eBPF compilation tools are not broadly available on all targetted OS
> +   distributions. The corresponding eBPF C code for the binary is also
> +   provided. This is a time limited exception until the eBPF toolchain is
> +   sufficiently broadly available in distros.
> +
> +In all cases above, the existence of generated files must be acknowledged
> +and justified in the commit that introduces them.
> +
> +Tools which perform changes to existing code with deterministic algorithmic
> +manipulation, driven by user specified inputs, are not generally considered
> +to be "generators".
> +
> +IOW, using coccinelle to convert code from one pattern to another pattern, or
> +fixing docs typos with a spell checker, or transforming code using sed / awk /
> +etc, are not considered to be acts of code generation. Where an automated
> +manipulation is performed on code, however, this should be declared in the
> +commit message.
> +
> +At times contributors may use or create scripts/tools to generate an initial
> +boilerplate code template which is then filled in to produce the final patch.
> +The output of such a tool would still be considered the "preferred format",
> +since it is intended to be a foundation for further human authored changes.
> +Such tools are acceptable to use, provided they follow a deterministic process
> +and there is clearly defined copyright and licensing for their output.

GPL seems sufficiently clear on the matter:
The source code for a work means the preferred form of the work for making modifications to it. 

Do we really need to play lawyer?

> -- 
> 2.43.0
Daniel P. Berrangé May 17, 2024, 10:51 a.m. UTC | #2
On Thu, May 16, 2024 at 01:04:42PM -0400, Michael S. Tsirkin wrote:
> On Thu, May 16, 2024 at 05:22:29PM +0100, Daniel P. Berrangé wrote:
> > Files contributed to QEMU are generally expected to be provided in the
> > preferred format for manipulation. IOW, we generally don't expect to
> > have generated / compiled code included in the tree, rather, we expect
> > to run the code generator / compiler as part of the build process.
> > 
> > There are some obvious exceptions to this seen in our existing tree, the
> > biggest one being the inclusion of many binary firmware ROMs. A more
> > niche example is the inclusion of a generated eBPF program. Or the CI
> > dockerfiles which are mostly auto-generated. In these cases, however,
> > the preferred format source code is still required to be included,
> > alongside the generated output.
> > 
> > Tools which perform user defined algorithmic transformations on code are
> > not considered to be "code generators". ie, we permit use of coccinelle,
> > spell checkers, and sed/awk/etc to manipulate code. Such use of automated
> > manipulation should still be declared in the commit message.
> > 
> > One off generators which create a boilerplate file which the author then
> > fills in, are acceptable if their output has clear copyright and license
> > status. This could be where a contributor writes a throwaway python
> > script to automate creation of some mundane piece of code for example.
> > 
> > Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> > ---
> >  docs/devel/code-provenance.rst | 55 ++++++++++++++++++++++++++++++++++
> >  1 file changed, 55 insertions(+)
> > 
> > diff --git a/docs/devel/code-provenance.rst b/docs/devel/code-provenance.rst
> > index 7c42fae571..eabb3e7c08 100644
> > --- a/docs/devel/code-provenance.rst
> > +++ b/docs/devel/code-provenance.rst
> > @@ -210,3 +210,58 @@ mailing list.
> >  It is also recommended to attempt to contact the original author to let them
> >  know you are interested in taking over their work, in case they still intended
> >  to return to the work, or had any suggestions about the best way to continue.
> > +
> > +Inclusion of generated files
> > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > +
> > +Files in patches contributed to QEMU are generally expected to be provided
> > +only in the preferred format for making modifications. The implication of
> > +this is that the output of code generators or compilers is usually not
> > +appropriate to contribute to QEMU.
> > +
> > +For reasons of practicality there are some exceptions to this rule, where
> > +generated code is permitted, provided it is also accompanied by the
> > +corresponding preferred source format. This is done where it is impractical
> > +to expect those building QEMU to run the code generation or compilation
> > +process. A non-exhustive list of examples is:
> > +
> > + * Images: where an bitmap image is created from a vector file it is common
> > +   to include the rendered bitmaps at desired resolution(s), since subtle
> > +   changes in the rasterization process / tools may affect quality. The
> > +   original vector file is expected to accompany any generated bitmaps.
> > +
> > + * Firmware: QEMU includes pre-compiled binary ROMs for a variety of guest
> > +   firmwares. When such binary ROMs are contributed, the corresponding source
> > +   must also be provided, either directly, or through a git submodule link.
> > +
> > + * Dockerfiles: the majority of the dockerfiles are automatically generated
> > +   from a canonical list of build dependencies maintained in tree, together
> > +   with the libvirt-ci git submodule link. The generated dockerfiles are
> > +   included in tree because it is desirable to be able to directly build
> > +   container images from a clean git checkout.
> > +
> > + * EBPF: QEMU includes some generated EBPF machine code, since the required
> > +   eBPF compilation tools are not broadly available on all targetted OS
> > +   distributions. The corresponding eBPF C code for the binary is also
> > +   provided. This is a time limited exception until the eBPF toolchain is
> > +   sufficiently broadly available in distros.
> > +
> > +In all cases above, the existence of generated files must be acknowledged
> > +and justified in the commit that introduces them.
> > +
> > +Tools which perform changes to existing code with deterministic algorithmic
> > +manipulation, driven by user specified inputs, are not generally considered
> > +to be "generators".
> > +
> > +IOW, using coccinelle to convert code from one pattern to another pattern, or
> > +fixing docs typos with a spell checker, or transforming code using sed / awk /
> > +etc, are not considered to be acts of code generation. Where an automated
> > +manipulation is performed on code, however, this should be declared in the
> > +commit message.
> > +
> > +At times contributors may use or create scripts/tools to generate an initial
> > +boilerplate code template which is then filled in to produce the final patch.
> > +The output of such a tool would still be considered the "preferred format",
> > +since it is intended to be a foundation for further human authored changes.
> > +Such tools are acceptable to use, provided they follow a deterministic process
> > +and there is clearly defined copyright and licensing for their output.
> 
> GPL seems sufficiently clear on the matter:
> The source code for a work means the preferred form of the work for making modifications to it.

That's a different scenario.

That GPL clause applies to someone distributing the QEMU binaries. They
must make the corresponding source available in the "preferred form",
which would imply the form that the QEMU project released originally
in its source tarballs.

This doesn't say that QEMU maintainers have to choose a particular type
of source as the preferred one. We could easily choose the output of
"bison" as our preferred form of a given source file if we so wished, and
the GPL has no opinion on that matter. Just that downstream distributors
have to stick with the preferred source that QEMU originally chose.

This doc is about guiding our own contributors on what QEMU should pick
as the preferred form for source we release.

With regards,
Daniel
Alex Bennée May 17, 2024, 6:23 p.m. UTC | #3
Daniel P. Berrangé <berrange@redhat.com> writes:

<snip>
> +
> +IOW, using coccinelle to convert code from one pattern to another pattern, or
> +fixing docs typos with a spell checker, or transforming code using sed / awk /
> +etc, are not considered to be acts of code generation. Where an automated
> +manipulation is performed on code, however, this should be declared in the
> +commit message.

Lets avoid IRC speak in documents (s/IOW/In other words/), otherwise:

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>


> +
> +At times contributors may use or create scripts/tools to generate an initial
> +boilerplate code template which is then filled in to produce the final patch.
> +The output of such a tool would still be considered the "preferred format",
> +since it is intended to be a foundation for further human authored changes.
> +Such tools are acceptable to use, provided they follow a deterministic process
> +and there is clearly defined copyright and licensing for their output.
Kevin Wolf May 28, 2024, 3:41 p.m. UTC | #4
Am 16.05.2024 um 18:22 hat Daniel P. Berrangé geschrieben:
> Files contributed to QEMU are generally expected to be provided in the
> preferred format for manipulation. IOW, we generally don't expect to
> have generated / compiled code included in the tree, rather, we expect
> to run the code generator / compiler as part of the build process.
> 
> There are some obvious exceptions to this seen in our existing tree, the
> biggest one being the inclusion of many binary firmware ROMs. A more
> niche example is the inclusion of a generated eBPF program. Or the CI
> dockerfiles which are mostly auto-generated. In these cases, however,
> the preferred format source code is still required to be included,
> alongside the generated output.
> 
> Tools which perform user defined algorithmic transformations on code are
> not considered to be "code generators". ie, we permit use of coccinelle,
> spell checkers, and sed/awk/etc to manipulate code. Such use of automated
> manipulation should still be declared in the commit message.
> 
> One off generators which create a boilerplate file which the author then
> fills in, are acceptable if their output has clear copyright and license
> status. This could be where a contributor writes a throwaway python
> script to automate creation of some mundane piece of code for example.
> 
> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> ---
>  docs/devel/code-provenance.rst | 55 ++++++++++++++++++++++++++++++++++
>  1 file changed, 55 insertions(+)
> 
> diff --git a/docs/devel/code-provenance.rst b/docs/devel/code-provenance.rst
> index 7c42fae571..eabb3e7c08 100644
> --- a/docs/devel/code-provenance.rst
> +++ b/docs/devel/code-provenance.rst
> @@ -210,3 +210,58 @@ mailing list.
>  It is also recommended to attempt to contact the original author to let them
>  know you are interested in taking over their work, in case they still intended
>  to return to the work, or had any suggestions about the best way to continue.
> +
> +Inclusion of generated files
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +Files in patches contributed to QEMU are generally expected to be provided
> +only in the preferred format for making modifications. The implication of
> +this is that the output of code generators or compilers is usually not
> +appropriate to contribute to QEMU.
> +
> +For reasons of practicality there are some exceptions to this rule, where
> +generated code is permitted, provided it is also accompanied by the
> +corresponding preferred source format. This is done where it is impractical
> +to expect those building QEMU to run the code generation or compilation
> +process. A non-exhustive list of examples is:
> +
> + * Images: where an bitmap image is created from a vector file it is common
> +   to include the rendered bitmaps at desired resolution(s), since subtle
> +   changes in the rasterization process / tools may affect quality. The
> +   original vector file is expected to accompany any generated bitmaps.
> +
> + * Firmware: QEMU includes pre-compiled binary ROMs for a variety of guest
> +   firmwares. When such binary ROMs are contributed, the corresponding source
> +   must also be provided, either directly, or through a git submodule link.
> +
> + * Dockerfiles: the majority of the dockerfiles are automatically generated
> +   from a canonical list of build dependencies maintained in tree, together
> +   with the libvirt-ci git submodule link. The generated dockerfiles are
> +   included in tree because it is desirable to be able to directly build
> +   container images from a clean git checkout.
> +
> + * EBPF: QEMU includes some generated EBPF machine code, since the required
> +   eBPF compilation tools are not broadly available on all targetted OS
> +   distributions. The corresponding eBPF C code for the binary is also
> +   provided. This is a time limited exception until the eBPF toolchain is
> +   sufficiently broadly available in distros.

This paragraph is inconsistent with the spelling of "EBPF"/"eBPF".

Kevin
diff mbox series

Patch

diff --git a/docs/devel/code-provenance.rst b/docs/devel/code-provenance.rst
index 7c42fae571..eabb3e7c08 100644
--- a/docs/devel/code-provenance.rst
+++ b/docs/devel/code-provenance.rst
@@ -210,3 +210,58 @@  mailing list.
 It is also recommended to attempt to contact the original author to let them
 know you are interested in taking over their work, in case they still intended
 to return to the work, or had any suggestions about the best way to continue.
+
+Inclusion of generated files
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Files in patches contributed to QEMU are generally expected to be provided
+only in the preferred format for making modifications. The implication of
+this is that the output of code generators or compilers is usually not
+appropriate to contribute to QEMU.
+
+For reasons of practicality there are some exceptions to this rule, where
+generated code is permitted, provided it is also accompanied by the
+corresponding preferred source format. This is done where it is impractical
+to expect those building QEMU to run the code generation or compilation
+process. A non-exhustive list of examples is:
+
+ * Images: where an bitmap image is created from a vector file it is common
+   to include the rendered bitmaps at desired resolution(s), since subtle
+   changes in the rasterization process / tools may affect quality. The
+   original vector file is expected to accompany any generated bitmaps.
+
+ * Firmware: QEMU includes pre-compiled binary ROMs for a variety of guest
+   firmwares. When such binary ROMs are contributed, the corresponding source
+   must also be provided, either directly, or through a git submodule link.
+
+ * Dockerfiles: the majority of the dockerfiles are automatically generated
+   from a canonical list of build dependencies maintained in tree, together
+   with the libvirt-ci git submodule link. The generated dockerfiles are
+   included in tree because it is desirable to be able to directly build
+   container images from a clean git checkout.
+
+ * EBPF: QEMU includes some generated EBPF machine code, since the required
+   eBPF compilation tools are not broadly available on all targetted OS
+   distributions. The corresponding eBPF C code for the binary is also
+   provided. This is a time limited exception until the eBPF toolchain is
+   sufficiently broadly available in distros.
+
+In all cases above, the existence of generated files must be acknowledged
+and justified in the commit that introduces them.
+
+Tools which perform changes to existing code with deterministic algorithmic
+manipulation, driven by user specified inputs, are not generally considered
+to be "generators".
+
+IOW, using coccinelle to convert code from one pattern to another pattern, or
+fixing docs typos with a spell checker, or transforming code using sed / awk /
+etc, are not considered to be acts of code generation. Where an automated
+manipulation is performed on code, however, this should be declared in the
+commit message.
+
+At times contributors may use or create scripts/tools to generate an initial
+boilerplate code template which is then filled in to produce the final patch.
+The output of such a tool would still be considered the "preferred format",
+since it is intended to be a foundation for further human authored changes.
+Such tools are acceptable to use, provided they follow a deterministic process
+and there is clearly defined copyright and licensing for their output.