diff mbox series

[RFC,10/16] gitlab-ci: Introduce the CI "job maintainer" concept

Message ID 20201110160140.2859904-11-philmd@redhat.com
State New
Headers show
Series gitlab-ci: Introduce "CI job maintainer" concept, mark jobs maintained | expand

Commit Message

Philippe Mathieu-Daudé Nov. 10, 2020, 4:01 p.m. UTC
When a job fails, someone has to take care of it. As we can
not wait indefinitively of volunteers good will, introduce the
concept of "job maintainers". A job maintainer is reponsible
of keeping it working, or contact the developers having broken
it to fix it.

When a job is added, it must have a maintainer. A job without
maintainer is not run automatically. It can however be run
manually from the WebUI.

To declare a maintainer, it is as easy as defining the
JOB_MAINTAINER_NAME / JOB_MAINTAINER_EMAIL environment variables.

Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
---
Ideally the runner will notify the maintainer by mail that
a job fails. But we are not quite there yet.

It would be nice if someone document this properly.
---
 .gitlab-ci.d/containers.yml  | 4 ++++
 .gitlab-ci.d/crossbuilds.yml | 6 ++++++
 .gitlab-ci.yml               | 6 ++++++
 3 files changed, 16 insertions(+)

Comments

Alex Bennée Nov. 11, 2020, 9:37 a.m. UTC | #1
Philippe Mathieu-Daudé <philmd@redhat.com> writes:

> When a job fails, someone has to take care of it. As we can
> not wait indefinitively of volunteers good will, introduce the
> concept of "job maintainers". A job maintainer is reponsible
> of keeping it working, or contact the developers having broken
> it to fix it.
>
> When a job is added, it must have a maintainer. A job without
> maintainer is not run automatically. It can however be run
> manually from the WebUI.
>
> To declare a maintainer, it is as easy as defining the
> JOB_MAINTAINER_NAME / JOB_MAINTAINER_EMAIL environment variables.

So I think the problem here is the CI jobs are orthogonal to the actual
tests. And the tests should be associated via MAINTAINERS with the
relevant sub-systems.

That is not to say that the test environments don't need some care and
attention. So I'm quite happy to track updates needed to
tests/docker/dockerfiles for example but just because check-block failed
on an Ubuntu system doesn't mean I'm best placed to diagnose the
problem. In the first instance it shouldn't happen (not merging code
that regresses a test) and the second instance probably requires a block
maintainer to look at the output.

I think a better solution is to improve our test reporting so we can
quickly point the failing tests. I notice GitLab gets nice test output
from check-acceptance. What would we need to do to improve it from
check, check-block and check-tcg?


>
> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
> ---
> Ideally the runner will notify the maintainer by mail that
> a job fails. But we are not quite there yet.
>
> It would be nice if someone document this properly.
> ---
>  .gitlab-ci.d/containers.yml  | 4 ++++
>  .gitlab-ci.d/crossbuilds.yml | 6 ++++++
>  .gitlab-ci.yml               | 6 ++++++
>  3 files changed, 16 insertions(+)
>
> diff --git a/.gitlab-ci.d/containers.yml b/.gitlab-ci.d/containers.yml
> index 7e664878cab..bd2a05008d1 100644
> --- a/.gitlab-ci.d/containers.yml
> +++ b/.gitlab-ci.d/containers.yml
> @@ -21,6 +21,10 @@
>    after_script:
>      - docker logout
>    rules:
> +    # Skip unmaintained jobs
> +    - if: $JOB_MAINTAINER_NAME == null || $JOB_MAINTAINER_EMAIL == null
> +      when: manual
> +      allow_failure: true
>      - changes:
>        - .gitlab-ci.d/containers.yml
>        - tests/docker/*
> diff --git a/.gitlab-ci.d/crossbuilds.yml b/.gitlab-ci.d/crossbuilds.yml
> index 701550f028c..aecdb2a38f1 100644
> --- a/.gitlab-ci.d/crossbuilds.yml
> +++ b/.gitlab-ci.d/crossbuilds.yml
> @@ -1,6 +1,12 @@
>  .cross_common_job:
>    stage: build
>    image: $CI_REGISTRY_IMAGE/qemu/$IMAGE:latest
> +  rules:
> +    # Skip unmaintained jobs
> +    - if: $JOB_MAINTAINER_NAME == null || $JOB_MAINTAINER_EMAIL == null
> +      when: manual
> +      allow_failure: true
> +    - when: always
>  
>  .cross_system_build_job:
>    extends: .cross_common_job
> diff --git a/.gitlab-ci.yml b/.gitlab-ci.yml
> index 2e631d4f160..ded4f0bdd18 100644
> --- a/.gitlab-ci.yml
> +++ b/.gitlab-ci.yml
> @@ -19,6 +19,12 @@ include:
>  
>  .native_common_job:
>    image: $CI_REGISTRY_IMAGE/qemu/$IMAGE:latest
> +  rules:
> +    # Skip unmaintained jobs
> +    - if: $JOB_MAINTAINER_NAME == null || $JOB_MAINTAINER_EMAIL == null
> +      when: manual
> +      allow_failure: true
> +    - when: always
>  
>  .native_build_job:
>    extends: .native_common_job
Daniel P. Berrangé Nov. 11, 2020, 9:45 a.m. UTC | #2
On Tue, Nov 10, 2020 at 05:01:34PM +0100, Philippe Mathieu-Daudé wrote:
> When a job fails, someone has to take care of it. As we can
> not wait indefinitively of volunteers good will, introduce the
> concept of "job maintainers". A job maintainer is reponsible
> of keeping it working, or contact the developers having broken
> it to fix it.
> 
> When a job is added, it must have a maintainer. A job without
> maintainer is not run automatically. It can however be run
> manually from the WebUI.
> 
> To declare a maintainer, it is as easy as defining the
> JOB_MAINTAINER_NAME / JOB_MAINTAINER_EMAIL environment variables.

I don't think we really want/need todo this.

The big problem we're facing is that there is no incentive right now
for maintainers to make sure their code works with GitLab CI before
they send a pull request. Adding job maintainers is just a band-aid,
and not a very good one either, because each job covers features across
many subsystems which should each be dealt with by their existing
maintainers.

The primary solution to this tragedy is to make all the jobs gating
on all pull requests. If a maintainer wants their pull requrst to
get merged they then have no choice but to ensure it doesn't break
any CI jobs.

The main blocker for this right now IIUC is the git repo sync from
qemu to gitlab. Once we switch to gitlab as primary, we need to start
enforcing GitLab as gating for all jobs. At this point making sure
GitLab CI passes is every maintainer's job.

We'll still have some failures periodically from non-deterministic
bugs. If a test shows itself to be flaky though, it should just be
disabled in a very short time frame. Whichever maintainer owned the
test has the job for fixing the flakyness before it can be renabled.


Regards,
Daniel
Daniel P. Berrangé Nov. 11, 2020, 9:53 a.m. UTC | #3
On Wed, Nov 11, 2020 at 09:37:53AM +0000, Alex Bennée wrote:
> 
> Philippe Mathieu-Daudé <philmd@redhat.com> writes:
> 
> > When a job fails, someone has to take care of it. As we can
> > not wait indefinitively of volunteers good will, introduce the
> > concept of "job maintainers". A job maintainer is reponsible
> > of keeping it working, or contact the developers having broken
> > it to fix it.
> >
> > When a job is added, it must have a maintainer. A job without
> > maintainer is not run automatically. It can however be run
> > manually from the WebUI.
> >
> > To declare a maintainer, it is as easy as defining the
> > JOB_MAINTAINER_NAME / JOB_MAINTAINER_EMAIL environment variables.
> 
> So I think the problem here is the CI jobs are orthogonal to the actual
> tests. And the tests should be associated via MAINTAINERS with the
> relevant sub-systems.
> 
> That is not to say that the test environments don't need some care and
> attention. So I'm quite happy to track updates needed to
> tests/docker/dockerfiles for example but just because check-block failed
> on an Ubuntu system doesn't mean I'm best placed to diagnose the
> problem. In the first instance it shouldn't happen (not merging code
> that regresses a test) and the second instance probably requires a block
> maintainer to look at the output.
> 
> I think a better solution is to improve our test reporting so we can
> quickly point the failing tests. I notice GitLab gets nice test output
> from check-acceptance. What would we need to do to improve it from
> check, check-block and check-tcg?

That presentation is from the artifacts publishing test results
in junit format. IOW, we need to enhance the build system in some
way such that it can generate a junit report of other tests too.


Regards,
Daniel
Philippe Mathieu-Daudé Nov. 11, 2020, 11:13 a.m. UTC | #4
On 11/11/20 10:45 AM, Daniel P. Berrangé wrote:
> On Tue, Nov 10, 2020 at 05:01:34PM +0100, Philippe Mathieu-Daudé wrote:
>> When a job fails, someone has to take care of it. As we can
>> not wait indefinitively of volunteers good will, introduce the
>> concept of "job maintainers". A job maintainer is reponsible
>> of keeping it working, or contact the developers having broken
>> it to fix it.
>>
>> When a job is added, it must have a maintainer. A job without
>> maintainer is not run automatically. It can however be run
>> manually from the WebUI.
>>
>> To declare a maintainer, it is as easy as defining the
>> JOB_MAINTAINER_NAME / JOB_MAINTAINER_EMAIL environment variables.
> 
> I don't think we really want/need todo this.
> 
> The big problem we're facing is that there is no incentive right now
> for maintainers to make sure their code works with GitLab CI before
> they send a pull request. Adding job maintainers is just a band-aid,
> and not a very good one either, because each job covers features across
> many subsystems which should each be dealt with by their existing
> maintainers.
> 
> The primary solution to this tragedy is to make all the jobs gating
> on all pull requests. If a maintainer wants their pull requrst to
> get merged they then have no choice but to ensure it doesn't break
> any CI jobs.
> 
> The main blocker for this right now IIUC is the git repo sync from
> qemu to gitlab. Once we switch to gitlab as primary, we need to start
> enforcing GitLab as gating for all jobs. At this point making sure
> GitLab CI passes is every maintainer's job.
> 
> We'll still have some failures periodically from non-deterministic
> bugs. If a test shows itself to be flaky though, it should just be
> disabled in a very short time frame. Whichever maintainer owned the
> test has the job for fixing the flakyness before it can be renabled.

At this point I'd rather remove everything we have in CI and restart
from scratch. So if someone is really interested in having CI jobs /
runner, this someone will step in and contribute / maintain. Not like
the current state when some are happy to use the result, but nobody
cares about maintenance or fixing bugs (as the last year experience
clearly show).

I don't know if Alex / Thomas are willing to keep doing that.

I personally don't have the the energy to do this in my spare time.

Thanks.
diff mbox series

Patch

diff --git a/.gitlab-ci.d/containers.yml b/.gitlab-ci.d/containers.yml
index 7e664878cab..bd2a05008d1 100644
--- a/.gitlab-ci.d/containers.yml
+++ b/.gitlab-ci.d/containers.yml
@@ -21,6 +21,10 @@ 
   after_script:
     - docker logout
   rules:
+    # Skip unmaintained jobs
+    - if: $JOB_MAINTAINER_NAME == null || $JOB_MAINTAINER_EMAIL == null
+      when: manual
+      allow_failure: true
     - changes:
       - .gitlab-ci.d/containers.yml
       - tests/docker/*
diff --git a/.gitlab-ci.d/crossbuilds.yml b/.gitlab-ci.d/crossbuilds.yml
index 701550f028c..aecdb2a38f1 100644
--- a/.gitlab-ci.d/crossbuilds.yml
+++ b/.gitlab-ci.d/crossbuilds.yml
@@ -1,6 +1,12 @@ 
 .cross_common_job:
   stage: build
   image: $CI_REGISTRY_IMAGE/qemu/$IMAGE:latest
+  rules:
+    # Skip unmaintained jobs
+    - if: $JOB_MAINTAINER_NAME == null || $JOB_MAINTAINER_EMAIL == null
+      when: manual
+      allow_failure: true
+    - when: always
 
 .cross_system_build_job:
   extends: .cross_common_job
diff --git a/.gitlab-ci.yml b/.gitlab-ci.yml
index 2e631d4f160..ded4f0bdd18 100644
--- a/.gitlab-ci.yml
+++ b/.gitlab-ci.yml
@@ -19,6 +19,12 @@  include:
 
 .native_common_job:
   image: $CI_REGISTRY_IMAGE/qemu/$IMAGE:latest
+  rules:
+    # Skip unmaintained jobs
+    - if: $JOB_MAINTAINER_NAME == null || $JOB_MAINTAINER_EMAIL == null
+      when: manual
+      allow_failure: true
+    - when: always
 
 .native_build_job:
   extends: .native_common_job