[v1,3/7] gitlab-ci: Fix the build-cfi-aarch64 and build-cfi-ppc64-s390x jobs

Message ID 20220613171258.1905715-4-alex.bennee@linaro.org
State New
Series testing/next pre-PR (docker, gitlab, tcg)

Commit Message

Alex Bennée June 13, 2022, 5:12 p.m. UTC
From: Thomas Huth <thuth@redhat.com>

The job definitions recently got a second "variables:" section by
accident and thus are failing now if one tries to run them. Merge
the two sections into one again to fix the issue.

And while we're at it, bump the timeout here (70 minutes is currently
not enough for the aarch64 job). The jobs are marked as manual anyway,
so if a user starts them, they want to see the result for sure, and
it's annoying if the job times out too early.

Fixes: e312d1fdbb ("gitlab: convert build/container jobs to .base_job_template")
Signed-off-by: Thomas Huth <thuth@redhat.com>
Acked-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20220603124809.70794-1-thuth@redhat.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
---
 .gitlab-ci.d/buildtest.yml | 22 ++++++++++------------
 1 file changed, 10 insertions(+), 12 deletions(-)
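
To illustrate the problem described above: the YAML specification requires
mapping keys to be unique, and how a particular loader reacts to a repeated
key varies. The following is a minimal sketch, not part of the patch, of a
check that flags a repeated key inside one block mapping. The function name
find_duplicate_keys and the two trimmed-down job snippets are assumptions
made for illustration; the real job definitions carry more keys.

```python
import re

# Matches "key:" at the start of a line and captures the indentation and the key.
KEY_RE = re.compile(r"^(\s*)([A-Za-z_.][\w.-]*):(\s|$)")


def find_duplicate_keys(text):
    """Return (line_number, key) for each key repeated within one mapping.

    Deliberately simple: it only understands block-style mappings with
    plain keys, which is enough for job definitions shaped like these.
    """
    duplicates = []
    stack = []  # (indent, keys seen so far) for each currently open mapping
    for lineno, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        match = KEY_RE.match(line)
        if not match:
            continue  # list items, continuation lines, etc.
        indent, key = len(match.group(1)), match.group(2)
        # Leave any mappings that are indented deeper than this key.
        while stack and stack[-1][0] > indent:
            stack.pop()
        # Open a new mapping if this key is indented deeper than the current one.
        if not stack or stack[-1][0] < indent:
            stack.append((indent, set()))
        seen = stack[-1][1]
        if key in seen:
            duplicates.append((lineno, key))
        seen.add(key)
    return duplicates


# Trimmed-down illustration of the job before the fix (hypothetical excerpt).
BROKEN = """\
build-cfi-aarch64:
  variables:
    TARGETS: aarch64-softmmu
    MAKE_CHECK_ARGS: check-build
  timeout: 70m
  variables:
    QEMU_JOB_SKIPPED: 1
"""

# The same excerpt with the two "variables:" sections merged.
FIXED = """\
build-cfi-aarch64:
  variables:
    TARGETS: aarch64-softmmu
    MAKE_CHECK_ARGS: check-build
    QEMU_JOB_SKIPPED: 1
  timeout: 90m
"""

print(find_duplicate_keys(BROKEN))  # [(6, 'variables')]
print(find_duplicate_keys(FIXED))   # []
```

The scanner tracks one set of keys per indentation level, so the second
"variables:" is reported because it sits at the same level as the first one
inside the same job.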

Comments

Richard Henderson June 13, 2022, 9:46 p.m. UTC | #1
On 6/13/22 10:12, Alex Bennée wrote:
> From: Thomas Huth <thuth@redhat.com>
> 
> The job definitions recently got a second "variables:" section by
> accident and thus are failing now if one tries to run them. Merge
> the two sections into one again to fix the issue.
> 
> And while we're at it, bump the timeout here (70 minutes is currently
> not enough for the aarch64 job). The jobs are marked as manual anyway,
> so if a user starts them, they want to see the result for sure, and
> it's annoying if the job times out too early.
> 
> Fixes: e312d1fdbb ("gitlab: convert build/container jobs to .base_job_template")
> Signed-off-by: Thomas Huth <thuth@redhat.com>
> Acked-by: Richard Henderson <richard.henderson@linaro.org>
> Message-Id: <20220603124809.70794-1-thuth@redhat.com>
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> ---
>   .gitlab-ci.d/buildtest.yml | 22 ++++++++++------------
>   1 file changed, 10 insertions(+), 12 deletions(-)
> 
> diff --git a/.gitlab-ci.d/buildtest.yml b/.gitlab-ci.d/buildtest.yml
> index 544385f5be..cb7cad44b5 100644
> --- a/.gitlab-ci.d/buildtest.yml
> +++ b/.gitlab-ci.d/buildtest.yml
> @@ -357,16 +357,15 @@ build-cfi-aarch64:
>         --enable-safe-stack --enable-slirp=git
>       TARGETS: aarch64-softmmu
>       MAKE_CHECK_ARGS: check-build
> -  timeout: 70m
> -  artifacts:
> -    expire_in: 2 days
> -    paths:
> -      - build
> -  variables:
>       # FIXME: This job is often failing, likely due to out-of-memory problems in
>       # the constrained containers of the shared runners. Thus this is marked as
>       # skipped until the situation has been solved.
>       QEMU_JOB_SKIPPED: 1
> +  timeout: 90m
> +  artifacts:
> +    expire_in: 2 days
> +    paths:
> +      - build

FWIW, 90 minutes was close, but insufficient:

https://gitlab.com/qemu-project/qemu/-/jobs/2584472225

But certainly, let us fix the job definition:
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~
Thomas Huth June 14, 2022, 4:30 a.m. UTC | #2
On 13/06/2022 23.46, Richard Henderson wrote:
> On 6/13/22 10:12, Alex Bennée wrote:
>> From: Thomas Huth <thuth@redhat.com>
>>
>> The job definitions recently got a second "variables:" section by
>> accident and thus are failing now if one tries to run them. Merge
>> the two sections into one again to fix the issue.
>>
>> And while we're at it, bump the timeout here (70 minutes is currently
>> not enough for the aarch64 job). The jobs are marked as manual anyway,
>> so if a user starts them, they want to see the result for sure, and
>> it's annoying if the job times out too early.
>>
>> Fixes: e312d1fdbb ("gitlab: convert build/container jobs to .base_job_template")
>> Signed-off-by: Thomas Huth <thuth@redhat.com>
>> Acked-by: Richard Henderson <richard.henderson@linaro.org>
>> Message-Id: <20220603124809.70794-1-thuth@redhat.com>
>> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
>> ---
>>   .gitlab-ci.d/buildtest.yml | 22 ++++++++++------------
>>   1 file changed, 10 insertions(+), 12 deletions(-)
>>
>> diff --git a/.gitlab-ci.d/buildtest.yml b/.gitlab-ci.d/buildtest.yml
>> index 544385f5be..cb7cad44b5 100644
>> --- a/.gitlab-ci.d/buildtest.yml
>> +++ b/.gitlab-ci.d/buildtest.yml
>> @@ -357,16 +357,15 @@ build-cfi-aarch64:
>>         --enable-safe-stack --enable-slirp=git
>>       TARGETS: aarch64-softmmu
>>       MAKE_CHECK_ARGS: check-build
>> -  timeout: 70m
>> -  artifacts:
>> -    expire_in: 2 days
>> -    paths:
>> -      - build
>> -  variables:
>>       # FIXME: This job is often failing, likely due to out-of-memory problems in
>>       # the constrained containers of the shared runners. Thus this is marked as
>>       # skipped until the situation has been solved.
>>       QEMU_JOB_SKIPPED: 1
>> +  timeout: 90m
>> +  artifacts:
>> +    expire_in: 2 days
>> +    paths:
>> +      - build
> 
> FWIW, 90 minutes was close, but insufficient:
> 
> https://gitlab.com/qemu-project/qemu/-/jobs/2584472225

Hmm, it was working at least once for me while I was working on the patch. 
But as I already wrote here:

  https://lists.gnu.org/archive/html/qemu-devel/2022-06/msg00463.html

I think nobody has really used this build-cfi-aarch64 job in months ... so we
should maybe give the 90 min timeout a try first (maybe the CI servers were
just a little bit overloaded when you tried), but if the job continues to
hit the 90-minute timeout, I'd say we should rather delete it instead of
bumping the timeout even further. 90 minutes is really very close to the
pain level already - at least for me.

> But certainly, let us fix the job definition:
> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

Thanks!

  Thomas
Daniel P. Berrangé June 14, 2022, 8:29 a.m. UTC | #3
On Tue, Jun 14, 2022 at 06:30:47AM +0200, Thomas Huth wrote:
> On 13/06/2022 23.46, Richard Henderson wrote:
> > On 6/13/22 10:12, Alex Bennée wrote:
> > > From: Thomas Huth <thuth@redhat.com>
> > > 
> > > The job definitions recently got a second "variables:" section by
> > > accident and thus are failing now if one tries to run them. Merge
> > > the two sections into one again to fix the issue.
> > > 
> > > And while we're at it, bump the timeout here (70 minutes is currently
> > > not enough for the aarch64 job). The jobs are marked as manual anyway,
> > > so if a user starts them, they want to see the result for sure, and
> > > it's annoying if the job times out too early.
> > > 
> > > Fixes: e312d1fdbb ("gitlab: convert build/container jobs to .base_job_template")
> > > Signed-off-by: Thomas Huth <thuth@redhat.com>
> > > Acked-by: Richard Henderson <richard.henderson@linaro.org>
> > > Message-Id: <20220603124809.70794-1-thuth@redhat.com>
> > > Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> > > ---
> > >   .gitlab-ci.d/buildtest.yml | 22 ++++++++++------------
> > >   1 file changed, 10 insertions(+), 12 deletions(-)
> > > 
> > > diff --git a/.gitlab-ci.d/buildtest.yml b/.gitlab-ci.d/buildtest.yml
> > > index 544385f5be..cb7cad44b5 100644
> > > --- a/.gitlab-ci.d/buildtest.yml
> > > +++ b/.gitlab-ci.d/buildtest.yml
> > > @@ -357,16 +357,15 @@ build-cfi-aarch64:
> > >         --enable-safe-stack --enable-slirp=git
> > >       TARGETS: aarch64-softmmu
> > >       MAKE_CHECK_ARGS: check-build
> > > -  timeout: 70m
> > > -  artifacts:
> > > -    expire_in: 2 days
> > > -    paths:
> > > -      - build
> > > -  variables:
> > >       # FIXME: This job is often failing, likely due to out-of-memory problems in
> > >       # the constrained containers of the shared runners. Thus this is marked as
> > >       # skipped until the situation has been solved.
> > >       QEMU_JOB_SKIPPED: 1
> > > +  timeout: 90m
> > > +  artifacts:
> > > +    expire_in: 2 days
> > > +    paths:
> > > +      - build
> > 
> > FWIW, 90 minutes was close, but insufficient:
> > 
> > https://gitlab.com/qemu-project/qemu/-/jobs/2584472225
> 
> Hmm, it was working at least once for me while I was working on the patch.
> But as I already wrote here:
> 
>  https://lists.gnu.org/archive/html/qemu-devel/2022-06/msg00463.html
> 
> I think nobody has really used this build-cfi-aarch64 job in months ... so we
> should maybe give the 90 min timeout a try first (maybe the CI servers were
> just a little bit overloaded when you tried), but if the job continues to
> hit the 90-minute timeout, I'd say we should rather delete it instead of
> bumping the timeout even further. 90 minutes is really very close to the
> pain level already - at least for me.

The CFI jobs seem to massively slow down and time out waaaaaaay
more often than any other job. I've seen the CFI jobs run
successfully in 45 minutes, and yet they frequently take so long
that they can't even complete in double that. CFI is certainly
slower to compile, but not in a non-deterministic manner that
would randomly double compilation time. I would be willing to
blame CI overload if all our other jobs were showing a similar
magnitude of slowdown, but AFAIK, they are not.
I worry that there are genuine problems with the CFI builds
that result in non-deterministic runtime problems in functional
testing. IOW, not merely running slowly, but genuinely hanging.


With regards,
Daniel
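
The distinction drawn above between a job that is merely slow and one that
has genuinely hung can be probed with an inactivity watchdog. What follows
is only a rough sketch under stated assumptions, and not something proposed
in this thread: run_with_stall_watchdog and its stall_seconds parameter are
hypothetical names, and "no output for a while" is used as a stand-in for
"hung", which is a heuristic rather than a proof.

```python
import subprocess
import sys
import threading
import time


def run_with_stall_watchdog(cmd, stall_seconds):
    """Run cmd and kill it if it prints nothing for stall_seconds.

    Returns (status, returncode), where status is "finished" or "stalled".
    Caveat: a child that buffers its output heavily can look stalled while
    it is still working, so stall_seconds should be chosen generously.
    """
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    last_output = [time.monotonic()]  # boxed so the reader thread can update it

    def pump():
        # Forward the child's output and remember when the last line arrived.
        for line in proc.stdout:
            last_output[0] = time.monotonic()
            sys.stdout.write(line)

    reader = threading.Thread(target=pump, daemon=True)
    reader.start()

    stalled = False
    while proc.poll() is None:
        if time.monotonic() - last_output[0] > stall_seconds:
            stalled = True
            proc.kill()
            break
        time.sleep(0.1)

    proc.wait()
    reader.join()
    return ("stalled" if stalled else "finished"), proc.returncode
```

A slow but healthy run keeps producing output and ends as "finished", however
long it takes, while a run that goes silent for longer than stall_seconds is
reported as "stalled", which would point at a hang rather than at CI load.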
Patch

diff --git a/.gitlab-ci.d/buildtest.yml b/.gitlab-ci.d/buildtest.yml
index 544385f5be..cb7cad44b5 100644
--- a/.gitlab-ci.d/buildtest.yml
+++ b/.gitlab-ci.d/buildtest.yml
@@ -357,16 +357,15 @@  build-cfi-aarch64:
       --enable-safe-stack --enable-slirp=git
     TARGETS: aarch64-softmmu
     MAKE_CHECK_ARGS: check-build
-  timeout: 70m
-  artifacts:
-    expire_in: 2 days
-    paths:
-      - build
-  variables:
     # FIXME: This job is often failing, likely due to out-of-memory problems in
     # the constrained containers of the shared runners. Thus this is marked as
     # skipped until the situation has been solved.
     QEMU_JOB_SKIPPED: 1
+  timeout: 90m
+  artifacts:
+    expire_in: 2 days
+    paths:
+      - build
 
 check-cfi-aarch64:
   extends: .native_test_job_template
@@ -398,16 +397,15 @@  build-cfi-ppc64-s390x:
       --enable-safe-stack --enable-slirp=git
     TARGETS: ppc64-softmmu s390x-softmmu
     MAKE_CHECK_ARGS: check-build
-  timeout: 70m
-  artifacts:
-    expire_in: 2 days
-    paths:
-      - build
-  variables:
     # FIXME: This job is often failing, likely due to out-of-memory problems in
     # the constrained containers of the shared runners. Thus this is marked as
     # skipped until the situation has been solved.
     QEMU_JOB_SKIPPED: 1
+  timeout: 80m
+  artifacts:
+    expire_in: 2 days
+    paths:
+      - build
 
 check-cfi-ppc64-s390x:
   extends: .native_test_job_template