
qemu-system-s390x tests/boot-serial-test intermittent failure

Message ID ff8093ef-78ee-b2ab-99d4-6957cbeac07d@de.ibm.com
State New

Commit Message

Christian Borntraeger March 24, 2017, 11:11 a.m. UTC
On 03/24/2017 11:57 AM, Peter Maydell wrote:
> Hi; qemu-system-s390x seems to have an intermittent failure at
> the moment -- it's been causing our Travis builds to flap. I actually
> caught it doing this on one of my local test builds (which happens
> to be aarch64 but I don't think that matters, since Travis is doing
> x86 builds):
> 
> while  QTEST_QEMU_BINARY=s390x-softmmu/qemu-system-s390x
> QTEST_QEMU_IMG=qemu-img MALLOC_PERTURB_=${MALLOC_PERTURB_:-$((RANDOM %
> 255 + 1))} gtester -k --verbose -m=quick tests/boot-serial-test ; do
> true; done
>  TEST: tests/boot-serial-test... (pid=1122)
>   /s390x/boot-serial/s390-ccw-virtio:                                  OK
> PASS: tests/boot-serial-test
> TEST: tests/boot-serial-test... (pid=1135)
>   /s390x/boot-serial/s390-ccw-virtio:                                  OK
> [skip lots more successes]
> TEST: tests/boot-serial-test... (pid=1582)
>   /s390x/boot-serial/s390-ccw-virtio:
> Broken pipe
> FAIL
> GTester: last random seed: R02Se94f36f305f2edd8391a22749ec91143
> (pid=1635)
> FAIL: tests/boot-serial-test
> 
> Any ideas?
> thanks

Adding Thomas who did the s390 version.

One idea. Maybe qemu exits before the other side is ready.
Does reverting

commit 864111f422babcf8ce837fb47f7f9e1948446f22
Author:     Christian Borntraeger <borntraeger@de.ibm.com>
AuthorDate: Tue Oct 18 09:29:54 2016 +0200
Commit:     Paolo Bonzini <pbonzini@redhat.com>
CommitDate: Wed Nov 2 09:28:56 2016 +0100

    vl: exit qemu on guest panic if -no-shutdown is not set

help?

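If yes, does the change in the Patch section (adding -no-shutdown to the
QEMU command line used by tests/boot-serial-test.c)

also help?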

Comments

Thomas Huth March 24, 2017, 11:26 a.m. UTC | #1
On 24.03.2017 12:11, Christian Borntraeger wrote:
> On 03/24/2017 11:57 AM, Peter Maydell wrote:
>> Hi; qemu-system-s390x seems to have an intermittent failure at
>> the moment -- it's been causing our Travis builds to flap. I actually
>> caught it doing this on one of my local test builds (which happens
>> to be aarch64 but I don't think that matters, since Travis is doing
>> x86 builds):
>>
>> while  QTEST_QEMU_BINARY=s390x-softmmu/qemu-system-s390x
>> QTEST_QEMU_IMG=qemu-img MALLOC_PERTURB_=${MALLOC_PERTURB_:-$((RANDOM %
>> 255 + 1))} gtester -k --verbose -m=quick tests/boot-serial-test ; do
>> true; done
>>  TEST: tests/boot-serial-test... (pid=1122)
>>   /s390x/boot-serial/s390-ccw-virtio:                                  OK
>> PASS: tests/boot-serial-test
>> TEST: tests/boot-serial-test... (pid=1135)
>>   /s390x/boot-serial/s390-ccw-virtio:                                  OK
>> [skip lots more successes]
>> TEST: tests/boot-serial-test... (pid=1582)
>>   /s390x/boot-serial/s390-ccw-virtio:
>> Broken pipe
>> FAIL
>> GTester: last random seed: R02Se94f36f305f2edd8391a22749ec91143
>> (pid=1635)
>> FAIL: tests/boot-serial-test
>>
>> Any ideas?

I was not able to reproduce this issue so far (on my x86 laptop since I
don't have an aarch64 host) ... can you reproduce it by running the test
directly, too, e.g. something like:

QTEST_QEMU_BINARY=s390x-softmmu/qemu-system-s390x tests/boot-serial-test

?

> Adding Thomas who did the s390 version.
> 
> One idea. Maybe qemu exits before the other side is ready.

Or could this be a timeout issue again? Is the host very loaded?
(however, I don't really believe that this could be the issue here,
since the test is rather fast)

 Thomas

Peter Maydell March 24, 2017, 11:37 a.m. UTC | #2
On 24 March 2017 at 11:26, Thomas Huth <thuth@redhat.com> wrote:
> I was not able to reproduce this issue so far (on my x86 laptop since I
> don't have an aarch64 host) ... can you reproduce it by running the test
> directly, too, e.g. something like:
>
> QTEST_QEMU_BINARY=s390x-softmmu/qemu-system-s390x tests/boot-serial-test
>
> ?

Yes, you can.

>> Adding Thomas who did the s390 version.
>>
>> One idea. Maybe qemu exits before the other side is ready.
>
> Or could this be a timeout issue again? Is the host very loaded?
> (however, I don't really believe that this could be the issue here,
> since the test is rather fast)

The timeout is 60 seconds, right? When the test fails it
fails immediately, not after hanging for 60s.

thanks
-- PMM

Paolo Bonzini March 24, 2017, 11:48 a.m. UTC | #3
On 24/03/2017 12:11, Christian Borntraeger wrote:
> One idea. Maybe qemu exits before the other side is ready.
> Does reverting
> 
> commit 864111f422babcf8ce837fb47f7f9e1948446f22
> Author:     Christian Borntraeger <borntraeger@de.ibm.com>
> AuthorDate: Tue Oct 18 09:29:54 2016 +0200
> Commit:     Paolo Bonzini <pbonzini@redhat.com>
> CommitDate: Wed Nov 2 09:28:56 2016 +0100
> 
>     vl: exit qemu on guest panic if -no-shutdown is not set
> 
> help?

Didn't test, but this:

> If yes, does 
> diff --git a/tests/boot-serial-test.c b/tests/boot-serial-test.c
> index 57edf6a..11f48b0 100644
> --- a/tests/boot-serial-test.c
> +++ b/tests/boot-serial-test.c
> @@ -79,8 +79,8 @@ static void test_machine(const void *data)
>      g_assert(fd != -1);
>  
>      args = g_strdup_printf("-M %s,accel=tcg -chardev file,id=serial0,path=%s"
> -                           " -serial chardev:serial0 %s", test->machine,
> -                           tmpname, test->extra);
> +                           " -no-shutdown -serial chardev:serial0 %s",
> +                           test->machine, tmpname, test->extra);
>  
>      qtest_start(args);
>      unlink(tmpname);
> 
> also help?

seems to help (survives about 1 minute, while usually it fails in a few
seconds).

Paolo

Peter Maydell March 24, 2017, 11:50 a.m. UTC | #4
On 24 March 2017 at 11:11, Christian Borntraeger <borntraeger@de.ibm.com> wrote:
> One idea. Maybe qemu exits before the other side is ready.
> Does reverting
>
> commit 864111f422babcf8ce837fb47f7f9e1948446f22
> Author:     Christian Borntraeger <borntraeger@de.ibm.com>
> AuthorDate: Tue Oct 18 09:29:54 2016 +0200
> Commit:     Paolo Bonzini <pbonzini@redhat.com>
> CommitDate: Wed Nov 2 09:28:56 2016 +0100
>
>     vl: exit qemu on guest panic if -no-shutdown is not set
>
> help?

I didn't test this...

> If yes, does
> diff --git a/tests/boot-serial-test.c b/tests/boot-serial-test.c
> index 57edf6a..11f48b0 100644
> --- a/tests/boot-serial-test.c
> +++ b/tests/boot-serial-test.c
> @@ -79,8 +79,8 @@ static void test_machine(const void *data)
>      g_assert(fd != -1);
>
>      args = g_strdup_printf("-M %s,accel=tcg -chardev file,id=serial0,path=%s"
> -                           " -serial chardev:serial0 %s", test->machine,
> -                           tmpname, test->extra);
> +                           " -no-shutdown -serial chardev:serial0 %s",
> +                           test->machine, tmpname, test->extra);

...but this doesn't help.

thanks
-- PMM

Peter Maydell March 24, 2017, 12:56 p.m. UTC | #5
On 24 March 2017 at 11:50, Peter Maydell <peter.maydell@linaro.org> wrote:
> On 24 March 2017 at 11:11, Christian Borntraeger <borntraeger@de.ibm.com> wrote:
>> diff --git a/tests/boot-serial-test.c b/tests/boot-serial-test.c
>> index 57edf6a..11f48b0 100644
>> --- a/tests/boot-serial-test.c
>> +++ b/tests/boot-serial-test.c
>> @@ -79,8 +79,8 @@ static void test_machine(const void *data)
>>      g_assert(fd != -1);
>>
>>      args = g_strdup_printf("-M %s,accel=tcg -chardev file,id=serial0,path=%s"
>> -                           " -serial chardev:serial0 %s", test->machine,
>> -                           tmpname, test->extra);
>> +                           " -no-shutdown -serial chardev:serial0 %s",
>> +                           test->machine, tmpname, test->extra);
>
> ...but this doesn't help.

I think I must have got the testing wrong here somehow, because
trying it again, it does seem to cause the failures to stop.
So I guess this must be the bug...

thanks
-- PMM

Christian Borntraeger March 24, 2017, 1:02 p.m. UTC | #6
On 03/24/2017 01:56 PM, Peter Maydell wrote:
> On 24 March 2017 at 11:50, Peter Maydell <peter.maydell@linaro.org> wrote:
>> On 24 March 2017 at 11:11, Christian Borntraeger <borntraeger@de.ibm.com> wrote:
>>> diff --git a/tests/boot-serial-test.c b/tests/boot-serial-test.c
>>> index 57edf6a..11f48b0 100644
>>> --- a/tests/boot-serial-test.c
>>> +++ b/tests/boot-serial-test.c
>>> @@ -79,8 +79,8 @@ static void test_machine(const void *data)
>>>      g_assert(fd != -1);
>>>
>>>      args = g_strdup_printf("-M %s,accel=tcg -chardev file,id=serial0,path=%s"
>>> -                           " -serial chardev:serial0 %s", test->machine,
>>> -                           tmpname, test->extra);
>>> +                           " -no-shutdown -serial chardev:serial0 %s",
>>> +                           test->machine, tmpname, test->extra);
>>
>> ...but this doesn't help.
> 
> I think I must have got the testing wrong here somehow, because
> trying it again, it does seem to cause the failures to stop.
> So I guess this must be the bug...

Will send a proper patch.

Patch

diff --git a/tests/boot-serial-test.c b/tests/boot-serial-test.c
index 57edf6a..11f48b0 100644
--- a/tests/boot-serial-test.c
+++ b/tests/boot-serial-test.c
@@ -79,8 +79,8 @@  static void test_machine(const void *data)
     g_assert(fd != -1);
 
     args = g_strdup_printf("-M %s,accel=tcg -chardev file,id=serial0,path=%s"
-                           " -serial chardev:serial0 %s", test->machine,
-                           tmpname, test->extra);
+                           " -no-shutdown -serial chardev:serial0 %s",
+                           test->machine, tmpname, test->extra);
 
     qtest_start(args);
     unlink(tmpname);