iotests 041 intermittent failure (netbsd)

Message ID CAFEAcA_-ARyPM0gB2Y_FKdUp9DYRNbz1GFU1AzFE9UZgjWNazQ@mail.gmail.com
State New

Commit Message

Peter Maydell April 9, 2021, 9:43 a.m. UTC
Just hit this (presumably intermittent) 041 failure running
the build-and-test on the tests/vm netbsd setup. Does it look
familiar to anybody?


  TEST   iotest-qcow2: 041 [fail]
QEMU          --
"/home/qemu/qemu-test.bx6kgg/build/tests/qemu-iotests/../../qemu-system-aarch64"
-nodefaults -display none -accel qtest -machine virt
QEMU_IMG      --
"/home/qemu/qemu-test.bx6kgg/build/tests/qemu-iotests/../../qemu-img"
QEMU_IO       --
"/home/qemu/qemu-test.bx6kgg/build/tests/qemu-iotests/../../qemu-io"
--cache writeback --aio threads -f qcow2
QEMU_NBD      --
"/home/qemu/qemu-test.bx6kgg/build/tests/qemu-iotests/../../qemu-nbd"
IMGFMT        -- qcow2
IMGPROTO      -- file
PLATFORM      -- NetBSD/amd64 localhost 9.1
TEST_DIR      -- /home/qemu/qemu-test.bx6kgg/build/tests/qemu-iotests/scratch
SOCK_DIR      -- /tmp/tmp5wf5bgkm
SOCKET_SCM_HELPER --
+  File "/home/qemu/qemu-test.bx6kgg/src/tests/qemu-iotests/iotests.py",
line 482, in timeout
+    raise Exception(self.errmsg)
+Exception: Timeout waiting for job to pause
+
 ----------------------------------------------------------------------
 Ran 107 tests

-OK
+FAILED (errors=1)


thanks
-- PMM

Comments

Philippe Mathieu-Daudé April 9, 2021, 10:22 a.m. UTC | #1
On 4/9/21 11:43 AM, Peter Maydell wrote:
> Just hit this (presumably intermittent) 041 failure running
> the build-and-test on the tests/vm netbsd setup. Does it look
> familiar to anybody?

This one is known as the mysterious failure:
https://www.mail-archive.com/qemu-block@nongnu.org/msg73321.html

> 
> 
>   TEST   iotest-qcow2: 041 [fail]
> QEMU          --
> "/home/qemu/qemu-test.bx6kgg/build/tests/qemu-iotests/../../qemu-system-aarch64"
> -nodefaults -display none -accel qtest -machine virt
> QEMU_IMG      --
> "/home/qemu/qemu-test.bx6kgg/build/tests/qemu-iotests/../../qemu-img"
> QEMU_IO       --
> "/home/qemu/qemu-test.bx6kgg/build/tests/qemu-iotests/../../qemu-io"
> --cache writeback --aio threads -f qcow2
> QEMU_NBD      --
> "/home/qemu/qemu-test.bx6kgg/build/tests/qemu-iotests/../../qemu-nbd"
> IMGFMT        -- qcow2
> IMGPROTO      -- file
> PLATFORM      -- NetBSD/amd64 localhost 9.1
> TEST_DIR      -- /home/qemu/qemu-test.bx6kgg/build/tests/qemu-iotests/scratch
> SOCK_DIR      -- /tmp/tmp5wf5bgkm
> SOCKET_SCM_HELPER --
> --- /home/qemu/qemu-test.bx6kgg/src/tests/qemu-iotests/041.out
> +++ 041.out.bad
> @@ -1,5 +1,29 @@
> -...........................................................................................................
> +..............................................................................E............................
> +======================================================================
> +ERROR: test_pause (__main__.TestSingleDrive)
> +----------------------------------------------------------------------
> +Traceback (most recent call last):
> +  File "/home/qemu/qemu-test.bx6kgg/src/tests/qemu-iotests/041", line
> 111, in test_pause
> +    self.pause_job('drive0')
> +  File "/home/qemu/qemu-test.bx6kgg/src/tests/qemu-iotests/iotests.py",
> line 1064, in pause_job
> +    return self.pause_wait(job_id)
> +  File "/home/qemu/qemu-test.bx6kgg/src/tests/qemu-iotests/iotests.py",
> line 1050, in pause_wait
> +    result = self.vm.qmp('query-block-jobs')
> +  File "/home/qemu/qemu-test.bx6kgg/src/tests/qemu-iotests/../../python/qemu/machine.py",
> line 560, in qmp
> fcntl(): Invalid argument
> +    return self._qmp.cmd(cmd, args=qmp_args)
> +  File "/home/qemu/qemu-test.bx6kgg/src/tests/qemu-iotests/../../python/qemu/qmp.py",
> line 278, in cmd
> +    return self.cmd_obj(qmp_cmd)
> +  File "/home/qemu/qemu-test.bx6kgg/src/tests/qemu-iotests/../../python/qemu/qmp.py",
> line 257, in cmd_obj
> +    resp = self.__json_read()
> +  File "/home/qemu/qemu-test.bx6kgg/src/tests/qemu-iotests/../../python/qemu/qmp.py",
> line 140, in __json_read
> +    data = self.__sockfile.readline()
> +  File "/usr/pkg/lib/python3.7/socket.py", line 589, in readinto
> +    return self._sock.recv_into(b)
> +  File "/home/qemu/qemu-test.bx6kgg/src/tests/qemu-iotests/iotests.py",
> line 482, in timeout
> +    raise Exception(self.errmsg)
> +Exception: Timeout waiting for job to pause
> +
>  ----------------------------------------------------------------------
>  Ran 107 tests
> 
> -OK
> +FAILED (errors=1)
> 
> 
> thanks
> -- PMM
>
Daniel P. Berrangé April 9, 2021, 10:31 a.m. UTC | #2
On Fri, Apr 09, 2021 at 12:22:26PM +0200, Philippe Mathieu-Daudé wrote:
> On 4/9/21 11:43 AM, Peter Maydell wrote:
> > Just hit this (presumably intermittent) 041 failure running
> > the build-and-test on the tests/vm netbsd setup. Does it look
> > familiar to anybody?
> 
> This one is known as the mysterious failure:
> https://www.mail-archive.com/qemu-block@nongnu.org/msg73321.html

If the test has been flakey with no confirmed fix since Sept 2020,
then it is well overdue to be switched to disabled by default, at
least on the platforms it is known to be flakey on.

Non-deterministic failures accumulate until you find yourself in
a situation where it's impossible to get CI to pass. We must be
aggressive in either (a) fixing non-deterministic failures promptly,
or (b) disabling the test until someone has time to work on a fix.


Regards,
Daniel
Kevin Wolf April 9, 2021, 11:37 a.m. UTC | #3
Am 09.04.2021 um 12:31 hat Daniel P. Berrangé geschrieben:
> On Fri, Apr 09, 2021 at 12:22:26PM +0200, Philippe Mathieu-Daudé wrote:
> > On 4/9/21 11:43 AM, Peter Maydell wrote:
> > > Just hit this (presumably intermittent) 041 failure running
> > > the build-and-test on the tests/vm netbsd setup. Does it look
> > > familiar to anybody?
> > 
> > This one is known as the mysterious failure:
> > https://www.mail-archive.com/qemu-block@nongnu.org/msg73321.html
> 
> If the test has been flakey with no confirmed fix since Sept 2020,
> then it is well overdue to be switched to disabled by default, at
> least on the platforms it is known to be flakey on.

Why do you think this is the same problem? It is a completely different
error message, happening in a different test function. The problems
reported in September were fixed in the next version of the pull
request.

What Peter is reporting here is probably unrelated to NetBSD, but to
overloaded test hosts. QMPTestCase.pause_wait() uses a timeout of
3 seconds until it decides that the job probably has just failed to
pause at all, so that the test case wouldn't hang indefinitely on
failure.

We can increase the timeout, but of course, that doesn't guarantee that
we'll never hit it again on very slow test hosts.

Kevin
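
[Illustrative note, not part of the thread] The timeout behaviour Kevin describes can be sketched as a poll loop with a deadline. This is not the code in iotests.py; the traceback above suggests the real helper raises from a timeout handler instead. The names wait_until_paused, PauseTimeout and query_jobs are hypothetical. query_jobs stands in for a call such as vm.qmp('query-block-jobs') and is assumed to return a list of dicts with 'device' and 'paused' keys.

```python
import time


class PauseTimeout(Exception):
    """Raised when the job does not report 'paused' before the deadline."""


def wait_until_paused(query_jobs, job_id, timeout=3.0, poll_interval=0.01):
    """Poll query_jobs() until the job named job_id reports paused.

    query_jobs is a hypothetical callable returning a list of dicts,
    each with at least 'device' and 'paused' keys. The default timeout
    of 3 seconds mirrors the value mentioned in the thread.
    """
    deadline = time.monotonic() + timeout
    while True:
        for job in query_jobs():
            if job.get('device') == job_id and job.get('paused'):
                return job
        # Give up once the deadline has passed, so a job that never
        # pauses fails the test instead of hanging it indefinitely.
        if time.monotonic() >= deadline:
            raise PauseTimeout('Timeout waiting for job to pause')
        time.sleep(poll_interval)
```

Passing a larger timeout (for example timeout=30.0) makes the wait more tolerant of a loaded host, but, as noted above, any fixed value can still be exceeded on a slow enough machine.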
Philippe Mathieu-Daudé April 9, 2021, 1:41 p.m. UTC | #4
On 4/9/21 1:37 PM, Kevin Wolf wrote:
> Am 09.04.2021 um 12:31 hat Daniel P. Berrangé geschrieben:
>> On Fri, Apr 09, 2021 at 12:22:26PM +0200, Philippe Mathieu-Daudé wrote:
>>> On 4/9/21 11:43 AM, Peter Maydell wrote:
>>>> Just hit this (presumably intermittent) 041 failure running
>>>> the build-and-test on the tests/vm netbsd setup. Does it look
>>>> familiar to anybody?
>>>
>>> This one is known as the mysterious failure:
>>> https://www.mail-archive.com/qemu-block@nongnu.org/msg73321.html
>>
>> If the test has been flakey with no confirmed fix since Sept 2020,
>> then it is well overdue to be switched to disabled by default, at
>> least on the platforms it is known to be flakey on.
> 
> Why do you think this is the same problem? It is a completely different
> error message, happening in a different test function. The problems
> reported in September were fixed in the next version of the pull
> request.

Oops my bad, I thought this was the same, sorry.

> What Peter is reporting here is probably unrelated to NetBSD, but to
> overloaded test hosts. QMPTestCase.pause_wait() uses a timeout of
> 3 seconds until it decides that the job probably has just failed to
> pause at all, so that the test case wouldn't hang indefinitely on
> failure.
> 
> We can increase the timeout, but of course, that doesn't guarantee that
> we'll never hit it again on very slow test hosts.
> 
> Kevin
>
Patch

--- /home/qemu/qemu-test.bx6kgg/src/tests/qemu-iotests/041.out
+++ 041.out.bad
@@ -1,5 +1,29 @@ 
-...........................................................................................................
+..............................................................................E............................
+======================================================================
+ERROR: test_pause (__main__.TestSingleDrive)
+----------------------------------------------------------------------
+Traceback (most recent call last):
+  File "/home/qemu/qemu-test.bx6kgg/src/tests/qemu-iotests/041", line
111, in test_pause
+    self.pause_job('drive0')
+  File "/home/qemu/qemu-test.bx6kgg/src/tests/qemu-iotests/iotests.py",
line 1064, in pause_job
+    return self.pause_wait(job_id)
+  File "/home/qemu/qemu-test.bx6kgg/src/tests/qemu-iotests/iotests.py",
line 1050, in pause_wait
+    result = self.vm.qmp('query-block-jobs')
+  File "/home/qemu/qemu-test.bx6kgg/src/tests/qemu-iotests/../../python/qemu/machine.py",
line 560, in qmp
fcntl(): Invalid argument
+    return self._qmp.cmd(cmd, args=qmp_args)
+  File "/home/qemu/qemu-test.bx6kgg/src/tests/qemu-iotests/../../python/qemu/qmp.py",
line 278, in cmd
+    return self.cmd_obj(qmp_cmd)
+  File "/home/qemu/qemu-test.bx6kgg/src/tests/qemu-iotests/../../python/qemu/qmp.py",
line 257, in cmd_obj
+    resp = self.__json_read()
+  File "/home/qemu/qemu-test.bx6kgg/src/tests/qemu-iotests/../../python/qemu/qmp.py",
line 140, in __json_read
+    data = self.__sockfile.readline()
+  File "/usr/pkg/lib/python3.7/socket.py", line 589, in readinto
+    return self._sock.recv_into(b)