Message ID | CAFEAcA_-ARyPM0gB2Y_FKdUp9DYRNbz1GFU1AzFE9UZgjWNazQ@mail.gmail.com |
---|---|
State | New |
Series | iotests 041 intermittent failure (netbsd) |
On 4/9/21 11:43 AM, Peter Maydell wrote:
> Just hit this (presumably intermittent) 041 failure running
> the build-and-test on the tests/vm netbsd setup. Does it look
> familiar to anybody?

This one is known as the mysterious failure:
https://www.mail-archive.com/qemu-block@nongnu.org/msg73321.html

>
>
> TEST iotest-qcow2: 041 [fail]
> QEMU --
> "/home/qemu/qemu-test.bx6kgg/build/tests/qemu-iotests/../../qemu-system-aarch64"
> -nodefaults -display none -accel qtest -machine virt
> QEMU_IMG --
> "/home/qemu/qemu-test.bx6kgg/build/tests/qemu-iotests/../../qemu-img"
> QEMU_IO --
> "/home/qemu/qemu-test.bx6kgg/build/tests/qemu-iotests/../../qemu-io"
> --cache writeback --aio threads -f qcow2
> QEMU_NBD --
> "/home/qemu/qemu-test.bx6kgg/build/tests/qemu-iotests/../../qemu-nbd"
> IMGFMT -- qcow2
> IMGPROTO -- file
> PLATFORM -- NetBSD/amd64 localhost 9.1
> TEST_DIR -- /home/qemu/qemu-test.bx6kgg/build/tests/qemu-iotests/scratch
> SOCK_DIR -- /tmp/tmp5wf5bgkm
> SOCKET_SCM_HELPER --
> --- /home/qemu/qemu-test.bx6kgg/src/tests/qemu-iotests/041.out
> +++ 041.out.bad
> @@ -1,5 +1,29 @@
> -...........................................................................................................
> +..............................................................................E............................
> +======================================================================
> +ERROR: test_pause (__main__.TestSingleDrive)
> +----------------------------------------------------------------------
> +Traceback (most recent call last):
> + File "/home/qemu/qemu-test.bx6kgg/src/tests/qemu-iotests/041", line
> 111, in test_pause
> + self.pause_job('drive0')
> + File "/home/qemu/qemu-test.bx6kgg/src/tests/qemu-iotests/iotests.py",
> line 1064, in pause_job
> + return self.pause_wait(job_id)
> + File "/home/qemu/qemu-test.bx6kgg/src/tests/qemu-iotests/iotests.py",
> line 1050, in pause_wait
> + result = self.vm.qmp('query-block-jobs')
> + File "/home/qemu/qemu-test.bx6kgg/src/tests/qemu-iotests/../../python/qemu/machine.py",
> line 560, in qmp
> fcntl(): Invalid argument
> + return self._qmp.cmd(cmd, args=qmp_args)
> + File "/home/qemu/qemu-test.bx6kgg/src/tests/qemu-iotests/../../python/qemu/qmp.py",
> line 278, in cmd
> + return self.cmd_obj(qmp_cmd)
> + File "/home/qemu/qemu-test.bx6kgg/src/tests/qemu-iotests/../../python/qemu/qmp.py",
> line 257, in cmd_obj
> + resp = self.__json_read()
> + File "/home/qemu/qemu-test.bx6kgg/src/tests/qemu-iotests/../../python/qemu/qmp.py",
> line 140, in __json_read
> + data = self.__sockfile.readline()
> + File "/usr/pkg/lib/python3.7/socket.py", line 589, in readinto
> + return self._sock.recv_into(b)
> + File "/home/qemu/qemu-test.bx6kgg/src/tests/qemu-iotests/iotests.py",
> line 482, in timeout
> + raise Exception(self.errmsg)
> +Exception: Timeout waiting for job to pause
> +
> ----------------------------------------------------------------------
> Ran 107 tests
>
> -OK
> +FAILED (errors=1)
>
>
> thanks
> -- PMM
>
On Fri, Apr 09, 2021 at 12:22:26PM +0200, Philippe Mathieu-Daudé wrote:
> On 4/9/21 11:43 AM, Peter Maydell wrote:
> > Just hit this (presumably intermittent) 041 failure running
> > the build-and-test on the tests/vm netbsd setup. Does it look
> > familiar to anybody?
>
> This one is known as the mysterious failure:
> https://www.mail-archive.com/qemu-block@nongnu.org/msg73321.html

If the test has been flakey with no confirmed fix since Sept 2020,
then it is well overdue to be switched to disabled by default, at
least on the platforms it is known to be flakey on.

Non-deterministic failures accumulate until you find yourself in a
situation where it's impossible to get CI to pass. We must be
aggressive in either (a) fixing non-deterministic failures promptly,
or (b) disabling the test until someone has time to work on a fix.

Regards,
Daniel
On 09.04.2021 at 12:31, Daniel P. Berrangé wrote:
> On Fri, Apr 09, 2021 at 12:22:26PM +0200, Philippe Mathieu-Daudé wrote:
> > On 4/9/21 11:43 AM, Peter Maydell wrote:
> > > Just hit this (presumably intermittent) 041 failure running
> > > the build-and-test on the tests/vm netbsd setup. Does it look
> > > familiar to anybody?
> >
> > This one is known as the mysterious failure:
> > https://www.mail-archive.com/qemu-block@nongnu.org/msg73321.html
>
> If the test has been flakey with no confirmed fix since Sept 2020,
> then it is well overdue to be switched to disabled by default, at
> least on the platforms it is known to be flakey on.

Why do you think this is the same problem? It is a completely different
error message, happening in a different test function. The problems
reported in September were fixed in the next version of the pull
request.

What Peter is reporting here is probably unrelated to NetBSD, but to
overloaded test hosts. QMPTestCase.pause_wait() uses a timeout of
3 seconds until it decides that the job probably has just failed to
pause at all, so that the test case wouldn't hang indefinitely on
failure.

We can increase the timeout, but of course, that doesn't guarantee that
we'll never hit it again on very slow test hosts.

Kevin
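
To illustrate the trade-off described above, here is a minimal sketch of a bounded wait. It is not the iotests implementation: the traceback suggests the real helper raises from a timer handler while blocked in a socket read, whereas this sketch swaps in a plain polling loop with a deadline. The names wait_until() and FakeJob are hypothetical, and only the 3-second default and the error message are taken from the thread.

```python
import time


def wait_until(predicate, timeout=3.0, interval=0.05,
               errmsg="Timeout waiting for job to pause"):
    """Poll predicate() until it returns true, or raise after `timeout` seconds.

    Hypothetical helper: a deadline-based loop, not the timer-based
    mechanism that the real test harness appears to use.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            # Give up so that a job that never pauses cannot hang the test run.
            raise TimeoutError(errmsg)
        time.sleep(interval)


class FakeJob:
    """Hypothetical stand-in for a block job that pauses after a delay."""

    def __init__(self, pause_delay):
        self._paused_at = time.monotonic() + pause_delay

    def is_paused(self):
        return time.monotonic() >= self._paused_at


if __name__ == "__main__":
    # A job that pauses quickly is reported as paused well within the limit.
    print(wait_until(FakeJob(0.1).is_paused))

    # A slow job (think: overloaded host) trips the same limit even though
    # it would eventually have paused.
    try:
        wait_until(FakeJob(1.0).is_paused, timeout=0.2)
    except TimeoutError as exc:
        print("failed:", exc)
```

Raising the timeout only widens the margin: any fixed limit can still be exceeded on a sufficiently slow host, which is the caveat raised above.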
On 4/9/21 1:37 PM, Kevin Wolf wrote:
> On 09.04.2021 at 12:31, Daniel P. Berrangé wrote:
>> On Fri, Apr 09, 2021 at 12:22:26PM +0200, Philippe Mathieu-Daudé wrote:
>>> On 4/9/21 11:43 AM, Peter Maydell wrote:
>>>> Just hit this (presumably intermittent) 041 failure running
>>>> the build-and-test on the tests/vm netbsd setup. Does it look
>>>> familiar to anybody?
>>>
>>> This one is known as the mysterious failure:
>>> https://www.mail-archive.com/qemu-block@nongnu.org/msg73321.html
>>
>> If the test has been flakey with no confirmed fix since Sept 2020,
>> then it is well overdue to be switched to disabled by default, at
>> least on the platforms it is known to be flakey on.
>
> Why do you think this is the same problem? It is a completely different
> error message, happening in a different test function. The problems
> reported in September were fixed in the next version of the pull
> request.

Oops my bad, I thought this was the same, sorry.

> What Peter is reporting here is probably unrelated to NetBSD, but to
> overloaded test hosts. QMPTestCase.pause_wait() uses a timeout of
> 3 seconds until it decides that the job probably has just failed to
> pause at all, so that the test case wouldn't hang indefinitely on
> failure.
>
> We can increase the timeout, but of course, that doesn't guarantee that
> we'll never hit it again on very slow test hosts.
>
> Kevin
>