diff mbox

[for-2.5?] qemu-iotests: Reduce racy output in 028

Message ID 1449804437-30373-1-git-send-email-eblake@redhat.com
State New
Headers show

Commit Message

Eric Blake Dec. 11, 2015, 3:27 a.m. UTC
On my machine, './check -qcow2 028' was failing about 80% of the
time, due to a race in how many times the repeated attempts
to run 'info block-jobs' could occur before the job was done,
showing up as a failure of fewer '(qemu) ' prompts than in the
expected output.  Silence the output during the repetitions, then
add a final clean command to keep the expected output useful;
once patched, I was finally able to run the test 20 times in a
row with no failures.

Signed-off-by: Eric Blake <eblake@redhat.com>
---

Not sure if this is the best fix, or if it is even appropriate
for inclusion in 2.5 this late in the game.

 tests/qemu-iotests/028     | 6 ++++--
 tests/qemu-iotests/028.out | 3 ---
 2 files changed, 4 insertions(+), 5 deletions(-)

Comments

Peter Maydell Dec. 11, 2015, 1:22 p.m. UTC | #1
On 11 December 2015 at 03:27, Eric Blake <eblake@redhat.com> wrote:
> On my machine, './check -qcow2 028' was failing about 80% of the time,
> due to a race in how many times the repeated attempts to run 'info
> block-jobs' could occur before the job was done, showing up as a
> failure of fewer '(qemu) ' prompts than in the expected output.

I'm guessing from the git history for tests/qemu-iotests/028.out
that this isn't a recent regression? That suggests to me that we
shouldn't worry about fixing this for 2.5.

PS: gmail displays your email with no line breaks in it at all,
but I suspect that's gmail's fault -- perhaps it got confused by
all the hardcoded ESC characters...

thanks
-- PMM
John Snow Dec. 12, 2015, 1:23 a.m. UTC | #2
On 12/10/2015 10:27 PM, Eric Blake wrote:
> On my machine, './check -qcow2 028' was failing about 80% of the
> time, due to a race in how many times the repeated attempts
> to run 'info block-jobs' could occur before the job was done,
> showing up as a failure of fewer '(qemu) ' prompts than in the
> expected output.  Silence the output during the repetitions, then
> add a final clean command to keep the expected output useful;
> once patched, I was finally able to run the test 20 times in a
> row with no failures.
> 

Son of a gun:

028 4s ... - output mismatch (see 028.out.bad)
--- /home/bos/jsnow/src/q3mu/tests/qemu-iotests/028.out	2015-10-09
13:33:06.823758271 -0400
+++ 028.out.bad	2015-12-11 20:12:23.688952510 -0500
@@ -470,7 +470,7 @@

 Formatting 'TEST_DIR/t.IMGFMT.copy', fmt=IMGFMT size=4294968832
backing_file=TEST_DIR/t.IMGFMT.base backing_fmt=IMGFMT
 (qemu)
-(qemu) info block-jobs
+info block-jobs
 Type backup, device disk: Completed 0 of 4294968832 bytes, speed limit
0 bytes/s
 info block-jobs
 No active jobs


Something must have changed to make it more prone to races, somehow.
Venus... Swamp gas...

> Signed-off-by: Eric Blake <eblake@redhat.com>
> ---
> 
> Not sure if this is the best fix, or if it is even appropriate
> for inclusion in 2.5 this late in the game.
> 
>  tests/qemu-iotests/028     | 6 ++++--
>  tests/qemu-iotests/028.out | 3 ---
>  2 files changed, 4 insertions(+), 5 deletions(-)
> 
> diff --git a/tests/qemu-iotests/028 b/tests/qemu-iotests/028
> index a1f4423..756c335 100755
> --- a/tests/qemu-iotests/028
> +++ b/tests/qemu-iotests/028
> @@ -111,10 +111,12 @@ h=$QEMU_HANDLE
>  QEMU_COMM_TIMEOUT=1
> 
>  # Silence output since it contains the disk image path and QEMU's readline
> -# character echoing makes it very hard to filter the output
> +# character echoing makes it very hard to filter the output. Plus, there
> +# is no telling how many times the command will repeat before succeeding.
>  _send_qemu_cmd $h "drive_backup disk ${TEST_IMG}.copy" "(qemu)" >/dev/null
>  _send_qemu_cmd $h "" "Formatting" | _filter_img_create
> -qemu_cmd_repeat=20 _send_qemu_cmd $h "info block-jobs" "No active jobs"
> +qemu_cmd_repeat=20 _send_qemu_cmd $h "info block-jobs" "No active jobs" >/dev/null
> +_send_qemu_cmd $h "info block-jobs" "No active jobs"
>  _send_qemu_cmd $h 'quit' ""
> 
>  # Base image sectors
> diff --git a/tests/qemu-iotests/028.out b/tests/qemu-iotests/028.out
> index 29c9972..e03452d 100644
> --- a/tests/qemu-iotests/028.out
> +++ b/tests/qemu-iotests/028.out
> @@ -469,10 +469,7 @@ No errors were found on the image.
>  block-backup
> 
>  Formatting 'TEST_DIR/t.IMGFMT.copy', fmt=IMGFMT size=4294968832 backing_file=TEST_DIR/t.IMGFMT.base backing_fmt=IMGFMT
> -(qemu)
>  (qemu) iininfinfoinfo info binfo blinfo bloinfo blocinfo blockinfo block-info block-jinfo block-joinfo block-jobinfo block-jobs
> -Type backup, device disk: Completed 0 of 4294968832 bytes, speed limit 0 bytes/s
> -iininfinfoinfo info binfo blinfo bloinfo blocinfo blockinfo block-info block-jinfo block-joinfo block-jobinfo block-jobs
>  No active jobs
>  === IO: pattern 195
>  read 512/512 bytes at offset 3221194240
>
Eric Blake Dec. 18, 2015, 8:46 p.m. UTC | #3
On 12/11/2015 06:23 PM, John Snow wrote:
> 

ping

> 
> On 12/10/2015 10:27 PM, Eric Blake wrote:
>> On my machine, './check -qcow2 028' was failing about 80% of the
>> time, due to a race in how many times the repeated attempts
>> to run 'info block-jobs' could occur before the job was done,
>> showing up as a failure of fewer '(qemu) ' prompts than in the
>> expected output.  Silence the output during the repetitions, then
>> add a final clean command to keep the expected output useful;
>> once patched, I was finally able to run the test 20 times in a
>> row with no failures.
>>
> 
> Son of a gun:
> 
> 028 4s ... - output mismatch (see 028.out.bad)
> --- /home/bos/jsnow/src/q3mu/tests/qemu-iotests/028.out	2015-10-09
> 13:33:06.823758271 -0400
> +++ 028.out.bad	2015-12-11 20:12:23.688952510 -0500
> @@ -470,7 +470,7 @@
> 
>  Formatting 'TEST_DIR/t.IMGFMT.copy', fmt=IMGFMT size=4294968832
> backing_file=TEST_DIR/t.IMGFMT.base backing_fmt=IMGFMT
>  (qemu)
> -(qemu) info block-jobs
> +info block-jobs
>  Type backup, device disk: Completed 0 of 4294968832 bytes, speed limit
> 0 bytes/s
>  info block-jobs
>  No active jobs
> 
> 
> Something must have changed to make it more prone to races, somehow.
> Venus... Swamp gas...

Now that 2.6 is open, is my approach right, or do we need a more robust
solution?

> 
>> Signed-off-by: Eric Blake <eblake@redhat.com>
>> ---
>>
>> Not sure if this is the best fix, or if it is even appropriate
>> for inclusion in 2.5 this late in the game.
>>
>>  tests/qemu-iotests/028     | 6 ++++--
>>  tests/qemu-iotests/028.out | 3 ---
>>  2 files changed, 4 insertions(+), 5 deletions(-)
>>
>> diff --git a/tests/qemu-iotests/028 b/tests/qemu-iotests/028
>> index a1f4423..756c335 100755
>> --- a/tests/qemu-iotests/028
>> +++ b/tests/qemu-iotests/028
>> @@ -111,10 +111,12 @@ h=$QEMU_HANDLE
>>  QEMU_COMM_TIMEOUT=1
>>
>>  # Silence output since it contains the disk image path and QEMU's readline
>> -# character echoing makes it very hard to filter the output
>> +# character echoing makes it very hard to filter the output. Plus, there
>> +# is no telling how many times the command will repeat before succeeding.
>>  _send_qemu_cmd $h "drive_backup disk ${TEST_IMG}.copy" "(qemu)" >/dev/null
>>  _send_qemu_cmd $h "" "Formatting" | _filter_img_create
>> -qemu_cmd_repeat=20 _send_qemu_cmd $h "info block-jobs" "No active jobs"
>> +qemu_cmd_repeat=20 _send_qemu_cmd $h "info block-jobs" "No active jobs" >/dev/null
>> +_send_qemu_cmd $h "info block-jobs" "No active jobs"
>>  _send_qemu_cmd $h 'quit' ""
>>
>>  # Base image sectors
>> diff --git a/tests/qemu-iotests/028.out b/tests/qemu-iotests/028.out
>> index 29c9972..e03452d 100644
>> --- a/tests/qemu-iotests/028.out
>> +++ b/tests/qemu-iotests/028.out
>> @@ -469,10 +469,7 @@ No errors were found on the image.
>>  block-backup
>>
>>  Formatting 'TEST_DIR/t.IMGFMT.copy', fmt=IMGFMT size=4294968832 backing_file=TEST_DIR/t.IMGFMT.base backing_fmt=IMGFMT
>> -(qemu)
>>  (qemu) iininfinfoinfo info binfo blinfo bloinfo blocinfo blockinfo block-info block-jinfo block-joinfo block-jobinfo block-jobs
>> -Type backup, device disk: Completed 0 of 4294968832 bytes, speed limit 0 bytes/s
>> -iininfinfoinfo info binfo blinfo bloinfo blocinfo blockinfo block-info block-jinfo block-joinfo block-jobinfo block-jobs
>>  No active jobs
>>  === IO: pattern 195
>>  read 512/512 bytes at offset 3221194240
>>
>
John Snow Dec. 18, 2015, 9:49 p.m. UTC | #4
On 12/18/2015 03:46 PM, Eric Blake wrote:
> On 12/11/2015 06:23 PM, John Snow wrote:
>>
> 
> ping
> 
>>
>> On 12/10/2015 10:27 PM, Eric Blake wrote:
>>> On my machine, './check -qcow2 028' was failing about 80% of the
>>> time, due to a race in how many times the repeated attempts
>>> to run 'info block-jobs' could occur before the job was done,
>>> showing up as a failure of fewer '(qemu) ' prompts than in the
>>> expected output.  Silence the output during the repetitions, then
>>> add a final clean command to keep the expected output useful;
>>> once patched, I was finally able to run the test 20 times in a
>>> row with no failures.
>>>
>>
>> Son of a gun:
>>
>> 028 4s ... - output mismatch (see 028.out.bad)
>> --- /home/bos/jsnow/src/q3mu/tests/qemu-iotests/028.out	2015-10-09
>> 13:33:06.823758271 -0400
>> +++ 028.out.bad	2015-12-11 20:12:23.688952510 -0500
>> @@ -470,7 +470,7 @@
>>
>>  Formatting 'TEST_DIR/t.IMGFMT.copy', fmt=IMGFMT size=4294968832
>> backing_file=TEST_DIR/t.IMGFMT.base backing_fmt=IMGFMT
>>  (qemu)
>> -(qemu) info block-jobs
>> +info block-jobs
>>  Type backup, device disk: Completed 0 of 4294968832 bytes, speed limit
>> 0 bytes/s
>>  info block-jobs
>>  No active jobs
>>
>>
>> Something must have changed to make it more prone to races, somehow.
>> Venus... Swamp gas...
> 
> Now that 2.6 is open, is my approach right, or do we need a more robust
> solution?
> 
>>
>>> Signed-off-by: Eric Blake <eblake@redhat.com>
>>> ---
>>>
>>> Not sure if this is the best fix, or if it is even appropriate
>>> for inclusion in 2.5 this late in the game.
>>>
>>>  tests/qemu-iotests/028     | 6 ++++--
>>>  tests/qemu-iotests/028.out | 3 ---
>>>  2 files changed, 4 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/tests/qemu-iotests/028 b/tests/qemu-iotests/028
>>> index a1f4423..756c335 100755
>>> --- a/tests/qemu-iotests/028
>>> +++ b/tests/qemu-iotests/028
>>> @@ -111,10 +111,12 @@ h=$QEMU_HANDLE
>>>  QEMU_COMM_TIMEOUT=1
>>>
>>>  # Silence output since it contains the disk image path and QEMU's readline
>>> -# character echoing makes it very hard to filter the output
>>> +# character echoing makes it very hard to filter the output. Plus, there
>>> +# is no telling how many times the command will repeat before succeeding.
>>>  _send_qemu_cmd $h "drive_backup disk ${TEST_IMG}.copy" "(qemu)" >/dev/null
>>>  _send_qemu_cmd $h "" "Formatting" | _filter_img_create
>>> -qemu_cmd_repeat=20 _send_qemu_cmd $h "info block-jobs" "No active jobs"
>>> +qemu_cmd_repeat=20 _send_qemu_cmd $h "info block-jobs" "No active jobs" >/dev/null
>>> +_send_qemu_cmd $h "info block-jobs" "No active jobs"
>>>  _send_qemu_cmd $h 'quit' ""
>>>
>>>  # Base image sectors
>>> diff --git a/tests/qemu-iotests/028.out b/tests/qemu-iotests/028.out
>>> index 29c9972..e03452d 100644
>>> --- a/tests/qemu-iotests/028.out
>>> +++ b/tests/qemu-iotests/028.out
>>> @@ -469,10 +469,7 @@ No errors were found on the image.
>>>  block-backup
>>>
>>>  Formatting 'TEST_DIR/t.IMGFMT.copy', fmt=IMGFMT size=4294968832 backing_file=TEST_DIR/t.IMGFMT.base backing_fmt=IMGFMT
>>> -(qemu)
>>>  (qemu) iininfinfoinfo info binfo blinfo bloinfo blocinfo blockinfo block-info block-jinfo block-joinfo block-jobinfo block-jobs
>>> -Type backup, device disk: Completed 0 of 4294968832 bytes, speed limit 0 bytes/s
>>> -iininfinfoinfo info binfo blinfo bloinfo blocinfo blockinfo block-info block-jinfo block-joinfo block-jobinfo block-jobs
>>>  No active jobs
>>>  === IO: pattern 195
>>>  read 512/512 bytes at offset 3221194240
>>>
>>
> 

It looks sane to me, because you are deleting the insane portion of the
expected output. The final "info block" command serves as an assertion
that the loop polled successfully.

Reviewed-by: John Snow <jsnow@redhat.com>
Kevin Wolf Jan. 19, 2016, 4:15 p.m. UTC | #5
Am 18.12.2015 um 22:49 hat John Snow geschrieben:
> 
> 
> On 12/18/2015 03:46 PM, Eric Blake wrote:
> > On 12/11/2015 06:23 PM, John Snow wrote:
> >> On 12/10/2015 10:27 PM, Eric Blake wrote:
> >>> On my machine, './check -qcow2 028' was failing about 80% of the
> >>> time, due to a race in how many times the repeated attempts
> >>> to run 'info block-jobs' could occur before the job was done,
> >>> showing up as a failure of fewer '(qemu) ' prompts than in the
> >>> expected output.  Silence the output during the repetitions, then
> >>> add a final clean command to keep the expected output useful;
> >>> once patched, I was finally able to run the test 20 times in a
> >>> row with no failures.

> It looks sane to me, because you are deleting the insane portion of the
> expected output. The final "info block" command serves as an assertion
> that the loop polled successfully.
> 
> Reviewed-by: John Snow <jsnow@redhat.com>

Thanks, applied to the block branch.

Kevin
diff mbox

Patch

diff --git a/tests/qemu-iotests/028 b/tests/qemu-iotests/028
index a1f4423..756c335 100755
--- a/tests/qemu-iotests/028
+++ b/tests/qemu-iotests/028
@@ -111,10 +111,12 @@  h=$QEMU_HANDLE
 QEMU_COMM_TIMEOUT=1

 # Silence output since it contains the disk image path and QEMU's readline
-# character echoing makes it very hard to filter the output
+# character echoing makes it very hard to filter the output. Plus, there
+# is no telling how many times the command will repeat before succeeding.
 _send_qemu_cmd $h "drive_backup disk ${TEST_IMG}.copy" "(qemu)" >/dev/null
 _send_qemu_cmd $h "" "Formatting" | _filter_img_create
-qemu_cmd_repeat=20 _send_qemu_cmd $h "info block-jobs" "No active jobs"
+qemu_cmd_repeat=20 _send_qemu_cmd $h "info block-jobs" "No active jobs" >/dev/null
+_send_qemu_cmd $h "info block-jobs" "No active jobs"
 _send_qemu_cmd $h 'quit' ""

 # Base image sectors
diff --git a/tests/qemu-iotests/028.out b/tests/qemu-iotests/028.out
index 29c9972..e03452d 100644
--- a/tests/qemu-iotests/028.out
+++ b/tests/qemu-iotests/028.out
@@ -469,10 +469,7 @@  No errors were found on the image.
 block-backup

 Formatting 'TEST_DIR/t.IMGFMT.copy', fmt=IMGFMT size=4294968832 backing_file=TEST_DIR/t.IMGFMT.base backing_fmt=IMGFMT
-(qemu)
 (qemu) iininfinfoinfo info binfo blinfo bloinfo blocinfo blockinfo block-info block-jinfo block-joinfo block-jobinfo block-jobs
-Type backup, device disk: Completed 0 of 4294968832 bytes, speed limit 0 bytes/s
-iininfinfoinfo info binfo blinfo bloinfo blocinfo blockinfo block-info block-jinfo block-joinfo block-jobinfo block-jobs
 No active jobs
 === IO: pattern 195
 read 512/512 bytes at offset 3221194240