Message ID | 1449804437-30373-1-git-send-email-eblake@redhat.com |
---|---|
State | New |
Headers | show |
On 11 December 2015 at 03:27, Eric Blake <eblake@redhat.com> wrote: > On my machine, './check -qcow2 028' was failing about 80% of the time, > due to a race in how many times the repeated attempts to run 'info > block-jobs' could occur before the job was done, showing up as a > failure of fewer '(qemu) ' prompts than in the expected output. I'm guessing from the git history for tests/qemu-iotests/028.out that this isn't a recent regression? That suggests to me that we shouldn't worry about fixing this for 2.5. PS: gmail displays your email with no line breaks in it at all, but I suspect that's gmail's fault -- perhaps it got confused by all the hardcoded ESC characters... thanks -- PMM
On 12/10/2015 10:27 PM, Eric Blake wrote: > On my machine, './check -qcow2 028' was failing about 80% of the > time, due to a race in how many times the repeated attempts > to run 'info block-jobs' could occur before the job was done, > showing up as a failure of fewer '(qemu) ' prompts than in the > expected output. Silence the output during the repetitions, then > add a final clean command to keep the expected output useful; > once patched, I was finally able to run the test 20 times in a > row with no failures. > Son of a gun: 028 4s ... - output mismatch (see 028.out.bad) --- /home/bos/jsnow/src/q3mu/tests/qemu-iotests/028.out 2015-10-09 13:33:06.823758271 -0400 +++ 028.out.bad 2015-12-11 20:12:23.688952510 -0500 @@ -470,7 +470,7 @@ Formatting 'TEST_DIR/t.IMGFMT.copy', fmt=IMGFMT size=4294968832 backing_file=TEST_DIR/t.IMGFMT.base backing_fmt=IMGFMT (qemu) -(qemu) info block-jobs +info block-jobs Type backup, device disk: Completed 0 of 4294968832 bytes, speed limit 0 bytes/s info block-jobs No active jobs Something must have changed to make it more prone to races, somehow. Venus... Swamp gas... > Signed-off-by: Eric Blake <eblake@redhat.com> > --- > > Not sure if this is the best fix, or if it is even appropriate > for inclusion in 2.5 this late in the game. > > tests/qemu-iotests/028 | 6 ++++-- > tests/qemu-iotests/028.out | 3 --- > 2 files changed, 4 insertions(+), 5 deletions(-) > > diff --git a/tests/qemu-iotests/028 b/tests/qemu-iotests/028 > index a1f4423..756c335 100755 > --- a/tests/qemu-iotests/028 > +++ b/tests/qemu-iotests/028 > @@ -111,10 +111,12 @@ h=$QEMU_HANDLE > QEMU_COMM_TIMEOUT=1 > > # Silence output since it contains the disk image path and QEMU's readline > -# character echoing makes it very hard to filter the output > +# character echoing makes it very hard to filter the output. Plus, there > +# is no telling how many times the command will repeat before succeeding. > _send_qemu_cmd $h "drive_backup disk ${TEST_IMG}.copy" "(qemu)" >/dev/null > _send_qemu_cmd $h "" "Formatting" | _filter_img_create > -qemu_cmd_repeat=20 _send_qemu_cmd $h "info block-jobs" "No active jobs" > +qemu_cmd_repeat=20 _send_qemu_cmd $h "info block-jobs" "No active jobs" >/dev/null > +_send_qemu_cmd $h "info block-jobs" "No active jobs" > _send_qemu_cmd $h 'quit' "" > > # Base image sectors > diff --git a/tests/qemu-iotests/028.out b/tests/qemu-iotests/028.out > index 29c9972..e03452d 100644 > --- a/tests/qemu-iotests/028.out > +++ b/tests/qemu-iotests/028.out > @@ -469,10 +469,7 @@ No errors were found on the image. > block-backup > > Formatting 'TEST_DIR/t.IMGFMT.copy', fmt=IMGFMT size=4294968832 backing_file=TEST_DIR/t.IMGFMT.base backing_fmt=IMGFMT > -(qemu) > (qemu) i[K[Din[K[D[Dinf[K[D[D[Dinfo[K[D[D[D[Dinfo [K[D[D[D[D[Dinfo b[K[D[D[D[D[D[Dinfo bl[K[D[D[D[D[D[D[Dinfo blo[K[D[D[D[D[D[D[D[Dinfo bloc[K[D[D[D[D[D[D[D[D[Dinfo block[K[D[D[D[D[D[D[D[D[D[Dinfo block-[K[D[D[D[D[D[D[D[D[D[D[Dinfo block-j[K[D[D[D[D[D[D[D[D[D[D[D[Dinfo block-jo[K[D[D[D[D[D[D[D[D[D[D[D[D[Dinfo block-job[K[D[D[D[D[D[D[D[D[D[D[D[D[D[Dinfo block-jobs[K > -Type backup, device disk: Completed 0 of 4294968832 bytes, speed limit 0 bytes/s > -i[K[Din[K[D[Dinf[K[D[D[Dinfo[K[D[D[D[Dinfo [K[D[D[D[D[Dinfo b[K[D[D[D[D[D[Dinfo bl[K[D[D[D[D[D[D[Dinfo blo[K[D[D[D[D[D[D[D[Dinfo bloc[K[D[D[D[D[D[D[D[D[Dinfo block[K[D[D[D[D[D[D[D[D[D[Dinfo block-[K[D[D[D[D[D[D[D[D[D[D[Dinfo block-j[K[D[D[D[D[D[D[D[D[D[D[D[Dinfo block-jo[K[D[D[D[D[D[D[D[D[D[D[D[D[Dinfo block-job[K[D[D[D[D[D[D[D[D[D[D[D[D[D[Dinfo block-jobs[K > No active jobs > === IO: pattern 195 > read 512/512 bytes at offset 3221194240 >
On 12/11/2015 06:23 PM, John Snow wrote: > ping > > On 12/10/2015 10:27 PM, Eric Blake wrote: >> On my machine, './check -qcow2 028' was failing about 80% of the >> time, due to a race in how many times the repeated attempts >> to run 'info block-jobs' could occur before the job was done, >> showing up as a failure of fewer '(qemu) ' prompts than in the >> expected output. Silence the output during the repetitions, then >> add a final clean command to keep the expected output useful; >> once patched, I was finally able to run the test 20 times in a >> row with no failures. >> > > Son of a gun: > > 028 4s ... - output mismatch (see 028.out.bad) > --- /home/bos/jsnow/src/q3mu/tests/qemu-iotests/028.out 2015-10-09 > 13:33:06.823758271 -0400 > +++ 028.out.bad 2015-12-11 20:12:23.688952510 -0500 > @@ -470,7 +470,7 @@ > > Formatting 'TEST_DIR/t.IMGFMT.copy', fmt=IMGFMT size=4294968832 > backing_file=TEST_DIR/t.IMGFMT.base backing_fmt=IMGFMT > (qemu) > -(qemu) info block-jobs > +info block-jobs > Type backup, device disk: Completed 0 of 4294968832 bytes, speed limit > 0 bytes/s > info block-jobs > No active jobs > > > Something must have changed to make it more prone to races, somehow. > Venus... Swamp gas... Now that 2.6 is open, is my approach right, or do we need a more robust solution? > >> Signed-off-by: Eric Blake <eblake@redhat.com> >> --- >> >> Not sure if this is the best fix, or if it is even appropriate >> for inclusion in 2.5 this late in the game. >> >> tests/qemu-iotests/028 | 6 ++++-- >> tests/qemu-iotests/028.out | 3 --- >> 2 files changed, 4 insertions(+), 5 deletions(-) >> >> diff --git a/tests/qemu-iotests/028 b/tests/qemu-iotests/028 >> index a1f4423..756c335 100755 >> --- a/tests/qemu-iotests/028 >> +++ b/tests/qemu-iotests/028 >> @@ -111,10 +111,12 @@ h=$QEMU_HANDLE >> QEMU_COMM_TIMEOUT=1 >> >> # Silence output since it contains the disk image path and QEMU's readline >> -# character echoing makes it very hard to filter the output >> +# character echoing makes it very hard to filter the output. Plus, there >> +# is no telling how many times the command will repeat before succeeding. >> _send_qemu_cmd $h "drive_backup disk ${TEST_IMG}.copy" "(qemu)" >/dev/null >> _send_qemu_cmd $h "" "Formatting" | _filter_img_create >> -qemu_cmd_repeat=20 _send_qemu_cmd $h "info block-jobs" "No active jobs" >> +qemu_cmd_repeat=20 _send_qemu_cmd $h "info block-jobs" "No active jobs" >/dev/null >> +_send_qemu_cmd $h "info block-jobs" "No active jobs" >> _send_qemu_cmd $h 'quit' "" >> >> # Base image sectors >> diff --git a/tests/qemu-iotests/028.out b/tests/qemu-iotests/028.out >> index 29c9972..e03452d 100644 >> --- a/tests/qemu-iotests/028.out >> +++ b/tests/qemu-iotests/028.out >> @@ -469,10 +469,7 @@ No errors were found on the image. >> block-backup >> >> Formatting 'TEST_DIR/t.IMGFMT.copy', fmt=IMGFMT size=4294968832 backing_file=TEST_DIR/t.IMGFMT.base backing_fmt=IMGFMT >> -(qemu) >> (qemu) i[K[Din[K[D[Dinf[K[D[D[Dinfo[K[D[D[D[Dinfo [K[D[D[D[D[Dinfo b[K[D[D[D[D[D[Dinfo bl[K[D[D[D[D[D[D[Dinfo blo[K[D[D[D[D[D[D[D[Dinfo bloc[K[D[D[D[D[D[D[D[D[Dinfo block[K[D[D[D[D[D[D[D[D[D[Dinfo block-[K[D[D[D[D[D[D[D[D[D[D[Dinfo block-j[K[D[D[D[D[D[D[D[D[D[D[D[Dinfo block-jo[K[D[D[D[D[D[D[D[D[D[D[D[D[Dinfo block-job[K[D[D[D[D[D[D[D[D[D[D[D[D[D[Dinfo block-jobs[K >> -Type backup, device disk: Completed 0 of 4294968832 bytes, speed limit 0 bytes/s >> -i[K[Din[K[D[Dinf[K[D[D[Dinfo[K[D[D[D[Dinfo [K[D[D[D[D[Dinfo b[K[D[D[D[D[D[Dinfo bl[K[D[D[D[D[D[D[Dinfo blo[K[D[D[D[D[D[D[D[Dinfo bloc[K[D[D[D[D[D[D[D[D[Dinfo block[K[D[D[D[D[D[D[D[D[D[Dinfo block-[K[D[D[D[D[D[D[D[D[D[D[Dinfo block-j[K[D[D[D[D[D[D[D[D[D[D[D[Dinfo block-jo[K[D[D[D[D[D[D[D[D[D[D[D[D[Dinfo block-job[K[D[D[D[D[D[D[D[D[D[D[D[D[D[Dinfo block-jobs[K >> No active jobs >> === IO: pattern 195 >> read 512/512 bytes at offset 3221194240 >> >
On 12/18/2015 03:46 PM, Eric Blake wrote: > On 12/11/2015 06:23 PM, John Snow wrote: >> > > ping > >> >> On 12/10/2015 10:27 PM, Eric Blake wrote: >>> On my machine, './check -qcow2 028' was failing about 80% of the >>> time, due to a race in how many times the repeated attempts >>> to run 'info block-jobs' could occur before the job was done, >>> showing up as a failure of fewer '(qemu) ' prompts than in the >>> expected output. Silence the output during the repetitions, then >>> add a final clean command to keep the expected output useful; >>> once patched, I was finally able to run the test 20 times in a >>> row with no failures. >>> >> >> Son of a gun: >> >> 028 4s ... - output mismatch (see 028.out.bad) >> --- /home/bos/jsnow/src/q3mu/tests/qemu-iotests/028.out 2015-10-09 >> 13:33:06.823758271 -0400 >> +++ 028.out.bad 2015-12-11 20:12:23.688952510 -0500 >> @@ -470,7 +470,7 @@ >> >> Formatting 'TEST_DIR/t.IMGFMT.copy', fmt=IMGFMT size=4294968832 >> backing_file=TEST_DIR/t.IMGFMT.base backing_fmt=IMGFMT >> (qemu) >> -(qemu) info block-jobs >> +info block-jobs >> Type backup, device disk: Completed 0 of 4294968832 bytes, speed limit >> 0 bytes/s >> info block-jobs >> No active jobs >> >> >> Something must have changed to make it more prone to races, somehow. >> Venus... Swamp gas... > > Now that 2.6 is open, is my approach right, or do we need a more robust > solution? > >> >>> Signed-off-by: Eric Blake <eblake@redhat.com> >>> --- >>> >>> Not sure if this is the best fix, or if it is even appropriate >>> for inclusion in 2.5 this late in the game. >>> >>> tests/qemu-iotests/028 | 6 ++++-- >>> tests/qemu-iotests/028.out | 3 --- >>> 2 files changed, 4 insertions(+), 5 deletions(-) >>> >>> diff --git a/tests/qemu-iotests/028 b/tests/qemu-iotests/028 >>> index a1f4423..756c335 100755 >>> --- a/tests/qemu-iotests/028 >>> +++ b/tests/qemu-iotests/028 >>> @@ -111,10 +111,12 @@ h=$QEMU_HANDLE >>> QEMU_COMM_TIMEOUT=1 >>> >>> # Silence output since it contains the disk image path and QEMU's readline >>> -# character echoing makes it very hard to filter the output >>> +# character echoing makes it very hard to filter the output. Plus, there >>> +# is no telling how many times the command will repeat before succeeding. >>> _send_qemu_cmd $h "drive_backup disk ${TEST_IMG}.copy" "(qemu)" >/dev/null >>> _send_qemu_cmd $h "" "Formatting" | _filter_img_create >>> -qemu_cmd_repeat=20 _send_qemu_cmd $h "info block-jobs" "No active jobs" >>> +qemu_cmd_repeat=20 _send_qemu_cmd $h "info block-jobs" "No active jobs" >/dev/null >>> +_send_qemu_cmd $h "info block-jobs" "No active jobs" >>> _send_qemu_cmd $h 'quit' "" >>> >>> # Base image sectors >>> diff --git a/tests/qemu-iotests/028.out b/tests/qemu-iotests/028.out >>> index 29c9972..e03452d 100644 >>> --- a/tests/qemu-iotests/028.out >>> +++ b/tests/qemu-iotests/028.out >>> @@ -469,10 +469,7 @@ No errors were found on the image. >>> block-backup >>> >>> Formatting 'TEST_DIR/t.IMGFMT.copy', fmt=IMGFMT size=4294968832 backing_file=TEST_DIR/t.IMGFMT.base backing_fmt=IMGFMT >>> -(qemu) >>> (qemu) i[K[Din[K[D[Dinf[K[D[D[Dinfo[K[D[D[D[Dinfo [K[D[D[D[D[Dinfo b[K[D[D[D[D[D[Dinfo bl[K[D[D[D[D[D[D[Dinfo blo[K[D[D[D[D[D[D[D[Dinfo bloc[K[D[D[D[D[D[D[D[D[Dinfo block[K[D[D[D[D[D[D[D[D[D[Dinfo block-[K[D[D[D[D[D[D[D[D[D[D[Dinfo block-j[K[D[D[D[D[D[D[D[D[D[D[D[Dinfo block-jo[K[D[D[D[D[D[D[D[D[D[D[D[D[Dinfo block-job[K[D[D[D[D[D[D[D[D[D[D[D[D[D[Dinfo block-jobs[K >>> -Type backup, device disk: Completed 0 of 4294968832 bytes, speed limit 0 bytes/s >>> -i[K[Din[K[D[Dinf[K[D[D[Dinfo[K[D[D[D[Dinfo [K[D[D[D[D[Dinfo b[K[D[D[D[D[D[Dinfo bl[K[D[D[D[D[D[D[Dinfo blo[K[D[D[D[D[D[D[D[Dinfo bloc[K[D[D[D[D[D[D[D[D[Dinfo block[K[D[D[D[D[D[D[D[D[D[Dinfo block-[K[D[D[D[D[D[D[D[D[D[D[Dinfo block-j[K[D[D[D[D[D[D[D[D[D[D[D[Dinfo block-jo[K[D[D[D[D[D[D[D[D[D[D[D[D[Dinfo block-job[K[D[D[D[D[D[D[D[D[D[D[D[D[D[Dinfo block-jobs[K >>> No active jobs >>> === IO: pattern 195 >>> read 512/512 bytes at offset 3221194240 >>> >> > It looks sane to me, because you are deleting the insane portion of the expected output. The final "info block" command serves as an assertion that the loop polled successfully. Reviewed-by: John Snow <jsnow@redhat.com>
Am 18.12.2015 um 22:49 hat John Snow geschrieben: > > > On 12/18/2015 03:46 PM, Eric Blake wrote: > > On 12/11/2015 06:23 PM, John Snow wrote: > >> On 12/10/2015 10:27 PM, Eric Blake wrote: > >>> On my machine, './check -qcow2 028' was failing about 80% of the > >>> time, due to a race in how many times the repeated attempts > >>> to run 'info block-jobs' could occur before the job was done, > >>> showing up as a failure of fewer '(qemu) ' prompts than in the > >>> expected output. Silence the output during the repetitions, then > >>> add a final clean command to keep the expected output useful; > >>> once patched, I was finally able to run the test 20 times in a > >>> row with no failures. > It looks sane to me, because you are deleting the insane portion of the > expected output. The final "info block" command serves as an assertion > that the loop polled successfully. > > Reviewed-by: John Snow <jsnow@redhat.com> Thanks, applied to the block branch. Kevin
diff --git a/tests/qemu-iotests/028 b/tests/qemu-iotests/028 index a1f4423..756c335 100755 --- a/tests/qemu-iotests/028 +++ b/tests/qemu-iotests/028 @@ -111,10 +111,12 @@ h=$QEMU_HANDLE QEMU_COMM_TIMEOUT=1 # Silence output since it contains the disk image path and QEMU's readline -# character echoing makes it very hard to filter the output +# character echoing makes it very hard to filter the output. Plus, there +# is no telling how many times the command will repeat before succeeding. _send_qemu_cmd $h "drive_backup disk ${TEST_IMG}.copy" "(qemu)" >/dev/null _send_qemu_cmd $h "" "Formatting" | _filter_img_create -qemu_cmd_repeat=20 _send_qemu_cmd $h "info block-jobs" "No active jobs" +qemu_cmd_repeat=20 _send_qemu_cmd $h "info block-jobs" "No active jobs" >/dev/null +_send_qemu_cmd $h "info block-jobs" "No active jobs" _send_qemu_cmd $h 'quit' "" # Base image sectors diff --git a/tests/qemu-iotests/028.out b/tests/qemu-iotests/028.out index 29c9972..e03452d 100644 --- a/tests/qemu-iotests/028.out +++ b/tests/qemu-iotests/028.out @@ -469,10 +469,7 @@ No errors were found on the image. block-backup Formatting 'TEST_DIR/t.IMGFMT.copy', fmt=IMGFMT size=4294968832 backing_file=TEST_DIR/t.IMGFMT.base backing_fmt=IMGFMT -(qemu) (qemu) i[K[Din[K[D[Dinf[K[D[D[Dinfo[K[D[D[D[Dinfo [K[D[D[D[D[Dinfo b[K[D[D[D[D[D[Dinfo bl[K[D[D[D[D[D[D[Dinfo blo[K[D[D[D[D[D[D[D[Dinfo bloc[K[D[D[D[D[D[D[D[D[Dinfo block[K[D[D[D[D[D[D[D[D[D[Dinfo block-[K[D[D[D[D[D[D[D[D[D[D[Dinfo block-j[K[D[D[D[D[D[D[D[D[D[D[D[Dinfo block-jo[K[D[D[D[D[D[D[D[D[D[D[D[D[Dinfo block-job[K[D[D[D[D[D[D[D[D[D[D[D[D[D[Dinfo block-jobs[K -Type backup, device disk: Completed 0 of 4294968832 bytes, speed limit 0 bytes/s -i[K[Din[K[D[Dinf[K[D[D[Dinfo[K[D[D[D[Dinfo [K[D[D[D[D[Dinfo b[K[D[D[D[D[D[Dinfo bl[K[D[D[D[D[D[D[Dinfo blo[K[D[D[D[D[D[D[D[Dinfo bloc[K[D[D[D[D[D[D[D[D[Dinfo block[K[D[D[D[D[D[D[D[D[D[Dinfo block-[K[D[D[D[D[D[D[D[D[D[D[Dinfo block-j[K[D[D[D[D[D[D[D[D[D[D[D[Dinfo block-jo[K[D[D[D[D[D[D[D[D[D[D[D[D[Dinfo block-job[K[D[D[D[D[D[D[D[D[D[D[D[D[D[Dinfo block-jobs[K No active jobs === IO: pattern 195 read 512/512 bytes at offset 3221194240
On my machine, './check -qcow2 028' was failing about 80% of the time, due to a race in how many times the repeated attempts to run 'info block-jobs' could occur before the job was done, showing up as a failure of fewer '(qemu) ' prompts than in the expected output. Silence the output during the repetitions, then add a final clean command to keep the expected output useful; once patched, I was finally able to run the test 20 times in a row with no failures. Signed-off-by: Eric Blake <eblake@redhat.com> --- Not sure if this is the best fix, or if it is even appropriate for inclusion in 2.5 this late in the game. tests/qemu-iotests/028 | 6 ++++-- tests/qemu-iotests/028.out | 3 --- 2 files changed, 4 insertions(+), 5 deletions(-)