[6/7] qemu-iotests: 141: reduce likelihood of race condition on systems with fast IO

Message ID 1459848109-29756-7-git-send-email-silbe@linux.vnet.ibm.com
State New

Commit Message

Sascha Silbe April 5, 2016, 9:21 a.m. UTC
On systems with fast IO, the block job may process more data than
expected, or even run to completion, before the block-job-cancel
command arrives, causing the test case to fail.

141 is inherently racy, but we can at least reduce the likelihood of the
job completing before the cancel command arrives by bumping the size of
the data getting written; we'll try 32 MiB for a start.

Once we actually move enough data around for the block job not to
complete prematurely, the test will still fail because the offset value
in the BLOCK_JOB_CANCELLED event will vary. Since this is more or less
inherent to the nature of this event, we just replace it with a fixed
value globally (in _filter_qmp), the same way we handle timestamps.

Signed-off-by: Sascha Silbe <silbe@linux.vnet.ibm.com>
Reviewed-by: Bo Tu <tubo@linux.vnet.ibm.com>
---
 tests/qemu-iotests/141           | 11 ++++++-----
 tests/qemu-iotests/141.out       | 24 ++++++++++++------------
 tests/qemu-iotests/common.filter |  1 +
 3 files changed, 19 insertions(+), 17 deletions(-)

Comments

Max Reitz April 6, 2016, 4:15 p.m. UTC | #1
On 05.04.2016 11:21, Sascha Silbe wrote:
> On systems with fast IO, the block job may process more data than
> expected, or even run to completion, before the block-job-cancel
> command arrives, causing the test case to fail.
> 
> 141 is inherently racy, but we can at least reduce the likelihood of the
> job completing before the cancel command arrives by bumping the size of
> the data getting written; we'll try 32 MiB for a start.

Hm, interesting. I tried to prevent this by setting the block jobs'
speed to 1, which should make it stop after the block job has processed
the first block of data.

I won't oppose this patch, because if it fixes things for you, that's
good. But I don't think it should be necessary.

Max

> 
> Once we actually move enough data around for the block job not to
> complete prematurely, the test will still fail because the offset value
> in the BLOCK_JOB_CANCELLED event will vary. Since this is more or less
> inherent to the nature of this event, we just replace it with a fixed
> value globally (in _filter_qmp), the same way we handle timestamps.
> 
> Signed-off-by: Sascha Silbe <silbe@linux.vnet.ibm.com>
> Reviewed-by: Bo Tu <tubo@linux.vnet.ibm.com>
> ---
>  tests/qemu-iotests/141           | 11 ++++++-----
>  tests/qemu-iotests/141.out       | 24 ++++++++++++------------
>  tests/qemu-iotests/common.filter |  1 +
>  3 files changed, 19 insertions(+), 17 deletions(-)
Kevin Wolf April 6, 2016, 4:30 p.m. UTC | #2
Am 06.04.2016 um 18:15 hat Max Reitz geschrieben:
> On 05.04.2016 11:21, Sascha Silbe wrote:
> > On systems with fast IO, the block job may process more data than
> > expected, or even run to completion, before the block-job-cancel
> > command arrives, causing the test case to fail.
> > 
> > 141 is inherently racy, but we can at least reduce the likelihood of the
> > job completing before the cancel command arrives by bumping the size of
> > the data getting written; we'll try 32 MiB for a start.
> 
> Hm, interesting. I tried to prevent this by setting the block jobs'
> speed to 1, which should make it stop after the block job has processed
> the first block of data.
> 
> I won't oppose this patch, because if it fixes things for you, that's
> good. But I don't think it should be necessary.

We don't generally change test cases when they fail. Making a test case
pass doesn't "fix things" per se. It only helps when the failure is a
false positive.

In this case, it looks like there might be a problem with block job
throttling, so maybe we should look into that before changing the test?

Kevin

> > Once we actually move enough data around for the block job not to
> > complete prematurely, the test will still fail because the offset value
> > in the BLOCK_JOB_CANCELLED event will vary. Since this is more or less
> > inherent to the nature of this event, we just replace it with a fixed
> > value globally (in _filter_qmp), the same way we handle timestamps.
> > 
> > Signed-off-by: Sascha Silbe <silbe@linux.vnet.ibm.com>
> > Reviewed-by: Bo Tu <tubo@linux.vnet.ibm.com>
> > ---
> >  tests/qemu-iotests/141           | 11 ++++++-----
> >  tests/qemu-iotests/141.out       | 24 ++++++++++++------------
> >  tests/qemu-iotests/common.filter |  1 +
> >  3 files changed, 19 insertions(+), 17 deletions(-)
>
Sascha Silbe April 7, 2016, 8:27 p.m. UTC | #3
Dear Max,

Max Reitz <mreitz@redhat.com> writes:

> On 05.04.2016 11:21, Sascha Silbe wrote:
>> On systems with fast IO, the block job may process more data than
>> expected, or even run to completion, before the block-job-cancel
>> command arrives, causing the test case to fail.
>> 
>> 141 is inherently racy, but we can at least reduce the likelihood of the
>> job completing before the cancel command arrives by bumping the size of
>> the data getting written; we'll try 32 MiB for a start.
>
> Hm, interesting. I tried to prevent this by setting the block jobs'
> speed to 1, which should make it stop after the block job has processed
> the first block of data.

Hmm, seems like I didn't quite understand the way the test works
then.

@Kevin: Please do NOT queue this patch.

@Max: From a cursory glance at the code, maybe your 1 *byte* per second
rate limit is being rounded down to 0 *blocks* per second, with 0
meaning no limit? See e.g. mirror_set_speed(). Though I must admit I
don't understand how speed=0 translates to unlimited (like
qapi/block-core.json:block-job-set-speed says). My understanding of
ratelimit_calculate_delay() is that speed=0 means "1 quantum per time
slice", with time slice usually being 100ms; not sure about the
quantum.
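
To make that concrete, here's a much-simplified model of what I suspect
is going on. This is my reading only, not the actual QEMU code; the
names are borrowed from include/qemu/ratelimit.h and block/mirror.c,
everything else is made up for illustration:

#include <stdint.h>
#include <stdio.h>

#define BDRV_SECTOR_SIZE 512
#define SLICE_TIME_NS 100000000ULL      /* 100 ms time slice */

typedef struct {
    uint64_t slice_quota;               /* sectors allowed per slice */
    uint64_t dispatched;                /* sectors sent this slice */
} RateLimit;

static void set_speed(RateLimit *limit, uint64_t speed_bytes_per_sec)
{
    /* 1 / 512 == 0: a 1 byte/s limit turns into 0 sectors/s here. */
    uint64_t sectors_per_sec = speed_bytes_per_sec / BDRV_SECTOR_SIZE;

    limit->slice_quota = sectors_per_sec * SLICE_TIME_NS / 1000000000ULL;
}

/* Returns 0 when the burst may go ahead without delay. The first
 * burst of each slice always goes ahead, even with a quota of 0;
 * that would explain both why speed=0 acts as "unlimited" and why
 * one full chunk per 100 ms still gets through with speed=1. */
static int must_delay(RateLimit *limit, uint64_t n_sectors)
{
    if (limit->dispatched == 0 ||
        limit->dispatched + n_sectors <= limit->slice_quota) {
        limit->dispatched += n_sectors;
        return 0;
    }
    return 1;
}

int main(void)
{
    RateLimit limit = { 0, 0 };

    set_speed(&limit, 1);               /* speed=1, as test 141 uses */
    /* A 0.5 MiB commit chunk is 1024 sectors: */
    printf("quota=%llu, first chunk delayed: %d\n",
           (unsigned long long)limit.slice_quota,
           must_delay(&limit, 1024));
    return 0;
}

This prints "quota=0, first chunk delayed: 0", i.e. the 0.5 MiB chunk
sails through despite the 1 byte/s limit.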

Sascha
Kevin Wolf April 8, 2016, 11:11 a.m. UTC | #4
Am 07.04.2016 um 22:27 hat Sascha Silbe geschrieben:
> Dear Max,
> 
> Max Reitz <mreitz@redhat.com> writes:
> 
> > On 05.04.2016 11:21, Sascha Silbe wrote:
> >> On systems with fast IO, the block job may process more data than
> >> expected, or even run to completion, before the block-job-cancel
> >> command arrives, causing the test case to fail.
> >> 
> >> 141 is inherently racy, but we can at least reduce the likelihood of the
> >> job completing before the cancel command arrives by bumping the size of
> >> the data getting written; we'll try 32 MiB for a start.
> >
> > Hm, interesting. I tried to prevent this by setting the block jobs'
> > speed to 1, which should make it stop after the block job has processed
> > the first block of data.
> 
> Hmm, seems like I didn't quite understand the way the test works
> then.
> 
> @Kevin: Please do NOT queue this patch.

No problem. I expect this series to go through Max's tree anyway.

Kevin
Sascha Silbe April 8, 2016, 12:01 p.m. UTC | #5
Dear Max,

Sascha Silbe <silbe@linux.vnet.ibm.com> writes:

> @Max: From a cursory glance at the code, maybe your 1 *byte* per second
> rate limit is being rounded down to 0 *blocks* per second, with 0
> meaning no limit? See e.g. mirror_set_speed(). Though I must admit I
> don't understand how speed=0 translates to unlimited (like
> qapi/block-core.json:block-job-set-speed says). My understanding of
> ratelimit_calculate_delay() is that speed=0 means "1 quantum per time
> slice", with time slice usually being 100ms; not sure about the
> quantum.

I think I've understood the issue now.

The backup, commit, mirror and stream actions operate on full chunks,
with the chunk size depending on the action and the backing device. For
commit, e.g., that means it always bursts at least 0.5 MiB; that's
where the value in the reference output comes from.

ratelimit_calculate_delay() lets at least one burst through per time
slice. This means the minimum rate is one chunk per time slice, and the
time slice is always 100 ms. For commit and stream that works out to
0.5 MiB per 100 ms, i.e. at least 5 MiB/s. A surprisingly large floor
for something specified in bytes per second, BTW. (I.e. it should
probably be documented in qmp-commands.hx if it stays this way.)

On a busy or slow host, it may take the shell longer than the 100 ms
time slice to send the cancel command to qemu. When that happens,
additional chunks get written before the job is cancelled. That's why I
sometimes see 1 or even 1.5 MiB as the offset, especially when running
CPU-intensive workloads in parallel.

The best approach probably would be to fix up the rate limit code to
delay for multiple time slices if necessary. We should get rid of the
artificial BDRV_SECTOR_SIZE granularity at the same time, always using
bytes as the quantum unit.
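
Roughly along these lines; this is a byte-based sketch of the idea
only, with made-up names and no integration with the existing
RateLimit users:

#include <stdint.h>

typedef struct {
    int64_t next_allowed_ns;    /* earliest start of the next burst */
    uint64_t bytes_per_sec;     /* user-visible speed limit */
} RateLimit;

/* Account for a burst of n bytes; returns how long the caller should
 * sleep (in ns) before issuing it. A burst "costs" n / bytes_per_sec
 * seconds regardless of the job's chunk granularity, so a large
 * burst simply earns a delay spanning several 100 ms slices. */
static int64_t ratelimit_delay_bytes(RateLimit *limit, int64_t now_ns,
                                     uint64_t n)
{
    int64_t delay;

    if (limit->bytes_per_sec == 0) {
        return 0;               /* speed=0: unlimited */
    }
    if (limit->next_allowed_ns < now_ns) {
        limit->next_allowed_ns = now_ns;
    }
    delay = limit->next_allowed_ns - now_ns;
    limit->next_allowed_ns +=
        (int64_t)(n * 1000000000ULL / limit->bytes_per_sec);
    return delay;
}

With speed=1 and a 0.5 MiB chunk, the second chunk would then be
delayed by 524288 seconds instead of at most one slice, so the job
stays cancellable for as long as the test needs.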

Sascha
Kevin Wolf April 8, 2016, 12:31 p.m. UTC | #6
Am 08.04.2016 um 14:01 hat Sascha Silbe geschrieben:
> Dear Max,
> 
> Sascha Silbe <silbe@linux.vnet.ibm.com> writes:
> 
> > @Max: From a cursory glance at the code, maybe your 1 *byte* per second
> > rate limit is being rounded down to 0 *blocks* per second, with 0
> > meaning no limit? See e.g. mirror_set_speed(). Though I must admit I
> > don't understand how speed=0 translates to unlimited (like
> > qapi/block-core.json:block-job-set-speed says). My understanding of
> > ratelimit_calculate_delay() is that speed=0 means "1 quantum per time
> > slice", with time slice usually being 100ms; not sure about the
> > quantum.
> 
> I think I've understood the issue now.
> 
> The backup, commit, mirror and stream actions operate on full chunks,
> with the chunk size depending on the action and the backing device. For
> commit, e.g., that means it always bursts at least 0.5 MiB; that's
> where the value in the reference output comes from.
> 
> ratelimit_calculate_delay() lets at least one burst through per time
> slice. This means the minimum rate is one chunk per time slice, and the
> time slice is always 100 ms. For commit and stream that works out to
> 0.5 MiB per 100 ms, i.e. at least 5 MiB/s. A surprisingly large floor
> for something specified in bytes per second, BTW. (I.e. it should
> probably be documented in qmp-commands.hx if it stays this way.)
> 
> On a busy or slow host, it may take the shell longer than the 100 ms
> time slice to send the cancel command to qemu. When that happens,
> additional chunks get written before the job is cancelled. That's why I
> sometimes see 1 or even 1.5 MiB as the offset, especially when running
> CPU-intensive workloads in parallel.
> 
> The best approach probably would be to fix up the rate limit code to
> delay for multiple time slices if necessary. We should get rid of the
> artificial BDRV_SECTOR_SIZE granularity at the same time, always using
> bytes as the quantum unit.

In the 2.7 time frame we might actually be able to reuse the normal I/O
throttling code for block jobs as the jobs will be using their own
BlockBackend and can therefore set their own throttling limits.
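
A purely hypothetical sketch of what that could look like, assuming
the throttling entry points end up on BlockBackend as planned
(function and field names follow the existing throttle API, but none
of this is wired up for block jobs today):

#include "sysemu/block-backend.h"
#include "qemu/throttle.h"

/* Hypothetical: limit a block job by configuring the regular I/O
 * throttling machinery on the job's own BlockBackend instead of
 * going through RateLimit. */
static void block_job_set_speed_via_throttle(BlockBackend *blk,
                                             int64_t speed /* bytes/s */)
{
    ThrottleConfig cfg;

    throttle_config_init(&cfg);
    cfg.buckets[THROTTLE_BPS_TOTAL].avg = speed;    /* average rate */
    cfg.buckets[THROTTLE_BPS_TOTAL].max = speed;    /* no extra burst */
    blk_set_io_limits(blk, &cfg);
}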

Kevin
Sascha Silbe April 8, 2016, 1:46 p.m. UTC | #7
Dear Kevin,

Kevin Wolf <kwolf@redhat.com> writes:

> Am 08.04.2016 um 14:01 hat Sascha Silbe geschrieben:
[...]
>> The best approach probably would be to fix up the rate limit code to
>> delay for multiple time slices if necessary. We should get rid of the
>> artificial BDRV_SECTOR_SIZE granularity at the same time, always using
>> bytes as the quantum unit.
>
> In the 2.7 time frame we might actually be able to reuse the normal I/O
> throttling code for block jobs as the jobs will be using their own
> BlockBackend and can therefore set their own throttling limits.

Sounds good. Then I suggest we just drop this patch for now. I've added
a reminder to check the situation again at the end of June (i.e. before
the 2.7 soft feature freeze).

Will give documenting the minimum rate a try, but that shouldn't hold up
the fixes.

Sascha

Patch

diff --git a/tests/qemu-iotests/141 b/tests/qemu-iotests/141
index f7c28b4..a06dc72 100755
--- a/tests/qemu-iotests/141
+++ b/tests/qemu-iotests/141
@@ -83,9 +83,10 @@  test_blockjob()
 }
 
 
-TEST_IMG="$TEST_DIR/b.$IMGFMT" _make_test_img 1M
-TEST_IMG="$TEST_DIR/m.$IMGFMT" _make_test_img -b "$TEST_DIR/b.$IMGFMT" 1M
-_make_test_img -b "$TEST_DIR/m.$IMGFMT" 1M
+IMGSIZE=32M
+TEST_IMG="$TEST_DIR/b.$IMGFMT" _make_test_img "${IMGSIZE}"
+TEST_IMG="$TEST_DIR/m.$IMGFMT" _make_test_img -b "$TEST_DIR/b.$IMGFMT" "${IMGSIZE}"
+_make_test_img -b "$TEST_DIR/m.$IMGFMT" "${IMGSIZE}"
 
 _launch_qemu -nodefaults
 
@@ -147,7 +148,7 @@  echo
 # immediately, send a BLOCK_JOB_COMPLETED and ejecting the BDS would work just
 # fine without the block job still running.
 
-$QEMU_IO -c 'write 0 1M' "$TEST_DIR/m.$IMGFMT" | _filter_qemu_io
+$QEMU_IO -c "write 0 ${IMGSIZE}" "$TEST_DIR/m.$IMGFMT" | _filter_qemu_io
 
 test_blockjob \
     "{'execute': 'block-commit',
@@ -165,7 +166,7 @@  echo
 # immediately, send a BLOCK_JOB_COMPLETED and ejecting the BDS would work just
 # fine without the block job still running.
 
-$QEMU_IO -c 'write 0 1M' "$TEST_DIR/b.$IMGFMT" | _filter_qemu_io
+$QEMU_IO -c "write 0 ${IMGSIZE}" "$TEST_DIR/b.$IMGFMT" | _filter_qemu_io
 
 # With some data to stream (and @speed set to 1), block-stream will not complete
 # until we send the block-job-cancel command. Therefore, no event other than
diff --git a/tests/qemu-iotests/141.out b/tests/qemu-iotests/141.out
index adceac1..14e457f 100644
--- a/tests/qemu-iotests/141.out
+++ b/tests/qemu-iotests/141.out
@@ -1,23 +1,23 @@ 
 QA output created by 141
-Formatting 'TEST_DIR/b.IMGFMT', fmt=IMGFMT size=1048576
-Formatting 'TEST_DIR/m.IMGFMT', fmt=IMGFMT size=1048576 backing_file=TEST_DIR/b.IMGFMT
-Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576 backing_file=TEST_DIR/m.IMGFMT
+Formatting 'TEST_DIR/b.IMGFMT', fmt=IMGFMT size=33554432
+Formatting 'TEST_DIR/m.IMGFMT', fmt=IMGFMT size=33554432 backing_file=TEST_DIR/b.IMGFMT
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=33554432 backing_file=TEST_DIR/m.IMGFMT
 {"return": {}}
 
 === Testing drive-backup ===
 
 {"return": {}}
-Formatting 'TEST_DIR/o.IMGFMT', fmt=IMGFMT size=1048576 backing_file=TEST_DIR/t.IMGFMT backing_fmt=IMGFMT
+Formatting 'TEST_DIR/o.IMGFMT', fmt=IMGFMT size=33554432 backing_file=TEST_DIR/t.IMGFMT backing_fmt=IMGFMT
 {"return": {}}
 {"error": {"class": "GenericError", "desc": "Node 'drv0' is busy: block device is in use by block job: backup"}}
 {"return": {}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_CANCELLED", "data": {"device": "drv0", "len": 1048576, "offset": 0, "speed": 0, "type": "backup"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_CANCELLED", "data": {"device": "drv0", "len": 33554432, "offset": OFFSET, "speed": 0, "type": "backup"}}
 {"return": {}}
 
 === Testing drive-mirror ===
 
 {"return": {}}
-Formatting 'TEST_DIR/o.IMGFMT', fmt=IMGFMT size=1048576 backing_file=TEST_DIR/t.IMGFMT backing_fmt=IMGFMT
+Formatting 'TEST_DIR/o.IMGFMT', fmt=IMGFMT size=33554432 backing_file=TEST_DIR/t.IMGFMT backing_fmt=IMGFMT
 {"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_READY", "data": {"device": "drv0", "len": 0, "offset": 0, "speed": 0, "type": "mirror"}}
 {"error": {"class": "GenericError", "desc": "Node 'drv0' is busy: block device is in use by block job: mirror"}}
@@ -37,23 +37,23 @@  Formatting 'TEST_DIR/o.IMGFMT', fmt=IMGFMT size=1048576 backing_file=TEST_DIR/t.
 
 === Testing non-active block-commit ===
 
-wrote 1048576/1048576 bytes at offset 0
-1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 33554432/33554432 bytes at offset 0
+32 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 {"return": {}}
 {"return": {}}
 {"error": {"class": "GenericError", "desc": "Node 'drv0' is busy: block device is in use by block job: commit"}}
 {"return": {}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_CANCELLED", "data": {"device": "drv0", "len": 1048576, "offset": 524288, "speed": 1, "type": "commit"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_CANCELLED", "data": {"device": "drv0", "len": 33554432, "offset": OFFSET, "speed": 1, "type": "commit"}}
 {"return": {}}
 
 === Testing block-stream ===
 
-wrote 1048576/1048576 bytes at offset 0
-1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 33554432/33554432 bytes at offset 0
+32 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 {"return": {}}
 {"return": {}}
 {"error": {"class": "GenericError", "desc": "Node 'drv0' is busy: block device is in use by block job: stream"}}
 {"return": {}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_CANCELLED", "data": {"device": "drv0", "len": 1048576, "offset": 524288, "speed": 1, "type": "stream"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_CANCELLED", "data": {"device": "drv0", "len": 33554432, "offset": OFFSET, "speed": 1, "type": "stream"}}
 {"return": {}}
 *** done
diff --git a/tests/qemu-iotests/common.filter b/tests/qemu-iotests/common.filter
index 84b7434..1698d7b 100644
--- a/tests/qemu-iotests/common.filter
+++ b/tests/qemu-iotests/common.filter
@@ -173,6 +173,7 @@  _filter_qmp()
 {
     _filter_win32 | \
     sed -e 's#\("\(micro\)\?seconds": \)[0-9]\+#\1 TIMESTAMP#g' \
+        -e '/^{"timestamp": {[^}]*}, "event": "BLOCK_JOB_CANCELLED"/ s#\("offset": \)[0-9]\+#\1 OFFSET#' \
         -e 's#^{"QMP":.*}$#QMP_VERSION#' \
         -e '/^    "QMP": {\s*$/, /^    }\s*$/ c\' \
         -e '    QMP_VERSION'